hakmem/docs/archive/PHASE_6.6_SUMMARY.md
# Phase 6.6 Complete Summary
**Date**: 2025-10-21
**Status**: ✅ **COMPLETE**

---
## 🎯 Goal & Achievement
**Goal**: Fix the ELO control flow bug that prevented batch madvise activation
**Result**: ✅ **Fixed and verified** - batch madvise now activates correctly

---
## 🐛 Problem
After Phase 6.5 (Learning Lifecycle) integration:
- 2MB allocations were using `MALLOC` instead of `MMAP`
- BigCache eviction called `free()` instead of `hak_batch_add()`
- Batch madvise statistics showed **0 blocks batched** (completely inactive)
---
## 🔍 Root Cause (Diagnosed by Gemini Pro)
**Control flow ordering bug** in `hakmem.c:hak_alloc_at()`:
1. The OLD policy decision (`infer_policy()`) executed FIRST and returned `POLICY_DEFAULT`
2. The allocation used that old policy, so `alloc_malloc()` was called
3. ELO strategy selection executed TOO LATE, after the allocation method was already fixed
4. The ELO result was used only for BigCache eligibility, not for choosing the allocation method

**Key insight**: "The right answer computed at the wrong time is the wrong answer"

---
## ✅ Fix Applied
**Modified**: `hakmem.c` (lines 645-720)
**Before** (WRONG):
```c
void* hak_alloc_at(size_t size, ...) {
    // 1. Old policy (WRONG!)
    policy = POLICY_DEFAULT;

    // 2. Allocate (TOO EARLY!)
    ptr = allocate_with_policy(size, policy);  // Uses malloc

    // 3. ELO selection (TOO LATE!)
    strategy_id = hak_elo_select_strategy();   // Result not used!
    threshold = hak_elo_get_threshold(strategy_id);
}
```
**After** (CORRECT):
```c
void* hak_alloc_at(size_t size, ...) {
    // 1. ELO selection FIRST!
    strategy_id = hak_elo_select_strategy();
    threshold = hak_elo_get_threshold(strategy_id);

    // 2. BigCache check
    if (hak_bigcache_try_get(...)) return cached_ptr;

    // 3. Use ELO threshold to decide malloc vs mmap
    ptr = (size >= threshold) ? alloc_mmap(size) : alloc_malloc(size);
}
```
**Result**: 2MB allocations now correctly use `mmap`, enabling batch madvise.
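
The ordering fix can also be shown as a small compilable sketch. This is not the code in `hakmem.c`: `select_strategy()`, `get_threshold()`, `choose_method()`, and the threshold table are hypothetical stand-ins with made-up values, and the numbering of the enum is only assumed to match the `method=1 (MMAP)` value seen in the debug log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the hakmem API. */
typedef enum { METHOD_MALLOC = 0, METHOD_MMAP = 1 } alloc_method_t;

/* Stand-in threshold table, one entry per strategy (values are made up). */
static const size_t k_thresholds[] = { 512u * 1024, 1024u * 1024, 2048u * 1024 };
#define NUM_STRATEGIES (sizeof k_thresholds / sizeof k_thresholds[0])

/* Stand-in for ELO strategy selection: always picks strategy 2 here. */
static int select_strategy(void) { return 2; }

static size_t get_threshold(int strategy_id) {
    assert(strategy_id >= 0 && (size_t)strategy_id < NUM_STRATEGIES);
    return k_thresholds[strategy_id];
}

/* The strategy is selected BEFORE the allocation method is decided,
 * so the threshold actually influences the malloc-vs-mmap choice. */
alloc_method_t choose_method(size_t size) {
    int strategy_id = select_strategy();
    size_t threshold = get_threshold(strategy_id);
    return (size >= threshold) ? METHOD_MMAP : METHOD_MALLOC;
}
```

With these stand-in values, `choose_method(2097152)` selects `METHOD_MMAP` and `choose_method(4096)` selects `METHOD_MALLOC`. The point of the sketch is only the ordering: nothing allocates until the threshold is known.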
---
## 📊 Benchmark Results
**Configuration**: `bench_runner.sh --warmup 2 --runs 10` (200 total runs)
### VM Scenario (2MB allocations)
| Allocator | Median (ns) | vs Phase 6.4 | vs mimalloc |
|-----------|-------------|--------------|-------------|
| mimalloc | 19,964 | +12.6% | baseline |
| jemalloc | 26,241 | -3.0% | +31.4% |
| **hakmem-evolving** | **37,602** | **+2.6%** | **+88.3%** |
| hakmem-baseline | 40,282 | +9.1% | +101.7% |
| system | 59,995 | -4.4% | +200.4% |
### Analysis
1. **No regression**: the +2.6% difference vs Phase 6.4 is within measurement variance
2. **ELO working**: hakmem-evolving (37,602 ns) beats hakmem-baseline (40,282 ns)
3. **Batch madvise active**: verified with debug logging
4. ⚠️ **Overhead gap**: still about 1.9× slower than mimalloc (+88.3%) → Phase 6.7 investigation

**Note**: README.md claimed "16,125 ns" for Phase 6.4, but FINAL_RESULTS.md shows 36,647 ns (the correct baseline for comparison).
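
For reference, the percentage columns are plain relative differences of medians. The helper below is a hypothetical illustration (the name `pct_vs` is not from the benchmark scripts), fed with the hakmem-evolving median and the two baselines quoted above.

```c
#include <assert.h>

/* Relative difference of `value` against `baseline`, in percent.
 * Hypothetical helper, not part of bench_runner.sh. */
double pct_vs(double value, double baseline) {
    assert(baseline > 0.0);
    return 100.0 * (value - baseline) / baseline;
}
```

`pct_vs(37602, 19964)` gives roughly 88.3 (vs mimalloc) and `pct_vs(37602, 36647)` gives roughly 2.6 (vs the Phase 6.4 figure from FINAL_RESULTS.md).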
---
## 🧪 Verification
### Batch Madvise Activation Confirmed
```
[DEBUG] BigCache eviction: method=1 (MMAP), size=2097152 ✅
[DEBUG] Calling hak_batch_add(raw=0x..., size=2097152) ✅
Batch Statistics:
Total blocks added: 1 ✅
Flush operations: 1 ✅
Total bytes flushed: 2097152 ✅
```
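
The three counters in that output can be related to a batching scheme along the following lines. This is a minimal sketch under stated assumptions, not the `hak_batch_add()` implementation: the `batch_t` layout, the capacity of 8, and the use of `MADV_DONTNEED` are all assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define BATCH_CAPACITY 8   /* assumed capacity, chosen arbitrarily */

/* Hypothetical batch state; the counters mirror the statistics above. */
typedef struct {
    void  *ptr[BATCH_CAPACITY];
    size_t len[BATCH_CAPACITY];
    int    count;
    size_t blocks_added, flush_ops, bytes_flushed;
} batch_t;

/* Release the physical pages of every queued block in one pass.
 * MADV_DONTNEED is an assumption; the real code may use another advice. */
void batch_flush(batch_t *b) {
    if (b->count == 0) return;
    for (int i = 0; i < b->count; i++) {
        if (madvise(b->ptr[i], b->len[i], MADV_DONTNEED) == 0)
            b->bytes_flushed += b->len[i];
    }
    b->count = 0;
    b->flush_ops++;
}

/* Queue a page-aligned mmap'd block; flush first if the batch is full. */
void batch_add(batch_t *b, void *ptr, size_t len) {
    assert(ptr != NULL && len > 0);
    if (b->count == BATCH_CAPACITY) batch_flush(b);
    b->ptr[b->count] = ptr;
    b->len[b->count] = len;
    b->count++;
    b->blocks_added++;
}
```

Adding one 2 MiB `mmap`'d block and flushing once leaves the counters at 1 block added, 1 flush operation, and 2097152 bytes flushed, which matches the shape of the log. The sketch only advises the kernel; unmapping the regions is left to the caller. A `malloc`'d block is not a suitable input here, because `madvise` expects a page-aligned mapping, which is why the MALLOC path never fed the batch.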
---
## 🎓 Lessons Learned
### Design Mistakes
1. **Control flow ordering**: Strategy selection must happen BEFORE usage
2. **Dead code accumulation**: Old `infer_policy()` logic left behind
3. **Silent failures**: ELO results computed but not used
### Detection Challenges
1. **High-level symptoms**: "Batch not activating" didn't point to control flow
2. **Required detailed tracing**: Had to add debug logging to discover MALLOC usage
3. **Multi-layer architecture**: Problem spanned ELO, allocation, BigCache, batch
### AI Collaboration Success
- **Gemini Pro**: Root cause diagnosis from logs + code analysis
- **Claude**: Applied fix, tested, documented
- **Synergy**: Gemini saw the forest (control flow), Claude fixed the trees (code)
---
## 📝 Bonus Findings
### BigCache Size Check Bug (Already Fixed)
Gemini Task 5cfad9 diagnosed a heap-buffer-overflow bug:
- **Problem**: BigCache returning undersized blocks without `actual_bytes >= requested_bytes` check
- **Impact**: cold-churn benchmark (varying sizes) triggers buffer overflow
- **Status**: ✅ **Already fixed** in previous session
- **Code**: `hakmem_bigcache.c:151` has size check with "Segfault fix!" comment
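
The shape of such a size check can be sketched as follows. The slot layout and the function name `cache_try_get` are hypothetical; only the `actual_bytes >= requested_bytes` condition comes from the description above, and the real code in `hakmem_bigcache.c` may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache slot; not the BigCache data structure. */
typedef struct {
    void  *ptr;
    size_t actual_bytes;
    int    valid;
} cache_slot_t;

/* Return the cached block only if it can hold the request.
 * NULL means a miss, and the caller falls back to a fresh allocation. */
void *cache_try_get(cache_slot_t *slot, size_t requested_bytes) {
    assert(slot != NULL);
    if (!slot->valid) return NULL;
    if (slot->actual_bytes < requested_bytes) return NULL; /* undersized: keep it cached */
    slot->valid = 0;   /* ownership moves to the caller */
    return slot->ptr;
}
```

Without the second check, a request for 128 bytes could be handed a 64-byte block, which is the overflow pattern a varying-size workload such as cold-churn would expose.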
---
## 🚀 Next Steps (Phase 6.7)
### 1. Overhead Analysis
**Goal**: Identify why hakmem is about 1.9× slower than mimalloc (+88.3% in the VM scenario)
**Candidates** (from OVERHEAD_ANALYSIS_PLAN.md):
- P0: BigCache lookup (~50-100 ns)
- P0: ELO strategy selection (~100-200 ns)
- P1: mmap/munmap syscalls (~1,000-5,000 ns) ← **Main suspect**
- P1: Page faults (~100-500 ns per page)
**Strategy**:
1. Feature isolation testing (environment variables)
2. `perf` profiling (hotspot identification)
3. `strace` syscall counting
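
Before reaching for `perf` or `strace`, the main suspect can be sized with a small timing loop. The sketch below is a hypothetical microbenchmark, not part of the hakmem benchmark suite: it times an `mmap`, a single first-touch page fault, and a `munmap` per iteration. The function names are invented for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average cost in ns of one mmap + first-touch + munmap cycle.
 * Returns 0 if mmap fails. Hypothetical helper for illustration. */
uint64_t time_mmap_cycle_ns(size_t len, int iters) {
    assert(len > 0 && iters > 0);
    uint64_t start = now_ns();
    for (int i = 0; i < iters; i++) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 0;
        ((volatile char *)p)[0] = 1;   /* touch one page to force a fault */
        munmap(p, len);
    }
    return (now_ns() - start) / (uint64_t)iters;
}
```

Calling `time_mmap_cycle_ns(2097152, 1000)` gives a per-cycle figure to compare against the ~1,000-5,000 ns estimate. It touches only the first page, so it understates the page-fault cost of a workload that writes the whole 2MB block.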
### 2. Optimization Ideas
1. **FROZEN mode by default** (after learning) → -5% overhead
2. **BigCache direct indexing** (instead of linear search) → -5% overhead
3. **Pre-allocated arena** (Phase 7+) → -50% overhead target
**Realistic goal**: Reduce gap from +88% to +40% (Phase 7), then +20% (Phase 8)
**Limit**: Beating mimalloc is unlikely without a slab allocator (an industry-standard design with 10+ years of optimization)
---
## 📁 Documentation Created
1. **PHASE_6.6_ELO_CONTROL_FLOW_FIX.md** (updated with benchmark results)
2. **OVERHEAD_ANALYSIS_PLAN.md** (Phase 6.7 preparation)
3. **PHASE_6.6_SUMMARY.md** (this file)
4. **GEMINI_BIGCACHE_ANALYSIS.md** (confirmed existing fix)
---
## 🏆 Final Status
**Phase 6.6**: ✅ **COMPLETE**
**Achievements**:
- ✅ ELO control flow bug fixed
- ✅ Batch madvise activation verified
- ✅ Performance parity with Phase 6.4 maintained (+2.6%, within measurement variance)
- ✅ Comprehensive documentation created
- ✅ Phase 6.7 roadmap prepared
**Code quality**:
- Modified files: 1 (`hakmem.c`)
- Lines changed: ~75 lines (reordering + cleanup)
- Test coverage: VM scenario verified (200 runs)
**Time investment**: ~6 hours (diagnosis + fix + benchmarking + documentation)

---
**Ready for Phase 6.7: Overhead Analysis & Optimization** 🚀