# Phase 6.6 Complete Summary

**Date**: 2025-10-21
**Status**: ✅ **COMPLETE**

---

## 🎯 Goal & Achievement

**Goal**: Fix the ELO control flow bug that prevented batch madvise activation
**Result**: ✅ **Fixed and verified**: batch madvise now works correctly

---

## 🐛 Problem

After Phase 6.5 (Learning Lifecycle) integration:
- 2MB allocations were using `MALLOC` instead of `MMAP`
- BigCache eviction called `free()` instead of `hak_batch_add()`
- Batch madvise statistics showed **0 blocks batched** (completely inactive)

---

## 🔍 Root Cause (Diagnosed by Gemini Pro)

**Control flow ordering bug** in `hakmem.c:hak_alloc_at()`:

1. OLD policy decision (`infer_policy()`) executed FIRST → returned `POLICY_DEFAULT`
2. Allocation happened using the old policy → `alloc_malloc()` called
3. ELO strategy selection executed TOO LATE → its result could no longer influence the allocation method
4. The ELO result was used only for BigCache eligibility, never for the allocation method

**Key insight**: "The right answer computed at the wrong time is the wrong answer"

---

## ✅ Fix Applied

**Modified**: `hakmem.c` (lines 645-720)

**Before** (WRONG):
```c
void* hak_alloc_at(size_t size, ...) {
    // 1. Old policy (WRONG!): decided before ELO has run
    policy = POLICY_DEFAULT;

    // 2. Allocate (TOO EARLY!)
    ptr = allocate_with_policy(size, policy);  // Uses malloc

    // 3. ELO selection (TOO LATE!)
    strategy_id = hak_elo_select_strategy();   // Result not used for the allocation method!
    threshold = hak_elo_get_threshold(strategy_id);

    return ptr;
}
```

**After** (CORRECT):
```c
void* hak_alloc_at(size_t size, ...) {
    // 1. ELO selection FIRST!
    strategy_id = hak_elo_select_strategy();
    threshold = hak_elo_get_threshold(strategy_id);

    // 2. BigCache check
    if (hak_bigcache_try_get(...)) return cached_ptr;

    // 3. Use ELO threshold to decide malloc vs mmap
    ptr = (size >= threshold) ? alloc_mmap(size) : alloc_malloc(size);

    return ptr;
}
```
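
The effect of the reordering can be reproduced in isolation. The following is a minimal sketch, not the code in `hakmem.c`: the `sketch_` and `choose_method_` names are hypothetical, and the rule that strategy N maps to a threshold of (N + 1) MiB is an assumption made only so the example has concrete numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Allocation methods, named after the MALLOC/MMAP labels in the debug log. */
typedef enum { ALLOC_METHOD_MALLOC = 0, ALLOC_METHOD_MMAP = 1 } alloc_method_t;

/* Hypothetical stand-in for hak_elo_get_threshold(): strategy N is assumed
 * to map to a threshold of (N + 1) MiB. */
size_t sketch_threshold_for(int strategy_id) {
    assert(strategy_id >= 0);
    return ((size_t)strategy_id + 1) << 20;
}

/* Buggy order: the method is fixed before the threshold is known, so the
 * threshold cannot change the outcome. */
alloc_method_t choose_method_before_fix(size_t size, int strategy_id) {
    alloc_method_t method = ALLOC_METHOD_MALLOC;          /* decided first */
    size_t threshold = sketch_threshold_for(strategy_id); /* computed too late */
    (void)size;
    (void)threshold;
    return method;
}

/* Fixed order: the threshold is computed first and drives the decision. */
alloc_method_t choose_method_after_fix(size_t size, int strategy_id) {
    size_t threshold = sketch_threshold_for(strategy_id);
    return (size >= threshold) ? ALLOC_METHOD_MMAP : ALLOC_METHOD_MALLOC;
}
```
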

**Result**: 2MB allocations now correctly use `mmap`, enabling batch madvise.

---

## 📊 Benchmark Results

**Configuration**: `bench_runner.sh --warmup 2 --runs 10` (200 total runs)

### VM Scenario (2MB allocations)

| Allocator | Median (ns) | vs Phase 6.4 | vs mimalloc |
|-----------|-------------|--------------|-------------|
| mimalloc | 19,964 | +12.6% | baseline |
| jemalloc | 26,241 | -3.0% | +31.4% |
| **hakmem-evolving** | **37,602** | **+2.6%** | **+88.3%** |
| hakmem-baseline | 40,282 | +9.1% | +101.7% |
| system | 59,995 | -4.4% | +200.4% |

### Analysis

1. ✅ **No regression**: the +2.6% difference vs Phase 6.4 (37,602 ns vs 36,647 ns) is within measurement variance
2. ✅ **ELO working**: hakmem-evolving (37,602 ns) beats hakmem-baseline (40,282 ns)
3. ✅ **Batch madvise active**: Verified with debug logging
4. ⚠️ **Overhead gap**: Median latency is still about 1.9× that of mimalloc (+88.3%) → Phase 6.7 investigation
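
The percentages for hakmem-evolving follow directly from the medians. A small helper, written here only as a worked check (the function name is hypothetical), reproduces +88.3% vs mimalloc and +2.6% vs the Phase 6.4 median of 36,647 ns. Other rows recomputed from the rounded medians may differ from the table in the last digit.

```c
#include <assert.h>

/* Relative difference of a measurement against a baseline, in percent.
 * Positive means slower (more nanoseconds) than the baseline. */
double relative_diff_percent(double measured_ns, double baseline_ns) {
    assert(baseline_ns > 0.0);
    return (measured_ns / baseline_ns - 1.0) * 100.0;
}

/* hakmem-evolving vs mimalloc:   37602 / 19964 - 1  ->  about +88.3%
 * hakmem-evolving vs Phase 6.4:  37602 / 36647 - 1  ->  about  +2.6%
 * hakmem-evolving vs baseline:   37602 / 40282 - 1  ->  about  -6.7% */
```
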

**Note**: README.md claimed "16,125 ns" for Phase 6.4, but FINAL_RESULTS.md shows 36,647 ns (the correct baseline for comparison).

---

## 🧪 Verification

### Batch Madvise Activation Confirmed

```
[DEBUG] BigCache eviction: method=1 (MMAP), size=2097152 ✅
[DEBUG] Calling hak_batch_add(raw=0x..., size=2097152) ✅

Batch Statistics:
  Total blocks added: 1 ✅
  Flush operations: 1 ✅
  Total bytes flushed: 2097152 ✅
```
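
For readers unfamiliar with the pattern, the counters in this log can be produced by a batcher along the following lines. This is a rough sketch under stated assumptions, not the implementation behind `hak_batch_add()`: every `sketch_` name, the capacity of 8, and the choice of `MADV_DONTNEED` are assumptions. Only `mmap()` and `madvise()` are real system calls. Blocks are queued and released together on flush, which is why the block must come from `mmap` rather than `malloc`.

```c
#define _DEFAULT_SOURCE   /* exposes madvise() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define SKETCH_BATCH_CAPACITY 8   /* assumed capacity, for illustration only */

typedef struct {
    void  *ptr[SKETCH_BATCH_CAPACITY];
    size_t size[SKETCH_BATCH_CAPACITY];
    int    count;            /* blocks currently queued */
    size_t blocks_added;     /* "Total blocks added" */
    size_t flush_ops;        /* "Flush operations" */
    size_t bytes_flushed;    /* "Total bytes flushed" */
} sketch_batch_t;

/* Hands every queued block back to the kernel in one pass. */
void sketch_batch_flush(sketch_batch_t *b) {
    if (b->count == 0) return;
    for (int i = 0; i < b->count; i++) {
        if (madvise(b->ptr[i], b->size[i], MADV_DONTNEED) == 0)
            b->bytes_flushed += b->size[i];
    }
    b->count = 0;
    b->flush_ops++;
}

/* Queues a page-aligned, mmap-backed block; flushes first if the batch is full. */
void sketch_batch_add(sketch_batch_t *b, void *ptr, size_t size) {
    assert(ptr != NULL && size > 0);
    if (b->count == SKETCH_BATCH_CAPACITY) sketch_batch_flush(b);
    b->ptr[b->count] = ptr;
    b->size[b->count] = size;
    b->count++;
    b->blocks_added++;
}

/* Anonymous private mapping, standing in for the allocator's mmap path. */
void *sketch_map_block(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```
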

---

## 🎓 Lessons Learned

### Design Mistakes

1. **Control flow ordering**: Strategy selection must happen BEFORE usage
2. **Dead code accumulation**: Old `infer_policy()` logic left behind
3. **Silent failures**: ELO results computed but not used

### Detection Challenges

1. **High-level symptoms**: "Batch not activating" didn't point to control flow
2. **Required detailed tracing**: Had to add debug logging to discover MALLOC usage
3. **Multi-layer architecture**: Problem spanned ELO, allocation, BigCache, batch

### AI Collaboration Success

- **Gemini Pro**: Root cause diagnosis from logs + code analysis
- **Claude**: Applied fix, tested, documented
- **Synergy**: Gemini saw the forest (control flow), Claude fixed the trees (code)

---

## 📝 Bonus Findings

### BigCache Size Check Bug (Already Fixed)

Gemini Task 5cfad9 diagnosed a heap-buffer-overflow bug:
- **Problem**: BigCache returning undersized blocks without `actual_bytes >= requested_bytes` check
- **Impact**: cold-churn benchmark (varying sizes) triggers buffer overflow
- **Status**: ✅ **Already fixed** in previous session
- **Code**: `hakmem_bigcache.c:151` has size check with "Segfault fix!" comment
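
The kind of guard described here can be illustrated with a single cache slot. This is a sketch of the `actual_bytes >= requested_bytes` check only; the struct and function names are hypothetical and do not reflect the layout in `hakmem_bigcache.c`.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void  *ptr;           /* cached block, NULL when the slot is empty */
    size_t actual_bytes;  /* real capacity of the cached block */
} sketch_cache_slot_t;

/* Returns the cached block only if it can hold the request. On a miss the
 * slot is left untouched and NULL is returned, so the caller falls back to
 * a fresh allocation instead of writing past an undersized block. */
void *sketch_cache_try_get(sketch_cache_slot_t *slot, size_t requested_bytes) {
    assert(slot != NULL);
    if (slot->ptr == NULL) return NULL;
    if (slot->actual_bytes < requested_bytes) return NULL;  /* the size check */
    void *hit = slot->ptr;
    slot->ptr = NULL;
    slot->actual_bytes = 0;
    return hit;
}
```
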

---

## 🚀 Next Steps (Phase 6.7)

### 1. Overhead Analysis

**Goal**: Identify why hakmem takes about 1.9× as long as mimalloc (+88.3% median latency)

**Candidates** (from OVERHEAD_ANALYSIS_PLAN.md):
- P0: BigCache lookup (~50-100 ns)
- P0: ELO strategy selection (~100-200 ns)
- P1: mmap/munmap syscalls (~1,000-5,000 ns) ← **Main suspect**
- P1: Page faults (~100-500 ns per page)

**Strategy**:
1. Feature isolation testing (environment variables)
2. `perf` profiling (hotspot identification)
3. `strace` syscall counting
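
Feature isolation through environment variables could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and so is the convention that a value of `0` disables a feature while an unset or empty variable keeps the default. Only `getenv()` and `strcmp()` are standard library calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Interprets a toggle value: NULL or "" keeps the default, "0" disables,
 * anything else enables. */
int sketch_parse_toggle(const char *value, int default_on) {
    if (value == NULL || value[0] == '\0') return default_on;
    return strcmp(value, "0") != 0;
}

/* Reads the toggle for one feature from the environment. The variable name
 * is supplied by the caller; no real hakmem variable name is assumed here. */
int sketch_feature_enabled(const char *env_name, int default_on) {
    assert(env_name != NULL);
    return sketch_parse_toggle(getenv(env_name), default_on);
}
```

Reading each toggle once at startup and caching the result keeps `getenv()` off the allocation hot path, so the switch itself does not distort the measurement.
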

### 2. Optimization Ideas

1. **FROZEN mode by default** (after learning) → -5% overhead
2. **BigCache direct indexing** (instead of linear search) → -5% overhead
3. **Pre-allocated arena** (Phase 7+) → -50% overhead target
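
To make the direct-indexing idea concrete, here is one possible shape for it. This is only a sketch: the power-of-two size classes from 1 MiB to 8 MiB, the one-slot-per-class layout, and all `sketch_` names are assumptions, not the planned design. The slot is computed from the request size instead of being found by scanning entries, and `put` accepts only blocks whose size exactly matches a class, so a hit can never be undersized.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIN_SHIFT   20  /* smallest class: 1 MiB (assumed) */
#define SKETCH_NUM_CLASSES 4   /* classes: 1, 2, 4, 8 MiB (assumed) */

typedef struct { void *slot[SKETCH_NUM_CLASSES]; } sketch_bigcache_t;

/* Index of the smallest class that can hold `size`, or -1 if none can. */
int sketch_class_index(size_t size) {
    if (size == 0) return -1;
    size_t rounded = (size - 1) >> SKETCH_MIN_SHIFT;  /* 0 for sizes <= 1 MiB */
    int index = 0;
    while (rounded > 0) { rounded >>= 1; index++; }
    return (index < SKETCH_NUM_CLASSES) ? index : -1;
}

/* Capacity in bytes of class `index`. */
size_t sketch_class_bytes(int index) {
    return (size_t)1 << (SKETCH_MIN_SHIFT + index);
}

/* Stores a block; returns 1 on success, 0 if the size is not an exact class
 * size or the slot is already occupied. */
int sketch_bigcache_put(sketch_bigcache_t *c, void *ptr, size_t size) {
    int i = sketch_class_index(size);
    if (i < 0 || size != sketch_class_bytes(i) || c->slot[i] != NULL) return 0;
    c->slot[i] = ptr;
    return 1;
}

/* Looks up exactly one slot: no search over cached entries. */
void *sketch_bigcache_get(sketch_bigcache_t *c, size_t size) {
    int i = sketch_class_index(size);
    if (i < 0 || c->slot[i] == NULL) return NULL;
    void *hit = c->slot[i];
    c->slot[i] = NULL;
    return hit;
}
```
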

**Realistic goal**: Reduce the gap vs mimalloc from +88% to +40% (Phase 7), then to +20% (Phase 8). At mimalloc's current median of 19,964 ns, that corresponds to roughly 27,950 ns and 23,957 ns.

**Limit**: hakmem cannot beat mimalloc without a slab allocator (industry standard, 10+ years of optimization)

---

## 📁 Documentation Created

1. **PHASE_6.6_ELO_CONTROL_FLOW_FIX.md** (updated with benchmark results)
2. **OVERHEAD_ANALYSIS_PLAN.md** (Phase 6.7 preparation)
3. **PHASE_6.6_SUMMARY.md** (this file)
4. **GEMINI_BIGCACHE_ANALYSIS.md** (confirmed existing fix)

---

## 🏆 Final Status

**Phase 6.6**: ✅ **COMPLETE**

**Achievements**:
- ✅ ELO control flow bug fixed
- ✅ Batch madvise activation verified
- ✅ Performance parity with Phase 6.4 maintained (+2.6% variance)
- ✅ Comprehensive documentation created
- ✅ Phase 6.7 roadmap prepared

**Code quality**:
- Modified files: 1 (`hakmem.c`)
- Lines changed: ~75 lines (reordering + cleanup)
- Test coverage: VM scenario verified (200 runs)

**Time investment**: ~6 hours (diagnosis + fix + benchmarking + documentation)

---

**Ready for Phase 6.7: Overhead Analysis & Optimization** 🚀