Phase 6.6 Complete Summary

Date: 2025-10-21
Status: COMPLETE


🎯 Goal & Achievement

Goal: Fix the ELO control flow bug that prevented batch madvise activation
Result: Fixed and verified; batch madvise now works correctly


🐛 Problem

After Phase 6.5 (Learning Lifecycle) integration:

  • 2MB allocations were using MALLOC instead of MMAP
  • BigCache eviction called free() instead of hak_batch_add()
  • Batch madvise statistics showed 0 blocks batched (completely inactive)

🔍 Root Cause (Diagnosed by Gemini Pro)

Control flow ordering bug in hakmem.c:hak_alloc_at():

  1. OLD policy decision (infer_policy()) executed FIRST → returned POLICY_DEFAULT
  2. Allocation happened using old policy → alloc_malloc() called
  3. ELO strategy selection executed TOO LATE → results completely ignored
  4. ELO results only used for BigCache eligibility, not allocation method

Key insight: "The right answer computed at the wrong time is the wrong answer"


Fix Applied

Modified: hakmem.c (lines 645-720)

Before (WRONG):

void* hak_alloc_at(size_t size, ...) {
    // 1. Old policy (WRONG!)
    policy = POLICY_DEFAULT;

    // 2. Allocate (TOO EARLY!)
    ptr = allocate_with_policy(size, policy);  // Uses malloc

    // 3. ELO selection (TOO LATE!)
    strategy_id = hak_elo_select_strategy();   // Result not used!
    threshold = hak_elo_get_threshold(strategy_id);

    return ptr;  // Allocated with the old policy; ELO threshold never consulted
}

After (CORRECT):

void* hak_alloc_at(size_t size, ...) {
    // 1. ELO selection FIRST!
    strategy_id = hak_elo_select_strategy();
    threshold = hak_elo_get_threshold(strategy_id);

    // 2. BigCache check
    if (hak_bigcache_try_get(...)) return cached_ptr;

    // 3. Use ELO threshold to decide malloc vs mmap
    ptr = (size >= threshold) ? alloc_mmap(size) : alloc_malloc(size);
    return ptr;
}

Result: 2MB allocations now correctly use mmap, enabling batch madvise.
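The corrected ordering can be sketched as a small standalone unit. This is only an illustration under stated assumptions: the threshold is passed in by the caller rather than selected by ELO, and `alloc_method_t`, `sketch_block_t`, `choose_method`, `sketch_alloc`, and `sketch_free` are hypothetical names, not the actual hakmem.c API.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical names for illustration; not the hakmem API. */
typedef enum { METHOD_MALLOC = 0, METHOD_MMAP = 1 } alloc_method_t;

typedef struct {
    void          *ptr;
    size_t         size;
    alloc_method_t method;  /* remembered so the free path matches the alloc path */
} sketch_block_t;

/* The decision uses a threshold that was chosen BEFORE allocating. */
static alloc_method_t choose_method(size_t size, size_t threshold) {
    return (size >= threshold) ? METHOD_MMAP : METHOD_MALLOC;
}

static sketch_block_t sketch_alloc(size_t size, size_t threshold) {
    sketch_block_t b = { NULL, size, choose_method(size, threshold) };
    if (b.method == METHOD_MMAP) {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        b.ptr = (p == MAP_FAILED) ? NULL : p;
    } else {
        b.ptr = malloc(size);
    }
    return b;
}

static void sketch_free(sketch_block_t b) {
    if (b.ptr == NULL) return;
    if (b.method == METHOD_MMAP) {
        munmap(b.ptr, b.size);
    } else {
        free(b.ptr);
    }
}
```

Recording the method next to the pointer is one way to keep the free path from calling free() on an mmap'd block; whether hakmem stores it this way is not stated here.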


📊 Benchmark Results

Configuration: bench_runner.sh --warmup 2 --runs 10 (200 total runs)

VM Scenario (2MB allocations)

| Allocator | Median (ns) | vs Phase 6.4 | vs mimalloc |
|---|---|---|---|
| mimalloc | 19,964 | +12.6% | baseline |
| jemalloc | 26,241 | -3.0% | +31.4% |
| hakmem-evolving | 37,602 | +2.6% | +88.3% |
| hakmem-baseline | 40,282 | +9.1% | +101.7% |
| system | 59,995 | -4.4% | +200.4% |

Analysis

  1. No regression: +2.6% difference vs Phase 6.4 is within measurement variance
  2. ELO working: hakmem-evolving (37,602 ns) beats hakmem-baseline (40,282 ns) by about 6.7%
  3. Batch madvise active: Verified with debug logging
  4. ⚠️ Overhead gap: Still about 1.9× slower than mimalloc (+88.3%) → Phase 6.7 investigation

Note: README.md claimed "16,125 ns" for Phase 6.4, but FINAL_RESULTS.md shows 36,647 ns (the correct baseline for comparison).
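The hakmem-evolving percentages above follow from a plain relative difference of medians. A minimal sketch, using only the medians quoted in this section (`pct_vs` and `print_vm_comparison` are hypothetical helper names):

```c
#include <assert.h>
#include <stdio.h>

/* Relative difference of `value` against `baseline`, in percent. */
static double pct_vs(double value, double baseline) {
    assert(baseline > 0.0);
    return (value - baseline) / baseline * 100.0;
}

/* Prints the hakmem-evolving comparisons from the medians in this section. */
static void print_vm_comparison(void) {
    const double phase64_ns  = 36647.0;  /* Phase 6.4, from FINAL_RESULTS.md */
    const double mimalloc_ns = 19964.0;
    const double evolving_ns = 37602.0;
    const double baseline_ns = 40282.0;

    printf("evolving vs Phase 6.4:       %+.1f%%\n", pct_vs(evolving_ns, phase64_ns));
    printf("evolving vs mimalloc:        %+.1f%%\n", pct_vs(evolving_ns, mimalloc_ns));
    printf("evolving vs hakmem-baseline: %+.1f%%\n", pct_vs(evolving_ns, baseline_ns));
}
```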


🧪 Verification

Batch Madvise Activation Confirmed

[DEBUG] BigCache eviction: method=1 (MMAP), size=2097152  ✅
[DEBUG] Calling hak_batch_add(raw=0x..., size=2097152)    ✅

Batch Statistics:
  Total blocks added:       1                              ✅
  Flush operations:         1                              ✅
  Total bytes flushed:      2097152                        ✅
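For context, the mechanism being verified can be sketched as follows. This is an assumption-laden illustration, not hakmem's code: it assumes the batch defers `madvise(MADV_DONTNEED)` calls and issues them together on flush, and `sketch_batch_t`, `sketch_batch_add`, and `sketch_batch_flush` are hypothetical names. The counters mirror the three statistics shown above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define SKETCH_BATCH_CAPACITY 64  /* assumed capacity, for illustration only */

typedef struct {
    void  *ptrs[SKETCH_BATCH_CAPACITY];
    size_t sizes[SKETCH_BATCH_CAPACITY];
    size_t count;
    /* Statistics, mirroring the debug output above. */
    size_t total_blocks_added;
    size_t flush_operations;
    size_t total_bytes_flushed;
} sketch_batch_t;

/* Release the pages of every queued block in one pass. */
static void sketch_batch_flush(sketch_batch_t *b) {
    if (b->count == 0) return;
    for (size_t i = 0; i < b->count; i++) {
        /* MADV_DONTNEED lets the kernel reclaim the pages;
         * the mapping itself stays valid. Blocks must be page-aligned. */
        if (madvise(b->ptrs[i], b->sizes[i], MADV_DONTNEED) == 0) {
            b->total_bytes_flushed += b->sizes[i];
        }
    }
    b->count = 0;
    b->flush_operations++;
}

/* Queue an mmap'd block; flush first if the batch is full. */
static void sketch_batch_add(sketch_batch_t *b, void *ptr, size_t size) {
    assert(ptr != NULL);
    if (b->count == SKETCH_BATCH_CAPACITY) sketch_batch_flush(b);
    b->ptrs[b->count]  = ptr;
    b->sizes[b->count] = size;
    b->count++;
    b->total_blocks_added++;
}
```

Only mmap'd blocks can be queued this way, which is why the malloc path in the pre-fix control flow left the batch statistics at zero.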

🎓 Lessons Learned

Design Mistakes

  1. Control flow ordering: Strategy selection must happen BEFORE usage
  2. Dead code accumulation: Old infer_policy() logic left behind
  3. Silent failures: ELO results computed but not used

Detection Challenges

  1. High-level symptoms: "Batch not activating" didn't point to control flow
  2. Required detailed tracing: Had to add debug logging to discover MALLOC usage
  3. Multi-layer architecture: Problem spanned ELO, allocation, BigCache, batch

AI Collaboration Success

  • Gemini Pro: Root cause diagnosis from logs + code analysis
  • Claude: Applied fix, tested, documented
  • Synergy: Gemini saw the forest (control flow), Claude fixed the trees (code)

📝 Bonus Findings

BigCache Size Check Bug (Already Fixed)

Gemini Task 5cfad9 diagnosed a heap-buffer-overflow bug:

  • Problem: BigCache returned undersized blocks because it lacked an actual_bytes >= requested_bytes check
  • Impact: The cold-churn benchmark (varying sizes) triggered a heap buffer overflow
  • Status: Already fixed in a previous session
  • Code: hakmem_bigcache.c:151 has size check with "Segfault fix!" comment
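The kind of check described above can be sketched like this. It is not the code at hakmem_bigcache.c:151; the slot layout and the names `sketch_cache_t`, `sketch_cache_put`, and `sketch_cache_try_get` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHE_SLOTS 8  /* assumed slot count */

typedef struct {
    void  *ptr;
    size_t actual_bytes;  /* real capacity of the cached block */
    int    in_use;        /* nonzero when the slot holds a block */
} sketch_slot_t;

typedef struct {
    sketch_slot_t slots[SKETCH_CACHE_SLOTS];
} sketch_cache_t;

/* Store a freed block; returns 1 on success, 0 if the cache is full. */
static int sketch_cache_put(sketch_cache_t *c, void *ptr, size_t actual_bytes) {
    for (size_t i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        if (!c->slots[i].in_use) {
            c->slots[i].ptr = ptr;
            c->slots[i].actual_bytes = actual_bytes;
            c->slots[i].in_use = 1;
            return 1;
        }
    }
    return 0;
}

/* Return a cached block only if it is large enough for the request. */
static void *sketch_cache_try_get(sketch_cache_t *c, size_t requested_bytes) {
    for (size_t i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        sketch_slot_t *s = &c->slots[i];
        /* The size check: without it, an undersized block could be
         * handed out and the caller would write past its end. */
        if (s->in_use && s->actual_bytes >= requested_bytes) {
            s->in_use = 0;
            return s->ptr;
        }
    }
    return NULL;
}
```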

🚀 Next Steps (Phase 6.7)

1. Overhead Analysis

Goal: Identify why hakmem is roughly 2× slower than mimalloc (+88.3% for hakmem-evolving, +101.7% for hakmem-baseline)

Candidates (from OVERHEAD_ANALYSIS_PLAN.md):

  • P0: BigCache lookup (~50-100 ns)
  • P0: ELO strategy selection (~100-200 ns)
  • P1: mmap/munmap syscalls (~1,000-5,000 ns) ← Main suspect
  • P1: Page faults (~100-500 ns per page)

Strategy:

  1. Feature isolation testing (environment variables)
  2. perf profiling (hotspot identification)
  3. strace syscall counting
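Step 1 (feature isolation) could be wired up with environment-variable toggles along these lines. The variable names `HAKMEM_SKETCH_BIGCACHE`, `HAKMEM_SKETCH_ELO`, and `HAKMEM_SKETCH_BATCH` are hypothetical placeholders, as are `env_flag` and `sketch_load_features`; the real switches may be named and parsed differently.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Read a boolean toggle: unset or empty keeps the default,
 * "0" disables, anything else enables. */
static int env_flag(const char *name, int default_value) {
    const char *v = getenv(name);
    if (v == NULL || v[0] == '\0') return default_value;
    return strcmp(v, "0") != 0;
}

typedef struct {
    int use_bigcache;
    int use_elo;
    int use_batch;
} sketch_features_t;

/* Hypothetical variable names; every feature defaults to enabled. */
static sketch_features_t sketch_load_features(void) {
    sketch_features_t f;
    f.use_bigcache = env_flag("HAKMEM_SKETCH_BIGCACHE", 1);
    f.use_elo      = env_flag("HAKMEM_SKETCH_ELO", 1);
    f.use_batch    = env_flag("HAKMEM_SKETCH_BATCH", 1);
    return f;
}
```

Running the same benchmark with one toggle set to 0 at a time would attribute the median difference to that feature, under the assumption that the features do not interact strongly.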

2. Optimization Ideas

  1. FROZEN mode by default (after learning) → -5% overhead
  2. BigCache direct indexing (instead of linear search) → -5% overhead
  3. Pre-allocated arena (Phase 7+) → -50% overhead target
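Idea 2 (direct indexing) might look like the sketch below, assuming power-of-two size classes from 1 MiB to 8 MiB with one slot per class. The class layout and all names are assumptions, not the BigCache design. Blocks are filed under the largest class they fully cover and requests look in the smallest class that covers them, so a hit is always large enough.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIN_SHIFT   20  /* smallest class: 1 MiB (assumption) */
#define SKETCH_NUM_CLASSES 4   /* 1, 2, 4, 8 MiB */

typedef struct {
    void  *ptr[SKETCH_NUM_CLASSES];    /* NULL means the slot is empty */
    size_t bytes[SKETCH_NUM_CLASSES];  /* actual capacity of the cached block */
} sketch_direct_cache_t;

static size_t class_bytes(int c) {
    return (size_t)1 << (SKETCH_MIN_SHIFT + c);
}

/* Smallest class whose capacity covers the request, or -1 if too large. */
static int class_for_request(size_t requested) {
    for (int c = 0; c < SKETCH_NUM_CLASSES; c++) {
        if (requested <= class_bytes(c)) return c;
    }
    return -1;
}

/* Largest class the block fully covers, or -1 if the block is too small. */
static int class_for_block(size_t actual) {
    for (int c = SKETCH_NUM_CLASSES - 1; c >= 0; c--) {
        if (actual >= class_bytes(c)) return c;
    }
    return -1;
}

/* Returns 1 if the block was cached, 0 if it does not fit or the slot is taken. */
static int sketch_direct_put(sketch_direct_cache_t *dc, void *ptr, size_t actual) {
    int c = class_for_block(actual);
    if (c < 0 || dc->ptr[c] != NULL) return 0;
    dc->ptr[c] = ptr;
    dc->bytes[c] = actual;
    return 1;
}

/* One slot is inspected per lookup; no scan over cached entries. */
static void *sketch_direct_get(sketch_direct_cache_t *dc, size_t requested) {
    int c = class_for_request(requested);
    if (c < 0 || dc->ptr[c] == NULL) return NULL;
    void *p = dc->ptr[c];
    dc->ptr[c] = NULL;
    return p;
}
```

The trade-off in this sketch is that a request never borrows a block from a larger class, so the hit rate may be lower than with a linear search that accepts any sufficiently large block.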

Realistic goal: Reduce gap from +88% to +40% (Phase 7), then +20% (Phase 8)

Limit: Cannot beat mimalloc without a slab allocator (the industry-standard approach, with 10+ years of optimization behind it)


📁 Documentation Created

  1. PHASE_6.6_ELO_CONTROL_FLOW_FIX.md (updated with benchmark results)
  2. OVERHEAD_ANALYSIS_PLAN.md (Phase 6.7 preparation)
  3. PHASE_6.6_SUMMARY.md (this file)
  4. GEMINI_BIGCACHE_ANALYSIS.md (confirmed existing fix)

🏆 Final Status

Phase 6.6: COMPLETE

Achievements:

  • ELO control flow bug fixed
  • Batch madvise activation verified
  • Performance parity with Phase 6.4 maintained (+2.6% variance)
  • Comprehensive documentation created
  • Phase 6.7 roadmap prepared

Code quality:

  • Modified files: 1 (hakmem.c)
  • Lines changed: ~75 lines (reordering + cleanup)
  • Test coverage: VM scenario verified (200 runs)

Time investment: ~6 hours (diagnosis + fix + benchmarking + documentation)


Ready for Phase 6.7: Overhead Analysis & Optimization 🚀