Files

Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History

Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-05 12:31:14 +09:00

6.1 KiB

Raw Blame History

Phase 6.6 Complete Summary

Date: 2025-10-21 Status: ✅ COMPLETE

🎯 Goal & Achievement

Goal: Fix ELO control flow bug that prevented batch madvise activation Result: ✅ Successfully fixed and verified - Batch madvise now working correctly

🐛 Problem

After Phase 6.5 (Learning Lifecycle) integration:

2MB allocations were using MALLOC instead of MMAP
BigCache eviction called free() instead of hak_batch_add()
Batch madvise statistics showed 0 blocks batched (completely inactive)

🔍 Root Cause (Diagnosed by Gemini Pro)

Control flow ordering bug in hakmem.c:hak_alloc_at():

OLD policy decision (infer_policy()) executed FIRST → returned POLICY_DEFAULT
Allocation happened using old policy → alloc_malloc() called
ELO strategy selection executed TOO LATE → results completely ignored
ELO results only used for BigCache eligibility, not allocation method

Key insight: "The right answer computed at the wrong time is the wrong answer"

✅ Fix Applied

Modified: hakmem.c (lines 645-720)

Before (WRONG):

void* hak_alloc_at(size_t size, ...) {
    // 1. Old policy (WRONG!)
    policy = POLICY_DEFAULT;

    // 2. Allocate (TOO EARLY!)
    ptr = allocate_with_policy(size, policy);  // Uses malloc

    // 3. ELO selection (TOO LATE!)
    strategy_id = hak_elo_select_strategy();   // Result not used!
    threshold = hak_elo_get_threshold(strategy_id);
}

After (CORRECT):

void* hak_alloc_at(size_t size, ...) {
    // 1. ELO selection FIRST!
    strategy_id = hak_elo_select_strategy();
    threshold = hak_elo_get_threshold(strategy_id);

    // 2. BigCache check
    if (hak_bigcache_try_get(...)) return cached_ptr;

    // 3. Use ELO threshold to decide malloc vs mmap
    ptr = (size >= threshold) ? alloc_mmap(size) : alloc_malloc(size);
}

Result: 2MB allocations now correctly use mmap, enabling batch madvise.

📊 Benchmark Results

Configuration: bench_runner.sh --warmup 2 --runs 10 (200 total runs)

VM Scenario (2MB allocations)

Allocator	Median (ns)	vs Phase 6.4	vs mimalloc
mimalloc	19,964	+12.6%	baseline
jemalloc	26,241	-3.0%	+31.4%
hakmem-evolving	37,602	+2.6%	+88.3%
hakmem-baseline	40,282	+9.1%	+101.7%
system	59,995	-4.4%	+200.4%

Analysis

✅ No regression: +2.6% difference vs Phase 6.4 is within measurement variance
✅ ELO working: hakmem-evolving beats hakmem-baseline
✅ Batch madvise active: Verified with debug logging
⚠️ Overhead gap: Still 2× slower than mimalloc → Phase 6.7 investigation

Note: README.md claimed "16,125 ns" for Phase 6.4, but FINAL_RESULTS.md shows 36,647 ns (the correct baseline for comparison).

🧪 Verification

Batch Madvise Activation Confirmed

[DEBUG] BigCache eviction: method=1 (MMAP), size=2097152  ✅
[DEBUG] Calling hak_batch_add(raw=0x..., size=2097152)    ✅

Batch Statistics:
  Total blocks added:       1                              ✅
  Flush operations:         1                              ✅
  Total bytes flushed:      2097152                        ✅

🎓 Lessons Learned

Design Mistakes

Control flow ordering: Strategy selection must happen BEFORE usage
Dead code accumulation: Old infer_policy() logic left behind
Silent failures: ELO results computed but not used

Detection Challenges

High-level symptoms: "Batch not activating" didn't point to control flow
Required detailed tracing: Had to add debug logging to discover MALLOC usage
Multi-layer architecture: Problem spanned ELO, allocation, BigCache, batch

AI Collaboration Success

Gemini Pro: Root cause diagnosis from logs + code analysis
Claude: Applied fix, tested, documented
Synergy: Gemini saw the forest (control flow), Claude fixed the trees (code)

📝 Bonus Findings

BigCache Size Check Bug (Already Fixed)

Gemini Task 5cfad9 diagnosed a heap-buffer-overflow bug:

Problem: BigCache returning undersized blocks without actual_bytes >= requested_bytes check
Impact: cold-churn benchmark (varying sizes) triggers buffer overflow
Status: ✅ Already fixed in previous session
Code: hakmem_bigcache.c:151 has size check with "Segfault fix!" comment

🚀 Next Steps (Phase 6.7)

1. Overhead Analysis

Goal: Identify why hakmem is 2× slower than mimalloc

Candidates (from OVERHEAD_ANALYSIS_PLAN.md):

P0: BigCache lookup (~50-100 ns)
P0: ELO strategy selection (~100-200 ns)
P1: mmap/munmap syscalls (~1,000-5,000 ns) ← Main suspect
P1: Page faults (~100-500 ns per page)

Strategy:

Feature isolation testing (environment variables)
perf profiling (hotspot identification)
strace syscall counting

2. Optimization Ideas

FROZEN mode by default (after learning) → -5% overhead
BigCache direct indexing (instead of linear search) → -5% overhead
Pre-allocated arena (Phase 7+) → -50% overhead target

Realistic goal: Reduce gap from +88% to +40% (Phase 7), then +20% (Phase 8)

Limit: Cannot beat mimalloc without slab allocator (industry standard, 10+ years optimization)

📁 Documentation Created

PHASE_6.6_ELO_CONTROL_FLOW_FIX.md (updated with benchmark results)
OVERHEAD_ANALYSIS_PLAN.md (Phase 6.7 preparation)
PHASE_6.6_SUMMARY.md (this file)
GEMINI_BIGCACHE_ANALYSIS.md (confirmed existing fix)

🏆 Final Status

Phase 6.6: ✅ COMPLETE

Achievements:

✅ ELO control flow bug fixed
✅ Batch madvise activation verified
✅ Performance parity with Phase 6.4 maintained (+2.6% variance)
✅ Comprehensive documentation created
✅ Phase 6.7 roadmap prepared

Code quality:

Modified files: 1 (hakmem.c)
Lines changed: ~75 lines (reordering + cleanup)
Test coverage: VM scenario verified (200 runs)

Time investment: ~6 hours (diagnosis + fix + benchmarking + documentation)

Ready for Phase 6.7: Overhead Analysis & Optimization 🚀

6.1 KiB Raw Blame History Unescape Escape