# hakmem Overhead Analysis Plan (Phase 6.7 Preparation)

**Gap**: hakmem-evolving (37,602 ns) vs mimalloc (19,964 ns) = **+88.3%**

---

## 🎯 Overhead Candidates (by Priority)

### P0: Critical Path Overhead

1. **BigCache lookup** (runs on every call)
   - Hash table lookup for site_id
   - Size class matching
   - Slot iteration
   - **Estimated cost**: 50-100 ns

2. **ELO strategy selection** (LEARN mode)
   - `hak_elo_select_strategy()`: softmax calculation
   - Probability computation over 12 strategies
   - Random number generation
   - **Estimated cost**: 100-200 ns

3. **Header read/write**
   - Read/write of AllocHeader (32 bytes)
   - Magic verification
   - **Estimated cost**: 10-20 ns

4. **Atomic tick counter**
   - `atomic_fetch_add(&tick_counter, 1)`
   - Runs on every allocation
   - **Estimated cost**: 5-10 ns

### P1: Syscall Overhead

5. **mmap/munmap**
   - System call overhead
   - TLB flush
   - Page table updates
   - **Estimated cost**: 1,000-5,000 ns (syscall dependent)

6. **Page faults**
   - First touch of mmap'd memory
   - Soft page faults
   - **Estimated cost**: 100-500 ns per page

### P2: Other Overhead

7. **Evolution lifecycle**
   - `hak_evo_tick()` (every 1024 allocs)
   - `hak_evo_record_size()` (every alloc)
   - **Estimated cost**: 5-10 ns

8. **Batch madvise**
   - Batch add/flush overhead
   - **Estimated cost**: Amortized, should be near-zero

---

## 🔬 Measurement Strategy

### Phase 1: Feature Isolation

Test configurations (environment variables):

1. **Baseline**: All features ON (current)
2. **No BigCache**: `HAKMEM_DISABLE_BIGCACHE=1`
3. **No ELO**: `HAKMEM_DISABLE_ELO=1` (use fixed threshold)
4. **Frozen mode**: `HAKMEM_EVO_POLICY=frozen` (skip learning)
5. **Minimal**: BigCache, ELO, and Evolution all OFF

**Expected results** (each delta is measured against Baseline):
- If "No BigCache" → -100 ns: BigCache overhead = 100 ns
- If "No ELO" → -200 ns: ELO overhead = 200 ns
- If "Minimal" → -500 ns: Total feature overhead = 500 ns
- Remaining gap (~17,000 ns) → syscall/page fault overhead

### Phase 2: Profiling

```bash
# Compile with debug symbols; keep frame pointers so `perf record -g`
# can unwind call stacks in optimized code
make clean && make CFLAGS="-g -O2 -fno-omit-frame-pointer"

# Run with perf
perf record -g ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
perf report

# Look for:
# - hak_alloc_at() time breakdown
# - hak_bigcache_try_get() cost
# - hak_elo_select_strategy() cost
# - mmap/munmap syscall time
```

### Phase 3: Syscall Analysis

```bash
# Count syscalls
strace -c ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10

# Compare with mimalloc
strace -c -o hakmem.strace ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10
strace -c -o mimalloc.strace ./bench_allocators --allocator mimalloc --scenario vm --iterations 10
diff hakmem.strace mimalloc.strace
```

---

## 🎯 Expected Findings

**Hypothesis 1: BigCache overhead = 5-10%**
- Hash lookup + slot iteration
- Negligible compared to total gap

**Hypothesis 2: ELO overhead = 5-10%**
- Softmax calculation
- Can be eliminated in FROZEN mode

**Hypothesis 3: mmap/munmap overhead = 60-70%**
- System call overhead
- Page fault overhead
- **This is the main gap**
- Solution: Reduce the number of mmap/munmap calls (BigCache already does this in part)

**Hypothesis 4: Remaining gap = mimalloc's slab allocator**
- mimalloc serves 2MB allocations from a slab allocator
- Pre-allocated, no syscalls
- hakmem calls mmap per allocation (on first miss)
- **Can't compete without a similar architecture**

---

## 💡 Optimization Ideas (Phase 6.7+)

1. **FROZEN mode by default** (after learning)
   - Zero ELO overhead
   - Expected improvement: about 5%

2. **BigCache optimization**
   - Direct indexing instead of linear search
   - Expected improvement: about 5%

3. **Pre-allocated arena** (Phase 7?)
   - mmap a large arena once
   - Suballocate from the arena
   - Avoid per-allocation syscalls
   - Target improvement: about 50%

4. **Header optimization**
   - Reduce AllocHeader size (32 → 16 bytes?)
   - Use bit packing
   - Expected improvement: about 2%

---

## 📊 Success Metrics

**Phase 6.7 Goal**: Identify the top 3 overhead sources

**Phase 7 Goal**: Reduce gap to +40% (vs +88% now)

**Phase 8 Goal**: Reduce gap to +20% (competitive)

**Realistic limit**: Cannot beat mimalloc without a slab allocator
- mimalloc: Industry-standard, 10+ years of optimization
- hakmem: Research PoC, 2 months of development
- **Target: Within 20-30% is acceptable for a PoC**
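
The pre-allocated arena idea (optimization 3) could look roughly like the sketch below: one `mmap` reserves the region up front, and each allocation afterwards is a pointer bump with no syscall. This is only an illustration, not hakmem code. The names `demo_arena`, `demo_arena_init`, `demo_arena_alloc`, and `demo_arena_destroy` are hypothetical, and the sketch assumes a Linux-style anonymous `mmap`.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena type for illustration; not part of hakmem. */
typedef struct {
    uint8_t *base; /* start of the mmap'd region */
    size_t   size; /* total bytes reserved */
    size_t   used; /* bump offset of the next free byte */
} demo_arena;

/* Reserve the whole arena with a single mmap call. Returns 0 on success. */
static int demo_arena_init(demo_arena *a, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Bump-allocate n bytes aligned to `align` (a power of two).
 * No syscall is made here; returns NULL when the arena is exhausted. */
static void *demo_arena_alloc(demo_arena *a, size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Release the whole arena with a single munmap call. */
static void demo_arena_destroy(demo_arena *a) {
    if (a->base) munmap(a->base, a->size);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

The sketch never frees individual blocks, so it only shows where the syscall savings come from. A real arena would still need size classes or free lists to recycle memory, and pages are still faulted in on first touch, so the page-fault cost from P1 remains.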