hakmem Overhead Analysis Plan (Phase 6.7 Preparation)
Gap: hakmem-evolving (37,602 ns) vs mimalloc (19,964 ns) = +88.3%
🎯 Overhead Candidates (in priority order)
P0: Critical Path Overhead
- BigCache lookup (runs on every call)
  - Hash table lookup for site_id
  - Size class matching
  - Slot iteration
  - Estimated cost: 50-100 ns
- ELO strategy selection (LEARN mode)
  - hak_elo_select_strategy(): softmax calculation
  - Probability computation for 12 strategies
  - Random number generation
  - Estimated cost: 100-200 ns
- Header read/write
  - AllocHeader (32 bytes) read/write
  - Magic verification
  - Estimated cost: 10-20 ns
- Atomic tick counter
  - atomic_fetch_add(&tick_counter, 1)
  - Every allocation
  - Estimated cost: 5-10 ns
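The three BigCache steps above (site hash, size match, slot iteration) can be sketched roughly as follows. The struct layout, the table dimensions, and the names `bigcache_t`, `bc_try_get`, and `bc_put` are hypothetical and are not taken from the hakmem source. The sketch compares raw sizes where a real implementation would presumably compare size classes; it is only meant to show where the per-call cost comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dimensions; hakmem's real values are not given in this plan. */
#define BC_SITES 64   /* buckets keyed by call-site id */
#define BC_SLOTS 4    /* cached blocks per bucket */

typedef struct {
    void     *ptr;      /* cached block, NULL if the slot is empty */
    size_t    size;     /* usable size of the cached block */
    uintptr_t site_id;  /* call site that released the block */
} bc_slot_t;

typedef struct {
    bc_slot_t slots[BC_SITES][BC_SLOTS];
} bigcache_t;

/* Step 1: hash the site id into a bucket index (multiplicative hash). */
static size_t bc_hash(uintptr_t site_id) {
    uint64_t h = (uint64_t)(site_id >> 4) * 0x9E3779B97F4A7C15ull;
    return (size_t)(h >> 58) % BC_SITES;   /* top 6 bits give 0..63 */
}

/* Steps 2 and 3: scan the bucket for a block from the same site that is large enough. */
static void *bc_try_get(bigcache_t *bc, uintptr_t site_id, size_t size) {
    bc_slot_t *bucket = bc->slots[bc_hash(site_id)];
    for (int i = 0; i < BC_SLOTS; i++) {
        if (bucket[i].ptr && bucket[i].site_id == site_id && bucket[i].size >= size) {
            void *p = bucket[i].ptr;
            bucket[i].ptr = NULL;          /* hand the block out */
            return p;
        }
    }
    return NULL;                           /* miss: the caller would fall back to mmap */
}

/* Store a released block; returns 1 on success, 0 if the bucket is full. */
static int bc_put(bigcache_t *bc, uintptr_t site_id, void *ptr, size_t size) {
    assert(ptr != NULL);
    bc_slot_t *bucket = bc->slots[bc_hash(site_id)];
    for (int i = 0; i < BC_SLOTS; i++) {
        if (bucket[i].ptr == NULL) {
            bucket[i] = (bc_slot_t){ ptr, size, site_id };
            return 1;
        }
    }
    return 0;
}
```

With this shape, a hit costs one hash plus at most `BC_SLOTS` comparisons, which is the part the 50-100 ns estimate would have to cover.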
P1: Syscall Overhead
- mmap/munmap
  - System call overhead
  - TLB flush
  - Page table updates
  - Estimated cost: 1,000-5,000 ns (syscall dependent)
- Page faults
  - First touch of mmap'd memory
  - Soft page faults
  - Estimated cost: 100-500 ns per page
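To check the syscall and page-fault estimates above in isolation, a small timing helper along these lines could be used. This is a sketch under assumptions: the function name `mmap_cycle_ns` and the use of `CLOCK_MONOTONIC` are choices made here, not part of hakmem or its benchmark harness.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average ns for one mmap + first touch of every page + munmap.
 * Returns -1.0 if mmap fails. */
static double mmap_cycle_ns(size_t size, int iters) {
    assert(size > 0 && iters > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uint64_t start = now_ns();
    for (int i = 0; i < iters; i++) {
        unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return -1.0;
        for (size_t off = 0; off < size; off += page)
            ((volatile unsigned char *)p)[off] = 1;  /* first touch: soft page fault */
        munmap(p, size);
    }
    return (double)(now_ns() - start) / iters;
}
```

Calling `mmap_cycle_ns(2u << 20, 100)` gives a per-cycle figure for a 2MB mapping. Dropping the touch loop would separate the syscall cost from the page-fault cost.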
P2: Other Overhead
- Evolution lifecycle
  - hak_evo_tick() (every 1024 allocs)
  - hak_evo_record_size() (every alloc)
  - Estimated cost: 5-10 ns
- Batch madvise
  - Batch add/flush overhead
  - Estimated cost: Amortized, should be near-zero
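The "every 1024 allocs" cadence combined with the atomic tick counter can be sketched as below. `evo_tick_stub` stands in for `hak_evo_tick()`, whose body is not shown in this plan, and `on_alloc` and `evo_ticks_run` are hypothetical names. The relaxed memory ordering is also an assumption; the plan only shows a plain `atomic_fetch_add`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EVO_TICK_INTERVAL 1024u   /* from the plan: tick every 1024 allocations */
static_assert((EVO_TICK_INTERVAL & (EVO_TICK_INTERVAL - 1)) == 0,
              "interval must be a power of two for the mask test below");

static atomic_uint_fast64_t tick_counter;
static uint64_t evo_ticks_run;    /* hypothetical counter that makes the sketch observable */

/* Stand-in for hak_evo_tick(). */
static void evo_tick_stub(void) { evo_ticks_run++; }

/* Called on every allocation: one atomic increment, plus a tick on every 1024th call. */
static void on_alloc(void) {
    uint64_t n = atomic_fetch_add_explicit(&tick_counter, 1, memory_order_relaxed) + 1;
    if ((n & (EVO_TICK_INTERVAL - 1)) == 0)
        evo_tick_stub();
}
```

The per-allocation cost here is the atomic increment and one masked compare. The tick body itself is amortized over 1024 calls.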
🔬 Measurement Strategy
Phase 1: Feature Isolation
Test configurations (environment variables):
- Baseline: All features ON (current)
- No BigCache: HAKMEM_DISABLE_BIGCACHE=1
- No ELO: HAKMEM_DISABLE_ELO=1 (use fixed threshold)
- Frozen mode: HAKMEM_EVO_POLICY=frozen (skip learning)
- Minimal: BigCache + ELO + Evolution all OFF
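For reference, switches like these could be read once at startup roughly as follows. Only the environment variable names come from this plan. The struct `hak_features_t`, the function names, and the rule that a variable counts as set only when its value is exactly "1" are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature-flag struct; not hakmem's actual configuration type. */
typedef struct {
    int bigcache_enabled;
    int elo_enabled;
    int evo_frozen;
} hak_features_t;

/* True only when the variable exists and equals "1" (assumed convention). */
static int env_is_one(const char *name) {
    assert(name != NULL);
    const char *v = getenv(name);
    return v != NULL && strcmp(v, "1") == 0;
}

/* Read the isolation switches from the environment. */
static hak_features_t hak_features_from_env(void) {
    hak_features_t f;
    const char *policy = getenv("HAKMEM_EVO_POLICY");
    f.bigcache_enabled = !env_is_one("HAKMEM_DISABLE_BIGCACHE");
    f.elo_enabled      = !env_is_one("HAKMEM_DISABLE_ELO");
    f.evo_frozen       = policy != NULL && strcmp(policy, "frozen") == 0;
    return f;
}
```

Reading the flags once and caching the result keeps `getenv` off the allocation path, so the switches do not distort the measurement.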
Expected results:
- If "No BigCache" → -100ns: BigCache overhead = 100ns
- If "No ELO" → -200ns: ELO overhead = 200ns
- If "Minimal" → -500ns: Total feature overhead = 500ns
- Remaining gap (~17,000 ns) → syscall/page fault overhead
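The attribution arithmetic behind these expectations is simple subtraction. The helper below is a sketch with hypothetical names. The 37,102 ns figure in the usage note is an invented input that corresponds to the "Minimal → -500ns" case, not a measured result.

```c
#include <assert.h>

typedef struct {
    double feature_ns;    /* overhead removed by turning features off */
    double remaining_ns;  /* gap still left versus the reference allocator */
} gap_split_t;

/* baseline_ns: all features ON; minimal_ns: all features OFF;
 * reference_ns: the allocator being compared against (mimalloc here). */
static gap_split_t split_gap(double baseline_ns, double minimal_ns, double reference_ns) {
    assert(reference_ns > 0.0);
    gap_split_t g;
    g.feature_ns   = baseline_ns - minimal_ns;
    g.remaining_ns = minimal_ns - reference_ns;
    return g;
}
```

For example, `split_gap(37602, 37102, 19964)` yields 500 ns of feature overhead and 17,138 ns remaining, which is the "~17,000 ns" attributed to syscalls and page faults.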
Phase 2: Profiling
# Compile with debug symbols (keep frame pointers so perf -g can unwind call stacks)
make clean && make CFLAGS="-g -O2 -fno-omit-frame-pointer"
# Run with perf
perf record -g ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
perf report
# Look for:
- hak_alloc_at() time breakdown
- hak_bigcache_try_get() cost
- hak_elo_select_strategy() cost
- mmap/munmap syscall time
Phase 3: Syscall Analysis
# Count syscalls
strace -c ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10
# Compare with mimalloc
strace -c -o hakmem.strace ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10
strace -c -o mimalloc.strace ./bench_allocators --allocator mimalloc --scenario vm --iterations 10
diff hakmem.strace mimalloc.strace
🎯 Expected Findings
Hypothesis 1: BigCache overhead = 5-10%
- Hash lookup + slot iteration
- Negligible compared to total gap
Hypothesis 2: ELO overhead = 5-10%
- Softmax calculation
- Can be eliminated in FROZEN mode
Hypothesis 3: mmap/munmap overhead = 60-70%
- System call overhead
- Page fault overhead
- This is the main gap
- Solution: Reduce the number of mmap/munmap calls (BigCache already does this in part)
Hypothesis 4: Remaining gap = mimalloc's slab allocator
- mimalloc serves 2MB allocations from a slab allocator
- Its memory is pre-allocated, so no syscalls are needed
- hakmem calls mmap per allocation (on the first miss)
- hakmem cannot compete without a similar architecture
💡 Optimization Ideas (Phase 6.7+)
- FROZEN mode by default (after learning)
  - Zero ELO overhead
  - -5% improvement
- BigCache optimization
  - Direct indexing instead of linear search
  - -5% improvement
- Pre-allocated arena (Phase 7?)
  - mmap large arena once
  - Suballocate from arena
  - Avoid per-allocation syscalls
  - Target: -50% improvement
- Header optimization
  - Reduce AllocHeader size (32 → 16 bytes?)
  - Use bit packing
  - -2% improvement
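The pre-allocated arena idea (mmap once, then suballocate) can be sketched as a minimal bump allocator. This is an illustration under assumptions, not a proposed hakmem design: `arena_t` and its functions are hypothetical names, and the sketch has no per-block free, which a real Phase 7 arena would need.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef struct {
    unsigned char *base;  /* start of the single mmap'd region */
    size_t cap;           /* total bytes reserved */
    size_t used;          /* bytes handed out so far */
} arena_t;

/* One mmap call up front; returns 0 on success, -1 on failure. */
static int arena_init(arena_t *a, size_t cap) {
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    a->base = p;
    a->cap = cap;
    a->used = 0;
    return 0;
}

/* Bump-allocate size bytes aligned to align (a power of two).
 * No syscall is made; returns NULL when the arena is exhausted. */
static void *arena_alloc(arena_t *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Release the whole arena with a single munmap. */
static void arena_destroy(arena_t *a) {
    munmap(a->base, a->cap);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

Each `arena_alloc` is a few arithmetic operations, so the mmap/munmap cost is paid once per arena instead of once per allocation. Page faults on first touch still occur.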
📊 Success Metrics
- Phase 6.7 Goal: Identify top 3 overhead sources
- Phase 7 Goal: Reduce gap to +40% (vs +88% now)
- Phase 8 Goal: Reduce gap to +20% (competitive)
Realistic limit: Cannot beat mimalloc without slab allocator
- mimalloc: Industry-standard, 10+ years of optimization
- hakmem: Research PoC, 2 months of development
- Target: Within 20-30% is acceptable for PoC
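The percentage goals translate into absolute latencies as follows. The helper names are hypothetical; the inputs are the two figures quoted at the top of this plan.

```c
#include <assert.h>

/* Gap of a measurement over the reference, in percent. */
static double gap_percent(double ns, double reference_ns) {
    assert(reference_ns > 0.0);
    return (ns / reference_ns - 1.0) * 100.0;
}

/* Latency that corresponds to a given gap over the reference. */
static double target_ns(double reference_ns, double gap_pct) {
    return reference_ns * (1.0 + gap_pct / 100.0);
}
```

With mimalloc at 19,964 ns, `gap_percent(37602, 19964)` is about 88.3, `target_ns(19964, 40)` is about 27,950 ns (Phase 7), and `target_ns(19964, 20)` is about 23,957 ns (Phase 8).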