# hakmem Overhead Analysis Plan (Phase 6.7 Preparation)

**Gap**: hakmem-evolving (37,602 ns) vs mimalloc (19,964 ns) = **+88.3%**

---

## 🎯 Overhead Candidates (by priority)

### P0: Critical Path Overhead

1. **BigCache lookup** (runs on every allocation)
   - Hash table lookup for site_id
   - Size class matching
   - Slot iteration
   - **Estimated cost**: 50-100 ns

2. **ELO strategy selection** (LEARN mode)
   - `hak_elo_select_strategy()`: softmax calculation
   - Probability computation across 12 strategies
   - Random number generation
   - **Estimated cost**: 100-200 ns

3. **Header read/write**
   - Read/write of AllocHeader (32 bytes)
   - Magic verification
   - **Estimated cost**: 10-20 ns

4. **Atomic tick counter**
   - `atomic_fetch_add(&tick_counter, 1)`
   - Every allocation
   - **Estimated cost**: 5-10 ns
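
The three BigCache steps in item 1 (hash the site id, match the size class, iterate slots) can be sketched as below. This is only an illustration of where the per-lookup cost comes from: `SiteCache`, `CacheSlot`, `cache_put`, `cache_try_get`, the bucket count, and the slot count are all hypothetical names and sizes, not hakmem's actual data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, not hakmem's real layout. */
#define CACHE_SITES 64   /* number of site buckets (power of two) */
#define CACHE_SLOTS 4    /* slots scanned per bucket */

typedef struct {
    void  *ptr;          /* cached block, NULL when the slot is empty */
    size_t size_class;   /* size class of the cached block */
} CacheSlot;

typedef struct {
    CacheSlot slots[CACHE_SITES][CACHE_SLOTS];
} SiteCache;

/* Step 1: hash the call-site id into a bucket index. */
static size_t site_bucket(uintptr_t site_id)
{
    return (size_t)((site_id >> 4) & (CACHE_SITES - 1));
}

/* Store a freed block; returns 1 on success, 0 if the bucket is full. */
int cache_put(SiteCache *c, uintptr_t site_id, size_t size_class, void *ptr)
{
    assert(ptr != NULL);
    CacheSlot *bucket = c->slots[site_bucket(site_id)];
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (bucket[i].ptr == NULL) {
            bucket[i].ptr = ptr;
            bucket[i].size_class = size_class;
            return 1;
        }
    }
    return 0;
}

/* Steps 2 and 3: scan the bucket's slots for a matching size class. */
void *cache_try_get(SiteCache *c, uintptr_t site_id, size_t size_class)
{
    CacheSlot *bucket = c->slots[site_bucket(site_id)];
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (bucket[i].ptr != NULL && bucket[i].size_class == size_class) {
            void *hit = bucket[i].ptr;
            bucket[i].ptr = NULL;   /* hand ownership to the caller */
            return hit;
        }
    }
    return NULL;  /* miss: caller falls back to a fresh allocation */
}
```

In this sketch a lookup costs one hash plus at most `CACHE_SLOTS` comparisons, so the slot scan is the part that grows if the bucket width is increased.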

### P1: Syscall Overhead

5. **mmap/munmap**
   - System call overhead
   - TLB flush
   - Page table updates
   - **Estimated cost**: 1,000-5,000 ns (syscall dependent)

6. **Page faults**
   - First touch of mmap'd memory
   - Soft page faults
   - **Estimated cost**: 100-500 ns per page
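
To get a rough, machine-specific number for items 5 and 6 together, one option is to time a single map, first-touch, unmap cycle in isolation. The sketch below assumes Linux/POSIX; `map_touch_unmap` and `now_ns` are hypothetical helpers, not part of hakmem or the benchmark harness.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Map `size` bytes, write one byte per page (forcing a page fault on
 * each), then unmap. Returns the number of pages touched, or 0 on
 * failure. The elapsed time for the whole cycle is stored in
 * *elapsed_ns. */
size_t map_touch_unmap(size_t size, long long *elapsed_ns)
{
    assert(size > 0 && elapsed_ns != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return 0;

    long long start = now_ns();
    volatile unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;

    size_t touched = 0;
    for (size_t off = 0; off < size; off += (size_t)page) {
        p[off] = 1;          /* first touch of this page */
        touched++;
    }
    munmap((void *)p, size);
    *elapsed_ns = now_ns() - start;
    return touched;
}
```

Calling `map_touch_unmap(2u * 1024 * 1024, &ns)` many times and averaging `ns` gives a per-cycle cost that can be compared against the 1,000-5,000 ns and 100-500 ns per page estimates above. A single call is too noisy to draw conclusions from.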

### P2: Other Overhead

7. **Evolution lifecycle**
   - `hak_evo_tick()` (every 1024 allocs)
   - `hak_evo_record_size()` (every alloc)
   - **Estimated cost**: 5-10 ns

8. **Batch madvise**
   - Batch add/flush overhead
   - **Estimated cost**: Amortized, should be near-zero
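
The "every 1024 allocs" pattern in item 7 typically rides on the atomic counter from item 4. The sketch below shows one way to do it, assuming a power-of-two interval and relaxed ordering; `tick_is_due` and `TICK_INTERVAL` are hypothetical names, and hakmem's real counter may use a different memory order or layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TICK_INTERVAL 1024u   /* from the plan: tick every 1024 allocs */

static atomic_uint_fast64_t tick_counter;   /* zero-initialized */

/* Called once per allocation. Returns 1 on every TICK_INTERVAL-th call,
 * which is where the periodic work would run, and 0 otherwise. */
int tick_is_due(void)
{
    static_assert((TICK_INTERVAL & (TICK_INTERVAL - 1)) == 0,
                  "interval must be a power of two for the mask below");

    /* Relaxed ordering: only the count matters, not ordering against
     * other memory operations (an assumption of this sketch). */
    uint_fast64_t previous =
        atomic_fetch_add_explicit(&tick_counter, 1, memory_order_relaxed);

    /* Mask instead of modulo: cheap test for "multiple of the interval". */
    return ((previous + 1) & (TICK_INTERVAL - 1)) == 0;
}
```

On the fast path this is one atomic add and one mask test. Under multi-threaded load the shared counter's cache line can bounce between cores, in which case the real cost may exceed the single-threaded estimate.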

---

## 🔬 Measurement Strategy

### Phase 1: Feature Isolation

Test configurations (environment variables):
1. **Baseline**: All features ON (current)
2. **No BigCache**: `HAKMEM_DISABLE_BIGCACHE=1`
3. **No ELO**: `HAKMEM_DISABLE_ELO=1` (use fixed threshold)
4. **Frozen mode**: `HAKMEM_EVO_POLICY=frozen` (skip learning)
5. **Minimal**: BigCache + ELO + Evolution all OFF
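
A minimal sketch of how these switches could be read, assuming the flags are parsed once at startup and cached in a struct. `FeatureFlags` and `read_feature_flags` are hypothetical names; only the three environment variable names come from the list above.

```c
#define _DEFAULT_SOURCE   /* exposes setenv/unsetenv for callers and tests */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the isolation switches. */
typedef struct {
    int bigcache_enabled;   /* 0 when HAKMEM_DISABLE_BIGCACHE=1 */
    int elo_enabled;        /* 0 when HAKMEM_DISABLE_ELO=1 */
    int evo_frozen;         /* 1 when HAKMEM_EVO_POLICY=frozen */
} FeatureFlags;

/* True only when the variable is set to exactly "1". */
static int env_is_one(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL && strcmp(value, "1") == 0;
}

/* Read the environment once; the result should be cached by the caller
 * so that getenv() never runs on the allocation fast path. */
FeatureFlags read_feature_flags(void)
{
    FeatureFlags flags;
    const char *policy = getenv("HAKMEM_EVO_POLICY");

    flags.bigcache_enabled = !env_is_one("HAKMEM_DISABLE_BIGCACHE");
    flags.elo_enabled      = !env_is_one("HAKMEM_DISABLE_ELO");
    flags.evo_frozen       = policy != NULL && strcmp(policy, "frozen") == 0;
    return flags;
}
```

Reading the flags once matters for the measurement itself: a `getenv()` call per allocation would add overhead of its own and distort the comparison between configurations.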

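**Expected results** (each delta is measured against Baseline):
- If "No BigCache" is 100 ns faster, BigCache overhead is about 100 ns
- If "No ELO" is 200 ns faster, ELO overhead is about 200 ns
- If "Minimal" is 500 ns faster, total feature overhead is about 500 ns
- The remaining gap (~17,000 ns of the 17,638 ns total) would then be syscall/page fault overhead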
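
The attribution arithmetic can be written down explicitly. The sketch below splits the gap into a feature part and a residual part from three measured latencies; `GapSplit` and `split_gap` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical result type: how the gap to the reference allocator
 * divides between switchable features and everything else. */
typedef struct {
    long total_gap_ns;   /* baseline - reference */
    long feature_ns;     /* baseline - minimal: cost of the features */
    long residual_ns;    /* minimal - reference: syscalls, faults, etc. */
} GapSplit;

/* baseline_ns:  all features ON
 * minimal_ns:   BigCache, ELO, and Evolution all OFF
 * reference_ns: the allocator being compared against (mimalloc here) */
GapSplit split_gap(long baseline_ns, long minimal_ns, long reference_ns)
{
    assert(reference_ns > 0);
    GapSplit split;
    split.total_gap_ns = baseline_ns - reference_ns;
    split.feature_ns   = baseline_ns - minimal_ns;
    split.residual_ns  = minimal_ns - reference_ns;
    return split;
}
```

With the measured 37,602 ns and 19,964 ns, and a Minimal run that comes in 500 ns faster (37,102 ns, a hypothetical outcome), `split_gap(37602, 37102, 19964)` yields a total gap of 17,638 ns, 500 ns of feature overhead, and a 17,138 ns residual.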

### Phase 2: Profiling

```bash
# Compile with debug symbols; keep frame pointers so `perf record -g` can unwind call stacks
make clean && make CFLAGS="-g -O2 -fno-omit-frame-pointer"

# Run with perf
perf record -g ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
perf report

# Look for:
#   - hak_alloc_at() time breakdown
#   - hak_bigcache_try_get() cost
#   - hak_elo_select_strategy() cost
#   - mmap/munmap syscall time
```

### Phase 3: Syscall Analysis

```bash
# Count syscalls
strace -c ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10

# Compare with mimalloc
strace -c -o hakmem.strace ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10
strace -c -o mimalloc.strace ./bench_allocators --allocator mimalloc --scenario vm --iterations 10

diff hakmem.strace mimalloc.strace
```

---

## 🎯 Expected Findings

**Hypothesis 1: BigCache overhead = 5-10% of the gap**
- Hash lookup + slot iteration
- Minor compared to the total gap

**Hypothesis 2: ELO overhead = 5-10% of the gap**
- Softmax calculation
- Can be eliminated in FROZEN mode

**Hypothesis 3: mmap/munmap overhead = 60-70% of the gap**
- System call overhead
- Page fault overhead
- **This is the main contributor to the gap**
- Solution: Reduce the number of mmap/munmap calls (BigCache already does this on cache hits)

**Hypothesis 4: Remaining gap = mimalloc's slab allocator**
- mimalloc uses a slab allocator for 2MB
- Memory is pre-allocated, so no syscalls are needed
- hakmem calls mmap per allocation (on the first miss)
- **Can't compete without a similar architecture**

---

## 💡 Optimization Ideas (Phase 6.7+)

1. **FROZEN mode by default** (after learning)
   - Zero ELO overhead
   - Expected gain: about 5%

2. **BigCache optimization**
   - Direct indexing instead of linear search
   - Expected gain: about 5%

3. **Pre-allocated arena** (Phase 7?)
   - mmap large arena once
   - Suballocate from arena
   - Avoid per-allocation syscalls
   - Target gain: about 50%

4. **Header optimization**
   - Reduce AllocHeader size (32 → 16 bytes?)
   - Use bit packing
   - Expected gain: about 2%
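
Idea 3 (pre-allocated arena) can be sketched as a bump allocator over a single `mmap` region. This is a minimal illustration under stated assumptions, not a proposed design: `Arena`, `arena_init`, `arena_alloc`, and `arena_destroy` are hypothetical names, there is no free or reuse of individual blocks, and thread safety is ignored.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical arena: one large mapping, carved up by a bump offset. */
typedef struct {
    unsigned char *base;      /* start of the mapping */
    size_t         capacity;  /* total bytes mapped */
    size_t         used;      /* bytes handed out so far */
} Arena;

/* One mmap call up front. Returns 1 on success, 0 on failure. */
int arena_init(Arena *arena, size_t capacity)
{
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;
    arena->base = p;
    arena->capacity = capacity;
    arena->used = 0;
    return 1;
}

/* Suballocate without any syscall. `align` must be a power of two and
 * no larger than the page size (the base is only page-aligned). */
void *arena_alloc(Arena *arena, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (arena->used + align - 1) & ~(align - 1);
    if (start > arena->capacity || size > arena->capacity - start)
        return NULL;                     /* arena exhausted */
    arena->used = start + size;
    return arena->base + start;
}

/* One munmap call releases everything at once. */
void arena_destroy(Arena *arena)
{
    munmap(arena->base, arena->capacity);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

The syscall count drops from one `mmap`/`munmap` pair per allocation to one pair per arena. First-touch page faults still occur the first time each page is written, so this addresses item 5 of the overhead list more directly than item 6.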

---

## 📊 Success Metrics

- **Phase 6.7 Goal**: Identify top 3 overhead sources
- **Phase 7 Goal**: Reduce gap to +40% (vs +88% now)
- **Phase 8 Goal**: Reduce gap to +20% (competitive)

**Realistic limit**: Cannot beat mimalloc without a slab allocator
- mimalloc: Industry-standard, under continuous optimization since its 2019 release
- hakmem: Research PoC, 2 months of development
- **Target: Within 20-30% is acceptable for a PoC**