# hakmem Overhead Analysis Plan (Phase 6.7 Preparation)

**Gap**: hakmem-evolving (37,602 ns) vs mimalloc (19,964 ns) = **+88.3%**

---
## 🎯 Overhead Candidates (by priority)
### P0: Critical Path Overhead
1. **BigCache lookup** (runs on every allocation)
- Hash table lookup for site_id
- Size class matching
- Slot iteration
- **Estimated cost**: 50-100 ns
2. **ELO strategy selection** (LEARN mode)
- `hak_elo_select_strategy()`: softmax calculation
- Probability computation across 12 strategies
- Random number generation
- **Estimated cost**: 100-200 ns
3. **Header read/write**
- Read/write of the AllocHeader (32 bytes)
- Magic verification
- **Estimated cost**: 10-20 ns
4. **Atomic tick counter**
- `atomic_fetch_add(&tick_counter, 1)`
- Every allocation
- **Estimated cost**: 5-10 ns
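
The two cheapest P0 items (header write plus magic verification, and the atomic tick counter) can be sketched as below. This is a minimal illustration, not hakmem's actual code: the `SketchHeader` field layout, the `SKETCH_MAGIC` value, and every `sketch_*` name are assumptions. Only the 32-byte header size and the `atomic_fetch_add` on a tick counter come from the list above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAGIC 0x48414B4Du /* hypothetical value ("HAKM" in ASCII) */

/* Hypothetical 32-byte layout; the real AllocHeader fields may differ. */
typedef struct {
    uint32_t magic;
    uint32_t flags;
    uint64_t size;
    uint64_t site_id;
    uint64_t reserved;
} SketchHeader;

static_assert(sizeof(SketchHeader) == 32, "header is expected to be 32 bytes");

atomic_uint_fast64_t sketch_tick_counter; /* zero-initialized global */

/* Write the header at the start of `raw`, bump the tick counter,
 * and return the user pointer just past the header. */
void *sketch_header_init(void *raw, size_t size, uint64_t site_id)
{
    SketchHeader *h = raw;
    h->magic = SKETCH_MAGIC;
    h->flags = 0;
    h->size = size;
    h->site_id = site_id;
    h->reserved = 0;
    atomic_fetch_add(&sketch_tick_counter, 1);
    return h + 1;
}

/* Step back from the user pointer to the header and verify the magic. */
int sketch_header_valid(const void *user_ptr)
{
    const SketchHeader *h = (const SketchHeader *)user_ptr - 1;
    return h->magic == SKETCH_MAGIC;
}
```

Both operations touch a single cache line and perform one atomic read-modify-write, which is consistent with the small per-allocation estimates above.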
### P1: Syscall Overhead
5. **mmap/munmap**
- System call overhead
- TLB flush
- Page table updates
- **Estimated cost**: 1,000-5,000 ns (syscall dependent)
6. **Page faults**
- First touch of mmap'd memory
- Soft page faults
- **Estimated cost**: 100-500 ns per page
### P2: Other Overhead
7. **Evolution lifecycle**
- `hak_evo_tick()` (every 1024 allocs)
- `hak_evo_record_size()` (every alloc)
- **Estimated cost**: 5-10 ns
8. **Batch madvise**
- Batch add/flush overhead
- **Estimated cost**: Amortized, should be near-zero
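
The "every 1024 allocs" gating of the evolution tick can be sketched as follows, to show why its amortized cost should stay in the low nanoseconds. This is an assumption-laden illustration: `sketch_evo_tick()` is a hypothetical stand-in for `hak_evo_tick()`, and the counter names are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EVO_TICK_INTERVAL 1024u /* from the plan: tick every 1024 allocs */

/* The mask test below only works for power-of-two intervals. */
static_assert((EVO_TICK_INTERVAL & (EVO_TICK_INTERVAL - 1)) == 0,
              "interval must be a power of two");

atomic_uint_fast64_t sketch_alloc_count; /* zero-initialized global */
uint64_t sketch_evo_ticks_run;           /* counts slow-path calls, for illustration */

/* Hypothetical stand-in for hak_evo_tick(). */
void sketch_evo_tick(void)
{
    sketch_evo_ticks_run++;
}

/* Fast path on every allocation: one atomic add and one mask test.
 * The slow path runs only once per EVO_TICK_INTERVAL allocations. */
void sketch_on_alloc(void)
{
    uint64_t n = atomic_fetch_add(&sketch_alloc_count, 1) + 1;
    if ((n & (EVO_TICK_INTERVAL - 1)) == 0)
        sketch_evo_tick();
}
```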
---
## 🔬 Measurement Strategy
### Phase 1: Feature Isolation
Test configurations (environment variables):
1. **Baseline**: All features ON (current)
2. **No BigCache**: `HAKMEM_DISABLE_BIGCACHE=1`
3. **No ELO**: `HAKMEM_DISABLE_ELO=1` (use fixed threshold)
4. **Frozen mode**: `HAKMEM_EVO_POLICY=frozen` (skip learning)
5. **Minimal**: BigCache, ELO, and Evolution all OFF
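**Expected results** (each delta is measured against Baseline):
- If "No BigCache" is 100 ns faster: BigCache overhead = 100 ns
- If "No ELO" is 200 ns faster: ELO overhead = 200 ns
- If "Minimal" is 500 ns faster: Total feature overhead = 500 ns
- The remaining gap (~17,000 ns of the 17,638 ns total) is then attributable to syscall/page fault overhead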
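
The isolation logic above can be sketched in two small helpers: one that interprets a disable flag such as `HAKMEM_DISABLE_BIGCACHE`, and one that attributes overhead by subtraction. The flag semantics (unset, empty, or `0` means "feature stays on") and all `sketch_*` names are assumptions for illustration; hakmem's actual parsing may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed semantics: a flag is "on" when set to anything except "" or "0". */
int sketch_flag_value(const char *v)
{
    return v != NULL && v[0] != '\0' && strcmp(v, "0") != 0;
}

/* Read a disable flag such as HAKMEM_DISABLE_BIGCACHE from the environment. */
int sketch_env_flag(const char *name)
{
    return sketch_flag_value(getenv(name));
}

/* Overhead attributed to a feature: baseline latency minus the latency
 * measured with that feature disabled (both in ns). */
double sketch_feature_overhead_ns(double baseline_ns, double disabled_ns)
{
    return baseline_ns - disabled_ns;
}

/* Gap left unexplained after removing the total feature overhead. */
double sketch_remaining_gap_ns(double ours_ns, double ref_ns,
                               double feature_overhead_ns)
{
    return (ours_ns - ref_ns) - feature_overhead_ns;
}
```

With the numbers from this plan, a 500 ns total feature overhead leaves 37,602 - 19,964 - 500 = 17,138 ns unexplained, which is the "~17,000 ns" figure above. Reading the flag once at startup and caching the result keeps `getenv` off the allocation path.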
### Phase 2: Profiling
```bash
# Compile with debug symbols; keep frame pointers so `perf record -g` can unwind call stacks
make clean && make CFLAGS="-g -O2 -fno-omit-frame-pointer"
# Run with perf
perf record -g ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
perf report
# Look for:
#   - hak_alloc_at() time breakdown
#   - hak_bigcache_try_get() cost
#   - hak_elo_select_strategy() cost
#   - mmap/munmap syscall time
```
### Phase 3: Syscall Analysis
```bash
# Count syscalls
strace -c ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10
# Compare with mimalloc
strace -c -o hakmem.strace ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 10
strace -c -o mimalloc.strace ./bench_allocators --allocator mimalloc --scenario vm --iterations 10
diff hakmem.strace mimalloc.strace
```
---
## 🎯 Expected Findings
**Hypothesis 1: BigCache overhead = 5-10%**
- Hash lookup + slot iteration
- Negligible compared to total gap
**Hypothesis 2: ELO overhead = 5-10%**
- Softmax calculation
- Can be eliminated in FROZEN mode
**Hypothesis 3: mmap/munmap overhead = 60-70%**
- System call overhead
- Page fault overhead
- **This is the main gap**
- Solution: Reduce the number of mmap/munmap calls (BigCache already does this for cache hits)
**Hypothesis 4: Remaining gap = mimalloc's slab allocator**
- mimalloc uses a slab allocator for 2MB allocations
- Pre-allocated, no syscalls
- hakmem calls mmap per allocation (on a first-time miss)
- **Can't compete without a similar architecture**
---
## 💡 Optimization Ideas (Phase 6.7+)
1. **FROZEN mode by default** (after learning)
- Zero ELO overhead
- Estimated improvement: ~5%
2. **BigCache optimization**
- Direct indexing instead of linear search
- Estimated improvement: ~5%
3. **Pre-allocated arena** (Phase 7?)
- mmap large arena once
- Suballocate from arena
- Avoid per-allocation syscalls
- Target improvement: ~50%
4. **Header optimization**
- Reduce AllocHeader size (32 → 16 bytes?)
- Use bit packing
- Estimated improvement: ~2%
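
The pre-allocated arena idea (item 3) can be sketched as a bump allocator over a single anonymous mapping. This is a minimal, single-threaded illustration under stated assumptions: the `SketchArena` type, the `sketch_arena_*` names, and the 16-byte alignment are hypothetical, and freeing individual blocks is deliberately left out. It only shows how one `mmap` call can serve many allocations without further syscalls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std=c11 on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena type; not part of hakmem. */
typedef struct {
    uint8_t *base;     /* start of the mapping */
    size_t   capacity; /* total bytes mapped */
    size_t   used;     /* bytes handed out so far */
} SketchArena;

/* Map the whole arena once. Returns 0 on success, -1 on failure. */
int sketch_arena_init(SketchArena *a, size_t capacity)
{
    assert(a != NULL && capacity > 0);
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->capacity = capacity;
    a->used = 0;
    return 0;
}

/* Bump-allocate `size` bytes rounded up to 16; no syscall on this path.
 * Returns NULL when the arena is exhausted or the rounding overflows. */
void *sketch_arena_alloc(SketchArena *a, size_t size)
{
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned < size || aligned > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Release the whole arena with a single munmap. */
void sketch_arena_destroy(SketchArena *a)
{
    munmap(a->base, a->capacity);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```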
---
## 📊 Success Metrics
**Phase 6.7 Goal**: Identify top 3 overhead sources
**Phase 7 Goal**: Reduce gap to +40% (vs +88% now)
**Phase 8 Goal**: Reduce gap to +20% (competitive)
**Realistic limit**: Cannot beat mimalloc without a slab allocator
- mimalloc: Industry-standard, years of optimization since its 2019 release
- hakmem: Research PoC, 2 months of development
- **Target: Within 20-30% is acceptable for a PoC**
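
To make the percentage goals concrete, the helpers below convert between a gap percentage and an absolute latency target. They are a small worked-arithmetic sketch with hypothetical function names; the only inputs taken from this plan are the two measured latencies.

```c
#include <assert.h>

/* Gap of `ours_ns` relative to `ref_ns`, in percent. */
double sketch_gap_percent(double ours_ns, double ref_ns)
{
    assert(ref_ns > 0.0);
    return (ours_ns - ref_ns) / ref_ns * 100.0;
}

/* Latency that corresponds to a given gap percentage over `ref_ns`. */
double sketch_target_ns(double ref_ns, double gap_percent)
{
    return ref_ns * (1.0 + gap_percent / 100.0);
}
```

With mimalloc at 19,964 ns, the current 37,602 ns is a gap of about +88.3%. The Phase 7 goal (+40%) corresponds to roughly 27,950 ns and the Phase 8 goal (+20%) to roughly 23,957 ns.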