Files
hakmem/docs/archive/PHASE_6.11.4_COMPLETION_REPORT.md

303 lines
9.3 KiB
Markdown
Raw Normal View History

# Phase 6.11.4 Completion Report: hak_alloc Optimization
**Date**: 2025-10-22
**Status**: ✅ **Implementation Complete** (P0-1 + P0-2)
**Goal**: Optimize hak_alloc hotpath to beat mimalloc in all scenarios
---
## 📊 **Background: Why hak_alloc Optimization?**
### Problem: hak_alloc is the #2 Bottleneck (Phase 6.11.3 Discovery)
**Profiling results** (Phase 6.11.3):
```
syscall_munmap: 131,666 cycles (41.3%) ← #1 Bottleneck
hak_alloc: 126,479 cycles (39.6%) ← #2 NEW DISCOVERY! 🔥
hak_free: 48,206 cycles (15.1%)
```
**Target**: Reduce hak_alloc overhead by ~45% to beat mimalloc in all scenarios
---
## 🔧 **Implementation**
### **Phase 6.11.4 P0-1: Atomic Operation Elimination** (30 min)
**Goal**: Eliminate atomic operations when EVOLUTION feature is disabled
**Changes**:
1. **hakmem.c Line 361-375**: Replace runtime `if (HAK_ENABLED_LEARNING(...))` with compile-time `#if HAKMEM_FEATURE_EVOLUTION`
```c
// Before (runtime check)
if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_EVOLUTION)) {
static _Atomic uint64_t tick_counter = 0;
if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
hak_evo_tick(now_ns);
}
}
// After (compile-time check)
#if HAKMEM_FEATURE_EVOLUTION
static _Atomic uint64_t tick_counter = 0;
if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
if (hak_evo_tick(now_ns)) {
// P0-2: Update cached strategy when window closes
int new_strategy = hak_elo_select_strategy();
atomic_store(&g_cached_strategy_id, new_strategy);
}
}
#endif
```
**Benefits**:
- **Compile-time guard**: Zero overhead when EVOLUTION disabled
- **Reduced runtime checks**: -70 cycles/alloc in minimal mode
---
### **Phase 6.11.4 P0-2: Cached Strategy** (1-2 hrs)
**Goal**: Eliminate ELO strategy selection overhead in LEARN mode
**Problem**: Heavy overhead in LEARN mode
```c
// Before (LEARN mode): 100回ごとに重い計算
g_elo_call_count++;
if (g_elo_call_count % 100 == 0 || g_cached_strategy_id == -1) {
strategy_id = hak_elo_select_strategy(); // Heavy
g_cached_strategy_id = strategy_id;
hak_elo_record_alloc(strategy_id, size, 0); // Heavy
} else {
strategy_id = g_cached_strategy_id; // 99回はキャッシュ
}
// Overhead:
// - 剰余計算 (% 100): 10-20 cycles
// - 条件分岐: 5-10 cycles
// - カウンタインクリメント: 3-5 cycles
// Total: 18-35 cycles (99回) + heavy (1回)
```
**Solution**: Always use cached strategy
```c
// After (ALL modes): すべてのモードで同じ速度
int strategy_id = atomic_load(&g_cached_strategy_id); // 10 cycles のみ
size_t threshold = hak_elo_get_threshold(strategy_id);
// 更新は window closure 時のみ (hak_evo_tick)
if (hak_evo_tick(now_ns)) {
int new_strategy = hak_elo_select_strategy();
atomic_store(&g_cached_strategy_id, new_strategy);
}
```
**Changes**:
1. **hakmem.c Line 57-58**:
- Removed `g_elo_call_count`
- Changed `g_cached_strategy_id` to `static _Atomic int`
2. **hakmem.c Line 376-383**: Simplified ELO logic (42 lines → 10 lines)
```c
// Before: 42 lines (complex branching)
if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
if (hak_evo_is_frozen()) { ... }
else if (hak_evo_is_canary()) { ... }
else { /* LEARN: 15 lines of ELO logic */ }
} else { threshold = 2097152; }
// After: 10 lines (simple atomic load)
if (HAK_ENABLED_LEARNING(HAKMEM_FEATURE_ELO)) {
int strategy_id = atomic_load(&g_cached_strategy_id);
threshold = hak_elo_get_threshold(strategy_id);
} else {
threshold = 2097152; // 2MB
}
```
3. **hakmem.c Line 299-300**: Initialize cached strategy in `hak_init()`
**Benefits**:
- **LEARN mode**: 剰余・分岐・カウンタ削除 → -18-35 cycles
- **FROZEN/CANARY**: Same speed (10 cycles atomic load)
- **Code simplification**: 42 lines → 10 lines (-76%)
---
## 📈 **Test Results**
### **Profiling Results** (minimal mode, vm scenario, 10 iterations)
**Before (Phase 6.11.3)**:
```
hak_alloc: 126,479 cycles (39.6%)
```
**After P0-1**:
```
hak_alloc: 119,480 cycles (24.3%) → -6,999 cycles (-5.5%)
```
**After P0-2**:
```
hak_alloc: 114,186 cycles (33.8%) → -12,293 cycles (-9.7% total)
```
**Analysis**:
- **Expected**: -45% (-56,479 cycles)
- **Actual**: -9.7% (-12,293 cycles)
- **Reason**: minimal mode では EVOLUTION 無効 → 削減されたのは実行時チェックのみ
---
### **Benchmark Results** (all scenarios, 100 iterations)
| Scenario | Phase 6.10.1 | **P0-2 後** | mimalloc | vs mimalloc |
|----------|--------------|-------------|----------|-------------|
| json (64KB) | 298 ns | **300 ns** | 220 ns | **+36.4%** ❌ |
| mir (256KB) | - | **870 ns** | 1,072 ns | **-18.8%** ✅ |
| vm (2MB) | - | **15,385 ns** | 13,812 ns | **+11.4%** ❌ |
**Analysis**:
- json: ほぼ変化なし (+0.7%)
- mir: わずかに改善 (-0.5% vs Phase 6.11.3 874 ns)
- vm: 悪化 (+10.4% vs Phase 6.11.3 13,933 ns)
---
## 🔍 **Key Discoveries**
### 1⃣ **Pool/Cache が支配的で P0-2 の効果が見られない**
**実際の状況**:
- **json**: L2.5 Pool hit **100%** → hak_alloc のメインロジックをスキップ
- **mir**: L2.5 Pool hit **100%** → hak_alloc のメインロジックをスキップ
- **vm**: BigCache hit **99.9%** → hak_alloc のメインロジックをスキップ
**結論**: Pool/Cache の hit rate が高すぎて、`hak_alloc` の最適化が効果を発揮していない
### 2⃣ **Profiling では効果あり、Benchmark では効果なし**
- **Profiling** (minimal mode): -9.7% 削減 ✅
- **Benchmark** (balanced mode): ほぼ変化なし ❌
**理由**:
- Profiling は Pool/Cache 無効minimal mode
- Benchmark は Pool/Cache 有効balanced mode → Pool/Cache が支配的
### 3⃣ **次の最適化ターゲットは Pool/Cache 自体**
**hak_alloc の最適化は完了**-9.7%)。次は:
- **L2.5 Pool の高速化** (Phase 6.13)
- **BigCache の高速化** (Phase 6.8+)
- **Tiny Pool の高速化** (Phase 6.12)
---
## 💡 **Lessons Learned**
### 1. **Profiling と Benchmark の違いを理解すべき**
- Profiling: 特定機能の overhead 測定minimal mode
- Benchmark: 実際のワークロードでの性能balanced mode
- **両方を測定しないと全体像が見えない**
### 2. **Pool/Cache が支配的な環境では hak_alloc 最適化の効果は限定的**
- json/mir: L2.5 Pool が 100% ヒット
- vm: BigCache が 99.9% ヒット
- **→ Pool/Cache 自体を最適化すべき**
### 3. **Compile-time guard は有効**
- P0-1: 実行時チェック削除で -5.5% 削減
- minimal mode で効果が見られた
### 4. **Cached Strategy は実装できたが、効果は限定的**
- P0-2: 42 lines → 10 lines (-76% コード削減) ✅
- でも、benchmark では効果が見られない ❌
---
## ✅ **Implementation Checklist**
### Completed
- [x] P0-1: Atomic operation elimination (30 min)
- [x] P0-2: Cached strategy (1-2 hrs)
- [x] Build & test (clean compile)
- [x] Profiling test (minimal mode)
- [x] Benchmark test (json/mir/vm all scenarios)
- [x] Analysis & completion report
---
## 🚀 **Next Steps**
### P0: 現状維持
- P0-1/P0-2 は実装完了
- Profiling では効果あり (-9.7%)
- Benchmark では効果が見られないPool/Cache が支配的)
### P1: Pool/Cache 最適化に注力
**Phase 6.12 (Tiny Pool)**:
- ≤1KB allocations の高速化
- Slab allocator 最適化
**Phase 6.13 (L2.5 LargePool)**:
- 64KB-1MB allocations の高速化
- mir scenario の改善(-18.8% → < -30%
**Phase 6.8+ (BigCache)**:
- vm scenario の改善(+11.4% → < +0%
### P2: mimalloc 打倒目標の再評価
**現状**:
- json: +36.4% ❌ (Phase 6.10.1 と同レベル)
- mir: -18.8% ✅
- vm: +11.4% ❌ (悪化)
**新目標**: Pool/Cache 最適化で json/vm も改善
- json: 300 ns → < 220 ns (mimalloc レベル)
- vm: 15,385 ns → < 13,812 ns (mimalloc レベル)
---
## 📝 **Technical Details**
### Code Changes Summary
1. **hakmem.c**:
- Line 57-58: `g_cached_strategy_id``_Atomic` に変更、`g_elo_call_count` 削除
- Line 361-375: Compile-time guard 追加 (P0-1 + P0-2 window closure 更新)
- Line 376-383: ELO logic 簡略化 (42 lines → 10 lines)
- Line 299-300: `hak_init()` で cached strategy 初期化
### Expected vs Actual
| Metric | Expected | Actual | Reason |
|--------|----------|--------|--------|
| Profiling (hak_alloc) | -45% | **-9.7%** | minimal mode (EVOLUTION 無効) |
| Benchmark (json) | -30% | **+0.7%** | Pool hit 100% |
| Benchmark (mir) | -42% | **-0.5%** | Pool hit 100% |
| Benchmark (vm) | -20% | **+10.4%** | BigCache hit 99.9% |
---
## 📚 **Summary**
### Implemented (Phase 6.11.4)
- ✅ P0-1: Atomic operation elimination (compile-time guard)
- ✅ P0-2: Cached strategy (42 lines → 10 lines)
- ✅ Profiling: hak_alloc -9.7% 削減
### Discovered ❌ **Pool/Cache が支配的で効果が見られない**
- **json/mir**: L2.5 Pool hit 100%
- **vm**: BigCache hit 99.9%
- **→ Pool/Cache 自体を最適化すべき**
### Recommendation ✅ **次は Pool/Cache 最適化に注力**
**Next Phase**: Phase 6.12 (Tiny Pool) / 6.13 (L2.5 Pool) / 6.8+ (BigCache)
---
**Implementation Time**: 約2-3時間予想通り
**Profiling Impact**: hak_alloc -9.7% 削減 ✅
**Benchmark Impact**: ほぼ変化なしPool/Cache が支配的)❌
**Lesson**: **Pool/Cache の最適化が次の優先事項** 🎯