CURRENT_TASK: Phase 70-73 complete, Phase 72 plan (post-instruction-reduction)

Updated with:
- Phase 70-1/2/3: Route Banner + consistency checks + refill optimization freeze
- Phase 73: Hardware profiling paradox resolved (instruction reduction wins despite worse TLB/cache)
- Phase 72-0: Function-level perf record plan (identify which functions reduced instructions)
- Phase 72-1: Structure optimization targeting identified hot functions

Key insight: WarmPool=16 selects shorter code paths, not memory hierarchy optimization.

Next action: Phase 72-0 confirmed unified_cache_push as primary target.
2025-12-18 05:55:47 +09:00
parent 8fdbc6d07e
commit b38a8b73b6
1 changed files with 59 additions and 0 deletions
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -140,6 +140,65 @@
 - Note: `Route assignments: LEGACY` does not mean "Unified Cache is unused"; it names the backend route kind
 - SSOT: `docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md`
  - On the Mixed SSOT (WS=400), **confirm with OBSERVE** whether `unified_cache_refill()` / WarmPool pop occur at a significant rate before proceeding with Phase 70
+- ✅ Phase 70-1: Route Banner implemented (eliminates route misidentification)
+  - ENV: `HAKMEM_ROUTE_BANNER=1`
+  - Output: Route assignments (backend route kind) + cache config (unified_cache / warm_pool_max_per_class)
+- ✅ Phase 70-3: Consistency SSOT for OBSERVE statistics (eliminates "it just wasn't visible" incidents)
+  - Confirm `Unified-STATS total_allocs == total_frees` before any discussion (reliability gate for the statistics)
+- ✅ Phase 70-2: Handling of refill optimization settled (SSOT)
+  - On the Mixed SSOT (WS=400), if `Unified-STATS miss < 1000`, **refill optimization is frozen (zero ROI)**
+  - Current measurement: misses are negligible (e.g. total miss=5) → refill optimization has no ROI on the SSOT workload
+  - Details: `docs/analysis/PHASE70_71_WARMPOOL16_ANALYSIS.md`
+
+---
+
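+**Phase 73: Pin down why WarmPool=16 wins, using perf** ✅ **Done, paradox resolved**
+
+- Background: WarmPool=16 improves throughput/CV, yet the visible counters (Unified/WarmPool, etc.) are nearly identical → the gain is likely a **per-operation cost difference** (TLB/LLC/frequency/placement)
+- Goal: use **perf stat** to reduce the WarmPool=12 vs 16 difference to "what decreased", then commit to the next structural optimization (Phase 72)
+- Method: **same binary + cleanenv + alternating runs** (avoids layout tax and environment drift)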
+  - A: `HAKMEM_WARM_POOL_SIZE=12`
+  - B: `HAKMEM_WARM_POOL_SIZE=16`
+  - events: `cycles,instructions,branches,branch-misses,cache-misses,LLC-load-misses,iTLB-load-misses,dTLB-load-misses,page-faults`
+
+**Results** (paradox):
+- ✅ Throughput: +0.91% (46.52M → 46.95M ops/s)
+- ✅ **instructions**: -0.38% (-17.4M instructions) ← **PRIMARY WIN SOURCE**
+- ✅ **branches**: -0.30% (-3.7M branches) ← **SECONDARY WIN SOURCE**
+- ⚠️ **dTLB-load-misses**: +29.06% (28,792 → 37,158) ← **WORSE**
+- ⚠️ **cache-misses**: +17.80% (458K → 540K) ← **WORSE**
+- ✓ page-faults: -0.21% (negligible)
+
+**Phase 71 hypothesis (REJECTED)**:
+- Prediction: "TLB/cache efficiency improvement from memory layout"
+- Measured: TLB/cache metrics both **DEGRADED**
+
+**Phase 73 conclusion**:
+- Winning factor: **Control-flow optimization (instruction/branch count reduction)**
+- Mechanism: WarmPool=16 selects a shorter code path → 17.4M fewer instructions
+- Trade-off: +4MB RSS → worse TLB/cache, but instruction savings dominate
+- Net benefit: ~8.2M cycles saved (instruction/branch) >> ~4.2M cycles lost (TLB/cache)
+
+**Details**: `docs/analysis/PHASE70_71_WARMPOOL16_ANALYSIS.md` Phase 73 section
+
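+**Phase 72 (structural): Amplify what makes WarmPool=16 win (after the Phase 73 results are in)**
+
+- Precondition: start only after Phase 73 has established the winning factor numerically (tinkering on guesswork repeats Phase 40/41/64)
+- Phase 73 conclusion: **the instruction/branch reduction dominates** (TLB/cache actually got worse) → the essence is that WarmPool=16 steers execution onto a "shorter path"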
+
+**Phase 72-0 (SSOT): Identify "which functions got shorter" before making structural changes**
+
+- A/B stays WarmPool=12 vs 16 (same binary, cleanenv)
+- Sample perf record on **instructions/branches rather than cycles** (because the cause is the instruction/branch reduction)
+  - `perf record -e instructions:u -c 100000 -- ./bench_random_mixed_hakmem_observe 20000000 400 1`
+  - `perf record -e branches:u -c 100000 -- ./bench_random_mixed_hakmem_observe 20000000 400 1`
+- Goal: determine the **top 3 functions whose instruction share / branch share dropped** under WarmPool=16 (e.g. `shared_pool_acquire_slab`, `unified_cache_refill`, `warm_pool_do_prefill`, `superslab_refill`, etc.)
+
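+**Phase 72-1 (structural): Touch only the identified functions (consolidate the box boundary to a single point)**
+
+- If `shared_pool_acquire_slab` is the main cause: design to reduce "scan/lock/mmap" (pin the warm prefill boundary to one place)
+- If `unified_cache_refill` is the main cause: move refill "preparation/validation" to the boundary side and keep the hot side straight-line
+- Note: the goal is not to "reduce misses" but to **take a "shorter path" for the same number of misses** (the lesson of Phase 73)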

 **Note**: Do not delete the research boxes now (there is strong precedent for link-out/deletion causing layout tax, so keeping them compiled out is the right call)
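
The Phase 73 A/B arithmetic above (relative counter deltas and the net cycle balance) can be reproduced with a minimal sketch like the following. The helper names `pct_delta` and `net_cycles` are hypothetical and are not part of the repository; the inputs are only the figures quoted in the diff.

```python
def pct_delta(a: float, b: float) -> float:
    """Relative change from run A (WarmPool=12) to run B (WarmPool=16), in percent."""
    return (b - a) / a * 100.0


def net_cycles(saved: float, lost: float) -> float:
    """Net cycle balance: cycles saved minus cycles lost."""
    return saved - lost


if __name__ == "__main__":
    # dTLB-load-misses as quoted in the Phase 73 results: 28,792 -> 37,158
    print(f"dTLB-load-misses: {pct_delta(28_792, 37_158):+.2f}%")  # +29.06%

    # Cycle estimates as quoted: ~8.2M saved (instruction/branch), ~4.2M lost (TLB/cache)
    print(f"net cycles saved: ~{net_cycles(8.2e6, 4.2e6) / 1e6:.1f}M")  # ~4.0M
```

The throughput and cache-miss figures in the results appear only in rounded form (46.52M → 46.95M, 458K → 540K), so recomputing their percentages from those rounded values gives slightly different numbers than the ones quoted, which were presumably derived from the raw counters.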