CURRENT_TASK: Phase 72-2 complete (WarmPool sweep, all NO-GO, ENV knob ROI exhausted)

Phase 72-2 Results:
- WarmPool=16 (baseline): 56.23M ops/s
- WarmPool=20: 56.13M ops/s (-0.18%, NO-GO)
- WarmPool=24: 56.30M ops/s (+0.12%, noise)
- WarmPool=32: 56.07M ops/s (-0.28%, NO-GO)

Conclusion:
- ENV knob optimization exhausted
- WarmPool=16 remains optimal
- Next: Structural changes (Phase 74)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-18 06:11:21 +09:00
parent b38a8b73b6
commit e4baa1894f
2 changed files with 40 additions and 6 deletions


@@ -194,14 +194,42 @@
- `perf record -e branches:u -c 100000 -- ./bench_random_mixed_hakmem_observe 20000000 400 1`
- Goal: with WarmPool=16, pin down the **top 3 functions whose instruction share / branch share decreased** (e.g. `shared_pool_acquire_slab`, `unified_cache_refill`, `warm_pool_do_prefill`, `superslab_refill`)
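**Phase 72-1 (structure): touch only the identified functions (consolidate the box boundary into one place)**
**Phase 72-1 (structure): touch only the identified functions (consolidate the box boundary into one place)** **(cancelled: zero ROI)**
- If `shared_pool_acquire_slab` is the main cause: design to reduce “scan/lock/mmap” (pin the warm prefill boundary to a single place)
- If `unified_cache_refill` is the main cause: move refill “preparation/validation” to the boundary side and keep the hot side straight-line
- Note: the goal is not to “reduce misses” but to **make the same misses take a “shorter path”** (lesson from Phase 73)
- perf record result: `unified_cache_push` shows -0.86% branches (largest reduction)
- Original plan: optimize the Unified Cache FULL drain
- **Reason for cancellation**: `full=0` in every class (no FULL events occur) → zero ROI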
**Phase 72-2: additional WarmPool sweep** **(complete: ROI exhausted)**
- Goal: check whether any value other than WarmPool=16 wins
- Baseline: WarmPool=16 = 56.23M ops/s (10-run)
- Results:
- WarmPool=20: 56.13M ops/s (**-0.18%**, NO-GO)
- WarmPool=24: 56.30M ops/s (**+0.12%**, within noise)
- WarmPool=32: 56.07M ops/s (**-0.28%**, NO-GO)
- **Verdict**: all candidates are within ±0.5% → **Phase 72 closed (ENV knob ROI exhausted)**
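
The percentages above follow from a plain relative-change calculation against the WarmPool=16 baseline. A minimal sketch, assuming throughput is expressed in M ops/s; the helper names `delta_percent` and `within_band` are hypothetical and do not come from the hakmem tree:

```c
#include <stdbool.h>

/* Relative change of `candidate` against `baseline`, in percent. */
static double delta_percent(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}

/* True when the change stays inside the +/-band_percent band,
 * i.e. the candidate cannot be told apart from the baseline. */
static bool within_band(double baseline, double candidate, double band_percent) {
    double d = delta_percent(baseline, candidate);
    if (d < 0.0) {
        d = -d; /* absolute value without pulling in libm */
    }
    return d <= band_percent;
}
```

For example, `delta_percent(56.23, 56.13)` is roughly -0.18, and `within_band(56.23, x, 0.5)` holds for all three candidates (56.13, 56.30, 56.07).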
---
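**Phase 72 summary**:
- **Confirmed**: WarmPool=16 is the optimal value (established in Phase 69, reconfirmed in Phase 72)
- **Confirmed**: no room left for further optimization through ENV knobs
- **Winning path**: instruction/branch reduction dominates (established in Phase 73)
- **Next step**: structural changes (code changes) are required
**Note**: do not delete the research boxes now (there is strong precedent for link-out/deletion causing a layout tax, so keeping them compiled out is the right call)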
---
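**Phase 74 (next candidate): optimization through structural changes**
- **Premise**: ENV knob ROI exhausted → code changes are required
- **Candidate A**: reduce branches in `unified_cache_push` (largest contribution confirmed in Phase 72-0)
- **Candidate B**: stronger inlining of the hot path (layout tax risk, forensics required)
- **Candidate C**: retune the PGO profile (retrain assuming WarmPool=16)
- **Decision criteria**: +1.0% → GO, below +0.5% → NO-GO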
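
The decision criteria can be written as a small classifier. This is a sketch under two assumptions that the notes do not state: "+1.0%" is read as "at least +1.0%", and the unspecified range from +0.5% up to +1.0% is reported as a separate neutral outcome. The type and function names are hypothetical.

```c
/* Outcome of comparing a candidate against the baseline. */
typedef enum {
    VERDICT_NO_GO,   /* below +0.5% */
    VERDICT_NEUTRAL, /* assumption: the band the notes leave unspecified */
    VERDICT_GO       /* assumption: +1.0% or more */
} verdict_t;

/* Classify a throughput change given in percent (e.g. -0.18 for -0.18%). */
static verdict_t judge(double delta_percent) {
    if (delta_percent >= 1.0) {
        return VERDICT_GO;
    }
    if (delta_percent < 0.5) {
        return VERDICT_NO_GO;
    }
    return VERDICT_NEUTRAL;
}
```

Under this reading, `judge(0.12)` and `judge(-0.28)` both return `VERDICT_NO_GO`.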
## 3) Archive
- Detailed log: `CURRENT_TASK_ARCHIVE_20251210.md`


@@ -1,11 +1,12 @@
# Phase 70: Refill Tuning Prerequisites (Observability SSOT)
**Status**: 🟡 ACTIVE
**Status**: ✅ COMPLETE (SSOT established)
**Context**:
- Current baseline (Mixed WS=400 + prefault) generates almost **zero cache misses** (refills) in Unified Cache.
- Optimizing `unified_cache_refill()` or Warm Pool logic will yield **zero throughput gain** if this path is not hot.
- Phase 70 must be gated on *observing significant refill activity*.
- Current conclusion (WS=400 SSOT): Refill/WarmPool-pop is **not hot** (misses are extremely low), so refill micro-optimizations are **frozen** for SSOT workloads; only research workloads should touch refill behavior.
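
The gating idea in the context above (only tune refill when refills are actually observed) can be expressed as a miss-ratio check. A minimal sketch: the function names are hypothetical, and the threshold is left to the caller because the notes do not define what counts as "significant" refill activity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fraction of cache operations that missed (and therefore triggered a refill). */
static double miss_ratio(uint64_t hits, uint64_t misses) {
    uint64_t total = hits + misses;
    return total == 0 ? 0.0 : (double)misses / (double)total;
}

/* Gate: refill tuning is only worth attempting when the observed
 * miss ratio reaches the caller-supplied minimum. */
static bool refill_is_hot(uint64_t hits, uint64_t misses, double min_ratio) {
    return miss_ratio(hits, misses) >= min_ratio;
}
```

With a near-zero miss count, as reported for the WS=400 baseline, `refill_is_hot` returns false for any meaningful threshold, which matches the decision to freeze refill micro-optimizations there.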
## 1. Observability Protocol (Step 0)
@@ -53,6 +54,11 @@ HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
**WARNING**: Do NOT change `HAKMEM_TINY_STATIC_ROUTE` or `ULTRA` flags unless specifically testing routing changes. The default `LEGACY` route *does* use Unified Cache for alloc/free.
**NOTE (Warm Pool sizing semantics)**:
- `HAKMEM_WARM_POOL_SIZE` primarily controls the **registry-scan prefill cap** (`warm_pool_max_per_class()`).
- The steady-state push-back cap inside `unified_cache_refill()` is `TinyClassPolicy.warm_cap` (typically 4/8), so
`HAKMEM_WARM_POOL_SIZE` only matters when registry scans/refills happen often enough to benefit from prefilled slabs.
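
The two caps described in this note can be modeled separately. The following is an illustrative sketch, not the hakmem implementation: `class_policy_sketch`, `prefill_cap`, and `pushback_allowed` are hypothetical names, and the fallback value is supplied by the caller rather than taken from the real default.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-class policy; only warm_cap is named in the notes. */
typedef struct {
    int warm_cap;
} class_policy_sketch;

/* Prefill cap for registry scans: parsed from the environment value,
 * or `fallback` when the value is missing or not a positive integer. */
static int prefill_cap(const char *env_value, int fallback) {
    if (env_value == NULL) {
        return fallback;
    }
    char *end = NULL;
    errno = 0;
    long v = strtol(env_value, &end, 10);
    if (errno == ERANGE || end == env_value || *end != '\0' || v <= 0 || v > INT_MAX) {
        return fallback;
    }
    return (int)v;
}

/* Steady-state push-back: a slab goes back to the warm pool only while the
 * pool holds fewer than warm_cap entries, independent of the prefill cap. */
static int pushback_allowed(const class_policy_sketch *policy, int pool_count) {
    return pool_count < policy->warm_cap;
}
```

A caller might use `prefill_cap(getenv("HAKMEM_WARM_POOL_SIZE"), 16)`, where 16 is assumed here only because it is the baseline value in these notes. Raising the environment value changes the first function's result but never the second's, which is one way to read why the sweep moved throughput so little.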
## 3. Reference: Why "LEGACY" Route is OK
- **LEGACY** route means "not ULTRA/MID/V7 specialized".
@@ -63,4 +69,4 @@ HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
Previous confusion ("LEGACY unused") was due to:
1. Stats counting was gated by `#if !HAKMEM_BUILD_RELEASE`.
2. Low miss rate in WS=400 made it look unused.
3. Phase 70-0 fixed the stats visibility.
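
Point 1 is easy to reproduce in miniature: when a counter increment is compiled out, a path that runs constantly still reports zero. A minimal sketch, assuming a `STAT_INC` macro and a counter name that are hypothetical; only the `HAKMEM_BUILD_RELEASE` guard comes from the notes, and this does not show how Phase 70-0 actually changed the gating.

```c
#include <stdint.h>

/* Default to a non-release build when the flag is not supplied by the build system. */
#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

#if !HAKMEM_BUILD_RELEASE
#define STAT_INC(counter) ((void)++(counter)) /* stats visible */
#else
#define STAT_INC(counter) ((void)0)           /* compiled out: counter stays 0 */
#endif

/* Hypothetical counter for refill events. */
static uint64_t g_refill_count_sketch;

/* Stand-in for a refill path; only the stats call is shown. */
static void refill_sketch(void) {
    STAT_INC(g_refill_count_sketch);
}
```

Built with `-DHAKMEM_BUILD_RELEASE=1`, the same calls leave `g_refill_count_sketch` at zero, so the path looks unused even though it ran.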