From e4baa1894f0a2d6584402cd476d7e9e0f7b8f911 Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Thu, 18 Dec 2025 06:11:21 +0900
Subject: [PATCH] CURRENT_TASK: Phase 72-2 complete (WarmPool sweep, all NO-GO, ENV knob ROI exhausted)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 72-2 Results:
- WarmPool=16 (baseline): 56.23M ops/s
- WarmPool=20: 56.13M ops/s (-0.18%, NO-GO)
- WarmPool=24: 56.30M ops/s (+0.12%, noise)
- WarmPool=32: 56.07M ops/s (-0.28%, NO-GO)

Conclusion:
- ENV knob optimization exhausted
- WarmPool=16 remains optimal
- Next: Structural changes (Phase 74)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5
---
 CURRENT_TASK.md | 36 ++++++++++++++++---
 ...ASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md | 10 ++++--
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index b4ceea8c..e091896d 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -194,14 +194,42 @@
 - `perf record -e branches:u -c 100000 -- ./bench_random_mixed_hakmem_observe 20000000 400 1`
 - Goal: identify the **top 3 functions whose instruction share / branch share dropped** at WarmPool=16 (e.g., `shared_pool_acquire_slab`, `unified_cache_refill`, `warm_pool_do_prefill`, `superslab_refill`)
 
-**Phase 72-1 (structural): touch only the identified functions (consolidate the box boundary into one place)**
+**Phase 72-1 (structural): touch only the identified functions (consolidate the box boundary into one place)** ✅ **Cancelled (zero ROI)**
 
-- If `shared_pool_acquire_slab` is the main cause: design to reduce "scan/lock/mmap" (pin the warm prefill boundary to a single place)
-- If `unified_cache_refill` is the main cause: move "refill preparation/validation" to the boundary side and make the hot side straight-line
-- Note: the goal is not to "reduce misses" but to **make the same miss take a "shorter path"** (lesson from Phase 73)
+- perf record result: `unified_cache_push` shows -0.86% branches (largest reduction)
+- Original plan: optimize the Unified Cache FULL drain
+- **Reason for cancellation**: `full=0` for all classes (no FULL events occur) → zero ROI
+
+**Phase 72-2: additional WarmPool sweep** ✅ **Complete (ROI exhausted)**
+
+- Goal: check whether any value other than WarmPool=16 wins
+- Baseline: WarmPool=16 = 56.23M ops/s (10-run)
+- Results:
+  - WarmPool=20: 56.13M ops/s (**-0.18%**, NO-GO)
+  - WarmPool=24: 56.30M ops/s (**+0.12%**, within noise)
+  - WarmPool=32: 56.07M ops/s (**-0.28%**, NO-GO)
+- **Verdict**: all candidates within ±0.5% → **Phase 72 closed (ENV knob ROI exhausted)**
+
+---
+
+**Phase 72 summary**:
+- **Confirmed**: WarmPool=16 is the optimal value (established in Phase 69, reconfirmed in Phase 72)
+- **Confirmed**: no room left for further optimization via ENV knobs
+- **Winning lever**: instruction/branch reduction dominates (established in Phase 73)
+- **Next step**: structural changes (code changes) are required
 
 **Note**: do not delete the research boxes now (there is strong precedent for link-out/deletion incurring a layout tax, so keeping them compiled out is the right call)
 
+---
+
+**Phase 74 (next candidate): optimization via structural changes**
+
+- **Premise**: ENV knob ROI exhausted → code changes required
+- **Candidate A**: reduce branches in `unified_cache_push` (largest contribution confirmed in Phase 72-0)
+- **Candidate B**: more aggressive inlining on the hot path (layout tax risk; forensics required)
+- **Candidate C**: re-tune the PGO profile (retrain assuming WarmPool=16)
+- **Decision criteria**: +1.0% → GO; below +0.5% → NO-GO
+
 ## 3) Archive
 
 - Detailed log: `CURRENT_TASK_ARCHIVE_20251210.md`
diff --git a/docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md b/docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md
index ce29a7b4..2c10d4fc 100644
--- a/docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md
+++ b/docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md
@@ -1,11 +1,12 @@
 # Phase 70: Refill Tuning Prerequisites (Observability SSOT)
 
-**Status**: 🟡 ACTIVE
+**Status**: ✅ COMPLETE (SSOT established)
 
 **Context**:
 - Current baseline (Mixed WS=400 + prefault) generates almost **zero cache misses** (refills) in Unified Cache.
 - Optimizing `unified_cache_refill()` or Warm Pool logic will yield **zero throughput gain** if this path is not hot.
 - Phase 70 must be gated on *observing significant refill activity*.
+- Current conclusion (WS=400 SSOT): Refill/WarmPool-pop is **not hot** (misses are extremely low), so refill micro-optimizations are **frozen** for SSOT workloads; only research workloads should touch refill behavior.
 
 ## 1. Observability Protocol (Step 0)
 
@@ -53,6 +54,11 @@ HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
 **WARNING**: Do NOT change `HAKMEM_TINY_STATIC_ROUTE` or `ULTRA` flags unless specifically testing routing changes.
 The default `LEGACY` route *does* use Unified Cache for alloc/free.
 
+**NOTE (Warm Pool sizing semantics)**:
+- `HAKMEM_WARM_POOL_SIZE` primarily controls the **registry-scan prefill cap** (`warm_pool_max_per_class()`).
+- The steady-state push-back cap inside `unified_cache_refill()` is `TinyClassPolicy.warm_cap` (typically 4/8), so
+  `HAKMEM_WARM_POOL_SIZE` only matters when registry scans/refills happen often enough to benefit from prefilled slabs.
+
 ## 3. Reference: Why "LEGACY" Route is OK
 
 - **LEGACY** route means "not ULTRA/MID/V7 specialized".
@@ -63,4 +69,4 @@ HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
 Previous confusion ("LEGACY unused") was due to:
 1. Stats counting was gated by `#if !HAKMEM_BUILD_RELEASE`.
 2. Low miss rate in WS=400 made it look unused.
-3. Phase 70-0 fixed the stats visibility.
\ No newline at end of file
+3. Phase 70-0 fixed the stats visibility.