Phase 75: record FAST PGO rebase and add PGO regeneration instructions

2025-12-18 09:32:43 +09:00
parent 3dbf4acb48
commit e51231471b
3 changed files with 128 additions and 2 deletions
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -3,12 +3,14 @@
 ## 0) Current source of truth (SSOT)
 - **Performance comparison SSOT**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) + **WarmPool=16**
-  - Phase 75 (C5/C6 inline slots) has been promoted to presets, but **re-measurement on FAST PGO (rebase) has not been done yet** (A/B on Standard is +5.41%).
+  - Phase 75 (C5/C6 inline slots) has been promoted to presets
  - Phase 75-4 performed the FAST PGO rebase and confirmed **C5+C6=ON at +3.16% (GO)** (however, the **FAST PGO baseline itself is suspected to have regressed substantially since Phase 69** → PGO regeneration is needed in Phase 75-5)
 - **Safety/compatibility SSOT**: Standard build (`make bench_random_mixed_hakmem`)
 - **Observation SSOT**: OBSERVE build (`make perf_observe`)
 - **Scorecard (targets/current values)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
  - **FAST baseline (SSOT)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is authoritative (Phase 69: 62.63M ops/s = 51.77% of mimalloc)
-  - **Phase 75 measurement**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (non-PGO / separate target) (Phase 75-3 4-point matrix). Carrying this over to FAST PGO (rebase) is a separate task.
+  - **Phase 75 measurement (Standard)**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (Phase 75-3 4-point matrix)
  - **Phase 75 measurement (FAST PGO)**: confirmed **A/B +3.16%** on `bench_random_mixed_hakmem_minimal_pgo` (Phase 75-4 4-point matrix)
  - Next target: **M2 = 55%** (judge the gap against the FAST baseline)
 - **Mixed 10-run SSOT (harness)**: `scripts/run_mixed_10_cleanenv.sh`
  - Default `BENCH_BIN=./bench_random_mixed_hakmem` (Standard)
@@ -206,6 +208,23 @@ Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):
 **Phase 75 Complete**: C5+C6 inline slots (129-256B) deliver +5.41% proven gain **on Standard binary** (`bench_random_mixed_hakmem`).
 - Before updating the FAST PGO baseline (scorecard), re-measure **the same-condition A/B (C5/C6 OFF/ON)** with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`.
 ### Phase 75-4 (FAST PGO rebase) ✅ Done
 - Result: **+3.16% (GO)** (4-point matrix, after outlier exclusion)
 - Details: `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`
 - Important: compared with the Phase 69 FAST baseline (62.63M), the **current FAST PGO baseline is suspected to be much lower** (PGO profile staleness / training mismatch / build drift)
 ### Phase 75-5 (PGO regeneration) 🟥 Next active (HIGH PRIORITY)
 Goal:
 - Regenerate the PGO training profile against the current code, including C5/C6 inline slots, and recover a Phase 69-class FAST baseline.
 Steps (outline):
 1. Run PGO training with C5/C6=ON (always set `HAKMEM_TINY_C5_INLINE_SLOTS=1` / `HAKMEM_TINY_C6_INLINE_SLOTS=1` during training)
 2. Regenerate `bench_random_mixed_hakmem_minimal_pgo` with `make pgo-fast-full`
 3. Re-measure the baseline with a 10-run, then re-measure Points A/D from Phase 75-4
 4. If layout tax / drift is suspected, classify the cause with `scripts/box/layout_tax_forensics_box.sh`
 **References**:
 - 4-point matrix 結果: `docs/analysis/PHASE75_3_C5_C6_INTERACTION_RESULTS.md`
 - Test script: `scripts/phase75_3_matrix_test.sh`
--- a/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
+++ b/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
@@ -0,0 +1,103 @@
 # Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) — Next Instructions
 **Status**: NEXT (HIGH PRIORITY)
 ## Goal
 Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:
 - `HAKMEM_WARM_POOL_SIZE=16`
 - `HAKMEM_TINY_C5_INLINE_SLOTS=1`
 - `HAKMEM_TINY_C6_INLINE_SLOTS=1`
 This is required because Phase 75-4 observed a large gap between:
 - **Phase 69 historical FAST baseline** (62.63M ops/s)
 - **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)
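 The size of that gap can be checked with a short calculation. This is a minimal sketch that uses only the two throughput figures quoted above; the helper name `pct_gap` is hypothetical and not part of the repo:

 ```bash
 # Hypothetical helper: relative change of `current` versus `reference`, in percent.
 pct_gap() {
   awk -v ref="$1" -v cur="$2" 'BEGIN { printf "%+.2f%%\n", (cur / ref - 1) * 100 }'
 }

 # Phase 69 baseline vs. Phase 75-4 Point A (both in M ops/s, from the list above).
 pct_gap 62.63 53.81   # prints -14.08%
 ```

 A baseline shortfall of roughly 14% is several times larger than the +3.16% A/B effect measured in Phase 75-4, which is why the baseline is rebuilt before any further knob tuning.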
 ## SSOT Rules
 - Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
 - Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
 - Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).
 ## Step 1: Prepare training commands (C5/C6 ON)
 Pick one of these approaches (A is preferred):
 ### A) Training uses the harness (preferred)
 Ensure the training workload exports the correct knobs:
 ```bash
 export HAKMEM_WARM_POOL_SIZE=16
 export HAKMEM_TINY_C5_INLINE_SLOTS=1
 export HAKMEM_TINY_C6_INLINE_SLOTS=1
 ```
 Then run the existing PGO training target (repo-specific; example):
 ```bash
 make pgo-fast-full
 ```
 ### B) Hard-pin knobs inside PGO training config (if needed)
 If the training driver does not inherit ENV cleanly, update the PGO training config script to include:
 - `HAKMEM_WARM_POOL_SIZE=16`
 - `HAKMEM_TINY_C5_INLINE_SLOTS=1`
 - `HAKMEM_TINY_C6_INLINE_SLOTS=1`
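 One possible way to hard-pin the knobs is to wrap the training command so the values are set regardless of what the caller exported. This is a minimal sketch: the function name `run_with_promoted_knobs` is hypothetical, and the real PGO training config script in this repo may be structured differently.

 ```bash
 # Hypothetical wrapper: run any command with the promoted defaults forced on.
 # `env VAR=value cmd` overrides whatever the calling shell had exported.
 run_with_promoted_knobs() {
   env \
     HAKMEM_WARM_POOL_SIZE=16 \
     HAKMEM_TINY_C5_INLINE_SLOTS=1 \
     HAKMEM_TINY_C6_INLINE_SLOTS=1 \
     "$@"
 }

 # Example: the pinned values win over a conflicting setting from the caller.
 HAKMEM_TINY_C5_INLINE_SLOTS=0 run_with_promoted_knobs \
   sh -c 'echo "C5=$HAKMEM_TINY_C5_INLINE_SLOTS C6=$HAKMEM_TINY_C6_INLINE_SLOTS WARM=$HAKMEM_WARM_POOL_SIZE"'
 ```

 The example prints `C5=1 C6=1 WARM=16`. A wrapper like this only helps if the training driver does not reset the variables itself further down the call chain.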
 ## Step 2: Validate the rebuilt binary
 Run Mixed SSOT 10-run on FAST PGO:
 ```bash
 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
 ```
 Record mean/median/CV and update the scorecard baseline if improved.
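 If the per-run throughput values have to be summarized by hand, something like the following could be used. It is a sketch under two assumptions: the values are available one per line on stdin (the harness output format is not shown here), and CV means sample standard deviation divided by the mean. The name `summarize_runs` is hypothetical.

 ```bash
 # Hypothetical helper: read one throughput value per line, print mean/median/CV.
 summarize_runs() {
   sort -n | awk '
     { v[NR] = $1; sum += $1 }
     END {
       if (NR < 2) { print "need at least 2 runs" > "/dev/stderr"; exit 1 }
       mean = sum / NR
       for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
       sd = sqrt(ss / (NR - 1))   # sample standard deviation
       # Input is sorted, so the median is the middle value (or the middle pair average).
       median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
       printf "mean=%.2f median=%.2f cv=%.2f%%\n", mean, median, 100 * sd / mean
     }'
 }

 # Illustrative input only, not measured results.
 printf '%s\n' 10 20 30 | summarize_runs   # prints mean=20.00 median=20.00 cv=50.00%
 ```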
 ## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)
 Run the 4-point matrix on FAST PGO to:
 - confirm Point D > Point A
 - quantify additivity (B/C contributions)
 ```bash
 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh
 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh
 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh
 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh
 ```
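 Once the four means are in hand, additivity can be quantified by comparing the combined gain at Point D with the sum of the single-knob gains. The sketch below assumes Point A is the both-OFF baseline, Points B and C are the two single-knob runs, and Point D is both ON; the helper name `matrix_additivity` is hypothetical.

 ```bash
 # Hypothetical helper: takes mean throughput for Points A, B, C, D (same unit).
 matrix_additivity() {
   awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
     gb = (b / a - 1) * 100      # single-knob gain at Point B
     gc = (c / a - 1) * 100      # single-knob gain at Point C
     gd = (d / a - 1) * 100      # combined gain at Point D
     # interaction > 0: super-additive, interaction < 0: sub-additive
     printf "B=%+.2f%% C=%+.2f%% D=%+.2f%% interaction=%+.2f%%\n", gb, gc, gd, gd - (gb + gc)
   }'
 }

 # Illustrative numbers only, not measured results.
 matrix_additivity 100 102 103 106   # prints B=+2.00% C=+3.00% D=+6.00% interaction=+1.00%
 ```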
 ## Step 4: If regression persists, do layout tax forensics
 Use:
 ```bash
 ./scripts/box/layout_tax_forensics_box.sh \
  ./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
  ./bench_random_mixed_hakmem_minimal_pgo
 ```
 Then classify:
 - IPC drop (>3%) → text layout / inlining / code placement issue
 - branch-miss spike (>10%) → hint mismatch / control-flow reshaping
 - cache/dTLB spike → data layout / TLS bloat / spill
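 The two numeric thresholds above can be applied mechanically. This is a sketch only: it assumes the IPC and branch-miss deltas (current binary relative to the Phase 69 binary, in percent) have been read off the forensics output by hand, and the helper name `classify_layout_tax` is hypothetical. The cache/dTLB case has no numeric threshold in the list above, so it is left to manual inspection.

 ```bash
 # Hypothetical helper. Arguments: IPC change in percent, branch-miss change in percent.
 classify_layout_tax() {
   awk -v ipc="$1" -v br="$2" 'BEGIN {
     hit = 0
     if (ipc < -3) { print "IPC drop: text layout / inlining / code placement"; hit = 1 }
     if (br > 10)  { print "branch-miss spike: hint mismatch / control-flow reshaping"; hit = 1 }
     if (!hit)     { print "no threshold tripped: inspect cache/dTLB counters manually" }
   }'
 }

 # Illustrative deltas, not measured values.
 classify_layout_tax -4.2 12.5
 ```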
 ## GO/NO-GO Gates
 - **GO**: FAST PGO baseline recovers significantly (target: close to the Phase 69 level of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
 - **NEUTRAL**: D vs A stays positive but baseline still low → keep investigating training config.
 - **NO-GO**: D vs A becomes negative → revert or rework inline slots integration for FAST builds.
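 The gates can be written down as a small decision helper. This is a sketch, not an agreed rule: the gates do not give a number for "recovers significantly", so the recovery floor is an explicit third argument that the caller must choose. The name `phase75_5_gate` is hypothetical, and combinations the gates do not cover are reported as `UNDEFINED`.

 ```bash
 # Hypothetical helper.
 # $1 = new FAST PGO baseline (M ops/s), $2 = D vs A gain in percent,
 # $3 = caller-chosen floor for "baseline recovered" (an assumption, not from the gates).
 phase75_5_gate() {
   awk -v base="$1" -v gain="$2" -v floor="$3" 'BEGIN {
     if (gain < 0)                          print "NO-GO"
     else if (gain >= 1.0 && base >= floor) print "GO"
     else if (gain > 0 && base < floor)     print "NEUTRAL"
     else                                   print "UNDEFINED"
   }'
 }

 # Illustrative inputs only; the floor of 60 is an example value.
 phase75_5_gate 61.0 3.2 60   # prints GO
 phase75_5_gate 54.0 3.2 60   # prints NEUTRAL
 phase75_5_gate 61.0 -0.5 60  # prints NO-GO
 ```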
--- a/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
+++ b/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
@@ -24,6 +24,10 @@
  - `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`
  - and toggle `HAKMEM_TINY_C5_INLINE_SLOTS` / `HAKMEM_TINY_C6_INLINE_SLOTS`.
 **Update**:
 - Phase 75-4 completed the FAST PGO rebase and confirmed **+3.16% (GO)** on FAST PGO via a 4-point matrix A/B.
 - See `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`.
 ---
 ## Phase 75 Journey