Phase 75: record FAST PGO rebase and add PGO regeneration instructions

2025-12-18 09:32:43 +09:00
parent 3dbf4acb48
commit e51231471b
3 changed files with 128 additions and 2 deletions
--- a/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
+++ b/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
@@ -0,0 +1,103 @@
+# Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) — Next Instructions
+
+**Status**: NEXT (HIGH PRIORITY)
+
+## Goal
+
+Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:
+- `HAKMEM_WARM_POOL_SIZE=16`
+- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
+- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
+
+This is required because Phase 75-4 observed a large gap (about 14% lower throughput) between:
+- **Phase 69 historical FAST baseline** (62.63M ops/s)
+- **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)
+
+## SSOT Rules
+
+- Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
+- Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
+- Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).
+
+## Step 1: Prepare training commands (C5/C6 ON)
+
+Pick one of these approaches (A is preferred):
+
+### A) Training uses the harness (preferred)
+
+Ensure the training workload exports the correct knobs:
+
+```bash
+export HAKMEM_WARM_POOL_SIZE=16
+export HAKMEM_TINY_C5_INLINE_SLOTS=1
+export HAKMEM_TINY_C6_INLINE_SLOTS=1
+```
+
+Then run the existing PGO training target (repo-specific; example):
+
+```bash
+make pgo-fast-full
+```
+
+### B) Hard-pin knobs inside PGO training config (if needed)
+
+If the training driver does not inherit ENV cleanly, update the PGO training config script to include:
+- `HAKMEM_WARM_POOL_SIZE=16`
+- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
+- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
+
+## Step 2: Validate the rebuilt binary
+
+Run Mixed SSOT 10-run on FAST PGO:
+
+```bash
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
+```
+
+Record mean/median/CV and update the scorecard baseline if improved.
+
+## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)
+
+Run the 4-point matrix on FAST PGO to:
+- confirm Point D > Point A
+- quantify additivity (B/C contributions)
+
+```bash
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+```
+
+## Step 4: If regression persists, do layout tax forensics
+
+Use:
+
+```bash
+./scripts/box/layout_tax_forensics_box.sh \
+  ./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
+  ./bench_random_mixed_hakmem_minimal_pgo
+```
+
+Then classify:
+- IPC drop (>3%) → text layout / inlining / code placement issue
+- branch-miss spike (>10%) → hint mismatch / control-flow reshaping
+- cache/dTLB spike → data layout / TLS bloat / spill
+
+## GO/NO-GO Gates
+
+- **GO**: FAST PGO baseline recovers significantly (target: close to the Phase 69 baseline of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
+- **NEUTRAL**: D vs A stays positive but baseline still low → keep investigating training config.
+- **NO-GO**: D vs A becomes negative → revert or rework inline slots integration for FAST builds.
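
The recording in Step 2 (mean/median/CV) and the gate check above could be sketched in shell roughly as follows. This is an illustrative sketch and not part of the committed file. The helper names `summarize`, `delta_pct`, and `gate` are hypothetical, and the one-value-per-line input format is an assumption, not the harness's actual output. The CV uses the sample standard deviation (n-1), which is also an assumption. Whether the baseline has "recovered" is passed in as a manual 0/1 flag because the document sets no numeric threshold, and a D vs A delta between 0% and +1.0% is treated as NEUTRAL here since the gates do not cover that case.

```shell
#!/usr/bin/env bash
# Sketch only: summarize throughput samples and evaluate the GO/NO-GO gates.
# Assumption: input is one throughput value (e.g. M ops/s) per line.

# summarize: reads numbers on stdin, prints "mean median cv_percent".
summarize() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      # Input is sorted, so the median is the middle element (or the
      # average of the two middle elements for an even count).
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0   # sample standard deviation
      printf "%.2f %.2f %.2f\n", mean, median, 100 * sd / mean
    }'
}

# delta_pct BASE NEW: percent change of NEW relative to BASE (e.g. D vs A).
delta_pct() {
  awk -v a="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * (d - a) / a }'
}

# gate BASELINE_RECOVERED DELTA_PCT: map a manual 0/1 recovery flag and the
# D-vs-A delta onto the gate labels. The handling of 0 <= delta < 1.0 with a
# recovered baseline is an assumption (reported as NEUTRAL).
gate() {
  awk -v ok="$1" -v d="$2" 'BEGIN {
    if (d < 0)                    print "NO-GO"
    else if (ok == 1 && d >= 1.0) print "GO"
    else                          print "NEUTRAL"
  }'
}
```

For example, with the Point A and Point D means held in the hypothetical variables `A_MEAN` and `D_MEAN`, `gate 1 "$(delta_pct "$A_MEAN" "$D_MEAN")"` prints the gate label once the baseline has been judged recovered.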
+
--- a/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
+++ b/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
@@ -24,6 +24,10 @@
  - `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`
  - and toggle `HAKMEM_TINY_C5_INLINE_SLOTS` / `HAKMEM_TINY_C6_INLINE_SLOTS`.

+**Update**:
+- Phase 75-4 completed the FAST PGO rebase and confirmed **+3.16% (GO)** on FAST PGO via a 4-point matrix A/B.
+- See `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`.
+
 ---

 ## Phase 75 Journey