# Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) - Next Instructions

**Status**: NEXT (HIGH PRIORITY)

## Goal

Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:

- `HAKMEM_WARM_POOL_SIZE=16`
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`

This is required because Phase 75-4 observed a large gap between:

- **Phase 69 historical FAST baseline** (62.63M ops/s)
- **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)

## SSOT Rules

- Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
- Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
- Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).

## Step 1: Prepare training commands (C5/C6 ON)

Pick one of these approaches (A is preferred):

### A) Training uses the harness (preferred)

Ensure the training workload exports the correct knobs:

```bash
export HAKMEM_WARM_POOL_SIZE=16
export HAKMEM_TINY_C5_INLINE_SLOTS=1
export HAKMEM_TINY_C6_INLINE_SLOTS=1
```

Then run the existing PGO training target (repo-specific; example):

```bash
make pgo-fast-full
```

### B) Hard-pin knobs inside PGO training config (if needed)

If the training driver does not inherit ENV cleanly, update the PGO training config script to include:

- `HAKMEM_WARM_POOL_SIZE=16`
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`

## Step 2: Validate the rebuilt binary

Run Mixed SSOT 10-run on FAST PGO:

```bash
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
```

Record mean/median/CV and update the scorecard baseline if improved.

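Step 2 asks for mean, median, and CV. If the harness does not already print all three, they can be computed from the per-run throughput values. The following is a minimal sketch, not part of the repo: `summarize_runs` is a hypothetical helper, it assumes the ops/s values have already been extracted to one number per line (the harness output format is not specified here), and it assumes CV means sample standard deviation (n-1) divided by the mean.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): summarize throughput samples.
# Input: one numeric ops/s value per line on stdin; blank lines are skipped.
# Output: mean, median, and CV (sample stddev / mean, as a percentage).
# Example use (runs.txt is an assumed file name):
#   summarize_runs < runs.txt
summarize_runs() {
  sort -n | awk '
    NF {
      n++
      v[n] = $1 + 0          # values arrive sorted ascending
      sum += v[n]
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (v[i] - mean) * (v[i] - mean)
      sd = (n > 1) ? sqrt(ss / (n - 1)) : 0
      if (n % 2) median = v[(n + 1) / 2]
      else       median = (v[n / 2] + v[n / 2 + 1]) / 2
      printf "mean=%.2f median=%.2f cv=%.2f%%\n", mean, median, 100 * sd / mean
    }'
}
```

Keeping the computation in one place makes it easier to compare runs of different binaries with the same definition of CV.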
## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)

Run the 4-point matrix on FAST PGO to:

- confirm Point D > Point A
- quantify additivity (B/C contributions)

```bash
# C5 OFF, C6 OFF
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh

# C5 ON, C6 OFF
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh

# C5 OFF, C6 ON
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh

# C5 ON, C6 ON
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
  scripts/run_mixed_10_cleanenv.sh
```

## Step 4: If regression persists, do layout tax forensics

Use:

```bash
./scripts/box/layout_tax_forensics_box.sh \
  ./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
  ./bench_random_mixed_hakmem_minimal_pgo
```

Then classify:

- IPC drop (>3%) → text layout / inlining / code placement issue
- branch-miss spike (>10%) → hint mismatch / control-flow reshaping
- cache/dTLB spike → data layout / TLS bloat / spill

## GO/NO-GO Gates

- **GO**: FAST PGO baseline recovers significantly (target: close to the Phase 69 baseline of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
- **NEUTRAL**: D vs A stays positive but baseline still low → keep investigating training config.
- **NO-GO**: D vs A becomes negative → revert or rework inline slots integration for FAST builds.
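
The gates above can be applied mechanically once the means are recorded. This is a rough sketch under stated assumptions, not a repo script: `pct_delta` and `gate_verdict` are hypothetical names, Point A is assumed to be the run with both C5 and C6 inline slots OFF and Point D the run with both ON, and `MIN_BASELINE` is a placeholder threshold because the gates give no exact number for "recovers significantly". A D vs A gain between 0% and +1.0% is not covered by the gates and is treated here as NEUTRAL.

```bash
#!/usr/bin/env bash
# Sketch only: encodes the GO/NEUTRAL/NO-GO gates under the assumptions above.

# pct_delta CUR REF: percent change of CUR relative to REF, two decimals.
pct_delta() {
  awk -v cur="$1" -v ref="$2" 'BEGIN { printf "%.2f\n", 100 * (cur - ref) / ref }'
}

# gate_verdict BASELINE POINT_A POINT_D MIN_BASELINE
#   BASELINE      mean ops/s of the rebuilt FAST PGO binary (Step 2)
#   POINT_A       mean ops/s with C5/C6 inline slots both OFF (assumed mapping)
#   POINT_D       mean ops/s with C5/C6 inline slots both ON  (assumed mapping)
#   MIN_BASELINE  hypothetical "recovered" threshold; choose it per scorecard
gate_verdict() {
  awk -v base="$1" -v a="$2" -v d="$3" -v min="$4" 'BEGIN {
    delta = 100 * (d - a) / a
    if (delta < 0)                        print "NO-GO"    # D vs A negative
    else if (delta >= 1.0 && base >= min) print "GO"       # gain held, baseline recovered
    else                                  print "NEUTRAL"  # keep investigating
  }'
}
```

For example, `gate_verdict "$BASELINE" "$POINT_A" "$POINT_D" "$MIN_BASELINE"` prints one of the three verdicts, and `pct_delta "$POINT_D" "$POINT_A"` gives the D vs A percentage to record next to it.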