# Phase 66: PGO (FAST minimal, GCC+LTO) — Instructions

## Goal

Use GCC PGO **without changing the toolchain** (keep GCC + `-flto`) to reduce layout tax and improve inline/layout decisions for the FAST minimal benchmark binary.

## Principles (Box Theory)

- No "link-out" pruning for performance (layout tax risk).
- A/B must remain fair: same compiler/linker/LTO; only the PGO profile differs.
- Fail-fast: profile collection failures abort the pipeline.

## Workflow (Makefile SSOT)

### Full pipeline

```sh
make pgo-fast-full
```

This runs:

1. `make pgo-fast-profile`: builds the profile-gen binaries (FAST minimal)
2. `make pgo-fast-collect`: collects `.gcda` files by running deterministic workloads
3. `make pgo-fast-build`: builds the PGO-optimized binary and renames it to `bench_random_mixed_hakmem_minimal_pgo`
4. Runs the Mixed 10-run with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`

### Manual steps (debug)

```sh
make pgo-fast-profile
make pgo-fast-collect
make pgo-fast-build
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
```

## Profile workloads (SSOT)

- Config file: `scripts/box/pgo_fast_profile_config.sh`
- Collector: `scripts/box/pgo_tiny_profile_box.sh`

The collector enforces a per-workload timeout and verifies that `.gcda` files were generated.

Important:

- PGO lives or dies by the **match between the training workload and the benchmark preset/ENV**.
- `scripts/box/pgo_fast_profile_config.sh` collects the profile via `scripts/run_mixed_10_cleanenv.sh` (to avoid a mismatch).

## GO / NO-GO

- GO: Mixed 10-run mean **+1.0%** or more vs `bench_random_mixed_hakmem_minimal`
- NEUTRAL: within ±1.0% → keep as a research target (do not promote)
- NO-GO: -1.0% or worse → investigate profile mismatch / layout tax / workload coverage
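
The GO / NO-GO thresholds can be sketched as a small shell helper. This is an illustrative sketch only, not part of the repository: the function name `pgo_verdict` is hypothetical, and it assumes the two Mixed 10-run means are throughput values (higher is better) that you have already extracted by hand. A delta of exactly ±1.0% is treated as GO / NO-GO respectively.

```sh
#!/bin/sh
# Hypothetical helper (not in the repo): classify a PGO A/B result.
# Usage: pgo_verdict BASELINE_MEAN PGO_MEAN
# Assumption: both means are throughput values, so higher is better.
pgo_verdict() {
    awk -v base="$1" -v pgo="$2" 'BEGIN {
        if (base <= 0) {
            print "ERROR: baseline mean must be positive"
            exit 2
        }
        # Relative change of the PGO binary vs the baseline, in percent.
        delta = (pgo - base) * 100.0 / base
        if (delta >= 1.0)       verdict = "GO"
        else if (delta <= -1.0) verdict = "NO-GO"
        else                    verdict = "NEUTRAL"
        printf "%s (%+.2f%%)\n", verdict, delta
    }'
}
```

For example, `pgo_verdict 100 102` reports `GO (+2.00%)`, while `pgo_verdict 100 100.5` reports `NEUTRAL (+0.50%)`. Keeping the rule in one place means the baseline and PGO runs are always judged against the same thresholds.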