# Phase 66: PGO (FAST minimal, GCC+LTO) — Instructions
## Goal
Use GCC PGO **without changing the toolchain** (keep GCC + `-flto`) to reduce layout tax and improve inline/layout decisions for the FAST minimal benchmark binary.
## Principles (Box Theory)
- No “link-out” pruning for performance (layout tax risk).
- A/B must remain fair: same compiler/linker/LTO; only PGO profile differs.
- Fail-fast: profile collection failures abort.
## Workflow (Makefile SSOT)
### Full pipeline
```sh
make pgo-fast-full
```
This runs:
1. `make pgo-fast-profile` — builds profile-gen binaries (FAST minimal)
2. `make pgo-fast-collect` — collects `.gcda` by running deterministic workloads
3. `make pgo-fast-build` — builds PGO-optimized binary and renames it to `bench_random_mixed_hakmem_minimal_pgo`
4. Runs Mixed 10-run with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`
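
The Makefile itself is not shown here, so the exact flags behind each target are unknown. As a rough sketch, assuming the targets use GCC's standard `-fprofile-generate` and `-fprofile-use` options alongside `-flto`, the profile-gen and PGO-build stages differ only in the flags below. The helper name `pgo_stage_flags` is hypothetical and used only for illustration.

```sh
# Hypothetical helper: print the GCC flags that a PGO stage would add.
# Assumption: the real Makefile may use additional or different options.
pgo_stage_flags() {
    case "$1" in
        profile)
            # Instrumented build: running the binary writes .gcda files.
            echo "-flto -fprofile-generate"
            ;;
        build)
            # Optimized build: consumes the .gcda files collected earlier.
            echo "-flto -fprofile-use"
            ;;
        *)
            echo "unknown stage: $1" >&2
            return 1
            ;;
    esac
}
```

Because `-flto` appears in both stages, the compiler, linker, and LTO settings stay the same and only the profile input changes, which is what the fair A/B principle asks for.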
### Manual steps (debug)
```sh
make pgo-fast-profile
make pgo-fast-collect
make pgo-fast-build
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
```
## Profile workloads (SSOT)
- Config file: `scripts/box/pgo_fast_profile_config.sh`
- Collector: `scripts/box/pgo_tiny_profile_box.sh`
The collector enforces a per-workload timeout and verifies `.gcda` generation.
Important:
- For PGO, **matching the training workload to the benchmark preset/ENV** is critical.
- `scripts/box/pgo_fast_profile_config.sh` collects the profile by going through `scripts/run_mixed_10_cleanenv.sh` (this avoids a mismatch).
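
The collector's two checks (per-workload timeout, then `.gcda` verification) can be sketched as follows. This is not the contents of `pgo_tiny_profile_box.sh`. The function names, the `WORKLOAD_TIMEOUT` variable, and its 60-second default are all assumptions. The sketch relies on the `timeout` utility from GNU coreutils.

```sh
# Hypothetical sketch of a fail-fast profile collector.
# Assumed default timeout in seconds; the real collector's value is not shown.
WORKLOAD_TIMEOUT="${WORKLOAD_TIMEOUT:-60}"

# Run one training workload; fail if it exits non-zero or exceeds the timeout.
run_workload() {
    if ! timeout "$WORKLOAD_TIMEOUT" "$@"; then
        echo "workload failed or timed out: $*" >&2
        return 1
    fi
}

# Fail unless at least one .gcda file exists under directory $1.
verify_gcda() {
    first=$(find "$1" -name '*.gcda' | head -n 1)
    if [ -z "$first" ]; then
        echo "no .gcda files found under $1" >&2
        return 1
    fi
}
```

A caller would chain the two, for example `run_workload ./some_profile_gen_binary && verify_gcda .`, and abort the pipeline on the first non-zero status. The binary name here is a placeholder.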
## GO / NO-GO
- GO: Mixed 10-run mean **+1.0%** or more vs `bench_random_mixed_hakmem_minimal`
- NEUTRAL: within ±1.0% → keep as research target (do not promote)
- NO-GO: -1.0% or worse → investigate profile mismatch / layout tax / workload coverage
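
The thresholds above can be applied mechanically. The sketch below assumes the 10-run means are throughput-style numbers where higher is better, and that they are already available as two plain values. The function name `classify_pgo` is hypothetical.

```sh
# Hypothetical helper: classify a PGO result against the baseline.
# Usage: classify_pgo BASELINE_MEAN PGO_MEAN
# Assumption: higher mean = better (e.g. operations per second).
classify_pgo() {
    awk -v base="$1" -v pgo="$2" 'BEGIN {
        # Relative change of the PGO binary vs the baseline, in percent.
        delta = (pgo - base) / base * 100.0
        if (delta >= 1.0)       verdict = "GO"
        else if (delta <= -1.0) verdict = "NO-GO"
        else                    verdict = "NEUTRAL"
        printf "%s %+.2f%%\n", verdict, delta
    }'
}
```

Here the baseline is the mean for `bench_random_mixed_hakmem_minimal` and the second argument is the mean for `bench_random_mixed_hakmem_minimal_pgo`. The function prints the verdict followed by the signed percentage change.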