Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3) for reduced overfitting and better production workload representativeness
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV

🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Phase 66: PGO (FAST minimal, GCC+LTO) — Instructions
Goal
Use GCC PGO without changing the toolchain (keep GCC + -flto) to reduce layout tax and improve inline/layout decisions for the FAST minimal benchmark binary.
Principles (Box Theory)
- No “link-out” pruning for performance (layout tax risk).
- A/B must remain fair: same compiler/linker/LTO; only PGO profile differs.
- Fail-fast: profile collection failures abort.
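To make the fairness principle concrete, here is a minimal sketch of how the three build stages could differ only in their PGO flags. `-flto`, `-fprofile-generate`, and `-fprofile-use` are standard GCC options; the `-O2` level, the `BASE_FLAGS` variable, and the `pgo_flags` helper are assumptions for illustration, not the project's actual Makefile settings.

```shell
# Hypothetical flag sets; the real values live in the project's Makefile.
BASE_FLAGS="-O2 -flto"   # assumed optimization level; -flto is kept in every stage

# pgo_flags STAGE: print the compiler flags for one build stage.
pgo_flags() {
  case "$1" in
    baseline)  echo "$BASE_FLAGS" ;;                     # A side: no profile
    profile)   echo "$BASE_FLAGS -fprofile-generate" ;;  # instrumented build, writes .gcda
    optimized) echo "$BASE_FLAGS -fprofile-use" ;;       # B side: same flags plus the profile
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}
```

Under this sketch, the baseline and optimized flag sets differ by exactly one option, which is what keeps the A/B comparison attributable to the profile alone.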
Workflow (Makefile SSOT)
Full pipeline
make pgo-fast-full
This runs:
- make pgo-fast-profile: builds profile-gen binaries (FAST minimal)
- make pgo-fast-collect: collects .gcda by running deterministic workloads
- make pgo-fast-build: builds the PGO-optimized binary and renames it to bench_random_mixed_hakmem_minimal_pgo
- Runs Mixed 10-run with BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo
Manual steps (debug)
make pgo-fast-profile
make pgo-fast-collect
make pgo-fast-build
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
Profile workloads (SSOT)
- Config file: scripts/box/pgo_fast_profile_config.sh
- Collector: scripts/box/pgo_tiny_profile_box.sh
The collector enforces a per-workload timeout and verifies .gcda generation.
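The timeout and .gcda check described above could look roughly like the sketch below. This is not the contents of scripts/box/pgo_tiny_profile_box.sh; the `run_workload` function, its arguments, and the messages are hypothetical. It relies on `timeout` from GNU coreutils and on `find`.

```shell
# Hypothetical sketch of a fail-fast collector step (not the real collector script).
# run_workload TIMEOUT_SECS PROFILE_DIR CMD...
run_workload() {
  local timeout_secs=$1 profile_dir=$2
  shift 2
  # `timeout` exits non-zero (124) when the command overruns its budget.
  if ! timeout "$timeout_secs" "$@"; then
    echo "FAIL: workload failed or timed out: $*" >&2
    return 1
  fi
  # An instrumented binary writes .gcda files on normal exit; none means no profile data.
  if [ -z "$(find "$profile_dir" -name '*.gcda' -print -quit)" ]; then
    echo "FAIL: no .gcda generated under $profile_dir" >&2
    return 1
  fi
  echo "OK: $*"
}
```

A caller running under `set -e` would abort on the first failing workload, which matches the fail-fast principle.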
Important:
- For PGO, the training workload must match the benchmark preset/ENV; this match is critical.
- scripts/box/pgo_fast_profile_config.sh collects the profile via scripts/run_mixed_10_cleanenv.sh to avoid a mismatch.
GO / NO-GO
- GO: Mixed 10-run mean +1.0% or more vs bench_random_mixed_hakmem_minimal
- NEUTRAL: ±1.0% → keep as research target (do not promote)
- NO-GO: -1.0% or worse → investigate profile mismatch / layout tax / workload coverage
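The thresholds above can be applied mechanically. The sketch below assumes the two inputs are 10-run mean throughputs where higher is better; the `classify_delta` name and its output format are illustrative only and not part of the repository.

```shell
# Hypothetical helper: classify a PGO result against the baseline.
# classify_delta BASELINE_MEAN CANDIDATE_MEAN  (throughput, higher is better)
classify_delta() {
  awk -v base="$1" -v cand="$2" 'BEGIN {
    delta = (cand - base) / base * 100.0          # relative change in percent
    if (delta >= 1.0)       verdict = "GO"        # +1.0% or more
    else if (delta <= -1.0) verdict = "NO-GO"     # -1.0% or worse
    else                    verdict = "NEUTRAL"   # strictly inside the band
    printf "%s %+.2f%%\n", verdict, delta
  }'
}
```

For example, `classify_delta 100 100.5` reports NEUTRAL, so under these rules that result would stay a research target rather than being promoted.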