Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3) for reduced overfitting and better production workload representativeness
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV

🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Phase 66: PGO (FAST minimal, GCC+LTO) — Results
TL;DR
PGO is a GO. On the BENCH_MINIMAL Mixed 10-run benchmark it achieved +6.58% (mean).
What changed
- Makefile: added the pgo-fast-* PGO workflow (keeps GCC + -flto)
- scripts/box/pgo_tiny_profile_box.sh: supports switching via PGO_CONFIG, and runs the workload through bash -lc
- scripts/box/pgo_fast_profile_config.sh: representative PGO workload for FAST minimal (assumes cleanenv)
- Makefile: links bench_tiny_hot_hakmem against $(TINY_BENCH_OBJS) (resolves unresolved references under LTO)
A/B (Mixed 10-run, cleanenv)
Authoritative measurement: scripts/run_mixed_10_cleanenv.sh (ITERS=20000000 WS=400)
- baseline: bench_random_mixed_hakmem_minimal
- treatment: bench_random_mixed_hakmem_minimal_pgo
Results (n=10):
- Baseline mean: 61.718839M ops/s / median: 61.672012M ops/s
- PGO mean: 65.780056M ops/s / median: 66.227247M ops/s
- Delta: +6.58% mean / +7.38% median
Verdict: ✅ GO (this is a build-level change, so +1.0% or more is sufficient)
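The delta and verdict above follow from simple arithmetic on the two means. A minimal sketch of that calculation is shown here; the helper names pct_delta and verdict are hypothetical and are not part of the repository's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the repository) that reproduce the
# delta arithmetic and the +1.0% GO threshold described above.

# Percentage change of treatment relative to baseline, signed, two decimals.
pct_delta() {
  awk -v base="$1" -v treat="$2" \
    'BEGIN { printf "%+.2f", (treat - base) / base * 100 }'
}

# Prints GO when the delta (percent) meets the threshold, NO-GO otherwise.
verdict() {
  awk -v d="$1" -v t="$2" \
    'BEGIN { print ((d + 0 >= t + 0) ? "GO" : "NO-GO") }'
}

# Mean throughput in M ops/s, taken from the results list above.
mean_delta=$(pct_delta 61.718839 65.780056)
echo "mean delta: ${mean_delta}% -> $(verdict "$mean_delta" 1.0)"
```

The same functions can be pointed at the medians, or at any later phase's baseline and treatment numbers.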
Key lesson (important)
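PGO easily turns into a NO-GO when the profile does not match the configuration being measured.
- Bad example: collecting the profile by launching bench_random_mixed_hakmem directly
  - The preset/ENV does not match, so profiles with FASTLANE_DIRECT and similar settings OFF get mixed in
  - Result: PGO optimizes in the wrong direction and can cause a regression on the order of -5%
- Good example (this Phase 66): collecting the profile via cleanenv
  - scripts/box/pgo_fast_profile_config.sh uses scripts/run_mixed_10_cleanenv.sh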
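One generic way to keep stray settings out of a profiling run is to start the workload from an emptied environment and pass every setting explicitly. The sketch below illustrates that idea only; it does not claim to show what scripts/run_mixed_10_cleanenv.sh actually does, and run_cleanenv, DEMO_STRAY_FLAG and DEMO_PRESET are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a "clean environment" launcher.
# env -i starts the child with an empty environment; only PATH and HOME
# are passed through, and anything else must be given as NAME=value
# arguments in front of the command.
run_cleanenv() {
  env -i PATH="$PATH" HOME="${HOME:-/}" "$@"
}

# A stray setting left over in the calling shell (hypothetical variable).
export DEMO_STRAY_FLAG=0

# The child sees only what was passed explicitly, so the stray flag
# cannot influence which code paths end up in the profile.
run_cleanenv DEMO_PRESET=fast \
  sh -c 'echo "stray=${DEMO_STRAY_FLAG:-unset} preset=${DEMO_PRESET}"'
```

Using the same launcher for both profile collection and the A/B measurement is what keeps the two configurations aligned.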
How to reproduce
make pgo-fast-full
(For the manual steps, see docs/analysis/PHASE66_PGO_FAST_WITH_LTO_INSTRUCTIONS.md)
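For orientation, a generic GCC PGO + LTO cycle has three phases: an instrumented build, a run of the representative workload, and a rebuild that consumes the recorded profile. The sketch below only prints such a command sequence for a single hypothetical source file; it is not the project's Makefile logic, and pgo_plan, demo.c and the workload arguments are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commands for one GCC PGO + LTO cycle.
# Usage: pgo_plan <source.c> <binary> [workload args...]
pgo_plan() {
  local src="$1" bin="$2"
  shift 2
  local obj="${src%.c}.o"
  local common="-O2 -flto"

  # Phase 1: instrumented build. Both the compile and the link step
  # carry -fprofile-generate.
  echo "gcc $common -fprofile-generate -c $src -o $obj"
  echo "gcc $common -fprofile-generate $obj -o ${bin}_gen"

  # Phase 2: run the representative workload; the instrumented binary
  # writes its .gcda profile data when it exits.
  echo "./${bin}_gen $*"

  # Phase 3: rebuild with the same object name so GCC can locate the
  # matching profile, this time with -fprofile-use.
  echo "gcc $common -fprofile-use -c $src -o $obj"
  echo "gcc $common -fprofile-use $obj -o $bin"
}

# Example plan for a toy program (file name and arguments are hypothetical).
pgo_plan demo.c demo 20000000 400
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; the output can be reviewed and then piped to sh on a machine with GCC installed.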