hakmem/docs/analysis/PHASE66_PGO_FAST_WITH_LTO_RESULTS.md
Moe Charm (CI) 84f5034e45 Phase 68: PGO training set diversification (seed/WS expansion)
Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3)
  for reduced overfitting and better production workload representativeness
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 21:08:17 +09:00


Phase 66: PGO (FAST minimal, GCC+LTO) — Results

TL;DR

PGO is GO: it achieved +6.58% (mean) on the BENCH_MINIMAL Mixed 10-run.

What changed

  • Makefile: added the pgo-fast-* PGO workflow (keeps GCC + -flto)
  • scripts/box/pgo_tiny_profile_box.sh: supports switching via PGO_CONFIG, and runs the workload through bash -lc
  • scripts/box/pgo_fast_profile_config.sh: representative PGO workloads for FAST minimal (assumes cleanenv)
  • Makefile: link bench_tiny_hot_hakmem with $(TINY_BENCH_OBJS) (resolves unresolved references under LTO)

A/B (Mixed 10-run, cleanenv)

Source of truth for measurement:

  • scripts/run_mixed_10_cleanenv.sh (ITERS=20000000 WS=400)
  • baseline: bench_random_mixed_hakmem_minimal
  • treatment: bench_random_mixed_hakmem_minimal_pgo

Results (n=10):

  • Baseline mean: 61.718839M ops/s / median: 61.672012M ops/s
  • PGO mean: 65.780056M ops/s / median: 66.227247M ops/s
  • Delta: +6.58% mean / +7.38% median

Verdict: GO (this is a build-level change, so +1.0% or more is sufficient)
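The deltas and the verdict above follow from simple arithmetic on the reported summary statistics. The sketch below recomputes them in Python. The function names are illustrative and not part of the repository, and treating everything below +1.0% as NO-GO is an assumption, since this document only states the GO threshold.

```python
# Summary statistics copied from the results above (M ops/s).
BASELINE = {"mean": 61.718839, "median": 61.672012}
PGO = {"mean": 65.780056, "median": 66.227247}

# GO threshold for build-level changes, as stated in the verdict.
GO_THRESHOLD_PCT = 1.0


def delta_pct(baseline: float, treatment: float) -> float:
    """Relative throughput change of treatment over baseline, in percent."""
    return (treatment - baseline) / baseline * 100.0


def verdict(delta: float, threshold: float = GO_THRESHOLD_PCT) -> str:
    # Assumption: anything below the GO threshold is labeled NO-GO here;
    # the document does not define an intermediate band.
    return "GO" if delta >= threshold else "NO-GO"


if __name__ == "__main__":
    for stat in ("mean", "median"):
        d = delta_pct(BASELINE[stat], PGO[stat])
        print(f"{stat}: {d:+.2f}% -> {verdict(d)}")
```

Because the inputs are already-rounded summary values, the last printed digit can differ slightly from the figures reported above.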

Key lesson (important)

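Profile mismatch can easily turn PGO into a NO-GO.

  • Bad example: collecting the profile by launching bench_random_mixed_hakmem directly
    • The preset/ENV does not match, so profile data recorded with FASTLANE_DIRECT and similar switches OFF gets mixed in
    • Result: PGO optimizes in the wrong direction and can cause a regression on the order of -5%
  • Good example (this Phase 66): collecting the profile via cleanenv
    • Use scripts/box/pgo_fast_profile_config.sh together with scripts/run_mixed_10_cleanenv.sh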
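One way to guard against this kind of mismatch is to compare the environment used for profile collection with the one used for measurement before trusting a profile. The following is a minimal sketch of that idea only. The variable names (EXAMPLE_FASTLANE_DIRECT, EXAMPLE_PRESET) and their values are hypothetical placeholders rather than the project's real settings, and the repository's cleanenv scripts may enforce consistency in a different way.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple


def env_mismatches(
    training: Mapping[str, str],
    measured: Mapping[str, str],
    keys: Iterable[str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (training_value, measured_value)} for every key that differs.

    A key that is unset on one side is reported with None for that side.
    """
    diffs: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for key in keys:
        a, b = training.get(key), measured.get(key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


if __name__ == "__main__":
    # Hypothetical variable names and values, for illustration only.
    keys = ["EXAMPLE_FASTLANE_DIRECT", "EXAMPLE_PRESET"]
    training = {"EXAMPLE_PRESET": "fast"}  # direct launch: switch left unset
    measured = {"EXAMPLE_FASTLANE_DIRECT": "1", "EXAMPLE_PRESET": "fast"}

    diffs = env_mismatches(training, measured, keys)
    if diffs:
        print("profile mismatch, do not use this profile:", diffs)
```

In practice the two mappings could be captured from os.environ at training time and at benchmark time, with the key list restricted to the switches that affect the hot path.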

How to reproduce

make pgo-fast-full

(For the manual steps, see docs/analysis/PHASE66_PGO_FAST_WITH_LTO_INSTRUCTIONS.md)