hakmem/docs/analysis/PHASE66_PGO_FAST_WITH_LTO_INSTRUCTIONS.md
Moe Charm (CI) 84f5034e45 Phase 68: PGO training set diversification (seed/WS expansion)
Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3)
  for reduced overfitting and better production workload representativeness
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 21:08:17 +09:00


Phase 66: PGO (FAST minimal, GCC+LTO) — Instructions

Goal

Use GCC PGO without changing the toolchain (keep GCC + -flto) to reduce layout tax and improve inline/layout decisions for the FAST minimal benchmark binary.

Principles (Box Theory)

  • No “link-out” pruning for performance (layout tax risk).
  • A/B must remain fair: same compiler/linker/LTO; only PGO profile differs.
  • Fail-fast: if profile collection fails, the pipeline aborts.
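The fairness rule can be sketched as a flag check. This is a minimal illustration, not the project's Makefile: the `-O2` level, the variable names, and the `check_ab_fair` helper are assumptions. `-flto`, `-fprofile-generate`, and `-fprofile-use` are the standard GCC options for LTO and PGO.

```shell
# Assumed flags for illustration only; the real values live in the project Makefile.
COMMON_FLAGS="-O2 -flto"

BASELINE_FLAGS="$COMMON_FLAGS"                        # non-PGO side of the A/B
PROFILE_GEN_FLAGS="$COMMON_FLAGS -fprofile-generate"  # instrumented build that emits .gcda
PGO_FLAGS="$COMMON_FLAGS -fprofile-use"               # optimized build that reads .gcda

# Hypothetical helper: the A/B is fair only if stripping the PGO flag
# from the PGO build gives back exactly the baseline flags.
check_ab_fair() {
  if [ "${PGO_FLAGS% -fprofile-use}" = "$BASELINE_FLAGS" ]; then
    echo "fair A/B: only the profile differs"
  else
    echo "unfair A/B: flags diverge beyond PGO" >&2
    return 1
  fi
}
```

Keeping the shared flags in a single variable means any change to the optimization level or LTO setting applies to both sides of the comparison.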

Workflow (Makefile SSOT)

Full pipeline

make pgo-fast-full

This runs:

  1. make pgo-fast-profile — builds profile-gen binaries (FAST minimal)
  2. make pgo-fast-collect — collects .gcda by running deterministic workloads
  3. make pgo-fast-build — builds PGO-optimized binary and renames it to bench_random_mixed_hakmem_minimal_pgo
  4. Runs Mixed 10-run with BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo

Manual steps (debug)

make pgo-fast-profile
make pgo-fast-collect
make pgo-fast-build
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh

Profile workloads (SSOT)

  • Config file: scripts/box/pgo_fast_profile_config.sh
  • Collector: scripts/box/pgo_tiny_profile_box.sh

The collector enforces a per-workload timeout and verifies .gcda generation.
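A minimal sketch of the two checks, assuming GNU coreutils `timeout` is available. The function name `run_workload_checked` and its arguments are hypothetical; this is not the content of scripts/box/pgo_tiny_profile_box.sh.

```shell
# Hypothetical sketch: run one training workload under a time limit,
# then confirm that profile data was written.
#   $1 = timeout in seconds
#   $2 = directory to search for .gcda files
#   $3... = workload command and its arguments
run_workload_checked() {
  local limit="$1" gcda_dir="$2"
  shift 2

  # `timeout` kills the command when the limit is hit and returns non-zero.
  if ! timeout "$limit" "$@"; then
    echo "FAIL: workload failed or timed out: $*" >&2
    return 1
  fi

  # Fail fast when the run left no .gcda file behind.
  if ! find "$gcda_dir" -name '*.gcda' | grep -q .; then
    echo "FAIL: no .gcda generated under $gcda_dir" >&2
    return 1
  fi
}
```

A caller would stop the whole collection on the first non-zero return, which matches the fail-fast principle above.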

Important:

  • PGO lives or dies by how closely the training workload matches the benchmark preset/ENV.
  • scripts/box/pgo_fast_profile_config.sh collects the profile via scripts/run_mixed_10_cleanenv.sh (to avoid a mismatch).

GO / NO-GO

  • GO: Mixed 10-run mean +1.0% or more vs bench_random_mixed_hakmem_minimal
  • NEUTRAL: within ±1.0% (exclusive) → keep as research target (do not promote)
  • NO-GO: -1.0% or worse → investigate profile mismatch / layout tax / workload coverage
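The thresholds above can be sketched as a small classifier. The function name `classify_pgo` and its two arguments (baseline mean and PGO mean, in the same unit) are assumptions for illustration; the project may compute the verdict differently.

```shell
# Hypothetical helper: classify a PGO result against the baseline mean.
#   $1 = baseline 10-run mean, $2 = PGO 10-run mean
# Prints the verdict followed by the relative change in percent.
classify_pgo() {
  awk -v base="$1" -v pgo="$2" 'BEGIN {
    delta = (pgo - base) / base * 100
    if (delta >= 1.0)        verdict = "GO"
    else if (delta <= -1.0)  verdict = "NO-GO"
    else                     verdict = "NEUTRAL"
    printf "%s %+.2f%%\n", verdict, delta
  }'
}
```

For example, `classify_pgo 100 102` reports a GO verdict, while `classify_pgo 100 100.5` falls in the NEUTRAL band.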