Phase 75: record FAST PGO rebase and add PGO regeneration instructions

Moe Charm (CI)
2025-12-18 09:32:43 +09:00
parent 3dbf4acb48
commit e51231471b
3 changed files with 128 additions and 2 deletions


@@ -3,12 +3,14 @@
 ## 0) Current source of truth (SSOT)
 - **Canonical for performance comparison**: FAST PGO build (`make pgo-fast-full`, `bench_random_mixed_hakmem_minimal_pgo`), **WarmPool=16**
-- Phase 75 (C5/C6 inline slots) has been promoted to presets, but **the re-measurement (rebase) on FAST PGO has not been done yet** (the Standard A/B is +5.41%).
+- Phase 75 (C5/C6 inline slots) has been promoted to presets.
+- Phase 75-4 ran the FAST PGO rebase and confirmed **C5+C6=ON at +3.16% (GO)** (however, **the FAST PGO baseline itself appears to have regressed substantially since Phase 69**, so Phase 75-5 needs to regenerate the PGO profile).
 - **Canonical for safety/compatibility**: Standard build (`make bench_random_mixed_hakmem`)
 - **Canonical for observation**: OBSERVE build (`make perf_observe`)
 - **Scorecard (targets / current values)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
 - **FAST baseline (SSOT)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is canonical (Phase 69: 62.63M ops/s = 51.77% of mimalloc)
-- **Phase 75 measurement**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (non-PGO / separate target) (Phase 75-3 4-point matrix). Carrying this over to FAST PGO (rebase) is handled separately.
+- **Phase 75 measurement (Standard)**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (Phase 75-3 4-point matrix)
+- **Phase 75 measurement (FAST PGO)**: confirmed **A/B +3.16%** on `bench_random_mixed_hakmem_minimal_pgo` (Phase 75-4 4-point matrix)
 - Next target: **M2 = 55%** (judge the gap against the FAST baseline)
 - **Mixed 10-run (SSOT harness)**: `scripts/run_mixed_10_cleanenv.sh`
 - Default: `BENCH_BIN=./bench_random_mixed_hakmem` (Standard)
@@ -206,6 +208,23 @@ Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):
 **Phase 75 Complete**: C5+C6 inline slots (129-256B) deliver a proven +5.41% gain **on the Standard binary** (`bench_random_mixed_hakmem`).
 - Before updating the FAST PGO baseline (scorecard), re-measure **the A/B under identical conditions (C5/C6 OFF/ON)** with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`.
+### Phase 75-4 (FAST PGO rebase) ✅ Done
+- Result: **+3.16% (GO)** (4-point matrix, after excluding outliers)
+- Details: `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`
+- Important: compared with the Phase 69 FAST baseline (62.63M), **the current FAST PGO baseline appears to be substantially lower** (suspected: PGO profile staleness / training mismatch / build drift)
+### Phase 75-5 (PGO regeneration) 🟥 Next active (HIGH PRIORITY)
+Goal:
+- Regenerate the PGO training profile against the current code, which includes C5/C6 inline slots, and recover a Phase 69-class FAST baseline
+Outline of steps:
+1. Run PGO training on the premise of “C5/C6=ON” (always set `HAKMEM_TINY_C5_INLINE_SLOTS=1` / `HAKMEM_TINY_C6_INLINE_SLOTS=1` during training)
+2. Regenerate `bench_random_mixed_hakmem_minimal_pgo` with `make pgo-fast-full`
+3. Re-measure the 10-run baseline, then re-measure Phase 75-4 Point A/D
+4. If layout tax / drift is suspected, classify the cause with `scripts/box/layout_tax_forensics_box.sh`
 **References**:
 - 4-point matrix results: `docs/analysis/PHASE75_3_C5_C6_INTERACTION_RESULTS.md`
 - Test script: `scripts/phase75_3_matrix_test.sh`


@@ -0,0 +1,103 @@
# Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) — Next Instructions
**Status**: NEXT (HIGH PRIORITY)
## Goal
Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:
- `HAKMEM_WARM_POOL_SIZE=16`
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
This is required because Phase 75-4 observed a large gap (about 14%) between:
- **Phase 69 historical FAST baseline** (62.63M ops/s)
- **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)
## SSOT Rules
- Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
- Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
- Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).
## Step 1: Prepare training commands (C5/C6 ON)
Pick one of these approaches (A is preferred):
### A) Training uses the harness (preferred)
Ensure the training workload exports the correct knobs:
```bash
export HAKMEM_WARM_POOL_SIZE=16
export HAKMEM_TINY_C5_INLINE_SLOTS=1
export HAKMEM_TINY_C6_INLINE_SLOTS=1
```
Then run the existing PGO training target (repo-specific; example):
```bash
make pgo-fast-full
```
### B) Hard-pin knobs inside PGO training config (if needed)
If the training driver does not inherit ENV cleanly, update the PGO training config script to include:
- `HAKMEM_WARM_POOL_SIZE=16`
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
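One way to hard-pin the knobs is to wrap the training command in `env`, so the values no longer depend on what the calling shell exported. The sketch below is only an illustration: `run_with_pinned_knobs` is a hypothetical helper name, and where it should be called from depends on how the repo's PGO training script is organized, which this document does not specify.

```bash
# Hypothetical helper: run any command with the three promoted knobs
# pinned, regardless of the caller's environment.
run_with_pinned_knobs() {
  env \
    HAKMEM_WARM_POOL_SIZE=16 \
    HAKMEM_TINY_C5_INLINE_SLOTS=1 \
    HAKMEM_TINY_C6_INLINE_SLOTS=1 \
    "$@"
}

# Usage (the training command itself is repo-specific):
#   run_with_pinned_knobs <training command and its arguments>
```

Because `env` sets the variables only for the child process, the pinned values do not leak into the rest of the shell session.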
## Step 2: Validate the rebuilt binary
Run Mixed SSOT 10-run on FAST PGO:
```bash
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
```
Record mean/median/CV and update the scorecard baseline if improved.
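If the harness does not already print these statistics, they can be derived from the per-run throughput values. The following is a minimal sketch, assuming the ten results have been extracted to one number per line; the output format of `scripts/run_mixed_10_cleanenv.sh` is not described here, so the extraction step is left out, and `summarize_runs` is a hypothetical name. CV is computed as the sample standard deviation divided by the mean.

```bash
# Hypothetical helper: read one throughput value per line on stdin and
# print mean, median, and coefficient of variation (sample stddev / mean).
summarize_runs() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no samples" > "/dev/stderr"; exit 1 }
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      # Input is sorted, so the median is the middle element (odd count)
      # or the average of the two middle elements (even count).
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "mean=%.2f median=%.2f cv=%.2f%%\n", mean, median, 100 * sd / mean
    }'
}
```

Feeding the same kind of list for every measured binary keeps the scorecard entries comparable.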
## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)
Run the 4-point matrix on FAST PGO to:
- confirm Point D > Point A
- quantify additivity (B/C contributions)
```bash
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh
```
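Once the four means are in hand, additivity can be quantified by comparing D's gain over A with the sum of the B and C gains. The sketch below assumes, based only on the order of the commands above, that B is C5-only and C is C6-only; `matrix_gains` is a hypothetical helper and takes the four mean throughputs as arguments.

```bash
# Hypothetical helper: args are the mean throughputs of points A, B, C, D
# (A = both OFF, B = C5 only, C = C6 only, D = both ON; labels assumed).
# Prints each gain relative to A and the interaction term D - (B + C).
matrix_gains() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    gb = 100 * (b / a - 1)
    gc = 100 * (c / a - 1)
    gd = 100 * (d / a - 1)
    printf "B=%+.2f%% C=%+.2f%% D=%+.2f%% interaction=%+.2f%%\n", gb, gc, gd, gd - (gb + gc)
  }'
}
```

An interaction near zero suggests the two knobs are roughly additive; a clearly positive or negative value suggests they reinforce or interfere with each other.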
## Step 4: If regression persists, do layout tax forensics
Use:
```bash
./scripts/box/layout_tax_forensics_box.sh \
./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
./bench_random_mixed_hakmem_minimal_pgo
```
Then classify:
- IPC drop (>3%) → text layout / inlining / code placement issue
- branch-miss spike (>10%) → hint mismatch / control-flow reshaping
- cache/dTLB spike → data layout / TLS bloat / spill
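The classification above can be applied mechanically once the deltas between the two binaries are known. This is a rough sketch, not the logic of `layout_tax_forensics_box.sh`: `classify_layout_tax` is a hypothetical helper, the inputs are percentage changes the reader has already computed, and because no numeric threshold is given for cache/dTLB spikes, the 10% default used here is purely an assumption that can be overridden with a fourth argument.

```bash
# Hypothetical helper.
# args: ipc_drop_pct branch_miss_rise_pct cache_rise_pct [cache_threshold_pct]
# The 3% and 10% thresholds come from the list above; the cache/dTLB
# threshold (default 10) is an assumption, not from the source.
classify_layout_tax() {
  awk -v ipc="$1" -v br="$2" -v cache="$3" -v cth="${4:-10}" 'BEGIN {
    n = 0
    if (ipc > 3)     { print "text layout / inlining / code placement"; n++ }
    if (br > 10)     { print "hint mismatch / control-flow reshaping"; n++ }
    if (cache > cth) { print "data layout / TLS bloat / spill"; n++ }
    if (n == 0) print "no threshold exceeded"
  }'
}
```

More than one line may be printed, since the symptoms are not mutually exclusive.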
## GO/NO-GO Gates
- **GO**: FAST PGO baseline recovers significantly (target: close to the Phase 69 level of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
- **NEUTRAL**: D vs A stays positive but baseline still low → keep investigating training config.
- **NO-GO**: D vs A becomes negative → revert or rework inline slots integration for FAST builds.
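The gates can be written down as a small decision function. This is a sketch under stated assumptions: whether the baseline has "recovered" is left as a human yes/no judgment because no numeric cutoff is given above, `phase75_gate` is a hypothetical name, and the case of a recovered baseline with a gain between 0 and +1.0% is not covered by the gates, so the sketch treats it as NEUTRAL.

```bash
# Hypothetical helper.
# args: d_vs_a_pct baseline_recovered(yes|no)
phase75_gate() {
  awk -v gain="$1" -v rec="$2" 'BEGIN {
    if (gain < 0)                         print "NO-GO"
    else if (gain >= 1.0 && rec == "yes") print "GO"
    else                                  print "NEUTRAL"  # includes cases the gates leave undefined
  }'
}
```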


@@ -24,6 +24,10 @@
 - `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`
 - and toggle `HAKMEM_TINY_C5_INLINE_SLOTS` / `HAKMEM_TINY_C6_INLINE_SLOTS`.
+**Update**:
+- Phase 75-4 completed the FAST PGO rebase and confirmed **+3.16% (GO)** on FAST PGO via a 4-point matrix A/B.
+- See `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`.
 ---
 ## Phase 75 Journey