Phase 75: record FAST PGO rebase and add PGO regeneration instructions
@ -3,12 +3,14 @@
|
|||||||
## 0) Current source of truth (SSOT)
|
||||||
|
|
||||||
- **Performance comparison SSOT**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) + **WarmPool=16**
|
||||||
- Phase 75 (C5/C6 inline slots) has been promoted to presets, but **re-measurement on FAST PGO (rebase) has not been done yet** (the A/B on Standard was +5.41%).
|
- Phase 75 (C5/C6 inline slots) has been promoted to presets
|
||||||
|
- Phase 75-4 performed the FAST PGO rebase and confirmed **C5+C6=ON at +3.16% (GO)** (however, the **FAST PGO baseline itself is suspected to have regressed substantially since Phase 69** → PGO regeneration is needed in Phase 75-5)
|
||||||
- **Safety/compatibility SSOT**: Standard build (`make bench_random_mixed_hakmem`)
|
||||||
- **Observation SSOT**: OBSERVE build (`make perf_observe`)
|
||||||
- **Scorecard (targets / current values)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
|
||||||
- **FAST baseline (SSOT)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is authoritative (Phase 69: 62.63M ops/s = 51.77% of mimalloc)
|
||||||
- **Phase 75 measurement**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (non-PGO, separate target) (Phase 75-3 4-point matrix). Carrying the result over to FAST PGO (rebase) is a separate task.
|
- **Phase 75 measurement (Standard)**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (Phase 75-3 4-point matrix)
|
||||||
|
- **Phase 75 measurement (FAST PGO)**: confirmed **A/B +3.16%** on `bench_random_mixed_hakmem_minimal_pgo` (Phase 75-4 4-point matrix)
|
||||||
- Next target: **M2 = 55%** (judge the gap against the FAST baseline)
|
||||||
- **Mixed 10-run SSOT (harness)**: `scripts/run_mixed_10_cleanenv.sh`
|
||||||
- Default: `BENCH_BIN=./bench_random_mixed_hakmem` (Standard)
|
||||||
@ -206,6 +208,23 @@ Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):
|
|||||||
**Phase 75 Complete**: C5+C6 inline slots (129-256B) deliver a proven +5.41% gain **on the Standard binary** (`bench_random_mixed_hakmem`).
|
||||||
- Before updating the FAST PGO baseline (scorecard), re-measure **the A/B under identical conditions (C5/C6 OFF/ON)** with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`.
|
||||||
|
|
||||||
|
### Phase 75-4 (FAST PGO rebase) ✅ Done
|
||||||
|
|
||||||
|
- Result: **+3.16% (GO)** (4-point matrix, after excluding outliers)
|
||||||
|
- Details: `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`
|
||||||
|
- Important: compared with the Phase 69 FAST baseline (62.63M), the **current FAST PGO baseline is suspected to be substantially lower** (PGO profile staleness / training mismatch / build drift)
|
||||||
|
|
||||||
|
### Phase 75-5 (PGO regeneration) 🟥 Next active (HIGH PRIORITY)
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
- Regenerate the PGO training profile against the current code, including C5/C6 inline slots, and recover a Phase 69-class FAST baseline.
|
||||||
|
|
||||||
|
Procedure (outline):
|
||||||
|
1. Run PGO training with C5/C6=ON (always set `HAKMEM_TINY_C5_INLINE_SLOTS=1` / `HAKMEM_TINY_C6_INLINE_SLOTS=1` during training)
|
||||||
|
2. Regenerate `bench_random_mixed_hakmem_minimal_pgo` with `make pgo-fast-full`
|
||||||
|
3. Re-measure the baseline with a 10-run, then re-measure Points A/D from Phase 75-4
|
||||||
|
4. If layout tax / drift is suspected, classify the cause with `scripts/box/layout_tax_forensics_box.sh`
|
||||||
|
|
||||||
**Reference**:
|
||||||
- 4-point matrix results: `docs/analysis/PHASE75_3_C5_C6_INTERACTION_RESULTS.md`
|
||||||
- Test script: `scripts/phase75_3_matrix_test.sh`
|
||||||
|
|||||||
103
docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
Normal file
@ -0,0 +1,103 @@
|
|||||||
|
# Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) — Next Instructions
|
||||||
|
|
||||||
|
**Status**: NEXT (HIGH PRIORITY)
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:
|
||||||
|
- `HAKMEM_WARM_POOL_SIZE=16`
|
||||||
|
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
|
||||||
|
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
|
||||||
|
|
||||||
|
This is required because Phase 75-4 observed a large gap between:
|
||||||
|
- **Phase 69 historical FAST baseline** (62.63M ops/s)
|
||||||
|
- **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)
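
To put the two baselines above on one scale, the relative gap can be computed directly. This is a minimal sketch; `relative_gap_pct` is a hypothetical helper for illustration, not a script in this repository.

```shell
#!/usr/bin/env bash
# relative_gap_pct (hypothetical helper): signed change of a current value
# relative to a reference value, in percent.
# $1 = reference throughput, $2 = current throughput (same unit)
relative_gap_pct() {
  awk -v ref="$1" -v cur="$2" 'BEGIN { printf "%+.2f%%\n", (cur - ref) / ref * 100 }'
}

# Phase 69 FAST baseline vs Phase 75-4 Point A, both in M ops/s.
relative_gap_pct 62.63 53.81
```

With the figures quoted above, Point A sits roughly 14% below the Phase 69 baseline, which is much larger than the +3.16% effect being measured.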
|
||||||
|
|
||||||
|
## SSOT Rules
|
||||||
|
|
||||||
|
- Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
|
||||||
|
- Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
|
||||||
|
- Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).
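
One way to enforce the pinning rule is a small guard that refuses to proceed when `BENCH_BIN` is unset. This is a sketch only; `require_bench_bin` is a hypothetical helper, and whether the harness already applies its own default is not changed by it.

```shell
#!/usr/bin/env bash
# require_bench_bin (hypothetical helper): fail early instead of silently
# falling back to a default binary when BENCH_BIN is not pinned.
require_bench_bin() {
  if [ -z "${BENCH_BIN:-}" ]; then
    echo "BENCH_BIN is not set; pin the binary explicitly" >&2
    return 1
  fi
  echo "using BENCH_BIN=$BENCH_BIN"
}

# Example use before a measurement (left as a comment so the sketch has no side effects):
#   require_bench_bin && scripts/run_mixed_10_cleanenv.sh
```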
|
||||||
|
|
||||||
|
## Step 1: Prepare training commands (C5/C6 ON)
|
||||||
|
|
||||||
|
Pick one of these approaches (A is preferred):
|
||||||
|
|
||||||
|
### A) Training uses the harness (preferred)
|
||||||
|
|
||||||
|
Ensure the training workload exports the correct knobs:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
export HAKMEM_WARM_POOL_SIZE=16
|
||||||
|
export HAKMEM_TINY_C5_INLINE_SLOTS=1
|
||||||
|
export HAKMEM_TINY_C6_INLINE_SLOTS=1
|
||||||
|
```
|
||||||
|
|
||||||
|
Then run the existing PGO training target (repo-specific; example):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make pgo-fast-full
|
||||||
|
```
|
||||||
|
|
||||||
|
### B) Hard-pin knobs inside PGO training config (if needed)
|
||||||
|
|
||||||
|
If the training driver does not inherit ENV cleanly, update the PGO training config script to include:
|
||||||
|
- `HAKMEM_WARM_POOL_SIZE=16`
|
||||||
|
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
|
||||||
|
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
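
If editing the training config is not convenient, the same three knobs can be pinned per command with `env`. The wrapper name `with_promoted_knobs` is hypothetical, and whether the training driver behind `make pgo-fast-full` passes these variables through to the benchmark is an assumption that should be checked.

```shell
#!/usr/bin/env bash
# with_promoted_knobs (hypothetical helper): run any command with the
# promoted defaults pinned in its environment, regardless of the caller's shell state.
with_promoted_knobs() {
  env HAKMEM_WARM_POOL_SIZE=16 \
      HAKMEM_TINY_C5_INLINE_SLOTS=1 \
      HAKMEM_TINY_C6_INLINE_SLOTS=1 \
      "$@"
}

# Example use (comment only):
#   with_promoted_knobs make pgo-fast-full
```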
|
||||||
|
|
||||||
|
## Step 2: Validate the rebuilt binary
|
||||||
|
|
||||||
|
Run Mixed SSOT 10-run on FAST PGO:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Record mean/median/CV and update the scorecard baseline if improved.
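
If the per-run throughput values are available one per line, mean, median, and CV can be computed as sketched here. `summarize_runs` is a hypothetical helper; the harness output format is not assumed, and CV is taken as sample standard deviation (n-1) divided by the mean, which is an assumption about the intended definition.

```shell
#!/usr/bin/env bash
# summarize_runs (hypothetical helper): read one numeric value per line on
# stdin and print mean, median, and CV (sample stddev / mean, in percent).
summarize_runs() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no data"; exit 1 }
      mean = sum / NR
      # Input is sorted, so the median is the middle element (or the mean of the two middle ones).
      if (NR % 2) { median = v[(NR + 1) / 2] } else { median = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
      for (i = 1; i <= NR; i++) { ss += (v[i] - mean) ^ 2 }
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      printf "mean=%.2f median=%.2f cv=%.2f%%\n", mean, median, sd / mean * 100
    }'
}
```

Usage: pipe the ten per-run values into `summarize_runs` and copy the resulting line into the scorecard entry.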
|
||||||
|
|
||||||
|
## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)
|
||||||
|
|
||||||
|
Run 4-point matrix on FAST PGO to confirm:
|
||||||
|
- Point D > Point A
|
||||||
|
- and quantify additivity (B/C contributions)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
|
||||||
|
HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
|
||||||
|
scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
|
||||||
|
HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
|
||||||
|
scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
|
||||||
|
HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
|
||||||
|
scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
|
||||||
|
HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
|
||||||
|
scripts/run_mixed_10_cleanenv.sh
|
||||||
|
```
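
Once the four means are recorded, the gains over Point A and the additivity residual can be computed as sketched here. `matrix_gains` is a hypothetical helper. The labels assume the order of the commands above: A = both OFF, B = C5 only, C = C6 only, D = both ON.

```shell
#!/usr/bin/env bash
# matrix_gains (hypothetical helper): gains of points B, C, D over point A,
# plus the interaction term D - (B + C). A value near zero means the two
# knobs are roughly additive.
# $1..$4 = mean throughput at points A, B, C, D (same unit)
matrix_gains() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    gb = (b - a) / a * 100
    gc = (c - a) / a * 100
    gd = (d - a) / a * 100
    printf "B=%+.2f%% C=%+.2f%% D=%+.2f%% interaction=%+.2f%%\n", gb, gc, gd, gd - (gb + gc)
  }'
}
```

A positive interaction term suggests the knobs reinforce each other; a negative one suggests they partly overlap.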
|
||||||
|
|
||||||
|
## Step 4: If regression persists, do layout tax forensics
|
||||||
|
|
||||||
|
Use:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
./scripts/box/layout_tax_forensics_box.sh \
|
||||||
|
./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
|
||||||
|
./bench_random_mixed_hakmem_minimal_pgo
|
||||||
|
```
|
||||||
|
|
||||||
|
Then classify:
|
||||||
|
- IPC drop (>3%) → text layout / inlining / code placement issue
|
||||||
|
- branch-miss spike (>10%) → hint mismatch / control-flow reshaping
|
||||||
|
- cache/dTLB spike → data layout / TLS bloat / spill
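
The classification above can be applied mechanically once the deltas are known. This is a sketch: `classify_layout_tax` is a hypothetical helper, the check order (IPC first, then branch misses) is an assumption, and no numeric threshold is applied to cache/dTLB because none is given above.

```shell
#!/usr/bin/env bash
# classify_layout_tax (hypothetical helper)
# $1 = IPC change in percent (negative means a drop)
# $2 = branch-miss change in percent (positive means more misses)
classify_layout_tax() {
  awk -v ipc="$1" -v br="$2" 'BEGIN {
    if (ipc < -3) {
      print "text layout / inlining / code placement"
    } else if (br > 10) {
      print "hint mismatch / control-flow reshaping"
    } else {
      print "inspect cache/dTLB counters (data layout / TLS bloat / spill)"
    }
  }'
}
```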
|
||||||
|
|
||||||
|
## GO/NO-GO Gates
|
||||||
|
|
||||||
|
- **GO**: FAST PGO baseline recovers significantly (target: close to the Phase 69 level of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
|
||||||
|
- **NEUTRAL**: D vs A stays positive but baseline still low → keep investigating training config.
|
||||||
|
- **NO-GO**: D vs A becomes negative → revert or rework inline slots integration for FAST builds.
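
The gates can be written as a small decision function. This is a sketch under stated assumptions: `phase75_5_gate` is a hypothetical helper, the baseline level that counts as "recovered" is not defined above and must be supplied by the caller, and a gain between 0% and +1.0% is treated as NEUTRAL.

```shell
#!/usr/bin/env bash
# phase75_5_gate (hypothetical helper)
# $1 = D-vs-A gain in percent
# $2 = new FAST PGO baseline (M ops/s)
# $3 = baseline level the caller considers recovered (M ops/s; caller's choice)
phase75_5_gate() {
  awk -v gain="$1" -v base="$2" -v want="$3" 'BEGIN {
    if (gain < 0) {
      print "NO-GO"
    } else if (gain >= 1.0 && base >= want) {
      print "GO"
    } else {
      print "NEUTRAL"
    }
  }'
}
```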
|
||||||
|
|
||||||
@ -24,6 +24,10 @@
|
|||||||
- `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`
|
||||||
- and toggle `HAKMEM_TINY_C5_INLINE_SLOTS` / `HAKMEM_TINY_C6_INLINE_SLOTS`.
|
||||||
|
|
||||||
|
**Update**:
|
||||||
|
- Phase 75-4 completed the FAST PGO rebase and confirmed **+3.16% (GO)** on FAST PGO via a 4-point matrix A/B.
|
||||||
|
- See `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`.
|
||||||
|
|
||||||
---
|
||||||
|
|
||||||
## Phase 75 Journey
|
||||||
|
|||||||