diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 4055750c..452b6b49 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -3,12 +3,14 @@
 ## 0) Current source of truth (SSOT)
 
 - **SSOT for performance comparison**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) + **WarmPool=16**
-  - Phase 75 (C5/C6 inline slots) has been promoted to presets, but **re-measurement (rebase) on FAST PGO has not been done yet** (A/B on Standard is +5.41%).
+  - Phase 75 (C5/C6 inline slots) has been promoted to presets
+  - Phase 75-4 performed the FAST PGO rebase and confirmed **C5+C6=ON at +3.16% (GO)** (however, the **FAST PGO baseline itself appears to have regressed substantially since Phase 69** → PGO regeneration is required in Phase 75-5)
 - **SSOT for safety/compatibility**: Standard build (`make bench_random_mixed_hakmem`)
 - **SSOT for observation**: OBSERVE build (`make perf_observe`)
 - **Scorecard (targets/current values)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
   - **FAST baseline (SSOT)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is authoritative (Phase 69: 62.63M ops/s = 51.77% of mimalloc)
-  - **Phase 75 measurement**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (non-PGO, separate target) (Phase 75-3 4-point matrix). Carrying this over to FAST PGO (rebase) is tracked separately.
+  - **Phase 75 measurement (Standard)**: confirmed **A/B +5.41%** on `bench_random_mixed_hakmem` (Phase 75-3 4-point matrix)
+  - **Phase 75 measurement (FAST PGO)**: confirmed **A/B +3.16%** on `bench_random_mixed_hakmem_minimal_pgo` (Phase 75-4 4-point matrix)
   - Next target: **M2 = 55%** (judge the gap against the FAST baseline)
 - **Mixed 10-run SSOT (harness)**: `scripts/run_mixed_10_cleanenv.sh`
   - Default: `BENCH_BIN=./bench_random_mixed_hakmem` (Standard)
@@ -206,6 +208,23 @@ Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):
 **Phase 75 Complete**: C5+C6 inline slots (129-256B) deliver +5.41% proven gain **on Standard binary** (`bench_random_mixed_hakmem`).
 - Before updating the FAST PGO baseline (scorecard), re-measure **the A/B (C5/C6 OFF/ON) under identical conditions** with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`.
 
+### Phase 75-4 (FAST PGO rebase) ✅ Complete
+
+- Result: **+3.16% (GO)** (4-point matrix, after excluding outliers)
+- Details: `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`
+- Important: the **current FAST PGO baseline appears to be far below** the Phase 69 FAST baseline (62.63M) (suspected causes: PGO profile staleness / training mismatch / build drift)
+
+### Phase 75-5 (PGO regeneration) 🟥 Next active (HIGH PRIORITY)
+
+Goal:
+- Regenerate PGO training against the current code, including the C5/C6 inline slots, and recover a Phase 69-class FAST baseline.
+
+Steps (outline):
+1. Run PGO training assuming "C5/C6=ON" (always set `HAKMEM_TINY_C5_INLINE_SLOTS=1` / `HAKMEM_TINY_C6_INLINE_SLOTS=1` during training)
+2. Regenerate `bench_random_mixed_hakmem_minimal_pgo` with `make pgo-fast-full`
+3. Re-measure the baseline with a 10-run, then re-measure Phase 75-4 Points A and D
+4. If layout tax or drift is suspected, classify the cause with `scripts/box/layout_tax_forensics_box.sh`
+
 **References**:
 - 4-point matrix results: `docs/analysis/PHASE75_3_C5_C6_INTERACTION_RESULTS.md`
 - Test script: `scripts/phase75_3_matrix_test.sh`
diff --git a/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
new file mode 100644
index 00000000..8f65fd6e
--- /dev/null
+++ b/docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md
@@ -0,0 +1,103 @@
+# Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) — Next Instructions
+
+**Status**: NEXT (HIGH PRIORITY)
+
+## Goal
+
+Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:
+- `HAKMEM_WARM_POOL_SIZE=16`
+- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
+- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
+
+This is required because Phase 75-4 observed a large gap between:
+- **Phase 69 historical FAST baseline** (62.63M ops/s)
+- **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)
+
+## SSOT Rules
+
+- Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
+- Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
+- Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).
+
+## Step 1: Prepare training commands (C5/C6 ON)
+
+Pick one of these approaches (A is preferred):
+
+### A) Training uses the harness (preferred)
+
+Ensure the training workload exports the correct knobs:
+
+```bash
+export HAKMEM_WARM_POOL_SIZE=16
+export HAKMEM_TINY_C5_INLINE_SLOTS=1
+export HAKMEM_TINY_C6_INLINE_SLOTS=1
+```
+
+Then run the existing PGO training target (repo-specific; example):
+
+```bash
+make pgo-fast-full
+```
+
+### B) Hard-pin knobs inside PGO training config (if needed)
+
+If the training driver does not inherit ENV cleanly, update the PGO training config script to include:
+- `HAKMEM_WARM_POOL_SIZE=16`
+- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
+- `HAKMEM_TINY_C6_INLINE_SLOTS=1`
+
+## Step 2: Validate the rebuilt binary
+
+Run Mixed SSOT 10-run on FAST PGO:
+
+```bash
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
+```
+
+Record mean/median/CV and update the scorecard baseline if improved.
+
+## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)
+
+Run 4-point matrix on FAST PGO to confirm:
+- Point D > Point A
+- and quantify additivity (B/C contributions)
+
+```bash
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+
+BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
+  HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
+  scripts/run_mixed_10_cleanenv.sh
+```
+
+## Step 4: If regression persists, do layout tax forensics
+
+Use:
+
+```bash
+./scripts/box/layout_tax_forensics_box.sh \
+  ./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
+  ./bench_random_mixed_hakmem_minimal_pgo
+```
+
+Then classify:
+- IPC drop (>3%) → text layout / inlining / code placement issue
+- branch-miss spike (>10%) → hint mismatch / control-flow reshaping
+- cache/dTLB spike → data layout / TLS bloat / spill
+
+## GO/NO-GO Gates
+
+- **GO**: FAST PGO baseline recovers significantly (target: close to the Phase 69 level of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
+- **NEUTRAL**: D vs A stays positive but baseline still low → keep investigating training config.
+- **NO-GO**: D vs A becomes negative → revert or rework inline slots integration for FAST builds.
+
diff --git a/docs/analysis/PHASE75_COMPLETE_SUMMARY.md b/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
index 03353696..370f84ea 100644
--- a/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
+++ b/docs/analysis/PHASE75_COMPLETE_SUMMARY.md
@@ -24,6 +24,10 @@
 - `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh`
 - and toggle `HAKMEM_TINY_C5_INLINE_SLOTS` / `HAKMEM_TINY_C6_INLINE_SLOTS`.
 
+**Update**:
+- Phase 75-4 completed the FAST PGO rebase and confirmed **+3.16% (GO)** on FAST PGO via a 4-point matrix A/B.
+- See `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`.
+
 ---
 
 ## Phase 75 Journey