From 3dbf4acb48996107328525b6deab7f8baf747629 Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Thu, 18 Dec 2025 09:28:09 +0900
Subject: [PATCH] Update scorecard: Phase 75-4 FAST PGO rebase (+3.16%) + critical PGO staleness finding

Phase 75-4 validates C5+C6 inline slots on the FAST PGO baseline:
- Point A (baseline, C5=0, C6=0): 53.81 M ops/s
- Point D (C5=1, C6=1): 55.51 M ops/s (+3.16%)

CRITICAL FINDING: 14% regression vs Phase 69 baseline (53.81 vs 62.63 M ops/s)
Suspected root cause: stale PGO profile (likely trained pre-Phase 69, missing Phase 75 benefits)
Recommended next: Phase 75-5 (PGO Profile Regeneration) to recover lost performance

Scorecard updated with Phase 75-4 results and high-priority action items.
---
 .../analysis/PERFORMANCE_TARGETS_SCORECARD.md | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md b/docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md
index 6bf08060..0ad3f7a1 100644
--- a/docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md
+++ b/docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md
@@ -33,6 +33,7 @@ Note:
 | **FAST v3 + PGO (Phase 66)** | **60.89** | **61.35** | **50.32%** | **GO: +3.0% mean (verified 3 times, stable within ±1%)**. Phase 66 PGO initial baseline |
 | **FAST v3 + PGO (Phase 68)** | **61.614** | **61.924** | **50.93%** | **GO: +1.19% vs Phase 66** ✓ (seed/WS diversification) |
 | **FAST v3 + PGO (Phase 69)** | **62.63** | **63.38** | **51.77%** | **Strong GO: +3.26% vs Phase 68** ✓✓✓ (Warm Pool Size=16, ENV-only) → **promoted to new FAST baseline** ✓ |
+| FAST v3 + PGO + Phase 75 (C5+C6 ON) [Point D] | **55.51** | - | **45.70%** | Phase 75-4 FAST PGO rebase (C5+C6 inline slots): +3.16% vs Point A ✓ **[REBASE URGENT]** |
 | Standard | 53.50 | - | 44.21% | Safety/compatibility baseline (measured before Phase 48; needs rebase) |
 | OBSERVE | TBD | - | - | Diagnostic counters ON |
 
@@ -118,6 +119,50 @@ Notes:
 - Rollback: Set `HAKMEM_WARM_POOL_SIZE=12` or remove ENV variable
 - Results: `docs/analysis/PHASE69_REFILL_TUNING_1_RESULTS.md`
 
+**Phase 75-4: FAST PGO Rebase (C5+C6 Inline Slots Validation): CRITICAL FINDING**
+
+Phase 75-3 validated the C5+C6 inline-slots optimization on the Standard binary (+5.41%). Phase 75-4 rebases it onto the FAST PGO baseline to update the SSOT:
+
+**4-Point Matrix (FAST PGO, Mixed SSOT):**
+| Point | Config | Throughput | Delta vs A |
+|-------|--------|-----------|-----------|
+| A | C5=0, C6=0 | 53.81 M ops/s | baseline |
+| B | C5=1, C6=0 | 53.03 M ops/s | -1.45% |
+| C | C5=0, C6=1 | 54.17 M ops/s | +0.67% |
+| **D** | **C5=1, C6=1** | **55.51 M ops/s** | **+3.16%** |
+
+**Decision**: ✅ **GO** (Point D exceeds the +3.0% ideal threshold by 0.16 pp)
+
+**⚠️ CRITICAL FINDING: PGO Profile Staleness**
+
+- **Phase 69 FAST baseline**: 62.63 M ops/s
+- **Phase 75-4 Point A (FAST PGO baseline)**: 53.81 M ops/s
+- **Regression**: -14.08% (not explained by Phase 75 additions)
+- **Root cause hypothesis**: PGO profile was trained pre-Phase 69 (likely Phase 68 or earlier) with the C5=0, C6=0 configuration
+- **Impact**: FAST PGO captures only 58.4% of Standard's +5.41% gain (3.16% vs 5.41%)
+
+**Recommended Actions (Priority Order):**
+
+1. **IMMEDIATE - UPDATE SSOT**: Phase 75 C5+C6 inline slots confirmed working (+3.16% on FAST PGO)
+   - Promote to core/bench_profile.h (already done for Standard, now validated on FAST PGO)
+   - Update this scorecard: Phase 75 baseline = 55.51 M ops/s (Point D, with C5+C6 ON)
+
+2. **HIGH PRIORITY - PHASE 75-5 (PGO Profile Regeneration)**
+   - Regenerate the PGO profile with the C5=1, C6=1 training configuration
+   - Expected gain: +5-8% (if the profile then matches the optimized code)
+   - Estimated recovery: 55.51 M ops/s → ~58-60 M ops/s
+   - Root cause analysis: investigate the 14% gap vs Phase 69 (layout, code bloat, or profile mismatch)
+
+**Documentation:**
+- Phase 75-4 results: `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`
+- Next: Phase 75-5 (PGO regeneration) required before the next optimization phase
+
+**Impact on M2 Milestone:**
+- Phase 69 FAST baseline: 62.63 M ops/s (51.77% of mimalloc, +3.23pp to M2)
+- Phase 75-4 Point A (baseline): 53.81 M ops/s (44.35% of mimalloc, +10.65pp to M2)
+- Phase 75-4 Point D (C5+C6): 55.51 M ops/s (45.70% of mimalloc, +9.30pp to M2)
+- **Status**: Phase 75 optimization proven, but the PGO profile regression masks true progress
+
 Note: reference values for `mimalloc/system/jemalloc` shift with environment drift, so re-baseline them periodically.
 - Phase 48 complete: `docs/analysis/PHASE48_REBASE_ALLOCATORS_AND_STABILITY_SUITE_RESULTS.md`
 - Phase 59 complete: `docs/analysis/PHASE59_50PERCENT_RECOVERY_BASELINE_REBASE_RESULTS.md`
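Review note: the deltas in the 4-point matrix and the gap against the Phase 69 baseline can be re-derived from the throughput figures quoted in this patch. The following is a minimal Python sketch of that arithmetic, not part of the repository; the names `POINTS`, `pct_delta`, and `matrix_deltas` are illustrative assumptions. Because it works from the rounded figures in the scorecard, the last digit may differ slightly from values computed on raw measurements.

```python
# Throughput figures (M ops/s) copied from the Phase 75-4 scorecard entry.
PHASE69_BASELINE = 62.63
POINTS = {
    "A": 53.81,  # C5=0, C6=0 (baseline)
    "B": 53.03,  # C5=1, C6=0
    "C": 54.17,  # C5=0, C6=1
    "D": 55.51,  # C5=1, C6=1
}


def pct_delta(value, baseline):
    """Relative change of `value` versus `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


def matrix_deltas(points, baseline_key="A"):
    """Percent delta of every point against the chosen baseline point."""
    base = points[baseline_key]
    return {name: round(pct_delta(v, base), 2) for name, v in points.items()}


if __name__ == "__main__":
    for name, delta in matrix_deltas(POINTS).items():
        print(f"Point {name}: {delta:+.2f}% vs A")
    gap = pct_delta(POINTS["A"], PHASE69_BASELINE)
    print(f"Point A vs Phase 69 baseline: {gap:+.2f}%")
```

Running the same check after a PGO profile regeneration (Phase 75-5) would show directly whether Point A has moved back toward the Phase 69 figure.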