Changes:
- docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md: New design document
  * Identified 3 tunable parameters: refill batch size, unified cache C5-C7 capacity, warm pool size
  * Sweep plan: single-parameter isolation → combined optimization
  * Expected gain: +3-6% (shortest path to M2: 55% target)
  * Risk assessment and decision criteria (GO/Strong GO/NO-GO thresholds)
- CURRENT_TASK.md: Phase 69-0 marked complete, Phase 69-1 (sweep execution) set Active

Key Parameters Identified:
1. TINY_REFILL_BATCH_SIZE: 16 → 32/64 (expected +1-3%)
2. Unified Cache C5-C7: 128 → 256/512 slots (expected +1-2%)
3. Warm Pool: 12 → 16/24 SuperSlabs (expected +0.5-1%)

Strategy:
- ENV-only sweeps first (warm pool, cache capacity) - no recompile
- Batch size sweep requires PGO rebuild - highest expected gain
- Combined optimization targets +3-6% additive gain

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

107 lines
5.0 KiB
Markdown
# CURRENT_TASK (Rolling, SSOT)

## 0) Current sources of truth

- **Performance comparison**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) ✓ **Promoted in Phase 68** (seed/WS diversified)
- **Safety and compatibility**: Standard build (`make bench_random_mixed_hakmem`)
- **Observation**: OBSERVE build (`make perf_observe`)
- **Scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` (M1 met and exceeded: 50.93% vs 50% target)
- **Measurement (Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`)

## 1) Current status (key points)

- Phase 64 (backend prune / DCE): **NO-GO** (-4.05%) → caused by layout tax
- Phase 63 (FAST_PROFILE_FIXED): kept as a **research build** (FAST gates fixed at compile time)
- Phase 65 (Hot Symbol Ordering): **BLOCKED** (unfair or impossible under GCC+LTO constraints) → `docs/analysis/PHASE65_HOT_SYMBOL_ORDERING_1_RESULTS.md`
- Phase 66 (PGO, GCC+LTO): **GO** ✓
  - Validation: 3 independent runs, +3.0% mean, all > +2.89%, variance < ±1%
  - Baseline: `bench_random_mixed_hakmem_minimal_pgo` = 60.89M ops/s = 50.32% (initial PGO)
- Phase 68 (PGO training set optimization): **GO, promotion complete** ✓
  - Validation: +1.19% vs Phase 66 over 10 runs (exceeds the +1.0% GO threshold)
  - New baseline: `bench_random_mixed_hakmem_minimal_pgo` (upgraded) = 61.614M ops/s = **50.93%** (exceeds the 50% target by +0.93pp)

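The figures above can be cross-checked with a little arithmetic. The sketch below assumes the scorecard percentage is the measured throughput divided by a fixed reference throughput, which this document does not state explicitly. The helper names `gain_pct` and `required_gain_pct` are hypothetical and not part of the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking the scorecard numbers.

# Relative gain (%) of a new throughput over an old one.
gain_pct() {  # usage: gain_pct OLD NEW
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new / old - 1) * 100 }'
}

# Relative gain (%) needed to move the scorecard ratio from CUR% to TARGET%.
# Assumption: the reference throughput behind the ratio stays fixed.
required_gain_pct() {  # usage: required_gain_pct CUR_PCT TARGET_PCT
  awk -v cur="$1" -v tgt="$2" 'BEGIN { printf "%.2f\n", (tgt / cur - 1) * 100 }'
}
```

`gain_pct 60.89 61.614` prints `1.19`, matching the Phase 68 result. `required_gain_pct 50.93 55` prints `7.99`, so under the fixed-reference assumption the M2 target sits roughly 8% above the current baseline.
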
## 2) Next instructions (Active)

**Phase 68: PGO training set optimization** ✅ **Complete**

- ✓ seed/WS diversification: WS (3 → 5 patterns), seed (1 → 3 patterns)
- ✓ 10-run validation: +1.19% vs Phase 66 (exceeds the +1.0% GO threshold)
- ✓ Baseline promotion: 61.614M ops/s = 50.93% (exceeds the 50% M1 target by +0.93pp)
- ✓ Scorecard and CURRENT_TASK updated

---

**Phase 67a: Layout tax forensics (minimal changes)** ✅ **Complete, ready for production use**

- ✓ New `scripts/box/layout_tax_forensics_box.sh` (measurement harness)
  - 10-run throughput comparison, Baseline vs Treatment
  - Automatic perf stat collection (cycles, IPC, branches, branch-misses, cache-misses, iTLB/dTLB)
  - Binary metadata (size, section layout)

- ✓ New `docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md` (diagnostic guide)
  - Decision rule: GO (+1% or more) / NEUTRAL (within ±1%) / NO-GO (-1% or worse)
  - "Symptom → candidate cause" mapping table
    * IPC drop of 3% or more → I-cache miss / code layout dispersal
    * branch-misses up 10% or more → branch prediction penalty
    * dTLB-misses up 100% or more → data layout fragmentation
  - Phase 64 case study (root cause of the -4.05%: IPC 2.05 → 1.98)
  - Operational guidelines

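The decision rule and the symptom mapping listed for the diagnostic guide can be sketched as two small shell functions. This is a minimal illustration and not the logic of `layout_tax_forensics_box.sh` itself: the function names `verdict` and `suspects` are hypothetical, and the inputs are assumed to be percentage deltas of Treatment relative to Baseline. With the Phase 64 numbers, `verdict -4.05` yields `NO-GO`, and the IPC change 2.05 → 1.98 (about -3.41%) trips the first symptom.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; thresholds are the ones listed in the guide summary.

# Verdict from a throughput delta in percent (Treatment vs Baseline).
verdict() {  # usage: verdict DELTA_PCT
  awk -v d="$1" 'BEGIN {
    if (d >= 1.0)       print "GO"
    else if (d <= -1.0) print "NO-GO"
    else                print "NEUTRAL"
  }'
}

# Candidate causes from perf-counter deltas in percent (Treatment vs Baseline).
suspects() {  # usage: suspects IPC_DELTA BRANCH_MISS_DELTA DTLB_MISS_DELTA
  awk -v ipc="$1" -v br="$2" -v dtlb="$3" 'BEGIN {
    if (ipc <= -3)   print "I-cache miss / code layout dispersal"
    if (br >= 10)    print "branch prediction penalty"
    if (dtlb >= 100) print "data layout fragmentation"
  }'
}
```
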
**Usage example** (forensics harness):

```bash
./scripts/box/layout_tax_forensics_box.sh \
  ./bench_random_mixed_hakmem_minimal_pgo \
  ./bench_random_mixed_hakmem_fast_pruned  # or Phase 64 attempt
```

Outcome: when a "pruning-style" change comes back NO-GO, the metrics that degraded can be **diagnosed in a single run** → future link-outs and large deletions can be stopped in advance

---

**Phase 69: Cut "refill frequency × fixed tax" (shortest path to M2)**

**Phase 69-0: Parameter sweep design memo** ✅ **Complete**

- ✓ Created `docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md`
- ✓ Tunable parameters identified:
  - `TINY_REFILL_BATCH_SIZE` (16 → 32/64/128)
  - Unified Cache C5-C7 capacity (128 → 256/512)
  - Warm Pool size (12 → 16/24)
- ✓ Sweep plan drafted (single-parameter → combined optimization)
- ✓ Risk assessment and decision criteria defined

**Phase 69-1 (Active): Sweep execution**

- **Goal**: +3 to 6% (shortest path to the M2 55% target)
- **Measures** (in priority order):
  1. **Warm Pool Size** (ENV-only, no recompile): 12 → 16 → 24
     - Expected: +0.5-1.0% (registry scan reduction)
  2. **Unified Cache C5-C7** (ENV-only, no recompile): 128 → 256 → 512
     - Expected: +1-2% (miss rate reduction for mid-size allocations)
  3. **Refill Batch Size** (requires PGO rebuild): 16 → 32 → 64
     - Expected: +1-3% (refill frequency reduction)
- **Procedure**:
  - 10 runs per parameter value with `scripts/run_mixed_10_cleanenv.sh`
  - On failure, apply `scripts/box/layout_tax_forensics_box.sh` to classify the cause
- **Decision**:
  - GO: +1.0% (first stage)
  - "Strong GO": +3.0% or more (promoted as the core candidate for reaching M2)

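The ENV-only part of the sweep and the Phase 69-1 decision thresholds could be driven by something like the sketch below. Several things here are assumptions: this document does not name the environment variables for the warm pool or the unified cache, so the variable name passed to `sweep_env` is a placeholder you must replace; it is also assumed that `scripts/run_mixed_10_cleanenv.sh` lets the chosen variable through its clean environment. `sweep_env` and `phase69_verdict` are hypothetical names. Setting `DRY_RUN=1` prints the commands instead of running them, for example `DRY_RUN=1 sweep_env PLACEHOLDER_WARM_POOL_VAR 12 16 24`.

```bash
#!/usr/bin/env bash
# Hypothetical sweep driver for one ENV-only knob at a time.

# Run the 10-run measurement once per candidate value of a single ENV knob.
sweep_env() {  # usage: sweep_env VAR_NAME VALUE...
  local var="$1"; shift
  local v
  for v in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: only show what would be executed.
      echo "${var}=${v} scripts/run_mixed_10_cleanenv.sh"
    else
      env "${var}=${v}" scripts/run_mixed_10_cleanenv.sh
    fi
  done
}

# Map a measured gain (%) vs the current baseline to the Phase 69-1 decision.
phase69_verdict() {  # usage: phase69_verdict GAIN_PCT
  awk -v g="$1" 'BEGIN {
    if (g >= 3.0)      print "Strong GO"
    else if (g >= 1.0) print "GO"
    else               print "below GO threshold"
  }'
}
```
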
**Phase 69-2 (follow-up): Reflect the winning settings in the baseline**

- Apply the winning parameters to `pgo_fast_profile_config.sh` / `core/hakmem_tiny_config.h`
- Rebuild with `make pgo-fast-full` → promote the baseline

---

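**Phase 67b (follow-up, fallback): Boundary inline/unroll tuning**

- **Caution**: high layout tax risk (see Phase 64)
- **Precondition**: a Top 50 execution check is mandatory
- Recommended to defer, as a fallback in case Phase 69 misses

**Note**: do not delete the research boxes now (there is strong precedent for link-out/deletion causing layout tax, so keeping them compiled out is the right call)
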
## 3) Archive

- Detailed log: `CURRENT_TASK_ARCHIVE_20251210.md`
- Snapshot before the most recent cleanup: `docs/analysis/CURRENT_TASK_ARCHIVE.md`