Moe Charm (CI) 5c9b09148b Phase 69-0: Refill tuning design memo (parameter sweep plan)

Changes:
- docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md: New design document
  * Identified 3 tunable parameters: refill batch size, unified cache C5-C7 capacity, warm pool size
  * Sweep plan: single-parameter isolation → combined optimization
  * Expected gain: +3-6% (shortest path to M2: 55% target)
  * Risk assessment and decision criteria (GO/Strong GO/NO-GO thresholds)

- CURRENT_TASK.md: Phase 69-0 marked complete, Phase 69-1 (sweep execution) set Active

Key Parameters Identified:
1. TINY_REFILL_BATCH_SIZE: 16 → 32/64 (expected +1-3%)
2. Unified Cache C5-C7: 128 → 256/512 slots (expected +1-2%)
3. Warm Pool: 12 → 16/24 SuperSlabs (expected +0.5-1%)

Strategy:
- ENV-only sweeps first (warm pool, cache capacity) - no recompile
- Batch size sweep requires PGO rebuild - highest expected gain
- Combined optimization targets +3-6% additive gain

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-17 21:22:21 +09:00

CURRENT_TASK (Rolling, SSOT)

0) Current sources of truth

Canonical for performance comparison: FAST PGO build (make pgo-fast-full → bench_random_mixed_hakmem_minimal_pgo) ✓ promoted in Phase 68 (seed/WS diversified)
Canonical for safety/compatibility: Standard build (make bench_random_mixed_hakmem)
Canonical for observation: OBSERVE build (make perf_observe)
Scorecard: docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md (M1 achieved and exceeded: 50.93% vs 50% target)
Canonical measurement (Mixed 10-run): scripts/run_mixed_10_cleanenv.sh (ITERS=20000000 WS=400)

1) Current status (key points)

Phase 64 (backend prune / DCE): NO-GO (-4.05%) → attributed to layout tax
Phase 63 (FAST_PROFILE_FIXED): kept as a research build (fixes the FAST gates at compile time)
Phase 65 (Hot Symbol Ordering): BLOCKED (unfair or impossible under GCC+LTO constraints) → docs/analysis/PHASE65_HOT_SYMBOL_ORDERING_1_RESULTS.md
Phase 66 (PGO, GCC+LTO): GO ✓
- Verification: 3 independent runs gave +3.0% mean, all >+2.89%, variance <±1%
- Baseline: bench_random_mixed_hakmem_minimal_pgo = 60.89M ops/s = 50.32% (initial PGO)
Phase 68 (PGO training set optimization): GO, promotion complete ✓
- Verification: +1.19% vs Phase 66 over 10 runs (exceeds the +1.0% GO threshold)
- New baseline: bench_random_mixed_hakmem_minimal_pgo (upgraded) = 61.614M ops/s = 50.93% (exceeds the 50% target by +0.93pp)
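
The baseline figures above can be cross-checked with a little arithmetic. The sketch below assumes each percentage is hakmem throughput divided by one fixed reference throughput; the memo does not state that reference value, so the number derived here is an inference, not a quoted measurement.

```python
# Cross-check of the baseline figures quoted in this memo.
# Assumption: "X%" means hakmem throughput / one fixed reference throughput.

PHASE66 = (60.89, 50.32)    # (M ops/s, % of reference), initial PGO baseline
PHASE68 = (61.614, 50.93)   # (M ops/s, % of reference), promoted baseline
M2_TARGET_PCT = 55.0        # M2 target from this memo


def implied_reference(mops: float, pct: float) -> float:
    """Reference throughput (M ops/s) implied by one throughput/percentage pair."""
    return mops / (pct / 100.0)


def relative_gain_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    print(f"implied reference (Phase 66): {implied_reference(*PHASE66):.2f} M ops/s")
    print(f"implied reference (Phase 68): {implied_reference(*PHASE68):.2f} M ops/s")
    print(f"Phase 68 vs Phase 66: {relative_gain_pct(PHASE66[0], PHASE68[0]):+.2f}%")
    print(f"gain still needed for M2: {relative_gain_pct(PHASE68[1], M2_TARGET_PCT):+.2f}%")
```

Both pairs imply a reference near 121M ops/s, and the throughput ratio reproduces the reported +1.19%. Under the same assumption, moving from 50.93% to the 55% M2 target takes roughly +8% relative throughput.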

2) Next instructions (Active)

Phase 68: PGO training set optimization ✅ Complete

✓ seed/WS diversification: WS (3 → 5 patterns), seed (1 → 3 patterns)
✓ 10-run verification: +1.19% vs Phase 66 (exceeds the +1.0% GO threshold)
✓ Baseline promoted: 61.614M ops/s = 50.93% (exceeds the 50% M1 target by +0.93pp)
✓ Scorecard and CURRENT_TASK updated

Phase 67a: Layout tax forensics (minimal changes) ✅ Complete, ready for production use

✓ scripts/box/layout_tax_forensics_box.sh added (measurement harness)
- 10-run throughput comparison of Baseline vs Treatment
- Automatic perf stat collection (cycles, IPC, branches, branch-misses, cache-misses, iTLB/dTLB)
- Binary metadata (size, section layout)
✓ docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md added (diagnostic guide)
- Decision rule: GO (+1% or better) / NEUTRAL (within ±1%) / NO-GO (-1% or worse)
- "Symptom → candidate cause" mapping table
  - IPC drop of 3% or more → I-cache misses / code layout dispersal
  - branch-misses up 10% or more → branch prediction penalty
  - dTLB-misses up 100% or more → data layout fragmentation
- Phase 64 case study (root cause of the -4.05%: IPC 2.05 → 1.98)
- Operational guidelines

Usage example:

./scripts/box/layout_tax_forensics_box.sh \
    ./bench_random_mixed_hakmem_minimal_pgo \
    ./bench_random_mixed_hakmem_fast_pruned  # or Phase 64 attempt

Outcome: when a "pruning-type" change comes back NO-GO, a single run shows which metrics regressed → future link-out / large deletions can be stopped in advance
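
The Phase 67a decision rule and symptom table can be expressed as a small classifier. This is a minimal sketch, not the logic of layout_tax_forensics_box.sh: the thresholds are the ones listed in this memo, the function names are hypothetical, and the handling of values exactly on a boundary is an assumption.

```python
from typing import List

# Sketch of the Phase 67a decision rule and symptom table.
# Thresholds are taken from this memo; all names are hypothetical.


def pct_change(baseline: float, treatment: float) -> float:
    """Relative change of treatment vs baseline, in percent."""
    return (treatment / baseline - 1.0) * 100.0


def verdict(throughput_delta_pct: float) -> str:
    """GO at +1% or better, NO-GO at -1% or worse, NEUTRAL in between."""
    if throughput_delta_pct >= 1.0:
        return "GO"
    if throughput_delta_pct <= -1.0:
        return "NO-GO"
    return "NEUTRAL"


def candidate_causes(ipc_delta_pct: float,
                     branch_miss_delta_pct: float = 0.0,
                     dtlb_miss_delta_pct: float = 0.0) -> List[str]:
    """Map perf-counter deltas to candidate causes; 0.0 means 'no change observed'."""
    causes = []
    if ipc_delta_pct <= -3.0:
        causes.append("I-cache miss / code layout dispersal")
    if branch_miss_delta_pct >= 10.0:
        causes.append("branch prediction penalty")
    if dtlb_miss_delta_pct >= 100.0:
        causes.append("data layout fragmentation")
    return causes


if __name__ == "__main__":
    # Phase 64 case study: throughput -4.05%, IPC 2.05 -> 1.98.
    # Branch and dTLB deltas are not given in this memo, so defaults are used.
    ipc_delta = pct_change(2.05, 1.98)
    print(verdict(-4.05), f"IPC {ipc_delta:+.2f}%", candidate_causes(ipc_delta))
```

For the Phase 64 figures, the IPC change works out to about -3.4%, which crosses the 3% IPC threshold and is consistent with the layout-tax attribution recorded above.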

Phase 69: Cut "refill frequency × fixed tax" (shortest path to M2)

Phase 69-0: Parameter sweep design memo ✅ Complete

✓ docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md created
✓ Tunable parameters identified:
- TINY_REFILL_BATCH_SIZE (16 → 32/64/128)
- Unified Cache C5-C7 capacity (128 → 256/512)
- Warm Pool size (12 → 16/24)
✓ Sweep plan drafted (single-parameter → combined optimization)
✓ Risk assessment and decision criteria defined
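
The "single-parameter → combined" plan can be pictured as two configuration lists. The sketch below enumerates them from the candidate values in the Phase 69-0 list. Only TINY_REFILL_BATCH_SIZE is a name from this memo; the other two dictionary keys are hypothetical labels, not real ENV variable or macro names, and the design document may prune the combined stage differently.

```python
from itertools import product

# Candidate values from the Phase 69-0 list; the first entry of each tuple
# is the current default. Keys other than TINY_REFILL_BATCH_SIZE are
# hypothetical labels, not real ENV variable or macro names.
PARAMS = {
    "TINY_REFILL_BATCH_SIZE": (16, 32, 64, 128),
    "unified_cache_c5_c7_slots": (128, 256, 512),
    "warm_pool_superslabs": (12, 16, 24),
}


def defaults() -> dict:
    """Configuration with every parameter at its current default."""
    return {name: values[0] for name, values in PARAMS.items()}


def single_parameter_sweep() -> list:
    """Stage 1: change one parameter at a time, all others at default."""
    configs = []
    for name, values in PARAMS.items():
        for value in values[1:]:
            cfg = defaults()
            cfg[name] = value
            configs.append(cfg)
    return configs


def combined_sweep() -> list:
    """Stage 2 upper bound: full Cartesian product of all candidate values."""
    names = list(PARAMS)
    return [dict(zip(names, combo)) for combo in product(*PARAMS.values())]


if __name__ == "__main__":
    print(len(single_parameter_sweep()), "single-parameter configs")
    print(len(combined_sweep()), "combined configs (including the default)")
```

With these values, stage 1 has 7 configurations besides the default, while the full product has 36. Restricting the combined stage to the stage-1 winners would keep the number of 10-run measurements small.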

Phase 69-1 (Active): Sweep execution

Goal: +3 to 6% (shortest path to the M2 55% target)
Measures (in priority order):
1. Warm Pool Size (ENV-only, no recompile): 12 → 16 → 24
  - Expected: +0.5-1.0% (registry scan reduction)
2. Unified Cache C5-C7 (ENV-only, no recompile): 128 → 256 → 512
  - Expected: +1-2% (miss rate reduction for mid-size allocations)
3. Refill Batch Size (requires PGO rebuild): 16 → 32 → 64
  - Expected: +1-3% (refill frequency reduction)
Procedure:
- 10-run with scripts/run_mixed_10_cleanenv.sh (for each parameter)
- On failure, apply scripts/box/layout_tax_forensics_box.sh to classify the cause
Decision:
- GO: +1.0% (first stage)
- "Strong GO": +3.0% or more (promote as the core of the M2 push)
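
The Phase 69-1 thresholds can be turned into absolute throughput figures against the Phase 68 baseline, and the per-parameter expectations can be summed. This is a sketch under two assumptions: gains are measured relative to the 61.614M ops/s baseline, and individual gains add roughly linearly. The "below GO" label is a placeholder, since the memo does not name that outcome for this phase.

```python
BASELINE_MOPS = 61.614  # Phase 68 promoted baseline, M ops/s

# Expected per-parameter gains as (low %, high %), from the measures listed above.
EXPECTED = {
    "warm pool size": (0.5, 1.0),
    "unified cache C5-C7": (1.0, 2.0),
    "refill batch size": (1.0, 3.0),
}


def threshold_mops(gain_pct: float, baseline_mops: float = BASELINE_MOPS) -> float:
    """Throughput (M ops/s) that corresponds to a given relative gain."""
    return baseline_mops * (1.0 + gain_pct / 100.0)


def judge(mean_mops: float, baseline_mops: float = BASELINE_MOPS) -> str:
    """Classify a 10-run mean against the GO (+1.0%) and Strong GO (+3.0%) bars."""
    delta_pct = (mean_mops / baseline_mops - 1.0) * 100.0
    if delta_pct >= 3.0:
        return "Strong GO"
    if delta_pct >= 1.0:
        return "GO"
    return "below GO"  # placeholder label, not a term from the memo


def additive_range(expected: dict) -> tuple:
    """Sum of the low ends and of the high ends, assuming gains simply add."""
    lows, highs = zip(*expected.values())
    return sum(lows), sum(highs)


if __name__ == "__main__":
    print(f"GO bar:        {threshold_mops(1.0):.3f} M ops/s")
    print(f"Strong GO bar: {threshold_mops(3.0):.3f} M ops/s")
    print("additive expectation (%):", additive_range(EXPECTED))
```

Under these assumptions the GO bar is about 62.23M ops/s and the Strong GO bar about 63.46M ops/s. The listed ranges sum to +2.5% to +6.0%, in line with the +3 to 6% goal if the gains do turn out to be additive.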

Phase 69-2 (follow-up): Reflect the winning settings in the baseline

Apply the winning parameters to pgo_fast_profile_config.sh / core/hakmem_tiny_config.h
Rebuild with make pgo-fast-full → promote the baseline

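Phase 67b (follow-up, fallback): Boundary inline/unroll tuning

Caution: high layout tax risk (see Phase 64)
Prerequisite: a Top 50 execution check is required
Recommended to defer as a fallback in case Phase 69 misses

Note: do not delete the research boxes now (there is strong precedent for link-out/deletion causing layout tax, so keeping them compiled out is the right call)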

3) Archive

Detailed log: CURRENT_TASK_ARCHIVE_20251210.md
Snapshot from just before the latest cleanup: docs/analysis/CURRENT_TASK_ARCHIVE.md