hakmem/CURRENT_TASK.md
Moe Charm (CI) 5c9b09148b Phase 69-0: Refill tuning design memo (parameter sweep plan)
Changes:
- docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md: New design document
  * Identified 3 tunable parameters: refill batch size, unified cache C5-C7 capacity, warm pool size
  * Sweep plan: single-parameter isolation → combined optimization
  * Expected gain: +3-6% (shortest path to M2: 55% target)
  * Risk assessment and decision criteria (GO/Strong GO/NO-GO thresholds)

- CURRENT_TASK.md: Phase 69-0 marked complete, Phase 69-1 (sweep execution) set Active

Key Parameters Identified:
1. TINY_REFILL_BATCH_SIZE: 16 → 32/64 (expected +1-3%)
2. Unified Cache C5-C7: 128 → 256/512 slots (expected +1-2%)
3. Warm Pool: 12 → 16/24 SuperSlabs (expected +0.5-1%)

Strategy:
- ENV-only sweeps first (warm pool, cache capacity) - no recompile
- Batch size sweep requires PGO rebuild - highest expected gain
- Combined optimization targets +3-6% additive gain

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 21:22:21 +09:00

# CURRENT_TASK (Rolling, SSOT)
## 0) Current canonical references
- **Canonical for performance comparison**: FAST PGO build (`make pgo-fast-full`, `bench_random_mixed_hakmem_minimal_pgo`) ✓ **Promoted in Phase 68** (seed/WS diversified)
- **Canonical for safety and compatibility**: Standard build (`make bench_random_mixed_hakmem`)
- **Canonical for observation**: OBSERVE build (`make perf_observe`)
- **Scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` (M1 achieved and exceeded: 50.93% vs 50% target)
- **Canonical measurement (Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`)
## 1) Current status (key points)
- Phase 64 (backend prune / DCE): **NO-GO** (-4.05% → attributed to layout tax)
- Phase 63 (FAST_PROFILE_FIXED): kept as a **research build** (FAST gates fixed at compile time)
- Phase 65 (Hot Symbol Ordering): **BLOCKED** (unfair or impossible under GCC+LTO constraints) → `docs/analysis/PHASE65_HOT_SYMBOL_ORDERING_1_RESULTS.md`
- Phase 66 (PGO, GCC+LTO): **GO**
  - Verification: 3 independent runs, +3.0% mean, all > +2.89%, variance < ±1%
  - Baseline: `bench_random_mixed_hakmem_minimal_pgo` = 60.89M ops/s = 50.32% (initial PGO)
- Phase 68 (PGO training set optimization): **GO & promotion complete**
  - Verification: +1.19% vs Phase 66 over 10 runs (exceeds the +1.0% GO threshold)
  - New baseline: `bench_random_mixed_hakmem_minimal_pgo` (upgraded) = 61.614M ops/s = **50.93%** (exceeds the 50% target by +0.93pp)
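
The figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming each percentage is the measured throughput divided by a fixed reference throughput; that reference value is not stated in this file and is only implied by the numbers. The function names are hypothetical helpers, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers for cross-checking the scorecard arithmetic.
# Assumption: share% = throughput / reference throughput * 100.

# implied_reference OPS SHARE_PCT: reference throughput implied by a measurement.
implied_reference() {
    awk -v ops="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", ops * 100 / pct }'
}

# required_ops REF TARGET_PCT: throughput needed to reach a target share.
required_ops() {
    awk -v ref="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", ref * pct / 100 }'
}

# gain_pct PREV CUR: relative change from PREV to CUR, in percent.
gain_pct() {
    awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%+.2f\n", (cur - prev) * 100 / prev }'
}

gain_pct 60.89 61.614           # Phase 66 -> Phase 68: prints +1.19
implied_reference 61.614 50.93  # prints 120.98 (M ops/s)
required_ops 120.98 55          # M2 target at 55%: prints 66.54 (M ops/s)
```

The first call reproduces the +1.19% reported for Phase 68; the other two only hold under the fixed-reference assumption stated above.
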
## 2) Next instructions (Active)
**Phase 68: PGO training set optimization** (**complete**)
- ✓ seed/WS diversification: WS (3 → 5 patterns), seed (1 → 3 patterns)
- ✓ 10-run verification: +1.19% vs Phase 66 (exceeds the +1.0% GO threshold)
- ✓ Baseline promoted: 61.614M ops/s = 50.93% (exceeds the M1 target of 50% by +0.93pp)
- ✓ Scorecard and CURRENT_TASK updated
---
**Phase 67a: Layout Tax forensics (minimal changes)** (**complete, ready for operational use**)
- `scripts/box/layout_tax_forensics_box.sh` added (measurement harness)
  - 10-run throughput comparison of Baseline vs Treatment
  - Automatic perf stat collection (cycles, IPC, branches, branch-misses, cache-misses, iTLB/dTLB)
  - Binary metadata (size, section layout)
- `docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md` added (diagnostic guide)
  - Decision rule: GO (+1% or more) / NEUTRAL (±1%) / NO-GO (-1% or less)
  - "Symptom → candidate cause" mapping table
    * IPC drop of 3% or more → I-cache miss / code layout dispersal
    * branch-misses up 10% or more → branch prediction penalty
    * dTLB-misses up 100% or more → data layout fragmentation
  - Phase 64 case study (root cause of the -4.05%: IPC 2.05 → 1.98)
  - Operational guidelines
**Usage example**:
```bash
./scripts/box/layout_tax_forensics_box.sh \
./bench_random_mixed_hakmem_minimal_pgo \
./bench_random_mixed_hakmem_fast_pruned # or Phase 64 attempt
```
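Outcome: when a "pruning-type" change comes back NO-GO, the regressed metrics can be **diagnosed in a single run** → subsequent link-out or large deletions can be stopped in advance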
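
The decision rule and the symptom table for Phase 67a can be sketched as two small shell functions. This is an illustration under stated assumptions, not the logic of `layout_tax_forensics_box.sh` itself: the function names and argument order are hypothetical, and a delta of exactly ±1% is treated here as GO or NO-GO respectively.

```shell
#!/bin/sh
# Hypothetical sketch of the Phase 67a thresholds; not the real harness.

# verdict DELTA_PCT: throughput delta of Treatment vs Baseline, in percent.
verdict() {
    awk -v d="$1" 'BEGIN {
        if (d >= 1.0)       print "GO"
        else if (d <= -1.0) print "NO-GO"
        else                print "NEUTRAL"
    }'
}

# symptoms IPC_BASE IPC_TREAT BRMISS_BASE BRMISS_TREAT DTLB_BASE DTLB_TREAT
# Prints one line per threshold from the mapping table that is crossed.
symptoms() {
    awk -v ib="$1" -v it="$2" -v bb="$3" -v bt="$4" -v db="$5" -v dt="$6" 'BEGIN {
        if ((ib - it) * 100 / ib >= 3)
            print "IPC drop: I-cache miss / code layout dispersal"
        if ((bt - bb) * 100 / bb >= 10)
            print "branch-misses up: branch prediction penalty"
        if ((dt - db) * 100 / db >= 100)
            print "dTLB-misses up: data layout fragmentation"
    }'
}

# Phase 64 numbers from this file: -4.05% throughput, IPC 2.05 -> 1.98.
# The branch-miss and dTLB counts below are placeholders (unchanged).
verdict -4.05                            # prints NO-GO
symptoms 2.05 1.98 100 100 100 100       # prints only the IPC line
```

The IPC change from 2.05 to 1.98 is a drop of roughly 3.4%, which is why it crosses the 3% line in this sketch.
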
---
**Phase 69: Cut "refill frequency × fixed tax" (shortest path to M2)**
**Phase 69-0: Parameter sweep design memo** (**complete**)
- `docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md` created
- ✓ Tunable parameters identified:
  - `TINY_REFILL_BATCH_SIZE` (16 → 32/64/128)
  - Unified Cache C5-C7 capacity (128 → 256/512)
  - Warm Pool size (12 → 16/24)
- ✓ Sweep plan drafted (single-parameter → combined optimization)
- ✓ Risk assessment and decision criteria defined
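
The "single-parameter → combined" plan can be made concrete by enumerating the configurations each stage would visit. The sketch below uses the values listed above; the labels `batch`, `cache`, and `pool` are illustrative shorthand only, not real ENV variable or macro names, and the full Cartesian product in stage 2 is an assumption about what "combined optimization" covers.

```shell
#!/bin/sh
# Illustrative enumeration of the sweep grid; labels are shorthand only.
# Baseline: batch=16 cache=128 pool=12.

# Stage 1: vary one parameter at a time, holding the others at baseline.
single_sweeps() {
    for b in 32 64 128; do echo "batch=$b cache=128 pool=12"; done
    for c in 256 512;   do echo "batch=16 cache=$c pool=12"; done
    for p in 16 24;     do echo "batch=16 cache=128 pool=$p"; done
}

# Stage 2: every combination of the candidate values (baseline included).
combined_sweeps() {
    for b in 16 32 64 128; do
        for c in 128 256 512; do
            for p in 12 16 24; do
                echo "batch=$b cache=$c pool=$p"
            done
        done
    done
}

single_sweeps | wc -l     # 7 single-parameter configurations
combined_sweeps | wc -l   # 36 combined configurations
```

In practice the combined stage would likely be restricted to the winners of stage 1 rather than the full grid, since each batch-size value needs a PGO rebuild.
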
**Phase 69-1 (Active): Sweep execution**
- **Goal**: +3 to 6% (shortest path to the M2: 55% target)
- **Measures** (in priority order):
  1. **Warm Pool Size** (ENV-only, no recompile): 12 → 16 → 24
     - Expected: +0.5-1.0% (registry scan reduction)
  2. **Unified Cache C5-C7** (ENV-only, no recompile): 128 → 256 → 512
     - Expected: +1-2% (miss rate reduction for mid-size allocations)
  3. **Refill Batch Size** (requires PGO rebuild): 16 → 32 → 64
     - Expected: +1-3% (refill frequency reduction)
- **Procedure**:
  - Run a 10-run with `scripts/run_mixed_10_cleanenv.sh` (for each parameter)
  - On failure, apply `scripts/box/layout_tax_forensics_box.sh` to classify the cause
- **Decision**:
  - GO: +1.0% (first tier)
  - "Strong GO": +3.0% or more (promote as the core of the M2 push)
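
The ENV-only part of the procedure can be sketched as a small driver. This is a hedged illustration, not the project's tooling: the actual ENV variable names for the warm pool and cache capacity are not given in this file, so the driver takes the variable name as an argument, and it assumes the wrapped command prints one throughput number per line. `sweep`, `mean_of`, and `tier` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical ENV sweep driver; variable names are supplied by the caller.

# mean_of: average of one number per line read from stdin.
mean_of() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# sweep VAR "v1 v2 ..." cmd [args...]
# Runs cmd once per value with VAR set, prints "VAR=value mean".
sweep() {
    var=$1; values=$2; shift 2
    for v in $values; do
        mean=$(env "$var=$v" "$@" | mean_of)
        printf '%s=%s %s\n' "$var" "$v" "$mean"
    done
}

# tier BASE_MEAN CAND_MEAN: gain in percent plus the Phase 69-1 label.
tier() {
    awk -v b="$1" -v c="$2" 'BEGIN {
        g = (c - b) * 100 / b
        label = (g >= 3.0) ? "Strong GO" : ((g >= 1.0) ? "GO" : "below GO")
        printf "%+.2f%% %s\n", g, label
    }'
}

# Demo with a stand-in command instead of the real benchmark script.
sweep DEMO_VAR "1 2" sh -c 'echo "$DEMO_VAR"; echo 3'
tier 100 104    # prints +4.00% Strong GO
```

For a real sweep the stand-in command would be replaced by a wrapper around `scripts/run_mixed_10_cleanenv.sh` that emits one ops/s figure per run; the output format of that script is not described here, so such a wrapper is left as an assumption.
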
**Phase 69-2 (follow-up): Reflect the winning settings in the baseline**
- Apply the winning parameters to `pgo_fast_profile_config.sh` / `core/hakmem_tiny_config.h`
- Rebuild with `make pgo-fast-full` → promote the baseline
---
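**Phase 67b (follow-up, insurance): Boundary inline/unroll tuning**
- **Caution**: high layout tax risk (see Phase 64)
- **Prerequisite**: confirming Top 50 execution is mandatory
- Recommended to defer this as insurance in case Phase 69 misses
**Note**: Do not delete research boxes now (there is strong precedent for link-out or deletion causing layout tax, so keeping them compiled out is the right call)
## 3) Archive
- Detailed log: `CURRENT_TASK_ARCHIVE_20251210.md`
- Snapshot from just before the latest cleanup: `docs/analysis/CURRENT_TASK_ARCHIVE.md`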