2025-12-13 17:32:34 +09:00

# Phase 3: Cache Locality Optimization (Kickoff Instructions)

## Goal

**Current**: Mixed ~35.2M ops/s (after B3+B4)

**Target**: ~39.4-42.9M ops/s (+12-22%)

## Phase 3 Structure

### C3 (Priority: 🔴 Highest): Static Routing

**Background**:

- In `perf top` for the Mixed workload, `malloc` and `policy_snapshot` are hot.
- Current behavior: every malloc takes a policy snapshot and runs a learner evaluation, which adds significant overhead.
- Proposal: decide a "static route" at init time, before `malloc_tiny_fast()` is ever called.

**Design notes**: `docs/analysis/PHASE3_C3_STATIC_ROUTING_1_DESIGN.md`

**Implementation steps**:

1. **Profiling (establish the baseline)**

   ```bash
   HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 1000000 400 1
   # The env assignment must precede perf; after `--` it would be treated as the command to run.
   HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 --call-graph dwarf -- ./bench_random_mixed_hakmem 1000000 400 1
   perf report --stdio
   # → check how much malloc/policy_snapshot/learner contribute
   ```

2. **Static Route Detection (at init)**

   - Decide the route before `malloc_tiny_fast()` is called.
   - Scope: for each class C0-C7, determine whether LEGACY is the dominant route.
   - ENV gate: `HAKMEM_TINY_STATIC_ROUTE=1/0` (default 0)

3. **Route Bypass**

   ```c
   // Current (evaluated on every call):
   route = g_policy_learner->get_route(class_idx);  // expensive

   // C3 Static (decided at init):
   if (static_route_enabled()) {
       route = g_static_route[class_idx];  // cached, no learner
   } else {
       route = learner_route(...);  // unchanged from the current path
   }
   ```

4. **Expected gain**: +5-8%
5. **A/B test**: 10 runs of Mixed plus 5 runs of C6-heavy

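The init-time detection and the hot-path lookup from steps 2 and 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: `tiny_route_t`, `static_route_init()`, `static_route_get()`, the per-class hit counters, and the "more than half" dominance rule are all hypothetical; the only name taken from this plan is `g_static_route`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_CLASS_COUNT 8  /* classes C0-C7 */

/* Hypothetical route kinds; only LEGACY is named in the plan. */
typedef enum { ROUTE_LEARNER = 0, ROUTE_LEGACY = 1 } tiny_route_t;

static tiny_route_t g_static_route[TINY_CLASS_COUNT];
static int g_static_route_ready = 0;

/* Assumed rule: LEGACY is dominant when it served more than half the samples. */
static int legacy_is_dominant(uint64_t legacy_hits, uint64_t total_hits) {
    return total_hits > 0 && legacy_hits > total_hits / 2;
}

/* Called once at init, before the first allocation on the fast path.
 * The hit counters are a hypothetical input (e.g. from a profiling run). */
void static_route_init(const uint64_t legacy_hits[TINY_CLASS_COUNT],
                       const uint64_t total_hits[TINY_CLASS_COUNT]) {
    for (size_t c = 0; c < TINY_CLASS_COUNT; c++) {
        g_static_route[c] = legacy_is_dominant(legacy_hits[c], total_hits[c])
                                ? ROUTE_LEGACY
                                : ROUTE_LEARNER;
    }
    g_static_route_ready = 1;
}

/* Hot path: a single array load, no snapshot and no learner call. */
tiny_route_t static_route_get(size_t class_idx) {
    assert(g_static_route_ready && class_idx < TINY_CLASS_COUNT);
    return g_static_route[class_idx];
}
```

Classes that are not LEGACY-dominant fall back to `ROUTE_LEARNER` here, so the existing learner path would still decide for them; whether the real design wants that fallback is an open question for the design notes.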
### C1 (Priority: 🟡 Medium): TLS Cache Prefetch

**Aim**: prefetch the **TLS cache** (Unified Cache) that the allocation path actually touches, rather than the policy.

**Design notes**: `docs/analysis/PHASE3_C1_TLS_PREFETCH_1_DESIGN.md`

**Implementation**:

```c
// Inside malloc_tiny_fast_for_class(), only when the route is LEGACY:
__builtin_prefetch(&g_unified_cache[class_idx], 0, 3);
```

**Expected gain**: +2-4%

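To show the prefetch in context, here is a minimal self-contained sketch. The `tiny_cache_t` layout, the slot count, and the `tiny_cache_push()` / `tiny_cache_pop()` helpers are hypothetical stand-ins, not the real Unified Cache; only `g_unified_cache` and the `__builtin_prefetch(..., 0, 3)` call come from the plan. It assumes GCC or Clang and C11 `_Thread_local`.

```c
#include <assert.h>
#include <stddef.h>

#define TINY_CLASS_COUNT 8
#define TINY_CACHE_SLOTS 16  /* hypothetical capacity */

/* Hypothetical stand-in for one per-class TLS cache entry. */
typedef struct {
    void  *slots[TINY_CACHE_SLOTS];
    size_t count;
} tiny_cache_t;

static _Thread_local tiny_cache_t g_unified_cache[TINY_CLASS_COUNT];

/* Push a block into the per-class cache; returns 0 when the cache is full. */
int tiny_cache_push(size_t class_idx, void *p) {
    assert(class_idx < TINY_CLASS_COUNT);
    tiny_cache_t *cache = &g_unified_cache[class_idx];
    if (cache->count == TINY_CACHE_SLOTS) return 0;
    cache->slots[cache->count++] = p;
    return 1;
}

/* Pop a block; the prefetch is issued only on the LEGACY route. */
void *tiny_cache_pop(size_t class_idx, int route_is_legacy) {
    assert(class_idx < TINY_CLASS_COUNT);
    tiny_cache_t *cache = &g_unified_cache[class_idx];
    if (route_is_legacy) {
        /* rw = 0 (read), locality = 3 (keep in all cache levels). */
        __builtin_prefetch(cache, 0, 3);
    }
    if (cache->count == 0) return NULL;
    return cache->slots[--cache->count];
}
```

The prefetch is a hint and never changes results, so the function behaves identically with it on or off. In this toy version the load follows the hint immediately, so it would gain little; the real benefit depends on how much independent work sits between the hint and the first access.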
### C2 (Priority: 🟡 Medium): Slab Metadata Cache Optimization

**Aim**: place hot metadata (policy, slab descriptor) in a closer, more cache-friendly location.

**Expected gain**: +5-10%

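One possible reading of "closer" is to pack the fields touched on every allocation into a single cache line. The sketch below illustrates that idea only; the field names, the `slab_hot_meta_t` type, and the 64-byte line size are assumptions, not the actual hakmem layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed line size; verify on the target CPU */

/* Hypothetical hot subset of slab metadata, aligned to one cache line. */
typedef struct {
    alignas(CACHE_LINE_SIZE) void *freelist_head;  /* next free block */
    uint32_t free_count;                           /* blocks remaining */
    uint16_t class_idx;                            /* size class C0-C7 */
    uint8_t  route;                                /* cached route decision */
    uint8_t  flags;
} slab_hot_meta_t;

/* Compile-time checks: the hot fields occupy exactly one aligned line. */
static_assert(sizeof(slab_hot_meta_t) == CACHE_LINE_SIZE,
              "hot metadata must fill exactly one cache line");
static_assert(alignof(slab_hot_meta_t) == CACHE_LINE_SIZE,
              "hot metadata must be cache-line aligned");

/* Runtime check that a given instance starts on a line boundary. */
int slab_hot_meta_is_line_aligned(const slab_hot_meta_t *m) {
    return ((uintptr_t)m % CACHE_LINE_SIZE) == 0;
}
```

Cold fields (statistics, debug state) would live in a separate struct so they do not share the hot line.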
## Implementation Flow

```
1. C3 (Static Routing): implement & A/B test
   ├─ GO: make it the default
   └─ NO-GO: freeze

2. C1 (TLS Prefetch): implement on top
   └─ Cumulative test

3. C2 (Metadata Optimization)
   └─ Final A/B
```

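Each GO / NO-GO decision in the flow comes down to comparing mean throughput across runs. A minimal sketch of that comparison follows; the helper names and the idea of a caller-supplied minimum gain are assumptions, since the plan does not define a GO criterion.

```c
#include <assert.h>
#include <stddef.h>

/* Mean throughput (ops/s) over n benchmark runs. */
double mean_ops(const double *runs, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += runs[i];
    return sum / (double)n;
}

/* Relative gain of the candidate over the baseline, in percent. */
double gain_percent(const double *base, size_t nb,
                    const double *cand, size_t nc) {
    double b = mean_ops(base, nb);
    assert(b > 0.0);
    return (mean_ops(cand, nc) - b) / b * 100.0;
}

/* Hypothetical GO rule: the gain must reach a caller-chosen minimum. */
int ab_is_go(double gain_pct, double min_gain_pct) {
    return gain_pct >= min_gain_pct;
}
```

For example, feeding the 10 Mixed runs with the gate off as `base` and the 10 runs with it on as `cand` yields the percentage to compare against the expected gain. A mean alone hides run-to-run variance, so it is worth reporting the spread as well.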
## Safety Checks

- **LD mode**: static route is disabled in LD environments (the policy learner stays active under LD).
- **Lock depth**: not needed, since this is on the malloc side.
- **Rollback**: can be switched off immediately via the ENV gate.

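The LD-mode and rollback rules above can be folded into one gate function. This is a sketch under assumptions: the function names and the `ld_mode` parameter are hypothetical, and treating any value other than `"1"` as off is one plausible reading of "default 0". Only the variable name `HAKMEM_TINY_STATIC_ROUTE` comes from the plan.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The gate is requested only when the variable is exactly "1". */
int static_route_env_requested(const char *value) {
    return value != NULL && strcmp(value, "1") == 0;
}

/* Pure decision function: LD mode always wins and disables static routing. */
int static_route_gate_for(const char *env_value, int ld_mode) {
    if (ld_mode) return 0;
    return static_route_env_requested(env_value);
}

/* Reads the real environment; intended to be called once at init and cached. */
int static_route_gate(int ld_mode) {
    return static_route_gate_for(getenv("HAKMEM_TINY_STATIC_ROUTE"), ld_mode);
}
```

Keeping the decision separate from `getenv()` makes the rule testable without touching the process environment, and caching the result at init keeps `getenv()` off the allocation path.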
## Next Actions

**Right now**:

1. Run `perf record -F 99 ./bench_random_mixed_hakmem` to identify hot spots.
2. Quantify the overhead of policy_snapshot / learner evaluation.
3. Start implementing C3 static route detection.

**After that**:

- Confirm +5-8% with the A/B test (Mixed, 10 runs).
- Introduce C1/C2 in stages.

---

**The leader's back is now in sight; from here, we dig deeper.**