# Phase 3 C2: Slab Metadata Cache Optimization Design Memo

## Goal

The free path accounts for **29-31%** of the total, so this phase aims to improve the cache locality of metadata accesses (policy snapshot, slab descriptor).

**Expected**: **+5-10%** improvement

---

## Results (A/B)

**Verdict**: 🔬 NEUTRAL (kept as a research box, default OFF)

- Mixed 10-run:
  - Baseline (`HAKMEM_TINY_METADATA_CACHE=0`): avg **40.43M** / median **40.72M**
  - Optimized (`HAKMEM_TINY_METADATA_CACHE=1`): avg **40.25M** / median **40.29M**
  - Delta: avg **-0.45%** / median **-1.06%**

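The delta figures can be reproduced from the averages and medians listed above. The helper below is a minimal sketch; `delta_percent` is an illustrative name, not part of the HAKMEM code base.

```c
/* Relative change of `optimized` against `baseline`, in percent.
 * Negative values mean the optimized build was slower. */
static double delta_percent(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}
```

With the numbers from this memo, `delta_percent(40.43, 40.25)` is about -0.45 and `delta_percent(40.72, 40.29)` is about -1.06.
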
**Reasons (summary)**:

- policy hot cache: the probe cost of the learner interlock cancels out the gain
- first page cache: the lookup was not actually the dominant cost (the free side is already light)
- bounds check: compiler optimization was already taking care of it

---

## Observations and Rationale

### Current memory access pattern (free path)

```
free(ptr)
  → tiny_free_fast()
    ① Fetch TLS g_unified_cache[class]             [L1 miss probability: ~5%]
    ② policy_snapshot() / static_route              [L1 miss probability: ~2%]
    ③ slab descriptor lookup (superslab, segment)   [L1 miss probability: ~8%]
    ④ slots[] array write                           [L1 miss probability: ~10%]
```

**Bottleneck**: ③ slab descriptor lookup
- Tiny classes (C0-C3): `tiny_legacy_fallback_free_base()`
- Larger classes: `small_heap_free()` → policy → route → handler

### Improvement targets

1. **Policy struct**: currently global + TLS cache
   - Ideal: replicate only the hot members (route_kind[8], learner_v7_enabled) into TLS

2. **Slab descriptor**: `SmallSegment` or `SmallPageMeta` lookup
   - Current: 64-byte-aligned metadata struct (spans multiple cache lines)
   - Proposal: keep the first page descriptor in an inline TLS context (a "second tier" cache for C0-C7)

3. **Unified cache slot[]**: already in TLS, but head/tail updates are frequent because of the array bounds check
   - Proposal: make `capacity` a macro constant so the bounds check compiles out

---

## Box Theory (implementation plan)

### L0: Env (reversible)

```
HAKMEM_TINY_METADATA_CACHE=0/1  # default: 0 (OFF)
```

- Purpose: turn hot metadata placement on/off (immediate revert on regression)

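An ENV gate of this kind is typically read once and cached. The following is a minimal sketch, not the actual implementation: `tiny_metadata_cache_enabled()` is the name used later in this memo, but its body here and the helper `tiny_metadata_cache_parse()` are assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical parser: only the exact string "1" turns the feature ON.
 * Unset, empty, "0", or anything else leaves it OFF (the default). */
static inline int tiny_metadata_cache_parse(const char *value) {
    return (value != NULL && value[0] == '1' && value[1] == '\0') ? 1 : 0;
}

/* Reads HAKMEM_TINY_METADATA_CACHE on the first call and caches the result.
 * The unsynchronized static is a benign race here: every thread computes
 * the same value from the same environment. */
static inline int tiny_metadata_cache_enabled(void) {
    static int cached = -1;  /* -1 = environment not read yet */
    if (__builtin_expect(cached < 0, 0)) {
        cached = tiny_metadata_cache_parse(getenv("HAKMEM_TINY_METADATA_CACHE"));
    }
    return cached;
}
```

Because the value is cached, a later `putenv()` in a benchmark would not be seen by this sketch. A real gate would need an explicit re-sync hook for that case.
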
### L1: MetadataCacheBox (boundaries: 2)

#### 1.1 Policy Hot Cache

**Responsibility**: skip policy_snapshot() on the free path

```c
// Added to tiny_env_box.h
#include <stdint.h>

typedef struct {
    uint8_t route_kind[8];       // C0-C7 route (copied from policy, not learner-synced)
    uint8_t learner_v7_enabled;  // Boolean: is learner v7 active?
} TinyPolicyHot;  // 9 bytes packed, fits in 16-byte slot

extern __thread TinyPolicyHot g_policy_hot;
```

**Initialization**:
- At malloc/free entry, call `tiny_policy_hot_refresh()` (instead of policy_snapshot())
- Disabled while learner_v7 is enabled (the learner updates route_kind dynamically)

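The refresh step could look roughly like the sketch below. It is self-contained for illustration only: `DemoPolicy` is a hypothetical stand-in for the full policy struct (whose real layout is not shown in this memo), and `g_policy_hot_valid`, `tiny_policy_hot_refresh_from()` and `tiny_route_for_class()` are assumed names.

```c
#include <stdint.h>
#include <string.h>

#define TINY_NUM_CLASSES 8  /* C0-C7 */

/* Hypothetical stand-in for the full (cold + hot) policy struct. */
typedef struct {
    uint8_t route_kind[TINY_NUM_CLASSES];
    uint8_t learner_v7_enabled;
    /* ... cold members would follow in the real struct ... */
} DemoPolicy;

typedef struct {
    uint8_t route_kind[TINY_NUM_CLASSES];
    uint8_t learner_v7_enabled;
} TinyPolicyHot;

static __thread TinyPolicyHot g_policy_hot;
static __thread int g_policy_hot_valid;  /* 0 = use the full policy path */

/* Copy the hot members into TLS. The copy is marked invalid while the
 * learner is active, because the learner may rewrite route_kind later. */
static void tiny_policy_hot_refresh_from(const DemoPolicy *p) {
    memcpy(g_policy_hot.route_kind, p->route_kind, sizeof g_policy_hot.route_kind);
    g_policy_hot.learner_v7_enabled = p->learner_v7_enabled;
    g_policy_hot_valid = !p->learner_v7_enabled;
}

/* Route lookup: TLS copy when valid, otherwise the full policy. */
static uint8_t tiny_route_for_class(const DemoPolicy *p, unsigned class_idx) {
    if (g_policy_hot_valid) {
        return g_policy_hot.route_kind[class_idx];
    }
    return p->route_kind[class_idx];
}
```

The validity flag keeps the learner case on the existing path, which matches the rule above that the hot cache is disabled while learner_v7 is enabled.
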
#### 1.2 Slab First Page Inline

**Responsibility**: raise the "first hit" rate of the slab descriptor lookup

```c
// Added to tiny_front_hot_box.h (TLS context)
#include <stdint.h>

typedef struct {
    // Tiny LEGACY cache
    void*    first_page_base;        // Current page pointer (avoid superslab lookup)
    uint16_t first_page_free_count;  // Free slots in current page
} TinyFirstPageCache;

extern __thread TinyFirstPageCache g_first_page_cache[8];  // Per-class
```

**Lifecycle**:
- On refill: first_page_cache[class] = new page
- On retire: first_page_cache[class] = NULL (updated on the next refill)

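The lifecycle above can be sketched as three small helpers. `tiny_first_page_cache_hit()` and `tiny_first_page_cache_update()` are named in the Patch 2 file list, but their signatures here are assumptions; `tiny_first_page_cache_invalidate()` and `DEMO_PAGE_SIZE` are hypothetical, since the memo does not give the real page size.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* assumption: real page size not stated */

typedef struct {
    void    *first_page_base;
    uint16_t first_page_free_count;
} TinyFirstPageCache;

static __thread TinyFirstPageCache g_first_page_cache[8];  /* per class */

/* Refill: remember the new page for this class. */
static inline void tiny_first_page_cache_update(unsigned class_idx,
                                                void *page_base,
                                                uint16_t free_count) {
    g_first_page_cache[class_idx].first_page_base = page_base;
    g_first_page_cache[class_idx].first_page_free_count = free_count;
}

/* Retire: clear the entry only if it refers to the retired page. */
static inline void tiny_first_page_cache_invalidate(unsigned class_idx,
                                                    const void *page_base) {
    if (g_first_page_cache[class_idx].first_page_base == page_base) {
        g_first_page_cache[class_idx].first_page_base = NULL;
        g_first_page_cache[class_idx].first_page_free_count = 0;
    }
}

/* Hit test: a single unsigned compare covers both range bounds, because
 * a ptr below base wraps around to a value far larger than the page size. */
static inline int tiny_first_page_cache_hit(unsigned class_idx, const void *ptr) {
    uintptr_t base = (uintptr_t)g_first_page_cache[class_idx].first_page_base;
    if (base == 0) {
        return 0;
    }
    return ((uintptr_t)ptr - base) < DEMO_PAGE_SIZE;
}
```

Clearing the entry on retire is what keeps a stale base pointer from matching a page that has since been reused for another class.
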
### L2: Integration Points (2 places)

#### 2.1 Free path (target)

```c
// tiny_legacy_fallback_box.h
inline void tiny_legacy_fallback_free_base(...) {
    if (tiny_metadata_cache_enabled()) {
        // Fast: check the cached first page.
        // Compare as integers; arithmetic on void* is not standard C.
        uintptr_t base = (uintptr_t)g_first_page_cache[class].first_page_base;
        if (base != 0 &&                       // NULL after retire: always a miss
            (uintptr_t)ptr >= base &&
            (uintptr_t)ptr < base + PAGE_SIZE) {
            // Hit: update counter, push to unified cache
            ...
            return;
        }
    }
    // Slow: standard superslab lookup
    ...
}
```

#### 2.2 Policy hot cache refresh

```c
// malloc/free wrapper
if (__builtin_expect(small_policy_v7_version_changed(), 0)) {
    if (tiny_metadata_cache_enabled()) {
        tiny_policy_hot_refresh();  // Sync route_kind[8] from policy
    }
}
```

---

## Implementation Instructions (staged patches)

### Patch 1: Policy Hot Cache (small)

**Files**:
- `core/box/tiny_metadata_cache_env_box.h` (new)
  - ENV gate
  - `TinyPolicyHot` struct

- `core/box/tiny_metadata_cache_hot_box.h` (new)
  - API: `tiny_policy_hot_refresh()`
  - TLS: `extern __thread TinyPolicyHot g_policy_hot;`

- `core/box/tiny_metadata_cache_hot_box.c` (new)
  - Implementation

**Conditions**:
- `tiny_metadata_cache_enabled() && route_kind != learner_active`
- `__builtin_expect(tiny_metadata_cache_enabled(), 0)` (rare; only when ON)

**Measurement target**: expect +1-3% in the A/B test

### Patch 2: First Page Inline Cache (medium)

**Files**:
- `core/front/tiny_first_page_cache.h` (new)
  - `TinyFirstPageCache` struct
  - `tiny_first_page_cache_hit()` inline check
  - `tiny_first_page_cache_update()` on refill

- `core/front/tiny_legacy_fallback_box.h` (modified)
  - Add the first_page_cache check to the free path

**Conditions**:
- LEGACY route only (not needed for MID/ULTRA)
- Updated automatically on refill

**Measurement target**: expect +3-7% with Patch 1 + 2

### Patch 3: Bounds Check Compile-out (minor optimization)

**Files**:
- `core/front/tiny_unified_cache.h` (modified)
  - Make capacity a macro constant (hardcode the Hot_2048 strategy)

**Conditions**:
- With a compile-time constant, the `& (2048-1)` mask is evaluated only once, at compile time

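The idea can be sketched with a fixed-capacity ring buffer whose index wrap is a constant mask. This is an illustration under assumptions: `DemoUnifiedCache`, `demo_cache_push()` and `demo_cache_pop()` are hypothetical names, and the real unified cache layout is not shown in this memo.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_CAPACITY 2048u                       /* power of two */
#define DEMO_CACHE_MASK     (DEMO_CACHE_CAPACITY - 1u)  /* folded at compile time */

_Static_assert((DEMO_CACHE_CAPACITY & DEMO_CACHE_MASK) == 0,
               "capacity must be a power of two");

typedef struct {
    void    *slots[DEMO_CACHE_CAPACITY];
    uint32_t head;  /* next slot to pop  */
    uint32_t tail;  /* next slot to push */
} DemoUnifiedCache;

/* Push one pointer; returns 0 when full. One slot stays empty so that
 * head == tail always means "empty". */
static inline int demo_cache_push(DemoUnifiedCache *c, void *p) {
    uint32_t next = (c->tail + 1u) & DEMO_CACHE_MASK;
    if (next == c->head) {
        return 0;
    }
    c->slots[c->tail] = p;
    c->tail = next;
    return 1;
}

/* Pop the oldest pointer; returns NULL when empty. */
static inline void *demo_cache_pop(DemoUnifiedCache *c) {
    if (c->head == c->tail) {
        return NULL;
    }
    void *p = c->slots[c->head];
    c->head = (c->head + 1u) & DEMO_CACHE_MASK;
    return p;
}
```

Since the mask is a literal constant, the wrap needs no load of a runtime `capacity` field and no division. As the results section notes, the compiler was already achieving this effect in the measured build.
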
**Measurement target**: expect +4-9% with Patch 1+2+3

---

## A/B (GO/NO-GO)

### Test Plan

**Profile**: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE`

**Baseline**: Patch 0 (current: +2.20% from C3)
```
HAKMEM_TINY_METADATA_CACHE=0  (default OFF)
```

**Optimized**: Patch 1+2 (required) + Patch 3 (optional)
```
HAKMEM_TINY_METADATA_CACHE=1  (metadata cache ON)
```

**Conditions**:
- Measure with Mixed (10-run)
- Learner disabled (policy_hot is disabled when the learner is ON)

**Decision criteria**:
- **GO**: +1.0% or more
- **NEUTRAL**: within ±1.0% → keep as a research box (Patch 1+2 stay default OFF)
- **NO-GO**: -1.0% or less → FREEZE (disabled via the ENV gate)

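The thresholds above map onto a small classifier. This is a sketch with hypothetical names (`AbVerdict`, `ab_classify`). It assigns the exact boundary values to GO and NO-GO, following the "or more" / "or less" wording. The memo does not say whether the average or the median decides; the reported average delta of -0.45% classifies as NEUTRAL, while the median of -1.06% would land on the NO-GO side, so the verdict appears to be based on the average.

```c
typedef enum { AB_GO, AB_NEUTRAL, AB_NO_GO } AbVerdict;

/* delta_percent: relative change of the optimized build vs. baseline.
 * >= +1.0 is GO, <= -1.0 is NO-GO, anything in between is NEUTRAL. */
static AbVerdict ab_classify(double delta_percent) {
    if (delta_percent >= 1.0) {
        return AB_GO;
    }
    if (delta_percent <= -1.0) {
        return AB_NO_GO;
    }
    return AB_NEUTRAL;
}
```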

---

## Risk Assessment

### Safety checks

| Risk | Mitigation |
|------|------------|
| Policy hot cache goes stale | Active only while the learner is disabled (OFF otherwise); bench putenv sync is in place |
| First page cache invalid | Explicit invalidate on refill/retire |
| Bounds check miss | Compiled out via macro hardcode (type-safe) |
| Lock depth | Not needed, since this is the free path |

### Rollback

- Disable immediately with `HAKMEM_TINY_METADATA_CACHE=0`
- ENV gate only (no code removal needed)

---

## Basis for the Expected Gain

**Why +5-10%?**

1. Policy hot cache: policy_snapshot() bypass → -2 memory ops
   - Expected: +1-2%

2. First page inline: superslab lookup bypass → -1-2 memory ops
   - Expected: +2-4%

3. Bounds check compile-out: fewer modulo operations
   - Expected: +0.5-1%

**Total**: +3.5-7% (a conservative estimate is +3-5%)

---

## Implementation Schedule (estimate)

1. **Patch 1 implementation** (15-20 min)
   - Create the Box files
   - Cache refresh logic
   - ENV gate

2. **Patch 2 implementation** (20-30 min)
   - First page inline cache
   - Free path integration

3. **Build & Test** (10 min)
   - Confirm it compiles
   - Sanity benchmark

4. **A/B Test** (15-20 min)
   - 10-run Mixed
   - Statistical analysis

5. **Commit & Summary** (5 min)
   - GO/NO-GO decision
   - Documentation update

**Total**: about 65-85 min

---

## Non-goals

- Changing the routing algorithm (the metadata cache is only a "hint")
- Fixing the interaction with the learner (the cache works only while the learner is disabled)
- Cold path optimization (C2 is hot-path focused)