374 lines
9.3 KiB
Markdown
374 lines
9.3 KiB
Markdown
|
|
# Phase 8 戦略: Ultrathink Analysis
|
|||
|
|
|
|||
|
|
## 🔍 調査結果サマリー
|
|||
|
|
|
|||
|
|
### 完全なOverhead内訳(1M × 16B test)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Total RSS at peak: 30.6 MB
|
|||
|
|
Expected data: 22.9 MB (7.6 ptr + 15.3 allocs)
|
|||
|
|
Total overhead: 7.7 MB
|
|||
|
|
|
|||
|
|
完全な内訳:
|
|||
|
|
├─ Reserved SuperSlabs: 4.0 MB (52%) ← Phase 7.6 意図的設計
|
|||
|
|
├─ SuperSlab fragmentation: 1.0 MB (13%) ← 2MB alignment必須
|
|||
|
|
├─ Program baseline: 2.0 MB (26%) ← libc + global構造体
|
|||
|
|
├─ Slab metadata: 0.4 MB (5%)
|
|||
|
|
├─ TLS structures: 0.1 MB (1%)
|
|||
|
|
└─ その他: 0.2 MB (3%)
|
|||
|
|
──────────────────────────────────────
|
|||
|
|
Total identified: 7.7 MB (100%) ✅ 完全解明!
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Gap to mimalloc分析
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
HAKMEM (1M scale): 32.9 MB
|
|||
|
|
mimalloc: 25.1 MB
|
|||
|
|
Gap: 7.8 MB
|
|||
|
|
|
|||
|
|
Gap内訳:
|
|||
|
|
├─ Reserved SuperSlabs: 4.0 MB (51%) ← 最大の犯人
|
|||
|
|
├─ Magazine cache想定: 2.0 MB (26%) ← Flush後は0だが、運用時残存
|
|||
|
|
├─ その他 overhead: 1.8 MB (23%) ← 削減困難
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🎯 Phase 7.6設計の功罪
|
|||
|
|
|
|||
|
|
### EMPTY_SUPERSLAB_RESERVE = 2 の影響
|
|||
|
|
|
|||
|
|
**定義場所:** `hakmem_tiny.c:111-113`
|
|||
|
|
```c
|
|||
|
|
#define EMPTY_SUPERSLAB_RESERVE 2 // Keep up to N empty SuperSlabs per class
|
|||
|
|
static SuperSlab* g_empty_superslabs[TINY_NUM_CLASSES];
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**メリット(Phase 7.6で追加された理由):**
|
|||
|
|
1. Re-allocation高速化: mmap syscall不要
|
|||
|
|
2. Fragmentation削減: 既存SuperSlab再利用
|
|||
|
|
3. Latency安定化: メモリ確保コスト排除
|
|||
|
|
|
|||
|
|
**デメリット:**
|
|||
|
|
1. **4MB 常時overhead** ← mimalloc比較で致命的
|
|||
|
|
2. Small workload時のメモリ無駄
|
|||
|
|
3. Idle時にメモリ返却されない
|
|||
|
|
|
|||
|
|
### Phase 7.6での判断基準(当時)
|
|||
|
|
|
|||
|
|
**当時の状況:**
|
|||
|
|
- Phase 7.6前: 40.9 MB
|
|||
|
|
- Phase 7.6後: 33.0 MB (-8 MB削減成功)
|
|||
|
|
- 目標: SuperSlab dynamic deallocation実現
|
|||
|
|
|
|||
|
|
**Phase 7.6の成果:**
|
|||
|
|
- Empty SuperSlab検出: ✅ 成功
|
|||
|
|
- Dynamic deallocation: ✅ 成功
|
|||
|
|
- Reserve設計: ✅ 性能維持
|
|||
|
|
|
|||
|
|
**Phase 7.6では正しい設計だった:**
|
|||
|
|
- 8MB削減達成(reserve 4MBを考慮しても net -4MB)
|
|||
|
|
- 性能維持(re-allocation penalty無し)
|
|||
|
|
|
|||
|
|
**Phase 8で再検討すべき:**
|
|||
|
|
- mimalloc比較で4MBは大きすぎる
|
|||
|
|
- Reserve戦略の見直しが必要
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 📊 Magazine Cache実態調査
|
|||
|
|
|
|||
|
|
### 現状設定(hakmem_tiny.c:79-150)
|
|||
|
|
|
|||
|
|
```c
|
|||
|
|
#define TINY_TLS_MAG_CAP 2048
|
|||
|
|
|
|||
|
|
static inline int tiny_effective_cap(int class_idx) {
|
|||
|
|
case 0: return 2048; // Class 0 (16B): 2048 × 16B = 32 KB
|
|||
|
|
case 1: return 1024; // Class 1 (32B): 1024 × 32B = 32 KB
|
|||
|
|
case 2: return 768;
|
|||
|
|
case 3: return 512;
|
|||
|
|
case 4: case 5: return 256;
|
|||
|
|
case 6: return 128;
|
|||
|
|
default: return 64;
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Magazine Memory Usage(最大時)
|
|||
|
|
|
|||
|
|
| Class | Block Size | Capacity | Max Memory | Typical Usage |
|
|||
|
|
|-------|-----------|----------|------------|---------------|
|
|||
|
|
| 0 | 16B | 2048 | 32 KB | < 1% (調査結果) |
|
|||
|
|
| 1 | 32B | 1024 | 32 KB | < 1% |
|
|||
|
|
| 2 | 64B | 768 | 48 KB | < 5% |
|
|||
|
|
| 3 | 128B | 512 | 64 KB | < 10% |
|
|||
|
|
| 4 | 256B | 256 | 64 KB | 適正? |
|
|||
|
|
| 5 | 512B | 256 | 128 KB | 適正? |
|
|||
|
|
| 6 | 1KB | 128 | 128 KB | 適正? |
|
|||
|
|
| 7 | 2KB | 64 | 128 KB | 適正? |
|
|||
|
|
| **Total** | | | **624 KB** | **実使用 < 10%** |
|
|||
|
|
|
|||
|
|
**発見:**
|
|||
|
|
- Class 0-3が過剰確保(usage < 1-10%)
|
|||
|
|
- Class 4-7は適正規模
|
|||
|
|
- Total 624 KB確保で実使用 < 62 KB ← **90% waste!**
|
|||
|
|
|
|||
|
|
**1M test時のMagazine残存想定:**
|
|||
|
|
- Worst case: 2048 blocks (class 0) = 32 KB
|
|||
|
|
- Typical: 200-500 blocks = 3-8 KB
|
|||
|
|
- Best case (flush後): 0 KB
|
|||
|
|
|
|||
|
|
**つまり、Magazine cacheは実はそこまで大きくない!**
|
|||
|
|
- Peak時でも 32 KB程度
|
|||
|
|
- Gap 7.8 MBの 0.4% にすぎない
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 💡 Phase 8 戦略の抜本的見直し
|
|||
|
|
|
|||
|
|
### 従来の想定(調査前)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Gap 7.8 MB の内訳(想定):
|
|||
|
|
├─ Magazine cache: 4 MB (51%) ← 間違い!
|
|||
|
|
├─ System overhead: 3 MB (38%)
|
|||
|
|
└─ その他: 0.8 MB (11%)
|
|||
|
|
|
|||
|
|
Phase 8 計画(従来):
|
|||
|
|
→ Two-level Magazine で -3-4 MB削減
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 実態(調査後)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Gap 7.8 MB の内訳(実測):
|
|||
|
|
├─ Reserved SuperSlabs: 4 MB (51%) ← 最大の犯人!
|
|||
|
|
├─ Magazine cache: 0.03 MB (0.4%) ← ほぼ無視できる
|
|||
|
|
├─ Fragmentation: 1 MB (13%)
|
|||
|
|
├─ Program baseline: 2 MB (26%)
|
|||
|
|
└─ その他: 0.77 MB (10%)
|
|||
|
|
|
|||
|
|
Phase 8 真の課題:
|
|||
|
|
→ Reserved SuperSlabs 4MB をどう削減するか
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🚀 Phase 8 戦略オプション(Ultrathink)
|
|||
|
|
|
|||
|
|
### Option A: Reserved SuperSlabs削減 ⭐⭐⭐⭐⭐
|
|||
|
|
|
|||
|
|
**アプローチ:**
|
|||
|
|
```c
|
|||
|
|
// Before (Phase 7.6):
|
|||
|
|
#define EMPTY_SUPERSLAB_RESERVE 2 // 4 MB overhead
|
|||
|
|
|
|||
|
|
// After (Phase 8):
|
|||
|
|
#define EMPTY_SUPERSLAB_RESERVE 0 // 0 MB overhead
|
|||
|
|
// または
|
|||
|
|
#define EMPTY_SUPERSLAB_RESERVE 1 // 2 MB overhead(妥協案)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**期待効果:**
|
|||
|
|
- Reserve 2 → 0: **-4 MB削減** ✨
|
|||
|
|
- Reserve 2 → 1: **-2 MB削減**
|
|||
|
|
- Gap: 7.8 MB → 3.8-5.8 MB
|
|||
|
|
|
|||
|
|
**懸念点:**
|
|||
|
|
1. Re-allocation性能低下
|
|||
|
|
- mmap syscall頻度増加(~5 μs/call)
|
|||
|
|
- 影響: Alloc/free cycling workload
|
|||
|
|
- 実測必要
|
|||
|
|
|
|||
|
|
2. Fragmentation増加可能性
|
|||
|
|
- 新SuperSlab確保 → 古いの未使用
|
|||
|
|
- 実装次第で回避可能
|
|||
|
|
|
|||
|
|
**実装難易度:** ⭐ (1行変更)
|
|||
|
|
|
|||
|
|
**リスク:** ⭐⭐⭐ (性能影響要測定)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Option B: Adaptive Reserve ⭐⭐⭐⭐
|
|||
|
|
|
|||
|
|
**アプローチ:**
|
|||
|
|
```c
|
|||
|
|
// Workload監視して動的調整
|
|||
|
|
int adaptive_reserve_count(int class_idx) {
|
|||
|
|
if (alloc_rate_high && churn_rate_high) {
|
|||
|
|
return 2; // Hot workload: keep reserve
|
|||
|
|
}
|
|||
|
|
if (idle_time > 10ms) {
|
|||
|
|
return 0; // Idle: release reserve
|
|||
|
|
}
|
|||
|
|
return 1; // Normal: minimal reserve
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**期待効果:**
|
|||
|
|
- Idle時: -4 MB
|
|||
|
|
- Hot時: 性能維持
|
|||
|
|
- Best of both worlds!
|
|||
|
|
|
|||
|
|
**実装難易度:** ⭐⭐⭐⭐ (100-200 lines)
|
|||
|
|
|
|||
|
|
**リスク:** ⭐⭐ (Tuning必要)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Option C: Two-level Magazine (従来計画) ⭐⭐⭐
|
|||
|
|
|
|||
|
|
**再評価:**
|
|||
|
|
- **想定効果: -3-4 MB** ← 間違い!
|
|||
|
|
- **実際効果: -0.03 MB** ← Magazine cacheは小さい
|
|||
|
|
|
|||
|
|
**結論:** Two-level Magazineは**Gap削減にほぼ効果なし**
|
|||
|
|
|
|||
|
|
**ただし別のメリットあり:**
|
|||
|
|
1. Locality向上(Hot cache 256 → L1ヒット率向上)
|
|||
|
|
2. Spill頻度削減(性能向上)
|
|||
|
|
3. TLS memory削減(624 KB → 200 KB)
|
|||
|
|
|
|||
|
|
**Phase 8での優先度:** ⭐⭐⭐ (性能向上目的なら価値あり)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Option D: SuperSlab Alignment最適化 ⭐⭐
|
|||
|
|
|
|||
|
|
**現状:**
|
|||
|
|
- 2MB alignment必須
|
|||
|
|
- Fragmentation: 1 MB (13%)
|
|||
|
|
|
|||
|
|
**改善案:**
|
|||
|
|
1. Partial SuperSlab release(slab単位 64KB)
|
|||
|
|
2. Dynamic alignment(small workloadは1MB)
|
|||
|
|
|
|||
|
|
**期待効果:** -0.5-1 MB
|
|||
|
|
|
|||
|
|
**実装難易度:** ⭐⭐⭐⭐⭐ (高リスク・低効果)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 📋 推奨ロードマップ
|
|||
|
|
|
|||
|
|
### Phase 8.1: Reserved SuperSlabs削減実験(1-2日)
|
|||
|
|
|
|||
|
|
**Step 1: Reserved 2 → 1 に削減**
|
|||
|
|
```c
|
|||
|
|
#define EMPTY_SUPERSLAB_RESERVE 1 // 2 → 1
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**測定項目:**
|
|||
|
|
1. Battle test memory: 期待 32.9 → 30.9 MB (-2 MB)
|
|||
|
|
2. Re-allocation benchmark性能影響
|
|||
|
|
3. Fragmentation変化
|
|||
|
|
|
|||
|
|
**判定基準:**
|
|||
|
|
- 性能低下 < 5%: OK → Reserve 1 → 0 検討
|
|||
|
|
- 性能低下 > 5%: NG → Adaptive Reserve検討
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Phase 8.2: Reserved 1 → 0 または Adaptive実装(2-3日)
|
|||
|
|
|
|||
|
|
**Option 2a: Reserved 0(シンプル)**
|
|||
|
|
```c
|
|||
|
|
#define EMPTY_SUPERSLAB_RESERVE 0
|
|||
|
|
```
|
|||
|
|
- 期待: 32.9 → 28.9 MB (-4 MB)
|
|||
|
|
- Gap: 7.8 → 3.8 MB (**51% gap close!**)
|
|||
|
|
|
|||
|
|
**Option 2b: Adaptive Reserve(高度)**
|
|||
|
|
- Idle検出実装
|
|||
|
|
- Alloc rate監視
|
|||
|
|
- 期待: Idle時 -4 MB, Hot時性能維持
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
### Phase 8.3: Two-level Magazine(性能最適化)(3-4日)
|
|||
|
|
|
|||
|
|
**目的変更:** Gap削減 → **性能向上**
|
|||
|
|
|
|||
|
|
**設計:**
|
|||
|
|
```
|
|||
|
|
Hot Magazine (256 cap, TLS)
|
|||
|
|
├─ L1キャッシュ最適化
|
|||
|
|
├─ Spill頻度 1/8 → 1/1(常にfull活用)
|
|||
|
|
└─ Locality向上
|
|||
|
|
|
|||
|
|
Cold Magazine (削除 or 最小限)
|
|||
|
|
└─ 不要(Reserved SuperSlabsで代替)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**期待効果:**
|
|||
|
|
- Memory: 変化なし
|
|||
|
|
- Performance: +5-10%(locality向上)
|
|||
|
|
- TLS usage: -400 KB
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🎯 Phase 8 最終目標
|
|||
|
|
|
|||
|
|
### Conservative Plan(Reserved 1)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Phase 8.1完了:
|
|||
|
|
HAKMEM: 30.9 MB (-2 MB)
|
|||
|
|
Gap: 5.8 MB (23%)
|
|||
|
|
|
|||
|
|
Phase 8.2 (Two-level):
|
|||
|
|
HAKMEM: 30.5 MB (-0.4 MB, TLS削減)
|
|||
|
|
Gap: 5.4 MB (22%)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Aggressive Plan(Reserved 0)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Phase 8.1完了:
|
|||
|
|
HAKMEM: 28.9 MB (-4 MB) ✨
|
|||
|
|
Gap: 3.8 MB (15%)
|
|||
|
|
|
|||
|
|
Phase 8.2 (Two-level):
|
|||
|
|
HAKMEM: 28.5 MB (-0.4 MB)
|
|||
|
|
Gap: 3.4 MB (14%) 🎯
|
|||
|
|
|
|||
|
|
🏆 mimalloc比 +14% = 許容範囲!
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## ✅ 結論
|
|||
|
|
|
|||
|
|
### 調査結果
|
|||
|
|
|
|||
|
|
1. **Gap 7.8 MBの51%はReserved SuperSlabs(Phase 7.6設計)**
|
|||
|
|
2. **Magazine cacheは想定の1/100程度(0.4%)**
|
|||
|
|
3. **Two-level Magazineはメモリ削減効果ほぼゼロ**
|
|||
|
|
|
|||
|
|
### Phase 8戦略
|
|||
|
|
|
|||
|
|
**Priority 1:** Reserved SuperSlabs削減 ⭐⭐⭐⭐⭐
|
|||
|
|
- 最大効果: -4 MB(Gap 51%削減)
|
|||
|
|
- 最小リスク: 1行変更
|
|||
|
|
- 要測定: 性能影響
|
|||
|
|
|
|||
|
|
**Priority 2:** Two-level Magazine(目的変更)⭐⭐⭐
|
|||
|
|
- 効果: 性能向上+5-10%(メモリ効果minimal)
|
|||
|
|
- 意義: Locality最適化
|
|||
|
|
|
|||
|
|
**Priority 3:** Adaptive Reserve ⭐⭐⭐⭐
|
|||
|
|
- 効果: P1性能問題があればこれ
|
|||
|
|
- Best of both worlds
|
|||
|
|
|
|||
|
|
### Target
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Conservative: Gap 5.4 MB (22%)
|
|||
|
|
Aggressive: Gap 3.4 MB (14%) 🎯
|
|||
|
|
|
|||
|
|
🚀 Phase 8完了後、mimalloc比+14-22%
|
|||
|
|
= Production-ready!
|
|||
|
|
```
|