# Magazine Capacity Optimization Simulation

## Current State

- Magazine capacity: 2048 (class 0)
- Spill ratio: 1/2 (1024 blocks)
- Test: 1M allocs → 1M frees
- Problem: up to 2048 blocks remain in the Magazine → 2 phantom SuperSlabs

## Option 1: Reduce Capacity (2048 → 1024)

**Expected effect:**
- Phantom blocks: 2048 → 1024 (-50%)
- Phantom SuperSlabs: 2 → 1 (-2 MB)
- Spill frequency: 2× (performance impact)

**Pros:**
- Implementation: one-line change
- Risk: low

**Cons:**
- More frequent spills → more lock contention
- Performance loss: ~5-10% (rough estimate)

## Option 2: Change Spill Ratio (1/2 → 3/4)

**Expected effect:**
- Spill size: 1024 → 1536 blocks (+50%)
- Fewer blocks left in the Magazine
- Empty SuperSlabs are detected sooner

**Pros:**
- Implementation: one-line change
- Risk: low

**Cons:**
- Reduced Magazine benefit (fewer blocks stay cached after each spill)
- Higher spill overhead

## Option 3: Combined (1024 cap + 3/4 spill)

**Expected effect:**
- Phantom SuperSlabs: 2 → 0-1 (-2 to -4 MB)
- Blocks left in the Magazine: 256-512

**Trade-off:**
- Memory: -2 to -4 MB ✅
- Performance: -5 to -15% ❌

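The three options can be compared with a small replay of the free phase. The sketch below is a toy model, not HAKMEM's actual spill logic: it assumes a free pushes one block into the Magazine and that a full Magazine first spills `cap * num / den` blocks. The names `MagSimResult` and `mag_simulate_frees` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of replaying a run of frees through one Magazine. */
typedef struct {
    size_t residual; /* blocks still cached in the Magazine at the end */
    size_t spills;   /* spill operations (each would take the lock) */
    size_t spilled;  /* total blocks returned to the freelist */
} MagSimResult;

/* Toy model (assumption): each free pushes one block; when the Magazine
 * is already full, cap * num / den blocks are spilled before the push. */
static MagSimResult mag_simulate_frees(size_t cap, size_t num, size_t den,
                                       size_t frees) {
    assert(cap > 0 && den > 0 && num > 0 && num <= den);
    size_t spill_n = cap * num / den;
    assert(spill_n > 0);

    MagSimResult r = {0, 0, 0};
    for (size_t i = 0; i < frees; i++) {
        if (r.residual == cap) {
            r.residual -= spill_n;
            r.spilled += spill_n;
            r.spills++;
        }
        r.residual++;
    }
    return r;
}
```

Calling `mag_simulate_frees(2048, 1, 2, 1000000)` and `mag_simulate_frees(1024, 1, 2, 1000000)` gives 975 and 1952 spills under this model, in line with the "2×" estimate for Option 1. The residual count depends on where the free sequence happens to stop, so it should be read as a range rather than a fixed number.
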
## Evaluation

Changing the numbers alone **has limits**:
- Best case: 4 MB saved
- Remaining gap vs mimalloc: 3.9 MB (system overhead unresolved)
- Performance cost: large

---

# Options for Changing the Mechanism

## A. Magazine Flush API (moderate effort)

**Design:**
```c
void hak_tiny_magazine_flush(int class_idx) {
    // Spill every block in the Magazine to the freelist
    // → triggers SuperSlab empty detection
}
```

**Triggers:**
1. At test end (manual call)
2. On idle detection (automatic)
3. Under memory pressure (automatic)

**Expected effect:**
- Phantom SuperSlabs: 2 → 0 (-4 MB)
- Performance impact: near zero (flushes are infrequent)

**Implementation cost:**
- API addition: 50 lines
- Idle detection: 100 lines (optional)
- Testing: 1-2 hours

**Risks:**
- Forgetting to call the API (manual trigger)
- Idle-detection accuracy (automatic trigger)

**Rating:** ⭐⭐⭐⭐ (good cost-benefit)

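To make the flush idea concrete, here is a minimal sketch of draining a Magazine into an intrusive freelist. `ToyMagazine`, `ToyFreelist`, and `toy_magazine_flush` are hypothetical stand-ins, not HAKMEM's real structures. A real implementation would also take the freelist lock and update the owning SuperSlab's counters so that empty detection can fire. Blocks are assumed to be at least pointer-sized and suitably aligned.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAG_CAP 2048

/* Hypothetical per-class Magazine: a simple stack of cached blocks. */
typedef struct {
    void  *items[TOY_MAG_CAP];
    size_t top;
} ToyMagazine;

/* Intrusive freelist node stored inside each free block. */
typedef struct ToyFreeNode {
    struct ToyFreeNode *next;
} ToyFreeNode;

typedef struct {
    ToyFreeNode *head;
    size_t       count;
} ToyFreelist;

static void toy_freelist_push(ToyFreelist *fl, void *block) {
    ToyFreeNode *node = (ToyFreeNode *)block;
    node->next = fl->head;
    fl->head = node;
    fl->count++;
}

/* Move every cached block to the freelist; returns how many were moved.
 * Locking is omitted in this sketch. */
static size_t toy_magazine_flush(ToyMagazine *mag, ToyFreelist *fl) {
    assert(mag != NULL && fl != NULL);
    size_t flushed = 0;
    while (mag->top > 0) {
        toy_freelist_push(fl, mag->items[--mag->top]);
        flushed++;
    }
    return flushed;
}
```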

---

## B. Magazine-aware Empty Detection (high effort)

**Design:**
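```c
// Treat blocks that are merely cached in Magazines as free.
// Assumes total_active_blocks still counts blocks held in Magazines;
// those blocks must be removed from the Magazines before the release.
if (ss->total_active_blocks == magazine_blocks_from_ss(ss)) {
    superslab_free(ss);
}
```
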
**Challenges:**
- Requires a back-reference from Magazine to SuperSlab
- Must track which SuperSlab each block in the Magazine belongs to

**Expected effect:**
- Phantom SuperSlabs: 2 → 0 (-4 MB)
- Fully automatic (no API)

**Implementation cost:**
- Metadata added to the Magazine: 200 lines
- Empty-detection logic changes: 100 lines
- Testing: 4-6 hours

**Risks:**
- Higher Magazine overhead
- More complexity → more room for bugs

**Rating:** ⭐⭐⭐ (too complex)

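One way to avoid scanning Magazines is to keep a per-SuperSlab counter of blocks that are currently cached. The sketch below illustrates that bookkeeping under an assumption: `total_active_blocks` counts every block handed out by the slab, including blocks sitting in a Magazine. `ToySuperSlab` and the `toy_*` helpers are hypothetical names, and finding the owning SuperSlab for a block (the back-reference listed as a challenge above) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SuperSlab header with Magazine-aware accounting. */
typedef struct {
    uint32_t total_active_blocks; /* handed out by the slab, incl. cached */
    uint32_t cached_in_magazines; /* subset currently held in Magazines */
} ToySuperSlab;

/* The user freed a block and it was pushed into a Magazine. */
static void toy_on_free_to_magazine(ToySuperSlab *ss) {
    assert(ss->cached_in_magazines < ss->total_active_blocks);
    ss->cached_in_magazines++;
}

/* A cached block was handed back to the user from a Magazine. */
static void toy_on_alloc_from_magazine(ToySuperSlab *ss) {
    assert(ss->cached_in_magazines > 0);
    ss->cached_in_magazines--;
}

/* True when no block is in user hands; the cached blocks still have to
 * be purged from the Magazines before the SuperSlab is released. */
static bool toy_superslab_is_reclaimable(const ToySuperSlab *ss) {
    return ss->total_active_blocks == ss->cached_in_magazines;
}
```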

---

## C. Two-level Magazine (high effort, high performance)

**Design:**
```
Hot Magazine (256 cap, TLS-local, lock-free)
  ↓ spill
Cold Magazine (1792 cap, shared, locked)
  ↓ periodic flush (when idle)
Freelist
```

**Pros:**
- Hot path is extremely fast (256 cap)
- Cold part is flushed periodically → lower memory use
- Good balance of performance vs memory

**Implementation cost:**
- Hot/Cold Magazine implementation: 300 lines
- Flush policy: 150 lines
- Testing: 8-10 hours

**Rating:** ⭐⭐⭐⭐⭐ (well worth considering for Phase 8)

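The hot/cold flow in the diagram can be sketched as follows. This is an illustration only: the type and function names are hypothetical, the lock on the cold level is omitted, the freelist is reduced to a counter, and spilling half of the hot level (its most recently pushed entries) is an arbitrary policy chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define HOT_CAP  256
#define COLD_CAP 1792

typedef struct { void *items[HOT_CAP];  size_t top; } HotMagazine;  /* per thread */
typedef struct { void *items[COLD_CAP]; size_t top; } ColdMagazine; /* shared */

/* Free path: push into the hot level; when it is full, first move half of
 * it to the cold level. Blocks that do not fit in the cold level go to the
 * freelist, represented here by a counter. */
static void two_level_free(HotMagazine *hot, ColdMagazine *cold,
                           size_t *freelist_count, void *block) {
    assert(hot != NULL && cold != NULL && freelist_count != NULL);
    if (hot->top == HOT_CAP) {
        for (size_t i = 0; i < HOT_CAP / 2; i++) {
            void *b = hot->items[--hot->top];
            if (cold->top < COLD_CAP) {
                cold->items[cold->top++] = b;
            } else {
                (*freelist_count)++;
            }
        }
    }
    hot->items[hot->top++] = block;
}

/* Idle path: hand every cold block back to the freelist. */
static size_t cold_flush_on_idle(ColdMagazine *cold, size_t *freelist_count) {
    assert(cold != NULL && freelist_count != NULL);
    size_t n = cold->top;
    *freelist_count += n;
    cold->top = 0;
    return n;
}
```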

---

## D. Adaptive Magazine Capacity (moderate to high effort)

**Design:**
```c
// Monitor the allocation rate
if (alloc_rate_low && magazine_age > threshold) {
    shrink_magazine_capacity();  // 2048 → 512
}
if (alloc_rate_high) {
    grow_magazine_capacity();    // 512 → 2048
}
```

**Pros:**
- Memory is reclaimed automatically when idle
- Performance is preserved when busy

**Implementation cost:**
- Rate tracking: 100 lines
- Adaptive logic: 150 lines
- Tuning: 4-6 hours

**Rating:** ⭐⭐⭐⭐ (good, but hard to tune)

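The pseudocode above leaves the rate tracking open. One possible shape is a periodic tick that inspects an allocation counter, sketched below. All thresholds (`RATE_LOW`, `RATE_HIGH`, `IDLE_TICKS_BEFORE_SHRINK`) and the `AdaptiveMag` type are hypothetical and would need tuning. The gap between the low and high thresholds acts as hysteresis so that the capacity does not oscillate. A real shrink would also spill the blocks above the new capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAG_CAP_MIN 512
#define MAG_CAP_MAX 2048

/* Hypothetical tuning knobs; real values would come from measurement. */
#define RATE_LOW                 1000u
#define RATE_HIGH                100000u
#define IDLE_TICKS_BEFORE_SHRINK 3u

typedef struct {
    size_t   capacity;         /* current Magazine capacity */
    uint64_t allocs_in_window; /* allocations seen since the last tick */
    uint32_t idle_ticks;       /* consecutive low-rate ticks */
} AdaptiveMag;

/* Called once per sampling window; returns the capacity to use next. */
static size_t adaptive_tick(AdaptiveMag *m) {
    assert(m != NULL);
    assert(m->capacity >= MAG_CAP_MIN && m->capacity <= MAG_CAP_MAX);

    if (m->allocs_in_window >= RATE_HIGH) {
        m->capacity = MAG_CAP_MAX;      /* busy: restore full capacity */
        m->idle_ticks = 0;
    } else if (m->allocs_in_window <= RATE_LOW) {
        if (++m->idle_ticks >= IDLE_TICKS_BEFORE_SHRINK) {
            m->capacity = MAG_CAP_MIN;  /* idle long enough: shrink */
        }
    } else {
        m->idle_ticks = 0;              /* in between: keep capacity */
    }
    m->allocs_in_window = 0;
    return m->capacity;
}
```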

---

# Breaking Down the 6.0 MB of System Overhead

## Open Question
```
Total RSS: 33.0 MB
├─ User data:         15.3 MB ✅
├─ Test overhead:      7.6 MB ✅ (pointer array)
├─ Active SuperSlabs:  4.0 MB ✅ (Magazine cache)
└─ System overhead:    6.0 MB ⚠️ ← what is this?
```

## Possible Causes

### 1. Mid/Large Pool overhead
- Mid Pool (64KB-512KB): statically reserved?
- Large Pool (>512KB): Batch cache?
- **To investigate:** how much memory do the Mid/Large Pools actually use?

### 2. TLS structures
- `g_tls_slabs[8]` × number of threads
- `g_tls_mags[8]` × number of threads
- **Estimate:** 8 classes × a few KB × 1 thread = ~64 KB (small)

### 3. Global structures
- UCB1, ELO, ACE stats
- Batch cache
- BigCache
- **Estimate:** a few hundred KB in total (small)

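### 4. Page table overhead
- 1M allocs = many pages
- Page table entries: ~2-4 MB?
- **Impact:** hard to reduce (managed by the OS)
- **Caveat:** on Linux, page tables are accounted separately (`VmPTE` in `/proc/self/status`) and are not part of RSS, so this item may not belong in the 6.0 MB at all; worth checking before spending effort here
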
### 5. Fragmentation
- Unused space inside SuperSlabs
- Slab alignment gaps
- **Estimate:** 2 MB × 2 SuperSlabs - actual usage = ?

## Investigation Actions
1. Measure the memory usage of the Mid/Large Pools
2. Analyze /proc/self/smaps in detail
3. Measure actual utilization inside SuperSlabs

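For action 2, a starting point could be a helper that totals the `Rss:` fields of smaps-formatted text. The sketch below is Linux-specific and `smaps_total_rss_kb` is a hypothetical name. It takes the text as a string so it can be exercised without touching `/proc`; in practice the contents of `/proc/self/smaps` would be read into a buffer with `fopen`/`fread` and passed in. Grouping the totals by mapping name would be the natural next step for attributing the 6.0 MB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum every "Rss:" field (in kB) found in smaps-formatted text. */
static long smaps_total_rss_kb(const char *text) {
    assert(text != NULL);
    long total = 0;
    const char *line = text;
    while (*line != '\0') {
        long kb;
        /* An smaps line looks like "Rss:                 132 kB". */
        if (strncmp(line, "Rss:", 4) == 0 &&
            sscanf(line + 4, "%ld", &kb) == 1) {
            total += kb;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL) {
            break;
        }
        line = nl + 1;
    }
    return total;
}
```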

---

# 🚀 Overall Recommended Strategy (Phased Approach)

## Phase 7.7: Quick Wins (1-2 days)

### Step 1: Implement the Magazine Flush API ⭐⭐⭐⭐⭐
**Rationale:**
- Simple to implement (50 lines)
- Low risk
- Reliable effect (-4 MB)
- Near-zero performance impact (flushes are infrequent)

**Implementation:**
```c
// Flush the Magazine of every tiny size class.
void hak_tiny_magazine_flush_all(void) {
    for (int i = 0; i < TINY_CLASS_COUNT; i++) {
        hak_tiny_magazine_flush(i);
    }
}
```

**Expected result:**
- RSS: 33.0 → 29.0 MB (-4 MB)
- Gap vs mimalloc: 7.9 → 3.9 MB

### Step 2: Investigate System Overhead
- smaps analysis
- Mid/Large Pool measurement
- Find hidden bottlenecks

**Expected:** leads for a further 1-2 MB reduction


---

## Phase 8: Architectural Improvements (1-2 weeks)

### Priority 1: Two-level Magazine ⭐⭐⭐⭐⭐
**Rationale:**
- Best balance of performance vs memory
- Similar approach to mimalloc
- Good long-term potential

**Design details:**
```
TLS Hot Magazine:
  - Capacity: 256
  - 100% lock-free
  - Fast path

Shared Cold Magazine:
  - Capacity: 1792
  - Periodic flush (10ms idle)
  - Memory efficient

Combined effect:
  - Hot path performs on par with today
  - Cold part reclaims memory automatically
  - Phantom SuperSlabs: 0
```

**Expected result:**
- RSS: 29.0 → 26.0 MB (-3 MB)
- Performance: maintained or improved
- Gap vs mimalloc: 3.9 → 0.9 MB 🎯

### Priority 2: Make the Mid/Large Pools Dynamic
**Current state:** static or semi-static allocation?
**Improvement:** fully dynamic allocation plus release

**Expected result:**
- RSS: -1 to -2 MB
- Gap vs mimalloc: almost eliminated!


---

## Final Forecast

```
After Phase 7.6:
  HAKMEM:   33.0 MB (116% overhead)
  mimalloc: 25.1 MB (64% overhead)
  Gap:      7.9 MB

After Phase 7.7 (Quick Wins):
  HAKMEM:   29.0 MB (90% overhead)  ← -4 MB
  Gap:      3.9 MB

After Phase 8 (Two-level + Mid/Large):
  HAKMEM:   25.5-26.0 MB (67-70% overhead)  ← -3 to -3.5 MB
  Gap:      0.4-0.9 MB

🎉 On par with mimalloc!
```

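The overhead percentages in the forecast appear to be measured against the 15.3 MB of user data from the RSS breakdown; that reading is an assumption, but it reproduces the 116%, 64%, and 90% figures. The helpers below (hypothetical names) make the arithmetic explicit, for example `overhead_pct(33.0, 15.3)` for the Phase 7.6 row. With HAKMEM at 25.5 MB and mimalloc at 25.1 MB, `gap_mb` gives 0.4 MB.

```c
#include <assert.h>

/* Overhead of an allocator's RSS relative to live user data, in percent. */
static double overhead_pct(double rss_mb, double user_mb) {
    assert(user_mb > 0.0);
    return (rss_mb / user_mb - 1.0) * 100.0;
}

/* RSS difference between HAKMEM and mimalloc, in MB. */
static double gap_mb(double hakmem_rss_mb, double mimalloc_rss_mb) {
    return hakmem_rss_mb - mimalloc_rss_mb;
}
```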

---

# Conclusion

## Changing the Numbers Only ❌
- Limited effect (best case -4 MB)
- Large performance cost (-5 to -15%)
- System overhead left unresolved
- **Not recommended**

## Changing the Mechanism ✅
- Magazine Flush API: **implement immediately**
- Two-level Magazine: **the pillar of Phase 8**
- Dynamic Mid/Large Pools: **required in Phase 8**

## Roadmap
1. **Now:** Magazine Flush API (1 day)
2. **1 week:** System overhead investigation
3. **2 weeks:** Two-level Magazine implementation
4. **1 month:** Fully dynamic Mid/Large Pools

**Goal:** memory efficiency and performance on par with mimalloc 🚀