2025-11-26 13:14:18 +09:00

# Random Mixed (128B-1KB) Bottleneck Analysis Report

**Analyzed**: 2025-11-16

**Performance Gap**: 19.4M ops/s → 23.4% of System (target: 80%)

**Analysis Depth**: Architecture review + Code tracing + Performance pathfinding

---

## Executive Summary

The root cause of Random Mixed stalling at 23% is that **the optimization layers are only partially applied across the classes C2-C7 (32B-1KB)**. Judging from the gap to Fixed-size 256B (40.3M ops/s), **the frequency of class switching and the insufficient optimization coverage of each class** are the dominant bottlenecks.

---

## 1. Cycle Distribution Analysis

### 1.1 Estimated cost per layer

| Layer | Target Classes | Hit Rate | Cycles | Assessment |
|-------|---|---|---|---|
| **HeapV2** | C0-C3 (8-64B) | 88-99% ✅ | **Low (2-3)** | Working well |
| **Ring Cache** | C2-C3 only | 0% (OFF) ❌ | N/A | Not enabled |
| **TLS SLL** | C0-C7 (all) | 0.7-2.7% | **Medium (8-12)** | Fallback only |
| **SuperSlab refill** | All classes | ~2-5% miss | **High (50-200)** | Dominant cost |
| **UltraHot** | C1-C2 | 11.7% | Medium | Disabled (Phase 19) |

### 1.2 Dominant bottleneck: SuperSlab refill

**Reasons**:
1. **Refill frequency**: Random Mixed switches classes constantly, so the TLS SLL runs empty frequently for several classes at once.
2. **Class-specific carving**: each slab inside a SuperSlab is dedicated to a single class, so the carving/batch overhead is relatively large for C4/C5/C6/C7.
3. **Metadata access**: the chain SuperSlab → TinySlabMeta → carving → SLL push costs 50-200 cycles.

**Code Path** (`core/tiny_alloc_fast.inc.h:386-450` + `core/hakmem_tiny_refill_p0.inc.h`):
```
tiny_alloc_fast_pop() miss
    ↓
tiny_alloc_fast_refill() called
    ↓
sll_refill_batch_from_ss() or sll_refill_small_from_ss()
    ↓
hak_super_registry lookup (linear search)
    ↓
SuperSlab -> TinySlabMeta[] iteration (32 slabs)
    ↓
carve_batch_from_slab() (write multiple fields)
    ↓
tls_sll_push() (chain push)
```
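
The slab-iteration step of this path can be pictured with a small model. The following is a minimal sketch, not the actual hakmem code: `SlabMetaSketch`, `SuperSlabSketch`, and `find_slab_for_class` are hypothetical names, and the fields are assumptions chosen only to show why scanning up to 32 slab descriptors per refill touches a lot of metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLABS_PER_SUPERSLAB 32  /* from the "32 slabs" note in the code path */

/* Hypothetical per-slab metadata; field names are illustrative only. */
typedef struct {
    uint8_t  class_idx;  /* size class this slab is dedicated to */
    uint16_t used;       /* blocks already carved */
    uint16_t capacity;   /* total blocks the slab can hold */
} SlabMetaSketch;

typedef struct {
    SlabMetaSketch slabs[SLABS_PER_SUPERSLAB];
} SuperSlabSketch;

/*
 * Linear scan for the first slab of class `cls` that still has free blocks.
 * Returns the slab index, or -1 if there is none. `*probes` receives the
 * number of descriptors inspected, a rough proxy for metadata touches.
 */
int find_slab_for_class(const SuperSlabSketch *ss, int cls, int *probes)
{
    assert(ss != NULL && probes != NULL);
    *probes = 0;
    for (int i = 0; i < SLABS_PER_SUPERSLAB; i++) {
        (*probes)++;
        const SlabMetaSketch *m = &ss->slabs[i];
        if (m->class_idx == cls && m->used < m->capacity) {
            return i;
        }
    }
    return -1;
}
```

In this model a class whose only usable slab sits late in the array pays for every descriptor before it, and a class with no usable slab pays for all 32 before the refill can even fall back to another SuperSlab.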

### 1.3 Bottleneck conclusion

**Top priority**: **SuperSlab refill cost** (50-200 cycles/refill)

---

## 2. FrontMetrics Status

### 2.1 Implementation status

✅ **Implemented** (`core/box/front_metrics_box.{h,c}`)

**Current Status** (Phase 19-4):
- HeapV2: 88-99% hit rate on C0-C3 → functioning as the primary layer
- UltraHot: OFF by default (removed in Phase 19-4 because disabling it improved throughput by +12.9%)
- FC/SFC: effectively OFF
- TLS SLL: fallback only (0.7-2.7%)

### 2.2 Structural differences between Fixed and Random Mixed

| Aspect | Fixed 256B | Random Mixed |
|------|---|---|
| **Classes used** | C5 only (100%) | C3, C5, C6, C7 (mixed) |
| **Class switching** | 0 (fixed) | Frequent (every iteration) |
| **HeapV2 applied** | Not applied to C5 ❌ | Applied to C0-C3 only (partial) |
| **TLS SLL hit rate** | High (C5 relies on the SLL) | Low (several classes mixed) |
| **Refill frequency** | Low (C5 stays warm) | **High (each class runs empty)** |

### 2.3 Candidate "dead layers"

**Optimization for C4-C7 (128B-1KB) is severely lacking**:

| Class | Size | Ring | HeapV2 | UltraHot | Coverage |
|-------|---|---|---|---|---|
| C0 | 8B | ❌ | ✅ | ❌ | 1/3 |
| C1 | 16B | ❌ | ✅ | ❌ (OFF) | 1/3 |
| C2 | 32B | ❌ (OFF) | ✅ | ❌ (OFF) | 1/3 |
| C3 | 64B | ❌ (OFF) | ✅ | ❌ (OFF) | 1/3 |
| **C4** | **128B** | ❌ | ❌ | ❌ | **0/3** (no optimization) |
| **C5** | **256B** | ❌ | ❌ | ❌ | **0/3** (no optimization) |
| **C6** | **512B** | ❌ | ❌ | ❌ | **0/3** (no optimization) |
| **C7** | **1024B** | ❌ | ❌ | ❌ | **0/3** (no optimization) |

**Key finding**: the larger classes that Random Mixed exercises (C5, C6, and C7, plus C4 under the size mapping in Section 3.1) have no active optimization layer at all. That is half of all eight classes, and four of the six classes (C2-C7) that the Section 3.1 mapping produces.

---

## 3. Per-Class Performance Profile

### 3.1 Classes used by Random Mixed

Code analysis (`bench_random_mixed.c:77`):
```c
size_t sz = 16u + (r & 0x3FFu);  // range: 16B-1039B (16 + 0..1023)
```

Mapping:
```
16-31B    → C2 (32B)    [16-31B requested]
32-63B    → C3 (64B)    [32-63B requested]
64-127B   → C4 (128B)   [64-127B requested]
128-255B  → C5 (256B)   [128-255B requested]
256-511B  → C6 (512B)   [256-511B requested]
512-1024B → C7 (1024B)  [512-1023B requested]
```
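
To see how requests spread over the classes, the mapping can be replayed over every value the benchmark expression can produce. This is a minimal sketch under stated assumptions: it follows the size ranges listed in the mapping literally (including 1024B in C7), it assumes `r & 0x3FF` is uniformly distributed, and `class_for_size_sketch` and `count_classes` are hypothetical helpers rather than hakmem's real size-to-class lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive upper bound of each requested-size range, copied from the
 * mapping in this section; index 0 corresponds to C2. */
static const size_t kRangeMax[6] = { 31, 63, 127, 255, 511, 1024 };

/* Returns the class index (2..7) for a request, or -1 if the size falls
 * outside the 16..1024 byte ranges listed in the mapping. */
int class_for_size_sketch(size_t sz)
{
    if (sz < 16) return -1;
    for (int i = 0; i < 6; i++) {
        if (sz <= kRangeMax[i]) return i + 2;
    }
    return -1;
}

/* Counts how many of the 1024 possible values of (r & 0x3FF) land in each
 * class. counts[0..5] correspond to C2..C7; counts[6] collects sizes the
 * mapping does not cover. */
void count_classes(unsigned counts[7])
{
    for (int i = 0; i < 7; i++) counts[i] = 0;
    for (unsigned r = 0; r <= 0x3FFu; r++) {
        size_t sz = 16u + (r & 0x3FFu);  /* same expression as the benchmark */
        int cls = class_for_size_sketch(sz);
        counts[cls < 0 ? 6 : cls - 2]++;
    }
}
```

Under these assumptions the 1024 possible sizes split into 16, 32, 64, 128, 256, and 513 values for C2 through C7, and the remaining 15 values (1025-1039B) exceed the listed C7 range. Request sizes are uniform, but the per-class shares follow the width of each range.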

**Actual distribution**: request sizes are close to uniform (a property of the bit mask), so each class receives a share proportional to the width of its size range.

### 3.2 Optimization coverage per class

**C0-C3 (HeapV2): implemented, but lightly used by Random Mixed**
- HeapV2 magazine capacity: 16/class
- Hit rate: 88-99% (the implementation works well)
- **Limitation**: does not cover C4+

**C4-C7 (no optimization at all)**:
- Ring cache: implemented but **OFF by default** (`HAKMEM_TINY_HOT_RING_ENABLE=0`)
- HeapV2: C0-C3 only
- UltraHot: OFF by default
- **Result**: these classes rely on the bare TLS SLL + SuperSlab refill

### 3.3 Performance impact

Most of Random Mixed is served by C4-C7, yet these classes are **not optimized at all**:

```
Why Fixed 256B is fast:
- C5 alone → HeapV2 is not applied, but the TLS SLL can stay warm
- No class switching → almost no refills
- Result: 40.3M ops/s

Why Random Mixed is slow:
- C3/C5/C6/C7 mixed
- Per-class TLS SLL stays small → frequent refills
- Refill cost: 50-200 cycles each
- Result: 19.4M ops/s (about 52% lower than Fixed 256B)
```

---

## 4. Prioritizing the Next Move

### Candidate analysis

#### Candidate A: Extend Ring Cache to C4-C7 🔴 Top priority

**Rationale**:
- Already **implemented** in Phase 21-1 (`core/front/tiny_ring_cache.{h,c}`)
- Currently unused even for C2/C3 (OFF by default)
- Extending it to C4-C7 needs only a small change
- **Effect**: fewer pointer chases (+15-20%)

**Implementation status**:
```c
// tiny_ring_cache.h:67-80
static inline int ring_cache_enabled(void) {
    const char* e = getenv("HAKMEM_TINY_HOT_RING_ENABLE");
    // Default: 0 (OFF)
    // ... (rest of the function omitted in this excerpt)
}
```
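
The excerpt stops before the parsing logic, so the following is only a sketch of how such an environment gate is commonly written, not the actual body of `ring_cache_enabled()`. The parsing rule (unset, empty, or a leading `0` means OFF) and the names `flag_from_env_value` and `ring_cache_enabled_sketch` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Interprets an environment value: NULL, empty, or starting with '0'
 * means OFF; anything else means ON. This rule is an assumption. */
int flag_from_env_value(const char *value)
{
    if (value == NULL || value[0] == '\0') return 0;  /* default: OFF */
    return value[0] != '0';
}

/* Reads the flag once and caches the result so the allocation hot path
 * does not call getenv() repeatedly. A real implementation would also
 * have to decide how to handle concurrent first calls. */
int ring_cache_enabled_sketch(void)
{
    static int cached = -1;  /* -1 = not read yet */
    if (cached < 0) {
        cached = flag_from_env_value(getenv("HAKMEM_TINY_HOT_RING_ENABLE"));
    }
    return cached;
}
```

Separating the pure parsing step from the cached lookup keeps the rule testable without touching the process environment.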

**How to enable**:
```bash
export HAKMEM_TINY_HOT_RING_ENABLE=1
export HAKMEM_TINY_HOT_RING_C4=128
export HAKMEM_TINY_HOT_RING_C5=128
export HAKMEM_TINY_HOT_RING_C6=64
export HAKMEM_TINY_HOT_RING_C7=64
```

**Estimated effect**:
- 19.4M → 22-25M ops/s (+13-29%)
- TLS SLL pointer chasing: 3 memory accesses → 2
- Better cache locality

**Implementation cost**: **LOW** (only enabling an existing implementation)

---

#### Candidate B: Extend HeapV2 to C4/C5 🟡 Medium priority

**Rationale**:
- Already **implemented** in Phase 13-A (`core/front/tiny_heap_v2.h`)
- Currently limited to C0-C3 (`HAKMEM_TINY_HEAP_V2_CLASS_MASK=0xE`; if bit i selects class Ci, `0xE` covers C1-C3 and `0xF` would be needed for C0-C3)
- Magazine supply can raise the TLS SLL hit rate
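
How a class mask such as `HAKMEM_TINY_HEAP_V2_CLASS_MASK` selects classes can be checked with a few lines of bit arithmetic. This is a sketch under one explicit assumption, namely that bit i of the mask enables class Ci; the report does not state the bit layout, and `heapv2_class_enabled_sketch` and `mask_with_classes` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit i of the mask enables HeapV2 for class Ci. */
int heapv2_class_enabled_sketch(uint32_t mask, int cls)
{
    assert(cls >= 0 && cls < 32);
    return (int)((mask >> cls) & 1u);
}

/* Returns `mask` with the bits for classes first..last (inclusive) set. */
uint32_t mask_with_classes(uint32_t mask, int first, int last)
{
    assert(first >= 0 && first <= last && last < 32);
    for (int c = first; c <= last; c++) {
        mask |= (uint32_t)1u << c;
    }
    return mask;
}
```

Under that assumption `0xE` enables C1-C3 but not C0, `0xF` would cover C0-C3, and adding C4/C5 to `0xE` gives `0x3E`. The actual bit layout should be confirmed in `core/front/tiny_heap_v2.h` before changing the setting.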

**Limitations**:
- Magazine size: 16/class → too small for Random Mixed
- Phase 17-1 experiment: only `+0.3%` improvement
- **Reason**: the delegation overhead roughly cancels out the TLS savings

**Estimated effect**: +2-5% (fewer TLS refills)

**Implementation cost**: LOW (ENV setting change only)

**Verdict**: Ring Cache is more effective (Candidate A recommended)

---

#### Candidate C: Dedicated HotPath for C7 (1KB) 🟢 Long term

**Rationale**:
- C7 is estimated to account for ~16% of Random Mixed (roughly a 1/6 share among C2-C7; if request sizes are uniform, the wide 512-1024B range would make its share larger)
- Its SuperSlab refill cost is high
- A dedicated design can cut the carve/batch overhead

**Estimated effect**: +5-10% (from C7 alone)

**Implementation cost**: **HIGH** (new design)

**Verdict**: defer (revisit after Ring Cache and the other optimizations)

---

#### Candidate D: Faster SuperSlab refill 🔥 Very long term

**Rationale**:
- Attacks the root cause directly (50-200 cycles/refill)
- Requires the architecture change of Phase 12 (Shared SuperSlab Pool)
- Reduces 877 SuperSlabs to 100-200

**Estimated effect**: **+300-400%** (9.38M → 70-90M ops/s); against the 19.4M ops/s baseline of this report, 70-90M ops/s corresponds to +260-364%

**Implementation cost**: **VERY HIGH** (architecture change)

**Verdict**: start after Phase 21 (the prerequisite fine-grained optimizations) is complete

---

### Prioritization summary

```
🔴 Top priority: Ring Cache C4-C7 extension (already implemented, enable only)
   Expected: +13-29% (19.4M → 22-25M ops/s)
   Effort: LOW
   Risk: LOW

🟡 Runner-up: HeapV2 C4/C5 extension (already implemented, enable only)
   Expected: +2-5%
   Effort: LOW
   Risk: LOW
   Verdict: small effect (Ring first)

🟢 Long term: C7-dedicated HotPath
   Expected: +5-10%
   Effort: HIGH
   Verdict: defer

🔥 Very long term: SuperSlab Shared Pool (Phase 12)
   Expected: +300-400%
   Effort: VERY HIGH
   Verdict: fundamental fix (after Phase 21 is finished)
```

---

## 5. Recommended Actions

### 5.1 Do now: Ring Cache enablement test

**Script** (example `scripts/test_ring_cache.sh`):
```bash
#!/bin/bash
set -eu

BENCH=./out/release/bench_random_mixed_hakmem

echo "=== Ring Cache OFF (Baseline) ==="
# Clear any inherited ring settings so the baseline really runs with the ring off.
unset HAKMEM_TINY_HOT_RING_ENABLE HAKMEM_TINY_HOT_RING_C2 HAKMEM_TINY_HOT_RING_C3 \
      HAKMEM_TINY_HOT_RING_C4 HAKMEM_TINY_HOT_RING_C5 HAKMEM_TINY_HOT_RING_C6 HAKMEM_TINY_HOT_RING_C7
"$BENCH" 500000 256 42

echo "=== Ring Cache ON (C4-C7) ==="
export HAKMEM_TINY_HOT_RING_ENABLE=1
export HAKMEM_TINY_HOT_RING_C4=128
export HAKMEM_TINY_HOT_RING_C5=128
export HAKMEM_TINY_HOT_RING_C6=64
export HAKMEM_TINY_HOT_RING_C7=64
"$BENCH" 500000 256 42

echo "=== Ring Cache ON (C2/C3 original) ==="
# Switch to the original C2/C3 configuration and drop the C4-C7 settings.
export HAKMEM_TINY_HOT_RING_ENABLE=1
export HAKMEM_TINY_HOT_RING_C2=128
export HAKMEM_TINY_HOT_RING_C3=128
unset HAKMEM_TINY_HOT_RING_C4 HAKMEM_TINY_HOT_RING_C5 HAKMEM_TINY_HOT_RING_C6 HAKMEM_TINY_HOT_RING_C7
"$BENCH" 500000 256 42
```

**Expected results**:
- Baseline: 19.4M ops/s (23.4%)
- Ring C4-C7: 22-25M ops/s (26.5-30.2%) ← +13-29%
- Ring C2/C3: 20-21M ops/s (24.1-25.3%) ← +3-8%
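
The percent-of-system figures follow from the baseline pair (19.4M ops/s = 23.4% of System). The helpers below are a small sketch for converting a measured result; the function names are made up for this example and are not part of the benchmark tooling.

```c
#include <assert.h>

/* System allocator throughput implied by "19.4M ops/s = 23.4% of System". */
double implied_system_mops(double measured_mops, double fraction_of_system)
{
    assert(fraction_of_system > 0.0);
    return measured_mops / fraction_of_system;
}

/* Percentage of system throughput reached by a given result. */
double percent_of_system(double mops, double system_mops)
{
    assert(system_mops > 0.0);
    return 100.0 * mops / system_mops;
}

/* Relative change versus a baseline, in percent. */
double percent_gain(double mops, double baseline_mops)
{
    assert(baseline_mops > 0.0);
    return 100.0 * (mops / baseline_mops - 1.0);
}
```

With these inputs the implied system throughput is about 82.9M ops/s, so 22-25M ops/s is roughly 26.5-30.2% of system (+13.4% to +28.9% over baseline), and the 80% target corresponds to about 66M ops/s.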

---

### 5.2 FrontMetrics measurement for verification

**Enable**:
```bash
export HAKMEM_TINY_FRONT_METRICS=1
export HAKMEM_TINY_FRONT_DUMP=1
./out/release/bench_random_mixed_hakmem 500000 256 42 2>&1 | grep -A 100 "Frontend Metrics"
```

**Expected output**: a per-class hit-rate listing (compare before and after enabling the ring)

---

### 5.3 Long-term roadmap

```
Phase 21-1: Enable Ring Cache (do now)
  ├─ C2/C3 test (already implemented)
  ├─ C4-C7 extension test
  └─ Expected: 20-25M ops/s (+3-29%)

Phase 21-2: Hot Slab Direct Index (Class5+)
  ├─ Shorten the SuperSlab slab loop
  └─ Expected: 22-30M ops/s (+13-55%)

Phase 21-3: Minimal Meta Access
  ├─ Touch fewer fields (restrict to the accessed pattern)
  └─ Expected: 24-35M ops/s (+24-80%)

Phase 22: Start Phase 12 (Shared SuperSlab Pool)
  ├─ Reduce 877 SuperSlabs → 100-200
  └─ Expected: 70-90M ops/s (+260-364%)
```

---

## 6. Technical Basis

### 6.1 Fixed 256B (C5) vs Random Mixed (C3/C5/C6/C7)

**Why Fixed is fast**:
1. **Fixed class** → the TLS SLL stays warm
2. **HeapV2 not applied** → the SLL hit rate is high anyway
3. **Few refills** → no class switching

**Why Random Mixed is slow**:
1. **Frequent class switching** → the TLS SLL runs dry for several classes
2. **Frequent refills in every class** → 50-200 cycles each, many times over
3. **0% optimization coverage** → C4-C7 take the bare path

**Difference**: 40.3M - 19.4M = **20.9M ops/s**

Bare TLS SLL versus Ring Cache:
```
TLS SLL (pointer chasing): 3 mem accesses
- Load head: 1 mem
- Load next: 1 mem (cache miss)
- Update head: 1 mem

Ring Cache (array): 2 mem accesses
- Load from array: 1 mem
- Update index: 1 mem (same cache line)

Improvement: 3 → 2 memory accesses = -33%
```
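
The access counts above can be read off a minimal model of the two structures. The sketch below is illustrative only: `SllSketch` and `RingSketch` are hypothetical types written for this comparison and do not reproduce the layout of `tls_sll_*` or `tiny_ring_cache.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intrusive singly linked free list: the first word of each free block
 * stores the pointer to the next free block. */
typedef struct {
    void *head;
} SllSketch;

void sll_push(SllSketch *l, void *block)
{
    *(void **)block = l->head;  /* store "next" inside the block itself */
    l->head = block;
}

void *sll_pop(SllSketch *l)
{
    void *block = l->head;      /* access 1: load head */
    if (block == NULL) return NULL;
    /* access 2: load next from the block (likely a cold line),
     * access 3: store the new head */
    l->head = *(void **)block;
    return block;
}

#define RING_CAP 128  /* power of two; same value as the C4/C5 setting in this report */

/* Array-backed ring: block pointers live in one contiguous array, so a pop
 * never dereferences the block it hands out. Holds up to RING_CAP - 1 entries. */
typedef struct {
    void    *slots[RING_CAP];
    uint32_t head;  /* next slot to pop */
    uint32_t tail;  /* next slot to fill */
} RingSketch;

int ring_push(RingSketch *r, void *block)
{
    uint32_t next = (r->tail + 1u) & (RING_CAP - 1u);
    if (next == r->head) return 0;  /* full */
    r->slots[r->tail] = block;
    r->tail = next;
    return 1;
}

void *ring_pop(RingSketch *r)
{
    if (r->head == r->tail) return NULL;      /* empty */
    void *block = r->slots[r->head];          /* load from the array */
    r->head = (r->head + 1u) & (RING_CAP - 1u);  /* update the index */
    return block;
}
```

The point of the comparison is the second access in `sll_pop`: it reads memory inside the block being returned, which is where the cache miss comes from, while `ring_pop` only touches the ring's own storage.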

### 6.2 Refill cost estimate

```
Random Mixed refill frequency:
- Total iterations: 500K
- Classes: 6 (C2-C7)
- Per-class average: 500K/6 ≈ 83K ops
- TLS SLL typical warmth: 16-32 blocks
- Refill rate: ~1 refill per 50-100 ops

→ 500K × 1/75 ≈ 6.7K refills

Refill cost:
- SuperSlab lookup: 10-20 cycles
- Slab iteration: 30-50 cycles (32 slabs)
- Carving: 10-15 cycles
- Push chain: 5-10 cycles
Total: ~55-95 cycles/refill (average)

Impact:
- 6.7K × 80 cycles = 536K cycles
- vs 500K × 50 cycles = 25M cycles total
  = only 2.1%

Interpretation: refills are relatively rare; the poor TLS hit rate and
                class-switching overhead dominate instead
```
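
The same back-of-the-envelope estimate can be written as a small function so the inputs are easy to vary. This is a sketch of the arithmetic only; `RefillEstimate` and `estimate_refill_share` are names invented for this example, and the inputs (ops per refill, cycles per refill, cycles per op) are the rough figures from the estimate, not measurements.

```c
#include <assert.h>

typedef struct {
    double refills;        /* estimated number of refills */
    double refill_cycles;  /* cycles spent in refills */
    double total_cycles;   /* cycles for the whole run */
    double refill_share;   /* refill_cycles / total_cycles */
} RefillEstimate;

RefillEstimate estimate_refill_share(double iterations, double ops_per_refill,
                                     double cycles_per_refill, double cycles_per_op)
{
    assert(iterations > 0.0 && ops_per_refill > 0.0 && cycles_per_op > 0.0);
    RefillEstimate e;
    e.refills       = iterations / ops_per_refill;
    e.refill_cycles = e.refills * cycles_per_refill;
    e.total_cycles  = iterations * cycles_per_op;
    e.refill_share  = e.refill_cycles / e.total_cycles;
    return e;
}
```

Calling `estimate_refill_share(500000, 75, 80, 50)` gives about 6.7K refills and a share of roughly 2.1%. Even the pessimistic corner of the stated ranges (one refill per 50 ops at 200 cycles each) yields 8% under the same 50 cycles/op assumption, which is why the estimate points at hit rate and class-switching overhead rather than at refill cost alone.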

---

## 7. Final Recommendation

| Item | Details |
|------|------|
| **Top-priority action** | **Ring Cache C4-C7 enablement test** |
| **Expected improvement** | +13-29% (19.4M → 22-25M ops/s) |
| **Implementation time** | < 1 day (ENV settings only) |
| **Risk** | Very low (already implemented, enable only) |
| **Success criterion** | Reach 23-25M ops/s (27.7-30.2% of system) |
| **Next step** | Phase 21-2 (Hot Slab Direct Index) |
| **Long-term goal** | 70-90M ops/s with Phase 12 (Shared SS Pool) |

---

**End of Analysis**