# Ring Cache C4-C7 Enablement Guide (Phase 21-1, Immediate-Action Edition)

**Priority**: 🔴 HIGHEST
**Status**: Implementation Ready (only needs to be switched on)
**Expected Gain**: +13-29% (19.4M → 22-25M ops/s)
**Risk Level**: LOW (already implemented; enablement only)

---

## Overview

The bottleneck in Random Mixed is that **C4-C7 (128B-1KB) are completely unoptimized**.
Enabling the **Ring Cache** already implemented in Phase 21-1 replaces TLS SLL pointer chasing (3 memory accesses) with array access (2 memory accesses), for an expected performance gain of +13-29%.

---

## What Is Ring Cache?

### Architecture

```
3-layer hierarchy:
Layer 0: Ring Cache (array-based, 128 slots)
  └─ Fast pop/push (1-2 mem accesses)

Layer 1: TLS SLL (linked list)
  └─ Medium pop/push (3 mem accesses + cache miss)

Layer 2: SuperSlab
  └─ Slow refill (50-200 cycles)
```

### How the Speedup Works

**Conventional TLS SLL (pointer chasing)**:
```
Pop:
  1. Load head pointer:  mov rax, [g_tls_sll_head]
  2. Load next pointer:  mov rdx, [rax]             ← cache miss!
  3. Update head:        mov [g_tls_sll_head], rdx
  = 3 memory accesses
```

**Ring Cache (array-based)**:
```
Pop:
  1. Load from array:    mov rax, [g_ring_cache + head*8]
  2. Update head index:  add head, 1                ← CPU register!
  = 2 memory accesses, no pointer-chasing cache miss
```

**Improvement**: 3 → 2 memory accesses (-33%) per alloc/free

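The two pop sequences above can be sketched in C as follows. This is a minimal model and not the code in `tiny_ring_cache.h`: the names `SllNode`, `Ring`, `ring_pop`, and `ring_push`, the head/tail (FIFO) layout, and the fixed 128-slot capacity are all assumptions made for illustration. What it shows is the dependency difference: the list pop must finish loading the head pointer before it can load `next`, while the ring pop computes its slot address from an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP  128u              /* assumed capacity; must be a power of two */
#define RING_MASK (RING_CAP - 1u)

/* Layer 1 model: intrusive singly linked free list (hypothetical names). */
typedef struct SllNode { struct SllNode *next; } SllNode;

static void sll_push(SllNode **head, void *block) {
    SllNode *n = (SllNode *)block;
    n->next = *head;
    *head = n;
}

static void *sll_pop(SllNode **head) {
    SllNode *n = *head;             /* 1. load head pointer */
    if (n == NULL) return NULL;
    SllNode *next = n->next;        /* 2. load next pointer; needs the result of step 1 */
    *head = next;                   /* 3. store the new head */
    return n;
}

/* Layer 0 model: fixed-size ring of block pointers. */
typedef struct {
    void    *slots[RING_CAP];
    uint32_t head;                  /* index of the next slot to pop  */
    uint32_t tail;                  /* index of the next slot to push */
} Ring;

static int ring_push(Ring *r, void *block) {
    assert(r != NULL);
    if (r->tail - r->head == RING_CAP) return 0;  /* full: caller falls back to the SLL */
    r->slots[r->tail & RING_MASK] = block;
    r->tail++;
    return 1;
}

static void *ring_pop(Ring *r) {
    assert(r != NULL);
    if (r->head == r->tail) return NULL;          /* empty: caller falls back to the SLL */
    void *block = r->slots[r->head & RING_MASK];  /* address comes from an index, not a prior load */
    r->head++;
    return block;
}
```
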
---

## Verifying the Implementation Status

### File List

```bash
# Ring Cache implementation files
ls -la /mnt/workdisk/public_share/hakmem/core/front/tiny_ring_cache.{h,c}

# Verification command
grep -n "ring_cache_enabled\|HAKMEM_TINY_HOT_RING" \
  /mnt/workdisk/public_share/hakmem/core/front/tiny_ring_cache.h | head -20
```

### Confirming the Already-Implemented Features

```c
// core/front/tiny_ring_cache.h:67-80
static inline int ring_cache_enabled(void) {
    static int g_enable = -1;
    if (__builtin_expect(g_enable == -1, 0)) {
        const char* e = getenv("HAKMEM_TINY_HOT_RING_ENABLE");
        g_enable = (e && *e && *e != '0') ? 1 : 0;  // Default: 0 (OFF)
#if !HAKMEM_BUILD_RELEASE
        if (g_enable) {
            fprintf(stderr, "[Ring-INIT] ring_cache_enabled() = %d\n", g_enable);
        }
#endif
    }
    return g_enable;
}

// Ring pop/push already implemented:
// - ring_cache_pop() (line 159-190)
// - ring_cache_push() (line 195-228)
// - Per-class capacities: C2/C3 (default: 128, configurable)
```

---

## Test Procedure

### Step 1: Verify the Build

```bash
cd /mnt/workdisk/public_share/hakmem

# Release build
./build.sh bench_random_mixed_hakmem
./build.sh bench_random_mixed_system

# Verify
ls -lh ./out/release/bench_random_mixed_*
```

### Step 2: Measure the Baseline

```bash
# Ring Cache OFF (current default)
echo "=== Baseline (Ring Cache OFF) ==="
./out/release/bench_random_mixed_hakmem 500000 256 42

# Expected: ~19.4M ops/s (23.4% of system)
```

### Step 3: Ring Cache C2/C3 Test (Existing)

```bash
echo "=== Ring Cache C2/C3 (experimental baseline) ==="
export HAKMEM_TINY_HOT_RING_ENABLE=1
export HAKMEM_TINY_HOT_RING_C2=128
export HAKMEM_TINY_HOT_RING_C3=128

./out/release/bench_random_mixed_hakmem 500000 256 42

# Expected: ~20-21M ops/s (+3-8% from baseline)
# Note: C2/C3 account for only a small share of Random Mixed allocations
```

### Step 4: Ring Cache C4-C7 Test (Recommended)

```bash
echo "=== Ring Cache C4-C7 (recommended: the dominant classes in Random Mixed) ==="
export HAKMEM_TINY_HOT_RING_ENABLE=1
export HAKMEM_TINY_HOT_RING_C4=128
export HAKMEM_TINY_HOT_RING_C5=128
export HAKMEM_TINY_HOT_RING_C6=64
export HAKMEM_TINY_HOT_RING_C7=64
unset HAKMEM_TINY_HOT_RING_C2 HAKMEM_TINY_HOT_RING_C3

./out/release/bench_random_mixed_hakmem 500000 256 42

# Expected: ~22-25M ops/s (+13-29% from baseline)
```

### Step 5: Combined (All Classes) Test

```bash
echo "=== Ring Cache All Classes (C0-C7) ==="
export HAKMEM_TINY_HOT_RING_ENABLE=1
# Defaults: C2=128, C3=128, C4=128, C5=128, C6=64, C7=64
unset HAKMEM_TINY_HOT_RING_C2 HAKMEM_TINY_HOT_RING_C3 HAKMEM_TINY_HOT_RING_C4 \
      HAKMEM_TINY_HOT_RING_C5 HAKMEM_TINY_HOT_RING_C6 HAKMEM_TINY_HOT_RING_C7

./out/release/bench_random_mixed_hakmem 500000 256 42

# Expected: ~23-24M ops/s (+18-24% from baseline)
```

---

## ENV Variable Reference

### Enable/Disable

```bash
# Enable/disable the Ring Cache as a whole
export HAKMEM_TINY_HOT_RING_ENABLE=1  # ON (default: 0 = OFF)
export HAKMEM_TINY_HOT_RING_ENABLE=0  # OFF
```

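The `ring_cache_enabled()` excerpt shown earlier treats the variable as ON whenever it is set, non-empty, and does not start with the character `0`. The sketch below isolates that test so the edge cases are visible. The helper names `ring_enable_from_string`, `ring_enable_from_env`, and `ring_enable_selfcheck` are hypothetical; only the comparison itself is taken from the excerpt, and the real function also caches its result.

```c
#include <assert.h>
#include <stdlib.h>

/* Same test as the excerpt: set, non-empty, and first character is not '0'. */
static int ring_enable_from_string(const char *e) {
    return (e && *e && *e != '0') ? 1 : 0;
}

/* Reads the real variable on every call (no caching in this sketch). */
static int ring_enable_from_env(void) {
    return ring_enable_from_string(getenv("HAKMEM_TINY_HOT_RING_ENABLE"));
}

/* Edge cases worth knowing before running the benchmark steps. */
static void ring_enable_selfcheck(void) {
    assert(ring_enable_from_string("1") == 1);
    assert(ring_enable_from_string("yes") == 1);
    assert(ring_enable_from_string("off") == 1);   /* does not start with '0' -> ON */
    assert(ring_enable_from_string("false") == 1); /* same pitfall */
    assert(ring_enable_from_string("0") == 0);
    assert(ring_enable_from_string("01") == 0);    /* only the first character is checked */
    assert(ring_enable_from_string("") == 0);
    assert(ring_enable_from_string(NULL) == 0);    /* unset */
}
```

Under this rule, the reliable ways to turn the ring off are to `unset` the variable or to set it to a value beginning with `0`; words such as `off` or `false` would switch it on.
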
### Per-Class Capacity Settings

```bash
# Ring size (slots) per class. Step 5 lists the defaults as C2-C5=128, C6/C7=64.
export HAKMEM_TINY_HOT_RING_C0=128  # 8B
export HAKMEM_TINY_HOT_RING_C1=128  # 16B
export HAKMEM_TINY_HOT_RING_C2=128  # 32B
export HAKMEM_TINY_HOT_RING_C3=128  # 64B
export HAKMEM_TINY_HOT_RING_C4=128  # 128B  (new)
export HAKMEM_TINY_HOT_RING_C5=128  # 256B  (new)
export HAKMEM_TINY_HOT_RING_C6=64   # 512B  (new)
export HAKMEM_TINY_HOT_RING_C7=64   # 1024B (new)

# Allowed size: 32-256 (automatically adjusted to a power of 2)
#   Small:  32, 64 → favors memory efficiency, lower hit rate
#   Medium: 128    → balanced (recommended)
#   Large:  256    → favors hit rate, higher memory use
```

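To make the "32-256, adjusted to a power of 2" rule concrete, here is one way such a value could be normalized. This is a sketch under assumptions, not the parser in `tiny_ring_cache.h`: the helper names are hypothetical, and rounding *up* (rather than down) and falling back to the class default on non-numeric input are guesses about behavior the guide does not spell out.

```c
#include <assert.h>
#include <stdlib.h>

#define RING_CAP_MIN 32u
#define RING_CAP_MAX 256u

/* Smallest power of two that is >= v (for v >= 1). */
static unsigned round_up_pow2(unsigned v) {
    unsigned p = 1u;
    while (p < v) p <<= 1;
    return p;
}

/* text:     value of e.g. HAKMEM_TINY_HOT_RING_C4, or NULL if unset
 * fallback: class default used when the value is missing or not an integer */
static unsigned ring_capacity_from_string(const char *text, unsigned fallback) {
    if (text == NULL || *text == '\0') return fallback;

    char *end = NULL;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0') return fallback;  /* not a plain integer */

    if (v < (long)RING_CAP_MIN) v = (long)RING_CAP_MIN; /* clamp to 32-256 */
    if (v > (long)RING_CAP_MAX) v = (long)RING_CAP_MAX;

    unsigned cap = round_up_pow2((unsigned)v);
    assert((cap & (cap - 1u)) == 0u);                   /* result is a power of two */
    return cap;
}
```

With these assumptions, `"100"` becomes 128, `"16"` becomes 32, and `"1000"` becomes 256. A power-of-two capacity is what lets a ring index wrap with a bit mask instead of a modulo.
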
### Cascade Settings (Advanced)

```bash
# One-way refill from Ring → SLL (default: OFF)
export HAKMEM_TINY_HOT_RING_CASCADE=1  # Refill the SLL from the Ring when the SLL is empty
```

### Debug Output

```bash
# Metrics output (disabled in release builds)
export HAKMEM_DEBUG_COUNTERS=1  # Ring hit/miss counts
export HAKMEM_BUILD_RELEASE=0   # Debug build (slow); the header tests this with #if, so rebuild after changing it
```

---

## Test Result Format

Record the result of each test in the following format (the numbers shown are sample values):

```markdown
### Test Results (YYYY-MM-DD HH:MM)

| Test | Iterations | Workset | Seed | Result (ops/s) | vs Baseline | Status |
|------|---|---|---|---|---|---|
| Baseline (OFF) | 500K | 256 | 42 | 19.4M | - | ✓ |
| C2/C3 Ring | 500K | 256 | 42 | 20.5M | +5.7% | ✓ |
| C4-C7 Ring | 500K | 256 | 42 | 23.0M | +18.6% | ✓✓ |
| All Classes | 500K | 256 | 42 | 22.8M | +17.5% | ✓✓ |

**Recommendation**: the C4-C7 configuration yields +18.6%; target met
```

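The "vs Baseline" column is the relative change against the Ring-OFF run, `(result / baseline - 1) × 100`. A small helper like the one below can fill it in consistently. The function names are hypothetical and the inputs are the sample figures from the table above, in millions of ops/s. The same formula gives the +13-29% range quoted for 22-25M against a 19.4M baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative gain of result over baseline, in percent. */
static double gain_percent(double baseline_mops, double result_mops) {
    assert(baseline_mops > 0.0);
    return (result_mops / baseline_mops - 1.0) * 100.0;
}

/* Writes the gain in the table's style, e.g. "+18.6%". */
static void format_gain(char *buf, size_t len, double baseline_mops, double result_mops) {
    snprintf(buf, len, "%+.1f%%", gain_percent(baseline_mops, result_mops));
}

/* Sample rows from the table (baseline 19.4M ops/s):
 *   20.5M -> +5.7%    23.0M -> +18.6%    22.8M -> +17.5%
 * Expected-gain range from the header:
 *   22.0M -> +13.4%   25.0M -> +28.9%
 */
```
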
---

## Troubleshooting

### Problem: Enabling Ring Cache Does Not Improve Performance

**Diagnosis**:
```bash
# Check whether the ENV settings are actually being picked up
./out/release/bench_random_mixed_hakmem 100 256 42 2>&1 | grep -i "ring\|cache"

# Expected output (non-release builds only): [Ring-INIT] ring_cache_enabled() = 1
```

**Possible causes**:
1. **ENV is not set** → re-check `export HAKMEM_TINY_HOT_RING_ENABLE=1`
2. **Stale build** → `./build.sh clean && ./build.sh bench_random_mixed_hakmem`
3. **Release build** → no debug output (this is normal; release builds are for performance measurement)

### Problem: Hang or SEGV

**Response**:
```bash
# Revert to Ring Cache OFF
unset HAKMEM_TINY_HOT_RING_ENABLE
unset HAKMEM_TINY_HOT_RING_C{0..7}

./out/release/bench_random_mixed_hakmem 100 256 42
```

**Reporting**: if this occurs, record the stack trace and the ENV settings

---

## Success Criteria

| Item | Criterion | Verdict |
|------|------|------|
| **Baseline measurement** | 19-20M ops/s | ✅ Pass |
| **C4-C7 Ring enabled** | 22M ops/s or more | ✅ Pass (+13% or better) |
| **Target reached** | 23-25M ops/s | 🎯 Target |
| **Crash/Hang** | None | ✅ Stability |
| **FrontMetrics verification** | Ring hit > 50% | ✅ Confirm |

---

## Next Steps

### On success (23-25M ops/s reached):
1. ✅ Lock in Ring Cache C4-C7 as the production configuration
2. 🔄 Start implementing Phase 21-2 (Hot Slab Direct Index)
3. 📊 Analyze in detail with FrontMetrics (per-class hit rate)

### On failure (no improvement):
1. 🔍 Check the Ring hit rate with FrontMetrics
2. 🐛 Debug Ring cache initialization
3. 🔧 Test capacity adjustments (64, 256, etc.)

---

## References

- **Implementation**: `/mnt/workdisk/public_share/hakmem/core/front/tiny_ring_cache.h/c`
- **Bottleneck analysis**: `/mnt/workdisk/public_share/hakmem/RANDOM_MIXED_BOTTLENECK_ANALYSIS.md`
- **Phase 21-1 plan**: `/mnt/workdisk/public_share/hakmem/CURRENT_TASK.md` § 10, 11
- **Alloc fast path**: `/mnt/workdisk/public_share/hakmem/core/tiny_alloc_fast.inc.h:199-310`

---

**End of Guide**

Everything is ready. Awaiting execution!