# Phase 6.22-B: Registry Removal + SuperSlab Integration

**Date**: 2025-10-24
**Prerequisite**: Phase 6.22-A complete (SuperSlab foundation implemented)
**Goal**: Remove the Registry hash and speed up lookups with `ptr_to_superslab()`
**Expected gain**: **+10-15%** (Tiny 1T/4T)

---

## 📋 Current State

### Completed in Phase 6.22-A

- ✅ SuperSlab struct definition (`hakmem_tiny_superslab.h`)
- ✅ 2MB-aligned allocator (`hakmem_tiny_superslab.c`)
- ✅ Makefile integration
- ✅ Build succeeds
- ✅ **+8% on Tiny 4T** (from adding the SuperSlab code alone)

### Current bottleneck (hakmem_tiny.c)

```c
// Current: Registry hash lookup (O(1), but prone to cache misses)
void hak_tiny_free(void* ptr) {
    // 1. Compute the hash
    void* page_base = PAGE_BASE(ptr);
    uint32_t h = hash(page_base);

    // 2. Registry lookup (cache miss!)
    TinySlabDesc* d = g_registry[h];
    while (d && d->page != page_base) {
        d = d->next;  // Walk the chain on a hash collision
    }

    // 3. Fetch the slab metadata
    TinySlab* slab = d->slab;
    // ... freelist push
}
```

**Problems**:
1. Cost of computing the hash
2. Cache misses on the Registry lookup
3. List traversal on hash collisions

---

## 🎯 Phase 6.22-B Implementation

### 1. Remove the Registry and use ptr_to_superslab()

```c
// Phase 6.22-B: SuperSlab fast path (1 AND operation)
void hak_tiny_free(void* ptr) {
    // 1. Get the SuperSlab (1 AND operation)
    SuperSlab* ss = ptr_to_superslab(ptr);

    // 2. Compute the slab index (1 shift operation)
    int slab_idx = ptr_to_slab_index(ptr);

    // 3. Fetch the slab metadata (direct access)
    TinySlabMeta* meta = &ss->slabs[slab_idx];

    // 4. Same-thread check (TLS comparison)
    if (meta->owner_tid == get_tid()) {
        // Fast path: Push to freelist
        *(void**)ptr = meta->freelist;
        meta->freelist = ptr;
        meta->used--;
    } else {
        // Slow path: Remote free
        remote_free(ss, slab_idx, ptr);
    }
}
```
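
The two lookup helpers used above can be sketched as plain mask-and-shift arithmetic. This is a minimal illustration, not the code in `hakmem_tiny_superslab.h`: the 2 MiB SuperSlab size comes from the 2MB-aligned allocator mentioned earlier, while the 64 KiB slab size and every `sketch_`/`Sketch` name are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: 2 MiB SuperSlabs, 64 KiB slabs. */
#define SUPERSLAB_SHIFT 21u                                /* 2 MiB = 1 << 21 */
#define SLAB_SHIFT      16u                                /* 64 KiB = 1 << 16 (assumption) */
#define SUPERSLAB_SIZE  ((uintptr_t)1 << SUPERSLAB_SHIFT)
#define SUPERSLAB_MASK  (SUPERSLAB_SIZE - 1)

static_assert(SLAB_SHIFT < SUPERSLAB_SHIFT, "a slab must be smaller than a SuperSlab");

typedef struct SuperSlabSketch SuperSlabSketch;  /* opaque stand-in for SuperSlab */

/* Round the pointer down to its 2 MiB boundary: a single AND. */
static inline SuperSlabSketch* sketch_ptr_to_superslab(const void* ptr) {
    return (SuperSlabSketch*)((uintptr_t)ptr & ~SUPERSLAB_MASK);
}

/* Offset inside the SuperSlab divided by the slab size: one AND, then one shift. */
static inline int sketch_ptr_to_slab_index(const void* ptr) {
    return (int)(((uintptr_t)ptr & SUPERSLAB_MASK) >> SLAB_SHIFT);
}
```

Under these assumptions a SuperSlab holds 32 slabs, and the lookup only works if every SuperSlab really starts on a 2 MiB boundary.
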

**Improvements**:
- Hash computation → a single AND instruction
- Registry lookup → direct access
- Better cache locality

---

## 🏗️ Implementation Steps

### Step 1: Prepare to remove the Registry structures

**File**: `hakmem_tiny.c`

1. **Comment out the Registry definitions**
   ```c
   // Phase 6.22-B: Registry disabled (using SuperSlab now)
   // static TinySlabDesc* g_registry[REGISTRY_SIZE];
   // static pthread_mutex_t g_registry_locks[REGISTRY_SIZE];
   ```

2. **Stub out the Registry functions**
   ```c
   // static void registry_insert(...) { /* DEPRECATED */ }
   // static TinySlabDesc* registry_lookup(...) { return NULL; /* DEPRECATED */ }
   ```

### Step 2: Integrate SuperSlab into the allocation path

**Function**: `hak_tiny_alloc()`

```c
void* hak_tiny_alloc(size_t size) {
    int class_idx = SIZE_TO_CLASS[size >> 3];
    TinyTLS* tls = get_tls();

    // 1. Try TLS active slab
    TinySlab* active = tls->active_slab[class_idx];
    if (!active || !active->freelist) {
        // 2. Refill from SuperSlab
        active = refill_from_superslab(class_idx);
        if (!active) return NULL;  // OOM
        tls->active_slab[class_idx] = active;
    }

    // 3. Pop from freelist
    void* block = active->freelist;
    active->freelist = *(void**)block;
    active->used++;

    return block;
}
```
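
The pop in step 3, and the matching push on the free path, rely on an intrusive free list: each free block stores the pointer to the next free block in its own first bytes. A minimal standalone sketch of that mechanism follows. `SlabSketch` and the `slab_sketch_*` functions are hypothetical names for illustration, and the buffer is assumed to be pointer-aligned with a block size that is a multiple of `sizeof(void*)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the slab metadata used in this plan. */
typedef struct {
    void*    freelist;  /* head of the intrusive list, NULL when empty */
    unsigned used;      /* number of blocks currently handed out */
} SlabSketch;

/* Thread every block of a raw buffer onto the free list (block 0 ends up first). */
static void slab_sketch_init(SlabSketch* s, void* buf, size_t block_size, size_t count) {
    assert(block_size >= sizeof(void*));  /* a free block must be able to hold a pointer */
    s->freelist = NULL;
    s->used = 0;
    for (size_t i = count; i-- > 0; ) {
        void* block = (char*)buf + i * block_size;
        *(void**)block = s->freelist;
        s->freelist = block;
    }
}

/* Pop: same three statements as step 3 above, plus an empty check. */
static void* slab_sketch_pop(SlabSketch* s) {
    void* block = s->freelist;
    if (!block) return NULL;
    s->freelist = *(void**)block;
    s->used++;
    return block;
}

/* Push: what the same-thread free fast path does. */
static void slab_sketch_push(SlabSketch* s, void* block) {
    *(void**)block = s->freelist;
    s->freelist = block;
    s->used--;
}
```

Because the link lives inside the freed block, the list needs no extra memory, and the most recently freed block is handed out first.
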

### Step 3: Integrate SuperSlab into the free path

**Function**: `hak_tiny_free()`

```c
void hak_tiny_free(void* ptr) {
    // Phase 6.22-B: Use SuperSlab fast path
    SuperSlab* ss = ptr_to_superslab(ptr);
    // Note: reading ss->magic assumes the masked address is mapped memory.
    if (ss->magic != SUPERSLAB_MAGIC) {
        // Not a SuperSlab pointer (legacy or error)
        return;  // Or fallback to old registry
    }

    int slab_idx = ptr_to_slab_index(ptr);
    TinySlabMeta* meta = &ss->slabs[slab_idx];

    // Same-thread fast path (tid derived the same way as in refill_from_superslab())
    uint32_t my_tid = (uint32_t)(uintptr_t)pthread_self();
    if (meta->owner_tid == my_tid) {
        // Fast path: Direct freelist push
        *(void**)ptr = meta->freelist;
        meta->freelist = ptr;
        meta->used--;
    } else {
        // Slow path: Remote free (lock or lock-free queue)
        remote_free_superslab(ss, slab_idx, ptr);
    }
}
```
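
The owner check above derives a 32-bit ID by casting `pthread_self()`. POSIX treats `pthread_t` as an opaque type, so that cast is not portable and the truncated value is not guaranteed to be unique. One possible alternative, sketched here under the assumption that `owner_tid` only needs a small unique number per thread, is to hand out IDs from an atomic counter and cache them in thread-local storage. `g_next_tid`, `t_cached_tid`, and `tiny_self_tid_sketch()` are hypothetical names, not part of the existing code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t g_next_tid = 1;        /* 0 is reserved for "no owner" */
static _Thread_local uint32_t t_cached_tid;    /* stays 0 until first use on a thread */

/* Return a small per-thread ID; after the first call this is one TLS load. */
static inline uint32_t tiny_self_tid_sketch(void) {
    uint32_t tid = t_cached_tid;
    if (tid == 0) {
        tid = atomic_fetch_add_explicit(&g_next_tid, 1, memory_order_relaxed);
        assert(tid != 0);  /* only trips if the 32-bit counter wraps around */
        t_cached_tid = tid;
    }
    return tid;
}
```

With this approach `refill_from_superslab()` and `hak_tiny_free()` would both call the same helper, so the two sides of the comparison cannot drift apart.
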

### Step 4: Update the refill mechanism

**Function**: `refill_from_superslab()`

```c
TinySlab* refill_from_superslab(int class_idx) {
    // 1. Try TLS slab queue (if any)
    TinyTLS* tls = get_tls();
    if (tls->slab_queue[class_idx].head) {
        return pop_slab_queue(tls, class_idx);
    }

    // 2. Allocate new SuperSlab
    SuperSlab* ss = superslab_allocate(class_idx);
    if (!ss) return NULL;

    // 3. Initialize first slab
    uint32_t my_tid = (uint32_t)(uintptr_t)pthread_self();
    superslab_init_slab(ss, 0, g_class_sizes[class_idx], my_tid);

    // 4. Return slab metadata
    return &ss->slabs[0];  // Note: Need to create TinySlab wrapper
}
```

---

## 📊 Expected Performance Impact

### Before (Phase 6.22-A)
- Tiny 1T: 20.0 M/s
- Tiny 4T: 57.9 M/s

### Target (Phase 6.22-B)
- Tiny 1T: **22-23 M/s** (+10-15%)
- Tiny 4T: **63-67 M/s** (+9-16%)

### vs mimalloc
- Tiny 1T: 33.8 M/s (mimalloc) → expected to reach 65-68% of it
- Tiny 4T: 76.5 M/s (mimalloc) → expected to reach 82-88% of it
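
The percentages above follow directly from the throughput figures. As a small sanity-check sketch (the helper names `pct_gain` and `pct_of` are made up for this example), the same arithmetic can be reused when the Phase 4 benchmark numbers come in: `pct_gain(57.9, 63.0)` is about 8.8 and `pct_of(67.0, 76.5)` is about 87.6.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double pct_gain(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

/* `ours` expressed as a percentage of `reference` (e.g. the mimalloc figure). */
static double pct_of(double ours, double reference) {
    assert(reference > 0.0);
    return ours / reference * 100.0;
}
```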

---

## ⚠️ Implementation Risks

### Risk 1: TinySlab vs TinySlabMeta mismatch

**Problem**: Existing code uses the `TinySlab` struct, while SuperSlab uses `TinySlabMeta`

**Mitigation**:
- Merge TinySlabMeta into TinySlab
- Or create a TinySlab → TinySlabMeta wrapper

### Risk 2: Remote free implementation

**Problem**: Handling cross-thread frees is complex

**Mitigation**:
- Phase 6.22-B optimizes the same-thread path only
- Remote frees temporarily use the existing global freelist
- Per-thread queues are implemented in Phase 6.23

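
For reference, one common shape for a lock-free remote free is a push-only stack that the owning thread drains in a single exchange. The sketch below is an assumption about how a later phase could look, not the planned Phase 6.23 design. `RemoteInboxSketch`, `remote_push_sketch()`, and `remote_drain_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-slab inbox for blocks freed by non-owner threads. */
typedef struct {
    _Atomic(void*) remote_head;
} RemoteInboxSketch;

/* Any thread: push one block onto the inbox (lock-free, multi-producer). */
static void remote_push_sketch(RemoteInboxSketch* in, void* block) {
    assert(block != NULL);
    void* old = atomic_load_explicit(&in->remote_head, memory_order_relaxed);
    do {
        *(void**)block = old;  /* link the block to the head we observed */
    } while (!atomic_compare_exchange_weak_explicit(
                 &in->remote_head, &old, block,
                 memory_order_release, memory_order_relaxed));
}

/* Owner thread only: detach the whole chain at once and return its head. */
static void* remote_drain_sketch(RemoteInboxSketch* in) {
    return atomic_exchange_explicit(&in->remote_head, NULL, memory_order_acquire);
}
```

Because blocks are only ever pushed one at a time and removed all at once, this variant avoids the ABA problem that a per-node pop would have to handle. The owner can splice the drained chain into its local freelist during refill.
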

### Risk 3: Backward compatibility

**Problem**: Existing Registry-based code breaks

**Mitigation**:
- Keep the Registry code so it can serve as a fallback
- Toggle SuperSlab ON/OFF with an environment variable
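
The environment-variable toggle could look roughly like the sketch below. The variable name `HAKMEM_TINY_SUPERSLAB`, the default of "on", and both function names are assumptions for illustration; the plan does not specify them. The cached flag is not thread-safe on first use, so this sketch assumes the first call happens during allocator initialization, before other threads start.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Interpret a flag value: unset, or anything other than "0", enables SuperSlab. */
static int parse_superslab_flag(const char* value) {
    if (value == NULL) return 1;  /* assumed default: SuperSlab on */
    return strcmp(value, "0") != 0;
}

/* Read the (hypothetical) HAKMEM_TINY_SUPERSLAB variable once and cache the result. */
static int tiny_use_superslab(void) {
    static int cached = -1;       /* -1 means "not read yet" */
    if (cached < 0) {
        cached = parse_superslab_flag(getenv("HAKMEM_TINY_SUPERSLAB"));
        assert(cached == 0 || cached == 1);
    }
    return cached;
}
```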

---

## 🎯 Success Criteria

### Must Have
- ✅ Registry removal complete (commented out)
- ✅ ptr_to_superslab() in use
- ✅ Build succeeds
- ✅ Tiny 1T: **+5%** or better
- ✅ Tiny 4T: **+5%** or better

### Should Have
- ✅ Tiny 1T: **+10-15%**
- ✅ Tiny 4T: **+10-15%**
- ✅ Simpler code

### Nice to Have
- ✅ Remote free optimization
- ✅ Per-thread queues (may be deferred to Phase 6.23)

---

## 📝 Implementation Plan (1-2 hours)

### Phase 1: Comment out the Registry (30 min)
- [ ] Comment out the Registry structures
- [ ] Stub out the Registry functions
- [ ] Verify the build

### Phase 2: Free path integration (30 min)
- [ ] Rewrite `hak_tiny_free()` to use SuperSlab
- [ ] Implement the same-thread fast path
- [ ] Build + verify behavior

### Phase 3: Refill integration (30 min)
- [ ] Implement `refill_from_superslab()`
- [ ] Unify TinySlab and TinySlabMeta
- [ ] Build + verify behavior

### Phase 4: Benchmark (10 min)
- [ ] Measure Tiny 1T/4T
- [ ] Compare against Phase 6.22-A
- [ ] Write up the results

---

**Created**: 2025-10-24 12:00 JST
**Status**: 🚀 **Ready to implement**
**Next action**: Start Phase 1 (comment out the Registry)