# Phase 6.12: Tiny Pool Implementation Report

**Completed**: 2025-10-21

**Status**: ✅ Basic implementation complete, ❌ performance target not met, 🎯 moving on to P0 optimization

---

## 📊 **Implementation Summary**

### ✅ **Completed Work**

1. **8 size classes**
   - 8B, 16B, 32B, 64B, 128B, 256B, 512B, 1KB
   - Bitmap-based free-block search (`__builtin_ctzll`)
   - Slab list management (free_slabs / full_slabs)

2. **64KB slab allocator**
   - Uses posix_memalign (memory leak fixed)
   - Slab metadata: TinySlab struct + bitmap

3. **Lite P1 pre-allocation**
   - Only the 4 Tier 1 classes (8-64B) are pre-allocated
   - 256KB resident (not 512KB)

4. **New benchmark scenarios**
   - string-builder (8-64B, short-lived)
   - token-stream (16-128B, FIFO)
   - small-objects (32-256B, long-lived)

5. **Warmup separation**
   - Initialization phase separated from the measurement phase
   - Improves measurement accuracy


---

## 📊 **Benchmark Results**

### **Performance Results** (vs mimalloc)

| Scenario | hakmem | mimalloc | system | vs mimalloc |
|----------|--------|----------|--------|-------------|
| **string-builder** (8-64B) | 7,871 ns | 18 ns | 18 ns | **437x slower** ❌ |
| **token-stream** (16-128B) | 99 ns | 9 ns | 12 ns | **11x slower** ⚠️ |
| **small-objects** (32-256B) | 6 ns | 3 ns | 6 ns | **2x slower** ✅ |

### **Test Environment**

- CPU/OS: x86_64 Linux
- Compiler: gcc -O2
- Iterations: 10,000 (string-builder, token-stream, small-objects)
- Warmup: 4 size classes pre-allocated in each scenario


---

## 🔍 **Root Cause Analysis**

### **Findings from Task先生's Investigation**

**Culprit**: `find_slab_by_ptr()` is called twice per free = **6,000 ns/op (~76%)**

#### **Problematic Code**

```c
// hakmem.c:510 - hak_free_at()
if (hak_tiny_is_managed(ptr)) {  // ← 1st find_slab_by_ptr() call
    hak_tiny_free(ptr);          // ← 2nd find_slab_by_ptr() call (inside hak_tiny_free)
    return;
}

// hakmem_tiny.c:253 - hak_tiny_is_managed()
int hak_tiny_is_managed(void* ptr) {
    return find_slab_by_ptr(ptr) != NULL;  // ← O(N) linear search!
}

// hakmem_tiny.c:71-96 - find_slab_by_ptr() (abridged)
static TinySlab* find_slab_by_ptr(void* ptr) {
    // slab_base: presumably ptr rounded down to its slab boundary (not shown in this excerpt)

    // Search free_slabs (O(N))
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        for (TinySlab* slab = ...; slab; slab = slab->next) {
            if ((uintptr_t)slab->base == slab_base) return slab;
        }
    }

    // Search full_slabs (O(N))
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        for (TinySlab* slab = ...; slab; slab = slab->next) {
            if ((uintptr_t)slab->base == slab_base) return slab;
        }
    }
    return NULL;
}
```
#### **Cost Breakdown**

```
string-builder: 40,000 free calls
  × 2 find_slab_by_ptr() calls per free
  × avg. 3,000 ns/call (O(N) search)
  = 240,000,000 ns total
  → 6,000 ns/op (~76% of 7,871 ns/op)
```

**Other overheads**:

- memset: 100 ns/op (1.3%)
- Function calls: 80 ns/op (1.0%)
- Bitmap search: est. 1,691 ns/op (21.5%)


---

## 💡 **ChatGPT Pro Diagnosis**

### **Verdict: Keep going, don't give up!**

**Reasons**:

1. ✅ **P0+TLS changes the order of magnitude**: 7,871 ns → 50-80 ns (roughly 98-157x faster)
2. ✅ **SACS/ELO differentiation**: the Hot/Warm/Cold scheme also applies to the Tiny band
3. ✅ **Consistency**: L1/L2/L2.5/L3 all follow the same policy

**Timebox**: if P0 cannot get down to ≤200 ns/op, shift focus to L2.5

---

## 🎯 **P0 Optimization Strategy**

### **Option B: Embedded Metadata (16-byte tag at the start of each slab)**

**Implementation**:

```c
typedef struct __attribute__((packed)) {
    uintptr_t xored_owner;  // owner ^ cookie
    uint32_t  magic;        // == MAGIC (noted as "0xH4KM3M01", a mnemonic, not a valid hex literal)
    uint16_t  class_idx;
    uint16_t  epoch;        // ABA prevention
} SlabTag;                  // 8 + 4 + 2 + 2 = 16 bytes on 64-bit

static inline TinySlab* owner_slab(void* p) {
    // Slabs are TINY_SLAB_SIZE-aligned, so masking the low bits yields the slab base.
    // Caution: the tag is read for any pointer, so the aligned base must be readable memory.
    uintptr_t base = (uintptr_t)p & ~(TINY_SLAB_SIZE - 1);
    SlabTag* t = (SlabTag*)base;
    if (unlikely(t->magic != MAGIC)) return NULL;   // not a Tiny Pool slab
    return (TinySlab*)(t->xored_owner ^ cookie);    // O(1)!
}
```

**Expected effect**: 6,000 ns → 5 ns (**1,200x faster**)

### **Option C: Remove the Double Lookup**

```c
// Revised hak_free_at()
TinySlab* slab = owner_slab(ptr);  // ← single lookup
if (slab) {
    hak_tiny_free_with_slab(ptr, slab);
    return;
}
```

**Expected effect**: 2x faster (one lookup per free instead of two)

### **Remove All memset Calls**

Remove every memset except the ones used for benchmark measurement.

**Expected effect**: 100 ns reduction

---

## 📊 **Expected Improvements**

| Change | Current | After | Effect |
|--------|---------|-------|--------|
| **P0: Option B + C + memset removal** | 7,871 ns | 1,871 ns | 4.2x faster |
| **P1: TLS freelist** | 1,871 ns | 50-80 ns | 23-37x faster |
| **Overall** | 7,871 ns | **50-80 ns** | **98-157x faster** |

**vs mimalloc**: 18 ns vs 50-80 ns → 2.8-4.4x slower (acceptable)


---

## 🎯 **Final Targets**

| Scenario | Current | Target | mimalloc | Ratio to mimalloc |
|----------|---------|--------|----------|-------------------|
| **string-builder** | 7,871 ns | **50-80 ns** | 18 ns | 2.8-4.4x ✅ |
| **token-stream** | 99 ns | **≤20 ns** | 9 ns | 2.2x ✅ |
| **small-objects** | 6 ns | **≤10 ns** | 3 ns | 3.3x ✅ |

---

## 📁 **Related Files**

### **Implementation**

- `apps/experiments/hakmem-poc/hakmem_tiny.h` - Tiny Pool API
- `apps/experiments/hakmem-poc/hakmem_tiny.c` - Slab allocator implementation
- `apps/experiments/hakmem-poc/hakmem.c` - Integration code

### **Benchmarks**

- `apps/experiments/hakmem-poc/bench_allocators.c` - Implementation of the 3 scenarios
- `apps/experiments/hakmem-poc/Makefile` - Build configuration

### **Investigation Reports**

- `apps/experiments/hakmem-poc/WARMUP_ZERO_EFFECT_INVESTIGATION.md` - Task先生's investigation
- `/tmp/chatgpt_tiny_pool_optimization.txt` - Questions submitted to ChatGPT Pro
- `/mnt/c/git/nyash-project/chatgpt_tiny_pool_optimization.txt` - Same file (Windows copy)

---

## ✅ **Phase 6.12 Completion Assessment**

**Basic implementation**: ✅ Complete

**Performance target**: ❌ Not met (437x slower than mimalloc)

**Root cause**: ✅ Identified (double find_slab_by_ptr call)

**Optimization strategy**: ✅ Finalized (approved by ChatGPT Pro)

**Next step**: Start implementing Phase 6.12.1 (P0 optimization)

---

**Authors**: Claude + Task先生 + ChatGPT Pro

**Created**: 2025-10-21