# Current Task – 2025-11-08

## 🚀 Phase 7: Region-ID Direct Lookup - Beating System malloc

### Mission

**Make HAKMEM faster than System malloc/mimalloc**

- **Current**: 1.2M ops/s (bench_random_mixed)
- **Target**: 40-80M ops/s (70-140% of System malloc)
- **Strategy**: Remove the SuperSlab lookup → Ultra-fast free (3-5 instructions)

---

## 📊 Current-State Analysis (Done)

### Performance Gap Found

- **System malloc**: 56M ops/s
- **HAKMEM**: 1.2M ops/s
- **Gap**: **47x slower** 💀


### Root Cause Identified (ChatGPT Pro Ultrathink)

**Two SuperSlab lookups in the free path consume 52.63% of CPU**

```c
// Current problem
void free(void* ptr) {
    SuperSlab* ss = hak_super_lookup(ptr);  // ← Lookup #1 (100+ cycles)
    int class_idx = ss->size_class;
    // ... 330 lines of validation ...
    hak_tiny_free_superslab(ptr, ss);       // ← Lookup #2 (redundant!)
}
```

**Comparison:**

| Path | Instructions | Atomics | Lookups | Cycles |
|------|--------------|---------|---------|--------|
| **Allocation** | 3-4 | 0 | 0 | ~10 |
| **Free (current)** | 330+ | 5-7 | 2 | ~500+ |
| **System tcache** | 3-4 | 0 | 0 | ~10 |

---

## ✅ Design Complete (Task Agent Opus Ultrathink)

### Recommended Approach: Smart Headers (Hybrid 1B)

**Brilliant discovery:**

> slab[0] of a SuperSlab contains **960 bytes of wasted padding**
> → Reusing it for headers means **zero memory overhead!**

**Implementation:**

```c
// Ultra-Fast Free (3-5 instructions, 5-10 cycles)
void hak_free_fast(void* ptr) {
    // 1. Get class from inline header (1 instruction)
    uint8_t cls = *((uint8_t*)ptr - 1);

    // 2. Push to TLS freelist (2-3 instructions)
    *(void**)ptr = g_tls_sll_head[cls];
    g_tls_sll_head[cls] = ptr;
    g_tls_sll_count[cls]++;

    // Done! No lookup, no validation, no atomic
}
```
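
The header scheme can be exercised in isolation. The sketch below pairs the free path with a matching header write and a pop. All `demo_*` / `DEMO_*` names, the class count of 8, and the 16-byte pad in front of the user pointer are assumptions for illustration, not HAKMEM's actual layout. The pad exists only so that the user pointer stays pointer-aligned for the intrusive next-link, and the caller is assumed to pass pointer-aligned slots. Because the link is stored in the user area, the header byte at `ptr - 1` survives a free/alloc cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8   /* assumption: number of tiny size classes */
#define DEMO_HEADER_PAD  16  /* assumption: pad so the user pointer stays aligned */

/* Per-thread singly linked freelists, one per size class. */
static _Thread_local void*    demo_tls_head[DEMO_NUM_CLASSES];
static _Thread_local uint32_t demo_tls_count[DEMO_NUM_CLASSES];

/* Turn a raw, pointer-aligned slot into a user pointer and record
 * its class index in the byte immediately before that pointer. */
static inline void* demo_block_init(void* slot, uint8_t cls) {
    assert(cls < DEMO_NUM_CLASSES);
    unsigned char* user = (unsigned char*)slot + DEMO_HEADER_PAD;
    user[-1] = cls;
    return user;
}

/* Free: read the class from the header, then push onto the TLS list.
 * No lookup, no validation, no atomics. */
static inline void demo_free_fast(void* ptr) {
    uint8_t cls = ((uint8_t*)ptr)[-1];
    *(void**)ptr = demo_tls_head[cls];  /* next-link lives in the user area */
    demo_tls_head[cls] = ptr;
    demo_tls_count[cls]++;
}

/* Alloc: pop the most recently freed block of this class, or NULL if empty. */
static inline void* demo_alloc_fast(uint8_t cls) {
    void* p = demo_tls_head[cls];
    if (p == NULL) return NULL;
    demo_tls_head[cls] = *(void**)p;
    demo_tls_count[cls]--;
    return p;
}
```
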

**Performance Projection:**

- **1.2M → 40-60M ops/s** (roughly 33-50x improvement) 🚀
- **vs System malloc**: 70-110% (on par to ahead!) 🏆
- **vs mimalloc**: comparable

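
The projection can be sanity-checked against the figures quoted in this document (1.2M ops/s today, 56M ops/s for System malloc). A minimal sketch; the helper names are hypothetical, and the inputs are the document's own numbers rather than new measurements.

```c
#include <assert.h>

/* How many times faster `after` is than `before` (both in ops/s). */
static double demo_speedup(double before, double after) {
    assert(before > 0.0);
    return after / before;
}

/* Throughput expressed as a percentage of a baseline allocator. */
static double demo_percent_of(double value, double baseline) {
    assert(baseline > 0.0);
    return 100.0 * value / baseline;
}

/* With the numbers above:
 *   demo_speedup(1.2e6, 40e6)    is about 33.3
 *   demo_speedup(1.2e6, 60e6)    is about 50
 *   demo_percent_of(40e6, 56e6)  is about 71.4
 *   demo_percent_of(60e6, 56e6)  is about 107.1 */
```
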

**Memory Overhead:**

- Slab[0]: 0% (padding reused)
- Other slabs: ~1.5% (1 byte/block)
- Average: <2% (acceptable)

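
The per-block cost of a 1-byte header depends on the block size: the ~1.5% figure matches a 64-byte block (1/64 ≈ 1.56%), and smaller classes pay proportionally more, so the average depends on the class mix. A small sketch of the arithmetic; the function name and the block sizes are assumptions, not HAKMEM's actual class table.

```c
#include <assert.h>
#include <stddef.h>

/* Header overhead as a percentage of the block size.
 * Example: 1 header byte in a 64-byte block is 1.5625%. */
static double demo_header_overhead_percent(size_t block_size, size_t header_bytes) {
    assert(block_size > 0);
    return 100.0 * (double)header_bytes / (double)block_size;
}
```
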

**Design documents:**

- [`REGION_ID_DESIGN.md`](REGION_ID_DESIGN.md) - full design (Task Agent Opus)
- [`CLAUDE.md#phase-7`](CLAUDE.md#phase-7-region-id-direct-lookup---ultra-fast-free-path-2025-11-08-) - Phase 7 overview

---

## 📋 Implementation Plan

### Phase 7-1: Proof of Concept (1-2 days) ⏳

**Goal**: Verify that the header approach works and measure its effect

**Tasks:**

1. **Implement header write** (allocation path)
   - `core/tiny_alloc_fast.inc.h` - add the header write
   - `core/tiny_region_id.h` - define the header API (new file)
   ```c
   // Write class_idx into the header at allocation time
   static inline void* alloc_with_header(int class_idx, void* ptr) {
       *((uint8_t*)ptr - 1) = (uint8_t)class_idx;
       return ptr;
   }
   ```

2. **Implement ultra-fast free** (free path)
   - `core/tiny_free_fast_v2.inc.h` - new free path (new file, 10-20 LOC)
   - Feature flag: `HAKMEM_TINY_HEADER_CLASSIDX=1`
   ```c
   void hak_free_fast_v2(void* ptr) {
       uint8_t cls = *((uint8_t*)ptr - 1);
       *(void**)ptr = g_tls_sll_head[cls];
       g_tls_sll_head[cls] = ptr;
       g_tls_sll_count[cls]++;
   }
   ```

3. **Benchmark measurement**
   ```bash
   # Before (current)
   make clean && make bench_random_mixed_hakmem
   HAKMEM_TINY_USE_SUPERSLAB=1 ./bench_random_mixed_hakmem 100000 2048 1234567
   # → 1.2M ops/s

   # After (header approach)
   make clean && make HEADER_CLASSIDX=1 bench_random_mixed_hakmem
   HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_HEADER_CLASSIDX=1 \
     ./bench_random_mixed_hakmem 100000 2048 1234567
   # → Target: 40-60M ops/s
   ```

**Success Criteria:**

- ✅ Throughput > 30M ops/s (25x improvement)
- ✅ No crashes (stability test, 10 runs)
- ✅ Memory overhead < 3%

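
The benchmark commands in this phase pass `HAKMEM_TINY_HEADER_CLASSIDX=1` through the environment. Below is a minimal sketch of a runtime A/B switch, assuming the flag is read with `getenv` and that any value starting with `1` means "on". The helper names are hypothetical and the real parsing rules may differ. A production free path would resolve the flag once at init instead of calling `getenv` on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pure parser: treats a value beginning with '1' as enabled,
 * and an unset (NULL) or any other value as disabled. */
static int demo_flag_is_on(const char* value) {
    return value != NULL && value[0] == '1';
}

/* Reads the flag named in this document from the environment. */
static int demo_header_mode_enabled(void) {
    return demo_flag_is_on(getenv("HAKMEM_TINY_HEADER_CLASSIDX"));
}
```
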

---

### Phase 7-2: Production Integration (2-3 days)

**Goal**: Feature flag + Fallback + Debug validation

**Tasks:**

1. **Add feature flag**
   - `core/hakmem_build_flags.h` - `HAKMEM_TINY_HEADER_CLASSIDX` flag
   - Default: OFF (backward compatibility)
   - Easy switching via A/B toggle

2. **Implement fallback path**
   - Handle allocations that have no header
   - Support legacy mode
   ```c
   if (has_header(ptr)) {
       fast_free_v2(ptr);  // Header approach
   } else {
       fast_free_v1(ptr);  // Legacy (SuperSlab lookup)
   }
   ```

3. **Debug validation**
   - Magic byte for UAF detection
   - Header corruption check
   - Fail-Fast integration
   ```c
   #if !HAKMEM_BUILD_RELEASE
   if (cls >= TINY_NUM_CLASSES) {
       fprintf(stderr, "[HEADER_CORRUPT] Invalid class_idx=%u\n", (unsigned)cls);
       abort();
   }
   #endif
   ```

**Success Criteria:**

- ✅ Instant rollback possible via the feature flag
- ✅ Existing code works in legacy mode
- ✅ Validation fully working in debug mode

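
One way the fallback check and the debug validation in this phase could fit into a single header byte is sketched below. The bit layout (high nibble as a magic tag, low nibble as the class index), the magic values, and every `demo_*` / `DEMO_*` name are assumptions for illustration. The document does not specify how `has_header` is implemented. A nibble-sized magic is only a heuristic, since a legacy block can hold the same byte by coincidence, and reading `ptr - 1` must itself be safe for every pointer passed in.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HDR_MAGIC_LIVE  0xA0u  /* assumption: tag for a live block */
#define DEMO_HDR_MAGIC_FREED 0x50u  /* assumption: tag written on free */
#define DEMO_HDR_MAGIC_MASK  0xF0u
#define DEMO_HDR_CLASS_MASK  0x0Fu
#define DEMO_NUM_CLASSES     8u     /* assumption: number of tiny classes */

typedef enum {
    DEMO_HDR_OK,                /* live header, class index valid */
    DEMO_HDR_DOUBLE_FREE,       /* block was already freed (UAF / double free) */
    DEMO_HDR_CORRUPT_OR_LEGACY  /* no recognizable header: take the legacy path */
} demo_hdr_status;

/* Allocation side: tag the block as live and store its class. */
static inline void demo_hdr_write(void* user, unsigned cls) {
    assert(cls < DEMO_NUM_CLASSES);
    ((uint8_t*)user)[-1] = (uint8_t)(DEMO_HDR_MAGIC_LIVE | cls);
}

/* Free side: classify the header; *cls_out is set only on DEMO_HDR_OK. */
static inline demo_hdr_status demo_hdr_check(const void* user, unsigned* cls_out) {
    unsigned h     = ((const uint8_t*)user)[-1];
    unsigned magic = h & DEMO_HDR_MAGIC_MASK;
    unsigned cls   = h & DEMO_HDR_CLASS_MASK;
    if (magic == DEMO_HDR_MAGIC_FREED) return DEMO_HDR_DOUBLE_FREE;
    if (magic != DEMO_HDR_MAGIC_LIVE || cls >= DEMO_NUM_CLASSES)
        return DEMO_HDR_CORRUPT_OR_LEGACY;
    *cls_out = cls;
    return DEMO_HDR_OK;
}

/* Debug builds: flip the tag on free so a second free is detectable. */
static inline void demo_hdr_mark_freed(void* user) {
    uint8_t* h = (uint8_t*)user - 1;
    *h = (uint8_t)(DEMO_HDR_MAGIC_FREED | (*h & DEMO_HDR_CLASS_MASK));
}
```
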

---

### Phase 7-3: Testing & Optimization (1-2 days)

**Goal**: Reach production quality

**Tasks:**

1. **Unit tests**
   - Header write/read correctness
   - Edge cases (slab[0] padding, class boundaries)
   - UAF detection

2. **Stress tests**
   - Larson 4T (MT stability)
   - Fragmentation stress
   - Long-running test (1000+ seconds)

3. **Full benchmark suite**
   ```bash
   # Comprehensive benchmark
   make bench_comprehensive_hakmem
   ./bench_comprehensive_hakmem

   # vs System malloc
   make bench_comprehensive_system
   ./bench_comprehensive_system

   # Comparison report
   # (assumes each run left its results in the corresponding .txt file)
   diff comprehensive_hakmem.txt comprehensive_system.txt
   ```

**Success Criteria:**

- ✅ bench_random_mixed: 40-60M ops/s
- ✅ larson_hakmem 4T: 4-6M ops/s
- ✅ vs System: 70-110%
- ✅ vs mimalloc: equal or better

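
A header write/read unit test for this phase could look like the sketch below. It reuses the `alloc_with_header` snippet from Phase 7-1 verbatim; `read_header_class`, `demo_header_roundtrip_failures`, the class count of 8, and the 32-byte scratch block are hypothetical. The test covers the class boundaries (index 0 and the highest index) and checks that the write touches only the single byte in front of the user pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_CLASSES 8  /* assumption: number of tiny size classes */

/* Copied from the Phase 7-1 snippet in this document. */
static inline void* alloc_with_header(int class_idx, void* ptr) {
    *((uint8_t*)ptr - 1) = (uint8_t)class_idx;
    return ptr;
}

/* Hypothetical read-side counterpart. */
static inline int read_header_class(const void* ptr) {
    return *((const uint8_t*)ptr - 1);
}

/* Returns the number of failed checks across all classes (0 means pass). */
static int demo_header_roundtrip_failures(void) {
    int failures = 0;
    for (int cls = 0; cls < DEMO_NUM_CLASSES; cls++) {
        uint8_t block[32];
        memset(block, 0xEE, sizeof block);           /* canary pattern */
        void* user = alloc_with_header(cls, block + 16);
        if (read_header_class(user) != cls) failures++;
        /* Only block[15] may change; its neighbors must keep the canary. */
        if (block[14] != 0xEE || block[16] != 0xEE) failures++;
    }
    return failures;
}
```
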

---

## 🎯 Expected Outcomes

### Performance Targets

| Benchmark | Before | After | vs System | Result |
|-----------|--------|-------|-----------|--------|
| bench_random_mixed | 1.2M | **40-60M** | **70-110%** | ✅ On par to ahead |
| larson_hakmem 4T | 0.8M | **4-6M** | **120-180%** | ✅ Ahead |
| Tiny hot path | TBD | **50-80M** | **90-140%** | ✅ On par to ahead |

### Overall Assessment (ChatGPT Pro)

**Areas where we can win:**

- ✅ **Tiny (≤1KB)**: on par with System/mimalloc via direct header return
- ✅ **MT Larson**: scales with the remote side-table
- ✅ **Mid-Large (8-32KB)**: already ahead by +171%

**Hard spots (catching up):**

- ⚠️ **VM-backed (large)**: mmap/munmap optimization needed

**Overall outlook:**

> Front-end direct return + back-end batching + learning to **surpass System/mimalloc** 🏆

---

## 📁 Related Documents

- [`REGION_ID_DESIGN.md`](REGION_ID_DESIGN.md) - full design (Task Agent Opus Ultrathink)
- [`CLAUDE.md#phase-7`](CLAUDE.md#phase-7-region-id-direct-lookup---ultra-fast-free-path-2025-11-08-) - Phase 7 overview
- [`FREE_PATH_ULTRATHINK_ANALYSIS.md`](FREE_PATH_ULTRATHINK_ANALYSIS.md) - current bottleneck analysis
- [`DEBUG_LOGGING_POLICY.md`](DEBUG_LOGGING_POLICY.md) - Debug/Release build policy

---

## 🛠️ Commands (for Phase 7-1)

```bash
# Measure the current baseline
make clean && make bench_random_mixed_hakmem
HAKMEM_TINY_USE_SUPERSLAB=1 ./bench_random_mixed_hakmem 100000 2048 1234567
# → Expected: 1.2M ops/s

# After implementing the header approach (Phase 7-1)
make clean && make HEADER_CLASSIDX=1 bench_random_mixed_hakmem
HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_HEADER_CLASSIDX=1 \
  ./bench_random_mixed_hakmem 100000 2048 1234567
# → Target: 40-60M ops/s (roughly 33-50x improvement!)

# Larson MT test
HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_HEADER_CLASSIDX=1 \
  ./larson_hakmem 2 8 128 1024 1 12345 4
# → Target: 4-6M ops/s

# Debug validation mode
HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_HEADER_CLASSIDX=1 \
HAKMEM_TINY_REFILL_FAILFAST=2 \
  ./bench_random_mixed_hakmem 50000 2048 1234567
# → Header validation + Fail-Fast
```

---

## 📅 Timeline

- **Phase 7-1 (PoC)**: 1-2 days ← **next step!**
- **Phase 7-2 (Integration)**: 2-3 days
- **Phase 7-3 (Testing)**: 1-2 days
- **Total**: **Beat System malloc in 4-7 days** 🎉

---

## ✅ Completed (through Phase 6)

### Release Build Optimization (2025-11-08)

- ✅ Moved safety checks to debug mode
- ✅ Added `-DNDEBUG` to the Makefile
- ✅ Disabled remote-push debug logging in Release builds
- **Result**: 1.02M → 1.20M ops/s (+17.3%)

### Remote Queue Race Bug Fix (2025-11-07)

- ✅ Added a remote drain before freelist pop
- ✅ Larson 4T stabilized (ran for 1073 seconds)

### Double-Allocation Bug Fix (2025-11-07)

- ✅ Added a `carved` field to `TinySlabMeta`
- ✅ Fixed the linear carve cursor

---

**Next action: start implementing Phase 7-1!** 🚀