# HAKMEM Tiny Allocator Super-Refactoring Plan

## Executive Summary

### Current State

- **hakmem_tiny.c (1584 lines)**: a container that aggregates multiple .inc files
- **hakmem_tiny_free.inc (1470 lines)**: the largest mixed-responsibility file
  - Free path (lines 33-558)
  - SuperSlab Allocation (lines 559-998)
  - SuperSlab Free (lines 999-1369)
  - Query API (commented out, extracted to hakmem_tiny_query.c)

**Problems**:
1. A single mega-file (1470 lines)
2. Free and Allocation logic are mixed together
3. Responsibilities are unclear
4. `static inline` functions are deeply nested

### Goal

**"Split into files of 500 lines or fewer, based on Box Theory"**
- Each file has a single responsibility (SRP)
- Use `static inline` to make boundaries zero-cost
- Make dependencies explicit
- Optimize the refactoring order

---

## Phase 1: Current-State Analysis

### Top 10 Largest Files

| Rank | File | Lines | Responsibility |
|------|------|-------|----------------|
| 1 | hakmem_pool.c | 2592 | Mid/Large allocator (out of scope) |
| 2 | hakmem_tiny.c | 1584 | Tiny aggregator (under analysis) |
| 3 | **hakmem_tiny_free.inc** | **1470** | Free + SS Alloc + Query (must split) |
| 4 | hakmem.c | 1449 | Top-level allocator (out of scope) |
| 5 | hakmem_l25_pool.c | 1195 | L25 pool (out of scope) |
| 6 | hakmem_tiny_intel.inc | 863 | Intel optimizations (split candidate) |
| 7 | hakmem_tiny_superslab.c | 810 | SuperSlab (kept, already strengthened) |
| 8 | hakmem_tiny_stats.c | 697 | Statistics (kept) |
| 9 | tiny_remote.c | 645 | Remote queue (kept, split candidate) |
| 10 | hakmem_learner.c | 603 | Learning (out of scope) |

### Tiny-Related Files Over 500 Lines

```
hakmem_tiny_free.inc          1470  ← must split (top priority)
hakmem_tiny_intel.inc          863  ← split candidate
hakmem_tiny_init.inc           544  ← split candidate
tiny_remote.c                  645  ← split candidate
```

### .inc Files Included by hakmem_tiny.c (44 files)

**Largest (over 300 lines):**
- hakmem_tiny_free.inc (1470) ← **top priority**
- hakmem_tiny_intel.inc (863)
- hakmem_tiny_init.inc (544)
- hakmem_tiny_refill.inc.h (410)

**Medium (200-300 lines):**
- hakmem_tiny_alloc_new.inc (275)
- hakmem_tiny_background.inc (261)
- hakmem_tiny_alloc.inc (249)
- hakmem_tiny_lifecycle.inc (244)
- hakmem_tiny_metadata.inc (226)

**Small (under 200 lines):**
- hakmem_tiny_ultra_simple.inc (176)
- hakmem_tiny_slab_mgmt.inc (163)
- hakmem_tiny_fastcache.inc.h (149)
- hakmem_tiny_hotmag.inc.h (147)
- hakmem_tiny_smallmag.inc.h (139)
- hakmem_tiny_hot_pop.inc.h (118)
- hakmem_tiny_bump.inc.h (107)

---

## Phase 2: Classifying Responsibilities with Box Theory

### Box 1: Atomic Ops (lowest layer, 50-100 lines)
**Responsibility**: wrappers for CAS/Exchange/Fetch, memory-order management

**New file**:
- `tiny_atomic.h` (80 lines)

**Contents**:
```c
// Atomics for remote queue, owner_tid, refcount
// - tiny_atomic_cas()
// - tiny_atomic_exchange()
// - tiny_atomic_load/store()
// - Memory order wrapper
```

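
The wrappers listed for Box 1 could look roughly like the sketch below. It assumes C11 `<stdatomic.h>` and pointer-sized payloads; only the function names come from the plan, while the signatures and the chosen memory orderings are assumptions and not the existing HAKMEM API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical signatures: only the names are taken from the plan. */

/* Acquire load: pairs with a release store or CAS on the same location. */
static inline uintptr_t tiny_atomic_load(_Atomic uintptr_t* p) {
    return atomic_load_explicit(p, memory_order_acquire);
}

/* Release store: publishes earlier writes to a later acquire load. */
static inline void tiny_atomic_store(_Atomic uintptr_t* p, uintptr_t v) {
    atomic_store_explicit(p, v, memory_order_release);
}

/* Swap in v and return the previous value (e.g. to detach a whole list). */
static inline uintptr_t tiny_atomic_exchange(_Atomic uintptr_t* p, uintptr_t v) {
    return atomic_exchange_explicit(p, v, memory_order_acq_rel);
}

/* Strong CAS. On failure, *expected is refreshed with the current value. */
static inline bool tiny_atomic_cas(_Atomic uintptr_t* p, uintptr_t* expected,
                                   uintptr_t desired) {
    return atomic_compare_exchange_strong_explicit(
        p, expected, desired, memory_order_acq_rel, memory_order_acquire);
}
```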

---

### Box 2: Remote Queue & Ownership (lower layer, 500-700 lines)

#### 2.1: Remote Queue Operations (`tiny_remote_queue.inc.h`, 250-350 lines)
**Responsibility**: MPSC stack ops, guard check, node management

**Source**: extract the remote queue portion of hakmem_tiny_free.inc
```c
// - tiny_remote_queue_contains_guard()
// - tiny_remote_queue_push()
// - tiny_remote_queue_pop()
// - tiny_remote_drain_owner()    // from hakmem_tiny_free.inc:170
```

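
The MPSC stack operations named above can be sketched as a CAS-based push combined with a single-consumer detach. This is a minimal illustration under assumptions: `RemoteNode`, `RemoteQueue`, `remote_push()`, `remote_drain_all()`, and `remote_list_len()` are hypothetical names, and the real `tiny_remote_queue_*` signatures are not shown in this plan.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types: a freed block doubles as its own list node. */
typedef struct RemoteNode { struct RemoteNode* next; } RemoteNode;
typedef struct { _Atomic(RemoteNode*) head; } RemoteQueue;

/* Any thread may push: link the node in front of the current head via CAS. */
static inline void remote_push(RemoteQueue* q, RemoteNode* n) {
    RemoteNode* old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        n->next = old;  /* a failed CAS refreshes old with the current head */
    } while (!atomic_compare_exchange_weak_explicit(
        &q->head, &old, n, memory_order_release, memory_order_relaxed));
}

/* Only the owner drains: detach the whole list with a single exchange. */
static inline RemoteNode* remote_drain_all(RemoteQueue* q) {
    return atomic_exchange_explicit(&q->head, (RemoteNode*)NULL,
                                    memory_order_acquire);
}

/* Count the nodes of a detached list (nodes come back in LIFO order). */
static inline size_t remote_list_len(const RemoteNode* n) {
    size_t len = 0;
    for (; n; n = n->next) len++;
    return len;
}
```

Because the consumer detaches the entire list instead of popping single nodes with CAS, this shape sidesteps the ABA problem that a general lock-free stack pop has to handle.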
#### 2.2: Remote Drain Logic (`tiny_remote_drain.inc.h`, 200-250 lines)
**Responsibility**: drain logic, TLS cleanup

**Source**: the drain logic in hakmem_tiny_free.inc
```c
// - tiny_remote_drain_batch()
// - tiny_remote_process_mailbox()
```

#### 2.3: Ownership (Owner TID) (`tiny_owner.inc.h`, 100-150 lines)
**Responsibility**: acquire/release of owner_tid, slab ownership

**Existing**: slab_handle.h (295 lines, kept) + strengthening
**New**: tiny_owner.inc.h
```c
// - tiny_owner_acquire()
// - tiny_owner_release()
// - tiny_owner_self()
```

**Depends on**: Box 1 (Atomic)

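
As a rough illustration of owner_tid acquire/release on top of Box 1, the sketch below claims a slab with a CAS from 0 ("no owner") to the caller's thread id. `TinySlabOwner`, the convention that 0 means unowned, the function signatures, and the helper `tiny_owner_is_self()` are assumptions made for this example; the plan only names `tiny_owner_acquire()`, `tiny_owner_release()`, and `tiny_owner_self()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slab header: owner_tid == 0 means "no owner". */
typedef struct { _Atomic uint32_t owner_tid; } TinySlabOwner;

/* Try to claim an unowned slab; at most one thread can succeed. */
static inline bool tiny_owner_acquire(TinySlabOwner* s, uint32_t self_tid) {
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &s->owner_tid, &expected, self_tid,
        memory_order_acquire, memory_order_relaxed);
}

/* Give the slab up; the release store publishes the owner's writes. */
static inline void tiny_owner_release(TinySlabOwner* s) {
    atomic_store_explicit(&s->owner_tid, 0, memory_order_release);
}

/* True when the thread identified by self_tid currently owns the slab. */
static inline bool tiny_owner_is_self(TinySlabOwner* s, uint32_t self_tid) {
    return atomic_load_explicit(&s->owner_tid, memory_order_relaxed) == self_tid;
}
```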

---

### Box 3: Superslab Core (`hakmem_tiny_superslab.c` + `hakmem_tiny_superslab.h`, kept)
**Responsibility**: SuperSlab allocation, cache, registry

**Current state**: 810 lines (already well-structured)

**Strengthening**: integrate with the following Boxes
- Publish/Adopt from Box 4
- Remote ops from Box 2

---

### Box 4: Publish/Adopt (upper layer, 400-500 lines)

#### 4.1: Publish (`tiny_publish.c/h`, kept, 34 lines)
**Responsibility**: publish freelist changes

**Existing**: tiny_publish.c (34 lines) ← already tiny

#### 4.2: Mailbox (`tiny_mailbox.c/h`, kept, 252 lines)
**Responsibility**: adopt requests from other threads

**Existing**: tiny_mailbox.c (252 lines) → consider splitting
```c
// - tiny_mailbox_push()      // 50 lines
// - tiny_mailbox_drain()     // 150 lines
```

**Split proposal**:
- `tiny_mailbox_push.inc.h` (50 lines)
- `tiny_mailbox_drain.inc.h` (150 lines)

#### 4.3: Adopt Logic (`tiny_adopt.inc.h`, 200-300 lines)
**Responsibility**: logic for adopting a slab from a SuperSlab

**Source**: extract the adoption logic from hakmem_tiny_free.inc
```c
// - tiny_adopt_request()
// - tiny_adopt_select()
// - tiny_adopt_cooldown()
```

**Depends on**: Box 3 (SuperSlab), Box 4.2 (Mailbox), Box 2 (Ownership)

---

### Box 5: Allocation Path (cross-cutting, 600-800 lines)

#### 5.1: Fast Path (`tiny_alloc_fast.inc.h`, 200-300 lines)
**Responsibility**: a 3-4 instruction fast path (direct TLS cache pop)

**Source**: hakmem_tiny_ultra_simple.inc (176 lines) + hakmem_tiny_fastcache.inc.h (149 lines)
```c
// Ultra-simple fast (SRP):
static inline void* tiny_fast_alloc(int class_idx) {
    void** head = &g_tls_cache[class_idx];
    void* ptr = *head;
    if (ptr) {
        *head = *(void**)ptr;            // Pop: the next pointer lives in the block
        g_tls_cache_count[class_idx]--;  // keep the count in sync with push
    }
    return ptr;
}

// Fast push:
static inline int tiny_fast_push(int class_idx, void* ptr) {
    // g_tls_cache_count is assumed to be thread-local like g_tls_cache,
    // so plain (non-atomic) updates are sufficient here.
    if (g_tls_cache_count[class_idx] < g_tls_cache_cap[class_idx]) {
        void** head = &g_tls_cache[class_idx];
        *(void**)ptr = *head;            // link the block in front of the old head
        *head = ptr;
        g_tls_cache_count[class_idx]++;
        return 1;
    }
    return 0;  // Cache full: caller takes the slow path
}
```

#### 5.2: Refill Logic (`hakmem_tiny_refill.inc.h`, 410 lines, existing)
**Responsibility**: refilling the cache

**Current state**: hakmem_tiny_refill.inc.h (410 lines) ← already well-sized

#### 5.3: Slow Path (`tiny_alloc_slow.inc.h`, 250-350 lines)
**Responsibility**: SuperSlab → New Slab → Refill

**Source**: superslab_refill + allocation logic from hakmem_tiny_free.inc, plus hakmem_tiny_alloc.inc (249 lines)
```c
// - tiny_alloc_slow()
// - tiny_refill_from_superslab()
// - tiny_new_slab_alloc()
```

**Depends on**: Box 3 (SuperSlab), Box 5.2 (Refill)

---

### Box 6: Free Path (cross-cutting, 600-800 lines)

#### 6.1: Fast Free (`tiny_free_fast.inc.h`, 200-250 lines)
**Responsibility**: same-thread free, TLS cache push

**Source**: the fast-path free logic in hakmem_tiny_free.inc
```c
// Fast same-thread free:
static inline int tiny_free_fast(void* ptr, int class_idx) {
    // Owner check + Cache push
    uint32_t self_tid = tiny_self_u32();
    TinySlab* slab = hak_tiny_owner_slab(ptr);
    if (!slab || slab->owner_tid != self_tid)
        return 0;  // Slow path

    return tiny_fast_push(class_idx, ptr);
}
```

#### 6.2: Cross-Thread Free (`tiny_free_remote.inc.h`, 250-300 lines)
**Responsibility**: remote queue push, publish

**Source**: cross-thread logic + remote push from hakmem_tiny_free.inc
```c
// - tiny_free_remote()
// - tiny_free_remote_queue_push()
```

**Depends on**: Box 2 (Remote Queue), Box 4.1 (Publish)

#### 6.3: Guard/Safety (`tiny_free_guard.inc.h`, 100-150 lines)
**Responsibility**: guard sentinel check, bounds validation

**Source**: the guard logic in hakmem_tiny_free.inc
```c
// - tiny_free_guard_check()
// - tiny_free_validate_ptr()
```

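
The bounds-validation half of Box 6.3 might be sketched as follows. The `GuardSlab` descriptor and the exact checks are assumptions made for illustration; the plan names `tiny_free_validate_ptr()` but does not give its signature or the real slab layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slab descriptor used only for this sketch. */
typedef struct {
    uintptr_t base;         /* address of the first block */
    size_t    block_size;   /* size of each block in bytes */
    size_t    block_count;  /* number of blocks in the slab */
} GuardSlab;

/* A pointer is accepted only if it lies inside the slab
 * and sits exactly on a block boundary. */
static inline bool tiny_free_validate_ptr(const GuardSlab* s, const void* ptr) {
    uintptr_t p = (uintptr_t)ptr;
    if (p < s->base) return false;                       /* below the slab */
    uintptr_t off = p - s->base;
    if (off >= s->block_size * s->block_count) return false;  /* past the end */
    return off % s->block_size == 0;                     /* interior pointers fail */
}
```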

---

### Box 7: Statistics & Query (analysis layer, 700-900 lines)

#### Existing (kept):
- hakmem_tiny_stats.c (697 lines) - Stats aggregate
- hakmem_tiny_stats_api.h (103 lines) - Stats API
- hakmem_tiny_stats.h (278 lines) - Stats internal
- hakmem_tiny_query.c (72 lines) - Query API

#### Split under consideration:
hakmem_tiny_stats.c (697 lines) is dedicated to the statistics engine, so it is fine as is.

---

### Box 8: Lifecycle (initialization and cleanup, 544 lines)

#### Existing:
- hakmem_tiny_init.inc (544 lines) - Initialization
- hakmem_tiny_lifecycle.inc (244 lines) - Lifecycle
- hakmem_tiny_slab_mgmt.inc (163 lines) - Slab management

**Split under consideration**:
- `tiny_init_globals.inc.h` (150 lines) - Global vars
- `tiny_init_config.inc.h` (150 lines) - Config from env
- `tiny_init_pools.inc.h` (150 lines) - Pool allocation
- `tiny_lifecycle_trim.inc.h` (120 lines) - Trim logic
- `tiny_lifecycle_shutdown.inc.h` (120 lines) - Shutdown

---

### Box 9: Intel Specific (863 lines)

**Split proposal**:
- `tiny_intel_fast.inc.h` (300 lines) - Prefetch + PAUSE
- `tiny_intel_cache.inc.h` (200 lines) - Cache tuning
- `tiny_intel_cfl.inc.h` (150 lines) - CFL-specific
- `tiny_intel_skl.inc.h` (150 lines) - SKL-specific (to be merged into common code)

---

## Phase 3: Split Execution Plan

### Priority 1: Critical Path (1 week)

**Goal**: reduce the fast path to the 3-4 instruction level

1. **Box 1: tiny_atomic.h** (80 lines) ✨
   - `atomic_load_explicit()` wrapper
   - `atomic_store_explicit()` wrapper
   - `atomic_compare_exchange_strong_explicit()` (CAS) wrapper
   - Depends on: `<stdatomic.h>` only

2. **Box 5.1: tiny_alloc_fast.inc.h** (250 lines) ✨
   - Ultra-simple TLS cache pop
   - Depends on: Box 1

3. **Box 6.1: tiny_free_fast.inc.h** (200 lines) ✨
   - Same-thread fast free
   - Depends on: Box 1, Box 5.1

4. **Extract from hakmem_tiny_free.inc**:
   - Fast path logic (500 lines) → into the files above
   - SuperSlab path (400 lines) → Box 5.3, 6.2
   - Remote logic (250 lines) → Box 2
   - Cleanup → hakmem_tiny_free.inc shrinks to 300 lines

**Effect**: the fast path is optimized to be on par with the system tcache

---

### Priority 2: Remote & Ownership (1 week)

5. **Box 2.1: tiny_remote_queue.inc.h** (300 lines)
   - Remote queue ops
   - Depends on: Box 1

6. **Box 2.3: tiny_owner.inc.h** (120 lines)
   - Owner TID management
   - Depends on: Box 1, slab_handle.h (existing)

7. **Clean up tiny_remote.c** (645 lines)
   - `tiny_remote_queue_ops()` → move to tiny_remote_queue.inc.h
   - `tiny_remote_side_*()` → kept
   - Resize: reduce from 645 to 350 lines

**Effect**: remote ops are modularized

---

### Priority 3: SuperSlab Integration (1-2 weeks)

8. **Strengthen Box 3**: hakmem_tiny_superslab.c (810 lines, kept)
   - Publish/Adopt integration
   - Depends on: Box 2, Box 4

9. **Box 4.1-4.3: Publish/Adopt Path** (400-500 lines)
   - `tiny_publish.c` (34 lines, existing)
   - `tiny_mailbox.c` → split
   - `tiny_adopt.inc.h` (new)

**Effect**: SuperSlab adoption is fully integrated

---

### Priority 4: Allocation/Free Slow Path (1 week)

10. **Box 5.2-5.3: Refill & Slow Allocation** (650 lines)
    - hakmem_tiny_refill.inc.h (410 lines, existing)
    - `tiny_alloc_slow.inc.h` (new, 300 lines)

11. **Box 6.2-6.3: Cross-thread Free** (400 lines)
    - `tiny_free_remote.inc.h` (new)
    - `tiny_free_guard.inc.h` (new)

**Effect**: the slow path is cleanly separated

---

### Priority 5: Lifecycle & Config (1-2 weeks)

12. **Box 8: Split Lifecycle** (400-500 lines)
    - hakmem_tiny_init.inc (544 lines) → 150 + 150 + 150
    - hakmem_tiny_lifecycle.inc (244 lines) → 120 + 120
    - Remove duplication

13. **Box 9: Clean up Intel-specific code** (863 lines)
    - `tiny_intel_fast.inc.h` (300 lines)
    - `tiny_intel_cache.inc.h` (200 lines)
    - `tiny_intel_common.inc.h` (150 lines)
    - Deduplicate × 3 architectures

**Effect**: configuration management is unified

---

## Phase 4: Proposed New File Layout

### Final Layout

```
core/
  ├─ Box 1: Atomic Ops
  │   └─ tiny_atomic.h                   (80 lines)
  │
  ├─ Box 2: Remote & Ownership
  │   ├─ tiny_remote.h                   (80 lines, existing, slimmed down)
  │   ├─ tiny_remote_queue.inc.h         (300 lines, new)
  │   ├─ tiny_remote_drain.inc.h         (150 lines, new)
  │   ├─ tiny_owner.inc.h                (120 lines, new)
  │   └─ slab_handle.h                   (295 lines, existing, kept)
  │
  ├─ Box 3: SuperSlab Core
  │   ├─ hakmem_tiny_superslab.h         (500 lines, existing)
  │   └─ hakmem_tiny_superslab.c         (810 lines, existing)
  │
  ├─ Box 4: Publish/Adopt
  │   ├─ tiny_publish.h                  (6 lines, existing)
  │   ├─ tiny_publish.c                  (34 lines, existing)
  │   ├─ tiny_mailbox.h                  (11 lines, existing)
  │   ├─ tiny_mailbox.c                  (252 lines, existing) → can be split
  │   ├─ tiny_mailbox_push.inc.h         (80 lines, new)
  │   ├─ tiny_mailbox_drain.inc.h        (150 lines, new)
  │   └─ tiny_adopt.inc.h                (300 lines, new)
  │
  ├─ Box 5: Allocation
  │   ├─ tiny_alloc_fast.inc.h           (250 lines, new)
  │   ├─ hakmem_tiny_refill.inc.h        (410 lines, existing)
  │   └─ tiny_alloc_slow.inc.h           (300 lines, new)
  │
  ├─ Box 6: Free
  │   ├─ tiny_free_fast.inc.h            (200 lines, new)
  │   ├─ tiny_free_remote.inc.h          (300 lines, new)
  │   ├─ tiny_free_guard.inc.h           (120 lines, new)
  │   └─ hakmem_tiny_free.inc            (1470 lines, existing) → reduced to 300 lines
  │
  ├─ Box 7: Statistics
  │   ├─ hakmem_tiny_stats.c             (697 lines, existing)
  │   ├─ hakmem_tiny_stats.h             (278 lines, existing)
  │   ├─ hakmem_tiny_stats_api.h         (103 lines, existing)
  │   └─ hakmem_tiny_query.c             (72 lines, existing)
  │
  ├─ Box 8: Lifecycle
  │   ├─ tiny_init_globals.inc.h         (150 lines, new)
  │   ├─ tiny_init_config.inc.h          (150 lines, new)
  │   ├─ tiny_init_pools.inc.h           (150 lines, new)
  │   ├─ tiny_lifecycle_trim.inc.h       (120 lines, new)
  │   └─ tiny_lifecycle_shutdown.inc.h   (120 lines, new)
  │
  ├─ Box 9: Intel-specific
  │   ├─ tiny_intel_common.inc.h         (150 lines, new)
  │   ├─ tiny_intel_fast.inc.h           (300 lines, new)
  │   └─ tiny_intel_cache.inc.h          (200 lines, new)
  │
  └─ Integration
      └─ hakmem_tiny.c                   (1584 lines, existing, include aggregator)
          └─ New format:
              1. includes Box 1-9
              2. Minimal glue code only
```

---

## Phase 5: Optimizing the Include Order

### Safe Include Dependencies

```mermaid
graph TD
    A[Box 1: tiny_atomic.h] --> B[Box 2: tiny_remote.h]
    A --> C[Box 5/6: Alloc/Free]
    B --> D[Box 2.1: tiny_remote_queue.inc.h]
    D --> E[tiny_remote.c]

    A --> F[Box 4: Publish/Adopt]
    E --> F

    C --> G[Box 3: SuperSlab]
    F --> G
    G --> H[Box 5.3/6.2: Slow Path]

    I[Box 8: Lifecycle] --> H
    J[Box 9: Intel] --> C
```

### New Format for hakmem_tiny.c

```c
#include "hakmem_tiny.h"
#include "hakmem_tiny_config.h"

// ============================================================
// LAYER 0: Atomic + Ownership (lowest)
// ============================================================
#include "tiny_atomic.h"
#include "tiny_owner.inc.h"
#include "slab_handle.h"

// ============================================================
// LAYER 1: Remote Queue + SuperSlab Core
// ============================================================
#include "hakmem_tiny_superslab.h"
#include "tiny_remote_queue.inc.h"
#include "tiny_remote_drain.inc.h"
#include "tiny_remote.inc"    // tiny_remote_side_*
#include "tiny_remote.c"      // Link-time

// ============================================================
// LAYER 2: Publish/Adopt (publication mechanism)
// ============================================================
#include "tiny_publish.h"
#include "tiny_publish.c"
#include "tiny_mailbox.h"
#include "tiny_mailbox_push.inc.h"
#include "tiny_mailbox_drain.inc.h"
#include "tiny_mailbox.c"
#include "tiny_adopt.inc.h"

// ============================================================
// LAYER 3: Fast Path (allocation + free)
// ============================================================
#include "tiny_alloc_fast.inc.h"
#include "tiny_free_fast.inc.h"

// ============================================================
// LAYER 4: Slow Path (refill + cross-thread free)
// ============================================================
#include "hakmem_tiny_refill.inc.h"
#include "tiny_alloc_slow.inc.h"
#include "tiny_free_remote.inc.h"
#include "tiny_free_guard.inc.h"

// ============================================================
// LAYER 5: Statistics + Query + Metadata
// ============================================================
#include "hakmem_tiny_stats.h"
#include "hakmem_tiny_query.c"
#include "hakmem_tiny_metadata.inc"

// ============================================================
// LAYER 6: Lifecycle + Init
// ============================================================
#include "tiny_init_globals.inc.h"
#include "tiny_init_config.inc.h"
#include "tiny_init_pools.inc.h"
#include "tiny_lifecycle_trim.inc.h"
#include "tiny_lifecycle_shutdown.inc.h"

// ============================================================
// LAYER 7: Intel-specific optimizations
// ============================================================
#include "tiny_intel_common.inc.h"
#include "tiny_intel_fast.inc.h"
#include "tiny_intel_cache.inc.h"

// ============================================================
// LAYER 8: Legacy/Experimental (kept for compat)
// ============================================================
#include "hakmem_tiny_ultra_simple.inc"
#include "hakmem_tiny_alloc.inc"
#include "hakmem_tiny_slow.inc"

// ============================================================
// LAYER 9: Old free.inc (minimal, mostly extracted)
// ============================================================
#include "hakmem_tiny_free.inc"  // Now just cleanup

#include "hakmem_tiny_background.inc"
#include "hakmem_tiny_magazine.h"
#include "tiny_refill.h"
#include "tiny_mmap_gate.h"
```

---

## Phase 6: Implementation Guide

### Key Principles

1. **SRP (Single Responsibility Principle)**
   - Each file: one responsibility, 500 lines or fewer
   - No sideways dependencies

2. **Zero-Cost Abstraction**
   - All boundaries via `static inline`
   - No function pointer indirection
   - Compiler inlines aggressively

3. **Cyclic Dependency Prevention**
   - Layer 0 → Layer 1 → ... → Layer 9
   - Avoid backward dependencies

4. **Backward Compatibility**
   - Keep legacy .inc files (for compatibility)
   - Migrate to the new files incrementally

### Where to Use Static Inline

#### ✅ Use `static inline`:
```c
// tiny_atomic.h
static inline void tiny_atomic_store(_Atomic int* p, int v) {
    // Release store: earlier writes become visible to an acquire load of *p
    atomic_store_explicit(p, v, memory_order_release);
}

// tiny_alloc_fast.inc.h
static inline void* tiny_fast_pop_alloc(int class_idx) {
    void** head = &g_tls_cache[class_idx];
    void* ptr = *head;
    if (ptr) *head = *(void**)ptr;  // next pointer is stored in the block itself
    return ptr;
}

// tiny_alloc_slow.inc.h
static inline void* tiny_refill_from_superslab(int class_idx) {
    SuperSlab* ss = g_tls_current_ss[class_idx];
    if (ss) return superslab_alloc_from_slab(ss, ...);
    return NULL;
}
```

#### ❌ Don't use `static inline` for:
- Large functions (>20 lines)
- Slow path logic
- Setup/teardown code

#### ✅ Use regular functions:
```c
// tiny_remote.c
void tiny_remote_drain_batch(int class_idx) {
    // 50+ lines: slow path → regular function
}

// hakmem_tiny_superslab.c
SuperSlab* superslab_refill(int class_idx) {
    // Complex allocation → regular function
}
```

### Macro Usage

#### Use Macros for:
```c
// tiny_atomic.h
// __typeof__ is a GCC/Clang extension (standard `typeof` arrives in C23).
#define TINY_ATOMIC_LOAD(ptr, order) \
    atomic_load_explicit((_Atomic __typeof__(*(ptr))*)(ptr), (order))

// `expected` must be a pointer; it is overwritten with the current value on failure.
#define TINY_ATOMIC_CAS(ptr, expected, desired) \
    atomic_compare_exchange_strong_explicit( \
        (_Atomic __typeof__(*(ptr))*)(ptr), (expected), (desired), \
        memory_order_release, memory_order_relaxed)
```

#### Don't over-use for:
- Complex logic (use functions)
- Multiple statements (hard to debug)

---

## Phase 7: Testing Strategy

### Per-File Unit Tests

```c
// test_tiny_alloc_fast.c
void test_tiny_alloc_fast_pop_empty() {
    g_tls_cache[0] = NULL;
    assert(tiny_fast_pop_alloc(0) == NULL);
}

void test_tiny_alloc_fast_push_pop() {
    void* ptr = malloc(8);             // at least pointer-sized: holds the next link
    assert(tiny_fast_push(0, ptr));    // push must succeed while under capacity
    assert(tiny_fast_pop_alloc(0) == ptr);
    free(ptr);
}
```

### Integration Tests

```c
// test_tiny_alloc_free_cycle.c
void test_alloc_free_single_thread() {
    void* p1 = hak_tiny_alloc(8);
    void* p2 = hak_tiny_alloc(8);
    hak_tiny_free(p1);
    hak_tiny_free(p2);
    // Verify no memory leak
}

void test_alloc_free_cross_thread() {
    // Thread A allocs, Thread B frees
    // Verify remote queue works
}
```

---

## Expected Effects

### Performance
| Metric | Current | Target | Effect |
|--------|---------|--------|--------|
| Fast path instruction count | 20+ | 3-4 | -80% cycles |
| Branch misprediction | 50-100 cycles | 15-20 cycles | -70% |
| TLS cache hit rate | 70% | 85% | +15% throughput |

### Maintainability
| Metric | Current | Target | Effect |
|--------|---------|--------|--------|
| Max file size | 1470 lines | 300-400 lines | -70% complexity |
| Cyclic dependencies | many | 0 | fully explicit |
| Code review time | 3h | 30min | -90% |

### Development Speed
| Task | Current | After refactoring |
|------|---------|-------------------|
| Bug fix | 2-4h | 30min |
| Optimization | 4-6h | 1-2h |
| Feature add | 6-8h | 2-3h |

---

## Timeline

| Week | Task | Owner | Status |
|------|------|-------|--------|
| 1 | Box 1,5,6 (Fast path) | Claude | TODO |
| 2 | Box 2,3 (Remote/SS) | Claude | TODO |
| 3 | Box 4 (Publish/Adopt) | Claude | TODO |
| 4 | Box 8,9 (Lifecycle/Intel) | Claude | TODO |
| 5 | Testing + Integration | Claude | TODO |
| 6 | Benchmark + Tuning | Claude | TODO |

---

## Rollback Strategy

If performance regresses:
1. Keep all old .inc files (legacy compatibility)
2. hakmem_tiny.c can include either old or new
3. Gradual migration: one Box at a time
4. Benchmark after each Box

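
One way to let hakmem_tiny.c "include either old or new" is a compile-time switch, sketched below. The flag name `HAKMEM_TINY_USE_BOXES` and the helper `tiny_free_impl_name()` are hypothetical and exist only for this illustration; the real includes are shown as comments so that the fragment stays self-contained.

```c
/* Hypothetical build flag: 0 selects the legacy .inc path, 1 the new Boxes.
 * Override with e.g. -DHAKMEM_TINY_USE_BOXES=1 while migrating one Box. */
#ifndef HAKMEM_TINY_USE_BOXES
#define HAKMEM_TINY_USE_BOXES 0
#endif

#if HAKMEM_TINY_USE_BOXES
/* New path, e.g.:
 *   #include "tiny_alloc_fast.inc.h"
 *   #include "tiny_free_fast.inc.h"
 */
static inline const char* tiny_free_impl_name(void) { return "boxes"; }
#else
/* Legacy path, e.g.:
 *   #include "hakmem_tiny_free.inc"
 */
static inline const char* tiny_free_impl_name(void) { return "legacy"; }
#endif
```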

---

## Known Risks

1. **Include order sensitivity**: the order of the new Boxes is critical → test carefully
2. **Inlining threshold**: the compiler may not inline all `static inline` functions → profiling needed
3. **TLS cache contention**: simplifying the fast path may turn TLS synchronization into the bottleneck → monitor `g_tls_cache_count`
4. **RemoteQueue scalability**: the Box 2 remote queue is weak under high contention → consider making it lock-free

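
For risk 2, one possible mitigation is to request inlining explicitly on the hottest helpers. The sketch below assumes GCC or Clang, which both accept `__attribute__((always_inline))`; the macro `TINY_FORCE_INLINE` and the helper `tiny_list_pop()` are hypothetical names, and profiling remains necessary to confirm that forcing inlining actually helps.

```c
/* Hypothetical helper macro; falls back to plain static inline elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define TINY_FORCE_INLINE static inline __attribute__((always_inline))
#else
#define TINY_FORCE_INLINE static inline
#endif

/* Example hot-path helper: pop the head of an intrusive free list. */
TINY_FORCE_INLINE void* tiny_list_pop(void** head) {
    void* p = *head;
    if (p) *head = *(void**)p;  /* the next pointer is stored in the block itself */
    return p;
}
```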

---

## Success Criteria

- ✅ All tests pass (unit + integration + larson)
- ✅ Fast path = 3-4 instructions (verified by assembly analysis)
- ✅ +10-15% throughput on Tiny allocations
- ✅ All files <= 500 lines
- ✅ Zero cyclic dependencies
- ✅ Documentation complete