hakmem/docs/design/REFACTOR_PLAN.md
Moe Charm (CI) 67fb15f35f Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization)
## Changes

### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks

### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs

### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE:
  - Node pool exhaustion warning (line 252)
  - SP_META_CAPACITY_ERROR warning (line 421)
  - SP_FIX_GEOMETRY debug logging (line 745)
  - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
  - SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
  - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
  - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
  - SP_ACQUIRE_STAGE3 debug logging (line 1116)
  - SP_SLOT_RELEASE debug logging (line 1245)
  - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
  - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized

## Performance Validation

Before: 51M ops/s (with debug fprintf overhead)
After:  49.1M ops/s (consistent performance, fprintf removed from hot paths)

## Build & Test

```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00

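# HAKMEM Tiny Allocator Super-Refactoring Plan
## Executive Summary
### Current State
- **hakmem_tiny.c (1584 lines)**: a container that aggregates multiple .inc files
- **hakmem_tiny_free.inc (1470 lines)**: the largest file, with mixed responsibilities
  - Free path (lines 33-558)
  - SuperSlab Allocation (lines 559-998)
  - SuperSlab Free (lines 999-1369)
  - Query API (commented out, extracted to hakmem_tiny_query.c)
**Problems**:
1. A single mega-file (1470 lines)
2. Free and allocation logic are mixed together
3. Responsibilities are unclear
4. `static inline` functions are deeply nested
### Goal
**"Split everything into files of 500 lines or fewer, following Box Theory"**
- Each file has a single responsibility (SRP)
- Boundaries are made zero-cost with `static inline`
- Dependencies are made explicit
- The refactoring order is optimized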
---
## Phase 1: Current-State Analysis
### Top 10 Largest Files
| Rank | File | Lines | Responsibility |
|------|------|-------|----------------|
| 1 | hakmem_pool.c | 2592 | Mid/Large allocator (out of scope) |
| 2 | hakmem_tiny.c | 1584 | Tiny aggregator (under analysis) |
| 3 | **hakmem_tiny_free.inc** | **1470** | Free + SS Alloc + Query (must split) |
| 4 | hakmem.c | 1449 | Top-level allocator (out of scope) |
| 5 | hakmem_l25_pool.c | 1195 | L25 pool (out of scope) |
| 6 | hakmem_tiny_intel.inc | 863 | Intel optimizations (split candidate) |
| 7 | hakmem_tiny_superslab.c | 810 | SuperSlab (keep, already strengthened) |
| 8 | hakmem_tiny_stats.c | 697 | Statistics (keep) |
| 9 | tiny_remote.c | 645 | Remote queue (keep, split candidate) |
| 10 | hakmem_learner.c | 603 | Learning (out of scope) |
### Tiny-related files over 500 lines
```
hakmem_tiny_free.inc 1470 ← must split (top priority)
hakmem_tiny_intel.inc 863 ← split candidate
hakmem_tiny_init.inc 544 ← split candidate
tiny_remote.c 645 ← split candidate
```
### .inc files included by hakmem_tiny.c (44 files)
**Largest (over 500 lines):**
- hakmem_tiny_free.inc (1470) ← **top priority**
- hakmem_tiny_intel.inc (863)
- hakmem_tiny_init.inc (544)
**Medium (200-500 lines):**
- hakmem_tiny_refill.inc.h (410)
- hakmem_tiny_alloc_new.inc (275)
- hakmem_tiny_background.inc (261)
- hakmem_tiny_alloc.inc (249)
- hakmem_tiny_lifecycle.inc (244)
- hakmem_tiny_metadata.inc (226)
**Small (100-200 lines):**
- hakmem_tiny_ultra_simple.inc (176)
- hakmem_tiny_slab_mgmt.inc (163)
- hakmem_tiny_fastcache.inc.h (149)
- hakmem_tiny_hotmag.inc.h (147)
- hakmem_tiny_smallmag.inc.h (139)
- hakmem_tiny_hot_pop.inc.h (118)
- hakmem_tiny_bump.inc.h (107)
---
## Phase 2: Responsibility Classification by Box Theory
### Box 1: Atomic Ops (lowest layer, 50-100 lines)
**Responsibility**: wrappers for CAS/exchange/fetch operations and memory-order management
**New file**:
- `tiny_atomic.h` (80 lines)
**Contents**:
```c
// Atomics for remote queue, owner_tid, refcount:
//   tiny_atomic_cas()
//   tiny_atomic_exchange()
//   tiny_atomic_load() / tiny_atomic_store()
//   memory-order wrappers
```
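As a rough illustration of what Box 1 could contain, here is a minimal sketch built only on `<stdatomic.h>`. The function names come from the list above, but the signatures, the `uintptr_t` payload type, and the chosen memory orders are assumptions, not the project's actual API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical signatures: the plan names these wrappers but does not define them.

// Acquire load: pairs with a release store or CAS made by another thread.
static inline uintptr_t tiny_atomic_load(_Atomic uintptr_t* p) {
    return atomic_load_explicit(p, memory_order_acquire);
}

// Release store: publishes earlier writes to threads that acquire-load *p.
static inline void tiny_atomic_store(_Atomic uintptr_t* p, uintptr_t v) {
    atomic_store_explicit(p, v, memory_order_release);
}

// Swap in a new value and return the previous one.
static inline uintptr_t tiny_atomic_exchange(_Atomic uintptr_t* p, uintptr_t v) {
    return atomic_exchange_explicit(p, v, memory_order_acq_rel);
}

// Returns true on success; on failure *expected receives the observed value.
static inline bool tiny_atomic_cas(_Atomic uintptr_t* p, uintptr_t* expected,
                                   uintptr_t desired) {
    return atomic_compare_exchange_strong_explicit(
        p, expected, desired, memory_order_acq_rel, memory_order_acquire);
}
```

Keeping the memory order inside the wrapper means callers in higher Boxes never spell out a `memory_order_*` constant themselves.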
---
### Box 2: Remote Queue & Ownership (lower layer, 500-700 lines)
#### 2.1: Remote Queue Operations (`tiny_remote_queue.inc.h`, 250-350 lines)
**Responsibility**: MPSC stack operations, guard checks, node management
**Source**: extract the remote-queue portion of hakmem_tiny_free.inc
```c
// Planned functions:
//   tiny_remote_queue_contains_guard()
//   tiny_remote_queue_push()
//   tiny_remote_queue_pop()
//   tiny_remote_drain_owner()   // from hakmem_tiny_free.inc:170
```
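The MPSC stack idea can be sketched as follows: any thread pushes with a CAS loop, and only the owner detaches the whole chain with a single exchange, which sidesteps the ABA problem of popping one node at a time. The `RemoteNode`/`RemoteQueue` types and the `sketch_` function names are hypothetical and are not taken from hakmem_tiny_free.inc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

// Hypothetical layout: the freed block itself stores the link to the next node.
typedef struct RemoteNode { struct RemoteNode* next; } RemoteNode;
typedef struct { _Atomic(RemoteNode*) head; } RemoteQueue;

// Producer side (any thread): lock-free push onto the stack head.
static inline void sketch_remote_push(RemoteQueue* q, RemoteNode* n) {
    RemoteNode* old = atomic_load_explicit(&q->head, memory_order_relaxed);
    do {
        n->next = old;  // on CAS failure `old` is refreshed, so relink and retry
    } while (!atomic_compare_exchange_weak_explicit(
        &q->head, &old, n, memory_order_release, memory_order_relaxed));
}

// Consumer side (owner thread only): detach the whole chain in one exchange.
static inline RemoteNode* sketch_remote_drain_all(RemoteQueue* q) {
    return atomic_exchange_explicit(&q->head, (RemoteNode*)NULL,
                                    memory_order_acquire);
}
```

The drained chain comes back in LIFO order; the owner can then walk it and return each block to its local freelist.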
#### 2.2: Remote Drain Logic (`tiny_remote_drain.inc.h`, 200-250 lines)
**Responsibility**: drain logic, TLS cleanup
**Source**: the drain logic in hakmem_tiny_free.inc
```c
// Planned functions:
//   tiny_remote_drain_batch()
//   tiny_remote_process_mailbox()
```
#### 2.3: Ownership (Owner TID) (`tiny_owner.inc.h`, 100-150 lines)
**Responsibility**: acquire/release of owner_tid, slab ownership
**Existing**: slab_handle.h (295 lines, keep), to be strengthened
**New**: tiny_owner.inc.h
```c
// Planned functions:
//   tiny_owner_acquire()
//   tiny_owner_release()
//   tiny_owner_self()
```
**Depends on**: Box 1 (Atomic)
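One plausible shape for the ownership helpers is a CAS on `owner_tid`, sketched below. The `SlabOwner` struct, the `TINY_OWNER_NONE` sentinel, and the signatures are all assumptions made for illustration; the real slab_handle.h may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TINY_OWNER_NONE 0u  // assumed sentinel: the slab currently has no owner

typedef struct { _Atomic uint32_t owner_tid; } SlabOwner;  // hypothetical type

// Claim an unowned slab; succeeds only if owner_tid was TINY_OWNER_NONE.
static inline bool tiny_owner_acquire(SlabOwner* s, uint32_t self_tid) {
    uint32_t expected = TINY_OWNER_NONE;
    return atomic_compare_exchange_strong_explicit(
        &s->owner_tid, &expected, self_tid,
        memory_order_acq_rel, memory_order_relaxed);
}

// Cheap check used on the fast path: does the calling thread own this slab?
static inline bool tiny_owner_self(SlabOwner* s, uint32_t self_tid) {
    return atomic_load_explicit(&s->owner_tid, memory_order_relaxed) == self_tid;
}

// Give the slab up; fails (and changes nothing) if another thread owns it.
static inline bool tiny_owner_release(SlabOwner* s, uint32_t self_tid) {
    uint32_t expected = self_tid;
    return atomic_compare_exchange_strong_explicit(
        &s->owner_tid, &expected, TINY_OWNER_NONE,
        memory_order_release, memory_order_relaxed);
}
```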
---
### Box 3: SuperSlab Core (`hakmem_tiny_superslab.c` + `hakmem_tiny_superslab.h`, keep)
**Responsibility**: SuperSlab allocation, cache, registry
**Current state**: 810 lines, already well-structured
**Enhancement**: integrate with the following Boxes
- Box 4 Publish/Adopt
- Box 2 Remote ops
---
### Box 4: Publish/Adopt (upper layer, 400-500 lines)
#### 4.1: Publish (`tiny_publish.c/h`, keep, 34 lines)
**Responsibility**: publish freelist changes
**Existing**: tiny_publish.c (34 lines), already tiny
#### 4.2: Mailbox (`tiny_mailbox.c/h`, keep, 252 lines)
**Responsibility**: adopt requests from other threads
**Existing**: tiny_mailbox.c (252 lines), consider splitting
```c
// Current functions and approximate sizes:
//   tiny_mailbox_push()    // 50 lines
//   tiny_mailbox_drain()   // 150 lines
```
**Proposed split**:
- `tiny_mailbox_push.inc.h` (50 lines)
- `tiny_mailbox_drain.inc.h` (150 lines)
#### 4.3: Adopt Logic (`tiny_adopt.inc.h`, 200-300 lines)
**Responsibility**: logic for adopting a slab from a SuperSlab
**Source**: extract the adoption logic from hakmem_tiny_free.inc
```c
// Planned functions:
//   tiny_adopt_request()
//   tiny_adopt_select()
//   tiny_adopt_cooldown()
```
**Depends on**: Box 3 (SuperSlab), Box 4.2 (Mailbox), Box 2 (Ownership)
---
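### Box 5: Allocation Path (cross-cutting, 600-800 lines)
#### 5.1: Fast Path (`tiny_alloc_fast.inc.h`, 200-300 lines)
**Responsibility**: a 3-4 instruction fast path (direct pop from the TLS cache)
**Source**: hakmem_tiny_ultra_simple.inc (176 lines) + hakmem_tiny_fastcache.inc.h (149 lines)
```c
// Ultra-simple fast pop (SRP):
static inline void* tiny_fast_alloc(int class_idx) {
    void** head = &g_tls_cache[class_idx];
    void* ptr = *head;
    if (ptr) {
        *head = *(void**)ptr;  // Pop: the block's first word links to the next block
        // Keep the count in sync with tiny_fast_push(), otherwise the cap check drifts
        atomic_fetch_sub_explicit(&g_tls_cache_count[class_idx], 1, memory_order_relaxed);
    }
    return ptr;
}
// Fast push:
static inline int tiny_fast_push(int class_idx, void* ptr) {
    int cap = g_tls_cache_cap[class_idx];
    int cnt = atomic_load(&g_tls_cache_count[class_idx]);
    if (cnt < cap) {
        void** head = &g_tls_cache[class_idx];
        *(void**)ptr = *head;
        *head = ptr;
        // C11 has no atomic_increment(); use fetch-add instead
        atomic_fetch_add_explicit(&g_tls_cache_count[class_idx], 1, memory_order_relaxed);
        return 1;
    }
    return 0;  // Cache full: caller takes the slow path
}
```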
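For experimenting with the fast path in isolation, the following self-contained sketch defines its own thread-local cache. The `sketch_` names, the class count, and the capacity are assumptions; because each thread touches only its own copy of the arrays, this version uses plain integer updates rather than atomics.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_CLASSES 8  // assumed number of size classes
#define SKETCH_CACHE_CAP   4  // assumed per-class capacity

// Per-thread state: only the owning thread ever reads or writes these.
static _Thread_local void* sketch_cache[SKETCH_NUM_CLASSES];
static _Thread_local int   sketch_count[SKETCH_NUM_CLASSES];

// Pop the head block; the first word of each free block holds the next pointer.
static inline void* sketch_fast_pop(int class_idx) {
    assert(class_idx >= 0 && class_idx < SKETCH_NUM_CLASSES);
    void* ptr = sketch_cache[class_idx];
    if (ptr) {
        sketch_cache[class_idx] = *(void**)ptr;
        sketch_count[class_idx]--;
    }
    return ptr;
}

// Push a block; returns 0 when the cache is full so the caller can fall back.
static inline int sketch_fast_push(int class_idx, void* ptr) {
    assert(class_idx >= 0 && class_idx < SKETCH_NUM_CLASSES);
    if (sketch_count[class_idx] >= SKETCH_CACHE_CAP) return 0;
    *(void**)ptr = sketch_cache[class_idx];
    sketch_cache[class_idx] = ptr;
    sketch_count[class_idx]++;
    return 1;
}
```

Blocks come back in LIFO order, which tends to hand out the most recently freed (and therefore most likely cache-warm) block first.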
#### 5.2: Refill Logic (`hakmem_tiny_refill.inc.h`, 410 lines, existing)
**Responsibility**: refilling the cache
**Current state**: hakmem_tiny_refill.inc.h (410 lines), already well-sized
#### 5.3: Slow Path (`tiny_alloc_slow.inc.h`, 250-350 lines)
**Responsibility**: SuperSlab → New Slab → Refill
**Source**: the superslab_refill and allocation logic in hakmem_tiny_free.inc, plus hakmem_tiny_alloc.inc (249 lines)
```c
// Planned functions:
//   tiny_alloc_slow()
//   tiny_refill_from_superslab()
//   tiny_new_slab_alloc()
```
**Depends on**: Box 3 (SuperSlab), Box 5.2 (Refill)
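To make the refill step concrete, here is one possible sketch: carve a batch of blocks out of a slab with a bump offset and chain them into an intrusive list that the fast path could then consume. `SketchSlab` and `sketch_refill_from_slab()` are hypothetical; the real `superslab_refill` logic is not shown in this plan and is likely more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical slab descriptor: a contiguous region consumed by bump allocation.
typedef struct {
    uint8_t* base;   // start of slab memory
    size_t   bytes;  // total slab size
    size_t   used;   // bump offset of the next uncarved block
} SketchSlab;

// Carve up to `want` blocks and chain them through their first word.
// Returns the chain head (NULL if the slab is exhausted); *got receives the count.
static void* sketch_refill_from_slab(SketchSlab* slab, size_t block_size,
                                     int want, int* got) {
    assert(block_size >= sizeof(void*) && block_size % sizeof(void*) == 0);
    void* head = NULL;
    int n = 0;
    while (n < want && slab->bytes - slab->used >= block_size) {
        void* blk = slab->base + slab->used;
        slab->used += block_size;
        *(void**)blk = head;  // push onto the local chain
        head = blk;
        n++;
    }
    *got = n;
    return head;
}
```

Refilling in batches amortizes the slow-path cost over several subsequent fast-path pops.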
---
### Box 6: Free Path (cross-cutting, 600-800 lines)
#### 6.1: Fast Free (`tiny_free_fast.inc.h`, 200-250 lines)
**Responsibility**: same-thread free, TLS cache push
**Source**: the fast-path free logic in hakmem_tiny_free.inc
```c
// Fast same-thread free:
static inline int tiny_free_fast(void* ptr, int class_idx) {
    // Owner check + cache push
    uint32_t self_tid = tiny_self_u32();
    TinySlab* slab = hak_tiny_owner_slab(ptr);
    if (!slab || slab->owner_tid != self_tid)
        return 0;  // Not owned by this thread: take the slow path
    return tiny_fast_push(class_idx, ptr);
}
```
#### 6.2: Cross-Thread Free (`tiny_free_remote.inc.h`, 250-300 lines)
**Responsibility**: remote queue push, publish
**Source**: the cross-thread logic and remote push in hakmem_tiny_free.inc
```c
// Planned functions:
//   tiny_free_remote()
//   tiny_free_remote_queue_push()
```
**Depends on**: Box 2 (Remote Queue), Box 4.1 (Publish)
#### 6.3: Guard/Safety (`tiny_free_guard.inc.h`, 100-150 lines)
**Responsibility**: guard sentinel check, bounds validation
**Source**: the guard logic in hakmem_tiny_free.inc
```c
// Planned functions:
//   tiny_free_guard_check()
//   tiny_free_validate_ptr()
```
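A bounds validation of the kind described here might look like the sketch below, which checks that a pointer lies inside a slab and sits exactly on a block boundary. The function name and parameters are hypothetical, and the guard sentinel check is not covered because the plan does not specify its format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Returns true when ptr lies inside the slab, on a block boundary,
// and the whole block fits before the end of the slab.
static inline bool sketch_validate_ptr(const void* ptr, const void* slab_base,
                                       size_t slab_bytes, size_t block_size) {
    if (block_size == 0) return false;
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)slab_base;
    if (p < base) return false;                       // before the slab
    size_t off = (size_t)(p - base);
    if (off >= slab_bytes) return false;              // past the end
    if (slab_bytes - off < block_size) return false;  // block would overrun
    return off % block_size == 0;                     // must be block-aligned
}
```

Running such a check before pushing onto a freelist catches interior pointers and foreign pointers that would otherwise corrupt the intrusive links.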
---
### Box 7: Statistics & Query (analysis layer, 700-900 lines)
#### Existing (keep):
- hakmem_tiny_stats.c (697 lines) - Stats aggregate
- hakmem_tiny_stats_api.h (103 lines) - Stats API
- hakmem_tiny_stats.h (278 lines) - Stats internal
- hakmem_tiny_query.c (72 lines) - Query API
#### Split considerations:
hakmem_tiny_stats.c (697 lines) is dedicated to the statistics engine, so it can stay as it is
---
### Box 8: Lifecycle (initialization and cleanup, 544 lines)
#### Existing:
- hakmem_tiny_init.inc (544 lines) - Initialization
- hakmem_tiny_lifecycle.inc (244 lines) - Lifecycle
- hakmem_tiny_slab_mgmt.inc (163 lines) - Slab management
**Split under consideration**:
- `tiny_init_globals.inc.h` (150 lines) - Global vars
- `tiny_init_config.inc.h` (150 lines) - Config from env
- `tiny_init_pools.inc.h` (150 lines) - Pool allocation
- `tiny_lifecycle_trim.inc.h` (120 lines) - Trim logic
- `tiny_lifecycle_shutdown.inc.h` (120 lines) - Shutdown
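For the "config from env" piece, a defensive reader along these lines is one option. The helper names are hypothetical, and the plan does not list the actual environment variables, so the caller supplies the name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

// Parse a decimal integer; fall back to `def` when the text is missing,
// malformed, or outside [lo, hi].
static int sketch_parse_int(const char* s, int def, int lo, int hi) {
    if (!s || !*s) return def;
    char* end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return def;  // overflow or junk
    if (v < lo || v > hi) return def;                        // out of range
    return (int)v;
}

// Read one tunable from the environment (variable name chosen by the caller).
static int sketch_env_int(const char* name, int def, int lo, int hi) {
    return sketch_parse_int(getenv(name), def, lo, hi);
}
```

Splitting parsing from `getenv()` keeps the validation logic unit-testable without touching the process environment.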
---
### Box 9: Intel Specific (863 lines)
**Proposed split**:
- `tiny_intel_fast.inc.h` (300 lines) - Prefetch + PAUSE
- `tiny_intel_cache.inc.h` (200 lines) - Cache tuning
- `tiny_intel_cfl.inc.h` (150 lines) - CFL-specific
- `tiny_intel_skl.inc.h` (150 lines) - SKL-specific (to be unified)
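The "Prefetch + PAUSE" helpers could be wrapped roughly as follows. The `SKETCH_` macros and both functions are hypothetical illustrations; `_mm_pause()` and `__builtin_prefetch()` are real compiler intrinsics, guarded here so the file still compiles on other targets.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_CPU_RELAX() _mm_pause()   // PAUSE: tell the core we are spinning
#else
#define SKETCH_CPU_RELAX() ((void)0)     // no-op on other architectures
#endif

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH_R(p) __builtin_prefetch((p), 0, 3)  // read, keep in cache
#else
#define SKETCH_PREFETCH_R(p) ((void)(p))
#endif

// Spin until *flag becomes non-zero or max_spins is reached.
// Returns the number of spins performed, or -1 on timeout.
static inline int sketch_spin_wait(_Atomic int* flag, int max_spins) {
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire)) return i;
        SKETCH_CPU_RELAX();
    }
    return -1;
}

// Pop from an intrusive freelist and prefetch the new head,
// so the following pop is more likely to find it in cache.
static inline void* sketch_pop_prefetch(void** head) {
    void* ptr = *head;
    if (ptr) {
        void* next = *(void**)ptr;
        *head = next;
        if (next) SKETCH_PREFETCH_R(next);
    }
    return ptr;
}
```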
---
## Phase 3: Split Execution Plan
### Priority 1: Critical Path (1 week)
**Goal**: reduce the fast path to the 3-4 instruction level
1. **Box 1: tiny_atomic.h** (80 lines) ✨
   - `atomic_load_explicit()` wrapper
   - `atomic_store_explicit()` wrapper
   - `atomic_compare_exchange_strong_explicit()` wrapper (CAS)
   - Depends on: `<stdatomic.h>` only
2. **Box 5.1: tiny_alloc_fast.inc.h** (250 lines) ✨
   - Ultra-simple TLS cache pop
   - Depends on: Box 1
3. **Box 6.1: tiny_free_fast.inc.h** (200 lines) ✨
   - Same-thread fast free
   - Depends on: Box 1, Box 5.1
4. **Extract from hakmem_tiny_free.inc**:
   - Fast path logic (500 lines) → to the files above
   - SuperSlab path (400 lines) → to Box 5.3, 6.2
   - Remote logic (250 lines) → to Box 2
   - Cleanup → hakmem_tiny_free.inc shrinks to 300 lines
**Effect**: the fast path is optimized to the level of the system tcache
---
### Priority 2: Remote & Ownership (1 week)
5. **Box 2.1: tiny_remote_queue.inc.h** (300 lines)
   - Remote queue ops
   - Depends on: Box 1
6. **Box 2.3: tiny_owner.inc.h** (120 lines)
   - Owner TID management
   - Depends on: Box 1, slab_handle.h (existing)
7. **Clean up tiny_remote.c** (645 lines)
   - `tiny_remote_queue_ops()` → move to tiny_remote_queue.inc.h
   - `tiny_remote_side_*()` → keep
   - Resize: shrink from 645 to 350 lines
**Effect**: remote ops become modular
---
### Priority 3: SuperSlab Integration (1-2 weeks)
8. **Strengthen Box 3**: hakmem_tiny_superslab.c (810 lines, keep)
   - Integrate Publish/Adopt
   - Depends on: Box 2, Box 4
9. **Box 4.1-4.3: Publish/Adopt Path** (400-500 lines)
   - `tiny_publish.c` (34 lines, existing)
   - `tiny_mailbox.c` → split
   - `tiny_adopt.inc.h` (new)
**Effect**: SuperSlab adoption is fully integrated
---
### Priority 4: Allocation/Free Slow Path (1 week)
10. **Box 5.2-5.3: Refill & Slow Allocation** (650 lines)
    - hakmem_tiny_refill.inc.h (410 lines, existing)
    - `tiny_alloc_slow.inc.h` (new, 300 lines)
11. **Box 6.2-6.3: Cross-thread Free** (400 lines)
    - `tiny_free_remote.inc.h` (new)
    - `tiny_free_guard.inc.h` (new)
**Effect**: the slow path is cleanly separated
---
### Priority 5: Lifecycle & Config (1-2 weeks)
12. **Box 8: Split Lifecycle** (400-500 lines)
    - hakmem_tiny_init.inc (544 lines) → 150 + 150 + 150
    - hakmem_tiny_lifecycle.inc (244 lines) → 120 + 120
    - Remove duplication
13. **Box 9: Clean up Intel-specific code** (863 lines)
    - `tiny_intel_fast.inc.h` (300 lines)
    - `tiny_intel_cache.inc.h` (200 lines)
    - `tiny_intel_common.inc.h` (150 lines)
    - Deduplicate × 3 architectures
**Effect**: configuration management is unified
---
## Phase 4: Proposed New File Layout
### Final Layout
```
core/
├─ Box 1: Atomic Ops
│  └─ tiny_atomic.h (80 lines)
├─ Box 2: Remote & Ownership
│  ├─ tiny_remote.h (80 lines, existing, slimmed down)
│  ├─ tiny_remote_queue.inc.h (300 lines, new)
│  ├─ tiny_remote_drain.inc.h (150 lines, new)
│  ├─ tiny_owner.inc.h (120 lines, new)
│  └─ slab_handle.h (295 lines, existing, keep)
├─ Box 3: SuperSlab Core
│  ├─ hakmem_tiny_superslab.h (500 lines, existing)
│  └─ hakmem_tiny_superslab.c (810 lines, existing)
├─ Box 4: Publish/Adopt
│  ├─ tiny_publish.h (6 lines, existing)
│  ├─ tiny_publish.c (34 lines, existing)
│  ├─ tiny_mailbox.h (11 lines, existing)
│  ├─ tiny_mailbox.c (252 lines, existing) → can be split
│  ├─ tiny_mailbox_push.inc.h (80 lines, new)
│  ├─ tiny_mailbox_drain.inc.h (150 lines, new)
│  └─ tiny_adopt.inc.h (300 lines, new)
├─ Box 5: Allocation
│  ├─ tiny_alloc_fast.inc.h (250 lines, new)
│  ├─ hakmem_tiny_refill.inc.h (410 lines, existing)
│  └─ tiny_alloc_slow.inc.h (300 lines, new)
├─ Box 6: Free
│  ├─ tiny_free_fast.inc.h (200 lines, new)
│  ├─ tiny_free_remote.inc.h (300 lines, new)
│  ├─ tiny_free_guard.inc.h (120 lines, new)
│  └─ hakmem_tiny_free.inc (1470 lines, existing) → reduced to 300 lines
├─ Box 7: Statistics
│  ├─ hakmem_tiny_stats.c (697 lines, existing)
│  ├─ hakmem_tiny_stats.h (278 lines, existing)
│  ├─ hakmem_tiny_stats_api.h (103 lines, existing)
│  └─ hakmem_tiny_query.c (72 lines, existing)
├─ Box 8: Lifecycle
│  ├─ tiny_init_globals.inc.h (150 lines, new)
│  ├─ tiny_init_config.inc.h (150 lines, new)
│  ├─ tiny_init_pools.inc.h (150 lines, new)
│  ├─ tiny_lifecycle_trim.inc.h (120 lines, new)
│  └─ tiny_lifecycle_shutdown.inc.h (120 lines, new)
├─ Box 9: Intel-specific
│  ├─ tiny_intel_common.inc.h (150 lines, new)
│  ├─ tiny_intel_fast.inc.h (300 lines, new)
│  └─ tiny_intel_cache.inc.h (200 lines, new)
└─ Integration
   └─ hakmem_tiny.c (1584 lines, existing, include aggregator)
      └─ New layout:
         1. includes Box 1-9
         2. Minimal glue code only
```
---
## Phase 5: Optimizing the Include Order
### Safe include dependencies
```mermaid
graph TD
A[Box 1: tiny_atomic.h] --> B[Box 2: tiny_remote.h]
A --> C[Box 5/6: Alloc/Free]
B --> D[Box 2.1: tiny_remote_queue.inc.h]
D --> E[tiny_remote.c]
A --> F[Box 4: Publish/Adopt]
E --> F
C --> G[Box 3: SuperSlab]
F --> G
G --> H[Box 5.3/6.2: Slow Path]
I[Box 8: Lifecycle] --> H
J[Box 9: Intel] --> C
```
### New layout of hakmem_tiny.c
```c
#include "hakmem_tiny.h"
#include "hakmem_tiny_config.h"
// ============================================================
// LAYER 0: Atomic + Ownership (lowest)
// ============================================================
#include "tiny_atomic.h"
#include "tiny_owner.inc.h"
#include "slab_handle.h"
// ============================================================
// LAYER 1: Remote Queue + SuperSlab Core
// ============================================================
#include "hakmem_tiny_superslab.h"
#include "tiny_remote_queue.inc.h"
#include "tiny_remote_drain.inc.h"
#include "tiny_remote.inc" // tiny_remote_side_*
#include "tiny_remote.c" // Link-time
// ============================================================
// LAYER 2: Publish/Adopt (publication mechanism)
// ============================================================
#include "tiny_publish.h"
#include "tiny_publish.c"
#include "tiny_mailbox.h"
#include "tiny_mailbox_push.inc.h"
#include "tiny_mailbox_drain.inc.h"
#include "tiny_mailbox.c"
#include "tiny_adopt.inc.h"
// ============================================================
// LAYER 3: Fast Path (allocation + free)
// ============================================================
#include "tiny_alloc_fast.inc.h"
#include "tiny_free_fast.inc.h"
// ============================================================
// LAYER 4: Slow Path (refill + cross-thread free)
// ============================================================
#include "hakmem_tiny_refill.inc.h"
#include "tiny_alloc_slow.inc.h"
#include "tiny_free_remote.inc.h"
#include "tiny_free_guard.inc.h"
// ============================================================
// LAYER 5: Statistics + Query + Metadata
// ============================================================
#include "hakmem_tiny_stats.h"
#include "hakmem_tiny_query.c"
#include "hakmem_tiny_metadata.inc"
// ============================================================
// LAYER 6: Lifecycle + Init
// ============================================================
#include "tiny_init_globals.inc.h"
#include "tiny_init_config.inc.h"
#include "tiny_init_pools.inc.h"
#include "tiny_lifecycle_trim.inc.h"
#include "tiny_lifecycle_shutdown.inc.h"
// ============================================================
// LAYER 7: Intel-specific optimizations
// ============================================================
#include "tiny_intel_common.inc.h"
#include "tiny_intel_fast.inc.h"
#include "tiny_intel_cache.inc.h"
// ============================================================
// LAYER 8: Legacy/Experimental (kept for compat)
// ============================================================
#include "hakmem_tiny_ultra_simple.inc"
#include "hakmem_tiny_alloc.inc"
#include "hakmem_tiny_slow.inc"
// ============================================================
// LAYER 9: Old free.inc (minimal, mostly extracted)
// ============================================================
#include "hakmem_tiny_free.inc" // Now just cleanup
#include "hakmem_tiny_background.inc"
#include "hakmem_tiny_magazine.h"
#include "tiny_refill.h"
#include "tiny_mmap_gate.h"
```
---
## Phase 6: Implementation Guide
### Key Principles
1. **SRP (Single Responsibility Principle)**
   - Each file: one responsibility, 500 lines or fewer
   - No sideways dependencies
2. **Zero-Cost Abstraction**
   - All boundaries via `static inline`
   - No function pointer indirection
   - Compiler inlines aggressively
3. **Cyclic Dependency Prevention**
   - Layer 0 → Layer 1 → ... → Layer 9
   - Avoid backward dependencies
4. **Backward Compatibility**
   - Keep legacy .inc files (for compatibility)
   - Migrate to the new files incrementally
### Where to Use Static Inline
#### ✅ Use `static inline`:
```c
// tiny_atomic.h
// Take an _Atomic pointer directly; casting a plain int* to _Atomic int* is not portable
static inline void tiny_atomic_store(_Atomic int* p, int v) {
    atomic_store_explicit(p, v, memory_order_release);
}
// tiny_alloc_fast.inc.h
static inline void* tiny_fast_pop_alloc(int class_idx) {
    void** head = &g_tls_cache[class_idx];
    void* ptr = *head;
    if (ptr) *head = *(void**)ptr;
    return ptr;
}
// tiny_alloc_slow.inc.h
static inline void* tiny_refill_from_superslab(int class_idx) {
    SuperSlab* ss = g_tls_current_ss[class_idx];
    if (ss) return superslab_alloc_from_slab(ss, ...);  // remaining arguments elided in this plan
    return NULL;
}
```
#### ❌ Don't use `static inline` for:
- Large functions (>20 lines)
- Slow path logic
- Setup/teardown code
#### ✅ Use regular functions:
```c
// tiny_remote.c
void tiny_remote_drain_batch(int class_idx) {
    // 50+ lines: slow path → regular function
}
// hakmem_tiny_superslab.c
SuperSlab* superslab_refill(int class_idx) {
    // Complex allocation → regular function
}
```
### Macro Usage
#### Use Macros for:
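```c
// tiny_atomic.h
// Note: typeof is a GNU extension (standardized in C23).
// Macro arguments are parenthesized so expressions such as `&x` or `p + 1` expand safely.
#define TINY_ATOMIC_LOAD(ptr, order) \
    atomic_load_explicit((_Atomic typeof(*(ptr))*)(ptr), (order))
#define TINY_ATOMIC_CAS(ptr, expected, desired) \
    atomic_compare_exchange_strong_explicit( \
        (_Atomic typeof(*(ptr))*)(ptr), (expected), (desired), \
        memory_order_release, memory_order_relaxed)
```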
#### Don't over-use for:
- Complex logic (use functions)
- Multiple statements (hard to debug)
---
## Phase 7: Testing Strategy
### Per-File Unit Tests
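```c
// test_tiny_alloc_fast.c
#include <assert.h>
#include <stdlib.h>

void test_tiny_alloc_fast_pop_empty(void) {
    g_tls_cache[0] = NULL;
    assert(tiny_fast_pop_alloc(0) == NULL);
}
void test_tiny_alloc_fast_push_pop(void) {
    void* ptr = malloc(8);  // 8 bytes holds the intrusive next pointer on 64-bit targets
    assert(ptr != NULL);
    assert(tiny_fast_push(0, ptr) == 1);  // push helper from tiny_alloc_fast.inc.h
    assert(tiny_fast_pop_alloc(0) == ptr);
    free(ptr);
}
```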
### Integration Tests
```c
// test_tiny_alloc_free_cycle.c
void test_alloc_free_single_thread(void) {
    void* p1 = hak_tiny_alloc(8);
    void* p2 = hak_tiny_alloc(8);
    hak_tiny_free(p1);
    hak_tiny_free(p2);
    // Verify no memory leak
}
void test_alloc_free_cross_thread(void) {
    // Thread A allocs, Thread B frees
    // Verify remote queue works
}
```
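The cross-thread case could be fleshed out along the following lines with POSIX threads. This is only a sketch: `sketch_alloc()` and `sketch_free()` are stand-ins that wrap `malloc`/`free` so the example runs on its own, and they would be replaced by `hak_tiny_alloc()`/`hak_tiny_free()` in the real test.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

// Stand-ins for hak_tiny_alloc()/hak_tiny_free(); swap in the real calls.
static void* sketch_alloc(size_t n) { return malloc(n); }
static void  sketch_free(void* p)   { free(p); }

enum { SKETCH_N = 1000 };  // assumed batch size

// Thread B: frees every block that thread A allocated.
static void* sketch_free_worker(void* arg) {
    void** ptrs = arg;
    for (int i = 0; i < SKETCH_N; i++) sketch_free(ptrs[i]);
    return NULL;
}

// Thread A (the caller) allocates, thread B frees. Returns 0 on success.
static int sketch_cross_thread_cycle(void) {
    static void* ptrs[SKETCH_N];
    for (int i = 0; i < SKETCH_N; i++) {
        ptrs[i] = sketch_alloc(8);
        assert(ptrs[i] != NULL);
    }
    pthread_t b;
    if (pthread_create(&b, NULL, sketch_free_worker, ptrs) != 0) return -1;
    if (pthread_join(b, NULL) != 0) return -1;  // join orders B's frees before return
    return 0;
}
```

With the real allocator, a follow-up allocation loop on thread A after the join would additionally exercise the remote-queue drain path.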
---
## Expected Effects
### Performance
| Metric | Current | Target | Effect |
|--------|---------|--------|--------|
| Fast path instruction count | 20+ | 3-4 | -80% cycles |
| Branch misprediction | 50-100 cycles | 15-20 cycles | -70% |
| TLS cache hit rate | 70% | 85% | +15% throughput |
### Maintainability
| Metric | Current | Target | Effect |
|--------|---------|--------|--------|
| Max file size | 1470 lines | 300-400 lines | -70% complexity |
| Cyclic dependencies | Many | 0 | 100% clarified |
| Code review time | 3h | 30min | -90% |
### Development Speed
| Task | Current | After refactoring |
|------|---------|-------------------|
| Bug fix | 2-4h | 30min |
| Optimization | 4-6h | 1-2h |
| Feature add | 6-8h | 2-3h |
---
## Timeline
| Week | Task | Owner | Status |
|------|------|-------|--------|
| 1 | Box 1,5,6 (Fast path) | Claude | TODO |
| 2 | Box 2,3 (Remote/SS) | Claude | TODO |
| 3 | Box 4 (Publish/Adopt) | Claude | TODO |
| 4 | Box 8,9 (Lifecycle/Intel) | Claude | TODO |
| 5 | Testing + Integration | Claude | TODO |
| 6 | Benchmark + Tuning | Claude | TODO |
---
## Rollback Strategy
If performance regresses:
1. Keep all old .inc files (legacy compatibility)
2. hakmem_tiny.c can include either old or new
3. Gradual migration: one Box at a time
4. Benchmark after each Box
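One way to let hakmem_tiny.c "include either old or new" is a single compile-time switch, sketched below. `SKETCH_TINY_USE_BOXES` and `sketch_build_variant()` are hypothetical names; in the real tree each branch would hold the corresponding `#include` lines rather than an inline function.

```c
#include <assert.h>

// Hypothetical build flag: 1 = new Box files, 0 = legacy .inc files.
// Override on the command line, e.g. -DSKETCH_TINY_USE_BOXES=0, to roll back.
#ifndef SKETCH_TINY_USE_BOXES
#define SKETCH_TINY_USE_BOXES 1
#endif

#if SKETCH_TINY_USE_BOXES
// Real tree: #include "tiny_alloc_fast.inc.h", "tiny_free_fast.inc.h", ...
static inline const char* sketch_build_variant(void) { return "boxes"; }
#else
// Real tree: #include "hakmem_tiny_free.inc" in its legacy, unsplit form.
static inline const char* sketch_build_variant(void) { return "legacy"; }
#endif
```

Because the choice is made at compile time, both variants can be benchmarked from the same source tree without any runtime overhead.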
---
## Known Risks
1. **Include order sensitivity**: the order of the new Boxes is critical → Test carefully
2. **Inlining threshold**: Compiler may not inline all static inline functions → Profiling needed
3. **TLS cache contention**: simplifying the fast path may turn TLS synchronization into a bottleneck → Monitor g_tls_cache_count
4. **RemoteQueue scalability**: the Box 2 remote queue is weak under high contention → Consider making it lock-free
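For the inlining-threshold risk, a common mitigation is a force-inline macro applied to the few functions that must stay on the fast path. The `TINY_FORCE_INLINE` macro and the size-class mapping below are hypothetical illustrations (the 8-byte class granularity is an assumption); the attributes themselves are standard GCC/Clang and MSVC extensions.

```c
#include <assert.h>

#if defined(__GNUC__) || defined(__clang__)
#define TINY_FORCE_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define TINY_FORCE_INLINE static __forceinline
#else
#define TINY_FORCE_INLINE static inline  // fallback: inlining is only a hint
#endif

// Hypothetical mapping: 8-byte granularity, so sizes 1..8 -> class 0, 9..16 -> class 1.
TINY_FORCE_INLINE int sketch_size_to_class(unsigned size) {
    return size == 0 ? 0 : (int)((size - 1) >> 3);
}
```

Inspecting the generated assembly (for example with `objdump -d`) remains the way to confirm that the call was actually folded into the caller.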
---
## Success Criteria
✅ All tests pass (unit + integration + larson)
✅ Fast path = 3-4 instructions (assembly analysis)
✅ +10-15% throughput on Tiny allocations
✅ All files <= 500 lines
✅ Zero cyclic dependencies
✅ Documentation complete