Box API Phase 1-3: Capacity Manager, Carve-Push, Prewarm 実装

Priority 1-3のBox Modulesを実装し、安全なpre-warming APIを提供。
既存の複雑なprewarmコードを1行のBox API呼び出しに置き換え。

## 新規Box Modules

1. **Box Capacity Manager** (capacity_box.h/c)
   - TLS SLL容量の一元管理
   - adaptive_sizing初期化保証
   - Double-free バグ防止

2. **Box Carve-And-Push** (carve_push_box.h/c)
   - アトミックなblock carve + TLS SLL push
   - All-or-nothing semantics
   - Rollback保証(partial failure防止)

3. **Box Prewarm** (prewarm_box.h/c)
   - 安全なTLS cache pre-warming
   - 初期化依存性を隠蔽
   - シンプルなAPI (1関数呼び出し)

## コード簡略化

hakmem_tiny_init.inc: 20行 → 1行
```c
// BEFORE: 複雑なP0分岐とエラー処理
adaptive_sizing_init();
if (prewarm > 0) {
    #if HAKMEM_TINY_P0_BATCH_REFILL
        int taken = sll_refill_batch_from_ss(5, prewarm);
    #else
        int taken = sll_refill_small_from_ss(5, prewarm);
    #endif
}

// AFTER: Box API 1行
int taken = box_prewarm_tls(5, prewarm);
```

## シンボルExport修正

hakmem_tiny.c: 5つのシンボルをstatic → non-static
- g_tls_slabs[] (TLS slab配列)
- g_sll_multiplier (SLL容量乗数)
- g_sll_cap_override[] (容量オーバーライド)
- superslab_refill() (SuperSlab再充填)
- ss_active_add() (アクティブカウンタ)

## ビルドシステム

Makefile: TINY_BENCH_OBJS_BASEに3つのBox modules追加
- core/box/capacity_box.o
- core/box/carve_push_box.o
- core/box/prewarm_box.o

## 動作確認

 Debug build成功
 Box Prewarm API動作確認
   [PREWARM] class=5 requested=128 taken=32

## 次のステップ

- Box Refill Manager (Priority 4)
- Box SuperSlab Allocator (Priority 5)
- Release build修正(tiny_debug_ring_record)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-11-13 01:45:30 +09:00
parent 0543642dea
commit c7616fd161
14 changed files with 876 additions and 152 deletions

View File

@ -51,32 +51,29 @@ extern void hak_tiny_free(void* ptr); // Fallback for non-header allocations
static inline int hak_tiny_free_fast_v2(void* ptr) {
if (__builtin_expect(!ptr, 0)) return 0;
// CRITICAL: C7 (1KB) is headerless and CANNOT use fast path
// Reading ptr-1 for C7 causes SIGBUS (accesses previous allocation or unmapped page)
// Solution: Check for 1KB alignment and delegate to slow path
// Note: This heuristic has ~0.1% false positive rate (other allocations at 1KB boundaries)
// but is necessary for C7 safety. Slow path handles all cases correctly.
if (__builtin_expect(((uintptr_t)ptr & 0x3FF) == 0, 0)) {
// Pointer is 1KB-aligned → likely C7 or page boundary allocation
// Use slow path for safety (slow path has proper C7 handling)
return 0;
}
// Phase E3-1: Remove registry lookup (50-100 cycles overhead)
// Reason: Phase E1 added headers to C7, making this check redundant
// Header magic validation (2-3 cycles) is now sufficient for all classes
// Expected: 9M → 30-50M ops/s recovery (+226-443%)
// CRITICAL: Check if header is accessible
// CRITICAL: Check if header is accessible before reading
void* header_addr = (char*)ptr - 1;
#if defined(HAKMEM_POOL_TLS_PHASE1) && HAKMEM_TINY_SAFE_FREE
// Strict mode: validate header address with mincore() on every free
#if !HAKMEM_BUILD_RELEASE
// Debug: Always validate header accessibility (strict safety check)
// Cost: ~634 cycles per free (mincore syscall)
// Benefit: Catch all SEGV cases (100% safe)
extern int hak_is_memory_readable(void* addr);
if (!hak_is_memory_readable(header_addr)) {
return 0; // Header not accessible - not a Tiny allocation
}
#else
// Pool TLS disabled: Optimize for common case (99.9% hit rate)
// Release: Optimize for common case (99.9% hit rate)
// Strategy: Only check page boundaries (ptr & 0xFFF == 0)
// - Page boundary check: 1-2 cycles
// - mincore() syscall: ~634 cycles (only if page-aligned)
// - Result: 99.9% of frees avoid mincore() → 317-634x faster!
// - Safety: Page-aligned allocations are rare, most Tiny blocks are interior
if (__builtin_expect(((uintptr_t)ptr & 0xFFF) == 0, 0)) {
extern int hak_is_memory_readable(void* addr);
if (!hak_is_memory_readable(header_addr)) {
@ -116,30 +113,23 @@ static inline int hak_tiny_free_fast_v2(void* ptr) {
}
atomic_fetch_add(&g_integrity_check_class_bounds, 1);
// 2. Check TLS freelist capacity (optional, for bounded cache)
// Note: Can be disabled in release for maximum speed
#if !HAKMEM_BUILD_RELEASE
// Debug-only: simple capacity guard to avoid unbounded TLS growth
// 2. Check TLS freelist capacity (defense in depth - ALWAYS ENABLED)
// CRITICAL: Enable in both debug and release to prevent corruption accumulation
// Reason: If C7 slips through magic validation, capacity limit prevents unbounded growth
// Cost: 1 comparison (~1 cycle, predict-not-taken)
// Benefit: Fail-safe against TLS SLL pollution from false positives
uint32_t cap = (uint32_t)TINY_TLS_MAG_CAP;
if (__builtin_expect(g_tls_sll_count[class_idx] >= cap, 0)) {
return 0; // Route to slow path for spill
return 0; // Route to slow path for spill (Front Gate will catch corruption)
}
#endif
// 3. Push base to TLS freelist (4 instructions, 5-7 cycles)
// Must push base (block start) not user pointer!
// Classes 0-6: Allocation returns base+1 (after header) Free must compute base = ptr-1
// Class 7 (C7): Headerless, allocation returns base → Free uses ptr as-is
void* base;
if (__builtin_expect(class_idx == 7, 0)) {
// C7 is headerless - ptr IS the base (no adjustment needed)
base = ptr;
} else {
// Normal classes have 1-byte header - base is ptr-1
base = (char*)ptr - 1;
}
// Phase E1: ALL classes (C0-C7) have 1-byte header → base = ptr-1
void* base = (char*)ptr - 1;
// Use Box TLS-SLL API (C7-safe)
// REVERT E3-2: Use Box TLS-SLL for all builds (testing hypothesis)
// Hypothesis: Box TLS-SLL acts as verification layer, masking underlying bugs
if (!tls_sll_push(class_idx, base, UINT32_MAX)) {
// C7 rejected or capacity exceeded - route to slow path
return 0;