hakmem/docs/analysis/TLS_LAYOUT_V11B1_PLAN.md
Moe Charm (CI) 1a8652a91a Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete)
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 16:26:42 +09:00


# TLS Layout Plan: Unified ULTRA TLS (Phase v11b-2 Target)
## Goal
Fit the C4-C7 ULTRA hot path into **1 cache line (64B)**.
## Design: TinyUltraTlsCtx
```c
// ============================================================================
// LINE 1: Hot fields (64B) - alloc/free hot path
// ============================================================================
typedef struct TinyUltraTlsCtx {
    // Counts (8B total, padded for alignment)
    uint16_t  c4_count;      // 2B
    uint16_t  c5_count;      // 2B
    uint16_t  c6_count;      // 2B
    uint16_t  c7_count;      // 2B

    // Freelist heads (32B)
    void*     c4_head;       // 8B - next free slot for C4
    void*     c5_head;       // 8B
    void*     c6_head;       // 8B
    void*     c7_head;       // 8B

    // Segment range (shared across C4-C7, 16B)
    uintptr_t seg_base;      // 8B
    uintptr_t seg_end;       // 8B

    // ========== LINE 1 END: 56B used, 8B spare ==========
    uint64_t  _hot_pad;      // 8B - align to 64B

    // ========================================================================
    // LINE 2+: Cold fields (refill/retire, debug, stats)
    // ========================================================================
    // Freelist tails (for bulk push, 32B)
    void*     c4_tail;       // 8B
    void*     c5_tail;       // 8B
    void*     c6_tail;       // 8B
    void*     c7_tail;       // 8B

    // Segment metadata (16B)
    void*     segment;       // 8B - owning segment pointer
    uint32_t  page_idx;      // 4B - current page index
    uint32_t  _cold_pad;     // 4B

    // Stats (optional, 16B)
    uint64_t  alloc_count;   // 8B
    uint64_t  free_count;    // 8B
} TinyUltraTlsCtx;
// Total: 128B (2 cache lines)
```
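The byte counts in the comments above can be checked at compile time. The following is a minimal sketch, assuming an LP64 target (8-byte pointers) and 64B cache lines. The `CACHE_LINE` macro and the `_Alignas` on the first member are additions for illustration and are not part of the planned struct; without some alignment guarantee, the 64 hot bytes could still straddle two cache lines.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumption: 64B cache lines, as stated in the Goal */

typedef struct TinyUltraTlsCtx {
    /* Hot line. _Alignas here is an illustrative addition that forces the
       whole struct to start on a cache-line boundary. */
    _Alignas(CACHE_LINE) uint16_t c4_count;
    uint16_t  c5_count, c6_count, c7_count;
    void*     c4_head;
    void*     c5_head;
    void*     c6_head;
    void*     c7_head;
    uintptr_t seg_base;
    uintptr_t seg_end;
    uint64_t  _hot_pad;
    /* Cold line. */
    void*     c4_tail;
    void*     c5_tail;
    void*     c6_tail;
    void*     c7_tail;
    void*     segment;
    uint32_t  page_idx;
    uint32_t  _cold_pad;
    uint64_t  alloc_count;
    uint64_t  free_count;
} TinyUltraTlsCtx;

/* The first cold field must begin exactly where the hot line ends. */
_Static_assert(offsetof(TinyUltraTlsCtx, c4_tail) == CACHE_LINE,
               "hot fields must fill exactly one cache line");
_Static_assert(offsetof(TinyUltraTlsCtx, seg_end) + sizeof(uintptr_t) <= CACHE_LINE,
               "seg_end must stay inside the hot line");
_Static_assert(sizeof(TinyUltraTlsCtx) == 2 * CACHE_LINE,
               "context must be exactly two cache lines");
_Static_assert(_Alignof(TinyUltraTlsCtx) == CACHE_LINE,
               "context must be cache-line aligned");
```

If a later change adds a field to the hot section, the first assertion fails at build time instead of silently spilling the hot path onto a second line.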
## Memory Layout
```
Offset  Field        Size  Cache Line
------  -----        ----  ----------
0x00    c4_count     2B    LINE 1 (HOT)
0x02    c5_count     2B    LINE 1
0x04    c6_count     2B    LINE 1
0x06    c7_count     2B    LINE 1
0x08    c4_head      8B    LINE 1
0x10    c5_head      8B    LINE 1
0x18    c6_head      8B    LINE 1
0x20    c7_head      8B    LINE 1
0x28    seg_base     8B    LINE 1
0x30    seg_end      8B    LINE 1
0x38    _hot_pad     8B    LINE 1
------  -----        ----  ----------
0x40    c4_tail      8B    LINE 2 (COLD)
0x48    c5_tail      8B    LINE 2
0x50    c6_tail      8B    LINE 2
0x58    c7_tail      8B    LINE 2
0x60    segment      8B    LINE 2
0x68    page_idx     4B    LINE 2
0x6C    _cold_pad    4B    LINE 2
0x70    alloc_count  8B    LINE 2
0x78    free_count   8B    LINE 2
```
## Hot Path Access Pattern
### alloc (TLS hit)
```c
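static inline void* tiny_ultra_alloc_fast(TinyUltraTlsCtx* ctx, uint8_t class_idx) {
    // Single cache line access: counts and heads both live in LINE 1.
    // likely() is assumed to be a branch-hint macro, e.g. __builtin_expect(!!(x), 1).
    uint16_t* counts = &ctx->c4_count;
    void**    heads  = &ctx->c4_head;
    unsigned  i      = class_idx - 4u;   // C4..C7 -> index 0..3

    uint16_t c = counts[i];
    if (likely(c > 0)) {
        void* slot = heads[i];
        heads[i]  = *(void**)slot;       // pop: advance head to the next free slot
        counts[i] = (uint16_t)(c - 1);
        return slot;
    }
    return tiny_ultra_alloc_slow(ctx, class_idx);
}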
```
### free (TLS push)
```c
static inline void tiny_ultra_free_fast(TinyUltraTlsCtx* ctx, void* ptr, uint8_t class_idx) {
    // Range check (seg_base/end in same cache line)
    uintptr_t p = (uintptr_t)ptr;
    if (likely(p >= ctx->seg_base && p < ctx->seg_end)) {
        // Push to freelist (single cache line)
        void**    heads  = &ctx->c4_head;
        uint16_t* counts = &ctx->c4_count;
        *(void**)ptr = heads[class_idx - 4];
        heads[class_idx - 4] = ptr;
        counts[class_idx - 4]++;
        return;
    }
    tiny_ultra_free_slow(ctx, ptr, class_idx);
}
```
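The push/pop pair can be exercised in isolation. The sketch below is a reduced stand-in, not the planned code: `DemoCtx`, `demo_push`, and `demo_pop` are hypothetical names, and the counts and heads are declared as real arrays (`count[4]`, `head[4]`). That keeps the same 8B + 32B layout while avoiding indexing past a single struct member, which the `&ctx->c4_count` idiom relies on and which is formally out of bounds in ISO C. The segment range check is left out.

```c
#include <stddef.h>
#include <stdint.h>

enum { DEMO_FIRST_CLASS = 4, DEMO_NUM_CLASSES = 4 };  /* C4..C7 */

/* Hypothetical reduced context: only the fields the fast paths touch. */
typedef struct DemoCtx {
    uint16_t count[DEMO_NUM_CLASSES];  /* 8B, same layout as c4_count..c7_count */
    void*    head[DEMO_NUM_CLASSES];   /* 32B, same layout as c4_head..c7_head  */
} DemoCtx;

/* Push: store the old head inside the freed slot, then make the slot the head.
   The slot must be at least sizeof(void*) bytes and pointer-aligned. */
static inline void demo_push(DemoCtx* ctx, void* slot, unsigned class_idx) {
    unsigned i = class_idx - DEMO_FIRST_CLASS;
    *(void**)slot = ctx->head[i];
    ctx->head[i]  = slot;
    ctx->count[i]++;
}

/* Pop: return the head slot and advance head to the link stored inside it.
   Returns NULL when the list is empty (the real code would take the slow path). */
static inline void* demo_pop(DemoCtx* ctx, unsigned class_idx) {
    unsigned i = class_idx - DEMO_FIRST_CLASS;
    void* slot = ctx->head[i];
    if (slot == NULL) {
        return NULL;
    }
    ctx->head[i] = *(void**)slot;
    ctx->count[i]--;
    return slot;
}
```

Pushing slot `a` and then slot `b` for class 6 and popping twice yields `b` followed by `a`: the list is LIFO, so the most recently freed (and most likely cache-warm) slot is reused first.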
## Comparison: Before vs After
| Metric | Current (v11b-1) | Unified (v11b-2) |
|--------|------------------|------------------|
| TLS size (C4-C7) | 3712B | 128B |
| Cache lines (hot) | ~60 | **1** |
| seg_base/end copies | 4 | 1 |
| count access | scattered | contiguous |
## Freelist Design: Linked List vs Array
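**Choice: Linked List (head/tail)**
Reasons:
1. **No fixed array**: removes the 1KB `freelist[128]`
2. **O(1) push/pop**: the head alone is sufficient
3. **Bulk drain**: with a tail, the whole list can be returned in one operation
4. **Memory efficiency**: links exist only for slots actually held in the list
Trade-offs:
- Harder to prefetch (an array allows sequential access)
- Spatial locality may degrade
→ An array-based variant can still be considered after profiling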
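To illustrate why the tail makes bulk drain cheap, here is a minimal single-threaded sketch. `DemoList`, `demo_list_push`, and `demo_list_drain` are hypothetical names, and the destination is modeled as a plain `void*` list head; a real shared list would need atomics or a lock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-class list: head for push/pop, tail for O(1) splicing. */
typedef struct DemoList {
    void*    head;
    void*    tail;
    uint16_t count;
} DemoList;

/* Push a free slot; the first slot pushed becomes (and stays) the tail. */
static inline void demo_list_push(DemoList* l, void* slot) {
    *(void**)slot = l->head;
    if (l->head == NULL) {
        l->tail = slot;
    }
    l->head = slot;
    l->count++;
}

/* Splice the entire local list in front of *dest_head without walking it.
   Returns the number of slots moved. */
static inline unsigned demo_list_drain(DemoList* l, void** dest_head) {
    if (l->head == NULL) {
        return 0;
    }
    *(void**)(l->tail) = *dest_head;  /* tail->next = old destination head */
    *dest_head = l->head;             /* destination now starts at our head */
    unsigned moved = l->count;
    l->head  = NULL;
    l->tail  = NULL;
    l->count = 0;
    return moved;
}
```

The drain costs two pointer writes regardless of list length. With only a head pointer, the same operation would have to walk every node to find the last one.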
## Implementation Notes
1. **Backward Compatibility**: keep the existing TinyC*UltraFreeTLS API and use TinyUltraTlsCtx internally
2. **Gradual Migration**: migrate C7 to the new structure first and measure the effect
3. **ENV Gate**: enabled with `HAKMEM_ULTRA_UNIFIED_TLS=1`
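One way the ENV gate could be read is sketched below. The function names `demo_parse_gate` and `demo_unified_tls_enabled` are hypothetical, and two behaviors are assumptions rather than something the plan specifies: an unset variable means disabled, and only a value starting with `1` enables the path. The cached flag is not synchronized, so this is a sketch for the single-threaded startup case only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Pure helper: interpret the raw environment value.
   Assumption: NULL (unset) or anything not starting with '1' means OFF. */
static inline int demo_parse_gate(const char* value) {
    return (value != NULL && value[0] == '1') ? 1 : 0;
}

/* Read HAKMEM_ULTRA_UNIFIED_TLS once and cache the result so the hot path
   pays only for an integer compare after the first call. */
static inline int demo_unified_tls_enabled(void) {
    static int cached = -1;  /* -1 = not read yet */
    if (cached < 0) {
        cached = demo_parse_gate(getenv("HAKMEM_ULTRA_UNIFIED_TLS"));
    }
    return cached;
}
```

Splitting the parse step from the cached lookup keeps the decision rule testable without touching the process environment.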
---
**Date**: 2025-12-12
**Phase**: v11b-2 planning