# TLS Layout Plan: Unified ULTRA TLS (Phase v11b-2 Target)
## Goal
Fit the C4-C7 ULTRA hot path into **one cache line (64B)**.
## Design: TinyUltraTlsCtx
```c
#include <stdint.h>  // uint16_t, uint32_t, uint64_t, uintptr_t

// ============================================================================
// LINE 1: Hot fields (64B) - alloc/free hot path
// ============================================================================
typedef struct TinyUltraTlsCtx {
    // Counts (8B total, so the first head is naturally 8B-aligned)
    uint16_t c4_count;      // 2B
    uint16_t c5_count;      // 2B
    uint16_t c6_count;      // 2B
    uint16_t c7_count;      // 2B
    // Freelist heads (32B)
    void* c4_head;          // 8B - next free slot for C4
    void* c5_head;          // 8B
    void* c6_head;          // 8B
    void* c7_head;          // 8B
    // Segment range (shared across C4-C7, 16B)
    uintptr_t seg_base;     // 8B
    uintptr_t seg_end;      // 8B
    // ========== LINE 1 END: 56B used, 8B spare ==========
    uint64_t _hot_pad;      // 8B - pad the hot fields to 64B
    // ========================================================================
    // LINE 2: Cold fields (refill/retire, debug, stats)
    // ========================================================================
    // Freelist tails (for bulk push, 32B)
    void* c4_tail;          // 8B
    void* c5_tail;          // 8B
    void* c6_tail;          // 8B
    void* c7_tail;          // 8B
    // Segment metadata (16B)
    void* segment;          // 8B - owning segment pointer
    uint32_t page_idx;      // 4B - current page index
    uint32_t _cold_pad;     // 4B
    // Stats (optional, 16B)
    uint64_t alloc_count;   // 8B
    uint64_t free_count;    // 8B
} TinyUltraTlsCtx;
// Total: 128B (2 cache lines)
```
## Memory Layout
```
Offset  Field        Size  Cache Line
------  -----------  ----  -------------
0x00    c4_count     2B    LINE 1 (HOT)
0x02    c5_count     2B    LINE 1
0x04    c6_count     2B    LINE 1
0x06    c7_count     2B    LINE 1
0x08    c4_head      8B    LINE 1
0x10    c5_head      8B    LINE 1
0x18    c6_head      8B    LINE 1
0x20    c7_head      8B    LINE 1
0x28    seg_base     8B    LINE 1
0x30    seg_end      8B    LINE 1
0x38    _hot_pad     8B    LINE 1
------  -----------  ----  -------------
0x40    c4_tail      8B    LINE 2 (COLD)
0x48    c5_tail      8B    LINE 2
0x50    c6_tail      8B    LINE 2
0x58    c7_tail      8B    LINE 2
0x60    segment      8B    LINE 2
0x68    page_idx     4B    LINE 2
0x6C    _cold_pad    4B    LINE 2
0x70    alloc_count  8B    LINE 2
0x78    free_count   8B    LINE 2
```
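The offsets in the table can be enforced at compile time. The following is a minimal sketch, assuming a 64-bit (LP64) target and a C11 compiler. The `CACHE_LINE` macro and the `_Alignas` on the first member are assumptions that are not part of the plan. Without 64-byte alignment of the TLS object itself, the first 64 bytes of the struct could straddle two cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size */

/* Same field order as the plan. _Alignas on the first member raises the
 * alignment of the whole struct to one cache line (assumption). */
typedef struct TinyUltraTlsCtx {
    _Alignas(CACHE_LINE) uint16_t c4_count;
    uint16_t  c5_count, c6_count, c7_count;
    void     *c4_head, *c5_head, *c6_head, *c7_head;
    uintptr_t seg_base, seg_end;
    uint64_t  _hot_pad;
    void     *c4_tail, *c5_tail, *c6_tail, *c7_tail;
    void     *segment;
    uint32_t  page_idx, _cold_pad;
    uint64_t  alloc_count, free_count;
} TinyUltraTlsCtx;

/* Hot fields must end inside the first cache line. */
_Static_assert(offsetof(TinyUltraTlsCtx, c4_head) == 0x08, "heads start at 0x08");
_Static_assert(offsetof(TinyUltraTlsCtx, seg_end) + sizeof(uintptr_t) <= CACHE_LINE,
               "hot fields fit in LINE 1");
_Static_assert(offsetof(TinyUltraTlsCtx, _hot_pad) == 0x38, "pad closes LINE 1");

/* Cold fields start exactly at the second cache line. */
_Static_assert(offsetof(TinyUltraTlsCtx, c4_tail) == CACHE_LINE, "LINE 2 starts at 0x40");
_Static_assert(offsetof(TinyUltraTlsCtx, free_count) == 0x78, "last field at 0x78");
_Static_assert(sizeof(TinyUltraTlsCtx) == 2 * CACHE_LINE, "128B total");
```

If a field is later added or reordered, the build fails instead of silently pushing a hot field into the second line.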
## Hot Path Access Pattern
### alloc (TLS hit)
```c
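static inline void* tiny_ultra_alloc_fast(TinyUltraTlsCtx* ctx, uint8_t class_idx) {
    // Single cache line access: counts and heads both live in LINE 1.
    // Indexing from &c4_count / &c4_head relies on the four fields being contiguous.
    uint16_t* counts = &ctx->c4_count;
    void** heads = &ctx->c4_head;
    unsigned idx = class_idx - 4u;   // C4..C7 -> 0..3
    uint16_t c = counts[idx];
    if (likely(c > 0)) {
        void* p = heads[idx];
        heads[idx] = *(void**)p;     // pop: advance head to the next free block
        counts[idx] = (uint16_t)(c - 1);
        return p;
    }
    return tiny_ultra_alloc_slow(ctx, class_idx);
}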
```
### free (TLS push)
```c
static inline void tiny_ultra_free_fast(TinyUltraTlsCtx* ctx, void* ptr, uint8_t class_idx) {
    // Range check (seg_base/end in same cache line)
    uintptr_t p = (uintptr_t)ptr;
    if (likely(p >= ctx->seg_base && p < ctx->seg_end)) {
        // Push to freelist (single cache line)
        void** heads = &ctx->c4_head;
        uint16_t* counts = &ctx->c4_count;
        *(void**)ptr = heads[class_idx - 4];
        heads[class_idx - 4] = ptr;
        counts[class_idx - 4]++;
        return;
    }
    tiny_ultra_free_slow(ctx, ptr, class_idx);
}
```
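The push/pop pair can be exercised in isolation. The sketch below is not the hakmem implementation. It assumes a hypothetical `HotLine` struct that declares the counts and heads as arrays (same offsets as the `c4_*`..`c7_*` fields on a 64-bit target), which avoids indexing across separate struct members. The helper names `hot_alloc`, `hot_free`, and `hot_roundtrip_demo` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array-based view of the hot cache line (C4..C7 -> index 0..3). */
typedef struct HotLine {
    uint16_t  count[4];
    void     *head[4];
    uintptr_t seg_base;
    uintptr_t seg_end;
} HotLine;

/* Pop the head block, or return NULL where the real code would go to the slow path. */
static void *hot_alloc(HotLine *h, unsigned class_idx) {
    unsigned i = class_idx - 4u;
    void *p = h->head[i];
    if (p == NULL) return NULL;
    h->head[i] = *(void **)p;   /* the next link is stored inside the free block */
    h->count[i]--;
    return p;
}

/* Push a block if it lies in the segment; return 0 where the slow path would run. */
static int hot_free(HotLine *h, void *ptr, unsigned class_idx) {
    uintptr_t p = (uintptr_t)ptr;
    if (p < h->seg_base || p >= h->seg_end) return 0;
    unsigned i = class_idx - 4u;
    *(void **)ptr = h->head[i];
    h->head[i] = ptr;
    h->count[i]++;
    return 1;
}

/* Free two blocks, then allocate them back; returns 1 if the order is LIFO. */
static int hot_roundtrip_demo(void) {
    static void *seg[32];                 /* stand-in for a real segment */
    HotLine h = {0};
    h.seg_base = (uintptr_t)&seg[0];
    h.seg_end  = (uintptr_t)&seg[32];
    void *a = &seg[0], *b = &seg[8];
    int outside = 0;                      /* an address outside the segment */

    if (!hot_free(&h, a, 4) || !hot_free(&h, b, 4)) return 0;
    if (h.count[0] != 2) return 0;
    if (hot_free(&h, &outside, 4)) return 0;   /* rejected by the range check */
    if (hot_alloc(&h, 4) != b) return 0;       /* last freed comes back first */
    if (hot_alloc(&h, 4) != a) return 0;
    if (hot_alloc(&h, 4) != NULL) return 0;    /* list is empty again */
    return h.count[0] == 0;
}
```

Every access in `hot_alloc` and `hot_free` touches only the hot fields plus the freed block itself, which is the property the single-cache-line layout is meant to provide.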
## Comparison: Before vs After
| Metric | Current (v11b-1) | Unified (v11b-2) |
|--------|------------------|------------------|
| TLS size (C4-C7) | 3712B | 128B |
| Cache lines (hot) | ~60 | **1** |
| seg_base/end copies | 4 | 1 |
| count access | scattered | contiguous |
## Freelist Design: Linked List vs Array
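**Choice: Linked List (head/tail)**

Reasons:
1. **No fixed array needed**: removes the 1KB `freelist[128]`
2. **O(1) push/pop**: the head pointer alone is enough
3. **Bulk drain**: with a tail pointer, the whole list can be returned in one operation
4. **Memory efficiency**: links are stored only in slots that are actually on the list

Trade-offs:
- Harder to prefetch (an array allows sequential access)
- Spatial locality may degrade

→ An array-based version can be reconsidered after profiling.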
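
The bulk-drain argument can be sketched as an O(1) splice. This is an illustration under assumptions, not the planned implementation: the `FreeList` type and the `fl_push` / `fl_drain` helpers are hypothetical, and the sketch records the tail on the first push into an empty list, which the fast path shown earlier does not do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-class list: head for push/pop, tail for bulk drain. */
typedef struct FreeList {
    void    *head;
    void    *tail;
    uint16_t count;
} FreeList;

/* Push one block; the first push into an empty list also records the tail. */
static void fl_push(FreeList *l, void *blk) {
    *(void **)blk = l->head;
    if (l->head == NULL) l->tail = blk;
    l->head = blk;
    l->count++;
}

/* Move every block of src in front of dst without walking the list. */
static void fl_drain(FreeList *dst, FreeList *src) {
    if (src->head == NULL) return;
    *(void **)src->tail = dst->head;          /* link src's last block to dst */
    if (dst->head == NULL) dst->tail = src->tail;
    dst->head = src->head;
    dst->count = (uint16_t)(dst->count + src->count);
    src->head = NULL;
    src->tail = NULL;
    src->count = 0;
}

/* Returns 1 if draining {b2 -> b1} into {b0} yields b2 -> b1 -> b0. */
static int fl_drain_demo(void) {
    static void *blocks[3][2];                /* three dummy 16-byte blocks */
    FreeList tls = {0}, central = {0};
    fl_push(&central, blocks[0]);
    fl_push(&tls, blocks[1]);
    fl_push(&tls, blocks[2]);
    fl_drain(&central, &tls);

    void *p = central.head;
    if (p != (void *)blocks[2]) return 0;
    p = *(void **)p;
    if (p != (void *)blocks[1]) return 0;
    p = *(void **)p;
    if (p != (void *)blocks[0]) return 0;
    if (*(void **)p != NULL) return 0;
    return central.count == 3 && central.tail == (void *)blocks[0]
        && tls.head == NULL && tls.count == 0;
}
```

The drain cost is independent of list length, but only if the tail is kept valid, so where the tail gets written (cold line, first push only) is a design point worth measuring.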
## Implementation Notes
1. **Backward Compatibility**: keep the existing TinyC*UltraFreeTLS API and use TinyUltraTlsCtx internally
2. **Gradual Migration**: migrate C7 to the new structure first and measure the effect
3. **ENV Gate**: enabled with `HAKMEM_ULTRA_UNIFIED_TLS=1`
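
One possible shape for the ENV gate, as a sketch only: the function names `ultra_gate_value_enabled` and `ultra_unified_tls_enabled` are hypothetical, and treating exactly the string `1` as "enabled" is an assumption. The cached flag is read once per process; the unsynchronized first read is a simplification that a real implementation would probably resolve during single-threaded initialization.

```c
#include <stdlib.h>

/* Interpret the raw environment value: only the exact string "1" enables. */
static int ultra_gate_value_enabled(const char *v) {
    return v != NULL && v[0] == '1' && v[1] == '\0';
}

/* Read HAKMEM_ULTRA_UNIFIED_TLS once and cache the result. */
static int ultra_unified_tls_enabled(void) {
    static int cached = -1;   /* -1 = not read yet */
    if (cached < 0) {
        cached = ultra_gate_value_enabled(getenv("HAKMEM_ULTRA_UNIFIED_TLS"));
    }
    return cached;
}
```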
---
**Date**: 2025-12-12
**Phase**: v11b-2 planning