hakmem/docs/analysis/TLS_LAYOUT_V11B1_CURRENT.md
Moe Charm (CI) 1a8652a91a Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete)
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 16:26:42 +09:00

# TLS Layout Analysis (Phase v11b-1 Current State)
## Overview
The current ULTRA TLS is scattered across **independent per-class structs**, which hurts L1D cache efficiency.
## TLS Structures (ULTRA C4-C7)
### 1. TinyC7UltraFreeTLS (`tiny_c7_ultra_tls_t`)
```c
typedef struct tiny_c7_ultra_tls_t {
uint16_t count; // 2B (hot)
uint16_t _pad; // 2B
void* freelist[128]; // 1024B (128 * 8)
// --- cold fields ---
uintptr_t seg_base; // 8B
uintptr_t seg_end; // 8B
tiny_c7_ultra_segment_t* seg; // 8B
void* page_base; // 8B
size_t block_size; // 8B
uint32_t page_idx; // 4B
tiny_c7_ultra_page_meta_t* page_meta; // 8B
bool headers_initialized; // 1B
} tiny_c7_ultra_tls_t;
```
| Field | Size | Total |
|-------|------|-------|
| Hot (count + freelist) | 4 + 1024 | 1028B |
| Cold (seg_base...headers_initialized) | ~53B | ~53B |
| **Total** | | **~1080B (17 cache lines)** |
### 2. TinyC6UltraFreeTLS
```c
typedef struct TinyC6UltraFreeTLS {
void* freelist[128]; // 1024B (128 * 8)
uint8_t count; // 1B
uint8_t _pad[7]; // 7B
uintptr_t seg_base; // 8B
uintptr_t seg_end; // 8B
} TinyC6UltraFreeTLS;
```
| Field | Size |
|-------|------|
| freelist | 1024B |
| count + pad | 8B |
| seg_base/end | 16B |
| **Total** | **1048B (17 cache lines)** |
### 3. TinyC5UltraFreeTLS
Same as C6: **1048B (17 cache lines)**
### 4. TinyC4UltraFreeTLS
```c
typedef struct TinyC4UltraFreeTLS {
void* freelist[64]; // 512B (64 * 8)
uint8_t count; // 1B
uint8_t _pad[7]; // 7B
uintptr_t seg_base; // 8B
uintptr_t seg_end; // 8B
} TinyC4UltraFreeTLS;
```
| Field | Size |
|-------|------|
| freelist | 512B |
| count + pad | 8B |
| seg_base/end | 16B |
| **Total** | **536B (9 cache lines)** |
### 5. SmallMidV35TlsCtx (MID v3.5)
```c
typedef struct {
void *page[8]; // 64B
uint32_t offset[8]; // 32B
uint32_t capacity[8]; // 32B
SmallPageMeta_MID_v3 *meta[8]; // 64B
} SmallMidV35TlsCtx;
```
| Field | Size |
|-------|------|
| page[8] | 64B |
| offset[8] | 32B |
| capacity[8] | 32B |
| meta[8] | 64B |
| **Total** | **192B (3 cache lines)** |
## Summary: Total TLS Footprint
| Structure | Size | Cache Lines |
|-----------|------|-------------|
| TinyC7UltraFreeTLS | 1080B | 17 |
| TinyC6UltraFreeTLS | 1048B | 17 |
| TinyC5UltraFreeTLS | 1048B | 17 |
| TinyC4UltraFreeTLS | 536B | 9 |
| SmallMidV35TlsCtx | 192B | 3 |
| **Total ULTRA (C4-C7)** | **3712B** | **~60 lines** |
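The sizes in the table can be cross-checked with `sizeof`. The following is a minimal sketch, assuming a 64-byte cache line and struct instances that start on a line boundary; the forward-declared segment and page-meta types are stand-ins because their definitions are not shown here, and `cache_lines` and `print_tls_footprint` are hypothetical helper names. On a typical LP64 ABI, alignment padding brings `tiny_c7_ultra_tls_t` to 1096B (18 lines), slightly above the ~1080B estimate, while the C6, C4, and MID figures match exactly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in forward declarations: only pointers to these types are stored. */
typedef struct tiny_c7_ultra_segment_t tiny_c7_ultra_segment_t;
typedef struct tiny_c7_ultra_page_meta_t tiny_c7_ultra_page_meta_t;
typedef struct SmallPageMeta_MID_v3 SmallPageMeta_MID_v3;

typedef struct tiny_c7_ultra_tls_t {
    uint16_t count;
    uint16_t _pad;
    void* freelist[128];
    uintptr_t seg_base;
    uintptr_t seg_end;
    tiny_c7_ultra_segment_t* seg;
    void* page_base;
    size_t block_size;
    uint32_t page_idx;
    tiny_c7_ultra_page_meta_t* page_meta;
    bool headers_initialized;
} tiny_c7_ultra_tls_t;

typedef struct TinyC6UltraFreeTLS {
    void* freelist[128];
    uint8_t count;
    uint8_t _pad[7];
    uintptr_t seg_base;
    uintptr_t seg_end;
} TinyC6UltraFreeTLS;

typedef struct TinyC4UltraFreeTLS {
    void* freelist[64];
    uint8_t count;
    uint8_t _pad[7];
    uintptr_t seg_base;
    uintptr_t seg_end;
} TinyC4UltraFreeTLS;

typedef struct {
    void *page[8];
    uint32_t offset[8];
    uint32_t capacity[8];
    SmallPageMeta_MID_v3 *meta[8];
} SmallMidV35TlsCtx;

#define CACHE_LINE 64u /* assumed cache line size */

/* Number of cache lines covered by `bytes`, rounded up. */
static size_t cache_lines(size_t bytes) {
    return (bytes + CACHE_LINE - 1) / CACHE_LINE;
}

/* Hypothetical helper: print size and line count for each struct. */
static void print_tls_footprint(void) {
    printf("C7:  %zu B, %zu lines\n", sizeof(tiny_c7_ultra_tls_t),
           cache_lines(sizeof(tiny_c7_ultra_tls_t)));
    printf("C6:  %zu B, %zu lines\n", sizeof(TinyC6UltraFreeTLS),
           cache_lines(sizeof(TinyC6UltraFreeTLS)));
    printf("C4:  %zu B, %zu lines\n", sizeof(TinyC4UltraFreeTLS),
           cache_lines(sizeof(TinyC4UltraFreeTLS)));
    printf("MID: %zu B, %zu lines\n", sizeof(SmallMidV35TlsCtx),
           cache_lines(sizeof(SmallMidV35TlsCtx)));
}
```

Calling `print_tls_footprint()` from a `main` on the target platform shows the actual padded sizes, which may differ on ABIs with other pointer widths.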
## Problem Analysis
### 1. Minimum Fields Required on the Hot Path
| Operation | Required Fields |
|-----------|-----------------|
| alloc (TLS hit) | count, freelist[count-1] |
| free (TLS push) | count, freelist[count], seg_base/end |
**The hot path effectively needs only count + head + seg_range, about 24B.**
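The field usage in the table above can be sketched as an array-magazine pop and push. This is an illustration only, not the hakmem implementation: `DemoMagazine`, `demo_pop`, and `demo_push` are hypothetical names, the layout copies the C6 struct shown earlier, and treating `seg_end` as an exclusive bound is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAG_CAP 128 /* matches freelist[128] in the C6 layout */

/* Hypothetical magazine with the same fields as TinyC6UltraFreeTLS. */
typedef struct {
    void*     freelist[DEMO_MAG_CAP];
    uint8_t   count;
    uint8_t   _pad[7];
    uintptr_t seg_base;
    uintptr_t seg_end; /* assumed exclusive */
} DemoMagazine;

/* alloc (TLS hit): touches only count and freelist[count-1]. */
static void* demo_pop(DemoMagazine* m) {
    assert(m != NULL);
    if (m->count == 0) {
        return NULL; /* miss: a real allocator would take a slower path */
    }
    return m->freelist[--m->count];
}

/* free (TLS push): touches count, freelist[count], and seg_base/seg_end.
 * Returns 1 if the pointer was cached, 0 if the caller must fall back. */
static int demo_push(DemoMagazine* m, void* p) {
    assert(m != NULL);
    uintptr_t addr = (uintptr_t)p;
    if (addr < m->seg_base || addr >= m->seg_end) {
        return 0; /* pointer is outside the learned segment range */
    }
    if (m->count >= DEMO_MAG_CAP) {
        return 0; /* magazine is full */
    }
    m->freelist[m->count++] = p;
    return 1;
}
```

Neither function reads any other field, which is why the rest of the 1024B array is cold on any single operation.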
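### 2. Current Problems
1. **The freelist arrays are huge**: each class keeps a 512-1024B array in TLS
2. **seg_base/end is duplicated across classes**: it could be shared if C4-C7 use the same segment range
3. **Inconsistent placement of count**: at the front in C7, after the freelist in C4-C6
4. **Cold fields are mixed into the hot region**: C7's page_meta and similar fields are loaded every time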
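The effect of the inconsistent `count` placement can be made concrete with `offsetof`. The sketch below assumes 64-byte cache lines and a struct that starts on a line boundary; `C7HotPrefix` and `C6HotPrefix` are hypothetical cut-down copies containing only the hot fields of the two layouts, and the helper names are illustrative. With `count` at the front, a pop on a nearly empty C7 magazine can stay within one line; with `count` behind the array, a C6 pop always touches two.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache line size */

/* Hot fields only, in the same order as the full structs. */
typedef struct { uint16_t count; uint16_t _pad; void* freelist[128]; } C7HotPrefix;
typedef struct { void* freelist[128]; uint8_t count; uint8_t _pad[7]; } C6HotPrefix;

/* Index of the cache line that contains byte offset `off`. */
static size_t line_of(size_t off) {
    return off / CACHE_LINE;
}

/* Distinct cache lines read by a pop when `count` entries are cached
 * (count >= 1): the line holding count plus the line holding the top slot. */
static int c7_lines_on_pop(unsigned count) {
    size_t slot = offsetof(C7HotPrefix, freelist)
                + (size_t)(count - 1) * sizeof(void*);
    return line_of(offsetof(C7HotPrefix, count)) == line_of(slot) ? 1 : 2;
}

static int c6_lines_on_pop(unsigned count) {
    size_t slot = offsetof(C6HotPrefix, freelist)
                + (size_t)(count - 1) * sizeof(void*);
    return line_of(offsetof(C6HotPrefix, count)) == line_of(slot) ? 1 : 2;
}
```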
### 3. Causes of Cache Misses
- Every alloc/free accesses **a TLS struct dedicated to each class**
- 4 classes × 15 cache lines on average (17 + 17 + 17 + 9) = **60 cache lines competing for L1D**
- In mixed workloads, C4-C7 switch randomly, causing thrashing
---
## Phase TLS-UNIFY-2a: C4-C6 Unified TLS (2025-12-12)
### Implementation
The C4-C6 ULTRA TLS is unified into a single box, `TinyUltraTlsCtx`:
```c
typedef struct TinyUltraTlsCtx {
// Hot line: counts (8B aligned)
uint16_t c4_count;
uint16_t c5_count;
uint16_t c6_count;
uint16_t _pad_count;
// Per-class segment ranges (learned on first free)
uintptr_t c4_seg_base, c4_seg_end;
uintptr_t c5_seg_base, c5_seg_end;
uintptr_t c6_seg_base, c6_seg_end;
// Per-class array magazines
void* c4_freelist[64]; // 512B
void* c5_freelist[64]; // 512B
void* c6_freelist[128]; // 1024B
} TinyUltraTlsCtx;
// Total: ~2KB per thread
```
**Changes**:
- Unified the C4/C5/C6 TLS into one struct
- Kept the array-magazine approach (safe)
- C7 stays in its own box (already stable)
- Removed delegation to `TinyC4/5/6UltraFreeTLS`
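One way the unified struct could be driven is a per-class `switch` over a single thread-local context. The sketch below is a guess at the shape, not the code in `tiny_ultra_tls_box.c`: `demo_ultra_tls`, `demo_ultra_pop`, and `demo_ultra_push` are hypothetical names, segment learning is omitted (the caller sets the ranges), and `seg_end` is assumed to be exclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TinyUltraTlsCtx {
    uint16_t c4_count;
    uint16_t c5_count;
    uint16_t c6_count;
    uint16_t _pad_count;
    uintptr_t c4_seg_base, c4_seg_end;
    uintptr_t c5_seg_base, c5_seg_end;
    uintptr_t c6_seg_base, c6_seg_end;
    void* c4_freelist[64];
    void* c5_freelist[64];
    void* c6_freelist[128];
} TinyUltraTlsCtx;

/* One context per thread (C11 thread-local storage). */
static _Thread_local TinyUltraTlsCtx g_demo_ultra_tls;

static TinyUltraTlsCtx* demo_ultra_tls(void) {
    return &g_demo_ultra_tls;
}

/* Shared push logic: range check, capacity check, then store. */
static int demo_push_one(void** fl, uint16_t* count, uint16_t cap,
                         uintptr_t base, uintptr_t end, void* p) {
    uintptr_t addr = (uintptr_t)p;
    if (addr < base || addr >= end || *count >= cap) {
        return 0; /* caller falls back to the regular free path */
    }
    fl[(*count)++] = p;
    return 1;
}

/* Pop for class 4, 5, or 6; NULL on an empty magazine or unknown class. */
static void* demo_ultra_pop(TinyUltraTlsCtx* ctx, int class_idx) {
    assert(ctx != NULL);
    switch (class_idx) {
    case 4: return ctx->c4_count ? ctx->c4_freelist[--ctx->c4_count] : NULL;
    case 5: return ctx->c5_count ? ctx->c5_freelist[--ctx->c5_count] : NULL;
    case 6: return ctx->c6_count ? ctx->c6_freelist[--ctx->c6_count] : NULL;
    default: return NULL;
    }
}

/* Push for class 4, 5, or 6; returns 1 if cached, 0 otherwise. */
static int demo_ultra_push(TinyUltraTlsCtx* ctx, int class_idx, void* p) {
    assert(ctx != NULL);
    switch (class_idx) {
    case 4: return demo_push_one(ctx->c4_freelist, &ctx->c4_count, 64,
                                 ctx->c4_seg_base, ctx->c4_seg_end, p);
    case 5: return demo_push_one(ctx->c5_freelist, &ctx->c5_count, 64,
                                 ctx->c5_seg_base, ctx->c5_seg_end, p);
    case 6: return demo_push_one(ctx->c6_freelist, &ctx->c6_count, 128,
                                 ctx->c6_seg_base, ctx->c6_seg_end, p);
    default: return 0;
    }
}
```

With this layout the three counts and the first segment ranges sit in the first cache line of the context, so the class dispatch and the range check read one shared line before touching the class's own array.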
### A/B Test Results
| Test | v11b-1 (Phase 1) | TLS-UNIFY-2a | Diff |
|------|------------------|--------------|------|
| Mixed 16-1024B | 8.0-8.8 Mop/s | 8.5-9.0 Mop/s | +0~5% |
| MID 257-768B | 8.5-9.0 Mop/s | 8.1-9.0 Mop/s | ±0% |
**Result**: Performance equal or better, no SEGV or assert failures ✅
---
**Date**: 2025-12-12
**Phase**: TLS-UNIFY-2a completed