Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete)
Implement C6 ULTRA intrusive LIFO freelist with ENV gating:
- Single-linked LIFO using next pointer at USER+1 offset
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
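
For readers unfamiliar with the technique, the intrusive LIFO and ENV gate named in the commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from this commit: every `demo_*` name is hypothetical, "USER+1" is assumed to mean one byte past the pointer handed to the user, and treating a leading `1` as ON is a guess at how the gate parses its variable. Only the variable name `HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL` and its default-OFF behavior come from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the link is stored 1 byte past the user pointer ("USER+1"). */
enum { DEMO_NEXT_OFFSET = 1 };

typedef struct {
    void    *head;   /* most recently freed block; NULL when empty */
    unsigned count;  /* number of blocks currently on the list */
} demo_freelist;

/* The slot is unaligned, so go through memcpy instead of a pointer cast. */
static void demo_next_store(void *block, void *next) {
    memcpy((unsigned char *)block + DEMO_NEXT_OFFSET, &next, sizeof next);
}

static void *demo_next_load(const void *block) {
    void *next;
    memcpy(&next, (const unsigned char *)block + DEMO_NEXT_OFFSET, sizeof next);
    return next;
}

/* Push: the freed block itself stores the link to the old head. */
static void demo_push(demo_freelist *fl, void *block) {
    assert(block != NULL);
    demo_next_store(block, fl->head);
    fl->head = block;
    fl->count++;
}

/* Pop: returns NULL when empty so the caller can fall back to another path. */
static void *demo_pop(demo_freelist *fl) {
    void *block = fl->head;
    if (block == NULL) {
        return NULL;
    }
    fl->head = demo_next_load(block);
    fl->count--;
    return block;
}

/* ENV gate, read once and cached; anything other than a leading '1' is OFF. */
static int demo_intrusive_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char *v = getenv("HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL");
        cached = (v != NULL && v[0] == '1') ? 1 : 0;
    }
    return cached;
}
```

Because the link lives inside the freed block, the list needs no side array; each block only has to be at least `DEMO_NEXT_OFFSET + sizeof(void *)` bytes, which the 257-512B class easily satisfies.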
@@ -435,4 +435,30 @@ Moving on to "internal optimization" of the v3 backend so_alloc_fast/so_free_fast paths…
**Recommendation**: Before implementing Phase SO-BACKEND-OPT-2, measure so_alloc_fast/so_free_fast in detail with a perf profile (cycles:u).

---

## Phase v11b-1: Free Path Micro-Optimization (2025-12-12)

### Changes

A perf profile showed that the serial ULTRA checks (C7→C6→C5→C4) in `free_tiny_fast()` accounted for 11.73% overhead. The same pattern already used in `malloc_tiny_fast()` was applied:

1. **C7 ULTRA early-exit**: Check for C7 before taking the policy snapshot (shortens the most frequent path)
2. **Single switch**: Branch once on route_kind[class_idx] (generates a jump table)
3. **Dead code removal**: Removed the unused v4 check and the duplicate v7 check
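
The first two changes above can be sketched as follows. This is a minimal illustration, not the actual `free_tiny_fast()` code: all `demo_*` names, the set of route kinds, and the contents of the per-class table are hypothetical. Only the idea of a C7 early exit followed by one switch on `route_kind[class_idx]` comes from the text.

```c
#include <assert.h>

/* Hypothetical route kinds; the allocator's real set is not shown here. */
typedef enum {
    DEMO_ROUTE_LEGACY = 0,
    DEMO_ROUTE_ULTRA,
    DEMO_ROUTE_MID,
    DEMO_ROUTE_COUNT
} demo_route_kind;

enum { DEMO_NUM_CLASSES = 8, DEMO_C7 = 7 };

typedef struct {
    demo_route_kind route_kind[DEMO_NUM_CLASSES];
} demo_policy;

/* Assumed policy for the demo: C0-C2 legacy, C3 mid, C4-C7 ULTRA. */
static const demo_policy demo_global_policy = {{
    DEMO_ROUTE_LEGACY, DEMO_ROUTE_LEGACY, DEMO_ROUTE_LEGACY, DEMO_ROUTE_MID,
    DEMO_ROUTE_ULTRA,  DEMO_ROUTE_ULTRA,  DEMO_ROUTE_ULTRA,  DEMO_ROUTE_ULTRA
}};

static unsigned demo_snapshot_loads;             /* times the snapshot was taken */
static unsigned demo_handled[DEMO_ROUTE_COUNT];  /* frees handled per route */

static const demo_policy *demo_policy_snapshot(void) {
    demo_snapshot_loads++;
    return &demo_global_policy;
}

static demo_route_kind demo_free_route(unsigned class_idx) {
    assert(class_idx < DEMO_NUM_CLASSES);

    /* 1. C7 early exit: decided before the policy snapshot is touched. */
    if (class_idx == DEMO_C7) {
        demo_handled[DEMO_ROUTE_ULTRA]++;
        return DEMO_ROUTE_ULTRA;
    }

    /* 2. One table lookup and one dense switch replace the serial
     *    C6 -> C5 -> C4 checks. A real handler would free the block
     *    in each case; the demo only counts. */
    const demo_policy *policy = demo_policy_snapshot();
    demo_route_kind kind = policy->route_kind[class_idx];
    switch (kind) {
    case DEMO_ROUTE_ULTRA:
        demo_handled[DEMO_ROUTE_ULTRA]++;
        break;
    case DEMO_ROUTE_MID:
        demo_handled[DEMO_ROUTE_MID]++;
        break;
    case DEMO_ROUTE_LEGACY:
    default:
        demo_handled[DEMO_ROUTE_LEGACY]++;
        break;
    }
    return kind;
}
```

With dense, small case values a compiler can typically lower such a switch to a jump table, though with only a few cases it may choose compare-and-branch instead, so the generated code is worth checking in the disassembly.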

### Results

| Workload | v11a-5 | v11b-1 | Improvement |
|----------|--------|--------|-------------|
| Mixed 16-1024B | 45.4M ops/s | 50.7M ops/s | **+11.7%** |
| C6-heavy | 49.1M ops/s | 52.0M ops/s | **+5.9%** |
| C6-heavy + MID v3.5 | 53.1M ops/s | 53.6M ops/s | +0.9% |

### Lessons

- The same pattern as the alloc-path optimization (v11a-5) also works on the free path
- Replacing a serial if-else chain with a switch (jump table) gives a large improvement
- Branch cost in the front layer is larger than in the backend (this time +11.7% vs. an expected +1-2%)

***