Phase 14 kickoff: Pointer-chase reduction (tcache-style intrusive LIFO)

Design and implementation plan for Phase 14 v1:
- Target: Reduce pointer-chase overhead in TinyUnifiedCache
- Strategy: Add intrusive LIFO tcache layer before array-based cache
- Inspired by glibc tcache (per-bin head pointer, intrusive next)

Approach:
- L0: tiny_tcache_env_box (ENV gate: HAKMEM_TINY_TCACHE=0/1, default OFF)
- L1: tiny_tcache_box (intrusive LIFO: push/pop with cap=64)
- Integration: Inside unified_cache_push/pop (minimal call site changes)

Expected benefits:
- tcache hit: No array access, just head pointer + intrusive next
- Better locality (LIFO vs FIFO)
- Closer to system malloc tcache behavior

A/B plan:
- Test: HAKMEM_TINY_TCACHE=0/1 on Mixed 10-run
- GO threshold: +1.0% mean
- Rollback: ENV-gated, default OFF

Files added:
- docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
- docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md

Next: Implement Phase 14 v1 patches (ENV box → tcache box → integration)

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-15 00:32:56 +09:00
parent cbb35ee27f
commit 0b306f72f4
2 changed files with 260 additions and 0 deletions


@@ -0,0 +1,111 @@
# Phase 14: Pointer-Chase Reduction v1 - Next Instructions (Tiny tcache intrusive LIFO)
## 0. Status
- In Phase 13 v1 / E5-2 the header write tax came out **NEUTRAL**, so we move to the next hypothesis.
- Next core target: **thread cache / pointer chase** (a structural difference from system malloc's tcache looks likely).
Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
---
## 1. Goal (GO criteria)
On Mixed 10-run (clean env):
- **GO**: mean +1.0% or better
- **NO-GO**: mean -1.0% or worse (roll back / freeze immediately)
- **NEUTRAL**: within ±1.0% (freeze as a research box)
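
The thresholds above can be sketched as a small classifier. This is only an illustration: the helper names (`ab_mean`, `ab_delta_pct`, `ab_classify`) are hypothetical, and it assumes the benchmark score is a throughput figure where higher is better and that values exactly on a boundary count as GO / NO-GO.

```c
#include <stddef.h>

typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } ab_verdict;

/* Arithmetic mean of n samples (n must be > 0). */
static double ab_mean(const double *xs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += xs[i];
    return sum / (double)n;
}

/* Relative change of the optimized mean over the baseline mean, in percent.
 * Assumes a higher-is-better score such as ops/s. */
static double ab_delta_pct(double base_mean, double opt_mean) {
    return (opt_mean - base_mean) / base_mean * 100.0;
}

/* Apply the +1.0% / -1.0% thresholds; boundary values are treated as
 * GO / NO-GO here, which is an assumption of this sketch. */
static ab_verdict ab_classify(double delta_pct) {
    if (delta_pct >= 1.0)  return VERDICT_GO;
    if (delta_pct <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

Feeding the ten per-run scores of each side through `ab_mean` and then `ab_delta_pct` gives the number that the GO decision is made on.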
---
## 2. Patch order (stack small patches)
### Patch 1: L0 ENV Box (reversible + refresh)
New files:
- `core/box/tiny_tcache_env_box.h`
- `core/box/tiny_tcache_env_box.c`
ENV:
- `HAKMEM_TINY_TCACHE=0/1` (default: 0)
- `HAKMEM_TINY_TCACHE_CAP=64` (default: 64)
API:
- `tiny_tcache_enabled()`
- `tiny_tcache_cap()`
- `tiny_tcache_env_refresh_from_env()`
Requirements:
- No `getenv()` on the hot path (cached reads only)
### Patch 2: L1 tcache Box (intrusive LIFO)
New file:
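
A minimal sketch of the cached-read pattern, to show what "no `getenv()` on the hot path" looks like in practice. The three API names come from the list above; the storage variables, the parse helpers (`sketch_parse_enabled`, `sketch_parse_cap`) and the 1..65535 sanity bound are assumptions, not the actual box.

```c
#include <stdlib.h>

/* Cached settings (hypothetical storage; the real box may differ). */
static int g_tiny_tcache_enabled = 0;   /* default OFF */
static int g_tiny_tcache_cap     = 64;  /* default cap */

/* "1" enables; unset or anything else leaves the feature off. */
static int sketch_parse_enabled(const char *s) {
    return s != NULL && s[0] == '1';
}

/* Parse the cap; unset, non-numeric, or out-of-range values fall back to 64.
 * The 1..65535 bound is an assumption of this sketch. */
static int sketch_parse_cap(const char *s) {
    if (s == NULL) return 64;
    long v = strtol(s, NULL, 10);
    return (v >= 1 && v <= 65535) ? (int)v : 64;
}

/* The only place that calls getenv(). Run it at init and after any code
 * that changes the environment, never per allocation. */
void tiny_tcache_env_refresh_from_env(void) {
    g_tiny_tcache_enabled = sketch_parse_enabled(getenv("HAKMEM_TINY_TCACHE"));
    g_tiny_tcache_cap     = sketch_parse_cap(getenv("HAKMEM_TINY_TCACHE_CAP"));
}

/* Hot-path accessors: plain loads of the cached values. */
static inline int tiny_tcache_enabled(void) { return g_tiny_tcache_enabled; }
static inline int tiny_tcache_cap(void)     { return g_tiny_tcache_cap; }
```

Keeping the parsing in separate helpers makes the defaults easy to check without touching the process environment.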
- `core/box/tiny_tcache_box.h`
Contents:
- Per-class `__thread` head/count (fixed at 8 classes)
- All next-pointer reads and writes must go through `tiny_next_store/load`
API:
- `tiny_tcache_try_push(class_idx, base) -> bool`
- `tiny_tcache_try_pop(class_idx) -> void*` (BASE or NULL)
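
The intrusive LIFO can be sketched roughly as follows. The two API names and the 8-class / cap-64 figures come from this plan; everything else is an assumption for illustration. In particular, `sketch_next_store/load` are stand-ins for `tiny_next_store/load` and simply keep the link at offset 0 of the free block, which may not match the real layout, and the cap is a compile-time constant here rather than the ENV-provided value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TINY_TCACHE_NUM_CLASSES 8    /* fixed at 8 classes per the plan */
#define TINY_TCACHE_CAP_SKETCH  64   /* constant here; the plan reads it from ENV */

/* Stand-ins for tiny_next_store/load (assumed: link stored at offset 0). */
static inline void sketch_next_store(void *base, void *next) {
    memcpy(base, &next, sizeof next);
}
static inline void *sketch_next_load(const void *base) {
    void *next;
    memcpy(&next, base, sizeof next);
    return next;
}

typedef struct {
    void    *head;   /* most recently pushed block, or NULL when empty */
    uint16_t count;  /* number of blocks currently linked */
} tiny_tcache_bin;

static __thread tiny_tcache_bin g_tcache[TINY_TCACHE_NUM_CLASSES];

/* Push: link the block in front of the current head. Returns false when the
 * bin is full so the caller can fall back to another cache. */
static inline bool tiny_tcache_try_push(int class_idx, void *base) {
    tiny_tcache_bin *bin = &g_tcache[class_idx];
    if (bin->count >= TINY_TCACHE_CAP_SKETCH) return false;
    sketch_next_store(base, bin->head);
    bin->head = base;
    bin->count++;
    return true;
}

/* Pop: unlink and return the head block, or NULL when the bin is empty. */
static inline void *tiny_tcache_try_pop(int class_idx) {
    tiny_tcache_bin *bin = &g_tcache[class_idx];
    void *base = bin->head;
    if (base == NULL) return NULL;
    bin->head = sketch_next_load(base);
    bin->count--;
    return base;
}
```

A hit touches only the bin's head and the first word of the block itself, which is the pointer-chase saving this phase is testing for.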
### Patch 3: Integrate inside unified_cache (no new call sites)
Modify:
- `core/front/tiny_unified_cache.h` (add exactly one `if` at the top of `unified_cache_push/pop`)
Policy:
- tcache hit: return immediately (do not touch the array)
- miss/overflow: fall back to the existing array cache
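
The hit / fallback policy might look like the sketch below. None of this is the actual `tiny_unified_cache.h` code: every `sketch_*` name is hypothetical, the array cache is reduced to a bounded stack, the link is assumed to live at offset 0 of the block, and the tcache cap is set to 2 only so the overflow path is easy to exercise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CLASSES    8
#define SKETCH_TCACHE_CAP 2   /* tiny cap, only to make overflow easy to hit */
#define SKETCH_ARRAY_CAP  4

static int g_tcache_on = 1;   /* stand-in for tiny_tcache_enabled() */

static __thread void *t_head[SKETCH_CLASSES];
static __thread int   t_count[SKETCH_CLASSES];
static __thread void *a_slots[SKETCH_CLASSES][SKETCH_ARRAY_CAP];
static __thread int   a_top[SKETCH_CLASSES];

/* Intrusive LIFO layer (link assumed at offset 0 of the block). */
static inline bool sketch_tcache_push(int c, void *base) {
    if (t_count[c] >= SKETCH_TCACHE_CAP) return false;
    memcpy(base, &t_head[c], sizeof(void *));
    t_head[c] = base;
    t_count[c]++;
    return true;
}
static inline void *sketch_tcache_pop(int c) {
    void *base = t_head[c];
    if (base == NULL) return NULL;
    memcpy(&t_head[c], base, sizeof(void *));
    t_count[c]--;
    return base;
}

/* Stand-in for the existing array cache. */
static inline bool sketch_array_push(int c, void *base) {
    if (a_top[c] >= SKETCH_ARRAY_CAP) return false;
    a_slots[c][a_top[c]++] = base;
    return true;
}
static inline void *sketch_array_pop(int c) {
    return a_top[c] > 0 ? a_slots[c][--a_top[c]] : NULL;
}

/* One branch at the top: a tcache hit returns without touching the array. */
static inline bool unified_cache_push_sketch(int c, void *base) {
    if (g_tcache_on && sketch_tcache_push(c, base)) return true;
    return sketch_array_push(c, base);   /* disabled or overflow */
}
static inline void *unified_cache_pop_sketch(int c) {
    if (g_tcache_on) {
        void *base = sketch_tcache_pop(c);
        if (base != NULL) return base;
    }
    return sketch_array_pop(c);          /* disabled or miss */
}
```

With the gate off, both functions reduce to the array path, which is what makes the ENV-gated rollback safe.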
### Patch 4: Sync the refresh in bench_profile
Modify:
- `core/bench_profile.h`
Add:
- Call `tiny_tcache_env_refresh_from_env()` after `bench_setenv_default(...)`
---
## 3. A/B test (same binary)
Baseline:
```sh
HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
```
Optimized:
```sh
HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
```
Optional (cap exploration is research only):
```sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=32 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=64 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=128 scripts/run_mixed_10_cleanenv.sh
```
---
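## 4. Observability (minimal)
If needed, track tcache hits/misses with **TLS counters** (no atomics).
Only when we need to confirm where the win comes from, do a one-shot dump via `fprintf(stderr, ...)` (no always-on logging).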
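
One way the TLS counters and the one-shot dump could be arranged is sketched here. All names (`tcache_stat_hit`, `tcache_stat_miss`, `tcache_stat_dump_once`) and the output format are hypothetical; the sketch only shows plain thread-local increments and a guard that lets each thread print at most once.

```c
#include <stdint.h>
#include <stdio.h>

/* Per-thread counters: plain increments, no atomics needed. */
static __thread uint64_t t_tcache_hit;
static __thread uint64_t t_tcache_miss;
static __thread int      t_tcache_dumped;

static inline void tcache_stat_hit(void)  { t_tcache_hit++; }
static inline void tcache_stat_miss(void) { t_tcache_miss++; }

/* Print this thread's counters to stderr at most once.
 * Returns 1 if it printed, 0 if it had already dumped. */
static int tcache_stat_dump_once(void) {
    if (t_tcache_dumped) return 0;
    t_tcache_dumped = 1;
    uint64_t total = t_tcache_hit + t_tcache_miss;
    double rate = total ? 100.0 * (double)t_tcache_hit / (double)total : 0.0;
    fprintf(stderr, "[tcache] hit=%llu miss=%llu hit_rate=%.1f%%\n",
            (unsigned long long)t_tcache_hit,
            (unsigned long long)t_tcache_miss, rate);
    return 1;
}
```

Calling the dump from a thread-exit or benchmark-teardown path keeps the logging off the hot path entirely.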
---
## 5. Promotion (GO only)
If GO:
1. Add to a preset (initially `MIXED_TINYV3_C7_SAFE` only)
2. Record the A/B results in `CURRENT_TASK.md`
3. Rollback procedure:
- `export HAKMEM_TINY_TCACHE=0`
If NO-GO/NEUTRAL:
- Freeze as a research box (keep it default OFF)