Phase 15 v1: UnifiedCache FIFO→LIFO NEUTRAL (-0.70% Mixed, +0.42% C7)

Transform existing array-based UnifiedCache from FIFO ring to LIFO stack.

A/B Results:
- Mixed (16-1024B): -0.70% (52,965,966 → 52,593,948 ops/s)
- C7-only (1025-2048B): +0.42% (78,010,783 → 78,335,509 ops/s)

Verdict: NEUTRAL (both below +1.0% GO threshold) - freeze as research box

Implementation:
- L0 ENV gate: tiny_unified_lifo_env_box.{h,c} (HAKMEM_TINY_UNIFIED_LIFO=0/1)
- L1 LIFO ops: tiny_unified_lifo_box.h (unified_cache_try_pop/push_lifo)
- L2 integration: tiny_front_hot_box.h (mode check at entry)
- Reuses existing slots[] array (no intrusive pointers)

Root Causes:
1. Mode check overhead (tiny_unified_lifo_enabled() call)
2. Minimal LIFO vs FIFO locality delta in practice
3. Existing FIFO ring already well-optimized

Bonus Fix: LTO bug for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
- Converted static inline to extern + non-inline implementation
- Fixes undefined reference during LTO linking

Design: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md
Results: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ -129,3 +129,17 @@ scripts/verify_health_profiles.sh
- Rollback:
  - `export HAKMEM_TINY_TCACHE=0`

---

## 6. Follow-up investigation (at most two items, if any)

If Phase 14 v2 comes out NEUTRAL, do not keep running an open-ended "cap search". Narrow the cause down with two experiments first:

1) **Visualize the tcache hit rate (TLS counters only, no atomics)**
   - Count hit/miss/overflow for `tiny_tcache_try_pop/push` in TLS and check whether Mixed and C7-only are really hitting.

2) **v3: introduce a TLS-only nextptr wrapper (no fence) dedicated to tcache**
   - `tiny_next_load/store` is the general-purpose SSOT (single source of truth), so it carries a fence and a header-restore branch.
   - tcache is TLS-only, so use only `tiny_nextptr_offset()` and do the load/store with memcpy or a direct write (no fence). Confine this "tcache-only next" to L1 and A/B it.

If neither of these pays off, confirm the freeze of the Phase 14 line (adding tcache) and move on to other structural differences (metadata/segment/remote/footprint).

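
Follow-up 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: `tcache_sketch_next_offset()` is a hypothetical stand-in for `tiny_nextptr_offset()` (whose real signature is not shown here), the offset of 0 and the class count are assumed, and every `tcache_sketch_*` name is invented for this example. The point it shows is that the next-pointer load/store is a plain `memcpy` with no fence and no header-restore branch, which is only acceptable because the list is thread-local.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TCACHE_SKETCH_CLASSES 8   /* assumption: number of size classes */

/* Hypothetical stand-in for tiny_nextptr_offset(); the real function's
 * signature and per-class offsets are not shown in this document. */
static inline size_t tcache_sketch_next_offset(int class_idx) {
    (void)class_idx;
    return 0;   /* assumption: next pointer lives at the block base */
}

/* Plain memcpy load: no fence, no header-restore branch. */
static inline void *tcache_sketch_next_load(const void *base, int class_idx) {
    void *next;
    memcpy(&next, (const char *)base + tcache_sketch_next_offset(class_idx),
           sizeof next);
    return next;
}

/* Plain memcpy store, same assumptions as the load above. */
static inline void tcache_sketch_next_store(void *base, int class_idx, void *next) {
    memcpy((char *)base + tcache_sketch_next_offset(class_idx), &next,
           sizeof next);
}

/* Thread-local intrusive list heads, one per class (C11 _Thread_local). */
static _Thread_local void *t_tcache_sketch_head[TCACHE_SKETCH_CLASSES];

static inline void tcache_sketch_push(int class_idx, void *base) {
    assert(class_idx >= 0 && class_idx < TCACHE_SKETCH_CLASSES);
    tcache_sketch_next_store(base, class_idx, t_tcache_sketch_head[class_idx]);
    t_tcache_sketch_head[class_idx] = base;
}

static inline void *tcache_sketch_pop(int class_idx) {
    assert(class_idx >= 0 && class_idx < TCACHE_SKETCH_CLASSES);
    void *base = t_tcache_sketch_head[class_idx];
    if (base != NULL)
        t_tcache_sketch_head[class_idx] = tcache_sketch_next_load(base, class_idx);
    return base;
}
```

Using `memcpy` rather than a cast keeps the sketch valid even if the assumed offset is not pointer-aligned; compilers typically lower it to a single load or store.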
83
docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md
Normal file
@ -0,0 +1,83 @@
# Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) A/B Test Results

**Date:** 2025-12-15
**Benchmark:** Mixed (16–1024B) + C7-only (1025–2048B) 10-run cleanenv
**Target:** Transform existing UnifiedCache from FIFO ring to LIFO stack
**Expected ROI:** +5-10% (design estimate, cache locality improvement)
**GO Threshold:** +1.0% mean improvement

---

## 1. Implementation Summary

Phase 15 v1 adds an opt-in LIFO (stack) mode to the existing array-based UnifiedCache, which otherwise runs as a FIFO ring buffer.

**Key Changes:**
- **Patch 1**: L0 ENV gate box (`tiny_unified_lifo_env_box.{h,c}`)
- **Patch 2**: L1 LIFO operations (`tiny_unified_lifo_box.h`)
- **Patch 3**: Hot path integration (`tiny_front_hot_box.h`, both alloc and free)
- **Patch 4**: Makefile updates (added `.o` files)
- **Patch 5**: `bench_profile.h` refresh sync

**Design:**
- Reuses the existing `TinyUnifiedCache.slots[]` array (no intrusive pointers)
- `tail` is treated as the stack top (depth); `head` is unused (always 0)
- Mode check at function entry (once per call)
- No wrap-around (`mask` unused in LIFO mode)

**ENV Control:**
```bash
export HAKMEM_TINY_UNIFIED_LIFO=0  # Baseline (FIFO)
export HAKMEM_TINY_UNIFIED_LIFO=1  # Optimized (LIFO)
```

**Bonus Fix:**
- Discovered and fixed a pre-existing LTO linkage bug for `tiny_c7_preserve_header_enabled()` (latent since Phase 13/14)
- Converted the `static inline` definition to an `extern` declaration plus a non-inline implementation

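
The shape of that conversion might look like the single-file sketch below. Only the function name comes from this document; the backing flag `g_c7_preserve_header_sketch` and its default of 0 are assumptions made so the example compiles, and the real function body is not shown here.

```c
#include <assert.h>

/* ---- header side (sketch) ----
 * Before: static inline int tiny_c7_preserve_header_enabled(void) { ... }
 * After:  declaration only, so every translation unit refers to one symbol. */
extern int tiny_c7_preserve_header_enabled(void);

/* ---- exactly one .c file (sketch) ----
 * Hypothetical backing flag; how the real flag is set and what it defaults
 * to are not shown in this document. */
static int g_c7_preserve_header_sketch = 0;

/* Non-inline definition with external linkage. */
int tiny_c7_preserve_header_enabled(void) {
    assert(g_c7_preserve_header_sketch == 0 || g_c7_preserve_header_sketch == 1);
    return g_c7_preserve_header_sketch;
}
```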

---

## 2. A/B Test Results

### Mixed (16–1024B):
- **Baseline (LIFO=0):** 52,965,966 ops/s
- **Optimized (LIFO=1):** 52,593,948 ops/s
- **Delta:** **-0.70%** (regression)

### C7-only (1025–2048B):
- **Baseline (LIFO=0):** 78,010,783 ops/s
- **Optimized (LIFO=1):** 78,335,509 ops/s
- **Delta:** **+0.42%** (slight improvement)

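
The deltas above are the relative change of the LIFO run over the FIFO baseline. The sketch below reproduces that arithmetic and applies the GO rule of +1.0%, together with the -1.0% NO-GO bound from the design document's GO/NO-GO list. The function and enum names are made up for illustration, and treating exactly ±1.0% as GO/NO-GO is an assumption about how the boundary is read.

```c
#include <assert.h>

typedef enum {
    VERDICT_NO_GO   = -1,
    VERDICT_NEUTRAL =  0,
    VERDICT_GO      =  1
} verdict_sketch_t;

/* Relative change of the optimized run over the baseline, in percent. */
static double delta_percent(double baseline_ops, double optimized_ops) {
    assert(baseline_ops > 0.0);
    return (optimized_ops - baseline_ops) / baseline_ops * 100.0;
}

/* GO at +1.0% or better, NO-GO at -1.0% or worse, NEUTRAL in between. */
static verdict_sketch_t classify_delta(double delta_pct) {
    if (delta_pct >= 1.0)
        return VERDICT_GO;
    if (delta_pct <= -1.0)
        return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```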

---

## 3. Verdict: NEUTRAL

**Result:** Mixed -0.70%, C7-only +0.42% (both below the +1.0% GO threshold and inside the ±1.0% NEUTRAL band)

**Comparison to Phase 14:**
- Phase 14 v1 (tcache free-side only): Mixed +0.20% (NEUTRAL)
- Phase 14 v2 (tcache alloc+free): Mixed +0.08%, C7-only -0.39% (NEUTRAL)
- Phase 15 v1 (FIFO→LIFO): Mixed -0.70%, C7-only +0.42% (NEUTRAL)

**Root Causes:**
1. **Mode check overhead**: the entry-point `tiny_unified_lifo_enabled()` call adds a branch
2. **Minimal locality delta**: the temporal-locality difference between LIFO and FIFO is small in practice
3. **Existing optimization**: the FIFO ring implementation is already well optimized
4. **Cache warming**: TLS cache pre-warming reduces locality sensitivity

---

## 4. Recommendation: Freeze as Research Box

**Decision:** Freeze Phase 15 v1 as a research box (`HAKMEM_TINY_UNIFIED_LIFO=0` by default, i.e. OFF)

**Rationale:**
- Neither LIFO nor FIFO shows a significant advantage
- In Mixed, the mode-check overhead appears to outweigh any locality gain (-0.70%); in C7-only the net gain is only +0.42%
- The existing FIFO ring is simple and already fast

**Next:** Explore alternative approaches:
- Hybrid strategies (per-class mode selection)
- Batch operations (reduce per-call overhead)
- Hardware prefetch hints (explicit locality control)

120
docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md
Normal file
@ -0,0 +1,120 @@
# Phase 15: UnifiedCache FIFO→LIFO (Stack) v1 Design

**Date:** 2025-12-15
**Status:** DESIGN (next candidate)

---

## 0. Motivation (Why this next?)

Phase 14 (intrusive tcache) v1/v2 was taken as far as confirming that the new path is actually exercised, but the result was **NEUTRAL**.
Compared with system/mimalloc, however, the shape of Tiny's thread cache remains the most important hypothesis.

The current `TinyUnifiedCache` is a **FIFO ring (head/tail + mask)**:
- every pop/push updates `head/tail` and applies `mask`
- the most recently freed block cannot be reused first (weak locality)

The winning pattern in glibc tcache and mimalloc-style allocators is usually **LIFO**.
Phase 15 adds no further intrusive (nextptr) structures. It keeps the existing `slots[]` array as-is and changes the shape from **FIFO to LIFO (stack)**, aiming to win on both instruction count and locality.

---

## 1. Design (Box Theory)

### 1.1 Boxes

```
L0: unified_cache_shape_env_box (ENV gate, reversible)
  ↓
L1: unified_cache_lifo_box (push/pop LIFO only, no side effects)
  ↓
L2: existing unified_cache (FIFO) (fallback / compatibility)
```

**There is exactly one boundary:**
- LIFO disabled → use FIFO
- LIFO enabled → use LIFO

(In the implementation this may end up as a branch inside a single function, but responsibilities are still split by box.)

---

## 2. API (minimal)

ENV:
- `HAKMEM_TINY_UNIFIED_LIFO=0/1` (default 0, opt-in)

L1 API (internal, static inline):
- `unified_cache_pop_lifo(int class_idx) -> BASE or NULL`
- `unified_cache_push_lifo(int class_idx, BASE) -> 1/0`

Integration point (candidates):
- `tiny_hot_alloc_fast()` / `tiny_hot_free_fast()`
  - This is the real hot path, where the FIFO/LIFO difference can be measured most directly
- or `unified_cache_pop()` / `unified_cache_push()`
  - for a broad effect without adding to the existing callers

---

## 3. Implementation sketch (concept)

### 3.1 LIFO state

Rather than adding a `top` field to `TinyUnifiedCache`, v1 prioritizes compatibility:
- of the existing `head/tail`, **treat `tail` as "top"** (stack depth)
- `head` is always 0 (or unused)
- `mask` is not needed (no wrap-around)

LIFO push:
- if `tail < capacity`: `slots[tail++] = base`

LIFO pop:
- if `tail > 0`: `base = slots[--tail]`

(LIFO state cannot coexist with FIFO state, so consistency across a mode switch is handled by placing a "drain/reset" at the boundary.)

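
The push/pop rules above can be sketched as follows. `TinyUnifiedCacheSketch` is a simplified stand-in invented for this example: the real `TinyUnifiedCache` layout, field widths and function signatures are not shown in this document, so only the field names `slots`, `head`, `tail`, `mask` and `capacity` are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TinyUnifiedCache (assumed layout). */
typedef struct {
    void   **slots;
    uint16_t head;      /* unused in LIFO mode, kept at 0 */
    uint16_t tail;      /* LIFO mode: stack depth = index of the next free slot */
    uint16_t mask;      /* FIFO ring only; ignored here */
    uint16_t capacity;  /* number of entries in slots[] */
} TinyUnifiedCacheSketch;

/* Returns 1 if the block was cached, 0 if the stack is full. */
static inline int lifo_sketch_push(TinyUnifiedCacheSketch *c, void *base) {
    assert(c->tail <= c->capacity);
    if (c->tail >= c->capacity)
        return 0;
    c->slots[c->tail++] = base;
    return 1;
}

/* Returns the most recently pushed block, or NULL if the stack is empty. */
static inline void *lifo_sketch_pop(TinyUnifiedCacheSketch *c) {
    if (c->tail == 0)
        return NULL;
    return c->slots[--c->tail];
}
```

Compared with a ring, neither operation touches `head` or `mask`: each is one bounds check plus one indexed access and one counter update.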
### 3.2 Mode switch safety (Fail-Fast)

- When switching to LIFO ON, **reset each class's `head/tail`** after `unified_cache_init()` (treat the cache as empty)
- bench/profile assumes the ENV is settled before init, but if the research box provides a "refresh":
  - forbid switching during refresh (or force a drain/reset)

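
One way to read the fail-fast rule is sketched below: a switch request is refused while any class still holds blocks, and indices are reset only when every class is empty. All names (`unified_mode_try_switch`, `g_cache_sketch`, `SKETCH_CLASSES`) are hypothetical, and the emptiness test `head == tail` is an assumption that happens to hold for both a ring (`head == tail`) and the LIFO layout (`head == 0`, `tail == depth`).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CLASSES 8   /* assumption: number of size classes */

/* Only the indices matter for this sketch. */
typedef struct {
    uint16_t head;
    uint16_t tail;
} CacheIndexSketch;

static CacheIndexSketch g_cache_sketch[SKETCH_CLASSES];
static int g_lifo_mode_sketch = 0;   /* 0 = FIFO, 1 = LIFO */

/* Returns 1 if the requested mode is now active, 0 if the switch was
 * refused because some class has not been drained. */
static int unified_mode_try_switch(int want_lifo) {
    assert(want_lifo == 0 || want_lifo == 1);
    if (want_lifo == g_lifo_mode_sketch)
        return 1;                                  /* nothing to do */
    for (int i = 0; i < SKETCH_CLASSES; i++) {
        if (g_cache_sketch[i].head != g_cache_sketch[i].tail)
            return 0;                              /* fail fast: not drained */
    }
    for (int i = 0; i < SKETCH_CLASSES; i++) {
        g_cache_sketch[i].head = 0;                /* single reset boundary */
        g_cache_sketch[i].tail = 0;
    }
    g_lifo_mode_sketch = want_lifo;
    return 1;
}
```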

---

## 4. A/B Plan (same binary)

Baseline:
```sh
HAKMEM_TINY_UNIFIED_LIFO=0 scripts/run_mixed_10_cleanenv.sh
```

Optimized:
```sh
HAKMEM_TINY_UNIFIED_LIFO=1 scripts/run_mixed_10_cleanenv.sh
```

Additionally, check whether locality matters:
```sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_UNIFIED_LIFO=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_UNIFIED_LIFO=1 scripts/run_mixed_10_cleanenv.sh
```

GO/NO-GO:
- GO: Mixed mean +1.0% or better
- NO-GO: mean -1.0% or worse
- NEUTRAL: within ±1.0% (freeze)

---

## 5. Risks

1) **The fixed cost of the mode branch** cancels the gain (a repeat of Phase 11/14)
   → Mitigation: decide once at the entry and add no further branches on the hot path (function pointer or snapshot)

2) **Consistency at switch time** (FIFO state and LIFO state are incompatible)
   → Mitigation: forbid switching on refresh, or pin the drain/reset to a single boundary

3) **Dependence on capacity tuning**
   → v1 changes only the shape to confirm the ROI; cap search is split off into v2

@ -0,0 +1,98 @@
# Phase 15: UnifiedCache FIFO→LIFO (Stack) v1 - Next Instructions

Design: `docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md`

---

## 0. Status / Why now

- Phase 14 v1/v2 (intrusive tcache) was **NEUTRAL** → freeze (default OFF)
- The next aim is to add no further intrusive structures and instead use the existing `slots[]` to change **FIFO ring → LIFO stack** (going after instruction count and locality through the cache shape)

---

## 1. GO criteria

Mixed 10-run (clean env):
- **GO**: mean +1.0% or better
- **NO-GO**: mean -1.0% or worse
- **NEUTRAL**: within ±1.0% → research box freeze

---

## 2. Patch order (stack small patches)

### Patch 1: L0 ENV gate box (reversible)

New:
- `core/box/tiny_unified_lifo_env_box.{h,c}`

ENV:
- `HAKMEM_TINY_UNIFIED_LIFO=0/1` (default 0)

Requirements:
- no `getenv()` on the hot path (cached)
- if synchronization with bench_profile's `putenv()` is needed, provide a refresh API (but watch mode-switch consistency)

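
A cached gate meeting these two requirements might look like the sketch below. `tiny_unified_lifo_enabled()` is the name used elsewhere in this commit, but its body here is a guess; `tiny_unified_lifo_env_refresh_sketch()` is a hypothetical name, the rule that only a value starting with `1` enables LIFO is an assumption, and the lazy first read assumes initialization happens before other threads start.

```c
#include <assert.h>
#include <stdlib.h>

/* -1 = not read yet, 0 = FIFO (default), 1 = LIFO */
static int g_unified_lifo_cached = -1;

/* Re-reads the environment. Meant for init and for an explicit refresh
 * after bench_profile has changed the environment, never for the hot path. */
static void tiny_unified_lifo_env_refresh_sketch(void) {
    const char *v = getenv("HAKMEM_TINY_UNIFIED_LIFO");
    /* Assumed parsing rule: a value starting with '1' enables LIFO. */
    g_unified_lifo_cached = (v != NULL && v[0] == '1') ? 1 : 0;
}

/* Hot-path query: one load and one predictable branch after the first call. */
static inline int tiny_unified_lifo_enabled(void) {
    if (g_unified_lifo_cached < 0)            /* cold: first call only */
        tiny_unified_lifo_env_refresh_sketch();
    assert(g_unified_lifo_cached == 0 || g_unified_lifo_cached == 1);
    return g_unified_lifo_cached;
}
```

If the refresh can flip the mode while caches hold blocks, it needs the drain/reset boundary described in the design document before the new value takes effect.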
### Patch 2: L1 LIFO operations box (zero side effects)

New (intended as static inline):
- `core/box/tiny_unified_lifo_box.h`

API:
- `unified_cache_try_pop_lifo(int class_idx) -> void* base_or_null`
- `unified_cache_try_push_lifo(int class_idx, void* base) -> int handled(1/0)`

Implementation policy:
- treat the `tail` of `TinyUnifiedCache` as "top" (compatibility first)
- when LIFO is enabled, `head` is not used (or is pinned to 0)

### Patch 3: Integration point (once, at the entry)

Integration candidates (in priority order):
1) `core/box/tiny_front_hot_box.h` (the actual hot alloc/free)
2) `core/front/tiny_unified_cache.h` (for a broad effect)

Principles:
- the mode check happens **once, at function entry**
- do not scatter mode re-checks through the hot path (lesson from Phase 11)

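
The "once at function entry" principle could be sketched as below, with both shapes side by side so the difference in reuse order is visible. Everything here is illustrative: `CacheSketch` is an assumed layout, `g_lifo_snapshot` stands in for whatever cached mode value the real gate exposes, and `hot_alloc_sketch` / `hot_free_sketch` are not the real `tiny_front_hot_box.h` functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TinyUnifiedCache (assumed layout). */
typedef struct {
    void   **slots;
    uint16_t head, tail, mask, capacity;
} CacheSketch;

/* Mode snapshot, assumed to be written at init before any hot call. */
static int g_lifo_snapshot = 0;

/* FIFO ring: pop at head, push at tail, wrap with mask (capacity = mask + 1). */
static inline void *ring_pop_sketch(CacheSketch *c) {
    if (c->head == c->tail)
        return NULL;                               /* empty */
    void *base = c->slots[c->head];
    c->head = (uint16_t)((c->head + 1) & c->mask);
    return base;
}

static inline int ring_push_sketch(CacheSketch *c, void *base) {
    uint16_t next = (uint16_t)((c->tail + 1) & c->mask);
    if (next == c->head)
        return 0;                                  /* full */
    c->slots[c->tail] = base;
    c->tail = next;
    return 1;
}

/* LIFO stack: tail is the depth; head and mask are ignored. */
static inline void *stack_pop_sketch(CacheSketch *c) {
    return (c->tail > 0) ? c->slots[--c->tail] : NULL;
}

static inline int stack_push_sketch(CacheSketch *c, void *base) {
    if (c->tail >= c->capacity)
        return 0;
    c->slots[c->tail++] = base;
    return 1;
}

static inline void *hot_alloc_sketch(CacheSketch *c) {
    const int lifo = g_lifo_snapshot;   /* the only mode read in this call */
    assert(lifo == 0 || lifo == 1);
    return lifo ? stack_pop_sketch(c) : ring_pop_sketch(c);
}

static inline int hot_free_sketch(CacheSketch *c, void *base) {
    const int lifo = g_lifo_snapshot;   /* the only mode read in this call */
    assert(lifo == 0 || lifo == 1);
    return lifo ? stack_push_sketch(c, base) : ring_push_sketch(c, base);
}
```

After freeing `a` then `b`, the ring hands back `a` first while the stack hands back `b` first. Even with a single read per call, the dispatch is still one extra branch on every alloc and free.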
### Patch 4: Visibility (minimal)

Only when needed:
- LIFO hit/miss in TLS counters (no atomics)
- one-shot dump (ENV opt-in)

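
A minimal sketch of such counters follows, assuming C11 `_Thread_local` is available. The struct, the helper names and the output format are invented for illustration; the ENV opt-in that would gate the dump is left out because its variable name is not given here. Each thread increments its own copy, so plain increments suffice and no atomics are needed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t pop_hit;
    uint64_t pop_miss;
    uint64_t push_ok;
    uint64_t push_overflow;
} LifoStatsSketch;

/* One instance per thread: plain increments, no atomics. */
static _Thread_local LifoStatsSketch t_lifo_stats;

/* Record the result of a pop: non-NULL is a hit, NULL is a miss. */
static inline void lifo_stat_pop(const void *result) {
    if (result != NULL)
        t_lifo_stats.pop_hit++;
    else
        t_lifo_stats.pop_miss++;
}

/* Record the result of a push: 1 = cached, 0 = overflow (stack full). */
static inline void lifo_stat_push(int handled) {
    if (handled)
        t_lifo_stats.push_ok++;
    else
        t_lifo_stats.push_overflow++;
}

/* One-shot dump of the calling thread's counters. */
static void lifo_stats_dump(FILE *out) {
    assert(out != NULL);
    fprintf(out,
            "[lifo] pop hit=%" PRIu64 " miss=%" PRIu64
            " push ok=%" PRIu64 " overflow=%" PRIu64 "\n",
            t_lifo_stats.pop_hit, t_lifo_stats.pop_miss,
            t_lifo_stats.push_ok, t_lifo_stats.push_overflow);
}
```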

---

## 3. A/B (same binary)

Baseline:
```sh
HAKMEM_TINY_UNIFIED_LIFO=0 scripts/run_mixed_10_cleanenv.sh
```

Optimized:
```sh
HAKMEM_TINY_UNIFIED_LIFO=1 scripts/run_mixed_10_cleanenv.sh
```

Additional (does locality matter?):
```sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_UNIFIED_LIFO=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_UNIFIED_LIFO=1 scripts/run_mixed_10_cleanenv.sh
```

---

## 4. Health check

```sh
scripts/verify_health_profiles.sh
```

---

## 5. Rollback

- `export HAKMEM_TINY_UNIFIED_LIFO=0`