
Moe Charm (CI) 87fa27518c Phase 15 v1: UnifiedCache FIFO→LIFO NEUTRAL (-0.70% Mixed, +0.42% C7)

Transform existing array-based UnifiedCache from FIFO ring to LIFO stack.

A/B Results:
- Mixed (16-1024B): -0.70% (52,965,966 → 52,593,948 ops/s)
- C7-only (1025-2048B): +0.42% (78,010,783 → 78,335,509 ops/s)

Verdict: NEUTRAL (both below +1.0% GO threshold) - freeze as research box

Implementation:
- L0 ENV gate: tiny_unified_lifo_env_box.{h,c} (HAKMEM_TINY_UNIFIED_LIFO=0/1)
- L1 LIFO ops: tiny_unified_lifo_box.h (unified_cache_try_pop/push_lifo)
- L2 integration: tiny_front_hot_box.h (mode check at entry)
- Reuses existing slots[] array (no intrusive pointers)

Root Causes:
1. Mode check overhead (tiny_unified_lifo_enabled() call)
2. Minimal LIFO vs FIFO locality delta in practice
3. Existing FIFO ring already well-optimized

Bonus Fix: LTO bug for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
- Converted static inline to extern + non-inline implementation
- Fixes undefined reference during LTO linking

Design: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md
Results: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-15 02:19:26 +09:00


Phase 15: UnifiedCache FIFO→LIFO (Stack) v1 Design

Date: 2025-12-15
Status: DESIGN (next candidate)

0. Motivation (Why this next?)

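Phase 14 (intrusive tcache) v1/v2 got as far as confirming that the code path was live, but the result was NEUTRAL. Even so, compared with system/mimalloc, the shape of Tiny's thread cache remains the most important hypothesis.

The current TinyUnifiedCache is a FIFO ring (head/tail + mask):

Every pop/push updates head/tail and applies the mask
The most recently freed block cannot be reused first (weak locality)

The winning pattern in glibc tcache and mimalloc-style allocators is often LIFO. Phase 15 adds no intrusive (next-pointer) fields; it reuses the existing slots[] array as-is and changes the shape from **FIFO to LIFO (stack)**, aiming to win on both instruction count and locality.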

1. Design (Box Theory)

1.1 Boxes

L0: unified_cache_shape_env_box     (ENV gate, reversible)
  ↓
L1: unified_cache_lifo_box          (push/pop LIFO only, no side effects)
  ↓
L2: existing unified_cache (FIFO)  (fallback / compatibility)

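There is a single boundary:

LIFO disabled → use FIFO
LIFO enabled → use LIFO

(In the implementation this may end up as a branch inside a single function, but responsibilities stay separated by box.)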

2. API (minimal)

ENV:

HAKMEM_TINY_UNIFIED_LIFO=0/1 (default 0, opt-in)
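
As a rough illustration of the L0 gate, the sketch below reads HAKMEM_TINY_UNIFIED_LIFO once and caches the answer. The function names `parse_lifo_flag` and `unified_lifo_enabled_sketch` are hypothetical, and the rule that only the exact string "1" enables LIFO is an assumption; the document only states `0/1` with default 0.

```c
#include <stdlib.h>

/* Hypothetical L0 ENV gate sketch; names are illustrative only.
 * Cached state: -1 = not read yet, 0 = FIFO (default), 1 = LIFO (opt-in).
 * Not thread-safe as written; assumes the first call happens during init. */
static int g_lifo_enabled = -1;

/* Pure parser (assumption): only the exact string "1" turns LIFO on. */
static int parse_lifo_flag(const char *value) {
    return (value != NULL && value[0] == '1' && value[1] == '\0') ? 1 : 0;
}

/* Reads the environment on first use, then returns the cached value. */
static int unified_lifo_enabled_sketch(void) {
    if (g_lifo_enabled < 0) {
        g_lifo_enabled = parse_lifo_flag(getenv("HAKMEM_TINY_UNIFIED_LIFO"));
    }
    return g_lifo_enabled;
}
```

Keeping the parser separate from the cached getter makes the default (unset means FIFO) easy to check without touching the process environment.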

L1 API (internal, static inline):

unified_cache_pop_lifo(int class_idx) -> BASE or NULL
unified_cache_push_lifo(int class_idx, BASE) -> 1/0

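Integration points (candidates):

tiny_hot_alloc_fast() / tiny_hot_free_fast()
- This is the real hot path, where the FIFO/LIFO difference can be measured most directly
or unified_cache_pop() / unified_cache_push()
- For when the change should take effect broadly without adding callers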

3. Implementation sketch (concept)

3.1 LIFO state

Rather than adding a top field to TinyUnifiedCache, v1 prioritizes compatibility:

Of the existing head/tail, treat tail as the "top" (stack depth)
head is always 0 (or unused)
mask is not needed (no wrap-around)

LIFO push:

if tail < capacity: slots[tail++] = base

LIFO pop:

if tail > 0: base = slots[--tail]

(LIFO cannot coexist with FIFO on the same state, so consistency across a mode switch is handled by placing a drain/reset at the boundary.)
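
The push/pop rules in 3.1 can be sketched in C as follows. `SketchCache` is a hypothetical stand-in for TinyUnifiedCache that holds only the fields the rules mention, and the functions take a cache pointer instead of the `class_idx` used by the L1 API, purely to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAPACITY 4  /* illustrative; the real capacity is not specified here */

/* Hypothetical stand-in for TinyUnifiedCache (assumed layout). */
typedef struct {
    void    *slots[SKETCH_CAPACITY];
    uint16_t head;      /* unused in LIFO mode, stays 0 */
    uint16_t tail;      /* reinterpreted as stack depth ("top") */
    uint16_t capacity;
} SketchCache;

/* LIFO push: returns 1 on success, 0 when the stack is full. */
static inline int sketch_push_lifo(SketchCache *c, void *base) {
    if (c->tail >= c->capacity) return 0;
    c->slots[c->tail++] = base;
    return 1;
}

/* LIFO pop: returns the most recently pushed block, or NULL when empty. */
static inline void *sketch_pop_lifo(SketchCache *c) {
    if (c->tail == 0) return NULL;
    return c->slots[--c->tail];
}
```

Neither operation touches head or a mask, which is where the hoped-for instruction-count saving over the FIFO ring would come from.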

3.2 Mode switch safety (Fail-Fast)

When switching LIFO ON, reset each class's head/tail after unified_cache_init() (treat the cache as empty)
For bench/profile runs the ENV is assumed to be fixed before init; if the research box is given a "refresh":
- forbid switching during refresh (or force a drain/reset)
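
One possible reading of the fail-fast rule above is sketched below: a shape switch is accepted only when the cache is empty (head == tail holds for an empty FIFO ring and for an empty LIFO stack), and the indices are reset on success. `ShapeState` and `shape_switch` are hypothetical names, and draining is assumed to be the caller's job.

```c
/* Hypothetical per-class state; only the fields relevant to switching. */
typedef enum { SHAPE_FIFO = 0, SHAPE_LIFO = 1 } CacheShape;

typedef struct {
    unsigned   head;
    unsigned   tail;
    CacheShape shape;
} ShapeState;

/* Returns 1 if the requested shape is in effect afterwards,
 * 0 if the switch is refused because blocks are still cached. */
static int shape_switch(ShapeState *s, CacheShape want) {
    if (s->shape == want) return 1;       /* already in the requested shape */
    if (s->head != s->tail) return 0;     /* fail fast: not drained */
    s->head = 0;                          /* reset: treat as empty */
    s->tail = 0;
    s->shape = want;
    return 1;
}
```

Refusing the switch instead of reinterpreting live indices keeps the FIFO and LIFO states from ever being mixed.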

4. A/B Plan (same binary)

Baseline:

HAKMEM_TINY_UNIFIED_LIFO=0 scripts/run_mixed_10_cleanenv.sh

Optimized:

HAKMEM_TINY_UNIFIED_LIFO=1 scripts/run_mixed_10_cleanenv.sh

Additionally, check whether locality makes a difference:

HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_UNIFIED_LIFO=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_UNIFIED_LIFO=1 scripts/run_mixed_10_cleanenv.sh

GO/NO-GO:

GO: Mixed mean +1.0% or better
NO-GO: mean -1.0% or worse
NEUTRAL: within ±1.0% (freeze)
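
The thresholds above amount to a small classification over the relative change in mean ops/s. The sketch below shows the arithmetic; the names `delta_percent` and `classify` are hypothetical, and the treatment of exactly ±1.0% as GO/NO-GO is an assumption drawn from the "or better" / "or worse" wording.

```c
typedef enum { VERDICT_NO_GO = -1, VERDICT_NEUTRAL = 0, VERDICT_GO = 1 } Verdict;

/* Relative change of the optimized mean versus the baseline mean, in percent. */
static double delta_percent(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

/* Applies the GO / NO-GO / NEUTRAL thresholds from the A/B plan. */
static Verdict classify(double baseline, double optimized) {
    double d = delta_percent(baseline, optimized);
    if (d >= 1.0)  return VERDICT_GO;
    if (d <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;   /* within ±1.0%: freeze */
}
```

Fed the Mixed figures from the commit message (52,965,966 → 52,593,948 ops/s), `delta_percent` gives roughly -0.70%, which lands in the NEUTRAL band.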

5. Risks

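The fixed cost of the mode branch cancels out the gain (a repeat of Phase 11/14)
→ Mitigation: decide once at the entry point and add no branches on the hot path (function pointer or snapshot)
Consistency across switches (FIFO state and LIFO state are incompatible)
→ Mitigation: forbid switching during refresh, or pin drain/reset to a single boundary
Dependence on capacity tuning
→ v1 changes only the shape first to confirm ROI; the capacity search is split out to v2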
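
The "decide once at the entry point" mitigation could take the form of a function pointer bound at init, as in the sketch below. Everything here (`DemoCache`, `demo_bind_shape`, the 8-slot ring) is hypothetical, and whether an indirect call is actually cheaper than a well-predicted branch would have to be measured.

```c
#include <stddef.h>

/* Hypothetical 8-slot cache; the FIFO ring masks its indices with 7. */
typedef struct {
    void    *slots[8];
    unsigned head;
    unsigned tail;
} DemoCache;

typedef void *(*demo_pop_fn)(DemoCache *);

/* FIFO pop: oldest entry first, empty when head == tail. */
static void *demo_pop_fifo(DemoCache *c) {
    if (c->head == c->tail) return NULL;
    return c->slots[c->head++ & 7u];
}

/* LIFO pop: newest entry first, tail acts as stack depth. */
static void *demo_pop_lifo(DemoCache *c) {
    if (c->tail == 0) return NULL;
    return c->slots[--c->tail];
}

/* Bound once (e.g. at init); the hot path calls g_pop without a mode check. */
static demo_pop_fn g_pop = demo_pop_fifo;

static void demo_bind_shape(int lifo_enabled) {
    g_pop = lifo_enabled ? demo_pop_lifo : demo_pop_fifo;
}
```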
