Phase 15 v1: UnifiedCache FIFO→LIFO NEUTRAL (-0.70% Mixed, +0.42% C7)

Transform existing array-based UnifiedCache from FIFO ring to LIFO stack.

A/B Results:
- Mixed (16-1024B): -0.70% (52,965,966 → 52,593,948 ops/s)
- C7-only (1025-2048B): +0.42% (78,010,783 → 78,335,509 ops/s)

Verdict: NEUTRAL (both below +1.0% GO threshold) - freeze as research box

Implementation:
- L0 ENV gate: tiny_unified_lifo_env_box.{h,c} (HAKMEM_TINY_UNIFIED_LIFO=0/1)
- L1 LIFO ops: tiny_unified_lifo_box.h (unified_cache_try_pop/push_lifo)
- L2 integration: tiny_front_hot_box.h (mode check at entry)
- Reuses existing slots[] array (no intrusive pointers)

Root Causes:
1. Mode check overhead (tiny_unified_lifo_enabled() call)
2. Minimal LIFO vs FIFO locality delta in practice
3. Existing FIFO ring already well-optimized

Bonus Fix: LTO bug for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
- Converted static inline to extern + non-inline implementation
- Fixes undefined reference during LTO linking

Design: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md
Results: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-15 02:19:26 +09:00
parent b7e01a9419
commit 87fa27518c
14 changed files with 712 additions and 17 deletions

@@ -268,6 +268,80 @@ Cumulative improvements achieved in Phase 6-10:
**Future Work**: Consider per-class cap tuning or alternative pointer-chase reduction strategies
### Phase 14 v2: Pointer Chase Reduction — Hot Path Integration — NEUTRAL (+0.08%) ⚠️ RESEARCH BOX
**Date**: 2025-12-15
**Verdict**: **NEUTRAL (+0.08% Mixed)** / **-0.39% (C7-only)**, kept as a research box (default OFF)
**Motivation**: Phase 14 v1 left open the suspicion that the alloc side was not consuming the tcache, so the tcache was wired into the hot alloc/free paths of `tiny_front_hot_box` and the A/B test was rerun.
**Results**:
| Workload | TCACHE=0 | TCACHE=1 | Delta |
|---------|----------|----------|-------|
| Mixed (16-1024B) | 51,287,515 | 51,330,213 | **+0.08%** |
| C7-only | 80,975,651 | 80,660,283 | **-0.39%** |
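**Conclusion**:
- v2 confirmed that the tcache path is actually exercised, but it did not improve the Mixed "mainline" workload (below the +1.0% GO threshold)
- Keeping Phase 14 (tcache-style intrusive LIFO) **frozen** remains appropriate for now
**Possible root causes** (if investigated further):
1. The fence and auxiliary work in `tiny_next_load/store` may be too heavy for a TLS-only tcache
2. The fixed cost (load/branch) of `tiny_tcache_enabled/cap` offsets the savings
3. In Mixed, the per-bin hit rate is thin (workload mismatch)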
**Refs**:
- v2 results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md`
- v2 instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
---
### Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) — NEUTRAL (-0.70% Mixed, +0.42% C7) ⚠️ RESEARCH BOX
**Date**: 2025-12-15
**Verdict**: **NEUTRAL (-0.70% Mixed, +0.42% C7-only)**, kept as a research box (default OFF)
**Motivation**: Because Phase 14 (intrusive tcache) was NEUTRAL, this phase adds no further intrusive structures. Instead it changes the existing `TinyUnifiedCache.slots[]` from a FIFO ring to a LIFO stack, aiming for better locality.
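
To make the FIFO-ring versus LIFO-stack distinction concrete, here is a minimal sketch of both disciplines over one shared pointer array. It is an illustration under assumptions, not the project's code: `SketchCache`, `SKETCH_CAP`, and the `sketch_*` functions are hypothetical names, and only the idea of reusing a `slots[]` array without intrusive pointers comes from this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP 8u  /* hypothetical capacity; a power of two so the ring can mask */

/* Hypothetical stand-in for TinyUnifiedCache; only slots[] is named in the source. */
typedef struct {
    void    *slots[SKETCH_CAP];
    uint32_t head;  /* FIFO: index of the oldest entry (unused by LIFO) */
    uint32_t tail;  /* FIFO: next write index; LIFO: current stack depth */
} SketchCache;

/* FIFO ring: blocks leave in the order they arrived.
 * One slot is kept empty so that full and empty are distinguishable. */
static int sketch_push_fifo(SketchCache *c, void *p) {
    assert(p != NULL);  /* NULL is reserved to mean "cache empty" on pop */
    uint32_t next = (c->tail + 1u) & (SKETCH_CAP - 1u);
    if (next == c->head) return 0;  /* full */
    c->slots[c->tail] = p;
    c->tail = next;
    return 1;
}

static void *sketch_pop_fifo(SketchCache *c) {
    if (c->head == c->tail) return NULL;  /* empty */
    void *p = c->slots[c->head];
    c->head = (c->head + 1u) & (SKETCH_CAP - 1u);
    return p;
}

/* LIFO stack: the most recently freed block is handed out first,
 * which is the locality effect this phase was hoping for. */
static int sketch_push_lifo(SketchCache *c, void *p) {
    assert(p != NULL);
    if (c->tail >= SKETCH_CAP) return 0;  /* full */
    c->slots[c->tail++] = p;
    return 1;
}

static void *sketch_pop_lifo(SketchCache *c) {
    if (c->tail == 0) return NULL;  /* empty */
    return c->slots[--c->tail];
}
```

In this sketch the LIFO variant touches a single index and needs no wrap-around mask, while the FIFO variant updates `head` and `tail` separately. Whether that difference is measurable depends on the workload, which is what the A/B test checks.
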
**Results**:
| Workload | LIFO=0 (FIFO) | LIFO=1 (LIFO) | Delta |
|---------|----------|----------|-------|
| Mixed (16-1024B) | 52,965,966 | 52,593,948 | **-0.70%** |
| C7-only (1025-2048B) | 78,010,783 | 78,335,509 | **+0.42%** |
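
The Delta column is the relative throughput change against the FIFO baseline, and the verdict compares it with the +1.0% GO threshold. A minimal sketch of that arithmetic follows. The function names are hypothetical, and treating a gain of exactly +1.0% as GO is an assumption, since the document only states that both results fell below the threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative change in percent: 100 * (candidate - baseline) / baseline. */
static double sketch_delta_percent(double baseline_ops, double candidate_ops) {
    assert(baseline_ops > 0.0);
    return 100.0 * (candidate_ops - baseline_ops) / baseline_ops;
}

/* GO only when the gain reaches the +1.0% threshold quoted in this document.
 * Assumption: the boundary itself counts as GO. */
static bool sketch_is_go(double delta_percent) {
    return delta_percent >= 1.0;
}
```

Applied to the rows above, `sketch_delta_percent(52965966.0, 52593948.0)` is about -0.70 and `sketch_delta_percent(78010783.0, 78335509.0)` is about +0.42, so neither workload passes `sketch_is_go`.
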
**Conclusion**:
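- Switching to LIFO did not deliver the expected benefit (Mixed regressed, C7 improved slightly, and neither reached the GO threshold)
- The mode-check branch overhead (`tiny_unified_lifo_enabled()`) offsets the locality gain
- The existing FIFO ring implementation is already well optimized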
**Root causes**:
1. Entry-point mode check overhead (`tiny_unified_lifo_enabled()` call)
2. Minimal LIFO vs FIFO locality delta in practice (cache warming mitigates)
3. Existing FIFO ring already well-optimized
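
Root cause 1 concerns the per-call cost of the mode gate. An ENV gate of this kind is commonly written as a lazily cached lookup, so that after the first call the hot path pays one load and one branch. The sketch below assumes that pattern. `sketch_lifo_enabled`, `sketch_lifo_env_init`, and `g_sketch_lifo_mode` are hypothetical names, only the `HAKMEM_TINY_UNIFIED_LIFO` variable comes from this document, and the real `tiny_unified_lifo_enabled()` may be implemented differently.

```c
#include <stdlib.h>

/* -1 = not read yet, 0 = FIFO (default), 1 = LIFO.
 * Plain int for brevity: this sketch does not address thread safety. */
static int g_sketch_lifo_mode = -1;

/* Slow path, taken once: read the environment and cache the answer. */
static int sketch_lifo_env_init(void) {
    const char *v = getenv("HAKMEM_TINY_UNIFIED_LIFO");
    g_sketch_lifo_mode = (v != NULL && v[0] == '1') ? 1 : 0;
    return g_sketch_lifo_mode;
}

/* Hot path: one load and one branch once the mode has been cached. */
static inline int sketch_lifo_enabled(void) {
    int mode = g_sketch_lifo_mode;
    return (mode >= 0) ? mode : sketch_lifo_env_init();
}
```

Even a gate this small runs on every alloc and free, so its cost can be on the same order as the sub-1% locality gain being measured.
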
**Bonus**: LTO bug fix for `tiny_c7_preserve_header_enabled()` (Phase 13/14 latent issue)
**Refs**:
- A/B results: `docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md`
- Design: `docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md`
- Instructions: `docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_NEXT_INSTRUCTIONS.md`
---
### Phase 14-15 Summary: Pointer-Chase & Cache-Shape Research ⚠️
**Conclusion**: Both phases are NEUTRAL (frozen as research boxes)
| Phase | Approach | Mixed Delta | C7 Delta | Verdict |
|-------|----------|-------------|----------|---------|
| 14 v1 | tcache (free-side only) | +0.20% | N/A | NEUTRAL |
| 14 v2 | tcache (alloc+free) | +0.08% | -0.39% | NEUTRAL |
| 15 v1 | FIFO→LIFO (array cache) | -0.70% | +0.42% | NEUTRAL |
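**Lessons learned**:
- Neither pointer-chase reduction nor a cache-shape change yields a meaningful improvement over the current TLS array cache
- Closing the remaining gap to mimalloc (about 2.4x) requires a fundamentally different approach
## Update memo (2025-12-14 Phase 5 E5-3 Analysis - Strategic Pivot)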
### Phase 5 E5-3: Candidate Analysis & Strategic Recommendations ⚠️ DEFER (2025-12-14)