# Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) A/B Test Results

**Date:** 2025-12-15

**Benchmark:** Mixed (16–1024B) + C7-only (1025–2048B), 10-run cleanenv

**Target:** Transform the existing UnifiedCache from a FIFO ring into a LIFO stack

**Expected ROI:** +5–10% (design estimate, from improved cache locality)

**GO Threshold:** +1.0% mean improvement

---

## 1. Implementation Summary

Phase 15 v1 converts the existing array-based UnifiedCache from a FIFO (ring buffer) layout to a LIFO (stack) layout.

**Key Changes:**
- **Patch 1**: L0 ENV gate box (`tiny_unified_lifo_env_box.{h,c}`)
- **Patch 2**: L1 LIFO operations (`tiny_unified_lifo_box.h`)
- **Patch 3**: Hot path integration (`tiny_front_hot_box.h`, both alloc and free)
- **Patch 4**: Makefile updates (new `.o` files added)
- **Patch 5**: `bench_profile.h` refresh sync

**Design:**
- Reuses the existing `TinyUnifiedCache.slots[]` array (no intrusive pointers)
- `tail` is treated as the stack top (i.e. the depth); `head` is unused (always 0)
- The mode is checked at function entry (once per call)
- No wrap-around (`mask` is unused in LIFO mode)

**ENV Control:**
```bash
export HAKMEM_TINY_UNIFIED_LIFO=0  # Baseline (FIFO)
export HAKMEM_TINY_UNIFIED_LIFO=1  # Optimized (LIFO)
```

**Bonus Fix:**
- Discovered and fixed a pre-existing LTO linkage bug in `tiny_c7_preserve_header_enabled()` (a latent issue from Phase 13/14)
- Converted the `static inline` definition into an `extern` declaration plus a single non-inline implementation

---

## 2. A/B Test Results

### Mixed (16–1024B):
- **Baseline (LIFO=0):** 52,965,966 ops/s
- **Optimized (LIFO=1):** 52,593,948 ops/s
- **Delta:** **-0.70%** (regression)

### C7-only (1025–2048B):
- **Baseline (LIFO=0):** 78,010,783 ops/s
- **Optimized (LIFO=1):** 78,335,509 ops/s
- **Delta:** **+0.42%** (slight improvement)

---

## 3. Verdict: NEUTRAL

**Result:** Mixed -0.70%, C7-only +0.42% (both below the +1.0% GO threshold)

**Comparison to Phase 14:**
- Phase 14 v1 (tcache, free side only): Mixed +0.20% (NEUTRAL)
- Phase 14 v2 (tcache, alloc + free): Mixed +0.08%, C7-only -0.39% (NEUTRAL)
- Phase 15 v1 (FIFO→LIFO): Mixed -0.70%, C7-only +0.42% (NEUTRAL)

**Root Cause:**
1. **Mode check overhead**: the `tiny_unified_lifo_enabled()` call at every entry point adds a branch to the hot path
2. **Minimal locality delta**: in practice, the temporal-locality difference between LIFO and FIFO is small
3. **Existing optimization**: the FIFO ring implementation is already well optimized
4. **Cache warming**: TLS cache pre-warming reduces sensitivity to locality

---

## 4. Recommendation: Freeze as Research Box

**Decision:** Freeze Phase 15 v1 as a research box (`HAKMEM_TINY_UNIFIED_LIFO=0` by default, i.e. OFF)

**Rationale:**
- Neither LIFO nor FIFO shows a significant advantage
- Mode-switching overhead outweighs the potential locality gains
- The existing FIFO ring is simple and already fast

**Next:** Explore alternative approaches:
- Hybrid strategies (per-class mode selection)
- Batch operations (to reduce per-call overhead)
- Hardware prefetch hints (explicit locality control)
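The design bullets in Section 1 can be made concrete with a minimal C sketch. It shows one slot array driven either as a FIFO ring or as a LIFO stack, with the mode read from the ENV variable once. Everything in it is hypothetical. `demo_cache_t` is a simplified stand-in for `TinyUnifiedCache`; the real layout, capacity, and full/empty handling are not shown in this report. `demo_lifo_enabled()` stands in for `tiny_unified_lifo_enabled()`, and treating a leading `'1'` as "LIFO on" is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical capacity; a power of two so the FIFO ring can wrap with a mask. */
#define DEMO_CACHE_CAP 8u

/* Hypothetical, simplified stand-in for TinyUnifiedCache (not the real layout). */
typedef struct {
    void    *slots[DEMO_CACHE_CAP];
    uint32_t head; /* FIFO: free-running pop counter; LIFO: unused, stays 0 */
    uint32_t tail; /* FIFO: free-running push counter; LIFO: stack top (= depth) */
    uint32_t mask; /* FIFO: DEMO_CACHE_CAP - 1; LIFO: unused (no wrap-around) */
} demo_cache_t;

static void demo_cache_init(demo_cache_t *c) {
    *c = (demo_cache_t){ .head = 0, .tail = 0, .mask = DEMO_CACHE_CAP - 1u };
}

/* ENV gate sketch: read HAKMEM_TINY_UNIFIED_LIFO once, then reuse the cached value.
 * -1 = not read yet, 0 = FIFO (default), 1 = LIFO. */
static _Atomic int g_demo_lifo_mode = -1;

static bool demo_lifo_enabled(void) {
    int mode = atomic_load_explicit(&g_demo_lifo_mode, memory_order_relaxed);
    if (mode < 0) {
        const char *env = getenv("HAKMEM_TINY_UNIFIED_LIFO");
        mode = (env != NULL && env[0] == '1') ? 1 : 0;
        atomic_store_explicit(&g_demo_lifo_mode, mode, memory_order_relaxed);
    }
    return mode == 1;
}

/* Free path: store a block in the cache; returns false if the cache is full. */
static bool demo_cache_push(demo_cache_t *c, void *p) {
    if (demo_lifo_enabled()) {                    /* mode check once per call */
        if (c->tail >= DEMO_CACHE_CAP) return false;
        c->slots[c->tail++] = p;                  /* push onto the stack top */
        return true;
    }
    if (c->tail - c->head >= DEMO_CACHE_CAP) return false; /* ring full */
    c->slots[c->tail & c->mask] = p;              /* enqueue at the tail */
    c->tail++;
    return true;
}

/* Alloc path: take a block from the cache; returns NULL if the cache is empty. */
static void *demo_cache_pop(demo_cache_t *c) {
    if (demo_lifo_enabled()) {
        if (c->tail == 0) return NULL;
        return c->slots[--c->tail];               /* most recently freed block first */
    }
    if (c->head == c->tail) return NULL;          /* ring empty */
    void *p = c->slots[c->head & c->mask];        /* oldest block first */
    c->head++;
    return p;
}
```

In LIFO mode the most recently freed block is handed out first, while the FIFO ring returns the oldest one. The `demo_lifo_enabled()` test at the top of each push and pop corresponds to the per-call mode check listed as the first root cause.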
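The deltas in Section 2 and the GO decision in Section 3 are plain relative changes against the baseline. The small sketch below reproduces them from the reported throughput figures. The helper names are hypothetical, and treating the +1.0% threshold as inclusive (`>=`) is an assumption, since the report does not say.

```c
#include <stdbool.h>

/* Relative change of the optimized run against the baseline, in percent. */
static double demo_delta_pct(double baseline_ops_per_s, double optimized_ops_per_s) {
    return (optimized_ops_per_s - baseline_ops_per_s) / baseline_ops_per_s * 100.0;
}

/* GO only if the improvement reaches +1.0% (inclusive boundary is an assumption). */
static bool demo_meets_go_threshold(double delta_pct) {
    return delta_pct >= 1.0;
}
```

With the reported figures, `demo_delta_pct(52965966.0, 52593948.0)` gives roughly -0.70 and `demo_delta_pct(78010783.0, 78335509.0)` roughly +0.42, so neither workload meets the GO threshold.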