# Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) A/B Test Results

**Date:** 2025-12-15
**Benchmark:** Mixed (16–1024B) + C7-only (1025–2048B) 10-run cleanenv
**Target:** Transform existing UnifiedCache from FIFO ring to LIFO stack
**Expected ROI:** +5-10% (design estimate, cache locality improvement)
**GO Threshold:** +1.0% mean improvement

---

## 1. Implementation Summary

Phase 15 v1 transforms the existing array-based UnifiedCache from FIFO (ring buffer) to LIFO (stack) layout.

**Key Changes:**
- **Patch 1**: L0 ENV gate box (`tiny_unified_lifo_env_box.{h,c}`)
- **Patch 2**: L1 LIFO operations (`tiny_unified_lifo_box.h`)
- **Patch 3**: Hot path integration (`tiny_front_hot_box.h`, both alloc and free paths)
- **Patch 4**: Makefile updates (added `.o` files)
- **Patch 5**: `bench_profile.h` refresh sync

**Design:**
- Reuses existing `TinyUnifiedCache.slots[]` array (no intrusive pointers)
- `tail` treated as stack top (depth), `head` unused (always 0)
- Mode check at function entry (once per call)
- No wrap-around (`mask` unused in LIFO mode)
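
The stack discipline described above can be sketched as follows. This is a minimal illustration, not the code in `tiny_unified_lifo_box.h`: the struct `SketchCache`, its field types, the `capacity` field, and the function names `sketch_lifo_push` / `sketch_lifo_pop` are all hypothetical. Only the roles of `slots[]`, `head`, and `tail` are taken from the design notes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinyUnifiedCache. Only the fields named in the
 * design notes are modeled; the real layout and field widths may differ. */
typedef struct {
    void   **slots;    /* backing array, shared with the FIFO ring */
    uint16_t head;     /* unused in LIFO mode (stays 0) */
    uint16_t tail;     /* LIFO mode: stack depth = index of the next free slot */
    uint16_t capacity; /* number of entries in slots[] (assumed field) */
} SketchCache;

/* Push a freed block. Returns 0 when the stack is full so the caller can
 * fall back to a slower path. No mask or wrap-around is involved. */
static inline int sketch_lifo_push(SketchCache *c, void *p) {
    if (c->tail >= c->capacity) return 0;
    c->slots[c->tail++] = p;
    return 1;
}

/* Pop the most recently pushed block. Returns NULL when the stack is empty. */
static inline void *sketch_lifo_pop(SketchCache *c) {
    if (c->tail == 0) return NULL;
    return c->slots[--c->tail];
}
```

Because a pop returns the block that was freed last, the next allocation reuses the memory most likely to still be in cache, which is the locality effect the design estimate relied on.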

**ENV Control:**
```bash
export HAKMEM_TINY_UNIFIED_LIFO=0  # Baseline (FIFO)
export HAKMEM_TINY_UNIFIED_LIFO=1  # Optimized (LIFO)
```
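
A cached ENV gate of this kind might look like the sketch below. It is an assumption about the general shape, not the contents of `tiny_unified_lifo_env_box.{h,c}`: the names `sketch_lifo_mode`, `sketch_parse_lifo_env`, `sketch_lifo_enabled`, and `sketch_lifo_env_refresh` are hypothetical, as is the rule that only a value starting with `1` enables LIFO. The sketch also ignores thread safety.

```c
#include <stdlib.h>

/* Hypothetical cached gate state: -1 = not read yet, 0 = FIFO, 1 = LIFO. */
static int sketch_lifo_mode = -1;

/* Unset, empty, or any value not starting with '1' keeps the FIFO baseline
 * (assumed parsing rule). */
static inline int sketch_parse_lifo_env(const char *v) {
    return (v != NULL && v[0] == '1') ? 1 : 0;
}

/* Hot-path query: reads the environment on first use only, then returns
 * the cached value. */
static inline int sketch_lifo_enabled(void) {
    if (sketch_lifo_mode < 0)
        sketch_lifo_mode = sketch_parse_lifo_env(getenv("HAKMEM_TINY_UNIFIED_LIFO"));
    return sketch_lifo_mode;
}

/* Drop the cached value so the next query re-reads the environment,
 * e.g. after a benchmark profile changes it. */
static inline void sketch_lifo_env_refresh(void) {
    sketch_lifo_mode = -1;
}
```

Even with the value cached, each call still loads and tests the flag at function entry.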

**Bonus Fix:**
- Discovered and fixed pre-existing LTO linkage bug for `tiny_c7_preserve_header_enabled()` (Phase 13/14 latent issue)
- Converted static inline to extern declaration + non-inline implementation

---

## 2. A/B Test Results

### Mixed (16–1024B):
- **Baseline (LIFO=0):** 52,965,966 ops/s
- **Optimized (LIFO=1):** 52,593,948 ops/s
- **Delta:** **-0.70%** (regression)

### C7-only (1025–2048B):
- **Baseline (LIFO=0):** 78,010,783 ops/s
- **Optimized (LIFO=1):** 78,335,509 ops/s
- **Delta:** **+0.42%** (slight improvement)
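
The deltas above can be reproduced from the reported throughput numbers with the short sketch below. The helper names `delta_percent` and `meets_go_threshold` are illustrative only, and reading the GO threshold as "delta of at least +1.0%" is an assumption.

```c
/* Relative throughput change in percent; positive means the optimized
 * (LIFO=1) run is faster than the baseline (LIFO=0). */
static inline double delta_percent(double baseline_ops, double optimized_ops) {
    return (optimized_ops - baseline_ops) / baseline_ops * 100.0;
}

/* GO check against the +1.0% mean-improvement threshold
 * (assumed to be inclusive). */
static inline int meets_go_threshold(double delta_pct) {
    return delta_pct >= 1.0;
}

/* Mixed:   delta_percent(52965966.0, 52593948.0) is about -0.70
 * C7-only: delta_percent(78010783.0, 78335509.0) is about +0.42
 * Neither value passes meets_go_threshold(). */
```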

---

## 3. Verdict: NEUTRAL

**Result:** Mixed -0.70%, C7-only +0.42% (both below the +1.0% GO threshold)

**Comparison to Phase 14:**
- Phase 14 v1 (tcache free-side only): Mixed +0.20% (NEUTRAL)
- Phase 14 v2 (tcache alloc+free): Mixed +0.08%, C7-only -0.39% (NEUTRAL)
- Phase 15 v1 (FIFO→LIFO): Mixed -0.70%, C7-only +0.42% (NEUTRAL)

**Likely Root Causes:**
1. **Mode check overhead**: The `tiny_unified_lifo_enabled()` check at function entry adds a branch to every alloc and free call
2. **Minimal locality delta**: The temporal locality difference between LIFO and FIFO is small in practice
3. **Existing optimization**: The FIFO ring implementation is already well optimized
4. **Cache warming**: TLS cache pre-warming reduces sensitivity to locality

---

## 4. Recommendation: Freeze as Research Box

**Decision:** Freeze Phase 15 v1 as a research box (default `HAKMEM_TINY_UNIFIED_LIFO=0`, i.e. LIFO OFF)

**Rationale:**
- Neither mode reaches the +1.0% GO threshold on either workload (Mixed -0.70%, C7-only +0.42%)
- On Mixed, the mode-check overhead appears to outweigh any locality gain
- Existing FIFO ring is simple and already fast

**Next:** Explore alternative approaches:
- Hybrid strategies (per-class mode selection)
- Batch operations (reduce per-call overhead)
- Hardware prefetch hints (explicit locality control)