# Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) A/B Test Results
**Date:** 2025-12-15
**Benchmark:** Mixed (16-1024B) + C7-only (1025-2048B), 10-run cleanenv
**Target:** Transform existing UnifiedCache from FIFO ring to LIFO stack
**Expected ROI:** +5-10% (design estimate, cache locality improvement)
**GO Threshold:** +1.0% mean improvement
---
## 1. Implementation Summary
Phase 15 v1 transforms the existing array-based UnifiedCache from FIFO (ring buffer) to LIFO (stack) layout.
**Key Changes:**
- **Patch 1**: L0 ENV gate box (`tiny_unified_lifo_env_box.{h,c}`)
- **Patch 2**: L1 LIFO operations (`tiny_unified_lifo_box.h`)
- **Patch 3**: Hot path integration (`tiny_front_hot_box.h` - alloc/free both)
- **Patch 4**: Makefile updates (added `.o` files)
- **Patch 5**: bench_profile.h refresh sync
**Design:**
- Reuses existing `TinyUnifiedCache.slots[]` array (no intrusive pointers)
- `tail` treated as stack top (depth), `head` unused (always 0)
- Mode check at function entry (once per call)
- No wrap-around (`mask` unused in LIFO mode)
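
The layout described above can be sketched as follows. This is a minimal illustration, not the actual `tiny_unified_lifo_box.h` code: the struct `SketchUnifiedCache`, its capacity, and the `sketch_*` function names are hypothetical stand-ins, and only the fields named in this document (`slots[]`, `head`, `tail`, `mask`) are modeled. Pushing two pointers and popping twice returns them in reverse order, and `head` stays 0 throughout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for TinyUnifiedCache. */
#define SKETCH_CAPACITY 8u

typedef struct {
    void    *slots[SKETCH_CAPACITY]; /* plain array, no intrusive pointers */
    uint16_t head;                   /* unused in LIFO mode, stays 0 */
    uint16_t tail;                   /* LIFO mode: stack depth (next free index) */
    uint16_t capacity;
    uint16_t mask;                   /* FIFO wrap-around only; ignored in LIFO mode */
} SketchUnifiedCache;

static inline void sketch_cache_init(SketchUnifiedCache *c) {
    c->head = 0;
    c->tail = 0;
    c->capacity = SKETCH_CAPACITY;
    c->mask = SKETCH_CAPACITY - 1u;
}

/* Free path: push a block on top of the stack. Returns 1 on success, 0 if full. */
static inline int sketch_lifo_push(SketchUnifiedCache *c, void *p) {
    if (c->tail >= c->capacity) {
        return 0;                    /* full: caller falls back to a slower path */
    }
    c->slots[c->tail++] = p;         /* no masking, no wrap-around */
    return 1;
}

/* Alloc path: pop the most recently freed block. Returns NULL when empty. */
static inline void *sketch_lifo_pop(SketchUnifiedCache *c) {
    if (c->tail == 0) {
        return NULL;                 /* empty: caller refills from a slower path */
    }
    return c->slots[--c->tail];
}
```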
**ENV Control:**
```bash
export HAKMEM_TINY_UNIFIED_LIFO=0 # Baseline (FIFO)
export HAKMEM_TINY_UNIFIED_LIFO=1 # Optimized (LIFO)
```
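
One plausible shape for such an ENV gate is a lazily cached `getenv` lookup, sketched below. The `sketch_*` names, the "only the exact string `1` enables" parsing rule, and the refresh hook are assumptions for illustration; the real `tiny_unified_lifo_env_box.{h,c}` may differ. The sketch is also not thread-safe as written.

```c
#include <stdlib.h>

/* Cached mode: -1 = not read yet, 0 = FIFO (baseline), 1 = LIFO. */
static int sketch_lifo_mode = -1;

/* Assumed parsing rule: only the exact string "1" enables LIFO. */
static int sketch_parse_flag(const char *v) {
    return (v != NULL && v[0] == '1' && v[1] == '\0') ? 1 : 0;
}

/* Reads the environment once, then serves the cached value on later calls. */
static int sketch_unified_lifo_enabled(void) {
    if (sketch_lifo_mode < 0) {
        sketch_lifo_mode = sketch_parse_flag(getenv("HAKMEM_TINY_UNIFIED_LIFO"));
    }
    return sketch_lifo_mode;
}

/* Hypothetical refresh hook: forces a re-read after a benchmark profile
 * changes the environment. */
static void sketch_unified_lifo_refresh(void) {
    sketch_lifo_mode = -1;
}
```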
**Bonus Fix:**
- Discovered and fixed pre-existing LTO linkage bug for `tiny_c7_preserve_header_enabled()` (Phase 13/14 latent issue)
- Converted the `static inline` definition to an `extern` declaration plus a non-inline implementation
---
## 2. A/B Test Results
### Mixed (16-1024B):
- **Baseline (LIFO=0):** 52,965,966 ops/s
- **Optimized (LIFO=1):** 52,593,948 ops/s
- **Delta:** **-0.70%** (regression)
### C7-only (1025-2048B):
- **Baseline (LIFO=0):** 78,010,783 ops/s
- **Optimized (LIFO=1):** 78,335,509 ops/s
- **Delta:** **+0.42%** (slight improvement)
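
The deltas above are relative changes against the baseline, and each is compared against the +1.0% GO threshold. The helper names below are hypothetical; the sketch only reproduces the arithmetic. Plugging in the Mixed and C7-only throughput figures gives roughly -0.70% and +0.42%, and neither clears the threshold.

```c
/* Relative change of `optimized` versus `baseline`, in percent. */
static double sketch_delta_percent(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

/* GO when the mean improvement reaches the threshold (here +1.0%). */
static int sketch_is_go(double delta_pct, double threshold_pct) {
    return delta_pct >= threshold_pct;
}
```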
---
## 3. Verdict: NEUTRAL
**Result:** Mixed -0.70%, C7-only +0.42% (both below the +1.0% GO threshold)
**Comparison to Phase 14:**
- Phase 14 v1 (tcache free-side only): Mixed +0.20% (NEUTRAL)
- Phase 14 v2 (tcache alloc+free): Mixed +0.08%, C7-only -0.39% (NEUTRAL)
- Phase 15 v1 (FIFO→LIFO): Mixed -0.70%, C7-only +0.42% (NEUTRAL)
**Root Cause:**
1. **Mode check overhead**: The entry-point `tiny_unified_lifo_enabled()` call adds a branch to every alloc and free
2. **Minimal locality delta**: LIFO vs FIFO temporal locality difference is small in practice
3. **Existing optimization**: FIFO ring implementation already well-optimized
4. **Cache warming**: TLS cache pre-warming reduces locality sensitivity
---
## 4. Recommendation: Freeze as Research Box
**Decision:** Freeze Phase 15 v1 as a research box, with `HAKMEM_TINY_UNIFIED_LIFO=0` (OFF) as the default
**Rationale:**
- Neither LIFO nor FIFO shows a significant advantage: both deltas fall below the +1.0% GO threshold
- Mode switching overhead outweighs potential locality gains
- Existing FIFO ring is simple and already fast
**Next:** Explore alternative approaches:
- Hybrid strategies (per-class mode selection)
- Batch operations (reduce per-call overhead)
- Hardware prefetch hints (explicit locality control)
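
As an illustration of the first option, per-class mode selection could be expressed as a bitmask with one bit per size class. This is a hypothetical sketch, not an existing hakmem interface: the `sketch_*` names and the assumption of eight classes (C0 to C7) are illustrative only.

```c
#include <stdint.h>

#define SKETCH_NUM_CLASSES 8u /* assumed class count, C0..C7 */

/* Bit i set => class i uses LIFO; bit clear => class i keeps the FIFO ring.
 * Out-of-range class indices are treated as FIFO. */
static inline int sketch_class_uses_lifo(uint32_t lifo_mask, unsigned class_idx) {
    return class_idx < SKETCH_NUM_CLASSES && ((lifo_mask >> class_idx) & 1u);
}
```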