Transform existing array-based UnifiedCache from FIFO ring to LIFO stack.
A/B Results:
- Mixed (16-1024B): -0.70% (52,965,966 → 52,593,948 ops/s)
- C7-only (1025-2048B): +0.42% (78,010,783 → 78,335,509 ops/s)
Verdict: NEUTRAL (both below +1.0% GO threshold) - freeze as research box
Implementation:
- L0 ENV gate: tiny_unified_lifo_env_box.{h,c} (HAKMEM_TINY_UNIFIED_LIFO=0/1)
- L1 LIFO ops: tiny_unified_lifo_box.h (unified_cache_try_pop/push_lifo)
- L2 integration: tiny_front_hot_box.h (mode check at entry)
- Reuses existing slots[] array (no intrusive pointers)
Root Causes:
1. Mode check overhead (tiny_unified_lifo_enabled() call)
2. Minimal LIFO vs FIFO locality delta in practice
3. Existing FIFO ring already well-optimized
Bonus Fix: LTO bug for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
- Converted static inline to extern + non-inline implementation
- Fixes undefined reference during LTO linking
Design: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md
Results: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) A/B Test Results
Date: 2025-12-15
Benchmark: Mixed (16–1024B) + C7-only (1025–2048B), 10-run cleanenv
Target: Transform existing UnifiedCache from FIFO ring to LIFO stack
Expected ROI: +5-10% (design estimate, cache locality improvement)
GO Threshold: +1.0% mean improvement
1. Implementation Summary
Phase 15 v1 transforms the existing array-based UnifiedCache from FIFO (ring buffer) to LIFO (stack) layout.
Key Changes:
- Patch 1: L0 ENV gate box (tiny_unified_lifo_env_box.{h,c})
- Patch 2: L1 LIFO operations (tiny_unified_lifo_box.h)
- Patch 3: Hot path integration (tiny_front_hot_box.h, both alloc and free)
- Patch 4: Makefile updates (added .o files)
- Patch 5: bench_profile.h refresh sync
Design:
- Reuses existing TinyUnifiedCache.slots[] array (no intrusive pointers)
- tail treated as stack top (depth); head unused (always 0)
- Mode check at function entry (once per call)
- No wrap-around (mask unused in LIFO mode)
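The two layouts can be contrasted with a minimal sketch over one shared slot array. This is not the project's code. SketchUnifiedCache, SKETCH_CAP and the four sketch_* helpers are hypothetical stand-ins for TinyUnifiedCache and unified_cache_try_pop/push_lifo, and the field types and capacity are assumptions. This FIFO ring also keeps one slot empty to tell full from empty, which the real ring may not do. The sketch shows the design points listed above: the LIFO path uses tail as the stack depth, leaves head at 0, and never touches mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinyUnifiedCache. Only slots[], head, tail and
 * mask are named in the design notes; the capacity and types are assumed. */
#define SKETCH_CAP 8u /* power of two, so mask = SKETCH_CAP - 1 */

typedef struct {
    void    *slots[SKETCH_CAP];
    uint32_t head; /* FIFO: next slot to pop; LIFO: unused, always 0 */
    uint32_t tail; /* FIFO: next slot to fill; LIFO: stack depth (top) */
    uint32_t mask; /* FIFO only: SKETCH_CAP - 1 */
} SketchUnifiedCache;

/* Baseline FIFO ring (LIFO=0): indices wrap via mask.
 * One slot stays empty so that head == tail means "empty". */
static int sketch_fifo_push(SketchUnifiedCache *c, void *p) {
    uint32_t next = (c->tail + 1) & c->mask;
    if (next == c->head) return 0; /* ring full */
    c->slots[c->tail] = p;
    c->tail = next;
    return 1;
}

static void *sketch_fifo_pop(SketchUnifiedCache *c) {
    if (c->head == c->tail) return NULL; /* ring empty */
    void *p = c->slots[c->head];
    c->head = (c->head + 1) & c->mask;
    return p; /* oldest cached block */
}

/* LIFO stack (LIFO=1): tail is the depth, no wrap-around, mask unused. */
static int sketch_lifo_push(SketchUnifiedCache *c, void *p) {
    assert(c->head == 0); /* head is never used in LIFO mode */
    if (c->tail >= SKETCH_CAP) return 0; /* stack full */
    c->slots[c->tail++] = p;
    return 1;
}

static void *sketch_lifo_pop(SketchUnifiedCache *c) {
    assert(c->head == 0);
    if (c->tail == 0) return NULL; /* stack empty */
    return c->slots[--c->tail]; /* most recently pushed block */
}
```

The LIFO pop returns the most recently freed block, which is more likely to still be cache-resident. The A/B results in Section 2 suggest this locality difference was small in practice for these workloads.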
ENV Control:
export HAKMEM_TINY_UNIFIED_LIFO=0 # Baseline (FIFO)
export HAKMEM_TINY_UNIFIED_LIFO=1 # Optimized (LIFO)
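A gate like tiny_unified_lifo_enabled() can read the variable once and cache the result. The sketch below assumes that shape. sketch_unified_lifo_enabled() and its relaxed-atomic cache are hypothetical, because the real function body is not shown in this report. Only the variable name HAKMEM_TINY_UNIFIED_LIFO and its 0/1 values come from this document.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical L0 gate in the spirit of tiny_unified_lifo_enabled().
 * Cached mode: -1 = not read yet, 0 = FIFO (default), 1 = LIFO. */
static atomic_int g_sketch_lifo_mode = -1;

static int sketch_unified_lifo_enabled(void) {
    int mode = atomic_load_explicit(&g_sketch_lifo_mode, memory_order_relaxed);
    if (mode < 0) {
        /* First call: read the ENV once. Concurrent first callers may each
         * read it, but they normally store the same value. */
        const char *e = getenv("HAKMEM_TINY_UNIFIED_LIFO");
        mode = (e != NULL && e[0] == '1') ? 1 : 0; /* default OFF (FIFO) */
        atomic_store_explicit(&g_sketch_lifo_mode, mode, memory_order_relaxed);
    }
    assert(mode == 0 || mode == 1);
    return mode;
}
```

Caching keeps getenv() off the hot path. Each alloc/free entry still pays a load and a branch on the cached mode, which is the mode-check overhead cited under Root Cause.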
Bonus Fix:
- Discovered and fixed pre-existing LTO linkage bug for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
- Converted static inline to extern declaration + non-inline implementation
2. A/B Test Results
Mixed (16–1024B):
- Baseline (LIFO=0): 52,965,966 ops/s
- Optimized (LIFO=1): 52,593,948 ops/s
- Delta: -0.70% (regression)
C7-only (1025–2048B):
- Baseline (LIFO=0): 78,010,783 ops/s
- Optimized (LIFO=1): 78,335,509 ops/s
- Delta: +0.42% (slight improvement)
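Each delta above is the relative change of the optimized mean against the baseline mean. The following short C helper, written for this note and not taken from the benchmark harness, reproduces both deltas and applies the +1.0% GO threshold.

```c
#include <assert.h>
#include <stdio.h>

#define GO_THRESHOLD_PCT 1.0 /* GO threshold stated at the top of this report */

/* Relative change of the optimized mean vs. the baseline mean, in percent. */
static double delta_pct(double baseline_ops, double optimized_ops) {
    assert(baseline_ops > 0.0);
    return (optimized_ops - baseline_ops) / baseline_ops * 100.0;
}

/* A variant is GO only if its mean improvement reaches the threshold. */
static int is_go(double delta) {
    return delta >= GO_THRESHOLD_PCT;
}

static void report(const char *name, double baseline_ops, double optimized_ops) {
    double d = delta_pct(baseline_ops, optimized_ops);
    printf("%s: %+.2f%% (%s)\n", name, d, is_go(d) ? "GO" : "below GO threshold");
}
```

report("Mixed", 52965966.0, 52593948.0) prints -0.70%, and report("C7-only", 78010783.0, 78335509.0) prints +0.42%. Neither reaches the threshold, which matches the NEUTRAL verdict.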
3. Verdict: NEUTRAL
Result: Mixed -0.70%, C7-only +0.42% (both below GO threshold)
Comparison to Phase 14:
- Phase 14 v1 (tcache free-side only): Mixed +0.20% (NEUTRAL)
- Phase 14 v2 (tcache alloc+free): Mixed +0.08%, C7-only -0.39% (NEUTRAL)
- Phase 15 v1 (FIFO→LIFO): Mixed -0.70%, C7-only +0.42% (NEUTRAL)
Root Cause:
- Mode check overhead: Entry-point tiny_unified_lifo_enabled() call adds branch
- Minimal locality delta: LIFO vs FIFO temporal locality difference is small in practice
- Existing optimization: FIFO ring implementation already well-optimized
- Cache warming: TLS cache pre-warming reduces locality sensitivity
4. Recommendation: Freeze as Research Box
Decision: Freeze Phase 15 v1 as research box (HAKMEM_TINY_UNIFIED_LIFO=0 default, OFF)
Rationale:
- Neither LIFO nor FIFO shows significant advantage
- Mode switching overhead outweighs potential locality gains
- Existing FIFO ring is simple and already fast
Next: Explore alternative approaches:
- Hybrid strategies (per-class mode selection)
- Batch operations (reduce per-call overhead)
- Hardware prefetch hints (explicit locality control)
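The batch-operations direction can be illustrated with a hypothetical sketch over the same LIFO layout (tail as depth). The sketch moves several blocks per call, so a caller would pay any per-call entry cost, such as the mode check, once per batch rather than once per block. BatchStack, BATCH_CAP, batch_pop and batch_push are illustrative names, not part of the project, and the capacity is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_CAP 16u /* hypothetical per-class capacity */

typedef struct {
    void    *slots[BATCH_CAP];
    uint32_t tail; /* stack depth, as in the LIFO layout */
} BatchStack;

/* Move up to n blocks from the stack into out[]; returns the count moved.
 * out[0] receives the top of the stack (the most recently pushed block). */
static size_t batch_pop(BatchStack *s, void **out, size_t n) {
    assert(s->tail <= BATCH_CAP);
    size_t k = n < s->tail ? n : s->tail;
    for (size_t i = 0; i < k; i++)
        out[i] = s->slots[s->tail - 1 - i];
    s->tail -= (uint32_t)k;
    return k;
}

/* Push up to n blocks from in[]; returns how many fit. */
static size_t batch_push(BatchStack *s, void *const *in, size_t n) {
    assert(s->tail <= BATCH_CAP);
    size_t room = BATCH_CAP - s->tail;
    size_t k = n < room ? n : room;
    for (size_t i = 0; i < k; i++)
        s->slots[s->tail + i] = in[i];
    s->tail += (uint32_t)k;
    return k;
}
```

Whether batching actually recovers the per-call overhead measured here would need its own A/B run against the same GO threshold.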