
Moe Charm (CI) 87fa27518c Phase 15 v1: UnifiedCache FIFO→LIFO NEUTRAL (-0.70% Mixed, +0.42% C7)

Transform existing array-based UnifiedCache from FIFO ring to LIFO stack.

A/B Results:
- Mixed (16-1024B): -0.70% (52,965,966 → 52,593,948 ops/s)
- C7-only (1025-2048B): +0.42% (78,010,783 → 78,335,509 ops/s)

Verdict: NEUTRAL (both below +1.0% GO threshold) - freeze as research box

Implementation:
- L0 ENV gate: tiny_unified_lifo_env_box.{h,c} (HAKMEM_TINY_UNIFIED_LIFO=0/1)
- L1 LIFO ops: tiny_unified_lifo_box.h (unified_cache_try_pop/push_lifo)
- L2 integration: tiny_front_hot_box.h (mode check at entry)
- Reuses existing slots[] array (no intrusive pointers)

Root Causes:
1. Mode check overhead (tiny_unified_lifo_enabled() call)
2. Minimal LIFO vs FIFO locality delta in practice
3. Existing FIFO ring already well-optimized

Bonus Fix: LTO bug for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
- Converted static inline to extern + non-inline implementation
- Fixes undefined reference during LTO linking

Design: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_DESIGN.md
Results: docs/analysis/PHASE15_UNIFIEDCACHE_LIFO_1_AB_TEST_RESULTS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-15 02:19:26 +09:00



Phase 15 v1: UnifiedCache FIFO→LIFO (Stack) A/B Test Results

Date: 2025-12-15
Benchmark: Mixed (16–1024B) + C7-only (1025–2048B), 10-run cleanenv
Target: Transform the existing UnifiedCache from a FIFO ring to a LIFO stack
Expected ROI: +5-10% (design estimate, from improved cache locality)
GO Threshold: +1.0% mean improvement

1. Implementation Summary

Phase 15 v1 transforms the existing array-based UnifiedCache from FIFO (ring buffer) to LIFO (stack) layout.

Key Changes:

Patch 1: L0 ENV gate box (tiny_unified_lifo_env_box.{h,c})
Patch 2: L1 LIFO operations (tiny_unified_lifo_box.h)
Patch 3: Hot path integration (tiny_front_hot_box.h, both alloc and free paths)
Patch 4: Makefile updates (added .o files)
Patch 5: bench_profile.h refresh sync

Design:

Reuses existing TinyUnifiedCache.slots[] array (no intrusive pointers)
tail is treated as the stack top (depth); head is unused (always 0)
Mode check at function entry (once per call)
No wrap-around (mask unused in LIFO mode)
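The following is a minimal sketch of the two layouts over the same slots[] array. It uses a simplified stand-in struct; UnifiedCacheSketch, UC_CAP and the helper names are hypothetical and are not the real tiny_unified_lifo_box.h API. The field roles follow the design notes above: tail is the depth, head stays at 0, and mask is unused in LIFO mode. The global g_lifo_mode stands in for tiny_unified_lifo_enabled().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinyUnifiedCache; only the field roles come
 * from the design notes (slots[], head, tail, mask). */
#define UC_CAP 8u  /* power of two, so mask = UC_CAP - 1 in FIFO mode */

typedef struct {
    void    *slots[UC_CAP];
    uint32_t head;  /* FIFO: oldest entry; LIFO: unused (always 0) */
    uint32_t tail;  /* FIFO: next write slot; LIFO: stack depth */
    uint32_t mask;  /* FIFO: wrap mask; LIFO: unused (no wrap-around) */
} UnifiedCacheSketch;

/* Hypothetical mode flag standing in for tiny_unified_lifo_enabled(). */
static int g_lifo_mode = 0;

/* FIFO ring; this sketch leaves one slot empty to tell full from empty. */
static int fifo_push(UnifiedCacheSketch *c, void *p) {
    uint32_t next = (c->tail + 1) & c->mask;
    if (next == c->head) return 0;               /* full */
    c->slots[c->tail] = p;
    c->tail = next;
    return 1;
}

static void *fifo_pop(UnifiedCacheSketch *c) {
    if (c->head == c->tail) return NULL;         /* empty */
    void *p = c->slots[c->head];
    c->head = (c->head + 1) & c->mask;
    return p;
}

/* LIFO stack over the same slots[]: tail is the depth, so the most
 * recently freed block is the next one handed out. */
static int lifo_push(UnifiedCacheSketch *c, void *p) {
    if (c->tail >= UC_CAP) return 0;             /* full */
    c->slots[c->tail++] = p;
    return 1;
}

static void *lifo_pop(UnifiedCacheSketch *c) {
    return c->tail ? c->slots[--c->tail] : NULL; /* NULL when empty */
}

/* Entry points: one mode check per call selects the layout. */
static int cache_push(UnifiedCacheSketch *c, void *p) {
    return g_lifo_mode ? lifo_push(c, p) : fifo_push(c, p);
}

static void *cache_pop(UnifiedCacheSketch *c) {
    return g_lifo_mode ? lifo_pop(c) : fifo_pop(c);
}
```

The mode is assumed to stay fixed for the life of a cache, as it would with an ENV gate read at startup. Flipping the mode while entries are cached would break the head/tail invariants. The g_lifo_mode test in cache_push()/cache_pop() is the kind of per-call entry branch listed under Root Cause.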

ENV Control:

export HAKMEM_TINY_UNIFIED_LIFO=0  # Baseline (FIFO)
export HAKMEM_TINY_UNIFIED_LIFO=1  # Optimized (LIFO)
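As a sketch, an L0 ENV gate like this can read HAKMEM_TINY_UNIFIED_LIFO once, cache the result, and return the cached value on every later call. Only the function name and the variable name come from these notes. The body, the caching scheme and the "leading '1' means ON" parsing are assumptions, not the contents of tiny_unified_lifo_env_box.c.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Cached mode: -1 = ENV not read yet, 0 = FIFO (default), 1 = LIFO. */
static _Atomic int g_unified_lifo_mode = -1;

/* Hypothetical body for the gate named in the notes. */
int tiny_unified_lifo_enabled(void) {
    int mode = atomic_load_explicit(&g_unified_lifo_mode, memory_order_relaxed);
    if (mode < 0) {
        /* Slow path, normally taken once; racing first callers all compute
         * the same value. Only a leading '1' turns LIFO on. */
        const char *e = getenv("HAKMEM_TINY_UNIFIED_LIFO");
        mode = (e != NULL && e[0] == '1') ? 1 : 0;
        atomic_store_explicit(&g_unified_lifo_mode, mode, memory_order_relaxed);
    }
    return mode;
}
```

After the first call, each check costs a relaxed load and a compare, but that check still sits on every alloc/free entry.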

Bonus Fix:

Discovered and fixed a pre-existing LTO linkage bug (undefined reference at link time) for tiny_c7_preserve_header_enabled() (Phase 13/14 latent issue)
Converted it from static inline to an extern declaration plus a non-inline implementation
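The following sketches the shape of the linkage change under stated assumptions. The int return type and the placeholder body are hypothetical, and the comments mark which part would live in the header and which in the single .c file. With only a declaration in the header, every translation unit resolves the call to one out-of-line definition.

```c
/* ---- Box header: declaration only (no body, no `static inline`). ---- */
extern int tiny_c7_preserve_header_enabled(void);

/* ---- Exactly one .c file: the single non-inline definition. ----
 * Placeholder body: the real logic is not described in these notes; it
 * returns 0 (OFF) here only so the sketch compiles. */
int tiny_c7_preserve_header_enabled(void) {
    return 0;
}
```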

2. A/B Test Results

Mixed (16–1024B):

Baseline (LIFO=0): 52,965,966 ops/s
Optimized (LIFO=1): 52,593,948 ops/s
Delta: -0.70% (regression)

C7-only (1025–2048B):

Baseline (LIFO=0): 78,010,783 ops/s
Optimized (LIFO=1): 78,335,509 ops/s
Delta: +0.42% (slight improvement)
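The deltas can be recomputed from the reported throughput figures. This small helper applies the +1.0% GO rule to them; delta_pct, is_go and GO_THRESHOLD_PCT are hypothetical names.

```c
#include <stdbool.h>

/* GO rule from the test plan: +1.0% mean improvement. */
#define GO_THRESHOLD_PCT 1.0

/* Relative change of optimized vs. baseline throughput, in percent. */
static double delta_pct(double baseline_ops, double optimized_ops) {
    return (optimized_ops - baseline_ops) / baseline_ops * 100.0;
}

static bool is_go(double delta) {
    return delta >= GO_THRESHOLD_PCT;
}
```

delta_pct(52965966.0, 52593948.0) evaluates to about -0.702 and delta_pct(78010783.0, 78335509.0) to about +0.416. These round to the -0.70% and +0.42% reported above, and is_go() is false for both.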

3. Verdict: NEUTRAL

Result: Mixed -0.70%, C7-only +0.42% (neither reaches the +1.0% GO threshold)

Comparison to Phase 14:

Phase 14 v1 (tcache free-side only): Mixed +0.20% (NEUTRAL)
Phase 14 v2 (tcache alloc+free): Mixed +0.08%, C7-only -0.39% (NEUTRAL)
Phase 15 v1 (FIFO→LIFO): Mixed -0.70%, C7-only +0.42% (NEUTRAL)

Root Cause:

Mode check overhead: the entry-point tiny_unified_lifo_enabled() call adds a branch on every call
Minimal locality delta: LIFO vs FIFO temporal locality difference is small in practice
Existing optimization: FIFO ring implementation already well-optimized
Cache warming: TLS cache pre-warming reduces locality sensitivity

4. Recommendation: Freeze as Research Box

Decision: Freeze Phase 15 v1 as a research box (default HAKMEM_TINY_UNIFIED_LIFO=0, i.e. OFF)

Rationale:

Neither LIFO nor FIFO shows a significant advantage
Mode-check overhead cancels out or outweighs the potential locality gains
Existing FIFO ring is simple and already fast

Next: Explore alternative approaches:

Hybrid strategies (per-class mode selection)
Batch operations (reduce per-call overhead)
Hardware prefetch hints (explicit locality control)
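Below is a rough sketch of the first two ideas, assuming a simplified LIFO cache. A per-class bitmask handles hybrid mode selection, and a batch pop pays the entry-point checks once per batch instead of once per object. All names here (LifoCacheSketch, g_lifo_class_mask, class_uses_lifo, lifo_pop_batch) are hypothetical and not part of the existing code. Prefetch hints are not shown.

```c
#include <stdint.h>

#define LIFO_CAP 16u

/* Hypothetical minimal LIFO cache: tail is the stack depth. */
typedef struct {
    void    *slots[LIFO_CAP];
    uint32_t tail;
} LifoCacheSketch;

/* Hybrid idea: bit cls set => size class cls uses LIFO (requires cls < 32). */
static uint32_t g_lifo_class_mask = 0;

static int class_uses_lifo(unsigned cls) {
    return (int)((g_lifo_class_mask >> cls) & 1u);
}

static int lifo_push(LifoCacheSketch *c, void *p) {
    if (c->tail >= LIFO_CAP) return 0;  /* full */
    c->slots[c->tail++] = p;
    return 1;
}

/* Batch idea: hand out up to n blocks in one call, most recent first, so the
 * mode check and bounds check are paid once per batch rather than once per
 * object. Returns the number of blocks written to out[]. */
static uint32_t lifo_pop_batch(LifoCacheSketch *c, void **out, uint32_t n) {
    uint32_t k = n < c->tail ? n : c->tail;
    for (uint32_t i = 0; i < k; i++)
        out[i] = c->slots[c->tail - 1 - i];
    c->tail -= k;
    return k;
}
```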
