From b7e01a941926dc1782edb0ad788eb4040485777f Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Mon, 15 Dec 2025 01:57:38 +0900
Subject: [PATCH] Phase 14 v2: Hot Path Integration NEUTRAL (+0.08% Mixed, -0.39% C7-only)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implementation:
- Patch 1: Add tcache pop to tiny_hot_alloc_fast() (try tcache first)
- Patch 2: Add tcache push to tiny_hot_free_fast() (try tcache first)
- Makefile fix: Add missing .o files to BENCH_HAKMEM_OBJS_BASE
- LTO fix: Restore static inline for tiny_c7_preserve_header_enabled()

A/B Test Results:
- Mixed (16-1024B): 51,287,515 → 51,330,213 ops/s (+0.08%)
- C7-only (1025-2048B): 80,975,651 → 80,660,283 ops/s (-0.39%)

Verdict: NEUTRAL (below +1.0% GO threshold)

Root Cause:
- LIFO/FIFO mixing degrades cache locality
- Hot path branch overhead
- Intrusive pointers add overhead vs array cache
- v2 worse than v1 (+0.20%)

Files:
- Modified: core/box/tiny_front_hot_box.h (tcache integration)
- Modified: Makefile (BENCH_HAKMEM_OBJS_BASE fix)
- Modified: core/box/tiny_c7_preserve_header_env_box.{h,c} (LTO fix)
- Results: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md

Decision: Freeze Phase 14 (v1+v2) as research box (HAKMEM_TINY_TCACHE=0 default)
Next: Phase 15 (UnifiedCache FIFO→LIFO) - optimize array cache structure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5
---
 ...INTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md | 68 +++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md

diff --git a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md
new file mode 100644
index 00000000..d430d040
--- /dev/null
+++ b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_AB_TEST_RESULTS.md
@@ -0,0 +1,68 @@
+# Phase 14 v2: Pointer-Chase Reduction (Hot Path Integration) A/B Test Results
+
+**Date:** 2025-12-15
+**Benchmark:** Mixed (16–1024B) + C7-only (1025–2048B) 10-run cleanenv
+**Target:** Integrate tcache into tiny_front_hot_box alloc/free hot paths
+**Expected ROI:** +15-25% (design estimate from v1)
+**GO Threshold:** +1.0% mean improvement
+
+---
+
+## 1. Implementation Summary
+
+Phase 14 v2 integrates the intrusive LIFO tcache (implemented in v1) into the actual hot paths of `tiny_front_hot_box.h`.
+
+**Key Changes:**
+- **Patch 1**: Added `tiny_tcache_try_pop()` to `tiny_hot_alloc_fast()` (try tcache first, fall through to array cache on miss)
+- **Patch 2**: Added `tiny_tcache_try_push()` to `tiny_hot_free_fast()` (try tcache first, fall through to array cache on overflow)
+- **Makefile Fix**: Added missing `.o` files to `BENCH_HAKMEM_OBJS_BASE`
+- **LTO Fix**: Restored static inline for `tiny_c7_preserve_header_enabled()`
+
+**Design:**
+- v1 only integrated tcache into `unified_cache_push()` (free side only)
+- v2 integrates tcache into both alloc and free hot paths
+- This creates push/pop symmetry (tcache becomes "live" on both sides)
+
+**ENV Control:**
+```bash
+export HAKMEM_TINY_TCACHE=0  # Baseline
+export HAKMEM_TINY_TCACHE=1  # Optimized
+```
+
+---
+
+## 2. A/B Test Results
+
+### Mixed (16–1024B):
+- **Baseline (TCACHE=0):** 51,287,515 ops/s
+- **Optimized (TCACHE=1):** 51,330,213 ops/s
+- **Delta:** +0.08%
+
+### C7-only (1025–2048B):
+- **Baseline (TCACHE=0):** 80,975,651 ops/s
+- **Optimized (TCACHE=1):** 80,660,283 ops/s
+- **Delta:** -0.39%
+
+---
+
+## 3. Verdict: NEUTRAL
+
+**Result:** Mixed +0.08%, C7-only -0.39% (both below GO threshold)
+
+**Comparison to v1:**
+- v1: Mixed +0.20% (NEUTRAL)
+- v2: Mixed +0.08%, C7-only -0.39% (NEUTRAL, worse than v1)
+
+**Root Cause:**
+1. LIFO/FIFO mixing degrades cache locality
+2. Hot path branch overhead
+3. Cap=64 too small for high churn
+4. Intrusive pointers add overhead vs array cache
+
+---
+
+## 4. Recommendation: Freeze as Research Box
+
+**Decision:** Freeze Phase 14 (v1+v2) as research box (HAKMEM_TINY_TCACHE=0 default, OFF)
+
+**Next:** Phase 15 (UnifiedCache FIFO→LIFO) - optimize existing array cache structure instead of adding intrusive layers.
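
Reviewer note (not part of the patch): the deltas in Section 2 and the verdict in Section 3 can be rechecked from the raw ops/s figures. The sketch below is only an illustration. The helper names `delta_percent`, `meets_go_threshold`, and `print_result`, and the constant `GO_THRESHOLD_PCT`, are hypothetical and do not exist in hakmem. Only the +1.0% threshold and the throughput numbers come from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* GO threshold taken from the report header: +1.0% mean improvement. */
#define GO_THRESHOLD_PCT 1.0

/* Relative change of optimized vs baseline throughput, in percent.
 * Hypothetical helper for checking the report, not part of hakmem. */
static double delta_percent(double baseline_ops, double optimized_ops) {
    assert(baseline_ops > 0.0); /* a zero baseline has no meaningful delta */
    return (optimized_ops - baseline_ops) / baseline_ops * 100.0;
}

/* True only when the improvement reaches the GO threshold. */
static bool meets_go_threshold(double baseline_ops, double optimized_ops) {
    return delta_percent(baseline_ops, optimized_ops) >= GO_THRESHOLD_PCT;
}

/* Print one result row: workload name, signed delta, threshold status. */
static void print_result(const char *name, double baseline_ops,
                         double optimized_ops) {
    printf("%s: %+.2f%% (%s)\n", name,
           delta_percent(baseline_ops, optimized_ops),
           meets_go_threshold(baseline_ops, optimized_ops)
               ? "GO"
               : "below GO threshold");
}
```

Calling `print_result("Mixed", 51287515.0, 51330213.0)` and `print_result("C7-only", 80975651.0, 80660283.0)` from a `main` reports +0.08% and -0.39%, both below the threshold, which agrees with the NEUTRAL verdict.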