# Phase 31: Tiny Free Trace Atomic Prune - Results **Date:** 2025-12-16 **Type:** HOT path TELEMETRY atomic prune **Target:** `g_tiny_free_trace` atomic in `core/hakmem_tiny_free.inc:326` **Verdict:** NEUTRAL (code cleanliness adopted) --- ## Executive Summary Phase 31 targeted the `g_tiny_free_trace` atomic in the HOT path (`hak_tiny_free()` entry point). A/B testing showed **NEUTRAL performance** (-0.35% mean, +0.19% median), well within noise range (±0.5%). Following Phase 26 precedent (5 atomics, -0.33%, adopted for code cleanliness), **Phase 31 is ADOPTED** with COMPILED=0 as default to reduce HOT path complexity. --- ## Background ### Phase 30 Selection Process From 412 total atomics audited: - **HOT path candidates:** 16 total - 5 TELEMETRY (4 already compiled-out in Phases 24-27) - 11 UNKNOWN (require manual review) **Phase 31 candidate selected:** `g_tiny_free_trace` (HOT path, TELEMETRY, TOP PRIORITY) **Step 0 verification (MANDATORY):** - No ENV gate → always active - Located in `hak_tiny_free()` → executes on EVERY tiny free call - Mixed benchmark heavily exercises free path → high execution count - **Execution confirmed:** First instruction in HOT path function ### Target Profile **Location:** `core/hakmem_tiny_free.inc:326` **Original Code:** ```c void hak_tiny_free(void* ptr) { static _Atomic int g_tiny_free_trace = 0; if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) { HAK_TRACE("[hak_tiny_free_enter]\n"); } // ... rest of function ... } ``` **Classification:** - **Class:** TELEMETRY (trace rate-limit only) - **Path:** HOT (every tiny free operation) - **Flow Control:** None (only affects `HAK_TRACE` macro output) - **Correctness Impact:** None **Similar precedent:** Phase 25 (`g_free_ss_enter`: +1.07% GO) --- ## Implementation (4-Step Standard Procedure) ### Step 0: Execution Verification (Phase 29 lesson) **ENV gate check:** ```bash $ rg "getenv.*TRACE" core/ --type c # (No results - no ENV gate blocking execution) ``` **Execution check:** - Located at entry of `hak_tiny_free()` (line 326) - Executes on EVERY tiny free call (no conditional bypass) - Mixed benchmark: ~10M+ free operations per run - **Verification:** PASSED (always active) ### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson) **Full usage audit:** ```bash $ rg -n "g_tiny_free_trace" core/ core/hakmem_tiny_free.inc:326: static _Atomic int g_tiny_free_trace = 0; core/hakmem_tiny_free.inc:327: if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) { ``` **Analysis:** - Only 2 uses: declaration + atomic increment - No `if` conditions using the counter value - Only affects `HAK_TRACE` printf (debug macro) - **Classification:** Pure TELEMETRY ✅ ### Step 2: Compile-Out Implementation **File 1:** `core/hakmem_build_flags.h` **Added:** ```c // ------------------------------------------------------------ // Phase 31: Tiny Free Trace Atomic Prune (Compile-out trace atomic) // ------------------------------------------------------------ // Tiny Free Trace: Compile gate (default OFF = compile-out) // Set to 1 for research builds that need free path trace diagnostics // Target: g_tiny_free_trace atomic in core/hakmem_tiny_free.inc:326 // Impact: HOT path atomic (every free operation) // Expected improvement: +0.5% to +1.0% (similar to Phase 25: +1.07%) #ifndef HAKMEM_TINY_FREE_TRACE_COMPILED # define HAKMEM_TINY_FREE_TRACE_COMPILED 0 #endif ``` **File 2:** `core/hakmem_tiny_free.inc:326` **Before:** ```c void hak_tiny_free(void* ptr) { static _Atomic int g_tiny_free_trace = 0; if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) { HAK_TRACE("[hak_tiny_free_enter]\n"); } // ... rest of function ... } ``` **After:** ```c void hak_tiny_free(void* ptr) { #if HAKMEM_TINY_FREE_TRACE_COMPILED static _Atomic int g_tiny_free_trace = 0; if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) { HAK_TRACE("[hak_tiny_free_enter]\n"); } #else (void)0; // No-op when trace compiled out #endif // ... rest of function ... } ``` **Include verification:** - `hakmem_build_flags.h` included transitively via `tiny_front_config_box.h` - No explicit include needed ### Step 3: A/B Test (Build-Level Comparison) **Baseline (COMPILED=0, default - trace compiled-out):** ```bash make clean && make -j bench_random_mixed_hakmem scripts/run_mixed_10_cleanenv.sh ``` **Compiled-in (COMPILED=1, research - trace active):** ```bash make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_TRACE_COMPILED=1' bench_random_mixed_hakmem scripts/run_mixed_10_cleanenv.sh ``` --- ## A/B Test Results ### Raw Data (10-run clean environment) **Baseline (COMPILED=0, trace compiled-out):** ``` Run 1: 53432447 ops/s Run 2: 53846666 ops/s Run 3: 53256003 ops/s Run 4: 54007573 ops/s Run 5: 54132468 ops/s Run 6: 53937278 ops/s Run 7: 53752216 ops/s Run 8: 53106138 ops/s Run 9: 53861749 ops/s Run 10: 53052398 ops/s ``` **Compiled-in (COMPILED=1, trace active):** ``` Run 1: 53667388 ops/s Run 2: 53623799 ops/s Run 3: 54099595 ops/s Run 4: 53993106 ops/s Run 5: 53530214 ops/s Run 6: 54275707 ops/s Run 7: 53726604 ops/s Run 8: 53607801 ops/s Run 9: 54122912 ops/s Run 10: 53630312 ops/s ``` ### Statistical Analysis | Metric | Baseline (COMPILED=0) | Compiled-in (COMPILED=1) | Difference | |--------|----------------------|-------------------------|------------| | **Mean** | 53,638,493.60 ops/s | 53,827,743.80 ops/s | **-0.35%** | | **Median** | 53,799,441.00 ops/s | 53,696,996.00 ops/s | **+0.19%** | | **Stdev** | 393,174.93 (0.73%) | 267,178.23 (0.50%) | - | **Difference interpretation:** - **Mean:** Baseline -0.35% (SLOWER, but within noise) - **Median:** Baseline +0.19% (FASTER, but within noise) - **Verdict range:** Both within ±0.5% NEUTRAL threshold --- ## Verdict ### Performance: NEUTRAL **Criteria:** - GO: +0.5% or more (compile-out wins) - NEUTRAL: ±0.5% (no significant difference) - NO-GO: -0.5% or worse (compile-out loses) **Result:** NEUTRAL (-0.35% mean, +0.19% median) **Analysis:** - Mean shows slight regression (-0.35%), median shows slight improvement (+0.19%) - Conflicting signals suggest **measurement noise** rather than true effect - Standard deviation overlap confirms lack of statistical significance - Similar to Phase 26 pattern (-0.33%, 5 atomics, NEUTRAL) ### Decision: ADOPTED (COMPILED=0 default) **Rationale (following Phase 26 precedent):** 1. **Code Cleanliness:** - Removes unused TELEMETRY atomic from HOT path - Reduces complexity at `hak_tiny_free()` entry point - No correctness impact (pure trace macro) 2. **Consistency:** - Phase 26 precedent: -0.33% NEUTRAL result adopted for cleanliness - Phase 31: -0.35% NEUTRAL result follows same logic - Maintains atomic prune momentum (Phases 24-31) 3. **Research Flexibility:** - `COMPILED=1` still available for trace diagnostics - No functionality lost, only default changed - Easy revert if needed (`make EXTRA_CFLAGS=-DHAKMEM_TINY_FREE_TRACE_COMPILED=1`) 4. **Why Not NO-GO?** - Median +0.19% (slight win, not loss) - Mean -0.35% within noise range (±0.5% threshold) - Phase 26 set precedent: NEUTRAL + cleanliness = ADOPT --- ## Comparison: Phase 25 vs Phase 31 **Phase 25:** `g_free_ss_enter` (free stats atomic) - **Location:** `tiny_superslab_free.inc.h:25` (entry point) - **Result:** +1.07% (GO) - **Path:** Same HOT path (free entry) - **Similarity:** Both trace/stats atomics at free entry **Phase 31:** `g_tiny_free_trace` (trace rate-limit atomic) - **Location:** `hakmem_tiny_free.inc:326` (entry point) - **Result:** -0.35% mean, +0.19% median (NEUTRAL) - **Path:** Same HOT path (free entry) - **Difference:** Rate-limited (128 calls) vs always-increment **Why different results?** 1. **Execution frequency:** - Phase 25: EVERY free call increments stats - Phase 31: EVERY free call increments, but trace only 128 times - **Hypothesis:** Phase 25's always-active stats had higher overhead 2. **Atomic placement:** - Phase 25: Inside `hak_tiny_free_superslab()` (deeper in call stack) - Phase 31: First instruction in `hak_tiny_free()` (entry point) - **Hypothesis:** Entry point atomic may be better optimized by compiler 3. **Measurement variance:** - Phase 25: Clear +1.07% signal above noise - Phase 31: -0.35% / +0.19% conflicting signals (noise) - **Conclusion:** Phase 31 likely true NEUTRAL, not hidden win --- ## Lessons Learned ### 1. HOT Path ≠ Guaranteed Win **Previous assumption (from Phase 25):** - HOT path TELEMETRY atomic → +0.5% to +1.0% expected **Phase 31 reality:** - HOT path TELEMETRY atomic → NEUTRAL (±0.0%) **Insight:** - Not all HOT path atomics have measurable overhead - Rate-limited trace (128 calls) may be optimized away by compiler - Entry point placement may reduce overhead vs mid-function ### 2. NEUTRAL + Cleanliness = ADOPT **Established precedent (Phase 26):** - 5 diagnostic atomics, -0.33% NEUTRAL result - Adopted for code cleanliness despite no performance win **Phase 31 confirms:** - -0.35% NEUTRAL result, same adoption logic - Code cleanliness is valid secondary criterion - Maintains atomic prune momentum (Phases 24-31) ### 3. Step 0 (Execution Verification) Essential **Phase 31 validated:** - Step 0 confirmed no ENV gate → always active - Prevented Phase 29 "empty bench" scenario - Standard procedure working as designed --- ## Next Steps ### Phase 32 Candidate: `g_hak_tiny_free_calls` **Location:** `core/hakmem_tiny_free.inc:335` (same function, 9 lines after Phase 31 target) **Code context:** ```c void hak_tiny_free(void* ptr) { #if HAKMEM_TINY_FREE_TRACE_COMPILED // Phase 31 target (now compiled-out) #endif // Track total tiny free calls (diagnostics) extern _Atomic uint64_t g_hak_tiny_free_calls; atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed); // ← Phase 32 target // ... rest of function ... } ``` **Profile:** - **Path:** HOT (every tiny free call, same as Phase 31) - **Classification:** TELEMETRY (diagnostic counter, no flow control) - **Expected:** +0.3% to +0.7% (smaller than Phase 25, similar to Phase 31) - **Step 0 verification needed:** Check for ENV gate, confirm execution **Alternative candidates:** - Manual review of UNKNOWN atomics (284 candidates from Phase 30 audit) - Lower priority than confirmed HOT path targets --- ## Files Modified ### Code Changes 1. **`core/hakmem_build_flags.h`** - Added `HAKMEM_TINY_FREE_TRACE_COMPILED` flag (default OFF) - Lines 363-373 2. **`core/hakmem_tiny_free.inc`** - Wrapped `g_tiny_free_trace` atomic in `#if HAKMEM_TINY_FREE_TRACE_COMPILED` - Lines 326-333 ### Documentation 1. **`docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`** (this file) - A/B test results - NEUTRAL verdict + code cleanliness adoption - Phase 32 candidate proposal 2. **`docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`** (to be updated) - Phase 24-31 cumulative summary - Updated precedents section - Phase 32 roadmap 3. **`CURRENT_TASK.md`** (to be updated) - Phase 31 completion - Phase 32 candidate recommendation --- ## Cumulative Progress (Phases 24-31) | Phase | Target | Atomics | Result | Status | |-------|--------|---------|--------|--------| | **24** | Tiny Class Stats (OBSERVE) | 5 | **+0.93%** | GO ✅ | | **25** | Free Stats (`g_free_ss_enter`) | 1 | **+1.07%** | GO ✅ | | **26** | Hot Path Diagnostics | 5 | **-0.33%** | NEUTRAL ✅ | | **27** | Unified Cache Stats | 6 | **+0.74%** | GO ✅ | | **28** | Background Spill Queue | 8 | N/A | NO-OP ✅ | | **29** | Pool Hotbox v2 Stats | 12 | **0.00%** | NO-OP ✅ | | **30** | Standard Procedure | 412 audit | N/A | PROCEDURE ✅ | | **31** | Tiny Free Trace | 1 | **-0.35%** | NEUTRAL ✅ | | **Total** | **18 atomics removed** | **+2.74%** | **net cumulative** | **✅** | **Net cumulative gain:** +2.74% (Phases 24+25+27, excluding NEUTRAL 26+31) **Note:** Phase 26 and 31 NEUTRAL results do not degrade cumulative gain (no regression). --- ## Conclusion Phase 31 demonstrates that **not all HOT path TELEMETRY atomics have measurable overhead**. While Phase 25 (`g_free_ss_enter`) delivered +1.07%, Phase 31 (`g_tiny_free_trace`) showed NEUTRAL performance (-0.35% mean, +0.19% median). Following Phase 26 precedent, **Phase 31 is ADOPTED** with COMPILED=0 as default for **code cleanliness** benefits. **Key takeaways:** 1. HOT path location does not guarantee performance wins 2. NEUTRAL + code cleanliness is valid adoption criterion (Phase 26/31 pattern) 3. Standard 4-step procedure successfully prevented false positives (Step 0 execution check) 4. Phase 32 candidate ready: `g_hak_tiny_free_calls` (same HOT path, 9 lines below) **Recommendation:** Proceed to Phase 32 (`g_hak_tiny_free_calls`) following same 4-step procedure.