# Phase 26: Hot Path Atomic Telemetry Prune - Audit & Plan

**Date:** 2025-12-16
**Purpose:** Identify and compile-out telemetry-only atomics in hot alloc/free paths
**Pattern:** Follow Phase 24 (tiny_class_stats) + Phase 25 (g_free_ss_enter)
**Expected Gain:** +2-3% cumulative improvement

---

## Executive Summary

**Goal:** Remove all telemetry-only `atomic_fetch_add/sub` from hot paths (alloc/free direct paths).

**Methodology:**

1. Audit all atomics in `core/` directory
2. Classify: **CORRECTNESS** (keep) vs **TELEMETRY** (compile-out)
3. Prioritize: **HOT** (direct alloc/free) > **WARM** (refill/spill) > **COLD** (init/shutdown)
4. Implement compile gates following Phase 24+25 pattern
5. A/B test each candidate independently

**Status:** Phase 25 complete (+1.07% GO). Starting Phase 26.

---

## Classification Criteria

### CORRECTNESS (Do NOT touch)

- Remote queue management: `remote_count`, `remote_head`, `remote_tail`
- Refcount/ownership: `refcount`, `owner`, `in_use`, `active`
- Lock/synchronization: `lock`, `mutex`, `head`, `tail` (queue atomics)
- Metadata: `meta->used`, `meta->active`, `meta->tls_cached`

### TELEMETRY (Candidate for compile-out)

- Stats counters: `*_stats`, `*_count`, `*_calls` (except names listed under CORRECTNESS, e.g. `remote_count`)
- Diagnostics: `*_trace`, `*_debug`, `*_diag`, `*_log`
- Observability: `*_enter`, `*_exit`, `*_hit`, `*_miss`, `*_attempt`, `*_success`
- Metrics: `g_metric_*`, `g_dbg_*`, `g_rel_*`

---

## Phase 26 Candidates: HOT PATH TELEMETRY ATOMICS

### Priority A: Direct Free Path (tiny_superslab_free.inc.h)

#### 1. `g_free_ss_enter` - **ALREADY DONE (Phase 25)**

- **Status:** GO (+1.07%)
- **Location:** `core/tiny_superslab_free.inc.h:22`
- **Gate:** `HAKMEM_TINY_FREE_STATS_COMPILED`
- **Verdict:** Keep compiled-out (default: 0)

#### 2. `c7_free_count` - **NEW CANDIDATE**

- **Location:** `core/tiny_superslab_free.inc.h:51`
- **Code:** `atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);`
- **Purpose:** Debug counter for C7 free path diagnostics
- **Path:** HOT (free superslab fast path)
- **Expected Gain:** +0.3-0.8%
- **Priority:** HIGH
- **Action:** Create Phase 26A

#### 3. `g_hdr_mismatch_log` - **NEW CANDIDATE**

- **Location:** `core/tiny_superslab_free.inc.h:147`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_mismatch_log, 1, memory_order_relaxed);`
- **Purpose:** Log header validation mismatches (debug only)
- **Path:** HOT (free path validation)
- **Expected Gain:** +0.2-0.5%
- **Priority:** HIGH
- **Action:** Create Phase 26B

#### 4. `g_hdr_meta_mismatch` - **NEW CANDIDATE**

- **Location:** `core/tiny_superslab_free.inc.h:182`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_mismatch, 1, memory_order_relaxed);`
- **Purpose:** Log metadata validation failures (debug only)
- **Path:** HOT (free path validation)
- **Expected Gain:** +0.2-0.5%
- **Priority:** HIGH
- **Action:** Create Phase 26C

---

### Priority B: Direct Alloc Path

#### 5. `g_metric_bad_class_once` - **NEW CANDIDATE**

- **Location:** `core/hakmem_tiny_alloc.inc:22`
- **Code:** `atomic_fetch_add_explicit(&g_metric_bad_class_once, 1, memory_order_relaxed)`
- **Purpose:** One-shot metric for bad class index (safety check)
- **Path:** HOT (alloc entry gate)
- **Expected Gain:** +0.1-0.3%
- **Priority:** MEDIUM
- **Action:** Create Phase 26D

#### 6. `g_hdr_meta_fast` - **NEW CANDIDATE**

- **Location:** `core/tiny_free_fast_v2.inc.h:181`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_fast, 1, memory_order_relaxed);`
- **Purpose:** Fast-path header metadata hit counter (telemetry)
- **Path:** HOT (free_fast_v2 path)
- **Expected Gain:** +0.3-0.7%
- **Priority:** HIGH
- **Action:** Create Phase 26E

---

### Priority C: Warm Path (Refill/Spill)

#### 7. `g_bg_spill_len` - **BORDERLINE**

- **Location:** `core/hakmem_tiny_bg_spill.h:32,44`
- **Code:** `atomic_fetch_add_explicit(&g_bg_spill_len[class_idx], ...)`
- **Purpose:** Background spill queue length tracking
- **Path:** WARM (spill path)
- **Expected Gain:** +0.1-0.2%
- **Priority:** MEDIUM
- **Note:** May be CORRECTNESS if queue length is used for flow control
- **Action:** Review code, then decide (Phase 27+)

#### 8. Unified Cache Stats - **MULTIPLE ATOMICS**

- **Location:** `core/front/tiny_unified_cache.c` (multiple lines)
- **Variables:** `g_unified_cache_hits_global`, `g_unified_cache_misses_global`, etc.
- **Purpose:** Unified cache hit/miss telemetry
- **Path:** WARM (cache layer)
- **Expected Gain:** +0.2-0.4%
- **Priority:** MEDIUM
- **Action:** Group into single Phase 27+ candidate

---

## Phase 26 Implementation Plan

### Phase 26A: `c7_free_count` Atomic Prune

**Target:** `core/tiny_superslab_free.inc.h:51`

#### Step 1: Add Build Flag

```c
// core/hakmem_build_flags.h (after line 290)

// ------------------------------------------------------------
// Phase 26A: C7 Free Count Atomic Prune (Compile-out c7_free_count)
// ------------------------------------------------------------
// C7 Free Count: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need C7 free path diagnostics
// Target: c7_free_count atomic in core/tiny_superslab_free.inc.h:51
#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
#  define HAKMEM_C7_FREE_COUNT_COMPILED 0
#endif
```

#### Step 2: Wrap Atomic with Compile Gate

```c
// core/tiny_superslab_free.inc.h:51
#if HAKMEM_C7_FREE_COUNT_COMPILED
    extern _Atomic int c7_free_count;
    int count = atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);
#else
    int count = 0;   // No-op when compiled out
    (void)count;     // Suppress unused warning
#endif
```

#### Step 3: A/B Test (Build-Level)

```bash
# Baseline (compiled-out, default)
make clean && make -j bench_random_mixed_hakmem
./bench_random_mixed_hakmem > baseline_26a.txt

# Compiled-in (for comparison)
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
./bench_random_mixed_hakmem > compiled_in_26a.txt

# Run full bench suite
# Rebuild the baseline first: the binary left over from the step above is the compiled-in build
make clean && make -j bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > bench_26a_baseline.txt
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > bench_26a_compiled.txt
```

#### Step 4: Verdict

- **GO:** +0.5% or more → keep compiled-out (default: 0)
- **NEUTRAL:** within ±0.5% → document, keep compiled-out for cleanliness
- **NO-GO:** -0.5% or worse → revert change

---

### Phase 26B-E: Repeat Pattern

Follow the same pattern for:

- **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:147)
- **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:182)
- **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:22)
- **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:181)

**Each Phase:**

1. Add `HAKMEM_[NAME]_COMPILED` flag to `hakmem_build_flags.h`
2. Wrap atomic with `#if HAKMEM_[NAME]_COMPILED`
3. Run A/B test (baseline vs compiled-in)
4. Measure improvement
5. Document verdict

---

## Expected Cumulative Impact

| Phase | Target Atomic | File | Expected Gain | Status |
|-------|---------------|------|---------------|--------|
| 24 | `g_tiny_class_stats_*` | tiny_class_stats_box.h | +0.93% | GO ✅ |
| 25 | `g_free_ss_enter` | tiny_superslab_free.inc.h:22 | +1.07% | GO ✅ |
| 26A | `c7_free_count` | tiny_superslab_free.inc.h:51 | +0.3-0.8% | TBD |
| 26B | `g_hdr_mismatch_log` | tiny_superslab_free.inc.h:147 | +0.2-0.5% | TBD |
| 26C | `g_hdr_meta_mismatch` | tiny_superslab_free.inc.h:182 | +0.2-0.5% | TBD |
| 26D | `g_metric_bad_class_once` | hakmem_tiny_alloc.inc:22 | +0.1-0.3% | TBD |
| 26E | `g_hdr_meta_fast` | tiny_free_fast_v2.inc.h:181 | +0.3-0.7% | TBD |
| **Total (24-26E)** | - | - | **+3.10-4.80%** | - |

**Conservative Estimate:** +3.0% cumulative improvement from hot-path atomic prune.

---

## Next Steps

1. ✅ Audit complete (this document)
2. ⏳ Implement Phase 26A (`c7_free_count`)
3. ⏳ Run A/B test (baseline vs compiled-in)
4. ⏳ Document results in `PHASE26A_C7_FREE_COUNT_RESULTS.md`
5. ⏳ Repeat for 26B-E
6. ⏳ Create cumulative report

---

## References

- **Phase 24 Pattern:** `core/box/tiny_class_stats_box.h`
- **Phase 25 Pattern:** `core/tiny_superslab_free.inc.h:20-25`
- **Build Flags:** `core/hakmem_build_flags.h:274-290`
- **Mimalloc Principle:** No atomics or observability code in the hot path

---

## Notes

- **DO NOT** touch correctness atomics (`remote_count`, `refcount`, `meta->used`, etc.)
- **ALWAYS** A/B test each candidate independently (no batching)
- **ALWAYS** use build-level flags (compile-time, not runtime)
- **FOLLOW** Phase 24+25 pattern (`#if COMPILED` with default: 0)
- **DOCUMENT** all verdicts (GO/NEUTRAL/NO-GO)

**mimalloc Gap Analysis:** This work closes the "hot path atomic tax" gap identified in the optimization roadmap.