# Phase 30: Standard Procedure for Atomic Prune Operations

**Date:** 2025-12-16

**Status:** PROCEDURE STANDARDIZATION

**Purpose:** Codify the learnings from Phases 24-29 to prevent no-op phases

---

## Executive Summary

Phases 24-29 taught us critical lessons about what makes an atomic prune succeed:

- **GO phases** (Phases 24, 25, 27; +2.74% cumulative): removing telemetry atomics from HOT/WARM paths works
- **NO-OP phases** (Phases 28-29): targeting CORRECTNESS atomics or ENV-gated code wastes effort

This document standardizes a 4-step procedure (Steps 0-3) so that future phases target high-impact code that actually executes.

---

## 1. Phase 24-29 Cumulative Lessons

### Phases 24, 25, 27: GO (+2.74% cumulative)

**Pattern: HOT/WARM path telemetry atomic removal**

- **Phase 24 (alloc stats)**: +0.93%
  - Removed the `atomic_fetch_add` from the `malloc_tiny_fast()` hot path
  - Stats compiled out with `HAKMEM_ALLOC_GATE_STATS_COMPILED=0`
- **Phase 25 (free stats)**: +1.07%
  - Removed the `atomic_fetch_add` from the `free_tiny_fast_hotcold()` hot path
  - Stats compiled out with `HAKMEM_FREE_PATH_STATS_COMPILED=0`
- **Phase 27 (unified cache)**: +0.74%
  - Removed the `atomic_fetch_add` from the TLS cache hit path
  - Stats compiled out with `HAKMEM_TINY_FRONT_STATS_COMPILED=0`

**Success Factors:**

- ✅ Executed on every allocation/free (HOT path)
- ✅ Pure telemetry (stats only, no control flow)
- ✅ Build-level compile-out (no runtime overhead)

### Phase 26: NEUTRAL (code cleanliness)

**Pattern: Low-frequency path, compiled out anyway**

- Tiny header tracking stats (COLD path)
- No measurable performance impact, but improves long-term maintainability
- Kept the compile-out mechanism for consistency

**Lesson:** Even low-frequency telemetry benefits from compile-out for code cleanliness.
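The compile-out pattern behind Phases 24-27 can also be packaged as a per-counter macro, which keeps each hot-path call site to a single line. The sketch below illustrates the idea under stated assumptions and is not code from the repository. The gate `HAKMEM_DEMO_STATS_COMPILED` and the names `g_demo_hits`, `DEMO_STAT_INC`, `demo_fast_path()` and `demo_stat_hits()` are all hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical gate, for illustration only. Real gates live in
 * core/hakmem_build_flags.h and follow HAKMEM_[NAME]_STATS_COMPILED. */
#ifndef HAKMEM_DEMO_STATS_COMPILED
#  define HAKMEM_DEMO_STATS_COMPILED 0
#endif

/* Hypothetical TELEMETRY counter. The definition exists in both builds. */
static _Atomic uint64_t g_demo_hits = 0;

#if HAKMEM_DEMO_STATS_COMPILED
/* Compiled in: one relaxed atomic read-modify-write per use. */
#  define DEMO_STAT_INC(ctr) \
       ((void)atomic_fetch_add_explicit(&(ctr), 1, memory_order_relaxed))
#else
/* Compiled out: expands to a no-op expression, so neither an atomic
 * instruction nor a runtime branch reaches the hot path. */
#  define DEMO_STAT_INC(ctr) ((void)0)
#endif

/* Hypothetical hot-path function. The counter is TELEMETRY because no
 * control flow ever reads it. */
static inline int demo_fast_path(int x) {
    DEMO_STAT_INC(g_demo_hits);
    return x + 1;
}

/* Reporting-only reader: always returns 0 when the gate is 0. */
static uint64_t demo_stat_hits(void) {
    return atomic_load_explicit(&g_demo_hits, memory_order_relaxed);
}
```

With the default gate of 0, `DEMO_STAT_INC` expands to `((void)0)`. In that build `demo_fast_path()` contains no atomic instruction and `demo_stat_hits()` stays at 0. Building with `-DHAKMEM_DEMO_STATS_COMPILED=1` restores one relaxed `atomic_fetch_add_explicit` per call. Using a macro is a design choice; the inline `#if`/`#else` wrapping shown in Step 2 is equivalent.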
### Phase 28: NO-OP (CORRECTNESS atomics)

**Anti-pattern: Misidentified counter purpose**

- **Target:** `g_bg_spill_len` (looked like a counter)
- **Reality:** Flow-control atomic (queue depth tracking)
- **Usage:**
  ```c
  if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
      // Decision-making logic
  }
  ```

**Critical Lesson:** **Counter name ≠ counter purpose**

**CORRECTNESS atomics (NEVER touch):**

- Used in `if/while` conditions
- Flow control (queue depth, threshold checks)
- Lock-free synchronization (CAS, load-store ordering)
- Removing them changes program behavior

### Phase 29: NO-OP (ENV-gated, not executed)

**Anti-pattern: Optimizing dead code**

- **Target:** Pool v2 stats atomics
- **Reality:** Gated by `getenv("HAKMEM_POOL_V2")`, which is OFF by default
- **Benchmark:** Never executes the pool v2 code paths
- **Result:** Zero impact on measurements

**Critical Lesson:** **Execution verification is MANDATORY before optimization**

---

## 2. Standard Procedure (4 Steps)

### Step 0: Execution Verification (MANDATORY GATE) ⚠️

**Purpose:** Avoid wasting effort on ENV-gated or low-frequency code (Phase 29 lesson)

#### Methods:

**A. ENV Gate Check**

```bash
# Check whether the feature is disabled at runtime
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/   # Example
```

**B. Execution Counter Verification**

1. **Find the counter reference:**
   ```bash
   rg -n "atomic.*g_target_counter" core/
   ```

2. **Check the counter in the benchmark output:**
   ```bash
   # Run the mixed benchmark 10 times
   scripts/run_mixed_10_cleanenv.sh
   # Check whether the counter is > 0 in any run
   grep "target_counter" results/*.txt
   ```

3. **Optional: add a debug print (if the counter is not visible):**
   ```c
   #if HAKMEM_DEBUG_PRINT
   // PRIu64 comes from <inttypes.h>; %lu is not portable for uint64_t
   fprintf(stderr, "[DEBUG] counter=%" PRIu64 "\n",
           (uint64_t)atomic_load(&g_target_counter));
   #endif
   ```

**C. perf/flamegraph Verification (optional but recommended)**

```bash
# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem
# Check whether the function appears in the profile
perf report --stdio | grep "target_function"
```

#### Decision Matrix:

| Condition | Action |
|-----------|--------|
| ✅ Counter > 0 in benchmark | Proceed to Step 1 |
| ✅ Function in perf profile | Proceed to Step 1 |
| ❌ ENV-gated and OFF by default | **SKIP** (Phase 29 pattern) |
| ❌ Counter = 0 in all runs | **SKIP** (not executed) |
| ❌ Function not in flamegraph | **SKIP** (negligible frequency) |

**Output:** Document the execution verification results in `PHASE[N]_AUDIT.md`

---

### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

**Purpose:** Distinguish atomics that control behavior from atomics that only observe it

#### Classification Rules:

**CORRECTNESS (NEVER touch):**

- ❌ Used in `if/while/for` conditions
- ❌ Flow control (queue depth, threshold, capacity checks)
- ❌ Lock-free synchronization (CAS, `atomic_compare_exchange_*`)
- ❌ Load-store ordering dependencies
- ❌ Affects program decisions/behavior

**Examples:**

```c
// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }

// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }

// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired);
```

**TELEMETRY (compile-out candidate):**

- ✅ Stats/logging/observation only
- ✅ Used exclusively in `printf/fprintf/sprintf`
- ✅ Deleting it changes no program behavior
- ✅ Pure counters (hits, misses, totals)

**Examples:**

```c
// TELEMETRY: Stats only (the 3-argument form is atomic_fetch_add_explicit)
atomic_fetch_add_explicit(&stats[idx].hits, 1, memory_order_relaxed);

// TELEMETRY: Logging only (PRIu64 comes from <inttypes.h>)
fprintf(stderr, "allocs=%" PRIu64 "\n", (uint64_t)atomic_load(&g_alloc_count));
```

#### Verification Process:

1. **List all atomics in the target scope:**
   ```bash
   rg -n "atomic_(fetch_add|load|store|compare_exchange).*g_target" core/
   ```

2. **Track all usage sites:**
   ```bash
   rg -n "g_target_atomic" core/
   ```

3. **Check each usage:**
   - Is it in an `if` condition? → **CORRECTNESS**
   - Is it only in `printf/fprintf`? → **TELEMETRY**
   - Unsure? → **CORRECTNESS** (safe default)

4. **Document the classification:**
   ```markdown
   ## Atomic Classification

   ### g_alloc_stats (TELEMETRY)
   - core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
   - core/hakmem.c:89: fprintf output only
   - **Verdict:** TELEMETRY ✅

   ### g_bg_spill_len (CORRECTNESS)
   - core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
   - **Verdict:** CORRECTNESS ❌ DO NOT TOUCH
   ```

**Output:** Classification table in `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation (Phase 24-27 pattern)

**Purpose:** Remove telemetry atomics at build level (compile-out, not link-out)

#### A. Add a Compile Gate to BuildFlags

**File:** `core/hakmem_build_flags.h`

```c
// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
# define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
```

**Example:**

```c
// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
# define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif
```

#### B. Wrap TELEMETRY Atomics with #if

**Pattern:**

```c
#if HAKMEM_[NAME]_STATS_COMPILED
atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
(void)0; // No-op when compiled out
#endif
```

**Example:**

```c
#if HAKMEM_ALLOC_GATE_STATS_COMPILED
atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
(void)0;
#endif
```

#### C. Keep Variable Definitions (important!)

**Do NOT remove:**

```c
// Keep the atomic variable definition (needed for the COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;

// Keep print functions, guarded by the same flag
// (their call sites need the same #if, or COMPILED=0 builds fail to link)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
    // PRIu64 (from <inttypes.h>) matches uint64_t on every platform
    fprintf(stderr, "counter=%" PRIu64 "\n", atomic_load(&g_stat_counter));
}
#endif
```

#### D. Prohibited Actions (Phase 22-2 NO-GO lesson)

**NEVER:**

- ❌ Link-out (removing `.o` files from the Makefile)
- ❌ Deleting API functions (breaks linkage)
- ❌ Removing struct definitions (breaks compilation)
- ❌ Runtime `if` checks (add branch overhead)

**Rationale:** A build-level `#if` has zero runtime cost. Link-out risks ABI breaks.

---

### Step 3: A/B Test (build-level comparison)

**Purpose:** Measure the impact of compiling the stats out vs. compiling them in

#### A. Baseline Build (COMPILED=0, default)

```bash
# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
```

#### B. Compiled-In Build (COMPILED=1)

```bash
# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### C. Compare Results

```bash
# Calculate the delta
scripts/compare_benchmark_results.sh \
  docs/analysis/PHASE[N]_BASELINE.txt \
  docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### D. Decision Matrix

The delta is the performance of the compiled-out (baseline) build relative to the compiled-in build. A positive delta means that compiling the stats out made the benchmark faster.

| Delta | Verdict | Action |
|-------|---------|--------|
| **+0.5% or higher** | **GO** | Keep compile-out, document the win |
| **Within ±0.5%** | **NEUTRAL** | Keep for code cleanliness |
| **-0.5% or lower** | **NO-GO** | Revert changes |

**Rationale:**

- +0.5% or higher: outside the noise band, treated as a real gain (HOT path impact)
- Within ±0.5%: noise range (cleanliness is still valuable)
- -0.5% or lower: unexpected regression (likely measurement error); revert

**Output:** `PHASE[N]_RESULTS.md` with the full comparison

---

## 3. Phase Checklist Template

Copy this for each new phase:

````markdown
## Phase [N]: [Target Description] Atomic Prune

**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]

---

### Step 0: Execution Verification ✅/❌

- [ ] **ENV Gate Check**
  ```bash
  rg "getenv.*[FEATURE]" core/
  ```
  Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]

- [ ] **Execution Counter Verification**
  ```bash
  rg -n "atomic.*g_target" core/
  scripts/run_mixed_10_cleanenv.sh
  grep "target_counter" results/*.txt
  ```
  Result: [Counter > 0 in all runs / Counter = 0 / Not visible]

- [ ] **perf Profile Check (optional)**
  ```bash
  perf record -g -F 99 -- ./bench_random_mixed_hakmem
  perf report --stdio | grep "target_function"
  ```
  Result: [Function appears in profile / Not in profile]

**Verdict:** [✅ PROCEED / ❌ SKIP (reason)]

---

### Step 1: CORRECTNESS/TELEMETRY Classification

- [ ] **List All Atomics**
  ```bash
  rg -n "atomic_(fetch_add|load|store|compare_exchange).*g_" [target_file]
  ```

- [ ] **Track All Usage Sites**
  ```bash
  rg -n "g_atomic_var" core/
  ```

- [ ] **Classify Each Atomic**

  | Atomic Variable | Usage | Class | Verdict |
  |-----------------|-------|-------|---------|
  | `g_var1` | `if` condition | CORRECTNESS | ❌ DO NOT TOUCH |
  | `g_var2` | `fprintf` only | TELEMETRY | ✅ Candidate |

- [ ] **Document Classification Rationale**

**Output:** Classification table saved to `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation

- [ ] **Add BuildFlags Gate**
  ```c
  // core/hakmem_build_flags.h
  #ifndef HAKMEM_[NAME]_STATS_COMPILED
  # define HAKMEM_[NAME]_STATS_COMPILED 0
  #endif
  ```

- [ ] **Wrap TELEMETRY Atomics**
  ```c
  #if HAKMEM_[NAME]_STATS_COMPILED
  atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
  #else
  (void)0;
  #endif
  ```

- [ ] **Verify Compilation**
  ```bash
  make clean && make -j  # COMPILED=0 default
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
  ```

---

### Step 3: A/B Test

- [ ] **Baseline Build (COMPILED=0)**
  ```bash
  make clean && make -j bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
  ```

- [ ] **Compiled-In Build (COMPILED=1)**
  ```bash
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Compare Results**
  ```bash
  scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Record Verdict**
  - Delta: [+X.XX%]
  - Verdict: [GO / NEUTRAL / NO-GO]
  - Rationale: [...]

**Output:** `PHASE[N]_RESULTS.md` with full comparison

---

### Deliverables

- [ ] `PHASE[N]_AUDIT.md` - Classification and execution verification
- [ ] `PHASE[N]_BASELINE.txt` - Baseline benchmark results
- [ ] `PHASE[N]_COMPILED_IN.txt` - Compiled-in benchmark results
- [ ] `PHASE[N]_RESULTS.md` - A/B comparison and verdict
- [ ] Update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` with Phase [N] results
- [ ] Update `CURRENT_TASK.md` with next phase

---

### Notes

[Add any phase-specific observations, gotchas, or learnings here]
````

---

## 4. Success Criteria

A phase is considered **GO** if:

1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: Performance delta of +0.5% or higher

A phase is **NO-OP** if:

- ❌ Step 0: Not executed in the benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)
- ❌ Step 3: Delta within the ±0.5% noise range (no performance win; per the Step 3 matrix the change is still kept as NEUTRAL for cleanliness)

---

## 5. Anti-Patterns to Avoid

### ❌ Skipping Execution Verification (Phase 29)

**Problem:** Optimizing ENV-gated code that never runs

**Solution:** Always run Step 0 before any other work

### ❌ Assuming Counter = Telemetry (Phase 28)

**Problem:** Flow-control atomics look like counters

**Solution:** Check all usage sites, especially `if` conditions

### ❌ Link-Out Instead of Compile-Out (Phase 22-2)

**Problem:** ABI breaks, mysterious link errors

**Solution:** Use `#if` preprocessor guards; never remove `.o` files

### ❌ Runtime Flags for Stats (not attempted here, but a common mistake)

**Problem:** `if (g_enable_stats)` adds branch overhead

**Solution:** A build-level `#if` has zero runtime cost

---

## 6. Expected Impact by Path Type

Based on Phase 24-29 results:

| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Not applicable (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |

---

## 7. Tools and Scripts

### Execution Verification

```bash
# ENV gate check
rg "getenv.*FEATURE" core/

# Counter check (requires a benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt

# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report --stdio | grep "function_name"
```

### Classification Audit

```bash
# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]

# Track variable usage
rg -n "g_variable_name" core/

# Find if/while/for conditions
rg -n "(if|while|for).*g_variable_name" core/
```

### A/B Testing

```bash
# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
cp results/mixed_10_summary.txt baseline.txt

# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_FEATURE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
cp results/mixed_10_summary.txt compiled_in.txt

# Compare (if the script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt
```

---

## 8. Governance

**When to Use This Procedure:**

- Any new atomic prune phase (Phase 31+)
- Reviewing existing compile-out flags for consistency
- Training new contributors on atomic optimization

**When to Skip:**

- Non-atomic optimizations (inlining, data structure changes)
- Known CORRECTNESS atomics (Step 1 already failed)
- Features explicitly marked "do not optimize"

**Document Updates:**

- Update this procedure after each phase if new patterns emerge
- Record each phase's results in `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- Add new anti-patterns to Section 5

---

## 9. References

- **Phase 24 Results:** `docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md` (+0.93%)
- **Phase 25 Results:** `docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md` (+1.07%)
- **Phase 27 Results:** `docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md` (+0.74%)
- **Phase 28 NO-OP:** `docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md` (CORRECTNESS)
- **Phase 29 NO-OP:** `docs/analysis/PHASE29_POOL_V2_AUDIT.md` (ENV-gated)
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`

---

**End of Standard Procedure Document**

**Next:** Apply Step 0 to the Phase 31 candidates to confirm they execute before optimizing them.