hakmem/docs/analysis/PHASE30_STANDARD_PROCEDURE.md

# Phase 30: Standard Procedure for Atomic Prune Operations

**Date:** 2025-12-16
**Status:** PROCEDURE STANDARDIZATION
**Purpose:** Codify learnings from Phase 24-29 to prevent no-op phases

---

## Executive Summary

Phase 24-29 taught us critical lessons about atomic pruning success factors:
- **GO phases** (+2.74% cumulative): HOT/WARM path telemetry atomic removal works
- **NO-OP phases** (Phase 28-29): Correctness atomics and ENV-gated code waste effort

This document standardizes a 4-step procedure to ensure future phases target high-impact, executable code.

---

## 1. Phase 24-29 Cumulative Lessons

### Phase 24-27: GO (+2.74% cumulative)

**Pattern: HOT/WARM path telemetry atomic removal**

- **Phase 24 (alloc stats)**: +0.93%
  - Removed `atomic_fetch_add` in `malloc_tiny_fast()` hot path
  - Stats compiled out with `HAKMEM_ALLOC_GATE_STATS_COMPILED=0`

- **Phase 25 (free stats)**: +1.07%
  - Removed `atomic_fetch_add` in `free_tiny_fast_hotcold()` hot path
  - Stats compiled out with `HAKMEM_FREE_PATH_STATS_COMPILED=0`

- **Phase 27 (unified cache)**: +0.74%
  - Removed `atomic_fetch_add` in TLS cache hit path
  - Stats compiled out with `HAKMEM_TINY_FRONT_STATS_COMPILED=0`

**Success Factors:**
- ✅ Executed in every allocation/free (HOT path)
- ✅ Pure telemetry (stats only, no control flow)
- ✅ Build-level compile-out (no runtime overhead)

### Phase 26: NEUTRAL (code cleanliness)

**Pattern: Low-frequency but still compile-out**

- Tiny header tracking stats (COLD path)
- No performance impact but maintains future maintainability
- Kept compile-out mechanism for consistency

**Lesson:** Even low-frequency telemetry benefits from compile-out for code cleanliness.

### Phase 28: NO-OP (CORRECTNESS atomics)

**Anti-pattern: Misidentified counter purpose**

- **Target:** `g_bg_spill_len` (looked like a counter)
- **Reality:** Flow control atomic (queue depth tracking)
- **Usage:**
  ```c
  if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
      // Decision-making logic
  }
  ```

**Critical Lesson:**
**Counter name ≠ Counter purpose**

**CORRECTNESS atomics (NEVER touch):**
- Used in `if/while` conditions
- Flow control (queue depth, threshold checks)
- Lock-free synchronization (CAS, load-store ordering)
- Affects program behavior if removed

### Phase 29: NO-OP (ENV-gated, not executed)

**Anti-pattern: Optimizing dead code**

- **Target:** Pool v2 stats atomics
- **Reality:** Gated by `getenv("HAKMEM_POOL_V2")` = OFF by default
- **Benchmark:** Never executes pool v2 code paths
- **Result:** Zero impact on measurements

**Critical Lesson:**
**Execution verification is MANDATORY before optimization**

---

## 2. Standard Procedure (4 Steps)

### Step 0: Execution Verification (MANDATORY GATE) ⚠️

**Purpose:** Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)

#### Methods:

**A. ENV Gate Check**
```bash
# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/  # Example
```

**B. Execution Counter Verification**

1. **Find counter reference:**
   ```bash
   rg -n "atomic.*g_target_counter" core/
   ```

2. **Check counter in benchmark output:**
   ```bash
   # Run mixed benchmark 10 times
   scripts/run_mixed_10_cleanenv.sh

   # Check if counter > 0 in any run
   grep "target_counter" results/*.txt
   ```

3. **Optional: Add debug printf (if counter not visible):**
   ```c
   #if HAKMEM_DEBUG_PRINT
   fprintf(stderr, "[DEBUG] counter=%lu\n",
           atomic_load(&g_target_counter));
   #endif
   ```

**C. perf/flamegraph Verification (optional but recommended)**
```bash
# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem

# Check if function appears in profile
perf report | grep "target_function"
```

#### Decision Matrix:

| Condition | Action |
|-----------|--------|
| ✅ Counter > 0 in benchmark | Proceed to Step 1 |
| ✅ Function in perf profile | Proceed to Step 1 |
| ❌ ENV gated + OFF by default | **SKIP** (Phase 29 pattern) |
| ❌ Counter = 0 in all runs | **SKIP** (not executed) |
| ❌ Function not in flamegraph | **SKIP** (negligible frequency) |

**Output:** Document execution verification results in `PHASE[N]_AUDIT.md`

---

### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

**Purpose:** Distinguish between atomics that control behavior vs. atomics that just observe

#### Classification Rules:

**CORRECTNESS (NEVER touch):**
- ❌ Used in `if/while/for` conditions
- ❌ Flow control (queue depth, threshold, capacity checks)
- ❌ Lock-free synchronization (CAS, `atomic_compare_exchange_*`)
- ❌ Load-store ordering dependencies
- ❌ Affects program decisions/behavior

**Examples:**
```c
// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }

// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }

// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired)
```

**TELEMETRY (compile-out candidate):**
- ✅ Stats/logging/observation only
- ✅ Used exclusively in `printf/fprintf/sprintf`
- ✅ Deletion changes no program behavior
- ✅ Pure counters (hits, misses, totals)

**Examples:**
```c
// TELEMETRY: Stats only
atomic_fetch_add(&stats[idx].hits, 1, memory_order_relaxed);

// TELEMETRY: Logging only
fprintf(stderr, "allocs=%lu\n", atomic_load(&g_alloc_count));
```

#### Verification Process:

1. **List all atomics in target scope:**
   ```bash
   rg -n "atomic_(fetch_add|load|store).*g_target" core/
   ```

2. **Track all usage sites:**
   ```bash
   rg -n "g_target_atomic" core/
   ```

3. **Check each usage:**
   - Is it in an `if` condition? → **CORRECTNESS**
   - Is it only in `printf/fprintf`? → **TELEMETRY**
   - Unsure? → **CORRECTNESS** (safe default)

4. **Document classification:**
   ```markdown
   ## Atomic Classification

   ### g_alloc_stats (TELEMETRY)
   - core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
   - core/hakmem.c:89: fprintf output only
   - **Verdict:** TELEMETRY ✅

   ### g_bg_spill_len (CORRECTNESS)
   - core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
   - **Verdict:** CORRECTNESS ❌ DO NOT TOUCH
   ```

**Output:** Classification table in `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation (Phase 24-27 pattern)

**Purpose:** Build-level removal of telemetry atomics (not link-out)

#### A. Add Compile Gate to BuildFlags

**File:** `core/hakmem_build_flags.h`

```c
// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
#  define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
```

**Example:**
```c
// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
#  define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif
```

#### B. Wrap TELEMETRY Atomics with #if

**Pattern:**
```c
#if HAKMEM_[NAME]_STATS_COMPILED
    atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
    (void)0;  // No-op when compiled out
#endif
```

**Example:**
```c
#if HAKMEM_ALLOC_GATE_STATS_COMPILED
    atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
    (void)0;
#endif
```

#### C. Keep Variable Definitions (important!)

**Do NOT remove:**
```c
// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;

// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
    fprintf(stderr, "counter=%lu\n", atomic_load(&g_stat_counter));
}
#endif
```

#### D. Prohibited Actions (Phase 22-2 NO-GO lesson)

**NEVER:**
- ❌ Link-out (removing `.o` files from Makefile)
- ❌ Deleting API functions (breaks linkage)
- ❌ Removing struct definitions (breaks compilation)
- ❌ Runtime `if` checks (adds branch overhead)

**Rationale:** Build-level `#if` has zero runtime cost. Link-out risks ABI breaks.

---

### Step 3: A/B Test (build-level comparison)

**Purpose:** Measure impact of compile-out vs. compiled-in

#### A. Baseline Build (COMPILED=0, default)

```bash
# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
```

#### B. Compiled-In Build (COMPILED=1)

```bash
# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### C. Compare Results

```bash
# Calculate delta
scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### D. Decision Matrix

| Delta | Verdict | Action |
|-------|---------|--------|
| **+0.5% or higher** | **GO** | Keep compile-out, document win |
| **±0.5%** | **NEUTRAL** | Keep for code cleanliness |
| **-0.5% or lower** | **NO-GO** | Revert changes |

**Rationale:**
- +0.5%: Statistically significant (HOT path impact)
- ±0.5%: Noise range (but cleanliness still valuable)
- -0.5%: Unexpected regression (likely measurement error, revert)

**Output:** `PHASE[N]_RESULTS.md` with full comparison

---

## 3. Phase Checklist Template

Copy this for each new phase:

```markdown
## Phase [N]: [Target Description] Atomic Prune

**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]

---

### Step 0: Execution Verification ✅/❌

- [ ] **ENV Gate Check**
  ```bash
  rg "getenv.*[FEATURE]" core/
  ```
  Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]

- [ ] **Execution Counter Verification**
  ```bash
  rg -n "atomic.*g_target" core/
  scripts/run_mixed_10_cleanenv.sh
  grep "target_counter" results/*.txt
  ```
  Result: [Counter > 0 in all runs / Counter = 0 / Not visible]

- [ ] **perf Profile Check (optional)**
  ```bash
  perf record -g -F 99 -- ./bench_random_mixed_hakmem
  perf report | grep "target_function"
  ```
  Result: [Function appears in profile / Not in profile]

**Verdict:** [✅ PROCEED / ❌ SKIP (reason)]

---

### Step 1: CORRECTNESS/TELEMETRY Classification

- [ ] **List All Atomics**
  ```bash
  rg -n "atomic_(fetch_add|load|store).*g_" [target_file]
  ```

- [ ] **Track All Usage Sites**
  ```bash
  rg -n "g_atomic_var" core/
  ```

- [ ] **Classify Each Atomic**

  | Atomic Variable | Usage | Class | Verdict |
  |-----------------|-------|-------|---------|
  | `g_var1` | `if` condition | CORRECTNESS | ❌ DO NOT TOUCH |
  | `g_var2` | `fprintf` only | TELEMETRY | ✅ Candidate |

- [ ] **Document Classification Rationale**

**Output:** Classification table saved to `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation

- [ ] **Add BuildFlags Gate**
  ```c
  // core/hakmem_build_flags.h
  #ifndef HAKMEM_[NAME]_STATS_COMPILED
  #  define HAKMEM_[NAME]_STATS_COMPILED 0
  #endif
  ```

- [ ] **Wrap TELEMETRY Atomics**
  ```c
  #if HAKMEM_[NAME]_STATS_COMPILED
      atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
  #else
      (void)0;
  #endif
  ```

- [ ] **Verify Compilation**
  ```bash
  make clean && make -j  # COMPILED=0 default
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
  ```

---

### Step 3: A/B Test

- [ ] **Baseline Build (COMPILED=0)**
  ```bash
  make clean && make -j bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
  ```

- [ ] **Compiled-In Build (COMPILED=1)**
  ```bash
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Compare Results**
  ```bash
  scripts/compare_benchmark_results.sh \
      docs/analysis/PHASE[N]_BASELINE.txt \
      docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Record Verdict**
  - Delta: [+X.XX%]
  - Verdict: [GO / NEUTRAL / NO-GO]
  - Rationale: [...]

**Output:** `PHASE[N]_RESULTS.md` with full comparison

---

### Deliverables

- [ ] `PHASE[N]_AUDIT.md` - Classification and execution verification
- [ ] `PHASE[N]_BASELINE.txt` - Baseline benchmark results
- [ ] `PHASE[N]_COMPILED_IN.txt` - Compiled-in benchmark results
- [ ] `PHASE[N]_RESULTS.md` - A/B comparison and verdict
- [ ] Update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` with Phase [N] results
- [ ] Update `CURRENT_TASK.md` with next phase

---

### Notes

[Add any phase-specific observations, gotchas, or learnings here]
```

---

## 4. Success Criteria

A phase is considered **GO** if:
1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: +0.5% or higher performance delta

A phase is **NO-OP** if:
- ❌ Step 0: Not executed in benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)
- ❌ Step 3: Delta within ±0.5% noise range

---

## 5. Anti-Patterns to Avoid

### ❌ Skipping Execution Verification (Phase 29)
**Problem:** Optimizing ENV-gated code that never runs
**Solution:** Always run Step 0 before any work

### ❌ Assuming Counter = Telemetry (Phase 28)
**Problem:** Flow control atomics look like counters
**Solution:** Check all usage sites, especially `if` conditions

### ❌ Link-Out Instead of Compile-Out (Phase 22-2)
**Problem:** ABI breaks, mysterious link errors
**Solution:** Use `#if` preprocessor guards, never remove `.o` files

### ❌ Runtime Flags for Stats (not attempted, but common mistake)
**Problem:** `if (g_enable_stats)` adds branch overhead
**Solution:** Build-level `#if` has zero runtime cost

---

## 6. Expected Impact by Path Type

Based on Phase 24-29 results:

| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **±0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Undefined (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |

---

## 7. Tools and Scripts

### Execution Verification
```bash
# ENV gate check
rg "getenv.*FEATURE" core/

# Counter check (requires benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt

# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "function_name"
```

### Classification Audit
```bash
# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]

# Track variable usage
rg -n "g_variable_name" core/

# Find if conditions
rg -n "if.*g_variable" core/
```

### A/B Testing
```bash
# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_FEATURE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compare (if script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt
```

---

## 8. Governance

**When to Use This Procedure:**
- Any new atomic prune phase (Phase 31+)
- Reviewing existing compile-out flags for consistency
- Training new contributors on atomic optimization

**When to Skip:**
- Non-atomic optimizations (inlining, data structure changes)
- Known CORRECTNESS atomics (Step 1 already failed)
- Features explicitly marked "do not optimize"

**Document Updates:**
- This procedure should be updated after each phase if new patterns emerge
- Phase results should update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- New anti-patterns should be added to Section 5

---

## 9. References

- **Phase 24 Results:** `docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md` (+0.93%)
- **Phase 25 Results:** `docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md` (+1.07%)
- **Phase 27 Results:** `docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md` (+0.74%)
- **Phase 28 NO-OP:** `docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md` (CORRECTNESS)
- **Phase 29 NO-OP:** `docs/analysis/PHASE29_POOL_V2_AUDIT.md` (ENV-gated)
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`

---

**End of Standard Procedure Document**

**Next:** Apply Step 0 to Phase 31 candidates to ensure execution before optimization.
Phase 30-31: Standard procedure + g_tiny_free_trace atomic prune Phase 30: Standard Procedure Establishment - Created 4-step standardized methodology (Step 0-3) - Step 0: Execution Verification (NEW - Phase 29 lesson) - Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson) - Step 2: Compile-Out Implementation (Phase 24-27 pattern) - Step 3: A/B Test (build-level comparison) - Executed audit_atomics.sh: 412 atomics analyzed - Identified Phase 31 candidate: g_tiny_free_trace (HOT path, TOP PRIORITY) Phase 31: g_tiny_free_trace Compile-Out (HOT Path TELEMETRY) - Target: core/hakmem_tiny_free.inc:326 (trace-rate-limit atomic) - Added HAKMEM_TINY_FREE_TRACE_COMPILED (default: 0) - Classification: Pure TELEMETRY (trace output only, no flow control) - A/B Result: NEUTRAL (baseline -0.35% mean, +0.19% median) - Verdict: NEUTRAL → Adopted for code cleanliness (Phase 26 precedent) - Rationale: HOT path TELEMETRY removal improves code quality A/B Test Details: - Baseline (COMPILED=0): 53.638M ops/s mean, 53.799M median - Compiled-in (COMPILED=1): 53.828M ops/s mean, 53.697M median - Conflicting signals within ±0.5% noise margin - Phase 25 comparison: g_free_ss_enter (+1.07% GO) vs g_tiny_free_trace (NEUTRAL) - Hypothesis: Rate-limited atomic (128 calls) optimized by compiler Cumulative Progress (Phase 24-31): - Phase 24 (class stats): +0.93% GO - Phase 25 (free stats): +1.07% GO - Phase 26 (diagnostics): -0.33% NEUTRAL - Phase 27 (unified cache): +0.74% GO - Phase 28 (bg spill): NO-OP (all CORRECTNESS) - Phase 29 (pool v2): NO-OP (ENV-gated) - Phase 30 (procedure): PROCEDURE - Phase 31 (free trace): -0.35% NEUTRAL - Total: 18 atomics removed, +2.74% net improvement Documentation Created: - PHASE30_STANDARD_PROCEDURE.md: Complete 4-step methodology - ATOMIC_AUDIT_FULL.txt: 412 atomics comprehensive audit - PHASE31_CANDIDATES_HOT/WARM.txt: Priority-sorted candidates - PHASE31_RECOMMENDED_CANDIDATES.md: TOP 3 with Step 0 verification - PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md: Complete A/B results - ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated (Phase 30-31) - CURRENT_TASK.md: Phase 32 candidate identified (g_hak_tiny_free_calls) Key Lessons: - Lesson 7 (Phase 30): Step 0 execution verification prevents wasted effort - Lesson 8 (Phase 31): NEUTRAL + code cleanliness = valid adoption - HOT path ≠ guaranteed performance win (rate-limited atomics may be optimized) Next Phase: Phase 32 candidate (g_hak_tiny_free_calls) - Location: core/hakmem_tiny_free.inc:335 (9 lines below Phase 31 target) - Expected: +0.3~0.7% or NEUTRAL Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> 2025-12-16 07:31:15 +09:00			`# Phase 30: Standard Procedure for Atomic Prune Operations`

			`Date: 2025-12-16`
			`Status: PROCEDURE STANDARDIZATION`
			`Purpose: Codify learnings from Phase 24-29 to prevent no-op phases`

			`---`

			`## Executive Summary`

			`Phase 24-29 taught us critical lessons about atomic pruning success factors:`
			`- GO phases (+2.74% cumulative): HOT/WARM path telemetry atomic removal works`
			`- NO-OP phases (Phase 28-29): Correctness atomics and ENV-gated code waste effort`

			`This document standardizes a 4-step procedure to ensure future phases target high-impact, executable code.`

			`---`

			`## 1. Phase 24-29 Cumulative Lessons`

			`### Phase 24-27: GO (+2.74% cumulative)`

			`Pattern: HOT/WARM path telemetry atomic removal`

			`- Phase 24 (alloc stats): +0.93%`
			- Removed `atomic_fetch_add` in `malloc_tiny_fast()` hot path
			- Stats compiled out with `HAKMEM_ALLOC_GATE_STATS_COMPILED=0`

			`- Phase 25 (free stats): +1.07%`
			- Removed `atomic_fetch_add` in `free_tiny_fast_hotcold()` hot path
			- Stats compiled out with `HAKMEM_FREE_PATH_STATS_COMPILED=0`

			`- Phase 27 (unified cache): +0.74%`
			- Removed `atomic_fetch_add` in TLS cache hit path
			- Stats compiled out with `HAKMEM_TINY_FRONT_STATS_COMPILED=0`

			`Success Factors:`
			`- ✅ Executed in every allocation/free (HOT path)`
			`- ✅ Pure telemetry (stats only, no control flow)`
			`- ✅ Build-level compile-out (no runtime overhead)`

			`### Phase 26: NEUTRAL (code cleanliness)`

			`Pattern: Low-frequency but still compile-out`

			`- Tiny header tracking stats (COLD path)`
			`- No performance impact but maintains future maintainability`
			`- Kept compile-out mechanism for consistency`

			`Lesson: Even low-frequency telemetry benefits from compile-out for code cleanliness.`

			`### Phase 28: NO-OP (CORRECTNESS atomics)`

			`Anti-pattern: Misidentified counter purpose`

			- Target: `g_bg_spill_len` (looked like a counter)
			`- Reality: Flow control atomic (queue depth tracking)`
			`- Usage:`
			```c
			`if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {`
			`// Decision-making logic`
			`}`
			```

			`Critical Lesson:`
			`Counter name ≠ Counter purpose`

			`CORRECTNESS atomics (NEVER touch):`
			- Used in `if/while` conditions
			`- Flow control (queue depth, threshold checks)`
			`- Lock-free synchronization (CAS, load-store ordering)`
			`- Affects program behavior if removed`

			`### Phase 29: NO-OP (ENV-gated, not executed)`

			`Anti-pattern: Optimizing dead code`

			`- Target: Pool v2 stats atomics`
			- Reality: Gated by `getenv("HAKMEM_POOL_V2")` = OFF by default
			`- Benchmark: Never executes pool v2 code paths`
			`- Result: Zero impact on measurements`

			`Critical Lesson:`
			`Execution verification is MANDATORY before optimization`

			`---`

			`## 2. Standard Procedure (4 Steps)`

			`### Step 0: Execution Verification (MANDATORY GATE) ⚠️`

			`Purpose: Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)`

			`#### Methods:`

			`A. ENV Gate Check`
			```bash
			`# Check if feature is runtime-disabled`
			`rg "getenv.*FEATURE_NAME" core/`
			`rg "getenv.*POOL_V2" core/ # Example`
			```

			`B. Execution Counter Verification`

			`1. Find counter reference:`
			```bash
			`rg -n "atomic.*g_target_counter" core/`
			```

			`2. Check counter in benchmark output:`
			```bash
			`# Run mixed benchmark 10 times`
			`scripts/run_mixed_10_cleanenv.sh`

			`# Check if counter > 0 in any run`
			`grep "target_counter" results/*.txt`
			```

			`3. Optional: Add debug printf (if counter not visible):`
			```c
			`#if HAKMEM_DEBUG_PRINT`
			`fprintf(stderr, "[DEBUG] counter=%lu\n",`
			`atomic_load(&g_target_counter));`
			`#endif`
			```

			`C. perf/flamegraph Verification (optional but recommended)`
			```bash
			`# Record with perf`
			`perf record -g -F 99 -- ./bench_random_mixed_hakmem`

			`# Check if function appears in profile`
			`perf report \| grep "target_function"`
			```

			`#### Decision Matrix:`

			`\| Condition \| Action \|`
			`\|-----------\|--------\|`
			`\| ✅ Counter > 0 in benchmark \| Proceed to Step 1 \|`
			`\| ✅ Function in perf profile \| Proceed to Step 1 \|`
			`\| ❌ ENV gated + OFF by default \| SKIP (Phase 29 pattern) \|`
			`\| ❌ Counter = 0 in all runs \| SKIP (not executed) \|`
			`\| ❌ Function not in flamegraph \| SKIP (negligible frequency) \|`

			Output: Document execution verification results in `PHASE[N]_AUDIT.md`

			`---`

			`### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)`

			`Purpose: Distinguish between atomics that control behavior vs. atomics that just observe`

			`#### Classification Rules:`

			`CORRECTNESS (NEVER touch):`
			- ❌ Used in `if/while/for` conditions
			`- ❌ Flow control (queue depth, threshold, capacity checks)`
			- ❌ Lock-free synchronization (CAS, `atomic_compare_exchange_*`)
			`- ❌ Load-store ordering dependencies`
			`- ❌ Affects program decisions/behavior`

			`Examples:`
			```c
			`// CORRECTNESS: Controls loop behavior`
			`while (atomic_load(&g_queue_len) < target) { ... }`

			`// CORRECTNESS: Threshold check`
			`if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }`

			`// CORRECTNESS: CAS synchronization`
			`atomic_compare_exchange_weak(&g_state, &expected, desired)`
			```

			`TELEMETRY (compile-out candidate):`
			`- ✅ Stats/logging/observation only`
			- ✅ Used exclusively in `printf/fprintf/sprintf`
			`- ✅ Deletion changes no program behavior`
			`- ✅ Pure counters (hits, misses, totals)`

			`Examples:`
			```c
			`// TELEMETRY: Stats only`
			`atomic_fetch_add(&stats[idx].hits, 1, memory_order_relaxed);`

			`// TELEMETRY: Logging only`
			`fprintf(stderr, "allocs=%lu\n", atomic_load(&g_alloc_count));`
			```

			`#### Verification Process:`

			`1. List all atomics in target scope:`
			```bash
			`rg -n "atomic_(fetch_add\|load\|store).*g_target" core/`
			```

			`2. Track all usage sites:`
			```bash
			`rg -n "g_target_atomic" core/`
			```

			`3. Check each usage:`
			- Is it in an `if` condition? → CORRECTNESS
			- Is it only in `printf/fprintf`? → TELEMETRY
			`- Unsure? → CORRECTNESS (safe default)`

			`4. Document classification:`
			```markdown
			`## Atomic Classification`

			`### g_alloc_stats (TELEMETRY)`
			`- core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)`
			`- core/hakmem.c:89: fprintf output only`
			`- Verdict: TELEMETRY ✅`

			`### g_bg_spill_len (CORRECTNESS)`
			`- core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)`
			`- Verdict: CORRECTNESS ❌ DO NOT TOUCH`
			```

			Output: Classification table in `PHASE[N]_AUDIT.md`

			`---`

			`### Step 2: Compile-Out Implementation (Phase 24-27 pattern)`

			`Purpose: Build-level removal of telemetry atomics (not link-out)`

			`#### A. Add Compile Gate to BuildFlags`

			File: `core/hakmem_build_flags.h`

			```c
			`// ========== [Feature Name] Stats (Phase N) ==========`
			`#ifndef HAKMEM_[NAME]_STATS_COMPILED`
			`# define HAKMEM_[NAME]_STATS_COMPILED 0`
			`#endif`
			```

			`Example:`
			```c
			`// ========== Alloc Gate Stats (Phase 24) ==========`
			`#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED`
			`# define HAKMEM_ALLOC_GATE_STATS_COMPILED 0`
			`#endif`
			```

			`#### B. Wrap TELEMETRY Atomics with #if`

			`Pattern:`
			```c
			`#if HAKMEM_[NAME]_STATS_COMPILED`
			`atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);`
			`#else`
			`(void)0; // No-op when compiled out`
			`#endif`
			```

			`Example:`
			```c
			`#if HAKMEM_ALLOC_GATE_STATS_COMPILED`
			`atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);`
			`#else`
			`(void)0;`
			`#endif`
			```

			`#### C. Keep Variable Definitions (important!)`

			`Do NOT remove:`
			```c
			`// Keep atomic variable definition (for COMPILED=1 case)`
			`static _Atomic uint64_t g_stat_counter = 0;`

			`// Keep print functions (guarded by same flag)`
			`#if HAKMEM_[NAME]_STATS_COMPILED`
			`void print_stats(void) {`
			`fprintf(stderr, "counter=%lu\n", atomic_load(&g_stat_counter));`
			`}`
			`#endif`
			```

			`#### D. Prohibited Actions (Phase 22-2 NO-GO lesson)`

			`NEVER:`
			- ❌ Link-out (removing `.o` files from Makefile)
			`- ❌ Deleting API functions (breaks linkage)`
			`- ❌ Removing struct definitions (breaks compilation)`
			- ❌ Runtime `if` checks (adds branch overhead)

			Rationale: Build-level `#if` has zero runtime cost. Link-out risks ABI breaks.

			`---`

			`### Step 3: A/B Test (build-level comparison)`

			`Purpose: Measure impact of compile-out vs. compiled-in`

			`#### A. Baseline Build (COMPILED=0, default)`

			```bash
			`# Clean build with stats compiled OUT`
			`make clean`
			`make -j bench_random_mixed_hakmem`

			`# Run 10 iterations`
			`scripts/run_mixed_10_cleanenv.sh`

			`# Record results`
			`cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt`
			```

			`#### B. Compiled-In Build (COMPILED=1)`

			```bash
			`# Clean build with stats compiled IN`
			`make clean`
			`make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem`

			`# Run 10 iterations`
			`scripts/run_mixed_10_cleanenv.sh`

			`# Record results`
			`cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt`
			```

			`#### C. Compare Results`

			```bash
			`# Calculate delta`
			`scripts/compare_benchmark_results.sh \`
			`docs/analysis/PHASE[N]_BASELINE.txt \`
			`docs/analysis/PHASE[N]_COMPILED_IN.txt`
			```

			`#### D. Decision Matrix`

			`\| Delta \| Verdict \| Action \|`
			`\|-------\|---------\|--------\|`
			`\| +0.5% or higher \| GO \| Keep compile-out, document win \|`
			`\| ±0.5% \| NEUTRAL \| Keep for code cleanliness \|`
			`\| -0.5% or lower \| NO-GO \| Revert changes \|`

			`Rationale:`
			`- +0.5%: Statistically significant (HOT path impact)`
			`- ±0.5%: Noise range (but cleanliness still valuable)`
			`- -0.5%: Unexpected regression (likely measurement error, revert)`

			Output: `PHASE[N]_RESULTS.md` with full comparison

			`---`

			`## 3. Phase Checklist Template`

			`Copy this for each new phase:`

			```markdown
			`## Phase [N]: [Target Description] Atomic Prune`

			`Date: YYYY-MM-DD`
			`Target: [Atomic variable/scope name]`
			`Expected Impact: [HOT/WARM/COLD path, estimated %]`

			`---`

			`### Step 0: Execution Verification ✅/❌`

			`- [ ] ENV Gate Check`
			```bash
			`rg "getenv.*[FEATURE]" core/`
			```
			`Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]`

			`- [ ] Execution Counter Verification`
			```bash
			`rg -n "atomic.*g_target" core/`
			`scripts/run_mixed_10_cleanenv.sh`
			`grep "target_counter" results/*.txt`
			```
			`Result: [Counter > 0 in all runs / Counter = 0 / Not visible]`

			`- [ ] perf Profile Check (optional)`
			```bash
			`perf record -g -F 99 -- ./bench_random_mixed_hakmem`
			`perf report \| grep "target_function"`
			```
			`Result: [Function appears in profile / Not in profile]`

			`Verdict: [✅ PROCEED / ❌ SKIP (reason)]`

			`---`

			`### Step 1: CORRECTNESS/TELEMETRY Classification`

			`- [ ] List All Atomics`
			```bash
			`rg -n "atomic_(fetch_add\|load\|store).*g_" [target_file]`
			```

			`- [ ] Track All Usage Sites`
			```bash
			`rg -n "g_atomic_var" core/`
			```

			`- [ ] Classify Each Atomic`

			`\| Atomic Variable \| Usage \| Class \| Verdict \|`
			`\|-----------------\|-------\|-------\|---------\|`
			\| `g_var1` \| `if` condition \| CORRECTNESS \| ❌ DO NOT TOUCH \|
			\| `g_var2` \| `fprintf` only \| TELEMETRY \| ✅ Candidate \|

			`- [ ] Document Classification Rationale`

			Output: Classification table saved to `PHASE[N]_AUDIT.md`

			`---`

			`### Step 2: Compile-Out Implementation`

			`- [ ] Add BuildFlags Gate`
			```c
			`// core/hakmem_build_flags.h`
			`#ifndef HAKMEM_[NAME]_STATS_COMPILED`
			`# define HAKMEM_[NAME]_STATS_COMPILED 0`
			`#endif`
			```

			`- [ ] Wrap TELEMETRY Atomics`
			```c
			`#if HAKMEM_[NAME]_STATS_COMPILED`
			`atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);`
			`#else`
			`(void)0;`
			`#endif`
			```

			`- [ ] Verify Compilation`
			```bash
			`make clean && make -j # COMPILED=0 default`
			`make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'`
			```

			`---`

			`### Step 3: A/B Test`

			`- [ ] Baseline Build (COMPILED=0)`
			```bash
			`make clean && make -j bench_random_mixed_hakmem`
			`scripts/run_mixed_10_cleanenv.sh`
			`cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt`
			```

			`- [ ] Compiled-In Build (COMPILED=1)`
			```bash
			`make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem`
			`scripts/run_mixed_10_cleanenv.sh`
			`cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt`
			```

			`- [ ] Compare Results`
			```bash
			`scripts/compare_benchmark_results.sh \`
			`docs/analysis/PHASE[N]_BASELINE.txt \`
			`docs/analysis/PHASE[N]_COMPILED_IN.txt`
			```

			`- [ ] Record Verdict`
			`- Delta: [+X.XX%]`
			`- Verdict: [GO / NEUTRAL / NO-GO]`
			`- Rationale: [...]`

			Output: `PHASE[N]_RESULTS.md` with full comparison

			`---`

			`### Deliverables`

			- [ ] `PHASE[N]_AUDIT.md` - Classification and execution verification
			- [ ] `PHASE[N]_BASELINE.txt` - Baseline benchmark results
			- [ ] `PHASE[N]_COMPILED_IN.txt` - Compiled-in benchmark results
			- [ ] `PHASE[N]_RESULTS.md` - A/B comparison and verdict
			- [ ] Update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` with Phase [N] results
			- [ ] Update `CURRENT_TASK.md` with next phase

			`---`

			`### Notes`

			`[Add any phase-specific observations, gotchas, or learnings here]`
			```

			`---`

			`## 4. Success Criteria`

			`A phase is considered GO if:`
			`1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)`
			`2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)`
			`3. ✅ Step 2: Clean compile-out implementation (no link-out)`
			`4. ✅ Step 3: +0.5% or higher performance delta`

			`A phase is NO-OP if:`
			`- ❌ Step 0: Not executed in benchmark (Phase 29)`
			`- ❌ Step 1: CORRECTNESS atomic (Phase 28)`
			`- ❌ Step 3: Delta within ±0.5% noise range`

			`---`

			`## 5. Anti-Patterns to Avoid`

			`### ❌ Skipping Execution Verification (Phase 29)`
			`Problem: Optimizing ENV-gated code that never runs`
			`Solution: Always run Step 0 before any work`

			`### ❌ Assuming Counter = Telemetry (Phase 28)`
			`Problem: Flow control atomics look like counters`
			Solution: Check all usage sites, especially `if` conditions

			`### ❌ Link-Out Instead of Compile-Out (Phase 22-2)`
			`Problem: ABI breaks, mysterious link errors`
			Solution: Use `#if` preprocessor guards, never remove `.o` files

			`### ❌ Runtime Flags for Stats (not attempted, but common mistake)`
			Problem: `if (g_enable_stats)` adds branch overhead
			Solution: Build-level `#if` has zero runtime cost

			`---`

			`## 6. Expected Impact by Path Type`

			`Based on Phase 24-29 results:`

			`\| Path Type \| Expected Delta \| Example Phases \|`
			`\|-----------\|----------------\|----------------\|`
			`\| HOT (alloc/free fast path) \| +0.5% to +1.5% \| Phase 24 (+0.93%), Phase 25 (+1.07%) \|`
			`\| WARM (TLS cache hit) \| +0.2% to +0.8% \| Phase 27 (+0.74%) \|`
			`\| COLD (slow path, rare events) \| ±0.0% to +0.2% \| Phase 26 (NEUTRAL, cleanliness) \|`
			`\| ENV-gated OFF \| 0.0% (no-op) \| Phase 29 (pool v2) \|`
			`\| CORRECTNESS \| Undefined (DO NOT TOUCH) \| Phase 28 (bg_spill_len) \|`

			`---`

			`## 7. Tools and Scripts`

			`### Execution Verification`
			```bash
			`# ENV gate check`
			`rg "getenv.*FEATURE" core/`

			`# Counter check (requires benchmark run)`
			`scripts/run_mixed_10_cleanenv.sh`
			`grep "counter_name" results/*.txt`

			`# perf profile`
			`perf record -g -F 99 -- ./bench_random_mixed_hakmem`
			`perf report \| grep "function_name"`
			```

			`### Classification Audit`
			```bash
			`# List all atomics in scope`
			`rg -n "atomic_(fetch_add\|load\|store\|compare_exchange)" [file]`

			`# Track variable usage`
			`rg -n "g_variable_name" core/`

			`# Find if conditions`
			`rg -n "if.*g_variable" core/`
			```

			`### A/B Testing`
			```bash
			`# Baseline`
			`make clean && make -j bench_random_mixed_hakmem`
			`scripts/run_mixed_10_cleanenv.sh`

			`# Compiled-in`
			`make clean && make -j EXTRA_CFLAGS='-DHAKMEM_FEATURE_COMPILED=1' bench_random_mixed_hakmem`
			`scripts/run_mixed_10_cleanenv.sh`

			`# Compare (if script exists)`
			`scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt`
			```

			`---`

			`## 8. Governance`

			`When to Use This Procedure:`
			`- Any new atomic prune phase (Phase 31+)`
			`- Reviewing existing compile-out flags for consistency`
			`- Training new contributors on atomic optimization`

			`When to Skip:`
			`- Non-atomic optimizations (inlining, data structure changes)`
			`- Known CORRECTNESS atomics (Step 1 already failed)`
			`- Features explicitly marked "do not optimize"`

			`Document Updates:`
			`- This procedure should be updated after each phase if new patterns emerge`
			- Phase results should update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
			`- New anti-patterns should be added to Section 5`

			`---`

			`## 9. References`

			- Phase 24 Results: `docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md` (+0.93%)
			- Phase 25 Results: `docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md` (+1.07%)
			- Phase 27 Results: `docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md` (+0.74%)
			- Phase 28 NO-OP: `docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md` (CORRECTNESS)
			- Phase 29 NO-OP: `docs/analysis/PHASE29_POOL_V2_AUDIT.md` (ENV-gated)
			- Cumulative Summary: `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`

			`---`

			`End of Standard Procedure Document`

			`Next: Apply Step 0 to Phase 31 candidates to ensure execution before optimization.`