# Phase 30: Standard Procedure for Atomic Prune Operations

**Date:** 2025-12-16
**Status:** PROCEDURE STANDARDIZATION
**Purpose:** Codify lessons from Phases 24-29 to prevent no-op phases

---

## Executive Summary

Phases 24-29 showed which factors decide whether an atomic prune succeeds:
- **GO phases** (+2.74% cumulative): Removing telemetry atomics from HOT/WARM paths works
- **NO-OP phases** (Phases 28-29): Targeting correctness atomics or ENV-gated code wastes effort

This document standardizes a 4-step procedure (Steps 0-3) so that future phases target high-impact code that actually executes.

---

## 1. Phase 24-29 Cumulative Lessons

### Phases 24, 25, 27: GO (+2.74% cumulative)

**Pattern: HOT/WARM path telemetry atomic removal**

- **Phase 24 (alloc stats)**: +0.93%
  - Removed `atomic_fetch_add` in `malloc_tiny_fast()` hot path
  - Stats compiled out with `HAKMEM_ALLOC_GATE_STATS_COMPILED=0`

- **Phase 25 (free stats)**: +1.07%
  - Removed `atomic_fetch_add` in `free_tiny_fast_hotcold()` hot path
  - Stats compiled out with `HAKMEM_FREE_PATH_STATS_COMPILED=0`

- **Phase 27 (unified cache)**: +0.74%
  - Removed `atomic_fetch_add` in TLS cache hit path
  - Stats compiled out with `HAKMEM_TINY_FRONT_STATS_COMPILED=0`

**Success Factors:**
- ✅ Executed in every allocation/free (HOT path)
- ✅ Pure telemetry (stats only, no control flow)
- ✅ Build-level compile-out (no runtime overhead)

### Phase 26: NEUTRAL (code cleanliness)

**Pattern: Low-frequency but still compile-out**

- Tiny header tracking stats (COLD path)
- No measurable performance impact, but keeps the code easier to maintain
- Kept compile-out mechanism for consistency

**Lesson:** Even low-frequency telemetry benefits from compile-out for code cleanliness.

### Phase 28: NO-OP (CORRECTNESS atomics)

**Anti-pattern: Misidentified counter purpose**

- **Target:** `g_bg_spill_len` (looked like a counter)
- **Reality:** Flow control atomic (queue depth tracking)
- **Usage:**
  ```c
  if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
      // Decision-making logic
  }
  ```

**Critical Lesson:**
**Counter name ≠ Counter purpose**

**CORRECTNESS atomics (NEVER touch):**
- Used in `if/while` conditions
- Flow control (queue depth, threshold checks)
- Lock-free synchronization (CAS, load-store ordering)
- Affects program behavior if removed
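
The Phase 28 trap can be illustrated with a minimal sketch. The names below (`g_spill_len_demo`, `spill_try_push_demo`, `DEMO_TARGET_SPILL_LEN`) are hypothetical and not taken from the hakmem sources; the point is only that a variable updated like a counter can still gate a decision, so compiling it out would change behavior.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TARGET_SPILL_LEN 4u  /* hypothetical threshold */

/* Updated like a stats counter, but read in a condition below. */
static _Atomic uint32_t g_spill_len_demo = 0;

/* Returns true if the item was admitted to the (imaginary) spill queue.
 * Simplification: the check and the increment are separate operations,
 * so a real multi-threaded version would need a CAS loop. */
bool spill_try_push_demo(void) {
    if (atomic_load_explicit(&g_spill_len_demo, memory_order_relaxed)
            >= DEMO_TARGET_SPILL_LEN) {
        return false;  /* queue considered full: caller takes another path */
    }
    atomic_fetch_add_explicit(&g_spill_len_demo, 1, memory_order_relaxed);
    return true;
}

/* Consumer side: one item drained. */
void spill_pop_demo(void) {
    atomic_fetch_sub_explicit(&g_spill_len_demo, 1, memory_order_relaxed);
}
```

If the increment were wrapped in a stats `#if` and compiled out, `spill_try_push_demo()` would never return `false`, which is exactly the behavior change that marks an atomic as CORRECTNESS.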

### Phase 29: NO-OP (ENV-gated, not executed)

**Anti-pattern: Optimizing dead code**

- **Target:** Pool v2 stats atomics
- **Reality:** Gated by `getenv("HAKMEM_POOL_V2")` = OFF by default
- **Benchmark:** Never executes pool v2 code paths
- **Result:** Zero impact on measurements

**Critical Lesson:**
**Execution verification is MANDATORY before optimization**
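
As a rough illustration of why an ENV-gated counter contributes nothing to a default benchmark run, consider the sketch below. The gate semantics (enabled only when the variable equals `"1"`) and all identifiers are assumptions for the example; the real `HAKMEM_POOL_V2` parsing may differ.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stats counter living behind an ENV gate. */
static _Atomic uint64_t g_pool_v2_demo_hits = 0;

/* Assumed semantics: ON only when the variable is exactly "1"; unset means OFF. */
int demo_env_gate_enabled(const char *name) {
    const char *v = getenv(name);
    return v != NULL && strcmp(v, "1") == 0;
}

/* The gated path: the atomic is touched only when the gate is ON. */
void demo_pool_alloc(int gate_on) {
    if (gate_on) {
        atomic_fetch_add_explicit(&g_pool_v2_demo_hits, 1, memory_order_relaxed);
    }
}

uint64_t demo_pool_hits(void) {
    return atomic_load_explicit(&g_pool_v2_demo_hits, memory_order_relaxed);
}
```

With the gate unset, any number of calls leaves the counter at zero, so removing the `atomic_fetch_add_explicit` cannot move the benchmark.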

---

## 2. Standard Procedure (4 Steps)

### Step 0: Execution Verification (MANDATORY GATE) ⚠️

**Purpose:** Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)

#### Methods:

**A. ENV Gate Check**
```bash
# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/  # Example
```

**B. Execution Counter Verification**

1. **Find counter reference:**
   ```bash
   rg -n "atomic.*g_target_counter" core/
   ```

2. **Check counter in benchmark output:**
   ```bash
   # Run mixed benchmark 10 times
   scripts/run_mixed_10_cleanenv.sh

   # Check if counter > 0 in any run
   grep "target_counter" results/*.txt
   ```

3. **Optional: Add debug printf (if counter not visible):**
   ```c
   #if HAKMEM_DEBUG_PRINT
   fprintf(stderr, "[DEBUG] counter=%llu\n",
           (unsigned long long)atomic_load(&g_target_counter));
   #endif
   ```
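
When the counter is not visible in the benchmark output, one option is a temporary probe that prints once at process exit. This is a sketch with hypothetical names (`probe_hit`, `probe_install`); it assumes the benchmark exits normally so that `atexit` handlers run, and it should be removed again after verification.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical probe counter for the path under investigation. */
static _Atomic uint64_t g_probe_count = 0;

/* Call this from the code path whose execution is in doubt. */
void probe_hit(void) {
    atomic_fetch_add_explicit(&g_probe_count, 1, memory_order_relaxed);
}

uint64_t probe_value(void) {
    return atomic_load_explicit(&g_probe_count, memory_order_relaxed);
}

/* Printed once at exit so the run log shows whether the path ran at all. */
static void probe_dump(void) {
    fprintf(stderr, "[PROBE] target_counter=%" PRIu64 "\n", probe_value());
}

/* Registers the dump; returns 0 on success, following atexit(). */
int probe_install(void) {
    return atexit(probe_dump);
}
```

A value of zero in every run corresponds to the "Counter = 0 in all runs" row of the decision matrix.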

**C. perf/flamegraph Verification (optional but recommended)**
```bash
# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem

# Check if function appears in profile
perf report | grep "target_function"
```

#### Decision Matrix:

| Condition | Action |
|-----------|--------|
| ✅ Counter > 0 in benchmark | Proceed to Step 1 |
| ✅ Function in perf profile | Proceed to Step 1 |
| ❌ ENV gated + OFF by default | **SKIP** (Phase 29 pattern) |
| ❌ Counter = 0 in all runs | **SKIP** (not executed) |
| ❌ Function not in flamegraph | **SKIP** (negligible frequency) |

**Output:** Document execution verification results in `PHASE[N]_AUDIT.md`
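
The decision matrix can be written down as a small function, which makes the precedence explicit. The types and names here are hypothetical, and the ordering (an OFF-by-default ENV gate overrides any other evidence) is an assumption of this sketch, since the matrix does not rank its rows.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { STEP0_PROCEED, STEP0_SKIP } step0_verdict;

typedef struct {
    bool     env_gated_off;    /* behind getenv() and OFF by default */
    uint64_t max_counter;      /* largest counter value seen across runs */
    bool     in_perf_profile;  /* target function appears in perf report */
} step0_evidence;

step0_verdict step0_decide(step0_evidence e) {
    if (e.env_gated_off) {
        return STEP0_SKIP;     /* Phase 29 pattern */
    }
    if (e.max_counter > 0 || e.in_perf_profile) {
        return STEP0_PROCEED;  /* some evidence that the path executes */
    }
    return STEP0_SKIP;         /* never observed running */
}
```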

---

### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

**Purpose:** Distinguish between atomics that control behavior vs. atomics that just observe

#### Classification Rules:

**CORRECTNESS (NEVER touch):**
- ❌ Used in `if/while/for` conditions
- ❌ Flow control (queue depth, threshold, capacity checks)
- ❌ Lock-free synchronization (CAS, `atomic_compare_exchange_*`)
- ❌ Load-store ordering dependencies
- ❌ Affects program decisions/behavior

**Examples:**
```c
// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }

// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }

// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired);
```

**TELEMETRY (compile-out candidate):**
- ✅ Stats/logging/observation only
- ✅ Used exclusively in `printf/fprintf/sprintf`
- ✅ Deletion changes no program behavior
- ✅ Pure counters (hits, misses, totals)

**Examples:**
```c
// TELEMETRY: Stats only
atomic_fetch_add_explicit(&stats[idx].hits, 1, memory_order_relaxed);

// TELEMETRY: Logging only
fprintf(stderr, "allocs=%llu\n", (unsigned long long)atomic_load(&g_alloc_count));
```

#### Verification Process:

1. **List all atomics in target scope:**
   ```bash
   rg -n "atomic_(fetch_add|load|store).*g_target" core/
   ```

2. **Track all usage sites:**
   ```bash
   rg -n "g_target_atomic" core/
   ```

3. **Check each usage:**
   - Is it read in an `if/while/for` condition or used in a CAS? → **CORRECTNESS**
   - Is it only written by increments and read in `printf/fprintf`? → **TELEMETRY**
   - Unsure? → **CORRECTNESS** (safe default)

4. **Document classification:**
   ```markdown
   ## Atomic Classification

   ### g_alloc_stats (TELEMETRY)
   - core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
   - core/hakmem.c:89: fprintf output only
   - **Verdict:** TELEMETRY ✅

   ### g_bg_spill_len (CORRECTNESS)
   - core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
   - **Verdict:** CORRECTNESS ❌ DO NOT TOUCH
   ```
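
The per-usage checks above reduce to a short rule with a conservative default. The sketch below encodes it with hypothetical types; it is a reading aid for the audit, not a tool that exists in the repository.

```c
#include <stdbool.h>

typedef enum { ATOMIC_CORRECTNESS, ATOMIC_TELEMETRY } atomic_class;

/* Summary of what the usage-site search found for one atomic variable. */
typedef struct {
    bool in_condition;    /* read inside an if/while/for condition */
    bool in_cas_or_sync;  /* CAS or load-store ordering dependency */
    bool print_only;      /* every read feeds printf/fprintf only */
} usage_summary;

atomic_class classify_atomic(usage_summary u) {
    if (u.in_condition || u.in_cas_or_sync) {
        return ATOMIC_CORRECTNESS;
    }
    if (u.print_only) {
        return ATOMIC_TELEMETRY;
    }
    return ATOMIC_CORRECTNESS;  /* unsure: safe default */
}
```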

**Output:** Classification table in `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation (Phase 24-27 pattern)

**Purpose:** Build-level removal of telemetry atomics (not link-out)

#### A. Add Compile Gate to BuildFlags

**File:** `core/hakmem_build_flags.h`

```c
// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
#  define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
```

**Example:**
```c
// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
#  define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif
```

#### B. Wrap TELEMETRY Atomics with #if

**Pattern:**
```c
#if HAKMEM_[NAME]_STATS_COMPILED
    atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
    (void)0;  // No-op when compiled out
#endif
```

**Example:**
```c
#if HAKMEM_ALLOC_GATE_STATS_COMPILED
    atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
    (void)0;
#endif
```
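
Where many call sites need the same guard, the `#if`/`#else` pair can be folded into one macro so each site stays a single line. This is a sketch only: `HAKMEM_DEMO_STATS_COMPILED`, `DEMO_STAT_INC`, and the functions below are hypothetical names, not existing hakmem identifiers.

```c
#include <stdatomic.h>
#include <stdint.h>

#ifndef HAKMEM_DEMO_STATS_COMPILED  /* hypothetical flag, same shape as the real ones */
#  define HAKMEM_DEMO_STATS_COMPILED 0
#endif

#if HAKMEM_DEMO_STATS_COMPILED
#  define DEMO_STAT_INC(var) \
      atomic_fetch_add_explicit(&(var), 1, memory_order_relaxed)
#else
#  define DEMO_STAT_INC(var) ((void)0)  /* expands to nothing executable */
#endif

static _Atomic uint64_t g_demo_fast_hits = 0;

/* Hot-path stand-in: the return value must not depend on the stats flag. */
int demo_fast_path(int x) {
    DEMO_STAT_INC(g_demo_fast_hits);
    return x + 1;
}

uint64_t demo_fast_hits(void) {
    return atomic_load_explicit(&g_demo_fast_hits, memory_order_relaxed);
}

/* Reports how this translation unit was built (0 or 1). */
int demo_stats_compiled(void) {
    return HAKMEM_DEMO_STATS_COMPILED;
}
```

Building with `-DHAKMEM_DEMO_STATS_COMPILED=1` restores the increment without touching any call site.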

#### C. Keep Variable Definitions (important!)

**Do NOT remove:**
```c
// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;

// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
    fprintf(stderr, "counter=%llu\n", (unsigned long long)atomic_load(&g_stat_counter));
}
#endif
```

#### D. Prohibited Actions (Phase 22-2 NO-GO lesson)

**NEVER:**
- ❌ Link-out (removing `.o` files from Makefile)
- ❌ Deleting API functions (breaks linkage)
- ❌ Removing struct definitions (breaks compilation)
- ❌ Runtime `if` checks (adds branch overhead)

**Rationale:** Build-level `#if` has zero runtime cost. Link-out risks ABI breaks.
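
For an API function that other translation units call, one way to respect the "do not delete API functions" rule is to keep the symbol and guard only its body. The sketch below assumes a hypothetical `demo_stats_print()` and flag name; it is not the project's actual print function.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#ifndef HAKMEM_DEMO_STATS_COMPILED  /* hypothetical flag */
#  define HAKMEM_DEMO_STATS_COMPILED 0
#endif

/* Definition stays in both builds. */
static _Atomic uint64_t g_demo_api_counter = 0;

uint64_t demo_stats_value(void) {
    return atomic_load_explicit(&g_demo_api_counter, memory_order_relaxed);
}

/* The symbol always exists, so callers and the object list stay unchanged.
 * Returns the number of characters written (0 when compiled out). */
int demo_stats_print(void) {
#if HAKMEM_DEMO_STATS_COMPILED
    return fprintf(stderr, "counter=%" PRIu64 "\n", demo_stats_value());
#else
    return 0;  /* compiled out: nothing printed */
#endif
}
```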

---

### Step 3: A/B Test (build-level comparison)

**Purpose:** Measure impact of compile-out vs. compiled-in

#### A. Baseline Build (COMPILED=0, default)

```bash
# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
```

#### B. Compiled-In Build (COMPILED=1)

```bash
# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### C. Compare Results

```bash
# Calculate delta
scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### D. Decision Matrix

| Delta (compiled-out vs. compiled-in) | Verdict | Action |
|-------|---------|--------|
| **+0.5% or higher** | **GO** | Keep compile-out, document win |
| **Within ±0.5%** | **NEUTRAL** | Keep for code cleanliness |
| **-0.5% or lower** | **NO-GO** | Revert changes |

**Rationale:**
- +0.5% or higher: Above the noise range (consistent with HOT path impact)
- Within ±0.5%: Noise range (but cleanliness still valuable)
- -0.5% or lower: Unexpected regression (likely measurement error, revert)
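
The thresholds can be expressed as a short sketch. It assumes the benchmark reports a throughput figure where higher is better, and that the delta is the compiled-out (baseline) result relative to the compiled-in result; both the function names and that metric assumption are illustrative.

```c
typedef enum { AB_GO, AB_NEUTRAL, AB_NO_GO } ab_verdict;

/* Percent change of the compiled-out build relative to the compiled-in build.
 * Assumes a throughput metric (higher is better). */
double ab_delta_pct(double compiled_out, double compiled_in) {
    return (compiled_out - compiled_in) / compiled_in * 100.0;
}

/* Applies the +0.5% / -0.5% thresholds from the decision matrix. */
ab_verdict ab_decide(double delta_pct) {
    if (delta_pct >= 0.5) {
        return AB_GO;
    }
    if (delta_pct <= -0.5) {
        return AB_NO_GO;
    }
    return AB_NEUTRAL;
}
```

For a latency-style metric (lower is better), the sign of the delta would need to be flipped before applying the same thresholds.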

**Output:** `PHASE[N]_RESULTS.md` with full comparison

---

## 3. Phase Checklist Template

Copy this for each new phase:

```markdown
## Phase [N]: [Target Description] Atomic Prune

**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]

---

### Step 0: Execution Verification ✅/❌

- [ ] **ENV Gate Check**
  ```bash
  rg "getenv.*[FEATURE]" core/
  ```
  Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]

- [ ] **Execution Counter Verification**
  ```bash
  rg -n "atomic.*g_target" core/
  scripts/run_mixed_10_cleanenv.sh
  grep "target_counter" results/*.txt
  ```
  Result: [Counter > 0 in all runs / Counter = 0 / Not visible]

- [ ] **perf Profile Check (optional)**
  ```bash
  perf record -g -F 99 -- ./bench_random_mixed_hakmem
  perf report | grep "target_function"
  ```
  Result: [Function appears in profile / Not in profile]

**Verdict:** [✅ PROCEED / ❌ SKIP (reason)]

---

### Step 1: CORRECTNESS/TELEMETRY Classification

- [ ] **List All Atomics**
  ```bash
  rg -n "atomic_(fetch_add|load|store).*g_" [target_file]
  ```

- [ ] **Track All Usage Sites**
  ```bash
  rg -n "g_atomic_var" core/
  ```

- [ ] **Classify Each Atomic**

  | Atomic Variable | Usage | Class | Verdict |
  |-----------------|-------|-------|---------|
  | `g_var1` | `if` condition | CORRECTNESS | ❌ DO NOT TOUCH |
  | `g_var2` | `fprintf` only | TELEMETRY | ✅ Candidate |

- [ ] **Document Classification Rationale**

**Output:** Classification table saved to `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation

- [ ] **Add BuildFlags Gate**
  ```c
  // core/hakmem_build_flags.h
  #ifndef HAKMEM_[NAME]_STATS_COMPILED
  #  define HAKMEM_[NAME]_STATS_COMPILED 0
  #endif
  ```

- [ ] **Wrap TELEMETRY Atomics**
  ```c
  #if HAKMEM_[NAME]_STATS_COMPILED
      atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
  #else
      (void)0;
  #endif
  ```

- [ ] **Verify Compilation**
  ```bash
  make clean && make -j  # COMPILED=0 default
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
  ```

---

### Step 3: A/B Test

- [ ] **Baseline Build (COMPILED=0)**
  ```bash
  make clean && make -j bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
  ```

- [ ] **Compiled-In Build (COMPILED=1)**
  ```bash
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Compare Results**
  ```bash
  scripts/compare_benchmark_results.sh \
      docs/analysis/PHASE[N]_BASELINE.txt \
      docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Record Verdict**
  - Delta: [+X.XX%]
  - Verdict: [GO / NEUTRAL / NO-GO]
  - Rationale: [...]

**Output:** `PHASE[N]_RESULTS.md` with full comparison

---

### Deliverables

- [ ] `PHASE[N]_AUDIT.md` - Classification and execution verification
- [ ] `PHASE[N]_BASELINE.txt` - Baseline benchmark results
- [ ] `PHASE[N]_COMPILED_IN.txt` - Compiled-in benchmark results
- [ ] `PHASE[N]_RESULTS.md` - A/B comparison and verdict
- [ ] Update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` with Phase [N] results
- [ ] Update `CURRENT_TASK.md` with next phase

---

### Notes

[Add any phase-specific observations, gotchas, or learnings here]
```

---

## 4. Success Criteria

A phase is considered **GO** if all of the following hold:
1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: +0.5% or higher performance delta

A phase is **NO-OP** if any of the following applies:
- ❌ Step 0: Not executed in benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)
- ❌ Step 3: Delta within ±0.5% noise range (recorded as NEUTRAL in Step 3; no measurable win)

---

## 5. Anti-Patterns to Avoid

### ❌ Skipping Execution Verification (Phase 29)
**Problem:** Optimizing ENV-gated code that never runs
**Solution:** Always run Step 0 before any work

### ❌ Assuming Counter = Telemetry (Phase 28)
**Problem:** Flow control atomics look like counters
**Solution:** Check all usage sites, especially `if` conditions

### ❌ Link-Out Instead of Compile-Out (Phase 22-2)
**Problem:** ABI breaks, mysterious link errors
**Solution:** Use `#if` preprocessor guards, never remove `.o` files

### ❌ Runtime Flags for Stats (not attempted, but common mistake)
**Problem:** `if (g_enable_stats)` adds branch overhead
**Solution:** Build-level `#if` has zero runtime cost
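
The difference between the two approaches is easiest to see side by side. In this sketch (all names hypothetical), the runtime-flag version still performs a load and a branch on every call even when stats are off, while the build-flag version leaves nothing behind when the flag is 0.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef HAKMEM_DEMO_STATS_COMPILED  /* hypothetical flag */
#  define HAKMEM_DEMO_STATS_COMPILED 0
#endif

static _Atomic uint64_t g_demo_hits = 0;
static bool g_enable_stats_demo = false;  /* hypothetical runtime switch */

/* Anti-pattern: the flag is checked on every call, even when it is false. */
int demo_alloc_runtime_flag(int size) {
    if (g_enable_stats_demo) {
        atomic_fetch_add_explicit(&g_demo_hits, 1, memory_order_relaxed);
    }
    return size * 2;
}

/* Preferred: the preprocessor removes the statement when the flag is 0. */
int demo_alloc_build_flag(int size) {
#if HAKMEM_DEMO_STATS_COMPILED
    atomic_fetch_add_explicit(&g_demo_hits, 1, memory_order_relaxed);
#endif
    return size * 2;
}
```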

---

## 6. Expected Impact by Path Type

Based on Phase 24-29 results:

| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Undefined (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |

---

## 7. Tools and Scripts

### Execution Verification
```bash
# ENV gate check
rg "getenv.*FEATURE" core/

# Counter check (requires benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt

# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "function_name"
```

### Classification Audit
```bash
# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]

# Track variable usage
rg -n "g_variable_name" core/

# Find reads in conditions (if/while/for)
rg -n "(if|while|for).*g_variable" core/
```

### A/B Testing
```bash
# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compare (if script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt
```

---

## 8. Governance

**When to Use This Procedure:**
- Any new atomic prune phase (Phase 31+)
- Reviewing existing compile-out flags for consistency
- Training new contributors on atomic optimization

**When to Skip:**
- Non-atomic optimizations (inlining, data structure changes)
- Known CORRECTNESS atomics (Step 1 already failed)
- Features explicitly marked "do not optimize"

**Document Updates:**
- This procedure should be updated after each phase if new patterns emerge
- Phase results should update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- New anti-patterns should be added to Section 5

---

## 9. References

- **Phase 24 Results:** `docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md` (+0.93%)
- **Phase 25 Results:** `docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md` (+1.07%)
- **Phase 27 Results:** `docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md` (+0.74%)
- **Phase 28 NO-OP:** `docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md` (CORRECTNESS)
- **Phase 29 NO-OP:** `docs/analysis/PHASE29_POOL_V2_AUDIT.md` (ENV-gated)
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`

---

**End of Standard Procedure Document**

**Next:** Apply Step 0 to Phase 31 candidates to ensure execution before optimization.