# Phase 30: Standard Procedure for Atomic Prune Operations

**Date:** 2025-12-16
**Status:** PROCEDURE STANDARDIZATION
**Purpose:** Codify learnings from Phases 24-29 to prevent no-op phases

---

## Executive Summary

Phases 24-29 taught us which factors decide whether an atomic prune pays off:
- **GO phases** (+2.74% cumulative): removing telemetry atomics from HOT/WARM paths works
- **NO-OP phases** (Phases 28-29): targeting correctness atomics or ENV-gated code wastes effort

This document standardizes a 4-step procedure (Steps 0-3) so that future phases target high-impact code that actually executes.

---

## 1. Phase 24-29 Cumulative Lessons

### Phase 24-27: GO (+2.74% cumulative)

**Pattern: HOT/WARM path telemetry atomic removal**

- **Phase 24 (alloc stats)**: +0.93%
  - Removed `atomic_fetch_add` in `malloc_tiny_fast()` hot path
  - Stats compiled out with `HAKMEM_ALLOC_GATE_STATS_COMPILED=0`

- **Phase 25 (free stats)**: +1.07%
  - Removed `atomic_fetch_add` in `free_tiny_fast_hotcold()` hot path
  - Stats compiled out with `HAKMEM_FREE_PATH_STATS_COMPILED=0`

- **Phase 27 (unified cache)**: +0.74%
  - Removed `atomic_fetch_add` in TLS cache hit path
  - Stats compiled out with `HAKMEM_TINY_FRONT_STATS_COMPILED=0`

**Success Factors:**
- ✅ Executed in every allocation/free (HOT path)
- ✅ Pure telemetry (stats only, no control flow)
- ✅ Build-level compile-out (no runtime overhead)

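The +2.74% cumulative figure matches the simple sum of the three GO deltas (0.93 + 1.07 + 0.74). The sketch below assumes that is how it was derived and also shows the compounded value for comparison, which comes out slightly higher (about 2.76%). The helper names are hypothetical and not part of the hakmem code base.

```c
#include <assert.h>
#include <stddef.h>

/* Per-phase deltas in percent, copied from Phases 24, 25, and 27 above. */
static const double k_go_deltas_pct[] = { 0.93, 1.07, 0.74 };

/* Simple sum of per-phase deltas (hypothetical helper). */
double sum_deltas_pct(const double *deltas, size_t n) {
    assert(deltas != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += deltas[i];
    }
    return total;
}

/* Compounded delta: product of (1 + d/100) minus 1, returned in percent. */
double compound_deltas_pct(const double *deltas, size_t n) {
    assert(deltas != NULL);
    double factor = 1.0;
    for (size_t i = 0; i < n; i++) {
        factor *= 1.0 + deltas[i] / 100.0;
    }
    return (factor - 1.0) * 100.0;
}
```

Because the individual deltas are small, the two conventions differ by only a few hundredths of a percent, but the cumulative summary should state which one it uses.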
### Phase 26: NEUTRAL (code cleanliness)

**Pattern: Low-frequency telemetry, still compiled out**

- Tiny header tracking stats (COLD path)
- No measurable performance impact, but improves maintainability
- Kept compile-out mechanism for consistency

**Lesson:** Even low-frequency telemetry benefits from compile-out for code cleanliness.

### Phase 28: NO-OP (CORRECTNESS atomics)

**Anti-pattern: Misidentified counter purpose**

- **Target:** `g_bg_spill_len` (looked like a counter)
- **Reality:** Flow control atomic (queue depth tracking)
- **Usage:**
  ```c
  if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
      // Decision-making logic
  }
  ```

**Critical Lesson:**
**Counter name ≠ Counter purpose**

**CORRECTNESS atomics (NEVER touch):**
- Used in `if/while` conditions
- Flow control (queue depth, threshold checks)
- Lock-free synchronization (CAS, load-store ordering)
- Affects program behavior if removed

### Phase 29: NO-OP (ENV-gated, not executed)

**Anti-pattern: Optimizing dead code**

- **Target:** Pool v2 stats atomics
- **Reality:** Gated by `getenv("HAKMEM_POOL_V2")` = OFF by default
- **Benchmark:** Never executes pool v2 code paths
- **Result:** Zero impact on measurements

**Critical Lesson:**
**Execution verification is MANDATORY before optimization**

---

## 2. Standard Procedure (4 Steps)

### Step 0: Execution Verification (MANDATORY GATE) ⚠️

**Purpose:** Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)

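To make the Phase 29 failure mode concrete, here is a minimal sketch of an ENV-gated path. It assumes a gate that is read with `getenv` and is off unless the variable equals `"1"`; the variable `DEMO_FEATURE_V2`, the counter, and the functions are hypothetical stand-ins, not the actual pool v2 code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical telemetry counter that lives only on the gated path. */
static _Atomic uint64_t g_demo_gated_stat = 0;

/* Returns 1 only when the hypothetical DEMO_FEATURE_V2 variable is "1". */
static int demo_feature_enabled(void) {
    const char *v = getenv("DEMO_FEATURE_V2");
    return v != NULL && strcmp(v, "1") == 0;
}

/* Simulates `iters` operations; the counter is bumped only behind the gate.
 * Returns the counter value after the run. */
uint64_t demo_run_workload(int iters) {
    assert(iters >= 0);
    for (int i = 0; i < iters; i++) {
        if (demo_feature_enabled()) {
            atomic_fetch_add_explicit(&g_demo_gated_stat, 1,
                                      memory_order_relaxed);
        }
    }
    return atomic_load_explicit(&g_demo_gated_stat, memory_order_relaxed);
}
```

With the variable unset, the counter stays at zero no matter how long the workload runs, so compiling the increment out cannot change a benchmark that uses the default environment.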
#### Methods:

**A. ENV Gate Check**
```bash
# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/  # Example
```

**B. Execution Counter Verification**

1. **Find counter reference:**
   ```bash
   rg -n "atomic.*g_target_counter" core/
   ```

2. **Check counter in benchmark output:**
   ```bash
   # Run mixed benchmark 10 times
   scripts/run_mixed_10_cleanenv.sh

   # Check if counter > 0 in any run
   grep "target_counter" results/*.txt
   ```

3. **Optional: Add debug printf (if counter not visible):**
   ```c
   #if HAKMEM_DEBUG_PRINT
   // PRIu64 (from <inttypes.h>) is portable where %lu is not
   fprintf(stderr, "[DEBUG] counter=%" PRIu64 "\n",
           (uint64_t)atomic_load(&g_target_counter));
   #endif
   ```

**C. perf/flamegraph Verification (optional but recommended)**
```bash
# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem

# Check if function appears in profile
perf report | grep "target_function"
```

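When a counter is not printed anywhere, one way to make it visible to the `grep` in method B is to dump it once at process exit. The following is a sketch under assumptions: the counter, the `demo_counter=` output format, and all function names are hypothetical, and the real benchmark scripts may collect output differently.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter under investigation. */
static _Atomic uint64_t g_demo_counter = 0;

/* Stand-in for the code path whose execution is being verified. */
void demo_hot_path(void) {
    atomic_fetch_add_explicit(&g_demo_counter, 1, memory_order_relaxed);
}

/* Formats "demo_counter=<value>" into buf; returns snprintf's result. */
int demo_format_counter(char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "demo_counter=%" PRIu64,
                    atomic_load_explicit(&g_demo_counter,
                                         memory_order_relaxed));
}

/* Exit handler: writes one greppable line to stderr. */
static void demo_dump_counter(void) {
    char line[64];
    demo_format_counter(line, sizeof line);
    fprintf(stderr, "%s\n", line);
}

/* Call once at startup; returns 0 on success, like atexit(). */
int demo_install_counter_dump(void) {
    return atexit(demo_dump_counter);
}
```

A value of zero in every run is the signal to skip the phase; any nonzero value means the path executes and the phase can proceed to Step 1.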
#### Decision Matrix:

| Condition | Action |
|-----------|--------|
| ✅ Counter > 0 in benchmark | Proceed to Step 1 |
| ✅ Function in perf profile | Proceed to Step 1 |
| ❌ ENV gated + OFF by default | **SKIP** (Phase 29 pattern) |
| ❌ Counter = 0 in all runs | **SKIP** (not executed) |
| ❌ Function not in flamegraph | **SKIP** (negligible frequency) |

**Output:** Document execution verification results in `PHASE[N]_AUDIT.md`

---

### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

**Purpose:** Distinguish atomics that control behavior from atomics that only observe it

#### Classification Rules:

**CORRECTNESS (NEVER touch):**
- ❌ Used in `if/while/for` conditions
- ❌ Flow control (queue depth, threshold, capacity checks)
- ❌ Lock-free synchronization (CAS, `atomic_compare_exchange_*`)
- ❌ Load-store ordering dependencies
- ❌ Affects program decisions/behavior

**Examples:**
```c
// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }

// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }

// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired);
```

**TELEMETRY (compile-out candidate):**
- ✅ Stats/logging/observation only
- ✅ Used exclusively in `printf/fprintf/sprintf`
- ✅ Deletion changes no program behavior
- ✅ Pure counters (hits, misses, totals)

**Examples:**
```c
// TELEMETRY: Stats only (the memory-order argument requires the _explicit form)
atomic_fetch_add_explicit(&stats[idx].hits, 1, memory_order_relaxed);

// TELEMETRY: Logging only (PRIu64 from <inttypes.h>)
fprintf(stderr, "allocs=%" PRIu64 "\n", (uint64_t)atomic_load(&g_alloc_count));
```

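The two classes can sit side by side in the same function, which is what made Phase 28 easy to misjudge. The sketch below is a single-threaded illustration with hypothetical names (`g_demo_spill_len`, `g_demo_spill_hits`, `DEMO_MAX_SPILL`); it is not the actual background-spill code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_SPILL 4u  /* hypothetical queue capacity */

/* CORRECTNESS: read in a condition, so it steers control flow. */
static _Atomic uint32_t g_demo_spill_len = 0;
/* TELEMETRY: only ever incremented; nothing branches on it. */
static _Atomic uint64_t g_demo_spill_hits = 0;

/* Accepts one item unless the queue is at capacity.
 * The check-then-add pair is kept simple for a single-threaded demo. */
bool demo_try_spill(void) {
    if (atomic_load(&g_demo_spill_len) >= DEMO_MAX_SPILL) {
        return false;  /* decision made from the atomic's value */
    }
    atomic_fetch_add(&g_demo_spill_len, 1);
    atomic_fetch_add_explicit(&g_demo_spill_hits, 1, memory_order_relaxed);
    return true;
}

/* Counts how many of `attempts` spill requests are accepted. */
int demo_count_accepted(int attempts) {
    assert(attempts >= 0);
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
        if (demo_try_spill()) {
            accepted++;
        }
    }
    return accepted;
}
```

Deleting the `g_demo_spill_hits` increment leaves the number of accepted items unchanged, while deleting `g_demo_spill_len` would let every request through. That difference in observable behavior is the classification test.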
#### Verification Process:

1. **List all atomics in target scope:**
   ```bash
   rg -n "atomic_(fetch_add|load|store).*g_target" core/
   ```

2. **Track all usage sites:**
   ```bash
   rg -n "g_target_atomic" core/
   ```

3. **Check each usage:**
   - Is it in an `if/while/for` condition? → **CORRECTNESS**
   - Is it only in `printf/fprintf`? → **TELEMETRY**
   - Unsure? → **CORRECTNESS** (safe default)

4. **Document classification:**
   ```markdown
   ## Atomic Classification

   ### g_alloc_stats (TELEMETRY)
   - core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
   - core/hakmem.c:89: fprintf output only
   - **Verdict:** TELEMETRY ✅

   ### g_bg_spill_len (CORRECTNESS)
   - core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
   - **Verdict:** CORRECTNESS ❌ DO NOT TOUCH
   ```

**Output:** Classification table in `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation (Phase 24-27 pattern)

**Purpose:** Build-level removal of telemetry atomics (not link-out)

#### A. Add Compile Gate to BuildFlags

**File:** `core/hakmem_build_flags.h`

```c
// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
#  define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
```

**Example:**
```c
// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
#  define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif
```

#### B. Wrap TELEMETRY Atomics with #if

**Pattern:**
```c
#if HAKMEM_[NAME]_STATS_COMPILED
    atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
    (void)0;  // No-op when compiled out
#endif
```

**Example:**
```c
#if HAKMEM_ALLOC_GATE_STATS_COMPILED
    atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
    (void)0;
#endif
```

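When the same gate guards many call sites, the `#if`/`#else` pair can be folded into one macro so each site stays a single line. This is a sketch of that variation, not the pattern the earlier phases used; `DEMO_STATS_COMPILED`, `DEMO_STAT_INC`, and the functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical build flag, defaulting to compiled-out like the real gates. */
#ifndef DEMO_STATS_COMPILED
#  define DEMO_STATS_COMPILED 0
#endif

#if DEMO_STATS_COMPILED
#  define DEMO_STAT_INC(var) \
       atomic_fetch_add_explicit(&(var), 1, memory_order_relaxed)
#else
#  define DEMO_STAT_INC(var) ((void)0)  /* expands to nothing at runtime */
#endif

/* The definition stays in both builds, as section C below requires. */
static _Atomic uint64_t g_demo_alloc_calls = 0;

/* Stand-in for a hot-path function carrying one telemetry increment. */
int demo_alloc_fast(int size) {
    assert(size > 0);
    DEMO_STAT_INC(g_demo_alloc_calls);
    return size * 2;
}

/* Reads the counter; returns 0 forever when stats are compiled out. */
uint64_t demo_alloc_calls(void) {
    return atomic_load_explicit(&g_demo_alloc_calls, memory_order_relaxed);
}
```

Building with `-DDEMO_STATS_COMPILED=1` restores the increment without touching any call site, which mirrors how `EXTRA_CFLAGS` is used for the A/B test in Step 3.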
#### C. Keep Variable Definitions (important!)

**Do NOT remove:**
```c
// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;

// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
    // PRIu64 (from <inttypes.h>) matches uint64_t on every platform
    fprintf(stderr, "counter=%" PRIu64 "\n", atomic_load(&g_stat_counter));
}
#endif
```

#### D. Prohibited Actions (Phase 22-2 NO-GO lesson)

**NEVER:**
- ❌ Link-out (removing `.o` files from Makefile)
- ❌ Deleting API functions (breaks linkage)
- ❌ Removing struct definitions (breaks compilation)
- ❌ Runtime `if` checks (adds branch overhead)

**Rationale:** Build-level `#if` has zero runtime cost. Link-out risks ABI breaks.

---

### Step 3: A/B Test (build-level comparison)

**Purpose:** Measure impact of compile-out vs. compiled-in

#### A. Baseline Build (COMPILED=0, default)

```bash
# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
```

#### B. Compiled-In Build (COMPILED=1)

```bash
# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### C. Compare Results

```bash
# Calculate delta
scripts/compare_benchmark_results.sh \
  docs/analysis/PHASE[N]_BASELINE.txt \
  docs/analysis/PHASE[N]_COMPILED_IN.txt
```

#### D. Decision Matrix

Delta is the performance of the COMPILED=0 build (baseline) relative to the COMPILED=1 build; a positive delta means the compile-out is faster.

| Delta | Verdict | Action |
|-------|---------|--------|
| **+0.5% or higher** | **GO** | Keep compile-out, document win |
| **Within ±0.5%** | **NEUTRAL** | Keep for code cleanliness |
| **-0.5% or lower** | **NO-GO** | Revert changes |

**Rationale:**
- +0.5% or higher: Above the ±0.5% noise band (HOT path impact)
- Within ±0.5%: Noise range (but cleanliness still valuable)
- -0.5% or lower: Unexpected regression (likely measurement error, revert)

**Output:** `PHASE[N]_RESULTS.md` with full comparison

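The thresholds in the decision matrix can be written down as a small function. This is a minimal sketch, assuming the benchmark reports a higher-is-better metric such as operations per second; the type and function names are hypothetical, and the format of `mixed_10_summary.txt` is not assumed here.

```c
#include <assert.h>

typedef enum { DEMO_GO, DEMO_NEUTRAL, DEMO_NO_GO } demo_verdict;

/* Percent change of the compiled-out build relative to the compiled-in
 * build. Both inputs must be positive, higher-is-better measurements. */
double demo_delta_pct(double compiled_out, double compiled_in) {
    assert(compiled_out > 0.0 && compiled_in > 0.0);
    return (compiled_out - compiled_in) / compiled_in * 100.0;
}

/* Applies the Step 3 thresholds: >= +0.5 is GO, <= -0.5 is NO-GO,
 * anything in between is NEUTRAL. */
demo_verdict demo_classify(double delta_pct) {
    if (delta_pct >= 0.5) {
        return DEMO_GO;
    }
    if (delta_pct <= -0.5) {
        return DEMO_NO_GO;
    }
    return DEMO_NEUTRAL;
}
```

For example, the Phase 24 delta of +0.93% classifies as GO, while a delta of +0.2% falls inside the noise band and classifies as NEUTRAL.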
---

## 3. Phase Checklist Template

Copy this for each new phase:

````markdown
## Phase [N]: [Target Description] Atomic Prune

**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]

---

### Step 0: Execution Verification ✅/❌

- [ ] **ENV Gate Check**
  ```bash
  rg "getenv.*[FEATURE]" core/
  ```
  Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]

- [ ] **Execution Counter Verification**
  ```bash
  rg -n "atomic.*g_target" core/
  scripts/run_mixed_10_cleanenv.sh
  grep "target_counter" results/*.txt
  ```
  Result: [Counter > 0 in all runs / Counter = 0 / Not visible]

- [ ] **perf Profile Check (optional)**
  ```bash
  perf record -g -F 99 -- ./bench_random_mixed_hakmem
  perf report | grep "target_function"
  ```
  Result: [Function appears in profile / Not in profile]

**Verdict:** [✅ PROCEED / ❌ SKIP (reason)]

---

### Step 1: CORRECTNESS/TELEMETRY Classification

- [ ] **List All Atomics**
  ```bash
  rg -n "atomic_(fetch_add|load|store).*g_" [target_file]
  ```

- [ ] **Track All Usage Sites**
  ```bash
  rg -n "g_atomic_var" core/
  ```

- [ ] **Classify Each Atomic**

  | Atomic Variable | Usage | Class | Verdict |
  |-----------------|-------|-------|---------|
  | `g_var1` | `if` condition | CORRECTNESS | ❌ DO NOT TOUCH |
  | `g_var2` | `fprintf` only | TELEMETRY | ✅ Candidate |

- [ ] **Document Classification Rationale**

**Output:** Classification table saved to `PHASE[N]_AUDIT.md`

---

### Step 2: Compile-Out Implementation

- [ ] **Add BuildFlags Gate**
  ```c
  // core/hakmem_build_flags.h
  #ifndef HAKMEM_[NAME]_STATS_COMPILED
  #  define HAKMEM_[NAME]_STATS_COMPILED 0
  #endif
  ```

- [ ] **Wrap TELEMETRY Atomics**
  ```c
  #if HAKMEM_[NAME]_STATS_COMPILED
      atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
  #else
      (void)0;
  #endif
  ```

- [ ] **Verify Compilation**
  ```bash
  make clean && make -j  # COMPILED=0 default
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
  ```

---

### Step 3: A/B Test

- [ ] **Baseline Build (COMPILED=0)**
  ```bash
  make clean && make -j bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
  ```

- [ ] **Compiled-In Build (COMPILED=1)**
  ```bash
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
  cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Compare Results**
  ```bash
  scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt
  ```

- [ ] **Record Verdict**
  - Delta: [+X.XX%]
  - Verdict: [GO / NEUTRAL / NO-GO]
  - Rationale: [...]

**Output:** `PHASE[N]_RESULTS.md` with full comparison

---

### Deliverables

- [ ] `PHASE[N]_AUDIT.md` - Classification and execution verification
- [ ] `PHASE[N]_BASELINE.txt` - Baseline benchmark results
- [ ] `PHASE[N]_COMPILED_IN.txt` - Compiled-in benchmark results
- [ ] `PHASE[N]_RESULTS.md` - A/B comparison and verdict
- [ ] Update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` with Phase [N] results
- [ ] Update `CURRENT_TASK.md` with next phase

---

### Notes

[Add any phase-specific observations, gotchas, or learnings here]
````

---

## 4. Success Criteria

A phase is considered **GO** if:
1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: +0.5% or higher performance delta

A phase is **NEUTRAL** if Steps 0-2 pass but the Step 3 delta stays within the ±0.5% noise range (the compile-out is kept for code cleanliness, as in Phase 26).

A phase is **NO-OP** if:
- ❌ Step 0: Not executed in benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)

---

## 5. Anti-Patterns to Avoid

### ❌ Skipping Execution Verification (Phase 29)
**Problem:** Optimizing ENV-gated code that never runs
**Solution:** Always run Step 0 before any work

### ❌ Assuming Counter = Telemetry (Phase 28)
**Problem:** Flow control atomics look like counters
**Solution:** Check all usage sites, especially `if` conditions

### ❌ Link-Out Instead of Compile-Out (Phase 22-2)
**Problem:** ABI breaks, mysterious link errors
**Solution:** Use `#if` preprocessor guards, never remove `.o` files

### ❌ Runtime Flags for Stats (not attempted, but common mistake)
**Problem:** `if (g_enable_stats)` adds branch overhead
**Solution:** Build-level `#if` has zero runtime cost

---

## 6. Expected Impact by Path Type

Based on Phase 24-29 results:

| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Undefined (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |

---

## 7. Tools and Scripts

### Execution Verification
```bash
# ENV gate check
rg "getenv.*FEATURE" core/

# Counter check (requires benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt

# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "function_name"
```

### Classification Audit
```bash
# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]

# Track variable usage
rg -n "g_variable_name" core/

# Find if conditions
rg -n "if.*g_variable" core/
```

### A/B Testing
```bash
# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compiled-in (same flag naming as Step 2)
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compare (if script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt
```

---

## 8. Governance

**When to Use This Procedure:**
- Any new atomic prune phase (Phase 31+)
- Reviewing existing compile-out flags for consistency
- Training new contributors on atomic optimization

**When to Skip:**
- Non-atomic optimizations (inlining, data structure changes)
- Known CORRECTNESS atomics (Step 1 already failed)
- Features explicitly marked "do not optimize"

**Document Updates:**
- This procedure should be updated after each phase if new patterns emerge
- Phase results should update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- New anti-patterns should be added to Section 5

---

## 9. References

- **Phase 24 Results:** `docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md` (+0.93%)
- **Phase 25 Results:** `docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md` (+1.07%)
- **Phase 27 Results:** `docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md` (+0.74%)
- **Phase 28 NO-OP:** `docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md` (CORRECTNESS)
- **Phase 29 NO-OP:** `docs/analysis/PHASE29_POOL_V2_AUDIT.md` (ENV-gated)
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`

---

**End of Standard Procedure Document**

**Next:** Apply Step 0 to Phase 31 candidates to confirm they execute before any optimization work begins.