Phase 30: Standard Procedure Establishment - Created 4-step standardized methodology (Step 0-3) - Step 0: Execution Verification (NEW - Phase 29 lesson) - Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson) - Step 2: Compile-Out Implementation (Phase 24-27 pattern) - Step 3: A/B Test (build-level comparison) - Executed audit_atomics.sh: 412 atomics analyzed - Identified Phase 31 candidate: g_tiny_free_trace (HOT path, TOP PRIORITY) Phase 31: g_tiny_free_trace Compile-Out (HOT Path TELEMETRY) - Target: core/hakmem_tiny_free.inc:326 (trace-rate-limit atomic) - Added HAKMEM_TINY_FREE_TRACE_COMPILED (default: 0) - Classification: Pure TELEMETRY (trace output only, no flow control) - A/B Result: NEUTRAL (baseline -0.35% mean, +0.19% median) - Verdict: NEUTRAL → Adopted for code cleanliness (Phase 26 precedent) - Rationale: HOT path TELEMETRY removal improves code quality A/B Test Details: - Baseline (COMPILED=0): 53.638M ops/s mean, 53.799M median - Compiled-in (COMPILED=1): 53.828M ops/s mean, 53.697M median - Conflicting signals within ±0.5% noise margin - Phase 25 comparison: g_free_ss_enter (+1.07% GO) vs g_tiny_free_trace (NEUTRAL) - Hypothesis: Rate-limited atomic (128 calls) optimized by compiler Cumulative Progress (Phase 24-31): - Phase 24 (class stats): +0.93% GO - Phase 25 (free stats): +1.07% GO - Phase 26 (diagnostics): -0.33% NEUTRAL - Phase 27 (unified cache): +0.74% GO - Phase 28 (bg spill): NO-OP (all CORRECTNESS) - Phase 29 (pool v2): NO-OP (ENV-gated) - Phase 30 (procedure): PROCEDURE - Phase 31 (free trace): -0.35% NEUTRAL - Total: 18 atomics removed, +2.74% net improvement Documentation Created: - PHASE30_STANDARD_PROCEDURE.md: Complete 4-step methodology - ATOMIC_AUDIT_FULL.txt: 412 atomics comprehensive audit - PHASE31_CANDIDATES_HOT/WARM.txt: Priority-sorted candidates - PHASE31_RECOMMENDED_CANDIDATES.md: TOP 3 with Step 0 verification - PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md: Complete A/B results - ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated (Phase 30-31) - CURRENT_TASK.md: Phase 32 candidate identified (g_hak_tiny_free_calls) Key Lessons: - Lesson 7 (Phase 30): Step 0 execution verification prevents wasted effort - Lesson 8 (Phase 31): NEUTRAL + code cleanliness = valid adoption - HOT path ≠ guaranteed performance win (rate-limited atomics may be optimized) Next Phase: Phase 32 candidate (g_hak_tiny_free_calls) - Location: core/hakmem_tiny_free.inc:335 (9 lines below Phase 31 target) - Expected: +0.3~0.7% or NEUTRAL Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
16 KiB
Phase 30: Standard Procedure for Atomic Prune Operations
Date: 2025-12-16 Status: PROCEDURE STANDARDIZATION Purpose: Codify learnings from Phase 24-29 to prevent no-op phases
Executive Summary
Phase 24-29 taught us critical lessons about atomic pruning success factors:
- GO phases (+2.74% cumulative): HOT/WARM path telemetry atomic removal works
- NO-OP phases (Phase 28-29): Correctness atomics and ENV-gated code waste effort
This document standardizes a 4-step procedure to ensure future phases target high-impact, executable code.
1. Phase 24-29 Cumulative Lessons
Phase 24-27: GO (+2.74% cumulative)
Pattern: HOT/WARM path telemetry atomic removal
-
Phase 24 (alloc stats): +0.93%
- Removed
atomic_fetch_addinmalloc_tiny_fast()hot path - Stats compiled out with
HAKMEM_ALLOC_GATE_STATS_COMPILED=0
- Removed
-
Phase 25 (free stats): +1.07%
- Removed
atomic_fetch_addinfree_tiny_fast_hotcold()hot path - Stats compiled out with
HAKMEM_FREE_PATH_STATS_COMPILED=0
- Removed
-
Phase 27 (unified cache): +0.74%
- Removed
atomic_fetch_addin TLS cache hit path - Stats compiled out with
HAKMEM_TINY_FRONT_STATS_COMPILED=0
- Removed
Success Factors:
- ✅ Executed in every allocation/free (HOT path)
- ✅ Pure telemetry (stats only, no control flow)
- ✅ Build-level compile-out (no runtime overhead)
Phase 26: NEUTRAL (code cleanliness)
Pattern: Low-frequency but still compile-out
- Tiny header tracking stats (COLD path)
- No performance impact but maintains future maintainability
- Kept compile-out mechanism for consistency
Lesson: Even low-frequency telemetry benefits from compile-out for code cleanliness.
Phase 28: NO-OP (CORRECTNESS atomics)
Anti-pattern: Misidentified counter purpose
- Target:
g_bg_spill_len(looked like a counter) - Reality: Flow control atomic (queue depth tracking)
- Usage:
if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) { // Decision-making logic }
Critical Lesson: Counter name ≠ Counter purpose
CORRECTNESS atomics (NEVER touch):
- Used in
if/whileconditions - Flow control (queue depth, threshold checks)
- Lock-free synchronization (CAS, load-store ordering)
- Affects program behavior if removed
Phase 29: NO-OP (ENV-gated, not executed)
Anti-pattern: Optimizing dead code
- Target: Pool v2 stats atomics
- Reality: Gated by
getenv("HAKMEM_POOL_V2")= OFF by default - Benchmark: Never executes pool v2 code paths
- Result: Zero impact on measurements
Critical Lesson: Execution verification is MANDATORY before optimization
2. Standard Procedure (4 Steps)
Step 0: Execution Verification (MANDATORY GATE) ⚠️
Purpose: Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)
Methods:
A. ENV Gate Check
# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/ # Example
B. Execution Counter Verification
-
Find counter reference:
rg -n "atomic.*g_target_counter" core/ -
Check counter in benchmark output:
# Run mixed benchmark 10 times scripts/run_mixed_10_cleanenv.sh # Check if counter > 0 in any run grep "target_counter" results/*.txt -
Optional: Add debug printf (if counter not visible):
#if HAKMEM_DEBUG_PRINT fprintf(stderr, "[DEBUG] counter=%lu\n", atomic_load(&g_target_counter)); #endif
C. perf/flamegraph Verification (optional but recommended)
# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem
# Check if function appears in profile
perf report | grep "target_function"
Decision Matrix:
| Condition | Action |
|---|---|
| ✅ Counter > 0 in benchmark | Proceed to Step 1 |
| ✅ Function in perf profile | Proceed to Step 1 |
| ❌ ENV gated + OFF by default | SKIP (Phase 29 pattern) |
| ❌ Counter = 0 in all runs | SKIP (not executed) |
| ❌ Function not in flamegraph | SKIP (negligible frequency) |
Output: Document execution verification results in PHASE[N]_AUDIT.md
Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)
Purpose: Distinguish between atomics that control behavior vs. atomics that just observe
Classification Rules:
CORRECTNESS (NEVER touch):
- ❌ Used in
if/while/forconditions - ❌ Flow control (queue depth, threshold, capacity checks)
- ❌ Lock-free synchronization (CAS,
atomic_compare_exchange_*) - ❌ Load-store ordering dependencies
- ❌ Affects program decisions/behavior
Examples:
// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }
// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }
// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired)
TELEMETRY (compile-out candidate):
- ✅ Stats/logging/observation only
- ✅ Used exclusively in
printf/fprintf/sprintf - ✅ Deletion changes no program behavior
- ✅ Pure counters (hits, misses, totals)
Examples:
// TELEMETRY: Stats only
atomic_fetch_add(&stats[idx].hits, 1, memory_order_relaxed);
// TELEMETRY: Logging only
fprintf(stderr, "allocs=%lu\n", atomic_load(&g_alloc_count));
Verification Process:
-
List all atomics in target scope:
rg -n "atomic_(fetch_add|load|store).*g_target" core/ -
Track all usage sites:
rg -n "g_target_atomic" core/ -
Check each usage:
- Is it in an
ifcondition? → CORRECTNESS - Is it only in
printf/fprintf? → TELEMETRY - Unsure? → CORRECTNESS (safe default)
- Is it in an
-
Document classification:
## Atomic Classification ### g_alloc_stats (TELEMETRY) - core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only) - core/hakmem.c:89: fprintf output only - **Verdict:** TELEMETRY ✅ ### g_bg_spill_len (CORRECTNESS) - core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET) - **Verdict:** CORRECTNESS ❌ DO NOT TOUCH
Output: Classification table in PHASE[N]_AUDIT.md
Step 2: Compile-Out Implementation (Phase 24-27 pattern)
Purpose: Build-level removal of telemetry atomics (not link-out)
A. Add Compile Gate to BuildFlags
File: core/hakmem_build_flags.h
// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
# define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
Example:
// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
# define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif
B. Wrap TELEMETRY Atomics with #if
Pattern:
#if HAKMEM_[NAME]_STATS_COMPILED
atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
(void)0; // No-op when compiled out
#endif
Example:
#if HAKMEM_ALLOC_GATE_STATS_COMPILED
atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
(void)0;
#endif
C. Keep Variable Definitions (important!)
Do NOT remove:
// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;
// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
fprintf(stderr, "counter=%lu\n", atomic_load(&g_stat_counter));
}
#endif
D. Prohibited Actions (Phase 22-2 NO-GO lesson)
NEVER:
- ❌ Link-out (removing
.ofiles from Makefile) - ❌ Deleting API functions (breaks linkage)
- ❌ Removing struct definitions (breaks compilation)
- ❌ Runtime
ifchecks (adds branch overhead)
Rationale: Build-level #if has zero runtime cost. Link-out risks ABI breaks.
Step 3: A/B Test (build-level comparison)
Purpose: Measure impact of compile-out vs. compiled-in
A. Baseline Build (COMPILED=0, default)
# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem
# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh
# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
B. Compiled-In Build (COMPILED=1)
# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh
# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
C. Compare Results
# Calculate delta
scripts/compare_benchmark_results.sh \
docs/analysis/PHASE[N]_BASELINE.txt \
docs/analysis/PHASE[N]_COMPILED_IN.txt
D. Decision Matrix
| Delta | Verdict | Action |
|---|---|---|
| +0.5% or higher | GO | Keep compile-out, document win |
| ±0.5% | NEUTRAL | Keep for code cleanliness |
| -0.5% or lower | NO-GO | Revert changes |
Rationale:
- +0.5%: Statistically significant (HOT path impact)
- ±0.5%: Noise range (but cleanliness still valuable)
- -0.5%: Unexpected regression (likely measurement error, revert)
Output: PHASE[N]_RESULTS.md with full comparison
3. Phase Checklist Template
Copy this for each new phase:
## Phase [N]: [Target Description] Atomic Prune
**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]
---
### Step 0: Execution Verification ✅/❌
- [ ] **ENV Gate Check**
```bash
rg "getenv.*[FEATURE]" core/
Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]
-
Execution Counter Verification
rg -n "atomic.*g_target" core/ scripts/run_mixed_10_cleanenv.sh grep "target_counter" results/*.txtResult: [Counter > 0 in all runs / Counter = 0 / Not visible]
-
perf Profile Check (optional)
perf record -g -F 99 -- ./bench_random_mixed_hakmem perf report | grep "target_function"Result: [Function appears in profile / Not in profile]
Verdict: [✅ PROCEED / ❌ SKIP (reason)]
Step 1: CORRECTNESS/TELEMETRY Classification
-
List All Atomics
rg -n "atomic_(fetch_add|load|store).*g_" [target_file] -
Track All Usage Sites
rg -n "g_atomic_var" core/ -
Classify Each Atomic
Atomic Variable Usage Class Verdict g_var1ifconditionCORRECTNESS ❌ DO NOT TOUCH g_var2fprintfonlyTELEMETRY ✅ Candidate -
Document Classification Rationale
Output: Classification table saved to PHASE[N]_AUDIT.md
Step 2: Compile-Out Implementation
-
Add BuildFlags Gate
// core/hakmem_build_flags.h #ifndef HAKMEM_[NAME]_STATS_COMPILED # define HAKMEM_[NAME]_STATS_COMPILED 0 #endif -
Wrap TELEMETRY Atomics
#if HAKMEM_[NAME]_STATS_COMPILED atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed); #else (void)0; #endif -
Verify Compilation
make clean && make -j # COMPILED=0 default make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
Step 3: A/B Test
-
Baseline Build (COMPILED=0)
make clean && make -j bench_random_mixed_hakmem scripts/run_mixed_10_cleanenv.sh cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt -
Compiled-In Build (COMPILED=1)
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem scripts/run_mixed_10_cleanenv.sh cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt -
Compare Results
scripts/compare_benchmark_results.sh \ docs/analysis/PHASE[N]_BASELINE.txt \ docs/analysis/PHASE[N]_COMPILED_IN.txt -
Record Verdict
- Delta: [+X.XX%]
- Verdict: [GO / NEUTRAL / NO-GO]
- Rationale: [...]
Output: PHASE[N]_RESULTS.md with full comparison
Deliverables
PHASE[N]_AUDIT.md- Classification and execution verificationPHASE[N]_BASELINE.txt- Baseline benchmark resultsPHASE[N]_COMPILED_IN.txt- Compiled-in benchmark resultsPHASE[N]_RESULTS.md- A/B comparison and verdict- Update
ATOMIC_PRUNE_CUMULATIVE_SUMMARY.mdwith Phase [N] results - Update
CURRENT_TASK.mdwith next phase
Notes
[Add any phase-specific observations, gotchas, or learnings here]
---
## 4. Success Criteria
A phase is considered **GO** if:
1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: +0.5% or higher performance delta
A phase is **NO-OP** if:
- ❌ Step 0: Not executed in benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)
- ❌ Step 3: Delta within ±0.5% noise range
---
## 5. Anti-Patterns to Avoid
### ❌ Skipping Execution Verification (Phase 29)
**Problem:** Optimizing ENV-gated code that never runs
**Solution:** Always run Step 0 before any work
### ❌ Assuming Counter = Telemetry (Phase 28)
**Problem:** Flow control atomics look like counters
**Solution:** Check all usage sites, especially `if` conditions
### ❌ Link-Out Instead of Compile-Out (Phase 22-2)
**Problem:** ABI breaks, mysterious link errors
**Solution:** Use `#if` preprocessor guards, never remove `.o` files
### ❌ Runtime Flags for Stats (not attempted, but common mistake)
**Problem:** `if (g_enable_stats)` adds branch overhead
**Solution:** Build-level `#if` has zero runtime cost
---
## 6. Expected Impact by Path Type
Based on Phase 24-29 results:
| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **±0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Undefined (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |
---
## 7. Tools and Scripts
### Execution Verification
```bash
# ENV gate check
rg "getenv.*FEATURE" core/
# Counter check (requires benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt
# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "function_name"
Classification Audit
# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]
# Track variable usage
rg -n "g_variable_name" core/
# Find if conditions
rg -n "if.*g_variable" core/
A/B Testing
# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_FEATURE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
# Compare (if script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt
8. Governance
When to Use This Procedure:
- Any new atomic prune phase (Phase 31+)
- Reviewing existing compile-out flags for consistency
- Training new contributors on atomic optimization
When to Skip:
- Non-atomic optimizations (inlining, data structure changes)
- Known CORRECTNESS atomics (Step 1 already failed)
- Features explicitly marked "do not optimize"
Document Updates:
- This procedure should be updated after each phase if new patterns emerge
- Phase results should update
ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md - New anti-patterns should be added to Section 5
9. References
- Phase 24 Results:
docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md(+0.93%) - Phase 25 Results:
docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md(+1.07%) - Phase 27 Results:
docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md(+0.74%) - Phase 28 NO-OP:
docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md(CORRECTNESS) - Phase 29 NO-OP:
docs/analysis/PHASE29_POOL_V2_AUDIT.md(ENV-gated) - Cumulative Summary:
docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md
End of Standard Procedure Document
Next: Apply Step 0 to Phase 31 candidates to ensure execution before optimization.