Files
hakmem/docs/analysis/PHASE30_STANDARD_PROCEDURE.md
Moe Charm (CI) 506e724c3b Phase 30-31: Standard procedure + g_tiny_free_trace atomic prune
Phase 30: Standard Procedure Establishment
- Created 4-step standardized methodology (Step 0-3)
- Step 0: Execution Verification (NEW - Phase 29 lesson)
- Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)
- Step 2: Compile-Out Implementation (Phase 24-27 pattern)
- Step 3: A/B Test (build-level comparison)
- Executed audit_atomics.sh: 412 atomics analyzed
- Identified Phase 31 candidate: g_tiny_free_trace (HOT path, TOP PRIORITY)

Phase 31: g_tiny_free_trace Compile-Out (HOT Path TELEMETRY)
- Target: core/hakmem_tiny_free.inc:326 (trace-rate-limit atomic)
- Added HAKMEM_TINY_FREE_TRACE_COMPILED (default: 0)
- Classification: Pure TELEMETRY (trace output only, no flow control)
- A/B Result: NEUTRAL (baseline -0.35% mean, +0.19% median)
- Verdict: NEUTRAL → Adopted for code cleanliness (Phase 26 precedent)
- Rationale: HOT path TELEMETRY removal improves code quality

A/B Test Details:
- Baseline (COMPILED=0): 53.638M ops/s mean, 53.799M median
- Compiled-in (COMPILED=1): 53.828M ops/s mean, 53.697M median
- Conflicting signals within ±0.5% noise margin
- Phase 25 comparison: g_free_ss_enter (+1.07% GO) vs g_tiny_free_trace (NEUTRAL)
- Hypothesis: Rate-limited atomic (128 calls) optimized by compiler

Cumulative Progress (Phase 24-31):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (all CORRECTNESS)
- Phase 29 (pool v2): NO-OP (ENV-gated)
- Phase 30 (procedure): PROCEDURE
- Phase 31 (free trace): -0.35% NEUTRAL
- Total: 18 atomics removed, +2.74% net improvement

Documentation Created:
- PHASE30_STANDARD_PROCEDURE.md: Complete 4-step methodology
- ATOMIC_AUDIT_FULL.txt: 412 atomics comprehensive audit
- PHASE31_CANDIDATES_HOT/WARM.txt: Priority-sorted candidates
- PHASE31_RECOMMENDED_CANDIDATES.md: TOP 3 with Step 0 verification
- PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md: Complete A/B results
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated (Phase 30-31)
- CURRENT_TASK.md: Phase 32 candidate identified (g_hak_tiny_free_calls)

Key Lessons:
- Lesson 7 (Phase 30): Step 0 execution verification prevents wasted effort
- Lesson 8 (Phase 31): NEUTRAL + code cleanliness = valid adoption
- HOT path ≠ guaranteed performance win (rate-limited atomics may be optimized)

Next Phase: Phase 32 candidate (g_hak_tiny_free_calls)
- Location: core/hakmem_tiny_free.inc:335 (9 lines below Phase 31 target)
- Expected: +0.3~0.7% or NEUTRAL

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 07:31:15 +09:00

16 KiB

Phase 30: Standard Procedure for Atomic Prune Operations

Date: 2025-12-16 Status: PROCEDURE STANDARDIZATION Purpose: Codify learnings from Phase 24-29 to prevent no-op phases


Executive Summary

Phase 24-29 taught us critical lessons about atomic pruning success factors:

  • GO phases (+2.74% cumulative): HOT/WARM path telemetry atomic removal works
  • NO-OP phases (Phase 28-29): Correctness atomics and ENV-gated code waste effort

This document standardizes a 4-step procedure to ensure future phases target high-impact, executable code.


1. Phase 24-29 Cumulative Lessons

Phase 24-27: GO (+2.74% cumulative)

Pattern: HOT/WARM path telemetry atomic removal

  • Phase 24 (alloc stats): +0.93%

    • Removed atomic_fetch_add in malloc_tiny_fast() hot path
    • Stats compiled out with HAKMEM_ALLOC_GATE_STATS_COMPILED=0
  • Phase 25 (free stats): +1.07%

    • Removed atomic_fetch_add in free_tiny_fast_hotcold() hot path
    • Stats compiled out with HAKMEM_FREE_PATH_STATS_COMPILED=0
  • Phase 27 (unified cache): +0.74%

    • Removed atomic_fetch_add in TLS cache hit path
    • Stats compiled out with HAKMEM_TINY_FRONT_STATS_COMPILED=0

Success Factors:

  • Executed in every allocation/free (HOT path)
  • Pure telemetry (stats only, no control flow)
  • Build-level compile-out (no runtime overhead)

Phase 26: NEUTRAL (code cleanliness)

Pattern: Low-frequency but still compile-out

  • Tiny header tracking stats (COLD path)
  • No performance impact but maintains future maintainability
  • Kept compile-out mechanism for consistency

Lesson: Even low-frequency telemetry benefits from compile-out for code cleanliness.

Phase 28: NO-OP (CORRECTNESS atomics)

Anti-pattern: Misidentified counter purpose

  • Target: g_bg_spill_len (looked like a counter)
  • Reality: Flow control atomic (queue depth tracking)
  • Usage:
    if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
        // Decision-making logic
    }
    

Critical Lesson: Counter name ≠ Counter purpose

CORRECTNESS atomics (NEVER touch):

  • Used in if/while conditions
  • Flow control (queue depth, threshold checks)
  • Lock-free synchronization (CAS, load-store ordering)
  • Affects program behavior if removed

Phase 29: NO-OP (ENV-gated, not executed)

Anti-pattern: Optimizing dead code

  • Target: Pool v2 stats atomics
  • Reality: Gated by getenv("HAKMEM_POOL_V2") = OFF by default
  • Benchmark: Never executes pool v2 code paths
  • Result: Zero impact on measurements

Critical Lesson: Execution verification is MANDATORY before optimization


2. Standard Procedure (4 Steps)

Step 0: Execution Verification (MANDATORY GATE) ⚠️

Purpose: Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)

Methods:

A. ENV Gate Check

# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/  # Example

B. Execution Counter Verification

  1. Find counter reference:

    rg -n "atomic.*g_target_counter" core/
    
  2. Check counter in benchmark output:

    # Run mixed benchmark 10 times
    scripts/run_mixed_10_cleanenv.sh
    
    # Check if counter > 0 in any run
    grep "target_counter" results/*.txt
    
  3. Optional: Add debug printf (if counter not visible):

    #if HAKMEM_DEBUG_PRINT
    fprintf(stderr, "[DEBUG] counter=%lu\n",
            atomic_load(&g_target_counter));
    #endif
    

C. perf/flamegraph Verification (optional but recommended)

# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem

# Check if function appears in profile
perf report | grep "target_function"

Decision Matrix:

Condition Action
Counter > 0 in benchmark Proceed to Step 1
Function in perf profile Proceed to Step 1
ENV gated + OFF by default SKIP (Phase 29 pattern)
Counter = 0 in all runs SKIP (not executed)
Function not in flamegraph SKIP (negligible frequency)

Output: Document execution verification results in PHASE[N]_AUDIT.md


Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

Purpose: Distinguish between atomics that control behavior vs. atomics that just observe

Classification Rules:

CORRECTNESS (NEVER touch):

  • Used in if/while/for conditions
  • Flow control (queue depth, threshold, capacity checks)
  • Lock-free synchronization (CAS, atomic_compare_exchange_*)
  • Load-store ordering dependencies
  • Affects program decisions/behavior

Examples:

// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }

// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }

// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired)

TELEMETRY (compile-out candidate):

  • Stats/logging/observation only
  • Used exclusively in printf/fprintf/sprintf
  • Deletion changes no program behavior
  • Pure counters (hits, misses, totals)

Examples:

// TELEMETRY: Stats only
atomic_fetch_add(&stats[idx].hits, 1, memory_order_relaxed);

// TELEMETRY: Logging only
fprintf(stderr, "allocs=%lu\n", atomic_load(&g_alloc_count));

Verification Process:

  1. List all atomics in target scope:

    rg -n "atomic_(fetch_add|load|store).*g_target" core/
    
  2. Track all usage sites:

    rg -n "g_target_atomic" core/
    
  3. Check each usage:

    • Is it in an if condition? → CORRECTNESS
    • Is it only in printf/fprintf? → TELEMETRY
    • Unsure? → CORRECTNESS (safe default)
  4. Document classification:

    ## Atomic Classification
    
    ### g_alloc_stats (TELEMETRY)
    - core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
    - core/hakmem.c:89: fprintf output only
    - **Verdict:** TELEMETRY ✅
    
    ### g_bg_spill_len (CORRECTNESS)
    - core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
    - **Verdict:** CORRECTNESS  DO NOT TOUCH
    

Output: Classification table in PHASE[N]_AUDIT.md


Step 2: Compile-Out Implementation (Phase 24-27 pattern)

Purpose: Build-level removal of telemetry atomics (not link-out)

A. Add Compile Gate to BuildFlags

File: core/hakmem_build_flags.h

// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
#  define HAKMEM_[NAME]_STATS_COMPILED 0
#endif

Example:

// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
#  define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif

B. Wrap TELEMETRY Atomics with #if

Pattern:

#if HAKMEM_[NAME]_STATS_COMPILED
    atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
    (void)0;  // No-op when compiled out
#endif

Example:

#if HAKMEM_ALLOC_GATE_STATS_COMPILED
    atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
    (void)0;
#endif

C. Keep Variable Definitions (important!)

Do NOT remove:

// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;

// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
    fprintf(stderr, "counter=%lu\n", atomic_load(&g_stat_counter));
}
#endif

D. Prohibited Actions (Phase 22-2 NO-GO lesson)

NEVER:

  • Link-out (removing .o files from Makefile)
  • Deleting API functions (breaks linkage)
  • Removing struct definitions (breaks compilation)
  • Runtime if checks (adds branch overhead)

Rationale: Build-level #if has zero runtime cost. Link-out risks ABI breaks.


Step 3: A/B Test (build-level comparison)

Purpose: Measure impact of compile-out vs. compiled-in

A. Baseline Build (COMPILED=0, default)

# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt

B. Compiled-In Build (COMPILED=1)

# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt

C. Compare Results

# Calculate delta
scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt

D. Decision Matrix

Delta Verdict Action
+0.5% or higher GO Keep compile-out, document win
±0.5% NEUTRAL Keep for code cleanliness
-0.5% or lower NO-GO Revert changes

Rationale:

  • +0.5%: Statistically significant (HOT path impact)
  • ±0.5%: Noise range (but cleanliness still valuable)
  • -0.5%: Unexpected regression (likely measurement error, revert)

Output: PHASE[N]_RESULTS.md with full comparison


3. Phase Checklist Template

Copy this for each new phase:

## Phase [N]: [Target Description] Atomic Prune

**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]

---

### Step 0: Execution Verification ✅/❌

- [ ] **ENV Gate Check**
  ```bash
  rg "getenv.*[FEATURE]" core/

Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]

  • Execution Counter Verification

    rg -n "atomic.*g_target" core/
    scripts/run_mixed_10_cleanenv.sh
    grep "target_counter" results/*.txt
    

    Result: [Counter > 0 in all runs / Counter = 0 / Not visible]

  • perf Profile Check (optional)

    perf record -g -F 99 -- ./bench_random_mixed_hakmem
    perf report | grep "target_function"
    

    Result: [Function appears in profile / Not in profile]

Verdict: [ PROCEED / SKIP (reason)]


Step 1: CORRECTNESS/TELEMETRY Classification

  • List All Atomics

    rg -n "atomic_(fetch_add|load|store).*g_" [target_file]
    
  • Track All Usage Sites

    rg -n "g_atomic_var" core/
    
  • Classify Each Atomic

    Atomic Variable Usage Class Verdict
    g_var1 if condition CORRECTNESS DO NOT TOUCH
    g_var2 fprintf only TELEMETRY Candidate
  • Document Classification Rationale

Output: Classification table saved to PHASE[N]_AUDIT.md


Step 2: Compile-Out Implementation

  • Add BuildFlags Gate

    // core/hakmem_build_flags.h
    #ifndef HAKMEM_[NAME]_STATS_COMPILED
    #  define HAKMEM_[NAME]_STATS_COMPILED 0
    #endif
    
  • Wrap TELEMETRY Atomics

    #if HAKMEM_[NAME]_STATS_COMPILED
        atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
    #else
        (void)0;
    #endif
    
  • Verify Compilation

    make clean && make -j  # COMPILED=0 default
    make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
    

Step 3: A/B Test

  • Baseline Build (COMPILED=0)

    make clean && make -j bench_random_mixed_hakmem
    scripts/run_mixed_10_cleanenv.sh
    cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
    
  • Compiled-In Build (COMPILED=1)

    make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
    scripts/run_mixed_10_cleanenv.sh
    cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
    
  • Compare Results

    scripts/compare_benchmark_results.sh \
        docs/analysis/PHASE[N]_BASELINE.txt \
        docs/analysis/PHASE[N]_COMPILED_IN.txt
    
  • Record Verdict

    • Delta: [+X.XX%]
    • Verdict: [GO / NEUTRAL / NO-GO]
    • Rationale: [...]

Output: PHASE[N]_RESULTS.md with full comparison


Deliverables

  • PHASE[N]_AUDIT.md - Classification and execution verification
  • PHASE[N]_BASELINE.txt - Baseline benchmark results
  • PHASE[N]_COMPILED_IN.txt - Compiled-in benchmark results
  • PHASE[N]_RESULTS.md - A/B comparison and verdict
  • Update ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md with Phase [N] results
  • Update CURRENT_TASK.md with next phase

Notes

[Add any phase-specific observations, gotchas, or learnings here]


---

## 4. Success Criteria

A phase is considered **GO** if:
1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: +0.5% or higher performance delta

A phase is **NO-OP** if:
- ❌ Step 0: Not executed in benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)
- ❌ Step 3: Delta within ±0.5% noise range

---

## 5. Anti-Patterns to Avoid

### ❌ Skipping Execution Verification (Phase 29)
**Problem:** Optimizing ENV-gated code that never runs
**Solution:** Always run Step 0 before any work

### ❌ Assuming Counter = Telemetry (Phase 28)
**Problem:** Flow control atomics look like counters
**Solution:** Check all usage sites, especially `if` conditions

### ❌ Link-Out Instead of Compile-Out (Phase 22-2)
**Problem:** ABI breaks, mysterious link errors
**Solution:** Use `#if` preprocessor guards, never remove `.o` files

### ❌ Runtime Flags for Stats (not attempted, but common mistake)
**Problem:** `if (g_enable_stats)` adds branch overhead
**Solution:** Build-level `#if` has zero runtime cost

---

## 6. Expected Impact by Path Type

Based on Phase 24-29 results:

| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **±0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Undefined (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |

---

## 7. Tools and Scripts

### Execution Verification
```bash
# ENV gate check
rg "getenv.*FEATURE" core/

# Counter check (requires benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt

# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "function_name"

Classification Audit

# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]

# Track variable usage
rg -n "g_variable_name" core/

# Find if conditions
rg -n "if.*g_variable" core/

A/B Testing

# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_FEATURE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

# Compare (if script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt

8. Governance

When to Use This Procedure:

  • Any new atomic prune phase (Phase 31+)
  • Reviewing existing compile-out flags for consistency
  • Training new contributors on atomic optimization

When to Skip:

  • Non-atomic optimizations (inlining, data structure changes)
  • Known CORRECTNESS atomics (Step 1 already failed)
  • Features explicitly marked "do not optimize"

Document Updates:

  • This procedure should be updated after each phase if new patterns emerge
  • Phase results should update ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md
  • New anti-patterns should be added to Section 5

9. References

  • Phase 24 Results: docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md (+0.93%)
  • Phase 25 Results: docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md (+1.07%)
  • Phase 27 Results: docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md (+0.74%)
  • Phase 28 NO-OP: docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md (CORRECTNESS)
  • Phase 29 NO-OP: docs/analysis/PHASE29_POOL_V2_AUDIT.md (ENV-gated)
  • Cumulative Summary: docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md

End of Standard Procedure Document

Next: Apply Step 0 to Phase 31 candidates to ensure execution before optimization.