Files

Moe Charm (CI) 506e724c3b Phase 30-31: Standard procedure + g_tiny_free_trace atomic prune

Phase 30: Standard Procedure Establishment
- Created 4-step standardized methodology (Step 0-3)
- Step 0: Execution Verification (NEW - Phase 29 lesson)
- Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)
- Step 2: Compile-Out Implementation (Phase 24-27 pattern)
- Step 3: A/B Test (build-level comparison)
- Executed audit_atomics.sh: 412 atomics analyzed
- Identified Phase 31 candidate: g_tiny_free_trace (HOT path, TOP PRIORITY)

Phase 31: g_tiny_free_trace Compile-Out (HOT Path TELEMETRY)
- Target: core/hakmem_tiny_free.inc:326 (trace-rate-limit atomic)
- Added HAKMEM_TINY_FREE_TRACE_COMPILED (default: 0)
- Classification: Pure TELEMETRY (trace output only, no flow control)
- A/B Result: NEUTRAL (baseline -0.35% mean, +0.19% median)
- Verdict: NEUTRAL → Adopted for code cleanliness (Phase 26 precedent)
- Rationale: HOT path TELEMETRY removal improves code quality

A/B Test Details:
- Baseline (COMPILED=0): 53.638M ops/s mean, 53.799M median
- Compiled-in (COMPILED=1): 53.828M ops/s mean, 53.697M median
- Conflicting signals within ±0.5% noise margin
- Phase 25 comparison: g_free_ss_enter (+1.07% GO) vs g_tiny_free_trace (NEUTRAL)
- Hypothesis: Rate-limited atomic (128 calls) optimized by compiler

Cumulative Progress (Phase 24-31):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (all CORRECTNESS)
- Phase 29 (pool v2): NO-OP (ENV-gated)
- Phase 30 (procedure): PROCEDURE
- Phase 31 (free trace): -0.35% NEUTRAL
- Total: 18 atomics removed, +2.74% net improvement

Documentation Created:
- PHASE30_STANDARD_PROCEDURE.md: Complete 4-step methodology
- ATOMIC_AUDIT_FULL.txt: 412 atomics comprehensive audit
- PHASE31_CANDIDATES_HOT/WARM.txt: Priority-sorted candidates
- PHASE31_RECOMMENDED_CANDIDATES.md: TOP 3 with Step 0 verification
- PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md: Complete A/B results
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated (Phase 30-31)
- CURRENT_TASK.md: Phase 32 candidate identified (g_hak_tiny_free_calls)

Key Lessons:
- Lesson 7 (Phase 30): Step 0 execution verification prevents wasted effort
- Lesson 8 (Phase 31): NEUTRAL + code cleanliness = valid adoption
- HOT path ≠ guaranteed performance win (rate-limited atomics may be optimized)

Next Phase: Phase 32 candidate (g_hak_tiny_free_calls)
- Location: core/hakmem_tiny_free.inc:335 (9 lines below Phase 31 target)
- Expected: +0.3~0.7% or NEUTRAL

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-16 07:31:15 +09:00

16 KiB

Raw Blame History

Phase 30: Standard Procedure for Atomic Prune Operations

Date: 2025-12-16 Status: PROCEDURE STANDARDIZATION Purpose: Codify learnings from Phase 24-29 to prevent no-op phases

Executive Summary

Phase 24-29 taught us critical lessons about atomic pruning success factors:

GO phases (+2.74% cumulative): HOT/WARM path telemetry atomic removal works
NO-OP phases (Phase 28-29): Correctness atomics and ENV-gated code waste effort

This document standardizes a 4-step procedure to ensure future phases target high-impact, executable code.

1. Phase 24-29 Cumulative Lessons

Phase 24-27: GO (+2.74% cumulative)

Pattern: HOT/WARM path telemetry atomic removal

Phase 24 (alloc stats): +0.93%
- Removed atomic_fetch_add in malloc_tiny_fast() hot path
- Stats compiled out with HAKMEM_ALLOC_GATE_STATS_COMPILED=0
Phase 25 (free stats): +1.07%
- Removed atomic_fetch_add in free_tiny_fast_hotcold() hot path
- Stats compiled out with HAKMEM_FREE_PATH_STATS_COMPILED=0
Phase 27 (unified cache): +0.74%
- Removed atomic_fetch_add in TLS cache hit path
- Stats compiled out with HAKMEM_TINY_FRONT_STATS_COMPILED=0

Success Factors:

✅ Executed in every allocation/free (HOT path)
✅ Pure telemetry (stats only, no control flow)
✅ Build-level compile-out (no runtime overhead)

Phase 26: NEUTRAL (code cleanliness)

Pattern: Low-frequency but still compile-out

Tiny header tracking stats (COLD path)
No performance impact but maintains future maintainability
Kept compile-out mechanism for consistency

Lesson: Even low-frequency telemetry benefits from compile-out for code cleanliness.

Phase 28: NO-OP (CORRECTNESS atomics)

Anti-pattern: Misidentified counter purpose

Target: g_bg_spill_len (looked like a counter)
Reality: Flow control atomic (queue depth tracking)

Usage:

if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
    // Decision-making logic
}

Critical Lesson: Counter name ≠ Counter purpose

CORRECTNESS atomics (NEVER touch):

Used in if/while conditions
Flow control (queue depth, threshold checks)
Lock-free synchronization (CAS, load-store ordering)
Affects program behavior if removed

Phase 29: NO-OP (ENV-gated, not executed)

Anti-pattern: Optimizing dead code

Target: Pool v2 stats atomics
Reality: Gated by getenv("HAKMEM_POOL_V2") = OFF by default
Benchmark: Never executes pool v2 code paths
Result: Zero impact on measurements

Critical Lesson: Execution verification is MANDATORY before optimization

2. Standard Procedure (4 Steps)

Step 0: Execution Verification (MANDATORY GATE) ⚠️

Purpose: Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)

Methods:

A. ENV Gate Check

# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/  # Example

B. Execution Counter Verification

Find counter reference:
```
rg -n "atomic.*g_target_counter" core/
```

Check counter in benchmark output:

# Run mixed benchmark 10 times
scripts/run_mixed_10_cleanenv.sh

# Check if counter > 0 in any run
grep "target_counter" results/*.txt

Optional: Add debug printf (if counter not visible):

#if HAKMEM_DEBUG_PRINT
fprintf(stderr, "[DEBUG] counter=%lu\n",
        atomic_load(&g_target_counter));
#endif

C. perf/flamegraph Verification (optional but recommended)

# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem

# Check if function appears in profile
perf report | grep "target_function"

Decision Matrix:

Condition	Action
✅ Counter > 0 in benchmark	Proceed to Step 1
✅ Function in perf profile	Proceed to Step 1
❌ ENV gated + OFF by default	SKIP (Phase 29 pattern)
❌ Counter = 0 in all runs	SKIP (not executed)
❌ Function not in flamegraph	SKIP (negligible frequency)

Output: Document execution verification results in PHASE[N]_AUDIT.md

Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

Purpose: Distinguish between atomics that control behavior vs. atomics that just observe

Classification Rules:

CORRECTNESS (NEVER touch):

❌ Used in if/while/for conditions
❌ Flow control (queue depth, threshold, capacity checks)
❌ Lock-free synchronization (CAS, atomic_compare_exchange_*)
❌ Load-store ordering dependencies
❌ Affects program decisions/behavior

Examples:

// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }

// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }

// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired)

TELEMETRY (compile-out candidate):

✅ Stats/logging/observation only
✅ Used exclusively in printf/fprintf/sprintf
✅ Deletion changes no program behavior
✅ Pure counters (hits, misses, totals)

Examples:

// TELEMETRY: Stats only
atomic_fetch_add(&stats[idx].hits, 1, memory_order_relaxed);

// TELEMETRY: Logging only
fprintf(stderr, "allocs=%lu\n", atomic_load(&g_alloc_count));

Verification Process:

List all atomics in target scope:

rg -n "atomic_(fetch_add|load|store).*g_target" core/

Track all usage sites:
```
rg -n "g_target_atomic" core/
```
Check each usage:
- Is it in an if condition? → CORRECTNESS
- Is it only in printf/fprintf? → TELEMETRY
- Unsure? → CORRECTNESS (safe default)

Document classification:

## Atomic Classification

### g_alloc_stats (TELEMETRY)
- core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
- core/hakmem.c:89: fprintf output only
- **Verdict:** TELEMETRY ✅

### g_bg_spill_len (CORRECTNESS)
- core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
- **Verdict:** CORRECTNESS ❌ DO NOT TOUCH

Output: Classification table in PHASE[N]_AUDIT.md

Step 2: Compile-Out Implementation (Phase 24-27 pattern)

Purpose: Build-level removal of telemetry atomics (not link-out)

A. Add Compile Gate to BuildFlags

File: core/hakmem_build_flags.h

// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
#  define HAKMEM_[NAME]_STATS_COMPILED 0
#endif

Example:

// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
#  define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif

B. Wrap TELEMETRY Atomics with #if

Pattern:

#if HAKMEM_[NAME]_STATS_COMPILED
    atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
    (void)0;  // No-op when compiled out
#endif

Example:

#if HAKMEM_ALLOC_GATE_STATS_COMPILED
    atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
    (void)0;
#endif

C. Keep Variable Definitions (important!)

Do NOT remove:

// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;

// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
    fprintf(stderr, "counter=%lu\n", atomic_load(&g_stat_counter));
}
#endif

D. Prohibited Actions (Phase 22-2 NO-GO lesson)

NEVER:

❌ Link-out (removing .o files from Makefile)
❌ Deleting API functions (breaks linkage)
❌ Removing struct definitions (breaks compilation)
❌ Runtime if checks (adds branch overhead)

Rationale: Build-level #if has zero runtime cost. Link-out risks ABI breaks.

Step 3: A/B Test (build-level comparison)

Purpose: Measure impact of compile-out vs. compiled-in

A. Baseline Build (COMPILED=0, default)

# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt

B. Compiled-In Build (COMPILED=1)

# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem

# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh

# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt

C. Compare Results

# Calculate delta
scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt

D. Decision Matrix

Delta	Verdict	Action
+0.5% or higher	GO	Keep compile-out, document win
±0.5%	NEUTRAL	Keep for code cleanliness
-0.5% or lower	NO-GO	Revert changes

Rationale:

+0.5%: Statistically significant (HOT path impact)
±0.5%: Noise range (but cleanliness still valuable)
-0.5%: Unexpected regression (likely measurement error, revert)

Output: PHASE[N]_RESULTS.md with full comparison

3. Phase Checklist Template

Copy this for each new phase:

## Phase [N]: [Target Description] Atomic Prune

**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]

---

### Step 0: Execution Verification ✅/❌

- [ ] **ENV Gate Check**
  ```bash
  rg "getenv.*[FEATURE]" core/

Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]

Execution Counter Verification

rg -n "atomic.*g_target" core/
scripts/run_mixed_10_cleanenv.sh
grep "target_counter" results/*.txt

Result: [Counter > 0 in all runs / Counter = 0 / Not visible]

perf Profile Check (optional)

perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "target_function"

Result: [Function appears in profile / Not in profile]

Verdict: [✅ PROCEED / ❌ SKIP (reason)]

Step 1: CORRECTNESS/TELEMETRY Classification

List All Atomics

rg -n "atomic_(fetch_add|load|store).*g_" [target_file]

Track All Usage Sites
```
rg -n "g_atomic_var" core/
```
Classify Each Atomic

Atomic Variable Usage Class Verdict

g_var1 if condition CORRECTNESS ❌ DO NOT TOUCH

g_var2 fprintf only TELEMETRY ✅ Candidate
Document Classification Rationale

Atomic Variable	Usage	Class	Verdict
`g_var1`	`if` condition	CORRECTNESS	❌ DO NOT TOUCH
`g_var2`	`fprintf` only	TELEMETRY	✅ Candidate

Output: Classification table saved to PHASE[N]_AUDIT.md

Step 2: Compile-Out Implementation

Add BuildFlags Gate

// core/hakmem_build_flags.h
#ifndef HAKMEM_[NAME]_STATS_COMPILED
#  define HAKMEM_[NAME]_STATS_COMPILED 0
#endif

Wrap TELEMETRY Atomics

#if HAKMEM_[NAME]_STATS_COMPILED
    atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
#else
    (void)0;
#endif

Verify Compilation

make clean && make -j  # COMPILED=0 default
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'

Step 3: A/B Test

Baseline Build (COMPILED=0)

make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt

Compiled-In Build (COMPILED=1)

make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt

Compare Results

scripts/compare_benchmark_results.sh \
    docs/analysis/PHASE[N]_BASELINE.txt \
    docs/analysis/PHASE[N]_COMPILED_IN.txt

Record Verdict
- Delta: [+X.XX%]
- Verdict: [GO / NEUTRAL / NO-GO]
- Rationale: [...]