Files

Moe Charm (CI) 506e724c3b Phase 30-31: Standard procedure + g_tiny_free_trace atomic prune

Phase 30: Standard Procedure Establishment
- Created 4-step standardized methodology (Step 0-3)
- Step 0: Execution Verification (NEW - Phase 29 lesson)
- Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)
- Step 2: Compile-Out Implementation (Phase 24-27 pattern)
- Step 3: A/B Test (build-level comparison)
- Executed audit_atomics.sh: 412 atomics analyzed
- Identified Phase 31 candidate: g_tiny_free_trace (HOT path, TOP PRIORITY)

Phase 31: g_tiny_free_trace Compile-Out (HOT Path TELEMETRY)
- Target: core/hakmem_tiny_free.inc:326 (trace-rate-limit atomic)
- Added HAKMEM_TINY_FREE_TRACE_COMPILED (default: 0)
- Classification: Pure TELEMETRY (trace output only, no flow control)
- A/B Result: NEUTRAL (baseline -0.35% mean, +0.19% median)
- Verdict: NEUTRAL → Adopted for code cleanliness (Phase 26 precedent)
- Rationale: HOT path TELEMETRY removal improves code quality

A/B Test Details:
- Baseline (COMPILED=0): 53.638M ops/s mean, 53.799M median
- Compiled-in (COMPILED=1): 53.828M ops/s mean, 53.697M median
- Conflicting signals within ±0.5% noise margin
- Phase 25 comparison: g_free_ss_enter (+1.07% GO) vs g_tiny_free_trace (NEUTRAL)
- Hypothesis: Rate-limited atomic (128 calls) optimized by compiler

Cumulative Progress (Phase 24-31):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (all CORRECTNESS)
- Phase 29 (pool v2): NO-OP (ENV-gated)
- Phase 30 (procedure): PROCEDURE
- Phase 31 (free trace): -0.35% NEUTRAL
- Total: 18 atomics removed, +2.74% net improvement

Documentation Created:
- PHASE30_STANDARD_PROCEDURE.md: Complete 4-step methodology
- ATOMIC_AUDIT_FULL.txt: 412 atomics comprehensive audit
- PHASE31_CANDIDATES_HOT/WARM.txt: Priority-sorted candidates
- PHASE31_RECOMMENDED_CANDIDATES.md: TOP 3 with Step 0 verification
- PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md: Complete A/B results
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated (Phase 30-31)
- CURRENT_TASK.md: Phase 32 candidate identified (g_hak_tiny_free_calls)

Key Lessons:
- Lesson 7 (Phase 30): Step 0 execution verification prevents wasted effort
- Lesson 8 (Phase 31): NEUTRAL + code cleanliness = valid adoption
- HOT path ≠ guaranteed performance win (rate-limited atomics may be optimized)

Next Phase: Phase 32 candidate (g_hak_tiny_free_calls)
- Location: core/hakmem_tiny_free.inc:335 (9 lines below Phase 31 target)
- Expected: +0.3~0.7% or NEUTRAL

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-16 07:31:15 +09:00

13 KiB

Raw Blame History

Phase 31: Tiny Free Trace Atomic Prune - Results

Date: 2025-12-16 Type: HOT path TELEMETRY atomic prune Target: g_tiny_free_trace atomic in core/hakmem_tiny_free.inc:326 Verdict: NEUTRAL (code cleanliness adopted)

Executive Summary

Phase 31 targeted the g_tiny_free_trace atomic in the HOT path (hak_tiny_free() entry point). A/B testing showed NEUTRAL performance (-0.35% mean, +0.19% median), well within noise range (±0.5%). Following Phase 26 precedent (5 atomics, -0.33%, adopted for code cleanliness), Phase 31 is ADOPTED with COMPILED=0 as default to reduce HOT path complexity.

Background

Phase 30 Selection Process

From 412 total atomics audited:

HOT path candidates: 16 total
- 5 TELEMETRY (4 already compiled-out in Phases 24-27)
- 11 UNKNOWN (require manual review)

Phase 31 candidate selected: g_tiny_free_trace (HOT path, TELEMETRY, TOP PRIORITY)

Step 0 verification (MANDATORY):

No ENV gate → always active
Located in hak_tiny_free() → executes on EVERY tiny free call
Mixed benchmark heavily exercises free path → high execution count
Execution confirmed: First instruction in HOT path function

Target Profile

Location: core/hakmem_tiny_free.inc:326

Original Code:

void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}

Classification:

Class: TELEMETRY (trace rate-limit only)
Path: HOT (every tiny free operation)
Flow Control: None (only affects HAK_TRACE macro output)
Correctness Impact: None

Similar precedent: Phase 25 (g_free_ss_enter: +1.07% GO)

Implementation (4-Step Standard Procedure)

Step 0: Execution Verification (Phase 29 lesson)

ENV gate check:

$ rg "getenv.*TRACE" core/ --type c
# (No results - no ENV gate blocking execution)

Execution check:

Located at entry of hak_tiny_free() (line 326)
Executes on EVERY tiny free call (no conditional bypass)
Mixed benchmark: ~10M+ free operations per run
Verification: PASSED (always active)

Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

Full usage audit:

$ rg -n "g_tiny_free_trace" core/
core/hakmem_tiny_free.inc:326:    static _Atomic int g_tiny_free_trace = 0;
core/hakmem_tiny_free.inc:327:    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {

Analysis:

Only 2 uses: declaration + atomic increment
No if conditions using the counter value
Only affects HAK_TRACE printf (debug macro)
Classification: Pure TELEMETRY ✅

Step 2: Compile-Out Implementation

File 1: core/hakmem_build_flags.h

Added:

// ------------------------------------------------------------
// Phase 31: Tiny Free Trace Atomic Prune (Compile-out trace atomic)
// ------------------------------------------------------------
// Tiny Free Trace: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need free path trace diagnostics
// Target: g_tiny_free_trace atomic in core/hakmem_tiny_free.inc:326
// Impact: HOT path atomic (every free operation)
// Expected improvement: +0.5% to +1.0% (similar to Phase 25: +1.07%)
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
#  define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif

File 2: core/hakmem_tiny_free.inc:326

Before:

void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}

After:

void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
#else
    (void)0;  // No-op when trace compiled out
#endif
    // ... rest of function ...
}

Include verification:

hakmem_build_flags.h included transitively via tiny_front_config_box.h
No explicit include needed

Step 3: A/B Test (Build-Level Comparison)

Baseline (COMPILED=0, default - trace compiled-out):

make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

Compiled-in (COMPILED=1, research - trace active):

make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_TRACE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh

A/B Test Results

Raw Data (10-run clean environment)

Baseline (COMPILED=0, trace compiled-out):

Run  1: 53432447 ops/s
Run  2: 53846666 ops/s
Run  3: 53256003 ops/s
Run  4: 54007573 ops/s
Run  5: 54132468 ops/s
Run  6: 53937278 ops/s
Run  7: 53752216 ops/s
Run  8: 53106138 ops/s
Run  9: 53861749 ops/s
Run 10: 53052398 ops/s

Compiled-in (COMPILED=1, trace active):

Run  1: 53667388 ops/s
Run  2: 53623799 ops/s
Run  3: 54099595 ops/s
Run  4: 53993106 ops/s
Run  5: 53530214 ops/s
Run  6: 54275707 ops/s
Run  7: 53726604 ops/s
Run  8: 53607801 ops/s
Run  9: 54122912 ops/s
Run 10: 53630312 ops/s

Statistical Analysis

Metric	Baseline (COMPILED=0)	Compiled-in (COMPILED=1)	Difference
Mean	53,638,493.60 ops/s	53,827,743.80 ops/s	-0.35%
Median	53,799,441.00 ops/s	53,696,996.00 ops/s	+0.19%
Stdev	393,174.93 (0.73%)	267,178.23 (0.50%)	-

Difference interpretation:

Mean: Baseline -0.35% (SLOWER, but within noise)
Median: Baseline +0.19% (FASTER, but within noise)
Verdict range: Both within ±0.5% NEUTRAL threshold

Verdict

Performance: NEUTRAL

Criteria:

GO: +0.5% or more (compile-out wins)
NEUTRAL: ±0.5% (no significant difference)
NO-GO: -0.5% or worse (compile-out loses)

Result: NEUTRAL (-0.35% mean, +0.19% median)

Analysis:

Mean shows slight regression (-0.35%), median shows slight improvement (+0.19%)
Conflicting signals suggest measurement noise rather than true effect
Standard deviation overlap confirms lack of statistical significance
Similar to Phase 26 pattern (-0.33%, 5 atomics, NEUTRAL)

Decision: ADOPTED (COMPILED=0 default)

Rationale (following Phase 26 precedent):

Code Cleanliness:
- Removes unused TELEMETRY atomic from HOT path
- Reduces complexity at hak_tiny_free() entry point
- No correctness impact (pure trace macro)
Consistency:
- Phase 26 precedent: -0.33% NEUTRAL result adopted for cleanliness
- Phase 31: -0.35% NEUTRAL result follows same logic
- Maintains atomic prune momentum (Phases 24-31)
Research Flexibility:
- COMPILED=1 still available for trace diagnostics
- No functionality lost, only default changed
- Easy revert if needed (make EXTRA_CFLAGS=-DHAKMEM_TINY_FREE_TRACE_COMPILED=1)
Why Not NO-GO?
- Median +0.19% (slight win, not loss)
- Mean -0.35% within noise range (±0.5% threshold)
- Phase 26 set precedent: NEUTRAL + cleanliness = ADOPT

Comparison: Phase 25 vs Phase 31

Phase 25: g_free_ss_enter (free stats atomic)

Location: tiny_superslab_free.inc.h:25 (entry point)
Result: +1.07% (GO)
Path: Same HOT path (free entry)
Similarity: Both trace/stats atomics at free entry

Phase 31: g_tiny_free_trace (trace rate-limit atomic)

Location: hakmem_tiny_free.inc:326 (entry point)
Result: -0.35% mean, +0.19% median (NEUTRAL)
Path: Same HOT path (free entry)
Difference: Rate-limited (128 calls) vs always-increment

Why different results?

Execution frequency:
- Phase 25: EVERY free call increments stats
- Phase 31: EVERY free call increments, but trace only 128 times
- Hypothesis: Phase 25's always-active stats had higher overhead
Atomic placement:
- Phase 25: Inside hak_tiny_free_superslab() (deeper in call stack)
- Phase 31: First instruction in hak_tiny_free() (entry point)
- Hypothesis: Entry point atomic may be better optimized by compiler
Measurement variance:
- Phase 25: Clear +1.07% signal above noise
- Phase 31: -0.35% / +0.19% conflicting signals (noise)
- Conclusion: Phase 31 likely true NEUTRAL, not hidden win

Lessons Learned

1. HOT Path ≠ Guaranteed Win

Previous assumption (from Phase 25):

HOT path TELEMETRY atomic → +0.5% to +1.0% expected

Phase 31 reality:

HOT path TELEMETRY atomic → NEUTRAL (±0.0%)

Insight:

Not all HOT path atomics have measurable overhead
Rate-limited trace (128 calls) may be optimized away by compiler
Entry point placement may reduce overhead vs mid-function

2. NEUTRAL + Cleanliness = ADOPT

Established precedent (Phase 26):

5 diagnostic atomics, -0.33% NEUTRAL result
Adopted for code cleanliness despite no performance win

Phase 31 confirms:

-0.35% NEUTRAL result, same adoption logic
Code cleanliness is valid secondary criterion
Maintains atomic prune momentum (Phases 24-31)

3. Step 0 (Execution Verification) Essential

Phase 31 validated:

Step 0 confirmed no ENV gate → always active
Prevented Phase 29 "empty bench" scenario
Standard procedure working as designed

Next Steps

Phase 32 Candidate: `g_hak_tiny_free_calls`

Location: core/hakmem_tiny_free.inc:335 (same function, 9 lines after Phase 31 target)

Code context:

void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    // Phase 31 target (now compiled-out)
#endif
    // Track total tiny free calls (diagnostics)
    extern _Atomic uint64_t g_hak_tiny_free_calls;
    atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);  // ← Phase 32 target
    // ... rest of function ...
}

Profile:

Path: HOT (every tiny free call, same as Phase 31)
Classification: TELEMETRY (diagnostic counter, no flow control)
Expected: +0.3% to +0.7% (smaller than Phase 25, similar to Phase 31)
Step 0 verification needed: Check for ENV gate, confirm execution

Alternative candidates:

Manual review of UNKNOWN atomics (284 candidates from Phase 30 audit)
Lower priority than confirmed HOT path targets

Files Modified

Code Changes

core/hakmem_build_flags.h
- Added HAKMEM_TINY_FREE_TRACE_COMPILED flag (default OFF)
- Lines 363-373
core/hakmem_tiny_free.inc
- Wrapped g_tiny_free_trace atomic in #if HAKMEM_TINY_FREE_TRACE_COMPILED
- Lines 326-333

Documentation

docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md (this file)
- A/B test results
- NEUTRAL verdict + code cleanliness adoption
- Phase 32 candidate proposal
docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md (to be updated)
- Phase 24-31 cumulative summary
- Updated precedents section
- Phase 32 roadmap
CURRENT_TASK.md (to be updated)
- Phase 31 completion
- Phase 32 candidate recommendation

Cumulative Progress (Phases 24-31)

Phase	Target	Atomics	Result	Status
24	Tiny Class Stats (OBSERVE)	5	+0.93%	GO ✅
25	Free Stats (`g_free_ss_enter`)	1	+1.07%	GO ✅
26	Hot Path Diagnostics	5	-0.33%	NEUTRAL ✅
27	Unified Cache Stats	6	+0.74%	GO ✅
28	Background Spill Queue	8	N/A	NO-OP ✅
29	Pool Hotbox v2 Stats	12	0.00%	NO-OP ✅
30	Standard Procedure	412 audit	N/A	PROCEDURE ✅
31	Tiny Free Trace	1	-0.35%	NEUTRAL ✅
Total	18 atomics removed	+2.74%	net cumulative	✅

Net cumulative gain: +2.74% (Phases 24+25+27, excluding NEUTRAL 26+31)

Note: Phase 26 and 31 NEUTRAL results do not degrade cumulative gain (no regression).

Conclusion

Phase 31 demonstrates that not all HOT path TELEMETRY atomics have measurable overhead. While Phase 25 (g_free_ss_enter) delivered +1.07%, Phase 31 (g_tiny_free_trace) showed NEUTRAL performance (-0.35% mean, +0.19% median). Following Phase 26 precedent, Phase 31 is ADOPTED with COMPILED=0 as default for code cleanliness benefits.

Key takeaways:

HOT path location does not guarantee performance wins
NEUTRAL + code cleanliness is valid adoption criterion (Phase 26/31 pattern)
Standard 4-step procedure successfully prevented false positives (Step 0 execution check)
Phase 32 candidate ready: g_hak_tiny_free_calls (same HOT path, 9 lines below)

Recommendation: Proceed to Phase 32 (g_hak_tiny_free_calls) following same 4-step procedure.

13 KiB Raw Blame History