2025-12-16 05:35:11 +09:00
# Hot Path Atomic Telemetry Prune - Cumulative Summary
**Project:** HAKMEM Memory Allocator - Hot Path Optimization
**Goal:** Remove all telemetry-only atomics from hot alloc/free paths
**Principle:** Follow mimalloc: No atomics/observe in hot path
**Status:** Phase 24+25+26+27+31+32 Complete (+2.74% cumulative), Phase 28+29 NO-OP, Phase 30 Procedure Complete
---
## Overview
This document tracks the systematic removal of telemetry-only `atomic_fetch_add/sub` operations from hot alloc/free code paths. Each phase follows a consistent pattern:
1. Identify telemetry-only atomic (not CORRECTNESS)
2. Add `HAKMEM_*_COMPILED` compile gate (default: 0)
3. A/B test: baseline (compiled-out) vs compiled-in
4. Verdict: GO (> +0.5%), NEUTRAL (within ±0.5%), or NO-GO (< -0.5%)
5. Document and proceed to next candidate
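The verdict rule in step 4 can be sketched as a small helper. This is a minimal illustration, not code from the HAKMEM tree: the names `ab_gain_percent` and `ab_classify` are hypothetical, and the choice to express the gain relative to the compiled-in build is an assumption that happens to reproduce the percentages reported below to within rounding.

```c
/* Verdict labels used throughout this document. */
typedef enum {
    VERDICT_NO_GO   = -1,
    VERDICT_NEUTRAL = 0,
    VERDICT_GO      = 1
} ab_verdict_t;

/* Gain of the compiled-out baseline over the compiled-in build, in percent.
 * Assumption: the gain is expressed relative to the compiled-in throughput. */
static double ab_gain_percent(double baseline_mops, double compiled_in_mops) {
    return (baseline_mops - compiled_in_mops) / compiled_in_mops * 100.0;
}

/* Apply the +/-0.5% noise margin from step 4. */
static ab_verdict_t ab_classify(double gain_percent) {
    if (gain_percent > 0.5)  return VERDICT_GO;
    if (gain_percent < -0.5) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

For example, `ab_classify(ab_gain_percent(52.94, 52.55))` yields `VERDICT_GO`, which corresponds to the Phase 27 mean figures.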
---
## Completed Phases
### Phase 24: Tiny Class Stats Atomic Prune ✅ **GO (+0.93%)**
**Date:** 2025-12-15 (prior work)
**Target:** `g_tiny_class_stats_*` (per-class cache hit/miss counters)
**File:** `core/box/tiny_class_stats_box.h`
**Atomics:** 5 global counters (executed on every cache operation)
**Build Flag:** `HAKMEM_TINY_CLASS_STATS_COMPILED` (default: 0)
**Results:**
- **Baseline (compiled-out):** 57.8 M ops/s
- **Compiled-in:** 57.3 M ops/s
- **Improvement:** **+0.93%**
- **Verdict:** **GO** ✅ (keep compiled-out)
**Analysis:** High-frequency atomics (every cache hit/miss) show measurable impact. Compiling out provides nearly 1% improvement.
**Reference:** Pattern established in Phase 24, used as template for all subsequent phases.
---
### Phase 25: Free Stats Atomic Prune ✅ **GO (+1.07%)**
**Date:** 2025-12-15 (prior work)
**Target:** `g_free_ss_enter` (superslab free entry counter)
**File:** `core/tiny_superslab_free.inc.h:22`
**Atomics:** 1 global counter (executed on every superslab free)
**Build Flag:** `HAKMEM_TINY_FREE_STATS_COMPILED` (default: 0)
**Results:**
- **Baseline (compiled-out):** 58.4 M ops/s
- **Compiled-in:** 57.8 M ops/s
- **Improvement:** **+1.07%**
- **Verdict:** **GO** ✅ (keep compiled-out)
**Analysis:** Single high-frequency atomic (every free call) shows >1% impact. Demonstrates that even one hot-path atomic matters.
**Reference:** `docs/analysis/PHASE25_FREE_STATS_RESULTS.md` (assumed from pattern)
---
### Phase 26: Hot Path Diagnostic Atomics Prune ✅ **NEUTRAL (-0.33%)**
**Date:** 2025-12-16
**Targets:** 5 diagnostic atomics in hot-path edge cases
**Files:**
- `core/tiny_superslab_free.inc.h` (3 atomics)
- `core/hakmem_tiny_alloc.inc` (1 atomic)
- `core/tiny_free_fast_v2.inc.h` (1 atomic)
**Build Flags:** (all default: 0)
- `HAKMEM_C7_FREE_COUNT_COMPILED`
- `HAKMEM_HDR_MISMATCH_LOG_COMPILED`
- `HAKMEM_HDR_META_MISMATCH_COMPILED`
- `HAKMEM_METRIC_BAD_CLASS_COMPILED`
- `HAKMEM_HDR_META_FAST_COMPILED`
**Results:**
- **Baseline (compiled-out):** 53.14 M ops/s (±0.96M)
- **Compiled-in:** 53.31 M ops/s (±1.09M)
- **Improvement:** **-0.33%** (within ±0.5% noise margin)
- **Verdict:** **NEUTRAL** ➡️ Keep compiled-out for cleanliness ✅
**Analysis:** Low-frequency atomics (only in error/diagnostic paths) show no measurable impact. Kept compiled-out for code cleanliness and maintainability.
**Reference:** `docs/analysis/PHASE26_HOT_PATH_ATOMIC_PRUNE_RESULTS.md`
---
### Phase 27: Unified Cache Stats Atomic Prune ✅ **GO (+0.74%)**
**Date:** 2025-12-16
**Target:** `g_unified_cache_*` (unified cache measurement atomics)
**Files:** `core/front/tiny_unified_cache.c`, `core/front/tiny_unified_cache.h`
**Atomics:** 6 global counters (hits, misses, refill cycles, per-class variants)
**Build Flag:** `HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED` (default: 0)
**Results:**
- **Baseline (compiled-out):** 52.94 M ops/s (mean), 53.59 M ops/s (median)
- **Compiled-in:** 52.55 M ops/s (mean), 53.06 M ops/s (median)
- **Improvement:** **+0.74% (mean), +1.01% (median)**
- **Verdict:** **GO** ✅ (keep compiled-out)
**Analysis:** WARM path atomics (cache refill operations) show measurable impact exceeding initial expectations (+0.2-0.4% expected, +0.74% actual). This suggests refill frequency is substantial in the random_mixed benchmark. The improvement validates the Phase 23 compile-out decision.
**Path:** WARM (unified cache refill: 3 locations; cache hits: 2 locations)
**Frequency:** Medium (every cache miss triggers refill with 4 atomic ops + ENV check)
**Reference:** `docs/analysis/PHASE27_UNIFIED_CACHE_STATS_RESULTS.md`
---
### Phase 28: Background Spill Queue Atomic Audit ✅ **NO-OP (All CORRECTNESS)**
**Date:** 2025-12-16
**Target:** Background spill queue atomics (`g_bg_spill_head`, `g_bg_spill_len`)
**Files:** `core/hakmem_tiny_bg_spill.h`, `core/hakmem_tiny_bg_spill.c`
**Atomics:** 8 atomic operations (CAS loops, queue management)
**Build Flag:** None (no compile-out candidates)
**Audit Results:**
- **CORRECTNESS Atomics:** 8/8 (100%)
- **TELEMETRY Atomics:** 0/8 (0%)
- **Verdict:** **NO-OP** (no action taken)
**Analysis:**
All atomics are critical for correctness:
1. **Lock-free queue operations:** `atomic_load`, `atomic_compare_exchange_weak` for CAS loops
2. **Queue length tracking (`g_bg_spill_len`):** Used for **flow control**, NOT telemetry
   - Checked in `tiny_free_magazine.inc.h:76-77` to decide whether to queue work
   - Controls queue depth to prevent unbounded growth
   - This is an operational counter, not a debug counter
**Key Finding:** `g_bg_spill_len` is superficially similar to telemetry counters, but serves a critical role:
```c
uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed);
if ((int)qlen < g_bg_spill_target) {  // FLOW CONTROL DECISION
// Queue work to background spill
}
```
**Conclusion:** The background spill queue is a lock-free data structure. All atomics are untouchable. Phase 28 completes with **no code changes**.
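To make the CORRECTNESS/TELEMETRY distinction concrete, the following is a minimal sketch of a bounded lock-free push. It is not the HAKMEM implementation: every `demo_*` name and the `DEMO_TARGET` limit are hypothetical, and the queue is reduced to a single class. The length counter feeds a branch and the head pointer is updated by a CAS loop, so both are operational, while the call counter is never read for a decision and is the only one that could be compiled out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct spill_node { struct spill_node *next; } spill_node_t;

/* Hypothetical single-class queue state (zero-initialized statics). */
static _Atomic(spill_node_t *) demo_head;        /* CORRECTNESS: list head, CAS target   */
static _Atomic uint32_t        demo_len;         /* CORRECTNESS: read for flow control   */
static _Atomic uint64_t        demo_push_calls;  /* TELEMETRY: never read for a decision */

enum { DEMO_TARGET = 2 };  /* hypothetical queue depth limit */

/* Try to queue a node; returns false when the queue is considered full. */
static bool demo_try_spill(spill_node_t *n) {
    /* Telemetry only: removing this line does not change behavior. */
    atomic_fetch_add_explicit(&demo_push_calls, 1, memory_order_relaxed);

    /* Flow control: the counter's value decides whether work is queued. */
    uint32_t qlen = atomic_load_explicit(&demo_len, memory_order_relaxed);
    if (qlen >= DEMO_TARGET) return false;

    /* Lock-free push: retry until the head is swapped in atomically. */
    spill_node_t *old = atomic_load_explicit(&demo_head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &demo_head, &old, n,
                 memory_order_release, memory_order_relaxed));

    atomic_fetch_add_explicit(&demo_len, 1, memory_order_relaxed);
    return true;
}
```

With `DEMO_TARGET` set to 2, the first two pushes succeed and the third is rejected. Under concurrency the check-then-increment makes the limit approximate rather than strict, which is usually acceptable for a depth target.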
**Reference:** `docs/analysis/PHASE28_BG_SPILL_ATOMIC_AUDIT.md`
---
Phase 29: Pool Hotbox v2 Stats Prune - NO-OP (infrastructure ready)
Target: g_pool_hotbox_v2_stats atomics (12 total) in Pool v2
Result: 0.00% impact (code path inactive by default, ENV-gated)
Verdict: NO-OP - Maintain compile-out for future-proofing
Audit Results:
- Classification: 12/12 TELEMETRY (100% observational)
- Counters: alloc_calls, alloc_fast, alloc_refill, alloc_refill_fail,
alloc_fallback_v1, free_calls, free_fast, free_fallback_v1,
page_of_fail_* (4 failure counters)
- Verification: All stats/logging only, zero flow control usage
- Phase 28 lesson applied: Traced all usages, confirmed no CORRECTNESS
Key Finding: Pool v2 OFF by default
- Requires HAKMEM_POOL_V2_ENABLED=1 to activate
- Benchmark never executes Pool v2 code paths
- Compile-out has zero performance impact (code never runs)
Implementation (future-ready):
- Added HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED (default: 0)
- Wrapped 13 atomic write sites in core/hakmem_pool.c
- Pattern: #if HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED ... #endif
- Expected impact if Pool v2 enabled: +0.3~0.8% (HOT+WARM atomics)
A/B Test Results:
- Baseline (COMPILED=0): 52.98 M ops/s (±0.43M, 0.81% stdev)
- Research (COMPILED=1): 53.31 M ops/s (±0.80M, 1.50% stdev)
- Delta: -0.62% (noise, not real effect - code path not active)
Critical Lesson Learned (NEW):
Phase 29 revealed ENV-gated features can appear on hot paths but never
execute. Updated audit checklist:
1. Classify atomics (CORRECTNESS vs TELEMETRY)
2. Verify no flow control usage
3. NEW: Verify code path is ACTIVE in benchmark (check ENV gates)
4. Implement compile-out
5. A/B test
Verification methods added to documentation:
- rg "getenv.*FEATURE" to check ENV gates
- perf record/report to verify execution
- Debug printf for quick validation
Cumulative Progress (Phase 24-29):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (all CORRECTNESS)
- Phase 29 (pool v2): NO-OP (inactive code path)
- Total: 17 atomics removed, +2.74% improvement
Documentation:
- PHASE29_POOL_HOTBOX_V2_AUDIT.md: Complete audit with TELEMETRY classification
- PHASE29_POOL_HOTBOX_V2_STATS_RESULTS.md: Results + new lesson learned
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated with Phase 29 + new checklist
- PHASE29_COMPLETE.md: Completion summary with recommendations
Decision: Keep compile-out despite NO-OP
- Code cleanliness (binary size reduction)
- Future-proofing (ready when Pool v2 enabled)
- Consistency with Phase 24-28 pattern
Generated with Claude Code
https://claude.com/claude-code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
### Phase 29: Pool Hotbox v2 Stats Atomic Audit ✅ **NO-OP (Code Not Active)**
**Date:** 2025-12-16
**Target:** Pool Hotbox v2 stats atomics (`g_pool_hotbox_v2_stats[ci].*`)
**Files:** `core/hakmem_pool.c`, `core/box/pool_hotbox_v2_box.h`
**Atomics:** 12 atomic counters (alloc_calls, free_calls, alloc_fast, free_fast, etc.)
**Build Flag:** `HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED` (default: 0)
**Audit Results:**
- **CORRECTNESS Atomics:** 0/12 (0%)
- **TELEMETRY Atomics:** 12/12 (100%)
- **Verdict:** **NO-OP** (code path not active)
**Analysis:**
All 12 atomics are pure TELEMETRY (destructor dump only, no flow control). However, Pool Hotbox v2 is **disabled by default** via `HAKMEM_POOL_V2_ENABLED` environment variable, so these atomics are **never executed** in the benchmark.
**A/B Test Results (Anomaly Detected):**
- **Baseline (compiled-out):** 52.98 M ops/s (±0.43M)
- **Compiled-in:** 53.31 M ops/s (±0.80M)
- **Improvement:** **-0.62%** (compiled-in is faster)
**Root Cause:** Pool v2 is OFF by default (ENV-gated):
```c
const char* e = getenv("HAKMEM_POOL_V2_ENABLED");
g = (e && *e && *e != '0') ? 1 : 0; // Default: OFF
```
**Result:** Atomics are never incremented → compile-out has **zero runtime effect**.
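The gate expression treats an unset variable, an empty string, and any value starting with `0` as OFF. A minimal sketch of the same rule, split into a testable parser and a lazily cached reader, is shown here. The helper names `env_flag_enabled` and `demo_pool_v2_enabled` are hypothetical, and the caching scheme is an assumption about how such a gate is typically written, not a copy of the HAKMEM code.

```c
#include <stdlib.h>

/* Same rule as the snippet above: NULL, "" and values starting with '0' are OFF. */
static int env_flag_enabled(const char *value) {
    return (value && *value && *value != '0') ? 1 : 0;
}

/* Lazily cached gate; -1 means "not read yet".
 * Not thread-safe as written: concurrent first calls may both call getenv(). */
static int demo_pool_v2_enabled(void) {
    static int cached = -1;
    if (cached == -1) {
        cached = env_flag_enabled(getenv("HAKMEM_POOL_V2_ENABLED"));
    }
    return cached;
}
```

Any code guarded by `if (demo_pool_v2_enabled())` is skipped entirely in a default benchmark run, so atomics inside it cannot affect throughput whether or not they are compiled in.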
**Why the anomaly (compiled-in 0.62% faster)?**
1. High variance (research build: 1.50% stdev vs baseline: 0.81%)
2. Compiler optimization artifact (code layout, instruction cache alignment)
3. Sample size (10 runs) insufficient to distinguish signal from noise
**Conclusion:** Noise, not a real effect.
**Decision:** NEUTRAL - Keep compile-out for:
- Code cleanliness (reduces binary size)
- Future-proofing (ready if Pool v2 is enabled)
- Consistency with Phase 24-28 pattern
**Key Lesson:** Before A/B testing, verify code is ACTIVE:
```bash
rg "getenv.*FEATURE" && echo "⚠️ ENV-gated, may be OFF"
```
**Updated Audit Checklist:**
1. ✅ Classify atomics (CORRECTNESS vs TELEMETRY)
2. ✅ Verify no flow control usage
3. **NEW:** ✅ Verify code path is ACTIVE in benchmark ← **Phase 29 lesson**
4. Implement compile-out
5. A/B test
**Reference:** `docs/analysis/PHASE29_POOL_HOTBOX_V2_STATS_RESULTS.md`
---
### Phase 30: Standard Procedure Documentation ✅ **PROCEDURE COMPLETE**
**Date:** 2025-12-16
**Target:** Standardization of atomic prune methodology (not a performance phase)
**Purpose:** Codify learnings from Phase 24-29 into reusable 4-step procedure
**Deliverables:**
1. `docs/analysis/PHASE30_STANDARD_PROCEDURE.md` - 4-step standardized methodology
2. `docs/analysis/ATOMIC_AUDIT_FULL.txt` - Complete atomic audit (412 atomics)
3. `docs/analysis/PHASE31_RECOMMENDED_CANDIDATES.md` - Phase 31 candidate selection
**4-Step Standard Procedure:**
**Step 0: Execution Verification (NEW - Phase 29 lesson)**
- Check for ENV gates (`getenv()` checks)
- Verify execution counters > 0 in benchmark
- Use perf/flamegraph to confirm code path is hit
- **Decision:** SKIP if ENV-gated or not executed
**Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)**
- Track all atomic usage sites
- Check for `if` conditions (CORRECTNESS)
- Verify pure telemetry usage (TELEMETRY)
- **Decision:** DO NOT TOUCH if CORRECTNESS
**Step 2: Compile-Out Implementation (Phase 24-27 pattern)**
- Add `HAKMEM_*_COMPILED` flag to `hakmem_build_flags.h`
- Wrap atomics with `#if` preprocessor gates
- Build-level compile-out (not link-out)
**Step 3: A/B Test (build-level comparison)**
- Baseline (COMPILED=0): default build
- Compiled-in (COMPILED=1): research build
- Compare 10-run averages
- **Verdict:** GO (> +0.5%), NEUTRAL (within ±0.5%), NO-GO (< -0.5%)
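The aggregation in Step 3 can be sketched as follows. This is an illustrative helper, not the project's benchmark harness: `run_stats_t`, `summarize_runs`, and `mean_delta_percent` are hypothetical names, and the sample variance is reported instead of the standard deviation so the sketch needs no math library (the stdev is its square root).

```c
#include <stddef.h>

typedef struct {
    double mean;      /* arithmetic mean of the runs, in M ops/s */
    double variance;  /* sample variance (n - 1 denominator)     */
} run_stats_t;

/* Summarize n throughput samples from one build. */
static run_stats_t summarize_runs(const double *mops, size_t n) {
    run_stats_t s = {0.0, 0.0};
    if (n == 0) return s;

    for (size_t i = 0; i < n; i++) s.mean += mops[i];
    s.mean /= (double)n;

    if (n > 1) {
        double sum_sq = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = mops[i] - s.mean;
            sum_sq += d * d;
        }
        s.variance = sum_sq / (double)(n - 1);
    }
    return s;
}

/* Gain of the baseline (COMPILED=0) mean over the compiled-in (COMPILED=1) mean.
 * Assumption: expressed relative to the compiled-in mean. */
static double mean_delta_percent(run_stats_t baseline, run_stats_t compiled_in) {
    return (baseline.mean - compiled_in.mean) / compiled_in.mean * 100.0;
}
```

Comparing the delta against the per-build spread is what separates a GO from noise: a delta smaller than the run-to-run variation of either build should be read as NEUTRAL.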
**Audit Results (Phase 30):**
- **Total atomics:** 412 (104 TELEMETRY, 24 CORRECTNESS, 284 UNKNOWN)
- **HOT path:** 16 atomics (5 TELEMETRY, 11 UNKNOWN)
- **WARM path:** 10 atomics (3 TELEMETRY, 7 UNKNOWN)
- **COLD path:** 386 atomics (remaining)
**Phase 31 Candidate Selection:**
- **TOP PRIORITY:** `g_tiny_free_trace` (HOT path, TELEMETRY, execution verified)
- **Expected Impact:** +0.5% to +1.0% (similar to Phase 25)
- **Skipped:** 2 ENV-gated WARM path candidates (Phase 29 lesson applied)
**Key Lesson:** Step 0 (execution verification) prevents wasted effort on ENV-gated or inactive code paths. Phase 29 taught us that optimization without execution = zero impact.
**Reference:** `docs/analysis/PHASE30_STANDARD_PROCEDURE.md`, `docs/analysis/PHASE31_RECOMMENDED_CANDIDATES.md`
---
### Phase 31: Tiny Free Trace Atomic Prune ✅ **NEUTRAL (-0.35%)**
**Date:** 2025-12-16
**Target:** `g_tiny_free_trace` (tiny free trace rate-limit counter)
**File:** `core/hakmem_tiny_free.inc:326`
**Atomics:** 1 global counter (executed on every tiny free)
**Build Flag:** `HAKMEM_TINY_FREE_TRACE_COMPILED` (default: 0)
**Results:**
- **Baseline (compiled-out):** 53.64 M ops/s (mean), 53.80 M ops/s (median)
- **Compiled-in:** 53.83 M ops/s (mean), 53.70 M ops/s (median)
- **Improvement:** **-0.35% (mean), +0.19% (median)**
- **Verdict:** **NEUTRAL** ➡️ Keep compiled-out for cleanliness ✅
**Analysis:** This HOT path atomic (every free call entry) shows no measurable impact (-0.35% mean, +0.19% median, both within the ±0.5% noise margin). Unlike Phase 25 (`g_free_ss_enter`: +1.07%), this trace rate-limit atomic (128 calls) does not show performance overhead. Following the Phase 26 precedent (-0.33% NEUTRAL, adopted for cleanliness), Phase 31 is ADOPTED with COMPILED=0 as the default.
**Path:** HOT (entry point of `hak_tiny_free()`)
**Frequency:** High (every tiny free call, but rate-limited to 128 traces)
**Key Finding:** Not all HOT path atomics have measurable overhead. Rate-limited trace may be optimized by compiler.
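A rate-limited trace of this kind can be sketched as below. This is an assumption about the general shape, not the code at `core/hakmem_tiny_free.inc:326`: the `demo_*` names and `DEMO_FREE_TRACE_COMPILED` are hypothetical stand-ins, and the flag defaults to 1 here only so the traced branch is visible. The point of the sketch is that the atomic increment runs on every call, while the expensive output is capped at 128 lines.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#ifndef DEMO_FREE_TRACE_COMPILED
#define DEMO_FREE_TRACE_COMPILED 1  /* hypothetical flag; 1 here for demonstration */
#endif

#define DEMO_TRACE_LIMIT 128u

static _Atomic uint32_t demo_free_trace;      /* incremented on every call         */
static uint32_t         demo_traces_emitted;  /* demo bookkeeping, single-threaded */

/* Stand-in for a free() entry point; only the trace logic is shown. */
static void demo_tiny_free(void *p) {
#if DEMO_FREE_TRACE_COMPILED
    /* The fetch-add executes unconditionally; only the print is rate-limited. */
    uint32_t n = atomic_fetch_add_explicit(&demo_free_trace, 1,
                                           memory_order_relaxed);
    if (n < DEMO_TRACE_LIMIT) {
        fprintf(stderr, "[free-trace] #%u ptr=%p\n", (unsigned)n, p);
        demo_traces_emitted++;
    }
#else
    (void)p;
#endif
}
```

After 200 calls the counter reads 200 but only 128 trace lines have been written, which is why the site still counts as a HOT path atomic even though its visible output stops early.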
**Reference:** `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`
---
### Phase 32: Tiny Free Calls Atomic Prune ✅ **NEUTRAL (-0.46%)**
**Date:** 2025-12-16
**Target:** `g_hak_tiny_free_calls` (tiny free calls diagnostic counter)
**File:** `core/hakmem_tiny_free.inc:335` (9 lines after Phase 31)
**Atomics:** 1 global counter (executed on every tiny free, unconditional)
**Build Flag:** `HAKMEM_TINY_FREE_CALLS_COMPILED` (default: 0)
**Results:**
- **Baseline (compiled-out):** 52.94 M ops/s (mean), 53.22 M ops/s (median)
- **Compiled-in:** 53.28 M ops/s (mean), 53.46 M ops/s (median)
- **Improvement:** **-0.46% (mean), -0.46% (median)**
- **Verdict:** **NEUTRAL** ➡️ Keep compiled-out for cleanliness ✅
**Analysis:** HOT path atomic (every free call, 9 lines after Phase 31 target) shows no measurable impact (-0.46%, within ±0.5% noise margin). Unexpectedly, the atomic counter compiled-in performed slightly better, suggesting code alignment effects rather than atomic overhead. Following Phase 31 precedent (-0.35% NEUTRAL), Phase 32 is ADOPTED with COMPILED=0 for code cleanliness and consistency.
**Path:** HOT (same function as Phase 31, `hak_tiny_free()`)
**Frequency:** High (every tiny free call, unconditional - no rate limit)
**Key Finding:** Diagnostic counter has negligible performance impact on modern CPUs. NEUTRAL result reinforces Phase 31 pattern: compile-out for code cleanliness, not performance.
**Reference:** `docs/analysis/PHASE32_TINY_FREE_CALLS_ATOMIC_PRUNE_RESULTS.md`
---
## Cumulative Impact
| Phase | Atomics Removed | Frequency | Impact | Status |
|-------|-----------------|-----------|--------|--------|
| 24 | 5 (class stats) | High (every cache op) | **+0.93%** | GO ✅ |
| 25 | 1 (free_ss_enter) | High (every free) | **+1.07%** | GO ✅ |
| 26 | 5 (diagnostics) | Low (edge cases) | -0.33% | NEUTRAL ✅ |
| 27 | 6 (unified cache) | Medium (refills) | **+0.74%** | GO ✅ |
| **28** | **0 (bg spill)** | **N/A (all CORRECTNESS)** | **N/A** | **NO-OP ✅** |
| **29** | **0 (pool v2)** | **N/A (code not active)** | **0.00%** | **NO-OP ✅** |
| **30** | **0 (procedure)** | **N/A (standardization)** | **N/A** | **PROCEDURE ✅** |
| **31** | **1 (free trace)** | **High (every free entry)** | **-0.35%** | **NEUTRAL ✅** |
| **32** | **1 (free calls)** | **High (every free, unconditional)** | **-0.46%** | **NEUTRAL ✅** |
| **Total** | **19 atomics** | **Mixed** | **+2.74%** | **✅** |
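The +2.74% total equals the plain sum of the three GO phases (0.93 + 1.07 + 0.74); NEUTRAL and NO-OP phases contribute nothing. The sketch below, with hypothetical helper names, computes that sum alongside the compounded figure for comparison. Treat either number as a rough indicator: each phase was measured against its own baseline on a different run, so the gains are not strictly additive.

```c
#include <stddef.h>

/* Simple sum of per-phase gains, in percent (matches the table's Total row). */
static double additive_gain_percent(const double *gains, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += gains[i];
    return sum;
}

/* Gains applied multiplicatively: (1 + g1)(1 + g2)... - 1, in percent. */
static double compounded_gain_percent(const double *gains, size_t n) {
    double factor = 1.0;
    for (size_t i = 0; i < n; i++) factor *= 1.0 + gains[i] / 100.0;
    return (factor - 1.0) * 100.0;
}
```

For the GO phases `{0.93, 1.07, 0.74}` the additive result is 2.74 and the compounded result is slightly higher, about 2.76, so the difference between the two conventions is well inside the ±0.5% noise margin.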
**Key Insights:**
1. **Frequency matters more than count:** High-frequency atomics (Phase 24+25) provide measurable benefit (+0.93%, +1.07%). Medium-frequency atomics (Phase 27, WARM path) provide substantial benefit (+0.74%). Low-frequency atomics (Phase 26) provide cleanliness but no performance gain.
2. **Correctness atomics are untouchable:** Phase 28 showed that lock-free queues and flow control counters must not be touched.
3. **ENV-gated code paths need verification:** Phase 29 showed that compile-out of inactive code has zero performance impact. Always verify code is active before A/B testing.
4. **Standardized procedure prevents wasted effort:** Phase 30 codified a 4-step procedure with Step 0 (execution verification) as a mandatory gate to avoid Phase 29-style no-ops.
5. **HOT path ≠ guaranteed performance win:** Phase 31 showed that even HOT path atomics may have zero measurable overhead if rate-limited or well-optimized. NEUTRAL results still justify adoption for code cleanliness (Phase 26/31 precedent).
---
## Lessons Learned
### 1. Frequency Trumps Count (But Not Always)
- **Phase 24:** 5 atomics, high frequency → +0.93% ✅
- **Phase 25:** 1 atomic, high frequency → +1.07% ✅
- **Phase 26:** 5 atomics, low frequency → -0.33% (NEUTRAL)
- **Phase 31:** 1 atomic, high frequency → -0.35% (NEUTRAL)
**Takeaway:** Focus on always-executed atomics, not just atomic count. However, even high-frequency atomics may have zero measurable overhead if optimized (e.g., rate-limited, compiler optimization).
### 2. Edge Cases Don't Matter (Performance-Wise)
- Phase 26 atomics are in error/diagnostic paths (header mismatch, bad class, etc.)
- Rarely executed in benchmarks → no measurable impact
- Still worth compiling out for code cleanliness
### 3. Compile-Time Gates Work Well
- Pattern: `#if HAKMEM_*_COMPILED` (default: 0)
- Clean separation between research (compiled-in) and production (compiled-out)
- Easy to A/B test individual flags
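The gate pattern can be sketched as follows, using a hypothetical flag `DEMO_STATS_COMPILED` and hypothetical `demo_*` names in place of the real `HAKMEM_*_COMPILED` flags. With the default of 0, neither the counter nor the atomic increment exists in the compiled code, so the hot function contains only its real work.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical build flag; in the project this would live in a build-flags header. */
#ifndef DEMO_STATS_COMPILED
#define DEMO_STATS_COMPILED 0   /* default: telemetry compiled out */
#endif

#if DEMO_STATS_COMPILED
static _Atomic uint64_t demo_hits;  /* telemetry counter, research builds only */
#endif

/* Stand-in for a hot-path function; the doubling is placeholder work. */
static int demo_cache_lookup(int key) {
#if DEMO_STATS_COMPILED
    atomic_fetch_add_explicit(&demo_hits, 1, memory_order_relaxed);
#endif
    return key * 2;
}

/* Reads the counter in research builds; always 0 when compiled out. */
static uint64_t demo_hits_snapshot(void) {
#if DEMO_STATS_COMPILED
    return atomic_load_explicit(&demo_hits, memory_order_relaxed);
#else
    return 0;
#endif
}
```

A research build would pass `-DDEMO_STATS_COMPILED=1` on the compiler command line, which is what makes per-flag A/B builds straightforward.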
### 4. Noise Margin: ±0.5%
- Benchmark variance ~1-2%
- Improvements < 0.5% are within noise
- NEUTRAL verdict: keep simpler code (compiled-out)
### 5. Classification is Critical
- **Phase 28:** All atomics were CORRECTNESS (lock-free queue, flow control)
- Must distinguish between:
  - **Telemetry counters:** Observational only, safe to compile-out
  - **Operational counters:** Used for control flow decisions, UNTOUCHABLE
- Example: `g_bg_spill_len` looks like telemetry but controls queue depth limits
### 6. Verify Code is Active (NEW: Phase 29 Lesson)
- **Phase 29:** Pool v2 stats were all TELEMETRY but ENV-gated (default OFF)
- Compile-out had **zero impact** because code never ran
- **Before A/B testing:**
  1. Check for `getenv()` gates → may be OFF by default
  2. Add temporary debug printf to verify code path is hit
  3. Or use `perf record` to check if functions are called
- **Anomaly:** Compiled-in was 0.62% faster (noise due to compiler artifacts, not real effect)
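A temporary probe for the debug-printf check might look like the sketch below. All names are hypothetical, and the plain (non-atomic) counter is deliberate: it is meant for a quick single-threaded sanity run and should be deleted afterwards, not left in a hot path.

```c
#include <stdio.h>
#include <stdlib.h>

/* Temporary hit counter for the code path under investigation. */
static unsigned long demo_probe_hits;

/* Runs at process exit and reports whether the path was ever taken. */
static void demo_probe_report(void) {
    fprintf(stderr, "[probe] candidate path hits: %lu\n", demo_probe_hits);
}

/* Call once at startup; returns 0 on success, like atexit(). */
static int demo_probe_install(void) {
    return atexit(demo_probe_report);
}

/* Place next to the candidate atomic's call site. */
static void demo_probe_hit(void) {
    demo_probe_hits++;
}
```

If the benchmark exits with a hit count of 0, the candidate is inactive in that configuration and an A/B test would only measure noise.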
### 7. Standard Procedure is Reusable (NEW: Phase 30)
- **Phase 30:** Codified 4-step procedure from Phase 24-29 learnings
- **Step 0 (execution verification):** Prevents Phase 29-style wasted effort on ENV-gated code
- **Step 1 (classification):** Prevents Phase 28-style mistakes (CORRECTNESS vs TELEMETRY)
- **Step 2-3 (implementation + A/B test):** Proven pattern from Phase 24-27
- **Result:** Systematic atomic audit (412 atomics), Phase 31 candidate selected with high confidence
### 8. NEUTRAL + Cleanliness = Valid Adoption (Phase 26/31 Pattern)
- **Phase 26:** -0.33% NEUTRAL → Adopted for code cleanliness
- **Phase 31:** -0.35% NEUTRAL → Adopted for code cleanliness (same precedent)
- **Rationale:** No performance regression (within noise), reduces complexity, maintains research flexibility (COMPILED=1 available)
- **Takeaway:** NEUTRAL verdicts justify compile-out even without performance wins
---
## Next Phase Candidates (Phase 31+)
2025-12-16 05:35:11 +09:00
Phase 29: Pool Hotbox v2 Stats Prune - NO-OP (infrastructure ready)
Target: g_pool_hotbox_v2_stats atomics (12 total) in Pool v2
Result: 0.00% impact (code path inactive by default, ENV-gated)
Verdict: NO-OP - Maintain compile-out for future-proofing
Audit Results:
- Classification: 12/12 TELEMETRY (100% observational)
- Counters: alloc_calls, alloc_fast, alloc_refill, alloc_refill_fail,
alloc_fallback_v1, free_calls, free_fast, free_fallback_v1,
page_of_fail_* (4 failure counters)
- Verification: All stats/logging only, zero flow control usage
- Phase 28 lesson applied: Traced all usages, confirmed no CORRECTNESS
Key Finding: Pool v2 OFF by default
- Requires HAKMEM_POOL_V2_ENABLED=1 to activate
- Benchmark never executes Pool v2 code paths
- Compile-out has zero performance impact (code never runs)
Implementation (future-ready):
- Added HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED (default: 0)
- Wrapped 13 atomic write sites in core/hakmem_pool.c
- Pattern: #if HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED ... #endif
- Expected impact if Pool v2 enabled: +0.3~0.8% (HOT+WARM atomics)
A/B Test Results:
- Baseline (COMPILED=0): 52.98 M ops/s (±0.43M, 0.81% stdev)
- Research (COMPILED=1): 53.31 M ops/s (±0.80M, 1.50% stdev)
- Delta: -0.62% (noise, not real effect - code path not active)
Critical Lesson Learned (NEW):
Phase 29 revealed ENV-gated features can appear on hot paths but never
execute. Updated audit checklist:
1. Classify atomics (CORRECTNESS vs TELEMETRY)
2. Verify no flow control usage
3. NEW: Verify code path is ACTIVE in benchmark (check ENV gates)
4. Implement compile-out
5. A/B test
Verification methods added to documentation:
- rg "getenv.*FEATURE" to check ENV gates
- perf record/report to verify execution
- Debug printf for quick validation
Cumulative Progress (Phase 24-29):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (all CORRECTNESS)
- Phase 29 (pool v2): NO-OP (inactive code path)
- Total: 17 atomics removed, +2.74% improvement
Documentation:
- PHASE29_POOL_HOTBOX_V2_AUDIT.md: Complete audit with TELEMETRY classification
- PHASE29_POOL_HOTBOX_V2_STATS_RESULTS.md: Results + new lesson learned
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated with Phase 29 + new checklist
- PHASE29_COMPLETE.md: Completion summary with recommendations
Decision: Keep compile-out despite NO-OP
- Code cleanliness (binary size reduction)
- Future-proofing (ready when Pool v2 enabled)
- Consistency with Phase 24-28 pattern
Generated with Claude Code
https://claude.com/claude-code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
### Completed Audits
1. ~~**Background Spill Queue** (Phase 28)~~ ✅ **COMPLETE (NO-OP)**
- **Result:** All CORRECTNESS atomics, no compile-out candidates
- **Reason:** Lock-free queue + flow control counter
2. ~~**Pool Hotbox v2 Stats** (Phase 29)~~ ✅ **COMPLETE (NO-OP)**
- **Result:** All TELEMETRY atomics, but code path not active (ENV-gated)
- **Reason:** `HAKMEM_POOL_V2_ENABLED` defaults to OFF
3. ~~**Standard Procedure Documentation** (Phase 30)~~ ✅ **COMPLETE (PROCEDURE)**
- **Result:** 4-step procedure standardized, atomic audit complete (412 atomics)
- **Reason:** Methodology standardization, not a performance phase
4. ~~**Tiny Free Trace Atomic** (Phase 31)~~ ✅ **COMPLETE (NEUTRAL -0.35%)**
- **Result:** NEUTRAL verdict, adopted for code cleanliness
- **Reason:** HOT path atomic with zero measurable overhead (rate-limited trace)
5. ~~**Tiny Free Calls Counter** (Phase 32)~~ ✅ **COMPLETE (NEUTRAL -0.46%)**
- **Result:** NEUTRAL verdict, adopted for code cleanliness
- **Reason:** HOT path diagnostic counter with negligible overhead (code alignment effects)
### High Priority: Phase 33 Target (NEXT)
6. **Tiny Debug Ring Record** (Phase 33 - TOP PRIORITY) ⭐
- **Target:** `tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, ...)` (HOT path)
- **File:** `core/hakmem_tiny_free.inc:340` (3 lines after Phase 32 target)
- **Classification:** TELEMETRY (debug ring buffer, event logging)
- **Execution:** ⚠️ **REQUIRES STEP 0 VERIFICATION** (Phase 30 lesson)
- **Verification Required:**
```bash
# Check if debug ring is ENV-gated or always-on
rg "getenv.*DEBUG_RING" core/
rg "HAKMEM.*DEBUG.*RING" core/
```
- **Expected Gain:** +0.3% to +1.0% (if always-on, similar to Phase 25/31/32)
- **Priority:** **HIGHEST** (same HOT path as Phase 31+32, same function)
- **Warning:** Only proceed if debug ring is **always-on by default** (not ENV-gated)
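The Step 0 question for this candidate (always-on versus ENV-gated) can be illustrated with a minimal sketch. Everything below is hypothetical: `HAKMEM_EXAMPLE_DEBUG_RING`, `example_debug_ring_enabled()`, and `example_ring_record()` are illustration names only, and the real gate around `tiny_debug_ring_record`, if one exists, may look different. The point is that when a recording function returns early because its environment variable is unset, the code after the check never runs in a clean-environment benchmark, so compiling it out cannot move throughput (the Phase 29 outcome).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cached ENV gate: -1 = not yet read, 0 = off, 1 = on.
 * Lazy init here is not thread-safe; it is kept simple for illustration. */
static int g_example_ring_enabled = -1;

/* Hypothetical event counter standing in for the ring-buffer write. */
static unsigned long g_example_ring_events;

static int example_debug_ring_enabled(void) {
    if (g_example_ring_enabled < 0) {
        /* Assumed variable name; not taken from the HAKMEM sources. */
        const char *v = getenv("HAKMEM_EXAMPLE_DEBUG_RING");
        g_example_ring_enabled = (v != NULL && strcmp(v, "1") == 0) ? 1 : 0;
    }
    return g_example_ring_enabled;
}

static void example_ring_record(void) {
    /* ENV-gated: with the variable unset, nothing below this line executes. */
    if (!example_debug_ring_enabled()) {
        return;
    }
    g_example_ring_events++;
}
```

If the real function has no such early return (always-on), the body runs on every free and an A/B test is meaningful; if it does, the candidate belongs in the SKIP group.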
### Medium Priority: Uncertain Candidates
7. **P0 Class OOB Log** (Phase 34 candidate)
- **Target:** `g_p0_class_oob_log` (WARM path)
- **File:** `core/hakmem_tiny_refill_p0.inc.h:41`
- **Classification:** TELEMETRY (error logging)
- **Execution:** ❓ UNCERTAIN (error path, needs verification)
- **Expected Gain:** ±0.0% to +0.2%
- **Priority:** MEDIUM (verify execution first)
8. **Remote Target Queue** (Phase 34 candidate)
- **Targets:** `g_remote_target_len[class_idx]` atomics
- **File:** `core/hakmem_tiny_remote_target.c`
- **Atomics:** `atomic_fetch_add/sub` on queue length
- **Frequency:** Warm (remote free path)
- **Expected Gain:** +0.1% to +0.3% (if telemetry)
- **Priority:** MEDIUM (needs correctness review, similar to bg_spill)
- **Warning:** May be flow control like `g_bg_spill_len`; needs audit
### Low Priority: ENV-gated (SKIP)
9. ~~**Warm Pool Prefill Logs** (SKIP - ENV-gated)~~
- **Targets:** `rel_logs`, `dbg_logs` (WARM path)
- **Files:** `core/box/warm_pool_prefill_box.h`, `core/hakmem_tiny_refill.inc.h`
- **Classification:** TELEMETRY (fprintf only)
- **Execution:** ❌ ENV-gated (HAKMEM_TINY_WARM_LOG=OFF by default)
- **Expected Gain:** 0.0% (NO-OP, Phase 29 lesson)
- **Priority:** SKIP (not executed in benchmark)
### Low Priority: Cold Path Atomics
10. **SuperSlab OS Stats** (Phase 35+)
- **Targets:** `g_ss_os_alloc_calls`, `g_ss_os_madvise_calls`, etc.
- **Files:** `core/box/ss_os_acquire_box.h`, `core/box/madvise_guard_box.c`
- **Frequency:** Cold (init/mmap/madvise)
- **Expected Gain:** <0.1%
- **Priority:** LOW (code cleanliness only)
---
## Pattern Template (For Future Phases)
### Step 1: Add Build Flag
```c
// core/hakmem_build_flags.h
#ifndef HAKMEM_[NAME]_COMPILED
# define HAKMEM_[NAME]_COMPILED 0
#endif
```
### Step 2: Wrap Atomic
```c
// core/[file].c
#if HAKMEM_[NAME]_COMPILED
atomic_fetch_add_explicit(&g_[name], 1, memory_order_relaxed);
#else
(void)0; // No-op when compiled out
#endif
```
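As a concrete instance of the template above, the following is a minimal, self-contained sketch. The flag `HAKMEM_EXAMPLE_STATS_COMPILED`, the counter `g_example_free_calls`, and the macro `EXAMPLE_STAT_INC` are hypothetical names chosen for illustration; they do not come from the HAKMEM sources. Wrapping the gate in a macro is one possible design choice, not necessarily the one used in the project: it keeps the `#if` in a single place instead of at every call site.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical compile gate, default 0 (telemetry compiled out). */
#ifndef HAKMEM_EXAMPLE_STATS_COMPILED
#  define HAKMEM_EXAMPLE_STATS_COMPILED 0
#endif

/* Hypothetical telemetry-only counter: never read for flow control. */
static _Atomic uint64_t g_example_free_calls;

#if HAKMEM_EXAMPLE_STATS_COMPILED
#  define EXAMPLE_STAT_INC(counter) \
      ((void)atomic_fetch_add_explicit(&(counter), 1, memory_order_relaxed))
#else
#  define EXAMPLE_STAT_INC(counter) ((void)0) /* expands to nothing at runtime */
#endif

/* Stand-in for a hot free path; the real free logic is omitted. */
static inline void example_free_hot_path(void *ptr) {
    EXAMPLE_STAT_INC(g_example_free_calls);
    (void)ptr;
}

/* Reader used by reporting code; stays valid whether or not the gate is on. */
static inline uint64_t example_free_calls_read(void) {
    return atomic_load_explicit(&g_example_free_calls, memory_order_relaxed);
}
```

Building with `-DHAKMEM_EXAMPLE_STATS_COMPILED=1` turns the counter back on for research builds; with the default, the counter stays at zero and the hot path contains no atomic instruction.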
### Step 3: A/B Test
```bash
# Baseline (compiled-out, default)
make clean && make -j bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > baseline.txt
# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_COMPILED=1' bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > compiled_in.txt
```
### Step 4: Analyze & Verdict
```python
# baseline_avg and compiled_in_avg are the mean throughputs of the two Step 3 runs
improvement = ((baseline_avg - compiled_in_avg) / compiled_in_avg) * 100
if improvement >= 0.5:
    verdict = "GO (keep compiled-out)"
elif improvement <= -0.5:
    verdict = "NO-GO (revert, compiled-in is better)"
else:
    verdict = "NEUTRAL (keep compiled-out for cleanliness)"
```
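The same verdict rule can be sketched in C, including the averaging step over repeated runs. This assumes the per-run throughput values have already been parsed into an array; the output format of `run_mixed_10_cleanenv.sh` is not specified here, so parsing is left out. All function and type names (`example_mean`, `example_improvement_pct`, `example_classify`, `example_verdict`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { VERDICT_GO, VERDICT_NEUTRAL, VERDICT_NO_GO } example_verdict;

/* Arithmetic mean of n throughput samples (e.g. M ops/s per run). */
static double example_mean(const double *ops, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += ops[i];
    }
    return sum / (double)n;
}

/* Percent gain of the compiled-out baseline over the compiled-in build. */
static double example_improvement_pct(double baseline_avg, double compiled_in_avg) {
    assert(compiled_in_avg > 0.0);
    return (baseline_avg - compiled_in_avg) / compiled_in_avg * 100.0;
}

/* Thresholds follow the Step 4 rule: >= +0.5 GO, <= -0.5 NO-GO, else NEUTRAL. */
static example_verdict example_classify(double improvement_pct) {
    if (improvement_pct >= 0.5) {
        return VERDICT_GO;
    }
    if (improvement_pct <= -0.5) {
        return VERDICT_NO_GO;
    }
    return VERDICT_NEUTRAL;
}
```

With illustrative inputs, a baseline mean of 100.0 against a compiled-in mean of 99.0 classifies as GO, while 100.0 against 100.2 classifies as NEUTRAL. The rule does not account for run-to-run variance, so a delta near the threshold should be read alongside the measured standard deviation.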
### Step 5: Document
Create `docs/analysis/PHASE[N]_[NAME]_RESULTS.md` with:
- Implementation details
- A/B test results
- Verdict & reasoning
- Files modified
---
## Build Flag Summary
All atomic compile gates in `core/hakmem_build_flags.h`:
```c
// Phase 24: Tiny Class Stats (GO +0.93%)
#ifndef HAKMEM_TINY_CLASS_STATS_COMPILED
# define HAKMEM_TINY_CLASS_STATS_COMPILED 0
#endif
// Phase 25: Tiny Free Stats (GO +1.07%)
#ifndef HAKMEM_TINY_FREE_STATS_COMPILED
# define HAKMEM_TINY_FREE_STATS_COMPILED 0
#endif
// Phase 27: Unified Cache Stats (GO +0.74%)
#ifndef HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED
# define HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED 0
#endif
// Phase 26A: C7 Free Count (NEUTRAL -0.33%)
#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
# define HAKMEM_C7_FREE_COUNT_COMPILED 0
#endif
// Phase 26B: Header Mismatch Log (NEUTRAL)
#ifndef HAKMEM_HDR_MISMATCH_LOG_COMPILED
# define HAKMEM_HDR_MISMATCH_LOG_COMPILED 0
#endif
// Phase 26C: Header Meta Mismatch (NEUTRAL)
#ifndef HAKMEM_HDR_META_MISMATCH_COMPILED
# define HAKMEM_HDR_META_MISMATCH_COMPILED 0
#endif
// Phase 26D: Metric Bad Class (NEUTRAL)
#ifndef HAKMEM_METRIC_BAD_CLASS_COMPILED
# define HAKMEM_METRIC_BAD_CLASS_COMPILED 0
#endif
// Phase 26E: Header Meta Fast (NEUTRAL)
#ifndef HAKMEM_HDR_META_FAST_COMPILED
# define HAKMEM_HDR_META_FAST_COMPILED 0
#endif
// Phase 29: Pool Hotbox v2 Stats (NO-OP - code not active)
#ifndef HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED
# define HAKMEM_POOL_HOTBOX_V2_STATS_COMPILED 0
#endif
// Phase 31: Tiny Free Trace (NEUTRAL -0.35%)
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
# define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif
// Phase 32: Tiny Free Calls (NEUTRAL -0.46%)
#ifndef HAKMEM_TINY_FREE_CALLS_COMPILED
# define HAKMEM_TINY_FREE_CALLS_COMPILED 0
#endif
```
**Default State:** All flags = 0 (compiled-out, production-ready)
**Research Use:** Set flag = 1 to enable specific telemetry atomic
---
## Conclusion
**Total Progress (Phase 24+25+26+27+28+29+30+31+32):**
- **Performance Gain:** +2.74% (Phase 24: +0.93%, Phase 25: +1.07%, Phase 26: NEUTRAL, Phase 27: +0.74%, Phase 28: NO-OP, Phase 29: NO-OP, Phase 30: PROCEDURE, Phase 31: NEUTRAL, Phase 32: NEUTRAL)
- **Atomics Removed:** 19 telemetry atomics from hot/warm paths (17 compiled-out + 1 Phase 31 + 1 Phase 32)
- **Phases Completed:** 9 phases (4 with performance changes, 2 audit-only, 1 standardization, 2 cleanliness)
- **Code Quality:** Cleaner hot/warm paths, closer to mimalloc's zero-overhead principle
- **Methodology:** 4-step standard procedure validated (Phase 30-31-32)
- **Next Target:** Phase 33 (`tiny_debug_ring_record`, HOT path, **REQUIRES STEP 0 VERIFICATION**)
**Key Success Factors:**
1. Systematic audit and classification (CORRECTNESS vs TELEMETRY)
2. Consistent A/B testing methodology
3. Clear verdict criteria (GO/NEUTRAL/NO-GO)
4. Focus on high-frequency atomics for performance
5. Compile-out low-frequency atomics for cleanliness
6. **NEW:** Step 0 execution verification (Phase 30 standard procedure)
**Future Work:**
- **Immediate:** Phase 33 (`tiny_debug_ring_record`, HOT path, same location as Phase 31+32)
- **CRITICAL:** Phase 33 requires Step 0 verification (ENV gate check) before proceeding
- Cumulative gain is expected to remain at +2.74%: Phase 31+32 were NEUTRAL, so they add no further performance gain
- Follow Phase 30 standard procedure for all future candidates
- Focus on execution-verified, high-frequency paths
- Document all verdicts for reproducibility
- Accept NEUTRAL verdicts for code cleanliness (Phase 26/31/32 pattern)
**Lessons from Phase 28+29+30+31+32:**
- Not all atomic counters are telemetry (Phase 28: flow control counters are CORRECTNESS)
- Flow control counters (e.g., `g_bg_spill_len`) are UNTOUCHABLE
- Always trace how a counter is used before classifying it
- Verify code path is ACTIVE before A/B testing (Phase 29: ENV-gated code has zero impact)
- Standard procedure prevents repeated mistakes (Phase 30: Step 0 gate prevents Phase 29-style no-ops)
- Not all HOT path atomics have measurable overhead (Phase 31: -0.35% NEUTRAL, Phase 32: -0.46% NEUTRAL)
- NEUTRAL verdicts justify adoption for code cleanliness (Phase 26/31/32 precedent)
- **Code alignment matters:** Phase 32 showed compiled-in was faster (code layout effects, not atomic overhead)
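The Phase 28 lesson (a counter that feeds a branch is CORRECTNESS, not TELEMETRY) can be made concrete with a small sketch. The names `g_example_spill_len`, `g_example_spill_calls`, `EXAMPLE_SPILL_MAX`, and `example_try_enqueue()` are hypothetical and are not the actual background-spill implementation; they only show why tracing every use of a counter matters before compiling it out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_SPILL_MAX 4u /* hypothetical queue capacity */

/* CORRECTNESS: its value decides whether an enqueue is admitted. */
static _Atomic uint32_t g_example_spill_len;

/* TELEMETRY: only ever incremented and reported, never branched on. */
static _Atomic uint64_t g_example_spill_calls;

static bool example_try_enqueue(void) {
    /* Safe to compile out: removing this line changes no behavior. */
    atomic_fetch_add_explicit(&g_example_spill_calls, 1, memory_order_relaxed);

    /* NOT safe to compile out: the previous length gates admission. */
    uint32_t prev = atomic_fetch_add_explicit(&g_example_spill_len, 1,
                                              memory_order_relaxed);
    if (prev >= EXAMPLE_SPILL_MAX) {
        /* Queue full: undo the reservation and reject. */
        atomic_fetch_sub_explicit(&g_example_spill_len, 1, memory_order_relaxed);
        return false;
    }
    return true;
}
```

Both counters use the same `atomic_fetch_add` call shape, which is why a grep-based audit alone cannot tell them apart; only the read side (the `if` on `prev`) reveals the difference.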
---
**Last Updated:** 2025-12-16
**Status:** Phase 24-27+31+32 Complete (+2.74%), Phase 28-29 NO-OP, Phase 30 Procedure Complete
**Next Phase:** Phase 33 (`tiny_debug_ring_record`, HOT path, **REQUIRES STEP 0 VERIFICATION**)
**Maintained By:** Claude Sonnet 4.5