# Phase 31: Tiny Free Trace Atomic Prune - Results
**Date:** 2025-12-16
**Type:** HOT path TELEMETRY atomic prune
**Target:** `g_tiny_free_trace` atomic in `core/hakmem_tiny_free.inc:326`
**Verdict:** NEUTRAL (code cleanliness adopted)
---
## Executive Summary
Phase 31 targeted the `g_tiny_free_trace` atomic in the HOT path (`hak_tiny_free()` entry point). A/B testing showed **NEUTRAL performance** (-0.35% mean, +0.19% median), well within noise range (±0.5%). Following Phase 26 precedent (5 atomics, -0.33%, adopted for code cleanliness), **Phase 31 is ADOPTED** with COMPILED=0 as default to reduce HOT path complexity.
---
## Background
### Phase 30 Selection Process
From 412 total atomics audited:
- **HOT path candidates:** 16 total
  - 5 TELEMETRY (4 already compiled-out in Phases 24-27)
  - 11 UNKNOWN (require manual review)
**Phase 31 candidate selected:** `g_tiny_free_trace` (HOT path, TELEMETRY, TOP PRIORITY)
**Step 0 verification (MANDATORY):**
- No ENV gate → always active
- Located in `hak_tiny_free()` → executes on EVERY tiny free call
- Mixed benchmark heavily exercises free path → high execution count
- **Execution confirmed:** First instruction in HOT path function
### Target Profile
**Location:** `core/hakmem_tiny_free.inc:326`
**Original Code:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}
```
**Classification:**
- **Class:** TELEMETRY (trace rate-limit only)
- **Path:** HOT (every tiny free operation)
- **Flow Control:** None (only affects `HAK_TRACE` macro output)
- **Correctness Impact:** None
**Similar precedent:** Phase 25 (`g_free_ss_enter`: +1.07% GO)
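The rate-limit pattern in the target code can be reproduced in isolation. The sketch below is a minimal stand-in, not the project's code: `demo_free_enter`, `g_demo_counter`, `demo_counter_value`, and `DEMO_TRACE` are hypothetical names, and `DEMO_TRACE` counts emissions instead of printing so the behaviour is easy to check. It illustrates that the relaxed fetch-add runs on every call, while the trace body runs only for the first 128 calls.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for HAK_TRACE: counts emissions instead of printing. */
static int g_demo_trace_emitted = 0;
#define DEMO_TRACE(msg) do { (void)(msg); g_demo_trace_emitted++; } while (0)

/* Same shape as g_tiny_free_trace: a relaxed counter used only as a rate limit. */
static _Atomic int g_demo_counter = 0;

void demo_free_enter(void) {
    /* fetch_add returns the value before the increment, so calls 0..127 trace. */
    if (atomic_fetch_add_explicit(&g_demo_counter, 1, memory_order_relaxed) < 128) {
        DEMO_TRACE("[demo_free_enter]\n");
    }
}

/* Helper for inspection: current value of the rate-limit counter. */
int demo_counter_value(void) {
    return atomic_load_explicit(&g_demo_counter, memory_order_relaxed);
}
```

Calling `demo_free_enter()` 1000 times leaves the counter at 1000 but emits only 128 traces. Compiling the block out therefore removes one atomic read-modify-write per free, not just the trace output.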
---
## Implementation (4-Step Standard Procedure)
### Step 0: Execution Verification (Phase 29 lesson)
**ENV gate check:**
```bash
$ rg "getenv.*TRACE" core/ --type c
# (No results - no ENV gate blocking execution)
```
**Caveat:** ripgrep's built-in `c` file type covers `*.c` and `*.h` but, by default, not `*.inc`, so this command does not search `core/hakmem_tiny_free.inc` itself. Rerunning without `--type c` would close that gap.
**Execution check:**
- Located at entry of `hak_tiny_free()` (line 326)
- Executes on EVERY tiny free call (no conditional bypass)
- Mixed benchmark: ~10M+ free operations per run
- **Verification:** PASSED (always active)
### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)
**Full usage audit:**
```bash
$ rg -n "g_tiny_free_trace" core/
core/hakmem_tiny_free.inc:326: static _Atomic int g_tiny_free_trace = 0;
core/hakmem_tiny_free.inc:327: if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
```
**Analysis:**
- Only 2 references: the declaration and the atomic increment
- The counter value feeds a single `if`, which only gates the `HAK_TRACE` call; no allocator logic branches on it
- Only affects `HAK_TRACE` printf output (debug macro)
- **Classification:** Pure TELEMETRY ✅
### Step 2: Compile-Out Implementation
**File 1:** `core/hakmem_build_flags.h`
**Added:**
```c
// ------------------------------------------------------------
// Phase 31: Tiny Free Trace Atomic Prune (Compile-out trace atomic)
// ------------------------------------------------------------
// Tiny Free Trace: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need free path trace diagnostics
// Target: g_tiny_free_trace atomic in core/hakmem_tiny_free.inc:326
// Impact: HOT path atomic (every free operation)
// Expected improvement: +0.5% to +1.0% (similar to Phase 25: +1.07%)
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
# define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif
```
**File 2:** `core/hakmem_tiny_free.inc:326`
**Before:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}
```
**After:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
#else
    (void)0;  // No-op when trace compiled out
#endif
    // ... rest of function ...
}
```
**Include verification:**
- `hakmem_build_flags.h` included transitively via `tiny_front_config_box.h`
- No explicit include needed
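One way to keep the call site free of `#if` blocks is to hide the gate behind a statement macro. This is an alternative sketch, not what the Phase 31 patch does: `TINY_FREE_TRACE_ENTER`, `g_demo_trace_gate`, `demo_trace_hits`, and `demo_tiny_free` are hypothetical names, and the trace body is replaced by a plain counter. The `#error` guard is also an assumption; it rejects flag values other than 0 or 1.

```c
#include <stdatomic.h>

#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
#  define HAKMEM_TINY_FREE_TRACE_COMPILED 0   /* default: compiled out */
#endif

#if HAKMEM_TINY_FREE_TRACE_COMPILED != 0 && HAKMEM_TINY_FREE_TRACE_COMPILED != 1
#  error "HAKMEM_TINY_FREE_TRACE_COMPILED must be 0 or 1"
#endif

/* Hypothetical sink standing in for HAK_TRACE output. */
static int demo_trace_hits = 0;

#if HAKMEM_TINY_FREE_TRACE_COMPILED
static _Atomic int g_demo_trace_gate = 0;
#  define TINY_FREE_TRACE_ENTER()                                        \
    do {                                                                 \
        if (atomic_fetch_add_explicit(&g_demo_trace_gate, 1,             \
                                      memory_order_relaxed) < 128) {     \
            demo_trace_hits++;                                           \
        }                                                                \
    } while (0)
#else
#  define TINY_FREE_TRACE_ENTER() ((void)0)   /* no atomic, no branch */
#endif

void demo_tiny_free(void *ptr) {
    TINY_FREE_TRACE_ENTER();
    (void)ptr;  /* ... rest of the free path would go here ... */
}
```

Built with the default flag value, `demo_tiny_free()` never touches the counter; built with `-DHAKMEM_TINY_FREE_TRACE_COMPILED=1`, the first 128 calls record a hit.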
### Step 3: A/B Test (Build-Level Comparison)
**Baseline (COMPILED=0, default - trace compiled-out):**
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```
**Compiled-in (COMPILED=1, research - trace active):**
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_TRACE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```
---
## A/B Test Results
### Raw Data (10-run clean environment)
**Baseline (COMPILED=0, trace compiled-out):**
```
Run 1: 53432447 ops/s
Run 2: 53846666 ops/s
Run 3: 53256003 ops/s
Run 4: 54007573 ops/s
Run 5: 54132468 ops/s
Run 6: 53937278 ops/s
Run 7: 53752216 ops/s
Run 8: 53106138 ops/s
Run 9: 53861749 ops/s
Run 10: 53052398 ops/s
```
**Compiled-in (COMPILED=1, trace active):**
```
Run 1: 53667388 ops/s
Run 2: 53623799 ops/s
Run 3: 54099595 ops/s
Run 4: 53993106 ops/s
Run 5: 53530214 ops/s
Run 6: 54275707 ops/s
Run 7: 53726604 ops/s
Run 8: 53607801 ops/s
Run 9: 54122912 ops/s
Run 10: 53630312 ops/s
```
### Statistical Analysis
| Metric | Baseline (COMPILED=0) | Compiled-in (COMPILED=1) | Difference |
|--------|----------------------|-------------------------|------------|
| **Mean** | 53,638,493.60 ops/s | 53,827,743.80 ops/s | **-0.35%** |
| **Median** | 53,799,441.00 ops/s | 53,696,996.00 ops/s | **+0.19%** |
| **Stdev** | 393,174.93 (0.73%) | 267,178.23 (0.50%) | - |
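**Difference interpretation** (computed as (Baseline − Compiled-in) / Compiled-in, so a positive value means the compile-out build is faster):
- **Mean:** Baseline is 0.35% slower (within noise)
- **Median:** Baseline is 0.19% faster (within noise)
- **Verdict range:** Both within the ±0.5% NEUTRAL threshold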
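The mean, median, and relative differences in the table can be recomputed from the raw runs. The following is a small sketch for checking the arithmetic, not part of the benchmark scripts; the helper names `mean_of`, `median_of`, and `rel_diff_pct` are hypothetical, and the fixed-size scratch buffer assumes at most 10 runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N_RUNS 10

/* Raw ops/s copied from the two 10-run listings. */
static const double baseline_ops[N_RUNS] = {
    53432447, 53846666, 53256003, 54007573, 54132468,
    53937278, 53752216, 53106138, 53861749, 53052398,
};
static const double compiled_in_ops[N_RUNS] = {
    53667388, 53623799, 54099595, 53993106, 53530214,
    54275707, 53726604, 53607801, 54122912, 53630312,
};

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

double mean_of(const double *v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

double median_of(const double *v, size_t n) {
    double tmp[N_RUNS];
    assert(n > 0 && n <= N_RUNS);
    memcpy(tmp, v, n * sizeof tmp[0]);
    qsort(tmp, n, sizeof tmp[0], cmp_double);  /* sort a copy, keep input intact */
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

/* Relative difference of a versus b, in percent: 100 * (a - b) / b. */
double rel_diff_pct(double a, double b) {
    return 100.0 * (a - b) / b;
}
```

Passing the baseline statistic as `a` and the compiled-in statistic as `b` to `rel_diff_pct` gives the sign convention used in the table.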
---
## Verdict
### Performance: NEUTRAL
**Criteria:**
- GO: +0.5% or more (compile-out wins)
- NEUTRAL: ±0.5% (no significant difference)
- NO-GO: -0.5% or worse (compile-out loses)
**Result:** NEUTRAL (-0.35% mean, +0.19% median)
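The thresholds above map directly onto a small classifier. This is a sketch under stated assumptions: `verdict_t`, `classify_delta`, and `classify_ab` are hypothetical names, the boundaries are taken as inclusive for GO and NO-GO ("+0.5% or more", "-0.5% or worse"), and the rule that disagreeing mean and median verdicts fall back to NEUTRAL is an assumption, not something the procedure document is quoted as stating.

```c
typedef enum {
    VERDICT_NO_GO   = -1,
    VERDICT_NEUTRAL =  0,
    VERDICT_GO      =  1
} verdict_t;

/* delta_pct: compile-out build relative to compiled-in build, in percent. */
verdict_t classify_delta(double delta_pct) {
    if (delta_pct >= 0.5)  return VERDICT_GO;     /* compile-out wins */
    if (delta_pct <= -0.5) return VERDICT_NO_GO;  /* compile-out loses */
    return VERDICT_NEUTRAL;                       /* inside the noise band */
}

/* Assumption: when mean and median disagree, treat the result as NEUTRAL. */
verdict_t classify_ab(double mean_pct, double median_pct) {
    verdict_t m = classify_delta(mean_pct);
    verdict_t d = classify_delta(median_pct);
    return (m == d) ? m : VERDICT_NEUTRAL;
}
```

With the Phase 31 numbers, `classify_ab(-0.35, 0.19)` yields `VERDICT_NEUTRAL`, while Phase 25's +1.07% would classify as `VERDICT_GO`.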
**Analysis:**
- Mean shows slight regression (-0.35%), median shows slight improvement (+0.19%)
- Conflicting signals suggest **measurement noise** rather than true effect
- The mean gap (~189K ops/s) is smaller than one standard deviation of either build (393K and 267K ops/s), so at n=10 the difference is not distinguishable from noise
- Similar to Phase 26 pattern (-0.33%, 5 atomics, NEUTRAL)
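The significance claim can be made slightly more concrete with a Welch-style t statistic computed from the summary table. This is an added illustration, not part of the original procedure: `welch_t_squared` is a hypothetical helper, and it returns the squared statistic so that no square root (and no math library) is needed.

```c
#include <assert.h>

/* Squared Welch t statistic from summary statistics of two samples. */
double welch_t_squared(double mean_a, double sd_a, int n_a,
                       double mean_b, double sd_b, int n_b) {
    assert(n_a > 1 && n_b > 1);
    double diff = mean_a - mean_b;
    /* Squared standard error of the difference of means. */
    double se2 = sd_a * sd_a / n_a + sd_b * sd_b / n_b;
    return diff * diff / se2;
}
```

Fed with the table's means and sample standard deviations (n=10 each), the result is roughly t² ≈ 1.6, that is |t| ≈ 1.3. That is below the approximate two-sided 5% critical value of about 2.1 for roughly 16 degrees of freedom, which is consistent with reading the result as noise.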
### Decision: ADOPTED (COMPILED=0 default)
**Rationale (following Phase 26 precedent):**
1. **Code Cleanliness:**
   - Removes unused TELEMETRY atomic from HOT path
   - Reduces complexity at `hak_tiny_free()` entry point
   - No correctness impact (pure trace macro)
2. **Consistency:**
   - Phase 26 precedent: -0.33% NEUTRAL result adopted for cleanliness
   - Phase 31: -0.35% NEUTRAL result follows same logic
   - Maintains atomic prune momentum (Phases 24-31)
3. **Research Flexibility:**
   - `COMPILED=1` still available for trace diagnostics
   - No functionality lost, only default changed
   - Easy revert if needed (`make EXTRA_CFLAGS=-DHAKMEM_TINY_FREE_TRACE_COMPILED=1`)
4. **Why Not NO-GO?**
   - Median +0.19% (slight win, not loss)
   - Mean -0.35% within noise range (±0.5% threshold)
   - Phase 26 set precedent: NEUTRAL + cleanliness = ADOPT
---
## Comparison: Phase 25 vs Phase 31
**Phase 25:** `g_free_ss_enter` (free stats atomic)
- **Location:** `tiny_superslab_free.inc.h:25` (entry point)
- **Result:** +1.07% (GO)
- **Path:** Same HOT path (free entry)
- **Similarity:** Both trace/stats atomics at free entry
**Phase 31:** `g_tiny_free_trace` (trace rate-limit atomic)
- **Location:** `hakmem_tiny_free.inc:326` (entry point)
- **Result:** -0.35% mean, +0.19% median (NEUTRAL)
- **Path:** Same HOT path (free entry)
- **Difference:** Rate-limited (128 calls) vs always-increment
**Why different results?**
1. **Execution frequency:**
   - Phase 25: EVERY free call increments stats
   - Phase 31: EVERY free call increments the counter, but the trace fires only for the first 128 calls
   - **Hypothesis:** Phase 25's always-active stats had higher overhead
2. **Atomic placement:**
   - Phase 25: Inside `hak_tiny_free_superslab()` (deeper in call stack)
   - Phase 31: First instruction in `hak_tiny_free()` (entry point)
   - **Hypothesis:** The entry-point atomic may be cheaper in practice (unverified; when compiled in, the relaxed fetch-add still executes on every call)
3. **Measurement variance:**
   - Phase 25: Clear +1.07% signal above noise
   - Phase 31: -0.35% / +0.19% conflicting signals (noise)
   - **Conclusion:** Phase 31 is likely a true NEUTRAL, not a hidden win
---
## Lessons Learned
### 1. HOT Path ≠ Guaranteed Win
**Previous assumption (from Phase 25):**
- HOT path TELEMETRY atomic → +0.5% to +1.0% expected
**Phase 31 reality:**
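- HOT path TELEMETRY atomic → NEUTRAL (-0.35% mean, +0.19% median; within ±0.5%)
**Insight:**
- Not all HOT path atomics have measurable overhead
- With the trace limited to the first 128 calls, the steady-state cost is one relaxed fetch-add plus a branch that is almost never taken, which may be too small to measure in this benchmark (the atomic itself is not removed by the compiler when compiled in)
- Entry point placement may reduce overhead vs mid-function (unverified)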
### 2. NEUTRAL + Cleanliness = ADOPT
**Established precedent (Phase 26):**
- 5 diagnostic atomics, -0.33% NEUTRAL result
- Adopted for code cleanliness despite no performance win
**Phase 31 confirms:**
- -0.35% NEUTRAL result, same adoption logic
- Code cleanliness is valid secondary criterion
- Maintains atomic prune momentum (Phases 24-31)
### 3. Step 0 (Execution Verification) Essential
**Phase 31 validated:**
- Step 0 confirmed no ENV gate → always active
- Prevented Phase 29 "empty bench" scenario
- Standard procedure working as designed
---
## Next Steps
### Phase 32 Candidate: `g_hak_tiny_free_calls`
**Location:** `core/hakmem_tiny_free.inc:335` (same function, 9 lines after Phase 31 target)
**Code context:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    // Phase 31 target (now compiled-out)
#endif
    // Track total tiny free calls (diagnostics)
    extern _Atomic uint64_t g_hak_tiny_free_calls;
    atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);  // ← Phase 32 target
    // ... rest of function ...
}
```
**Profile:**
- **Path:** HOT (every tiny free call, same as Phase 31)
- **Classification:** TELEMETRY (diagnostic counter, no flow control)
- **Expected:** +0.3% to +0.7% (smaller than Phase 25), or NEUTRAL as in Phase 31
- **Step 0 verification needed:** Check for ENV gate, confirm execution
**Alternative candidates:**
- Manual review of UNKNOWN atomics (284 candidates from Phase 30 audit)
- Lower priority than confirmed HOT path targets
---
## Files Modified
### Code Changes
1. **`core/hakmem_build_flags.h`**
   - Added `HAKMEM_TINY_FREE_TRACE_COMPILED` flag (default OFF)
   - Lines 363-373
2. **`core/hakmem_tiny_free.inc`**
   - Wrapped `g_tiny_free_trace` atomic in `#if HAKMEM_TINY_FREE_TRACE_COMPILED`
   - Lines 326-333
### Documentation
1. **`docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`** (this file)
   - A/B test results
   - NEUTRAL verdict + code cleanliness adoption
   - Phase 32 candidate proposal
2. **`docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`** (to be updated)
   - Phase 24-31 cumulative summary
   - Updated precedents section
   - Phase 32 roadmap
3. **`CURRENT_TASK.md`** (to be updated)
   - Phase 31 completion
   - Phase 32 candidate recommendation
---
## Cumulative Progress (Phases 24-31)
| Phase | Target | Atomics | Result | Status |
|-------|--------|---------|--------|--------|
| **24** | Tiny Class Stats (OBSERVE) | 5 | **+0.93%** | GO ✅ |
| **25** | Free Stats (`g_free_ss_enter`) | 1 | **+1.07%** | GO ✅ |
| **26** | Hot Path Diagnostics | 5 | **-0.33%** | NEUTRAL ✅ |
| **27** | Unified Cache Stats | 6 | **+0.74%** | GO ✅ |
| **28** | Background Spill Queue | 8 | N/A | NO-OP ✅ |
| **29** | Pool Hotbox v2 Stats | 12 | **0.00%** | NO-OP ✅ |
| **30** | Standard Procedure | 412 audit | N/A | PROCEDURE ✅ |
| **31** | Tiny Free Trace | 1 | **-0.35%** | NEUTRAL ✅ |
| **Total** | Phases 24-31 | **18 removed** | **+2.74%** | net cumulative ✅ |
**Net cumulative gain:** +2.74% (Phases 24+25+27, excluding NEUTRAL 26+31)
**Note:** The Phase 26 and 31 NEUTRAL results are within noise, so they are neither added to nor subtracted from the cumulative gain.
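The totals in the table follow from simple bookkeeping, sketched below. The struct and function names (`phase_result_t`, `total_atomics_removed`, `net_gain_pct`) are hypothetical, and the sketch follows this document's convention of summing per-phase percentages additively and counting only GO phases toward the gain. NO-OP phases are entered with zero atomics removed, matching the total of 18.

```c
#include <stddef.h>

typedef struct {
    int    phase;
    int    atomics_removed;  /* 0 for NO-OP phases */
    double delta_pct;        /* measured A/B result; 0 where not applicable */
    int    counts_as_gain;   /* 1 only for GO verdicts */
} phase_result_t;

static const phase_result_t k_phases[] = {
    {24, 5, +0.93, 1},
    {25, 1, +1.07, 1},
    {26, 5, -0.33, 0},   /* NEUTRAL: adopted, excluded from gain */
    {27, 6, +0.74, 1},
    {28, 0,  0.00, 0},   /* NO-OP: nothing removed */
    {29, 0,  0.00, 0},   /* NO-OP: nothing removed */
    {31, 1, -0.35, 0},   /* NEUTRAL: adopted, excluded from gain */
};
#define K_PHASE_COUNT (sizeof k_phases / sizeof k_phases[0])

int total_atomics_removed(void) {
    int total = 0;
    for (size_t i = 0; i < K_PHASE_COUNT; i++) total += k_phases[i].atomics_removed;
    return total;
}

double net_gain_pct(void) {
    double gain = 0.0;
    for (size_t i = 0; i < K_PHASE_COUNT; i++) {
        if (k_phases[i].counts_as_gain) gain += k_phases[i].delta_pct;
    }
    return gain;
}
```

These two helpers reproduce the 18 removed atomics and the +2.74% net figure (0.93 + 1.07 + 0.74).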
---
## Conclusion
Phase 31 demonstrates that **not all HOT path TELEMETRY atomics have measurable overhead**. While Phase 25 (`g_free_ss_enter`) delivered +1.07%, Phase 31 (`g_tiny_free_trace`) showed NEUTRAL performance (-0.35% mean, +0.19% median). Following Phase 26 precedent, **Phase 31 is ADOPTED** with COMPILED=0 as default for **code cleanliness** benefits.
**Key takeaways:**
1. HOT path location does not guarantee performance wins
2. NEUTRAL + code cleanliness is valid adoption criterion (Phase 26/31 pattern)
3. Standard 4-step procedure successfully prevented false positives (Step 0 execution check)
4. Phase 32 candidate ready: `g_hak_tiny_free_calls` (same HOT path, 9 lines below)
**Recommendation:** Proceed to Phase 32 (`g_hak_tiny_free_calls`) following same 4-step procedure.