# Phase 31: Tiny Free Trace Atomic Prune - Results

**Date:** 2025-12-16
**Type:** HOT path TELEMETRY atomic prune
**Target:** `g_tiny_free_trace` atomic in `core/hakmem_tiny_free.inc:326`
**Verdict:** NEUTRAL (code cleanliness adopted)

---

## Executive Summary

Phase 31 targeted the `g_tiny_free_trace` atomic in the HOT path (`hak_tiny_free()` entry point). A/B testing showed **NEUTRAL performance** for the compiled-out build relative to the compiled-in build (-0.35% mean, +0.19% median), inside the ±0.5% NEUTRAL threshold and below run-to-run noise. Following Phase 26 precedent (5 atomics, -0.33%, adopted for code cleanliness), **Phase 31 is ADOPTED** with COMPILED=0 as default to reduce HOT path complexity.

---

## Background

### Phase 30 Selection Process

From 412 total atomics audited:
- **HOT path candidates:** 16 total
  - 5 TELEMETRY (4 already compiled-out in Phases 24-27)
  - 11 UNKNOWN (require manual review)

**Phase 31 candidate selected:** `g_tiny_free_trace` (HOT path, TELEMETRY, TOP PRIORITY)

**Step 0 verification (MANDATORY):**
- No ENV gate → always active
- Located in `hak_tiny_free()` → executes on EVERY tiny free call
- Mixed benchmark heavily exercises free path → high execution count
- **Execution confirmed:** First instruction in HOT path function

### Target Profile

**Location:** `core/hakmem_tiny_free.inc:326`

**Original Code:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}
```

**Classification:**
- **Class:** TELEMETRY (trace rate-limit only)
- **Path:** HOT (every tiny free operation)
- **Flow Control:** None (only affects `HAK_TRACE` macro output)
- **Correctness Impact:** None
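
The rate-limit behaviour described above can be sketched in isolation. This is a minimal, single-threaded illustration, not the hakmem code: the `ratelimit_*` names are hypothetical, and a plain integer stands in for the `HAK_TRACE` output. It shows that the atomic increments on every call while the trace fires only for the first 128.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; these names are not in hakmem. */
static _Atomic int ratelimit_counter = 0;  /* plays the role of g_tiny_free_trace */
static int ratelimit_traces = 0;           /* counts simulated HAK_TRACE calls (single-threaded demo) */

/* Mirrors the entry of hak_tiny_free(): the relaxed fetch-add runs on every
 * call, but the branch is taken only while the previous value is below 128. */
void ratelimit_free_enter(void) {
    if (atomic_fetch_add_explicit(&ratelimit_counter, 1, memory_order_relaxed) < 128) {
        ratelimit_traces++;  /* stands in for HAK_TRACE("[hak_tiny_free_enter]\n") */
    }
}

/* Calls the entry stub n times and returns the number of traces fired so far. */
int ratelimit_run(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        ratelimit_free_enter();
    }
    return ratelimit_traces;
}

/* Returns the total number of atomic increments performed so far. */
int ratelimit_counter_value(void) {
    return atomic_load_explicit(&ratelimit_counter, memory_order_relaxed);
}
```

After `ratelimit_run(1000)`, the trace count is 128 while the counter reads 1000, which is why the atomic itself stays on the HOT path even after tracing has stopped.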

**Similar precedent:** Phase 25 (`g_free_ss_enter`: +1.07% GO)

---

## Implementation (4-Step Standard Procedure)

### Step 0: Execution Verification (Phase 29 lesson)

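**ENV gate check** (note: ripgrep's `--type c` filter does not match `.inc` files, so this command does not search `core/hakmem_tiny_free.inc` itself):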
```bash
$ rg "getenv.*TRACE" core/ --type c
# (No results - no ENV gate blocking execution)
```

**Execution check:**
- Located at entry of `hak_tiny_free()` (line 326)
- Executes on EVERY tiny free call (no conditional bypass)
- Mixed benchmark: ~10M+ free operations per run
- **Verification:** PASSED (always active)

### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

**Full usage audit:**
```bash
$ rg -n "g_tiny_free_trace" core/
core/hakmem_tiny_free.inc:326:    static _Atomic int g_tiny_free_trace = 0;
core/hakmem_tiny_free.inc:327:    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
```

**Analysis:**
- Only 2 uses: declaration + atomic increment
- The only branch on the counter value (`< 128`) gates the `HAK_TRACE` call; no allocator flow control depends on it
- Only affects `HAK_TRACE` printf (debug macro)
- **Classification:** Pure TELEMETRY ✅

### Step 2: Compile-Out Implementation

**File 1:** `core/hakmem_build_flags.h`

**Added:**
```c
// ------------------------------------------------------------
// Phase 31: Tiny Free Trace Atomic Prune (Compile-out trace atomic)
// ------------------------------------------------------------
// Tiny Free Trace: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need free path trace diagnostics
// Target: g_tiny_free_trace atomic in core/hakmem_tiny_free.inc:326
// Impact: HOT path atomic (every free operation)
// Expected improvement: +0.5% to +1.0% (similar to Phase 25: +1.07%)
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
#  define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif
```

**File 2:** `core/hakmem_tiny_free.inc:326`

**Before:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}
```

**After:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
#else
    (void)0;  // No-op when trace compiled out
#endif
    // ... rest of function ...
}
```

**Include verification:**
- `hakmem_build_flags.h` included transitively via `tiny_front_config_box.h`
- No explicit include needed
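
As a self-contained sketch of the compile-gate pattern used in Step 2, the following uses a hypothetical `DEMO_TRACE_COMPILED` flag and `demo_*` functions (none of these exist in hakmem). The `_Static_assert` guard on the flag value is an addition for illustration and is not claimed to be present in `hakmem_build_flags.h`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical demo flag; mirrors HAKMEM_TINY_FREE_TRACE_COMPILED (default 0). */
#ifndef DEMO_TRACE_COMPILED
#  define DEMO_TRACE_COMPILED 0
#endif

/* Reject values other than 0 or 1 at compile time (C11). */
_Static_assert(DEMO_TRACE_COMPILED == 0 || DEMO_TRACE_COMPILED == 1,
               "DEMO_TRACE_COMPILED must be 0 or 1");

#if DEMO_TRACE_COMPILED
static _Atomic int demo_trace_hits = 0;
#endif

/* Stand-in for hak_tiny_free(): the gated block vanishes when the flag is 0. */
void demo_free(void *ptr) {
#if DEMO_TRACE_COMPILED
    atomic_fetch_add_explicit(&demo_trace_hits, 1, memory_order_relaxed);
#endif
    (void)ptr;  /* the real function would release the block here */
}

/* Number of atomic increments executed; always 0 when compiled out. */
int demo_trace_hits_value(void) {
#if DEMO_TRACE_COMPILED
    return atomic_load_explicit(&demo_trace_hits, memory_order_relaxed);
#else
    return 0;
#endif
}

/* Calls demo_free() n times and reports how many increments actually ran. */
int demo_run_frees(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        demo_free((void *)0);
    }
    return demo_trace_hits_value();
}
```

Built with the default flag, `demo_run_frees(1000)` returns 0; built with `-DDEMO_TRACE_COMPILED=1`, it returns 1000. This is the same build-level switch the A/B test relies on.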

### Step 3: A/B Test (Build-Level Comparison)

**Baseline (COMPILED=0, default - trace compiled-out):**
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```

**Compiled-in (COMPILED=1, research - trace active):**
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_TRACE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```

---

## A/B Test Results

### Raw Data (10-run clean environment)

**Baseline (COMPILED=0, trace compiled-out):**
```
Run  1: 53432447 ops/s
Run  2: 53846666 ops/s
Run  3: 53256003 ops/s
Run  4: 54007573 ops/s
Run  5: 54132468 ops/s
Run  6: 53937278 ops/s
Run  7: 53752216 ops/s
Run  8: 53106138 ops/s
Run  9: 53861749 ops/s
Run 10: 53052398 ops/s
```

**Compiled-in (COMPILED=1, trace active):**
```
Run  1: 53667388 ops/s
Run  2: 53623799 ops/s
Run  3: 54099595 ops/s
Run  4: 53993106 ops/s
Run  5: 53530214 ops/s
Run  6: 54275707 ops/s
Run  7: 53726604 ops/s
Run  8: 53607801 ops/s
Run  9: 54122912 ops/s
Run 10: 53630312 ops/s
```

### Statistical Analysis

| Metric | Baseline (COMPILED=0) | Compiled-in (COMPILED=1) | Difference (COMPILED=0 relative to COMPILED=1) |
|--------|----------------------|-------------------------|------------|
| **Mean** | 53,638,493.60 ops/s | 53,827,743.80 ops/s | **-0.35%** |
| **Median** | 53,799,441.00 ops/s | 53,696,996.00 ops/s | **+0.19%** |
| **Stdev** | 393,174.93 (0.73%) | 267,178.23 (0.50%) | - |

**Difference interpretation:**
- **Mean:** Baseline -0.35% (SLOWER, but within noise)
- **Median:** Baseline +0.19% (FASTER, but within noise)
- **Verdict range:** Both within ±0.5% NEUTRAL threshold
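
The mean and median differences can be reproduced from the raw data with a short program. This is a sketch only: the helper names (`mean_of`, `median_of`, `pct_diff`) are hypothetical, and the sign convention (compiled-out relative to compiled-in) is taken from the interpretation above.

```c
#include <assert.h>
#include <stdlib.h>

#define RUNS 10

/* Throughput samples (ops/s) copied from the raw data above. */
static const double runs_compiled_out[RUNS] = {
    53432447, 53846666, 53256003, 54007573, 54132468,
    53937278, 53752216, 53106138, 53861749, 53052398
};
static const double runs_compiled_in[RUNS] = {
    53667388, 53623799, 54099595, 53993106, 53530214,
    54275707, 53726604, 53607801, 54122912, 53630312
};

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of RUNS samples. */
double mean_of(const double v[RUNS]) {
    double sum = 0.0;
    for (int i = 0; i < RUNS; i++) sum += v[i];
    return sum / RUNS;
}

/* Median of RUNS samples; sorts a local copy so the input stays untouched. */
double median_of(const double v[RUNS]) {
    double tmp[RUNS];
    for (int i = 0; i < RUNS; i++) tmp[i] = v[i];
    qsort(tmp, RUNS, sizeof tmp[0], cmp_double);
    return (RUNS % 2) ? tmp[RUNS / 2]
                      : (tmp[RUNS / 2 - 1] + tmp[RUNS / 2]) / 2.0;
}

/* Relative difference of value versus base, in percent. */
double pct_diff(double value, double base) {
    assert(base != 0.0);
    return (value - base) / base * 100.0;
}

/* Compiled-out build relative to compiled-in build, in percent. */
double mean_diff_pct(void) {
    return pct_diff(mean_of(runs_compiled_out), mean_of(runs_compiled_in));
}

double median_diff_pct(void) {
    return pct_diff(median_of(runs_compiled_out), median_of(runs_compiled_in));
}
```

`mean_diff_pct()` evaluates to roughly -0.35 and `median_diff_pct()` to roughly +0.19, matching the table. The opposite signs come from the same ten runs per build, which is what the noise interpretation rests on.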

---

## Verdict

### Performance: NEUTRAL

**Criteria:**
- GO: +0.5% or more (compile-out wins)
- NEUTRAL: ±0.5% (no significant difference)
- NO-GO: -0.5% or worse (compile-out loses)
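
The criteria can be written as a small classifier. This is an illustrative sketch with hypothetical names (`verdict_t`, `classify_verdict`); it assumes the boundaries are inclusive on the GO and NO-GO sides, reading "+0.5% or more" and "-0.5% or worse" literally.

```c
#include <assert.h>

typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } verdict_t;

/* diff_pct: throughput change of the compiled-out build relative to the
 * compiled-in build, in percent. Boundary handling is an assumption. */
verdict_t classify_verdict(double diff_pct) {
    assert(diff_pct == diff_pct);  /* reject NaN input */
    if (diff_pct >= 0.5) {
        return VERDICT_GO;       /* compile-out wins */
    }
    if (diff_pct <= -0.5) {
        return VERDICT_NO_GO;    /* compile-out loses */
    }
    return VERDICT_NEUTRAL;      /* no significant difference */
}

/* Human-readable label for a verdict. */
const char *verdict_name(verdict_t v) {
    switch (v) {
    case VERDICT_GO:      return "GO";
    case VERDICT_NO_GO:   return "NO-GO";
    case VERDICT_NEUTRAL: return "NEUTRAL";
    }
    return "UNKNOWN";
}
```

Under these assumptions, both Phase 31 figures (-0.35 and +0.19) classify as NEUTRAL, while Phase 25's +1.07 classifies as GO.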

**Result:** NEUTRAL (-0.35% mean, +0.19% median)

**Analysis:**
- Mean shows slight regression (-0.35%), median shows slight improvement (+0.19%)
- Conflicting signals suggest **measurement noise** rather than true effect
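- The mean difference (~189K ops/s) is smaller than either build's standard deviation (393K and 267K ops/s), so the runs give no evidence of a statistically significant effect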
- Similar to Phase 26 pattern (-0.33%, 5 atomics, NEUTRAL)

### Decision: ADOPTED (COMPILED=0 default)

**Rationale (following Phase 26 precedent):**

1. **Code Cleanliness:**
   - Removes unused TELEMETRY atomic from HOT path
   - Reduces complexity at `hak_tiny_free()` entry point
   - No correctness impact (pure trace macro)

2. **Consistency:**
   - Phase 26 precedent: -0.33% NEUTRAL result adopted for cleanliness
   - Phase 31: -0.35% NEUTRAL result follows same logic
   - Maintains atomic prune momentum (Phases 24-31)

3. **Research Flexibility:**
   - `COMPILED=1` still available for trace diagnostics
   - No functionality lost, only default changed
   - Easy revert if needed (`make EXTRA_CFLAGS=-DHAKMEM_TINY_FREE_TRACE_COMPILED=1`)

4. **Why Not NO-GO?**
   - Median +0.19% (slight win, not loss)
   - Mean -0.35% within noise range (±0.5% threshold)
   - Phase 26 set precedent: NEUTRAL + cleanliness = ADOPT

---

## Comparison: Phase 25 vs Phase 31

**Phase 25:** `g_free_ss_enter` (free stats atomic)
- **Location:** `tiny_superslab_free.inc.h:25` (entry point)
- **Result:** +1.07% (GO)
- **Path:** Same HOT path (free entry)
- **Similarity:** Both trace/stats atomics at free entry

**Phase 31:** `g_tiny_free_trace` (trace rate-limit atomic)
- **Location:** `hakmem_tiny_free.inc:326` (entry point)
- **Result:** -0.35% mean, +0.19% median (NEUTRAL)
- **Path:** Same HOT path (free entry)
- **Difference:** Trace output is rate-limited to the first 128 calls, although the atomic itself still increments on every call

**Why different results?**

1. **Execution frequency:**
   - Phase 25: EVERY free call increments stats
   - Phase 31: EVERY free call increments, but trace only 128 times
   - **Hypothesis:** Phase 25's always-active stats had higher overhead

2. **Atomic placement:**
   - Phase 25: Inside `hak_tiny_free_superslab()` (deeper in call stack)
   - Phase 31: First instruction in `hak_tiny_free()` (entry point)
   - **Hypothesis:** Entry point atomic may be better optimized by compiler

3. **Measurement variance:**
   - Phase 25: Clear +1.07% signal above noise
   - Phase 31: -0.35% / +0.19% conflicting signals (noise)
   - **Conclusion:** Phase 31 likely true NEUTRAL, not hidden win

---

## Lessons Learned

### 1. HOT Path ≠ Guaranteed Win

**Previous assumption (from Phase 25):**
- HOT path TELEMETRY atomic → +0.5% to +1.0% expected

**Phase 31 reality:**
- HOT path TELEMETRY atomic → NEUTRAL (-0.35% mean, +0.19% median)

**Insight:**
- Not all HOT path atomics have measurable overhead
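- The rate-limited trace branch is taken only for the first 128 calls and is likely well predicted afterwards; the relaxed atomic increment itself still executes on every call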
- Entry point placement may reduce overhead vs mid-function

### 2. NEUTRAL + Cleanliness = ADOPT

**Established precedent (Phase 26):**
- 5 diagnostic atomics, -0.33% NEUTRAL result
- Adopted for code cleanliness despite no performance win

**Phase 31 confirms:**
- -0.35% NEUTRAL result, same adoption logic
- Code cleanliness is valid secondary criterion
- Maintains atomic prune momentum (Phases 24-31)

### 3. Step 0 (Execution Verification) Essential

**Phase 31 validated:**
- Step 0 confirmed no ENV gate → always active
- Prevented Phase 29 "empty bench" scenario
- Standard procedure working as designed

---

## Next Steps

### Phase 32 Candidate: `g_hak_tiny_free_calls`

**Location:** `core/hakmem_tiny_free.inc:335` (same function, 9 lines after Phase 31 target)

**Code context:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    // Phase 31 target (now compiled-out)
#endif
    // Track total tiny free calls (diagnostics)
    extern _Atomic uint64_t g_hak_tiny_free_calls;
    atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);  // ← Phase 32 target
    // ... rest of function ...
}
```

**Profile:**
- **Path:** HOT (every tiny free call, same as Phase 31)
- **Classification:** TELEMETRY (diagnostic counter, no flow control)
- **Expected:** +0.3% to +0.7% (smaller than Phase 25, similar to Phase 31)
- **Step 0 verification needed:** Check for ENV gate, confirm execution
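
If Phase 32 proceeds, one possible shape for the gate is a counter macro that expands to nothing when compiled out. This is a sketch under assumptions: the flag `DEMO_TINY_FREE_CALLS_COMPILED`, the macro `DEMO_STAT_INC`, and the `demo_*` names are all hypothetical, and no Phase 32 flag is defined in the source. The counter declaration stays outside the gate here on the assumption that other code may still read it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical gate name for illustration only; not a real hakmem flag. */
#ifndef DEMO_TINY_FREE_CALLS_COMPILED
#  define DEMO_TINY_FREE_CALLS_COMPILED 0
#endif

/* Stand-in for g_hak_tiny_free_calls. */
static _Atomic uint64_t demo_free_calls = 0;

#if DEMO_TINY_FREE_CALLS_COMPILED
#  define DEMO_STAT_INC(counter) \
      ((void)atomic_fetch_add_explicit(&(counter), 1, memory_order_relaxed))
#else
#  define DEMO_STAT_INC(counter) ((void)0)  /* compiled out: no atomic issued */
#endif

/* Stand-in for hak_tiny_free() with the diagnostic counter behind the gate. */
void demo_tiny_free(void *ptr) {
    DEMO_STAT_INC(demo_free_calls);
    (void)ptr;  /* the real function would release the block here */
}

/* Calls demo_tiny_free() n times and returns the counter value afterwards. */
uint64_t demo_count_after(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        demo_tiny_free((void *)0);
    }
    return atomic_load_explicit(&demo_free_calls, memory_order_relaxed);
}
```

A macro keeps the call site to one line, which may matter if the same counter pattern appears in several places; an `#if` block as in Phase 31 is equally valid for a single site.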

**Alternative candidates:**
- Manual review of UNKNOWN atomics (284 candidates from Phase 30 audit)
- Lower priority than confirmed HOT path targets

---

## Files Modified

### Code Changes

1. **`core/hakmem_build_flags.h`**
   - Added `HAKMEM_TINY_FREE_TRACE_COMPILED` flag (default OFF)
   - Lines 363-373

2. **`core/hakmem_tiny_free.inc`**
   - Wrapped `g_tiny_free_trace` atomic in `#if HAKMEM_TINY_FREE_TRACE_COMPILED`
   - Lines 326-333

### Documentation

1. **`docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`** (this file)
   - A/B test results
   - NEUTRAL verdict + code cleanliness adoption
   - Phase 32 candidate proposal

2. **`docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`** (to be updated)
   - Phase 24-31 cumulative summary
   - Updated precedents section
   - Phase 32 roadmap

3. **`CURRENT_TASK.md`** (to be updated)
   - Phase 31 completion
   - Phase 32 candidate recommendation

---

## Cumulative Progress (Phases 24-31)

| Phase | Target | Atomics | Result | Status |
|-------|--------|---------|--------|--------|
| **24** | Tiny Class Stats (OBSERVE) | 5 | **+0.93%** | GO ✅ |
| **25** | Free Stats (`g_free_ss_enter`) | 1 | **+1.07%** | GO ✅ |
| **26** | Hot Path Diagnostics | 5 | **-0.33%** | NEUTRAL ✅ |
| **27** | Unified Cache Stats | 6 | **+0.74%** | GO ✅ |
| **28** | Background Spill Queue | 8 | N/A | NO-OP ✅ |
| **29** | Pool Hotbox v2 Stats | 12 | **0.00%** | NO-OP ✅ |
| **30** | Standard Procedure | 412 audit | N/A | PROCEDURE ✅ |
| **31** | Tiny Free Trace | 1 | **-0.35%** | NEUTRAL ✅ |
| **Total** | **Atomics removed** | **18** | **+2.74%** | **net cumulative ✅** |

**Net cumulative gain:** +2.74% (Phases 24+25+27, excluding NEUTRAL 26+31)

**Note:** Phase 26 and 31 NEUTRAL results do not degrade cumulative gain (no regression).
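
The totals can be cross-checked from the table rows. This sketch uses hypothetical type and function names, and it assumes that only GO and NEUTRAL phases removed atomics (the NO-OP phases 28 and 29 are treated as removing none, which is the reading under which the count comes to 18). The gain is a plain sum of the GO percentages, as in the table.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PH_GO, PH_NEUTRAL, PH_NO_OP } phase_status_t;

typedef struct {
    int phase;
    int atomics;          /* atomics targeted by the phase */
    double result_pct;    /* measured change; 0.0 is a placeholder for N/A */
    phase_status_t status;
} phase_row_t;

/* Rows copied from the cumulative table (Phase 30 was an audit, so omitted). */
static const phase_row_t phase_rows[] = {
    {24,  5,  0.93, PH_GO},
    {25,  1,  1.07, PH_GO},
    {26,  5, -0.33, PH_NEUTRAL},
    {27,  6,  0.74, PH_GO},
    {28,  8,  0.00, PH_NO_OP},
    {29, 12,  0.00, PH_NO_OP},
    {31,  1, -0.35, PH_NEUTRAL},
};

/* Sum of result percentages over GO phases only. */
double sum_go_gain_pct(const phase_row_t *rows, size_t n) {
    assert(rows != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].status == PH_GO) total += rows[i].result_pct;
    }
    return total;
}

/* Atomics removed, assuming GO and NEUTRAL phases were adopted. */
int count_removed(const phase_row_t *rows, size_t n) {
    assert(rows != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].status != PH_NO_OP) total += rows[i].atomics;
    }
    return total;
}

double table_net_gain_pct(void) {
    return sum_go_gain_pct(phase_rows, sizeof phase_rows / sizeof phase_rows[0]);
}

int table_atomics_removed(void) {
    return count_removed(phase_rows, sizeof phase_rows / sizeof phase_rows[0]);
}
```

Under these assumptions `table_net_gain_pct()` comes to about 2.74 and `table_atomics_removed()` to 18, consistent with the Total row.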

---

## Conclusion

Phase 31 demonstrates that **not all HOT path TELEMETRY atomics have measurable overhead**. While Phase 25 (`g_free_ss_enter`) delivered +1.07%, Phase 31 (`g_tiny_free_trace`) showed NEUTRAL performance (-0.35% mean, +0.19% median). Following Phase 26 precedent, **Phase 31 is ADOPTED** with COMPILED=0 as default for **code cleanliness** benefits.

**Key takeaways:**
1. HOT path location does not guarantee performance wins
2. NEUTRAL + code cleanliness is valid adoption criterion (Phase 26/31 pattern)
3. Standard 4-step procedure successfully prevented false positives (Step 0 execution check)
4. Phase 32 candidate ready: `g_hak_tiny_free_calls` (same HOT path, 9 lines below)

**Recommendation:** Proceed to Phase 32 (`g_hak_tiny_free_calls`) following same 4-step procedure.