# Phase 31: Tiny Free Trace Atomic Prune - Results

**Date:** 2025-12-16
**Type:** HOT path TELEMETRY atomic prune
**Target:** `g_tiny_free_trace` atomic in `core/hakmem_tiny_free.inc:326`
**Verdict:** NEUTRAL (code cleanliness adopted)

---

## Executive Summary

Phase 31 targeted the `g_tiny_free_trace` atomic at the entry of `hak_tiny_free()`, a HOT path. A/B testing showed **NEUTRAL performance**: the compiled-out build measured -0.35% on the mean and +0.19% on the median relative to the compiled-in build, both inside the ±0.5% noise threshold. Following the Phase 26 precedent (5 atomics, -0.33%, adopted for code cleanliness), **Phase 31 is ADOPTED** with `HAKMEM_TINY_FREE_TRACE_COMPILED=0` as the default to reduce HOT path complexity.

---

## Background

### Phase 30 Selection Process

From 412 total atomics audited:
- **HOT path candidates:** 16 total
  - 5 TELEMETRY (4 already compiled-out in Phases 24-27)
  - 11 UNKNOWN (require manual review)

**Phase 31 candidate selected:** `g_tiny_free_trace` (HOT path, TELEMETRY, TOP PRIORITY)

**Step 0 verification (MANDATORY):**
- No ENV gate → always active
- Located in `hak_tiny_free()` → executes on EVERY tiny free call
- Mixed benchmark heavily exercises free path → high execution count
- **Execution confirmed:** First statement in HOT path function

### Target Profile

**Location:** `core/hakmem_tiny_free.inc:326`

**Original Code:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}
```

**Classification:**
- **Class:** TELEMETRY (trace rate-limit only)
- **Path:** HOT (every tiny free operation)
- **Flow Control:** None (only affects `HAK_TRACE` macro output)
- **Correctness Impact:** None

**Similar precedent:** Phase 25 (`g_free_ss_enter`: +1.07% GO)

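The rate-limit pattern in the original code can be isolated in a small standalone sketch. It is not hakmem code: `TRACE_LIMIT`, `g_trace_counter`, `g_trace_emitted`, `traced_entry()` and `run_calls()` are hypothetical names, and a plain counter stands in for the `HAK_TRACE` output. The point it illustrates is that the relaxed `fetch_add` runs on every call while the trace branch is taken only for the first 128.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; these are not hakmem symbols. */
#define TRACE_LIMIT 128

static _Atomic int g_trace_counter = 0; /* incremented on every call */
static int g_trace_emitted = 0;         /* times the trace branch ran (single-threaded demo) */

/* Same shape as the hak_tiny_free() entry: one relaxed read-modify-write per
 * call, with trace output only while the pre-increment value is below the limit. */
static void traced_entry(void) {
    if (atomic_fetch_add_explicit(&g_trace_counter, 1, memory_order_relaxed) < TRACE_LIMIT) {
        g_trace_emitted++; /* stands in for HAK_TRACE(...) */
    }
}

/* Calls traced_entry() n times and returns the total number of traces emitted so far. */
static int run_calls(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        traced_entry();
    }
    return g_trace_emitted;
}
```

Calling `run_calls(1000)` on fresh state leaves the counter at 1000 while only 128 traces are emitted, which is why the atomic counts as a per-call cost even though its output is rate-limited.
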
---

## Implementation (4-Step Standard Procedure)

### Step 0: Execution Verification (Phase 29 lesson)

**ENV gate check:**
```bash
$ rg "getenv.*TRACE" core/ --type c
# (No results - no ENV gate blocking execution)
```

Note that ripgrep's `--type c` filter matches C sources and headers (`*.c`, `*.h`), so `.inc` files such as `core/hakmem_tiny_free.inc` are not searched by this command; the absence of a gate in that file rests on the code inspection below.

**Execution check:**
- Located at entry of `hak_tiny_free()` (line 326)
- Executes on EVERY tiny free call (no conditional bypass)
- Mixed benchmark: ~10M+ free operations per run
- **Verification:** PASSED (always active)

### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)

**Full usage audit:**
```bash
$ rg -n "g_tiny_free_trace" core/
core/hakmem_tiny_free.inc:326:    static _Atomic int g_tiny_free_trace = 0;
core/hakmem_tiny_free.inc:327:    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
```

**Analysis:**
- Only 2 references: the declaration and the `atomic_fetch_add_explicit` call
- The only branch on the counter value is the `< 128` rate limit, which gates `HAK_TRACE` output and nothing else
- `HAK_TRACE` is a debug printf macro, so no allocator state depends on the counter
- **Classification:** Pure TELEMETRY ✅

### Step 2: Compile-Out Implementation

**File 1:** `core/hakmem_build_flags.h`

**Added:**
```c
// ------------------------------------------------------------
// Phase 31: Tiny Free Trace Atomic Prune (Compile-out trace atomic)
// ------------------------------------------------------------
// Tiny Free Trace: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need free path trace diagnostics
// Target: g_tiny_free_trace atomic in core/hakmem_tiny_free.inc:326
// Impact: HOT path atomic (every free operation)
// Expected improvement: +0.5% to +1.0% (similar to Phase 25: +1.07%)
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
#  define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif
```

**File 2:** `core/hakmem_tiny_free.inc:326`

**Before:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // ... rest of function ...
}
```

**After:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
#else
    (void)0;  // No-op when trace compiled out
#endif
    // ... rest of function ...
}
```

**Include verification:**
- `hakmem_build_flags.h` included transitively via `tiny_front_config_box.h`
- No explicit include needed

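The effect of the compile gate can be checked in isolation with a minimal sketch like the one below. Only the flag name `HAKMEM_TINY_FREE_TRACE_COMPILED` and its default of 0 come from this document; `g_demo_trace_hits`, `demo_free_entry()` and `demo_hits_after()` are hypothetical names used purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Same default as the flag in hakmem_build_flags.h: 0 = compiled out. */
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
#  define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif

/* Hypothetical counter, used only to observe whether the gated code exists. */
static _Atomic int g_demo_trace_hits = 0;

/* Hypothetical stand-in for a HOT path entry point. */
static void demo_free_entry(void) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    /* Present only in research builds (-DHAKMEM_TINY_FREE_TRACE_COMPILED=1). */
    atomic_fetch_add_explicit(&g_demo_trace_hits, 1, memory_order_relaxed);
#endif
    /* ... the rest of the function would follow here ... */
}

/* Runs the entry point `calls` times and reports how often the gated code ran. */
static int demo_hits_after(int calls) {
    assert(calls >= 0);
    for (int i = 0; i < calls; i++) {
        demo_free_entry();
    }
    return atomic_load_explicit(&g_demo_trace_hits, memory_order_relaxed);
}
```

With the default flag value, `demo_hits_after(10)` returns 0 because the preprocessor removes the atomic before compilation; building with `-DHAKMEM_TINY_FREE_TRACE_COMPILED=1` makes it return 10.
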
### Step 3: A/B Test (Build-Level Comparison)

**Baseline (COMPILED=0, default - trace compiled-out):**
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```

**Compiled-in (COMPILED=1, research - trace active):**
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_TRACE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```

---

## A/B Test Results

### Raw Data (10-run clean environment)

**Baseline (COMPILED=0, trace compiled-out):**
```
Run 1: 53432447 ops/s
Run 2: 53846666 ops/s
Run 3: 53256003 ops/s
Run 4: 54007573 ops/s
Run 5: 54132468 ops/s
Run 6: 53937278 ops/s
Run 7: 53752216 ops/s
Run 8: 53106138 ops/s
Run 9: 53861749 ops/s
Run 10: 53052398 ops/s
```

**Compiled-in (COMPILED=1, trace active):**
```
Run 1: 53667388 ops/s
Run 2: 53623799 ops/s
Run 3: 54099595 ops/s
Run 4: 53993106 ops/s
Run 5: 53530214 ops/s
Run 6: 54275707 ops/s
Run 7: 53726604 ops/s
Run 8: 53607801 ops/s
Run 9: 54122912 ops/s
Run 10: 53630312 ops/s
```

### Statistical Analysis

| Metric | Baseline (COMPILED=0) | Compiled-in (COMPILED=1) | Difference |
|--------|----------------------|-------------------------|------------|
| **Mean** | 53,638,493.60 ops/s | 53,827,743.80 ops/s | **-0.35%** |
| **Median** | 53,799,441.00 ops/s | 53,696,996.00 ops/s | **+0.19%** |
| **Stdev (sample)** | 393,174.93 ops/s (0.73%) | 267,178.23 ops/s (0.50%) | - |

The Difference column is computed as (baseline - compiled-in) / compiled-in, so a negative value means the compiled-out baseline was slower.

**Difference interpretation:**
- **Mean:** Baseline -0.35% (SLOWER, but within noise)
- **Median:** Baseline +0.19% (FASTER, but within noise)
- **Verdict range:** Both within ±0.5% NEUTRAL threshold

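The figures in the table can be reproduced from the raw runs with a short sketch. The helper names (`mean`, `median`, `sample_variance`, `rel_diff_pct`) are illustrative and not part of the benchmark scripts; the sample arrays are copied from the raw data above. To avoid a dependency on `libm`, the sketch returns the sample variance, and the reported standard deviation is its square root.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Raw ops/s samples copied from the two 10-run series. */
static const double BASELINE[10] = {
    53432447, 53846666, 53256003, 54007573, 54132468,
    53937278, 53752216, 53106138, 53861749, 53052398 };
static const double COMPILED_IN[10] = {
    53667388, 53623799, 54099595, 53993106, 53530214,
    54275707, 53726604, 53607801, 54122912, 53630312 };

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

static double mean(const double *v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median of n samples; sorts a local copy (this sketch caps n at 64). */
static double median(const double *v, size_t n) {
    assert(n > 0 && n <= 64);
    double tmp[64];
    for (size_t i = 0; i < n; i++) tmp[i] = v[i];
    qsort(tmp, n, sizeof tmp[0], cmp_double);
    return (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
}

/* Sample variance (n - 1 denominator); the stdev is its square root. */
static double sample_variance(const double *v, size_t n) {
    assert(n > 1);
    double m = mean(v, n), ss = 0.0;
    for (size_t i = 0; i < n; i++) ss += (v[i] - m) * (v[i] - m);
    return ss / (double)(n - 1);
}

/* Relative difference of baseline versus compiled-in, in percent. */
static double rel_diff_pct(double baseline, double compiled_in) {
    return (baseline - compiled_in) / compiled_in * 100.0;
}
```

Feeding the two means into `rel_diff_pct` gives roughly -0.35%, and the two medians give roughly +0.19%, matching the table.
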
---

## Verdict

### Performance: NEUTRAL

**Criteria:**
- GO: +0.5% or more (compile-out wins)
- NEUTRAL: ±0.5% (no significant difference)
- NO-GO: -0.5% or worse (compile-out loses)

**Result:** NEUTRAL (-0.35% mean, +0.19% median)

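The three-way criteria can be written as a small classifier. This is a sketch under the thresholds listed above, not the project's tooling; `verdict_t`, `classify()` and `verdict_name()` are hypothetical names, and treating exactly ±0.5% as GO or NO-GO follows the "or more" and "or worse" wording.

```c
#include <assert.h>
#include <string.h>

typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } verdict_t;

/* Classifies a compile-out difference (in percent) against the ±0.5% threshold. */
static verdict_t classify(double diff_pct) {
    assert(diff_pct == diff_pct); /* reject NaN */
    const double threshold = 0.5;
    if (diff_pct >= threshold) return VERDICT_GO;
    if (diff_pct <= -threshold) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}

/* Human-readable label for a verdict. */
static const char *verdict_name(verdict_t v) {
    switch (v) {
        case VERDICT_GO:      return "GO";
        case VERDICT_NO_GO:   return "NO-GO";
        case VERDICT_NEUTRAL: return "NEUTRAL";
    }
    return "UNKNOWN";
}
```

Both Phase 31 figures (-0.35% and +0.19%) classify as NEUTRAL, while Phase 25's +1.07% classifies as GO.
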
**Analysis:**
- Mean shows a slight regression (-0.35%); median shows a slight improvement (+0.19%)
- Opposite signs on mean and median suggest **measurement noise** rather than a true effect
- The gap between the means (about 189K ops/s) is smaller than either build's run-to-run standard deviation (393K and 267K ops/s), which is consistent with no statistically significant difference
- Similar to the Phase 26 pattern (-0.33%, 5 atomics, NEUTRAL)

### Decision: ADOPTED (COMPILED=0 default)

**Rationale (following Phase 26 precedent):**

1. **Code Cleanliness:**
   - Removes a diagnostic-only TELEMETRY atomic from the HOT path
   - Reduces complexity at `hak_tiny_free()` entry point
   - No correctness impact (pure trace macro)

2. **Consistency:**
   - Phase 26 precedent: -0.33% NEUTRAL result adopted for cleanliness
   - Phase 31: -0.35% NEUTRAL result follows same logic
   - Maintains atomic prune momentum (Phases 24-31)

3. **Research Flexibility:**
   - `COMPILED=1` still available for trace diagnostics
   - No functionality lost, only default changed
   - Easy revert if needed (`make EXTRA_CFLAGS=-DHAKMEM_TINY_FREE_TRACE_COMPILED=1`)

4. **Why Not NO-GO?**
   - Median +0.19% (slight win, not loss)
   - Mean -0.35% is inside the ±0.5% noise threshold
   - Phase 26 set precedent: NEUTRAL + cleanliness = ADOPT

---

## Comparison: Phase 25 vs Phase 31

**Phase 25:** `g_free_ss_enter` (free stats atomic)
- **Location:** `tiny_superslab_free.inc.h:25` (entry of `hak_tiny_free_superslab()`)
- **Result:** +1.07% (GO)
- **Path:** Same HOT path (free entry)
- **Similarity:** Both trace/stats atomics at free entry

**Phase 31:** `g_tiny_free_trace` (trace rate-limit atomic)
- **Location:** `hakmem_tiny_free.inc:326` (entry of `hak_tiny_free()`)
- **Result:** -0.35% mean, +0.19% median (NEUTRAL)
- **Path:** Same HOT path (free entry)
- **Difference:** Trace output is rate-limited to the first 128 calls, although the counter itself still increments on every call

**Why different results?**

1. **Execution frequency:**
   - Phase 25: EVERY free call increments stats
   - Phase 31: EVERY free call increments, but trace only 128 times
   - **Hypothesis:** Phase 25's always-active stats had higher overhead

2. **Atomic placement:**
   - Phase 25: Inside `hak_tiny_free_superslab()` (deeper in call stack)
   - Phase 31: First statement in `hak_tiny_free()` (entry point)
   - **Hypothesis:** Entry point atomic may be better optimized by compiler

3. **Measurement variance:**
   - Phase 25: Clear +1.07% signal above noise
   - Phase 31: -0.35% / +0.19% conflicting signals (noise)
   - **Conclusion:** Phase 31 is likely a true NEUTRAL, not a hidden win

---

## Lessons Learned

### 1. HOT Path ≠ Guaranteed Win

**Previous assumption (from Phase 25):**
- HOT path TELEMETRY atomic → +0.5% to +1.0% expected

**Phase 31 reality:**
- HOT path TELEMETRY atomic → NEUTRAL (-0.35% mean, +0.19% median)

**Insight:**
- Not all HOT path atomics have measurable overhead
- After the first 128 calls the `< 128` branch is never taken and should predict well; the relaxed `fetch_add` still executes on every call, but its cost was not measurable here
- Entry point placement may reduce overhead vs mid-function

### 2. NEUTRAL + Cleanliness = ADOPT

**Established precedent (Phase 26):**
- 5 diagnostic atomics, -0.33% NEUTRAL result
- Adopted for code cleanliness despite no performance win

**Phase 31 confirms:**
- -0.35% NEUTRAL result, same adoption logic
- Code cleanliness is a valid secondary criterion
- Maintains atomic prune momentum (Phases 24-31)

### 3. Step 0 (Execution Verification) Essential

**Phase 31 validated:**
- Step 0 confirmed no ENV gate → always active
- Prevented Phase 29 "empty bench" scenario
- Standard procedure working as designed

---

## Next Steps

### Phase 32 Candidate: `g_hak_tiny_free_calls`

**Location:** `core/hakmem_tiny_free.inc:335` (same function, 9 lines after Phase 31 target)

**Code context:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    // Phase 31 target (now compiled-out)
#endif
    // Track total tiny free calls (diagnostics)
    extern _Atomic uint64_t g_hak_tiny_free_calls;
    atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);  // ← Phase 32 target
    // ... rest of function ...
}
```

**Profile:**
- **Path:** HOT (every tiny free call, same as Phase 31)
- **Classification:** TELEMETRY (diagnostic counter, no flow control)
- **Expected:** +0.3% to +0.7% (smaller than Phase 25); given Phase 31's NEUTRAL result in the same function, a NEUTRAL outcome is also plausible
- **Step 0 verification needed:** Check for ENV gate, confirm execution

**Alternative candidates:**
- Manual review of UNKNOWN atomics (284 candidates from Phase 30 audit)
- Lower priority than confirmed HOT path targets

---

## Files Modified

### Code Changes

1. **`core/hakmem_build_flags.h`**
   - Added `HAKMEM_TINY_FREE_TRACE_COMPILED` flag (default OFF)
   - Lines 363-373

2. **`core/hakmem_tiny_free.inc`**
   - Wrapped `g_tiny_free_trace` atomic in `#if HAKMEM_TINY_FREE_TRACE_COMPILED`
   - Lines 326-333

### Documentation

1. **`docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`** (this file)
   - A/B test results
   - NEUTRAL verdict + code cleanliness adoption
   - Phase 32 candidate proposal

2. **`docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`** (to be updated)
   - Phase 24-31 cumulative summary
   - Updated precedents section
   - Phase 32 roadmap

3. **`CURRENT_TASK.md`** (to be updated)
   - Phase 31 completion
   - Phase 32 candidate recommendation

---

## Cumulative Progress (Phases 24-31)

| Phase | Target | Atomics | Result | Status |
|-------|--------|---------|--------|--------|
| **24** | Tiny Class Stats (OBSERVE) | 5 | **+0.93%** | GO ✅ |
| **25** | Free Stats (`g_free_ss_enter`) | 1 | **+1.07%** | GO ✅ |
| **26** | Hot Path Diagnostics | 5 | **-0.33%** | NEUTRAL ✅ |
| **27** | Unified Cache Stats | 6 | **+0.74%** | GO ✅ |
| **28** | Background Spill Queue | 8 | N/A | NO-OP ✅ |
| **29** | Pool Hotbox v2 Stats | 12 | **0.00%** | NO-OP ✅ |
| **30** | Standard Procedure | 412 audit | N/A | PROCEDURE ✅ |
| **31** | Tiny Free Trace | 1 | **-0.35%** | NEUTRAL ✅ |
| **Total** | Atomics removed (Phases 24-27, 31) | **18** | **+2.74%** net cumulative | ✅ |

**Net cumulative gain:** +2.74% (simple sum of the GO Phases 24+25+27, excluding NEUTRAL 26+31)

**Note:** The NEUTRAL results of Phases 26 and 31 are treated as noise and do not reduce the cumulative gain (no regression).

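The +2.74% figure is the additive sum of the three GO results. As a cross-check, the sketch below also compounds the per-phase gains multiplicatively, which gives roughly +2.76%; at gains this small the two differ only in the second decimal place. The function and array names are illustrative, and the compounded figure assumes the three gains stack independently, which the document does not measure.

```c
#include <assert.h>
#include <stddef.h>

/* Per-phase gains in percent for the GO phases (24, 25, 27). */
static const double GO_PHASE_GAINS[3] = { 0.93, 1.07, 0.74 };

/* Additive sum of gains, in percent (the convention used in the table). */
static double sum_gains_pct(const double *g, size_t n) {
    assert(g != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) total += g[i];
    return total;
}

/* Compounded gain, in percent: product of (1 + g/100) minus 1. */
static double compounded_gain_pct(const double *g, size_t n) {
    assert(g != NULL);
    double factor = 1.0;
    for (size_t i = 0; i < n; i++) factor *= 1.0 + g[i] / 100.0;
    return (factor - 1.0) * 100.0;
}
```
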
---

## Conclusion

Phase 31 demonstrates that **not all HOT path TELEMETRY atomics have measurable overhead**. While Phase 25 (`g_free_ss_enter`) delivered +1.07%, Phase 31 (`g_tiny_free_trace`) showed NEUTRAL performance (-0.35% mean, +0.19% median). Following the Phase 26 precedent, **Phase 31 is ADOPTED** with `HAKMEM_TINY_FREE_TRACE_COMPILED=0` as the default for its **code cleanliness** benefits.

**Key takeaways:**
1. HOT path location does not guarantee performance wins
2. NEUTRAL + code cleanliness is a valid adoption criterion (Phase 26/31 pattern)
3. The standard 4-step procedure successfully prevented false positives (Step 0 execution check)
4. Phase 32 candidate ready: `g_hak_tiny_free_calls` (same HOT path, 9 lines below)

**Recommendation:** Proceed to Phase 32 (`g_hak_tiny_free_calls`) following the same 4-step procedure.