Files
hakmem/docs/analysis/PHASE30_STANDARD_PROCEDURE.md

621 lines
16 KiB
Markdown
Raw Normal View History

Phase 30-31: Standard procedure + g_tiny_free_trace atomic prune Phase 30: Standard Procedure Establishment - Created 4-step standardized methodology (Step 0-3) - Step 0: Execution Verification (NEW - Phase 29 lesson) - Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson) - Step 2: Compile-Out Implementation (Phase 24-27 pattern) - Step 3: A/B Test (build-level comparison) - Executed audit_atomics.sh: 412 atomics analyzed - Identified Phase 31 candidate: g_tiny_free_trace (HOT path, TOP PRIORITY) Phase 31: g_tiny_free_trace Compile-Out (HOT Path TELEMETRY) - Target: core/hakmem_tiny_free.inc:326 (trace-rate-limit atomic) - Added HAKMEM_TINY_FREE_TRACE_COMPILED (default: 0) - Classification: Pure TELEMETRY (trace output only, no flow control) - A/B Result: NEUTRAL (baseline -0.35% mean, +0.19% median) - Verdict: NEUTRAL → Adopted for code cleanliness (Phase 26 precedent) - Rationale: HOT path TELEMETRY removal improves code quality A/B Test Details: - Baseline (COMPILED=0): 53.638M ops/s mean, 53.799M median - Compiled-in (COMPILED=1): 53.828M ops/s mean, 53.697M median - Conflicting signals within ±0.5% noise margin - Phase 25 comparison: g_free_ss_enter (+1.07% GO) vs g_tiny_free_trace (NEUTRAL) - Hypothesis: Rate-limited atomic (128 calls) optimized by compiler Cumulative Progress (Phase 24-31): - Phase 24 (class stats): +0.93% GO - Phase 25 (free stats): +1.07% GO - Phase 26 (diagnostics): -0.33% NEUTRAL - Phase 27 (unified cache): +0.74% GO - Phase 28 (bg spill): NO-OP (all CORRECTNESS) - Phase 29 (pool v2): NO-OP (ENV-gated) - Phase 30 (procedure): PROCEDURE - Phase 31 (free trace): -0.35% NEUTRAL - Total: 18 atomics removed, +2.74% net improvement Documentation Created: - PHASE30_STANDARD_PROCEDURE.md: Complete 4-step methodology - ATOMIC_AUDIT_FULL.txt: 412 atomics comprehensive audit - PHASE31_CANDIDATES_HOT/WARM.txt: Priority-sorted candidates - PHASE31_RECOMMENDED_CANDIDATES.md: TOP 3 with Step 0 verification - PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md: Complete A/B results - ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated (Phase 30-31) - CURRENT_TASK.md: Phase 32 candidate identified (g_hak_tiny_free_calls) Key Lessons: - Lesson 7 (Phase 30): Step 0 execution verification prevents wasted effort - Lesson 8 (Phase 31): NEUTRAL + code cleanliness = valid adoption - HOT path ≠ guaranteed performance win (rate-limited atomics may be optimized) Next Phase: Phase 32 candidate (g_hak_tiny_free_calls) - Location: core/hakmem_tiny_free.inc:335 (9 lines below Phase 31 target) - Expected: +0.3~0.7% or NEUTRAL Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 07:31:15 +09:00
# Phase 30: Standard Procedure for Atomic Prune Operations
**Date:** 2025-12-16
**Status:** PROCEDURE STANDARDIZATION
**Purpose:** Codify learnings from Phase 24-29 to prevent no-op phases
---
## Executive Summary
Phase 24-29 taught us critical lessons about atomic pruning success factors:
- **GO phases** (+2.74% cumulative): HOT/WARM path telemetry atomic removal works
- **NO-OP phases** (Phase 28-29): Correctness atomics and ENV-gated code waste effort
This document standardizes a 4-step procedure to ensure future phases target high-impact, executable code.
---
## 1. Phase 24-29 Cumulative Lessons
### Phase 24-27: GO (+2.74% cumulative)
**Pattern: HOT/WARM path telemetry atomic removal**
- **Phase 24 (alloc stats)**: +0.93%
- Removed `atomic_fetch_add` in `malloc_tiny_fast()` hot path
- Stats compiled out with `HAKMEM_ALLOC_GATE_STATS_COMPILED=0`
- **Phase 25 (free stats)**: +1.07%
- Removed `atomic_fetch_add` in `free_tiny_fast_hotcold()` hot path
- Stats compiled out with `HAKMEM_FREE_PATH_STATS_COMPILED=0`
- **Phase 27 (unified cache)**: +0.74%
- Removed `atomic_fetch_add` in TLS cache hit path
- Stats compiled out with `HAKMEM_TINY_FRONT_STATS_COMPILED=0`
**Success Factors:**
- ✅ Executed in every allocation/free (HOT path)
- ✅ Pure telemetry (stats only, no control flow)
- ✅ Build-level compile-out (no runtime overhead)
### Phase 26: NEUTRAL (code cleanliness)
**Pattern: Low-frequency but still compile-out**
- Tiny header tracking stats (COLD path)
- No performance impact but maintains future maintainability
- Kept compile-out mechanism for consistency
**Lesson:** Even low-frequency telemetry benefits from compile-out for code cleanliness.
### Phase 28: NO-OP (CORRECTNESS atomics)
**Anti-pattern: Misidentified counter purpose**
- **Target:** `g_bg_spill_len` (looked like a counter)
- **Reality:** Flow control atomic (queue depth tracking)
- **Usage:**
```c
if (atomic_load(&g_bg_spill_len) < TARGET_SPILL_LEN) {
// Decision-making logic
}
```
**Critical Lesson:**
**Counter name ≠ Counter purpose**
**CORRECTNESS atomics (NEVER touch):**
- Used in `if/while` conditions
- Flow control (queue depth, threshold checks)
- Lock-free synchronization (CAS, load-store ordering)
- Affects program behavior if removed
### Phase 29: NO-OP (ENV-gated, not executed)
**Anti-pattern: Optimizing dead code**
- **Target:** Pool v2 stats atomics
- **Reality:** Gated by `getenv("HAKMEM_POOL_V2")` = OFF by default
- **Benchmark:** Never executes pool v2 code paths
- **Result:** Zero impact on measurements
**Critical Lesson:**
**Execution verification is MANDATORY before optimization**
---
## 2. Standard Procedure (4 Steps)
### Step 0: Execution Verification (MANDATORY GATE) ⚠️
**Purpose:** Prevent wasted effort on ENV-gated or low-frequency code (Phase 29 lesson)
#### Methods:
**A. ENV Gate Check**
```bash
# Check if feature is runtime-disabled
rg "getenv.*FEATURE_NAME" core/
rg "getenv.*POOL_V2" core/ # Example
```
**B. Execution Counter Verification**
1. **Find counter reference:**
```bash
rg -n "atomic.*g_target_counter" core/
```
2. **Check counter in benchmark output:**
```bash
# Run mixed benchmark 10 times
scripts/run_mixed_10_cleanenv.sh
# Check if counter > 0 in any run
grep "target_counter" results/*.txt
```
3. **Optional: Add debug printf (if counter not visible):**
```c
#if HAKMEM_DEBUG_PRINT
fprintf(stderr, "[DEBUG] counter=%lu\n",
atomic_load(&g_target_counter));
#endif
```
**C. perf/flamegraph Verification (optional but recommended)**
```bash
# Record with perf
perf record -g -F 99 -- ./bench_random_mixed_hakmem
# Check if function appears in profile
perf report | grep "target_function"
```
#### Decision Matrix:
| Condition | Action |
|-----------|--------|
| ✅ Counter > 0 in benchmark | Proceed to Step 1 |
| ✅ Function in perf profile | Proceed to Step 1 |
| ❌ ENV gated + OFF by default | **SKIP** (Phase 29 pattern) |
| ❌ Counter = 0 in all runs | **SKIP** (not executed) |
| ❌ Function not in flamegraph | **SKIP** (negligible frequency) |
**Output:** Document execution verification results in `PHASE[N]_AUDIT.md`
---
### Step 1: CORRECTNESS/TELEMETRY Classification (Phase 28 lesson)
**Purpose:** Distinguish between atomics that control behavior vs. atomics that just observe
#### Classification Rules:
**CORRECTNESS (NEVER touch):**
- ❌ Used in `if/while/for` conditions
- ❌ Flow control (queue depth, threshold, capacity checks)
- ❌ Lock-free synchronization (CAS, `atomic_compare_exchange_*`)
- ❌ Load-store ordering dependencies
- ❌ Affects program decisions/behavior
**Examples:**
```c
// CORRECTNESS: Controls loop behavior
while (atomic_load(&g_queue_len) < target) { ... }
// CORRECTNESS: Threshold check
if (atomic_load(&g_bg_spill_len) >= MAX_SPILL) { ... }
// CORRECTNESS: CAS synchronization
atomic_compare_exchange_weak(&g_state, &expected, desired)
```
**TELEMETRY (compile-out candidate):**
- ✅ Stats/logging/observation only
- ✅ Used exclusively in `printf/fprintf/sprintf`
- ✅ Deletion changes no program behavior
- ✅ Pure counters (hits, misses, totals)
**Examples:**
```c
// TELEMETRY: Stats only
atomic_fetch_add(&stats[idx].hits, 1, memory_order_relaxed);
// TELEMETRY: Logging only
fprintf(stderr, "allocs=%lu\n", atomic_load(&g_alloc_count));
```
#### Verification Process:
1. **List all atomics in target scope:**
```bash
rg -n "atomic_(fetch_add|load|store).*g_target" core/
```
2. **Track all usage sites:**
```bash
rg -n "g_target_atomic" core/
```
3. **Check each usage:**
- Is it in an `if` condition? → **CORRECTNESS**
- Is it only in `printf/fprintf`? → **TELEMETRY**
- Unsure? → **CORRECTNESS** (safe default)
4. **Document classification:**
```markdown
## Atomic Classification
### g_alloc_stats (TELEMETRY)
- core/box/alloc_gate_stats_box.h:15: atomic_fetch_add (stats only)
- core/hakmem.c:89: fprintf output only
- **Verdict:** TELEMETRY ✅
### g_bg_spill_len (CORRECTNESS)
- core/box/bgthread_box.h:42: if (atomic_load(...) < TARGET)
- **Verdict:** CORRECTNESS ❌ DO NOT TOUCH
```
**Output:** Classification table in `PHASE[N]_AUDIT.md`
---
### Step 2: Compile-Out Implementation (Phase 24-27 pattern)
**Purpose:** Build-level removal of telemetry atomics (not link-out)
#### A. Add Compile Gate to BuildFlags
**File:** `core/hakmem_build_flags.h`
```c
// ========== [Feature Name] Stats (Phase N) ==========
#ifndef HAKMEM_[NAME]_STATS_COMPILED
# define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
```
**Example:**
```c
// ========== Alloc Gate Stats (Phase 24) ==========
#ifndef HAKMEM_ALLOC_GATE_STATS_COMPILED
# define HAKMEM_ALLOC_GATE_STATS_COMPILED 0
#endif
```
#### B. Wrap TELEMETRY Atomics with #if
**Pattern:**
```c
#if HAKMEM_[NAME]_STATS_COMPILED
atomic_fetch_add_explicit(&g_[name]_stat, 1, memory_order_relaxed);
#else
(void)0; // No-op when compiled out
#endif
```
**Example:**
```c
#if HAKMEM_ALLOC_GATE_STATS_COMPILED
atomic_fetch_add_explicit(&g_alloc_gate_slow, 1, memory_order_relaxed);
#else
(void)0;
#endif
```
#### C. Keep Variable Definitions (important!)
**Do NOT remove:**
```c
// Keep atomic variable definition (for COMPILED=1 case)
static _Atomic uint64_t g_stat_counter = 0;
// Keep print functions (guarded by same flag)
#if HAKMEM_[NAME]_STATS_COMPILED
void print_stats(void) {
fprintf(stderr, "counter=%lu\n", atomic_load(&g_stat_counter));
}
#endif
```
#### D. Prohibited Actions (Phase 22-2 NO-GO lesson)
**NEVER:**
- ❌ Link-out (removing `.o` files from Makefile)
- ❌ Deleting API functions (breaks linkage)
- ❌ Removing struct definitions (breaks compilation)
- ❌ Runtime `if` checks (adds branch overhead)
**Rationale:** Build-level `#if` has zero runtime cost. Link-out risks ABI breaks.
---
### Step 3: A/B Test (build-level comparison)
**Purpose:** Measure impact of compile-out vs. compiled-in
#### A. Baseline Build (COMPILED=0, default)
```bash
# Clean build with stats compiled OUT
make clean
make -j bench_random_mixed_hakmem
# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh
# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
```
#### B. Compiled-In Build (COMPILED=1)
```bash
# Clean build with stats compiled IN
make clean
make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
# Run 10 iterations
scripts/run_mixed_10_cleanenv.sh
# Record results
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
```
#### C. Compare Results
```bash
# Calculate delta
scripts/compare_benchmark_results.sh \
docs/analysis/PHASE[N]_BASELINE.txt \
docs/analysis/PHASE[N]_COMPILED_IN.txt
```
#### D. Decision Matrix
| Delta | Verdict | Action |
|-------|---------|--------|
| **+0.5% or higher** | **GO** | Keep compile-out, document win |
| **±0.5%** | **NEUTRAL** | Keep for code cleanliness |
| **-0.5% or lower** | **NO-GO** | Revert changes |
**Rationale:**
- +0.5%: Statistically significant (HOT path impact)
- ±0.5%: Noise range (but cleanliness still valuable)
- -0.5%: Unexpected regression (likely measurement error, revert)
**Output:** `PHASE[N]_RESULTS.md` with full comparison
---
## 3. Phase Checklist Template
Copy this for each new phase:
```markdown
## Phase [N]: [Target Description] Atomic Prune
**Date:** YYYY-MM-DD
**Target:** [Atomic variable/scope name]
**Expected Impact:** [HOT/WARM/COLD path, estimated %]
---
### Step 0: Execution Verification ✅/❌
- [ ] **ENV Gate Check**
```bash
rg "getenv.*[FEATURE]" core/
```
Result: [No ENV gate / Gated by X=OFF / Gated by X=ON]
- [ ] **Execution Counter Verification**
```bash
rg -n "atomic.*g_target" core/
scripts/run_mixed_10_cleanenv.sh
grep "target_counter" results/*.txt
```
Result: [Counter > 0 in all runs / Counter = 0 / Not visible]
- [ ] **perf Profile Check (optional)**
```bash
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "target_function"
```
Result: [Function appears in profile / Not in profile]
**Verdict:** [✅ PROCEED / ❌ SKIP (reason)]
---
### Step 1: CORRECTNESS/TELEMETRY Classification
- [ ] **List All Atomics**
```bash
rg -n "atomic_(fetch_add|load|store).*g_" [target_file]
```
- [ ] **Track All Usage Sites**
```bash
rg -n "g_atomic_var" core/
```
- [ ] **Classify Each Atomic**
| Atomic Variable | Usage | Class | Verdict |
|-----------------|-------|-------|---------|
| `g_var1` | `if` condition | CORRECTNESS | ❌ DO NOT TOUCH |
| `g_var2` | `fprintf` only | TELEMETRY | ✅ Candidate |
- [ ] **Document Classification Rationale**
**Output:** Classification table saved to `PHASE[N]_AUDIT.md`
---
### Step 2: Compile-Out Implementation
- [ ] **Add BuildFlags Gate**
```c
// core/hakmem_build_flags.h
#ifndef HAKMEM_[NAME]_STATS_COMPILED
# define HAKMEM_[NAME]_STATS_COMPILED 0
#endif
```
- [ ] **Wrap TELEMETRY Atomics**
```c
#if HAKMEM_[NAME]_STATS_COMPILED
atomic_fetch_add_explicit(&g_stat, 1, memory_order_relaxed);
#else
(void)0;
#endif
```
- [ ] **Verify Compilation**
```bash
make clean && make -j # COMPILED=0 default
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1'
```
---
### Step 3: A/B Test
- [ ] **Baseline Build (COMPILED=0)**
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_BASELINE.txt
```
- [ ] **Compiled-In Build (COMPILED=1)**
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_[NAME]_STATS_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
cp results/mixed_10_summary.txt docs/analysis/PHASE[N]_COMPILED_IN.txt
```
- [ ] **Compare Results**
```bash
scripts/compare_benchmark_results.sh \
docs/analysis/PHASE[N]_BASELINE.txt \
docs/analysis/PHASE[N]_COMPILED_IN.txt
```
- [ ] **Record Verdict**
- Delta: [+X.XX%]
- Verdict: [GO / NEUTRAL / NO-GO]
- Rationale: [...]
**Output:** `PHASE[N]_RESULTS.md` with full comparison
---
### Deliverables
- [ ] `PHASE[N]_AUDIT.md` - Classification and execution verification
- [ ] `PHASE[N]_BASELINE.txt` - Baseline benchmark results
- [ ] `PHASE[N]_COMPILED_IN.txt` - Compiled-in benchmark results
- [ ] `PHASE[N]_RESULTS.md` - A/B comparison and verdict
- [ ] Update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` with Phase [N] results
- [ ] Update `CURRENT_TASK.md` with next phase
---
### Notes
[Add any phase-specific observations, gotchas, or learnings here]
```
---
## 4. Success Criteria
A phase is considered **GO** if:
1. ✅ Step 0: Execution verified (counter > 0 or perf profile hit)
2. ✅ Step 1: Pure TELEMETRY classification (no CORRECTNESS atomics)
3. ✅ Step 2: Clean compile-out implementation (no link-out)
4. ✅ Step 3: +0.5% or higher performance delta
A phase is **NO-OP** if:
- ❌ Step 0: Not executed in benchmark (Phase 29)
- ❌ Step 1: CORRECTNESS atomic (Phase 28)
- ❌ Step 3: Delta within ±0.5% noise range
---
## 5. Anti-Patterns to Avoid
### ❌ Skipping Execution Verification (Phase 29)
**Problem:** Optimizing ENV-gated code that never runs
**Solution:** Always run Step 0 before any work
### ❌ Assuming Counter = Telemetry (Phase 28)
**Problem:** Flow control atomics look like counters
**Solution:** Check all usage sites, especially `if` conditions
### ❌ Link-Out Instead of Compile-Out (Phase 22-2)
**Problem:** ABI breaks, mysterious link errors
**Solution:** Use `#if` preprocessor guards, never remove `.o` files
### ❌ Runtime Flags for Stats (not attempted, but common mistake)
**Problem:** `if (g_enable_stats)` adds branch overhead
**Solution:** Build-level `#if` has zero runtime cost
---
## 6. Expected Impact by Path Type
Based on Phase 24-29 results:
| Path Type | Expected Delta | Example Phases |
|-----------|----------------|----------------|
| **HOT** (alloc/free fast path) | **+0.5% to +1.5%** | Phase 24 (+0.93%), Phase 25 (+1.07%) |
| **WARM** (TLS cache hit) | **+0.2% to +0.8%** | Phase 27 (+0.74%) |
| **COLD** (slow path, rare events) | **±0.0% to +0.2%** | Phase 26 (NEUTRAL, cleanliness) |
| **ENV-gated OFF** | **0.0% (no-op)** | Phase 29 (pool v2) |
| **CORRECTNESS** | **Undefined (DO NOT TOUCH)** | Phase 28 (bg_spill_len) |
---
## 7. Tools and Scripts
### Execution Verification
```bash
# ENV gate check
rg "getenv.*FEATURE" core/
# Counter check (requires benchmark run)
scripts/run_mixed_10_cleanenv.sh
grep "counter_name" results/*.txt
# perf profile
perf record -g -F 99 -- ./bench_random_mixed_hakmem
perf report | grep "function_name"
```
### Classification Audit
```bash
# List all atomics in scope
rg -n "atomic_(fetch_add|load|store|compare_exchange)" [file]
# Track variable usage
rg -n "g_variable_name" core/
# Find if conditions
rg -n "if.*g_variable" core/
```
### A/B Testing
```bash
# Baseline
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
# Compiled-in
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_FEATURE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
# Compare (if script exists)
scripts/compare_benchmark_results.sh baseline.txt compiled_in.txt
```
---
## 8. Governance
**When to Use This Procedure:**
- Any new atomic prune phase (Phase 31+)
- Reviewing existing compile-out flags for consistency
- Training new contributors on atomic optimization
**When to Skip:**
- Non-atomic optimizations (inlining, data structure changes)
- Known CORRECTNESS atomics (Step 1 already failed)
- Features explicitly marked "do not optimize"
**Document Updates:**
- This procedure should be updated after each phase if new patterns emerge
- Phase results should update `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- New anti-patterns should be added to Section 5
---
## 9. References
- **Phase 24 Results:** `docs/analysis/PHASE24_ALLOC_GATE_STATS_RESULTS.md` (+0.93%)
- **Phase 25 Results:** `docs/analysis/PHASE25_FREE_PATH_STATS_RESULTS.md` (+1.07%)
- **Phase 27 Results:** `docs/analysis/PHASE27_TINY_FRONT_STATS_RESULTS.md` (+0.74%)
- **Phase 28 NO-OP:** `docs/analysis/PHASE28_BGTHREAD_ATOMIC_AUDIT.md` (CORRECTNESS)
- **Phase 29 NO-OP:** `docs/analysis/PHASE29_POOL_V2_AUDIT.md` (ENV-gated)
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
---
**End of Standard Procedure Document**
**Next:** Apply Step 0 to Phase 31 candidates to ensure execution before optimization.