# Phase 26: Hot Path Atomic Telemetry Prune - Audit & Plan
**Date:** 2025-12-16
**Purpose:** Identify and compile-out telemetry-only atomics in hot alloc/free paths
**Pattern:** Follow Phase 24 (tiny_class_stats) + Phase 25 (g_free_ss_enter)
**Expected Gain:** +2-3% cumulative improvement
---
## Executive Summary
**Goal:** Remove all telemetry-only `atomic_fetch_add/sub` from hot paths (alloc/free direct paths).
**Methodology:**
1. Audit all atomics in `core/` directory
2. Classify: **CORRECTNESS** (keep) vs **TELEMETRY** (compile-out)
3. Prioritize: **HOT** (direct alloc/free) > **WARM** (refill/spill) > **COLD** (init/shutdown)
4. Implement compile gates following Phase 24+25 pattern
5. A/B test each candidate independently
**Status:** Phase 25 complete (+1.07% GO). Starting Phase 26.
---
## Classification Criteria
### CORRECTNESS (Do NOT touch)
- Remote queue management: `remote_count`, `remote_head`, `remote_tail`
- Refcount/ownership: `refcount`, `owner`, `in_use`, `active`
- Lock/synchronization: `lock`, `mutex`, `head`, `tail` (queue atomics)
- Metadata: `meta->used`, `meta->active`, `meta->tls_cached`
### TELEMETRY (Candidate for compile-out)
- Stats counters: `*_stats`, `*_count`, `*_calls` (name patterns are only a first filter: `remote_count` matches `*_count` but is CORRECTNESS)
- Diagnostics: `*_trace`, `*_debug`, `*_diag`, `*_log`
- Observability: `*_enter`, `*_exit`, `*_hit`, `*_miss`, `*_attempt`, `*_success`
- Metrics: `g_metric_*`, `g_dbg_*`, `g_rel_*`
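
One way to apply the two categories is to ask whether any branch reads the value an atomic returns. The following is a minimal sketch of that test, not code from hakmem: `demo_obj`, `demo_release`, `g_demo_release_calls`, and `DEMO_STATS_COMPILED` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate, mirroring the "default 0 = compiled out" convention. */
#ifndef DEMO_STATS_COMPILED
#  define DEMO_STATS_COMPILED 0
#endif

typedef struct {
    atomic_int refcount;  /* CORRECTNESS: decides when the object may be freed */
} demo_obj;

#if DEMO_STATS_COMPILED
static atomic_ulong g_demo_release_calls;  /* TELEMETRY: only ever reported */
#endif

/* Returns true when the caller dropped the last reference. */
static bool demo_release(demo_obj *o) {
    assert(o != NULL);
#if DEMO_STATS_COMPILED
    /* Result is discarded, so removing this line cannot change behavior. */
    atomic_fetch_add_explicit(&g_demo_release_calls, 1, memory_order_relaxed);
#endif
    /* Result feeds a branch, so this atomic must stay in every build. */
    int prev = atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_acq_rel);
    return prev == 1;
}
```

In this sketch the stats increment can be gated away without touching `demo_release`'s return value, while the refcount decrement cannot.
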
---
## Phase 26 Candidates: HOT PATH TELEMETRY ATOMICS
### Priority A: Direct Free Path (tiny_superslab_free.inc.h)
#### 1. `g_free_ss_enter` - **ALREADY DONE (Phase 25)**
- **Status:** GO (+1.07%)
- **Location:** `core/tiny_superslab_free.inc.h:22`
- **Gate:** `HAKMEM_TINY_FREE_STATS_COMPILED`
- **Verdict:** Keep compiled-out (default: 0)
#### 2. `c7_free_count` - **NEW CANDIDATE**
- **Location:** `core/tiny_superslab_free.inc.h:51`
- **Code:** `atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);`
- **Purpose:** Debug counter for C7 free path diagnostics
- **Path:** HOT (free superslab fast path)
- **Expected Gain:** +0.3-0.8%
- **Priority:** HIGH
- **Action:** Create Phase 26A
#### 3. `g_hdr_mismatch_log` - **NEW CANDIDATE**
- **Location:** `core/tiny_superslab_free.inc.h:147`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_mismatch_log, 1, memory_order_relaxed);`
- **Purpose:** Log header validation mismatches (debug only)
- **Path:** HOT (free path validation)
- **Expected Gain:** +0.2-0.5%
- **Priority:** HIGH
- **Action:** Create Phase 26B
#### 4. `g_hdr_meta_mismatch` - **NEW CANDIDATE**
- **Location:** `core/tiny_superslab_free.inc.h:182`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_mismatch, 1, memory_order_relaxed);`
- **Purpose:** Log metadata validation failures (debug only)
- **Path:** HOT (free path validation)
- **Expected Gain:** +0.2-0.5%
- **Priority:** HIGH
- **Action:** Create Phase 26C
---
### Priority B: Direct Alloc Path and Free Fast Path (v2)
#### 5. `g_metric_bad_class_once` - **NEW CANDIDATE**
- **Location:** `core/hakmem_tiny_alloc.inc:22`
- **Code:** `atomic_fetch_add_explicit(&g_metric_bad_class_once, 1, memory_order_relaxed)`
- **Purpose:** One-shot metric for bad class index (safety check)
- **Path:** HOT (alloc entry gate)
- **Expected Gain:** +0.1-0.3%
- **Priority:** MEDIUM
- **Action:** Create Phase 26D
#### 6. `g_hdr_meta_fast` - **NEW CANDIDATE**
- **Location:** `core/tiny_free_fast_v2.inc.h:181`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_fast, 1, memory_order_relaxed);`
- **Purpose:** Fast-path header metadata hit counter (telemetry)
- **Path:** HOT (free_fast_v2 path)
- **Expected Gain:** +0.3-0.7%
- **Priority:** HIGH
- **Action:** Create Phase 26E
---
### Priority C: Warm Path (Refill/Spill)
#### 7. `g_bg_spill_len` - **BORDERLINE**
- **Location:** `core/hakmem_tiny_bg_spill.h:32,44`
- **Code:** `atomic_fetch_add_explicit(&g_bg_spill_len[class_idx], ...)`
- **Purpose:** Background spill queue length tracking
- **Path:** WARM (spill path)
- **Expected Gain:** +0.1-0.2%
- **Priority:** MEDIUM
- **Note:** May be CORRECTNESS if queue length is used for flow control
- **Action:** Review code, then decide (Phase 27+)
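
The borderline case can be pictured as follows. This is a sketch under assumptions, not the contents of `hakmem_tiny_bg_spill.h`: `demo_spill_len`, `demo_spill_push`, and `DEMO_SPILL_LIMIT` are hypothetical. If the reviewed code contains a branch like the one below, the length counter is load-bearing and belongs under CORRECTNESS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_SPILL_LIMIT 4u  /* hypothetical back-pressure threshold */

static atomic_uint demo_spill_len;  /* stands in for g_bg_spill_len[class_idx] */

/* Returns false when the queue is "full" and the caller must take another path.
 * Because this branch reads the length, compiling the atomic out would
 * silently disable the back-pressure. */
static bool demo_spill_push(unsigned n) {
    assert(n > 0);
    unsigned prev = atomic_fetch_add_explicit(&demo_spill_len, n,
                                              memory_order_relaxed);
    if (prev + n > DEMO_SPILL_LIMIT) {
        /* Undo the reservation and reject the push. */
        atomic_fetch_sub_explicit(&demo_spill_len, n, memory_order_relaxed);
        return false;
    }
    return true;
}
```

If, instead, the length is only ever printed or dumped, it can be treated like the other telemetry counters.
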
#### 8. Unified Cache Stats - **MULTIPLE ATOMICS**
- **Location:** `core/front/tiny_unified_cache.c` (multiple lines)
- **Variables:** `g_unified_cache_hits_global`, `g_unified_cache_misses_global`, etc.
- **Purpose:** Unified cache hit/miss telemetry
- **Path:** WARM (cache layer)
- **Expected Gain:** +0.2-0.4%
- **Priority:** MEDIUM
- **Action:** Group into single Phase 27+ candidate
---
## Phase 26 Implementation Plan
### Phase 26A: `c7_free_count` Atomic Prune
**Target:** `core/tiny_superslab_free.inc.h:51`
#### Step 1: Add Build Flag
```c
// core/hakmem_build_flags.h (after line 290)
// ------------------------------------------------------------
// Phase 26A: C7 Free Count Atomic Prune (Compile-out c7_free_count)
// ------------------------------------------------------------
// C7 Free Count: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need C7 free path diagnostics
// Target: c7_free_count atomic in core/tiny_superslab_free.inc.h:51
#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
# define HAKMEM_C7_FREE_COUNT_COMPILED 0
#endif
```
#### Step 2: Wrap Atomic with Compile Gate
```c
// core/tiny_superslab_free.inc.h:51
#if HAKMEM_C7_FREE_COUNT_COMPILED
extern _Atomic int c7_free_count;
int count = atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);
#else
int count = 0; // No-op when compiled out
(void)count; // Suppress unused warning
#endif
```
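
An alternative shape for the same gate keeps the `#if` out of the call site by hiding it behind a macro. This is only a sketch: `DEMO_TELEMETRY_FETCH_INC` and `demo_c7_free_count_site` are hypothetical names, and the counter is defined locally here (rather than declared `extern`) so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdatomic.h>

#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
#  define HAKMEM_C7_FREE_COUNT_COMPILED 0
#endif

/* Catch typos such as -DHAKMEM_C7_FREE_COUNT_COMPILED=2 at build time. */
static_assert(HAKMEM_C7_FREE_COUNT_COMPILED == 0 ||
              HAKMEM_C7_FREE_COUNT_COMPILED == 1,
              "gate must be 0 or 1");

#if HAKMEM_C7_FREE_COUNT_COMPILED
/* Defined locally so the sketch links on its own; the real counter is extern. */
static _Atomic int c7_free_count;
/* Hypothetical helper: bump the counter and yield its previous value. */
#  define DEMO_TELEMETRY_FETCH_INC(ctr) \
      atomic_fetch_add_explicit(&(ctr), 1, memory_order_relaxed)
#else
/* Compiled out: a constant 0; the counter name is never referenced. */
#  define DEMO_TELEMETRY_FETCH_INC(ctr) (0)
#endif

/* Stand-in for the call site: returns what `count` would hold there. */
static int demo_c7_free_count_site(void) {
    int count = DEMO_TELEMETRY_FETCH_INC(c7_free_count);
    return count;
}
```

One caution applies to either shape: if code after the gate branches on `count` (for example a "log the first N events" guard), the constant 0 pins that branch to one side, so such a guard probably belongs inside the same `#if`.
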
#### Step 3: A/B Test (Build-Level)
```bash
# Baseline (compiled-out, default)
make clean && make -j bench_random_mixed_hakmem
./bench_random_mixed_hakmem > baseline_26a.txt
# Compiled-in (for comparison)
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
./bench_random_mixed_hakmem > compiled_in_26a.txt
# Run full bench suite (rebuild the baseline first: the last build above is compiled-in)
make clean && make -j bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > bench_26a_baseline.txt
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > bench_26a_compiled.txt
```
#### Step 4: Verdict
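- **GO:** compiled-out build is 0.5% or more faster than compiled-in → keep compiled-out (default: 0)
- **NEUTRAL:** difference is within ±0.5% → document, keep compiled-out for cleanliness
- **NO-GO:** compiled-out build is 0.5% or more slower than compiled-in → revert change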
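
The verdict rule can be written down as a small helper. This is a sketch under assumptions: it takes throughput figures where higher is better (for example ops/s averaged over the bench runs), measures the gain of the compiled-out build relative to the compiled-in build, and uses hypothetical names (`demo_gain_pct`, `demo_classify`).

```c
#include <assert.h>

typedef enum { VERDICT_GO, VERDICT_NEUTRAL, VERDICT_NO_GO } demo_verdict;

/* Gain of the compiled-out build over the compiled-in build, in percent. */
static double demo_gain_pct(double compiled_out_ops, double compiled_in_ops) {
    assert(compiled_in_ops > 0.0);
    return (compiled_out_ops / compiled_in_ops - 1.0) * 100.0;
}

/* Boundaries are assigned to GO / NO-GO; everything strictly between
 * -0.5% and +0.5% is NEUTRAL. */
static demo_verdict demo_classify(double gain_pct) {
    if (gain_pct >= 0.5)  return VERDICT_GO;
    if (gain_pct <= -0.5) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

For instance, `demo_classify(demo_gain_pct(101.07, 100.0))` yields `VERDICT_GO`, which corresponds to a +1.07% result like the one recorded for Phase 25.
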
---
### Phase 26B-E: Repeat Pattern
Follow same pattern for:
- **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:147)
- **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:182)
- **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:22)
- **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:181)
**Each Phase:**
1. Add `HAKMEM_[NAME]_COMPILED` flag to `hakmem_build_flags.h`
2. Wrap atomic with `#if HAKMEM_[NAME]_COMPILED`
3. Run A/B test (baseline vs compiled-in)
4. Measure improvement
5. Document verdict
---
## Expected Cumulative Impact
| Phase | Target Atomic | File | Expected Gain | Status |
|-------|---------------|------|---------------|--------|
| 24 | `g_tiny_class_stats_*` | tiny_class_stats_box.h | +0.93% | GO ✅ |
| 25 | `g_free_ss_enter` | tiny_superslab_free.inc.h:22 | +1.07% | GO ✅ |
| 26A | `c7_free_count` | tiny_superslab_free.inc.h:51 | +0.3-0.8% | TBD |
| 26B | `g_hdr_mismatch_log` | tiny_superslab_free.inc.h:147 | +0.2-0.5% | TBD |
| 26C | `g_hdr_meta_mismatch` | tiny_superslab_free.inc.h:182 | +0.2-0.5% | TBD |
| 26D | `g_metric_bad_class_once` | hakmem_tiny_alloc.inc:22 | +0.1-0.3% | TBD |
| 26E | `g_hdr_meta_fast` | tiny_free_fast_v2.inc.h:181 | +0.3-0.7% | TBD |
| **Total (24-26E)** | - | - | **+3.10-4.80%** | - |

**Conservative Estimate:** +3.0% cumulative improvement from hot-path atomic prune.
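
The Total row adds per-phase percentages, which is an approximation: gains measured against successive baselines compound rather than add. The sketch below recomputes both from the per-phase figures in the table; the helper names (`demo_sum_pct`, `demo_compound_pct`) and arrays are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of per-phase gains, in percent (the additive approximation). */
static double demo_sum_pct(const double *g, size_t n) {
    assert(g != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += g[i];
    return s;
}

/* Compounded gain in percent: prod(1 + g_i / 100) - 1. */
static double demo_compound_pct(const double *g, size_t n) {
    assert(g != NULL);
    double f = 1.0;
    for (size_t i = 0; i < n; i++) f *= 1.0 + g[i] / 100.0;
    return (f - 1.0) * 100.0;
}

/* Per-phase figures from the table: 24, 25, then 26A through 26E. */
static const double demo_low[]  = {0.93, 1.07, 0.3, 0.2, 0.2, 0.1, 0.3};
static const double demo_high[] = {0.93, 1.07, 0.8, 0.5, 0.5, 0.3, 0.7};
```

With gains this small the two methods differ by only a few hundredths of a percentage point, so the additive total is adequate for planning.
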
---
## Next Steps
1. ✅ Audit complete (this document)
2. ⏳ Implement Phase 26A (`c7_free_count`)
3. ⏳ Run A/B test (baseline vs compiled-in)
4. ⏳ Document results in `PHASE26A_C7_FREE_COUNT_RESULTS.md`
5. ⏳ Repeat for 26B-E
6. ⏳ Create cumulative report
---
## References
- **Phase 24 Pattern:** `core/box/tiny_class_stats_box.h`
- **Phase 25 Pattern:** `core/tiny_superslab_free.inc.h:20-25`
- **Build Flags:** `core/hakmem_build_flags.h:274-290`
- **mimalloc Principle:** No atomics or observability counters in the hot path
---
## Notes
- **DO NOT** touch correctness atomics (`remote_count`, `refcount`, `meta->used`, etc.)
- **ALWAYS** A/B test each candidate independently (no batching)
- **ALWAYS** use build-level flags (compile-time, not runtime)
- **FOLLOW** Phase 24+25 pattern (`#if COMPILED` with default: 0)
- **DOCUMENT** all verdicts (GO/NEUTRAL/NO-GO)

**mimalloc Gap Analysis:** This work closes the "hot path atomic tax" gap identified in the optimization roadmap.