Summary: - Phase 24 (alloc stats): +0.93% GO - Phase 25 (free stats): +1.07% GO - Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness) - Total: 11 atomics compiled-out, +2.00% improvement Phase 24: OBSERVE tax prune (tiny_class_stats_box.h) - Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0) - Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_* - Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s) Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h) - Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0) - Wrapped g_free_ss_enter atomic in free hot path - Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s) Phase 26: Hot path diagnostic atomics prune - Added 5 compile gates for low-frequency error counters: - HAKMEM_TINY_C7_FREE_COUNT_COMPILED - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED - HAKMEM_TINY_HDR_META_FAST_COMPILED - Result: -0.33% NEUTRAL (within noise, kept for cleanliness) Alignment with mimalloc principles: - "No atomics on hot path" - telemetry moved to compile-time opt-in - Fixed per-op tax elimination - Production builds: maximum performance (atomics compiled-out) - Research builds: full diagnostics (COMPILED=1) Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
244 lines
8.5 KiB
Markdown
244 lines
8.5 KiB
Markdown
# Phase 26: Hot Path Atomic Telemetry Prune - Audit & Plan
|
|
|
|
**Date:** 2025-12-16
|
|
**Purpose:** Identify and compile-out telemetry-only atomics in hot alloc/free paths
|
|
**Pattern:** Follow Phase 24 (tiny_class_stats) + Phase 25 (g_free_ss_enter)
|
|
**Expected Gain:** +2-3% cumulative improvement
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
**Goal:** Remove all telemetry-only `atomic_fetch_add/sub` from hot paths (alloc/free direct paths).
|
|
|
|
**Methodology:**
|
|
1. Audit all atomics in `core/` directory
|
|
2. Classify: **CORRECTNESS** (keep) vs **TELEMETRY** (compile-out)
|
|
3. Prioritize: **HOT** (direct alloc/free) > **WARM** (refill/spill) > **COLD** (init/shutdown)
|
|
4. Implement compile gates following Phase 24+25 pattern
|
|
5. A/B test each candidate independently
|
|
|
|
**Status:** Phase 25 complete (+1.07% GO). Starting Phase 26.
|
|
|
|
---
|
|
|
|
## Classification Criteria
|
|
|
|
### CORRECTNESS (Do NOT touch)
|
|
- Remote queue management: `remote_count`, `remote_head`, `remote_tail`
|
|
- Refcount/ownership: `refcount`, `owner`, `in_use`, `active`
|
|
- Lock/synchronization: `lock`, `mutex`, `head`, `tail` (queue atomics)
|
|
- Metadata: `meta->used`, `meta->active`, `meta->tls_cached`
|
|
|
|
### TELEMETRY (Candidate for compile-out)
|
|
- Stats counters: `*_stats`, `*_count`, `*_calls`
|
|
- Diagnostics: `*_trace`, `*_debug`, `*_diag`, `*_log`
|
|
- Observability: `*_enter`, `*_exit`, `*_hit`, `*_miss`, `*_attempt`, `*_success`
|
|
- Metrics: `g_metric_*`, `g_dbg_*`, `g_rel_*`
|
|
|
|
---
|
|
|
|
## Phase 26 Candidates: HOT PATH TELEMETRY ATOMICS
|
|
|
|
### Priority A: Direct Free Path (tiny_superslab_free.inc.h)
|
|
|
|
#### 1. `g_free_ss_enter` - **ALREADY DONE (Phase 25)**
|
|
- **Status:** GO (+1.07%)
|
|
- **Location:** `core/tiny_superslab_free.inc.h:22`
|
|
- **Gate:** `HAKMEM_TINY_FREE_STATS_COMPILED`
|
|
- **Verdict:** Keep compiled-out (default: 0)
|
|
|
|
#### 2. `c7_free_count` - **NEW CANDIDATE**
|
|
- **Location:** `core/tiny_superslab_free.inc.h:51`
|
|
- **Code:** `atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);`
|
|
- **Purpose:** Debug counter for C7 free path diagnostics
|
|
- **Path:** HOT (free superslab fast path)
|
|
- **Expected Gain:** +0.3-0.8%
|
|
- **Priority:** HIGH
|
|
- **Action:** Create Phase 26A
|
|
|
|
#### 3. `g_hdr_mismatch_log` - **NEW CANDIDATE**
|
|
- **Location:** `core/tiny_superslab_free.inc.h:147`
|
|
- **Code:** `atomic_fetch_add_explicit(&g_hdr_mismatch_log, 1, memory_order_relaxed);`
|
|
- **Purpose:** Log header validation mismatches (debug only)
|
|
- **Path:** HOT (free path validation)
|
|
- **Expected Gain:** +0.2-0.5%
|
|
- **Priority:** HIGH
|
|
- **Action:** Create Phase 26B
|
|
|
|
#### 4. `g_hdr_meta_mismatch` - **NEW CANDIDATE**
|
|
- **Location:** `core/tiny_superslab_free.inc.h:182`
|
|
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_mismatch, 1, memory_order_relaxed);`
|
|
- **Purpose:** Log metadata validation failures (debug only)
|
|
- **Path:** HOT (free path validation)
|
|
- **Expected Gain:** +0.2-0.5%
|
|
- **Priority:** HIGH
|
|
- **Action:** Create Phase 26C
|
|
|
|
---
|
|
|
|
### Priority B: Direct Alloc Path
|
|
|
|
#### 5. `g_metric_bad_class_once` - **NEW CANDIDATE**
|
|
- **Location:** `core/hakmem_tiny_alloc.inc:22`
|
|
- **Code:** `atomic_fetch_add_explicit(&g_metric_bad_class_once, 1, memory_order_relaxed)`
|
|
- **Purpose:** One-shot metric for bad class index (safety check)
|
|
- **Path:** HOT (alloc entry gate)
|
|
- **Expected Gain:** +0.1-0.3%
|
|
- **Priority:** MEDIUM
|
|
- **Action:** Create Phase 26D
|
|
|
|
#### 6. `g_hdr_meta_fast` - **NEW CANDIDATE**
|
|
- **Location:** `core/tiny_free_fast_v2.inc.h:181`
|
|
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_fast, 1, memory_order_relaxed);`
|
|
- **Purpose:** Fast-path header metadata hit counter (telemetry)
|
|
- **Path:** HOT (free_fast_v2 path)
|
|
- **Expected Gain:** +0.3-0.7%
|
|
- **Priority:** HIGH
|
|
- **Action:** Create Phase 26E
|
|
|
|
---
|
|
|
|
### Priority C: Warm Path (Refill/Spill)
|
|
|
|
#### 7. `g_bg_spill_len` - **BORDERLINE**
|
|
- **Location:** `core/hakmem_tiny_bg_spill.h:32,44`
|
|
- **Code:** `atomic_fetch_add_explicit(&g_bg_spill_len[class_idx], ...)`
|
|
- **Purpose:** Background spill queue length tracking
|
|
- **Path:** WARM (spill path)
|
|
- **Expected Gain:** +0.1-0.2%
|
|
- **Priority:** MEDIUM
|
|
- **Note:** May be CORRECTNESS if queue length is used for flow control
|
|
- **Action:** Review code, then decide (Phase 27+)
|
|
|
|
#### 8. Unified Cache Stats - **MULTIPLE ATOMICS**
|
|
- **Location:** `core/front/tiny_unified_cache.c` (multiple lines)
|
|
- **Variables:** `g_unified_cache_hits_global`, `g_unified_cache_misses_global`, etc.
|
|
- **Purpose:** Unified cache hit/miss telemetry
|
|
- **Path:** WARM (cache layer)
|
|
- **Expected Gain:** +0.2-0.4%
|
|
- **Priority:** MEDIUM
|
|
- **Action:** Group into single Phase 27+ candidate
|
|
|
|
---
|
|
|
|
## Phase 26 Implementation Plan
|
|
|
|
### Phase 26A: `c7_free_count` Atomic Prune
|
|
|
|
**Target:** `core/tiny_superslab_free.inc.h:51`
|
|
|
|
#### Step 1: Add Build Flag
|
|
```c
|
|
// core/hakmem_build_flags.h (after line 290)
|
|
|
|
// ------------------------------------------------------------
|
|
// Phase 26A: C7 Free Count Atomic Prune (Compile-out c7_free_count)
|
|
// ------------------------------------------------------------
|
|
// C7 Free Count: Compile gate (default OFF = compile-out)
|
|
// Set to 1 for research builds that need C7 free path diagnostics
|
|
// Target: c7_free_count atomic in core/tiny_superslab_free.inc.h:51
|
|
#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
|
|
# define HAKMEM_C7_FREE_COUNT_COMPILED 0
|
|
#endif
|
|
```
|
|
|
|
#### Step 2: Wrap Atomic with Compile Gate
|
|
```c
|
|
// core/tiny_superslab_free.inc.h:51
|
|
#if HAKMEM_C7_FREE_COUNT_COMPILED
|
|
extern _Atomic int c7_free_count;
|
|
int count = atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);
|
|
#else
|
|
int count = 0; // No-op when compiled out
|
|
(void)count; // Suppress unused warning
|
|
#endif
|
|
```
|
|
|
|
#### Step 3: A/B Test (Build-Level)
|
|
```bash
|
|
# Baseline (compiled-out, default)
|
|
make clean && make -j bench_random_mixed_hakmem
|
|
./bench_random_mixed_hakmem > baseline_26a.txt
|
|
|
|
# Compiled-in (for comparison)
|
|
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
|
|
./bench_random_mixed_hakmem > compiled_in_26a.txt
|
|
|
|
# Run full bench suite
|
|
./scripts/run_mixed_10_cleanenv.sh > bench_26a_baseline.txt
|
|
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
|
|
./scripts/run_mixed_10_cleanenv.sh > bench_26a_compiled.txt
|
|
```
|
|
|
|
#### Step 4: Verdict
|
|
- **GO:** +0.5% or more → keep compiled-out (default: 0)
|
|
- **NEUTRAL:** ±0.5% → document, keep compiled-out for cleanliness
|
|
- **NO-GO:** -0.5% or worse → revert change
|
|
|
|
---
|
|
|
|
### Phase 26B-E: Repeat Pattern
|
|
|
|
Follow same pattern for:
|
|
- **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:147)
|
|
- **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:182)
|
|
- **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:22)
|
|
- **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:181)
|
|
|
|
**Each Phase:**
|
|
1. Add `HAKMEM_[NAME]_COMPILED` flag to `hakmem_build_flags.h`
|
|
2. Wrap atomic with `#if HAKMEM_[NAME]_COMPILED`
|
|
3. Run A/B test (baseline vs compiled-in)
|
|
4. Measure improvement
|
|
5. Document verdict
|
|
|
|
---
|
|
|
|
## Expected Cumulative Impact
|
|
|
|
| Phase | Target Atomic | File | Expected Gain | Status |
|
|
|-------|---------------|------|---------------|--------|
|
|
| 24 | `g_tiny_class_stats_*` | tiny_class_stats_box.h | +0.93% | GO ✅ |
|
|
| 25 | `g_free_ss_enter` | tiny_superslab_free.inc.h:22 | +1.07% | GO ✅ |
|
|
| 26A | `c7_free_count` | tiny_superslab_free.inc.h:51 | +0.3-0.8% | TBD |
|
|
| 26B | `g_hdr_mismatch_log` | tiny_superslab_free.inc.h:147 | +0.2-0.5% | TBD |
|
|
| 26C | `g_hdr_meta_mismatch` | tiny_superslab_free.inc.h:182 | +0.2-0.5% | TBD |
|
|
| 26D | `g_metric_bad_class_once` | hakmem_tiny_alloc.inc:22 | +0.1-0.3% | TBD |
|
|
| 26E | `g_hdr_meta_fast` | tiny_free_fast_v2.inc.h:181 | +0.3-0.7% | TBD |
|
|
| **Total (24-26E)** | - | - | **+2.93-4.83%** | - |
|
|
|
|
**Conservative Estimate:** +3.0% cumulative improvement from hot-path atomic prune.
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
1. ✅ Audit complete (this document)
|
|
2. ⏳ Implement Phase 26A (`c7_free_count`)
|
|
3. ⏳ Run A/B test (baseline vs compiled-in)
|
|
4. ⏳ Document results in `PHASE26A_C7_FREE_COUNT_RESULTS.md`
|
|
5. ⏳ Repeat for 26B-E
|
|
6. ⏳ Create cumulative report
|
|
|
|
---
|
|
|
|
## References
|
|
|
|
- **Phase 24 Pattern:** `core/box/tiny_class_stats_box.h`
|
|
- **Phase 25 Pattern:** `core/tiny_superslab_free.inc.h:20-25`
|
|
- **Build Flags:** `core/hakmem_build_flags.h:274-290`
|
|
- **Mimalloc Principle:** No atomics/observe in hot path
|
|
|
|
---
|
|
|
|
## Notes
|
|
|
|
- **DO NOT** touch correctness atomics (`remote_count`, `refcount`, `meta->used`, etc.)
|
|
- **ALWAYS** A/B test each candidate independently (no batching)
|
|
- **ALWAYS** use build-level flags (compile-time, not runtime)
|
|
- **FOLLOW** Phase 24+25 pattern (`#if COMPILED` with default: 0)
|
|
- **DOCUMENT** all verdicts (GO/NEUTRAL/NO-GO)
|
|
|
|
**mimalloc Gap Analysis:** This work closes the "hot path atomic tax" gap identified in optimization roadmap.
|