# Phase 26: Hot Path Atomic Telemetry Prune - Audit & Plan

**Date:** 2025-12-16
**Purpose:** Identify and compile-out telemetry-only atomics in hot alloc/free paths
**Pattern:** Follow Phase 24 (tiny_class_stats) + Phase 25 (g_free_ss_enter)
**Expected Gain:** +2-3% cumulative improvement

---

## Executive Summary

**Goal:** Remove all telemetry-only `atomic_fetch_add/sub` from hot paths (alloc/free direct paths).

**Methodology:**
1. Audit all atomics in `core/` directory
2. Classify: **CORRECTNESS** (keep) vs **TELEMETRY** (compile-out)
3. Prioritize: **HOT** (direct alloc/free) > **WARM** (refill/spill) > **COLD** (init/shutdown)
4. Implement compile gates following Phase 24+25 pattern
5. A/B test each candidate independently

**Status:** Phase 25 complete (+1.07% GO). Starting Phase 26.

---

## Classification Criteria

### CORRECTNESS (Do NOT touch)
- Remote queue management: `remote_count`, `remote_head`, `remote_tail`
- Refcount/ownership: `refcount`, `owner`, `in_use`, `active`
- Lock/synchronization: `lock`, `mutex`, `head`, `tail` (queue atomics)
- Metadata: `meta->used`, `meta->active`, `meta->tls_cached`

### TELEMETRY (Candidate for compile-out)
- Stats counters: `*_stats`, `*_count`, `*_calls`
- Diagnostics: `*_trace`, `*_debug`, `*_diag`, `*_log`
- Observability: `*_enter`, `*_exit`, `*_hit`, `*_miss`, `*_attempt`, `*_success`
- Metrics: `g_metric_*`, `g_dbg_*`, `g_rel_*`

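The two classes above can be told apart by one question: does any code branch on the atomic's value? The sketch below is illustrative only and is not taken from the hakmem sources. `example_obj_t`, `EXAMPLE_STATS_COMPILED`, `EXAMPLE_STAT_INC`, and `g_example_release_calls` are hypothetical names chosen to mirror the `HAKMEM_*_COMPILED` convention. The refcount decrement decides object lifetime and must stay; the call counter is never read by the allocator and can be compiled out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical gate, mirroring the HAKMEM_*_COMPILED convention (default OFF). */
#ifndef EXAMPLE_STATS_COMPILED
#  define EXAMPLE_STATS_COMPILED 0
#endif

/* CORRECTNESS: the fetch_sub result decides whether the object is destroyed. */
typedef struct {
    _Atomic uint32_t refcount;
} example_obj_t;

/* Returns 1 when the caller dropped the last reference and must destroy obj. */
static inline int example_obj_release(example_obj_t *obj) {
    uint32_t prev = atomic_fetch_sub_explicit(&obj->refcount, 1,
                                              memory_order_acq_rel);
    return prev == 1;
}

/* TELEMETRY: nothing branches on this counter, so it can be compiled out. */
#if EXAMPLE_STATS_COMPILED
static _Atomic uint64_t g_example_release_calls;
#  define EXAMPLE_STAT_INC(ctr) \
    ((void)atomic_fetch_add_explicit(&(ctr), 1, memory_order_relaxed))
#else
#  define EXAMPLE_STAT_INC(ctr) ((void)0)  /* argument is discarded unexpanded */
#endif

static inline int example_release_counted(example_obj_t *obj) {
    EXAMPLE_STAT_INC(g_example_release_calls);  /* telemetry: removable */
    return example_obj_release(obj);            /* correctness: must stay */
}
```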
---

## Phase 26 Candidates: HOT PATH TELEMETRY ATOMICS

### Priority A: Direct Free Path (tiny_superslab_free.inc.h)

#### 1. `g_free_ss_enter` - **ALREADY DONE (Phase 25)**
- **Status:** GO (+1.07%)
- **Location:** `core/tiny_superslab_free.inc.h:22`
- **Gate:** `HAKMEM_TINY_FREE_STATS_COMPILED`
- **Verdict:** Keep compiled-out (default: 0)

#### 2. `c7_free_count` - **NEW CANDIDATE**
- **Location:** `core/tiny_superslab_free.inc.h:51`
- **Code:** `atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);`
- **Purpose:** Debug counter for C7 free path diagnostics
- **Path:** HOT (free superslab fast path)
- **Expected Gain:** +0.3-0.8%
- **Priority:** HIGH
- **Action:** Create Phase 26A

#### 3. `g_hdr_mismatch_log` - **NEW CANDIDATE**
- **Location:** `core/tiny_superslab_free.inc.h:147`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_mismatch_log, 1, memory_order_relaxed);`
- **Purpose:** Log header validation mismatches (debug only)
- **Path:** HOT (free path validation)
- **Expected Gain:** +0.2-0.5%
- **Priority:** HIGH
- **Action:** Create Phase 26B

#### 4. `g_hdr_meta_mismatch` - **NEW CANDIDATE**
- **Location:** `core/tiny_superslab_free.inc.h:182`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_mismatch, 1, memory_order_relaxed);`
- **Purpose:** Log metadata validation failures (debug only)
- **Path:** HOT (free path validation)
- **Expected Gain:** +0.2-0.5%
- **Priority:** HIGH
- **Action:** Create Phase 26C

---

### Priority B: Direct Alloc Path and Free Fast Path

#### 5. `g_metric_bad_class_once` - **NEW CANDIDATE**
- **Location:** `core/hakmem_tiny_alloc.inc:22`
- **Code:** `atomic_fetch_add_explicit(&g_metric_bad_class_once, 1, memory_order_relaxed)`
- **Purpose:** One-shot metric for bad class index (safety check)
- **Path:** HOT (alloc entry gate)
- **Expected Gain:** +0.1-0.3%
- **Priority:** MEDIUM
- **Action:** Create Phase 26D

#### 6. `g_hdr_meta_fast` - **NEW CANDIDATE**
- **Location:** `core/tiny_free_fast_v2.inc.h:181`
- **Code:** `atomic_fetch_add_explicit(&g_hdr_meta_fast, 1, memory_order_relaxed);`
- **Purpose:** Fast-path header metadata hit counter (telemetry)
- **Path:** HOT (free_fast_v2 path)
- **Expected Gain:** +0.3-0.7%
- **Priority:** HIGH
- **Action:** Create Phase 26E

---

### Priority C: Warm Path (Refill/Spill)

#### 7. `g_bg_spill_len` - **BORDERLINE**
- **Location:** `core/hakmem_tiny_bg_spill.h:32,44`
- **Code:** `atomic_fetch_add_explicit(&g_bg_spill_len[class_idx], ...)`
- **Purpose:** Background spill queue length tracking
- **Path:** WARM (spill path)
- **Expected Gain:** +0.1-0.2%
- **Priority:** MEDIUM
- **Note:** May be CORRECTNESS if queue length is used for flow control
- **Action:** Review code, then decide (Phase 27+)

#### 8. Unified Cache Stats - **MULTIPLE ATOMICS**
- **Location:** `core/front/tiny_unified_cache.c` (multiple lines)
- **Variables:** `g_unified_cache_hits_global`, `g_unified_cache_misses_global`, etc.
- **Purpose:** Unified cache hit/miss telemetry
- **Path:** WARM (cache layer)
- **Expected Gain:** +0.2-0.4%
- **Priority:** MEDIUM
- **Action:** Group into single Phase 27+ candidate

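The borderline note on `g_bg_spill_len` turns on whether anything reads the length to make a decision. The sketch below shows what that case would look like; it is an assumption for illustration, not the actual code in `core/hakmem_tiny_bg_spill.h`. `g_example_spill_len`, `EXAMPLE_SPILL_MAX`, and both functions are hypothetical. If the real counter is consumed this way, compiling it out would remove the cap and change behavior, so it would belong in the CORRECTNESS class.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EXAMPLE_SPILL_MAX 4u  /* hypothetical queue cap */

static _Atomic uint32_t g_example_spill_len;

/* Returns 1 if the block may be queued, 0 if the caller must free directly.
 * The load and the add are separate steps, so this is a soft cap under
 * contention; the point is only that control flow depends on the length. */
static int example_spill_try_enqueue(void) {
    uint32_t len = atomic_load_explicit(&g_example_spill_len,
                                        memory_order_relaxed);
    if (len >= EXAMPLE_SPILL_MAX) {
        return 0;  /* flow control: the counter's value changes behavior */
    }
    atomic_fetch_add_explicit(&g_example_spill_len, 1, memory_order_relaxed);
    return 1;
}

/* Called by the (hypothetical) background drainer after removing one block. */
static void example_spill_drain_one(void) {
    atomic_fetch_sub_explicit(&g_example_spill_len, 1, memory_order_relaxed);
}
```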
---

## Phase 26 Implementation Plan

### Phase 26A: `c7_free_count` Atomic Prune

**Target:** `core/tiny_superslab_free.inc.h:51`

#### Step 1: Add Build Flag
```c
// core/hakmem_build_flags.h (after line 290)

// ------------------------------------------------------------
// Phase 26A: C7 Free Count Atomic Prune (Compile-out c7_free_count)
// ------------------------------------------------------------
// C7 Free Count: Compile gate (default OFF = compile-out)
// Set to 1 for research builds that need C7 free path diagnostics
// Target: c7_free_count atomic in core/tiny_superslab_free.inc.h:51
#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
#  define HAKMEM_C7_FREE_COUNT_COMPILED 0
#endif
```

#### Step 2: Wrap Atomic with Compile Gate
```c
// core/tiny_superslab_free.inc.h:51
#if HAKMEM_C7_FREE_COUNT_COMPILED
    extern _Atomic int c7_free_count;
    int count = atomic_fetch_add_explicit(&c7_free_count, 1, memory_order_relaxed);
#else
    int count = 0;  // No-op when compiled out
    (void)count;    // Suppress unused warning
#endif
```

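One hazard with the Step 2 pattern: when the gate is off, `count` is always 0. If later code uses `count` to rate-limit a diagnostic (for example `if (count < N)`), that check would then pass on every free. The plan does not show how `count` is consumed, so the sketch below assumes such a rate-limited print exists and keeps the consumer inside the same gate. `example_c7_free_hook`, `g_diag_lines_emitted`, the threshold 8, and the log format are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#ifndef HAKMEM_C7_FREE_COUNT_COMPILED
#  define HAKMEM_C7_FREE_COUNT_COMPILED 0
#endif

#if HAKMEM_C7_FREE_COUNT_COMPILED
static _Atomic int c7_free_count;  /* the plan's Step 2 declares this extern */
#endif

/* Hypothetical: counts emitted diagnostics so the example is checkable.
 * Plain int is enough here because the example is single-threaded. */
static int g_diag_lines_emitted;

/* Hypothetical stand-in for the C7 branch of the free fast path. */
static inline void example_c7_free_hook(void *ptr) {
#if HAKMEM_C7_FREE_COUNT_COMPILED
    int count = atomic_fetch_add_explicit(&c7_free_count, 1,
                                          memory_order_relaxed);
    if (count < 8) {  /* rate limit: only the first 8 frees are logged */
        fprintf(stderr, "[C7_FREE] ptr=%p n=%d\n", ptr, count);
        g_diag_lines_emitted++;
    }
#else
    (void)ptr;  /* counter and its consumer are both compiled out */
#endif
}
```

With the default gate (0), the hook compiles to nothing and no diagnostic can fire; building with `-DHAKMEM_C7_FREE_COUNT_COMPILED=1` restores both the counter and the rate-limited print together.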
#### Step 3: A/B Test (Build-Level)
```bash
# Baseline (compiled-out, default)
make clean && make -j bench_random_mixed_hakmem
./bench_random_mixed_hakmem > baseline_26a.txt

# Compiled-in (for comparison)
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
./bench_random_mixed_hakmem > compiled_in_26a.txt

# Run full bench suite
# Rebuild the baseline first: the last build above is the compiled-in binary
make clean && make -j bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > bench_26a_baseline.txt
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_C7_FREE_COUNT_COMPILED=1' bench_random_mixed_hakmem
./scripts/run_mixed_10_cleanenv.sh > bench_26a_compiled.txt
```

#### Step 4: Verdict
Gain is the throughput of the compiled-out build relative to the compiled-in build.
- **GO:** +0.5% or more → keep compiled-out (default: 0)
- **NEUTRAL:** within ±0.5% → document, keep compiled-out for cleanliness
- **NO-GO:** -0.5% or worse → revert change

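The verdict thresholds can be written down as a small classifier so that every phase applies the same rule. This is a minimal sketch, assuming the gain is computed from two throughput figures (compiled-out over compiled-in) and that the exact ±0.5% boundaries count as GO and NO-GO respectively; `verdict_t`, `gain_percent`, and `classify_verdict` are hypothetical names, not part of the benchmark scripts.

```c
#include <assert.h>

typedef enum {
    VERDICT_NO_GO   = -1,
    VERDICT_NEUTRAL = 0,
    VERDICT_GO      = 1
} verdict_t;

/* Relative gain, in percent, of the compiled-out build over the
 * compiled-in build. Inputs are throughputs (e.g. ops/s), higher is better. */
static double gain_percent(double compiled_out_ops, double compiled_in_ops) {
    return (compiled_out_ops - compiled_in_ops) / compiled_in_ops * 100.0;
}

/* Applies the Step 4 thresholds: >= +0.5% GO, <= -0.5% NO-GO, else NEUTRAL. */
static verdict_t classify_verdict(double gain_pct) {
    if (gain_pct >= 0.5) {
        return VERDICT_GO;
    }
    if (gain_pct <= -0.5) {
        return VERDICT_NO_GO;
    }
    return VERDICT_NEUTRAL;
}
```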
---

### Phase 26B-E: Repeat Pattern

Follow same pattern for:
- **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:147)
- **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:182)
- **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:22)
- **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:181)

**Each Phase:**
1. Add `HAKMEM_[NAME]_COMPILED` flag to `hakmem_build_flags.h`
2. Wrap atomic with `#if HAKMEM_[NAME]_COMPILED`
3. Run A/B test (baseline vs compiled-in)
4. Measure improvement
5. Document verdict

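Applied to 26B-E, step 1 of the list above might produce flag definitions like the following. The four macro names are assumptions derived from the `HAKMEM_[NAME]_COMPILED` template; the plan does not fix them, and the real names may differ. Each flag defaults to 0 so the atomic is compiled out unless a research build opts in with `-D`.

```c
#include <assert.h>

/* Hypothetical flag names for Phases 26B-E, following the
 * HAKMEM_[NAME]_COMPILED template. Default OFF = compile-out. */

/* 26B: g_hdr_mismatch_log (tiny_superslab_free.inc.h:147) */
#ifndef HAKMEM_HDR_MISMATCH_LOG_COMPILED
#  define HAKMEM_HDR_MISMATCH_LOG_COMPILED 0
#endif

/* 26C: g_hdr_meta_mismatch (tiny_superslab_free.inc.h:182) */
#ifndef HAKMEM_HDR_META_MISMATCH_COMPILED
#  define HAKMEM_HDR_META_MISMATCH_COMPILED 0
#endif

/* 26D: g_metric_bad_class_once (hakmem_tiny_alloc.inc:22) */
#ifndef HAKMEM_METRIC_BAD_CLASS_COMPILED
#  define HAKMEM_METRIC_BAD_CLASS_COMPILED 0
#endif

/* 26E: g_hdr_meta_fast (tiny_free_fast_v2.inc.h:181) */
#ifndef HAKMEM_HDR_META_FAST_COMPILED
#  define HAKMEM_HDR_META_FAST_COMPILED 0
#endif
```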
---

## Expected Cumulative Impact

| Phase | Target Atomic | File | Expected Gain | Status |
|-------|---------------|------|---------------|--------|
| 24 | `g_tiny_class_stats_*` | tiny_class_stats_box.h | +0.93% | GO ✅ |
| 25 | `g_free_ss_enter` | tiny_superslab_free.inc.h:22 | +1.07% | GO ✅ |
| 26A | `c7_free_count` | tiny_superslab_free.inc.h:51 | +0.3-0.8% | TBD |
| 26B | `g_hdr_mismatch_log` | tiny_superslab_free.inc.h:147 | +0.2-0.5% | TBD |
| 26C | `g_hdr_meta_mismatch` | tiny_superslab_free.inc.h:182 | +0.2-0.5% | TBD |
| 26D | `g_metric_bad_class_once` | hakmem_tiny_alloc.inc:22 | +0.1-0.3% | TBD |
| 26E | `g_hdr_meta_fast` | tiny_free_fast_v2.inc.h:181 | +0.3-0.7% | TBD |
| **Total (24-26E)** | - | - | **+3.10-4.80%** | - |

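The Total row can be checked by summing the per-phase figures from the table. The sketch below does this with simple addition, which is an assumption: independent speedups compound multiplicatively, but at these magnitudes the difference is small. `gain_range_t`, `k_gains`, and `sum_gains` are hypothetical names used only for this check.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *phase;
    double lo_pct;  /* low end of expected gain, percent */
    double hi_pct;  /* high end of expected gain, percent */
} gain_range_t;

/* Figures copied from the table; measured phases have lo == hi. */
static const gain_range_t k_gains[] = {
    { "24",  0.93, 0.93 },
    { "25",  1.07, 1.07 },
    { "26A", 0.30, 0.80 },
    { "26B", 0.20, 0.50 },
    { "26C", 0.20, 0.50 },
    { "26D", 0.10, 0.30 },
    { "26E", 0.30, 0.70 },
};

/* Adds up the low and high ends across all phases. */
static void sum_gains(double *lo_total, double *hi_total) {
    *lo_total = 0.0;
    *hi_total = 0.0;
    for (size_t i = 0; i < sizeof k_gains / sizeof k_gains[0]; i++) {
        *lo_total += k_gains[i].lo_pct;
        *hi_total += k_gains[i].hi_pct;
    }
}
```

Phases 24 and 25 contribute a measured 2.00%, and the five Phase 26 candidates add between 1.10% and 2.80%, so the additive range is 3.10% to 4.80%.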

**Conservative Estimate:** +3.0% cumulative improvement from hot-path atomic prune.

---

## Next Steps

1. ✅ Audit complete (this document)
2. ⏳ Implement Phase 26A (`c7_free_count`)
3. ⏳ Run A/B test (baseline vs compiled-in)
4. ⏳ Document results in `PHASE26A_C7_FREE_COUNT_RESULTS.md`
5. ⏳ Repeat for 26B-E
6. ⏳ Create cumulative report

---

## References

- **Phase 24 Pattern:** `core/box/tiny_class_stats_box.h`
- **Phase 25 Pattern:** `core/tiny_superslab_free.inc.h:20-25`
- **Build Flags:** `core/hakmem_build_flags.h:274-290`
- **mimalloc Principle:** No atomics or observability code in the hot path

---

## Notes

- **DO NOT** touch correctness atomics (`remote_count`, `refcount`, `meta->used`, etc.)
- **ALWAYS** A/B test each candidate independently (no batching)
- **ALWAYS** use build-level flags (compile-time, not runtime)
- **FOLLOW** Phase 24+25 pattern (`#if COMPILED` with default: 0)
- **DOCUMENT** all verdicts (GO/NEUTRAL/NO-GO)

**mimalloc Gap Analysis:** This work closes the "hot path atomic tax" gap identified in the optimization roadmap.