# Phase 31: Recommended Atomic Prune Candidates

**Date:** 2025-12-16
**Status:** CANDIDATE SELECTION (Step 0 verification complete)
**Purpose:** Select next high-impact atomic prune target based on Phase 30 standard procedure

---

## Executive Summary

**Audit Results:**
- Total atomics found: 412
- TELEMETRY candidates: 104
- CORRECTNESS (do not touch): 24
- UNKNOWN (needs manual review): 284
- HOT path atomics: 16
- WARM path atomics: 10

**NEW Candidates (not yet compiled out):**
- **1 HOT path** TELEMETRY candidate
- **3 WARM path** TELEMETRY candidates

**Phase 24-29 completed candidates (already done):**
- 4 HOT path atomics already compiled out (Phase 24-27)

---

## Step 0 Verification Results

### Priority 1: HOT Path NEW Candidates

#### Candidate 1: `g_tiny_free_trace` (HOT path)

**Location:** `core/hakmem_tiny_free.inc:326`

**Code Context:**
```c
void hak_tiny_free(void* ptr) {
    static _Atomic int g_tiny_free_trace = 0;
    // Relaxed RMW runs on every call; only the first 128 calls emit trace output
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
    // Track total tiny free calls (diagnostics)
    // ... rest of function ...
}
```

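To make the HOT path cost concrete, here is a minimal, self-contained sketch of the same rate-limiting pattern. It is not the hakmem source: `demo_trace_counter`, `demo_trace_lines`, `demo_free_enter`, and `demo_run` are hypothetical names, and a plain counter stands in for `HAK_TRACE`. What it illustrates is that the relaxed `atomic_fetch_add_explicit` keeps executing on every call after the 128-line cap is reached, so the atomic is paid for long after trace output has stopped. The sketch is single-threaded; `demo_trace_lines` is deliberately non-atomic.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; not the real hakmem symbols. */
_Atomic int demo_trace_counter = 0;  /* plays the role of g_tiny_free_trace */
int demo_trace_lines = 0;            /* number of trace lines "emitted" */

/* Mimics the first statements of hak_tiny_free(): a relaxed read-modify-write
 * on every call, with output limited to the first 128 calls. */
void demo_free_enter(void) {
    if (atomic_fetch_add_explicit(&demo_trace_counter, 1, memory_order_relaxed) < 128) {
        demo_trace_lines++;  /* stand-in for HAK_TRACE("[hak_tiny_free_enter]\n") */
    }
}

/* Calls demo_free_enter() n times and returns the total number of atomic
 * increments performed so far. */
int demo_run(int n) {
    for (int i = 0; i < n; i++) {
        demo_free_enter();
    }
    return atomic_load_explicit(&demo_trace_counter, memory_order_relaxed);
}
```
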
**Classification:**
- **Class:** TELEMETRY (trace logging only)
- **Path:** HOT (executed on every tiny free call)
- **Usage:** Only for `HAK_TRACE` debug macro output
- **ENV Gate:** None (always active in HOT path)

**Step 0 Verification:**
- ✅ No ENV gate blocking execution
- ✅ In `hak_tiny_free()` - called on every tiny free operation
- ✅ Mixed benchmark heavily exercises tiny free path
- ✅ Confirmed: Executes thousands of times per benchmark run

**Step 1 Pre-Classification:**
- Pure TELEMETRY: the counter is used only to rate-limit the trace macro (logging)
- The only `if` it feeds gates trace output; it does not influence allocator control flow
- Compiling it out changes no allocator behavior (its sole effect is capping trace output at the first 128 calls)

**Expected Impact:** **+0.5% to +1.0%** (HOT path, similar to Phase 25 free stats: +1.07%)

**Recommendation:** **TOP PRIORITY for Phase 31**

---

### Priority 2: WARM Path NEW Candidates

#### Candidate 2A: `rel_logs` (WARM path)

**Location:**
- `core/hakmem_tiny_refill.inc.h:106`
- `core/box/warm_pool_prefill_box.h:35`

**Code Context:**
```c
static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) {
    if (!tls || !tls->ss) return;
    if (!warm_prefill_log_enabled()) return;  // ENV gate check
#if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr, "[REL_C7_USED_ASSIGN] tag=%s used=%u ...\n", tag, ...);
    }
#else
    // Debug version (different logging)
#endif
}
```

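The early return on `warm_prefill_log_enabled()` is what makes this candidate a no-op in the benchmark: with the gate OFF, execution never reaches the atomic. The sketch below shows that effect in isolation. It is an illustration under assumptions: the document only states that the gate checks `HAKMEM_TINY_WARM_LOG`, so the parsing rule in `demo_gate_from_env` (unset, empty, or leading `0` means OFF) is a guess, and every `demo_*` name is hypothetical. The gate is written as a pure function of the variable's value so the behavior can be checked without touching the process environment.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

_Atomic uint32_t demo_rel_logs = 0;  /* plays the role of rel_logs */

/* Hypothetical gate: maps the value of HAKMEM_TINY_WARM_LOG (NULL if unset)
 * to enabled/disabled. The real parsing rule is not given in this document. */
int demo_gate_from_env(const char* value) {
    return (value != NULL && value[0] != '\0' && value[0] != '0') ? 1 : 0;
}

/* Mimics the shape of warm_prefill_log_c7_meta(): gate check first, atomic after. */
void demo_warm_log(int gate_enabled) {
    if (!gate_enabled) return;  /* gate OFF: the atomic below is never reached */
    (void)atomic_fetch_add_explicit(&demo_rel_logs, 1, memory_order_relaxed);
}

/* Runs `calls` invocations and returns how many atomic increments have
 * happened in total. */
uint32_t demo_count_after(int calls, int gate_enabled) {
    for (int i = 0; i < calls; i++) {
        demo_warm_log(gate_enabled);
    }
    return atomic_load_explicit(&demo_rel_logs, memory_order_relaxed);
}
```
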
**Classification:**
- **Class:** TELEMETRY (fprintf logging only)
- **Path:** WARM (refill operations)
- **Usage:** Only for limiting log output to first 4 calls
- **ENV Gate:** `HAKMEM_TINY_WARM_LOG` (OFF by default)

**Step 0 Verification:**
- ⚠️ ENV gated by `warm_prefill_log_enabled()` → checks `HAKMEM_TINY_WARM_LOG`
- ❌ ENV default: OFF (not set in benchmark environment)
- ❌ Execution in benchmark: **LIKELY ZERO** (gated by ENV check)

**Expected Impact:** **0.0% (NO-OP)** - ENV gated like Phase 29 pool v2

**Recommendation:** **SKIP** (Phase 29 lesson: ENV-gated code = no-op)

---

#### Candidate 2B: `dbg_logs` (WARM path)

**Location:**
- `core/hakmem_tiny_refill.inc.h:118`
- `core/box/warm_pool_prefill_box.h:53`

**Code Context:**
```c
static inline void warm_prefill_dbg_c7_meta(const char* tag, TinyTLSSlab* tls) {
    if (!tls || !tls->ss) return;
    if (!warm_prefill_log_enabled()) return;  // ENV gate check
#if HAKMEM_BUILD_RELEASE
    // rel_logs version
#else
    static _Atomic uint32_t dbg_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr, "[DBG_C7_USED_ASSIGN] tag=%s used=%u ...\n", tag, ...);
    }
#endif
}
```

**Classification:**
- **Class:** TELEMETRY (fprintf logging only)
- **Path:** WARM (refill operations)
- **Usage:** Only for limiting log output to first 4 calls
- **ENV Gate:** `HAKMEM_TINY_WARM_LOG` (OFF by default)
- **Build Gate:** `#else` branch of `#if HAKMEM_BUILD_RELEASE` - `dbg_logs` exists only in debug builds

**Step 0 Verification:**
- ⚠️ ENV gated by `warm_prefill_log_enabled()` → checks `HAKMEM_TINY_WARM_LOG`
- ❌ ENV default: OFF (not set in benchmark environment)
- ⚠️ Build gated: Only in debug builds (opposite branch from `rel_logs`)
- ❌ Execution in benchmark: **LIKELY ZERO** (ENV gate + wrong build branch)

**Expected Impact:** **0.0% (NO-OP)** - ENV gated + debug build only

**Recommendation:** **SKIP** (same ENV gate issue as `rel_logs`)

---

#### Candidate 2C: `g_p0_class_oob_log` (WARM path)

**Location:** `core/hakmem_tiny_refill_p0.inc.h:41`

**Code Context:**
```c
static inline int sll_refill_batch_from_ss(int class_idx, int max_take) {
    HAK_CHECK_CLASS_IDX(class_idx, "sll_refill_batch_from_ss");
    if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
        static _Atomic int g_p0_class_oob_log = 0;
        if (atomic_fetch_add_explicit(&g_p0_class_oob_log, 1, memory_order_relaxed) == 0) {
            fprintf(stderr, "[P0_CLASS_OOB] class_idx=%d max_take=%d\n", class_idx, max_take);
        }
        return 0;
    }
    // ... normal path ...
}
```

**Classification:**
- **Class:** TELEMETRY (error logging only)
- **Path:** WARM (P0 batch refill)
- **Usage:** Only for `fprintf` on first error occurrence
- **ENV Gate:** None

**Step 0 Verification:**
- ✅ No ENV gate blocking execution
- ⚠️ In error path: `if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES)`
- ⚠️ Error condition should be rare (out-of-bounds class index)
- ❓ Execution frequency: **Unknown** (depends on whether benchmark triggers OOB)

**Expected Impact:** **±0.0% to +0.2%** (error path, likely infrequent)

**Recommendation:** **LOW PRIORITY** (error path, uncertain execution frequency)

**Action Required:** Verify whether the error path is ever hit:
```bash
# Search the captured benchmark output for the one-shot OOB log line
# (the message is written to stderr, so stderr must be captured in this file)
grep -n "P0_CLASS_OOB" benchmark_output.txt
# OR add a temporary counter to check whether class_idx is ever out of bounds
```

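The temporary-counter option could look roughly like the sketch below. It is a standalone illustration, not a patch against `sll_refill_batch_from_ss()`: `DEMO_NUM_CLASSES` stands in for `TINY_NUM_CLASSES` with an arbitrary value, and `demo_refill`, `demo_oob_count`, `demo_report`, and the `[P0_OOB_PROBE]` tag are all hypothetical. The idea is to count every entry into the out-of-bounds branch rather than rely on the one-shot log line, then print the totals once at the end of the run (for example from a function registered with `atexit`).

```c
#include <stdatomic.h>
#include <stdio.h>

#define DEMO_NUM_CLASSES 8  /* hypothetical value; stands in for TINY_NUM_CLASSES */

atomic_ulong demo_calls = 0;     /* temporary diagnostic: total calls */
atomic_ulong demo_oob_hits = 0;  /* temporary diagnostic: entries into the error path */

/* Mimics the bounds check at the top of the refill function.
 * Returns 0 on the error path and 1 on the (placeholder) normal path. */
int demo_refill(int class_idx) {
    atomic_fetch_add_explicit(&demo_calls, 1, memory_order_relaxed);
    if (class_idx < 0 || class_idx >= DEMO_NUM_CLASSES) {
        atomic_fetch_add_explicit(&demo_oob_hits, 1, memory_order_relaxed);
        return 0;
    }
    return 1;
}

/* Number of times the error path has been taken so far. */
unsigned long demo_oob_count(void) {
    return atomic_load_explicit(&demo_oob_hits, memory_order_relaxed);
}

/* Prints the totals once; intended to be called at the end of a run. */
void demo_report(void) {
    fprintf(stderr, "[P0_OOB_PROBE] calls=%lu oob=%lu\n",
            atomic_load_explicit(&demo_calls, memory_order_relaxed),
            demo_oob_count());
}
```
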
---

## Phase 31 Recommendation: TOP 3 Candidates

### Tier S: Immediate Action (HIGH Impact Expected)

**#1: `g_tiny_free_trace` (HOT path, TELEMETRY)**
- **Location:** `core/hakmem_tiny_free.inc:326`
- **Path:** HOT (every tiny free call)
- **Expected Impact:** **+0.5% to +1.0%**
- **Execution Verified:** ✅ YES (no ENV gate, core free path)
- **Classification:** Pure TELEMETRY (trace macro only)
- **Precedent:** Similar to Phase 25 free stats (+1.07%)
- **Action:** Proceed to Phase 31 implementation

**Rationale:**
- Only NEW HOT path candidate remaining
- No ENV gate blocking execution
- Similar profile to successful Phase 25 (free path stats)
- High confidence of GO result

---

### Tier B: Consider Later (Uncertain Execution)

**#2: `g_p0_class_oob_log` (WARM path, error logging)**
- **Location:** `core/hakmem_tiny_refill_p0.inc.h:41`
- **Path:** WARM (but error path)
- **Expected Impact:** **±0.0% to +0.2%**
- **Execution Verified:** ❓ UNCERTAIN (error path, needs verification)
- **Classification:** TELEMETRY (fprintf only)
- **Action:** Verify execution first, then consider for Phase 32

---

### Tier C: Skip (ENV-gated, no execution)

**#3: `rel_logs` + `dbg_logs` (WARM path, ENV-gated)**
- **Location:** `core/box/warm_pool_prefill_box.h`, `core/hakmem_tiny_refill.inc.h`
- **Path:** WARM (refill operations)
- **Expected Impact:** **0.0% (NO-OP)**
- **Execution Verified:** ❌ NO (ENV gate OFF by default)
- **Classification:** TELEMETRY (fprintf only)
- **Action:** SKIP (Phase 29 lesson: ENV-gated = wasted effort)

---

## Phase 31 Implementation Plan

### Recommended Target: `g_tiny_free_trace`

**Step 1: CORRECTNESS/TELEMETRY Classification**

Already verified:
- ✅ Pure TELEMETRY (only used to rate-limit the `HAK_TRACE` macro)
- ✅ The only `if` it feeds gates trace output, not allocator control flow
- ✅ Removing it changes no allocator behavior

**Step 2: Compile-Out Implementation**

a) Add BuildFlags gate:
```c
// core/hakmem_build_flags.h
// ========== Tiny Free Trace Atomic Prune (Phase 31) ==========
#ifndef HAKMEM_TINY_FREE_TRACE_COMPILED
#  define HAKMEM_TINY_FREE_TRACE_COMPILED 0
#endif
```

b) Wrap atomic in `core/hakmem_tiny_free.inc`:
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
    static _Atomic int g_tiny_free_trace = 0;
    if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
        HAK_TRACE("[hak_tiny_free_enter]\n");
    }
#else
    (void)0;  // No-op when compiled out
#endif
    // ... rest of function ...
}
```

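For readers who want to try the gate outside the hakmem tree, the following is a compilable miniature of the same compile-out pattern. The names are hypothetical (`DEMO_TRACE_COMPILED` mirrors `HAKMEM_TINY_FREE_TRACE_COMPILED`, and `demo_hot_function` stands in for `hak_tiny_free`). It is meant to show that the function's real work is identical in both builds and that only the atomic disappears when the flag is 0. Building once with the default and once with `-DDEMO_TRACE_COMPILED=1` should give the same observable result from `demo_calls`. An empty `#else` branch is also valid C, so the `(void)0;` placeholder is optional.

```c
#include <stdatomic.h>

#ifndef DEMO_TRACE_COMPILED
#  define DEMO_TRACE_COMPILED 0  /* default: trace atomic compiled out */
#endif

int demo_work_done = 0;  /* stands in for the function's real work */

/* Returns 1 if the trace atomic exists in this build, 0 if it was compiled out. */
int demo_trace_is_compiled(void) {
    return DEMO_TRACE_COMPILED;
}

/* Hypothetical hot function: the trace block is present only when the flag is 1. */
void demo_hot_function(void) {
#if DEMO_TRACE_COMPILED
    static _Atomic int demo_trace = 0;
    if (atomic_fetch_add_explicit(&demo_trace, 1, memory_order_relaxed) < 128) {
        /* trace output would go here */
    }
#endif
    demo_work_done++;  /* unchanged by the flag */
}

/* Calls the hot function n times and returns the accumulated work count. */
int demo_calls(int n) {
    for (int i = 0; i < n; i++) {
        demo_hot_function();
    }
    return demo_work_done;
}
```
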
**Step 3: A/B Test**

Baseline (COMPILED=0, atomic compiled out; this is the default):
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```

Compiled-in (COMPILED=1, atomic present):
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_TRACE_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```

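Comparing the two builds comes down to the relative change in mean throughput. The helper below sketches that arithmetic, assuming each run of the script yields a set of throughput samples where higher is better; the sample format, units, and the `demo_*` names are assumptions, and no GO/NO-GO threshold is encoded because this document does not state one. A positive result means the pruned build (COMPILED=0) is faster than the compiled-in build (COMPILED=1).

```c
#include <stddef.h>

/* Arithmetic mean of n samples; the caller must pass n > 0. */
double demo_mean(const double* samples, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return sum / (double)n;
}

/* Relative throughput change of the pruned build versus the compiled-in
 * build, in percent. Positive means the pruned build is faster. */
double demo_delta_pct(double compiled_in_mean, double pruned_mean) {
    return (pruned_mean - compiled_in_mean) / compiled_in_mean * 100.0;
}
```
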
**Expected Result:** COMPILED=0 (pruned) is +0.5% to +1.0% faster than COMPILED=1 (GO)

---

## Alternative: Broader Atomic Audit

If `g_tiny_free_trace` yields NO-GO, consider:

1. **Manual review of UNKNOWN atomics (284 candidates)**
   - Many may be misclassified by naming heuristics
   - Potential hidden TELEMETRY candidates
   - Requires deeper code inspection

2. **Expand to COLD path TELEMETRY**
   - 386 COLD path atomics total
   - Lower impact but code cleanliness benefit
   - Example: Background thread stats, rare error paths

3. **Focus on non-atomic optimizations**
   - Phase 30 procedure is for atomics only
   - Branch optimization, inlining, etc. require a different approach

---

## Summary Table

| Candidate | Path | Class | ENV Gate | Exec Verified | Expected Impact | Priority |
|-----------|------|-------|----------|---------------|-----------------|----------|
| `g_tiny_free_trace` | HOT | TELEMETRY | None | ✅ YES | **+0.5% to +1.0%** | **#1 (TOP)** |
| `g_p0_class_oob_log` | WARM | TELEMETRY | None | ❓ UNCERTAIN | ±0.0% to +0.2% | #2 (verify first) |
| `rel_logs` | WARM | TELEMETRY | ❌ `HAKMEM_TINY_WARM_LOG` (default OFF) | ❌ NO | 0.0% (NO-OP) | SKIP |
| `dbg_logs` | WARM | TELEMETRY | ❌ `HAKMEM_TINY_WARM_LOG` (default OFF) | ❌ NO | 0.0% (NO-OP) | SKIP |

---

## Lessons Applied from Phase 30 Standard Procedure

✅ **Step 0 Execution Verification:**
- Checked all candidates for ENV gates
- Identified 2 ENV-gated candidates (`rel_logs`, `dbg_logs`) → SKIP
- Verified HOT path candidate has no execution blockers

✅ **Phase 28 Lesson (CORRECTNESS check):**
- Verified `g_tiny_free_trace` feeds only the `if` that gates trace output, not allocator control flow
- Confirmed pure TELEMETRY usage (trace macro only)

✅ **Phase 29 Lesson (ENV gate):**
- Eliminated `rel_logs` and `dbg_logs` due to ENV gate
- Avoided wasting effort on non-executing code

✅ **Phase 24-27 Pattern (HOT path impact):**
- Selected HOT path candidate for maximum impact
- Expected similar gains to Phase 25 free stats

---

## Next Steps

1. **Proceed with Phase 31: `g_tiny_free_trace` atomic prune**
   - Follow Phase 30 standard procedure (4 steps)
   - Expected result: GO (+0.5% to +1.0%)

2. **If Phase 31 yields GO:**
   - Update cumulative summary (+3.24% to +3.74% total)
   - Move to Phase 32: Verify `g_p0_class_oob_log` execution

3. **If Phase 31 yields NO-GO:**
   - Investigate why (measurement noise? unusual workload?)
   - Consider manual audit of UNKNOWN atomics (284 candidates)
   - Shift focus to non-atomic optimizations

---

**Recommendation:** **Proceed with Phase 31 targeting `g_tiny_free_trace`**

**Confidence Level:** High (HOT path, no blockers, proven pattern)