hakmem/docs/analysis/PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md

# Phase 28: Background Spill Queue Atomic Prune Results

**Date:** 2025-12-16
**Status:** ✅ **COMPLETE (NO-OP)**
**Verdict:** **All CORRECTNESS - No compile-out candidates**

---

## Executive Summary

Phase 28 audited every atomic operation in the background spill queue subsystem (`core/hakmem_tiny_bg_spill.*`). **Result: All 8 atomics are CORRECTNESS-critical.** No telemetry atomics were found, so nothing was compiled out.

**Key Finding:** The `g_bg_spill_len` counter, which superficially resembles telemetry counters from previous phases, is actually used for **flow control** (queue depth limiting) and is therefore untouchable.

---

## Audit Results

### Total Atomics: 8

| Atomic Operation | Location | Classification | Reason |
|-----------------|----------|----------------|--------|
| `atomic_load(&g_bg_spill_head)` | `hakmem_tiny_bg_spill.h:27` | CORRECTNESS | Lock-free queue CAS loop |
| `atomic_compare_exchange_weak(&g_bg_spill_head)` | `hakmem_tiny_bg_spill.h:29` | CORRECTNESS | Lock-free queue CAS |
| `atomic_fetch_add(&g_bg_spill_len, 1)` | `hakmem_tiny_bg_spill.h:32` | CORRECTNESS | Queue length (flow control) |
| `atomic_load(&g_bg_spill_head)` | `hakmem_tiny_bg_spill.h:39` | CORRECTNESS | Lock-free queue CAS loop |
| `atomic_compare_exchange_weak(&g_bg_spill_head)` | `hakmem_tiny_bg_spill.h:41` | CORRECTNESS | Lock-free queue CAS |
| `atomic_fetch_add(&g_bg_spill_len, count)` | `hakmem_tiny_bg_spill.h:44` | CORRECTNESS | Queue length (flow control) |
| `atomic_load(&g_bg_spill_len)` | `hakmem_tiny_bg_spill.c:30` | CORRECTNESS | Early-exit optimization |
| `atomic_fetch_sub(&g_bg_spill_len)` | `hakmem_tiny_bg_spill.c:91` | CORRECTNESS | Queue length decrement |

**CORRECTNESS:** 8/8 (100%)
**TELEMETRY:** 0/8 (0%)

---

## Critical Finding: `g_bg_spill_len` is NOT Telemetry

### The Trap

At first glance, `g_bg_spill_len` looks like a telemetry counter:
- Named with `_len` suffix (like stats counters)
- Incremented on push, decremented on drain
- Uses `atomic_fetch_add/sub` (same pattern as telemetry)

### The Reality

**`g_bg_spill_len` is used for flow control in the hot free path:**

```c
// core/tiny_free_magazine.inc.h:75-77
if (g_bg_spill_enable) {
    uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed);
    if ((int)qlen < g_bg_spill_target) {  // <-- FLOW CONTROL DECISION
        // Build a small chain: include current ptr and pop from mag up to limit
        // ...
        bg_spill_push_chain(class_idx, head, tail, taken);
        return;
    }
}
```

**What this means:**
- If queue length < target: queue work to background thread
- If queue length >= target: take alternate path (direct free)
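- **Removing this atomic would change program behavior:** a counter that is never updated stays at its initial value, so the `qlen < target` check no longer limits queue depth and the queue can grow without bound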
- **This is an operational counter, not a debug counter**
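
The effect of compiling the counter out can be reproduced in isolation. The following is a minimal sketch, not the project's code: `spill_len`, `SPILL_TARGET`, `try_queue`, `count_queued`, and the `SPILL_LEN_COMPILED` switch are hypothetical stand-ins chosen for illustration, and the queue itself is omitted so that only the depth check remains.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for g_bg_spill_len[class_idx] and g_bg_spill_target. */
static _Atomic uint32_t spill_len = 0;
static const int SPILL_TARGET = 4;

/* Hypothetical switch: build with -DSPILL_LEN_COMPILED=0 to simulate
 * treating the counter update as telemetry and compiling it out. */
#ifndef SPILL_LEN_COMPILED
#define SPILL_LEN_COMPILED 1
#endif

/* Returns true if the item would be handed to the background queue,
 * false if the caller must take the direct-free path.
 * The load and the increment are separate operations, so under
 * concurrency this is a soft limit rather than an exact one. */
static bool try_queue(void) {
    uint32_t qlen = atomic_load_explicit(&spill_len, memory_order_relaxed);
    if ((int)qlen >= SPILL_TARGET) {
        return false;  /* queue is deep enough: stop feeding it */
    }
#if SPILL_LEN_COMPILED
    atomic_fetch_add_explicit(&spill_len, 1u, memory_order_relaxed);
#endif
    return true;
}

/* Counts how many of n consecutive frees get queued when nothing drains. */
static int count_queued(int n) {
    assert(n >= 0);
    int queued = 0;
    for (int i = 0; i < n; i++) {
        if (try_queue()) queued++;
    }
    return queued;
}
```

With the default build, `count_queued(10)` queues 4 items and rejects the rest. Built with `-DSPILL_LEN_COMPILED=0`, every call to `try_queue` succeeds, which is the unbounded-growth case.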

### Comparison with Telemetry Counters

| Counter | Phase | Purpose | Flow Control? | Classification |
|---------|-------|---------|---------------|----------------|
| `g_tiny_class_stats_*` | 24 | Observe cache hits | NO | TELEMETRY |
| `g_free_ss_enter` | 25 | Count free calls | NO | TELEMETRY |
| `g_unified_cache_*` | 27 | Measure cache perf | NO | TELEMETRY |
| **`g_bg_spill_len`** | **28** | **Queue depth limit** | **YES** | **CORRECTNESS** |

**Key Distinction:** Telemetry counters are **observational**: nothing reads them to make a decision, so they can be compiled out without changing behavior. Operational counters are **functional**: program behavior depends on their value.
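
For contrast, this is roughly what a purely observational counter looks like when it sits behind a compile-time gate. It is a sketch under assumptions: `STATS_COMPILED`, `STAT_INC`, `g_cache_hits`, `CACHE_SLOTS`, and `cache_lookup` are hypothetical names, not taken from the codebase. The point is that the function's return value is identical whether the counter exists or not.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical build flag: 1 keeps the stats counter, 0 compiles it out. */
#ifndef STATS_COMPILED
#define STATS_COMPILED 0
#endif

#if STATS_COMPILED
static _Atomic uint64_t g_cache_hits = 0;
#define STAT_INC(counter) \
    atomic_fetch_add_explicit(&(counter), 1u, memory_order_relaxed)
#else
/* The argument is dropped, so the counter need not even be declared. */
#define STAT_INC(counter) ((void)0)
#endif

#define CACHE_SLOTS 4u
static_assert((CACHE_SLOTS & (CACHE_SLOTS - 1u)) == 0,
              "slot count must be a power of two");

/* Hypothetical direct-mapped table standing in for a real cache. */
static int cache_slots[CACHE_SLOTS] = {10, 11, 12, 13};

/* The counter is written but never read here: no branch depends on it,
 * so removing it cannot change which value is returned. */
static int cache_lookup(unsigned key) {
    STAT_INC(g_cache_hits);
    return cache_slots[key & (CACHE_SLOTS - 1u)];
}
```

A counter qualifies for this treatment only if every read of it is for reporting. `g_bg_spill_len` fails that test because the free path branches on its value.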

---

## Lock-Free Queue Atomics

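Six of the 8 atomics live in the two push functions in `hakmem_tiny_bg_spill.h`: four operate on the lock-free stack head (`g_bg_spill_head`) and two update `g_bg_spill_len`. The single-item push is shown below; per the audit table, the chain push at lines 36-44 uses the same load, CAS, and `atomic_fetch_add` sequence.

### Push Operation (`hakmem_tiny_bg_spill.h` lines 24-32, 36-44)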
```c
static inline void bg_spill_push_one(int class_idx, void* p) {
    uintptr_t old_head;
    do {
        old_head = atomic_load_explicit(&g_bg_spill_head[class_idx], memory_order_acquire);  // CORRECTNESS
        tiny_next_write(class_idx, p, (void*)old_head);
    } while (!atomic_compare_exchange_weak_explicit(&g_bg_spill_head[class_idx], &old_head,  // CORRECTNESS
                                                    (uintptr_t)p,
                                                    memory_order_release, memory_order_relaxed));
    atomic_fetch_add_explicit(&g_bg_spill_len[class_idx], 1u, memory_order_relaxed);  // CORRECTNESS (flow control)
}
```

**Analysis:**
- Classic lock-free stack pattern (load → link → CAS loop)
- `atomic_load` + `atomic_compare_exchange_weak` are fundamental to correctness
- Cannot be removed without replacing entire queue implementation
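
The pattern can be exercised outside the allocator. The sketch below is a self-contained C11 approximation: `node_t`, `stack_head`, `stack_len`, `push_one`, and `drain_all` are hypothetical names. The drain side, which detaches the whole list with one exchange, is an assumption for illustration and is not taken from `hakmem_tiny_bg_spill.c`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intrusive node: each block stores its own next pointer. */
typedef struct node { struct node *next; } node_t;

static _Atomic uintptr_t stack_head = 0;  /* stand-in for g_bg_spill_head[class_idx] */
static _Atomic uint32_t  stack_len  = 0;  /* stand-in for g_bg_spill_len[class_idx]  */

/* Push: load the head, link the new node to it, then publish with CAS.
 * A failed CAS stores the current head into old_head, so the loop only
 * needs to relink and retry. */
static void push_one(node_t *n) {
    assert(n != NULL);
    uintptr_t old_head = atomic_load_explicit(&stack_head, memory_order_acquire);
    do {
        n->next = (node_t *)old_head;
    } while (!atomic_compare_exchange_weak_explicit(
                 &stack_head, &old_head, (uintptr_t)n,
                 memory_order_release, memory_order_relaxed));
    atomic_fetch_add_explicit(&stack_len, 1u, memory_order_relaxed);
}

/* Drain (assumed design): take the entire list in one atomic exchange,
 * count the nodes, and subtract that count from the length counter.
 * Returns the number of nodes detached; the list is returned via out_list. */
static uint32_t drain_all(node_t **out_list) {
    node_t *list = (node_t *)atomic_exchange_explicit(
        &stack_head, (uintptr_t)0, memory_order_acquire);
    uint32_t n = 0;
    for (node_t *p = list; p != NULL; p = p->next) n++;
    if (n != 0) {
        atomic_fetch_sub_explicit(&stack_len, n, memory_order_relaxed);
    }
    *out_list = list;
    return n;
}
```

Dropping either the load or the CAS leaves no way to publish a node safely, and dropping the length updates breaks the depth check in the free path. That is why all of these operations are classified as CORRECTNESS.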

---

## Decision: NO-OP

**Verdict:** Phase 28 is a **NO-OP**. No code changes required.

**Rationale:**
1. All atomics are CORRECTNESS-critical
2. `g_bg_spill_len` is used for flow control, not telemetry
3. Lock-free queue operations are untouchable
4. No A/B testing needed (nothing to test)

**Phase 28 Result:**
- **Atomics removed:** 0
- **Performance gain:** N/A
- **Code changes:** None
- **Documentation:** Audit complete, classification recorded

---

## Impact on Future Phases

### Lesson Learned

**Not all counters are telemetry.** Before classifying an atomic as TELEMETRY:
1. Search for all uses of the variable
2. Check if it's used in control flow (`if`, `while`, comparisons)
3. Determine if removal would change program behavior
4. Only compile-out if purely observational

### Similar Candidates to Audit Carefully

**Phase 29+ candidates that may have flow control:**
- `g_remote_target_len` (remote queue length - same pattern as bg_spill)
- `g_l25_pool.remote_count` (L25 pool remote counts)
- Any `*_len`, `*_count` that might be used for queue management

**Red flags for CORRECTNESS:**
- Used in `if (count < threshold)` statements
- Used to decide whether to queue work
- Used to prevent unbounded growth
- Paired with lock-free queue head pointers

---

## Phase 28 Files Analyzed

**No modifications:**
- `core/hakmem_tiny_bg_spill.h` (audit only)
- `core/hakmem_tiny_bg_spill.c` (audit only)
- `core/tiny_free_magazine.inc.h` (flow control usage identified)

**Documentation created:**
- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_AUDIT.md` (detailed audit)
- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md` (this file)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (updated)

---

## Cumulative Progress

| Phase | Atomics Removed | Impact | Verdict |
|-------|-----------------|--------|---------|
| 24 | 5 | +0.93% | GO ✅ |
| 25 | 1 | +1.07% | GO ✅ |
| 26 | 5 | -0.33% | NEUTRAL ✅ |
| 27 | 6 | +0.74% | GO ✅ |
| **28** | **0** | **N/A** | **NO-OP ✅** |
| **Total** | **17** | **+2.74%** | **✅** |

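**Note:** The +2.74% total is the sum of the three GO phases (0.93 + 1.07 + 0.74). Phase 26's NEUTRAL -0.33% is not included.

**Next:** Phase 29 (remote target queue or pool hotbox v2 stats)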

---

## Conclusion

Phase 28 successfully completed its audit objective:
1. ✅ All atomics identified (8 total)
2. ✅ All atomics classified (100% CORRECTNESS)
3. ✅ Flow control usage documented (`g_bg_spill_len`)
4. ✅ No compile-out candidates found
5. ✅ Cumulative summary updated

**Key Takeaway:** Audit phases are valuable even when they result in NO-OP. They document which atomics are untouchable and why, preventing future incorrect optimizations.

---

**Phase 28 Status:** ✅ **COMPLETE (NO-OP)**
**Next Phase:** 29 (TBD based on priority)
**Date:** 2025-12-16

---

Phase 27-28: Unified Cache stats validation + BG Spill audit

Phase 27: Unified Cache Stats A/B Test - GO (+0.74%)
- Target: `g_unified_cache_*` atomics (6 total) in WARM refill path
- Already implemented in Phase 23 (`HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED`)
- A/B validation: Baseline 52.94M vs Compiled-in 52.55M ops/s
- Result: +0.74% mean, +1.01% median (both exceed +0.5% GO threshold)
- Impact: WARM path atomics have similar impact to HOT path
- Insight: Refill frequency is substantial, ENV check overhead matters

Phase 28: BG Spill Queue Atomic Audit - NO-OP
- Target: `g_bg_spill_*` atomics (8 total) in background spill subsystem
- Classification: 8/8 CORRECTNESS (100% untouchable)
- Key finding: `g_bg_spill_len` is flow control, NOT telemetry
- Used in queue depth limiting: `if (qlen < target) {...}`
- Operational counter (affects behavior), not observational
- Lesson: Counter name ≠ purpose, must trace all usages
- Result: NO-OP (no code changes, audit documentation only)

Cumulative Progress (Phase 24-28):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (audit only)
- Total: 17 atomics removed, +2.74% improvement

Documentation:
- `PHASE27_UNIFIED_CACHE_STATS_RESULTS.md`: Complete A/B test report
- `PHASE28_BG_SPILL_ATOMIC_AUDIT.md`: Detailed CORRECTNESS classification
- `PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md`: NO-OP verdict and lessons
- `ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`: Updated with Phase 27-28
- `CURRENT_TASK.md`: Phase 29 candidate identified (Pool Hotbox v2)

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2025-12-16 06:12:17 +09:00
