# Phase 20 (Proposal): WarmPool SlabIdx Hint (avoid O(cap) scan on warm hit)

## TL;DR

`unified_cache_refill()` already does **batch refill** (up to 128/256/512 blocks) and already has the **Warm Pool** to avoid registry scans. The remaining hot cost on the warm-hit path is typically:

1. `tiny_warm_pool_pop(class_idx)` (O(1))
2. a `for (i = 0..cap)` scan to find the `slab_idx` where `tiny_get_class_from_ss(ss, i) == class_idx` (**O(cap)**)
3. `ss_tls_bind_one(...)` and an optional TLS carve

This Phase 20 reduces step (2) by storing a **slab_idx hint** alongside the warm pool entry. Expected ROI: **+1–4%** on Mixed (only if the warm-hit rate is high and the cap scan is nontrivial).

---

## Box Theory framing

### New box

**WarmPoolEntryBox**: “warm entry = (SuperSlab*, slab_idx_hint)”

- Responsibility: store + retrieve warm candidates with a verified slab index hint
- No side effects outside warm pool memory (pure stack operations)

### Boundary (single conversion point)

The only “conversion” happens in `unified_cache_refill()`:

`WarmPoolEntryBox.pop(class_idx)` → `(ss, slab_idx_hint)` → validate hint once → `ss_tls_bind_one(class_idx, tls, ss, slab_idx, tid)`

No other code path should re-infer `slab_idx`.

### Rollback

ENV gate: `HAKMEM_WARM_POOL_SLABIDX_HINT=0/1` (default 0, opt-in)

- OFF: current `TinyWarmPool` behavior (ss only, scan to find `slab_idx`)
- ON: use the entry-with-hint fast path; fall back to a scan if the hint is invalid

---

## Design details

### Data structure

Add a parallel “entry” type without changing the SuperSlab layout:

```c
typedef struct {
    SuperSlab* ss;
    uint16_t   slab_idx_hint; // [0..cap-1], or 0xFFFF for “unknown”
} TinyWarmEntry;
```

Implementation options:

1) Replace `TinyWarmPool.slabs[]` with `TinyWarmEntry entries[]`
2) Dual arrays (`slabs[]` + `slab_idx[]`)

Prefer (2) if you want minimal diff risk; a sketch of option (2) follows the Pop contract below.

### Push contract

When pushing a warm candidate, compute `slab_idx` once (in a cold-ish context) and store the hint:

```c
int slab_idx = ss_find_first_slab_for_class(ss, class_idx); // may scan (sketched below)
warm_pool_push(class_idx, ss, slab_idx); // slab_idx < 0 → store 0xFFFF (“unknown”)
```

This moves the scan cost to the producer side (where it is already scanning the registry / iterating slabs).

### Pop contract (hot)

```c
TinyWarmEntry e = warm_pool_pop_entry(class_idx);
if (!e.ss) return NULL; // warm miss

int slab_idx = e.slab_idx_hint;
if (slab_idx != 0xFFFF && tiny_get_class_from_ss(e.ss, slab_idx) == class_idx) {
    // fast path: hint validated by a single class check
    // (the push contract keeps hints in [0..cap-1]; range-check here if cap can change)
} else {
    // fallback: scan to find slab_idx (should be rare)
    slab_idx = ss_find_first_slab_for_class(e.ss, class_idx);
    if (slab_idx < 0) return NULL; // fail-fast → warm miss fallback
}
```
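Both contracts above lean on `ss_find_first_slab_for_class()`. Its behavior is implied by the TL;DR loop; here is a hedged sketch under that assumption (`ss_slab_capacity()` is an assumed accessor name for the slab slot count):

```c
// The O(cap) linear scan this proposal keeps off the hot path: return the
// first slab in ss carved for class_idx, or -1 if there is none.
static int ss_find_first_slab_for_class(SuperSlab* ss, int class_idx) {
    int cap = ss_slab_capacity(ss); // slab slot count (accessor name assumed)
    for (int i = 0; i < cap; i++) {
        if (tiny_get_class_from_ss(ss, i) == class_idx) return i;
    }
    return -1; // no slab for this class in this SuperSlab
}
```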
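And the promised sketch of option (2): a minimal, assumption-laden illustration of the dual-array layout. Only `slabs[]` and the 0xFFFF sentinel come from this proposal; the capacity constant, the `count` field, and the `warm_pool_push_entry` helper name are hypothetical:

```c
#include <stdint.h>

#define WARM_POOL_CAP    16       // assumed per-class capacity
#define WARM_IDX_UNKNOWN 0xFFFFu  // the “unknown” sentinel from the data structure above

typedef struct {
    SuperSlab* slabs[WARM_POOL_CAP];    // existing array, layout unchanged
    uint16_t   slab_idx[WARM_POOL_CAP]; // new parallel hint array
    int        count;                   // existing stack depth (field name assumed)
} TinyWarmPool;

// Keep the two arrays in lockstep; a failed producer-side scan (slab_idx < 0)
// degrades to the sentinel instead of rejecting the push.
static inline void warm_pool_push_entry(TinyWarmPool* pool, SuperSlab* ss, int slab_idx) {
    if (pool->count >= WARM_POOL_CAP) return; // pool full → drop the candidate
    pool->slabs[pool->count]    = ss;
    pool->slab_idx[pool->count] = (slab_idx >= 0) ? (uint16_t)slab_idx
                                                  : WARM_IDX_UNKNOWN;
    pool->count++;
}
```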
### Fail-fast rules

- If the hint is invalid and the scan fails → treat it as a warm miss (fall back to the existing refill path)
- Never bind TLS to a `slab_idx` whose class does not match `class_idx`

### Minimal observability

Add only coarse counters (ENV-gated, or debug builds only; a counter sketch appears at the end of this proposal):

- `warm_hint_hit` (hint valid)
- `warm_hint_miss` (hint invalid → scanned)
- `warm_hint_scan_fail` (should be ~0; fail-fast signal)

---

## A/B plan

### Step 0: prove this is worth doing

Before implementing, validate that the warm pool is actually used and that the slab scan is nontrivial:

- `perf record` / `perf report`, focused on `unified_cache_refill`
- confirm the slab scan loop shows up (e.g. `tiny_get_class_from_ss` / the loop body)

If `unified_cache_refill` accounts for less than ~3% of total cycles in Mixed, the ROI will be capped accordingly.

### Step 1: implement behind ENV

- `HAKMEM_WARM_POOL_SLABIDX_HINT=0/1` (default 0); a gate-reading sketch appears at the end of this proposal
- Keep behavior identical when OFF.

### Step 2: Mixed 10-run

Use the existing cleanenv harness:

```sh
HAKMEM_WARM_POOL_SLABIDX_HINT=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_WARM_POOL_SLABIDX_HINT=1 scripts/run_mixed_10_cleanenv.sh
```

GO/NO-GO:

- GO: mean **+1.0%** or more
- NEUTRAL: within ±1.0%
- NO-GO: **-1.0%** or worse

### Step 3: perf sanity

If GO, confirm (a `perf stat` sketch appears at the end of this proposal):

- `branches/op` down (or stable)
- no iTLB explosion

---

## Why a “Segment Batch Refill Layer” is probably redundant here

`unified_cache_refill()` already batches heavily (up to 512 blocks) and already has the Warm Pool + PageBox + carve boxes. If you want a Phase 20 with big ROI, aim at the remaining **O(cap) scan** and any remaining registry scans rather than adding another batch layer on top of the existing one.
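---

For reference, a minimal sketch of the coarse counters from the Minimal observability section. Relaxed C11 atomics are enough for rough hit/miss rates; the `WARM_STAT_INC` macro name is illustrative, and a release build could compile the whole block out:

```c
#include <stdatomic.h>

// Coarse, ENV-gated/debug-only counters (names from the observability section).
static _Atomic unsigned long warm_hint_hit;       // hint valid
static _Atomic unsigned long warm_hint_miss;      // hint invalid → scanned
static _Atomic unsigned long warm_hint_scan_fail; // should stay ~0 (fail-fast signal)

#define WARM_STAT_INC(c) atomic_fetch_add_explicit(&(c), 1, memory_order_relaxed)
```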
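The Step 1 ENV gate could be read once and cached; a hedged sketch (the helper name is an assumption, and the data race on `cached` is benign because the write is idempotent):

```c
#include <stdlib.h>

// Returns 1 iff HAKMEM_WARM_POOL_SLABIDX_HINT=1; default 0 (opt-in).
static int warm_hint_enabled(void) {
    static int cached = -1; // -1 = not read yet
    if (cached < 0) {
        const char* v = getenv("HAKMEM_WARM_POOL_SLABIDX_HINT");
        cached = (v && v[0] == '1') ? 1 : 0;
    }
    return cached;
}
```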
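And for the Step 3 perf sanity check, one way to compare OFF vs ON branch/iTLB behavior (`./bench_mixed` is a placeholder for whatever binary the Mixed harness drives):

```sh
HAKMEM_WARM_POOL_SLABIDX_HINT=0 perf stat -e instructions,branches,branch-misses,iTLB-load-misses ./bench_mixed
HAKMEM_WARM_POOL_SLABIDX_HINT=1 perf stat -e instructions,branches,branch-misses,iTLB-load-misses ./bench_mixed
```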