# Phase 20 (Proposal): WarmPool SlabIdx Hint (avoid O(cap) scan on warm hit)

## TL;DR

`unified_cache_refill()` already does **batch refill** (up to 128/256/512 blocks) and already has the **Warm Pool** to avoid registry scans.

The remaining hot cost on the warm-hit path is typically:

1. `tiny_warm_pool_pop(class_idx)` (O(1))
2. `for (i=0..cap)` scan to find `slab_idx` where `tiny_get_class_from_ss(ss,i)==class_idx` (**O(cap)**)
3. `ss_tls_bind_one(...)` and optional TLS carve

This Phase 20 reduces step (2) by storing a **slab_idx hint** alongside the warm pool entry.

Expected ROI: **+1–4%** on Mixed (only if the warm-hit rate is high and the cap scan is nontrivial).

---

## Box Theory framing

### New box

**WarmPoolEntryBox**: “warm entry = (SuperSlab*, slab_idx_hint)”

- Responsibility: store and retrieve warm candidates with a verified slab index hint
- No side effects outside warm pool memory (pure stack operations)

### Boundary (single conversion point)

The only “conversion” is in `unified_cache_refill()`:

`WarmPoolEntryBox.pop(class_idx)` → `(ss, slab_idx_hint)`
→ validate hint once
→ `ss_tls_bind_one(class_idx, tls, ss, slab_idx, tid)`

No other code path should re-infer `slab_idx`.
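
Putting the boundary together, the conversion could look roughly like the sketch below. `SuperSlab` and `tiny_get_class_from_ss` are toy stand-ins here (the real definitions live in the allocator), and `warm_entry_to_slab_idx` is a hypothetical helper name, not existing code:

```c
#include <stdint.h>
#include <stddef.h>

#define SLAB_IDX_UNKNOWN 0xFFFF

/* Toy stand-ins so the sketch compiles on its own; the real SuperSlab,
 * TinyWarmEntry, and tiny_get_class_from_ss live in the allocator. */
typedef struct { int cls[8]; } SuperSlab;
typedef struct { SuperSlab* ss; uint16_t slab_idx_hint; } TinyWarmEntry;

static int tiny_get_class_from_ss(const SuperSlab* ss, int i) { return ss->cls[i]; }

/* The single conversion point: pop -> validate hint once -> bind.
 * Returns the validated slab_idx, or -1 to signal a warm miss. */
static int warm_entry_to_slab_idx(TinyWarmEntry e, int class_idx, int cap) {
    if (!e.ss) return -1;                       /* empty pool */
    int idx = e.slab_idx_hint;
    if (idx != SLAB_IDX_UNKNOWN && idx < cap &&
        tiny_get_class_from_ss(e.ss, idx) == class_idx)
        return idx;                             /* hint valid: O(1) */
    for (int i = 0; i < cap; i++)               /* rare fallback scan */
        if (tiny_get_class_from_ss(e.ss, i) == class_idx)
            return i;
    return -1;                                  /* warm miss: fail fast */
}
```

The caller would then pass the returned `slab_idx` straight into `ss_tls_bind_one(...)`; on `-1` it falls through to the existing warm-miss refill path.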

### Rollback

ENV gate: `HAKMEM_WARM_POOL_SLABIDX_HINT=0/1` (default 0, opt-in)

- OFF: current `TinyWarmPool` behavior (ss only, scan to find slab_idx)
- ON: use the entry-with-hint fast path; fall back to the scan if the hint is invalid
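
A minimal sketch of the gate, assuming the usual read-once-and-cache pattern for ENV flags (the caching detail is an assumption, not confirmed hakmem code):

```c
#include <stdlib.h>

/* Hedged sketch of the rollback gate: read HAKMEM_WARM_POOL_SLABIDX_HINT
 * once and cache the result so the hot path never calls getenv() again. */
static int warm_hint_enabled(void) {
    static int cached = -1;                     /* -1 = not read yet */
    if (cached < 0) {
        const char* v = getenv("HAKMEM_WARM_POOL_SLABIDX_HINT");
        cached = (v && v[0] == '1');            /* default 0: opt-in */
    }
    return cached;
}
```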

---

## Design details

### Data structure

Add a parallel “entry” type without changing SuperSlab layout:

```c
typedef struct {
    SuperSlab* ss;
    uint16_t   slab_idx_hint; // [0..cap-1], or 0xFFFF for “unknown”
} TinyWarmEntry;
```

Implementation options:

1) Replace `TinyWarmPool.slabs[]` with `TinyWarmEntry entries[]`
2) Dual arrays (`slabs[]` + `slab_idx[]`)

Prefer (2) if you want minimal diff risk.
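
A sketch of option (2), assuming field names `slab_idx[]` and `top` and an illustrative capacity; only `slabs[]` is taken from the existing `TinyWarmPool`:

```c
#include <stdint.h>
#include <stddef.h>

#define WARM_POOL_CAP    8        /* illustrative; real cap is whatever TinyWarmPool uses */
#define SLAB_IDX_UNKNOWN 0xFFFF

typedef struct SuperSlab SuperSlab;      /* opaque in this sketch */

/* Option (2): keep slabs[] as-is and add a parallel slab_idx[] array,
 * so the OFF path never touches the new field. */
typedef struct {
    SuperSlab* slabs[WARM_POOL_CAP];     /* existing array, unchanged */
    uint16_t   slab_idx[WARM_POOL_CAP];  /* NEW: parallel hint array */
    int        top;                      /* stack top */
} TinyWarmPool;

static int warm_pool_push(TinyWarmPool* p, SuperSlab* ss, uint16_t hint) {
    if (p->top >= WARM_POOL_CAP) return 0;   /* pool full: drop candidate */
    p->slabs[p->top] = ss;
    p->slab_idx[p->top] = hint;
    p->top++;
    return 1;
}

static SuperSlab* warm_pool_pop(TinyWarmPool* p, uint16_t* hint_out) {
    if (p->top == 0) { *hint_out = SLAB_IDX_UNKNOWN; return NULL; }
    p->top--;
    *hint_out = p->slab_idx[p->top];
    return p->slabs[p->top];
}
```

Because the arrays share one `top`, push and pop stay O(1) and the hint travels with its slab for free.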

### Push contract

When pushing a warm candidate, compute slab_idx once (cold-ish context) and store the hint:

```c
int slab_idx = ss_find_first_slab_for_class(ss, class_idx); // may scan
warm_pool_push(class_idx, ss, slab_idx);
```

This moves scan cost to the producer side (where it is already scanning registry / iterating slabs).
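
For illustration, the producer-side scan could look like the sketch below; `SuperSlab`'s layout and `ss_slab_capacity` are toy assumptions, only the O(cap) loop shape is the point:

```c
#include <stddef.h>

/* Toy stand-in: the real SuperSlab stores per-slab class ids elsewhere;
 * ss_slab_capacity / tiny_get_class_from_ss are assumed accessors. */
typedef struct { int cap; int cls[64]; } SuperSlab;

static int ss_slab_capacity(const SuperSlab* ss) { return ss->cap; }
static int tiny_get_class_from_ss(const SuperSlab* ss, int i) { return ss->cls[i]; }

/* The producer-side O(cap) scan the push contract pays once. */
static int ss_find_first_slab_for_class(const SuperSlab* ss, int class_idx) {
    int cap = ss_slab_capacity(ss);
    for (int i = 0; i < cap; i++)
        if (tiny_get_class_from_ss(ss, i) == class_idx)
            return i;            /* first slab carved for this class */
    return -1;                   /* no slab for this class in ss */
}
```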

### Pop contract (hot)

```c
TinyWarmEntry e = warm_pool_pop_entry(class_idx);
if (!e.ss) return NULL;

int slab_idx = e.slab_idx_hint;
if (slab_idx != 0xFFFF && tiny_get_class_from_ss(e.ss, slab_idx) == class_idx) {
    // fast: validated
} else {
    // fallback: scan to find slab_idx (rare)
    slab_idx = ss_find_first_slab_for_class(e.ss, class_idx);
    if (slab_idx < 0) return NULL; // fail-fast → warm miss fallback
}
```

### Fail-fast rules

- If the hint is invalid and the scan fails → treat as a warm miss (fall back to the existing refill path)
- Never bind TLS to a slab_idx that doesn’t match the class

### Minimal observability

Add only coarse counters (ENV-gated, or debug builds only):

- `warm_hint_hit` (hint valid)
- `warm_hint_miss` (hint invalid → scanned)
- `warm_hint_scan_fail` (should be ~0; fail-fast signal)
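
A minimal counter sketch, assuming relaxed atomics are acceptable for coarse stats (`warm_stat_inc` is a hypothetical helper):

```c
#include <stdatomic.h>

/* Hedged sketch: coarse counters with relaxed atomics. The counter names
 * come from this proposal; the atomic/relaxed choice is an assumption. */
static _Atomic unsigned long warm_hint_hit;        /* hint valid */
static _Atomic unsigned long warm_hint_miss;       /* hint invalid -> scanned */
static _Atomic unsigned long warm_hint_scan_fail;  /* should stay ~0 */

static inline void warm_stat_inc(_Atomic unsigned long* c) {
    /* relaxed: counters tolerate reordering, no fence cost on the hot path */
    atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}
```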

---

## A/B plan

### Step 0: prove this is worth doing

Before implementing, validate that the warm pool is actually used and the slab scan is nontrivial:

- `perf record` / `perf report` focused on `unified_cache_refill`
- confirm the slab scan loop shows up (e.g. `tiny_get_class_from_ss` / the loop body)

If `unified_cache_refill` is < ~3% of total cycles in Mixed, the ROI will be capped.

### Step 1: implement behind ENV

- `HAKMEM_WARM_POOL_SLABIDX_HINT=0/1` (default 0)
- Keep behavior identical when OFF.

### Step 2: Mixed 10-run

Use the existing cleanenv harness:

```sh
HAKMEM_WARM_POOL_SLABIDX_HINT=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_WARM_POOL_SLABIDX_HINT=1 scripts/run_mixed_10_cleanenv.sh
```

GO/NO-GO:

- GO: mean **+1.0%** or more
- NEUTRAL: within ±1.0%
- NO-GO: mean **-1.0%** or worse
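
The verdict can be computed mechanically from the two run logs. A sketch assuming each log holds one throughput number per line (the log names, their format, and the `go_no_go` helper are placeholders):

```sh
# go_no_go BASELINE.log CANDIDATE.log
# Prints the mean delta and the GO/NEUTRAL/NO-GO verdict per the thresholds above.
go_no_go() {
  base=$(awk '{ s += $1 } END { print s / NR }' "$1")
  cand=$(awk '{ s += $1 } END { print s / NR }' "$2")
  awk -v b="$base" -v t="$cand" 'BEGIN {
    d = (t - b) * 100.0 / b
    printf "delta %.2f%% -> ", d
    if (d >= 1.0)       print "GO"
    else if (d <= -1.0) print "NO-GO"
    else                print "NEUTRAL"
  }'
}
```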

### Step 3: perf sanity

If GO, confirm:

- `branches/op` down (or stable)
- no iTLB explosion

---

## Why “Segment Batch Refill Layer” is probably redundant here

`unified_cache_refill()` already batches heavily (up to 512) and already has Warm Pool + PageBox + carve boxes.

If you want a Phase 20 with big ROI, aim at the remaining **O(cap) scan** and any remaining registry scans, not adding another batch layer on top of an existing batch layer.