SACS‑3 Overview (Phase 6.16)
Goal: keep the hot path predictable and fast by deciding tiers by size only, and let ACE optimize just the cache layer (L1) dynamically.
Tiers
- L0 Tiny (≤ 1 KiB)
- TLS magazine, TLS Active Slab, MPSC remote‑free.
- L1 ACE (1 KiB < size < 2 MiB)
- MidPool: 2/4/8/16/32 KiB (5 classes)
- LargePool: 64/128/256/512 KiB/1 MiB (5 classes)
- W_MAX rounding: a class `c` is accepted if `c ≤ W_MAX × size` (sketched after this list).
- Gap 32–64 KiB is absorbed by rounding up to 64 KiB when within W_MAX.
- L2 Big (≥ 2 MiB)
- BigCache + mmap, THP gate.
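For illustration, a minimal sketch of the size→class pick with W_MAX rounding, assuming the class lists above; the function name, fallback behavior, and W_MAX plumbing are illustrative, not the project's actual code:

```c
#include <stdbool.h>
#include <stddef.h>

/* L1 class sizes in bytes, mirroring the Mid/Large lists above. */
static const size_t l1_classes[10] = {
    2048, 4096, 8192, 16384, 32768,             /* MidPool:   2..32 KiB     */
    65536, 131072, 262144, 524288, 1048576,     /* LargePool: 64 KiB..1 MiB */
};

/* Pick the smallest class >= size, accepting it only if class <= W_MAX * size.
 * Returns true and writes the chosen class size, or false so the caller can
 * fall back. W_MAX comes from policy; 1.5 below is only an example. */
static bool l1_pick_class(size_t size, double w_max, size_t *out_class)
{
    for (size_t i = 0; i < sizeof l1_classes / sizeof l1_classes[0]; i++) {
        if (l1_classes[i] < size)
            continue;
        if ((double)l1_classes[i] <= w_max * (double)size) {
            *out_class = l1_classes[i];   /* e.g. 48 KiB -> 64 KiB at W_MAX = 1.5 */
            return true;
        }
        return false;                     /* rounding would exceed the waste bound */
    }
    return false;                         /* above 1 MiB: not an L1 size */
}
```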
Hot Path (hak_alloc_at)
if (size ≤ 1 KiB) return tinyslab_alloc(size);
if (size < 2 MiB) return hkm_ace_alloc(size, site, P); // L1
else return bigcache/mmap; // L2
Here `P` is a `FrozenPolicy` snapshot (RCU-published); the hot path reads it once per call.
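A hedged C sketch of this dispatch, assuming a `hak_alloc_at(size, site)` signature; `tinyslab_alloc` and `hkm_ace_alloc` are named above, while `hkm_big_alloc` and `hkm_policy_read` are placeholder names for the L2 path and the policy read:

```c
#include <stddef.h>

#define TINY_MAX ((size_t)1 << 10)   /* 1 KiB: L0/L1 boundary (size-only) */
#define BIG_MIN  ((size_t)2 << 20)   /* 2 MiB: L1/L2 boundary (size-only) */

struct FrozenPolicy;                               /* read-only policy snapshot */

void *tinyslab_alloc(size_t size);                              /* L0 (from doc) */
void *hkm_ace_alloc(size_t size, void *site,
                    const struct FrozenPolicy *p);              /* L1 (from doc) */
void *hkm_big_alloc(size_t size);                  /* L2 BigCache/mmap, name assumed */
const struct FrozenPolicy *hkm_policy_read(void);  /* RCU read, name assumed */

void *hak_alloc_at(size_t size, void *site)
{
    const struct FrozenPolicy *p = hkm_policy_read();  /* one policy load per call */

    if (size <= TINY_MAX) return tinyslab_alloc(size);           /* L0 Tiny */
    if (size <  BIG_MIN)  return hkm_ace_alloc(size, site, p);   /* L1 ACE  */
    return hkm_big_alloc(size);                                  /* L2 Big  */
}
```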
ACE = “Smart Cache” for L1
What ACE does (off hot path):
- CAP: per‑class budget (front‑loading) for Mid/Large.
- Site/class→shard: fix locality and reduce contention.
- Free policy per class: KEEP / delayed MADV_FREE / batched DONTNEED.
- W_MAX candidates: choose (e.g., {1.25, 1.5, 1.75}) via CANARY.
- BATCH_RANGE: pick from a few candidates.
- All decisions are baked into a `FrozenPolicy`, published once per window.
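As a rough sketch, a `FrozenPolicy` snapshot baking in these decisions might look like the following; the field names, widths, and counts are assumptions, not the actual layout:

```c
#include <stdint.h>

/* Which reclaim action a class uses, per the list above. */
enum hkm_free_policy { FP_KEEP, FP_MADV_FREE_DELAYED, FP_DONTNEED_BATCHED };

/* One immutable snapshot of ACE's per-window decisions. The hot path only
 * reads it; ACE builds a fresh one off the hot path and publishes it. */
struct FrozenPolicy {
    uint32_t mid_cap[5];          /* CAP per MidPool class (2..32 KiB)       */
    uint32_t large_cap[5];        /* CAP per LargePool class (64 KiB..1 MiB) */
    float    w_max_mid;           /* W_MAX chosen via CANARY, Mid tier       */
    float    w_max_large;         /* W_MAX chosen via CANARY, Large tier     */
    uint16_t batch;               /* refill bundle size from BATCH_RANGE     */
    uint8_t  shard_of_class[10];  /* site/class -> shard routing             */
    uint8_t  free_policy[10];     /* enum hkm_free_policy per class          */
};
```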
Profiling & Overhead Tracking
Enable sampling profiler:
export HAKMEM_PROF=1
export HAKMEM_PROF_SAMPLE=10   # sampling rate 1/2^10 = 1/1024
Key categories: tiny_alloc, ace_alloc, malloc_alloc, mmap_alloc, plus internals like pool_lock/refill, l25_lock/refill, and tiny internals.
Sweep helper:
scripts/prof_sweep.sh -d 2 -t 1,4 -s 8
Roadmap
- CAP tuning (static → learning):
- Observe L1 `malloc_alloc` avg ns and the L1 fallback rate per range.
- Increase CAP where hit < target; decrease where it overshoots.
- W_MAX tuning per tier (Mid vs Large) with guard rails.
- Shard routing via FrozenPolicy (reduce lock contention).
- Publish policy via `hkm_policy_publish()` on window boundaries (RCU), sketched below.
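A minimal sketch of window-boundary publication, reusing the `FrozenPolicy` sketch above; only the `hkm_policy_publish()` name comes from this roadmap, the atomic-swap mechanics and the (elided) grace-period handling are assumptions:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Current snapshot; hot-path readers load it once per call (acquire). */
static _Atomic(struct FrozenPolicy *) g_policy;

const struct FrozenPolicy *hkm_policy_read(void)
{
    return atomic_load_explicit(&g_policy, memory_order_acquire);
}

/* Called on a window boundary: install a fresh immutable snapshot and
 * retire the previous one after readers have drained. */
void hkm_policy_publish(const struct FrozenPolicy *next)
{
    struct FrozenPolicy *fresh = malloc(sizeof *fresh);
    memcpy(fresh, next, sizeof *fresh);

    struct FrozenPolicy *old =
        atomic_exchange_explicit(&g_policy, fresh, memory_order_release);
    (void)old;   /* real RCU code defers free(old) until the grace period ends */
}
```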
Learning Axes & Controls
- Axes:
- Threshold (mmap/L1↔L2), Class Count (containers), Class Shape (boundaries + W_MAX), Class Volume (CAP)
- Implemented now:
- Soft CAP gating on Mid/L2.5 refills (CAP over → bundle=1, under → up to 4)
- Learner thread adjusts `mid_cap[]` / `large_cap[]` by target hit rate (hysteresis, budget); see the sketch after this list.
- Env controls: `HAKMEM_LEARN=1`, `HAKMEM_LEARN_WINDOW_MS`, `HAKMEM_TARGET_HIT_MID/LARGE`, `HAKMEM_CAP_STEP_MID/LARGE`, `HAKMEM_BUDGET_MID/LARGE`, `HAKMEM_CAP_MID/LARGE` (manual override), `HAKMEM_WMAX_MID/LARGE`
- Planned: `HAKMEM_WRAP_L2/L25`, `HAKMEM_MID_DYN1`
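A sketch of one learner window for the Mid caps, reusing the `FrozenPolicy` sketch above; the stats layout, the ±2% hysteresis band, and the mapping of the env knobs onto parameters are assumptions:

```c
#include <stdint.h>

/* Per-class counters collected during one learning window (layout assumed). */
struct class_window_stats { uint64_t hits, misses; };

/* One learner step: nudge each MidPool CAP toward the target hit rate with
 * hysteresis, never exceeding the per-tier budget. The parameters correspond
 * to HAKMEM_TARGET_HIT_MID, HAKMEM_CAP_STEP_MID and HAKMEM_BUDGET_MID. */
static void learn_mid_caps(struct FrozenPolicy *draft,
                           const struct class_window_stats st[5],
                           double target_hit, uint32_t step, uint32_t budget)
{
    const double band = 0.02;                    /* hysteresis: ignore +/-2% */
    for (int c = 0; c < 5; c++) {
        uint64_t total = st[c].hits + st[c].misses;
        if (total == 0)
            continue;                            /* no traffic: leave CAP alone */
        double hit = (double)st[c].hits / (double)total;
        if (hit < target_hit - band && draft->mid_cap[c] + step <= budget)
            draft->mid_cap[c] += step;           /* below target: grow CAP   */
        else if (hit > target_hit + band && draft->mid_cap[c] >= step)
            draft->mid_cap[c] -= step;           /* overshoot: shrink CAP    */
    }
}
```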
Inline Policy (Hot Path)
- Size-only tier selection; no syscalls on hot path.
- Static inline + LUT for O(1) class mapping (sketched after this list).
- One `FrozenPolicy` load per call; otherwise read-only.
- Site Rules stays off the hot path (layer-internal hints only, in future).
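A minimal sketch of the static-inline + LUT class mapping, assuming GCC/Clang builtins; the table contents follow the class lists above, and the W_MAX acceptance check stays separate:

```c
#include <stddef.h>
#include <stdint.h>

/* size -> L1 class index in O(1): bucket by ceil(log2(size)), then index a
 * small table. 0xFF marks buckets handled by L0 (<= 1 KiB). Valid only for
 * 1 KiB < size <= 1 MiB; larger sizes go to L2 before reaching this. */
static const uint8_t l1_class_by_log2[21] = {
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, /* <= 1 KiB: L0 */
    0, 1, 2, 3, 4,                       /* 2^11..2^15: Mid   (2..32 KiB)      */
    5, 6, 7, 8, 9,                       /* 2^16..2^20: Large (64 KiB..1 MiB)  */
};

static inline unsigned l1_class_index(size_t size)
{
    /* ceil(log2(size)) via a count-leading-zeros builtin (GCC/Clang). */
    unsigned log2_ceil =
        64u - (unsigned)__builtin_clzll((unsigned long long)size - 1u);
    return l1_class_by_log2[log2_ceil];
}
```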