Mainline Integration Plan (Tiny focus)
What we promoted to mainline (safe, general)
- Safer tiny refill into SLL (already integrated)
- sll_refill_small_from_ss(class_idx, max_take) now caps the refill at the SLL's actual free capacity, so it no longer takes more blocks than the SLL can hold or makes wasted meta->used increments.
- Entry order consolidation in the small fast path
- Prefer SLL → Magazine → SuperSlab in the normal (non‑bench) path; Quick/FrontCache/Ultra tiers remain opt‑in.
- Targeted remote‑drain queue (compiled in, default OFF)
- Per‑class Treiber queue for slabs that exceed remote thresholds; disabled by default via env knobs to preserve conservative behavior.
- PGO recipe (opt‑in)
- Makefile targets for PGO on tiny benches are available, but not required for normal builds.
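The capped refill described in the first item above can be sketched as follows. This is a minimal illustration under stated assumptions, not the allocator's actual code: the types `TinyBlock`, `TinySLL`, and `SlabMeta` and the function `sll_refill_capped` are hypothetical stand-ins, and only the idea of clamping `max_take` to the SLL's remaining room comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the allocator's real structures. */
typedef struct TinyBlock {
    struct TinyBlock *next;
} TinyBlock;

typedef struct {
    TinyBlock *head;  /* thread-local singly linked free list (SLL) */
    uint32_t   count; /* blocks currently held in the SLL */
    uint32_t   cap;   /* maximum blocks the SLL may hold */
} TinySLL;

typedef struct {
    TinyBlock *freelist; /* free blocks still owned by the slab */
    uint32_t   used;     /* blocks handed out from the slab */
} SlabMeta;

/*
 * Move up to max_take blocks from the slab's free list into the SLL,
 * but never more than the SLL has room for. Returns the number moved.
 * meta->used is bumped only for blocks that actually land in the SLL.
 */
static uint32_t sll_refill_capped(TinySLL *sll, SlabMeta *meta, uint32_t max_take)
{
    uint32_t room  = (sll->count < sll->cap) ? (sll->cap - sll->count) : 0;
    uint32_t want  = (max_take < room) ? max_take : room;
    uint32_t taken = 0;

    while (taken < want && meta->freelist != NULL) {
        TinyBlock *b = meta->freelist;
        meta->freelist = b->next; /* pop from the slab */
        b->next = sll->head;      /* push onto the SLL */
        sll->head = b;
        meta->used++;
        taken++;
    }

    sll->count += taken;
    assert(sll->count <= sll->cap); /* the cap is never exceeded */
    return taken;
}
```

With an SLL capacity of 3 and a request for 6 blocks, this sketch moves 3 blocks and increments `used` by 3; a second call moves nothing and leaves `used` unchanged.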
What remains bench‑only (NOT promoted)
- SLL‑only front (Magazine compiled out) and TLS warmup
- These are highly benchmark‑specific and are kept in bench‑only builds.
- Free‑side SLL‑first push without owner/stats
- Mainline preserves learning/stats semantics; bench builds can cut them out.
- Quick/FrontCache / 32/64 specialization hardwiring
- Retained as A/B options; not enabled by default in mainline.
Recommended “Perf‑Main” preset (opt‑in, no bench macros)
- Environment (Tiny‑Hot biased but general):
- HAKMEM_TINY_TLS_SLL=1
- HAKMEM_TINY_REFILL_MAX=96
- HAKMEM_TINY_REFILL_MAX_HOT=192
- HAKMEM_TINY_SPILL_HYST=16
- HAKMEM_TINY_BG_REMOTE=0 (keep targeted remote drain off by default)
- Keep Quick/FrontCache/Ultra OFF unless explicitly A/B tested
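For readers unfamiliar with environment-driven knobs, the sketch below shows one way a C program could read integer settings like the ones in this preset. It is an assumption-laden illustration, not the allocator's actual parsing code: `parse_int_or`, `env_int`, `TinyKnobs`, and `tiny_knobs_load` are hypothetical names, and the fallback values of 0 are placeholders rather than the project's real defaults.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a decimal integer; return def if s is NULL, empty, malformed,
 * or out of int range. */
static int parse_int_or(const char *s, int def)
{
    if (s == NULL || *s == '\0') {
        return def;
    }
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX) {
        return def;
    }
    return (int)v;
}

/* Read an integer knob from the environment, with a fallback. */
static int env_int(const char *name, int def)
{
    assert(name != NULL);
    return parse_int_or(getenv(name), def);
}

/* Hypothetical container for the knobs named in the preset. */
typedef struct {
    int tls_sll;
    int refill_max;
    int refill_max_hot;
    int spill_hyst;
    int bg_remote;
} TinyKnobs;

/* The fallback values below are placeholders, not the real defaults. */
static TinyKnobs tiny_knobs_load(void)
{
    TinyKnobs k;
    k.tls_sll        = env_int("HAKMEM_TINY_TLS_SLL", 0);
    k.refill_max     = env_int("HAKMEM_TINY_REFILL_MAX", 0);
    k.refill_max_hot = env_int("HAKMEM_TINY_REFILL_MAX_HOT", 0);
    k.spill_hyst     = env_int("HAKMEM_TINY_SPILL_HYST", 0);
    k.bg_remote      = env_int("HAKMEM_TINY_BG_REMOTE", 0);
    return k;
}
```

Falling back to a default on malformed input means a mistyped value leaves the setting at its default instead of producing an arbitrary number.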
How to try Perf‑Main locally
- Build benches:
  make bench_fast
- Run tiny‑hot triad (no bench macros):
  bash scripts/run_tiny_hot_triad.sh 60000
- Run random‑mixed matrix:
  bash scripts/run_random_mixed_matrix.sh 100000
Notes
- LD_PRELOAD/app mode remains conservative (LD_SAFE staging). Tiny‑only and pass‑through modes are recommended for stability. The bench‑only optimizations are intentionally not applied in LD mode.