Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
38 lines
2.0 KiB
Markdown
Mainline Integration Plan (Tiny focus)

What we promoted to mainline (safe, general)
- Safer tiny refill into SLL (already integrated)
  - `sll_refill_small_from_ss(class_idx, max_take)` now caps the refill at the SLL's actual free capacity, which avoids over-taking blocks and wasted `meta->used` increments.
- Entry order consolidation in the small fast path
  - Prefer `SLL → Magazine → SuperSlab` in the normal (non‑bench) path; the Quick/FrontCache/Ultra tiers remain opt‑in.
- Targeted remote‑drain queue (compiled in, default OFF)
  - Per‑class Treiber queue for slabs that exceed remote thresholds; disabled by default via env knobs to preserve conservative behavior.
- PGO recipe (opt‑in)
  - Makefile targets for PGO on tiny benches are available but are not required for normal builds.
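
The capped refill described in the first item can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Block`, `TinySLL`, and `SlabMeta` types and the `refill_capped` function are hypothetical stand-ins, not hakmem's real structures or the actual body of `sll_refill_small_from_ss`. The point is only that the number of blocks moved is bounded by both `max_take` and the room left on the list, so `used` is never bumped for blocks that could not be stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not hakmem's real types. */
typedef struct Block { struct Block *next; } Block;

typedef struct {
    Block   *head;   /* thread-local singly linked free list (SLL) */
    unsigned count;  /* blocks currently on the list */
    unsigned cap;    /* maximum number of blocks the list may hold */
} TinySLL;

typedef struct {
    Block   *freelist; /* free blocks still owned by the slab */
    unsigned used;     /* blocks handed out (plays the role of meta->used) */
} SlabMeta;

/* Move up to max_take blocks from the slab onto the SLL, but never more
 * than the SLL has room for. Returns the number of blocks moved. */
static unsigned refill_capped(TinySLL *sll, SlabMeta *meta, unsigned max_take)
{
    assert(sll->count <= sll->cap);
    unsigned room = sll->cap - sll->count;            /* free capacity */
    unsigned want = max_take < room ? max_take : room;
    unsigned taken = 0;

    while (taken < want && meta->freelist != NULL) {
        Block *b = meta->freelist;   /* pop from the slab free list */
        meta->freelist = b->next;
        b->next = sll->head;         /* push onto the SLL */
        sll->head = b;
        taken++;
    }
    sll->count += taken;
    meta->used += taken;             /* only count blocks actually moved */
    return taken;
}
```

With a list capacity of 3 and `max_take` of 96, a call moves at most 3 blocks, and a second call on the full list moves none and leaves `used` unchanged.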

What remains bench‑only (NOT promoted)
- SLL‑only front (Magazine compiled out) and TLS warmup
  - These are highly benchmark‑specific and are kept in bench‑only builds.
- Free‑side SLL‑first push without owner/stats
  - Mainline preserves learning/stats semantics; bench builds can cut them out.
- Quick/FrontCache / 32/64 specialization hardwiring
  - Retained as A/B options; not enabled by default in mainline.
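
One common way to keep statistics on the mainline free path while letting a bench build drop them is a compile-time switch. The sketch below assumes a hypothetical macro `EXAMPLE_BENCH_NO_STATS` and hypothetical `TinyFront` and `tiny_free_push` names; the source does not say how hakmem actually gates this, so treat it as an illustration of the idea rather than the project's mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names, for illustration only. */
typedef struct Block { struct Block *next; } Block;

typedef struct {
    Block        *head;        /* thread-local free list */
    unsigned      count;       /* blocks on the list */
    unsigned long free_calls;  /* statistic preserved on mainline */
} TinyFront;

/* Push a freed block onto the front list. When the (hypothetical)
 * EXAMPLE_BENCH_NO_STATS macro is defined, the counter update is
 * compiled out; otherwise stats are preserved. */
static void tiny_free_push(TinyFront *f, void *p)
{
    assert(p != NULL);
#ifndef EXAMPLE_BENCH_NO_STATS
    f->free_calls++;
#endif
    Block *b = (Block *)p;
    b->next = f->head;
    f->head = b;
    f->count++;
}
```

An owner check could presumably be gated the same way; it is omitted here to keep the sketch short.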

Recommended “Perf‑Main” preset (opt‑in, no bench macros)
- Environment (Tiny‑Hot biased but general):
  - `HAKMEM_TINY_TLS_SLL=1`
  - `HAKMEM_TINY_REFILL_MAX=96`
  - `HAKMEM_TINY_REFILL_MAX_HOT=192`
  - `HAKMEM_TINY_SPILL_HYST=16`
  - `HAKMEM_TINY_BG_REMOTE=0` (keep targeted remote drain off by default)
- Keep Quick/FrontCache/Ultra OFF unless explicitly A/B tested
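
As a sketch of how the preset above could be applied programmatically, for example from a small launcher before it execs a benchmark, the helper below sets each variable with POSIX `setenv`. The `EnvKnob` type and `apply_perf_main_preset` function are hypothetical names; the variable names and values are the ones listed above. It also assumes the allocator reads these variables at startup, which the plan does not state explicitly.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: one environment knob and its preset value. */
typedef struct { const char *name; const char *value; } EnvKnob;

/* Values copied from the Perf-Main preset list. */
static const EnvKnob PERF_MAIN[] = {
    { "HAKMEM_TINY_TLS_SLL",        "1"   },
    { "HAKMEM_TINY_REFILL_MAX",     "96"  },
    { "HAKMEM_TINY_REFILL_MAX_HOT", "192" },
    { "HAKMEM_TINY_SPILL_HYST",     "16"  },
    { "HAKMEM_TINY_BG_REMOTE",      "0"   },
};

/* Apply the preset to the current process environment.
 * overwrite = 0, so a value the user already exported is kept.
 * Returns 0 on success, -1 if setenv fails. */
static int apply_perf_main_preset(void)
{
    size_t n = sizeof PERF_MAIN / sizeof PERF_MAIN[0];
    for (size_t i = 0; i < n; i++) {
        assert(PERF_MAIN[i].name != NULL);
        if (setenv(PERF_MAIN[i].name, PERF_MAIN[i].value, 0) != 0)
            return -1;
    }
    return 0;
}
```

Passing `0` as the overwrite flag is a deliberate choice in this sketch: it lets an explicit A/B override such as `HAKMEM_TINY_REFILL_MAX=64` win over the preset.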

How to try Perf‑Main locally
- Build benches: `make bench_fast`
- Run tiny‑hot triad (no bench macros): `bash scripts/run_tiny_hot_triad.sh 60000`
- Run random‑mixed matrix: `bash scripts/run_random_mixed_matrix.sh 100000`

Notes
- LD_PRELOAD/app mode remains conservative (LD_SAFE staging). Tiny‑only and pass‑through modes are recommended for stability. The bench‑only optimizations are intentionally not applied in LD mode.