hakmem/docs/archive/MAINLINE_INTEGRATION.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


Mainline Integration Plan (Tiny focus)

What we promoted to mainline (safe, general)

  • Safer tiny refill into SLL (already integrated)
    • sll_refill_small_from_ss(class_idx, max_take) now caps the refill at the SLL's actual free capacity, which avoids over-taking blocks and wasted meta->used increments.
  • Entry order consolidation in the small fast path
    • Prefer SLL → Magazine → SuperSlab in the normal (non-bench) path; the Quick/FrontCache/Ultra tiers remain opt-in.
  • Targeted remote-drain queue (compiled in, default OFF)
    • Per-class Treiber queue for slabs that exceed remote thresholds; disabled by default via env knobs to preserve conservative behavior.
  • PGO recipe (opt-in)
    • Makefile targets for PGO on tiny benches are available, but not required for normal builds.
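
To illustrate the capped refill described above, here is a minimal sketch in C. It is not hakmem's actual code: the `sll_t` and `slab_meta_t` structs and the `refill_capped` function are hypothetical stand-ins that track only counts, and the real `sll_refill_small_from_ss` presumably links actual blocks. The point is only the clamping step: the number of blocks taken is limited by the caller's `max_take`, by the free room in the SLL, and by what the slab still has, so `used` never grows by more than what the SLL can hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, count-only stand-ins; not hakmem's real structures. */
typedef struct {
    size_t count;     /* blocks currently held in the thread-local SLL */
    size_t capacity;  /* maximum number of blocks the SLL may hold */
} sll_t;

typedef struct {
    size_t used;      /* blocks handed out from the slab (analogue of meta->used) */
    size_t total;     /* total number of blocks in the slab */
} slab_meta_t;

/*
 * Move up to max_take blocks from the slab into the SLL.
 * The take is clamped to the SLL's free room and to the slab's
 * remaining blocks, so no block is carved out without a place to go.
 * Returns the number of blocks actually moved.
 */
static size_t refill_capped(sll_t *sll, slab_meta_t *meta, size_t max_take)
{
    assert(sll->count <= sll->capacity);
    assert(meta->used <= meta->total);

    size_t room  = sll->capacity - sll->count;  /* free capacity in the SLL */
    size_t avail = meta->total - meta->used;    /* blocks left in the slab */

    size_t take = max_take;
    if (take > room)  take = room;
    if (take > avail) take = avail;

    sll->count += take;
    meta->used += take;  /* incremented only by what was really taken */
    return take;
}
```

With an SLL holding 10 of 16 blocks and a request for 32, this sketch moves 6 blocks and bumps `used` by 6; a second call on the now-full SLL moves nothing and leaves `used` untouched.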

What remains bench-only (NOT promoted)

  • SLL-only front (Magazine compiled out) and TLS warmup
    • These are highly benchmark-specific and are kept in bench-only builds.
  • Free-side SLL-first push without owner/stats
    • Mainline preserves learning/stats semantics; bench builds can cut them out.
  • Quick/FrontCache / 32/64 specialization hardwiring
    • Retained as A/B options; not enabled by default in mainline.

Recommended “PerfMain” preset (opt-in, no bench macros)

  • Environment (Tiny-Hot biased but general):
    • HAKMEM_TINY_TLS_SLL=1
    • HAKMEM_TINY_REFILL_MAX=96
    • HAKMEM_TINY_REFILL_MAX_HOT=192
    • HAKMEM_TINY_SPILL_HYST=16
    • HAKMEM_TINY_BG_REMOTE=0 (keep targeted remote drain off by default)
    • Keep Quick/FrontCache/Ultra OFF unless explicitly A/B tested
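
As an illustration of how integer knobs like these are typically consumed, here is a small C sketch. It is an assumption, not hakmem's parser: `parse_knob`, `env_long`, `tiny_knobs_t`, and `load_perfmain_knobs` are hypothetical names, and the fallback values are simply the preset values listed above, not hakmem's built-in defaults. Only the environment variable names come from the preset.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal integer, or return fallback
 * when the text is missing, empty, malformed, or out of range. */
static long parse_knob(const char *text, long fallback)
{
    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return fallback;  /* overflow, no digits, or trailing junk */
    return value;
}

/* Hypothetical helper: read one knob from the environment. */
static long env_long(const char *name, long fallback)
{
    assert(name != NULL);
    return parse_knob(getenv(name), fallback);
}

/* Hypothetical container for the knobs named in the preset. */
typedef struct {
    long tls_sll;
    long refill_max;
    long refill_max_hot;
    long spill_hyst;
    long bg_remote;
} tiny_knobs_t;

/* Fallbacks are the PerfMain preset values from the list above
 * (an assumption for this sketch, not hakmem's real defaults). */
static tiny_knobs_t load_perfmain_knobs(void)
{
    tiny_knobs_t k;
    k.tls_sll        = env_long("HAKMEM_TINY_TLS_SLL", 1);
    k.refill_max     = env_long("HAKMEM_TINY_REFILL_MAX", 96);
    k.refill_max_hot = env_long("HAKMEM_TINY_REFILL_MAX_HOT", 192);
    k.spill_hyst     = env_long("HAKMEM_TINY_SPILL_HYST", 16);
    k.bg_remote      = env_long("HAKMEM_TINY_BG_REMOTE", 0);
    return k;
}
```

Falling back on malformed input, rather than aborting, keeps a mistyped knob from breaking a process that loads the allocator; whether hakmem itself behaves this way is not stated in this document.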

How to try PerfMain locally

  • Build benches: make bench_fast
  • Run tiny-hot triad (no bench macros): bash scripts/run_tiny_hot_triad.sh 60000
  • Run random-mixed matrix: bash scripts/run_random_mixed_matrix.sh 100000

Notes

  • LD_PRELOAD/app mode remains conservative (LD_SAFE staging). Tiny-only and pass-through modes are recommended for stability. The bench-only optimizations are intentionally not applied in LD mode.