hakmem/docs/archive/SACS_3_OVERVIEW.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


SACS3 Overview (Phase 6.16)

Goal: keep the hot path predictable and fast by deciding tiers by size only, and let ACE optimize just the cache layer (L1) dynamically.

Tiers

  • L0 Tiny (≤ 1 KiB)
    • TLS magazine, TLS Active Slab, MPSC remote free.
  • L1 ACE (1 KiB < size < 2 MiB)
    • MidPool: 2/4/8/16/32 KiB (5 classes)
    • LargePool: 64/128/256/512 KiB/1 MiB (5 classes)
    • W_MAX rounding: class c accepted if c ≤ W_MAX × size.
    • Gap 32–64 KiB absorbed by rounding to 64 KiB when within W_MAX.
  • L2 Big (≥ 2 MiB)
    • BigCache + mmap, THP gate.

Hot Path (hak_alloc_at)

if (size ≤ 1 KiB)       return tinyslab_alloc(size);          // L0
if (size <  2 MiB)      return hkm_ace_alloc(size, site, P);  // L1
else                    return bigcache/mmap;                 // L2

Where P is a FrozenPolicy snapshot (RCU-published); the hot path reads it once per call.

ACE = “Smart Cache” for L1

What ACE does (off hot path):

  • CAP: per-class budget (front-loading) for Mid/Large.
  • Site/class→shard mapping: improve locality and reduce contention.
  • Free policy per class: KEEP / delayed MADV_FREE / batched DONTNEED.
  • W_MAX: choose among candidates (e.g., {1.25, 1.5, 1.75}) via CANARY.
  • BATCH_RANGE: pick from a few candidates.
  • All decisions baked into a FrozenPolicy, published once per window.

Profiling & Overhead Tracking

Enable sampling profiler:

export HAKMEM_PROF=1
export HAKMEM_PROF_SAMPLE=10   # sampling shift: 1 in 2^10 = 1/1024

Key categories: tiny_alloc, ace_alloc, malloc_alloc, mmap_alloc, plus internals like pool_lock/refill, l25_lock/refill, and tiny internals.

Sweep helper:

scripts/prof_sweep.sh -d 2 -t 1,4 -s 8

Roadmap

  1. CAP tuning (static → learning):
    • Observe L1 malloc_alloc avg ns and L1 fallback rate per range.
    • Increase CAP where hit < target; decrease where overshoot.
  2. W_MAX tuning per tier (Mid vs Large) with guard rails.
  3. Shard routing via FrozenPolicy (reduce lock contention).
  4. Publish policy via hkm_policy_publish() on window boundaries (RCU).

Learning Axes & Controls

  • Axes:
    • Threshold (mmap/L1↔L2), Class Count (containers), Class Shape (boundaries + W_MAX), Class Volume (CAP)
  • Implemented now:
    • Soft CAP gating on Mid/L2.5 refills (CAP over → bundle=1, under → up to 4)
    • Learner thread adjusts mid_cap[]/large_cap[] by target hit rate (hysteresis, budget)
  • Env controls:
    • HAKMEM_LEARN=1, HAKMEM_LEARN_WINDOW_MS, HAKMEM_TARGET_HIT_MID/LARGE, HAKMEM_CAP_STEP_MID/LARGE, HAKMEM_BUDGET_MID/LARGE
    • HAKMEM_CAP_MID/LARGE (manual override), HAKMEM_WMAX_MID/LARGE
    • Planned: HAKMEM_WRAP_L2/L25, HAKMEM_MID_DYN1

Inline Policy (Hot Path)

  • Size-only tier selection; no syscalls on hot path.
  • Static inline + LUT for O(1) class mapping.
  • One FrozenPolicy load per call; otherwise read-only.
  • Site Rules stays off hot path (layer-internal hints only in future).