hakmem/docs/archive/SACS_3_OVERVIEW.md
Commit 52386401b3 by Moe Charm (CI): Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


# SACS3 Overview (Phase 6.16)
Goal: keep the hot path predictable and fast by deciding tiers by size only, while letting ACE dynamically optimize just the cache layer (L1).
## Tiers
- L0 Tiny (≤ 1 KiB)
  - TLS magazine, TLS Active Slab, MPSC remote-free.
- L1 ACE (1 KiB < size < 2 MiB)
  - MidPool: 2/4/8/16/32 KiB (5 classes)
  - LargePool: 64/128/256/512 KiB / 1 MiB (5 classes)
  - W_MAX rounding: class `c` is accepted if `c ≤ W_MAX × size`.
  - The 32–64 KiB gap is absorbed by rounding up to 64 KiB when within W_MAX (see the sketch after this list).
- L2 Big (≥ 2 MiB)
  - BigCache + mmap, THP gate.
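As a concrete illustration of the rounding rule (a minimal sketch; the helper name and class table are hypothetical, only the `c ≤ W_MAX × size` acceptance test comes from the design above):
```
#include <stddef.h>

/* L1 class sizes in bytes: MidPool 2..32 KiB, LargePool 64 KiB..1 MiB. */
static const size_t l1_classes[10] = {
    2u << 10, 4u << 10, 8u << 10, 16u << 10, 32u << 10,
    64u << 10, 128u << 10, 256u << 10, 512u << 10, 1u << 20,
};

/* Round `size` up to the smallest class and accept it only if the waste
 * bound holds: class <= W_MAX * size. Returns 0 if no class qualifies.
 * Sizes <= 1 KiB never reach this path (they are handled by L0 Tiny). */
static size_t l1_pick_class(size_t size, double w_max)
{
    for (int i = 0; i < 10; i++) {
        if (l1_classes[i] >= size)
            return ((double)l1_classes[i] <= w_max * (double)size)
                       ? l1_classes[i] : 0;
    }
    return 0; /* >= 2 MiB: not an L1 size */
}
/* Example: size = 48 KiB with W_MAX = 1.5 picks the 64 KiB class (64 <= 72),
 * so the 32-64 KiB gap is absorbed; with W_MAX = 1.25 the same request is
 * rejected (64 > 60) and falls through. */
```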
## Hot Path (hak_alloc_at)
```
if (size ≤ 1 KiB) return tinyslab_alloc(size);
if (size < 2 MiB) return hkm_ace_alloc(size, site, P); // L1
else return bigcache/mmap; // L2
```
Where `P` is a `FrozenPolicy` snapshot (RCU-published); the hot path reads it once per call.
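A minimal sketch of that per-call snapshot read, assuming a single global policy pointer (the names `g_policy` and `hkm_policy_snapshot` are illustrative, not the actual API):
```
#include <stdatomic.h>

struct FrozenPolicy;  /* immutable once published */

/* Global snapshot pointer: the learner publishes it, the hot path only reads. */
static _Atomic(const struct FrozenPolicy *) g_policy;

static inline const struct FrozenPolicy *hkm_policy_snapshot(void)
{
    /* One acquire load per allocation call; the snapshot itself is
     * read-only, so no further synchronization is needed on the hot path. */
    return atomic_load_explicit(&g_policy, memory_order_acquire);
}
```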
## ACE = “Smart Cache” for L1
What ACE does (off hot path):
- CAP: per-class budget (front-loading) for Mid/Large.
- Site/class-shard: fix locality and reduce contention.
- Free policy per class: KEEP / delayed MADV_FREE / batched DONTNEED.
- W_MAX candidates: choose (e.g., {1.25, 1.5, 1.75}) via CANARY.
- BATCH_RANGE: pick from a few candidates.
- All decisions baked into a `FrozenPolicy`, published once per window.
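One possible shape for such a snapshot (field names and sizes are illustrative; the set of knobs simply mirrors the list above):
```
#include <stdint.h>

/* Illustrative only: one immutable decision set, rebuilt off the hot path
 * and published once per window. */
enum hkm_free_policy { HKM_KEEP, HKM_MADV_FREE_DELAYED, HKM_DONTNEED_BATCHED };

struct FrozenPolicy {
    uint32_t mid_cap[5];                  /* per-class CAP, MidPool             */
    uint32_t large_cap[5];                /* per-class CAP, LargePool           */
    float    w_max_mid, w_max_large;      /* chosen from the CANARY candidates  */
    enum hkm_free_policy free_policy[10]; /* KEEP / MADV_FREE / DONTNEED        */
    uint16_t batch;                       /* picked from BATCH_RANGE candidates */
    uint8_t  shard_of[64];                /* site/class -> shard routing        */
    uint64_t window_id;                   /* publication window                 */
};
```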
## Profiling & Overhead Tracking
Enable sampling profiler:
```
export HAKMEM_PROF=1
export HAKMEM_PROF_SAMPLE=10 # sample 1/1024 (value is the power-of-two exponent)
```
Key categories: `tiny_alloc`, `ace_alloc`, `malloc_alloc`, `mmap_alloc`, plus internals like `pool_lock/refill`, `l25_lock/refill`, and tiny internals.
Sweep helper:
```
scripts/prof_sweep.sh -d 2 -t 1,4 -s 8
```
## Roadmap
1. CAP tuning (static learning):
   - Observe L1 `malloc_alloc` avg ns and the L1 fallback rate per size range.
   - Increase CAP where the hit rate is below target; decrease where it overshoots.
2. W_MAX tuning per tier (Mid vs Large) with guard rails.
3. Shard routing via FrozenPolicy (reduce lock contention).
4. Publish policy via `hkm_policy_publish()` on window boundaries (RCU); a sketch follows this list.
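Roadmap item 4 names `hkm_policy_publish()`; a minimal sketch of what such a window-boundary publish could look like (the pointer name and signature are assumptions, only the function name and the RCU-style single-pointer publication come from the text):
```
#include <stdatomic.h>

struct FrozenPolicy;  /* immutable snapshot, as sketched in the ACE section */

/* The pointer the hot path snapshots once per call (see the Hot Path sketch). */
static _Atomic(const struct FrozenPolicy *) g_policy;

/* Called by the learner at a window boundary, never on the hot path.
 * `next` must be fully initialized and stay immutable while published. */
void hkm_policy_publish(const struct FrozenPolicy *next)
{
    atomic_store_explicit(&g_policy, next, memory_order_release);
}
```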
## Learning Axes & Controls
- Axes:
  - Threshold (mmap / L1–L2), Class Count (containers), Class Shape (boundaries + W_MAX), Class Volume (CAP)
- Implemented now:
  - Soft CAP gating on Mid/L2.5 refills (over CAP: bundle = 1; under CAP: bundle up to 4)
  - Learner thread adjusts `mid_cap[]`/`large_cap[]` toward a target hit rate (hysteresis, budget); see the sketch after this list
- Env controls:
  - `HAKMEM_LEARN=1`, `HAKMEM_LEARN_WINDOW_MS`, `HAKMEM_TARGET_HIT_MID/LARGE`, `HAKMEM_CAP_STEP_MID/LARGE`, `HAKMEM_BUDGET_MID/LARGE`
  - `HAKMEM_CAP_MID/LARGE` (manual override), `HAKMEM_WMAX_MID/LARGE`
- Planned: `HAKMEM_WRAP_L2/L25`, `HAKMEM_MID_DYN1`
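A rough sketch of one learner step for a single class (the dead band, step handling, and helper name are assumptions; only the target hit rate, hysteresis, and budget ideas come from the text):
```
#include <stdint.h>

/* Per-window adjustment of one class's CAP toward a target hit rate.
 * Hysteresis: only move when the hit rate is outside a small dead band.
 * Budget: growth is allowed only while the global budget has headroom. */
static uint32_t learn_cap_step(uint32_t cap, double hit_rate, double target,
                               uint32_t step, uint32_t cap_min, uint32_t cap_max,
                               uint32_t budget_left)
{
    const double dead_band = 0.02;                   /* illustrative */
    if (hit_rate < target - dead_band && budget_left >= step)
        cap += step;                                 /* misses too often: grow  */
    else if (hit_rate > target + dead_band && cap >= cap_min + step)
        cap -= step;                                 /* overshooting: shrink    */
    if (cap > cap_max)
        cap = cap_max;
    return cap;
}
```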
## Inline Policy (Hot Path)
- Size-only tier selection; no syscalls on hot path.
- Static inline + LUT for O(1) class mapping.
- One `FrozenPolicy` load per call; otherwise read-only.
- Site Rules stays off the hot path (future: layer-internal hints only).
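A minimal sketch of the LUT idea, assuming 1 KiB granularity (the table size and names are illustrative):
```
#include <stdint.h>
#include <stddef.h>

/* Filled once at startup from the class boundaries (and the chosen W_MAX);
 * each entry holds the class index for sizes in that 1 KiB granule. */
static uint8_t g_class_lut[2048];   /* covers 1 KiB < size < 2 MiB */

static inline unsigned l1_class_of(size_t size)
{
    /* One table load; no loops or data-dependent branches on the hot path. */
    return g_class_lut[(size - 1) >> 10];
}
```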