hakmem/docs/archive/SACS_3_OVERVIEW.md

# SACS‑3 Overview (Phase 6.16)

Goal: keep the hot path predictable and fast by deciding tiers by size only, and let ACE optimize just the cache layer (L1) dynamically.

## Tiers

- L0 Tiny (≤ 1 KiB)
  - TLS magazine, TLS Active Slab, MPSC remote‑free.
- L1 ACE (1 KiB < size < 2 MiB)
  - MidPool: 2/4/8/16/32 KiB (5 classes)
  - LargePool: 64/128/256/512 KiB/1 MiB (5 classes)
  - W_MAX rounding: class `c` accepted if `c ≤ W_MAX × size`.
  - Gap 32–64 KiB absorbed by rounding to 64 KiB when within W_MAX.
- L2 Big (≥ 2 MiB)
  - BigCache + mmap, THP gate.
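
The W_MAX acceptance rule above can be sketched as follows. This is a minimal illustration, not the allocator's actual code: `pick_class`, `k_classes`, and the convention of returning 0 on rejection are assumptions made for the example, and what the real allocator does with a rejected request is not specified here.

```c
#include <stddef.h>

/* Class sizes as listed above, ascending: MidPool first, then LargePool. */
static const size_t k_classes[] = {
    2u << 10, 4u << 10, 8u << 10, 16u << 10, 32u << 10,          /* Mid   */
    64u << 10, 128u << 10, 256u << 10, 512u << 10, 1024u << 10   /* Large */
};
#define K_NUM_CLASSES (sizeof k_classes / sizeof k_classes[0])

/* Hypothetical helper: returns the class size chosen for `size`,
 * or 0 when no class satisfies c <= w_max * size. */
static size_t pick_class(size_t size, double w_max)
{
    for (size_t i = 0; i < K_NUM_CLASSES; i++) {
        size_t c = k_classes[i];
        if (c < size)
            continue;  /* too small to hold the request */
        /* c is the smallest class that fits; accept it only if the
         * rounding stays within the W_MAX bound. */
        return ((double)c <= w_max * (double)size) ? c : 0;
    }
    return 0;  /* larger than the biggest class */
}
```

With `w_max = 1.5`, a 48 KiB request is rounded up to the 64 KiB class (64 ≤ 1.5 × 48), while a 40 KiB request is rejected (64 > 1.5 × 40), which is the 32–64 KiB gap behaviour described above.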

## Hot Path (hak_alloc_at)

```
if (size ≤ 1 KiB)      return tinyslab_alloc(size);
if (size <  2 MiB)      return hkm_ace_alloc(size, site, P);  // L1
else                    return bigcache/mmap;                 // L2
```

Here `P` is a `FrozenPolicy` snapshot (RCU‑published); the hot path reads it once per call.
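
One common way to get "read once per call" semantics is a single atomic pointer that the writer swaps and the reader loads with acquire ordering. The sketch below assumes C11 atomics; the struct fields, `g_policy`, `policy_publish`, and `policy_load` are hypothetical names for illustration, and reclamation of old snapshots (the grace-period part of RCU) is deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative fields only; the real FrozenPolicy layout is not shown here. */
typedef struct FrozenPolicy {
    double w_max_mid;
    double w_max_large;
} FrozenPolicy;

/* Hypothetical global holding the current snapshot. */
static _Atomic(const FrozenPolicy *) g_policy;

/* Writer side (off the hot path): publish a fully initialized snapshot.
 * Release ordering makes the snapshot's fields visible before the pointer. */
static void policy_publish(const FrozenPolicy *next)
{
    atomic_store_explicit(&g_policy, next, memory_order_release);
}

/* Reader side (hot path): one acquire load per call; the returned
 * snapshot is treated as read-only for the rest of the call. */
static const FrozenPolicy *policy_load(void)
{
    return atomic_load_explicit(&g_policy, memory_order_acquire);
}
```

A caller would load the pointer once at the top of the allocation path and pass it down, rather than re-reading the global in each helper.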

## ACE = “Smart Cache” for L1

What ACE does (off hot path):

- CAP: per‑class budget (front‑loading) for Mid/Large.
- Site/class→shard mapping: pin locality and reduce contention.
- Free policy per class: KEEP / delayed MADV_FREE / batched DONTNEED.
- W_MAX candidates: choose among a few values (e.g., {1.25, 1.5, 1.75}) via CANARY.
- BATCH_RANGE: pick from a few candidates.
- All decisions baked into a `FrozenPolicy`, published once per window.

## Profiling & Overhead Tracking

Enable sampling profiler:

```
export HAKMEM_PROF=1
export HAKMEM_PROF_SAMPLE=10   # sample 1/1024
```
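
The comment above suggests that the value is a power-of-two exponent (2^10 = 1024). Assuming that reading, a sampling check can be as cheap as one mask and compare. `prof_should_sample` is a hypothetical helper, not a function taken from the code base.

```c
#include <stdint.h>

/* Hypothetical: returns 1 when call number `n` should be sampled,
 * i.e. once every 2^shift calls. Requires shift < 64. */
static inline int prof_should_sample(uint64_t n, unsigned shift)
{
    uint64_t mask = (UINT64_C(1) << shift) - 1;
    return (n & mask) == 0;
}
```

Under this assumption, `shift = 10` samples calls 0, 1024, 2048, and so on.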

Key categories: `tiny_alloc`, `ace_alloc`, `malloc_alloc`, `mmap_alloc`, plus internals like `pool_lock/refill`, `l25_lock/refill`, and tiny internals.

Sweep helper:

```
scripts/prof_sweep.sh -d 2 -t 1,4 -s 8
```

## Roadmap

1. CAP tuning (static → learning):
   - Observe L1 `malloc_alloc` avg ns and L1 fallback rate per range.
   - Increase CAP where the hit rate is below target; decrease it where the hit rate overshoots.
2. W_MAX tuning per tier (Mid vs Large) with guard rails.
3. Shard routing via FrozenPolicy (reduce lock contention).
4. Publish policy via `hkm_policy_publish()` on window boundaries (RCU).

## Learning Axes & Controls

- Axes:
  - Threshold (mmap/L1↔L2), Class Count (containers), Class Shape (boundaries + W_MAX), Class Volume (CAP)
- Implemented now:
  - Soft CAP gating on Mid/L2.5 refills (CAP over → bundle=1, under → up to 4)
  - Learner thread adjusts `mid_cap[]/large_cap[]` by target hit rate (hysteresis, budget)
- Env controls:
  - `HAKMEM_LEARN=1`, `HAKMEM_LEARN_WINDOW_MS`, `HAKMEM_TARGET_HIT_MID/LARGE`, `HAKMEM_CAP_STEP_MID/LARGE`, `HAKMEM_BUDGET_MID/LARGE`
  - `HAKMEM_CAP_MID/LARGE` (manual override), `HAKMEM_WMAX_MID/LARGE`
  - Planned: `HAKMEM_WRAP_L2/L25`, `HAKMEM_MID_DYN1`
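
The learner's per-window adjustment (target hit rate, hysteresis, budget) could look roughly like the sketch below. Everything here is an assumption made for illustration: the function name, the dead-band form of hysteresis, and the reading of "budget" as an upper bound on the sum of all CAPs in a pool.

```c
#include <stddef.h>

/* Hypothetical per-window update for one pool's CAP array.
 *   hit[i]   : observed hit rate of class i in the last window (0..1)
 *   target   : desired hit rate (cf. HAKMEM_TARGET_HIT_MID/LARGE)
 *   deadband : hysteresis half-width; no change inside target +/- deadband
 *   step     : CAP change per window (cf. HAKMEM_CAP_STEP_MID/LARGE)
 *   budget   : assumed bound on the sum of CAPs (cf. HAKMEM_BUDGET_MID/LARGE)
 */
static void cap_learn_step(int *cap, const double *hit, size_t n,
                           double target, double deadband,
                           int step, int budget)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += cap[i];

    for (size_t i = 0; i < n; i++) {
        if (hit[i] < target - deadband) {
            /* Under target: grow, but only while the budget allows it. */
            if (total + step <= budget) {
                cap[i] += step;
                total += step;
            }
        } else if (hit[i] > target + deadband) {
            /* Over target: shrink, never below zero. */
            int dec = cap[i] < step ? cap[i] : step;
            cap[i] -= dec;
            total -= dec;
        }
        /* Inside the dead band: leave CAP unchanged to avoid oscillation. */
    }
}
```

The dead band is what keeps a class whose hit rate hovers near the target from flipping its CAP up and down on every window.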

## Inline Policy (Hot Path)

- Size-only tier selection; no syscalls on hot path.
- Static inline + LUT for O(1) class mapping.
- One `FrozenPolicy` load per call; otherwise read-only.
- Site Rules stays off the hot path (in future, layer-internal hints only).
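
As an illustration of "static inline + LUT", the sketch below maps a size to a MidPool class index with one shift and one table read. The table, the 2 KiB granule, and the name `mid_class_index` are assumptions for the example; it returns the smallest class that fits and does not apply the W_MAX check.

```c
#include <stddef.h>
#include <stdint.h>

#define MID_MAX (32u << 10)  /* largest MidPool class, 32 KiB */

/* One entry per 2 KiB granule of (size - 1); values index the
 * MidPool classes 2/4/8/16/32 KiB as 0..4. */
static const uint8_t k_mid_lut[16] = {
    0,                      /*     1 ..  2048 -> 2 KiB  */
    1,                      /*  2049 ..  4096 -> 4 KiB  */
    2, 2,                   /*  4097 ..  8192 -> 8 KiB  */
    3, 3, 3, 3,             /*  8193 .. 16384 -> 16 KiB */
    4, 4, 4, 4, 4, 4, 4, 4  /* 16385 .. 32768 -> 32 KiB */
};

/* Hypothetical helper: O(1) class lookup, or -1 if out of MidPool range. */
static inline int mid_class_index(size_t size)
{
    if (size == 0 || size > MID_MAX)
        return -1;
    return k_mid_lut[(size - 1) >> 11];
}
```

Because the classes are powers of two, a count-leading-zeros instruction would also work; the table form is shown because it matches the LUT wording above and extends to non-power-of-two boundaries.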