# ENV Consolidation Plan (149 policy vars → ~30)
## Summary
- Current surface: ~279 ENV names read via `getenv`; the policy/tuning bucket accounts for **~149** of them.
- Guardrails: **Ultra**, **BG**, **HeapV2/FrontV2**, **SFC**, **TC/Ring** stay; consolidation must not remove A/B rollback paths.
- Goal: converge on **30 policy knobs** (plus ~20 essentials) by merging duplicated and overly fine-grained knobs and moving `getenv` calls to init-time Box boundaries.
- Docs-only debris: 21 names that exist only in proposal docs are tracked separately in `SAFE_TO_DELETE_ENV_VARS.md` and can be pruned from the docs.
## What Stays (no consolidation/removal)
- **Essential (runtime-required)**: `HAKMEM_WRAP_TINY`, `HAKMEM_TINY_USE_SUPERSLAB`, `HAKMEM_TINY_TLS_SLL`, `HAKMEM_TINY_FREE_FAST`, `HAKMEM_TINY_UNIFIED_CACHE`, `HAKMEM_SS_EMPTY_REUSE`, `HAKMEM_FRONT_GATE_UNIFIED`, `HAKMEM_POOL_TLS_FREE`, the page-arena knobs, the prewarm knobs, and the SmallMid enable flag.
- **Active feature flags**: Ultra (`HAKMEM_TINY_ULTRA_*`, `HAKMEM_ULTRA_SLIM_STATS`), BG (`HAKMEM_BATCH_BG`, `HAKMEM_L25_BG_*`), HeapV2/FrontV2 (`HAKMEM_TINY_HEAP_V2_*`, `HAKMEM_TINY_FRONT_*`), SFC (`HAKMEM_SFC_*`), TC/Ring (`HAKMEM_TC_*`, `HAKMEM_POOL_TLS_RING`, `HAKMEM_RING_RETURN_DIV`, `HAKMEM_L25_RING_TRIGGER`).
- **Debug knobs**: Keep, but gate them behind init-time caches or build flags (no repeated `getenv` calls in hot paths).
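An "init-time cache" for a debug knob can be as small as one global written once at startup. The sketch below assumes a single-threaded init boundary and uses a hypothetical knob name, `HAKMEM_EXAMPLE_DEBUG`, plus hypothetical function names; it is not the existing hakmem code.

```c
#define _POSIX_C_SOURCE 200809L /* setenv() in the usage example is POSIX */
#include <stdlib.h>

/* Hypothetical debug knob name, for illustration only. */
#define EXAMPLE_DEBUG_ENV "HAKMEM_EXAMPLE_DEBUG"

/* Written once by hak_debug_knobs_init(), read-only afterwards. */
static int g_hak_debug_level = 0;

/* Called exactly once from the init boundary, before worker threads start. */
void hak_debug_knobs_init(void) {
    const char *v = getenv(EXAMPLE_DEBUG_ENV);
    g_hak_debug_level = (v != NULL && *v != '\0') ? atoi(v) : 0;
}

/* Hot-path query: a plain load and compare, no getenv. */
static inline int hak_debug_enabled(int level) {
    return g_hak_debug_level >= level;
}
```

Because the value is cached, changing the environment after init has no effect. That is the intended trade-off: hot paths never pay for `getenv`.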
## Consolidation Targets (149 → ~30)
- **Refill parameters**: 18 knobs → 3 (`HAKMEM_REFILL_BATCH`, `HAKMEM_REFILL_HOT_BATCH`, `HAKMEM_REFILL_MAX`).
- **Capacity presets**: 25 knobs → 6 (`HAKMEM_CACHE_PRESET`, `HAKMEM_TINY_CACHE_SLOTS`, `HAKMEM_MID_CACHE_SLOTS`, `HAKMEM_LARGE_CACHE_SLOTS`, `HAKMEM_SS_MEMORY_LIMIT_MB`, `HAKMEM_POOL_MEMORY_LIMIT_MB`).
- **Adaptive/learning**: 35 knobs → 8 (`HAKMEM_ADAPTIVE_MODE`, `HAKMEM_ADAPTIVE_TARGETS`, `HAKMEM_ADAPTIVE_INTERVAL_MS`, `HAKMEM_ADAPTIVE_AGGRESSIVENESS`, `HAKMEM_ADAPTIVE_MIN_CHANGE_PCT`, `HAKMEM_THP_POLICY`, `HAKMEM_WMAX_MID_KB`, `HAKMEM_WMAX_LARGE_KB`).
- **Drain/thresholds**: 20 knobs → 5 (`HAKMEM_REMOTE_DRAIN_THRESHOLD`, `HAKMEM_REMOTE_DRAIN_BATCH`, `HAKMEM_CACHE_SPILL_PCT`, `HAKMEM_MEMORY_TRIM_INTERVAL_MS`, `HAKMEM_IDLE_TIMEOUT_MS`).
- **L25/Pool tuning**: 15 knobs → 5 (`HAKMEM_L25_SIZE_ZONES`, `HAKMEM_L25_BUNDLE_SIZE`, `HAKMEM_MID_BUNDLE_SIZE`, `HAKMEM_L25_RUN_BLOCKS`, `HAKMEM_L25_RUN_FACTOR`).
- **Global presets**: 36 knobs → 3 (`HAKMEM_MODE`, `HAKMEM_FORCE_LIBC`, `HAKMEM_RSS_BUDGET_KB`).
## Proposed End-State Variables
**Essential (keep as-is, ~20)**
```
HAKMEM_WRAP_TINY=1
HAKMEM_TINY_USE_SUPERSLAB=1
HAKMEM_TINY_TLS_SLL=1
HAKMEM_TINY_FREE_FAST=1
HAKMEM_TINY_UNIFIED_CACHE=1
HAKMEM_SS_EMPTY_REUSE=1
HAKMEM_FRONT_GATE_UNIFIED=1
HAKMEM_POOL_TLS_FREE=1
HAKMEM_POOL_TLS_ARENA_MB_INIT=1
HAKMEM_POOL_TLS_ARENA_MB_MAX=8
HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS=3
HAKMEM_PAGE_ARENA_ENABLE=1
HAKMEM_PAGE_ARENA_HOT_SIZE=...
HAKMEM_PAGE_ARENA_WARM_64K=...
HAKMEM_PAGE_ARENA_WARM_128K=...
HAKMEM_PAGE_ARENA_WARM_2M=...
HAKMEM_PREWARM_SUPERSLABS=1
HAKMEM_TINY_PREWARM_ALL=1
HAKMEM_WRAP_TINY_REFILL=1
HAKMEM_SMALLMID_ENABLE=1
```
**Policy (~30 total)**
```
# Refill (3)
HAKMEM_REFILL_BATCH=64
HAKMEM_REFILL_HOT_BATCH=192
HAKMEM_REFILL_MAX=1024
# Capacity (6)
HAKMEM_CACHE_PRESET=medium
HAKMEM_TINY_CACHE_SLOTS=512
HAKMEM_MID_CACHE_SLOTS=128
HAKMEM_LARGE_CACHE_SLOTS=32
HAKMEM_SS_MEMORY_LIMIT_MB=512
HAKMEM_POOL_MEMORY_LIMIT_MB=256
# Adaptive (8)
HAKMEM_ADAPTIVE_MODE=off|observe|active
HAKMEM_ADAPTIVE_TARGETS=caps,refill,wmax
HAKMEM_ADAPTIVE_INTERVAL_MS=1000
HAKMEM_ADAPTIVE_AGGRESSIVENESS=0.1
HAKMEM_ADAPTIVE_MIN_CHANGE_PCT=5
HAKMEM_THP_POLICY=off|auto|on
HAKMEM_WMAX_MID_KB=256
HAKMEM_WMAX_LARGE_KB=2048
# Drain/Threshold (5)
HAKMEM_REMOTE_DRAIN_THRESHOLD=32
HAKMEM_REMOTE_DRAIN_BATCH=16
HAKMEM_CACHE_SPILL_PCT=90
HAKMEM_MEMORY_TRIM_INTERVAL_MS=1000
HAKMEM_IDLE_TIMEOUT_MS=5000
# L25/Pool (5)
HAKMEM_L25_SIZE_ZONES=64,256,1024
HAKMEM_L25_BUNDLE_SIZE=4
HAKMEM_MID_BUNDLE_SIZE=4
HAKMEM_L25_RUN_BLOCKS=16
HAKMEM_L25_RUN_FACTOR=2
# Global (3)
HAKMEM_MODE=fast|balanced|learning|minimal
HAKMEM_FORCE_LIBC=0|1
HAKMEM_RSS_BUDGET_KB=unlimited
```
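Two of the policy knobs above take structured values rather than single numbers: `HAKMEM_L25_SIZE_ZONES` (a comma-separated list) and `HAKMEM_RSS_BUDGET_KB` (a number or `unlimited`). A minimal parsing sketch follows, assuming zones must be strictly increasing and positive, and that `unlimited` (or an unset variable) maps to a sentinel value. The function names, `MAX_SIZE_ZONES`, and the sentinel are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE_ZONES 8                  /* hypothetical upper bound */
#define RSS_BUDGET_UNLIMITED UINT64_MAX   /* sentinel for "unlimited" */

/* Parses e.g. "64,256,1024" into zones[]. Returns the count (0 for an
   empty string) or -1 on malformed, non-increasing, or too many values. */
int parse_size_zones(const char *s, uint32_t *zones, int max) {
    int n = 0;
    uint32_t prev = 0;
    while (*s != '\0') {
        if (*s < '0' || *s > '9') return -1;  /* rejects signs, spaces, empty tokens */
        char *end;
        errno = 0;
        unsigned long v = strtoul(s, &end, 10);
        if (errno != 0 || v == 0 || v > UINT32_MAX || v <= prev || n == max)
            return -1;
        zones[n++] = (uint32_t)v;
        prev = (uint32_t)v;
        if (*end == '\0') break;
        if (*end != ',' || end[1] == '\0') return -1;  /* junk or trailing comma */
        s = end + 1;
    }
    return n;
}

/* NULL or "unlimited" maps to the sentinel; otherwise a plain decimal count. */
int parse_rss_budget_kb(const char *s, uint64_t *out) {
    if (s == NULL || strcmp(s, "unlimited") == 0) {
        *out = RSS_BUDGET_UNLIMITED;
        return 0;
    }
    if (*s < '0' || *s > '9') return -1;
    char *end;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0') return -1;
    *out = (uint64_t)v;
    return 0;
}
```

At the init boundary these would be called with `getenv("HAKMEM_L25_SIZE_ZONES")` and `getenv("HAKMEM_RSS_BUDGET_KB")`. A `-1` result is a natural place to fail fast.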
## Migration Guidelines
- **Boundary first**: move `getenv` reads into init/startup boxes and expose the derived values to hot paths through globals.
- **A/B friendly**: temporarily keep legacy names as aliases that forward to the consolidated knobs (a legacy-name → new-name mapping) so that rollback stays possible.
- **Visibility**: emit one-shot warnings when legacy names are used (Box Principle #4: visible, not noisy).
- **Fail-fast**: if a removed knob is set, fail early with a clear message instead of silently ignoring it.
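Aliasing, one-shot warnings, and fail-fast detection can all be driven by one table. The sketch below is a hypothetical shape, not the planned shim. It assumes the refill pairing implied by Example 1 (`HAKMEM_TINY_REFILL_MAX` → `HAKMEM_REFILL_BATCH`), uses a made-up removed knob (`HAKMEM_EXAMPLE_REMOVED_KNOB`), and assumes it runs single-threaded at init. The `warned` flag is not thread-safe.

```c
#define _POSIX_C_SOURCE 200809L /* setenv()/unsetenv() in the usage example */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One row per legacy name; new_name == NULL marks a removed knob. */
typedef struct {
    const char *legacy_name;
    const char *new_name;
    bool warned; /* one-shot warning flag */
} env_alias;

static env_alias g_aliases[] = {
    { "HAKMEM_TINY_REFILL_MAX",      "HAKMEM_REFILL_BATCH",     false },
    { "HAKMEM_TINY_REFILL_MAX_HOT",  "HAKMEM_REFILL_HOT_BATCH", false },
    { "HAKMEM_EXAMPLE_REMOVED_KNOB", NULL,                      false }, /* hypothetical */
};
#define N_ALIASES (sizeof g_aliases / sizeof g_aliases[0])

/* The new name wins. Otherwise forward a set legacy alias and warn once.
   Returns NULL if neither name is set. */
const char *env_get_with_alias(const char *new_name) {
    const char *v = getenv(new_name);
    if (v != NULL) return v;
    for (size_t i = 0; i < N_ALIASES; i++) {
        env_alias *a = &g_aliases[i];
        if (a->new_name == NULL || strcmp(a->new_name, new_name) != 0) continue;
        const char *lv = getenv(a->legacy_name);
        if (lv == NULL) continue;
        if (!a->warned) {
            a->warned = true;
            fprintf(stderr, "[hakmem] %s is deprecated; use %s\n",
                    a->legacy_name, a->new_name);
        }
        return lv;
    }
    return NULL;
}

/* Reports every removed knob that is still set and returns how many were found. */
int env_check_removed(void) {
    int found = 0;
    for (size_t i = 0; i < N_ALIASES; i++) {
        const env_alias *a = &g_aliases[i];
        if (a->new_name == NULL && getenv(a->legacy_name) != NULL) {
            fprintf(stderr, "[hakmem] %s has been removed; see ENV_VARS.md\n",
                    a->legacy_name);
            found++;
        }
    }
    return found;
}
```

The init code would call `if (env_check_removed() > 0) abort();` before reading any knobs. This keeps the fail-fast decision at the same single boundary as the alias mapping.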
### Example 1: Refill
```
# OLD (18 knobs)
HAKMEM_TINY_REFILL_MAX=64
HAKMEM_TINY_REFILL_MAX_HOT=192
HAKMEM_TINY_REFILL_COUNT=32
HAKMEM_TINY_REFILL_COUNT_HOT=96
HAKMEM_TINY_REFILL_COUNT_MID=48
# ... 13 more ...
# NEW (3 knobs)
HAKMEM_REFILL_BATCH=64
HAKMEM_REFILL_HOT_BATCH=192
HAKMEM_REFILL_MAX=1024
```
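On the reading side, the three refill knobs can be resolved into one struct at init. This sketch treats the example values above (64/192/1024) as defaults and checks each new name before its legacy alias. It also adds a clamp so that no batch exceeds `HAKMEM_REFILL_MAX`. The defaults, the legacy pairing, and that clamp are all assumptions rather than decided behavior, and `refill_config`/`refill_config_load` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L /* setenv()/unsetenv() in the usage example */
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned batch;     /* HAKMEM_REFILL_BATCH */
    unsigned hot_batch; /* HAKMEM_REFILL_HOT_BATCH */
    unsigned max;       /* HAKMEM_REFILL_MAX */
} refill_config;

/* Returns the first name in `names` that is set, parsed as a positive
   decimal integer; falls back to `def` if none is set or the value is bad. */
static unsigned env_uint_first(const char *const *names, size_t n, unsigned def) {
    for (size_t i = 0; i < n; i++) {
        const char *s = getenv(names[i]);
        if (s == NULL) continue;
        if (*s < '0' || *s > '9') return def;
        char *end;
        errno = 0;
        unsigned long v = strtoul(s, &end, 10);
        if (errno != 0 || *end != '\0' || v == 0 || v > UINT_MAX) return def;
        return (unsigned)v;
    }
    return def;
}

refill_config refill_config_load(void) {
    /* New name first, then the assumed legacy alias from Example 1. */
    static const char *const batch_names[] = { "HAKMEM_REFILL_BATCH", "HAKMEM_TINY_REFILL_MAX" };
    static const char *const hot_names[]   = { "HAKMEM_REFILL_HOT_BATCH", "HAKMEM_TINY_REFILL_MAX_HOT" };
    static const char *const max_names[]   = { "HAKMEM_REFILL_MAX" };
    refill_config c;
    c.batch     = env_uint_first(batch_names, 2, 64);
    c.hot_batch = env_uint_first(hot_names, 2, 192);
    c.max       = env_uint_first(max_names, 1, 1024);
    /* Assumed invariant: no batch may exceed the overall cap. */
    if (c.batch > c.max) c.batch = c.max;
    if (c.hot_batch > c.max) c.hot_batch = c.max;
    return c;
}
```

The remaining 13 legacy refill knobs would be added as extra entries in the per-field name lists, or dropped. Which of the two applies is exactly what the alias table must record.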
### Example 2: Capacity presets
```
# OLD (30+ knobs)
HAKMEM_TINY_MAG_CAP=512
HAKMEM_TINY_MAG_CAP_C0=256
HAKMEM_TINY_MAG_CAP_C1=256
# ... 27 more ...
# NEW (preset)
HAKMEM_CACHE_PRESET=medium
# OR custom
HAKMEM_CACHE_PRESET=custom
HAKMEM_TINY_CACHE_SLOTS=512
HAKMEM_MID_CACHE_SLOTS=128
HAKMEM_LARGE_CACHE_SLOTS=32
HAKMEM_SS_MEMORY_LIMIT_MB=512
HAKMEM_POOL_MEMORY_LIMIT_MB=256
```
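Preset expansion could look like the following sketch. It assumes that `medium` expands to exactly the values listed under "Proposed End-State Variables" and that an unset preset means `medium`. It also assumes that the individual slot and limit knobs are honored only under `custom`, starting from the `medium` values. Any other preset name is rejected so that init can fail fast. The struct and function names are hypothetical, and other preset names (if any) are not defined here.

```c
#define _POSIX_C_SOURCE 200809L /* setenv()/unsetenv() in the usage example */
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned tiny_slots, mid_slots, large_slots;
    unsigned ss_limit_mb, pool_limit_mb;
} cache_caps;

/* Assumption: "medium" equals the end-state example values. */
static const cache_caps k_preset_medium = { 512, 128, 32, 512, 256 };

/* Returns def if the knob is unset or does not start with a digit. */
static unsigned env_or(const char *name, unsigned def) {
    const char *s = getenv(name);
    if (s == NULL || *s < '0' || *s > '9') return def;
    return (unsigned)strtoul(s, NULL, 10);
}

/* Returns 0 on success, -1 for an unknown preset name. */
int cache_caps_load(cache_caps *out) {
    const char *p = getenv("HAKMEM_CACHE_PRESET");
    if (p == NULL || strcmp(p, "medium") == 0) {
        *out = k_preset_medium;
        return 0;
    }
    if (strcmp(p, "custom") == 0) {
        /* custom: start from medium, then apply individual overrides */
        cache_caps c = k_preset_medium;
        c.tiny_slots    = env_or("HAKMEM_TINY_CACHE_SLOTS", c.tiny_slots);
        c.mid_slots     = env_or("HAKMEM_MID_CACHE_SLOTS", c.mid_slots);
        c.large_slots   = env_or("HAKMEM_LARGE_CACHE_SLOTS", c.large_slots);
        c.ss_limit_mb   = env_or("HAKMEM_SS_MEMORY_LIMIT_MB", c.ss_limit_mb);
        c.pool_limit_mb = env_or("HAKMEM_POOL_MEMORY_LIMIT_MB", c.pool_limit_mb);
        *out = c;
        return 0;
    }
    return -1;
}
```

Starting `custom` from a known preset means a user only needs to set the slots they actually want to change.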
### Example 3: Adaptive
```
# OLD (ACE/INT/DYN/WMAX/THP spread across 35 knobs)
HAKMEM_ACE_ENABLED=1
HAKMEM_INT_ADAPT_REFILL=1
HAKMEM_INT_ADAPT_CAPS=1
HAKMEM_WMAX_LEARN=1
# ... 31 more ...
# NEW (4-8 knobs depending on depth)
HAKMEM_ADAPTIVE_MODE=active
HAKMEM_ADAPTIVE_TARGETS=caps,refill,wmax
HAKMEM_ADAPTIVE_INTERVAL_MS=1000
HAKMEM_ADAPTIVE_AGGRESSIVENESS=0.1
```
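`HAKMEM_ADAPTIVE_MODE` and `HAKMEM_ADAPTIVE_TARGETS` replace many independent on/off knobs (such as `HAKMEM_INT_ADAPT_REFILL` and `HAKMEM_INT_ADAPT_CAPS`) with one enum and one bitmask. The sketch below assumes that an unset mode means `off`, that unknown values are reported as invalid rather than guessed, and that the target set is exactly `caps`, `refill`, and `wmax`. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { ADAPT_OFF, ADAPT_OBSERVE, ADAPT_ACTIVE, ADAPT_INVALID } adaptive_mode;

enum { TARGET_CAPS = 1u << 0, TARGET_REFILL = 1u << 1, TARGET_WMAX = 1u << 2 };

adaptive_mode parse_adaptive_mode(const char *s) {
    if (s == NULL || strcmp(s, "off") == 0) return ADAPT_OFF;
    if (strcmp(s, "observe") == 0) return ADAPT_OBSERVE;
    if (strcmp(s, "active") == 0) return ADAPT_ACTIVE;
    return ADAPT_INVALID;
}

/* "caps,refill,wmax" -> bitmask. Returns -1 on an unknown or empty token. */
int parse_adaptive_targets(const char *s, unsigned *mask) {
    static const struct { const char *name; unsigned bit; } k_targets[] = {
        { "caps", TARGET_CAPS }, { "refill", TARGET_REFILL }, { "wmax", TARGET_WMAX },
    };
    *mask = 0;
    if (s == NULL) return 0;
    while (*s != '\0') {
        size_t len = strcspn(s, ",");  /* length of the current token */
        int matched = 0;
        for (size_t i = 0; i < sizeof k_targets / sizeof k_targets[0]; i++) {
            if (strlen(k_targets[i].name) == len && strncmp(s, k_targets[i].name, len) == 0) {
                *mask |= k_targets[i].bit;
                matched = 1;
                break;
            }
        }
        if (!matched) return -1;
        s += len;
        if (*s == ',') s++;
    }
    return 0;
}
```

With this shape, each learner checks a single bit (for example `mask & TARGET_REFILL`) instead of reading its own ENV variable.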
## Rollout Plan
1. Add a compatibility shim that maps legacy names to the new knobs inside `hak_core_init.inc.h` (a single boundary).
2. Emit one-shot warnings for legacy names (visible, not noisy).
3. Update the docs (`ENV_VARS.md`, `ENV_VARS_COMPLETE.md`) to show the consolidated set and the alias mapping.
4. After two releases, drop the alias layer. During the transition, keep it behind an `#ifdef` guard so it can be re-enabled for rollback.
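A compile-time guard keeps the alias fallback easy to strip or restore. In this sketch, `HAKMEM_LEGACY_ENV_ALIASES` is a hypothetical build macro that defaults to on. Building with `-DHAKMEM_LEGACY_ENV_ALIASES=0` removes the fallback but leaves call sites unchanged. The refill name pair in the usage follows Example 1 and is an assumption.

```c
#define _POSIX_C_SOURCE 200809L /* setenv()/unsetenv() in the usage example */
#include <stdlib.h>

/* Hypothetical build switch: 1 keeps legacy aliases (rollback path), 0 drops them. */
#ifndef HAKMEM_LEGACY_ENV_ALIASES
#define HAKMEM_LEGACY_ENV_ALIASES 1
#endif

/* Looks up the consolidated name; falls back to the legacy name only
   while the alias layer is compiled in. */
const char *env_lookup(const char *new_name, const char *legacy_name) {
    const char *v = getenv(new_name);
#if HAKMEM_LEGACY_ENV_ALIASES
    if (v == NULL && legacy_name != NULL)
        v = getenv(legacy_name);
#else
    (void)legacy_name; /* alias layer removed: legacy names are ignored here */
#endif
    return v;
}
```

Because the signature does not change, dropping the alias layer is a one-line build change, and restoring it for a rollback is equally cheap.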
## Open Questions
- Which legacy knobs need long-term support for external A/B testing? (Mark them in the alias table.)
- Should `HAKMEM_FORCE_LIBC_ALLOC*` be absorbed into `HAKMEM_FORCE_LIBC` or kept split?
- Confirm whether per-class fastcache caps (`HAKMEM_TINY_FAST_CAP_C*`) still need independent overrides after consolidation.