diff --git a/CONFIGURATION.md b/CONFIGURATION.md
new file mode 100644
index 00000000..d7c3114a
--- /dev/null
+++ b/CONFIGURATION.md
@@ -0,0 +1,392 @@
+# HAKMEM Configuration Guide
+
+**Last Updated**: 2025-11-26 (After Phase 2.2 - Learning Systems Consolidation)
+
+This guide documents all canonical HAKMEM environment variables after Phase 0-2 cleanup.
+
+---
+
+## 📋 Quick Reference
+
+Use the validation tool to check your configuration:
+
+```bash
+# Validate current environment
+./scripts/validate_config.sh
+
+# Strict mode (treat warnings as errors)
+./scripts/validate_config.sh --strict
+
+# Quiet mode (errors only)
+./scripts/validate_config.sh --quiet
+```
+
+**Deprecated variables?** See [DEPRECATED.md](DEPRECATED.md) for migration guide.
+
+---
+
+## 🎯 Core Configuration
+
+### Allocator Path Selection
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_WRAP_TINY` | 0, 1 | 1 | Enable TINY allocator (1-2048B) |
+| `HAKMEM_WRAP_POOL` | 0, 1 | 1 | Enable POOL allocator (2-8KB) |
+| `HAKMEM_WRAP_MID` | 0, 1 | 1 | Enable MID allocator (8-32KB) |
+| `HAKMEM_WRAP_LARGE` | 0, 1 | 1 | Enable LARGE allocator (>32KB) |
+
+**Example**:
+```bash
+# Disable all HAKMEM allocators (use system malloc)
+export HAKMEM_WRAP_TINY=0 HAKMEM_WRAP_POOL=0 HAKMEM_WRAP_MID=0 HAKMEM_WRAP_LARGE=0
+```
+
+---
+
+## 🐛 Debug & Diagnostics
+
+**Canonical Variables** (After P0.4 - Debug Consolidation):
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_DEBUG_LEVEL` | 0-3 | 0 | Verbosity (0=none, 1=errors, 2=info, 3=verbose) |
+| `HAKMEM_DEBUG_TINY` | 0, 1 | 0 | Enable TINY allocator debug output |
+| `HAKMEM_TRACE_ALLOCATIONS` | 0, 1 | 0 | Trace every alloc/free (expensive!) |
+| `HAKMEM_INTEGRITY_CHECKS` | 0, 1 | 1 | Enable integrity validation (canary checks) |
+
+**Examples**:
+```bash
+# Production (quiet, integrity only)
+export HAKMEM_DEBUG_LEVEL=0
+export HAKMEM_INTEGRITY_CHECKS=1
+
+# Debug session (verbose + TINY debug + tracing)
+export HAKMEM_DEBUG_LEVEL=3
+export HAKMEM_DEBUG_TINY=1
+export HAKMEM_TRACE_ALLOCATIONS=1
+export HAKMEM_INTEGRITY_CHECKS=1
+
+# Performance testing (all checks OFF)
+export HAKMEM_DEBUG_LEVEL=0
+export HAKMEM_INTEGRITY_CHECKS=0
+```
+
+---
+
+## 🏗️ SuperSlab Management
+
+**Canonical Variables** (After P0.1 - SuperSlab Unification):
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_SUPERSLAB_REUSE` | 0, 1 | 0 | Reuse empty slabs (reduces mmap/munmap syscalls) |
+| `HAKMEM_SUPERSLAB_LAZY` | 0, 1 | 1 | Lazy deallocation (Phase 9, keep slabs cached) |
+| `HAKMEM_SUPERSLAB_PREWARM` | 0-128 | 0 | Preallocate N SuperSlabs at startup |
+| `HAKMEM_SUPERSLAB_LRU_CAP` | 0-1024 | 256 | Max cached SuperSlabs (LRU eviction) |
+| `HAKMEM_SUPERSLAB_SOFT_CAP` | 0-1024 | 128 | Soft cap for SuperSlab pool (before eviction) |
+
+**Examples**:
+```bash
+# High performance (aggressive reuse + large cache)
+export HAKMEM_SUPERSLAB_REUSE=1
+export HAKMEM_SUPERSLAB_LAZY=1
+export HAKMEM_SUPERSLAB_PREWARM=16
+export HAKMEM_SUPERSLAB_LRU_CAP=512
+
+# Low memory footprint (minimal caching)
+export HAKMEM_SUPERSLAB_REUSE=0
+export HAKMEM_SUPERSLAB_LAZY=0
+export HAKMEM_SUPERSLAB_LRU_CAP=32
+export HAKMEM_SUPERSLAB_SOFT_CAP=16
+```
+
+**Note**: Phase 12 (Shared SuperSlab Pool) removed per-class registry population, making `SUPERSLAB_REUSE` less effective. Default is OFF.
+
+---
+
+## 🧠 Learning Systems
+
+**Canonical Variables** (After P2.2 - Learning Consolidation, 18→6 variables):
+
+### Allocation Learning
+Controls adaptive sizing for allocator caches (TLS, SFC, capacity tuning).
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_ALLOC_LEARN` | 0, 1 | 0 | Enable allocation pattern learning |
+| `HAKMEM_ALLOC_LEARN_WINDOW` | 1-1000000 | 10000 | Learning window size (operations) |
+| `HAKMEM_ALLOC_LEARN_RATE` | 0.0-1.0 | 0.1 | Learning rate (lower = slower adaptation) |
+
+### Memory Learning
+Controls THP (Transparent Huge Pages), RSS optimization, and max-size learning.
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_MEM_LEARN` | 0, 1 | 0 | Enable memory pattern learning (THP/RSS/WMAX) |
+| `HAKMEM_MEM_LEARN_WINDOW` | 1-1000000 | 5000 | Learning window size (operations) |
+| `HAKMEM_MEM_LEARN_THRESHOLD` | 0.0-1.0 | 0.8 | Activation threshold (80% confidence) |
+
+### Advanced Overrides
+**For troubleshooting only** - enables legacy advanced knobs that are auto-tuned by default.
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_LEARN_ADVANCED` | 0, 1 | 0 | Enable advanced override knobs (see DEPRECATED.md) |
+
+**Examples**:
+```bash
+# Production (learning disabled, use static tuning)
+export HAKMEM_ALLOC_LEARN=0
+export HAKMEM_MEM_LEARN=0
+
+# Adaptive workload (enable both learners)
+export HAKMEM_ALLOC_LEARN=1
+export HAKMEM_ALLOC_LEARN_WINDOW=20000
+export HAKMEM_ALLOC_LEARN_RATE=0.05
+export HAKMEM_MEM_LEARN=1
+export HAKMEM_MEM_LEARN_WINDOW=10000
+export HAKMEM_MEM_LEARN_THRESHOLD=0.75
+
+# Migration troubleshooting (enable advanced overrides)
+export HAKMEM_LEARN_ADVANCED=1
+export HAKMEM_LEARN_DECAY=0.95  # Override auto-tuned decay
+```
+
+**Migration Note**: See [DEPRECATED.md](DEPRECATED.md) for mapping of 18 legacy variables → 6 canonical variables.
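The value ranges in the learning tables can be checked in a wrapper script before a process is launched. The following is a minimal sketch and not part of HAKMEM: the helper names `in_int_range` and `in_unit_interval` are hypothetical, and the bounds and defaults are copied from the Allocation Learning table. `./scripts/validate_config.sh` remains the documented check.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the allocation-learning variables.
# Illustration only; bounds come from the Allocation Learning table.

# Succeeds when $1 is a non-negative integer within [$2, $3].
in_int_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Succeeds when $1 is a decimal number within [0.0, 1.0].
in_unit_interval() {
    awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v >= 0 && v <= 1) }'
}

# Fall back to the documented defaults when the variables are unset.
window="${HAKMEM_ALLOC_LEARN_WINDOW:-10000}"
rate="${HAKMEM_ALLOC_LEARN_RATE:-0.1}"

in_int_range "$window" 1 1000000 || echo "HAKMEM_ALLOC_LEARN_WINDOW out of range: $window" >&2
in_unit_interval "$rate" || echo "HAKMEM_ALLOC_LEARN_RATE out of range: $rate" >&2
```

The same two helpers would also cover `HAKMEM_MEM_LEARN_WINDOW` and `HAKMEM_MEM_LEARN_THRESHOLD`, since those share the 1-1000000 and 0.0-1.0 ranges.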
+
+---
+
+## 🎯 TINY Allocator (1-2048B)
+
+### TLS Cache Configuration
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_TINY_TLS_CAP` | 16-1024 | 64 | Per-class TLS cache capacity |
+| `HAKMEM_TINY_TLS_REFILL` | 4-256 | 16 | Batch refill size |
+| `HAKMEM_TINY_DRAIN_THRESH` | 0-1024 | 128 | Remote free drain threshold |
+
+### Super Front Cache (SFC)
+**Note**: SFC is **ACTIVE** and provides 95%+ hit rate for hot allocations.
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_TINY_SFC_ENABLE` | 0, 1 | 1 | Enable Super Front Cache (ultra-fast TLS cache) |
+| `HAKMEM_TINY_SFC_CAPACITY` | 32-512 | 128 | SFC slot count |
+| `HAKMEM_TINY_SFC_HOT_CLASSES` | 1-16 | 8 | Number of hot classes to cache |
+
+### P0 Batch Optimization
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_TINY_P0_ENABLE` | 0, 1 | 1 | Enable P0 batch refill (O(1) freelist pop) |
+| `HAKMEM_TINY_P0_BATCH` | 4-128 | 16 | P0 batch size |
+| `HAKMEM_TINY_P0_NO_DRAIN` | 0, 1 | 0 | Disable remote drain (debug only) |
+| `HAKMEM_TINY_P0_LOG` | 0, 1 | 0 | Enable P0 counter validation logging |
+
+### Header Configuration
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_TINY_HEADER_CLASSIDX` | 0, 1 | 1 | Store class_idx in header (Phase 7, enables fast free) |
+
+**Examples**:
+```bash
+# High-throughput (large caches, aggressive batching)
+export HAKMEM_TINY_TLS_CAP=256
+export HAKMEM_TINY_TLS_REFILL=32
+export HAKMEM_TINY_SFC_CAPACITY=256
+export HAKMEM_TINY_P0_ENABLE=1
+export HAKMEM_TINY_P0_BATCH=32
+
+# Low-latency (small caches, fine-grained refill)
+export HAKMEM_TINY_TLS_CAP=32
+export HAKMEM_TINY_TLS_REFILL=4
+export HAKMEM_TINY_SFC_CAPACITY=64
+export HAKMEM_TINY_P0_BATCH=8
+
+# Debug P0 issues
+export HAKMEM_TINY_P0_LOG=1
+export HAKMEM_TINY_P0_NO_DRAIN=1  # Isolate batch refill from remote free
+```
+
+---
+
+## 🏊 Pool TLS Allocator (2-8KB)
+
+### Arena Management
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_POOL_TLS_ARENA_MB_INIT` | 1-64 | 1 | Initial arena size (MB) |
+| `HAKMEM_POOL_TLS_ARENA_MB_MAX` | 1-64 | 8 | Maximum arena size (MB) |
+| `HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS` | 1-8 | 3 | Growth levels (1MB→2MB→4MB→8MB) |
+
+**Example**:
+```bash
+# Large arena for high-throughput 8KB allocations
+export HAKMEM_POOL_TLS_ARENA_MB_INIT=4
+export HAKMEM_POOL_TLS_ARENA_MB_MAX=32
+export HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS=3  # 4MB→8MB→16MB→32MB
+```
+
+---
+
+## 📊 Statistics & Profiling
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_STATS_ENABLE` | 0, 1 | 0 | Enable statistics collection |
+| `HAKMEM_STATS_VERBOSE` | 0, 1 | 0 | Verbose stats output |
+| `HAKMEM_STATS_INTERVAL_SEC` | 1-3600 | 10 | Stats reporting interval (seconds) |
+| `HAKMEM_PROFILE_SYSCALLS` | 0, 1 | 0 | Profile syscall counts (mmap/munmap/madvise) |
+
+**Example**:
+```bash
+# Enable stats for performance analysis
+export HAKMEM_STATS_ENABLE=1
+export HAKMEM_STATS_VERBOSE=1
+export HAKMEM_STATS_INTERVAL_SEC=5
+export HAKMEM_PROFILE_SYSCALLS=1
+```
+
+---
+
+## 🧪 Experimental Features
+
+**Warning**: These features are experimental and may change or be removed.
+
+| Variable | Values | Default | Description |
+|----------|--------|---------|-------------|
+| `HAKMEM_EXPERIMENTAL_ADAPTIVE_DRAIN` | 0, 1 | 0 | Adaptive remote free drain threshold |
+| `HAKMEM_EXPERIMENTAL_CACHE_TUNING` | 0, 1 | 0 | Runtime cache capacity tuning |
+
+---
+
+## 🚀 Quick Start Examples
+
+### 1. Production (Default Recommended)
+```bash
+# High performance, stable, integrity checks enabled
+export HAKMEM_SUPERSLAB_LAZY=1
+export HAKMEM_SUPERSLAB_LRU_CAP=256
+export HAKMEM_TINY_P0_ENABLE=1
+export HAKMEM_INTEGRITY_CHECKS=1
+```
+
+### 2. Debug Session
+```bash
+# Verbose logging, tracing, integrity checks
+export HAKMEM_DEBUG_LEVEL=3
+export HAKMEM_DEBUG_TINY=1
+export HAKMEM_TRACE_ALLOCATIONS=1
+export HAKMEM_INTEGRITY_CHECKS=1
+export HAKMEM_TINY_P0_LOG=1
+```
+
+### 3. Low-Latency Workload
+```bash
+# Small caches, fine-grained batching, minimal syscalls
+export HAKMEM_TINY_TLS_CAP=32
+export HAKMEM_TINY_TLS_REFILL=4
+export HAKMEM_TINY_SFC_CAPACITY=64
+export HAKMEM_SUPERSLAB_LAZY=1
+export HAKMEM_SUPERSLAB_LRU_CAP=128
+```
+
+### 4. High-Throughput Workload
+```bash
+# Large caches, aggressive batching, prewarm
+export HAKMEM_TINY_TLS_CAP=256
+export HAKMEM_TINY_TLS_REFILL=32
+export HAKMEM_TINY_SFC_CAPACITY=256
+export HAKMEM_TINY_P0_BATCH=32
+export HAKMEM_SUPERSLAB_PREWARM=16
+export HAKMEM_SUPERSLAB_LRU_CAP=512
+```
+
+### 5. Memory-Efficient (Low RSS)
+```bash
+# Minimal caching, eager deallocation
+export HAKMEM_SUPERSLAB_LAZY=0
+export HAKMEM_SUPERSLAB_LRU_CAP=32
+export HAKMEM_SUPERSLAB_SOFT_CAP=16
+export HAKMEM_TINY_TLS_CAP=32
+export HAKMEM_TINY_SFC_CAPACITY=64
+export HAKMEM_POOL_TLS_ARENA_MB_MAX=2
+```
+
+---
+
+## ✅ Validation & Testing
+
+### Validate Configuration
+```bash
+# Check for deprecated/invalid variables
+./scripts/validate_config.sh
+
+# Example output:
+# [DEPRECATED] HAKMEM_LEARN is deprecated, use HAKMEM_ALLOC_LEARN instead
+# Sunset date: 2026-05-26 (6 months from 2025-11-26)
+# See DEPRECATED.md for migration guide
+#
+# [WARN] HAKMEM_TINY_TLS_CAP=2048 is outside typical range (16-1024)
+#
+# [OK] HAKMEM_DEBUG_LEVEL=2
+# [OK] HAKMEM_SUPERSLAB_LAZY=1
+```
+
+### Test Performance
+```bash
+# Baseline (10M iterations, 10 runs recommended)
+./out/release/bench_random_mixed_hakmem
+
+# Custom workload
+./out/release/bench_random_mixed_hakmem 10000000 256 42
+
+# Multi-threaded (Larson benchmark)
+./out/release/larson_hakmem 8  # 8 threads
+```
+
+---
+
+## ❓ FAQ
+
+### Q: What's the difference between ALLOC_LEARN and MEM_LEARN?
+**A**:
+- `HAKMEM_ALLOC_LEARN`: Tunes **allocator behavior** (cache sizes, refill batches) based on allocation patterns
+- `HAKMEM_MEM_LEARN`: Tunes **memory management** (THP usage, RSS optimization, max-size detection)
+
+### Q: Should I enable learning in production?
+**A**: **Generally NO**. Learning adds overhead (~5-10%) and is best for:
+- Adaptive workloads with unpredictable patterns
+- Benchmarking different configurations
+- Initial tuning phase (then bake learned values into static config)
+
+For production, use static tuning based on profiling.
+
+### Q: Why is SUPERSLAB_REUSE default OFF?
+**A**: Phase 12 (Shared SuperSlab Pool) removed per-class registry population. Reuse is now less effective and can cause fragmentation. Use `SUPERSLAB_LAZY=1` (default) instead for syscall reduction.
+
+### Q: What's the performance impact of INTEGRITY_CHECKS?
+**A**: ~2-5% overhead. Recommended for production (default ON) to catch memory corruption early. Disable only for performance testing.
+
+### Q: How do I migrate from deprecated learning variables?
+**A**: See [DEPRECATED.md](DEPRECATED.md) Section "Learning Systems (P2.2 Consolidation)" for complete mapping of 18→6 variables. The 6-month deprecation period provides backward compatibility.
+
+### Q: What's SFC and why is it still active?
+**A**: SFC (Super Front Cache) is an ultra-fast TLS cache (95%+ hit rate, 3-4 instructions). Unified Cache was tested in Phase 3d-B but found slower than SFC, so SFC remained as the active implementation.
+
+---
+
+## 📚 See Also
+
+- [DEPRECATED.md](DEPRECATED.md) - Deprecated variables and migration guide
+- [BUILDING_QUICKSTART.md](BUILDING_QUICKSTART.md) - Build instructions
+- [CLAUDE.md](CLAUDE.md) - Development history and performance benchmarks
+- [hakmem_cleanup_proposal.txt](hakmem_cleanup_proposal.txt) - Cleanup roadmap
+
+---
+
+**Generated**: 2025-11-26 (Phase 2.2 - Learning Systems Consolidation)