hakmem/docs/specs/CONFIGURATION.md
Moe Charm (CI) 984cca41ef P0 Optimization: Shared Pool fast path with O(1) metadata lookup
Performance Results:
- Throughput: 2.66M ops/s → 3.8M ops/s (+43% improvement)
- sp_meta_find_or_create: O(N) linear scan → O(1) direct pointer
- Stage 2 metadata scan: 100% → 10-20% (80-90% reduction via hints)

Core Optimizations:

1. O(1) Metadata Lookup (superslab_types.h)
   - Added `shared_meta` pointer field to SuperSlab struct
   - Eliminates O(N) linear search through ss_metadata[] array
   - First access: O(N) scan + cache | Subsequent: O(1) direct return

2. sp_meta_find_or_create Fast Path (hakmem_shared_pool.c)
   - Check cached ss->shared_meta first before linear scan
   - Cache pointer after successful linear scan for future lookups
   - Reduces 7.8% CPU hotspot to near-zero for hot paths

3. Stage 2 Class Hints Fast Path (hakmem_shared_pool_acquire.c)
   - Try class_hints[class_idx] FIRST before full metadata scan
   - Uses O(1) ss->shared_meta lookup for hint validation
   - __builtin_expect() for branch prediction optimization
   - 80-90% of acquire calls now skip full metadata scan

4. Proper Initialization (ss_allocation_box.c)
   - Initialize shared_meta = NULL in superslab_allocate()
   - Ensures correct NULL-check semantics for new SuperSlabs

Additional Improvements:
- Updated ptr_trace and debug ring for release build efficiency
- Enhanced ENV variable documentation and analysis
- Added learner_env_box.h for configuration management
- Various Box optimizations for reduced overhead

Thread Safety:
- All atomic operations use correct memory ordering
- shared_meta cached under mutex protection
- Lock-free Stage 2 uses proper CAS with acquire/release semantics

Testing:
- Benchmark: 1M iterations, 3.8M ops/s stable
- Build: Clean compile RELEASE=0 and RELEASE=1
- No crashes, memory leaks, or correctness issues

Next Optimization Candidates:
- P1: Per-SuperSlab free slot bitmap for O(1) slot claiming
- P2: Reduce Stage 2 critical section size
- P3: Page pre-faulting (MAP_POPULATE)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 16:21:54 +09:00

# HAKMEM Configuration Guide
**Last Updated**: 2025-11-28 (After ENV Cleanup Phase 1-3)
This guide documents all canonical HAKMEM environment variables after Phase 0-2 cleanup and ENV Cleanup Phase 1-3.
**Recent Changes**:
- **2025-11-28**: ENV Cleanup Phase 1-3 completed - 13 debug variables now gated behind `!HAKMEM_BUILD_RELEASE`
- **2025-11-26**: Phase 2.2 - Learning Systems Consolidation (18→6 variables)
---
## 📋 Quick Reference
Use the validation tool to check your configuration:
```bash
# Validate current environment
./scripts/validate_config.sh
# Strict mode (treat warnings as errors)
./scripts/validate_config.sh --strict
# Quiet mode (errors only)
./scripts/validate_config.sh --quiet
```
**Deprecated variables?** See [DEPRECATED.md](DEPRECATED.md) for the migration guide.
---
## 🔧 Debug Variables (Gated in Release Builds)
**Important**: The following debug-only variables are compiled out when `HAKMEM_BUILD_RELEASE=1` (default for production builds). They have **zero overhead** in release builds.
### Phase 1-3 Gated Variables (2025-11-28)
**Core Debug Infrastructure**:
- `HAKMEM_TINY_ALLOC_DEBUG` - TLS allocation state dumps (4 call sites)
- `HAKMEM_TINY_PROFILE` - FastCache profiling
- `HAKMEM_WATCH_ADDR` - Watch specific address for debugging
**Trace & Timing**:
- `HAKMEM_PTR_TRACE_DUMP` - Pointer trace dumps (also enabled by `HAKMEM_STATS=trace`)
- `HAKMEM_PTR_TRACE_VERBOSE` - Verbose pointer tracing (also enabled by `HAKMEM_TRACE=ptr`)
- `HAKMEM_TIMING` - Timing instrumentation
**Freelist Diagnostics**:
- `HAKMEM_TINY_SLL_DIAG` - SLL (singly-linked list) diagnostics (multiple call sites)
- `HAKMEM_TINY_FREELIST_MASK` - Freelist mask updates
- `HAKMEM_SS_FREE_DEBUG` - SuperSlab free debug logging
**SuperSlab Registry Debug**:
- `HAKMEM_SUPER_LOOKUP_DEBUG` - SuperSlab lookup verbose logging
- `HAKMEM_SUPER_REG_DEBUG` - Register/unregister debug (2 sites)
- `HAKMEM_SS_LRU_DEBUG` - LRU cache operation logging (3 sites)
- `HAKMEM_SS_PREWARM_DEBUG` - Prewarm initialization logging (2 sites)
**Production Config (NOT gated)**:
These variables remain available in release builds for operational tuning:
- `HAKMEM_SUPERSLAB_MAX_CACHED` - LRU cache capacity limit
- `HAKMEM_SUPERSLAB_MAX_MEMORY_MB` - LRU memory limit
- `HAKMEM_SUPERSLAB_TTL_SEC` - LRU time-to-live
- `HAKMEM_PREWARM_SUPERSLABS` - Prewarm count per class
**Performance Impact**: Gating these 13 debug variables improved Larson benchmark from 30.2M to 30.5M ops/s (+1.0%).
For details, see `docs/status/ENV_CLEANUP_TASK.md`.
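
Because the gated variables are compiled out of release builds, leaving them exported in a production environment has no effect and can mislead whoever reads the launch script later. The sketch below lists which of the 13 gated variables are currently exported. The function name `hakmem_list_gated_debug` is hypothetical and not part of HAKMEM; the variable names are copied from the lists above.

```bash
# Hypothetical helper (not shipped with HAKMEM): print every gated debug
# variable that is currently exported, one NAME=value pair per line.
hakmem_list_gated_debug() (
  for name in \
    HAKMEM_TINY_ALLOC_DEBUG HAKMEM_TINY_PROFILE HAKMEM_WATCH_ADDR \
    HAKMEM_PTR_TRACE_DUMP HAKMEM_PTR_TRACE_VERBOSE HAKMEM_TIMING \
    HAKMEM_TINY_SLL_DIAG HAKMEM_TINY_FREELIST_MASK HAKMEM_SS_FREE_DEBUG \
    HAKMEM_SUPER_LOOKUP_DEBUG HAKMEM_SUPER_REG_DEBUG HAKMEM_SS_LRU_DEBUG \
    HAKMEM_SS_PREWARM_DEBUG
  do
    # printenv succeeds only for exported variables, which are the only
    # ones a getenv() call inside the allocator could ever see.
    if value=$(printenv "$name"); then
      printf '%s=%s\n' "$name" "$value"
    fi
  done
)
```

Calling `hakmem_list_gated_debug` before starting a release binary shows which settings will be silently ignored.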
---
## 🎯 Core Configuration
### Allocator Path Selection
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_WRAP_TINY` | 0, 1 | 1 | Enable TINY allocator (1-2048B) |
| `HAKMEM_WRAP_POOL` | 0, 1 | 1 | Enable POOL allocator (2-8KB) |
| `HAKMEM_WRAP_MID` | 0, 1 | 1 | Enable MID allocator (8-32KB) |
| `HAKMEM_WRAP_LARGE` | 0, 1 | 1 | Enable LARGE allocator (>32KB) |
**Example**:
```bash
# Disable all HAKMEM allocators (use system malloc)
export HAKMEM_WRAP_TINY=0 HAKMEM_WRAP_POOL=0 HAKMEM_WRAP_MID=0 HAKMEM_WRAP_LARGE=0
```
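
`export` changes the setting for every later command in the shell. When only one run should bypass an allocator path, the variables can be scoped to that command instead. The wrapper below is a minimal sketch; `hakmem_without` is a hypothetical name, and it assumes only the four `HAKMEM_WRAP_*` variables from the table above.

```bash
# Hypothetical wrapper (not shipped with HAKMEM): run one command with the
# named allocator paths disabled, without changing the calling shell.
# Usage: hakmem_without "POOL MID" ./my_app --flag
hakmem_without() (
  for path in $1; do
    case $path in
      TINY|POOL|MID|LARGE) export "HAKMEM_WRAP_${path}=0" ;;
      *) echo "unknown allocator path: $path" >&2; exit 2 ;;
    esac
  done
  shift
  # The function body runs in a subshell, so the exports above end here.
  "$@"
)
```

The subshell body keeps the overrides local to the wrapped command, which makes it easy to compare a run with and without a given path.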
---
## 🏗️ SuperSlab Management
**Canonical Variables** (After P0.1 - SuperSlab Unification):
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_SUPERSLAB_REUSE` | 0, 1 | 0 | Reuse empty slabs (reduces mmap/munmap syscalls) |
| `HAKMEM_SUPERSLAB_LAZY` | 0, 1 | 1 | Lazy deallocation (Phase 9, keep slabs cached) |
| `HAKMEM_SUPERSLAB_PREWARM` | 0-128 | 0 | Preallocate N SuperSlabs at startup |
| `HAKMEM_SUPERSLAB_LRU_CAP` | 0-1024 | 256 | Max cached SuperSlabs (LRU eviction) |
| `HAKMEM_SUPERSLAB_SOFT_CAP` | 0-1024 | 128 | Soft cap for SuperSlab pool (before eviction) |
**Examples**:
```bash
# High performance (aggressive reuse + large cache)
export HAKMEM_SUPERSLAB_REUSE=1
export HAKMEM_SUPERSLAB_LAZY=1
export HAKMEM_SUPERSLAB_PREWARM=16
export HAKMEM_SUPERSLAB_LRU_CAP=512
# Low memory footprint (minimal caching)
export HAKMEM_SUPERSLAB_REUSE=0
export HAKMEM_SUPERSLAB_LAZY=0
export HAKMEM_SUPERSLAB_LRU_CAP=32
export HAKMEM_SUPERSLAB_SOFT_CAP=16
```
**Note**: Phase 12 (Shared SuperSlab Pool) removed per-class registry population, making `SUPERSLAB_REUSE` less effective. Default is OFF.
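
The Values column above can be checked mechanically before a run. The following sketch is an illustration of those documented ranges, not the logic of `validate_config.sh`; both function names are hypothetical, and the `[OK]`/`[WARN]` prefixes simply imitate the validator's sample output.

```bash
# Hypothetical range check: warn when an exported variable falls outside
# MIN..MAX. Unset variables pass, because the default applies.
# Usage: hakmem_check_range NAME MIN MAX
hakmem_check_range() (
  value=$(printenv "$1") || exit 0
  case $value in
    ''|*[!0-9]*)
      echo "[WARN] $1=$value is not a non-negative integer"
      exit 1 ;;
  esac
  if [ "$value" -lt "$2" ] || [ "$value" -gt "$3" ]; then
    echo "[WARN] $1=$value is outside documented range ($2-$3)"
    exit 1
  fi
  echo "[OK] $1=$value"
)

# Check all five SuperSlab variables; exit status 1 if any check warned.
hakmem_check_superslab() (
  status=0
  hakmem_check_range HAKMEM_SUPERSLAB_REUSE    0 1    || status=1
  hakmem_check_range HAKMEM_SUPERSLAB_LAZY     0 1    || status=1
  hakmem_check_range HAKMEM_SUPERSLAB_PREWARM  0 128  || status=1
  hakmem_check_range HAKMEM_SUPERSLAB_LRU_CAP  0 1024 || status=1
  hakmem_check_range HAKMEM_SUPERSLAB_SOFT_CAP 0 1024 || status=1
  exit "$status"
)
```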
---
## 🧠 Learning Systems
**Canonical Variables** (After P2.2 - Learning Consolidation, 18→6 variables):
### Allocation Learning
Controls adaptive sizing for allocator caches (TLS, SFC, capacity tuning).
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
### Memory Learning
Controls THP (Transparent Huge Pages), RSS optimization, and max-size learning.
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
### Advanced Overrides
**For troubleshooting only** - enables legacy advanced knobs that are auto-tuned by default.
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
**Examples**:
```bash
# Production (learning disabled, use static tuning)
```
---
## 🎯 TINY Allocator (1-2048B)
### TLS Cache Configuration
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_TLS_CAP` | 16-1024 | 64 | Per-class TLS cache capacity |
| `HAKMEM_TINY_TLS_REFILL` | 4-256 | 16 | Batch refill size |
| `HAKMEM_TINY_DRAIN_THRESH` | 0-1024 | 128 | Remote free drain threshold |
### Super Front Cache (SFC)
**Note**: SFC is **ACTIVE** and provides 95%+ hit rate for hot allocations.
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_SFC_ENABLE` | 0, 1 | 1 | Enable Super Front Cache (ultra-fast TLS cache) |
| `HAKMEM_TINY_SFC_CAPACITY` | 32-512 | 128 | SFC slot count |
| `HAKMEM_TINY_SFC_HOT_CLASSES` | 1-16 | 8 | Number of hot classes to cache |
### P0 Batch Optimization
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_P0_ENABLE` | 0, 1 | 1 | Enable P0 batch refill (O(1) freelist pop) |
| `HAKMEM_TINY_P0_BATCH` | 4-128 | 16 | P0 batch size |
| `HAKMEM_TINY_P0_NO_DRAIN` | 0, 1 | 0 | Disable remote drain (debug only) |
| `HAKMEM_TINY_P0_LOG` | 0, 1 | 0 | Enable P0 counter validation logging |
### Header Configuration
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_TINY_HEADER_CLASSIDX` | 0, 1 | 1 | Store class_idx in header (Phase 7, enables fast free) |
**Examples**:
```bash
# High-throughput (large caches, aggressive batching)
export HAKMEM_TINY_TLS_CAP=256
export HAKMEM_TINY_TLS_REFILL=32
export HAKMEM_TINY_SFC_CAPACITY=256
export HAKMEM_TINY_P0_ENABLE=1
export HAKMEM_TINY_P0_BATCH=32
# Low-latency (small caches, fine-grained refill)
export HAKMEM_TINY_TLS_CAP=32
export HAKMEM_TINY_TLS_REFILL=4
export HAKMEM_TINY_SFC_CAPACITY=64
export HAKMEM_TINY_P0_BATCH=8
# Debug P0 issues
export HAKMEM_TINY_P0_LOG=1
export HAKMEM_TINY_P0_NO_DRAIN=1 # Isolate batch refill from remote free
```
---
## 🏊 Pool TLS Allocator (2-8KB)
### Arena Management
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `HAKMEM_POOL_TLS_ARENA_MB_INIT` | 1-64 | 1 | Initial arena size (MB) |
| `HAKMEM_POOL_TLS_ARENA_MB_MAX` | 1-64 | 8 | Maximum arena size (MB) |
| `HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS` | 1-8 | 3 | Growth levels (1MB→2MB→4MB→8MB) |
**Example**:
```bash
# Large arena for high-throughput 8KB allocations
export HAKMEM_POOL_TLS_ARENA_MB_INIT=4
export HAKMEM_POOL_TLS_ARENA_MB_MAX=32
export HAKMEM_POOL_TLS_ARENA_GROWTH_LEVELS=5 # 4MB→8MB→16MB→32MB
```
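
To preview how the three arena settings interact, the sketch below prints the sequence of arena sizes. It assumes that the arena doubles once per growth level and stops growing at `HAKMEM_POOL_TLS_ARENA_MB_MAX`; that reading is inferred from the `1MB→2MB→4MB→8MB` and `4MB→8MB→16MB→32MB` sequences above and is not confirmed against the allocator source. `hakmem_arena_growth` is a hypothetical name.

```bash
# Hypothetical illustration: list the arena sizes reached when the size
# doubles once per growth level, starting at INIT and never exceeding MAX.
# Usage: hakmem_arena_growth INIT_MB MAX_MB LEVELS
hakmem_arena_growth() (
  size=$1
  max=$2
  levels=$3
  out="${size}MB"
  i=0
  while [ "$i" -lt "$levels" ] && [ $((size * 2)) -le "$max" ]; do
    size=$((size * 2))
    out="${out}→${size}MB"
    i=$((i + 1))
  done
  echo "$out"
)

hakmem_arena_growth 1 8 3    # prints 1MB→2MB→4MB→8MB
hakmem_arena_growth 4 32 5   # prints 4MB→8MB→16MB→32MB
```

Under this assumption the cap takes effect before the level count in the second call, so the last two growth levels are never used.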
---
## 📊 Statistics & Profiling
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
**Example**:
```bash
# Enable stats for performance analysis
```
---
## 🧪 Experimental Features
**Warning**: These features are experimental and may change or be removed.
| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
---
## 🚀 Quick Start Examples
### 1. Production (Default Recommended)
```bash
# High performance, stable, integrity checks enabled
export HAKMEM_SUPERSLAB_LAZY=1
export HAKMEM_SUPERSLAB_LRU_CAP=256
export HAKMEM_TINY_P0_ENABLE=1
```
### 2. Debug Session
```bash
# Verbose logging, tracing, integrity checks
export HAKMEM_TRACE_ALLOCATIONS=1
export HAKMEM_TINY_P0_LOG=1
```
### 3. Low-Latency Workload
```bash
# Small caches, fine-grained batching, minimal syscalls
export HAKMEM_TINY_TLS_CAP=32
export HAKMEM_TINY_TLS_REFILL=4
export HAKMEM_TINY_SFC_CAPACITY=64
export HAKMEM_SUPERSLAB_LAZY=1
export HAKMEM_SUPERSLAB_LRU_CAP=128
```
### 4. High-Throughput Workload
```bash
# Large caches, aggressive batching, prewarm
export HAKMEM_TINY_TLS_CAP=256
export HAKMEM_TINY_TLS_REFILL=32
export HAKMEM_TINY_SFC_CAPACITY=256
export HAKMEM_TINY_P0_BATCH=32
export HAKMEM_SUPERSLAB_PREWARM=16
export HAKMEM_SUPERSLAB_LRU_CAP=512
```
### 5. Memory-Efficient (Low RSS)
```bash
# Minimal caching, eager deallocation
export HAKMEM_SUPERSLAB_LAZY=0
export HAKMEM_SUPERSLAB_LRU_CAP=32
export HAKMEM_SUPERSLAB_SOFT_CAP=16
export HAKMEM_TINY_TLS_CAP=32
export HAKMEM_TINY_SFC_CAPACITY=64
export HAKMEM_POOL_TLS_ARENA_MB_MAX=2
```
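
Profiles 3 to 5 can be kept in one sourced helper so that switching between them does not leave stale values behind. This is a sketch only: `hakmem_profile` and the profile names are hypothetical, and the values are copied from the examples above.

```bash
# Hypothetical helper: export one of the workload profiles listed above.
# Source this file, then call e.g. `hakmem_profile low-latency`.
hakmem_profile() {
  case $1 in
    low-latency|high-throughput|low-rss) ;;
    *)
      echo "usage: hakmem_profile low-latency|high-throughput|low-rss" >&2
      return 2 ;;
  esac
  # Clear everything a previously selected profile may have exported.
  unset HAKMEM_TINY_TLS_CAP HAKMEM_TINY_TLS_REFILL HAKMEM_TINY_SFC_CAPACITY \
        HAKMEM_TINY_P0_BATCH HAKMEM_SUPERSLAB_LAZY HAKMEM_SUPERSLAB_LRU_CAP \
        HAKMEM_SUPERSLAB_SOFT_CAP HAKMEM_SUPERSLAB_PREWARM \
        HAKMEM_POOL_TLS_ARENA_MB_MAX
  case $1 in
    low-latency)
      export HAKMEM_TINY_TLS_CAP=32 HAKMEM_TINY_TLS_REFILL=4
      export HAKMEM_TINY_SFC_CAPACITY=64
      export HAKMEM_SUPERSLAB_LAZY=1 HAKMEM_SUPERSLAB_LRU_CAP=128 ;;
    high-throughput)
      export HAKMEM_TINY_TLS_CAP=256 HAKMEM_TINY_TLS_REFILL=32
      export HAKMEM_TINY_SFC_CAPACITY=256 HAKMEM_TINY_P0_BATCH=32
      export HAKMEM_SUPERSLAB_PREWARM=16 HAKMEM_SUPERSLAB_LRU_CAP=512 ;;
    low-rss)
      export HAKMEM_SUPERSLAB_LAZY=0 HAKMEM_SUPERSLAB_LRU_CAP=32
      export HAKMEM_SUPERSLAB_SOFT_CAP=16
      export HAKMEM_TINY_TLS_CAP=32 HAKMEM_TINY_SFC_CAPACITY=64
      export HAKMEM_POOL_TLS_ARENA_MB_MAX=2 ;;
  esac
}
```

The `unset` step matters when profiles set different variables: without it, `HAKMEM_SUPERSLAB_PREWARM=16` from the high-throughput profile would survive a later switch to the low-latency one.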
---
## ✅ Validation & Testing
### Validate Configuration
```bash
# Check for deprecated/invalid variables
./scripts/validate_config.sh
# Example output:
# Sunset date: 2026-05-26 (6 months from 2025-11-26)
# See DEPRECATED.md for migration guide
#
# [WARN] HAKMEM_TINY_TLS_CAP=2048 is outside typical range (16-1024)
#
# [OK] HAKMEM_SUPERSLAB_LAZY=1
```
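
For CI logs it can help to reduce the validator output to a single summary line. The filter below is a sketch that assumes each result line starts with `[OK]` or `[WARN]`, as in the sample output above; the real output may contain other tags. `hakmem_summarize_validation` is a hypothetical name. The script's own `--strict` flag remains the documented way to turn warnings into errors.

```bash
# Hypothetical post-processor: count the [OK] and [WARN] lines read from
# stdin, print a summary, and return non-zero when any warning is present.
# Usage: ./scripts/validate_config.sh | hakmem_summarize_validation
hakmem_summarize_validation() (
  ok=0
  warn=0
  while IFS= read -r line; do
    case $line in
      "[OK]"*)   ok=$((ok + 1)) ;;
      "[WARN]"*) warn=$((warn + 1)) ;;
    esac
  done
  echo "ok=$ok warn=$warn"
  [ "$warn" -eq 0 ]
)
```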
### Test Performance
```bash
# Baseline (10M iterations, 10 runs recommended)
./out/release/bench_random_mixed_hakmem
# Custom workload
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Multi-threaded (Larson benchmark)
./out/release/larson_hakmem 8 # 8 threads
```
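
The baseline comment recommends 10 runs. A small loop keeps those runs together and labels each one; the sketch below makes no assumption about the benchmark's output format and only repeats the command. `hakmem_repeat` is a hypothetical name.

```bash
# Hypothetical helper: run a command N times, labelling each run and
# stopping at the first failure.
# Usage: hakmem_repeat 10 ./out/release/bench_random_mixed_hakmem
hakmem_repeat() (
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    echo "== run $i/$runs =="
    "$@" || exit $?
    i=$((i + 1))
  done
)
```

Redirecting the output to a file, for example `hakmem_repeat 10 ./out/release/bench_random_mixed_hakmem > baseline.log`, makes it possible to compare configurations later.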
---
## ❓ FAQ
### Q: What's the difference between ALLOC_LEARN and MEM_LEARN?
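**A**: Allocation learning controls adaptive sizing for allocator caches (TLS, SFC, capacity tuning). Memory learning controls THP (Transparent Huge Pages), RSS optimization, and max-size learning.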
### Q: Should I enable learning in production?
**A**: **Generally NO**. Learning adds overhead (~5-10%) and is best for:
- Adaptive workloads with unpredictable patterns
- Benchmarking different configurations
- Initial tuning phase (then bake learned values into static config)
For production, use static tuning based on profiling.
### Q: Why is SUPERSLAB_REUSE default OFF?
**A**: Phase 12 (Shared SuperSlab Pool) removed per-class registry population. Reuse is now less effective and can cause fragmentation. Use `SUPERSLAB_LAZY=1` (default) instead for syscall reduction.
### Q: What's the performance impact of INTEGRITY_CHECKS?
**A**: ~2-5% overhead. Recommended for production (default ON) to catch memory corruption early. Disable only for performance testing.
### Q: How do I migrate from deprecated learning variables?
**A**: See the "Learning Systems (P2.2 Consolidation)" section of [DEPRECATED.md](DEPRECATED.md) for the complete mapping of the 18→6 variables. Deprecated names remain backward compatible during the 6-month deprecation period.
### Q: What's SFC and why is it still active?
**A**: SFC (Super Front Cache) is an ultra-fast TLS cache (95%+ hit rate, 3-4 instructions). Unified Cache was tested in Phase 3d-B but found slower than SFC, so SFC remained as the active implementation.
### Q: What are "gated" debug variables?
**A**: Debug variables gated behind `!HAKMEM_BUILD_RELEASE` (13 variables as of Phase 1-3) are compiled out entirely in production builds. This means:
- **Zero runtime overhead** - no getenv() calls, no branch checks
- **Smaller binary size** - debug code removed
- **Still available in debug builds** - set `HAKMEM_BUILD_RELEASE=0` to enable
This differs from production config variables (like `HAKMEM_SUPERSLAB_MAX_CACHED`) which remain accessible for operational tuning.
---
## 📚 See Also
- [ENV_CLEANUP_TASK.md](../status/ENV_CLEANUP_TASK.md) - ENV Cleanup Phase 1-3 completion report
- [DEPRECATED.md](DEPRECATED.md) - Deprecated variables and migration guide
- [BUILDING_QUICKSTART.md](BUILDING_QUICKSTART.md) - Build instructions
- [CLAUDE.md](CLAUDE.md) - Development history and performance benchmarks
- [hakmem_cleanup_proposal.txt](hakmem_cleanup_proposal.txt) - Cleanup roadmap
---
**Generated**: 2025-11-28 (Phase 1-3 ENV Cleanup Complete)