# Phase 6.8: Configuration Cleanup & Mode-based Architecture

**Date**: 2025-10-21
**Status**: 🚧 **IN PROGRESS**

---

## 🎯 Goal

**Problem**: hakmem currently has too many environment variables to manage
- `HAKMEM_FREE_POLICY`, `HAKMEM_THP`, `HAKMEM_EVO_POLICY`, etc.
- Complex combinations make invalid configurations (and bugs) likely
- Benchmark comparisons are hard (which settings should be compared?)

**Solution**: Consolidate into **five preset modes**
- A single `HAKMEM_MODE=balanced` applies a sensible configuration
- Each feature's impact can be measured incrementally
- Easy to explain in the paper

---

## 📊 5 Modes Definition

### **Mode Overview**

| Mode | Use Case | Target Audience | Performance Goal |
|------|----------|-----------------|------------------|
| **MINIMAL** | Baseline measurement | Benchmark comparison | Comparable to system malloc |
| **FAST** | Production (speed-first) | Production use | mimalloc +20% |
| **BALANCED** | Recommended default | General use | mimalloc +40% |
| **LEARNING** | Learning phase | Development | mimalloc +60% |
| **RESEARCH** | Development & debugging | Research | N/A (all features ON) |

### **Feature Matrix**

| Feature | MINIMAL | FAST | BALANCED | LEARNING | RESEARCH |
|---------|---------|------|----------|----------|----------|
| **ELO learning** | ❌ | ❌ FROZEN | ✅ FROZEN | ✅ LEARN | ✅ LEARN |
| **BigCache** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **Batch madvise** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **TinyPool (future)** | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Free policy** | batch | adaptive | adaptive | adaptive | adaptive |
| **THP** | off | auto | auto | auto | on |
| **Evolution lifecycle** | - | FROZEN | FROZEN | LEARN→FROZEN | LEARN |
| **Debug logging** | ❌ | ❌ | ❌ | ⚠️ minimal | ✅ verbose |

---

## 🔧 Implementation Plan

### **Step 0: Baseline Measurement** ✅ (Already done in Phase 6.6-6.7)

Current state:
- hakmem-evolving: 37,602 ns (VM scenario, 2MB)
- mimalloc: 19,964 ns (+88.3% gap)
- All features ON (uncontrolled)

### **Step 1: MINIMAL Mode** 🎯 (P0 - Foundation)

**Goal**: Create a baseline with all features OFF

**Implementation**:
```c
// hakmem_config.h
typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

typedef struct {
    HakemMode mode;
    // Feature flags
    int enable_elo;
    int
    enable_bigcache;
    int enable_batch;
    int enable_pool;        // future (Step 5)
    // Policies
    FreePolicy free_policy;
    THPPolicy  thp_policy;
    const char* evo_phase;  // "frozen", "learn", "canary"
    // Debug
    int debug_logging;
} HakemConfig;

extern HakemConfig g_hakem_config;
void hak_config_init(void);
```

**Changes**:
- `hakmem_config.h/c`: New files
- `hakmem.c`: Call `hak_config_init()` in `hak_init()`
- All modules: Check `g_hakem_config` flags before enabling features

**Benchmark**:
```bash
HAKMEM_MODE=minimal ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
```

**Expected**:
- Performance: ~40,000-50,000 ns (slower than current; no optimizations)
- Serves as the baseline for feature comparison

**Estimated time**: 1 day

---

### **Step 2: Enable BigCache** 🎯 (P0 - Tier-2 Cache)

**Goal**: Measure BigCache impact in isolation

**Implementation**:
- MINIMAL + BigCache ON
- Keep ELO/Batch/THP OFF

**Benchmark**:
```bash
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
# Then:
# hakmem.c: g_hakem_config.enable_bigcache = 1;
./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- VM scenario hit rate: 99%+
- Performance: ~5,000 ns improvement (cache hits avoid mmap)
- Target: 35,000-40,000 ns

**Measurement**:
- BigCache hit rate
- mmap syscall count (should drop)
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 3: Enable Batch madvise** 🎯 (P1 - TLB Optimization)

**Goal**: Measure batch madvise impact

**Implementation**:
- MINIMAL + BigCache + Batch ON
- Keep ELO/THP OFF

**Benchmark**:
```bash
# Previous: MINIMAL + BigCache
# New: MINIMAL + BigCache + Batch
./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- Batch flush operations: 1-10 per run
- Performance: ~500-1,000 ns improvement (TLB optimization)
- Target: 34,000-39,000 ns

**Measurement**:
- Batch statistics (blocks added, flush count)
- madvise syscall count
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 4: Enable ELO (FROZEN)** 🎯 (P1 - Strategy Selection)
**Goal**: Measure ELO overhead in FROZEN mode (no learning)

**Implementation**:
- BALANCED mode = MINIMAL + BigCache + Batch + ELO (FROZEN)

**Benchmark**:
```bash
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- ELO overhead: ~100-200 ns (strategy selection per allocation)
- Performance: +100-200 ns regression (acceptable for adaptability)
- Target: 34,500-39,500 ns

**Measurement**:
- ELO selection overhead
- Strategy distribution
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 5: TinyPool Implementation (FAST mode)** 🚀 (P2 - Fast Path)

**Goal**: Implement a pool-based fast path (ChatGPT Pro proposal)

**Implementation**:
- FAST mode = BALANCED + TinyPool
- 7 size classes: 16/32/64/128/256/512/1024 B
- Per-thread free lists
- class × shard O(1) mapping

**Code sketch**:
```c
// hakmem_pool.h
#include <stddef.h>
#include <stdint.h>

typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

#define SHARDS  64
#define CLASSES 7   // 16B to 1024B

typedef struct { FreeList list[SHARDS]; } ClassPools;
_Thread_local ClassPools tls_pools[CLASSES];

// Fast path (O(1))
void* hak_alloc_small(size_t sz, void* pc);
void  hak_free_small(void* p, void* pc);
```

**Benchmark**:
```bash
# Baseline: BALANCED mode
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 10 --runs 50

# New: FAST mode
HAKMEM_MODE=fast ./bench_runner.sh --warmup 10 --runs 50
```

**Expected**:
- Small allocations (≤1KB): 9-15 ns fast path
- VM scenario (2MB): No change (pool not used for large allocations)
- Needs a new benchmark: tiny-hot (16/32/64B allocations)

**Measurement**:
- Pool hit rate
- Fast path latency (perf profiling)
- Comparison with mimalloc on tiny-hot

**Estimated time**: 2-3 weeks (MVP: 2 weeks, MT support: +1 week)

---

### **Step 6: ELO LEARNING mode** 🎯 (P2 - Adaptive Learning)

**Goal**: Measure learning overhead and convergence

**Implementation**:
- LEARNING mode = BALANCED + ELO (LEARN→FROZEN)

**Benchmark**:
```bash
HAKMEM_MODE=learning \
  ./bench_runner.sh --warmup 100 --runs 100
```

**Expected**:
- LEARN phase: +200-500 ns overhead (ELO selection + recording)
- Convergence: 1024-2048 allocations → FROZEN
- FROZEN phase: Same as BALANCED mode
- Overall: +50-100 ns average (amortized)

**Measurement**:
- ELO rating convergence
- Phase transitions (LEARN → FROZEN → CANARY)
- Learning overhead vs. benefit

**Estimated time**: 1 day

---

### **Step 7: RESEARCH mode (All features)** 🎯 (P3 - Development)

**Goal**: Enable all features + debug logging

**Implementation**:
- RESEARCH mode = LEARNING + THP (ON) + Debug logging

**Use case**:
- Development & debugging only
- Not for benchmarking (too slow)

**Estimated time**: 0.5 day

---

## 📈 Benchmark Plan

### **Comparison Matrix**

| Scenario | MINIMAL | +BigCache | +Batch | BALANCED | FAST | LEARNING |
|----------|---------|-----------|--------|----------|------|----------|
| **VM (2MB)** | 45,000 | 40,000 | 39,000 | 39,500 | 39,500 | 39,600 |
| **tiny-hot** | 50 | 50 | 50 | 50 | **12** | 52 |
| **cold-churn** | TBD | TBD | TBD | TBD | TBD | TBD |
| **json-parse** | TBD | TBD | TBD | TBD | TBD | TBD |

**Note**: Numbers are estimates in ns; actual results TBD

### **Metrics to Collect**

For each mode:
- **Performance**: Median latency (ns)
- **Syscalls**: mmap/munmap/madvise counts
- **Page faults**: soft/hard counts
- **Memory**: RSS delta
- **Cache**: Hit rates (BigCache, Pool)

### **Benchmark Script**
```bash
#!/bin/bash
# bench_modes.sh - Compare all modes

MODES="minimal balanced fast learning"
SCENARIOS="vm cold-churn json-parse"

for mode in $MODES; do
  for scenario in $SCENARIOS; do
    echo "=== Mode: $mode, Scenario: $scenario ==="
    HAKMEM_MODE=$mode ./bench_runner.sh \
      --allocator hakmem-evolving \
      --scenario $scenario \
      --warmup 10 --runs 50 \
      --output results_${mode}_${scenario}.csv
  done
done

# Aggregate results
python3 analyze_modes.py results_*.csv
```

---

## 🎯 Success Metrics

### **Step 1-4 (MINIMAL → BALANCED)**
- ✅ Each feature's impact is measurable
- ✅
Performance regression < 10% per feature
- ✅ Total BALANCED overhead: +40-60% vs mimalloc

### **Step 5 (FAST mode with TinyPool)**
- ✅ tiny-hot benchmark: mimalloc +20% or better
- ✅ VM scenario: No regression vs BALANCED
- ✅ Pool hit rate: 90%+ for small allocations

### **Step 6 (LEARNING mode)**
- ✅ Convergence within 2048 allocations
- ✅ Learning overhead amortized to < 5%
- ✅ FROZEN performance = BALANCED

---

## 📝 Migration Plan (Backward Compatibility)

### **Environment Variable Priority**
```c
// 1. HAKMEM_MODE has the highest priority
const char* mode_env = getenv("HAKMEM_MODE");
if (mode_env) {
    hak_config_apply_mode(mode_env);  // Apply preset
} else {
    // 2. Fall back to individual settings (legacy)
    const char* free_policy = getenv("HAKMEM_FREE_POLICY");
    const char* thp = getenv("HAKMEM_THP");
    // ... etc
}

// 3. Individual settings can override the mode
// Example: HAKMEM_MODE=balanced HAKMEM_THP=off
// → Use the BALANCED preset, but force THP=off
```

### **Deprecation Timeline**
- **Phase 6.8**: Both HAKMEM_MODE and individual env vars supported
- **Phase 7**: Prefer HAKMEM_MODE; warn if individual vars are used
- **Phase 8**: Deprecate individual vars (HAKMEM_MODE only)

---

## 🚀 Implementation Timeline

| Step | Task | Time | Cumulative | Status |
|------|------|------|------------|--------|
| 0 | Baseline (done) | - | - | ✅ |
| 1 | MINIMAL mode | 1 day | 1 day | 🚧 |
| 2 | +BigCache | 0.5 day | 1.5 days | ⏳ |
| 3 | +Batch | 0.5 day | 2 days | ⏳ |
| 4 | BALANCED (ELO FROZEN) | 0.5 day | 2.5 days | ⏳ |
| 5 | FAST (TinyPool MVP) | 2-3 weeks | 3.5-4.5 weeks | ⏳ |
| 6 | LEARNING mode | 1 day | 3.6-4.6 weeks | ⏳ |
| 7 | RESEARCH mode | 0.5 day | 3.65-4.65 weeks | ⏳ |

**Total**: 3.7-4.7 weeks (MVP: 2.5 days, Full: 4-5 weeks)

---

## 📚 Documentation Updates

### **README.md**

Add section:
```markdown
## 🎯 Quick Start: Choosing a Mode

- **Development**: `HAKMEM_MODE=learning` (adaptive, slow)
- **Production**: `HAKMEM_MODE=fast` (mimalloc +20%)
- **General**: `HAKMEM_MODE=balanced`
(default, mimalloc +40%)
- **Benchmarking**: `HAKMEM_MODE=minimal` (baseline)
- **Research**: `HAKMEM_MODE=research` (all features + debug)
```

### **New Files**
- `PHASE_6.8_CONFIG_CLEANUP.md` (this file)
- `apps/experiments/hakmem-poc/hakmem_config.h`
- `apps/experiments/hakmem-poc/hakmem_config.c`
- `apps/experiments/hakmem-poc/bench_modes.sh`
- `apps/experiments/hakmem-poc/analyze_modes.py`

---

## 🎓 Expected Outcomes

### **For Paper**

**Before Phase 6.8**:
- ❌ "hakmem is +88% slower than mimalloc"
- ⚠️ Complex configuration, hard to reproduce
- ⚠️ Unclear which features contribute to the overhead

**After Phase 6.8**:
- ✅ "BALANCED mode: +40% overhead for adaptive learning"
- ✅ "FAST mode: +20% overhead, competitive with production allocators"
- ✅ "Each feature's impact clearly measured"
- ✅ "5 simple modes, easy to reproduce"

### **For Future Work**
- Step 5 (TinyPool) can become **Phase 7** if successful
- ChatGPT Pro's hybrid architecture validated
- Clear path to mimalloc-level performance

---

## 🏆 Final Status

**Phase 6.8**: 🚧 **IN PROGRESS**

**Next Steps**:
1. ✅ Design document created (this file)
2. 🚧 Implement Step 1 (MINIMAL mode)
3. ⏳ Measure & iterate through Steps 2-7

---

**Ready to start implementation!** 🚀
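## 📎 Appendix: Preset Application Sketch

As a concrete starting point for Step 1, the feature matrix above could translate into a `hak_config_apply_mode()` along these lines. This is a minimal sketch, not the final API: the `FreePolicy`/`THPPolicy` enumerator names are assumptions, and the LEARN→FROZEN transition and "minimal" debug logging for LEARNING mode are left out.

```c
#include <string.h>

typedef enum { FREE_BATCH, FREE_ADAPTIVE } FreePolicy;  // assumed names
typedef enum { THP_OFF, THP_AUTO, THP_ON } THPPolicy;   // assumed names

typedef struct {
    int enable_elo, enable_bigcache, enable_batch, enable_pool;
    FreePolicy free_policy;
    THPPolicy  thp_policy;
    const char* evo_phase;  // "frozen", "learn", "canary"
    int debug_logging;
} HakemConfig;

HakemConfig g_hakem_config;

// Apply one of the five presets from the feature matrix.
// Returns 0 on success, -1 for an unknown mode name (config unchanged).
int hak_config_apply_mode(const char* name) {
    HakemConfig c = {0};            // MINIMAL baseline: everything off
    c.free_policy = FREE_BATCH;
    c.thp_policy  = THP_OFF;

    if (strcmp(name, "minimal") == 0) {
        // baseline defaults above
    } else if (strcmp(name, "fast") == 0) {
        c.enable_bigcache = c.enable_batch = c.enable_pool = 1;
        c.free_policy = FREE_ADAPTIVE;  c.thp_policy = THP_AUTO;
        c.evo_phase = "frozen";         // ELO off, lifecycle frozen
    } else if (strcmp(name, "balanced") == 0) {
        c.enable_elo = c.enable_bigcache = c.enable_batch = c.enable_pool = 1;
        c.free_policy = FREE_ADAPTIVE;  c.thp_policy = THP_AUTO;
        c.evo_phase = "frozen";
    } else if (strcmp(name, "learning") == 0) {
        c.enable_elo = c.enable_bigcache = c.enable_batch = 1;
        c.free_policy = FREE_ADAPTIVE;  c.thp_policy = THP_AUTO;
        c.evo_phase = "learn";          // transitions to FROZEN elsewhere
    } else if (strcmp(name, "research") == 0) {
        c.enable_elo = c.enable_bigcache = c.enable_batch = 1;
        c.free_policy = FREE_ADAPTIVE;  c.thp_policy = THP_ON;
        c.evo_phase = "learn";
        c.debug_logging = 1;            // verbose
    } else {
        return -1;
    }
    g_hakem_config = c;
    return 0;
}
```

In `hak_config_init()`, this would presumably be driven by `getenv("HAKMEM_MODE")`, falling back to the BALANCED preset when unset, with the legacy per-feature variables applied afterward as overrides per the migration plan above.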