Phase 6.8: Configuration Cleanup & Mode-based Architecture
Date: 2025-10-21 Status: 🚧 IN PROGRESS
🎯 Goal
Problem: hakmem currently has too many environment variables to manage
- HAKMEM_FREE_POLICY, HAKMEM_THP, HAKMEM_EVO_POLICY, etc.
- Combinations are complex, and an invalid combination can introduce bugs
- Benchmark comparisons are difficult (which settings should be compared?)
Solution: Consolidate into 5 preset modes
- A single HAKMEM_MODE=balanced yields a sensible configuration
- Each feature's impact can be measured incrementally
- Easy to explain in the paper
📊 5 Modes Definition
Mode Overview
| Mode | Use Case | Target Audience | Performance Goal |
|---|---|---|---|
| MINIMAL | Baseline measurement | Benchmark comparison | On par with system malloc |
| FAST | Production (speed-first) | Production use | mimalloc +20% |
| BALANCED | Recommended default | General use | mimalloc +40% |
| LEARNING | Learning phase | Development | mimalloc +60% |
| RESEARCH | Development & debugging | Research | N/A (all features ON) |
Feature Matrix
| Feature | MINIMAL | FAST | BALANCED | LEARNING | RESEARCH |
|---|---|---|---|---|---|
| ELO learning | ❌ | ❌ FROZEN | ✅ FROZEN | ✅ LEARN | ✅ LEARN |
| BigCache | ❌ | ✅ | ✅ | ✅ | ✅ |
| Batch madvise | ❌ | ✅ | ✅ | ✅ | ✅ |
| TinyPool (future) | ❌ | ✅ | ❌ | ❌ | ❌ |
| Free policy | batch | adaptive | adaptive | adaptive | adaptive |
| THP | off | auto | auto | auto | on |
| Evolution lifecycle | - | FROZEN | FROZEN | LEARN→FROZEN | LEARN |
| Debug logging | ❌ | ❌ | ❌ | ⚠️ minimal | ✅ verbose |
🔧 Implementation Plan
Step 0: Baseline Measurement ✅ (Already done in Phase 6.6-6.7)
Current state:
- hakmem-evolving: 37,602 ns (VM scenario, 2MB)
- mimalloc: 19,964 ns (+88.3% gap)
- All features ON (uncontrolled)
Step 1: MINIMAL Mode 🎯 (P0 - Foundation)
Goal: Create baseline with all features OFF
Implementation:
// hakmem_config.h
typedef enum {
HAKMEM_MODE_MINIMAL = 0,
HAKMEM_MODE_FAST,
HAKMEM_MODE_BALANCED,
HAKMEM_MODE_LEARNING,
HAKMEM_MODE_RESEARCH,
} HakemMode;
typedef struct {
HakemMode mode;
// Feature flags
int enable_elo;
int enable_bigcache;
int enable_batch;
int enable_pool; // future (Step 5)
// Policies
FreePolicy free_policy;
THPPolicy thp_policy;
const char* evo_phase; // "frozen", "learn", "canary"
// Debug
int debug_logging;
} HakemConfig;
extern HakemConfig g_hakem_config;
void hak_config_init(void);
Changes:
- hakmem_config.h/c: New files
- hakmem.c: Call hak_config_init() in hak_init()
- All modules: Check g_hakem_config flags before enabling features
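To make the presets concrete, here is a minimal sketch of how hak_config_apply_mode() could translate a mode name into the flags defined above. The preset values follow the Feature Matrix; the helper structure, the "balanced" default, and parse details are assumptions for illustration, not the actual implementation.

```c
// Hypothetical hakmem_config.c sketch: map HAKMEM_MODE to the flags in
// hakmem_config.h. Preset values follow the Feature Matrix; everything
// else here is an illustrative assumption.
#include <stdlib.h>
#include <string.h>
#include "hakmem_config.h"

HakemConfig g_hakem_config;

void hak_config_apply_mode(const char* mode) {
    HakemConfig* c = &g_hakem_config;
    memset(c, 0, sizeof(*c));            // MINIMAL baseline: everything OFF
    c->evo_phase = "frozen";
    if (strcmp(mode, "balanced") == 0) {
        c->mode = HAKMEM_MODE_BALANCED;
        c->enable_elo = 1;               // ELO in FROZEN phase
        c->enable_bigcache = 1;
        c->enable_batch = 1;
    } else if (strcmp(mode, "fast") == 0) {
        c->mode = HAKMEM_MODE_FAST;
        c->enable_bigcache = 1;
        c->enable_batch = 1;
        c->enable_pool = 1;              // TinyPool (Step 5)
    } else if (strcmp(mode, "learning") == 0) {
        c->mode = HAKMEM_MODE_LEARNING;
        c->enable_elo = 1;
        c->enable_bigcache = 1;
        c->enable_batch = 1;
        c->evo_phase = "learn";          // LEARN → FROZEN lifecycle
    } // "minimal" keeps the all-OFF defaults; "research" adds debug_logging
}

void hak_config_init(void) {
    const char* mode = getenv("HAKMEM_MODE");
    hak_config_apply_mode(mode ? mode : "balanced");   // assumed default
}
```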
Benchmark:
HAKMEM_MODE=minimal ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
Expected:
- Performance: ~40,000-50,000 ns (slower than current, no optimizations)
- Serves as baseline for feature comparison
Estimated time: 1 day
Step 2: Enable BigCache 🎯 (P0 - Tier-2 Cache)
Goal: Measure BigCache impact in isolation
Implementation:
- MINIMAL + BigCache ON
- Keep ELO/Batch/THP OFF
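As an illustration of how a module gates on the new flag, a hedged sketch of the large-allocation path is shown below; hak_alloc_large(), bigcache_try_get(), and hak_mmap_alloc() are assumed names, and only g_hakem_config.enable_bigcache comes from the Step 1 config.

```c
// Illustrative gating of the Tier-2 cache on the config flag. All function
// names are assumptions; only g_hakem_config.enable_bigcache is from Step 1.
#include <stddef.h>
#include "hakmem_config.h"

void* bigcache_try_get(size_t sz);   // hypothetical Tier-2 cache lookup
void* hak_mmap_alloc(size_t sz);     // hypothetical mmap-backed fallback

void* hak_alloc_large(size_t sz) {
    if (g_hakem_config.enable_bigcache) {
        void* p = bigcache_try_get(sz);
        if (p) return p;              // hit: no mmap syscall needed
    }
    return hak_mmap_alloc(sz);        // miss, or cache disabled in MINIMAL
}
```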
Benchmark:
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
# Then:
# hakmem.c: g_hakem_config.enable_bigcache = 1;
./bench_runner.sh --warmup 2 --runs 10
Expected:
- VM scenario hit rate: 99%+
- Performance: -5,000 ns improvement (cache hits avoid mmap)
- Target: 35,000-40,000 ns
Measurement:
- BigCache hit rate
- mmap syscall count (should drop)
- Performance delta
Estimated time: 0.5 day
Step 3: Enable Batch madvise 🎯 (P1 - TLB Optimization)
Goal: Measure batch madvise impact
Implementation:
- MINIMAL + BigCache + Batch ON
- Keep ELO/THP OFF
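The batching idea in sketch form: instead of issuing madvise(MADV_DONTNEED) per freed block, blocks are queued and released in one pass. The queue size and names below are assumptions; only the enable_batch flag comes from Step 1.

```c
// Minimal sketch of batched MADV_DONTNEED with a fixed-size pending list.
// BATCH_MAX and batch_add()/batch_flush() are illustrative names; only
// g_hakem_config.enable_batch comes from the Step 1 config.
#include <stddef.h>
#include <sys/mman.h>
#include "hakmem_config.h"

#define BATCH_MAX 64
static struct { void* addr; size_t len; } g_batch[BATCH_MAX];
static int g_batch_cnt = 0;

static void batch_flush(void) {                  // one pass over pending blocks
    for (int i = 0; i < g_batch_cnt; i++)
        madvise(g_batch[i].addr, g_batch[i].len, MADV_DONTNEED);
    g_batch_cnt = 0;
}

void batch_add(void* addr, size_t len) {
    if (!g_hakem_config.enable_batch) {          // feature off: release now
        madvise(addr, len, MADV_DONTNEED);
        return;
    }
    g_batch[g_batch_cnt].addr = addr;            // defer the syscall
    g_batch[g_batch_cnt].len  = len;
    if (++g_batch_cnt == BATCH_MAX)
        batch_flush();                           // bounded flush keeps TLB churn low
}
```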
Benchmark:
# Previous: MINIMAL + BigCache
# New: MINIMAL + BigCache + Batch
./bench_runner.sh --warmup 2 --runs 10
Expected:
- Batch flush operations: 1-10 per run
- Performance: -500-1,000 ns improvement (TLB optimization)
- Target: 34,000-39,000 ns
Measurement:
- Batch statistics (blocks added, flush count)
- madvise syscall count
- Performance delta
Estimated time: 0.5 day
Step 4: Enable ELO (FROZEN) 🎯 (P1 - Strategy Selection)
Goal: Measure ELO overhead in FROZEN mode (no learning)
Implementation:
- BALANCED mode = MINIMAL + BigCache + Batch + ELO(FROZEN)
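A rough picture of where the ~100-200 ns goes: even in FROZEN mode each allocation still consults the rating table to pick a strategy, it just skips the rating update afterwards. The table size and names below are assumptions for illustration.

```c
// Illustrative FROZEN-mode selection: pick the best-rated strategy, skip
// rating updates. NUM_STRATEGIES and g_elo_rating are assumed names.
#define NUM_STRATEGIES 4
static double g_elo_rating[NUM_STRATEGIES];

int elo_select_strategy(void) {
    int best = 0;
    for (int i = 1; i < NUM_STRATEGIES; i++)   // small per-allocation scan
        if (g_elo_rating[i] > g_elo_rating[best])
            best = i;
    return best;   // FROZEN: no rating update follows the allocation
}
```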
Benchmark:
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 2 --runs 10
Expected:
- ELO overhead: ~100-200 ns (strategy selection per allocation)
- Performance: +100-200 ns regression (acceptable for adaptability)
- Target: 34,500-39,500 ns
Measurement:
- ELO selection overhead
- Strategy distribution
- Performance delta
Estimated time: 0.5 day
Step 5: TinyPool Implementation (FAST mode) 🚀 (P2 - Fast Path)
Goal: Implement pool-based fast path (ChatGPT Pro proposal)
Implementation:
- FAST mode = BALANCED + TinyPool
- 7 size classes: 16/32/64/128/256/512/1024B
- Per-thread free lists
- class×shard O(1) mapping
Code sketch:
// hakmem_pool.h
#include <stdint.h>
typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;
#define SHARDS 64
#define CLASSES 7 // 16B to 1024B
typedef struct {
    FreeList list[SHARDS];
} ClassPools;
_Thread_local ClassPools tls_pools[CLASSES];
// Fast path (O(1))
void* hak_alloc_small(size_t sz, void* pc);
void hak_free_small(void* p, void* pc);
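Building on the header sketch above, the allocation fast path might look as follows; the size-class mapping, the shard hash, and hak_pool_refill() are assumptions for illustration, not the final design.

```c
// Hypothetical O(1) fast path over the structures in hakmem_pool.h above.
// size_class(), the shard hash, and hak_pool_refill() are assumed names.
#include <stdint.h>
#include "hakmem_pool.h"

void* hak_pool_refill(int cls, int shard);     // assumed slow path

static inline int size_class(size_t sz) {      // 16B..1024B → class 0..6
    int c = 0;
    for (size_t s = 16; s < sz && c < CLASSES - 1; s <<= 1) c++;
    return c;
}

void* hak_alloc_small(size_t sz, void* pc) {
    int c = size_class(sz);
    int shard = ((uintptr_t)pc >> 4) & (SHARDS - 1);   // class×shard mapping
    FreeList* fl = &tls_pools[c].list[shard];
    if (fl->head) {                    // pool hit: pop the thread-local node
        Node* n = fl->head;
        fl->head = n->next;
        fl->cnt--;
        return n;
    }
    return hak_pool_refill(c, shard);  // miss: fall back to the slow path
}
```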
Benchmark:
# Baseline: BALANCED mode
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 10 --runs 50
# New: FAST mode
HAKMEM_MODE=fast ./bench_runner.sh --warmup 10 --runs 50
Expected:
- Small allocations (≤1KB): 9-15 ns fast path
- VM scenario (2MB): No change (pool not used for large allocations)
- Need new benchmark: tiny-hot (16/32/64B allocations)
Measurement:
- Pool hit rate
- Fast path latency (perf profiling)
- Comparison with mimalloc on tiny-hot
Estimated time: 2-3 weeks (MVP: 2 weeks, MT support: +1 week)
Step 6: ELO LEARNING mode 🎯 (P2 - Adaptive Learning)
Goal: Measure learning overhead and convergence
Implementation:
- LEARNING mode = BALANCED + ELO(LEARN→FROZEN)
Benchmark:
HAKMEM_MODE=learning ./bench_runner.sh --warmup 100 --runs 100
Expected:
- LEARN phase: +200-500 ns overhead (ELO selection + recording)
- Convergence: 1024-2048 allocations → FROZEN
- FROZEN phase: Same as BALANCED mode
- Overall: +50-100 ns average (amortized)
Measurement:
- ELO rating convergence
- Phase transitions (LEARN → FROZEN → CANARY)
- Learning overhead vs benefit
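To show where the lifecycle transition would sit, here is a hedged sketch of a LEARN→FROZEN switch after a fixed number of allocations; the counter, threshold constant, and elo_record_outcome() hook are assumptions based on the convergence target above.

```c
// Sketch of the LEARN→FROZEN transition after ~2048 allocations. The
// counter, threshold, and elo_record_outcome() are illustrative assumptions.
enum { EVO_LEARN, EVO_FROZEN, EVO_CANARY };

void elo_record_outcome(void);                 // assumed rating-update hook

static int g_evo_phase = EVO_LEARN;
static unsigned long g_alloc_count = 0;
#define CONVERGE_THRESHOLD 2048                // "convergence within 2048 allocations"

void evo_on_allocation(void) {
    if (g_evo_phase != EVO_LEARN) return;      // FROZEN: no learning overhead
    elo_record_outcome();                      // LEARN: pay the recording cost
    if (++g_alloc_count >= CONVERGE_THRESHOLD)
        g_evo_phase = EVO_FROZEN;              // freeze the learned ratings
}
```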
Estimated time: 1 day
Step 7: RESEARCH mode (All features) 🎯 (P3 - Development)
Goal: Enable all features + debug logging
Implementation:
- RESEARCH mode = LEARNING + THP(ON) + Debug logging
Use case:
- Development & debugging only
- Not for benchmarking (too slow)
Estimated time: 0.5 day
📈 Benchmark Plan
Comparison Matrix
| Scenario | MINIMAL | +BigCache | +Batch | BALANCED | FAST | LEARNING |
|---|---|---|---|---|---|---|
| VM (2MB) | 45,000 | 40,000 | 39,000 | 39,500 | 39,500 | 39,600 |
| tiny-hot | 50 | 50 | 50 | 50 | 12 | 52 |
| cold-churn | TBD | TBD | TBD | TBD | TBD | TBD |
| json-parse | TBD | TBD | TBD | TBD | TBD | TBD |
Note: Numbers are estimates, actual results TBD
Metrics to Collect
For each mode:
- Performance: Median latency (ns)
- Syscalls: mmap/munmap/madvise counts
- Page faults: soft/hard counts
- Memory: RSS delta
- Cache: Hit rates (BigCache, Pool)
Benchmark Script
#!/bin/bash
# bench_modes.sh - Compare all modes
MODES="minimal balanced fast learning"
SCENARIOS="vm cold-churn json-parse"
for mode in $MODES; do
  for scenario in $SCENARIOS; do
    echo "=== Mode: $mode, Scenario: $scenario ==="
    HAKMEM_MODE=$mode ./bench_runner.sh \
      --allocator hakmem-evolving \
      --scenario $scenario \
      --warmup 10 --runs 50 \
      --output results_${mode}_${scenario}.csv
  done
done
# Aggregate results
python3 analyze_modes.py results_*.csv
🎯 Success Metrics
Step 1-4 (MINIMAL → BALANCED)
- ✅ Each feature's impact is measurable
- ✅ Performance regression < 10% per feature
- ✅ Total BALANCED overhead: +40-60% vs mimalloc
Step 5 (FAST mode with TinyPool)
- ✅ tiny-hot benchmark: mimalloc +20% or better
- ✅ VM scenario: No regression vs BALANCED
- ✅ Pool hit rate: 90%+ for small allocations
Step 6 (LEARNING mode)
- ✅ Convergence within 2048 allocations
- ✅ Learning overhead amortized to < 5%
- ✅ FROZEN performance = BALANCED
📝 Migration Plan (Backward Compatibility)
Environment Variable Priority
// 1. HAKMEM_MODE has highest priority
const char* mode_env = getenv("HAKMEM_MODE");
if (mode_env) {
hak_config_apply_mode(mode_env); // Apply preset
} else {
// 2. Fall back to individual settings (legacy)
const char* free_policy = getenv("HAKMEM_FREE_POLICY");
const char* thp = getenv("HAKMEM_THP");
// ... etc
}
// 3. Individual settings can override mode
// Example: HAKMEM_MODE=balanced HAKMEM_THP=off
// → Use BALANCED preset, but force THP=off
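One possible shape of the override pass run after the preset is applied; parse_thp_policy() is an assumed helper, and only thp_policy / g_hakem_config come from the Step 1 struct.

```c
// Hypothetical override step after hak_config_apply_mode(): individual env
// vars win over the preset defaults. parse_thp_policy() is an assumed helper.
const char* thp_override = getenv("HAKMEM_THP");
if (thp_override) {
    g_hakem_config.thp_policy = parse_thp_policy(thp_override);
}
// Other legacy vars (HAKMEM_FREE_POLICY, HAKMEM_EVO_POLICY, ...) would be
// handled the same way, so HAKMEM_MODE=balanced HAKMEM_THP=off behaves as above.
```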
Deprecation Timeline
- Phase 6.8: Both HAKMEM_MODE and individual env vars supported
- Phase 7: Prefer HAKMEM_MODE, warn if individual vars used
- Phase 8: Deprecate individual vars (only HAKMEM_MODE)
🚀 Implementation Timeline
| Step | Task | Time | Cumulative | Status |
|---|---|---|---|---|
| 0 | Baseline (done) | - | - | ✅ |
| 1 | MINIMAL mode | 1 day | 1 day | 🚧 |
| 2 | +BigCache | 0.5 day | 1.5 days | ⏳ |
| 3 | +Batch | 0.5 day | 2 days | ⏳ |
| 4 | BALANCED (ELO FROZEN) | 0.5 day | 2.5 days | ⏳ |
| 5 | FAST (TinyPool MVP) | 2-3 weeks | 3.5-4.5 weeks | ⏳ |
| 6 | LEARNING mode | 1 day | 3.6-4.6 weeks | ⏳ |
| 7 | RESEARCH mode | 0.5 day | 3.65-4.65 weeks | ⏳ |
Total: 3.7-4.7 weeks (MVP: 2.5 days, Full: 4-5 weeks)
📚 Documentation Updates
README.md
Add section:
## 🎯 Quick Start: Choosing a Mode
- **Development**: `HAKMEM_MODE=learning` (adaptive, slow)
- **Production**: `HAKMEM_MODE=fast` (mimalloc +20%)
- **General**: `HAKMEM_MODE=balanced` (default, mimalloc +40%)
- **Benchmarking**: `HAKMEM_MODE=minimal` (baseline)
- **Research**: `HAKMEM_MODE=research` (all features + debug)
New Files
- PHASE_6.8_CONFIG_CLEANUP.md (this file)
- apps/experiments/hakmem-poc/hakmem_config.h
- apps/experiments/hakmem-poc/hakmem_config.c
- apps/experiments/hakmem-poc/bench_modes.sh
- apps/experiments/hakmem-poc/analyze_modes.py
🎓 Expected Outcomes
For Paper
Before Phase 6.8:
- ❌ "hakmem is +88% slower than mimalloc"
- ⚠️ Complex configuration, hard to reproduce
- ⚠️ Unclear which features contribute to overhead
After Phase 6.8:
- ✅ "BALANCED mode: +40% overhead for adaptive learning"
- ✅ "FAST mode: +20% overhead, competitive with production allocators"
- ✅ "Each feature's impact clearly measured"
- ✅ "5 simple modes, easy to reproduce"
For Future Work
- Step 5 (TinyPool) can become Phase 7 if successful
- ChatGPT Pro's hybrid architecture validated
- Clear path to mimalloc-level performance
🏆 Final Status
Phase 6.8: 🚧 IN PROGRESS
Next Steps:
- ✅ Design document created (this file)
- 🚧 Implement Step 1 (MINIMAL mode)
- ⏳ Measure & iterate through Steps 2-7
Ready to start implementation! 🚀