# Phase 6.8: Configuration Cleanup & Mode-based Architecture

**Date**: 2025-10-21
**Status**: 🚧 **IN PROGRESS**

---

## 🎯 Goal

**Problem**: hakmem currently has too many environment variables to manage
- `HAKMEM_FREE_POLICY`, `HAKMEM_THP`, `HAKMEM_EVO_POLICY`, etc.
- Combinations are complex, and an invalid combination can cause bugs
- Benchmark comparisons are difficult (which settings should be compared?)

**Solution**: Consolidate into **5 preset modes**
- A simple `HAKMEM_MODE=balanced` selects an appropriate configuration
- The effect of each feature can be measured step by step
- Easy to explain in the paper

---

## 📊 5 Modes Definition

### **Mode Overview**

| Mode | Use Case | Target Audience | Performance Goal |
|------|----------|-----------------|------------------|
| **MINIMAL** | Baseline measurement | Benchmark comparison | On par with system malloc |
| **FAST** | Production (speed first) | Production use | mimalloc +20% |
| **BALANCED** | Recommended default | General use | mimalloc +40% |
| **LEARNING** | Learning phase | Development | mimalloc +60% |
| **RESEARCH** | Development & debugging | Research | N/A (all features ON) |

### **Feature Matrix**

| Feature | MINIMAL | FAST | BALANCED | LEARNING | RESEARCH |
|---------|---------|------|----------|----------|----------|
| **ELO learning** | ❌ | ❌ FROZEN | ✅ FROZEN | ✅ LEARN | ✅ LEARN |
| **BigCache** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **Batch madvise** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **TinyPool (future)** | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Free policy** | batch | adaptive | adaptive | adaptive | adaptive |
| **THP** | off | auto | auto | auto | on |
| **Evolution lifecycle** | - | FROZEN | FROZEN | LEARN→FROZEN | LEARN |
| **Debug logging** | ❌ | ❌ | ❌ | ⚠️ minimal | ✅ verbose |

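
One way to keep the Feature Matrix and the code in sync is to store each mode as one row of a static table. The following is a minimal sketch under stated assumptions: `Mode`, `ModePreset`, `MODE_PRESETS`, and `preset_for` are hypothetical names, policies are simplified to strings rather than the real `FreePolicy`/`THPPolicy` types, the "❌ FROZEN" cell for FAST is read as ELO disabled, and the TinyPool row is left out because it is future work.

```c
#include <stddef.h>

/* Hypothetical mirror of the Feature Matrix; not the project's actual types. */
typedef enum {
    MODE_MINIMAL, MODE_FAST, MODE_BALANCED, MODE_LEARNING, MODE_RESEARCH,
    MODE_COUNT
} Mode;

typedef struct {
    int enable_elo;
    int enable_bigcache;
    int enable_batch;
    const char* free_policy;  /* "batch" or "adaptive" */
    const char* thp;          /* "off", "auto", or "on" */
    const char* evo_phase;    /* "-", "frozen", or "learn" (initial phase) */
    int debug_level;          /* 0 = off, 1 = minimal, 2 = verbose */
} ModePreset;

/* One row per mode, in the same column order as the matrix. */
static const ModePreset MODE_PRESETS[MODE_COUNT] = {
    [MODE_MINIMAL]  = {0, 0, 0, "batch",    "off",  "-",      0},
    [MODE_FAST]     = {0, 1, 1, "adaptive", "auto", "frozen", 0},
    [MODE_BALANCED] = {1, 1, 1, "adaptive", "auto", "frozen", 0},
    [MODE_LEARNING] = {1, 1, 1, "adaptive", "auto", "learn",  1},
    [MODE_RESEARCH] = {1, 1, 1, "adaptive", "on",   "learn",  2},
};

/* Look up a preset; returns NULL for an out-of-range mode. */
static const ModePreset* preset_for(Mode m) {
    if ((int)m < 0 || m >= MODE_COUNT) return NULL;
    return &MODE_PRESETS[m];
}
```

With a table like this, adding a feature column means adding one struct field and one value per row, which keeps the presets reviewable side by side.
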
---

## 🔧 Implementation Plan

### **Step 0: Baseline Measurement** ✅ (Already done in Phase 6.6-6.7)

Current state:
- hakmem-evolving: 37,602 ns (VM scenario, 2MB)
- mimalloc: 19,964 ns (hakmem-evolving is +88.3% slower)
- All features ON (uncontrolled)

### **Step 1: MINIMAL Mode** 🎯 (P0 - Foundation)

**Goal**: Create baseline with all features OFF

**Implementation**:
```c
// hakmem_config.h
// FreePolicy and THPPolicy are assumed to be declared by existing hakmem headers.
typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

typedef struct {
    HakemMode mode;

    // Feature flags
    int enable_elo;
    int enable_bigcache;
    int enable_batch;
    int enable_pool;  // future (Step 5)

    // Policies
    FreePolicy free_policy;
    THPPolicy thp_policy;
    const char* evo_phase;  // "frozen", "learn", "canary"

    // Debug
    int debug_logging;
} HakemConfig;

// Single global configuration, filled in once at startup
extern HakemConfig g_hakem_config;
void hak_config_init(void);
```

**Changes**:
- `hakmem_config.h/c`: New files
- `hakmem.c`: Call `hak_config_init()` in `hak_init()`
- All modules: Check `g_hakem_config` flags before enabling features

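
The first job of `hak_config_init()` is turning the `HAKMEM_MODE` string into a `HakemMode` value. A minimal sketch of that step follows; `hak_mode_from_string` is a hypothetical helper, and both the case-insensitive matching and the fall-back-on-unknown behavior are assumptions rather than decided design.

```c
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Same enum as in hakmem_config.h, repeated so the sketch is self-contained. */
typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

/* Hypothetical helper: map a mode name to its enum value.
 * NULL or unrecognized input yields `fallback` instead of aborting. */
static HakemMode hak_mode_from_string(const char* s, HakemMode fallback) {
    static const struct { const char* name; HakemMode mode; } table[] = {
        {"minimal",  HAKMEM_MODE_MINIMAL},
        {"fast",     HAKMEM_MODE_FAST},
        {"balanced", HAKMEM_MODE_BALANCED},
        {"learning", HAKMEM_MODE_LEARNING},
        {"research", HAKMEM_MODE_RESEARCH},
    };
    if (s == NULL) return fallback;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcasecmp(s, table[i].name) == 0) return table[i].mode;
    }
    return fallback;
}
```

Inside `hak_config_init()` this could be called as `hak_mode_from_string(getenv("HAKMEM_MODE"), HAKMEM_MODE_BALANCED)`, since BALANCED is the recommended default.
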
**Benchmark**:
```bash
HAKMEM_MODE=minimal ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
```

**Expected**:
- Performance: ~40,000-50,000 ns (slower than current, no optimizations)
- Serves as baseline for feature comparison

**Estimated time**: 1 day

---

### **Step 2: Enable BigCache** 🎯 (P0 - Tier-2 Cache)

**Goal**: Measure BigCache impact in isolation

**Implementation**:
- MINIMAL + BigCache ON
- Keep ELO/Batch/THP OFF

**Benchmark**:
```bash
# Baseline: MINIMAL, all features OFF
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
# Then force BigCache on in the source and rebuild:
# hakmem.c: g_hakem_config.enable_bigcache = 1;
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- VM scenario hit rate: 99%+
- Performance: about 5,000 ns faster (cache hits avoid mmap)
- Target: 35,000-40,000 ns

**Measurement**:
- BigCache hit rate
- mmap syscall count (should drop)
- Performance delta

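
The hit rate listed under Measurement only needs two counters. A minimal sketch, assuming hypothetical names (`CacheStats`, `cache_stats_record`, `cache_hit_rate`) that are not part of the existing BigCache code:

```c
#include <stdint.h>

/* Hypothetical counters; the real BigCache may track more detail. */
typedef struct {
    uint64_t hits;
    uint64_t misses;
} CacheStats;

/* Record the outcome of one cache lookup. */
static void cache_stats_record(CacheStats* s, int hit) {
    if (hit) s->hits++;
    else     s->misses++;
}

/* Fraction of lookups served from the cache; 0.0 before any lookup. */
static double cache_hit_rate(const CacheStats* s) {
    uint64_t total = s->hits + s->misses;
    return total ? (double)s->hits / (double)total : 0.0;
}
```

For a run where only the first lookup misses, 100 lookups already reach the 99% hit rate expected for the VM scenario.
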
**Estimated time**: 0.5 day

---

### **Step 3: Enable Batch madvise** 🎯 (P1 - TLB Optimization)

**Goal**: Measure batch madvise impact

**Implementation**:
- MINIMAL + BigCache + Batch ON
- Keep ELO/THP OFF

**Benchmark**:
```bash
# Previous: MINIMAL + BigCache
# New: MINIMAL + BigCache + Batch
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- Batch flush operations: 1-10 per run
- Performance: 500-1,000 ns faster (TLB optimization)
- Target: 34,000-39,000 ns

**Measurement**:
- Batch statistics (blocks added, flush count)
- madvise syscall count
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 4: Enable ELO (FROZEN)** 🎯 (P1 - Strategy Selection)

**Goal**: Measure ELO overhead in FROZEN mode (no learning)

**Implementation**:
- BALANCED mode = MINIMAL + BigCache + Batch + ELO(FROZEN)

**Benchmark**:
```bash
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- ELO overhead: ~100-200 ns (strategy selection per allocation)
- Performance: 100-200 ns slower (acceptable for adaptability)
- Target: 34,500-39,500 ns

**Measurement**:
- ELO selection overhead
- Strategy distribution
- Performance delta

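
This document does not spell out how selection works in the FROZEN phase. One plausible reading, sketched here purely as an assumption, is that the highest-rated strategy is chosen once when the phase starts and the hot path only reads that cached choice. All names (`EloFrozenState`, `elo_freeze`, `elo_select`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the highest-rated strategy; ties resolve to the lowest index. */
static size_t elo_best_strategy(const double* ratings, size_t n) {
    assert(ratings != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (ratings[i] > ratings[best]) best = i;
    }
    return best;
}

typedef struct {
    size_t frozen_choice;  /* strategy picked when freezing */
    int    is_frozen;
} EloFrozenState;

/* Enter the FROZEN phase: pick the best strategy once and cache it. */
static void elo_freeze(EloFrozenState* st, const double* ratings, size_t n) {
    st->frozen_choice = elo_best_strategy(ratings, n);
    st->is_frozen = 1;
}

/* Hot path while frozen: a single load, no rating updates. */
static size_t elo_select(const EloFrozenState* st) {
    return st->frozen_choice;
}
```

If the measured FROZEN overhead turns out well above a cached lookup like this, that points at work on the selection path that could be hoisted out of it.
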
**Estimated time**: 0.5 day

---

### **Step 5: TinyPool Implementation (FAST mode)** 🚀 (P2 - Fast Path)

**Goal**: Implement pool-based fast path (ChatGPT Pro proposal)

**Implementation**:
- FAST mode = BALANCED + TinyPool
- 7 size classes: 16/32/64/128/256/512/1024B
- Per-thread free lists
- class×shard O(1) mapping

**Code sketch**:
```c
// hakmem_pool.h
#include <stddef.h>  // size_t
#include <stdint.h>  // uint32_t

// Intrusive singly linked free list: each free block stores the next pointer
typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

#define SHARDS 64
#define CLASSES 7  // 16B to 1024B

typedef struct {
    FreeList list[SHARDS];
} ClassPools;

// One set of pools per thread, so the fast path needs no locking
_Thread_local ClassPools tls_pools[CLASSES];

// Fast path (O(1))
void* hak_alloc_small(size_t sz, void* pc);
void hak_free_small(void* p, void* pc);
```

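
The two building blocks behind the fast path are the size-to-class mapping and the free-list push/pop. The following is a minimal sketch: `size_to_class`, `freelist_push`, and `freelist_pop` are hypothetical helper names, the power-of-two rounding is inferred from the 16/32/.../1024B class list, and shard selection is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_CLASS_SIZE 16u
#define MAX_CLASS_SIZE 1024u

typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

/* Map a request size to class 0..6 (16B..1024B); -1 means "not a small size".
 * The loop runs at most 6 times, so the cost is bounded by a constant. */
static int size_to_class(size_t sz) {
    if (sz > MAX_CLASS_SIZE) return -1;
    int cls = 0;
    size_t cap = MIN_CLASS_SIZE;
    while (cap < sz) {
        cap <<= 1;
        cls++;
    }
    return cls;
}

/* Push a free block; the block's own memory holds the link. */
static void freelist_push(FreeList* fl, void* p) {
    Node* n = (Node*)p;
    n->next = fl->head;
    fl->head = n;
    fl->cnt++;
}

/* Pop a block, or return NULL so the caller can take the slow path. */
static void* freelist_pop(FreeList* fl) {
    Node* n = fl->head;
    if (n == NULL) return NULL;
    fl->head = n->next;
    fl->cnt--;
    return n;
}
```

Because the link lives inside the freed block, the smallest class must be at least `sizeof(Node)` bytes, which the 16B minimum satisfies on 64-bit targets.
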
**Benchmark**:
```bash
# Baseline: BALANCED mode
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 10 --runs 50

# New: FAST mode
HAKMEM_MODE=fast ./bench_runner.sh --warmup 10 --runs 50
```

**Expected**:
- Small allocations (≤1KB): 9-15 ns fast path
- VM scenario (2MB): No change (pool not used for large allocations)
- Need new benchmark: tiny-hot (16/32/64B allocations)

**Measurement**:
- Pool hit rate
- Fast path latency (perf profiling)
- Comparison with mimalloc on tiny-hot

**Estimated time**: 2-3 weeks (MVP: 2 weeks, MT support: +1 week)

---

### **Step 6: ELO LEARNING mode** 🎯 (P2 - Adaptive Learning)

**Goal**: Measure learning overhead and convergence

**Implementation**:
- LEARNING mode = BALANCED + ELO(LEARN→FROZEN)

**Benchmark**:
```bash
HAKMEM_MODE=learning ./bench_runner.sh --warmup 100 --runs 100
```

**Expected**:
- LEARN phase: +200-500 ns overhead (ELO selection + recording)
- Convergence: 1024-2048 allocations → FROZEN
- FROZEN phase: Same as BALANCED mode
- Overall: +50-100 ns average (amortized over the whole run)

**Measurement**:
- ELO rating convergence
- Phase transitions (LEARN → FROZEN → CANARY)
- Learning overhead vs benefit

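
The amortized figure follows from spreading the LEARN-phase cost over every allocation in the run. A minimal sketch of that arithmetic, where `amortized_overhead_ns` is a hypothetical helper and the total run length is an assumed input that the plan does not specify:

```c
#include <assert.h>

/* Average extra cost per allocation when only the first `learn_allocs`
 * out of `total_allocs` pay `learn_overhead_ns`. */
static double amortized_overhead_ns(double learn_overhead_ns,
                                    unsigned long learn_allocs,
                                    unsigned long total_allocs) {
    assert(total_allocs > 0 && learn_allocs <= total_allocs);
    return learn_overhead_ns * (double)learn_allocs / (double)total_allocs;
}
```

For example, 500 ns over 2,048 learning allocations amortizes to 100 ns across a run of 10,240 allocations, and 200 ns over 1,024 amortizes to 50 ns across 4,096. Shorter runs will therefore show a higher average than the +50-100 ns estimate.
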
**Estimated time**: 1 day

---

### **Step 7: RESEARCH mode (All features)** 🎯 (P3 - Development)

**Goal**: Enable all features + debug logging

**Implementation**:
- RESEARCH mode = LEARNING + THP(ON) + Debug logging

**Use case**:
- Development & debugging only
- Not for benchmarking (too slow)

**Estimated time**: 0.5 day

---

## 📈 Benchmark Plan

### **Comparison Matrix**

| Scenario | MINIMAL | +BigCache | +Batch | BALANCED | FAST | LEARNING |
|----------|---------|-----------|--------|----------|------|----------|
| **VM (2MB)** | 45,000 | 40,000 | 39,000 | 39,500 | 39,500 | 39,600 |
| **tiny-hot** | 50 | 50 | 50 | 50 | **12** | 52 |
| **cold-churn** | TBD | TBD | TBD | TBD | TBD | TBD |
| **json-parse** | TBD | TBD | TBD | TBD | TBD | TBD |

**Note**: Values are estimated latencies in ns; actual results TBD

### **Metrics to Collect**

For each mode:
- **Performance**: Median latency (ns)
- **Syscalls**: mmap/munmap/madvise counts
- **Page faults**: soft/hard counts
- **Memory**: RSS delta
- **Cache**: Hit rates (BigCache, Pool)

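
Median latency is the headline metric, so it helps to fix how it is computed. A minimal sketch, assuming per-run latencies are collected as `double` values in ns; `median_ns` is a hypothetical helper and the even-count convention (mean of the two middle samples) is one common choice rather than something this plan specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>  /* qsort */

/* Three-way comparison for doubles, safe against overflow. */
static int cmp_double(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place and returns the median.
 * For an even count, returns the mean of the two middle values. */
static double median_ns(double* samples, size_t n) {
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_double);
    if (n % 2 == 1) return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```
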
### **Benchmark Script**

```bash
#!/bin/bash
# bench_modes.sh - Compare all modes

MODES="minimal balanced fast learning"
SCENARIOS="vm cold-churn json-parse"

for mode in $MODES; do
    for scenario in $SCENARIOS; do
        echo "=== Mode: $mode, Scenario: $scenario ==="
        HAKMEM_MODE="$mode" ./bench_runner.sh \
            --allocator hakmem-evolving \
            --scenario "$scenario" \
            --warmup 10 --runs 50 \
            --output "results_${mode}_${scenario}.csv"
    done
done

# Aggregate results
python3 analyze_modes.py results_*.csv
```

---

## 🎯 Success Metrics

### **Step 1-4 (MINIMAL → BALANCED)**

- ✅ Each feature's impact is measurable
- ✅ Performance regression < 10% per feature
- ✅ Total BALANCED overhead: +40-60% vs mimalloc

### **Step 5 (FAST mode with TinyPool)**

- ✅ tiny-hot benchmark: mimalloc +20% or better
- ✅ VM scenario: No regression vs BALANCED
- ✅ Pool hit rate: 90%+ for small allocations

### **Step 6 (LEARNING mode)**

- ✅ Convergence within 2048 allocations
- ✅ Learning overhead amortized to < 5%
- ✅ FROZEN performance = BALANCED

---

## 📝 Migration Plan (Backward Compatibility)

### **Environment Variable Priority**

```c
// 1. HAKMEM_MODE selects the preset (BALANCED is the default when unset)
const char* mode_env = getenv("HAKMEM_MODE");
hak_config_apply_mode(mode_env ? mode_env : "balanced");  // Apply preset

// 2. Individual (legacy) settings are read next
const char* free_policy = getenv("HAKMEM_FREE_POLICY");
const char* thp = getenv("HAKMEM_THP");
// ... etc

// 3. Any individual setting that is present overrides the preset value
// Example: HAKMEM_MODE=balanced HAKMEM_THP=off
//   → Use BALANCED preset, but force THP=off
```

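
The rule that an individual variable overrides the mode preset can be sketched for a single setting. This is a minimal illustration under stated assumptions: `ThpSetting` is a hypothetical stand-in for `THPPolicy`, `resolve_thp` is a hypothetical helper, and the accepted values `off`/`auto`/`on` are taken from the THP row of the Feature Matrix.

```c
#include <stdlib.h>  /* getenv */
#include <string.h>  /* strcmp */

/* Hypothetical stand-in for THPPolicy. */
typedef enum { THP_OFF, THP_AUTO, THP_ON } ThpSetting;

/* Start from the preset's value; HAKMEM_THP overrides it when it is set
 * to a recognized value. Unrecognized values are ignored. */
static ThpSetting resolve_thp(ThpSetting preset_value) {
    const char* env = getenv("HAKMEM_THP");
    if (env == NULL) return preset_value;
    if (strcmp(env, "off") == 0)  return THP_OFF;
    if (strcmp(env, "auto") == 0) return THP_AUTO;
    if (strcmp(env, "on") == 0)   return THP_ON;
    return preset_value;
}
```

Running with `HAKMEM_MODE=balanced HAKMEM_THP=off` would then pass the BALANCED preset's `auto` into `resolve_thp` and get `off` back, matching the example in the comments above.
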
### **Deprecation Timeline**

- **Phase 6.8**: Both HAKMEM_MODE and individual env vars supported
- **Phase 7**: Prefer HAKMEM_MODE, warn if individual vars used
- **Phase 8**: Deprecate individual vars (only HAKMEM_MODE)

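
The Phase 7 warning could be a small startup check. A minimal sketch, assuming a hypothetical helper `warn_legacy_env` and a message format invented for illustration; the variable list contains only the three names mentioned in this document, and the real list may be longer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Legacy variables named in this document. */
static const char* const LEGACY_VARS[] = {
    "HAKMEM_FREE_POLICY",
    "HAKMEM_THP",
    "HAKMEM_EVO_POLICY",
};

/* Print one warning per legacy variable that is set.
 * Returns how many were found, so callers can also count them. */
static int warn_legacy_env(FILE* out) {
    int found = 0;
    for (size_t i = 0; i < sizeof LEGACY_VARS / sizeof LEGACY_VARS[0]; i++) {
        if (getenv(LEGACY_VARS[i]) != NULL) {
            fprintf(out, "hakmem: %s is deprecated; prefer HAKMEM_MODE\n",
                    LEGACY_VARS[i]);
            found++;
        }
    }
    return found;
}
```
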
---

## 🚀 Implementation Timeline

| Step | Task | Time | Cumulative | Status |
|------|------|------|------------|--------|
| 0 | Baseline (done) | - | - | ✅ |
| 1 | MINIMAL mode | 1 day | 1 day | 🚧 |
| 2 | +BigCache | 0.5 day | 1.5 days | ⏳ |
| 3 | +Batch | 0.5 day | 2 days | ⏳ |
| 4 | BALANCED (ELO FROZEN) | 0.5 day | 2.5 days | ⏳ |
| 5 | FAST (TinyPool MVP) | 2-3 weeks | 3.5-4.5 weeks | ⏳ |
| 6 | LEARNING mode | 1 day | 3.6-4.6 weeks | ⏳ |
| 7 | RESEARCH mode | 0.5 day | 3.65-4.65 weeks | ⏳ |

**Total**: 3.7-4.7 weeks (MVP: 2.5 days, Full: 4-5 weeks)

---

## 📚 Documentation Updates

### **README.md**

Add section:
```markdown
## 🎯 Quick Start: Choosing a Mode

- **Development**: `HAKMEM_MODE=learning` (adaptive, slow)
- **Production**: `HAKMEM_MODE=fast` (mimalloc +20%)
- **General**: `HAKMEM_MODE=balanced` (default, mimalloc +40%)
- **Benchmarking**: `HAKMEM_MODE=minimal` (baseline)
- **Research**: `HAKMEM_MODE=research` (all features + debug)
```

### **New Files**

- `PHASE_6.8_CONFIG_CLEANUP.md` (this file)
- `apps/experiments/hakmem-poc/hakmem_config.h`
- `apps/experiments/hakmem-poc/hakmem_config.c`
- `apps/experiments/hakmem-poc/bench_modes.sh`
- `apps/experiments/hakmem-poc/analyze_modes.py`

---

## 🎓 Expected Outcomes

### **For Paper**

**Before Phase 6.8**:
- ❌ "hakmem is 88% slower than mimalloc"
- ⚠️ Complex configuration, hard to reproduce
- ⚠️ Unclear which features contribute to overhead

**After Phase 6.8**:
- ✅ "BALANCED mode: +40% overhead for adaptive learning"
- ✅ "FAST mode: +20% overhead, competitive with production allocators"
- ✅ "Each feature's impact clearly measured"
- ✅ "5 simple modes, easy to reproduce"

### **For Future Work**

- Step 5 (TinyPool) can become **Phase 7** if successful
- Validates ChatGPT Pro's proposed hybrid architecture
- Clear path to mimalloc-level performance

---

## 🏆 Final Status

**Phase 6.8**: 🚧 **IN PROGRESS**

**Next Steps**:
1. ✅ Design document created (this file)
2. 🚧 Implement Step 1 (MINIMAL mode)
3. ⏳ Measure & iterate through Steps 2-7

---

**Ready to start implementation!** 🚀