hakmem/docs/archive/PHASE_6.8_CONFIG_CLEANUP.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


# Phase 6.8: Configuration Cleanup & Mode-based Architecture
**Date**: 2025-10-21
**Status**: 🚧 **IN PROGRESS**
---
## 🎯 Goal
**Problem**: hakmem currently has too many environment variables to manage
- `HAKMEM_FREE_POLICY`, `HAKMEM_THP`, `HAKMEM_EVO_POLICY`, etc.
- The combinations are complex, and invalid configurations can trigger bugs
- Benchmark comparisons are difficult (which settings should be compared?)
**Solution**: Consolidate into **5 preset modes**
- A simple `HAKMEM_MODE=balanced` yields a sensible configuration
- Each feature's impact can be measured incrementally
- Easy to explain in the paper
---
## 📊 5 Modes Definition
### **Mode Overview**
| Mode | Use Case | Target Audience | Performance Goal |
|------|----------|-----------------|------------------|
| **MINIMAL** | Baseline measurement | Benchmark comparison | On par with system malloc |
| **FAST** | Production (speed first) | Production use | mimalloc +20% |
| **BALANCED** | Recommended default | General use | mimalloc +40% |
| **LEARNING** | Learning phase | Development | mimalloc +60% |
| **RESEARCH** | Development & debugging | Research | N/A (all features ON) |
### **Feature Matrix**
| Feature | MINIMAL | FAST | BALANCED | LEARNING | RESEARCH |
|---------|---------|------|----------|----------|----------|
| **ELO learning** | ❌ | ❌ FROZEN | ✅ FROZEN | ✅ LEARN | ✅ LEARN |
| **BigCache** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **Batch madvise** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **TinyPool (future)** | ❌ | ✅ | ✅ | ❌ | ❌ |
| **Free policy** | batch | adaptive | adaptive | adaptive | adaptive |
| **THP** | off | auto | auto | auto | on |
| **Evolution lifecycle** | - | FROZEN | FROZEN | LEARN→FROZEN | LEARN |
| **Debug logging** | ❌ | ❌ | ❌ | ⚠️ minimal | ✅ verbose |
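The matrix above maps naturally onto a static preset table in C. A minimal sketch, assuming illustrative names (`ModePreset`, `find_preset`) that are not part of the existing code; the flag values follow the feature matrix:

```c
#include <string.h>

/* Hypothetical preset table mirroring the feature matrix above.
 * Struct/field names are illustrative, not the real hakmem API. */
typedef struct {
    const char* name;
    int elo, bigcache, batch, pool;   /* feature flags */
    const char* free_policy;          /* "batch" or "adaptive" */
    const char* thp;                  /* "off", "auto", "on" */
} ModePreset;

static const ModePreset k_presets[] = {
    { "minimal",  0, 0, 0, 0, "batch",    "off"  },
    { "fast",     0, 1, 1, 1, "adaptive", "auto" },
    { "balanced", 1, 1, 1, 1, "adaptive", "auto" },
    { "learning", 1, 1, 1, 0, "adaptive", "auto" },
    { "research", 1, 1, 1, 0, "adaptive", "on"   },
};

/* Case-sensitive lookup; returns NULL for unknown mode names. */
static const ModePreset* find_preset(const char* name) {
    for (size_t i = 0; i < sizeof k_presets / sizeof k_presets[0]; i++)
        if (strcmp(k_presets[i].name, name) == 0)
            return &k_presets[i];
    return NULL;
}
```

A single table like this keeps the preset definitions in one place, which is what makes the mode-based approach auditable for the paper.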
---
## 🔧 Implementation Plan
### **Step 0: Baseline Measurement** ✅ (Already done in Phase 6.6-6.7)
Current state:
- hakmem-evolving: 37,602 ns (VM scenario, 2MB)
- mimalloc: 19,964 ns (+88.3% gap)
- All features ON (uncontrolled)
### **Step 1: MINIMAL Mode** 🎯 (P0 - Foundation)
**Goal**: Create baseline with all features OFF
**Implementation**:
```c
// hakmem_config.h
typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

typedef struct {
    HakemMode mode;
    // Feature flags
    int enable_elo;
    int enable_bigcache;
    int enable_batch;
    int enable_pool;        // future (Step 5)
    // Policies
    FreePolicy free_policy;
    THPPolicy thp_policy;
    const char* evo_phase;  // "frozen", "learn", "canary"
    // Debug
    int debug_logging;
} HakemConfig;

extern HakemConfig g_hakem_config;
void hak_config_init(void);
```
**Changes**:
- `hakmem_config.h/c`: New files
- `hakmem.c`: Call `hak_config_init()` in `hak_init()`
- All modules: Check `g_hakem_config` flags before enabling features
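`hak_config_init()` could start by resolving the mode string. A sketch under the assumption that unset or unknown values fall back to BALANCED (the fallback choice is not decided in this document):

```c
#include <stdlib.h>
#include <string.h>

typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

/* Resolve HAKMEM_MODE into an enum value. The BALANCED fallback for
 * unset/unknown strings is an assumption, not settled behavior. */
static HakemMode hak_mode_from_env(void) {
    const char* s = getenv("HAKMEM_MODE");
    if (!s)                         return HAKMEM_MODE_BALANCED;
    if (strcmp(s, "minimal")  == 0) return HAKMEM_MODE_MINIMAL;
    if (strcmp(s, "fast")     == 0) return HAKMEM_MODE_FAST;
    if (strcmp(s, "balanced") == 0) return HAKMEM_MODE_BALANCED;
    if (strcmp(s, "learning") == 0) return HAKMEM_MODE_LEARNING;
    if (strcmp(s, "research") == 0) return HAKMEM_MODE_RESEARCH;
    return HAKMEM_MODE_BALANCED;    /* unknown string */
}
```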
**Benchmark**:
```bash
HAKMEM_MODE=minimal ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
```
**Expected**:
- Performance: ~40,000-50,000 ns (slower than current, no optimizations)
- Serves as baseline for feature comparison
**Estimated time**: 1 day
---
### **Step 2: Enable BigCache** 🎯 (P0 - Tier-2 Cache)
**Goal**: Measure BigCache impact in isolation
**Implementation**:
- MINIMAL + BigCache ON
- Keep ELO/Batch/THP OFF
**Benchmark**:
```bash
# Baseline: MINIMAL mode (BigCache off)
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
# Then rebuild with BigCache forced on in hakmem.c:
#   g_hakem_config.enable_bigcache = 1;
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
```
**Expected**:
- VM scenario hit rate: 99%+
- Performance: ~5,000 ns reduction (cache hits avoid mmap)
- Target: 35,000-40,000 ns
**Measurement**:
- BigCache hit rate
- mmap syscall count (should drop)
- Performance delta
**Estimated time**: 0.5 day
---
### **Step 3: Enable Batch madvise** 🎯 (P1 - TLB Optimization)
**Goal**: Measure batch madvise impact
**Implementation**:
- MINIMAL + BigCache + Batch ON
- Keep ELO/THP OFF
**Benchmark**:
```bash
# Previous: MINIMAL + BigCache
# New: MINIMAL + BigCache + Batch
./bench_runner.sh --warmup 2 --runs 10
```
**Expected**:
- Batch flush operations: 1-10 per run
- Performance: 500-1,000 ns reduction (TLB optimization)
- Target: 34,000-39,000 ns
**Measurement**:
- Batch statistics (blocks added, flush count)
- madvise syscall count
- Performance delta
**Estimated time**: 0.5 day
---
### **Step 4: Enable ELO (FROZEN)** 🎯 (P1 - Strategy Selection)
**Goal**: Measure ELO overhead in FROZEN mode (no learning)
**Implementation**:
- BALANCED mode = MINIMAL + BigCache + Batch + ELO(FROZEN)
**Benchmark**:
```bash
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 2 --runs 10
```
**Expected**:
- ELO overhead: ~100-200 ns (strategy selection per allocation)
- Performance: +100-200 ns regression (acceptable for adaptability)
- Target: 34,500-39,500 ns
**Measurement**:
- ELO selection overhead
- Strategy distribution
- Performance delta
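In FROZEN mode, selection should reduce to a lookup over fixed ratings with no updates on the hot path. A hedged sketch (the ratings array and function name are assumptions, not the real hakmem API):

```c
#include <stddef.h>

/* FROZEN-mode strategy selection: argmax over precomputed ELO ratings.
 * No ratings are modified, so the hot-path cost is a short linear scan.
 * Names are illustrative only. */
static int pick_frozen_strategy(const double* ratings, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (ratings[i] > ratings[best])
            best = i;
    return (int)best;
}
```

In practice the argmax could be cached once at freeze time, which is what would keep the per-allocation overhead in the ~100-200 ns range estimated above.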
**Estimated time**: 0.5 day
---
### **Step 5: TinyPool Implementation (FAST mode)** 🚀 (P2 - Fast Path)
**Goal**: Implement pool-based fast path (ChatGPT Pro proposal)
**Implementation**:
- FAST mode = BALANCED + TinyPool
- 7 size classes: 16/32/64/128/256/512/1024B
- Per-thread free lists
- class×shard O(1) mapping
**Code sketch**:
```c
// hakmem_pool.h
typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

#define SHARDS  64
#define CLASSES 7   // 16B to 1024B

typedef struct {
    FreeList list[SHARDS];
} ClassPools;

_Thread_local ClassPools tls_pools[CLASSES];

// Fast path (O(1))
void* hak_alloc_small(size_t sz, void* pc);
void hak_free_small(void* p, void* pc);
```
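The O(1) mapping from request size to one of the 7 power-of-two classes could look like the following sketch (the function name and the >1KB bypass convention are assumptions):

```c
#include <stddef.h>

/* Map a request size to a class index: 16B→0, 32B→1, ..., 1024B→6.
 * Sizes above 1KB return -1 and bypass the pool. Illustrative only. */
static int size_to_class(size_t sz) {
    if (sz > 1024) return -1;
    if (sz <= 16)  return 0;
    int cls = 0;
    size_t cap = 16;
    while (cap < sz) { cap <<= 1; cls++; }  /* round up to next power of two */
    return cls;
}
```

On a hot path a branch-free variant using a count-leading-zeros builtin would avoid the loop, but the loop form makes the class boundaries explicit.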
**Benchmark**:
```bash
# Baseline: BALANCED mode
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 10 --runs 50
# New: FAST mode
HAKMEM_MODE=fast ./bench_runner.sh --warmup 10 --runs 50
```
**Expected**:
- Small allocations (≤1KB): 9-15 ns fast path
- VM scenario (2MB): No change (pool not used for large allocations)
- Need new benchmark: tiny-hot (16/32/64B allocations)
**Measurement**:
- Pool hit rate
- Fast path latency (perf profiling)
- Comparison with mimalloc on tiny-hot
**Estimated time**: 2-3 weeks (MVP: 2 weeks, MT support: +1 week)
---
### **Step 6: ELO LEARNING mode** 🎯 (P2 - Adaptive Learning)
**Goal**: Measure learning overhead and convergence
**Implementation**:
- LEARNING mode = BALANCED + ELO(LEARN→FROZEN)
**Benchmark**:
```bash
HAKMEM_MODE=learning ./bench_runner.sh --warmup 100 --runs 100
```
**Expected**:
- LEARN phase: +200-500 ns overhead (ELO selection + recording)
- Convergence: 1024-2048 allocations → FROZEN
- FROZEN phase: Same as BALANCED mode
- Overall: +50-100 ns average (amortized)
**Measurement**:
- ELO rating convergence
- Phase transitions (LEARN → FROZEN → CANARY)
- Learning overhead vs benefit
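The LEARN→FROZEN transition can be as simple as a per-allocation counter. A sketch assuming a fixed 2048-allocation threshold (matching the convergence target above) and illustrative names:

```c
#include <stdint.h>

enum { PHASE_LEARN, PHASE_FROZEN };  /* CANARY omitted for brevity */

typedef struct {
    int      phase;
    uint32_t allocs;  /* allocations observed while learning */
} EvoState;

/* After 2048 recorded allocations, stop updating ELO ratings. */
static void evo_record_alloc(EvoState* s) {
    if (s->phase == PHASE_LEARN && ++s->allocs >= 2048)
        s->phase = PHASE_FROZEN;
}
```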
**Estimated time**: 1 day
---
### **Step 7: RESEARCH mode (All features)** 🎯 (P3 - Development)
**Goal**: Enable all features + debug logging
**Implementation**:
- RESEARCH mode = LEARNING + THP(ON) + Debug logging
**Use case**:
- Development & debugging only
- Not for benchmarking (too slow)
**Estimated time**: 0.5 day
---
## 📈 Benchmark Plan
### **Comparison Matrix**
| Scenario | MINIMAL | +BigCache | +Batch | BALANCED | FAST | LEARNING |
|----------|---------|-----------|--------|----------|------|----------|
| **VM (2MB)** | 45,000 | 40,000 | 39,000 | 39,500 | 39,500 | 39,600 |
| **tiny-hot** | 50 | 50 | 50 | 50 | **12** | 52 |
| **cold-churn** | TBD | TBD | TBD | TBD | TBD | TBD |
| **json-parse** | TBD | TBD | TBD | TBD | TBD | TBD |
**Note**: Numbers are estimates, actual results TBD
### **Metrics to Collect**
For each mode:
- **Performance**: Median latency (ns)
- **Syscalls**: mmap/munmap/madvise counts
- **Page faults**: soft/hard counts
- **Memory**: RSS delta
- **Cache**: Hit rates (BigCache, Pool)
### **Benchmark Script**
```bash
#!/bin/bash
# bench_modes.sh - Compare all modes
MODES="minimal balanced fast learning"
SCENARIOS="vm cold-churn json-parse"

for mode in $MODES; do
    for scenario in $SCENARIOS; do
        echo "=== Mode: $mode, Scenario: $scenario ==="
        HAKMEM_MODE=$mode ./bench_runner.sh \
            --allocator hakmem-evolving \
            --scenario $scenario \
            --warmup 10 --runs 50 \
            --output results_${mode}_${scenario}.csv
    done
done

# Aggregate results
python3 analyze_modes.py results_*.csv
```
---
## 🎯 Success Metrics
### **Step 1-4 (MINIMAL → BALANCED)**
- ✅ Each feature's impact is measurable
- ✅ Performance regression < 10% per feature
- Total BALANCED overhead: +40-60% vs mimalloc
### **Step 5 (FAST mode with TinyPool)**
- tiny-hot benchmark: mimalloc +20% or better
- VM scenario: No regression vs BALANCED
- Pool hit rate: 90%+ for small allocations
### **Step 6 (LEARNING mode)**
- Convergence within 2048 allocations
- Learning overhead amortized to < 5%
- FROZEN performance = BALANCED
---
## 📝 Migration Plan (Backward Compatibility)
### **Environment Variable Priority**
```c
// 1. HAKMEM_MODE has the highest priority
const char* mode_env = getenv("HAKMEM_MODE");
if (mode_env) {
    hak_config_apply_mode(mode_env);  // Apply preset
} else {
    // 2. Fall back to individual settings (legacy)
    const char* free_policy = getenv("HAKMEM_FREE_POLICY");
    const char* thp = getenv("HAKMEM_THP");
    // ... etc
}

// 3. Individual settings can override the mode
// Example: HAKMEM_MODE=balanced HAKMEM_THP=off
//   → use the BALANCED preset, but force THP=off
```
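The override step (3) could be implemented per setting as "env var wins over preset". A minimal sketch with a hypothetical helper name:

```c
#include <stdlib.h>

/* Return the THP policy string: an explicit HAKMEM_THP overrides the
 * preset's default. Helper name is hypothetical, not the real API. */
static const char* thp_policy_from_env(const char* preset_default) {
    const char* thp = getenv("HAKMEM_THP");
    return thp ? thp : preset_default;
}
```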
### **Deprecation Timeline**
- **Phase 6.8**: Both HAKMEM_MODE and individual env vars supported
- **Phase 7**: Prefer HAKMEM_MODE, warn if individual vars used
- **Phase 8**: Deprecate individual vars (only HAKMEM_MODE)
---
## 🚀 Implementation Timeline
| Step | Task | Time | Cumulative | Status |
|------|------|------|------------|--------|
| 0 | Baseline (done) | - | - | ✅ |
| 1 | MINIMAL mode | 1 day | 1 day | 🚧 |
| 2 | +BigCache | 0.5 day | 1.5 days | |
| 3 | +Batch | 0.5 day | 2 days | |
| 4 | BALANCED (ELO FROZEN) | 0.5 day | 2.5 days | |
| 5 | FAST (TinyPool MVP) | 2-3 weeks | 3.5-4.5 weeks | |
| 6 | LEARNING mode | 1 day | 3.6-4.6 weeks | |
| 7 | RESEARCH mode | 0.5 day | 3.65-4.65 weeks | |
**Total**: 3.7-4.7 weeks (MVP: 2.5 days, Full: 4-5 weeks)
---
## 📚 Documentation Updates
### **README.md**
Add section:
```markdown
## 🎯 Quick Start: Choosing a Mode
- **Development**: `HAKMEM_MODE=learning` (adaptive, slow)
- **Production**: `HAKMEM_MODE=fast` (mimalloc +20%)
- **General**: `HAKMEM_MODE=balanced` (default, mimalloc +40%)
- **Benchmarking**: `HAKMEM_MODE=minimal` (baseline)
- **Research**: `HAKMEM_MODE=research` (all features + debug)
```
### **New Files**
- `PHASE_6.8_CONFIG_CLEANUP.md` (this file)
- `apps/experiments/hakmem-poc/hakmem_config.h`
- `apps/experiments/hakmem-poc/hakmem_config.c`
- `apps/experiments/hakmem-poc/bench_modes.sh`
- `apps/experiments/hakmem-poc/analyze_modes.py`
---
## 🎓 Expected Outcomes
### **For Paper**
**Before Phase 6.8**:
- "hakmem is +88% slower than mimalloc"
- Complex configuration, hard to reproduce
- Unclear which features contribute to overhead
**After Phase 6.8**:
- "BALANCED mode: +40% overhead for adaptive learning"
- "FAST mode: +20% overhead, competitive with production allocators"
- "Each feature's impact clearly measured"
- "5 simple modes, easy to reproduce"
### **For Future Work**
- Step 5 (TinyPool) can become **Phase 7** if successful
- ChatGPT Pro's hybrid architecture validated
- Clear path to mimalloc-level performance
---
## 🏆 Final Status
**Phase 6.8**: 🚧 **IN PROGRESS**
**Next Steps**:
1. ✅ Design document created (this file)
2. 🚧 Implement Step 1 (MINIMAL mode)
3. Measure & iterate through Steps 2-7
---
**Ready to start implementation!** 🚀