# Phase 6.8: Configuration Cleanup & Mode-based Architecture

**Date**: 2025-10-21
**Status**: 🚧 **IN PROGRESS**

---

## 🎯 Goal

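**Problem**: hakmem currently has too many environment variables to manage
- `HAKMEM_FREE_POLICY`, `HAKMEM_THP`, `HAKMEM_EVO_POLICY`, etc.
- The combinations are complex, and an invalid configuration can cause bugs
- Benchmark comparison is difficult (which settings should be compared?)

**Solution**: Consolidate into **5 preset modes**
- A simple `HAKMEM_MODE=balanced` selects appropriate settings
- The effect of each feature can be measured incrementally
- Easy to explain in the paper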

---

## 📊 5 Modes Definition

### **Mode Overview**

| Mode | Use Case | Target Audience | Performance Goal |
|------|----------|-----------------|------------------|
| **MINIMAL** | Baseline measurement | Benchmark comparison | On par with system malloc |
| **FAST** | Production (speed first) | Production use | mimalloc +20% |
| **BALANCED** | Recommended default | General use | mimalloc +40% |
| **LEARNING** | Learning phase | Development | mimalloc +60% |
| **RESEARCH** | Development and debugging | Research | N/A (all features ON) |

### **Feature Matrix**

| Feature | MINIMAL | FAST | BALANCED | LEARNING | RESEARCH |
|---------|---------|------|----------|----------|----------|
| **ELO learning** | ❌ | ✅ FROZEN | ✅ FROZEN | ✅ LEARN | ✅ LEARN |
| **BigCache** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **Batch madvise** | ❌ | ✅ | ✅ | ✅ | ✅ |
| **TinyPool (future)** | ❌ | ✅ | ❌ | ❌ | ❌ |
| **Free policy** | batch | adaptive | adaptive | adaptive | adaptive |
| **THP** | off | auto | auto | auto | on |
| **Evolution lifecycle** | - | FROZEN | FROZEN | LEARN→FROZEN | LEARN |
| **Debug logging** | ❌ | ❌ | ❌ | ⚠️ minimal | ✅ verbose |

---

## 🔧 Implementation Plan

### **Step 0: Baseline Measurement** ✅ (Already done in Phase 6.6-6.7)

Current state:
- hakmem-evolving: 37,602 ns (VM scenario, 2MB)
- mimalloc: 19,964 ns (hakmem-evolving is 88.3% slower)
- All features ON (uncontrolled)
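
The gap figure can be reproduced from the two latencies. The helper below is a minimal sketch: `overhead_pct` and `target_ns` are hypothetical names, not part of hakmem. It also shows what a goal such as "mimalloc +40%" would mean in nanoseconds if it were applied to this VM baseline, which is an assumption about how the goals are measured.

```c
#include <assert.h>

/* Relative overhead of `measured_ns` over `baseline_ns`, in percent. */
double overhead_pct(double measured_ns, double baseline_ns) {
    assert(baseline_ns > 0.0);
    return (measured_ns - baseline_ns) / baseline_ns * 100.0;
}

/* Latency that corresponds to "baseline + pct%". */
double target_ns(double baseline_ns, double pct) {
    assert(baseline_ns > 0.0);
    return baseline_ns * (1.0 + pct / 100.0);
}
```

With the Step 0 numbers, `overhead_pct(37602, 19964)` is about 88.3, and `target_ns(19964, 40)` is about 27,950 ns.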

### **Step 1: MINIMAL Mode** 🎯 (P0 - Foundation)

**Goal**: Create baseline with all features OFF

**Implementation**:
```c
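// hakmem_config.h
#ifndef HAKMEM_CONFIG_H
#define HAKMEM_CONFIG_H

// FreePolicy and THPPolicy are assumed to be declared by existing hakmem
// headers; include those before this point.

// Preset modes selected via HAKMEM_MODE
typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

typedef struct {
    HakemMode mode;

    // Feature flags (0 = off, non-zero = on)
    int enable_elo;
    int enable_bigcache;
    int enable_batch;
    int enable_pool;  // future (Step 5)

    // Policies
    FreePolicy free_policy;
    THPPolicy thp_policy;
    const char* evo_phase;  // "frozen", "learn", "canary"

    // Debug
    int debug_logging;
} HakemConfig;

// Global configuration, defined in hakmem_config.c
extern HakemConfig g_hakem_config;

// Fills g_hakem_config; called from hak_init()
void hak_config_init(void);

#endif  // HAKMEM_CONFIG_H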
```

**Changes**:
- `hakmem_config.h/c`: New files
- `hakmem.c`: Call `hak_config_init()` in `hak_init()`
- All modules: Check `g_hakem_config` flags before enabling features
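
One way `hak_config_init()` could fill the struct is to copy from a static preset table. The following is a minimal sketch, not the actual hakmem code: the `FreePolicy`/`THPPolicy` enumerators, the `HAKMEM_MODE_COUNT` sentinel, the `k_presets` table, `hak_config_preset()`, and the 0/1/2 encoding of `debug_logging` are all assumptions made for illustration. Flag values follow the step definitions in this plan (BALANCED = MINIMAL + BigCache + Batch + ELO(FROZEN), FAST = BALANCED + TinyPool).

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
    HAKMEM_MODE_COUNT  /* hypothetical sentinel, not in the planned header */
} HakemMode;

/* Hypothetical enumerators; the real FreePolicy/THPPolicy types may differ. */
typedef enum { FREE_POLICY_BATCH, FREE_POLICY_ADAPTIVE } FreePolicy;
typedef enum { THP_OFF, THP_AUTO, THP_ON } THPPolicy;

typedef struct {
    HakemMode mode;
    int enable_elo;
    int enable_bigcache;
    int enable_batch;
    int enable_pool;
    FreePolicy free_policy;
    THPPolicy thp_policy;
    const char* evo_phase;  /* NULL when the evolution lifecycle is unused */
    int debug_logging;      /* assumed encoding: 0 = off, 1 = minimal, 2 = verbose */
} HakemConfig;

/* Column order: mode, elo, bigcache, batch, pool, free policy, THP, phase, debug */
static const HakemConfig k_presets[HAKMEM_MODE_COUNT] = {
    [HAKMEM_MODE_MINIMAL]  = { HAKMEM_MODE_MINIMAL,  0, 0, 0, 0, FREE_POLICY_BATCH,    THP_OFF,  NULL,     0 },
    [HAKMEM_MODE_FAST]     = { HAKMEM_MODE_FAST,     1, 1, 1, 1, FREE_POLICY_ADAPTIVE, THP_AUTO, "frozen", 0 },
    [HAKMEM_MODE_BALANCED] = { HAKMEM_MODE_BALANCED, 1, 1, 1, 0, FREE_POLICY_ADAPTIVE, THP_AUTO, "frozen", 0 },
    [HAKMEM_MODE_LEARNING] = { HAKMEM_MODE_LEARNING, 1, 1, 1, 0, FREE_POLICY_ADAPTIVE, THP_AUTO, "learn",  1 },
    [HAKMEM_MODE_RESEARCH] = { HAKMEM_MODE_RESEARCH, 1, 1, 1, 0, FREE_POLICY_ADAPTIVE, THP_ON,   "learn",  2 },
};

/* Returns a copy of the preset for `mode`. */
HakemConfig hak_config_preset(HakemMode mode) {
    assert((unsigned)mode < HAKMEM_MODE_COUNT);
    return k_presets[mode];
}
```

Keeping the presets in one table means each module only reads flags such as `enable_bigcache` and never needs to know which mode is active.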

**Benchmark**:
```bash
HAKMEM_MODE=minimal ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
```

**Expected**:
- Performance: ~40,000-50,000 ns (slower than current, no optimizations)
- Serves as baseline for feature comparison

**Estimated time**: 1 day

---

### **Step 2: Enable BigCache** 🎯 (P0 - Tier-2 Cache)

**Goal**: Measure BigCache impact in isolation

**Implementation**:
- MINIMAL + BigCache ON
- Keep ELO/Batch/THP OFF

**Benchmark**:
```bash
# Baseline: MINIMAL preset (BigCache OFF)
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
# Then set g_hakem_config.enable_bigcache = 1 in hakmem.c, rebuild, and rerun:
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- VM scenario hit rate: 99%+
- Performance: about 5,000 ns faster (cache hits avoid mmap)
- Target: 35,000-40,000 ns

**Measurement**:
- BigCache hit rate
- mmap syscall count (should drop)
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 3: Enable Batch madvise** 🎯 (P1 - TLB Optimization)

**Goal**: Measure batch madvise impact

**Implementation**:
- MINIMAL + BigCache + Batch ON
- Keep ELO/THP OFF

**Benchmark**:
```bash
# Previous: MINIMAL + BigCache
# New: MINIMAL + BigCache + Batch
./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- Batch flush operations: 1-10 per run
- Performance: 500-1,000 ns faster (TLB optimization)
- Target: 34,000-39,000 ns

**Measurement**:
- Batch statistics (blocks added, flush count)
- madvise syscall count
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 4: Enable ELO (FROZEN)** 🎯 (P1 - Strategy Selection)

**Goal**: Measure ELO overhead in FROZEN mode (no learning)

**Implementation**:
- BALANCED mode = MINIMAL + BigCache + Batch + ELO(FROZEN)

**Benchmark**:
```bash
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 2 --runs 10
```

**Expected**:
- ELO overhead: ~100-200 ns (strategy selection per allocation)
- Performance: +100-200 ns regression (acceptable for adaptability)
- Target: 34,500-39,500 ns

**Measurement**:
- ELO selection overhead
- Strategy distribution
- Performance delta

**Estimated time**: 0.5 day

---

### **Step 5: TinyPool Implementation (FAST mode)** 🚀 (P2 - Fast Path)

**Goal**: Implement pool-based fast path (ChatGPT Pro proposal)

**Implementation**:
- FAST mode = BALANCED + TinyPool
- 7 size classes: 16/32/64/128/256/512/1024B
- Per-thread free lists
- class×shard O(1) mapping

**Code sketch**:
```c
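// hakmem_pool.h
#include <stddef.h>  // size_t
#include <stdint.h>  // uint32_t

// Intrusive singly linked free list: a free block stores the link in its own memory
typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

#define SHARDS 64
#define CLASSES 7  // 16B to 1024B

// One free list per shard for a single size class
typedef struct {
    FreeList list[SHARDS];
} ClassPools;

// Per-thread pools, indexed by size class.
// Declared extern here so the header can be included from several files;
// the single definition goes in the matching .c file.
extern _Thread_local ClassPools tls_pools[CLASSES];

// Fast path (O(1))
void* hak_alloc_small(size_t sz, void* pc);
void hak_free_small(void* p, void* pc);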
```
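
The two building blocks behind the fast path can be sketched as follows. This is an illustration under assumptions, not the planned implementation: `size_to_class`, `freelist_push`, and `freelist_pop` are hypothetical helper names, and shard selection is left out because the plan does not specify how a shard is chosen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_CLASS_SIZE 16u
#define MAX_CLASS_SIZE 1024u

typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

/* Map a request size to a class index 0..6 (16B..1024B), or -1 if the
   request is too large for the pool. The loop runs at most 6 times, so the
   cost is bounded by a constant. */
static inline int size_to_class(size_t sz) {
    if (sz > MAX_CLASS_SIZE) return -1;
    size_t cls_size = MIN_CLASS_SIZE;
    int cls = 0;
    while (cls_size < sz) {
        cls_size <<= 1;
        cls++;
    }
    return cls;
}

/* Push a free block onto the list; the block's first bytes hold the link. */
static inline void freelist_push(FreeList* fl, void* p) {
    assert(fl != NULL && p != NULL);
    Node* n = (Node*)p;
    n->next = fl->head;
    fl->head = n;
    fl->cnt++;
}

/* Pop the most recently freed block, or return NULL if the list is empty. */
static inline void* freelist_pop(FreeList* fl) {
    assert(fl != NULL);
    Node* n = fl->head;
    if (n == NULL) return NULL;
    fl->head = n->next;
    fl->cnt--;
    return n;
}
```

Because the smallest class is 16 bytes, every pooled block is large enough to hold the `next` pointer, so the free list needs no extra memory. The lists are per-thread, so push and pop need no locking in this sketch.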

**Benchmark**:
```bash
# Baseline: BALANCED mode
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 10 --runs 50

# New: FAST mode
HAKMEM_MODE=fast ./bench_runner.sh --warmup 10 --runs 50
```

**Expected**:
- Small allocations (≤1KB): 9-15 ns fast path
- VM scenario (2MB): No change (pool not used for large allocations)
- Need new benchmark: tiny-hot (16/32/64B allocations)

**Measurement**:
- Pool hit rate
- Fast path latency (perf profiling)
- Comparison with mimalloc on tiny-hot

**Estimated time**: 2-3 weeks (MVP: 2 weeks, MT support: +1 week)

---

### **Step 6: ELO LEARNING mode** 🎯 (P2 - Adaptive Learning)

**Goal**: Measure learning overhead and convergence

**Implementation**:
- LEARNING mode = BALANCED + ELO(LEARN→FROZEN)

**Benchmark**:
```bash
HAKMEM_MODE=learning ./bench_runner.sh --warmup 100 --runs 100
```

**Expected**:
- LEARN phase: +200-500 ns overhead (ELO selection + recording)
- Convergence: 1024-2048 allocations → FROZEN
- FROZEN phase: Same as BALANCED mode
- Overall: +50-100 ns average (amortized)
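
The amortized figure depends on how long the run is relative to the learning window. A minimal sketch of the arithmetic, assuming the learning cost is paid only for an initial window of allocations and is zero afterwards (a later CANARY re-entry is ignored); `amortized_overhead_ns` is a hypothetical helper, not a hakmem function:

```c
#include <assert.h>

/* Average per-allocation overhead over a whole run, given a per-allocation
   learning cost that applies only to the first `learn_allocs` allocations. */
double amortized_overhead_ns(double learn_overhead_ns,
                             unsigned long learn_allocs,
                             unsigned long total_allocs) {
    assert(total_allocs > 0);
    if (learn_allocs > total_allocs) learn_allocs = total_allocs;
    return learn_overhead_ns * (double)learn_allocs / (double)total_allocs;
}
```

For example, 500 ns of overhead over a 2,048-allocation window in a run of 10,240 allocations averages to 100 ns, and 200 ns over 1,024 allocations in a run of 4,096 averages to 50 ns.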

**Measurement**:
- ELO rating convergence
- Phase transitions (LEARN → FROZEN → CANARY)
- Learning overhead vs benefit

**Estimated time**: 1 day

---

### **Step 7: RESEARCH mode (All features)** 🎯 (P3 - Development)

**Goal**: Enable all features + debug logging

**Implementation**:
- RESEARCH mode = LEARNING + THP(ON) + Debug logging

**Use case**:
- Development & debugging only
- Not for benchmarking (too slow)

**Estimated time**: 0.5 day

---

## 📈 Benchmark Plan

### **Comparison Matrix**

| Scenario | MINIMAL | +BigCache | +Batch | BALANCED | FAST | LEARNING |
|----------|---------|-----------|--------|----------|------|----------|
| **VM (2MB)** | 45,000 | 40,000 | 39,000 | 39,500 | 39,500 | 39,600 |
| **tiny-hot** | 50 | 50 | 50 | 50 | **12** | 52 |
| **cold-churn** | TBD | TBD | TBD | TBD | TBD | TBD |
| **json-parse** | TBD | TBD | TBD | TBD | TBD | TBD |

**Note**: All values are estimated latencies in ns; actual results are TBD

### **Metrics to Collect**

For each mode:
- **Performance**: Median latency (ns)
- **Syscalls**: mmap/munmap/madvise counts
- **Page faults**: soft/hard counts
- **Memory**: RSS delta
- **Cache**: Hit rates (BigCache, Pool)
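
Median latency is the primary metric, so the analysis step needs a median over the per-run samples. A minimal sketch, assuming samples are collected as nanosecond counts in a `uint64_t` array; `median_ns` and `cmp_u64` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison for qsort that avoids overflow from subtraction. */
static int cmp_u64(const void* a, const void* b) {
    uint64_t x = *(const uint64_t*)a;
    uint64_t y = *(const uint64_t*)b;
    return (x > y) - (x < y);
}

/* Sorts `samples` in place and returns the median. For an even count,
   returns the mean of the two middle values. */
double median_ns(uint64_t* samples, size_t n) {
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    if (n % 2 == 1) return (double)samples[n / 2];
    return ((double)samples[n / 2 - 1] + (double)samples[n / 2]) / 2.0;
}
```

The median is less sensitive than the mean to a few slow runs caused by page faults or scheduler noise, which matters when the expected per-feature deltas are only a few hundred nanoseconds.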

### **Benchmark Script**

```bash
#!/bin/bash
# bench_modes.sh - Compare all modes
set -euo pipefail  # Stop on the first failed benchmark run

MODES="minimal balanced fast learning"
SCENARIOS="vm cold-churn json-parse"

for mode in $MODES; do
    for scenario in $SCENARIOS; do
        echo "=== Mode: $mode, Scenario: $scenario ==="
        HAKMEM_MODE="$mode" ./bench_runner.sh \
            --allocator hakmem-evolving \
            --scenario "$scenario" \
            --warmup 10 --runs 50 \
            --output "results_${mode}_${scenario}.csv"
    done
done

# Aggregate results
python3 analyze_modes.py results_*.csv
```

---

## 🎯 Success Metrics

### **Step 1-4 (MINIMAL → BALANCED)**

- ✅ Each feature's impact is measurable
- ✅ Performance regression < 10% per feature
- ✅ Total BALANCED overhead: +40-60% vs mimalloc

### **Step 5 (FAST mode with TinyPool)**

- ✅ tiny-hot benchmark: mimalloc +20% or better
- ✅ VM scenario: No regression vs BALANCED
- ✅ Pool hit rate: 90%+ for small allocations

### **Step 6 (LEARNING mode)**

- ✅ Convergence within 2048 allocations
- ✅ Learning overhead amortized to < 5%
- ✅ FROZEN performance = BALANCED

---

## 📝 Migration Plan (Backward Compatibility)

### **Environment Variable Priority**

```c
// 1. HAKMEM_MODE selects the base preset
const char* mode_env = getenv("HAKMEM_MODE");
if (mode_env) {
    hak_config_apply_mode(mode_env);  // Apply preset
}

// 2. Individual settings (legacy) are read whether or not HAKMEM_MODE is set.
//    Without HAKMEM_MODE they are the only configuration, as before.
const char* free_policy = getenv("HAKMEM_FREE_POLICY");
const char* thp = getenv("HAKMEM_THP");
// ... etc

// 3. Any individual setting that is present overrides the preset value
// Example: HAKMEM_MODE=balanced HAKMEM_THP=off
//   → Use BALANCED preset, but force THP=off
```
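
The precedence in the override example (preset first, then individual variables) can be sketched as a small resolver. This is an illustration under assumptions: the type and function names are hypothetical, only `HAKMEM_THP` is handled, the accepted spellings `off`/`auto`/`on` are taken from the feature matrix, and falling back to BALANCED when `HAKMEM_MODE` is unset or unrecognized is an assumed default.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the real hakmem types. */
typedef enum { MODE_MINIMAL, MODE_FAST, MODE_BALANCED, MODE_LEARNING, MODE_RESEARCH } Mode;
typedef enum { THP_OFF, THP_AUTO, THP_ON } Thp;
typedef struct { Mode mode; Thp thp; } Resolved;

/* Returns 1 and writes *out if `s` names a mode, otherwise returns 0. */
static int parse_mode(const char* s, Mode* out) {
    static const char* const names[] = {
        "minimal", "fast", "balanced", "learning", "research"
    };
    assert(out != NULL);
    if (s == NULL) return 0;
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(s, names[i]) == 0) { *out = (Mode)i; return 1; }
    }
    return 0;
}

/* Returns 1 and writes *out if `s` is "off", "auto", or "on", otherwise 0. */
static int parse_thp(const char* s, Thp* out) {
    assert(out != NULL);
    if (s == NULL) return 0;
    if (strcmp(s, "off") == 0)  { *out = THP_OFF;  return 1; }
    if (strcmp(s, "auto") == 0) { *out = THP_AUTO; return 1; }
    if (strcmp(s, "on") == 0)   { *out = THP_ON;   return 1; }
    return 0;
}

/* THP value of each preset, following the feature matrix. */
static Thp preset_thp(Mode m) {
    if (m == MODE_MINIMAL)  return THP_OFF;
    if (m == MODE_RESEARCH) return THP_ON;
    return THP_AUTO;
}

/* Preset first, then individual override. Missing or unknown values leave
   the previous value in place. */
Resolved resolve_config(const char* mode_env, const char* thp_env) {
    Resolved r;
    r.mode = MODE_BALANCED;            /* assumed default */
    (void)parse_mode(mode_env, &r.mode);
    r.thp = preset_thp(r.mode);
    (void)parse_thp(thp_env, &r.thp);  /* e.g. HAKMEM_THP=off beats the preset */
    return r;
}

/* Convenience wrapper that reads the real environment. */
Resolved resolve_config_from_env(void) {
    return resolve_config(getenv("HAKMEM_MODE"), getenv("HAKMEM_THP"));
}
```

Taking the two strings as parameters keeps the precedence logic testable without touching the process environment; `resolve_config("balanced", "off")` yields the BALANCED mode with THP forced off.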

### **Deprecation Timeline**

- **Phase 6.8**: Both HAKMEM_MODE and individual env vars supported
- **Phase 7**: Prefer HAKMEM_MODE, warn if individual vars used
- **Phase 8**: Remove individual vars (HAKMEM_MODE only)

---

## 🚀 Implementation Timeline

| Step | Task | Time | Cumulative | Status |
|------|------|------|------------|--------|
| 0 | Baseline (done) | - | - | ✅ |
| 1 | MINIMAL mode | 1 day | 1 day | 🚧 |
| 2 | +BigCache | 0.5 day | 1.5 days | ⏳ |
| 3 | +Batch | 0.5 day | 2 days | ⏳ |
| 4 | BALANCED (ELO FROZEN) | 0.5 day | 2.5 days | ⏳ |
| 5 | FAST (TinyPool MVP) | 2-3 weeks | 2.5-3.5 weeks | ⏳ |
| 6 | LEARNING mode | 1 day | 2.7-3.7 weeks | ⏳ |
| 7 | RESEARCH mode | 0.5 day | 2.8-3.8 weeks | ⏳ |

**Total**: 2.8-3.8 weeks, assuming 5 working days per week (Steps 1-4: 2.5 days; full plan: roughly 3-4 weeks)

---

## 📚 Documentation Updates

### **README.md**

Add section:
```markdown
## 🎯 Quick Start: Choosing a Mode

- **Development**: `HAKMEM_MODE=learning` (adaptive, slow)
- **Production**: `HAKMEM_MODE=fast` (mimalloc +20%)
- **General**: `HAKMEM_MODE=balanced` (default, mimalloc +40%)
- **Benchmarking**: `HAKMEM_MODE=minimal` (baseline)
- **Research**: `HAKMEM_MODE=research` (all features + debug)
```

### **New Files**

- `PHASE_6.8_CONFIG_CLEANUP.md` (this file)
- `apps/experiments/hakmem-poc/hakmem_config.h`
- `apps/experiments/hakmem-poc/hakmem_config.c`
- `apps/experiments/hakmem-poc/bench_modes.sh`
- `apps/experiments/hakmem-poc/analyze_modes.py`

---

## 🎓 Expected Outcomes

### **For Paper**

**Before Phase 6.8**:
- ❌ "hakmem is 88% slower than mimalloc"
- ⚠️ Complex configuration, hard to reproduce
- ⚠️ Unclear which features contribute to overhead

**After Phase 6.8**:
- ✅ "BALANCED mode: +40% overhead for adaptive learning"
- ✅ "FAST mode: +20% overhead, competitive with production allocators"
- ✅ "Each feature's impact clearly measured"
- ✅ "5 simple modes, easy to reproduce"

### **For Future Work**

- Step 5 (TinyPool) can become **Phase 7** if successful
- ChatGPT Pro's hybrid architecture validated
- Clear path to mimalloc-level performance

---

## 🏆 Final Status

**Phase 6.8**: 🚧 **IN PROGRESS**

**Next Steps**:
1. ✅ Design document created (this file)
2. 🚧 Implement Step 1 (MINIMAL mode)
3. ⏳ Measure & iterate through Steps 2-7

---

**Ready to start implementation!** 🚀