Phase 6.8: Configuration Cleanup & Mode-based Architecture

Date: 2025-10-21 Status: 🚧 IN PROGRESS


🎯 Goal

Problem: the current hakmem has too many environment variables to manage

  • HAKMEM_FREE_POLICY, HAKMEM_THP, HAKMEM_EVO_POLICY, etc.
  • The combinations are complex, and invalid configurations can cause bugs
  • Benchmark comparisons are difficult (which settings are being compared?)

Solution: consolidate into 5 preset modes

  • A single HAKMEM_MODE=balanced selects an appropriate configuration
  • Each feature's impact can be measured incrementally
  • Easy to explain in the paper

📊 5 Modes Definition

Mode Overview

| Mode | Use Case | Target Audience | Performance Goal |
|------|----------|-----------------|------------------|
| MINIMAL | Baseline measurement | Benchmark comparison | ~system malloc |
| FAST | Production (speed-first) | Production use | mimalloc +20% |
| BALANCED | Recommended default | General use | mimalloc +40% |
| LEARNING | Learning phase | Development | mimalloc +60% |
| RESEARCH | Development & debugging | Research | N/A (all features ON) |

Feature Matrix

| Feature | MINIMAL | FAST | BALANCED | LEARNING | RESEARCH |
|---------|---------|------|----------|----------|----------|
| ELO learning | ❌ | FROZEN | FROZEN | LEARN | LEARN |
| BigCache | ❌ | ✅ | ✅ | ✅ | ✅ |
| Batch madvise | ❌ | ✅ | ✅ | ✅ | ✅ |
| TinyPool (future) | ❌ | ✅ | ❌ | ❌ | ❌ |
| Free policy | batch | adaptive | adaptive | adaptive | adaptive |
| THP | off | auto | auto | auto | on |
| Evolution lifecycle | - | FROZEN | FROZEN | LEARN→FROZEN | LEARN |
| Debug logging | ❌ | ❌ | ❌ | ⚠️ minimal | ✅ verbose |

🔧 Implementation Plan

Step 0: Baseline Measurement (Already done in Phase 6.6-6.7)

Current state:

  • hakmem-evolving: 37,602 ns (VM scenario, 2MB)
  • mimalloc: 19,964 ns (+88.3% gap)
  • All features ON (uncontrolled)

Step 1: MINIMAL Mode 🎯 (P0 - Foundation)

Goal: Create baseline with all features OFF

Implementation:

```c
// hakmem_config.h
typedef enum {
    HAKMEM_MODE_MINIMAL = 0,
    HAKMEM_MODE_FAST,
    HAKMEM_MODE_BALANCED,
    HAKMEM_MODE_LEARNING,
    HAKMEM_MODE_RESEARCH,
} HakemMode;

typedef struct {
    HakemMode mode;

    // Feature flags
    int enable_elo;
    int enable_bigcache;
    int enable_batch;
    int enable_pool;  // future (Step 5)

    // Policies (FreePolicy/THPPolicy are defined elsewhere in hakmem)
    FreePolicy free_policy;
    THPPolicy thp_policy;
    const char* evo_phase;  // "frozen", "learn", "canary"

    // Debug
    int debug_logging;
} HakemConfig;

extern HakemConfig g_hakem_config;
void hak_config_init(void);
```

Changes:

  • hakmem_config.h/c: New files
  • hakmem.c: Call hak_config_init() in hak_init()
  • All modules: Check g_hakem_config flags before enabling features

Benchmark:

```sh
HAKMEM_MODE=minimal ./bench_allocators --allocator hakmem-evolving --scenario vm --iterations 100
```

Expected:

  • Performance: ~40,000-50,000 ns (slower than current, no optimizations)
  • Serves as baseline for feature comparison

Estimated time: 1 day


Step 2: Enable BigCache 🎯 (P0 - Tier-2 Cache)

Goal: Measure BigCache impact in isolation

Implementation:

  • MINIMAL + BigCache ON
  • Keep ELO/Batch/THP OFF

Benchmark:

```sh
# Baseline: MINIMAL mode
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10

# Then temporarily force BigCache on in hakmem.c:
#   g_hakem_config.enable_bigcache = 1;
HAKMEM_MODE=minimal ./bench_runner.sh --warmup 2 --runs 10
```

Expected:

  • VM scenario hit rate: 99%+
  • Performance: ~5,000 ns improvement (cache hits avoid mmap)
  • Target: 35,000-40,000 ns

Measurement:

  • BigCache hit rate
  • mmap syscall count (should drop)
  • Performance delta

Estimated time: 0.5 day


Step 3: Enable Batch madvise 🎯 (P1 - TLB Optimization)

Goal: Measure batch madvise impact

Implementation:

  • MINIMAL + BigCache + Batch ON
  • Keep ELO/THP OFF

Benchmark:

```sh
# Previous: MINIMAL + BigCache
# New: MINIMAL + BigCache + Batch
./bench_runner.sh --warmup 2 --runs 10
```

Expected:

  • Batch flush operations: 1-10 per run
  • Performance: ~500-1,000 ns improvement (TLB optimization)
  • Target: 34,000-39,000 ns

Measurement:

  • Batch statistics (blocks added, flush count)
  • madvise syscall count
  • Performance delta

Estimated time: 0.5 day


Step 4: Enable ELO (FROZEN) 🎯 (P1 - Strategy Selection)

Goal: Measure ELO overhead in FROZEN mode (no learning)

Implementation:

  • BALANCED mode = MINIMAL + BigCache + Batch + ELO(FROZEN)

Benchmark:

```sh
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 2 --runs 10
```

Expected:

  • ELO overhead: ~100-200 ns (strategy selection per allocation)
  • Performance: +100-200 ns regression (acceptable for adaptability)
  • Target: 34,500-39,500 ns

Measurement:

  • ELO selection overhead
  • Strategy distribution
  • Performance delta

Estimated time: 0.5 day


Step 5: TinyPool Implementation (FAST mode) 🚀 (P2 - Fast Path)

Goal: Implement pool-based fast path (ChatGPT Pro proposal)

Implementation:

  • FAST mode = BALANCED + TinyPool
  • 7 size classes: 16/32/64/128/256/512/1024B
  • Per-thread free lists
  • class×shard O(1) mapping

Code sketch:

```c
// hakmem_pool.h
typedef struct Node { struct Node* next; } Node;
typedef struct { Node* head; uint32_t cnt; } FreeList;

#define SHARDS 64
#define CLASSES 7  // 16B to 1024B

typedef struct {
    FreeList list[SHARDS];
} ClassPools;

_Thread_local ClassPools tls_pools[CLASSES];

// Fast path (O(1))
void* hak_alloc_small(size_t sz, void* pc);
void hak_free_small(void* p, void* pc);
```

Benchmark:

```sh
# Baseline: BALANCED mode
HAKMEM_MODE=balanced ./bench_runner.sh --warmup 10 --runs 50

# New: FAST mode
HAKMEM_MODE=fast ./bench_runner.sh --warmup 10 --runs 50
```

Expected:

  • Small allocations (≤1KB): 9-15 ns fast path
  • VM scenario (2MB): No change (pool not used for large allocations)
  • Need new benchmark: tiny-hot (16/32/64B allocations)

Measurement:

  • Pool hit rate
  • Fast path latency (perf profiling)
  • Comparison with mimalloc on tiny-hot

Estimated time: 2-3 weeks (MVP: 2 weeks, MT support: +1 week)


Step 6: ELO LEARNING mode 🎯 (P2 - Adaptive Learning)

Goal: Measure learning overhead and convergence

Implementation:

  • LEARNING mode = BALANCED + ELO(LEARN→FROZEN)

Benchmark:

```sh
HAKMEM_MODE=learning ./bench_runner.sh --warmup 100 --runs 100
```

Expected:

  • LEARN phase: +200-500 ns overhead (ELO selection + recording)
  • Convergence: 1024-2048 allocations → FROZEN
  • FROZEN phase: Same as BALANCED mode
  • Overall: +50-100 ns average (amortized)

Measurement:

  • ELO rating convergence
  • Phase transitions (LEARN → FROZEN → CANARY)
  • Learning overhead vs benefit

Estimated time: 1 day


Step 7: RESEARCH mode (All features) 🎯 (P3 - Development)

Goal: Enable all features + debug logging

Implementation:

  • RESEARCH mode = LEARNING + THP(ON) + Debug logging

Use case:

  • Development & debugging only
  • Not for benchmarking (too slow)

Estimated time: 0.5 day


📈 Benchmark Plan

Comparison Matrix

| Scenario | MINIMAL | +BigCache | +Batch | BALANCED | FAST | LEARNING |
|----------|---------|-----------|--------|----------|------|----------|
| VM (2MB) | 45,000 | 40,000 | 39,000 | 39,500 | 39,500 | 39,600 |
| tiny-hot | 50 | 50 | 50 | 50 | 12 | 52 |
| cold-churn | TBD | TBD | TBD | TBD | TBD | TBD |
| json-parse | TBD | TBD | TBD | TBD | TBD | TBD |

Note: numbers are estimated median latencies in ns; actual results are TBD

Metrics to Collect

For each mode:

  • Performance: Median latency (ns)
  • Syscalls: mmap/munmap/madvise counts
  • Page faults: soft/hard counts
  • Memory: RSS delta
  • Cache: Hit rates (BigCache, Pool)

Benchmark Script

```bash
#!/bin/bash
# bench_modes.sh - Compare all modes

MODES="minimal balanced fast learning"
SCENARIOS="vm cold-churn json-parse"

for mode in $MODES; do
    for scenario in $SCENARIOS; do
        echo "=== Mode: $mode, Scenario: $scenario ==="
        HAKMEM_MODE=$mode ./bench_runner.sh \
            --allocator hakmem-evolving \
            --scenario $scenario \
            --warmup 10 --runs 50 \
            --output results_${mode}_${scenario}.csv
    done
done

# Aggregate results
python3 analyze_modes.py results_*.csv
```

🎯 Success Metrics

Step 1-4 (MINIMAL → BALANCED)

  • Each feature's impact is measurable
  • Performance regression < 10% per feature
  • Total BALANCED overhead: +40-60% vs mimalloc

Step 5 (FAST mode with TinyPool)

  • tiny-hot benchmark: mimalloc +20% or better
  • VM scenario: No regression vs BALANCED
  • Pool hit rate: 90%+ for small allocations

Step 6 (LEARNING mode)

  • Convergence within 2048 allocations
  • Learning overhead amortized to < 5%
  • FROZEN performance = BALANCED

📝 Migration Plan (Backward Compatibility)

Environment Variable Priority

```c
// 1. HAKMEM_MODE has highest priority
const char* mode_env = getenv("HAKMEM_MODE");
if (mode_env) {
    hak_config_apply_mode(mode_env);  // Apply preset
} else {
    // 2. Fall back to individual settings (legacy)
    const char* free_policy = getenv("HAKMEM_FREE_POLICY");
    const char* thp = getenv("HAKMEM_THP");
    // ... etc
}

// 3. Individual settings can override mode
// Example: HAKMEM_MODE=balanced HAKMEM_THP=off
//   → Use BALANCED preset, but force THP=off
```

Deprecation Timeline

  • Phase 6.8: Both HAKMEM_MODE and individual env vars supported
  • Phase 7: Prefer HAKMEM_MODE, warn if individual vars used
  • Phase 8: Deprecate individual vars (only HAKMEM_MODE)

🚀 Implementation Timeline

| Step | Task | Time | Cumulative | Status |
|------|------|------|------------|--------|
| 0 | Baseline (done) | - | - | ✅ |
| 1 | MINIMAL mode | 1 day | 1 day | 🚧 |
| 2 | +BigCache | 0.5 day | 1.5 days | |
| 3 | +Batch | 0.5 day | 2 days | |
| 4 | BALANCED (ELO FROZEN) | 0.5 day | 2.5 days | |
| 5 | FAST (TinyPool MVP) | 2-3 weeks | 3.5-4.5 weeks | |
| 6 | LEARNING mode | 1 day | 3.6-4.6 weeks | |
| 7 | RESEARCH mode | 0.5 day | 3.65-4.65 weeks | |

Total: 3.7-4.7 weeks (MVP: 2.5 days, Full: 4-5 weeks)


📚 Documentation Updates

README.md

Add section:

```markdown
## 🎯 Quick Start: Choosing a Mode

- **Development**: `HAKMEM_MODE=learning` (adaptive, slow)
- **Production**: `HAKMEM_MODE=fast` (mimalloc +20%)
- **General**: `HAKMEM_MODE=balanced` (default, mimalloc +40%)
- **Benchmarking**: `HAKMEM_MODE=minimal` (baseline)
- **Research**: `HAKMEM_MODE=research` (all features + debug)
```

New Files

  • PHASE_6.8_CONFIG_CLEANUP.md (this file)
  • apps/experiments/hakmem-poc/hakmem_config.h
  • apps/experiments/hakmem-poc/hakmem_config.c
  • apps/experiments/hakmem-poc/bench_modes.sh
  • apps/experiments/hakmem-poc/analyze_modes.py

🎓 Expected Outcomes

For Paper

Before Phase 6.8:

  • "hakmem is +88% slower than mimalloc"
  • ⚠️ Complex configuration, hard to reproduce
  • ⚠️ Unclear which features contribute to overhead

After Phase 6.8:

  • "BALANCED mode: +40% overhead for adaptive learning"
  • "FAST mode: +20% overhead, competitive with production allocators"
  • "Each feature's impact clearly measured"
  • "5 simple modes, easy to reproduce"

For Future Work

  • Step 5 (TinyPool) can become Phase 7 if successful
  • ChatGPT Pro's hybrid architecture validated
  • Clear path to mimalloc-level performance

🏆 Final Status

Phase 6.8: 🚧 IN PROGRESS

Next Steps:

  1. Design document created (this file)
  2. 🚧 Implement Step 1 (MINIMAL mode)
  3. Measure & iterate through Steps 2-7

Ready to start implementation! 🚀