hakmem Benchmark Design - jemalloc/mimalloc Comparison

Purpose: Compare hakmem against industry-standard allocators for paper evaluation

Date: 2025-10-21
Status: Phase 5 Implementation


🎯 Benchmark Goals

Per Gemini's S+ review:

"jemalloc/mimalloc比較がないと → Best Paper Awardは無理"

Key Requirements:

  1. Fair comparison (same workload, same environment)
  2. Multiple allocators: hakmem (baseline/evolving), jemalloc, mimalloc, system malloc
  3. KPI measurement: P99 latency, page faults, RSS, throughput
  4. Statistical significance: multiple runs, warm-up, median/percentiles
  5. Paper-ready output: CSV format for graphs/tables

📊 Workload Scenarios

Using existing test scenarios from test_hakmem.c:

Scenario 1: JSON Parsing (small, frequent)

  • Size: 64KB
  • Iterations: 1000
  • Pattern: Allocate → Use → Free (tight loop)

Scenario 2: MIR Build (medium, moderate)

  • Size: 256KB
  • Iterations: 100
  • Pattern: Allocate → Use → Free (moderate)

Scenario 3: VM Execution (large, infrequent)

  • Size: 2MB
  • Iterations: 10
  • Pattern: Allocate → Use → Free (infrequent)

Scenario 4: Mixed (realistic)

  • All three patterns mixed
  • Simulates real compiler workload
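
Scenarios 1-3 share the same allocate → use → free skeleton and differ only in block size and iteration count; Scenario 4 interleaves all three. A minimal sketch of that loop, using the alloc_fn_t / free_fn_t abstraction described under Benchmark Program Design below (run_scenario and the memset-based "use" step are placeholders, not the final benchmark code):

#include <stdlib.h>   // abort
#include <string.h>   // memset

// Same typedefs as in bench_allocators.c (see Benchmark Program Design)
typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

// Hypothetical sketch of one scenario loop (e.g. JSON: 64KB x 1000)
static void run_scenario(alloc_fn_t alloc, free_fn_t dealloc,
                         size_t size, int iterations) {
    for (int i = 0; i < iterations; i++) {
        void* p = alloc(size);     // allocation under test
        if (!p) abort();           // fail fast on OOM
        memset(p, 0xAB, size);     // "use" step: touch every byte
        dealloc(p, size);          // sized free, matching free_fn_t
    }
}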

🔬 Allocator Configurations

1. hakmem-baseline

  • HAKMEM_MODE not set
  • Fixed policy (256KB threshold)
  • Baseline for comparison

2. hakmem-evolving

  • HAKMEM_MODE=evolving
  • UCB1 enabled
  • Adaptive learning

3. jemalloc

  • LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
  • Industry standard (Firefox, Redis)

4. mimalloc

  • LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2
  • Microsoft allocator

5. system malloc (glibc)

  • No LD_PRELOAD
  • Default libc allocator
  • Control baseline
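
jemalloc, mimalloc, and glibc can all be exercised through plain malloc/free, with LD_PRELOAD deciding which implementation actually services the calls; only the hakmem configurations need their own entry points. A hedged sketch of the wrappers (hakmem's entry-point names are not specified here and would be wired in separately):

#include <stdlib.h>

// Same typedefs as in bench_allocators.c (see Benchmark Program Design)
typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

// jemalloc / mimalloc / system all go through malloc/free;
// LD_PRELOAD selects the implementation at load time.
static void* preload_alloc(size_t n)         { return malloc(n); }
static void  preload_free(void* p, size_t n) { (void)n; free(p); }

// hakmem-baseline vs hakmem-evolving differ only in the HAKMEM_MODE
// environment variable; hakmem's own allocation entry points (names
// not confirmed here) plug into the same alloc_fn_t / free_fn_t slots.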

📈 KPI Metrics

Primary Metrics (for paper)

  1. P99 Latency: 99th percentile allocation latency (ns)
  2. Page Faults: Hard page faults (I/O required)
  3. RSS Peak: Maximum resident set size (MB)

Secondary Metrics

  1. Throughput: Allocations per second
  2. P50/P95 Latency: Additional percentiles
  3. Soft Page Faults: Minor faults (no I/O)

🏗️ Implementation Plan

Phase 5.1: Benchmark Infrastructure (current phase)

  • Design document (this file)
  • bench_allocators.c - Main benchmark program
  • bench_runner.sh - Shell script to run all allocators
  • CSV output format

Phase 5.2: Statistical Analysis

  • Multiple runs (10-50 iterations)
  • Warm-up phase (discard first 3 runs)
  • Median/percentile calculation
  • Confidence intervals
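
For the percentile calculation, one simple option is the nearest-rank method over the recorded per-allocation latencies. A minimal sketch (function name is illustrative; n must be greater than zero):

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void* a, const void* b) {
    uint64_t x = *(const uint64_t*)a, y = *(const uint64_t*)b;
    return (x > y) - (x < y);
}

// Nearest-rank percentile over n latency samples (ns); sorts in place.
static uint64_t percentile_ns(uint64_t* samples, size_t n, double p) {
    qsort(samples, n, sizeof(uint64_t), cmp_u64);
    size_t rank = (size_t)ceil((p / 100.0) * (double)n);
    if (rank == 0) rank = 1;
    if (rank > n)  rank = n;
    return samples[rank - 1];   // e.g. p=99 gives the P99 latency
}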

Phase 5.3: Visualization

  • Python/gnuplot scripts for graphs
  • LaTeX tables for paper

🔧 Benchmark Program Design

bench_allocators.c Structure

// Allocator abstraction layer
typedef void* (*alloc_fn_t)(size_t);
typedef void (*free_fn_t)(void*, size_t);

// Benchmark a single scenario
typedef struct {
    const char* name;
    void (*run)(alloc_fn_t, free_fn_t, int iterations);
} benchmark_t;

// Scenarios
void bench_json(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mir(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_vm(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mixed(alloc_fn_t alloc, free_fn_t free, int iters);

// KPI measurement
void measure_start(void);
void measure_end(bench_result_t* out);
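
One way measure_start/measure_end could be realized is as deltas of getrusage() counters plus a monotonic clock. The bench_result_t fields below are hypothetical and would need to match whatever the real struct defines:

#include <sys/resource.h>
#include <time.h>

typedef struct {
    long   soft_pf;        // minor page faults during the run (ru_minflt delta)
    long   hard_pf;        // major page faults during the run (ru_majflt delta)
    long   rss_peak_kb;    // peak RSS (ru_maxrss, kilobytes on Linux)
    double elapsed_ns;     // wall-clock time for the run
} bench_result_t;

static struct rusage   g_ru0;
static struct timespec g_t0;

void measure_start(void) {
    getrusage(RUSAGE_SELF, &g_ru0);
    clock_gettime(CLOCK_MONOTONIC, &g_t0);
}

void measure_end(bench_result_t* out) {
    struct rusage ru1; struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &ru1);
    out->soft_pf     = ru1.ru_minflt - g_ru0.ru_minflt;
    out->hard_pf     = ru1.ru_majflt - g_ru0.ru_majflt;
    out->rss_peak_kb = ru1.ru_maxrss;   // peak since process start
    out->elapsed_ns  = (t1.tv_sec  - g_t0.tv_sec)  * 1e9
                     + (t1.tv_nsec - g_t0.tv_nsec);
}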

Output Format (CSV)

allocator,scenario,iterations,p50_ns,p95_ns,p99_ns,soft_pf,hard_pf,rss_mb,throughput
hakmem-baseline,json,1000,42,68,89,1234,0,5.2,23809
hakmem-evolving,json,1000,38,62,81,1150,0,4.8,26315
jemalloc,json,1000,45,72,95,1400,0,6.1,22222
mimalloc,json,1000,40,65,85,1280,0,5.5,25000
system,json,1000,55,90,120,1800,2,7.8,18181
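
Each (allocator, scenario) run appends one row in the layout above. A minimal sketch of the writer (the field list follows the hypothetical bench_result_t sketched earlier; percentile values come from the recorded latency samples):

#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: emit one result row matching the CSV header above.
static void write_csv_row(FILE* f, const char* allocator, const char* scenario,
                          int iterations, uint64_t p50, uint64_t p95, uint64_t p99,
                          long soft_pf, long hard_pf, double rss_mb, double ops_per_s) {
    fprintf(f, "%s,%s,%d,%llu,%llu,%llu,%ld,%ld,%.1f,%.0f\n",
            allocator, scenario, iterations,
            (unsigned long long)p50, (unsigned long long)p95, (unsigned long long)p99,
            soft_pf, hard_pf, rss_mb, ops_per_s);
}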

🧪 Execution Plan

Step 1: Build

# hakmem versions
make clean && make  # hakmem with UCB1

# System allocators (install if not already present)
apt-get install libjemalloc2 libmimalloc2.0

Step 2: Run Benchmarks

# Run all allocators, all scenarios
bash bench_runner.sh --warmup 3 --runs 10 --output results.csv

# Individual runs (for debugging)
./bench_allocators --allocator hakmem-baseline --scenario json
LD_PRELOAD=libjemalloc.so.2 ./bench_allocators --allocator jemalloc --scenario json

Step 3: Analyze

# Generate graphs
python3 analyze_results.py results.csv --output graphs/

# Generate LaTeX table
python3 generate_latex_table.py results.csv --output paper_table.tex

📋 Gemini's Critical Requirements

Must Have (for Best Paper)

  1. jemalloc comparison - Industry standard
  2. mimalloc comparison - State-of-the-art
  3. Fair benchmarking - Same workload, multiple runs
  4. Statistical significance - Warm-up, median, confidence intervals

🎯 Should Have (for generality)

  1. Redis/Nginx benchmarks - Real-world workloads
  2. Confusion Matrix - Auto-inference accuracy

💡 Nice to Have

  1. Multi-threaded benchmarks - Scalability
  2. Memory fragmentation - Long-running tests

🚀 Next Steps

  1. Implement bench_allocators.c ⬅️
  2. Implement bench_runner.sh
  3. Run initial benchmarks (10 runs)
  4. Analyze results
  5. Create graphs for paper

📝 Notes

  • Fair comparison: Use same scenarios for all allocators
  • Statistical rigor: Multiple runs, discard outliers
  • Paper-ready: CSV → graphs/tables directly
  • Reproducible: Document exact versions, environment

Related: Gemini's S+ review in chatgpt-advanced-proposals.md