hakmem Benchmark Design - jemalloc/mimalloc Comparison
Purpose: Compare hakmem against industry-standard allocators for paper evaluation
Date: 2025-10-21
Status: Phase 5 Implementation
🎯 Benchmark Goals
Per Gemini's S+ review:
"jemalloc/mimalloc比較がないと → Best Paper Awardは無理"
Key Requirements:
- Fair comparison (same workload, same environment)
- Multiple allocators: hakmem (baseline/evolving), jemalloc, mimalloc, system malloc
- KPI measurement: P99 latency, page faults, RSS, throughput
- Statistical significance: multiple runs, warm-up, median/percentiles
- Paper-ready output: CSV format for graphs/tables
📊 Workload Scenarios
Using existing test scenarios from test_hakmem.c:
Scenario 1: JSON Parsing (small, frequent)
- Size: 64KB
- Iterations: 1000
- Pattern: Allocate → Use → Free (tight loop)
Scenario 2: MIR Build (medium, moderate)
- Size: 256KB
- Iterations: 100
- Pattern: Allocate → Use → Free (moderate)
Scenario 3: VM Execution (large, infrequent)
- Size: 2MB
- Iterations: 10
- Pattern: Allocate → Use → Free (infrequent)
Scenario 4: Mixed (realistic)
- All three patterns mixed
- Simulates real compiler workload
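As a concrete illustration, here is a minimal sketch of Scenario 1's tight loop, using the `alloc_fn_t`/`free_fn_t` abstraction defined later in this document. The `now_ns()` helper and the `g_lat_ns` sample buffer are assumptions for illustration, not the final code:

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void* (*alloc_fn_t)(size_t);   /* matches the abstraction defined below */
typedef void  (*free_fn_t)(void*, size_t);

static uint64_t g_lat_ns[1000];        /* hypothetical per-iteration sample buffer */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

void bench_json(alloc_fn_t alloc, free_fn_t dealloc, int iters)
{
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = now_ns();
        void *p = alloc(64 * 1024);    /* Scenario 1: 64KB */
        uint64_t t1 = now_ns();
        memset(p, 0xAB, 64 * 1024);    /* touch the block so page faults actually occur */
        dealloc(p, 64 * 1024);
        g_lat_ns[i] = t1 - t0;         /* record allocation latency only */
    }
}
```

Only the allocation itself is timed; the `memset` keeps the "Use" step honest so RSS and page-fault numbers reflect real memory traffic.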
🔬 Allocator Configurations
1. hakmem-baseline
- `HAKMEM_MODE` not set
- Fixed policy (256KB threshold)
- Baseline for comparison
2. hakmem-evolving
- `HAKMEM_MODE=evolving`
- UCB1 enabled
- Adaptive learning
3. jemalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2`
- Industry standard (Firefox, Redis)
4. mimalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2`
- Microsoft allocator
5. system malloc (glibc)
- No LD_PRELOAD
- Default libc allocator
- Control baseline
📈 KPI Metrics
Primary Metrics (for paper)
- P99 Latency: 99th percentile allocation latency (ns)
- Page Faults: Hard page faults (I/O required)
- RSS Peak: Maximum resident set size (MB)
Secondary Metrics
- Throughput: Allocations per second
- P50/P95 Latency: Additional percentiles
- Soft Page Faults: Minor faults (no I/O)
🏗️ Implementation Plan
Phase 5.1: Benchmark Infrastructure (this phase)
- Design document (this file)
- `bench_allocators.c` - Main benchmark program
- `bench_runner.sh` - Shell script to run all allocators
- CSV output format
Phase 5.2: Statistical Analysis
- Multiple runs (10-50 iterations)
- Warm-up phase (discard first 3 runs)
- Median/percentile calculation
- Confidence intervals
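A sketch of the median/percentile step using the nearest-rank method over the per-iteration latency samples (illustrative; the real analysis may live in the Phase 5.3 Python scripts instead):

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile: sort the samples, then take the ceil(p/100 * n)-th one. */
static uint64_t percentile_ns(uint64_t *samples, size_t n, double p)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t rank = (size_t)ceil(p / 100.0 * (double)n);
    if (rank < 1) rank = 1;
    return samples[rank - 1];
}
```

With this, `percentile_ns(lat, n, 50.0)` yields the median and `percentile_ns(lat, n, 99.0)` the P99 reported in the CSV.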
Phase 5.3: Visualization
- Python/gnuplot scripts for graphs
- LaTeX tables for paper
🔧 Benchmark Program Design
bench_allocators.c Structure
```c
// Allocator abstraction layer
typedef void* (*alloc_fn_t)(size_t);
typedef void (*free_fn_t)(void*, size_t);

// Benchmark a single scenario
typedef struct {
    const char* name;
    void (*run)(alloc_fn_t, free_fn_t, int iterations);
} benchmark_t;

// Scenarios
void bench_json(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mir(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_vm(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mixed(alloc_fn_t alloc, free_fn_t free, int iters);

// KPI measurement
void measure_start(void);
void measure_end(bench_result_t* out);
```
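One way `measure_start`/`measure_end` could collect the KPIs above is `getrusage(2)` plus `CLOCK_MONOTONIC`; the `bench_result_t` layout here is an illustrative assumption, not the final struct (note that on Linux `ru_maxrss` is reported in kilobytes):

```c
#include <stdint.h>
#include <sys/resource.h>
#include <time.h>

typedef struct {
    uint64_t wall_ns;     /* elapsed wall-clock time */
    long     soft_pf;     /* minor faults (no I/O) */
    long     hard_pf;     /* major faults (I/O required) */
    long     rss_peak_kb; /* peak resident set size, KB on Linux */
} bench_result_t;

static struct rusage   ru_begin;
static struct timespec ts_begin;

void measure_start(void)
{
    getrusage(RUSAGE_SELF, &ru_begin);
    clock_gettime(CLOCK_MONOTONIC, &ts_begin);
}

void measure_end(bench_result_t *out)
{
    struct timespec ts_end;
    struct rusage   ru_end;
    clock_gettime(CLOCK_MONOTONIC, &ts_end);
    getrusage(RUSAGE_SELF, &ru_end);

    int64_t ns = (int64_t)(ts_end.tv_sec - ts_begin.tv_sec) * 1000000000LL
               + (int64_t)(ts_end.tv_nsec - ts_begin.tv_nsec);
    out->wall_ns     = (uint64_t)ns;
    out->soft_pf     = ru_end.ru_minflt - ru_begin.ru_minflt;  /* delta over the run */
    out->hard_pf     = ru_end.ru_majflt - ru_begin.ru_majflt;
    out->rss_peak_kb = ru_end.ru_maxrss;                       /* peak is absolute */
}
```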
Output Format (CSV)
```
allocator,scenario,iterations,p50_ns,p95_ns,p99_ns,soft_pf,hard_pf,rss_mb,throughput
hakmem-baseline,json,1000,42,68,89,1234,0,5.2,23809
hakmem-evolving,json,1000,38,62,81,1150,0,4.8,26315
jemalloc,json,1000,45,72,95,1400,0,6.1,22222
mimalloc,json,1000,40,65,85,1280,0,5.5,25000
system,json,1000,55,90,120,1800,2,7.8,18181
```
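Each row then comes down to a single `fprintf`; a sketch that assumes the illustrative `bench_result_t` from the measurement sketch above (field names are assumptions):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumes the illustrative bench_result_t defined in the measurement sketch. */
static void emit_csv_row(FILE *csv, const char *alloc_name, const char *scenario,
                         int iters, uint64_t p50, uint64_t p95, uint64_t p99,
                         const bench_result_t *r)
{
    double secs = r->wall_ns / 1e9;
    fprintf(csv, "%s,%s,%d,%" PRIu64 ",%" PRIu64 ",%" PRIu64 ",%ld,%ld,%.1f,%.0f\n",
            alloc_name, scenario, iters, p50, p95, p99,
            r->soft_pf, r->hard_pf,
            r->rss_peak_kb / 1024.0,  /* KB -> MB, matching the rss_mb column */
            iters / secs);            /* throughput: allocations per second */
}
```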
🧪 Execution Plan
Step 1: Build
```bash
# hakmem versions
make clean && make  # hakmem with UCB1

# System allocators (already installed)
apt-get install libjemalloc2 libmimalloc2.0
```
Step 2: Run Benchmarks
```bash
# Run all allocators, all scenarios
bash bench_runner.sh --warmup 3 --runs 10 --output results.csv

# Individual runs (for debugging)
./bench_allocators --allocator hakmem-baseline --scenario json
LD_PRELOAD=libjemalloc.so.2 ./bench_allocators --allocator jemalloc --scenario json
```
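Inside `bench_allocators`, the `--scenario` flag can map onto the scenario table from the design above. A minimal dispatch sketch (the `--allocator` handling for hakmem's built-in modes is omitted; under `LD_PRELOAD`, plain `malloc`/`free` already resolve to the preloaded allocator, so one wrapper pair covers jemalloc, mimalloc, and glibc):

```c
#include <stdlib.h>
#include <string.h>

typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

/* Prototypes from the structure sketch above. */
void bench_json(alloc_fn_t, free_fn_t, int);
void bench_mir(alloc_fn_t, free_fn_t, int);
void bench_vm(alloc_fn_t, free_fn_t, int);
void bench_mixed(alloc_fn_t, free_fn_t, int);

/* With LD_PRELOAD active, these wrappers time whichever allocator was preloaded. */
static void *malloc_wrap(size_t n)        { return malloc(n); }
static void  free_wrap(void *p, size_t n) { (void)n; free(p); }

int main(int argc, char **argv)
{
    const char *scenario = "json";
    for (int i = 1; i + 1 < argc; i++)
        if (strcmp(argv[i], "--scenario") == 0)
            scenario = argv[i + 1];

    if      (strcmp(scenario, "json")  == 0) bench_json(malloc_wrap, free_wrap, 1000);
    else if (strcmp(scenario, "mir")   == 0) bench_mir(malloc_wrap, free_wrap, 100);
    else if (strcmp(scenario, "vm")    == 0) bench_vm(malloc_wrap, free_wrap, 10);
    else if (strcmp(scenario, "mixed") == 0) bench_mixed(malloc_wrap, free_wrap, 1000);
    return 0;
}
```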
Step 3: Analyze
```bash
# Generate graphs
python3 analyze_results.py results.csv --output graphs/

# Generate LaTeX table
python3 generate_latex_table.py results.csv --output paper_table.tex
```
📋 Gemini's Critical Requirements
✅ Must Have (for Best Paper)
- jemalloc comparison - Industry standard
- mimalloc comparison - State-of-the-art
- Fair benchmarking - Same workload, multiple runs
- Statistical significance - Warm-up, median, confidence intervals
🎯 Should Have (for generality)
- Redis/Nginx benchmarks - Real-world workloads
- Confusion Matrix - Auto-inference accuracy
💡 Nice to Have
- Multi-threaded benchmarks - Scalability
- Memory fragmentation - Long-running tests
🚀 Next Steps
- Implement `bench_allocators.c` ⬅️ next
- Implement `bench_runner.sh`
- Run initial benchmarks (10 runs)
- Analyze results
- Create graphs for paper
📝 Notes
- Fair comparison: Use same scenarios for all allocators
- Statistical rigor: Multiple runs, discard outliers
- Paper-ready: CSV → graphs/tables directly
- Reproducible: Document exact versions, environment
Related: Gemini's S+ review in chatgpt-advanced-proposals.md