hakmem Benchmark Design - jemalloc/mimalloc Comparison

Purpose: Compare hakmem against industry-standard allocators for paper evaluation

Date: 2025-10-21
Status: Phase 5 Implementation


🎯 Benchmark Goals

Per Gemini's S+ review:

"jemalloc/mimalloc比較がないと → Best Paper Awardは無理"

Key Requirements:

  1. Fair comparison (same workload, same environment)
  2. Multiple allocators: hakmem (baseline/evolving), jemalloc, mimalloc, system malloc
  3. KPI measurement: P99 latency, page faults, RSS, throughput
  4. Statistical significance: multiple runs, warm-up, median/percentiles
  5. Paper-ready output: CSV format for graphs/tables

📊 Workload Scenarios

Using existing test scenarios from test_hakmem.c:

Scenario 1: JSON Parsing (small, frequent)

  • Size: 64KB
  • Iterations: 1000
  • Pattern: Allocate → Use → Free (tight loop)

Scenario 2: MIR Build (medium, moderate)

  • Size: 256KB
  • Iterations: 100
  • Pattern: Allocate → Use → Free (moderate)

Scenario 3: VM Execution (large, infrequent)

  • Size: 2MB
  • Iterations: 10
  • Pattern: Allocate → Use → Free (infrequent)

Scenario 4: Mixed (realistic)

  • All three patterns mixed
  • Simulates real compiler workload
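
Scenarios 1-3 share the same allocate → use → free skeleton and differ only in block size and iteration count; Scenario 4 interleaves all three. A minimal sketch of that loop, using the alloc_fn_t / free_fn_t abstraction described under Benchmark Program Design below (run_scenario and the memset-based "use" step are placeholders, not the final benchmark code):

#include <stdlib.h>   // abort
#include <string.h>   // memset

// Same typedefs as in bench_allocators.c (see Benchmark Program Design)
typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

// Hypothetical sketch of one scenario loop (e.g. JSON: 64KB x 1000)
static void run_scenario(alloc_fn_t alloc, free_fn_t dealloc,
                         size_t size, int iterations) {
    for (int i = 0; i < iterations; i++) {
        void* p = alloc(size);     // allocation under test
        if (!p) abort();           // fail fast on OOM
        memset(p, 0xAB, size);     // "use" step: touch every byte
        dealloc(p, size);          // sized free, matching free_fn_t
    }
}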

🔬 Allocator Configurations

1. hakmem-baseline

  • HAKMEM_MODE not set
  • Fixed policy (256KB threshold)
  • Baseline for comparison

2. hakmem-evolving

  • HAKMEM_MODE=evolving
  • UCB1 enabled
  • Adaptive learning

3. jemalloc

  • LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
  • Industry standard (Firefox, Redis)

4. mimalloc

  • LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2
  • Microsoft allocator

5. system malloc (glibc)

  • No LD_PRELOAD
  • Default libc allocator
  • Control baseline
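
jemalloc, mimalloc, and glibc can all be exercised through plain malloc/free, with LD_PRELOAD deciding which implementation actually services the calls; only the hakmem configurations need their own entry points. A hedged sketch of the wrappers (hakmem's entry-point names are not specified here and would be wired in separately):

#include <stdlib.h>

// Same typedefs as in bench_allocators.c (see Benchmark Program Design)
typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

// jemalloc / mimalloc / system all go through malloc/free;
// LD_PRELOAD selects the implementation at load time.
static void* preload_alloc(size_t n)         { return malloc(n); }
static void  preload_free(void* p, size_t n) { (void)n; free(p); }

// hakmem-baseline vs hakmem-evolving differ only in the HAKMEM_MODE
// environment variable; hakmem's own allocation entry points (names
// not confirmed here) plug into the same alloc_fn_t / free_fn_t slots.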

📈 KPI Metrics

Primary Metrics (for paper)

  1. P99 Latency: 99th percentile allocation latency (ns)
  2. Page Faults: Hard page faults (I/O required)
  3. RSS Peak: Maximum resident set size (MB)

Secondary Metrics

  1. Throughput: Allocations per second
  2. P50/P95 Latency: Additional percentiles
  3. Soft Page Faults: Minor faults (no I/O)

🏗️ Implementation Plan

Phase 5.1: Benchmark Infrastructure (current phase)

  • Design document (this file)
  • bench_allocators.c - Main benchmark program
  • bench_runner.sh - Shell script to run all allocators
  • CSV output format

Phase 5.2: Statistical Analysis

  • Multiple runs (10-50 iterations)
  • Warm-up phase (discard first 3 runs)
  • Median/percentile calculation
  • Confidence intervals
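
For the percentile calculation, one simple option is the nearest-rank method over the recorded per-allocation latencies. A minimal sketch (function name is illustrative; n must be greater than zero):

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void* a, const void* b) {
    uint64_t x = *(const uint64_t*)a, y = *(const uint64_t*)b;
    return (x > y) - (x < y);
}

// Nearest-rank percentile over n latency samples (ns); sorts in place.
static uint64_t percentile_ns(uint64_t* samples, size_t n, double p) {
    qsort(samples, n, sizeof(uint64_t), cmp_u64);
    size_t rank = (size_t)ceil((p / 100.0) * (double)n);
    if (rank == 0) rank = 1;
    if (rank > n)  rank = n;
    return samples[rank - 1];   // e.g. p=99 gives the P99 latency
}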

Phase 5.3: Visualization

  • Python/gnuplot scripts for graphs
  • LaTeX tables for paper

🔧 Benchmark Program Design

bench_allocators.c Structure

// Allocator abstraction layer
typedef void* (*alloc_fn_t)(size_t);
typedef void (*free_fn_t)(void*, size_t);

// Benchmark a single scenario
typedef struct {
    const char* name;
    void (*run)(alloc_fn_t, free_fn_t, int iterations);
} benchmark_t;

// Scenarios
void bench_json(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mir(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_vm(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mixed(alloc_fn_t alloc, free_fn_t free, int iters);

// KPI measurement
void measure_start(void);
void measure_end(bench_result_t* out);
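
One way measure_start/measure_end could be realized is as deltas of getrusage() counters plus a monotonic clock. The bench_result_t fields below are hypothetical and would need to match whatever the real struct defines:

#include <sys/resource.h>
#include <time.h>

typedef struct {
    long   soft_pf;        // minor page faults during the run (ru_minflt delta)
    long   hard_pf;        // major page faults during the run (ru_majflt delta)
    long   rss_peak_kb;    // peak RSS (ru_maxrss, kilobytes on Linux)
    double elapsed_ns;     // wall-clock time for the run
} bench_result_t;

static struct rusage   g_ru0;
static struct timespec g_t0;

void measure_start(void) {
    getrusage(RUSAGE_SELF, &g_ru0);
    clock_gettime(CLOCK_MONOTONIC, &g_t0);
}

void measure_end(bench_result_t* out) {
    struct rusage ru1; struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &ru1);
    out->soft_pf     = ru1.ru_minflt - g_ru0.ru_minflt;
    out->hard_pf     = ru1.ru_majflt - g_ru0.ru_majflt;
    out->rss_peak_kb = ru1.ru_maxrss;   // peak since process start
    out->elapsed_ns  = (t1.tv_sec  - g_t0.tv_sec)  * 1e9
                     + (t1.tv_nsec - g_t0.tv_nsec);
}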

Output Format (CSV)

allocator,scenario,iterations,p50_ns,p95_ns,p99_ns,soft_pf,hard_pf,rss_mb,throughput
hakmem-baseline,json,1000,42,68,89,1234,0,5.2,23809
hakmem-evolving,json,1000,38,62,81,1150,0,4.8,26315
jemalloc,json,1000,45,72,95,1400,0,6.1,22222
mimalloc,json,1000,40,65,85,1280,0,5.5,25000
system,json,1000,55,90,120,1800,2,7.8,18181
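
Each (allocator, scenario) run appends one row in the layout above. A minimal sketch of the writer (the field list follows the hypothetical bench_result_t sketched earlier; percentile values come from the recorded latency samples):

#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: emit one result row matching the CSV header above.
static void write_csv_row(FILE* f, const char* allocator, const char* scenario,
                          int iterations, uint64_t p50, uint64_t p95, uint64_t p99,
                          long soft_pf, long hard_pf, double rss_mb, double ops_per_s) {
    fprintf(f, "%s,%s,%d,%llu,%llu,%llu,%ld,%ld,%.1f,%.0f\n",
            allocator, scenario, iterations,
            (unsigned long long)p50, (unsigned long long)p95, (unsigned long long)p99,
            soft_pf, hard_pf, rss_mb, ops_per_s);
}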

🧪 Execution Plan

Step 1: Build

# hakmem versions
make clean && make  # hakmem with UCB1

# System allocators (install if not already present)
apt-get install libjemalloc2 libmimalloc2.0

Step 2: Run Benchmarks

# Run all allocators, all scenarios
bash bench_runner.sh --warmup 3 --runs 10 --output results.csv

# Individual runs (for debugging)
./bench_allocators --allocator hakmem-baseline --scenario json
LD_PRELOAD=libjemalloc.so.2 ./bench_allocators --allocator jemalloc --scenario json

Step 3: Analyze

# Generate graphs
python3 analyze_results.py results.csv --output graphs/

# Generate LaTeX table
python3 generate_latex_table.py results.csv --output paper_table.tex

📋 Gemini's Critical Requirements

Must Have (for Best Paper)

  1. jemalloc comparison - Industry standard
  2. mimalloc comparison - State-of-the-art
  3. Fair benchmarking - Same workload, multiple runs
  4. Statistical significance - Warm-up, median, confidence intervals

🎯 Should Have (for generality)

  1. Redis/Nginx benchmarks - Real-world workloads
  2. Confusion Matrix - Auto-inference accuracy

💡 Nice to Have

  1. Multi-threaded benchmarks - Scalability
  2. Memory fragmentation - Long-running tests

🚀 Next Steps

  1. Implement bench_allocators.c ⬅️
  2. Implement bench_runner.sh
  3. Run initial benchmarks (10 runs)
  4. Analyze results
  5. Create graphs for paper

📝 Notes

  • Fair comparison: Use same scenarios for all allocators
  • Statistical rigor: Multiple runs, discard outliers
  • Paper-ready: CSV → graphs/tables directly
  • Reproducible: Document exact versions, environment

Related: Gemini's S+ review in chatgpt-advanced-proposals.md