# hakmem Benchmark Design - jemalloc/mimalloc Comparison

**Purpose**: Compare hakmem against industry-standard allocators for paper evaluation

**Date**: 2025-10-21

**Status**: Phase 5 Implementation

---
## 🎯 Benchmark Goals
Per Gemini's S+ review:

> "Without a jemalloc/mimalloc comparison → a Best Paper Award is out of reach."

**Key Requirements**:
1. Fair comparison (same workload, same environment)
2. Multiple allocators: hakmem (baseline/evolving), jemalloc, mimalloc, system malloc
3. KPI measurement: P99 latency, page faults, RSS, throughput
4. Statistical significance: multiple runs, warm-up, median/percentiles
5. Paper-ready output: CSV format for graphs/tables

---
## 📊 Workload Scenarios

Using existing test scenarios from `test_hakmem.c`:

### Scenario 1: JSON Parsing (small, frequent)
- Size: 64KB
- Iterations: 1000
- Pattern: Allocate → Use → Free (tight loop)

### Scenario 2: MIR Build (medium, moderate)
- Size: 256KB
- Iterations: 100
- Pattern: Allocate → Use → Free (moderate)

### Scenario 3: VM Execution (large, infrequent)
- Size: 2MB
- Iterations: 10
- Pattern: Allocate → Use → Free (infrequent)

### Scenario 4: Mixed (realistic)
- All three patterns mixed
- Simulates real compiler workload

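To make the scenario shape concrete, here is a minimal sketch of the Scenario 1 loop, written against the `alloc_fn_t`/`free_fn_t` abstraction defined in the `bench_allocators.c` design later in this document. The 64KB size and 1000 iterations come from the scenario description above; the `memset` stands in for the "Use" step.

```c
#include <stddef.h>
#include <string.h>

// Typedefs repeated from the bench_allocators.c design below.
typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

// Sketch of Scenario 1 (JSON parsing): 64KB blocks,
// tight allocate -> use -> free loop, 1000 iterations.
void bench_json(alloc_fn_t alloc, free_fn_t free_fn, int iters) {
    const size_t sz = 64 * 1024;
    for (int i = 0; i < iters; i++) {
        void* p = alloc(sz);          // allocate
        if (!p) break;                // stop on allocation failure
        memset(p, 0xAB, sz);          // "use": touch every byte
        free_fn(p, sz);               // free (size-aware signature)
    }
}
```

Scenarios 2 and 3 follow the same shape with 256KB/100 iterations and 2MB/10 iterations respectively.
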
---
## 🔬 Allocator Configurations

### 1. hakmem-baseline
- `HAKMEM_MODE` not set
- Fixed policy (256KB threshold)
- Baseline for comparison

### 2. hakmem-evolving
- `HAKMEM_MODE=evolving`
- UCB1 enabled
- Adaptive learning

### 3. jemalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2`
- Industry standard (Firefox, Redis)

### 4. mimalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2`
- Microsoft allocator

### 5. system malloc (glibc)
- No `LD_PRELOAD`
- Default libc allocator
- Control baseline

---
## 📈 KPI Metrics

### Primary Metrics (for paper)
1. **P99 Latency**: 99th percentile allocation latency (ns)
2. **Page Faults**: Hard page faults (I/O required)
3. **RSS Peak**: Maximum resident set size (MB)

### Secondary Metrics
4. **Throughput**: Allocations per second
5. **P50/P95 Latency**: Additional percentiles
6. **Soft Page Faults**: Minor faults (no I/O)
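
The fault and RSS metrics can be read on Linux with `getrusage(RUSAGE_SELF)` (`ru_minflt` = soft faults, `ru_majflt` = hard faults, `ru_maxrss` = peak RSS in kilobytes), and elapsed time for throughput with `clock_gettime(CLOCK_MONOTONIC)`. A minimal sketch follows; the `kpi_raw_t` name is a placeholder for this sketch and is trimmed to what is measured here, while latency percentiles are computed separately from per-allocation timing samples (see the Phase 5.2 sketch below).

```c
#include <time.h>
#include <sys/resource.h>

// Raw counters captured around one benchmark run (placeholder struct;
// the full result record also carries latency percentiles).
typedef struct {
    double elapsed_s;   // wall-clock duration, for throughput
    long   soft_pf;     // minor faults: resolved without I/O
    long   hard_pf;     // major faults: required I/O
    double rss_mb;      // peak resident set size
} kpi_raw_t;

static struct timespec kpi_t0;
static struct rusage   kpi_ru0;

void measure_start(void) {
    getrusage(RUSAGE_SELF, &kpi_ru0);
    clock_gettime(CLOCK_MONOTONIC, &kpi_t0);
}

void measure_end(kpi_raw_t* out) {
    struct timespec t1;
    struct rusage   ru1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &ru1);

    out->elapsed_s = (t1.tv_sec - kpi_t0.tv_sec)
                   + (t1.tv_nsec - kpi_t0.tv_nsec) / 1e9;
    out->soft_pf = ru1.ru_minflt - kpi_ru0.ru_minflt;
    out->hard_pf = ru1.ru_majflt - kpi_ru0.ru_majflt;
    out->rss_mb  = ru1.ru_maxrss / 1024.0;   // ru_maxrss is in KB on Linux
}
```
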
---
## 🏗️ Implementation Plan

### Phase 5.1: Benchmark Infrastructure (this phase)
- [x] Design document (this file)
- [ ] `bench_allocators.c` - Main benchmark program
- [ ] `bench_runner.sh` - Shell script to run all allocators
- [ ] CSV output format

### Phase 5.2: Statistical Analysis
- [ ] Multiple runs (10-50 iterations)
- [ ] Warm-up phase (discard first 3 runs)
- [ ] Median/percentile calculation (see the sketch after this list)
- [ ] Confidence intervals

### Phase 5.3: Visualization
- [ ] Python/gnuplot scripts for graphs
- [ ] LaTeX tables for paper
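
For the median/percentile step referenced above, a minimal sketch assuming per-allocation latencies are collected into an array of nanosecond samples during each run; the nearest-rank method shown is one simple choice, not the only valid one.

```c
#include <stddef.h>
#include <stdlib.h>

// Ascending comparison for qsort over nanosecond samples.
static int cmp_u64(const void* a, const void* b) {
    unsigned long long x = *(const unsigned long long*)a;
    unsigned long long y = *(const unsigned long long*)b;
    return (x > y) - (x < y);
}

// Nearest-rank percentile over n latency samples (ns); sorts in place.
// Usage: p50 = percentile_ns(lat, n, 50.0); p99 = percentile_ns(lat, n, 99.0);
unsigned long long percentile_ns(unsigned long long* samples, size_t n, double pct) {
    if (n == 0) return 0;
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t idx = (size_t)(pct / 100.0 * (double)n);
    if (idx >= n) idx = n - 1;
    return samples[idx];
}
```
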
---
## 🔧 Benchmark Program Design
### `bench_allocators.c` Structure

```c
#include <stddef.h>

// Allocator abstraction layer
typedef void* (*alloc_fn_t)(size_t);
typedef void (*free_fn_t)(void*, size_t);

// KPI record for one run; fields mirror the CSV columns below
typedef struct bench_result bench_result_t;

// Benchmark a single scenario
typedef struct {
    const char* name;
    void (*run)(alloc_fn_t, free_fn_t, int iterations);
} benchmark_t;

// Scenarios
void bench_json(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mir(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_vm(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mixed(alloc_fn_t alloc, free_fn_t free, int iters);

// KPI measurement
void measure_start(void);
void measure_end(bench_result_t* out);
```
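
The function pointers above are expected to be filled in per allocator: for the `LD_PRELOAD` configurations (jemalloc, mimalloc, system malloc), plain `malloc`/`free` wrappers are enough since preloading replaces those symbols, and only the hakmem configurations need dedicated entry points. A minimal sketch follows; the hakmem function names (`hakmem_alloc`, `hakmem_free`) are placeholders rather than the library's confirmed API, and hakmem-baseline vs hakmem-evolving differ only in whether `HAKMEM_MODE=evolving` is set in the environment, as described in the allocator configurations above.

```c
#include <stdlib.h>

// LD_PRELOAD allocators: the preloaded library replaces malloc/free,
// so forwarding wrappers are sufficient.
static void* sys_alloc(size_t n)         { return malloc(n); }
static void  sys_free(void* p, size_t n) { (void)n; free(p); }

// hakmem entry points -- placeholder names, not the confirmed API.
extern void* hakmem_alloc(size_t n);
extern void  hakmem_free(void* p, size_t n);
static void* hk_alloc(size_t n)          { return hakmem_alloc(n); }
static void  hk_free(void* p, size_t n)  { hakmem_free(p, n); }

// Lookup table keyed by the --allocator argument.
typedef struct {
    const char* name;
    void* (*alloc)(size_t);
    void  (*free_fn)(void*, size_t);
} allocator_entry_t;

static const allocator_entry_t allocators[] = {
    { "hakmem-baseline", hk_alloc,  hk_free  },
    { "hakmem-evolving", hk_alloc,  hk_free  },
    { "jemalloc",        sys_alloc, sys_free },
    { "mimalloc",        sys_alloc, sys_free },
    { "system",          sys_alloc, sys_free },
};
```
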
### Output Format (CSV)

```csv
allocator,scenario,iterations,p50_ns,p95_ns,p99_ns,soft_pf,hard_pf,rss_mb,throughput
hakmem-baseline,json,1000,42,68,89,1234,0,5.2,23809
hakmem-evolving,json,1000,38,62,81,1150,0,4.8,26315
jemalloc,json,1000,45,72,95,1400,0,6.1,22222
mimalloc,json,1000,40,65,85,1280,0,5.5,25000
system,json,1000,55,90,120,1800,2,7.8,18181
```
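
Each (allocator, scenario) run appends one row in this layout, which keeps the output directly consumable by the analysis scripts in Step 3. A minimal sketch, assuming the measurement and percentile helpers sketched earlier:

```c
#include <stdio.h>

// Append one CSV result row; the field order mirrors the header line above.
static void write_csv_row(FILE* out,
                          const char* allocator, const char* scenario, int iterations,
                          long p50_ns, long p95_ns, long p99_ns,
                          long soft_pf, long hard_pf,
                          double rss_mb, long throughput) {
    fprintf(out, "%s,%s,%d,%ld,%ld,%ld,%ld,%ld,%.1f,%ld\n",
            allocator, scenario, iterations,
            p50_ns, p95_ns, p99_ns, soft_pf, hard_pf, rss_mb, throughput);
}
```
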
---
## 🧪 Execution Plan
### Step 1: Build

```bash
# hakmem versions
make clean && make   # hakmem with UCB1

# System allocators (already installed)
apt-get install libjemalloc2 libmimalloc2.0
```

### Step 2: Run Benchmarks

```bash
# Run all allocators, all scenarios
bash bench_runner.sh --warmup 3 --runs 10 --output results.csv

# Individual runs (for debugging)
./bench_allocators --allocator hakmem-baseline --scenario json
LD_PRELOAD=libjemalloc.so.2 ./bench_allocators --allocator jemalloc --scenario json
```

### Step 3: Analyze

```bash
# Generate graphs
python3 analyze_results.py results.csv --output graphs/

# Generate LaTeX table
python3 generate_latex_table.py results.csv --output paper_table.tex
```

---
## 📋 Gemini's Critical Requirements

### ✅ Must Have (for Best Paper)
1. **jemalloc comparison** - Industry standard
2. **mimalloc comparison** - State-of-the-art
3. **Fair benchmarking** - Same workload, multiple runs
4. **Statistical significance** - Warm-up, median, confidence intervals

### 🎯 Should Have (for generality)
5. **Redis/Nginx benchmarks** - Real-world workloads
6. **Confusion Matrix** - Auto-inference accuracy

### 💡 Nice to Have
7. **Multi-threaded benchmarks** - Scalability
8. **Memory fragmentation** - Long-running tests

---
## 🚀 Next Steps

1. Implement `bench_allocators.c` ⬅️ **next**
2. Implement `bench_runner.sh`
3. Run initial benchmarks (10 runs)
4. Analyze results
5. Create graphs for paper

---
## 📝 Notes

- **Fair comparison**: Use same scenarios for all allocators
- **Statistical rigor**: Multiple runs, discard outliers
- **Paper-ready**: CSV → graphs/tables directly
- **Reproducible**: Document exact versions, environment

**Related**: Gemini's S+ review in `chatgpt-advanced-proposals.md`