`hakmem/docs/design/BENCHMARK_DESIGN.md`
# hakmem Benchmark Design - jemalloc/mimalloc Comparison
**Purpose**: Compare hakmem against industry-standard allocators for paper evaluation
**Date**: 2025-10-21
**Status**: Phase 5 Implementation
---
## 🎯 Benchmark Goals
Per Gemini's S+ review:
> "Without a jemalloc/mimalloc comparison, a Best Paper Award is out of reach."
**Key Requirements**:
1. Fair comparison (same workload, same environment)
2. Multiple allocators: hakmem (baseline/evolving), jemalloc, mimalloc, system malloc
3. KPI measurement: P99 latency, page faults, RSS, throughput
4. Statistical significance: multiple runs, warm-up, median/percentiles
5. Paper-ready output: CSV format for graphs/tables
---
## 📊 Workload Scenarios
Using existing test scenarios from `test_hakmem.c`:
### Scenario 1: JSON Parsing (small, frequent)
- Size: 64KB
- Iterations: 1000
- Pattern: Allocate → Use → Free (tight loop)
### Scenario 2: MIR Build (medium, moderate)
- Size: 256KB
- Iterations: 100
- Pattern: Allocate → Use → Free (moderate)
### Scenario 3: VM Execution (large, infrequent)
- Size: 2MB
- Iterations: 10
- Pattern: Allocate → Use → Free (infrequent)
### Scenario 4: Mixed (realistic)
- All three patterns mixed
- Simulates real compiler workload
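The three fixed-size scenarios share the same inner loop (allocate → touch → free). A minimal sketch of that loop against the allocator abstraction, assuming hypothetical wrapper names (`run_fixed_pattern`, `sys_alloc`, `sys_free` are illustrative, not the actual `test_hakmem.c` code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void *, size_t);

/* Allocate -> touch every byte -> free, repeated `iters` times.
 * Touching the buffer forces pages to actually be faulted in,
 * so the page-fault KPIs below measure real work.
 * Returns the number of iterations that completed. */
static int run_fixed_pattern(alloc_fn_t alloc, free_fn_t dealloc,
                             size_t size, int iters) {
    int done = 0;
    for (int i = 0; i < iters; i++) {
        unsigned char *p = alloc(size);
        if (!p)
            break;
        memset(p, 0xAB, size);  /* touch the memory */
        dealloc(p, size);
        done++;
    }
    return done;
}

/* Wrappers so the system allocator fits the sized-free abstraction. */
static void *sys_alloc(size_t n)         { return malloc(n); }
static void  sys_free(void *p, size_t n) { (void)n; free(p); }
```

With `size = 64 * 1024, iters = 1000` this is Scenario 1; the other fixed scenarios only change the two parameters.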
---
## 🔬 Allocator Configurations
### 1. hakmem-baseline
- `HAKMEM_MODE` not set
- Fixed policy (256KB threshold)
- Baseline for comparison
### 2. hakmem-evolving
- `HAKMEM_MODE=evolving`
- UCB1 enabled
- Adaptive learning
### 3. jemalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2`
- Industry standard (Firefox, Redis)
### 4. mimalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2`
- Microsoft allocator
### 5. system malloc (glibc)
- No LD_PRELOAD
- Default libc allocator
- Control baseline
---
## 📈 KPI Metrics
### Primary Metrics (for paper)
1. **P99 Latency**: 99th percentile allocation latency (ns)
2. **Page Faults**: Hard page faults (I/O required)
3. **RSS Peak**: Maximum resident set size (MB)
### Secondary Metrics
4. **Throughput**: Allocations per second
5. **P50/P95 Latency**: Additional percentiles
6. **Soft Page Faults**: Minor faults (no I/O)
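The fault and RSS counters can be read per-process with `getrusage(2)`. A hedged sketch (the struct and function names are ours, not the actual hakmem measurement code; note that `ru_maxrss` is KiB on Linux but bytes on macOS):

```c
#include <assert.h>
#include <sys/resource.h>

typedef struct {
    long soft_pf;  /* minor faults: satisfied without I/O   */
    long hard_pf;  /* major faults: required I/O            */
    long rss_kb;   /* peak resident set size (KiB on Linux) */
} bench_counters_t;

/* Snapshot fault and peak-RSS counters for the current process.
 * Call once before and once after a scenario; the KPI is the delta
 * for faults and the final value for peak RSS. */
static int read_counters(bench_counters_t *out) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->soft_pf = ru.ru_minflt;
    out->hard_pf = ru.ru_majflt;
    out->rss_kb  = ru.ru_maxrss;
    return 0;
}
```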
---
## 🏗️ Implementation Plan
### Phase 5.1: Benchmark Infrastructure (this phase)
- [x] Design document (this file)
- [ ] `bench_allocators.c` - Main benchmark program
- [ ] `bench_runner.sh` - Shell script to run all allocators
- [ ] CSV output format
### Phase 5.2: Statistical Analysis
- [ ] Multiple runs (10-50 iterations)
- [ ] Warm-up phase (discard first 3 runs)
- [ ] Median/percentile calculation
- [ ] Confidence intervals
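A percentile over the collected latency samples can be computed by sorting and indexing. A minimal nearest-index sketch (the real analysis scripts may prefer interpolated percentiles; `percentile_ns` is an illustrative name):

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b) {
    const unsigned long long x = *(const unsigned long long *)a;
    const unsigned long long y = *(const unsigned long long *)b;
    return (x > y) - (x < y);
}

/* Nearest-index percentile of n latency samples (nanoseconds).
 * Sorts v in place; p is in [0, 100]. */
static unsigned long long percentile_ns(unsigned long long *v, size_t n,
                                        double p) {
    qsort(v, n, sizeof *v, cmp_u64);
    size_t idx = (size_t)((p / 100.0) * (double)(n - 1) + 0.5);
    return v[idx];
}
```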
### Phase 5.3: Visualization
- [ ] Python/gnuplot scripts for graphs
- [ ] LaTeX tables for paper
---
## 🔧 Benchmark Program Design
### `bench_allocators.c` Structure
```c
// Allocator abstraction layer
typedef void* (*alloc_fn_t)(size_t);
typedef void (*free_fn_t)(void*, size_t);

// KPI results for one run (fields mirror the CSV columns below)
typedef struct {
    uint64_t p50_ns, p95_ns, p99_ns;  // latency percentiles
    long soft_pf, hard_pf;            // minor/major page faults
    double rss_mb;                    // peak resident set size
    double throughput;                // allocations per second
} bench_result_t;

// Benchmark a single scenario
typedef struct {
    const char* name;
    void (*run)(alloc_fn_t, free_fn_t, int iterations);
} benchmark_t;

// Scenarios
void bench_json(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mir(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_vm(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mixed(alloc_fn_t alloc, free_fn_t free, int iters);

// KPI measurement
void measure_start(void);
void measure_end(bench_result_t* out);
```
### Output Format (CSV)
```csv
allocator,scenario,iterations,p50_ns,p95_ns,p99_ns,soft_pf,hard_pf,rss_mb,throughput
hakmem-baseline,json,1000,42,68,89,1234,0,5.2,23809
hakmem-evolving,json,1000,38,62,81,1150,0,4.8,26315
jemalloc,json,1000,45,72,95,1400,0,6.1,22222
mimalloc,json,1000,40,65,85,1280,0,5.5,25000
system,json,1000,55,90,120,1800,2,7.8,18181
```
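Emitting a row in this schema is a single formatted write. A sketch (`write_csv_row` is an illustrative helper, not a committed API):

```c
#include <assert.h>
#include <stdio.h>

/* Emit one CSV row in the schema above.
 * Returns the number of characters written, or negative on error. */
static int write_csv_row(FILE *f, const char *allocator, const char *scenario,
                         int iters, long p50, long p95, long p99,
                         long soft_pf, long hard_pf, double rss_mb,
                         long throughput) {
    return fprintf(f, "%s,%s,%d,%ld,%ld,%ld,%ld,%ld,%.1f,%ld\n",
                   allocator, scenario, iters, p50, p95, p99,
                   soft_pf, hard_pf, rss_mb, throughput);
}
```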
---
## 🧪 Execution Plan
### Step 1: Build
```bash
# hakmem versions
make clean && make # hakmem with UCB1
# System allocators (skip if already installed)
apt-get install libjemalloc2 libmimalloc2.0
```
### Step 2: Run Benchmarks
```bash
# Run all allocators, all scenarios
bash bench_runner.sh --warmup 3 --runs 10 --output results.csv
# Individual runs (for debugging)
./bench_allocators --allocator hakmem-baseline --scenario json
LD_PRELOAD=libjemalloc.so.2 ./bench_allocators --allocator jemalloc --scenario json
```
### Step 3: Analyze
```bash
# Generate graphs
python3 analyze_results.py results.csv --output graphs/
# Generate LaTeX table
python3 generate_latex_table.py results.csv --output paper_table.tex
```
---
## 📋 Gemini's Critical Requirements
### ✅ Must Have (for Best Paper)
1. **jemalloc comparison** - Industry standard
2. **mimalloc comparison** - State-of-the-art
3. **Fair benchmarking** - Same workload, multiple runs
4. **Statistical significance** - Warm-up, median, confidence intervals
### 🎯 Should Have (for generality)
5. **Redis/Nginx benchmarks** - Real-world workloads
6. **Confusion Matrix** - Auto-inference accuracy
### 💡 Nice to Have
7. **Multi-threaded benchmarks** - Scalability
8. **Memory fragmentation** - Long-running tests
---
## 🚀 Next Steps
1. Implement `bench_allocators.c` ⬅️ **next**
2. Implement `bench_runner.sh`
3. Run initial benchmarks (10 runs)
4. Analyze results
5. Create graphs for paper
---
## 📝 Notes
- **Fair comparison**: Use same scenarios for all allocators
- **Statistical rigor**: Multiple runs, discard outliers
- **Paper-ready**: CSV → graphs/tables directly
- **Reproducible**: Document exact versions, environment
**Related**: Gemini's S+ review in `chatgpt-advanced-proposals.md`