# hakmem Benchmark Design - jemalloc/mimalloc Comparison

**Purpose**: Compare hakmem against industry-standard allocators for paper evaluation

**Date**: 2025-10-21

**Status**: Phase 5 Implementation

---
## 🎯 Benchmark Goals
Per Gemini's S+ review:

> "Without a jemalloc/mimalloc comparison → a Best Paper Award is out of reach."

**Key Requirements**:
1. Fair comparison (same workload, same environment)
2. Multiple allocators: hakmem (baseline/evolving), jemalloc, mimalloc, system malloc
3. KPI measurement: P99 latency, page faults, RSS, throughput
4. Statistical significance: multiple runs, warm-up, median/percentiles
5. Paper-ready output: CSV format for graphs/tables

---
## 📊 Workload Scenarios

Using existing test scenarios from `test_hakmem.c`:

### Scenario 1: JSON Parsing (small, frequent)
- Size: 64KB
- Iterations: 1000
- Pattern: Allocate → Use → Free (tight loop)

### Scenario 2: MIR Build (medium, moderate)
- Size: 256KB
- Iterations: 100
- Pattern: Allocate → Use → Free (moderate)

### Scenario 3: VM Execution (large, infrequent)
- Size: 2MB
- Iterations: 10
- Pattern: Allocate → Use → Free (infrequent)

### Scenario 4: Mixed (realistic)
- All three patterns mixed
- Simulates real compiler workload

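To make the scenario shape concrete, here is a minimal sketch of the Scenario 1 loop, written against the `alloc_fn_t`/`free_fn_t` abstraction defined in the `bench_allocators.c` design later in this document. The 64KB size and 1000 iterations come from the scenario description above; the `memset` stands in for the "Use" step.

```c
#include <stddef.h>
#include <string.h>

// Typedefs repeated from the bench_allocators.c design below.
typedef void* (*alloc_fn_t)(size_t);
typedef void  (*free_fn_t)(void*, size_t);

// Sketch of Scenario 1 (JSON parsing): 64KB blocks,
// tight allocate -> use -> free loop, 1000 iterations.
void bench_json(alloc_fn_t alloc, free_fn_t free_fn, int iters) {
    const size_t sz = 64 * 1024;
    for (int i = 0; i < iters; i++) {
        void* p = alloc(sz);          // allocate
        if (!p) break;                // stop on allocation failure
        memset(p, 0xAB, sz);          // "use": touch every byte
        free_fn(p, sz);               // free (size-aware signature)
    }
}
```

Scenarios 2 and 3 follow the same shape with 256KB/100 iterations and 2MB/10 iterations respectively.
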
---
## 🔬 Allocator Configurations

### 1. hakmem-baseline
- `HAKMEM_MODE` not set
- Fixed policy (256KB threshold)
- Baseline for comparison

### 2. hakmem-evolving
- `HAKMEM_MODE=evolving`
- UCB1 enabled
- Adaptive learning

### 3. jemalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2`
- Industry standard (Firefox, Redis)

### 4. mimalloc
- `LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so.2`
- Microsoft allocator

### 5. system malloc (glibc)
- No `LD_PRELOAD`
- Default libc allocator
- Control baseline

---
## 📈 KPI Metrics

### Primary Metrics (for paper)
1. **P99 Latency**: 99th percentile allocation latency (ns)
2. **Page Faults**: Hard page faults (I/O required)
3. **RSS Peak**: Maximum resident set size (MB)

### Secondary Metrics
4. **Throughput**: Allocations per second
5. **P50/P95 Latency**: Additional percentiles
6. **Soft Page Faults**: Minor faults (no I/O)
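
The fault and RSS metrics can be read on Linux with `getrusage(RUSAGE_SELF)` (`ru_minflt` = soft faults, `ru_majflt` = hard faults, `ru_maxrss` = peak RSS in kilobytes), and elapsed time for throughput with `clock_gettime(CLOCK_MONOTONIC)`. A minimal sketch follows; the `kpi_raw_t` name is a placeholder for this sketch and is trimmed to what is measured here, while latency percentiles are computed separately from per-allocation timing samples (see the Phase 5.2 sketch below).

```c
#include <time.h>
#include <sys/resource.h>

// Raw counters captured around one benchmark run (placeholder struct;
// the full result record also carries latency percentiles).
typedef struct {
    double elapsed_s;   // wall-clock duration, for throughput
    long   soft_pf;     // minor faults: resolved without I/O
    long   hard_pf;     // major faults: required I/O
    double rss_mb;      // peak resident set size
} kpi_raw_t;

static struct timespec kpi_t0;
static struct rusage   kpi_ru0;

void measure_start(void) {
    getrusage(RUSAGE_SELF, &kpi_ru0);
    clock_gettime(CLOCK_MONOTONIC, &kpi_t0);
}

void measure_end(kpi_raw_t* out) {
    struct timespec t1;
    struct rusage   ru1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    getrusage(RUSAGE_SELF, &ru1);

    out->elapsed_s = (t1.tv_sec - kpi_t0.tv_sec)
                   + (t1.tv_nsec - kpi_t0.tv_nsec) / 1e9;
    out->soft_pf = ru1.ru_minflt - kpi_ru0.ru_minflt;
    out->hard_pf = ru1.ru_majflt - kpi_ru0.ru_majflt;
    out->rss_mb  = ru1.ru_maxrss / 1024.0;   // ru_maxrss is in KB on Linux
}
```
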
---
## 🏗️ Implementation Plan

### Phase 5.1: Benchmark Infrastructure (this phase)
- [x] Design document (this file)
- [ ] `bench_allocators.c` - Main benchmark program
- [ ] `bench_runner.sh` - Shell script to run all allocators
- [ ] CSV output format

### Phase 5.2: Statistical Analysis
- [ ] Multiple runs (10-50 iterations)
- [ ] Warm-up phase (discard first 3 runs)
- [ ] Median/percentile calculation (see the sketch after this list)
- [ ] Confidence intervals

### Phase 5.3: Visualization
- [ ] Python/gnuplot scripts for graphs
- [ ] LaTeX tables for paper
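
For the median/percentile step referenced above, a minimal sketch assuming per-allocation latencies are collected into an array of nanosecond samples during each run; the nearest-rank method shown is one simple choice, not the only valid one.

```c
#include <stddef.h>
#include <stdlib.h>

// Ascending comparison for qsort over nanosecond samples.
static int cmp_u64(const void* a, const void* b) {
    unsigned long long x = *(const unsigned long long*)a;
    unsigned long long y = *(const unsigned long long*)b;
    return (x > y) - (x < y);
}

// Nearest-rank percentile over n latency samples (ns); sorts in place.
// Usage: p50 = percentile_ns(lat, n, 50.0); p99 = percentile_ns(lat, n, 99.0);
unsigned long long percentile_ns(unsigned long long* samples, size_t n, double pct) {
    if (n == 0) return 0;
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t idx = (size_t)(pct / 100.0 * (double)n);
    if (idx >= n) idx = n - 1;
    return samples[idx];
}
```
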
---
## 🔧 Benchmark Program Design
### `bench_allocators.c` Structure

```c
#include <stddef.h>

// Allocator abstraction layer
typedef void* (*alloc_fn_t)(size_t);
typedef void (*free_fn_t)(void*, size_t);

// KPI record for one run; fields mirror the CSV columns below
typedef struct bench_result bench_result_t;

// Benchmark a single scenario
typedef struct {
    const char* name;
    void (*run)(alloc_fn_t, free_fn_t, int iterations);
} benchmark_t;

// Scenarios
void bench_json(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mir(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_vm(alloc_fn_t alloc, free_fn_t free, int iters);
void bench_mixed(alloc_fn_t alloc, free_fn_t free, int iters);

// KPI measurement
void measure_start(void);
void measure_end(bench_result_t* out);
```
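
The function pointers above are expected to be filled in per allocator: for the `LD_PRELOAD` configurations (jemalloc, mimalloc, system malloc), plain `malloc`/`free` wrappers are enough since preloading replaces those symbols, and only the hakmem configurations need dedicated entry points. A minimal sketch follows; the hakmem function names (`hakmem_alloc`, `hakmem_free`) are placeholders rather than the library's confirmed API, and hakmem-baseline vs hakmem-evolving differ only in whether `HAKMEM_MODE=evolving` is set in the environment, as described in the allocator configurations above.

```c
#include <stdlib.h>

// LD_PRELOAD allocators: the preloaded library replaces malloc/free,
// so forwarding wrappers are sufficient.
static void* sys_alloc(size_t n)         { return malloc(n); }
static void  sys_free(void* p, size_t n) { (void)n; free(p); }

// hakmem entry points -- placeholder names, not the confirmed API.
extern void* hakmem_alloc(size_t n);
extern void  hakmem_free(void* p, size_t n);
static void* hk_alloc(size_t n)          { return hakmem_alloc(n); }
static void  hk_free(void* p, size_t n)  { hakmem_free(p, n); }

// Lookup table keyed by the --allocator argument.
typedef struct {
    const char* name;
    void* (*alloc)(size_t);
    void  (*free_fn)(void*, size_t);
} allocator_entry_t;

static const allocator_entry_t allocators[] = {
    { "hakmem-baseline", hk_alloc,  hk_free  },
    { "hakmem-evolving", hk_alloc,  hk_free  },
    { "jemalloc",        sys_alloc, sys_free },
    { "mimalloc",        sys_alloc, sys_free },
    { "system",          sys_alloc, sys_free },
};
```
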
### Output Format (CSV)

```csv
allocator,scenario,iterations,p50_ns,p95_ns,p99_ns,soft_pf,hard_pf,rss_mb,throughput
hakmem-baseline,json,1000,42,68,89,1234,0,5.2,23809
hakmem-evolving,json,1000,38,62,81,1150,0,4.8,26315
jemalloc,json,1000,45,72,95,1400,0,6.1,22222
mimalloc,json,1000,40,65,85,1280,0,5.5,25000
system,json,1000,55,90,120,1800,2,7.8,18181
```
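
Each (allocator, scenario) run appends one row in this layout, which keeps the output directly consumable by the analysis scripts in Step 3. A minimal sketch, assuming the measurement and percentile helpers sketched earlier:

```c
#include <stdio.h>

// Append one CSV result row; the field order mirrors the header line above.
static void write_csv_row(FILE* out,
                          const char* allocator, const char* scenario, int iterations,
                          long p50_ns, long p95_ns, long p99_ns,
                          long soft_pf, long hard_pf,
                          double rss_mb, long throughput) {
    fprintf(out, "%s,%s,%d,%ld,%ld,%ld,%ld,%ld,%.1f,%ld\n",
            allocator, scenario, iterations,
            p50_ns, p95_ns, p99_ns, soft_pf, hard_pf, rss_mb, throughput);
}
```
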
---
## 🧪 Execution Plan
### Step 1: Build

```bash
# hakmem versions
make clean && make   # hakmem with UCB1

# System allocators (already installed)
apt-get install libjemalloc2 libmimalloc2.0
```

### Step 2: Run Benchmarks

```bash
# Run all allocators, all scenarios
bash bench_runner.sh --warmup 3 --runs 10 --output results.csv

# Individual runs (for debugging)
./bench_allocators --allocator hakmem-baseline --scenario json
LD_PRELOAD=libjemalloc.so.2 ./bench_allocators --allocator jemalloc --scenario json
```

### Step 3: Analyze

```bash
# Generate graphs
python3 analyze_results.py results.csv --output graphs/

# Generate LaTeX table
python3 generate_latex_table.py results.csv --output paper_table.tex
```

---
## 📋 Gemini's Critical Requirements

### ✅ Must Have (for Best Paper)
1. **jemalloc comparison** - Industry standard
2. **mimalloc comparison** - State-of-the-art
3. **Fair benchmarking** - Same workload, multiple runs
4. **Statistical significance** - Warm-up, median, confidence intervals

### 🎯 Should Have (for generality)
5. **Redis/Nginx benchmarks** - Real-world workloads
6. **Confusion Matrix** - Auto-inference accuracy

### 💡 Nice to Have
7. **Multi-threaded benchmarks** - Scalability
8. **Memory fragmentation** - Long-running tests

---
## 🚀 Next Steps

1. Implement `bench_allocators.c` ⬅️ **next**
2. Implement `bench_runner.sh`
3. Run initial benchmarks (10 runs)
4. Analyze results
5. Create graphs for paper

---
## 📝 Notes

- **Fair comparison**: Use same scenarios for all allocators
- **Statistical rigor**: Multiple runs, discard outliers
- **Paper-ready**: CSV → graphs/tables directly
- **Reproducible**: Document exact versions, environment

**Related**: Gemini's S+ review in `chatgpt-advanced-proposals.md`