# Phase 7 Full Benchmark Suite Execution Plan

**Date**: 2025-11-08

**Phase**: 7-1.3 (HEADER_CLASSIDX=1 optimization)

**Current Status**: Partial results available (Larson 1T: 2.63M ops/s, bench_random_mixed 128B: 17.7M ops/s)

**Goal**: Comprehensive performance evaluation across ALL benchmark patterns

---

## Executive Summary

### Available Benchmarks (5 categories)

1. **Larson** - Multi-threaded stress test (8-128B, mimalloc-bench derived)
2. **Random Mixed** - Single-threaded random allocation (16-8192B)
3. **Mid-Large MT** - Multi-threaded mid-size (8-32KB)
4. **VM Mixed** - Large allocations (512KB-2MB, L2.5/L2 test)
5. **Tiny Hot** - Hot path micro-benchmark (8-64B, LIFO)

### Current Build Status (Phase 7 = HEADER_CLASSIDX=1)

All benchmarks were built with HEADER_CLASSIDX=1 on 2025-11-07/08:

- ✅ `larson_hakmem` (2025-11-08 11:48)
- ✅ `bench_random_mixed_hakmem` (2025-11-08 11:48)
- ✅ `bench_mid_large_mt_hakmem` (2025-11-07 18:42)
- ✅ `bench_tiny_hot_hakmem` (2025-11-07 18:03)
- ✅ `bench_vm_mixed_hakmem` (2025-11-07 18:03)

**Note**: The Makefile enables `HAKMEM_TINY_HEADER_CLASSIDX=1` unconditionally (lines 99-100).

---

## Execution Plan

### Phase 1: Verify Build Status (5 minutes)

**Verify HEADER_CLASSIDX=1 is enabled:**

```bash
# Check the Makefile flag (-n prints line numbers)
grep -n "HAKMEM_TINY_HEADER_CLASSIDX" Makefile

# Check whether all binaries are up to date: `make -n` only prints the commands
# it would run, so no compile/link commands in the output means no rebuild is needed
make -n bench_random_mixed_hakmem bench_tiny_hot_hakmem \
        bench_mid_large_mt_hakmem bench_vm_mixed_hakmem \
        larson_hakmem
```

**If rebuild needed:**

```bash
# Clean rebuild (HEADER_CLASSIDX=1 is already the default)
make clean
make -j bench_random_mixed_hakmem bench_random_mixed_system bench_random_mixed_mi \
        bench_tiny_hot_hakmem bench_tiny_hot_system bench_tiny_hot_mi \
        bench_mid_large_mt_hakmem bench_mid_large_mt_system bench_mid_large_mt_mi \
        bench_vm_mixed_hakmem bench_vm_mixed_system \
        larson_hakmem larson_system larson_mi
```

**Time**: ~3-5 minutes (if rebuild needed)

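
A plain `grep` also matches commented-out lines or a flag set to `0`. The sketch below is a slightly stricter check using a hypothetical helper, `header_classidx_enabled` (not part of the repository). It succeeds only if an uncommented line sets the flag to exactly 1. It assumes the flag appears literally as `HAKMEM_TINY_HEADER_CLASSIDX=1` on that line, for example inside a `-D` compiler option.

```bash
# Hypothetical helper: succeeds (exit 0) if an uncommented line in the given
# Makefile sets HAKMEM_TINY_HEADER_CLASSIDX=1 (e.g. -DHAKMEM_TINY_HEADER_CLASSIDX=1).
header_classidx_enabled() {
  local makefile="${1:-Makefile}"
  # Drop everything after '#' (Makefile comments), then look for the flag set to 1.
  # The trailing ([^0-9]|$) keeps values such as =10 from matching.
  sed 's/#.*//' "$makefile" | grep -Eq 'HAKMEM_TINY_HEADER_CLASSIDX=1([^0-9]|$)'
}
```

Usage: `header_classidx_enabled Makefile && echo "HEADER_CLASSIDX=1 enabled"`.
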

---

### Phase 2: Quick Sanity Test (2 minutes)

**Confirm that each benchmark runs to completion:**

```bash
# Larson (1T, 1 second)
./larson_hakmem 1 8 128 1024 1 12345 1

# Random Mixed (small run)
./bench_random_mixed_hakmem 1000 128 1234567

# Mid-Large MT (2 threads, small)
./bench_mid_large_mt_hakmem 2 1000 2048 42

# VM Mixed (small)
./bench_vm_mixed_hakmem 100 256 424242

# Tiny Hot (small)
./bench_tiny_hot_hakmem 32 10 1000
```

**Expected**: Every benchmark exits normally, with no SEGV or other crash.

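
To make the crash check explicit instead of eyeballing the output, each command can be wrapped so that its exit status is classified. The sketch below uses hypothetical helpers (`run_sanity`, `run_phase2_sanity`) and relies on the shell convention that a process killed by signal N reports exit status 128 + N (139 for SIGSEGV).

```bash
# Hypothetical helper: run one command, keep its output in a log file,
# and classify the result by exit status.
run_sanity() {
  local name="$1"; shift
  local log="sanity_${name}.log"
  "$@" >"$log" 2>&1
  local rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "PASS  $name"
  elif [ "$rc" -gt 128 ]; then
    # 128 + N: terminated by signal N (139 = SIGSEGV)
    echo "CRASH $name (signal $((rc - 128)), see $log)"
  else
    echo "FAIL  $name (exit $rc, see $log)"
  fi
  return "$rc"
}

# The Phase 2 commands, counted; returns non-zero if any run failed.
run_phase2_sanity() {
  local failures=0
  run_sanity larson       ./larson_hakmem 1 8 128 1024 1 12345 1      || failures=$((failures + 1))
  run_sanity random_mixed ./bench_random_mixed_hakmem 1000 128 1234567 || failures=$((failures + 1))
  run_sanity mid_large_mt ./bench_mid_large_mt_hakmem 2 1000 2048 42   || failures=$((failures + 1))
  run_sanity vm_mixed     ./bench_vm_mixed_hakmem 100 256 424242       || failures=$((failures + 1))
  run_sanity tiny_hot     ./bench_tiny_hot_hakmem 32 10 1000           || failures=$((failures + 1))
  echo "$failures of 5 sanity runs failed"
  [ "$failures" -eq 0 ]
}
```

Run `run_phase2_sanity` from the directory containing the binaries; per-benchmark output lands in `sanity_<name>.log`.
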

---

### Phase 3: Full Benchmark Suite Execution

#### Option A: Automated Suite Runner (RECOMMENDED) ⭐

**Use the existing bench_suite_matrix.sh:**

```bash
# Runs random_mixed, mid_large_mt, vm_mixed and tiny_hot (Larson is not
# included; run it separately, see 3.1) across system/mimalloc/HAKMEM variants
./scripts/bench_suite_matrix.sh
```

**Output**:
- CSV: `bench_results/suite/<timestamp>/results.csv`
- Raw logs: `bench_results/suite/<timestamp>/raw/*.out`

**Time**: ~15-20 minutes

**Coverage**:
- Random Mixed: 2 cycle counts × 2 working-set sizes × 3 variants = 12 runs
- Mid-Large MT: 2 thread counts × 3 variants = 6 runs
- VM Mixed: 2 cycle counts × 2 variants = 4 runs (system + HAKMEM only)
- Tiny Hot: 2 sizes × 3 variants = 6 runs

**Total**: 28 benchmark runs

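
A quick completeness check after the suite finishes is to compare the number of data rows in `results.csv` with the 28 runs counted above. The sketch assumes one CSV row per run plus a single header line; if the suite script writes rows differently, adjust the expected count. `check_suite_rows` is a hypothetical helper.

```bash
# Expected run count from the coverage list:
# random_mixed 2x2x3 + mid_large_mt 2x3 + vm_mixed 2x2 + tiny_hot 2x3
expected=$(( 2*2*3 + 2*3 + 2*2 + 2*3 ))   # = 28

# Hypothetical helper: compare non-empty data rows (header excluded) with the expectation.
check_suite_rows() {
  local csv="$1" want="${2:-$expected}"
  local rows
  rows=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$csv")
  if [ "$rows" -eq "$want" ]; then
    echo "OK: $rows/$want runs recorded"
  else
    echo "MISSING: $rows/$want runs recorded in $csv" >&2
    return 1
  fi
}
```

For example: `check_suite_rows "$(ls -td bench_results/suite/* | head -1)/results.csv"`.
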

---

#### Option B: Individual Benchmark Scripts (Detailed Analysis)

Use the individual scripts when you need finer control or want to A/B test environment variables:

##### 3.1 Larson Benchmark (Multi-threaded Stress)

**Basic run (1T, 4T, 8T):**

```bash
# Args: <seconds> <min size> <max size> <chunks per thread> <rounds> <seed> <threads>

# 1 thread, 10 seconds
HAKMEM_WRAP_TINY=1 ./larson_hakmem 10 8 128 1024 1 12345 1

# 4 threads, 10 seconds (CRITICAL: tests multi-thread stability)
HAKMEM_WRAP_TINY=1 ./larson_hakmem 10 8 128 1024 1 12345 4

# 8 threads, 10 seconds
HAKMEM_WRAP_TINY=1 ./larson_hakmem 10 8 128 1024 1 12345 8
```

**A/B test with environment variables:**

```bash
# Automated script (includes a PGO build)
./scripts/bench_larson_1t_ab.sh
```

**Output**: `bench_results/larson_ab/<timestamp>/results.csv`

**Time**: ~20-30 minutes (includes PGO build)

**Key Metrics**:
- Throughput (ops/s)
- Stability: 4T must not crash (see the Phase 6-2.3 active counter fix)

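
Because 4T stability is the critical check here, the three basic runs can be chained so that a crash at any thread count is reported immediately with its exit status. A minimal sketch: `larson_sweep` and the `LARSON_BIN` override are hypothetical additions (the override only exists so the loop can be dry-run with a stand-in command).

```bash
# Hypothetical sweep over thread counts with the same Larson parameters as above:
# 10 s, sizes 8-128B, 1024 chunks/thread, 1 round, seed 12345.
larson_sweep() {
  local bin="${LARSON_BIN:-./larson_hakmem}"
  local t rc failed=0
  [ "$#" -gt 0 ] || set -- 1 4 8          # default thread counts
  for t in "$@"; do
    HAKMEM_WRAP_TINY=1 "$bin" 10 8 128 1024 1 12345 "$t" > "larson_${t}T.log" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
      # rc > 128 means the run was killed by signal (rc - 128)
      echo "larson ${t}T: exit $rc (see larson_${t}T.log)" >&2
      failed=1
    else
      echo "larson ${t}T: ok"
    fi
  done
  return "$failed"
}
```

`larson_sweep` runs 1T, 4T and 8T; `larson_sweep 4` repeats only the 4T case.
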

---

##### 3.2 Random Mixed (Single-threaded, Mixed Sizes)

**Basic run:**

```bash
# 400K cycles, working set (ws) 8192, seed 1234567
HAKMEM_WRAP_TINY=1 ./bench_random_mixed_hakmem 400000 8192 1234567
./bench_random_mixed_system 400000 8192 1234567
./bench_random_mixed_mi 400000 8192 1234567
```

**A/B test with environment variables:**

```bash
# Runs 5 repetitions per configuration and reports the median
./scripts/bench_random_mixed_ab.sh
```

**Output**: `bench_results/random_mixed_ab/<timestamp>/results.csv`

**Time**: ~15-20 minutes (5 reps × multiple configs)

**Key Metrics**:
- Throughput (ops/s) across different working set sizes
- SPECIALIZE_MASK impact (0 vs 0x0F)
- FAST_CAP impact (8 vs 16 vs 32)

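
`bench_random_mixed_ab.sh` reports the median over 5 repetitions. When re-running a single configuration by hand, the same statistic can be computed from the collected throughput values. The sketch below uses a hypothetical `median` helper that reads one number per line on stdin. It does not assume any particular benchmark output format; extracting the numbers is left to the caller.

```bash
# Hypothetical helper: median of numbers read one per line from stdin.
# For an even count it prints the mean of the two middle values.
median() {
  sort -g | awk '
    NF { v[n++] = $1 }
    END {
      if (n == 0) exit 1
      if (n % 2) print v[(n - 1) / 2]
      else       print (v[n / 2 - 1] + v[n / 2]) / 2
    }'
}
```

For example, `printf '%s\n' 17.1 17.7 17.3 16.9 17.5 | median` prints `17.3`. The median is used instead of the mean so that a single outlier run does not skew the comparison.
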

---

##### 3.3 Mid-Large MT (Multi-threaded, 8-32KB)

**Basic run:**

```bash
# 4 threads, 40K cycles, working set (ws) 2048, seed 42
HAKMEM_WRAP_TINY=1 ./bench_mid_large_mt_hakmem 4 40000 2048 42
./bench_mid_large_mt_system 4 40000 2048 42
./bench_mid_large_mt_mi 4 40000 2048 42
```

**A/B test:**

```bash
./scripts/bench_mid_large_mt_ab.sh
```

**Output**: `bench_results/mid_large_mt_ab/<timestamp>/results.csv`

**Time**: ~10-15 minutes

**Key Metrics**:
- Multi-threaded performance (2T vs 4T)
- HAKMEM's SuperSlab efficiency (expected: strong performance here)

**Note**: Earlier suite results (suite/20251107) showed HAKMEM well behind system malloc on this benchmark (2.1M vs 8.7M ops/s). This is unexpected given the +108% Mid-Large result from 2025-11-02. Investigate whether this is a regression or a difference in test pattern.

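
Relative gaps in this plan follow `(HAKMEM / baseline - 1) × 100`, the same formula the Phase 4 analysis snippets use. Applied to the suite/20251107 numbers above, 2.1M vs 8.7M ops/s is about -75.9%, i.e. HAKMEM reached roughly a quarter of system throughput. By the same formula, the earlier +108% corresponds to about 2.08× the baseline. The sketch below wraps this formula in a small helper; `pct_vs` is a hypothetical name.

```bash
# Hypothetical helper: percentage difference of value $1 relative to baseline $2,
# printed with an explicit sign and one decimal place.
pct_vs() {
  awk -v h="$1" -v b="$2" 'BEGIN {
    if (b == 0) { print "n/a"; exit 1 }
    printf "%+.1f\n", (h / b - 1) * 100
  }'
}

pct_vs 2.1 8.7    # suite/20251107 Mid-Large MT, HAKMEM vs system: prints -75.9
```
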

---

##### 3.4 VM Mixed (Large Allocations, 512KB-2MB)

**Basic run:**

```bash
# 20K cycles, working set (ws) 256, seed 424242
HAKMEM_BIGCACHE_L25=1 HAKMEM_WRAP_TINY=1 ./bench_vm_mixed_hakmem 20000 256 424242
./bench_vm_mixed_system 20000 256 424242
```

**Time**: ~5 minutes

**Key Metrics**:
- L2.5 cache effectiveness (BIGCACHE_L25=1 vs 0)
- Large allocation performance

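
The key metric compares `HAKMEM_BIGCACHE_L25=1` against `=0`, but the basic run above only covers `=1`. Below is a minimal A/B sketch. It assumes the benchmark reads the variable at startup, as the basic run implies. `vm_mixed_l25_ab` and the `VM_MIXED_BIN` override are hypothetical.

```bash
# Hypothetical A/B loop: same workload (20K cycles, ws 256, seed 424242),
# toggling only HAKMEM_BIGCACHE_L25. Each run's output goes to its own log.
vm_mixed_l25_ab() {
  local bin="${VM_MIXED_BIN:-./bench_vm_mixed_hakmem}"
  local l25 log
  for l25 in 0 1; do
    log="vm_mixed_l25_${l25}.log"
    echo "== HAKMEM_BIGCACHE_L25=$l25 =="
    if ! HAKMEM_BIGCACHE_L25="$l25" HAKMEM_WRAP_TINY=1 "$bin" 20000 256 424242 \
         > "$log" 2>&1; then
      echo "run with HAKMEM_BIGCACHE_L25=$l25 failed (see $log)" >&2
      return 1
    fi
    cat "$log"
  done
}
```

Only the environment variable changes between the two runs, so any throughput difference can be attributed to the L2.5 layer (modulo run-to-run noise; repeat a few times before drawing conclusions).
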

---

##### 3.5 Tiny Hot (Hot Path Micro-benchmark)

**Basic run:**

```bash
# 32B, batch 100, 60K cycles
HAKMEM_WRAP_TINY=1 ./bench_tiny_hot_hakmem 32 100 60000
./bench_tiny_hot_system 32 100 60000
./bench_tiny_hot_mi 32 100 60000

# 64B
HAKMEM_WRAP_TINY=1 ./bench_tiny_hot_hakmem 64 100 60000
./bench_tiny_hot_system 64 100 60000
./bench_tiny_hot_mi 64 100 60000
```

**Time**: ~5 minutes

**Key Metrics**:
- Hot path efficiency (direct TLS cache access)
- Expected weakness (Phase 6 analysis: -60% vs system)

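
The six Tiny Hot commands differ only in size and allocator variant, so a loop can generate them. This minimal sketch keeps the plan's convention of setting `HAKMEM_WRAP_TINY=1` for the HAKMEM variant only. `tiny_hot_matrix` and the `TINY_HOT_PREFIX` override (for pointing at binaries in another directory) are hypothetical.

```bash
# Hypothetical loop over sizes x variants for Tiny Hot (batch 100, 60K cycles).
tiny_hot_matrix() {
  local prefix="${TINY_HOT_PREFIX:-./bench_tiny_hot_}"
  local size variant
  for size in 32 64; do
    for variant in hakmem system mi; do
      echo "== ${size}B ${variant} =="
      if [ "$variant" = hakmem ]; then
        HAKMEM_WRAP_TINY=1 "${prefix}${variant}" "$size" 100 60000
      else
        "${prefix}${variant}" "$size" 100 60000
      fi || echo "FAILED: ${size}B ${variant} (exit $?)" >&2
    done
  done
}
```
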

---

### Phase 4: Analysis and Comparison

#### 4.1 Extract Results from Suite Run

```bash
# Get latest suite results
latest=$(ls -td bench_results/suite/* | head -1)
cat "${latest}/results.csv"

# Quick comparison: mean throughput per benchmark and variant.
# Assumes columns: $1=benchmark, $2=variant (hakmem/system/mi), $4=throughput (ops/s).
# "system" is a built-in awk function, so it cannot be used as an array name.
awk -F, 'NR>1 {
  sum[$1, $2] += $4; n[$1, $2]++; benches[$1] = 1
} END {
  for (b in benches) {
    h = (n[b, "hakmem"] ? sum[b, "hakmem"] / n[b, "hakmem"] : 0)
    s = (n[b, "system"] ? sum[b, "system"] / n[b, "system"] : 0)
    m = (n[b, "mi"]     ? sum[b, "mi"]     / n[b, "mi"]     : 0)
    printf "%s: HAKMEM=%.2fM system=%.2fM mi=%.2fM", b, h/1e6, s/1e6, m/1e6
    # Guard against missing variants (e.g. vm_mixed has no mimalloc run)
    if (s > 0) printf " (vs_sys=%+.1f%%", (h/s-1)*100; else printf " (vs_sys=n/a"
    if (m > 0) printf ", vs_mi=%+.1f%%)\n", (h/m-1)*100; else printf ", vs_mi=n/a)\n"
  }
}' "${latest}/results.csv"
```

#### 4.2 Key Comparisons

**Phase 7 vs System malloc:**

```bash
# HAKMEM vs system for each benchmark/parameter combination
# (uses $latest from 4.1; if a key appears more than once, the last row wins)
awk -F, 'NR>1 && ($2=="hakmem" || $2=="system") {
  key=$1 "," $3
  if ($2=="hakmem") h[key]=$4
  if ($2=="system") s[key]=$4
} END {
  for (k in h) {
    if ((k in s) && s[k] > 0) {
      pct = (h[k]/s[k] - 1) * 100
      printf "%s: %.2fM vs %.2fM (%+.1f%%)\n", k, h[k]/1e6, s[k]/1e6, pct
    }
  }
}' "${latest}/results.csv" | sort
```

**Phase 7 vs mimalloc:**

```bash
# Same comparison against mimalloc
awk -F, 'NR>1 && ($2=="hakmem" || $2=="mi") {
  key=$1 "," $3
  if ($2=="hakmem") h[key]=$4
  if ($2=="mi") m[key]=$4
} END {
  for (k in h) {
    if ((k in m) && m[k] > 0) {
      pct = (h[k]/m[k] - 1) * 100
      printf "%s: %.2fM vs %.2fM (%+.1f%%)\n", k, h[k]/1e6, m[k]/1e6, pct
    }
  }
}' "${latest}/results.csv" | sort
```

#### 4.3 Generate Summary Report

```bash
# Create the summary skeleton (uses $latest from 4.1).
# The REPORT delimiter is left unquoted so that $(date ...) and $(basename ...) expand.
cat > PHASE7_RESULTS_SUMMARY.md << REPORT
# Phase 7 Benchmark Results Summary

## Test Configuration
- Phase: 7-1.3 (HEADER_CLASSIDX=1)
- Date: $(date +%Y-%m-%d)
- Suite: $(basename "${latest}")

## Overall Results

### Random Mixed (16-8192B, single-threaded)
[Insert results here]

### Mid-Large MT (8-32KB, multi-threaded)
[Insert results here]

### VM Mixed (512KB-2MB, large allocations)
[Insert results here]

### Tiny Hot (8-64B, hot path micro)
[Insert results here]

### Larson (8-128B, multi-threaded stress)
[Insert results here]

## Analysis

### Strengths
[Areas where HAKMEM outperforms]

### Weaknesses
[Areas where HAKMEM underperforms]

### Comparison with Previous Phases
[Phase 6 vs Phase 7 delta]

## Bottleneck Identification

[Performance profiling with perf]

REPORT
```


---

### Phase 5: Performance Profiling (Optional, if bottlenecks found)

**Profile hot paths with perf:**

```bash
# Profile random_mixed (if slow)
perf record -g --call-graph dwarf -- \
    ./bench_random_mixed_hakmem 400000 8192 1234567

perf report --stdio > perf_random_mixed_phase7.txt

# Profile larson 1T
perf record -g --call-graph dwarf -- \
    ./larson_hakmem 10 8 128 1024 1 12345 1

perf report --stdio > perf_larson_1t_phase7.txt
```

**Compare with Phase 6:**

```bash
# If you have Phase 6 binaries saved, run them side by side
# and compare the perf reports
```

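
One way to make the Phase 6 comparison concrete, assuming a saved Phase 6 binary is available, is to record each build to its own data file with `perf record -o` and compare the two with `perf diff`, which takes the baseline file first. `profile_compare` is a hypothetical helper, and the `PERF` override exists only so the commands can be dry-run.

```bash
# Hypothetical helper: profile the same workload on two builds and diff the profiles.
# Usage: profile_compare OLD_BIN NEW_BIN [benchmark args...]
profile_compare() {
  local perf="${PERF:-perf}" old="$1" new="$2"
  shift 2
  "$perf" record -g -o perf_phase6.data -- "$old" "$@" || return 1
  "$perf" record -g -o perf_phase7.data -- "$new" "$@" || return 1
  # Baseline (Phase 6) first; deltas are reported relative to it.
  "$perf" diff perf_phase6.data perf_phase7.data > perf_diff_phase6_vs_phase7.txt
}
```

For example, with a placeholder path for the saved build: `profile_compare ./phase6/bench_random_mixed_hakmem ./bench_random_mixed_hakmem 400000 8192 1234567`.
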

---

## Expected Results & Analysis Strategy

### Baseline Expectations (from Phase 6 analysis)

#### Strong Areas (Expected +50% to +171% vs System)

1. **Mid-Large (8-32KB)**: HAKMEM's SuperSlab should dominate
   - Expected: +100% to +150% vs system
   - Phase 7 improvement target: Maintain or improve

2. **Large Allocations (VM Mixed)**: L2.5 layer efficiency
   - Expected: Competitive or slight win vs system

#### Weak Areas (Expected -50% to -70% vs System)

1. **Tiny (≤128B)**: Structural weakness identified in Phase 6
   - Expected: -40% to -60% vs system
   - Phase 7 HEADER_CLASSIDX may help: +10-20% improvement

2. **Random Mixed**: Magazine layer overhead
   - Expected: -20% to -50% vs system
   - Phase 7 target: Reduce the gap

3. **Larson Multi-thread**: Contention issues
   - Expected: Variable (1T: OK; 4T+: risk of crashes)
   - Phase 7 critical: Verify 4T stability (active counter fix)

### What to Look For

#### Phase 7 Improvements (HEADER_CLASSIDX=1)
- **Tiny allocations**: +10-30% improvement (fewer header loads)
- **Random mixed**: +15-25% improvement (class_idx in header)
- **Cache efficiency**: Better locality (1-byte header vs 2-byte)

#### Red Flags
- **Mid-Large regression**: Should NOT regress (HEADER_CLASSIDX doesn't affect the mid-large path)
- **4T+ crashes in Larson**: The active counter bug should be fixed (Phase 6-2.3)
- **Severe regression (>20%)**: Investigate immediately

#### Bottleneck Identification

If Phase 7 results are disappointing:
1. **Run perf** on slow benchmarks
2. **Compare with Phase 6** perf profiles (if available)
3. **Check hot paths**:
   - `tiny_alloc_fast()` - Should be 3-4 instructions
   - `tiny_free_fast()` - Should be a fast header check
   - `superslab_refill()` - Should use the P0 ctz optimization


---

## Time Estimates

### Minimal Run (Option A: Suite Script Only)
- Build verification: 2 min
- Sanity test: 2 min
- Suite execution: 15-20 min
- Quick analysis: 5 min
- **Total: ~25-30 minutes**

### Comprehensive Run (Option B: All Individual Scripts)
- Build verification: 2 min
- Sanity test: 2 min
- Larson A/B: 25 min
- Random Mixed A/B: 20 min
- Mid-Large MT A/B: 15 min
- VM Mixed: 5 min
- Tiny Hot: 5 min
- Analysis & report: 15 min
- **Total: ~90 minutes (1.5 hours)**

### With Performance Profiling
- Add: ~20-30 min per benchmark
- **Total: ~2-3 hours**


---

## Recommended Execution Order

### Quick Assessment (30 minutes)
1. ✅ Verify build status
2. ✅ Run suite script (bench_suite_matrix.sh)
3. ✅ Generate quick comparison
4. 🔍 Identify major wins/losses
5. 📝 Decide if a deep dive is needed

### Deep Analysis (if needed, +60 minutes)
1. 🔬 Run individual A/B scripts for problem areas
2. 📊 Profile with perf
3. 📝 Compare with Phase 6 baseline
4. 💡 Generate actionable insights


---

## Output Organization

```
bench_results/
├── suite/
│   └── <timestamp>/
│       ├── results.csv      # All benchmarks, all variants
│       └── raw/*.out        # Raw logs
├── random_mixed_ab/
│   └── <timestamp>/
│       ├── results.csv      # A/B test results
│       └── raw/*.txt        # Per-run data
├── larson_ab/
│   └── <timestamp>/
│       ├── results.csv
│       └── raw/*.out
├── mid_large_mt_ab/
│   └── <timestamp>/
│       ├── results.csv
│       └── raw/*.out
└── ...

# Analysis reports
PHASE7_RESULTS_SUMMARY.md     # High-level summary
PHASE7_DETAILED_ANALYSIS.md   # Deep dive (if needed)
perf_*.txt                    # Performance profiles
```


---

## Next Steps After Benchmark

### If Phase 7 Shows Strong Results (+30-50% overall)
1. ✅ Commit and document improvements
2. 🎯 Focus on remaining weak areas (Tiny allocations)
3. 📢 Prepare performance summary for stakeholders

### If Phase 7 Shows Modest Results (+10-20% overall)
1. 🔍 Identify specific bottlenecks (perf profiling)
2. 🧪 Test individual optimizations in isolation
3. 📊 Compare with Phase 6 to ensure no regressions

### If Phase 7 Shows Regressions (any area -10% or worse)
1. 🚨 Immediate investigation
2. 🔄 Bisect to find the regression point
3. 🧪 Consider reverting HEADER_CLASSIDX if severe

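
To apply the -10% threshold mechanically, the HAKMEM rows of a Phase 6 and a Phase 7 `results.csv` can be joined on benchmark and parameters. This sketch assumes that a Phase 6 suite CSV was kept, and that both files use the column layout the Phase 4 snippets rely on: `$1` benchmark, `$2` variant, `$3` parameters, `$4` throughput. `flag_regressions` is a hypothetical helper.

```bash
# Hypothetical helper: compare HAKMEM throughput between two suite CSVs.
# Prints one line per (benchmark, params) key present in both files and
# returns 1 if any delta is at or below the threshold (default -10%).
flag_regressions() {
  local old_csv="$1" new_csv="$2" threshold="${3:--10}"
  awk -F, -v thr="$threshold" '
    FNR == 1 { next }                       # skip each file header
    $2 != "hakmem" { next }                 # HAKMEM rows only
    FILENAME == ARGV[1] { old[$1 "," $3] = $4; next }
    {
      key = $1 "," $3
      if (!(key in old) || old[key] <= 0) next
      pct = ($4 / old[key] - 1) * 100
      flag = (pct <= thr) ? "  <-- REGRESSION" : ""
      if (flag != "") bad = 1
      printf "%s: %+.1f%%%s\n", key, pct, flag
    }
    END { exit bad }' "$old_csv" "$new_csv"
}
```

For example: `flag_regressions phase6_results.csv "${latest}/results.csv"` (the Phase 6 file name is a placeholder).
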

---

## Quick Reference Commands

```bash
# Full suite (automated)
./scripts/bench_suite_matrix.sh

# Individual benchmarks (quick test)
./larson_hakmem 1 8 128 1024 1 12345 1
./bench_random_mixed_hakmem 400000 8192 1234567
./bench_mid_large_mt_hakmem 4 40000 2048 42
./bench_vm_mixed_hakmem 20000 256 424242
./bench_tiny_hot_hakmem 32 100 60000

# A/B tests (environment variable sweeps)
./scripts/bench_larson_1t_ab.sh
./scripts/bench_random_mixed_ab.sh
./scripts/bench_mid_large_mt_ab.sh

# Latest results
ls -td bench_results/suite/* | head -1
cat "$(ls -td bench_results/suite/* | head -1)/results.csv"

# Performance profiling
perf record -g --call-graph dwarf -- ./bench_random_mixed_hakmem 400000 8192 1234567
perf report --stdio > perf_output.txt
```


---

## Key Success Metrics

### Primary Goal: Overall Improvement
- **Target**: +20-30% average throughput vs Phase 6
- **Minimum**: No regressions in mid-large (HAKMEM's strength)

### Secondary Goals
1. **Stability**: 4T+ Larson runs without crashes
2. **Tiny improvement**: -40% to -50% vs system (from -60%)
3. **Random mixed improvement**: -10% to -20% vs system (from -30%+)

### Stretch Goals
1. **Mid-large dominance**: Maintain +100% vs system
2. **Overall parity**: Match or beat system malloc on average
3. **Consistency**: No severe outliers (no single test <50% of system)

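
The consistency goal (no single test below 50% of system throughput) can be checked directly against a suite CSV. This uses the same column assumptions as the Phase 4 snippets. `flag_outliers` is a hypothetical helper, and its default 0.5 ratio comes from the stretch goal above.

```bash
# Hypothetical helper: list every (benchmark, params) key where HAKMEM reaches
# less than MIN_RATIO of system malloc throughput. Returns 1 if any are found.
flag_outliers() {
  local csv="$1" min_ratio="${2:-0.5}"
  awk -F, -v min="$min_ratio" '
    NR == 1 { next }
    $2 == "hakmem" { h[$1 "," $3] = $4 }
    $2 == "system" { s[$1 "," $3] = $4 }
    END {
      for (k in h) {
        if (!(k in s) || s[k] <= 0) continue
        r = h[k] / s[k]
        if (r < min) { printf "%s: %.2fx of system\n", k, r; bad = 1 }
      }
      exit bad
    }' "$csv"
}
```

For example, a suite/20251107-style row pair of 2.1M (HAKMEM) vs 8.7M (system) would be reported as `0.24x of system`.
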

---

**Document Version**: 1.0
**Created**: 2025-11-08
**Author**: Claude (Task Agent)
**Status**: Ready for execution