# Phase 8 Benchmark - Quick Reference Card

## TL;DR - The Numbers

```
Working Set 256 (Hot Cache):
  HAKMEM:    79.2 M ops/s
  System:    86.7 M ops/s  (1.09x faster)
  mimalloc: 114.9 M ops/s  (1.45x faster)

Working Set 8192 (Realistic):
  HAKMEM:    16.5 M ops/s  ⚠️ CRITICAL
  System:    57.1 M ops/s  (3.46x faster) ⚠️ CRITICAL
  mimalloc:  96.5 M ops/s  (5.85x faster) ⚠️ CRITICAL

Scalability (WS256 → WS8192):
  HAKMEM:    4.80x degradation  🔴 BROKEN
  System:    1.52x degradation  ✅ Good
  mimalloc:  1.19x degradation  ✅ Excellent
```

## Critical Issues Found

### 1. SuperSlab Scaling Failure (SEVERITY: CRITICAL)
- **Impact**: System malloc is 246% faster (3.46x) than HAKMEM at WS8192
- **Evidence**: "shared_fail→legacy" logs show slab exhaustion
- **Root cause**: SuperSlab architecture doesn't scale beyond hot cache

### 2. Fast Path Overhead (SEVERITY: MEDIUM)
- **Impact**: System malloc is 9.4% faster than HAKMEM at WS256
- **Evidence**: Even with everything in cache, HAKMEM lags
- **Root cause**: TLS drain overhead, SuperSlab lookup costs

### 3. Fragmentation Issues (SEVERITY: HIGH)
- **Impact**: 4.8x performance degradation from WS256 to WS8192, vs 1.5x for System
- **Evidence**: Linear performance collapse with working set size
- **Root cause**: SuperSlab list becomes inefficient

## Phase 9 Priorities

### Week 1: Investigation
1. Profile SuperSlab lookup latency
2. Measure cache/TLB miss rates
3. Analyze "shared_fail→legacy" root cause
4. Measure fragmentation at different working set sizes

### Week 2: Targeted Fixes
1. Implement hash table for SuperSlab lookup
2. Experiment with 1MB/2MB SuperSlab sizes
3. Fix shared slab capacity issues
4. Optimize fast path (inline more, reduce branches)

## Success Criteria

### Minimum (Required)
- WS256: 79.2 → 85 M ops/s (+7%)
- WS8192: 16.5 → 35 M ops/s (+112%)
- Degradation: 4.80x → 2.50x or better

### Stretch Goal
- WS256: 90+ M ops/s (match System malloc)
- WS8192: 45+ M ops/s (80% of System malloc)
- Degradation: 2.00x or better

## If Phase 9 Fails (<30 M ops/s at WS8192)

Switch to **Hybrid Architecture**:
- Keep: TLS fast path layer
- Replace: SuperSlab backend → jemalloc-style arenas
- Timeline: +3 weeks
- Success probability: 75%

## Benchmark Reproducibility

All benchmarks available at:
- `/mnt/workdisk/public_share/hakmem/phase8_comprehensive_benchmark_results.txt` (raw data)
- `./bench_random_mixed_hakmem 10000000 8192` (reproduce HAKMEM)
- `./bench_random_mixed_system 10000000 8192` (reproduce System)
- `./bench_random_mixed_mi 10000000 8192` (reproduce mimalloc)

5 runs per benchmark, StdDev < 2.5% (statistically robust).

## Reports Generated

1. **PHASE8_COMPREHENSIVE_BENCHMARK_REPORT.md** - Full statistical analysis
2. **PHASE8_TECHNICAL_ANALYSIS.md** - Deep dive into root causes
3. **PHASE8_VISUAL_SUMMARY.md** - Visual charts and decision matrix
4. **PHASE8_QUICK_REFERENCE.md** - This file (quick lookup)

## Next Steps

1. Read PHASE8_VISUAL_SUMMARY.md for decision matrix
2. Read PHASE8_TECHNICAL_ANALYSIS.md for root cause details
3. Begin Phase 9 investigation (Week 1)
4. Re-evaluate after 2 weeks

---

**Date**: 2025-11-30  
**Status**: Phase 8 COMPLETE, Phase 9 READY  
**Critical Path**: Fix SuperSlab scaling or switch to Hybrid architecture
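
## Appendix: Recomputing the Ratios

The speedup and degradation figures in the TL;DR follow directly from the six throughput numbers. As a quick sanity check, the sketch below recomputes them from the rounded values quoted in this card (not from the raw data file). The names `THROUGHPUT`, `speedup_over_hakmem`, and `degradation` are illustrative and not part of the HAKMEM code base.

```python
# Throughput in M ops/s, copied from the TL;DR section of this card.
# These are the rounded figures, so results may differ slightly from
# values computed on the raw benchmark data.
THROUGHPUT = {
    "HAKMEM":   {256: 79.2, 8192: 16.5},
    "System":   {256: 86.7, 8192: 57.1},
    "mimalloc": {256: 114.9, 8192: 96.5},
}


def speedup_over_hakmem(allocator, working_set):
    """How many times faster `allocator` is than HAKMEM at this working set."""
    return THROUGHPUT[allocator][working_set] / THROUGHPUT["HAKMEM"][working_set]


def degradation(allocator):
    """Throughput at WS256 divided by throughput at WS8192."""
    return THROUGHPUT[allocator][256] / THROUGHPUT[allocator][8192]


if __name__ == "__main__":
    for name in THROUGHPUT:
        print(
            f"{name:9s} "
            f"WS256 {speedup_over_hakmem(name, 256):.2f}x  "
            f"WS8192 {speedup_over_hakmem(name, 8192):.2f}x  "
            f"degradation {degradation(name):.2f}x"
        )
```

Note that "N times faster" and "percent slower" are not interchangeable: a 3.46x speedup means System malloc delivers 246% more throughput than HAKMEM, while HAKMEM delivers about 71% less throughput than System malloc.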