Key changes:
- Phase 83-1: Switch dispatch fixed mode (tiny_inline_slots_switch_dispatch_fixed_box) - NO-GO (marginal +0.32%, branch reduction negligible)
  Reason: lazy-init pattern already optimal, Phase 78-1 pattern shows diminishing returns
- Allocator comparison baseline update (10-run SSOT, WS=400, ITERS=20M):
  tcmalloc: 115.26M (92.33% of mimalloc)
  jemalloc: 97.39M (77.96% of mimalloc)
  system: 85.20M (68.24% of mimalloc)
  mimalloc: 124.82M (baseline)
- hakmem PROFILE correction: scripts/run_mixed_10_cleanenv.sh + run_allocator_quick_matrix.sh
  PROFILE explicitly set to MIXED_TINYV3_C7_SAFE for hakmem measurements
  Result: baseline stabilized to 55.53M (44.46% of mimalloc)
  Previous unstable measurement (35.57M) was due to profile leak
- Documentation:
  * PERFORMANCE_TARGETS_SCORECARD.md: Reference allocators + M1/M2 milestone status
  * PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md: Phase 83-1 analysis (NO-GO)
  * ALLOCATOR_COMPARISON_QUICK_RUNBOOK.md: Quick comparison procedure
  * ALLOCATOR_COMPARISON_SSOT.md: Detailed SSOT methodology
- M2 milestone status: 44.46% (target 55%, gap -10.54pp) - structural improvements needed

🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 80-1: Inline Slots Switch Dispatch — Results
Goal
Reduce per-op comparison/branch overhead in inline-slots routing for the hot classes by replacing the sequential if (class_idx==X) chain with a switch (class_idx) dispatch when enabled.
Scope:
- Alloc hot path: core/box/tiny_front_hot_box.h
- Free legacy fallback: core/box/tiny_legacy_fallback_box.h
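The two dispatch shapes named in the Goal can be sketched side by side. This is a minimal illustration, not the code in the files listed above: the function names, the route constants, and the use of class indices 4, 5 and 6 as the hot classes are assumptions made for the example.

```c
/* Hypothetical route identifiers; the real handlers are not shown in this doc. */
enum { ROUTE_LEGACY = 0, ROUTE_C4 = 4, ROUTE_C5 = 5, ROUTE_C6 = 6 };

/* If-chain shape: comparisons run in order, so class 6 pays for two
 * failed tests before it matches, and other classes pay for three. */
static int route_if_chain(int class_idx) {
    if (class_idx == 4) return ROUTE_C4;
    if (class_idx == 5) return ROUTE_C5;
    if (class_idx == 6) return ROUTE_C6;
    return ROUTE_LEGACY;
}

/* Switch shape: same result for every input; the compiler is free to
 * lower this to a range check plus jump table instead of a chain. */
static int route_switch(int class_idx) {
    switch (class_idx) {
    case 4:  return ROUTE_C4;
    case 5:  return ROUTE_C5;
    case 6:  return ROUTE_C6;
    default: return ROUTE_LEGACY;
    }
}
```

Both functions return the same route for every class index, so the change affects only the generated branch structure. Whether the compiler actually emits fewer branches depends on the target and optimization level, which is why the A/B measurement is still needed.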
Change Summary
- New env gate box: core/box/tiny_inline_slots_switch_dispatch_box.h
  - ENV: HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0/1 (default 0)
- When enabled, uses switch dispatch for C4/C5/C6 (and excludes C2/C3 work, which is NO-GO).
- Reversible: set HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0 to restore the original if-chain.
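An env gate of this kind is commonly implemented by reading the variable once and caching the result. The sketch below assumes that approach; the helper name and the cache variable are hypothetical, and only the ENV name and its default of 0 come from this document.

```c
#include <stdlib.h>

/* Cached gate state: -1 = not read yet, 0 = if-chain, 1 = switch dispatch.
 * Hypothetical variable; the real box may store this differently. */
static int g_switch_dispatch = -1;

/* Hypothetical helper: reads the ENV on first call, then returns the cache.
 * An unset variable or any value not starting with '1' means disabled. */
static int switch_dispatch_enabled(void) {
    if (g_switch_dispatch < 0) {
        const char *v = getenv("HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH");
        g_switch_dispatch = (v != NULL && v[0] == '1') ? 1 : 0;
    }
    return g_switch_dispatch;
}
```

Because the value is cached after the first call, changing the variable inside a running process has no effect in this sketch; the gate is chosen at startup. The unsynchronized first read is acceptable here only because every thread would compute the same value.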
A/B (Mixed SSOT, 10-run)
Workload:
- ITERS=20000000, WS=400, RUNS=10
- scripts/run_mixed_10_cleanenv.sh
Results:
Baseline (SWITCHDISPATCH=0, if-chain):
- Mean: 51.98M ops/s
Treatment (SWITCHDISPATCH=1, switch):
- Mean: 52.84M ops/s
Delta:
- +1.65% ✅ GO (threshold +1.0%)
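The delta and the GO decision follow from the two means by simple arithmetic: (52.84 / 51.98 - 1) × 100 ≈ 1.65. A small sketch of that calculation is given here; the function names are hypothetical, and treating a gain exactly equal to the threshold as GO is an assumption, since the document states only the threshold value.

```c
#include <stdbool.h>

/* Relative throughput change of treatment vs baseline, in percent. */
static double delta_percent(double baseline, double treatment) {
    return (treatment / baseline - 1.0) * 100.0;
}

/* GO when the gain reaches the threshold (in percent).
 * Assumption: ">=" rather than ">" at the boundary. */
static bool is_go(double baseline, double treatment, double threshold_pct) {
    return delta_percent(baseline, treatment) >= threshold_pct;
}
```

With the means above, `delta_percent(51.98, 52.84)` is about 1.65 and `is_go(51.98, 52.84, 1.0)` is true.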
perf stat (single-run sanity)
Key deltas (treatment vs baseline):
- Cycles: -1.6%
- Instructions: -1.5%
- Branches: -2.9% ✅
- Cache-misses: -6.7%
- Throughput (single): +3.7%
Interpretation:
- Switch dispatch removes repeated failed comparisons for the hot inline-slot classes, reducing branches/instructions without causing cache-miss explosions.
Promotion
Promoted to Mixed SSOT defaults:
- core/bench_profile.h: HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=1
- scripts/run_mixed_10_cleanenv.sh: HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=1
Rollback:
export HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0