hakmem/docs/analysis/PHASE80_INLINE_SLOTS_SWITCH_DISPATCH_1_RESULTS.md
Moe Charm (CI) 89a9212700 Phase 83-1 + Allocator Comparison: Switch dispatch fixed (NO-GO +0.32%), PROFILE correction, SCORECARD update
Key changes:
- Phase 83-1: Switch dispatch fixed mode (tiny_inline_slots_switch_dispatch_fixed_box) - NO-GO (marginal +0.32%, branch reduction negligible)
  Reason: lazy-init pattern already optimal, Phase 78-1 pattern shows diminishing returns

- Allocator comparison baseline update (10-run SSOT, WS=400, ITERS=20M):
  tcmalloc: 115.26M (92.33% of mimalloc)
  jemalloc: 97.39M (77.96% of mimalloc)
  system: 85.20M (68.24% of mimalloc)
  mimalloc: 124.82M (baseline)

- hakmem PROFILE correction: scripts/run_mixed_10_cleanenv.sh + run_allocator_quick_matrix.sh
  PROFILE explicitly set to MIXED_TINYV3_C7_SAFE for hakmem measurements
  Result: baseline stabilized to 55.53M (44.46% of mimalloc)
  Previous unstable measurement (35.57M) was caused by a profile leak

- Documentation:
  * PERFORMANCE_TARGETS_SCORECARD.md: Reference allocators + M1/M2 milestone status
  * PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md: Phase 83-1 analysis (NO-GO)
  * ALLOCATOR_COMPARISON_QUICK_RUNBOOK.md: Quick comparison procedure
  * ALLOCATOR_COMPARISON_SSOT.md: Detailed SSOT methodology

- M2 milestone status: 44.46% (target 55%, gap -10.54pp) - structural improvements needed

🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 18:50:00 +09:00


Phase 80-1: Inline Slots Switch Dispatch — Results

Goal

Reduce per-op comparison/branch overhead in inline-slots routing for the hot classes by replacing the sequential if (class_idx==X) chain with a switch (class_idx) dispatch when enabled.
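A minimal sketch of the two dispatch shapes being compared. It makes two assumptions. First, the hypothetical helpers pop_c4/pop_c5/pop_c6 stand in for the real per-class inline-slot pops; the actual code in core/box/tiny_front_hot_box.h is not reproduced here. Second, the hot classes map to indices 4, 5 and 6. Both dispatchers must return the same result for every class; only the control-flow shape differs.

```c
/* Hypothetical class indices for the hot inline-slot classes. */
enum { C4 = 4, C5 = 5, C6 = 6 };

/* Hypothetical stand-ins for the per-class inline-slot pops; each returns
 * its class index so the two dispatchers can be compared directly. */
static int pop_c4(void) { return C4; }
static int pop_c5(void) { return C5; }
static int pop_c6(void) { return C6; }

/* Baseline shape (SWITCHDISPATCH=0): sequential if-chain. A C6 request, or
 * any class without inline slots, pays every preceding failed comparison.
 * Returns -1 when the class has no inline slots (caller takes the fallback). */
static int inline_slots_pop_ifchain(int class_idx) {
    if (class_idx == C4) return pop_c4();
    if (class_idx == C5) return pop_c5();
    if (class_idx == C6) return pop_c6();
    return -1;
}

/* Treatment shape (SWITCHDISPATCH=1): one switch on class_idx, leaving the
 * lowering (jump table or compare tree) to the compiler. */
static int inline_slots_pop_switch(int class_idx) {
    switch (class_idx) {
    case C4: return pop_c4();
    case C5: return pop_c5();
    case C6: return pop_c6();
    default: return -1;
    }
}
```

Because the switch is behavior-equivalent to the chain, the gate can flip between them without changing which allocations are served from inline slots.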

Scope:

  • Alloc hot path: core/box/tiny_front_hot_box.h
  • Free legacy fallback: core/box/tiny_legacy_fallback_box.h

Change Summary

  • New env gate box: core/box/tiny_inline_slots_switch_dispatch_box.h
    • ENV: HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0/1 (default 0)
  • When enabled, switch dispatch covers C4/C5/C6 only; C2/C3 are excluded because the C2/C3 inline-slot work was NO-GO.
  • Reversible: set HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0 to restore the original if-chain.
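A sketch of what an env gate box of this kind can look like. The function names are hypothetical and are not the actual contents of tiny_inline_slots_switch_dispatch_box.h. Only the variable name and its 0/1, default-0 semantics come from this document. The value is read once and cached, so the hot path pays only a load and a branch.

```c
#include <stdlib.h>
#include <string.h>

#define SWITCH_DISPATCH_ENV "HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH"

/* Parses the gate value: only "1" enables switch dispatch; an unset variable
 * or any other value keeps the default (0, the original if-chain). */
static int switch_dispatch_parse(const char *v) {
    return v != NULL && strcmp(v, "1") == 0;
}

/* Hypothetical gate accessor: reads the environment on first use and caches
 * the result. A plain static is used for brevity; if several threads could
 * make the first call concurrently, an atomic would be needed here. */
static int tiny_inline_slots_switch_dispatch_enabled(void) {
    static int cached = -1; /* -1 means "not read yet" */
    if (cached < 0) {
        cached = switch_dispatch_parse(getenv(SWITCH_DISPATCH_ENV));
    }
    return cached;
}
```

A call site would branch once on this accessor, taking the switch when it returns 1 and the if-chain otherwise. That single branch is what keeps the change reversible at runtime.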

A/B (Mixed SSOT, 10-run)

Workload:

  • ITERS=20000000, WS=400, RUNS=10
  • scripts/run_mixed_10_cleanenv.sh

Results:

Baseline (SWITCHDISPATCH=0, if-chain):

  • Mean: 51.98M ops/s

Treatment (SWITCHDISPATCH=1, switch):

  • Mean: 52.84M ops/s

Delta:

  • +1.65% GO (threshold +1.0%)
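The GO/NO-GO call reduces to a relative-change check against the threshold. The sketch below assumes that a delta exactly at the threshold counts as GO; the document does not specify this.

```c
#include <stdbool.h>

/* Relative change of the treatment mean vs the baseline mean, in percent. */
static double delta_pct(double baseline_mops, double treatment_mops) {
    return (treatment_mops - baseline_mops) / baseline_mops * 100.0;
}

/* GO when the improvement reaches the threshold (assumed inclusive). */
static bool ab_is_go(double baseline_mops, double treatment_mops,
                     double threshold_pct) {
    return delta_pct(baseline_mops, treatment_mops) >= threshold_pct;
}
```

With the means above, delta_pct(51.98, 52.84) is about 1.654. That rounds to the reported +1.65% and clears the +1.0% threshold.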

perf stat (single-run sanity)

Key deltas (treatment vs baseline):

  • Cycles: -1.6%
  • Instructions: -1.5%
  • Branches: -2.9%
  • Cache-misses: -6.7%
  • Throughput (single): +3.7%

Interpretation:

  • Switch dispatch removes the repeated failed comparisons of the if-chain for the hot inline-slot classes. This cuts branches (-2.9%) and instructions (-1.5%) with no cache-miss penalty; cache misses actually fell 6.7% in this single run.
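To make "repeated failed comparisons" concrete: in the sequential chain, the number of equality tests grows with a class's position, and classes without inline slots pay for every test. The sketch below counts those tests. It assumes the chain tests C4, C5, C6 in that order; the order in the actual source may differ. A switch leaves the lowering to the compiler, which can pick a jump table or a compare tree rather than this fixed linear order.

```c
/* Assumed test order of the hot classes in the baseline if-chain. */
static const int k_chain_order[] = {4, 5, 6};
#define CHAIN_LEN ((int)(sizeof k_chain_order / sizeof k_chain_order[0]))

/* Counts the equality tests the if-chain executes for class_idx: it stops at
 * the first match, or runs every test when the class has no inline slots. */
static int ifchain_compare_count(int class_idx) {
    for (int i = 0; i < CHAIN_LEN; i++) {
        if (class_idx == k_chain_order[i]) return i + 1;
    }
    return CHAIN_LEN;
}
```

Under this assumed order, C6 and every non-inline class pay three comparisons per operation, while C4 pays one.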

Promotion

Promoted to Mixed SSOT defaults:

  • core/bench_profile.h: HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=1
  • scripts/run_mixed_10_cleanenv.sh: HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=1

Rollback:

export HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0
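For the export above to act as a rollback, an explicit setting has to take precedence over the promoted default. One way to get that behavior is to have the profile fill in the variable only when it is unset, using POSIX setenv with overwrite set to 0. This is sketched here as an assumption, not a description of core/bench_profile.h. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical: apply a promoted default without clobbering an explicit user
 * setting. setenv(..., 0) leaves an already-set variable unchanged. */
static void profile_apply_default(const char *name, const char *value) {
    setenv(name, value, 0);
}

/* Reads the effective gate after the profile has run: enabled only for "1". */
static int profile_flag_enabled(const char *name) {
    const char *v = getenv(name);
    return v != NULL && strcmp(v, "1") == 0;
}
```

With this ordering, exporting HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=0 keeps the if-chain even though the profile default is 1. Leaving the variable unset picks up the promoted switch dispatch.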