Files
hakmem/core/front
Moe Charm (CI) 2b567ac070 Phase FREE-TINY-FAST-DUALHOT-1: Optimize C0-C3 direct free path
Treat C0-C3 classes (48% of calls) as "second hot path" instead of
cold path. Skip expensive policy snapshot and route determination,
direct to tiny_legacy_fallback_free_base().

Measurements from FREE-TINY-FAST-HOTCOLD-OPT-1 revealed C0-C3 is NOT
rare (48.43% of all frees). Previous attempt to optimize via hot/cold
split failed (-13% regression) because noinline + function call on 48%
of workload hurt more than it helped.

This phase applies correct optimization: direct inline path for
frequent C0-C3 without policy snapshot overhead.

Implementation:
- Insert C0-C3 early-exit after C7 ULTRA check
- Skip tiny_front_v3_snapshot_get() for C0-C3 (saves 5-10 cycles)
- Skip route determination logic
- Safety: HAKMEM_TINY_LARSON_FIX=1 disables optimization

Benchmark Results (100M ops, 400 threads, MIXED_TINYV3_C7_SAFE):
- Baseline (optimization OFF): 44.50M ops/s (median)
- Optimized (DUALHOT ON):      48.74M ops/s (median)
- Improvement: +9.51% (+4.23M ops/s)

Perf Stats (optimized):
- Branch misses: 112.8M
- Cycles: 8.89B
- Instructions: 21.95B (2.47 IPC)
- Cache misses: 656K

Status: GO (significant improvement, no regression)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 03:46:36 +09:00
..