Phase FREE-TINY-FAST-DUALHOT-1: Optimize C0-C3 direct free path

Treat the C0-C3 classes (48% of calls) as a second hot path instead of
a cold path. Skip the expensive policy snapshot and route determination
and call tiny_legacy_fallback_free_base() directly.

Measurements from FREE-TINY-FAST-HOTCOLD-OPT-1 revealed that C0-C3 is
NOT rare (48.43% of all frees). The previous attempt to optimize via a
hot/cold split failed (-13% regression) because forcing a noinline
function call onto 48% of the workload cost more than it saved.

This phase applies the correct optimization: a direct inline path for
the frequent C0-C3 classes, without the policy snapshot overhead.

Implementation:
- Insert C0-C3 early-exit after C7 ULTRA check
- Skip tiny_front_v3_snapshot_get() for C0-C3 (saves 5-10 cycles)
- Skip route determination logic
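- Safety: HAKMEM_TINY_LARSON_FIX=1 (any non-empty value not starting
  with '0') disables the optimization, so cross-thread frees keep the
  full validation path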
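
The steps above can be sketched in isolation. This is a minimal
illustration, not the allocator's actual code: legacy_free_stub() and
full_path_free_stub() are hypothetical stand-ins for
tiny_legacy_fallback_free_base() and the snapshot/route path, and
parse_flag(), larson_fix_enabled(), sketch_free() and sketch_free_env()
are names invented here. It assumes GCC or Clang (__thread,
__builtin_expect).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters standing in for the real free paths. */
static int g_direct_frees = 0; /* stub: direct legacy free */
static int g_full_frees = 0;   /* stub: policy snapshot + route determination */

static void legacy_free_stub(void *base, int class_idx) {
    (void)base; (void)class_idx;
    g_direct_frees++;
}

static void full_path_free_stub(void *base, int class_idx) {
    (void)base; (void)class_idx;
    g_full_frees++;
}

/* Same rule as the commit: set, non-empty, and not starting with '0'. */
static int parse_flag(const char *e) {
    return (e && *e && *e != '0') ? 1 : 0;
}

/* Read the environment once per thread, then reuse the cached value. */
static int larson_fix_enabled(void) {
    static __thread int cached = -1;
    if (__builtin_expect(cached == -1, 0)) {
        cached = parse_flag(getenv("HAKMEM_TINY_LARSON_FIX"));
    }
    return cached;
}

/* Dual-hot dispatch: C0-C3 go straight to the legacy free unless the
 * safety flag is on; everything else takes the full path. */
static int sketch_free(void *base, int class_idx, int larson_fix) {
    assert(class_idx >= 0);
    if (__builtin_expect(class_idx <= 3 && !larson_fix, 1)) {
        legacy_free_stub(base, class_idx);
        return 1;
    }
    full_path_free_stub(base, class_idx);
    return 1;
}

/* Convenience wrapper that takes the flag from the environment. */
int sketch_free_env(void *base, int class_idx) {
    return sketch_free(base, class_idx, larson_fix_enabled());
}
```

Passing the flag as a parameter to sketch_free() keeps the dispatch
testable; sketch_free_env() shows how the cached flag would feed it.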

Benchmark Results (100M ops, 400 threads, MIXED_TINYV3_C7_SAFE):
- Baseline (optimization OFF): 44.50M ops/s (median)
- Optimized (DUALHOT ON):      48.74M ops/s (median)
- Improvement: +9.51% (+4.23M ops/s)
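
For reference, the improvement figures can be rechecked with the small
helper below (abs_gain() and rel_gain_pct() are names made up for this
sketch). Fed the rounded medians shown above, it gives roughly +4.24M
ops/s and +9.53%; the reported +4.23M / +9.51% were presumably computed
from unrounded medians, which is an assumption.

```c
#include <assert.h>

/* Absolute throughput gain, in the same unit as the inputs (M ops/s). */
static double abs_gain(double baseline, double optimized) {
    assert(baseline > 0.0);
    return optimized - baseline;
}

/* Relative throughput gain over the baseline, in percent. */
static double rel_gain_pct(double baseline, double optimized) {
    assert(baseline > 0.0);
    return (optimized / baseline - 1.0) * 100.0;
}
```

Example: abs_gain(44.50, 48.74) and rel_gain_pct(44.50, 48.74) take the
two medians listed above.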

Perf Stats (optimized):
- Branch misses: 112.8M
- Cycles: 8.89B
- Instructions: 21.95B (2.47 IPC)
- Cache misses: 656K

Status: GO (significant improvement, no regression)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-13 03:46:36 +09:00
parent c503b212a3
commit 2b567ac070

@@ -430,6 +430,24 @@ static inline int free_tiny_fast_hot(void* ptr) {
        return 1;
    }
    // Phase FREE-TINY-FAST-DUALHOT-1: C0-C3 direct path (48% of calls)
    // Skip expensive policy snapshot and route determination, direct to legacy fallback.
    // Safety: Check Larson mode (cross-thread free handling requires full validation path)
    {
        static __thread int g_larson_fix = -1;
        if (__builtin_expect(g_larson_fix == -1, 0)) {
            const char* e = getenv("HAKMEM_TINY_LARSON_FIX");
            g_larson_fix = (e && *e && *e != '0') ? 1 : 0;
        }
        if (__builtin_expect(class_idx <= 3 && !g_larson_fix, 1)) {
            // C0-C3 + Larson mode OFF → Direct to legacy (no policy snapshot overhead)
            tiny_legacy_fallback_free_base(base, class_idx);
            FREE_TINY_FAST_HOTCOLD_STAT_INC(hot_hit);
            return 1;
        }
    }
    // Phase POLICY-FAST-PATH-V2: Skip policy snapshot for known-legacy classes
    if (free_policy_fast_v2_can_skip((uint8_t)class_idx)) {
        FREE_PATH_STAT_INC(policy_fast_v2_skip);