Phase 3 D1: Free Path Route Cache - DECISION: GO (+1.06%)

Target: Eliminate tiny_route_for_class() overhead in free path
- Perf finding: 4.39% self + 24.78% children (free bottleneck)
- Approach: Use cached route_kind (like Phase 3 C3 for alloc)

Implementation:
- core/box/tiny_free_route_cache_env_box.h (new)
  * ENV gate: HAKMEM_FREE_STATIC_ROUTE=0/1 (default OFF)
  * Lazy initialization with sentinel value
- core/front/malloc_tiny_fast.h (modified)
  * Two call sites: free_tiny_fast_cold() + legacy_fallback path
  * Direct route lookup: g_tiny_route_class[class_idx]
  * Fallback safety: Check g_tiny_route_snapshot_done

A/B Test Results (Mixed, 10-run):
- Baseline (D1=0): 45.13 M ops/s (avg), 45.76 M ops/s (median)
- Optimized (D1=1): 45.61 M ops/s (avg), 45.40 M ops/s (median)
- Improvement: +1.06% (avg), -0.77% (median)
- DECISION: GO (avg gain meets +1.0% threshold)

Cumulative Phase 2-3:
- B3: +2.89%, B4: +1.47%, C3: +2.20%
- D1: +1.06%
- Total: ~7.2% cumulative gain

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-12-13 21:44:00 +09:00
parent d43a3ce611
commit f059c0ec83
3 changed files with 105 additions and 2 deletions

View File

@ -68,6 +68,7 @@
#include "../box/free_tiny_fast_hotcold_env_box.h" // Phase FREE-TINY-FAST-HOTCOLD-OPT-1: ENV control
#include "../box/free_tiny_fast_hotcold_stats_box.h" // Phase FREE-TINY-FAST-HOTCOLD-OPT-1: Stats
#include "../box/tiny_metadata_cache_hot_box.h" // Phase 3 C2: Policy hot cache (metadata cache optimization)
#include "../box/tiny_free_route_cache_env_box.h" // Phase 3 D1: Free path route cache
// Helper: current thread id (low 32 bits) for owner check
#ifndef TINY_SELF_U32_LOCAL_DEFINED
@ -369,7 +370,19 @@ static int free_tiny_fast_cold(void* ptr, void* base, int class_idx)
{
FREE_TINY_FAST_HOTCOLD_STAT_INC(cold_hit);
tiny_route_kind_t route = tiny_route_for_class((uint8_t)class_idx);
// Phase 3 D1: Free path route cache (eliminate tiny_route_for_class overhead)
tiny_route_kind_t route;
if (__builtin_expect(tiny_free_static_route_enabled(), 0)) {
// Use cached route (bypasses tiny_route_for_class())
route = g_tiny_route_class[(unsigned)class_idx & 7u];
if (__builtin_expect(route == TINY_ROUTE_LEGACY && !g_tiny_route_snapshot_done, 0)) {
// Fallback if uninitialized
route = tiny_route_for_class((uint8_t)class_idx);
}
} else {
// Standard path
route = tiny_route_for_class((uint8_t)class_idx);
}
const int use_tiny_heap = tiny_route_is_heap_kind(route);
const TinyFrontV3Snapshot* front_snap =
__builtin_expect(tiny_front_v3_enabled(), 0) ? tiny_front_v3_snapshot_get() : NULL;
@ -763,7 +776,19 @@ static inline int free_tiny_fast(void* ptr) {
legacy_fallback:
// LEGACY fallback path
tiny_route_kind_t route = tiny_route_for_class((uint8_t)class_idx);
// Phase 3 D1: Free path route cache (eliminate tiny_route_for_class overhead)
tiny_route_kind_t route;
if (__builtin_expect(tiny_free_static_route_enabled(), 0)) {
// Use cached route (bypasses tiny_route_for_class())
route = g_tiny_route_class[(unsigned)class_idx & 7u];
if (__builtin_expect(route == TINY_ROUTE_LEGACY && !g_tiny_route_snapshot_done, 0)) {
// Fallback if uninitialized
route = tiny_route_for_class((uint8_t)class_idx);
}
} else {
// Standard path
route = tiny_route_for_class((uint8_t)class_idx);
}
const int use_tiny_heap = tiny_route_is_heap_kind(route);
const TinyFrontV3Snapshot* front_snap =
__builtin_expect(tiny_front_v3_enabled(), 0) ? tiny_front_v3_snapshot_get() : NULL;