Phase 3 C1: TLS Prefetch Implementation - NEUTRAL Result (Research Box)

Steps 1 & 2 complete:
- Implemented: core/front/malloc_tiny_fast.h prefetch (lines 264-267, 331-334)
  - LEGACY path prefetch of g_unified_cache[class_idx] to L1
  - ENV gate: HAKMEM_TINY_PREFETCH=0/1 (default OFF)
  - Conditional: only when prefetch enabled + route_kind == LEGACY
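The gate described above can be sketched as follows. This is a minimal illustration, not the repository's code: the `sketch_*` names, the config struct, and the route enum are hypothetical stand-ins, and the parsing rule (exactly "1" enables, anything else stays OFF) is an assumption about how HAKMEM_TINY_PREFETCH is read. `__builtin_prefetch` and `__builtin_expect` are GCC/Clang builtins.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real route enum and env config. */
enum { SKETCH_ROUTE_LEGACY = 0, SKETCH_ROUTE_OTHER = 1 };
typedef struct { int tiny_prefetch; } sketch_env_cfg;

/* Assumed parsing rule: exactly "1" enables; unset or anything else is OFF. */
static int sketch_parse_prefetch_flag(const char *value) {
    return (value != NULL && value[0] == '1' && value[1] == '\0') ? 1 : 0;
}

/* Read the environment once and cache the result in the config. */
static void sketch_env_init(sketch_env_cfg *cfg) {
    assert(cfg != NULL);
    cfg->tiny_prefetch =
        sketch_parse_prefetch_flag(getenv("HAKMEM_TINY_PREFETCH"));
}

/* Issue the prefetch only when the gate is on and the route is LEGACY.
 * Returns 1 if a prefetch was issued, 0 otherwise. */
static int sketch_maybe_prefetch(const sketch_env_cfg *cfg, int route_kind,
                                 const void *cache_entry) {
    if (__builtin_expect(cfg->tiny_prefetch, 0) &&
        route_kind == SKETCH_ROUTE_LEGACY) {
        __builtin_prefetch(cache_entry, 0, 3); /* read, keep in all levels */
        return 1;
    }
    return 0;
}
```

Because the flag is read once at init, the hot path with the default OFF setting pays only for a predictable, not-taken branch.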

- A/B test (Mixed 10-run): PREFETCH=0 (39.33M) → =1 (39.20M) = -0.34% avg
  - Median: +1.28% (outside the ±1.0% neutral band; the -0.34% average is within it)
  - Result: 🔬 NEUTRAL (research box, default OFF)
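The A/B arithmetic can be reproduced with a few helpers. This is a sketch under assumptions: the `ab_*` names are hypothetical, the median delta is assumed to compare the median of each 10-run set, and the ±1.0% band is taken from the message above. Using the rounded averages (39.33M and 39.20M) gives roughly -0.33%, which agrees with the reported -0.34% up to rounding of the inputs.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for doubles. */
static int ab_cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples. */
static double ab_mean(const double *v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median of n samples; sorts the array in place. */
static double ab_median(double *v, size_t n) {
    assert(n > 0);
    qsort(v, n, sizeof v[0], ab_cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* Relative change of treatment vs. baseline, in percent. */
static double ab_pct_change(double baseline, double treatment) {
    return 100.0 * (treatment - baseline) / baseline;
}

/* A delta counts as neutral when it lies inside [-band, +band] percent. */
static int ab_is_neutral(double pct, double band) {
    return pct >= -band && pct <= band;
}
```

For example, `ab_pct_change(39.33, 39.20)` falls inside a 1.0 band, while a +1.28% delta does not.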

Decision: FREEZE as research box
- The -0.34% average suggests prefetch overhead slightly exceeds its benefit
- Prefetch is issued too late (after route_kind selection, immediately before tiny_hot_alloc_fast)
- TLS cache access is already fast (head/tail indices)
- The actual memory wait happens at the slots[] array access (after the prefetch)

Technical Learning:
- Prefetch effectiveness depends on L1 miss rate at access time
- Inserting prefetch after route selection may be too late
- Future approach: move prefetch earlier or use different target
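One way the "different target" idea could look is sketched below. Everything here is hypothetical: the `sketch_cache` layout (head/tail indices plus a separately allocated slots[] array with a power-of-two mask) is an assumption based on the notes above, not the actual g_unified_cache definition. The idea is to prefetch the slot the next pop will read, and to call it early (for example at function entry, before route selection) so the line has time to arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-class cache: small header plus a separate slots[] array. */
typedef struct {
    uint16_t head;   /* index of the next slot to pop */
    uint16_t tail;   /* index one past the last filled slot */
    uint16_t mask;   /* capacity - 1; capacity is a power of two */
    void **slots;    /* assumed to live in a separate allocation */
} sketch_cache;

/* Prefetch the slot the next pop will read; returns the prefetched address. */
static inline const void *sketch_prefetch_next_slot(const sketch_cache *c) {
    const void *target = &c->slots[c->head & c->mask];
    __builtin_prefetch(target, 0, 3); /* read, keep in all cache levels */
    return target;
}

/* Pop one pointer, or NULL when the cache is empty. */
static inline void *sketch_pop(sketch_cache *c) {
    if (c->head == c->tail) return NULL;
    void *p = c->slots[c->head & c->mask];
    c->head = (uint16_t)(c->head + 1);
    return p;
}

/* Usage shape: prefetch first, do unrelated work, then pop. */
static inline void *sketch_alloc(sketch_cache *c) {
    sketch_prefetch_next_slot(c);
    /* ... route selection and other checks would run here ... */
    return sketch_pop(c);
}
```

Whether this helps depends on how much work sits between the prefetch and the pop; that would need the same A/B measurement as above.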

Next: Phase 3 C2 (Metadata Cache Optimization, expected +5-10%)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-13 19:01:57 +09:00
parent d54893ea1d
commit d0b931b197
6 changed files with 115 additions and 6 deletions


@@ -261,6 +261,10 @@ static inline void* malloc_tiny_fast_for_class(size_t size, int class_idx) {
     if (TINY_HOT_LIKELY(env_cfg->alloc_route_shape)) {
         // B3 optimized: Prioritize LEGACY with LIKELY hint
         if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY)) {
+            // Phase 3 C1: TLS cache prefetch (prefetch g_unified_cache[class_idx] to L1)
+            if (__builtin_expect(env_cfg->tiny_prefetch, 0)) {
+                __builtin_prefetch(&g_unified_cache[class_idx], 0, 3);
+            }
             // LEGACY fast path: Unified Cache hot/cold
             void* ptr = tiny_hot_alloc_fast(class_idx);
             if (TINY_HOT_LIKELY(ptr != NULL)) {
@@ -324,6 +328,10 @@ static inline void* malloc_tiny_fast_for_class(size_t size, int class_idx) {
             break;
     }
+    // Phase 3 C1: TLS cache prefetch (prefetch g_unified_cache[class_idx] to L1)
+    if (__builtin_expect(env_cfg->tiny_prefetch, 0)) {
+        __builtin_prefetch(&g_unified_cache[class_idx], 0, 3);
+    }
     // LEGACY fallback: Unified Cache hot/cold path
     void* ptr = tiny_hot_alloc_fast(class_idx);
     if (TINY_HOT_LIKELY(ptr != NULL)) {