Phase 3 C1: TLS Prefetch Implementation - NEUTRAL Result (Research Box)

Step 1 & 2 Complete:
- Implemented: core/front/malloc_tiny_fast.h prefetch (lines 264-267, 331-334)
  - LEGACY path prefetch of g_unified_cache[class_idx] to L1
  - ENV gate: HAKMEM_TINY_PREFETCH=0/1 (default OFF)
  - Conditional: only when prefetch enabled + route_kind == LEGACY

- A/B test (Mixed 10-run): PREFETCH=0 (39.33M) → =1 (39.20M) = -0.34% avg
  - Median: +1.28% (the mean, not the median, falls within the ±1.0% neutral range; the two disagree in sign)
  - Result: 🔬 NEUTRAL (research box, default OFF)
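
The delta arithmetic above can be sketched as follows. This is a minimal illustration, not the project's benchmark harness: the helper names `rel_change_pct` and `is_neutral` are hypothetical, and the ±1.0% band is taken from the note above. With the rounded throughputs quoted here (39.33M and 39.20M) the result is about -0.33%; the reported -0.34% presumably comes from unrounded per-run values.

```c
/* Hypothetical helpers for illustration only; not part of hakmem. */

/* Relative change of `test` against `base`, in percent. */
static double rel_change_pct(double base, double test) {
    return (test - base) / base * 100.0;
}

/* Returns 1 when `pct` lies inside the symmetric neutral band [-band, +band]. */
static int is_neutral(double pct, double band) {
    return pct >= -band && pct <= band;
}
```

For example, `is_neutral(rel_change_pct(39.33, 39.20), 1.0)` evaluates to 1, which matches the NEUTRAL classification of the average.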

Decision: FREEZE as research box
- Average -0.34% suggests prefetch overhead > benefit
- Prefetch timing too late (after route_kind selection)
- TLS cache access is already fast (head/tail indices)
- Actual memory wait happens at slots[] array access (after prefetch)

Technical Learning:
- Prefetch effectiveness depends on L1 miss rate at access time
- Inserting prefetch after route selection may be too late
- Future approach: move prefetch earlier or use different target

Next: Phase 3 C2 (Metadata Cache Optimization, expected +5-10%)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-13 19:01:57 +09:00
parent d54893ea1d
commit d0b931b197
6 changed files with 115 additions and 6 deletions

@@ -96,6 +96,12 @@ From `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_stats.h`:
- **Impact**: A/B gate for policy snapshot cost removal (research box until GO)
- **Notes**: When the v7 learner is enabled (`HAKMEM_SMALL_HEAP_V7_ENABLED=1` and the learner has not been disabled), this is forced OFF for safety
#### HAKMEM_TINY_PREFETCH
- **Default**: 0 (disabled)
- **Purpose**: Prefetch hints for Tiny hot paths (Phase 3 C1)
- **Impact**: A/B gate that issues `__builtin_prefetch()` on the TLS Unified Cache in the LEGACY route of `malloc_tiny_fast_for_class()`
- **Notes**: Prefetch benefit is workload-dependent. If the result is NO-GO, freeze immediately (default stays OFF)
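
A minimal sketch of how such a gate might look, assuming a GCC/Clang toolchain (for `__builtin_prefetch`). Everything except `HAKMEM_TINY_PREFETCH` and `__builtin_prefetch()` is a hypothetical stand-in: the `ROUTE_*_SKETCH` constants, the `tiny_prefetch_*` helpers, and the opaque cache pointer do not reflect the actual code in `core/front/malloc_tiny_fast.h`.

```c
#include <stdlib.h>

/* Hypothetical route kinds; the real enum lives in the hakmem sources. */
enum { ROUTE_LEGACY_SKETCH = 0, ROUTE_OTHER_SKETCH = 1 };

/* Parses the ENV value: only the exact string "1" enables the gate. */
static int tiny_prefetch_parse(const char *value) {
    return value != NULL && value[0] == '1' && value[1] == '\0';
}

/* Reads HAKMEM_TINY_PREFETCH once and caches the answer (default OFF). */
static int tiny_prefetch_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        cached = tiny_prefetch_parse(getenv("HAKMEM_TINY_PREFETCH"));
    }
    return cached;
}

/* Issues a read prefetch with high temporal locality only when the gate
 * is on and the route is LEGACY. Returns 1 if a hint was issued. */
static int tiny_maybe_prefetch(const void *cache, int enabled, int route_kind) {
    if (enabled && route_kind == ROUTE_LEGACY_SKETCH) {
        __builtin_prefetch(cache, 0 /* read */, 3 /* keep in all cache levels */);
        return 1;
    }
    return 0;
}
```

A caller on the hot path would pass `tiny_prefetch_enabled()` as the `enabled` argument; caching the `getenv()` result keeps the disabled case to a single predictable branch.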
---
### 2. Tiny Pool TLS Caching (Performance Critical)