Phase 74-1 (ENV-gated LOCALIZE):
- Result: +0.50% (NEUTRAL)
- Runtime branch overhead caused instructions/branches to increase
- Diagnosed: Branch tax dominates intended optimization

Phase 74-2 (compile-time LOCALIZE):
- Result: -0.87% (NEUTRAL, P1 frozen)
- Removed runtime branch → instructions -0.6%, branches -2.3% ✓
- But cache-misses +86% (register pressure/spill) → net loss
- Conclusion: LOCALIZE itself works, but is fragile to cache effects

Key finding:
- Dependency chain reduction (LOCALIZE) has low ROI due to cache-miss sensitivity
- P1 (LOCALIZE) frozen at default OFF
- Next: Phase 74-3 (P0: FASTAPI) - move branches outside hot loop

Files:
- core/hakmem_build_flags.h: HAKMEM_TINY_UC_LOCALIZE_COMPILED flag
- core/box/tiny_unified_cache_hitpath_env_box.h: ENV gate (frozen)
- core/front/tiny_unified_cache.h: compile-time #if blocks
- docs/analysis/PHASE74_*: Design, instructions, results
- CURRENT_TASK.md: P1 frozen, P0 next instructions

Also includes:
- Phase 69 refill tuning results (archived docs)
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 69 baseline update
- PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md: Route banner docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 70: Refill Tuning Prerequisites (Observability SSOT)
Status: ✅ COMPLETE (SSOT established)
Context:
- Current baseline (Mixed WS=400 + prefault) generates almost zero cache misses (refills) in Unified Cache.
- Optimizing unified_cache_refill() or Warm Pool logic will yield zero throughput gain if this path is not hot.
- Phase 70 must be gated on observing significant refill activity.
- Current conclusion (WS=400 SSOT): Refill/WarmPool-pop is not hot (misses are extremely low), so refill micro-optimizations are frozen for SSOT workloads; only research workloads should touch refill behavior.
1. Observability Protocol (Step 0)
Before implementing any refill/WarmPool changes, execute this sequence:
- Route Banner (optional but recommended):
  HAKMEM_ROUTE_BANNER=1 ./bench_random_mixed_hakmem_observe ...
  - Prints the route assignments (backend route kind) and cache config (unified_cache_enabled / warm_pool_max_per_class) exactly once.
  - Prevents misreadings such as "Route=LEGACY means Unified Cache is unused" (even on the LEGACY route, Unified Cache is used at the alloc/free front).
- Build with Stats:
  make bench_random_mixed_hakmem_observe EXTRA_CFLAGS='-DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1'
  (Note: Phase 70-0 fixed release-mode stats blocking)
- Run with Stats:
  HAKMEM_ROUTE_BANNER=1 HAKMEM_WARM_POOL_STATS=1 ./bench_random_mixed_hakmem_observe 20000000 400 1
- Check Output:
  - Look for Unified-STATS: miss=...
  - Look for WarmPool-STATS: hits=...
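The "Check Output" step can be automated with a small helper that pulls a counter out of a captured stats line. This is a minimal sketch: the function name `stats_find_counter` is hypothetical, and the `key=value` layout of the stats lines is an assumption, since this document only shows them as `miss=...` and `hits=...`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds "<key>=<unsigned integer>" in `line` and stores
 * the value in *out. Returns 1 on success, 0 if the key is absent or is not
 * followed by '=' and at least one digit.
 * ASSUMPTION: stats lines use a key=value layout. */
int stats_find_counter(const char *line, const char *key,
                       unsigned long long *out) {
    size_t klen = strlen(key);
    const char *p = line;

    if (klen == 0) {
        return 0; /* an empty key can never match a counter */
    }
    while ((p = strstr(p, key)) != NULL) {
        /* Reject matches inside a longer identifier, e.g. "total_miss". */
        int at_token_start =
            (p == line) ||
            !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        if (at_token_start && p[klen] == '=' &&
            isdigit((unsigned char)p[klen + 1])) {
            *out = strtoull(p + klen + 1, NULL, 10);
            return 1;
        }
        p += 1; /* keep scanning after this partial match */
    }
    return 0;
}
```

Feeding it the line that contains `Unified-STATS:` with the key `miss` yields the count to compare against the decision gate.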
Decision Gate:
- If miss counts are < 1000 (approx <0.01% miss rate):
  - STOP: Optimization has no ROI on this workload.
  - ACTION: Either accept current state (refill is not a bottleneck) or switch to a research workload (below).
- If miss counts are significant:
  - GO: Proceed with Phase 70 logic changes.
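The gate reduces to a threshold check plus a miss-rate calculation. The sketch below uses hypothetical names (`refill_gate`, `miss_rate_percent`). It treats any count of 1000 or more as "significant", which is an assumption: the document fixes only the STOP side of the threshold.

```c
#include <stdint.h>

typedef enum { GATE_STOP = 0, GATE_GO = 1 } refill_gate_t;

/* STOP threshold taken from the decision gate (< 1000 misses). */
#define REFILL_GATE_MIN_MISSES 1000ULL

/* Miss rate as a percentage of total operations; 0 when ops is 0. */
double miss_rate_percent(uint64_t misses, uint64_t ops) {
    if (ops == 0) {
        return 0.0;
    }
    return 100.0 * (double)misses / (double)ops;
}

/* ASSUMPTION: anything at or above the STOP threshold counts as GO. */
refill_gate_t refill_gate(uint64_t misses) {
    return (misses < REFILL_GATE_MIN_MISSES) ? GATE_STOP : GATE_GO;
}
```

For the 20,000,000-iteration run used above, 1000 misses correspond to a rate of 0.005%, which is consistent with the "approx <0.01%" bound.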
2. Research Workload (If Refill Optimization is Mandatory)
If you must measure refill performance improvements (e.g., for architectural validation), modify the workload to force cache pressure:
Option A: Disable Prefault (Cold Path Stress)
HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
- Pros: Forces refills during benchmark loop.
- Cons: Measures startup behavior, not steady-state throughput.
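Switches such as HAKMEM_BENCH_PREFAULT are plain environment variables. The sketch below shows one common way to read such a 0/1 flag. Both function names are hypothetical, and the parsing rules (empty means "use the fallback", "0" means off, anything else means on) are assumptions; this document does not describe how HAKMEM parses the variable or what its default is.

```c
#include <stdlib.h>

/* ASSUMED rules: NULL or "" -> fallback, exactly "0" -> off, else on. */
int flag_from_string(const char *value, int fallback) {
    if (value == NULL || value[0] == '\0') {
        return fallback;
    }
    return !(value[0] == '0' && value[1] == '\0');
}

/* Reads the flag from the process environment via getenv(). */
int flag_from_env(const char *name, int fallback) {
    return flag_from_string(getenv(name), fallback);
}
```

Reading the flag once at startup and caching the result keeps the lookup out of the benchmark loop.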
Option B: Increase Working Set (Steady State Stress)
./bench_random_mixed_hakmem_observe 20000000 8192 1 # WS=8192 > Cache(2048)
- Pros: Forces steady-state evictions/refills.
- Cons: Different workload profile than standard Mixed SSOT (WS=400).
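Why a working set larger than the cache forces steady-state refills can be seen in a toy model. The sketch below is not the benchmark's access pattern (which is random and mixed). It assumes a simplified cycle that allocates `ws` blocks and then frees them all, in front of a bounded LIFO cache. The function name and the model itself are illustrative assumptions.

```c
#include <stdint.h>

/* Toy model: counts cache misses (backend refills) over `cycles` cycles of
 * "allocate ws blocks, then free ws blocks". Misses in the first cycle are
 * treated as warm-up and not counted. */
uint64_t toy_steady_state_misses(uint32_t ws, uint32_t capacity,
                                 uint32_t cycles) {
    uint32_t cached = 0; /* free blocks currently held by the cache */
    uint64_t misses = 0;

    for (uint32_t c = 0; c < cycles; c++) {
        for (uint32_t i = 0; i < ws; i++) {
            if (cached > 0) {
                cached--;          /* hit: served from the cache */
            } else if (c > 0) {
                misses++;          /* miss after warm-up: needs a refill */
            }
        }
        for (uint32_t i = 0; i < ws; i++) {
            if (cached < capacity) {
                cached++;          /* freed block fits in the cache */
            }
            /* else: overflow, the block goes back to the backend */
        }
    }
    return misses;
}
```

In this model WS=400 with a capacity of 2048 produces no misses after warm-up, while WS=8192 produces 8192 - 2048 = 6144 misses per cycle.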
WARNING: Do NOT change HAKMEM_TINY_STATIC_ROUTE or ULTRA flags unless specifically testing routing changes. The default LEGACY route does use Unified Cache for alloc/free.
NOTE (Warm Pool sizing semantics):
- HAKMEM_WARM_POOL_SIZE primarily controls the registry-scan prefill cap (warm_pool_max_per_class()).
- The steady-state push-back cap inside unified_cache_refill() is TinyClassPolicy.warm_cap (typically 4/8), so HAKMEM_WARM_POOL_SIZE only matters when registry scans/refills happen often enough to benefit from prefilled slabs.
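The two caps in the note above can be illustrated with a toy pool. Everything here (`toy_warm_pool_t`, `toy_prefill`, `toy_push_back`) is hypothetical and models only the sizing semantics described in the note, not the real Warm Pool data structure.

```c
#include <assert.h>

typedef struct {
    int count; /* slabs currently held in the toy warm pool */
} toy_warm_pool_t;

/* Registry-scan prefill: take up to `available` slabs, bounded by
 * prefill_cap (the role the note assigns to HAKMEM_WARM_POOL_SIZE). */
int toy_prefill(toy_warm_pool_t *pool, int available, int prefill_cap) {
    int added = 0;
    assert(available >= 0 && prefill_cap >= 0);
    while (added < available && pool->count < prefill_cap) {
        pool->count++;
        added++;
    }
    return added;
}

/* Steady-state push-back: accept one slab only while below warm_cap
 * (the role the note assigns to TinyClassPolicy.warm_cap). */
int toy_push_back(toy_warm_pool_t *pool, int warm_cap) {
    if (pool->count >= warm_cap) {
        return 0; /* rejected: pool already at the push-back cap */
    }
    pool->count++;
    return 1;
}
```

With `warm_cap` at 4, repeated push-backs never grow the pool past 4 slabs, regardless of how large the prefill cap is; only a prefill pass can populate it beyond that.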
3. Reference: Why "LEGACY" Route is OK
- LEGACY route means "not ULTRA/MID/V7 specialized".
- Alloc path: malloc_tiny_fast → tiny_hot_alloc_fast → Unified Cache (TLS array).
- Free path: free_tiny_fast → tiny_hot_free_fast → Unified Cache (TLS array).
- Refill path: Cache Miss → unified_cache_refill → Warm Pool → Registry.
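The shape of these three paths can be sketched with a small array-backed cache. This is an illustrative model only: the `toy_*` names, the capacity of 8, the refill batch size, and the static arena standing in for Warm Pool and Registry are all assumptions. The cache is passed by pointer here for testability, where the real one is described as a TLS array.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_CAP    8
#define TOY_ARENA_BLOCKS 64

typedef struct {
    void    *slots[TOY_CACHE_CAP]; /* LIFO stack of free blocks */
    uint32_t count;
    uint64_t hits;
    uint64_t misses;
} toy_cache_t;

/* Stand-in backend for the refill path (hypothetical). */
static unsigned char toy_arena[TOY_ARENA_BLOCKS][16];
static uint32_t toy_arena_next = 0;

/* Refill path: move up to `want` blocks from the backend into the cache. */
uint32_t toy_refill(toy_cache_t *c, uint32_t want) {
    uint32_t got = 0;
    while (got < want && c->count < TOY_CACHE_CAP &&
           toy_arena_next < TOY_ARENA_BLOCKS) {
        c->slots[c->count++] = toy_arena[toy_arena_next++];
        got++;
    }
    return got;
}

/* Alloc path: pop from the cache; on a miss, refill and then pop. */
void *toy_alloc(toy_cache_t *c) {
    if (c->count > 0) {
        c->hits++;
        return c->slots[--c->count];
    }
    c->misses++;
    if (toy_refill(c, TOY_CACHE_CAP / 2) == 0) {
        return NULL; /* backend exhausted */
    }
    return c->slots[--c->count];
}

/* Free path: push into the cache; returns 0 when the cache is full. */
int toy_free(toy_cache_t *c, void *p) {
    if (c->count == TOY_CACHE_CAP) {
        return 0; /* a real allocator would take a slower path here */
    }
    c->slots[c->count++] = p;
    return 1;
}
```

When frees keep the cache populated, `toy_alloc` never reaches `toy_refill`. That matches the WS=400 observation that the refill path is effectively cold.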
Previous confusion ("LEGACY unused") was due to:
- Stats counting was gated by #if !HAKMEM_BUILD_RELEASE.
- Low miss rate in WS=400 made it look unused.
- Phase 70-0 fixed the stats visibility.