Phase 74-1 (ENV-gated LOCALIZE):
- Result: +0.50% (NEUTRAL)
- Runtime branch overhead caused instructions/branches to increase
- Diagnosed: Branch tax dominates intended optimization

Phase 74-2 (compile-time LOCALIZE):
- Result: -0.87% (NEUTRAL, P1 frozen)
- Removed runtime branch → instructions -0.6%, branches -2.3% ✓
- But cache-misses +86% (register pressure/spill) → net loss
- Conclusion: LOCALIZE itself works, but is fragile to cache effects

Key finding:
- Dependency chain reduction (LOCALIZE) has low ROI due to cache-miss sensitivity
- P1 (LOCALIZE) frozen at default OFF
- Next: Phase 74-3 (P0: FASTAPI) - move branches outside hot loop

Files:
- core/hakmem_build_flags.h: HAKMEM_TINY_UC_LOCALIZE_COMPILED flag
- core/box/tiny_unified_cache_hitpath_env_box.h: ENV gate (frozen)
- core/front/tiny_unified_cache.h: compile-time `#if` blocks
- docs/analysis/PHASE74_*: Design, instructions, results
- CURRENT_TASK.md: P1 frozen, P0 next instructions

Also includes:
- Phase 69 refill tuning results (archived docs)
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 69 baseline update
- PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md: Route banner docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Phase 70: Refill Tuning Prerequisites (Observability SSOT)

**Status**: ✅ COMPLETE (SSOT established)

**Context**:
- Current baseline (Mixed WS=400 + prefault) generates almost **zero cache misses** (refills) in Unified Cache.
- Optimizing `unified_cache_refill()` or Warm Pool logic will yield **zero throughput gain** if this path is not hot.
- Phase 70 must be gated on *observing significant refill activity*.
- Current conclusion (WS=400 SSOT): Refill/WarmPool-pop is **not hot** (misses are extremely low), so refill micro-optimizations are **frozen** for SSOT workloads; only research workloads should touch refill behavior.

## 1. Observability Protocol (Step 0)

Before implementing any refill/WarmPool changes, execute this sequence:

0. **Route Banner (optional but recommended)**:
   ```bash
   HAKMEM_ROUTE_BANNER=1 ./bench_random_mixed_hakmem_observe ...
   ```
   - Prints the route assignments (backend route kind) and the cache config (`unified_cache_enabled` / `warm_pool_max_per_class`) exactly once.
   - Prevents misreadings such as "Route=LEGACY means Unified Cache is unused" (even on the LEGACY route, the Unified Cache is used at the alloc/free front).

1. **Build with Stats**:
   ```bash
   make bench_random_mixed_hakmem_observe EXTRA_CFLAGS='-DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1'
   ```
   *(Note: Phase 70-0 fixed release-mode stats blocking)*

2. **Run with Stats**:
   ```bash
   HAKMEM_ROUTE_BANNER=1 HAKMEM_WARM_POOL_STATS=1 ./bench_random_mixed_hakmem_observe 20000000 400 1
   ```

3. **Check Output**:
   - Look for `Unified-STATS`: `miss=...`
   - Look for `WarmPool-STATS`: `hits=...`

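The check in step 3 can be scripted. The following is a minimal sketch, not part of the project's tooling: it assumes that stats lines contain the literal tags `Unified-STATS` / `WarmPool-STATS` and carry counters as `key=<integer>` tokens, as the step above suggests. The exact log layout is an assumption, and `sum_stat` is a hypothetical helper name.

```bash
# sum_stat TAG KEY
# Reads a log on stdin and sums every KEY=<integer> token found on lines
# that contain TAG. Prints 0 when nothing matches.
# ASSUMPTION: stats lines look roughly like "[Unified-STATS] ... miss=123 ...".
sum_stat() {
    awk -v tag="$1" -v key="$2" '
        index($0, tag) {
            # Split on anything that cannot be part of a key=value token.
            n = split($0, tok, /[^A-Za-z0-9_=]+/)
            for (i = 1; i <= n; i++) {
                if (tok[i] ~ ("^" key "=[0-9]+$")) {
                    sub(/^[^=]*=/, "", tok[i])   # keep only the number
                    total += tok[i]
                }
            }
        }
        END { print total + 0 }
    '
}
```

If the benchmark output is captured to a file (for example by appending `2>&1 | tee observe.log` to the run command, since it is not stated here whether stats go to stdout or stderr), then `sum_stat Unified-STATS miss < observe.log` would print the total miss count and `sum_stat WarmPool-STATS hits < observe.log` the Warm Pool hits.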
**Decision Gate**:
- If `miss` counts are < 1000 (approx <0.01% miss rate):
  - **STOP**: Optimization has no ROI on this workload.
  - **ACTION**: Either accept current state (refill is not a bottleneck) or switch to a research workload (below).
- If `miss` counts are significant:
  - **GO**: Proceed with Phase 70 logic changes.

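The gate can be expressed as a small helper. This is a sketch under stated assumptions: the 1000-miss threshold comes from the gate above, but the text only says "significant" for the GO case, so treating every count at or above 1000 as GO is a simplification. `refill_gate` is a hypothetical name, and using the benchmark iteration count as the denominator for the miss rate is also an assumption.

```bash
# refill_gate MISSES TOTAL_OPS
# Prints "STOP" or "GO" together with the miss count and the miss rate.
# ASSUMPTION: TOTAL_OPS is the benchmark iteration count (e.g. 20000000).
refill_gate() {
    awk -v miss="$1" -v ops="$2" 'BEGIN {
        rate = (ops > 0) ? 100.0 * miss / ops : 0     # miss rate in percent
        verdict = (miss < 1000) ? "STOP" : "GO"       # threshold from the gate
        printf "%s miss=%d rate=%.4f%%\n", verdict, miss, rate
    }'
}
```

For the 20,000,000-iteration run used in this document, 1000 misses correspond to a 0.005% miss rate, which is in line with the "approx <0.01%" figure quoted in the gate.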
## 2. Research Workload (If Refill Optimization is Mandatory)

If you must measure refill performance improvements (e.g., for architectural validation), modify the workload to force cache pressure:

**Option A: Disable Prefault (Cold Path Stress)**
```bash
HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
```
- Pros: Forces refills during benchmark loop.
- Cons: Measures startup behavior, not steady-state throughput.

**Option B: Increase Working Set (Steady State Stress)**
```bash
./bench_random_mixed_hakmem_observe 20000000 8192 1  # WS=8192 > Cache(2048)
```
- Pros: Forces steady-state evictions/refills.
- Cons: Different workload profile than standard Mixed SSOT (WS=400).

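Option B relies on the working set exceeding the cache capacity. A minimal sanity check before launching a research run might look like the sketch below; `ws_pressure` is a hypothetical helper, and the capacity value (2048 in the comment above) must be supplied by the caller, since how that figure maps onto per-class cache sizes is not specified here.

```bash
# ws_pressure WS CACHE_SLOTS
# Reports whether the working set is larger than the cache capacity,
# i.e. whether steady-state evictions/refills are expected at all.
ws_pressure() {
    if [ "$1" -gt "$2" ]; then
        echo "pressure: WS=$1 > cache=$2"
    else
        echo "no-pressure: WS=$1 <= cache=$2"
    fi
}
```

With the numbers from this document, `ws_pressure 8192 2048` reports pressure, while the standard SSOT setting `ws_pressure 400 2048` does not, which matches the observation that WS=400 produces almost no refills.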
**WARNING**: Do NOT change `HAKMEM_TINY_STATIC_ROUTE` or `ULTRA` flags unless specifically testing routing changes. The default `LEGACY` route *does* use Unified Cache for alloc/free.

**NOTE (Warm Pool sizing semantics)**:
- `HAKMEM_WARM_POOL_SIZE` primarily controls the **registry-scan prefill cap** (`warm_pool_max_per_class()`).
- The steady-state push-back cap inside `unified_cache_refill()` is `TinyClassPolicy.warm_cap` (typically 4/8), so `HAKMEM_WARM_POOL_SIZE` only matters when registry scans/refills happen often enough to benefit from prefilled slabs.

## 3. Reference: Why "LEGACY" Route is OK

- **LEGACY** route means "not ULTRA/MID/V7 specialized".
- Alloc path: `malloc_tiny_fast` → `tiny_hot_alloc_fast` → **Unified Cache (TLS array)**.
- Free path: `free_tiny_fast` → `tiny_hot_free_fast` → **Unified Cache (TLS array)**.
- Refill path: Cache Miss → `unified_cache_refill` → **Warm Pool** → Registry.

Previous confusion ("LEGACY unused") was due to:
1. Stats counting was gated by `#if !HAKMEM_BUILD_RELEASE`, so release builds reported no stats.
2. Low miss rate in WS=400 made it look unused.

Phase 70-0 resolved the first cause by fixing stats visibility in release mode.