hakmem/docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md
Moe Charm (CI) e9b97e9d8e Phase 74-1/74-2: UnifiedCache LOCALIZE optimization (P1 frozen, NEUTRAL -0.87%)
Phase 74-1 (ENV-gated LOCALIZE):
- Result: +0.50% (NEUTRAL)
- Runtime branch overhead caused instructions/branches to increase
- Diagnosed: Branch tax dominates intended optimization

Phase 74-2 (compile-time LOCALIZE):
- Result: -0.87% (NEUTRAL, P1 frozen)
- Removed runtime branch → instructions -0.6%, branches -2.3% ✓
- But cache-misses +86% (register pressure/spill) → net loss
- Conclusion: LOCALIZE itself works, but is fragile to cache effects

Key finding:
- Dependency chain reduction (LOCALIZE) has low ROI due to cache-miss sensitivity
- P1 (LOCALIZE) frozen at default OFF
- Next: Phase 74-3 (P0: FASTAPI) - move branches outside hot loop

Files:
- core/hakmem_build_flags.h: HAKMEM_TINY_UC_LOCALIZE_COMPILED flag
- core/box/tiny_unified_cache_hitpath_env_box.h: ENV gate (frozen)
- core/front/tiny_unified_cache.h: compile-time #if blocks
- docs/analysis/PHASE74_*: Design, instructions, results
- CURRENT_TASK.md: P1 frozen, P0 next instructions

Also includes:
- Phase 69 refill tuning results (archived docs)
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 69 baseline update
- PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md: Route banner docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 07:47:44 +09:00

# Phase 70: Refill Tuning Prerequisites (Observability SSOT)
**Status**: ✅ COMPLETE (SSOT established)
**Context**:
- Current baseline (Mixed WS=400 + prefault) generates almost **zero cache misses** (refills) in Unified Cache.
- Optimizing `unified_cache_refill()` or Warm Pool logic will yield **zero throughput gain** if this path is not hot.
- Phase 70 must be gated on *observing significant refill activity*.
- Current conclusion (WS=400 SSOT): Refill/WarmPool-pop is **not hot** (misses are extremely low), so refill micro-optimizations are **frozen** for SSOT workloads; only research workloads should touch refill behavior.
## 1. Observability Protocol (Step 0)
Before implementing any refill/WarmPool changes, execute this sequence:
0. **Route Banner (optional but recommended)**:
```bash
HAKMEM_ROUTE_BANNER=1 ./bench_random_mixed_hakmem_observe ...
```
- Prints the route assignments (backend route kind) and the cache config (`unified_cache_enabled` / `warm_pool_max_per_class`) exactly once.
- Prevents the misreading that "Route=LEGACY = Unified Cache unused" (even on LEGACY, Unified Cache is used at the alloc/free front).
1. **Build with Stats**:
```bash
make bench_random_mixed_hakmem_observe EXTRA_CFLAGS='-DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1'
```
*(Note: Phase 70-0 fixed release-mode stats blocking)*
2. **Run with Stats**:
```bash
HAKMEM_ROUTE_BANNER=1 HAKMEM_WARM_POOL_STATS=1 ./bench_random_mixed_hakmem_observe 20000000 400 1
```
3. **Check Output**:
- Look for `Unified-STATS`: `miss=...`
- Look for `WarmPool-STATS`: `hits=...`
**Decision Gate**:
- If `miss` counts are < 1000 (approx <0.01% miss rate):
  - **STOP**: Optimization has no ROI on this workload.
  - **ACTION**: Either accept current state (refill is not a bottleneck) or switch to a research workload (below).
- If `miss` counts are significant:
  - **GO**: Proceed with Phase 70 logic changes.
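The gate above can be automated with a small filter over the benchmark output. The following is a minimal sketch, not part of the repository: `refill_gate` is a hypothetical helper, and it assumes each stats line contains the literal tag `Unified-STATS` plus one or more `miss=<integer>` fields. The real line layout may differ, so adjust the patterns to match what the binary actually prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read benchmark output on stdin and print STOP or GO.
# Assumption: stats lines contain "Unified-STATS" and "miss=<integer>" fields.
refill_gate() {
  local threshold="${1:-1000}"   # miss-count threshold from the Decision Gate
  local total
  # Sum every miss=<n> field found on Unified-STATS lines (0 if none are found).
  total=$(grep 'Unified-STATS' \
    | grep -o 'miss=[0-9][0-9]*' \
    | cut -d= -f2 \
    | awk '{ s += $1 } END { print s + 0 }')
  if [ "$total" -lt "$threshold" ]; then
    echo "STOP miss=$total"
  else
    echo "GO miss=$total"
  fi
}
```

A possible invocation is to pipe the Step 2 run into it, for example `... ./bench_random_mixed_hakmem_observe 20000000 400 1 2>&1 | refill_gate 1000`. The `2>&1` is there on the assumption that the stats may be written to stderr.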
## 2. Research Workload (If Refill Optimization is Mandatory)
If you must measure refill performance improvements (e.g., for architectural validation), modify the workload to force cache pressure:
**Option A: Disable Prefault (Cold Path Stress)**
```bash
HAKMEM_BENCH_PREFAULT=0 ./bench_random_mixed_hakmem_observe ...
```
- Pros: Forces refills during benchmark loop.
- Cons: Measures startup behavior, not steady-state throughput.
**Option B: Increase Working Set (Steady State Stress)**
```bash
./bench_random_mixed_hakmem_observe 20000000 8192 1 # WS=8192 > Cache(2048)
```
- Pros: Forces steady-state evictions/refills.
- Cons: Different workload profile than standard Mixed SSOT (WS=400).
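To see where refills start to appear under Option B, it can help to run the same command across several working-set sizes. The sketch below only prints the commands (a dry run), so nothing is executed by accident. `ws_sweep_cmds` is a hypothetical helper, and it assumes the positional arguments are iteration count, working-set size, and a third value (kept at `1`, as in the examples above).

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print one benchmark command per working-set size.
# Assumption: argv is <iterations> <working-set> <third arg, fixed at 1 here>.
ws_sweep_cmds() {
  local iters="$1"; shift
  local ws
  for ws in "$@"; do
    echo "HAKMEM_WARM_POOL_STATS=1 ./bench_random_mixed_hakmem_observe $iters $ws 1"
  done
}

# Example: the SSOT size, the cache size quoted above, and a size above it.
ws_sweep_cmds 20000000 400 2048 8192
```

Review the printed lines, then pipe them to `bash` to run the sweep and compare the `miss=...` counts per working-set size.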
**WARNING**: Do NOT change `HAKMEM_TINY_STATIC_ROUTE` or `ULTRA` flags unless specifically testing routing changes. The default `LEGACY` route *does* use Unified Cache for alloc/free.
**NOTE (Warm Pool sizing semantics)**:
- `HAKMEM_WARM_POOL_SIZE` primarily controls the **registry-scan prefill cap** (`warm_pool_max_per_class()`).
- The steady-state push-back cap inside `unified_cache_refill()` is `TinyClassPolicy.warm_cap` (typically 4/8), so
`HAKMEM_WARM_POOL_SIZE` only matters when registry scans/refills happen often enough to benefit from prefilled slabs.
## 3. Reference: Why "LEGACY" Route is OK
- **LEGACY** route means "not ULTRA/MID/V7 specialized".
- Alloc path: `malloc_tiny_fast` → `tiny_hot_alloc_fast` → **Unified Cache (TLS array)**.
- Free path: `free_tiny_fast` → `tiny_hot_free_fast` → **Unified Cache (TLS array)**.
- Refill path: Cache Miss → `unified_cache_refill` → **Warm Pool** → Registry.
Previous confusion ("LEGACY unused") was due to:
1. Stats counting was gated by `#if !HAKMEM_BUILD_RELEASE`, so release builds showed no counts.
2. Low miss rate in WS=400 made it look unused.

Phase 70-0 fixed the stats visibility.