diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index d8138b7d..5bad6989 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -197,6 +197,57 @@
 - The actual memory stall occurs on the slots[] array access (after the prefetch)
 - Improvement idea: move the prefetch earlier (before route_kind is decided), or change the code shape
 
+#### Phase 3 C2: Slab Metadata Cache Optimization 🔬 NEUTRAL / FREEZE
+
+**Design memo**: `docs/analysis/PHASE3_C2_METADATA_CACHE_1_DESIGN.md`
+
+**Goal**: Improve cache locality of metadata accesses (policy snapshot, slab descriptor) on the free path
+
+**3 patches implemented** ✅:
+
+1. **Policy Hot Cache** (Patch 1):
+   - TinyPolicyHot struct: caches route_kind[8] in TLS (9 bytes packed)
+   - Reduces policy_snapshot() calls (saves ~2 memory ops)
+   - Safety: automatically disabled while learner v7 is active
+   - Files: `core/box/tiny_metadata_cache_env_box.h`, `tiny_metadata_cache_hot_box.{h,c}`
+   - Integration: `core/front/malloc_tiny_fast.h` (line 256) route selection
+
+2. **First Page Inline Cache** (Patch 2):
+   - TinyFirstPageCache struct: caches the current slab page pointer in TLS, per class
+   - Avoids the superslab metadata lookup (1-2 memory ops)
+   - Fast-path check in `tiny_legacy_fallback_free_base()`
+   - Files: `core/front/tiny_first_page_cache.h`, `tiny_unified_cache.c`
+   - Integration: `core/box/tiny_legacy_fallback_box.h` (lines 27-36)
+
+3. **Bounds Check Compile-out** (Patch 3):
+   - Turns the unified_cache capacity into a macro constant (hardcoded to 2048)
+   - Lets the modulo operation be reduced at compile time (`& MASK`)
+   - Macros: `TINY_UNIFIED_CACHE_CAPACITY_POW2=11`, `CAPACITY=2048`, `MASK=2047`
+   - File: `core/front/tiny_unified_cache.h` (lines 35-41)
+
+**A/B test results** 🔬 NEUTRAL:
+- Mixed (10-run):
+  - Baseline (C2=0): 40,433,519 ops/s (avg), 40,722,094 ops/s (median)
+  - Optimized (C2=1): 40,252,836 ops/s (avg), 40,291,762 ops/s (median)
+  - **Average gain: -0.45%**, **Median gain: -1.06%**
+- **Decision: NEUTRAL** (average gain within the ±1.0% threshold)
+- Action: Keep as research box (ENV gate OFF by default)
+
+**Rationale**:
+- Policy hot cache: the interlock with the learner is costly (checked on every probe)
+- First page cache: the current free path only does a unified_cache push (no superslab lookup)
+  - It needs to be integrated into the drain path to have any effect (future optimization)
+- Bounds check: the compiler already optimizes this (power-of-2 detection)
+
+**Current Cumulative Gain** (Phase 2-3):
+- B3 (Routing shape): +2.89%
+- B4 (Wrapper split): +1.47%
+- C3 (Static routing): +2.20%
+- C2 (Metadata cache): -0.45%
+- **Total: ~6.1%** (sum of the gains above; baseline 37.5M → ~39.8M ops/s)
+
+**Commit**: `deecda733`
+
 **Priority C2** - Slab metadata cache optimization:
 - Profile cache-miss hotspots (policy struct, slab metadata)
 - Hot/cold split of metadata
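
Note on Patch 3: the sketch below illustrates why a power-of-2 capacity lets an index wrap be written as `& MASK` instead of `%`, and why the rationale reports no gain once the capacity is a compile-time constant. It is a minimal illustration, not the code in `core/front/tiny_unified_cache.h`. Only `TINY_UNIFIED_CACHE_CAPACITY_POW2` and the values 11 / 2048 / 2047 come from the notes above; the full names `TINY_UNIFIED_CACHE_CAPACITY` and `TINY_UNIFIED_CACHE_MASK`, and both helper functions, are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Value quoted in the patch notes. */
#define TINY_UNIFIED_CACHE_CAPACITY_POW2 11
/* Assumed full macro names; the notes only list them as CAPACITY and MASK. */
#define TINY_UNIFIED_CACHE_CAPACITY (1u << TINY_UNIFIED_CACHE_CAPACITY_POW2) /* 2048 */
#define TINY_UNIFIED_CACHE_MASK     (TINY_UNIFIED_CACHE_CAPACITY - 1u)       /* 2047 */

/* Hypothetical helper: advance a ring index using the mask form. */
static inline uint32_t ring_next_mask(uint32_t idx) {
    return (idx + 1u) & TINY_UNIFIED_CACHE_MASK;
}

/* Hypothetical helper: the same step written with modulo. For an unsigned
 * operand and a constant power-of-2 divisor, the two forms always agree,
 * and compilers typically emit the same AND instruction for both. */
static inline uint32_t ring_next_mod(uint32_t idx) {
    return (idx + 1u) % TINY_UNIFIED_CACHE_CAPACITY;
}
```

Both helpers wrap index 2047 back to 0. Because the two forms are equivalent for unsigned indices, hardcoding the mask mainly helps when the capacity would otherwise be a runtime value, which is consistent with the neutral result recorded for this patch.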