2025-12-13 21:44:00 +09:00

# Phase 3 D1: Free Path Route Cache Design Memo

## Goal

Reduce the cost of `tiny_route_for_class()` on the free path (4.39% self + 24.78% children).

## Observations

- Call chain: `free()` → `tiny_free_fast()` → `tiny_route_for_class()` → `g_tiny_route_snapshot_done` check
- Route determination is the dominant bottleneck on the free path
- Apply the same approach as Phase 3 C3 (static routing for alloc) to free

## Implementation Approach

### L0: Env (reversible)

- `HAKMEM_FREE_STATIC_ROUTE=0/1` (default: 0, OFF)

### L1: IntegrationBox (boundary: one site)

Reuse the existing `g_tiny_route` from `tiny_route_env_box.h` on the free path:
- Determine the route directly, without calling `tiny_route_for_class()`
- Cache invalidation: on a policy version change at sync

### Implementation Instructions

**File 1**: `core/box/tiny_free_route_cache_env_box.h` (new)
- Inline function: `tiny_free_static_route_enabled()`
- Check the `HAKMEM_FREE_STATIC_ROUTE` environment variable
- Lazy init with a -1 sentinel
- Return the cached value

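
The File 1 bullets can be sketched roughly as follows. This is a minimal sketch, not the project's actual header: only the function name, the environment variable, and the -1 sentinel come from the notes above. The variable name `g_free_static_route_enabled` and the rule that any non-empty value not starting with `0` counts as ON are assumptions made for illustration.

```c
#include <stdlib.h>

/* Cached flag (hypothetical name): -1 = not read yet, 0 = OFF, 1 = ON. */
static int g_free_static_route_enabled = -1;

/* Reads HAKMEM_FREE_STATIC_ROUTE once, then returns the cached value. */
static inline int tiny_free_static_route_enabled(void) {
    if (g_free_static_route_enabled < 0) {
        const char *e = getenv("HAKMEM_FREE_STATIC_ROUTE");
        /* Assumed parsing rule: unset, empty, or leading '0' means OFF. */
        g_free_static_route_enabled =
            (e != NULL && *e != '\0' && *e != '0') ? 1 : 0;
    }
    return g_free_static_route_enabled;
}
```

Because the value is cached after the first call, changing the environment variable later in the same process has no effect in this sketch. The plain `int` is also not synchronized; whether that is acceptable depends on how the real code initializes it.
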
**File 2**: Modify `core/box/tiny_route_env_box.h` (existing)
- Add `SmallRouteKind tiny_route_get_kind(int class_idx)` if it does not already exist
- Use the existing `g_tiny_route.route_kind[class]` cache

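
A possible shape for the File 2 accessor is shown below. The real definitions of `SmallRouteKind` and `g_tiny_route` live in `tiny_route_env_box.h` and are not reproduced in this memo, so the enumerators, the `TINY_NUM_CLASSES` constant, the struct layout, and the out-of-range fallback are all hypothetical stand-ins.

```c
#include <stdint.h>

#define TINY_NUM_CLASSES 8 /* hypothetical class count */

/* Placeholder enumerators; the real SmallRouteKind values are not shown here. */
typedef enum {
    SMALL_ROUTE_LEGACY = 0,
    SMALL_ROUTE_OTHER = 1
} SmallRouteKind;

/* Assumed layout: one cached route kind per size class. */
typedef struct {
    uint8_t route_kind[TINY_NUM_CLASSES];
} TinyRouteTable;

static TinyRouteTable g_tiny_route;

/* Table lookup only: no snapshot check and no call into tiny_route_for_class(). */
static inline SmallRouteKind tiny_route_get_kind(int class_idx) {
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
        return SMALL_ROUTE_LEGACY; /* assumed safe fallback for bad indices */
    }
    return (SmallRouteKind)g_tiny_route.route_kind[class_idx];
}
```

The bounds check is a defensive choice for the sketch; a hot-path version might drop it if the caller already guarantees a valid class index.
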
**File 3**: Modify `core/front/tiny_legacy_fallback_box.h` (existing)
- In the `tiny_legacy_fallback_free_base()` function
- Check `if (tiny_free_static_route_enabled())` before calling `tiny_route_for_class()`
- Fallback: call `tiny_route_for_class()` if disabled

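
The branch described for File 3 might look like the sketch below. The signature of `tiny_legacy_fallback_free_base()` is not given in this memo, so the logic is isolated in a hypothetical helper, `tiny_free_pick_route()`. The three functions it calls are replaced by trivial stubs so the example compiles on its own; the counter exists only to show when the slow path runs.

```c
typedef int SmallRouteKind; /* stand-in for the real type */

#define NCLASS 8                      /* hypothetical class count */
static SmallRouteKind g_route_kind[NCLASS];
static int g_static_route_on;         /* stub for the cached env flag */
static int g_slow_calls;              /* counts slow-path lookups */

/* Stubs standing in for the real functions named in the notes above. */
static int tiny_free_static_route_enabled(void) { return g_static_route_on; }
static SmallRouteKind tiny_route_get_kind(int c) { return g_route_kind[c]; }
static SmallRouteKind tiny_route_for_class(int c) {
    g_slow_calls++;
    return g_route_kind[c];
}

/* Hypothetical helper: pick the route for a free of the given class. */
static SmallRouteKind tiny_free_pick_route(int class_idx) {
    if (tiny_free_static_route_enabled()) {
        return tiny_route_get_kind(class_idx);  /* cached table lookup */
    }
    return tiny_route_for_class(class_idx);     /* original path when disabled */
}
```

Keeping the original call as the else-branch is what makes the change reversible through the environment variable alone.
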
## A/B Test

- Mixed (10-run): `HAKMEM_FREE_STATIC_ROUTE=0` vs `=1`
- GO: +1.0% or better; NO-GO: -1.0% or worse

## Expected Outcome

- Fewer `tiny_route_for_class()` calls → lower L1 cache pressure
- +1-2% gain on the free path

Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
  - Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
  - Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
  - Mean gain: +2.19%, Median gain: +2.37%
  - Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
  - Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset
- D2 (Wrapper env cache): FROZEN
  - Previous result: -1.44% regression (TLS overhead > benefit)
  - Status: Research box (do not pursue further)
  - Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)
- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)
Cumulative Gains (Phase 2-3):
B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%)
MID_V3 fix: +13% (structural change, Mixed OFF by default)
Documentation Updates:
- PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
- PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
- PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
- ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
- CURRENT_TASK.md: Phase 3 complete summary
Next:
- D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
- Or Phase 4 planning if no more D3-class targets
- Current active optimizations: B3, B4, C3, D1, MID_V3 fix
Files Changed:
- docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
- docs/analysis/*.md (6 files updated with D1/D2 results)
- CURRENT_TASK.md (Phase 3 status update)
- analyze_d1_results.py (statistical analysis script)
- core/bench_profile.h (D1 promoted to default in MIXED preset)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:42:22 +09:00

---

## Results (A/B)

### Initial 10-run Test

**Verdict**: ✅ GO (pending additional confirmation)

- Mixed 10-run:
  - Baseline (`HAKMEM_FREE_STATIC_ROUTE=0`): avg **45.13M** / median **45.76M**
  - Optimized (`HAKMEM_FREE_STATIC_ROUTE=1`): avg **45.61M** / median **45.40M**
  - Delta: avg **+1.06%** / median **-0.77%**

### 20-run Validation (2025-12-13)

**Verdict**: ✅ ADOPT - PROMOTED TO DEFAULT

- Mixed 20-run (iter=20M, ws=400, 1T):
  - Baseline (`HAKMEM_FREE_STATIC_ROUTE=0`):
    - Mean: **46.30M ops/s**
    - Median: **46.30M ops/s**
    - StdDev: **0.10M ops/s**
  - Optimized (`HAKMEM_FREE_STATIC_ROUTE=1`):
    - Mean: **47.32M ops/s**
    - Median: **47.39M ops/s**
    - StdDev: **0.11M ops/s**
  - Gain:
    - Mean: **+2.19%** ✓ (>= +1.0% threshold)
    - Median: **+2.37%** ✓ (>= +0.0% threshold)

**Decision Criteria Met**:
- Mean gain >= +1.0%: YES (+2.19%)
- Median gain >= +0.0%: YES (+2.37%)
- Both criteria satisfied → **PROMOTE TO DEFAULT**

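
The two-threshold rule above can be written as a small check. This is an illustrative sketch only: the helper names `gain_pct` and `d1_promote` are hypothetical and are not taken from `analyze_d1_results.py`. Gains recomputed from the rounded throughput figures in this memo can differ in the last digit from the reported percentages, which were presumably derived from unrounded data.

```c
/* Relative gain of `optimized` over `baseline`, in percent. */
static double gain_pct(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

/* 20-run promotion rule: mean gain >= +1.0% AND median gain >= +0.0%. */
static int d1_promote(double mean_gain_pct, double median_gain_pct) {
    return mean_gain_pct >= 1.0 && median_gain_pct >= 0.0;
}
```

For example, `d1_promote(gain_pct(46.30, 47.32), gain_pct(46.30, 47.39))` passes both thresholds. Applying the same rule to the initial 10-run figures would not, because the median delta there is negative.
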
**Rollout**:
- ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default
- `bench_setenv_default("HAKMEM_FREE_STATIC_ROUTE", "1");` added to `core/bench_profile.h`
- Effective: Phase 3 finalization (2025-12-13)