Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
  - Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
  - Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
  - Mean gain: +2.19%, Median gain: +2.37%
  - Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
  - Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset
- D2 (Wrapper env cache): FROZEN
  - Previous result: -1.44% regression (TLS overhead > benefit)
  - Status: Research box (do not pursue further)
  - Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)
- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)

Cumulative Gains (Phase 2-3):
- B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
- Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%)
- MID_V3 fix: +13% (structural change, Mixed OFF by default)

Documentation Updates:
- PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
- PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
- PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
- ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
- CURRENT_TASK.md: Phase 3 complete summary

Next:
- D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
- Or Phase 4 planning if no more D3-class targets
- Current active optimizations: B3, B4, C3, D1, MID_V3 fix

Files Changed:
- docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
- docs/analysis/*.md (6 files updated with D1/D2 results)
- CURRENT_TASK.md (Phase 3 status update)
- analyze_d1_results.py (statistical analysis script)
- core/bench_profile.h (D1 promoted to default in MIXED preset)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Phase 3 D1: Free Path Route Cache Design Memo
Purpose
Reduce the cost of tiny_route_for_class() on the free path (4.39% self + 24.78% children).
Observations
- free() → tiny_free_fast() → tiny_route_for_class() → g_tiny_route_snapshot_done check
- Route determination is the dominant bottleneck on the free path
- Apply the same approach as Phase 3 C3 (static routing for alloc) to free
Implementation approach
L0: Env (revertible)
HAKMEM_FREE_STATIC_ROUTE=0/1 (default: 0, OFF)
L1: IntegrationBox (boundary: 1 site)
Use the g_tiny_route that already exists in tiny_route_env_box.h on the free path:
- Determine the route directly, without calling tiny_route_for_class()
- Cache invalidate: policy version change on sync
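The invalidation rule above (rebuild the route table when the policy version changes at sync time) could look roughly like the following. This is a minimal sketch, not the project's code: every `demo_` name, the class count, the two route kinds, and the "min class" policy knob are hypothetical stand-ins for whatever g_tiny_route and the real policy actually hold.

```c
#include <stdint.h>

#define DEMO_NUM_CLASSES 8  /* hypothetical number of size classes */

typedef enum { DEMO_ROUTE_LEGACY = 0, DEMO_ROUTE_V3 = 1 } demo_route_kind_t;

/* Hypothetical policy state: classes >= min_class use V3; version is bumped on change. */
static uint32_t g_demo_policy_version = 1;
static int g_demo_policy_v3_min_class = DEMO_NUM_CLASSES; /* initially no class uses V3 */

/* Per-class route table plus the policy version it was built from. */
typedef struct {
    uint32_t built_from_version;                  /* 0 means "never built" */
    demo_route_kind_t route_kind[DEMO_NUM_CLASSES];
} demo_route_snapshot_t;

static demo_route_snapshot_t g_demo_route;        /* zero-initialized, so stale at start */

/* Change the policy and bump the version so the next sync rebuilds the table. */
static void demo_policy_set_v3_min_class(int min_class) {
    g_demo_policy_v3_min_class = min_class;
    g_demo_policy_version++;
}

/* Sync point: rebuild only when the table is older than the current policy. */
static void demo_route_sync(void) {
    if (g_demo_route.built_from_version == g_demo_policy_version) {
        return; /* table is current, nothing to do */
    }
    for (int c = 0; c < DEMO_NUM_CLASSES; c++) {
        g_demo_route.route_kind[c] =
            (c >= g_demo_policy_v3_min_class) ? DEMO_ROUTE_V3 : DEMO_ROUTE_LEGACY;
    }
    g_demo_route.built_from_version = g_demo_policy_version;
}
```

In this sketch the free path only ever reads `g_demo_route.route_kind[class_idx]`; a policy change becomes visible after the next `demo_route_sync()` call, which keeps the version comparison off the hot path.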
Implementation instructions
File 1: core/box/tiny_free_route_cache_env_box.h (new)
- Inline function: tiny_free_static_route_enabled()
  - Check HAKMEM_FREE_STATIC_ROUTE ENV
  - Lazy init with -1 sentinel
  - Return cached value
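The three steps for File 1 (check the ENV, lazy init with a -1 sentinel, return the cached value) can be sketched as below. The function name and variable name come from this memo; the parsing rule (unset, empty, or a value starting with '0' means OFF) and the name of the cache variable are assumptions, and the real header may differ.

```c
#include <stdlib.h>

/* Cached gate state: -1 = not read yet, 0 = disabled, 1 = enabled. */
static int g_free_static_route_enabled = -1;

static inline int tiny_free_static_route_enabled(void) {
    if (g_free_static_route_enabled == -1) {
        /* First call only: read the environment once and remember the answer. */
        const char *e = getenv("HAKMEM_FREE_STATIC_ROUTE");
        /* Assumed parsing rule: unset, empty, or leading '0' means OFF. */
        g_free_static_route_enabled = (e && *e && *e != '0') ? 1 : 0;
    }
    return g_free_static_route_enabled;
}
```

After the first call the gate costs one load and one compare. The sketch uses a plain int without atomics; concurrent first calls would each compute the same value, but a production version may want an explicit atomic or a one-time init.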
File 2: Modify core/box/tiny_route_env_box.h (existing)
- Add: SmallRouteKind tiny_route_get_kind(int class_idx) if it does not exist
- Use existing g_tiny_route.route_kind[class] cache
File 3: Modify core/front/tiny_legacy_fallback_box.h (existing)
- In tiny_legacy_fallback_free_base() function
- Check: if (tiny_free_static_route_enabled()) before calling tiny_route_for_class()
- Fallback: call tiny_route_for_class() if disabled
A/B test
- Mixed (10-run): HAKMEM_FREE_STATIC_ROUTE=0 vs =1
- GO: +1.0% or better; NO-GO: -1.0% or worse
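The thresholds above can be written as a small classifier. This is only an illustration: the memo defines GO and NO-GO, while treating everything in between as a "neutral" outcome is an assumption made here, and the type and function names are hypothetical.

```c
/* Three-way outcome; AB_NEUTRAL for the band between the thresholds is an assumption. */
typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } ab_verdict_t;

/* Classify a throughput delta given in percent (e.g. 1.06 for +1.06%). */
static ab_verdict_t ab_verdict(double delta_pct) {
    if (delta_pct >= 1.0) {
        return AB_GO;      /* +1.0% or better */
    }
    if (delta_pct <= -1.0) {
        return AB_NO_GO;   /* -1.0% or worse */
    }
    return AB_NEUTRAL;     /* inside the +/-1.0% band */
}
```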
Expected
- Fewer tiny_route_for_class() calls → lower L1 cache pressure
- +1-2% gain in free path
Results (A/B)
Initial 10-run Test
Verdict: ✅ GO (pending additional confirmation)
- Mixed 10-run:
  - Baseline (HAKMEM_FREE_STATIC_ROUTE=0): avg 45.13M / median 45.76M
  - Optimized (HAKMEM_FREE_STATIC_ROUTE=1): avg 45.61M / median 45.40M
  - Delta: avg +1.06% / median -0.77%
20-run Validation (2025-12-13)
Verdict: ✅ ADOPT - PROMOTED TO DEFAULT
- Mixed 20-run (iter=20M, ws=400, 1T):
  - Baseline (HAKMEM_FREE_STATIC_ROUTE=0):
    - Mean: 46.30M ops/s
    - Median: 46.30M ops/s
    - StdDev: 0.10M ops/s
  - Optimized (HAKMEM_FREE_STATIC_ROUTE=1):
    - Mean: 47.32M ops/s
    - Median: 47.39M ops/s
    - StdDev: 0.11M ops/s
  - Gain:
    - Mean: +2.19% ✓ (>= +1.0% threshold)
    - Median: +2.37% ✓ (>= +0.0% threshold)
Decision Criteria Met:
- Mean gain >= +1.0%: YES (+2.19%)
- Median gain >= +0.0%: YES (+2.37%)
- Both criteria satisfied → PROMOTE TO DEFAULT
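The two-criteria check can be reproduced from the reported throughput figures with a few lines of C. This is a sketch with hypothetical function names, not the logic of analyze_d1_results.py. Because it starts from the rounded M ops/s values listed in this memo, the computed gains land close to, but not exactly on, the reported +2.19% and +2.37%.

```c
/* Relative gain of optimized over baseline, in percent. */
static double gain_pct(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

/* Promotion rule from this memo: mean gain >= +1.0% AND median gain >= +0.0%. */
static int d1_promote(double mean_gain_pct, double median_gain_pct) {
    return mean_gain_pct >= 1.0 && median_gain_pct >= 0.0;
}
```

For the 20-run data, `d1_promote(gain_pct(46.30, 47.32), gain_pct(46.30, 47.39))` evaluates to 1. Feeding in the initial 10-run figures (mean 45.13 → 45.61, median 45.76 → 45.40) gives 0, because the median moved in the negative direction.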
Operation:
- ✅ Promoted to MIXED_TINYV3_C7_SAFE preset default
- bench_setenv_default("HAKMEM_FREE_STATIC_ROUTE", "1"); added to core/bench_profile.h
- Effective: Phase 3 finalization (2025-12-13)
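A preset default of this kind is typically applied without overriding a value the user has already exported, which is what keeps the L0 "revertible" property: running with HAKMEM_FREE_STATIC_ROUTE=0 still turns D1 off. The sketch below illustrates that behavior with POSIX `setenv`. `demo_setenv_default()` is a hypothetical stand-in; whether the real bench_setenv_default() in core/bench_profile.h behaves this way is an assumption.

```c
#include <stdlib.h>

/* Hypothetical stand-in for bench_setenv_default(): set a variable only if unset. */
static void demo_setenv_default(const char *name, const char *value) {
    /* overwrite = 0: an existing value in the environment is left untouched. */
    setenv(name, value, 0);
}
```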