hakmem/docs/analysis/PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md
Moe Charm (CI) 50bded8c85 Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
  - Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
  - Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
  - Mean gain: +2.19%, Median gain: +2.37%
  - Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
  - Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset

- D2 (Wrapper env cache): FROZEN
  - Previous result: -1.44% regression (TLS overhead > benefit)
  - Status: Research box (do not pursue further)
  - Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)

- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)

Cumulative Gains (Phase 2-3):
  B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
  Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%)
  MID_V3 fix: +13% (structural change, Mixed OFF by default)

Documentation Updates:
  - PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
  - PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
  - PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
  - PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
  - ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
  - PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
  - CURRENT_TASK.md: Phase 3 complete summary

Next:
  - D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
  - Or Phase 4 planning if no more D3-class targets
  - Current active optimizations: B3, B4, C3, D1, MID_V3 fix

Files Changed:
  - docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
  - docs/analysis/*.md (6 files updated with D1/D2 results)
  - CURRENT_TASK.md (Phase 3 status update)
  - analyze_d1_results.py (statistical analysis script)
  - core/bench_profile.h (D1 promoted to default in MIXED preset)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:42:22 +09:00


Phase 3 D1: Free Path Route Cache Design Memo

Purpose

Reduce the cost of tiny_route_for_class() on the free path (4.39% self + 24.78% children).

Observations

  • free() → tiny_free_fast() → tiny_route_for_class() → g_tiny_route_snapshot_done check
  • Route determination is the dominant bottleneck on the free path
  • Apply the same approach as Phase 3 C3 (static routing for alloc) to free

Implementation Approach

L0: Env (reversible)

  • HAKMEM_FREE_STATIC_ROUTE=0/1 (default: 0, OFF)

L1: Integration (Box boundary: 1 site)

Reuse the existing g_tiny_route from tiny_route_env_box.h on the free path:

  • tiny_route_for_class() を呼ばずに直接 route を決定
  • Cache invalidate: policy version change on sync

Implementation Instructions

File 1: core/box/tiny_free_route_cache_env_box.h (new)

  • Inline function: tiny_free_static_route_enabled()
    • Check HAKMEM_FREE_STATIC_ROUTE ENV
    • Lazy init with -1 sentinel
    • Return cached value
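The lazy-init gate described for File 1 can be sketched as follows. This is a minimal illustration, not the actual header: the parsing rule (unset, empty, or a value starting with '0' means OFF), the helper tiny_free_env_flag_parse(), and the variable g_tiny_free_static_route are assumptions made for the example. The sketch is also not thread-safe; real code may need an atomic or an init-once barrier.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: interpret an ENV string as a 0/1 flag.
 * Assumed rule: NULL, "" or a leading '0' means OFF; anything else means ON. */
static int tiny_free_env_flag_parse(const char *s) {
    return (s != NULL && s[0] != '\0' && s[0] != '0') ? 1 : 0;
}

/* Cached gate state: -1 = not read yet (sentinel), 0 = OFF, 1 = ON. */
static int g_tiny_free_static_route = -1;

/* Reads HAKMEM_FREE_STATIC_ROUTE once, then returns the cached value. */
static inline int tiny_free_static_route_enabled(void) {
    if (g_tiny_free_static_route < 0) {
        g_tiny_free_static_route =
            tiny_free_env_flag_parse(getenv("HAKMEM_FREE_STATIC_ROUTE"));
    }
    assert(g_tiny_free_static_route == 0 || g_tiny_free_static_route == 1);
    return g_tiny_free_static_route;
}
```

Because the value is cached after the first call, changing the environment variable later in the process has no effect; only the first read pays for getenv().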

File 2: Modify core/box/tiny_route_env_box.h (existing)

  • Add: SmallRouteKind tiny_route_get_kind(int class_idx) if not exist
  • Use existing g_tiny_route.route_kind[class] cache

File 3: Modify core/front/tiny_legacy_fallback_box.h (existing)

  • In tiny_legacy_fallback_free_base() function
  • Check: if (tiny_free_static_route_enabled()) before calling tiny_route_for_class()
  • Fallback: call tiny_route_for_class() if disabled
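The File 2 and File 3 changes together amount to a gated table lookup with a fallback. The sketch below shows that shape only. The enum values, TINY_NUM_CLASSES, the table struct layout, the stand-in body of tiny_route_for_class(), and the wrapper tiny_free_pick_route() are all hypothetical; the gate is passed in as a parameter here so the example stays self-contained, whereas the memo calls tiny_free_static_route_enabled() at that point.

```c
#include <assert.h>

#define TINY_NUM_CLASSES 8 /* hypothetical class count */

/* Hypothetical route kinds; the real SmallRouteKind values may differ. */
typedef enum { SMALL_ROUTE_LEGACY = 0, SMALL_ROUTE_V3 = 1 } SmallRouteKind;

/* Assumed layout of the existing per-class route cache. */
typedef struct {
    SmallRouteKind route_kind[TINY_NUM_CLASSES];
} TinyRouteTable;
static TinyRouteTable g_tiny_route;

/* Counts slow-path calls so the effect of the gate is observable. */
static int g_slow_route_calls;

/* Stand-in for the existing slow path (snapshot check and so on). */
static SmallRouteKind tiny_route_for_class(int class_idx) {
    g_slow_route_calls++;
    return g_tiny_route.route_kind[class_idx];
}

/* File 2: direct read of the cached route for a class. */
static inline SmallRouteKind tiny_route_get_kind(int class_idx) {
    assert(class_idx >= 0 && class_idx < TINY_NUM_CLASSES);
    return g_tiny_route.route_kind[class_idx];
}

/* File 3: use the static route when enabled, else fall back. */
static inline SmallRouteKind tiny_free_pick_route(int class_idx,
                                                  int static_route_enabled) {
    if (static_route_enabled) {
        return tiny_route_get_kind(class_idx);
    }
    return tiny_route_for_class(class_idx);
}
```

With the gate ON the slow-path counter stays untouched, which is the call reduction the memo is after; with the gate OFF behavior is identical to the existing path.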

A/B Test

  • Mixed (10-run): HAKMEM_FREE_STATIC_ROUTE=0 vs =1
  • GO: +1.0% or better; NO-GO: -1.0% or worse

Expected Effect

  • Fewer tiny_route_for_class() calls → lower L1 cache pressure
  • +1-2% gain in free path

Results (A/B)

Initial 10-run Test

Verdict: GO (pending additional confirmation)

  • Mixed 10-run:
    • Baseline (HAKMEM_FREE_STATIC_ROUTE=0): avg 45.13M / median 45.76M
    • Optimized (HAKMEM_FREE_STATIC_ROUTE=1): avg 45.61M / median 45.40M
    • Delta: avg +1.06% / median -0.77%

20-run Validation (2025-12-13)

Verdict: ADOPT - PROMOTED TO DEFAULT

  • Mixed 20-run (iter=20M, ws=400, 1T):
    • Baseline (HAKMEM_FREE_STATIC_ROUTE=0):
      • Mean: 46.30M ops/s
      • Median: 46.30M ops/s
      • StdDev: 0.10M ops/s
    • Optimized (HAKMEM_FREE_STATIC_ROUTE=1):
      • Mean: 47.32M ops/s
      • Median: 47.39M ops/s
      • StdDev: 0.11M ops/s
    • Gain:
      • Mean: +2.19% ✓ (>= +1.0% threshold)
      • Median: +2.37% ✓ (>= +0.0% threshold)

Decision Criteria Met:

  • Mean gain >= +1.0%: YES (+2.19%)
  • Median gain >= +0.0%: YES (+2.37%)
  • Both criteria satisfied → PROMOTE TO DEFAULT
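The two-part criterion above (mean gain >= +1.0% and median gain >= +0.0%) can be written down as a small check. This is an illustrative sketch in C, not the repository's analyze_d1_results.py; all function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n samples (n must be > 0). */
static double mean_of(const double *v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median of n samples; sorts the array in place. */
static double median_of(double *v, size_t n) {
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* Relative gain of optimized over baseline, in percent. */
static double gain_pct(double baseline, double optimized) {
    return (optimized / baseline - 1.0) * 100.0;
}

/* GO only if both thresholds from the memo are met. */
static int d1_decision_go(double mean_gain_pct, double median_gain_pct) {
    return mean_gain_pct >= 1.0 && median_gain_pct >= 0.0;
}
```

Feeding in the 20-run gains (+2.19%, +2.37%) satisfies both thresholds, while the initial 10-run deltas (+1.06%, -0.77%) would fail the median test, which is consistent with that run having been held for additional confirmation.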

Operation:

  • Promoted to MIXED_TINYV3_C7_SAFE preset default
  • bench_setenv_default("HAKMEM_FREE_STATIC_ROUTE", "1"); added to core/bench_profile.h
  • Effective: Phase 3 finalization (2025-12-13)