|
|
141cd8a5be
|
Phase 3 Closure & Phase 4 Preparation
Summary:
- Phase 3 optimization complete (cumulative +8.93%)
- D1 promoted to default (HAKMEM_FREE_STATIC_ROUTE=1, +2.19%)
- D2 frozen (NO-GO, -1.44% regression)
- Phase 4 instructions prepared (D3/Alloc Gate Specialization)
Results:
B3 (Routing shape): +2.89%
B4 (Wrapper split): +1.47%
C3 (Static routing): +2.20%
C1 (TLS prefetch): NEUTRAL (-0.34%, research box)
C2 (Metadata cache): NEUTRAL (-0.45%, research box)
D1 (Free route cache): +2.19% (now default)
D2 (Wrapper env cache): NO-GO (-1.44%, frozen)
MID_V3 fix: +13% (structural)
Total Phase 2-3 gain: ~8.93% (37.5M → 51M ops/s)
Updated:
- CURRENT_TASK.md: Phase 3 final results + D3 conditions
- ENV_PROFILE_PRESETS.md: Active optimizations listed
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: Phase 3→4 transition
- PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md: D3 execution plan
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-validation status
Next phase: Phase 4 D3 - Alloc Gate Specialization
- Requires: tiny_alloc_gate_fast self% ≥5% from perf
- Design SSOT: PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
- Execution: PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2025-12-13 23:47:19 +09:00 |
|
|
|
50bded8c85
|
Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
- Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
- Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
- Mean gain: +2.19%, Median gain: +2.37%
- Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
- Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset
- D2 (Wrapper env cache): FROZEN
- Previous result: -1.44% regression (TLS overhead > benefit)
- Status: Research box (do not pursue further)
- Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)
- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)
Cumulative Gains (Phase 2-3):
B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%)
MID_V3 fix: +13% (structural change, Mixed OFF by default)
Documentation Updates:
- PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
- PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
- PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
- ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
- CURRENT_TASK.md: Phase 3 complete summary
Next:
- D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
- Or Phase 4 planning if no more D3-class targets
- Current active optimizations: B3, B4, C3, D1, MID_V3 fix
Files Changed:
- docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
- docs/analysis/*.md (6 files updated with D1/D2 results)
- CURRENT_TASK.md (Phase 3 status update)
- analyze_d1_results.py (statistical analysis script)
- core/bench_profile.h (D1 promoted to default in MIXED preset)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2025-12-13 22:42:22 +09:00 |
|
|
|
19056282b6
|
Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO]
Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path
- Strategy: Cache wrapper env configuration pointer in TLS
- Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*)
Implementation:
- core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE)
- core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast)
- core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths
- ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF)
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median)
- Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median)
- Improvement: avg -1.44%, median -1.05% (DECISION: NO-GO)
Analysis:
- Regression cause: TLS cache adds overhead (branch + TLS access)
- wrapper_env_cfg() is already minimal (pointer return after simple check)
- Adding TLS caching layer makes it worse, not better
- Branch prediction penalty outweighs any potential savings
Cumulative Phase 2-3:
- B3: +2.89%, B4: +1.47%, C3: +2.20%
- D1: +1.06% (opt-in), D2: -1.44% (NO-GO)
- Total: ~7.2% (excluding D2)
Decision: FREEZE as research box (default OFF, regression confirmed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|
2025-12-13 22:03:27 +09:00 |
|