Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
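
A minimal sketch of the consolidation idea, not the actual contents of `free_wrapper_env_snapshot_box.{h,c}`: two separately read thread-local flags are folded into one thread-local snapshot struct that is filled lazily and fetched once per `free()`. The struct, the function names, and the two `HAKMEM_EXAMPLE_GATE_*` variables are hypothetical placeholders, and the probe window is not modeled here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread snapshot: one TLS object instead of two flags. */
typedef struct {
    uint8_t inited;  /* 0 until the first lookup on this thread */
    uint8_t gate_a;  /* hypothetical wrapper gate #1 */
    uint8_t gate_b;  /* hypothetical wrapper gate #2 */
} free_wrapper_env_snapshot_t;

static _Thread_local free_wrapper_env_snapshot_t t_free_env;

/* Returns 1 when the variable is set and starts with '1', else 0. */
static int env_is_one(const char *name) {
    const char *v = getenv(name);
    return v != NULL && v[0] == '1';
}

/* Slow path: runs once per thread and caches the ENV-derived flags. */
static void free_env_snapshot_init(free_wrapper_env_snapshot_t *s) {
    s->gate_a = (uint8_t)env_is_one("HAKMEM_EXAMPLE_GATE_A"); /* placeholder name */
    s->gate_b = (uint8_t)env_is_one("HAKMEM_EXAMPLE_GATE_B"); /* placeholder name */
    s->inited = 1;
}

/* Fast path: a single TLS access yields every flag the wrapper needs. */
static inline const free_wrapper_env_snapshot_t *free_env_snapshot_get(void) {
    free_wrapper_env_snapshot_t *s = &t_free_env;
    if (__builtin_expect(!s->inited, 0)) {
        free_env_snapshot_init(s);
    }
    return s;
}
```

A wrapper would call `free_env_snapshot_get()` once and test `gate_a` and `gate_b` on the returned pointer, so the second flag costs a plain load rather than another TLS lookup.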
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
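
The A/B arithmetic above can be re-checked with a few helper functions. This is only a sketch: the function names are hypothetical, the inputs are the rounded throughput figures quoted in this message (so the last digit can differ from the reported values), and combining E1 and E4-1 multiplicatively is an assumption. Simple addition gives +7.43% and compounding gives about +7.57%, both consistent with "~+7.5%".

```c
/* Relative change of `opt` over `base`, in percent. */
static double improvement_pct(double base, double opt) {
    return (opt - base) / base * 100.0;
}

/* GO when the mean gain reaches the threshold (+1.0% in this message). */
static int is_go(double mean_gain_pct, double threshold_pct) {
    return mean_gain_pct >= threshold_pct;
}

/* Compound two gains given in percent (assumes they are independent). */
static double compound_pct(double a_pct, double b_pct) {
    return ((1.0 + a_pct / 100.0) * (1.0 + b_pct / 100.0) - 1.0) * 100.0;
}
```

For example, `improvement_pct(45.35, 46.94)` is roughly 3.51, which passes `is_go(..., 1.0)`.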
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning; TLS consolidation is the safer optimization
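
The hint mismatch described above can be sketched as follows. This is an illustration under assumptions, not the E3-4 code: the macro names, `g_ctor_mode`, and the two path functions are hypothetical. `__builtin_expect` is a GCC/Clang builtin that only informs the compiler which outcome to expect; it never changes the result, but compilers typically lay out the unexpected path away from the fall-through, so a hint that is wrong on every call can cost throughput.

```c
#define HAK_LIKELY(x)   __builtin_expect(!!(x), 1)
#define HAK_UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical: set to 1 by a constructor, so the checks below are always true. */
static int g_ctor_mode = 1;

static int snapshot_fast_path(void) { return 1; }  /* stands in for the hot path */
static int snapshot_slow_path(void) { return 0; }  /* stands in for the lazy-init path */

/* Mismatched: the always-taken branch is marked as unlikely. */
static int snapshot_get_mismatched(void) {
    if (HAK_UNLIKELY(g_ctor_mode == 1)) {
        return snapshot_fast_path();
    }
    return snapshot_slow_path();
}

/* Matched: the hint agrees with the runtime behaviour. */
static int snapshot_get_matched(void) {
    if (HAK_LIKELY(g_ctor_mode == 1)) {
        return snapshot_fast_path();
    }
    return snapshot_slow_path();
}
```

Both functions return the same value; only the generated code layout can differ, which is why such a mistake shows up in A/B throughput rather than in correctness tests.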
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
67 lines
2.0 KiB
Markdown
# Phase 4 Status - Executive Summary

**Date**: 2025-12-14
**Status**: E1 ✅ GO (promoted to preset), E2 🔬 FROZEN, E3-4 ❌ NO-GO
**Baseline**: Mixed 20M/ws=400 (assumes E1=1)

---
## Quick Status

### E2 Decision: FREEZE ✅ (NEUTRAL)

**Result**: -0.21% mean, -0.62% median (NEUTRAL)

**Why Freeze?**
- Alloc route optimization saturated by Phase 3 C3 (static routing)
- Free DUALHOT worked (+13%) because it skipped expensive ops
- Alloc DUALHOT doesn't work (-0.21%) because route already cached
- **Lesson**: Per-class specialization only helps when bypassing uncached overhead
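
To illustrate the lesson, here is a hedged sketch with hypothetical names throughout (the class count, route kinds, table, and functions are not taken from the hakmem sources). Once routes sit in a static table, the baseline lookup is a single array load, so a per-class special case in front of it adds a compare and branch while skipping almost nothing.

```c
#include <stdint.h>

enum { HAK_NUM_CLASSES = 8 };                 /* hypothetical class count */
enum { ROUTE_FAST = 0, ROUTE_GENERAL = 1 };   /* hypothetical route kinds */

/* Hypothetical static route table, filled once at init. */
static uint8_t g_route_by_class[HAK_NUM_CLASSES];

static void route_table_init(void) {
    for (unsigned c = 0; c < HAK_NUM_CLASSES; c++) {
        g_route_by_class[c] = (c <= 1u) ? ROUTE_FAST : ROUTE_GENERAL;
    }
}

/* Cached routing: one array load, no per-call decision logic. */
static inline unsigned route_cached(unsigned cls) {
    return g_route_by_class[cls];
}

/* Per-class special case on top of the cache: an extra compare and branch,
 * and the only work it skips is the single load above. */
static inline unsigned route_dualhot(unsigned cls) {
    if (cls <= 1u) {
        return ROUTE_FAST;
    }
    return g_route_by_class[cls];
}
```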

**Action**: Keep as research box (default OFF), no further investigation

---
## E1/E3-4 Results (Mixed A/B)

### E1: ENV Snapshot Consolidation ✅ GO (opt-in)

**Result**: +3.92% avg, +4.01% median
**ENV**: `HAKMEM_ENV_SNAPSHOT=1` (default in `MIXED_TINYV3_C7_SAFE`; opt-out possible)
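
One way a preset can make a variable default-on while still allowing opt-out is to set it only when the user has not set it already. The sketch below assumes POSIX `setenv` with `overwrite = 0`; the helper names are hypothetical, and the real `bench_profile` code may use a different mechanism such as `putenv`.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Set `name` to `value` only if it is not already present in the environment.
 * With overwrite = 0, an explicit user setting (e.g. "0") is left untouched. */
static int preset_default(const char *name, const char *value) {
    return setenv(name, value, 0);
}

/* Hypothetical preset hook: turns the snapshot on unless the user opted out. */
static void apply_mixed_preset_sketch(void) {
    preset_default("HAKMEM_ENV_SNAPSHOT", "1");
}
```

Running with `HAKMEM_ENV_SNAPSHOT=0` in the environment would then keep the snapshot disabled even when the preset is applied.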

### E3-4: ENV Constructor Init ❌ NO-GO (FROZEN)

**Result (re-validation)**: -1.44% mean, -1.03% median (assumes E1=1)
**ENV**: `HAKMEM_ENV_SNAPSHOT=1 HAKMEM_ENV_SNAPSHOT_CTOR=1` (default OFF / frozen)

---
## Phase 4 Cumulative Status

**Active**:
- E1 (ENV Snapshot): +3.92% ✅ GO (opt-in)

**Frozen**:
- D3 (Alloc Gate Shape): +0.56% ⚪
- E2 (Alloc Per-Class FastPath): -0.21% ⚪
- E3-4 (ENV CTOR): ❌ NO-GO

## Next Actions

1. Keep E3-4 frozen (default OFF).
2. With E1 mainlined, re-run perf and pick the next core target with self% ≥ 5%.
3. For the next box, prioritize real data structures and hot loops over TLS/branch work (alloc gate, unified_cache, pool, etc.).

---
## Key Lessons

1. **Route optimization saturated**: C3 already cached routes, so E2 gave no benefit
2. **Shape optimization plateaued**: D3 +0.56% is neutral; branch prediction is saturated
3. **ENV consolidation successful**: E1 +3.92%; the follow-up constructor init (E3-4) regressed and is frozen
4. **Different optimization vectors needed**: Move beyond route/shape to init/dispatch overhead

---

**Full Analysis**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md`