Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
@ -1,8 +1,59 @@
|
||||
# 本線タスク(現在)
|
||||
|
||||
## 更新メモ(2025-12-14 Phase 5 E4-1 Complete - Free Gate Optimization)
|
||||
|
||||
### Phase 5 E4-1: Free Wrapper ENV Snapshot ✅ GO (2025-12-14)
|
||||
|
||||
**Target**: Consolidate TLS reads in free() wrapper to reduce 25.26% self% hot spot
|
||||
- Strategy: Apply E1 success pattern (ENV snapshot consolidation), NOT E3-4 failure pattern
|
||||
- Implementation: Single TLS snapshot with packed flags (wrap_shape + front_gate + hotcold)
|
||||
- Reduce: 2 TLS reads → 1 TLS read, 4 branches → 3 branches
|
||||
|
||||
**Implementation**:
|
||||
- ENV gate: `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1` (default: 0, research box)
|
||||
- Files: `core/box/free_wrapper_env_snapshot_box.{h,c}` (new ENV snapshot box)
|
||||
- Integration: `core/box/hak_wrappers.inc.h` (lines 552-580, free() wrapper)
|
||||
|
||||
**A/B Test Results** (Mixed, 10-run, 20M iters, ws=400):
|
||||
- Baseline (SNAPSHOT=0): **45.35M ops/s** (mean), 45.31M ops/s (median), σ=0.34M
|
||||
- Optimized (SNAPSHOT=1): **46.94M ops/s** (mean), 47.15M ops/s (median), σ=0.94M
|
||||
- **Delta: +3.51% mean, +4.07% median** ✅
|
||||
|
||||
**Decision: GO** (+3.51% >= +1.0% threshold)
|
||||
- Exceeded conservative estimate (+1.5%) → Achieved +3.51%
|
||||
- Similar to E1 success (+3.92%) - ENV consolidation pattern works
|
||||
- Action: Promote to `MIXED_TINYV3_C7_SAFE` preset (HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1 default)
|
||||
|
||||
**Health Check**: ✅ PASS
|
||||
- MIXED_TINYV3_C7_SAFE: 42.5M ops/s
|
||||
- C6_HEAVY_LEGACY_POOLV1: 23.0M ops/s
|
||||
- All profiles passed, no regressions
|
||||
|
||||
**Perf Profile** (SNAPSHOT=1, 20M iters):
|
||||
- free(): 25.26% (unchanged in this sample)
|
||||
- NEW hot spot: hakmem_env_snapshot_enabled: 4.67% (ENV snapshot overhead visible)
|
||||
- Note: Small sample (65 samples) may not be fully representative
|
||||
- Overall throughput improved +3.51% despite ENV snapshot overhead cost
|
||||
|
||||
**Key Insight**: ENV consolidation continues to yield strong returns. Free path optimization via TLS reduction proves effective, matching E1's success pattern. The visible ENV snapshot overhead (4.67%) is outweighed by overall path efficiency gains.
|
||||
|
||||
**Cumulative Status (Phase 5)**:
|
||||
- E4-1 (Free Wrapper Snapshot): +3.51% (GO)
|
||||
- Total Phase 5: ~+3.5% (on top of Phase 4's +3.9%)
|
||||
|
||||
**Next Steps**:
|
||||
- ✅ Promoted: `MIXED_TINYV3_C7_SAFE` で `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1` を default 化(opt-out 可)
|
||||
- Next target: E4-2(malloc wrapper snapshot)か、perf で self% ≥ 5% の芯を選ぶ
|
||||
- Design doc: `docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md`
|
||||
- 指示書:
|
||||
- `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
|
||||
- `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
|
||||
|
||||
---
|
||||
|
||||
## 更新メモ(2025-12-14 Phase 4 E3-4 Complete - ENV Constructor Init)
|
||||
|
||||
### Phase 4 E3-4: ENV Constructor Init ✅ GO (+4.75%) (2025-12-14)
|
||||
### Phase 4 E3-4: ENV Constructor Init ❌ NO-GO / FROZEN (2025-12-14)
|
||||
|
||||
**Target**: E1 の lazy init check(3.22% self%)を constructor init で排除
|
||||
- E1 で ENV snapshot を統合したが、`hakmem_env_snapshot_enabled()` の lazy check が残っていた
|
||||
@ -13,23 +64,24 @@
|
||||
- `core/box/hakmem_env_snapshot_box.c`: Constructor function 追加
|
||||
- `core/box/hakmem_env_snapshot_box.h`: Dual-mode enabled check (constructor vs legacy)
|
||||
|
||||
**A/B Test Results** (Mixed, 10-run, 20M iters, HAKMEM_ENV_SNAPSHOT=1):
|
||||
- Baseline (CTOR=0): **44.28M ops/s** (mean), 44.60M ops/s (median), σ=1.0M
|
||||
- Optimized (CTOR=1): **46.38M ops/s** (mean), 46.53M ops/s (median), σ=0.5M
|
||||
- **Improvement: +4.75% mean, +4.35% median**
|
||||
**A/B Test Results(re-validation)** (Mixed, 10-run, 20M iters, ws=400, HAKMEM_ENV_SNAPSHOT=1):
|
||||
- Baseline (CTOR=0): **47.55M ops/s** (mean), 47.46M ops/s (median)
|
||||
- Optimized (CTOR=1): **46.86M ops/s** (mean), 46.97M ops/s (median)
|
||||
- **Delta: -1.44% mean, -1.03% median** ❌
|
||||
|
||||
**Decision: GO** (+4.75% >> +0.5% threshold)
|
||||
- 期待値 +0.5-1.5% を大幅に上回る +4.75% 達成
|
||||
- Action: Keep as research box for now (default OFF)
|
||||
**Decision: NO-GO / FROZEN**
|
||||
- 初回の +4.75% は再現しない(ノイズ/環境要因の可能性が高い)
|
||||
- constructor mode は “追加の分岐/ロード” になり、現状の hot path では得にならない
|
||||
- Action: default OFF のまま freeze(追わない)
|
||||
- Design doc: `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md`
|
||||
|
||||
**Key Insight**: Lazy init check overhead was larger than expected. Constructor pattern eliminates branch in hot path entirely, yielding substantial gain.
|
||||
**Key Insight**: “constructor で初期化” 自体は安全だが、性能面では現状 NO-GO。勝ち箱は E1 に集中する。
|
||||
|
||||
**Cumulative Status (Phase 4)**:
|
||||
- E1 (ENV Snapshot): +3.92% (GO)
|
||||
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
|
||||
- **E3-4 (Constructor Init): +4.75% (GO)**
|
||||
- **Total Phase 4: ~+8.5%**
|
||||
- E3-4 (Constructor Init): NO-GO / frozen
|
||||
- Total Phase 4: ~+3.9%(E1 のみ)
|
||||
|
||||
---
|
||||
|
||||
@ -63,13 +115,16 @@
|
||||
- Conclusion: Alloc route optimization has reached diminishing returns
|
||||
|
||||
**Cumulative Status**:
|
||||
- Phase 4 E1: +3.92% (GO, research box)
|
||||
- Phase 4 E1: +3.92% (GO)
|
||||
- Phase 4 E2: -0.21% (NEUTRAL, frozen)
|
||||
- Phase 4 E3-4: +4.75% (GO, research box; requires E1)
|
||||
- Phase 4 E3-4: NO-GO / frozen
|
||||
|
||||
### Next: Phase 4 E3-4(昇格判断)
|
||||
### Next: Phase 4(close & next target)
|
||||
|
||||
- 指示書: `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md`
|
||||
- 勝ち箱: E1 を `MIXED_TINYV3_C7_SAFE` プリセットへ昇格(opt-out 可)
|
||||
- 研究箱: E3-4/E2 は freeze(default OFF)
|
||||
- 次の芯は perf で “self% ≥ 5%” の箱から選ぶ
|
||||
- 次の指示書: `docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md`
|
||||
|
||||
---
|
||||
|
||||
|
||||
Reference in New Issue
Block a user