Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)

Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path

Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates 2 TLS reads → 1 TLS read (50% reduction)
  - Reduces 4 branches → 3 branches (25% reduction)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median

Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%

E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
  - Branch prediction hint mismatch (UNLIKELY with always-true)
  - Retest confirmed -1.78% regression
  - Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
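
The hint mismatch described above can be sketched as follows. This is a minimal illustration, not the actual hakmem code: `ctor_mode`, `snapshot_ready`, `legacy_lazy_check()`, and both `enabled_*` functions are hypothetical stand-ins, and `__builtin_expect` assumes GCC or Clang.

```c
#include <assert.h>

/* Conventional wrappers around the GCC/Clang builtin. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical state: in constructor mode this is set once at startup. */
static int ctor_mode = 1;
static int snapshot_ready = 1;

/* Hypothetical stand-in for the legacy lazy-init path. */
static int legacy_lazy_check(void) { return 1; }

/* Test helper; the mode is a plain 0/1 switch in this sketch. */
static void set_ctor_mode(int m) {
    assert(m == 0 || m == 1);
    ctor_mode = m;
}

/* Mismatched: the hint says "rarely true", yet with ctor_mode == 1 the
 * condition is always true, so the compiler may lay out the common case
 * as the cold path. */
static int enabled_mismatched(void) {
    if (UNLIKELY(ctor_mode == 1)) return snapshot_ready;
    return legacy_lazy_check();
}

/* Matched: the hint agrees with the value actually seen at runtime. */
static int enabled_matched(void) {
    if (LIKELY(ctor_mode == 1)) return snapshot_ready;
    return legacy_lazy_check();
}
```

Both functions return the same value; only the code layout the compiler is steered toward differs, which is why a mismatch of this kind shows up in throughput numbers rather than in correctness tests.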

Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-14 04:24:34 +09:00
parent 21e2e4ac2b
commit 4a070d8a14
17 changed files with 1184 additions and 114 deletions


@@ -1,8 +1,59 @@
# Mainline Tasks (Current)
## Update Memo (2025-12-14 Phase 5 E4-1 Complete - Free Gate Optimization)
### Phase 5 E4-1: Free Wrapper ENV Snapshot ✅ GO (2025-12-14)
**Target**: Consolidate TLS reads in free() wrapper to reduce 25.26% self% hot spot
- Strategy: Apply E1 success pattern (ENV snapshot consolidation), NOT E3-4 failure pattern
- Implementation: Single TLS snapshot with packed flags (wrap_shape + front_gate + hotcold)
- Reduce: 2 TLS reads → 1 TLS read, 4 branches → 3 branches
**Implementation**:
- ENV gate: `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1` (default: 0, research box)
- Files: `core/box/free_wrapper_env_snapshot_box.{h,c}` (new ENV snapshot box)
- Integration: `core/box/hak_wrappers.inc.h` (lines 552-580, free() wrapper)
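
The packed-flag snapshot described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `free_wrapper_env_snapshot_box.{h,c}`: the bit layout, the `DEMO_*` environment variable names, and every function name here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bit layout; the real box may pack its flags differently. */
enum {
    SNAP_READY      = 1u << 0, /* set once the snapshot is initialized */
    SNAP_WRAP_SHAPE = 1u << 1,
    SNAP_FRONT_GATE = 1u << 2,
    SNAP_HOTCOLD    = 1u << 3
};

/* One thread-local word holds every flag: a single TLS read on the hot path. */
static _Thread_local uint32_t g_free_snap;

/* Pack three booleans plus the READY bit into one word. */
static uint32_t free_snap_pack(int wrap_shape, int front_gate, int hotcold) {
    uint32_t s = SNAP_READY;
    if (wrap_shape) s |= SNAP_WRAP_SHAPE;
    if (front_gate) s |= SNAP_FRONT_GATE;
    if (hotcold)    s |= SNAP_HOTCOLD;
    return s;
}

/* Assumed gate parsing: enabled only when the value starts with '1'. */
static int env_flag(const char *name) {
    const char *v = getenv(name);
    return v != NULL && v[0] == '1';
}

/* Hot path: one TLS load; the cold branch runs once per thread. */
static uint32_t free_snap_get(void) {
    uint32_t s = g_free_snap;
    if (s == 0) {
        /* DEMO_* names are placeholders, not real hakmem variables. */
        s = free_snap_pack(env_flag("DEMO_WRAP_SHAPE"),
                           env_flag("DEMO_FRONT_GATE"),
                           env_flag("DEMO_HOTCOLD"));
        g_free_snap = s;
    }
    assert((s & SNAP_READY) != 0); /* an initialized snapshot is never zero */
    return s;
}
```

Because the READY bit makes an initialized snapshot non-zero, the "not yet initialized" check and the flag load share the same TLS read; after that, each flag is a bit test on a local copy.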
**A/B Test Results** (Mixed, 10-run, 20M iters, ws=400):
- Baseline (SNAPSHOT=0): **45.35M ops/s** (mean), 45.31M ops/s (median), σ=0.34M
- Optimized (SNAPSHOT=1): **46.94M ops/s** (mean), 47.15M ops/s (median), σ=0.94M
- **Delta: +3.51% mean, +4.07% median** ✅
**Decision: GO** (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5%) → Achieved +3.51%
- Similar to E1 success (+3.92%) - ENV consolidation pattern works
- Action: Promote to `MIXED_TINYV3_C7_SAFE` preset (HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1 default)
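
The delta and the GO gate above amount to simple arithmetic, sketched here for reference. The function names are hypothetical; the +1.0% threshold and the throughput figures come from this entry. Because the published means are rounded, recomputed deltas may differ from the reported ones in the last digit.

```c
#include <assert.h>

/* Relative throughput change in percent: (optimized / baseline - 1) * 100. */
static double delta_percent(double baseline, double optimized) {
    assert(baseline > 0.0);
    return (optimized / baseline - 1.0) * 100.0;
}

/* GO when the mean delta meets or exceeds the threshold (in percent). */
static int is_go(double delta_pct, double threshold_pct) {
    return delta_pct >= threshold_pct;
}
```

For example, `is_go(delta_percent(45.35, 46.94), 1.0)` evaluates the E4-1 mean throughput against the +1.0% threshold.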
**Health Check**: ✅ PASS
- MIXED_TINYV3_C7_SAFE: 42.5M ops/s
- C6_HEAVY_LEGACY_POOLV1: 23.0M ops/s
- All profiles passed, no regressions
**Perf Profile** (SNAPSHOT=1, 20M iters):
- free(): 25.26% (unchanged in this sample)
- NEW hot spot: hakmem_env_snapshot_enabled: 4.67% (ENV snapshot overhead visible)
- Note: Small sample (65 samples) may not be fully representative
- Overall throughput still improved +3.51% despite the visible ENV snapshot overhead
**Key Insight**: ENV consolidation continues to yield strong returns. Free path optimization via TLS reduction proves effective, matching E1's success pattern. The visible ENV snapshot overhead (4.67%) is outweighed by overall path efficiency gains.
**Cumulative Status (Phase 5)**:
- E4-1 (Free Wrapper Snapshot): +3.51% (GO)
- Total Phase 5: ~+3.5% (on top of Phase 4's +3.9%)
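
As a rough check on how the per-phase gains combine, the sketch below compounds them multiplicatively. This assumes each gain was measured on top of the previous one, which the A/B baselines in this log only approximately satisfy; `compound_percent` is a hypothetical helper.

```c
#include <assert.h>

/* Compound a list of percentage gains: prod(1 + g_i / 100) - 1, in percent. */
static double compound_percent(const double *gains_pct, int n) {
    assert(n >= 0);
    double factor = 1.0;
    for (int i = 0; i < n; i++) {
        factor *= 1.0 + gains_pct[i] / 100.0;
    }
    return (factor - 1.0) * 100.0;
}
```

Applied to E1 (+3.92%) and E4-1 (+3.51%), this lands slightly above the plain sum of the two, because the second gain applies to an already faster baseline.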
**Next Steps**:
- ✅ Promoted: `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1` is now the default in the `MIXED_TINYV3_C7_SAFE` preset (opt-out available)
- Next target: E4-2 (malloc wrapper snapshot), or pick a core hot spot with self% ≥ 5% from perf
- Design doc: `docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md`
- Instructions:
  - `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
  - `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
---
## Update Memo (2025-12-14 Phase 4 E3-4 Complete - ENV Constructor Init)
### Phase 4 E3-4: ENV Constructor Init ✅ GO (+4.75%) (2025-12-14)
### Phase 4 E3-4: ENV Constructor Init ❌ NO-GO / FROZEN (2025-12-14)
**Target**: Eliminate E1's lazy init check (3.22% self%) via constructor init
- E1 consolidated the ENV snapshot, but the lazy check in `hakmem_env_snapshot_enabled()` remained
@@ -13,23 +64,24 @@
- `core/box/hakmem_env_snapshot_box.c`: Added a constructor function
- `core/box/hakmem_env_snapshot_box.h`: Dual-mode enabled check (constructor vs legacy)
**A/B Test Results** (Mixed, 10-run, 20M iters, HAKMEM_ENV_SNAPSHOT=1):
- Baseline (CTOR=0): **44.28M ops/s** (mean), 44.60M ops/s (median), σ=1.0M
- Optimized (CTOR=1): **46.38M ops/s** (mean), 46.53M ops/s (median), σ=0.5M
- **Improvement: +4.75% mean, +4.35% median**
**A/B Test Results (re-validation)** (Mixed, 10-run, 20M iters, ws=400, HAKMEM_ENV_SNAPSHOT=1):
- Baseline (CTOR=0): **47.55M ops/s** (mean), 47.46M ops/s (median)
- Optimized (CTOR=1): **46.86M ops/s** (mean), 46.97M ops/s (median)
- **Delta: -1.44% mean, -1.03% median**
**Decision: GO** (+4.75% >> +0.5% threshold)
- Achieved +4.75%, far above the expected +0.5-1.5%
- Action: Keep as research box for now (default OFF)
**Decision: NO-GO / FROZEN**
- The initial +4.75% did not reproduce (most likely noise or environmental factors)
- Constructor mode amounts to an "extra branch/load" and does not pay off on the current hot path
- Action: Freeze with default OFF (do not pursue)
- Design doc: `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md`
**Key Insight**: Lazy init check overhead was larger than expected. Constructor pattern eliminates branch in hot path entirely, yielding substantial gain.
**Key Insight**: Initializing in a constructor is safe in itself, but performance-wise it is currently NO-GO. Winning-box effort concentrates on E1.
**Cumulative Status (Phase 4)**:
- E1 (ENV Snapshot): +3.92% (GO)
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- **E3-4 (Constructor Init): +4.75% (GO)**
- **Total Phase 4: ~+8.5%**
- E3-4 (Constructor Init): NO-GO / frozen
- Total Phase 4: ~+3.9% (E1 only)
---
@@ -63,13 +115,16 @@
- Conclusion: Alloc route optimization has reached diminishing returns
**Cumulative Status**:
- Phase 4 E1: +3.92% (GO, research box)
- Phase 4 E1: +3.92% (GO)
- Phase 4 E2: -0.21% (NEUTRAL, frozen)
- Phase 4 E3-4: +4.75% (GO, research box; requires E1)
- Phase 4 E3-4: NO-GO / frozen
### Next: Phase 4 E3-4 (promotion decision)
### Next: Phase 4 (close & next target)
- Instructions: `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md`
- Winning box: Promote E1 to the `MIXED_TINYV3_C7_SAFE` preset (opt-out available)
- Research boxes: E3-4/E2 are frozen (default OFF)
- Pick the next core target from boxes with "self% ≥ 5%" in perf
- Next instructions: `docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md`
---