Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
|
|
|
|
# Phase 5: Post-E1 Baseline & Next Target(次の指示書)
|
|
|
|
|
|
|
|
|
|
|
|
## Status(2025-12-14)
|
|
|
|
|
|
|
|
|
|
|
|
- Phase 4 の勝ち箱は **E1(ENV Snapshot)**(`MIXED_TINYV3_C7_SAFE` で default 化)
|
|
|
|
|
|
- E3-4(ENV CTOR)は **NO-GO / freeze**
|
|
|
|
|
|
- Phase 5 の勝ち箱: **E4-1(free wrapper snapshot)**(`MIXED_TINYV3_C7_SAFE` で default 化)
|
2025-12-14 05:13:29 +09:00
|
|
|
|
- Phase 5 の勝ち箱: **E4-2(malloc wrapper snapshot)**(`MIXED_TINYV3_C7_SAFE` で default 化)
|
|
|
|
|
|
- 次は “形” ではなく **新 baseline** で perf を取り直し、self% ≥ 5% の芯を殴る
|
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Step 0: Baseline 固定(Mixed)
|
|
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
注意:
|
|
|
|
|
|
- 以後の A/B はこのプロファイル(=E1 ON)を基準にする
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Step 1: perf で “芯” を選ぶ(self% ≥ 5%)
|
|
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \
|
|
|
|
|
|
./bench_random_mixed_hakmem 20000000 400 1
|
|
|
|
|
|
perf report --stdio --no-children
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
GO/NO-GO:
|
|
|
|
|
|
- self% が **5% 未満**の最適化は原則 NO-GO(まず他を削る)
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Step 2: 研究箱の候補を 1 つに絞る(Box Theory)
|
|
|
|
|
|
|
|
|
|
|
|
要件:
|
|
|
|
|
|
- L0 ENV gate(default OFF)を必ず用意(戻せる)
|
|
|
|
|
|
- 境界は 1 箇所(変換点を増やさない)
|
|
|
|
|
|
- 可視化はカウンタ 1 本まで(常時ログ禁止)
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Step 3: A/B で GO 判定(Mixed)
|
|
|
|
|
|
|
|
|
|
|
|
Mixed 10-run:
|
|
|
|
|
|
- GO: mean **+1.0% 以上**
|
|
|
|
|
|
- ±1%: NEUTRAL(freeze)
|
|
|
|
|
|
- -1% 以下: NO-GO(freeze)
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Step 4: 健康診断
|
|
|
|
|
|
|
|
|
|
|
|
```sh
|
|
|
|
|
|
scripts/verify_health_profiles.sh
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Step 5: 昇格
|
|
|
|
|
|
|
|
|
|
|
|
- 勝ち箱だけを `core/bench_profile.h` のプリセットへ
|
|
|
|
|
|
- `docs/analysis/ENV_PROFILE_PRESETS.md` に結果+rollback を追記
|
|
|
|
|
|
- `CURRENT_TASK.md` を更新
|
|
|
|
|
|
|
|
|
|
|
|
## Next
|
|
|
|
|
|
|
|
|
|
|
|
- E4-1 昇格: `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
|
|
|
|
|
|
- E4-2 設計/実装: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
|
2025-12-14 05:13:29 +09:00
|
|
|
|
- E4 合算 A/B: `docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md`
|
2025-12-14 05:36:57 +09:00
|
|
|
|
- E5 次の芯: `docs/analysis/PHASE5_E5_NEXT_INSTRUCTIONS.md`
|
2025-12-14 05:59:43 +09:00
|
|
|
|
- E5-1 昇格: `docs/analysis/PHASE5_E5_1_FREE_TINY_DIRECT_NEXT_INSTRUCTIONS.md`
|
2025-12-14 06:44:04 +09:00
|
|
|
|
- E5-4 次: `docs/analysis/PHASE5_E5_4_MALLOC_TINY_DIRECT_NEXT_INSTRUCTIONS.md`
|
2025-12-14 06:59:35 +09:00
|
|
|
|
- E6 次: `docs/analysis/PHASE5_E6_ENV_SNAPSHOT_SHAPE_NEXT_INSTRUCTIONS.md`
|
2025-12-14 07:18:59 +09:00
|
|
|
|
- E7 次: `docs/analysis/PHASE5_E7_FROZEN_BOX_PRUNE_NEXT_INSTRUCTIONS.md`
|
2025-12-14 08:11:20 +09:00
|
|
|
|
- E7 結果: `docs/analysis/PHASE5_E7_FROZEN_BOX_PRUNE_AB_TEST_RESULTS.md`
|