hakmem/docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md


Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)

Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path

Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates 2 TLS reads → 1 TLS read (50% reduction)
  - Reduces 4 branches → 3 branches (25% reduction)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median

Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%

E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
  - Branch prediction hint mismatch (UNLIKELY with always-true)
  - Retest confirmed -1.78% regression
  - Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer

Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Commit date: 2025-12-14 04:24:34 +09:00

# Phase 5: Post-E1 Baseline & Next Target (Next Instructions)
## Status (2025-12-14)
- Phase 4 winning box: **E1 (ENV Snapshot)** (made the default in `MIXED_TINYV3_C7_SAFE`)
- E3-4 (ENV CTOR) is **NO-GO / freeze**
- Phase 5 winning box: **E4-1 (free wrapper snapshot)** (made the default in `MIXED_TINYV3_C7_SAFE`)

Phase 5 E4-2: Malloc Wrapper ENV Snapshot (+21.83% GO, ADOPTED)

Target: Consolidate malloc wrapper TLS reads + eliminate function calls
- malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined
- Strategy: E4-1 success pattern + function call elimination

Implementation:
- ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/malloc_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates multiple TLS reads → 1 TLS read
  - Pre-caches tiny_max_size() == 256 (eliminates function call)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in malloc() wrapper
- Makefile: Add malloc_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 35.74M ops/s (mean), 35.75M ops/s (median)
- Optimized (SNAPSHOT=1): 43.54M ops/s (mean), 43.92M ops/s (median)
- Improvement: +21.83% mean, +22.86% median (+7.80M ops/s)

Decision: GO (+21.83% >> +1.0% threshold, 21.8x over)
- Why 6.2x better than E4-1 (+3.51%)?
  - Higher malloc call frequency (allocation-heavy workload)
  - Function call elimination (tiny_max_size pre-cached)
  - Larger target: 35.63% vs free's 25.26%
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative (estimated):
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- E4-2 (Malloc Wrapper Snapshot): +21.83%
- Estimated combined: ~+30% (needs validation)

Next Steps:
- Combined A/B test (E4-1 + E4-2 simultaneously)
- Measure actual cumulative effect
- Profile new baseline for next optimization targets

Deliverables:
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-2 added)
- CURRENT_TASK.md (E4-2 complete)
- core/bench_profile.h (E4-2 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Commit date: 2025-12-14 05:13:29 +09:00

- Phase 5 winning box: **E4-2 (malloc wrapper snapshot)** (made the default in `MIXED_TINYV3_C7_SAFE`)
- Next, instead of chasing “shape”, re-run perf on the **new baseline** and go after the core hot spots with self% ≥ 5%
---
## Step 0: Fix the baseline (Mixed)
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1
```
Note:
- All subsequent A/B tests use this profile (i.e. E1 ON) as the baseline
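
A fixed baseline is easier to trust when it is summarized over repeated runs. The sketch below is a minimal helper, assuming one throughput sample per line on stdin; `summarize_runs` is a hypothetical name, and how the ops/s figure is extracted from the benchmark output is left to the reader because that output format is not described here.

```sh
# Summarize throughput samples (one number per line on stdin)
# as sample count, mean, and median. Hypothetical helper, not part of the repo.
summarize_runs() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            # Input is sorted, so the median is the middle element
            # (or the average of the two middle elements for even counts).
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d mean=%.2f median=%.2f\n", NR, sum / NR, median
        }'
}
```

For example, a file with ten per-run values (in M ops/s) could be piped in as `summarize_runs < runs.txt`, where `runs.txt` is an illustrative file name.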
---
## Step 1: Pick the “core” hot spot with perf (self% ≥ 5%)
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \
./bench_random_mixed_hakmem 20000000 400 1
perf report --stdio --no-children
```
GO/NO-GO:
- Optimizations whose target has self% **below 5%** are NO-GO in principle (trim elsewhere first)
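
The 5% cut-off can be applied mechanically to the report. This is a minimal sketch, assuming the `perf report --stdio --no-children` layout in which each data row starts with an overhead percentage and comment rows start with `#`; `hot_symbols` is a hypothetical helper name.

```sh
# Keep only perf report rows whose first column (overhead %) is at or above
# a threshold. The threshold defaults to 5. Hypothetical helper.
hot_symbols() {
    awk -v min="${1:-5}" '
        /^#/ || NF == 0 { next }          # skip header comments and blank rows
        $1 ~ /^[0-9.]+%$/ {
            pct = $1
            sub(/%$/, "", pct)            # strip the trailing percent sign
            if (pct + 0 >= min + 0) print # print the original row unchanged
        }'
}
```

Usage would look like `perf report --stdio --no-children | hot_symbols 5`; rows that do not begin with a percentage are dropped rather than guessed at.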
---
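## Step 2: Narrow the research-box candidates to one (Box Theory)
Requirements:
- Always provide an L0 ENV gate (default OFF) so the change can be rolled back
- Exactly one boundary (do not add conversion points)
- Visibility is limited to at most one counter (no always-on logging)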
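
Because the gate is an environment variable that defaults to OFF, the same binary can be exercised with the box disabled and enabled, and rollback is simply leaving the variable unset. The sketch below is a minimal illustration; `ab_run` is a hypothetical helper, and it only sets the gate to `0` and then `1` for an arbitrary command.

```sh
# Run a command twice: once with the gate set to 0 (baseline) and once with
# it set to 1 (candidate). The body runs in a subshell so no variables leak.
# Usage: ab_run GATE_NAME command [args...]
ab_run() (
    gate="$1"
    shift
    echo "== ${gate}=0 (baseline)"
    env "${gate}=0" "$@"
    echo "== ${gate}=1 (candidate)"
    env "${gate}=1" "$@"
)
```

An invocation might be `ab_run HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT env HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1`. Whether a preset overrides an explicitly set gate value is not stated here, so that interaction should be checked before relying on the comparison.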
---
## Step 3: GO decision via A/B (Mixed)
Mixed 10-run:
- GO: mean **+1.0% or more**
- Within ±1%: NEUTRAL (freeze)
- -1% or worse: NO-GO (freeze)
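
The thresholds above can be written as a small classifier. This is a minimal sketch, assuming the mean throughput of both arms is already known and expressed in the same unit; `ab_verdict` is a hypothetical name, not something from the repository.

```sh
# Classify an A/B result from baseline and optimized mean throughput.
# Usage: ab_verdict BASELINE_MEAN OPTIMIZED_MEAN
# Prints the verdict followed by the relative change in percent.
ab_verdict() {
    awk -v base="$1" -v opt="$2" 'BEGIN {
        if (base <= 0) { print "ERROR"; exit 1 }
        delta = (opt - base) / base * 100.0
        if (delta >= 1.0)       verdict = "GO"       # +1.0% or more
        else if (delta <= -1.0) verdict = "NO-GO"    # -1% or worse
        else                    verdict = "NEUTRAL"  # inside the +/-1% band
        printf "%s %+.2f%%\n", verdict, delta
    }'
}
```

With the E4-1 means quoted in the commit notes, `ab_verdict 45.35 46.94` prints `GO +3.51%`.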
---
## Step 4: Health check
```sh
scripts/verify_health_profiles.sh
```
---
## Step 5: Promotion
- Promote only winning boxes into the presets in `core/bench_profile.h`
- Append the results and rollback steps to `docs/analysis/ENV_PROFILE_PRESETS.md`
- Update `CURRENT_TASK.md`
## Next
- E4-1 promotion: `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
- E4-2 design/implementation: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
- E4 combined A/B: `docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md`