hakmem/docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
Moe Charm (CI) 5528612f2a Phase 5 E4-2: Malloc Wrapper ENV Snapshot (+21.83% GO, ADOPTED)
Target: Consolidate malloc wrapper TLS reads + eliminate function calls
- malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined
- Strategy: E4-1 success pattern + function call elimination

Implementation:
- ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/malloc_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates multiple TLS reads → 1 TLS read
  - Pre-caches tiny_max_size() == 256 (eliminates function call)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in malloc() wrapper
- Makefile: Add malloc_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 35.74M ops/s (mean), 35.75M ops/s (median)
- Optimized (SNAPSHOT=1): 43.54M ops/s (mean), 43.92M ops/s (median)
- Improvement: +21.83% mean, +22.86% median (+7.80M ops/s)

Decision: GO (+21.83% >> +1.0% threshold, ~21.8x the threshold)
- Why ~6.2x the gain of E4-1 (+3.51%)?
  - Higher malloc call frequency (allocation-heavy workload)
  - Function call elimination (tiny_max_size pre-cached)
  - Larger target: 35.63% (malloc + tiny_alloc_gate_fast) vs. 25.26% for the free path
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative (estimated):
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- E4-2 (Malloc Wrapper Snapshot): +21.83%
- Estimated combined: ~+30% (needs validation)
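The ~+30% figure can be sanity-checked two ways. Simply adding the three gains gives +29.26%, and compounding them gives slightly more. Below is a minimal sketch of the compounded estimate. It assumes the three boxes are independent, so their speedups multiply, which is exactly what the combined A/B test is meant to check. `compound_gain` is a hypothetical helper, not something in the repo.

```sh
# compound_gain G1 G2 ...: combine per-box gains given in percent
# (e.g. 3.92 for +3.92%) by multiplying (1 + g/100) factors.
# Assumption: the gains are independent; this is an estimate only.
compound_gain() {
  printf '%s\n' "$@" | awk '
    BEGIN { f = 1.0 }
    { f *= 1.0 + $1 / 100.0 }              # apply each gain as a factor
    END { printf "%.2f\n", (f - 1.0) * 100.0 }'
}

compound_gain 3.92 3.51 21.83   # E1, E4-1, E4-2
```

With the numbers above it prints `31.05`, i.e. roughly +31% under full independence. If the boxes overlap (for example, by removing some of the same work), the measured combined gain could come in lower.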

Next Steps:
- Combined A/B test (E4-1 + E4-2 simultaneously)
- Measure actual cumulative effect
- Profile new baseline for next optimization targets

Deliverables:
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-2 added)
- CURRENT_TASK.md (E4-2 complete)
- core/bench_profile.h (E4-2 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 05:13:29 +09:00

# Phase 5: Post-E1 Baseline & Next Target (Instructions)
## Status (2025-12-14)
- Phase 4 winning box: **E1 (ENV Snapshot)** (made default in `MIXED_TINYV3_C7_SAFE`)
- E3-4 (ENV CTOR): **NO-GO / freeze**
- Phase 5 winning box: **E4-1 (free wrapper snapshot)** (made default in `MIXED_TINYV3_C7_SAFE`)
- Phase 5 winning box: **E4-2 (malloc wrapper snapshot)** (made default in `MIXED_TINYV3_C7_SAFE`)
- Next: instead of reworking the “shape” of the code, re-run perf on the **new baseline** and go after the core hotspots (self% ≥ 5%)
---
## Step 0: Lock in the baseline (Mixed)
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1
```
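Note:
- All subsequent A/B tests use this profile (= E1 ON) as the reference. E4-1 and E4-2 are now default in the same preset, so they are ON in this baseline too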
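The 10-run A/B in Step 3 requires repeating the Step 0 command and keeping each run. Below is a minimal POSIX `sh` sketch. `run_n` and the `PREFIX_N.log` naming are hypothetical. The sketch makes no assumption about the output format of `bench_random_mixed_hakmem`, so extracting throughput is left out.

```sh
# run_n N PREFIX CMD [ARGS...]: run CMD N times and save each run's
# stdout+stderr to PREFIX_1.log .. PREFIX_N.log. Stops on the first failure.
run_n() {
  n="$1"; prefix="$2"; shift 2
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" > "${prefix}_${i}.log" 2>&1 || return 1
    i=$((i + 1))
  done
}
```

Usage: `run_n 10 baseline env HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1`. Passing the profile through `env` scopes it to the benchmark process. That avoids relying on how a particular shell handles a variable assignment placed before a function call. A candidate run would add the new box's ENV gate to the same `env` list.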
---
## Step 1: Use perf to pick the “core” (self% ≥ 5%)
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \
./bench_random_mixed_hakmem 20000000 400 1
perf report --stdio --no-children
```
GO/NO-GO:
- As a rule, any optimization targeting self% **below 5%** is NO-GO (trim elsewhere first)
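To apply the 5% cut mechanically, the report can be filtered. This sketch assumes the default `perf report --stdio --no-children` layout: the self overhead (e.g. `16.13%`) is the first field of each sample line, and header lines start with `#`. `hot_symbols` is a hypothetical helper.

```sh
# hot_symbols [THRESHOLD]: read perf report text on stdin and print only
# sample lines whose overhead (first field, "NN.NN%") is >= THRESHOLD (default 5).
hot_symbols() {
  threshold="${1:-5}"
  awk -v t="$threshold" '
    /^#/ { next }                      # skip perf header/comment lines
    $1 ~ /^[0-9.]+%$/ {                # first field looks like "16.13%"
      v = $1
      sub(/%$/, "", v)                 # strip the percent sign
      if (v + 0 >= t + 0) print
    }'
}
```

Usage: `perf report --stdio --no-children | hot_symbols 5`.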
---
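## Step 2: Narrow the research-box candidates to one (Box Theory)
Requirements:
- Always provide an L0 ENV gate (default OFF) so the change can be reverted
- A single boundary (do not add more conversion points)
- Observability: at most one counter (no always-on logging)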
---
## Step 3: GO decision by A/B (Mixed)
Mixed 10-run:
- GO: mean **≥ +1.0%**
- Within ±1%: NEUTRAL (freeze)
- ≤ -1%: NO-GO (freeze)
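Applied to the mean throughput of the two 10-run sets, the thresholds above amount to a small decision rule. Below is a minimal sketch. `mean` and `ab_verdict` are hypothetical helpers, and the ops/s values are assumed to be already extracted, one per line.

```sh
# mean: average of one number per line on stdin (prints nothing if empty).
mean() {
  awk '{ s += $1; n++ } END { if (n > 0) printf "%.4f\n", s / n }'
}

# ab_verdict BASE_MEAN CAND_MEAN: relative change of the candidate vs. the
# baseline, classified with the Step 3 thresholds (>= +1% GO, <= -1% NO-GO).
ab_verdict() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    if (b + 0 == 0) { print "invalid baseline"; exit 1 }
    d = (c - b) / b * 100.0
    if (d >= 1.0)       v = "GO"
    else if (d <= -1.0) v = "NO-GO"
    else                v = "NEUTRAL"
    printf "%s (%.2f%%)\n", v, d
  }'
}
```

For example, `ab_verdict 35.74 43.54` (the E4-2 baseline and optimized means) prints `GO (21.82%)`. The last digit can differ from the reported +21.83% because the inputs here are rounded means.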
---
## Step 4: Health check
```sh
scripts/verify_health_profiles.sh
```
---
## Step 5: Promotion
- Promote only winning boxes into the presets in `core/bench_profile.h`
- Append the results and rollback steps to `docs/analysis/ENV_PROFILE_PRESETS.md`
- Update `CURRENT_TASK.md`
---
## Next
- E4-1 promotion: `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
- E4-2 design/implementation: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md`
- E4 combined A/B: `docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md`