Target: Consolidate malloc wrapper TLS reads + eliminate function calls
- malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined
- Strategy: E4-1 success pattern + function call elimination
Implementation:
- ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/malloc_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates multiple TLS reads into 1 TLS read
  - Pre-caches the result of tiny_max_size() (256), eliminating the function call
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in malloc() wrapper
- Makefile: Add malloc_wrapper_env_snapshot_box.o to all targets
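A minimal sketch of the snapshot idea described above, assuming a single thread-local struct that is filled once and then read on every call. All type and function names here (`wrapper_env_snapshot_t`, `snapshot_get`, `tiny_max_size_slow`) are hypothetical and not taken from the actual box; the probe window is left out, and the inclusive size comparison is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical snapshot layout; the real box may hold different fields. */
typedef struct {
    bool   ready;          /* set once lazy init has run on this thread */
    bool   enabled;        /* cached value of the ENV gate */
    size_t tiny_max_size;  /* cached limit, so the hot path skips a call */
} wrapper_env_snapshot_t;

static _Thread_local wrapper_env_snapshot_t g_snap;

/* Stand-in for the real tiny_max_size(); the commit reports it equals 256. */
static size_t tiny_max_size_slow(void) { return 256; }

/* Slow path: runs once per thread and fills every cached field. */
static void snapshot_init(wrapper_env_snapshot_t *s) {
    const char *v = getenv("HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT");
    s->enabled = (v != NULL && v[0] == '1');   /* default 0 (OFF) */
    s->tiny_max_size = tiny_max_size_slow();
    s->ready = true;
}

/* Hot path: one TLS access yields everything the wrapper needs. */
static inline const wrapper_env_snapshot_t *snapshot_get(void) {
    wrapper_env_snapshot_t *s = &g_snap;
    if (!s->ready) snapshot_init(s);
    return s;
}

/* Example consumer: gate check and size check from the same snapshot.
 * Assumption: the limit is inclusive. */
static inline bool is_tiny_request(size_t size) {
    const wrapper_env_snapshot_t *s = snapshot_get();
    return s->enabled && size <= s->tiny_max_size;
}
```

In this sketch the wrapper would call `snapshot_get()` once at entry and read both the gate and the size limit from the returned pointer, rather than touching several separate TLS variables and calling `tiny_max_size()` each time.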
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 35.74M ops/s (mean), 35.75M ops/s (median)
- Optimized (SNAPSHOT=1): 43.54M ops/s (mean), 43.92M ops/s (median)
- Improvement: +21.83% mean, +22.86% median (+7.80M ops/s)
Decision: GO (+21.83% >> +1.0% threshold, 21.8x over)
- Why 6.2x better than E4-1 (+3.51%)?
  - Higher malloc call frequency (allocation-heavy workload)
  - Function call elimination (tiny_max_size pre-cached)
  - Larger target: 35.63% vs free's 25.26%
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
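The percentages above can be re-derived from the reported throughputs. The helpers below are an illustrative sketch and not part of the benchmark tooling. With the rounded inputs they land a hair under the reported figures (about +21.82% mean and +22.85% median), presumably because the originals were computed from unrounded throughputs.

```c
#include <stdio.h>

/* Relative change of `optimized` over `baseline`, in percent. */
static double improvement_pct(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

/* How many times a measured gain exceeds the GO threshold (both in percent). */
static double threshold_multiple(double gain_pct, double threshold_pct) {
    return gain_pct / threshold_pct;
}

/* Print the deltas for the throughputs (M ops/s) reported in this commit. */
static void print_ab_summary(void) {
    printf("mean:   %+.2f%%\n", improvement_pct(35.74, 43.54));
    printf("median: %+.2f%%\n", improvement_pct(35.75, 43.92));
    printf("over threshold: %.1fx\n", threshold_multiple(21.83, 1.0));
    printf("vs E4-1: %.1fx\n", threshold_multiple(21.83, 3.51));
}
```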
Phase 5 Cumulative (estimated):
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- E4-2 (Malloc Wrapper Snapshot): +21.83%
- Estimated combined: ~+30% (needs validation)
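As a rough cross-check of the ~+30% estimate, the three gains can be either summed or compounded. This is only arithmetic on the reported numbers, assuming the effects are independent; the helper names are illustrative. Summing gives +29.26% and compounding gives about +31%, so the estimate sits between the two. The combined A/B test remains the actual measurement.

```c
#include <stddef.h>

/* Simple sum of per-step gains, in percent. */
static double summed_gain_pct(const double *gains_pct, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += gains_pct[i];
    return sum;
}

/* Compounded gain in percent: product of (1 + g/100) factors, minus 1. */
static double compounded_gain_pct(const double *gains_pct, size_t n) {
    double factor = 1.0;
    for (size_t i = 0; i < n; i++) factor *= 1.0 + gains_pct[i] / 100.0;
    return (factor - 1.0) * 100.0;
}

/* Reported Phase 5 gains: E1, E4-1, E4-2 (percent). */
static const double phase5_gains_pct[] = { 3.92, 3.51, 21.83 };
```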
Next Steps:
- Combined A/B test (E4-1 + E4-2 simultaneously)
- Measure actual cumulative effect
- Profile new baseline for next optimization targets
Deliverables:
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-2 added)
- CURRENT_TASK.md (E4-2 complete)
- core/bench_profile.h (E4-2 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 5: Post-E1 Baseline & Next Target (Next Instructions)
Status (2025-12-14)
- Phase 4 winning box: E1 (ENV Snapshot) (now default in MIXED_TINYV3_C7_SAFE)
- E3-4 (ENV CTOR) is NO-GO / freeze
- Phase 5 winning box: E4-1 (free wrapper snapshot) (now default in MIXED_TINYV3_C7_SAFE)
- Phase 5 winning box: E4-2 (malloc wrapper snapshot) (now default in MIXED_TINYV3_C7_SAFE)
- Next: instead of working on code "shape", re-profile with perf on the new baseline and go after hot spots with self% ≥ 5%
Step 0: Fix the Baseline (Mixed)
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 20000000 400 1
Note:
- All subsequent A/B tests use this profile (= E1 ON) as the baseline
Step 1: Pick the hot spot with perf (self% ≥ 5%)
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \
./bench_random_mixed_hakmem 20000000 400 1
perf report --stdio --no-children
GO/NO-GO:
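- Optimizations whose target has self% below 5% are NO-GO in principle (trim other hot spots first)
Step 2: Narrow the research-box candidates down to one (Box Theory)
Requirements:
- Always provide an L0 ENV gate (default OFF) so the change can be rolled back
- Keep a single boundary (do not add conversion points)
- Limit visibility to at most one counter (no always-on logging)
Step 3: GO Decision via A/B (Mixed)
Mixed 10-run:
- GO: mean +1.0% or more
- Within ±1%: NEUTRAL (freeze)
- -1% or less: NO-GO (freeze)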
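The decision rule for the Mixed 10-run can be written as a small classifier. This is an illustrative sketch: the enum and function names are hypothetical, and treating exactly +1.0% as GO and exactly -1.0% as NO-GO is one reading of the thresholds.

```c
/* Possible outcomes of a Mixed 10-run A/B comparison. */
typedef enum { VERDICT_GO, VERDICT_NEUTRAL, VERDICT_NO_GO } ab_verdict_t;

/* Classify a mean throughput delta (percent, optimized vs baseline). */
static ab_verdict_t classify_ab(double mean_delta_pct) {
    if (mean_delta_pct >= 1.0)  return VERDICT_GO;     /* promote */
    if (mean_delta_pct <= -1.0) return VERDICT_NO_GO;  /* freeze */
    return VERDICT_NEUTRAL;                            /* freeze */
}
```

For example, `classify_ab(21.83)` yields `VERDICT_GO` for E4-2, while a delta of +0.5% would be `VERDICT_NEUTRAL`.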
Step 4: Health Check
scripts/verify_health_profiles.sh
Step 5: Promotion
- Promote only winning boxes into the presets in core/bench_profile.h
- Append the results and rollback instructions to docs/analysis/ENV_PROFILE_PRESETS.md
- Update CURRENT_TASK.md
Next
- E4-1 promotion: docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- E4-2 design/implementation: docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- E4 combined A/B: docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md