Phase 5 E4-2: Malloc Wrapper ENV Snapshot (+21.83% GO, ADOPTED)

Target: Consolidate malloc wrapper TLS reads + eliminate function calls
- malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined
- Strategy: E4-1 success pattern + function call elimination

Implementation:
- ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/malloc_wrapper_env_snapshot_box.{h,c}: New box
  - Consolidates multiple TLS reads → 1 TLS read
  - Pre-caches tiny_max_size() == 256 (eliminates function call)
  - Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in malloc() wrapper
- Makefile: Add malloc_wrapper_env_snapshot_box.o to all targets

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 35.74M ops/s (mean), 35.75M ops/s (median)
- Optimized (SNAPSHOT=1): 43.54M ops/s (mean), 43.92M ops/s (median)
- Improvement: +21.83% mean, +22.86% median (+7.80M ops/s)

Decision: GO (+21.83% >> +1.0% threshold, 21.8x over)
- Why 6.2x better than E4-1 (+3.51%)?
  - Higher malloc call frequency (allocation-heavy workload)
  - Function call elimination (tiny_max_size pre-cached)
  - Larger target: 35.63% vs free's 25.26%
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset

Phase 5 Cumulative (estimated):
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- E4-2 (Malloc Wrapper Snapshot): +21.83%
- Estimated combined: ~+30% (needs validation)

Next Steps:
- Combined A/B test (E4-1 + E4-2 simultaneously)
- Measure actual cumulative effect
- Profile new baseline for next optimization targets

Deliverables:
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-2 added)
- CURRENT_TASK.md (E4-2 complete)
- core/bench_profile.h (E4-2 promoted to default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-12-14 05:13:29 +09:00
parent 4a070d8a14
commit 5528612f2a
12 changed files with 751 additions and 54 deletions

View File

@ -124,6 +124,13 @@ HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1
- **Status**: ✅ GOMixed 10-run: **+3.51% mean / +4.07% median**)→ ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset defaultopt-out 可)
- **Effect**: `free()` wrapper の ENV 判定(複数 TLS readを TLS snapshot 1 本に集約して early gate を短絡
- **Rollback**: `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0`
- **Phase 5 E4-2Malloc Wrapper ENV Snapshot** ✅ GO (PROMOTION READY):
```sh
HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1
```
- **Status**: ✅ GOMixed 10-run: **+21.83% mean / +22.86% median**)→ ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset defaultopt-out 可)
- **Effect**: `malloc()` wrapper の tiny fast 判定を TLS snapshot で短絡し、hot path の関数呼び出し/判定を削減(特に `tiny_get_max_size()`
- **Rollback**: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0`
- v2 系は触らないC7_SAFE では Pool v2 / Tiny v2 は常時 OFF
- FREE_POLICY/THP を触る実験例(現在の HEAD では必須ではなく、組み合わせによっては微マイナスになる場合もある):
```sh