Files
hakmem/docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md

94 lines
2.3 KiB
Markdown
Raw Normal View History

Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
# Phase 4 E3-4: ENV Constructor Init次の指示書
## Status2025-12-14
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
- ❌ NO-GO / FROZENdefault OFF
- 再検証 A/BMixed, 10-run, iter=20M, ws=400, E1=1: **-1.44% mean / -1.03% median**
Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
- ENV:
- E1: `HAKMEM_ENV_SNAPSHOT=0/1`default 0
- E3-4: `HAKMEM_ENV_SNAPSHOT_CTOR=0/1`default 0、E1=1 前提)
## ゴール
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
E3-4 は freeze したので、実行指示は “再現検証” ではなく “凍結維持/rollback”。
Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
---
## Step 0: 前提E1 を ON にして測る)
E3-4 は `hakmem_env_snapshot_enabled()` の gate 判定を短絡する最適化なので、E1 が ON であることが前提。
---
## Step 1: Build & 健康診断(先に通す)
```sh
make bench_random_mixed_hakmem -j
scripts/verify_health_profiles.sh
```
---
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
## Step 2: 再現検証(必要な場合のみ)
Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
Mixed 10-runiter=20M, ws=400:
```sh
# Baseline: ctor=0
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
HAKMEM_ENV_SNAPSHOT=1 \
HAKMEM_ENV_SNAPSHOT_CTOR=0 \
./bench_random_mixed_hakmem 20000000 400 1
# Optimized: ctor=1
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
HAKMEM_ENV_SNAPSHOT=1 \
HAKMEM_ENV_SNAPSHOT_CTOR=1 \
./bench_random_mixed_hakmem 20000000 400 1
```
判定10-run mean:
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
- -1% 以下 → freeze 維持(現状)
Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
注意:
- “constructor の pre-main init” を効かせたい場合は、起動前に ENV を設定するbench_profile putenv だけでは遅い)。
---
## Step 3: perf で “消えたか” を確認E3-4=1
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
HAKMEM_ENV_SNAPSHOT=1 \
HAKMEM_ENV_SNAPSHOT_CTOR=1 \
perf record -F 99 -- ./bench_random_mixed_hakmem 20000000 400 1
perf report --stdio --no-children
```
確認ポイント:
- `hakmem_env_snapshot_enabled` の self% が有意に下がるTop から落ちる
- 代わりに “snapshot 参照” が 1 箇所に集約されている
---
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
## Step 4: 本線化E1 のみ)
Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
- `HAKMEM_ENV_SNAPSHOT_CTOR=1` は本線化しないfreeze
- E1`HAKMEM_ENV_SNAPSHOT=1`)は勝ち箱なのでプリセット昇格を優先
Phase 4 E3-4: ENV Constructor Init (+4.75% GO) Target: Eliminate E1 lazy init check overhead (3.22% self%) - E1 consolidated ENV gates but lazy check remained in hot path - Strategy: __attribute__((constructor(101))) for pre-main init Implementation: - ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box) - core/box/hakmem_env_snapshot_box.c: Constructor function added - Reads ENV before main() when CTOR=1 - Refresh also syncs gate state for bench_profile putenv - core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check - CTOR=1 fast path: direct global read (no lazy branch) - CTOR=0 fallback: legacy lazy init (rollback safe) - Branch hints adjusted for default OFF baseline A/B Test Results (Mixed, 10-run, 20M iters, E1=1): - Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median) - Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median) - Improvement: +4.75% mean, +4.35% median Decision: GO (+4.75% >> +0.5% threshold) - Expected +0.5-1.5%, achieved +4.75% - Lazy init branch overhead was larger than expected - Action: Keep as research box (default OFF), evaluate promotion Phase 4 Cumulative: - E1 (ENV Snapshot): +3.92% - E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen) - E3-4 (Constructor Init): +4.75% - Total Phase 4: ~+8.5% Deliverables: - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md - docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md - docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md - scripts/verify_health_profiles.sh (sanity check script) - CURRENT_TASK.md (E3-4 complete, next instructions) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
---
## Step 5: Rollbackいつでも戻せる
```sh
HAKMEM_ENV_SNAPSHOT=0
HAKMEM_ENV_SNAPSHOT_CTOR=0
```
---
## NextPhase 4 Close
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED) Target: Consolidate free wrapper TLS reads (2→1) - free() is 25.26% self% (top hot spot) - Strategy: Apply E1 success pattern (ENV snapshot) to free path Implementation: - ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0) - core/box/free_wrapper_env_snapshot_box.{h,c}: New box - Consolidates 2 TLS reads → 1 TLS read (50% reduction) - Reduces 4 branches → 3 branches (25% reduction) - Lazy init with probe window (bench_profile putenv sync) - core/box/hak_wrappers.inc.h: Integration in free() wrapper - Makefile: Add free_wrapper_env_snapshot_box.o to all targets A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median) - Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median) - Improvement: +3.51% mean, +4.07% median Decision: GO (+3.51% >= +1.0% threshold) - Exceeded conservative estimate (+1.5% → +3.51%) - Similar efficiency to E1 (+3.92%) - Health check: PASS (all profiles) - Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset Phase 5 Cumulative: - E1 (ENV Snapshot): +3.92% - E4-1 (Free Wrapper Snapshot): +3.51% - Total Phase 4-5: ~+7.5% E3-4 Correction: - Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN - Initial A/B showed +4.75%, but investigation revealed: - Branch prediction hint mismatch (UNLIKELY with always-true) - Retest confirmed -1.78% regression - Root cause: __builtin_expect(..., 0) with ctor_mode==1 - Decision: Freeze as research box (default OFF) - Learning: Branch hints need careful tuning, TLS consolidation safer Deliverables: - docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md - docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md - docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next) - docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md - docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected) - CURRENT_TASK.md (E4-1 complete, E3-4 frozen) - core/bench_profile.h (E4-1 promoted to default) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
- Phase 4 は “勝ち箱=E1” を固めて CLOSE。次は perf で次の芯を選ぶ。