Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init
Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
- Reads ENV before main() when CTOR=1
- Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
- CTOR=1 fast path: direct global read (no lazy branch)
- CTOR=0 fallback: legacy lazy init (rollback safe)
- Branch hints adjusted for default OFF baseline
A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median
Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion
Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%
Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
|
|
|
|
# Phase 4 E3-4: ENV Constructor Init 設計メモ
|
|
|
|
|
|
|
|
|
|
|
|
## 目的
|
|
|
|
|
|
|
|
|
|
|
|
E1 で統合した ENV snapshot の lazy init check(3.22% self%)を排除。
|
|
|
|
|
|
|
|
|
|
|
|
**期待**: **+0.5-1.5%** 改善
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 結果(A/B テスト)
|
|
|
|
|
|
|
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
|
|
|
|
### 初回観測(参考)
|
|
|
|
|
|
|
|
|
|
|
|
初回は **+4.75%** を観測したが、再現しなかった(環境/ノイズの可能性が高い)。
|
|
|
|
|
|
|
|
|
|
|
|
### 再検証(決定)
|
|
|
|
|
|
|
|
|
|
|
|
**判定**: ❌ **NO-GO / FROZEN**
|
Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init
Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
- Reads ENV before main() when CTOR=1
- Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
- CTOR=1 fast path: direct global read (no lazy branch)
- CTOR=0 fallback: legacy lazy init (rollback safe)
- Branch hints adjusted for default OFF baseline
A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median
Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion
Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%
Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
|
|
|
|
|
|
|
|
|
|
| Metric | Baseline (CTOR=0) | Optimized (CTOR=1) | Delta |
|
|
|
|
|
|
|--------|-------------------|-------------------|-------|
|
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
|
|
|
|
| Mean | 47.55M ops/s | 46.86M ops/s | **-1.44%** |
|
|
|
|
|
|
| Median | 47.46M ops/s | 46.97M ops/s | **-1.03%** |
|
Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init
Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
- Reads ENV before main() when CTOR=1
- Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
- CTOR=1 fast path: direct global read (no lazy branch)
- CTOR=0 fallback: legacy lazy init (rollback safe)
- Branch hints adjusted for default OFF baseline
A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median
Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion
Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%
Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
|
|
|
|
|
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
|
|
|
|
**結論**:
|
|
|
|
|
|
- constructor init は “安全” だが、性能面では **現状の hot path では得にならない**
|
|
|
|
|
|
- 研究箱として保持するが **default OFF のまま freeze**
|
Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init
Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
- Reads ENV before main() when CTOR=1
- Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
- CTOR=1 fast path: direct global read (no lazy branch)
- CTOR=0 fallback: legacy lazy init (rollback safe)
- Branch hints adjusted for default OFF baseline
A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median
Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion
Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%
Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 現状分析
|
|
|
|
|
|
|
|
|
|
|
|
### E1 完了後の状態
|
|
|
|
|
|
|
|
|
|
|
|
- `hakmem_env_snapshot_enabled()`: 3.22% self%(perf profile)
|
|
|
|
|
|
- 原因: 毎回の lazy init check(`static int g = -1` + `getenv()`)
|
|
|
|
|
|
|
|
|
|
|
|
```c
|
|
|
|
|
|
// 現在の実装(core/box/hakmem_env_snapshot_box.h:51-62)
|
|
|
|
|
|
static inline bool hakmem_env_snapshot_enabled(void) {
|
|
|
|
|
|
static int g = -1;
|
|
|
|
|
|
if (__builtin_expect(g == -1, 0)) { // ← この分岐が 3.22%
|
|
|
|
|
|
const char* e = getenv("HAKMEM_ENV_SNAPSHOT");
|
|
|
|
|
|
if (e && *e) {
|
|
|
|
|
|
g = (*e == '1') ? 1 : 0;
|
|
|
|
|
|
} else {
|
|
|
|
|
|
g = 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
return g != 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
### 問題
|
|
|
|
|
|
|
|
|
|
|
|
1. **分岐コスト**: `if (g == -1)` が hot path で毎回評価
|
|
|
|
|
|
2. **予測ミス**: first call で branch misprediction
|
|
|
|
|
|
3. **関数呼び出しオーバーヘッド**: inline でも分岐は残る
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 設計
|
|
|
|
|
|
|
|
|
|
|
|
### アプローチ: Constructor Init + Direct Read
|
|
|
|
|
|
|
|
|
|
|
|
```c
|
|
|
|
|
|
// 新しい実装
|
|
|
|
|
|
static int g_hakmem_env_snapshot_gate = -1;
|
|
|
|
|
|
|
|
|
|
|
|
__attribute__((constructor(101))) // priority 101: after libc init
|
|
|
|
|
|
static void hakmem_env_snapshot_gate_init(void) {
|
|
|
|
|
|
const char* e = getenv("HAKMEM_ENV_SNAPSHOT");
|
|
|
|
|
|
g_hakmem_env_snapshot_gate = (e && *e == '1') ? 1 : 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
static inline bool hakmem_env_snapshot_enabled(void) {
|
|
|
|
|
|
return g_hakmem_env_snapshot_gate != 0; // No branch (just load + compare)
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
### 利点
|
|
|
|
|
|
|
|
|
|
|
|
1. **分岐削減**: `if (g == -1)` 完全排除
|
|
|
|
|
|
2. **一度だけ**: `getenv()` は main() 前に 1 回のみ
|
|
|
|
|
|
3. **キャッシュ効率**: global read は TLS より高速(L1 hit 率高い)
|
|
|
|
|
|
|
|
|
|
|
|
### リスク
|
|
|
|
|
|
|
|
|
|
|
|
| リスク | 対策 |
|
|
|
|
|
|
|--------|------|
|
|
|
|
|
|
| putenv() 後の変更が反映されない | bench_profile の `hakmem_env_snapshot_refresh_from_env()` で gate/snapshot を同期 |
|
|
|
|
|
|
| constructor order | priority 101 で libc init 後を保証 |
|
|
|
|
|
|
| fork() 安全性 | hakmem は fork-safe 設計済み |
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## Box Theory(実装計画)
|
|
|
|
|
|
|
|
|
|
|
|
### L0: Env(戻せる)
|
|
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
HAKMEM_ENV_SNAPSHOT_CTOR=0/1 # default: 0(OFF)
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
- **ON (=1)**: Constructor init を使用(lazy check なし)
|
|
|
|
|
|
- **OFF (=0)**: 従来の lazy init を使用(rollback 可能)
|
|
|
|
|
|
|
|
|
|
|
|
### L1: ENV Constructor Box(境界: 1 箇所)
|
|
|
|
|
|
|
|
|
|
|
|
#### 変更対象
|
|
|
|
|
|
|
|
|
|
|
|
- `core/box/hakmem_env_snapshot_box.h` (変更)
|
|
|
|
|
|
- `hakmem_env_snapshot_enabled()` を 2 つのモードで実装
|
|
|
|
|
|
- `core/box/hakmem_env_snapshot_box.c` (変更)
|
|
|
|
|
|
- Constructor 関数を追加
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 実装指示
|
|
|
|
|
|
|
|
|
|
|
|
### Patch 1: Constructor Init Gate
|
|
|
|
|
|
|
|
|
|
|
|
**ファイル**: `core/box/hakmem_env_snapshot_box.c`
|
|
|
|
|
|
|
|
|
|
|
|
```c
|
|
|
|
|
|
// Global gate (not static local - avoids lazy init)
|
|
|
|
|
|
int g_hakmem_env_snapshot_gate = -1;
|
|
|
|
|
|
int g_hakmem_env_snapshot_ctor_mode = -1;
|
|
|
|
|
|
|
|
|
|
|
|
// Constructor: run before main()
|
|
|
|
|
|
__attribute__((constructor(101)))
|
|
|
|
|
|
static void hakmem_env_snapshot_gate_ctor(void) {
|
|
|
|
|
|
// Read HAKMEM_ENV_SNAPSHOT_CTOR (default OFF)
|
|
|
|
|
|
const char* ctor_env = getenv("HAKMEM_ENV_SNAPSHOT_CTOR");
|
|
|
|
|
|
g_hakmem_env_snapshot_ctor_mode = (ctor_env && *ctor_env == '1') ? 1 : 0;
|
|
|
|
|
|
|
|
|
|
|
|
if (g_hakmem_env_snapshot_ctor_mode) {
|
|
|
|
|
|
// Constructor mode: init gate now
|
|
|
|
|
|
const char* e = getenv("HAKMEM_ENV_SNAPSHOT");
|
|
|
|
|
|
g_hakmem_env_snapshot_gate = (e && *e == '1') ? 1 : 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
### Patch 2: Dual-Mode Enabled Check
|
|
|
|
|
|
|
|
|
|
|
|
**ファイル**: `core/box/hakmem_env_snapshot_box.h`
|
|
|
|
|
|
|
|
|
|
|
|
```c
|
|
|
|
|
|
// Global gate state (defined in .c)
|
|
|
|
|
|
extern int g_hakmem_env_snapshot_gate;
|
|
|
|
|
|
extern int g_hakmem_env_snapshot_ctor_mode;
|
|
|
|
|
|
|
|
|
|
|
|
static inline bool hakmem_env_snapshot_enabled(void) {
|
Phase 5 E4-1: Free Wrapper ENV Snapshot (+3.51% GO, ADOPTED)
Target: Consolidate free wrapper TLS reads (2→1)
- free() is 25.26% self% (top hot spot)
- Strategy: Apply E1 success pattern (ENV snapshot) to free path
Implementation:
- ENV gate: HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1 (default 0)
- core/box/free_wrapper_env_snapshot_box.{h,c}: New box
- Consolidates 2 TLS reads → 1 TLS read (50% reduction)
- Reduces 4 branches → 3 branches (25% reduction)
- Lazy init with probe window (bench_profile putenv sync)
- core/box/hak_wrappers.inc.h: Integration in free() wrapper
- Makefile: Add free_wrapper_env_snapshot_box.o to all targets
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (SNAPSHOT=0): 45.35M ops/s (mean), 45.31M ops/s (median)
- Optimized (SNAPSHOT=1): 46.94M ops/s (mean), 47.15M ops/s (median)
- Improvement: +3.51% mean, +4.07% median
Decision: GO (+3.51% >= +1.0% threshold)
- Exceeded conservative estimate (+1.5% → +3.51%)
- Similar efficiency to E1 (+3.92%)
- Health check: PASS (all profiles)
- Action: PROMOTED to MIXED_TINYV3_C7_SAFE preset
Phase 5 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E4-1 (Free Wrapper Snapshot): +3.51%
- Total Phase 4-5: ~+7.5%
E3-4 Correction:
- Phase 4 E3-4 (ENV Constructor Init): NO-GO / FROZEN
- Initial A/B showed +4.75%, but investigation revealed:
- Branch prediction hint mismatch (UNLIKELY with always-true)
- Retest confirmed -1.78% regression
- Root cause: __builtin_expect(..., 0) with ctor_mode==1
- Decision: Freeze as research box (default OFF)
- Learning: Branch hints need careful tuning, TLS consolidation safer
Deliverables:
- docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md
- docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md (next)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md
- docs/analysis/ENV_PROFILE_PRESETS.md (E4-1 added, E3-4 corrected)
- CURRENT_TASK.md (E4-1 complete, E3-4 frozen)
- core/bench_profile.h (E4-1 promoted to default)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 04:24:34 +09:00
|
|
|
|
// Fast path: constructor mode (no lazy check, just global read).
|
|
|
|
|
|
// Note: do not attach a fixed branch hint here; it will be wrong for one mode.
|
|
|
|
|
|
if (g_hakmem_env_snapshot_ctor_mode == 1) {
|
Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init
Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
- Reads ENV before main() when CTOR=1
- Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
- CTOR=1 fast path: direct global read (no lazy branch)
- CTOR=0 fallback: legacy lazy init (rollback safe)
- Branch hints adjusted for default OFF baseline
A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median
Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion
Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%
Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
|
|
|
|
return g_hakmem_env_snapshot_gate != 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Slow path: legacy lazy init (fallback)
|
|
|
|
|
|
if (__builtin_expect(g_hakmem_env_snapshot_gate == -1, 0)) {
|
|
|
|
|
|
const char* e = getenv("HAKMEM_ENV_SNAPSHOT");
|
|
|
|
|
|
g_hakmem_env_snapshot_gate = (e && *e == '1') ? 1 : 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
return g_hakmem_env_snapshot_gate != 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## A/B テスト計画
|
|
|
|
|
|
|
|
|
|
|
|
### Test Matrix
|
|
|
|
|
|
|
|
|
|
|
|
| Profile | Iterations | Runs | Command |
|
|
|
|
|
|
|---------|-----------|------|---------|
|
|
|
|
|
|
| Mixed | 20M | 10 | `./bench_random_mixed_hakmem 20000000 400 1` |
|
|
|
|
|
|
|
|
|
|
|
|
### Baseline
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ENV_SNAPSHOT=1 HAKMEM_ENV_SNAPSHOT_CTOR=0 \
|
|
|
|
|
|
./bench_random_mixed_hakmem 20000000 400 1
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
### Optimized
|
|
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ENV_SNAPSHOT=1 HAKMEM_ENV_SNAPSHOT_CTOR=1 \
|
|
|
|
|
|
./bench_random_mixed_hakmem 20000000 400 1
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
### 判定基準
|
|
|
|
|
|
|
|
|
|
|
|
- **GO**: +0.5% 以上
|
|
|
|
|
|
- **NEUTRAL**: ±0.5%(研究箱維持)
|
|
|
|
|
|
- **NO-GO**: -0.5% 以下
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 期待値の根拠
|
|
|
|
|
|
|
|
|
|
|
|
**なぜ +0.5-1.5% か?**
|
|
|
|
|
|
|
|
|
|
|
|
1. **現在のオーバーヘッド**: 3.22% self%
|
|
|
|
|
|
2. **削減分**: lazy init check の分岐コスト(~10-15 cycles per call)
|
|
|
|
|
|
3. **削減率**: ~15-30% of 3.22% → 0.5-1.0%
|
|
|
|
|
|
4. **追加効果**: better branch prediction(warm path に分岐なし)
|
|
|
|
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
|
|
## 非目標
|
|
|
|
|
|
|
|
|
|
|
|
- snapshot refresh API の変更(putenv sync は既存 API で対応)
|
|
|
|
|
|
- E1 の構造変更(consolidation は維持)
|
|
|
|
|
|
- 他の ENV gate の constructor 化(E3-4 は hakmem_env_snapshot_enabled のみ)
|