2025-12-14 00:48:03 +09:00

# Phase 4 E1: ENV Snapshot Consolidation (Next Instructions)

## Status (2025-12-14)

- ✅ GO (commit: `88717a873`)
- Mixed A/B (10-run, iter=20M, ws=400): **+3.92% avg / +4.01% median**
- Current state: kept as opt-in (default OFF)

## Goal

Consolidate the ENV gate calls on the MIXED hot path into a single snapshot read, targeting a gain of **+2.5% or more**.

Targets (combined perf self% ≈ 3.26%):
- `tiny_c7_ultra_enabled_env()`
- `tiny_front_v3_enabled()`
- `tiny_metadata_cache_enabled()`

## Step 0: Pre-check (current state)

Run perf on Mixed (iter=20M, ws=400) and confirm that the three functions above appear near the top of the report:

```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \
  ./bench_random_mixed_hakmem 20000000 400 1
perf report --stdio --no-children
```

## Step 1: Add the L0 box (EnvSnapshotBox)

New files:
- `core/box/hakmem_env_snapshot_box.h`
- `core/box/hakmem_env_snapshot_box.c`

Requirements:
- ENV: `HAKMEM_ENV_SNAPSHOT=0/1` (default 0)
- Provide `hakmem_env_snapshot_refresh_from_env()` (uses getenv only; must not call malloc)
- Keep `hakmem_env_snapshot_get_fast()` to roughly "1 load + 1 branch" on the hot path
- Compute `tiny_metadata_cache_eff = HAKMEM_TINY_METADATA_CACHE && !learner` inside the snapshot
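
The requirements above can be sketched roughly as follows. This is a minimal illustration, not the actual `hakmem_env_snapshot_box` code: the struct layout, the `example_*` function names, the `EXAMPLE_*` variable names, and the default of 0 for every gate are assumptions made for the example. Only `HAKMEM_ENV_SNAPSHOT`, `HAKMEM_TINY_METADATA_CACHE`, and the `tiny_metadata_cache_eff` formula come from the text.

```c
#include <stdlib.h>

/* Hypothetical snapshot layout. Only tiny_metadata_cache_eff is named in the
 * instructions above; every other field name is illustrative. */
typedef struct {
    int ready;                   /* set to 1 by the first refresh */
    int enabled;                 /* HAKMEM_ENV_SNAPSHOT (default 0) */
    int tiny_c7_ultra;           /* cached C7 ULTRA gate */
    int tiny_front_v3;           /* cached front v3 gate */
    int tiny_metadata_cache_eff; /* metadata cache && !learner */
} ExampleEnvSnapshot;

static ExampleEnvSnapshot g_example_snapshot;

/* Interpret a raw ENV string as a 0/1 flag; no allocation involved. */
int example_parse_flag(const char *value, int default_value) {
    if (value == NULL || *value == '\0') return default_value;
    return *value != '0';
}

/* The learner interlock is folded in once, at refresh time. */
int example_metadata_cache_eff(int cache_on, int learner_on) {
    return cache_on && !learner_on;
}

/* Re-read every gate with getenv() only (no malloc). */
void example_env_snapshot_refresh_from_env(void) {
    ExampleEnvSnapshot *s = &g_example_snapshot;
    /* The EXAMPLE_* variable names are placeholders, not real HAKMEM knobs. */
    int learner = example_parse_flag(getenv("EXAMPLE_LEARNER"), 0);
    int cache = example_parse_flag(getenv("HAKMEM_TINY_METADATA_CACHE"), 0);
    s->enabled = example_parse_flag(getenv("HAKMEM_ENV_SNAPSHOT"), 0);
    s->tiny_c7_ultra = example_parse_flag(getenv("EXAMPLE_C7_ULTRA"), 0);
    s->tiny_front_v3 = example_parse_flag(getenv("EXAMPLE_FRONT_V3"), 0);
    s->tiny_metadata_cache_eff = example_metadata_cache_eff(cache, learner);
    s->ready = 1;
}

/* Hot-path accessor: one load of `ready` plus one branch once initialized. */
const ExampleEnvSnapshot *example_env_snapshot_get_fast(void) {
    if (!g_example_snapshot.ready) {
        example_env_snapshot_refresh_from_env(); /* lazy first-use init */
    }
    return &g_example_snapshot;
}
```

The sketch ignores thread safety; a real implementation would need to decide how concurrent first-use initialization is handled.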

## Step 2: Sync bench_profile (refresh after putenv)

Add the following at the end of the `#ifdef USE_HAKMEM` block in `core/bench_profile.h`:
- `hakmem_env_snapshot_refresh_from_env();`

(`wrapper_env_refresh_from_env()` and `tiny_static_route_refresh_from_env()` are already called there, so placing this call alongside them is fine.)
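
The reason the refresh has to come after the profile's putenv calls can be illustrated with a small sketch: a cached value read before the profile runs goes stale unless it is re-read. Everything here is hypothetical (the `EXAMPLE_GATE` variable, the `example_*` helpers); it does not reproduce `core/bench_profile.h`, and `setenv()` assumes a POSIX environment.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#endif
#include <stdlib.h>

static int g_example_gate_cached; /* value captured at the last refresh */

/* Stand-in for a *_refresh_from_env() function: re-read the ENV. */
void example_gate_refresh_from_env(void) {
    const char *v = getenv("EXAMPLE_GATE"); /* placeholder variable name */
    g_example_gate_cached = (v != NULL && *v != '\0' && *v != '0');
}

int example_gate_cached(void) {
    return g_example_gate_cached;
}

/* Set a default only when the user has not already set the variable. */
static void example_setenv_default(const char *name, const char *value) {
    setenv(name, value, 0); /* overwrite=0 keeps an existing value */
}

/* Profile application: adjust the ENV first, then refresh the cached
 * state so it reflects what the profile just set. */
void example_apply_profile(void) {
    example_setenv_default("EXAMPLE_GATE", "1");
    example_gate_refresh_from_env();
}
```

If the refresh call were omitted from `example_apply_profile()`, the cached value would keep whatever was read before the profile ran.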

## Step 3: Minimal migration (call-site replacement)

First, replace only the call sites that are hit on every operation (3 gates → snapshot):

- `core/front/malloc_tiny_fast.h`
  - `tiny_c7_ultra_enabled_env()` → read from the snapshot (C7 ULTRA gate)
  - `tiny_front_v3_enabled()` → read from the snapshot (front_snap acquisition on the free side)
- `core/box/tiny_legacy_fallback_box.h`
  - `tiny_front_v3_enabled()` → read from the snapshot
  - `tiny_metadata_cache_enabled()` → read `tiny_metadata_cache_eff` from the snapshot
- `core/box/tiny_metadata_cache_hot_box.h`
  - `tiny_metadata_cache_enabled()` → read `tiny_metadata_cache_eff` from the snapshot
  - (Tidy up here so that the learner interlock is not checked twice.)

Note (fail-safe):
- When `HAKMEM_ENV_SNAPSHOT=0`, fall back to the existing functions (behavior must not change)
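
A call site following this fail-safe rule might look like the sketch below. The type and function names are hypothetical stand-ins: `example_legacy_front_v3_enabled()` plays the role of an existing gate function such as `tiny_front_v3_enabled()`, and the call counter exists only so the fallback is observable.

```c
#include <stdbool.h>

/* Hypothetical, reduced snapshot: just the master switch and one gate. */
typedef struct {
    bool enabled;       /* HAKMEM_ENV_SNAPSHOT */
    bool tiny_front_v3; /* cached gate value */
} ExampleGateSnapshot;

static int g_example_legacy_calls; /* counts fallbacks, for illustration */

/* Stand-in for the existing per-gate function (legacy path). */
static bool example_legacy_front_v3_enabled(void) {
    g_example_legacy_calls++;
    return true;
}

/* Migrated call site: use the snapshot only when it is switched on;
 * otherwise go through the existing function so behavior is unchanged. */
static inline bool example_front_v3_enabled(const ExampleGateSnapshot *snap) {
    if (snap->enabled) {
        return snap->tiny_front_v3;
    }
    return example_legacy_front_v3_enabled();
}
```

Keeping the legacy call in the else branch means that switching `HAKMEM_ENV_SNAPSHOT` back to 0 is a complete rollback.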

## Step 4: Build & health check

```sh
make bench_random_mixed_hakmem -j
scripts/verify_health_profiles.sh
```

## Step 5: A/B (GO/NO-GO)

Mixed, 10 runs (iter=20M, ws=400):

```sh
# Baseline
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ENV_SNAPSHOT=0 \
  ./bench_random_mixed_hakmem 20000000 400 1

# Optimized
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ENV_SNAPSHOT=1 \
  ./bench_random_mixed_hakmem 20000000 400 1
```

Verdict:
- GO: mean gain of **+2.5% or more**
- ±1%: NEUTRAL (research box)
- -1% or worse: NO-GO (freeze)
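
The verdict rule can be written down as a small helper, sketched here under stated assumptions. The function and enum names are hypothetical. The criteria above do not say what to do with a gain between +1% and +2.5%, so the sketch reports that range as unspecified instead of guessing, and it treats exactly -1% as NO-GO.

```c
#include <stddef.h>

typedef enum {
    EXAMPLE_GO,
    EXAMPLE_NEUTRAL,
    EXAMPLE_NO_GO,
    EXAMPLE_UNSPECIFIED /* +1% < gain < +2.5%: not covered by the criteria */
} ExampleVerdict;

/* Arithmetic mean of n throughput samples (e.g. ops/s from each run). */
double example_mean(const double *samples, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return n > 0 ? sum / (double)n : 0.0;
}

/* Relative gain of optimized over baseline, in percent. */
double example_gain_percent(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}

/* Map a mean gain (percent) onto the GO / NEUTRAL / NO-GO rule. */
ExampleVerdict example_classify(double gain_percent) {
    if (gain_percent >= 2.5) return EXAMPLE_GO;
    if (gain_percent <= -1.0) return EXAMPLE_NO_GO;   /* boundary: assumption */
    if (gain_percent <= 1.0) return EXAMPLE_NEUTRAL;  /* within ±1% */
    return EXAMPLE_UNSPECIFIED;
}
```

In use, the ten baseline and ten optimized results would each go through `example_mean()`, and the resulting gain would be passed to `example_classify()`.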

## Step 6: Confirm with perf that the gates are gone

Re-run perf with E1 enabled (`HAKMEM_ENV_SNAPSHOT=1`) and check that:
- The three gate functions drop out of the top entries, or their self% falls sharply
- The snapshot load is concentrated at a single site instead

## Step 7: Promotion (only on GO)

- Add `bench_setenv_default("HAKMEM_ENV_SNAPSHOT","1");` to `MIXED_TINYV3_C7_SAFE` in `core/bench_profile.h`
- Append the results and the rollback procedure to `docs/analysis/ENV_PROFILE_PRESETS.md`
- Update `CURRENT_TASK.md` to mark E1 as complete

On NEUTRAL/NO-GO:
- Keep it default OFF and freeze (do not pollute the mainline)

## Phase 4 E3-4: ENV Constructor Init (+4.75% GO)

Target: Eliminate E1 lazy init check overhead (3.22% self%)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: `__attribute__((constructor(101)))` for pre-main init

Implementation:
- ENV gate: `HAKMEM_ENV_SNAPSHOT_CTOR=0/1` (default 0, research box)
- `core/box/hakmem_env_snapshot_box.c`: Constructor function added
  - Reads ENV before main() when CTOR=1
  - Refresh also syncs gate state for bench_profile putenv
- `core/box/hakmem_env_snapshot_box.h`: Dual-mode enabled check
  - CTOR=1 fast path: direct global read (no lazy branch)
  - CTOR=0 fallback: legacy lazy init (rollback safe)
  - Branch hints adjusted for default OFF baseline

A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median

Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion

Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%

Deliverables:
- `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md`
- `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md`
- `docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md`
- `docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md`
- `scripts/verify_health_profiles.sh` (sanity check script)
- `CURRENT_TASK.md` (E3-4 complete, next instructions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

## Next (Phase 4 E3-4)

- Design memo: `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md`
- Next instructions: `docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md`
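
For orientation, the constructor-based init that E3-4 builds on can be sketched as below. This is an illustrative outline assuming GCC or Clang, not the code in `hakmem_env_snapshot_box.c`: the `example_*` names and globals are hypothetical, and the refresh-time sync for bench_profile putenv is left out.

```c
#include <stdlib.h>

static int g_example_ctor_ran;    /* 1 once the constructor has executed */
static int g_example_ctor_mode;   /* cached HAKMEM_ENV_SNAPSHOT_CTOR */
static int g_example_snapshot_on; /* cached HAKMEM_ENV_SNAPSHOT */

static int example_flag(const char *name) {
    const char *v = getenv(name);
    return v != NULL && *v != '\0' && *v != '0';
}

/* GCC/Clang extension. Priorities 0-100 are reserved for the implementation,
 * so 101 is the earliest slot available to user code. */
__attribute__((constructor(101)))
static void example_env_snapshot_ctor(void) {
    g_example_ctor_ran = 1;
    g_example_ctor_mode = example_flag("HAKMEM_ENV_SNAPSHOT_CTOR");
    if (g_example_ctor_mode) {
        /* Read the ENV before main() so the hot path needs no lazy check. */
        g_example_snapshot_on = example_flag("HAKMEM_ENV_SNAPSHOT");
    }
}

/* Dual-mode check: direct global read when CTOR=1, lazy init otherwise. */
int example_env_snapshot_enabled(void) {
    static int lazy_state = -1; /* -1: not read yet (CTOR=0 path only) */
    if (g_example_ctor_mode) {
        return g_example_snapshot_on;
    }
    if (lazy_state < 0) {
        lazy_state = example_flag("HAKMEM_ENV_SNAPSHOT");
    }
    return lazy_state;
}
```

Because the constructor runs before `main()`, any value a benchmark profile sets later with putenv is not seen unless a refresh is called afterwards.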