Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO]
Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path
- Strategy: Cache wrapper env configuration pointer in TLS
- Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*)
Implementation:
- core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE)
- core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast)
- core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths
- ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF)
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median)
- Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median)
- Delta: avg -1.44%, median -1.05% (DECISION: NO-GO)
Analysis:
- Regression cause: TLS cache adds overhead (branch + TLS access)
- wrapper_env_cfg() is already minimal (pointer return after simple check)
- Adding TLS caching layer makes it worse, not better
- Branch prediction penalty outweighs any potential savings
Cumulative Phase 2-3:
- B3: +2.89%, B4: +1.47%, C3: +2.20%
- D1: +1.06% (opt-in), D2: -1.44% (NO-GO)
- Total: ~7.6% (additive, excluding D2)
Decision: FREEZE as research box (default OFF, regression confirmed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:03:27 +09:00

# Phase 3 D2: Wrapper Env Cache (Reducing malloc/free Wrapper Entry Overhead): Design Notes

## Goal

B4 (hot/cold split) reshaped the `malloc()` / `free()` wrapper hot path considerably, but the path still **goes through `wrapper_env_cfg()` on every call**.

D2 makes the wrapper-side ENV config read **even lighter**, with the aim of trimming 1–2% off the wrapper entry overhead.

## Non-goals

- Do not change ENV semantics or the safety fences (LD safe / lock depth / init window)
- Do not redesign the wrapper's cold-side logic
- Do not put getenv back on the hot path

## Box Theory (box partitioning)

### L0: Env (reversible)

- ENV gate: `HAKMEM_WRAP_ENV_CACHE=0/1` (default: 0)
- Purpose: turn it OFF immediately on regression (revertible via A/B)

### L1: WrapperEnvCacheBox (boundary: a single point at the wrapper entry)

Responsibility: on the wrapper hot path, avoid the init check/branch inside `wrapper_env_cfg()` and reduce the read to its **shortest possible form**.

Implementation approach (recommended: start with the minimal diff)

1) **Fast pointer** (minimal, safe)
   - Cache the `const wrapper_env_cfg_t*` in TLS once and reuse it from then on.
   - A refresh (`wrapper_env_refresh_from_env()`) does not break this, because the pointer itself never changes.

2) **TLS mirror** (next step, only if needed)
   - Copy `wrap_shape`, `ld_safe_mode`, etc. into TLS (so that the reads themselves become TLS reads).
   - This requires a refresh-synchronization mechanism (e.g., a version number), so D2 generally does not do it.

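The TLS mirror is deliberately out of scope for D2. The sketch below only illustrates why it needs refresh synchronization. A hypothetical global version counter is bumped on every refresh, and each thread re-copies the config when its recorded version is stale. All names and fields here are assumptions. The sketch also assumes that refresh never runs concurrently with readers. `__thread` and `__builtin_expect` are GCC/Clang extensions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical fields; the real wrapper_env_cfg_t may differ. */
typedef struct {
    int wrap_shape;
    int ld_safe_mode;
} wrapper_env_cfg_t;

static wrapper_env_cfg_t g_cfg;             /* shared source of truth */
static _Atomic unsigned g_cfg_version = 1;  /* bumped by every refresh */

/* Assumes refresh never overlaps with readers; otherwise the struct copy in
 * cfg_mirror() would race and a seqlock-style retry would be needed. */
static void cfg_refresh(int wrap_shape, int ld_safe_mode) {
    g_cfg.wrap_shape = wrap_shape;
    g_cfg.ld_safe_mode = ld_safe_mode;
    /* Release: field writes become visible before the new version. */
    atomic_fetch_add_explicit(&g_cfg_version, 1u, memory_order_release);
}

static __thread wrapper_env_cfg_t t_mirror;  /* per-thread copy */
static __thread unsigned t_mirror_version;   /* 0 = never copied */

static inline const wrapper_env_cfg_t* cfg_mirror(void) {
    /* This atomic load is paid on every call, even when nothing changed. */
    unsigned v = atomic_load_explicit(&g_cfg_version, memory_order_acquire);
    if (__builtin_expect(v != t_mirror_version, 0)) {
        t_mirror = g_cfg;  /* re-copy the stale mirror */
        t_mirror_version = v;
    }
    return &t_mirror;
}
```

Unlike the fast pointer, every read now carries a version check. A refresh that can run concurrently with readers would also need extra machinery. This is the added complexity the design chooses to avoid in D2.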
## Implementation instructions (small patches)

### Patch D2-1: ENV gate

- New: `core/box/wrapper_env_cache_env_box.h`
- `wrap_env_cache_enabled()` (lazy init, getenv called only once)
- If possible, a one-shot log (at most once, suppressed in release builds)

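Here is a minimal sketch of the D2-1 gate. Only the function name `wrap_env_cache_enabled()` and the variable `HAKMEM_WRAP_ENV_CACHE` come from this design. The following are assumptions for illustration, not the contents of `core/box/wrapper_env_cache_env_box.h`:

- the atomic tri-state latch
- treating a value that starts with `1` as ON
- logging once to stderr unless `NDEBUG` is defined

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() in usage checks */
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the D2-1 ENV gate (not the real HAKMEM header).
 * Tri-state latch: -1 = not read yet, 0 = OFF (default), 1 = ON. */
static _Atomic int g_wrap_env_cache = -1;

static inline int wrap_env_cache_enabled(void) {
    int v = atomic_load_explicit(&g_wrap_env_cache, memory_order_relaxed);
    if (__builtin_expect(v < 0, 0)) {
        /* Cold path: once the latch is set, getenv() is not called again. */
        const char* s = getenv("HAKMEM_WRAP_ENV_CACHE");
        int parsed = (s != NULL && s[0] == '1') ? 1 : 0;
        int expected = -1;
        if (atomic_compare_exchange_strong_explicit(
                &g_wrap_env_cache, &expected, parsed,
                memory_order_relaxed, memory_order_relaxed)) {
            v = parsed;
#ifndef NDEBUG
            /* One-shot log: only the thread that wins the CAS prints. */
            fprintf(stderr, "[WRAP_ENV_CACHE] enabled=%d\n", v);
#endif
        } else {
            v = expected; /* another thread latched first; use its value */
        }
    }
    return v;
}
```

Because the result is latched on first use, changing the variable afterwards has no effect. That is what keeps `getenv` off the hot path. A real malloc wrapper might prefer `write(2)` over `fprintf` for the log.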
### Patch D2-2: WrapperEnvCacheBox

- New: `core/box/wrapper_env_cache_box.h`
- Provide `wrapper_env_cfg_fast()`:
  - `wrap_env_cache_enabled()==0` → return `wrapper_env_cfg()` (current behavior)
  - `wrap_env_cache_enabled()==1` → return the TLS-cached pointer
- Example (outline):
  - `static __thread const wrapper_env_cfg_t* t = NULL; if (!t) t = wrapper_env_cfg(); return t;`

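To make the fast-pointer idea concrete, here is a self-contained, single-threaded sketch of Patch D2-2. Only the name and overall shape of `wrapper_env_cfg_fast()` follow the design. The rest is hypothetical:

- the fields of `wrapper_env_cfg_t`
- the lazy-init body of `wrapper_env_cfg()`
- `wrapper_env_refresh_stub()`
- the `g_wrap_env_cache_on` flag, which stands in for `wrap_env_cache_enabled()`

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real config struct. */
typedef struct {
    int wrap_shape;
    int ld_safe_mode;
} wrapper_env_cfg_t;

/* One process-wide config object: refresh rewrites its fields in place,
 * so its address never changes. (Single-threaded sketch: the init flag
 * below is not thread-safe.) */
static wrapper_env_cfg_t g_wrapper_env;
static int g_wrapper_env_inited;

/* Baseline accessor shape: one init check, then a pointer return. */
static const wrapper_env_cfg_t* wrapper_env_cfg(void) {
    if (__builtin_expect(!g_wrapper_env_inited, 0)) {
        g_wrapper_env.wrap_shape = 0;   /* placeholder defaults */
        g_wrapper_env.ld_safe_mode = 0;
        g_wrapper_env_inited = 1;
    }
    return &g_wrapper_env;
}

/* Hypothetical refresh: updates the fields, never the address. */
static void wrapper_env_refresh_stub(int wrap_shape, int ld_safe_mode) {
    (void)wrapper_env_cfg();            /* make sure init has run */
    g_wrapper_env.wrap_shape = wrap_shape;
    g_wrapper_env.ld_safe_mode = ld_safe_mode;
}

/* Stand-in for wrap_env_cache_enabled(); fixed to ON in this sketch. */
static int g_wrap_env_cache_on = 1;

static inline const wrapper_env_cfg_t* wrapper_env_cfg_fast(void) {
    if (!g_wrap_env_cache_on) {
        return wrapper_env_cfg();       /* gate OFF: current behavior */
    }
    static __thread const wrapper_env_cfg_t* t_cfg = NULL;
    if (__builtin_expect(t_cfg == NULL, 0)) {
        t_cfg = wrapper_env_cfg();      /* first call on this thread */
    }
    return t_cfg;                       /* stays valid across refresh */
}
```

Calling `wrapper_env_cfg_fast()` before and after `wrapper_env_refresh_stub()` returns the same pointer, while the fields it points to reflect the refresh.

With the gate ON, this fast path still executes the gate check plus the TLS `NULL` check. The baseline accessor in the sketch executes a single init check. This is consistent with the regression cause recorded for D2 (an extra branch plus a TLS access).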
### Patch D2-3: Wrapper integration (two call sites only)

- `core/box/hak_wrappers.inc.h`
  - In the `malloc()` / `free()` hot paths, replace the leading `wrapper_env_cfg()` call with `wrapper_env_cfg_fast()`
  - The cold side (`malloc_cold` / `free_cold`) needs no changes (keeps the diff minimal)

## A/B (GO/NO-GO)

Test:

- Mixed: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE`, 10-run
- Compare:
  - `HAKMEM_WRAP_ENV_CACHE=0` (baseline)
  - `HAKMEM_WRAP_ENV_CACHE=1` (optimized)

Decision (10-run mean):

- GO: **+1.0% or more**
- NO-GO: **-1.0% or less**
- Within ±1.0%: NEUTRAL (keep as a research box, default OFF)

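The decision rule above can be written as a small helper, for example when post-processing benchmark logs. This is an illustrative sketch: `ab_delta_pct()`, `ab_judge()`, and the verdict enum are hypothetical names. The inputs are assumed to be the 10-run mean throughputs (M ops/s) of baseline and optimized runs.

```c
#include <assert.h>

typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } ab_verdict_t;

/* Relative change of optimized vs. baseline mean throughput, in percent. */
static double ab_delta_pct(double baseline_mean, double optimized_mean) {
    assert(baseline_mean > 0.0);
    return (optimized_mean - baseline_mean) / baseline_mean * 100.0;
}

/* GO at +1.0% or more, NO-GO at -1.0% or less, NEUTRAL in between. */
static ab_verdict_t ab_judge(double baseline_mean, double optimized_mean) {
    double d = ab_delta_pct(baseline_mean, optimized_mean);
    if (d >= 1.0) return AB_GO;
    if (d <= -1.0) return AB_NO_GO;
    return AB_NEUTRAL;
}
```

Plugging in the recorded D2 means (46.52M baseline, 45.85M optimized) gives a delta of about -1.44%, which is a NO-GO.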
Notes:

- If possible, raise the iteration count to 20M to reduce noise (`./bench_random_mixed_hakmem 20000000 400 1`)

## Risks

- Low (the pointer cache does not conflict with refresh)
- However, with multiple threads, each thread's first access triggers TLS initialization, so the first few calls per thread add some measurement noise

## Rollback

- `HAKMEM_WRAP_ENV_CACHE=0` turns it OFF immediately
- The implementation is composed as a separate box, leaving the existing `wrapper_env_box` untouched

Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
- Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
- Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
- Mean gain: +2.19%, Median gain: +2.37%
- Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
- Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset
- D2 (Wrapper env cache): FROZEN
- Previous result: -1.44% regression (TLS overhead > benefit)
- Status: Research box (do not pursue further)
- Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)
- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)
Cumulative Gains (Phase 2-3):
B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
Total: ~7.6-9.0% (conservative, using the 10-run D1 result of +1.06%: 7.6% additive; with the 20-run D1 result of +2.19%: 8.75% additive, 9.04% multiplicative)
MID_V3 fix: +13% (structural change, Mixed OFF by default)
Documentation Updates:
- PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
- PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
- PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
- ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
- CURRENT_TASK.md: Phase 3 complete summary
Next:
- D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
- Or Phase 4 planning if no more D3-class targets
- Current active optimizations: B3, B4, C3, D1, MID_V3 fix
Files Changed:
- docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
- docs/analysis/*.md (6 files updated with D1/D2 results)
- CURRENT_TASK.md (Phase 3 status update)
- analyze_d1_results.py (statistical analysis script)
- core/bench_profile.h (D1 promoted to default in MIXED preset)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 22:42:22 +09:00

---

## Results (A/B)

**Decision**: ❌ NO-GO (FROZEN, default OFF)

- Mixed 10-run (ops/s):
  - Baseline (D2=0): avg **46.52M** / median **46.47M**
  - Optimized (D2=1): avg **45.85M** / median **45.98M**
  - Delta: avg **-1.44%** / median **-1.05%** (regression)

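For reference, both deltas follow directly from the 10-run figures (M ops/s). The mean delta falls below the -1.0% NO-GO threshold.

```latex
\Delta_{\text{avg}} = \frac{45.85 - 46.52}{46.52} \approx -1.44\,\%,
\qquad
\Delta_{\text{median}} = \frac{45.98 - 46.47}{46.47} \approx -1.05\,\%
```
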
**Cause (summary)**:

- `wrapper_env_cfg()` itself is already light enough that adding a TLS cache layer backfired
- Caching is not always faster; there are cases where a plain global reference wins

**Phase 3 Final Status (2025-12-13)**:

- Status: ❌ FROZEN / NO-GO
- Action: Do not pursue further
- Reason: -1.44% regression, TLS overhead > benefit in Mixed workload
- Research box remains available as opt-in (`HAKMEM_WRAP_ENV_CACHE=1`) but NOT recommended
- Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)