From 742b474890aeba86302f1ccef1ec7addb9b4262e Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Sat, 13 Dec 2025 22:04:28 +0900
Subject: [PATCH] Update CURRENT_TASK: Phase 3 D2 Complete (NO-GO, -1.44% regression)

---
 CURRENT_TASK.md | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index b8ac4d8a..cfb71ad9 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,6 +1,6 @@
 # Mainline Tasks (Current)
 
-## Update Notes (2025-12-13 Phase 1-2 Complete)
+## Update Notes (2025-12-13 Phase 3 D2 Complete - NO-GO)
 
 ### Phase 1 Quick Wins: FREE promotion + zero observation tax
 - ✅ **A1 (FREE promotion)**: made `HAKMEM_FREE_TINY_FAST_HOTCOLD=1` the default in `MIXED_TINYV3_C7_SAFE`
@@ -33,7 +33,12 @@
 - Implementation: Already in place (lines 252-267 in malloc_tiny_fast.h), now enabled by default
 - Profile updates: Added `bench_setenv_default("HAKMEM_TINY_ALLOC_ROUTE_SHAPE", "1")` to both profiles
 
-## Current status: B3 adoption complete ✅ (Mixed +2.89%, C6-heavy +9.13%)
+## Current status: Phase 3 D2 Complete ❌ NO-GO (Mixed -1.44%, wrapper env cache regression)
+
+**Summary**:
+- Phase 3 D2 (Wrapper Env Cache): -1.44% regression → **FROZEN as research box**
+- Lesson: TLS caching is not always beneficial; a simple global access can be faster
+- Cumulative gains: B3 +2.89%, B4 +1.47%, C3 +2.20%, D1 +1.06% (opt-in) → **~7.2%**
 
 ### Phase ALLOC-GATE-SSOT-1 + ALLOC-TINY-FAST-DUALHOT-2: COMPLETED
 
@@ -286,6 +291,44 @@
 
 **Commit**: `f059c0ec8`
 
+#### Phase 3 D2: Wrapper Env Cache ❌ NO-GO (-1.44%)
+
+**Design notes**: `docs/analysis/PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md`
+
+**Goal**: reduce the overhead of calling `wrapper_env_cfg()` at the malloc/free wrapper entry points
+
+**Implementation complete** ✅:
+- `core/box/wrapper_env_cache_env_box.h` (ENV gate: HAKMEM_WRAP_ENV_CACHE)
+- `core/box/wrapper_env_cache_box.h` (TLS cache: wrapper_env_cfg_fast)
+- `core/box/hak_wrappers.inc.h` (lines 174, 553): malloc/free hot paths use wrapper_env_cfg_fast()
+- Strategy: Fast pointer cache (TLS caches const wrapper_env_cfg_t*)
+- ENV gate: `HAKMEM_WRAP_ENV_CACHE=0/1` (default OFF)
+
+**A/B test results** ❌ NO-GO:
+- Mixed (10-run, 20M iters):
+  - Baseline (D2=0): 46,516,538 ops/s (avg), 46,467,988 ops/s (median)
+  - Optimized (D2=1): 45,846,933 ops/s (avg), 45,978,185 ops/s (median)
+  - **Average gain: -1.44%**, **Median gain: -1.05%**
+- **Decision: NO-GO** (regression is worse than the -1.0% threshold)
+- Action: FREEZE as research box (default OFF, regression confirmed)
+
+**Analysis**:
+- Regression cause: the TLS cache adds overhead (an extra branch plus the TLS access cost)
+- wrapper_env_cfg() is already minimal (it returns a pointer after a simple check of g_wrapper_env.inited)
+- Adding a TLS caching layer makes it worse, not better
+- The branch prediction penalty for the wrap_env_cache_enabled() check outweighs any savings
+- Lesson: Not all caching helps; a simple global access can be faster than a TLS cache
+
+**Current Cumulative Gain** (Phase 2-3):
+- B3 (Routing shape): +2.89%
+- B4 (Wrapper split): +1.47%
+- C3 (Static routing): +2.20%
+- D1 (Free route cache): +1.06% (opt-in)
+- D2 (Wrapper env cache): -1.44% (NO-GO, frozen)
+- **Total: ~7.2%** (excluding D2, D1 is opt-in ENV)
+
+**Commit**: `19056282b`
+
 #### Phase 3 C4: MIXED MID_V3 Routing Fix ✅ ADOPT
 
 **Key point**: `HAKMEM_MID_V3_ENABLED=1` is significantly slower under `MIXED_TINYV3_C7_SAFE`, so **the preset default was changed to OFF**.
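
The average and median gains reported in the Phase 3 D2 A/B results can be reproduced from the throughput figures with a short calculation. The sketch below is illustrative only: `gain_percent` and `is_no_go` are hypothetical helper names that do not appear in the patch, and applying the -1.0% threshold to the average gain alone is an assumption about how the NO-GO rule is evaluated.

```c
#include <assert.h>

/* Relative change of `optimized` versus `baseline`, in percent.
 * Negative values indicate a regression. */
static double gain_percent(double baseline, double optimized) {
    assert(baseline > 0.0);  /* throughput in ops/s must be positive */
    return (optimized - baseline) / baseline * 100.0;
}

/* Hypothetical decision rule (assumption): NO-GO when the average
 * gain is at or below the -1.0% threshold quoted in the patch. */
static int is_no_go(double avg_gain_percent) {
    return avg_gain_percent <= -1.0;
}
```

Calling `gain_percent(46516538.0, 45846933.0)` gives roughly -1.44 for the averages, and `gain_percent(46467988.0, 45978185.0)` gives roughly -1.05 for the medians. Both fall past the threshold, which is consistent with the NO-GO decision recorded in the patch.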