diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index c4858c2d..ae371e11 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -96,51 +96,49 @@
 
 ---
 
-## Next target: mimalloc Gap Closure Roadmap (2.5x → 1.9x)
+## Next target: Phase 2 B4 (Wrapper Layer Hot/Cold Split)
 
-**Gap Analysis**: hakmem 50.7M ops/s vs mimalloc 127M ops/s
+**Closing in**: with B1 NO-GO and B3 ADOPT, the hotspots are now clear.
+- The winning boxes (FREE DUALHOT + B3) have already been promoted to the mainline preset
+- The losing box (B1) is frozen behind an ENV guard, so it does not pollute the mainline
 
-Root causes (in ROI order):
-1. **Observation tax** (+2-3%): Stats macros branch even when OFF
-2. **Policy snapshot** (+10-15%): Per-call TLS policy read + atomic sync
-3. **Header management** (+5-10%): 1-byte header per block
-4. **Wrapper layer** (+5-10%): malloc → tiny_alloc_gate_fast + security checks
-5. **Routing switch** (+3-5%): Per-call switch statement
+**Remaining main causes**: wrapper layer (malloc/free) + safety checks + policy snapshot
+- Full mimalloc parity is unlikely, but another few percent up to low double digits should still be recoverable
 
-### Phase 1: Quick Wins (Week 1) - Target: +4-7% (52-56M ops/s)
+### Phase 2 B4 (recommended): WRAPPER-SHAPE-1 (hot-path shaping of malloc/free)
 
-**Priority A1** - Promote the FREE winning box to mainline:
-- Make HAKMEM_FREE_TINY_FAST_HOTCOLD=1 the MIXED_TINYV3_C7_SAFE default
-- Enable FREE-TINY-FAST-DUALHOT-1 by default
-- Expected: +2-3% (DUALHOT effect already measured at +13%)
+**Design notes**: `docs/analysis/PHASE2_B4_WRAPPER_SHAPE_1_DESIGN.md`
 
-**Priority A2** - Zero out the observation tax (compile-out stats):
-- Add HAKMEM_DEBUG_COUNTERS compile-time flag (default 0)
-- When 0: `#define ALLOC_GATE_STAT_INC(x) do {} while(0)` (zero cost)
-- Files: `alloc_gate_stats_box.h`, `free_path_stats_box.h`, `tiny_front_stats_box.h`, `free_tiny_fast_hotcold_stats_box.h`
-- Expected: +2-3% (eliminate branching on all stats)
+**Goals**:
+- Push the "rare checks" at wrapper entry (LD mode, jemalloc, diagnostics) out into `noinline,cold` code
+- Hot side: NULL check → Tiny fast → immediate return (shortest path)
+- Smaller I-cache footprint + better branch prediction
 
-**Priority A3** - Inline header write:
-- Add `__attribute__((always_inline))` to `tiny_region_id_write_header()`
-- Eliminate function call overhead in hot path
-- Expected: +1-2%
+**Implementation**:
+- ENV gate: `HAKMEM_WRAP_SHAPE=0/1` (default OFF)
+- malloc hot/cold split (core/box/hak_wrappers.inc.h)
+- free hot/cold split (core/box/hak_wrappers.inc.h)
 
-### Phase 2: Structural Changes (Weeks 2-3) - Target: +5-10% (55-61M ops/s)
+**A/B test**:
+- Mixed: 10 runs (median)
+- C6-heavy: 5 runs (mean)
+- GO criterion: Mixed +1% or better → promote to preset
+- NO-GO criterion: -1% or worse → freeze, stays ENV opt-in
 
-**Priority B1** - Reduce C4-C7 header tax:
-- Remove 1-byte header for C6 (512B) / C7 (1024B) allocations
-- Use registry-only lookup on free
-- Expected: +3-5% (C6/C7 = 30% of workload, no header = 10% size savings)
+**Expected gain**: Mixed +2-5%, C6-heavy +1-3%
 
-**Priority B2** - Dedicated C0-C3 fast path:
-- Create `malloc_tiny_fast_c0c3()` entry point (no policy snapshot)
-- Conditional dispatch from wrapper based on size
-- Expected: +1-2%
+### Phase 1: Quick Wins (done)
 
-**Priority B3** - Routing jump table:
-- Replace switch(route_kind) with function pointer array
-- Reduce branch prediction misses (5-way switch → direct dispatch)
-- Expected: +1-3%
+- ✅ **A1 (promote the FREE winning box to mainline)**: `HAKMEM_FREE_TINY_FAST_HOTCOLD=1` made the default in `MIXED_TINYV3_C7_SAFE` (ADOPT)
+- ✅ **A2 (zero observation tax)**: stats are compiled out when `HAKMEM_DEBUG_COUNTERS=0` (ADOPT)
+- ❌ **A3 (always_inline header)**: NO-GO because of a Mixed -4% regression → frozen as a research box (`docs/analysis/TINY_HEADER_WRITE_ALWAYS_INLINE_A3_DESIGN.md`)
+
+### Phase 2: Structural Changes (in progress)
+
+- ❌ **B1 (header tax reduction v2)**: `HAKMEM_TINY_HEADER_MODE=LIGHT` gave Mixed -2.54% → NO-GO / freeze (`docs/analysis/PHASE2_B1_HEADER_TAX_AB_TEST_RESULTS.md`)
+- ✅ **B3 (routing branch-shape optimization)**: `HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1` gave Mixed +2.89% / C6-heavy +9.13% → ADOPT (preset default=1)
+- ▶ **B4 (next core item)**: WRAPPER-SHAPE-1 (hot-path shaping of malloc/free; `docs/analysis/PHASE2_B4_WRAPPER_SHAPE_1_DESIGN.md`)
+- (on hold) **B2**: dedicated C0–C3 alloc fast path (short-circuiting at the entry carries a high regression risk; decide after B4)
 
 ### Phase 3: Cache Locality (Weeks 4-5) - Target: +12-22% (57-68M ops/s)
 
diff --git a/core/box/wrapper_env_box.c b/core/box/wrapper_env_box.c
index c829edc8..5a55f0f8 100644
--- a/core/box/wrapper_env_box.c
+++ b/core/box/wrapper_env_box.c
@@ -3,7 +3,7 @@
 #include
 #include
 
-wrapper_env_cfg_t g_wrapper_env = {.inited = 0, .step_trace = 0, .ld_safe_mode = 1, .free_wrap_trace = 0, .wrap_diag = 0};
+wrapper_env_cfg_t g_wrapper_env = {.inited = 0, .step_trace = 0, .ld_safe_mode = 1, .free_wrap_trace = 0, .wrap_diag = 0, .wrap_shape = 0};
 
 static inline int env_flag(const char* name, int def) {
     const char* e = getenv(name);
@@ -40,6 +40,7 @@ void wrapper_env_init_once(void) {
     g_wrapper_env.ld_safe_mode = env_int("HAKMEM_LD_SAFE", 1);
     g_wrapper_env.free_wrap_trace = env_flag("HAKMEM_FREE_WRAP_TRACE", 0);
     g_wrapper_env.wrap_diag = env_flag("HAKMEM_WRAP_DIAG", 0);
+    g_wrapper_env.wrap_shape = env_flag("HAKMEM_WRAP_SHAPE", 0);
 
     // Mark as initialized last with memory barrier
     atomic_store_explicit(&g_wrapper_env.inited, 1, memory_order_release);
diff --git a/core/box/wrapper_env_box.h b/core/box/wrapper_env_box.h
index b9bc0713..f641f4a2 100644
--- a/core/box/wrapper_env_box.h
+++ b/core/box/wrapper_env_box.h
@@ -10,6 +10,7 @@ typedef struct {
     int ld_safe_mode;     // HAKMEM_LD_SAFE (default: 1)
     int free_wrap_trace;  // HAKMEM_FREE_WRAP_TRACE (default: 0)
     int wrap_diag;        // HAKMEM_WRAP_DIAG (default: 0) - log first few libc fallbacks
+    int wrap_shape;       // HAKMEM_WRAP_SHAPE (default: 0) - Phase 2 B4: malloc/free hot/cold split
 } wrapper_env_cfg_t;
 
 extern wrapper_env_cfg_t g_wrapper_env;
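The B4 plan describes the wrapper shape only in prose, and `core/box/hak_wrappers.inc.h` is not part of this diff. What follows is a minimal, single-threaded sketch of that shape under stated assumptions. Every `demo_*` name, the flag-parsing rule, the 1024-byte Tiny cutoff and the `malloc`-backed "Tiny fast path" are hypothetical stand-ins, not hakmem code. The sketch moves the rare work (first-call ENV init plus the LD-mode/jemalloc/diagnostic checks) into a `noinline,cold` function. The inlined hot side only tests the init/`wrap_shape` gate, tries the Tiny path, and returns.

```c
#define _POSIX_C_SOURCE 200809L /* keeps setenv() visible to callers under strict -std=c11 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of wrapper_env_cfg_t: only the fields this sketch needs. */
typedef struct {
    atomic_int inited; /* published last with release ordering */
    int wrap_shape;    /* HAKMEM_WRAP_SHAPE (default 0) */
} demo_env_cfg_t;

static demo_env_cfg_t g_demo_env;         /* zero-initialized: not inited, shape off */
static unsigned long g_malloc_cold_calls; /* how often the cold side ran */

/* Assumed parse rule: unset/empty -> def, leading '0' -> 0, anything else -> 1. */
static int demo_env_flag(const char* name, int def) {
    const char* e = getenv(name);
    if (e == NULL || *e == '\0') return def;
    return e[0] != '0';
}

/* Single-threaded sketch: a real version must make concurrent first calls safe. */
static void demo_env_init_once(void) {
    g_demo_env.wrap_shape = demo_env_flag("HAKMEM_WRAP_SHAPE", 0);
    atomic_store_explicit(&g_demo_env.inited, 1, memory_order_release);
}

/* Placeholder Tiny fast path: serves 1..1024 bytes (assumed cutoff), else NULL. */
static void* demo_tiny_fast(size_t size) {
    if (size == 0 || size > 1024) return NULL;
    return malloc(size); /* stands in for a per-class freelist pop */
}

/* Cold side: first-call init plus the rare checks, kept out of line. */
__attribute__((noinline, cold))
static void* demo_malloc_cold(size_t size) {
    g_malloc_cold_calls++;
    if (!atomic_load_explicit(&g_demo_env.inited, memory_order_acquire))
        demo_env_init_once();
    assert(atomic_load_explicit(&g_demo_env.inited, memory_order_acquire));
    /* LD-mode, jemalloc and diagnostic checks would live here. */
    void* p = demo_tiny_fast(size);
    return p != NULL ? p : malloc(size);
}

/* Hot side: gate check -> Tiny fast -> return; everything else falls to cold. */
static inline void* demo_malloc(size_t size) {
    if (__builtin_expect(atomic_load_explicit(&g_demo_env.inited, memory_order_acquire) &&
                             g_demo_env.wrap_shape, 1)) {
        void* p = demo_tiny_fast(size);
        if (__builtin_expect(p != NULL, 1)) return p;
    }
    return demo_malloc_cold(size);
}

/* free keeps the same shape: NULL check first, then the common path. */
static inline void demo_free(void* p) {
    if (__builtin_expect(p == NULL, 0)) return;
    free(p); /* placeholder for the real Tiny free */
}
```

With `HAKMEM_WRAP_SHAPE=1`, only two kinds of call should reach `demo_malloc_cold`: the very first call, which performs the ENV init, and requests the Tiny path declines (here, anything over 1024 bytes). With the gate at 0, every call takes the cold side, which still serves the request correctly. Whether moving the checks out of line actually pays off is what the Mixed / C6-heavy A/B runs are meant to decide.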