Phase 2 B1/B3/B4 preparation: Analysis & ENV gate setup

## Phase 2 Optimization Research Complete

### B1 (Header tax reduction v2) - NO-GO
- HAKMEM_TINY_HEADER_MODE=LIGHT: -2.54% regression on Mixed
- Decision: FREEZE as research box (ENV opt-in only)

### B3 (Routing branch shape optimization) - ADOPT
- Mixed: +2.89% (48.41M → 49.80M ops/s)
- C6-heavy: +9.13% (8.97M → 9.79M ops/s)
- Strategy: mark the LEGACY route as LIKELY (hot); move the rare routes into a `noinline,cold` helper
- Implementation: Already in malloc_tiny_fast.h:252-267
- Profile updates: HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1 now default
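
The B3 branch shape listed above can be sketched roughly as follows. This is a minimal illustration assuming GCC or Clang attributes, not the code in malloc_tiny_fast.h: `route_kind_t`, `alloc_legacy`, `alloc_rare_route`, and `alloc_dispatch` are hypothetical names, and the integer return values are placeholders that only identify which path ran.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint; assumes GCC or Clang. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

/* Hypothetical route kinds; the real enum lives in the hakmem sources. */
typedef enum { ROUTE_LEGACY, ROUTE_POOL, ROUTE_LARGE } route_kind_t;

/* Hot path: stand-in for the LEGACY allocation. Returns a tag, not memory. */
static inline int alloc_legacy(size_t size) {
    (void)size;
    return 1;
}

/* Rare routes share one out-of-line, cold helper so their code stays
   out of the hot function's instruction stream. */
static __attribute__((noinline, cold)) int alloc_rare_route(route_kind_t kind,
                                                            size_t size) {
    (void)size;
    switch (kind) {
    case ROUTE_POOL:  return 2;
    case ROUTE_LARGE: return 3;
    default:          return -1;
    }
}

/* Dispatch: one predicted-taken branch for LEGACY, everything else is cold. */
static inline int alloc_dispatch(route_kind_t kind, size_t size) {
    assert(size > 0);
    if (LIKELY(kind == ROUTE_LEGACY)) {
        return alloc_legacy(size);
    }
    return alloc_rare_route(kind, size);
}
```

The point of the shape is that the common case costs a single well-predicted compare, while the switch over the rare routes is compiled out of line.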

### B4 (Wrapper Layer Hot/Cold Split) - Preparation
- Design memo: docs/analysis/PHASE2_B4_WRAPPER_SHAPE_1_DESIGN.md
- Goal: Split malloc/free into hot/cold paths, reduce I-cache pressure
- ENV gate: HAKMEM_WRAP_SHAPE=0/1 (added to wrapper_env_box)
- Expected gain: +2-5% Mixed, +1-3% C6-heavy
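
A rough sketch of the hot/cold wrapper split that B4 prepares for, under stated assumptions: `g_wrap_shape` stands in for `g_wrapper_env.wrap_shape`, `TINY_MAX_SIZE`, `tiny_alloc_fast`, `wrap_malloc_slow`, and `wrap_malloc` are hypothetical names, the Tiny fast path is replaced by plain `malloc`, and the counters exist only to show which path ran. The actual design is in the design memo, not here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LIKELY(x) __builtin_expect(!!(x), 1)  /* assumes GCC or Clang */
#define TINY_MAX_SIZE 1024u                   /* hypothetical Tiny size limit */

/* Stand-in for g_wrapper_env.wrap_shape (HAKMEM_WRAP_SHAPE, default 0). */
static int g_wrap_shape = 0;

/* Illustration-only counters recording which path served a call. */
static unsigned g_hot_hits = 0;
static unsigned g_slow_hits = 0;

/* Hypothetical Tiny fast path; plain malloc replaces the real allocator. */
static inline void* tiny_alloc_fast(size_t size) {
    assert(size <= TINY_MAX_SIZE);
    return malloc(size);
}

/* Cold side: the rare checks (LD mode, jemalloc, diagnostics) would sit
   here, out of line, so they do not occupy I-cache on the hot path. */
static __attribute__((noinline, cold)) void* wrap_malloc_slow(size_t size) {
    g_slow_hits++;
    return malloc(size);
}

/* Hot side: gate check, Tiny fast, NULL check, immediate return. */
static inline void* wrap_malloc(size_t size) {
    if (g_wrap_shape && size <= TINY_MAX_SIZE) {
        void* p = tiny_alloc_fast(size);
        if (LIKELY(p != NULL)) {
            g_hot_hits++;
            return p;
        }
    }
    return wrap_malloc_slow(size);
}
```

With the gate at its default of 0 every call takes the slow helper, which is what keeps the change opt-in until the A/B results are in.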

## Analysis Summary
- mimalloc is now within sight: the FREE DUALHOT and B3 routing optimizations both pay off
- Code layering is clean: winning boxes promoted to presets, losing boxes frozen with ENV guards
- The remaining gap to mimalloc comes mainly from the wrapper layer, safety checks, and the policy snapshot
- Further +5-10% still realistically achievable

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-13 16:46:18 +09:00
parent cc398e4a0e
commit 0feeccdcef
3 changed files with 36 additions and 36 deletions


@@ -96,51 +96,49 @@
 
 ---
 
-## Next target: mimalloc Gap Closure Roadmap (2.5x → 1.9x)
+## Next target: Phase 2 B4 (Wrapper Layer Hot/Cold Split)
 
-**Gap Analysis**: hakmem 50.7M ops/s vs mimalloc 127M ops/s
+**mimalloc is now within sight**: with B1 NO-GO and B3 ADOPT, the hot spots are clear.
+- Winning boxes (FREE DUALHOT + B3) are already promoted to mainline (presets)
+- The losing box (B1) is frozen behind an ENV guard, keeping mainline clean
 
-Root causes (in ROI order):
-1. **Observation tax** (+2-3%): Stats macros branch even when OFF
-2. **Policy snapshot** (+10-15%): Per-call TLS policy read + atomic sync
-3. **Header management** (+5-10%): 1-byte header per block
-4. **Wrapper layer** (+5-10%): malloc → tiny_alloc_gate_fast + security checks
-5. **Routing switch** (+3-5%): Per-call switch statement
+**Main remaining causes**: wrapper layer (malloc/free) + safety checks + policy snapshot
+- Parity with mimalloc is unlikely, but a further few percent to ten-odd percent looks achievable
 
-### Phase 1: Quick Wins (Week 1) - Target: +4-7% (52-56M ops/s)
+### Phase 2 B4 (recommended): WRAPPER-SHAPE-1 (hot-path shaping of malloc/free)
 
-**Priority A1** - Promote the winning FREE boxes to mainline:
-- Make HAKMEM_FREE_TINY_FAST_HOTCOLD=1 the default in MIXED_TINYV3_C7_SAFE
-- Enable FREE-TINY-FAST-DUALHOT-1 by default
-- Expected: +2-3% (DUALHOT gain already measured at +13%)
+**Design memo**: `docs/analysis/PHASE2_B4_WRAPPER_SHAPE_1_DESIGN.md`
 
-**Priority A2** - Zero observation tax (compile-out stats):
-- Add HAKMEM_DEBUG_COUNTERS compile-time flag (default 0)
-- When 0: `#define ALLOC_GATE_STAT_INC(x) do {} while(0)` (zero cost)
-- Files: `alloc_gate_stats_box.h`, `free_path_stats_box.h`, `tiny_front_stats_box.h`, `free_tiny_fast_hotcold_stats_box.h`
-- Expected: +2-3% (eliminate branching on all stats)
+**Goals**:
+- Push the "rare checks" at the wrapper entry (LD mode, jemalloc, diagnostics) out into `noinline,cold` helpers
+- Hot side: NULL check → Tiny fast → immediate return (shortest path)
+- Less I-cache pressure + better branch prediction
 
-**Priority A3** - Inline header write:
-- Add `__attribute__((always_inline))` to `tiny_region_id_write_header()`
-- Eliminate function call overhead in hot path
-- Expected: +1-2%
+**Implementation**:
+- ENV gate: `HAKMEM_WRAP_SHAPE=0/1` (default OFF)
+- malloc hot/cold split (core/box/hak_wrappers.inc.h)
+- free hot/cold split (core/box/hak_wrappers.inc.h)
 
-### Phase 2: Structural Changes (Weeks 2-3) - Target: +5-10% (55-61M ops/s)
+**A/B test**:
+- Mixed: 10 runs (median)
+- C6-heavy: 5 runs (mean)
+- GO condition: Mixed +1% or better → promote to preset
+- NO-GO condition: -1% or worse → freeze, stays ENV opt-in
 
-**Priority B1** - C4-C7 header tax reduction:
-- Remove 1-byte header for C6 (512B) / C7 (1024B) allocations
-- Use registry-only lookup on free
-- Expected: +3-5% (C6/C7 = 30% of workload, no header = 10% size savings)
+**Expected gain**: Mixed +2-5%, C6-heavy +1-3%
 
-**Priority B2** - Dedicated C0-C3 fast path:
-- Create `malloc_tiny_fast_c0c3()` entry point (no policy snapshot)
-- Conditional dispatch from wrapper based on size
-- Expected: +1-2%
+### Phase 1: Quick Wins (done)
 
-**Priority B3** - Routing jump table:
-- Replace switch(route_kind) with function pointer array
-- Reduce branch prediction misses (5-way switch → direct dispatch)
-- Expected: +1-3%
+- **A1 (promote the winning FREE boxes to mainline)**: `HAKMEM_FREE_TINY_FAST_HOTCOLD=1` made the default in `MIXED_TINYV3_C7_SAFE` (ADOPT)
+- **A2 (zero observation tax)**: stats are compiled out when `HAKMEM_DEBUG_COUNTERS=0` (ADOPT)
+- **A3 (always_inline header)**: NO-GO due to a -4% regression on Mixed → frozen as research box (`docs/analysis/TINY_HEADER_WRITE_ALWAYS_INLINE_A3_DESIGN.md`)
+
+### Phase 2: Structural Changes (in progress)
+
+- **B1 (header tax reduction v2)**: `HAKMEM_TINY_HEADER_MODE=LIGHT` gives Mixed -2.54% → NO-GO / freeze (`docs/analysis/PHASE2_B1_HEADER_TAX_AB_TEST_RESULTS.md`)
+- **B3 (routing branch shape optimization)**: `HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1` gives Mixed +2.89% / C6-heavy +9.13% → ADOPT (preset default=1)
+- **B4 (next core item)**: WRAPPER-SHAPE-1 (hot-path shaping of malloc/free). `docs/analysis/PHASE2_B4_WRAPPER_SHAPE_1_DESIGN.md`
+- (on hold) **B2**: dedicated C0-C3 alloc fast path (short-circuiting at the entry carries high regression risk; decide after B4)
 
 ### Phase 3: Cache Locality (Weeks 4-5) - Target: +12-22% (57-68M ops/s)
 
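
The A/B rule for Mixed in the B4 plan (median over the runs, GO at +1% or better, NO-GO at -1% or worse) could be evaluated along these lines. This is a sketch with hypothetical names (`ab_verdict_t`, `median`, `ab_decide`); the plan does not say how results inside the ±1% band are treated, so the `AB_NEUTRAL` verdict is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } ab_verdict_t;

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts the caller's array in place. */
static double median(double* v, size_t n) {
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

/* Compare candidate vs baseline throughput (ops/s) by median.
   GO at +1% or better, NO-GO at -1% or worse, otherwise neutral (assumed). */
static ab_verdict_t ab_decide(double* base, double* cand, size_t n) {
    double b = median(base, n);
    double c = median(cand, n);
    double delta_pct = (c - b) / b * 100.0;
    if (delta_pct >= 1.0)  return AB_GO;
    if (delta_pct <= -1.0) return AB_NO_GO;
    return AB_NEUTRAL;
}
```

Using the median rather than the mean for the 10-run Mixed series keeps a single noisy run from flipping the verdict.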


@@ -3,7 +3,7 @@
 #include <stdlib.h>
 #include <string.h>
 
-wrapper_env_cfg_t g_wrapper_env = {.inited = 0, .step_trace = 0, .ld_safe_mode = 1, .free_wrap_trace = 0, .wrap_diag = 0};
+wrapper_env_cfg_t g_wrapper_env = {.inited = 0, .step_trace = 0, .ld_safe_mode = 1, .free_wrap_trace = 0, .wrap_diag = 0, .wrap_shape = 0};
 
 static inline int env_flag(const char* name, int def) {
     const char* e = getenv(name);
@@ -40,6 +40,7 @@ void wrapper_env_init_once(void) {
     g_wrapper_env.ld_safe_mode = env_int("HAKMEM_LD_SAFE", 1);
     g_wrapper_env.free_wrap_trace = env_flag("HAKMEM_FREE_WRAP_TRACE", 0);
     g_wrapper_env.wrap_diag = env_flag("HAKMEM_WRAP_DIAG", 0);
+    g_wrapper_env.wrap_shape = env_flag("HAKMEM_WRAP_SHAPE", 0);
 
     // Mark as initialized last with memory barrier
     atomic_store_explicit(&g_wrapper_env.inited, 1, memory_order_release);
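
The release store at the end of the init function pairs naturally with an acquire load on the reader side. The sketch below shows that pairing in isolation; all `demo_*` names are hypothetical, and the flag parsing (unset or empty means default, a leading `0` means off, anything else means on) is an assumption, since the body of `env_flag` is not visible in this diff. Two threads racing through the init would both write the same values, which is often tolerated in practice but is formally a data race in C11; `call_once` would close that gap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int inited;   /* 0 until the fields below are published */
    int wrap_shape;      /* HAKMEM_WRAP_SHAPE (default: 0) */
} demo_env_cfg_t;

static demo_env_cfg_t g_demo_env = { .inited = 0, .wrap_shape = 0 };

/* Assumed semantics: unset/empty -> def, leading '0' -> 0, otherwise 1. */
static int demo_env_flag(const char* name, int def) {
    assert(name != NULL);
    const char* e = getenv(name);
    if (e == NULL || *e == '\0') return def;
    return e[0] != '0';
}

/* Writer: fill the plain fields first, then publish with a release store. */
static void demo_env_init_once(void) {
    if (atomic_load_explicit(&g_demo_env.inited, memory_order_acquire)) return;
    g_demo_env.wrap_shape = demo_env_flag("HAKMEM_WRAP_SHAPE", 0);
    atomic_store_explicit(&g_demo_env.inited, 1, memory_order_release);
}

/* Reader: the acquire load guarantees the fields written before the
   release store are visible once inited reads as 1. */
static int demo_wrap_shape(void) {
    if (!atomic_load_explicit(&g_demo_env.inited, memory_order_acquire)) {
        demo_env_init_once();
    }
    return g_demo_env.wrap_shape;
}
```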


@@ -10,6 +10,7 @@ typedef struct {
     int ld_safe_mode;      // HAKMEM_LD_SAFE (default: 1)
     int free_wrap_trace;   // HAKMEM_FREE_WRAP_TRACE (default: 0)
     int wrap_diag;         // HAKMEM_WRAP_DIAG (default: 0) - log first few libc fallbacks
+    int wrap_shape;        // HAKMEM_WRAP_SHAPE (default: 0) - Phase 2 B4: malloc/free hot/cold split
 } wrapper_env_cfg_t;
 
 extern wrapper_env_cfg_t g_wrapper_env;