Phase 3 Closure & Phase 4 Preparation
Summary: - Phase 3 optimization complete (cumulative +8.93%) - D1 promoted to default (HAKMEM_FREE_STATIC_ROUTE=1, +2.19%) - D2 frozen (NO-GO, -1.44% regression) - Phase 4 instructions prepared (D3/Alloc Gate Specialization) Results: B3 (Routing shape): +2.89% B4 (Wrapper split): +1.47% C3 (Static routing): +2.20% C1 (TLS prefetch): NEUTRAL (-0.34%, research box) C2 (Metadata cache): NEUTRAL (-0.45%, research box) D1 (Free route cache): +2.19% (now default) D2 (Wrapper env cache): NO-GO (-1.44%, frozen) MID_V3 fix: +13% (structural) Total Phase 2-3 gain: ~8.93% (37.5M → 51M ops/s) Updated: - CURRENT_TASK.md: Phase 3 final results + D3 conditions - ENV_PROFILE_PRESETS.md: Active optimizations listed - PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: Phase 3→4 transition - PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md: D3 execution plan - PHASE3_BASELINE_AND_CANDIDATES.md: Post-validation status Next phase: Phase 4 D3 - Alloc Gate Specialization - Requires: tiny_alloc_gate_fast self% ≥5% from perf - Design SSOT: PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md - Execution: PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
@ -49,6 +49,9 @@
|
|||||||
**Baseline Phase 3** (10-run, 2025-12-13):
|
**Baseline Phase 3** (10-run, 2025-12-13):
|
||||||
- Mean: 46.04M ops/s, Median: 46.04M ops/s, StdDev: 0.14M ops/s
|
- Mean: 46.04M ops/s, Median: 46.04M ops/s, StdDev: 0.14M ops/s
|
||||||
|
|
||||||
|
**Next**:
|
||||||
|
- Phase 4 D3 指示書: `docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md`
|
||||||
|
|
||||||
### Phase ALLOC-GATE-SSOT-1 + ALLOC-TINY-FAST-DUALHOT-2: COMPLETED
|
### Phase ALLOC-GATE-SSOT-1 + ALLOC-TINY-FAST-DUALHOT-2: COMPLETED
|
||||||
|
|
||||||
**4 Patches Implemented** (2025-12-13):
|
**4 Patches Implemented** (2025-12-13):
|
||||||
@ -258,12 +261,12 @@
|
|||||||
- B4 (Wrapper split): +1.47%
|
- B4 (Wrapper split): +1.47%
|
||||||
- C3 (Static routing): +2.20%
|
- C3 (Static routing): +2.20%
|
||||||
- C2 (Metadata cache): -0.45%
|
- C2 (Metadata cache): -0.45%
|
||||||
- D1 (Free route cache): +1.06%
|
- D1 (Free route cache): +2.19%(PROMOTED TO DEFAULT)
|
||||||
- **Total: ~7.2%** (baseline 37.5M → ~40.2M ops/s)
|
- **Total: ~8.3%** (Phase 2-3, C2=NEUTRAL included)
|
||||||
|
|
||||||
**Commit**: `f059c0ec8`
|
**Commit**: `f059c0ec8`
|
||||||
|
|
||||||
#### Phase 3 D1: Free Path Route Cache ✅ GO (+1.06%)
|
#### Phase 3 D1: Free Path Route Cache ✅ ADOPT - PROMOTED TO DEFAULT (+2.19%)
|
||||||
|
|
||||||
**設計メモ**: `docs/analysis/PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md`
|
**設計メモ**: `docs/analysis/PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md`
|
||||||
|
|
||||||
@ -275,15 +278,21 @@
|
|||||||
- `free_tiny_fast_cold()` path: direct `g_tiny_route_class[]` lookup
|
- `free_tiny_fast_cold()` path: direct `g_tiny_route_class[]` lookup
|
||||||
- `legacy_fallback` path: direct `g_tiny_route_class[]` lookup
|
- `legacy_fallback` path: direct `g_tiny_route_class[]` lookup
|
||||||
- Fallback safety: `g_tiny_route_snapshot_done` check before cache use
|
- Fallback safety: `g_tiny_route_snapshot_done` check before cache use
|
||||||
- ENV gate: `HAKMEM_FREE_STATIC_ROUTE=0/1` (default OFF)
|
- ENV gate: `HAKMEM_FREE_STATIC_ROUTE=0/1` (default OFF; `MIXED_TINYV3_C7_SAFE` では default ON)
|
||||||
|
|
||||||
**A/B テスト結果** ✅ GO:
|
**A/B テスト結果** ✅ ADOPT:
|
||||||
- Mixed (10-run):
|
- Mixed (10-run, initial):
|
||||||
- Baseline (D1=0): 45,132,610 ops/s (avg), 45,756,040 ops/s (median)
|
- Baseline (D1=0): 45,132,610 ops/s (avg), 45,756,040 ops/s (median)
|
||||||
- Optimized (D1=1): 45,610,062 ops/s (avg), 45,402,234 ops/s (median)
|
- Optimized (D1=1): 45,610,062 ops/s (avg), 45,402,234 ops/s (median)
|
||||||
- **Average gain: +1.06%**, **Median gain: -0.77%**
|
- **Average gain: +1.06%**, **Median gain: -0.77%**
|
||||||
- **Decision: GO** (average exceeds +1.0% threshold)
|
|
||||||
- Action: Keep as ENV-gated optimization (candidate for future default)
|
- Mixed (20-run, validation / iter=20M, ws=400):
|
||||||
|
- Baseline(ROUTE=0): Mean **46.30M** / Median **46.30M** / StdDev **0.10M**
|
||||||
|
- Optimized(ROUTE=1): Mean **47.32M** / Median **47.39M** / StdDev **0.11M**
|
||||||
|
- Gain: Mean **+2.19%** ✓ / Median **+2.37%** ✓
|
||||||
|
|
||||||
|
- **Decision**: ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default
|
||||||
|
- Rollback: `HAKMEM_FREE_STATIC_ROUTE=0`
|
||||||
|
|
||||||
**Rationale**:
|
**Rationale**:
|
||||||
- Eliminates `tiny_route_for_class()` call overhead in free path
|
- Eliminates `tiny_route_for_class()` call overhead in free path
|
||||||
@ -291,15 +300,6 @@
|
|||||||
- Safe fallback: checks snapshot initialization before cache use
|
- Safe fallback: checks snapshot initialization before cache use
|
||||||
- Minimal code footprint: 2 integration points in malloc_tiny_fast.h
|
- Minimal code footprint: 2 integration points in malloc_tiny_fast.h
|
||||||
|
|
||||||
**Current Cumulative Gain** (Phase 2-3):
|
|
||||||
- B3 (Routing shape): +2.89%
|
|
||||||
- B4 (Wrapper split): +1.47%
|
|
||||||
- C3 (Static routing): +2.20%
|
|
||||||
- D1 (Free route cache): +1.06%
|
|
||||||
- **Total: ~7.9%** (cumulative, assuming multiplicative gains)
|
|
||||||
|
|
||||||
**Commit**: `f059c0ec8`
|
|
||||||
|
|
||||||
#### Phase 3 D2: Wrapper Env Cache ❌ NO-GO (-1.44%)
|
#### Phase 3 D2: Wrapper Env Cache ❌ NO-GO (-1.44%)
|
||||||
|
|
||||||
**設計メモ**: `docs/analysis/PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md`
|
**設計メモ**: `docs/analysis/PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md`
|
||||||
|
|||||||
@ -41,6 +41,7 @@ HAKMEM_BENCH_MAX_SIZE=1024
|
|||||||
- `HAKMEM_WRAP_SHAPE=1`(Phase 2 B4: wrapper hot/cold split を default ON)
|
- `HAKMEM_WRAP_SHAPE=1`(Phase 2 B4: wrapper hot/cold split を default ON)
|
||||||
- `HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1`(Phase 2 B3: alloc の route dispatch 形を最適化)
|
- `HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1`(Phase 2 B3: alloc の route dispatch 形を最適化)
|
||||||
- `HAKMEM_TINY_STATIC_ROUTE=1`(Phase 3 C3: policy_snapshot bypass を default ON)
|
- `HAKMEM_TINY_STATIC_ROUTE=1`(Phase 3 C3: policy_snapshot bypass を default ON)
|
||||||
|
- `HAKMEM_FREE_STATIC_ROUTE=1`(Phase 3 D1: free path route cache を default ON)
|
||||||
- `HAKMEM_MID_V3_ENABLED=0`(Mixed 本線では OFF。C6-heavy のみ推奨ON)
|
- `HAKMEM_MID_V3_ENABLED=0`(Mixed 本線では OFF。C6-heavy のみ推奨ON)
|
||||||
- `HAKMEM_MID_V3_CLASSES=0x0`(Mixed 本線では未使用)
|
- `HAKMEM_MID_V3_CLASSES=0x0`(Mixed 本線では未使用)
|
||||||
- `HAKMEM_MID_V35_ENABLED=0`(Phase v11a-5: Mixed では MID v3.5 OFF が最速)
|
- `HAKMEM_MID_V35_ENABLED=0`(Phase v11a-5: Mixed では MID v3.5 OFF が最速)
|
||||||
|
|||||||
@ -282,12 +282,14 @@ static inline void* tiny_alloc_gate_fast(size_t size)
|
|||||||
|
|
||||||
## Step 3: Recommended Next Steps
|
## Step 3: Recommended Next Steps
|
||||||
|
|
||||||
### Phase 3 D1: Free Path Route Cache ✅ GO(ENV opt-in)
|
### Phase 3 D1: Free Path Route Cache ✅ ADOPT(PROMOTED TO DEFAULT)
|
||||||
**Target**: `tiny_route_for_class()` の呼び出しを free path から削る
|
**Target**: `tiny_route_for_class()` の呼び出しを free path から削る
|
||||||
**Result**: Mixed 10-run mean **+1.06%**(median は負ける回がある)
|
**Result**: Mixed 20-run mean **+2.19%** / median **+2.37%**
|
||||||
**Decision**: ✅ GO だが **default 化は 20-run 確認待ち**
|
**Decision**: ✅ `MIXED_TINYV3_C7_SAFE` の default に昇格
|
||||||
|
|
||||||
**ENV Gate**: `HAKMEM_FREE_STATIC_ROUTE=1`(default: 0)
|
**ENV Gate**:
|
||||||
|
- `HAKMEM_FREE_STATIC_ROUTE=0/1`(default: 0)
|
||||||
|
- `MIXED_TINYV3_C7_SAFE` プリセットは `1` を default 注入(rollback は `0`)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -321,7 +323,7 @@ static inline void* tiny_alloc_gate_fast(size_t size)
|
|||||||
| Phase | Optimization | Expected Gain | Notes |
|
| Phase | Optimization | Expected Gain | Notes |
|
||||||
|------------|----------------------------------|---------------|-------|
|
|------------|----------------------------------|---------------|-------|
|
||||||
| Baseline | MID_V3=0 + B3+B4+C3 | - | — |
|
| Baseline | MID_V3=0 + B3+B4+C3 | - | — |
|
||||||
| **D1** | Free route cache | +0〜+2% | mean は勝ち、median 確認待ち(default OFF) |
|
| **D1** | Free route cache | +0〜+2% | ✅ ADOPT(Mixed preset default ON) |
|
||||||
| **D2** | Wrapper env cache | — | NO-GO(freeze) |
|
| **D2** | Wrapper env cache | — | NO-GO(freeze) |
|
||||||
| **D3** | Alloc gate specialization | +0〜+2% | perf で 5% 超なら着手 |
|
| **D3** | Alloc gate specialization | +0〜+2% | perf で 5% 超なら着手 |
|
||||||
|
|
||||||
|
|||||||
@ -76,28 +76,17 @@ perf report --stdio
|
|||||||
- self% が **5% 未満の箱は NO-GO(後回し)**
|
- self% が **5% 未満の箱は NO-GO(後回し)**
|
||||||
- 5% 以上の関数/箱だけを次のフェーズ候補にする
|
- 5% 以上の関数/箱だけを次のフェーズ候補にする
|
||||||
|
|
||||||
### Step 3: Phase 3 D1 を “昇格できるか” を確定する(20-run)
|
### Step 3: Phase 3 D2 は NO-GO(凍結)
|
||||||
|
|
||||||
`HAKMEM_FREE_STATIC_ROUTE=1` は **10-run mean が +1.06%** とギリ勝ちだが、
|
|
||||||
median が負ける回があるため、プリセット default 化は **20-run で確度を上げてから**。
|
|
||||||
|
|
||||||
推奨手順(Mixed, iter=20M, ws=400, 1T):
|
|
||||||
1. Baseline: `HAKMEM_FREE_STATIC_ROUTE=0` を 20-run
|
|
||||||
2. Optimized: `HAKMEM_FREE_STATIC_ROUTE=1` を 20-run
|
|
||||||
3. 判定(mean + median):
|
|
||||||
- GO(昇格候補): mean **+1.0%** 以上 かつ median もプラス域
|
|
||||||
- それ以外: ENV opt-in 維持(default OFF)
|
|
||||||
|
|
||||||
### Step 4: Phase 3 D2 は NO-GO(凍結)
|
|
||||||
|
|
||||||
`HAKMEM_WRAP_ENV_CACHE=1` は **-1.44% 回帰**のため、研究箱として freeze(default OFF)。
|
`HAKMEM_WRAP_ENV_CACHE=1` は **-1.44% 回帰**のため、研究箱として freeze(default OFF)。
|
||||||
次は D3(alloc 側)に進むか、Phase 3 を総括して次フェーズへ移る。
|
次は D3(alloc 側)に進むか、Phase 3 を総括して次フェーズへ移る。
|
||||||
|
|
||||||
### Step 5: Phase 3 D3(Alloc Gate Specialization)は “perf で 5%超えたら” 着手
|
### Step 4: Phase 3 D3(Alloc Gate Specialization)は “perf で 5%超えたら” 着手
|
||||||
|
|
||||||
狙い: Mixed 本線の固定構成に合わせ、alloc gate の分岐を削って 1–2% 詰める。
|
狙い: Mixed 本線の固定構成に合わせ、alloc gate の分岐を削って 1–2% 詰める。
|
||||||
|
|
||||||
- 設計メモ: `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
- 設計メモ: `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
||||||
|
- 実装指示書: `docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md`
|
||||||
- ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`(default 0)
|
- ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`(default 0)
|
||||||
- 注意: “safe enable 判定” を必ず入れて、ENV 組み合わせで壊れないようにする
|
- 注意: “safe enable 判定” を必ず入れて、ENV 組み合わせで壊れないようにする
|
||||||
|
|
||||||
|
|||||||
@ -0,0 +1,81 @@
|
|||||||
|
# Phase 4: Alloc Gate Specialization(D3 実装指示書)
|
||||||
|
|
||||||
|
Phase 3 は **D1 昇格 / D2 凍結**で完全閉鎖。次の “芯” は alloc 側の gate を削る D3。
|
||||||
|
|
||||||
|
## 現在地(SSOT)
|
||||||
|
|
||||||
|
- Active defaults(Mixed):
|
||||||
|
- `HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1`(B3)
|
||||||
|
- `HAKMEM_WRAP_SHAPE=1`(B4)
|
||||||
|
- `HAKMEM_TINY_STATIC_ROUTE=1`(C3)
|
||||||
|
- `HAKMEM_FREE_STATIC_ROUTE=1`(D1, promoted)
|
||||||
|
- `HAKMEM_MID_V3_ENABLED=0`(Mixed 本線 SSOT)
|
||||||
|
- Frozen / ignore:
|
||||||
|
- `HAKMEM_WRAP_ENV_CACHE=1`(D2, NO-GO)
|
||||||
|
|
||||||
|
## Step 0: perf で “GO 条件” を満たしているか確認
|
||||||
|
|
||||||
|
判定基準: `tiny_alloc_gate_fast` が **self% ≥ 5%**。
|
||||||
|
|
||||||
|
推奨(Mixed, iter=20M, ws=400, 1T):
|
||||||
|
```bash
|
||||||
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \
|
||||||
|
./bench_random_mixed_hakmem 20000000 400 1
|
||||||
|
perf report --stdio --no-children
|
||||||
|
```
|
||||||
|
|
||||||
|
メモ(直近の 1-run perf):
|
||||||
|
- `tiny_alloc_gate_fast` self% ≈ **18.8%** → **D3 GO 条件は満たす**
|
||||||
|
|
||||||
|
## Step 1: D3 実装(Box Theory)
|
||||||
|
|
||||||
|
設計メモ(SSOT): `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
||||||
|
|
||||||
|
### 箱割り(必須)
|
||||||
|
|
||||||
|
- L0: ENV(戻せる)
|
||||||
|
- `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`(default 0)
|
||||||
|
- L1: Safe-Enable 判定(境界 1 箇所)
|
||||||
|
- Mixed 本線が “LEGACY-only” であることを 1 回だけ確認し、成立した場合のみ fast 経路へ
|
||||||
|
- Learner ON の場合は必ず disable(route が動的更新されるため)
|
||||||
|
- L2: Integration(境界 1 箇所)
|
||||||
|
- `tiny_alloc_gate_fast()` の入口でのみ差し替え(既存の意味は変えない)
|
||||||
|
|
||||||
|
### Safe-Enable 条件(Fail-Fast ではなく “自動 disable”)
|
||||||
|
|
||||||
|
以下のどれかに該当したら D3 は **即 disable** して既存経路へフォールバック:
|
||||||
|
- learner が有効(例: `HAKMEM_SMALL_LEARNER_V7_ENABLED=1`)
|
||||||
|
- route_kind に LEGACY 以外が混入(V7/MID/ULTRA/TINY_FIRST 等)
|
||||||
|
- そのほか Mixed 本線の固定前提が崩れている
|
||||||
|
|
||||||
|
## Step 2: A/B(GO/NO-GO)
|
||||||
|
|
||||||
|
Mixed 10-run(推奨: iter=20M, ws=400, 1T):
|
||||||
|
|
||||||
|
- Baseline:
|
||||||
|
```bash
|
||||||
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_LEGACY_ONLY=0 \
|
||||||
|
./bench_random_mixed_hakmem 20000000 400 1
|
||||||
|
```
|
||||||
|
- Optimized:
|
||||||
|
```bash
|
||||||
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_LEGACY_ONLY=1 \
|
||||||
|
./bench_random_mixed_hakmem 20000000 400 1
|
||||||
|
```
|
||||||
|
|
||||||
|
判定(10-run mean):
|
||||||
|
- GO: **+1.0% 以上**
|
||||||
|
- NO-GO: **-1.0% 以下**
|
||||||
|
- ±1.0%: NEUTRAL(研究箱維持、default OFF)
|
||||||
|
|
||||||
|
## Step 3: 昇格(GO の場合のみ)
|
||||||
|
|
||||||
|
- `MIXED_TINYV3_C7_SAFE` プリセットへ default 注入(`core/bench_profile.h`)
|
||||||
|
- `docs/analysis/ENV_PROFILE_PRESETS.md` に結果・rollback 手順を追記
|
||||||
|
- `CURRENT_TASK.md` を “Phase 4 完了” に更新
|
||||||
|
|
||||||
|
## Step 4: 失敗時の扱い(箱理論)
|
||||||
|
|
||||||
|
- NO-GO/NEUTRAL の場合:
|
||||||
|
- D3 は research box として freeze(default OFF)
|
||||||
|
- 本線は汚さない(プリセット昇格はしない)
|
||||||
Reference in New Issue
Block a user