diff --git a/docs/analysis/PHASE3_BASELINE_AND_CANDIDATES.md b/docs/analysis/PHASE3_BASELINE_AND_CANDIDATES.md
index e7ae33b3..925beaf0 100644
--- a/docs/analysis/PHASE3_BASELINE_AND_CANDIDATES.md
+++ b/docs/analysis/PHASE3_BASELINE_AND_CANDIDATES.md
@@ -303,18 +303,20 @@ static inline void* tiny_alloc_gate_fast(size_t size)
 ---
 
 ### Phase 3 D3: Alloc Gate Specialization (MEDIUM PRIORITY)
 
-**Target**: `tiny_alloc_gate_fast()` for LEGACY-only route
+**Target**: Minimize the branch shape of `tiny_alloc_gate_fast()` (for MIXED)
 **Expected Gain**: +1-2%
 **Risk**: LOW
 **Effort**: 2-3 hours
 
 **Implementation**:
-1. Create `tiny_alloc_gate_fast_legacy()` specialized variant
-2. Eliminate ROUTE_POOL_ONLY and ROUTE_TINY_FIRST branches
-3. Use in MIXED profile where all classes are LEGACY
+1. New ENV gate: `HAKMEM_ALLOC_GATE_SHAPE=0/1`
+2. Avoid `tiny_route_get()` and read `g_tiny_route[]` directly instead (skips the release logging branch)
+3. Always honor `ROUTE_POOL_ONLY` (must not break `HAKMEM_TINY_PROFILE=hot/off`)
 4. A/B test: BASELINE vs D3
 
-**ENV Gate**: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=1` (default: 0)
+**Design**: `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
+
+**ENV Gate**: `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default: 0)
 
 ---
 
@@ -376,7 +378,7 @@ static inline void* tiny_alloc_gate_fast(size_t size)
 
 2. **Optional**: Phase 3 D3 (Alloc gate specialization) - pending perf validation
    - Only proceed if perf shows ≥5% self% in alloc gate
-   - ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`
+   - ENV: `HAKMEM_ALLOC_GATE_SHAPE=0/1`
 
 3.
**Phase 4 Planning**: If no more 5%+ targets, prepare Phase 4 roadmap
diff --git a/docs/analysis/PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md
index be33154d..96c7e750 100644
--- a/docs/analysis/PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md
+++ b/docs/analysis/PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md
@@ -85,9 +85,9 @@ perf report --stdio
 
 Goal: Match the fixed configuration of the Mixed mainline and trim alloc gate branches to gain 1-2%.
 
-- Design memo: `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
 - Implementation instructions: `docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md`
-- ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1` (default 0)
+- Design memo (latest): `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
+- ENV: `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default 0)
 - Note: Always include a "safe enable" check so that no combination of ENV settings breaks the allocator
 
 ## Next candidates (start if perf shows more than 5%)
diff --git a/docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md b/docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
index 8adbe880..580b9e05 100644
--- a/docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
+++ b/docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
@@ -1,5 +1,8 @@
 # Phase 3 D3: Alloc Gate Specialization (Mixed "LEGACY-only" minimization) Design Memo
 
+> NOTE (2025-12-13): This design memo has been superseded by the **subsequent Phase 4 D3**.
+> For the latest design, ENV, and implementation instructions, see `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`.
+
 ## Purpose
 
 On the Mixed mainline (`MIXED_TINYV3_C7_SAFE`),
@@ -84,4 +87,3 @@ Mixed 10-run (recommended: 20M iters):
 - `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0` (immediately OFF)
 - Keep the box independent and do not pollute the existing gate (swap at a single boundary)
 
-
diff --git a/docs/analysis/PHASE3_FINALIZATION_SUMMARY.md b/docs/analysis/PHASE3_FINALIZATION_SUMMARY.md
index 6d44152d..9d2a9ba9 100644
--- a/docs/analysis/PHASE3_FINALIZATION_SUMMARY.md
+++ b/docs/analysis/PHASE3_FINALIZATION_SUMMARY.md
@@ -238,14 +238,14 @@ IF mean_gain >= +1.0% AND median_gain >= +0.0%:
 
 **Requirement**: perf validation showing `tiny_alloc_gate_fast` self% ≥ 5%
 
-**Design**:
`docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
+**Design**: `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
 
 **Strategy**: Specialize alloc gate for fixed MIXED configuration
 - Eliminate dynamic checks
 - Inline hot paths
 - Reduce branch complexity
 
-**ENV**: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`
+**ENV**: `HAKMEM_ALLOC_GATE_SHAPE=0/1`
 
 **Decision Criteria**:
 - IF perf shows ≥5% self% in alloc gate → Proceed with D3
diff --git a/docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md
index 75bd423d..1fee4f9f 100644
--- a/docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md
+++ b/docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md
@@ -29,24 +29,16 @@ perf report --stdio --no-children
 
 ## Step 1: D3 implementation (Box Theory)
 
-Design memo (SSOT): `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
+Design memo (SSOT): `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
 
 ### Box layout (required)
 
 - L0: ENV (reversible)
-  - `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1` (default 0)
-- L1: Safe-Enable check (single boundary)
-  - Verify once that the Mixed mainline is "LEGACY-only", and take the fast path only if that holds
-  - Always disable when the Learner is ON (routes are updated dynamically)
-- L2: Integration (single boundary)
-  - Swap only at the entry of `tiny_alloc_gate_fast()` (existing semantics unchanged)
-
-### Safe-Enable conditions ("auto-disable" rather than Fail-Fast)
-
-If any of the following applies, D3 is **disabled immediately** and falls back to the existing path:
-- the learner is enabled (e.g. `HAKMEM_SMALL_LEARNER_V7_ENABLED=1`)
-- route_kind contains anything other than LEGACY (V7/MID/ULTRA/TINY_FIRST, etc.)
-- any other fixed assumption of the Mixed mainline no longer holds
+  - `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default 0)
+- L1: Gate Shape (single boundary)
+  - Change only the branch shape inside `tiny_alloc_gate_fast()` (semantics unchanged)
+  - Read `g_tiny_route[]` directly instead of calling `tiny_route_get()` (avoids the release logging branch)
+  - **Safety**: Always return `NULL` when `g_tiny_route[ci] == ROUTE_POOL_ONLY` (must not break `HAKMEM_TINY_PROFILE=hot/off`)
 
 ## Step 2: A/B (GO/NO-GO)
 
@@ -54,12 +46,12 @@ Mixed 10-run (recommended: iter=20M, ws=400, 1T):
 
 - Baseline:
 ```bash
-HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_LEGACY_ONLY=0 \
+HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
 ./bench_random_mixed_hakmem 20000000 400 1
 ```
 - Optimized:
 ```bash
-HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_LEGACY_ONLY=1 \
+HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
 ./bench_random_mixed_hakmem 20000000 400 1
 ```
 
diff --git a/docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md b/docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
new file mode 100644
index 00000000..b8ab0e4a
--- /dev/null
+++ b/docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
@@ -0,0 +1,417 @@
+# Phase 4 D3: Alloc Gate Specialization Design Memo
+
+## Purpose
+
+Specialize the routing branches of `tiny_alloc_gate_fast()` for MIXED (LEGACY-first path)
+
+**Background**:
+- Phase 3 complete: +8.93% cumulative gain (37.5M → 51M ops/s)
+- Perf analysis: `tiny_alloc_gate_fast` at **12.75% self** (HIGH priority)
+- MIXED workload: 99% of traffic takes the LEGACY route (MID_V3 OFF)
+- Current state: a switch dispatches across all routes (LEGACY/ULTRA/MID/V7) → high misprediction cost
+
+## Observations
+
+### Current State (Phase 3 Baseline)
+- `tiny_alloc_gate_fast`: 12.75% self + children overhead
+- MIXED workload characteristics:
+  - 99% of traffic takes the LEGACY route (`HAKMEM_MID_V3_ENABLED=0`)
+  - Uniform branching across C0-C7 → prediction misses
+- Existing optimization (Phase 2 B3):
+  - LEGACY-first branching is already implemented inside `malloc_tiny_fast_for_class()`
+  - However, the routing policy branches in `tiny_alloc_gate_fast()` are not yet optimized
+
+### Bottleneck Analysis
+
+**Current flow** (`core/box/tiny_alloc_gate_box.h:139-217`):
+```c
+void* tiny_alloc_gate_fast(size_t size) {
+    int class_idx = hak_tiny_size_to_class(size);
+    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;
+
+    TinyRoutePolicy route = tiny_route_get(class_idx);  // ← Policy lookup
+
+    // Branching on route policy (uniform dispatch, poor prediction)
+    if (route == ROUTE_POOL_ONLY) return NULL;
+
+    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
+
+    if (route == ROUTE_TINY_ONLY) {
+        // Hot path (99% of Mixed
traffic)
+        return user_ptr;
+    }
+
+    // ROUTE_TINY_FIRST: fallback allowed
+    return user_ptr;
+}
+```
+
+**Problem**:
+- Policy lookup overhead: `tiny_route_get()` call every allocation
+- Branch on `route == ROUTE_POOL_ONLY`: rare but evaluated every time
+- Branch on `route == ROUTE_TINY_ONLY` vs `ROUTE_TINY_FIRST`: the Mixed default is TINY_FIRST (almost no behavioral difference)
+- Total cost: ~2-3 branches + 1 policy lookup per allocation
+
+**Expected savings**:
+- Eliminate policy lookup for known-LEGACY workloads
+- Convert policy branches to LIKELY-hinted checks
+- Reduce instruction count by 5-10 per allocation
+
+## Implementation Approach
+
+### Strategy: LEGACY-first with Static Route Assumption
+
+**Pattern**: Similar to Phase 2 B3 (routing shape optimization)
+- Reference: `core/front/malloc_tiny_fast.h:262-278`
+- Proven approach: LIKELY hint + cold helper for rare routes
+- Expected branch prediction improvement: 75% miss rate → <5% miss rate
+
+### L0: Env (reversible)
+
+- `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default: 0, OFF)
+- Opt-in enables the specialized path (be cautious about enabling it by default)
+- Rollback: ENV=0 immediately restores the existing path
+
+### L1: SpecializedGateBox (boundary: one location)
+
+#### Optimized Gate Structure
+
+**Before** (current):
+```c
+void* tiny_alloc_gate_fast(size_t size) {
+    int class_idx = hak_tiny_size_to_class(size);
+    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;
+
+    TinyRoutePolicy route = tiny_route_get(class_idx);
+
+    if (route == ROUTE_POOL_ONLY) return NULL;
+
+    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
+
+    if (route == ROUTE_TINY_ONLY) return user_ptr;
+
+    return user_ptr;  // ROUTE_TINY_FIRST
+}
+```
+
+**After** (optimized for MIXED):
+```c
+void* tiny_alloc_gate_fast(size_t size) {
+    int class_idx = hak_tiny_size_to_class(size);
+    if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
+        return NULL;
+    }
+
+    // Phase 4 D3: LEGACY-first gate specialization (ENV gated)
+    if (TINY_HOT_LIKELY(alloc_gate_shape_enabled())) {
+        // MIXED fast path: Avoid tiny_route_get() overhead.
+        // NOTE: We do NOT assume TINY_ONLY vs TINY_FIRST; both return user_ptr.
+        // Safety: still honor POOL_ONLY if configured via HAKMEM_TINY_PROFILE (e.g., "hot"/"off").
+        if (__builtin_expect(g_tiny_route[class_idx & 7] == ROUTE_POOL_ONLY, 0)) {
+            return NULL;
+        }
+        // Direct to malloc_tiny_fast_for_class, skip tiny_route_get()
+        return malloc_tiny_fast_for_class(size, class_idx);
+    }
+
+    // Original path (backward compatible)
+    TinyRoutePolicy route = tiny_route_get(class_idx);
+
+    if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) return NULL;
+
+    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
+
+    if (TINY_HOT_LIKELY(route == ROUTE_TINY_ONLY)) return user_ptr;
+
+    return user_ptr;
+}
+```
+
+#### Branch Prediction Impact
+
+**Current** (uniform branching):
+- Policy lookup: Always executed (1 function call overhead)
+- `route == ROUTE_POOL_ONLY`: 0% hit rate (but checked every time)
+- `route == ROUTE_TINY_ONLY`: 99% hit rate (but no LIKELY hint)
+- Total overhead: ~10-15 cycles per allocation
+
+**Optimized** (LEGACY-first):
+- ENV check: cached after the first call (< 1 cycle amortized)
+- Direct path: Skip `tiny_route_get()` (and its release logging branch) (save ~5-7 cycles)
+- LIKELY hint: CPU predictor trained to expect fast path
+- Total savings: ~8-12 cycles per allocation
+
+**Expected gain**: +1-2% on MIXED (conservative estimate based on 12.75% self%)
+
+### Implementation Instructions
+
+#### File 1: `core/box/tiny_alloc_gate_shape_env_box.h` (new)
+
+**Role**: ENV gate for alloc gate shape optimization
+
+**API**:
+```c
+// ENV gate: HAKMEM_ALLOC_GATE_SHAPE=0/1 (default: 0)
+static inline int alloc_gate_shape_enabled(void) {
+    static int g_enable = -1;  // Lazy init sentinel
+    if (__builtin_expect(g_enable == -1, 0)) {
+        const char* e = getenv("HAKMEM_ALLOC_GATE_SHAPE");
+        g_enable = (e && *e && *e != '0') ?
1 : 0;
+    }
+    return g_enable;
+}
+```
+
+**Integration**: Header-only, single-responsibility (ENV caching only)
+
+#### File 2: Modify `core/box/tiny_alloc_gate_box.h` (existing)
+
+**Location**: `tiny_alloc_gate_fast()` function (lines 139-217)
+
+**Changes**:
+1. Include new ENV box header:
+   ```c
+   #include "tiny_alloc_gate_shape_env_box.h"
+   ```
+
+2. Add LEGACY-first fast path before existing route dispatch:
+   ```c
+   // Phase 4 D3: LEGACY-first gate specialization (skips tiny_route_get(), still honors ROUTE_POOL_ONLY)
+   if (TINY_HOT_LIKELY(alloc_gate_shape_enabled())) {
+       if (__builtin_expect(g_tiny_route[class_idx & 7] == ROUTE_POOL_ONLY, 0)) return NULL;
+       return malloc_tiny_fast_for_class(size, class_idx);
+   }
+   ```
+
+3. Add LIKELY hints to existing branches (backward compatible path):
+   ```c
+   if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) return NULL;
+
+   void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
+
+   if (TINY_HOT_LIKELY(route == ROUTE_TINY_ONLY)) return user_ptr;
+   ```
+
+**Safety**:
+- ENV gate ensures opt-in behavior
+- Fallback path unchanged (existing validation/diagnostics preserved)
+- No algorithmic changes (only branch shape optimization)
+
+## A/B Test
+
+### Test Configuration
+
+**Workload**: Mixed (10-run, 20M iters, ws=400)
+- Baseline: `HAKMEM_ALLOC_GATE_SHAPE=0`
+- Optimized: `HAKMEM_ALLOC_GATE_SHAPE=1`
+
+**Commands**:
+```bash
+# Baseline
+HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
+  ./bench_random_mixed_hakmem 20000000 400 1
+
+# Optimized
+HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
+  ./bench_random_mixed_hakmem 20000000 400 1
+```
+
+### Success Criteria
+
+**GO**: Mean gain >= +1.0%, Median >= +0.0%
+- **Promote to default** in `MIXED_TINYV3_C7_SAFE` preset
+- Add to `core/bench_profile.h`: `bench_setenv_default("HAKMEM_ALLOC_GATE_SHAPE", "1");`
+- Document in `docs/analysis/ENV_PROFILE_PRESETS.md`
+
+**NEUTRAL**: -1.0% < gain < +1.0%
+- **Freeze as research box** (default OFF)
+- Document findings, keep implementation for future study
+
+**NO-GO**: Mean
gain <= -1.0%
+- **Freeze and archive** (default OFF, do not pursue)
+- Document regression cause, learn for future optimizations
+
+## Expected Results
+
+### Performance Gain Estimation
+
+**Target**: `tiny_alloc_gate_fast` at 12.75% self
+
+**Optimization**:
+- Eliminate policy lookup: ~5-7 cycles saved
+- Add LIKELY hints: ~3-5 cycles saved (branch prediction)
+- Total savings: ~8-12 cycles per allocation
+
+**Calculation**:
+- Baseline: ~100 cycles per allocation (estimated)
+- Savings: 8-12 cycles (8-12% of allocation cost)
+- `tiny_alloc_gate_fast` contribution: 12.75% self
+- Expected gain: 12.75% × (8-12%) = **+1.0-1.5%** (conservative)
+
+**Realistic range**: +1-2% on MIXED workload
+
+### Risk Assessment
+
+**Risk Level**: LOW
+
+**Why**:
+- Only branch shape optimization (no algorithmic change)
+- ENV gate allows instant rollback
+- Fallback path unchanged (safety preserved)
+- Pattern proven by Phase 2 B3 (+2.89% success)
+
+**Failure modes**:
+- Policy lookup cost lower than expected → minimal/no gain
+- ENV check overhead outweighs savings → slight regression
+- Both cases: Rollback with `HAKMEM_ALLOC_GATE_SHAPE=0`
+
+## Non-Goals
+
+**NOT in scope**:
+- Route algorithm change (only branch shape)
+- Learner integration (optional for future)
+- C6-heavy workload optimization (already uses MID_V3 ON)
+- Policy snapshot bypass (already done in Phase 3 C3)
+
+## Reference Patterns
+
+### Similar Optimizations
+
+**Phase 2 B3** (Routing shape optimization):
+- File: `core/front/malloc_tiny_fast.h:262-278`
+- Pattern: `if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY))`
+- Result: **+2.89% on MIXED**, +9.13% on C6-heavy
+- Lesson: LIKELY hints + cold helpers are highly effective
+
+**Phase 3 D1** (Free route cache):
+- File: `core/box/tiny_free_route_cache_env_box.h`
+- Pattern: ENV gate with lazy init (-1 sentinel)
+- Result: **+2.19% on MIXED** (promoted to default)
+- Lesson: ENV gates with cached values have minimal overhead
+
+**Phase 3 C3** (Static routing):
+-
File: `core/box/tiny_static_route_box.h`
+- Pattern: Static route table (bypass policy snapshot)
+- Result: **+2.20% on MIXED**
+- Lesson: Eliminating dynamic lookups pays off
+
+### Code Examples
+
+**B3 Pattern** (LIKELY-first branching):
+```c
+if (TINY_HOT_LIKELY(env_cfg->alloc_route_shape)) {
+    if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY)) {
+        // Hot path: LEGACY fast (99% traffic)
+        void* ptr = tiny_hot_alloc_fast(class_idx);
+        if (TINY_HOT_LIKELY(ptr != NULL)) return ptr;
+        return tiny_cold_refill_and_alloc(class_idx);
+    }
+    // Rare routes: cold helper
+    return tiny_alloc_route_cold(route_kind, class_idx, size);
+}
+```
+
+**D1 Pattern** (ENV gate with lazy init):
+```c
+static inline int tiny_free_static_route_enabled(void) {
+    static int g_enable = -1;
+    if (__builtin_expect(g_enable == -1, 0)) {
+        const char* e = getenv("HAKMEM_FREE_STATIC_ROUTE");
+        g_enable = (e && *e && *e != '0') ? 1 : 0;
+    }
+    return g_enable;
+}
+```
+
+## Integration Plan
+
+### Step 1: Implementation
+
+1. Create `core/box/tiny_alloc_gate_shape_env_box.h`
+   - Single function: `alloc_gate_shape_enabled()`
+   - Lazy init with -1 sentinel
+   - Return cached ENV value
+
+2. Modify `core/box/tiny_alloc_gate_box.h`
+   - Add include for new ENV box
+   - Insert LEGACY-first fast path (ENV gated)
+   - Add LIKELY hints to existing branches
+   - Preserve all validation/diagnostic logic
+
+### Step 2: A/B Testing
+
+1. Build with optimization:
+   ```bash
+   make clean && make -j8 CFLAGS="-O3 -flto"
+   ```
+
+2. Run 10-run baseline:
+   ```bash
+   for i in {1..10}; do
+     HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
+       ./bench_random_mixed_hakmem 20000000 400 1
+   done
+   ```
+
+3. Run 10-run optimized:
+   ```bash
+   for i in {1..10}; do
+     HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
+       ./bench_random_mixed_hakmem 20000000 400 1
+   done
+   ```
+
+4.
Calculate statistics:
+   - Mean, Median, StdDev for both configurations
+   - Compare against success criteria
+   - Document results in `PHASE4_D3_ALLOC_GATE_AB_TEST_RESULTS.md`
+
+### Step 3: Decision & Promotion
+
+**If GO**:
+1. Update `core/bench_profile.h` (MIXED_TINYV3_C7_SAFE preset)
+2. Update `docs/analysis/ENV_PROFILE_PRESETS.md`
+3. Update `CURRENT_TASK.md` (Phase 4 D3 complete)
+4. Commit with message: "Phase 4 D3: Alloc Gate Specialization (+X.X%)"
+
+**If NEUTRAL or NO-GO**:
+1. Document results in `PHASE4_D3_ALLOC_GATE_AB_TEST_RESULTS.md`
+2. Keep ENV gate at default OFF
+3. Archive as research box
+4. Move to next candidate optimization
+
+## Validation Checklist
+
+**Pre-implementation**:
+- [ ] Design document reviewed and approved
+- [ ] Integration points identified (2 files)
+- [ ] Reference patterns studied (B3, D1, C3)
+- [ ] ENV gate strategy confirmed (opt-in, default OFF)
+
+**Implementation**:
+- [ ] `tiny_alloc_gate_shape_env_box.h` created
+- [ ] `tiny_alloc_gate_box.h` modified (LEGACY-first path added)
+- [ ] LIKELY hints added to fallback branches
+- [ ] Clean compilation (no new warnings)
+- [ ] Health check passes: `scripts/verify_health_profiles.sh`
+
+**A/B Testing**:
+- [ ] Baseline 10-run completed (SHAPE=0)
+- [ ] Optimized 10-run completed (SHAPE=1)
+- [ ] Statistics calculated (mean, median, stddev)
+- [ ] Results documented
+- [ ] Success criteria evaluated (GO/NEUTRAL/NO-GO)
+
+**Promotion** (if GO):
+- [ ] `bench_profile.h` updated (default SHAPE=1)
+- [ ] `ENV_PROFILE_PRESETS.md` updated
+- [ ] `CURRENT_TASK.md` updated (Phase 4 complete)
+- [ ] Commit created with clear message
+- [ ] Cumulative gain updated (+X.X% total)
+
+---
+
+**Phase 4 D3 Status**: DESIGN COMPLETE
+**Next Step**: Implementation → A/B Test → Decision
+**Expected Outcome**: +1-2% gain (conservative), LOW risk
+**Rollback Plan**: `HAKMEM_ALLOC_GATE_SHAPE=0` (instant revert)
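
Reviewer sketch (not part of the patch): the GO/NEUTRAL/NO-GO criteria from the design memo could be evaluated mechanically once the two 10-run series are collected. The helper below is a minimal illustration under stated assumptions. The names `ab_mean`, `ab_median`, `ab_gain_pct`, `ab_judge`, and `AB_MAX_RUNS` are hypothetical and do not exist in the repository. The memo does not say what happens when the mean gain is at least +1.0% but the median gain is negative, so this sketch treats that case as NEUTRAL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AB_MAX_RUNS 64  /* hypothetical upper bound on runs per series */

typedef enum { AB_GO, AB_NEUTRAL, AB_NO_GO } ab_verdict;

/* qsort comparator for doubles (ascending). */
static int ab_cmp_double(const void* a, const void* b) {
    double x = *(const double*)a, y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n throughput samples (ops/s). */
static double ab_mean(const double* v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median of n samples; sorts a local copy so the input stays untouched. */
static double ab_median(const double* v, size_t n) {
    assert(n > 0 && n <= AB_MAX_RUNS);
    double tmp[AB_MAX_RUNS];
    memcpy(tmp, v, n * sizeof(double));
    qsort(tmp, n, sizeof(double), ab_cmp_double);
    return (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
}

/* Relative gain of opt over base, in percent. */
static double ab_gain_pct(double base, double opt) {
    return (opt / base - 1.0) * 100.0;
}

/* Apply the memo's thresholds: GO needs mean >= +1.0% and median >= +0.0%,
 * NO-GO is mean <= -1.0%, everything else is NEUTRAL (assumption for the
 * case the memo leaves undefined). */
static ab_verdict ab_judge(const double* base, const double* opt, size_t n) {
    double mean_gain = ab_gain_pct(ab_mean(base, n), ab_mean(opt, n));
    double median_gain = ab_gain_pct(ab_median(base, n), ab_median(opt, n));
    if (mean_gain >= 1.0 && median_gain >= 0.0) return AB_GO;
    if (mean_gain <= -1.0) return AB_NO_GO;
    return AB_NEUTRAL;
}
```

To use it, fill two arrays with the ops/s figures printed by the `HAKMEM_ALLOC_GATE_SHAPE=0` and `HAKMEM_ALLOC_GATE_SHAPE=1` runs and pass them to `ab_judge()`. Comparing medians alongside means keeps a single outlier run from flipping the decision.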