Phase 4 D3 Design: Alloc Gate Shape
@ -303,18 +303,20 @@ static inline void* tiny_alloc_gate_fast(size_t size)
|
|||||||
---
|
---
|
||||||
|
|
||||||
### Phase 3 D3: Alloc Gate Specialization (MEDIUM PRIORITY)
|
### Phase 3 D3: Alloc Gate Specialization (MEDIUM PRIORITY)
|
||||||
**Target**: `tiny_alloc_gate_fast()` for LEGACY-only route
|
**Target**: Minimize the branch shape of `tiny_alloc_gate_fast()` (for MIXED)
|
||||||
**Expected Gain**: +1-2%
|
**Expected Gain**: +1-2%
|
||||||
**Risk**: LOW
|
**Risk**: LOW
|
||||||
**Effort**: 2-3 hours
|
**Effort**: 2-3 hours
|
||||||
|
|
||||||
**Implementation**:
|
**Implementation**:
|
||||||
1. Create `tiny_alloc_gate_fast_legacy()` specialized variant
|
1. New ENV gate: `HAKMEM_ALLOC_GATE_SHAPE=0/1`
|
||||||
2. Eliminate ROUTE_POOL_ONLY and ROUTE_TINY_FIRST branches
|
2. Avoid `tiny_route_get()` and read `g_tiny_route[]` directly instead (skips the release logging branch)
|
||||||
3. Use in MIXED profile where all classes are LEGACY
|
3. Always honor `ROUTE_POOL_ONLY` (must not break `HAKMEM_TINY_PROFILE=hot/off`)
|
||||||
4. A/B test: BASELINE vs D3
|
4. A/B test: BASELINE vs D3
|
||||||
|
|
||||||
**ENV Gate**: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=1` (default: 0)
|
**Design**: `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
||||||
|
|
||||||
|
**ENV Gate**: `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default: 0)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -376,7 +378,7 @@ static inline void* tiny_alloc_gate_fast(size_t size)
|
|||||||
|
|
||||||
2. **Optional**: Phase 3 D3 (Alloc gate specialization) - pending perf validation
|
2. **Optional**: Phase 3 D3 (Alloc gate specialization) - pending perf validation
|
||||||
- Only proceed if perf shows ≥5% self% in alloc gate
|
- Only proceed if perf shows ≥5% self% in alloc gate
|
||||||
- ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`
|
- ENV: `HAKMEM_ALLOC_GATE_SHAPE=0/1`
|
||||||
|
|
||||||
3. **Phase 4 Planning**: If no more 5%+ targets, prepare Phase 4 roadmap
|
3. **Phase 4 Planning**: If no more 5%+ targets, prepare Phase 4 roadmap
|
||||||
|
|
||||||
|
|||||||
@ -85,9 +85,9 @@ perf report --stdio
|
|||||||
|
|
||||||
Goal: Trim alloc gate branches to fit the fixed configuration of the Mixed mainline and gain 1-2%.
|
Goal: Trim alloc gate branches to fit the fixed configuration of the Mixed mainline and gain 1-2%.
|
||||||
|
|
||||||
- Design memo: `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
|
||||||
- Implementation instructions: `docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md`
|
- Implementation instructions: `docs/analysis/PHASE4_ALLOC_GATE_SPECIALIZATION_NEXT_INSTRUCTIONS.md`
|
||||||
- ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1` (default 0)
|
- Design memo (latest): `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
||||||
|
- ENV: `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default 0)
|
||||||
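- Note: Always include the "safe enable" check so that no combination of ENV settings can break things
|
- Note: Always include the "safe enable" check so that no combination of ENV settings can break things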
|
||||||
|
|
||||||
## Next Candidates (start only if perf shows more than 5%)
|
## Next Candidates (start only if perf shows more than 5%)
|
||||||
|
|||||||
@ -1,5 +1,8 @@
|
|||||||
# Phase 3 D3: Alloc Gate Specialization (Mixed "LEGACY-only" Shortest Path) Design Memo
|
# Phase 3 D3: Alloc Gate Specialization (Mixed "LEGACY-only" Shortest Path) Design Memo
|
||||||
|
|
||||||
|
> NOTE (2025-12-13): This design memo has been superseded by the later **Phase 4 D3**.
|
||||||
|
> For the latest design, ENV, and implementation instructions, see `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`.
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
In the Mixed mainline (`MIXED_TINYV3_C7_SAFE`),
|
In the Mixed mainline (`MIXED_TINYV3_C7_SAFE`),
|
||||||
@ -84,4 +87,3 @@ Mixed 10-run (recommended: 20M iters):
|
|||||||
|
|
||||||
- `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0` (immediately OFF)
|
- `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0` (immediately OFF)
|
||||||
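- Keep the box independent and leave the existing gate untouched (swap in at a single boundary)
|
- Keep the box independent and leave the existing gate untouched (swap in at a single boundary)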
|
||||||
|
|
||||||
|
|||||||
@ -238,14 +238,14 @@ IF mean_gain >= +1.0% AND median_gain >= +0.0%:
|
|||||||
|
|
||||||
**Requirement**: perf validation showing `tiny_alloc_gate_fast` self% ≥ 5%
|
**Requirement**: perf validation showing `tiny_alloc_gate_fast` self% ≥ 5%
|
||||||
|
|
||||||
**Design**: `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
**Design**: `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
||||||
|
|
||||||
**Strategy**: Specialize alloc gate for fixed MIXED configuration
|
**Strategy**: Specialize alloc gate for fixed MIXED configuration
|
||||||
- Eliminate dynamic checks
|
- Eliminate dynamic checks
|
||||||
- Inline hot paths
|
- Inline hot paths
|
||||||
- Reduce branch complexity
|
- Reduce branch complexity
|
||||||
|
|
||||||
**ENV**: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`
|
**ENV**: `HAKMEM_ALLOC_GATE_SHAPE=0/1`
|
||||||
|
|
||||||
**Decision Criteria**:
|
**Decision Criteria**:
|
||||||
- IF perf shows ≥5% self% in alloc gate → Proceed with D3
|
- IF perf shows ≥5% self% in alloc gate → Proceed with D3
|
||||||
|
|||||||
@ -29,24 +29,16 @@ perf report --stdio --no-children
|
|||||||
|
|
||||||
## Step 1: D3 Implementation (Box Theory)
|
## Step 1: D3 Implementation (Box Theory)
|
||||||
|
|
||||||
Design memo (SSOT): `docs/analysis/PHASE3_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
Design memo (SSOT): `docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md`
|
||||||
|
|
||||||
### Box Layout (required)
|
### Box Layout (required)
|
||||||
|
|
||||||
- L0: ENV (reversible)
|
- L0: ENV (reversible)
|
||||||
- `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1` (default 0)
|
- `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default 0)
|
||||||
- L1: Safe-Enable check (single boundary)
|
- L1: Gate Shape (single boundary)
|
||||||
- Check once that the Mixed mainline is "LEGACY-only", and take the fast path only if that holds
|
- Change only the branch shape inside `tiny_alloc_gate_fast()` (semantics unchanged)
|
||||||
- Always disable when the Learner is ON (routes are updated dynamically)
|
- Read `g_tiny_route[]` directly instead of calling `tiny_route_get()` (avoids the release logging branch)
|
||||||
- L2: Integration (single boundary)
|
- **Safety**: Always return `NULL` when `g_tiny_route[ci] == ROUTE_POOL_ONLY` (must not break `HAKMEM_TINY_PROFILE=hot/off`)
|
||||||
- Swap in only at the entry of `tiny_alloc_gate_fast()` (existing semantics unchanged)
|
|
||||||
|
|
||||||
### Safe-Enable Conditions ("auto-disable" rather than Fail-Fast)
|
|
||||||
|
|
||||||
If any of the following applies, D3 is **disabled immediately** and falls back to the existing path:
|
|
||||||
- The learner is enabled (e.g., `HAKMEM_SMALL_LEARNER_V7_ENABLED=1`)
|
|
||||||
- route_kind contains anything other than LEGACY (V7/MID/ULTRA/TINY_FIRST, etc.)
|
|
||||||
- Any other fixed assumption of the Mixed mainline no longer holds
|
|
||||||
|
|
||||||
## Step 2: A/B (GO/NO-GO)
|
## Step 2: A/B (GO/NO-GO)
|
||||||
|
|
||||||
@ -54,12 +46,12 @@ Mixed 10-run (recommended: iter=20M, ws=400, 1T):
|
|||||||
|
|
||||||
- Baseline:
|
- Baseline:
|
||||||
```bash
|
```bash
|
||||||
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_LEGACY_ONLY=0 \
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
|
||||||
./bench_random_mixed_hakmem 20000000 400 1
|
./bench_random_mixed_hakmem 20000000 400 1
|
||||||
```
|
```
|
||||||
- Optimized:
|
- Optimized:
|
||||||
```bash
|
```bash
|
||||||
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_LEGACY_ONLY=1 \
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
|
||||||
./bench_random_mixed_hakmem 20000000 400 1
|
./bench_random_mixed_hakmem 20000000 400 1
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|||||||
417
docs/analysis/PHASE4_D3_ALLOC_GATE_SPECIALIZATION_1_DESIGN.md
Normal file
@ -0,0 +1,417 @@
|
|||||||
|
# Phase 4 D3: Alloc Gate Specialization Design Memo
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Specialize the routing branches of `tiny_alloc_gate_fast()` for MIXED (LEGACY-first path).
|
||||||
|
|
||||||
|
**Background**:
- Phase 3 complete: +8.93% cumulative gain (37.5M → 51M ops/s)
- Perf analysis: `tiny_alloc_gate_fast` at **12.75% self** (HIGH priority)
- MIXED workload: 99% takes the LEGACY route (MID_V3 OFF)
- Current state: every route (LEGACY/ULTRA/MID/V7) is dispatched through a switch → high misprediction cost
|
||||||
|
|
||||||
|
## Observations
|
||||||
|
|
||||||
|
### Current State (Phase 3 Baseline)
|
||||||
|
- `tiny_alloc_gate_fast`: 12.75% self + children overhead
- MIXED workload characteristics:
  - 99% takes the LEGACY route (`HAKMEM_MID_V3_ENABLED=0`)
  - Uniform branching across C0-C7 → prediction misses
- Existing optimization (Phase 2 B3):
  - LEGACY-first branching is already implemented inside `malloc_tiny_fast_for_class()`
  - However, the routing policy branches in `tiny_alloc_gate_fast()` are not yet optimized
|
||||||
|
|
||||||
|
### Bottleneck Analysis
|
||||||
|
|
||||||
|
**Current flow** (`core/box/tiny_alloc_gate_box.h:139-217`):
|
||||||
|
```c
void* tiny_alloc_gate_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;

    TinyRoutePolicy route = tiny_route_get(class_idx);  // ← Policy lookup

    // Branching on route policy (uniform dispatch, poor prediction)
    if (route == ROUTE_POOL_ONLY) return NULL;

    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);

    if (route == ROUTE_TINY_ONLY) {
        // Hot path (99% of Mixed traffic)
        return user_ptr;
    }

    // ROUTE_TINY_FIRST: fallback allowed
    return user_ptr;
}
```
|
||||||
|
|
||||||
|
**Problem**:
|
||||||
|
- Policy lookup overhead: `tiny_route_get()` call every allocation
|
||||||
|
- Branch on `route == ROUTE_POOL_ONLY`: rare but evaluated every time
|
||||||
|
- Branch on `route == ROUTE_TINY_ONLY` vs `ROUTE_TINY_FIRST`: the Mixed default is TINY_FIRST (almost no behavioral difference)
|
||||||
|
- Total cost: ~2-3 branches + 1 policy lookup per allocation
|
||||||
|
|
||||||
|
**Expected savings**:
|
||||||
|
- Eliminate policy lookup for known-LEGACY workloads
|
||||||
|
- Convert policy branches to LIKELY-hinted checks
|
||||||
|
- Reduce instruction count by 5-10 per allocation
|
||||||
|
|
||||||
|
## Implementation Approach
|
||||||
|
|
||||||
|
### Strategy: LEGACY-first with Static Route Assumption
|
||||||
|
|
||||||
|
**Pattern**: Similar to Phase 2 B3 (routing shape optimization)
|
||||||
|
- Reference: `core/front/malloc_tiny_fast.h:262-278`
|
||||||
|
- Proven approach: LIKELY hint + cold helper for rare routes
|
||||||
|
- Expected branch prediction improvement: 75% miss rate → <5% miss rate
|
||||||
|
|
||||||
|
### L0: Env (reversible)
|
||||||
|
|
||||||
|
- `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default: 0, OFF)
- Opt in to enable the specialized path (be cautious about making it always-on)
- Rollback: setting `HAKMEM_ALLOC_GATE_SHAPE=0` returns to the existing path immediately
|
||||||
|
|
||||||
|
### L1: SpecializedGateBox (boundary: single point)
|
||||||
|
|
||||||
|
#### Optimized Gate Structure
|
||||||
|
|
||||||
|
**Before** (current):
|
||||||
|
```c
void* tiny_alloc_gate_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;

    TinyRoutePolicy route = tiny_route_get(class_idx);

    if (route == ROUTE_POOL_ONLY) return NULL;

    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);

    if (route == ROUTE_TINY_ONLY) return user_ptr;

    return user_ptr;  // ROUTE_TINY_FIRST
}
```
|
||||||
|
|
||||||
|
**After** (optimized for MIXED):
|
||||||
|
```c
void* tiny_alloc_gate_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
        return NULL;
    }

    // Phase 4 D3: LEGACY-first gate specialization (ENV gated)
    if (TINY_HOT_LIKELY(alloc_gate_shape_enabled())) {
        // MIXED fast path: Avoid tiny_route_get() overhead.
        // NOTE: We do NOT assume TINY_ONLY vs TINY_FIRST; both return user_ptr.
        // Safety: still honor POOL_ONLY if configured via HAKMEM_TINY_PROFILE (e.g., "hot"/"off").
        if (__builtin_expect(g_tiny_route[class_idx & 7] == ROUTE_POOL_ONLY, 0)) {
            return NULL;
        }
        // Direct to malloc_tiny_fast_for_class, skip tiny_route_get()
        return malloc_tiny_fast_for_class(size, class_idx);
    }

    // Original path (backward compatible)
    TinyRoutePolicy route = tiny_route_get(class_idx);

    if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) return NULL;

    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);

    if (TINY_HOT_LIKELY(route == ROUTE_TINY_ONLY)) return user_ptr;

    return user_ptr;
}
```
|
||||||
|
|
||||||
|
#### Branch Prediction Impact
|
||||||
|
|
||||||
|
**Current** (uniform branching):
|
||||||
|
- Policy lookup: Always executed (1 function call overhead)
|
||||||
|
- `route == ROUTE_POOL_ONLY`: 0% hit rate (but checked every time)
|
||||||
|
- `route == ROUTE_TINY_ONLY`: 99% hit rate (but no LIKELY hint)
|
||||||
|
- Total overhead: ~10-15 cycles per allocation
|
||||||
|
|
||||||
|
**Optimized** (LEGACY-first):
|
||||||
|
- ENV check: 99% cached (< 1 cycle amortized)
|
||||||
|
- Direct path: Skip `tiny_route_get()` (and its release logging branch) (save ~5-7 cycles)
|
||||||
|
- LIKELY hint: CPU predictor trained to expect fast path
|
||||||
|
- Total savings: ~8-12 cycles per allocation
|
||||||
|
|
||||||
|
**Expected gain**: +1-2% on MIXED (conservative estimate based on 12.75% self%)
|
||||||
|
|
||||||
|
### Implementation Instructions
|
||||||
|
|
||||||
|
#### File 1: `core/box/tiny_alloc_gate_shape_env_box.h` (new)
|
||||||
|
|
||||||
|
**Role**: ENV gate for alloc gate shape optimization
|
||||||
|
|
||||||
|
**API**:
|
||||||
|
```c
#include <stdlib.h>  // getenv

// ENV gate: HAKMEM_ALLOC_GATE_SHAPE=0/1 (default: 0)
static inline int alloc_gate_shape_enabled(void) {
    static int g_enable = -1;  // Lazy init sentinel
    if (__builtin_expect(g_enable == -1, 0)) {
        // Read once; unset, empty, or a leading '0' means disabled.
        const char* e = getenv("HAKMEM_ALLOC_GATE_SHAPE");
        g_enable = (e && *e && *e != '0') ? 1 : 0;
    }
    return g_enable;
}
```
|
||||||
|
|
||||||
|
**Integration**: Header-only, single-responsibility (ENV caching only)
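The truthiness rule used by the gate can be checked in isolation. The sketch below factors it into a pure function; `gate_flag_from_string` is a hypothetical name introduced only for illustration and is not part of the proposed header.

```c
#include <stddef.h>

/* Hypothetical helper: the same rule as alloc_gate_shape_enabled(),
 * factored out so it can be exercised without touching the environment. */
static int gate_flag_from_string(const char* e) {
    /* NULL (unset) and "" are disabled; so is any value starting with '0'. */
    return (e && *e && *e != '0') ? 1 : 0;
}
```

Under this rule only the first character matters: a value such as `01` reads as disabled, while any value that does not start with `0` (for example `yes`) reads as enabled.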
|
||||||
|
|
||||||
|
#### File 2: Modify `core/box/tiny_alloc_gate_box.h` (existing)
|
||||||
|
|
||||||
|
**Location**: `tiny_alloc_gate_fast()` function (lines 139-217)
|
||||||
|
|
||||||
|
**Changes**:
|
||||||
|
1. Include new ENV box header:
|
||||||
|
   ```c
   #include "tiny_alloc_gate_shape_env_box.h"
   ```
|
||||||
|
|
||||||
|
2. Add LEGACY-first fast path before existing route dispatch:
|
||||||
|
   ```c
   // Phase 4 D3: LEGACY-first gate specialization
   if (TINY_HOT_LIKELY(alloc_gate_shape_enabled())) {
       // Skip tiny_route_get(), but still honor POOL_ONLY
       // (HAKMEM_TINY_PROFILE=hot/off must keep working).
       if (__builtin_expect(g_tiny_route[class_idx & 7] == ROUTE_POOL_ONLY, 0)) {
           return NULL;
       }
       return malloc_tiny_fast_for_class(size, class_idx);
   }
   ```
|
||||||
|
|
||||||
|
3. Add LIKELY hints to existing branches (backward compatible path):
|
||||||
|
   ```c
   if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) return NULL;

   void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);

   if (TINY_HOT_LIKELY(route == ROUTE_TINY_ONLY)) return user_ptr;
   ```
|
||||||
|
|
||||||
|
**Safety**:
|
||||||
|
- ENV gate ensures opt-in behavior
|
||||||
|
- Fallback path unchanged (existing validation/diagnostics preserved)
|
||||||
|
- No algorithmic changes (only branch shape optimization)
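One way to gain confidence in the "branch shape only" claim is to compare both shapes over every route value. The following is a minimal sketch under stated assumptions: all `demo_*` / `DEMO_*` names are hypothetical stand-ins for the real HAKMEM types and functions, and the class count of 8 is assumed from the C0-C7 range mentioned in this memo.

```c
#include <stddef.h>

enum { DEMO_NUM_CLASSES = 8 };  /* assumption: classes C0-C7 */

typedef enum {
    DEMO_ROUTE_TINY_ONLY,
    DEMO_ROUTE_TINY_FIRST,
    DEMO_ROUTE_POOL_ONLY
} DemoRoute;

static DemoRoute demo_route[DEMO_NUM_CLASSES];  /* stand-in for g_tiny_route[] */
static char demo_block[DEMO_NUM_CLASSES];       /* fake per-class allocation result */

/* Stub for malloc_tiny_fast_for_class(): a distinct non-NULL pointer per class. */
static void* demo_alloc_for_class(int ci) { return &demo_block[ci]; }

/* Shape 0: route lookup first, then the existing branch sequence. */
static void* demo_gate_original(int ci) {
    if (ci < 0 || ci >= DEMO_NUM_CLASSES) return NULL;
    DemoRoute route = demo_route[ci];
    if (route == DEMO_ROUTE_POOL_ONLY) return NULL;
    void* p = demo_alloc_for_class(ci);
    if (route == DEMO_ROUTE_TINY_ONLY) return p;
    return p;  /* TINY_FIRST */
}

/* Shape 1: a single POOL_ONLY check, then straight to the allocator. */
static void* demo_gate_shaped(int ci) {
    if (ci < 0 || ci >= DEMO_NUM_CLASSES) return NULL;
    if (demo_route[ci & 7] == DEMO_ROUTE_POOL_ONLY) return NULL;
    return demo_alloc_for_class(ci);
}

/* Returns 1 if both shapes agree for every class index under every route. */
static int demo_shapes_agree(void) {
    for (int r = DEMO_ROUTE_TINY_ONLY; r <= DEMO_ROUTE_POOL_ONLY; r++) {
        for (int k = 0; k < DEMO_NUM_CLASSES; k++) demo_route[k] = (DemoRoute)r;
        /* Include one out-of-range index on each side. */
        for (int ci = -1; ci <= DEMO_NUM_CLASSES; ci++) {
            if (demo_gate_original(ci) != demo_gate_shaped(ci)) return 0;
        }
    }
    return 1;
}
```

A check of this kind only models the control flow; it says nothing about the release logging branch inside the real `tiny_route_get()`, which is the part the shaped path is meant to skip.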
|
||||||
|
|
||||||
|
## A/B Test
|
||||||
|
|
||||||
|
### Test Configuration
|
||||||
|
|
||||||
|
**Workload**: Mixed (10-run, 20M iters, ws=400)
|
||||||
|
- Baseline: `HAKMEM_ALLOC_GATE_SHAPE=0`
|
||||||
|
- Optimized: `HAKMEM_ALLOC_GATE_SHAPE=1`
|
||||||
|
|
||||||
|
**Commands**:
|
||||||
|
```bash
# Baseline
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
  ./bench_random_mixed_hakmem 20000000 400 1

# Optimized
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
  ./bench_random_mixed_hakmem 20000000 400 1
```
|
||||||
|
|
||||||
|
### Success Criteria
|
||||||
|
|
||||||
|
**GO**: Mean gain >= +1.0%, Median >= +0.0%
|
||||||
|
- **Promote to default** in `MIXED_TINYV3_C7_SAFE` preset
|
||||||
|
- Add to `core/bench_profile.h`: `bench_setenv_default("HAKMEM_ALLOC_GATE_SHAPE", "1");`
|
||||||
|
- Document in `docs/analysis/ENV_PROFILE_PRESETS.md`
|
||||||
|
|
||||||
|
**NEUTRAL**: -1.0% < gain < +1.0%
|
||||||
|
- **Freeze as research box** (default OFF)
|
||||||
|
- Document findings, keep implementation for future study
|
||||||
|
|
||||||
|
**NO-GO**: Mean gain <= -1.0%
|
||||||
|
- **Freeze and archive** (default OFF, do not pursue)
|
||||||
|
- Document regression cause, learn for future optimizations
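The three outcomes above can be written as a small classifier. This is a sketch, not part of the benchmark tooling: `AbDecision` and `ab_decide` are hypothetical names, and the handling of "mean at or above +1.0% but negative median" is an assumption, since the criteria do not cover that case.

```c
typedef enum { DECISION_GO, DECISION_NEUTRAL, DECISION_NO_GO } AbDecision;

/* Hypothetical helper; gains are percentages relative to the baseline. */
static AbDecision ab_decide(double mean_gain_pct, double median_gain_pct) {
    if (mean_gain_pct <= -1.0) return DECISION_NO_GO;
    if (mean_gain_pct >= 1.0 && median_gain_pct >= 0.0) return DECISION_GO;
    /* Assumption: mean >= +1.0% with a negative median is not defined by the
     * criteria; this sketch treats it as NEUTRAL (stay default OFF). */
    return DECISION_NEUTRAL;
}
```

Treating the uncovered case as NEUTRAL keeps the default OFF unless both conditions for GO hold, which matches the conservative tone of the criteria.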
|
||||||
|
|
||||||
|
## Expected Results
|
||||||
|
|
||||||
|
### Performance Gain Estimation
|
||||||
|
|
||||||
|
**Target**: `tiny_alloc_gate_fast` at 12.75% self
|
||||||
|
|
||||||
|
**Optimization**:
|
||||||
|
- Eliminate policy lookup: ~5-7 cycles saved
|
||||||
|
- Add LIKELY hints: ~3-5 cycles saved (branch prediction)
|
||||||
|
- Total savings: ~8-12 cycles per allocation
|
||||||
|
|
||||||
|
**Calculation**:
|
||||||
|
- Baseline: ~100 cycles per allocation (estimated)
|
||||||
|
- Savings: 8-12 cycles (8-12% of allocation cost)
|
||||||
|
- `tiny_alloc_gate_fast` contribution: 12.75% self
|
||||||
|
- Expected gain: 12.75% × (8-12%) = **+1.0-1.5%** (conservative)
|
||||||
|
|
||||||
|
**Realistic range**: +1-2% on MIXED workload
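The estimate above is a single multiplication, reproduced here so the inputs are explicit. `expected_gain_pct` is a hypothetical helper; the 100-cycle baseline and the 8-12 cycle savings are this memo's own estimates, not measurements.

```c
/* Hypothetical helper reproducing the estimate:
 * gain% = (gate self%) x (cycles saved / baseline cycles per allocation). */
static double expected_gain_pct(double gate_self_pct,
                                double saved_cycles,
                                double baseline_cycles) {
    return gate_self_pct * (saved_cycles / baseline_cycles);
}
```

With 12.75% self and a 100-cycle baseline, 8 saved cycles gives about 1.02% and 12 saved cycles gives about 1.53%, which is where the +1.0-1.5% range comes from.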
|
||||||
|
|
||||||
|
### Risk Assessment
|
||||||
|
|
||||||
|
**Risk Level**: LOW
|
||||||
|
|
||||||
|
**Why**:
|
||||||
|
- Only branch shape optimization (no algorithmic change)
|
||||||
|
- ENV gate allows instant rollback
|
||||||
|
- Fallback path unchanged (safety preserved)
|
||||||
|
- Pattern proven by Phase 2 B3 (+2.89% success)
|
||||||
|
|
||||||
|
**Failure modes**:
|
||||||
|
- Policy lookup cost lower than expected → minimal/no gain
|
||||||
|
- ENV check overhead outweighs savings → slight regression
|
||||||
|
- Both cases: Rollback with `HAKMEM_ALLOC_GATE_SHAPE=0`
|
||||||
|
|
||||||
|
## Non-Goals
|
||||||
|
|
||||||
|
**NOT in scope**:
|
||||||
|
- Route algorithm change (only branch shape)
|
||||||
|
- Learner integration (optional for future)
|
||||||
|
- C6-heavy workload optimization (already uses MID_V3 ON)
|
||||||
|
- Policy snapshot bypass (already done in Phase 3 C3)
|
||||||
|
|
||||||
|
## Reference Patterns
|
||||||
|
|
||||||
|
### Similar Optimizations
|
||||||
|
|
||||||
|
**Phase 2 B3** (Routing shape optimization):
|
||||||
|
- File: `core/front/malloc_tiny_fast.h:262-278`
|
||||||
|
- Pattern: `if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY))`
|
||||||
|
- Result: **+2.89% on MIXED**, +9.13% on C6-heavy
|
||||||
|
- Lesson: LIKELY hints + cold helpers are highly effective
|
||||||
|
|
||||||
|
**Phase 3 D1** (Free route cache):
|
||||||
|
- File: `core/box/tiny_free_route_cache_env_box.h`
|
||||||
|
- Pattern: ENV gate with lazy init (-1 sentinel)
|
||||||
|
- Result: **+2.19% on MIXED** (promoted to default)
|
||||||
|
- Lesson: ENV gates with cached values have minimal overhead
|
||||||
|
|
||||||
|
**Phase 3 C3** (Static routing):
|
||||||
|
- File: `core/box/tiny_static_route_box.h`
|
||||||
|
- Pattern: Static route table (bypass policy snapshot)
|
||||||
|
- Result: **+2.20% on MIXED**
|
||||||
|
- Lesson: Eliminating dynamic lookups pays off
|
||||||
|
|
||||||
|
### Code Examples
|
||||||
|
|
||||||
|
**B3 Pattern** (LIKELY-first branching):
|
||||||
|
```c
if (TINY_HOT_LIKELY(env_cfg->alloc_route_shape)) {
    if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY)) {
        // Hot path: LEGACY fast (99% traffic)
        void* ptr = tiny_hot_alloc_fast(class_idx);
        if (TINY_HOT_LIKELY(ptr != NULL)) return ptr;
        return tiny_cold_refill_and_alloc(class_idx);
    }
    // Rare routes: cold helper
    return tiny_alloc_route_cold(route_kind, class_idx, size);
}
```
|
||||||
|
|
||||||
|
**D1 Pattern** (ENV gate with lazy init):
|
||||||
|
```c
static inline int tiny_free_static_route_enabled(void) {
    static int g_enable = -1;
    if (__builtin_expect(g_enable == -1, 0)) {
        const char* e = getenv("HAKMEM_FREE_STATIC_ROUTE");
        g_enable = (e && *e && *e != '0') ? 1 : 0;
    }
    return g_enable;
}
```
|
||||||
|
|
||||||
|
## Integration Plan
|
||||||
|
|
||||||
|
### Step 1: Implementation
|
||||||
|
|
||||||
|
1. Create `core/box/tiny_alloc_gate_shape_env_box.h`
|
||||||
|
- Single function: `alloc_gate_shape_enabled()`
|
||||||
|
- Lazy init with -1 sentinel
|
||||||
|
- Return cached ENV value
|
||||||
|
|
||||||
|
2. Modify `core/box/tiny_alloc_gate_box.h`
|
||||||
|
- Add include for new ENV box
|
||||||
|
- Insert LEGACY-first fast path (ENV gated)
|
||||||
|
- Add LIKELY hints to existing branches
|
||||||
|
- Preserve all validation/diagnostic logic
|
||||||
|
|
||||||
|
### Step 2: A/B Testing
|
||||||
|
|
||||||
|
1. Build with optimization:
|
||||||
|
   ```bash
   make clean && make -j8 CFLAGS="-O3 -flto"
   ```
|
||||||
|
|
||||||
|
2. Run 10-run baseline:
|
||||||
|
   ```bash
   for i in {1..10}; do
     HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
       ./bench_random_mixed_hakmem 20000000 400 1
   done
   ```
|
||||||
|
|
||||||
|
3. Run 10-run optimized:
|
||||||
|
   ```bash
   for i in {1..10}; do
     HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
       ./bench_random_mixed_hakmem 20000000 400 1
   done
   ```
|
||||||
|
|
||||||
|
4. Calculate statistics:
|
||||||
|
- Mean, Median, StdDev for both configurations
|
||||||
|
- Compare against success criteria
|
||||||
|
- Document results in `PHASE4_D3_ALLOC_GATE_AB_TEST_RESULTS.md`
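The statistics step can be sketched with a few small helpers. Everything here is hypothetical and only illustrates the calculation: the function names are invented, inputs are assumed to be throughput samples (ops/s) collected by hand from the two loops, and the sketch reports sample variance, with StdDev being its square root.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void* a, const void* b) {
    double x = *(const double*)a, y = *(const double*)b;
    return (x > y) - (x < y);
}

static double stat_mean(const double* v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median of a sorted copy; this sketch assumes 1 <= n <= 64. */
static double stat_median(const double* v, size_t n) {
    double tmp[64];
    memcpy(tmp, v, n * sizeof(double));
    qsort(tmp, n, sizeof(double), cmp_double);
    return (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
}

/* Sample variance (n-1 denominator); requires n >= 2. */
static double stat_variance(const double* v, size_t n) {
    double m = stat_mean(v, n), acc = 0.0;
    for (size_t i = 0; i < n; i++) acc += (v[i] - m) * (v[i] - m);
    return acc / (double)(n - 1);
}

/* Relative gain of the optimized run over the baseline, in percent. */
static double gain_pct(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}
```

Applying `gain_pct` to the two means and to the two medians yields the mean gain and median gain that the success criteria refer to.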
|
||||||
|
|
||||||
|
### Step 3: Decision & Promotion
|
||||||
|
|
||||||
|
**If GO**:
|
||||||
|
1. Update `core/bench_profile.h` (MIXED_TINYV3_C7_SAFE preset)
|
||||||
|
2. Update `docs/analysis/ENV_PROFILE_PRESETS.md`
|
||||||
|
3. Update `CURRENT_TASK.md` (Phase 4 D3 complete)
|
||||||
|
4. Commit with message: "Phase 4 D3: Alloc Gate Specialization (+X.X%)"
|
||||||
|
|
||||||
|
**If NEUTRAL or NO-GO**:
|
||||||
|
1. Document results in `PHASE4_D3_ALLOC_GATE_AB_TEST_RESULTS.md`
|
||||||
|
2. Keep ENV gate at default OFF
|
||||||
|
3. Archive as research box
|
||||||
|
4. Move to next candidate optimization
|
||||||
|
|
||||||
|
## Validation Checklist
|
||||||
|
|
||||||
|
**Pre-implementation**:
|
||||||
|
- [ ] Design document reviewed and approved
|
||||||
|
- [ ] Integration points identified (2 files)
|
||||||
|
- [ ] Reference patterns studied (B3, D1, C3)
|
||||||
|
- [ ] ENV gate strategy confirmed (opt-in, default OFF)
|
||||||
|
|
||||||
|
**Implementation**:
|
||||||
|
- [ ] `tiny_alloc_gate_shape_env_box.h` created
|
||||||
|
- [ ] `tiny_alloc_gate_box.h` modified (LEGACY-first path added)
|
||||||
|
- [ ] LIKELY hints added to fallback branches
|
||||||
|
- [ ] Clean compilation (no new warnings)
|
||||||
|
- [ ] Health check passes: `scripts/verify_health_profiles.sh`
|
||||||
|
|
||||||
|
**A/B Testing**:
|
||||||
|
- [ ] Baseline 10-run completed (SHAPE=0)
|
||||||
|
- [ ] Optimized 10-run completed (SHAPE=1)
|
||||||
|
- [ ] Statistics calculated (mean, median, stddev)
|
||||||
|
- [ ] Results documented
|
||||||
|
- [ ] Success criteria evaluated (GO/NEUTRAL/NO-GO)
|
||||||
|
|
||||||
|
**Promotion** (if GO):
|
||||||
|
- [ ] `bench_profile.h` updated (default SHAPE=1)
|
||||||
|
- [ ] `ENV_PROFILE_PRESETS.md` updated
|
||||||
|
- [ ] `CURRENT_TASK.md` updated (Phase 4 D3 complete)
|
||||||
|
- [ ] Commit created with clear message
|
||||||
|
- [ ] Cumulative gain updated (+X.X% total)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Phase 4 D3 Status**: DESIGN COMPLETE
|
||||||
|
**Next Step**: Implementation → A/B Test → Decision
|
||||||
|
**Expected Outcome**: +1-2% gain (conservative), LOW risk
|
||||||
|
**Rollback Plan**: `HAKMEM_ALLOC_GATE_SHAPE=0` (instant revert)
|
||||||