# Phase 4 D3: Alloc Gate Specialization Design Memo
## Goal
Specialize the routing branches in `tiny_alloc_gate_fast()` for the MIXED workload (LEGACY-first path).
**Background**:
- Phase 3 complete: +8.93% cumulative gain (37.5M → 51M ops/s)
- Perf analysis: `tiny_alloc_gate_fast` at **12.75% self** (HIGH priority)
- MIXED workload: 99% of allocations take the LEGACY route (MID_V3 OFF)
- Current state: every route (LEGACY/ULTRA/MID/V7) is dispatched through a switch → high branch misprediction cost
## Observations
### Current State (Phase 3 Baseline)
- `tiny_alloc_gate_fast`: 12.75% self + children overhead
- MIXED workload characteristics:
  - 99% of allocations take the LEGACY route (`HAKMEM_MID_V3_ENABLED=0`)
  - Uniform branching across C0-C7 → prediction misses
- Existing optimization (Phase 2 B3):
  - LEGACY-first branching is already implemented inside `malloc_tiny_fast_for_class()`
  - However, the routing policy branches in `tiny_alloc_gate_fast()` are not yet optimized
### Bottleneck Analysis
**Current flow** (`core/box/tiny_alloc_gate_box.h:139-217`):
```c
void* tiny_alloc_gate_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;
    TinyRoutePolicy route = tiny_route_get(class_idx);  // ← Policy lookup
    // Branching on route policy (uniform dispatch, poor prediction)
    if (route == ROUTE_POOL_ONLY) return NULL;
    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
    if (route == ROUTE_TINY_ONLY) {
        // Hot path (99% of Mixed traffic)
        return user_ptr;
    }
    // ROUTE_TINY_FIRST: fallback allowed
    return user_ptr;
}
```
**Problem**:
- Policy lookup overhead: `tiny_route_get()` is called on every allocation
- Branch on `route == ROUTE_POOL_ONLY`: rare, but evaluated every time
- Branch on `route == ROUTE_TINY_ONLY` vs `ROUTE_TINY_FIRST`: the Mixed default is TINY_FIRST (the two behave almost identically)
- Total cost: ~2-3 branches + 1 policy lookup per allocation
**Expected savings**:
- Eliminate policy lookup for known-LEGACY workloads
- Convert policy branches to LIKELY-hinted checks
- Reduce instruction count by 5-10 per allocation
## Implementation Approach
### Strategy: LEGACY-first with Static Route Assumption
**Pattern**: Similar to Phase 2 B3 (routing shape optimization)
- Reference: `core/front/malloc_tiny_fast.h:262-278`
- Proven approach: LIKELY hint + cold helper for rare routes
- Expected branch prediction improvement: 75% miss rate → <5% miss rate
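
The "LIKELY hint + cold helper" shape can be sketched in isolation. This is a minimal illustration, not the project's code: every `sketch_` / `SKETCH_` name is hypothetical, and the macro body is an assumption about how a macro like `TINY_HOT_LIKELY` is typically defined on GCC/Clang (the memo does not show its definition).

```c
#include <stddef.h>

/* Assumed definitions (GCC/Clang builtins); hypothetical names. */
#define SKETCH_LIKELY(x)   __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)

typedef enum { SKETCH_ROUTE_LEGACY = 0, SKETCH_ROUTE_OTHER = 1 } sketch_route_t;

/* Counters only exist so the two paths can be observed in a test. */
static int sketch_hot_calls  = 0;
static int sketch_cold_calls = 0;

/* Rare routes live in a separate, non-inlined function marked cold,
 * so the compiler can keep them out of the hot path's code layout. */
__attribute__((cold, noinline))
static int sketch_route_cold(sketch_route_t route, int class_idx) {
    (void)route;
    sketch_cold_calls++;
    return -class_idx - 1;  /* placeholder result for a rare route */
}

static inline int sketch_dispatch(sketch_route_t route, int class_idx) {
    if (SKETCH_LIKELY(route == SKETCH_ROUTE_LEGACY)) {
        sketch_hot_calls++;
        return class_idx;   /* placeholder for the hot-path result */
    }
    return sketch_route_cold(route, class_idx);
}
```

The hint mainly influences code layout (the expected branch becomes the fall-through), so whether it pays off on a given CPU still has to be confirmed by measurement.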
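### L0: Env (reversible)
- `HAKMEM_ALLOC_GATE_SHAPE=0/1` (default: 0, OFF)
- Opt-in enables the specialized path (enabling it unconditionally calls for caution)
- Rollback: setting the ENV to 0 immediately restores the existing path
### L1: SpecializedGateBox (boundary: 1 site)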
#### Optimized Gate Structure
**Before** (current):
```c
void* tiny_alloc_gate_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;
    TinyRoutePolicy route = tiny_route_get(class_idx);
    if (route == ROUTE_POOL_ONLY) return NULL;
    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
    if (route == ROUTE_TINY_ONLY) return user_ptr;
    return user_ptr;  // ROUTE_TINY_FIRST
}
```
**After** (optimized for MIXED):
```c
void* tiny_alloc_gate_fast(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
        return NULL;
    }
    // Phase 4 D3: LEGACY-first gate specialization (ENV gated)
    if (TINY_HOT_LIKELY(alloc_gate_shape_enabled())) {
        // MIXED fast path: Avoid tiny_route_get() overhead.
        // NOTE: We do NOT assume TINY_ONLY vs TINY_FIRST; both return user_ptr.
        // Safety: still honor POOL_ONLY if configured via HAKMEM_TINY_PROFILE (e.g., "hot"/"off").
        if (__builtin_expect(g_tiny_route[class_idx & 7] == ROUTE_POOL_ONLY, 0)) {
            return NULL;
        }
        // Direct to malloc_tiny_fast_for_class, skip tiny_route_get()
        return malloc_tiny_fast_for_class(size, class_idx);
    }
    // Original path (backward compatible)
    TinyRoutePolicy route = tiny_route_get(class_idx);
    if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) return NULL;
    void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
    if (TINY_HOT_LIKELY(route == ROUTE_TINY_ONLY)) return user_ptr;
    return user_ptr;
}
```
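
To check the intended behavior of the specialized gate without the real allocator, a small stand-alone model can help. The sketch below is an assumption-laden mock: all `model_` / `MODEL_` names are hypothetical stand-ins (for `g_tiny_route`, `tiny_route_get()`, and `malloc_tiny_fast_for_class()`), the class count of 8 is inferred from the C0-C7 range and the `& 7` mask, and the ENV gate is passed in as a plain argument.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NUM_CLASSES 8  /* assumption: C0-C7, matching the `& 7` mask */

typedef enum {
    MODEL_ROUTE_TINY_ONLY = 0,
    MODEL_ROUTE_TINY_FIRST,
    MODEL_ROUTE_POOL_ONLY
} model_route_t;

/* Stand-in for g_tiny_route[]; zero-initialized, i.e. all TINY_ONLY. */
static model_route_t model_route[MODEL_NUM_CLASSES];

/* Counts policy lookups so a test can see when the lookup is skipped. */
static int model_lookup_calls = 0;

/* Stand-in for tiny_route_get(). */
static model_route_t model_route_get(int class_idx) {
    model_lookup_calls++;
    return model_route[class_idx & 7];
}

/* Stand-in for malloc_tiny_fast_for_class(); simply uses malloc. */
static void* model_alloc_for_class(size_t size, int class_idx) {
    (void)class_idx;
    return malloc(size);
}

/* Model of the gate: shape_enabled replaces alloc_gate_shape_enabled(). */
static void* model_gate(size_t size, int class_idx, int shape_enabled) {
    if (__builtin_expect(class_idx < 0 || class_idx >= MODEL_NUM_CLASSES, 0)) {
        return NULL;
    }
    if (__builtin_expect(shape_enabled, 1)) {
        /* Fast path: direct table read, POOL_ONLY is still honored. */
        if (__builtin_expect(model_route[class_idx & 7] == MODEL_ROUTE_POOL_ONLY, 0)) {
            return NULL;
        }
        return model_alloc_for_class(size, class_idx);
    }
    /* Original path: goes through the policy lookup function. */
    model_route_t route = model_route_get(class_idx);
    if (__builtin_expect(route == MODEL_ROUTE_POOL_ONLY, 0)) return NULL;
    return model_alloc_for_class(size, class_idx);
}
```

In this model both paths return the same result for every route; the only observable difference is that the fast path never increments `model_lookup_calls`, which is the property the specialization relies on.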
#### Branch Prediction Impact
**Current** (uniform branching):
- Policy lookup: always executed (1 function call overhead)
- `route == ROUTE_POOL_ONLY`: 0% hit rate (but checked every time)
- `route == ROUTE_TINY_ONLY`: 99% hit rate (but no LIKELY hint)
- Total overhead: ~10-15 cycles per allocation
**Optimized** (LEGACY-first):
- ENV check: 99% cached (<1 cycle amortized)
- Direct path: skip `tiny_route_get()` (and its release logging branch) (save ~5-7 cycles)
- LIKELY hint: CPU predictor trained to expect fast path
- Total savings: ~8-12 cycles per allocation
**Expected gain**: +1-2% on MIXED (conservative estimate based on 12.75% self)
### Implementation Instructions
#### File 1: `core/box/tiny_alloc_gate_shape_env_box.h` (new)
**Role**: ENV gate for alloc gate shape optimization
**API** :
```c
#include <stdlib.h>  // getenv

// ENV gate: HAKMEM_ALLOC_GATE_SHAPE=0/1 (default: 0)
static inline int alloc_gate_shape_enabled(void) {
    static int g_enable = -1;  // Lazy init sentinel
    if (__builtin_expect(g_enable == -1, 0)) {
        const char* e = getenv("HAKMEM_ALLOC_GATE_SHAPE");
        g_enable = (e && *e && *e != '0') ? 1 : 0;
    }
    return g_enable;
}
```
**Integration**: Header-only, single-responsibility (ENV caching only)
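
To make the parse rule in the gate explicit, the expression can be pulled out into a pure function and probed with a few inputs. The helper name `sketch_parse_gate_flag` is hypothetical; only the expression itself comes from the API above.

```c
#include <stddef.h>

/* Hypothetical helper isolating the parse rule used by the ENV gate:
 * unset, empty, or a value starting with '0' means OFF; anything else ON. */
static int sketch_parse_gate_flag(const char* e) {
    return (e && *e && *e != '0') ? 1 : 0;
}
```

Only the first character is inspected, so `"1"` and `"on"` enable the gate, while `"0"` and `"01"` leave it off. Note that `"off"` or `"false"` would also enable it under this rule, so sticking to `0`/`1` is the safe convention. Because the real gate caches the result in a static, changing the variable after the first call has no effect within the same process.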
#### File 2: Modify `core/box/tiny_alloc_gate_box.h` (existing)
**Location**: `tiny_alloc_gate_fast()` function (lines 139-217)
**Changes** :
1. Include new ENV box header:
   ```c
   #include "tiny_alloc_gate_shape_env_box.h"
   ```
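2. Add LEGACY-first fast path before existing route dispatch:
   ```c
   // Phase 4 D3: LEGACY-first gate specialization (ENV gated)
   if (TINY_HOT_LIKELY(alloc_gate_shape_enabled())) {
       // Safety: still honor POOL_ONLY before skipping tiny_route_get()
       if (__builtin_expect(g_tiny_route[class_idx & 7] == ROUTE_POOL_ONLY, 0)) {
           return NULL;
       }
       // TINY_ONLY and TINY_FIRST both return user_ptr, so no further branching
       return malloc_tiny_fast_for_class(size, class_idx);
   }
   ```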
3. Add LIKELY hints to existing branches (backward compatible path):
   ```c
   if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) return NULL;
   void* user_ptr = malloc_tiny_fast_for_class(size, class_idx);
   if (TINY_HOT_LIKELY(route == ROUTE_TINY_ONLY)) return user_ptr;
   ```
**Safety**:
- ENV gate ensures opt-in behavior
- Fallback path unchanged (existing validation/diagnostics preserved)
- No algorithmic changes (only branch shape optimization)
## A/B Test
### Test Configuration
**Workload**: Mixed (10-run, 20M iters, ws=400)
- Baseline: `HAKMEM_ALLOC_GATE_SHAPE=0`
- Optimized: `HAKMEM_ALLOC_GATE_SHAPE=1`
**Commands**:
```bash
# Baseline
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
  ./bench_random_mixed_hakmem 20000000 400 1
# Optimized
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
  ./bench_random_mixed_hakmem 20000000 400 1
```
### Success Criteria
**GO**: Mean gain >= +1.0%, Median >= +0.0%
- **Promote to default** in `MIXED_TINYV3_C7_SAFE` preset
- Add to `core/bench_profile.h`: `bench_setenv_default("HAKMEM_ALLOC_GATE_SHAPE", "1");`
- Document in `docs/analysis/ENV_PROFILE_PRESETS.md`
**NEUTRAL**: -1.0% < gain < +1.0%
- **Freeze as research box** (default OFF)
- Document findings, keep implementation for future study
**NO-GO**: Mean gain <= -1.0%
- **Freeze and archive** (default OFF, do not pursue)
- Document regression cause, learn for future optimizations
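
The three outcomes can be written as a small classifier, which makes the boundary cases explicit. This is a sketch with hypothetical names (`ab_decision_t`, `ab_classify`). One case is not covered by the criteria above: a mean gain of at least +1.0% with a negative median. The sketch treats that as NEUTRAL, which is an assumption rather than a rule from this memo.

```c
typedef enum { AB_GO, AB_NEUTRAL, AB_NO_GO } ab_decision_t;

/* Gains are percentages relative to the baseline, e.g. 1.5 means +1.5%. */
static ab_decision_t ab_classify(double mean_gain_pct, double median_gain_pct) {
    if (mean_gain_pct <= -1.0) {
        return AB_NO_GO;                      /* NO-GO: mean <= -1.0% */
    }
    if (mean_gain_pct >= 1.0 && median_gain_pct >= 0.0) {
        return AB_GO;                         /* GO: mean >= +1.0%, median >= 0 */
    }
    /* NEUTRAL: -1.0% < mean < +1.0%.
     * Assumption: mean >= +1.0% with median < 0 also lands here. */
    return AB_NEUTRAL;
}
```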
## Expected Results
### Performance Gain Estimation
**Target**: `tiny_alloc_gate_fast` at 12.75% self
**Optimization**:
- Eliminate policy lookup: ~5-7 cycles saved
- Add LIKELY hints: ~3-5 cycles saved (branch prediction)
- Total savings: ~8-12 cycles per allocation
**Calculation**:
- Baseline: ~100 cycles per allocation (estimated)
- Savings: 8-12 cycles (8-12% of allocation cost)
- `tiny_alloc_gate_fast` contribution: 12.75% self
- Expected gain: 12.75% × (8-12%) = **+1.0-1.5%** (conservative)
**Realistic range**: +1-2% on MIXED workload
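
The arithmetic behind the estimate is a single product, shown here as a tiny helper so the two endpoints can be reproduced. The function name is hypothetical, and the inputs are the memo's own estimates (12.75% self, 8-12% savings), not measurements.

```c
/* Expected gain (in percent) = self share (%) x savings fraction (%) / 100.
 * Both inputs are estimates taken from the calculation above. */
static double sketch_expected_gain_pct(double self_share_pct, double savings_pct) {
    return self_share_pct * savings_pct / 100.0;
}
```

With a 12.75% self share, savings of 8% and 12% give roughly 1.02% and 1.53%, which is where the +1.0-1.5% figure comes from.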
### Risk Assessment
**Risk Level**: LOW
**Why**:
- Only branch shape optimization (no algorithmic change)
- ENV gate allows instant rollback
- Fallback path unchanged (safety preserved)
- Pattern proven by Phase 2 B3 (+2.89% success)
**Failure modes**:
- Policy lookup cost lower than expected → minimal/no gain
- ENV check overhead outweighs savings → slight regression
- Both cases: Rollback with `HAKMEM_ALLOC_GATE_SHAPE=0`
## Non-Goals
**NOT in scope**:
- Route algorithm change (only branch shape)
- Learner integration (optional for future)
- C6-heavy workload optimization (already uses MID_V3 ON)
- Policy snapshot bypass (already done in Phase 3 C3)
## Reference Patterns
### Similar Optimizations
**Phase 2 B3** (Routing shape optimization):
- File: `core/front/malloc_tiny_fast.h:262-278`
- Pattern: `if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY))`
- Result: **+2.89% on MIXED**, +9.13% on C6-heavy
- Lesson: LIKELY hints + cold helpers are highly effective
**Phase 3 D1** (Free route cache):
- File: `core/box/tiny_free_route_cache_env_box.h`
- Pattern: ENV gate with lazy init (-1 sentinel)
- Result: **+2.19% on MIXED** (promoted to default)
- Lesson: ENV gates with cached values have minimal overhead
**Phase 3 C3** (Static routing):
- File: `core/box/tiny_static_route_box.h`
- Pattern: Static route table (bypass policy snapshot)
- Result: **+2.20% on MIXED**
- Lesson: Eliminating dynamic lookups pays off
### Code Examples
**B3 Pattern** (LIKELY-first branching):
```c
if (TINY_HOT_LIKELY(env_cfg->alloc_route_shape)) {
    if (TINY_HOT_LIKELY(route_kind == SMALL_ROUTE_LEGACY)) {
        // Hot path: LEGACY fast (99% traffic)
        void* ptr = tiny_hot_alloc_fast(class_idx);
        if (TINY_HOT_LIKELY(ptr != NULL)) return ptr;
        return tiny_cold_refill_and_alloc(class_idx);
    }
    // Rare routes: cold helper
    return tiny_alloc_route_cold(route_kind, class_idx, size);
}
```
**D1 Pattern** (ENV gate with lazy init):
```c
static inline int tiny_free_static_route_enabled(void) {
    static int g_enable = -1;
    if (__builtin_expect(g_enable == -1, 0)) {
        const char* e = getenv("HAKMEM_FREE_STATIC_ROUTE");
        g_enable = (e && *e && *e != '0') ? 1 : 0;
    }
    return g_enable;
}
```
## Integration Plan
### Step 1: Implementation
1. Create `core/box/tiny_alloc_gate_shape_env_box.h`
   - Single function: `alloc_gate_shape_enabled()`
   - Lazy init with -1 sentinel
   - Return cached ENV value
2. Modify `core/box/tiny_alloc_gate_box.h`
   - Add include for new ENV box
   - Insert LEGACY-first fast path (ENV gated)
   - Add LIKELY hints to existing branches
   - Preserve all validation/diagnostic logic
### Step 2: A/B Testing
1. Build with optimization:
   ```bash
   make clean && make -j8 CFLAGS="-O3 -flto"
   ```
2. Run 10-run baseline:
   ```bash
   for i in {1..10}; do
     HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=0 \
       ./bench_random_mixed_hakmem 20000000 400 1
   done
   ```
3. Run 10-run optimized:
   ```bash
   for i in {1..10}; do
     HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_ALLOC_GATE_SHAPE=1 \
       ./bench_random_mixed_hakmem 20000000 400 1
   done
   ```
4. Calculate statistics:
   - Mean, Median, StdDev for both configurations
   - Compare against success criteria
   - Document results in `PHASE4_D3_ALLOC_GATE_AB_TEST_RESULTS.md`
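
For the statistics step, the two numbers the success criteria actually use (mean and median) and the relative gain can be computed with a few helpers. This is a sketch with hypothetical names; it assumes the ops/s values from each run have already been collected into arrays, and it omits the standard deviation, which can be added with `sqrt` from `<math.h>` if needed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define AB_MAX_RUNS 64  /* assumption: more than enough for 10-run batches */

/* qsort comparator for doubles, ascending. */
static int ab_cmp_double(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean; returns 0.0 for an empty input. */
static double ab_mean(const double* v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return n ? sum / (double)n : 0.0;
}

/* Median; sorts a local copy so the caller's array keeps its order.
 * Returns 0.0 if n is 0 or exceeds AB_MAX_RUNS. */
static double ab_median(const double* v, size_t n) {
    double tmp[AB_MAX_RUNS];
    if (n == 0 || n > AB_MAX_RUNS) return 0.0;
    memcpy(tmp, v, n * sizeof(double));
    qsort(tmp, n, sizeof(double), ab_cmp_double);
    return (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
}

/* Relative gain of optimized over baseline, in percent. */
static double ab_gain_pct(double baseline, double optimized) {
    return (optimized - baseline) / baseline * 100.0;
}
```

A typical use would compute `ab_gain_pct(ab_mean(base, 10), ab_mean(opt, 10))` and the same for the medians, then compare both values against the GO/NEUTRAL/NO-GO thresholds.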
### Step 3: Decision & Promotion
**If GO**:
1. Update `core/bench_profile.h` (MIXED_TINYV3_C7_SAFE preset)
2. Update `docs/analysis/ENV_PROFILE_PRESETS.md`
3. Update `CURRENT_TASK.md` (Phase 4 D3 complete)
4. Commit with message: "Phase 4 D3: Alloc Gate Specialization (+X.X%)"
**If NEUTRAL or NO-GO**:
1. Document results in `PHASE4_D3_ALLOC_GATE_AB_TEST_RESULTS.md`
2. Keep ENV gate at default OFF
3. Archive as research box
4. Move to next candidate optimization
## Validation Checklist
**Pre-implementation**:
- [ ] Design document reviewed and approved
- [ ] Integration points identified (2 files)
- [ ] Reference patterns studied (B3, D1, C3)
- [ ] ENV gate strategy confirmed (opt-in, default OFF)
**Implementation**:
- [ ] `tiny_alloc_gate_shape_env_box.h` created
- [ ] `tiny_alloc_gate_box.h` modified (LEGACY-first path added)
- [ ] LIKELY hints added to fallback branches
- [ ] Clean compilation (no new warnings)
- [ ] Health check passes: `scripts/verify_health_profiles.sh`
**A/B Testing**:
- [ ] Baseline 10-run completed (SHAPE=0)
- [ ] Optimized 10-run completed (SHAPE=1)
- [ ] Statistics calculated (mean, median, stddev)
- [ ] Results documented
- [ ] Success criteria evaluated (GO/NEUTRAL/NO-GO)
**Promotion** (if GO):
- [ ] `bench_profile.h` updated (default SHAPE=1)
- [ ] `ENV_PROFILE_PRESETS.md` updated
- [ ] `CURRENT_TASK.md` updated (Phase 4 complete)
- [ ] Commit created with clear message
- [ ] Cumulative gain updated (+X.X% total)
---
**Phase 4 D3 Status**: DESIGN COMPLETE
**Next Step**: Implementation → A/B Test → Decision
**Expected Outcome**: +1-2% gain (conservative), LOW risk
**Rollback Plan**: `HAKMEM_ALLOC_GATE_SHAPE=0` (instant revert)