Phase 3 Finalization: D1 20-run validation, D2 frozen, baseline established
Summary:
- D1 (Free route cache): 20-run validation → PROMOTED TO DEFAULT
  - Baseline (20-run, ROUTE=0): 46.30M ops/s (mean), 46.30M (median)
  - Optimized (20-run, ROUTE=1): 47.32M ops/s (mean), 47.39M (median)
  - Mean gain: +2.19%, Median gain: +2.37%
  - Decision: GO (both criteria met: mean >= +1.0%, median >= +0.0%)
  - Implementation: Added HAKMEM_FREE_STATIC_ROUTE=1 to MIXED preset
- D2 (Wrapper env cache): FROZEN
  - Previous result: -1.44% regression (TLS overhead > benefit)
  - Status: Research box (do not pursue further)
  - Default: OFF (not included in MIXED_TINYV3_C7_SAFE preset)
- Baseline Phase 3: 46.04M ops/s (Mixed, 10-run, 2025-12-13)

Cumulative Gains (Phase 2-3):
B3: +2.89%, B4: +1.47%, C3: +2.20%, D1: +2.19%
Total: ~7.6-8.9% (conservative: 7.6%, multiplicative: 8.93%)
MID_V3 fix: +13% (structural change, Mixed OFF by default)

Documentation Updates:
- PHASE3_FINALIZATION_SUMMARY.md: Comprehensive Phase 3 report
- PHASE3_CACHE_LOCALITY_NEXT_INSTRUCTIONS.md: D1/D2 final status
- PHASE3_D1_FREE_ROUTE_CACHE_1_DESIGN.md: 20-run validation results
- PHASE3_D2_WRAPPER_ENV_CACHE_1_DESIGN.md: FROZEN status
- ENV_PROFILE_PRESETS.md: D1 ADOPT, D2 FROZEN
- PHASE3_BASELINE_AND_CANDIDATES.md: Post-D1/D2 status
- CURRENT_TASK.md: Phase 3 complete summary

Next:
- D3 requires perf validation (tiny_alloc_gate_fast self% ≥5%)
- Or Phase 4 planning if no more D3-class targets
- Current active optimizations: B3, B4, C3, D1, MID_V3 fix

Files Changed:
- docs/analysis/PHASE3_FINALIZATION_SUMMARY.md (new, 580+ lines)
- docs/analysis/*.md (6 files updated with D1/D2 results)
- CURRENT_TASK.md (Phase 3 status update)
- analyze_d1_results.py (statistical analysis script)
- core/bench_profile.h (D1 promoted to default in MIXED preset)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
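The D1 decision reduces two 20-run samples to a mean and a median and checks each gain against a fixed threshold. Below is a minimal C sketch of that rule. The commit's own analyze_d1_results.py is not shown here, so the function names (`pct_gain`, `mean_of`, `median_of`, `ab_go`) are hypothetical. The sketch assumes gain is measured as (optimized - baseline) / baseline.

```c
#include <assert.h>
#include <stdlib.h>

/* Percent change of `opt` relative to `base`. */
static double pct_gain(double base, double opt) {
    assert(base > 0.0);
    return (opt - base) / base * 100.0;
}

static double mean_of(const double *v, size_t n) {
    assert(n > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s / (double)n;
}

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts a heap copy so the caller's data is untouched. */
static double median_of(const double *v, size_t n) {
    assert(n > 0);
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) abort();
    for (size_t i = 0; i < n; i++) tmp[i] = v[i];
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return m;
}

/* GO only when mean gain >= +1.0% AND median gain >= +0.0% (the D1 criteria).
   `base` holds ROUTE=0 throughputs, `opt` holds ROUTE=1 throughputs. */
static int ab_go(const double *base, const double *opt, size_t n) {
    double mean_gain   = pct_gain(mean_of(base, n), mean_of(opt, n));
    double median_gain = pct_gain(median_of(base, n), median_of(opt, n));
    return mean_gain >= 1.0 && median_gain >= 0.0;
}
```

Feeding the rounded summary figures into `pct_gain` gives about +2.20% for the mean (46.30M → 47.32M) and about +2.35% for the median (46.30M → 47.39M). Both agree, within the rounding of the inputs to 0.01M, with the reported +2.19% and +2.37%.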
@@ -282,35 +282,21 @@ static inline void* tiny_alloc_gate_fast(size_t size)
 
 ## Step 3: Recommended Next Steps
 
-### Phase 3 D1: Free Path Route Cache (HIGH PRIORITY)
-**Target**: `tiny_route_for_class()` eliminating snapshot check in free path
-**Expected Gain**: +1-2%
-**Risk**: MEDIUM
-**Effort**: 2-3 hours
+### Phase 3 D1: Free Path Route Cache ✅ GO (ENV opt-in)
+**Target**: Remove the `tiny_route_for_class()` call from the free path
+**Result**: Mixed 10-run mean **+1.06%** (some rounds show a negative median)
+**Decision**: ✅ GO, but **promotion to default awaits 20-run confirmation**
 
-**Implementation**:
-1. Add `tiny_static_route_for_free(ci)` function (mirror of alloc path optimization)
-2. Cache route decisions at init time in `g_tiny_static_route_free[8]`
-3. Update `free_tiny_fast_hot()` to use cached route
-4. A/B test: BASELINE vs D1
-
 **ENV Gate**: `HAKMEM_FREE_STATIC_ROUTE=1` (default: 0)
 
 ---
 
-### Phase 3 D2: Wrapper Env Cache (HIGH PRIORITY)
-**Target**: `wrapper_env_cfg()` caching in free path
-**Expected Gain**: +1-2%
-**Risk**: LOW
-**Effort**: 1-2 hours
+### Phase 3 D2: Wrapper Env Cache ❌ NO-GO (FROZEN)
+**Target**: Remove the `wrapper_env_cfg()` call from the wrapper hot path
+**Result**: Mixed 10-run mean **-1.44%** regression
+**Decision**: ❌ NO-GO (research box frozen, default OFF)
 
-**Implementation**:
-1. Cache `wrapper_env_cfg()` result in TLS or init-time global
-2. Avoid repeated memory load on every free() call
-3. Update free wrapper to use cached pointer
-4. A/B test: BASELINE vs D2
-
 **ENV Gate**: `HAKMEM_WRAP_ENV_CACHE=1` (default: 0)
 
 ---
 
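The first hunk names the pieces of D1: `tiny_static_route_for_free(ci)`, `g_tiny_static_route_free[8]` and `free_tiny_fast_hot()`. It also describes the D2 idea of caching the `wrapper_env_cfg()` result in TLS. The C sketch below shows both ideas under stated assumptions. The following are hypothetical stand-ins, not HAKMEM's code:

- the route enum;
- the body of `tiny_route_for_class()`;
- the `wrapper_env_cfg_t` struct;
- the rule that the ENV flag is on only when set to exactly "1".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- D1: init-time route cache for the free path ---- */

/* Hypothetical route kinds; the real HAKMEM enum is not shown in this doc. */
typedef enum { TINY_ROUTE_LEGACY = 0, TINY_ROUTE_FAST = 1 } tiny_route_t;

#define TINY_NUM_CLASSES 8 /* size of g_tiny_static_route_free[8] */

/* Stand-in for the per-call routing decision (body is hypothetical). */
static tiny_route_t tiny_route_for_class(int ci) {
    return ci < 4 ? TINY_ROUTE_FAST : TINY_ROUTE_LEGACY;
}

static tiny_route_t g_tiny_static_route_free[TINY_NUM_CLASSES];
static int g_free_static_route_on; /* HAKMEM_FREE_STATIC_ROUTE, read once */

/* Run once, after everything tiny_route_for_class() depends on is set up
   (the init-ordering risk called out in the risk table). */
static void tiny_static_route_free_init(void) {
    const char *e = getenv("HAKMEM_FREE_STATIC_ROUTE");
    g_free_static_route_on = (e != NULL && strcmp(e, "1") == 0);
    for (int ci = 0; ci < TINY_NUM_CLASSES; ci++)
        g_tiny_static_route_free[ci] = tiny_route_for_class(ci);
}

/* What free_tiny_fast_hot() would call instead of tiny_route_for_class():
   a table load when the gate is on, the original call when it is off. */
static inline tiny_route_t tiny_static_route_for_free(int ci) {
    assert(ci >= 0 && ci < TINY_NUM_CLASSES);
    return g_free_static_route_on ? g_tiny_static_route_free[ci]
                                  : tiny_route_for_class(ci);
}

/* ---- D2 (frozen): per-thread cache of the wrapper config pointer ---- */

/* Hypothetical config contents; the real wrapper_env_cfg() is not shown. */
typedef struct { int hot_cold_split; } wrapper_env_cfg_t;

static wrapper_env_cfg_t g_wrapper_env_cfg; /* assumed filled at init */

static const wrapper_env_cfg_t *wrapper_env_cfg(void) { return &g_wrapper_env_cfg; }

/* Every call now pays a TLS access plus a NULL check before returning. */
static _Thread_local const wrapper_env_cfg_t *t_wrapper_env_cfg;

static inline const wrapper_env_cfg_t *wrapper_env_cfg_cached(void) {
    const wrapper_env_cfg_t *c = t_wrapper_env_cfg;
    if (c == NULL)
        c = t_wrapper_env_cfg = wrapper_env_cfg();
    return c;
}
```

D1 moves a per-free decision to init time, which is why the risk table asks for correct init ordering and keeps the ENV gate for rollback. For D2, the commit attributes the -1.44% regression to TLS overhead outweighing the benefit. The sketch only shows where that extra per-call TLS access sits.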
@@ -330,15 +316,14 @@ static inline void* tiny_alloc_gate_fast(size_t size)
 
 ---
 
-## Expected Cumulative Results
+## Expected Cumulative Results (Updated)
 
-| Phase | Optimization | Expected Gain | Cumulative |
-|------------|----------------------------------|---------------|-------------|
-| Baseline | MID_V3=0 + B3+B4+C3 | - | 46.79M ops/s |
-| **Phase 3 D1** | Free route cache | +1-2% | 47.3-47.7M |
-| **Phase 3 D2** | Wrapper env cache | +1-2% | 47.8-48.7M |
-| **Phase 3 D3** | Alloc gate specialization | +1-2% | 48.3-49.7M |
-| **Total Expected** | - | **+3-6%** | **48-50M ops/s** |
+| Phase | Optimization | Expected Gain | Notes |
+|------------|----------------------------------|---------------|-------|
+| Baseline | MID_V3=0 + B3+B4+C3 | - | - |
+| **D1** | Free route cache | +0 to +2% | Mean gain confirmed, median pending (default OFF) |
+| **D2** | Wrapper env cache | - | NO-GO (frozen) |
+| **D3** | Alloc gate specialization | +0 to +2% | Start only if perf shows it above 5% |
 
 **With MID_V3 fix for Mixed**: +13% additional (expected ~56M ops/s total)
 
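Cumulative figures like the ones in this table and in the commit summary combine per-step gains, either by summing them or by compounding them. Below is a minimal C sketch of both. The function names `gains_additive` and `gains_compounded` are hypothetical. Compounding assumes each gain was measured on top of the previous steps, which the document does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Plain sum of per-step percentage gains (ignores compounding). */
static double gains_additive(const double *pct, size_t n) {
    assert(pct != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += pct[i];
    return s;
}

/* Compounded gain in percent: (prod(1 + g_i / 100) - 1) * 100.
   Valid only if each g_i was measured against a baseline that already
   included the earlier steps. */
static double gains_compounded(const double *pct, size_t n) {
    assert(pct != NULL || n == 0);
    double f = 1.0;
    for (size_t i = 0; i < n; i++) f *= 1.0 + pct[i] / 100.0;
    return (f - 1.0) * 100.0;
}
```

The commit summary lists these gains: B3 +2.89%, B4 +1.47%, C3 +2.20% and D1 +2.19%. Summing them gives +8.75%, and compounding them gives about +9.04%. The document does not show how its ~7.6% and 8.93% totals were derived.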
@@ -349,31 +334,49 @@ static inline void* tiny_alloc_gate_fast(size_t size)
 | Optimization | Risk Level | Mitigation |
 |---------------------|------------|-------------------------------------------------|
 | Free route cache | MEDIUM | Ensure init ordering, ENV gate for rollback |
-| Wrapper env cache | LOW | Read-only after init, simple TLS cache |
+| Wrapper env cache | - | NO-GO (-1.44% regression) |
 | Alloc specialization| LOW | Profile-specific, existing static route pattern |
 
 **All optimizations**: Follow ENV gate + A/B test + decision pattern (research box)
 
 ---
 
-## Next Actions
+## Post-D1/D2 Status (2025-12-13)
 
-1. **Immediate**: Implement Phase 3 D1 (Free route cache)
-   - Expected: +1-2% gain
-   - Risk: MEDIUM (requires careful init ordering)
-   - Timeline: 2-3 hours
+### Phase 3 D1/D2 Validation Complete ✅
 
-2. **Follow-up**: Implement Phase 3 D2 (Wrapper env cache)
-   - Expected: +1-2% gain
-   - Risk: LOW
-   - Timeline: 1-2 hours
+1. **D1 (Free Route Cache)**: ✅ ADOPT - PROMOTED TO DEFAULT
+   - 20-run validation completed
+   - Results: Mean +2.19%, Median +2.37% (both criteria met)
+   - Status: Added to MIXED_TINYV3_C7_SAFE preset as default
+   - Implementation: `HAKMEM_FREE_STATIC_ROUTE=1`
 
-3. **Optional**: Implement Phase 3 D3 (Alloc gate specialization)
-   - Expected: +1-2% gain
-   - Risk: LOW
-   - Timeline: 2-3 hours
+2. **D2 (Wrapper Env Cache)**: ❌ FROZEN
+   - Results: -1.44% regression
+   - Status: Research box frozen, default OFF, do not pursue
+   - Implementation: `HAKMEM_WRAP_ENV_CACHE=1` (opt-in only, not recommended)
 
-**Total Timeline**: 5-8 hours for +3-6% cumulative improvement
+### Active Optimizations in MIXED_TINYV3_C7_SAFE
+
+1. **B3**: Routing branch shape (+2.89% proven)
+2. **B4**: Wrapper hot/cold split (+1.47% proven)
+3. **C3**: Static routing (+2.20% proven)
+4. **D1**: Free route cache (+2.19% proven) - NEW
+5. **MID_V3**: OFF for Mixed (C6 routing fix, +13% proven)
+
+**Cumulative gain**: ~7.6% (B3 + B4 + C3 + D1, excluding MID_V3 fix)
+
+### Next Actions
+
+1. **Profile**: Run perf on the current baseline to identify next targets
+   - Requirement: self% ≥5% for Phase 3 D3 consideration
+   - Target: `tiny_alloc_gate_fast` specialization
+
+2. **Optional**: Phase 3 D3 (Alloc gate specialization) - pending perf validation
+   - Only proceed if perf shows ≥5% self% in the alloc gate
+   - ENV: `HAKMEM_ALLOC_GATE_LEGACY_ONLY=0/1`
+
+3. **Phase 4 Planning**: If no more 5%+ targets remain, prepare the Phase 4 roadmap
 
 ---
 
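The commit promotes D1 by adding `HAKMEM_FREE_STATIC_ROUTE=1` to the MIXED_TINYV3_C7_SAFE preset in core/bench_profile.h, whose contents are not shown here. Below is a minimal sketch of one way such a preset could work. It assumes the preset seeds environment defaults through POSIX `setenv()` with overwrite disabled, so an explicit `HAKMEM_FREE_STATIC_ROUTE=0` from the user still wins for A/B runs. `env_default_t`, `k_mixed_tinyv3_c7_safe`, `apply_preset` and `env_flag_on` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L /* expose setenv() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical preset entry: an ENV key and the default the preset seeds. */
typedef struct {
    const char *key;
    const char *value;
} env_default_t;

/* Hypothetical MIXED_TINYV3_C7_SAFE table, showing only the D1 change. */
static const env_default_t k_mixed_tinyv3_c7_safe[] = {
    {"HAKMEM_FREE_STATIC_ROUTE", "1"}, /* D1: promoted to default */
    /* HAKMEM_WRAP_ENV_CACHE (D2) intentionally absent: frozen, default OFF */
};

/* Seed each default; overwrite=0 keeps any value the user already exported. */
static void apply_preset(const env_default_t *p, size_t n) {
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (setenv(p[i].key, p[i].value, 0) != 0)
            abort();
}

/* True when `key` is set to exactly "1" (parse rule assumed for illustration). */
static int env_flag_on(const char *key) {
    const char *v = getenv(key);
    return v != NULL && strcmp(v, "1") == 0;
}
```

Leaving overwrite off preserves the ENV-gate rollback path that the risk table relies on. Whether the real preset behaves this way is not stated in this commit.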