Phase 4: E1 docs + E2 next instructions
@@ -1,6 +1,36 @@
# Mainline Tasks (Current)

## Update Memo (2025-12-14 Phase 4 E1 Next - ENV Snapshot Consolidation)
## Update Memo (2025-12-14 Phase 4 E1 Complete - ENV Snapshot Consolidation)

### Phase 4 E1: ENV Snapshot Consolidation ✅ COMPLETE (2025-12-14)

**Target**: Consolidate 3 ENV gate TLS reads → 1 TLS read
- `tiny_c7_ultra_enabled_env()`: 1.28% self
- `tiny_front_v3_enabled()`: 1.01% self
- `tiny_metadata_cache_enabled()`: 0.97% self
- **Total ENV overhead: 3.26% self** (from perf profile)

**Implementation**:
- Created `core/box/hakmem_env_snapshot_box.{h,c}` (new ENV snapshot box)
- Migrated 8 call sites across 3 hot path files to use the snapshot
- ENV gate: `HAKMEM_ENV_SNAPSHOT=0/1` (default: 0, research box)
- Pattern: Similar to `tiny_front_v3_snapshot` (proven approach)
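
As a rough illustration of the consolidation listed above, the following C11 sketch caches three gate values in a single thread-local struct, so a hot path performs one TLS access and one lazy-init check instead of three. The struct layout, the `example_*` names, and the `EXAMPLE_*` environment variables are assumptions made for illustration; the actual contents of `core/box/hakmem_env_snapshot_box.{h,c}` are not shown in this commit view and may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical snapshot: every gate value lives in one struct. */
typedef struct {
    bool ready;          /* true once the environment has been read */
    bool c7_ultra;       /* stands in for tiny_c7_ultra_enabled_env() */
    bool front_v3;       /* stands in for tiny_front_v3_enabled() */
    bool metadata_cache; /* stands in for tiny_metadata_cache_enabled() */
} example_env_snapshot_t;

static _Thread_local example_env_snapshot_t g_example_env_snapshot;

/* Unset or empty -> default; otherwise enabled unless the value starts with '0'. */
static bool example_parse_flag(const char *value, bool dflt) {
    if (value == NULL || value[0] == '\0') return dflt;
    return value[0] != '0';
}

/* Cold path: read every variable once per thread (variable names are made up). */
static void example_env_snapshot_init(example_env_snapshot_t *s) {
    s->c7_ultra       = example_parse_flag(getenv("EXAMPLE_C7_ULTRA"), false);
    s->front_v3       = example_parse_flag(getenv("EXAMPLE_FRONT_V3"), false);
    s->metadata_cache = example_parse_flag(getenv("EXAMPLE_METADATA_CACHE"), false);
    s->ready = true;
}

/* Hot path: one TLS access and one branch, then plain field loads. */
static inline const example_env_snapshot_t *example_env_snapshot(void) {
    example_env_snapshot_t *s = &g_example_env_snapshot;
    if (!s->ready) example_env_snapshot_init(s);
    return s;
}
```

A call site would then fetch the pointer once (`const example_env_snapshot_t *env = example_env_snapshot();`) and test `env->front_v3`, `env->c7_ultra`, and so on, without further gate calls.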

**A/B Test Results** (Mixed, 10-run, 20M iters):
- Baseline (E1=0): **43.62M ops/s** (avg), 43.56M ops/s (median)
- Optimized (E1=1): **45.33M ops/s** (avg), 45.31M ops/s (median)
- **Improvement: +3.92% avg, +4.01% median**

**Decision: GO** (+3.92% >= +2.5% threshold)
- Exceeded conservative expectation (+1-3%) → Achieved +3.92%
- Action: Keep as research box for now (default OFF)
- Commit: `88717a873`
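
The improvement figure and the GO decision above reduce to simple arithmetic. The sketch below recomputes the relative gain from the reported averages and compares it with the +2.5% threshold; the function names are hypothetical and are not part of the project's benchmark tooling.

```c
#include <stdbool.h>

/* Relative throughput change in percent: (optimized - baseline) / baseline * 100. */
static double example_gain_percent(double baseline_mops, double optimized_mops) {
    return (optimized_mops - baseline_mops) / baseline_mops * 100.0;
}

/* GO when the measured gain reaches the threshold (the log uses +2.5%). */
static bool example_is_go(double gain_percent, double threshold_percent) {
    return gain_percent >= threshold_percent;
}
```

With the averages above, `example_gain_percent(43.62, 45.33)` is about 3.92, which clears the 2.5 threshold. The rounded medians give roughly 4.02, so the reported +4.01% was presumably computed from unrounded measurements.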

**Key Insight**: Shifting from shape optimizations (which have plateaued) to TLS/memory overhead paid off: E1 gained +3.92%. ENV snapshot consolidation opens a new optimization frontier beyond branch prediction tuning.

### Next: Phase 4 E2 - Alloc Per-Class Fast Path
- Instructions (SSOT): `docs/analysis/PHASE4_E2_ALLOC_PER_CLASS_FASTPATH_NEXT_INSTRUCTIONS.md`
- Design memo: `docs/analysis/PHASE4_E2_ALLOC_PER_CLASS_FASTPATH_1_DESIGN.md`

### Phase 4 Perf Profiling Complete ✅ (2025-12-14)

@@ -9,30 +39,10 @@
- Samples: 922 samples @ 999Hz, 3.1B cycles
- Analysis doc: `docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md`

**Key Findings**:
1. **ENV Gate Overhead (3.26% combined)**:
   - `tiny_c7_ultra_enabled_env()` (1.28%)
   - `tiny_front_v3_enabled()` (1.01%)
   - `tiny_metadata_cache_enabled()` (0.97%)
   - Root cause: 3 separate TLS reads + lazy init checks on every hot path call

2. **Shape Optimization Plateau**:
   - B3 (Routing Shape): +2.89% (initial win)
   - D3 (Alloc Gate Shape): +0.56% (NEUTRAL, diminishing returns)
   - Lesson: Branch prediction saturation → Next approach should target memory/TLS overhead

3. **`tiny_alloc_gate_fast` (15.37% self)**:
   - Route determination: 15.74% (local)
   - C7 logging: 17.04% (local, rare in Mixed)
   - Opportunity: Per-class fast path specialization (defer to E2)
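
The root cause named in finding 1 can be pictured with a small C11 sketch of the pre-consolidation pattern: each gate keeps its own thread-local cache slot and repeats its own lazy-init check on every call. This is an assumed shape inferred from the description above, not the project's code; the `example_*` names and `EXAMPLE_GATE_*` variables are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* One thread-local cache slot per gate: -1 = not read yet, 0 = off, 1 = on. */
static _Thread_local int g_example_gate_a = -1;
static _Thread_local int g_example_gate_b = -1;
static _Thread_local int g_example_gate_c = -1;

/* Lazy init: consult the environment only while the slot is still -1. */
static int example_lazy_gate(int *slot, const char *env_name) {
    if (*slot < 0) {
        const char *v = getenv(env_name);
        *slot = (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
    }
    return *slot;
}

static int example_gate_a_enabled(void) {
    return example_lazy_gate(&g_example_gate_a, "EXAMPLE_GATE_A");
}
static int example_gate_b_enabled(void) {
    return example_lazy_gate(&g_example_gate_b, "EXAMPLE_GATE_B");
}
static int example_gate_c_enabled(void) {
    return example_lazy_gate(&g_example_gate_c, "EXAMPLE_GATE_C");
}

/* Hot path before consolidation: three TLS accesses and three init checks per call. */
static int example_hot_path_gate_mask(void) {
    return (example_gate_a_enabled() << 0)
         | (example_gate_b_enabled() << 1)
         | (example_gate_c_enabled() << 2);
}
```

Each gate is cheap on its own, but the three address computations and three branches repeat on every allocation and free, which is consistent with the combined 3.26% self time reported by the profile.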

**Next Target**: **Phase 4 E1 - ENV Snapshot Consolidation**
- Expected gain: **+3.0-3.5%** (eliminate 3.26% ENV overhead)
- Approach: Consolidate all ENV gates into a single TLS snapshot struct
- Precedent: `tiny_front_v3_snapshot` pattern (already proven)
- Cross-cutting: Improves both alloc and free paths
- Next instructions (SSOT): `docs/analysis/PHASE4_E1_ENV_SNAPSHOT_CONSOLIDATION_NEXT_INSTRUCTIONS.md`
- Design memo: `docs/analysis/PHASE4_E1_ENV_SNAPSHOT_CONSOLIDATION_1_DESIGN.md`
**Key Findings Leading to E1**:
1. ENV Gate Overhead (3.26% combined) → **E1 target**
2. Shape Optimization Plateau (B3 +2.89%, D3 +0.56% NEUTRAL)
3. `tiny_alloc_gate_fast` (15.37% self) → defer to E2

### Phase 4 D3: Alloc Gate Shape (HAKMEM_ALLOC_GATE_SHAPE)
- ✅ Implementation complete (ENV gate + alloc gate branch shape)
