Phase 4: E1 docs + E2 next instructions

Moe Charm (CI)
2025-12-14 01:46:18 +09:00
parent 88717a8737
commit 7f3ff6c7e6
7 changed files with 185 additions and 35 deletions


@@ -1,6 +1,36 @@
# Main-line Task (Current)
## Update memo (2025-12-14 Phase 4 E1 Next - ENV Snapshot Consolidation)
## Update memo (2025-12-14 Phase 4 E1 Complete - ENV Snapshot Consolidation)
### Phase 4 E1: ENV Snapshot Consolidation ✅ COMPLETE (2025-12-14)
**Target**: Consolidate 3 ENV gate TLS reads → 1 TLS read
- `tiny_c7_ultra_enabled_env()`: 1.28% self
- `tiny_front_v3_enabled()`: 1.01% self
- `tiny_metadata_cache_enabled()`: 0.97% self
- **Total ENV overhead: 3.26% self** (from perf profile)
**Implementation**:
- Created `core/box/hakmem_env_snapshot_box.{h,c}` (new ENV snapshot box)
- Migrated 8 call sites across 3 hot-path files to use the snapshot
- ENV gate: `HAKMEM_ENV_SNAPSHOT=0/1` (default: 0, research box)
- Pattern: Similar to `tiny_front_v3_snapshot` (proven approach)
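A minimal C sketch of the snapshot idea follows. All names in it (`EnvSnapshot`, `env_snapshot_get()`, `parse_flag()`, `route_bits()`, and the `EXAMPLE_*` variables) are hypothetical stand-ins, not the actual contents of `core/box/hakmem_env_snapshot_box.{h,c}`. It assumes a per-thread (`_Thread_local`) snapshot and leaves out the `HAKMEM_ENV_SNAPSHOT=0/1` research gate. What it illustrates: the three flags are parsed once per thread, after which each hot-path call does one TLS lookup and one init check instead of three of each.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical snapshot of the three ENV gates (names are illustrative only). */
typedef struct {
    bool initialized;
    bool c7_ultra_enabled;
    bool front_v3_enabled;
    bool metadata_cache_enabled;
} EnvSnapshot;

/* One copy per thread; zero-initialized, so initialized == false at first use. */
static _Thread_local EnvSnapshot g_env_snapshot;

/* Interpret an ENV value: NULL or "" -> default, "0..." -> false, anything else -> true. */
static bool parse_flag(const char *value, bool default_value) {
    if (value == NULL || value[0] == '\0') {
        return default_value;
    }
    return value[0] != '0';
}

/* Single entry point for hot paths: one TLS lookup, one init check. */
static const EnvSnapshot *env_snapshot_get(void) {
    EnvSnapshot *snap = &g_env_snapshot;
    if (!snap->initialized) {
        /* Slow path: runs once per thread. EXAMPLE_* names are placeholders. */
        snap->c7_ultra_enabled       = parse_flag(getenv("EXAMPLE_C7_ULTRA"), false);
        snap->front_v3_enabled       = parse_flag(getenv("EXAMPLE_FRONT_V3"), false);
        snap->metadata_cache_enabled = parse_flag(getenv("EXAMPLE_METADATA_CACHE"), false);
        snap->initialized = true;
    }
    return snap;
}

/* Example hot-path consumer: all three flags come from the same snapshot. */
static int route_bits(void) {
    const EnvSnapshot *env = env_snapshot_get();
    int bits = 0;
    if (env->c7_ultra_enabled)       bits |= 1;
    if (env->front_v3_enabled)       bits |= 2;
    if (env->metadata_cache_enabled) bits |= 4;
    return bits;
}
```

In this sketch the `getenv()` parsing cost is paid once per thread; the trade-off is that ENV changes made after a thread's first call are not observed, which fits startup-time configuration.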
**A/B Test Results** (Mixed, 10-run, 20M iters):
- Baseline (E1=0): **43.62M ops/s** (avg), 43.56M ops/s (median)
- Optimized (E1=1): **45.33M ops/s** (avg), 45.31M ops/s (median)
- **Improvement: +3.92% avg, +4.01% median**
**Decision: GO** (+3.92% >= +2.5% threshold)
- Exceeded the conservative expectation (+1-3%): achieved +3.92%
- Action: Keep as research box for now (default OFF)
- Commit: `88717a873`
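The GO/NO-GO arithmetic above can be reproduced in a few lines of C. `pct_gain()` and `is_go()` are hypothetical helper names, and the sketch assumes the decision compares the average-throughput gain against the +2.5% threshold, as stated in the decision line.

```c
#include <stdbool.h>

/* Relative throughput change of the optimized run vs. baseline, in percent. */
static double pct_gain(double baseline_mops, double optimized_mops) {
    return (optimized_mops - baseline_mops) / baseline_mops * 100.0;
}

/* GO when the measured gain meets or exceeds the threshold (both in percent). */
static bool is_go(double gain_pct, double threshold_pct) {
    return gain_pct >= threshold_pct;
}
```

With the averages above, `pct_gain(43.62, 45.33)` evaluates to about 3.92, and `is_go(3.92, 2.5)` is true.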
**Key Insight**: Shifting focus from shape optimizations (which have plateaued) to TLS/memory overhead yields strong returns. ENV snapshot consolidation opens a new optimization frontier beyond branch-prediction tuning.
### Next: Phase 4 E2 - Alloc Per-Class Fast Path
- Instructions (SSOT): `docs/analysis/PHASE4_E2_ALLOC_PER_CLASS_FASTPATH_NEXT_INSTRUCTIONS.md`
- Design memo: `docs/analysis/PHASE4_E2_ALLOC_PER_CLASS_FASTPATH_1_DESIGN.md`
### Phase 4 Perf Profiling Complete ✅ (2025-12-14)
@@ -9,30 +39,10 @@
- Samples: 922 samples @ 999Hz, 3.1B cycles
- Analysis doc: `docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md`
**Key Findings**:
1. **ENV Gate Overhead (3.26% combined)**:
   - `tiny_c7_ultra_enabled_env()` (1.28%)
   - `tiny_front_v3_enabled()` (1.01%)
   - `tiny_metadata_cache_enabled()` (0.97%)
   - Root cause: 3 separate TLS reads + lazy-init checks on every hot-path call
2. **Shape Optimization Plateau**:
   - B3 (Routing Shape): +2.89% (initial win)
   - D3 (Alloc Gate Shape): +0.56% (NEUTRAL, diminishing returns)
   - Lesson: Branch prediction is saturated → the next approach should target memory/TLS overhead
3. **tiny_alloc_gate_fast (15.37% self)**:
   - Route determination: 15.74% (local)
   - C7 logging: 17.04% (local, rare in Mixed)
   - Opportunity: Per-class fast path specialization (deferred to E2)
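For contrast, here is a minimal C sketch of the pre-E1 shape described in the root cause above: each gate owns its own lazily initialized per-thread cache, so a hot path that consults all three pays three TLS accesses and three init checks per call. The function and variable names, the `_Thread_local` caches, and the `EXAMPLE_*` variables are assumptions for illustration; they are not the real bodies of `tiny_c7_ultra_enabled_env()`, `tiny_front_v3_enabled()`, or `tiny_metadata_cache_enabled()`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* -1 = not read yet; 0/1 = cached result. One cache per gate (hypothetical). */
static _Thread_local int g_c7_ultra_flag = -1;
static _Thread_local int g_front_v3_flag = -1;
static _Thread_local int g_metadata_cache_flag = -1;

/* Unset, empty, or "0..." means disabled. */
static int read_env_flag(const char *name) {
    const char *v = getenv(name);
    return (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
}

static bool gate_c7_ultra(void) {        /* TLS access #1 + init check #1 */
    if (g_c7_ultra_flag < 0) g_c7_ultra_flag = read_env_flag("EXAMPLE_C7_ULTRA");
    return g_c7_ultra_flag != 0;
}

static bool gate_front_v3(void) {        /* TLS access #2 + init check #2 */
    if (g_front_v3_flag < 0) g_front_v3_flag = read_env_flag("EXAMPLE_FRONT_V3");
    return g_front_v3_flag != 0;
}

static bool gate_metadata_cache(void) {  /* TLS access #3 + init check #3 */
    if (g_metadata_cache_flag < 0)
        g_metadata_cache_flag = read_env_flag("EXAMPLE_METADATA_CACHE");
    return g_metadata_cache_flag != 0;
}

/* Hot path consulting all three gates on every call. */
static int route_bits_legacy(void) {
    int bits = 0;
    if (gate_c7_ultra())       bits |= 1;
    if (gate_front_v3())       bits |= 2;
    if (gate_metadata_cache()) bits |= 4;
    return bits;
}
```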
**Next Target**: **Phase 4 E1 - ENV Snapshot Consolidation**
- Expected gain: **+3.0-3.5%** (eliminate 3.26% ENV overhead)
- Approach: Consolidate all ENV gates into single TLS snapshot struct
- Precedent: `tiny_front_v3_snapshot` pattern (already proven)
- Cross-cutting: Improves both alloc and free paths
- Next instructions (SSOT): `docs/analysis/PHASE4_E1_ENV_SNAPSHOT_CONSOLIDATION_NEXT_INSTRUCTIONS.md`
- Design memo: `docs/analysis/PHASE4_E1_ENV_SNAPSHOT_CONSOLIDATION_1_DESIGN.md`
**Key Findings Leading to E1**:
1. ENV Gate Overhead (3.26% combined) → **E1 target**
2. Shape Optimization Plateau (B3 +2.89%, D3 +0.56% NEUTRAL)
3. tiny_alloc_gate_fast (15.37% self) → defer to E2
### Phase 4 D3: Alloc Gate Shape (HAKMEM_ALLOC_GATE_SHAPE)
- ✅ Implementation complete (ENV gate + alloc gate branch shape)