Phase 5 E4 Combined: E4-1 + E4-2 (+6.43% GO, baseline consolidated)

Combined A/B Test Results (10-run Mixed):
- Baseline (both OFF): 44.48M ops/s (mean), 44.39M ops/s (median)
- Optimized (both ON): 47.34M ops/s (mean), 47.38M ops/s (median)
- Improvement: +6.43% mean, +6.74% median

Interaction Analysis:
- E4-1 alone: +3.51% (measured in separate session)
- E4-2 alone: +21.83% (measured in separate session)
- Combined: +6.43% (measured in same binary)
- Pattern: SUBADDITIVE (overlapping bottlenecks)

Key Finding: Single-binary incremental gain is the accurate metric
- E4-1 and E4-2 target overlapping TLS/branch resources
- Individual measurements were from different baselines/sessions
- Combined measurement (same binary, both flags) shows true progress

Phase 5 Total Progress:
- Original baseline (session start): 35.74M ops/s
- Combined optimized: 47.34M ops/s
- Total gain: +32.4% (cross-session, reference only)
- Same-binary gain: +6.43% (E4-1+E4-2 both ON vs both OFF)

New Baseline Perf Profile (47.0M ops/s):
- free: 37.56% self% (still top hotspot)
- tiny_alloc_gate_fast: 13.73% (reduced from 19.50%)
- malloc: 12.95% (reduced from 16.13%)
- tiny_region_id_write_header: 6.97% (header write tax)
- hakmem_env_snapshot_enabled: 4.29% (ENV overhead visible)

Health Check: PASS
- MIXED_TINYV3_C7_SAFE: 42.3M ops/s
- C6_HEAVY_LEGACY_POOLV1: 20.9M ops/s

Phase 5 E5 Candidates (from perf profile):
- E5-1: free() path internals (37.56% self%)
- E5-2: Header write reduction (6.97% self%)
- E5-3: ENV snapshot overhead (4.29% self%)

Deliverables:
- docs/analysis/PHASE5_E4_COMBINED_AB_TEST_RESULTS.md
- docs/analysis/PHASE5_E5_NEXT_INSTRUCTIONS.md
- CURRENT_TASK.md (E4 combined complete, E5 candidates)
- docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md (E5 pointer)
- perf.data.e4combined (perf profile data)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@ -1,5 +1,84 @@
# Main-line Task (Current)

## Update Notes (2025-12-14 Phase 5 E4 Combined Complete - E4-1 + E4-2 Interaction Analysis)

### Phase 5 E4 Combined: E4-1 + E4-2 Enabled Together ✅ GO (2025-12-14)

**Target**: Measure the combined effect of both wrapper ENV snapshots (free + malloc)
- Strategy: Enable both `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1` and `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1`
- Goal: Determine the interaction (additive / subadditive / superadditive) and establish a new baseline
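
As a rough illustration of the two A/B arms, the sketch below builds one environment per arm for the same benchmark binary. The flag names come from the Strategy line; treating `"0"` as "OFF" and the `arm_env` helper itself are assumptions made for illustration, not something this task file specifies.

```python
import os

# Flag names are taken from the Strategy line above.
E4_FLAGS = (
    "HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT",    # E4-1 (free wrapper)
    "HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT",  # E4-2 (malloc wrapper)
)


def arm_env(enabled, base_env=None):
    """Return a copy of the environment with both E4 flags ON or OFF.

    Assumption: "1" enables a flag and "0" disables it. The real OFF
    convention (for example, leaving the variable unset) may differ.
    """
    env = dict(os.environ if base_env is None else base_env)
    for flag in E4_FLAGS:
        env[flag] = "1" if enabled else "0"
    return env


baseline_env = arm_env(False)   # both OFF
optimized_env = arm_env(True)   # both ON

for flag in E4_FLAGS:
    print(flag, baseline_env[flag], optimized_env[flag])
```

Each arm's dictionary could then be passed as the `env` argument of `subprocess.run` when launching the benchmark. The benchmark command line is not given here, so it is left out.
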

**A/B Test Results** (Mixed, 10-run, 20M iters, ws=400):
- Baseline (both OFF): **44.48M ops/s** (mean), 44.39M ops/s (median), σ=0.38M
- Optimized (both ON): **47.34M ops/s** (mean), 47.38M ops/s (median), σ=0.42M
- **Delta: +6.43% mean, +6.74% median** ✅
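
The deltas above can be reproduced from the reported means and medians. The sketch below also adds a rough noise check. It assumes that σ is the per-arm sample standard deviation over the 10 runs and that runs are independent, neither of which is stated in this file, so the ratio is only a sanity check and not a formal test.

```python
import math

RUNS = 10  # 10-run A/B, per the heading above

# Throughput in M ops/s, copied from the A/B results
baseline = {"mean": 44.48, "median": 44.39, "sigma": 0.38}   # both OFF
optimized = {"mean": 47.34, "median": 47.38, "sigma": 0.42}  # both ON


def pct_delta(new, old):
    """Relative change of new over old, in percent."""
    return (new / old - 1.0) * 100.0


mean_delta = pct_delta(optimized["mean"], baseline["mean"])
median_delta = pct_delta(optimized["median"], baseline["median"])

# Standard error of the difference of two means (Welch-style, equal run counts)
std_err = math.sqrt(baseline["sigma"] ** 2 / RUNS + optimized["sigma"] ** 2 / RUNS)
ratio = (optimized["mean"] - baseline["mean"]) / std_err

print(f"mean delta:   {mean_delta:+.2f}%")
print(f"median delta: {median_delta:+.2f}%")
print(f"mean difference / standard error: {ratio:.1f}")
```

Under those assumptions the mean difference is many standard errors wide, so the gain sits well outside run-to-run noise.
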

**Individual vs Combined**:
- E4-1 alone (free wrapper): +3.51%
- E4-2 alone (malloc wrapper): +21.83%
- **Combined (both): +6.43%**
- **Interaction: non-additive** (the "alone" figures are reference values from separate sessions; the E4 Combined A/B is the authoritative measure of the incremental gain)

**Analysis - Why Subadditive?**:
1. **Baseline mismatch**: The "alone" A/B runs for E4-1 and E4-2 were measured in separate sessions (different binary states), so their baselines do not match
   - E4-1: 45.35M → 46.94M (+3.51%)
   - E4-2: 35.74M → 43.54M (+21.83%)
   - Do not derive an additive expectation from these; treat the same-binary **E4 Combined A/B** as authoritative
2. **Shared Bottlenecks**: Both optimizations target TLS read consolidation
   - Once TLS access is optimized in one path, the benefit in the other path shrinks
   - Memory bandwidth and cache-line effects are shared resources
3. **Branch Predictor Saturation**: Both paths compete for branch predictor entries
   - ENV snapshot checks add branches that compete for the same predictor resources
   - Combined overhead is non-linear
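
The baseline mismatch in point 1 can be made concrete with a little arithmetic. The sketch below recomputes each gain against its own baseline and shows how far apart the two "alone" baselines are. The numbers are the rounded throughputs quoted above, so recomputed percentages may differ from the reported ones in the last digit.

```python
def pct_gain(before, after):
    """Relative gain of after over before, in percent."""
    return (after / before - 1.0) * 100.0


# (baseline, optimized) in M ops/s, copied from the analysis above
e4_1 = (45.35, 46.94)       # separate session
e4_2 = (35.74, 43.54)       # separate session
combined = (44.48, 47.34)   # same binary, both OFF vs both ON

gain_1 = pct_gain(*e4_1)
gain_2 = pct_gain(*e4_2)
gain_combined = pct_gain(*combined)

# Adding the two "alone" figures mixes percentages of different baselines.
naive_sum = gain_1 + gain_2

# How far apart the two "alone" baselines are (35.74M vs 45.35M).
baseline_gap = pct_gain(e4_2[0], e4_1[0])

print(f"E4-1 alone:    {gain_1:+.1f}% of {e4_1[0]}M")
print(f"E4-2 alone:    {gain_2:+.1f}% of {e4_2[0]}M")
print(f"naive sum:     {naive_sum:+.1f}% (not a valid expectation)")
print(f"combined A/B:  {gain_combined:+.1f}% of {combined[0]}M")
print(f"baseline gap:  {baseline_gap:+.1f}% between the two 'alone' baselines")
```

The gap between the two "alone" baselines is larger than either reported gain, which is why only the same-binary comparison is used as the increment.
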

**Health Check**: ✅ PASS
- MIXED_TINYV3_C7_SAFE: 42.3M ops/s
- C6_HEAVY_LEGACY_POOLV1: 20.9M ops/s
- All profiles passed, no regressions

**Perf Profile** (New Baseline: both ON, 20M iters, 47.0M ops/s):

Top Hot Spots (self% >= 2.0%):
1. free: 37.56% (wrapper + gate, still dominant)
2. tiny_alloc_gate_fast: 13.73% (alloc gate, reduced from 19.50%)
3. malloc: 12.95% (wrapper, reduced from 16.13%)
4. main: 11.13% (benchmark driver)
5. tiny_region_id_write_header: 6.97% (header write cost)
6. tiny_c7_ultra_alloc: 4.56% (C7 alloc path)
7. hakmem_env_snapshot_enabled: 4.29% (ENV snapshot overhead, visible)
8. tiny_get_max_size: 4.24% (size limit check)
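
For triage, the list above can be filtered by a self% cutoff. This is a minimal sketch: the `over_threshold` helper and the choice to exclude the benchmark driver `main` are illustrative assumptions, and deciding which symbols are worth pursuing is still a judgment call.

```python
# (symbol, self%) pairs copied from the hot-spot list above
hot_spots = [
    ("free", 37.56),
    ("tiny_alloc_gate_fast", 13.73),
    ("malloc", 12.95),
    ("main", 11.13),
    ("tiny_region_id_write_header", 6.97),
    ("tiny_c7_ultra_alloc", 4.56),
    ("hakmem_env_snapshot_enabled", 4.29),
    ("tiny_get_max_size", 4.24),
]

BENCH_DRIVER = {"main"}  # benchmark driver, not allocator code


def over_threshold(spots, min_self_pct):
    """Return allocator symbols whose self% meets the cutoff."""
    return [
        (name, pct)
        for name, pct in spots
        if pct >= min_self_pct and name not in BENCH_DRIVER
    ]


shortlist = over_threshold(hot_spots, 5.0)
listed_total = sum(pct for _, pct in hot_spots)

for name, pct in shortlist:
    print(f"{name}: {pct:.2f}%")
print(f"self% covered by the listed symbols: {listed_total:.2f}%")
```
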

**Next Phase 5 Candidates** (selected from hot spots with self% >= 5%):
- **free (37.56%)**: Still the largest hot spot, but harder to optimize further
  - Already has ENV snapshot, hotcold path, static routing
  - Next step: Analyze free path internals (tiny_free_fast structure)
- **tiny_region_id_write_header (6.97%)**: Header write tax
  - Phase 1 A3 showed always_inline is NO-GO (-4% on Mixed)
  - Alternative: Reduce header writes (selective mode, cached writes)
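
**Key Insight**: The ENV snapshot pattern is effective, but **the incremental gains do not add up when it is applied to multiple paths at once**. Evaluate using the same-binary **E4 Combined A/B** (+6.43%) as the authoritative result.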

**Decision: GO** (+6.43% >= +1.0% threshold)
- New baseline: **47.34M ops/s** (Mixed, 20M iters, ws=400)
- Both optimizations remain DEFAULT ON in MIXED_TINYV3_C7_SAFE
- Action: Shift focus to next bottleneck (free path internals or header write optimization)

**Cumulative Status (Phase 5)**:
- E4-1 (Free Wrapper Snapshot): +3.51% standalone (separate session, reference only)
- E4-2 (Malloc Wrapper Snapshot): +21.83% standalone (separate session, reference only)
- **E4 Combined: +6.43%** (same binary, both ON vs both OFF)
- **Total Phase 5: +6.43%** (on top of Phase 4's +3.9%)
- **Overall progress: 35.74M → 47.34M = +32.4%** (from Phase 5 start to E4 combined; cross-session, reference only)

**Next Steps**:
- Profile analysis: Identify E5 candidates (free path, header write, or other hot spots)
- Consider: free() fast path structure optimization (37.56% self% is a large target)
- Consider: Header write reduction strategies (6.97% self%)
- Update design docs with the subadditive interaction analysis
- Design doc: `docs/analysis/PHASE5_E4_COMBINED_AB_TEST_RESULTS.md`

---

## Update Notes (2025-12-14 Phase 5 E4-2 Complete - Malloc Gate Optimization)

### Phase 5 E4-2: malloc Wrapper ENV Snapshot ✅ GO (2025-12-14)