2025-12-17 21:08:17 +09:00
|
|
|
|
# CURRENT_TASK(Rolling, SSOT)
|
2025-12-12 16:26:42 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
## 0) 今の「正」
|
2025-12-16 05:35:11 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
- **性能比較の正**: FAST PGO build(`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`)✓ **Phase 68 昇格済み** (seed/WS diversified)
|
2025-12-16 15:01:56 +09:00
|
|
|
|
- **安全・互換の正**: Standard build(`make bench_random_mixed_hakmem`)
|
|
|
|
|
|
- **観測の正**: OBSERVE build(`make perf_observe`)
|
2025-12-17 21:08:17 +09:00
|
|
|
|
- **スコアカード**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`(M1 達成・超過: 50.93% vs 50% target)
|
2025-12-16 15:01:56 +09:00
|
|
|
|
- **計測の正(Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh`(`ITERS=20000000 WS=400`)
|
2025-12-16 05:35:11 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
## 1) 現状(要点)
|
2025-12-16 05:35:11 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
- Phase 64(backend prune / DCE): **NO-GO**(-4.05%) → layout tax 由来
|
|
|
|
|
|
- Phase 63(FAST_PROFILE_FIXED): **研究用ビルド**として保持(FAST の gate を compile-time 固定)
|
|
|
|
|
|
- Phase 65(Hot Symbol Ordering): **BLOCKED**(GCC+LTO の制約で不公平/不可能)→ `docs/analysis/PHASE65_HOT_SYMBOL_ORDERING_1_RESULTS.md`
|
|
|
|
|
|
- Phase 66(PGO, GCC+LTO): **GO** ✓
|
|
|
|
|
|
- 検証: 3回独立実行で +3.0% mean, all >+2.89%, 分散 <±1%
|
|
|
|
|
|
- Baseline: `bench_random_mixed_hakmem_minimal_pgo` = 60.89M ops/s = 50.32% (initial PGO)
|
|
|
|
|
|
- Phase 68(PGO training set 最適化): **GO & 昇格完了** ✓
|
|
|
|
|
|
- 検証: 10-run で +1.19% vs Phase 66 (GO: +1.0% threshold超過)
|
|
|
|
|
|
- 新 baseline: `bench_random_mixed_hakmem_minimal_pgo` (upgraded) = 61.614M ops/s = **50.93%** (50% target 超過、+0.93pp)
|
Phase 54-60: Memory-Lean mode, Balanced mode stabilization, M1 (50%) achievement
## Summary
Completed Phase 54-60 optimization work:
**Phase 54-56: Memory-Lean mode (LEAN+OFF prewarm suppression)**
- Implemented ss_mem_lean_env_box.h with ENV gates
- Balanced mode (LEAN+OFF) promoted as production default
- Result: +1.2% throughput, better stability, zero syscall overhead
- Added to bench_profile.h: MIXED_TINYV3_C7_BALANCED preset
**Phase 57: 60-min soak finalization**
- Balanced mode: 60-min soak, RSS drift 0%, CV 5.38%
- Speed-first mode: 60-min soak, RSS drift 0%, CV 1.58%
- Syscall budget: 1.25e-7/op (800× under target)
- Status: PRODUCTION-READY
**Phase 59: 50% recovery baseline rebase**
- hakmem FAST (Balanced): 59.184M ops/s, CV 1.31%
- mimalloc: 120.466M ops/s, CV 3.50%
- Ratio: 49.13% (M1 ACHIEVED within statistical noise)
- Superior stability: 2.68× better CV than mimalloc
**Phase 60: Alloc pass-down SSOT (NO-GO)**
- Implemented alloc_passdown_ssot_env_box.h
- Modified malloc_tiny_fast.h for SSOT pattern
- Result: -0.46% (NO-GO)
- Key lesson: SSOT not applicable where early-exit already optimized
## Key Metrics
- Performance: 49.13% of mimalloc (M1 effectively achieved)
- Stability: CV 1.31% (superior to mimalloc 3.50%)
- Syscall budget: 1.25e-7/op (excellent)
- RSS: 33MB stable, 0% drift over 60 minutes
## Files Added/Modified
New boxes:
- core/box/ss_mem_lean_env_box.h
- core/box/ss_release_policy_box.{h,c}
- core/box/alloc_passdown_ssot_env_box.h
Scripts:
- scripts/soak_mixed_single_process.sh
- scripts/analyze_epoch_tail_csv.py
- scripts/soak_mixed_rss.sh
- scripts/calculate_percentiles.py
- scripts/analyze_soak.py
Documentation: Phase 40-60 analysis documents
## Design Decisions
1. Profile separation (core/bench_profile.h):
- MIXED_TINYV3_C7_SAFE: Speed-first (no LEAN)
- MIXED_TINYV3_C7_BALANCED: Balanced mode (LEAN+OFF)
2. Box Theory compliance:
- All ENV gates reversible (HAKMEM_SS_MEM_LEAN, HAKMEM_ALLOC_PASSDOWN_SSOT)
- Single conversion points maintained
- No physical deletions (compile-out only)
3. Lessons learned:
- SSOT effective only where redundancy exists (Phase 60 showed limits)
- Branch prediction extremely effective (~0 cycles for well-predicted branches)
- Early-exit pattern valuable even when seemingly redundant
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 06:24:01 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
## 2) 次の指示書(Active)
|
2025-12-12 03:50:58 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
**Phase 68: PGO training set 最適化** ✅ **完了**
|
2025-12-16 05:35:11 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
- ✓ seed/WS diversification: WS (3→5パターン), seed (1→3パターン)
|
|
|
|
|
|
- ✓ 10-run 検証: +1.19% vs Phase 66 (GO threshold +1.0% 超過)
|
|
|
|
|
|
- ✓ Baseline 昇格: 61.614M ops/s = 50.93% (M1 target 50% を +0.93pp 超過)
|
|
|
|
|
|
- ✓ スコアカード・CURRENT_TASK 更新完了
|
2025-12-16 05:35:11 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
---
|
2025-12-16 05:35:11 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
**Phase 67a(推奨): layout tax 法医学調査**
|
Phase 54-60: Memory-Lean mode, Balanced mode stabilization, M1 (50%) achievement
## Summary
Completed Phase 54-60 optimization work:
**Phase 54-56: Memory-Lean mode (LEAN+OFF prewarm suppression)**
- Implemented ss_mem_lean_env_box.h with ENV gates
- Balanced mode (LEAN+OFF) promoted as production default
- Result: +1.2% throughput, better stability, zero syscall overhead
- Added to bench_profile.h: MIXED_TINYV3_C7_BALANCED preset
**Phase 57: 60-min soak finalization**
- Balanced mode: 60-min soak, RSS drift 0%, CV 5.38%
- Speed-first mode: 60-min soak, RSS drift 0%, CV 1.58%
- Syscall budget: 1.25e-7/op (800× under target)
- Status: PRODUCTION-READY
**Phase 59: 50% recovery baseline rebase**
- hakmem FAST (Balanced): 59.184M ops/s, CV 1.31%
- mimalloc: 120.466M ops/s, CV 3.50%
- Ratio: 49.13% (M1 ACHIEVED within statistical noise)
- Superior stability: 2.68× better CV than mimalloc
**Phase 60: Alloc pass-down SSOT (NO-GO)**
- Implemented alloc_passdown_ssot_env_box.h
- Modified malloc_tiny_fast.h for SSOT pattern
- Result: -0.46% (NO-GO)
- Key lesson: SSOT not applicable where early-exit already optimized
## Key Metrics
- Performance: 49.13% of mimalloc (M1 effectively achieved)
- Stability: CV 1.31% (superior to mimalloc 3.50%)
- Syscall budget: 1.25e-7/op (excellent)
- RSS: 33MB stable, 0% drift over 60 minutes
## Files Added/Modified
New boxes:
- core/box/ss_mem_lean_env_box.h
- core/box/ss_release_policy_box.{h,c}
- core/box/alloc_passdown_ssot_env_box.h
Scripts:
- scripts/soak_mixed_single_process.sh
- scripts/analyze_epoch_tail_csv.py
- scripts/soak_mixed_rss.sh
- scripts/calculate_percentiles.py
- scripts/analyze_soak.py
Documentation: Phase 40-60 analysis documents
## Design Decisions
1. Profile separation (core/bench_profile.h):
- MIXED_TINYV3_C7_SAFE: Speed-first (no LEAN)
- MIXED_TINYV3_C7_BALANCED: Balanced mode (LEAN+OFF)
2. Box Theory compliance:
- All ENV gates reversible (HAKMEM_SS_MEM_LEAN, HAKMEM_ALLOC_PASSDOWN_SSOT)
- Single conversion points maintained
- No physical deletions (compile-out only)
3. Lessons learned:
- SSOT effective only where redundancy exists (Phase 60 showed limits)
- Branch prediction extremely effective (~0 cycles for well-predicted branches)
- Early-exit pattern valuable even when seemingly redundant
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 06:24:01 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
- **狙い**: Phase 64 NO-GO (-4.05%) の根本原因を「再現可能な手順」に固定
|
|
|
|
|
|
- **やること**: perf stat (cycles/IPC/branch-miss/cache-miss/iTLB) を差分テンプレ化 → docs に添付
|
|
|
|
|
|
- Binary diff: Phase 66 baseline vs Phase 64 attempt
|
|
|
|
|
|
- perf drill-down: Hot function の IPC drop / branch miss rate 増加を定量化
|
|
|
|
|
|
- 実装変更なし(法医学ドキュメント化のみ)
|
|
|
|
|
|
- **成果物**: `docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_RESULTS.md`
|
Phase 54-60: Memory-Lean mode, Balanced mode stabilization, M1 (50%) achievement
## Summary
Completed Phase 54-60 optimization work:
**Phase 54-56: Memory-Lean mode (LEAN+OFF prewarm suppression)**
- Implemented ss_mem_lean_env_box.h with ENV gates
- Balanced mode (LEAN+OFF) promoted as production default
- Result: +1.2% throughput, better stability, zero syscall overhead
- Added to bench_profile.h: MIXED_TINYV3_C7_BALANCED preset
**Phase 57: 60-min soak finalization**
- Balanced mode: 60-min soak, RSS drift 0%, CV 5.38%
- Speed-first mode: 60-min soak, RSS drift 0%, CV 1.58%
- Syscall budget: 1.25e-7/op (800× under target)
- Status: PRODUCTION-READY
**Phase 59: 50% recovery baseline rebase**
- hakmem FAST (Balanced): 59.184M ops/s, CV 1.31%
- mimalloc: 120.466M ops/s, CV 3.50%
- Ratio: 49.13% (M1 ACHIEVED within statistical noise)
- Superior stability: 2.68× better CV than mimalloc
**Phase 60: Alloc pass-down SSOT (NO-GO)**
- Implemented alloc_passdown_ssot_env_box.h
- Modified malloc_tiny_fast.h for SSOT pattern
- Result: -0.46% (NO-GO)
- Key lesson: SSOT not applicable where early-exit already optimized
## Key Metrics
- Performance: 49.13% of mimalloc (M1 effectively achieved)
- Stability: CV 1.31% (superior to mimalloc 3.50%)
- Syscall budget: 1.25e-7/op (excellent)
- RSS: 33MB stable, 0% drift over 60 minutes
## Files Added/Modified
New boxes:
- core/box/ss_mem_lean_env_box.h
- core/box/ss_release_policy_box.{h,c}
- core/box/alloc_passdown_ssot_env_box.h
Scripts:
- scripts/soak_mixed_single_process.sh
- scripts/analyze_epoch_tail_csv.py
- scripts/soak_mixed_rss.sh
- scripts/calculate_percentiles.py
- scripts/analyze_soak.py
Documentation: Phase 40-60 analysis documents
## Design Decisions
1. Profile separation (core/bench_profile.h):
- MIXED_TINYV3_C7_SAFE: Speed-first (no LEAN)
- MIXED_TINYV3_C7_BALANCED: Balanced mode (LEAN+OFF)
2. Box Theory compliance:
- All ENV gates reversible (HAKMEM_SS_MEM_LEAN, HAKMEM_ALLOC_PASSDOWN_SSOT)
- Single conversion points maintained
- No physical deletions (compile-out only)
3. Lessons learned:
- SSOT effective only where redundancy exists (Phase 60 showed limits)
- Branch prediction extremely effective (~0 cycles for well-predicted branches)
- Early-exit pattern valuable even when seemingly redundant
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 06:24:01 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
**Phase 67b(後続): 境界inline/unrollチューニング**
|
|
|
|
|
|
- **注意**: layout tax リスク高い(Phase 64 reference)
|
|
|
|
|
|
- **前提**: Top 50 実行確認が必須
|
|
|
|
|
|
- 触るなら最小限・高確度だけ(例: C0 allocator inline candidates のみ)
|
2025-12-17 16:27:06 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
**注記**: 研究箱の削除は今やらない(link-out/削除が layout tax を起こす前例が強いので、compile-out維持が正解)
|
2025-12-17 16:27:06 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
**M2 への道 (55% target)**:
|
|
|
|
|
|
- PGO はもう +1% 程度の改善上限に達した可能性(profile training set 枯渇)
|
|
|
|
|
|
- 次のレバーは: (1) layout tax 排除 / (2) structural changes(box design) / (3) compiler flags tuning
|
2025-12-17 16:27:06 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
## 3) アーカイブ
|
2025-12-17 16:34:03 +09:00
|
|
|
|
|
2025-12-17 21:08:17 +09:00
|
|
|
|
- 詳細ログ: `CURRENT_TASK_ARCHIVE_20251210.md`
|
|
|
|
|
|
- 直近整理前スナップショット: `docs/analysis/CURRENT_TASK_ARCHIVE.md`
|