hakmem/CURRENT_TASK.md
Moe Charm (CI) 89a9212700 Phase 83-1 + Allocator Comparison: Switch dispatch fixed (NO-GO +0.32%), PROFILE correction, SCORECARD update
Key changes:
- Phase 83-1: Switch dispatch fixed mode (tiny_inline_slots_switch_dispatch_fixed_box) - NO-GO (marginal +0.32%, branch reduction negligible)
  Reason: lazy-init pattern already optimal, Phase 78-1 pattern shows diminishing returns

- Allocator comparison baseline update (10-run SSOT, WS=400, ITERS=20M):
  tcmalloc: 115.26M (92.33% of mimalloc)
  jemalloc: 97.39M (77.96% of mimalloc)
  system: 85.20M (68.24% of mimalloc)
  mimalloc: 124.82M (baseline)

- hakmem PROFILE correction: scripts/run_mixed_10_cleanenv.sh + run_allocator_quick_matrix.sh
  PROFILE explicitly set to MIXED_TINYV3_C7_SAFE for hakmem measurements
  Result: baseline stabilized to 55.53M (44.46% of mimalloc)
  Previous unstable measurement (35.57M) was due to profile leak

- Documentation:
  * PERFORMANCE_TARGETS_SCORECARD.md: Reference allocators + M1/M2 milestone status
  * PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md: Phase 83-1 analysis (NO-GO)
  * ALLOCATOR_COMPARISON_QUICK_RUNBOOK.md: Quick comparison procedure
  * ALLOCATOR_COMPARISON_SSOT.md: Detailed SSOT methodology

- M2 milestone status: 44.46% (target 55%, gap -10.54pp) - structural improvements needed

🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-18 18:50:00 +09:00


# CURRENT_TASK (Rolling, SSOT)

## 0) Current source of truth (SSOT)

- **Performance comparison baseline**: FAST PGO build (`make pgo-fast-full`, `bench_random_mixed_hakmem_minimal_pgo`) with **WarmPool=16**
  - Phase 75 (C5/C6 inline slots) has been promoted to the presets.
  - Phase 75-4 ran the FAST PGO rebase and confirmed **C5+C6=ON at +3.16% (GO)**. However, the **FAST PGO baseline itself appears to have regressed substantially since Phase 69**, so Phase 75-5 needs to regenerate the PGO profile.
- **Safety/compatibility baseline**: Standard build (`make bench_random_mixed_hakmem`)
- **Observation baseline**: OBSERVE build (`make perf_observe`)
- **Scorecard (targets/current values)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
  - **FAST baseline (SSOT)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is authoritative (Phase 69: 62.63M ops/s = 51.77% of mimalloc)
  - **Phase 75 measurement (Standard)**: **A/B +5.41%** confirmed on `bench_random_mixed_hakmem` (Phase 75-3, 4-point matrix)
  - **Phase 75 measurement (FAST PGO)**: **A/B +3.16%** confirmed on `bench_random_mixed_hakmem_minimal_pgo` (Phase 75-4, 4-point matrix)
  - Next target: **M2 = 55%** (judge the gap against the FAST baseline)
- **Mixed 10-run SSOT (harness)**: `scripts/run_mixed_10_cleanenv.sh`
  - Default: `BENCH_BIN=./bench_random_mixed_hakmem` (Standard)
  - For FAST PGO, set `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo` explicitly
  - Defaults: `ITERS=20000000 WS=400`, `HAKMEM_WARM_POOL_SIZE=16`, `HAKMEM_TINY_C4_INLINE_SLOTS=1`, `HAKMEM_TINY_C5_INLINE_SLOTS=1`, `HAKMEM_TINY_C6_INLINE_SLOTS=1`, `HAKMEM_TINY_INLINE_SLOTS_FIXED=1`, `HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=1`
  - Forced OFF by cleanenv (to prevent leaks): `HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH_FIXED=0` (Phase 83-1 NO-GO / research)
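
The harness defaults listed above can be gathered into one explicit environment mapping. The following is a minimal Python sketch, not the harness itself: `build_cleanenv` is a hypothetical helper name, and it only assembles the variables named in this document (including the `HAKMEM_PROFILE` value recommended in section 0a). How `scripts/run_mixed_10_cleanenv.sh` actually applies them is not reproduced here.

```python
def build_cleanenv(bench_bin="./bench_random_mixed_hakmem"):
    """Return the documented Mixed 10-run SSOT defaults as a dict of strings.

    Hypothetical helper: it mirrors the settings listed in this document and
    does not launch any benchmark.
    """
    return {
        "BENCH_BIN": bench_bin,
        "ITERS": "20000000",
        "WS": "400",
        "HAKMEM_PROFILE": "MIXED_TINYV3_C7_SAFE",
        "HAKMEM_WARM_POOL_SIZE": "16",
        "HAKMEM_TINY_C4_INLINE_SLOTS": "1",
        "HAKMEM_TINY_C5_INLINE_SLOTS": "1",
        "HAKMEM_TINY_C6_INLINE_SLOTS": "1",
        "HAKMEM_TINY_INLINE_SLOTS_FIXED": "1",
        "HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH": "1",
        # Research knob that is pinned OFF (Phase 83-1 NO-GO).
        "HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH_FIXED": "0",
    }


standard = build_cleanenv()
fast_pgo = build_cleanenv("./bench_random_mixed_hakmem_minimal_pgo")

# Print the FAST PGO variant in NAME=value form for inspection.
for name in sorted(fast_pgo):
    print(f"{name}={fast_pgo[name]}")
```

Keeping every knob in one mapping makes a missing or leaked variable visible in a diff between two runs.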
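## 0a) Preventing flip-flopping results (minimal SSOT rules)

- **Always set `HAKMEM_PROFILE` explicitly for hakmem** (when it is unset the route changes and the numbers tend to fall apart).
  - Recommended: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` (Speed-first)
- Use a separate runner for each comparison purpose:
  - hakmem SSOT (optimization decisions): `scripts/run_mixed_10_cleanenv.sh`
  - allocator reference (short runs): `scripts/run_allocator_quick_matrix.sh`
  - allocator reference (minimizes layout differences): `scripts/run_allocator_preload_matrix.sh`
- Keep reproduction logs (the minimum needed when chasing gains of a few percent):
  - `scripts/bench_ssot_capture.sh`
  - `HAKMEM_BENCH_ENV_LOG=1` (records CPU governor/EPP/freq)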
## 0b) Allocator comparison (reference)

- The allocator comparison (system/jemalloc/mimalloc/tcmalloc) is a **reference** only (separate binaries or LD_PRELOAD, so layout differences are included)
  - SSOT: `docs/analysis/ALLOCATOR_COMPARISON_SSOT.md`
  - **Quick (Random Mixed 10-run)**: `scripts/run_allocator_quick_matrix.sh`
    - **Important**: hakmem sets `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` explicitly and runs through `scripts/run_mixed_10_cleanenv.sh` (a PROFILE leak corrupts the numbers)
  - **Same-binary (recommended, minimizes layout differences)**: `scripts/run_allocator_preload_matrix.sh`
    - Pins `bench_random_mixed_system` and swaps the allocator via `LD_PRELOAD`.
    - Note: this path differs from hakmem's **linked benchmark** (`bench_random_mixed_hakmem*`). LD_PRELOAD is a drop-in wrapper, so the two are not the same thing.
  - **Scenario CSV (small-scale reference)**: `scripts/bench_allocators_compare.sh`
## 1) Avoiding getting lost (routes/observation)

A minimal procedure to prevent "optimizations whose path is never exercised".

- **Route Banner (eliminates route misidentification)**: `HAKMEM_ROUTE_BANNER=1`
  - Output: route assignments (backend route kind) + cache config (`unified_cache_enabled` / `warm_pool_max_per_class`)
- **Refill observation SSOT**: `docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md`
  - At WS=400 (Mixed SSOT) misses are negligible → `unified_cache_refill()` optimization is **frozen (zero ROI)**
## 2) Recent conclusions (key points only)

- **Phase 69 (WarmPool sweep)**: `HAKMEM_WARM_POOL_SIZE=16` was a **strong GO (+3.26%)** and has been promoted to the baseline.
  - Design: `docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md`
  - Results: `docs/analysis/PHASE69_REFILL_TUNING_1_RESULTS.md`
- **Phase 70 (observation SSOT)**: Established statistics visibility and the prerequisite gates. Refill is cold under the WS=400 SSOT.
  - SSOT: `docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md`
- **Phase 71/73 (source of the WarmPool=16 win confirmed)**: The win comes from a **slight reduction in instructions/branches** (confirmed with perf stat).
  - Details: `docs/analysis/PHASE70_71_WARMPOOL16_ANALYSIS.md`
- **Phase 72 (ENV knob ROI exhausted)**: No ENV-only win beyond WarmPool=16 → **time to attack through structure (code)**
- **Phase 78-1 (structure)**: Fixed the per-op ENV gate that enables Inline Slots. Same-binary A/B: **GO (+2.31%)**
  - Results: `docs/analysis/PHASE78_1_INLINE_SLOTS_FIXED_MODE_RESULTS.md`
- **Phase 80-1 (structure)**: Converted the Inline Slots if-chain into a switch dispatch. Same-binary A/B: **GO (+1.65%)**
  - Results: `docs/analysis/PHASE80_INLINE_SLOTS_SWITCH_DISPATCH_1_RESULTS.md`
- **Phase 83-1 (structure)**: Fixed the per-op ENV gate for switch dispatch (applying the Phase 78-1 pattern). Same-binary A/B: **NO-GO (+0.32%, branch reduction negligible)**
  - Results: `docs/analysis/PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md`
  - Cause: the lazy-init pattern is already optimized (per-op overhead minimal) → the ROI of fixed mode is tiny
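
The Phase 83-1 NO-GO is easier to picture with a model of a lazy-init ENV gate: the environment is consulted once, and every later call only checks a cached value, so a fixed mode has very little per-operation work left to remove. The Python sketch below is illustrative only. `EnvGate` is a hypothetical name, and the real gate is C code that is not reproduced in this document.

```python
import os


class EnvGate:
    """Lazy-init ENV gate: read the variable on first use, then reuse the cached result."""

    def __init__(self, name, default="0"):
        self.name = name
        self.default = default
        self._cached = None  # None means "not initialised yet"
        self.env_reads = 0   # how many times the environment was consulted

    def enabled(self):
        if self._cached is None:  # slow path, taken only on the first call
            self.env_reads += 1
            self._cached = os.environ.get(self.name, self.default) == "1"
        return self._cached       # fast path: a single cached check per operation


os.environ["HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH"] = "1"
gate = EnvGate("HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH")

# Simulate 1000 operations that each consult the gate.
hits = sum(1 for _ in range(1000) if gate.enabled())
print(hits, gate.env_reads)
```

In this model the environment lookup happens once per process, which matches the document's reading that the remaining per-op overhead is already minimal.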
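## 3) Operating rules (Box Theory + layout tax countermeasures)

- Always stack changes as **a box + a single boundary + revertible via ENV** (fail-fast, minimal visualization)
- A/B tests should in principle use **ENV toggles on the same binary** (comparing separate binaries mixes in layout effects).
- SSOT operation (preventing flip-flopping): `docs/analysis/PHASE75_6_SSOT_POLICY_FAST_PGO_VS_STANDARD.md`
- "Faster after deleting code" is off-limits (link-out or large deletions tend to flip sign because of layout tax) → prefer **compile-out**.
- Diagnostics: `scripts/box/layout_tax_forensics_box.sh` / `docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md`
- Research box inventory (SSOT): `docs/analysis/RESEARCH_BOXES_SSOT.md`
  - Knob list: `scripts/list_hakmem_knobs.sh`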
## 5) Handling research boxes (freeze policy)

- **Phase 79-1 (C2 local cache)**: `HAKMEM_TINY_C2_LOCAL_CACHE=0/1`
  - Result: +0.57% (NO-GO, below the +1.0% threshold) → **research box freeze**
  - **Default OFF** in SSOT/cleanenv (`scripts/run_mixed_10_cleanenv.sh` forces `0`)
  - No physical deletion (avoids layout tax risk)
- **Phase 82 (hardening)**: The C2 local cache is completely excluded from the hot path (even with the environment variable set, the alloc/free hot path never touches it)
  - Record: `docs/analysis/PHASE82_C2_LOCAL_CACHE_HOTPATH_EXCLUSION.md`
## 4) Next instructions (Active)

### Phase 74 (structure): Shorten the UnifiedCache hit-path ✅ **P1 (LOCALIZE) frozen**

**Premises**:
- Under the WS=400 SSOT, UnifiedCache misses are negligible → refill optimization has zero ROI.
- The WarmPool=16 win comes from a slight instruction/branch reduction → shortening the hit-path is the direct approach.

**Phase 74-1: LOCALIZE (ENV-gated)** **Done (NEUTRAL +0.50%)**
- ENV: `HAKMEM_TINY_UC_LOCALIZE=0/1`
- Runtime branch overhead **increased** instructions/branches (+0.7%/+0.4%)
- Verdict: **NEUTRAL (+0.50%)**

**Phase 74-2: LOCALIZE (compile-time gate)** **Done (NEUTRAL -0.87%)**
- Build flag: `HAKMEM_TINY_UC_LOCALIZE_COMPILED=0/1` (default 0)
- Runtime branch removed → instructions/branches **improved** (-0.6%/-2.3%) ✓
- However, **cache-misses +86%** (register pressure / spill) → throughput **-0.87%**
- Isolation succeeded: **LOCALIZE itself is a win, but the cache-miss increase cancels it out**
- Verdict: **NEUTRAL (-0.87%)**, **P1 (LOCALIZE) frozen**

**Conclusion**:
- P1 (LOCALIZE) is frozen at default OFF (dependency chain reduction has low ROI)
- Next: proceed to **Phase 74-3 (P0: FASTAPI)**

**Phase 74-3: P0 (FASTAPI)** **Done (NEUTRAL +0.32%)**

**Goal**: Move the `unified_cache_enabled()` / `lazy-init` / `stats` checks **out of the hot loop**

**Approach**:
- Add the `unified_cache_push_fast()` / `unified_cache_pop_fast()` APIs
- Precondition: the caller guarantees "valid/enabled/no-stats"
- Fail-fast: fall back to the slow path on any unexpected state (single boundary)
- ENV gate: `HAKMEM_TINY_UC_FASTAPI=0/1` (default 0, research box)

**Results** (10-run Mixed SSOT, WS=400):
- Throughput: **+0.32%** (NEUTRAL, below +1.0% GO threshold)
- cache-misses: **-16.31%** (positive signal, insufficient throughput gain)

**Verdict**: **NEUTRAL (+0.32%)**, **P0 (FASTAPI) frozen**

**References**:
- Design: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_0_DESIGN.md`
- Instructions: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_1_NEXT_INSTRUCTIONS.md`
- Results (P1/P0): `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_2_RESULTS.md`

---
## Phase 75 (structure): Hot-class Inline Slots (P2) ✅ **Done (Standard A/B)**

**Goal**: Statistical analysis of C4-C7 → decide on a targeted optimization strategy

**Premises** (Phase 74 learnings):
- UnifiedCache hit-path optimization has low ROI, caused by register pressure / cache-miss effects
- Next axis: **exploit per-class characteristics** → branch elimination with TLS-direct inline slots

**Phase 75-0: Per-Class Analysis** **Done**

Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):

| Class | Capacity | Occupied | Hits | Pushes | Total Ops | Hit % | % of C4-C7 |
|-------|----------|----------|------|--------|-----------|-------|-----------|
| C6 | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100% | **57.2%** |
| C5 | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100% | **28.5%** |
| C4 | 64 | 63 | 687,563 | 687,564 | **1,375,127** | 100% | **14.3%** |
| C7 | ? | ? | ? | ? | **?** | ? | **?** |

**Key findings**:
1. C6 dominates overwhelmingly: 57.2% of operations (2.75M hits)
2. All classes show a 100% hit rate (refill inactive in SSOT)
3. Cache occupancy near-capacity (98-99%)
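
The shares and occupancy figures in the table can be re-derived from its raw counts. A minimal Python sketch follows, using only the numbers shown above. C7 is left out because the table has no data for it, so the shares here are relative to the measured C4-C6 total.

```python
# Total ops per class, copied from the Phase 75-0 table (C7 has no data there).
total_ops = {"C6": 5_501_709, "C5": 2_747_209, "C4": 1_375_127}
# (occupied, capacity) per class, from the same table.
occupancy = {"C6": (127, 128), "C5": (127, 128), "C4": (63, 64)}

grand_total = sum(total_ops.values())

# Share of operations per class, in percent of the measured total.
share = {cls: 100.0 * ops / grand_total for cls, ops in total_ops.items()}
# Cache occupancy per class, in percent of capacity.
fill = {cls: 100.0 * occ / cap for cls, (occ, cap) in occupancy.items()}

for cls in ("C6", "C5", "C4"):
    print(f"{cls}: {share[cls]:.2f}% of ops, {fill[cls]:.1f}% occupied")
```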
**Phase 75-1: C6-only Inline Slots** **Done (GO +2.87%)**

**Approach**: Modular box theory design with single decision point at TLS init

**Implementation** (5 new boxes + test script):
- ENV gate box: `HAKMEM_TINY_C6_INLINE_SLOTS=0/1` (lazy-init, default OFF)
- TLS extension: 128-slot ring buffer (1KB per thread, zero overhead when OFF)
- Fast-path API: `c6_inline_push()` / `c6_inline_pop()` (always_inline, 1-2 cycles)
- Integration: Minimal (2 boundary points: alloc/free for C6 class only)
- Backward compatible: Legacy code intact, fail-fast to unified_cache
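
The push/pop contract described above can be modelled as a fixed-capacity ring whose operations report failure instead of growing, so the caller can fall back to the unified cache. The Python sketch below is a behavioural model under stated assumptions, not the C implementation: `InlineSlots`, its head/tail/count layout and the FIFO order are hypothetical, and only the 128-slot capacity comes from this document (128 slots of 8-byte pointers would account for the quoted 1KB).

```python
class InlineSlots:
    """Fixed-capacity ring buffer model; capacity must be a power of two."""

    def __init__(self, capacity=128):
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.slots = [None] * capacity
        self.mask = capacity - 1
        self.head = 0   # next index to pop from
        self.tail = 0   # next index to push to
        self.count = 0  # number of stored pointers

    def push(self, ptr):
        """Store ptr; return False when full so the caller can fall back."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = ptr
        self.tail = (self.tail + 1) & self.mask
        self.count += 1
        return True

    def pop(self):
        """Return a stored ptr, or None when empty so the caller can fall back."""
        if self.count == 0:
            return None
        ptr = self.slots[self.head]
        self.head = (self.head + 1) & self.mask
        self.count -= 1
        return ptr


ring = InlineSlots(128)
# Offer 130 fake block addresses; only the first 128 fit.
accepted = sum(ring.push(addr) for addr in range(0x1000, 0x1000 + 130 * 8, 8))
first = ring.pop()
print(accepted, hex(first), ring.count)
```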
**Results** (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF): **44.24 M ops/s**
- Treatment (C6 inline ON): **45.51 M ops/s**
- Delta: **+1.27 M ops/s (+2.87%)**

**Decision**: ✅ **GO** (exceeds +1.0% strict threshold)

**Mechanism**: Branch elimination on unified_cache for C6 (57.2% of C4-C7 ops)
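
The delta and the GO decision follow directly from the two means. A minimal Python sketch, assuming the decision rule is simply "relative gain above the +1.0% threshold"; `ab_delta` is a hypothetical helper name, and the inputs are the rounded means quoted above.

```python
GO_THRESHOLD_PCT = 1.0  # strict GO threshold quoted in this document


def ab_delta(baseline_mops, treatment_mops):
    """Return (absolute delta in M ops/s, relative delta in percent of baseline)."""
    delta = treatment_mops - baseline_mops
    return delta, 100.0 * delta / baseline_mops


delta, pct = ab_delta(44.24, 45.51)
decision = "GO" if pct > GO_THRESHOLD_PCT else "below GO threshold"
print(f"{delta:+.2f} M ops/s ({pct:+.2f}%) -> {decision}")
```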
**References**:
- Per-class analysis: `docs/analysis/PHASE75_PERCLASS_ANALYSIS_0_SSOT.md`
- Results: `docs/analysis/PHASE75_C6_INLINE_SLOTS_1_RESULTS.md`

---
**Phase 75-2: C5 Inline Slots** **Done (GO +1.10%)**

**Goal**: C5-only isolated measurement (28.5% of C4-C7) for individual contribution

**Approach**: Replicate C6 pattern with careful isolation
- Add C5 ring buffer (128 slots, 1KB TLS)
- ENV gate: `HAKMEM_TINY_C5_INLINE_SLOTS=0/1` (default OFF)
- Test strategy: C5-only (baseline C5=OFF+C6=ON, treatment C5=ON+C6=ON)
- Integration: alloc/free boundary points (C5 FIRST, then C6, then unified_cache)

**Results** (10-run Mixed SSOT, WS=400):
- Baseline (C5=OFF, C6=ON): **44.26 M ops/s** (σ=0.37)
- Treatment (C5=ON, C6=ON): **44.74 M ops/s** (σ=0.54)
- Delta: **+0.49 M ops/s (+1.10%)**

**Decision**: ✅ **GO** (C5 individual contribution validated)

**Cumulative Performance**:
- Phase 75-1 (C6): +2.87%
- Phase 75-2 (C5 isolated): +1.10%
- Combined potential: ~+3.97% (if additive)

**References**:
- Implementation details: `docs/analysis/PHASE75_2_C5_INLINE_SLOTS_IMPLEMENTATION.md`

---
**Phase 75-3: C5+C6 Interaction Test (4-Point Matrix A/B)** **Done (STRONG GO +5.41%)**

**Goal**: Comprehensive interaction test + final promotion decision

**Approach**: 4-point matrix A/B test (single binary, ENV-only configuration)
- Point A (C5=0, C6=0): Baseline
- Point B (C5=1, C6=0): C5 solo
- Point C (C5=0, C6=1): C6 solo
- Point D (C5=1, C6=1): C5+C6 combined

**Results** (10-run per point, Mixed SSOT, WS=400):
- **Point A (baseline)**: 42.36 M ops/s
- **Point B (C5 solo)**: 43.54 M ops/s (+2.79% vs A)
- **Point C (C6 solo)**: 44.25 M ops/s (+4.46% vs A)
- **Point D (C5+C6)**: 44.65 M ops/s (+5.41% vs A) **[MAIN TARGET]**

**Additivity Analysis**:
- Expected additive (B+C-A): 45.43 M ops/s
- Actual (D): 44.65 M ops/s
- Sub-additivity: **1.72%** (near-perfect additivity, minimal negative interaction)
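
The additivity figures above can be reproduced from the four measured points. In the Python sketch below, `additivity` is a hypothetical helper name, and the sub-additivity is taken relative to the additive expectation, which is the convention that matches the 1.72% reported here. A negative value would mean the combination is super-additive.

```python
def additivity(a, b, c, d):
    """4-point matrix: a=baseline, b and c=single features ON, d=both ON."""
    expected = b + c - a  # additive prediction for point D
    sub_additivity_pct = 100.0 * (expected - d) / expected
    gain_pct = 100.0 * (d - a) / a  # D vs A
    return expected, sub_additivity_pct, gain_pct


# Phase 75-3 means in M ops/s, as listed above.
expected, sub_pct, gain_pct = additivity(a=42.36, b=43.54, c=44.25, d=44.65)
print(f"expected={expected:.2f} sub-additivity={sub_pct:.2f}% D vs A={gain_pct:+.2f}%")
```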
**Perf Stat Validation (D vs A)**:
- Instructions: -6.1% (function call elimination confirmed)
- Branches: -6.1% (matches instruction reduction)
- Cache-misses: -31.5% (improved locality, NOT +86% like Phase 74-2)
- Throughput: +5.41% (net positive)

**Decision**: ✅ **STRONG GO (+5.41%)**
- D vs A: +5.41% >> 3.0% threshold
- Sub-additivity: 1.72% << 20% acceptable
- Phase 73 thesis validated: instructions/branches DOWN, throughput UP

**Promotion Completed**:
1. `core/bench_profile.h`: Added C5+C6 defaults to `bench_apply_mixed_tinyv3_c7_common()`
2. `scripts/run_mixed_10_cleanenv.sh`: Added C5+C6 ENV defaults
3. C5+C6 inline slots now **promoted to preset defaults** for MIXED_TINYV3_C7_SAFE

**Phase 75 Complete**: C5+C6 inline slots (129-256B) deliver +5.41% proven gain **on Standard binary** (`bench_random_mixed_hakmem`).
- Before updating the FAST PGO baseline (scorecard), re-measure **the same-condition A/B (C5/C6 OFF/ON)** with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`.
### Phase 75-4 (FAST PGO rebase) ✅ Done

- Result: **+3.16% (GO)** (4-point matrix, after excluding outliers)
- Details: `docs/analysis/PHASE75_4_FAST_PGO_REBASE_RESULTS.md`
- Important: compared with the Phase 69 FAST baseline (62.63M), the **current FAST PGO baseline appears to be much lower** (PGO profile staleness / training mismatch / build drift)

### Phase 75-5 (PGO regeneration) ✅ Done (NO-GO on hypothesis, code bloat root cause identified)

Purpose:
- Regenerate the PGO training against the current code, which includes the C5/C6 inline slots, and recover a Phase 69-class FAST baseline

Results:
- PGO profile regeneration had only a **limited** effect (+0.3%)
- The root cause is **code bloat, not PGO profile mismatch** (+13KB, +3.1%)
- Code bloat triggers layout tax → IPC collapse (-7.22%), branch-miss spike (+19.4%) → net -12% regression

**Forensics findings** (`scripts/box/layout_tax_forensics_box.sh`):
- Text size: +13KB (+3.1%)
- IPC: 1.80 → 1.67 (-7.22%)
- Branch-misses: +19.4%
- Cache-misses: +5.7%

**Decision**:
- FAST PGO is sensitive to code bloat → **Track A/B discipline established**
  - Track A: Standard binary for implementation decisions (SSOT for GO/NO-GO)
  - Track B: FAST PGO for mimalloc ratio tracking (periodic rebase, not single-point decisions)

**References**:
- Detailed results: `docs/analysis/PHASE75_5_PGO_REGENERATION_RESULTS.md`
- Instructions: `docs/analysis/PHASE75_5_PGO_REGENERATION_NEXT_INSTRUCTIONS.md`

---
### Phase 76 (structure, continued): C4-C7 Remaining Classes ✅ **Phase 76-1 Done (GO +1.73%)**

**Premises** (Phase 75 complete):
- C5+C6 inline slots: +5.41% proven (Standard), +3.16% (FAST PGO)
- Code bloat sensitivity identified → Track A/B discipline established
- Remaining C4-C7 coverage: C4 (14.29%), C7 (0%)

**Phase 76-0: C7 Statistics Analysis** **Done (NO-GO for C7 P2)**

**Approach**: OBSERVE run to measure C7 allocation patterns in Mixed SSOT

**Results**: C7 = **0% operations** in Mixed SSOT workload

**Decision**: NO-GO for C7 P2 optimization → proceed to C4

**References**:
- Results: `docs/analysis/PHASE76_0_C7_STATISTICS_ANALYSIS.md`

**Phase 76-1: C4 Inline Slots** **Done (GO +1.73%)**

**Goal**: Complete C4-C6 inline slots trilogy, targeting remaining 14.29% of C4-C7 operations

**Implementation** (modular box pattern):
- ENV gate: `HAKMEM_TINY_C4_INLINE_SLOTS=0/1` (default OFF → ON after promotion)
- TLS ring: 64 slots, 512B per thread (lighter than C5/C6's 1KB)
- Fast-path API: `c4_inline_push()` / `c4_inline_pop()` (always_inline)
- Integration: C4 FIRST → C5 → C6 → unified_cache (alloc/free cascade)

**Results** (10-run Mixed SSOT, WS=400):
- Baseline (C4=OFF, C5=ON, C6=ON): **52.42 M ops/s**
- Treatment (C4=ON, C5=ON, C6=ON): **53.33 M ops/s**
- Delta: **+0.91 M ops/s (+1.73%)**

**Decision**: **GO** (exceeds +1.0% threshold)

**Promotion Completed**:
1. `core/bench_profile.h`: Added C4 default to `bench_apply_mixed_tinyv3_c7_common()`
2. `scripts/run_mixed_10_cleanenv.sh`: Added `HAKMEM_TINY_C4_INLINE_SLOTS=1` default
3. C4 inline slots now **promoted to preset defaults** alongside C5+C6

**Coverage Summary (C4-C7 complete)**:
- C6: 57.17% (Phase 75-1, +2.87%)
- C5: 28.55% (Phase 75-2, +1.10%)
- **C4: 14.29% (Phase 76-1, +1.73%)**
- C7: 0.00% (Phase 76-0, NO-GO)
- **Combined C4-C6: 100% of C4-C7 operations**

**Estimated Cumulative Gain**: +7-8% (C4+C5+C6 combined, assumes near-perfect additivity like Phase 75-3)

**References**:
- Results: `docs/analysis/PHASE76_1_C4_INLINE_SLOTS_RESULTS.md`
- C4 box files: `core/box/tiny_c4_inline_slots_*.h`, `core/front/tiny_c4_inline_slots.h`, `core/tiny_c4_inline_slots.c`

---
**Phase 76-2: C4+C5+C6 Comprehensive 4-Point Matrix** **Done (STRONG GO +7.05%, super-additive)**

**Goal**: Validate cumulative C4+C5+C6 interaction and establish SSOT baseline for next optimization axis

**Results** (4-point matrix, 10-run each):
- Point A (all OFF): 49.48 M ops/s (baseline)
- Point B (C4 only): 49.44 M ops/s (-0.08%, context-dependent regression)
- Point C (C5+C6 only): 52.27 M ops/s (+5.63% vs A)
- Point D (all ON): **52.97 M ops/s (+7.05% vs A)** **STRONG GO**

**Critical Discovery**:
- C4 shows **-0.08% regression in isolation** (C5/C6 OFF)
- C4 shows **+1.27% gain in context** (with C5+C6 ON)
- **Super-additivity**: Actual D (+7.05%) exceeds expected additive (+5.56%)
- **Implication**: Per-class optimizations are **context-dependent**, not independently additive

**Sub-additivity Analysis**:
- Expected additive: 52.23 M ops/s (B + C - A)
- Actual: 52.97 M ops/s
- Sub-additivity: **-1.42%** (negative, so super-additive)

**Decision**: **STRONG GO**
- D vs A: +7.05% >> +3.0% threshold
- Super-additive behavior confirms synergistic gains
- C4+C5+C6 locked to SSOT defaults

**References**:
- Detailed results: `docs/analysis/PHASE76_2_C4C5C6_MATRIX_RESULTS.md`

---
### 🟩 Done: C4-C7 Inline Slots Optimization Stack

**Per-class Coverage Summary (Final)**:
- C6 (57.17%): +2.87% (Phase 75-1)
- C5 (28.55%): +1.10% (Phase 75-2)
- C4 (14.29%): +1.27% in context (Phase 76-1/76-2)
- C7 (0.00%): NO-GO (Phase 76-0)
- **Combined C4-C6: +7.05% (Phase 76-2 super-additive)**

**Status**: ✅ **C4-C7 Optimization Complete** (100% coverage, SSOT locked)

---

### 🟥 Next Active (Phase 77+)

**Options**:

**Option A: FAST PGO Periodic Tracking** (Track B discipline)
- Regenerate PGO profile with C4+C5+C6=ON if code bloat accumulates
- Monitor mimalloc ratio progress (secondary metric)
- Not a decision point per se, but periodic maintenance

**Option B: Phase 77 (Alternative Optimization Axis)**
- Explore beyond per-class inline slots
- Candidates:
  - Allocation fast-path optimization (call elimination)
  - Metadata/page lookup (table optimization)
  - C3/C2 class strategies
  - Warm pool tuning (beyond Phase 69's WarmPool=16)

**Recommendation**: **Proceed to Option B** (Phase 77+)
- C4-C7 optimizations are exhausted and locked
- Ready to explore new optimization axes
- Baseline is now +7.05% stronger than Phase 75-3

**References**:
- Full C4-C7 analysis: `docs/analysis/PHASE76_2_C4C5C6_MATRIX_RESULTS.md`
- Phase 75-3 reference (C5+C6): `docs/analysis/PHASE75_3_C5_C6_INTERACTION_RESULTS.md`

## 5) Archive

- Detailed log: `CURRENT_TASK_ARCHIVE_20251210.md`
- Pre-cleanup snapshot: `docs/analysis/CURRENT_TASK_ARCHIVE.md`