# CURRENT_TASK(Rolling, SSOT)

## 0) Current source of truth (SSOT)

- **Reference for performance comparisons**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) + **WarmPool=16** (promoted after the Phase 69 strong GO)
- **Reference for safety/compatibility**: Standard build (`make bench_random_mixed_hakmem`)
- **Reference for observation**: OBSERVE build (`make perf_observe`)
- **Scorecard (targets/current values)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
  - Current baseline (FAST v3 + PGO, Phase 69): **62.63M ops/s = 51.77% of mimalloc**
  - Next target: **M2 = 55%** (**+3.23pp** remaining)
- **Mixed 10-run SSOT**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`, `HAKMEM_WARM_POOL_SIZE=16` by default)
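The scorecard figures imply an absolute throughput target for M2. This is a minimal sketch of that arithmetic. It assumes the percentage is simply hakmem ops/s divided by mimalloc ops/s on the same Mixed benchmark. The helper names are illustrative, and the derived mimalloc figure is computed here rather than quoted from the scorecard.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: ratio_pct = 100 * hakmem_ops / mimalloc_ops on the same benchmark. */
static double implied_mimalloc_mops(double hakmem_mops, double ratio_pct) {
    assert(ratio_pct > 0.0);
    return hakmem_mops / (ratio_pct / 100.0);
}

/* Absolute throughput (M ops/s) needed to reach target_pct of mimalloc. */
static double target_mops(double mimalloc_mops, double target_pct) {
    return mimalloc_mops * (target_pct / 100.0);
}

void print_m2_gap(void) {
    const double current_mops = 62.63;  /* Phase 69 baseline */
    const double current_pct  = 51.77;  /* % of mimalloc */
    const double m2_pct       = 55.0;   /* M2 target */

    double mi = implied_mimalloc_mops(current_mops, current_pct);
    double m2 = target_mops(mi, m2_pct);
    printf("mimalloc ~%.2fM ops/s; M2 needs ~%.2fM ops/s (+%.2fpp = +%.1f%% relative)\n",
           mi, m2, m2_pct - current_pct, (m2 / current_mops - 1.0) * 100.0);
}
```

Under that assumption, mimalloc sits near 121M ops/s and M2 needs roughly 66.5M ops/s. The remaining +3.23pp therefore amounts to about +6.2% relative throughput, which is several times the +1.0% GO threshold used for A/B verdicts.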

## 1) Staying on track (routes/observation)

Minimal procedure to avoid “optimizing a path that is never taken”.

- **Route Banner (rules out misidentifying the route)**: `HAKMEM_ROUTE_BANNER=1`
  - Output: route assignments (backend route kind) + cache config (`unified_cache_enabled` / `warm_pool_max_per_class`)
- **Refill observation SSOT**: `docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md`
  - At WS=400 (Mixed SSOT), misses are negligible → `unified_cache_refill()` optimization is **frozen (zero ROI)**
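This is a hypothetical sketch of what an ENV-gated route banner does. It checks the flag, then prints the effective route and cache settings so that a benchmark log shows which path actually ran. Only `HAKMEM_ROUTE_BANNER`, `unified_cache_enabled` and `warm_pool_max_per_class` come from this document. The struct, the function names and the output format are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot of the configuration the banner reports. */
typedef struct {
    const char *route_kind;          /* backend route kind (illustrative label) */
    int         unified_cache_enabled;
    int         warm_pool_max_per_class;
} route_config_t;

/* Returns 1 if HAKMEM_ROUTE_BANNER is set to anything other than "0". */
static int route_banner_enabled(void) {
    const char *v = getenv("HAKMEM_ROUTE_BANNER");
    return v != NULL && strcmp(v, "0") != 0;
}

/* Prints one banner line to `out` when enabled; returns 1 if it printed. */
int maybe_print_route_banner(FILE *out, const route_config_t *cfg) {
    assert(out != NULL && cfg != NULL);
    if (!route_banner_enabled()) return 0;
    fprintf(out, "[ROUTE] kind=%s unified_cache_enabled=%d warm_pool_max_per_class=%d\n",
            cfg->route_kind, cfg->unified_cache_enabled, cfg->warm_pool_max_per_class);
    return 1;
}
```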

## 2) Recent conclusions (key points only)

- **Phase 69 (WarmPool sweep)**: `HAKMEM_WARM_POOL_SIZE=16` is a **strong GO (+3.26%)** and has been promoted to the baseline.
  - Design: `docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md`
  - Results: `docs/analysis/PHASE69_REFILL_TUNING_1_RESULTS.md`
- **Phase 70 (observation SSOT)**: Made the statistics visible and established the prerequisite gates. Under the WS=400 SSOT, refill is cold.
  - SSOT: `docs/analysis/PHASE70_REFILL_OBSERVABILITY_PREREQS_SSOT.md`
- **Phase 71/73 (source of the WarmPool=16 win confirmed)**: The win comes from a **small reduction in instructions/branches** (confirmed with perf stat).
  - Details: `docs/analysis/PHASE70_71_WARMPOOL16_ANALYSIS.md`
- **Phase 72 (ENV knob ROI exhausted)**: No ENV-only win exists beyond WarmPool=16 → **time to attack structurally (code changes)**.

## 3) Operating rules (Box Theory + layout tax countermeasures)

- Always stack changes as **a box + a single boundary + revertible via ENV** (fail-fast, minimal visibility).
- As a rule, A/B comparisons **toggle an ENV on the same binary** (comparing different binaries mixes in layout effects).
- “Faster by deleting code” is off-limits (link-outs and large deletions easily flip the sign through layout tax) → prefer **compile-out**.
  - Diagnostics: `scripts/box/layout_tax_forensics_box.sh` / `docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md`
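The first rule can be sketched as a hypothetical example. The knob `HAKMEM_EXAMPLE_BOX` and all function names are invented for illustration. The optimization lives in one box, and a cached ENV gate defaults to OFF. A single call site chooses between the box and the existing path, and it falls back whenever the box declines.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Cached gate: -1 = not read yet, 0 = OFF (default), 1 = ON.
   Single-threaded sketch; real code would read it at init or use an atomic. */
static int g_example_box_gate = -1;

static int example_box_enabled(void) {
    if (g_example_box_gate < 0) {
        const char *v = getenv("HAKMEM_EXAMPLE_BOX");   /* hypothetical knob */
        g_example_box_gate = (v != NULL && v[0] == '1');
    }
    return g_example_box_gate;
}

/* Existing, known-good path (stand-in for the current code). */
static int legacy_path(int x) { return x * 2; }

/* The box: writes *out and returns 1 on success; returns 0 to decline (fail-fast). */
static int example_box_try(int x, int *out) {
    assert(out != NULL);
    if (x < 0) return 0;      /* unexpected input: decline instead of guessing */
    *out = x + x;             /* "optimized" variant with identical semantics */
    return 1;
}

/* The single boundary: the only call site that knows the box exists. */
int example_entry(int x) {
    int r;
    if (example_box_enabled() && example_box_try(x, &r)) return r;
    return legacy_path(x);
}
```

Both paths live in the same binary. An A/B run therefore only needs to run with and without `HAKMEM_EXAMPLE_BOX=1`, which is what the second rule asks for.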

## 4) Next instructions (Active)

### Phase 74 (structural): Shorten the UnifiedCache hit path ✅ **P1 (LOCALIZE) frozen**

**Premises**:
- Under the WS=400 SSOT, UnifiedCache misses are negligible → refill optimization has zero ROI.
- The WarmPool=16 win comes from a small instruction/branch reduction → shortening the hit path is the proper line of attack.

**Phase 74-1: LOCALIZE (ENV-gated)** ✅ **Done (NEUTRAL +0.50%)**
- ENV: `HAKMEM_TINY_UC_LOCALIZE=0/1`
- The runtime branch overhead **increases** instructions/branches (+0.7%/+0.4%)
- Verdict: **NEUTRAL (+0.50%)**
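This document does not spell out exactly what LOCALIZE changes, although its conclusion ties it to dependency-chain reduction. Assume it means loading hot cache fields into locals once instead of re-reading them through the cache pointer. Under that assumption, a hypothetical sketch with an illustrative ring layout (not hakmem's actual struct) and a Phase 74-1-style runtime gate looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-class ring cache; field names are illustrative. */
typedef struct {
    void   **slots;
    unsigned head;   /* next pop index (free-running) */
    unsigned tail;   /* next push index (free-running) */
    unsigned mask;   /* capacity - 1; capacity is a power of two */
} uc_ring_t;

/* Plain variant: each access re-reads the field through the pointer. */
static void *uc_pop_plain(uc_ring_t *c) {
    if (c->head == c->tail) return NULL;          /* empty: caller takes the slow path */
    void *p = c->slots[c->head & c->mask];
    c->head++;
    return p;
}

/* Localized variant: load the fields into locals once, write back once. */
static void *uc_pop_localized(uc_ring_t *c) {
    void   **slots = c->slots;
    unsigned head = c->head, tail = c->tail, mask = c->mask;
    if (head == tail) return NULL;
    void *p = slots[head & mask];
    c->head = head + 1;
    return p;
}

/* Runtime gate (single-threaded sketch): the selection below is an extra
   per-call branch on top of whichever body runs. */
static int g_uc_localize = -1;

void *uc_pop(uc_ring_t *c) {
    assert(c != NULL);
    if (g_uc_localize < 0) {
        const char *v = getenv("HAKMEM_TINY_UC_LOCALIZE");
        g_uc_localize = (v != NULL && v[0] == '1');
    }
    return g_uc_localize ? uc_pop_localized(c) : uc_pop_plain(c);
}
```

The `g_uc_localize ? … : …` selection is the kind of per-call branch that can add instructions and branches even when the localized body itself is cheaper.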

**Phase 74-2: LOCALIZE (compile-time gate)** ✅ **Done (NEUTRAL -0.87%)**
- Build flag: `HAKMEM_TINY_UC_LOCALIZE_COMPILED=0/1` (default 0)
- Removing the runtime branch **improves** instructions/branches (-0.6%/-2.3%) ✓
- However, **cache-misses rise +86%** (register pressure / spills) → throughput **-0.87%**
- Isolation succeeded: **LOCALIZE itself wins, but the cache-miss increase cancels the gain**
- Verdict: **NEUTRAL (-0.87%)** → **P1 (LOCALIZE) frozen**
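Moving the same choice to build time removes the per-call branch. This is a hypothetical sketch with an illustrative ring layout. Only the flag name `HAKMEM_TINY_UC_LOCALIZE_COMPILED` and its default of 0 come from this document.

```c
#include <assert.h>
#include <stddef.h>

/* Build flag named in this doc; defaults to 0 (OFF) unless passed with -D. */
#ifndef HAKMEM_TINY_UC_LOCALIZE_COMPILED
#define HAKMEM_TINY_UC_LOCALIZE_COMPILED 0
#endif

/* Hypothetical ring layout (illustrative, not hakmem's real struct). */
typedef struct {
    void   **slots;
    unsigned head, tail, mask;   /* free-running indices; mask = capacity - 1 */
} uc_ring_t;

static inline void *uc_pop(uc_ring_t *c) {
    assert(c != NULL);
#if HAKMEM_TINY_UC_LOCALIZE_COMPILED
    /* Localized body: chosen by the preprocessor, so no runtime gate branch. */
    void   **slots = c->slots;
    unsigned head = c->head, tail = c->tail, mask = c->mask;
    if (head == tail) return NULL;
    void *p = slots[head & mask];
    c->head = head + 1;
    return p;
#else
    /* Default body: fields are read through the pointer. */
    if (c->head == c->tail) return NULL;
    void *p = c->slots[c->head & c->mask];
    c->head++;
    return p;
#endif
}
```

Building with `-DHAKMEM_TINY_UC_LOCALIZE_COMPILED=1` selects the localized body. How the real build passes the flag is not shown here. Each setting produces its own binary, so under the operating rules above, layout differences can mix into an A/B of the two settings.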

**Conclusion**:
- P1 (LOCALIZE) is frozen at default OFF (the dependency-chain reduction has low ROI)
- Next: move on to **Phase 74-3 (P0: FASTAPI)**

**Phase 74-3: P0 (FASTAPI)** ✅ **Done (NEUTRAL +0.32%)**

**Goal**: Move the `unified_cache_enabled()` / `lazy-init` / `stats` checks **out of the hot loop**

**Approach**:
- Add `unified_cache_push_fast()` / `unified_cache_pop_fast()` APIs
- Precondition: the caller guarantees "valid/enabled/no-stats"
- Fail-fast: on any unexpected state, fall back to the slow path (single boundary)
- ENV gate: `HAKMEM_TINY_UC_FASTAPI=0/1` (default 0, research box)
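Here is a minimal sketch of the FASTAPI idea, loosely modeled on `unified_cache_pop_fast()`. The struct and the functions `uc_pop_fast`, `uc_pop_checked` and `uc_drain` are hypothetical and do not reflect the real signatures. The enabled/lazy-init/stats conditions are tested once, before the loop. The fast pop then relies on the caller's guarantee, which only a debug `assert` checks. Running out of objects exits at one boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache state; not hakmem's real layout. */
typedef struct {
    int      initialized;   /* lazy-init done? */
    int      enabled;       /* cache enabled? */
    int      stats_on;      /* stats collection requested? */
    void   **slots;         /* LIFO stack, for simplicity */
    unsigned count;
    unsigned cap;
} uc_t;

/* ENV gate, read once (single-threaded sketch). */
static int uc_fastapi_enabled(void) {
    static int gate = -1;
    if (gate < 0) {
        const char *v = getenv("HAKMEM_TINY_UC_FASTAPI");
        gate = (v != NULL && v[0] == '1');
    }
    return gate;
}

/* Checked path: re-validates state on every call (stats accounting would also run here). */
static void *uc_pop_checked(uc_t *c) {
    if (!c->initialized || !c->enabled || c->count == 0) return NULL;
    return c->slots[--c->count];
}

/* Fast path: caller guarantees initialized && enabled && !stats_on. */
static inline void *uc_pop_fast(uc_t *c) {
    assert(c->initialized && c->enabled && !c->stats_on);   /* debug-only check */
    return c->count ? c->slots[--c->count] : NULL;
}

/* Pops up to n objects; the precondition checks are hoisted out of the loop. */
size_t uc_drain(uc_t *c, void **out, size_t n) {
    size_t got = 0;
    void *p;
    assert(c->count <= c->cap);
    if (uc_fastapi_enabled() && c->initialized && c->enabled && !c->stats_on) {
        while (got < n && (p = uc_pop_fast(c)) != NULL) out[got++] = p;
    } else {
        while (got < n && (p = uc_pop_checked(c)) != NULL) out[got++] = p;
    }
    return got;   /* got < n: the caller crosses the single boundary into the slow path */
}
```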

**Results** (10-run Mixed SSOT, WS=400):
- Throughput: **+0.32%** (NEUTRAL, below the +1.0% GO threshold)
- cache-misses: **-16.31%** (a positive signal, but the throughput gain is insufficient)

**Verdict**: **NEUTRAL (+0.32%)** → **P0 (FASTAPI) frozen**
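A small helper can reproduce the GO/NEUTRAL calls in this section. This sketch assumes the delta is taken between the means of the A (ENV=0) and B (ENV=1) runs on the same binary. Only the +1.0% GO threshold comes from this document. Using the mean rather than the median is an assumption, and so is the -1.0% NO-GO line. The "strong GO" tier is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { VERDICT_NO_GO = -1, VERDICT_NEUTRAL = 0, VERDICT_GO = 1 } verdict_t;

static double mean(const double *xs, size_t n) {
    assert(n > 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += xs[i];
    return s / (double)n;
}

/* Percent change of B (ENV=1) over A (ENV=0), both measured on the same binary. */
double ab_delta_pct(const double *a, const double *b, size_t n) {
    double ma = mean(a, n), mb = mean(b, n);
    assert(ma > 0.0);
    return (mb / ma - 1.0) * 100.0;
}

/* GO at >= +1.0% (from this doc); the -1.0% NO-GO line is an assumption. */
verdict_t classify(double delta_pct) {
    if (delta_pct >= 1.0)  return VERDICT_GO;
    if (delta_pct <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

Under this rule, +0.50%, -0.87% and +0.32% all classify as NEUTRAL, and Phase 69's +3.26% classifies as GO. This matches the verdicts recorded here.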

**References**:
- Design: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_0_DESIGN.md`
- Instructions: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_1_NEXT_INSTRUCTIONS.md`
- Results (P1/P0): `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_2_RESULTS.md`

---

### Phase 75 (structural): Hot-class Inline Slots (P2) 🟡 **In preparation**

**Goal**: Analyze per-class statistics for C4-C7 → decide on a targeted optimization strategy

**Premises** (Phase 74 learnings):
- UnifiedCache hit-path optimization has low ROI because of register pressure / cache-miss effects
- Next axis: **exploit per-class characteristics** → eliminate branches with TLS-direct inline slots

**Phase 75-0: Per-Class Analysis** ✅ **Done**

Per-class Unified-STATS (Mixed SSOT, WS=400, `HAKMEM_MEASURE_UNIFIED_CACHE=1`):

| Class | Capacity | Occupied | Hits | Pushes | Total Ops | Hit % | % of C4-C7 |
|-------|----------|----------|------|--------|-----------|-------|------------|
| C6 | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100% | **57.2%** |
| C5 | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100% | **28.5%** |
| C4 | 64 | 63 | 687,563 | 687,564 | **1,375,127** | 100% | **14.3%** |
| C7 | ? | ? | ? | ? | **?** | ? | **?** |

C7 was not captured in this run. The shares in the last column are therefore computed over C4-C6 only, and they sum to 100% without C7.

**Key findings**:
1. C6 dominates: 57.2% of operations (2.75M hits)
2. All measured classes (C4-C6) hit 100% (refill is inactive in the SSOT)
3. Cache occupancy is near capacity (98-99%)
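The share column can be recomputed from the Hits and Pushes columns of the table above. This small sketch uses illustrative names, and it makes explicit that the shares are taken over the three measured classes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char   *cls;
    unsigned long hits;
    unsigned long pushes;
} class_ops_t;

/* Total ops per class = hits + pushes, as in the Total Ops column. */
static unsigned long total_ops(const class_ops_t *c) { return c->hits + c->pushes; }

/* Share (%) of class i among the n measured classes. */
double share_pct(const class_ops_t *cs, size_t n, size_t i) {
    unsigned long sum = 0;
    for (size_t k = 0; k < n; k++) sum += total_ops(&cs[k]);
    assert(sum > 0 && i < n);
    return 100.0 * (double)total_ops(&cs[i]) / (double)sum;
}

/* Phase 75-0 counts for C6, C5, C4 (C7 not captured). */
const class_ops_t phase75_classes[] = {
    { "C6", 2750854UL, 2750855UL },
    { "C5", 1373604UL, 1373605UL },
    { "C4",  687563UL,  687564UL },
};
```

C6 and C5 together come to about 85.7%, which is the figure quoted for the C6+C5 alternative.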

**Phase 75-1: Targeting Strategy** 🟡 **User decision required**

**Recommendation**: Start with **C6 only** (lowest risk)
- Highest ROI (57.2% of C4-C7 ops)
- Smallest TLS footprint (~1KB per thread)
- Consistent with the Phase 74 learnings (register pressure matters)
- Fail-fast: expand to C5 only if C6 is positive

**Alternative**: C6+C5 combined (85.7% of ops, single A/B cycle)
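This is a hypothetical sketch of a C6-only TLS-direct inline slot array. It assumes 8-byte pointers and a capacity of 128 to mirror C6's UnifiedCache capacity, and the names are illustrative. Under those assumptions the array takes 128 × 8 B = 1 KiB per thread, consistent with the ~1KB estimate above. Push and pop each fall back to UnifiedCache at a single check when the slots are full or empty.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 64-bit pointers; 128 slots mirrors C6's UnifiedCache capacity. */
#define C6_INLINE_CAP 128

typedef struct {
    void    *slots[C6_INLINE_CAP];   /* 128 * 8 B = 1 KiB on LP64 */
    unsigned count;                  /* LIFO depth */
} c6_inline_slots_t;

/* One instance per thread, reached directly (no per-class table lookup). */
static _Thread_local c6_inline_slots_t t_c6;

/* Returns 1 if stored, 0 if full (caller falls back to UnifiedCache). */
static inline int c6_inline_push(void *p) {
    assert(t_c6.count <= C6_INLINE_CAP);
    if (t_c6.count == C6_INLINE_CAP) return 0;
    t_c6.slots[t_c6.count++] = p;
    return 1;
}

/* Returns NULL if empty (caller falls back to UnifiedCache). */
static inline void *c6_inline_pop(void) {
    return t_c6.count ? t_c6.slots[--t_c6.count] : NULL;
}
```

Under the same assumptions, expanding to C5 would add a second 128-slot array, which is roughly another 1 KiB per thread.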

**References**:
- Analysis: `docs/analysis/PHASE75_PERCLASS_ANALYSIS_0_SSOT.md`

## 5) Archive

- Detailed log: `CURRENT_TASK_ARCHIVE_20251210.md`
- Pre-cleanup snapshot: `docs/analysis/CURRENT_TASK_ARCHIVE.md`