Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)
Modular implementation of the hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF): 44.24 M ops/s
- Treatment (C6 inline ON): 45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: ✅ GO - Strong improvement via C6 ring buffer fast-path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
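The mechanism named in the commit message (an ENV gate read once at TLS init, plus a per-thread ring buffer for C6) can be sketched roughly as follows. This is a minimal illustration in C11, not the code from this commit: apart from the `HAKMEM_TINY_C6_INLINE_SLOTS` variable, every identifier (`c6_inline_tls_t`, `c6_inline_push`, `c6_inline_pop`, and so on) and the 128-slot capacity are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed capacity; must be a power of two so the index mask works. */
#define C6_INLINE_CAP 128u

/* Hypothetical per-thread state: a FIFO ring of free C6 blocks. */
typedef struct {
    void    *slots[C6_INLINE_CAP];
    unsigned head;    /* free-running pop counter  */
    unsigned tail;    /* free-running push counter */
    bool     enabled; /* decided once, at TLS init */
    bool     inited;
} c6_inline_tls_t;

static _Thread_local c6_inline_tls_t g_c6_tls;

/* Decide the gate from an ENV-style string ("1" enables, anything else disables). */
static inline void c6_inline_tls_init_from(const char *env_value) {
    g_c6_tls.enabled = (env_value != NULL && env_value[0] == '1');
    g_c6_tls.head = 0;
    g_c6_tls.tail = 0;
    g_c6_tls.inited = true;
}

/* Single decision point: read the ENV gate once per thread. */
static inline void c6_inline_tls_init(void) {
    c6_inline_tls_init_from(getenv("HAKMEM_TINY_C6_INLINE_SLOTS"));
}

/* Free path: returns false when disabled or full, so the caller falls back. */
static inline bool c6_inline_push(void *p) {
    c6_inline_tls_t *t = &g_c6_tls;
    assert(t->inited);
    if (!t->enabled || t->tail - t->head == C6_INLINE_CAP) return false;
    t->slots[t->tail++ & (C6_INLINE_CAP - 1u)] = p;
    return true;
}

/* Alloc path: returns NULL when disabled or empty, so the caller falls back. */
static inline void *c6_inline_pop(void) {
    c6_inline_tls_t *t = &g_c6_tls;
    assert(t->inited);
    if (!t->enabled || t->head == t->tail) return NULL;
    return t->slots[t->head++ & (C6_INLINE_CAP - 1u)];
}
```

In this sketch a C6 allocation would try `c6_inline_pop()` first and take the existing `unified_cache_pop` path only when it returns `NULL`; the free path would do the same with `c6_inline_push()` and `unified_cache_push`.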
@@ -61,7 +61,7 @@
- P1 (LOCALIZE) is frozen at default OFF (low ROI from dependency-chain reduction)
- Next: proceed to **Phase 74-3 (P0: FASTAPI)**

**Phase 74-3: P0 (FASTAPI)** 🟡 **Next instructions**
**Phase 74-3: P0 (FASTAPI)** ✅ **Done (NEUTRAL +0.32%)**

**Goal**: Move the `unified_cache_enabled()` / `lazy-init` / `stats` checks **out of the hot loop**
@@ -71,17 +71,55 @@
- Fail-fast: on any unexpected state, fall back to the slow path (single boundary point)
- ENV gate: `HAKMEM_TINY_UC_FASTAPI=0/1` (default 0, research box)

**Expected**: +1-2% via branch reduction (a different axis from P1)
**Results** (10-run Mixed SSOT, WS=400):
- Throughput: **+0.32%** (NEUTRAL, below +1.0% GO threshold)
- cache-misses: **-16.31%** (positive signal, insufficient throughput gain)

**Verdict**:
- **GO**: +1.0% or more
- **NEUTRAL**: within ±1.0% (freeze, move on)
- **NO-GO**: -1.0% or below (revert immediately)
**Verdict**: **NEUTRAL (+0.32%)** → **P0 (FASTAPI) frozen**
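
The GO / NEUTRAL / NO-GO rule above is simple threshold arithmetic on the throughput delta. A minimal sketch in C, assuming the ±1.0% thresholds listed above; the function and enum names are hypothetical and not taken from the repository:

```c
#include <assert.h>

typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } verdict_t;

/* Relative throughput change of treatment vs. baseline, in percent. */
static inline double delta_percent(double baseline, double treatment) {
    assert(baseline > 0.0);
    return (treatment - baseline) / baseline * 100.0;
}

/* Apply the thresholds: >= +1.0% is GO, <= -1.0% is NO-GO, otherwise NEUTRAL. */
static inline verdict_t classify(double delta_pct) {
    if (delta_pct >= 1.0)  return VERDICT_GO;
    if (delta_pct <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

With these definitions, `classify(0.32)` yields `VERDICT_NEUTRAL`, matching the P0 (FASTAPI) verdict recorded here.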

**References**:
- Design: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_0_DESIGN.md`
- Instructions: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_1_NEXT_INSTRUCTIONS.md`
- Results (P1): `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_2_RESULTS.md`
- Results (P1/P0): `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_2_RESULTS.md`

---
## Phase 75 (Structural): Hot-class Inline Slots (P2) 🟡 **In preparation**

**Goal**: Statistical analysis of C4-C7 → decide on a targeted optimization strategy

**Premises** (Phase 74 learnings):
- UnifiedCache hit-path optimization has low ROI ← register pressure / cache-miss effects
- Next axis: **exploit per-class characteristics** → branch elimination via TLS-direct inline slots

**Phase 75-0: Per-Class Analysis** ✅ **Done**

Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):

| Class | Capacity | Occupied | Hits | Pushes | Total Ops | Hit % | % of C4-C7 |
|-------|----------|----------|------|--------|-----------|-------|-----------|
| C6 | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100% | **57.2%** |
| C5 | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100% | **28.5%** |
| C4 | 64 | 63 | 687,563 | 687,564 | **1,375,127** | 100% | **14.3%** |
| C7 | ? | ? | ? | ? | **?** | ? | **?** |

**Key findings**:
1. C6 dominates overwhelmingly: 57.2% of operations (2.75M hits)
2. All classes: 100% hit rate (refill inactive in SSOT)
3. Cache occupancy near-capacity (98-99%)
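
The "% of C4-C7" column can be reproduced from the Total Ops column. Because the C7 row is unmeasured, the reported percentages are each class's share of the C4 + C5 + C6 total (they sum to 100%). A small sketch in C; the helper name `share_percent` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Share of entry i in the sum of all entries, in percent. */
static inline double share_percent(const uint64_t *ops, size_t n, size_t i) {
    assert(i < n);
    uint64_t total = 0;
    for (size_t k = 0; k < n; k++) total += ops[k];
    return total ? 100.0 * (double)ops[i] / (double)total : 0.0;
}
```

Feeding it the three Total Ops values (5,501,709 / 2,747,209 / 1,375,127) gives shares that round to 57.2%, 28.5%, and 14.3%.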

**Phase 75-1: Targeting Strategy** 🟡 **User decision required**

**Recommendation**: Start with **C6-only** (lowest risk)
- Highest ROI (57.2% of C4-C7 ops)
- Lowest TLS bloat (~1KB per thread)
- Aligns with Phase 74 learnings (register pressure matters)
- Fail-fast: if C6 positive, expand to C5

**Alternative**: C6+C5 combined (85.7% ops, single A/B cycle)

**References**:
- Analysis: `docs/analysis/PHASE75_PERCLASS_ANALYSIS_0_SSOT.md`

## 5) Archive