# Phase v11a Design Spec: MID v3.5 (257-1KiB Unified Box)

## 1. Positioning

**Transition from v10 to v11a**:
- v10: v7 is frozen as a C5/C6-only research box; Learner is ON by default.
- v11a: **MID v3.5 is extended into the unified main implementation for 257-1KiB.**

**Architecture roles**:
```
L0: ULTRA (C4-C7)           → FROZEN (unchanged)
L1: MID v3.5 (C5-C7)        → main line ★NEW
    ├─ C5: TLS cache + 2MiB segment (multi-page)
    ├─ C6: TLS cache + 2MiB segment (multi-page)
    └─ C7: TLS cache + 2MiB segment (multi-page)
L1-research: v7 (C5/C6)     → research box (frozen)
L2: Segment/ColdIface/RegionId
L3: Policy Box + Learner v2 (expanded stats)
```

## 2. Changes from MID v3 to MID v3.5

### 2-1. Current MID v3 configuration

**Implemented**:
- C5/C6 multi-class TLS heap
- 2MiB segments
- RefillPolicy (TLS segment hint, pool fallback)
- Policy routing (via Policy Box v7-4)
- Legacy Stats (basic data recorded at page retire)

**Limitations**:
- C7 not supported (handled exclusively by ULTRA)
- No Learner stats (v7 only)
- Assumes single-class segments

### 2-2. Features added in v11a

#### Feature 1: Full C7 support

**Goal**: MID v3.5 covers all of C5-C7.

**Implementation**:
```c
// mid_v3.5.h - new extension
#include <stdint.h>

// TLS context for C7
typedef struct {
    SmallHeapCtx ctx;      // Reuse existing context
    void *tls_page;        // Current page pointer (C7)
    uint32_t tls_offset;   // Allocation offset in page
} SmallHeapCtx_C7_MID;

// Allocation fast path
// C7: size > 512B → check MID_ROUTE_C7
// If enabled: try TLS fast alloc → refill on demand
```

**Policy routing**:
```c
// mid_policy.h
route_kind[7] = SMALL_ROUTE_MID_V3;  // When C7 is enabled for MID v3.5; otherwise C7 stays on ULTRA
```

**Stats tracking**:
```c
// SmallPageStatsMID_v3: record class_idx for all retires
typedef struct {
    uint32_t class_idx;
    uint64_t total_allocations;
    uint64_t total_frees;
    uint32_t page_alloc_count;     // ← new in v11a
    uint32_t free_hit_ratio_bps;   // ← new in v11a (basis points)
} SmallPageStatsMID_v3;
```

#### Feature 2: Multi-class segment design decision

**Two options considered**:

**Design A: Separate segments**
```
MID_v3_segment[3] = {
  [0] → segment_C5,
  [1] → segment_C6,
  [2] → segment_C7
}
```
Pros: simple, clean class separation.
Cons: 3x segment overhead, more complex TLS lookup.

**Design B: Shared segment + per-class pages**
```
SmallSegment_MID_v3 {
  free_pages[8];    // per-class free stack
  class_pages[8];   // current page per class
  page_alloc[8];    // allocation count per class
}
```
Pros: a single segment suffices; no RegionIdBox changes needed.
Cons: more complex logic.

**v11a decision**: **Design B (shared segment)**

Rationale:
- RegionIdBox stays as-is (minimal changes)
- Unified segment geometry (2MiB/64KiB, same as v7)
- Multi-class TLS hints are supported

#### Feature 3: Learner v2 (expanded stats)

**Limitation of the v7-7 Learner**:
```c
// Current: only the C5 ratio is monitored
c5_ratio_pct = (stats->per_class[5].v7_allocs * 100) / total_allocs;
if (c5_ratio_pct >= THRESHOLD) route[5] = V7;
```

**Learner v2 extensions in v11a**:
```c
typedef struct {
    uint64_t allocs[8];               // per-class allocation count
    uint32_t retire_ratio_pct[8];     // per-class retire efficiency
    uint64_t avg_page_utilization;    // global metric
    uint32_t free_hit_ratio_bps;      // global free hit ratio (basis points)
    uint64_t eval_count;
} SmallLearnerStatsV2;

// Route decisions based on multiple metrics (can be extended later)
// Example (Phase v11b):
//   - C5_ratio < 30% AND retire_ratio < 50%      → MID_v3
//   - C5_ratio >= 30% AND free_hit > 8000bps     → V7
```

**Implementation flow**:
```
MID_v3 page retire
  ↓ record stats
SmallPageStatsMID_v3 {class_idx, allocs, free_hit_ratio}
  ↓ periodic publish (every LEARNER_EVAL_INTERVAL)
SmallLearnerStatsV2 aggregate
  ↓
small_learner_v2_evaluate()
  ↓
small_policy_v3_update_from_learner()  ← NEW (Policy v2)
  ↓
TLS policy cache invalidation
```

### 2-3. Inherited existing components

**Unchanged**:
- RegionIdBox: segment ptr → region lookup (existing behavior)
- Policy Box: route_kind[8] array (existing API)
- ColdIface: refill/retire interface (existing)
- TLS cache: per-class fast path (existing pattern)

**Requires changes**:
- Policy initialization: add C7 routing
- Learner stats recording: record class_idx
- Stats aggregation: multi-class support
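Design B (shared segment + per-class pages) can be illustrated with a minimal, self-contained sketch: one 2 MiB segment split into 64 KiB pages (2 MiB / 64 KiB = 32 pages), where each class keeps its own free stack, current page, and page count. The names (`SegmentSketch`, `seg_acquire_page`, `seg_retire_current`) and the policy of reusing a class's own retired pages before taking never-used ones are assumptions for illustration, not the MID v3.5 implementation; pages are tracked as indices rather than pointers to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Geometry from the spec: 2 MiB segment, 64 KiB pages -> 32 pages. */
#define MID_SEG_SIZE   (2u * 1024u * 1024u)
#define MID_PAGE_SIZE  (64u * 1024u)
#define SEG_PAGES      (MID_SEG_SIZE / MID_PAGE_SIZE)
#define NUM_CLASSES    8
#define NO_PAGE        UINT32_MAX

/* Hypothetical shared segment: all classes carve pages from one segment. */
typedef struct {
    uint32_t free_stack[NUM_CLASSES][SEG_PAGES]; /* retired pages, per class */
    uint32_t free_top[NUM_CLASSES];              /* stack depth, per class */
    uint32_t class_page[NUM_CLASSES];            /* current page per class */
    uint32_t page_alloc[NUM_CLASSES];            /* pages handed out per class */
    uint32_t next_unused;                        /* bump cursor over never-used pages */
} SegmentSketch;

static void seg_init(SegmentSketch *s) {
    memset(s, 0, sizeof *s);
    for (int c = 0; c < NUM_CLASSES; c++) s->class_page[c] = NO_PAGE;
}

/* Give class c a page: reuse one of its own retired pages first, otherwise
 * take a never-used page. Callers retire the current page before acquiring
 * a new one; otherwise the old page is no longer tracked. */
static uint32_t seg_acquire_page(SegmentSketch *s, uint32_t c) {
    assert(c < NUM_CLASSES);
    uint32_t page;
    if (s->free_top[c] > 0) {
        page = s->free_stack[c][--s->free_top[c]];
    } else if (s->next_unused < SEG_PAGES) {
        page = s->next_unused++;
    } else {
        return NO_PAGE; /* segment exhausted */
    }
    s->class_page[c] = page;
    s->page_alloc[c]++;
    return page;
}

/* Retire class c's current page onto class c's own free stack. */
static void seg_retire_current(SegmentSketch *s, uint32_t c) {
    assert(c < NUM_CLASSES);
    uint32_t page = s->class_page[c];
    if (page == NO_PAGE) return;
    s->free_stack[c][s->free_top[c]++] = page;
    s->class_page[c] = NO_PAGE;
}
```

Because every page lives on at most one class's free stack at a time, each per-class stack only needs `SEG_PAGES` slots, and one segment serves C5, C6, and C7 without any change to region lookup.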
## 3. Implementation Schedule

### Phase v11a-1: Design & Infrastructure (Week 1-2)
- [ ] Finalize the SmallSegment_MID_v3 multi-class layout
- [ ] SmallPageStatsMID_v3 type definition + publish API
- [ ] SmallLearnerStatsV2 type definition
- [ ] Sketch the Policy v2 update function
- [ ] Extend the bench suite: C5/C6/C7 individual tests

### Phase v11a-2: Core Implementation (Week 3-4)
- [ ] SmallHeap_MID_v3_C7 alloc/free path
- [ ] Multi-class refill logic
- [ ] Stats recording (per-page class_idx)
- [ ] Learner stats aggregation
- [ ] Policy update_from_learner v2

### Phase v11a-3: Integration & Testing (Week 5)
- [ ] Learner default ON for MID_v3
- [ ] Perf benchmarks: C5/C6/C7 mixed
- [ ] Learner route switch verification
- [ ] Regression: v7 research preset still works

### Phase v11b: Multi-segment Expansion (TBD)
- [ ] Evaluate separate segment approach
- [ ] TLS multi-segment hint optimization
- [ ] C4 support decision (ULTRA vs MID_v3)

## 4. Minimizing API Changes

### Policy Box API (minimal changes)
```c
// Existing: function signatures unchanged
const SmallPolicyV7* small_policy_v7_snapshot(void);
void small_policy_v7_init_from_env(SmallPolicyV7* policy);
void small_policy_v7_update_from_learner(
    const SmallLearnerStatsV7* stats,
    SmallPolicyV7* policy_out
);

// v11a: only the type name is extended
// typedef SmallLearnerStatsV7 → SmallLearnerStatsV2 (backward compat)
// → internally, the new v2 fields are optional
```

### Learner Box API (new)
```c
// smallobject_learner_v2_box.h
typedef struct { /* SmallLearnerStatsV2 fields (see Feature 3) */ } SmallLearnerStatsV2;

void small_learner_v2_record_retire(uint32_t class_idx, uint32_t free_hit_ratio_bps);
void small_learner_v2_evaluate(void);
const SmallLearnerStatsV2* small_learner_v2_stats_snapshot(void);
```

### ColdIface API (unchanged)
```c
// Existing refill/retire interface
typedef void (*cold_refill_page_fn)(uint32_t class_idx, ...);
typedef void (*cold_retire_page_fn)(uint32_t class_idx, ...);
```
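The multi-metric routing that Learner v2 is meant to enable (the Phase v11b example rules under Feature 3) can be made concrete with a small evaluation helper over `SmallLearnerStatsV2`. This is a sketch under stated assumptions: the route labels and thresholds macros are hypothetical, "retire_ratio" is assumed to mean C5's `retire_ratio_pct`, and when neither rule matches (or there is no data yet) the current route is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as SmallLearnerStatsV2 under Feature 3. */
typedef struct {
    uint64_t allocs[8];
    uint32_t retire_ratio_pct[8];
    uint64_t avg_page_utilization;
    uint32_t free_hit_ratio_bps;
    uint64_t eval_count;
} SmallLearnerStatsV2;

/* Hypothetical route labels, for this sketch only. */
typedef enum { ROUTE_MID_V3, ROUTE_V7 } RouteSketch;

#define CLASS_C5                   5
#define C5_RATIO_THRESHOLD_PCT     30u
#define RETIRE_RATIO_THRESHOLD_PCT 50u
#define FREE_HIT_THRESHOLD_BPS     8000u

/* Apply the two example rules to pick a route for C5. */
static RouteSketch learner_v2_route_c5(const SmallLearnerStatsV2 *s,
                                       RouteSketch current) {
    uint64_t total = 0;
    for (int c = 0; c < 8; c++) total += s->allocs[c];
    if (total == 0) return current;  /* no data yet: keep current route */

    uint64_t c5_pct = (s->allocs[CLASS_C5] * 100u) / total;

    /* Rule 1: C5 is a minor share and its pages retire inefficiently. */
    if (c5_pct < C5_RATIO_THRESHOLD_PCT &&
        s->retire_ratio_pct[CLASS_C5] < RETIRE_RATIO_THRESHOLD_PCT)
        return ROUTE_MID_V3;

    /* Rule 2: C5 is a large share and frees mostly hit the fast path. */
    if (c5_pct >= C5_RATIO_THRESHOLD_PCT &&
        s->free_hit_ratio_bps > FREE_HIT_THRESHOLD_BPS)
        return ROUTE_V7;

    return current;  /* neither rule fired */
}
```

Keeping the current route when no rule fires acts as simple hysteresis, so a workload sitting between the two conditions does not flip routes on every evaluation.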
## 5. Performance Predictions

### Current MID v3 (C5/C6)
```
C5/C6 mixed (200-500B, 300K iter):  38.7M ops/s
C6 heavy (400-510B, 500K iter):     56.3M ops/s
Mixed 16-1024B (v7 OFF):            21.5M ops/s
```

### Expected MID v3.5 (after implementation)
```
C5/C6/C7 mixed (200-1000B):     +3-5% (more pages, better locality)
C7 heavy (800-1000B):           +2-3% (vs ULTRA fallback)
Mixed 16-1024B (with Learner):  +1-2% (dynamic routing)
```

### Actual MID v3.5 Results (Phase v11a-4)

**C6-heavy (257-512B)**:
```
v3.5 OFF: 34.0M ops/s
v3.5 ON:  35.8M ops/s (+5.1%)
```

**Mixed 16-1024B (ws=400, 10M iters, avg of 3 runs)**:
```
v3.5 OFF: 38.6M ops/s
v3.5 ON:  40.3M ops/s (+4.4%)
```

**Observations**: C6-heavy improved by about +5%, in line with the prediction, and Mixed also improved by about +4%, so overall the results are better than predicted. Routing C6 to MID v3.5 on the Mixed main line is a viable candidate for adoption.

**Metrics**:
- Throughput: +4-5% (exceeds the predicted +1-3%)
- Overhead: not measured (sidestepped by calling mmap directly)
- Learner accuracy: observation mode only (route switching is left to a future phase)

## 6. Finalized Design Decisions

### Segment Geometry (v11a)
```
SmallSegment_MID_v3:
  - Total size: 2 MiB (same as v7)
  - Page size: 64 KiB (same as v7)
  - Pages per segment: 32 (2 MiB / 64 KiB)
  - Free stack: per-class (C5/C6/C7 each)
  - Class pages: current[8], partial[8]
  - RegionId: single segment per TLS thread
```

### TLS Caching Pattern
```c
#include <stdint.h>

// TLS MID context, one instance per thread.
// __thread is the GCC/Clang spelling; C11 spells it _Thread_local.
__thread struct {
    SmallSegment_MID_v3 *seg;
    void *page[8];          // Current page per class
    uint32_t offset[8];     // Allocation offset within the current page
    uint32_t cache_hits;
    uint32_t cache_misses;
} tls_mid_v3_ctx;
```

### Stats Recording
```c
// On page retire ("..." stands for the existing ColdIface retire parameters):
void small_cold_mid_v3_retire_page(..., uint32_t class_idx) {
    SmallPageMeta* meta = page->meta;
    meta->class_idx = class_idx;  // ← record class

    // Calculate metrics
    uint32_t free_hit = calc_free_hit_ratio(page);

    // Publish stats
    SmallPageStatsMID_v3 stat = {
        .class_idx = class_idx,
        .total_allocations = page->alloc_count,
        .total_frees = page->free_count,
        .page_alloc_count = capacity,
        .free_hit_ratio_bps = free_hit
    };

    // Feed to Learner
    small_learner_v2_ingest_stats(&stat);
}
```
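The TLS caching pattern above (a current `page[8]` and `offset[8]` per class, plus hit/miss counters) implies a bump-pointer fast path with a refill slow path. The sketch below shows that shape under explicit assumptions: `TlsMidSketch`, `tls_alloc`, and `refill_page` are hypothetical names, per-class block sizes are set by the caller, `malloc` stands in for carving a 64 KiB page out of the shared segment, and retiring the exhausted page through ColdIface is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_CLASSES   8
#define MID_PAGE_SIZE (64u * 1024u)   /* 64 KiB pages, per section 6 */

/* Hypothetical per-thread context mirroring page[8] / offset[8]. */
typedef struct {
    void    *page[NUM_CLASSES];        /* current page per class (NULL = none) */
    uint32_t offset[NUM_CLASSES];      /* next free byte in the current page */
    uint32_t block_size[NUM_CLASSES];  /* block size per class (caller-set) */
    uint32_t cache_hits;
    uint32_t cache_misses;
} TlsMidSketch;

/* One zero-initialized context per thread (C11 thread-local storage). */
static _Thread_local TlsMidSketch tls_mid_sketch;

/* Stand-in for the ColdIface refill: hand out a fresh 64 KiB page.
 * A real refill would take a page from the shared segment. */
static void *refill_page(uint32_t class_idx) {
    (void)class_idx;
    return malloc(MID_PAGE_SIZE);
}

static void *tls_alloc(TlsMidSketch *t, uint32_t class_idx) {
    assert(class_idx < NUM_CLASSES && t->block_size[class_idx] > 0);
    uint32_t bs = t->block_size[class_idx];

    /* Fast path: bump the offset inside the current page. */
    if (t->page[class_idx] != NULL &&
        t->offset[class_idx] + bs <= MID_PAGE_SIZE) {
        void *p = (char *)t->page[class_idx] + t->offset[class_idx];
        t->offset[class_idx] += bs;
        t->cache_hits++;
        return p;
    }

    /* Slow path: get a new page for this class, then bump from offset 0.
     * The exhausted page is not retired in this sketch. */
    t->cache_misses++;
    void *page = refill_page(class_idx);
    if (page == NULL) return NULL;
    t->page[class_idx] = page;
    t->offset[class_idx] = bs;
    return page;
}
```

With 1024-byte C7 blocks, one 64 KiB page serves 64 allocations: one miss (the refill) followed by 63 fast-path hits before the next refill.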
## 7. Next Decision Points

### Criteria for moving to v11b
```
Go to v11b (multi-segment) if:
  ✓ C7 performance matches ULTRA (±2%)
  ✓ Learner accuracy > 90% on class patterns
  ✓ RegionId lookup latency acceptable (<2% overhead)
```

### Stay in v11a (iterate) if:
```
  ✗ C7 performance < 90% vs ULTRA
  ✗ Learner detection < 80% accuracy
  ✗ Stats aggregation cost > 5% CPU
```

## 8. Pruning Targets (Later)

### Branch Cutting (Phase v12+)
- Fine-grained optimization of the v3 backend
- Verify v6 headerless gains
- Verify v7 multi-class
- Multi-dimensional Learner optimization (free_pressure, fragmentation)

### Not in v11a
- Complex Policy v2 routing (multi-dimensional conditions)
- Simultaneous optimization of v6/v7/MID
- Large-scale ColdIface refactoring

---

**Document Date**: 2025-12-12
**Decision**: Option A (MID v3.5 consolidation)
**Target Completion**: Phase v11a end (2025-12-31)
**Next Review**: After Phase v11a-2 implementation