# Phase v11a Design Specification: MID v3.5 (257-1KiB Unified Box)

## 1. Positioning

**Migration from v10 to v11a**:
- v10: v7 is frozen as a C5/C6-only research box, with the Learner ON by default
- v11a: **MID v3.5 is unified and extended as the main implementation for 257-1KiB**

**Architecture roles**:
```
L0: ULTRA (C4-C7)              → FROZEN (unchanged)
L1: MID v3.5 (C5-C7)           → mainline ★NEW
  ├─ C5: TLS cache + 2MiB segment (multi-page)
  ├─ C6: TLS cache + 2MiB segment (multi-page)
  └─ C7: TLS cache + 2MiB segment (multi-page)
L1-research: v7 (C5/C6)        → research box (frozen)
L2: Segment/ColdIface/RegionId
L3: Policy Box + Learner v2 (expanded stats)
```

## 2. Changes from MID v3 to MID v3.5

### 2-1. Current MID v3 configuration

**Implemented**:
- C5/C6 multi-class TLS heap
- 2MiB segments
- RefillPolicy (TLS segment hint, pool fallback)
- Policy routing (via Policy Box v7-4)
- Legacy Stats (basic data recorded at page retire)

**Limitations**:
- No C7 support (C7 is handled only by ULTRA)
- No Learner statistics (v7 only)
- Assumes single-class segments

### 2-2. Features added in v11a

#### Feature 1: Full C7 support

**Goal**: MID v3.5 covers all of C5-C7

**Implementation**:
```c
// mid_v3.5.h - new extension

// TLS context for C7
typedef struct {
    SmallHeapCtx ctx;        // Reuse existing context
    void *tls_page;          // Current page pointer (C7)
    uint32_t tls_offset;     // Allocation offset in page
} SmallHeapCtx_C7_MID;

// Allocation fast path
// C7: size > 512B → check MID_ROUTE_C7
// If enabled: try TLS fast alloc → refill on demand
```
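
The fast path described in the comments above can be sketched as a bump-pointer allocator. This is a minimal illustration, assuming 64 KiB pages and a fixed 1 KiB block size for C7; `C7BumpCtx`, `c7_tls_alloc`, `c7_tls_refill`, and both constants are hypothetical names, not part of the existing code. The `MID_ROUTE_C7` check and the free path are omitted. Returning NULL instead of refilling inline keeps the hot path free of cold-path calls, so the caller decides when to fetch a new page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define MID_PAGE_SIZE (64u * 1024u)  /* 64 KiB page */
#define MID_C7_BLOCK  1024u          /* assumed C7 block size */

/* Hypothetical stand-in for the C7 fields of SmallHeapCtx_C7_MID. */
typedef struct {
    void    *tls_page;    /* current page, NULL until the first refill */
    uint32_t tls_offset;  /* next free byte within tls_page */
} C7BumpCtx;

/* Fast path: returns a block, or NULL when the caller must refill. */
static void *c7_tls_alloc(C7BumpCtx *c) {
    assert(c != NULL);
    if (c->tls_page == NULL || c->tls_offset + MID_C7_BLOCK > MID_PAGE_SIZE)
        return NULL;
    void *p = (char *)c->tls_page + c->tls_offset;
    c->tls_offset += MID_C7_BLOCK;
    return p;
}

/* Slow path hook: install a fresh page (obtained elsewhere) and reset the offset. */
static void c7_tls_refill(C7BumpCtx *c, void *fresh_page) {
    assert(c != NULL && fresh_page != NULL);
    c->tls_page = fresh_page;
    c->tls_offset = 0;
}
```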

**Policy routing**:
```c
// mid_policy.h
route_kind[7] = SMALL_ROUTE_MID_V3;  // If C7 enabled, else ULTRA
```

**Stats tracking**:
```c
// SmallPageStatsMID_v3: record class_idx for all retires
typedef struct {
    uint32_t class_idx;
    uint64_t total_allocations;
    uint64_t total_frees;
    uint32_t page_alloc_count;       // ← v11a new
    uint32_t free_hit_ratio_bps;     // ← v11a new (basis points)
} SmallPageStatsMID_v3;
```
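
The `free_hit_ratio_bps` field is expressed in basis points, where 1 bps is 0.01% and 10000 bps is 100%. The helper below is a small sketch of that conversion using integer arithmetic only. The name `ratio_to_bps` is hypothetical, and what counts as a "hit" is not defined in this spec, so the function takes a generic hits/total pair.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a hits/total pair to basis points (10000 bps = 100%).
 * Integer division truncates toward zero; hits * 10000 is assumed not to
 * overflow 64 bits, which holds for any realistic per-page counter. */
static uint32_t ratio_to_bps(uint64_t hits, uint64_t total) {
    assert(hits <= total);
    if (total == 0)
        return 0;  /* no samples: report 0 rather than divide by zero */
    return (uint32_t)((hits * 10000u) / total);
}
```

For example, 8 hits out of 10 frees gives 8000 bps, which sits exactly on the 8000 bps boundary used in the Learner v2 routing example.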

#### Feature 2: Multi-class segment design decision

**Two options considered**:

**Design A: Separate segments**
```
MID_v3_segment[3] = {
  [0] → segment_C5,
  [1] → segment_C6,
  [2] → segment_C7
}
```
- Pros: simple, clean class separation
- Cons: 3x segment overhead; TLS lookup becomes more complex

**Design B: Shared segment + per-class pages**
```
SmallSegment_MID_v3 {
  free_pages[8];      // per-class free stack
  class_pages[8];     // current page per class
  page_alloc[8];      // allocation count per class
}
```
- Pros: a single segment suffices; no RegionIdBox changes needed
- Cons: more complex logic

**v11a decision**: **Design B (shared segment)**
- Reason: RegionIdBox already exists (minimizes changes)
- Unified segment geometry (2MiB/64KiB, same as v7)
- Can support multi-class TLS hints
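
To make the shared-segment idea concrete, here is a minimal sketch of one segment whose pages are handed out to any class, with retired pages pushed onto a per-class free stack. It models only page indices, not real memory. The 2 MiB / 64 KiB geometry comes from this spec, which gives 32 pages per segment; every identifier (`SharedSegSketch`, `seg_take_page`, `seg_retire_page`, and so on) is hypothetical, and the policy that a retired page is reused only by its original class is an assumption made for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SIZE      (2u * 1024u * 1024u)        /* 2 MiB */
#define PAGE_SIZE_MID (64u * 1024u)               /* 64 KiB */
#define SEG_PAGES     (SEG_SIZE / PAGE_SIZE_MID)  /* 32 pages */
#define NUM_CLASSES   8u
#define NO_PAGE       0xFFu

typedef struct {
    uint8_t  next_free[SEG_PAGES];    /* link to the next page in the same stack */
    uint8_t  page_class[SEG_PAGES];   /* owning class of each page */
    uint8_t  free_head[NUM_CLASSES];  /* per-class stack of retired pages */
    uint32_t page_alloc[NUM_CLASSES]; /* pages handed out per class */
    uint32_t next_unused;             /* pages at or above this index are untouched */
} SharedSegSketch;

static void seg_init(SharedSegSketch *s) {
    for (uint32_t c = 0; c < NUM_CLASSES; c++) {
        s->free_head[c] = NO_PAGE;
        s->page_alloc[c] = 0;
    }
    for (uint32_t p = 0; p < SEG_PAGES; p++) {
        s->next_free[p] = NO_PAGE;
        s->page_class[p] = NO_PAGE;
    }
    s->next_unused = 0;
}

/* Reuse a retired page of this class if any, else carve a fresh page.
 * Returns the page index, or -1 when the segment is exhausted. */
static int seg_take_page(SharedSegSketch *s, uint32_t class_idx) {
    assert(class_idx < NUM_CLASSES);
    uint8_t p = s->free_head[class_idx];
    if (p != NO_PAGE) {
        s->free_head[class_idx] = s->next_free[p];
    } else if (s->next_unused < SEG_PAGES) {
        p = (uint8_t)s->next_unused++;
        s->page_class[p] = (uint8_t)class_idx;
    } else {
        return -1;
    }
    s->page_alloc[class_idx]++;
    return p;
}

/* Push a page back onto the free stack of the class that owns it. */
static void seg_retire_page(SharedSegSketch *s, int page) {
    assert(page >= 0 && (uint32_t)page < SEG_PAGES);
    uint8_t c = s->page_class[page];
    assert(c < NUM_CLASSES);
    s->next_free[page] = s->free_head[c];
    s->free_head[c] = (uint8_t)page;
}
```

Because all classes draw from one region, a pointer-to-segment lookup does not need to know the class, which is consistent with the stated goal of leaving RegionIdBox untouched.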

#### Feature 3: Learner v2 (Expanded Stats)

**Limitation of the v7-7 Learner**:
```c
// Current: monitors only the C5 ratio
c5_ratio_pct = (stats->per_class[5].v7_allocs * 100) / total_allocs;
if (c5_ratio_pct >= THRESHOLD) route[5] = V7;
```

**v11a Learner v2 extension**:
```c
typedef struct {
    uint64_t allocs[8];              // per class allocation count
    uint32_t retire_ratio_pct[8];    // per class retire efficiency
    uint64_t avg_page_utilization;   // global metric
    uint32_t free_hit_ratio_bps;     // global free hit (basis points)
    uint64_t eval_count;
} SmallLearnerStatsV2;

// Route decision from multiple metrics (can be extended later)
// Example (Phase v11b):
//   - C5_ratio < 30% AND retire_ratio < 50% → MID_v3
//   - C5_ratio >= 30% AND free_hit > 8000bps → V7
```
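
The two example rules in the comment above can be written as a small decision function. This is a sketch only: `RouteSketch` and `decide_c5_route` are hypothetical names, the thresholds are the ones quoted in the example, and keeping the current route when neither rule fires (or when there are no samples) is an assumption, since the spec does not say what happens in that gap.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ROUTE_MID_V3, ROUTE_V7 } RouteSketch;  /* hypothetical enum */

/* Apply the two example rules; fall back to the current route otherwise. */
static RouteSketch decide_c5_route(uint64_t c5_allocs, uint64_t total_allocs,
                                   uint32_t retire_ratio_pct,
                                   uint32_t free_hit_ratio_bps,
                                   RouteSketch current) {
    assert(c5_allocs <= total_allocs);
    if (total_allocs == 0)
        return current;  /* no data yet: do not switch */
    uint32_t c5_ratio_pct = (uint32_t)((c5_allocs * 100u) / total_allocs);
    if (c5_ratio_pct < 30u && retire_ratio_pct < 50u)
        return ROUTE_MID_V3;
    if (c5_ratio_pct >= 30u && free_hit_ratio_bps > 8000u)
        return ROUTE_V7;
    return current;      /* neither rule matched */
}
```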

**Implementation flow**:
```
MID_v3 page retire
    ↓ record stats
SmallPageStatsMID_v3 {class_idx, allocs, free_hit_ratio}
    ↓ periodic publish (every LEARNER_EVAL_INTERVAL)
SmallLearnerStatsV2 aggregate
    ↓
small_learner_v2_evaluate()
    ↓
small_policy_v3_update_from_learner()  ← NEW (Policy v2)
    ↓
TLS policy cache invalidation
```
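
The record, aggregate, and evaluate steps of this flow could look roughly like the sketch below. It assumes that the evaluation interval is counted in retire events and that the global free-hit figure is a plain average of per-page values; neither is specified in this document. `LearnerAggSketch`, `learner_ingest`, and `EVAL_INTERVAL_SKETCH` are hypothetical names, and the policy update and TLS cache invalidation are left to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define EVAL_INTERVAL_SKETCH 4u  /* assumed: evaluate every 4 retire events */

typedef struct {
    uint64_t allocs[8];          /* per-class allocation totals */
    uint64_t free_hit_bps_sum;   /* sum of per-page values since last evaluation */
    uint32_t pending;            /* retire events since last evaluation */
    uint32_t free_hit_ratio_bps; /* last published global average */
    uint64_t eval_count;
} LearnerAggSketch;

/* Record one retired page. Returns 1 when this event triggered an
 * evaluation (the caller would then update the policy), else 0. */
static int learner_ingest(LearnerAggSketch *a, uint32_t class_idx,
                          uint64_t page_allocs, uint32_t free_hit_bps) {
    assert(a != 0 && class_idx < 8u && free_hit_bps <= 10000u);
    a->allocs[class_idx] += page_allocs;
    a->free_hit_bps_sum += free_hit_bps;
    if (++a->pending < EVAL_INTERVAL_SKETCH)
        return 0;
    a->free_hit_ratio_bps = (uint32_t)(a->free_hit_bps_sum / a->pending);
    a->free_hit_bps_sum = 0;
    a->pending = 0;
    a->eval_count++;
    return 1;
}
```

Batching the evaluation this way keeps the per-retire cost to a few additions, which matters for the stats aggregation CPU budget listed in the decision criteria.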

### 2-3. Inherited existing components

**Unchanged**:
- RegionIdBox: segment ptr → region lookup (existing behavior)
- Policy Box: route_kind[8] array (existing API)
- ColdIface: refill/retire interface (existing)
- TLS cache: per-class fast path (existing pattern)

**Needs changes**:
- Policy initialization: add C7 routing
- Learner stats recording: add class_idx recording
- Stats aggregation: multi-class support

## 3. Implementation schedule

### Phase v11a-1: Design & Infrastructure (Week 1-2)
- [ ] Decide the SmallSegment_MID_v3 multi-class layout
- [ ] SmallPageStatsMID_v3 type definition + publish API
- [ ] SmallLearnerStatsV2 type definition
- [ ] Sketch the Policy v2 update function
- [ ] Bench suite extension: C5/C6/C7 individual tests

### Phase v11a-2: Core Implementation (Week 3-4)
- [ ] SmallHeap_MID_v3_C7 alloc/free path
- [ ] Multi-class refill logic
- [ ] Stats recording (per-page class_idx)
- [ ] Learner stats aggregation
- [ ] Policy update_from_learner v2

### Phase v11a-3: Integration & Testing (Week 5)
- [ ] Learner default ON for MID_v3
- [ ] Perf benchmarks: C5/C6/C7 mixed
- [ ] Learner route switch verification
- [ ] Regression: v7 research preset still works

### Phase v11b: Multi-segment Expansion (TBD)
- [ ] Evaluate separate segment approach
- [ ] TLS multi-segment hint optimization
- [ ] C4 support decision (ULTRA vs MID_v3)

## 4. Minimizing API changes

### Policy Box API (minimal changes)
```c
// Existing: function signatures stay as they are
const SmallPolicyV7* small_policy_v7_snapshot(void);
void small_policy_v7_init_from_env(SmallPolicyV7* policy);
void small_policy_v7_update_from_learner(
    const SmallLearnerStatsV7* stats,
    SmallPolicyV7* policy_out
);

// v11a: only the type name is extended
// typedef SmallLearnerStatsV7 → SmallLearnerStatsV2 (backward compat)
// → internally, the new v2 fields are optional
```

### Learner Box API (newly added)
```c
// smallobject_learner_v2_box.h
typedef struct { /* SmallLearnerStatsV2 */ } SmallLearnerStatsV2;

void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps);
void small_learner_v2_evaluate(void);
const SmallLearnerStatsV2* small_learner_v2_stats_snapshot(void);
```

### ColdIface API (unchanged)
```c
// Existing refill/retire interface
typedef void (*cold_refill_page_fn)(uint32_t class_idx, ...);
typedef void (*cold_retire_page_fn)(uint32_t class_idx, ...);
```

## 5. Performance projections

### Current MID v3 (C5/C6)
```
C5/C6 mixed (200-500B, 300K iter):  38.7M ops/s
C6 heavy (400-510B, 500K iter):     56.3M ops/s
Mixed 16-1024B (v7 OFF):            21.5M ops/s
```

### Expected MID v3.5 (after implementation)
```
C5/C6/C7 mixed (200-1000B):         +3-5% (more pages, better locality)
C7 heavy (800-1000B):               +2-3% (vs ULTRA fallback)
Mixed 16-1024B (with Learner):      +1-2% (dynamic routing)
```

### Actual MID v3.5 Results (Phase v11a-4)

**C6-heavy (257-512B)**:
```
v3.5 OFF: 34.0M ops/s
v3.5 ON:  35.8M ops/s (+5.1%)
```

**Mixed 16-1024B (ws=400, 10M iters, avg of 3 runs)**:
```
v3.5 OFF: 38.6M ops/s
v3.5 ON:  40.3M ops/s (+4.4%)
```

**Remarks**: C6-heavy improved by about +5% as predicted, and Mixed also improved by about +4%. Overall the results are better than predicted. Routing C6 to MID v3.5 on the Mixed mainline is a viable candidate for adoption.

**Metrics**:
- Throughput: +4-5% (exceeds the predicted +1-3%)
- Overhead: not measured (avoided by calling mmap directly)
- Learner accuracy: observation mode only (route switching is left to a future phase)

## 6. Finalized design decisions

### Segment Geometry (v11a)
```
SmallSegment_MID_v3:
  - Total size: 2 MiB (same as v7)
  - Page size: 64 KiB (same as v7)
  - Free stack: per-class (C5/C6/C7 each)
  - Class pages: current[8], partial[8]
  - RegionId: single segment per TLS thread
```

### TLS Caching Pattern
```c
// TLS MID context
__thread struct {
    SmallSegment_MID_v3 *seg;
    void *page[8];           // Current page per class
    uint32_t offset[8];      // Allocation offset
    uint32_t cache_hits;
    uint32_t cache_misses;
} tls_mid_v3_ctx;
```

### Stats Recording
```c
// On page retire.
// Sketch: `page` and `capacity` are not declared here; they are assumed
// to come from the elided parameters.
void small_cold_mid_v3_retire_page(..., uint32_t class_idx) {
    SmallPageMeta* meta = page->meta;
    meta->class_idx = class_idx;  // ← record class

    // Calculate metrics
    uint32_t free_hit = calc_free_hit_ratio(page);

    // Publish stats
    SmallPageStatsMID_v3 stat = {
        .class_idx = class_idx,
        .total_allocations = page->alloc_count,
        .total_frees = page->free_count,
        .page_alloc_count = capacity,
        .free_hit_ratio_bps = free_hit
    };

    // Feed to Learner
    small_learner_v2_ingest_stats(&stat);
}
```

## 7. Next Decision Points

### Criteria for moving to v11b
```
Go to v11b (multi-segment) if:
  ✓ C7 performance matches ULTRA (±2%)
  ✓ Learner accuracy > 90% on class patterns
  ✓ RegionId lookup latency acceptable (<2% overhead)
```

### Stay in v11a (iterate) if:
```
  ✗ C7 performance < 90% vs ULTRA
  ✗ Learner detection < 80% accuracy
  ✗ Stats aggregation cost > 5% CPU
```
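
The two checklists leave a gap: for example, C7 at 95% of ULTRA meets neither list. The sketch below encodes them as a three-way decision to make that gap explicit. It assumes that any single "stay" condition is enough to stay, that all "go" conditions are required to go, and that "matches ULTRA (±2%)" means at least 98% of ULTRA throughput. `PhaseMetrics`, `PhaseDecision`, and `decide_phase` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DECIDE_GO_V11B, DECIDE_STAY_V11A, DECIDE_UNDECIDED } PhaseDecision;

typedef struct {
    double c7_vs_ultra_pct;        /* C7 throughput as % of ULTRA (100 = parity) */
    double learner_accuracy_pct;   /* Learner accuracy on class patterns */
    double regionid_overhead_pct;  /* RegionId lookup overhead */
    double stats_cpu_pct;          /* CPU cost of stats aggregation */
} PhaseMetrics;

static PhaseDecision decide_phase(const PhaseMetrics *m) {
    assert(m != NULL);
    /* Any single blocker keeps the work in v11a. */
    if (m->c7_vs_ultra_pct < 90.0 ||
        m->learner_accuracy_pct < 80.0 ||
        m->stats_cpu_pct > 5.0)
        return DECIDE_STAY_V11A;
    /* All go-conditions must hold to move on to v11b. */
    if (m->c7_vs_ultra_pct >= 98.0 &&
        m->learner_accuracy_pct > 90.0 &&
        m->regionid_overhead_pct < 2.0)
        return DECIDE_GO_V11B;
    return DECIDE_UNDECIDED;  /* between the two lists: needs a judgment call */
}
```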

## 8. Pruning targets (later)

### Branch Cutting (Phase v12+)
- Fine-grained optimization of the v3 backend
- Verification of v6 headerless gains
- Verification of v7 multi-class
- Multi-dimensional Learner optimization (free_pressure, fragmentation)

### Not in v11a
- Complex routing in Policy v2 (multi-dimensional conditions)
- Simultaneous optimization of v6/v7/MID
- Large-scale ColdIface refactoring

---

**Document Date**: 2025-12-12
**Decision**: Option A (MID v3.5 consolidation)
**Target Completion**: Phase v11a end (2025-12-31)
**Next Review**: After Phase v11a-2 implementation