Phase v11a: Architecture design and implementation roadmap documents

Create comprehensive design specifications for Phase v11a (MID v3.5):

1. PHASE_V11A_DESIGN_MID_V3.5.md
   - Decision rationale: Option A chosen (consolidation vs expansion)
   - MID v3.5 architecture: unified 257-1KiB box
   - Role clarification: v7 frozen as research preset
   - Learner v2 scope: multi-class tracking, C5 ratio primary decision
   - Segment design decision: shared segment (Design B) vs separate segments
   - Stats expansion: per-class efficiency metrics
   - API changes: minimal, backward compatible

2. PHASE_V11A_IMPLEMENTATION_ROADMAP.md
   - Detailed task breakdown for v11a-1, v11a-2, v11a-3
   - File structure: new boxes, implementation files, modified files
   - Concrete function signatures and integration points
   - Benchmark commands and expected performance
   - Dependency graph and implementation order
   - Build/Makefile changes needed
   - Testing strategy and regression checks

Key Design Decisions:
- Multi-class segment uses shared 2MiB segment (not separate)
- Per-class free page stacks for efficient refill
- Stats published per-page retire (for Learner ingestion)
- TLS version-based cache invalidation (atomic policy updates)
- Backward compatibility: Policy v2 extends v1 interface

Next Step: Phase v11a-2 (Core Implementation)
- Implement segment creation/alloc/free
- Add C7 support to existing MID_v3
- Stats recording during page retire
- Learner aggregation logic

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

docs/analysis/PHASE_V11A_DESIGN_MID_V3.5.md (new file, 326 lines)
@@ -0,0 +1,326 @@
# Phase v11a Design Specification: MID v3.5 (257-1KiB Unified Box)

## 1. Positioning

**Transition from v10 to v11a**:
- v10: v7 frozen as a C5/C6-only research box, Learner default ON
- v11a: **MID v3.5 unified and extended as the main 257-1KiB implementation**

**Architecture roles**:
```
L0: ULTRA (C4-C7)           → FROZEN (unchanged)
L1: MID v3.5 (C5-C7)        → Main line ★NEW
    ├─ C5: TLS cache + 2MiB segment (multi-page)
    ├─ C6: TLS cache + 2MiB segment (multi-page)
    └─ C7: TLS cache + 2MiB segment (multi-page)
L1-research: v7 (C5/C6)     → Research box (frozen)
L2: Segment/ColdIface/RegionId
L3: Policy Box + Learner v2 (expanded stats)
```

## 2. Changes from MID v3 to MID v3.5

### 2-1. Current MID v3 configuration

**Implemented**:
- C5/C6 multi-class TLS heap
- 2MiB segments
- RefillPolicy (TLS segment hint, pool fallback)
- Policy routing (via Policy Box v7-4)
- Legacy Stats (basic data recorded at page retire)

**Limitations**:
- No C7 support (handled by ULTRA only)
- No Learner statistics (v7 only)
- Assumes single-class segments

### 2-2. Features added in v11a

#### Feature 1: Full C7 support
**Goal**: MID v3.5 covers all of C5-C7

**Implementation**:
```c
// mid_v3.5.h - new extension

// TLS context for C7
typedef struct {
    SmallHeapCtx ctx;           // Reuse existing context
    void *tls_page;             // Current page pointer (C7)
    uint32_t tls_offset;        // Allocation offset in page
} SmallHeapCtx_C7_MID;

// Allocation fast path
// C7: size > 512B → check MID_ROUTE_C7
// If enabled: try TLS fast alloc → refill on demand
```
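The fast path described in the comments above (try the TLS page first, refill on demand) can be sketched as a bump allocator. This is a minimal illustration only: the `Demo*` names, the 1024-byte C7 slot size, and the refill helper are assumptions, not part of the MID v3.5 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE (64u * 1024u)  /* 64 KiB page, as in the segment geometry */
#define DEMO_C7_SLOT   1024u          /* assumed C7 slot size (hypothetical) */

/* Hypothetical stand-in for the C7 TLS context: current page + bump offset. */
typedef struct {
    uint8_t  *tls_page;
    uint32_t  tls_offset;
} DemoC7Ctx;

/* Fast path: bump-allocate one slot, or return NULL so the caller refills. */
static void *demo_c7_alloc_fast(DemoC7Ctx *ctx) {
    assert(ctx != NULL);
    if (ctx->tls_page == NULL ||
        ctx->tls_offset + DEMO_C7_SLOT > DEMO_PAGE_SIZE) {
        return NULL;  /* no page yet, or page exhausted */
    }
    void *p = ctx->tls_page + ctx->tls_offset;
    ctx->tls_offset += DEMO_C7_SLOT;
    return p;
}

/* Slow path stand-in: install a fresh page and reset the bump offset. */
static void demo_c7_refill(DemoC7Ctx *ctx, uint8_t *page) {
    assert(ctx != NULL && page != NULL);
    ctx->tls_page = page;
    ctx->tls_offset = 0;
}
```

With these assumed sizes a page holds 64 slots, so the 65th request on the same page falls through to the refill path.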

**Policy routing**:
```c
// mid_policy.h
route_kind[7] = SMALL_ROUTE_MID_V3;  // If C7 enabled, else ULTRA
```

**Stats tracking**:
```c
// SmallPageStatsMID_v3: record class_idx for all retires
typedef struct {
    uint32_t class_idx;
    uint64_t total_allocations;
    uint64_t total_frees;
    uint32_t page_alloc_count;       // ← v11a new
    uint32_t free_hit_ratio_bps;     // ← v11a new (basis points)
} SmallPageStatsMID_v3;
```

#### Feature 2: Multi-class segment design decision

**Two options considered**:

**Design A: Separate segments**
```
MID_v3_segment[3] = {
    [0] → segment_C5,
    [1] → segment_C6,
    [2] → segment_C7
}
```
Pros: Simple, clean class separation
Cons: 3x segment overhead, more complex TLS lookup

**Design B: Shared segment + per-class pages**
```
SmallSegment_MID_v3 {
    free_pages[8];    // per class free stack
    class_pages[8];   // current page per class
    page_alloc[8];    // allocation count per class
}
```
Pros: A single segment suffices, no RegionIdBox changes needed
Cons: More complex logic

**v11a decision**: **Design B (shared segment)**
- Rationale: RegionIdBox already exists (keeps changes minimal)
- Unified segment geometry (same 2MiB/64KiB as v7)
- Can support a multi-class TLS hint
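
To make Design B more concrete, here is a minimal sketch of a shared segment whose pages are handed out through per-class LIFO free stacks. The `Demo*` types, the intrusive `next` link, and the `unassigned` pool for pages that have not yet been bound to a class are illustrative assumptions; the real `SmallSegment_MID_v3` layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8
#define DEMO_NUM_PAGES   32  /* 2 MiB / 64 KiB */

/* Hypothetical page node: an intrusive link lets each class keep a LIFO stack. */
typedef struct DemoPage {
    struct DemoPage *next;
    uint32_t class_idx;
} DemoPage;

typedef struct {
    DemoPage *unassigned;                    /* pages not yet bound to a class */
    DemoPage *free_pages[DEMO_NUM_CLASSES];  /* per-class free stacks */
    DemoPage  pages[DEMO_NUM_PAGES];
} DemoSegment;

static void demo_segment_init(DemoSegment *seg) {
    assert(seg != NULL);
    seg->unassigned = NULL;
    for (int c = 0; c < DEMO_NUM_CLASSES; c++) seg->free_pages[c] = NULL;
    /* Push in reverse so that pages[0] ends up on top of the stack. */
    for (int i = DEMO_NUM_PAGES - 1; i >= 0; i--) {
        seg->pages[i].next = seg->unassigned;
        seg->pages[i].class_idx = 0;
        seg->unassigned = &seg->pages[i];
    }
}

/* Refill: prefer a page this class used before, else take an unassigned one. */
static DemoPage *demo_segment_take_page(DemoSegment *seg, uint32_t class_idx) {
    assert(seg != NULL && class_idx < DEMO_NUM_CLASSES);
    DemoPage **stack = seg->free_pages[class_idx] ? &seg->free_pages[class_idx]
                                                  : &seg->unassigned;
    DemoPage *page = *stack;
    if (page == NULL) return NULL;  /* segment exhausted */
    *stack = page->next;
    page->next = NULL;
    page->class_idx = class_idx;
    return page;
}

/* Retire: push the page back onto the stack of the class that owned it. */
static void demo_segment_release_page(DemoSegment *seg, DemoPage *page) {
    assert(seg != NULL && page != NULL);
    page->next = seg->free_pages[page->class_idx];
    seg->free_pages[page->class_idx] = page;
}
```

The point of the sketch is the trade-off named above: one segment serves all classes, at the cost of slightly more bookkeeping on refill and retire.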

#### Feature 3: Learner v2 (Expanded Stats)

**Limitations of the v7-7 Learner**:
```c
// Current: monitors only the C5 ratio
c5_ratio_pct = (stats->per_class[5].v7_allocs * 100) / total_allocs;
if (c5_ratio_pct >= THRESHOLD) route[5] = V7;
```

**v11a Learner v2 extensions**:
```c
typedef struct {
    uint64_t allocs[8];              // per class allocation count
    uint32_t retire_ratio_pct[8];    // per class retire efficiency
    uint64_t avg_page_utilization;   // global metric
    uint32_t free_hit_ratio_bps;     // global free hit (basis points)
    uint64_t eval_count;
} SmallLearnerStatsV2;

// Route decisions based on multiple metrics (can be extended later)
// Example (Phase v11b):
//   - C5_ratio < 30% AND retire_ratio < 50% → MID_v3
//   - C5_ratio >= 30% AND free_hit > 8000bps → V7
```

**Implementation flow**:
```
MID_v3 page retire
    ↓ record stats
SmallPageStatsMID_v3 {class_idx, allocs, free_hit_ratio}
    ↓ periodic publish (every LEARNER_EVAL_INTERVAL)
SmallLearnerStatsV2 aggregate
    ↓
small_learner_v2_evaluate()
    ↓
small_policy_v2_update_from_learner() ← NEW (Policy v2)
    ↓
TLS policy cache invalidation
```
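The last step of the flow, TLS policy cache invalidation, can be driven by a global version counter: each thread keeps a cached copy of the policy and refreshes it only when the version changes. The sketch below assumes a single writer (the Learner) and uses hypothetical `demo_*` names; a production version would also need to protect the policy copy itself, for example with a seqlock or an atomically swapped pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint8_t  route_kind[8];
    uint32_t policy_version;
} DemoPolicy;

static DemoPolicy g_demo_policy;                /* shared policy, single writer */
static _Atomic uint32_t g_demo_policy_version;  /* bumped after each update */

static _Thread_local DemoPolicy tls_demo_policy;  /* per-thread cached copy */
static _Thread_local uint32_t   tls_demo_version;
static _Thread_local int        tls_demo_valid;

/* Writer side: change a route, then publish the new version. */
static void demo_policy_set_route(uint32_t class_idx, uint8_t route) {
    assert(class_idx < 8);
    g_demo_policy.route_kind[class_idx] = route;
    atomic_fetch_add_explicit(&g_demo_policy_version, 1, memory_order_release);
}

/* Reader side: one version compare on the hot path, copy only on mismatch. */
static const DemoPolicy *demo_policy_get(void) {
    uint32_t v = atomic_load_explicit(&g_demo_policy_version, memory_order_acquire);
    if (!tls_demo_valid || v != tls_demo_version) {
        tls_demo_policy = g_demo_policy;  /* sketch only: not race-free with a concurrent writer */
        tls_demo_policy.policy_version = v;
        tls_demo_version = v;
        tls_demo_valid = 1;
    }
    return &tls_demo_policy;
}
```

The hot path costs one atomic load and one compare; the copy happens only after the Learner has published a new version.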

### 2-3. Inherited existing components

**Unchanged**:
- RegionIdBox: segment ptr → region lookup (existing behavior)
- Policy Box: route_kind[8] array (existing API)
- ColdIface: refill/retire interface (existing)
- TLS cache: per-class fast path (existing pattern)

**Needs changes**:
- Policy initialization: add C7 routing
- Learner stats recording: also record class_idx
- Stats aggregation: multi-class support

## 3. Implementation schedule

### Phase v11a-1: Design & Infrastructure (Week 1-2)
- [ ] Decide the SmallSegment_MID_v3 multi-class layout
- [ ] SmallPageStatsMID_v3 type definition + publish API
- [ ] SmallLearnerStatsV2 type definition
- [ ] Sketch the Policy v2 update function
- [ ] Extend the bench suite: C5/C6/C7 individual tests

### Phase v11a-2: Core Implementation (Week 3-4)
- [ ] SmallHeap_MID_v3_C7 alloc/free path
- [ ] Multi-class refill logic
- [ ] Stats recording (per-page class_idx)
- [ ] Learner stats aggregation
- [ ] Policy update_from_learner v2

### Phase v11a-3: Integration & Testing (Week 5)
- [ ] Learner default ON for MID_v3
- [ ] Perf benchmarks: C5/C6/C7 mixed
- [ ] Learner route switch verification
- [ ] Regression: v7 research preset still works

### Phase v11b: Multi-segment Expansion (TBD)
- [ ] Evaluate separate segment approach
- [ ] TLS multi-segment hint optimization
- [ ] C4 support decision (ULTRA vs MID_v3)

## 4. Minimizing API changes

### Policy Box API (minimal changes)
```c
// Existing: function signatures stay as they are
const SmallPolicyV7* small_policy_v7_snapshot(void);
void small_policy_v7_init_from_env(SmallPolicyV7* policy);
void small_policy_v7_update_from_learner(
    const SmallLearnerStatsV7* stats,
    SmallPolicyV7* policy_out
);

// v11a: only the type name is extended
// typedef SmallLearnerStatsV7 → SmallLearnerStatsV2 (backward compat)
// → the new v2 fields are optional internally
```

### Learner Box API (new additions)
```c
// smallobject_learner_v2_box.h
typedef struct { /* SmallLearnerStatsV2 */ } SmallLearnerStatsV2;

void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps);
void small_learner_v2_evaluate(void);
const SmallLearnerStatsV2* small_learner_v2_stats_snapshot(void);
```

### ColdIface API (unchanged)
```c
// Existing refill/retire interface
typedef void (*cold_refill_page_fn)(uint32_t class_idx, ...);
typedef void (*cold_retire_page_fn)(uint32_t class_idx, ...);
```

## 5. Performance projections

### Current MID v3 (C5/C6)
```
C5/C6 mixed (200-500B, 300K iter):   38.7M ops/s
C6 heavy (400-510B, 500K iter):      56.3M ops/s
Mixed 16-1024B (v7 OFF):             21.5M ops/s
```

### Expected MID v3.5 (after implementation)
```
C5/C6/C7 mixed (200-1000B):          +3-5% (more pages, better locality)
C7 heavy (800-1000B):                +2-3% (vs ULTRA fallback)
Mixed 16-1024B (with Learner):       +1-2% (dynamic routing)
```

**Metrics**:
- Throughput: +1-3% overall
- Overhead: ~5-8% (relative to ULTRA baseline)
- Learner accuracy: > 95% on workload pattern detection

## 6. Finalized design decisions

### Segment Geometry (v11a)
```
SmallSegment_MID_v3:
  - Total size: 2 MiB (same as v7)
  - Page size: 64 KiB (same as v7)
  - Free stack: per-class (C5/C6/C7 each)
  - Class pages: current[8], partial[8]
  - RegionId: single segment per TLS thread
```
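The geometry above implies simple address arithmetic: 2 MiB / 64 KiB gives 32 pages per segment, and a pointer's page index is its offset from the segment base divided by the page size. A small worked sketch follows; the `demo_*` names are hypothetical, and per-page header overhead is ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SEGMENT_SIZE ((size_t)2 << 20)   /* 2 MiB */
#define DEMO_PAGE_SIZE    ((size_t)64 << 10)  /* 64 KiB */
#define DEMO_NUM_PAGES    (DEMO_SEGMENT_SIZE / DEMO_PAGE_SIZE)

/* Index of the 64 KiB page that contains ptr, given the segment base address. */
static size_t demo_page_index(uintptr_t seg_base, uintptr_t ptr) {
    assert(ptr >= seg_base && ptr - seg_base < DEMO_SEGMENT_SIZE);
    return (size_t)(ptr - seg_base) / DEMO_PAGE_SIZE;
}

/* Number of slots of a given size that fit in one page (headers ignored). */
static size_t demo_slots_per_page(size_t slot_size) {
    assert(slot_size > 0 && slot_size <= DEMO_PAGE_SIZE);
    return DEMO_PAGE_SIZE / slot_size;
}
```

For example, a hypothetical 1024-byte slot yields 64 slots per page, or 2048 slots if a whole segment were devoted to that size.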

### TLS Caching Pattern
```c
// TLS MID context
__thread struct {
    SmallSegment_MID_v3 *seg;
    void *page[8];               // Current page per class
    uint32_t offset[8];          // Allocation offset
    uint32_t cache_hits;
    uint32_t cache_misses;
} tls_mid_v3_ctx;
```

### Stats Recording
```c
// On page retire:
void small_cold_mid_v3_retire_page(..., uint32_t class_idx) {
    SmallPageMeta* meta = page->meta;
    meta->class_idx = class_idx;  // ← record class

    // Calculate metrics
    uint32_t free_hit = calc_free_hit_ratio(page);

    // Publish stats
    SmallPageStatsMID_v3 stat = {
        .class_idx = class_idx,
        .total_allocations = page->alloc_count,
        .total_frees = page->free_count,
        .page_alloc_count = meta->capacity,  // slots on the page
        .free_hit_ratio_bps = free_hit
    };
    small_stats_mid_v3_publish(&stat);

    // Feed to Learner (same entry point as the Learner Box API)
    small_learner_v2_record_retire(class_idx, free_hit);
}
```
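The document does not define how `calc_free_hit_ratio` computes its result. One plausible reading, assumed here purely for illustration, is the number of frees divided by the number of allocations on the page, expressed in basis points (0-10000) and clamped at 100%. The function name and the formula below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: frees / allocations in basis points, clamped to 10000. */
static uint32_t demo_free_hit_ratio_bps(uint64_t alloc_count, uint64_t free_count) {
    assert(free_count <= UINT64_MAX / 10000u);  /* guard the multiplication below */
    if (alloc_count == 0) return 0;
    if (free_count >= alloc_count) return 10000;
    return (uint32_t)((free_count * 10000u) / alloc_count);
}
```

Basis points keep the metric in integer arithmetic while still resolving differences of 0.01%, which is why the stats structs carry `*_bps` fields rather than floating-point ratios.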

## 7. Next Decision Points

### Criteria for moving to v11b
```
Go to v11b (multi-segment) if:
  ✓ C7 performance matches ULTRA (±2%)
  ✓ Learner accuracy > 90% on class patterns
  ✓ RegionId lookup latency acceptable (<2% overhead)
```

### Stay in v11a (iterate) if:
```
  ✗ C7 performance < 90% vs ULTRA
  ✗ Learner detection < 80% accuracy
  ✗ Stats aggregation cost > 5% CPU
```

## 8. Pruning candidates (later)

### Branch Cutting (Phase v12+)
- Fine-grained optimization of the v3 backend
- Verification of v6 headerless gains
- Verification of v7 multi-class
- Multi-dimensional Learner optimization (free_pressure, fragmentation)

### Not in v11a
- Complex routing in Policy v2 (multi-dimensional conditions)
- Simultaneous optimization of v6/v7/MID
- Large-scale ColdIface refactoring

---

**Document Date**: 2025-12-12
**Decision**: Option A (MID v3.5 consolidation)
**Target Completion**: Phase v11a end (2025-12-31)
**Next Review**: After Phase v11a-2 implementation

docs/analysis/PHASE_V11A_IMPLEMENTATION_ROADMAP.md (new file, 480 lines)
@@ -0,0 +1,480 @@
# Phase v11a Implementation Roadmap: MID v3.5

## 1. File structure (planned new files)

### New box definitions
```
core/box/
├─ smallobject_segment_mid_v3_box.h      [NEW] Multi-class segment layout
├─ smallobject_stats_mid_v3_box.h        [NEW] SmallPageStatsMID_v3 type
├─ smallobject_learner_v2_box.h          [NEW] SmallLearnerStatsV2 type
└─ smallobject_policy_v2_box.h           [NEW] Policy v2 update functions
```

### Implementation files
```
core/
├─ smallobject_segment_mid_v3.c          [NEW] Segment alloc/free/refill
├─ smallobject_learner_v2.c              [NEW] Learner stats aggregation
└─ smallobject_policy_v2.c               [NEW] Policy update logic
```

### Changes to existing files
```
core/
├─ smallobject_mid_v3.c                  [MODIFY] C7 support, stats recording
├─ front/malloc_tiny_fast.h              [MODIFY] C7 routing (if SMALL_ROUTE_MID_V3)
├─ hakmem.c                              [MODIFY] Init smallobject_learner_v2
└─ hakmem.h                              [MODIFY] Export v2 types
```

## 2. Phase v11a-1: Design & Infrastructure

### Task 1.1: smallobject_segment_mid_v3_box.h
```c
// File: core/box/smallobject_segment_mid_v3_box.h [NEW]

#ifndef SMALLOBJECT_SEGMENT_MID_V3_BOX_H
#define SMALLOBJECT_SEGMENT_MID_V3_BOX_H

#include <stdint.h>
#include <stddef.h>

// SmallSegment_MID_v3: unified 2MiB segment for C5-C7
typedef struct {
    void *start;
    size_t total_size;           // 2 MiB
    size_t page_size;            // 64 KiB
    uint32_t num_pages;          // 32

    // Per-class page stacks
    void *free_pages[8];         // free page stack per class (LIFO)
    uint32_t free_count[8];      // free page count per class

    // Current allocation page per class
    void *current_page[8];
    uint32_t page_offset[8];     // allocation offset in current page

    // Metadata for pages
    struct SmallPageMeta **pages;  // [32] page pointers

    // Region ID (for lookup)
    uint32_t region_id;
} SmallSegment_MID_v3;

typedef struct {
    SmallSegment_MID_v3 *seg;
    void *page[8];               // TLS cache: current page per class
    uint32_t offset[8];          // TLS cache: offset per class
} SmallHeapCtx_MID_v3;

// API
SmallSegment_MID_v3* small_segment_mid_v3_create(void);
void small_segment_mid_v3_destroy(SmallSegment_MID_v3 *seg);

void* small_segment_mid_v3_alloc_fast(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    size_t size
);

void small_segment_mid_v3_free_page(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    void *page
);

#endif
```

**Rationale**: Defines the multi-class segment geometry with per-class free stacks and TLS caching pattern

### Task 1.2: smallobject_stats_mid_v3_box.h
```c
// File: core/box/smallobject_stats_mid_v3_box.h [NEW]

#include <stdint.h>

typedef struct {
    uint32_t class_idx;
    uint64_t total_allocations;
    uint64_t total_frees;
    uint32_t page_alloc_count;       // Slots on page
    uint32_t free_hit_ratio_bps;     // Free hit rate in basis points (0-10000)
} SmallPageStatsMID_v3;

typedef struct {
    SmallPageStatsMID_v3 stat;
    void *page_ptr;
    uint64_t retire_timestamp;
} SmallPageStatsPublished_MID_v3;

// API
void small_stats_mid_v3_publish(const SmallPageStatsMID_v3 *stat);
const SmallPageStatsPublished_MID_v3* small_stats_mid_v3_latest(void);
```

**Rationale**: Separates stats type from policy to keep Learner input clean

### Task 1.3: smallobject_learner_v2_box.h
```c
// File: core/box/smallobject_learner_v2_box.h [NEW]

#include <stdint.h>

typedef struct {
    uint64_t allocs[8];              // Allocation count per class
    uint32_t retire_ratio_pct[8];    // Retire efficiency per class (%)
    uint64_t avg_page_utilization;   // Global average utilization
    uint32_t free_hit_ratio_bps;     // Global free hit rate (basis points)
    uint64_t eval_count;
    uint64_t sample_count;
} SmallLearnerStatsV2;

// API
void small_learner_v2_record_refill(uint32_t class_idx, uint64_t capacity);
void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps);
void small_learner_v2_evaluate(void);
const SmallLearnerStatsV2* small_learner_v2_stats_snapshot(void);
```

**Rationale**: Extends learner beyond v7 C5-only to multi-dimensional metrics

### Task 1.4: smallobject_policy_v2_box.h
```c
// File: core/box/smallobject_policy_v2_box.h [NEW]

#include <stdint.h>
#include "smallobject_learner_v2_box.h"  // SmallLearnerStatsV2

// Policy v2: Route decision with Learner-driven updates
typedef struct {
    uint8_t route_kind[8];           // Route per class (ULTRA, MID_V3, V7, LEGACY)
    uint32_t policy_version;         // Version for TLS cache invalidation
} SmallPolicyV2;

// API
const SmallPolicyV2* small_policy_v2_snapshot(void);
void small_policy_v2_init_from_env(SmallPolicyV2 *policy);
void small_policy_v2_update_from_learner(
    const SmallLearnerStatsV2 *stats,
    SmallPolicyV2 *policy_out
);
```

**Rationale**: Extends Policy Box to handle expanded Learner inputs

### Task 1.5: Benchmark Suite Extension
**File**: `core/bench/bench_allocators.c`

```c
// Add test cases for Phase v11a
//
// BENCH_C5_C6_C7_MIXED:
//   - Min size: 200B (C5)
//   - Max size: 1000B (C7)
//   - Mixed ratio: 30% C5, 40% C6, 30% C7
//   - Expected perf: 42-48M ops/s (with MID_v3)
//
// BENCH_C7_HEAVY:
//   - Min size: 800B
//   - Max size: 1000B
//   - Expected perf: 35-40M ops/s (vs ULTRA baseline)
//
// BENCH_LEARNER_ROUTE_SWITCH:
//   - Start with C5-heavy (80% C5)
//   - Expect route[5] = V7 initially
//   - Then shift to C6-heavy (80% C6)
//   - Expect route[5] switch to MID_V3
```
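As an illustration of the 30% / 40% / 30% mix in `BENCH_C5_C6_C7_MIXED`, the sketch below picks request sizes with a small deterministic PRNG. The class boundaries used here (C5 up to 256 B, C6 up to 512 B, C7 above 512 B) are assumptions inferred from the size ranges quoted in this document, and all `demo_*` names are hypothetical rather than part of `bench_allocators.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32: small deterministic PRNG; the state must be non-zero. */
static uint32_t demo_rng_next(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Pick a request size: 30% C5, 40% C6, 30% C7 (assumed class boundaries). */
static size_t demo_pick_size(uint32_t *rng) {
    assert(rng != NULL && *rng != 0);
    uint32_t roll = demo_rng_next(rng) % 100u;
    uint32_t r = demo_rng_next(rng);
    if (roll < 30) return 200u + r % 57u;   /* C5: 200-256 B */
    if (roll < 70) return 257u + r % 256u;  /* C6: 257-512 B */
    return 513u + r % 488u;                 /* C7: 513-1000 B */
}
```

A fixed seed makes benchmark runs repeatable, which matters when comparing route decisions between builds.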

## 3. Phase v11a-2: Core Implementation

### Task 2.1: SmallSegment_MID_v3 Creation
**File**: `core/smallobject_segment_mid_v3.c`

```c
SmallSegment_MID_v3* small_segment_mid_v3_create(void) {
    // Allocate 2MiB segment
    // Initialize 32 x 64KiB pages
    // Set up per-class free stacks
    // Register in RegionIdBox
    return NULL;  // placeholder until the steps above are implemented
}
```

**Complexity**: Medium
- Memory layout: 2MiB = 32 pages of 64KiB each
- Metadata: SmallPageMeta per page
- Region registration: via RegionIdBox_v7 API (existing)

### Task 2.2: Fast Alloc Path for C5/C6/C7
**File**: `core/smallobject_mid_v3.c`

Modify existing C5/C6 alloc to support C7:

```c
// Current (v3):
//   - TLS fast path: C5/C6 from tls_mid_ctx.page
//   - Refill: get page from free stack or allocate

// v11a:
//   - TLS fast path: C5/C6/C7 from tls_mid_ctx.page[class_idx]
//   - Refill: per-class free stack
//   - Retire: record stats with class_idx
```

**Changes**:
- [ ] Extend TLS context to support C7
- [ ] Update refill logic for multi-class
- [ ] Add C7 routing in malloc_tiny_fast.h

### Task 2.3: Stats Recording
**File**: `core/smallobject_mid_v3.c`

```c
void small_cold_mid_v3_retire_page(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    void *page
) {
    SmallPageMeta *meta = page_to_meta(page);

    // Record stats
    uint32_t free_hit_ratio_bps = calc_free_hit_ratio(meta);
    SmallPageStatsMID_v3 stat = {
        .class_idx = class_idx,
        .total_allocations = meta->alloc_count,
        .total_frees = meta->free_count,
        .page_alloc_count = meta->capacity,
        .free_hit_ratio_bps = free_hit_ratio_bps
    };

    // Publish to stats system
    small_stats_mid_v3_publish(&stat);

    // Feed to Learner
    small_learner_v2_record_retire(class_idx, free_hit_ratio_bps);

    // Free page (return to free stack or OS)
    // ...
}
```

**Key Detail**: Must record `class_idx` for Learner aggregation

### Task 2.4: Learner v2 Aggregation
**File**: `core/smallobject_learner_v2.c`

```c
static SmallLearnerStatsV2 g_learner_v2_stats;

void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps) {
    if (class_idx >= 8) return;

    g_learner_v2_stats.allocs[class_idx]++;

    // Exponential smoothing (0.9 * old + 0.1 * new), done in integer
    // arithmetic because retire_ratio_pct is a uint32_t percentage.
    uint32_t sample_pct = free_hit_ratio_bps / 100;
    g_learner_v2_stats.retire_ratio_pct[class_idx] =
        (g_learner_v2_stats.retire_ratio_pct[class_idx] * 9 + sample_pct) / 10;

    // Periodic evaluation
    static uint64_t total_retires = 0;
    if (++total_retires % LEARNER_EVAL_INTERVAL == 0) {
        small_learner_v2_evaluate();
    }
}

void small_learner_v2_evaluate(void) {
    // Update global version to invalidate TLS policy cache
    __sync_fetch_and_add(&g_policy_v2_version, 1);

    g_learner_v2_stats.eval_count++;
}
```
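The smoothing step in `small_learner_v2_record_retire` mixes 90% of the previous value with 10% of the new sample. A minimal sketch of that update in pure integer arithmetic is shown below. It keeps the smoothed state in basis points rather than whole percent, an assumption made here to limit rounding loss, since an integer average stored in whole percent stalls several points below the true value. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* new = 0.9 * old + 0.1 * sample, in basis points, rounded to nearest. */
static uint32_t demo_ema_bps(uint32_t old_bps, uint32_t sample_bps) {
    assert(old_bps <= 10000 && sample_bps <= 10000);
    return (old_bps * 9u + sample_bps + 5u) / 10u;
}
```

A percentage for display or thresholding can then be derived as `bps / 100` at read time, so the stored state keeps its extra resolution.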

### Task 2.5: Policy v2 Update
**File**: `core/smallobject_policy_v2.c`

```c
void small_policy_v2_update_from_learner(
    const SmallLearnerStatsV2 *stats,
    SmallPolicyV2 *policy_out
) {
    if (!stats || !policy_out) return;

    // C5 decision (Phase v11a: same logic as v7)
    uint64_t total_allocs = 0;
    for (int i = 0; i < 8; i++) {
        total_allocs += stats->allocs[i];
    }

    if (total_allocs > 0) {
        uint64_t c5_ratio_pct = (stats->allocs[5] * 100) / total_allocs;

        if (c5_ratio_pct >= 30) {
            policy_out->route_kind[5] = SMALL_ROUTE_V7;
        } else {
            policy_out->route_kind[5] = SMALL_ROUTE_MID_V3;
        }
    }

    // Future (Phase v11b): Multi-dimensional decisions
    // if (retire_ratio[5] < 50% && free_hit < 7000bps) → LEGACY
    // etc.
}
```

## 4. Phase v11a-3: Integration & Testing

### Task 3.1: C7 Routing in malloc_tiny_fast.h
**File**: `core/front/malloc_tiny_fast.h`

Modify alloc switch statement:

```c
// Current (v10):
// case TINY_ROUTE_SMALL_HEAP_V7: return small_heap_alloc_v7(...);
// case TINY_ROUTE_SMALL_HEAP_MID_V3: return small_heap_alloc_mid_v3(...);

// v11a:
// Add support for C7 routing to MID_v3
switch (policy->route_kind[class_idx]) {
    case SMALL_ROUTE_ULTRA:
        return ULTRA_alloc(...);
    case SMALL_ROUTE_MID_V3:
        return small_heap_alloc_mid_v3(class_idx, size);  // ← v11a: supports C7
    case SMALL_ROUTE_V7:
        return small_heap_alloc_v7(class_idx, size);
    case SMALL_ROUTE_LEGACY:
        return legacy_alloc(...);
}
```
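The routing switch can be exercised in isolation with stub backends. The sketch below is self-contained and hypothetical: the `Demo*` enum values, the stub allocator, and the choice to fall back to the legacy path for unknown route values are assumptions for illustration, not the actual `malloc_tiny_fast.h` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    DEMO_ROUTE_LEGACY = 0,
    DEMO_ROUTE_ULTRA,
    DEMO_ROUTE_MID_V3,
    DEMO_ROUTE_V7
} DemoRouteKind;

static int demo_last_backend = -1;     /* records which stub backend ran */
static unsigned char demo_slot[1024];  /* dummy storage returned by the stubs */

/* Stub backend: remembers who was called and hands out the dummy slot. */
static void *demo_backend_alloc(DemoRouteKind kind, uint32_t class_idx, size_t size) {
    (void)class_idx;
    demo_last_backend = (int)kind;
    return size <= sizeof demo_slot ? demo_slot : NULL;
}

/* Per-class dispatch; unknown route values fall back to the legacy path. */
static void *demo_route_alloc(const uint8_t route_kind[8], uint32_t class_idx, size_t size) {
    assert(class_idx < 8);
    switch (route_kind[class_idx]) {
    case DEMO_ROUTE_ULTRA:
        return demo_backend_alloc(DEMO_ROUTE_ULTRA, class_idx, size);
    case DEMO_ROUTE_MID_V3:
        return demo_backend_alloc(DEMO_ROUTE_MID_V3, class_idx, size);
    case DEMO_ROUTE_V7:
        return demo_backend_alloc(DEMO_ROUTE_V7, class_idx, size);
    case DEMO_ROUTE_LEGACY:
    default:
        return demo_backend_alloc(DEMO_ROUTE_LEGACY, class_idx, size);
    }
}
```

An explicit `default` keeps the function well defined even if a route table holds an unexpected value, which would otherwise fall off the end of the switch without returning.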

### Task 3.2: Free Path C7 Support
**File**: `core/front/malloc_tiny_fast.h`

```c
// v11a: Allow C7 free to route to MID_v3
if (SMALL_MID_V3_CLASS_SUPPORTED(class_idx)) {
    if (policy->route_kind[class_idx] == SMALL_ROUTE_MID_V3) {
        small_heap_free_mid_v3(ptr, class_idx);
        return;
    }
}
```

### Task 3.3: Integration Tests
**File**: `core/test/test_mid_v3_c7.c` [NEW]

```c
void test_mid_v3_c7_alloc_free(void) {
    // Test C7 allocation and free through MID_v3
    // Expected: successful alloc/free without segfault
    // Verify: Policy routing is correct
    // Verify: Learner stats are recorded
}

void test_learner_v2_route_switch(void) {
    // Allocate C5-heavy workload
    // Verify: route[5] = V7
    // Switch to C6-heavy workload
    // Verify: route[5] switches to MID_V3
    // Check stderr: "[LEARNER_V2] C5 route switch: V7 → MID_V3"
}

void test_mid_v3_perf_c5_c6_c7_mixed(void) {
    // Performance baseline for C5/C6/C7 mixed
    // Expected: 42-48M ops/s
    // Verify: no regression vs v7 research preset
}
```

### Task 3.4: Regression Testing
**Ensure**:
- [ ] v7 research preset (C5/C6 + Learner) still works
- [ ] Mixed profile (16-1024B, v7 OFF) unchanged
- [ ] ULTRA (C4-C7) unchanged
- [ ] Legacy fallback unchanged

## 5. Build & Compilation

### Makefile Changes
```makefile
# Add new object files to HAKMEM_OBJS
HAKMEM_OBJS += \
    core/smallobject_segment_mid_v3.o \
    core/smallobject_learner_v2.o \
    core/smallobject_policy_v2.o

# Add new box headers to HEADERS
HEADERS += \
    core/box/smallobject_segment_mid_v3_box.h \
    core/box/smallobject_stats_mid_v3_box.h \
    core/box/smallobject_learner_v2_box.h \
    core/box/smallobject_policy_v2_box.h
```

## 6. Testing Commands

### Benchmark Suite (after Phase v11a-2)
```bash
# C5/C6/C7 mixed (expected MID_v3 preferred)
# 0xE0 = bits 5-7 (C5/C6/C7), following the same bit-per-class convention
# as HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 (C5/C6) below
HAKMEM_SMALL_HEAP_V7_ENABLED=0 \
HAKMEM_MID_V3_ENABLED=1 \
HAKMEM_MID_V3_CLASSES=0xE0 \
./bench_allocators bench_c5_c6_c7_mixed 300000

# C7 heavy (expected MID_v3 performance)
HAKMEM_SMALL_HEAP_V7_ENABLED=0 \
HAKMEM_MID_V3_ENABLED=1 \
./bench_allocators bench_c7_heavy 200000

# Learner route switch verification
HAKMEM_SMALL_HEAP_V7_ENABLED=1 \
HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 \
HAKMEM_MID_V3_ENABLED=1 \
./bench_allocators bench_learner_route_switch 500000
```
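The `*_CLASSES` variables in the commands above look like per-class bitmasks. Assuming bit `i` enables class `Ci` (an inference from `0x60` being used for the C5/C6-only v7 box, not something the document states), a mask can be parsed and queried as sketched below. The `demo_*` helpers are hypothetical and are not the hakmem environment parser.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a mask such as "0xE0"; returns fallback when unset or malformed. */
static uint32_t demo_parse_class_mask(const char *text, uint32_t fallback) {
    if (text == NULL || *text == '\0') return fallback;
    char *end = NULL;
    unsigned long v = strtoul(text, &end, 0);  /* base 0 accepts the 0x prefix */
    if (*end != '\0' || v > 0xFFul) return fallback;
    return (uint32_t)v;
}

/* Assumed convention: bit i of the mask enables size class Ci. */
static int demo_class_enabled(uint32_t mask, uint32_t class_idx) {
    assert(class_idx < 8);
    return (int)((mask >> class_idx) & 1u);
}
```

Under this assumed convention, `0x60` selects C5/C6 and `0xE0` selects C5/C6/C7, while `0x70` would select C4/C5/C6. In real use the string would come from `getenv("HAKMEM_MID_V3_CLASSES")`.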

### Expected Output
```
[POLICY_V2_INIT] Route assignments:
  C0: LEGACY
  C1: LEGACY
  C2: LEGACY
  C3: LEGACY
  C4: ULTRA
  C5: MID_V3
  C6: MID_V3
  C7: MID_V3

[LEARNER_V2] eval_count=1, C5_ratio=28%, retire_ratio[5]=92%

C5/C6/C7 mixed (300K iter): 44.2M ops/s ✓ (+4% vs baseline)
```

## 7. Dependency Graph

```
smallobject_segment_mid_v3_box.h
    ↓
smallobject_segment_mid_v3.c
    ↓ calls
smallobject_stats_mid_v3.c
    ↓ publishes to
smallobject_learner_v2.c
    ↓ feeds to
smallobject_policy_v2.c
    ↓ updates
malloc_tiny_fast.h (routing)
```

Recommended implementation order:
1. smallobject_segment_mid_v3_box.h / smallobject_segment_mid_v3.c (foundation)
2. smallobject_stats_mid_v3_box.h (simple type def)
3. smallobject_mid_v3.c changes (core alloc/free)
4. smallobject_learner_v2_box.h / smallobject_learner_v2.c (stats aggregation)
5. smallobject_policy_v2_box.h / smallobject_policy_v2.c (learner integration)
6. malloc_tiny_fast.h (routing)
7. Tests & benchmarks

---

**Document Date**: 2025-12-12
**Phase**: v11a-1 (Design & Infrastructure)
**Status**: Ready for Task 1.1-1.5 implementation
**Next Review**: After Phase v11a-1 completion