# Phase v11a Implementation Roadmap: MID v3.5
## 1. File Structure (Planned New Files)
### New Box Definitions
```
core/box/
├─ smallobject_segment_mid_v3_box.h [NEW] Multi-class segment layout
├─ smallobject_stats_mid_v3_box.h [NEW] SmallPageStatsMID_v3 type
├─ smallobject_learner_v2_box.h [NEW] SmallLearnerStatsV2 type
└─ smallobject_policy_v2_box.h [NEW] Policy v2 update functions
```
### Implementation Files
```
core/
├─ smallobject_segment_mid_v3.c [NEW] Segment alloc/free/refill
├─ smallobject_learner_v2.c [NEW] Learner stats aggregation
└─ smallobject_policy_v2.c [NEW] Policy update logic
```
### Modified Existing Files
```
core/
├─ smallobject_mid_v3.c [MODIFY] C7 support, stats recording
├─ front/malloc_tiny_fast.h [MODIFY] C7 routing (if SMALL_ROUTE_MID_V3)
├─ hakmem.c [MODIFY] Init smallobject_learner_v2
└─ hakmem.h [MODIFY] Export v2 types
```
## 2. Phase v11a-1: Design & Infrastructure
### Task 1.1: smallobject_segment_mid_v3_box.h
```c
// File: core/box/smallobject_segment_mid_v3_box.h [NEW]
#ifndef SMALLOBJECT_SEGMENT_MID_V3_BOX_H
#define SMALLOBJECT_SEGMENT_MID_V3_BOX_H

#include <stdint.h>
#include <stddef.h>

// SmallSegment_MID_v3: unified 2MiB segment for C5-C7
typedef struct {
    void *start;
    size_t total_size;        // 2 MiB
    size_t page_size;         // 64 KiB
    uint32_t num_pages;       // 32

    // Per-class page stacks
    void *free_pages[8];      // free page stack per class (LIFO)
    uint32_t free_count[8];   // free page count per class

    // Current allocation page per class
    void *current_page[8];
    uint32_t page_offset[8];  // allocation offset in current page

    // Metadata for pages
    struct SmallPageMeta **pages;  // [32] page pointers

    // Region ID (for lookup)
    uint32_t region_id;
} SmallSegment_MID_v3;

typedef struct {
    SmallSegment_MID_v3 *seg;
    void *page[8];       // TLS cache: current page per class
    uint32_t offset[8];  // TLS cache: offset per class
} SmallHeapCtx_MID_v3;

// API
SmallSegment_MID_v3* small_segment_mid_v3_create(void);
void small_segment_mid_v3_destroy(SmallSegment_MID_v3 *seg);
void* small_segment_mid_v3_alloc_fast(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    size_t size
);
void small_segment_mid_v3_free_page(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    void *page
);

#endif
```
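**Rationale**: Defines the multi-class segment geometry (32 pages of 64 KiB in one 2 MiB segment), with per-class free-page stacks and a per-thread (TLS) cache of the current page and offset for each class.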
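The geometry in the header comments implies 2 MiB / 64 KiB = 32 pages per segment. The following is a minimal sketch of the pointer-to-page arithmetic such a layout allows. The constant names and the helper `mid_v3_page_index` are illustrative assumptions, not part of the planned API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Illustrative constants mirroring the comments in SmallSegment_MID_v3.
#define MID_V3_SEGMENT_SIZE ((size_t)2 << 20)   // 2 MiB
#define MID_V3_PAGE_SIZE    ((size_t)64 << 10)  // 64 KiB
#define MID_V3_NUM_PAGES    (MID_V3_SEGMENT_SIZE / MID_V3_PAGE_SIZE)

// Compile-time check that the geometry matches the documented page count.
static_assert(MID_V3_NUM_PAGES == 32, "segment must hold 32 pages");

// Hypothetical helper: index of the page containing ptr, or -1 if ptr
// lies outside [seg_start, seg_start + 2 MiB).
static int mid_v3_page_index(const void *seg_start, const void *ptr) {
    uintptr_t base = (uintptr_t)seg_start;
    uintptr_t p = (uintptr_t)ptr;
    if (p < base || p - base >= MID_V3_SEGMENT_SIZE) return -1;
    return (int)((p - base) / MID_V3_PAGE_SIZE);
}
```

A lookup of this kind would presumably index into the `pages` metadata array; whether the real implementation resolves pages this way or through the region ID is not specified here.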
### Task 1.2: smallobject_stats_mid_v3_box.h
```c
// File: core/box/smallobject_stats_mid_v3_box.h [NEW]
#ifndef SMALLOBJECT_STATS_MID_V3_BOX_H
#define SMALLOBJECT_STATS_MID_V3_BOX_H

#include <stdint.h>

// Per-page stats, captured when a page is retired
typedef struct {
    uint32_t class_idx;
    uint64_t total_allocations;
    uint64_t total_frees;
    uint32_t page_alloc_count;    // Slots on page
    uint32_t free_hit_ratio_bps;  // Free hit rate in basis points (0-10000)
} SmallPageStatsMID_v3;

// Published record: stats plus the page and retire time they refer to
typedef struct {
    SmallPageStatsMID_v3 stat;
    void *page_ptr;
    uint64_t retire_timestamp;
} SmallPageStatsPublished_MID_v3;

// API
void small_stats_mid_v3_publish(const SmallPageStatsMID_v3 *stat);
const SmallPageStatsPublished_MID_v3* small_stats_mid_v3_latest(void);

#endif
```
**Rationale**: Separates stats type from policy to keep Learner input clean
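The stats type reports the free hit rate in basis points, where 10000 bps equals 100%. A small sketch of that conversion follows, assuming the rate is a simple hits-over-total ratio. The helper `ratio_to_bps` is hypothetical, and the roadmap does not define how `calc_free_hit_ratio` actually derives its inputs.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: express hits/total as basis points (0-10000).
// Returns 0 for an empty sample so callers never divide by zero.
// Assumes hits is small enough that hits * 10000 fits in 64 bits.
static uint32_t ratio_to_bps(uint64_t hits, uint64_t total) {
    if (total == 0) return 0;
    if (hits > total) hits = total;  // clamp to 100%
    uint32_t bps = (uint32_t)((hits * 10000u) / total);
    assert(bps <= 10000u);
    return bps;
}
```

For example, 92 hits out of 100 frees yields 9200 bps, which is the scale the `free_hit_ratio_bps` field expects.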
### Task 1.3: smallobject_learner_v2_box.h
```c
// File: core/box/smallobject_learner_v2_box.h [NEW]
#ifndef SMALLOBJECT_LEARNER_V2_BOX_H
#define SMALLOBJECT_LEARNER_V2_BOX_H

#include <stdint.h>

typedef struct {
    uint64_t allocs[8];             // Allocation count per class
    uint32_t retire_ratio_pct[8];   // Retire efficiency per class (%)
    uint64_t avg_page_utilization;  // Global average utilization
    uint32_t free_hit_ratio_bps;    // Global free hit rate (basis points)
    uint64_t eval_count;
    uint64_t sample_count;
} SmallLearnerStatsV2;

// API
void small_learner_v2_record_refill(uint32_t class_idx, uint64_t capacity);
void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps);
void small_learner_v2_evaluate(void);
const SmallLearnerStatsV2* small_learner_v2_stats_snapshot(void);

#endif
```
**Rationale**: Extends the Learner beyond v7's C5-only tracking to per-class, multi-dimensional metrics.
### Task 1.4: smallobject_policy_v2_box.h
```c
// File: core/box/smallobject_policy_v2_box.h [NEW]
#ifndef SMALLOBJECT_POLICY_V2_BOX_H
#define SMALLOBJECT_POLICY_V2_BOX_H

#include <stdint.h>
#include "smallobject_learner_v2_box.h"  // SmallLearnerStatsV2

// Policy v2: Route decision with Learner-driven updates
typedef struct {
    uint8_t route_kind[8];    // Route per class (ULTRA, MID_V3, V7, LEGACY)
    uint32_t policy_version;  // Version for TLS cache invalidation
} SmallPolicyV2;

// API
const SmallPolicyV2* small_policy_v2_snapshot(void);
void small_policy_v2_init_from_env(SmallPolicyV2 *policy);
void small_policy_v2_update_from_learner(
    const SmallLearnerStatsV2 *stats,
    SmallPolicyV2 *policy_out
);

#endif
```
**Rationale**: Extends Policy Box to handle expanded Learner inputs
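The `policy_version` field exists so that each thread can cache the policy and refresh it only when the version changes. Below is a minimal sketch of that pattern using C11 atomics and thread-local storage. All `demo_` names are stand-ins invented for illustration, and the sketch assumes a single writer; it does not protect a reader from copying the policy in the middle of an update, which a real implementation would need to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint8_t  route_kind[8];
    uint32_t policy_version;
} DemoPolicy;  // stand-in for SmallPolicyV2

static DemoPolicy g_demo_policy;                  // shared policy (writer side)
static _Atomic uint32_t g_demo_version;           // bumped after each update
static _Thread_local DemoPolicy tls_demo_policy;  // per-thread cached copy
static _Thread_local uint32_t tls_demo_seen = UINT32_MAX;  // forces first refresh

// Writer: change a route, then publish by bumping the global version.
static void demo_policy_set_route(uint32_t class_idx, uint8_t kind) {
    assert(class_idx < 8);
    g_demo_policy.route_kind[class_idx] = kind;
    g_demo_policy.policy_version = atomic_load(&g_demo_version) + 1;
    atomic_fetch_add_explicit(&g_demo_version, 1, memory_order_release);
}

// Reader: one atomic load on the fast path; copy only when stale.
static const DemoPolicy *demo_policy_snapshot(void) {
    uint32_t v = atomic_load_explicit(&g_demo_version, memory_order_acquire);
    if (v != tls_demo_seen) {
        tls_demo_policy = g_demo_policy;  // refresh the stale TLS copy
        tls_demo_seen = v;
    }
    return &tls_demo_policy;
}
```

The appeal of this design is that the allocation fast path pays for a single version comparison, while route changes stay rare and cheap to publish.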
### Task 1.5: Benchmark Suite Extension
**File**: `core/bench/bench_allocators.c`
```c
// Add test cases for Phase v11a
//
// BENCH_C5_C6_C7_MIXED:
// - Min size: 200B (C5)
// - Max size: 1000B (C7)
// - Mixed ratio: 30% C5, 40% C6, 30% C7
// - Expected perf: 42-48M ops/s (with MID_v3)
//
// BENCH_C7_HEAVY:
// - Min size: 800B
// - Max size: 1000B
// - Expected perf: 35-40M ops/s (vs ULTRA baseline)
//
// BENCH_LEARNER_ROUTE_SWITCH:
// - Start with C5-heavy (80% C5)
// - Expect route[5] = V7 initially
// - Then shift to C6-heavy (80% C6)
// - Expect route[5] switch to MID_V3
```
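The 30% / 40% / 30% mix in `BENCH_C5_C6_C7_MIXED` can be generated with a small reproducible PRNG. The sketch below is one way to do it, not the benchmark's actual code. It assumes class boundaries at 256 B and 512 B (so C5 covers 200-256 B within the benchmark range, C6 covers 257-512 B, C7 covers 513-1000 B); those boundaries and all `bench_` helper names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// xorshift32 PRNG so runs are reproducible; *state must be nonzero.
static uint32_t bench_rand(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

// Size in [lo, hi]; the slight modulo bias is acceptable for a benchmark.
static size_t bench_size_between(uint32_t *state, size_t lo, size_t hi) {
    assert(lo <= hi);
    return lo + bench_rand(state) % (hi - lo + 1);
}

// Pick a request size with a 30% / 40% / 30% split across C5 / C6 / C7.
// ASSUMPTION: class boundaries at 256 B and 512 B.
static size_t bench_mixed_size(uint32_t *state) {
    uint32_t roll = bench_rand(state) % 100;
    if (roll < 30) return bench_size_between(state, 200, 256);  // C5
    if (roll < 70) return bench_size_between(state, 257, 512);  // C6
    return bench_size_between(state, 513, 1000);                // C7
}
```

A fixed seed keeps runs comparable across allocator configurations, which matters when the expected differences are only a few percent.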
## 3. Phase v11a-2: Core Implementation
### Task 2.1: SmallSegment_MID_v3 Creation
**File**: `core/smallobject_segment_mid_v3.c`
```c
SmallSegment_MID_v3* small_segment_mid_v3_create(void) {
    // 1. Allocate the 2MiB segment
    // 2. Initialize 32 x 64KiB pages
    // 3. Set up per-class free stacks
    // 4. Register in RegionIdBox
    return NULL;  // placeholder until the steps above are implemented
}
```
**Complexity**: Medium
- Memory layout: 2MiB = 32 pages of 64KiB each
- Metadata: SmallPageMeta per page
- Region registration: via RegionIdBox_v7 API (existing)
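The creation steps above can be sketched as follows. This is a simplified illustration under several assumptions: it uses `aligned_alloc` rather than whatever OS mapping call the real segment uses, it keeps one shared free-page stack instead of per-class stacks, and it omits page metadata and region registration. All `Demo`/`demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SEG_SIZE  ((size_t)2 << 20)   // 2 MiB
#define DEMO_PAGE_SIZE ((size_t)64 << 10)  // 64 KiB
#define DEMO_NUM_PAGES (DEMO_SEG_SIZE / DEMO_PAGE_SIZE)

// Intrusive link stored in the first bytes of each free page.
typedef struct DemoFreePage { struct DemoFreePage *next; } DemoFreePage;

typedef struct {
    void         *start;
    DemoFreePage *free_head;   // LIFO stack of unused pages
    uint32_t      free_count;
} DemoSegment;

static void demo_segment_push_page(DemoSegment *seg, void *page) {
    assert(page != NULL);
    DemoFreePage *pg = page;
    pg->next = seg->free_head;
    seg->free_head = pg;
    seg->free_count++;
}

static void *demo_segment_pop_page(DemoSegment *seg) {
    DemoFreePage *pg = seg->free_head;
    if (!pg) return NULL;
    seg->free_head = pg->next;
    seg->free_count--;
    return pg;
}

static DemoSegment *demo_segment_create(void) {
    DemoSegment *seg = malloc(sizeof *seg);
    if (!seg) return NULL;
    // 2 MiB alignment lets a page pointer be mapped back to its segment.
    seg->start = aligned_alloc(DEMO_SEG_SIZE, DEMO_SEG_SIZE);
    if (!seg->start) { free(seg); return NULL; }
    seg->free_head = NULL;
    seg->free_count = 0;
    // Push in reverse so the lowest-address page is popped first.
    for (size_t i = DEMO_NUM_PAGES; i-- > 0; ) {
        demo_segment_push_page(seg, (char *)seg->start + i * DEMO_PAGE_SIZE);
    }
    return seg;
}

static void demo_segment_destroy(DemoSegment *seg) {
    if (!seg) return;
    free(seg->start);
    free(seg);
}
```

Storing the stack link inside the free page itself avoids a separate allocation for the stack, which is a common choice for allocator free lists.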
### Task 2.2: Fast Alloc Path for C5/C6/C7
**File**: `core/smallobject_mid_v3.c`
Modify existing C5/C6 alloc to support C7:
```c
// Current (v3):
// - TLS fast path: C5/C6 from tls_mid_ctx.page
// - Refill: get page from free stack or allocate
// v11a:
// - TLS fast path: C5/C6/C7 from tls_mid_ctx.page[class_idx]
// - Refill: per-class free stack
// - Retire: record stats with class_idx
```
**Changes**:
- [ ] Extend TLS context to support C7
- [ ] Update refill logic for multi-class
- [ ] Add C7 routing in malloc_tiny_fast.h
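To make the per-class TLS fast path concrete, here is a minimal bump-pointer sketch indexed by `class_idx`. It assumes the fast path only advances an offset within the current page, as the `page[]` / `offset[]` fields in `SmallHeapCtx_MID_v3` suggest; the real MID_v3 path may use per-page free lists instead. The `Demo`/`demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_BYTES ((size_t)64 << 10)  // 64 KiB, as in the segment layout

typedef struct {
    char    *page[8];    // current page per class (NULL = needs refill)
    uint32_t offset[8];  // next free byte within that page
} DemoHeapCtx;           // stand-in for SmallHeapCtx_MID_v3

// Fast path: bump-allocate block_size bytes from the class's current page.
// Returns NULL when the page is missing or full; the caller would then take
// the slow path and refill from the per-class free stack.
static void *demo_alloc_fast(DemoHeapCtx *ctx, uint32_t class_idx,
                             uint32_t block_size) {
    assert(class_idx < 8 && block_size > 0);
    char *page = ctx->page[class_idx];
    uint32_t off = ctx->offset[class_idx];
    if (!page || off + (size_t)block_size > DEMO_PAGE_BYTES) return NULL;
    ctx->offset[class_idx] = off + block_size;
    return page + off;
}

// Slow-path stub: install a fresh page for the class and reset its offset.
static void demo_refill(DemoHeapCtx *ctx, uint32_t class_idx, void *new_page) {
    assert(class_idx < 8);
    ctx->page[class_idx] = new_page;
    ctx->offset[class_idx] = 0;
}
```

Because each class has its own slot, adding C7 to this scheme is a matter of allowing index 7 rather than adding a new code path.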
### Task 2.3: Stats Recording
**File**: `core/smallobject_mid_v3.c`
```c
void small_cold_mid_v3_retire_page(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    void *page
) {
    SmallPageMeta *meta = page_to_meta(page);

    // Record stats
    uint32_t free_hit_ratio_bps = calc_free_hit_ratio(meta);
    SmallPageStatsMID_v3 stat = {
        .class_idx = class_idx,
        .total_allocations = meta->alloc_count,
        .total_frees = meta->free_count,
        .page_alloc_count = meta->capacity,
        .free_hit_ratio_bps = free_hit_ratio_bps
    };

    // Publish to stats system
    small_stats_mid_v3_publish(&stat);

    // Feed to Learner
    small_learner_v2_record_retire(class_idx, free_hit_ratio_bps);

    // Free page (return to free stack or OS)
    ...
}
```
**Key Detail**: Must record `class_idx` for Learner aggregation
### Task 2.4: Learner v2 Aggregation
**File**: `core/smallobject_learner_v2.c`
```c
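static SmallLearnerStatsV2 g_learner_v2_stats;

// Assumed to be defined alongside the policy code; not shown in this roadmap.
extern uint32_t g_policy_v2_version;

void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps) {
    if (class_idx >= 8) return;
    g_learner_v2_stats.allocs[class_idx]++;

    // Exponential smoothing (weight 0.1 for the new sample). Integer math is
    // used because retire_ratio_pct is a uint32_t; bps / 100 gives percent.
    // The result is coarse: rounding can leave it a few points off the target.
    uint32_t prev_pct = g_learner_v2_stats.retire_ratio_pct[class_idx];
    uint32_t sample_pct = free_hit_ratio_bps / 100;
    g_learner_v2_stats.retire_ratio_pct[class_idx] =
        (prev_pct * 9 + sample_pct + 5) / 10;

    // Periodic evaluation. LEARNER_EVAL_INTERVAL is not defined in this
    // roadmap, and this counter is not thread-safe as written.
    static uint64_t total_retires = 0;
    if (++total_retires % LEARNER_EVAL_INTERVAL == 0) {
        small_learner_v2_evaluate();
    }
}

void small_learner_v2_evaluate(void) {
    // Update global version to invalidate TLS policy cache
    __sync_fetch_and_add(&g_policy_v2_version, 1);
    g_learner_v2_stats.eval_count++;
}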
```
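The smoothing step in `small_learner_v2_record_retire` is an exponential moving average that gives the newest sample a weight of 0.1. As a sketch of an alternative, the same update can be kept in basis points, which loses less to integer truncation than a percent field does. Storing the average in bps is an assumption for illustration, not the planned field type, and `ema_bps_update` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

// new = 0.9 * prev + 0.1 * sample, rounded to nearest, all in basis points.
static uint32_t ema_bps_update(uint32_t prev_bps, uint32_t sample_bps) {
    assert(prev_bps <= 10000u && sample_bps <= 10000u);
    return (prev_bps * 9u + sample_bps + 5u) / 10u;
}
```

Starting from 0 and feeding a constant sample of 10000 bps, the first update yields 1000 bps, and a value already equal to the sample stays unchanged.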
### Task 2.5: Policy v2 Update
**File**: `core/smallobject_policy_v2.c`
```c
void small_policy_v2_update_from_learner(
    const SmallLearnerStatsV2 *stats,
    SmallPolicyV2 *policy_out
) {
    if (!stats || !policy_out) return;

    // C5 decision (Phase v11a: same logic as v7)
    uint64_t total_allocs = 0;
    for (int i = 0; i < 8; i++) {
        total_allocs += stats->allocs[i];
    }
    if (total_allocs > 0) {
        uint64_t c5_ratio_pct = (stats->allocs[5] * 100) / total_allocs;
        if (c5_ratio_pct >= 30) {
            policy_out->route_kind[5] = SMALL_ROUTE_V7;
        } else {
            policy_out->route_kind[5] = SMALL_ROUTE_MID_V3;
        }
    }

    // Future (Phase v11b): Multi-dimensional decisions
    // if (retire_ratio[5] < 50% && free_hit < 7000bps) LEGACY
    // etc.
}
```
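A worked example of the C5 threshold may help when writing the route-switch test. The sketch below isolates the decision as a pure function so the boundary cases are easy to check; the route constants are placeholder values and `demo_c5_route` is a hypothetical name, not part of the planned API.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_ROUTE_MID_V3 = 1, DEMO_ROUTE_V7 = 2 };  // placeholder values

// Returns the C5 route for the given per-class allocation counts, or
// `current` when there are no samples yet (the route is left unchanged).
static uint8_t demo_c5_route(const uint64_t allocs[8], uint8_t current) {
    assert(allocs != 0);
    uint64_t total = 0;
    for (int i = 0; i < 8; i++) total += allocs[i];
    if (total == 0) return current;
    uint64_t c5_pct = (allocs[5] * 100) / total;  // integer division truncates
    return (c5_pct >= 30) ? DEMO_ROUTE_V7 : DEMO_ROUTE_MID_V3;
}
```

Note the truncation: 299 C5 allocations out of 1000 computes as 29%, so the route is MID_V3, while exactly 30 out of 100 selects V7.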
## 4. Phase v11a-3: Integration & Testing
### Task 3.1: C7 Routing in malloc_tiny_fast.h
**File**: `core/front/malloc_tiny_fast.h`
Modify alloc switch statement:
```c
// Current (v10):
// case TINY_ROUTE_SMALL_HEAP_V7: return small_heap_alloc_v7(...);
// case TINY_ROUTE_SMALL_HEAP_MID_V3: return small_heap_alloc_mid_v3(...);

// v11a:
// Add support for C7 routing to MID_v3
switch (policy->route_kind[class_idx]) {
    case SMALL_ROUTE_ULTRA:
        return ULTRA_alloc(...);
    case SMALL_ROUTE_MID_V3:
        return small_heap_alloc_mid_v3(class_idx, size); // ← v11a: supports C7
    case SMALL_ROUTE_V7:
        return small_heap_alloc_v7(class_idx, size);
    case SMALL_ROUTE_LEGACY:
        return legacy_alloc(...);
}
```
### Task 3.2: Free Path C7 Support
**File**: `core/front/malloc_tiny_fast.h`
```c
// v11a: Allow C7 free to route to MID_v3
if (SMALL_MID_V3_CLASS_SUPPORTED(class_idx)) {
    if (policy->route_kind[class_idx] == SMALL_ROUTE_MID_V3) {
        small_heap_free_mid_v3(ptr, class_idx);
        return;
    }
}
```
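The `SMALL_MID_V3_CLASS_SUPPORTED` check is not defined in this roadmap. One plausible shape is a bitmask test, sketched below under the assumption that bit i of a class mask selects class Ci (consistent with `0x60` being used for the C5/C6 preset in the testing commands). The function names and the fallback behavior are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Parse a class bitmask such as "0xE0". Falls back to `fallback` when the
// text is NULL, empty, or not a clean number. Base 0 lets strtoul accept
// hex, octal, and decimal input.
static uint32_t demo_parse_class_mask(const char *text, uint32_t fallback) {
    if (!text || !*text) return fallback;
    char *end = NULL;
    unsigned long v = strtoul(text, &end, 0);
    if (end == text || *end != '\0') return fallback;
    return (uint32_t)(v & 0xFFu);  // only classes C0-C7 exist
}

// ASSUMPTION: bit i of the mask enables class Ci.
static int demo_class_supported(uint32_t mask, uint32_t class_idx) {
    assert(mask <= 0xFFu);
    return class_idx < 8 && ((mask >> class_idx) & 1u) != 0;
}
```

In use, the mask would presumably be read once at init, for example `demo_parse_class_mask(getenv("HAKMEM_MID_V3_CLASSES"), 0x60)`, so that the free path only performs the shift-and-test.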
### Task 3.3: Integration Tests
**File**: `core/test/test_mid_v3_c7.c` [NEW]
```c
void test_mid_v3_c7_alloc_free(void) {
    // Test C7 allocation and free through MID_v3
    // Expected: successful alloc/free without segfault
    // Verify: Policy routing is correct
    // Verify: Learner stats are recorded
}

void test_learner_v2_route_switch(void) {
    // Allocate C5-heavy workload
    // Verify: route[5] = V7
    // Switch to C6-heavy workload
    // Verify: route[5] switches to MID_V3
    // Check stderr: "[LEARNER_V2] C5 route switch: V7 → MID_V3"
}

void test_mid_v3_perf_c5_c6_c7_mixed(void) {
    // Performance baseline for C5/C6/C7 mixed
    // Expected: 42-48M ops/s
    // Verify: no regression vs v7 research preset
}
```
### Task 3.4: Regression Testing
**Ensure**:
- [ ] v7 research preset (C5/C6 + Learner) still works
- [ ] Mixed profile (16-1024B, v7 OFF) unchanged
- [ ] ULTRA (C4-C7) unchanged
- [ ] Legacy fallback unchanged
## 5. Build & Compilation
### Makefile Changes
```makefile
# Add new object files to HAKMEM_OBJS
HAKMEM_OBJS += \
core/smallobject_segment_mid_v3.o \
core/smallobject_learner_v2.o \
core/smallobject_policy_v2.o
# Add new box headers to HEADERS
HEADERS += \
core/box/smallobject_segment_mid_v3_box.h \
core/box/smallobject_stats_mid_v3_box.h \
core/box/smallobject_learner_v2_box.h \
core/box/smallobject_policy_v2_box.h
```
## 6. Testing Commands
### Benchmark Suite (after Phase v11a-2)
```bash
# C5/C6/C7 mixed (expected MID_v3 preferred)
HAKMEM_SMALL_HEAP_V7_ENABLED=0 \
HAKMEM_MID_V3_ENABLED=1 \
HAKMEM_MID_V3_CLASSES=0xE0 \
./bench_allocators bench_c5_c6_c7_mixed 300000
# C7 heavy (expected MID_v3 performance)
HAKMEM_SMALL_HEAP_V7_ENABLED=0 \
HAKMEM_MID_V3_ENABLED=1 \
./bench_allocators bench_c7_heavy 200000
# Learner route switch verification
HAKMEM_SMALL_HEAP_V7_ENABLED=1 \
HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 \
HAKMEM_MID_V3_ENABLED=1 \
./bench_allocators bench_learner_route_switch 500000
```
### Expected Output
```
[POLICY_V2_INIT] Route assignments:
C0: LEGACY
C1: LEGACY
C2: LEGACY
C3: LEGACY
C4: ULTRA
C5: MID_V3
C6: MID_V3
C7: MID_V3
[LEARNER_V2] eval_count=1, C5_ratio=28%, retire_ratio[5]=92%
C5/C6/C7 mixed (300K iter): 44.2M ops/s ✓ (+4% vs baseline)
```
## 7. Dependency Graph
```
smallobject_segment_mid_v3_box.h
smallobject_segment_mid_v3.c
↓ calls
smallobject_stats_mid_v3_box.h (stats publish API)
↓ publishes to
smallobject_learner_v2.c
↓ feeds to
smallobject_policy_v2.c
↓ updates
malloc_tiny_fast.h (routing)
```
Recommended implementation order:
1. smallobject_segment_mid_v3_box.h / smallobject_segment_mid_v3.c (foundation)
2. smallobject_stats_mid_v3_box.h (simple type def)
3. smallobject_mid_v3.c changes (core alloc/free)
4. smallobject_learner_v2_box.h / smallobject_learner_v2.c (stats aggregation)
5. smallobject_policy_v2_box.h / smallobject_policy_v2.c (learner integration)
6. malloc_tiny_fast.h (routing)
7. Tests & benchmarks
---
**Document Date**: 2025-12-12
**Phase**: v11a-1 (Design & Infrastructure)
**Status**: Ready for Task 1.1-1.5 implementation
**Next Review**: After Phase v11a-1 completion