hakmem/docs/analysis/PHASE_V11A_IMPLEMENTATION_ROADMAP.md
Moe Charm (CI) 57313f7822 Phase v11a: Architecture design and implementation roadmap documents
Create comprehensive design specifications for Phase v11a (MID v3.5):

1. PHASE_V11A_DESIGN_MID_V3.5.md
   - Decision rationale: Option A chosen (consolidation vs expansion)
   - MID v3.5 architecture: unified 257-1KiB box
   - Role clarification: v7 frozen as research preset
   - Learner v2 scope: multi-class tracking, C5 ratio primary decision
   - Segment design decision: shared segment (Design B) vs separate segments
   - Stats expansion: per-class efficiency metrics
   - API changes: minimal, backward compatible

2. PHASE_V11A_IMPLEMENTATION_ROADMAP.md
   - Detailed task breakdown for v11a-1, v11a-2, v11a-3
   - File structure: new boxes, implementation files, modified files
   - Concrete function signatures and integration points
   - Benchmark commands and expected performance
   - Dependency graph and implementation order
   - Build/Makefile changes needed
   - Testing strategy and regression checks

Key Design Decisions:
- Multi-class segment uses shared 2MiB segment (not separate)
- Per-class free page stacks for efficient refill
- Stats published per-page retire (for Learner ingestion)
- TLS version-based cache invalidation (atomic policy updates)
- Backward compatibility: Policy v2 extends v1 interface

Next Step: Phase v11a-2 (Core Implementation)
- Implement segment creation/alloc/free
- Add C7 support to existing MID_v3
- Stats recording during page retire
- Learner aggregation logic

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 06:20:14 +09:00


Phase v11a Implementation Roadmap: MID v3.5

1. File Structure (Planned New Files)

New Box Definitions

core/box/
  ├─ smallobject_segment_mid_v3_box.h        [NEW] Multi-class segment layout
  ├─ smallobject_stats_mid_v3_box.h          [NEW] SmallPageStatsMID_v3 type
  ├─ smallobject_learner_v2_box.h            [NEW] SmallLearnerStatsV2 type
  └─ smallobject_policy_v2_box.h             [NEW] Policy v2 update functions

Implementation Files

core/
  ├─ smallobject_segment_mid_v3.c            [NEW] Segment alloc/free/refill
  ├─ smallobject_learner_v2.c                [NEW] Learner stats aggregation
  └─ smallobject_policy_v2.c                 [NEW] Policy update logic

Changes to Existing Files

core/
  ├─ smallobject_mid_v3.c                    [MODIFY] C7 support, stats recording
  ├─ front/malloc_tiny_fast.h                [MODIFY] C7 routing (if SMALL_ROUTE_MID_V3)
  ├─ hakmem.c                                [MODIFY] Init smallobject_learner_v2
  └─ hakmem.h                                [MODIFY] Export v2 types

2. Phase v11a-1: Design & Infrastructure

Task 1.1: smallobject_segment_mid_v3_box.h

// File: core/box/smallobject_segment_mid_v3_box.h [NEW]

#ifndef SMALLOBJECT_SEGMENT_MID_V3_BOX_H
#define SMALLOBJECT_SEGMENT_MID_V3_BOX_H

#include <stdint.h>
#include <stddef.h>

// SmallSegment_MID_v3: unified 2MiB segment for C5-C7
typedef struct {
    void *start;
    size_t total_size;          // 2 MiB
    size_t page_size;           // 64 KiB
    uint32_t num_pages;         // 32

    // Per-class page stacks
    void *free_pages[8];        // free page stack per class (LIFO)
    uint32_t free_count[8];     // free page count per class

    // Current allocation page per class
    void *current_page[8];
    uint32_t page_offset[8];    // allocation offset in current page

    // Metadata for pages
    struct SmallPageMeta **pages;  // [32] page pointers

    // Region ID (for lookup)
    uint32_t region_id;
} SmallSegment_MID_v3;

typedef struct {
    SmallSegment_MID_v3 *seg;
    void *page[8];              // TLS cache: current page per class
    uint32_t offset[8];         // TLS cache: offset per class
} SmallHeapCtx_MID_v3;

// API
SmallSegment_MID_v3* small_segment_mid_v3_create(void);
void small_segment_mid_v3_destroy(SmallSegment_MID_v3 *seg);

void* small_segment_mid_v3_alloc_fast(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    size_t size
);

void small_segment_mid_v3_free_page(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    void *page
);

#endif

Rationale: Defines the multi-class segment geometry (one 2 MiB segment split into 32 pages of 64 KiB), the per-class free-page stacks, and the TLS context that caches the current page per class.
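
The per-class free stack (LIFO) in the struct above holds a single head pointer per class. One way to get a stack out of that, sketched here as an assumption rather than the roadmap's confirmed layout, is an intrusive list: the first word of each free page stores the previous head. `DemoFreeStacks`, `demo_push_free_page` and `demo_pop_free_page` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8

/* Hypothetical reduced view of the segment: only the free-stack fields. */
typedef struct {
    void *free_pages[DEMO_NUM_CLASSES];     /* stack head per class */
    uint32_t free_count[DEMO_NUM_CLASSES];  /* stack depth per class */
} DemoFreeStacks;

/* Push a free page: the page's first word stores the previous head. */
static void demo_push_free_page(DemoFreeStacks *s, uint32_t class_idx, void *page) {
    assert(class_idx < DEMO_NUM_CLASSES && page != NULL);
    *(void **)page = s->free_pages[class_idx];
    s->free_pages[class_idx] = page;
    s->free_count[class_idx]++;
}

/* Pop the most recently freed page, or NULL if the class has none. */
static void *demo_pop_free_page(DemoFreeStacks *s, uint32_t class_idx) {
    assert(class_idx < DEMO_NUM_CLASSES);
    void *page = s->free_pages[class_idx];
    if (page == NULL) return NULL;
    s->free_pages[class_idx] = *(void **)page;
    s->free_count[class_idx]--;
    return page;
}
```

Reusing the most recently freed page first tends to keep refills on memory that is still warm in cache, which is the usual motivation for a LIFO order.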

Task 1.2: smallobject_stats_mid_v3_box.h

// File: core/box/smallobject_stats_mid_v3_box.h [NEW]

#ifndef SMALLOBJECT_STATS_MID_V3_BOX_H
#define SMALLOBJECT_STATS_MID_V3_BOX_H

#include <stdint.h>

// Per-page stats captured when a page is retired
typedef struct {
    uint32_t class_idx;
    uint64_t total_allocations;
    uint64_t total_frees;
    uint32_t page_alloc_count;      // Slots on page
    uint32_t free_hit_ratio_bps;    // Free hit rate in basis points (0-10000)
} SmallPageStatsMID_v3;

// Published record: stats plus the page identity and retire time
typedef struct {
    SmallPageStatsMID_v3 stat;
    void *page_ptr;
    uint64_t retire_timestamp;
} SmallPageStatsPublished_MID_v3;

// API
void small_stats_mid_v3_publish(const SmallPageStatsMID_v3 *stat);
const SmallPageStatsPublished_MID_v3* small_stats_mid_v3_latest(void);

#endif

Rationale: Separates stats type from policy to keep Learner input clean
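
The roadmap stores `free_hit_ratio_bps` in basis points (0-10000) but does not define how it is computed. The sketch below assumes it is the share of a page's allocations that were freed before retire; that definition and the name `demo_free_hit_ratio_bps` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: frees as a share of allocations, in basis points
 * (10000 bps = 100%). The result is clamped to the documented 0-10000 range.
 * Note: total_frees * 10000 can overflow for counts above roughly 1.8e15. */
static uint32_t demo_free_hit_ratio_bps(uint64_t total_allocations,
                                        uint64_t total_frees) {
    if (total_allocations == 0) return 0;
    if (total_frees >= total_allocations) return 10000;
    return (uint32_t)((total_frees * 10000u) / total_allocations);
}
```

Basis points keep two decimal places of a percentage in an integer field, so the retire path needs no floating point.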

Task 1.3: smallobject_learner_v2_box.h

// File: core/box/smallobject_learner_v2_box.h [NEW]

#ifndef SMALLOBJECT_LEARNER_V2_BOX_H
#define SMALLOBJECT_LEARNER_V2_BOX_H

#include <stdint.h>

typedef struct {
    uint64_t allocs[8];              // Allocation count per class
    uint32_t retire_ratio_pct[8];    // Retire efficiency per class (%)
    uint64_t avg_page_utilization;   // Global average utilization
    uint32_t free_hit_ratio_bps;     // Global free hit rate (basis points)
    uint64_t eval_count;
    uint64_t sample_count;
} SmallLearnerStatsV2;

// API
void small_learner_v2_record_refill(uint32_t class_idx, uint64_t capacity);
void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps);
void small_learner_v2_evaluate(void);
const SmallLearnerStatsV2* small_learner_v2_stats_snapshot(void);

#endif

Rationale: Extends learner beyond v7 C5-only to multi-dimensional metrics

Task 1.4: smallobject_policy_v2_box.h

// File: core/box/smallobject_policy_v2_box.h [NEW]

#ifndef SMALLOBJECT_POLICY_V2_BOX_H
#define SMALLOBJECT_POLICY_V2_BOX_H

#include <stdint.h>
#include "smallobject_learner_v2_box.h"  // SmallLearnerStatsV2

// Policy v2: Route decision with Learner-driven updates
typedef struct {
    uint8_t route_kind[8];      // Route per class (ULTRA, MID_V3, V7, LEGACY)
    uint32_t policy_version;    // Version for TLS cache invalidation
} SmallPolicyV2;

// API
const SmallPolicyV2* small_policy_v2_snapshot(void);
void small_policy_v2_init_from_env(SmallPolicyV2 *policy);
void small_policy_v2_update_from_learner(
    const SmallLearnerStatsV2 *stats,
    SmallPolicyV2 *policy_out
);

#endif

Rationale: Extends Policy Box to handle expanded Learner inputs
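
The `policy_version` field exists so that each thread can cache the policy and refresh it only when the version changes. A minimal sketch of that pattern follows, assuming a single writer and C11 atomics. All `demo_*` and `Demo*` names are hypothetical. The struct copy is not atomic, so a real implementation would need a seqlock or similar to rule out torn reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t route_kind[8];
    uint32_t policy_version;
} DemoPolicy;

static DemoPolicy g_demo_policy;                 /* shared policy (writer side) */
static _Atomic uint32_t g_demo_policy_version;   /* bumped on every publish */

static _Thread_local DemoPolicy tls_demo_policy; /* per-thread cached copy */
static _Thread_local int tls_demo_valid;         /* 0 until the first refresh */

/* Writer: change one route, then publish by bumping the version. */
static void demo_policy_publish(uint32_t class_idx, uint8_t route) {
    assert(class_idx < 8);
    g_demo_policy.route_kind[class_idx] = route;
    atomic_fetch_add_explicit(&g_demo_policy_version, 1, memory_order_release);
}

/* Reader: one atomic load on the fast path; copy only when stale. */
static const DemoPolicy *demo_policy_get(void) {
    uint32_t v = atomic_load_explicit(&g_demo_policy_version,
                                      memory_order_acquire);
    if (!tls_demo_valid || tls_demo_policy.policy_version != v) {
        memcpy(&tls_demo_policy, &g_demo_policy, sizeof tls_demo_policy);
        tls_demo_policy.policy_version = v;
        tls_demo_valid = 1;
    }
    return &tls_demo_policy;
}
```

The fast path costs one atomic load and one compare, so routing decisions can stay in the allocation hot path without taking a lock.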

Task 1.5: Benchmark Suite Extension

File: core/bench/bench_allocators.c

// Add test cases for Phase v11a
//
// BENCH_C5_C6_C7_MIXED:
//   - Min size: 200B (C5)
//   - Max size: 1000B (C7)
//   - Mixed ratio: 30% C5, 40% C6, 30% C7
//   - Expected perf: 42-48M ops/s (with MID_v3)
//
// BENCH_C7_HEAVY:
//   - Min size: 800B
//   - Max size: 1000B
//   - Expected perf: 35-40M ops/s (vs ULTRA baseline)
//
// BENCH_LEARNER_ROUTE_SWITCH:
//   - Start with C5-heavy (80% C5)
//   - Expect route[5] = V7 initially
//   - Then shift to C6-heavy (80% C6)
//   - Expect route[5] switch to MID_V3
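
The 30% / 40% / 30% mix for BENCH_C5_C6_C7_MIXED could be produced deterministically, as sketched below with the hypothetical helper `demo_mixed_class_for_iter`. This only picks the class per iteration. Mapping a class to a concrete request size depends on the class boundaries, which the roadmap does not list.

```c
#include <assert.h>
#include <stdint.h>

/* Map iteration i to a size class so that every 10 consecutive
 * iterations contain 3x C5, 4x C6 and 3x C7 (30% / 40% / 30%). */
static uint32_t demo_mixed_class_for_iter(uint64_t i) {
    uint64_t slot = i % 10;
    if (slot < 3) return 5;
    if (slot < 7) return 6;
    return 7;
}
```

A fixed schedule makes runs reproducible. A benchmark that wants to avoid pattern effects might shuffle the order with a seeded PRNG instead.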

3. Phase v11a-2: Core Implementation

Task 2.1: SmallSegment_MID_v3 Creation

File: core/smallobject_segment_mid_v3.c

SmallSegment_MID_v3* small_segment_mid_v3_create(void) {
    // Allocate 2MiB segment
    // Initialize 32 x 64KiB pages
    // Set up per-class free stacks
    // Register in RegionIdBox
}

Complexity: Medium

  • Memory layout: 2MiB = 32 pages of 64KiB each
  • Metadata: SmallPageMeta per page
  • Region registration: via RegionIdBox_v7 API (existing)
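
The geometry above (2 MiB = 32 pages of 64 KiB) reduces to simple address arithmetic. The sketch below assumes the segment comes from C11 `aligned_alloc`; the roadmap does not say how the memory is obtained (it could equally be `mmap`), and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SEG_SIZE   (2u * 1024u * 1024u)              /* 2 MiB */
#define DEMO_PAGE_SIZE  (64u * 1024u)                     /* 64 KiB */
#define DEMO_NUM_PAGES  (DEMO_SEG_SIZE / DEMO_PAGE_SIZE)  /* 32 */

/* Reserve one page-aligned segment; returns NULL on failure. */
static void *demo_segment_reserve(void) {
    return aligned_alloc(DEMO_PAGE_SIZE, DEMO_SEG_SIZE);
}

/* Address of page `idx` inside the segment. */
static void *demo_page_addr(void *seg_start, uint32_t idx) {
    assert(idx < DEMO_NUM_PAGES);
    return (char *)seg_start + (size_t)idx * DEMO_PAGE_SIZE;
}

/* Page index of a pointer that lies inside the segment. */
static uint32_t demo_page_index(const void *seg_start, const void *ptr) {
    size_t delta = (size_t)((const char *)ptr - (const char *)seg_start);
    assert(delta < DEMO_SEG_SIZE);
    return (uint32_t)(delta / DEMO_PAGE_SIZE);
}
```

With the page index in hand, the `pages` array in `SmallSegment_MID_v3` can be indexed directly to reach the per-page metadata.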

Task 2.2: Fast Alloc Path for C5/C6/C7

File: core/smallobject_mid_v3.c

Modify existing C5/C6 alloc to support C7:

// Current (v3):
// - TLS fast path: C5/C6 from tls_mid_ctx.page
// - Refill: get page from free stack or allocate

// v11a:
// - TLS fast path: C5/C6/C7 from tls_mid_ctx.page[class_idx]
// - Refill: per-class free stack
// - Retire: record stats with class_idx

Changes:

  • Extend TLS context to support C7
  • Update refill logic for multi-class
  • Add C7 routing in malloc_tiny_fast.h
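
The per-class `page[]` / `offset[]` pair in the TLS context suggests a bump-pointer fast path. The following is a minimal sketch under that assumption, not the existing MID_v3 code; `DemoHeapCtx` and `demo_alloc_fast` are hypothetical, and in real code the context would live in thread-local storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE (64u * 1024u)

typedef struct {
    void *page[8];       /* current page per class (NULL = needs refill) */
    uint32_t offset[8];  /* next free byte within that page */
} DemoHeapCtx;

/* Fast path: carve `size` bytes from the class's current page.
 * Returns NULL when the slow path (refill from the free stack) is needed.
 * Assumes offset[class_idx] never exceeds DEMO_PAGE_SIZE. */
static void *demo_alloc_fast(DemoHeapCtx *ctx, uint32_t class_idx, size_t size) {
    assert(class_idx < 8);
    void *page = ctx->page[class_idx];
    uint32_t off = ctx->offset[class_idx];
    if (page == NULL || size > DEMO_PAGE_SIZE - off) return NULL;
    ctx->offset[class_idx] = off + (uint32_t)size;
    return (char *)page + off;
}
```

Because each class has its own page and offset, adding C7 does not change the C5/C6 path. It only widens the range of valid `class_idx` values.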

Task 2.3: Stats Recording

File: core/smallobject_mid_v3.c

void small_cold_mid_v3_retire_page(
    SmallSegment_MID_v3 *seg,
    uint32_t class_idx,
    void *page
) {
    SmallPageMeta *meta = page_to_meta(page);

    // Record stats
    uint32_t free_hit_ratio_bps = calc_free_hit_ratio(meta);
    SmallPageStatsMID_v3 stat = {
        .class_idx = class_idx,
        .total_allocations = meta->alloc_count,
        .total_frees = meta->free_count,
        .page_alloc_count = meta->capacity,
        .free_hit_ratio_bps = free_hit_ratio_bps
    };

    // Publish to stats system
    small_stats_mid_v3_publish(&stat);

    // Feed to Learner
    small_learner_v2_record_retire(class_idx, free_hit_ratio_bps);

    // Free page (return to free stack or OS)
    ...
}

Key Detail: Must record class_idx for Learner aggregation

Task 2.4: Learner v2 Aggregation

File: core/smallobject_learner_v2.c

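// Defined elsewhere (not shown in this roadmap):
//   LEARNER_EVAL_INTERVAL  - number of retires between evaluations
//   g_policy_v2_version    - global policy version counter

static SmallLearnerStatsV2 g_learner_v2_stats;

void small_learner_v2_record_retire(uint32_t class_idx,
                                    uint32_t free_hit_ratio_bps) {
    if (class_idx >= 8) return;

    // Note: incremented once per retired page, not once per allocation
    g_learner_v2_stats.allocs[class_idx]++;

    // Exponential smoothing (weights 0.9 / 0.1); bps / 100 converts to percent.
    // The +0.5 rounds to nearest; a bare double -> uint32_t store would truncate.
    g_learner_v2_stats.retire_ratio_pct[class_idx] = (uint32_t)(
        (g_learner_v2_stats.retire_ratio_pct[class_idx] * 0.9) +
        (free_hit_ratio_bps / 100.0) * 0.1 + 0.5);

    // Periodic evaluation
    // Note: this plain static counter is not thread-safe as written
    static uint64_t total_retires = 0;
    if (++total_retires % LEARNER_EVAL_INTERVAL == 0) {
        small_learner_v2_evaluate();
    }
}

void small_learner_v2_evaluate(void) {
    // Update global version to invalidate TLS policy cache
    __sync_fetch_and_add(&g_policy_v2_version, 1);

    g_learner_v2_stats.eval_count++;
}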
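
The 0.9 / 0.1 exponential smoothing can also be done without floating point. The sketch below is an integer-only variant offered as an alternative, not the roadmap's code; `demo_ema_pct` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Integer EMA: new = 0.9 * prev + 0.1 * sample, rounded to nearest.
 * prev_pct is a percentage (0-100); sample_bps is in basis points. */
static uint32_t demo_ema_pct(uint32_t prev_pct, uint32_t sample_bps) {
    uint32_t sample_pct = sample_bps / 100;        /* bps -> percent */
    return (prev_pct * 9 + sample_pct + 5) / 10;   /* +5 rounds to nearest */
}
```

For example, a stored value of 50% and a sample of 10000 bps give (450 + 100 + 5) / 10 = 55. With whole-percent resolution the average can stall a few points short of a steady input, so storing the smoothed value in basis points is an option if finer tracking matters.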

Task 2.5: Policy v2 Update

File: core/smallobject_policy_v2.c

void small_policy_v2_update_from_learner(
    const SmallLearnerStatsV2 *stats,
    SmallPolicyV2 *policy_out
) {
    if (!stats || !policy_out) return;

    // C5 decision (Phase v11a: same logic as v7)
    uint64_t total_allocs = 0;
    for (int i = 0; i < 8; i++) {
        total_allocs += stats->allocs[i];
    }

    if (total_allocs > 0) {
        uint64_t c5_ratio_pct = (stats->allocs[5] * 100) / total_allocs;

        if (c5_ratio_pct >= 30) {
            policy_out->route_kind[5] = SMALL_ROUTE_V7;
        } else {
            policy_out->route_kind[5] = SMALL_ROUTE_MID_V3;
        }
    }

    // Future (Phase v11b): Multi-dimensional decisions
    // if (retire_ratio[5] < 50% && free_hit < 7000bps) → LEGACY
    // etc.
}
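
The C5 decision above uses integer division, so the 30% threshold is applied to a truncated percentage. The standalone sketch below isolates that rule for illustration. The enum values and `demo_c5_route` are hypothetical, and -1 stands for "no samples, leave the route unchanged".

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_ROUTE_MID_V3 = 1, DEMO_ROUTE_V7 = 2 };  /* hypothetical values */

/* Returns the route for C5, or -1 when there are no samples yet. */
static int demo_c5_route(const uint64_t allocs[8]) {
    uint64_t total = 0;
    for (int i = 0; i < 8; i++) total += allocs[i];
    if (total == 0) return -1;

    /* Truncating division: 29.9% counts as 29% and stays below the threshold. */
    uint64_t c5_ratio_pct = (allocs[5] * 100) / total;
    return (c5_ratio_pct >= 30) ? DEMO_ROUTE_V7 : DEMO_ROUTE_MID_V3;
}
```

With 299 C5 samples out of 1000 the ratio truncates to 29% and the route is MID_V3; with 300 out of 1000 it reaches 30% and the route is V7.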

4. Phase v11a-3: Integration & Testing

Task 3.1: C7 Routing in malloc_tiny_fast.h

File: core/front/malloc_tiny_fast.h

Modify alloc switch statement:

// Current (v10):
// case TINY_ROUTE_SMALL_HEAP_V7: return small_heap_alloc_v7(...);
// case TINY_ROUTE_SMALL_HEAP_MID_V3: return small_heap_alloc_mid_v3(...);

// v11a:
// Add support for C7 routing to MID_v3
switch (policy->route_kind[class_idx]) {
    case SMALL_ROUTE_ULTRA:
        return ULTRA_alloc(...);
    case SMALL_ROUTE_MID_V3:
        return small_heap_alloc_mid_v3(class_idx, size);  // ← v11a: supports C7
    case SMALL_ROUTE_V7:
        return small_heap_alloc_v7(class_idx, size);
    case SMALL_ROUTE_LEGACY:
        return legacy_alloc(...);
}

Task 3.2: Free Path C7 Support

File: core/front/malloc_tiny_fast.h

// v11a: Allow C7 free to route to MID_v3
if (SMALL_MID_V3_CLASS_SUPPORTED(class_idx)) {
    if (policy->route_kind[class_idx] == SMALL_ROUTE_MID_V3) {
        small_heap_free_mid_v3(ptr, class_idx);
        return;
    }
}
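
`SMALL_MID_V3_CLASS_SUPPORTED` is used above but not defined in this roadmap. One plausible shape is a bitmask test in which bit i selects class Ci, the convention that `HAKMEM_SMALL_HEAP_V7_CLASSES=0x60` (C5/C6) suggests. The helper below is a hypothetical sketch under that assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if class_idx is enabled in `mask`, where bit i selects class Ci. */
static int demo_class_in_mask(uint32_t mask, uint32_t class_idx) {
    return class_idx < 8 && ((mask >> class_idx) & 1u);
}
```

Under this convention 0x60 covers C5 and C6, and 0xE0 covers C5, C6 and C7.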

Task 3.3: Integration Tests

File: core/test/test_mid_v3_c7.c [NEW]

void test_mid_v3_c7_alloc_free(void) {
    // Test C7 allocation and free through MID_v3
    // Expected: successful alloc/free without segfault
    // Verify: Policy routing is correct
    // Verify: Learner stats are recorded
}

void test_learner_v2_route_switch(void) {
    // Allocate C5-heavy workload
    // Verify: route[5] = V7
    // Switch to C6-heavy workload
    // Verify: route[5] switches to MID_V3
    // Check stderr: "[LEARNER_V2] C5 route switch: V7 → MID_V3"
}

void test_mid_v3_perf_c5_c6_c7_mixed(void) {
    // Performance baseline for C5/C6/C7 mixed
    // Expected: 42-48M ops/s
    // Verify: no regression vs v7 research preset
}

Task 3.4: Regression Testing

Ensure:

  • v7 research preset (C5/C6 + Learner) still works
  • Mixed profile (16-1024B, v7 OFF) unchanged
  • ULTRA (C4-C7) unchanged
  • Legacy fallback unchanged

5. Build & Compilation

Makefile Changes

# Add new object files to HAKMEM_OBJS
HAKMEM_OBJS += \
    core/smallobject_segment_mid_v3.o \
    core/smallobject_learner_v2.o \
    core/smallobject_policy_v2.o

# Add new box headers to HEADERS
HEADERS += \
    core/box/smallobject_segment_mid_v3_box.h \
    core/box/smallobject_stats_mid_v3_box.h \
    core/box/smallobject_learner_v2_box.h \
    core/box/smallobject_policy_v2_box.h

6. Testing Commands

Benchmark Suite (after Phase v11a-2)

# C5/C6/C7 mixed (expected MID_v3 preferred)
# 0xE0 sets bits 5-7 (C5, C6, C7), assuming bit i selects class Ci as in 0x60 = C5/C6 below
HAKMEM_SMALL_HEAP_V7_ENABLED=0 \
HAKMEM_MID_V3_ENABLED=1 \
HAKMEM_MID_V3_CLASSES=0xE0 \
./bench_allocators bench_c5_c6_c7_mixed 300000

# C7 heavy (expected MID_v3 performance)
HAKMEM_SMALL_HEAP_V7_ENABLED=0 \
HAKMEM_MID_V3_ENABLED=1 \
./bench_allocators bench_c7_heavy 200000

# Learner route switch verification
HAKMEM_SMALL_HEAP_V7_ENABLED=1 \
HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 \
HAKMEM_MID_V3_ENABLED=1 \
./bench_allocators bench_learner_route_switch 500000

Expected Output

[POLICY_V2_INIT] Route assignments:
  C0: LEGACY
  C1: LEGACY
  C2: LEGACY
  C3: LEGACY
  C4: ULTRA
  C5: MID_V3
  C6: MID_V3
  C7: MID_V3

[LEARNER_V2] eval_count=1, C5_ratio=28%, retire_ratio[5]=92%

C5/C6/C7 mixed (300K iter): 44.2M ops/s ✓ (+4% vs baseline)

7. Dependency Graph

smallobject_segment_mid_v3_box.h
  ↓
smallobject_segment_mid_v3.c
  ↓ calls
smallobject_stats_mid_v3.c
  ↓ publishes to
smallobject_learner_v2.c
  ↓ feeds to
smallobject_policy_v2.c
  ↓ updates
malloc_tiny_fast.h (routing)

Recommended implementation order:

  1. smallobject_segment_mid_v3_box.h / smallobject_segment_mid_v3.c (foundation)
  2. smallobject_stats_mid_v3_box.h (simple type def)
  3. smallobject_mid_v3.c changes (core alloc/free)
  4. smallobject_learner_v2_box.h / smallobject_learner_v2.c (stats aggregation)
  5. smallobject_policy_v2_box.h / smallobject_policy_v2.c (learner integration)
  6. malloc_tiny_fast.h (routing)
  7. Tests & benchmarks

Document Date: 2025-12-12
Phase: v11a-1 (Design & Infrastructure)
Status: Ready for Task 1.1-1.5 implementation
Next Review: After Phase v11a-1 completion