hakmem/docs/analysis/MID_POOL_V3_DESIGN.md
Moe Charm (CI) a8d0ab06fc MID-V3: Specialize to 257-768B, exclude C7 (ULTRA handles 1KB)
Role separation based on ultrathink analysis:
- MID v3: 257-768B only (C6 only, HAKMEM_MID_V3_CLASSES=0x40)
- C7 ULTRA: 769-1024B only (existing optimized path)

Changes:
- core/box/hak_alloc_api.inc.h: Remove C7 route, restrict to 257-768B
- core/box/mid_hotbox_v3_env_box.h: Update ENV comments
- docs/analysis/MID_POOL_V3_DESIGN.md: Add performance results & role
- CURRENT_TASK.md: Document MID-V3 completion & role separation

Verified:
- 257-768B with v3 ON: 1,199,526 ops/s (+1.7% vs baseline)
- 769-1024B with v3 ON: 1,181,254 ops/s (same as baseline, C7 excluded)
- C7 correctly routes to ULTRA instead of MID v3

Rationale: C7-only showed -11% regression, but C6/mixed showed +11-19%
improvement. Specializing to mid-range (257-768B) leverages v3 strengths
while keeping C7 on the proven ULTRA path.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 01:14:13 +09:00


MID_POOL_V3 Design Document

Overview

Mid/Pool v3 is a next-generation architecture that evolves the existing SmallObject v4 (MF2) and integrates an O(1) ptr→page_meta lookup via RegionIdBox.

Role separation: MID v3 handles 257-768B only; C7 ULTRA handles 769-1024B.
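
The role separation above can be sketched as a size-based dispatch. This is a minimal illustration only: `route_t`, `route_for_size`, and the `mid_v3_enabled` flag are hypothetical names, not the actual routing code in core/box/hak_alloc_api.inc.h, and the handling of sizes outside 257-1024B is left abstract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route tags used only for this sketch. */
typedef enum {
    ROUTE_OTHER = 0,  /* any path other than MID v3 / C7 ULTRA */
    ROUTE_MID_V3,     /* C6: 257-768B */
    ROUTE_C7_ULTRA    /* C7: 769-1024B */
} route_t;

/* Pick a route from the request size. When MID v3 is disabled, the
 * 257-768B range falls back to whatever the baseline path is. */
static route_t route_for_size(size_t size, int mid_v3_enabled) {
    if (size >= 257 && size <= 768) {
        return mid_v3_enabled ? ROUTE_MID_V3 : ROUTE_OTHER;
    }
    if (size >= 769 && size <= 1024) {
        return ROUTE_C7_ULTRA;  /* never routed to MID v3 */
    }
    return ROUTE_OTHER;
}
```

In this sketch the C7 range is never routed to MID v3, regardless of the enable flag, which mirrors the exclusion described above.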

Phase Plan

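| Phase | Content | Depends on |
|---|---|---|
| MID-V3-0 | Design doc (this document) | - |
| MID-V3-1 | Type skeletons + ENV | MID-V3-0 |
| MID-V3-2 | Complete RegionIdBox Registration API (V6-HDR-2) | MID-V3-1 |
| MID-V3-3 | RegionId integration (page registration at carve) | MID-V3-2 |
| MID-V3-4 | Allocation fast path implementation | MID-V3-3 |
| MID-V3-5 | Free/cold path implementation | MID-V3-4 |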

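Design issue: Lane vs Page dual-management problem

Problem (raised in Task Review)

The original design assumed that both Lane and Page would manage a freelist. However, the existing v4 MF2 already has a working per-page freelist, so adding a Lane on top would duplicate management responsibility.

Existing v4 MF2 structure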

// core/smallobject_hotbox_v4.c
typedef struct small_page_v4 {
    uint8_t class_idx;
    uint16_t capacity;
    uint16_t used;
    uint32_t block_size;
    uint8_t* base;
    void* freelist;          // ← Per-page freelist
    void* slab_ref;
    void* segment;
    struct small_page_v4* next;
    uint16_t flags;
} small_page_v4;

typedef struct small_class_heap_v4 {
    small_page_v4* current;      // Current working page
    small_page_v4* partial_head; // Partial pages list
    uint32_t partial_count;
    small_page_v4* full_head;    // Full pages list
} small_class_heap_v4;

Solution: Lane = Page Index Cache

Rather than treating a Lane as an independent freelist-management unit, redefine it as a TLS index/cache that points at the page the thread is currently working on.

┌──────────────────────────────────────────────────────┐
│                   MidHotBoxV3 (L0 TLS)               │
│  ┌─────────────────────────────────────────────────┐ │
│  │ lane[class] = { page_idx, freelist_cache }     │ │
│  │   ↓                                             │ │
│  │ page_idx → MidPageDesc (via RegionIdBox)       │ │
│  └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
                         ↓ refill/retire
┌──────────────────────────────────────────────────────┐
│                  MidColdIfaceV3 (L1)                 │
│  - page carve (segment → page)                       │
│  - page return (page → segment)                      │
│  - RegionIdBox registration                          │
└──────────────────────────────────────────────────────┘

Lane structure (Revised)

typedef struct MidLaneV3 {
    uint32_t page_idx;        // Current working page index
    void*    freelist_head;   // TLS-local freelist snapshot (fast path)
    uint32_t freelist_count;  // Remaining items in freelist
    // Note: the authoritative freelist lives in MidPageDescV3;
    //       the lane only acts as a TLS cache.
} MidLaneV3;

Page structure

typedef struct MidPageDescV3 {
    uint8_t*  base;           // Page base address
    uint32_t  capacity;       // Total slots
    uint32_t  used;           // Used count
    void*     freelist;       // Actual freelist (authoritative)
    uint32_t  region_id;      // RegionIdBox registration ID
    uint8_t   class_idx;      // Size class
    uint8_t   flags;          // Page state flags
} MidPageDescV3;

RegionIdBox integration

Current state (V6-HDR-1)

region_id_register_v6() is currently a stub:

// core/region_id_v6.c:202
uint32_t region_id_register_v6(void* base, size_t size, region_kind_t kind, void* metadata) {
    (void)base;
    (void)size;
    (void)kind;
    (void)metadata;
    return 1;  // Single region for now
}

V6-HDR-2: Completing the Registration API (MID-V3-2)

// Required features:
// 1. Region entry array (fixed-size or dynamic)
// 2. ptr → region_entry lookup (radix tree or sorted array)
// 3. Thread-safe registration/unregistration

typedef struct RegionEntry {
    uintptr_t base;
    uintptr_t end;
    region_kind_t kind;
    void* metadata;       // MidPageDescV3* for SMALL_V4
    uint32_t id;
} RegionEntry;

// API
uint32_t region_id_register_v6(void* base, size_t size, region_kind_t kind, void* metadata);
void region_id_unregister_v6(uint32_t region_id);
RegionLookupV6 region_id_lookup_v6(void* ptr);
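
Of the two lookup options listed above, the sorted-array variant can be sketched as a binary search over non-overlapping `[base, end)` ranges. This is an illustration under assumptions: `region_find_sorted` is a hypothetical helper, the `region_kind_t` enum is a stand-in for the real one, and thread safety is ignored. Note that this lookup is O(log n); the O(1) behaviour mentioned in this document would have to come from a cache in front of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real region_kind_t, which is defined elsewhere. */
typedef enum { REGION_KIND_NONE = 0, REGION_KIND_MID_V3 = 1 } region_kind_t;

typedef struct RegionEntry {
    uintptr_t base;      /* inclusive start */
    uintptr_t end;       /* exclusive end */
    region_kind_t kind;
    void* metadata;
    uint32_t id;
} RegionEntry;

/* Binary search over entries sorted by base with disjoint ranges.
 * Returns the entry containing ptr, or NULL if no region matches. */
static const RegionEntry* region_find_sorted(const RegionEntry* tab,
                                             size_t n, const void* ptr) {
    uintptr_t p = (uintptr_t)ptr;
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p < tab[mid].base) {
            hi = mid;            /* ptr lies below this region */
        } else if (p >= tab[mid].end) {
            lo = mid + 1;        /* ptr lies above this region */
        } else {
            return &tab[mid];    /* base <= ptr < end */
        }
    }
    return NULL;
}
```

A sorted array keeps lookups cache-friendly for a small number of regions, at the cost of O(n) insertion; a radix tree trades memory for constant-depth lookup.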

MID-V3-3: Page Registration at Carve

// Register the page with RegionIdBox when it is carved
MidPageDescV3* mid_cold_v3_carve_page(MidSegmentV3* seg, int class_idx) {
    MidPageDescV3* page = /* ... carve from segment ... */;

    // Register with RegionIdBox
    page->region_id = region_id_register_v6(
        page->base,
        (size_t)page->capacity * stride_for_class(class_idx),
        REGION_KIND_MID_V3,  // mid_hot_v3_free() checks this kind; REGION_KIND_SMALL_V4 would not match
        page
    );

    return page;
}

// Unregister the page when it is returned
void mid_cold_v3_return_page(MidPageDescV3* page) {
    region_id_unregister_v6(page->region_id);
    /* ... return to segment ... */
}
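
The register/unregister pairing above can be illustrated with a fixed-size slot table. This is a sketch under assumptions: all `*_sketch` names and the capacity of 64 are hypothetical, ID 0 is treated as "invalid" (consistent with the stub returning 1), and no locking is shown even though the real API is required to be thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_SLOTS_SKETCH 64u  /* hypothetical capacity */

typedef struct {
    uintptr_t base, end;
    void* metadata;
    int in_use;
} RegionSlotSketch;

static RegionSlotSketch g_region_slots[REGION_SLOTS_SKETCH];

/* Claim the first free slot. Returns slot index + 1, or 0 if the table is full. */
static uint32_t region_register_sketch(void* base, size_t size, void* metadata) {
    for (uint32_t i = 0; i < REGION_SLOTS_SKETCH; i++) {
        if (!g_region_slots[i].in_use) {
            g_region_slots[i].base = (uintptr_t)base;
            g_region_slots[i].end = (uintptr_t)base + size;
            g_region_slots[i].metadata = metadata;
            g_region_slots[i].in_use = 1;
            return i + 1;
        }
    }
    return 0;
}

/* Release a slot so a later carve can reuse its ID. */
static void region_unregister_sketch(uint32_t id) {
    if (id == 0 || id > REGION_SLOTS_SKETCH) return;
    g_region_slots[id - 1].in_use = 0;
}

/* Map an ID back to its metadata; NULL for invalid or released IDs. */
static void* region_metadata_sketch(uint32_t id) {
    if (id == 0 || id > REGION_SLOTS_SKETCH) return NULL;
    if (!g_region_slots[id - 1].in_use) return NULL;
    return g_region_slots[id - 1].metadata;
}
```

Because IDs are reused after unregistration in this sketch, a stale `region_id` held elsewhere could resolve to a different page; a real implementation may need a generation counter or similar guard.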

Allocation Fast Path (MID-V3-4)

static void* mid_hot_v3_alloc_slow(MidHotBoxV3* hot, int class_idx);  // defined below

void* mid_hot_v3_alloc(MidHotBoxV3* hot, int class_idx) {
    MidLaneV3* lane = &hot->lanes[class_idx];

    // L0: TLS freelist cache hit
    if (likely(lane->freelist_head)) {
        void* blk = lane->freelist_head;
        lane->freelist_head = *(void**)blk;
        lane->freelist_count--;
        return blk;
    }

    // L0 miss: Refill from page or cold path
    return mid_hot_v3_alloc_slow(hot, class_idx);
}

static void* mid_hot_v3_alloc_slow(MidHotBoxV3* hot, int class_idx) {
    MidLaneV3* lane = &hot->lanes[class_idx];

    // Try to refill from current page
    if (lane->page_idx != 0) {
        MidPageDescV3* page = mid_page_from_idx(lane->page_idx);
        if (page && page->freelist) {
            // Batch transfer from page to lane
            mid_lane_refill_from_page(lane, page);
            return mid_hot_v3_alloc(hot, class_idx);
        }
    }

    // Cold path: Get new page
    MidPageDescV3* new_page = mid_cold_v3_refill_page(hot, class_idx);
    if (!new_page) return NULL;

    lane->page_idx = mid_page_to_idx(new_page);
    mid_lane_refill_from_page(lane, new_page);
    return mid_hot_v3_alloc(hot, class_idx);
}
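
The slow path above relies on `mid_lane_refill_from_page()`, which this document does not define. One possible shape of that batch transfer is sketched below, assuming an intrusive singly linked freelist (the first word of each free block stores the next pointer, as in the fast path). `PageSketch`, `LaneSketch`, and `lane_refill_from_page_sketch` are hypothetical names, and the batch size would presumably come from HAKMEM_MID_V3_LANE_BATCH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for MidPageDescV3 / MidLaneV3 (only the fields used here). */
typedef struct { void* freelist; } PageSketch;
typedef struct { void* freelist_head; uint32_t freelist_count; } LaneSketch;

/* Detach up to `batch` blocks from the front of the page freelist and
 * splice them onto the front of the lane freelist. Returns blocks moved. */
static uint32_t lane_refill_from_page_sketch(LaneSketch* lane,
                                             PageSketch* page,
                                             uint32_t batch) {
    void* head = page->freelist;
    if (head == NULL || batch == 0) return 0;

    /* Walk to the last block of the chain being transferred. */
    void* tail = head;
    uint32_t n = 1;
    while (n < batch && *(void**)tail != NULL) {
        tail = *(void**)tail;
        n++;
    }

    page->freelist = *(void**)tail;       /* page keeps the remainder */
    *(void**)tail = lane->freelist_head;  /* chain ends at the old lane head */
    lane->freelist_head = head;
    lane->freelist_count += n;
    return n;
}
```

Since the slow path only calls the refill when the page freelist is non-empty, at least one block moves and the recursive call into `mid_hot_v3_alloc()` then hits the fast path.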

Free Path (MID-V3-5)

void mid_hot_v3_free(void* ptr) {
    // RegionIdBox lookup (O(1) via TLS cache)
    RegionLookupV6 lk = region_id_lookup_cached_v6(ptr);

    if (lk.kind != REGION_KIND_MID_V3) {
        // Not a MID v3 allocation: delegation to the owning allocator is elided here
        return;
    }

    MidPageDescV3* page = (MidPageDescV3*)lk.page_meta;

    // Check if local thread owns this page
    MidHotBoxV3* hot = mid_hot_box_v3_get();
    MidLaneV3* lane = &hot->lanes[page->class_idx];

    if (lane->page_idx == mid_page_to_idx(page)) {
        // Local page: direct push to lane freelist
        *(void**)ptr = lane->freelist_head;
        lane->freelist_head = ptr;
        lane->freelist_count++;
        return;
    }

    // Remote page: push to page freelist (atomic if needed)
    mid_page_push_free(page, ptr);
}
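
For the remote case, the comment "atomic if needed" can be made concrete with a lock-free push onto the page freelist. The following is a sketch, not the actual `mid_page_push_free()`: it assumes the page freelist is declared as a C11 atomic pointer, and `RemotePageSketch` and `page_push_free_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for a page descriptor whose freelist may be pushed to
 * by threads that do not own the page. */
typedef struct { _Atomic(void*) freelist; } RemotePageSketch;

/* Push blk onto the page freelist with a compare-and-swap loop. */
static void page_push_free_sketch(RemotePageSketch* page, void* blk) {
    void* old = atomic_load_explicit(&page->freelist, memory_order_relaxed);
    do {
        *(void**)blk = old;  /* link the block to the head we observed */
    } while (!atomic_compare_exchange_weak_explicit(
                 &page->freelist, &old, blk,
                 memory_order_release,    /* publish the link on success */
                 memory_order_relaxed));  /* on failure, old is reloaded */
}
```

A push-only CAS loop is not exposed to the ABA problem; the owning thread could later take the whole list in one step with `atomic_exchange(&page->freelist, NULL)` instead of popping blocks one at a time.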

ENV Controls

HAKMEM_MID_V3_ENABLED=1      # Enable MID v3 (default: 0)
HAKMEM_MID_V3_CLASSES=0x40   # Class bitmask (default: 0, recommended: 0x40 = C6 only)
                             # NOTE: C7 (0x80) NOT recommended - use C7 ULTRA instead
HAKMEM_MID_V3_DEBUG=1        # Debug logging (default: 0)
HAKMEM_MID_V3_LANE_BATCH=16  # Lane refill batch size (default: 16)
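
The class bitmask above maps bit i to class Ci (0x40 = C6, 0x80 = C7). A minimal parsing sketch follows; `parse_class_mask`, `class_enabled`, and `mid_v3_class_mask_from_env` are hypothetical helpers, not the actual code in core/box/mid_hotbox_v3_env_box.h, and treating malformed input as the default 0 is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a class bitmask string such as "0x40" or "64".
 * Unset, empty, or malformed input yields 0 (the documented default). */
static uint32_t parse_class_mask(const char* s) {
    if (s == NULL || *s == '\0') return 0;
    char* end = NULL;
    unsigned long v = strtoul(s, &end, 0);  /* base 0 accepts hex and decimal */
    if (end == s || *end != '\0') return 0; /* reject trailing garbage */
    return (uint32_t)v;
}

/* Bit i of the mask enables class Ci. */
static int class_enabled(uint32_t mask, int class_idx) {
    return class_idx >= 0 && class_idx < 32 && ((mask >> class_idx) & 1u);
}

/* Read the mask from the environment variable named in this document. */
static uint32_t mid_v3_class_mask_from_env(void) {
    return parse_class_mask(getenv("HAKMEM_MID_V3_CLASSES"));
}
```

With the recommended value 0x40, `class_enabled(mask, 6)` is true and `class_enabled(mask, 7)` is false, so C7 requests stay on the ULTRA path.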

Performance Results

| Workload | Baseline (ops/s) | MID v3 ON (ops/s) | Improvement |
|---|---|---|---|
| C6 (257-768B) | 1,043,379 | 1,159,390 | +11.1% |
| Mixed (257-768B) | 976,057 | 1,169,648 | +19.8% |

Note: C7 (769-1024B) is intentionally excluded from MID v3 and handled by C7 ULTRA, which shows better performance for 1KB allocations.

Checklist

  • MID-V3-1: Type skeletons + ENV

    • MidHotBoxV3 structure
    • MidLaneV3 structure
    • MidPageDescV3 structure
    • MidColdIfaceV3 interface
    • ENV parsing
  • MID-V3-2: RegionIdBox Registration API (V6-HDR-2)

    • RegionEntry structure
    • region_id_register_v6() implementation
    • region_id_unregister_v6() implementation
    • Lookup integration (ptr → page_meta)
  • MID-V3-3: RegionId integration

    • Page carve time registration
    • Page return time unregistration
    • TLS segment auto-registration
  • MID-V3-4: Allocation fast path

    • Lane freelist fast path
    • Page refill slow path
    • Cold refill integration
  • MID-V3-5: Free/cold path

    • RegionIdBox lookup in free
    • Local page fast free
    • Remote page handling

Reference: relationship to existing code

| Existing | v3 counterpart |
|---|---|
| smallobject_hotbox_v4.c | mid_hotbox_v3.c (new) |
| small_page_v4 | MidPageDescV3 |
| small_class_heap_v4 | MidLaneV3 |
| cold_refill_page_v4() | mid_cold_v3_refill_page() |
| cold_retire_page_v4() | mid_cold_v3_retire_page() |
| region_id_v6.c | RegionIdBox API extension |