
Phase 20.7: Collections in Hakorune - Implementation Plan

Duration: 8 weeks (2026-05-25 → 2026-07-19)
Prerequisite: Phase 20.6 (VM Core Complete + Dispatch Unification)
Completes: Phase D (Collections - MapBox/ArrayBox in Pure Hakorune)


Executive Summary

Phase 20.7 rolls out the C-ABI bridge for the Basic boxes (String/Array/Map) so that the Rust VM and the Hako VM call the same implementation. The default profile uses dynamic plugins; the embedded/WASM profile uses builtin plugins (static link + LTO). Pure Hakorune implementations remain available for research but are not the default execution path.

Key Achievements:

  • String/Array/Map via C-ABI (plugins) with identical behavior across the Rust and Hako VMs
  • Builtin plugin preset (WASM/embedded) yields zero-cost inlining via Two-Layer Export
  • Golden tests proving Rust vs Hako parity for collections (size/push/get/set/slice/indexOf/substring)
  • ABI-path performance ≥ 70% of the Rust runtime

Critical Principles:

  • C-ABI SSOT with Two-Layer Export: Stable ABI / Inlinable API
  • Deterministic Behavior: Key order is predictable (Symbol < Int < String)
  • ValueBox/DataBox Boundaries: ValueBox is temporary (pipeline only), DataBox is persistent
  • Fail-Fast: Unknown methods/capability violations → RuntimeError (no silent fallbacks)

C-ABI Bridge Rollout (String/Array/Map)

Flags & Presets

  • NYASH_ABI_BASIC_ON=1 … Enable the ABI path for Basic boxes (default: OFF; turned ON in stages: quick → integration)
  • NYASH_ABI_BUILTIN_LINK=1 … Builtin plugins (LTO/inline) for the embedded/WASM preset
  • NYASH_NORMALIZE_CORE_EXTERN=1 … Gate that makes the Builder/VM force-normalize String/Array/Map calls to Extern("nyrt.*"). Default ON (set to 0 to roll back to the BoxCall/Plugin path).
  • NYASH_VERIFY_CORE_EXTERN=1 … Strict verifier: Fail-Fast if any Method/ModuleFunction forms remain (for local observation; default OFF)
  • Presets: default (plugins), embedded (builtin), research (pure)
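The following is a minimal Rust sketch of how a runtime might resolve the four flags above into a single config value. The names `AbiFlags`, `flag`, `load_config`, and `load_from_process` are hypothetical. The flag names and the stated defaults come from the list above. Two choices are assumptions made in the Fail-Fast spirit: the OFF default for `NYASH_ABI_BUILTIN_LINK`, and rejecting any value other than `0`/`1`. The environment is passed in as a map so the logic stays testable.

```rust
use std::collections::HashMap;

/// Resolved state of the Basic-box ABI flags (hypothetical struct).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AbiFlags {
    pub abi_basic_on: bool,
    pub builtin_link: bool,
    pub normalize_core_extern: bool,
    pub verify_core_extern: bool,
}

/// Parses a 0/1 flag. An unset flag falls back to `default`; any other
/// value is rejected (Fail-Fast) instead of being silently ignored.
fn flag(env: &HashMap<String, String>, name: &str, default: bool) -> Result<bool, String> {
    match env.get(name).map(String::as_str) {
        None => Ok(default),
        Some("1") => Ok(true),
        Some("0") => Ok(false),
        Some(other) => Err(format!("{name}: expected 0 or 1, got {other:?}")),
    }
}

pub fn load_config(env: &HashMap<String, String>) -> Result<AbiFlags, String> {
    Ok(AbiFlags {
        abi_basic_on: flag(env, "NYASH_ABI_BASIC_ON", false)?,
        builtin_link: flag(env, "NYASH_ABI_BUILTIN_LINK", false)?, // default assumed OFF
        normalize_core_extern: flag(env, "NYASH_NORMALIZE_CORE_EXTERN", true)?,
        verify_core_extern: flag(env, "NYASH_VERIFY_CORE_EXTERN", false)?,
    })
}

/// Reads the real process environment.
pub fn load_from_process() -> Result<AbiFlags, String> {
    let env: HashMap<String, String> = std::env::vars().collect();
    load_config(&env)
}
```

A typo such as `NYASH_ABI_BASIC_ON=yes` therefore stops startup with an error rather than quietly leaving the ABI path off.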

Acceptance Criteria

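  • Identical output for collection operations (size/get/set/push/slice/indexOf/substring) across VM / LLVM / Hako (nyvm)
  • NYASH_ABI_BASIC_ON=1 is turned ON in stages (quick → integration) with all tests green
  • Performance: the ABI path reaches ≥ 70% of the Rust implementation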

Rollback

  • Setting NYASH_ABI_BASIC_ON=0 returns to the existing path at any time (a safety net by design)

Pilot Rollout Plan (Fine-Grained Staged Enablement)

Purpose

  • To minimize risk, enable features one unit at a time in stages (pilot → observe → roll out more broadly).

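Week-by-Week (Guideline)

  • Week 1 (quick only): enable String.length via ABI (NYASH_ABI_BASIC_ON=1) → observe for 48 h
  • Week 2 (quick only): additionally enable Array.size/push → observe
  • Week 3-4 (quick only): additionally enable Map.set/get → observe
  • Week 5: extend ABI ON from quick to integration (String/Array/Map)
  • Week 6-7: performance tuning and stronger diagnostics (target ≥ 70%)
  • Week 8: decide whether to make it the default (if problems arise, revert immediately with NYASH_ABI_BASIC_ON=0)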
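The staged plan can be expressed as a per-method allowlist placed behind the master switch. This Rust sketch is one possible shape for that gate, not the actual dispatch code. `Stage`, `PILOT`, and `use_abi_path` are hypothetical names. Weeks 3-4 are folded into a single stage because they enable the same methods.

```rust
/// Pilot stages from the week-by-week plan (Weeks 3-4 share one stage).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    Week1,
    Week2,
    Week3To4,
}

/// (box, method, first stage that routes the call through the ABI)
const PILOT: &[(&str, &str, Stage)] = &[
    ("String", "length", Stage::Week1),
    ("Array", "size", Stage::Week2),
    ("Array", "push", Stage::Week2),
    ("Map", "set", Stage::Week3To4),
    ("Map", "get", Stage::Week3To4),
];

/// True only if the master switch (NYASH_ABI_BASIC_ON) is on and the method
/// was enabled at or before `stage`; every other call stays on the existing path.
pub fn use_abi_path(abi_basic_on: bool, stage: Stage, box_name: &str, method: &str) -> bool {
    abi_basic_on
        && PILOT
            .iter()
            .any(|&(b, m, since)| b == box_name && m == method && since <= stage)
}
```

Setting the master switch back to 0 disables every entry at once, which matches the rollback rule.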

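Diagnostics / Observation

  • NYASH_ABI_BASIC_DIAG=1 logs each ABI-path call as a single line (for dev/CI observation; default OFF)
  • A leak guard (test-only) detects missing retain/release (1000× new/free → total handle count = 0)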
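A leak guard like the one described above can be as small as a reference-counted handle table. In this Rust sketch, `HandleTable`, `new_handle`, `retain`, `release`, and `live_handles` are hypothetical names. The sketch assumes that a new handle starts with one reference and that releasing the last reference frees it. The test condition is then simply "no live handles remain".

```rust
use std::collections::HashMap;

/// Test-only table mapping each live handle to its reference count.
#[derive(Default)]
pub struct HandleTable {
    next: u64,
    refs: HashMap<u64, u32>,
}

impl HandleTable {
    /// Allocates a handle holding one reference.
    pub fn new_handle(&mut self) -> u64 {
        self.next += 1;
        self.refs.insert(self.next, 1);
        self.next
    }

    pub fn retain(&mut self, h: u64) {
        *self.refs.get_mut(&h).expect("retain on freed handle") += 1;
    }

    /// Drops one reference and frees the handle when the count reaches zero.
    pub fn release(&mut self, h: u64) {
        let count = self.refs.get_mut(&h).expect("release on freed handle");
        *count -= 1;
        if *count == 0 {
            self.refs.remove(&h);
        }
    }

    /// Number of handles still alive; a leak-free run ends at 0.
    pub fn live_handles(&self) -> usize {
        self.refs.len()
    }
}
```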

CI Matrix (Minimal)

  • Added: one quick job with ABI ON, covering only the targets enabled so far (String/Array/Map)
  • Existing: keep quick/integration with ABI OFF for regression monitoring

Week 1-4: MapBox Implementation

Overview

MapBox is a hash map storing key-value pairs with deterministic iteration order.

Core Methods

box MapBox {
    // Storage
    _buckets: ArrayBox      // Internal bucket storage
    _size: IntegerBox       // Element count
    _load_factor: FloatBox  // For resizing

    // Public API
    set(key: DataBox, value: DataBox) -> NullBox
    get(key: DataBox) -> DataBox | NullBox
    has(key: DataBox) -> BoolBox
    remove(key: DataBox) -> DataBox | NullBox
    size() -> IntegerBox
    keys() -> ArrayBox      // Returns keys in deterministic order
    values() -> ArrayBox    // Returns values in deterministic order
}

Week 1: Foundation

Deliverables:

  1. Data Structures:

    • Bucket storage design (chaining or open addressing)
    • Hash function for Symbol/Int/String
    • Collision resolution strategy
  2. Core Operations:

    • set(key, value) implementation
    • get(key) implementation
    • Basic resizing logic (load factor > 0.75 → double capacity)
  3. Tests:

    • Basic set/get operations (10 test cases)
    • Hash collision handling (5 test cases)
    • Resizing behavior (3 test cases)
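One way to meet the Week 1 deliverables is separate chaining. The Rust sketch below is illustrative only. Three details are assumptions: the name `ChainedMap`, the initial capacity of 8, and passing the hash function in as a `fn` pointer (Week 2 supplies the real hash strategy). An existing key is overwritten in place. Once the load factor exceeds 0.75, the bucket count doubles and every entry is rehashed.

```rust
/// Separate-chaining map: each bucket is a Vec of (key, value) pairs.
pub struct ChainedMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
    size: usize,
    hash: fn(&K) -> u64,
}

impl<K: PartialEq, V> ChainedMap<K, V> {
    const INITIAL_CAPACITY: usize = 8;
    const MAX_LOAD: f64 = 0.75;

    pub fn new(hash: fn(&K) -> u64) -> Self {
        let buckets = (0..Self::INITIAL_CAPACITY).map(|_| Vec::new()).collect();
        ChainedMap { buckets, size: 0, hash }
    }

    /// Reduces the raw hash to a bucket index.
    fn index(&self, key: &K) -> usize {
        ((self.hash)(key) % self.buckets.len() as u64) as usize
    }

    pub fn set(&mut self, key: K, value: V) {
        let i = self.index(&key);
        if let Some(pos) = self.buckets[i].iter().position(|(k, _)| *k == key) {
            self.buckets[i][pos].1 = value; // existing key: overwrite, size unchanged
            return;
        }
        self.buckets[i].push((key, value));
        self.size += 1;
        if self.size as f64 / self.buckets.len() as f64 > Self::MAX_LOAD {
            self.grow();
        }
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.buckets[self.index(key)]
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v)
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn capacity(&self) -> usize {
        self.buckets.len()
    }

    /// Doubles the bucket count and rehashes every entry into the new layout.
    fn grow(&mut self) {
        let new_cap = self.buckets.len() * 2;
        let old = std::mem::replace(&mut self.buckets, (0..new_cap).map(|_| Vec::new()).collect());
        for (k, v) in old.into_iter().flatten() {
            let i = self.index(&k);
            self.buckets[i].push((k, v));
        }
    }
}
```

Open addressing would avoid the per-bucket `Vec` allocations. Chaining, however, keeps the Week 3 `remove` simple, because deleting from a chain needs no tombstones.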

Week 2: Deterministic Hashing

Deliverables:

  1. Hash Strategy:

    • Symbol hash: Use symbol ID (deterministic)
    • Int hash: Use integer value (deterministic)
    • String hash: Use UTF-8 bytes with stable algorithm (e.g., FNV-1a)
  2. Key Normalization:

    • Type detection (Symbol vs Int vs String)
    • Unified comparison function
    • Stable sort implementation
  3. Deterministic Iteration:

    • keys() returns sorted keys (Symbol < Int < String)
    • values() returns values in key order
    • Provenance tracking (plugin_id, version)
  4. Tests:

    • Mixed-type key insertion (Symbol, Int, String)
    • Deterministic iteration order (20 test cases)
    • Hash stability (same key → same bucket)
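The hash strategy above fits in a few lines of Rust. This is a sketch: the `Key` enum and `hash_key` are hypothetical, while the two constants are the standard 64-bit FNV-1a offset basis and prime. Rust's `std::collections::HashMap` defaults to a randomly seeded `RandomState`. Here, by contrast, every hash depends only on the key's content, so the same key lands in the same bucket on every run.

```rust
/// Key kinds supported by MapBox in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Key {
    Symbol(u32), // hashed by symbol ID
    Int(i64),    // hashed by integer value
    Str(String), // hashed by FNV-1a over its UTF-8 bytes
}

const FNV_OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// 64-bit FNV-1a: XOR each byte into the state, then multiply by the prime.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Deterministic: uses only the key's content, never a per-process seed.
pub fn hash_key(key: &Key) -> u64 {
    match key {
        Key::Symbol(id) => *id as u64,
        Key::Int(n) => *n as u64,
        Key::Str(s) => fnv1a64(s.as_bytes()),
    }
}
```

`Symbol(1)` and `Int(1)` hash to the same value. This is harmless, because buckets compare full keys and iteration order comes from sorting (Symbol < Int < String) rather than from hash order.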

Week 3: Advanced Operations

Deliverables:

  1. Remove Operation:

    • remove(key) implementation
    • Bucket chain repair (if using chaining)
    • Size decrement
  2. Query Operations:

    • has(key) implementation
    • size() implementation
    • Edge cases (empty map, single element)
  3. Boundary Checks:

    • Reject ValueBox keys (Fail-Fast)
    • Ensure DataBox inputs only
    • Clear error messages
  4. Tests:

    • Remove operations (15 test cases)
    • Edge cases (empty map, remove non-existent key)
    • ValueBox rejection (Fail-Fast verification)
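Week 3's `remove` and boundary checks can be sketched on the same chaining layout. `Value::Data` and `Value::ValueBox` are hypothetical stand-ins for DataBox and ValueBox, with an integer payload that doubles as the hash. `RuntimeError` is a placeholder type. The sketch illustrates two points:

- The Fail-Fast check runs before any bucket is touched.
- `swap_remove` repairs a chain in O(1), because order inside a bucket does not matter.

```rust
/// Stand-ins for the two box kinds; the integer payload doubles as the hash.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Data(i64),     // persistent DataBox
    ValueBox(i64), // pipeline-only ValueBox: must never be stored
}

#[derive(Debug, PartialEq)]
pub struct RuntimeError(pub String);

/// Fail-Fast entry check, run before any state is modified.
fn require_data(op: &str, role: &str, v: &Value) -> Result<(), RuntimeError> {
    match v {
        Value::Data(_) => Ok(()),
        Value::ValueBox(_) => Err(RuntimeError(format!(
            "MapBox.{op}: {role} must be DataBox, not ValueBox"
        ))),
    }
}

pub struct Buckets {
    chains: Vec<Vec<(Value, Value)>>,
    size: usize,
}

impl Buckets {
    pub fn new(n: usize) -> Self {
        Buckets { chains: (0..n).map(|_| Vec::new()).collect(), size: 0 }
    }

    fn index(&self, key: &Value) -> usize {
        let raw = match key {
            Value::Data(n) | Value::ValueBox(n) => *n as u64,
        };
        (raw % self.chains.len() as u64) as usize
    }

    pub fn set(&mut self, key: Value, value: Value) -> Result<(), RuntimeError> {
        require_data("set", "key", &key)?;
        require_data("set", "value", &value)?;
        let i = self.index(&key);
        match self.chains[i].iter().position(|(k, _)| *k == key) {
            Some(pos) => self.chains[i][pos].1 = value,
            None => {
                self.chains[i].push((key, value));
                self.size += 1;
            }
        }
        Ok(())
    }

    pub fn has(&self, key: &Value) -> Result<bool, RuntimeError> {
        require_data("has", "key", key)?;
        Ok(self.chains[self.index(key)].iter().any(|(k, _)| k == key))
    }

    /// Removes the key if present; a missing key is not an error.
    pub fn remove(&mut self, key: &Value) -> Result<Option<Value>, RuntimeError> {
        require_data("remove", "key", key)?;
        let i = self.index(key);
        let chain = &mut self.chains[i];
        match chain.iter().position(|(k, _)| k == key) {
            Some(pos) => {
                let (_, v) = chain.swap_remove(pos); // O(1) chain repair
                self.size -= 1;
                Ok(Some(v))
            }
            None => Ok(None),
        }
    }

    pub fn size(&self) -> usize {
        self.size
    }
}
```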

Week 4: MapBox Golden Tests

Deliverables:

  1. Golden Test Suite:

    • 50+ test cases comparing Rust-MapBox vs Hako-MapBox
    • Identical output verification (keys, values, size)
    • Performance benchmarks (set/get/remove)
  2. Deterministic Verification:

    • Same input → same output (10 runs per test)
    • Key order stability (Symbol < Int < String)
    • Provenance tracking validation
  3. Performance Baseline:

    • Measure operation times (set/get/remove)
    • Compare against Rust-MapBox
    • Target: ≥ 70% of Rust performance
  4. Documentation:

    • MapBox API reference
    • Usage examples
    • Performance characteristics (O(1) average, O(n) worst case)
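The golden-test idea (same trace, two implementations, identical output, repeated runs) can be sketched generically. In this Rust example, a `BTreeMap` stands in for Rust-MapBox and an unsorted association list, `AssocMap`, stands in for Hako-MapBox. `Op`, `MapImpl`, `run_trace`, and `golden_check` are hypothetical names. Every observable result is rendered to a string so that outputs can be compared line by line.

```rust
use std::collections::BTreeMap;

/// Operations a golden trace may contain (integer keys/values for brevity).
pub enum Op { Set(i64, i64), Get(i64), Remove(i64), Size, Keys }

/// The surface both implementations under test must expose.
pub trait MapImpl {
    fn set(&mut self, k: i64, v: i64);
    fn get(&self, k: i64) -> Option<i64>;
    fn remove(&mut self, k: i64) -> Option<i64>;
    fn size(&self) -> usize;
    fn keys(&self) -> Vec<i64>; // must be in deterministic (sorted) order
}

/// Reference side (stand-in for Rust-MapBox).
impl MapImpl for BTreeMap<i64, i64> {
    fn set(&mut self, k: i64, v: i64) { self.insert(k, v); }
    fn get(&self, k: i64) -> Option<i64> { BTreeMap::get(self, &k).copied() }
    fn remove(&mut self, k: i64) -> Option<i64> { BTreeMap::remove(self, &k) }
    fn size(&self) -> usize { self.len() }
    fn keys(&self) -> Vec<i64> { BTreeMap::keys(self).copied().collect() }
}

/// Candidate side (stand-in for Hako-MapBox): an unsorted association list.
pub struct AssocMap(pub Vec<(i64, i64)>);

impl MapImpl for AssocMap {
    fn set(&mut self, k: i64, v: i64) {
        match self.0.iter().position(|e| e.0 == k) {
            Some(i) => self.0[i].1 = v,
            None => self.0.push((k, v)),
        }
    }
    fn get(&self, k: i64) -> Option<i64> { self.0.iter().find(|e| e.0 == k).map(|e| e.1) }
    fn remove(&mut self, k: i64) -> Option<i64> {
        let i = self.0.iter().position(|e| e.0 == k)?;
        Some(self.0.remove(i).1)
    }
    fn size(&self) -> usize { self.0.len() }
    fn keys(&self) -> Vec<i64> {
        let mut ks: Vec<i64> = self.0.iter().map(|e| e.0).collect();
        ks.sort();
        ks
    }
}

/// Runs a trace and records one output line per operation.
pub fn run_trace<M: MapImpl>(map: &mut M, ops: &[Op]) -> Vec<String> {
    ops.iter()
        .map(|op| match *op {
            Op::Set(k, v) => { map.set(k, v); "ok".to_string() }
            Op::Get(k) => format!("{:?}", map.get(k)),
            Op::Remove(k) => format!("{:?}", map.remove(k)),
            Op::Size => map.size().to_string(),
            Op::Keys => format!("{:?}", map.keys()),
        })
        .collect()
}

/// Parity + determinism: the candidate must match the reference on every run.
pub fn golden_check(ops: &[Op], runs: usize) -> bool {
    let expected = run_trace(&mut BTreeMap::<i64, i64>::new(), ops);
    (0..runs).all(|_| run_trace(&mut AssocMap(Vec::new()), ops) == expected)
}
```

Because `keys()` must be sorted on both sides, an implementation that exposed its internal (insertion or hash) order would fail the comparison immediately.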

Week 5-8: ArrayBox Implementation

Overview

ArrayBox is a dynamic array (list) with automatic capacity expansion.

Core Methods

box ArrayBox {
    // Storage
    _data: RawBufferBox     // Internal buffer
    _size: IntegerBox       // Current element count
    _capacity: IntegerBox   // Allocated capacity

    // Public API
    push(value: DataBox) -> NullBox
    pop() -> DataBox | NullBox
    get(index: IntegerBox) -> DataBox | NullBox
    set(index: IntegerBox, value: DataBox) -> NullBox
    size() -> IntegerBox
    slice(start: IntegerBox, end: IntegerBox) -> ArrayBox
    concat(other: ArrayBox) -> ArrayBox
}

Week 5: Foundation

Deliverables:

  1. Data Structures:

    • RawBufferBox for storage
    • Capacity expansion strategy (2x growth)
    • Bounds checking
  2. Core Operations:

    • push(value) implementation
    • pop() implementation
    • Capacity doubling logic
  3. Tests:

    • Basic push/pop operations (10 test cases)
    • Capacity expansion (grow from 0 → 1 → 2 → 4 → 8)
    • Empty array edge cases
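The 0 → 1 → 2 → 4 → 8 growth sequence needs a special case for the empty array, since doubling 0 yields 0. This Rust sketch tracks a logical capacity next to a `Vec`, which stands in for `RawBufferBox`. `GrowArray` is a hypothetical name. The sketch does not rely on `Vec`'s own growth policy.

```rust
/// Growable array with an explicit logical capacity (like ArrayBox._capacity).
pub struct GrowArray<T> {
    data: Vec<T>,    // stand-in for RawBufferBox; data.len() == size
    capacity: usize, // logical capacity, doubled when full
}

impl<T> GrowArray<T> {
    pub fn new() -> Self {
        GrowArray { data: Vec::new(), capacity: 0 }
    }

    pub fn push(&mut self, value: T) {
        if self.data.len() == self.capacity {
            // 2x growth; an empty array has to jump from 0 to 1 first.
            let new_cap = if self.capacity == 0 { 1 } else { self.capacity * 2 };
            self.data.reserve_exact(new_cap - self.data.len());
            self.capacity = new_cap;
        }
        self.data.push(value);
    }

    /// Returns None on an empty array (maps to NullBox); capacity is kept.
    pub fn pop(&mut self) -> Option<T> {
        self.data.pop()
    }

    pub fn size(&self) -> usize {
        self.data.len()
    }

    pub fn capacity(&self) -> usize {
        self.capacity
    }
}
```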

Week 6: Index Operations

Deliverables:

  1. Index Access:

    • get(index) implementation
    • set(index, value) implementation
    • Bounds checking (index < 0 or index ≥ size → error)
  2. Fail-Fast Boundaries:

    • Out-of-bounds access → RuntimeError
    • Negative index → RuntimeError
    • Clear error messages with context
  3. Tests:

    • Valid index access (20 test cases)
    • Out-of-bounds access (10 test cases)
    • Negative index handling (5 test cases)
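A shared Fail-Fast index check keeps `get` and `set` consistent. In this Rust sketch, the index is an `i64` so that a negative index is reported as negative instead of wrapping. `RuntimeError`, `check_index`, `get`, and `set` are hypothetical names. The error messages mirror the `ArrayBox.get` example under Key Design Decisions.

```rust
#[derive(Debug, PartialEq)]
pub struct RuntimeError(pub String);

/// Validates `index` against `size`, returning a usable usize or an error.
fn check_index(op: &str, index: i64, size: usize) -> Result<usize, RuntimeError> {
    if index < 0 {
        return Err(RuntimeError(format!("ArrayBox.{op}: negative index: {index}")));
    }
    match usize::try_from(index) {
        Ok(i) if i < size => Ok(i),
        _ => Err(RuntimeError(format!(
            "ArrayBox.{op}: index out of bounds: {index} >= {size}"
        ))),
    }
}

pub fn get<T: Clone>(data: &[T], index: i64) -> Result<T, RuntimeError> {
    let i = check_index("get", index, data.len())?;
    Ok(data[i].clone())
}

pub fn set<T>(data: &mut [T], index: i64, value: T) -> Result<(), RuntimeError> {
    let i = check_index("set", index, data.len())?;
    data[i] = value;
    Ok(())
}
```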

Week 7: Advanced Operations

Deliverables:

  1. Slice Operation:

    • slice(start, end) implementation
    • Return new ArrayBox with elements [start, end)
    • Handle edge cases (start > end, negative indices)
  2. Concat Operation:

    • concat(other) implementation
    • Create new ArrayBox with combined elements
    • Efficient memory allocation
  3. ValueBox/DataBox Boundaries:

    • Reject ValueBox elements (Fail-Fast)
    • Ensure DataBox inputs only
    • Unpack at entry/exit points
  4. Tests:

    • Slice operations (15 test cases)
    • Concat operations (10 test cases)
    • ValueBox rejection (5 test cases)
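The plan leaves the exact slice edge-case behavior open. This Rust sketch assumes the Fail-Fast reading:

- A negative index, `start > end`, or an `end` past the size is a `RuntimeError`.
- `start == end` yields an empty array.

A lenient policy that clamps out-of-range indices would be the main alternative. `slice`, `concat`, and `RuntimeError` are hypothetical names. `concat` allocates its result once, with the combined length.

```rust
#[derive(Debug, PartialEq)]
pub struct RuntimeError(pub String);

/// Returns a new array with the elements in [start, end).
pub fn slice<T: Clone>(data: &[T], start: i64, end: i64) -> Result<Vec<T>, RuntimeError> {
    if start < 0 || end < 0 {
        return Err(RuntimeError(format!("ArrayBox.slice: negative index in [{start}, {end})")));
    }
    if start > end {
        return Err(RuntimeError(format!("ArrayBox.slice: start {start} > end {end}")));
    }
    match usize::try_from(end) {
        // 0 <= start <= end <= len here, so the cast of start is safe.
        Ok(e) if e <= data.len() => Ok(data[start as usize..e].to_vec()),
        _ => Err(RuntimeError(format!("ArrayBox.slice: end {end} > size {}", data.len()))),
    }
}

/// Builds a new array holding `a` followed by `b` with a single allocation.
pub fn concat<T: Clone>(a: &[T], b: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out
}
```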

Week 8: ArrayBox Golden Tests

Deliverables:

  1. Golden Test Suite:

    • 50+ test cases comparing Rust-ArrayBox vs Hako-ArrayBox
    • Identical output verification (size, elements, order)
    • Performance benchmarks (push/pop/get/set)
  2. Performance Baseline:

    • Measure operation times (all methods)
    • Compare against Rust-ArrayBox
    • Target: ≥ 70% of Rust performance
  3. Large-Scale Tests:

    • 10,000 element arrays
    • Stress test capacity expansion
    • Memory usage validation
  4. Documentation:

    • ArrayBox API reference
    • Usage examples
    • Performance characteristics (O(1) amortized, O(n) worst case)

Key Design Decisions

1. Deterministic Key Order (Symbol < Int < String)

Rationale:

  • Predictable iteration order for debugging
  • Reproducible behavior across runs
  • Stable sorting for golden tests

Implementation:

// Key comparison function
method compare_keys(key1: DataBox, key2: DataBox) -> IntegerBox {
    local type1 = key1.type_id()
    local type2 = key2.type_id()

    // Type priority: Symbol < Int < String
    if (type1 != type2) {
        if (type1 == :Symbol) { return -1 }
        if (type2 == :Symbol) { return 1 }
        if (type1 == :Int) { return -1 }
        if (type2 == :Int) { return 1 }
    }

    // Same type: natural order
    return key1.compare(key2)
}
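In Rust, the same ordering falls out of a derived `Ord` on an enum: derived comparisons look at the variant first (in declaration order) and the payload second. The `Key` enum, `compare_keys`, and `sorted_keys` below are a hypothetical sketch. Ordering symbols by a numeric ID rather than by name is an assumption. The standard `sort` is stable, which also fits the stable-sort requirement under Risk 2.

```rust
use std::cmp::Ordering;

/// Variant declaration order encodes the type priority Symbol < Int < String.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Key {
    Symbol(u32), // assumed: symbols compare by ID
    Int(i64),
    Str(String), // byte-wise (UTF-8) lexicographic order
}

/// Mirrors compare_keys above: -1, 0, or 1.
pub fn compare_keys(a: &Key, b: &Key) -> i64 {
    match a.cmp(b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Deterministic key order for keys()/values().
pub fn sorted_keys(mut keys: Vec<Key>) -> Vec<Key> {
    keys.sort(); // stable sort using Key's derived Ord
    keys
}
```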

2. ValueBox/DataBox Boundaries

Rationale:

  • ValueBox is ephemeral (pipeline boundaries only)
  • DataBox is persistent (long-lived storage)
  • Fail-Fast prevents confusion

Implementation:

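// MapBox.set with boundary check
method set(key: DataBox, value: DataBox) -> NullBox {
    // Fail-Fast: reject ValueBox before any state changes
    if (key.is_valuebox()) {
        panic("MapBox.set: key must be DataBox, not ValueBox")
    }
    if (value.is_valuebox()) {
        panic("MapBox.set: value must be DataBox, not ValueBox")
    }

    // Reduce the hash to a bucket index (a raw hash can exceed the bucket count)
    local index = me._hash(key) % me._buckets.size()
    local bucket = me._buckets.get(index)
    if (bucket.has(key)) {
        bucket.update(key, value)   // hypothetical helper: overwrite, size unchanged
    } else {
        bucket.insert(key, value)
        me._size = me._size + 1
        me._maybe_resize()          // hypothetical helper: load factor > 0.75 → double capacity
    }
}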

3. Fail-Fast Error Handling

Rationale:

  • No silent fallbacks (prevents bugs)
  • Clear error messages (aids debugging)
  • Early detection (stops at source)

Implementation:

// ArrayBox.get with Fail-Fast
method get(index: IntegerBox) -> DataBox | NullBox {
    if (index < 0) {
        panic("ArrayBox.get: negative index: " + index)
    }
    if (index >= me._size) {
        panic("ArrayBox.get: index out of bounds: " + index + " >= " + me._size)
    }

    return me._data.get(index)
}

Success Criteria

Mandatory (All must pass)

  1. MapBox/ArrayBox Fully Implemented:

    • All methods work in Pure Hakorune
    • No Rust dependencies (except HostBridge)
  2. Golden Tests Pass:

    • Rust-Collections vs Hako-Collections: 100% parity
    • 100+ test cases with identical output
    • Deterministic behavior verified
  3. Deterministic Behavior:

    • Key order is predictable (Symbol < Int < String)
    • Iteration order is stable
    • Provenance tracking works
  4. Performance:

    • Hako-Collections ≥ 70% of Rust-Collections speed
    • Basic operations (set/get/push/pop) are O(1) average
    • Large-scale data (10,000 elements) is practical

Excluded (⚠️ NOT in Phase 20.7)

  • GC (Garbage Collection): Deferred to Phase 20.8
  • Optimization: Basic implementation only (no profiling yet)
  • Concurrency: Single-threaded only
  • Persistence: In-memory only (no disk storage)

Dependencies

Prerequisite: Phase 20.6

Phase 20.7 requires:

  1. VM Core Complete: All 16 MIR instructions working
  2. Dispatch Unification: Single Resolver path for all method calls
  3. Golden Test Framework: Rust-VM vs Hako-VM comparison working

Provides to: Phase 20.8

Phase 20.7 delivers:

  1. Collections in Hakorune: MapBox/ArrayBox fully self-hosted
  2. Deterministic Iteration: Stable key order for GC root scanning
  3. Reference Tracking: Preparation for GC mark phase

Risk Mitigation

Risk 1: Performance Below 70%

Mitigation:

  • Measure every week (not only in Weeks 4 and 8)
  • Profile hot paths early (hash function, bounds checks)
  • Accept slower performance initially (can optimize later)

Risk 2: Deterministic Hashing Complexity

Mitigation:

  • Use simple, proven hash functions (FNV-1a for strings)
  • Stable sort implementation (merge sort, not quicksort)
  • Test hash stability with 1,000+ keys

Risk 3: ValueBox/DataBox Confusion

Mitigation:

  • Clear error messages ("must be DataBox, not ValueBox")
  • Fail-Fast at entry points (set/push methods)
  • Golden tests verify boundary enforcement


Status: PLANNED
Dependencies: Phase 20.6 (VM Core + Dispatch)
Delivers: Phase D (Collections in Pure Hakorune)
Timeline: 8 weeks (2026-05-25 → 2026-07-19)