nyash-codex 8fd3a2b509 docs: restore docs/private/roadmap from 7b4908f9 (Phase 20.31)

2025-10-31 18:00:10 +09:00

Phase 20.7: Collections in Hakorune - Implementation Plan

Duration: 8 weeks (2026-05-25 → 2026-07-19)
Prerequisite: Phase 20.6 (VM Core Complete + Dispatch Unification)
Completes: Phase D (Collections - MapBox/ArrayBox in Pure Hakorune)

Executive Summary

Phase 20.7 rolls out the C‑ABI bridge for Basic boxes (String/Array/Map) so Rust VM and Hako VM call the same implementation. Default profile uses dynamic plugins; embedded/WASM uses built‑in plugins (static link + LTO). Pure Hakorune implementations remain available for research but are not the default execution path.

Key Achievements:

String/Array/Map via C‑ABI (plugins) with identical behavior across Rust/Hako VMs
Built‑in plugin preset (WASM/embedded) yields zero‑cost inlining via Two‑Layer Export
Golden tests proving Rust vs Hako parity for collections (size/push/get/set/slice/indexOf/substring)
Performance ≥ 70% of Rust runtime (for ABI path)

Critical Principles:

C‑ABI as the single source of truth (SSOT), with Two‑Layer Export (Stable ABI / Inlinable API)
Deterministic Behavior: Key order is predictable (Symbol < Int < String)
ValueBox/DataBox Boundaries: ValueBox is temporary (pipeline only), DataBox is persistent
Fail‑Fast: Unknown methods/capability violations → RuntimeError (no silent fallbacks)

C‑ABI Bridge Rollout (String/Array/Map)

Flags & Presets

NYASH_ABI_BASIC_ON=1 … Enable the ABI path for Basic boxes (default: OFF; turned on in stages, quick → integration)
NYASH_ABI_BUILTIN_LINK=1 … Built‑in plugins (LTO/inline) for the embedded/WASM preset
NYASH_NORMALIZE_CORE_EXTERN=1 … Gate that makes the Builder/VM force-normalize String/Array/Map calls to Extern("nyrt.*"). Default ON (set to 0 to roll back to the BoxCall/Plugin path).
NYASH_VERIFY_CORE_EXTERN=1 … Strict verifier. Checks Fail‑Fast that no Method/ModuleFunction forms remain (for local observation; default OFF).
Presets: default (plugins), embedded (built‑in), research (pure)
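
As a rough illustration of how the flag defaults above combine, the following Python sketch resolves the four environment variables into one settings object. The `AbiFlags` type and `resolve_flags` helper are hypothetical names used only for this example, and the OFF default for NYASH_ABI_BUILTIN_LINK is an assumption (the list does not state one).

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AbiFlags:
    """Hypothetical container for the resolved flag values."""
    abi_basic_on: bool
    builtin_link: bool
    normalize_core_extern: bool
    verify_core_extern: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Only "1" counts as ON; any other set value is OFF; unset uses the default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw == "1"


def resolve_flags(env: Mapping[str, str]) -> AbiFlags:
    return AbiFlags(
        abi_basic_on=_flag(env, "NYASH_ABI_BASIC_ON", False),  # documented default: OFF
        builtin_link=_flag(env, "NYASH_ABI_BUILTIN_LINK", False),  # assumed default: OFF
        normalize_core_extern=_flag(env, "NYASH_NORMALIZE_CORE_EXTERN", True),  # documented default: ON
        verify_core_extern=_flag(env, "NYASH_VERIFY_CORE_EXTERN", False),  # documented default: OFF
    )
```

Passing `os.environ` to `resolve_flags` would read the real process environment; a plain dict keeps the sketch easy to test.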

Acceptance Criteria

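Collection operations produce identical output across VM/LLVM/Hako (nyvm) (size/get/set/push/slice/indexOf/substring)
NYASH_ABI_BASIC_ON=1 is turned on in stages (quick → integration) with all tests green
Performance: the ABI path reaches ≥ 70% of the Rust implementation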

Rollback

Setting NYASH_ABI_BASIC_ON=0 returns to the existing path at any time (a safety net by design).

Pilot Rollout Plan (fine-grained staged enablement)

Purpose

To minimize risk, the ABI path is turned on in stages, one feature at a time (pilot → observe → roll out broadly).

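Week‑by‑Week (rough guide)

Week 1 (quick only): Turn on String.length via the ABI (NYASH_ABI_BASIC_ON=1) → observe for 48h
Week 2 (quick only): Additionally turn on Array.size/push → observe
Week 3‑4 (quick only): Additionally turn on Map.set/get → observe
Week 5: Roll the ABI ON setting from quick out to integration (String/Array/Map)
Week 6‑7: Performance tuning and stronger diagnostics (target: ≥ 70%)
Week 8: Decide whether to make ABI ON the default (if problems appear, revert immediately with NYASH_ABI_BASIC_ON=0)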

Diagnostics / Observation

NYASH_ABI_BASIC_DIAG=1 logs one line per ABI-path call (for dev/CI observation; default OFF)
Leak guard (test-only) detects missing retain/release calls (1000x new/free → total handle count = 0)
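
The leak-guard check (1000 new/free rounds must leave zero live handles) can be sketched as follows. This is a minimal Python model, not the project's handle implementation: `HandleTable` and `leak_guard` are hypothetical names, and the reference-counting scheme is an assumption made for illustration.

```python
class HandleTable:
    """Hypothetical handle registry with per-handle reference counts."""

    def __init__(self):
        self._refs = {}  # handle id -> reference count
        self._next = 1

    def new(self):
        """Allocate a handle with an initial reference count of 1."""
        handle = self._next
        self._next += 1
        self._refs[handle] = 1
        return handle

    def retain(self, handle):
        if handle not in self._refs:
            raise RuntimeError(f"retain on unknown handle {handle}")
        self._refs[handle] += 1

    def release(self, handle):
        if handle not in self._refs:
            raise RuntimeError(f"release on unknown handle {handle}")
        self._refs[handle] -= 1
        if self._refs[handle] == 0:
            del self._refs[handle]  # last reference gone: the handle is freed

    def live_count(self):
        return len(self._refs)


def leak_guard(rounds=1000):
    """Run balanced new/retain/release rounds and return the live handle count."""
    table = HandleTable()
    for _ in range(rounds):
        handle = table.new()
        table.retain(handle)
        table.release(handle)
        table.release(handle)
    return table.live_count()
```

A balanced run returns 0; dropping one `release` call leaves the handle live, which is the condition the guard is meant to catch.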

CI Matrix (minimal)

Added: one quick (ABI ON) job, covering only the staged targets (String/Array/Map)
Existing: keep the quick/integration (ABI OFF) jobs to monitor for regressions

Week 1-4: MapBox Implementation

Overview

MapBox is a hash map storing key-value pairs with deterministic iteration order.

Core Methods

box MapBox {
    // Storage
    _buckets: ArrayBox      // Internal bucket storage
    _size: IntegerBox       // Element count
    _load_factor: FloatBox  // For resizing

    // Public API
    set(key: DataBox, value: DataBox) -> NullBox
    get(key: DataBox) -> DataBox | NullBox
    has(key: DataBox) -> BoolBox
    remove(key: DataBox) -> DataBox | NullBox
    size() -> IntegerBox
    keys() -> ArrayBox      // Returns keys in deterministic order
    values() -> ArrayBox    // Returns values in deterministic order
}

Week 1: Foundation

Deliverables:

Data Structures:
- Bucket storage design (chaining or open addressing)
- Hash function for Symbol/Int/String
- Collision resolution strategy
Core Operations:
- set(key, value) implementation
- get(key) implementation
- Basic resizing logic (load factor > 0.75 → double capacity)
Tests:
- Basic set/get operations (10 test cases)
- Hash collision handling (5 test cases)
- Resizing behavior (3 test cases)
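
The Week 1 deliverables can be pictured with a small Python model, assuming separate chaining is chosen (the plan leaves chaining vs. open addressing open). `ChainedMap` is a hypothetical name, and Python's built-in `hash` stands in for the Hakorune hash function.

```python
class ChainedMap:
    """Illustrative chained hash map; doubles capacity when load factor > 0.75."""

    MAX_LOAD = 0.75

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key, capacity):
        # Python's % always yields a non-negative index for a positive capacity.
        return hash(key) % capacity

    def set(self, key, value):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite: size does not change
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self.MAX_LOAD:
            self._resize(2 * len(self._buckets))

    def get(self, key):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for existing, value in bucket:
            if existing == key:
                return value
        return None  # plays the role of NullBox in this sketch

    def _resize(self, new_capacity):
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(new_capacity)]
        for bucket in old_buckets:
            for key, value in bucket:
                self._buckets[self._index(key, new_capacity)].append((key, value))

    def size(self):
        return self._size

    def capacity(self):
        return len(self._buckets)
```

With a capacity of 4, the fourth distinct key pushes the load factor to 1.0, which exceeds 0.75 and triggers the doubling to 8.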

Week 2: Deterministic Hashing

Deliverables:

Hash Strategy:
- Symbol hash: Use symbol ID (deterministic)
- Int hash: Use integer value (deterministic)
- String hash: Use UTF-8 bytes with stable algorithm (e.g., FNV-1a)
Key Normalization:
- Type detection (Symbol vs Int vs String)
- Unified comparison function
- Stable sort implementation
Deterministic Iteration:
- keys() returns sorted keys (Symbol < Int < String)
- values() returns values in key order
- Provenance tracking (plugin_id, version)
Tests:
- Mixed-type key insertion (Symbol, Int, String)
- Deterministic iteration order (20 test cases)
- Hash stability (same key → same bucket)
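
For the String hash, FNV-1a is named above only as an example of a stable algorithm. A 64-bit FNV-1a over UTF-8 bytes looks like this in Python; the 64-bit width and the `string_bucket` helper are assumptions for illustration, since the plan does not fix either.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the value within 64 bits
    return h


def string_bucket(key: str, bucket_count: int) -> int:
    """Hypothetical helper: map a string key to a bucket index."""
    return fnv1a_64(key.encode("utf-8")) % bucket_count
```

Because the result depends only on the key's bytes, the same key lands in the same bucket on every run, which is the hash-stability property the Week 2 tests check.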

Week 3: Advanced Operations

Deliverables:

Remove Operation:
- remove(key) implementation
- Bucket chain repair (if using chaining)
- Size decrement
Query Operations:
- has(key) implementation
- size() implementation
- Edge cases (empty map, single element)
Boundary Checks:
- Reject ValueBox keys (Fail-Fast)
- Ensure DataBox inputs only
- Clear error messages
Tests:
- Remove operations (15 test cases)
- Edge cases (empty map, remove non-existent key)
- ValueBox rejection (Fail-Fast verification)
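
The remove path and the boundary check can be sketched together. The Python below assumes chained buckets stored as lists of (key, value) pairs; `ValueBox`, `require_databox`, and `map_remove` are stand-in names for this example, and `RuntimeError` mirrors the Fail-Fast principle rather than any confirmed Hakorune error type.

```python
class ValueBox:
    """Stand-in for a temporary, pipeline-only value."""

    def __init__(self, payload):
        self.payload = payload


def require_databox(where, what, item):
    """Fail fast when a ValueBox reaches a persistent collection."""
    if isinstance(item, ValueBox):
        raise RuntimeError(f"{where}: {what} must be DataBox, not ValueBox")


def map_remove(buckets, size, key):
    """Remove key from its chain; return (old value or None, updated size)."""
    require_databox("MapBox.remove", "key", key)
    bucket = buckets[hash(key) % len(buckets)]
    for i, (existing, value) in enumerate(bucket):
        if existing == key:
            del bucket[i]  # later entries shift down, so the chain has no gap
            return value, size - 1
    return None, size  # missing key: nothing removed, size unchanged
```

Removing a key that is not present is treated here as a no-op that returns None, matching the `DataBox | NullBox` return type in the Core Methods listing.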

Week 4: MapBox Golden Tests

Deliverables:

Golden Test Suite:
- 50+ test cases comparing Rust-MapBox vs Hako-MapBox
- Identical output verification (keys, values, size)
- Performance benchmarks (set/get/remove)
Deterministic Verification:
- Same input → same output (10 runs per test)
- Key order stability (Symbol < Int < String)
- Provenance tracking validation
Performance Baseline:
- Measure operation times (set/get/remove)
- Compare against Rust-MapBox
- Target: ≥ 70% of Rust performance
Documentation:
- MapBox API reference
- Usage examples
- Performance characteristics (O(1) average, O(n) worst case)
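
A golden test in this sense runs the same operation script against two implementations and requires identical output on every run. The harness below is a sketch only: `reference_map` and `candidate_map` are toy stand-ins for Rust-MapBox and Hako-MapBox, and the operation tuples are a hypothetical script format.

```python
def reference_map(ops):
    """Toy reference implementation backed by a dict."""
    store = {}
    for op, *args in ops:
        if op == "set":
            store[args[0]] = args[1]
        elif op == "remove":
            store.pop(args[0], None)
        else:
            raise RuntimeError(f"unknown op: {op}")
    keys = sorted(store)
    return keys, [store[k] for k in keys], len(store)


def candidate_map(ops):
    """Toy candidate implementation backed by a list of pairs."""
    pairs = []
    for op, *args in ops:
        if op == "set":
            pairs = [(k, v) for k, v in pairs if k != args[0]] + [(args[0], args[1])]
        elif op == "remove":
            pairs = [(k, v) for k, v in pairs if k != args[0]]
        else:
            raise RuntimeError(f"unknown op: {op}")
    pairs.sort(key=lambda kv: kv[0])
    return [k for k, _ in pairs], [v for _, v in pairs], len(pairs)


def run_golden(ops, reference, candidate, runs=10):
    """Compare (keys, values, size) across repeated runs; raise on any mismatch."""
    expected = reference(ops)
    for run in range(runs):
        actual = candidate(ops)
        if actual != expected:
            raise AssertionError(f"run {run}: expected {expected}, got {actual}")
    return expected
```

The toy script uses integer keys only, so plain sorting is enough; mixed-type keys would need the Symbol < Int < String comparison described under Key Design Decisions.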

Week 5-8: ArrayBox Implementation

Overview

ArrayBox is a dynamic array (list) with automatic capacity expansion.

Core Methods

box ArrayBox {
    // Storage
    _data: RawBufferBox     // Internal buffer
    _size: IntegerBox       // Current element count
    _capacity: IntegerBox   // Allocated capacity

    // Public API
    push(value: DataBox) -> NullBox
    pop() -> DataBox | NullBox
    get(index: IntegerBox) -> DataBox | NullBox
    set(index: IntegerBox, value: DataBox) -> NullBox
    size() -> IntegerBox
    slice(start: IntegerBox, end: IntegerBox) -> ArrayBox
    concat(other: ArrayBox) -> ArrayBox
}

Week 5: Foundation

Deliverables:

Data Structures:
- RawBufferBox for storage
- Capacity expansion strategy (2x growth)
- Bounds checking
Core Operations:
- push(value) implementation
- pop() implementation
- Capacity doubling logic
Tests:
- Basic push/pop operations (10 test cases)
- Capacity expansion (grow from 0 → 1 → 2 → 4 → 8)
- Empty array edge cases
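
The growth sequence 0 → 1 → 2 → 4 → 8 implies a special case for the first push, since doubling zero stays zero. A minimal Python model of push/pop with that rule follows; `GrowableArray` is a hypothetical name, and a Python list stands in for RawBufferBox, with its length treated as the allocated capacity.

```python
class GrowableArray:
    """Illustrative dynamic array with 2x capacity growth."""

    def __init__(self):
        self._data = []  # stand-in buffer; len(self._data) is the capacity
        self._size = 0

    def capacity(self):
        return len(self._data)

    def size(self):
        return self._size

    def push(self, value):
        if self._size == len(self._data):
            # First allocation gets capacity 1; after that, double.
            new_capacity = 1 if not self._data else 2 * len(self._data)
            self._data.extend([None] * (new_capacity - len(self._data)))
        self._data[self._size] = value
        self._size += 1

    def pop(self):
        if self._size == 0:
            return None  # empty array: plays the role of NullBox here
        self._size -= 1
        value = self._data[self._size]
        self._data[self._size] = None  # drop the reference held by the buffer
        return value
```

Doubling keeps the total copy work proportional to the final size, which is why push is O(1) amortized even though a single push that triggers growth is O(n).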

Week 6: Index Operations

Deliverables:

Index Access:
- get(index) implementation
- set(index, value) implementation
- Bounds checking (index < 0 or index ≥ size → error)
Fail-Fast Boundaries:
- Out-of-bounds access → RuntimeError
- Negative index → RuntimeError
- Clear error messages with context
Tests:
- Valid index access (20 test cases)
- Out-of-bounds access (10 test cases)
- Negative index handling (5 test cases)

Week 7: Advanced Operations

Deliverables:

Slice Operation:
- slice(start, end) implementation
- Return new ArrayBox with elements [start, end)
- Handle edge cases (start > end, negative indices)
Concat Operation:
- concat(other) implementation
- Create new ArrayBox with combined elements
- Efficient memory allocation
ValueBox/DataBox Boundaries:
- Reject ValueBox elements (Fail-Fast)
- Ensure DataBox inputs only
- Unpack at entry/exit points
Tests:
- Slice operations (15 test cases)
- Concat operations (10 test cases)
- ValueBox rejection (5 test cases)
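
The half-open range [start, end) and the single-allocation concat can be sketched in Python as below. The plan lists the edge cases without saying how they are handled, so rejecting them with an error is an assumption here, chosen to match the Fail-Fast principle; both function names are hypothetical.

```python
def slice_elements(items, start, end):
    """Return a new list holding items[start:end], with end exclusive."""
    if start < 0 or end < 0:
        raise RuntimeError(f"slice: negative index: start={start}, end={end}")
    if start > end:
        raise RuntimeError(f"slice: start > end: {start} > {end}")
    if end > len(items):
        raise RuntimeError(f"slice: end out of bounds: {end} > {len(items)}")
    return list(items[start:end])


def concat_elements(left, right):
    """Return a new list; the result is sized once, then filled from both inputs."""
    combined = [None] * (len(left) + len(right))
    combined[:len(left)] = left
    combined[len(left):] = right
    return combined
```

Neither function modifies its inputs, in line with slice and concat each returning a new ArrayBox.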

Week 8: ArrayBox Golden Tests

Deliverables:

Golden Test Suite:
- 50+ test cases comparing Rust-ArrayBox vs Hako-ArrayBox
- Identical output verification (size, elements, order)
- Performance benchmarks (push/pop/get/set)
Performance Baseline:
- Measure operation times (all methods)
- Compare against Rust-ArrayBox
- Target: ≥ 70% of Rust performance
Large-Scale Tests:
- 10,000 element arrays
- Stress test capacity expansion
- Memory usage validation
Documentation:
- ArrayBox API reference
- Usage examples
- Performance characteristics (O(1) amortized, O(n) worst case)

Key Design Decisions

1. Deterministic Key Order (Symbol < Int < String)

Rationale:

Predictable iteration order for debugging
Reproducible behavior across runs
Stable sorting for golden tests

Implementation:

// Key comparison function
method compare_keys(key1: DataBox, key2: DataBox) -> IntegerBox {
    local type1 = key1.type_id()
    local type2 = key2.type_id()

    // Type priority: Symbol < Int < String
    if (type1 != type2) {
        if (type1 == :Symbol) { return -1 }
        if (type2 == :Symbol) { return 1 }
        if (type1 == :Int) { return -1 }
        if (type2 == :Int) { return 1 }
    }

    // Same type: natural order
    return key1.compare(key2)
}
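
For readers who want to run the ordering rule, here is the same comparison as a Python sketch. The `Symbol` class, the rank table, and ordering symbols by name are assumptions made for the example; the plan does not specify the natural order within the Symbol type.

```python
from functools import cmp_to_key


class Symbol:
    """Stand-in for a Hakorune symbol; ordered by name in this sketch."""

    def __init__(self, name):
        self.name = name


# Type priority: Symbol < Int < String
TYPE_RANK = {Symbol: 0, int: 1, str: 2}


def compare_keys(key1, key2):
    rank1 = TYPE_RANK.get(type(key1))
    rank2 = TYPE_RANK.get(type(key2))
    if rank1 is None or rank2 is None:
        raise RuntimeError("compare_keys: unsupported key type")
    if rank1 != rank2:
        return -1 if rank1 < rank2 else 1
    # Same type: natural order (by name for symbols).
    a = key1.name if isinstance(key1, Symbol) else key1
    b = key2.name if isinstance(key2, Symbol) else key2
    return (a > b) - (a < b)


def sorted_keys(keys):
    """Return keys in deterministic order; Python's sort is stable."""
    return sorted(keys, key=cmp_to_key(compare_keys))
```

Sorting the keys `"b", 10, Symbol("z"), 2, "a", Symbol("m")` this way yields the symbols m and z first, then 2 and 10, then "a" and "b".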

2. ValueBox/DataBox Boundaries

Rationale:

ValueBox is ephemeral (pipeline boundaries only)
DataBox is persistent (long-lived storage)
Fail-Fast prevents confusion

Implementation:

// MapBox.set with boundary check
method set(key: DataBox, value: DataBox) -> NullBox {
    if (key.is_valuebox()) {
        panic("MapBox.set: key must be DataBox, not ValueBox")
    }
    if (value.is_valuebox()) {
        panic("MapBox.set: value must be DataBox, not ValueBox")
    }

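    // Proceed with actual set operation.
    // Reduce the hash to a valid bucket index before the lookup;
    // a raw hash value can exceed the number of buckets.
    local hash = me._hash(key)
    local bucket = me._buckets.get(hash % me._buckets.size())
    bucket.insert(key, value)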
}

3. Fail-Fast Error Handling

Rationale:

No silent fallbacks (prevents bugs)
Clear error messages (aids debugging)
Early detection (stops at source)

Implementation:

// ArrayBox.get with Fail-Fast
method get(index: IntegerBox) -> DataBox | NullBox {
    if (index < 0) {
        panic("ArrayBox.get: negative index: " + index)
    }
    if (index >= me._size) {
        panic("ArrayBox.get: index out of bounds: " + index + " >= " + me._size)
    }

    return me._data.get(index)
}

Success Criteria

Mandatory (✅ All must pass)

MapBox/ArrayBox Fully Implemented:
- All methods work in Pure Hakorune
- No Rust dependencies (except HostBridge)
Golden Tests Pass:
- Rust-Collections vs Hako-Collections: 100% parity
- 100+ test cases with identical output
- Deterministic behavior verified
Deterministic Behavior:
- Key order is predictable (Symbol < Int < String)
- Iteration order is stable
- Provenance tracking works
Performance:
- Hako-Collections ≥ 70% of Rust-Collections speed
- Basic operations (set/get/push/pop) are O(1) on average (amortized for push)
- Large-scale data (10,000 elements) is practical

Excluded (⚠️ NOT in Phase 20.7)

GC (Garbage Collection): Deferred to Phase 20.8
Optimization: Basic implementation only (no profiling yet)
Concurrency: Single-threaded only
Persistence: In-memory only (no disk storage)

Dependencies

Prerequisite: Phase 20.6

Phase 20.7 requires:

VM Core Complete: All 16 MIR instructions working
Dispatch Unification: Single Resolver path for all method calls
Golden Test Framework: Rust-VM vs Hako-VM comparison working

Provides to: Phase 20.8

Phase 20.7 delivers:

Collections in Hakorune: MapBox/ArrayBox fully self-hosted
Deterministic Iteration: Stable key order for GC root scanning
Reference Tracking: Preparation for GC mark phase

Risk Mitigation

Risk 1: Performance Below 70%

Mitigation:

Measure every week (not just in Weeks 4 and 8)
Profile hot paths early (hash function, bounds checks)
Accept slower performance initially (can optimize later)
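
One way to make the weekly measurement concrete is to express the 70% target as a speed ratio. The sketch below assumes "≥ 70% of Rust performance" means the Rust time divided by the Hako time for the same workload is at least 0.70; that reading and both function names are assumptions, not definitions from this plan.

```python
def relative_performance(rust_ms, hako_ms):
    """Speed of the Hako path relative to Rust (1.0 means equal speed)."""
    if rust_ms <= 0 or hako_ms <= 0:
        raise ValueError("timings must be positive")
    return rust_ms / hako_ms


def meets_target(rust_ms, hako_ms, target=0.70):
    """True when the Hako path reaches at least `target` of Rust's speed."""
    return relative_performance(rust_ms, hako_ms) >= target
```

Under this reading, a workload that takes 7 ms in Rust may take up to 10 ms on the Hako path and still meet the target, while 15 ms against a 10 ms Rust baseline (about 67%) would not.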

Risk 2: Deterministic Hashing Complexity

Mitigation:

Use simple, proven hash functions (FNV-1a for strings)
Stable sort implementation (merge sort, not quicksort)
Test hash stability with 1,000+ keys

Risk 3: ValueBox/DataBox Confusion

Mitigation:

Clear error messages ("must be DataBox, not ValueBox")
Fail-Fast at entry points (set/push methods)
Golden tests verify boundary enforcement

Related Documents

Phase 20.5: Pure Hakorune Roadmap
Phase 20.6: VM Core Complete
Phase 20.8: GC v0 + Rust Deprecation
Box-First Principle: CLAUDE.md
Golden Testing Strategy: PURE_HAKORUNE_ROADMAP.md#🧪-golden-testing-strategy

Status: PLANNED
Dependencies: Phase 20.6 (VM Core + Dispatch)
Delivers: Phase D (Collections in Pure Hakorune)
Timeline: 8 weeks (2026-05-25 → 2026-07-19)
