# Phase 20.7: Collections in Hakorune - Implementation Plan

**Duration**: 8 weeks (2026-05-25 → 2026-07-19)

**Prerequisite**: Phase 20.6 (VM Core Complete + Dispatch Unification)

**Completes**: Phase D (Collections - MapBox/ArrayBox in Pure Hakorune)

---

## Executive Summary

Phase 20.7 rolls out the **C-ABI bridge for Basic boxes (String/Array/Map)** so that the Rust VM and the Hako VM call the same implementation. The default profile uses dynamic plugins. Embedded/WASM builds use built-in plugins (static link + LTO). Pure Hakorune implementations remain available for research, but they are not the default execution path.

**Key Achievements**:
- String/Array/Map via C-ABI (plugins), with identical behavior across the Rust and Hako VMs
- A built-in plugin preset (WASM/embedded) that yields zero-cost inlining via Two-Layer Export
- Golden tests proving Rust vs Hako parity for collections (size/push/get/set/slice/indexOf/substring)
- Performance ≥ 70% of the Rust runtime (for the ABI path)

**Critical Principles**:
- **C-ABI SSOT** with Two-Layer Export (Stable ABI / Inlinable API)
- **Deterministic Behavior**: Key order is predictable (Symbol < Int < String)
- **ValueBox/DataBox Boundaries**: ValueBox is temporary (pipeline only); DataBox is persistent
- **Fail-Fast**: Unknown methods and capability violations → RuntimeError (no silent fallbacks)

---

## C-ABI Bridge Rollout (String/Array/Map)

### Flags & Presets

- `NYASH_ABI_BASIC_ON=1`: Enables the ABI path for Basic boxes. Default: OFF; it is turned on in stages, quick → integration.
- `NYASH_ABI_BUILTIN_LINK=1`: Uses built-in plugins (LTO/inlining) for the embedded/WASM preset.
- `NYASH_NORMALIZE_CORE_EXTERN=1`: Gate that makes the Builder/VM force-normalize String/Array/Map calls to `Extern("nyrt.*")`. Default ON. Setting it to 0 rolls back to the BoxCall/Plugin path.
- `NYASH_VERIFY_CORE_EXTERN=1`: Strict verifier mode. It fails fast if any Method/ModuleFunction call forms remain. Intended for local observation; default OFF.
- Presets: default (plugins), embedded (built-in), research (pure)

### Acceptance Criteria

- Collection operations (size/get/set/push/slice/indexOf/substring) produce identical output on VM, LLVM, and Hako (nyvm).
- `NYASH_ABI_BASIC_ON=1` is turned on in stages (quick → integration) with all tests green.
- Performance: the ABI path reaches ≥ 70% of the Rust implementation.

### Rollback

- Setting `NYASH_ABI_BASIC_ON=0` reverts to the existing path at any time. This is a safety net built into the design.

---

## Pilot Rollout Plan (Fine-Grained Staged Enablement)

**Goal**
- To minimize risk, enable the ABI path one feature at a time (pilot → observe → roll out more widely).

**Week-by-Week (approximate)**
- Week 1 (quick only): Enable String.length via ABI (`NYASH_ABI_BASIC_ON=1`), then observe for 48h
- Week 2 (quick only): Additionally enable Array.size/push, then observe
- Week 3-4 (quick only): Additionally enable Map.set/get, then observe
- Week 5: Extend ABI ON from quick to integration (String/Array/Map)
- Week 6-7: Performance tuning and stronger diagnostics (target ≥ 70%)
- Week 8: Decide whether to turn it ON by default. If problems arise, revert immediately with `NYASH_ABI_BASIC_ON=0`.

**Diagnostics/Observation**
- `NYASH_ABI_BASIC_DIAG=1` logs each ABI-path call as a single line. Intended for dev/CI observation; default OFF.
- A test-only leak guard detects missing retain/release calls (1000x new/free → total handle count = 0).

**CI Matrix (minimal)**
- Added: one quick (ABI ON) job, covering only the String/Array/Map targets currently being enabled
- Existing: keep the quick/integration (ABI OFF) jobs for regression monitoring

---

## Week 1-4: MapBox Implementation

### Overview

MapBox is a hash map that stores key-value pairs with deterministic iteration order.

### Core Methods

```hakorune
box MapBox {
    // Storage
    _buckets: ArrayBox       // Internal bucket storage
    _size: IntegerBox        // Element count
    _load_factor: FloatBox   // For resizing

    // Public API
    set(key: DataBox, value: DataBox) -> NullBox
    get(key: DataBox) -> DataBox | NullBox
    has(key: DataBox) -> BoolBox
    remove(key: DataBox) -> DataBox | NullBox
    size() -> IntegerBox
    keys() -> ArrayBox       // Returns keys in deterministic order
    values() -> ArrayBox     // Returns values in deterministic order
}
```

### Week 1: Foundation

**Deliverables**:
1. **Data Structures**:
   - Bucket storage design (chaining or open addressing)
   - Hash function for Symbol/Int/String
   - Collision resolution strategy
2. **Core Operations**:
   - `set(key, value)` implementation
   - `get(key)` implementation
   - Basic resizing logic (load factor > 0.75 → double capacity)
3. **Tests**:
   - Basic set/get operations (10 test cases)
   - Hash collision handling (5 test cases)
   - Resizing behavior (3 test cases)

### Week 2: Deterministic Hashing

**Deliverables**:
1. **Hash Strategy**:
   - Symbol hash: use the symbol ID (deterministic)
   - Int hash: use the integer value (deterministic)
   - String hash: use the UTF-8 bytes with a stable algorithm (e.g., FNV-1a)
2. **Key Normalization**:
   - Type detection (Symbol vs Int vs String)
   - Unified comparison function
   - Stable sort implementation
3. **Deterministic Iteration**:
   - `keys()` returns sorted keys (Symbol < Int < String)
   - `values()` returns values in key order
   - Provenance tracking (plugin_id, version)
4. **Tests**:
   - Mixed-type key insertion (Symbol, Int, String)
   - Deterministic iteration order (20 test cases)
   - Hash stability (same key → same bucket)

### Week 3: Advanced Operations

**Deliverables**:
1. **Remove Operation**:
   - `remove(key)` implementation
   - Bucket chain repair (if using chaining)
   - Size decrement
2. **Query Operations**:
   - `has(key)` implementation
   - `size()` implementation
   - Edge cases (empty map, single element)
3. **Boundary Checks**:
   - Reject ValueBox keys (Fail-Fast)
   - Accept DataBox inputs only
   - Clear error messages
4. **Tests**:
   - Remove operations (15 test cases)
   - Edge cases (empty map, removing a non-existent key)
   - ValueBox rejection (Fail-Fast verification)

### Week 4: MapBox Golden Tests

**Deliverables**:
1. **Golden Test Suite**:
   - 50+ test cases comparing Rust-MapBox vs Hako-MapBox
   - Identical output verification (keys, values, size)
   - Performance benchmarks (set/get/remove)
2. **Deterministic Verification**:
   - Same input → same output (10 runs per test)
   - Key order stability (Symbol < Int < String)
   - Provenance tracking validation
3. **Performance Baseline**:
   - Measure operation times (set/get/remove)
   - Compare against Rust-MapBox
   - Target: ≥ 70% of Rust performance
4. **Documentation**:
   - MapBox API reference
   - Usage examples
   - Performance characteristics (O(1) average, O(n) worst case)

---

## Week 5-8: ArrayBox Implementation

### Overview

ArrayBox is a dynamic array (list) with automatic capacity expansion.
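The "automatic capacity expansion" above can be made concrete with a small sketch. The following Rust code is a hypothetical illustration and is not the ArrayBox or RawBufferBox implementation. `GrowableArray` and its methods are invented names, and a `Vec<Option<T>>` stands in for `RawBufferBox`. It assumes the 2x growth policy and the fail-fast bounds check listed in the Week 5 and Week 6 deliverables.

```rust
// Hypothetical sketch: ArrayBox-style storage with 2x capacity growth.
// `buf.len()` plays the role of `_capacity`; `size` plays the role of `_size`.
struct GrowableArray<T> {
    buf: Vec<Option<T>>, // slots [0, size) are Some, the rest are None
    size: usize,
}

impl<T> GrowableArray<T> {
    fn new() -> Self {
        GrowableArray { buf: Vec::new(), size: 0 }
    }

    fn capacity(&self) -> usize {
        self.buf.len()
    }

    fn size(&self) -> usize {
        self.size
    }

    /// Appends a value, doubling the capacity first when the buffer is full.
    /// An empty buffer grows to 1, because doubling 0 would stay at 0.
    fn push(&mut self, value: T) {
        if self.size == self.buf.len() {
            let new_cap = if self.buf.is_empty() { 1 } else { self.buf.len() * 2 };
            self.buf.resize_with(new_cap, || None);
        }
        self.buf[self.size] = Some(value);
        self.size += 1;
    }

    /// Removes the last value; returns None on an empty array
    /// (the analogue of `pop() -> DataBox | NullBox`).
    fn pop(&mut self) -> Option<T> {
        if self.size == 0 {
            return None;
        }
        self.size -= 1;
        self.buf[self.size].take()
    }

    /// Fail-fast read: an out-of-bounds index panics instead of yielding a default.
    /// (A usize index cannot be negative, so only the upper bound is checked here.)
    fn get(&self, index: usize) -> &T {
        assert!(
            index < self.size,
            "ArrayBox.get: index out of bounds: {} >= {}",
            index,
            self.size
        );
        self.buf[index].as_ref().expect("slots below size are always filled")
    }
}

fn main() {
    let mut arr = GrowableArray::new();
    let mut caps = vec![arr.capacity()];
    for i in 0..5 {
        arr.push(i * 10);
        // Record the capacity only when it changes.
        if caps.last() != Some(&arr.capacity()) {
            caps.push(arr.capacity());
        }
    }
    println!("capacities: {:?}", caps); // capacities: [0, 1, 2, 4, 8]
    println!("size={} get(2)={}", arr.size(), arr.get(2)); // size=5 get(2)=20
    println!("pop={:?}", arr.pop()); // pop=Some(40)
}
```

The 0 → 1 special case is why a capacity test sequence starts 0 → 1 → 2 → 4 → 8. Doubling keeps `push` at O(1) amortized cost, because each growth step moves as many elements as were pushed since the previous one. The occasional resize is the O(n) worst case.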
### Core Methods

```hakorune
box ArrayBox {
    // Storage
    _data: RawBufferBox      // Internal buffer
    _size: IntegerBox        // Current element count
    _capacity: IntegerBox    // Allocated capacity

    // Public API
    push(value: DataBox) -> NullBox
    pop() -> DataBox | NullBox
    get(index: IntegerBox) -> DataBox | NullBox
    set(index: IntegerBox, value: DataBox) -> NullBox
    size() -> IntegerBox
    slice(start: IntegerBox, end: IntegerBox) -> ArrayBox
    concat(other: ArrayBox) -> ArrayBox
}
```

### Week 5: Foundation

**Deliverables**:
1. **Data Structures**:
   - RawBufferBox for storage
   - Capacity expansion strategy (2x growth)
   - Bounds checking
2. **Core Operations**:
   - `push(value)` implementation
   - `pop()` implementation
   - Capacity doubling logic
3. **Tests**:
   - Basic push/pop operations (10 test cases)
   - Capacity expansion (grow from 0 → 1 → 2 → 4 → 8)
   - Empty array edge cases

### Week 6: Index Operations

**Deliverables**:
1. **Index Access**:
   - `get(index)` implementation
   - `set(index, value)` implementation
   - Bounds checking (index < 0 or index ≥ size → error)
2. **Fail-Fast Boundaries**:
   - Out-of-bounds access → RuntimeError
   - Negative index → RuntimeError
   - Clear error messages with context
3. **Tests**:
   - Valid index access (20 test cases)
   - Out-of-bounds access (10 test cases)
   - Negative index handling (5 test cases)

### Week 7: Advanced Operations

**Deliverables**:
1. **Slice Operation**:
   - `slice(start, end)` implementation
   - Return a new ArrayBox with the elements in [start, end)
   - Handle edge cases (start > end, negative indices)
2. **Concat Operation**:
   - `concat(other)` implementation
   - Create a new ArrayBox with the combined elements
   - Efficient memory allocation
3. **ValueBox/DataBox Boundaries**:
   - Reject ValueBox elements (Fail-Fast)
   - Accept DataBox inputs only
   - Unpack at entry/exit points
4. **Tests**:
   - Slice operations (15 test cases)
   - Concat operations (10 test cases)
   - ValueBox rejection (5 test cases)

### Week 8: ArrayBox Golden Tests

**Deliverables**:
1. **Golden Test Suite**:
   - 50+ test cases comparing Rust-ArrayBox vs Hako-ArrayBox
   - Identical output verification (size, elements, order)
   - Performance benchmarks (push/pop/get/set)
2. **Performance Baseline**:
   - Measure operation times (all methods)
   - Compare against Rust-ArrayBox
   - Target: ≥ 70% of Rust performance
3. **Large-Scale Tests**:
   - 10,000-element arrays
   - Stress test capacity expansion
   - Memory usage validation
4. **Documentation**:
   - ArrayBox API reference
   - Usage examples
   - Performance characteristics (push: O(1) amortized, O(n) worst case when the buffer grows)

---

## Key Design Decisions

### 1. Deterministic Key Order (Symbol < Int < String)

**Rationale**:
- Predictable iteration order for debugging
- Reproducible behavior across runs
- Stable sorting for golden tests

**Implementation**:

```hakorune
// Key comparison function
method compare_keys(key1: DataBox, key2: DataBox) -> IntegerBox {
    local type1 = key1.type_id()
    local type2 = key2.type_id()

    // Type priority: Symbol < Int < String
    if (type1 != type2) {
        if (type1 == :Symbol) { return -1 }
        if (type2 == :Symbol) { return 1 }
        if (type1 == :Int) { return -1 }
        if (type2 == :Int) { return 1 }
        // Only Symbol/Int/String keys are supported (Fail-Fast)
        panic("compare_keys: unsupported key type")
    }

    // Same type: natural order
    return key1.compare(key2)
}
```

### 2. ValueBox/DataBox Boundaries

**Rationale**:
- ValueBox is ephemeral (pipeline boundaries only)
- DataBox is persistent (long-lived storage)
- Fail-Fast prevents confusion between the two

**Implementation**:

```hakorune
// MapBox.set with boundary check
method set(key: DataBox, value: DataBox) -> NullBox {
    if (key.is_valuebox()) {
        panic("MapBox.set: key must be DataBox, not ValueBox")
    }
    if (value.is_valuebox()) {
        panic("MapBox.set: value must be DataBox, not ValueBox")
    }

    // Proceed with the actual set operation:
    // map the hash onto a valid bucket index before the lookup
    local hash = me._hash(key)
    local bucket = me._buckets.get(hash % me._buckets.size())
    bucket.insert(key, value)
    // (size update and load-factor resize check omitted here)
}
```

### 3. Fail-Fast Error Handling

**Rationale**:
- No silent fallbacks (prevents hidden bugs)
- Clear error messages (aid debugging)
- Early detection (errors stop at their source)

**Implementation**:

```hakorune
// ArrayBox.get with Fail-Fast
method get(index: IntegerBox) -> DataBox | NullBox {
    if (index < 0) {
        panic("ArrayBox.get: negative index: " + index)
    }
    if (index >= me._size) {
        panic("ArrayBox.get: index out of bounds: " + index + " >= " + me._size)
    }
    return me._data.get(index)
}
```

---

## Success Criteria

### Mandatory (✅ All must pass)

1. **MapBox/ArrayBox Fully Implemented**:
   - All methods work in Pure Hakorune
   - No Rust dependencies (except HostBridge)
2. **Golden Tests Pass**:
   - Rust-Collections vs Hako-Collections: 100% parity
   - 100+ test cases with identical output
   - Deterministic behavior verified
3. **Deterministic Behavior**:
   - Key order is predictable (Symbol < Int < String)
   - Iteration order is stable
   - Provenance tracking works
4. **Performance**:
   - Hako-Collections ≥ 70% of Rust-Collections speed
   - Basic operations (set/get/push/pop) are O(1) on average
   - Large-scale data (10,000 elements) is handled at practical speed

### Excluded (⚠️ NOT in Phase 20.7)

- **GC (Garbage Collection)**: Deferred to Phase 20.8
- **Optimization**: Basic implementation only (no profiling-driven optimization yet)
- **Concurrency**: Single-threaded only
- **Persistence**: In-memory only (no disk storage)

---

## Dependencies

### Prerequisite: Phase 20.6

Phase 20.7 requires:
1. **VM Core Complete**: All 16 MIR instructions working
2. **Dispatch Unification**: A single Resolver path for all method calls
3. **Golden Test Framework**: Rust-VM vs Hako-VM comparison working

### Provides to: Phase 20.8

Phase 20.7 delivers:
1. **Collections in Hakorune**: MapBox/ArrayBox fully self-hosted
2. **Deterministic Iteration**: Stable key order for GC root scanning
3. **Reference Tracking**: Preparation for the GC mark phase

---

## Risk Mitigation

### Risk 1: Performance Below 70%

**Mitigation**:
- Measure every week (not just in Weeks 4 and 8)
- Profile hot paths early (hash function, bounds checks)
- Accept slower performance initially (it can be optimized later)

### Risk 2: Deterministic Hashing Complexity

**Mitigation**:
- Use simple, proven hash functions (FNV-1a for strings)
- Use a stable sort implementation (merge sort, not quicksort)
- Test hash stability with 1,000+ keys

### Risk 3: ValueBox/DataBox Confusion

**Mitigation**:
- Clear error messages ("must be DataBox, not ValueBox")
- Fail-Fast at entry points (set/push methods)
- Golden tests verify boundary enforcement

---

## Related Documents

- **Phase 20.5**: [Pure Hakorune Roadmap](../phase-20.5/PURE_HAKORUNE_ROADMAP.md)
- **Phase 20.6**: [VM Core Complete](../phase-20.6/)
- **Phase 20.8**: [GC v0 + Rust Deprecation](../phase-20.8/)
- **Box-First Principle**: [CLAUDE.md](../../../../CLAUDE.md#🧱-先頭原則-箱理論box-first)
- **Golden Testing Strategy**: [PURE_HAKORUNE_ROADMAP.md#🧪-golden-testing-strategy](../phase-20.5/PURE_HAKORUNE_ROADMAP.md#-golden-testing-strategy)

---

**Status**: PLANNED

**Dependencies**: Phase 20.6 (VM Core + Dispatch)

**Delivers**: Phase D (Collections in Pure Hakorune)

**Timeline**: 8 weeks (2026-05-25 → 2026-07-19)