Phase 20.7: Collections in Hakorune - Implementation Plan
Duration: 8 weeks (2026-05-25 → 2026-07-19)
Prerequisite: Phase 20.6 (VM Core Complete + Dispatch Unification)
Completes: Phase D (Collections - MapBox/ArrayBox in Pure Hakorune)
Executive Summary
Phase 20.7 rolls out the C‑ABI bridge for the Basic boxes (String/Array/Map) so that the Rust VM and the Hako VM call the same implementation. The default profile uses dynamic plugins; the embedded/WASM profile uses built‑in plugins (static linking + LTO). Pure Hakorune implementations remain available for research but are not the default execution path.
Key Achievements:
- String/Array/Map via C‑ABI (plugins) with identical behavior across Rust/Hako VMs
- Built‑in plugin preset (WASM/embedded) enables zero‑cost inlining via the Two‑Layer Export
- Golden tests proving Rust vs Hako parity for collections (size/push/get/set/slice/indexOf/substring)
- ABI-path performance ≥ 70% of the Rust runtime
Critical Principles:
- C‑ABI as the single source of truth (SSOT), with a Two‑Layer Export (Stable ABI / Inlinable API)
- Deterministic Behavior: Key order is predictable (Symbol < Int < String)
- ValueBox/DataBox Boundaries: ValueBox is temporary (pipeline only), DataBox is persistent
- Fail‑Fast: Unknown methods/capability violations → RuntimeError (no silent fallbacks)
C‑ABI Bridge Rollout (String/Array/Map)
Flags & Presets
- NYASH_ABI_BASIC_ON=1 … Enable the ABI path for Basic boxes (default: OFF; turned on in stages, quick → integration)
- NYASH_ABI_BUILTIN_LINK=1 … Built‑in plugins (LTO/inline) for the embedded/WASM preset
- NYASH_NORMALIZE_CORE_EXTERN=1 … Gate that makes the Builder/VM force-normalize String/Array/Map calls to Extern("nyrt.*"). Default ON (set to 0 to roll back to the BoxCall/Plugin path).
- NYASH_VERIFY_CORE_EXTERN=1 … Strict verifier: Fail‑Fast check that no Method/ModuleFunction forms remain (for local observation; default OFF).
- Presets: default (plugins), embedded (built‑in), research (pure)
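The gates above are plain environment variables. The Rust sketch below shows one way a runtime could read them with the stated defaults. `AbiFlags`, `parse_flag`, the OFF default for NYASH_ABI_BUILTIN_LINK (the plan does not state one) and the choice to reject any value other than `0`/`1` are assumptions for illustration, not the project's actual code.

```rust
/// Hypothetical flag parser: unset -> default, "1" -> ON, "0" -> OFF.
/// Any other value is rejected Fail-Fast (an assumption of this sketch).
fn parse_flag(name: &str, raw: Option<&str>, default: bool) -> Result<bool, String> {
    match raw {
        None => Ok(default),
        Some("1") => Ok(true),
        Some("0") => Ok(false),
        Some(other) => Err(format!("{name}: expected 0 or 1, got {other:?}")),
    }
}

/// Hypothetical snapshot of the collection-related gates.
#[derive(Debug, PartialEq)]
pub struct AbiFlags {
    pub abi_basic_on: bool,          // NYASH_ABI_BASIC_ON, default OFF
    pub abi_builtin_link: bool,      // NYASH_ABI_BUILTIN_LINK, default OFF (assumed)
    pub normalize_core_extern: bool, // NYASH_NORMALIZE_CORE_EXTERN, default ON
    pub verify_core_extern: bool,    // NYASH_VERIFY_CORE_EXTERN, default OFF
}

impl AbiFlags {
    /// `lookup` abstracts the environment so the defaults can be tested
    /// without mutating process-wide state.
    pub fn load(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let get = |name: &str, default: bool| parse_flag(name, lookup(name).as_deref(), default);
        Ok(AbiFlags {
            abi_basic_on: get("NYASH_ABI_BASIC_ON", false)?,
            abi_builtin_link: get("NYASH_ABI_BUILTIN_LINK", false)?,
            normalize_core_extern: get("NYASH_NORMALIZE_CORE_EXTERN", true)?,
            verify_core_extern: get("NYASH_VERIFY_CORE_EXTERN", false)?,
        })
    }
}
```

Production code would pass `|k| std::env::var(k).ok()` as the lookup; a rollback is then just `NYASH_ABI_BASIC_ON=0` with no other change.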
Acceptance Criteria
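- VM/LLVM/Hako (nyvm) produce identical output for collection operations (size/get/set/push/slice/indexOf/substring)
- NYASH_ABI_BASIC_ON=1 is turned on in stages (quick → integration), with all tests green at each stage
- Performance: the ABI path reaches ≥ 70% of the Rust implementation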
Rollback
- NYASH_ABI_BASIC_ON=0 restores the existing path at any time (a safety net built into the design)
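Pilot Rollout Plan (Fine-Grained Staged Enablement)
Goal
- To minimize risk, enable the ABI path one feature at a time (pilot → observe → roll out more widely).
Week-by-Week (Tentative)
- Week 1 (quick only): enable String.length via the ABI path (NYASH_ABI_BASIC_ON=1) → observe for 48h
- Week 2 (quick only): additionally enable Array.size/push → observe
- Week 3-4 (quick only): additionally enable Map.set/get → observe
- Week 5: extend the quick-tier ABI enablement to integration (String/Array/Map)
- Week 6-7: performance tuning and stronger diagnostics (target: ≥ 70%)
- Week 8: decide whether to make it ON by default (if problems appear, revert immediately with NYASH_ABI_BASIC_ON=0)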
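Diagnostics / Observability
- NYASH_ABI_BASIC_DIAG=1 logs each ABI-path call as a single line (for dev/CI observation; default OFF)
- A test-only leak guard detects retain/release leaks (1000× new/free → total handle count = 0)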
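The leak guard can be pictured as a reference-counted handle table whose live count must return to zero after a churn loop. The sketch below is hypothetical: `HandleTable`, `new_handle`, `retain`, `release` and `leak_guard` are illustrative names, not the real C‑ABI surface.

```rust
use std::collections::HashMap;

/// Hypothetical handle registry with per-handle reference counts.
#[derive(Default)]
pub struct HandleTable {
    next: u64,
    refcounts: HashMap<u64, u32>,
}

impl HandleTable {
    /// Allocates a handle that starts with one reference.
    pub fn new_handle(&mut self) -> u64 {
        self.next += 1;
        self.refcounts.insert(self.next, 1);
        self.next
    }

    pub fn retain(&mut self, h: u64) {
        // Fail-Fast: retaining a freed or unknown handle is a bug.
        *self.refcounts.get_mut(&h).expect("retain on dead handle") += 1;
    }

    pub fn release(&mut self, h: u64) {
        let rc = self.refcounts.get_mut(&h).expect("release on dead handle");
        *rc -= 1;
        if *rc == 0 {
            self.refcounts.remove(&h); // last reference gone: object is freed
        }
    }

    pub fn live_handles(&self) -> usize {
        self.refcounts.len()
    }
}

/// Test-only guard: churn `n` objects and report how many handles survive.
/// A balanced retain/release protocol must leave zero live handles.
pub fn leak_guard(n: usize) -> usize {
    let mut table = HandleTable::default();
    for _ in 0..n {
        let h = table.new_handle();
        table.retain(h);  // e.g. the VM copies the handle into a register
        table.release(h); // the register is dropped
        table.release(h); // the owner frees the object
    }
    table.live_handles()
}
```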
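CI Matrix (Minimal)
- Add: one quick (ABI ON) job, covering only the String/Array/Map methods enabled so far in the staged rollout
- Keep: the existing quick/integration (ABI OFF) jobs, to monitor for regressions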
Week 1-4: MapBox Implementation
Overview
MapBox is a hash map storing key-value pairs with deterministic iteration order.
Core Methods
box MapBox {
    // Storage
    _buckets: ArrayBox      // Internal bucket storage
    _size: IntegerBox       // Element count
    _load_factor: FloatBox  // For resizing

    // Public API
    set(key: DataBox, value: DataBox) -> NullBox
    get(key: DataBox) -> DataBox | NullBox
    has(key: DataBox) -> BoolBox
    remove(key: DataBox) -> DataBox | NullBox
    size() -> IntegerBox
    keys() -> ArrayBox      // Returns keys in deterministic order
    values() -> ArrayBox    // Returns values in deterministic order
}
Week 1: Foundation
Deliverables:
- Data Structures:
  - Bucket storage design (chaining or open addressing)
  - Hash function for Symbol/Int/String
  - Collision resolution strategy
- Core Operations:
  - set(key, value) implementation
  - get(key) implementation
  - Basic resizing logic (load factor > 0.75 → double capacity)
- Tests:
  - Basic set/get operations (10 test cases)
  - Hash collision handling (5 test cases)
  - Resizing behavior (3 test cases)
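A minimal Rust sketch of the Week 1 foundation, assuming separate chaining, Int keys that hash to their own value, 8 initial buckets and the 0.75 load-factor rule above. `ChainedMap` and its methods are hypothetical names; the real MapBox is written in Hakorune.

```rust
/// Hypothetical separate-chaining map with i64 keys (illustration only).
pub struct ChainedMap<V> {
    buckets: Vec<Vec<(i64, V)>>, // each bucket is a chain of (key, value)
    size: usize,                 // number of stored entries
}

impl<V> ChainedMap<V> {
    const INITIAL_BUCKETS: usize = 8; // assumed starting size
    const MAX_LOAD: f64 = 0.75;       // grow when size / buckets exceeds this

    pub fn new() -> Self {
        let mut buckets: Vec<Vec<(i64, V)>> = Vec::new();
        buckets.resize_with(Self::INITIAL_BUCKETS, Vec::new);
        ChainedMap { buckets, size: 0 }
    }

    fn bucket_index(key: i64, bucket_count: usize) -> usize {
        // Int keys hash to their own value; reinterpret as u64 so that
        // negative keys also land in a valid bucket.
        ((key as u64) % (bucket_count as u64)) as usize
    }

    pub fn set(&mut self, key: i64, value: V) {
        let idx = Self::bucket_index(key, self.buckets.len());
        let chain = &mut self.buckets[idx];
        if let Some(pos) = chain.iter().position(|(k, _)| *k == key) {
            chain[pos].1 = value; // existing key: overwrite, size unchanged
            return;
        }
        chain.push((key, value));
        self.size += 1;
        if (self.size as f64) / (self.buckets.len() as f64) > Self::MAX_LOAD {
            self.grow();
        }
    }

    pub fn get(&self, key: i64) -> Option<&V> {
        let idx = Self::bucket_index(key, self.buckets.len());
        self.buckets[idx].iter().find(|(k, _)| *k == key).map(|(_, v)| v)
    }

    pub fn size(&self) -> usize {
        self.size
    }

    pub fn bucket_count(&self) -> usize {
        self.buckets.len()
    }

    /// Doubles the bucket array and rehashes every entry into it.
    fn grow(&mut self) {
        let new_count = self.buckets.len() * 2;
        let mut new_buckets: Vec<Vec<(i64, V)>> = Vec::new();
        new_buckets.resize_with(new_count, Vec::new);
        for chain in self.buckets.drain(..) {
            for (k, v) in chain {
                new_buckets[Self::bucket_index(k, new_count)].push((k, v));
            }
        }
        self.buckets = new_buckets;
    }
}
```

With 8 buckets, the 7th distinct key pushes the load to 7/8 = 0.875 and triggers the first doubling to 16 buckets.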
Week 2: Deterministic Hashing
Deliverables:
- Hash Strategy:
  - Symbol hash: Use symbol ID (deterministic)
  - Int hash: Use integer value (deterministic)
  - String hash: Use UTF-8 bytes with stable algorithm (e.g., FNV-1a)
- Key Normalization:
  - Type detection (Symbol vs Int vs String)
  - Unified comparison function
  - Stable sort implementation
- Deterministic Iteration:
  - keys() returns sorted keys (Symbol < Int < String)
  - values() returns values in key order
  - Provenance tracking (plugin_id, version)
- Tests:
  - Mixed-type key insertion (Symbol, Int, String)
  - Deterministic iteration order (20 test cases)
  - Hash stability (same key → same bucket)
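The Week 2 hash strategy can be sketched as a per-type dispatch. This assumes symbols are interned to a numeric ID and uses the standard 64-bit FNV-1a constants; `Key`, `key_hash` and `fnv1a_64` are hypothetical names. A Symbol and an Int with the same number may share a bucket, which is harmless because lookups compare full keys.

```rust
/// Hypothetical key type covering the plan's three key kinds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Key {
    Symbol(u32), // assumed: symbols are interned to a numeric ID
    Int(i64),
    Str(String),
}

/// 64-bit FNV-1a: no random seed, so the result is identical across runs,
/// processes and VMs.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64;                // "1a" variant: XOR first...
        hash = hash.wrapping_mul(PRIME); // ...then multiply modulo 2^64
    }
    hash
}

/// Deterministic hash per key type, following the Week 2 strategy.
pub fn key_hash(key: &Key) -> u64 {
    match key {
        Key::Symbol(id) => *id as u64,          // symbol ID
        Key::Int(n) => *n as u64,               // integer value (two's complement)
        Key::Str(s) => fnv1a_64(s.as_bytes()),  // UTF-8 bytes
    }
}
```

The bucket index is then `key_hash(key) % bucket_count`, so the same key always lands in the same bucket for a given table size.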
Week 3: Advanced Operations
Deliverables:
- Remove Operation:
  - remove(key) implementation
  - Bucket chain repair (if using chaining)
  - Size decrement
- Query Operations:
  - has(key) implementation
  - size() implementation
  - Edge cases (empty map, single element)
- Boundary Checks:
  - Reject ValueBox keys (Fail-Fast)
  - Ensure DataBox inputs only
  - Clear error messages
- Tests:
  - Remove operations (15 test cases)
  - Edge cases (empty map, remove non-existent key)
  - ValueBox rejection (Fail-Fast verification)
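Week 3's remove/has/size on a chained layout might look like the following. The fixed bucket count (no resizing) and the `Chains` name are simplifications for this sketch, and `swap_remove` is one possible way to repair a chain, not a decided design.

```rust
/// Hypothetical fixed-size chained map showing remove/has/size.
pub struct Chains<V> {
    buckets: Vec<Vec<(i64, V)>>,
    size: usize,
}

impl<V> Chains<V> {
    pub fn with_buckets(n: usize) -> Self {
        assert!(n > 0, "bucket count must be positive");
        let mut buckets: Vec<Vec<(i64, V)>> = Vec::new();
        buckets.resize_with(n, Vec::new);
        Chains { buckets, size: 0 }
    }

    fn index(&self, key: i64) -> usize {
        ((key as u64) % (self.buckets.len() as u64)) as usize
    }

    pub fn set(&mut self, key: i64, value: V) {
        let i = self.index(key);
        let chain = &mut self.buckets[i];
        match chain.iter().position(|(k, _)| *k == key) {
            Some(pos) => chain[pos].1 = value, // overwrite, size unchanged
            None => {
                chain.push((key, value));
                self.size += 1;
            }
        }
    }

    /// Removes `key` and returns its value; a missing key yields None
    /// (NullBox in the plan's API) rather than an error.
    pub fn remove(&mut self, key: i64) -> Option<V> {
        let i = self.index(key);
        let chain = &mut self.buckets[i];
        let pos = chain.iter().position(|(k, _)| *k == key)?;
        // swap_remove repairs the chain in O(1) by moving the last entry into
        // the hole; chain order does not matter because keys() sorts anyway.
        let (_, value) = chain.swap_remove(pos);
        self.size -= 1;
        Some(value)
    }

    pub fn has(&self, key: i64) -> bool {
        self.buckets[self.index(key)].iter().any(|(k, _)| *k == key)
    }

    pub fn size(&self) -> usize {
        self.size
    }
}
```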
Week 4: MapBox Golden Tests
Deliverables:
- Golden Test Suite:
  - 50+ test cases comparing Rust-MapBox vs Hako-MapBox
  - Identical output verification (keys, values, size)
  - Performance benchmarks (set/get/remove)
- Deterministic Verification:
  - Same input → same output (10 runs per test)
  - Key order stability (Symbol < Int < String)
  - Provenance tracking validation
- Performance Baseline:
  - Measure operation times (set/get/remove)
  - Compare against Rust-MapBox
  - Target: ≥ 70% of Rust performance
- Documentation:
  - MapBox API reference
  - Usage examples
  - Performance characteristics (O(1) average, O(n) worst case)
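The golden comparison reduces to running one script against a reference and a candidate and comparing the resulting transcripts, repeated to check determinism. In this sketch a `BTreeMap` plays the Rust-MapBox role and a hypothetical `AssocMap` plays the candidate; the `Op` script format and the `MapUnderTest` trait are assumptions, not the project's golden-test framework.

```rust
use std::collections::BTreeMap;

/// One step of a golden script (hypothetical format).
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Set(i64, i64),
    Get(i64),
    Remove(i64),
}

/// Minimal surface both sides expose to the harness.
pub trait MapUnderTest {
    fn put(&mut self, k: i64, v: i64);
    fn lookup(&self, k: i64) -> Option<i64>;
    fn delete(&mut self, k: i64) -> Option<i64>;
    fn ordered_keys(&self) -> Vec<i64>;
}

/// Reference side: BTreeMap already iterates in sorted key order.
impl MapUnderTest for BTreeMap<i64, i64> {
    fn put(&mut self, k: i64, v: i64) { self.insert(k, v); }
    fn lookup(&self, k: i64) -> Option<i64> { self.get(&k).copied() }
    fn delete(&mut self, k: i64) -> Option<i64> { self.remove(&k) }
    fn ordered_keys(&self) -> Vec<i64> { self.keys().copied().collect() }
}

/// Stand-in candidate: an unsorted Vec that sorts only when keys are requested.
#[derive(Default)]
pub struct AssocMap(Vec<(i64, i64)>);

impl MapUnderTest for AssocMap {
    fn put(&mut self, k: i64, v: i64) {
        match self.0.iter().position(|e| e.0 == k) {
            Some(p) => self.0[p].1 = v,
            None => self.0.push((k, v)),
        }
    }
    fn lookup(&self, k: i64) -> Option<i64> {
        self.0.iter().find(|e| e.0 == k).map(|e| e.1)
    }
    fn delete(&mut self, k: i64) -> Option<i64> {
        let p = self.0.iter().position(|e| e.0 == k)?;
        Some(self.0.remove(p).1)
    }
    fn ordered_keys(&self) -> Vec<i64> {
        let mut ks: Vec<i64> = self.0.iter().map(|e| e.0).collect();
        ks.sort();
        ks
    }
}

/// Renders every observable result of a script into one comparable transcript.
pub fn transcript<M: MapUnderTest>(mut map: M, script: &[Op]) -> String {
    let mut out = String::new();
    for &op in script {
        let line = match op {
            Op::Set(k, v) => { map.put(k, v); format!("set {k} {v}") }
            Op::Get(k) => format!("get {k} -> {:?}", map.lookup(k)),
            Op::Remove(k) => format!("remove {k} -> {:?}", map.delete(k)),
        };
        out.push_str(&line);
        out.push('\n');
    }
    out.push_str(&format!("keys {:?}\n", map.ordered_keys()));
    out
}

/// Golden check: the candidate must reproduce the reference on every run.
pub fn golden_parity(script: &[Op], runs: usize) -> bool {
    let golden = transcript(BTreeMap::<i64, i64>::new(), script);
    (0..runs).all(|_| transcript(AssocMap::default(), script) == golden)
}
```

In the real suite the candidate would be Hako-MapBox driven through the VM, and the transcript could be extended with values() and size().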
Week 5-8: ArrayBox Implementation
Overview
ArrayBox is a dynamic array (list) with automatic capacity expansion.
Core Methods
box ArrayBox {
    // Storage
    _data: RawBufferBox     // Internal buffer
    _size: IntegerBox       // Current element count
    _capacity: IntegerBox   // Allocated capacity

    // Public API
    push(value: DataBox) -> NullBox
    pop() -> DataBox | NullBox
    get(index: IntegerBox) -> DataBox | NullBox
    set(index: IntegerBox, value: DataBox) -> NullBox
    size() -> IntegerBox
    slice(start: IntegerBox, end: IntegerBox) -> ArrayBox
    concat(other: ArrayBox) -> ArrayBox
}
Week 5: Foundation
Deliverables:
- Data Structures:
  - RawBufferBox for storage
  - Capacity expansion strategy (2x growth)
  - Bounds checking
- Core Operations:
  - push(value) implementation
  - pop() implementation
  - Capacity doubling logic
- Tests:
  - Basic push/pop operations (10 test cases)
  - Capacity expansion (grow from 0 → 1 → 2 → 4 → 8)
  - Empty array edge cases
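The Week 5 growth policy (2x, starting at 1 from an empty array) can be sketched as follows. A `Vec<Option<T>>` whose length is the logical capacity stands in for RawBufferBox; `GrowArray` is a hypothetical name.

```rust
/// Hypothetical growable array mirroring ArrayBox's _data/_size/_capacity.
pub struct GrowArray<T> {
    data: Vec<Option<T>>, // slots [0, size) are Some, the rest are None
    size: usize,
}

impl<T> GrowArray<T> {
    pub fn new() -> Self {
        GrowArray { data: Vec::new(), size: 0 }
    }

    pub fn size(&self) -> usize {
        self.size
    }

    /// Logical capacity = number of allocated slots.
    pub fn capacity(&self) -> usize {
        self.data.len()
    }

    pub fn push(&mut self, value: T) {
        if self.size == self.capacity() {
            self.grow();
        }
        self.data[self.size] = Some(value);
        self.size += 1;
    }

    /// Empty array yields None (NullBox in the plan's API).
    pub fn pop(&mut self) -> Option<T> {
        if self.size == 0 {
            return None;
        }
        self.size -= 1;
        self.data[self.size].take()
    }

    /// 2x growth, starting from 1 when empty: 0 -> 1 -> 2 -> 4 -> 8.
    fn grow(&mut self) {
        let new_cap = if self.capacity() == 0 { 1 } else { self.capacity() * 2 };
        let mut new_data: Vec<Option<T>> = Vec::with_capacity(new_cap);
        new_data.extend(self.data.drain(..)); // move the existing slots over
        new_data.resize_with(new_cap, || None);
        self.data = new_data;
    }
}
```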
Week 6: Index Operations
Deliverables:
- Index Access:
  - get(index) implementation
  - set(index, value) implementation
  - Bounds checking (index < 0 or index ≥ size → error)
- Fail-Fast Boundaries:
  - Out-of-bounds access → RuntimeError
  - Negative index → RuntimeError
  - Clear error messages with context
- Tests:
  - Valid index access (20 test cases)
  - Out-of-bounds access (10 test cases)
  - Negative index handling (5 test cases)
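A sketch of Week 6's Fail-Fast index checks, assuming a signed 64-bit index (an IntegerBox can be negative) and a hypothetical `RuntimeError` wrapper. The messages mirror the Hakorune ArrayBox.get example under Key Design Decisions.

```rust
use std::fmt;

/// Hypothetical stand-in for the VM's RuntimeError.
#[derive(Debug, PartialEq)]
pub struct RuntimeError(pub String);

impl fmt::Display for RuntimeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "RuntimeError: {}", self.0)
    }
}

/// Validates a signed index against the current size before any access.
fn check_index(op: &str, index: i64, size: usize) -> Result<usize, RuntimeError> {
    if index < 0 {
        return Err(RuntimeError(format!("{op}: negative index: {index}")));
    }
    if (index as u64) >= (size as u64) {
        return Err(RuntimeError(format!(
            "{op}: index out of bounds: {index} >= {size}"
        )));
    }
    Ok(index as usize)
}

pub fn checked_get<T: Clone>(data: &[T], index: i64) -> Result<T, RuntimeError> {
    let i = check_index("ArrayBox.get", index, data.len())?;
    Ok(data[i].clone())
}

pub fn checked_set<T>(data: &mut [T], index: i64, value: T) -> Result<(), RuntimeError> {
    let i = check_index("ArrayBox.set", index, data.len())?;
    data[i] = value;
    Ok(())
}
```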
Week 7: Advanced Operations
Deliverables:
- Slice Operation:
  - slice(start, end) implementation
  - Return new ArrayBox with elements [start, end)
  - Handle edge cases (start > end, negative indices)
- Concat Operation:
  - concat(other) implementation
  - Create new ArrayBox with combined elements
  - Efficient memory allocation
- ValueBox/DataBox Boundaries:
  - Reject ValueBox elements (Fail-Fast)
  - Ensure DataBox inputs only
  - Unpack at entry/exit points
- Tests:
  - Slice operations (15 test cases)
  - Concat operations (10 test cases)
  - ValueBox rejection (5 test cases)
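Week 7's slice and concat, sketched over plain slices. The plan leaves the start > end and negative-index behaviour open; this sketch assumes both are rejected Fail-Fast, consistent with Week 6, rather than clamped or wrapped. `slice_range` and `concat_arrays` are hypothetical names.

```rust
/// Returns a new array with elements [start, end).
/// Assumption: negative indices, start > end and end > len are all errors.
pub fn slice_range<T: Clone>(data: &[T], start: i64, end: i64) -> Result<Vec<T>, String> {
    if start < 0 || end < 0 {
        return Err(format!("ArrayBox.slice: negative index: start={start}, end={end}"));
    }
    if start > end {
        return Err(format!("ArrayBox.slice: start > end: {start} > {end}"));
    }
    if (end as u64) > (data.len() as u64) {
        return Err(format!("ArrayBox.slice: end out of bounds: {end} > {}", data.len()));
    }
    Ok(data[start as usize..end as usize].to_vec())
}

/// Builds a new array from `a` followed by `b`. Reserving the exact final
/// length up front means the combined copy needs a single allocation.
pub fn concat_arrays<T: Clone>(a: &[T], b: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend_from_slice(a);
    out.extend_from_slice(b);
    out
}
```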
Week 8: ArrayBox Golden Tests
Deliverables:
- Golden Test Suite:
  - 50+ test cases comparing Rust-ArrayBox vs Hako-ArrayBox
  - Identical output verification (size, elements, order)
  - Performance benchmarks (push/pop/get/set)
- Performance Baseline:
  - Measure operation times (all methods)
  - Compare against Rust-ArrayBox
  - Target: ≥ 70% of Rust performance
- Large-Scale Tests:
  - 10,000-element arrays
  - Stress test capacity expansion
  - Memory usage validation
- Documentation:
  - ArrayBox API reference
  - Usage examples
  - Performance characteristics (push: O(1) amortized, O(n) worst case when the buffer grows)
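Before running the 10,000-element stress tests, it helps to know what 2x growth should cost. This small model (hypothetical `growth_cost`) counts reallocations and element copies for `n` pushes under the Week 5 policy, without allocating anything.

```rust
/// Returns (reallocations, elements copied during growth, final capacity)
/// for `n` pushes starting from capacity 0 with 0 -> 1 -> 2 -> 4 growth.
pub fn growth_cost(n: usize) -> (usize, usize, usize) {
    let (mut size, mut capacity) = (0usize, 0usize);
    let (mut reallocs, mut copies) = (0usize, 0usize);
    for _ in 0..n {
        if size == capacity {
            capacity = if capacity == 0 { 1 } else { capacity * 2 };
            reallocs += 1;
            copies += size; // every live element moves to the new buffer
        }
        size += 1;
    }
    (reallocs, copies, capacity)
}
```

For 10,000 pushes this gives 15 reallocations, 16,383 copied elements (under 2 copies per push on average, which is why push is O(1) amortized) and a final capacity of 16,384, so memory-usage checks should expect up to roughly 2x overhead.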
Key Design Decisions
1. Deterministic Key Order (Symbol < Int < String)
Rationale:
- Predictable iteration order for debugging
- Reproducible behavior across runs
- Stable sorting for golden tests
Implementation:
// Key comparison function
method compare_keys(key1: DataBox, key2: DataBox) -> IntegerBox {
    local type1 = key1.type_id()
    local type2 = key2.type_id()
    // Type priority: Symbol < Int < String
    if (type1 != type2) {
        if (type1 == :Symbol) { return -1 }
        if (type2 == :Symbol) { return 1 }
        if (type1 == :Int) { return -1 }
        if (type2 == :Int) { return 1 }
        // Any other mixed-type pair involves an unsupported key type: Fail-Fast
        // instead of falling through to a cross-type compare
        panic("compare_keys: unsupported key type")
    }
    // Same type: natural order
    return key1.compare(key2)
}
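A Rust mirror of this key model could get the same priority from a derived `Ord`: enum variants compare by declaration order first, then by payload. `MapKey` is a hypothetical type, and comparing symbols by an interned numeric ID is an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical Rust mirror of the MapBox key model. Declaring the variants
/// in the order Symbol, Int, Str makes the derived Ord encode
/// Symbol < Int < String directly.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum MapKey {
    Symbol(u32), // assumed: interned symbol ID, compared numerically
    Int(i64),
    Str(String), // String's Ord is lexicographic over UTF-8 bytes
}

/// Same -1 / 0 / 1 contract as the Hakorune compare_keys above.
pub fn compare_keys(a: &MapKey, b: &MapKey) -> i32 {
    match a.cmp(b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Deterministic iteration order for keys(); `sort` is a stable sort.
pub fn ordered_keys(mut keys: Vec<MapKey>) -> Vec<MapKey> {
    keys.sort();
    keys
}
```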
2. ValueBox/DataBox Boundaries
Rationale:
- ValueBox is ephemeral (pipeline boundaries only)
- DataBox is persistent (long-lived storage)
- Fail-Fast prevents confusion
Implementation:
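// MapBox.set with boundary check
method set(key: DataBox, value: DataBox) -> NullBox {
    if (key.is_valuebox()) {
        panic("MapBox.set: key must be DataBox, not ValueBox")
    }
    if (value.is_valuebox()) {
        panic("MapBox.set: value must be DataBox, not ValueBox")
    }
    // Proceed with the actual set: reduce the hash to a bucket index first
    local index = me._hash(key) % me._buckets.size()
    local bucket = me._buckets.get(index)
    // Assumed: bucket.insert returns true only when the key was new,
    // so overwriting an existing key leaves _size unchanged
    if (bucket.insert(key, value)) {
        me._size = me._size + 1
    }
}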
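The same boundary check as a Rust sketch. `Arg` is a hypothetical two-variant model of ValueBox/DataBox, and returning `Err` instead of panicking (so the caller can raise a RuntimeError) is an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of a boxed argument crossing the MapBox boundary.
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    ValueBox(i64), // temporary, pipeline-only
    DataBox(i64),  // persistent, may be stored
}

/// Fail-Fast guard run at entry points such as MapBox.set / ArrayBox.push.
pub fn require_databox(method: &str, role: &str, arg: &Arg) -> Result<i64, String> {
    match arg {
        Arg::DataBox(v) => Ok(*v),
        Arg::ValueBox(_) => Err(format!("{method}: {role} must be DataBox, not ValueBox")),
    }
}

/// Checks both arguments before touching storage, so a rejected call
/// leaves the map unchanged.
pub fn map_set(store: &mut BTreeMap<i64, i64>, key: &Arg, value: &Arg) -> Result<(), String> {
    let k = require_databox("MapBox.set", "key", key)?;
    let v = require_databox("MapBox.set", "value", value)?;
    store.insert(k, v);
    Ok(())
}
```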
3. Fail-Fast Error Handling
Rationale:
- No silent fallbacks (prevents bugs)
- Clear error messages (aids debugging)
- Early detection (stops at source)
Implementation:
// ArrayBox.get with Fail-Fast
method get(index: IntegerBox) -> DataBox | NullBox {
    if (index < 0) {
        panic("ArrayBox.get: negative index: " + index)
    }
    if (index >= me._size) {
        panic("ArrayBox.get: index out of bounds: " + index + " >= " + me._size)
    }
    return me._data.get(index)
}
Success Criteria
Mandatory (✅ All must pass)
- MapBox/ArrayBox Fully Implemented:
  - All methods work in Pure Hakorune
  - No Rust dependencies (except HostBridge)
- Golden Tests Pass:
  - Rust-Collections vs Hako-Collections: 100% parity
  - 100+ test cases with identical output
  - Deterministic behavior verified
- Deterministic Behavior:
  - Key order is predictable (Symbol < Int < String)
  - Iteration order is stable
  - Provenance tracking works
- Performance:
  - Hako-Collections ≥ 70% of Rust-Collections speed
  - Basic operations (set/get/push/pop) are O(1) average
  - Large-scale data (10,000 elements) is practical
Excluded (⚠️ NOT in Phase 20.7)
- GC (Garbage Collection): Deferred to Phase 20.8
- Optimization: Basic implementation only (no profiling yet)
- Concurrency: Single-threaded only
- Persistence: In-memory only (no disk storage)
Dependencies
Prerequisite: Phase 20.6
Phase 20.7 requires:
- VM Core Complete: All 16 MIR instructions working
- Dispatch Unification: Single Resolver path for all method calls
- Golden Test Framework: Rust-VM vs Hako-VM comparison working
Provides to: Phase 20.8
Phase 20.7 delivers:
- Collections in Hakorune: MapBox/ArrayBox fully self-hosted
- Deterministic Iteration: Stable key order for GC root scanning
- Reference Tracking: Preparation for GC mark phase
Risk Mitigation
Risk 1: Performance Below 70%
Mitigation:
- Measure every week (not just in Weeks 4 and 8)
- Profile hot paths early (hash function, bounds checks)
- Accept slower performance initially (can optimize later)
Risk 2: Deterministic Hashing Complexity
Mitigation:
- Use simple, proven hash functions (FNV-1a for strings)
- Stable sort implementation (merge sort, not quicksort)
- Test hash stability with 1,000+ keys
Risk 3: ValueBox/DataBox Confusion
Mitigation:
- Clear error messages ("must be DataBox, not ValueBox")
- Fail-Fast at entry points (set/push methods)
- Golden tests verify boundary enforcement
Related Documents
- Phase 20.5: Pure Hakorune Roadmap
- Phase 20.6: VM Core Complete
- Phase 20.8: GC v0 + Rust Deprecation
- Box-First Principle: CLAUDE.md
- Golden Testing Strategy: PURE_HAKORUNE_ROADMAP.md#🧪-golden-testing-strategy
Status: PLANNED
Dependencies: Phase 20.6 (VM Core + Dispatch)
Delivers: Phase D (Collections in Pure Hakorune)
Timeline: 8 weeks (2026-05-25 → 2026-07-19)