# Phase 20.7: Collections in Hakorune - Implementation Plan

**Duration**: 8 weeks (2026-05-25 → 2026-07-19)
**Prerequisite**: Phase 20.6 (VM Core Complete + Dispatch Unification)
**Completes**: Phase D (Collections - MapBox/ArrayBox in Pure Hakorune)

---

## Executive Summary

Phase 20.7 rolls out the **C‑ABI bridge for Basic boxes (String/Array/Map)** so that the Rust VM and the Hako VM call the same implementation. The default profile uses dynamic plugins; the embedded/WASM profile uses built‑in plugins (static link + LTO). Pure Hakorune implementations remain available for research but are not the default execution path.

**Key Achievements**:
- String/Array/Map via C‑ABI (plugins) with identical behavior across Rust/Hako VMs
- Built‑in plugin preset (WASM/embedded) yields zero‑cost inlining via Two‑Layer Export
- Golden tests proving Rust vs Hako parity for collections (size/push/get/set/slice/indexOf/substring)
- Performance ≥ 70% of Rust runtime (for ABI path)

**Critical Principles**:
- **C‑ABI SSOT**: The C‑ABI is the single source of truth, with Two‑Layer Export (Stable ABI / Inlinable API)
- **Deterministic Behavior**: Key order is predictable (Symbol < Int < String)
- **ValueBox/DataBox Boundaries**: ValueBox is temporary (pipeline only), DataBox is persistent
- **Fail‑Fast**: Unknown methods/capability violations → RuntimeError (no silent fallbacks)

---

## C‑ABI Bridge Rollout (String/Array/Map)

### Flags & Presets
- `NYASH_ABI_BASIC_ON=1`: Enables the ABI path for Basic boxes (default: OFF; enabled in stages, quick first, then integration)
- `NYASH_ABI_BUILTIN_LINK=1`: Built‑in plugins (LTO/inline) for the embedded/WASM preset
- `NYASH_NORMALIZE_CORE_EXTERN=1`: Gate that makes the Builder/VM force-normalize String/Array/Map calls to `Extern("nyrt.*")`. Default: ON (set to 0 to roll back to the BoxCall/Plugin path).
- `NYASH_VERIFY_CORE_EXTERN=1`: Strict verifier mode. Fails fast if any Method/ModuleFunction forms remain (for local observation; default: OFF).
- Presets: default (plugins), embedded (built‑in), research (pure)

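As a rough illustration of how the four gates and their defaults combine, the sketch below reads them from an environment mapping. It is written in Python purely for illustration and is not the project's actual flag parser: `AbiFlags`, `read_flags`, and `_flag` are hypothetical names, the rule that only the literal value `1` enables a gate is an assumption, and so is the OFF default for `NYASH_ABI_BUILTIN_LINK`, which the list above does not state.

```python
from dataclasses import dataclass
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a 0/1 gate; an unset variable falls back to its default."""
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: only the literal "1" turns a gate on.
    return raw == "1"


@dataclass(frozen=True)
class AbiFlags:
    abi_basic_on: bool
    builtin_link: bool
    normalize_core_extern: bool
    verify_core_extern: bool


def read_flags(env: Mapping[str, str]) -> AbiFlags:
    """Collect the gates described above into one immutable record."""
    return AbiFlags(
        abi_basic_on=_flag(env, "NYASH_ABI_BASIC_ON", default=False),
        # Assumed default: the plan does not state one for this gate.
        builtin_link=_flag(env, "NYASH_ABI_BUILTIN_LINK", default=False),
        normalize_core_extern=_flag(env, "NYASH_NORMALIZE_CORE_EXTERN", default=True),
        verify_core_extern=_flag(env, "NYASH_VERIFY_CORE_EXTERN", default=False),
    )
```

With an empty environment this yields the documented defaults, so only core-extern normalization is on; passing `{"NYASH_NORMALIZE_CORE_EXTERN": "0"}` models the rollback to the BoxCall/Plugin path.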
### Acceptance Criteria
- Output of collection operations matches across VM, LLVM, and Hako (nyvm) (size/get/set/push/slice/indexOf/substring)
- `NYASH_ABI_BASIC_ON=1` is enabled in stages (quick, then integration) with all tests green
- Performance: the ABI path reaches ≥ 70% of the Rust implementation

### Rollback
- Setting `NYASH_ABI_BASIC_ON=0` restores the existing path at any time (a safety net by design)

---

## Pilot Rollout Plan (Fine-Grained Staged Enablement)

Purpose
- To minimize risk, enable the ABI path feature by feature in stages (pilot → observe → roll out more broadly).

Week‑by‑Week (approximate)
- Week 1 (quick only): Turn String.length ON via the ABI (`NYASH_ABI_BASIC_ON=1`) → observe for 48h
- Week 2 (quick only): Additionally turn Array.size/push ON → observe
- Week 3‑4 (quick only): Additionally turn Map.set/get ON → observe
- Week 5: Extend ABI ON from quick to integration (String/Array/Map)
- Week 6‑7: Performance tuning and stronger diagnostics (target: ≥ 70%)
- Week 8: Decide whether to make ABI ON the default (if problems arise, revert immediately with `NYASH_ABI_BASIC_ON=0`)

Diagnostics/Observation
- `NYASH_ABI_BASIC_DIAG=1` logs one line per ABI-path call (for dev/CI observation; default: OFF)
- A leak guard (test-only) detects missing retain/release calls (1000x new/free → total handle count = 0)

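The leak-guard check (1000x new/free, then zero live handles) could look roughly like the following. This is a minimal Python model under stated assumptions, not the real test harness: `HandleTable`, `leak_guard`, and the reference-count bookkeeping are hypothetical stand-ins for whatever the ABI layer actually exposes.

```python
class HandleTable:
    """Hypothetical handle registry with reference counts (test-only model)."""

    def __init__(self) -> None:
        self._refcounts = {}  # handle id -> current reference count
        self._next_id = 1

    def new(self) -> int:
        """Allocate a handle with an initial reference count of 1."""
        handle = self._next_id
        self._next_id += 1
        self._refcounts[handle] = 1
        return handle

    def retain(self, handle: int) -> None:
        self._refcounts[handle] += 1

    def release(self, handle: int) -> None:
        """Drop one reference; the handle is freed when the count hits 0."""
        self._refcounts[handle] -= 1
        if self._refcounts[handle] == 0:
            del self._refcounts[handle]

    def live_handles(self) -> int:
        return len(self._refcounts)


def leak_guard(iterations: int = 1000) -> int:
    """Run balanced new/retain/release cycles and report surviving handles."""
    table = HandleTable()
    for _ in range(iterations):
        handle = table.new()
        table.retain(handle)
        table.release(handle)
        table.release(handle)  # balances the implicit reference from new()
    return table.live_handles()  # a correct implementation leaves 0 behind
```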
CI Matrix (minimum)
- Added: one quick (ABI ON) job (covering only the staged targets: String/Array/Map)
- Existing: keep quick/integration (ABI OFF) for regression monitoring

---

## Week 1-4: MapBox Implementation

### Overview

MapBox is a hash map storing key-value pairs with deterministic iteration order.

### Core Methods

```hakorune
box MapBox {
    // Storage
    _buckets: ArrayBox       // Internal bucket storage
    _size: IntegerBox        // Element count
    _load_factor: FloatBox   // For resizing

    // Public API
    set(key: DataBox, value: DataBox) -> NullBox
    get(key: DataBox) -> DataBox | NullBox
    has(key: DataBox) -> BoolBox
    remove(key: DataBox) -> DataBox | NullBox
    size() -> IntegerBox
    keys() -> ArrayBox       // Returns keys in deterministic order
    values() -> ArrayBox     // Returns values in deterministic order
}
```

### Week 1: Foundation

**Deliverables**:
1. **Data Structures**:
   - Bucket storage design (chaining or open addressing)
   - Hash function for Symbol/Int/String
   - Collision resolution strategy

2. **Core Operations**:
   - `set(key, value)` implementation
   - `get(key)` implementation
   - Basic resizing logic (load factor > 0.75 → double capacity)

3. **Tests**:
   - Basic set/get operations (10 test cases)
   - Hash collision handling (5 test cases)
   - Resizing behavior (3 test cases)

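To make the Week 1 deliverables concrete, here is a minimal Python sketch of a chained hash map that doubles its capacity once the load factor exceeds 0.75. It assumes the chaining option (the plan leaves chaining vs. open addressing open), uses Python's built-in `hash()` as a stand-in for the deterministic hash planned for Week 2, and all names (`ChainedMap`, `_index`, `_resize`) are illustrative rather than the MapBox API.

```python
class ChainedMap:
    """Sketch of a chained hash map with load-factor-driven resizing."""

    MAX_LOAD = 0.75

    def __init__(self, capacity: int = 8) -> None:
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key, capacity: int) -> int:
        # Stand-in hash; the plan calls for a deterministic one in Week 2.
        return hash(key) % capacity

    def set(self, key, value) -> None:
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite; size is unchanged
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self.MAX_LOAD:
            self._resize(2 * len(self._buckets))

    def get(self, key):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for existing, value in bucket:
            if existing == key:
                return value
        return None  # models the NullBox result for a missing key

    def _resize(self, new_capacity: int) -> None:
        """Rehash every entry into a bucket array of the new capacity."""
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(new_capacity)]
        for bucket in old_buckets:
            for key, value in bucket:
                self._buckets[self._index(key, new_capacity)].append((key, value))

    def size(self) -> int:
        return self._size

    def capacity(self) -> int:
        return len(self._buckets)
```

Starting from a capacity of 4, the fourth distinct insertion pushes the load factor to 1.0, which exceeds 0.75 and doubles the capacity to 8; the third insertion (load factor exactly 0.75) does not trigger a resize.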
### Week 2: Deterministic Hashing

**Deliverables**:
1. **Hash Strategy**:
   - Symbol hash: Use symbol ID (deterministic)
   - Int hash: Use integer value (deterministic)
   - String hash: Use UTF-8 bytes with stable algorithm (e.g., FNV-1a)

2. **Key Normalization**:
   - Type detection (Symbol vs Int vs String)
   - Unified comparison function
   - Stable sort implementation

3. **Deterministic Iteration**:
   - `keys()` returns sorted keys (Symbol < Int < String)
   - `values()` returns values in key order
   - Provenance tracking (plugin_id, version)

4. **Tests**:
   - Mixed-type key insertion (Symbol, Int, String)
   - Deterministic iteration order (20 test cases)
   - Hash stability (same key → same bucket)

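For the string case, FNV-1a over UTF-8 bytes can be sketched as follows. The plan names FNV-1a only as an example and does not fix a width, so the 64-bit variant here is an assumption, and `fnv1a_64` and `bucket_index` are hypothetical helper names. The offset basis and prime are the standard published 64-bit FNV constants.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(text: str) -> int:
    """64-bit FNV-1a over the UTF-8 encoding of text."""
    h = FNV64_OFFSET_BASIS
    for byte in text.encode("utf-8"):
        h ^= byte                        # FNV-1a: XOR first ...
        h = (h * FNV64_PRIME) & MASK_64  # ... then multiply, wrapping at 64 bits
    return h


def bucket_index(text: str, bucket_count: int) -> int:
    """Same key and same bucket count always map to the same bucket."""
    return fnv1a_64(text) % bucket_count
```

Because the result depends only on the key's bytes, it is stable across runs and processes, which is what the "same key → same bucket" test relies on.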
### Week 3: Advanced Operations

**Deliverables**:
1. **Remove Operation**:
   - `remove(key)` implementation
   - Bucket chain repair (if using chaining)
   - Size decrement

2. **Query Operations**:
   - `has(key)` implementation
   - `size()` implementation
   - Edge cases (empty map, single element)

3. **Boundary Checks**:
   - Reject ValueBox keys (Fail-Fast)
   - Ensure DataBox inputs only
   - Clear error messages

4. **Tests**:
   - Remove operations (15 test cases)
   - Edge cases (empty map, remove non-existent key)
   - ValueBox rejection (Fail-Fast verification)

### Week 4: MapBox Golden Tests

**Deliverables**:
1. **Golden Test Suite**:
   - 50+ test cases comparing Rust-MapBox vs Hako-MapBox
   - Identical output verification (keys, values, size)
   - Performance benchmarks (set/get/remove)

2. **Deterministic Verification**:
   - Same input → same output (10 runs per test)
   - Key order stability (Symbol < Int < String)
   - Provenance tracking validation

3. **Performance Baseline**:
   - Measure operation times (set/get/remove)
   - Compare against Rust-MapBox
   - Target: ≥ 70% of Rust performance

4. **Documentation**:
   - MapBox API reference
   - Usage examples
   - Performance characteristics (O(1) average, O(n) worst case)

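One way to picture the golden comparison and the "10 runs per test" rule is a small harness that replays the same operation script against a reference and a candidate implementation and compares the recorded traces. The Python sketch below is illustrative only: `run_script`, `check_parity`, and the two stand-in maps (`DictMap`, `PairListMap`) are hypothetical and merely play the roles of Rust-MapBox and Hako-MapBox.

```python
from typing import Callable, List, Tuple

Op = Tuple[str, tuple]  # (method name, positional arguments)


class DictMap:
    """Stand-in for the reference implementation."""

    def __init__(self) -> None:
        self._items = {}

    def set(self, key, value) -> None:
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)

    def size(self) -> int:
        return len(self._items)


class PairListMap:
    """Stand-in for the candidate: same behavior, different internals."""

    def __init__(self) -> None:
        self._pairs = []

    def set(self, key, value) -> None:
        for i, (existing, _) in enumerate(self._pairs):
            if existing == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def get(self, key):
        for existing, value in self._pairs:
            if existing == key:
                return value
        return None

    def size(self) -> int:
        return len(self._pairs)


def run_script(make_map: Callable[[], object], script: List[Op]) -> List[str]:
    """Apply each step to a fresh map and record one trace line per step."""
    target = make_map()
    trace = []
    for method, args in script:
        result = getattr(target, method)(*args)
        trace.append(f"{method}{args} -> {result!r}")
    return trace


def check_parity(make_reference, make_candidate, script, runs: int = 10) -> bool:
    """True if every candidate run reproduces the reference trace exactly."""
    golden = run_script(make_reference, script)
    return all(run_script(make_candidate, script) == golden for _ in range(runs))
```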
---

## Week 5-8: ArrayBox Implementation

### Overview

ArrayBox is a dynamic array (list) with automatic capacity expansion.

### Core Methods

```hakorune
box ArrayBox {
    // Storage
    _data: RawBufferBox      // Internal buffer
    _size: IntegerBox        // Current element count
    _capacity: IntegerBox    // Allocated capacity

    // Public API
    push(value: DataBox) -> NullBox
    pop() -> DataBox | NullBox
    get(index: IntegerBox) -> DataBox | NullBox
    set(index: IntegerBox, value: DataBox) -> NullBox
    size() -> IntegerBox
    slice(start: IntegerBox, end: IntegerBox) -> ArrayBox
    concat(other: ArrayBox) -> ArrayBox
}
```

### Week 5: Foundation

**Deliverables**:
1. **Data Structures**:
   - RawBufferBox for storage
   - Capacity expansion strategy (2x growth)
   - Bounds checking

2. **Core Operations**:
   - `push(value)` implementation
   - `pop()` implementation
   - Capacity doubling logic

3. **Tests**:
   - Basic push/pop operations (10 test cases)
   - Capacity expansion (grow from 0 → 1 → 2 → 4 → 8)
   - Empty array edge cases

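The 0 → 1 → 2 → 4 → 8 growth sequence follows from starting with an empty buffer and doubling whenever it is full. The Python sketch below models that under assumptions: a plain list stands in for RawBufferBox, `pop()` on an empty array returns `None` to mirror the `DataBox | NullBox` signature, and `GrowableArray` and `capacity_trace` are hypothetical names.

```python
class GrowableArray:
    """Sketch of a dynamic array that doubles its capacity when full."""

    def __init__(self) -> None:
        self._data = []      # fixed-size slot buffer (RawBufferBox stand-in)
        self._size = 0
        self._capacity = 0

    def _grow(self) -> None:
        # 0 -> 1, then double: 1 -> 2 -> 4 -> 8 ...
        new_capacity = 1 if self._capacity == 0 else 2 * self._capacity
        new_data = [None] * new_capacity
        new_data[: self._size] = self._data[: self._size]
        self._data = new_data
        self._capacity = new_capacity

    def push(self, value) -> None:
        if self._size == self._capacity:
            self._grow()
        self._data[self._size] = value
        self._size += 1

    def pop(self):
        if self._size == 0:
            return None  # assumed NullBox result for an empty array
        self._size -= 1
        value = self._data[self._size]
        self._data[self._size] = None  # clear the vacated slot
        return value

    def size(self) -> int:
        return self._size

    def capacity(self) -> int:
        return self._capacity


def capacity_trace(pushes: int) -> list:
    """Record each distinct capacity observed while pushing elements."""
    arr = GrowableArray()
    seen = [arr.capacity()]
    for i in range(pushes):
        arr.push(i)
        if arr.capacity() != seen[-1]:
            seen.append(arr.capacity())
    return seen
```

Five pushes are enough to observe the full sequence from the test plan, since the fifth push finds the capacity-4 buffer full and doubles it to 8.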
### Week 6: Index Operations

**Deliverables**:
1. **Index Access**:
   - `get(index)` implementation
   - `set(index, value)` implementation
   - Bounds checking (index < 0 or index ≥ size → error)

2. **Fail-Fast Boundaries**:
   - Out-of-bounds access → RuntimeError
   - Negative index → RuntimeError
   - Clear error messages with context

3. **Tests**:
   - Valid index access (20 test cases)
   - Out-of-bounds access (10 test cases)
   - Negative index handling (5 test cases)

### Week 7: Advanced Operations

**Deliverables**:
1. **Slice Operation**:
   - `slice(start, end)` implementation
   - Return new ArrayBox with elements [start, end)
   - Handle edge cases (start > end, negative indices)

2. **Concat Operation**:
   - `concat(other)` implementation
   - Create new ArrayBox with combined elements
   - Efficient memory allocation

3. **ValueBox/DataBox Boundaries**:
   - Reject ValueBox elements (Fail-Fast)
   - Ensure DataBox inputs only
   - Unpack at entry/exit points

4. **Tests**:
   - Slice operations (15 test cases)
   - Concat operations (10 test cases)
   - ValueBox rejection (5 test cases)

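The half-open range [start, end) and its edge cases might be handled along these lines. The plan lists the edge cases but does not say how they resolve, so the policy in this Python sketch is purely an assumption: negative indices and an `end` beyond the array size fail fast, in line with the Week 6 bounds rules, while `start > end` yields an empty result. `slice_half_open` is a hypothetical name.

```python
def slice_half_open(items: list, start: int, end: int) -> list:
    """Return a new list holding items[start], ..., items[end - 1]."""
    # Assumed policy: negative indices fail fast rather than wrapping around.
    if start < 0 or end < 0:
        raise IndexError(f"slice: negative index: start={start}, end={end}")
    # Assumed policy: an end past the current size is out of bounds.
    if end > len(items):
        raise IndexError(f"slice: end out of bounds: {end} > {len(items)}")
    # Assumed policy: an inverted range is empty rather than an error.
    if start > end:
        return []
    return list(items[start:end])  # copy, so the source array is untouched
```

For example, slicing `[10, 20, 30, 40]` with `start=1, end=3` gives `[20, 30]`: the element at `end` is excluded, and the result has `end - start` elements.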
### Week 8: ArrayBox Golden Tests

**Deliverables**:
1. **Golden Test Suite**:
   - 50+ test cases comparing Rust-ArrayBox vs Hako-ArrayBox
   - Identical output verification (size, elements, order)
   - Performance benchmarks (push/pop/get/set)

2. **Performance Baseline**:
   - Measure operation times (all methods)
   - Compare against Rust-ArrayBox
   - Target: ≥ 70% of Rust performance

3. **Large-Scale Tests**:
   - 10,000 element arrays
   - Stress test capacity expansion
   - Memory usage validation

4. **Documentation**:
   - ArrayBox API reference
   - Usage examples
   - Performance characteristics (O(1) amortized, O(n) worst case)

---

## Key Design Decisions

### 1. Deterministic Key Order (Symbol < Int < String)

**Rationale**:
- Predictable iteration order for debugging
- Reproducible behavior across runs
- Stable sorting for golden tests

**Implementation**:
```hakorune
// Key comparison function
method compare_keys(key1: DataBox, key2: DataBox) -> IntegerBox {
    local type1 = key1.type_id()
    local type2 = key2.type_id()

    // Type priority: Symbol < Int < String
    if (type1 != type2) {
        if (type1 == :Symbol) { return -1 }
        if (type2 == :Symbol) { return 1 }
        if (type1 == :Int) { return -1 }
        if (type2 == :Int) { return 1 }
    }

    // Same type: natural order
    return key1.compare(key2)
}
```

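The same ordering can be expressed as a sort key, which is convenient for checking expected `keys()` output in golden tests. The Python sketch below is an illustration under assumptions: `Symbol` is a hypothetical stand-in type, ordering symbols by name is assumed (the plan hashes symbols by ID but does not define their natural order), and unsupported key types raise an error in keeping with the Fail-Fast principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a Hakorune symbol key."""
    name: str


def key_rank(key) -> int:
    """Type priority: Symbol (0) < Int (1) < String (2)."""
    if isinstance(key, Symbol):
        return 0
    if isinstance(key, int):
        return 1
    if isinstance(key, str):
        return 2
    raise TypeError(f"unsupported key type: {type(key).__name__}")


def sort_key(key):
    """Pair the type rank with the value used for same-type ordering."""
    rank = key_rank(key)
    natural = key.name if rank == 0 else key  # assumed: symbols order by name
    return (rank, natural)


def ordered_keys(keys) -> list:
    # sorted() is stable, matching the plan's stable-sort requirement.
    return sorted(keys, key=sort_key)
```

Because the rank is compared first, values of different types are never compared with each other; the second tuple element only breaks ties within a single type.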
### 2. ValueBox/DataBox Boundaries

**Rationale**:
- ValueBox is ephemeral (pipeline boundaries only)
- DataBox is persistent (long-lived storage)
- Fail-Fast prevents confusion

**Implementation**:
```hakorune
// MapBox.set with boundary check
method set(key: DataBox, value: DataBox) -> NullBox {
    if (key.is_valuebox()) {
        panic("MapBox.set: key must be DataBox, not ValueBox")
    }
    if (value.is_valuebox()) {
        panic("MapBox.set: value must be DataBox, not ValueBox")
    }

    // Proceed with actual set operation.
    // Assuming _hash returns a raw, non-negative hash value, reduce it
    // to a valid bucket index before the lookup.
    local index = me._hash(key) % me._buckets.size()
    local bucket = me._buckets.get(index)
    bucket.insert(key, value)
    // Size bookkeeping and load-factor resizing are omitted from this sketch.
}
```

### 3. Fail-Fast Error Handling

**Rationale**:
- No silent fallbacks (prevents bugs)
- Clear error messages (aids debugging)
- Early detection (stops at source)

**Implementation**:
```hakorune
// ArrayBox.get with Fail-Fast
method get(index: IntegerBox) -> DataBox | NullBox {
    if (index < 0) {
        panic("ArrayBox.get: negative index: " + index)
    }
    if (index >= me._size) {
        panic("ArrayBox.get: index out of bounds: " + index + " >= " + me._size)
    }

    return me._data.get(index)
}
```

---

## Success Criteria

### Mandatory (✅ All must pass)

1. **MapBox/ArrayBox Fully Implemented**:
   - All methods work in Pure Hakorune
   - No Rust dependencies (except HostBridge)

2. **Golden Tests Pass**:
   - Rust-Collections vs Hako-Collections: 100% parity
   - 100+ test cases with identical output
   - Deterministic behavior verified

3. **Deterministic Behavior**:
   - Key order is predictable (Symbol < Int < String)
   - Iteration order is stable
   - Provenance tracking works

4. **Performance**:
   - Hako-Collections ≥ 70% of Rust-Collections speed
   - Basic operations (set/get/push/pop) are O(1) average
   - Large-scale data (10,000 elements) is practical

### Excluded (⚠️ NOT in Phase 20.7)

- **GC (Garbage Collection)**: Deferred to Phase 20.8
- **Optimization**: Basic implementation only (no profiling yet)
- **Concurrency**: Single-threaded only
- **Persistence**: In-memory only (no disk storage)

---

## Dependencies

### Prerequisite: Phase 20.6

Phase 20.7 requires:
1. **VM Core Complete**: All 16 MIR instructions working
2. **Dispatch Unification**: Single Resolver path for all method calls
3. **Golden Test Framework**: Rust-VM vs Hako-VM comparison working

### Provides to: Phase 20.8

Phase 20.7 delivers:
1. **Collections in Hakorune**: MapBox/ArrayBox fully self-hosted
2. **Deterministic Iteration**: Stable key order for GC root scanning
3. **Reference Tracking**: Preparation for GC mark phase

---

## Risk Mitigation

### Risk 1: Performance Below 70%

**Mitigation**:
- Measure at each week (not just Week 4/8)
- Profile hot paths early (hash function, bounds checks)
- Accept slower performance initially (can optimize later)

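For the weekly measurements, it helps to pin down what "≥ 70% of Rust" means numerically. The Python sketch below assumes the target is a throughput ratio for the same workload, so that a Hako run may take at most 1/0.7 ≈ 1.43x the Rust time; the plan does not define the metric, and `relative_performance` and `meets_target` are hypothetical helpers.

```python
def relative_performance(rust_seconds: float, hako_seconds: float) -> float:
    """Hako throughput as a fraction of Rust throughput on one workload."""
    if rust_seconds <= 0 or hako_seconds <= 0:
        raise ValueError("timings must be positive")
    # Same work in more time means proportionally lower throughput.
    return rust_seconds / hako_seconds


def meets_target(rust_seconds: float, hako_seconds: float,
                 target: float = 0.70) -> bool:
    """True if the Hako run reaches the target fraction of Rust speed."""
    return relative_performance(rust_seconds, hako_seconds) >= target
```

Under this reading, a benchmark that takes 7.0 s on the Rust path and 10.0 s on the Hako path sits exactly at the 70% threshold, while 12.0 s on the Hako path (about 58%) would miss it.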
### Risk 2: Deterministic Hashing Complexity

**Mitigation**:
- Use simple, proven hash functions (FNV-1a for strings)
- Stable sort implementation (merge sort, not quicksort)
- Test hash stability with 1,000+ keys

### Risk 3: ValueBox/DataBox Confusion

**Mitigation**:
- Clear error messages ("must be DataBox, not ValueBox")
- Fail-Fast at entry points (set/push methods)
- Golden tests verify boundary enforcement

---

## Related Documents

- **Phase 20.5**: [Pure Hakorune Roadmap](../phase-20.5/PURE_HAKORUNE_ROADMAP.md)
- **Phase 20.6**: [VM Core Complete](../phase-20.6/)
- **Phase 20.8**: [GC v0 + Rust Deprecation](../phase-20.8/)
- **Box-First Principle**: [CLAUDE.md](../../../../CLAUDE.md#🧱-先頭原則-箱理論box-first)
- **Golden Testing Strategy**: [PURE_HAKORUNE_ROADMAP.md#🧪-golden-testing-strategy](../phase-20.5/PURE_HAKORUNE_ROADMAP.md#-golden-testing-strategy)

---

**Status**: PLANNED
**Dependencies**: Phase 20.6 (VM Core + Dispatch)
**Delivers**: Phase D (Collections in Pure Hakorune)
**Timeline**: 8 weeks (2026-05-25 → 2026-07-19)