
# Phase 20.7: Collections in Hakorune - Implementation Plan

**Duration**: 8 weeks (2026-05-25 → 2026-07-19)
**Prerequisite**: Phase 20.6 (VM Core Complete + Dispatch Unification)
**Completes**: Phase D (Collections - MapBox/ArrayBox in Pure Hakorune)

---

## Executive Summary

Phase 20.7 rolls out the **C‑ABI bridge for Basic boxes (String/Array/Map)** so that the Rust VM and the Hako VM call the same implementation. The default profile uses dynamic plugins; the embedded/WASM profile uses built‑in plugins (static link + LTO). Pure Hakorune implementations remain available for research but are not the default execution path.

**Key Achievements**:
- String/Array/Map via C‑ABI (plugins) with identical behavior across Rust/Hako VMs
- Built‑in plugin preset (WASM/embedded) yields zero‑cost inlining via Two‑Layer Export
- Golden tests proving Rust vs Hako parity for collections (size/push/get/set/slice/indexOf/substring)
- Performance ≥ 70% of Rust runtime (for ABI path)

**Critical Principles**:
- **C‑ABI SSOT** with Two‑Layer Export (Stable ABI / Inlinable API)
- **Deterministic Behavior**: Key order is predictable (Symbol < Int < String)
- **ValueBox/DataBox Boundaries**: ValueBox is temporary (pipeline only), DataBox is persistent
- **Fail‑Fast**: Unknown methods/capability violations → RuntimeError (no silent fallbacks)

---

## C‑ABI Bridge Rollout (String/Array/Map)

### Flags & Presets
- `NYASH_ABI_BASIC_ON=1` … Enables the ABI path for Basic boxes (default: OFF; turned on in stages, quick first, then integration)
- `NYASH_ABI_BUILTIN_LINK=1` … Built‑in plugins (LTO/inline) for the embedded/WASM preset
- `NYASH_NORMALIZE_CORE_EXTERN=1` … Gate that makes the Builder/VM forcibly normalize String/Array/Map calls to `Extern("nyrt.*")`. Default ON (set to 0 to roll back to the BoxCall/Plugin path).
- `NYASH_VERIFY_CORE_EXTERN=1` … Strict verifier mode. Fails fast if any Method/ModuleFunction forms remain (for local observation; default OFF).
- Presets: default (plugins), embedded (built‑in), research (pure)
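
The flag defaults listed above can be summarized in a small resolver. This is a minimal sketch in Rust, not the project's actual configuration code: the `AbiFlags` struct, the `flag` and `resolve_flags` helpers, and the rule "`1` enables, `0` disables, anything else keeps the default" are assumptions made for illustration. Only the flag names and their defaults come from this plan.

```rust
/// Flags named in this plan, resolved to booleans (hypothetical struct).
#[derive(Debug, PartialEq)]
pub struct AbiFlags {
    pub abi_basic_on: bool,
    pub builtin_link: bool,
    pub normalize_core_extern: bool,
    pub verify_core_extern: bool,
}

/// Assumed parsing rule: "1" enables, "0" disables, anything else keeps the default.
fn flag(lookup: &dyn Fn(&str) -> Option<String>, name: &str, default: bool) -> bool {
    match lookup(name).as_deref() {
        Some("1") => true,
        Some("0") => false,
        _ => default,
    }
}

/// Resolves all four flags through `lookup`, which makes the logic testable
/// without touching the real process environment.
pub fn resolve_flags(lookup: &dyn Fn(&str) -> Option<String>) -> AbiFlags {
    AbiFlags {
        // Default OFF: the ABI path is enabled in stages.
        abi_basic_on: flag(lookup, "NYASH_ABI_BASIC_ON", false),
        // Assumed default OFF: only the embedded/WASM preset links built-ins.
        builtin_link: flag(lookup, "NYASH_ABI_BUILTIN_LINK", false),
        // Default ON: 0 rolls back to the BoxCall/Plugin path.
        normalize_core_extern: flag(lookup, "NYASH_NORMALIZE_CORE_EXTERN", true),
        // Default OFF: strict verification is for local observation.
        verify_core_extern: flag(lookup, "NYASH_VERIFY_CORE_EXTERN", false),
    }
}

/// Convenience wrapper that reads the process environment.
pub fn resolve_flags_from_env() -> AbiFlags {
    resolve_flags(&|name: &str| std::env::var(name).ok())
}
```

Passing the lookup as a closure keeps the default table in one place and lets a CI job assert the staged configuration it expects.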

### Acceptance Criteria
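- Collection operations (size/get/set/push/slice/indexOf/substring) produce identical output on VM, LLVM, and Hako (nyvm)
- `NYASH_ABI_BASIC_ON=1` is enabled in stages (quick, then integration) with all tests green
- Performance: the ABI path reaches ≥ 70% of the Rust implementation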
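
One way to make the 70% criterion checkable is to express it as a ratio of wall-clock times for the same workload. The sketch below assumes that "70% of Rust" means throughput, so the ABI path may take at most about 1.43x as long as the Rust path. Both that interpretation and the function names are assumptions, not something this plan defines.

```rust
use std::time::Duration;

/// Relative speed of the ABI path: Rust-path time divided by ABI-path time
/// for the same workload. 1.0 means parity; values below 1.0 mean the ABI
/// path is slower. A zero ABI time is reported as infinitely fast.
pub fn relative_speed(rust_path: Duration, abi_path: Duration) -> f64 {
    if abi_path.is_zero() {
        return f64::INFINITY;
    }
    rust_path.as_secs_f64() / abi_path.as_secs_f64()
}

/// True when the ABI path reaches at least `target` (e.g. 0.70) of the
/// Rust path's speed.
pub fn meets_target(rust_path: Duration, abi_path: Duration, target: f64) -> bool {
    relative_speed(rust_path, abi_path) >= target
}
```

For example, a Rust run of 80 ms against an ABI run of 100 ms gives a relative speed of 0.8 and passes, while 50 ms against 100 ms gives 0.5 and fails.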

### Rollback
- Setting `NYASH_ABI_BASIC_ON=0` restores the existing path at any time (a safety net by design)

---

## Pilot Rollout Plan (Fine‑Grained Staged Enablement)

Purpose
- To minimize risk, enable the ABI path feature by feature in stages (pilot → observe → roll out more widely).

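Week‑by‑Week (approximate)
- Week 1 (quick only): Turn on String.length via the ABI (`NYASH_ABI_BASIC_ON=1`) → observe for 48h
- Week 2 (quick only): Additionally turn on Array.size/push → observe
- Week 3‑4 (quick only): Additionally turn on Map.set/get → observe
- Week 5: Extend the ABI ON setting from quick to integration (String/Array/Map)
- Week 6‑7: Performance fine‑tuning and stronger diagnostics (target: ≥ 70%)
- Week 8: Decide whether to make ABI ON the default (if problems arise, revert immediately with `NYASH_ABI_BASIC_ON=0`)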

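Diagnostics/Observation
- `NYASH_ABI_BASIC_DIAG=1` logs each ABI‑path call as a single line (for dev/CI observation; default OFF)
- Leak guard (test‑only) detects missing retain/release calls (1000x new/free → total handle count = 0)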
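
The leak-guard check (1000 allocations and frees leaving zero live handles) can be pictured with a test-only handle table. This is a hedged sketch: `HandleTable` and its methods are hypothetical names, and the real ABI's handle representation is not specified in this plan.

```rust
use std::collections::HashMap;

/// Test-only handle table: maps a handle id to its reference count.
#[derive(Default)]
pub struct HandleTable {
    next_id: u64,
    refcounts: HashMap<u64, u32>,
}

impl HandleTable {
    /// Allocates a new handle with a reference count of 1.
    pub fn alloc(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.refcounts.insert(id, 1);
        id
    }

    /// Increments the count; panics on a dead handle (Fail-Fast).
    pub fn retain(&mut self, id: u64) {
        *self.refcounts.get_mut(&id).expect("retain on dead handle") += 1;
    }

    /// Decrements the count and drops the entry when it reaches zero.
    pub fn release(&mut self, id: u64) {
        let count = self.refcounts.get_mut(&id).expect("release on dead handle");
        *count -= 1;
        if *count == 0 {
            self.refcounts.remove(&id);
        }
    }

    /// Number of handles that are still alive.
    pub fn live_handles(&self) -> usize {
        self.refcounts.len()
    }
}
```

A balanced loop of `alloc`/`retain`/`release`/`release` repeated 1000 times should end with `live_handles() == 0`; a single missing `release` leaves exactly one live handle, which is what the guard would report.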

CI Matrix (minimal)
- Added: one quick (ABI ON) job, covering only the staged‑ON targets (String/Array/Map)
- Existing: keep the quick/integration (ABI OFF) jobs for regression monitoring


---

## Week 1-4: MapBox Implementation

### Overview

MapBox is a hash map storing key-value pairs with deterministic iteration order.

### Core Methods

```hakorune
box MapBox {
    // Storage
    _buckets: ArrayBox      // Internal bucket storage
    _size: IntegerBox       // Element count
    _load_factor: FloatBox  // For resizing

    // Public API
    set(key: DataBox, value: DataBox) -> NullBox
    get(key: DataBox) -> DataBox | NullBox
    has(key: DataBox) -> BoolBox
    remove(key: DataBox) -> DataBox | NullBox
    size() -> IntegerBox
    keys() -> ArrayBox      // Returns keys in deterministic order
    values() -> ArrayBox    // Returns values in deterministic order
}
```

### Week 1: Foundation

**Deliverables**:
1. **Data Structures**:
   - Bucket storage design (chaining or open addressing)
   - Hash function for Symbol/Int/String
   - Collision resolution strategy

2. **Core Operations**:
   - `set(key, value)` implementation
   - `get(key)` implementation
   - Basic resizing logic (load factor > 0.75 → double capacity)

3. **Tests**:
   - Basic set/get operations (10 test cases)
   - Hash collision handling (5 test cases)
   - Resizing behavior (3 test cases)
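
The resizing rule (load factor above 0.75 doubles the capacity) can be written with integer arithmetic so that no floating-point comparison is involved. The sketch below is illustrative only; the function names are hypothetical and the bucket index is assumed to be the hash modulo the bucket count.

```rust
/// Returns the bucket count to use once the map holds `len` entries.
/// The capacity doubles when len / capacity > 3/4.
pub fn capacity_after_insert(len: usize, capacity: usize) -> usize {
    // len / capacity > 0.75, rewritten as len * 4 > capacity * 3.
    if len * 4 > capacity * 3 {
        // max(1) keeps a zero-capacity map from staying at zero.
        capacity.max(1) * 2
    } else {
        capacity
    }
}

/// Maps a hash onto a bucket slot. Because the result depends on the
/// capacity, every entry must be re-bucketed after a resize.
pub fn bucket_index(hash: u64, capacity: usize) -> usize {
    (hash % capacity as u64) as usize
}
```

With 8 buckets, the 6th entry keeps the capacity at 8 (6/8 = 0.75 is not above the threshold) and the 7th entry doubles it to 16. A key with hash 10 then moves from slot 2 to slot 10.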

### Week 2: Deterministic Hashing

**Deliverables**:
1. **Hash Strategy**:
   - Symbol hash: Use symbol ID (deterministic)
   - Int hash: Use integer value (deterministic)
   - String hash: Use UTF-8 bytes with stable algorithm (e.g., FNV-1a)

2. **Key Normalization**:
   - Type detection (Symbol vs Int vs String)
   - Unified comparison function
   - Stable sort implementation

3. **Deterministic Iteration**:
   - `keys()` returns sorted keys (Symbol < Int < String)
   - `values()` returns values in key order
   - Provenance tracking (plugin_id, version)

4. **Tests**:
   - Mixed-type key insertion (Symbol, Int, String)
   - Deterministic iteration order (20 test cases)
   - Hash stability (same key → same bucket)
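
For reference, 64-bit FNV-1a is short enough to show in full. The sketch below is in Rust rather than Hakorune and uses the standard published FNV constants. The choice of the 64-bit variant is an assumption, since this plan only names FNV-1a as an example.

```rust
/// Standard 64-bit FNV offset basis and prime.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a over raw bytes: XOR each byte in, then multiply by the prime.
/// The result depends only on the input bytes, so the same key always
/// lands in the same bucket across runs and platforms.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```

Hashing the UTF-8 bytes of a string key, for example `fnv1a_64("name".as_bytes())`, involves no per-process seed, which is the property the hash stability tests rely on.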

### Week 3: Advanced Operations

**Deliverables**:
1. **Remove Operation**:
   - `remove(key)` implementation
   - Bucket chain repair (if using chaining)
   - Size decrement

2. **Query Operations**:
   - `has(key)` implementation
   - `size()` implementation
   - Edge cases (empty map, single element)

3. **Boundary Checks**:
   - Reject ValueBox keys (Fail-Fast)
   - Ensure DataBox inputs only
   - Clear error messages

4. **Tests**:
   - Remove operations (15 test cases)
   - Edge cases (empty map, remove non-existent key)
   - ValueBox rejection (Fail-Fast verification)
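
If chaining is the chosen collision strategy, removal reduces to deleting one entry from a bucket's chain and decrementing the size. The sketch below models a bucket as a `Vec` of `(String, i64)` pairs purely for illustration. The real key and value types are DataBox values, and `remove_from_bucket` is a hypothetical name.

```rust
/// One bucket of a chained hash map; entries stay in insertion order.
pub type Bucket = Vec<(String, i64)>;

/// Removes `key` from the bucket and returns its value, or `None` when the
/// key is absent. `size` is the map-wide element count and is decremented
/// only when something was actually removed.
pub fn remove_from_bucket(bucket: &mut Bucket, key: &str, size: &mut usize) -> Option<i64> {
    // Early return with None leaves both the bucket and the size untouched.
    let pos = bucket.iter().position(|(k, _)| k.as_str() == key)?;
    // Vec::remove shifts later entries left, so the chain order is preserved.
    let (_, value) = bucket.remove(pos);
    *size -= 1;
    Some(value)
}
```

Returning `None` for a missing key matches the `remove(key) -> DataBox | NullBox` signature above, and it keeps removing a non-existent key from being an error.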

### Week 4: MapBox Golden Tests

**Deliverables**:
1. **Golden Test Suite**:
   - 50+ test cases comparing Rust-MapBox vs Hako-MapBox
   - Identical output verification (keys, values, size)
   - Performance benchmarks (set/get/remove)

2. **Deterministic Verification**:
   - Same input → same output (10 runs per test)
   - Key order stability (Symbol < Int < String)
   - Provenance tracking validation

3. **Performance Baseline**:
   - Measure operation times (set/get/remove)
   - Compare against Rust-MapBox
   - Target: ≥ 70% of Rust performance

4. **Documentation**:
   - MapBox API reference
   - Usage examples
   - Performance characteristics (O(1) average, O(n) worst case)
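
The two golden-test checks, repeated-run determinism and Rust vs Hako parity, can be expressed as small helpers over captured output strings. This is a sketch under the assumption that each scenario's output is serialized to a string; the helper names are hypothetical and do not describe the existing golden test framework.

```rust
/// Runs `scenario` `runs` times and reports whether every run produced the
/// same output as the first one.
pub fn is_deterministic<F: FnMut() -> String>(mut scenario: F, runs: usize) -> bool {
    let first = scenario();
    (1..runs).all(|_| scenario() == first)
}

/// Compares the reference (Rust) output with the candidate (Hako) output.
/// On mismatch, the error carries both outputs for the test log.
pub fn check_parity(reference: &str, candidate: &str) -> Result<(), String> {
    if reference == candidate {
        Ok(())
    } else {
        Err(format!(
            "parity mismatch\n  rust: {}\n  hako: {}",
            reference, candidate
        ))
    }
}
```

A golden test would call `is_deterministic(run_hako_scenario, 10)` for the "10 runs per test" rule and then `check_parity` on one output from each VM. Here `run_hako_scenario` stands in for whatever launches the scenario.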

---

## Week 5-8: ArrayBox Implementation

### Overview

ArrayBox is a dynamic array (list) with automatic capacity expansion.

### Core Methods

```hakorune
box ArrayBox {
    // Storage
    _data: RawBufferBox     // Internal buffer
    _size: IntegerBox       // Current element count
    _capacity: IntegerBox   // Allocated capacity

    // Public API
    push(value: DataBox) -> NullBox
    pop() -> DataBox | NullBox
    get(index: IntegerBox) -> DataBox | NullBox
    set(index: IntegerBox, value: DataBox) -> NullBox
    size() -> IntegerBox
    slice(start: IntegerBox, end: IntegerBox) -> ArrayBox
    concat(other: ArrayBox) -> ArrayBox
}
```

### Week 5: Foundation

**Deliverables**:
1. **Data Structures**:
   - RawBufferBox for storage
   - Capacity expansion strategy (2x growth)
   - Bounds checking

2. **Core Operations**:
   - `push(value)` implementation
   - `pop()` implementation
   - Capacity doubling logic

3. **Tests**:
   - Basic push/pop operations (10 test cases)
   - Capacity expansion (grow from 0 → 1 → 2 → 4 → 8)
   - Empty array edge cases
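
The expected capacity sequence 0 → 1 → 2 → 4 → 8 follows from doubling with a special case for the empty buffer. The sketch below traces it; the function names are hypothetical, and it assumes growth happens only when a push finds the buffer full.

```rust
/// 2x growth, with an empty buffer growing to capacity 1.
pub fn grown_capacity(capacity: usize) -> usize {
    if capacity == 0 {
        1
    } else {
        capacity * 2
    }
}

/// Returns every capacity the buffer passes through while `pushes`
/// elements are appended to an initially empty array.
pub fn capacity_trace(pushes: usize) -> Vec<usize> {
    let mut capacity = 0;
    let mut trace = vec![capacity];
    for len in 0..pushes {
        // `len` is the element count before this push; grow only when full.
        if len == capacity {
            capacity = grown_capacity(capacity);
            trace.push(capacity);
        }
    }
    trace
}
```

Pushing five elements yields the trace `[0, 1, 2, 4, 8]`, which is the sequence the capacity expansion test is meant to observe.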

### Week 6: Index Operations

**Deliverables**:
1. **Index Access**:
   - `get(index)` implementation
   - `set(index, value)` implementation
   - Bounds checking (index < 0 or index ≥ size → error)

2. **Fail-Fast Boundaries**:
   - Out-of-bounds access → RuntimeError
   - Negative index → RuntimeError
   - Clear error messages with context

3. **Tests**:
   - Valid index access (20 test cases)
   - Out-of-bounds access (10 test cases)
   - Negative index handling (5 test cases)

### Week 7: Advanced Operations

**Deliverables**:
1. **Slice Operation**:
   - `slice(start, end)` implementation
   - Return new ArrayBox with elements [start, end)
   - Handle edge cases (start > end, negative indices)

2. **Concat Operation**:
   - `concat(other)` implementation
   - Create new ArrayBox with combined elements
   - Efficient memory allocation

3. **ValueBox/DataBox Boundaries**:
   - Reject ValueBox elements (Fail-Fast)
   - Ensure DataBox inputs only
   - Unpack at entry/exit points

4. **Tests**:
   - Slice operations (15 test cases)
   - Concat operations (10 test cases)
   - ValueBox rejection (5 test cases)
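
The plan does not yet say how `slice` should treat `start > end` or negative indices. The sketch below assumes a Fail-Fast policy in line with the index rules of Week 6: any negative index, inverted range, or end past the size is an error, and `start == end` yields an empty array. That policy and the function names are assumptions; clamping would be a valid alternative.

```rust
/// Returns a copy of the elements in the half-open range [start, end).
/// Assumed policy: invalid ranges are errors, not silently clamped.
pub fn slice<T: Clone>(items: &[T], start: i64, end: i64) -> Result<Vec<T>, String> {
    if start < 0 || end < 0 {
        return Err(format!("slice: negative index in [{}, {})", start, end));
    }
    let (start, end) = (start as usize, end as usize);
    if start > end || end > items.len() {
        return Err(format!(
            "slice: invalid range [{}, {}) for size {}",
            start,
            end,
            items.len()
        ));
    }
    Ok(items[start..end].to_vec())
}

/// Builds a new array holding `left` followed by `right`, allocating the
/// combined capacity once instead of growing repeatedly.
pub fn concat<T: Clone>(left: &[T], right: &[T]) -> Vec<T> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    out.extend_from_slice(left);
    out.extend_from_slice(right);
    out
}
```

Neither function mutates its inputs, which matches the "return new ArrayBox" wording for both operations.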

### Week 8: ArrayBox Golden Tests

**Deliverables**:
1. **Golden Test Suite**:
   - 50+ test cases comparing Rust-ArrayBox vs Hako-ArrayBox
   - Identical output verification (size, elements, order)
   - Performance benchmarks (push/pop/get/set)

2. **Performance Baseline**:
   - Measure operation times (all methods)
   - Compare against Rust-ArrayBox
   - Target: ≥ 70% of Rust performance

3. **Large-Scale Tests**:
   - 10,000 element arrays
   - Stress test capacity expansion
   - Memory usage validation

4. **Documentation**:
   - ArrayBox API reference
   - Usage examples
   - Performance characteristics (push is O(1) amortized, O(n) worst case when the buffer grows)

---

## Key Design Decisions

### 1. Deterministic Key Order (Symbol < Int < String)

**Rationale**:
- Predictable iteration order for debugging
- Reproducible behavior across runs
- Stable sorting for golden tests

**Implementation**:
```hakorune
// Key comparison function
method compare_keys(key1: DataBox, key2: DataBox) -> IntegerBox {
    local type1 = key1.type_id()
    local type2 = key2.type_id()

    // Type priority: Symbol < Int < String
    if (type1 != type2) {
        if (type1 == :Symbol) { return -1 }
        if (type2 == :Symbol) { return 1 }
        if (type1 == :Int) { return -1 }
        if (type2 == :Int) { return 1 }
    }

    // Same type: natural order
    return key1.compare(key2)
}
```
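
For comparison, the same ordering can be modeled on the Rust side with a derived `Ord`, which compares enum variants in declaration order before comparing their payloads. The `Key` enum below is a hypothetical model for illustration and is not the Rust-MapBox key type.

```rust
/// Variant order defines the type priority: Symbol < Int < Str.
/// Within one variant, the derived ordering compares the payload.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Key {
    Symbol(u32), // symbol ID
    Int(i64),
    Str(String),
}

/// Returns the keys in deterministic iteration order.
pub fn sorted_keys(mut keys: Vec<Key>) -> Vec<Key> {
    // slice::sort is a stable sort, so equal keys keep their relative order.
    keys.sort();
    keys
}
```

Sorting a mixed list such as `Str("b"), Int(2), Symbol(7), Int(-1)` puts the symbol first, then the integers in numeric order, then the string. The golden tests expect the same order from `keys()`.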

### 2. ValueBox/DataBox Boundaries

**Rationale**:
- ValueBox is ephemeral (pipeline boundaries only)
- DataBox is persistent (long-lived storage)
- Fail-Fast prevents confusion

**Implementation**:
```hakorune
// MapBox.set with boundary check
method set(key: DataBox, value: DataBox) -> NullBox {
    if (key.is_valuebox()) {
        panic("MapBox.set: key must be DataBox, not ValueBox")
    }
    if (value.is_valuebox()) {
        panic("MapBox.set: value must be DataBox, not ValueBox")
    }

    // Proceed with actual set operation
    local hash = me._hash(key)
    // Reduce the hash to a valid bucket slot before indexing
    local index = hash % me._buckets.size()
    local bucket = me._buckets.get(index)
    bucket.insert(key, value)
}
```

### 3. Fail-Fast Error Handling

**Rationale**:
- No silent fallbacks (prevents bugs)
- Clear error messages (aids debugging)
- Early detection (stops at source)

**Implementation**:
```hakorune
// ArrayBox.get with Fail-Fast
method get(index: IntegerBox) -> DataBox | NullBox {
    if (index < 0) {
        panic("ArrayBox.get: negative index: " + index)
    }
    if (index >= me._size) {
        panic("ArrayBox.get: index out of bounds: " + index + " >= " + me._size)
    }

    return me._data.get(index)
}
```

---

## Success Criteria

### Mandatory (✅ All must pass)

1. **MapBox/ArrayBox Fully Implemented**:
   - All methods work in Pure Hakorune
   - No Rust dependencies (except HostBridge)

2. **Golden Tests Pass**:
   - Rust-Collections vs Hako-Collections: 100% parity
   - 100+ test cases with identical output
   - Deterministic behavior verified

3. **Deterministic Behavior**:
   - Key order is predictable (Symbol < Int < String)
   - Iteration order is stable
   - Provenance tracking works

4. **Performance**:
   - Hako-Collections ≥ 70% of Rust-Collections speed
   - Basic operations (set/get/push/pop) are O(1) average
   - Large-scale data (10,000 elements) is practical

### Excluded (⚠️ NOT in Phase 20.7)

- **GC (Garbage Collection)**: Deferred to Phase 20.8
- **Optimization**: Basic implementation only (no profiling yet)
- **Concurrency**: Single-threaded only
- **Persistence**: In-memory only (no disk storage)

---

## Dependencies

### Prerequisite: Phase 20.6

Phase 20.7 requires:
1. **VM Core Complete**: All 16 MIR instructions working
2. **Dispatch Unification**: Single Resolver path for all method calls
3. **Golden Test Framework**: Rust-VM vs Hako-VM comparison working

### Provides to: Phase 20.8

Phase 20.7 delivers:
1. **Collections in Hakorune**: MapBox/ArrayBox fully self-hosted
2. **Deterministic Iteration**: Stable key order for GC root scanning
3. **Reference Tracking**: Preparation for GC mark phase

---

## Risk Mitigation

### Risk 1: Performance Below 70%

**Mitigation**:
- Measure every week (not just in Weeks 4 and 8)
- Profile hot paths early (hash function, bounds checks)
- Accept slower performance initially (can optimize later)

### Risk 2: Deterministic Hashing Complexity

**Mitigation**:
- Use simple, proven hash functions (FNV-1a for strings)
- Stable sort implementation (merge sort, not quicksort)
- Test hash stability with 1,000+ keys

### Risk 3: ValueBox/DataBox Confusion

**Mitigation**:
- Clear error messages ("must be DataBox, not ValueBox")
- Fail-Fast at entry points (set/push methods)
- Golden tests verify boundary enforcement

---

## Related Documents

- **Phase 20.5**: [Pure Hakorune Roadmap](../phase-20.5/PURE_HAKORUNE_ROADMAP.md)
- **Phase 20.6**: [VM Core Complete](../phase-20.6/)
- **Phase 20.8**: [GC v0 + Rust Deprecation](../phase-20.8/)
- **Box-First Principle**: [CLAUDE.md](../../../../CLAUDE.md#🧱-先頭原則-箱理論box-first)
- **Golden Testing Strategy**: [PURE_HAKORUNE_ROADMAP.md#🧪-golden-testing-strategy](../phase-20.5/PURE_HAKORUNE_ROADMAP.md#-golden-testing-strategy)

---

**Status**: PLANNED
**Dependencies**: Phase 20.6 (VM Core + Dispatch)
**Delivers**: Phase D (Collections in Pure Hakorune)
**Timeline**: 8 weeks (2026-05-25 → 2026-07-19)