# Phase 20.8: GC + Rust Deprecation - Implementation Plan
**Duration**: 6 weeks (2026-07-20 → 2026-08-30)
**Status**: Not Started
**Prerequisites**: Phase 20.7 (Collections in Hakorune) completed
---
## Executive Summary
Phase 20.8 is the final phase of the "Pure Hakorune" initiative, completing the vision of "Rust=floor, Hakorune=house". It consists of two sub-phases:
1. **Phase E: GC v0** (Week 1-4) - Implement Mark & Sweep garbage collection in Hakorune
2. **Phase F: Rust VM Deprecation** (Week 5-6) - Deprecate Rust VM and achieve true self-hosting
Upon completion, Hakorune will be a fully self-hosted language with:
- **Rust layer**: ≤ 100 lines (HostBridge API only)
- **Hakorune layer**: Everything else (VM, parser, collections, GC, stdlib)
---
## Design Freeze — Boundaries and Contracts (prework)
Goal: Lock down boundaries and invariants before coding to keep Phase 20.8 small and deterministic.
Rust Layer (final responsibilities)
- [ ] Limit to: Boot (Lock/Capsule verify → BootPlan), HostBridge C-ABI publish, CLI flags, PluginHost init (LockOnly order)
- [ ] No discovery fallbacks; no silent retries; propagate non-OK status to exit code
HostBridge minimal API (7 functions)
- [ ] open/close/last_error
- [ ] list_types/type_id/method_id
- [ ] call (unified Call over Extern/Method/ModuleFunction/Constructor)
- [ ] Versioning: abi_major/minor + struct_size; caps bits; optional allocator pointer
- [ ] Error policy: HAKO_OK/NOT_FOUND/BAD_LOCK/INCOMPATIBLE/OOM/UNSUPPORTED/VALIDATION/PANIC
- See: docs/development/architecture/hostbridge/ABI_v1.md
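
The error policy above could be mirrored on the Rust side roughly as follows. This is only a sketch: the `HakoStatus` type name, the numeric values, and the `from_raw`/`exit_code` helpers are assumptions for illustration; the authoritative definitions belong in ABI_v1.md.

```rust
/// Status codes named in the error policy. The numeric values are
/// illustrative assumptions, not taken from ABI_v1.md.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HakoStatus {
    Ok = 0,
    NotFound = 1,
    BadLock = 2,
    Incompatible = 3,
    Oom = 4,
    Unsupported = 5,
    Validation = 6,
    Panic = 7,
}

impl HakoStatus {
    /// Decode a raw status received across the C-ABI boundary.
    /// Unknown values map to `Panic` so nothing is silently accepted.
    pub fn from_raw(raw: i32) -> HakoStatus {
        match raw {
            0 => HakoStatus::Ok,
            1 => HakoStatus::NotFound,
            2 => HakoStatus::BadLock,
            3 => HakoStatus::Incompatible,
            4 => HakoStatus::Oom,
            5 => HakoStatus::Unsupported,
            6 => HakoStatus::Validation,
            _ => HakoStatus::Panic,
        }
    }

    /// Propagate the status to the process exit code: 0 only for `Ok`.
    pub fn exit_code(self) -> i32 {
        self as i32
    }
}
```

With this shape, any non-OK status yields a non-zero exit code without a retry or fallback path in between.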
Determinism / Capsule / Lock
- [ ] LockOnly run path; load order fixed; sha256 verify (plugins + optional AOT objects)
- [ ] Frozen mode requires verify; no fallback
Semantic guards (SSOT)
- [ ] Published names: Box.method/Arity only (aliases with TTL→remove)
- [ ] Tail fallback OFF by default; tail_ok via CallAttrs only
- [ ] Eq/Ne rewrite: primitive=Compare, box=op_eq, enum=.equals; Verifier enforces
- [ ] Intern: apply to published names only; dump name→id set for CI
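
The Eq/Ne rewrite rule can be read as a fixed mapping from operand kind to lowering, which a verifier then enforces. The sketch below models that reading; `OperandKind`, `Lowering`, `expected_lowering`, and `verify` are hypothetical names, not existing Hakorune or Rust-layer APIs.

```rust
/// Static kind of the operands of `==` / `!=` (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperandKind {
    Primitive,
    BoxValue,
    EnumValue,
}

/// What the comparison is rewritten to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lowering {
    Compare,      // primitive: Compare instruction
    OpEq,         // box: call to op_eq
    EqualsMethod, // enum: call to .equals
}

/// The rule from the plan: primitive=Compare, box=op_eq, enum=.equals.
pub fn expected_lowering(kind: OperandKind) -> Lowering {
    match kind {
        OperandKind::Primitive => Lowering::Compare,
        OperandKind::BoxValue => Lowering::OpEq,
        OperandKind::EnumValue => Lowering::EqualsMethod,
    }
}

/// Verifier-style check: reject a comparison lowered differently from the rule.
pub fn verify(kind: OperandKind, actual: Lowering) -> Result<(), String> {
    let expected = expected_lowering(kind);
    if actual == expected {
        Ok(())
    } else {
        Err(format!(
            "Eq/Ne on {:?} must lower to {:?}, found {:?}",
            kind, expected, actual
        ))
    }
}
```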
Performance/Observability
- [ ] KPI targets documented (VM ≥ 70% of LLVM on representative ops)
- [ ] GC metrics: pause_ms/live_bytes/num_objects/alloc_rate; HAKO_GC_TRACE format fixed
- [ ] Stable error messages: see docs/development/architecture/errors/fail-fast-messages.md
---
## Week 1-4: Phase E - GC v0 (Mark & Sweep)
### Overview
Implement a minimal stop-the-world Mark & Sweep garbage collector in Hakorune.
**Key Principles**:
- **Simplicity**: No generational GC, no incremental GC
- **Correctness**: Zero memory leaks
- **Observability**: Full GC tracing via `HAKO_GC_TRACE=1`
- **Fail-Fast**: Invalid GC states panic immediately
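### Safepoints (design)
- Call boundaries (immediately before/after MirCall)
- Control transfers (after branch/jump; loop backedges)
- Long I/O (immediately before long-running HostBridge/extern calls)
- v0: `GcHooks.safepoint()` is a no-op. `should_collect()` will be integrated into it in a later stage.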
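
As a rough model of the safepoint placement, the sketch below puts a hook call before and after each call and on each loop backedge. It is written in Rust purely as a modelling language; apart from the `GcHooks.safepoint()` name, everything here (`NoopHooks`, `CountingHooks`, `Op`, `run`) is a hypothetical stand-in.

```rust
/// Hook interface; only `safepoint` is named in the plan.
pub trait GcHooks {
    fn safepoint(&mut self);
}

/// v0 behaviour: the safepoint does nothing.
pub struct NoopHooks;

impl GcHooks for NoopHooks {
    fn safepoint(&mut self) {}
}

/// Test double that counts how often a safepoint is reached.
pub struct CountingHooks {
    pub hits: u64,
}

impl GcHooks for CountingHooks {
    fn safepoint(&mut self) {
        self.hits += 1;
    }
}

/// Hypothetical, heavily simplified instruction kinds.
pub enum Op {
    Call,
    Backedge,
    Other,
}

/// Walk the instruction stream and poll at the designed safepoints.
pub fn run(ops: &[Op], hooks: &mut dyn GcHooks) {
    for op in ops {
        match op {
            Op::Call => {
                hooks.safepoint(); // immediately before the call
                // ... the call itself would execute here ...
                hooks.safepoint(); // immediately after the call
            }
            Op::Backedge => hooks.safepoint(),
            Op::Other => {}
        }
    }
}
```

Keeping the hook behind one interface means the later `should_collect()` check can be added inside `safepoint` without touching the call sites.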
---
### Week 1-2: Mark Phase Implementation
#### Task 1.1: GC Roots Detection
Implement detection of GC roots (entry points for reachability analysis):
```hakorune
// GcBox implementation
box GcBox {
    // Root set
    stack_roots: ArrayBox    // Stack frame locals
    global_roots: ArrayBox   // Global static boxes
    handle_roots: ArrayBox   // HandleRegistry handles

    birth() {
        from Box.birth()
        me.stack_roots = new ArrayBox()
        me.global_roots = new ArrayBox()
        me.handle_roots = new ArrayBox()
    }

    collect_roots() {
        // Start each cycle from empty root sets; otherwise roots recorded
        // by a previous collection would keep dead objects alive
        me.stack_roots = new ArrayBox()
        me.global_roots = new ArrayBox()
        me.handle_roots = new ArrayBox()
        // 1. Scan stack frames for local variables
        me.scan_stack_frames()
        // 2. Scan global static boxes
        me.scan_global_boxes()
        // 3. Scan HandleRegistry for C-ABI handles
        me.scan_handle_registry()
    }

    scan_stack_frames() {
        // Iterate through VM stack frames
        local frame = ExecBox.current_frame()
        loop(frame != null) {
            local locals = frame.get_locals()
            // Parameter is named `value` because `local` is a keyword
            locals.foreach(func(value) {
                me.stack_roots.push(value)
            })
            frame = frame.parent()
        }
    }

    scan_global_boxes() {
        // Scan all static boxes
        local globals = GlobalRegistry.all_static_boxes()
        // Parameter is named `static_box` because `box` is a keyword
        globals.foreach(func(static_box) {
            me.global_roots.push(static_box)
        })
    }

    scan_handle_registry() {
        // Scan HandleRegistry for live handles
        local handles = HandleRegistry.all_handles()
        handles.foreach(func(handle) {
            me.handle_roots.push(handle)
        })
    }
}
```
**Deliverables**:
- [ ] GC roots detection implemented
- [ ] Stack frame scanning
- [ ] Global box scanning
- [ ] HandleRegistry scanning
- [ ] Tests: Verify all roots found
#### Task 1.2: Mark Algorithm
Implement mark phase (trace reachable objects):
```hakorune
box GcBox {
    marked: MapBox  // object_id -> true

    mark() {
        me.marked = new MapBox()
        // Mark all objects reachable from roots
        me.stack_roots.foreach(func(root) {
            me.mark_object(root)
        })
        me.global_roots.foreach(func(root) {
            me.mark_object(root)
        })
        me.handle_roots.foreach(func(root) {
            me.mark_object(root)
        })
    }

    mark_object(obj) {
        local obj_id = obj.object_id()
        // Already marked?
        if (me.marked.has(obj_id)) {
            return
        }
        // Mark this object
        me.marked.set(obj_id, true)
        // Recursively mark children
        local children = obj.get_children()
        children.foreach(func(child) {
            me.mark_object(child)
        })
    }
}
```
**Deliverables**:
- [ ] Mark algorithm implemented
- [ ] Recursive marking of children
- [ ] Cycle detection (avoid infinite loops)
- [ ] Tests: Verify mark correctness
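
A recursive `mark_object` can exhaust the call stack on deeply nested object graphs. One common alternative is an explicit worklist, sketched here in Rust as a model only (the collector itself is planned in Hakorune). The `Heap` type and the use of plain integer ids are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical object graph: object id -> ids of the objects it references.
pub type Heap = HashMap<u64, Vec<u64>>;

/// Mark every object reachable from `roots` using an explicit worklist
/// instead of recursion, so deep graphs cannot overflow the call stack.
pub fn mark(heap: &Heap, roots: &[u64]) -> HashSet<u64> {
    let mut marked = HashSet::new();
    let mut worklist: Vec<u64> = roots.to_vec();
    while let Some(id) = worklist.pop() {
        // `insert` returns false when the id was already marked;
        // skipping it here is what makes cycles terminate.
        if !marked.insert(id) {
            continue;
        }
        if let Some(children) = heap.get(&id) {
            worklist.extend(children.iter().copied());
        }
    }
    marked
}
```

The marked set doubles as the visited set, the same role `me.marked` plays in the Hakorune listing, so a cycle such as 1 → 2 → 1 is visited once.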
---
### Week 3-4: Sweep Phase & Metrics
#### Task 3.1: Sweep Algorithm
Implement sweep phase (free unmarked objects):
```hakorune
box GcBox {
    all_objects: ArrayBox  // All allocated objects

    sweep() {
        local freed_count = 0
        local start_time = TimeBox.now_ms()
        // Iterate through all objects
        local survivors = new ArrayBox()
        me.all_objects.foreach(func(obj) {
            local obj_id = obj.object_id()
            if (me.marked.has(obj_id)) {
                // Survivor: Keep it
                survivors.push(obj)
            } else {
                // Garbage: Free it
                obj.destroy()
                freed_count = freed_count + 1
            }
        })
        // Update object list
        me.all_objects = survivors
        local sweep_time = TimeBox.now_ms() - start_time
        me.log_sweep(freed_count, survivors.size(), sweep_time)
    }

    log_sweep(freed, survivors, time_ms) {
        if (EnvBox.has("HAKO_GC_TRACE")) {
            ConsoleBox.log("[GC] Sweep phase: " + freed + " objects freed (" + time_ms + "ms)")
            ConsoleBox.log("[GC] Survivors: " + survivors + " objects")
        }
    }
}
```
**Deliverables**:
- [ ] Sweep algorithm implemented
- [ ] Object destruction (finalization)
- [ ] Survivor list updated
- [ ] Tests: Verify sweep correctness
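
The invariant a sweep test should check is that freed plus survivors equals the object count before the sweep. A minimal Rust model of that, where the `Obj` type is hypothetical and dropping a value stands in for `obj.destroy()`:

```rust
use std::collections::HashSet;

/// Hypothetical heap entry, identified by its object id.
#[derive(Debug)]
pub struct Obj {
    pub id: u64,
}

/// Remove every object whose id is not in `marked` and return how many
/// objects were freed. Dropping the value models `obj.destroy()`.
pub fn sweep(all_objects: &mut Vec<Obj>, marked: &HashSet<u64>) -> usize {
    let before = all_objects.len();
    // `retain` keeps survivors in their original order and drops the rest.
    all_objects.retain(|obj| marked.contains(&obj.id));
    before - all_objects.len()
}
```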
#### Task 3.2: GC Metrics Collection
Implement GC metrics for observability:
```hakorune
box GcBox {
    metrics: GcMetricsBox

    birth() {
        from Box.birth()
        me.metrics = new GcMetricsBox()
    }

    collect() {
        me.metrics.increment_collections()
        // Mark phase
        local mark_start = TimeBox.now_ms()
        me.collect_roots()
        me.mark()
        local mark_time = TimeBox.now_ms() - mark_start
        // Sweep phase
        local objects_before = me.all_objects.size()
        local sweep_start = TimeBox.now_ms()
        me.sweep()
        local sweep_time = TimeBox.now_ms() - sweep_start
        // Objects freed = size of the object list before minus after the sweep
        local freed = objects_before - me.all_objects.size()
        // Record metrics
        me.metrics.record_collection(mark_time, sweep_time, me.marked.size(), freed)
    }
}

box GcMetricsBox {
    total_allocations: IntegerBox
    total_collections: IntegerBox
    total_freed: IntegerBox
    peak_handles: IntegerBox

    birth() {
        from Box.birth()
        me.total_allocations = 0
        me.total_collections = 0
        me.total_freed = 0
        me.peak_handles = 0
    }

    increment_allocations() {
        me.total_allocations = me.total_allocations + 1
    }

    increment_collections() {
        me.total_collections = me.total_collections + 1
    }

    record_collection(mark_time, sweep_time, survivors, freed) {
        // Accumulate the freed count so print_stats() reports a real total
        me.total_freed = me.total_freed + freed
        // Log metrics (stable format)
        // [GC] mark=<ms> sweep=<ms> survivors=<n>
        if (EnvBox.has("HAKO_GC_TRACE")) {
            ConsoleBox.log("[GC] mark=" + mark_time + " sweep=" + sweep_time + " survivors=" + survivors)
        }
    }

    print_stats() {
        ConsoleBox.log("[GC Stats] Total allocations: " + me.total_allocations)
        ConsoleBox.log("[GC Stats] Total collections: " + me.total_collections)
        ConsoleBox.log("[GC Stats] Total freed: " + me.total_freed)
        ConsoleBox.log("[GC Stats] Peak handles: " + me.peak_handles)
    }
}
```
**Deliverables**:
- [ ] GcMetricsBox implemented
- [ ] Allocation/collection counters
- [ ] Timing metrics
- [ ] `HAKO_GC_TRACE=1` logging
- [ ] Tests: Verify metrics accuracy
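
Because the trace line format is meant to be stable, a metrics test can format a line and parse it back. The following Rust sketch assumes exactly the `[GC] mark=<ms> sweep=<ms> survivors=<n>` layout from the listing; `format_trace` and `parse_trace` are hypothetical helper names.

```rust
/// Produce a trace line in the stable format
/// `[GC] mark=<ms> sweep=<ms> survivors=<n>`.
pub fn format_trace(mark_ms: u64, sweep_ms: u64, survivors: u64) -> String {
    format!("[GC] mark={} sweep={} survivors={}", mark_ms, sweep_ms, survivors)
}

/// Parse a trace line back into (mark_ms, sweep_ms, survivors).
/// Returns `None` for any line that does not match the format exactly.
pub fn parse_trace(line: &str) -> Option<(u64, u64, u64)> {
    let rest = line.strip_prefix("[GC] ")?;
    let mut fields = rest.split(' ');
    let mark: u64 = fields.next()?.strip_prefix("mark=")?.parse().ok()?;
    let sweep: u64 = fields.next()?.strip_prefix("sweep=")?.parse().ok()?;
    let survivors: u64 = fields.next()?.strip_prefix("survivors=")?.parse().ok()?;
    // Trailing fields mean the format has drifted; treat that as a mismatch.
    if fields.next().is_some() {
        return None;
    }
    Some((mark, sweep, survivors))
}
```

Other `[GC]` lines, such as the sweep-phase message, are rejected by the parser, so a CI check can pick out only the metrics lines.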
#### Task 3.3: Integration & Testing
Integrate GC with VM execution:
```hakorune
box MiniVmBox {
    gc: GcBox

    birth() {
        from Box.birth()
        me.gc = new GcBox()
    }

    allocate_object(obj) {
        // Register with GC
        me.gc.register_object(obj)
        me.gc.metrics.increment_allocations()
        // Trigger GC if needed
        if (me.gc.should_collect()) {
            me.gc.collect()
        }
        return obj
    }

    destroy() {
        // Print GC stats before exit
        me.gc.metrics.print_stats()
        from Box.destroy()
    }
}
```
**Deliverables**:
- [ ] GC integrated with VM
- [ ] Allocation hook
- [ ] GC trigger policy
- [ ] Stats printed at exit
- [ ] Tests: End-to-end GC validation
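
The plan does not yet define the trigger policy behind `should_collect()`. One simple candidate is an allocation-count threshold that grows with the live set, modelled below in Rust. The `GcTrigger` type, its method names, and the doubling factor are all assumptions, not decisions recorded in this plan.

```rust
/// Hypothetical allocation-count trigger for a stop-the-world collector.
pub struct GcTrigger {
    allocs_since_collect: usize,
    threshold: usize,
}

impl GcTrigger {
    pub fn new(threshold: usize) -> Self {
        GcTrigger { allocs_since_collect: 0, threshold }
    }

    /// Call once per allocation.
    pub fn on_allocate(&mut self) {
        self.allocs_since_collect += 1;
    }

    /// True once enough allocations have happened since the last collection.
    pub fn should_collect(&self) -> bool {
        self.allocs_since_collect >= self.threshold
    }

    /// Call after a collection. The threshold never shrinks and is raised
    /// to at least twice the survivor count, so a program with a large
    /// live set does not collect on almost every allocation.
    pub fn on_collected(&mut self, survivors: usize) {
        self.allocs_since_collect = 0;
        self.threshold = self.threshold.max(survivors.saturating_mul(2));
    }
}
```

A policy of this kind is deterministic for a given allocation sequence, which fits the determinism goals stated in the design freeze.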
---
## Week 5-6: Phase F - Rust VM Deprecation
### Overview
Deprecate Rust VM and achieve true self-hosting with Hakorune VM as the default backend.
**Goals**:
1. Make Hakorune-VM the default (`--backend vm`)
2. Move Rust-VM to opt-in mode (`--backend vm-rust`, with warning)
3. Verify bit-identical self-compilation (Hako₁ → Hako₂ → Hako₃)
4. Minimize Rust layer to ≤ 100 lines (HostBridge API only)
---
### Week 5: Backend Switching & Deprecation
#### Task 5.1: Make Hakorune-VM Default
Update CLI argument parsing:
```rust
// src/cli.rs
// Assumes clap 3.2 or later with the `derive` feature enabled.
use clap::{Parser, ValueEnum};

#[derive(Parser)]
struct Cli {
    /// Program to run
    input_file: String,

    #[clap(long, value_enum, default_value = "vm")]
    backend: Backend,
}

#[derive(Clone, Copy, ValueEnum)]
enum Backend {
    Vm,     // Hakorune-VM (new default)
    VmRust, // Rust-VM (deprecated); selected with `vm-rust`
    Llvm,   // LLVM backend
}

fn main() {
    let cli = Cli::parse();
    if matches!(cli.backend, Backend::VmRust) {
        eprintln!("Warning: Rust-VM (--backend vm-rust) is deprecated.");
        eprintln!("         It will be removed in Phase 15.82.");
        eprintln!("         Use Hakorune-VM (--backend vm) instead.");
    }
    // Execute with chosen backend (`execute` is defined elsewhere)
    execute(cli.backend, &cli.input_file);
}
```
**Deliverables**:
- [ ] CLI updated (Hakorune-VM default)
- [ ] Deprecation warning added
- [ ] Documentation updated
- [ ] Tests: Verify default backend
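
To make the "verify default backend" test independent of argument parsing and I/O, the selection logic can be separated into pure functions. This is a std-only sketch; `resolve_backend` and `deprecation_warning` are hypothetical helpers, and the `llvm` flag spelling is assumed.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Vm,     // Hakorune-VM (default)
    VmRust, // Rust-VM (deprecated)
    Llvm,   // LLVM backend
}

impl FromStr for Backend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "vm" => Ok(Backend::Vm),
            "vm-rust" => Ok(Backend::VmRust),
            "llvm" => Ok(Backend::Llvm),
            other => Err(format!("unknown backend: {}", other)),
        }
    }
}

/// Resolve the value of `--backend`; an absent flag selects Hakorune-VM.
pub fn resolve_backend(flag: Option<&str>) -> Result<Backend, String> {
    match flag {
        Some(value) => value.parse(),
        None => Ok(Backend::Vm),
    }
}

/// First line of the warning to print, if the chosen backend is deprecated.
pub fn deprecation_warning(backend: Backend) -> Option<&'static str> {
    match backend {
        Backend::VmRust => Some("Warning: Rust-VM (--backend vm-rust) is deprecated."),
        _ => None,
    }
}
```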
#### Task 5.2: Golden Tests Verification
Verify Rust-VM vs Hakorune-VM parity:
```bash
# Run golden tests
./tools/golden_tests.sh
# Expected output:
# ✅ arithmetic.hako: Rust-VM == Hakorune-VM
# ✅ control_flow.hako: Rust-VM == Hakorune-VM
# ✅ collections.hako: Rust-VM == Hakorune-VM
# ✅ recursion.hako: Rust-VM == Hakorune-VM
# ✅ strings.hako: Rust-VM == Hakorune-VM
# ✅ enums.hako: Rust-VM == Hakorune-VM
# ✅ closures.hako: Rust-VM == Hakorune-VM
# ✅ selfhost_mini.hako: Rust-VM == Hakorune-VM
#
# All golden tests PASSED (8/8)
```
**Deliverables**:
- [ ] Golden test suite passes
- [ ] 100% Rust-VM vs Hakorune-VM parity
- [ ] CI integration
- [ ] Tests: All outputs match exactly
---
### Week 6: Bit-Identical Verification & Rust Minimization
#### Task 6.1: Bit-Identical Self-Compilation
Implement self-compilation chain verification:
```bash
#!/bin/bash
# tools/verify_self_compilation.sh
set -e
echo "=== Self-Compilation Verification ==="
# Hako₁: Rust-based compiler (current version)
echo "[1/5] Building Hako₁ (Rust-based compiler)..."
cargo build --release
cp target/release/hako hako_1
# Hako₂: Compiled by Hako₁
echo "[2/5] Building Hako₂ (via Hako₁)..."
./hako_1 apps/selfhost-compiler/main.hako -o hako_2
chmod +x hako_2
# Hako₃: Compiled by Hako₂
echo "[3/5] Building Hako₃ (via Hako₂)..."
./hako_2 apps/selfhost-compiler/main.hako -o hako_3
chmod +x hako_3
# Verify Hako₂ == Hako₃ (bit-identical); cmp -s compares bytes and prints nothing
echo "[4/5] Verifying bit-identical: Hako₂ == Hako₃..."
if cmp -s hako_2 hako_3; then
    echo "✅ SUCCESS: Hako₂ == Hako₃ (bit-identical)"
else
    echo "❌ FAILURE: Hako₂ != Hako₃"
    exit 1
fi
# Compare Hako₁ with Hako₂ (informational only: Hako₁ is built by cargo,
# so a mismatch is expected during the transition and does not fail the run)
echo "[5/5] Verifying bit-identical: Hako₁ == Hako₂..."
if cmp -s hako_1 hako_2; then
    echo "✅ SUCCESS: Hako₁ == Hako₂ (bit-identical)"
else
    echo "⚠️ WARNING: Hako₁ != Hako₂ (expected during transition)"
fi
echo ""
echo "=== Self-Compilation Verification PASSED ==="
```
**CI Integration**:
```yaml
# .github/workflows/self_compilation.yml
name: Self-Compilation Verification
on:
  push:
    branches: [main, private/selfhost]
  pull_request:
  schedule:
    - cron: '0 0 * * *' # Daily at midnight (UTC)
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: actions-rust-lang/setup-rust-toolchain@v1
      - name: Build Hako (Rust-based)
        run: cargo build --release
      - name: Self-Compilation Chain
        run: ./tools/verify_self_compilation.sh
      - name: Upload artifacts
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: self-compilation-failure
          path: |
            hako_1
            hako_2
            hako_3
```
**Deliverables**:
- [ ] `verify_self_compilation.sh` script
- [ ] CI workflow added
- [ ] Daily verification runs
- [ ] Bit-identical verification passes
- [ ] Tests: Hako₂ == Hako₃ confirmed
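
When Hako₂ and Hako₃ diverge, knowing where they first differ helps narrow down the cause. The helper below is a hypothetical Rust sketch that reports the first differing byte offset of two artifacts already read into memory.

```rust
/// Return the offset of the first byte at which `a` and `b` differ,
/// or `None` when the two artifacts are bit-identical. If one input is
/// a strict prefix of the other, the offset is the shorter length.
pub fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    a.iter()
        .zip(b.iter())
        .position(|(x, y)| x != y)
        .or_else(|| {
            if a.len() != b.len() {
                Some(a.len().min(b.len()))
            } else {
                None
            }
        })
}
```

In practice the two inputs could come from `std::fs::read("hako_2")` and `std::fs::read("hako_3")`, with the offset printed alongside the failure message.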
#### Task 6.2: Rust Layer Audit
Verify Rust layer ≤ 100 lines:
```bash
#!/bin/bash
# tools/audit_rust_layer.sh
echo "=== Rust Layer Audit ==="
# Count lines in HostBridge API
RUST_LINES=$(wc -l src/host_bridge.rs | awk '{print $1}')
echo "Rust layer: $RUST_LINES lines (target: ≤ 100)"
if [ "$RUST_LINES" -le 100 ]; then
    echo "✅ SUCCESS: Rust layer minimized (≤ 100 lines)"
else
    echo "❌ FAILURE: Rust layer too large (> 100 lines)"
    echo "   Please move more logic to Hakorune"
    exit 1
fi
# List remaining Rust files (should be minimal); this listing is
# informational and does not affect the exit status
echo ""
echo "Rust files:"
find src -name "*.rs" -not -path "src/host_bridge.rs" | while IFS= read -r file; do
    lines=$(wc -l "$file" | awk '{print $1}')
    echo "  $file: $lines lines (should be removed or moved to Hakorune)"
done
```
**Deliverables**:
- [ ] `audit_rust_layer.sh` script
- [ ] Rust layer ≤ 100 lines confirmed
- [ ] Non-HostBridge Rust files identified
- [ ] Migration plan for remaining Rust code
- [ ] Tests: Rust layer audit passes
#### Task 6.3: Final Documentation
Update documentation to reflect completion:
**Update CURRENT_TASK.md**:
```markdown
## ✅ Phase 20.8 Complete (2026-08-30)
- ✅ GC v0 implemented (Mark & Sweep)
- ✅ Hakorune-VM is default backend
- ✅ Rust-VM deprecated (--backend vm-rust)
- ✅ Bit-identical self-compilation verified
- ✅ Rust layer minimized (≤ 100 lines)
**Status**: True Self-Hosting Achieved
**Next**: Phase 15.82 (Advanced GC, Performance Optimization)
```
**Update README.md**:
```markdown
## Hakorune - True Self-Hosted Programming Language
Hakorune is a fully self-hosted language where the compiler, VM, and runtime
are all implemented in Hakorune itself.
**Architecture**:
- Rust layer: ~100 lines (HostBridge API for C-ABI boundary)
- Hakorune layer: Everything else (VM, parser, GC, stdlib)
**Self-Hosting Status**: ✅ Complete (2026-08-30)
```
**Deliverables**:
- [ ] CURRENT_TASK.md updated
- [ ] README.md updated
- [ ] Phase 20.8 completion report
- [ ] Phase 15.82 planning document
---
## Success Criteria
### Phase E (GC v0)
- [ ] GC v0 implemented and functional
- [ ] Mark & Sweep algorithms correct
- [ ] GC roots detected (stack, global, handles)
- [ ] Metrics collection working
- [ ] `HAKO_GC_TRACE=1` provides detailed logs
- [ ] Zero memory leaks in smoke tests
- [ ] Performance: GC overhead ≤ 10% of total runtime
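
The 10% overhead criterion can be checked from the pause times reported in the GC trace. A minimal Rust sketch, assuming overhead is defined as total pause time divided by total runtime (the plan does not fix the exact definition) and using hypothetical function names:

```rust
/// Fraction of the total runtime spent in GC pauses.
/// Returns `None` when the runtime is zero, since the ratio is undefined.
pub fn gc_overhead(pauses_ms: &[u64], total_runtime_ms: u64) -> Option<f64> {
    if total_runtime_ms == 0 {
        return None;
    }
    let gc_ms: u64 = pauses_ms.iter().sum();
    Some(gc_ms as f64 / total_runtime_ms as f64)
}

/// True when the measured overhead is at most 10% of the runtime.
pub fn within_budget(pauses_ms: &[u64], total_runtime_ms: u64) -> bool {
    matches!(gc_overhead(pauses_ms, total_runtime_ms), Some(ratio) if ratio <= 0.10)
}
```

For example, pauses of 3, 4, and 2 ms in a 100 ms run give an overhead of 9%, which is within budget.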
### Phase F (Rust VM Deprecation)
- [ ] Hakorune-VM is default backend (`--backend vm`)
- [ ] Rust-VM deprecated with clear warning
- [ ] Bit-identical self-compilation verified (Hako₂ == Hako₃)
- [ ] CI daily verification passes
- [ ] Rust layer ≤ 100 lines (HostBridge API only)
- [ ] All smoke tests pass with Hakorune-VM
- [ ] Documentation complete
### Overall
- [ ] **True Self-Hosting**: Hakorune IS Hakorune
- [ ] **Rust=floor, Hakorune=house**: Architecture realized
- [ ] **Production Ready**: All tests pass, no memory leaks
- [ ] **Performance**: ≥ 50% of Rust-VM speed
---
## Risk Mitigation
### Risk 1: GC Bugs (Memory Leaks/Corruption)
**Mitigation**:
- Implement comprehensive tests (golden tests, smoke tests)
- Use `HAKO_GC_TRACE=1` for debugging
- Start with simple Mark & Sweep (no generational/incremental)
- Valgrind integration for leak detection
### Risk 2: Self-Compilation Divergence
**Mitigation**:
- Daily CI verification (Hako₂ == Hako₃)
- Freeze Rust VM after Phase 20.5 (no new features)
- Golden tests ensure Rust-VM vs Hakorune-VM parity
- Bisect on divergence to identify root cause
### Risk 3: Performance Degradation
**Mitigation**:
- Accept slower performance initially (≥ 50% of Rust-VM)
- Profile hot paths and optimize incrementally
- GC tuning (collection frequency, root set optimization)
- Defer advanced optimizations to Phase 15.82
### Risk 4: Incomplete Rust Minimization
**Mitigation**:
- Strict audit (Rust layer ≤ 100 lines)
- Move all logic to Hakorune (VM, collections, GC)
- HostBridge API is stable (no new features)
- Clear boundary: Rust=C-ABI only
---
## Timeline Summary
```
Week 1-2: GC Mark Phase
  - GC roots detection
  - Mark algorithm
  - Basic tracing
Week 3-4: GC Sweep Phase & Metrics
  - Sweep algorithm
  - Metrics collection
  - HAKO_GC_TRACE=1
  - Integration & testing
Week 5: Backend Switching & Deprecation
  - Hakorune-VM default
  - Rust-VM deprecation warning
  - Golden tests verification
Week 6: Bit-Identical Verification & Audit
  - Self-compilation chain
  - CI integration
  - Rust layer audit (≤ 100 lines)
  - Final documentation
```
---
## Related Documents
- **Phase 20.7 (Collections)**: [../phase-20.7/README.md](../phase-20.7/README.md)
- **Phase 15.80 (VM Core)**: [../phase-15.80/README.md](../phase-15.80/README.md)
- **Pure Hakorune Roadmap**: [../phase-20.5/PURE_HAKORUNE_ROADMAP.md](../phase-20.5/PURE_HAKORUNE_ROADMAP.md)
- **HostBridge API**: [../phase-20.5/HOSTBRIDGE_API_DESIGN.md](../phase-20.5/HOSTBRIDGE_API_DESIGN.md)
---
**Status**: Not Started
**Start Date**: 2026-07-20
**Target Completion**: 2026-08-30
**Dependencies**: Phase 20.7 must be complete