hakorune/docs/papers/archive/unified-lifecycle/evaluation-plan.md

# Evaluation Plan

## E1: Semantic Equivalence Verification

### Purpose
Ensure that every execution backend (Interpreter/VM/JIT/AOT/WASM) exhibits exactly the same behavior.

### Test Case

```nyash
// test_equivalence.nyash
box Counter @must_drop {
    init { value }

    increment() {
        me.value = me.value + 1
        print("Count: " + me.value)
    }
}

static box Main {
    main() {
        local c = new Counter(0)
        // The loop counter must be declared and advanced explicitly
        local i = 0
        loop(i < 3) {
            c.increment()
            i = i + 1
        }
        // fini is called automatically
    }
}
```

### Verification Items
- [ ] Output matches exactly
- [ ] fini invocation order matches
- [ ] Error handling matches
- [ ] Memory usage patterns are equivalent

### Commands
```bash
# Run on each backend
./nyash --backend interpreter test_equivalence.nyash > interp.log
./nyash --backend vm test_equivalence.nyash > vm.log
./nyash --backend vm --jit-threshold 1 test_equivalence.nyash > jit.log
./nyashc --aot test_equivalence.nyash -o test && ./test > aot.log
./nyashc --wasm test_equivalence.nyash -o test.wasm && wasmtime test.wasm > wasm.log

# Compare
diff interp.log vm.log
diff vm.log jit.log
diff jit.log aot.log
diff aot.log wasm.log
```
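
The pairwise `diff` chain above reports only adjacent mismatches, so it can be convenient to compare every log against a single baseline and count the failures. The following is a minimal sketch of that idea; the function name `compare_logs` is hypothetical and not part of the Nyash tooling.

```bash
#!/bin/bash
# compare_logs BASELINE LOG...   (hypothetical helper, not a Nyash tool)
# Diffs every LOG against BASELINE, prints one status line per log, and
# returns the number of logs that differ (0 means all backends agree).
compare_logs() {
    local baseline="$1"
    shift
    local failures=0
    local log
    for log in "$@"; do
        # diff exits non-zero when the files differ or a file is missing
        if diff "$baseline" "$log" > /dev/null 2>&1; then
            echo "OK   $log"
        else
            echo "DIFF $log"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}
```

Assuming the log names used in the commands above, a run could look like `compare_logs interp.log vm.log jit.log aot.log wasm.log`, with a non-zero exit status indicating that at least one backend diverged from the interpreter.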

## E2: GC On/Off Equivalence

### Purpose
Show that enabling or disabling GC does not change program semantics.

### Test Case

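```nyash
box DataHolder @gcable {
    init { data }

    process() {
        // Allocate a large amount of memory
        local temp = new ArrayBox()
        // The loop counter must be declared and advanced explicitly
        local i = 0
        loop(i < 1000000) {
            temp.push(i)
            i = i + 1
        }
        return temp.length()
    }
}
```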

### Measurements
- I/O trace diff: zero differences between the GC-on and GC-off runs
- Final result: identical
- Latency distribution: compared at p95/p99
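
To make the p95/p99 comparison concrete, the sketch below computes a nearest-rank percentile from a file of latency samples, one number per line. The helper name `percentile` and the sample file names in the usage note are assumptions for illustration. The plan does not specify how the latencies are collected.

```bash
#!/bin/bash
# percentile P   (hypothetical helper)
# Reads one numeric sample per line on stdin and prints the P-th percentile
# using the nearest-rank method: rank = ceil(P * N / 100).
# P is expected to be an integer between 1 and 100.
percentile() {
    sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            rank = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
            if (rank < 1) rank = 1
            print v[rank]
        }'
}
```

With samples stored in hypothetical files such as `gc_on_latency.txt` and `gc_off_latency.txt`, the two distributions could be compared by running `percentile 95 < gc_on_latency.txt` and `percentile 95 < gc_off_latency.txt`, then repeating with `99`.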

## E3: Plugin Overhead Measurement

### Purpose
Quantify the overhead of the plugin system.

### Benchmark

```nyash
// bench_plugin_overhead.nyash
static box Benchmark {
    measure_array_access() {
        local arr = new ArrayBox()
        local sum = 0

        // Initialization: fill the array with 1,000,000 elements
        local i = 0
        loop(i < 1000000) {
            arr.push(i)
            i = i + 1
        }

        // Measure access performance
        local start = new TimeBox().now()
        i = 0
        loop(i < 1000000) {
            sum = sum + arr.get(i)
            i = i + 1
        }
        local end = new TimeBox().now()

        return end - start
    }
}
```

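### Configurations Compared
- Built-in implementation (current)
- Plugin implementation (dynamically linked)
- Plugin implementation (statically linked)
- After inlining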
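
One way to report the result for each configuration is as a percentage overhead relative to the built-in implementation: `(measured - baseline) / baseline * 100`. The sketch below assumes both timings are in the same unit, whatever `TimeBox().now()` returns. The helper name `overhead_pct` is hypothetical.

```bash
#!/bin/bash
# overhead_pct BASELINE MEASURED   (hypothetical helper)
# Prints the overhead of MEASURED relative to BASELINE as a percentage
# with one decimal place. Negative values mean MEASURED was faster.
overhead_pct() {
    awk -v base="$1" -v measured="$2" 'BEGIN {
        if (base <= 0) exit 1          # a baseline of zero has no defined overhead
        printf "%.1f\n", (measured - base) / base * 100
    }'
}
```

For example, if the built-in run took 100 time units and the dynamically linked plugin took 125, `overhead_pct 100 125` prints `25.0`.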

## E4: Scalability Evaluation

### Performance on Large Programs

| Benchmark | Lines of code | Interp | VM | JIT | AOT |
|------------|------|--------|-----|-----|-----|
| json_parser | 500 | 1.0x | ? | ? | ? |
| http_server | 1000 | 1.0x | ? | ? | ? |
| game_engine | 5000 | 1.0x | ? | ? | ? |

### Memory Usage

```bash
# Memory profiling
valgrind --tool=massif ./nyash --backend vm large_app.nyash
ms_print massif.out.*
```
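
For comparing backends numerically rather than reading the `ms_print` graph, the peak heap size can be pulled straight from the raw Massif output, where each snapshot records its heap size on a `mem_heap_B=<bytes>` line. The helper below is a minimal sketch, and its name `massif_peak_heap` is hypothetical.

```bash
#!/bin/bash
# massif_peak_heap FILE   (hypothetical helper)
# Prints the largest mem_heap_B value, in bytes, found across all
# snapshots in a massif.out.<pid> file. Prints 0 if none are found.
massif_peak_heap() {
    awk -F= '
        $1 == "mem_heap_B" && $2 + 0 > peak { peak = $2 + 0 }
        END { print peak + 0 }
    ' "$1"
}
```

It takes a single file, so with several `massif.out.*` files present it would be called once per file, for example inside a `for f in massif.out.*` loop.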

## E5: Platform Portability

### Test Environments
- Linux (x86_64, aarch64)
- macOS (x86_64, M1)
- Windows (x86_64)
- WebAssembly (browser, Wasmtime)

### Build Script

```bash
#!/bin/bash
# cross_platform_test.sh

platforms=(
    "x86_64-unknown-linux-gnu"
    "aarch64-unknown-linux-gnu"
    "x86_64-apple-darwin"
    "aarch64-apple-darwin"
    "x86_64-pc-windows-msvc"
    # Note: newer Rust toolchains (1.78+) name this target "wasm32-wasip1"
    "wasm32-wasi"
)

for platform in "${platforms[@]}"; do
    echo "Building for $platform..."
    cargo build --target "$platform" --release

    # Build the plugin as well
    (cd plugins/nyash-array-plugin && cargo build --target "$platform" --release)
done
```
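
The loop above keeps going when a target fails to build, which makes failures easy to miss in a long log. A possible variant, sketched below, collects the failing targets and reports them at the end. The names `build_all` and `cargo_release_build` are hypothetical, and the build command is passed in as a parameter so the loop can be exercised without a Rust toolchain.

```bash
#!/bin/bash
# build_all BUILD_CMD TARGET...   (hypothetical helper)
# Runs "BUILD_CMD TARGET" for every target, continues past failures,
# and returns 1 with a summary line if any target failed.
build_all() {
    local build_cmd="$1"
    shift
    local failed=""
    local target
    for target in "$@"; do
        echo "Building for $target..."
        if ! "$build_cmd" "$target"; then
            failed="$failed $target"
        fi
    done
    if [ -n "$failed" ]; then
        echo "Failed targets:$failed"
        return 1
    fi
    echo "All targets built"
}

# Hypothetical wrapper around the cargo invocation used in the build script.
cargo_release_build() {
    cargo build --target "$1" --release
}
```

Assuming the `platforms` array from the build script, it could be invoked as `build_all cargo_release_build "${platforms[@]}"`.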

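## Experiment Schedule

### Phase 1: Baseline Evaluation (1 week)
- E1: Semantic equivalence
- E2: GC on/off equivalence

### Phase 2: Performance Evaluation (1 week)
- E3: Plugin overhead
- E4: Scalability

### Phase 3: Portability Evaluation (3 days)
- E5: Cross-platform

### Phase 4: Paper Writing (1 week)
- Result analysis
- Figure preparation
- Writing the discussion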