
# 🏆 Nyash Golden Dump Testing System
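*A complete specification for MIR-compatibility testing and regression detection, as recommended by ChatGPT5*
## 🎯 Purpose
**An automated verification system that guarantees "same input → same output" across interp/vm/wasm/aot**
It detects MIR specification drift, backend discrepancies, and optimization bugs **immediately**, providing a technical guarantee for Portability Contract v0.
## 🔧 **The Golden Dump Approach**
### **Basic Principle**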
```bash
# 1. Generate the golden-standard MIR
nyash --dump-mir program.nyash > program.golden.mir
# 2. Dump the current MIR and diff it against the golden copy (regression detection)
nyash --dump-mir program.nyash > program.current.mir
diff program.golden.mir program.current.mir
# 3. Compare output across all backends (compatibility check)
nyash --target interp program.nyash > interp.out
nyash --target vm program.nyash > vm.out
nyash --target wasm program.nyash > wasm.out
diff interp.out vm.out && diff vm.out wasm.out
```
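`diff` in step 2 reports every difference. Inside a Rust test it is often enough to report the first line where the dumps diverge. A minimal sketch, assuming dumps are plain text compared line by line; `first_dump_mismatch` is a hypothetical helper, not part of Nyash:

```rust
/// Returns the 1-based line number and both lines at the first mismatch
/// between a golden dump and a freshly generated one, or `None` if they match.
/// Line endings (`\n` vs `\r\n`) and a trailing newline are ignored,
/// because `str::lines()` strips them.
fn first_dump_mismatch<'a>(golden: &'a str, current: &'a str) -> Option<(usize, &'a str, &'a str)> {
    let mut golden_lines = golden.lines();
    let mut current_lines = current.lines();
    let mut line_no = 0;
    loop {
        line_no += 1;
        match (golden_lines.next(), current_lines.next()) {
            // Both dumps ended together: identical.
            (None, None) => return None,
            (Some(g), Some(c)) if g == c => continue,
            // Lines differ, or one dump ended early (reported as "<EOF>").
            (g, c) => return Some((line_no, g.unwrap_or("<EOF>"), c.unwrap_or("<EOF>"))),
        }
    }
}
```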
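### **Layered Verification Strategy**
| Level | What is verified | Purpose | Frequency |
|--------|----------|------|------|
| **L1: MIR structure** | AST→MIR lowering | Regression detection | Every commit |
| **L2: Execution results** | stdout/stderr | Compatibility | Every PR |
| **L3: Optimization effects** | Performance, memory | Optimization regressions | Weekly |
| **L4: Error handling** | Exceptions, errors | Robustness | Every release |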
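The frequency column can be encoded as data so that a CI driver decides which levels to run for a given event. This is a hypothetical sketch: the type names are illustrative, and it assumes (the table does not say so) that a rarer trigger also runs every more frequent level, so a release runs L1 through L4.

```rust
/// Verification levels from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    L1MirStructure,
    L2Output,
    L3Optimization,
    L4ErrorHandling,
}

/// Trigger events, declared from most to least frequent so that the
/// derived `Ord` orders them by rarity.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Trigger {
    Commit,
    PullRequest,
    Weekly,
    Release,
}

impl Level {
    /// The "Frequency" column of the table.
    fn trigger(self) -> Trigger {
        match self {
            Level::L1MirStructure => Trigger::Commit,
            Level::L2Output => Trigger::PullRequest,
            Level::L3Optimization => Trigger::Weekly,
            Level::L4ErrorHandling => Trigger::Release,
        }
    }
}

/// Levels to run for `event`: every level whose own trigger is at least
/// as frequent as `event` (the cumulative assumption stated above).
fn levels_for(event: Trigger) -> Vec<Level> {
    [
        Level::L1MirStructure,
        Level::L2Output,
        Level::L3Optimization,
        Level::L4ErrorHandling,
    ]
    .iter()
    .copied()
    .filter(|level| level.trigger() <= event)
    .collect()
}
```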
## 🧪 **Verification Test Suites**
### **1⃣ MIR Structure Tests (L1)**
#### **Basic Structure Verification**
```rust
// tests/golden_dump/mir_structure_tests.rs
#[test]
fn test_basic_arithmetic_mir_stability() {
    let source = r#"
        static box Main {
            main() {
                local a, b, result
                a = 42
                b = 8
                result = a + b
                print(result)
                return result
            }
        }
    "#;
    let golden_mir = load_golden_mir("basic_arithmetic.mir");
    let current_mir = compile_to_mir(source);
    assert_eq!(golden_mir, current_mir, "MIR regression detected");
}

#[test]
fn test_box_operations_mir_stability() {
    let source = r#"
        box DataBox {
            init { value }
            pack(val) { me.value = val }
        }
        static box Main {
            main() {
                local obj = new DataBox(100)
                print(obj.value)
            }
        }
    "#;
    let golden_mir = load_golden_mir("box_operations.mir");
    let current_mir = compile_to_mir(source);
    // Compare modulo ValueId numbering; the function takes references
    assert_mir_equivalent(&golden_mir, &current_mir);
}

#[test]
fn test_weak_reference_mir_stability() {
    let source = r#"
        box Parent { init { child_weak } }
        box Child { init { data } }
        static box Main {
            main() {
                local parent = new Parent()
                local child = new Child(42)
                parent.child_weak = weak(child)
                if parent.child_weak.isAlive() {
                    print(parent.child_weak.get().data)
                }
            }
        }
    "#;
    verify_mir_golden("weak_reference", source);
}
```
#### **MIR Comparison Algorithm**
```rust
// src/testing/mir_comparison.rs
pub fn assert_mir_equivalent(golden: &MirModule, current: &MirModule) {
    // 1. Same number of functions, with matching names
    assert_eq!(golden.functions.len(), current.functions.len());
    for (name, golden_func) in &golden.functions {
        let current_func = current
            .functions
            .get(name)
            .unwrap_or_else(|| panic!("function {} not found", name));
        // 2. Same basic-block structure
        assert_eq!(golden_func.blocks.len(), current_func.blocks.len());
        // 3. Semantically equivalent instruction sequences (after ValueId normalization)
        let golden_normalized = normalize_value_ids(golden_func);
        let current_normalized = normalize_value_ids(current_func);
        assert_eq!(golden_normalized, current_normalized);
    }
}

fn normalize_value_ids(func: &MirFunction) -> MirFunction {
    // Renumber ValueIds sequentially (%0, %1, %2, ...) so that
    // semantically identical instruction sequences compare equal.
    todo!("renumber ValueIds in order of first appearance")
}
```
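`normalize_value_ids` is left as a stub above. The following self-contained sketch shows the renumbering idea on a toy instruction type. `Inst` and its variants are hypothetical stand-ins, not Nyash's real MIR, and a full implementation would also have to walk every basic block and renumber block IDs.

```rust
use std::collections::HashMap;

/// Toy stand-in for MIR instructions; ValueIds are plain `u32`s here.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Inst {
    Const { dst: u32, value: i64 },
    BinOp { dst: u32, op: char, lhs: u32, rhs: u32 },
    Print { src: u32 },
    Return { src: u32 },
}

/// Renumbers ValueIds to 0, 1, 2, ... in order of first appearance, so two
/// sequences that differ only in ID allocation compare equal.
fn normalize_value_ids(insts: &[Inst]) -> Vec<Inst> {
    let mut ids: HashMap<u32, u32> = HashMap::new();
    // Assigns the next sequential ID the first time `id` is seen.
    let mut renum = |id: u32| {
        let next = ids.len() as u32;
        *ids.entry(id).or_insert(next)
    };
    insts
        .iter()
        .map(|inst| match *inst {
            Inst::Const { dst, value } => Inst::Const { dst: renum(dst), value },
            Inst::BinOp { dst, op, lhs, rhs } => {
                // Operands first, then the destination (read-then-write order).
                let (lhs, rhs) = (renum(lhs), renum(rhs));
                Inst::BinOp { dst: renum(dst), op, lhs, rhs }
            }
            Inst::Print { src } => Inst::Print { src: renum(src) },
            Inst::Return { src } => Inst::Return { src: renum(src) },
        })
        .collect()
}
```

For example, `%7 = 42; %9 = 8; %12 = %7 + %9` and `%0 = 42; %1 = 8; %2 = %0 + %1` normalize to the same sequence, while changing a constant still makes them differ.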
### **2⃣ Cross-Backend Output Tests (L2)**
#### **Stdout Equality Verification**
```rust
// tests/golden_dump/output_compatibility_tests.rs
#[test]
fn test_cross_backend_arithmetic_output() {
    let program = "arithmetic_test.nyash";
    let interp_output = run_backend("interp", program);
    let vm_output = run_backend("vm", program);
    let wasm_output = run_backend("wasm", program);

    assert_eq!(interp_output.stdout, vm_output.stdout);
    assert_eq!(vm_output.stdout, wasm_output.stdout);
    assert_eq!(interp_output.exit_code, vm_output.exit_code);
    assert_eq!(vm_output.exit_code, wasm_output.exit_code);
}

#[test]
fn test_cross_backend_object_lifecycle() {
    let program = "object_lifecycle_test.nyash";
    let results = run_all_backends(program);
    // fini() order and timing must be identical on every backend
    let finalization_orders: Vec<_> = results.iter()
        .map(|r| &r.finalization_order)
        .collect();
    assert!(finalization_orders.windows(2).all(|w| w[0] == w[1]));
}

#[test]
fn test_cross_backend_weak_reference_behavior() {
    let program = "weak_reference_test.nyash";
    let results = run_all_backends(program);
    // Weak-reference liveness checks and nulling must happen at the same points on every backend
    let weak_behaviors: Vec<_> = results.iter()
        .map(|r| &r.weak_reference_timeline)
        .collect();
    assert_all_equivalent(weak_behaviors);
}
```
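The tests above call `assert_all_equivalent` without defining it. One plausible generic version, assuming it only needs `PartialEq` and `Debug` on the compared values, checks every backend's result against the first and names the first one that disagrees:

```rust
use std::fmt::Debug;

/// Panics unless every backend result equals the first one.
/// Takes a `Vec` to match the call sites above (e.g. `Vec<&Timeline>`);
/// an empty or single-element vector passes trivially.
fn assert_all_equivalent<T: PartialEq + Debug>(values: Vec<T>) {
    if let Some((first, rest)) = values.split_first() {
        for (offset, other) in rest.iter().enumerate() {
            assert_eq!(
                first, other,
                "backend #0 and backend #{} disagree",
                offset + 1
            );
        }
    }
}
```

Because it accepts any `Vec<T>`, the same helper would also fit `finalization_orders` in the lifecycle test, and its failure message points at the offending backend instead of just reporting `false`.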
#### **Error-Handling Equivalence Verification**
```rust
#[test]
fn test_cross_backend_error_handling() {
    let error_programs = [
        "null_dereference.nyash",
        "division_by_zero.nyash",
        "weak_reference_after_fini.nyash",
        "infinite_recursion.nyash",
    ];
    for program in &error_programs {
        let results = run_all_backends(program);
        // Error kind and message must be identical on every backend
        let error_types: Vec<_> = results.iter()
            .map(|r| &r.error_type)
            .collect();
        assert_all_equivalent(error_types);
        let error_messages: Vec<_> = results.iter()
            .map(|r| &r.error_message)
            .collect();
        assert_all_equivalent(error_messages);
    }
}
```
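The test above only checks that backends agree with one another, so every backend reporting the same wrong error would still pass. A hedged sketch of additionally pinning each program to an expected, backend-neutral error kind; the `ErrorKind` enum, its variants, and the mapping for `infinite_recursion` are assumptions, not Nyash definitions:

```rust
use std::path::Path;

/// Hypothetical backend-neutral error kinds; each backend would map its own
/// runtime errors onto these before results are compared.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    NullDereference,
    DivisionByZero,
    WeakReferenceAfterFini,
    StackOverflow,
}

/// Expected error kind for each error program, keyed by file stem.
fn expected_error(program: &str) -> Option<ErrorKind> {
    let stem = Path::new(program).file_stem()?.to_str()?;
    match stem {
        "null_dereference" => Some(ErrorKind::NullDereference),
        "division_by_zero" => Some(ErrorKind::DivisionByZero),
        "weak_reference_after_fini" => Some(ErrorKind::WeakReferenceAfterFini),
        // Assumption: unbounded recursion surfaces as a stack overflow.
        "infinite_recursion" => Some(ErrorKind::StackOverflow),
        _ => None,
    }
}

/// Checks correctness as well as agreement: every backend must report the expected kind.
fn check_error_results(program: &str, observed: &[ErrorKind]) -> Result<(), String> {
    let expected = expected_error(program)
        .ok_or_else(|| format!("no expectation for {program}"))?;
    match observed.iter().position(|kind| *kind != expected) {
        None => Ok(()),
        Some(i) => Err(format!(
            "backend #{i} reported {:?}, expected {:?}",
            observed[i], expected
        )),
    }
}
```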
### **3⃣ Optimization Effect Tests (L3)**
#### **Bus-Elision Verification**
```rust
// tests/golden_dump/optimization_tests.rs
#[test]
fn test_bus_elision_output_equivalence() {
    let program = "bus_communication_test.nyash";
    let elision_on = run_with_flag(program, "--elide-bus");
    let elision_off = run_with_flag(program, "--no-elide-bus");
    // Output must be identical; only performance is expected to differ
    assert_eq!(elision_on.stdout, elision_off.stdout);
    // Note: a single wall-clock comparison can be flaky on shared CI runners
    assert!(elision_on.execution_time < elision_off.execution_time);
}

#[test]
fn test_pure_function_optimization_equivalence() {
    let program = "pure_function_optimization.nyash";
    let optimized = run_with_flag(program, "--optimize");
    let reference = run_with_flag(program, "--no-optimize");
    // Results must be identical with optimization on and off
    assert_eq!(optimized.stdout, reference.stdout);
    // Optimization should reduce (and must never increase) PURE function calls
    assert!(optimized.pure_function_calls <= reference.pure_function_calls);
}

#[test]
fn test_memory_layout_compatibility() {
    let program = "memory_intensive_test.nyash";
    let results = run_all_backends(program);
    // Box layout and field accesses must produce identical results on every backend
    let memory_access_patterns: Vec<_> = results.iter()
        .map(|r| &r.memory_access_log)
        .collect();
    assert_memory_semantics_equivalent(memory_access_patterns);
}
```
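`test_bus_elision_output_equivalence` compares a single pair of timings, which is sensitive to noise. A common mitigation, sketched here as an assumption rather than the project's method, is to time several runs and compare medians, which a single slow outlier cannot swing:

```rust
use std::time::Duration;

/// Median of repeated timings; `None` for an empty sample.
fn median(mut samples: Vec<Duration>) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort();
    let mid = samples.len() / 2;
    Some(if samples.len() % 2 == 0 {
        // Even count: average the two middle samples.
        (samples[mid - 1] + samples[mid]) / 2
    } else {
        samples[mid]
    })
}

/// True when the median of `faster` is strictly below the median of `slower`.
/// Returns false if either sample is empty.
fn is_faster(faster: Vec<Duration>, slower: Vec<Duration>) -> bool {
    match (median(faster), median(slower)) {
        (Some(a), Some(b)) => a < b,
        _ => false,
    }
}
```

In the test, `elision_on` and `elision_off` would each be run a few times and their `execution_time` values collected into the two vectors.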
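#### **Performance Regression Verification**
```rust
#[test]
fn test_performance_regression() {
    let benchmarks = [
        "arithmetic_heavy.nyash",
        "object_creation_heavy.nyash",
        "weak_reference_heavy.nyash",
    ];
    // Allow up to 20% degradation relative to the golden baseline
    let regression_threshold = 1.2;
    for benchmark in &benchmarks {
        let golden_perf = load_golden_performance(benchmark);
        let current_perf = measure_current_performance(benchmark);
        // Confirm performance has not degraded significantly.
        // Assumes execution_time: Duration and memory_usage: u64; both are
        // converted to f64 because neither type supports `* f64` directly.
        assert!(current_perf.execution_time.as_secs_f64()
            <= golden_perf.execution_time.as_secs_f64() * regression_threshold);
        assert!((current_perf.memory_usage as f64)
            <= golden_perf.memory_usage as f64 * regression_threshold);
    }
}
```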
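The 20% rule can also be factored into a pure function that reports which metric regressed, which makes failures easier to read than a bare `assert!`. This sketch assumes time in seconds as `f64` and memory in bytes as `u64`; `Perf` and `regressions` are hypothetical names:

```rust
/// Hypothetical measurement record for one benchmark run.
#[derive(Debug, Clone, Copy)]
struct Perf {
    execution_secs: f64,
    memory_bytes: u64,
}

/// Allowed growth factor: 1.2 means up to 20% worse than the golden baseline.
const REGRESSION_THRESHOLD: f64 = 1.2;

/// Names of the metrics that exceed the threshold; an empty vector means pass.
fn regressions(golden: Perf, current: Perf) -> Vec<&'static str> {
    let mut failed = Vec::new();
    if current.execution_secs > golden.execution_secs * REGRESSION_THRESHOLD {
        failed.push("execution_time");
    }
    if current.memory_bytes as f64 > golden.memory_bytes as f64 * REGRESSION_THRESHOLD {
        failed.push("memory_usage");
    }
    failed
}
```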
## 🤖 **Automated CI/CD Integration**
### **GitHub Actions Configuration**
```yaml
# .github/workflows/golden_dump_testing.yml
name: Golden Dump Testing
on: [push, pull_request]

jobs:
  mir-stability:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Run MIR Structure Tests (L1)
        run: |
          cargo test --test mir_structure_tests
      - name: Verify MIR Golden Dumps
        run: |
          ./scripts/verify_mir_golden_dumps.sh

  cross-backend-compatibility:
    runs-on: ubuntu-latest
    needs: mir-stability
    steps:
      # Each job runs on a fresh runner, so it needs its own checkout and toolchain
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Run Cross-Backend Tests (L2)
        run: |
          cargo test --test output_compatibility_tests
      - name: Verify All Backend Output Equality
        run: |
          ./scripts/verify_backend_compatibility.sh

  optimization-regression:
    runs-on: ubuntu-latest
    needs: cross-backend-compatibility
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Run Optimization Tests (L3)
        run: |
          cargo test --test optimization_tests
      - name: Performance Regression Check
        run: |
          ./scripts/check_performance_regression.sh
```
### **Automated Golden Dump Updates**
```bash
#!/bin/bash
# scripts/update_golden_dumps.sh
echo "🏆 Updating golden dumps..."
mkdir -p tests/golden_dump/mir tests/golden_dump/output

# 1. Set the current MIR and output as the new golden standard
for test_file in tests/golden_dump/programs/*.nyash; do
    program_name=$(basename "$test_file" .nyash)
    echo "Updating: $program_name"
    # Update the MIR golden dump
    ./target/release/nyash --dump-mir "$test_file" > "tests/golden_dump/mir/${program_name}.golden.mir"
    # Update the output golden dump
    ./target/release/nyash --target interp "$test_file" > "tests/golden_dump/output/${program_name}.golden.out"
done
echo "✅ Golden dump update complete"

# 2. Run the tests to validate the update
if cargo test --test golden_dump_tests; then
    echo "🎉 Tests pass with the new golden dumps"
else
    echo "❌ Tests fail with the new golden dumps"
    exit 1
fi
```
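The script regenerates every golden file in one batch. A common alternative, sketched here with hypothetical names, lets each Rust test either check or rewrite its own golden file depending on an `update` flag, which a caller might derive from an environment variable such as `UPDATE_GOLDEN=1` (not an existing Nyash setting):

```rust
use std::fs;
use std::path::Path;

/// Compares `actual` with the golden file at `path`.
/// With `update == true`, (re)writes the golden file instead of comparing,
/// creating parent directories as needed.
fn check_golden(path: &Path, actual: &str, update: bool) -> Result<(), String> {
    if update {
        if let Some(dir) = path.parent() {
            fs::create_dir_all(dir).map_err(|e| e.to_string())?;
        }
        return fs::write(path, actual).map_err(|e| e.to_string());
    }
    let golden = fs::read_to_string(path).map_err(|e| format!("{}: {e}", path.display()))?;
    if golden == actual {
        Ok(())
    } else {
        Err(format!("golden mismatch: {}", path.display()))
    }
}
```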
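## 📊 **Implementation Priorities**
### **Phase 8.4 (urgent)**
- [ ] **L1 implementation**: MIR structure verification and basic golden dumps
- [ ] **Basic automation**: MIR regression detection in CI/CD
- [ ] **Bus instruction tests**: infrastructure for verifying elision ON/OFF
### **Phase 8.5 (short term)**
- [ ] **L2 implementation**: output-equality verification across all backends
- [ ] **Error handling**: verification of exception and error cases
- [ ] **Performance baselines**: benchmark regression detection
### **Phase 9+ (medium to long term)**
- [ ] **L3-L4 implementation**: optimization and robustness verification
- [ ] **Advanced automation**: automatic repair and performance trend analysis
- [ ] **Formal verification**: mathematical proofs of correctness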
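## 🎯 **Expected Benefits**
### **Quality Assurance**
- **Immediate regression detection**: bugs introduced by MIR specification changes are found right away
- **Backend reliability**: identical behavior guaranteed in every execution environment
- **Optimization safety**: speedups cannot silently change behavior
### **Development Efficiency**
- **Automated quality checks**: no manual testing needed; everything runs in CI/CD
- **Safe refactoring**: pinpoints the impact of large-scale changes
- **Reliable new features**: assurance that new features do not affect existing behavior
### **Value for the Nyash Language**
- **Enterprise-grade quality**: a rigorous quality-assurance process
- **Technical differentiation**: a concrete demonstration of "compatibility guaranteed across all backends"
- **Foundation for extensibility**: quality is maintained as new backends are added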
---
## 📚 **Related Documents**
- **MIR Reference**: [mir-reference.md](mir-reference.md)
- **Portability Contract**: [portability-contract.md](portability-contract.md)
- **Benchmark System**: [../../../benchmarks/README.md](../../../benchmarks/README.md)
- **CI/CD Configuration**: [../../../.github/workflows/](../../../.github/workflows/)
---
*Last updated: 2025-08-14 - completed the three-part set recommended by ChatGPT5*
*Golden Dump Testing = the technical foundation of Nyash quality assurance*