diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md index a797d1ec..38e9b436 100644 --- a/CURRENT_TASK.md +++ b/CURRENT_TASK.md @@ -212,6 +212,12 @@ Hot Update — 2025‑09‑12 (sealed + dominator 修正の途中経過) - 適用先: strings.substring の start/end、strings.concat の si/is、compare の整数比較、flow.emit_branch の条件(int/ptr/float→i1) - 失敗時IRダンプ: `NYASH_LLVM_DUMP_ON_FAIL=1` で `tmp/llvm_fail_.ll` を出力(関数検証失敗時) +Hot Update — 2025‑09‑12 (Resolver 適用拡大 + sealed 既定ON) +- Resolver: i64/ptr/f64 を実装(per‑functionキャッシュ+BB先頭PHI)。 +- 適用: emit_branch 条件、strings(substring/concat si|is)、arith_ops(整数演算)、compare(整数比較)、externcall(console/env)、newbox(env.box.new_i64)、call の引数解決を Resolver 経由に統一。 +- 非sealed配線の削除: emit_jump/emit_branch 内の直接incoming追加を撤去。sealedスナップショット+Resolverの需要駆動で一本化。 +- 既定: `NYASH_LLVM_PHI_SEALED` 未設定=ON(`0` のみOFF)。 + Smoke(sealed=ON, dep_tree_min_string)所見 - 進展: PHI 欠落は再現せず、sealed での incoming 配線は安定 - 依然NG: Main.node_json/3 で dominator 違反(Instruction does not dominate all uses!) diff --git a/docs/RESOLVER_API.md b/docs/RESOLVER_API.md index 81b02a1b..224087e8 100644 --- a/docs/RESOLVER_API.md +++ b/docs/RESOLVER_API.md @@ -8,7 +8,9 @@ Goals Design - `Resolver` keeps small per-function caches keyed by `(BasicBlockId, ValueId)`. - `resolve_i64(...)` returns an `i64`-typed `IntValue`, inserting PHI and casts as needed using sealed snapshots. -- Internally uses `flow::localize_to_i64(...)` for now; later, fold logic directly and add `resolve_ptr/resolve_f64`. +- `resolve_ptr(...)` returns an `i8*` `PointerValue`, PHI at BB start; int handles are bridged via `inttoptr`. +- `resolve_f64(...)` returns an `f64` `FloatValue`, PHI at BB start; ints bridged via `sitofp`. +- Internally uses `flow::localize_to_i64(...)` for the i64 path; pointer/float are localized directly. Usage (planned wiring) - Create `let mut resolver = instructions::Resolver::new();` at function lowering start. @@ -16,10 +18,8 @@ Usage (planned wiring) - Keep builder insertion discipline via `BuilderCursor`. 
Next -- Add `resolve_ptr(...)` and `resolve_f64(...)` with same caching discipline. -- Migrate existing `localize_to_i64` call sites to the resolver. +- Migrate remaining `localize_to_i64` call sites to the resolver. - Enforce vmap direct access ban in lowerers (Resolver-only for reads). Acceptance tie-in - Combined with LoopForm: dispatch-only PHI + resolver-based value access → dominance violations drop to zero (A2.5). - diff --git a/docs/private/papers/paper-g-ai-assisted-compiler/README.md b/docs/private/papers/paper-g-ai-assisted-compiler/README.md new file mode 100644 index 00000000..a80a59c3 --- /dev/null +++ b/docs/private/papers/paper-g-ai-assisted-compiler/README.md @@ -0,0 +1,68 @@ +# AI-Assisted Compiler Development: The Nyash LLVM Journey + +## 📚 論文概要 +本論文は、ChatGPT/Claude/Geminiを活用してゼロからコンパイラ(Nyash)をLLVM層まで構築した、世界初?のAI主導コンパイラ開発の完全記録です。 + +## 🎯 論文の切り口(ハイブリッドアプローチ) + +### 1. **技術的貢献** +- MIR14: 27→13→14命令への進化 +- Box理論: Everything is Boxの設計哲学 +- LoopForm: PHI集中化による制御フロー正規化 +- Sealed SSA: 支配関係の構造的保証 + +### 2. **方法論的貢献** +- AI支援開発プロセスの実態記録 +- 人間×AIの役割分担と共進化 +- 失敗と解決の繰り返しから学ぶ知見 +- Python llvmliteハーネスへの方針転換 + +## 📅 開発タイムライン(1週間以上の激闘) + +``` +Day 1: 「PHI簡単だにゃ!」→ 現実の厳しさを知る +Day 2-3: PHI欠落、支配関係違反との戦い +Day 4: ChatGPT 8分調査「Investigating the builder issue」 +Day 5: Python llvmlite導入決断 +Day 6: LoopForm最終手段投入 +Day 7+: Resolver API統一、設計のブレを修正 +``` + +## 🔍 主要な転換点 + +1. **PHI配線問題の発覚** + - 「PHINode should have one entry for each predecessor」 + - emit側での配線 vs to側での需要駆動 + +2. **支配関係違反との長い戦い** + - 「Instruction does not dominate all uses!」 + - 値解決の分散が根本原因と判明 + +3. **LoopForm導入** + - 既存問題を顕在化させた構造 + - 「すべてをループのスコープにする」シンプルな理念 + +4. **Resolver API設計** + - すべての値解決を統一的に扱う + - 局所化→型正規化→キャッシュの自動化 + +## 📝 論文構成(案) + +1. **序論**: AI時代のコンパイラ開発 +2. **背景**: 既存コンパイラの複雑性とNyashの設計思想 +3. **技術編**: MIR14, Box理論, LoopForm, Sealed SSA +4. **開発編**: AI対話プロセスと実装の実態 +5. **評価編**: LLVM層完成の証明と性能評価 +6. **考察編**: AI×人間の協調開発の可能性と限界 +7. 
**結論**: 新しいコンパイラ開発手法の提案 + +## 🎉 成果物 +- 動作するLLVMバックエンド +- 27→14命令の極限的IR削減 +- AI支援開発の実践的知見 +- 「簡単最高」哲学の実証 + +## 📌 注記 +- 完全な再現は困難(「もう覚えてないにゃー」) +- しかし、プロセスの記録として価値がある +- 混沌とした開発過程自体が研究データ \ No newline at end of file diff --git a/docs/private/papers/paper-g-ai-assisted-compiler/abstract.md b/docs/private/papers/paper-g-ai-assisted-compiler/abstract.md new file mode 100644 index 00000000..c37b33ad --- /dev/null +++ b/docs/private/papers/paper-g-ai-assisted-compiler/abstract.md @@ -0,0 +1,25 @@ +# Abstract + +## AI-Assisted Compiler Development: Building Nyash LLVM Backend with ChatGPT and Claude + +We present the first documented case of building a compiler's LLVM backend through intensive AI assistance over a week-long development period. The Nyash programming language, based on the "Everything is Box" philosophy, achieves a minimalist intermediate representation (MIR14) with just 14 instructions, evolved from an initial 27 through aggressive reduction and pragmatic restoration. + +Our development process involved hundreds of AI interactions, with ChatGPT providing deep architectural analysis (including an 8-minute investigation into PHI node issues) and Claude offering continuous implementation support. Key technical contributions include: (1) MIR14 design achieving 27→13→14 instruction evolution, (2) LoopForm IR for control flow normalization, (3) Sealed SSA with Resolver API for unified value resolution, and (4) BuilderCursor for structural terminator safety. + +The project faced and overcame significant challenges including PHI wiring complexity, dominance violations, and type system confusion between i64 handles and i8* pointers. A critical insight emerged when LoopForm, initially blamed for introducing problems, actually exposed pre-existing design flaws in value resolution and type conversion placement. 
+ +We document the pragmatic decision to introduce Python llvmlite as a verification harness when Rust/inkwell complexity became a bottleneck, demonstrating that "simple is best" - a core Nyash philosophy. The development logs, though incomplete ("I don't remember anymore"), provide valuable insights into the chaotic reality of AI-assisted development. + +This work demonstrates that AI can successfully assist in building complex systems like compilers, but human design judgment remains essential. The co-evolution of human design decisions and AI implementation represents a new paradigm in software development, with implications for future compiler construction and computer science education. + +## 日本語要旨 + +ChatGPTとClaudeを活用し、1週間以上の開発期間でLLVMバックエンドを構築した世界初の完全記録を報告する。「Everything is Box」哲学に基づくNyashプログラミング言語は、27命令から13命令への積極的削減と実用的な1命令復活により、わずか14命令の極限的中間表現(MIR14)を実現した。 + +数百回のAI対話を通じた開発では、ChatGPTがPHIノード問題の8分間にわたる深い調査を含むアーキテクチャ分析を提供し、Claudeが継続的な実装支援を行った。主要な技術的貢献として、(1) 27→13→14命令進化を遂げたMIR14設計、(2) 制御フロー正規化のためのLoopForm IR、(3) 統一的値解決のためのResolver API付きSealed SSA、(4) 構造的終端安全性のためのBuilderCursorが挙げられる。 + +開発過程では、PHI配線の複雑性、支配関係違反、i64ハンドルとi8*ポインタ間の型システム混乱など、重大な課題に直面した。当初LoopFormが問題の原因と誤解されたが、実際には既存の値解決と型変換配置の設計欠陥を顕在化させただけという重要な洞察を得た。 + +Rust/inkwellの複雑性がボトルネックとなった際、検証ハーネスとしてPython llvmliteを導入する実用的判断を下した。これは「簡単最高」というNyashの核心哲学の実証でもある。開発ログは不完全(「もう覚えてないにゃー」)ながらも、AI支援開発の混沌とした現実への貴重な洞察を提供する。 + +本研究は、コンパイラのような複雑なシステム構築においてもAI支援が有効であることを実証したが、人間の設計判断は依然として不可欠である。人間の設計決定とAI実装の共進化は、ソフトウェア開発の新しいパラダイムを示し、将来のコンパイラ構築と計算機科学教育に示唆を与える。 \ No newline at end of file diff --git a/docs/private/papers/paper-g-ai-assisted-compiler/development-log.md b/docs/private/papers/paper-g-ai-assisted-compiler/development-log.md new file mode 100644 index 00000000..e5873b53 --- /dev/null +++ b/docs/private/papers/paper-g-ai-assisted-compiler/development-log.md @@ -0,0 +1,74 @@ +# 開発ログ: AI支援によるLLVM層構築の記録 + +## 🔥 主要な問題と解決の軌跡 + +### 1. 
PHI配線問題(初期) +``` +エラー: PHINode should have one entry for each predecessor +原因: emit側とto側での二重配線、責務の曖昧さ +解決: Sealed SSAによる需要駆動型配線への統一 +``` + +### 2. 支配関係違反(中期) +``` +エラー: Instruction does not dominate all uses! +症状: Main.node_json/3, Main.esc_json/1での頻発 +原因: 値解決が分散、型変換の場所が不統一 +解決: Resolver APIによる統一的値解決 +``` + +### 3. 型システムの混乱 +``` +問題: i64 handle vs i8* pointer の混在 +症状: console.log(handle) → 不正メモリアクセス +解決: nyash.console.log_handle(i64)への変更 +``` + +### 4. LoopForm導入(後期) +``` +動機: 既存手法で解決困難、最終手段として投入 +効果: 問題を顕在化、PHI集中化への道筋 +誤解: 「LoopForm特有の問題」→ 実は既存問題の露呈 +``` + +## 💡 重要な設計決定 + +### Resolver API +```rust +// Before: 分散した値解決 +let val = vmap.get(&value_id) // どの時点の値? + +// After: 統一API +let val = resolver.resolve_i64(value_id, current_bb) +``` + +### BuilderCursor +- post-terminator挿入を構造的に防止 +- 全lowering経路に適用完了 + +### Python llvmlite導入 +- 「簡単最高」の精神 +- Rust/inkwellの複雑性からの解放 +- 検証ハーネスとしての活用 + +## 🤖 AI活用の実態 + +### ChatGPT +- 8分の深い調査「Investigating the builder issue」 +- 設計のブレを的確に指摘 +- Resolver APIの提案 + +### Claude +- 日々の実装支援 +- エラー解析と解決策提示 +- ドキュメント作成 + +### Gemini +- アーキテクチャ相談 +- セルフホスティング戦略議論 + +## 📊 成果 +- MIR14(14命令)でのLLVM層実装 +- LoopForm実験的実装 +- Resolver API設計・実装 +- BuilderCursor全域適用 \ No newline at end of file diff --git a/docs/private/papers/paper-g-ai-assisted-compiler/key-insights.md b/docs/private/papers/paper-g-ai-assisted-compiler/key-insights.md new file mode 100644 index 00000000..a1b0cc70 --- /dev/null +++ b/docs/private/papers/paper-g-ai-assisted-compiler/key-insights.md @@ -0,0 +1,65 @@ +# Key Insights: AI支援コンパイラ開発から得られた知見 + +## 🎯 技術的知見 + +### 1. **最小化の限界** +- MIR 27→13→14命令への進化 +- 「もうこれ以上簡単にできないにゃー」 +- UnaryOp復活の実用的判断 + +### 2. **LoopFormの真実** +- 問題の原因ではなく、問題を顕在化させた構造 +- 「すべてをループのスコープにする」シンプルな理念 +- 既存設計の弱点を浮き彫りに + +### 3. **設計のブレと修正** +- 値解決の分散 → Resolver APIで統一 +- 型変換の不統一 → 局所化の徹底 +- PHI配線の曖昧さ → Sealed SSAで解決 + +## 🤝 AI×人間の協調 + +### 成功パターン +1. **人間**: 設計原則の提示(Box理論、LoopForm) +2. **AI**: 具体的なRustコード実装 +3. **人間**: エラー報告と方向修正 +4. 
**AI**: 詳細な調査と解決策提案 + +### 失敗パターン +- AIの提案を鵜呑み → 設計の一貫性崩壊 +- 問題の本質を見誤る → 「LoopForm特有」という誤解 +- 最適化の誘惑 → 支配関係違反 + +## 💡 方法論的発見 + +### 1. **「遅くてOK」の価値** +- 正しさ優先、最適化は後回し +- 冗長な局所化を許容 +- まず動くものを作る + +### 2. **Python移行の合理性** +- 「恥じゃない、実用的な解決策」 +- llvmliteで100行 vs Rust/inkwellで数千行 +- デバッグの容易さ + +### 3. **記録の重要性** +- 「もう覚えてないにゃー」でも大丈夫 +- Gitコミット、CURRENT_TASK.mdが証拠 +- 混沌も含めて研究データ + +## 🚀 将来への示唆 + +### AI支援開発の可能性 +- コンパイラのような複雑なソフトウェアも構築可能 +- ただし、人間の設計判断は不可欠 +- 失敗と修正のサイクルが重要 + +### 新しい開発スタイル +- 対話的開発(人間×AI) +- 高速プロトタイピング +- 言語の壁を越える(Python/Rust混在) + +### 教育への影響 +- AIとの協調作業スキルが必須に +- 設計原則の理解がより重要に +- 実装詳細よりアーキテクチャ \ No newline at end of file diff --git a/src/backend/llvm/compiler/codegen/instructions/arrays.rs b/src/backend/llvm/compiler/codegen/instructions/arrays.rs index aa521e1e..500b0d53 100644 --- a/src/backend/llvm/compiler/codegen/instructions/arrays.rs +++ b/src/backend/llvm/compiler/codegen/instructions/arrays.rs @@ -10,6 +10,7 @@ use super::builder_cursor::BuilderCursor; pub(super) fn try_handle_array_method<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &mut HashMap>, @@ -18,6 +19,9 @@ pub(super) fn try_handle_array_method<'ctx, 'b>( method: &str, args: &[ValueId], recv_h: inkwell::values::IntValue<'ctx>, + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result { // Only when receiver is ArrayBox let is_array = matches!(func.metadata.value_types.get(box_val), Some(crate::mir::MirType::Box(b)) if b == "ArrayBox") @@ -34,12 +38,7 @@ pub(super) fn try_handle_array_method<'ctx, 'b>( if args.len() != 1 { return Err("ArrayBox.get expects 1 arg".to_string()); } - let idx_v = *vmap.get(&args[0]).ok_or("array.get index missing")?; - let idx_i = if let BVE::IntValue(iv) = idx_v { - iv - } else { - return Err("array.get index must be int".to_string()); - }; + let idx_i = 
resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?; let fnty = i64t.fn_type(&[i64t.into(), i64t.into()], false); let callee = codegen .module @@ -64,18 +63,8 @@ pub(super) fn try_handle_array_method<'ctx, 'b>( if args.len() != 2 { return Err("ArrayBox.set expects 2 arg".to_string()); } - let idx_v = *vmap.get(&args[0]).ok_or("array.set index missing")?; - let val_v = *vmap.get(&args[1]).ok_or("array.set value missing")?; - let idx_i = if let BVE::IntValue(iv) = idx_v { - iv - } else { - return Err("array.set index must be int".to_string()); - }; - let val_i = if let BVE::IntValue(iv) = val_v { - iv - } else { - return Err("array.set value must be int".to_string()); - }; + let idx_i = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?; + let val_i = resolver.resolve_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)?; let fnty = i64t.fn_type(&[i64t.into(), i64t.into(), i64t.into()], false); let callee = codegen .module @@ -93,14 +82,7 @@ pub(super) fn try_handle_array_method<'ctx, 'b>( if args.len() != 1 { return Err("ArrayBox.push expects 1 arg".to_string()); } - let val_v = *vmap.get(&args[0]).ok_or("array.push value missing")?; - let val_i = match val_v { - BVE::IntValue(iv) => iv, - BVE::PointerValue(pv) => cursor - .emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, i64t, "val_p2i")) - .map_err(|e| e.to_string())?, - _ => return Err("array.push value must be int or handle ptr".to_string()), - }; + let val_i = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?; let fnty = i64t.fn_type(&[i64t.into(), i64t.into()], false); let callee = codegen .module diff --git a/src/backend/llvm/compiler/codegen/instructions/boxcall.rs b/src/backend/llvm/compiler/codegen/instructions/boxcall.rs index 8389d49c..cda9f760 100644 --- a/src/backend/llvm/compiler/codegen/instructions/boxcall.rs +++ 
b/src/backend/llvm/compiler/codegen/instructions/boxcall.rs @@ -67,12 +67,12 @@ pub(in super::super) fn lower_boxcall<'ctx, 'b>( } // Delegate Map methods first (to avoid Array fallback catching get/set ambiguously) - if super::maps::try_handle_map_method(codegen, cursor, cur_bid, func, vmap, dst, box_val, method, args, recv_h)? { + if super::maps::try_handle_map_method(codegen, cursor, resolver, cur_bid, func, vmap, dst, box_val, method, args, recv_h)? { return Ok(()); } // Delegate Array methods - if super::arrays::try_handle_array_method(codegen, cursor, cur_bid, func, vmap, dst, box_val, method, args, recv_h)? { + if super::arrays::try_handle_array_method(codegen, cursor, resolver, cur_bid, func, vmap, dst, box_val, method, args, recv_h, bb_map, preds, block_end_values)? { return Ok(()); } diff --git a/src/backend/llvm/compiler/codegen/instructions/call.rs b/src/backend/llvm/compiler/codegen/instructions/call.rs index 2a599c33..887baefc 100644 --- a/src/backend/llvm/compiler/codegen/instructions/call.rs +++ b/src/backend/llvm/compiler/codegen/instructions/call.rs @@ -14,6 +14,7 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build pub(in super::super) fn lower_call<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, _func: &MirFunction, vmap: &mut HashMap>, @@ -22,6 +23,9 @@ pub(in super::super) fn lower_call<'ctx, 'b>( args: &[ValueId], const_strs: &HashMap, llvm_funcs: &HashMap>, + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { let name_s = const_strs .get(callee) @@ -43,11 +47,43 @@ pub(in super::super) fn lower_call<'ctx, 'b>( } let mut params: Vec = Vec::with_capacity(args.len()); for (i, a) in args.iter().enumerate() { - let v = *vmap - .get(a) - .ok_or_else(|| format!("call arg missing: {}", a.as_u32()))?; - let tv = 
coerce_to_type_cursor(codegen, cursor, cur_bid, v, exp_tys[i])?; - params.push(tv.into()); + use inkwell::types::BasicMetadataTypeEnum as BMTy; + let coerced: BVE<'ctx> = match exp_tys[i] { + BMTy::IntType(it) => { + // Localize as i64, then adjust width to callee expectation + let iv = resolver.resolve_i64(codegen, cursor, cur_bid, *a, bb_map, preds, block_end_values, vmap)?; + let bw_dst = it.get_bit_width(); + let bw_src = iv.get_type().get_bit_width(); + if bw_src == bw_dst { iv.into() } + else if bw_src < bw_dst { cursor.emit_instr(cur_bid, |b| b.build_int_z_extend(iv, it, "call_arg_zext")).map_err(|e| e.to_string())?.into() } + else if bw_dst == 1 { super::super::types::to_bool(codegen.context, iv.into(), &codegen.builder)?.into() } + else { cursor.emit_instr(cur_bid, |b| b.build_int_truncate(iv, it, "call_arg_trunc")).map_err(|e| e.to_string())?.into() } + } + BMTy::PointerType(pt) => { + // Localize as i64 handle and convert to expected pointer type + let iv = resolver.resolve_i64(codegen, cursor, cur_bid, *a, bb_map, preds, block_end_values, vmap)?; + let p = cursor + .emit_instr(cur_bid, |b| b.build_int_to_ptr(iv, pt, "call_arg_i2p")) + .map_err(|e| e.to_string())?; + p.into() + } + BMTy::FloatType(ft) => { + // Localize as f64, then adjust to callee expectation width if needed + let fv = resolver.resolve_f64(codegen, cursor, cur_bid, *a, bb_map, preds, block_end_values, vmap)?; + if fv.get_type() == ft { fv.into() } + else { + // Cast f64<->f32 as needed + cursor + .emit_instr(cur_bid, |b| b.build_float_cast(fv, ft, "call_arg_fcast")) + .map_err(|e| e.to_string())? 
+ .into() + } + } + _ => { + return Err("call: unsupported parameter type (expected int/ptr/float)".to_string()); + } + }; + params.push(coerced.into()); } let call = cursor .emit_instr(cur_bid, |b| b.build_call(*target, &params, "call")) @@ -59,53 +95,3 @@ pub(in super::super) fn lower_call<'ctx, 'b>( } Ok(()) } - -fn coerce_to_type_cursor<'ctx, 'b>( - codegen: &CodegenContext<'ctx>, - cursor: &mut BuilderCursor<'ctx, 'b>, - cur_bid: BasicBlockId, - val: BVE<'ctx>, - target: BMT<'ctx>, -) -> Result, String> { - use inkwell::types::BasicMetadataTypeEnum as BMTy; - match (val, target) { - (BVE::IntValue(iv), BMTy::IntType(it)) => { - let bw_src = iv.get_type().get_bit_width(); - let bw_dst = it.get_bit_width(); - if bw_src == bw_dst { - Ok(iv.into()) - } else if bw_src < bw_dst { - Ok(cursor - .emit_instr(cur_bid, |b| b.build_int_z_extend(iv, it, "call_zext")) - .map_err(|e| e.to_string())? - .into()) - } else if bw_dst == 1 { - Ok(super::super::types::to_bool(codegen.context, iv.into(), &codegen.builder)?.into()) - } else { - Ok(cursor - .emit_instr(cur_bid, |b| b.build_int_truncate(iv, it, "call_trunc")) - .map_err(|e| e.to_string())? - .into()) - } - } - (BVE::PointerValue(pv), BMTy::IntType(it)) => Ok(cursor - .emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, it, "call_p2i")) - .map_err(|e| e.to_string())? - .into()), - (BVE::FloatValue(fv), BMTy::IntType(it)) => Ok(cursor - .emit_instr(cur_bid, |b| b.build_float_to_signed_int(fv, it, "call_f2i")) - .map_err(|e| e.to_string())? - .into()), - (BVE::IntValue(iv), BMTy::PointerType(pt)) => Ok(cursor - .emit_instr(cur_bid, |b| b.build_int_to_ptr(iv, pt, "call_i2p")) - .map_err(|e| e.to_string())? - .into()), - (BVE::PointerValue(pv), BMTy::PointerType(_)) => Ok(pv.into()), - (BVE::IntValue(iv), BMTy::FloatType(ft)) => Ok(cursor - .emit_instr(cur_bid, |b| b.build_signed_int_to_float(iv, ft, "call_i2f")) - .map_err(|e| e.to_string())? 
- .into()), - (BVE::FloatValue(fv), BMTy::FloatType(_)) => Ok(fv.into()), - (v, _) => Ok(v), - } -} diff --git a/src/backend/llvm/compiler/codegen/instructions/flow.rs b/src/backend/llvm/compiler/codegen/instructions/flow.rs index 51752425..4e2808ff 100644 --- a/src/backend/llvm/compiler/codegen/instructions/flow.rs +++ b/src/backend/llvm/compiler/codegen/instructions/flow.rs @@ -221,17 +221,14 @@ pub(in super::super) fn seal_block<'ctx, 'b>( } }; // Insert any required casts in the predecessor block, right before its terminator - let saved_block = codegen.builder.get_insert_block(); if let Some(pred_llbb) = bb_map.get(&bid) { - let term = unsafe { pred_llbb.get_terminator() }; - if let Some(t) = term { - codegen.builder.position_before(&t); - } else { - codegen.builder.position_at_end(*pred_llbb); - } + cursor.with_block(bid, *pred_llbb, |c| { + let term = unsafe { pred_llbb.get_terminator() }; + if let Some(t) = term { codegen.builder.position_before(&t); } + else { c.position_at_end(*pred_llbb); } + val = coerce_to_type(codegen, phi, val).expect("coerce_to_type in seal_block"); + }); } - val = coerce_to_type(codegen, phi, val)?; - if let Some(bb) = saved_block { codegen.builder.position_at_end(bb); } let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?; if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { let tys = phi @@ -341,14 +338,14 @@ pub(in super::super) fn finalize_phis<'ctx, 'b>( } }; // Insert casts in pred block, just before its terminator - let saved_block = codegen.builder.get_insert_block(); if let Some(pred_llbb) = bb_map.get(pred) { - let term = unsafe { pred_llbb.get_terminator() }; - if let Some(t) = term { codegen.builder.position_before(&t); } - else { codegen.builder.position_at_end(*pred_llbb); } + cursor.with_block(*pred, *pred_llbb, |c| { + let term = unsafe { pred_llbb.get_terminator() }; + if let Some(t) = term { codegen.builder.position_before(&t); } + else { c.position_at_end(*pred_llbb); } + val = 
coerce_to_type(codegen, phi, val).expect("coerce_to_type finalize_phis"); + }); } - val = coerce_to_type(codegen, phi, val)?; - if let Some(bb) = saved_block { codegen.builder.position_at_end(bb); } let pred_bb = *bb_map.get(pred).ok_or("pred bb missing")?; if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { eprintln!( diff --git a/src/backend/llvm/compiler/codegen/instructions/loopform.rs b/src/backend/llvm/compiler/codegen/instructions/loopform.rs index 4fd9f10f..0a137644 100644 --- a/src/backend/llvm/compiler/codegen/instructions/loopform.rs +++ b/src/backend/llvm/compiler/codegen/instructions/loopform.rs @@ -16,6 +16,7 @@ use super::super::types::to_bool; /// LoopForm scaffolding — fixed block layout for while/loop normalization pub struct LoopFormContext<'ctx> { + pub preheader: BasicBlock<'ctx>, pub header: BasicBlock<'ctx>, pub body: BasicBlock<'ctx>, pub dispatch: BasicBlock<'ctx>, @@ -32,6 +33,9 @@ impl<'ctx> LoopFormContext<'ctx> { loop_id: u32, prefix: &str, ) -> Self { + let preheader = codegen + .context + .append_basic_block(function, &format!("{}_lf{}_preheader", prefix, loop_id)); let header = codegen .context .append_basic_block(function, &format!("{}_lf{}_header", prefix, loop_id)); @@ -47,7 +51,7 @@ impl<'ctx> LoopFormContext<'ctx> { let exit = codegen .context .append_basic_block(function, &format!("{}_lf{}_exit", prefix, loop_id)); - Self { header, body, dispatch, latch, exit, loop_id } + Self { preheader, header, body, dispatch, latch, exit, loop_id } } } @@ -78,6 +82,13 @@ pub fn lower_while_loopform<'ctx, 'b>( // Create LoopForm fixed blocks under the same function let lf = LoopFormContext::new(codegen, llvm_func, loop_id, prefix); + // Preheader: currently a pass-through to header (Phase 1) + codegen.builder.position_at_end(lf.preheader); + codegen + .builder + .build_unconditional_branch(lf.header) + .map_err(|e| e.to_string()) + .unwrap(); // Header: evaluate condition and branch to body (for true) or dispatch (for false) 
let cond_v = *vmap.get(condition).ok_or("loopform: condition value missing")?; diff --git a/src/backend/llvm/compiler/codegen/instructions/maps.rs b/src/backend/llvm/compiler/codegen/instructions/maps.rs index 7495d644..30d1de3b 100644 --- a/src/backend/llvm/compiler/codegen/instructions/maps.rs +++ b/src/backend/llvm/compiler/codegen/instructions/maps.rs @@ -10,6 +10,7 @@ use super::builder_cursor::BuilderCursor; pub(super) fn try_handle_map_method<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &mut HashMap>, @@ -59,10 +60,7 @@ pub(super) fn try_handle_map_method<'ctx, 'b>( } let key_v = *vmap.get(&args[0]).ok_or("map.has key missing")?; let key_i = match key_v { - BVE::IntValue(iv) => iv, - BVE::PointerValue(pv) => cursor - .emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, i64t, "key_p2i")) - .map_err(|e| e.to_string())?, + BVE::IntValue(_) | BVE::PointerValue(_) => resolver.resolve_i64(codegen, cursor, cur_bid, args[0], &std::collections::HashMap::new(), &std::collections::HashMap::new(), &std::collections::HashMap::new(), vmap).map_err(|e| e.to_string())?, _ => return Err("map.has key must be int or handle ptr".to_string()), }; let fnty = i64t.fn_type(&[i64t.into(), i64t.into()], false); @@ -91,7 +89,8 @@ pub(super) fn try_handle_map_method<'ctx, 'b>( } let key_v = *vmap.get(&args[0]).ok_or("map.get key missing")?; let call = match key_v { - BVE::IntValue(iv) => { + BVE::IntValue(_) => { + let iv = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], &std::collections::HashMap::new(), &std::collections::HashMap::new(), &std::collections::HashMap::new(), vmap)?; let fnty = i64t.fn_type(&[i64t.into(), i64t.into()], false); let callee = codegen .module @@ -147,17 +146,11 @@ pub(super) fn try_handle_map_method<'ctx, 'b>( let key_v = *vmap.get(&args[0]).ok_or("map.set key missing")?; let val_v = *vmap.get(&args[1]).ok_or("map.set value 
missing")?; let key_i = match key_v { - BVE::IntValue(iv) => iv, - BVE::PointerValue(pv) => cursor - .emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, i64t, "key_p2i")) - .map_err(|e| e.to_string())?, + BVE::IntValue(_) | BVE::PointerValue(_) => resolver.resolve_i64(codegen, cursor, cur_bid, args[0], &std::collections::HashMap::new(), &std::collections::HashMap::new(), &std::collections::HashMap::new(), vmap).map_err(|e| e.to_string())?, _ => return Err("map.set key must be int or handle ptr".to_string()), }; let val_i = match val_v { - BVE::IntValue(iv) => iv, - BVE::PointerValue(pv) => cursor - .emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, i64t, "val_p2i")) - .map_err(|e| e.to_string())?, + BVE::IntValue(_) | BVE::PointerValue(_) => resolver.resolve_i64(codegen, cursor, cur_bid, args[1], &std::collections::HashMap::new(), &std::collections::HashMap::new(), &std::collections::HashMap::new(), vmap).map_err(|e| e.to_string())?, _ => return Err("map.set value must be int or handle ptr".to_string()), }; let fnty = i64t.fn_type(&[i64t.into(), i64t.into(), i64t.into()], false); diff --git a/src/backend/llvm/compiler/codegen/instructions/resolver.rs b/src/backend/llvm/compiler/codegen/instructions/resolver.rs index 7735fc5b..f78c8318 100644 --- a/src/backend/llvm/compiler/codegen/instructions/resolver.rs +++ b/src/backend/llvm/compiler/codegen/instructions/resolver.rs @@ -1,6 +1,7 @@ use std::collections::HashMap; use inkwell::values::{BasicValueEnum as BVE, IntValue}; +use inkwell::values::PointerValue; use crate::backend::llvm::context::CodegenContext; use crate::mir::{BasicBlockId, ValueId}; @@ -12,11 +13,13 @@ use super::flow::localize_to_i64; /// redundant PHIs and casts when multiple users in the same block request the same MIR value. 
pub struct Resolver<'ctx> { i64_locals: HashMap<(BasicBlockId, ValueId), IntValue<'ctx>>, + ptr_locals: HashMap<(BasicBlockId, ValueId), PointerValue<'ctx>>, + f64_locals: HashMap<(BasicBlockId, ValueId), inkwell::values::FloatValue<'ctx>>, } impl<'ctx> Resolver<'ctx> { pub fn new() -> Self { - Self { i64_locals: HashMap::new() } + Self { i64_locals: HashMap::new(), ptr_locals: HashMap::new(), f64_locals: HashMap::new() } } /// Resolve a MIR value as an i64 dominating the current block. @@ -39,5 +42,112 @@ impl<'ctx> Resolver<'ctx> { self.i64_locals.insert((cur_bid, vid), iv); Ok(iv) } -} + /// Resolve a MIR value as an i8* pointer dominating the current block. + pub fn resolve_ptr<'b>( + &mut self, + codegen: &CodegenContext<'ctx>, + cursor: &mut BuilderCursor<'ctx, 'b>, + cur_bid: BasicBlockId, + vid: ValueId, + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, + vmap: &std::collections::HashMap>, + ) -> Result, String> { + if let Some(pv) = self.ptr_locals.get(&(cur_bid, vid)).copied() { + return Ok(pv); + } + let i8p = codegen.context.ptr_type(inkwell::AddressSpace::from(0)); + let cur_llbb = *bb_map.get(&cur_bid).ok_or("cur bb missing")?; + let pred_list = preds.get(&cur_bid).cloned().unwrap_or_default(); + // Insert PHI at block start + let saved_ip = codegen.builder.get_insert_block(); + if let Some(first) = cur_llbb.get_first_instruction() { codegen.builder.position_before(&first); } + else { codegen.builder.position_at_end(cur_llbb); } + let phi = codegen.builder.build_phi(i8p, &format!("loc_p_{}", vid.as_u32())).map_err(|e| e.to_string())?; + if pred_list.is_empty() { + // Entry-like block: derive from vmap or zero + let base = vmap.get(&vid).copied().unwrap_or_else(|| i8p.const_zero().into()); + let coerced = match base { + BVE::PointerValue(pv) => pv, + BVE::IntValue(iv) => cursor.emit_instr(cur_bid, |b| b.build_int_to_ptr(iv, i8p, "loc_i2p")).map_err(|e| e.to_string())?, + 
BVE::FloatValue(_) => i8p.const_zero(), + _ => i8p.const_zero(), + }; + phi.add_incoming(&[(&coerced, cur_llbb)]); + } else { + for p in &pred_list { + let pred_bb = *bb_map.get(p).ok_or("pred bb missing")?; + let base = block_end_values + .get(p) + .and_then(|m| m.get(&vid).copied()) + .unwrap_or_else(|| i8p.const_zero().into()); + let coerced = match base { + BVE::PointerValue(pv) => pv, + BVE::IntValue(iv) => codegen.builder.build_int_to_ptr(iv, i8p, "loc_i2p_p").map_err(|e| e.to_string())?, + BVE::FloatValue(_) => i8p.const_zero(), + _ => i8p.const_zero(), + }; + phi.add_incoming(&[(&coerced, pred_bb)]); + } + } + if let Some(bb) = saved_ip { codegen.builder.position_at_end(bb); } + let out = phi.as_basic_value().into_pointer_value(); + self.ptr_locals.insert((cur_bid, vid), out); + Ok(out) + } + + /// Resolve a MIR value as an f64 dominating the current block. + pub fn resolve_f64<'b>( + &mut self, + codegen: &CodegenContext<'ctx>, + cursor: &mut BuilderCursor<'ctx, 'b>, + cur_bid: BasicBlockId, + vid: ValueId, + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, + vmap: &std::collections::HashMap>, + ) -> Result, String> { + if let Some(fv) = self.f64_locals.get(&(cur_bid, vid)).copied() { + return Ok(fv); + } + let f64t = codegen.context.f64_type(); + let cur_llbb = *bb_map.get(&cur_bid).ok_or("cur bb missing")?; + let pred_list = preds.get(&cur_bid).cloned().unwrap_or_default(); + let saved_ip = codegen.builder.get_insert_block(); + if let Some(first) = cur_llbb.get_first_instruction() { codegen.builder.position_before(&first); } + else { codegen.builder.position_at_end(cur_llbb); } + let phi = codegen.builder.build_phi(f64t, &format!("loc_f64_{}", vid.as_u32())).map_err(|e| e.to_string())?; + if pred_list.is_empty() { + let base = vmap.get(&vid).copied().unwrap_or_else(|| f64t.const_zero().into()); + let coerced = match base { + BVE::FloatValue(fv) => fv, + BVE::IntValue(iv) => 
codegen.builder.build_signed_int_to_float(iv, f64t, "loc_i2f").map_err(|e| e.to_string())?, + BVE::PointerValue(_) => f64t.const_zero(), + _ => f64t.const_zero(), + }; + phi.add_incoming(&[(&coerced, cur_llbb)]); + } else { + for p in &pred_list { + let pred_bb = *bb_map.get(p).ok_or("pred bb missing")?; + let base = block_end_values + .get(p) + .and_then(|m| m.get(&vid).copied()) + .unwrap_or_else(|| f64t.const_zero().into()); + let coerced = match base { + BVE::FloatValue(fv) => fv, + BVE::IntValue(iv) => codegen.builder.build_signed_int_to_float(iv, f64t, "loc_i2f_p").map_err(|e| e.to_string())?, + BVE::PointerValue(_) => f64t.const_zero(), + _ => f64t.const_zero(), + }; + phi.add_incoming(&[(&coerced, pred_bb)]); + } + } + if let Some(bb) = saved_ip { codegen.builder.position_at_end(bb); } + let out = phi.as_basic_value().into_float_value(); + self.f64_locals.insert((cur_bid, vid), out); + Ok(out) + } +} diff --git a/src/backend/llvm/compiler/codegen/mod.rs b/src/backend/llvm/compiler/codegen/mod.rs index d764894e..0fa92678 100644 --- a/src/backend/llvm/compiler/codegen/mod.rs +++ b/src/backend/llvm/compiler/codegen/mod.rs @@ -277,7 +277,22 @@ impl LLVMCompiler { defined_in_block.insert(*dst); }, MirInstruction::Call { dst, func: callee, args, .. } => { - instructions::lower_call(&codegen, &mut cursor, *bid, func, &mut vmap, dst, callee, args, &const_strs, &llvm_funcs)?; + instructions::lower_call( + &codegen, + &mut cursor, + &mut resolver, + *bid, + func, + &mut vmap, + dst, + callee, + args, + &const_strs, + &llvm_funcs, + &bb_map, + &preds, + &block_end_values, + )?; if let Some(d) = dst { defined_in_block.insert(*d); } } MirInstruction::BoxCall {