diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md index 4bc186c2..a797d1ec 100644 --- a/CURRENT_TASK.md +++ b/CURRENT_TASK.md @@ -15,7 +15,7 @@ Compact Roadmap (2025‑09‑12) - BuilderCursor: panic immediately on post‑terminator insertion (already applied to strings/arith_ops/mem). - Next (short): 1) Extend strict BuilderCursor enforcement (externcall→newbox→arrays→maps→call). - 2) Unify on Sealed SSA as default ON (stop finalize_phis; complete wiring in seal_block). + 2) Unify on Sealed SSA as default ON (stop finalize_phis; complete wiring in seal_block). NYASH_LLVM_PHI_SEALED is ON when unset. 3) Stabilize LoopForm header PHI normalization (verifier green even with latch→header ON). 4) Use body→dispatch routinely for simple bodies (staged gate). 5) Metrics: dispatch-only PHI, reduced zero synthesis, post‑terminator detections staying at zero. @@ -54,6 +54,21 @@ Hot Update — 2025‑09‑12 (Plan: LLVM wrapper via Nyash ABI) - I/O spec: input = MIR (JSON/in-memory), output = .o (written to `NYASH_AOT_OBJECT_OUT`). - Acceptance: dep_tree_min_string output matches with harness ON/OFF (functional equivalence). +Scaffold — 2025‑09‑12 (llvmlite harness) +- Added tools/llvmlite_harness.py (trivial ny_main returning 0) and docs/LLVM_HARNESS.md. +- Use to validate toolchain wiring; extend to lower MIR14 JSON incrementally. + +Scaffold — 2025‑09‑12 (Resolver i64 minimal) +- Added src/backend/llvm/compiler/codegen/instructions/resolver.rs with `Resolver::resolve_i64(...)` and per-block cache. +- docs/RESOLVER_API.md documents goals/usage; wiring to replace `localize_to_i64` callsites comes next. + +Docs — LLVM layer overview (2025‑09‑12) +- Added docs/LLVM_LAYER_OVERVIEW.md and linked it with existing docs: + - docs/LOWERING_LLVM.md — concrete lowering rules and RT calls + - docs/RESOLVER_API.md — value resolution (sealed/localize) API and cache + - docs/LLVM_HARNESS.md — llvmlite harness scope and interface + Use as the canonical reference for invariants: Resolver-only reads, cast placement, cursor discipline, sealed SSA, and LoopForm shape. 
+ Hot Repro — esc_json/1 PHI wiring (2025‑09‑12) - Target: apps/selfhost/tools/dep_tree_min_string.nyash - Run (LLVM): @@ -187,6 +202,36 @@ Done (today) - BuilderCursor rollout (first batch): routed strings/arith_ops/mem through the Cursor. Strengthened detection of post-terminator insertion. - Sealed SSA: stopped `finalize_phis` and unified on `seal_block`. Added header PHI normalization for LoopForm latch→header (gated). +Hot Update — 2025‑09‑12 (sealed + dominator fixes, work in progress) +- BuilderCursor coverage extended (first wave complete) + - Moved externcall(console/env), newbox, arrays, maps, call, compare to go through the Cursor + - Together with the existing strings/arith_ops/mem, nearly all lowering is now guarded against post‑terminator insertion +- Introduced on‑demand PHI (localization) (flow::localize_to_i64) + - Purpose: replace "i64 values used in the current BB" with local definitions by generating PHIs from pred snapshots → resolves dominance violations + - Placement: inserted at the start of the BB (before existing PHIs). The insertion point is saved/restored + - Applied to: start/end of strings.substring, si/is of strings.concat, integer comparisons in compare, the condition of flow.emit_branch (int/ptr/float→i1) +- IR dump on failure: `NYASH_LLVM_DUMP_ON_FAIL=1` writes `tmp/llvm_fail_.ll` (when function verification fails) + +Smoke (sealed=ON, dep_tree_min_string) findings +- Progress: missing PHIs no longer reproduce; incoming wiring under sealed is stable +- Still failing: dominator violation in Main.node_json/3 (Instruction does not dominate all uses!) 
+ - In part of the iadd→icmp/sub/substring/concat chain, the iadd definition does not dominate its use sites + - Done: integer arguments of branch conditions/integer comparisons/substring/concat are localized + - Likely remaining: integer uses via vmap inside other lowering (e.g. reuse points in BoxCall/ExternCall/arith_ops) + +Next (handoff actions) +1) Extend localization (priority) + - Apply `localize_to_i64` on every path that reads an integer value from vmap and uses it + - Candidates: arith_ops (operand reuse sites in BinOp), remaining integer arguments of BoxCall, integer parameters of other methods + - Migrate direct types.to_bool calls in stages to "localize → != 0" on the emit side +2) Generalize the Resolver API + - Introduce a resolver that returns "ValueId → value in the current BB" (i64 first, extend to ptr/f64 as needed) + - Fetch values through the resolver in all lowering to eliminate dominance breakage at the root +3) Strengthen IR visualization/verification + - Inspect the .ll of failing functions, identify use sites that missed localization → close them one by one +4) In parallel: llvmlite verification harness (`NYASH_LLVM_USE_HARNESS=1`) + - Quickly verify the shape of PHI/loop/short-circuit → reflect into the Rust implementation (functional parity guaranteed by Acceptance A5) + Refactor — LLVM codegen instructions modularized (done) - Goal achieved: split `instructions.rs` in stages and rearranged it by responsibility (0‑diff). - New layout under `src/backend/llvm/compiler/codegen/instructions/`: @@ -275,6 +320,7 @@ Next Flow (upcoming flow = staged rollout) Acceptance (per stage) - A1: Equivalent to previous behavior even with LoopForm ON (Break aggregation only, non-destructive, smoke green). - A2: With strict BuilderCursor, post-terminator detections stay at zero (no panics). +- A2.5: With sealed=ON, zero dominator violations in dep_tree_min_string (to the point that no IR dump is needed). - A3: After header PHI normalization, verifier green even with latch→header enabled (no missing PHIs). - A4: Use body→dispatch routinely for simple bodies and confirm that no PHIs appear outside dispatch. - A5: Output of `NYASH_LLVM_USE_HARNESS=1` (llvmlite) and OFF (Rust) is functionally identical for dep_tree_min_string. diff --git a/docs/LLVM_HARNESS.md b/docs/LLVM_HARNESS.md new file mode 100644 index 00000000..16efebc7 --- /dev/null +++ b/docs/LLVM_HARNESS.md @@ -0,0 +1,36 @@ +# llvmlite Harness (Experimental) + +Purpose +- Provide a fast, scriptable LLVM emission path using Python + llvmlite for validation and prototyping. +- Run in parallel with the Rust/inkwell path; keep outputs functionally equivalent for targeted smokes. + +Switch +- Set `NYASH_LLVM_USE_HARNESS=1` to prefer the harness (future: wired in LLVM backend entry). + +Protocol (tentative) +- Input: MIR14 JSON file path (subset sufficient for dep_tree_min_string initially). 
+- Output: `.o` object file written to `NYASH_AOT_OBJECT_OUT` or `--out` path. +- Entry function: `ny_main(i64 argc, i8** argv) -> i64` (returns app exit code/box-handle per ABI). + +Quick Start +- Install deps: `python3 -m pip install llvmlite` +- Generate a dummy object to validate toolchain: + - `python3 tools/llvmlite_harness.py --out /tmp/dummy.o` + - Link with NyRT as usual to produce an executable. + +Intended Wiring (Rust side) +- LLVM backend checks `NYASH_LLVM_USE_HARNESS=1` and, if set, exports MIR14 of the target module to a temp JSON, then invokes: + - `python3 tools/llvmlite_harness.py --in --out ` +- On success, the normal link step continues using ``. + +Scope (Phase 15) +- Minimal ops: i64 arithmetic, comparisons, branches, PHI(Sealed), basic string ops through NyRT shims. +- Target case: `apps/selfhost/tools/dep_tree_min_string.nyash` builds and runs. + +Acceptance +- A5: Harness ON vs OFF produce functionally equivalent output for the target smoke. + +Notes +- The first version may ignore MIR details and emit a fixed `ny_main` body for smoke scaffolding; then iterate to lower MIR ops. +- Keep the harness self-contained; no external state besides inputs and env. + diff --git a/docs/LLVM_LAYER_OVERVIEW.md b/docs/LLVM_LAYER_OVERVIEW.md new file mode 100644 index 00000000..e9ad52fa --- /dev/null +++ b/docs/LLVM_LAYER_OVERVIEW.md @@ -0,0 +1,37 @@ +# LLVM Layer Overview (Phase 15) + +Scope +- Practical guide to LLVM lowering architecture and invariants used in Phase 15. +- Complements LOWERING_LLVM.md (rules), RESOLVER_API.md (value resolution), and LLVM_HARNESS.md (harness). + +Module Layout +- `src/backend/llvm/compiler/codegen/` + - `instructions/` split by concern: `flow.rs`, `blocks.rs`, `arith.rs`, `arith_ops.rs`, `mem.rs`, + `strings.rs`, `arrays.rs`, `maps.rs`, `boxcall/`, `externcall/`, `call.rs`, `loopform.rs`, `resolver.rs`. + - `builder_cursor.rs`: central insertion/terminator guard. 
+ +Core Invariants +- Resolver-only reads: lowerers fetch MIR values through `Resolver` (no direct `vmap` access for cross-BB values). +- Localize at block start: PHIs created at the beginning of the current BB (before non-PHI) to guarantee dominance. +- Cast placement: perform ptr↔int and width casts outside PHIs, at BB start or just-before-pred-terminator via `with_block`. +- Sealed SSA: successor PHIs wired by predecessor snapshots and `seal_block`; branch/jump do not push incoming directly. +- Cursor discipline: only insert via `BuilderCursor`; post-terminator insertions are forbidden. + +LoopForm (gated) +- Shape: `preheader → header → body → dispatch(phi) → {latch|exit} → header` with PHIs centralized in `dispatch`. +- State: model loop-carried values via a `LoopState` aggregate (tag + payloads). +- Goals: move all PHIs to dispatch, ensure header uses are dominated by preheader/dispatch values. + +Types and Bridges +- Box handle is `i64` across NyRT boundary; strings prefer `i8*` fast paths. +- Convert rules: `ensure_i64/ensure_i1/ensure_ptr` style helpers (planned extraction) to centralize casting. + +Harness (optional) +- llvmlite harness exists for fast prototyping and structural checks. +- Gate: `NYASH_LLVM_USE_HARNESS=1` (planned wiring); target parity tested by Acceptance A5. + +References +- LOWERING_LLVM.md — lowering rules and runtime calls +- RESOLVER_API.md — Resolver design and usage +- LLVM_HARNESS.md — llvmlite harness interface and usage + diff --git a/docs/RESOLVER_API.md b/docs/RESOLVER_API.md new file mode 100644 index 00000000..81b02a1b --- /dev/null +++ b/docs/RESOLVER_API.md @@ -0,0 +1,25 @@ +# Resolver API (Minimal i64 Prototype) + +Goals +- Centralize "ValueId → current-block value" resolution. +- Guarantee dominance by localizing values at the start of the block (before non-PHI). +- De-duplicate per (block, value) to avoid redundant PHIs/casts. + +Design +- `Resolver` keeps small per-function caches keyed by `(BasicBlockId, ValueId)`. 
+- `resolve_i64(...)` returns an `i64`-typed `IntValue`, inserting PHI and casts as needed using sealed snapshots. +- Internally uses `flow::localize_to_i64(...)` for now; later, fold logic directly and add `resolve_ptr/resolve_f64`. + +Usage (planned wiring) +- Create `let mut resolver = instructions::Resolver::new();` at function lowering start. +- Replace all integer value fetches in lowerers with `resolver.resolve_i64(...)`. +- Keep builder insertion discipline via `BuilderCursor`. + +Next +- Add `resolve_ptr(...)` and `resolve_f64(...)` with same caching discipline. +- Migrate existing `localize_to_i64` call sites to the resolver. +- Enforce vmap direct access ban in lowerers (Resolver-only for reads). + +Acceptance tie-in +- Combined with LoopForm: dispatch-only PHI + resolver-based value access → dominance violations drop to zero (A2.5). + diff --git a/src/backend/llvm/compiler/codegen/instructions/arith.rs b/src/backend/llvm/compiler/codegen/instructions/arith.rs index c917e6d8..4f9f12c1 100644 --- a/src/backend/llvm/compiler/codegen/instructions/arith.rs +++ b/src/backend/llvm/compiler/codegen/instructions/arith.rs @@ -10,6 +10,7 @@ use super::builder_cursor::BuilderCursor; pub(in super::super) fn lower_compare<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &HashMap>, @@ -104,9 +105,11 @@ pub(in super::super) fn lower_compare<'ctx, 'b>( } let out = if let (Some(_li0), Some(_ri0)) = (as_int(lv), as_int(rv)) { // Localize integer operands into current block to satisfy dominance - let mut li = super::flow::localize_to_i64(codegen, cursor, cur_bid, *lhs, bb_map, preds, block_end_values, vmap) + let mut li = resolver + .resolve_i64(codegen, cursor, cur_bid, *lhs, bb_map, preds, block_end_values, vmap) .unwrap_or_else(|_| as_int(lv).unwrap()); - let mut ri = super::flow::localize_to_i64(codegen, cursor, cur_bid, *rhs, bb_map, preds, 
block_end_values, vmap) + let mut ri = resolver + .resolve_i64(codegen, cursor, cur_bid, *rhs, bb_map, preds, block_end_values, vmap) .unwrap_or_else(|_| as_int(rv).unwrap()); // Normalize integer widths: extend the narrower to match the wider to satisfy LLVM let lw = li.get_type().get_bit_width(); diff --git a/src/backend/llvm/compiler/codegen/instructions/arith_ops.rs b/src/backend/llvm/compiler/codegen/instructions/arith_ops.rs index 191ebe52..f92275b5 100644 --- a/src/backend/llvm/compiler/codegen/instructions/arith_ops.rs +++ b/src/backend/llvm/compiler/codegen/instructions/arith_ops.rs @@ -57,6 +57,7 @@ pub(in super::super) fn lower_unary<'ctx, 'b>( pub(in super::super) fn lower_binop<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &mut HashMap>, @@ -64,6 +65,9 @@ pub(in super::super) fn lower_binop<'ctx, 'b>( op: &BinaryOp, lhs: &ValueId, rhs: &ValueId, + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { use crate::backend::llvm::compiler::helpers::{as_float, as_int}; use inkwell::values::BasicValueEnum as BVE; @@ -221,7 +225,12 @@ pub(in super::super) fn lower_binop<'ctx, 'b>( return Ok(()); } - let out = if let (Some(li), Some(ri)) = (as_int(lv), as_int(rv)) { + let out = if let (Some(_li0), Some(_ri0)) = (as_int(lv), as_int(rv)) { + // Localize integer operands into current block to satisfy dominance (normalize to i64) + let li = resolver.resolve_i64(codegen, cursor, cur_bid, *lhs, bb_map, preds, block_end_values, vmap) + .unwrap_or_else(|_| codegen.context.i64_type().const_zero()); + let ri = resolver.resolve_i64(codegen, cursor, cur_bid, *rhs, bb_map, preds, block_end_values, vmap) + .unwrap_or_else(|_| codegen.context.i64_type().const_zero()); use BinaryOp as B; match op { B::Add => cursor.emit_instr(cur_bid, |b| b.build_int_add(li, 
ri, "iadd")).map_err(|e| e.to_string())?.into(), diff --git a/src/backend/llvm/compiler/codegen/instructions/boxcall.rs b/src/backend/llvm/compiler/codegen/instructions/boxcall.rs index a603bcb0..8389d49c 100644 --- a/src/backend/llvm/compiler/codegen/instructions/boxcall.rs +++ b/src/backend/llvm/compiler/codegen/instructions/boxcall.rs @@ -16,6 +16,7 @@ use super::builder_cursor::BuilderCursor; pub(in super::super) fn lower_boxcall<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &mut HashMap>, @@ -59,7 +60,7 @@ pub(in super::super) fn lower_boxcall<'ctx, 'b>( // Delegate String methods if super::strings::try_handle_string_method( - codegen, cursor, cur_bid, func, vmap, dst, box_val, method, args, recv_v, + codegen, cursor, resolver, cur_bid, func, vmap, dst, box_val, method, args, recv_v, bb_map, preds, block_end_values, )? { return Ok(()); @@ -77,7 +78,21 @@ pub(in super::super) fn lower_boxcall<'ctx, 'b>( // Console convenience: treat println as env.console.log if method == "println" { - return super::externcall::lower_externcall(codegen, cursor, cur_bid, func, vmap, dst, &"env.console".to_string(), &"log".to_string(), args); + return super::externcall::lower_externcall( + codegen, + cursor, + resolver, + cur_bid, + func, + vmap, + dst, + &"env.console".to_string(), + &"log".to_string(), + args, + bb_map, + preds, + block_end_values, + ); } // getField/setField diff --git a/src/backend/llvm/compiler/codegen/instructions/externcall/console.rs b/src/backend/llvm/compiler/codegen/instructions/externcall/console.rs index 3ec064cc..3470d9c2 100644 --- a/src/backend/llvm/compiler/codegen/instructions/externcall/console.rs +++ b/src/backend/llvm/compiler/codegen/instructions/externcall/console.rs @@ -10,12 +10,16 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build pub(super) fn lower_log_or_trace<'ctx, 'b>( codegen: 
&CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::super::Resolver<'ctx>, cur_bid: BasicBlockId, vmap: &mut HashMap>, dst: &Option, iface_name: &str, method_name: &str, args: &[ValueId], + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { if args.len() != 1 { return Err(format!("{}.{} expects 1 arg", iface_name, method_name)); @@ -49,23 +53,8 @@ pub(super) fn lower_log_or_trace<'ctx, 'b>( } // Otherwise, convert to i64 and call handle variant _ => { - let arg_val = match av { - BVE::IntValue(iv) => { - if iv.get_type() == codegen.context.bool_type() { - cursor - .emit_instr(cur_bid, |b| b.build_int_z_extend(iv, codegen.context.i64_type(), "bool2i64")) - .map_err(|e| e.to_string())? - } else if iv.get_type() == codegen.context.i64_type() { - iv - } else { - cursor - .emit_instr(cur_bid, |b| b.build_int_s_extend(iv, codegen.context.i64_type(), "int2i64")) - .map_err(|e| e.to_string())? 
- } - } - BVE::PointerValue(_) => unreachable!(), - _ => return Err("console.log arg conversion failed".to_string()), - }; + // Localize to i64 to satisfy dominance + let arg_val = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?; let fnty = codegen .context .i64_type() diff --git a/src/backend/llvm/compiler/codegen/instructions/externcall/env.rs b/src/backend/llvm/compiler/codegen/instructions/externcall/env.rs index a3c925e4..b98c8ce8 100644 --- a/src/backend/llvm/compiler/codegen/instructions/externcall/env.rs +++ b/src/backend/llvm/compiler/codegen/instructions/externcall/env.rs @@ -10,10 +10,14 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build pub(super) fn lower_future_spawn_instance<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::super::Resolver<'ctx>, cur_bid: BasicBlockId, vmap: &mut HashMap>, dst: &Option, args: &[ValueId], + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { if args.len() < 2 { return Err("env.future.spawn_instance expects at least (recv, method_name)".to_string()); @@ -22,10 +26,10 @@ pub(super) fn lower_future_spawn_instance<'ctx, 'b>( let i8p = codegen.context.ptr_type(AddressSpace::from(0)); let recv_v = *vmap.get(&args[0]).ok_or("recv missing")?; let recv_h = match recv_v { - BVE::IntValue(iv) => iv, - BVE::PointerValue(pv) => cursor - .emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, i64t, "recv_p2i")) - .map_err(|e| e.to_string())?, + BVE::IntValue(_) | BVE::PointerValue(_) => { + // Localize to i64 to satisfy dominance; converts ptr→i64 if needed + resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)? 
+ } _ => return Err("spawn_instance recv must be int or ptr".to_string()), }; let name_v = *vmap.get(&args[1]).ok_or("method name missing")?; @@ -54,11 +58,15 @@ pub(super) fn lower_future_spawn_instance<'ctx, 'b>( pub(super) fn lower_local_get<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + _resolver: &mut super::super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &mut HashMap>, dst: &Option, args: &[ValueId], + _bb_map: &std::collections::HashMap>, + _preds: &std::collections::HashMap>, + _block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { if args.len() != 1 { return Err("env.local.get expects 1 arg".to_string()); @@ -119,10 +127,14 @@ pub(super) fn lower_local_get<'ctx, 'b>( pub(super) fn lower_box_new<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::super::Resolver<'ctx>, cur_bid: BasicBlockId, vmap: &mut HashMap>, dst: &Option, args: &[ValueId], + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { // Two variants: (name) and (argc, arg1, arg2, arg3, arg4) with optional ptr conversion // Prefer the i64 birth when possible; else call env.box.new(name) @@ -186,7 +198,9 @@ pub(super) fn lower_box_new<'ctx, 'b>( if args.len() >= 2 { let bv = *vmap.get(&args[1]).ok_or("arg missing")?; a1 = match bv { - BVE::IntValue(iv) => iv, + BVE::IntValue(_) | BVE::PointerValue(_) => { + resolver.resolve_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)? 
+ } BVE::FloatValue(fv) => { let fnty = i64t.fn_type(&[codegen.context.f64_type().into()], false); let callee = codegen @@ -202,18 +216,7 @@ pub(super) fn lower_box_new<'ctx, 'b>( .ok_or("from_f64 returned void".to_string())?; if let BVE::IntValue(h) = rv { h } else { return Err("from_f64 ret expected i64".to_string()); } } - BVE::PointerValue(pv) => { - let fnty = i64t.fn_type(&[i8p.into()], false); - let callee = codegen - .module - .get_function("nyash.box.from_i8_string") - .unwrap_or_else(|| codegen.module.add_function("nyash.box.from_i8_string", fnty, None)); - let call = cursor - .emit_instr(cur_bid, |b| b.build_call(callee, &[pv.into()], "arg1_i8_to_box")) - .map_err(|e| e.to_string())?; - let rv = call.try_as_basic_value().left().ok_or("from_i8_string returned void".to_string())?; - if let BVE::IntValue(h) = rv { h } else { return Err("from_i8_string ret expected i64".to_string()); } - } + // Pointer handled above by resolve_i64 _ => return Err("unsupported arg value for env.box.new".to_string()), }; } @@ -221,7 +224,9 @@ pub(super) fn lower_box_new<'ctx, 'b>( if args.len() >= 3 { let bv = *vmap.get(&args[2]).ok_or("arg missing")?; a2 = match bv { - BVE::IntValue(iv) => iv, + BVE::IntValue(_) | BVE::PointerValue(_) => { + resolver.resolve_i64(codegen, cursor, cur_bid, args[2], bb_map, preds, block_end_values, vmap)? 
+ } BVE::FloatValue(fv) => { let fnty = i64t.fn_type(&[codegen.context.f64_type().into()], false); let callee = codegen @@ -237,18 +242,7 @@ pub(super) fn lower_box_new<'ctx, 'b>( .ok_or("from_f64 returned void".to_string())?; if let BVE::IntValue(h) = rv { h } else { return Err("from_f64 ret expected i64".to_string()); } } - BVE::PointerValue(pv) => { - let fnty = i64t.fn_type(&[i8p.into()], false); - let callee = codegen - .module - .get_function("nyash.box.from_i8_string") - .unwrap_or_else(|| codegen.module.add_function("nyash.box.from_i8_string", fnty, None)); - let call = cursor - .emit_instr(cur_bid, |b| b.build_call(callee, &[pv.into()], "arg2_i8_to_box")) - .map_err(|e| e.to_string())?; - let rv = call.try_as_basic_value().left().ok_or("from_i8_string returned void".to_string())?; - if let BVE::IntValue(h) = rv { h } else { return Err("from_i8_string ret expected i64".to_string()); } - } + // Pointer handled above by resolve_i64 _ => return Err("unsupported arg value for env.box.new".to_string()), }; } diff --git a/src/backend/llvm/compiler/codegen/instructions/externcall/mod.rs b/src/backend/llvm/compiler/codegen/instructions/externcall/mod.rs index b1dff077..de3229cc 100644 --- a/src/backend/llvm/compiler/codegen/instructions/externcall/mod.rs +++ b/src/backend/llvm/compiler/codegen/instructions/externcall/mod.rs @@ -12,6 +12,7 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build pub(in super::super) fn lower_externcall<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, cur_bid: BasicBlockId, func: &MirFunction, vmap: &mut HashMap>, @@ -19,13 +20,16 @@ pub(in super::super) fn lower_externcall<'ctx, 'b>( iface_name: &str, method_name: &str, args: &[ValueId], + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, ) -> Result<(), String> { // console/debug if (iface_name == "env.console" 
&& matches!(method_name, "log" | "warn" | "error")) || (iface_name == "env.debug" && method_name == "trace") { - return console::lower_log_or_trace(codegen, cursor, cur_bid, vmap, dst, iface_name, method_name, args); + return console::lower_log_or_trace(codegen, cursor, resolver, cur_bid, vmap, dst, iface_name, method_name, args, bb_map, preds, block_end_values); } if iface_name == "env.console" && method_name == "readLine" { return console::lower_readline(codegen, cursor, cur_bid, vmap, dst, args); @@ -33,13 +37,13 @@ pub(in super::super) fn lower_externcall<'ctx, 'b>( // env.* if iface_name == "env.future" && method_name == "spawn_instance" { - return env::lower_future_spawn_instance(codegen, cursor, cur_bid, vmap, dst, args); + return env::lower_future_spawn_instance(codegen, cursor, resolver, cur_bid, vmap, dst, args, bb_map, preds, block_end_values); } if iface_name == "env.local" && method_name == "get" { - return env::lower_local_get(codegen, cursor, cur_bid, func, vmap, dst, args); + return env::lower_local_get(codegen, cursor, resolver, cur_bid, func, vmap, dst, args, bb_map, preds, block_end_values); } if iface_name == "env.box" && method_name == "new" { - return env::lower_box_new(codegen, cursor, cur_bid, vmap, dst, args); + return env::lower_box_new(codegen, cursor, resolver, cur_bid, vmap, dst, args, bb_map, preds, block_end_values); } Err(format!( diff --git a/src/backend/llvm/compiler/codegen/instructions/flow.rs b/src/backend/llvm/compiler/codegen/instructions/flow.rs index 4fcbde5c..51752425 100644 --- a/src/backend/llvm/compiler/codegen/instructions/flow.rs +++ b/src/backend/llvm/compiler/codegen/instructions/flow.rs @@ -56,38 +56,7 @@ pub(in super::super) fn emit_jump<'ctx, 'b>( >, vmap: &HashMap>, ) -> Result<(), String> { - let sealed = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() == Some("1"); - if !sealed { - if let Some(list) = phis_by_block.get(target) { - for (_dst, phi, inputs) in list { - if let Some((_, in_vid)) = 
inputs.iter().find(|(pred, _)| pred == &bid) { - let mut val = *vmap.get(in_vid).ok_or("phi incoming value missing")?; - let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?; - // Coerce incoming to PHI type when needed - val = coerce_to_type(codegen, phi, val)?; - if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { - let tys = phi - .as_basic_value() - .get_type() - .print_to_string() - .to_string(); - eprintln!( - "[PHI] incoming add pred_bb={} val={} ty={}", - bid.as_u32(), - in_vid.as_u32(), - tys - ); - } - match val { - BasicValueEnum::IntValue(iv) => phi.add_incoming(&[(&iv, pred_bb)]), - BasicValueEnum::FloatValue(fv) => phi.add_incoming(&[(&fv, pred_bb)]), - BasicValueEnum::PointerValue(pv) => phi.add_incoming(&[(&pv, pred_bb)]), - _ => return Err("unsupported phi incoming value".to_string()), - } - } - } - } - } + // Non-sealed incoming wiring removed: rely on sealed snapshots and resolver-driven PHIs. let tbb = *bb_map.get(target).ok_or("target bb missing")?; if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { eprintln!("[LLVM] emit_jump: {} -> {}", bid.as_u32(), target.as_u32()); @@ -101,6 +70,7 @@ pub(in super::super) fn emit_jump<'ctx, 'b>( pub(in super::super) fn emit_branch<'ctx, 'b>( codegen: &CodegenContext<'ctx>, cursor: &mut BuilderCursor<'ctx, 'b>, + resolver: &mut super::Resolver<'ctx>, bid: BasicBlockId, condition: &ValueId, then_bb: &BasicBlockId, @@ -118,79 +88,13 @@ pub(in super::super) fn emit_branch<'ctx, 'b>( let cond_v = *vmap.get(condition).ok_or("cond missing")?; let b = match cond_v { BasicValueEnum::IntValue(_) | BasicValueEnum::PointerValue(_) | BasicValueEnum::FloatValue(_) => { - let ci = localize_to_i64(codegen, cursor, bid, *condition, bb_map, preds, block_end_values, vmap)?; + let ci = resolver.resolve_i64(codegen, cursor, bid, *condition, bb_map, preds, block_end_values, vmap)?; let zero = codegen.context.i64_type().const_zero(); 
codegen.builder.build_int_compare(inkwell::IntPredicate::NE, ci, zero, "cond_nez").map_err(|e| e.to_string())? } _ => to_bool(codegen.context, cond_v, &codegen.builder)?, }; - let sealed = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() == Some("1"); - // then - if !sealed { - if let Some(list) = phis_by_block.get(then_bb) { - for (_dst, phi, inputs) in list { - if let Some((_, in_vid)) = inputs.iter().find(|(pred, _)| pred == &bid) { - let mut val = *vmap - .get(in_vid) - .ok_or("phi incoming (then) value missing")?; - let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?; - val = coerce_to_type(codegen, phi, val)?; - if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { - let tys = phi - .as_basic_value() - .get_type() - .print_to_string() - .to_string(); - eprintln!( - "[PHI] incoming add (then) pred_bb={} val={} ty={}", - bid.as_u32(), - in_vid.as_u32(), - tys - ); - } - match val { - BasicValueEnum::IntValue(iv) => phi.add_incoming(&[(&iv, pred_bb)]), - BasicValueEnum::FloatValue(fv) => phi.add_incoming(&[(&fv, pred_bb)]), - BasicValueEnum::PointerValue(pv) => phi.add_incoming(&[(&pv, pred_bb)]), - _ => return Err("unsupported phi incoming value (then)".to_string()), - } - } - } - } - } - // else - if !sealed { - if let Some(list) = phis_by_block.get(else_bb) { - for (_dst, phi, inputs) in list { - if let Some((_, in_vid)) = inputs.iter().find(|(pred, _)| pred == &bid) { - let mut val = *vmap - .get(in_vid) - .ok_or("phi incoming (else) value missing")?; - let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?; - val = coerce_to_type(codegen, phi, val)?; - if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { - let tys = phi - .as_basic_value() - .get_type() - .print_to_string() - .to_string(); - eprintln!( - "[PHI] incoming add (else) pred_bb={} val={} ty={}", - bid.as_u32(), - in_vid.as_u32(), - tys - ); - } - match val { - BasicValueEnum::IntValue(iv) => phi.add_incoming(&[(&iv, pred_bb)]), - 
BasicValueEnum::FloatValue(fv) => phi.add_incoming(&[(&fv, pred_bb)]), - BasicValueEnum::PointerValue(pv) => phi.add_incoming(&[(&pv, pred_bb)]), - _ => return Err("unsupported phi incoming value (else)".to_string()), - } - } - } - } - } + // Non-sealed incoming wiring removed: rely on sealed snapshots and resolver-driven PHIs. let tbb = *bb_map.get(then_bb).ok_or("then bb missing")?; let ebb = *bb_map.get(else_bb).ok_or("else bb missing")?; if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") { diff --git a/src/backend/llvm/compiler/codegen/instructions/mod.rs b/src/backend/llvm/compiler/codegen/instructions/mod.rs index 94df9e60..eef5c285 100644 --- a/src/backend/llvm/compiler/codegen/instructions/mod.rs +++ b/src/backend/llvm/compiler/codegen/instructions/mod.rs @@ -13,6 +13,7 @@ mod maps; mod arith_ops; mod call; mod loopform; +mod resolver; pub(super) use blocks::{create_basic_blocks, precreate_phis}; pub(super) use flow::{emit_branch, emit_jump, emit_return}; @@ -26,3 +27,4 @@ pub(super) use arith_ops::{lower_binop, lower_unary}; pub(super) use call::lower_call; pub(super) use loopform::{LoopFormContext, lower_while_loopform}; pub(super) use loopform::normalize_header_phis_for_latch; +pub(super) use resolver::Resolver; diff --git a/src/backend/llvm/compiler/codegen/instructions/resolver.rs b/src/backend/llvm/compiler/codegen/instructions/resolver.rs new file mode 100644 index 00000000..7735fc5b --- /dev/null +++ b/src/backend/llvm/compiler/codegen/instructions/resolver.rs @@ -0,0 +1,43 @@ +use std::collections::HashMap; + +use inkwell::values::{BasicValueEnum as BVE, IntValue}; + +use crate::backend::llvm::context::CodegenContext; +use crate::mir::{BasicBlockId, ValueId}; + +use super::builder_cursor::BuilderCursor; +use super::flow::localize_to_i64; + +/// Minimal per-function resolver caches. Caches localized i64 values per (block,value) to avoid +/// redundant PHIs and casts when multiple users in the same block request the same MIR value. 
+pub struct Resolver<'ctx> { + i64_locals: HashMap<(BasicBlockId, ValueId), IntValue<'ctx>>, +} + +impl<'ctx> Resolver<'ctx> { + pub fn new() -> Self { + Self { i64_locals: HashMap::new() } + } + + /// Resolve a MIR value as an i64 dominating the current block. + /// Strategy: if present in cache, return it; otherwise localize via sealed snapshots and cache. + pub fn resolve_i64<'b>( + &mut self, + codegen: &CodegenContext<'ctx>, + cursor: &mut BuilderCursor<'ctx, 'b>, + cur_bid: BasicBlockId, + vid: ValueId, + bb_map: &std::collections::HashMap>, + preds: &std::collections::HashMap>, + block_end_values: &std::collections::HashMap>>, + vmap: &std::collections::HashMap>, + ) -> Result, String> { + if let Some(iv) = self.i64_locals.get(&(cur_bid, vid)).copied() { + return Ok(iv); + } + let iv = localize_to_i64(codegen, cursor, cur_bid, vid, bb_map, preds, block_end_values, vmap)?; + self.i64_locals.insert((cur_bid, vid), iv); + Ok(iv) + } +} + diff --git a/src/backend/llvm/compiler/codegen/instructions/strings.rs b/src/backend/llvm/compiler/codegen/instructions/strings.rs index 100aada5..747cb291 100644 --- a/src/backend/llvm/compiler/codegen/instructions/strings.rs +++ b/src/backend/llvm/compiler/codegen/instructions/strings.rs @@ -5,12 +5,13 @@ use inkwell::{values::BasicValueEnum as BVE, AddressSpace}; use crate::backend::llvm::context::CodegenContext; use crate::mir::{function::MirFunction, BasicBlockId, ValueId}; use super::builder_cursor::BuilderCursor; -use super::flow::localize_to_i64; +use super::Resolver; /// Handle String-specific methods. Returns true if handled, false to let caller continue. 
 pub(super) fn try_handle_string_method<'ctx, 'b>(
     codegen: &CodegenContext<'ctx>,
     cursor: &mut BuilderCursor<'ctx, 'b>,
+    resolver: &mut Resolver<'ctx>,
     cur_bid: BasicBlockId,
     func: &MirFunction,
     vmap: &mut HashMap>,
@@ -60,8 +61,8 @@ pub(super) fn try_handle_string_method<'ctx, 'b>(
                 }
                 (BVE::PointerValue(lp), BVE::IntValue(_ri)) => {
                     let i64t = codegen.context.i64_type();
-                    // Localize rhs integer in current block
-                    let ri = localize_to_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
+                    // Localize rhs integer in current block via Resolver
+                    let ri = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
                     let fnty = i8p.fn_type(&[i8p.into(), i64t.into()], false);
                     let callee = codegen
                         .module
@@ -83,7 +84,7 @@ pub(super) fn try_handle_string_method<'ctx, 'b>(
                 (BVE::IntValue(_li), BVE::PointerValue(rp)) => {
                     let i64t = codegen.context.i64_type();
                     // Localize receiver integer in current block (box_val)
-                    let li = localize_to_i64(codegen, cursor, cur_bid, *box_val, bb_map, preds, block_end_values, vmap)?;
+                    let li = resolver.resolve_i64(codegen, cursor, cur_bid, *box_val, bb_map, preds, block_end_values, vmap)?;
                     let fnty = i8p.fn_type(&[i64t.into(), i8p.into()], false);
                     let callee = codegen
                         .module
@@ -172,8 +173,8 @@ pub(super) fn try_handle_string_method<'ctx, 'b>(
             _ => return Ok(false),
         };
         // Localize start/end indices to current block via sealed snapshots (i64)
-        let s = localize_to_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
-        let e = localize_to_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)?;
+        let s = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
+        let e = resolver.resolve_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)?;
         let fnty = i8p.fn_type(&[i8p.into(), i64t.into(), i64t.into()], false);
         let callee = codegen
             .module
diff --git a/src/backend/llvm/compiler/codegen/mod.rs b/src/backend/llvm/compiler/codegen/mod.rs
index 78ea4249..d764894e 100644
--- a/src/backend/llvm/compiler/codegen/mod.rs
+++ b/src/backend/llvm/compiler/codegen/mod.rs
@@ -203,10 +203,13 @@ impl LLVMCompiler {
 
             // Lower body
             let mut loopform_loop_id: u32 = 0;
-            let sealed_mode = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() == Some("1");
+            // Default sealed-SSA ON unless explicitly disabled with NYASH_LLVM_PHI_SEALED=0
+            let sealed_mode = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() != Some("0");
             // LoopForm registry (per-function lowering; gated)
             let mut loopform_registry: HashMap = HashMap::new();
             let mut loopform_body_to_header: HashMap = HashMap::new();
+            // Per-function Resolver for dominance-safe value access (i64 minimal)
+            let mut resolver = instructions::Resolver::new();
             for (bi, bid) in block_ids.iter().enumerate() {
                 let bb = *bb_map.get(bid).unwrap();
                 // Use cursor to position at BB start for lowering
@@ -289,6 +292,7 @@ impl LLVMCompiler {
                             instructions::lower_boxcall(
                                 &codegen,
                                 &mut cursor,
+                                &mut resolver,
                                 *bid,
                                 func,
                                 &mut vmap,
@@ -306,7 +310,21 @@ impl LLVMCompiler {
                             if let Some(d) = dst { defined_in_block.insert(*d); }
                         },
                         MirInstruction::ExternCall { dst, iface_name, method_name, args, effects: _ } => {
-                            instructions::lower_externcall(&codegen, &mut cursor, *bid, func, &mut vmap, dst, iface_name, method_name, args)?;
+                            instructions::lower_externcall(
+                                &codegen,
+                                &mut cursor,
+                                &mut resolver,
+                                *bid,
+                                func,
+                                &mut vmap,
+                                dst,
+                                iface_name,
+                                method_name,
+                                args,
+                                &bb_map,
+                                &preds,
+                                &block_end_values,
+                            )?;
                             if let Some(d) = dst { defined_in_block.insert(*d); }
                         },
                         MirInstruction::UnaryOp { dst, op, operand } => {
@@ -314,11 +332,11 @@ impl LLVMCompiler {
                             defined_in_block.insert(*dst);
                         },
                         MirInstruction::BinOp { dst, op, lhs, rhs } => {
-                            instructions::lower_binop(&codegen, &mut cursor, *bid, func, &mut vmap, *dst, op, lhs, rhs)?;
+                            instructions::lower_binop(&codegen, &mut cursor, &mut resolver, *bid, func, &mut vmap, *dst, op, lhs, rhs, &bb_map, &preds, &block_end_values)?;
                             defined_in_block.insert(*dst);
                         },
                         MirInstruction::Compare { dst, op, lhs, rhs } => {
-                            let out = instructions::lower_compare(&codegen, &mut cursor, *bid, func, &vmap, op, lhs, rhs, &bb_map, &preds, &block_end_values)?;
+                            let out = instructions::lower_compare(&codegen, &mut cursor, &mut resolver, *bid, func, &vmap, op, lhs, rhs, &bb_map, &preds, &block_end_values)?;
                             vmap.insert(*dst, out);
                             defined_in_block.insert(*dst);
                         },
@@ -439,7 +457,7 @@ impl LLVMCompiler {
                                 }
                             }
                             if !handled_by_loopform {
-                                instructions::emit_branch(&codegen, &mut cursor, *bid, condition, then_bb, else_bb, &bb_map, &phis_by_block, &vmap, &preds, &block_end_values)?;
+                                instructions::emit_branch(&codegen, &mut cursor, &mut resolver, *bid, condition, then_bb, else_bb, &bb_map, &phis_by_block, &vmap, &preds, &block_end_values)?;
                             }
                         }
                         _ => {
diff --git a/tools/llvmlite_harness.py b/tools/llvmlite_harness.py
new file mode 100644
index 00000000..fce073d0
--- /dev/null
+++ b/tools/llvmlite_harness.py
@@ -0,0 +1,83 @@
+#!/usr/bin/env python3
+"""
+Experimental llvmlite-based LLVM emission harness for Nyash.
+
+Usage:
+  python3 tools/llvmlite_harness.py [--in MIR.json] --out OUTPUT.o
+
+Notes:
+- First cut emits a trivial ny_main that returns 0 to validate toolchain.
+- Extend to lower MIR14 JSON incrementally.
+"""
+from __future__ import annotations
+import argparse
+import json
+import os
+import sys
+
+try:
+    from llvmlite import ir, binding
+except Exception as e:  # noqa: BLE001
+    sys.stderr.write(
+        "llvmlite is required. Install with: python3 -m pip install llvmlite\n"
+    )
+    sys.stderr.write(f"Import error: {e}\n")
+    sys.exit(2)
+
+
+def parse_args() -> argparse.Namespace:
+    ap = argparse.ArgumentParser(description="Nyash llvmlite harness")
+    ap.add_argument("--in", dest="in_path", help="MIR14 JSON input (optional)")
+    ap.add_argument("--out", dest="out_path", required=True, help="Output object file path")
+    return ap.parse_args()
+
+
+def load_mir(path: str | None) -> dict | None:
+    if not path:
+        return None
+    with open(path, "r", encoding="utf-8") as f:
+        return json.load(f)
+
+
+def build_trivial_module() -> ir.Module:
+    mod = ir.Module(name="nyash_harness")
+    mod.triple = binding.get_default_triple()
+    i64 = ir.IntType(64)
+    i8 = ir.IntType(8)
+    i8p = i8.as_pointer()
+    i8pp = i8p.as_pointer()
+    fn_ty = ir.FunctionType(i64, [i64, i8pp])
+    fn = ir.Function(mod, fn_ty, name="ny_main")
+    entry = fn.append_basic_block(name="entry")
+    b = ir.IRBuilder(entry)
+    b.ret(ir.Constant(i64, 0))
+    return mod
+
+
+def emit_object(mod: ir.Module, out_path: str) -> None:
+    binding.initialize()
+    binding.initialize_native_target()
+    binding.initialize_native_asmprinter()
+
+    target = binding.Target.from_default_triple()
+    tm = target.create_target_machine()
+    llvm_mod = binding.parse_assembly(str(mod))
+    llvm_mod.verify()
+    obj = tm.emit_object(llvm_mod)
+    with open(out_path, "wb") as f:
+        f.write(obj)
+
+
+def main() -> int:
+    ns = parse_args()
+    _mir = load_mir(ns.in_path)
+    # For now, ignore MIR content and emit a trivial module.
+    mod = build_trivial_module()
+    os.makedirs(os.path.dirname(ns.out_path) or ".", exist_ok=True)
+    emit_object(mod, ns.out_path)
+    return 0
+
+
+if __name__ == "__main__":  # pragma: no cover
+    raise SystemExit(main())
+
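Reviewer note: the two behavioral changes in this patch are the per-(block, value) cache in `Resolver::resolve_i64` and the default-ON reading of `NYASH_LLVM_PHI_SEALED`. The following is a minimal standalone sketch of both, using only the standard library. `BlockId`, `ValueId`, `LocalCache`, and `sealed_mode_from` are hypothetical stand-ins introduced here for illustration, and the closure passed to `resolve` takes the place of `localize_to_i64`; none of this is the patch's actual code.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the MIR id types (the patch uses crate::mir's
// BasicBlockId and ValueId instead).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct BlockId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ValueId(pub u32);

/// Caches one localized value per (block, value) pair, mirroring the shape of
/// the `i64_locals` map in the patch. `T` stands in for the localized value.
pub struct LocalCache<T: Copy> {
    locals: HashMap<(BlockId, ValueId), T>,
}

impl<T: Copy> LocalCache<T> {
    pub fn new() -> Self {
        Self { locals: HashMap::new() }
    }

    /// Returns the cached value for (bid, vid) if present; otherwise runs
    /// `localize` once and caches its result. Errors are propagated and
    /// nothing is cached for them.
    pub fn resolve<F>(&mut self, bid: BlockId, vid: ValueId, localize: F) -> Result<T, String>
    where
        F: FnOnce() -> Result<T, String>,
    {
        if let Some(v) = self.locals.get(&(bid, vid)).copied() {
            return Ok(v);
        }
        let v = localize()?;
        self.locals.insert((bid, vid), v);
        Ok(v)
    }
}

/// Assumed helper mirroring the patched check: sealed mode is ON unless the
/// variable is set to exactly "0" (unset, "1", or any other value enables it).
pub fn sealed_mode_from(raw: Option<&str>) -> bool {
    raw != Some("0")
}
```

With this shape, a second request for the same value in the same block returns the cached result without running the localizer again, while the same value requested from a different block is localized separately. Keying on the block as well as the value is what keeps a cached definition from being reused in a block it may not dominate.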