feat(llvm): Major refactor - apply BuilderCursor everywhere & introduce Resolver API

Added:
- Resolver API (resolve_i64) for unified value resolution with per-block cache
- llvmlite harness (Python) for rapid PHI/SSA verification
- Comprehensive LLVM documentation suite:
  - LLVM_LAYER_OVERVIEW.md: Overall architecture and invariants
  - RESOLVER_API.md: Value resolution strategy
  - LLVM_HARNESS.md: Python verification harness

Updated:
- BuilderCursor applied to ALL lowering paths (externcall/newbox/arrays/maps/call)
- localize_to_i64 for dominance safety in strings/compare/flow
- NYASH_LLVM_DUMP_ON_FAIL=1 for debug IR output

Key insight: LoopForm didn't cause the problems; it just exposed existing design flaws:
- Scattered value resolution (now unified via Resolver)
- Inconsistent type conversion placement
- Ambiguous PHI wiring responsibilities

Next: Wire Resolver throughout, achieve sealed=ON green for dep_tree_min_string
Selfhosting Dev
2025-09-12 20:06:48 +09:00
parent 45f13cf7a8
commit c04b0c059d
16 changed files with 377 additions and 168 deletions


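@@ -15,7 +15,7 @@ Compact Roadmap (2025-09-12)
- BuilderCursor: panic immediately on post-terminator insertion (applied to strings/arith_ops/mem)
- Next (short):
1) Extend strict BuilderCursor enforcement (externcall→newbox→arrays→maps→call)
2) Make sealed SSA the single default (ON): stop finalize_phis, complete wiring in seal_block
2) Make sealed SSA the single default (ON): stop finalize_phis, complete wiring in seal_block. NYASH_LLVM_PHI_SEALED is ON when unset.
3) Stabilize LoopForm header PHI normalization (verifier green even with latch→header ON)
4) Use body→dispatch routinely for simple bodies (phased gate)
5) Metrics: dispatch-only PHIs and fewer synthesized zeros; post-terminator detections stay at zero.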
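Item 2 makes sealed SSA ON unless `NYASH_LLVM_PHI_SEALED` is set to `0`, which matches the `!= Some("0")` check this commit adds to the compiler. Below is a minimal sketch of that parse. The helper names `sealed_mode_from` and `sealed_mode` are hypothetical:

```rust
/// Hypothetical helper: sealed SSA is on unless the variable is exactly "0"
/// (unset, "1", or any other value all enable it).
fn sealed_mode_from(value: Option<&str>) -> bool {
    value != Some("0")
}

/// Read the flag from the process environment.
fn sealed_mode() -> bool {
    let raw = std::env::var("NYASH_LLVM_PHI_SEALED").ok();
    sealed_mode_from(raw.as_deref())
}
```

The removed `flow.rs` checks tested `== Some("1")` instead. That form treats an unset variable as OFF, which is why the default flipped.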
@@ -54,6 +54,21 @@ Hot Update — 2025-09-12 (Plan: LLVM wrapper via Nyash ABI)
- I/O spec: input = MIR (JSON/in-memory), output = .o (written to `NYASH_AOT_OBJECT_OUT`).
- Acceptance: dep_tree_min_string output matches with the harness ON and OFF (functional equivalence).
Scaffold — 2025-09-12 (llvmlite harness)
- Added tools/llvmlite_harness.py (trivial ny_main returning 0) and docs/LLVM_HARNESS.md.
- Use to validate toolchain wiring; extend to lower MIR14 JSON incrementally.
Scaffold — 2025-09-12 (Resolver i64 minimal)
- Added src/backend/llvm/compiler/codegen/instructions/resolver.rs with `Resolver::resolve_i64(...)` and per-block cache.
- docs/RESOLVER_API.md documents goals/usage; wiring to replace `localize_to_i64` callsites comes next.
Docs — LLVM layer overview (2025-09-12)
- Added docs/LLVM_LAYER_OVERVIEW.md and linked it with existing docs:
- docs/LOWERING_LLVM.md — concrete lowering rules and RT calls
- docs/RESOLVER_API.md — value resolution (sealed/localize) API and cache
- docs/LLVM_HARNESS.md — llvmlite harness scope and interface
Use as the canonical reference for invariants: Resolver-only reads, cast placement, cursor discipline, sealed SSA, and LoopForm shape.
Hot Repro — esc_json/1 PHI wiring (2025-09-12)
- Target: apps/selfhost/tools/dep_tree_min_string.nyash
- Run (LLVM):
@@ -187,6 +202,36 @@ Done (today)
- BuilderCursor rollout, first batch: strings/arith_ops/mem now go through the Cursor. Strengthened post-terminator insertion detection.
- Sealed SSA: stopped `finalize_phis` and unified on `seal_block`. Added header PHI normalization for LoopForm latch→header (gated).
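Sealed SSA, as described here, means a successor's PHIs get their incoming values from each predecessor's end-of-block snapshot (`block_end_values`) when the block is sealed. Branch and jump emission no longer push incoming values directly. The std-only sketch below models that wiring, with integers standing in for LLVM values. `Phi`, this `seal_block` signature, and the error strings are assumptions, not the backend's API:

```rust
use std::collections::HashMap;

type BlockId = u32;
type ValueId = u32;

/// Toy PHI: the MIR inputs per predecessor and the incoming edges wired so far.
#[derive(Debug, Default)]
struct Phi {
    inputs: Vec<(BlockId, ValueId)>, // (predecessor, MIR value) as listed in MIR
    incoming: Vec<(BlockId, i64)>,   // (predecessor, concrete value) after sealing
}

/// Seal a block: wire every PHI from each predecessor's end-of-block snapshot.
/// Fails if a predecessor has no snapshot yet or its snapshot lacks the value.
fn seal_block(
    phis: &mut [Phi],
    preds: &[BlockId],
    block_end_values: &HashMap<BlockId, HashMap<ValueId, i64>>,
) -> Result<(), String> {
    for phi in phis.iter_mut() {
        for &pred in preds {
            let (_, vid) = phi
                .inputs
                .iter()
                .find(|(p, _)| *p == pred)
                .ok_or_else(|| format!("PHI has no input for pred bb{}", pred))?;
            let snapshot = block_end_values
                .get(&pred)
                .ok_or_else(|| format!("pred bb{} has no end-of-block snapshot", pred))?;
            let value = *snapshot
                .get(vid)
                .ok_or_else(|| format!("v{} missing in bb{} snapshot", vid, pred))?;
            phi.incoming.push((pred, value));
        }
    }
    Ok(())
}
```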
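Hot Update — 2025-09-12 (sealed + dominator fixes: interim progress)
- BuilderCursor everywhere: expansion (first wave complete)
- Migrated externcall (console/env), newbox, arrays, maps, call, compare to go through the Cursor
- Together with the existing strings/arith_ops/mem, nearly all lowering is now guarded against post-terminator insertion
- Introduced on-demand PHI localization (flow::localize_to_i64)
- Purpose: for each i64 value used in the current BB, build a PHI from the predecessor snapshots and replace the use with that local definition, eliminating dominance violations
- Placement: at the start of the BB (inserted before existing PHIs); the insertion point is saved and restored
- Applied to: start/end of strings.substring, the si/is cases of strings.concat, integer comparisons in compare, and the condition in flow.emit_branch (int/ptr/float→i1)
- IR dump on failure: `NYASH_LLVM_DUMP_ON_FAIL=1` writes `tmp/llvm_fail_<func>.ll` (when function verification fails)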
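The localization step above can be modeled without inkwell. It builds a PHI from predecessor snapshots at the start of the current BB, places it before any existing PHIs, and keeps the caller's insertion point stable. In this sketch `Inst`, `Block`, and `localize_at_block_start` are hypothetical. The real `flow::localize_to_i64` works on LLVM blocks and also inserts casts to i64, which are omitted here:

```rust
use std::collections::HashMap;

/// Toy instruction stream for one basic block (not inkwell types).
#[derive(Debug, Clone, PartialEq)]
enum Inst {
    Phi { dst: String, incoming: Vec<(u32, i64)> },
    Op(String),
}

struct Block {
    insts: Vec<Inst>,
    /// Index at which the lowering code will insert its next instruction.
    cursor: usize,
}

/// Define a block-local copy of MIR value `vid` as a PHI at the very start of
/// `block`, fed by each predecessor's end-of-block snapshot. The cursor is
/// saved and restored so the caller keeps inserting where it left off.
fn localize_at_block_start(
    block: &mut Block,
    preds: &[u32],
    snapshots: &HashMap<u32, HashMap<u32, i64>>,
    vid: u32,
) -> Result<String, String> {
    let mut incoming = Vec::with_capacity(preds.len());
    for &pred in preds {
        let v = snapshots
            .get(&pred)
            .and_then(|snap| snap.get(&vid))
            .copied()
            .ok_or_else(|| format!("v{} not available at end of bb{}", vid, pred))?;
        incoming.push((pred, v));
    }
    let dst = format!("loc_v{}", vid);
    let saved = block.cursor;
    // Insert before every existing instruction, including existing PHIs.
    block.insts.insert(0, Inst::Phi { dst: dst.clone(), incoming });
    // Restore: the instruction the cursor pointed at moved down by one slot.
    block.cursor = saved + 1;
    Ok(dst)
}
```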
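Smoke (sealed=ON, dep_tree_min_string) findings
- Progress: missing PHIs no longer reproduce; incoming wiring under sealed mode is stable
- Still failing: dominator violation in Main.node_json/3 (Instruction does not dominate all uses!)
- In parts of the iadd→icmp/sub/substring/concat chain, the iadd definition does not dominate its uses
- Done: integer arguments of branch conditions, integer comparisons, substring, and concat are already localized
- Likely still missing: other integer uses read via vmap inside lowering (reuse points in BoxCall/ExternCall/arith_ops, etc.)
Next (handoff actions)
1) Extend localization (priority)
- Apply `localize_to_i64` on every path that reads an integer value from vmap and uses it
- Candidates: operand reuse in arith_ops (BinOp), the remaining integer arguments of BoxCall, integer parameters of other methods
- Gradually replace direct types.to_bool calls with "localize → != 0" on the emit side
2) Generalize the Resolver API
- Introduce a resolver that maps ValueId → value in the current BB (i64 first; extend to ptr/f64 as needed)
- Fetch values through the resolver in all lowering to eliminate dominance breakage at the root
3) Better IR visualization/verification
- Inspect the .ll of failing functions, identify uses that missed localization, and close them one by one
4) In parallel: llvmlite verification harness (`NYASH_LLVM_USE_HARNESS=1`)
- Quickly verify PHI/loop/short-circuit shapes, then carry them over to the Rust implementation (functional parity guaranteed by Acceptance A5)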
Refactor — LLVM codegen instructions modularized (done)
- Goal achieved: split `instructions.rs` in stages and regrouped it by responsibility (0-diff)
- New layout under `src/backend/llvm/compiler/codegen/instructions/`:
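@@ -275,6 +320,7 @@ Next Flow (what comes next: phased rollout)
Acceptance (per phase)
- A1: With LoopForm ON, behavior is equivalent to before (Break aggregation only, non-destructive); smoke green
- A2: With strict BuilderCursor, post-terminator detections stay at zero (the panic never fires).
- A2.5: With sealed=ON, zero dominator violations in dep_tree_min_string (no IR dump needed).
- A3: After header PHI normalization, verifier green even with latch→header enabled (no missing PHIs).
- A4: Use body→dispatch routinely for simple bodies and confirm that no PHIs appear outside dispatch.
- A5: dep_tree_min_string output is functionally identical with `NYASH_LLVM_USE_HARNESS=1` (llvmlite) and OFF (Rust).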

docs/LLVM_HARNESS.md

@@ -0,0 +1,36 @@
# llvmlite Harness (Experimental)
Purpose
- Provide a fast, scriptable LLVM emission path using Python + llvmlite for validation and prototyping.
- Run in parallel with the Rust/inkwell path; keep outputs functionally equivalent for targeted smokes.
Switch
- Set `NYASH_LLVM_USE_HARNESS=1` to prefer the harness (future: wired in LLVM backend entry).
Protocol (tentative)
- Input: MIR14 JSON file path (subset sufficient for dep_tree_min_string initially).
- Output: `.o` object file written to `NYASH_AOT_OBJECT_OUT` or `--out` path.
- Entry function: `ny_main(i64 argc, i8** argv) -> i64` (returns app exit code/box-handle per ABI).
Quick Start
- Install deps: `python3 -m pip install llvmlite`
- Generate a dummy object to validate toolchain:
- `python3 tools/llvmlite_harness.py --out /tmp/dummy.o`
- Link with NyRT as usual to produce an executable.
Intended Wiring (Rust side)
- LLVM backend checks `NYASH_LLVM_USE_HARNESS=1` and, if set, exports MIR14 of the target module to a temp JSON, then invokes:
- `python3 tools/llvmlite_harness.py --in <mir.json> --out <obj.o>`
- On success, the normal link step continues using `<obj.o>`.
Scope (Phase 15)
- Minimal ops: i64 arithmetic, comparisons, branches, PHI(Sealed), basic string ops through NyRT shims.
- Target case: `apps/selfhost/tools/dep_tree_min_string.nyash` builds and runs.
Acceptance
- A5: Harness ON vs OFF produce functionally equivalent output for the target smoke.
Notes
- The first version may ignore MIR details and emit a fixed `ny_main` body for smoke scaffolding; then iterate to lower MIR ops.
- Keep the harness self-contained; no external state besides inputs and env.
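The Rust-side wiring under Intended Wiring is planned; this commit does not implement it. Below is a sketch of the gate check and the command line the backend would build, using only `std::process::Command`. The helper names are hypothetical, and the command is constructed but not executed:

```rust
use std::path::Path;
use std::process::Command;

/// Gate check: take the harness path only when the variable is exactly "1".
fn harness_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Read the gate from the process environment.
fn harness_enabled_from_env() -> bool {
    harness_enabled(std::env::var("NYASH_LLVM_USE_HARNESS").ok().as_deref())
}

/// Build (without running) `python3 tools/llvmlite_harness.py --in <mir.json> --out <obj.o>`.
fn harness_command(mir_json: &Path, obj_out: &Path) -> Command {
    let mut cmd = Command::new("python3");
    cmd.arg("tools/llvmlite_harness.py")
        .arg("--in")
        .arg(mir_json)
        .arg("--out")
        .arg(obj_out);
    cmd
}
```

A caller would run the command with `.status()` and continue to the normal link step with `<obj.o>` only when the status reports success.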


@@ -0,0 +1,37 @@
# LLVM Layer Overview (Phase 15)
Scope
- Practical guide to LLVM lowering architecture and invariants used in Phase 15.
- Complements LOWERING_LLVM.md (rules), RESOLVER_API.md (value resolution), and LLVM_HARNESS.md (harness).
Module Layout
- `src/backend/llvm/compiler/codegen/`
- `instructions/` split by concern: `flow.rs`, `blocks.rs`, `arith.rs`, `arith_ops.rs`, `mem.rs`,
`strings.rs`, `arrays.rs`, `maps.rs`, `boxcall/`, `externcall/`, `call.rs`, `loopform.rs`, `resolver.rs`.
- `builder_cursor.rs`: central insertion/terminator guard.
Core Invariants
- Resolver-only reads: lowerers fetch MIR values through `Resolver` (no direct `vmap` access for cross-BB values).
- Localize at block start: PHIs created at the beginning of the current BB (before non-PHI) to guarantee dominance.
- Cast placement: perform ptr↔int and width casts outside PHIs, at BB start or just-before-pred-terminator via `with_block`.
- Sealed SSA: successor PHIs wired by predecessor snapshots and `seal_block`; branch/jump do not push incoming directly.
- Cursor discipline: only insert via `BuilderCursor`; post-terminator insertions are forbidden.
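The cursor-discipline invariant can be illustrated with a toy model. Every insertion goes through one type, and that type records whether a block already has a terminator. `Cursor`, `Block`, and `emit_term` are hypothetical; only the `emit_instr` name echoes the real `BuilderCursor`. This sketch returns an error where the roadmap says the real cursor panics:

```rust
/// One block in the toy model: its instructions and whether it is terminated.
#[derive(Debug, Default)]
struct Block {
    insts: Vec<String>,
    terminated: bool,
}

/// All insertions go through this type, so the post-terminator check lives in one place.
struct Cursor {
    blocks: Vec<Block>,
}

impl Cursor {
    /// Append an ordinary instruction; refuse if the block already ended.
    fn emit_instr(&mut self, bid: usize, inst: &str) -> Result<(), String> {
        let block = self
            .blocks
            .get_mut(bid)
            .ok_or_else(|| format!("unknown bb{}", bid))?;
        if block.terminated {
            return Err(format!("post-terminator insertion in bb{}: {}", bid, inst));
        }
        block.insts.push(inst.to_string());
        Ok(())
    }

    /// Append a terminator (br/ret) and mark the block as closed.
    fn emit_term(&mut self, bid: usize, term: &str) -> Result<(), String> {
        self.emit_instr(bid, term)?;
        self.blocks[bid].terminated = true;
        Ok(())
    }
}
```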
LoopForm (gated)
- Shape: `preheader → header → body → dispatch(phi) → {latch|exit} → header` with PHIs centralized in `dispatch`.
- State: model loop-carried values via a `LoopState` aggregate (tag + payloads).
- Goals: move all PHIs to dispatch, ensure header uses are dominated by preheader/dispatch values.
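As a source-level analogy for the LoopForm shape (not generated IR), the sketch below has the body return a `LoopState` (tag + payloads) instead of branching itself. A single `dispatch` match then decides between latch and exit, so the loop-carried values are selected in one place, much as the PHIs are centralized in `dispatch`. `Tag`, `LoopState`'s fields, and the summing loop are illustrative assumptions:

```rust
/// Tag produced by the loop body: continue via the latch or leave via exit.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Next,
    Break,
}

/// Aggregate of loop-carried values (tag + payloads) handed to dispatch.
#[derive(Debug, Clone, Copy)]
struct LoopState {
    tag: Tag,
    i: i64,
    acc: i64,
}

/// Body of `acc = sum of 0..n`: yields the next state instead of branching.
fn body(i: i64, acc: i64, n: i64) -> LoopState {
    if i < n {
        LoopState { tag: Tag::Next, i: i + 1, acc: acc + i }
    } else {
        LoopState { tag: Tag::Break, i, acc }
    }
}

/// preheader -> header -> body -> dispatch -> {latch | exit} -> header
fn run_loop(n: i64) -> i64 {
    // preheader: initial loop-carried values
    let (mut i, mut acc) = (0, 0);
    loop {
        // header -> body
        let st = body(i, acc, n);
        // dispatch: the single merge point where loop-carried values are chosen
        match st.tag {
            Tag::Next => {
                // latch -> header
                i = st.i;
                acc = st.acc;
            }
            Tag::Break => return st.acc, // exit
        }
    }
}
```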
Types and Bridges
- Box handle is `i64` across the NyRT boundary; strings prefer `i8*` fast paths.
- Conversion rules: `ensure_i64/ensure_i1/ensure_ptr` style helpers (planned extraction) to centralize casting.
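The `ensure_i64`-style helper has not been extracted yet. The sketch below models its likely rules over a hypothetical `Val` type that records bit widths. It uses the rules the removed `console.log` lowering applied inline (i1 zero-extended, i64 unchanged, other integer widths sign-extended), plus `ptrtoint` for pointers as in the old `spawn_instance` path:

```rust
/// Hypothetical stand-in for an LLVM value: integers carry their bit width and
/// keep their payload in the low `bits` bits of `raw`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Val {
    Int { bits: u32, raw: u64 },
    Ptr(u64),
}

/// Normalize a value to i64 following the rules described above.
fn ensure_i64(v: Val) -> i64 {
    match v {
        // i1: zero-extend (true -> 1, never -1)
        Val::Int { bits: 1, raw } => (raw & 1) as i64,
        // already i64: reinterpret the bits unchanged
        Val::Int { bits: 64, raw } => raw as i64,
        // other widths: sign-extend by moving the sign bit to bit 63 and shifting back
        Val::Int { bits, raw } => {
            assert!(bits > 1 && bits < 64, "unsupported width i{}", bits);
            let shift = 64 - bits;
            ((raw << shift) as i64) >> shift
        }
        // pointer: ptrtoint (the address as an integer)
        Val::Ptr(addr) => addr as i64,
    }
}
```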
Harness (optional)
- llvmlite harness exists for fast prototyping and structural checks.
- Gate: `NYASH_LLVM_USE_HARNESS=1` (planned wiring); target parity tested by Acceptance A5.
References
- LOWERING_LLVM.md — lowering rules and runtime calls
- RESOLVER_API.md — Resolver design and usage
- LLVM_HARNESS.md — llvmlite harness interface and usage

docs/RESOLVER_API.md

@@ -0,0 +1,25 @@
# Resolver API (Minimal i64 Prototype)
Goals
- Centralize "ValueId → current-block value" resolution.
- Guarantee dominance by localizing values at the start of the block (before non-PHI).
- De-duplicate per (block, value) to avoid redundant PHIs/casts.
Design
- `Resolver` keeps small per-function caches keyed by `(BasicBlockId, ValueId)`.
- `resolve_i64(...)` returns an `i64`-typed `IntValue`, inserting PHI and casts as needed using sealed snapshots.
- Internally uses `flow::localize_to_i64(...)` for now; later, fold logic directly and add `resolve_ptr/resolve_f64`.
Usage (planned wiring)
- Create `let mut resolver = instructions::Resolver::new();` at function lowering start.
- Replace all integer value fetches in lowerers with `resolver.resolve_i64(...)`.
- Keep builder insertion discipline via `BuilderCursor`.
Next
- Add `resolve_ptr(...)` and `resolve_f64(...)` with same caching discipline.
- Migrate existing `localize_to_i64` call sites to the resolver.
- Enforce vmap direct access ban in lowerers (Resolver-only for reads).
Acceptance tie-in
- Combined with LoopForm: dispatch-only PHI + resolver-based value access → dominance violations drop to zero (A2.5).
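The caching rule allows one localization per (block, value) pair within a function. In the std-only sketch below, a `String` name stands in for the `IntValue` and a counter stands in for emitted PHIs. The field `i64_locals` mirrors `resolver.rs`; `phis_created` and the naming scheme are illustrative assumptions:

```rust
use std::collections::HashMap;

type BlockId = u32;
type ValueId = u32;

/// Toy resolver: the first request for a value in a block "localizes" it
/// (allocates a name and counts one PHI); later requests in that block reuse it.
#[derive(Default)]
struct Resolver {
    i64_locals: HashMap<(BlockId, ValueId), String>,
    phis_created: usize,
}

impl Resolver {
    fn resolve_i64(&mut self, cur_bid: BlockId, vid: ValueId) -> String {
        if let Some(name) = self.i64_locals.get(&(cur_bid, vid)) {
            return name.clone(); // cache hit: no new PHI or cast
        }
        self.phis_created += 1;
        let name = format!("bb{}_v{}_i64", cur_bid, vid);
        self.i64_locals.insert((cur_bid, vid), name.clone());
        name
    }
}
```

The compiler change in this commit creates the resolver once per function, so keying by `(BasicBlockId, ValueId)` is enough. A module-wide cache would likely also need the function in its key.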


@@ -10,6 +10,7 @@ use super::builder_cursor::BuilderCursor;
pub(in super::super) fn lower_compare<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::Resolver<'ctx>,
cur_bid: BasicBlockId,
func: &MirFunction,
vmap: &HashMap<ValueId, BasicValueEnum<'ctx>>,
@@ -104,9 +105,11 @@ pub(in super::super) fn lower_compare<'ctx, 'b>(
}
let out = if let (Some(_li0), Some(_ri0)) = (as_int(lv), as_int(rv)) {
// Localize integer operands into current block to satisfy dominance
let mut li = super::flow::localize_to_i64(codegen, cursor, cur_bid, *lhs, bb_map, preds, block_end_values, vmap)
let mut li = resolver
.resolve_i64(codegen, cursor, cur_bid, *lhs, bb_map, preds, block_end_values, vmap)
.unwrap_or_else(|_| as_int(lv).unwrap());
let mut ri = super::flow::localize_to_i64(codegen, cursor, cur_bid, *rhs, bb_map, preds, block_end_values, vmap)
let mut ri = resolver
.resolve_i64(codegen, cursor, cur_bid, *rhs, bb_map, preds, block_end_values, vmap)
.unwrap_or_else(|_| as_int(rv).unwrap());
// Normalize integer widths: extend the narrower to match the wider to satisfy LLVM
let lw = li.get_type().get_bit_width();


@@ -57,6 +57,7 @@ pub(in super::super) fn lower_unary<'ctx, 'b>(
pub(in super::super) fn lower_binop<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::Resolver<'ctx>,
cur_bid: BasicBlockId,
func: &MirFunction,
vmap: &mut HashMap<ValueId, BasicValueEnum<'ctx>>,
@@ -64,6 +65,9 @@ pub(in super::super) fn lower_binop<'ctx, 'b>(
op: &BinaryOp,
lhs: &ValueId,
rhs: &ValueId,
bb_map: &std::collections::HashMap<crate::mir::BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
preds: &std::collections::HashMap<crate::mir::BasicBlockId, Vec<crate::mir::BasicBlockId>>,
block_end_values: &std::collections::HashMap<crate::mir::BasicBlockId, std::collections::HashMap<ValueId, BasicValueEnum<'ctx>>>,
) -> Result<(), String> {
use crate::backend::llvm::compiler::helpers::{as_float, as_int};
use inkwell::values::BasicValueEnum as BVE;
@@ -221,7 +225,12 @@ pub(in super::super) fn lower_binop<'ctx, 'b>(
return Ok(());
}
let out = if let (Some(li), Some(ri)) = (as_int(lv), as_int(rv)) {
let out = if let (Some(_li0), Some(_ri0)) = (as_int(lv), as_int(rv)) {
// Localize integer operands into current block to satisfy dominance (normalize to i64)
let li = resolver.resolve_i64(codegen, cursor, cur_bid, *lhs, bb_map, preds, block_end_values, vmap)
.unwrap_or_else(|_| codegen.context.i64_type().const_zero());
let ri = resolver.resolve_i64(codegen, cursor, cur_bid, *rhs, bb_map, preds, block_end_values, vmap)
.unwrap_or_else(|_| codegen.context.i64_type().const_zero());
use BinaryOp as B;
match op {
B::Add => cursor.emit_instr(cur_bid, |b| b.build_int_add(li, ri, "iadd")).map_err(|e| e.to_string())?.into(),


@@ -16,6 +16,7 @@ use super::builder_cursor::BuilderCursor;
pub(in super::super) fn lower_boxcall<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::Resolver<'ctx>,
cur_bid: BasicBlockId,
func: &MirFunction,
vmap: &mut HashMap<ValueId, inkwell::values::BasicValueEnum<'ctx>>,
@@ -59,7 +60,7 @@ pub(in super::super) fn lower_boxcall<'ctx, 'b>(
// Delegate String methods
if super::strings::try_handle_string_method(
codegen, cursor, cur_bid, func, vmap, dst, box_val, method, args, recv_v,
codegen, cursor, resolver, cur_bid, func, vmap, dst, box_val, method, args, recv_v,
bb_map, preds, block_end_values,
)? {
return Ok(());
@@ -77,7 +78,21 @@ pub(in super::super) fn lower_boxcall<'ctx, 'b>(
// Console convenience: treat println as env.console.log
if method == "println" {
return super::externcall::lower_externcall(codegen, cursor, cur_bid, func, vmap, dst, &"env.console".to_string(), &"log".to_string(), args);
return super::externcall::lower_externcall(
codegen,
cursor,
resolver,
cur_bid,
func,
vmap,
dst,
&"env.console".to_string(),
&"log".to_string(),
args,
bb_map,
preds,
block_end_values,
);
}
// getField/setField


@@ -10,12 +10,16 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build
pub(super) fn lower_log_or_trace<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::super::Resolver<'ctx>,
cur_bid: BasicBlockId,
vmap: &mut HashMap<ValueId, BVE<'ctx>>,
dst: &Option<ValueId>,
iface_name: &str,
method_name: &str,
args: &[ValueId],
bb_map: &std::collections::HashMap<crate::mir::BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
preds: &std::collections::HashMap<crate::mir::BasicBlockId, Vec<crate::mir::BasicBlockId>>,
block_end_values: &std::collections::HashMap<crate::mir::BasicBlockId, std::collections::HashMap<ValueId, BVE<'ctx>>>,
) -> Result<(), String> {
if args.len() != 1 {
return Err(format!("{}.{} expects 1 arg", iface_name, method_name));
@@ -49,23 +53,8 @@ pub(super) fn lower_log_or_trace<'ctx, 'b>(
}
// Otherwise, convert to i64 and call handle variant
_ => {
let arg_val = match av {
BVE::IntValue(iv) => {
if iv.get_type() == codegen.context.bool_type() {
cursor
.emit_instr(cur_bid, |b| b.build_int_z_extend(iv, codegen.context.i64_type(), "bool2i64"))
.map_err(|e| e.to_string())?
} else if iv.get_type() == codegen.context.i64_type() {
iv
} else {
cursor
.emit_instr(cur_bid, |b| b.build_int_s_extend(iv, codegen.context.i64_type(), "int2i64"))
.map_err(|e| e.to_string())?
}
}
BVE::PointerValue(_) => unreachable!(),
_ => return Err("console.log arg conversion failed".to_string()),
};
// Localize to i64 to satisfy dominance
let arg_val = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
let fnty = codegen
.context
.i64_type()


@@ -10,10 +10,14 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build
pub(super) fn lower_future_spawn_instance<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::super::Resolver<'ctx>,
cur_bid: BasicBlockId,
vmap: &mut HashMap<ValueId, BVE<'ctx>>,
dst: &Option<ValueId>,
args: &[ValueId],
bb_map: &std::collections::HashMap<crate::mir::BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
preds: &std::collections::HashMap<crate::mir::BasicBlockId, Vec<crate::mir::BasicBlockId>>,
block_end_values: &std::collections::HashMap<crate::mir::BasicBlockId, std::collections::HashMap<ValueId, BVE<'ctx>>>,
) -> Result<(), String> {
if args.len() < 2 {
return Err("env.future.spawn_instance expects at least (recv, method_name)".to_string());
@@ -22,10 +26,10 @@ pub(super) fn lower_future_spawn_instance<'ctx, 'b>(
let i8p = codegen.context.ptr_type(AddressSpace::from(0));
let recv_v = *vmap.get(&args[0]).ok_or("recv missing")?;
let recv_h = match recv_v {
BVE::IntValue(iv) => iv,
BVE::PointerValue(pv) => cursor
.emit_instr(cur_bid, |b| b.build_ptr_to_int(pv, i64t, "recv_p2i"))
.map_err(|e| e.to_string())?,
BVE::IntValue(_) | BVE::PointerValue(_) => {
// Localize to i64 to satisfy dominance; converts ptr→i64 if needed
resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?
}
_ => return Err("spawn_instance recv must be int or ptr".to_string()),
};
let name_v = *vmap.get(&args[1]).ok_or("method name missing")?;
@@ -54,11 +58,15 @@ pub(super) fn lower_future_spawn_instance<'ctx, 'b>(
pub(super) fn lower_local_get<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
_resolver: &mut super::super::Resolver<'ctx>,
cur_bid: BasicBlockId,
func: &MirFunction,
vmap: &mut HashMap<ValueId, BVE<'ctx>>,
dst: &Option<ValueId>,
args: &[ValueId],
_bb_map: &std::collections::HashMap<crate::mir::BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
_preds: &std::collections::HashMap<crate::mir::BasicBlockId, Vec<crate::mir::BasicBlockId>>,
_block_end_values: &std::collections::HashMap<crate::mir::BasicBlockId, std::collections::HashMap<ValueId, BVE<'ctx>>>,
) -> Result<(), String> {
if args.len() != 1 {
return Err("env.local.get expects 1 arg".to_string());
@@ -119,10 +127,14 @@ pub(super) fn lower_local_get<'ctx, 'b>(
pub(super) fn lower_box_new<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::super::Resolver<'ctx>,
cur_bid: BasicBlockId,
vmap: &mut HashMap<ValueId, BVE<'ctx>>,
dst: &Option<ValueId>,
args: &[ValueId],
bb_map: &std::collections::HashMap<crate::mir::BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
preds: &std::collections::HashMap<crate::mir::BasicBlockId, Vec<crate::mir::BasicBlockId>>,
block_end_values: &std::collections::HashMap<crate::mir::BasicBlockId, std::collections::HashMap<ValueId, BVE<'ctx>>>,
) -> Result<(), String> {
// Two variants: (name) and (argc, arg1, arg2, arg3, arg4) with optional ptr conversion
// Prefer the i64 birth when possible; else call env.box.new(name)
@@ -186,7 +198,9 @@ pub(super) fn lower_box_new<'ctx, 'b>(
if args.len() >= 2 {
let bv = *vmap.get(&args[1]).ok_or("arg missing")?;
a1 = match bv {
BVE::IntValue(iv) => iv,
BVE::IntValue(_) | BVE::PointerValue(_) => {
resolver.resolve_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)?
}
BVE::FloatValue(fv) => {
let fnty = i64t.fn_type(&[codegen.context.f64_type().into()], false);
let callee = codegen
@@ -202,18 +216,7 @@ pub(super) fn lower_box_new<'ctx, 'b>(
.ok_or("from_f64 returned void".to_string())?;
if let BVE::IntValue(h) = rv { h } else { return Err("from_f64 ret expected i64".to_string()); }
}
BVE::PointerValue(pv) => {
let fnty = i64t.fn_type(&[i8p.into()], false);
let callee = codegen
.module
.get_function("nyash.box.from_i8_string")
.unwrap_or_else(|| codegen.module.add_function("nyash.box.from_i8_string", fnty, None));
let call = cursor
.emit_instr(cur_bid, |b| b.build_call(callee, &[pv.into()], "arg1_i8_to_box"))
.map_err(|e| e.to_string())?;
let rv = call.try_as_basic_value().left().ok_or("from_i8_string returned void".to_string())?;
if let BVE::IntValue(h) = rv { h } else { return Err("from_i8_string ret expected i64".to_string()); }
}
// Pointer handled above by resolve_i64
_ => return Err("unsupported arg value for env.box.new".to_string()),
};
}
@@ -221,7 +224,9 @@ pub(super) fn lower_box_new<'ctx, 'b>(
if args.len() >= 3 {
let bv = *vmap.get(&args[2]).ok_or("arg missing")?;
a2 = match bv {
BVE::IntValue(iv) => iv,
BVE::IntValue(_) | BVE::PointerValue(_) => {
resolver.resolve_i64(codegen, cursor, cur_bid, args[2], bb_map, preds, block_end_values, vmap)?
}
BVE::FloatValue(fv) => {
let fnty = i64t.fn_type(&[codegen.context.f64_type().into()], false);
let callee = codegen
@@ -237,18 +242,7 @@ pub(super) fn lower_box_new<'ctx, 'b>(
.ok_or("from_f64 returned void".to_string())?;
if let BVE::IntValue(h) = rv { h } else { return Err("from_f64 ret expected i64".to_string()); }
}
BVE::PointerValue(pv) => {
let fnty = i64t.fn_type(&[i8p.into()], false);
let callee = codegen
.module
.get_function("nyash.box.from_i8_string")
.unwrap_or_else(|| codegen.module.add_function("nyash.box.from_i8_string", fnty, None));
let call = cursor
.emit_instr(cur_bid, |b| b.build_call(callee, &[pv.into()], "arg2_i8_to_box"))
.map_err(|e| e.to_string())?;
let rv = call.try_as_basic_value().left().ok_or("from_i8_string returned void".to_string())?;
if let BVE::IntValue(h) = rv { h } else { return Err("from_i8_string ret expected i64".to_string()); }
}
// Pointer handled above by resolve_i64
_ => return Err("unsupported arg value for env.box.new".to_string()),
};
}


@@ -12,6 +12,7 @@ use crate::backend::llvm::compiler::codegen::instructions::builder_cursor::Build
pub(in super::super) fn lower_externcall<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::Resolver<'ctx>,
cur_bid: BasicBlockId,
func: &MirFunction,
vmap: &mut HashMap<ValueId, BVE<'ctx>>,
@@ -19,13 +20,16 @@ pub(in super::super) fn lower_externcall<'ctx, 'b>(
iface_name: &str,
method_name: &str,
args: &[ValueId],
bb_map: &std::collections::HashMap<crate::mir::BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
preds: &std::collections::HashMap<crate::mir::BasicBlockId, Vec<crate::mir::BasicBlockId>>,
block_end_values: &std::collections::HashMap<crate::mir::BasicBlockId, std::collections::HashMap<ValueId, BVE<'ctx>>>,
) -> Result<(), String> {
// console/debug
if (iface_name == "env.console"
&& matches!(method_name, "log" | "warn" | "error"))
|| (iface_name == "env.debug" && method_name == "trace")
{
return console::lower_log_or_trace(codegen, cursor, cur_bid, vmap, dst, iface_name, method_name, args);
return console::lower_log_or_trace(codegen, cursor, resolver, cur_bid, vmap, dst, iface_name, method_name, args, bb_map, preds, block_end_values);
}
if iface_name == "env.console" && method_name == "readLine" {
return console::lower_readline(codegen, cursor, cur_bid, vmap, dst, args);
@@ -33,13 +37,13 @@ pub(in super::super) fn lower_externcall<'ctx, 'b>(
// env.*
if iface_name == "env.future" && method_name == "spawn_instance" {
return env::lower_future_spawn_instance(codegen, cursor, cur_bid, vmap, dst, args);
return env::lower_future_spawn_instance(codegen, cursor, resolver, cur_bid, vmap, dst, args, bb_map, preds, block_end_values);
}
if iface_name == "env.local" && method_name == "get" {
return env::lower_local_get(codegen, cursor, cur_bid, func, vmap, dst, args);
return env::lower_local_get(codegen, cursor, resolver, cur_bid, func, vmap, dst, args, bb_map, preds, block_end_values);
}
if iface_name == "env.box" && method_name == "new" {
return env::lower_box_new(codegen, cursor, cur_bid, vmap, dst, args);
return env::lower_box_new(codegen, cursor, resolver, cur_bid, vmap, dst, args, bb_map, preds, block_end_values);
}
Err(format!(


@@ -56,38 +56,7 @@ pub(in super::super) fn emit_jump<'ctx, 'b>(
>,
vmap: &HashMap<ValueId, BasicValueEnum<'ctx>>,
) -> Result<(), String> {
let sealed = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() == Some("1");
if !sealed {
if let Some(list) = phis_by_block.get(target) {
for (_dst, phi, inputs) in list {
if let Some((_, in_vid)) = inputs.iter().find(|(pred, _)| pred == &bid) {
let mut val = *vmap.get(in_vid).ok_or("phi incoming value missing")?;
let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?;
// Coerce incoming to PHI type when needed
val = coerce_to_type(codegen, phi, val)?;
if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") {
let tys = phi
.as_basic_value()
.get_type()
.print_to_string()
.to_string();
eprintln!(
"[PHI] incoming add pred_bb={} val={} ty={}",
bid.as_u32(),
in_vid.as_u32(),
tys
);
}
match val {
BasicValueEnum::IntValue(iv) => phi.add_incoming(&[(&iv, pred_bb)]),
BasicValueEnum::FloatValue(fv) => phi.add_incoming(&[(&fv, pred_bb)]),
BasicValueEnum::PointerValue(pv) => phi.add_incoming(&[(&pv, pred_bb)]),
_ => return Err("unsupported phi incoming value".to_string()),
}
}
}
}
}
// Non-sealed incoming wiring removed: rely on sealed snapshots and resolver-driven PHIs.
let tbb = *bb_map.get(target).ok_or("target bb missing")?;
if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") {
eprintln!("[LLVM] emit_jump: {} -> {}", bid.as_u32(), target.as_u32());
@@ -101,6 +70,7 @@ pub(in super::super) fn emit_jump<'ctx, 'b>(
pub(in super::super) fn emit_branch<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut super::Resolver<'ctx>,
bid: BasicBlockId,
condition: &ValueId,
then_bb: &BasicBlockId,
@@ -118,79 +88,13 @@ pub(in super::super) fn emit_branch<'ctx, 'b>(
let cond_v = *vmap.get(condition).ok_or("cond missing")?;
let b = match cond_v {
BasicValueEnum::IntValue(_) | BasicValueEnum::PointerValue(_) | BasicValueEnum::FloatValue(_) => {
let ci = localize_to_i64(codegen, cursor, bid, *condition, bb_map, preds, block_end_values, vmap)?;
let ci = resolver.resolve_i64(codegen, cursor, bid, *condition, bb_map, preds, block_end_values, vmap)?;
let zero = codegen.context.i64_type().const_zero();
codegen.builder.build_int_compare(inkwell::IntPredicate::NE, ci, zero, "cond_nez").map_err(|e| e.to_string())?
}
_ => to_bool(codegen.context, cond_v, &codegen.builder)?,
};
let sealed = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() == Some("1");
// then
if !sealed {
if let Some(list) = phis_by_block.get(then_bb) {
for (_dst, phi, inputs) in list {
if let Some((_, in_vid)) = inputs.iter().find(|(pred, _)| pred == &bid) {
let mut val = *vmap
.get(in_vid)
.ok_or("phi incoming (then) value missing")?;
let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?;
val = coerce_to_type(codegen, phi, val)?;
if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") {
let tys = phi
.as_basic_value()
.get_type()
.print_to_string()
.to_string();
eprintln!(
"[PHI] incoming add (then) pred_bb={} val={} ty={}",
bid.as_u32(),
in_vid.as_u32(),
tys
);
}
match val {
BasicValueEnum::IntValue(iv) => phi.add_incoming(&[(&iv, pred_bb)]),
BasicValueEnum::FloatValue(fv) => phi.add_incoming(&[(&fv, pred_bb)]),
BasicValueEnum::PointerValue(pv) => phi.add_incoming(&[(&pv, pred_bb)]),
_ => return Err("unsupported phi incoming value (then)".to_string()),
}
}
}
}
}
// else
if !sealed {
if let Some(list) = phis_by_block.get(else_bb) {
for (_dst, phi, inputs) in list {
if let Some((_, in_vid)) = inputs.iter().find(|(pred, _)| pred == &bid) {
let mut val = *vmap
.get(in_vid)
.ok_or("phi incoming (else) value missing")?;
let pred_bb = *bb_map.get(&bid).ok_or("pred bb missing")?;
val = coerce_to_type(codegen, phi, val)?;
if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") {
let tys = phi
.as_basic_value()
.get_type()
.print_to_string()
.to_string();
eprintln!(
"[PHI] incoming add (else) pred_bb={} val={} ty={}",
bid.as_u32(),
in_vid.as_u32(),
tys
);
}
match val {
BasicValueEnum::IntValue(iv) => phi.add_incoming(&[(&iv, pred_bb)]),
BasicValueEnum::FloatValue(fv) => phi.add_incoming(&[(&fv, pred_bb)]),
BasicValueEnum::PointerValue(pv) => phi.add_incoming(&[(&pv, pred_bb)]),
_ => return Err("unsupported phi incoming value (else)".to_string()),
}
}
}
}
}
// Non-sealed incoming wiring removed: rely on sealed snapshots and resolver-driven PHIs.
let tbb = *bb_map.get(then_bb).ok_or("then bb missing")?;
let ebb = *bb_map.get(else_bb).ok_or("else bb missing")?;
if std::env::var("NYASH_CLI_VERBOSE").ok().as_deref() == Some("1") {


@@ -13,6 +13,7 @@ mod maps;
mod arith_ops;
mod call;
mod loopform;
mod resolver;
pub(super) use blocks::{create_basic_blocks, precreate_phis};
pub(super) use flow::{emit_branch, emit_jump, emit_return};
@@ -26,3 +27,4 @@ pub(super) use arith_ops::{lower_binop, lower_unary};
pub(super) use call::lower_call;
pub(super) use loopform::{LoopFormContext, lower_while_loopform};
pub(super) use loopform::normalize_header_phis_for_latch;
pub(super) use resolver::Resolver;


@@ -0,0 +1,43 @@
use std::collections::HashMap;
use inkwell::values::{BasicValueEnum as BVE, IntValue};
use crate::backend::llvm::context::CodegenContext;
use crate::mir::{BasicBlockId, ValueId};
use super::builder_cursor::BuilderCursor;
use super::flow::localize_to_i64;
/// Minimal per-function resolver caches. Caches localized i64 values per (block,value) to avoid
/// redundant PHIs and casts when multiple users in the same block request the same MIR value.
pub struct Resolver<'ctx> {
i64_locals: HashMap<(BasicBlockId, ValueId), IntValue<'ctx>>,
}
impl<'ctx> Resolver<'ctx> {
pub fn new() -> Self {
Self { i64_locals: HashMap::new() }
}
/// Resolve a MIR value as an i64 dominating the current block.
/// Strategy: if present in cache, return it; otherwise localize via sealed snapshots and cache.
pub fn resolve_i64<'b>(
&mut self,
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
cur_bid: BasicBlockId,
vid: ValueId,
bb_map: &std::collections::HashMap<BasicBlockId, inkwell::basic_block::BasicBlock<'ctx>>,
preds: &std::collections::HashMap<BasicBlockId, Vec<BasicBlockId>>,
block_end_values: &std::collections::HashMap<BasicBlockId, std::collections::HashMap<ValueId, BVE<'ctx>>>,
vmap: &std::collections::HashMap<ValueId, BVE<'ctx>>,
) -> Result<IntValue<'ctx>, String> {
if let Some(iv) = self.i64_locals.get(&(cur_bid, vid)).copied() {
return Ok(iv);
}
let iv = localize_to_i64(codegen, cursor, cur_bid, vid, bb_map, preds, block_end_values, vmap)?;
self.i64_locals.insert((cur_bid, vid), iv);
Ok(iv)
}
}


@@ -5,12 +5,13 @@ use inkwell::{values::BasicValueEnum as BVE, AddressSpace};
use crate::backend::llvm::context::CodegenContext;
use crate::mir::{function::MirFunction, BasicBlockId, ValueId};
use super::builder_cursor::BuilderCursor;
use super::flow::localize_to_i64;
use super::Resolver;
/// Handle String-specific methods. Returns true if handled, false to let caller continue.
pub(super) fn try_handle_string_method<'ctx, 'b>(
codegen: &CodegenContext<'ctx>,
cursor: &mut BuilderCursor<'ctx, 'b>,
resolver: &mut Resolver<'ctx>,
cur_bid: BasicBlockId,
func: &MirFunction,
vmap: &mut HashMap<ValueId, inkwell::values::BasicValueEnum<'ctx>>,
@@ -60,8 +61,8 @@ pub(super) fn try_handle_string_method<'ctx, 'b>(
}
(BVE::PointerValue(lp), BVE::IntValue(_ri)) => {
let i64t = codegen.context.i64_type();
// Localize rhs integer in current block
let ri = localize_to_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
// Localize rhs integer in current block via Resolver
let ri = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
let fnty = i8p.fn_type(&[i8p.into(), i64t.into()], false);
let callee = codegen
.module
@@ -83,7 +84,7 @@ pub(super) fn try_handle_string_method<'ctx, 'b>(
(BVE::IntValue(_li), BVE::PointerValue(rp)) => {
let i64t = codegen.context.i64_type();
// Localize receiver integer in current block (box_val)
let li = localize_to_i64(codegen, cursor, cur_bid, *box_val, bb_map, preds, block_end_values, vmap)?;
let li = resolver.resolve_i64(codegen, cursor, cur_bid, *box_val, bb_map, preds, block_end_values, vmap)?;
let fnty = i8p.fn_type(&[i64t.into(), i8p.into()], false);
let callee = codegen
.module
@@ -172,8 +173,8 @@ pub(super) fn try_handle_string_method<'ctx, 'b>(
_ => return Ok(false),
};
// Localize start/end indices to current block via sealed snapshots (i64)
-let s = localize_to_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
-let e = localize_to_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)?;
+let s = resolver.resolve_i64(codegen, cursor, cur_bid, args[0], bb_map, preds, block_end_values, vmap)?;
+let e = resolver.resolve_i64(codegen, cursor, cur_bid, args[1], bb_map, preds, block_end_values, vmap)?;
let fnty = i8p.fn_type(&[i8p.into(), i64t.into(), i64t.into()], false);
let callee = codegen
.module
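Every call site in this file now goes through `resolver.resolve_i64` instead of calling `localize_to_i64` directly. The comment "via sealed snapshots" points at the general idea: a value that is used in a block but defined elsewhere is rebuilt from each predecessor's end-of-block value, so the use is dominated. The following is a simplified Python model of that idea only. The data layout (`defined_here` and `block_end_values` as nested dicts of strings) and the `Phi` type are assumptions for illustration, not the backend's real structures, and this is not how `localize_to_i64` is actually implemented:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BlockId = int
ValueId = int


@dataclass
class Phi:
    """A PHI placed at the top of `block`, merging one snapshot per predecessor."""
    block: BlockId
    incoming: List[Tuple[str, BlockId]]


def localize(
    cur_bid: BlockId,
    vid: ValueId,
    defined_here: Dict[BlockId, Dict[ValueId, str]],
    preds: Dict[BlockId, List[BlockId]],
    block_end_values: Dict[BlockId, Dict[ValueId, str]],
):
    """Hypothetical model of dominance-safe localization (not the real localize_to_i64)."""
    local = defined_here.get(cur_bid, {})
    if vid in local:
        return local[vid]  # defined in this block: it already dominates the use
    incoming = []
    for p in preds.get(cur_bid, []):
        snap = block_end_values.get(p, {})
        if vid not in snap:
            raise KeyError(f"value {vid} has no end-of-block snapshot in bb{p}")
        incoming.append((snap[vid], p))
    if not incoming:
        raise KeyError(f"value {vid} is not available in bb{cur_bid}")
    return Phi(cur_bid, incoming)


preds = {3: [1, 2]}
end = {1: {5: "%a"}, 2: {5: "%b"}}
print(localize(3, 5, {}, preds, end))
# Phi(block=3, incoming=[('%a', 1), ('%b', 2)])
```

For simplicity this sketch always builds a PHI for a non-local value; a real lowering might instead reuse the snapshot directly when there is a single predecessor.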


@@ -203,10 +203,13 @@ impl LLVMCompiler {
// Lower body
let mut loopform_loop_id: u32 = 0;
-let sealed_mode = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() == Some("1");
+// Default sealed-SSA ON unless explicitly disabled with NYASH_LLVM_PHI_SEALED=0
+let sealed_mode = std::env::var("NYASH_LLVM_PHI_SEALED").ok().as_deref() != Some("0");
// LoopForm registry (per-function lowering; gated)
let mut loopform_registry: HashMap<crate::mir::BasicBlockId, (inkwell::basic_block::BasicBlock, PhiValue, PhiValue, inkwell::basic_block::BasicBlock)> = HashMap::new();
let mut loopform_body_to_header: HashMap<crate::mir::BasicBlockId, crate::mir::BasicBlockId> = HashMap::new();
+// Per-function Resolver for dominance-safe value access (i64 minimal)
+let mut resolver = instructions::Resolver::new();
for (bi, bid) in block_ids.iter().enumerate() {
let bb = *bb_map.get(bid).unwrap();
// Use cursor to position at BB start for lowering
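The `sealed_mode` change flips the default. Previously, only the exact value `1` enabled sealed SSA. Now only the exact value `0` disables it, so an unset variable means ON. The following Python sketch mirrors both Rust comparisons so the truth table can be checked. Here `env` is a plain mapping standing in for the process environment:

```python
from typing import Mapping


def sealed_mode(env: Mapping[str, str]) -> bool:
    """Mirror of the new Rust check: only the exact string "0" disables sealed SSA."""
    return env.get("NYASH_LLVM_PHI_SEALED") != "0"


def sealed_mode_old(env: Mapping[str, str]) -> bool:
    """Mirror of the removed Rust check: only the exact string "1" enabled sealed SSA."""
    return env.get("NYASH_LLVM_PHI_SEALED") == "1"


for value in (None, "0", "1", "false"):
    env = {} if value is None else {"NYASH_LLVM_PHI_SEALED": value}
    print(repr(value), sealed_mode_old(env), sealed_mode(env))
# None False True
# '0' False False
# '1' True True
# 'false' False True
```

Note the last row: values such as `false` or `off` leave sealed mode ON under the new check. Only the literal `0` turns it off.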
@@ -289,6 +292,7 @@ impl LLVMCompiler {
instructions::lower_boxcall(
&codegen,
&mut cursor,
+&mut resolver,
*bid,
func,
&mut vmap,
@@ -306,7 +310,21 @@ impl LLVMCompiler {
if let Some(d) = dst { defined_in_block.insert(*d); }
},
MirInstruction::ExternCall { dst, iface_name, method_name, args, effects: _ } => {
-instructions::lower_externcall(&codegen, &mut cursor, *bid, func, &mut vmap, dst, iface_name, method_name, args)?;
+instructions::lower_externcall(
+&codegen,
+&mut cursor,
+&mut resolver,
+*bid,
+func,
+&mut vmap,
+dst,
+iface_name,
+method_name,
+args,
+&bb_map,
+&preds,
+&block_end_values,
+)?;
if let Some(d) = dst { defined_in_block.insert(*d); }
},
MirInstruction::UnaryOp { dst, op, operand } => {
@@ -314,11 +332,11 @@ impl LLVMCompiler {
defined_in_block.insert(*dst);
},
MirInstruction::BinOp { dst, op, lhs, rhs } => {
-instructions::lower_binop(&codegen, &mut cursor, *bid, func, &mut vmap, *dst, op, lhs, rhs)?;
+instructions::lower_binop(&codegen, &mut cursor, &mut resolver, *bid, func, &mut vmap, *dst, op, lhs, rhs, &bb_map, &preds, &block_end_values)?;
defined_in_block.insert(*dst);
},
MirInstruction::Compare { dst, op, lhs, rhs } => {
-let out = instructions::lower_compare(&codegen, &mut cursor, *bid, func, &vmap, op, lhs, rhs, &bb_map, &preds, &block_end_values)?;
+let out = instructions::lower_compare(&codegen, &mut cursor, &mut resolver, *bid, func, &vmap, op, lhs, rhs, &bb_map, &preds, &block_end_values)?;
vmap.insert(*dst, out);
defined_in_block.insert(*dst);
},
@@ -439,7 +457,7 @@ impl LLVMCompiler {
}
}
if !handled_by_loopform {
-instructions::emit_branch(&codegen, &mut cursor, *bid, condition, then_bb, else_bb, &bb_map, &phis_by_block, &vmap, &preds, &block_end_values)?;
+instructions::emit_branch(&codegen, &mut cursor, &mut resolver, *bid, condition, then_bb, else_bb, &bb_map, &phis_by_block, &vmap, &preds, &block_end_values)?;
}
}
_ => {

tools/llvmlite_harness.py

@@ -0,0 +1,83 @@
#!/usr/bin/env python3
"""
Experimental llvmlite-based LLVM emission harness for Nyash.

Usage:
  python3 tools/llvmlite_harness.py [--in MIR.json] --out OUTPUT.o

Notes:
  - First cut emits a trivial ny_main that returns 0 to validate toolchain.
  - Extend to lower MIR14 JSON incrementally.
"""

from __future__ import annotations

import argparse
import json
import os
import sys

try:
    from llvmlite import ir, binding
except Exception as e:  # noqa: BLE001
    sys.stderr.write(
        "llvmlite is required. Install with: python3 -m pip install llvmlite\n"
    )
    sys.stderr.write(f"Import error: {e}\n")
    sys.exit(2)


def parse_args() -> argparse.Namespace:
    ap = argparse.ArgumentParser(description="Nyash llvmlite harness")
    ap.add_argument("--in", dest="in_path", help="MIR14 JSON input (optional)")
    ap.add_argument("--out", dest="out_path", required=True, help="Output object file path")
    return ap.parse_args()


def load_mir(path: str | None) -> dict | None:
    if not path:
        return None
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def build_trivial_module() -> ir.Module:
    mod = ir.Module(name="nyash_harness")
    mod.triple = binding.get_default_triple()
    i64 = ir.IntType(64)
    i8 = ir.IntType(8)
    i8p = i8.as_pointer()
    i8pp = i8p.as_pointer()
    fn_ty = ir.FunctionType(i64, [i64, i8pp])
    fn = ir.Function(mod, fn_ty, name="ny_main")
    entry = fn.append_basic_block(name="entry")
    b = ir.IRBuilder(entry)
    b.ret(ir.Constant(i64, 0))
    return mod


def emit_object(mod: ir.Module, out_path: str) -> None:
    binding.initialize()
    binding.initialize_native_target()
    binding.initialize_native_asmprinter()
    target = binding.Target.from_default_triple()
    tm = target.create_target_machine()
    llvm_mod = binding.parse_assembly(str(mod))
    llvm_mod.verify()
    obj = tm.emit_object(llvm_mod)
    with open(out_path, "wb") as f:
        f.write(obj)


def main() -> int:
    ns = parse_args()
    _mir = load_mir(ns.in_path)
    # For now, ignore MIR content and emit a trivial module.
    mod = build_trivial_module()
    os.makedirs(os.path.dirname(ns.out_path) or ".", exist_ok=True)
    emit_object(mod, ns.out_path)
    return 0


if __name__ == "__main__":  # pragma: no cover
    raise SystemExit(main())
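The commit describes this harness as a tool for rapid PHI/SSA verification, although this first cut ignores its MIR input. One cheap check that such a harness could run before lowering is that every PHI lists exactly one incoming entry per predecessor of its block. The sketch below shows that check in pure Python. The `preds` and `phis` shapes are hypothetical, since the MIR14 JSON layout is not shown in this commit. The check is also simplified: it assumes each predecessor edge is listed once.

```python
from typing import Dict, List, Tuple

BlockId = int


def check_phi_incoming(
    preds: Dict[BlockId, List[BlockId]],
    phis: Dict[BlockId, List[List[Tuple[BlockId, int]]]],
) -> List[str]:
    """Report PHIs whose incoming blocks differ from their block's predecessors.

    `phis[bb]` is a list of PHIs; each PHI is a list of (pred_block, value_id)
    pairs. Both shapes are hypothetical, not the real MIR14 JSON layout.
    """
    problems = []
    for bb, block_phis in phis.items():
        expected = sorted(preds.get(bb, []))
        for i, incoming in enumerate(block_phis):
            # Sorting catches missing, extra, and duplicated incoming blocks.
            got = sorted(p for p, _ in incoming)
            if got != expected:
                problems.append(f"bb{bb} phi#{i}: incoming {got} != preds {expected}")
    return problems


preds = {3: [1, 2]}
print(check_phi_incoming(preds, {3: [[(1, 10), (2, 11)]]}))  # []
print(check_phi_incoming(preds, {3: [[(1, 10)]]}))
# ['bb3 phi#0: incoming [1] != preds [1, 2]']
```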