harness(llvm/py): fix PHI/dominance via Resolver-only; per-pred localization and constant GEPs; stabilize Main.esc_json/1, dirname/1, node_json/3; docs: add NYASH_LLVM_TRACE_FINAL and Resolver-only invariants

- Resolver-only reads across BBs; remove vmap fallbacks
- Create PHIs at block start; insert casts in preds before terminators
- Re-materialize int in preds to satisfy dominance (add/zext/trunc)
- Use constant GEP for method strings to avoid order dependency
- Order non-PHI lowering to preserve producer→consumer dominance
- Update docs: RESOLVER_API.md, LLVM_HARNESS.md
- compare_harness_on_off: ON/OFF exits match; linking green
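For reference, a minimal llvmlite sketch of the constant-GEP approach for method-name strings described above. The global name and string contents are illustrative; only the `ir.Constant.gep` form mirrors the harness change.

```python
from llvmlite import ir

mod = ir.Module(name="sketch")
i8 = ir.IntType(8)
i32 = ir.IntType(32)

# Global constant holding a NUL-terminated method name (illustrative contents).
mbytes = bytearray(b"lastIndexOf\x00")
arr_ty = ir.ArrayType(i8, len(mbytes))
g = ir.GlobalVariable(mod, arr_ty, name=".str.method")
g.global_constant = True
g.initializer = ir.Constant(arr_ty, mbytes)

# Constant GEP to i8*: no instruction is emitted, so there is no
# ordering/dominance dependency on where the pointer is computed.
c0 = ir.Constant(i32, 0)
mptr = ir.Constant.gep(g, (c0, c0))
```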
Selfhosting Dev
2025-09-13 19:49:03 +09:00
parent 1d6fab4eda
commit 2a9aa5368d
15 changed files with 646 additions and 214 deletions

View File

@ -5,6 +5,14 @@ Summary
- VM/Cranelift/Interpreter do not support MIR14. MIR normalization (Resolver / LoopForm conventions) is guaranteed on the Rust side, and the harness is fed the same normalized shape.
- Representative case (apps/selfhost/tools/dep_tree_min_string.nyash): stably produce `.o` (and an EXE when needed). Functional equivalence is confirmed with Harness ON/OFF.
Quick Status — 2025-09-13 (compressed)
- Harness ON (llvmlite): `.ll` verify green → `.o` → link succeeds (dep_tree_min_string)
- Unified on Resolver-only (no direct vmap reads). PHIs are grouped at the top of each BB and fixed to i64 handles; pointer incomings are boxed at the end of the predecessor (GEP + from_i8_string)
- Lowering order: blocks are lowered in a preds-first pseudo-topological order; non-PHI instructions are always appended at the end of the current BB (keeps dominance stable)
- Strings: `+` uses concat_hh only when a string tag/pointer is detected; len/eq supported; substring/lastIndexOf use handle-based variants (_hii/_hh) implemented in NyRT
- const(string): kept as a Global; the use site emits GEP → i8*. MIR main is made private and an ny_main wrapper is generated
- Comparison/verification: compare_harness_on_off.sh checks that ON/OFF exit codes match (currently the JSON output is empty on both sides; tuning toward final-output parity)
Hot Update — 2025-09-13 (harness wiring, fallback removal)
- Runner: harness wiring added to LLVM mode. When `NYASH_LLVM_USE_HARNESS=1`:
- The MIR (JSON) is written to `tmp/nyash_harness_mir.json`
@ -24,9 +32,9 @@ Hot Update — 2025-09-13 (Resolver-only unification + Harness ON green)
- Representative case (dep_tree_min_string): with Harness ON, `.ll` verify green → `.o` confirmed, and linking against NyRT produced a working EXE.
Next (short)
1) Extend ON/OFF equivalence (up to matching return values / verification logs / final output)
2) Remove the remaining Resolver fallbacks and lock in full Resolver-only
3) Expand representative cases (println / actual-output comparison) and update docs (Resolver conventions, PHI/ptr/i64 policy)
1) Final stabilization of PHI/dominance (Main.esc_json/1, Main.dirname/1) → matching final JSON for ON/OFF
2) Remove the remaining on-demand PHIs/fallbacks (full Resolver-only, zero direct vmap reads)
3) Update Docs/Trace (Resolver/PHI/ptr-i64, invariants, NYASH_LLVM_TRACE_FINAL)
Compact Roadmap (revised 2025-09-13)
- Focus A (keep Rust LLVM): flow hardening, PHI (sealed) stabilization, LoopForm spec compliance.
@ -34,9 +42,9 @@ Compact Roadmap (revised 2025-09-13)
- Now:
- Sealed SSA and strict Cursor discipline are in place. `.o` generation and verifier green for dep_tree_min_string confirmed with Rust LLVM.
- Next (short):
1) Extend ON/OFF equivalence (return values / logs / output comparison)
2) Completely remove Resolver fallbacks (always go through the Resolver)
3) Update documentation (Resolver-only / localization discipline, PHI (sealed), ptr/i64 bridge)
1) ON/OFF equivalence (confirm matching return values / logs / final JSON)
2) Remove the remaining fallbacks (lock in full Resolver-only)
3) Update docs (Resolver/PHI/ptr-i64 bridge)
- Flags:
- `NYASH_ENABLE_LOOPFORM=1` (non-destructive ON)
- `NYASH_LOOPFORM_BODY2DISPATCH=1` (experimental: body→dispatch for simple bodies)

Binary file not shown.

Binary file not shown.

View File

@ -101,6 +101,46 @@ pub extern "C" fn nyash_string_eq_hh_export(a_h: i64, b_h: i64) -> i64 {
}
}
// String.substring_hii(handle, start, end) -> handle
#[export_name = "nyash.string.substring_hii"]
pub extern "C" fn nyash_string_substring_hii_export(h: i64, start: i64, end: i64) -> i64 {
use nyash_rust::{box_trait::NyashBox, box_trait::StringBox, jit::rt::handles};
if h <= 0 {
return 0;
}
let s = if let Some(obj) = handles::get(h as u64) {
if let Some(sb) = obj.as_any().downcast_ref::<StringBox>() { sb.value.clone() } else { String::new() }
} else { String::new() };
let n = s.len() as i64;
let mut st = if start < 0 { 0 } else { start };
let mut en = if end < 0 { 0 } else { end };
if st > n { st = n; }
if en > n { en = n; }
if en < st { std::mem::swap(&mut st, &mut en); }
let (st_u, en_u) = (st as usize, en as usize);
let sub = s.get(st_u.min(s.len())..en_u.min(s.len())).unwrap_or("");
let arc: std::sync::Arc<dyn NyashBox> = std::sync::Arc::new(StringBox::new(sub.to_string()));
handles::to_handle(arc) as i64
}
// String.lastIndexOf_hh(haystack_h, needle_h) -> i64
#[export_name = "nyash.string.lastIndexOf_hh"]
pub extern "C" fn nyash_string_lastindexof_hh_export(h: i64, n: i64) -> i64 {
use nyash_rust::{box_trait::StringBox, jit::rt::handles};
let hay = if h > 0 {
if let Some(o) = handles::get(h as u64) {
if let Some(sb) = o.as_any().downcast_ref::<StringBox>() { sb.value.clone() } else { String::new() }
} else { String::new() }
} else { String::new() };
let nee = if n > 0 {
if let Some(o) = handles::get(n as u64) {
if let Some(sb) = o.as_any().downcast_ref::<StringBox>() { sb.value.clone() } else { String::new() }
} else { String::new() }
} else { String::new() };
if nee.is_empty() { return hay.len() as i64; }
if let Some(pos) = hay.rfind(&nee) { pos as i64 } else { -1 }
}
// box.from_i8_string(ptr) -> handle
// Helper: build a StringBox from i8* and return a handle for AOT marshalling
#[export_name = "nyash.box.from_i8_string"]

View File

@ -7,6 +7,10 @@ Purpose
Switch
- With `NYASH_LLVM_USE_HARNESS=1`, the harness takes priority (invoked from the LLVM backend entry point).
Tracing
- Setting `NYASH_LLVM_TRACE_FINAL=1` prints a lightweight trace of representative calls (`Main.node_json/3`, `Main.esc_json/1`, `main`, etc.) to stdout.
Use it as an aid when reconciling the final JSON between ON and OFF runs.
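A minimal sketch of the gate, mirroring the check added to `lower_call` in this commit; the helper name is illustrative, the traced call names are the representative set above.

```python
import os

def trace_final_call(name: str, argc: int) -> None:
    # Emit a one-line trace only when NYASH_LLVM_TRACE_FINAL=1 is set.
    if os.environ.get("NYASH_LLVM_TRACE_FINAL") != "1":
        return
    if name in ("Main.node_json/3", "Main.esc_json/1", "main"):
        print(f"[TRACE] call {name} args={argc}", flush=True)
```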
Protocol
- Input: MIR14 JSON (already normalized to the Resolver/LoopForm conventions by the Rust front stage).
- Output: `.o` object (default: `NYASH_AOT_OBJECT_OUT` or `NYASH_LLVM_OBJ_OUT`).
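A sketch of how the output path could be resolved from these variables; the precedence (`NYASH_AOT_OBJECT_OUT` first) and the fallback path are assumptions, not taken from the harness.

```python
import os

# Hypothetical helper: pick the object output path from the documented env vars.
obj_out = (
    os.environ.get("NYASH_AOT_OBJECT_OUT")
    or os.environ.get("NYASH_LLVM_OBJ_OUT")
    or "tmp/nyash_harness.o"  # illustrative fallback only
)
```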

View File

@ -6,20 +6,33 @@ Goals
- De-duplicate per (block, value) to avoid redundant PHIs/casts.
Design
- Resolver-only reads: lowerers must fetch cross-block values via `Resolver` (ban direct `vmap.get` for reads across BBs).
- `Resolver` keeps small per-function caches keyed by `(BasicBlockId, ValueId)`.
- `resolve_i64(...)` returns an `i64`-typed `IntValue`, inserting PHI and casts as needed using sealed snapshots.
- `resolve_ptr(...)` returns an `i8*` `PointerValue`, PHI at BB start; int handles are bridged via `inttoptr`.
- `resolve_f64(...)` returns an `f64` `FloatValue`, PHI at BB start; ints bridged via `sitofp`.
- Internally uses `flow::localize_to_i64(...)` for the i64 path; pointer/float are localized directly.
- `resolve_i64(...)` returns an `i64`-typed value, inserting a PHI at the beginning of the current block and wiring
incoming from predecessor end-snapshots (sealed SSA). Any required casts are inserted in predecessor blocks just
before their terminators to preserve dominance.
- `resolve_ptr(...)` returns an `i8*` value localized to the current block; integer handles are bridged via `inttoptr`.
- `resolve_f64(...)` returns an `f64` value localized to the current block; ints bridged via `sitofp`.
- Internally uses sealed snapshots (`block_end_values`) to avoid referencing values that do not dominate the use.
Usage (planned wiring)
- Create `let mut resolver = instructions::Resolver::new();` at function lowering start.
- Replace all integer value fetches in lowerers with `resolver.resolve_i64(...)`.
- Keep builder insertion discipline via `BuilderCursor`.
Ban: Direct `vmap.get(..)` for cross-BB reads
- Direct reads from `vmap` are allowed only for values defined in the same block (after they are created).
- For any value that may come from a predecessor, always go through `Resolver` (see the sketch after this list).
- CI guard: keep `rg "vmap\.get\(" src/backend/llvm` at zero for instruction paths (Resolver-only).
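A minimal sketch of this discipline on the Python/llvmlite side, assuming the `resolve_i64(value_id, current_block, preds, block_end_values, vmap, bb_map)` signature used by the harness; the helper name and its parameters are illustrative.

```python
def lower_add_i64(resolver, builder, lhs_id, rhs_id, dst_id,
                  preds, block_end_values, vmap, bb_map):
    """Localize both operands via the Resolver, then define dst in the current block."""
    # Cross-BB reads: the Resolver inserts a PHI at the start of builder.block and
    # places any required casts in the predecessors, before their terminators.
    lhs = resolver.resolve_i64(lhs_id, builder.block, preds, block_end_values, vmap, bb_map)
    rhs = resolver.resolve_i64(rhs_id, builder.block, preds, block_end_values, vmap, bb_map)
    res = builder.add(lhs, rhs, name=f"add_{dst_id}")
    vmap[dst_id] = res  # same-block definition: later reads in this block may use vmap directly
    return res
```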
Next
- Migrate remaining `localize_to_i64` call sites to the resolver.
- Enforce vmap direct access ban in lowerers (Resolver-only for reads).
Tracing
- `NYASH_LLVM_TRACE_PHI=1`: log PHI creation/wiring in the Rust/inkwell path.
- `NYASH_LLVM_TRACE_FINAL=1`: in the Python/llvmlite harness, trace selected final calls (e.g., `Main.node_json/3`,
`Main.esc_json/1`) to correlate ON/OFF outputs during parity checks.
Acceptance tie-in
- Combined with LoopForm: dispatch-only PHI + resolver-based value access → dominance violations drop to zero (A2.5).

View File

@ -45,17 +45,40 @@ def lower_binop(
# Relational/equality operators delegate to compare
if op in ('==','!=','<','>','<=','>='):
lower_compare(builder, op, lhs, rhs, dst, vmap)
# Delegate to compare with resolver/preds context to maintain dominance via localization
lower_compare(
builder,
op,
lhs,
rhs,
dst,
vmap,
resolver=resolver,
current_block=current_block,
preds=preds,
block_end_values=block_end_values,
bb_map=bb_map,
)
return
# String-aware concatenation unified to handles (i64) when any side is pointer string
# String-aware concatenation unified to handles (i64).
# Use concat_hh when either side is a pointer string OR tagged as string handle.
if op == '+':
i64 = ir.IntType(64)
i8p = ir.IntType(8).as_pointer()
lhs_raw = vmap.get(lhs)
rhs_raw = vmap.get(rhs)
is_str = (hasattr(lhs_raw, 'type') and isinstance(lhs_raw.type, ir.PointerType)) or \
# pointer present?
is_ptr_side = (hasattr(lhs_raw, 'type') and isinstance(lhs_raw.type, ir.PointerType)) or \
(hasattr(rhs_raw, 'type') and isinstance(rhs_raw.type, ir.PointerType))
# tagged string handles? (only when both sides are string-ish)
both_tagged = False
try:
if resolver is not None and hasattr(resolver, 'is_stringish'):
both_tagged = resolver.is_stringish(lhs) and resolver.is_stringish(rhs)
except Exception:
pass
is_str = is_ptr_side or both_tagged
if is_str:
# Helper: convert raw or resolved value to string handle
def to_handle(raw, val, tag: str):
@ -91,6 +114,12 @@ def lower_binop(
callee = ir.Function(builder.module, hh_fnty, name='nyash.string.concat_hh')
res = builder.call(callee, [hl, hr], name=f"concat_hh_{dst}")
vmap[dst] = res
# Tag result as string handle so subsequent '+' stays in string domain
try:
if resolver is not None and hasattr(resolver, 'mark_string'):
resolver.mark_string(dst)
except Exception:
pass
return
# Ensure both are i64

View File

@ -76,27 +76,39 @@ def lower_boxcall(
# Minimal method bridging for strings and console
if method_name in ("length", "len"):
# Prefer handle-based len_h
# Any.length_h: supports Array/String/Map
recv_h = _ensure_handle(builder, module, recv_val)
callee = _declare(module, "nyash.string.len_h", i64, [i64])
result = builder.call(callee, [recv_h], name="strlen_h")
callee = _declare(module, "nyash.any.length_h", i64, [i64])
result = builder.call(callee, [recv_h], name="any_length_h")
if dst_vid is not None:
vmap[dst_vid] = result
return
if method_name == "substring":
# substring(start, end) with pointer-based API
# substring(start, end)
# If receiver is a handle (i64), use handle-based helper; else pointer-based API
if resolver is not None and preds is not None and block_end_values is not None and bb_map is not None:
s = resolver.resolve_i64(args[0], builder.block, preds, block_end_values, vmap, bb_map) if args else ir.Constant(i64, 0)
e = resolver.resolve_i64(args[1], builder.block, preds, block_end_values, vmap, bb_map) if len(args) > 1 else ir.Constant(i64, 0)
else:
s = vmap.get(args[0], ir.Constant(i64, 0)) if args else ir.Constant(i64, 0)
e = vmap.get(args[1], ir.Constant(i64, 0)) if len(args) > 1 else ir.Constant(i64, 0)
# Coerce recv to i8*
if hasattr(recv_val, 'type') and isinstance(recv_val.type, ir.IntType):
# handle-based
callee = _declare(module, "nyash.string.substring_hii", i64, [i64, i64, i64])
h = builder.call(callee, [recv_val, s, e], name="substring_h")
if dst_vid is not None:
vmap[dst_vid] = h
try:
if resolver is not None and hasattr(resolver, 'mark_string'):
resolver.mark_string(dst_vid)
except Exception:
pass
return
else:
# pointer-based
recv_p = recv_val
if hasattr(recv_p, 'type') and isinstance(recv_p.type, ir.IntType):
recv_p = builder.inttoptr(recv_p, i8p, name="bc_i2p_recv")
elif hasattr(recv_p, 'type') and isinstance(recv_p.type, ir.PointerType):
if hasattr(recv_p, 'type') and isinstance(recv_p.type, ir.PointerType):
try:
if isinstance(recv_p.type.pointee, ir.ArrayType):
c0 = ir.Constant(ir.IntType(32), 0)
@ -112,30 +124,43 @@ def lower_boxcall(
e = builder.ptrtoint(e, i64)
callee = _declare(module, "nyash.string.substring_sii", i8p, [i8p, i64, i64])
p = builder.call(callee, [recv_p, s, e], name="substring")
# Return as handle across blocks (i8* -> i64 via nyash.box.from_i8_string)
conv_fnty = ir.FunctionType(i64, [i8p])
conv = _declare(module, "nyash.box.from_i8_string", i64, [i8p])
h = builder.call(conv, [p], name="str_ptr2h_sub")
if dst_vid is not None:
vmap[dst_vid] = h
try:
if resolver is not None and hasattr(resolver, 'mark_string'):
resolver.mark_string(dst_vid)
except Exception:
pass
return
if method_name == "lastIndexOf":
# lastIndexOf(needle)
if resolver is not None and preds is not None and block_end_values is not None and bb_map is not None:
needle = resolver.resolve_ptr(args[0], builder.block, preds, block_end_values, vmap) if args else ir.Constant(i8p, None)
n_i64 = resolver.resolve_i64(args[0], builder.block, preds, block_end_values, vmap, bb_map) if args else ir.Constant(i64, 0)
else:
needle = vmap.get(args[0], ir.Constant(i8p, None)) if args else ir.Constant(i8p, None)
n_i64 = vmap.get(args[0], ir.Constant(i64, 0)) if args else ir.Constant(i64, 0)
if hasattr(recv_val, 'type') and isinstance(recv_val.type, ir.IntType):
# handle-based
callee = _declare(module, "nyash.string.lastIndexOf_hh", i64, [i64, i64])
res = builder.call(callee, [recv_val, n_i64], name="lastIndexOf_hh")
if dst_vid is not None:
vmap[dst_vid] = res
return
else:
# pointer-based
recv_p = recv_val
if hasattr(recv_p, 'type') and isinstance(recv_p.type, ir.IntType):
recv_p = builder.inttoptr(recv_p, i8p, name="bc_i2p_recv2")
elif hasattr(recv_p, 'type') and isinstance(recv_p.type, ir.PointerType):
if hasattr(recv_p, 'type') and isinstance(recv_p.type, ir.PointerType):
try:
if isinstance(recv_p.type.pointee, ir.ArrayType):
c0 = ir.Constant(ir.IntType(32), 0)
recv_p = builder.gep(recv_p, [c0, c0], name="bc_gep_recv2")
except Exception:
pass
else:
recv_p = ir.Constant(i8p, None)
needle = n_i64
if hasattr(needle, 'type') and isinstance(needle.type, ir.IntType):
needle = builder.inttoptr(needle, i8p, name="bc_i2p_needle")
elif hasattr(needle, 'type') and isinstance(needle.type, ir.PointerType):
@ -145,14 +170,31 @@ def lower_boxcall(
needle = builder.gep(needle, [c0, c0], name="bc_gep_needle")
except Exception:
pass
elif not hasattr(needle, 'type'):
needle = ir.Constant(i8p, None)
callee = _declare(module, "nyash.string.lastIndexOf_ss", i64, [i8p, i8p])
res = builder.call(callee, [recv_p, needle], name="lastIndexOf")
if dst_vid is not None:
vmap[dst_vid] = res
return
if method_name == "get":
# ArrayBox.get(index) → nyash.array.get_h(handle, idx)
recv_h = _ensure_handle(builder, module, recv_val)
if resolver is not None and preds is not None and block_end_values is not None and bb_map is not None:
idx = resolver.resolve_i64(args[0], builder.block, preds, block_end_values, vmap, bb_map) if args else ir.Constant(i64, 0)
else:
idx = vmap.get(args[0], ir.Constant(i64, 0)) if args else ir.Constant(i64, 0)
callee = _declare(module, "nyash.array.get_h", i64, [i64, i64])
res = builder.call(callee, [recv_h, idx], name="arr_get_h")
if dst_vid is not None:
vmap[dst_vid] = res
try:
if resolver is not None and hasattr(resolver, 'mark_string'):
# Heuristic: args array often stores strings for CLI; tag as string-ish
resolver.mark_string(dst_vid)
except Exception:
pass
return
if method_name in ("print", "println", "log"):
# Console mapping
if resolver is not None and preds is not None and block_end_values is not None and bb_map is not None:
@ -174,6 +216,48 @@ def lower_boxcall(
vmap[dst_vid] = ir.Constant(i64, 0)
return
# Special: method on `me` (self) or static dispatch to Main.* → direct call to `Main.method/arity`
try:
cur_fn_name = str(builder.block.parent.name)
except Exception:
cur_fn_name = ''
# Heuristic: value-id 0 is often the implicit receiver for `me` in MIR
if box_vid == 0 and cur_fn_name.startswith('Main.'):
# Build target function name with arity
arity = len(args)
target = f"Main.{method_name}/{arity}"
# If module already has such function, prefer direct call
callee = None
for f in module.functions:
if f.name == target:
callee = f
break
if callee is not None:
a = []
for i, aid in enumerate(args):
raw = vmap.get(aid)
if raw is not None and hasattr(raw, 'type') and isinstance(raw.type, ir.PointerType):
aval = _ensure_handle(builder, module, raw)
else:
if resolver is not None and preds is not None and block_end_values is not None and bb_map is not None:
aval = resolver.resolve_i64(aid, builder.block, preds, block_end_values, vmap, bb_map)
else:
aval = vmap.get(aid, ir.Constant(ir.IntType(64), 0))
if hasattr(aval, 'type') and isinstance(aval.type, ir.PointerType):
aval = _ensure_handle(builder, module, aval)
elif hasattr(aval, 'type') and isinstance(aval.type, ir.IntType) and aval.type.width != 64:
aval = builder.zext(aval, ir.IntType(64)) if aval.type.width < 64 else builder.trunc(aval, ir.IntType(64))
a.append(aval)
res = builder.call(callee, a, name=f"call_self_{method_name}")
if dst_vid is not None:
vmap[dst_vid] = res
try:
if method_name in ("esc_json", "node_json", "dirname", "join", "read_all") and resolver is not None and hasattr(resolver, 'mark_string'):
resolver.mark_string(dst_vid)
except Exception:
pass
return
# Default: invoke via NyRT by-name shim (runtime resolves method id)
recv_h = _ensure_handle(builder, module, recv_val)
# Build C string for method name
@ -195,7 +279,9 @@ def lower_boxcall(
g.global_constant = True
g.initializer = ir.Constant(arr_ty, bytearray(mbytes))
c0 = ir.Constant(ir.IntType(32), 0)
mptr = builder.gep(g, [c0, c0], inbounds=True)
# Compute GEP in the current block so it is naturally ordered before the call
# Use constant GEP so we don't depend on instruction ordering
mptr = ir.Constant.gep(g, (c0, c0))
# Up to 2 args for minimal path
argc = ir.Constant(i64, min(len(args), 2))

View File

@ -87,7 +87,25 @@ def lower_call(
# Make the call
result = builder.call(func, call_args, name=f"call_{func_name}")
# Optional trace for final debugging
try:
import os
if os.environ.get('NYASH_LLVM_TRACE_FINAL') == '1' and isinstance(actual_name, str):
if actual_name in ("Main.node_json/3", "Main.esc_json/1", "main"):
print(f"[TRACE] call {actual_name} args={len(call_args)}", flush=True)
except Exception:
pass
# Store result if needed
if dst_vid is not None:
vmap[dst_vid] = result
# Heuristic: mark known string-producing functions as string handles
try:
name_for_tag = actual_name if isinstance(actual_name, str) else str(actual_name)
if resolver is not None and hasattr(resolver, 'mark_string'):
if any(key in name_for_tag for key in [
'esc_json', 'node_json', 'dirname', 'join', 'read_all', 'toJson'
]):
resolver.mark_string(dst_vid)
except Exception:
pass

View File

@ -42,10 +42,20 @@ def lower_compare(
i64 = ir.IntType(64)
i8p = ir.IntType(8).as_pointer()
# String-aware equality: if both are pointers, assume i8* strings
if op in ('==','!=') and hasattr(lhs_val, 'type') and hasattr(rhs_val, 'type'):
if isinstance(lhs_val.type, ir.PointerType) and isinstance(rhs_val.type, ir.PointerType):
# Box both to handles and call nyash.string.eq_hh
# String-aware equality: if either side is a pointer or tagged as string-ish, compare via eq_hh
if op in ('==','!='):
lhs_ptr = hasattr(lhs_val, 'type') and isinstance(lhs_val.type, ir.PointerType)
rhs_ptr = hasattr(rhs_val, 'type') and isinstance(rhs_val.type, ir.PointerType)
lhs_tag = False
rhs_tag = False
try:
if resolver is not None and hasattr(resolver, 'is_stringish'):
lhs_tag = resolver.is_stringish(lhs)
rhs_tag = resolver.is_stringish(rhs)
except Exception:
pass
if lhs_ptr or rhs_ptr or lhs_tag or rhs_tag:
# Convert both to handles (i64) then nyash.string.eq_hh
# nyash.box.from_i8_string(i8*) -> i64
box_from = None
for f in builder.module.functions:
@ -54,9 +64,18 @@ def lower_compare(
break
if not box_from:
box_from = ir.Function(builder.module, ir.FunctionType(i64, [i8p]), name='nyash.box.from_i8_string')
lh = builder.call(box_from, [lhs_val], name='lhs_ptr2h')
rh = builder.call(box_from, [rhs_val], name='rhs_ptr2h')
def to_h(v):
if hasattr(v, 'type') and isinstance(v.type, ir.PointerType):
return builder.call(box_from, [v])
else:
# assume i64 handle or number; zext/trunc to i64 if needed
if hasattr(v, 'type') and isinstance(v.type, ir.IntType) and v.type.width != 64:
return builder.zext(v, i64) if v.type.width < 64 else builder.trunc(v, i64)
if hasattr(v, 'type') and isinstance(v.type, ir.PointerType):
return builder.ptrtoint(v, i64)
return v if hasattr(v, 'type') else ir.Constant(i64, 0)
lh = to_h(lhs_val)
rh = to_h(rhs_val)
eqf = None
for f in builder.module.functions:
if f.name == 'nyash.string.eq_hh':
@ -68,7 +87,6 @@ def lower_compare(
if op == '==':
vmap[dst] = eq
else:
# ne = 1 - eq
one = ir.Constant(i64, 1)
ne = builder.sub(one, eq, name='str_ne')
vmap[dst] = ne

View File

@ -63,8 +63,12 @@ def lower_const(
g.global_constant = True
# Store the GlobalVariable; resolver.resolve_ptr will emit GEP in the current block
vmap[dst] = g
if resolver is not None and hasattr(resolver, 'string_literals'):
if resolver is not None:
if hasattr(resolver, 'string_literals'):
resolver.string_literals[dst] = str_val
# Mark this value-id as string-ish to guide '+' and '==' lowering
if hasattr(resolver, 'mark_string'):
resolver.mark_string(dst)
elif const_type == 'void':
# Void/null constant - use i64 zero

View File

@ -45,6 +45,8 @@ def lower_externcall(
"nyash.string.charCodeAt_h": (i64, [i64, i64]),
"nyash.string.concat_hh": (i64, [i64, i64]),
"nyash.string.eq_hh": (i64, [i64, i64]),
"nyash.string.substring_hii": (i64, [i64, i64, i64]),
"nyash.string.lastIndexOf_hh": (i64, [i64, i64]),
# Strings (pointer-based plugin functions)
"nyash.string.concat_ss": (i8p, [i8p, i8p]),
"nyash.string.concat_si": (i8p, [i8p, i64]),

View File

@ -34,20 +34,9 @@ def lower_phi(
vmap[dst_vid] = ir.Constant(ir.IntType(64), 0)
return
# Determine PHI type from snapshots or fallback i64
# Use i64 for PHI to carry handles across blocks (strings/boxes),
# avoiding pointer PHIs that complicate dominance and boxing.
phi_type = ir.IntType(64)
if block_end_values is not None:
for val_id, pred_bid in incoming:
snap = block_end_values.get(pred_bid, {})
val = snap.get(val_id)
if val is not None and hasattr(val, 'type'):
phi_type = val.type
# Prefer pointer type
if hasattr(phi_type, 'is_pointer') and phi_type.is_pointer:
break
# Create PHI instruction
phi = builder.phi(phi_type, name=f"phi_{dst_vid}")
# Build map from provided incoming
incoming_map: Dict[int, int] = {}
@ -67,35 +56,35 @@ def lower_phi(
# Fallback: use blocks in incoming list
actual_preds = [b for _, b in incoming]
# Add incoming for each actual predecessor
# Collect incoming values
incoming_pairs: List[Tuple[ir.Block, ir.Value]] = []
for block_id in actual_preds:
block = bb_map.get(block_id)
# Prefer pred snapshot
vid = incoming_map.get(block_id)
if block is None:
continue
# Prefer resolver-driven localization per predecessor block to satisfy dominance
if vid is not None and resolver is not None and bb_map is not None:
try:
pred_block_obj = bb_map.get(block_id)
if pred_block_obj is not None and hasattr(resolver, 'resolve_i64'):
val = resolver.resolve_i64(vid, pred_block_obj, preds_map or {}, block_end_values or {}, vmap, bb_map)
else:
val = None
except Exception:
val = None
else:
# Snapshot fallback
if block_end_values is not None:
snap = block_end_values.get(block_id, {})
vid = incoming_map.get(block_id)
val = snap.get(vid) if vid is not None else None
else:
vid = incoming_map.get(block_id)
val = vmap.get(vid) if vid is not None else None
if not val:
# Create default value based on type
if isinstance(phi_type, ir.IntType):
# Missing incoming for this predecessor → default 0
val = ir.Constant(phi_type, 0)
elif isinstance(phi_type, ir.DoubleType):
val = ir.Constant(phi_type, 0.0)
else:
# Pointer type - null
val = ir.Constant(phi_type, None)
if not block:
# Skip if block not found
continue
# Type conversion if needed
# Coerce pointer to i64 at predecessor end
if hasattr(val, 'type') and val.type != phi_type:
# Position at end (before terminator) of predecessor block
pb = ir.IRBuilder(block)
try:
term = block.terminator
@ -105,21 +94,32 @@ def lower_phi(
pb.position_at_end(block)
except Exception:
pb.position_at_end(block)
# Convert types
if isinstance(phi_type, ir.IntType) and val.type.is_pointer:
val = pb.ptrtoint(val, phi_type, name=f"cast_p2i_{val_id}")
elif phi_type.is_pointer and isinstance(val.type, ir.IntType):
val = pb.inttoptr(val, phi_type, name=f"cast_i2p_{val_id}")
elif isinstance(phi_type, ir.IntType) and isinstance(val.type, ir.IntType):
# Int to int
if phi_type.width > val.type.width:
val = pb.zext(val, phi_type, name=f"zext_{val_id}")
else:
val = pb.trunc(val, phi_type, name=f"trunc_{val_id}")
i8p = ir.IntType(8).as_pointer()
try:
if hasattr(val.type, 'pointee') and isinstance(val.type.pointee, ir.ArrayType):
c0 = ir.Constant(ir.IntType(32), 0)
val = pb.gep(val, [c0, c0], name=f"phi_gep_{vid}")
except Exception:
pass
boxer = None
for f in builder.module.functions:
if f.name == 'nyash.box.from_i8_string':
boxer = f
break
if boxer is None:
boxer = ir.Function(builder.module, ir.FunctionType(ir.IntType(64), [i8p]), name='nyash.box.from_i8_string')
val = pb.call(boxer, [val], name=f"phi_ptr2h_{vid}")
incoming_pairs.append((block, val))
# Add to PHI (skip if no block)
if block is not None:
# If nothing collected, use zero constant and bail out
if not incoming_pairs:
vmap[dst_vid] = ir.Constant(phi_type, 0)
return
# Create PHI instruction now and add incoming
phi = builder.phi(phi_type, name=f"phi_{dst_vid}")
for block, val in incoming_pairs:
phi.add_incoming(val, block)
# Store PHI result

View File

@ -166,6 +166,25 @@ class NyashLLVMBuilder:
param_types = [self.i64] * arity
func_ty = ir.FunctionType(self.i64, param_types)
# Reset per-function maps and resolver caches to avoid cross-function collisions
try:
self.vmap.clear()
except Exception:
self.vmap = {}
# Reset resolver caches (they key by block name; avoid collisions across functions)
try:
self.resolver.i64_cache.clear()
self.resolver.ptr_cache.clear()
self.resolver.f64_cache.clear()
if hasattr(self.resolver, '_end_i64_cache'):
self.resolver._end_i64_cache.clear()
if hasattr(self.resolver, 'string_ids'):
self.resolver.string_ids.clear()
if hasattr(self.resolver, 'string_literals'):
self.resolver.string_literals.clear()
except Exception:
pass
# Create or reuse function
func = None
for f in self.module.functions:
@ -211,9 +230,44 @@ class NyashLLVMBuilder:
bb = func.append_basic_block(block_name)
self.bb_map[bid] = bb
# Process each block
# Build quick lookup for blocks by id
block_by_id: Dict[int, Dict[str, Any]] = {}
for block_data in blocks:
bid = block_data.get("id", 0)
block_by_id[block_data.get("id", 0)] = block_data
# Determine entry block: first with no predecessors; fallback to first block
entry_bid = None
for bid, preds in self.preds.items():
if len(preds) == 0:
entry_bid = bid
break
if entry_bid is None and blocks:
entry_bid = blocks[0].get("id", 0)
# Compute a preds-first (approx topological) order
visited = set()
order: List[int] = []
def visit(bid: int):
if bid in visited:
return
visited.add(bid)
for p in self.preds.get(bid, []):
visit(p)
order.append(bid)
if entry_bid is not None:
visit(entry_bid)
# Include any blocks not reachable from entry
for bid in block_by_id.keys():
if bid not in visited:
visit(bid)
# Process blocks in the computed order
for bid in order:
block_data = block_by_id.get(bid)
if block_data is None:
continue
bb = self.bb_map[bid]
self.lower_block(bb, block_data, func)
@ -228,9 +282,45 @@ class NyashLLVMBuilder:
pass
instructions = block_data.get("instructions", [])
created_ids: List[int] = []
# Process each instruction
for inst in instructions:
# Two-pass: lower all PHIs first to keep them grouped at top
phi_insts = [inst for inst in instructions if inst.get("op") == "phi"]
non_phi_insts = [inst for inst in instructions if inst.get("op") != "phi"]
# Lower PHIs
if phi_insts:
# Ensure insertion at block start
builder.position_at_start(bb)
for inst in phi_insts:
self.lower_instruction(builder, inst, func)
try:
dst = inst.get("dst")
if isinstance(dst, int) and dst not in created_ids and dst in self.vmap:
created_ids.append(dst)
except Exception:
pass
# Lower non-PHI instructions in a coarse dependency-friendly order
# (ensure producers like newbox/const appear before consumers like boxcall/externcall)
order = {
'newbox': 0,
'const': 1,
'typeop': 2,
'load': 3,
'store': 3,
'binop': 4,
'compare': 5,
'call': 6,
'boxcall': 6,
'externcall': 7,
'safepoint': 8,
'barrier': 8,
'while': 8,
'jump': 9,
'branch': 9,
'ret': 10,
}
non_phi_insts_sorted = sorted(non_phi_insts, key=lambda i: order.get(i.get('op'), 100))
for inst in non_phi_insts_sorted:
# Append in program order to preserve dominance; avoid re-inserting before a terminator here
builder.position_at_end(bb)
self.lower_instruction(builder, inst, func)
try:
dst = inst.get("dst")
@ -451,6 +541,15 @@ class NyashLLVMBuilder:
if val_id == dst_vid:
val = phi
else:
# Prefer resolver-driven localization at the end of the predecessor block
if hasattr(self, 'resolver') and self.resolver is not None:
try:
pred_block_obj = pred_bb
val = self.resolver.resolve_i64(val_id, pred_block_obj, self.preds, self.block_end_values, self.vmap, self.bb_map)
except Exception:
val = None
else:
# Snapshot fallback
snap = self.block_end_values.get(pred_bid, {})
# Special-case: incoming 0 means typed zero/null, not value-id 0
if isinstance(val_id, int) and val_id == 0:
@ -477,11 +576,11 @@ class NyashLLVMBuilder:
pb.position_at_end(pred_bb)
except Exception:
pb.position_at_end(pred_bb)
if isinstance(phi_type, ir.IntType) and isinstance(val.type, ir.PointerType):
if isinstance(phi_type, ir.IntType) and hasattr(val, 'type') and isinstance(val.type, ir.PointerType):
val = pb.ptrtoint(val, phi_type, name=f"phi_p2i_{dst_vid}_{pred_bid}")
elif isinstance(phi_type, ir.PointerType) and isinstance(val.type, ir.IntType):
elif isinstance(phi_type, ir.PointerType) and hasattr(val, 'type') and isinstance(val.type, ir.IntType):
val = pb.inttoptr(val, phi_type, name=f"phi_i2p_{dst_vid}_{pred_bid}")
elif isinstance(phi_type, ir.IntType) and isinstance(val.type, ir.IntType):
elif isinstance(phi_type, ir.IntType) and hasattr(val, 'type') and isinstance(val.type, ir.IntType):
if phi_type.width > val.type.width:
val = pb.zext(val, phi_type, name=f"phi_zext_{dst_vid}_{pred_bid}")
elif phi_type.width < val.type.width:

View File

@ -32,11 +32,28 @@ class Resolver:
self.f64_cache: Dict[Tuple[str, int], ir.Value] = {}
# String literal map: value_id -> Python string (for by-name calls)
self.string_literals: Dict[int, str] = {}
# Track value-ids that are known to represent string handles (i64)
# This is a best-effort tag used to decide '+' as string concat when both sides are i64.
self.string_ids: set[int] = set()
# Type shortcuts
self.i64 = ir.IntType(64)
self.i8p = ir.IntType(8).as_pointer()
self.f64_type = ir.DoubleType()
# Cache for recursive end-of-block i64 resolution
self._end_i64_cache: Dict[Tuple[int, int], ir.Value] = {}
def mark_string(self, value_id: int) -> None:
try:
self.string_ids.add(int(value_id))
except Exception:
pass
def is_stringish(self, value_id: int) -> bool:
try:
return int(value_id) in self.string_ids
except Exception:
return False
def resolve_i64(
self,
@ -67,72 +84,72 @@ class Resolver:
pred_ids = [p for p in preds.get(bid, []) if p != bid]
if not pred_ids:
# Entry block or no predecessors
base_val = vmap.get(value_id, ir.Constant(self.i64, 0))
# Do not emit casts here; if pointer, fall back to zero
if hasattr(base_val, 'type') and isinstance(base_val.type, ir.IntType):
result = base_val if base_val.type.width == 64 else ir.Constant(self.i64, 0)
elif hasattr(base_val, 'type') and isinstance(base_val.type, ir.PointerType):
# Entry block or no predecessors: prefer local vmap value (already dominating)
base_val = vmap.get(value_id)
if base_val is None:
result = ir.Constant(self.i64, 0)
else:
result = ir.Constant(self.i64, 0)
else:
# Create PHI at block start
saved_pos = None
if self.builder is not None:
saved_pos = self.builder.block
self.builder.position_at_start(current_block)
phi = self.builder.phi(self.i64, name=f"loc_i64_{value_id}")
# Add incoming values from predecessors
for pred_id in pred_ids:
pred_vals = block_end_values.get(pred_id, {})
val = pred_vals.get(value_id)
# Coerce in predecessor block if needed
if val is None:
coerced = ir.Constant(self.i64, 0)
else:
if hasattr(val, 'type') and isinstance(val.type, ir.IntType):
coerced = val if val.type.width == 64 else ir.Constant(self.i64, 0)
elif hasattr(val, 'type') and isinstance(val.type, ir.PointerType):
# insert ptrtoint in predecessor
pred_bb = bb_map.get(pred_id) if bb_map is not None else None
if pred_bb is not None:
pb = ir.IRBuilder(pred_bb)
# If pointer string, box to handle in current block (use local builder)
if hasattr(base_val, 'type') and isinstance(base_val.type, ir.PointerType) and self.module is not None:
pb = ir.IRBuilder(current_block)
try:
term = pred_bb.terminator
if term is not None:
pb.position_before(term)
else:
pb.position_at_end(pred_bb)
pb.position_at_start(current_block)
except Exception:
pb.position_at_end(pred_bb)
coerced = pb.ptrtoint(val, self.i64, name=f"res_p2i_{value_id}_{pred_id}")
pass
i8p = ir.IntType(8).as_pointer()
v = base_val
try:
if hasattr(v.type, 'pointee') and isinstance(v.type.pointee, ir.ArrayType):
c0 = ir.Constant(ir.IntType(32), 0)
v = pb.gep(v, [c0, c0], name=f"res_gep_{value_id}")
except Exception:
pass
# declare and call boxer
for f in self.module.functions:
if f.name == 'nyash.box.from_i8_string':
box_from = f
break
else:
coerced = ir.Constant(self.i64, 0)
box_from = ir.Function(self.module, ir.FunctionType(self.i64, [i8p]), name='nyash.box.from_i8_string')
result = pb.call(box_from, [v], name=f"res_ptr2h_{value_id}")
elif hasattr(base_val, 'type') and isinstance(base_val.type, ir.IntType):
result = base_val if base_val.type.width == 64 else ir.Constant(self.i64, 0)
else:
coerced = ir.Constant(self.i64, 0)
# Use predecessor block if available
pred_bb = None
if bb_map is not None:
pred_bb = bb_map.get(pred_id)
if pred_bb is not None:
phi.add_incoming(coerced, pred_bb)
# If no valid incoming were added, fold to zero to avoid invalid PHI
if len(getattr(phi, 'incoming', [])) == 0:
# Replace with zero constant and discard phi
result = ir.Constant(self.i64, 0)
# Restore position and cache
if saved_pos and self.builder is not None:
self.builder.position_at_end(saved_pos)
else:
# Sealed SSA localization: create a PHI at the start of current block
# that merges i64-coerced snapshots from each predecessor. This guarantees
# dominance for downstream uses within the current block.
# Use shared builder so insertion order is respected relative to other instructions.
# Save current insertion point
sb = self.builder
if sb is None:
# As a conservative fallback, synthesize zero (should not happen in normal lowering)
result = ir.Constant(self.i64, 0)
self.i64_cache[cache_key] = result
return result
# Restore position
if saved_pos and self.builder is not None:
self.builder.position_at_end(saved_pos)
orig_block = sb.block
# Insert PHI at the very start of current_block
sb.position_at_start(current_block)
phi = sb.phi(self.i64, name=f"loc_i64_{value_id}")
for pred_id in pred_ids:
# Value at the end of predecessor, coerced to i64 within pred block
coerced = self._value_at_end_i64(value_id, pred_id, preds, block_end_values, vmap, bb_map)
pred_bb = bb_map.get(pred_id) if bb_map is not None else None
if pred_bb is None:
continue
phi.add_incoming(coerced, pred_bb)
# Restore insertion point to original location
try:
if orig_block is not None:
term = orig_block.terminator
if term is not None:
sb.position_before(term)
else:
sb.position_at_end(orig_block)
except Exception:
pass
# Use the PHI value as the localized definition for this block
result = phi
# Cache and return
@ -166,6 +183,100 @@ class Resolver:
self.ptr_cache[cache_key] = result
return result
def _value_at_end_i64(self, value_id: int, block_id: int, preds: Dict[int, list],
block_end_values: Dict[int, Dict[int, Any]], vmap: Dict[int, Any],
bb_map: Optional[Dict[int, ir.Block]] = None,
_vis: Optional[set] = None) -> ir.Value:
"""Resolve value as i64 at the end of a given block by traversing predecessors if needed."""
key = (block_id, value_id)
if key in self._end_i64_cache:
return self._end_i64_cache[key]
if _vis is None:
_vis = set()
if key in _vis:
return ir.Constant(self.i64, 0)
_vis.add(key)
# If present in snapshot, coerce there
snap = block_end_values.get(block_id, {})
if value_id in snap and snap[value_id] is not None:
val = snap[value_id]
coerced = self._coerce_in_block_to_i64(val, block_id, bb_map)
self._end_i64_cache[key] = coerced
return coerced
# Try recursively from predecessors
pred_ids = [p for p in preds.get(block_id, []) if p != block_id]
for p in pred_ids:
v = self._value_at_end_i64(value_id, p, preds, block_end_values, vmap, bb_map, _vis)
if v is not None:
self._end_i64_cache[key] = v
return v
# Do not use global vmap here; if not materialized by end of this block
# (or its preds), bail out with zero to preserve dominance.
z = ir.Constant(self.i64, 0)
self._end_i64_cache[key] = z
return z
def _coerce_in_block_to_i64(self, val: Any, block_id: int, bb_map: Optional[Dict[int, ir.Block]]) -> ir.Value:
"""Ensure a value is available as i64 at the end of the given block by inserting casts/boxing there."""
if hasattr(val, 'type') and isinstance(val.type, ir.IntType):
# Re-materialize an i64 definition in the predecessor block to satisfy dominance
pred_bb = bb_map.get(block_id) if bb_map is not None else None
if pred_bb is None:
return ir.Constant(self.i64, 0)
pb = ir.IRBuilder(pred_bb)
try:
term = pred_bb.terminator
if term is not None:
pb.position_before(term)
else:
pb.position_at_end(pred_bb)
except Exception:
pb.position_at_end(pred_bb)
if val.type.width == 64:
z = ir.Constant(self.i64, 0)
return pb.add(val, z, name=f"res_copy_{block_id}")
else:
# Extend/truncate to i64 in pred block
if val.type.width < 64:
return pb.zext(val, self.i64, name=f"res_zext_{block_id}")
else:
return pb.trunc(val, self.i64, name=f"res_trunc_{block_id}")
if hasattr(val, 'type') and isinstance(val.type, ir.PointerType):
pred_bb = bb_map.get(block_id) if bb_map is not None else None
if pred_bb is None:
return ir.Constant(self.i64, 0)
pb = ir.IRBuilder(pred_bb)
try:
term = pred_bb.terminator
if term is not None:
pb.position_before(term)
else:
pb.position_at_end(pred_bb)
except Exception:
pb.position_at_end(pred_bb)
i8p = ir.IntType(8).as_pointer()
v = val
try:
if hasattr(v.type, 'pointee') and isinstance(v.type.pointee, ir.ArrayType):
c0 = ir.Constant(ir.IntType(32), 0)
v = pb.gep(v, [c0, c0], name=f"res_gep_{block_id}_{id(val)}")
except Exception:
pass
# declare boxer
box_from = None
for f in self.module.functions:
if f.name == 'nyash.box.from_i8_string':
box_from = f
break
if box_from is None:
box_from = ir.Function(self.module, ir.FunctionType(self.i64, [i8p]), name='nyash.box.from_i8_string')
return pb.call(box_from, [v], name=f"res_ptr2h_{block_id}")
return ir.Constant(self.i64, 0)
def resolve_f64(self, value_id: int, current_block: ir.Block,
preds: Dict, block_end_values: Dict, vmap: Dict) -> ir.Value:
"""Resolve as f64"""