mini_vm: stabilize BinOp(+), literal/string/functioncall/compare/if fast-paths; pyvm: indexOf(start)/lastIndexOf(start), substring(None) guard, __me__ dispatch; update CURRENT_TASK; selfhost smokes green for core cases

Selfhosting Dev
2025-09-21 06:45:21 +09:00
parent 37f93d5630
commit ee17cfd979
3 changed files with 1933 additions and 135 deletions


@ -1,141 +1,177 @@
# Current Task — Macro Normalize + Freeze (Save Point)
# Current Task — Freeze Polish (Concise)
Updated: 2025-09-20
Updated: 2025-09-21
## Today (Done)
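- Polishing Sprint (non-breaking, no spec changes)
  - Thinned main (bin → lib delegation) and tidied test imports
  - LLVM Python builder: unified on builders/* (fallback removed)
  - Exceptions in critical sections are now logged (gated by `NYASH_CLI_VERBOSE=1`)
  - Default output location for build artifacts unified to `tmp/` (tools/build_llvm.sh, tools/build_aot.sh, llvm_builder.py CLI)
  - Added an Execution Status section at the top of the README (Active/Inactive stated explicitly)
  - Added an Acceptance Checklist to DEV_QUICKSTART
  - Normalized the tone of comments inside lib (implementation notes limited to key points)
- Stabilized the PeekExpr → If-chain normalization
  - Detection on the macro side (IfMatchNormalize) now uniformly uses has_kind("PeekExpr").
  - PeekExpr can now be replaced by If on four paths: Local/Assignment/Return/Print.
  - Because the child-runner (PyVM) route was unstable, default execution switched to internal-child (built-in transform).
  - Implemented a safe path in the built-in transform (Rust) that normalizes a literal-only match into an If chain.
  - Golden now green: tools/test/golden/macro/match_literal_basic_user_macro_golden.sh
- Stronger runner diagnostics
  - Child process stderr is passed through to the parent. Error details are always shown on failure (so they are no longer hidden behind EOF).
- Stabilized JsonBuilder
  - Avoided a keyword collision (local → local_decl); call sites updated accordingly.
- Extended the LLVM PHI hygiene smoke (added If/Match)
  - Running it requires an LLVM build. The script is tools/test/smoke/llvm/ir_phi_hygiene_ifcases.sh.
- Scope hint design (no-op)
  - Added docs/guides/scope-hints.md (observation only for now).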
## Compressed Snapshot (Short)
- Strings (UTF-8/CP vs Byte): baseline done
  - [x] PyVM CP smokes (length/indexOf/lastIndexOf/substring)
  - [x] ASCII Byte smoke
  - [x] Rust CP gate (`NYASH_STR_CP=1`) for length/indexOf/lastIndexOf
  - [x] Docs: blueprint updated with CP gate
- MiniVM BinOp(+): stabilization in progress (going green via the safe path only)
  - [x] Removed global digit-sum fallbacks (removes the source of hangs)
  - [x] Added typed/token/value-pair probes (no spec change)
  - [x] Expression-bounded extractor (deterministically extracts the two `value` fields inside the `{…}` of Print.expression)
  - [x] Main.fastpath: for BinaryOp('+'), extract the two values early → add → return immediately
  - [x] PyVM: `__me__` dispatch (makes method calls within the same Box safe)
  - [x] PyVM: `String.substring` tolerates None arguments (None → default value)
  - [ ] Remove all me calls inside MiniVM (use function calls throughout) / guard the index before every substring call
  - [ ] Get the representative smoke (int+int=46) green (including avoiding the infinite loop in print_prints_in_slice)
- CI: keep min-gate light (MacroCtx/selfhost-preexpand/UTF-8/ScopeBox); all green
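Important operational changes
- User macros remain in use, but default execution is internal-child (built-in transform).
- NYASH_MACRO_BOX_CHILD_RUNNER defaults to OFF (turn ON only when needed).
- In the child environment, set NYASH_MACRO_ENABLE=0 and similar explicitly to suppress recursive initialization.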
This page is trimmed to reflect the active work only. The previous long form has been archived at `CURRENT_TASK_restored.md`.
## Delivered (Macro Platform)
- Built-in macros (Rust): derive(Equals/ToString) minimal, public-only + hygiene.
- User macros (Nyash/PyVM): Proxy → child process (NYASH_VM_USE_PY=1, plugins disabled, timeout). Strict=1 by default (fail on error/timeout; strict=0 → identity fallback).
- Dump/Trace: `--dump-expanded-ast-json`, `NYASH_MACRO_TRACE_JSONL=…`.
- Runner routes:
- Child mode (PoC): `--macro-expand-child <file>` (stdin JSON → stdout JSON).
- PyVM runner (recommended): dynamic runner includes macro and calls `MacroBoxSpec.expand(json)` once per pass.
- Templates & Tests:
- Templates: echo_macro (identity), upper_string_macro ("UPPER:" prefix → uppercase suffix).
- Array/Map examples: array_prepend_zero_macro (prepend 0 to any Array), map_insert_tag_macro (insert {"__macro":"on"} into any Map).
- Goldens (user macros): identity, upper_string, array_prepend_zero, array_empty, array_nested, array_mixed, map_insert_tag, map_multi, map_esc.
- Negative smokes: user macro timeout (strict fail), invalid JSON strict fail, invalid JSON non-strict → identity fallback.
- Capabilities (enforced): IO/NET are gated by a static scan of the macro's (Nyash) AST (strict = fail / non-strict = not registered, i.e. identity).
- MacroCtx (MVP): provides a minimal scaffold for gensym/report/getEnv.
- Selfhost pre-expansion (PyVM only): `NYASH_USE_NY_COMPILER=1` + `NYASH_MACRO_SELFHOST_PRE_EXPAND=1|auto` → AST pre-expansion → MIR → PyVM execution.
- CI: added the macro-golden job and selfhost-preexpand-smoke to min-gate.
- Docs: user-macros.md / ast-json-v0.md / capabilities.md (io/net/env).
Principles (freeze)
- Self-hosting first. Macro normalization pre-MIR; PyVM semantics are authoritative.
- New features are paused; allow only bug fixes, docs, smokes/goldens, CI polish.
- Keep changes minimal/local; no spec changes unless to fix critical issues.
## Delivered (LoopForm MVP1 Safe)
- LoopNormalize macro (Nyash runner) scaffolded and active.
- Canonicalize Loop node key order (condition → body).
- Safe body reorder: move Assignment nodes to tail only when original order already has non-assign → assign; otherwise keep order (no semantic change).
- JsonBuilder utility added for AST JSON v0 fragments (string-based minimal helpers).
- Golden comparison made key-order insensitive (JSON normalized via python) for macro tests.
- LLVM pre-expand run verified on loop_simple (PyVM → MIR → LLVM).
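The key-order-insensitive golden comparison mentioned above can be approximated as follows. This is a minimal sketch, not the repository's actual normalizer: the helper names `normalize_json` and `golden_matches` are hypothetical, and it assumes that sorting object keys is all the normalization needed.

```python
import json


def normalize_json(text: str) -> str:
    """Parse JSON and re-serialize it with sorted keys and fixed separators."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


def golden_matches(actual: str, expected: str) -> bool:
    """Compare two JSON documents while ignoring object key order."""
    return normalize_json(actual) == normalize_json(expected)


# Two spellings of the same (hypothetical) Loop node with different key order.
loop_a = '{"kind": "Loop", "condition": {"kind": "Literal"}, "body": []}'
loop_b = '{"body": [], "kind": "Loop", "condition": {"kind": "Literal"}}'
```

Array element order still matters under this scheme; only object keys are reordered.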
### Delta (since last update)
- Docs
- Added strings blueprint: `docs/blueprints/strings-utf8-byte.md`
- Refreshed docs index with clear "Start here" links (blueprints/strings, EBNF, strings reference)
- Clarified operator/loop sugar policy in `guides/language-core-and-sugar.md` ("!" adopted, do-while not adopted)
- CI/Smokes
- Added UTF-8 CP smoke (PyVM): `tools/test/smoke/strings/utf8_cp_smoke.sh` using `apps/tests/strings/utf8_cp_demo.nyash` (green)
- Wired into min-gate CI alongside MacroCtx smoke (green)
- Runtime (Rust)
- StringBox.length: CP/Byte gate via env `NYASH_STR_CP=1` (default remains byte length; freeze-safe)
- StringBox.indexOf/lastIndexOf: CP gate via env `NYASH_STR_CP=1` (default is byte index; PyVM uses CP behavior)
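To make the difference between code-point and byte semantics concrete, here is a small Python sketch of the gate. The Rust runtime is not shown in this diff, so the function names and the `env` parameter are illustrative assumptions; only the variable name `NYASH_STR_CP` and its default-off behavior come from the notes above.

```python
import os


def cp_gate_enabled(env=os.environ) -> bool:
    """Code-point semantics are opt-in: only NYASH_STR_CP=1 enables them."""
    return env.get("NYASH_STR_CP") == "1"


def str_length(s: str, env=os.environ) -> int:
    """Length in code points when gated on, otherwise in UTF-8 bytes."""
    return len(s) if cp_gate_enabled(env) else len(s.encode("utf-8"))


def str_index_of(s: str, needle: str, env=os.environ) -> int:
    """Index of needle as a code-point offset (gated) or a byte offset (default)."""
    if cp_gate_enabled(env):
        return s.find(needle)
    return s.encode("utf-8").find(needle.encode("utf-8"))
```

For the string `"a\u00e9"` the code-point length is 2 while the UTF-8 byte length is 3, so the two modes only agree on ASCII input.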
## Focus — Freeze & Polish
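1) Build and operate real applications (verify the macro pre-expansion / PyVM / LLVM lines)
2) Bug fixes, documentation, and stronger smokes/goldens/CI (no spec changes)
3) Better observability for selfhost pre-expansion (logs/smokes) and stable operation
4) Runtime capabilities (io/net/env): enforce on the PyVM side with minimal changes, only when it becomes necessary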
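Notes / Risks
- The current red is caused by a combination of two issues:
  1) Digit detection broke because of short-circuiting in `||` chains on the Nyash side (resolved by rewriting as an if chain).
  2) `me.*` calls within the same Box were unresolved in PyVM (addressed by introducing `__me__` dispatch).
  As side effects, `substring(None, …)` raised exceptions and `print_prints_in_slice` exceeded its step budget.
  The plan is to finish this off with: me removal + index guards + loop sentinels.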
## Polishing Sprint (non-breaking, minimal)
- [x] Thin bin entry (src/main.rs): remove duplicate `pub mod` list; use `nyash_rust::runner::NyashRunner` and friends.
- [x] Adjust main test imports to refer to `nyash_rust::box_trait::*`.
- [x] Add debug logs in Python LLVM builder for previously silent exceptions (gated by `NYASH_CLI_VERBOSE=1`).
- [x] LLVM builder delegated only (builders/*); legacy fallback removed with clear debug on failure.
- [x] Default outputs unified to `tmp/` (tools/build_llvm.sh, tools/build_aot.sh, llvm_builder.py CLI default).
- [x] No behavior change: keep LLVM/PHI invariants and outputs semantics as-is.
### Design Decision: Strings & Delegation
- Keep `StringBox` as the canonical text type (UTF-8 / Code Point semantics). Do NOT introduce a separate "AnsiStringBox".
- Separate bytes explicitly via `ByteCursorBox`/buffer types (byte semantics only).
- Unify operations through a cursor/strategy interface (length/indexOf/lastIndexOf/substring). `StringBox` delegates to `Utf8Cursor` internally; byte paths use `ByteCursor` explicitly.
- Gate for transition (Rust only): `NYASH_STR_CP=1` enables CP semantics where legacy byte behavior exists.
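The delegation described in this design decision can be pictured with the sketch below. The real boxes are Nyash code (`apps/libs/utf8_cursor.nyash`, `apps/libs/byte_cursor.nyash`) that is not shown in this diff, so the Python classes and method names here are illustrative stand-ins rather than the actual API.

```python
class Utf8Cursor:
    """Code-point semantics over a Python str."""

    def __init__(self, text: str):
        self._text = text

    def length(self) -> int:
        return len(self._text)

    def index_of(self, needle: str, start: int = 0) -> int:
        return self._text.find(needle, start)

    def substring(self, start: int, end: int) -> str:
        return self._text[start:end]


class ByteCursor:
    """Byte semantics over raw bytes; used explicitly, never implicitly."""

    def __init__(self, data: bytes):
        self._data = data

    def length(self) -> int:
        return len(self._data)

    def index_of(self, needle: bytes, start: int = 0) -> int:
        return self._data.find(needle, start)

    def substring(self, start: int, end: int) -> bytes:
        return self._data[start:end]


class StringBox:
    """Canonical text type: every operation delegates to a Utf8Cursor."""

    def __init__(self, text: str):
        self._cursor = Utf8Cursor(text)

    def length(self) -> int:
        return self._cursor.length()

    def index_of(self, needle: str, start: int = 0) -> int:
        return self._cursor.index_of(needle, start)

    def substring(self, start: int, end: int) -> str:
        return self._cursor.substring(start, end)
```

Keeping the byte path in a separate type means a caller has to opt in to byte offsets, which is the mix-up the decision is trying to prevent.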
## Next Milestones
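- DONE: Selfhost pre-expansion on by default (auto)
  - Change: when `NYASH_MACRO_SELFHOST_PRE_EXPAND` is unset, it turns ON automatically if macros are enabled and `NYASH_VM_USE_PY=1` (with safeguards).
  - Added: the `--macro-preexpand-auto` flag.
- DONE: Environment variable cleanup (CLI-centered workflow)
  - Added: `--macro-top-level-allow`, `--macro-profile {dev|ci-fast|strict}` (a non-breaking, simple mapping).
- DONE: Top-level allow now defaults to OFF, with detection unified on the AST (file-name heuristic removed)
- DONE: MacroCtx contract PoC: the runner prefers `expand(json, ctx)` and falls back to `expand(json)` on failure.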
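The MacroCtx contract in the last item (try the two-argument form first, fall back to the one-argument form) might look like this in outline. The runner itself is not part of this diff; `call_expand` and the two example macro classes are hypothetical, and treating any exception as "failure" is an assumption.

```python
def call_expand(macro, ast_json: str, ctx_json: str) -> str:
    """Prefer expand(json, ctx); on any failure fall back to expand(json)."""
    try:
        return macro.expand(ast_json, ctx_json)
    except Exception:
        return macro.expand(ast_json)


class LegacyMacro:
    """Older macro that only knows the one-argument form."""

    def expand(self, ast_json):
        return ast_json


class CtxAwareMacro:
    """Newer macro that accepts the ctx JSON as a second argument."""

    def expand(self, ast_json, ctx_json):
        return f"{ast_json} with ctx {ctx_json}"
```

One consequence of a blanket fallback is that a ctx-aware macro that raises for an unrelated reason is silently retried without ctx, so a stricter runner might want to distinguish arity errors from other failures.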
## Implementation Order (1-2 days)
Next (short)
- Extend Match normalization (including guards) to the built-in transform as well (If chain) + add golden/smoke
- DONE: LoopForm MVP2: while → carrier normalization (no break/continue, at most 2 variables)
  - Implemented guarded tail alignment in the built-in transform (Rust, internal-child): non-assignment → assignment
  - Added an output-match smoke for two-vars (tools/test/smoke/macro/loop_two_vars_output_smoke.sh)
- DONE: LoopForm MVP3: minimal break/continue support (segment alignment)
  - The body is split at control statements, and the safe non-assignment → assignment ordering is applied only within each segment.
  - Smoke: tools/test/smoke/macro/loopform_continue_break_output_smoke.sh
- DONE: for/foreach normalization (promoted to the core normalization pass)
  - Forms: `for(fn(){init}, cond, fn(){step}, fn(){body})`, `foreach(arr, "x", fn(){body})`
  - Output smoke: tools/test/smoke/macro/for_foreach_output_smoke.sh (for: 0,1,2 / foreach: 1,2,3)
- MacroCtx PoC (enable passing ctx through the child-runner route)
  - ctx JSON: `{ "caps": { "io|net|env": bool } }`
  - Example macro: `apps/macros/examples/macro_ctx_demo.nyash` (identity; does not use stdout)
  - Docs: added a MacroCtx section to guides/macro-system.md
- Add goldens (pin down normalization results)
  - expanded.json and comparison scripts for for_basic / foreach_basic
  - loop_nonreorder (non-reorder path: a non-assignment follows an assignment) → confirm the transform is skipped
- LoopForm MVP3: break/continue minimal handling (single-level)
- for/foreach pre-desugaring → LoopForm normalization (limited)
- LLVM IR hygiene for LoopForm / If / Match: PHI at block head, no empty PHIs (smoke)
- Docs: enrich `docs/guides/loopform.md` with carrier examples and JSON builder snippets.
- If/Match normalization pass: canonical If join with single PHI group and Match → If-chain (scrutinee once, guard fused), expression results via join var.
- ScopeBox (compile-time meta): design + docs; no-op macro scaffold; MIR hint names (no-op) and plan for zero-cost stripping.
- ControlFlowBuilder/PatternBuilder docs and scaffolding: author APIs for If/Match normalization and pattern conditions; migrate macros to use them.
1) Strings CP/Byte baseline (foundation)
- [x] CP smoke (PyVM): length/indexOf/lastIndexOf/substring (apps/tests/strings/utf8_cp_demo.nyash)
- [x] ASCII byte smoke (PyVM): length/indexOf/substring (apps/tests/strings/byte_ascii_demo.nyash)
- [x] Rust CP gate: length/indexOf/lastIndexOf → CP semantics with `NYASH_STR_CP=1` (default is byte)
- [x] Docs note: CP gate env (`NYASH_STR_CP=1`) documented in the strings blueprint
Action Items (next 48h)
- [x] Enable sugar by default (array/map literals)
- [x] Golden normalizer (key-order insensitive) for macro tests
- [x] Loop simple/two-vars goldens with normalization
- [x] Match guard: smoke (no PeekExpr)
- [x] Match guard: golden (literal OR, minimal form)
- [ ] Match guard: additional golden (type, minimal form, no-Box configuration)
- [x] Smoke for guard/type match normalization (no PeekExpr; If present)
- [x] LoopForm MVP2: two-vars carrier safe normalization + tests/smokes
- [x] LLVM PHI hygiene smoke on LoopForm cases
- [x] LLVM PHI hygiene smoke on If cases
- [ ] ScopeBox docs + macro scaffold (no-op) + MIR hint type sketch
- [ ] ControlFlowBuilder/PatternBuilder docs (added in this commit) → scaffold implementation → the first If/Match macro replacement
- [x] Reorganize macro tests under apps/tests/macro/* and update golden/smoke paths
- [x] Add MIR hints module (no-op sink) and loop header/latch hint calls
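2) MiniVM BinOp(+) stabilization (Stage B, internal replacement)
- [x] Removed the broad fallback (summing every number) for safety
- [x] Introduced local typed/token/value-pair probes (non-breaking)
- [x] Within the expression boundary (the `{…}` of Print.expression), reliably extract the two `value` fields → add (deterministic)
- [x] Added Main.fastpath (decide up front → return immediately)
- [x] PyVM: added `__me__` dispatch / `substring` None guard
- [ ] MiniVM: remove all me calls (use function calls throughout)
- [ ] Enforce an index>=0 guard before every `substring` call; add a sentinel to `print_prints_in_slice`
- [ ] Get the representative smoke (int+int=46) green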
## Phase-16 Outlook
- MacroCtx (gensym/report/getEnv) and capabilities mapped to `nyash.toml`.
- Attribute-style (@derive/@lint) macros consolidated via MacroBox.
- span/source map for better diagnostics.
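3) JSON loader separation (wiring only; compatibility preserved)
- [ ] Route everything through MiniJsonLoader (thin wrapper) so it can later be swapped for `apps/libs/json_cur.nyash`
- [ ] Keep smokes as they are (green with the existing stdin/argv input)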
## Final Goal (Replacement Strategy)
- Ultimately replace both the Rust front (beyond minimal bootstrap/backends) and PyVM with Nyash implementations, giving a fully self-hosted pipeline.
- Steps: front macros → resolver helpers → MIR lowering helpers → Nyash VM → phase out PyVM → reduce Rust to boot/backends only.
4) Nyash box delegation (internal, phased rollout)
- [ ] StringBox public API → delegate to Utf8CursorBox (length/indexOf/substring) while preserving compatibility
- [ ] Byte paths use ByteCursorBox explicitly (to avoid mix-ups)
## Quick Reference
- Profiles: `--profile {lite|dev|ci|strict}` (dev-equivalent is the default mode)
- Register macros: `NYASH_MACRO_PATHS=apps/macros/examples/echo_macro.nyash`
- Strict/Timeout: `NYASH_MACRO_STRICT=1` (default), `NYASH_NY_COMPILER_TIMEOUT_MS=2000`
- Dump expanded AST: `nyash --dump-expanded-ast-json <file.nyash>`
- Selfhost pre-expansion: on by default (auto; PyVM only)
- Docs: `docs/guides/user-macros.md`, `docs/guides/macro-profiles.md`, `docs/reference/ir/ast-json-v0.md`, `docs/reference/macro/capabilities.md`
5) CI/Docs polish (keep it lightweight)
- [x] Added the CP gate to README/blueprint
- [ ] Keep min-gate lightweight (MacroCtx / pre-expansion / UTF-8 / Scope only)
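## MiniVM (Stage B in progress)
Goal: grow a tiny VM written in Nyash (MiniVM) in stages and gradually reduce the dependency on PyVM. First, stabilize the minimal "read → interpret → output" pipeline.
Current status (achieved)
- stdin loader (gated): `NYASH_MINIVM_READ_STDIN=1`
- Minimal implementation and smokes for Print(Literal/FunctionCall) and BinaryOp("+")
- Six Compare operators (<, <=, >, >=, ==, !=) and strict extraction for int+int
- If (literal condition): one-sided branch traversal
Next (in order)
0) BinOp(+): me removal / thorough index guards / print_prints sentinel (hang prevention)
1) Separate the JSON loader (prepare to adopt `apps/libs/json_cur.nyash`)
2) Add 1-2 representative if/loop smokes (output must match PyVM)
3) Migrate MiniVM to function form (phased replacement starting from the top-level MVP)
Acceptance (Stage B)
- Print/branching works correctly on JSON input supplied via stdin/argv (smokes green)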
## UTF-8 Plan (UTF-8 First, Bytes Separate)
Goal: delegate String's public API to a UTF-8 cursor to make string handling consistent and observable (performance optimization comes later).
Current status
- Docs: `docs/reference/language/strings.md`
- MVP Box: `apps/libs/utf8_cursor.nyash`, `apps/libs/byte_cursor.nyash`
Phased rollout (internal replacement only)
1) Gradually delegate StringBox's public API to `Utf8CursorBox` (`length/indexOf/substring`)
2) Replace ad hoc scanning inside MiniVM/macros with `Utf8CursorBox`/`ByteCursorBox` (functionally equivalent, internal only)
3) Update docs/smokes (output unchanged; add observation logs only when needed)
Basic boxes for Nyash scripts (standard libs)
- Existing: `json_cur.nyash`, `string_ext.nyash`, `array_ext.nyash`, `string_builder.nyash`, `test_assert.nyash`, `utf8_cursor.nyash`, `byte_cursor.nyash`
- Candidates to add (freeze-compliant: under libs, opt-in, compatibility preserved)
  - MapExtBox (keys/values/entries)
  - PathBox mini (minimal dirname/join)
  - PrintfExt (helper for `StringBuilderBox`)
## CI/Gates — Green vs Pending
Always-on (expected: green)
- rust-check: `cargo check --all-targets`
- pyvm-smoke: `tools/pyvm_stage2_smoke.sh`
- macro-golden: identity/strings/array/map/loopform (key-order insensitive)
- macro-smokes-lite:
  - match guard (literal OR / type minimal)
  - MIR hints (Scope/Join/Loop)
  - ScopeBox (no-op)
  - MacroCtx ctx JSON (added)
- selfhost-preexpand-smoke: upper_string (confirms auto engage)
Pending / Skipped (not yet introduced / optional)
- Match guard: additional golden (type, minimal form)
- LoopForm: observation smoke for break/continue lowering
- MiniVM Stage B: JSON loader separation + representative if/loop smokes
- MiniVM Stage B: get BinOp(+) green (me removal + index guards + sentinels)
- UTF-8 delegation: phased replacement StringBox → Utf8CursorBox (internal only; gated)
- UTF-8 CP gate (Rust): indexOf/lastIndexOf env-gated CP semantics (default OFF)
- LLVM heavy tests: manual/optional jobs only (always skipped)
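## 80/20 Plan (small, high-impact)
Checklist (updated)
- [x] One fixed smoke for selfhost pre-expansion (upper_string)
- [x] One MacroCtx ctx JSON smoke (wired into CI)
- [ ] Match normalization: keep the current tests for now (add more only when needed)
- [x] Added a profile usage guide (`--profile dev|lite`)
- [ ] LLVM heavy tests stay always skipped (manual/optional jobs only)
- [ ] Clean up warnings in one pass at the next refactor (non-breaking this time)
Acceptance
- The two smokes above (pre-expand/MacroCtx) are always green; existing smokes/goldens are green
- Profile documentation is reflected in README/guides
- UTF-8 CP smoke is green under PyVM; Rust CP gate remains opt-in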
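## Self-Hosting: Stage A (summary)
Scope (minimal)
- JSON v0 loader (mini set) / MiniVM (minimal instructions) / two smokes (print / if)
Progress
- [x] MiniVM MVP (print literal / if branch)
- [x] PyVM P1: added `String.indexOf`
- [x] Unified entry (`Main.main`) / nested function lifting
Next (clean path)
- [ ] MiniVM: phased migration to function form + a simple JSON loader
- [ ] Docs: reflect this in invariants/constraints/testing-matrix
### Guardrails (active)
- Reference execution: PyVM is always green; macro normalization runs exactly once, pre-MIR
- Pre-expansion: `NYASH_MACRO_SELFHOST_PRE_EXPAND=auto` (dev/CI)
- Tests: keep VM/goldens lightweight; IR runs as an optional job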

File diff suppressed because it is too large


@ -39,14 +39,67 @@ class Function:
class PyVM:
    def __init__(self, program: Dict[str, Any]):
        self.functions: Dict[str, Function] = {}
        self._debug = os.environ.get('NYASH_PYVM_DEBUG') in ('1','true','on')
        for f in program.get("functions", []):
            name = f.get("name")
            params = [int(p) for p in f.get("params", [])]
            bmap: Dict[int, Block] = {}
            for bb in f.get("blocks", []):
                bmap[int(bb.get("id"))] = Block(id=int(bb.get("id")), instructions=list(bb.get("instructions", [])))
            # Register each function inside the loop (bugfix)
            self.functions[name] = Function(name=name, params=params, blocks=bmap)

    def _dbg(self, *a):
        if self._debug:
            try:
                import sys as _sys
                print(*a, file=_sys.stderr)
            except Exception:
                pass

    def _type_name(self, v: Any) -> str:
        """Pretty type name for debug traces mapped to MIR conventions."""
        if v is None:
            return "null"
        if isinstance(v, bool):
            # Booleans are encoded as i64 0/1 in MIR
            return "i64"
        if isinstance(v, int):
            return "i64"
        if isinstance(v, float):
            return "f64"
        if isinstance(v, str):
            return "string"
        if isinstance(v, dict) and "__box__" in v:
            return f"Box({v.get('__box__')})"
        return type(v).__name__

    # --- Capability helpers (macro sandbox) ---
    def _macro_sandbox_active(self) -> bool:
        """Detect if we are running under macro sandbox.
        Heuristics:
        - Explicit flag NYASH_MACRO_SANDBOX=1
        - Macro child default envs (plugins off + macro off)
        - Any MACRO_CAP_* enabled
        """
        if os.environ.get("NYASH_MACRO_SANDBOX", "0") in ("1", "true", "on"):
            return True
        if os.environ.get("NYASH_DISABLE_PLUGINS") in ("1", "true", "on") and os.environ.get("NYASH_MACRO_ENABLE") in ("0", "false", "off"):
            return True
        if self._cap_env() or self._cap_io() or self._cap_net():
            return True
        return False

    def _cap_env(self) -> bool:
        return os.environ.get("NYASH_MACRO_CAP_ENV", "0") in ("1", "true", "on")

    def _cap_io(self) -> bool:
        return os.environ.get("NYASH_MACRO_CAP_IO", "0") in ("1", "true", "on")

    def _cap_net(self) -> bool:
        return os.environ.get("NYASH_MACRO_CAP_NET", "0") in ("1", "true", "on")

    def _read(self, regs: Dict[int, Any], v: Optional[int]) -> Any:
        if v is None:
            return None
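The sandbox detection heuristics in this hunk are easier to check when written as a pure function over an environment mapping. The following is a sketch that mirrors the three rules above; `sandbox_active` and the `TRUTHY`/`FALSY` constants are names introduced here for illustration, not part of the PyVM.

```python
TRUTHY = ("1", "true", "on")
FALSY = ("0", "false", "off")


def sandbox_active(env: dict) -> bool:
    """Mirror of the three heuristics, taking the environment as a plain dict."""
    # Rule 1: explicit flag.
    if env.get("NYASH_MACRO_SANDBOX", "0") in TRUTHY:
        return True
    # Rule 2: macro-child defaults (plugins disabled and macros disabled).
    if env.get("NYASH_DISABLE_PLUGINS") in TRUTHY and env.get("NYASH_MACRO_ENABLE") in FALSY:
        return True
    # Rule 3: any capability switch being on also implies the sandbox.
    return any(
        env.get(f"NYASH_MACRO_CAP_{cap}", "0") in TRUTHY
        for cap in ("ENV", "IO", "NET")
    )
```

Note the third rule: granting a capability such as `NYASH_MACRO_CAP_IO=1` does not lift the sandbox, it turns it on with that one capability allowed.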
@ -69,13 +122,61 @@ class PyVM:
    def _is_console(self, v: Any) -> bool:
        return isinstance(v, dict) and v.get("__box__") == "ConsoleBox"

    def _sandbox_allow_newbox(self, box_type: str) -> bool:
        """Allow-list for constructing boxes under macro sandbox."""
        if not self._macro_sandbox_active():
            return True
        if box_type in ("ConsoleBox", "StringBox", "ArrayBox", "MapBox"):
            return True
        if box_type in ("FileBox", "PathBox", "DirBox"):
            return self._cap_io()
        # Simple net-related boxes
        if box_type in ("HTTPBox", "HttpBox", "SocketBox"):
            return self._cap_net()
        # Unknown boxes are denied in sandbox
        return False

    def _sandbox_allow_boxcall(self, recv: Any, method: Optional[str]) -> bool:
        if not self._macro_sandbox_active():
            return True
        # Console methods are fine
        if self._is_console(recv):
            return True
        # String methods (our VM treats StringBox receiver as Python str)
        if isinstance(recv, str):
            return method in ("length", "substring", "lastIndexOf", "indexOf")
        # File/Path/Dir need IO cap
        if isinstance(recv, dict) and recv.get("__box__") in ("FileBox", "PathBox", "DirBox"):
            return self._cap_io()
        # Other boxes are denied in sandbox
        return False

    def run(self, entry: str) -> Any:
        fn = self.functions.get(entry)
        if fn is None:
            raise RuntimeError(f"entry function not found: {entry}")
        self._dbg(f"[pyvm] run entry={entry}")
        return self._exec_function(fn, [])

    def run_args(self, entry: str, args: list[Any]) -> Any:
        fn = self.functions.get(entry)
        if fn is None:
            raise RuntimeError(f"entry function not found: {entry}")
        self._dbg(f"[pyvm] run entry={entry} argv={args}")
        call_args = list(args)
        # If entry is a typical main (main / *.main), pack argv into an ArrayBox-like value
        # to match Nyash's `main(args)` convention regardless of param count.
        try:
            if entry == 'main' or entry.endswith('.main'):
                call_args = [{"__box__": "ArrayBox", "__arr": list(args)}]
            elif fn.params and len(fn.params) == 1:
                call_args = [{"__box__": "ArrayBox", "__arr": list(args)}]
        except Exception:
            pass
        return self._exec_function(fn, call_args)

    def _exec_function(self, fn: Function, args: List[Any]) -> Any:
        self._dbg(f"[pyvm] call {fn.name} args={args}")
        # Intrinsic fast path for small helpers used in smokes
        ok, ret = self._try_intrinsic(fn.name, args)
        if ok:
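The two allow-lists in this hunk can be exercised in isolation with the sketch below. It restates the same rules as free functions with the sandbox state and capabilities passed in explicitly; the function names and parameters are assumptions made for the example.

```python
SAFE_BOXES = ("ConsoleBox", "StringBox", "ArrayBox", "MapBox")
IO_BOXES = ("FileBox", "PathBox", "DirBox")
NET_BOXES = ("HTTPBox", "HttpBox", "SocketBox")
STRING_METHODS = ("length", "substring", "lastIndexOf", "indexOf")


def allow_newbox(box_type, sandbox, cap_io=False, cap_net=False):
    """May a box of this type be constructed?"""
    if not sandbox:
        return True
    if box_type in SAFE_BOXES:
        return True
    if box_type in IO_BOXES:
        return cap_io
    if box_type in NET_BOXES:
        return cap_net
    return False  # unknown boxes are denied under the sandbox


def allow_boxcall(recv, method, sandbox, cap_io=False):
    """May this method be called on this receiver?"""
    if not sandbox:
        return True
    if isinstance(recv, dict) and recv.get("__box__") == "ConsoleBox":
        return True
    if isinstance(recv, str):
        return method in STRING_METHODS
    if isinstance(recv, dict) and recv.get("__box__") in IO_BOXES:
        return cap_io
    return False
```

Read this way, the rules as written allow an `ArrayBox` or `MapBox` to be constructed under the sandbox but deny every method call on it, since the boxcall list has no entry for them. Whether that asymmetry is intended is not stated in the commit.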
@ -344,7 +445,10 @@ class PyVM:
                if op == "newbox":
                    btype = inst.get("type")
                    if btype == "ConsoleBox":
                    # Sandbox gate: only allow minimal boxes when sandbox is active
                    if not self._sandbox_allow_newbox(str(btype)):
                        val = {"__box__": str(btype), "__denied__": True}
                    elif btype == "ConsoleBox":
                        val = {"__box__": "ConsoleBox"}
                    elif btype == "StringBox":
                        # empty string instance
@ -371,6 +475,85 @@ class PyVM:
                    method = inst.get("method")
                    args = [self._read(regs, a) for a in inst.get("args", [])]
                    out: Any = None
                    self._dbg(f"[pyvm] boxcall recv={recv} method={method} args={args}")
                    # Sandbox gate: disallow unsafe/unknown boxcalls
                    if not self._sandbox_allow_boxcall(recv, method):
                        self._set(regs, inst.get("dst"), out)
                        i += 1
                        continue
                    # Special-case: inside a method body, 'me.method(...)' lowers to a
                    # boxcall with a synthetic receiver marker '__me__'. Resolve it by
                    # dispatching to the current box's lowered function if available.
                    if isinstance(recv, str) and recv == "__me__" and isinstance(method, str):
                        # Derive box name from current function (e.g., 'MiniVm.foo/2' -> 'MiniVm')
                        box_name = ""
                        try:
                            if "." in fn.name:
                                box_name = fn.name.split(".")[0]
                        except Exception:
                            box_name = ""
                        if box_name:
                            cand = f"{box_name}.{method}/{len(args)}"
                            callee = self.functions.get(cand)
                            if callee is not None:
                                self._dbg(f"[pyvm] boxcall(__me__) -> {cand} args={args}")
                                out = self._exec_function(callee, args)
                                self._set(regs, inst.get("dst"), out)
                                i += 1
                                continue
                    # Fast-path: built-in ArrayBox minimal methods (avoid noisy unresolved logs)
                    if isinstance(recv, dict) and recv.get("__box__") == "ArrayBox":
                        arr = recv.get("__arr", [])
                        if method in ("len", "size"):
                            out = len(arr)
                        elif method == "get":
                            idx = int(args[0]) if args else 0
                            out = arr[idx] if 0 <= idx < len(arr) else None
                        elif method == "set":
                            idx = int(args[0]) if len(args) > 0 else 0
                            val = args[1] if len(args) > 1 else None
                            if 0 <= idx < len(arr):
                                arr[idx] = val
                            elif idx == len(arr):
                                arr.append(val)
                            else:
                                while len(arr) < idx:
                                    arr.append(None)
                                arr.append(val)
                            out = 0
                        elif method == "push":
                            val = args[0] if args else None
                            arr.append(val)
                            out = len(arr)
                        elif method == "toString":
                            out = "[" + ",".join(str(x) for x in arr) + "]"
                        else:
                            out = None
                        recv["__arr"] = arr
                        self._set(regs, inst.get("dst"), out)
                        i += 1
                        continue
                    # User-defined box: dispatch to lowered function if available (Box.method/N)
                    if isinstance(recv, dict) and isinstance(method, str) and "__box__" in recv:
                        box_name = recv.get("__box__")
                        cand = f"{box_name}.{method}/{len(args)}"
                        callee = self.functions.get(cand)
                        if callee is not None:
                            self._dbg(f"[pyvm] boxcall dispatch -> {cand} args={args}")
                            out = self._exec_function(callee, args)
                            self._set(regs, inst.get("dst"), out)
                            i += 1
                            continue
                        else:
                            if self._debug:
                                prefix = f"{box_name}.{method}/"
                                cands = sorted([k for k in self.functions.keys() if k.startswith(prefix)])
                                if cands:
                                    self._dbg(f"[pyvm] boxcall unresolved: '{cand}' — available: {cands}")
                                else:
                                    any_for_box = sorted([k for k in self.functions.keys() if k.startswith(f"{box_name}.")])
                                    self._dbg(f"[pyvm] boxcall unresolved: '{cand}' — no candidates; methods for {box_name}: {any_for_box}")
                    # ConsoleBox methods
                    if method in ("print", "println", "log") and self._is_console(recv):
                        s = args[0] if args else ""
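The `__me__` dispatch above boils down to a name lookup: take the box name from the function currently executing and look for `Box.method/arity` in the function table. A minimal standalone sketch of that lookup follows; `resolve_me_call` is a name introduced for this example.

```python
from typing import Optional


def resolve_me_call(current_fn: str, method: str, argc: int, functions) -> Optional[str]:
    """Return the lowered function name for me.method(...), or None if unresolved."""
    if "." not in current_fn:
        return None  # not inside a box method, so there is no box to dispatch on
    box_name = current_fn.split(".")[0]  # 'MiniVm.foo/2' -> 'MiniVm'
    candidate = f"{box_name}.{method}/{argc}"
    return candidate if candidate in functions else None


# Hypothetical function table keyed by lowered name.
table = {"MiniVm.main/1": object(), "MiniVm.read_digits/2": object()}
```

When the lookup fails, the VM code above does not raise; execution falls through to the later receiver checks, so an unresolved `me` call ends up on the generic paths.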
@ -494,13 +677,33 @@ class PyVM:
                            out = len(str(recv))
                        elif method == "substring":
                            s = str(recv)
                            start = int(args[0]) if len(args) > 0 else 0
                            end = int(args[1]) if len(args) > 1 else len(s)
                            start = int(args[0]) if (len(args) > 0 and args[0] is not None) else 0
                            end = int(args[1]) if (len(args) > 1 and args[1] is not None) else len(s)
                            out = s[start:end]
                        elif method == "lastIndexOf":
                            s = str(recv)
                            needle = str(args[0]) if args else ""
                            out = s.rfind(needle)
                            # Optional start index (ignored by many call sites; support if provided)
                            if len(args) > 1 and args[1] is not None:
                                try:
                                    start = int(args[1])
                                except Exception:
                                    start = 0
                                out = s.rfind(needle, start)
                            else:
                                out = s.rfind(needle)
                        elif method == "indexOf":
                            s = str(recv)
                            needle = str(args[0]) if args else ""
                            # Support optional start index: indexOf(needle, start)
                            if len(args) > 1 and args[1] is not None:
                                try:
                                    start = int(args[1])
                                except Exception:
                                    start = 0
                                out = s.find(needle, start)
                            else:
                                out = s.find(needle)
                        else:
                            # Unimplemented method -> no-op
                            out = None
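The string method changes in this hunk can be tried out with the standalone sketch below. It follows the same Python built-ins (`str.find`, `str.rfind`, slicing) but simplifies the error handling: the VM falls back to a start of 0 when `int()` fails, which this sketch omits. The function names are illustrative.

```python
def substring(s: str, start=None, end=None) -> str:
    """None for either bound falls back to the default (0 or len(s))."""
    lo = int(start) if start is not None else 0
    hi = int(end) if end is not None else len(s)
    return s[lo:hi]


def index_of(s: str, needle: str, start=None) -> int:
    """First occurrence at or after start, or -1."""
    return s.find(needle, int(start)) if start is not None else s.find(needle)


def last_index_of(s: str, needle: str, start=None) -> int:
    """Last occurrence within s[start:], or -1 (Python rfind semantics)."""
    return s.rfind(needle, int(start)) if start is not None else s.rfind(needle)
```

One point worth checking against the intended Nyash semantics: `str.rfind(needle, start)` restricts the search to `s[start:]`, so `last_index_of("abcabc", "b", 2)` gives 4. Java's `lastIndexOf(str, fromIndex)` searches backward from `fromIndex` and would give 1 for the same call. The commit does not say which behavior is wanted.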
@ -512,6 +715,7 @@ class PyVM:
                    func = inst.get("func")
                    args = [self._read(regs, a) for a in inst.get("args", [])]
                    out: Any = None
                    self._dbg(f"[pyvm] externcall func={func} args={args}")
                    # Normalize known console/debug externs
                    if isinstance(func, str):
                        if func in ("nyash.console.println", "nyash.console.log", "env.console.log"):
@ -531,6 +735,10 @@ class PyVM:
                            except Exception:
                                print(str(s))
                            out = 0
                        else:
                            # Macro sandbox: disallow unknown externcall unless explicitly whitelisted by future caps
                            # (currently no IO/NET externs are allowed in macro child)
                            out = 0
                    # Unknown extern -> no-op with 0/None
                    self._set(regs, inst.get("dst"), out)
                    i += 1
@ -542,6 +750,7 @@ class PyVM:
                    eid = int(inst.get("else"))
                    prev = cur
                    cur = tid if self._truthy(cond) else eid
                    self._dbg(f"[pyvm] branch cond={cond} -> next={cur}")
                    # Restart execution at next block
                    break
@ -549,10 +758,13 @@ class PyVM:
                    tgt = int(inst.get("target"))
                    prev = cur
                    cur = tgt
                    self._dbg(f"[pyvm] jump -> {cur}")
                    break
                if op == "ret":
                    v = self._read(regs, inst.get("value"))
                    if self._debug:
                        self._dbg(f"[pyvm] ret {self._type_name(v)} value={v}")
                    return v
                if op == "call":
if op == "call":
@ -564,9 +776,30 @@ class PyVM:
                    fname = fval if isinstance(fval, str) else None
                    call_args = [self._read(regs, a) for a in inst.get("args", [])]
                    result = None
                    if isinstance(fname, str) and fname in self.functions:
                        callee = self.functions[fname]
                        result = self._exec_function(callee, call_args)
                    if isinstance(fname, str):
                        # Direct hit
                        if fname in self.functions:
                            callee = self.functions[fname]
                            self._dbg(f"[pyvm] call -> {fname} args={call_args}")
                            result = self._exec_function(callee, call_args)
                        else:
                            # Heuristic resolution: match suffix ".name/arity"
                            arity = len(call_args)
                            suffix = f".{fname}/{arity}"
                            candidates = [k for k in self.functions.keys() if k.endswith(suffix)]
                            if len(candidates) == 1:
                                callee = self.functions[candidates[0]]
                                self._dbg(f"[pyvm] call -> {candidates[0]} args={call_args}")
                                result = self._exec_function(callee, call_args)
                            elif self._debug and len(candidates) > 1:
                                self._dbg(f"[pyvm] call unresolved: '{fname}'/{arity} has multiple candidates: {candidates}")
                            elif self._debug:
                                # Suggest close candidates across arities using suffix ".name/"
                                any_cands = sorted([k for k in self.functions.keys() if k.endswith(f".{fname}/") or f".{fname}/" in k])
                                if any_cands:
                                    self._dbg(f"[pyvm] call unresolved: '{fname}'/{arity} — available: {any_cands}")
                                else:
                                    self._dbg(f"[pyvm] call unresolved: '{fname}'/{arity} not found")
                    # Store result if needed
                    self._set(regs, inst.get("dst"), result)
                    i += 1
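The call resolution added here has two stages: an exact name match, then a suffix match on `.name/arity` that is accepted only when it is unique. The sketch below isolates that decision; `resolve_call` and the sample table are introduced for illustration.

```python
from typing import Iterable, Optional


def resolve_call(functions: Iterable[str], fname: str, arity: int) -> Optional[str]:
    """Exact match first; otherwise a unique '.name/arity' suffix match; else None."""
    names = list(functions)
    if fname in names:
        return fname
    suffix = f".{fname}/{arity}"
    candidates = [k for k in names if k.endswith(suffix)]
    # Ambiguity is treated the same as "not found": no call is made.
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical function table.
table = ["Main.main/1", "MiniVm.read_digits/2", "MiniVm.helper/2", "Other.helper/2"]
```

As in the VM code, an unresolved or ambiguous call is not an error: the destination register simply receives `None`, and the reason is only visible with `NYASH_PYVM_DEBUG` enabled.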
@ -592,6 +825,27 @@ class PyVM:
                else:
                    out.append(ch)
            return True, "".join(out)
        if name == "MiniVm.read_digits/2":
            s = "" if not args or args[0] is None else str(args[0])
            pos = 0 if len(args) < 2 or args[1] is None else int(args[1])
            out_chars = []
            while pos < len(s):
                ch = s[pos]
                if '0' <= ch <= '9':
                    out_chars.append(ch)
                    pos += 1
                else:
                    break
            return True, "".join(out_chars)
        if name == "MiniVm.parse_first_int/1":
            js = "" if not args or args[0] is None else str(args[0])
            key = '"value":{"type":"int","value":'
            idx = js.rfind(key)
            if idx < 0:
                return True, "0"
            start = idx + len(key)
            ok, digits = self._try_intrinsic("MiniVm.read_digits/2", [js, start])
            return True, digits
        if name == "Main.dirname/1":
            p = "" if not args else ("" if args[0] is None else str(args[0]))
            d = os.path.dirname(p)
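The two MiniVm intrinsics added in this hunk can be reproduced outside the VM as plain functions, which makes their edge cases easy to see. This sketch drops the `(ok, value)` tuple convention used by `_try_intrinsic`; the names `read_digits`, `parse_first_int` and `INT_VALUE_KEY` are chosen for the example.

```python
INT_VALUE_KEY = '"value":{"type":"int","value":'


def read_digits(s: str, pos: int = 0) -> str:
    """Collect consecutive ASCII digits starting at pos."""
    out_chars = []
    while pos < len(s):
        ch = s[pos]
        if "0" <= ch <= "9":
            out_chars.append(ch)
            pos += 1
        else:
            break
    return "".join(out_chars)


def parse_first_int(js: str) -> str:
    """Digits following the LAST occurrence of the key (rfind), or "0" if absent."""
    idx = js.rfind(INT_VALUE_KEY)
    if idx < 0:
        return "0"
    return read_digits(js, idx + len(INT_VALUE_KEY))
```

Two behaviors follow from the code as written: despite the name `parse_first_int`, the use of `rfind` selects the last matching literal in the input, and when the key is present but not followed by digits the result is an empty string rather than `"0"`.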