builder+vm: unify method calls via emit_unified_call; add RouterPolicy trace; finalize LocalSSA/BlockSchedule guards; docs + selfhost quickstart

- Unify standard method calls to emit_unified_call; route via RouterPolicy and apply rewrite::{special,known} at a single entry.
- Stabilize emit-time invariants: LocalSSA finalize + BlockSchedule PHI→Copy→Call ordering; metadata propagation on copies.
- Known rewrite default ON (userbox only, strict guards) with opt-out flag NYASH_REWRITE_KNOWN_DEFAULT=0.
- Expand TypeAnnotation whitelist (is_digit_char/is_hex_digit_char/is_alpha_char/Map.has).
- Docs: unified-method-resolution design note; Quick Reference normalization note; selfhosting/quickstart.
- Tools: add tools/selfhost_smoke.sh (dev-only).
- Keep behavior unchanged for Unknown/core/user-instance via BoxCall fallback; all tests green (quick/integration).
nyash-codex
2025-09-28 20:38:09 +09:00
parent e442e5f612
commit dd65cf7e4c
60 changed files with 2523 additions and 471 deletions


@ -0,0 +1,64 @@
# MIR Builder — Boxes Catalog (Phase 15.7)
Purpose
- Consolidate scattered responsibilities into small, focused “boxes” (modules) with clear APIs.
- Reduce regression surface by centralizing invariants and repeated patterns.
- Keep behavior unchanged (default-off for any new diagnostics). Adopt gradually.
Status (2025-09-28)
- S-tier (landed skeletons):
  - MetadataPropagationBox — type/origin propagation.
  - ConstantEmissionBox — Const emission helpers.
  - TypeAnnotationBox — minimal return-type annotation for known calls.
- S-tier (new in this pass):
  - RouterPolicyBox — route decision (Unified vs BoxCall).
  - EmitGuardBox — emit-time invariants (LocalSSA finalize + schedule verify).
  - NameConstBox — string Const for function names.
- A/B-tier: planned; do not implement by default.
Call Routing — Unification (2025-09-28)
- Standard method calls now delegate to `emit_unified_call` (single entry).
- Receiver class hint (origin/type) is resolved inside unified; handlers no longer duplicate it.
- RouterPolicy decides Unified vs BoxCall. Unknown/core/user-instance → BoxCall (behavior-preserving).
- Rewrites apply centrally: `rewrite::special` (toString/stringify→str, equals/1) and `rewrite::known` (Known→function).
- LocalSSA + BlockSchedule + EmitGuard enforce PHI→Copy→Call ordering and in-block materialization.
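To make the routing rule concrete, here is a minimal Rust sketch of the RouterPolicy decision. It assumes the receiver has already been classified into a hypothetical `Receiver` enum. The documented API is `choose_route(box_name, method, certainty, arity)`, so this signature and the trace line format are illustrative only and do not reflect the real `router/policy.rs`.

```rust
/// Route chosen for a method call (the doc's "Unified vs BoxCall").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Unified,
    BoxCall,
}

/// Hypothetical pre-classified receiver; the real code derives this
/// from box_name + certainty instead of taking an enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Receiver {
    Unknown,
    Core,
    UserInstance,
    Known,
}

/// Conservative policy: anything that is not Known stays on legacy BoxCall.
pub fn choose_route(recv: Receiver, method: &str, arity: usize) -> Route {
    let (route, reason) = match recv {
        Receiver::Unknown => (Route::BoxCall, "unknown"),
        Receiver::Core => (Route::BoxCall, "core"),
        Receiver::UserInstance => (Route::BoxCall, "user-instance"),
        Receiver::Known => (Route::Unified, "known"),
    };
    // Dev-only short trace to stderr; off unless NYASH_ROUTER_TRACE=1.
    // The line format here is an assumption.
    if std::env::var("NYASH_ROUTER_TRACE").ok().as_deref() == Some("1") {
        eprintln!("[router] reason={reason} method={method} arity={arity} -> {route:?}");
    }
    route
}
```

Because every non-Known case maps to BoxCall, adding the policy cannot change behavior for receivers that were already routed to BoxCall.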
Structure
```
src/mir/builder/
├── metadata/propagate.rs # MetadataPropagationBox
├── emission/constant.rs # ConstantEmissionBox
├── emission/compare.rs # CompareEmissionBox (new)
├── emission/branch.rs # BranchEmissionBox (new)
├── types/annotation.rs # TypeAnnotationBox
├── router/policy.rs # RouterPolicyBox
├── emit_guard/mod.rs # EmitGuardBox
└── name_const.rs # NameConstBox
```
APIs (concise)
- metadata::propagate(builder, src, dst)
- metadata::propagate_with_override(builder, dst, MirType)
- emission::constant::{emit_integer, emit_string, emit_bool, emit_float, emit_null, emit_void}
- emission::compare::{emit_to, emit_eq_to, emit_ne_to}
- emission::branch::{emit_conditional, emit_jump}
- types::annotation::{set_type, annotate_from_function}
- router::policy::{Route, choose_route(box_name, method, certainty, arity)}
- emit_guard::{finalize_call_operands(builder, &mut Callee, &mut Vec<ValueId>), verify_after_call(builder)}
- name_const::{make_name_const_result(builder, &str) -> Result<ValueId, String>}
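As an illustration of the `metadata::propagate` / `propagate_with_override` pair, here is a minimal Rust sketch. It uses a hypothetical `Builder` with two maps, one for value types and one for origin class names. The field names and the `MirType` variants are assumptions, not the builder's actual layout.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the builder's value id and type metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ValueId(pub u32);

#[derive(Debug, Clone, PartialEq)]
pub enum MirType {
    Integer,
    String,
    Box(String),
}

#[derive(Default)]
pub struct Builder {
    pub value_types: HashMap<ValueId, MirType>,
    pub value_origin: HashMap<ValueId, String>, // origin class name, if known
}

/// Copy type and origin (when present) from `src` to `dst`, e.g. after emitting a Copy.
pub fn propagate(b: &mut Builder, src: ValueId, dst: ValueId) {
    if let Some(t) = b.value_types.get(&src).cloned() {
        b.value_types.insert(dst, t);
    }
    if let Some(o) = b.value_origin.get(&src).cloned() {
        b.value_origin.insert(dst, o);
    }
}

/// Set `dst`'s type explicitly, overriding anything propagated earlier.
pub fn propagate_with_override(b: &mut Builder, dst: ValueId, ty: MirType) {
    b.value_types.insert(dst, ty);
}
```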
Adoption Plan (behavior-preserving)
1) Replace representative Const sites with `emission::constant`.
2) Replace ad-hoc type/origin copy with `metadata::propagate`.
3) Call `types::annotation` where return type is clearly known (string length/size/str etc.).
4) Use `router::policy::choose_route` in the unified call path; later migrate the `prefer_legacy` logic in utils to it.
5) Use `emit_guard` to centralize LocalSSA finalize + schedule verify around calls; later extend to branch/compare.
6) Use `name_const` in rewrite paths to reduce duplication.
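Step 5 centralizes LocalSSA finalize around calls. Below is a minimal Rust sketch of what "materialize operands in the current block" can look like: a value defined in another block gets a Copy in the current block, and each (block, value) pair is cached so it is copied only once. All types are hypothetical. The sketch takes `&mut ValueId` for the receiver, whereas the documented API takes `&mut Callee`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ValueId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlockId(pub u32);

#[derive(Debug, PartialEq)]
pub enum Inst {
    Copy { dst: ValueId, src: ValueId },
}

pub struct Builder {
    pub current: BlockId,
    pub next_value: u32,
    pub def_block: HashMap<ValueId, BlockId>,
    pub insts: HashMap<BlockId, Vec<Inst>>,
    /// Ensures a value is materialized at most once per block.
    pub local_cache: HashMap<(BlockId, ValueId), ValueId>,
}

impl Builder {
    pub fn new(current: BlockId, next_value: u32) -> Self {
        Self { current, next_value, def_block: HashMap::new(), insts: HashMap::new(), local_cache: HashMap::new() }
    }

    /// Return a value usable in the current block, appending a Copy if `v` lives elsewhere.
    pub fn ensure_local(&mut self, v: ValueId) -> ValueId {
        if self.def_block.get(&v) == Some(&self.current) {
            return v;
        }
        if let Some(&local) = self.local_cache.get(&(self.current, v)) {
            return local;
        }
        let dst = ValueId(self.next_value);
        self.next_value += 1;
        self.insts.entry(self.current).or_default().push(Inst::Copy { dst, src: v });
        self.def_block.insert(dst, self.current);
        self.local_cache.insert((self.current, v), dst);
        dst
    }

    /// Localize the receiver and all args right before the Call is emitted.
    pub fn finalize_call_operands(&mut self, recv: &mut ValueId, args: &mut Vec<ValueId>) {
        *recv = self.ensure_local(*recv);
        for a in args.iter_mut() {
            *a = self.ensure_local(*a);
        }
    }
}
```

The Copies are appended before the Call is emitted, so this sketch also produces the Copy→Call half of the ordering contract.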
Diagnostics
- All new logs remain dev-only behind env toggles already present (e.g., NYASH_LOCAL_SSA_TRACE, NYASH_BLOCK_SCHEDULE_VERIFY).
- Router trace: `NYASH_ROUTER_TRACE=1` prints route decisions (stderr, short, default OFF).
Guardrails
- Behavior must remain unchanged; only refactors/centralizations allowed.
- Keep diffs small; validate `make smoke-quick` and `make smoke-integration` stay green at each step.


@ -0,0 +1,59 @@
# Unified Method Resolution — Design Note (Phase P4)
Purpose
- Document the unified pipeline for method resolution and how we will roll it out safely.
- Make behavior observable (dev-only) and gate any future default changes behind clear criteria.
Goals
- Single entry for all method calls via `emit_unified_call`.
- Behavior-preserving by default: Unknown/core/user-instance receivers route to BoxCall.
- Known receivers may be rewritten to function calls (obj.m → Class.m(me,…)) under strict conditions.
- Keep invariants around SSA and instruction order to prevent sporadic undefined uses.
Pipeline (concept)
1) Entry: `emit_unified_call(dst, CallTarget::Method { box_type, method, receiver }, args)`
2) Special rewrites (early): toString/stringify → str, equals/1 consolidation.
3) Known/unique rewrite (user boxes only): if class is Known and a unique function exists, rewrite to `Call(Class.m/arity)`.
4) Routing: `RouterPolicy.choose_route` decides Unified vs BoxCall (Unknown/core/user-instance → BoxCall; else Unified).
5) Emit guard: LocalSSA finalize (recv/args in current block) + BlockSchedule order contract (PHI → Copy → Call).
6) MIR emit: `Call { callee=Method/Extern/Global }` or `BoxCall` as routed.
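Steps 2 and 3 of the pipeline can be sketched as a pure function, assuming lowered user-box methods are named `Class.method/arity`, with arity counting only the explicit arguments (an assumption). The sketch leaves out equals/1 consolidation, the `NYASH_REWRITE_KNOWN_DEFAULT` check, and routing. `Emitted` and `lower_method_call` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Emitted {
    /// Direct function call; `argc` includes the receiver passed as `me`.
    Call { func: String, argc: usize },
    /// Left as a method call; RouterPolicy decides Unified vs BoxCall afterwards.
    Method { class: Option<String>, method: String, argc: usize },
}

/// Step 2: special rewrite toString/stringify -> str (equals/1 omitted here).
fn rewrite_special(method: &str) -> &str {
    match method {
        "toString" | "stringify" => "str",
        other => other,
    }
}

/// Steps 2-3 for `recv.method(args)`. `known_class` is Some only when the
/// receiver's origin is a Known user box; `functions` lists lowered function names.
pub fn lower_method_call(known_class: Option<&str>, method: &str, argc: usize, functions: &[&str]) -> Emitted {
    let method = rewrite_special(method);
    if let Some(class) = known_class {
        let target = format!("{class}.{method}/{argc}");
        // Rewrite only when the target exists and is unique for this arity.
        let hits = functions.iter().filter(|f| **f == target.as_str()).count();
        if hits == 1 {
            return Emitted::Call { func: target, argc: argc + 1 };
        }
    }
    Emitted::Method { class: known_class.map(str::to_string), method: method.to_string(), argc }
}
```

If the rewrite does not apply, the call falls through unchanged. This is why the Known rewrite can be default ON without affecting Unknown receivers.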
Invariants (dev-verified)
- SSA locality: All operands are materialized within the current basic block before use.
- Order: PHI group at block head, then materialize Copies, then body (Calls). Verified with `NYASH_BLOCK_SCHEDULE_VERIFY=1`.
- Rewrites do not change semantics: Known rewrite only when a concrete target exists and is unique for the arity.
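The first two invariants can be checked per block. Below is a minimal Rust sketch of such a verifier over a simplified, hypothetical instruction set. It checks that PHIs form a group at the block head and that every Call operand is defined earlier in the same block. It is not the actual `schedule/block.rs` check.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ValueId(pub u32);

/// Simplified instruction shapes; real MIR has many more.
pub enum Inst {
    Phi { dst: ValueId },
    Copy { dst: ValueId, src: ValueId },
    Const { dst: ValueId },
    Call { dst: Option<ValueId>, args: Vec<ValueId> },
}

pub fn verify_block(insts: &[Inst]) -> Result<(), String> {
    let mut defined: HashSet<ValueId> = HashSet::new();
    let mut seen_non_phi = false;
    for (i, inst) in insts.iter().enumerate() {
        match inst {
            Inst::Phi { dst } => {
                // PHIs must all come before any other instruction.
                if seen_non_phi {
                    return Err(format!("inst {i}: PHI after non-PHI"));
                }
                defined.insert(*dst);
            }
            Inst::Copy { dst, .. } | Inst::Const { dst } => {
                seen_non_phi = true;
                defined.insert(*dst);
            }
            Inst::Call { dst, args } => {
                seen_non_phi = true;
                // SSA locality: every operand must already exist in this block.
                if let Some(a) = args.iter().find(|a| !defined.contains(*a)) {
                    return Err(format!("inst {i}: Call operand {a:?} not materialized in this block"));
                }
                if let Some(d) = dst {
                    defined.insert(*d);
                }
            }
        }
    }
    Ok(())
}
```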
Behavior flags (existing)
- `NYASH_ROUTER_TRACE=1`: short route decisions to stderr (reason, class, method, arity, certainty).
- `NYASH_LOCAL_SSA_TRACE=1`: LocalSSA ensure/finalize traces (recv/arg/cond/cmp).
- `NYASH_BLOCK_SCHEDULE_VERIFY=1`: warn when Copy/Call ordering does not follow the contract.
- KPI (dev-only):
  - `NYASH_DEBUG_KPI_KNOWN=1` → aggregate Known rate for `resolve.choose`.
  - `NYASH_DEBUG_SAMPLE_EVERY=N` → sample output every N events.
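The KPI pair can be sketched as a small counter. It tracks total and Known `resolve.choose` events and emits a report line every N events. `KnownKpi`, the report format, and the fallback to 1 for a missing or invalid N are all assumptions.

```rust
/// Hypothetical aggregator for the dev-only Known-rate KPI.
pub struct KnownKpi {
    total: u64,
    known: u64,
    sample_every: u64,
}

/// Parse a raw NYASH_DEBUG_SAMPLE_EVERY value; missing, invalid, or 0 falls back to 1.
pub fn sample_every_from(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok()).filter(|n| *n > 0).unwrap_or(1)
}

impl KnownKpi {
    pub fn new(sample_every: u64) -> Self {
        Self { total: 0, known: 0, sample_every: sample_every.max(1) }
    }

    /// Record one `resolve.choose` event; returns a report line on every N-th event.
    pub fn record(&mut self, is_known: bool) -> Option<String> {
        self.total += 1;
        if is_known {
            self.known += 1;
        }
        if self.total % self.sample_every == 0 {
            Some(format!("[kpi] known={}/{} ({:.1}%)", self.known, self.total, self.rate() * 100.0))
        } else {
            None
        }
    }

    /// Fraction of recorded events that were Known (0.0 when nothing was recorded).
    pub fn rate(&self) -> f64 {
        if self.total == 0 { 0.0 } else { self.known as f64 / self.total as f64 }
    }
}
```

In use, the counter would be built once from the environment, e.g. `KnownKpi::new(sample_every_from(std::env::var("NYASH_DEBUG_SAMPLE_EVERY").ok().as_deref()))`.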
Flag (P4)
- `NYASH_REWRITE_KNOWN_DEFAULT` (default ON; set to 0/false/off to disable):
  - Enables Known→function rewrite by default for user boxes if and only if:
    - receiver is Known (origin), and
    - function exists, and
    - candidate is unique for the arity.
  - When disabled, behavior remains conservative; routing still handles BoxCall fallback.
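The flag and its three guards combine as a simple conjunction. Below is a minimal Rust sketch: unset means ON, and "0"/"false"/"off" mean OFF. Case-insensitive matching and the helper names are assumptions, not the builder's actual parsing code.

```rust
/// Interpret a raw NYASH_REWRITE_KNOWN_DEFAULT value.
/// Unset => ON; "0", "false", "off" (case-insensitive, assumed) => OFF.
pub fn known_rewrite_enabled_from(raw: Option<&str>) -> bool {
    match raw {
        None => true,
        Some(v) => {
            let v = v.trim().to_ascii_lowercase();
            !matches!(v.as_str(), "0" | "false" | "off")
        }
    }
}

/// Read the flag from the process environment.
pub fn known_rewrite_enabled() -> bool {
    let raw = std::env::var("NYASH_REWRITE_KNOWN_DEFAULT").ok();
    known_rewrite_enabled_from(raw.as_deref())
}

/// The rewrite fires only when the flag is on AND all three guards hold.
pub fn may_rewrite_known(enabled: bool, receiver_known: bool, function_exists: bool, unique_for_arity: bool) -> bool {
    enabled && receiver_known && function_exists && unique_for_arity
}
```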
Rollout note
- Default is ON with strict guards; set `NYASH_REWRITE_KNOWN_DEFAULT=0` to revert to conservative behavior.
- Continue to use `NYASH_ROUTER_TRACE=1` and KPI sampling to validate stability during development.
Key files
- Entry & routing: `src/mir/builder/builder_calls.rs`, `src/mir/builder/router/policy.rs`
- Rewrites: `src/mir/builder/rewrite/{special.rs, known.rs}`
- SSA & order: `src/mir/builder/ssa/local.rs`, `src/mir/builder/schedule/block.rs`, `src/mir/builder/emit_guard/`
- Observability: `src/mir/builder/observe/resolve.rs`
Acceptance for P4
- quick/integration stay green with flags OFF.
- With flags ON (dev), green remains; KPI reports sensible Known rates without mismatches.
- No noisy logs in default runs; all diagnostics behind flags.
Notes
- This design keeps Unknown/core/user-instance on BoxCall for stability and parity with legacy behavior.
- Known rewrite is structurally safe because user box methods are lowered to standalone MIR functions during build.


@ -38,7 +38,7 @@ Unified Call (dev default ON)
4) NYABI VM Kernel Bridge (groundwork; not wired; default OFF)
- docs/abi/vm-kernel.md (functions: caps()/policy.*()/resolve_method_batch())
- Skeleton: apps/selfhost/vm/boxes/vm_kernel_box.nyash (policy stub)
- Reserved toggles (default OFF): NYASH_VM_NY_KERNEL, *_TIMEOUT_MS, *_TRACE
Out of scope (not doing)
- Changing default behavior (the Rust VM/LLVM remain the main line)
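@ -88,3 +88,44 @@ Unified Call (dev default ON)
Change history
- 2025-09-28 v2 (this document): refocused on Known conversion (Rewrite integration, dev observability), unifying the display API on `str()`, and Mini-VM stabilization
- 2025-09-28 first edition: plan for Mini-VM M3 + NYABI groundwork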
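## Status (2025-09-28 wrap-up notes)
- M3 (compare/branch/jump): the Mini-VM (MirVmMin) runs as a single pass over strict segments. compare(Eq)→ret, branch, and jump are evaluated on representative JSON fragments.
- Integration smoke: the integration profile (LLVM/llvmlite) passes 17/17 (all green).
- Router/order guards (no spec change):
  - Router: method calls whose receiver class is Unknown always fall back to legacy BoxCall (stability first; always ON).
  - BlockSchedule: the φ→Copy (materialize)→body (Call) order is verified dev-only (`NYASH_BLOCK_SCHEDULE_VERIFY=1`).
  - LocalSSA: receivers, arguments, conditions, and field bases are always defined "within the current block" immediately before emit.
- Policy on VM tolerance flags:
  - `NYASH_VM_TOLERATE_VOID`: dev-time rescue only (removed from quick tests).
  - The Router's Unknown→BoxCall fallback is always ON (no spec change; for stability).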
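The M3 semantics above can be made concrete with a minimal Rust sketch of compare(Eq)→ret, branch, and jump over basic blocks. It is not MirVmMin, which is written in Nyash and reads MIR JSON. The instruction names, the 0/1 boolean encoding, and the step limit are assumptions, and JSON parsing is skipped entirely.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub enum Inst {
    Const { dst: u32, value: i64 },
    CompareEq { dst: u32, lhs: u32, rhs: u32 },
    Branch { cond: u32, then_bb: usize, else_bb: usize },
    Jump { target: usize },
    Ret { value: u32 },
}

fn get(regs: &HashMap<u32, i64>, v: u32) -> Result<i64, String> {
    regs.get(&v).copied().ok_or_else(|| format!("undefined value %{v}"))
}

/// Run from block 0 and return the value of the first Ret reached.
pub fn run(blocks: &[Vec<Inst>]) -> Result<i64, String> {
    let mut regs: HashMap<u32, i64> = HashMap::new();
    let mut bb = 0usize;
    // Step limit guards against infinite loops on malformed input.
    for _ in 0..10_000 {
        let block = blocks.get(bb).ok_or_else(|| format!("no block {bb}"))?;
        let mut next = None;
        for inst in block {
            match inst {
                Inst::Const { dst, value } => {
                    regs.insert(*dst, *value);
                }
                Inst::CompareEq { dst, lhs, rhs } => {
                    let eq = get(&regs, *lhs)? == get(&regs, *rhs)?;
                    regs.insert(*dst, eq as i64); // true => 1, false => 0
                }
                Inst::Branch { cond, then_bb, else_bb } => {
                    next = Some(if get(&regs, *cond)? != 0 { *then_bb } else { *else_bb });
                    break;
                }
                Inst::Jump { target } => {
                    next = Some(*target);
                    break;
                }
                Inst::Ret { value } => return get(&regs, *value),
            }
        }
        bb = match next {
            Some(n) => n,
            None => return Err(format!("block {bb} has no terminator")),
        };
    }
    Err("step limit exceeded".to_string())
}
```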
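## Next TODO (short term)
- json_query_vm (VM): patch the remaining LocalSSA/ordering gaps and lift the SKIP.
- Loop PHI carrying: minimally reinforce value carrying at loop headers/merge points and lift the SKIP on break/continue/loop_statement.
- Mini-VM M2/M3: after finishing the single-pass conversion (re-checking boundary strictness), get the four representative cases (m2_eq_true/false, m3_branch_true, m3_jump) to PASS, then lift the SKIP.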
## Builder small boxes (Box-ification) policy (no spec change; incremental adoption)
- S-tier (introduced):
  - MetadataPropagationBox (type/origin propagation): `metadata/propagate.rs`
  - ConstantEmissionBox (Const emission): `emission/constant.rs`
  - TypeAnnotationBox (minimal type annotation): `types/annotation.rs`
  - RouterPolicyBox (Unified vs BoxCall route): `router/policy.rs`
  - EmitGuardBox (final checkpoint right before emit): `emit_guard/mod.rs`
  - NameConstBox (function-name Const generation): `name_const.rs`
- A/B-tier (planned):
  - Compare/BranchEmissionBox, PhiWiringBox, EffectMask/TypeInferenceBox (Phase 16 and later)
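Adoption order (small and safe)
1) Swap in Const → metadata → minimal annotation thinly, in that order (representative sites first, then everywhere)
2) Introduce RouterPolicyBox into the unified Call path (migrate the utils side later)
3) Centralize finalize/verify around Calls in EmitGuardBox (Branch/Compare later)
4) Apply NameConstBox incrementally to rewrite/special/known
Documentation
- See `docs/development/builder/BOXES.md` for details.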
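## Unskip Plan (staged restoration)
- P0: json_query_vm → expected output matches; no tolerance flags needed.
- P1: loops (break/continue/loop_statement) → PHI carrying is stable.
- P2: Mini-VM (M2/M3) → four representative cases PASS; coarse path removed, single pass retained.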


@ -0,0 +1,68 @@
# Self-Hosting Quickstart (Phase 15 — Resume)
This note shows how to run the Nyash self-host compiler MVP to emit MIR(JSON v0) and execute it with the current VM line. The flow keeps defaults unchanged and uses small, opt-in flags for development.
## Layout
- Compiler MVP: `apps/selfhost-compiler/compiler.nyash`
- Runtime helpers (dev): `apps/selfhost-runtime/`
- Mini-VM samples (dev): `apps/selfhost/vm/`
## Run the self-host compiler
Compile a minimal program (string embedded in the compiler) and print JSON:
```
./target/release/nyash apps/selfhost-compiler/compiler.nyash -- --stage3
```
ENV → child args (pass-through):
- `NYASH_NY_COMPILER_MIN_JSON=1` → `-- --min-json`
- `NYASH_SELFHOST_READ_TMP=1` → `-- --read-tmp` (reads `tmp/ny_parser_input.ny`)
- `NYASH_NY_COMPILER_STAGE3=1` → `-- --stage3` (enables the Stage-3 surface)
- `NYASH_NY_COMPILER_CHILD_ARGS="..."` → passes extra args verbatim
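The env-to-args mapping above can be sketched as a function that builds the argument list following `--`. Env lookup is abstracted as a closure so it can be tested without touching the process environment. The function name is hypothetical. The doc says `NYASH_NY_COMPILER_CHILD_ARGS` passes args verbatim; this sketch assumes plain whitespace splitting with no shell quoting, which may differ from the real runner.

```rust
/// Build the child args (the part after `--`) from the documented env toggles.
pub fn child_args(get: impl Fn(&str) -> Option<String>) -> Vec<String> {
    let on = |k: &str| get(k).as_deref() == Some("1");
    let mut args = Vec::new();
    if on("NYASH_NY_COMPILER_MIN_JSON") {
        args.push("--min-json".to_string());
    }
    if on("NYASH_SELFHOST_READ_TMP") {
        args.push("--read-tmp".to_string());
    }
    if on("NYASH_NY_COMPILER_STAGE3") {
        args.push("--stage3".to_string());
    }
    if let Some(extra) = get("NYASH_NY_COMPILER_CHILD_ARGS") {
        // Assumption: whitespace splitting, no quote handling.
        args.extend(extra.split_whitespace().map(str::to_string));
    }
    args
}
```

A real caller would pass `|k| std::env::var(k).ok()` and prepend `--`.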
Examples:
```
NYASH_NY_COMPILER_MIN_JSON=1 ./target/release/nyash apps/selfhost-compiler/compiler.nyash -- --stage3 > /tmp/out.json
NYASH_SELFHOST_READ_TMP=1 ./target/release/nyash apps/selfhost-compiler/compiler.nyash -- --min-json --stage3
```
## Execute MIR(JSON v0)
Use the VM line (Rust) or PyVM harness as needed.
Rust VM (default):
```
./target/release/nyash --backend vm apps/examples/json_query/main.nyash
```
PyVM reference (when verifying parity):
```
NYASH_VM_USE_PY=1 ./target/release/nyash --backend vm apps/examples/json_query/main.nyash
```
LLVM harness (llvmlite):
```
NYASH_LLVM_USE_HARNESS=1 ./target/release/nyash --backend llvm apps/examples/json_query/main.nyash
```
Notes:
- For self-host-emitted JSON, route the file to your runner pipeline or a small loader script (dev only). Keep defaults unchanged in CI (no new jobs required).
## One-shot dev smoke
Run a minimal end-to-end smoke that tries to emit JSON (best-effort) and verifies that VM outputs match with Known rewrite ON and OFF:
```
tools/selfhost_smoke.sh
```
It does not modify defaults and is safe to run locally.
## Flags (dev)
- Known rewrite default ON (user-box only, strict guards): `NYASH_REWRITE_KNOWN_DEFAULT=0|1`
- Router trace: `NYASH_ROUTER_TRACE=1`
- KPI sampling: `NYASH_DEBUG_KPI_KNOWN=1` (+ `NYASH_DEBUG_SAMPLE_EVERY=N`)
- Local SSA trace: `NYASH_LOCAL_SSA_TRACE=1`
## Acceptance (P6 resume)
- quick/integration remain green.
- Minimal self-host emit→execute path PASS in a dev job (no CI change).
- No default behavior changes; all diagnostics under env flags.