🚀 Start Phase 15.3: Nyash compiler MVP implementation

Major milestone:
- Set up apps/selfhost-compiler/ directory structure
- Implement basic Nyash compiler in Nyash (CompilerBox)
- Stage-1: Basic arithmetic parser (int/string/+/-/*/parentheses/return)
- JSON v0 output compatible with --ny-parser-pipe
- Runner integration with NYASH_USE_NY_COMPILER=1 flag
- Comprehensive smoke tests for PHI/Bridge/Stage-2

Technical updates:
- Updated CLAUDE.md with Phase 15.3 status and MIR14 details
- Statement separation policy: newline-based with minimal ASI
- Fixed runaway ny-parser-pipe processes (CPU 94.9%)
- Clarified MIR14 as canonical instruction set (not 13/18)
- LoopForm strategy: PHI auto-generation during reverse lowering

Collaborative development:
- ChatGPT5 implementing compiler skeleton
- Codex provided LoopForm PHI generation guidance
- Claude maintaining documentation and coordination

🎉 A historic step toward self-hosting! The day Nyash compiles itself is near, nya!

Co-Authored-By: ChatGPT <noreply@openai.com>
This commit is contained in:
Selfhosting Dev
2025-09-15 01:21:37 +09:00
parent d01f9b9c93
commit af11c6855b
28 changed files with 1007 additions and 40 deletions

View File

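@ -31,6 +31,14 @@ Making the most of the beauty of the 13 MIR instructions, external compiler dependency
- dep_tree_min_string: PyVM↔llvmlite parity matches; the llvmlite path completes `.ll verify → .o → EXE`.
- The temporary workaround gate `NYASH_LLVM_ESC_JSON_FIX` is not used for acceptance (OFF).
#### PHI handling policy (during Phase 15)
- Current: the JSON v0 Bridge generates the PHIs for If/Loop (stable, green).
- Policy: finish Phase 15 as is (no changes).
- Reason: when LoopForm (Core-14) is introduced, we intend to move to the recommended approach of generating PHIs automatically during reverse lowering.
- A PHI is "naming an alias at a merge point", not a Box operation.
- Keeps the abstraction layer pure (Everything is Box).
- Concentrates implementation responsibility in one place (fewer lines, better maintainability).
### Phase 15.3: Nyash compiler MVP (next phase, work started)
- After PyVM stabilizes, introduce the Nyash-written parser/lexer (subset) and MIR builder in stages
- Coexists with the Rust fallback behind a flag (e.g. `NYASH_USE_NY_COMPILER=1`)
@ -41,16 +49,76 @@ Making the most of the beauty of the 13 MIR instructions, external compiler dependency
- Stage-2: extend the statement/expression subset (local/if/loop/call/method/new/me/substring/length/lastIndexOf)
- Stage-3: lower Ny AST → MIR JSON (passed directly to llvmlite/PyVM).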
#### Phase 15.3: Detailed Plan (Ny compiler MVP)
- Directory layout (selfhost compiler)
- `apps/selfhost-compiler/compiler.nyash` (CompilerBox entry; Ny→JSON v0 emit)
- `apps/selfhost-compiler/parser/{lexer.nyash,parser.nyash,ast.nyash}` (extended in stages toward Stage-2)
- `apps/selfhost-compiler/emitter/json_v0.nyash` (future: separate emit step; inline is acceptable for the MVP)
- `apps/selfhost-compiler/mir/{builder.nyash,optimizer.nyash}` (future)
- `apps/selfhost-compiler/tests/{stage1,stage2}` (samples and expected JSON)
- Runner integration (safety gate)
- Flag: `NYASH_USE_NY_COMPILER=1` (default OFF)
- Child process: launch the selfhost compiler with `--backend vm` and collect one line of JSON v0 from stdout
- Environment: pass `NYASH_JSON_ONLY=1` to the child to suppress extra output. On failure, fall back silently
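- Stage-1 (build up in small steps)
1) return / integers / strings / the four arithmetic operators / parentheses (left-associative)
2) Statement separation (minimal ASI): newline = statement separator; a statement continues after a continuation token (+ - * / . ,) or while inside a grouping
3) Representative smoke: `return 1+2*3` → JSON v0 → Bridge → MIR execution = 7
- Stage-2 (toward the real target)
- local / if / loop / call / method / new / var / comparison / logical (short-circuit)
- PHI: rely on the Bridge-side merge (If/Loop); current behavior is kept during Phase 15
- Representative smokes: nested if / loop accumulation / short-circuit and/or interleaved with if/loop
- Acceptance (15.3)
- Stage-1: representative samples emit JSON v0 → Bridge → PyVM/llvmlite and the results match (no diff)
- Bootstrap: `tools/bootstrap_selfhost_smoke.sh` passes c0→c1→c1' (fallback is tolerated)
- Docs: publish the statement separation policy (newline + minimal ASI) (link: reference/language/statements.md)
- Smokes / Tools (updated)
- `tools/selfhost_compiler_smoke.sh` (entry point)
- `tools/ny_stage2_bridge_smoke.sh` (arithmetic / comparison / short-circuit / nested if)
- `tools/ny_parser_stage2_phi_smoke.sh` (PHI merge for If/Loop)
- `tools/parity.sh --lhs pyvm --rhs llvmlite <test.nyash>` (always)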
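Imports/Namespace plan (15.3-late)
- See: imports-namespace-plan.md. Keep `nyash.toml` resolution in the runner; accept `using` in the Ny compiler as a no-op (no resolution), gated by `NYASH_ENABLE_USING=1`.
- Operational switches
- `NYASH_USE_NY_COMPILER=1` (selfhost compiler path ON)
- `NYASH_JSON_ONLY=1` (suppress extra output from the child process)
- `NYASH_DISABLE_PLUGINS=1` (minimize the child only, as needed)
- Statement separation: minimal ASI rule (only a newline at depth 0 whose preceding token is not a continuation token terminates a statement)
- Risks / Rollback
- Child process output is not JSON → fall back, so operation stays safe
- Parity mismatch on a representative case → switch OFF only the selfhost path
- Impact: limited changes in the CLI/Runner layer (with the gate OFF, behavior equals the existing path)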
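The gate-and-fallback behavior described above (collect one JSON v0 line from the child's stdout, otherwise fall back quietly) can be sketched as follows. This is a minimal Python illustration, not the runner's code: the actual runner is Rust, and the function names `pick_json_v0_line` and `choose_path` are hypothetical. It assumes a valid line carries `"version": 0` and `"kind": "Program"`, as the JSON v0 spec in this commit states.

```python
import json
from typing import Optional


def pick_json_v0_line(stdout_text: str) -> Optional[dict]:
    """Return the first stdout line that parses as a JSON v0 Program, else None."""
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip any non-JSON noise the child may have printed
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and obj.get("version") == 0 and obj.get("kind") == "Program":
            return obj
    return None


def choose_path(stdout_text: str, exit_code: int) -> str:
    """Use the selfhost result only when the child succeeded and emitted JSON v0."""
    if exit_code == 0 and pick_json_v0_line(stdout_text) is not None:
        return "selfhost"
    return "rust-fallback"  # hypothetical label for the existing Rust path
```

Keeping the decision in one pure function makes the fallback rule easy to test without spawning a child process.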
[Acceptance (MVP)]
- `tools/ny_roundtrip_smoke.sh` (Case A/B)
- Run `apps/tests/esc_dirname_smoke.nyash` / `apps/selfhost/tools/dep_tree_min_string.nyash` through the Ny parser path; results must be in parity with PyVM/llvmlite (stdout/exit)
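As a rough illustration of the Stage-1 scope (return, integers, strings, `+ - * /`, parentheses, left associativity), the sketch below parses a single `return` statement with precedence climbing and emits a JSON v0 program. It is written in Python for brevity and is only an assumption about how such a parser could look; the real compiler is `compiler.nyash`, and names such as `compile_stage1` are hypothetical. The output shape follows the JSON v0 examples in this commit.

```python
import json
import re

# One token per match: integer, double-quoted string, the keyword, or any single char.
TOKEN_RE = re.compile(r'\s*(?:(\d+)|"([^"]*)"|(return)\b|(.))')
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(src):
    tokens = []
    for m in TOKEN_RE.finditer(src):
        num, string, kw, op = m.groups()
        if num is not None:
            tokens.append(("int", int(num)))
        elif string is not None:
            tokens.append(("str", string))
        elif kw is not None:
            tokens.append(("kw", kw))
        else:
            tokens.append(("op", op))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", None)

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_primary(self):
        kind, value = self.next()
        if kind == "int":
            return {"type": "Int", "value": value}
        if kind == "str":
            return {"type": "Str", "value": value}
        if (kind, value) == ("op", "("):
            expr = self.parse_expr(1)
            if self.next() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return expr
        raise SyntaxError(f"unexpected token {value!r}")

    def parse_expr(self, min_prec):
        # Precedence climbing; parsing the right side at prec + 1 gives left associativity.
        lhs = self.parse_primary()
        while True:
            kind, op = self.peek()
            if kind != "op" or PRECEDENCE.get(op, 0) < min_prec:
                return lhs
            self.next()
            rhs = self.parse_expr(PRECEDENCE[op] + 1)
            lhs = {"type": "Binary", "op": op, "lhs": lhs, "rhs": rhs}


def compile_stage1(src):
    parser = Parser(tokenize(src))
    if parser.next() != ("kw", "return"):
        raise SyntaxError("this sketch accepts a single return statement only")
    expr = parser.parse_expr(1)
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing tokens")
    return {"version": 0, "kind": "Program", "body": [{"type": "Return", "expr": expr}]}


# Emit the program as a single line, as the runner expects one JSON line on stdout.
print(json.dumps(compile_stage1("return 1+2*3"), separators=(",", ":")))
```

For `return 1+2*3` the tree has `+` at the root with `2*3` as its right operand, which is the shape the Bridge needs to produce 7.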
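#### Preview: PHI automation with LoopForm (Core-14), after Phase 15
- Strengthen LoopForm and synthesize PHIs via reverse lowering from the structural information in `loop.begin(loop_carried_values) / loop.iter / loop.branch / loop.end`.
- Likewise for If and short-circuit: determine the merge points from the structured blocks and automate the PHIs.
- Schedule: to be considered and implemented after Phase 15 (in Core-14). Nothing changes during Phase 15.
### Phase 15.4: Moving the VM layer to Nyash (replacing PyVM)
- Using PyVM as scaffolding, port the VM core to a Nyash implementation in stages (starting from an instruction subset)
- Extend toward handling the 13 instructions via dynamic dispatch
Details: [Self-hosting strategy, September 2025 edition](implementation/self-hosting-strategy-2025-09.md)
---
Note: handling of JSON v0 (compatibility)
- Phase 15: the Bridge generates PHIs (current behavior continues).
- Core-14 and later: once LoopForm automates PHIs, PHIs on the JSON side become optional (to be dropped eventually).
- Type meta (string mixing for "+" and string comparison) continues.
## 📊 Key deliverables
### Compiler components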
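@ -65,7 +133,7 @@ Making the most of the beauty of the 13 MIR instructions, external compiler dependency
### Auto-generation infrastructure
- [ ] boxes.yaml (Box type definitions)
- [ ] externs.yaml (C ABI boundary)
- [ ] semantics.yaml (MIR15 definition)
- [ ] semantics.yaml (MIR14 definition)
- [ ] build.rs (auto-generation system)
### Bootstrap
@ -75,13 +143,21 @@ Making the most of the beauty of the 13 MIR instructions, external compiler dependency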
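## 🔧 Technical approach
### The MIR 13-instruction revolution
- **Basic operations (5)**: Const, UnaryOp, BinOp, Compare, TypeOp
- **Memory (2)**: Load, Store
- **Control (4)**: Branch, Jump, Return, Phi
- **Box (1)**: BoxCall (unifies all box operations)
- **External (1)**: ExternCall
### The MIR 14-instruction revolution
1. Const - constant
2. BinOp - binary operation
3. UnaryOp - unary operation (restored!)
4. Compare - comparison
5. Jump - unconditional jump
6. Branch - conditional branch
7. Return - return value
8. Phi - SSA merge
9. Call - function call
10. BoxCall - Box operations (arrays/fields/methods unified!)
11. ExternCall - external call
12. TypeOp - type operation
13. Safepoint - GC safepoint
14. Barrier - memory barrier
This extreme simplicity also makes direct x86 translation realistic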
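The 14-instruction list can be written down as a small enumeration. The Python sketch below is illustrative only: `Mir14`, `TERMINATORS`, and `is_terminator` are hypothetical names, and the real definitions live in the Rust MIR code. The terminator grouping follows the JSON v0 bridge convention in this commit that every block ends with branch/jump/return.

```python
from enum import Enum


class Mir14(Enum):
    """The 14 canonical MIR instructions, in the order listed in the roadmap."""
    CONST = "Const"
    BIN_OP = "BinOp"
    UNARY_OP = "UnaryOp"
    COMPARE = "Compare"
    JUMP = "Jump"
    BRANCH = "Branch"
    RETURN = "Return"
    PHI = "Phi"
    CALL = "Call"
    BOX_CALL = "BoxCall"
    EXTERN_CALL = "ExternCall"
    TYPE_OP = "TypeOp"
    SAFEPOINT = "Safepoint"
    BARRIER = "Barrier"


# Instructions that may end a basic block.
TERMINATORS = frozenset({Mir14.JUMP, Mir14.BRANCH, Mir14.RETURN})


def is_terminator(op: Mir14) -> bool:
    return op in TERMINATORS
```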
### Backend options
@ -238,6 +314,8 @@ ny_free_buf(buffer)
### ✅ Quick smokes (current)
- PyVM↔llvmlite parity: `tools/parity.sh --lhs pyvm --rhs llvmlite apps/tests/esc_dirname_smoke.nyash`
- dep_tree (harness ON): `NYASH_LLVM_FEATURE=llvm ./tools/build_llvm.sh apps/selfhost/tools/dep_tree_min_string.nyash -o app_dep && ./app_dep`
- JSON v0 bridge spec: `docs/reference/ir/json_v0.md`
- Stage-2 smokes: `tools/ny_stage2_bridge_smoke.sh`, `tools/ny_parser_stage2_phi_smoke.sh`, `tools/ny_me_dummy_smoke.sh`
### 📚 Related phases
- [Phase 10: Cranelift JIT](../phase-10/)

View File

@ -35,6 +35,17 @@ This roadmap is a living checklist to advance Phase 15 with small, safe boxes. U
- No circular dependency: nyrt provides StringBox/ArrayBox via C ABI
- Flag path: `NYASH_USE_NY_COMPILER=1` to switch rust→ny compiler; rust parser as fallback
- Add apps/selfhost-compiler/ and minimal smokes
- Stage1 checklist:
- [ ] return/int/string/arithmetic/paren JSON v0 emit
- [ ] Minimal ASI (newline separator + continuation tokens)
- [ ] Smokes: `return 1+2*3` / grouping / string literal
- Stage2 checklist:
- [ ] local/if/loop/call/method/new/var/logical/compare
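- [ ] PHI merging is delegated to the Bridge (If/Loop)
- [ ] Smokes: nested if / loop accumulation / and/or × if/loop
4) PHI automation comes after Phase 15 (Core-14 LoopForm)
- Phase 15: keep the current Bridge PHI and prioritize E2E green and parity
- Core-14: strengthen LoopForm and generate PHIs automatically via reverse lowering (standardized merge points)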
4) Bootstrap loop (c0→c1→c1')
- Use existing trace/hash harness to compare parity; add optional CI gate
- **This achieves self-hosting!** Nyash compiles Nyash
@ -63,6 +74,8 @@ This roadmap is a living checklist to advance Phase 15 with small, safe boxes. U
- Parser path: `--parser {rust|ny}` or `NYASH_USE_NY_PARSER=1`
- JSON dump: `NYASH_DUMP_JSON_IR=1`
- Preview (LoopForm): specification planned for Core-14
- Selfhost compiler: `NYASH_USE_NY_COMPILER=1`, child quiet: `NYASH_JSON_ONLY=1`
- Load Ny plugins: `NYASH_LOAD_NY_PLUGINS=1` / `--load-ny-plugins`
- AOT smoke: `CLIF_SMOKE_RUN=1`
@ -83,6 +96,8 @@ This roadmap is a living checklist to advance Phase 15 with small, safe boxes. U
- v0 E2E green (parser pipe + direct bridge) including Ny compiler MVP switch
- v1 minimal samples pass via JSON bridge
- AOT P2: emit→link→run stable for constant/arith
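- The Phase 15 STOP criteria do not include the PHI switchover (PHI is handled in LoopForm/Core-14)
- 15.3: Stage-1 representative samples green + Bootstrap smoke (fallback tolerated) + statement separation policy published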
- Docs/recipes usable on Windows/Unix
## Notes

View File

@ -0,0 +1,38 @@
# Phase 15.3 — Imports/Namespace/nyash.toml Plan
Status: 15.3 planning; focus remains on Stage1/2 compiler MVP. This document scopes when and how to bring `nyash.toml`/include/import/namespace into the selfhost path without destabilizing parity.
Goals
- Keep runner-level `nyash.toml` parsing/resolution as the source of truth during 15.3.
- Accept `using/import` syntax in the Ny compiler as a no-op (record only) until resolution is delegated.
- Avoid VM changes; resolution happens before codegen.
Scope & Sequence (Phase 15.3)
1) Stage1/2 compiler stability (primary)
- Ny→JSON v0 → Bridge → PyVM/llvmlite parity maintained
- PHI merge remains on Bridge (If/Loop)
2) Imports/Namespace minimal acceptance (15.3-late)
- Parse `using ns` / `using "path" [as alias]` as statements in the Ny compiler
- Do not resolve; emit no JSON entries (or emit metadata) — runner continues to strip/handle
- Gate via `NYASH_ENABLE_USING=1`
3) Runner remains in charge
- Keep the existing Rust runner's `using` stripping + modules registry population
- `nyash.toml` parsing stays in Rust (Phase 15)
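To make the "parse but do not resolve" idea concrete, here is a hedged Python sketch that recognizes the two `using` forms and records them without resolving anything. The regular expression, the dotted-namespace syntax, the restriction of `as alias` to the quoted-path form, and the function name `parse_using` are all assumptions for illustration; only the `NYASH_ENABLE_USING=1` gate and the two surface forms come from this plan.

```python
import os
import re
from typing import Optional

# Assumed grammar: `using "path" [as alias]` or `using some.namespace`.
USING_RE = re.compile(
    r'^\s*using\s+(?:'
    r'"(?P<path>[^"]+)"(?:\s+as\s+(?P<alias>[A-Za-z_]\w*))?'
    r'|(?P<ns>[A-Za-z_][\w.]*)'
    r')\s*$'
)


def parse_using(line: str, enabled: Optional[bool] = None) -> Optional[dict]:
    """Record a `using` statement without resolving it; None if gated off or no match."""
    if enabled is None:
        enabled = os.environ.get("NYASH_ENABLE_USING") == "1"
    if not enabled:
        return None
    m = USING_RE.match(line)
    if m is None:
        return None
    # Record only: the runner stays responsible for actual resolution.
    return {"ns": m.group("ns"), "path": m.group("path"), "alias": m.group("alias")}
```

Returning a plain record rather than emitting JSON entries matches the plan's option of emitting nothing (or metadata only) while the Rust runner keeps handling resolution.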
Out of scope (Phase 15)
- Porting `nyash.toml` parsing to Ny
- Cross-module codegen/linking in Ny compiler
- Advanced include resolution / package graph
Acceptance (15.3)
- Ny compiler can lex/parse `using` forms without breaking Stage1/2 programs
- Runner path (Rust) continues to resolve `using` and `nyash.toml` as before (parity unchanged)
Looking ahead (Core14 / Phase 16)
- Evaluate moving `nyash.toml` parsing to Ny as a library box (ConfigBox)
- Unify include/import/namespace into a single resolver pass in Ny with a small JSON side channel back to the runner
- Keep VM unchanged; all resolution before MIR build
Switches
- `NYASH_ENABLE_USING=1` — enable `using` acceptance in Ny compiler (no resolution)
- `NYASH_SKIP_TOML_ENV=1` — skip applying [env] in nyash.toml (existing)

View File

@ -52,3 +52,7 @@ When you need the implementation details
- Tokenizer: src/tokenizer.rs
- Parser: src/parser/expressions.rs, src/parser/statements.rs
- Lowering to MIR: src/mir/builder/**
Statement Separation (Semicolons)
- Newline separates statements by default; semicolons are optional.
- Use semicolons only when placing multiple statements on one line.
- Minimal ASI rules: newline does not end a statement when the line ends with an operator/dot/comma, or while inside grouping.

View File

@ -1,25 +0,0 @@
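# Cranelift / AOT/JIT-AOT Tasks (Phase 15)
This document collects issues and progress related to the Cranelift backend (AOT/JIT-AOT).
The selfhosting-dev branch develops mainly around VM/JIT, so details are consolidated here and `CURRENT_TASK.md` has been slimmed down.
Last updated: 2025-09-06 (split out from CURRENT_TASK)
Reference links
- Old content / full archive: `../../archives/CURRENT_TASK-2025-09-06.md`
- Phase overview: `../README.md`
Current status summary (excerpt)
- Fix for cases where StringBox.length/len returns 0 (two-stage lowering fallback: string.len_h → any.length_h)
- Additions to the hostcall registry/extern thunks (register `SYM_STRING_LEN_H`)
- Tracking rare segfaults / DT_TEXTREL warnings in AOT (TLS/extern binding order)
Priority tasks (proposed)
1) Strengthen Return materialization (common to JIT-direct/JIT-AOT)
2) Verify Cranelift import symbol resolution (guarantee that `extern_thunks::nyash_string_len_h` is actually called)
3) Define a minimal stable set for the AOT toolchain (linker, flags)
Operational notes
- On selfhosting-dev, only reference this file (direct implementation changes are made on the Cranelift-specific branch).
- When shared surfaces (runner/IR, etc.) need changes, prefer feature gates and compatible APIs, and arrange the work so the two branches do not conflict.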

View File

@ -0,0 +1,81 @@
# Ny JSON IR v0 — Minimal Spec (Stage2)
Status: experimental but stable for Phase15 Stage2. Input to `--ny-parser-pipe`.
Version and root
- `version`: 0
- `kind`: "Program"
- `body`: array of statements
Statements (`StmtV0`)
- `Return { expr }`
- `Extern { iface, method, args[] }` (optional; passes through to `ExternCall`)
- `Expr { expr }` (expression statement; side effects only)
- `Local { name, expr }` (Stage2)
- `If { cond, then: Stmt[], else?: Stmt[] }` (Stage2)
- `Loop { cond, body: Stmt[] }` (Stage2; while(cond) body)
Expressions (`ExprV0`)
- `Int { value }` where `value` is JSON number or digit string
- `Str { value: string }`
- `Bool { value: bool }`
- `Binary { op: "+"|"-"|"*"|"/", lhs, rhs }`
- `Compare { op: "=="|"!="|"<"|"<="|">"|">=", lhs, rhs }`
- `Logical { op: "&&"|"||", lhs, rhs }` (short-circuit)
- `Call { name: string, args[] }` (function by name)
- `Method { recv: Expr, method: string, args[] }` (box method)
- `New { class: string, args[] }` (construct Box)
- `Var { name: string }`
CFG conventions (lowered by the bridge)
- If: create `then_bb`, `else_bb`, `merge_bb`. Both branches jump to merge if unterminated.
- Loop: `preheader -> cond_bb -> (body_bb or exit_bb)`, body jumps back to cond.
- Short-circuit Logical: create `rhs_bb`, `fall_bb`, `merge_bb` with constants on fall path.
- All blocks end with a terminator (branch/jump/return).
PHI merging (current behavior)
- If: locals updated in `then`/`else` merge at `merge_bb` via `phi`.
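- When `else` is absent, the else side uses the pre-branch (base) value.
- A new variable that exists on only one side is treated as out of scope and is not propagated outward.
- Loop: header PHIs are placed up front in `cond_bb` (merging preheader/base with latch/body end).
- Purpose: a bridge to stabilize Stage-2 early. In the future (Core-14), PHIs are planned to be generated automatically via reverse lowering from LoopForm.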
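The If merge rules above can be illustrated with a small Python sketch over variable-to-value maps. It is not the Bridge's implementation (that is Rust); `merge_if_locals` and the `%n` value names are hypothetical. One extra assumption is labeled in the code: a variable introduced in both branches but absent before the `if` is also left unpropagated, since the spec only states the one-sided case.

```python
def merge_if_locals(base, then_env, else_env=None):
    """Compute the PHIs needed at merge_bb for an If.

    base:     variable -> value id visible before the branch
    then_env: variable -> value id at the end of the then block
    else_env: same for the else block; None means there is no else
    Returns (merged_env, phis) where phis maps a variable to its
    (then_value, else_value) incoming pair.
    """
    if else_env is None:
        else_env = base  # missing else: the else edge carries the pre-branch values
    merged = dict(base)
    phis = {}
    # Only variables that existed before the branch are merged. Assumption: names
    # introduced inside a branch (on one side or both) stay scoped to it.
    for name, before in base.items():
        then_val = then_env.get(name, before)
        else_val = else_env.get(name, before)
        if then_val != else_val:
            phis[name] = (then_val, else_val)
            merged[name] = f"phi({then_val},{else_val})"
    return merged, phis
```

For the `If with local + PHI merge` example in this spec, `x` is rebound in both branches, so a single PHI with two incoming values is produced at the merge block.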
Type meta (emitter/LLVM harness cooperation)
- `+` with any string operand → string concat path (handle fixed).
- `==/!=` with both strings → string compare path.
Special notes
- `Var("me")`: by default the Bridge reports an undefined-variable error. For debugging, `NYASH_BRIDGE_ME_DUMMY=1` injects a dummy `NewBox{class}` (`Main` when `NYASH_BRIDGE_ME_CLASS` is omitted).
- `--ny-parser-pipe` reads JSON v0 from stdin and executes it via MIR → MIRInterp.
CLI/Env cheatsheet
- Pipe: `echo '{...}' | target/release/nyash --ny-parser-pipe`
- File: `target/release/nyash --json-file sample.json`
- Verbose MIR dump: `NYASH_CLI_VERBOSE=1`
- me dummy: `NYASH_BRIDGE_ME_DUMMY=1 NYASH_BRIDGE_ME_CLASS=ConsoleBox`
Examples
Arithmetic
```json
{"version":0,"kind":"Program","body":[
{"type":"Return","expr":{
"type":"Binary","op":"+",
"lhs":{"type":"Int","value":1},
"rhs":{"type":"Binary","op":"*","lhs":{"type":"Int","value":2},"rhs":{"type":"Int","value":3}}
}}
]}
```
If with local + PHI merge
```json
{"version":0,"kind":"Program","body":[
{"type":"Local","name":"x","expr":{"type":"Int","value":1}},
{"type":"If","cond":{"type":"Compare","op":"<","lhs":{"type":"Int","value":1},"rhs":{"type":"Int","value":2}},
"then":[{"type":"Local","name":"x","expr":{"type":"Int","value":10}}],
"else":[{"type":"Local","name":"x","expr":{"type":"Int","value":20}}]
},
{"type":"Return","expr":{"type":"Var","name":"x"}}
]}
```
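To check what the two examples above should evaluate to, a tiny tree-walking interpreter is enough. The Python sketch below is a reference aid under stated assumptions, not the Bridge: it skips MIR entirely, supports only Int/Str/Bool/Var/Binary/Compare/Logical and Return/Expr/Local/If/Loop, treats `/` as floor division, and converts non-string operands with `str()` on the string `+` path. All function names are hypothetical.

```python
import json
import operator

BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": operator.floordiv}  # assumption: integer floor division
COMPARE = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
           "<=": operator.le, ">": operator.gt, ">=": operator.ge}


class ReturnSignal(Exception):
    """Unwinds nested blocks when a Return statement executes."""
    def __init__(self, value):
        super().__init__(value)
        self.value = value


def eval_expr(expr, env):
    kind = expr["type"]
    if kind == "Int":
        return int(expr["value"])  # value may be a JSON number or a digit string
    if kind in ("Str", "Bool"):
        return expr["value"]
    if kind == "Var":
        return env[expr["name"]]
    if kind == "Logical":  # short-circuit: rhs is evaluated only when needed
        lhs = bool(eval_expr(expr["lhs"], env))
        if expr["op"] == "&&":
            return lhs and bool(eval_expr(expr["rhs"], env))
        return lhs or bool(eval_expr(expr["rhs"], env))
    if kind not in ("Binary", "Compare"):
        raise NotImplementedError(kind)
    lhs, rhs = eval_expr(expr["lhs"], env), eval_expr(expr["rhs"], env)
    if kind == "Compare":
        return COMPARE[expr["op"]](lhs, rhs)
    if expr["op"] == "+" and (isinstance(lhs, str) or isinstance(rhs, str)):
        return str(lhs) + str(rhs)  # any string operand selects the concat path
    return BINARY[expr["op"]](lhs, rhs)


def run_scoped(stmts, env):
    # Mirror the PHI rule: only variables that existed before the block flow out.
    inner = dict(env)
    exec_block(stmts, inner)
    for name in env:
        env[name] = inner[name]


def exec_block(stmts, env):
    for stmt in stmts:
        kind = stmt["type"]
        if kind == "Return":
            raise ReturnSignal(eval_expr(stmt["expr"], env))
        elif kind == "Local":
            env[stmt["name"]] = eval_expr(stmt["expr"], env)
        elif kind == "Expr":
            eval_expr(stmt["expr"], env)
        elif kind == "If":
            taken = stmt["then"] if eval_expr(stmt["cond"], env) else stmt.get("else", [])
            run_scoped(taken, env)
        elif kind == "Loop":
            while eval_expr(stmt["cond"], env):
                run_scoped(stmt["body"], env)
        else:
            raise NotImplementedError(kind)


def run_program(program):
    if program.get("version") != 0 or program.get("kind") != "Program":
        raise ValueError("not a JSON v0 Program")
    try:
        exec_block(program["body"], {})
    except ReturnSignal as signal:
        return signal.value
    return None
```

Feeding either example through `run_program(json.loads(text))` gives a quick cross-check against what `--ny-parser-pipe` reports.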

View File

@ -10,6 +10,9 @@ This is the entry point for Nyash language documentation.
- Sugar Transformations (?., ??, |> and friends): parser/sugar.rs (source) and tools/nyfmt/NYFMT_POC_ROADMAP.md
- Peek Expression Design/Usage: covered in the Language Reference and Phase 12.7 specs above
Statement separation and semicolons
- See: reference/language/statements.md — newline as primary separator; semicolons optional for multiple statements on one line; minimal ASI rules.
Related implementation notes
- Tokenizer: src/tokenizer.rs
- Parser (expressions/statements): src/parser/expressions.rs, src/parser/statements.rs

View File

@ -0,0 +1,70 @@
# Statement Separation and Semicolons
Status: Adopted for Phase 15.3+; parser implementation is staged.
Policy
- Newline as primary statement separator.
- Semicolons are optional and only needed when multiple statements appear on one physical line.
- Minimal ASI (auto semicolon insertion) rules to avoid surprises.
Rules (minimal and predictable)
- Newline ends a statement when:
- Parenthesis/brace/bracket depth is 0, and
- The line does not end with a continuation token (`+ - * / . ,` etc.).
- Newline does NOT end a statement when:
- Inside any open grouping `(...)`, `[...]`, `{...}`; or
- The previous token is a continuation token.
- `return/break/continue` end the statement at newline unless the value is on the same line or grouped via parentheses.
- `if/else` (and similar paired constructs): do not insert a semicolon between a block and a following `else`.
- One-line multi-statements are allowed with semicolons: `x = 1; y = 2; print(y)`.
- Method chains can break across lines after a dot: `obj\n .method()` (newline treated as whitespace).
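The rules above can be sketched as a line-oriented statement splitter. This Python version is only an approximation for illustration: it works on raw characters rather than tokens, ignores string literals and comments, and treats a following line that starts with `.` or `else` as a continuation. The name `split_statements` is hypothetical; the staged parser implementation may differ.

```python
import re
from typing import List

CONTINUATION = ("+", "-", "*", "/", ".", ",")


def split_statements(src: str) -> List[str]:
    """Split source text into statements using the minimal ASI rules."""
    statements: List[str] = []
    buf: List[str] = []
    depth = 0  # nesting depth of (), [], {}

    def flush() -> None:
        text = " ".join("".join(buf).split())  # normalize whitespace
        if text:
            statements.append(text)
        buf.clear()

    lines = src.split("\n")
    for index, line in enumerate(lines):
        for ch in line:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
            if ch == ";" and depth == 0:
                flush()  # an explicit semicolon separates statements on one line
            else:
                buf.append(ch)
        buf.append("\n")
        stripped = line.strip()
        if not stripped:
            continue  # a blank line never terminates a pending statement
        following = next((l.strip() for l in lines[index + 1:] if l.strip()), "")
        continues = (
            depth > 0                                  # inside an open grouping
            or stripped.endswith(CONTINUATION)         # line ends with a continuation token
            or following.startswith(".")               # method chain on the next line
            or re.match(r"else\b", following) is not None  # never split `}` from `else`
        )
        if not continues:
            flush()
    flush()
    return statements
```

Because brace depth is tracked, a whole `if { ... } else { ... }` construct comes back as one statement; a real parser would recurse into the block bodies.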
Style guidance
- Prefer newline separation (no semicolons) for readability.
- Use semicolons only when placing multiple statements on a single line.
Examples
```nyash
// Preferred (no semicolons)
local x = 5
x = x + 1
print(x)
// One line with multiple statements (use semicolons)
local x = 5; x = x + 1; print(x)
// Line continuation by operator
local n = 1 +
2 +
3
// Grouping across lines
return (
1 + 2 + 3
)
// if / else on separate lines without inserting a semicolon
if cond {
x = x - 1
}
else {
print(x)
}
// Dot chain across lines
local v = obj
.methodA()
.methodB(42)
```
Implementation notes (parser)
- Tokenizer keeps track of grouping depth.
- At newline, attempt ASI only when depth==0 and previous token is not a continuation.
- Error messages should suggest adding a continuation token or grouping when a newline unexpectedly ends a statement.
Parser dev notes (Stage-1/2)
- return + newline: treat bare `return` as statement end. To return an expression on the next line, require grouping with parentheses.
- if/else: never insert a semicolon between a closed block and `else` (a position where ASI is forbidden).
- Dot chains: treat `.` followed by newline as whitespace (line continuation).
- One-line multi-statements: accept `;` as statement separator, but formatter should prefer newlines.
- Unary minus: disambiguate from binary minus; implement after Stage-1 (for now, work around it with parentheses).