Phase 21.7 normalization: optimization pre-work + bench harness expansion

- Add opt-in optimizations (defaults OFF)
  - Ret purity verifier: NYASH_VERIFY_RET_PURITY=1
  - strlen FAST enhancement for const handles
  - FAST_INT gate for same-BB SSA optimization
  - length cache for string literals in llvmlite
- Expand bench harness (tools/perf/microbench.sh)
  - Add branch/call/stringchain/arraymap/chip8/kilo cases
  - Auto-calculate ratio vs C reference
  - Document in benchmarks/README.md
- Compiler health improvements
  - Unify PHI insertion to insert_phi_at_head()
  - Add NYASH_LLVM_SKIP_BUILD=1 for build reuse
- Runtime & safety enhancements
  - Clarify Rust/Hako ownership boundaries
  - Strengthen receiver localization (LocalSSA/pin/after-PHIs)
  - Stop excessive PluginInvoke→BoxCall rewrites
- Update CURRENT_TASK.md, docs, and canaries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
nyash-codex
2025-11-13 16:40:58 +09:00
parent 9e2fa1e36e
commit dda65b94b7
160 changed files with 6773 additions and 1692 deletions

@ -0,0 +1,45 @@
Legacy ByName Call Removal Plan
Context
- Unified call system is default. VM routes by `Callee` kind (Global/Method/Extern/…)
- Legacy byname resolution (callee=None → string name lookup) is gated by env and OFF by default.
- Goal: remove legacy code path once parity is proven at integration/full levels.
Guards / Current Behavior
- The old byname path has already been removed. `NYASH_VM_ALLOW_LEGACY` is retired.
- Default: generating `callee=None` fails fast (the Builder must always attach a Callee).
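The routing rule above can be sketched roughly as follows. This is a minimal illustration, not the VM's actual code: the `Callee` enum here is a simplified, hypothetical model (the real one has more kinds, per the "…" above), and `route` and its return strings are invented for the example. The point it shows is that a missing callee is an error rather than a trigger for string-name lookup.

```rust
/// Hypothetical, simplified model of callee kinds (assumption: the real
/// enum has more variants and richer payloads).
#[derive(Debug, Clone, PartialEq)]
enum Callee {
    Global(String),
    Method { box_name: String, method: String },
    Extern(String),
}

/// Route a call by callee kind. `None` fails fast instead of falling
/// back to a byname lookup.
fn route(callee: Option<&Callee>) -> Result<String, String> {
    match callee {
        Some(Callee::Global(name)) => Ok(format!("global:{name}")),
        Some(Callee::Method { box_name, method }) => {
            Ok(format!("method:{box_name}.{method}"))
        }
        Some(Callee::Extern(name)) => Ok(format!("extern:{name}")),
        None => Err("call has no callee; byname resolution is not available".to_string()),
    }
}
```

Returning `Result` keeps the failure explicit at the call site, which matches the "keep error message clarity" goal stated later in this plan.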
Scope of Work
1) Identify callers that could still rely on legacy resolution
- Builder emissions that may omit `callee` (should be none after Phase 1 refactor)
- Any VM entry points or helpers constructing byname calls dynamically
- Older tests or tools expecting byname lookups
2) Ensure unified path covers them
- For Global user functions: attach `Callee::Global("Module.func")`
- For dynamic/indirect calls: mark `Callee::Value` and keep VM restrictions explicit
- For extern-like names: delegate via handlers/calls/externs.rs (SSOT)
- Normalize arity (`foo/1` → `foo`) via `utils::normalize_arity_suffix`
3) Strengthen tests
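- Smokes/quick: the unified path (byname disallowed) is already enforced
- Integration/full: add canaries for user function calls and extern name normalization
- Confirm nothing depends on the old env var (`NYASH_VM_ALLOW_LEGACY` has been removed)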
4) Remove legacy code path
- Delete `src/backend/mir_interpreter/handlers/calls/legacy.rs` and its resolver module
- Drop env guard checks referring to legacy path (keep error message clarity)
- Update READMEs to remove temporary notes
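The arity normalization mentioned in step 2 (`foo/1` → `foo`) could look something like the sketch below. The signature is an assumption; the real `utils::normalize_arity_suffix` may take or return different types and may treat edge cases differently.

```rust
/// Strip a trailing "/N" arity suffix when N consists only of ASCII digits.
/// Hypothetical signature: the real helper in `utils` may differ.
fn normalize_arity_suffix(name: &str) -> &str {
    match name.rsplit_once('/') {
        // Only strip when both sides are non-empty and the suffix is numeric.
        Some((base, arity))
            if !base.is_empty()
                && !arity.is_empty()
                && arity.bytes().all(|b| b.is_ascii_digit()) =>
        {
            base
        }
        // No suffix, or a non-numeric one: leave the name untouched.
        _ => name,
    }
}
```

Names without a numeric suffix pass through unchanged, so the function is safe to apply more than once.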
Readiness Criteria (before removal)
- quick/integration/full all green with legacy OFF
- No `callee=None` emission in builder (verified by grepping MIR JSON / code review)
- Extern SSOT accepts arity-suffixed names; Global extern-like names delegate properly
Rollback Plan
- Keep a small revertable commit boundary for legacy deletion
- If issues appear, reintroduce legacy.rs with the same path under a dev guard until fixed
Cross-References
- handlers/calls/README.md (SSOT and boundaries)
- docs/ENV_VARS.md (env guards and policies)

@ -0,0 +1,43 @@
Normalization Ownership — Rust vs Hakorune
Goal
- Prevent double-normalization and keep a clear Single Source of Truth (SSOT) for where each rewrite lives.
Ownership
- Hakorune layer (Hako scripts)
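- Methodization: rewriting Global("Box.m/N") into mir_call(Method).
- Name/arity canonicalization (Box.method/N).
- Function defs scan/inject (HAKO_STAGEB_FUNC_SCAN, HAKO_MIR_BUILDER_FUNCS).
- Emit JSON v1 + unified mir_call (NYASH_JSON_SCHEMA_V1=1, NYASH_MIR_UNIFIED_CALL=1).
- Visualization tags and diagnostic output (dev only).
- Rust layer
- Structural/correctness: SSA/PHI, receiver localization (LocalSSA/Copy/pin).
- Legacy JSON v0 → minimal bridging (e.g. filling in Callee inside json_v0_bridge).
- Compatibility/safety valves: structural recovery of undefined receivers (e.g. the most recent NewBox in the same BB), kept to a minimal scope behind dev guards.
- The optimizer is limited to structure- and side-effect-based optimizations (it does not re-rewrite semantics).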
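To make the `Box.method/N` name shape concrete, here is a small sketch that splits such a name into its parts. It is written in Rust only for illustration: per the ownership split above, the actual methodization rewrite lives in Hako scripts. `MethodRef` and `parse_box_method` are hypothetical names, and splitting on the last `.` is an assumption about how box names are qualified.

```rust
/// Hypothetical decomposition of a canonical "Box.method/N" name.
#[derive(Debug, PartialEq)]
struct MethodRef {
    box_name: String,
    method: String,
    arity: usize,
}

/// Parse "Box.method/N"; returns None when the name does not have that shape.
fn parse_box_method(name: &str) -> Option<MethodRef> {
    // Split off the arity suffix first, then the method name.
    let (qualified, arity) = name.rsplit_once('/')?;
    let arity: usize = arity.parse().ok()?;
    // Assumption: everything before the last '.' is the box name.
    let (box_name, method) = qualified.rsplit_once('.')?;
    if box_name.is_empty() || method.is_empty() {
        return None;
    }
    Some(MethodRef {
        box_name: box_name.to_string(),
        method: method.to_string(),
        arity,
    })
}
```

A name that fails to parse would simply stay on the Global path, which is consistent with the rollback note that the Global path remains active.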
Guards and Toggles
- Hako (recommended dev set)
- HAKO_STAGEB_FUNC_SCAN=1
- HAKO_MIR_BUILDER_FUNCS=1
- HAKO_MIR_BUILDER_CALL_RESOLVE=1
- HAKO_MIR_BUILDER_METHODIZE=1 (methodize contributes to the v1 + unified output)
- NYASH_JSON_SCHEMA_V1=1, NYASH_MIR_UNIFIED_CALL=1
- Rust (bridge/diagnostics)
- HAKO_BRIDGE_METHODIZE=1 is a bring-up aid. Turn it OFF once the Hako path is the default (scheduled for removal).
- mir_plugin_invoke/plugin_only are for A/B evaluation and diagnostics. Default OFF.
Rules of Engagement
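- When v1 + unified is generated by Hako, the Rust side performs no methodize or re-rewrite (structure only).
- json_v0_bridge is kept in limited use for compatibility with v0 input. It shrinks as v1 becomes the default.
- Dev safety valves (such as structural recovery of undefined receivers) are turned OFF or removed once test coverage is sufficient.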
Testing
- Canary
- tools/dev/phase217_methodize_canary.sh (rc=5)
- tools/dev/phase217_methodize_json_canary.sh (schema_version + mir_call present, Method preferred)
- tools/dev/phase216_chain_canary_call.sh (rc=5)
- On failure, look for the cause on the Hako side (methodize) first, then on the Rust side (structure).

@ -1,31 +1,33 @@
# Phase 21.5 — Optimization (ny-llvm crate line)
# Phase 21.5 — Optimization (AotPrep-First)
Scope
- Optimize hot paths for the crate (ny-llvmc) line using Hakorune scripts only.
- Preserve default behavior; all risky changes behind dev toggles.
- Measure EXE runtime (build once, run many) to avoid toolchain overhead noise.
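Purpose
- Do pre-optimization (structure only) on the .hako side (AotPrep) so that the IR handed to LLVM/AOT is lighter.
- Default behavior is unchanged (opt-in). The Return purity guard ensures safety.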
Targets (initial)
- loop: integer accumulations (no I/O)
- strlen: FAST=1 path (pointer → nyrt_string_length)
- box: construct/destroy minimal boxes (String/Integer)
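Checklist
- [x] Pass split (StrlenFold / LoopHoist / ConstDedup / CollectionsHot / BinopCSE)
- [x] Introduce CollectionsHot (Array/Map), default OFF
- [x] Map key mode `NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}`
- [x] LoopHoist v1 / BinopCSE v1 (minimal)
- [x] Add `linidx`/`maplin` benchmarks
- [ ] LoopHoist v2 (chained hoisting of const right operands of `+`/`*`; fixpoint)
- [ ] BinopCSE v2 (stronger sharing of linear `i*n` expressions)
- [ ] CollectionsHot v2 (reuse a shared SSA value for array indices)
- [ ] Refine Map auto mode (recursive check in _is_const_or_linear)
- [ ] Idempotence (tag rewritten sites so that a re-run leaves the output unchanged)
- [ ] `arraymap`/`matmul` ≤ 125% (relative to the C baseline)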
Methodology
- Build once via ny-llvmc; time execution only (`--exe` mode).
- Runs: 3–5; report median and average (target ≥ 100ms per run).
- Observe NYASH_VM_STATS=1 (inst/compare/branch) where relevant to correlate structure and runtime.
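The reporting step above (median and average over a handful of runs, plus a ratio against a C reference) can be sketched as below. This is a standalone illustration, not what the bench scripts do internally: the function names are hypothetical, and the ratio is assumed to be runtime relative to C in percent, so that 125% means 1.25x the C runtime.

```rust
/// Median of the samples (milliseconds); None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut v = samples.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 1 {
        v[mid]
    } else {
        // Even count: average the two middle values.
        (v[mid - 1] + v[mid]) / 2.0
    })
}

/// Arithmetic mean of the samples; None for an empty slice.
fn mean(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<f64>() / samples.len() as f64)
}

/// Runtime relative to the C reference, in percent (assumed definition).
fn ratio_vs_c_percent(ny_ms: f64, c_ms: f64) -> Option<f64> {
    if c_ms > 0.0 {
        Some(ny_ms / c_ms * 100.0)
    } else {
        None
    }
}
```

Under this definition, a runtime ratio of 125% corresponds to running at 80% of the C baseline's speed, since 1 / 1.25 = 0.8.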
Commands (examples)
- tools/perf/phase215/bench_loop.sh --runs 5
- tools/perf/phase215/bench_strlen.sh --runs 5 --fast 1
- tools/perf/phase215/run_all.sh --runs 5 --timeout 120
Dev Toggles (keep OFF by default)
- NYASH_LLVM_FAST=1 (strlen FAST)
- HAKO_MIR_BUILDER_JSONFRAG_NORMALIZE=1 (normalize)
- HAKO_MIR_BUILDER_NORMALIZE_TAG=1 (tag, test-only)
Exit Criteria
- Representative microbenches stable (≤ 5% variance) and ≥ 80% of C baselines.
- No regression in EXE canaries (loop/print/strlen FAST) and VM parity canaries.
Toggles
- `NYASH_MIR_LOOP_HOIST=1` … enables StrlenFold/LoopHoist/ConstDedup/BinopCSE
- `NYASH_AOT_COLLECTIONS_HOT=1` … CollectionsHot (Array/Map)
- `NYASH_AOT_MAP_KEY_MODE` … `h|i64|hh|auto` (recommended: `auto`)
- `NYASH_VERIFY_RET_PURITY=1` … Return purity guard (ON during development)
Benchmarks (example)
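As an illustration of how a multi-valued toggle such as `NYASH_AOT_MAP_KEY_MODE` might be validated, the sketch below parses the four documented values. The enum, the function name, and the choice to reject unknown values are all assumptions; the document does not say what each mode means or how the real implementation reacts to unset or invalid input.

```rust
/// The four documented values of NYASH_AOT_MAP_KEY_MODE.
/// Variant names are hypothetical; their semantics are not modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MapKeyMode {
    H,
    I64,
    Hh,
    Auto,
}

/// Parse the raw env value. Unset or empty yields Ok(None) so the caller
/// can apply its own default; unknown values are reported as errors.
fn parse_map_key_mode(raw: Option<&str>) -> Result<Option<MapKeyMode>, String> {
    match raw.map(str::trim) {
        None | Some("") => Ok(None),
        Some("h") => Ok(Some(MapKeyMode::H)),
        Some("i64") => Ok(Some(MapKeyMode::I64)),
        Some("hh") => Ok(Some(MapKeyMode::Hh)),
        Some("auto") => Ok(Some(MapKeyMode::Auto)),
        Some(other) => Err(format!("unknown NYASH_AOT_MAP_KEY_MODE: {other}")),
    }
}
```

A caller would typically pass `std::env::var("NYASH_AOT_MAP_KEY_MODE").ok().as_deref()`. Taking the value as a parameter keeps the parser free of global state and easy to test.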
```bash
export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1 NYASH_VERIFY_RET_PURITY=1
for c in arraymap matmul sieve linidx maplin; do \
tools/perf/microbench.sh --case $c --exe --runs 3; echo; done
```

@ -36,7 +36,13 @@ Canaries
- tools/dev/stageb_loop_json_canary.sh — Program(JSON) shape for loop(i<n){i=i+1}
- tools/dev/phase216_chain_canary.sh — end-to-end EXE rc=10 for minimal loop
Provider Path Notes (Dev)
- Optional normalization for provider output is available via `HAKO_MIR_NORMALIZE_PROVIDER=1`.
- This applies the same JsonFrag normalizer/purifier to MIR(JSON) emitted by the Rust Provider path.
- Keep defaults unchanged; use only during bring-up to eliminate ret-after effects.
- `HAKO_MIR_BUILDER_LOOP_FORCE_JSONFRAG=1` now short-circuits both selfhost-first and provider-first wrappers to emit a minimal, pure control-flow MIR suitable for EXE build sanity.
- Default OFF; intended for small canaries and performance harness bring-up.
Removal Plan for temporary parser fallback
- Once VM/gpos interaction is fixed and parser emits correct loop JSON without guards,
remove the conservative fallback in ParserControlBox.parse_loop.

@ -0,0 +1,51 @@
# Phase 21.6 — Dual-Emit Parity & C-line Readiness
Goal: Produce identical MIR(JSON) from both provider (Rust) and selfhost (Hako) builder paths, measure generation cost, and keep AOT (ny-llvmc) fast/green. All work is semantics-preserving; defaults remain unchanged.
## Checklists
- [ ] Dual-emit parity on representative apps (MIR(JSON) normalized SHA1 equal)
- [ ] Resolver-first ON passes quick/integration
- [ ] Selfhost-first fallback ok (provider/legacy on failure)
- [ ] AOT obj/exe via ny-llvmc (crate backend) green
- [ ] Docs updated (bench guides, env vars, quick recipes)
## Scripts
- Dual emit + compare + bench: `tools/perf/dual_emit_compare.sh <input.hako> [rounds]`
- MIR emit bench: `tools/perf/bench_hakorune_emit_mir.sh <input.hako> [rounds]`
- AOT bench: `tools/perf/bench_ny_mir_builder.sh <mir.json> [rounds]`
- MIR diff: `tools/perf/compare_mir_json.sh <a.json> <b.json>`
## Env Knobs
- `HAKO_USING_RESOLVER_FIRST=1` (resolver-first)
- `HAKO_SELFHOST_BUILDER_FIRST=1` (selfhost→provider→legacy)
- `HAKO_MIR_BUILDER_BOX=hako.mir.builder|min`
- `NYASH_LLVM_BACKEND=crate` (ny-llvmc)
- `HAKO_LLVM_OPT_LEVEL=0|1` (AOT defaults to O0)
## Benchmarks — Tracking
Record normalized parity and generation times here (edit in place).
Legend: SHA1 = normalized JSON digest; Parity=Yes when SHA1 equal; Times are medians unless noted.
| Benchmark (Hako) | Resolver | Parity | Provider p50 (ms) | Selfhost p50 (ms) | Notes |
|------------------------------------------|----------|--------|-------------------|-------------------|-------|
| apps/examples/json_query/main.hako | off/on | | | | |
| apps/examples/json_pp/main.hako | off/on | | | | |
| apps/examples/json_lint/main.hako | off/on | | | | |
| apps/examples/json_query_min/main.hako | off/on | | | | |
How to fill:
1) Run `tools/perf/dual_emit_compare.sh <file> 5`
2) Copy p50s from the summary lines and mark Parity based on `compare_mir_json.sh` output.
3) Note any diffs (callee kinds/order/phi/meta) in Notes.
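The idea behind a "normalized" parity check can be sketched as follows. This is only an illustration: what `compare_mir_json.sh` actually normalizes is not described here, so the sketch assumes the simplest case (ignoring whitespace outside string literals) and compares the normalized text directly rather than computing a SHA1 digest. Both function names are hypothetical.

```rust
/// Remove JSON whitespace that appears outside string literals.
/// Assumption: this is the only normalization applied; the real script
/// may also canonicalize key order or drop metadata.
fn strip_insignificant_ws(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    let mut in_string = false;
    let mut escaped = false;
    for ch in json.chars() {
        if in_string {
            out.push(ch);
            if escaped {
                // The character after a backslash never ends the string.
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
        } else if ch == '"' {
            in_string = true;
            out.push(ch);
        } else if !matches!(ch, ' ' | '\t' | '\n' | '\r') {
            out.push(ch);
        }
    }
    out
}

/// Two emissions are at parity when their normalized forms are identical.
fn mir_json_parity(provider: &str, selfhost: &str) -> bool {
    strip_insignificant_ws(provider) == strip_insignificant_ws(selfhost)
}
```

Comparing digests of the normalized text, as the table legend describes, gives the same yes/no answer while being easier to record in the tracking table.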
## Next Steps
- [ ] If parity holds on the above set, extend to apps/tests subset
- [ ] If diffs remain, categorize and align either provider or selfhost output
- [ ] Keep AOT line green under `HAKO_LLVM_OPT_LEVEL=0` and optional `=1` spot checks

@ -9,12 +9,15 @@ Targets (must be green)
Canaries
- tools/dev/phase216_chain_canary_call.sh — remains PASS when OFF, PASS when ON
- Add: tools/dev/phase217_methodize_canary.sh (dev) — asserts method callee usage in MIR or IR tags
- tools/dev/phase217_methodize_canary.sh (dev) — compile-run rc=5 (semantics)
- tools/dev/phase217_methodize_json_canary.sh (dev) — v1 root (schema_version) + mir_call present (Method preferred; Global tolerated during the transition)
Toggles
- HAKO_MIR_BUILDER_METHODIZE=1 (new)
- HAKO_STAGEB_FUNC_SCAN=1 / HAKO_MIR_BUILDER_FUNCS=1 / HAKO_MIR_BUILDER_CALL_RESOLVE=1 (existing)
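- Primary set (default ON in the dev profile): the three above + NYASH_JSON_SCHEMA_V1=1 + NYASH_MIR_UNIFIED_CALL=1
- Diagnostics (default OFF): HAKO_BRIDGE_METHODIZE=1 (a helper on the core_bridge side; to be removed once the Hako path is the default)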
Rollback
- Disable HAKO_MIR_BUILDER_METHODIZE; remove methodization rewrite; keep Global path active.
- The methodize bridge in core_bridge is removed as soon as the Hako side becomes the default (tags: [bridge/methodize:*] are made temporarily observable to detect diffs).