Files

nyash-codex dda65b94b7 Phase 21.7 normalization: optimization pre-work + bench harness expansion

- Add opt-in optimizations (defaults OFF)
  - Ret purity verifier: NYASH_VERIFY_RET_PURITY=1
  - strlen FAST enhancement for const handles
  - FAST_INT gate for same-BB SSA optimization
  - length cache for string literals in llvmlite
- Expand bench harness (tools/perf/microbench.sh)
  - Add branch/call/stringchain/arraymap/chip8/kilo cases
  - Auto-calculate ratio vs C reference
  - Document in benchmarks/README.md
- Compiler health improvements
  - Unify PHI insertion to insert_phi_at_head()
  - Add NYASH_LLVM_SKIP_BUILD=1 for build reuse
- Runtime & safety enhancements
  - Clarify Rust/Hako ownership boundaries
  - Strengthen receiver localization (LocalSSA/pin/after-PHIs)
  - Stop excessive PluginInvoke→BoxCall rewrites
- Update CURRENT_TASK.md, docs, and canaries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-13 16:40:58 +09:00

baselines

feat(phase21.5/22.1): MirBuilder JsonFrag refactor + FileBox ring-1 + registry tests

2025-11-10 19:42:42 +09:00

feat(phase21.5/22.1): MirBuilder JsonFrag refactor + FileBox ring-1 + registry tests

2025-11-10 19:42:42 +09:00

python

feat(phase21.5/22.1): MirBuilder JsonFrag refactor + FileBox ring-1 + registry tests

2025-11-10 19:42:42 +09:00

bench_aot_len_heavy.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_aot_len_light.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_aot_len_medium.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_box_create_destroy_small.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_box_create_destroy.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_heavy.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_light.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_medium.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_method_call_only_small.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

bench_method_call_only.hako

phase: 20.49 COMPLETE; 20.50 Flow+String minimal reps; 20.51 selfhost v0/v1 minimal (Option A/B); hv1-inline binop/unop/copy; docs + run_all + CURRENT_TASK -> 21.0

2025-11-06 15:41:52 +09:00

README.md

Phase 21.7 normalization: optimization pre-work + bench harness expansion

2025-11-13 16:40:58 +09:00

README.md

Hakorune Benchmarks

This repository now bundles a light micro/synthetic benchmark suite to keep an eye on low-level optimizations. All cases are emitted on the fly by tools/perf/microbench.sh so they do not interfere with normal apps, but the generated Nyash code mirrors patterns we see in real programs.

Included cases

| Case | Notes |
| --- | --- |
| `loop` | Plain integer accumulation |
| `strlen` | String length in tight loop (`nyrt_string_length` path) |
| `box` | StringBox allocation/free |
| `branch` | Dense conditional tree (modulo & arithmetic) |
| `call` | Helper function dispatch (mix/twist) |
| `stringchain` | Substring concatenation + length accumulation |
| `arraymap` | ArrayBox + MapBox churn |
| `chip8` | Simplified CHIP-8 style fetch/decode loop (derived from apps/chip8) |
| `kilo` | Text-buffer edits/search (inspired by enhanced_kilo_editor) |
| `sieve` | Integer-heavy Sieve of Eratosthenes (prime count) |
| `matmul` | NxN integer matrix multiply (3 nested loops) |
| `linidx` | Linear index array ops: idx=i*cols+j (CSE/hoist verification) |
| `maplin` | Integer-key Map get/set with linear keys (auto key heuristics) |
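
The `linidx` case is built around the row-major address computation idx = i*cols + j. As a rough illustration of what hoisting is expected to do with that expression, the sketch below computes the loop-invariant product i*cols once per row instead of once per element. It is written in Python purely for readability: it is not the Nyash source that tools/perf/microbench.sh emits, and the function names are hypothetical.

```python
def sum_linear_naive(data, rows, cols):
    """Walk a flat row-major array, recomputing i * cols on every access."""
    total = 0
    for i in range(rows):
        for j in range(cols):
            total += data[i * cols + j]
    return total


def sum_linear_hoisted(data, rows, cols):
    """Same traversal with the loop-invariant i * cols hoisted out of the inner loop."""
    total = 0
    for i in range(rows):
        row_base = i * cols  # invariant across the inner loop, so computed once per row
        for j in range(cols):
            total += data[row_base + j]
    return total


rows, cols = 3, 4
data = list(range(rows * cols))  # values 0..11 laid out row by row
# Hoisting must not change the result, only the amount of repeated arithmetic.
assert sum_linear_naive(data, rows, cols) == sum_linear_hoisted(data, rows, cols)
```

Both functions visit the same elements in the same order; the second simply shares one multiplication across a whole row, which is the kind of redundancy a hoist/CSE pass looks for.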

Each case has a matching C reference, so the script reports both absolute time and the Hakorune/C ratio. Scenario-style cases (chip8, kilo) still keep all logic deterministic and print-free to make timings stable.
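
The ratio can be pictured as the mean Hakorune time divided by the mean C time, expressed as a percentage. The sketch below assumes exactly that definition; the script's actual arithmetic is not shown here, and both the function name and the sample timings are made up for illustration.

```python
from statistics import mean


def ratio_percent(hako_ms, c_ms):
    """Return the mean Hakorune time as a percentage of the mean C time (C = 100%)."""
    c_avg = mean(c_ms)
    if c_avg <= 0:
        raise ValueError("C reference time must be positive")
    return mean(hako_ms) / c_avg * 100.0


# Hypothetical timings in milliseconds, three runs each.
print(f"{ratio_percent([30, 31, 29], [40, 41, 39]):.2f}%")  # prints 75.00%
```

A value below 100% means the Hakorune build finished faster than the C reference on average.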

Notes

tools/perf/dump_mir.sh can optionally write the MIR(JSON) for a given .hako and print a block/op histogram. It tries the normal provider path first and falls back to the minimal jsonfrag version (while-form) when needed, so you can inspect both the structural skeleton and the full lowering.
Current baseline observations (LLVM/EXE, NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1): call, stringchain, and kilo already beat the C reference (ratio < 100%), while branch, arraymap, and chip8 remain near 200%. The latter three are the targets of the upcoming hoisting/array-map hot-path work.

Latest fast-path measurements

The following numbers were recorded on 2025-11-12 with the opt-in work enabled:

export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
       NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
       NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1
tools/perf/microbench.sh --case <case> --backend llvm --exe --runs 3
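
To repeat the measurement over several cases, the same environment and command line can be assembled programmatically. The following is a minimal sketch that uses only the variables and flags shown above; the helper names are hypothetical and nothing here is part of the repository.

```python
import os
import shlex

# Opt-in toggles copied from the export line above.
FAST_PATH_ENV = {
    "NYASH_SKIP_TOML_ENV": "1",
    "NYASH_DISABLE_PLUGINS": "1",
    "NYASH_LLVM_SKIP_BUILD": "1",
    "NYASH_LLVM_FAST": "1",
    "NYASH_LLVM_FAST_INT": "1",
    "NYASH_MIR_LOOP_HOIST": "1",
    "NYASH_AOT_COLLECTIONS_HOT": "1",
}


def microbench_command(case, runs=3):
    """Build the argv list for one LLVM/EXE run of the given case."""
    return ["tools/perf/microbench.sh", "--case", case,
            "--backend", "llvm", "--exe", "--runs", str(runs)]


def fast_path_env(base=None):
    """Return a copy of the base environment with the opt-in toggles set."""
    env = dict(os.environ if base is None else base)
    env.update(FAST_PATH_ENV)
    return env


# Print the commands instead of running them, so the sketch has no side effects.
for case in ("branch", "arraymap", "chip8"):
    print(shlex.join(microbench_command(case)))
```

To actually execute a case, pass the list to `subprocess.run(cmd, env=fast_path_env(), check=True)` from the repository root.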

Goal: bring the Hakorune EXE ratio to ≤125% of the reference C time (equivalent to running at roughly 80% of C's speed, since 100/125 = 0.8).
Current effort: keep baking new hoist/CSE patterns so arraymap, matmul, and maplin fall below that bar while sieve and the other cases stay stable.

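| Case | EXE ratio (C=100%) | Notes |
| --- | --- | --- |
| `branch` | 75.00% | Goal met (≤125%). |
| `arraymap` | 150.00% | Keep polishing the Array/Map hot path and binop CSE, aiming for ≤125%. |
| `chip8` | 25.00% | Fast enough. FAST_INT/hoist are paying off. |
| `kilo` | 0.21% (N=200,000) | In EXE mode N is auto-adjusted to 200k by default. The ratio is tiny because the C reference is the heavier configuration. |
| `sieve` | 200.00% | Measured with `NYASH_VERIFY_RET_PURITY=1` on. The auto key detection is still conservative. |
| `matmul` | 300.00% | Still dominated by Array/Map get/set in the triple-nested loop. The plan is to tighten automatic CSE and the auto map key. |
| `linidx` | 100.00% | Linear index case is at parity; hoist + CSE already helps share SSA. |
| `maplin` | 200.00% | The auto Map key detection (linear/const extension) now selects _h more readily. There is still room to tighten. |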
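
As a quick cross-check of the table against the ≤125% goal, the ratios can be split into cases that already meet the bar and cases that do not. The sketch below copies the figures from the table; the helper name is hypothetical.

```python
GOAL_PERCENT = 125.0

# EXE ratios copied from the table above (C = 100%).
RATIOS = {
    "branch": 75.00, "arraymap": 150.00, "chip8": 25.00, "kilo": 0.21,
    "sieve": 200.00, "matmul": 300.00, "linidx": 100.00, "maplin": 200.00,
}


def split_by_goal(ratios, goal=GOAL_PERCENT):
    """Return (met, pending): case names at or below the goal, and those above it."""
    met = sorted(case for case, r in ratios.items() if r <= goal)
    pending = sorted(case for case, r in ratios.items() if r > goal)
    return met, pending


met, pending = split_by_goal(RATIOS)
print("met:", met)          # ['branch', 'chip8', 'kilo', 'linidx']
print("pending:", pending)  # ['arraymap', 'maplin', 'matmul', 'sieve']
```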

Notes

You can rerun any case with the command above; NYASH_LLVM_SKIP_BUILD=1 keeps repeated ny-llvmc builds cheap once the binaries are ready.
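For kilo, the C reference is heavy and the default N=5,000,000 makes runs take a long time, so in EXE mode with N unspecified the default N is now auto-adjusted to 200,000 (tools/perf/microbench.sh). Override it with --n <value> if needed.
lang/src/llvm_ir/boxes/aot_prep/README.md lists the StrlenFold / LoopHoist / ConstDedup / CollectionsHot passes and also explains how to switch NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}. Going forward, work on strengthening hoist/collections will continue in order to bring arraymap/matmul/maplin/sieve within 125%.
The representative measurements are run with NYASH_VERIFY_RET_PURITY=1 enabled, so side effects immediately before Return are detected Fail-Fast (note that very small changes in handles or boxcalls can make a result jump by 2–3×).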
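
The kilo default described in the notes above amounts to a small selection rule. The sketch below is one reading of that rule, not the script's code: an explicit --n wins, otherwise EXE mode falls back to 200,000 and every other mode to 5,000,000. The function and constant names are hypothetical.

```python
KILO_DEFAULT_N = 5_000_000    # default N stated in the notes
KILO_EXE_DEFAULT_N = 200_000  # auto-adjusted default for EXE mode


def resolve_kilo_n(exe_mode, n=None):
    """Pick N for the kilo case: an explicit value wins, else a mode-specific default."""
    if n is not None:
        return n
    return KILO_EXE_DEFAULT_N if exe_mode else KILO_DEFAULT_N


print(resolve_kilo_n(exe_mode=True))            # 200000
print(resolve_kilo_n(exe_mode=False))           # 5000000
print(resolve_kilo_n(exe_mode=True, n=50_000))  # 50000
```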

Running

Build hakorune in release mode first:

cargo build --release --bin hakorune

Then pick a case:

# LLVM EXE vs C (default n/runs are tuned per case)
tools/perf/microbench.sh --case chip8 --exe --n 200000 --runs 3

# VM vs C for a string-heavy micro
tools/perf/microbench.sh --case stringchain --backend vm --n 100000 --runs 3

The script takes care of generating temporary Nyash/C sources, building the C baseline, piping through the MIR builder (with FAST toggles enabled), and printing averages + ratios. Set NYASH_SKIP_TOML_ENV=1 / NYASH_DISABLE_PLUGINS=1 before calling the script if you want a fully clean execution environment.
