Hakorune Benchmarks
This repository now bundles a light micro/synthetic benchmark suite to keep an eye on low-level optimizations. All cases are emitted on the fly by tools/perf/microbench.sh so they do not interfere with normal apps, but the generated Nyash code mirrors patterns we see in real programs.
Included cases
| Case | Notes |
|---|---|
| loop | Plain integer accumulation |
| strlen | String length in tight loop (nyrt_string_length path) |
| box | StringBox allocation/free |
| branch | Dense conditional tree (modulo & arithmetic) |
| call | Helper function dispatch (mix/twist) |
| stringchain | Substring concatenation + length accumulation |
| arraymap | ArrayBox + MapBox churn |
| chip8 | Simplified CHIP-8 style fetch/decode loop (derived from apps/chip8) |
| kilo | Text-buffer edits/search (inspired by enhanced_kilo_editor) |
| sieve | Integer-heavy Sieve of Eratosthenes (prime count) |
| matmul | NxN integer matrix multiply (3 nested loops) |
| linidx | Linear index array ops: idx=i*cols+j (CSE/hoist verification) |
| maplin | Integer-key Map get/set with linear keys (auto key heuristics) |
Each case has a matching C reference, so the script reports both absolute time and the Hakorune/C ratio. Scenario-style cases (chip8, kilo) still keep all logic deterministic and print-free to make timings stable.
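As a rough illustration of how to read the ratio column, the sketch below averages per-run timings and expresses the Hakorune average as a percentage of the C average. The helper names (avg_ms, ratio_pct) and the sample timings are hypothetical and are not taken from tools/perf/microbench.sh; the script's real implementation may differ.

```shell
# Hypothetical helpers for illustration only.

# avg_ms: print the arithmetic mean of its arguments (milliseconds).
avg_ms() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# ratio_pct: Hakorune time as a percentage of the C time (C = 100%).
ratio_pct() {
  awk -v h="$1" -v c="$2" 'BEGIN { printf "%.2f%%\n", h / c * 100 }'
}

# Made-up timings for three runs on each side.
hako_avg=$(avg_ms 30 31 29)   # 30.00
c_avg=$(avg_ms 20 21 19)      # 20.00
ratio_pct "$hako_avg" "$c_avg"   # prints 150.00%
```

A ratio below 100% means the Hakorune build finished faster than the C reference; above 100% means it was slower.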
Notes
- tools/perf/dump_mir.sh can optionally write the MIR(JSON) for a given .hako and print a block/op histogram. It tries the normal provider path first and falls back to the minimal jsonfrag version (while-form) when needed, so you can inspect both the structural skeleton and the full lowering.
- Current baseline observations (LLVM/EXE, NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1): call, stringchain, and kilo already beat the C reference (ratio < 100%), while branch, arraymap, and chip8 remain near ≈200%; they are targets for the upcoming hoisting/array-map hot-path work.
Latest fast-path measurements
The following numbers were recorded on 2025-11-12 with the opt-in work enabled:
export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1
tools/perf/microbench.sh --case <case> --backend llvm --exe --runs 3
Goal: bring the Hakorune EXE ratio to ≤125% of the reference C time (that is, roughly 80% of C's speed).
Current effort: keep adding new hoist/CSE patterns so arraymap, matmul, and maplin fall below that bar while sieve and the other cases stay stable.
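The 125% and 80% figures are two views of the same number: a time ratio of 125% means the program takes 1.25× as long, so it runs at 1/1.25 = 0.8 of C's speed. A minimal sketch of that conversion, using a hypothetical helper name (speed_vs_c):

```shell
# speed_vs_c: convert a time ratio in percent (C = 100%) into
# relative speed in percent. Hypothetical helper, for illustration only.
speed_vs_c() {
  awk -v r="$1" 'BEGIN { printf "%.0f%%\n", 100 * 100 / r }'
}

speed_vs_c 125   # prints 80%
speed_vs_c 200   # prints 50%
```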
| Case | EXE ratio (C=100%) | Notes |
|---|---|---|
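| branch | 75.00% | Target met (≤125%). |
| arraymap | 150.00% | Needs further work on the Array/Map hot path and binop CSE to reach ≤125%. |
| chip8 | 25.00% | Fast enough; FAST_INT and hoisting are paying off. |
| kilo | 0.21% (N=200,000) | In EXE mode the default N is auto-adjusted to 200k. The ratio is tiny because the C reference is the heavier configuration. |
| sieve | 200.00% | Measured with NYASH_VERIFY_RET_PURITY=1 on. The auto key heuristic is still conservative. |
| matmul | 300.00% | Still dominated by Array/Map get/set in the triple loop. Automatic CSE and the auto map key are due to be tightened. |
| linidx | 100.00% | Linear index case is at parity; hoist + CSE already helps share SSA. |
| maplin | 200.00% | The auto Map key heuristic (extended for linear/const keys) now selects _h more readily. There is still room to improve. |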
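To see at a glance which cases already meet the goal, the ratios in the table above can be checked against the 125% target. The following is a small sketch; the function name (check_targets) and the case:percent input format are assumptions made for this example, not output produced by microbench.sh.

```shell
# Target from this section: EXE ratio <= 125% of the C reference.
TARGET=125

# check_targets: read "case:percent" lines on stdin and label each one
# "ok" (at or under TARGET) or "over". Hypothetical helper.
check_targets() {
  awk -F: -v target="$TARGET" '{
    status = ($2 + 0 <= target + 0) ? "ok" : "over"
    printf "%-9s %7.2f%%  %s\n", $1, $2, status
  }'
}

# Ratios copied from the table above.
printf '%s\n' branch:75.00 arraymap:150.00 chip8:25.00 kilo:0.21 \
  sieve:200.00 matmul:300.00 linidx:100.00 maplin:200.00 | check_targets
```

With the numbers recorded here, branch, chip8, kilo, and linidx come out as "ok", while arraymap, sieve, matmul, and maplin are "over".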
Notes
- You can rerun any case with the command above; NYASH_LLVM_SKIP_BUILD=1 keeps repeated ny-llvmc builds cheap once the binaries are ready.
- For kilo, the C reference side is heavy and the default N=5,000,000 takes a long time, so in EXE mode with no N specified the default is auto-adjusted to 200,000 (tools/perf/microbench.sh). Override it with --n <value> if needed.
- lang/src/llvm_ir/boxes/aot_prep/README.md lists the StrlenFold / LoopHoist / ConstDedup / CollectionsHot passes and explains how to switch NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}. Going forward, the hoist/collections work will continue with the aim of bringing arraymap/matmul/maplin/sieve to within 125%.
- The representative measurements run with NYASH_VERIFY_RET_PURITY=1 enabled, so side effects immediately before a Return are detected fail-fast. Note that even very small handle/boxcall changes can make the results jump by 2–3×.
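Rerunning several cases in one go can be sketched as a loop around the command shown earlier. Only the environment variables and flags that appear in this README are used. The MICROBENCH variable and the run_cases function are hypothetical additions so the loop can be dry-run (for example with MICROBENCH=echo). Because NYASH_LLVM_SKIP_BUILD=1 is included, the binaries are assumed to be built already.

```shell
# MICROBENCH is a hypothetical override hook; by default it points at
# the script named in this README.
MICROBENCH=${MICROBENCH:-tools/perf/microbench.sh}

# run_cases: run each named case with the opt-in flags from this section.
# Stops at the first failing case.
run_cases() {
  for c in "$@"; do
    echo "== $c =="
    NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
    NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
    NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1 \
      "$MICROBENCH" --case "$c" --backend llvm --exe --runs 3 || return 1
  done
}

# Example (not executed here): the cases still above the 125% target.
# run_cases arraymap matmul maplin sieve
```

Setting the variables as a prefix on the command keeps them scoped to that single invocation, unlike export, which would leave them set in the calling shell.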
Running
Build hakorune in release mode first:
cargo build --release --bin hakorune
Then pick a case:
# LLVM EXE vs C (default n/runs are tuned per case)
tools/perf/microbench.sh --case chip8 --exe --n 200000 --runs 3
# VM vs C for a string-heavy micro
tools/perf/microbench.sh --case stringchain --backend vm --n 100000 --runs 3
The script takes care of generating temporary Nyash/C sources, building the C baseline, piping through the MIR builder (with FAST toggles enabled), and printing averages + ratios. Set NYASH_SKIP_TOML_ENV=1 / NYASH_DISABLE_PLUGINS=1 before calling the script if you want a fully clean execution environment.