Files
hakorune/benchmarks/README.md
nyash-codex 376857a81f fix(perf): stabilize MIR emit for ny-llvmc/EXE benchmarks
Problem:
- Stage-B JSON extraction used fragile `awk '/^{/,/^}$/'`
- stdout noise caused empty JSON and bench failures
- arraymap/matmul/maplin --exe mode failed with "failed to emit MIR JSON"

Solution:
- Python3-based robust JSON extraction
  - Search for "kind":"Program" marker
  - Balance braces with quote/escape awareness
  - Resilient to stdout noise
- FORCE jsonfrag mode priority (HAKO_MIR_BUILDER_LOOP_FORCE_JSONFRAG=1)
  - Bypasses Stage-B entirely when set
  - Generates minimal while-form MIR with PHI nodes
- Multi-level fallback strategy
  - L1: Stage-B + selfhost/provider builder
  - L2: --emit-mir-json CLI direct path
  - L3: Minimal jsonfrag MIR generation
- cd $ROOT for Stage-B (fixes using resolution context)

Results:
-  arraymap --exe: ratio=200.00% (was failing)
-  matmul --exe: ratio=200.00% (was failing)
-  maplin --exe: ratio=100.00% (was failing)
-  Existing canaries: aot_prep_e2e_normalize_canary_vm.sh PASS
-  New canary: emit_mir_canary.sh PASS

Known Issues (workarounds applied):
- Stage-B compiler broken (using resolution: StringHelpers.skip_ws/2)
- --emit-mir-json CLI broken (undefined variable: local)
- Current jsonfrag mode bypasses both issues

Documentation:
- benchmarks/README.md: Added MIR emit stabilization notes
- ENV_VARS.md: Already documents HAKO_SELFHOST_BUILDER_FIRST, etc.

Next: Fix Stage-B using resolution to re-enable full optimization path

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 17:23:48 +09:00

6.3 KiB
Raw Blame History

Hakorune Benchmarks

This repository now bundles a light micro/synthetic benchmark suite to keep an eye on low-level optimizations. All cases are emitted on the fly by tools/perf/microbench.sh so they do not interfere with normal apps, but the generated Nyash code mirrors patterns we see in real programs.

Included cases

Case Notes
loop Plain integer accumulation
strlen String length in tight loop (nyrt_string_length path)
box StringBox allocation/free
branch Dense conditional tree (modulo & arithmetic)
call Helper function dispatch (mix/twist)
stringchain Substring concatenation + length accumulation
arraymap ArrayBox + MapBox churn
chip8 Simplified CHIP-8 style fetch/decode loop (derived from apps/chip8)
kilo Text-buffer edits/search (inspired by enhanced_kilo_editor)
sieve Integer-heavy Sieve of Eratosthenes (prime count)
matmul NxN integer matrix multiply (3 nested loops)
linidx Linear index array ops: idx=i*cols+j (CSE/hoist検証)
maplin Integer-key Map get/set with linear keys (auto key heuristics)

Each case has a matching C reference, so the script reports both absolute time and the Hakorune/C ratio. Scenario-style cases (chip8, kilo) still keep all logic deterministic and print-free to make timings stable.

Notes

  • tools/perf/dump_mir.sh can optionally write the MIR(JSON) for a given .hako and print a block/op histogram. It tries the normal provider path first and falls back to the minimal jsonfrag version (while-form) when needed, so you can inspect both the structural skeleton and the full lowering.
  • Current baseline observations (LLVM/EXE, NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1): call, stringchain, and kilo already beat the C reference (ratio < 100%), while branch, arraymap, and chip8 remain near ≈200%—they are targets for the upcoming hoisting/array-map hot-path work.

MIR emit stabilization (2025-11-13)

The --exe mode now uses a robust Python3-based JSON extraction in tools/hakorune_emit_mir.sh to handle stdout noise from Stage-B. When Stage-B is unavailable (using resolution issues), the script automatically falls back to:

  1. Direct --emit-mir-json CLI path
  2. Minimal jsonfrag MIR generation (FORCE mode)

This ensures that tools/perf/microbench.sh --exe always produces a ratio measurement, even when the full selfhost MIR builder path is unavailable. For production use, PERF_USE_PROVIDER=1 can force the provider path (with automatic jsonfrag fallback).

Latest fast-path measurements

The following numbers were recorded on 2025-11-12 with the opt-in work enabled:

export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
       NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
       NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1
tools/perf/microbench.sh --case <case> --backend llvm --exe --runs 3

Goal: bring the Hakorune EXE ratio to ≤125% of the reference C time (roughly C's 80%).
Current effort: keep baking new hoist/CSE patterns so arraymap, matmul, and maplin fall below that bar while sieve and the other cases stay stable.

Case EXE ratio (C=100%) Notes
branch 75.00% 目標達成≤125%)。
arraymap 150.00% Array/Map hot-path + binop CSE をさらに磨いて ≤125% を目指す。
chip8 25.00% 十分速い。FAST_INT/hoist が効いている。
kilo 0.21% (N=200,000) EXE モード既定で N を 200k に自動調整。C 参照の方が重い構成のため比率は極小。
sieve 200.00% NYASH_VERIFY_RET_PURITY=1 ON での測定。auto キー判定がまだ保守的。
matmul 300.00% まだ 3 重ループの Array/Map get/set が支配。自動 CSE と auto map key を詰める予定。
linidx 100.00% Linear index case is at parity; hoist + CSE already helps share SSA.
maplin 200.00% auto Map キー判定linear/const拡張で _h を選択しやすくなった。 さらに詰める余地あり。

Notes

  • You can rerun any case with the command above; NYASH_LLVM_SKIP_BUILD=1 keeps repeated ny-llvmc builds cheap once the binaries are ready.
  • kilo は C 参照側が重く既定 N=5,000,000 だと長時間化するため、EXE モードかつ N 未指定では既定 N を 200,000 に自動調整するようにしました(tools/perf/microbench.sh)。必要なら --n <value> で上書きしてください。
  • lang/src/llvm_ir/boxes/aot_prep/README.md に StrlenFold / LoopHoist / ConstDedup / CollectionsHot のパス一覧をまとめ、NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto} の切り替えも説明しています。今後は hoist/collections の強化で arraymap/matmul/maplin/sieve を 125% 以内に引き下げる施策を続けます。
  • 代表測定では NYASH_VERIFY_RET_PURITY=1 を有効化し、Return 直前の副作用を Fail-Fast で検出しながら回していますごく軽微なハンドル・boxcallの変化が 23× に跳ねることがある点をご留意ください)。

Running

Build hakorune in release mode first:

cargo build --release --bin hakorune

Then pick a case:

# LLVM EXE vs C (default n/runs are tuned per case)
tools/perf/microbench.sh --case chip8 --exe --n 200000 --runs 3

# VM vs C for a string-heavy micro
tools/perf/microbench.sh --case stringchain --backend vm --n 100000 --runs 3

The script takes care of generating temporary Nyash/C sources, building the C baseline, piping through the MIR builder (with FAST toggles enabled), and printing averages + ratios. Set NYASH_SKIP_TOML_ENV=1 / NYASH_DISABLE_PLUGINS=1 before calling the script if you want a fully clean execution environment.