# Hakorune Benchmarks

This repository now bundles a light micro/synthetic benchmark suite to keep an eye on low-level optimizations. All cases are emitted on the fly by `tools/perf/microbench.sh` so they do not interfere with normal apps, but the generated Nyash code mirrors patterns we see in real programs.

## Included cases

| Case        | Notes                                                                 |
|-------------|-----------------------------------------------------------------------|
| `loop`      | Plain integer accumulation                                            |
| `strlen`    | String length in tight loop (`nyrt_string_length` path)               |
| `box`       | StringBox allocation/free                                             |
| `branch`    | Dense conditional tree (modulo & arithmetic)                          |
| `call`      | Helper function dispatch (mix/twist)                                  |
| `stringchain`| Substring concatenation + length accumulation                        |
| `arraymap`  | ArrayBox + MapBox churn                                               |
| `chip8`     | Simplified CHIP-8 style fetch/decode loop (derived from apps/chip8)   |
| `kilo`      | Text-buffer edits/search (inspired by enhanced_kilo_editor)           |
| `sieve`     | Integer-heavy Sieve of Eratosthenes (prime count)                     |
| `matmul`    | NxN integer matrix multiply (3 nested loops)                          |
| `linidx`    | Linear index array ops: idx=i*cols+j (verifies CSE/hoist)             |
| `maplin`    | Integer-key Map get/set with linear keys (auto key heuristics)        |

Each case has a matching C reference, so the script reports both absolute time and the Hakorune/C ratio. Scenario-style cases (`chip8`, `kilo`) still keep all logic deterministic and print-free to make timings stable.
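
To make the ratio concrete, here is a minimal sketch of how a Hakorune/C percentage can be derived from averaged run times. The function name `ratio_percent` and the sample timings are illustrative assumptions; they are not taken from `tools/perf/microbench.sh`.

```python
from statistics import mean

def ratio_percent(hako_times_ms, c_times_ms):
    """Return the Hakorune/C time ratio as a percentage (C = 100%)."""
    c_avg = mean(c_times_ms)
    if c_avg <= 0:
        raise ValueError("C reference time must be positive")
    return mean(hako_times_ms) / c_avg * 100.0

# Hypothetical timings (ms) from three runs of each binary.
print(f"{ratio_percent([30, 31, 29], [20, 20, 20]):.2f}%")  # prints 150.00%
```

A value below 100% means the Hakorune build finished faster than the C reference; a value above 100% means it was slower.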

## Notes

- `tools/perf/dump_mir.sh` can optionally write the MIR(JSON) for a given `.hako` and print a block/op histogram. It tries the normal provider path first and falls back to the minimal `jsonfrag` version (while-form) when needed, so you can inspect both the structural skeleton and the full lowering.
- Current baseline observations (LLVM/EXE, `NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1`): `call`, `stringchain`, and `kilo` already beat the C reference (ratio < 100%), while `branch`, `arraymap`, and `chip8` remain near 200%; these three are the targets for the upcoming hoisting/array-map hot-path work.

### MIR emit stabilization (2025-11-13)

The `--exe` mode now uses a robust Python3-based JSON extraction in `tools/hakorune_emit_mir.sh` to handle stdout noise from Stage-B. When Stage-B is unavailable (for example, because of `using` resolution issues), the script automatically falls back, in order, to:
1. Direct `--emit-mir-json` CLI path
2. Minimal jsonfrag MIR generation (FORCE mode)

This ensures that `tools/perf/microbench.sh --exe` always produces a ratio measurement, even when the full selfhost MIR builder path is unavailable. For production use, `PERF_USE_PROVIDER=1` can force the provider path (with automatic jsonfrag fallback).
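
The two ideas above (pulling a JSON payload out of noisy stdout, then trying producers in order) can be sketched as follows. This is an illustration under stated assumptions, not the code in `tools/hakorune_emit_mir.sh`: the helper names `extract_json_object` and `first_successful`, the sample log lines, and the `"functions"` key are all hypothetical.

```python
import json

def extract_json_object(stdout_text):
    """Return the first top-level JSON object found in noisy text, or None."""
    decoder = json.JSONDecoder()
    pos = stdout_text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores the rest.
            obj, _end = decoder.raw_decode(stdout_text, pos)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            pos = stdout_text.find("{", pos + 1)
    return None

def first_successful(producers):
    """Try each (name, callable) in order; return the first non-None result."""
    for name, produce in producers:
        result = produce()
        if result is not None:
            return name, result
    raise RuntimeError("no MIR producer succeeded")

# Hypothetical noisy output: log lines around a JSON payload.
noisy = '[stage-b] warming up\n{"functions": []}\ntrailing log line\n'
print(extract_json_object(noisy))
```

Ordering the producers from most complete to most minimal means the caller always gets the richest MIR that is available, and the returned name tells you which path actually produced it.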

## Latest fast-path measurements

The following numbers were recorded on 2025-11-12 with the opt-in fast-path flags below enabled:

```bash
export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
       NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
       NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1
tools/perf/microbench.sh --case <case> --backend llvm --exe --runs 3
```

Goal: bring the Hakorune EXE ratio to ≤125% of the reference C time (equivalently, at least about 80% of C's speed).
Current effort: keep adding hoist/CSE patterns so that `arraymap`, `matmul`, and `maplin` fall below that bar while `sieve` and the other cases stay stable.

| Case      | EXE ratio (C=100%) | Notes |
|-----------|--------------------|-------|
| `branch`   | 75.00%             | Target met (≤125%). |
| `arraymap` | 150.00%            | Further refine the Array/Map hot path and binop CSE to reach ≤125%. |
| `chip8`    | 25.00%             | Fast enough; FAST_INT/hoist are paying off. |
| `kilo`     | 0.21% (N=200,000)  | The LLVM backend forces the EXE path and automatically adjusts the default N to 200k. The ratio is tiny because the C reference is the heavier configuration. |
| `sieve`    | 200.00%            | Measured with `NYASH_VERIFY_RET_PURITY=1` on. Auto key detection is still conservative. |
| `matmul`   | 300.00%            | Array/Map get/set inside the triple loop still dominates. Automatic CSE and auto map key selection are to be tightened. |
| `linidx`   | 100.00%            | Linear index case is at parity; hoist + CSE already help share SSA values. |
| `maplin`   | 200.00%            | Auto Map key detection (linear/const extension) now selects `_h` more readily. There is still room for further tuning. |
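
As a quick check against the ≤125% goal, the sketch below classifies the ratios from the table above and shows why a 125% time ratio corresponds to 80% of C's speed. The names `TARGET_PERCENT` and `relative_speed` are illustrative, not part of the benchmark tooling.

```python
TARGET_PERCENT = 125.0

# EXE ratios copied from the table above (C = 100%).
ratios = {
    "branch": 75.00, "arraymap": 150.00, "chip8": 25.00, "kilo": 0.21,
    "sieve": 200.00, "matmul": 300.00, "linidx": 100.00, "maplin": 200.00,
}

def relative_speed(ratio_percent):
    """Convert a time ratio (C = 100%) into speed relative to C, in percent."""
    return 100.0 * 100.0 / ratio_percent

over_target = sorted(name for name, r in ratios.items() if r > TARGET_PERCENT)
print(over_target)                     # ['arraymap', 'maplin', 'matmul', 'sieve']
print(relative_speed(TARGET_PERCENT))  # 80.0
```

The four cases above the bar are the same ones named as remaining optimization targets in the notes below.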

### Notes

- You can rerun any case with the command above; `NYASH_LLVM_SKIP_BUILD=1` keeps repeated ny-llvmc builds cheap once the binaries are ready.
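- The C reference for `kilo` is heavy, and the default N=5,000,000 makes runs take a long time, so with the LLVM backend it is always measured through the EXE path with a default of N=200,000 (`tools/perf/microbench.sh` automatically adjusts to the equivalent of `--exe` + `N=200000` when `--backend llvm` is given). Override it with `--n <value>` if needed.
- `lang/src/llvm_ir/boxes/aot_prep/README.md` lists the StrlenFold / LoopHoist / ConstDedup / CollectionsHot passes and also explains how to switch `NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}`. Going forward, we will keep strengthening hoist/collections to bring arraymap/matmul/maplin/sieve down to within 125%.
- Representative measurements are run with `NYASH_VERIFY_RET_PURITY=1` enabled, so that side effects immediately before a Return are detected fail-fast. Note that even very small changes in handle or boxcall behavior can make timings jump by 2-3×.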

## Running

Build `hakorune` in release mode first:

```bash
cargo build --release --bin hakorune
```

Then pick a case:

```bash
# LLVM EXE vs C (default n/runs are tuned per case)
tools/perf/microbench.sh --case chip8 --exe --n 200000 --runs 3

# VM vs C for a string-heavy micro
tools/perf/microbench.sh --case stringchain --backend vm --n 100000 --runs 3
```

The script takes care of generating temporary Nyash/C sources, building the C baseline, piping through the MIR builder (with FAST toggles enabled), and printing averages + ratios. Set `NYASH_SKIP_TOML_ENV=1` / `NYASH_DISABLE_PLUGINS=1` before calling the script if you want a fully clean execution environment.
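
If you want to sweep several cases in one go, a small driver along these lines may help. It is a sketch under assumptions: the helper names `build_command`, `clean_env`, and `run_case` are hypothetical, and it assumes you run it from the repository root. The flags and environment variables are the ones documented in this README.

```python
import os
import subprocess

# Cases from the fast-path measurement table.
CASES = ["branch", "arraymap", "chip8", "kilo", "sieve", "matmul", "linidx", "maplin"]

def build_command(case, runs=3):
    """Build the microbench invocation for one case (LLVM EXE vs C)."""
    return ["tools/perf/microbench.sh", "--case", case,
            "--backend", "llvm", "--exe", "--runs", str(runs)]

def clean_env(base):
    """Return a copy of base with the clean-environment toggles set."""
    env = dict(base)
    env["NYASH_SKIP_TOML_ENV"] = "1"
    env["NYASH_DISABLE_PLUGINS"] = "1"
    return env

def run_case(case, runs=3):
    """Run one case; raises CalledProcessError if the script exits non-zero."""
    return subprocess.run(build_command(case, runs),
                          env=clean_env(os.environ), check=True)

# Show the command that would be run for one case, without executing it.
print(" ".join(build_command("chip8")))
```

Looping `run_case(case)` over `CASES` after the release build then reproduces the whole table in one pass.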