# Hakorune Benchmarks
This repository now bundles a light micro/synthetic benchmark suite to keep an eye on low-level optimizations. All cases are emitted on the fly by `tools/perf/microbench.sh` so they do not interfere with normal apps, but the generated Nyash code mirrors patterns we see in real programs.
## Included cases
| Case | Notes |
|-------------|-----------------------------------------------------------------------|
| `loop` | Plain integer accumulation |
| `strlen` | String length in tight loop (`nyrt_string_length` path) |
| `box` | StringBox allocation/free |
| `branch` | Dense conditional tree (modulo & arithmetic) |
| `call` | Helper function dispatch (mix/twist) |
| `stringchain`| Substring concatenation + length accumulation |
| `arraymap` | ArrayBox + MapBox churn |
| `chip8` | Simplified CHIP-8 style fetch/decode loop (derived from apps/chip8) |
| `kilo` | Text-buffer edits/search (inspired by enhanced_kilo_editor) |
| `sieve` | Integer-heavy Sieve of Eratosthenes (prime count) |
| `matmul` | NxN integer matrix multiply (3 nested loops) |
| `linidx` | Linear index array ops: idx=i*cols+j (CSE/hoist verification) |
| `maplin` | Integer-key Map get/set with linear keys (auto key heuristics) |

Each case has a matching C reference, so the script reports both absolute time and the Hakorune/C ratio. Scenario-style cases (`chip8`, `kilo`) still keep all logic deterministic and print-free to make timings stable.
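
As a rough illustration of how the Hakorune/C ratio reads, here is a minimal sketch. `ratio_pct` is a hypothetical helper that is not part of `tools/perf/microbench.sh`, and the timings are made up.

```bash
# ratio_pct HAKO_TIME C_TIME
# Prints the Hakorune/C ratio as a percentage with two decimals (C = 100%).
# Hypothetical helper for illustration only; the real script may compute
# and format its ratio differently.
ratio_pct() {
  awk -v h="$1" -v c="$2" 'BEGIN {
    if (c <= 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.2f\n", (h / c) * 100
  }'
}

# Made-up timings in milliseconds.
ratio_pct 30 40    # 75.00: Hakorune is faster than the C reference
ratio_pct 60 40    # 150.00: Hakorune is slower than the C reference
```

A ratio below 100% means the Hakorune build finished faster than the C reference for that case.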
## Notes
- `tools/perf/dump_mir.sh` can optionally write the MIR(JSON) for a given `.hako` and print a block/op histogram. It tries the normal provider path first and falls back to the minimal `jsonfrag` version (while-form) when needed, so you can inspect both the structural skeleton and the full lowering.
- Current baseline observations (LLVM/EXE, `NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1`): `call`, `stringchain`, and `kilo` already beat the C reference (ratio < 100%), while `branch`, `arraymap`, and `chip8` remain near 200%; these three are the targets for the upcoming hoisting/array-map hot-path work.
### MIR emit stabilization (2025-11-13)
The `--exe` mode now uses a robust Python3-based JSON extraction in `tools/hakorune_emit_mir.sh` to handle stdout noise from Stage-B. When Stage-B is unavailable (for example, because of `using` resolution issues), the script automatically falls back to:
1. Direct `--emit-mir-json` CLI path
2. Minimal jsonfrag MIR generation (FORCE mode)

This ensures that `tools/perf/microbench.sh --exe` always produces a ratio measurement, even when the full selfhost MIR builder path is unavailable. For production use, `PERF_USE_PROVIDER=1` can force the provider path (with automatic jsonfrag fallback).
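
The fallback order described above can be sketched as a "first strategy that succeeds wins" loop. This is only an illustration: the three `emit_via_*` functions are hypothetical stand-ins, the payload is a placeholder rather than real MIR, and the actual logic lives in `tools/hakorune_emit_mir.sh`.

```bash
# Hypothetical stand-ins for the three emit strategies (not the real code).
emit_via_stage_b()  { return 1; }                     # simulate Stage-B being unavailable
emit_via_cli()      { return 1; }                     # simulate the direct CLI path failing
emit_via_jsonfrag() { echo '{"placeholder":true}'; }  # placeholder payload, not real MIR

# Try each strategy in order and print the output of the first one that succeeds.
emit_mir_with_fallback() {
  local strategy out
  for strategy in emit_via_stage_b emit_via_cli emit_via_jsonfrag; do
    if out="$("$strategy" "$@")"; then
      echo "[emit] used $strategy" >&2   # diagnostics go to stderr, JSON to stdout
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "[emit] all strategies failed" >&2
  return 1
}

emit_mir_with_fallback example.hako
```

Keeping diagnostics on stderr leaves stdout clean for whatever consumes the JSON, which is the same concern the Python3-based extraction addresses for Stage-B noise.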
## Latest fast-path measurements
The following numbers were recorded on 2025-11-12 with the opt-in work enabled:
```bash
export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1
tools/perf/microbench.sh --case <case> --backend llvm --exe --runs 3
```
Goal: bring the Hakorune EXE ratio down to at most 125% of the reference C time (roughly 80% of C's speed).

Current effort: keep adding new hoist/CSE patterns so `arraymap`, `matmul`, and `maplin` fall below that bar while `sieve` and the other cases stay stable.

| Case | EXE ratio (C=100%) | Notes |
|-----------|--------------------|-------|
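| `branch` | 75.00% | Target met (≤125%). |
| `arraymap` | 150.00% | Polish the Array/Map hot path + binop CSE further, aiming for 125%. |
| `chip8` | 25.00% | Fast enough; FAST_INT/hoist are taking effect. |
| `kilo` | 0.21% (N=200,000) | On the LLVM backend the EXE path is forced and the default N is auto-adjusted to 200k. The ratio is tiny because the C reference is the heavier configuration. |
| `sieve` | 200.00% | Measured with `NYASH_VERIFY_RET_PURITY=1` on; auto key detection is still conservative. |
| `matmul` | 300.00% | Still dominated by Array/Map get/set in the triple loop; the plan is to tighten automatic CSE and auto map key selection. |
| `linidx` | 100.00% | Linear index case is at parity; hoist + CSE already helps share SSA. |
| `maplin` | 200.00% | Auto Map key detection (linear/const extension) now selects `_h` more readily; there is still room to tighten. |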
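
To see at a glance which cases still miss the 125% goal, the ratios in the table can be filtered with a few lines of shell. This is a minimal sketch; `over_goal` is a hypothetical helper, and the numbers are copied by hand from the table rather than read from the benchmark script.

```bash
# Ratios copied from the table above (C = 100%).
results='branch 75.00
arraymap 150.00
chip8 25.00
kilo 0.21
sieve 200.00
matmul 300.00
linidx 100.00
maplin 200.00'

# over_goal LIMIT
# Reads "case ratio" pairs on stdin and prints the cases whose ratio exceeds LIMIT.
over_goal() {
  awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

printf '%s\n' "$results" | over_goal 125
# prints arraymap, sieve, matmul, maplin (one per line)
```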
### Notes
- You can rerun any case with the command above; `NYASH_LLVM_SKIP_BUILD=1` keeps repeated ny-llvmc builds cheap once the binaries are ready.
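- For `kilo`, the C reference side is heavy and the default N=5,000,000 makes runs take a long time, so the LLVM backend now always measures via the EXE path with a default of N=200,000 (`tools/perf/microbench.sh` automatically adjusts to the equivalent of `--exe` + `N=200000` when `--backend llvm` is given). Override with `--n <value>` if needed.
- `lang/src/llvm_ir/boxes/aot_prep/README.md` lists the StrlenFold / LoopHoist / ConstDedup / CollectionsHot passes and explains how to switch `NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}`. Going forward, we will keep strengthening hoist/collections to bring arraymap/matmul/maplin/sieve to within 125%.
- The representative measurements run with `NYASH_VERIFY_RET_PURITY=1` enabled, which detects side effects immediately before Return in Fail-Fast fashion. Note that even very slight handle/boxcall changes can make the ratio jump to 23×.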
## Running
Build `hakorune` in release mode first:
```bash
cargo build --release --bin hakorune
```
Then pick a case:
```bash
# LLVM EXE vs C (default n/runs are tuned per case)
tools/perf/microbench.sh --case chip8 --exe --n 200000 --runs 3
# VM vs C for a string-heavy micro
tools/perf/microbench.sh --case stringchain --backend vm --n 100000 --runs 3
```
The script takes care of generating temporary Nyash/C sources, building the C baseline, piping through the MIR builder (with FAST toggles enabled), and printing averages + ratios. Set `NYASH_SKIP_TOML_ENV=1` / `NYASH_DISABLE_PLUGINS=1` before calling the script if you want a fully clean execution environment.
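
If you run several cases in a clean environment, a small wrapper can save retyping the variables. The sketch below is an assumption-laden convenience: `run_clean` and `DRY_RUN` are hypothetical names, and only flags already shown in this README are passed through. It prints the command by default instead of executing it.

```bash
# run_clean CASE [extra microbench.sh args...]
# Hypothetical wrapper: prints the command by default (DRY_RUN=1);
# set DRY_RUN=0 to actually run tools/perf/microbench.sh.
run_clean() {
  case_name="$1"; shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 tools/perf/microbench.sh --case $case_name $*"
  else
    NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
      tools/perf/microbench.sh --case "$case_name" "$@"
  fi
}

# Dry-run three cases with the LLVM EXE settings used earlier in this README.
for c in chip8 kilo sieve; do
  run_clean "$c" --backend llvm --exe --runs 3
done
```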