hakorune/benchmarks/README.md
nyash-codex dda65b94b7 Phase 21.7 normalization: optimization pre-work + bench harness expansion
- Add opt-in optimizations (defaults OFF)
  - Ret purity verifier: NYASH_VERIFY_RET_PURITY=1
  - strlen FAST enhancement for const handles
  - FAST_INT gate for same-BB SSA optimization
  - length cache for string literals in llvmlite
- Expand bench harness (tools/perf/microbench.sh)
  - Add branch/call/stringchain/arraymap/chip8/kilo cases
  - Auto-calculate ratio vs C reference
  - Document in benchmarks/README.md
- Compiler health improvements
  - Unify PHI insertion to insert_phi_at_head()
  - Add NYASH_LLVM_SKIP_BUILD=1 for build reuse
- Runtime & safety enhancements
  - Clarify Rust/Hako ownership boundaries
  - Strengthen receiver localization (LocalSSA/pin/after-PHIs)
  - Stop excessive PluginInvoke→BoxCall rewrites
- Update CURRENT_TASK.md, docs, and canaries

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 16:40:58 +09:00

# Hakorune Benchmarks
This repository now bundles a light micro/synthetic benchmark suite to keep an eye on low-level optimizations. All cases are emitted on the fly by `tools/perf/microbench.sh` so they do not interfere with normal apps, but the generated Nyash code mirrors patterns we see in real programs.
## Included cases
| Case | Notes |
|-------------|-----------------------------------------------------------------------|
| `loop` | Plain integer accumulation |
| `strlen` | String length in tight loop (`nyrt_string_length` path) |
| `box` | StringBox allocation/free |
| `branch` | Dense conditional tree (modulo & arithmetic) |
| `call` | Helper function dispatch (mix/twist) |
| `stringchain`| Substring concatenation + length accumulation |
| `arraymap` | ArrayBox + MapBox churn |
| `chip8` | Simplified CHIP-8 style fetch/decode loop (derived from apps/chip8) |
| `kilo` | Text-buffer edits/search (inspired by enhanced_kilo_editor) |
| `sieve` | Integer-heavy Sieve of Eratosthenes (prime count) |
| `matmul` | NxN integer matrix multiply (3 nested loops) |
| `linidx` | Linear index array ops: idx=i*cols+j (CSE/hoist verification) |
| `maplin` | Integer-key Map get/set with linear keys (auto key heuristics) |
Each case has a matching C reference, so the script reports both absolute time and the Hakorune/C ratio. Scenario-style cases (`chip8`, `kilo`) still keep all logic deterministic and print-free to make timings stable.
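The reported ratio can be reproduced by hand from two averaged timings. A minimal sketch, assuming the ratio is the Hakorune time divided by the C time, expressed as a percentage with C fixed at 100%; `ratio_pct` is a hypothetical helper for illustration and is not part of `tools/perf/microbench.sh`:

```bash
# ratio_pct HAKO_MS C_MS: print the Hakorune time as a percentage of the C time.
# Hypothetical helper; the real script computes its ratio internally.
ratio_pct() {
  awk -v h="$1" -v c="$2" 'BEGIN {
    if (c <= 0) { print "n/a"; exit }   # guard against a zero C timing
    printf "%.2f\n", 100 * h / c
  }'
}

ratio_pct 30 20   # Hakorune 30 ms vs C 20 ms -> 150.00
```

A value below 100 means the Hakorune build finished faster than the C reference.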
## Notes
- `tools/perf/dump_mir.sh` can optionally write the MIR(JSON) for a given `.hako` and print a block/op histogram. It tries the normal provider path first and falls back to the minimal `jsonfrag` version (while-form) when needed, so you can inspect both the structural skeleton and the full lowering.
- Current baseline observations (LLVM/EXE, `NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1`): `call`, `stringchain`, and `kilo` already beat the C reference (ratio < 100%), while `branch`, `arraymap`, and `chip8` remain near 200%; these are the targets for the upcoming hoisting/array-map hot-path work.
## Latest fast-path measurements
The following numbers were recorded on 2025-11-12 with the opt-in work enabled:
```bash
export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1
tools/perf/microbench.sh --case <case> --backend llvm --exe --runs 3
```
Goal: bring the Hakorune EXE ratio to within 125% of the reference C time (roughly 80% of C's speed, since 100/125 = 0.8).
Current effort: keep baking new hoist/CSE patterns so `arraymap`, `matmul`, and `maplin` fall below that bar while `sieve` and the other cases stay stable.
| Case | EXE ratio (C=100%) | Notes |
|-----------|--------------------|-------|
| `branch` | 75.00% | Target met (≤125%). |
| `arraymap` | 150.00% | Keep polishing the Array/Map hot path and binop CSE to reach 125%. |
| `chip8` | 25.00% | Fast enough; FAST_INT/hoist are paying off. |
| `kilo` | 0.21% (N=200,000) | In EXE mode N is auto-adjusted to 200k by default; the C reference is the heavier configuration, so the ratio is tiny. |
| `sieve` | 200.00% | Measured with `NYASH_VERIFY_RET_PURITY=1` ON; auto key detection is still conservative. |
| `matmul` | 300.00% | Still dominated by Array/Map get/set in the triple loop; automatic CSE and auto map key are planned to be tightened. |
| `linidx` | 100.00% | Linear index case is at parity; hoist + CSE already helps share SSA. |
| `maplin` | 200.00% | Auto Map key detection (linear/const extension) now selects `_h` more readily; there is still room to tighten. |
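As a quick check of which cases still miss the 125% bar, the sketch below filters the table rows against a limit. The `results` list is copied from the table above; `over_target` is a hypothetical helper and not part of the repository.

```bash
# Values copied from the table above (EXE ratio, C = 100%).
results="branch 75.00
arraymap 150.00
chip8 25.00
kilo 0.21
sieve 200.00
matmul 300.00
linidx 100.00
maplin 200.00"

# over_target LIMIT: read "case ratio" pairs on stdin and
# print the names of the cases whose ratio exceeds LIMIT.
over_target() {
  awk -v limit="$1" '$2 + 0 > limit + 0 { print $1 }'
}

printf '%s\n' "$results" | over_target 125
```

With these numbers it prints `arraymap`, `sieve`, `matmul`, and `maplin`, one per line.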
### Notes
- You can rerun any case with the command above; `NYASH_LLVM_SKIP_BUILD=1` keeps repeated ny-llvmc builds cheap once the binaries are ready.
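- `kilo`: the C reference side is heavy, and the default N=5,000,000 makes runs take a long time, so in EXE mode with N unspecified the default N is automatically adjusted to 200,000 (`tools/perf/microbench.sh`). Override it with `--n <value>` if needed.
- `lang/src/llvm_ir/boxes/aot_prep/README.md` lists the StrlenFold / LoopHoist / ConstDedup / CollectionsHot passes and also explains how to switch `NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}`. Going forward, we will keep strengthening hoist/collections to bring arraymap/matmul/maplin/sieve down to within 125%.
- The representative measurements run with `NYASH_VERIFY_RET_PURITY=1` enabled, so side effects immediately before a Return are detected fail-fast. (Note that even very minor handle/boxcall changes can make the ratio jump to 23×.)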
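To rerun every case in the table with the same toggles, a loop along these lines may help. This is a sketch only: `run_case` and the `DRY_RUN` variable are hypothetical names introduced here, and the flags are the ones shown in the measurement command above. By default the loop only prints the commands.

```bash
# Opt-in toggles from the measurement command above.
export NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
       NYASH_LLVM_SKIP_BUILD=1 NYASH_LLVM_FAST=1 NYASH_LLVM_FAST_INT=1 \
       NYASH_MIR_LOOP_HOIST=1 NYASH_AOT_COLLECTIONS_HOT=1

# run_case CASE: print (DRY_RUN=1, the default) or execute (DRY_RUN=0)
# the microbench command for one case. Hypothetical helper.
run_case() {
  set -- tools/perf/microbench.sh --case "$1" --backend llvm --exe --runs 3
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

for c in branch arraymap chip8 kilo sieve matmul linidx maplin; do
  run_case "$c"
done
```

Run it from the repository root with `DRY_RUN=0` once the release build is in place.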
## Running
Build `hakorune` in release mode first:
```bash
cargo build --release --bin hakorune
```
Then pick a case:
```bash
# LLVM EXE vs C (default n/runs are tuned per case)
tools/perf/microbench.sh --case chip8 --exe --n 200000 --runs 3
# VM vs C for a string-heavy micro
tools/perf/microbench.sh --case stringchain --backend vm --n 100000 --runs 3
```
The script takes care of generating temporary Nyash/C sources, building the C baseline, piping through the MIR builder (with FAST toggles enabled), and printing averages + ratios. Set `NYASH_SKIP_TOML_ENV=1` / `NYASH_DISABLE_PLUGINS=1` before calling the script if you want a fully clean execution environment.