feat(perf): add Phase 21.8 foundation for IntArrayCore/MatI64 numeric boxes

Prepare infrastructure for specialized numeric array benchmarking:
- Add IntArrayCore plugin stub (crates/nyash_kernel/src/plugin/intarray.rs)
- Add IntArrayCore/MatI64 box definitions (lang/src/runtime/numeric/)
- Add Phase 21.8 documentation and task tracking
- Update nyash.toml/hako.toml with numeric library configuration
- Extend microbench.sh for matmul_core benchmark case

Next: Resolve Stage-B MirBuilder to recognize MatI64/IntArrayCore as boxes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
nyash-codex
2025-11-14 15:18:14 +09:00
parent f1fa182a4b
commit 8214176814
11 changed files with 549 additions and 3 deletions


@ -4,7 +4,7 @@
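- Perform preprocessing optimizations (structural only) on the .hako side (AotPrep) so that the IR handed to LLVM/AOT stays lightweight.
- Default behaviour is unchanged (opt-in). Safety is guaranteed by the Return-purification guard.
Checklist
Checklist (where things landed as of 21.5)
- [x] Pass split (StrlenFold / LoopHoist / ConstDedup / CollectionsHot / BinopCSE)
- [x] Introduce CollectionsHot (Array/Map), default OFF
- [x] Map key mode `NYASH_AOT_MAP_KEY_MODE={h|i64|hh|auto}`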
@ -17,6 +17,11 @@
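- [ ] Idempotence (a "replaced" tag keeps the output unchanged on re-runs)
- [ ] `arraymap`/`matmul` ≤ 125% (relative to C)
Notes (21.5 closing)
- linidx/maplin and the other "linear-index Array/Map" cases reached roughly C≒100% with CollectionsHot + hoist/CSE.
- arraymap: the Array/Map portion has largely moved to externcall, but string-key generation (toString/`\"k\"+idx`) and the hash path dominate the runtime, so the phase closed with the benchmark resting on fundamentally different premises from C's plain int[].
- matmul: CollectionsHot is effective on its own, but the matrix product itself is ArrayBox-based, and without Core numeric boxes it fell short of the 80% target. This will be handled by introducing "Core numeric boxes + matrix boxes" in 21.6 and later.
Toggles
- `NYASH_MIR_LOOP_HOIST=1` … enables StrlenFold/LoopHoist/ConstDedup/BinopCSE
- `NYASH_AOT_COLLECTIONS_HOT=1` … enables CollectionsHot (Array/Map)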


@ -0,0 +1,90 @@
# Phase 21.6 — Core Numeric Boxes (Draft)
Status: proposal (to refine at 21.6 kickoff)
## Goal
Provide explicit, low-level numeric boxes that:
- Give Nyash a “fair” core for int/f64 benchmarks against C.
- Stay compatible with the existing ArrayBox API (no breaking changes).
- Can be used both explicitly in `.hako` and (later) as conservative AotPrep targets.
This phase focuses on design + minimal implementation; aggressive auto-rewrites stay behind opt-in flags.
## Scope (21.6)
- Design and add **IntArrayCore** numeric core (NyRT + Hako wrapper):
- NyRT: `IntArrayCore` box (Rust) with internal layout `Vec<i64>` (contiguous, row-major semantics).
- Hako: `IntArrayCoreBox` in `nyash.core.numeric.intarray`, wrapping NyRT via externcall:
- `static new(len: i64) -> IntArrayCoreBox` → `nyash.intarray.new_h`
- `length(self) -> i64` → `nyash.intarray.len_h`
- `get_unchecked(self, idx: i64) -> i64` → `nyash.intarray.get_hi`
- `set_unchecked(self, idx: i64, v: i64)` → `nyash.intarray.set_hii`
- Semantics: i64-only, fixed length (no structural changes). Bounds checking is confined to the NyRT side (fail-fast); the Hako side stays a thin wrapper dedicated to numeric kernels.
- Design and add **MatI64** (matrix box) on top of IntArrayCore:
- Internal layout: `rows: i64`, `cols: i64`, `stride: i64`, `core: IntArrayCoreBox`.
- Minimal API:
- `new(rows: i64, cols: i64) -> MatI64`
- `rows(self) -> i64`, `cols(self) -> i64`
- `at(self, r: i64, c: i64) -> i64`
- `set(self, r: i64, c: i64, v: i64)`
- Provide one reference implementation:
- `MatOps.matmul_naive(a: MatI64, b: MatI64) -> MatI64` (O(n³), clear structure, not tuned).
- Bench alignment:
- Add `matmul_core` benchmark:
- Nyash: MatI64 + IntArrayCore implementation.
- C: struct `{ int64_t *ptr; int64_t rows; int64_t cols; int64_t stride; }` + helper `get/set`.
- Keep existing `matmul` (ArrayBox vs raw `int*`) as “language-level” benchmark.
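The layout and minimal API listed above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the NyRT or `.hako` implementation: indices use `usize` instead of `i64`, the free function `matmul_naive` stands in for `MatOps.matmul_naive`, and `stride` is assumed equal to `cols` for a freshly created matrix.

```rust
/// Fixed-length i64 buffer, modelled on the IntArrayCore description
/// (contiguous Vec<i64>, no structural changes after creation).
struct IntArrayCore {
    data: Vec<i64>,
}

impl IntArrayCore {
    fn new(len: usize) -> Self {
        IntArrayCore { data: vec![0; len] }
    }
    fn len(&self) -> usize {
        self.data.len()
    }
    /// Fail-fast: slice indexing panics if idx is outside the buffer.
    fn get(&self, idx: usize) -> i64 {
        self.data[idx]
    }
    fn set(&mut self, idx: usize, v: i64) {
        self.data[idx] = v;
    }
}

/// Row-major matrix that owns its core; element (r, c) lives at r * stride + c.
struct MatI64 {
    rows: usize,
    cols: usize,
    stride: usize, // assumption: equals cols for a dense matrix
    core: IntArrayCore,
}

impl MatI64 {
    fn new(rows: usize, cols: usize) -> Self {
        MatI64 { rows, cols, stride: cols, core: IntArrayCore::new(rows * cols) }
    }
    fn at(&self, r: usize, c: usize) -> i64 {
        self.core.get(r * self.stride + c)
    }
    fn set(&mut self, r: usize, c: usize, v: i64) {
        self.core.set(r * self.stride + c, v);
    }
}

/// Naive O(n^3) product: three nested loops, no blocking or SIMD.
fn matmul_naive(a: &MatI64, b: &MatI64) -> MatI64 {
    assert_eq!(a.cols, b.rows, "inner dimensions must match");
    let mut out = MatI64::new(a.rows, b.cols);
    for i in 0..a.rows {
        for j in 0..b.cols {
            let mut acc: i64 = 0;
            for k in 0..a.cols {
                acc += a.at(i, k) * b.at(k, j);
            }
            out.set(i, j, acc);
        }
    }
    out
}
```

For example, multiplying the 2×2 matrices `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]` with `matmul_naive` yields `[[19, 22], [43, 50]]`. A C counterpart for `matmul_core` would use the same index expression `r * stride + c` over its `int64_t *ptr`.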
Out of scope for 21.6:
- Auto-rewrite from `ArrayBox` → `IntArrayCore` / `MatI64` in AotPrep (only sketched, not default).
- SIMD / blocked matmul / cache-tuned kernels (can be separate optimization phases).
- f64/complex variants (only type skeletons, if any).
## Design Notes
- **Layering**
- Core: IntArrayCore (and future F64ArrayCore) are “muscle” boxes: minimal, numeric-only. They are exposed as IntArrayCore (Rust) in NyRT and as IntArrayCoreBox in Hako.
- Matrix: MatI64 expresses 2D shape and indexing; it owns an IntArrayCoreBox.
- High-level: ArrayBox / MapBox / existing user APIs remain unchanged.
- **Hako ABI vs Nyash implementation**
- IntArrayCore lives as a NyRT box (C/Rust implementation) exposed via Hako ABI (`nyash.intarray.*`).
- IntArrayCoreBox, MatI64 and MatOps are written in Nyash, calling IntArrayCore via externcall while exposing boxcall APIs to user code.
- This keeps heavy lifting in NyRT while keeping the 2D semantics in `.hako`.
- **Fair C comparison**
- For `matmul_core`, C should mirror IntArrayCore/MatI64:
- Same struct layout (ptr + len / rows + cols + stride).
- Same naive O(n³) algorithm.
- This separates:
- “Nyash vs C as languages” → existing `matmul` (ArrayBox vs `int*`).
- “Core numeric kernel parity” → new `matmul_core` (IntArrayCore vs equivalent C).
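The "NyRT box exposed via handles" layering can be illustrated with a small Rust sketch. Everything here is an assumption made for illustration: the global registry, the `Handle = i64` alias, and the Rust function names are hypothetical, and the reading of the `_h` / `_hi` / `_hii` suffixes as argument shapes (handle, handle + i64, handle + i64 + i64) is inferred from the extern names rather than confirmed by the source.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock, RwLock};

/// Hypothetical handle type: callers only ever see an integer.
type Handle = i64;

struct Registry {
    next: Handle,
    boxes: HashMap<Handle, Arc<RwLock<Vec<i64>>>>,
}

/// Process-wide handle table (illustrative; not the real NyRT registry).
fn registry() -> &'static Mutex<Registry> {
    static REG: OnceLock<Mutex<Registry>> = OnceLock::new();
    REG.get_or_init(|| Mutex::new(Registry { next: 1, boxes: HashMap::new() }))
}

/// Resolve a handle to its buffer, failing fast on an unknown handle.
fn lookup(h: Handle) -> Arc<RwLock<Vec<i64>>> {
    let reg = registry().lock().unwrap();
    reg.boxes.get(&h).cloned().expect("invalid IntArrayCore handle")
}

/// Stand-in for `nyash.intarray.new_h`: allocate a zeroed buffer, return its handle.
fn intarray_new_h(len: i64) -> Handle {
    assert!(len >= 0, "length must be non-negative");
    let mut reg = registry().lock().unwrap();
    let h = reg.next;
    reg.next += 1;
    reg.boxes.insert(h, Arc::new(RwLock::new(vec![0; len as usize])));
    h
}

/// Stand-in for `nyash.intarray.len_h`.
fn intarray_len_h(h: Handle) -> i64 {
    let arr = lookup(h);
    let n = arr.read().unwrap().len() as i64;
    n
}

/// Stand-in for `nyash.intarray.get_hi`: bounds are checked on this side only.
fn intarray_get_hi(h: Handle, idx: i64) -> i64 {
    let arr = lookup(h);
    let data = arr.read().unwrap();
    assert!(idx >= 0 && (idx as usize) < data.len(), "index out of bounds");
    data[idx as usize]
}

/// Stand-in for `nyash.intarray.set_hii`.
fn intarray_set_hii(h: Handle, idx: i64, v: i64) {
    let arr = lookup(h);
    let mut data = arr.write().unwrap();
    assert!(idx >= 0 && (idx as usize) < data.len(), "index out of bounds");
    data[idx as usize] = v;
}
```

With this split, a wrapper such as IntArrayCoreBox only needs to store the handle and forward calls, which is what keeps the `.hako` side thin.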
## AotPrep / Future Work (21.6+)
Not for default in 21.6, but to keep in mind:
- Add conservative patterns in Collections/AotPrep to detect:
- `ArrayBox<i64>` with:
- Fixed length.
- No structural mutations after initialization.
- Access patterns of the form `base + i*cols + j` (or similar linear forms).
- Allow opt-in rewrite from such patterns to IntArrayCore/MatI64 calls.
- Keep all auto-rewrites:
- Behind env toggles (e.g. `NYASH_AOT_INTARRAY_CORE=1`).
- Semantics-preserving by construction; fall back to ArrayBox path when unsure.
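The "opt-in and conservative" rule above amounts to a guard that rewrites only when the toggle is set and every precondition is proven. The Rust sketch below shows the shape of such a guard; the `ArrayFacts` struct and both function names are hypothetical and do not describe the actual AotPrep pass, which is written in `.hako`.

```rust
use std::env;

/// Hypothetical summary of what an analysis proved about one ArrayBox value.
/// A field is true only when the property was proven; "unknown" is recorded
/// as false, except for the mutation flag, where "unknown" must be true.
struct ArrayFacts {
    elements_all_i64: bool,
    fixed_length: bool,
    mutated_structurally_after_init: bool,
    all_accesses_linear: bool, // e.g. every index has the form base + i*cols + j
}

/// Reads an env toggle; only the exact value "1" enables it.
fn toggle_enabled(name: &str) -> bool {
    env::var(name).map(|v| v == "1").unwrap_or(false)
}

/// Conservative decision: any unproven precondition keeps the ArrayBox path.
fn may_rewrite_to_core(facts: &ArrayFacts, enabled: bool) -> bool {
    enabled
        && facts.elements_all_i64
        && facts.fixed_length
        && !facts.mutated_structurally_after_init
        && facts.all_accesses_linear
}
```

A caller would pass `toggle_enabled("NYASH_AOT_INTARRAY_CORE")` as `enabled`, so with the variable unset the function always returns `false` and default behaviour is unchanged.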
## Open Questions for 21.6 Kickoff
- Exact module names:
- `nyash.core.intarray` / `nyash.core.matrix` vs `nyash.linalg.*`.
- Bounds checking policy for IntArrayCore:
- Always on (fail-fast) vs dev toggle for light checks in hot loops.
- Interop:
- Whether MatI64 should expose its IntArrayCore (e.g. `as_core_row_major()`) for advanced users.


@ -0,0 +1,85 @@
# Phase 21.8 — Numeric Core Integration & Builder Support
Status: proposal (to hand off to Claude Code)
## Goal
Integrate the new numeric core boxes (IntArrayCore + MatI64) into the Hakorune self-host chain so that:
- Stage-B → MirBuilder → ny-llvmc (crate) can emit MIR(JSON) and EXE for code that uses:
- `using nyash.core.numeric.intarray as IntArrayCore`
- `using nyash.core.numeric.matrix_i64 as MatI64`
- The `matmul_core` microbench (MatI64 + IntArrayCore) runs end-to-end in EXE mode and can be compared fairly against a matching C implementation.
21.6 provides the core boxes; 21.8 focuses on wiring them into the builder/runtime chain without changing default behaviour for other code.
## Scope (21.8, this host)
- Stage-B / MirBuilder:
- Ensure `MatI64` and `IntArrayCore` are recognized as valid boxes when referenced via:
- `using nyash.core.numeric.matrix_i64 as MatI64`
- `using nyash.core.numeric.intarray as IntArrayCore`
- Fix the current provider-emit failure:
- Error today: `[mirbuilder/parse/error] undefined variable: MatI64` during `env.mirbuilder.emit`.
- Diagnose and adjust Stage-B / MirBuilder so that static box references (`MatI64.new`, `A.mul_naive`) compile in the same way as other boxes.
- AotPrep / emit pipeline:
- Keep AotPrep unchanged for now; the goal is to make `tools/hakorune_emit_mir.sh` succeed on `matmul_core` sources without special-casing.
- Ensure `tools/hakorune_emit_mir.sh` with:
- `HAKO_APPLY_AOT_PREP=1 NYASH_AOT_COLLECTIONS_HOT=1 NYASH_LLVM_FAST=1 NYASH_MIR_LOOP_HOIST=1`
- can emit valid MIR(JSON) for MatI64/IntArrayCore code.
- Microbench integration:
- Finish wiring `matmul_core` in `tools/perf/microbench.sh`:
- Hako side: MatI64/IntArrayCore based O(n³) matmul (`MatI64.mul_naive`).
- C side: `MatI64Core { int64_t *ptr; rows; cols; stride; }` with identical algorithm.
- Accept that performance may still be far from the 80% target; 21.8 focuses on **structural integration and parity**, not tuning.
Out of scope:
- New optimizations inside AotPrep / CollectionsHot.
- SIMD/blocked matmul kernels (to be handled in a later optimization phase).
- f64/complex matrix variants.
## Tasks for implementation (Claude Code)
1) **Fix MatI64 visibility in Stage-B / MirBuilder**
- Reproduce the current failure:
- Use a small `.hako` like:
- `using nyash.core.numeric.matrix_i64 as MatI64`
- `static box Main { method main(args) { local n = 4; local A = MatI64.new(n,n); return A.at(0,0); } }`
- Confirm `env.mirbuilder.emit` reports `undefined variable: MatI64`.
- Investigate how modules from `nyash.toml` (`"nyash.core.numeric.matrix_i64" = "lang/src/runtime/numeric/mat_i64_box.hako"`) are made visible to Stage-B and MirBuilder.
- Adjust the resolver / module prelude so that `MatI64` (and `IntArrayCore`) are treated like other core boxes:
- Either via explicit prelude inclusion,
- Or via module registry entries consumed by the builder.
2) **Ensure `tools/hakorune_emit_mir.sh` can emit MIR(JSON) for matmul_core**
- Once MatI64 is visible, run:
- `HAKO_APPLY_AOT_PREP=1 NYASH_AOT_COLLECTIONS_HOT=1 NYASH_LLVM_FAST=1 NYASH_MIR_LOOP_HOIST=1 NYASH_JSON_ONLY=1 tools/hakorune_emit_mir.sh <matmul_core.hako> tmp/matmul_core.json`
- Acceptance:
- No `undefined variable: MatI64` / `IntArrayCore` errors.
- `tmp/matmul_core.json` is valid MIR(JSON) (same schema as existing matmul case).
3) **Finish `matmul_core` microbench**
- Use the existing skeleton in `tools/perf/microbench.sh` (`case matmul_core`):
- Confirm Hako side compiles and runs under `--backend vm`.
- Confirm EXE path works:
- `NYASH_SKIP_TOML_ENV=1 NYASH_LLVM_SKIP_BUILD=1 tools/perf/microbench.sh --case matmul_core --backend llvm --exe --runs 1 --n 64`
- Update `benchmarks/README.md`:
- Add `matmul_core` row with a short description:
- “MatI64/IntArrayCore vs MatI64Core C struct (ptr+rows+cols+stride)”
- Record initial ratios (even if far from 80%).
4) **Keep existing behaviour stable**
- No changes to default user behaviour, env toggles, or existing benches beyond adding `matmul_core`.
- Ensure quick/profile smokes (where applicable) remain green with numeric core present.
## Notes
- 21.6 already introduced:
- NyRT `IntArrayCore` (`Vec<i64>` + `RwLock`) and handle-based externs (`nyash.intarray.*`).
- Hako wrappers `IntArrayCore` and `MatI64` in `lang/src/runtime/numeric/`.
- `nyash.toml` module aliases for `nyash.core.numeric.intarray` and `nyash.core.numeric.matrix_i64`.
- 21.8 is about wiring these into the builder/emit chain so that Hakorune can compile and benchmark numeric core code end-to-end.