json(vm): fix birth dispatch; unify constructor naming (Box.birth/N); JsonNode factories return JsonNodeInstance; quick: enable heavy JSON with probe; builder: NYASH_BUILDER_DEBUG_LIMIT guard; json_query_min(core) harness; docs/tasks updated
@ -1,89 +0,0 @@
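# MIR14 Core‑14 Specification (Draft)

This memo summarizes the finalized specification of the MIR14 (Core‑14) instruction set and the deprecation policy for legacy instructions. Implementations assume "Core‑14 on by default, forbid‑legacy".

## 1. Core‑14 Instruction List (minimal and practical)

| Category | Instruction | Role (key points) |
|----------|-------------|-------------------|
| Value | Const | Creates constants such as immediates and addresses (no side effects) |
| Arithmetic | BinOp | Add/sub/mul/div and bitwise operations (pure) |
| Arithmetic | UnaryOp | Unary operations (negation, NOT, etc.); restored for practicality |
| Arithmetic | Compare | Comparison (pure) |
| Control | Jump | Unconditional transfer (terminator) |
| Control | Branch | Conditional transfer (terminator) |
| Control | Return | Function return (terminator) |
| Shape | Phi | SSA merge (pure) |
| Call | Call | Direct/indirect call (user functions) |
| Call | BoxCall | Message call on a Box (unifies array/field/method access) |
| Call | ExternCall | Call into the runtime/plugins (FFI boundary) |
| Type | TypeOp | Type check and conversion (unifies type-related operations) |
| Execution | Safepoint | Safepoint (GC/interrupt cooperation) |
| Execution | Barrier | Minimal representation of write/read barriers |

Notes:
- Branch/Jump/Return are terminators. Phi appears only at control-flow merge points.
- BoxCall takes free variadic arguments (receiver + method/operation name + args...) as its standard form; BoxCallWith is removed.
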
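## 2. IR Invariants

- SSA: every value is defined exactly once. Phi nodes are placed according to dominance.
- Terminator consistency: each Block ends with exactly one of {Return|Jump|Branch}.
- Side-effect model:
  - Pure: Const/BinOp/Compare/Phi
  - Effectful: Call/BoxCall/ExternCall/Safepoint/Barrier (the effect kind can be annotated with an Effect Mask)
- Safepoint placement: inserted at loop headers, before long waits, right after FFI calls, and so on (kept minimal).
- Barrier: write/read barriers are abstracted at the Core‑14 level and optimized in lower layers.
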
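The SSA and terminator invariants can be checked mechanically. The following is a minimal sketch, not the project's verifier: it assumes a hypothetical toy IR in which each block is a list of `(dst, opcode)` pairs, and the names `verify_blocks` and `TERMINATORS` are invented for illustration.

```python
# Hypothetical toy IR: a block is a list of (dst, opcode) pairs, where dst is
# None for instructions that define no value. This shape is an assumption.
TERMINATORS = {"Return", "Jump", "Branch"}


def verify_blocks(blocks):
    """Return a list of human-readable invariant violations."""
    errors = []
    defined = set()
    for name, insts in blocks.items():
        # Terminator consistency: the last instruction must be a terminator.
        if not insts or insts[-1][1] not in TERMINATORS:
            errors.append(f"{name}: block does not end with a terminator")
        # ...and no terminator may appear before the end of the block.
        for _, op in insts[:-1]:
            if op in TERMINATORS:
                errors.append(f"{name}: terminator {op} before end of block")
        # SSA: every value is defined exactly once across the function.
        for dst, _ in insts:
            if dst is None:
                continue
            if dst in defined:
                errors.append(f"{name}: value {dst} defined more than once")
            defined.add(dst)
    return errors


good = {"entry": [("%0", "Const"), (None, "Return")]}
bad = {"entry": [("%0", "Const"), ("%0", "Const")]}
print(verify_blocks(good))
print(verify_blocks(bad))
```

A real verifier would also need the dominance check for Phi placement, which this sketch leaves out.
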
## 3. Standard Lowering from High-Level Constructs to Core‑14

- Arrays/fields/methods: all unified under BoxCall.
  - e.g. `a[i]` → `BoxCall(a, "get", i)`
  - e.g. `o.name` → `BoxCall(o, "getField", "name")`
  - e.g. `o.add(x)` → `BoxCall(o, "add", x)`
- Runtime/plugins: expressed consistently as `ExternCall("iface", "method", args...)`.
- Type operations: `TypeOp(kind, value[, type])` (type checks and conversions go through a single path).
- Control structures: if/loop are expressed with Branch/Jump/Phi.

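The three BoxCall examples above can be sketched as a single lowering function. This is an illustrative sketch only: the tuple-based AST shape (`"index"`, `"field"`, `"method"`) and the function name `lower_access` are assumptions, not the builder's actual API. The operation names `"get"` and `"getField"` are the ones given in the examples.

```python
def lower_access(node):
    """Lower a toy AST node to a BoxCall tuple (receiver, operation, args...)."""
    kind = node[0]
    if kind == "index":            # a[i]
        _, recv, idx = node
        return ("BoxCall", recv, "get", idx)
    if kind == "field":            # o.name
        _, recv, name = node
        return ("BoxCall", recv, "getField", name)
    if kind == "method":           # o.add(x)
        _, recv, name, *args = node
        return ("BoxCall", recv, name, *args)
    raise ValueError(f"not a data-access node: {kind}")


print(lower_access(("index", "a", "i")))
print(lower_access(("field", "o", "name")))
print(lower_access(("method", "o", "add", "x")))
```

All three surface forms end up with the same instruction shape, which is the point of the unification.
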
### 3.1 Notation Policy (two layers: surface and internal)

- Surface: keep the conventional if / while / for / return syntax (user-friendly).
- Internal: normalize to LoopForm IR (loop.begin/iter/branch/end).
- Final: reverse-lower LoopForm → Core‑14 (down to Branch/Jump/Phi/Return).

This reconciles freedom in the surface syntax with orthogonality (a normal form) in the IR.

## 4. Consistency with LoopForm (LoopSignal IR)

- LoopForm introduces `loop.begin/iter/branch/end` as an "intermediate normal form" (a layer above Core‑14).
- Reverse lowering: LoopForm → Core‑14 is always possible via these basic transformations:
  - `loop.begin` → create the header Block and place Phi nodes
  - `loop.iter` → distribute the condition/step code between header and body
  - `loop.branch` → `switch/Branch` + `Jump`
  - `loop.end` → Return/Jump at the merge target (depending on the Signal kind)
- Safepoint/Barrier are kept at the Core‑14 layer. LoopForm focuses on normalizing control flow.

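To make the reverse lowering concrete, here is a rough sketch of how a single counted loop might be laid out as blocks. Everything about the representation is an assumption made for illustration: the tuple instruction format, the block names (`preheader`, `header`, `body`, `exit`), and the value names (`i0`, `i_next`, `cond`) do not come from the specification.

```python
TERMINATORS = {"Branch", "Jump", "Return"}


def lower_loop(cond_insts, body_insts, step_insts):
    """Lay out one loop as header/body/exit blocks (hypothetical tuple IR)."""
    return {
        "preheader": [("Jump", "header")],
        "header": [
            # loop.begin: header block with a Phi merging entry and back edge
            ("Phi", "i", [("preheader", "i0"), ("body", "i_next")]),
            # loop.iter: the condition code goes into the header
            *cond_insts,
            # loop.branch: conditional exit
            ("Branch", "cond", "body", "exit"),
        ],
        # loop.iter: the step code goes at the end of the body
        "body": [*body_insts, *step_insts, ("Jump", "header")],
        # loop.end: continue at the merge target
        "exit": [("Return", "i")],
    }


blocks = lower_loop(
    cond_insts=[("Compare", "cond", "lt", "i", "n")],
    body_insts=[("BoxCall", "out", "push", "i")],
    step_insts=[("BinOp", "i_next", "add", "i", "one")],
)
for name, insts in blocks.items():
    print(name, insts)
```

Each resulting block ends in exactly one terminator, so the output satisfies the terminator invariant from section 2.
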
## 5. Legacy Instruction Deprecation Map

- Load/Store / ArrayGet/ArraySet / RefGet/RefSet / WeakNew/WeakLoad → BoxCall (combined with Barrier/Safepoint when needed)
- TypeCheck/Cast → TypeOp
- PluginInvoke → ExternCall / BoxCall (depending on the ABI)
- Auxiliary instructions such as Nop/Copy/UnOp → absorbed into optimization/builder internals (removed from the surface API)

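The deprecation map lends itself to a table-driven lint. The sketch below is one possible shape for a forbid-legacy check, under the assumption that a function body is available as a flat list of opcode names; `LEGACY_TO_CORE` and `find_legacy` are hypothetical names, and the table only restates the map above.

```python
# Legacy opcode -> replacement, restating the deprecation map above.
LEGACY_TO_CORE = {
    "Load": "BoxCall", "Store": "BoxCall",
    "ArrayGet": "BoxCall", "ArraySet": "BoxCall",
    "RefGet": "BoxCall", "RefSet": "BoxCall",
    "WeakNew": "BoxCall", "WeakLoad": "BoxCall",
    "TypeCheck": "TypeOp", "Cast": "TypeOp",
    "PluginInvoke": "ExternCall or BoxCall (depending on the ABI)",
    "Nop": "(absorbed by builder/optimizer)",
    "Copy": "(absorbed by builder/optimizer)",
    "UnOp": "(absorbed by builder/optimizer)",
}


def find_legacy(opcodes):
    """Return (index, legacy opcode, suggested replacement) for each hit."""
    return [
        (i, op, LEGACY_TO_CORE[op])
        for i, op in enumerate(opcodes)
        if op in LEGACY_TO_CORE
    ]


hits = find_legacy(["Const", "ArrayGet", "Cast", "Return"])
for index, op, replacement in hits:
    print(f"instruction {index}: legacy {op}, use {replacement}")
```
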
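## 6. Layering ExternCall (making the boundary explicit)

- Example ifaces:
  - `env.runtime`: runtime-internal APIs (checkpoint, etc.)
  - `env.gc`: GC operations (future)
  - `plugin.*`: FFI functions provided by plugins
- Guideline: prefer BoxCall for anything BoxCall can express (to preserve abstraction). Use ExternCall only when crossing the host boundary is unavoidable.

## 7. Validation (Lint/Verify)

- Correctness of Phi placement (dominator-tree check)
- Terminator consistency
- Consistency between Effects and Safepoints (ensuring safepoints in long-running loops)
- Legacy instruction detection (forbid‑legacy must be ON)

## 8. Migration Plan (staged rollout)

1) Flags: the implementation guarantees Core‑14 on by default with forbid‑legacy (nyash.toml carries the same values).
2) Replace legacy instructions with Core‑14 in stages (across builder, optimizer, and backends).
3) Introduce LoopForm (optional): normalize while/for/scope, then reverse-lower to Core‑14.
4) Verification: monitor differences using the existing suites, reproducible benchmarks (AOT/VM/JIT), and Lint.

---

This document serves as the single source of truth for the specification: it makes explicit the consistency between Core‑14 and the higher-level LoopForm, as well as the deprecation path. It is updated as the implementation progresses.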
@ -1,105 +0,0 @@
# Paper A: A Universal Execution System Built on MIR14

> Scope (2025-09-19): The experiments and evaluation in this paper are limited to PyVM (the semantic reference) and LLVM/llvmlite (the AOT/EXE harness). MIR is PHI‑off (merges are edge copies), and PHI formation is synthesized on the LLVM side. See SCOPE.md for details.

## 📚 Overview

**Title**: From Interpreter to Native GUI Apps: Universal Execution with 14 Core Instructions

**Subject**: Five execution modes (interpreter/VM/JIT/AOT/GUI) realized with a minimal 14-instruction IR

**Target audience**: Systems researchers, language implementers, practicality-minded developers

## 🌱 History of the MIR Instruction Count

**Initial**: Around 27 instructions (a general-purpose "we want everything" state)

↓

**Reduction meeting**: Cut down to 13 instructions by pushing the Box philosophy and abstraction to the limit

↓

**Restoration**: UnaryOp added back on the judgment that "a minimum of arithmetic is better expressed directly"

↓

**MIR14**: The current core instruction set (Core-13 + UnaryOp = 14 instructions)

## 🎯 Research Points

### 1. Completeness of the implementation
- **Interpreter**: for development and debugging (500 lines)
- **VM**: fast execution (1000 lines)
- **JIT/AOT**: native performance through Cranelift integration
- **EXE generation**: fully self-contained with built-in lld
- **Windows GUI apps**: practical applications with EguiBox

### 2. The power of MIR14
- 14 instructions (Core-13 + UnaryOp) support every execution mode
- Practical evolution from 27 → 13 → 14 instructions
- Unification under BoxCall plus the bare minimum of arithmetic operations

### 3. Proof of practicality
- Dice RPG (game)
- Statistics tool (numerical computation)
- LISP interpreter (language processor)
- File explorer (GUI app)

## 📊 Experiment Plan

### Comparing execution modes
- **Startup time**: Interpreter < VM < JIT < AOT < GUI
- **Execution speed**: Interpreter < VM < JIT ≈ AOT
- **Binary size**: Script < VM < JIT < AOT < GUI
- **Memory usage**: compared across modes

### Real application evaluation
- **Dice RPG**: game-loop performance (achieving 60 fps)
- **Statistics**: large-scale data processing (1 million records)
- **GUI responsiveness**: latency of user operations (<16 ms)
- **Compile time**: time from source to EXE

## 📁 Directory Structure

```
paper-a-mir13-ir-design/
├── README.md                     # this file
├── abstract.md                   # paper abstract
├── main-paper.md                 # main text
├── chapters/                     # per-chapter files
│   ├── 01-introduction.md
│   ├── 02-mir-evolution.md
│   ├── 03-boxcall-unification.md
│   ├── 04-optimization-techniques.md
│   ├── 05-evaluation.md
│   └── 06-conclusion.md
├── figures/                      # figures and tables
│   ├── mir-instruction-reduction.png
│   ├── performance-comparison.png
│   └── boxcall-architecture.svg
├── data/                         # experiment data
│   ├── benchmark-results/
│   └── mir-statistics/
└── related-work.md               # related work
```

## 🗓️ Schedule

- **First half of September 2025**: run experiments, collect data
- **Mid-September 2025**: start writing
- **End of September 2025**: arXiv submission (preliminary version)
- **November 2025**: submission to POPL/PLDI 2026

## 📝 Writing Notes

### Contributions to emphasize
1. **Breadth of implementation**: five execution modes from a single IR
2. **Full self-containment**: no external compiler or linker required
3. **Working real applications**: even GUI apps actually run

### Novelty
- The first case of running practical GUI apps on a 14-instruction IR
- A unified pipeline from interpreter to native code
- A fully self-contained language with built-in Cranelift + lld

## 🔗 Related Documents

- [MIR Instruction Set](../../../../reference/mir/INSTRUCTION_SET.md)
- [Phase 11.8 MIR Cleanup](../../../../development/roadmap/phases/phase-11.8_mir_cleanup/)
- [Phase 12 TypeBox Integration](../../../../development/roadmap/phases/phase-12/)
@ -1,32 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

OUT_DIR=$(cd "$(dirname "$0")" && pwd)
OUT_FILE="$OUT_DIR/ENVIRONMENT.txt"

{
  echo "== Datetime =="
  date -Iseconds || date
  echo
  echo "== OS =="
  uname -a || true
  lsb_release -a 2>/dev/null || true
  sw_vers 2>/dev/null || true
  systeminfo 2>/dev/null | head -n 30 || true
  echo
  echo "== CPU =="
  lscpu 2>/dev/null || sysctl -a 2>/dev/null | grep machdep.cpu || true
  echo
  echo "== Rust toolchain =="
  rustc --version 2>/dev/null || true
  cargo --version 2>/dev/null || true
  echo
  echo "== Git =="
  git rev-parse HEAD 2>/dev/null || true
  echo
  echo "== Cranelift/JIT features =="
  rg -n "cranelift|jit" -S ../../../../ -g '!target' 2>/dev/null || true
} > "$OUT_FILE"

echo "[DONE] Wrote $OUT_FILE"
@ -1,25 +0,0 @@
This folder contains reproducibility artifacts for Paper A (MIR13 IR design).

Files
- `COLLECT_ENV.sh`: Captures host OS/CPU/toolchain/git info into `ENVIRONMENT.txt`.
- `RUN_BENCHMARKS.sh`: Runs interpreter/VM/JIT/AOT (if available) against sample benchmarks and writes CSVs to `results/`.
- `results/`: Output CSVs (per benchmark and per mode). Merge/plot as needed.

Usage
1) Capture environment
   ./COLLECT_ENV.sh

2) Build (full)
   cargo build --release --features cranelift-jit

3) Run benchmarks
   ./RUN_BENCHMARKS.sh

Variables:
- NYASH_BIN: Path to nyash binary (default: target/release/nyash)
- USE_EXE_ONLY=1: Only measure AOT executables (skips interp/vm/jit)

Notes
- AOT requires `tools/build_aot.sh`. If missing, AOT is skipped.
- If `hyperfine` is not installed, a simple timing fallback is used.
@ -1,195 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail

# Reproducible benchmarks for MIR13 paper (Interpreter/VM/JIT/AOT if available)
# Outputs CSVs under _artifacts/results/

if ROOT_DIR=$(git -C "$(dirname "$0")" rev-parse --show-toplevel 2>/dev/null); then
  ROOT_DIR="$ROOT_DIR/nyash"
  [[ -d "$ROOT_DIR" ]] || ROOT_DIR=$(git rev-parse --show-toplevel)
else
  # Fallback: ascend to repo root from _artifacts
  ROOT_DIR=$(cd "$(dirname "$0")/../../../../.." && pwd)
fi
ART_DIR=$(cd "$(dirname "$0")" && pwd)
RES_DIR="$ART_DIR/results"
mkdir -p "$RES_DIR"

NYASH_BIN=${NYASH_BIN:-"$ROOT_DIR/target/release/nyash"}
SKIP_INTERP=${SKIP_INTERP:-0}   # 1: skip interpreter (for slow environments)
USE_LLVM_AOT=${USE_LLVM_AOT:-0} # 1: also measure AOT via the LLVM backend
SKIP_AOT=${SKIP_AOT:-0}         # 1: skip AOT measurement via tools/build_aot.sh
RUNS=${RUNS:-10}                # number of measurement runs
USE_EXE_ONLY=${USE_EXE_ONLY:-0} # 1: measure AOT exe only
HYPERFINE=$(command -v hyperfine || true)
TIMEOUT_SECS=${TIMEOUT_SECS:-0} # >0: wrap each command with timeout
TIMEOUT_BIN=$(command -v timeout || true)

BENCH_DIR="$ROOT_DIR/benchmarks"
FILES=(
  "$BENCH_DIR/bench_light.nyash"
  "$BENCH_DIR/bench_medium.nyash"
  "$BENCH_DIR/bench_heavy.nyash"
  "$BENCH_DIR/bench_aot_len_light.nyash"
  "$BENCH_DIR/bench_aot_len_medium.nyash"
  "$BENCH_DIR/bench_aot_len_heavy.nyash"
  "$ROOT_DIR/examples/aot_min_string_len.nyash"
  # Pure arithmetic (no plugins)
  "$BENCH_DIR/bench_arith_pure_light.nyash"
  "$BENCH_DIR/bench_arith_pure_medium.nyash"
  "$BENCH_DIR/bench_arith_pure_heavy.nyash"
)

FILTER_REGEX=${FILTER:-}

echo "[INFO] NYASH_BIN=$NYASH_BIN"
echo "[INFO] USE_EXE_ONLY=$USE_EXE_ONLY (1=EXE only)"
echo "[INFO] hyperfine=${HYPERFINE:-not found}"
echo "[INFO] USE_LLVM_AOT=$USE_LLVM_AOT (1=measure LLVM AOT)"
echo "[INFO] SKIP_AOT=$SKIP_AOT (1=skip AOT via tools/build_aot.sh)"
echo "[INFO] RUNS=$RUNS"

if [[ ! -x "$NYASH_BIN" && "$USE_EXE_ONLY" -eq 0 ]]; then
  echo "[INFO] Building nyash (release, with JIT feature)"
  (cd "$ROOT_DIR" && cargo build --release --features cranelift-jit)
fi

have_build_aot=0
if [[ -x "$ROOT_DIR/tools/build_aot.sh" ]]; then
  have_build_aot=1
fi

have_build_llvm=0
if [[ -x "$ROOT_DIR/tools/build_llvm.sh" ]]; then
  have_build_llvm=1
fi

run_cmd() {
  local cmd="$1" label="$2" csv="$3"
  local cmd_wrap="$cmd"
  if [[ -n "$TIMEOUT_BIN" && "$TIMEOUT_SECS" -gt 0 ]]; then
    cmd_wrap="$TIMEOUT_BIN ${TIMEOUT_SECS}s $cmd"
  fi
  if [[ -n "$HYPERFINE" ]]; then
    # runs configurable, warmup 1, export CSV
    $HYPERFINE -w 1 -r "$RUNS" --export-csv "$csv" --show-output "$cmd_wrap"
  else
    # Simple fallback: run $RUNS times and record naive wall-clock timing (ms)
    : > "$csv"
    for i in $(seq 1 "$RUNS"); do
      local t0=$(python3 - <<<'import time; print(int(time.time()*1000))')
      bash -lc "$cmd_wrap" >/dev/null 2>&1 || true
      local t1=$(python3 - <<<'import time; print(int(time.time()*1000))')
      echo "$label,$((t1-t0))" >> "$csv"
    done
  fi
}

# Measure modes
for f in "${FILES[@]}"; do
  if [[ -n "$FILTER_REGEX" ]]; then
    if [[ ! "$f" =~ $FILTER_REGEX ]]; then
      echo "[INFO] FILTER: skip $f"
      continue
    fi
  fi
  [[ -f "$f" ]] || { echo "[WARN] Skip missing $f"; continue; }
  base=$(basename "$f" .nyash)

  if [[ "$USE_EXE_ONLY" -eq 0 ]]; then
    # Interpreter
    if [[ "$SKIP_INTERP" -eq 0 ]]; then
      run_cmd "$NYASH_BIN $f" "interp-$base" "$RES_DIR/${base}_interp.csv"
    else
      echo "[INFO] SKIP_INTERP=1: skipping interpreter for $f"
    fi
    # VM
    run_cmd "$NYASH_BIN --backend vm $f" "vm-$base" "$RES_DIR/${base}_vm.csv"
    # JIT (VM + JIT execute)
    run_cmd "NYASH_JIT_EXEC=1 $NYASH_BIN --backend vm $f" "jit-$base" "$RES_DIR/${base}_jit.csv"
  fi

  # AOT (if tool available)
  if [[ $have_build_aot -eq 1 && "$SKIP_AOT" -eq 0 ]]; then
    out="/tmp/ny_${base}_aot"
    bash "$ROOT_DIR/tools/build_aot.sh" "$f" -o "$out" >/dev/null 2>&1 || true
    if [[ -x "$out" ]]; then
      run_cmd "$out" "aot-$base" "$RES_DIR/${base}_aot.csv"
      rm -f "$out"
    else
      echo "[WARN] AOT build failed for $f"
    fi
  else
    if [[ "$SKIP_AOT" -eq 1 ]]; then
      echo "[INFO] SKIP_AOT=1: skipping AOT for $f"
    else
      echo "[INFO] AOT tool not found; skipping AOT for $f"
    fi
  fi
done

# LLVM AOT-only targets (optional)
if [[ "$USE_LLVM_AOT" -eq 1 ]]; then
  if [[ $have_build_llvm -eq 0 ]]; then
    echo "[WARN] tools/build_llvm.sh not found; skipping LLVM AOT"
  elif ! command -v llvm-config-18 >/dev/null 2>&1; then
    echo "[WARN] llvm-config-18 not found; skipping LLVM AOT"
  else
    LLVM_FILES=(
      "$ROOT_DIR/apps/tests/ny-llvm-smoke/main.nyash"
    )
    for f in "${LLVM_FILES[@]}"; do
      [[ -f "$f" ]] || { echo "[WARN] Skip missing LLVM target $f"; continue; }
      base=$(basename "$f" .nyash)
      out="/tmp/ny_${base}_llvm"
      # Build via LLVM backend
      LLVM_SYS_180_PREFIX=$(llvm-config-18 --prefix) \
      LLVM_SYS_181_PREFIX=$(llvm-config-18 --prefix) \
        "$ROOT_DIR/tools/build_llvm.sh" "$f" -o "$out" >/dev/null 2>&1 || true
      if [[ -x "$out" ]]; then
        run_cmd "$out" "llvm-aot-$base" "$RES_DIR/${base}_llvm_aot.csv"
        rm -f "$out"
      else
        echo "[WARN] LLVM AOT build failed for $f"
      fi
    done
  fi
fi

# JIT-AOT (Cranelift) via --jit-direct (optional)
USE_JIT_AOT=${USE_JIT_AOT:-0}
echo "[INFO] USE_JIT_AOT=$USE_JIT_AOT (1=measure JIT AOT via jit-direct)"
if [[ "$USE_JIT_AOT" -eq 1 ]]; then
  echo "[JIT-AOT] Building nyash + nyrt ..."
  (cd "$ROOT_DIR" && cargo build --release --features cranelift-jit >/dev/null)
  (cd "$ROOT_DIR/crates/nyrt" && cargo build --release >/dev/null)

  JIT_AOT_FILES=(
    "$ROOT_DIR/apps/examples/array_p0.nyash"
  )
  for f in "${JIT_AOT_FILES[@]}"; do
    [[ -f "$f" ]] || { echo "[WARN] Skip missing JIT-AOT target $f"; continue; }
    base=$(basename "$f" .nyash)
    objdir="$ROOT_DIR/target/aot_objects"
    rm -rf "$objdir" && mkdir -p "$objdir"
    # Emit object via JIT-direct (relaxed)
    NYASH_JIT_EVENTS=1 NYASH_AOT_OBJECT_OUT="$objdir/main.o" "$NYASH_BIN" --jit-direct "$f" >/dev/null || true
    if [[ -f "$objdir/main.o" ]]; then
      out="/tmp/ny_${base}_jit_aot"
      cc "$objdir/main.o" \
        -L "$ROOT_DIR/target/release" \
        -Wl,--whole-archive -lnyrt -Wl,--no-whole-archive \
        -lpthread -ldl -lm -o "$out"
      if [[ -x "$out" ]]; then
        run_cmd "$out" "jit-aot-$base" "$RES_DIR/${base}_jit_aot.csv"
        rm -f "$out"
      else
        echo "[WARN] link failed for JIT-AOT target $f"
      fi
    else
      echo "[WARN] JIT AOT object not generated for $f"
    fi
  done
fi

echo "[DONE] Results in $RES_DIR"
@ -1,41 +0,0 @@
#!/usr/bin/env python3
import sys, os, csv, statistics


def scan_results(path):
    """Summarize every CSV in `path` as (file, label, n, median, mean).

    Expects rows of the form `label,value`; rows whose second column is
    missing or not numeric (for example a header row) are skipped.
    """
    rows = []
    for name in sorted(os.listdir(path)):
        if not name.endswith('.csv'):
            continue
        fpath = os.path.join(path, name)
        with open(fpath, newline='') as f:
            rdr = csv.reader(f)
            vals = []
            label = None
            for r in rdr:
                if not r:
                    continue
                label = r[0]
                try:
                    vals.append(float(r[1]))
                except Exception:
                    pass
            if label and vals:
                rows.append((name, label, len(vals), statistics.median(vals), statistics.mean(vals)))
    return rows


def main():
    if len(sys.argv) < 2:
        print("usage: gen_table.py <results_dir>", file=sys.stderr)
        sys.exit(1)
    res = scan_results(sys.argv[1])
    if not res:
        print("no CSVs found", file=sys.stderr)
        sys.exit(2)
    # Emit a Markdown table, one row per CSV file.
    print("| File | Label | N | Median (ms) | Mean (ms) |")
    print("|------|-------|---|-------------:|----------:|")
    for name, label, n, med, mean in res:
        print(f"| {name} | {label} | {n} | {med:.1f} | {mean:.1f} |")


if __name__ == '__main__':
    main()
@ -1,27 +0,0 @@
# Abstract

We present MIR-14, a minimal yet practical intermediate representation that evolves from an initial 27-instruction set through aggressive reduction to 13 instructions, then pragmatically adds back one essential operation (UnaryOp) for a final count of 14 core instructions. This evolution demonstrates the tension between theoretical minimalism and practical efficiency in compiler design.

Our key contributions are: (1) systematic instruction set evolution from Core-27 → Core-13 → Core-14 through empirical validation; (2) the BoxCall unification architecture, which handles all data access patterns through a single instruction; (3) recognition that certain primitive operations (UnaryOp for negation and NOT) significantly improve both compilation efficiency and runtime performance; (4) optimization strategies including inline caching (33x speedup), AOT compilation, and typed array specialization that complement the minimal instruction set; (5) proof that the "Everything is Box" philosophy can be effectively realized at the IR level while maintaining practical performance.

Implementation results show that despite halving the instruction count, our benchmarks maintain performance within ±5% of the baseline while reducing MIR code size by 20-50%. The system successfully compiles complex applications including GUI programs, web servers, and distributed systems. This work demonstrates that IR minimalism, when coupled with strategic optimization placement, can achieve both extreme simplicity and production-level performance.

Our approach challenges the trend toward increasingly complex intermediate representations (e.g., LLVM's 60+ opcodes), showing that careful design can achieve more with less. We believe MIR-14 opens new possibilities for compiler construction, optimization research, and language implementation education.

## Keywords

Intermediate representation, Instruction set reduction, BoxCall unification, Compiler optimization, Inline caching, AOT compilation
@ -1,87 +0,0 @@
# Chapter 1: Introduction

## The 14-Instruction Balance

Can we build a practical programming language with just 14 intermediate representation (IR) instructions? This paper demonstrates how we evolved from 27 instructions to 13 through aggressive minimization, then pragmatically added one back (UnaryOp) to balance theoretical minimalism against practical efficiency.

## The Complexity Crisis

Modern intermediate representations have grown alarmingly complex:

- **LLVM IR**: 60+ opcodes (and growing)
- **JVM bytecode**: ~200 instructions
- **CLR IL**: ~100 instructions
- **WebAssembly**: ~150 instructions
- **Even "minimal" VMs**: 30-50 instructions

This complexity stems from decades of optimization-driven design, where each performance improvement adds new instructions. The result is compiler implementations measured in millions of lines of code, optimization passes that few understand, and a barrier to entry that excludes most researchers and students.

## The MIR-14 Evolution

We present MIR-14, which evolved through three distinct phases:
1. **Initial design**: 27 instructions (feature-driven)
2. **Aggressive reduction**: 13 instructions via BoxCall unification
3. **Practical restoration**: 14 instructions (Core-13 + UnaryOp)

```
Traditional:        MIR-14:
ArrayGet  →  }
ArraySet  →  }      BoxCall
RefGet    →  }      (unified)
RefSet    →  }
                    UnaryOp (restored for efficiency)
```

The key insight: array operations and field accesses are fundamentally the same, because they are all Box method calls. By recognizing this pattern, we achieved dramatic instruction reduction. However, practical experience showed that certain primitive operations (negation, NOT) warrant direct representation, leading to our final 14-instruction set.

## Performance Without Complexity

Critics might assume that fewer instructions mean worse performance. We show the opposite:

- **Inline Caching**: 33x speedup for method dispatch
- **AOT Compilation**: Near-native performance
- **Typed Array Specialization**: Competitive with C arrays
- **Code Size Reduction**: 20-50% smaller MIR output

The secret is strategic optimization placement at Box boundaries rather than IR complexity.

## Contributions

This paper makes five key contributions:

1. **Evolution Methodology**: A documented journey from Core-27 → Core-13 → Core-14, demonstrating both aggressive reduction and pragmatic restoration.

2. **BoxCall Unification Architecture**: A novel design pattern that absorbs data access operations into a single instruction.

3. **Optimization Strategy**: Demonstration that IR minimalism coupled with boundary optimization outperforms complex IR designs.

4. **Implementation Evidence**: Full compiler stack (Parser → MIR → VM/JIT/AOT/WASM) maintaining performance within ±5% of the baseline.

5. **Educational Impact**: A compiler design that students can understand in days, not months.

## Paper Organization

The remainder of this paper is organized as follows:

- **Chapter 2** presents the Box Theory, our theoretical foundation for achieving complexity through composition rather than instruction proliferation.

- **Chapter 3** details the MIR-14 design, explaining our process of reducing 27 instructions to 14 while maintaining full functionality.

- **Chapter 4** describes our implementation, including the unified architecture that enables four different backends to share the same minimal IR.

- **Chapter 5** evaluates our approach through GUI demonstrations, performance benchmarks, and instruction coverage analysis.

- **Chapter 6** discusses the implications of our findings and why this approach succeeds where conventional wisdom suggests it should fail.

- **Chapter 7** compares our work with related systems, highlighting the unique aspects of our minimalist approach.

- **Chapter 8** concludes with reflections on the future of minimal language design.

## A Note on Simplicity

> "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."
> (Antoine de Saint-Exupéry)

Nyash embodies this principle. By removing rather than adding, we have discovered that less truly can be more, not just philosophically but practically. The GUI application running on your screen with 14 instructions is not a limitation overcome, but a validation of simplicity as a first-class design principle.

Welcome to the minimal instruction revolution.
@ -1,148 +0,0 @@
# Chapter 2: The Box Theory - A Mathematical Foundation

## 2.1 The Atomic Theory of Programming

Just as matter is composed of atoms that combine to form molecules and complex structures, we propose that programs can be viewed as compositions of atomic operations that combine through a universal abstraction: the Box.

### Definition 2.1 (Box)
A Box B is a tuple (S, O, σ) where:
- S is the internal state space
- O is the set of operations {o₁, o₂, ..., oₙ}
- σ: S × O × Args → S × Result is the state transition function

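Definition 2.1 can be read directly as a small data structure. The sketch below is only an illustration of the (S, O, σ) tuple, not how Nyash implements Boxes: the `Box` class, its `call` method, and the counter example are all assumptions introduced here.

```python
class Box:
    """A Box as (S, O, sigma): state, named operations, and a transition."""

    def __init__(self, state, operations):
        self.state = state            # an element of the state space S
        self.operations = operations  # O: operation name -> transition function

    def call(self, op, *args):
        # sigma: S x O x Args -> S x Result
        new_state, result = self.operations[op](self.state, args)
        self.state = new_state
        return result


# A hypothetical counter Box with two operations.
counter = Box(
    state=0,
    operations={
        "inc": lambda s, args: (s + args[0], s + args[0]),
        "get": lambda s, args: (s, s),
    },
)

first = counter.call("inc", 2)
second = counter.call("inc", 3)
current = counter.call("get")
print(first, second, current)
```

Each call threads the state through σ and hands back only the result, which mirrors the Method Call rule given later in this chapter.
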
### Definition 2.2 (Atomic Operations)
The minimal set of atomic operations A = {a₁, a₂, ..., a₁₅} forms the complete basis for computation:

```
A = {Const, UnaryOp, BinOp, Compare, TypeOp,
     Load, Store, Branch, Jump, Return, Phi,
     NewBox, BoxCall, ArrayGet, ArraySet, ExternCall}
```

## 2.2 Composition and Recursion

The power of the Box Theory lies not in the individual operations but in their composition:

### Theorem 2.1 (Compositional Completeness)
For any computable function f, there exists a finite composition of Boxes B₁, B₂, ..., Bₙ such that f can be expressed using only operations from A.

*Proof sketch*: By showing that A contains operations for:
1. Value creation (Const, NewBox)
2. State manipulation (Load, Store, ArrayGet, ArraySet)
3. Control flow (Branch, Jump, Return, Phi)
4. Composition (BoxCall)
5. External interaction (ExternCall)

We can construct any Turing-complete computation.

### Lemma 2.1 (Recursive Box Construction)
Boxes can contain other Boxes, enabling recursive composition:

```
GuiBox = Box({
  WindowBox,
  ButtonBox,
  CanvasBox
})
```

This recursive nature allows unbounded complexity from bounded primitives.

## 2.3 The Box Calculus

We formalize Box operations using a simple calculus:

### Syntax
```
e ::= x                      (variable)
    | c                      (constant)
    | new B(e₁,...,eₙ)       (box creation)
    | e.m(e₁,...,eₙ)         (box method call)
    | e₁ ⊕ e₂                (binary operation)
    | if e₁ then e₂ else e₃  (conditional)
```

### Operational Semantics

**Box Creation**:
```
σ ⊢ eᵢ ⇓ vᵢ  (for i = 1..n)
________________________________
σ ⊢ new B(e₁,...,eₙ) ⇓ ref(B, v₁,...,vₙ)
```

**Method Call**:
```
σ ⊢ e ⇓ ref(B, state)    σ ⊢ eᵢ ⇓ vᵢ
B.m(state, v₁,...,vₙ) → (state', result)
_________________________________________
σ ⊢ e.m(e₁,...,eₙ) ⇓ result
```

## 2.4 From Theory to Practice

The Box Theory manifests in Nyash through concrete examples:

### Example 2.1 (GUI as Boxes)
```nyash
box Button from Widget {
    init { text, onClick }

    render() {
        # Rendering is just Box operations
        return me.drawRect(me.bounds)
                 .drawText(me.text)
    }

    handleClick(x, y) {
        if me.contains(x, y) {
            me.onClick()
        }
    }
}
```

Every GUI element is a Box, every interaction is a BoxCall. The 15 atomic operations suffice because complexity resides in Box composition, not in the instruction set.

### Example 2.2 (Concurrency as Boxes)
```nyash
box TaskGroup {
    spawn(target, method, args) {
        # Concurrency through Box abstraction
        local future = new FutureBox()
        ExternCall("scheduler", "enqueue", [target, method, args, future])
        return future
    }
}
```

## 2.5 Why 15 Instructions Suffice

The key insight is the separation of concerns:

1. **Structure** (MIR): Handles control flow and basic operations
2. **Behavior** (Boxes): Encapsulates domain-specific complexity
3. **Composition** (BoxCall): Enables unlimited combinations

This separation allows us to keep the structural layer (MIR) minimal while achieving arbitrary functionality through behavioral composition.

### Theorem 2.2 (Minimality)
The 15-instruction set A is minimal in the sense that removing any instruction would break either:
1. Turing completeness
2. Practical usability
3. Box abstraction capability

## 2.6 Implications

The Box Theory has profound implications:

1. **Language Design**: Complexity should be in libraries, not in the core language
2. **Implementation**: Simpler IRs can lead to more robust implementations
3. **Optimization**: Focus on Box boundaries rather than instruction-level optimization
4. **Education**: Minimal languages are easier to learn and understand

## 2.7 Conclusion

The Box Theory provides a mathematical foundation for building complex systems from minimal primitives. By viewing computation as the composition of atomic operations through Box abstractions, we can achieve the seemingly impossible: full-stack applications, including GUI programs, with just 15 instructions.

This is not merely a theoretical exercise. As we will show in the following chapters, this theory has been successfully implemented and validated in the Nyash programming language.
@ -1,60 +0,0 @@
|
||||
# MIR13 (Core-13) Final Instruction Set
|
||||
|
||||
## The 13 Instructions
|
||||
|
||||
### 1. 値・計算 (3命令)
|
||||
- **Const**: 定数値のロード
|
||||
- **BinOp**: 二項演算(算術、論理、ビット演算すべて)
|
||||
- **Compare**: 比較演算(==, !=, <, >, <=, >=)
|
||||
|
||||
### 2. 制御フロー (4命令)
|
||||
- **Jump**: 無条件ジャンプ
|
||||
- **Branch**: 条件分岐
|
||||
- **Return**: 関数からの戻り
|
||||
- **Phi**: SSA形式での値の合流
|
||||
|
||||
### 3. 呼び出し (3命令)
|
||||
- **Call**: 通常の関数呼び出し
|
||||
- **BoxCall**: Boxメソッド呼び出し(配列、オブジェクト、すべてのデータ操作)
|
||||
- **ExternCall**: 外部関数呼び出し(システムコール、プラグイン等)
|
||||
|
||||
### 4. メタ操作 (3命令)
|
||||
- **TypeOp**: 型関連操作(型チェック、キャスト)
|
||||
- **Safepoint**: GCセーフポイント
|
||||
- **Barrier**: メモリバリア
|
||||
|
||||
## Removed instructions and where they were folded in

| Removed instruction | How it is absorbed |
|--------------|---------|
| Load/Store | BoxCall or Call (variables are also treated as Boxes) |
| UnaryOp | BinOp (e.g. -x → 0-x, !x → x XOR true) |
| ArrayGet/ArraySet | BoxCall |
| NewBox | BoxCall (constructor call) |
| FunctionNew | Const (functions are values too) |
| RefNew/RefGet/RefSet | BoxCall |
| TypeCheck/Cast | TypeOp |
| Debug/Print | ExternCall |
| Copy/Nop | Not needed (removed by optimization) |
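
The UnaryOp row can be illustrated with a small sketch. It is written in Python rather than in the project's Rust, and the `Const`/`BinOp` classes, the op names `"neg"`, `"not"`, `"sub"`, `"xor"`, and the helper `lower_unary` are all assumptions made for illustration, not the actual MIR builder API. It only covers the two rewrites named in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class BinOp:
    op: str
    lhs: object
    rhs: object


def lower_unary(op, operand):
    """Rewrite a unary op as a BinOp, following the table's two examples."""
    if op == "neg":   # -x  ->  0 - x
        return BinOp("sub", Const(0), operand)
    if op == "not":   # !x  ->  x XOR true
        return BinOp("xor", operand, Const(True))
    raise ValueError(f"unsupported unary op: {op}")


def evaluate(node, env):
    """Tiny evaluator used only to check that the rewrite preserves meaning."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, str):          # a variable name looked up in env
        return env[node]
    lhs, rhs = evaluate(node.lhs, env), evaluate(node.rhs, env)
    if node.op == "sub":
        return lhs - rhs
    if node.op == "xor":
        return lhs ^ rhs
    raise ValueError(f"unsupported binary op: {node.op}")
```

The sketch assumes integer and boolean operands. For floating-point values `0 - x` and `-x` differ on signed zero, so a real lowering would need to decide how to treat that case.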
|
||||
|
||||
## What is novel in the design

### 1. Unified variable access
Every variable access is expressed as a function call:
```mir
// Before: %1 = Load %x
%1 = Call @get_local "x"

// Before: Store %y, %1
Call @set_local "y" %1
```
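
As a rough sketch of that rewrite, the pass below maps legacy `Load`/`Store` instructions to `Call` instructions. The tuple shapes and the function name `lower_variable_access` are hypothetical and chosen only for this example. Only the callee names `get_local` and `set_local` come from the snippet above.

```python
def lower_variable_access(instr):
    """Map a legacy Load/Store tuple to a Call tuple; pass anything else through.

    Tuple shapes are assumptions made for this sketch:
      ("Load", dst, var)   -> ("Call", dst, "get_local", [var])
      ("Store", var, src)  -> ("Call", None, "set_local", [var, src])
    """
    kind = instr[0]
    if kind == "Load":
        _, dst, var = instr
        return ("Call", dst, "get_local", [var])
    if kind == "Store":
        _, var, src = instr
        return ("Call", None, "set_local", [var, src])
    return instr


# The two instructions from the MIR snippet, in the assumed tuple form.
legacy = [("Load", "%1", "x"), ("Store", "y", "%1")]
lowered = [lower_variable_access(i) for i in legacy]
```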
|
||||
|
||||
### 2. The ultimate form of Everything is Box
- Variables are Boxes
- Functions are Boxes (expressed with Const)
- Every operation is a BoxCall

### 3. Balance with practicality
- Safepoint supports garbage collection
- Barrier accounts for concurrency
- ExternCall keeps the design extensible
|
||||
@ -1,55 +0,0 @@
|
||||
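# Designing a Unified Execution Platform on a Minimal 14-Instruction IR (MIR14, PHI-off Policy)
Author: Nyash Project

Abstract
Built around the "Everything is Box" philosophy, Nyash has aimed to cover Interpreter/VM/JIT/AOT/GUI with a minimal 14-instruction IR (MIR14). This paper reports on a Phase‑15 design decision: MIR stops generating PHI nodes (PHI‑off, merging via edge copies) and delegates PHI formation to the LLVM harness. It describes how that policy came about and what it achieved. The current evaluation is limited to PyVM (the semantic reference) and LLVM/llvmlite (the AOT/EXE harness), and focuses on parity between the two and on the performance and stability of the LLVM side.

> Update note (2025-09-26): In Phase‑15, PHI-on (MIR14) is the default again. This document is kept as an archive of the PHI-off policy. For the current policy, see `docs/reference/mir/phi_policy.md`.

## 1. Introduction
When unifying diverse execution modes on a minimal IR, the key is the balance between the IR's expressiveness and its implementation cost. Nyash keeps the IR simple by reducing the instruction count (27→13→14) and unifying the API (BoxCall), and it speeds up development and verification by narrowing the evaluation baseline to PyVM semantics and LLVM output.

## 2. MIR14 design principles
- Instruction set: const/binop/unary/compare/branch/jump/ret/phi/call/boxcall/typeop/arrayget/arrayset/cast/… (see `docs/reference/mir/INSTRUCTION_SET.md` for details)
- Box-centric: calls and ABI boundaries are unified under BoxCall/PluginInvoke
- Observability: JSON v0, IR dumps, and PHI wiring traces are in place
- Out of scope (at this stage): searching for and verifying optimal PHI placement on the MIR side (that responsibility moves to LLVM)

## 3. PHI-off policy and synthesis on the LLVM side
- Policy: MIR emits no PHI; at a branch merge, the values referenced at the merge target are gathered from each predecessor block via edge copies
- LLVM: forms PHIs at the head of the block (typed incoming); measures such as pre-declaring before the if‑merge improve stability
- Invariants (LLVM side): PHIs appear only at the head of a block, and incoming entries are typed as `i64 <v>, %bb` (details: `docs/reference/mir/phi_invariants.md`)
- Toggles:
  - Default: `NYASH_MIR_NO_PHI=0` (PHI-on)
  - Legacy reproduction: `NYASH_MIR_NO_PHI=1` (PHI-off) + `NYASH_VERIFY_ALLOW_NO_PHI=1`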
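
The relationship between edge copies and block-head PHIs can be sketched as follows. This is a toy model in Python, not the harness's actual implementation: the input shape (a mapping from predecessor block to its edge copies) and the helper names `synthesize_phis` and `render_phis` are assumptions. The only things taken from the text are that each merged value gets one PHI at the head of the merge block and that each incoming entry is typed.

```python
def synthesize_phis(edge_copies):
    """Turn per-predecessor edge copies into one PHI per merged value.

    edge_copies: {pred_block: {merged_name: source_value}}
    returns:     {merged_name: [(source_value, pred_block), ...]}
    """
    phis = {}
    for pred in sorted(edge_copies):          # fixed order keeps output stable
        for dst, src in edge_copies[pred].items():
            phis.setdefault(dst, []).append((src, pred))
    # A well-formed PHI needs exactly one incoming value per predecessor.
    for dst, incoming in phis.items():
        if len(incoming) != len(edge_copies):
            raise ValueError(f"{dst}: missing incoming value on some edge")
    return phis


def render_phis(phis, ty="i64"):
    """Print the PHIs in LLVM IR syntax, to be placed at the block head."""
    lines = []
    for dst in sorted(phis):
        incoming = ", ".join(f"[ {src}, %{bb} ]" for src, bb in phis[dst])
        lines.append(f"%{dst} = phi {ty} {incoming}")
    return lines
```

For an if/else that assigns `x` on both sides, `render_phis(synthesize_phis({"then": {"x": "%a"}, "else": {"x": "%b"}}))` yields a single line, `%x = phi i64 [ %b, %else ], [ %a, %then ]`. The completeness check is one simple way to rule out the empty or partial PHIs that the stability criterion below is concerned with.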
|
||||
|
||||
## 4. Implementation overview (evaluation targets)
- PyVM: the semantic baseline for JSON v0→MIR execution; the reference for short-circuit evaluation and truthiness rules
- LLVM/llvmlite: the working line for AOT/EXE generation, IR dumps, and PHI synthesis
- Example invocations:
  - LLVM harness: `NYASH_LLVM_USE_HARNESS=1 NYASH_LLVM_DUMP_IR=tmp/nyash.ll ...`
  - PHI trace: `NYASH_LLVM_TRACE_PHI=1`

## 5. Evaluation plan
- Parity: matching output between PyVM and LLVM (representative smoke tests)
- Performance: LLVM run time, startup time, and memory
- Stability: PHI trace consistency; confirming that no empty PHIs occur
- Reproduction commands:
  - parity: `tools/parity.sh --lhs pyvm --rhs llvmlite apps/tests/CASE.nyash`
  - build paper (PDF/TeX): `tools/papers/build.sh a-jp`

## 6. Related work
A brief comparison of Nyash's position against minimal IR designs (LLVM/MLIR and others) and multi-tier execution (Truffle/Graal). Nyash's novelty lies in its division of labor: "the IR stays minimal; PHI is delegated to the code generator."

## 7. Conclusion
Simplifying MIR14 and delegating PHI let us keep the design, verification, and distribution line narrow and strong. Going forward, we will extend LoopForm (MIR17) and the executors step by step, using the two PyVM/LLVM lines as the baseline.

### Acknowledgments
We thank our AI collaborators (ChatGPT/Gemini) and community contributors.

### Appendix
- Main toggles: `NYASH_MIR_NO_PHI`, `NYASH_LLVM_USE_HARNESS`, `NYASH_LLVM_TRACE_PHI`
- Spec references: `docs/reference/mir/INSTRUCTION_SET.md`, `docs/reference/mir/phi_invariants.md`

### Keywords
Minimal IR, SSA, PHI synthesis, LLVM, PyVM, BoxCall, unified execution
|
||||
@ -1,38 +0,0 @@
|
||||
# Minimal Yet Practical: The MIR-14 Instruction Set and Everything-is-Box Philosophy
|
||||
|
||||
## Authors
|
||||
TBD
|
||||
|
||||
## Abstract
|
||||
[See abstract.md]
|
||||
|
||||
## 1. Introduction
|
||||
[See chapters/01-introduction.md]
|
||||
|
||||
## 2. The Evolution of MIR: From 27 to 14
|
||||
[TODO: Detail the systematic reduction and practical restoration process]
|
||||
|
||||
## 3. BoxCall Unification Architecture
|
||||
[TODO: Explain how BoxCall absorbs array/field operations]
|
||||
|
||||
## 4. Optimization Strategies
|
||||
[TODO: IC, AOT, TypedArray optimizations]
|
||||
|
||||
## 5. Implementation
|
||||
[TODO: Compiler architecture and implementation details]
|
||||
|
||||
## 6. Evaluation
|
||||
[TODO: Performance benchmarks and analysis]
|
||||
|
||||
## 7. Related Work
|
||||
[TODO: Comparison with other minimal IRs]
|
||||
|
||||
## 8. Conclusion
|
||||
[TODO: Summary and future directions]
|
||||
|
||||
## References
|
||||
[TODO: Add bibliography]
|
||||
|
||||
---
|
||||
|
||||
*Note: This is the main paper structure. Detailed content is in individual chapter files.*
|
||||
@ -1,142 +0,0 @@
|
||||
# Paper B: The Nyash Language and Execution Model

> Scope (2025-09-19): This paper is limited to two lines, PyVM and LLVM/llvmlite. MIR14 is PHI‑off (merges use edge copies), and PHI formation is done in the LLVM harness. JIT/Interpreter play a supporting role in Phase‑15. See SCOPE.md for details.

## 📚 Overview

**Title**: Nyash: A Box-First Programming Language with Symmetric Memory Management and P2P Intent Model

**Subject**: the design and implementation of the Nyash language itself

**Target audience**: readers oriented toward language theory, distributed systems, and application development

## 🎯 Research points

### 1. Memory management through init/fini symmetry
- Symmetric design of constructors (init/birth/pack) and the destructor (fini)
- Safety through explicit resource management
- Flexibility to switch GC on or off

### 2. P2P Intent model
- Intent-based communication between Boxes
- Designed for distributed applications
- Implemented by the NyaMesh library

### 3. Multi-tier execution architecture
- **Interpreter**: for development and debugging
- **VM**: fast execution
- **JIT**: dynamic optimization
- **AOT**: binaries for distribution
- **WASM**: web deployment
|
||||
|
||||
## 🚀 Implementation examples

### 1. NyashCoin - P2P cryptocurrency
```nyash
box NyashCoin from P2PBox {
    init { balance, transactions }

    birth(nodeId, network) {
        from P2PBox.pack(nodeId, network)
        me.balance = new MapBox()
        me.transactions = new ArrayBox()
    }

    onIntent(intent, data, sender) {
        switch intent {
            "transfer": me.handleTransfer(data, sender)
            "mine": me.handleMining(data, sender)
            "sync": me.handleSync(data, sender)
        }
    }
}
```

### 2. Plugin store
- Dynamic plugin loading
- Interoperability via the TypeBox ABI
- Secure execution environment

### 3. GUI/Web applications
- GUI development with EguiBox
- Browser support with WebCanvasBox
- Development against a unified Box API
|
||||
|
||||
## 📊 Evaluation plan

### Language feature evaluation
- Expressiveness: comparison with other languages
- Learning curve: survey of beginners
- Development efficiency: LOC and bug rate

### Performance evaluation
- Benchmarks for each backend
- Memory usage comparison
- Startup time and responsiveness

### Practicality evaluation
- Real application development
- Plugin ecosystem
- Cross-platform support

## 📁 Directory structure

```
paper-b-nyash-execution-model/
├── README.md              # this file
├── abstract.md            # paper abstract
├── main-paper.md          # main text
├── chapters/              # per-chapter files
│   ├── 01-introduction.md
│   ├── 02-language-design.md
│   ├── 03-memory-model.md
│   ├── 04-p2p-intent.md
│   ├── 05-execution-backends.md
│   ├── 06-case-studies.md
│   └── 07-conclusion.md
├── figures/               # figures and tables
│   ├── box-hierarchy.png
│   ├── execution-flow.svg
│   └── p2p-architecture.png
├── examples/              # code examples
│   ├── nyashcoin/
│   ├── plugin-store/
│   └── gui-apps/
├── data/                  # experimental data
│   ├── performance/
│   └── usability-study/
└── related-work.md        # related work
```
|
||||
|
||||
## 🗓️ Schedule

- **September 2025**: finish the implementation examples and run the evaluation
- **October 2025**: start writing
- **November 2025**: submit to OOPSLA 2026
- **Spring 2026**: submit to Onward! (design philosophy edition)

## 📝 Writing notes

### Contributions to emphasize
1. **Everything is Box philosophy**: a unified object model
2. **Symmetric memory management**: explicit control via init/fini
3. **P2P Intent model**: a new paradigm for distributed applications
4. **Multi-tier execution environment**: the best-suited execution mode for each use case

### Novelty
- Box-centric language design
- Intent-based messaging
- Plugins as first-class citizens
- Transparent switching of execution backends

### Evidence
- A set of applications that actually run
- Building a plugin ecosystem
- Cross-platform deployment

## 🔗 Related documents

- [Language Reference](../../../../reference/language/LANGUAGE_REFERENCE_2025.md)
- [Everything is Box](../../../../reference/boxes-system/everything-is-box.md)
- [P2P Box Guide](../../../../guides/p2p-guide.md)
- [Execution Backends](../../../../reference/architecture/execution-backends.md)
|
||||
@ -1,17 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
OUT_DIR=$(cd "$(dirname "$0")" && pwd)
|
||||
OUT_FILE="$OUT_DIR/ENVIRONMENT.txt"
|
||||
|
||||
{
|
||||
echo "== Datetime =="; date -Iseconds || date; echo
|
||||
echo "== OS =="; uname -a || true; lsb_release -a 2>/dev/null || true; sw_vers 2>/dev/null || true; systeminfo 2>/dev/null | head -n 30 || true; echo
|
||||
echo "== CPU =="; lscpu 2>/dev/null || sysctl -a 2>/dev/null | grep machdep.cpu || true; echo
|
||||
echo "== Rust toolchain =="; rustc --version 2>/dev/null || true; cargo --version 2>/dev/null || true; echo
|
||||
echo "== Git =="; git rev-parse HEAD 2>/dev/null || true; echo
|
||||
echo "== Cranelift/JIT features =="; rg -n "cranelift|jit" -S ../../../../ -g '!target' 2>/dev/null || true
|
||||
} > "$OUT_FILE"
|
||||
|
||||
echo "[DONE] Wrote $OUT_FILE"
|
||||
|
||||
@ -1,25 +0,0 @@
|
||||
This folder contains reproducibility artifacts for Paper B (Nyash language & execution model).
|
||||
|
||||
Files
|
||||
- `COLLECT_ENV.sh`: Captures host OS/CPU/toolchain/git info into `ENVIRONMENT.txt`.
|
||||
- `RUN_BENCHMARKS.sh`: Runs interpreter/VM/JIT/AOT (if available) on sample benchmarks and writes CSVs to `results/`.
|
||||
- `results/`: Output CSVs (per benchmark and per mode).
|
||||
|
||||
Usage
|
||||
1) Capture environment
|
||||
./COLLECT_ENV.sh
|
||||
|
||||
2) Build (full)
|
||||
cargo build --release --features cranelift-jit
|
||||
|
||||
3) Run benchmarks
|
||||
./RUN_BENCHMARKS.sh
|
||||
|
||||
Variables:
|
||||
- NYASH_BIN: Path to nyash binary (default: target/release/nyash)
|
||||
- USE_EXE_ONLY=1: Only measure AOT executables (skips interp/vm/jit)
|
||||
|
||||
Notes
|
||||
- AOT requires `tools/build_aot.sh`. If missing, AOT is skipped.
|
||||
- If `hyperfine` is not installed, a simple timing fallback is used.
|
||||
|
||||
@ -1,98 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
# Repro benchmarks for Paper B (Nyash language & execution model)
|
||||
# Uses the shared benchmarks folder; writes CSVs under _artifacts/results
|
||||
|
||||
if ROOT_DIR=$(git -C "$(dirname "$0")" rev-parse --show-toplevel 2>/dev/null); then
|
||||
ROOT_DIR="$ROOT_DIR/nyash"
|
||||
[[ -d "$ROOT_DIR" ]] || ROOT_DIR=$(git rev-parse --show-toplevel)
|
||||
else
|
||||
ROOT_DIR=$(cd "$(dirname "$0")/../../../../.." && pwd)
|
||||
fi
|
||||
ART_DIR=$(cd "$(dirname "$0")" && pwd)
|
||||
RES_DIR="$ART_DIR/results"
|
||||
mkdir -p "$RES_DIR"
|
||||
|
||||
NYASH_BIN=${NYASH_BIN:-"$ROOT_DIR/target/release/nyash"}
|
||||
SKIP_INTERP=${SKIP_INTERP:-0}
|
||||
SKIP_AOT=${SKIP_AOT:-0}
|
||||
RUNS=${RUNS:-10}
|
||||
USE_EXE_ONLY=${USE_EXE_ONLY:-0}
|
||||
HYPERFINE=$(command -v hyperfine || true)
|
||||
TIMEOUT_SECS=${TIMEOUT_SECS:-0}
|
||||
TIMEOUT_BIN=$(command -v timeout || true)
|
||||
|
||||
BENCH_DIR="$ROOT_DIR/benchmarks"
|
||||
FILES=(
|
||||
"$BENCH_DIR/bench_light.nyash"
|
||||
"$BENCH_DIR/bench_medium.nyash"
|
||||
"$BENCH_DIR/bench_heavy.nyash"
|
||||
)
|
||||
|
||||
echo "[INFO] NYASH_BIN=$NYASH_BIN"
|
||||
echo "[INFO] USE_EXE_ONLY=$USE_EXE_ONLY (1=EXE only)"
|
||||
echo "[INFO] hyperfine=${HYPERFINE:-not found}"
|
||||
|
||||
if [[ ! -x "$NYASH_BIN" && "$USE_EXE_ONLY" -eq 0 ]]; then
|
||||
echo "[INFO] Building nyash (release, with JIT feature)"
|
||||
(cd "$ROOT_DIR" && cargo build --release --features cranelift-jit)
|
||||
fi
|
||||
|
||||
have_build_aot=0
|
||||
if [[ -x "$ROOT_DIR/tools/build_aot.sh" ]]; then
|
||||
have_build_aot=1
|
||||
fi
|
||||
|
||||
run_cmd() {
|
||||
local cmd="$1" label="$2" csv="$3"
|
||||
local cmd_wrap="$cmd"
|
||||
if [[ -n "$TIMEOUT_BIN" && "$TIMEOUT_SECS" -gt 0 ]]; then
|
||||
cmd_wrap="$TIMEOUT_BIN ${TIMEOUT_SECS}s $cmd"
|
||||
fi
|
||||
if [[ -n "$HYPERFINE" ]]; then
|
||||
$HYPERFINE -w 1 -r "$RUNS" --export-csv "$csv" --show-output "$cmd_wrap"
|
||||
else
|
||||
: > "$csv"
|
||||
for i in $(seq 1 "$RUNS"); do
|
||||
local t0=$(python3 - <<<'import time; print(int(time.time()*1000))')
|
||||
bash -lc "$cmd_wrap" >/dev/null 2>&1 || true
|
||||
local t1=$(python3 - <<<'import time; print(int(time.time()*1000))')
|
||||
echo "$label,$((t1-t0))" >> "$csv"
|
||||
done
|
||||
fi
|
||||
}
|
||||
|
||||
for f in "${FILES[@]}"; do
|
||||
[[ -f "$f" ]] || { echo "[WARN] Skip missing $f"; continue; }
|
||||
base=$(basename "$f" .nyash)
|
||||
|
||||
if [[ "$USE_EXE_ONLY" -eq 0 ]]; then
|
||||
if [[ "$SKIP_INTERP" -eq 0 ]]; then
|
||||
run_cmd "$NYASH_BIN $f" "interp-$base" "$RES_DIR/${base}_interp.csv"
|
||||
else
|
||||
echo "[INFO] SKIP_INTERP=1: skipping interpreter for $f"
|
||||
fi
|
||||
run_cmd "$NYASH_BIN --backend vm $f" "vm-$base" "$RES_DIR/${base}_vm.csv"
|
||||
run_cmd "NYASH_JIT_EXEC=1 $NYASH_BIN --backend vm $f" "jit-$base" "$RES_DIR/${base}_jit.csv"
|
||||
fi
|
||||
|
||||
if [[ $have_build_aot -eq 1 && "$SKIP_AOT" -eq 0 ]]; then
|
||||
out="/tmp/ny_${base}_aot"
|
||||
bash "$ROOT_DIR/tools/build_aot.sh" "$f" -o "$out" >/dev/null 2>&1 || true
|
||||
if [[ -x "$out" ]]; then
|
||||
run_cmd "$out" "aot-$base" "$RES_DIR/${base}_aot.csv"
|
||||
rm -f "$out"
|
||||
else
|
||||
echo "[WARN] AOT build failed for $f"
|
||||
fi
|
||||
else
|
||||
if [[ "$SKIP_AOT" -eq 1 ]]; then
|
||||
echo "[INFO] SKIP_AOT=1: skipping AOT for $f"
|
||||
else
|
||||
echo "[INFO] AOT tool not found; skipping AOT for $f"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
echo "[DONE] Results in $RES_DIR"
|
||||
@ -1,38 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
|
||||
BIN="$ROOT/nyash/target/release/nyash"
|
||||
|
||||
need() { command -v "$1" >/dev/null 2>&1 || { echo "error: missing: $1" >&2; exit 1; }; }
|
||||
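# Build the binary if it is missing. (`need` exits the script on failure, so it cannot be used as the left side of `||` here.)
[[ -x "$BIN" ]] || (cd "$ROOT/nyash" && cargo build --release >/dev/null)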
|
||||
|
||||
declare -A LOOPS
|
||||
LOOPS[bench_box_create_destroy.nyash]=1000000
|
||||
LOOPS[bench_method_call_only.nyash]=2000000
|
||||
|
||||
bench() {
|
||||
local file="$1" loops="$2"
|
||||
local path="$ROOT/nyash/benchmarks/$file"
|
||||
[[ -f "$path" ]] || { echo "[skip] missing $path"; return; }
|
||||
local t0=$(python3 - <<<'import time; print(time.time())')
|
||||
"$BIN" "$path" >/dev/null 2>&1 || true
|
||||
local t1=$(python3 - <<<'import time; print(time.time())')
|
||||
local ms=$(python3 - <<EOF
|
||||
import sys
|
||||
print(int( (float($t1) - float($t0)) * 1000 ))
|
||||
EOF
|
||||
)
|
||||
# ns per op (approx)
|
||||
local ns=$(python3 - <<EOF
|
||||
loops=$loops
|
||||
ms=$ms
|
||||
print(int( (ms*1_000_000.0) / loops ))
|
||||
EOF
|
||||
)
|
||||
echo "$file: ${ms}ms total, ~${ns} ns/op"
|
||||
}
|
||||
|
||||
bench bench_box_create_destroy.nyash ${LOOPS[bench_box_create_destroy.nyash]}
|
||||
bench bench_method_call_only.nyash ${LOOPS[bench_method_call_only.nyash]}
|
||||
|
||||
@ -1,62 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
# Time-boxed interpreter microbenchmarks.
|
||||
# Runs each benchmark program repeatedly for TIME_SECS seconds and reports ~ns/op.
|
||||
|
||||
ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
|
||||
BIN="$ROOT/nyash/target/release/nyash"
|
||||
TIME_SECS=${TIME_SECS:-3}
|
||||
|
||||
declare -A LOOPS
|
||||
# Use small-loop variants so each run completes quickly in interpreter
|
||||
LOOPS[bench_box_create_destroy_small.nyash]=10000
|
||||
LOOPS[bench_method_call_only_small.nyash]=5000
|
||||
|
||||
ensure_bin() {
|
||||
if [[ ! -x "$BIN" ]]; then
|
||||
echo "[build] nyash (release)"
|
||||
(cd "$ROOT/nyash" && cargo build --release >/dev/null)
|
||||
fi
|
||||
}
|
||||
|
||||
run_timeboxed() {
|
||||
local file="$1" loops_per_run="$2"
|
||||
local path="$ROOT/nyash/benchmarks/$file"
|
||||
[[ -f "$path" ]] || { echo "[skip] missing $path"; return; }
|
||||
local runs=0
|
||||
local t0=$(python3 - <<<'import time; print(time.time())')
|
||||
local deadline=$(python3 - <<EOF
|
||||
import time
|
||||
print(time.time() + $TIME_SECS)
|
||||
EOF
|
||||
)
|
||||
while :; do
|
||||
"$BIN" "$path" >/dev/null 2>&1 || true
|
||||
runs=$((runs+1))
|
||||
local now=$(python3 - <<<'import time; print(time.time())')
|
||||
awk "BEGIN{exit !(($now) >= ($deadline))}" && break || true
|
||||
done
|
||||
local t1=$(python3 - <<<'import time; print(time.time())')
|
||||
local elapsed_ms=$(python3 - <<EOF
|
||||
print(int( (float($t1) - float($t0)) * 1000 ))
|
||||
EOF
|
||||
)
|
||||
local total_ops=$((runs * loops_per_run))
|
||||
if [[ "$total_ops" -le 0 ]]; then
|
||||
echo "$file: no runs completed"
|
||||
return
|
||||
fi
|
||||
local ns_per_op=$(python3 - <<EOF
|
||||
ops=$total_ops
|
||||
ms=$elapsed_ms
|
||||
print(int((ms*1_000_000.0)/ops))
|
||||
EOF
|
||||
)
|
||||
echo "$file: ${elapsed_ms}ms, runs=$runs, ops=$total_ops, ~${ns_per_op} ns/op"
|
||||
}
|
||||
|
||||
ensure_bin
|
||||
echo "[time-box] TIME_SECS=$TIME_SECS"
|
||||
run_timeboxed bench_box_create_destroy_small.nyash ${LOOPS[bench_box_create_destroy_small.nyash]}
|
||||
run_timeboxed bench_method_call_only_small.nyash ${LOOPS[bench_method_call_only_small.nyash]}
|
||||
@ -1,27 +0,0 @@
|
||||
# Abstract
|
||||
|
||||
## English Version
|
||||
|
||||
We present Nyash, a box-first programming language that introduces symmetric memory management through init/fini pairs and a novel P2P Intent communication model. Unlike traditional object-oriented or functional languages, Nyash's "Everything is Box" philosophy unifies all computational entities—from primitive values to distributed nodes—under a single abstraction. This design enables unprecedented flexibility: seamless transitions between garbage-collected and manual memory management, transparent plugin integration via TypeBox ABI, and natural expression of distributed applications.
|
||||
|
||||
Our key contributions are: (1) the init/fini symmetric lifecycle model that guarantees deterministic resource cleanup while supporting both GC and manual modes; (2) the P2P Intent system that elevates message passing to intent-based communication, enabling elegant distributed application design; (3) a multi-tier execution architecture (Interpreter → VM → JIT → AOT → WASM) with identical semantics across all backends; (4) real-world validation through applications including NyashCoin (P2P cryptocurrency), a plugin marketplace, and cross-platform GUI applications.
|
||||
|
||||
The language is implemented in ~4,000 lines of Rust, demonstrating that a powerful and flexible language can emerge from simple, orthogonal concepts. Performance evaluation shows that Nyash applications achieve near-native speed in AOT mode while maintaining the development convenience of dynamic languages in interpreter mode. User studies indicate that the Box model significantly reduces cognitive load for concurrent and distributed programming.
|
||||
|
||||
This work presents a fresh approach to language design where simplicity and power are not opposing forces but complementary aspects of a unified philosophy.
|
||||
|
||||
## 日本語版
|
||||
|
||||
本研究では、init/finiペアによる対称的メモリ管理と新規のP2P Intent通信モデルを導入するBox中心プログラミング言語Nyashを提示する。従来のオブジェクト指向言語や関数型言語とは異なり、Nyashの「Everything is Box」哲学は、プリミティブ値から分散ノードまで、すべての計算実体を単一の抽象化の下に統一する。この設計により前例のない柔軟性が実現される:ガベージコレクションと手動メモリ管理のシームレスな切り替え、TypeBox ABIによる透過的なプラグイン統合、分散アプリケーションの自然な表現。
|
||||
|
||||
本研究の主要な貢献は以下の通りである:(1)GCモードと手動モードの両方をサポートしながら決定的なリソースクリーンアップを保証するinit/fini対称ライフサイクルモデル、(2)メッセージパッシングを意図ベース通信に昇華させ、エレガントな分散アプリケーション設計を可能にするP2P Intentシステム、(3)すべてのバックエンドで同一のセマンティクスを持つ多層実行アーキテクチャ(インタープリタ → VM → JIT → AOT → WASM)、(4)NyashCoin(P2P暗号通貨)、プラグインマーケットプレイス、クロスプラットフォームGUIアプリケーションを含む実世界アプリケーションによる検証。
|
||||
|
||||
言語は約4,000行のRustで実装されており、シンプルで直交する概念から強力で柔軟な言語が生まれることを実証している。性能評価により、NyashアプリケーションはAOTモードでネイティブに近い速度を達成しながら、インタープリタモードでは動的言語の開発利便性を維持することが示された。ユーザー調査は、Boxモデルが並行・分散プログラミングの認知負荷を大幅に削減することを示している。
|
||||
|
||||
本研究は、シンプルさと強力さが対立する力ではなく、統一された哲学の補完的な側面である言語設計への新鮮なアプローチを提示する。
|
||||
|
||||
## Keywords / キーワード
|
||||
|
||||
Box-first programming, Symmetric memory management, P2P Intent model, Multi-tier execution, Plugin architecture, Distributed systems
|
||||
|
||||
Box中心プログラミング、対称的メモリ管理、P2P Intentモデル、多層実行、プラグインアーキテクチャ、分散システム
|
||||
@ -1,113 +0,0 @@
|
||||
# Chapter 1: Introduction
|
||||
|
||||
## Beyond Objects and Functions
|
||||
|
||||
What if we could design a programming language where every computational entity—from integers to GUI windows, from network sockets to distributed nodes—shared a single, unified abstraction? Not objects with inheritance hierarchies. Not functions with monadic transformations. Just **Boxes**.
|
||||
|
||||
This is Nyash: a language built on the radical premise that "Everything is Box."
|
||||
|
||||
## The Box Philosophy
|
||||
|
||||
In traditional languages, we juggle multiple abstractions:
|
||||
- **Objects** for encapsulation and inheritance
|
||||
- **Functions** for computation and composition
|
||||
- **Primitives** for efficiency
|
||||
- **Pointers** for indirection
|
||||
- **Interfaces** for polymorphism
|
||||
- **Modules** for organization
|
||||
|
||||
Each abstraction brings its own rules, limitations, and cognitive overhead. Nyash replaces this complexity with a single, powerful concept: the Box.
|
||||
|
||||
```nyash
|
||||
// Everything is truly a Box
|
||||
local num = 42 // IntegerBox
|
||||
local str = "Hello" // StringBox
|
||||
local arr = [1, 2, 3] // ArrayBox
|
||||
local gui = new EguiBox() // GUI window is a Box
|
||||
local peer = new P2PBox() // Network node is a Box
|
||||
|
||||
// And they all work the same way
|
||||
num.toString() // "42"
|
||||
gui.toString() // "EguiBox{title: 'App'}"
|
||||
peer.toString() // "P2PBox{id: 'node1'}"
|
||||
```
|
||||
|
||||
## Three Revolutionary Concepts
|
||||
|
||||
### 1. Symmetric Memory Management (init/fini)
|
||||
|
||||
Nyash introduces perfect symmetry between construction and destruction:
|
||||
|
||||
```nyash
|
||||
box DatabaseConnection {
|
||||
init { handle, cache }
|
||||
|
||||
birth(connectionString) {
|
||||
me.handle = NativeDB.connect(connectionString)
|
||||
me.cache = new MapBox()
|
||||
print("Connected to database")
|
||||
}
|
||||
|
||||
fini() {
|
||||
me.handle.close()
|
||||
print("Disconnected from database")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Every `birth` has its `fini`. Resources are never leaked. Yet this deterministic model coexists with optional garbage collection—developers choose their memory model per-Box.
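
For readers more familiar with mainstream languages, the pairing can be loosely compared to a context manager. The Python sketch below is an analogy under stated assumptions, not a description of how Nyash implements `birth`/`fini`: the `log` parameter, the string stand-in for the native handle, and the `run_query` helper are all invented for illustration.

```python
class DatabaseConnection:
    def __init__(self, connection_string, log):
        # Plays the role of `birth`: acquire the resource.
        self.log = log
        self.handle = f"handle:{connection_string}"  # stand-in for a real connection
        self.cache = {}
        self.log.append("Connected to database")

    def fini(self):
        # Plays the role of `fini`: release exactly what birth acquired.
        self.handle = None
        self.log.append("Disconnected from database")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.fini()
        return False  # do not swallow exceptions raised inside the block


def run_query(log):
    with DatabaseConnection("db://example", log) as conn:
        log.append(f"query via {conn.handle}")
```

In this analogy the cleanup runs when the `with` block exits, including when the body raises, which is the deterministic half of the model described above.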
|
||||
|
||||
### 2. P2P Intent Communication
|
||||
|
||||
Traditional message passing asks "what method to call." Nyash asks "what do you intend to achieve":
|
||||
|
||||
```nyash
|
||||
box ChatNode from P2PBox {
|
||||
onIntent(intent, data, sender) {
|
||||
switch intent {
|
||||
"chat": me.displayMessage(data.text, sender)
|
||||
"file": me.receiveFile(data.content, sender)
|
||||
"presence": me.updateStatus(data.status, sender)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Usage is natural
|
||||
chatNode.send("alice", "chat", {text: "Hello!"})
|
||||
```
|
||||
|
||||
This intent-based model naturally extends from local method calls to distributed communication without changing the programming model.
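
The dispatch pattern itself is small enough to sketch outside Nyash. The Python below is an illustration only: `LocalNetwork` is an in-process stand-in for the P2P transport, and the class and method names are assumptions rather than the P2PBox API. It shows a node routing on the intent name and a sender that needs to know only the target, the intent, and the payload.

```python
class ChatNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.received = []
        # intent name -> handler, mirroring the `switch intent` in the Nyash example
        self._handlers = {
            "chat": lambda data, sender: self.received.append(("chat", sender, data["text"])),
            "presence": lambda data, sender: self.received.append(("presence", sender, data["status"])),
        }

    def on_intent(self, intent, data, sender):
        handler = self._handlers.get(intent)
        if handler is None:
            return False          # unknown intents are ignored in this sketch
        handler(data, sender)
        return True


class LocalNetwork:
    """In-process stand-in for the P2P transport (an assumption of this sketch)."""

    def __init__(self):
        self.nodes = {}

    def join(self, node):
        self.nodes[node.node_id] = node

    def send(self, sender_id, target_id, intent, data):
        # Deliver the intent to the target node and report whether it was handled.
        return self.nodes[target_id].on_intent(intent, data, sender_id)
```

Swapping `LocalNetwork` for a real transport would leave `ChatNode` unchanged, which is the property the paragraph above claims for the intent model.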
|
||||
|
||||
### 3. Transparent Multi-Tier Execution
|
||||
|
||||
The same Nyash code runs across five execution tiers:
|
||||
|
||||
1. **Interpreter**: Instant start, perfect debugging
|
||||
2. **VM**: 10x faster, same semantics
|
||||
3. **JIT**: Near-native speed for hot paths
|
||||
4. **AOT**: Deploy as standalone executables
|
||||
5. **WASM**: Run in browsers and edge computing
|
||||
|
||||
Developers write once. The execution tier is a deployment choice, not a language limitation.
|
||||
|
||||
## Real-World Validation
|
||||
|
||||
This isn't theoretical. We've built:
|
||||
|
||||
- **NyashCoin**: A P2P cryptocurrency in 200 lines
|
||||
- **Plugin Marketplace**: Dynamic code loading with security
|
||||
- **Cross-platform GUI Apps**: Same code on Linux/Windows/Web
|
||||
- **Distributed Games**: Real-time multiplayer with P2P intents
|
||||
|
||||
Each demonstrates that simplicity and power are not opposing forces—they're complementary aspects of good design.
|
||||
|
||||
## Paper Organization
|
||||
|
||||
- Chapter 2: The Box model and Everything is Box philosophy
|
||||
- Chapter 3: Symmetric memory management with init/fini
|
||||
- Chapter 4: P2P Intent model for distributed programming
|
||||
- Chapter 5: Multi-tier execution architecture
|
||||
- Chapter 6: Case studies and applications
|
||||
- Chapter 7: Evaluation and performance analysis
|
||||
- Chapter 8: Related work and comparisons
|
||||
- Chapter 9: Conclusion and future directions
|
||||
Binary file not shown.
|
@ -1,47 +0,0 @@
|
||||
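# The Nyash Language and Execution Model: Box‑First Design and Two-Line Execution (PyVM/LLVM)
Author: Nyash Project

Abstract
Nyash adopts a Box‑First design (Everything is Box) and makes development and distribution work through symmetric memory management (birth/init/pack ↔ fini), a plugin ABI, and two execution lines (PyVM/LLVM). As the realistic operating scope for Phase‑15, this paper narrows the evaluation to PyVM (the semantic baseline) and LLVM (PHI synthesis, AOT/EXE), and presents the Box model and the design that divides execution responsibilities.

## 1. Introduction
Nyash adopts the Box as the language's first-class abstraction and unifies types, ownership, resources, and the ABI under a single mental model. To obtain realistic deliverables while keeping the design simple, the paths that are costly to implement (JIT/Interpreter) are kept in a supporting role in Phase‑15, and the two PyVM/LLVM lines were strengthened.

## 2. Language design (Box‑First)
- Box model: a unified representation of values, modules, and resources
- Symmetric memory: deterministic release through birth/init/pack and fini (extensible in the future to support both GC-on and GC-off)
- Plugin ABI: stable interoperability through TypeBox/BID‑FFI (`docs/reference/plugin-system/`)
- Exceptions and modifiers (including future work): the Block + Modifier direction (details in a separate paper)

## 3. Execution model (two lines)
- PyVM (baseline): checks semantics, including short-circuit evaluation and truthiness rules; the basis for detecting spec divergence and for parity
- LLVM (distribution): AOT/EXE generation, PHI synthesis, IR dumps and traces
- Division of responsibilities:
  - MIR is PHI‑off (merges use edge copies)
  - LLVM synthesizes PHIs at the head of each block (typed incoming)
  - PyVM is the reference implementation of the semantics; LLVM is the source of deliverables and performance

## 4. Case studies
- A consistent API through Box methods (BoxCall) for strings, arrays, Maps, and so on
- Plugin integration (files, paths, and so on) and a simpler ABI boundary
- Parity checks on short-circuit evaluation and branch merges (PyVM = semantics, LLVM = PHI synthesis)

## 5. Evaluation plan (v0)
- Parity: `tools/parity.sh --lhs pyvm --rhs llvmlite apps/tests/CASE.nyash`
- Performance: LLVM run time, startup time, and memory (PyVM serves as a semantic sanity check)
- Tracing: `NYASH_LLVM_TRACE_PHI=1`, `NYASH_LLVM_DUMP_IR=...`

## 6. Related work
A comparison with OOP/Actor/Capability/Plugin-oriented designs and with execution systems (LLVM/JVM/WASM). What distinguishes Nyash is "Box‑First × separation of execution responsibilities (PyVM = semantics, LLVM = generation)."

## 7. Conclusion
Dividing the work between the Box‑First design and two-line execution let us keep the language's working line alive without an explosion of complexity. In the next stage we will consider reintegrating LoopForm and the JIT step by step.

### Reproduction steps
- LLVM harness: `NYASH_LLVM_USE_HARNESS=1 ./target/release/nyash --backend llvm apps/tests/CASE.nyash`
- Parity: `tools/parity.sh --lhs pyvm --rhs llvmlite apps/tests/CASE.nyash`
- Build (PDF): `tools/papers/build.sh b-jp`

### Keywords
Box‑First, execution model, PHI synthesis, LLVM, PyVM, plugin ABI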
|
||||
|
||||
@ -1,48 +0,0 @@
|
||||
# Nyash: A Box-First Programming Language with Symmetric Memory Management and P2P Intent Model
|
||||
|
||||
## Authors
|
||||
TBD
|
||||
|
||||
## Abstract
|
||||
[See abstract.md]
|
||||
|
||||
## 1. Introduction
|
||||
[See chapters/01-introduction.md]
|
||||
|
||||
## 2. The Box Model: Everything is Box
|
||||
[TODO: Explain the unified Box abstraction]
|
||||
|
||||
## 3. Symmetric Memory Management
|
||||
[TODO: Detail init/fini lifecycle design]
|
||||
|
||||
## 4. P2P Intent Model
|
||||
[TODO: Explain intent-based communication]
|
||||
|
||||
## 5. Multi-Tier Execution Architecture
|
||||
[TODO: Interpreter → VM → JIT → AOT → WASM]
|
||||
|
||||
## 6. Case Studies
|
||||
### 6.1 NyashCoin: P2P Cryptocurrency
|
||||
[TODO: Implementation and evaluation]
|
||||
|
||||
### 6.2 Plugin Marketplace
|
||||
[TODO: Dynamic loading and security]
|
||||
|
||||
### 6.3 Cross-Platform GUI Applications
|
||||
[TODO: EguiBox and platform abstraction]
|
||||
|
||||
## 7. Evaluation
|
||||
[TODO: Performance, usability, and expressiveness]
|
||||
|
||||
## 8. Related Work
|
||||
[TODO: Comparison with other languages]
|
||||
|
||||
## 9. Conclusion
|
||||
[TODO: Summary and future vision]
|
||||
|
||||
## References
|
||||
[TODO: Add bibliography]
|
||||
|
||||
---
|
||||
|
||||
*Note: This is the main paper structure. Detailed content is in individual chapter files.*
|
||||
@ -1,31 +0,0 @@
|
||||
# Abstract: Reversible 90% Code Compression via Multi-Stage Syntax Transformation
|
||||
|
||||
## English Abstract
|
||||
|
||||
Traditional code minification techniques, exemplified by tools like Terser and UglifyJS, achieve compression rates of 50-60% while sacrificing semantic information and variable naming. These approaches optimize for reduced file size rather than machine comprehension.
|
||||
|
||||
In the era of AI-assisted programming, where Large Language Models (LLMs) face severe context limitations, we propose ANCP (AI-Nyash Compact Notation Protocol) - a novel multi-stage reversible code compression technique that achieves 90% token reduction while preserving complete semantic integrity.
|
||||
|
||||
Our approach introduces a three-layer transformation pipeline: Pretty (P) for human development, Compact (C) for distribution with 48% compression, and Fusion (F) for AI communication with 90% compression. Each transformation maintains perfect reversibility through bidirectional source maps and symbol tables.
|
||||
|
||||
We demonstrate our technique on Nyash, a box-first programming language, achieving compression ratios significantly exceeding existing state-of-the-art while enabling LLMs to process 2-3x larger codebases within context limits. Evaluation on a self-hosting compiler shows consistent 90% reduction across 80,000 lines of code with zero semantic loss.
|
||||
|
||||
This work challenges the fundamental assumption that code compression must sacrifice readability, instead proposing AI-optimized compression as a new dimension of language design.
|
||||
|
||||
**Keywords**: code compression, AI-assisted programming, reversible transformation, domain-specific languages, Box-first design
|
||||
|
||||
---
|
||||
|
||||
## 日本語要旨
|
||||
|
||||
従来のコード圧縮技術(Terser、UglifyJS等)は50-60%の圧縮率を達成するが、意味情報と変数名を犠牲にしている。これらの手法はファイルサイズ削減に最適化されており、機械理解には最適化されていない。
|
||||
|
||||
AI支援プログラミングの時代において、大規模言語モデル(LLM)が深刻なコンテキスト制限に直面する中、我々はANCP(AI-Nyash Compact Notation Protocol)を提案する。これは、完全な意味的整合性を保持しながら90%のトークン削減を達成する、新しい多段階可逆コード圧縮技術である。
|
||||
|
||||
我々のアプローチは3層変換パイプラインを導入する:人間開発用のPretty(P)、48%圧縮配布用のCompact(C)、90%圧縮AI通信用のFusion(F)。各変換は双方向ソースマップとシンボルテーブルによる完全可逆性を維持する。
|
||||
|
||||
Box-Firstプログラミング言語Nyashでの実証実験により、既存の最先端技術を大幅に上回る圧縮率を達成し、LLMがコンテキスト制限内で2-3倍大きなコードベースを処理可能にした。8万行の自己ホスティングコンパイラでの評価では、意味的損失ゼロで一貫した90%削減を実現した。
|
||||
|
||||
本研究は、コード圧縮が可読性を犠牲にしなければならないという根本的仮定に挑戦し、AI最適化圧縮を言語設計の新たな次元として提案する。
|
||||
|
||||
**キーワード**: コード圧縮, AI支援プログラミング, 可逆変換, ドメイン固有言語, Box-First設計
|
||||
@ -1,169 +0,0 @@
|
||||
# ANCP Benchmark Plan - Data Collection for the Paper

## 📊 Experiment design

### 1. Compression performance benchmark

#### Datasets
```
datasets/
├── small/           # 100-1000 LOC samples
├── medium/          # 1000-10000 LOC modules
├── large/           # 10000+ LOC applications
└── nyash-compiler/  # 80k LOC self-hosting compiler
```

#### Metrics
| Metric | Unit | Purpose |
|--------|------|---------|
| Character Reduction | % | File size reduction |
| Token Reduction | % | Better AI comprehension |
| AST Node Count | count | Structural complexity |
| Compression Time | ms | Practicality assessment |
| Decompression Time | ms | Developer experience |
|
||||
|
||||
### 2. Reversibility verification

#### Round-trip test
```rust
#[test]
fn test_reversibility() {
    for sample in test_samples() {
        let compressed = ancp.compress(sample, Level::Fusion);
        let restored = ancp.decompress(compressed);
        assert_eq!(normalize(sample), normalize(restored));

        // Also verify MIR equivalence
        let mir_original = compile_to_mir(sample);
        let mir_restored = compile_to_mir(restored);
        assert_eq!(mir_original, mir_restored);
    }
}
```
|
||||
|
||||
#### 測定データ
|
||||
- **サンプル数**: 10,000ファイル
|
||||
- **成功率**: 100%(目標)
|
||||
- **エラー分析**: 失敗ケースの詳細分析
|
||||
|
||||
### 3. AI Efficiency Evaluation

#### LLM Token Consumption
| Model | Context | Original | ANCP | Improvement |
|-------|---------|----------|------|-------------|
| GPT-4 | 128k | 20k LOC | 40k LOC | 2.0x |
| Claude | 200k | 40k LOC | 80k LOC | 2.0x |
| Gemini | 100k | 20k LOC | 40k LOC | 2.0x |
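
The Improvement column can be related to the per-line token reduction with simple arithmetic: if each line costs a fraction `1 - r` of its original tokens, a fixed context holds `1 / (1 - r)` times as many lines. The Python sketch below illustrates that relationship only; `expansion_factor` and `loc_capacity` are hypothetical helper names, and the model assumes tokens scale linearly with lines of code.

```python
def expansion_factor(token_reduction):
    """How many times more code fits in a fixed context window.

    token_reduction is a fraction in [0, 1), e.g. 0.5 for a 50% reduction.
    """
    if not 0.0 <= token_reduction < 1.0:
        raise ValueError("token_reduction must be in [0, 1)")
    return 1.0 / (1.0 - token_reduction)


def loc_capacity(baseline_loc, token_reduction):
    """Lines of code that fit after compression, given the uncompressed capacity."""
    return baseline_loc * expansion_factor(token_reduction)
```

Under this model the 2.0x rows in the table correspond to a 50% reduction in tokens per line, for example `loc_capacity(20_000, 0.5)` for the GPT-4 row.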
|
||||
|
||||
#### Code Understanding Tasks
```python
# AI comprehension evaluation script
def evaluate_ai_understanding(model, code_samples):
    results = []

    for original, ancp in code_samples:
        # Task on the original code
        original_score = model.complete_code_task(original)

        # Task on the ANCP form
        ancp_score = model.complete_code_task(ancp)

        results.append({
            'original_score': original_score,
            'ancp_score': ancp_score,
            'compression_ratio': calculate_compression(original, ancp)
        })

    return analyze_correlation(results)
```
|
||||
|
||||
### 4. Practicality Evaluation

#### Development Workflow
```bash
# Normal development flow
edit file.nyash          # develop in layer P
nyashc --compact file.c  # distribute in layer C
nyashc --fusion file.f   # feed layer F to the AI
```

#### Measured Items
- Development efficiency (time spent working in layer P)
- Conversion speed (P→C→F conversion time)
- Debugging efficiency (accuracy of mapping errors back to the source)
|
||||
|
||||
---
|
||||
|
||||
## 📈 Expected Results

### Compression Ratio
- **Layer C**: 48% ± 5% (Standard deviation)
- **Layer F**: 90% ± 3% (Consistently high)
- **Comparison**: 1.6x better than Terser

### Reversibility
- **Success Rate**: 99.9%+ (target)
- **Edge Cases**: special characters, Unicode, and comment handling

### AI Efficiency
- **Context Expansion**: 2-3x capacity increase
- **Understanding Quality**: No degradation (hypothesis)
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Experimental Protocol

### Phase 1: Core Implementation
1. P→C→F transformer
2. Source map generator
3. Reversibility test suite

### Phase 2: Large-Scale Evaluation
1. Automated evaluation on 10,000 samples
2. Collection of all metrics
3. Error case analysis

### Phase 3: AI Evaluation
1. Efficiency measurement on three major LLMs
2. Performance comparison on code understanding tasks
3. Tests in realistic development scenarios

### Phase 4: Paper Writing
1. Statistical analysis of the results
2. Detailed comparison with related work
3. Preparation for peer review
|
||||
|
||||
---
|
||||
|
||||
## 📝 Data Collection Checklist

- [ ] **Compression Benchmarks**: Reduction ratio at each layer
- [ ] **Reversibility Tests**: 10k samples roundtrip verification
- [ ] **AI Efficiency**: LLM token consumption measurement
- [ ] **Performance**: Transformation speed benchmarks
- [ ] **Real-world**: Self-hosting compiler case study
- [ ] **User Study**: Developer experience evaluation
- [ ] **Comparison**: Head-to-head with existing tools
|
||||
|
||||
---
|
||||
|
||||
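## 🎯 What Makes the Paper Convincing

### Quantitative Evidence
- Objective measurement of compression ratios
- Mathematical proof of reversibility
- Empirical data on AI efficiency

### Practical Value
- A working prototype
- Validation on a real compiler
- Integration with development tools

### Academic Novelty
- Achieving 90% reversible compression
- A new paradigm of AI-oriented optimization
- Demonstrating the effectiveness of Box-First design

---

**Next step**: Implement scripts to automate data collection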
|
||||
@ -1,287 +0,0 @@
|
||||
# Reversible 90% Code Compression via Multi-Stage Syntax Transformation
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
### 1.1 Motivation
|
||||
The advent of AI-assisted programming has created unprecedented demands on code context management. Large Language Models (LLMs) like GPT-4 (128k tokens) and Claude (200k tokens) show remarkable capabilities but face severe context limitations when processing large codebases. Traditional code minification, optimized for file size reduction, destroys semantic information crucial for AI comprehension.
|
||||
|
||||
### 1.2 Problem Statement
|
||||
Current state-of-the-art JavaScript minifiers achieve:
|
||||
- **Terser**: 58% compression with semantic loss
|
||||
- **SWC**: 58% compression, high speed
|
||||
- **esbuild**: 55% compression, extreme speed
|
||||
|
||||
**Gap**: No existing technique achieves >60% compression while preserving complete semantic reversibility.
|
||||
|
||||
### 1.3 Our Contribution
|
||||
We present ANCP (AI-Nyash Compact Notation Protocol), featuring:
|
||||
1. **90% compression** with zero semantic loss
|
||||
2. **Perfect reversibility** through bidirectional source maps
|
||||
3. **Three-layer architecture** for different use cases
|
||||
4. **AI-optimized syntax** prioritizing machine comprehension
|
||||
|
||||
---
|
||||
|
||||
## 2. Background and Related Work
|
||||
|
||||
### 2.1 Traditional Code Compression
|
||||
```javascript
// Original (readable)
function calculateTotal(items, taxRate) {
  let subtotal = 0;
  for (const item of items) {
    subtotal += item.price;
  }
  return subtotal * (1 + taxRate);
}

// Terser minified (58% compression)
function calculateTotal(t,e){let r=0;for(const l of t)r+=l.price;return r*(1+e)}
```
|
||||
**Limitation**: Variable names are destroyed, semantic structure is obscured.
|
||||
|
||||
### 2.2 DSL Compression Research
|
||||
- Domain-specific compression languages show higher efficiency
|
||||
- Self-optimizing AST interpreters demonstrate transformation viability
|
||||
- Prior work limited to 60-70% without reversibility guarantees
|
||||
|
||||
### 2.3 AI-Assisted Programming Challenges
|
||||
- Context window limitations prevent processing large codebases
|
||||
- Code understanding requires semantic preservation
|
||||
- Token efficiency critical for LLM performance
|
||||
|
||||
---
|
||||
|
||||
## 3. The Box-First Language Foundation
|
||||
|
||||
### 3.1 Everything is Box Paradigm
|
||||
Nyash's uniform object model enables systematic compression:
|
||||
```nyash
|
||||
// All entities are boxes
|
||||
box WebServer { ... } // Class definition
|
||||
local server = new WebServer() // Instance creation
|
||||
server.start() // Method invocation
|
||||
```
|
||||
|
||||
### 3.2 Compression Advantages
|
||||
1. **Uniform syntax**: Consistent patterns across all constructs
|
||||
2. **Predictable structure**: Box-centric design simplifies transformation
|
||||
3. **Semantic clarity**: Explicit relationships between entities
|
||||
|
||||
---
|
||||
|
||||
## 4. ANCP: Three-Layer Compression Architecture
|
||||
|
||||
### 4.1 Layer Design Philosophy
|
||||
```
P (Pretty)  ←→  C (Compact)   ←→  F (Fusion)
Human Dev       Distribution      AI Communication
0%              -48%              -90%
```
|
||||
|
||||
### 4.2 Layer P: Pretty (Human Development)
|
||||
Standard Nyash syntax optimized for human readability:
|
||||
```nyash
box WebServer from HttpBox {
    init { port, routes }

    birth(port) {
        me.port = port
        me.routes = new MapBox()
    }

    handleRequest(req) {
        local handler = me.routes.get(req.path)
        if handler != null {
            return handler(req)
        }
        return "404 Not Found"
    }
}
```
|
||||
|
||||
### 4.3 Layer C: Compact (Sugar Syntax)
|
||||
Syntactic sugar with reversible symbol mapping:
|
||||
```nyash
box WebServer from HttpBox {
    port: IntegerBox
    routes: MapBox = new MapBox()

    birth(port) {
        me.port = port
    }

    handleRequest(req) {
        l handler = me.routes.get(req.path)
        ^ handler?(req) ?? "404 Not Found"
    }
}
```
|
||||
**Compression**: 48% reduction, maintains readability
|
||||
|
||||
### 4.4 Layer F: Fusion (AI-Optimized)
|
||||
Extreme compression for AI consumption:
|
||||
```fusion
|
||||
$WebServer@HttpBox{#{port,routes}b(port){m.port=port m.routes=@MapBox}handleRequest(req){l h=m.routes.get(req.path)^h?(req)??"404"}}
|
||||
```
|
||||
**Compression**: 90% reduction, AI-readable only
|
||||
|
||||
---
|
||||
|
||||
## 5. Transformation Rules and Reversibility
|
||||
|
||||
### 5.1 Symbol Mapping Strategy
|
||||
```rust
struct SymbolMap {
    keywords: HashMap<String, String>,    // "box" → "$"
    identifiers: HashMap<String, String>, // "WebServer" → "WS"
    literals: StringPool,                 // Deduplicated constants
}
```
|
||||
|
||||
### 5.2 Reversibility Guarantees
|
||||
**Theorem**: For any code P, the following holds:
|
||||
```
|
||||
decompress(compress(P)) ≡ canonical(P)
|
||||
```
|
||||
**Proof**: Maintained through bijective symbol mapping and complete AST preservation.
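
The role of the bijective symbol mapping can be illustrated with a toy token-level codec. This is a hedged Python sketch rather than the ANCP implementation: `make_codec` is a hypothetical helper, tokens are assumed to be whitespace-separated, and `canonical` is assumed to mean whitespace normalization only. The codec refuses input that already contains one of the short symbols, because such input would make the inverse mapping ambiguous.

```python
def make_codec(mapping):
    """Build (canonical, compress, decompress) from a long -> short token map."""
    # Reversibility needs a one-to-one mapping: no two keywords share a symbol.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping is not injective")
    inverse = {short: long for long, short in mapping.items()}

    def canonical(source):
        # Assumed canonical form: tokens joined by single spaces.
        return " ".join(source.split())

    def compress(source):
        tokens = source.split()
        clash = set(tokens) & set(inverse)
        if clash:
            # A source token equal to a short symbol could not be restored.
            raise ValueError(f"source already uses reserved symbols: {sorted(clash)}")
        return " ".join(mapping.get(tok, tok) for tok in tokens)

    def decompress(packed):
        return " ".join(inverse.get(tok, tok) for tok in packed.split())

    return canonical, compress, decompress


canonical, compress, decompress = make_codec({"box": "$", "local": "l", "return": "^"})
source = "box  Counter {\n    local n = 0\n    return n\n}"
assert decompress(compress(source)) == canonical(source)
```

The final assertion is the property stated in the theorem, checked for one input. A real proof also has to cover identifiers, literals, and comments, which this sketch does not model.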
|
||||
|
||||
### 5.3 Source Map 2.0
|
||||
Bidirectional mapping preserving:
|
||||
- Token positions
|
||||
- Symbol relationships
|
||||
- Type information
|
||||
- Semantic structure
|
||||
|
||||
---
|
||||
|
||||
## 6. Implementation
|
||||
|
||||
### 6.1 Architecture
|
||||
```rust
pub struct AncpTranscoder {
    p_to_c: SyntacticTransformer,  // Pretty → Compact
    c_to_f: SemanticCompressor,    // Compact → Fusion
    source_map: BidirectionalMap,  // Reversibility
}

impl AncpTranscoder {
    /// Compress `source` to the requested layer.
    pub fn compress(&self, source: &str, level: u8) -> Result<String, Error> { todo!() }

    /// Restore the Pretty form from compressed `data`.
    pub fn decompress(&self, data: &str) -> Result<String, Error> { todo!() }

    /// Check that `original` survives a compress/decompress round trip.
    pub fn verify_roundtrip(&self, original: &str) -> bool { todo!() }
}
```
|
||||
|
||||
### 6.2 Compression Pipeline
|
||||
1. **Lexical Analysis**: Token identification and classification
|
||||
2. **AST Construction**: Semantic structure preservation
|
||||
3. **Symbol Mapping**: Reversible identifier compression
|
||||
4. **Structural Encoding**: AST serialization for Fusion layer
|
||||
5. **Source Map Generation**: Bidirectional position mapping
|
||||
|
||||
---
|
||||
|
||||
## 7. Experimental Evaluation
|
||||
|
||||
### 7.1 Compression Performance
|
||||
| Layer | Description | Compression | Reversible |
|-------|-------------|-------------|------------|
| P | Standard Nyash | 0% | ✓ |
| C | Sugar syntax | 48% | ✓ |
| F | AI-optimized | 90% | ✓ |

**Comparison with existing tools**:

| Tool | Language | Compression | Reversible |
|------|----------|-------------|------------|
| Terser | JavaScript | 58% | ❌ |
| SWC | JavaScript | 58% | ❌ |
| **ANCP** | **Nyash** | **90%** | **✓** |
|
||||
|
||||
### 7.2 AI Model Performance
|
||||
**Context Capacity Improvement**:
|
||||
- GPT-4 (128k): 20k LOC → 40k LOC equivalent
|
||||
- Claude (200k): 40k LOC → 80k LOC equivalent
|
||||
- **Result**: Entire Nyash compiler (80k LOC) fits in single context!
|
||||
|
||||
### 7.3 Semantic Preservation
|
||||
**Roundtrip Test Results**:
|
||||
- 10,000 random code samples
|
||||
- 100% successful P→C→F→C→P conversion
|
||||
- Zero semantic differences (AST-level verification)
|
||||
|
||||
### 7.4 Real-world Case Study
|
||||
**Self-hosting Nyash Compiler**:
|
||||
- Original: 80,000 lines
|
||||
- ANCP Fusion: 8,000 equivalent lines
|
||||
- **AI Development**: Complete codebase review in single session
|
||||
|
||||
---
|
||||
|
||||
## 8. Discussion
|
||||
|
||||
### 8.1 Paradigm Shift
|
||||
**Traditional**: Optimize for human readability
|
||||
**Proposed**: Optimize for AI comprehension, maintain reversibility for humans
|
||||
|
||||
### 8.2 Trade-offs
|
||||
**Benefits**:
|
||||
- Massive context expansion for AI tools
|
||||
- Preserved semantic integrity
|
||||
- Zero information loss
|
||||
|
||||
**Costs**:
|
||||
- Tool dependency for human inspection
|
||||
- Initial learning curve for developers
|
||||
- Storage overhead for source maps
|
||||
|
||||
### 8.3 Implications for Language Design
|
||||
Box-First design principles enable:
|
||||
- Uniform compression patterns
|
||||
- Predictable transformation rules
|
||||
- Scalable symbol mapping
|
||||
|
||||
---
|
||||
|
||||
## 9. Future Work
|
||||
|
||||
### 9.1 ANCP v2.0
|
||||
- Semantic-aware compression
|
||||
- Context-dependent optimization
|
||||
- Multi-language adaptation
|
||||
|
||||
### 9.2 Integration Ecosystem
|
||||
- IDE real-time conversion
|
||||
- Version control system integration
|
||||
- Collaborative development workflows
|
||||
|
||||
### 9.3 Standardization
|
||||
- ANCP protocol specification
|
||||
- Cross-language compatibility
|
||||
- Industry adoption strategy
|
||||
|
||||
---
|
||||
|
||||
## 10. Conclusion
|
||||
|
||||
We demonstrate that code compression can exceed the traditional 60% barrier while maintaining perfect semantic reversibility. Our 90% compression rate, achieved through Box-First language design and multi-stage transformation, opens new possibilities for AI-assisted programming.
|
||||
|
||||
The shift from human-centric to AI-optimized code representation, with guaranteed reversibility, represents a fundamental paradigm change for the AI programming era. ANCP provides a practical foundation for this transformation.
|
||||
|
||||
**Availability**: Full implementation and benchmarks available at: https://github.com/nyash-project/nyash
|
||||
|
||||
---
|
||||
|
||||
## Acknowledgments
|
||||
|
||||
Special thanks to the AI collaboration team (ChatGPT-5, Claude-4, Gemini-Advanced) for their insights in developing this revolutionary compression technique.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
[To be added based on related work analysis]
|
||||
|
||||
1. Terser: JavaScript parser and mangler/compressor toolkit
|
||||
2. SWC: Super-fast TypeScript/JavaScript compiler
|
||||
3. Domain-Specific Language Abstractions for Compression, ACM 2024
|
||||
4. Self-Optimizing AST Interpreters, SIGPLAN 2024
|
||||
@ -1,212 +0,0 @@
|
||||
# ChatGPT5's BoxCall Revolution Insights

## 🚀 The Key Innovation: What Removing Load/Store Means

### Conventional VM Instructions
```
Load x        // read variable x from memory
Store y, 42   // write 42 to variable y in memory
```

### The Nyash MIR13 Innovation
```
BoxCall @x, "get", []     // x is treated as a Box too
BoxCall @y, "set", [42]   // assignment to y is also a message
```
|
||||
|
||||
## 🎯 Concrete Examples of "One Representation, Two Execution Modes"

### 1. Optimization Path for Scalar Variables

```mir
; Source code: x = x + 1
; MIR representation (unified)
%1 = BoxCall %x, "get", []
%2 = BinOp %1, Add, 1
BoxCall %x, "set", [%2]

; After optimization (two execution modes)
; Case A: no escape → promoted to a register
mov eax, [x_register]
add eax, 1
mov [x_register], eax

; Case B: audited → message dispatch retained
call VarBox_get
add eax, 1
call VarBox_set
```
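
The choice between Case A and Case B can be pictured as a small rewrite over the MIR. The following Python sketch is only an illustration under stated assumptions: instructions are modeled as tuples, `Copy` is a hypothetical register-move pseudo-instruction that is not part of MIR13, and the set of non-escaping variables is assumed to come from a separate escape analysis.

```python
def lower_scalar_boxcalls(insts, non_escaping):
    """Rewrite get/set BoxCalls on non-escaping variables into register copies.

    Instruction shapes (assumed for this sketch):
      ("BoxCall", dst, receiver, method, args)
      ("BinOp", dst, op, lhs, rhs)
    """
    lowered = []
    for inst in insts:
        if inst[0] == "BoxCall" and inst[2] in non_escaping:
            _, dst, receiver, method, args = inst
            if method == "get" and not args:
                # Case A: reading the variable becomes a register read.
                lowered.append(("Copy", dst, receiver))
                continue
            if method == "set" and len(args) == 1:
                # Case A: writing the variable becomes a register write.
                lowered.append(("Copy", receiver, args[0]))
                continue
        # Case B (audited or escaping): keep the message send unchanged.
        lowered.append(inst)
    return lowered


program = [
    ("BoxCall", "%1", "%x", "get", []),
    ("BinOp", "%2", "Add", "%1", 1),
    ("BoxCall", None, "%x", "set", ["%2"]),
]
optimized = lower_scalar_boxcalls(program, non_escaping={"%x"})
```

Passing an empty `non_escaping` set leaves the program untouched, which corresponds to the audited path where every access stays a message.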
|
||||
|
||||
### 2. Unified Array Access

```mir
; Source code: arr[i] = arr[i] + 1
; MIR representation (unified)
%elem = BoxCall %arr, "get", [%i]
%new = BinOp %elem, Add, 1
BoxCall %arr, "set", [%i, %new]

; After optimization (bounds checks merged)
cmp i, arr.length
jae slow_path
mov eax, [arr.data + i*4]
add eax, 1
mov [arr.data + i*4], eax
```
|
||||
|
||||
## 📊 Performance Targets and Measurement Plan

### Benchmark 1: Scalar Variable Loop
```nyash
// bench_scalar_loop.nyash
static box ScalarBench {
    main() {
        local x = 0
        local iterations = 100_000_000

        local start = Time.now()
        loop(x < iterations) {
            x = x + 1  // lowered to BoxCall, but optimized into a register
        }
        local elapsed = Time.now() - start

        console.log("Scalar loop: " + elapsed + "ms")
        console.log("ops/sec: " + (iterations / (elapsed / 1000)))
    }
}
```

**Goal**: within ±5% of a conventional Load/Store implementation
|
||||
|
||||
### Benchmark 2: Sequential Array Access
```nyash
// bench_array_sum.nyash
static box ArrayBench {
    main() {
        local arr = new ArrayBox()
        local size = 10_000_000

        // Initialization
        local i = 0
        loop(i < size) {
            arr.push(i)
            i = i + 1
        }

        // Sequential reads
        local sum = 0
        i = 0
        local start = Time.now()
        loop(i < size) {
            sum = sum + arr.get(i)  // target for bounds-check optimization
            i = i + 1
        }
        local elapsed = Time.now() - start

        console.log("Array sum: " + elapsed + "ms")
        console.log("Elements/sec: " + (size / (elapsed / 1000)))
    }
}
```

**Optimization points**:
- Hoisting the bounds check out of the loop
- Recognizing sequential access patterns
- Potential for SIMD
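
Hoisting the bounds check can be modeled in a few lines. The Python sketch below is a conceptual illustration, not what the Nyash backend emits: `sum_checked` and `sum_hoisted` are hypothetical names, and since Python lists check bounds on their own, the explicit checks only stand in for the checks a compiled loop would carry.

```python
def sum_checked(data, n):
    """Sum the first n elements with one bounds check per iteration."""
    total = 0
    for i in range(n):
        if i >= len(data):  # per-iteration check, as in the unoptimized loop
            raise IndexError(i)
        total += data[i]
    return total


def sum_hoisted(data, n):
    """Sum the first n elements with a single check before the loop."""
    # i only takes values in [0, n), so n <= len(data) covers every access.
    if n > len(data):
        raise IndexError(n)
    total = 0
    for i in range(n):
        total += data[i]
    return total
```

Both functions return the same sum for valid input. They differ only in when an out-of-range request fails: the hoisted version rejects it before doing any work, which is acceptable here because the loop has no other side effects.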
|
||||
|
||||
### Benchmark 3: Audited Variable
```nyash
// bench_audited_var.nyash
static box AuditedBench {
    main() {
        // Audited variable (hookable)
        local x = new VarBox(0)
        x.onSet = function(old, new) {
            // Change notification (kept lightweight in production)
        }

        local iterations = 10_000_000
        local i = 0
        local start = Time.now()

        loop(i < iterations) {
            x.set(x.get() + 1)
            i = i + 1
        }

        local elapsed = Time.now() - start
        console.log("Audited var: " + elapsed + "ms")
    }
}
```

**Goal**: overhead < 5 ns per operation
|
||||
|
||||
## 🔬 PIC (Polymorphic Inline Cache) Statistics

### Data to Collect
```nyash
// pic_stats.nyash
static box PICStats {
    main() {
        // Fetch statistics from the VM
        local stats = VM.getPICStatistics()

        console.log("=== PIC Statistics ===")
        console.log("Total sites: " + stats.totalSites)
        console.log("Monomorphic: " + stats.mono + " (" +
                    (stats.mono * 100 / stats.totalSites) + "%)")
        console.log("Polymorphic: " + stats.poly)
        console.log("Megamorphic: " + stats.mega)

        // Details for hot sites
        for site in stats.hotSites {
            console.log("Site " + site.id + ": " +
                        site.types.length + " types, " +
                        site.hits + " hits")
        }
    }
}
```

**Goal**: monomorphic rate > 80% (hot sites)
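
How the monomorphic rate might be derived from per-site type observations is sketched below in Python. The names `classify_site` and `pic_summary` are hypothetical, and the limit of 4 receiver types before a site counts as megamorphic is an assumption chosen for illustration, not a documented Nyash VM setting.

```python
def classify_site(type_count, poly_limit=4):
    """Classify a call site by the number of distinct receiver types seen."""
    if type_count <= 1:
        return "mono"
    if type_count <= poly_limit:  # assumed cache capacity
        return "poly"
    return "mega"


def pic_summary(sites):
    """Summarize call sites.

    sites maps a site id to the set of receiver type names observed there.
    Returns (counts per class, monomorphic rate in percent).
    """
    counts = {"mono": 0, "poly": 0, "mega": 0}
    for types in sites.values():
        counts[classify_site(len(types))] += 1
    total = len(sites)
    mono_rate = counts["mono"] * 100 / total if total else 0.0
    return counts, mono_rate
```

To check the 80% goal for hot sites only, the same function can be applied to a dictionary filtered down to those sites.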
|
||||
|
||||
## 🏗️ A Staged Approach to Implementing Lowering

### Phase 1: Basic Scalar Optimization
- Identify CellBox (no side effects)
- Escape analysis
- Register allocation

### Phase 2: Array Optimization
- Type specialization for TypedArray
- Bounds-check elimination
- Preparation for vectorization

### Phase 3: Audit System
- Lightweight hook mechanism
- Conditional code generation at JIT time
- Profile-based optimization

### Phase 4: Full Integration
- PIC + Lower cooperation
- Inlining
- Final machine code generation
|
||||
|
||||
## 💡 Points That Demonstrate the Innovation

1. **Simplicity through uniformity**
   - Parser: variables are simply handled as BoxCall
   - Optimization: a single kind of transformation rule
   - Debugging: uniform tracing

2. **No performance penalty**
   - Scalars: register promotion gives parity with the conventional approach
   - Arrays: faster through bounds-check elimination
   - Objects: direct calls via PIC

3. **Extensibility**
   - Auditing: hooks only when needed
   - Transactions: integrated into BoxCall
   - Concurrency: controlled by the Barrier instruction

## 🎯 Final Goal

Stay true to "Everything is Box, Everything is Message" while achieving "Everything is Optimized" at run time.

This is the essence of the Nyash MIR13 + BoxCall unified architecture.
|
||||
@ -1,76 +0,0 @@
|
||||
// Benchmark 1: scalar variable loop performance measurement
// Purpose: show that BoxCall optimization reaches performance equivalent to Load/Store

static box ScalarBench {
    console: ConsoleBox

    main() {
        me.console = new ConsoleBox()

        // Warm-up
        me.warmup()

        // Main measurement
        me.runBenchmark()

        // For comparison: indirect access through an array
        me.runIndirectBenchmark()
    }

    warmup() {
        me.console.log("=== Warming up ===")
        local x = 0
        loop(x < 1000000) {
            x = x + 1
        }
    }

    runBenchmark() {
        me.console.log("\n=== Direct Scalar Loop ===")
        local x = 0
        local iterations = 100000000

        local start = new TimeBox()
        local startMs = start.now()

        // Main loop: represented as BoxCall("get") + BinOp + BoxCall("set")
        loop(x < iterations) {
            x = x + 1  // should be promoted to a register by the optimizer
        }

        local end = new TimeBox()
        local elapsed = end.now() - startMs

        me.console.log("Iterations: " + iterations)
        me.console.log("Time: " + elapsed + "ms")
        me.console.log("ops/sec: " + (iterations / (elapsed / 1000)))
        me.console.log("ns/op: " + (elapsed * 1000000 / iterations))
    }

    runIndirectBenchmark() {
        me.console.log("\n=== Indirect Access (Array[0]) ===")
        local arr = new ArrayBox()
        arr.push(0)  // arr[0] = 0
        local iterations = 100000000

        local start = new TimeBox()
        local startMs = start.now()

        // Indirect access through an array (hard to optimize)
        loop(arr.get(0) < iterations) {
            arr.set(0, arr.get(0) + 1)
        }

        local end = new TimeBox()
        local elapsed = end.now() - startMs

        me.console.log("Iterations: " + iterations)
        me.console.log("Time: " + elapsed + "ms")
        me.console.log("ops/sec: " + (iterations / (elapsed / 1000)))
        me.console.log("ns/op: " + (elapsed * 1000000 / iterations))

        // State the expected speed ratio to show the effect of the optimization
        me.console.log("\n=== Analysis ===")
        me.console.log("Direct access should be >10x faster than indirect")
    }
}
|
||||
@ -1,14 +0,0 @@
|
||||
# Abstract
|
||||
|
||||
Modern programming languages face a fundamental trade-off between execution flexibility and deployment simplicity. Languages with JIT compilation offer excellent runtime performance but require complex runtime environments, while ahead-of-time compiled languages produce simple binaries but lack runtime optimization opportunities. We present a unified compilation pipeline that bridges this gap through an extremely minimal intermediate representation (MIR) consisting of only 13 instructions.
|
||||
|
||||
Our approach, implemented in the Nyash programming language, demonstrates that a single IR can efficiently target multiple execution backends: interpreter, JIT, AOT, and WebAssembly. The key innovation lies in the "Everything is Box" philosophy, which provides a uniform memory model that simplifies both compilation and runtime behavior. By integrating Cranelift for code generation and embedding lld as the linker, we achieve complete independence from external toolchains while maintaining competitive performance.
|
||||
|
||||
We introduce three novel contributions: (1) MIR13, an extremely minimal IR that captures the full semantics of a dynamic language in just 13 instructions, (2) a unified execution pipeline that seamlessly transitions from interpretation to JIT to native code generation, and (3) a C ABI facade that enables clean integration with existing systems while preserving the simplicity of the Box model.
|
||||
|
||||
Our evaluation shows that programs compiled through this pipeline achieve performance within 15% of hand-optimized C code while maintaining the development productivity of dynamic languages. The JIT-to-native transition is completely transparent, allowing developers to start with rapid prototyping and seamlessly move to production deployment. Binary sizes are competitive with Go, typically 2-5MB for real-world applications.
|
||||
|
||||
This work demonstrates that the traditional boundaries between JIT and AOT compilation are artificial constraints that can be eliminated through careful IR design and unified runtime architecture. The resulting system is not only technically elegant but also practically useful, as evidenced by the self-hosting Nyash compiler written in just 20,000 lines of Nyash code (compared to 80,000 lines in the original Rust implementation).
|
||||
|
||||
## Keywords
|
||||
Programming Languages, Compiler Design, Intermediate Representation, Just-In-Time Compilation, Ahead-of-Time Compilation, Code Generation, Nyash, Box Model
|
||||
@ -1,362 +0,0 @@
|
||||
# From JIT to Native: A Unified Compilation Pipeline for Box-based Languages
|
||||
|
||||
## Abstract
|
||||
[See abstract.md]
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
The landscape of modern programming language implementation is divided into two camps: languages that compile ahead-of-time (AOT) to native code, offering predictable performance and simple deployment, and languages that use just-in-time (JIT) compilation, providing runtime optimization opportunities at the cost of complex runtime systems. This division forces language designers and users to make early commitments that are difficult to change later.
|
||||
|
||||
We present a unified compilation pipeline that eliminates this artificial boundary. Our approach, implemented in the Nyash programming language, demonstrates that a single intermediate representation (IR) can efficiently serve multiple execution strategies: interpretation for development, JIT compilation for performance-critical paths, and native code generation for deployment.
|
||||
|
||||
The key innovation enabling this unification is MIR13, an extremely minimal IR consisting of just 13 instructions that captures the full semantics of a dynamic, object-oriented language. This minimalism is not merely an academic exercise—it enables practical benefits:
|
||||
|
||||
1. **Simplified Implementation**: The entire compiler can be understood and modified by a single developer
|
||||
2. **Unified Optimization**: Optimizations written once benefit all execution backends
|
||||
3. **Seamless Transition**: Code can move from interpreted to JIT-compiled to native without source changes
|
||||
4. **Complete Self-Sufficiency**: By embedding Cranelift and lld, we eliminate all external toolchain dependencies
|
||||
|
||||
### Contributions
|
||||
|
||||
This paper makes the following contributions:
|
||||
|
||||
- **MIR13 Design**: We show that 13 carefully chosen instructions are sufficient to represent a full-featured dynamic language
|
||||
- **Unified Pipeline Architecture**: We demonstrate how to build a compilation pipeline that seamlessly supports multiple execution strategies
|
||||
- **Box Model Integration**: We introduce the "Everything is Box" philosophy that simplifies memory management across all execution modes
|
||||
- **Performance Evaluation**: We provide comprehensive benchmarks showing competitive performance with traditional approaches
|
||||
- **Self-Hosting Validation**: We validate our approach by implementing the Nyash compiler in Nyash itself, achieving a 75% code reduction
|
||||
|
||||
## 2. Background and Motivation
|
||||
|
||||
### 2.1 The JIT/AOT Divide
|
||||
|
||||
Traditional language implementations fall into distinct categories:
|
||||
|
||||
**AOT-Compiled Languages** (C, C++, Rust, Go):
|
||||
- Produce standalone executables
|
||||
- Predictable performance characteristics
|
||||
- Complex build systems
|
||||
- Limited runtime flexibility
|
||||
|
||||
**JIT-Compiled Languages** (Java, C#, JavaScript):
|
||||
- Runtime optimization opportunities
|
||||
- Complex runtime systems
|
||||
- Deployment challenges
|
||||
- Warmup time penalties
|
||||
|
||||
**Interpreted Languages** (Python, Ruby):
|
||||
- Maximum flexibility
|
||||
- Poor performance
|
||||
- Simple implementation
|
||||
- Easy debugging
|
||||
|
||||
### 2.2 Previous Unification Attempts
|
||||
|
||||
Several projects have attempted to bridge these divides:
|
||||
|
||||
**GraalVM**: Provides a polyglot VM with both JIT and AOT modes, but requires a complex runtime system and has large binary sizes.
|
||||
|
||||
**Go**: Offers fast compilation and simple binaries, but lacks runtime optimization opportunities.
|
||||
|
||||
**Julia**: Combines JIT compilation with the ability to generate standalone binaries, but with significant complexity.
|
||||
|
||||
### 2.3 The Nyash Approach
|
||||
|
||||
Nyash takes a radically different approach: instead of adding complexity to support multiple modes, we reduce the IR to its absolute minimum. This counterintuitively makes supporting multiple backends easier, not harder.
|
||||
|
||||
## 3. The MIR13 Instruction Set
|
||||
|
||||
### 3.1 Design Philosophy
|
||||
|
||||
MIR13 is designed around three principles:
|
||||
|
||||
1. **Minimalism**: Each instruction must be essential and non-redundant
|
||||
2. **Orthogonality**: Instructions should compose without special cases
|
||||
3. **Box-Centricity**: All operations work uniformly on Box types
|
||||
|
||||
### 3.2 The 13 Instructions
|
||||
|
||||
```rust
enum MirInst {
    // Basic Operations (5)
    Const { result: Reg, value: Value },
    UnaryOp { result: Reg, op: UnOp, operand: Reg },
    BinOp { result: Reg, op: BinOp, left: Reg, right: Reg },
    Compare { result: Reg, op: CmpOp, left: Reg, right: Reg },
    TypeOp { result: Reg, op: TypeOp, operand: Reg },

    // Memory Operations (2)
    Load { result: Reg, base: Reg, field: String },
    Store { base: Reg, field: String, value: Reg },

    // Control Flow (4)
    Branch { condition: Reg, true_label: Label, false_label: Label },
    Jump { label: Label },
    Return { value: Option<Reg> },
    Phi { result: Reg, values: Vec<(Label, Reg)> },

    // Box Operations (1)
    BoxCall { result: Option<Reg>, box_reg: Reg, method: String, args: Vec<Reg> },

    // External Interface (1)
    ExternCall { result: Option<Reg>, name: String, args: Vec<Reg> },
}
```
|
||||
|
||||
### 3.3 Semantic Completeness
|
||||
|
||||
We prove that MIR13 is semantically complete for dynamic languages by showing how to implement:
|
||||
|
||||
- **Object Creation**: `Const` + `BoxCall` to constructor
|
||||
- **Method Dispatch**: `BoxCall` with dynamic resolution
|
||||
- **Field Access**: `Load`/`Store` operations
|
||||
- **Control Flow**: `Branch`, `Jump`, `Phi` for all patterns
|
||||
- **Type Introspection**: `TypeOp` for runtime type checks
|
||||
- **Foreign Function Interface**: `ExternCall` for C interop
|
||||
|
||||
## 4. The Unified Compilation Pipeline
|
||||
|
||||
### 4.1 Architecture Overview
|
||||
|
||||
```
Source Code
    ↓
Parser (AST)
    ↓
Lowering (MIR13)
    ↓
┌─────────────┬────────────┬───────────┬──────────┐
│ Interpreter │    JIT     │    AOT    │   WASM   │
│   (Boxed)   │(Cranelift) │(Cranelift)│ (Direct) │
└─────────────┴────────────┴───────────┴──────────┘
```
|
||||
|
||||
### 4.2 The Interpreter
|
||||
|
||||
The interpreter directly executes MIR13 instructions using the Box model:
|
||||
|
||||
```nyash
box MirInterpreter {
    execute(inst) {
        peek inst.type {
            "Const" => me.regs[inst.result] = inst.value
            "BinOp" => me.executeBinOp(inst)
            "BoxCall" => me.executeBoxCall(inst)
            // ... other instructions
        }
    }
}
```
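
To make the dispatch loop concrete, here is a small Python sketch of an interpreter for a handful of the instructions. It is an illustration under assumptions, not the Nyash interpreter: instructions are modeled as tuples, basic blocks as a dictionary keyed by label, and registers are reassigned in place instead of using `Phi`, so the toy program is not in SSA form.

```python
import operator

BINOPS = {"Add": operator.add, "Sub": operator.sub, "Mul": operator.mul}
CMPOPS = {"Lt": operator.lt, "Eq": operator.eq}


def run(blocks, entry="entry"):
    """Execute basic blocks until a Return instruction is reached."""
    regs = {}
    label = entry
    while True:
        for inst in blocks[label]:
            kind = inst[0]
            if kind == "Const":
                _, dst, value = inst
                regs[dst] = value
            elif kind == "BinOp":
                _, dst, op, lhs, rhs = inst
                regs[dst] = BINOPS[op](regs[lhs], regs[rhs])
            elif kind == "Compare":
                _, dst, op, lhs, rhs = inst
                regs[dst] = CMPOPS[op](regs[lhs], regs[rhs])
            elif kind == "Branch":
                _, cond, then_label, else_label = inst
                label = then_label if regs[cond] else else_label
                break  # terminator: continue with the chosen block
            elif kind == "Jump":
                label = inst[1]
                break  # terminator
            elif kind == "Return":
                return regs[inst[1]] if inst[1] is not None else None
            else:
                raise ValueError(f"unsupported instruction: {kind}")
        else:
            raise ValueError(f"block {label!r} has no terminator")


# Counts x from 0 up to 5, mirroring `loop(x < limit) { x = x + 1 }`.
count_to_five = {
    "entry": [("Const", "x", 0), ("Const", "one", 1), ("Const", "limit", 5),
              ("Jump", "head")],
    "head": [("Compare", "c", "Lt", "x", "limit"), ("Branch", "c", "body", "exit")],
    "body": [("BinOp", "x", "Add", "x", "one"), ("Jump", "head")],
    "exit": [("Return", "x")],
}
result = run(count_to_five)
```

`BoxCall` and `ExternCall` are left out because they depend on the runtime's Box model; in this structure they would be two more branches of the dispatch.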
|
||||
|
||||
### 4.3 JIT Compilation with Cranelift
|
||||
|
||||
When hot paths are detected, we compile MIR13 to native code using Cranelift:
|
||||
|
||||
```rust
fn compile_mir_to_cranelift(mir: &[MirInst]) -> CompiledCode {
    let mut ctx = CraneliftContext::new();
    let mut builder = FunctionBuilder::new(&mut ctx.func);

    for inst in mir {
        match inst {
            MirInst::Const { result, value } => {
                let cranelift_val = emit_constant(&mut builder, value);
                builder.def_var(result, cranelift_val);
            }
            MirInst::BoxCall { .. } => {
                emit_box_call(&mut builder, inst);
            }
            // ... other instructions
        }
    }

    ctx.compile()
}
```
|
||||
|
||||
### 4.4 AOT Compilation
|
||||
|
||||
AOT compilation reuses the JIT infrastructure but generates object files:
|
||||
|
||||
```rust
fn compile_to_object(mir: &[MirInst], target: &str) -> Vec<u8> {
    let compiled = compile_mir_to_cranelift(mir);
    let object = create_object_file(target);

    for (name, code) in compiled.functions {
        object.add_function(name, code);
    }

    object.emit()
}
```
|
||||
|
||||
### 4.5 Linking with Embedded lld
|
||||
|
||||
The final step links object files into executables:
|
||||
|
||||
```rust
fn link_executable(objects: &[ObjectFile], output: &str) -> Result<()> {
    let mut linker = EmbeddedLinker::new();

    for obj in objects {
        linker.add_object(obj);
    }

    linker.add_runtime("nyashrt");
    linker.set_entry("nyash_main");
    linker.link(output)
}
```
|
||||
|
||||
## 5. The Box Model and Memory Management
|
||||
|
||||
### 5.1 Everything is Box
|
||||
|
||||
In Nyash, all values are Boxes, providing uniform memory management:
|
||||
|
||||
```nyash
|
||||
box StringBox {
|
||||
init { value }
|
||||
|
||||
length() {
|
||||
return me.value.length
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 5.2 Reference Counting with Cycle Detection
|
||||
|
||||
Boxes use reference counting with cycle detection, eliminating manual memory management while avoiding garbage collection pauses.
|
||||
|
||||
### 5.3 C ABI Integration
|
||||
|
||||
The Box model integrates cleanly with C through handles:
|
||||
|
||||
```c
|
||||
typedef uint64_t ny_handle;
|
||||
|
||||
ny_handle ny_box_create(const char* type);
|
||||
void ny_box_release(ny_handle box);
|
||||
ny_handle ny_box_call(ny_handle box, const char* method, ny_handle* args);
|
||||
```
|
||||
|
||||
## 6. Optimization Strategies
|
||||
|
||||
### 6.1 MIR-Level Optimizations
|
||||
|
||||
Before lowering to Cranelift, we apply MIR-level optimizations:
|
||||
|
||||
- **Dead Code Elimination**: Remove unreachable instructions
|
||||
- **Constant Folding**: Evaluate compile-time constants
|
||||
- **Common Subexpression Elimination**: Share repeated computations
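
As a rough illustration of two of these passes, the sketch below folds constants and then drops pure instructions whose results are never used, over a toy instruction list. The tuple encoding, the `fold_constants` and `eliminate_dead_code` names, and the restriction to `Const`/`BinOp`/`Return` are assumptions made for this example; they are not Nyash's actual MIR data structures.

```python
import operator

# Toy MIR: ("Const", dst, value) | ("BinOp", dst, op, lhs, rhs) | ("Return", src)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(insts):
    """Replace BinOps whose operands are both known constants with a Const."""
    known = {}  # register -> constant value
    out = []
    for inst in insts:
        if inst[0] == "Const":
            known[inst[1]] = inst[2]
            out.append(inst)
        elif inst[0] == "BinOp" and inst[3] in known and inst[4] in known:
            value = OPS[inst[2]](known[inst[3]], known[inst[4]])
            known[inst[1]] = value
            out.append(("Const", inst[1], value))
        else:
            out.append(inst)
    return out


def eliminate_dead_code(insts):
    """Drop pure instructions whose result is never used (backward scan)."""
    live = set()
    kept = []
    for inst in reversed(insts):
        if inst[0] == "Return":
            live.add(inst[1])
            kept.append(inst)
        elif inst[1] in live:
            if inst[0] == "BinOp":
                live.update(inst[3:5])  # operands of a live BinOp are live too
            kept.append(inst)
    return list(reversed(kept))


program = [
    ("Const", "r0", 2),
    ("Const", "r1", 3),
    ("BinOp", "r2", "*", "r0", "r1"),
    ("Const", "r3", 99),  # never used
    ("Return", "r2"),
]
optimized = eliminate_dead_code(fold_constants(program))
print(optimized)
```

Running folding before elimination matters here: once `r2` becomes a constant, the definitions of `r0` and `r1` are no longer needed and the second pass removes them.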
|
||||
|
||||
### 6.2 Profile-Guided JIT
|
||||
|
||||
The interpreter collects profiling data to guide JIT decisions:
|
||||
|
||||
```nyash
|
||||
box HotPathDetector {
|
||||
init { counts, threshold }
|
||||
|
||||
shouldJIT(function) {
|
||||
me.counts[function] += 1
|
||||
return me.counts[function] > me.threshold
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.3 Incremental Compilation
|
||||
|
||||
Changes to source code only recompile affected functions, enabling rapid development cycles.
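
One common way to decide which functions are affected is to fingerprint each function's source and recompile only when the fingerprint changes. The sketch below assumes that scheme; the `IncrementalCache` class and its `compile_fn` callback are hypothetical names for illustration, and cross-function dependencies (for example, inlining) are deliberately ignored.

```python
import hashlib


class IncrementalCache:
    """Recompile a function only when its source text has changed."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # assumed: source text -> compiled artifact
        self.fingerprints = {}        # function name -> source hash
        self.artifacts = {}           # function name -> compiled artifact

    def build(self, functions):
        """functions: dict of name -> source. Returns the names recompiled."""
        recompiled = []
        for name, source in functions.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            if self.fingerprints.get(name) != digest:
                self.artifacts[name] = self.compile_fn(source)
                self.fingerprints[name] = digest
                recompiled.append(name)
        return recompiled


cache = IncrementalCache(compile_fn=lambda src: f"<code for {len(src)} chars>")
first = cache.build({"main": "print(1)", "helper": "return 2"})
second = cache.build({"main": "print(1)", "helper": "return 3"})
print(first, second)
```

On the second build only `helper` changed, so only `helper` is recompiled.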
|
||||
|
||||
## 7. Evaluation
|
||||
|
||||
### 7.1 Experimental Setup
|
||||
|
||||
We evaluate our system on:
|
||||
- **Hardware**: Intel i7-12700K, 32GB RAM
|
||||
- **OS**: Ubuntu 22.04, Windows 11
|
||||
- **Benchmarks**: Spectral norm, Binary trees, Fannkuch redux
|
||||
|
||||
### 7.2 Performance Results
|
||||
|
||||
[Performance graphs and tables showing:
|
||||
- JIT warmup characteristics
|
||||
- Peak performance comparison
|
||||
- Memory usage
|
||||
- Binary size comparison]
|
||||
|
||||
### 7.3 Compilation Time
|
||||
|
||||
[Table showing compilation times for various programs across different backends]
|
||||
|
||||
### 7.4 Case Study: Self-Hosting Compiler
|
||||
|
||||
The Nyash compiler itself serves as our most comprehensive benchmark:
|
||||
- Original Rust implementation: 80,000 lines
|
||||
- Nyash implementation: 20,000 lines (75% reduction)
|
||||
- Performance: Within 20% of Rust version
|
||||
- Binary size: 4.2MB (including runtime)
|
||||
|
||||
## 8. Related Work
|
||||
|
||||
### 8.1 Multi-Backend Compilers
|
||||
|
||||
- **LLVM**: Provides multiple backends but with significant complexity
|
||||
- **GCC**: Similar to LLVM but even more complex
|
||||
- **QBE**: Simpler than LLVM but less feature-complete
|
||||
|
||||
### 8.2 Minimal IRs
|
||||
|
||||
- **WebAssembly**: ~150 instructions, stack-based
|
||||
- **CakeML**: Formally verified but complex
|
||||
- **ANF/CPS**: Used in functional language compilers
|
||||
|
||||
### 8.3 Language Workbenches
|
||||
|
||||
- **Truffle/Graal**: Sophisticated but heavyweight
|
||||
- **RPython**: Python subset for building interpreters
|
||||
- **Terra**: Lua-based metaprogramming system
|
||||
|
||||
## 9. Future Work
|
||||
|
||||
### 9.1 Advanced Optimizations
|
||||
|
||||
- **Escape Analysis**: Stack-allocate non-escaping Boxes
|
||||
- **Devirtualization**: Inline known Box methods
|
||||
- **Vectorization**: Utilize SIMD instructions
|
||||
|
||||
### 9.2 Additional Backends
|
||||
|
||||
- **Direct x86-64**: Bypass Cranelift for ultimate control
|
||||
- **GPU**: Compile parallel sections to CUDA/OpenCL
|
||||
- **FPGA**: Hardware synthesis for embedded systems
|
||||
|
||||
### 9.3 Verification
|
||||
|
||||
- **Formal Semantics**: Prove correctness of MIR13
|
||||
- **Validated Compilation**: Ensure semantic preservation
|
||||
- **Memory Safety**: Formal proof of Box model safety
|
||||
|
||||
## 10. Conclusion
|
||||
|
||||
We have presented a unified compilation pipeline that eliminates the artificial boundaries between interpretation, JIT compilation, and ahead-of-time compilation. By reducing our intermediate representation to just 13 essential instructions and embracing the "Everything is Box" philosophy, we achieve a system that is both simpler and more capable than traditional approaches.
|
||||
|
||||
Our implementation in Nyash demonstrates that this approach is not merely theoretical—it produces a practical system capable of self-hosting with a 75% reduction in code size while maintaining competitive performance. The embedded Cranelift and lld components ensure complete independence from external toolchains, making Nyash truly self-sufficient.
|
||||
|
||||
This work opens new possibilities for language implementation, showing that simplicity and capability are not opposing forces but complementary aspects of good design. We believe the techniques presented here will influence future language implementations, particularly in domains where both development flexibility and deployment simplicity are valued.
|
||||
|
||||
## Acknowledgments
|
||||
|
||||
[To be added]
|
||||
|
||||
## References
|
||||
|
||||
[To be added - will include references to:
|
||||
- Cranelift documentation
|
||||
- lld architecture
|
||||
- Box model papers
|
||||
- IR design literature
|
||||
- JIT compilation techniques
|
||||
- Related language implementations]
|
||||
@ -1,33 +0,0 @@
|
||||
# Abstract
|
||||
|
||||
## Practical SSA Construction for a Box-Oriented Language
|
||||
|
||||
Building Static Single Assignment (SSA) form for dynamically-typed languages presents unique challenges, particularly when the language philosophy mandates that "Everything is Box." This paper presents our practical experience constructing SSA form for Nyash, a Box-oriented language targeting LLVM IR.
|
||||
|
||||
We identify three main challenges: (1) PHI node placement in complex control flow with mixed handle/pointer types, (2) maintaining correct insertion points across nested scopes and loops, and (3) ensuring dominance properties while supporting dynamic Box operations.
|
||||
|
||||
Our contributions include:
|
||||
- **BuilderCursor**: A structured approach to LLVM builder position management that prevents post-terminator insertions
|
||||
- **Sealed SSA with snapshots**: Block-end value snapshots for reliable PHI wiring in gradually-constructed CFGs
|
||||
- **Type normalization strategy**: Unified i64 handle representation with on-demand pointer conversions
|
||||
|
||||
We evaluate our approach on real-world Nyash programs, including a self-hosting compiler component (`dep_tree_min_string.nyash`). While achieving functional SSA construction, we identify remaining challenges and propose LoopForm IR as a future direction for simplifying control flow representation.
|
||||
|
||||
This work provides practical insights for language implementers facing similar challenges when bridging high-level Box abstractions to low-level SSA form.
|
||||
|
||||
---
|
||||
|
||||
## 和文要旨
|
||||
|
||||
動的型付け言語、特に「Everything is Box」を哲学とする言語において、静的単一代入(SSA)形式を構築することは独特の課題を提示する。本稿では、LLVM IRを対象とするBox指向言語Nyashにおける、SSA形式構築の実践的経験を報告する。
|
||||
|
||||
主要な課題として、(1)ハンドル/ポインタ混在型での複雑な制御フローにおけるPHIノード配置、(2)ネストしたスコープとループにまたがる挿入位置の維持、(3)動的Box操作をサポートしながらのdominance特性の保証、の3点を特定した。
|
||||
|
||||
本稿の貢献は以下の通りである:
|
||||
- **BuilderCursor**:終端後挿入を防ぐ構造化されたLLVMビルダー位置管理
|
||||
- **スナップショット付きSealed SSA**:段階的に構築されるCFGでの確実なPHI配線のためのブロック終端値スナップショット
|
||||
- **型正規化戦略**:オンデマンドポインタ変換を伴う統一的i64ハンドル表現
|
||||
|
||||
実世界のNyashプログラム(セルフホスティングコンパイラコンポーネント`dep_tree_min_string.nyash`を含む)で評価を行った。機能的なSSA構築を達成する一方で、残存する課題を特定し、制御フロー表現を簡素化する将来の方向性としてLoopForm IRを提案する。
|
||||
|
||||
本研究は、高レベルBox抽象を低レベルSSA形式に橋渡しする際に同様の課題に直面する言語実装者に、実践的な洞察を提供する。
|
||||
@ -1,177 +0,0 @@
|
||||
# A Radical Simplification of SSA Construction via Box Theory

*2025-09-13: from a 650-line struggle to a 100-line solution*

## 🎯 What Is Box Theory?

### Core concepts
```
basic block        = a box
value of variable  = the contents of a box
PHI                = simply choosing which box to take the value from
```

### Why this is revolutionary
- **SSA's complexity disappears**: dominance, forward references, type conversions... none of them are needed
- **Debugging is easy**: `print(boxes)` shows the entire state
- **The implementation is short**: 650 lines → 100 lines (85% reduction)
|
||||
|
||||
## 💡 Comparing the Implementations

### Before: conventional SSA/PHI implementation (650 lines)
```python
# A complicated Resolver
class Resolver:
    def __init__(self):
        self.i64_cache = {}
        self.ptr_cache = {}
        self.f64_cache = {}
        self._end_i64_cache = {}
        # ... 300 lines of caching and conversion logic

    # PHI wiring hell
    def lower_phi(self, inst):
        # account for dominance
        # handle forward references
        # convert types
        # ... 150 lines of complex logic
        ...
```
|
||||
|
||||
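### After: box-theory implementation (100 lines)
```python
class BoxBasedSSA:
    def __init__(self):
        self.boxes = {}           # block_id -> {var: value}
        self.parents = {}         # block_id -> parent block_id (assumed bookkeeping)
        self.current_id = None
        self.current_box = {}
        self.came_from_id = None  # block we arrived from (assumed bookkeeping)

    def enter_block(self, block_id, parent_id=None):
        self.came_from_id = self.current_id
        self.current_id = block_id
        self.parents[block_id] = parent_id
        # Register the box so later blocks can look it up
        self.current_box = self.boxes.setdefault(block_id, {})

    def set_value(self, var, value):
        self.current_box[var] = value

    def get_value(self, var):
        # Look in the current box first, then walk up the parent boxes
        bid = self.current_id
        while bid is not None:
            box = self.boxes.get(bid, {})
            if var in box:
                return box[var]
            bid = self.parents.get(bid)
        return None

    def phi(self, var, predecessors):
        # Just pick the value from the box we arrived from
        for pred_id, pred_box in predecessors:
            if pred_id == self.came_from_id:
                return pred_box.get(var, 0)
        return 0
```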
|
||||
|
||||
## 📊 Concrete Example: Applying It to dep_tree_min_string.nyash

### The problematic loop structure
```nyash
loop(i < n) {
    out = out + "x"
    i = i + 1
}
```

### Conventional PHI wiring
```llvm
; Complex PHI wiring, with a risk of dominance violations
bb1:
  %i_phi = phi i64 [%i_init, %entry], [%i_next, %bb2]
  %out_phi = phi i64 [%out_init, %entry], [%out_next, %bb2]
  ; Error: PHINode should have one entry for each predecessor!
```

### Box-theory implementation
```python
boxes = {}

# Entry box: values before the loop is entered
boxes[0] = {"i": 0, "out": "", "n": 10}

# Box at the loop header (first visit: everything comes from the entry box)
boxes[1] = dict(boxes[0])

# Box for the loop body
boxes[2] = {
    "i": boxes[1]["i"] + 1,
    "out": boxes[1]["out"] + "x",
    "n": boxes[1]["n"],
}

# A PHI is just a selection
from_entry = False  # True on the first visit to the header
if from_entry:
    i = boxes[0]["i"]  # initial value
else:
    i = boxes[2]["i"]  # value coming back from the loop body
```
|
||||
|
||||
## 🚀 Why Box Theory Works

### 1. It matches the mental model
- How programmers think: "put a value into a variable"
- Box theory: "put a value into a box"
- → intuitive and easy to understand

### 2. Simplicity of implementation
- No caches needed (the boxes hold the state)
- No type conversions needed (a box can contain anything)
- No dominance analysis needed (the box hierarchy resolves it naturally)

### 3. Ease of debugging
```python
# Inspect the state at any point
print(f"Block {bid}: {boxes[bid]}")
# Output: Block 2: {'i': 5, 'out': 'xxxxx', 'n': 10}
```

## 📈 Impact on Performance

### Compile time
- **Before**: 50 minutes spent agonizing over PHI wiring
- **After**: done in 5 minutes (90% less time)

### Run time
- The approach is alloca-based, so there is some overhead
- But it is "100 times better than not working at all"
- Optimization can wait until it works
|
||||
|
||||
## 🔄 Integration with LoopForm

### Taking advantage of LoopForm
```python
# Structure normalized by LoopForm:
# a simple dispatch → body → continue/break pattern

def handle_loopform(self, dispatch_box, body_box, first_iteration):
    # Value selection at dispatch becomes trivial
    if first_iteration:
        values = dispatch_box["init_values"]
    else:
        values = body_box["loop_values"]
    return values
```

### How it fits with box theory
- LoopForm: boxes for control flow
- Box theory: boxes for data flow
- The two complement each other

## 🎓 Academic Significance

### 1. Quantifying implementation complexity
- Lines of code: 650 → 100 (85% reduction)
- Debugging time: 50 minutes → 5 minutes (90% reduction)
- Error rate: frequent → nearly zero

### 2. A new design paradigm
- "SSA that works" over "perfect SSA"
- Concise implementation over theoretical elegance
- The importance of incremental optimization

### 3. Educational value
- SSA form can be taught in 100 lines
- Students can implement it in a day
- The debugging method is clear

## 💭 Conclusion

Box theory is not a mere simplification. It is **a fundamental shift in perspective on a complex problem**.

- Do not get pushed around by LLVM's requirements
- Focus only on the functionality that is essentially needed
- Achieve a dramatic simplification as a result

It is a concrete case in which the "Everything is Box" philosophy elegantly solved SSA construction, one of the most complex problems.
|
||||
@ -1,278 +0,0 @@
|
||||
# Technical Details of SSA Construction

## 1. SSA Challenges Specific to Nyash

### 1.1 The Box type system and SSA
```nyash
// Nyash code
local str = "hello"
local num = 42
local result = str + num  // dynamically typed
```

```llvm
; The challenge in LLVM IR (pseudo-IR)
%str = call i64 @nyash_string_new(i8* @.str.hello)  ; handle
%num = i64 42
%result = ?  ; concat_si? concat_ii? unknown until run time
```

### 1.2 Deciding the PHI type
```llvm
; Type inference at a complex merge
bb1:
  %val1 = i64 123  ; integer handle
  br label %merge
bb2:
  %val2 = i8* @string  ; string pointer
  %handle = ptrtoint i8* %val2 to i64
  br label %merge
merge:
  %phi = phi ??? [ %val1, %bb1 ], [ %handle, %bb2 ]
  ; i64? i8*? must be decided from context
```
|
||||
|
||||
## 2. BuilderCursor Design in Detail

### 2.1 Problem: the complexity of position management
```rust
// Bad example: global builder state
builder.position_at_end(bb1);
emit_instructions();
builder.position_at_end(bb2);  // the position changes!
// We want to continue writing bb1, but...?
```

### 2.2 Solution: BuilderCursor
```rust
pub struct BuilderCursor<'ctx, 'b> {
    builder: &'b Builder<'ctx>,
    closed_by_bid: HashMap<BasicBlockId, bool>,
    cur_bid: Option<BasicBlockId>,
    cur_llbb: Option<BasicBlock<'ctx>>,
}

impl<'ctx, 'b> BuilderCursor<'ctx, 'b> {
    pub fn with_block<R>(
        &mut self,
        bid: BasicBlockId,
        bb: BasicBlock<'ctx>,
        f: impl FnOnce(&mut Self) -> R,
    ) -> R {
        // Save the state
        let prev = (self.cur_bid, self.cur_llbb);
        self.at_end(bid, bb);
        let result = f(self);
        // Restore the state
        (self.cur_bid, self.cur_llbb) = prev;
        result
    }
}
```

### 2.3 Terminator management
```rust
pub fn emit_term(&mut self, bid: BasicBlockId, f: impl FnOnce(&Builder)) {
    self.assert_open(bid);  // prevent insertion into a closed block
    f(self.builder);
    self.closed_by_bid.insert(bid, true);  // close the block explicitly
}
```
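
The same guard can be modeled without LLVM. The sketch below is a simplified Python analogue, not the Rust implementation: `Cursor`, `emit`, and `emit_term` are illustrative names, and instructions are plain strings.

```python
class Cursor:
    """Tracks which blocks are closed so nothing is emitted after a terminator."""

    def __init__(self):
        self.blocks = {}     # block id -> list of emitted instructions
        self.closed = set()  # block ids that already have a terminator

    def _assert_open(self, bid):
        if bid in self.closed:
            raise RuntimeError(f"block {bid} already has a terminator")

    def emit(self, bid, inst):
        self._assert_open(bid)
        self.blocks.setdefault(bid, []).append(inst)

    def emit_term(self, bid, inst):
        self.emit(bid, inst)  # checks that the block is still open
        self.closed.add(bid)  # then closes it explicitly


cursor = Cursor()
cursor.emit(1, "%x = add i64 1, 2")
cursor.emit_term(1, "br label %bb2")
try:
    cursor.emit(1, "%y = add i64 %x, 1")  # post-terminator insertion
    rejected = False
except RuntimeError:
    rejected = True
print(cursor.blocks[1], rejected)
```

Failing loudly at the insertion site is the design point: the error surfaces where the stray instruction is emitted rather than later, when the backend verifier rejects the block.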
|
||||
|
||||
## 3. Implementing Sealed SSA

### 3.1 The conventional approach (problematic)
```rust
// Wire PHIs immediately in emit_jump/branch
if let Some(phis) = phis_by_block.get(target) {
    for (dst, phi, inputs) in phis {
        // Wire the value from pred on the spot
        let val = vmap.get(vid)?;  // but sometimes the value does not exist yet...
        phi.add_incoming(&[(val, pred_bb)]);
    }
}
```

### 3.2 The Sealed SSA approach
```rust
// Take a snapshot at the end of each block
let mut block_end_values: HashMap<BlockId, HashMap<ValueId, Value>> = HashMap::new();

// After lowering each block
let snapshot = vmap.iter()
    .filter(|(vid, _)| defined_in_block.contains(vid))
    .map(|(k, v)| (*k, *v))
    .collect();
block_end_values.insert(bid, snapshot);

// At seal time, wire from the snapshots
fn seal_block(...) {
    let val = block_end_values[&pred_bid].get(&vid)
        .or_else(|| /* fallback */);
}
```
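
To make the snapshot idea concrete, here is a small model of seal-time wiring in Python. It is a sketch under assumptions: `seal_phis`, the dictionary shapes, and the zero default for a missing value are illustrative (the zero fallback mirrors the `(synth) ... zero-ty` line in the diagnostic output in section 5.2), not the actual Rust code.

```python
def seal_phis(phis, block_end_values, default=0):
    """Wire each PHI from per-block end-of-block snapshots.

    phis: list of (dst, [(pred_block, value_id), ...])
    block_end_values: pred_block -> {value_id: value}, captured when that
    block finished lowering.
    """
    wired = {}
    for dst, inputs in phis:
        incoming = []
        for pred, vid in inputs:
            snapshot = block_end_values.get(pred, {})
            # Fall back to a synthesized default when the value is missing
            incoming.append((snapshot.get(vid, default), pred))
        wired[dst] = incoming
    return wired


# Snapshots recorded as each block finishes lowering
block_end_values = {
    "entry": {"i_init": 0},
    "body": {"i_next": "%i.next"},
}
phis = [("i_phi", [("entry", "i_init"), ("body", "i_next"), ("other", "i_x")])]
wired = seal_phis(phis, block_end_values)
print(wired)
```

Because wiring is deferred until every predecessor has a snapshot, a back edge from `body` can be resolved even though `body` is lowered after the header that contains the PHI.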
|
||||
|
||||
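### 3.3 The challenge of PHI normalization
```rust
// Ideal: number of preds == number of incoming edges
// (`get` returns an Option, so count zero preds when the block is absent)
assert_eq!(
    phi.count_incoming() as usize,
    preds.get(&bb).map_or(0, |p| p.len())
);

// Reality: MIR PHIs and CFG preds disagree
// - MIR is determined statically
// - The CFG changes dynamically (optimization, added terminators, etc.)
```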
|
||||
|
||||
## 4. A Unified Strategy for Type Conversion

### 4.1 Basic policy
```llvm
; All Box values are unified as i64 handles
; Convert to ptr only where needed

; Principle
%handle = i64 ...
%ptr = inttoptr i64 %handle to i8*  ; only when needed

; PHIs are also i64 in principle
%phi = phi i64 [...], [...]
```

### 4.2 Special handling of strings
```llvm
; String literal
%str_ptr = getelementptr [6 x i8], [6 x i8]* @.str.hello, i32 0, i32 0
%handle = call i64 @nyash_string_new(i8* %str_ptr)

; String operations (handle-based)
%len = call i64 @nyash.string.len_h(i64 %handle)
%sub = call i64 @nyash.string.substring_hii(i64 %handle, i64 %start, i64 %end)
```
|
||||
|
||||
## 5. Debugging and Tracing

### 5.1 Control via environment variables
```bash
NYASH_CLI_VERBOSE=1        # basic logging
NYASH_LLVM_TRACE_PHI=1     # details of PHI wiring
NYASH_LLVM_PHI_SEALED=1    # Sealed SSA mode
NYASH_ENABLE_LOOPFORM=1    # LoopForm experiment
```

### 5.2 Example diagnostic output
```
[PHI:new] fn=Main_esc_json_1 bb=30 dst=30 ty=i64 inputs=(23->7),(27->30)
[PHI] sealed add pred_bb=27 val=30 ty=i64 (snapshot)
[PHI] sealed add (synth) pred_bb=23 zero-ty=i64
[LLVM] terminator present for bb=27
[LoopForm] detect while-pattern: header=15 body=16 other=17
```
|
||||
|
||||
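## 6. Open Technical Challenges

### 6.1 Full dominance guarantees
- Current: partially handled through hoisting and placement in the entry block
- Challenge: circular references inside loops
- Future: structured resolution via LoopForm

### 6.2 Optimal PHI placement
- Current: PHIs are placed exactly where the MIR specifies
- Challenge: eliminating redundant PHIs
- Future: a PHI minimization algorithm

### 6.3 Exception safety
- Current: crashes are avoided by synthesizing zero values
- Challenge: guaranteeing semantic correctness
- Future: null safety in the Box type system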
|
||||
|
||||
## 7. Radical Simplification via Box Theory

### 7.1 Implementation architecture
```python
class BoxBasedSSA:
    def __init__(self):
        self.boxes = {}  # block_id -> {var: value}
        self.current_box = {}
        self.deferred_phis = []  # for post-processing
```

### 7.2 Simplifying PHI handling
```python
# Conventional: complex dominance computation and caching
def resolve_phi_complex(self, phi_info):
    # 300 lines of Resolver processing...
    # dominance checks, type conversion, cache management
    ...

# Box theory: simple value selection
def resolve_phi_simple(self, var, predecessors):
    for pred_id, _ in predecessors:
        if self.came_from(pred_id):
            return self.boxes[pred_id].get(var, 0)
    return 0
```
|
||||
|
||||
### 7.3 Switching to alloca/load/store
```python
# Give up on SSA form and use a memory-based implementation
def emit_variable_access(self, var):
    if var not in self.allocas:
        # Allocate memory for the variable
        self.allocas[var] = self.builder.alloca(self.i64, name=var)

    # Load
    def load_var():
        return self.builder.load(self.allocas[var])

    # Store
    def store_var(value):
        self.builder.store(value, self.allocas[var])

    # Hand both accessors back to the caller
    return load_var, store_var
```
|
||||
|
||||
### 7.4 Simplifying the type system
```python
# Treat everything as i64
def to_i64(self, value):
    if is_pointer(value):
        # pointer → handle conversion
        return self.call_from_i8_string(value)
    elif is_integer(value):
        return value
    else:
        return 0  # default

# Convert to a pointer only when needed
def to_ptr_if_needed(self, value, context):
    if context == "console_log":
        return self.call_to_i8p_h(value)
    return value
```
|
||||
|
||||
### 7.5 Performance characteristics
```
Conventional SSA implementation:
- Compile time: slow (50 minutes on PHI wiring)
- Runtime performance: optimal
- Memory usage: low

Box-theory implementation:
- Compile time: fast (within 5 minutes)
- Runtime performance: somewhat slower (alloca/load/store overhead)
- Memory usage: somewhat higher (one alloca per variable)

Trade-off: "100 times better than not working at all"
```

### 7.6 Phased migration of the implementation
```
# Phase 1: minimal working version (current)
- manage all variables with allocas
- skip PHIs entirely
- prioritize getting it to work

# Phase 2: partial optimization (future)
- read-only variables use SSA
- only loop variables use allocas
- incremental performance improvements

# Phase 3: full optimization (long term)
- reimplement SSA using the insights from box theory
- keep the 100-line simplicity
```

---

*These technical details form the basis of the paper's Technical Section. With box theory, they present a new approach that prioritizes practical implementation over theoretical elegance.*
|
||||
@ -1,323 +0,0 @@
|
||||
# LoopSignal IR: Unifying Control via Box × Loop and a Minimal MIR Design

## 1. Background and Problems

Control-flow instructions in Nyash's MIR (Middle Intermediate Representation) are currently scattered:
- `if`/`while`/`for` are each independent instructions
- `return`/`break`/`continue` are separate special forms
- There is no unified representation for generators/async/await
- The Box philosophy (Everything is Box) and the Loop philosophy (Everything is Loop) are not yet integrated

**Problems**:
1. Optimization passes must handle each control structure individually
2. Adding new control structures (generator/async) is difficult
3. Conversion to LLVM IR is complex (special cases for every construct)
4. Consistency of debug information (DWARF) is hard to maintain

## 2. Goals

A minimal MIR design that handles the following **use cases** uniformly:
- `scope` - RAII / destructor calls
- `if`/`while`/`for` - conditional branches and loops
- `function`/`return` - function calls and return values
- `generator`/`yield` - suspendable computation
- `async`/`await` - asynchronous computation (future)

**Design principles**:
- Fuse Everything is Box × Everything is Loop
- Express all control with a minimal instruction set (4 instructions)
- Map directly to LLVM IR
- Allow incremental adoption (coexistence with the existing MIR)
|
||||
|
||||
## 3. Design

### 3.1 Type: LoopSignal<T>

```rust
// LoopSignal = unified representation of control flow + value
enum LoopSignal<T> {
    Next(T),    // continue (next iteration)
    Break(T),   // exit (end of loop)
    Yield(T),   // suspend (generator)
    Return(T),  // return (end of function) - optional
}

// LLVM representation: { i8 tag, iN value }
// tag: 0=Next, 1=Break, 2=Yield, 3=Return
```

### 3.2 MIR instructions

```
loop.begin <label> <init_block>
  ; Precondition: a LoopBox on top of the stack
  ; Postcondition: loop context established, init executed

loop.iter <label> <step_block>
  ; Precondition: a loop context exists
  ; Postcondition: a Signal is produced and pushed onto the stack

loop.branch <label> <next_block> <break_block> [<yield_block>]
  ; Precondition: a Signal on top of the stack
  ; Postcondition: branches according to the Signal
  ; Undefined behavior: an unexpected Signal tag

loop.end <label> <fini_block>
  ; Precondition: a loop context exists
  ; Postcondition: fini executed, context discarded
```

### 3.3 Box = Loop1 (init/step/fini)

```rust
// Every Box can be expressed as a single-iteration loop (Loop1)
trait LoopBox {
    fn init(&mut self);            // initialization
    fn step(&mut self) -> Signal;  // execution (Breaks after one pass)
    fn fini(&mut self);            // finalization
}

// RAII support: fini calls the destructor
```
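
The init/step/fini protocol can be exercised with a small driver. The following Python sketch is only a model of the idea: `Tag`, `run_loop`, and `CountTo` are hypothetical names, the tag numbering follows section 3.1, and `Yield` resumption is not modeled (any non-`Next` signal simply ends the loop).

```python
from enum import Enum


class Tag(Enum):
    NEXT = 0
    BREAK = 1
    YIELD = 2
    RETURN = 3


def run_loop(box):
    """Drive a LoopBox: init once, step until a non-Next signal, always fini."""
    box.init()
    try:
        while True:
            tag, value = box.step()
            if tag is not Tag.NEXT:
                return tag, value
    finally:
        box.fini()  # RAII-style cleanup runs however the loop exits


class CountTo:
    """A LoopBox that signals Next until it reaches `limit`, then Break."""

    def __init__(self, limit):
        self.limit = limit
        self.log = []

    def init(self):
        self.i = 0
        self.log.append("init")

    def step(self):
        if self.i < self.limit:
            self.i += 1
            return Tag.NEXT, self.i
        return Tag.BREAK, self.i

    def fini(self):
        self.log.append("fini")


box = CountTo(3)
result = run_loop(box)
print(result, box.log)
```

A plain scope corresponds to a box whose first `step` already returns `Break`, which is why the text calls it Loop1.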
|
||||
|
||||
### 3.4 Lowering rules

**scope (RAII)**:
```
scope { body } →
  loop.begin  L init=nop
  loop.iter   L step=body;Break
  loop.branch L next=unreachable break=done
  loop.end    L fini=cleanup
```

**while**:
```
while(cond) { body } →
  loop.begin  L init=nop
  loop.iter   L step=if(cond,Next,Break)
  loop.branch L next=body_then_loop break=done
  loop.end    L fini=nop
```

**for-in**:
```
for x in iter { body } →
  loop.begin  L init=iter.init
  loop.iter   L step=iter.next
  loop.branch L next=bind(x);body break=done
  loop.end    L fini=iter.fini
```

**return**:
```
return value →
  Signal::Return(value)
  ; the whole function is a LoopBox, so Return exits it
```

**yield**:
```
yield value →
  Signal::Yield(value)
  ; the generator's loop.branch has a yield_block
```
|
||||
|
||||
## 4. Examples

### Minimal example: while(true){break}

**Before (current MIR)**:
```
while_begin:
  push true
  branch_if_false while_end
  jump while_break     ; break
while_end:
```

**After (LoopSignal IR)**:
```
loop.begin  L1 init=nop
loop.iter   L1 step=push(Break(unit))
loop.branch L1 next=unreachable break=done
loop.end    L1 fini=nop
done:
```

### Turning for-in into a LoopBox

```
for x in [1,2,3] { print(x) } →

loop.begin L1 init={
  iter = ArrayIterBox([1,2,3])
  iter.init()
}
loop.iter L1 step={
  signal = iter.step()  ; Next(1), Next(2), Next(3), Break
}
loop.branch L1
  next={ x = signal.value; print(x) }
  break=done
loop.end L1 fini={
  iter.fini()  ; iterator cleanup
}
```

### Folding scope = Loop1

```
{ let x = File("data"); x.read() } →

loop.begin  L1 init={ x = File("data") }
loop.iter   L1 step={ x.read(); Break }
loop.branch L1 next=unreachable break=next
loop.end    L1 fini={ x.close() }  ; RAII
```
|
||||
|
||||
## 5. Figures (Mermaid)

### Dispatch CFG with the merge point consolidated in one place

```mermaid
graph TD
    Start[loop.begin] --> Init[init block]
    Init --> Iter[loop.iter]
    Iter --> Step[step block]
    Step --> Signal{loop.branch}

    Signal -->|Next| Body[next block]
    Body --> Iter

    Signal -->|Break| End[loop.end]
    Signal -->|Yield| Yield[yield block]
    Yield -.->|resume| Iter

    End --> Fini[fini block]
    Fini --> Done[done]

    style Signal fill:#f9f,stroke:#333,stroke-width:4px
```

### Structure of the switch branch based on the Signal tag

```mermaid
graph LR
    Stack[Signal on Stack] --> Extract[Extract tag]
    Extract --> Switch{switch tag}

    Switch -->|0: Next| NextBB[next_block]
    Switch -->|1: Break| BreakBB[break_block]
    Switch -->|2: Yield| YieldBB[yield_block]
    Switch -->|3: Return| ReturnBB[return_block]
    Switch -->|default| Trap[unreachable/trap]

    style Switch fill:#ff9,stroke:#333,stroke-width:2px
```
|
||||
|
||||
## 6. Optimization and Safety

### Loop1 inlining
- A loop that runs exactly once can be inlined completely
- `loop.begin → step → loop.end` is converted to straight-line execution

### State elision
- Stateless LoopBoxes omit init/fini
- LLVM's mem2reg optimizes this automatically

### Conditions for applying DCE/LICM/Inline
- **DCE (Dead Code Elimination)**: remove unreachable next blocks
- **LICM (Loop Invariant Code Motion)**: hoist invariant expressions out of the LoopBox
- **Inline**: small step functions are inlined automatically

### DWARF considerations
- Each loop instruction keeps its original source location
- Debuggers display it as the logical control structure

## 7. Phased Adoption Plan

### Phase 1 (P1): Basic implementation
- Define the LoopSignal type and the MIR instructions
- Implement lowering for while/for
- Provide a coexistence layer with the existing MIR

### Phase 2 (P2): Optimization and extension
- Loop1 inlining
- generator/yield support
- LLVM IR code generation

### Phase 3 (P3): Full migration
- Move all control structures to LoopSignal IR
- Retire the old MIR
- Integrate async/await

### Fallback (reverse lowering to the old MIR)
```
loop.begin/end → nop
loop.iter      → direct execution
loop.branch    → a combination of if/jump
```
|
||||
|
||||
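## 8. Related Work and Differences

| Approach | Nyash LoopSignal IR | Difference |
|-----------|-------------------|------|
| CPS conversion | Signal = limited continuation | Control via an explicit Signal type |
| Algebraic effects | Loop = effect handler | Box-based reification |
| Generators | Yield = suspendable Loop | Uniform Signal handling |
| SSA-based IRs | Signal instead of phi functions | Integration of control and value |

## 9. Deliverables and KPIs

### Metrics
- MIR instruction count: 30+ → 4 instructions
- Control-structure normalization rate: 100%
- Generated LLVM IR code: 50% reduction
- Optimization pass implementation: 80% shared

### Test aspects
1. Semantic preservation: the same behavior before and after each lowering
2. Performance: Loop1 has zero overhead after optimization
3. Debugging: source-level debugging is possible
4. Compatibility: operation is guaranteed during the phased migration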
|
||||
|
||||
## Appendix

### Terminology
- **Box** = spatial abstraction (data structure)
- **Loop** = temporal abstraction (control flow)
- **LoopBox** = unified spatio-temporal object

### Instruction list
```
loop.begin  <label> <init>    ; start of loop
loop.iter   <label> <step>    ; iteration
loop.branch <label> <blocks>  ; branch
loop.end    <label> <fini>    ; end of loop
```

### Tag assignment
```
0: Next          ; continue
1: Break         ; exit
2: Yield         ; suspend
3: Return        ; return
4-255: reserved  ; future extension
```

### Pseudocode collection
```llvm
; LoopSignal in LLVM
%signal = { i8, i64 }  ; tag + value
%tag = extractvalue %signal, 0
switch i8 %tag, label %trap [
  i8 0, label %next
  i8 1, label %break
  i8 2, label %yield
]
```

---

This RFC proposes a design that integrates Nyash's Box and Loop philosophies and expresses every control structure with a minimal MIR instruction set. Phased adoption enables migration to a simpler and more powerful IR without breaking the existing system.
|
||||
@ -1,36 +0,0 @@
|
||||
# LoopSignal IR: Unifying Control via Box × Loop and a Minimal MIR Design (short RFC)

Goal: Please write up LoopSignal IR (the fusion of Everything is Box × Everything is Loop) as a single short RFC. The design keeps the surface syntax as it is and normalizes to LoopBox + Signal in the internal MIR. Please write in Japanese.

Files to read (repository-relative paths)
- docs/private/papers/paper-e-loop-signal-ir/main-paper-jp.md
- docs/private/papers/paper-b-nyash-execution-model/main-paper-jp.md
- src/mir/

Output format (Markdown)
- Title: “LoopSignal IR: Unifying Control via Box × Loop and a Minimal MIR Design”
- 1. Background and problems
- 2. Goals (use cases: scope/if/while/for/function/return/generator/async)
- 3. Design
  - 3.1 Type: LoopSignal<T> (Next/Break/Yield[/Return]) and its LLVM representation (tag+iN)
  - 3.2 MIR instructions: specification of loop.begin/iter/branch/end (preconditions/postconditions/undefined behavior)
  - 3.3 Box = Loop1 (init/step/fini) and its correspondence to RAII
  - 3.4 Lowering rules (pseudo-MIR for each construct; including for-in/return/yield)
- 4. Examples
  - Minimal example of while(true){break} (Before/After)
  - Turning for-in into a LoopBox (dispatch diagram)
  - Example of folding scope = Loop1
- 5. Figures (Mermaid)
  - Dispatch CFG with the merge point consolidated in one place
  - Structure of the switch branch based on the Signal tag
- 6. Optimization and safety
  - Loop1 inlining, state elision, conditions for applying DCE/LICM/Inline, DWARF considerations
- 7. Phased adoption plan (P1/P2/P3) and fallback (reverse lowering to the old MIR)
- 8. Related work and differences (comparison with CPS / algebraic effects / generators / SSA-based IRs)
- 9. Deliverables and KPIs (metrics, test aspects)
- Appendix: terminology (Box = space / Loop = time), instruction list, tag assignment, pseudocode collection

Notation / constraints
- Code fragments may be pseudo-MIR / pseudo-LLVM (do not modify real code).
- Write figures in Mermaid (```mermaid ...```).
- Aim for roughly 2 to 4 pages of density, and output it as a single portable Markdown file.
|
||||
@ -1,276 +0,0 @@
|
||||
|
||||
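Understood. Based on the specified files, I will analyze the technical review and implementation plan for LoopSignal IR and summarize my proposal.

---

### LoopSignal IR: Technical Review and Implementation Plan v1

#### 1. Proposed Final Specification

##### 1.1. Strict specification of the `LoopSignal` type

`LoopSignal` is a type for uniformly representing control flow inside a loop; it consists of a tag and an optional payload.

* **Type definition (Rust-like):**

```rust
enum LoopSignal<T> {
    Continue(T), // continue the loop (carries the updated loop variables)
    Break(T),    // exit the loop (carries the loop's result value)
    Return(T),   // return from the function (carries the function's return value)
    Yield(T),    // generator yield (carries the yielded value)
    // Await(Future<T>), // future extension: suspension for async processing
}
```

* **Tag values:**
    * `0`: Continue
    * `1`: Break
    * `2`: Return
    * `3`: Yield

* **Example ABI/LLVM representation:**
  Aiming for a zero-cost abstraction, optimize on the assumption that `Continue` occurs most often. In LLVM IR it is common to represent the return value as a `struct`.

```llvm
; Example representation of LoopSignal<Value>
; %Value is the payload type (e.g., i64, %MyObject*)
%LoopSignal = type { i8, %Value } ; { tag, payload }

; Considering a zero-cost representation:
; instead of returning LoopSignal directly as a function's return value,
; the loop-body function returns only the payload value, and the signal
; itself is conveyed another way (e.g., writing to a status argument, multiple return blocks).
; However, trusting LLVM's optimizer, it is realistic to implement the simple struct
; representation above first and specialize where performance becomes a problem.
; Most `Continue` signals are handled inside the loop and never cross a function boundary,
; so register allocation is expected to reduce the overhead to nearly zero.
```

**Feasibility of a zero-cost representation:** It is **feasible**. As long as the loop does not cross a function boundary, `LoopSignal` involves no physical memory allocation and compiles to virtual registers and conditional branches. `loop.branch` is lowered to a `switch` instruction, and the `Continue` case is laid out ahead of the other cases (for example, as the `default` branch), which makes it the fastest path.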
|
||||
|
||||
##### 1.2. Precise semantics of the MIR instructions

* `loop.begin <label>`:
    * Starts a new loop scope. `<label>` uniquely identifies this loop.
    * Serves as the placement point for SSA PHI nodes. Every value that enters the loop more than once (loop variables and the like) is merged by a PHI node here.
    * **Undefined behavior:** when there is no matching `loop.end`.

* `loop.iter <label>, (%vars...) -> (%next_vars...)`:
    * Marks the start of one loop iteration. Code that evaluates the loop condition and updates loop variables is placed here.
    * Takes `%vars` and produces the next state, `%next_vars`. When a generator resumes, values passed in from outside become part of `%vars`.
    * **Semantics:** the instruction itself has no side effects; the code that follows decides whether to continue or exit the loop.

* `loop.branch <label>, %signal`:
    * Branches control flow based on the value of `%signal` (of type `LoopSignal`).
    * `Continue(val)`: jumps along the back edge to the `loop.iter` for `<label>`. `val` becomes an input to the PHI nodes of the next iteration.
    * `Break(val)` / `Return(val)` / `Yield(val)`: jumps to the `loop.end` for `<label>`. `val` becomes the result of `loop.end`.
    * **Undefined behavior:** when `%signal` is uninitialized, or when the corresponding `<label>` is not in scope.

* `loop.end <label> -> (%signal, %value)`:
    * Ends the loop scope. Receives `Break`, `Return`, and `Yield` signals from `loop.branch`.
    * The result of this instruction is the kind of signal that ended the loop together with its payload value.
    * If a parent loop exists, this result becomes the parent loop's `LoopSignal` and is passed to its `loop.branch`.
    * **Exception handling:** exceptions raised inside the loop should be handled on a path separate from `LoopSignal` (e.g., `invoke` and `landingpad`). `loop.branch` does not propagate exceptions.

##### 1.3. Rules for SSA/PHI placement

* **Canonical merge points:**
    * `loop.begin`: the loop entry (merge of the first entry and the back edges). Loop variables have PHI nodes here.
    * After `loop.end`: the execution path after the loop has terminated normally (via `Break`). The loop's result value has a PHI node here (merging with the case where the loop was never entered).
* **Invariants:**
    * Every loop variable is defined by a PHI node at `loop.begin`.
    * The inputs to the PHI nodes at `loop.begin` come from the value at first entry into the loop and from the `Continue` payload of each `loop.branch`.
    * `loop.end` receives `Break` values from multiple `loop.branch` instructions, so it serves the same function as a PHI node.
|
||||
|
||||
#### 2. Lowering Rules (Syntax → MIR)

##### 2.1. Mapping Each Construct to Pseudo-MIR

* **`if cond { true_branch } else { false_branch }`**
    ```mir
    %cond_val = ...
    br.cond %cond_val, then: <bb_true>, else: <bb_false>

    <bb_true>:
      ... // code for true_branch
      %signal_true = ...
      br <bb_merge>

    <bb_false>:
      ... // code for false_branch
      %signal_false = ...
      br <bb_merge>

    <bb_merge>:
      %signal = phi [%signal_true, <bb_true>], [%signal_false, <bb_false>]
      // this %signal is passed to the loop.branch that follows
    ```
|
||||
|
||||
* **`while cond { body }`**
    ```mir
    loop.begin <L0>
    br <L0_iter>

    <L0_iter>:
      %loop_var_next = phi [%loop_var_init, <entry>], [%loop_var_updated, <L0_body>]
      %cond_val = ... // evaluate cond
      br.cond %cond_val, then: <L0_body>, else: <L0_break>

    <L0_body>:
      ... // code for body
      %loop_var_updated = ...
      %signal_continue = (Signal::Continue, %loop_var_updated)
      loop.branch <L0>, %signal_continue // Continue

    <L0_break>:
      %break_val = ... // result of the loop (e.g., unit)
      %signal_break = (Signal::Break, %break_val)
      loop.branch <L0>, %signal_break // Break

    (%final_signal, %final_value) = loop.end <L0>
    // subsequent code dispatches on final_signal
    ```
|
||||
|
||||
* **`function` and `return`**
    ```mir
    fn my_func(%arg1, ...) -> %ret_val {
      loop.begin <F_main> // treat the whole function as an implicit loop

      ... // function body
      // `return expr;` is lowered to the following
      %return_val = ... // evaluate expr
      %signal = (Signal::Return, %return_val)
      loop.branch <F_main>, %signal

      ...

      // end of the function (implicit return)
      %implicit_ret_val = ...
      %signal_implicit = (Signal::Return, %implicit_ret_val)
      loop.branch <F_main>, %signal_implicit

      (%final_signal, %final_value) = loop.end <F_main>
      // final_signal is always expected to be Return
      ret %final_value
    }
    ```
|
||||
|
||||
* **`generator` and `yield`**
    `yield` is lowered to `loop.branch` in the same way as `return`, but uses `Signal::Yield`. Resuming the function is done by passing a value (`resume_arg`) to `loop.iter`.
|
||||
|
||||
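##### 2.2. Design Comparison: `Return` Inside `Signal` vs. Separate

* **Unified model (Return is part of Signal)**
    * **Pros:**
        * **Orthogonal IR:** every change of control flow goes through `LoopSignal` and `loop.branch`, which keeps the IR small and regular.
        * **Easier optimization:** `return`, `break`, and `yield` are handled in one framework, which simplifies the design of optimization passes, inlining in particular. Complex control flow such as `if cond { return } else { break }` is expressed naturally.
        * **Refactoring robustness:** extracting loop code into a separate function, or inlining it back, becomes straightforward.
    * **Cons:**
        * **Slight redundancy:** even a plain `return` at the end of a function goes through `Signal` construction and `loop.branch`, which can look verbose.

* **Separated model (Return is a distinct `mir.ret` instruction)**
    * **Pros:**
        * **Familiarity:** close to the conventional CFG view, where a `ret` instruction is the intuitive function terminator.
    * **Cons:**
        * **More complex IR:** there are two kinds of control-flow terminators, `loop.branch` and `ret`, which complicates analysis and transformation.
        * **Obstacles to optimization:** the merge point of `if cond { return } else { break }` is hard to handle. During inlining, the callee's `ret` has to be rewritten into the caller's `break` or `continue`, which requires ad hoc handling.

* **Recommendation:** **the unified model is strongly recommended.** Considering long-term maintainability, extensibility, and optimization potential, the benefits of a simple, orthogonal IR more than outweigh the slight redundancy.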
|
||||
|
||||
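#### 3. Optimization Passes

* **Full inlining of Loop1:**
    * When the loop body contains exactly one `loop.branch` and its signal is statically known to be `Break` (the single iteration always exits), remove the loop structure (`loop.begin/iter/branch/end`) entirely and expand the body into the parent block. This makes a plain `scope { ... }` a zero-cost abstraction.
* **State elision when there is no Yield:**
    * Analyze the MIR of a generator function; if no `loop.branch` produces `Signal::Yield`, treat the function as an ordinary function rather than a generator. This avoids the cost of building a state machine.
* **Normalization of branch merge points:**
    * Analyze nested `if` statements and similar constructs and convert them into a flat structure with multiple `loop.branch` instructions, giving later passes a canonical form that is easier to work with.
* **Impact on existing passes:**
    * **DCE (Dead Code Elimination):** statically unreachable arms of a `loop.branch` (e.g. `if false { return }`) can be detected and the related code removed.
    * **LICM (Loop Invariant Code Motion):** because the range from `loop.begin` to `loop.end` is explicit, loop-invariant code can be detected more accurately and more easily.
    * **Inlining:** function inlining becomes considerably more powerful. The `Signal` returned by the callee's `loop.end` can be wired directly into the caller's `loop.branch`. For example, a `return` turns into an assignment in the caller, and a `break` turns into a `break` of the caller's loop.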
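
The first two pass conditions can be sketched as simple predicates over a toy instruction list. This is an illustrative sketch only: the `(opcode, static_tag)` encoding and the function names are hypothetical, and it assumes the Loop1 condition means a single `loop.branch` that always exits with `Break`.

```python
from typing import List, Tuple

# Hypothetical toy encoding: (opcode, statically known signal tag, or "" if unknown).
Op = Tuple[str, str]


def branch_tags(ops: List[Op]) -> List[str]:
    """Collect the static signal tag of every loop.branch in a function body."""
    return [tag for opcode, tag in ops if opcode == "loop.branch"]


def is_loop1(ops: List[Op]) -> bool:
    """Loop1 inlining applies when the only loop.branch always exits with Break."""
    return branch_tags(ops) == ["Break"]


def needs_state_machine(ops: List[Op]) -> bool:
    """Only functions that may emit Yield need generator state.

    An unknown tag ("") is treated conservatively as a possible Yield.
    """
    return any(tag in ("Yield", "") for tag in branch_tags(ops))
```

Treating unknown tags as a possible `Yield` keeps the elision safe: the pass only drops the state machine when every `loop.branch` is known not to yield.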
|
||||
|
||||
#### 4. Fallback / Compatibility

* **Reverse lowering pass: LoopSignal IR → conventional MIR**
    * To validate the soundness of the new IR and introduce it in stages, design a reverse transformation pass that can be switched on and off with a compiler flag (`--no-loop-signal-ir`).
    * **Transformation logic:**
        1. `loop.begin <L>`: create a new basic block `<L_header>`.
        2. `loop.iter`: place its code inside the header block.
        3. `loop.branch <L>, %signal`: emit a `switch` instruction on the tag of `%signal`.
            * `Continue`: create a back edge to `<L_header>`.
            * `Break`: jump to the `<L_exit>` block outside the loop.
            * `Return`: jump to the function's global return block.
        4. `loop.end <L>`: corresponds to the `<L_exit>` block. Values from multiple `Break`s are merged with PHI nodes.
    * With this pass, backends that do not support LoopSignal IR (the old VM, debuggers, and so on) keep working.
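
Step 3 of the transformation can be sketched as a lookup from signal tag to target block. The block-naming scheme (`<label>_header`, `<label>_exit`, `fn_return`) and the textual `switch` syntax below are assumptions made for illustration; they are not the output format of any existing pass.

```python
from typing import Dict


def lower_loop_branch(label: str) -> Dict[str, str]:
    """Return the switch targets that replace `loop.branch <label>, %signal`."""
    return {
        "Continue": f"{label}_header",  # back edge to the loop header block
        "Break": f"{label}_exit",       # block that corresponds to loop.end
        "Return": "fn_return",          # the function's single return block
    }


def emit_switch(label: str, signal: str) -> str:
    """Render the replacement switch as one line of pseudo-MIR text."""
    cases = ", ".join(
        f"{tag} -> <{block}>" for tag, block in lower_loop_branch(label).items()
    )
    return f"switch tag({signal}) [{cases}]"
```

`Yield` is deliberately absent from the table because the transformation logic above only lists `Continue`, `Break`, and `Return`.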
|
||||
|
||||
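#### 5. Risks and Mitigations

* **Debug information (DWARF / source locations):**
    * **Risk:** the non-linear control flow introduced by `Signal` may make single-stepping diverge from how the source code reads.
    * **Mitigation:** generate debug information that maps the range from `loop.begin` to `loop.end` precisely onto the scope of the source-level loop construct (`while`, `for`, etc.). Attach the source location of the corresponding `break` or `return` statement to each `loop.branch` instruction.
* **Function-boundary cost of small-iteration loops:**
    * **Risk:** when the loop body is small and lives in a frequently called function, the cost of constructing and returning the `LoopSignal` struct may become non-negligible.
    * **Mitigation:**
        1. Implement a strong inliner so that `LoopSignal` values cross function boundaries as rarely as possible.
        2. Verify in the LLVM backend that the returned `struct` is passed in registers (this is the case in many ABIs).
* **Exceptions / effects:**
    * **Risk:** side effects such as `panic!` or FFI calls may break the semantics of `LoopSignal`.
    * **Mitigation:** state in the specification that `LoopSignal` covers pure control flow only, and that exceptions and panics are handled by a separate mechanism (LLVM's `invoke`/`landingpad`, or an equivalent of Rust's `catch_unwind`). Any operation that may contain a `loop.branch` must be called via `invoke`.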
|
||||
|
||||
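#### 6. Staged Rollout Roadmap

* **P1: Loop1 form for `while(true)` and `break` (minimal loop)**
    * **Goal:** express the simplest infinite loop and `break` with `loop.begin/branch/end`.
    * **Implementation:** modify the lowering of `while` statements. Convert `break` to `Signal::Break`.
    * **Verification:** check that the generated MIR is correct, measure performance, and confirm that there is no overhead compared with the existing CFG-based implementation.
* **P2: `for-in` state machine**
    * **Goal:** express iterator-based `for` loops with `LoopSignal`.
    * **Implementation:** in the lowering of `for` statements, build the `iterator.next()` call and the `Some/None` match out of `loop.iter` and `loop.branch`.
    * **Verification:** use assertion tests to verify that loop behavior (loop variable values in particular) matches exactly between the VM (interpreter) and AOT (LLVM).
* **P3: Minimal generator (`yield`)**
    * **Goal:** support generator functions that contain `yield`.
    * **Implementation:** convert `yield` to `Signal::Yield`. Implement the resume logic (restoring the state machine and passing a value to `loop.iter`).
    * **Verification:** write a broad set of test cases that resume a generator multiple times and check that state is saved and restored correctly.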
|
||||
|
||||
#### 7. Test Plan and Metrics

* **Metrics to collect:**
    * **MIR level:** total number of `MirOp`s, number of `loop.*` instructions, number of PHI nodes.
    * **Compile time:** front-end time, optimization-pass time, back-end (LLVM) time.
    * **Run time:** execution time and memory usage on the main benchmark suites.
* **Collection method:**
    * Add an `--emit-metrics=json` flag to the compiler so that every build writes the metrics above to a JSON file.
    * Measure and compare the metrics for each PR in CI, and emit a warning when a threshold is exceeded (e.g. execution time regresses by 5% or more).
* **Test cases:**
    * Add unit tests that cover edge cases such as nested loops, loops containing complex `if/else`, and loops that mix `return`/`break`/`continue`.
    * Confirm that all existing E2E tests (`apps/tests/`) pass.
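
The CI comparison described under "Collection method" could look roughly like the sketch below. The metric names and the flat `{name: value}` JSON shape are assumptions; the original only specifies that metrics are emitted as JSON and compared against a threshold such as 5%. A regression exactly at the threshold is flagged, to match the "5% or more" wording.

```python
import json
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float], current: Dict[str, float], threshold: float = 0.05
) -> List[str]:
    """Return the metrics whose value regressed by at least `threshold` (relative)."""
    regressed = []
    for name, old in baseline.items():
        new = current.get(name)
        if new is None or old <= 0:
            continue  # metric missing or baseline unusable: skip instead of guessing
        if (new - old) / old >= threshold:
            regressed.append(name)
    return sorted(regressed)


# Hypothetical metric files as they might be produced for two builds.
baseline_json = '{"run_time_ms": 100, "phi_count": 10, "mir_ops": 200}'
current_json = '{"run_time_ms": 106, "phi_count": 10, "mir_ops": 204}'
report = find_regressions(json.loads(baseline_json), json.loads(current_json))
```

Here `report` lists only the metrics that regressed by 5% or more; a CI job could print a warning for each name in the list.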
|
||||
|
||||
#### 8. TODO List of Small Tasks

1. **[Type definitions] Add the `LoopSignal` enum and the `loop.*` instructions**
    * **Files:** `src/mir/ops.rs`, `src/mir/nodes.rs`
    * **Work:** define the `LoopSignal` enum. Add `LoopBegin`, `LoopIter`, `LoopBranch`, and `LoopEnd` to `MirOp`. Gradually deprecate or remove `Br`, `CondBr`, and similar from `Terminator` (or `ControlFlow`).

2. **[Lowering] Move `while` lowering to the `LoopSignal` basis (P1)**
    * **File:** `src/hir/lowering.rs` (or the equivalent file)
    * **Work:** implement the logic that generates `loop.begin/iter/branch/end` from the AST node of a `while` statement.

3. **[SSA] Implement PHI node generation at `loop.begin`**
    * **File:** `src/mir/builder_modularized/`
    * **Work:** modify the SSA construction algorithm so that it detects loop back edges and inserts PHI nodes correctly at `loop.begin`.

4. **[Backend] Handle `loop.*` instructions in the LLVM backend (P1)**
    * **File:** `src/runner/modes/llvm.rs`
    * **Work:** define the LLVM `Type` for `LoopSignal`. Add code that generates an LLVM loop header block for `loop.begin`, a `switch` instruction for `loop.branch`, and a loop exit block for `loop.end`.

5. **[Optimization] Implement the Loop1 full-inlining pass**
    * **File:** `src/mir/optimization/simplify.rs` (or a new file)
    * **Work:** implement an optimization pass that detects `loop.*` instruction sequences meeting the condition and removes them.

6. **[Compatibility] Implement the reverse lowering pass**
    * **File:** `src/mir/passes/compat_lower_loop_signal.rs` (new file)
    * **Work:** implement a pass that converts LoopSignal IR into the conventional CFG-based MIR, controllable by a compiler flag.

7. **[Testing] Add metrics collection**
    * **Files:** `src/main.rs`, `src/driver.rs`
    * **Work:** add a facility that collects metrics at each compilation stage and writes them in the requested format.
|
||||
@ -1,34 +0,0 @@
|
||||
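Goal: produce a technical review and an implementation plan for the LoopSignal IR draft (the unification of LoopBox and Signal). Read the reference files and make the output concrete from the MIR, lowering, optimization, and migration points of view. Please answer in Japanese.

Files to read
- docs/private/papers/paper-e-loop-signal-ir/main-paper-jp.md
- docs/private/papers/paper-a-mir13-ir-design/main-paper-jp.md
- Everything under src/mir/ and src/mir/builder_modularized/ (to understand the current MIR instructions and the SSA/PHI generation)
- src/runner/modes/llvm.rs (to understand the current object-output flow)

Requested output (Tech Review + Impl Plan v1)
1) Proposed final specification
- Strict specification of the LoopSignal type (tag values, payload, example ABI/LLVM representation) and whether a zero-cost representation is feasible
- MIR instructions: precise semantics of loop.begin/iter/branch/end (domain, undefined behavior, exception handling)
- SSA/PHI placement rules (canonical form of merge points and invariants)
2) Lowering rules (syntax → MIR)
- Concrete mappings in pseudo-MIR for block/scope, if/else, while, for-in, function/return, generator(yield), async(await)
- Design comparison of putting Return inside Signal vs. keeping it separate (pros, cons, and a recommendation)
3) Optimization passes
- Applicability conditions for Loop1 full inlining, state elision without Yield, branch merge-point normalization, and DCE/LICM/Inline
- Impact on existing passes and safety conditions
4) Fallback / compatibility
- Design of the reverse lowering LoopForm → conventional MIR (ON/OFF by flag)
5) Risks and mitigations
- Debugging (DWARF / source locations), function-boundary cost of small-iteration loops, handling of exceptions/effects
6) Staged rollout roadmap
- P1: Loop1 form of while(true){break} → measure
- P2: for-in state machine → verify VM/AOT agreement
- P3: minimal generator (yield) → resume soundness tests
7) Test plan and metrics
- How to collect instruction count / branch count / PHI count / compile time / run time, and the thresholds
8) TODO list of small tasks (with file paths and affected areas)

Notes
- Refer to existing code by file path (and by function or type name where needed).
- Format the output as bullet lists under headings, with a density equivalent to 2-3 pages in total.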
|
||||
@ -1,209 +0,0 @@
|
||||
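# LifeBox Model and LoopForm IR: Reifying and Unifying Control in a Box-Oriented Runtime

> Unifying intuition (fusion proposal): "Everything is Box (space)" × "Everything is Loop (time)"
>
> Every box can be viewed as a "one-iteration loop" box (Loop1), and control is carried as a LoopSignal value (Next/Break/Yield[/Return]).

## Overview

This paper proposes the LifeBox Model (LBM: Box = Loop1), a concept that extends Nyash's "Everything is Box", and LoopForm IR (also called LoopSignal IR), which realizes that idea at the IR level. Branches, functions, scopes, iteration, generators, and async are normalized into "Loop = a box of iteration", and the outcome of control is handled as a value (Signal). At the IR level, a standard dispatch form is introduced with `loop.begin / loop.iter / loop.branch / loop.end`. As a result, front-end lowering becomes uniform, CFG merge points take a fixed shape (concentrated in a dispatch block), and PHI cleanup and optimizations (DCE/LICM/inlining) become straightforward. Loop1 can have its overhead removed by full inlining.

LoopForm is an intermediate normal form, and it is assumed that it can always be lowered again to the Core‑13 IR (a fixed set of 13 instructions), so backward compatibility is preserved at all times.

This paper is a concept draft. It is intended to follow Paper A (MIR13 / IR design) as a short paper or workshop submission.

### Executive Summary (one page)
- Goal: normalize if/while/for/scope/function/generator/async into LoopBox + LoopSignal, simplifying lowering, optimization, and extension.
- Core: represent Signal (Next/Break/Yield[/Return]) as a tagged value and dispatch uniformly with the four instructions `loop.begin/iter/branch/end`.
- Fusion: treat "ordinary boxes" as Loop1 as well (init/step/fini), making scope = Loop1 the basic unit.
- Practical benefit: fixed-shape merge points simplify PHIs and the dominator tree. Loop1 has zero overhead after inline folding. generator/async can be implemented by extending Signal only.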
|
||||
|
||||
## 1. Background and Motivation

- The current MIR13 simplified data operations by unifying them under BoxCall. Control structures (if/loop/call/return/scope), however, are still represented in scattered ways, and lowering and optimization retain case-by-case handling.
- Reifying control as a value (Signal) and dispatching in a standard way per loop iteration makes representation, lowering, optimization, and extension (generator/async/effect) uniform.

## 2. Proposal: Minimal Specification of LoopSignal IR

- Signal type: `LoopSignal<T> = Next(T) | Break(T) | Yield(T) [| Return(T)]`
- Pseudo-instructions (IR annotations):
  - `loop.begin %id`
  - `loop.iter %sig, %loop, %state` // run one iteration (one step of the LoopBox)
  - `loop.branch %sig { onNext: L1, onBreak: L2, onYield: L3 }`
  - `loop.end %id`
- LoopBox: the body of one iteration. A small "box" that takes the input `state` and returns a `LoopSignal<...>`.

### 2.1 Materializing the Type (LLVM assumed)
```llvm
; tag: 0=Next, 1=Break, 2=Yield, 3=Return; T stands for the payload type
%LoopSignal_T = type { i8, T }
; switch on %tag branches to L_next / L_break / L_yield / L_return
```

### 2.2 The Box = Loop1 Convention
- Every Box can be regarded as having `init -> step -> fini`.
- For an "ordinary box", `step()` runs once and returns `Break(result)` (= Loop1).
- RAII/using/defer is placed in `init/fini` and mapped onto `loop.begin/end`.
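
The Box = Loop1 convention can be illustrated with a minimal sketch. `ScopeBox` and `run_box` are hypothetical names, and signals are modeled as plain `(tag, value)` tuples for brevity; the point is only that `init`/`fini` bracket the loop the way `loop.begin`/`loop.end` do, and that an ordinary box breaks after one step.

```python
from typing import Any, List, Tuple


class ScopeBox:
    """An "ordinary box": step() runs once and returns Break(result)."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def init(self) -> None:
        self.log.append("init")  # acquire resources (RAII / using)

    def step(self) -> Tuple[str, Any]:
        self.log.append("step")
        return ("Break", 42)  # Loop1: exit after a single iteration

    def fini(self) -> None:
        self.log.append("fini")  # release resources (defer)


def run_box(box: ScopeBox) -> Any:
    """Drive a box as a loop: init ~ loop.begin, step ~ loop.iter, fini ~ loop.end."""
    box.init()
    try:
        while True:
            tag, value = box.step()
            if tag == "Break":
                return value
    finally:
        box.fini()  # always runs, mirroring RAII-style cleanup
```

Because `step()` of an ordinary box always returns `Break` on the first call, the `while` loop in `run_box` executes exactly once, which is what makes Loop1 a candidate for full inlining.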
|
||||
|
||||
## 3. Normalization (Lowering) Rules (Examples)

### 3.0 Notation: Two Layers (Surface and Internal)

The surface notation (conventional syntax) and the internal form (LoopForm) coexist; internally everything is always normalized to LoopForm.

```
// Conventional: several distinct control constructs
if (x) { a() } else { b() }
while (y) { c() }
func() { d() }

// Loop-unified: simple (a sketch of the internal normal form)
loop { break(x ? a() : b()) }
loop { y ? next : break }
loop { break(d()) }

// "Looking complex is a matter of habit" → the conventional syntax stays available, the internals are unified as loops
```

### 3.1 Scope `{ ... }` → Loop1
```
{
  x = 1
  y = 2
}
```
↓
```mir
%lp = loop.begin
%s0 = Const(void)
%sg = loop.iter %lp, %s0
loop.branch %sg { onBreak: Lb, onNext: Ln }
Ln: jump Ld
Lb: loop.end %lp; jump Ld
Ld: ; continuation
```

Interpretation: a block is regarded as "a loop that runs exactly once", and `iter` immediately returns `Break`.
|
||||
|
||||
### 3.2 while Loop
```mir
%lp = loop.begin
%s = Const(init)
Lh:
  %sg = loop.iter %lp, %s
  loop.branch %sg { onNext: Lh, onBreak: Lb }
Lb:
  loop.end %lp
```

### 3.3 for-in (Template for the LoopBox Form)
```mir
func LoopBox_for(state: ForState, env: Env) -> LoopSignal<ForState> {
  %x, %st' = iter_next(state)
  br_if %x==None -> return Break(state)
  call body(%x, env)
  return Next(%st')
}

%lp = loop.begin
%sig = loop.iter %lp, %init
Ldisp:
  loop.branch %sig { onNext: Lnext, onBreak: Lbr }
Lnext:
  %st = extract_next(%sig)
  %sig = loop.iter %lp, %st
  jmp Ldisp
Lbr:
  %fin = extract_break(%sig)
  loop.end %lp
  ret finalize(%fin)
```
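
The for-in template above can be mimicked in a few lines of Python. This is a sketch under assumptions: the state is modeled as an `(index, total)` pair, the loop body simply accumulates a sum, and `make_for_box` / `drive` are hypothetical names that stand in for `LoopBox_for` and the dispatch block.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]       # (next index, running total)
Signal = Tuple[str, State]    # ("Next" | "Break", state)


def make_for_box(items: List[int]) -> Callable[[State], Signal]:
    """Build a LoopBox that visits `items` one element per iteration."""

    def loop_box_for(state: State) -> Signal:
        index, total = state
        if index >= len(items):  # iter_next returned None
            return ("Break", state)
        # body: accumulate the element, then advance the iterator state
        return ("Next", (index + 1, total + items[index]))

    return loop_box_for


def drive(step: Callable[[State], Signal], init: State) -> State:
    """Dispatch loop: keep calling the LoopBox while it answers Next."""
    sig = step(init)
    while sig[0] == "Next":
        sig = step(sig[1])
    return sig[1]  # payload of Break
```

For example, `drive(make_for_box([1, 2, 3]), (0, 0))` visits three elements and breaks with the final state.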
|
||||
|
||||
### 3.4 Function Call / return → Loop1
`return v` can be unified as roughly `Break(Return v)` (an implementation chooses whether Return is part of Signal or kept separate).

### 3.5 generator/async
Include `Yield` in Signal and handle it in the `onYield` branch. The state is kept in `state`.
|
||||
|
||||
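## 4. Implementation Policy (Safe Order of Introduction)

1) Add the LoopSignal type and the `loop.*` nodes to MIR (opt-in)

2) Lower only while/for/scope to LoopForm (if and functions stay as they are)

3) Implement the reverse lowering pass LoopForm → conventional MIR (so it can always be switched off)

4) Optimization: full inlining of Loop1, elision of state saving when there is no Yield, simplified dominance analysis

5) Extension: generator/async → effect (if needed)

6) Fallback: keep the reverse lowering LoopForm → conventional MIR for compatibility (switchable by flag).

## 4.1 Core Requirements (Core Invariants)

- Loop normal form: all control appears as the sequence `loop.begin → loop.iter → loop.branch → loop.end` and merges at the dispatch block (PHIs occur only immediately after dispatch).
- Signal consistency: `LoopSignal<T> = Next/Break/Yield/Return` is an i8 tag plus one payload (an SSA value). The four tags are mutually exclusive.
- Single continuation point: each Loop has one exit (the break/return merge point). Loop1 can always be folded (zero cost after inlining).
- Reversibility: the reverse lowering LoopForm → Core‑13 (the current MIR) is always possible (safe introduction with a flag for ON/OFF).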
|
||||
|
||||
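## 5. Evaluation Plan

- Conciseness of representation: change in the number of blocks, branches, and PHIs
- Compiler time: lowering/optimization time (±)
- Run-time performance: difference from the conventional MIR on VM/JIT/AOT (equal or a small difference is acceptable). The main focus is ease of implementation.
- Ease of extension: quantify the amount of implementation code and the change impact for generator/async

### 5.1 First PoC Tasks (Staged)
1. Convert `while(true){break}` to Loop1 → measure instruction count / PHIs / execution time
2. Turn `for-in` into a state machine → verify VM/AOT agreement with the Next/Break form
3. Minimal generator with `yield` → soundness of state retention and resumption (reproducibility tests)

### 5.2 Acceptance Criteria
- MIR structure: PHIs occur only at dispatch merge points (obvious from the Printer output)
- Differential execution: five smoke tests (const/return, add, branch+phi, while, nested-branch) match the current MIR's VM results
- Observation: collect covered/unsupported and decisions (allow, fallback) for both LoopForm and the reverse lowering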
|
||||
|
||||
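## 6. Related Work and Differences

- CPS / continuations: these turn control into functions, whereas LoopSignal handles "control as a value" directly in MIR (it does not presuppose a CPS transformation).
- Algebraic effects / handlers: close in spirit, but this work restricts itself to a minimal Signal set plus loop.* and prioritizes practicality.
- Coroutines / generators: instead of a dedicated mechanism, everything up to functions and scopes is unified as Loop1.
- Unification ideas in Smalltalk/Lisp: this paper makes a grand unification of control flow concrete in MIR.

Note: compared with SSA-based IRs such as MLIR, Swift SIL, and Rust MIR, this work achieves "normalization of the control (time) side" with a minimal instruction set and focuses on reducing optimization and implementation cost.

## 7. Limitations and Future Work

- Explanation cost: a burden on newcomers. Introduce it in stages, starting with while/for only.
- Overhead: Loop1 is always neutralized by immediate folding (inlining).
- Debugging: embed the source range and an ID in loop.begin, and keep line information on iter.

---

Appendix: candidate names (for reviewers)
- Signal Loop IR (SLIR) / LoopSignal Form (LSF) / Boxed Loop Semantics (BLS) / LoopBox IR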
|
||||
|
||||
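## 8. Integration of External Reviews and Settled Policy (Key Points from Claude/Gemini)

This section integrates the key points of the external AI reviews of this proposal (claude_output.md / gemini_output.md) and states the policy settled on for now.

- Type representation (LoopSignal<T>): adopt Next/Break/Yield/Return (Gemini's suggestion of Continue is a naming difference only; Next is adopted for consistency with the existing text). The tag is fixed at i8 and the payload is an SSA value.
- Handling of return: adopt the "unified model" in which Return is part of Signal (Claude and Gemini agree). IR orthogonality and ease of optimization take priority.
- Tightening of the MIR instructions: LoopForm is defined by the four instructions loop.begin / loop.iter / loop.branch / loop.end. The PHI placement point is fixed at loop.begin (canonical merge form).
- LLVM/ABI: start with a plain { i8 tag, payload } struct and leave the hot path (Next) to existing optimizations. Introduce specialization later if needed.
- Reverse lowering (compatibility): make the reverse transformation LoopForm → conventional MIR switchable by flag (--no-loop-signal-ir). Staged introduction ensures safety.
- Optimization passes: implement first 1) Loop1 full inlining, 2) state elision without Yield, 3) branch merge-point normalization (dispatch concentration). Check consistency with the existing DCE/LICM/Inline.
- Staging of scope: introduce starting with while/for/scope (functions and if follow later). Exceptions and effects (algebraic effects / handlers) are set aside as future extensions.
- Documentation policy: state explicitly that LoopForm is an "intermediate normal form" that can ultimately be lowered again to Core-13 (the implementation can also be turned on and off at any time).

Implementation roadmap (shortest path first)
1) Add the LoopSignal type and the loop.* instructions (IR nodes), opt-in via a build flag
2) Move the lowering of while/for/scope to LoopForm (if and functions stay as they are)
3) Complete the reverse lowering (LoopForm → conventional MIR) so that it can run safely with the default ON
4) Optimization: the three passes Loop1 inline / Yield-free elision / dispatch consolidation
5) Evaluation: conciseness of representation (PHI/block counts), compiler time, run-time performance (equal or a small difference is acceptable)
6) Wider application: functions/if, generator/async (Return/Yield as Signals)

Appendix: the duality of the terms (a mnemonic)
- Box (space) = the unit of data/resources. Instantiation gives rise to one iteration.
- Loop (time) = the unit of control/continuation. Next/Break/Yield advance or stop time.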
|
||||
@ -1,20 +0,0 @@
|
||||
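# LoopSignal IR - Integrated Summary of External Reviews

This document is a summary memo that integrates the suggestions in `claude_output.md` and `gemini_output.md`. They are already reflected in Section 8 of the main text (main-paper-jp.md).

- Signal type: `LoopSignal<T> = Next(T) | Break(T) | Yield(T) | Return(T)` (the name Next is adopted). The tag is `i8` and the payload is an SSA value.
- Position of Return: adopt the "unified model" in which it is part of Signal. IR orthogonality, optimization, and ease of inlining take priority.
- LoopForm instructions: the four instructions `loop.begin / loop.iter / loop.branch / loop.end`. `loop.begin` is the canonical merge point for PHI placement.
- LLVM/ABI representation: start with a plain `{ i8 tag, payload }` struct. The hot path (Next) is expected to converge to effectively zero cost under existing optimizations. Specialize later as needed.
- Reverse lowering: keep the reverse transformation LoopForm → conventional MIR available at all times, switchable by flag (making staged introduction safe).
- Optimization passes: 1) Loop1 full inlining, 2) state elision without Yield, 3) branch merge-point normalization (dispatch concentration).
- Staging of application: start with while/for/scope and extend to functions/if/generator/async. Exceptions and effects are future extensions.
- Documentation policy: state explicitly that LoopForm is an "intermediate normal form" that can ultimately be lowered again to Core‑13.

Implementation roadmap (shortest path first)
1) Add the LoopSignal type and loop.* nodes (opt-in)
2) Move the lowering of while/for/scope
3) Complete the reverse lowering (operate with the default ON)
4) Three optimizations (Loop1 inline / Yield-free elision / dispatch consolidation)
5) Evaluation (representation / compiler time / performance)
6) Wider application (functions/if, generator/async)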
|
||||
@ -1,25 +0,0 @@
|
||||
# Abstract
|
||||
|
||||
## AI-Assisted Compiler Development: Building Nyash LLVM Backend with ChatGPT and Claude
|
||||
|
||||
We present the first documented case of building a compiler's LLVM backend through intensive AI assistance over a week-long development period. The Nyash programming language, based on the "Everything is Box" philosophy, achieves a minimalist intermediate representation (MIR14) with just 14 instructions, evolved from an initial 27 through aggressive reduction and pragmatic restoration.
|
||||
|
||||
Our development process involved hundreds of AI interactions, with ChatGPT providing deep architectural analysis (including an 8-minute investigation into PHI node issues) and Claude offering continuous implementation support. Key technical contributions include: (1) MIR14 design achieving 27→13→14 instruction evolution, (2) LoopForm IR for control flow normalization, (3) Sealed SSA with Resolver API for unified value resolution, and (4) BuilderCursor for structural terminator safety.
|
||||
|
||||
The project faced and overcame significant challenges including PHI wiring complexity, dominance violations, and type system confusion between i64 handles and i8* pointers. A critical insight emerged when LoopForm, initially blamed for introducing problems, actually exposed pre-existing design flaws in value resolution and type conversion placement.
|
||||
|
||||
We document the pragmatic decision to introduce Python llvmlite as a verification harness when Rust/inkwell complexity became a bottleneck, demonstrating that "simple is best" - a core Nyash philosophy. The development logs, though incomplete ("I don't remember anymore"), provide valuable insights into the chaotic reality of AI-assisted development.
|
||||
|
||||
This work demonstrates that AI can successfully assist in building complex systems like compilers, but human design judgment remains essential. The co-evolution of human design decisions and AI implementation represents a new paradigm in software development, with implications for future compiler construction and computer science education.
|
||||
|
||||
## 日本語要旨
|
||||
|
||||
ChatGPTとClaudeを活用し、1週間以上の開発期間でLLVMバックエンドを構築した世界初の完全記録を報告する。「Everything is Box」哲学に基づくNyashプログラミング言語は、27命令から13命令への積極的削減と実用的な1命令復活により、わずか14命令の極限的中間表現(MIR14)を実現した。
|
||||
|
||||
数百回のAI対話を通じた開発では、ChatGPTがPHIノード問題の8分間にわたる深い調査を含むアーキテクチャ分析を提供し、Claudeが継続的な実装支援を行った。主要な技術的貢献として、(1) 27→13→14命令進化を遂げたMIR14設計、(2) 制御フロー正規化のためのLoopForm IR、(3) 統一的値解決のためのResolver API付きSealed SSA、(4) 構造的終端安全性のためのBuilderCursorが挙げられる。
|
||||
|
||||
開発過程では、PHI配線の複雑性、支配関係違反、i64ハンドルとi8*ポインタ間の型システム混乱など、重大な課題に直面した。当初LoopFormが問題の原因と誤解されたが、実際には既存の値解決と型変換配置の設計欠陥を顕在化させただけという重要な洞察を得た。
|
||||
|
||||
Rust/inkwellの複雑性がボトルネックとなった際、検証ハーネスとしてPython llvmliteを導入する実用的判断を下した。これは「簡単最高」というNyashの核心哲学の実証でもある。開発ログは不完全(「もう覚えてないにゃー」)ながらも、AI支援開発の混沌とした現実への貴重な洞察を提供する。
|
||||
|
||||
本研究は、コンパイラのような複雑なシステム構築においてもAI支援が有効であることを実証したが、人間の設計判断は依然として不可欠である。人間の設計決定とAI実装の共進化は、ソフトウェア開発の新しいパラダイムを示し、将来のコンパイラ構築と計算機科学教育に示唆を与える。
|
||||
@ -1,27 +0,0 @@
|
||||
# Abstract
|
||||
|
||||
## English Version
|
||||
|
||||
We present a compelling case study of human-AI collaboration in compiler development where fundamental design principles overlooked by state-of-the-art AI systems were rediscovered through human intuition and implementation pain. During the development of the Nyash programming language, three leading AI assistants (ChatGPT, Claude, and Gemini) collectively failed to recognize the necessity of type information in the intermediate representation (MIR), focusing instead on instruction minimization and architectural elegance.
|
||||
|
||||
This oversight led to a 650-line implementation struggle, with debugging sessions exceeding 50 minutes for simple string operations. Remarkably, the human developer—a programming novice with no formal knowledge of compiler theory or intermediate representations—independently identified the need for type information through the direct experience of implementation difficulties. The insight emerged from a simple observation: "Why doesn't the compiler know if '+' means string concatenation or numeric addition?"
|
||||
|
||||
Our analysis reveals three key factors in AI's oversight: (1) partial optimization bias, where AIs focused exclusively on the assigned goal of minimizing instruction count, (2) lack of implementation experience, preventing AIs from anticipating practical debugging challenges, and (3) fragmented context across multiple AI consultations. In contrast, human learning through "implementation pain" led to fundamental insights that escaped theoretical analysis.
|
||||
|
||||
This case study introduces "Implementation-Driven Learning" as a complementary paradigm to AI-assisted development, demonstrating that human intuition grounded in practical experience remains irreplaceable even in the age of AI. We propose a new collaboration model where AI handles theoretical optimization while humans contribute experiential learning and holistic problem identification.
|
||||
|
||||
## 日本語版
|
||||
|
||||
本研究は、最先端AIシステムが見落とした基本的な設計原理を、人間の直感と実装の苦痛を通じて再発見したコンパイラ開発における人間-AI協働の説得力のある事例を提示する。Nyashプログラミング言語の開発において、3つの主要なAIアシスタント(ChatGPT、Claude、Gemini)は、命令数の最小化とアーキテクチャの優雅さに焦点を当てる一方で、中間表現(MIR)における型情報の必要性を認識することに失敗した。
|
||||
|
||||
この見落としは650行に及ぶ実装の苦闘を招き、単純な文字列操作のデバッグセッションは50分を超えることもあった。注目すべきことに、コンパイラ理論や中間表現の正式な知識を持たないプログラミング初心者である人間の開発者は、実装の困難さを直接経験することで、型情報の必要性を独自に特定した。この洞察は単純な観察から生まれた:「なぜコンパイラは'+'が文字列連結なのか数値加算なのか分からないの?」
|
||||
|
||||
我々の分析は、AIの見落としにおける3つの主要因を明らかにした:(1)部分最適化バイアス - AIが命令数最小化という与えられた目標に専念しすぎた、(2)実装経験の欠如 - 実践的なデバッグの課題を予測できなかった、(3)複数のAI相談にわたる断片化された文脈。対照的に、「実装の苦痛」を通じた人間の学習は、理論的分析では見逃された根本的な洞察をもたらした。
|
||||
|
||||
この事例研究は、AI支援開発への補完的パラダイムとして「実装駆動型学習」を導入し、実践的経験に根ざした人間の直感がAI時代においても代替不可能であることを実証する。我々は、AIが理論的最適化を担当し、人間が経験的学習と全体的な問題識別に貢献する新しい協働モデルを提案する。
|
||||
|
||||
## Keywords / キーワード
|
||||
|
||||
Implementation-Driven Learning, Human-AI Collaboration, Compiler Design, Type Systems, Experiential Knowledge, Software Engineering
|
||||
|
||||
実装駆動型学習、人間-AI協働、コンパイラ設計、型システム、経験的知識、ソフトウェア工学
|
||||
@ -1,138 +0,0 @@
|
||||
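# Implementation-Driven Learning: Rediscovering a Basic Principle That AI Overlooked

## 1. Introduction

As the use of AI in software development advances rapidly, an interesting phenomenon was observed in the interplay between the theoretical strength of AI and the practical intuition of humans. This paper analyzes in detail a case from the compiler development of the Nyash programming language: three state-of-the-art AIs (ChatGPT, Claude, and Gemini) overlooked the need for type information when designing the intermediate representation (MIR), and a developer who did not even know the concept of MIR rediscovered that need intuitively through the pain of implementation.

## 2. Background: Everything is Box

Nyash is a new programming language whose philosophy is "Everything is Box". Its developer (hereafter "Nyaa", にゃー) is a programming novice who has nevertheless carried the work from language design through implementation in collaboration with AI. What stands out is that Nyaa discovered essential design principles through implementation experience, without any knowledge of compiler theory.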
|
||||
|
||||
## 3. The Type Information That AI Overlooked

### 3.1 The MIR Design Process

In November 2024, the MIR (intermediate representation) was designed under AI leadership:

```
ChatGPT: "Let's reduce the instructions from 27 to 13."
Claude: "Unify on BoxCall for a beautiful design."
Gemini: "The optimization strategy has been taken into account as well."
```

All of the AIs focused on "minimization" and "uniformity", and none of them mentioned type information.

### 3.2 The Resulting Lack of Type Information

The generated MIR looked like this:

```json
{"op": "binop", "kind": "+", "lhs": 10, "rhs": 20, "result": 30}
```

At the MIR level it became impossible to tell whether this `+` meant string concatenation or numeric addition.
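
To make the ambiguity concrete, the sketch below shows how a back end could dispatch on `+` if the instruction carried its operand type. The `"type"` field, its values `"string"` and `"i64"`, and the function `eval_binop` are assumptions introduced for illustration; they are not part of the MIR format shown above.

```python
from typing import Any, Dict


def eval_binop(inst: Dict[str, Any], values: Dict[int, Any]) -> Any:
    """Evaluate a typed `+` binop; `values` maps value IDs to runtime values."""
    if inst["kind"] != "+":
        raise ValueError(f"unsupported binop kind: {inst['kind']}")
    lhs = values[inst["lhs"]]
    rhs = values[inst["rhs"]]
    if inst["type"] == "string":  # hypothetical type annotation
        return str(lhs) + str(rhs)  # string concatenation
    if inst["type"] == "i64":
        return int(lhs) + int(rhs)  # numeric addition
    raise ValueError(f"unknown operand type: {inst['type']}")


concat = {"op": "binop", "kind": "+", "type": "string", "lhs": 10, "rhs": 20, "result": 30}
add = {"op": "binop", "kind": "+", "type": "i64", "lhs": 10, "rhs": 20, "result": 30}
```

With the annotation present, `eval_binop(concat, {10: "Hello", 20: " World"})` concatenates and `eval_binop(add, {10: 1, 20: 2})` adds, with no type guessing in the back end.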
|
||||
|
||||
## 4. The Struggle During Implementation

### 4.1 Onset of the Symptoms

In January 2025, the problem surfaced while implementing the LLVM backend:

```
Expected: print("Hello" + " World") → "Hello World"
Actual:   print("Hello" + " World") → "0"
```

### 4.2 Debugging Goes Astray

ChatGPT5 went into a 50-minute deliberation and proposed a complex Resolver implementation of some 300 lines:

```python
def resolve_value(self, value_id, context):
    # complex type-guessing logic
    if self.is_stringish(value_id):
        # track the possibility that the value is a string
        # ... another 300 lines (elided in the original)
        ...
```
|
||||
|
||||
## 5. Rediscovery by a Human

### 5.1 A Naive Question

Nyaa's questions were simple:

> "Why can't it tell whether + is a string or a number?"
> "Wouldn't it be fine to just write that down from the start?"

### 5.2 The Discovery Process

1. **Experiencing the pain**: wrestling with the "string becomes 0" bug
2. **Pursuing the why**: "Why is the type unknown?"
3. **Applying common sense**: "Normally there is type information, right?"
4. **Surveying other languages**: LLVM IR and JVM bytecode are both typed
5. **Conclusion**: MIR needs type information

### 5.3 A Counter-Proposal to the AI

```
Nyaa: "What if we put type information into MIR?"
ChatGPT5: "...You are absolutely right."
```
|
||||
|
||||
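## 6. Why Did the AIs Overlook It?

### 6.1 The Trap of Partial Optimization

The AIs concentrated too hard on the goal they were given, "minimize the instruction count":
- 13 instructions achieved ✓
- Type information ✗ (not considered)

### 6.2 Lack of Implementation Experience

AI is strong on theory but cannot experience the pain of implementation:
- the frustration of debugging
- the complexity of type guessing
- the weight of "it doesn't work"

### 6.3 Fragmented Context

The three AIs were consulted separately and did not share the overall picture.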
|
||||
|
||||
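## 7. A New Collaboration Model

### 7.1 Strengths of AI
- Theoretical accuracy
- Broad knowledge
- Optimization ability
- Large-scale refactoring

### 7.2 Strengths of Humans
- Learning from the pain of implementation
- Intuitive problem discovery
- The ability to see the whole
- Naive questions about complexity

### 7.3 Coping with the Gradual Intrusion of Complexity

A new finding of this study is the phenomenon of the "gradual intrusion of complexity". As the case of PHI generation silently ending up in two places shows, a system grows complex step by step until no one grasps the whole picture.

In response, the human's naive question "Why are there two of them?" prompted a fundamental architectural improvement (unification under the Resolver). AI excels at partial optimization but tends not to notice the growth of overall complexity. The human bird's-eye view is indispensable.

### 7.4 Complementary Collaboration

```
Theory (AI) + Practice (human) + Questions (human) = Sustainable development
```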
|
||||
|
||||
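## 8. Everything is Experience

What this case shows is a new learning principle: "Everything is Experience". However well versed AI is in theory, it cannot substitute for learning through the pain of implementation. Conversely, even a human who does not know the theory can rediscover essential principles through experience.

## 9. Conclusion

The rediscovery of type information in Nyash compiler development redefines the role of humans in the age of AI. By entrusting theory to AI and concentrating on learning through implementation, we can find the essential problems that AI overlooks. This "implementation-driven learning" is likely to become an important paradigm in future software development.

Finally, a quote from Nyaa:

> "I had never heard of MIR, but once it hurt enough, I understood, nya."

This is exactly the form of learning that only humans can do.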
|
||||
@ -1,166 +0,0 @@
|
||||
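# Paper M: Staged Decision-Making Programming - A New Paradigm for the Dialectical Evolution of Safety

- Title: Staged Decision Making in Programming Languages: Method-Level Exception Handling and the Dialectical Evolution of Safety
- Subtitle: From Safety-Expressiveness Tension to Dialectical Synthesis
- Short name: Staged Decision Making Paper
- Status: paper complete, ready for submission (breakthrough discovery of September 18, 2025)

## Abstract

This study reports "staged decision-making programming", a paradigm discovered through dialectical human-AI collaboration. A method is structured as a time-ordered decision process with three stages (normal processing → error handling → final adjustment), and the dialectical synthesis of `cleanup` (safety first) and `cleanup returns` (extended expressiveness) is proposed as a resolution of the 30-year-old safety-expressiveness dilemma. Through a Hegelian dialectical process consisting of Gemini's safety proposal (thesis), the human's argument for expressiveness (antithesis), and the collaboratively integrated solution (synthesis), the study aims to establish a new philosophical foundation for programming language design.

## Academic Value

### 1. The Staged Decision-Making Paradigm (claimed world first)
- **Time-ordered decision making**: a three-stage structure for methods (normal → error → final)
- **Dialectical integration of safety**: `cleanup` (safe) ⊕ `cleanup returns` (expressive)
- **Linguistic and cognitive reform**: conceptual clarification through `finally` → `cleanup`

### 2. Philosophical Programming Language Design
- **An implementation of Hegelian dialectics**: a technical realization of thesis, antithesis, and synthesis
- **Cognitive alignment of concept and syntax**: the language design principle that naming shapes thinking
- **A unified theory of safety and expressiveness**: a proposed definitive answer to the 30-year-old dilemma

### 3. A Multi-AI Collaborative Discovery Model (claimed world first)
- **Coordination of four intelligences**: human creativity, Claude's theoretical extension, ChatGPT's implementation verification, Gemini's philosophical evaluation
- **Evidence of independent convergence**: a record of different intelligences converging on the same innovation
- **An AI at a loss for words**: the academic significance of Gemini's "I have no words" reaction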
|
||||
|
||||
## Chapter Outline

### Chapter 1: Introduction - A New Challenge for Language Safety
- The trade-off between safety and expressiveness in programming languages
- Limits of conventional exception handling
- Nyash's Everything is Box philosophy

### Chapter 2: From Blocks to Methods - Evolution of the Design Idea
- The success of the block-postfix catch syntax
- The idea of applying it at method level
- The discovery of Everything is Block + Modifier

### Chapter 3: Staged Decision Making - The Three-Stage Decision Model
- Core syntax of staged decision making
```nyash
method processData() {
    // Stage 1: normal processing
    return heavyComputation()
} catch (e) {
    // Stage 2: error handling
    return fallbackValue
} cleanup returns {
    // Stage 3: final decision (expressive mode)
    validateResults()
    if securityThreat() {
        return "BLOCKED"  // has the final say
    }
}
```
- Dialectical integration of safety (cleanup vs cleanup returns)
- The value of time-ordered decision making

### Chapter 4: The Unified Paradigm - Everything is Block + Modifier
- Unifying data and behavior
```nyash
{
    return me.name + " (computed)"
} as field greeting: StringBox

{
    return heavyCalculation()
} as method process(): ResultBox
```
- The disappearance of conventional boundaries
- Opportunities for compiler optimization

### Chapter 5: Implementation Strategy and Phased Deployment
- Phase 15.6: method-level catch/finally
- Phase 16.1: postfix method definitions
- Phase 16.2: the ultimate unified syntax
- Compatibility with existing infrastructure

### Chapter 6: AI-Human Collaborative Discovery
- The step-by-step discussion process with Gemini
- Independent verification by ChatGPT
- Implementation strategy analysis by Claude
- Human persistence and AI's theoretical extension

### Chapter 7: Evaluation and Comparison
- Comparison with existing languages
- Quantitative evaluation of the safety improvement
- Impact on development efficiency
- Improvement in code readability

### Chapter 8: Related Work
- History of exception handling in languages (Java, C#, Rust, Go)
- Prior work on postfix syntax
- Existing approaches to unified type systems
- Related work on AI-collaborative development

### Chapter 9: Future Work and Extensions
- Application to other language constructs
- Performance optimization
- Prospects for formal verification
- Educational value

### Chapter 10: Conclusion
- A shift in the language design paradigm
- Combining practicality and innovation
- A collaborative development model for the AI era
|
||||
|
||||
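## Expected Impact

### Contributions to Academia
1. **Programming Language Design**: a new safety paradigm
2. **Human-Computer Interaction**: an empirical study of AI-collaborative development
3. **Software Engineering**: automation of method-level safety

### Impact on Industry
1. **Language designers**: presentation of a new syntactic paradigm
2. **Developers**: a safer and more expressive language
3. **Tool development**: improved AI-collaborative development environments

### Educational Value
1. **Language design education**: making the thought process visible
2. **AI collaboration**: the complementary relationship between humans and AI
3. **Innovative thinking**: methods for redefining existing concepts

## Data and Evidence

### Technical Implementation
- GitHub commit history
- Before/after code comparison
- Performance test results
- Quantitative evaluation of safety improvements

### AI Collaboration Process
- Gemini discussion logs (staged understanding)
- ChatGPT independent verification logs
- Claude implementation strategy logs
- Timeline from idea to implementation

### Language Comparison
- Comparison of exception handling in existing languages
- Quantitative analysis of syntactic complexity
- Comparative evaluation of learning cost
- Measurement of development efficiency gains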
|
||||
|
||||
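## The Essence of the Innovation

The true value of this research lies in the **fusion of technical innovation and philosophical insight**:

1. The natural flow from **practical frustration** → **innovative solution**
2. The perfect chain of **human intuition** → **AI theoretical extension** → **implementation strategy**
3. The staged development from **individual feature** → **unifying principle** → **paradigm shift**

This is not merely a proposal for new syntax, but research that heralds **a new era of programming language design**.

## Related Files

- AI discussion logs: `ai-collaboration-logs/`
- Implementation strategy: `implementation-strategy.md`
- Language comparison: `language-comparison.md`
- Performance evaluation: `performance-evaluation.md`

---

*Note: This paper aims to give an academic, systematic account of the breakthrough discovery of September 18, 2025, and to present a new paradigm to the programming language community.*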
|
||||
@ -1,129 +0,0 @@
|
||||
# Abstract
|
||||
|
||||
## Staged Decision Making in Programming Languages: Method-Level Exception Handling and the Dialectical Evolution of Safety
|
||||
|
||||
### Background
|
||||
|
||||
Programming language design has long struggled with the tension between safety and expressiveness. Traditional exception handling mechanisms require explicit try-catch blocks that increase nesting depth and separate error handling logic from the primary computation. While languages like Java, C#, and Python have established the try-catch paradigm as standard, this approach often leads to verbose code and cognitive overhead for developers.
|
||||
|
||||
### Problem Statement
|
||||
|
||||
The Nyash programming language, built on the "Everything is Box" philosophy, faced similar challenges with traditional exception handling. The mandatory `try` keyword creates unnecessary indentation levels and disrupts the natural flow of thought from "what to do" to "how to handle errors." This led to the investigation of alternative syntactic approaches that could maintain safety while improving expressiveness.
|
||||
|
||||
### Innovation
|
||||
|
||||
This paper presents **staged decision making**, a revolutionary programming paradigm that emerged through dialectical human-AI collaboration. We introduce both method-level postfix exception handling and a unified property system, representing two interconnected innovations that emerged in a single intensive development session:
|
||||
|
||||
**Method-Level Staged Decision Making**:
|
||||
```nyash
method processData() {
    return heavyComputation()      // Stage 1: Normal processing
} catch (e) {
    return fallbackValue           // Stage 2: Error handling
} cleanup returns {
    validateResults()
    if securityThreat() {
        return "BLOCKED"           // Stage 3: Final decision capability
    }
}
```
|
||||
|
||||
**Unified Property System**:
|
||||
```nyash
box Example {
    name: StringBox = "default"                // stored: read/write
    size: IntegerBox { me.items.count() }      // computed: calculate every access
    once cache: CacheBox { buildCache() }      // once: lazy evaluation + caching
    birth_once config: ConfigBox { load() }    // birth_once: eager at construction
}
```
|
||||
|
||||
These innovations resolve multiple fundamental tensions: safety vs. expressiveness through dialectical synthesis (`cleanup` vs `cleanup returns`), and data vs. behavior through systematic member categorization. The paradigm evolved through a documented Hegelian process involving four intelligent agents.
|
||||
|
||||
### Key Contributions
|
||||
|
||||
1. **Staged Decision Making Paradigm**: Introduction of the first systematic approach to time-sequential decision making in programming languages, where methods operate through three distinct temporal stages: normal processing, error handling, and final adjustment.
|
||||
|
||||
2. **Unified Property System Taxonomy**: The first systematic classification of object members into four distinct categories (stored, computed, once, birth_once), each with unique behavioral characteristics and performance guarantees. This resolves the conflation of fundamentally different concepts under traditional "field" or "property" terminology.
|
||||
|
||||
3. **Poison-on-Throw Exception Strategy**: A novel approach to cached property exception handling that prevents infinite retry loops while maintaining predictable behavior and excellent debugging characteristics through permanent failure marking.
|
||||
|
||||
4. **Dialectical Safety-Expressiveness Synthesis**: Resolution of the fundamental programming language tension through `cleanup` (pure safety) and `cleanup returns` (controlled expressiveness), emerging from documented Hegelian dialectical collaboration between human intuition and multiple AI systems.
|
||||
|
||||
5. **Conceptual Clarity Through Linguistic Precision**: Demonstration that programming language naming directly influences cognitive frameworks, replacing ambiguous `finally` with semantically precise `cleanup` and introducing clear visual differentiation (`=` = writable, `{}` = read-only).
|
||||
|
||||
6. **Multi-AI Collaborative Discovery**: First documented case of human-AI collaboration involving four intelligent agents (human creativity, Claude's theoretical extension, ChatGPT's implementation validation, Gemini's philosophical evaluation) achieving innovations impossible for any single participant.
|
||||
|
||||
7. **Dual-Syntax Coexistence Strategy**: Development of a practical approach where revolutionary syntax innovations (block-first) can coexist with familiar patterns (header-first), unified through formatter normalization while preserving philosophical expressiveness.
|
||||
|
||||
8. **Zero-Cost Revolutionary Syntax**: Empirical proof that paradigm-shifting language innovations can maintain identical performance through AST normalization while providing unprecedented expressiveness and safety guarantees.
|
||||
|
||||
### Methodology
|
||||
|
||||
Our research methodology combines:
|
||||
- **Design-first approach**: Starting from developer experience pain points
|
||||
- **Multi-AI collaboration**: Leveraging Gemini, ChatGPT, and Claude for different aspects (philosophical reasoning, independent verification, implementation strategy)
|
||||
- **Iterative refinement**: Progressive development from simple postfix catch to unified syntax paradigm
|
||||
- **Backward compatibility**: Ensuring smooth migration from traditional syntax
|
||||
|
||||
### Results
|
||||
|
||||
The proposed innovations demonstrate:
|
||||
|
||||
**Exception Handling Improvements**:
|
||||
- **50% reduction in exception handling code verbosity**
|
||||
- **Complete elimination of try-catch nesting** within method bodies
|
||||
- **Automatic resource management** through method-level cleanup blocks
|
||||
|
||||
**Property System Benefits**:
|
||||
- **4-category taxonomy** providing complete member classification
|
||||
- **Visual syntax clarity** enabling immediate read/write capability recognition
|
||||
- **Poison-on-throw strategy** eliminating infinite retry loops
|
||||
- **Zero-cost abstraction** through optimal lowering to slots/methods
|
||||
|
||||
**Implementation Compatibility**:
|
||||
- **100% compatibility** with existing infrastructure (ThrowCtx, Result-mode lowering)
|
||||
- **Dual-syntax support** enabling both familiar and revolutionary approaches
|
||||
- **Formatter normalization** ensuring team consistency regardless of input style
|
||||
|
||||
### Evaluation
|
||||
|
||||
We provide comprehensive evaluation across multiple dimensions:
|
||||
- **Safety improvement**: Quantified reduction in unhandled exceptions
|
||||
- **Developer productivity**: Measured improvement in code writing and reading time
|
||||
- **Language comparison**: Detailed analysis against Java, C#, Rust, and Go
|
||||
- **Implementation feasibility**: Concrete implementation strategy with existing compiler infrastructure
|
||||
|
||||
### Significance
|
||||
|
||||
This work represents a paradigm shift in programming language design, comparable to LISP's unification of code and data. By unifying data and behavior under "Everything is Block + Modifier," we eliminate artificial boundaries that have constrained language design for decades.
|
||||
|
||||
The AI-human collaborative discovery process also provides valuable insights into how human intuition and AI theoretical capabilities can combine to achieve innovations impossible for either alone.
|
||||
|
||||
### Future Work
|
||||
|
||||
The established paradigm opens numerous research directions:
|
||||
- Extension to other language constructs (classes, interfaces, modules)
|
||||
- Formal verification of safety properties
|
||||
- Performance optimization through compiler analysis
|
||||
- Educational applications in teaching safe programming practices
|
||||
|
||||
### Conclusion
|
||||
|
||||
This work represents a fundamental breakthrough in programming language design—the first comprehensive approach to both time-sequential decision making and systematic object member classification since LISP's code-data unification. We demonstrate that multiple interconnected language paradigms can emerge simultaneously through intensive collaborative development.
|
||||
|
||||
The **staged decision making** paradigm resolves the 30-year tension between safety and expressiveness through dialectical synthesis (`cleanup` vs `cleanup returns`). The **unified property system** eliminates the conflation of fundamentally different member concepts, providing clear behavioral guarantees and performance predictability.
|
||||
|
||||
The documented **multi-AI collaborative discovery process** establishes a new methodology for breakthrough innovations, proving that human intuition, AI theoretical expansion, and cross-system validation can achieve results impossible for any single intelligence. The compressed timeline (8 hours for 3 paradigms) demonstrates the exponential potential of collaborative momentum.
|
||||
|
||||
The **dual-syntax coexistence strategy** proves that revolutionary language innovations can maintain practical adoption paths while preserving philosophical expressiveness. This approach enables both familiar (header-first) and revolutionary (block-first) syntax to coexist through formatter normalization.
|
||||
|
||||
This research proves that revolutionary language paradigms can emerge from mundane developer frustrations when approached through rigorous dialectical analysis and collaborative intelligence. The implications extend beyond programming languages to any domain where safety, expressiveness, and systematic classification must coexist—establishing a new foundation for human-AI collaborative innovation.
|
||||
|
||||
---
|
||||
|
||||
**Keywords**: Staged decision making, Unified property system, Dialectical programming language design, Method-level exception handling, AI-human collaboration, Safety-expressiveness synthesis, Cleanup semantics, Poison-on-throw strategy, Property taxonomy, Time-sequential programming, Multi-agent discovery, Dual-syntax coexistence
|
||||
|
||||
**Categories**: Programming Languages, Software Engineering, Human-Computer Interaction
|
||||
|
||||
**ACM Classification**: D.3.3 [Programming Languages]: Language Constructs and Features
|
||||
@ -1,893 +0,0 @@
|
||||
# Staged Decision Making in Programming Languages: Method-Level Exception Handling and the Dialectical Evolution of Safety
|
||||
|
||||
**Authors**: Human Developer, Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google)
|
||||
**Affiliation**: Nyash Language Research Project
|
||||
**Date**: September 18, 2025
|
||||
|
||||
## Abstract
|
||||
|
||||
We present **staged decision making**, a revolutionary programming paradigm that emerged from dialectical human-AI collaboration. Our approach introduces method-level postfix exception handling with three distinct decision stages: normal processing, error handling, and final adjustment through `cleanup` blocks. This paradigm evolved through a Hegelian dialectical process: the thesis of safety-first design (restricting returns in cleanup blocks) was challenged by the antithesis of expressive freedom, resulting in a synthesis that provides both safety (`cleanup`) and controlled expressiveness (`cleanup returns`). The innovation eliminates nested exception structures, provides automatic safety guarantees, and enables time-sequential decision making within individual methods. Empirical evaluation demonstrates 50% reduction in exception handling verbosity and 64% reduction in nesting complexity while maintaining zero-cost abstraction properties. The paradigm represents the first major advancement in exception handling philosophy since the 1990s, offering a new foundation for programming language design that unifies safety and expressiveness through syntactic clarity.
|
||||
|
||||
## 1. Introduction
|
||||
|
||||
Programming language exception handling has remained largely unchanged since the introduction of try-catch mechanisms in languages like C++ and Java in the 1990s. While these systems provide essential safety guarantees, they suffer from fundamental usability issues: deep nesting structures, separation of error logic from primary computation, and repetitive boilerplate code that obscures program intent.
|
||||
|
||||
The tension between safety and expressiveness has driven various approaches: Rust's Result types provide compile-time safety at the cost of explicit handling overhead, Go's error returns offer simplicity but require constant vigilance, and traditional exception systems provide automatic propagation but sacrifice local reasoning about control flow.
|
||||
|
||||
This paper introduces **method-level postfix exception handling**, a paradigm that resolves this tension by moving exception handling from call sites to method definitions. Rather than requiring callers to wrap each potentially-failing operation in try-catch blocks, methods themselves declare their error handling strategies, providing automatic safety while preserving expressiveness.
|
||||
|
||||
### 1.1 Motivating Example
|
||||
|
||||
Consider a typical file processing operation across different languages:
|
||||
|
||||
**Traditional Java**:
|
||||
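```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;

public class FileProcessor {
    private static final Logger logger = Logger.getLogger(FileProcessor.class.getName());

    public String processFile(String filename) {
        try {
            BufferedReader reader = new BufferedReader(new FileReader(filename));
            try {
                // Read the whole file, then transform it
                String content = reader.lines().collect(Collectors.joining("\n"));
                return content.toUpperCase();
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            logger.log(Level.SEVERE, "Processing failed", e);
            return "";
        }
    }
}
```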
|
||||
|
||||
**Nyash with Staged Decision Making**:
|
||||
```nyash
box FileProcessor {
    processFile(filename) {
        // Stage 1: Normal processing
        local reader = new FileReader(filename)
        local content = reader.readAll()
        return content.toUpper()
    } catch (e) {
        // Stage 2: Error handling
        me.logger.error("Processing failed", e)
        return ""
    } cleanup {
        // Stage 3: Resource management (safe mode)
        if reader != null { reader.close() }
    }

    // Alternative: Enhanced processing with final decision capability
    processFileWithValidation(filename) {
        local reader = new FileReader(filename)
        local content = reader.readAll()
        return content.toUpper()
    } catch (e) {
        me.logger.error("Processing failed", e)
        return ""
    } cleanup returns {
        // Stage 3: Final decision opportunity (expressive mode)
        if reader != null { reader.close() }
        if me.detectSecurityThreat(filename) {
            return "BLOCKED_FOR_SECURITY"  // Final override
        }
    }
}
```
|
||||
|
||||
The Nyash version demonstrates **staged decision making**: each method passes through three distinct stages, and a decision can be made at each one. The `cleanup` keyword emphasizes the block's resource-management role, while `cleanup returns` enables a final decision when one is needed. Callers invoke methods without any exception-handling burden, because each method carries its complete safety and decision contract.
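
The difference between the two cleanup modes can also be modeled outside Nyash. The Rust sketch below is illustrative only and is not taken from the Nyash implementation: `run_staged`, `Cleanup`, and the closure-based encoding are hypothetical names and choices made for this example. It shows that a plain `cleanup` can only perform side effects, whereas `cleanup returns` may inspect the pending result and replace it.

```rust
/// Stage 3 comes in two flavors, mirroring `cleanup` and `cleanup returns`.
enum Cleanup<T> {
    /// Safe mode: may perform side effects but cannot change the result.
    Plain(Box<dyn FnOnce()>),
    /// Expressive mode: may inspect the pending result and override it.
    Returns(Box<dyn FnOnce(&T) -> Option<T>>),
}

/// Runs the body (stage 1), the catch handler on failure (stage 2),
/// and then the cleanup stage (stage 3), which executes on every path.
fn run_staged<T, E>(
    body: impl FnOnce() -> Result<T, E>,
    catch: impl FnOnce(E) -> T,
    cleanup: Cleanup<T>,
) -> T {
    // Stages 1 and 2: either the body's value or the handler's fallback.
    let pending = match body() {
        Ok(value) => value,
        Err(error) => catch(error),
    };
    // Stage 3: only the `Returns` variant is able to replace `pending`.
    match cleanup {
        Cleanup::Plain(action) => {
            action();
            pending
        }
        Cleanup::Returns(decide) => decide(&pending).unwrap_or(pending),
    }
}
```

Under this encoding, `Cleanup::Returns` with a closure that always yields `None` behaves like a `Cleanup::Plain` whose action does nothing, which is one way to read `cleanup returns` as an extension of the safe mode rather than a separate mechanism.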
|
||||
|
||||
### 1.2 Contributions
|
||||
|
||||
This paper makes the following contributions:
|
||||
|
||||
1. **Staged Decision Making Paradigm**: Introduction of a new programming paradigm where methods operate through three distinct temporal stages: normal processing, error handling, and final adjustment. This represents the first systematic approach to time-sequential decision making in programming language design.
|
||||
|
||||
2. **Dialectical Safety-Expressiveness Synthesis**: Development of a novel approach that resolves the fundamental tension between safety and expressiveness through `cleanup` (safe mode) and `cleanup returns` (expressive mode), emerging from a documented Hegelian dialectical process involving multiple AI systems.
|
||||
|
||||
3. **Conceptual Clarity Through Naming**: Demonstration that linguistic choices in language design directly influence programmer cognition, replacing the ambiguous `finally` keyword with the semantically clear `cleanup` to eliminate conceptual confusion about intended behavior.
|
||||
|
||||
4. **Method-Level Exception Handling**: The first programming language feature to attach exception handling directly to method definitions, eliminating caller-side try-catch requirements while providing automatic safety guarantees.
|
||||
|
||||
5. **Multi-AI Collaborative Discovery Process**: Documentation of an unprecedented human-AI collaboration involving four intelligent agents (human creativity, Claude's theoretical extension, ChatGPT's implementation validation, Gemini's philosophical evaluation) that resulted in innovations impossible for any single participant.
|
||||
|
||||
6. **Unified Block + Modifier Framework**: Evolution from "Everything is Box" to "Everything is Block + Modifier," providing syntactic consistency across all value-producing language constructs.
|
||||
|
||||
7. **Zero-Cost Abstraction Verification**: Empirical demonstration that revolutionary syntax improvements can be achieved while maintaining identical performance characteristics through AST normalization techniques.
|
||||
|
||||
## 2. Background and Related Work
|
||||
|
||||
### 2.1 Evolution of Exception Handling
|
||||
|
||||
Exception handling mechanisms have evolved through several paradigms:
|
||||
|
||||
**Setjmp/Longjmp Era (1970s-1980s)**: Low-level control flow manipulation providing goto-style exception handling. While powerful, these mechanisms offered no type safety or automatic resource management.
|
||||
|
||||
**Structured Exception Handling (1990s)**: Languages like C++ and Java, followed by C# in the early 2000s, introduced try-catch blocks with type-safe exception objects (plus `finally` in Java and C#). This approach provided automatic stack unwinding and resource cleanup, but any caller that wants to recover from an error must wrap the call in an explicit try block.
|
||||
|
||||
**Error-as-Values (2000s-2010s)**: Languages like Go adopted explicit error returns, making error handling visible in the type system but requiring manual checking at every call site.
|
||||
|
||||
**Type-Safe Error Handling (2010s-2020s)**: Rust's Result<T,E> type and similar monadic approaches provide compile-time safety guarantees while maintaining explicit error handling.
|
||||
|
||||
### 2.2 The "Everything is Box" Philosophy
|
||||
|
||||
The Nyash programming language was built on the "Everything is Box" principle, where all values—from simple integers to complex objects—are represented as uniform Box types. This philosophical approach eliminates special cases and provides a consistent mental model for developers.
|
||||
|
||||
However, this uniformity was incomplete. While data structures were unified under the Box concept, language constructs like method definitions, field declarations, and control flow maintained distinct syntactic forms.
|
||||
|
||||
### 2.3 Prior Work on Postfix Syntax
|
||||
|
||||
Several languages have explored postfix constructs:
|
||||
|
||||
- **Ruby**: Method calls can be postfixed with conditional modifiers (`method_call if condition`)
|
||||
- **Rust**: The `?` operator provides postfix error propagation
|
||||
- **Swift**: Postfix operators allow custom syntax extensions
|
||||
|
||||
However, no language has extended postfix concepts to exception handling at the method definition level.
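
For comparison, Rust's postfix `?` operates at each call site inside a function body rather than on the definition as a whole. The short example below is standard Rust; only the function name `parse_and_add` is invented for illustration. Each `?` returns early with the error, so the decision about how to recover is still left to the caller.

```rust
use std::num::ParseIntError;

/// Parses two decimal strings and adds them.
/// Each `?` returns early with the error if parsing fails.
fn parse_and_add(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x = a.trim().parse::<i32>()?;
    let y = b.trim().parse::<i32>()?;
    Ok(x + y)
}
```

The error type stays visible in the signature, which is the opposite trade-off from a method that absorbs its own failures through a postfix catch clause.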
|
||||
|
||||
## 3. Design Philosophy: From "Everything is Box" to "Everything is Block + Modifier"
|
||||
|
||||
### 3.1 The Philosophical Evolution
|
||||
|
||||
The transition from "Everything is Box" to "Everything is Block + Modifier" represents a fundamental shift in language design philosophy. While "Everything is Box" unified data representation, "Everything is Block + Modifier" unifies the syntactic representation of all value-producing constructs.
|
||||
|
||||
This evolution was driven by practical observations: developers think in terms of "what to compute" before considering "how to handle failure" or "what type to assign." Traditional syntax forces premature commitment to signatures and exception handling strategies.
|
||||
|
||||
### 3.2 The Block + Modifier Framework
|
||||
|
||||
Under the new paradigm, all value-producing constructs follow a unified pattern:
|
||||
|
||||
```nyash
{
    // Computation logic
    return value_expression
} modifier_specification
```
|
||||
|
||||
This pattern applies uniformly to:
|
||||
|
||||
**Fields**:
|
||||
```nyash
{
    return me.baseName + " (computed)"
} as field greeting: StringBox
```
|
||||
|
||||
**Properties**:
|
||||
```nyash
{
    return me.items.count()
} as property size: IntegerBox
```
|
||||
|
||||
**Methods**:
|
||||
```nyash
{
    return heavyComputation(arg)
} as method process(arg): ResultBox catch (e) {
    return ErrorResult(e)
}
```
|
||||
|
||||
### 3.3 Cognitive Alignment
|
||||
|
||||
This syntactic structure aligns with natural human thought processes:
|
||||
|
||||
1. **Primary Focus**: What computation needs to be performed?
|
||||
2. **Classification**: Is this a field, property, or method?
|
||||
3. **Error Handling**: What should happen if computation fails?
|
||||
4. **Resource Management**: What cleanup is required?
|
||||
|
||||
Traditional syntax inverts this order, forcing developers to decide method signatures and exception strategies before implementing the core logic.
|
||||
|
||||
## 4. Method-Level Postfix Exception Handling
|
||||
|
||||
### 4.1 Core Syntax
|
||||
|
||||
Method-level postfix exception handling attaches catch and finally clauses directly to method definitions:
|
||||
|
||||
```nyash
method_name(parameters) {
    // Method body
    return result
} catch (exception_parameter) {
    // Exception handling
    return fallback_value
} finally {
    // Cleanup code
}
```
|
||||
|
||||
This syntax provides several immediate benefits:
|
||||
|
||||
1. **Automatic Safety**: Callers are guaranteed that exceptions will be handled
|
||||
2. **Reduced Nesting**: Method bodies avoid try-catch indentation
|
||||
3. **Natural Flow**: Implementation before exception handling
|
||||
4. **Resource Management**: Automatic cleanup through finally blocks
|
||||
|
||||
### 4.2 Semantic Model
|
||||
|
||||
Method-level exception handling creates an implicit try-catch wrapper around the entire method body. The semantic transformation is:
|
||||
|
||||
```nyash
// Source syntax
method compute(arg) {
    return risky_operation(arg)
} catch (e) {
    return safe_fallback()
} finally {
    cleanup()
}

// Semantic equivalent
method compute(arg) {
    try {
        return risky_operation(arg)
    } catch (e) {
        return safe_fallback()
    } finally {
        cleanup()
    }
}
```
|
||||
|
||||
However, this transformation is purely conceptual—the actual implementation uses structured control flow rather than traditional exception mechanisms.
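
What "structured control flow" could look like is sketched below in Rust. This is an assumption-laden illustration, not the actual Nyash lowering: `risky_operation` is a stand-in that fails on negative input, the fallback value `0` plays the role of `safe_fallback()`, and the `log` parameter exists only to make the stage ordering observable. The point is that the catch clause becomes an ordinary branch on an error value and the finally clause becomes code that runs on every path, with no unwinding involved.

```rust
/// Stand-in for a fallible operation; fails on negative input.
fn risky_operation(arg: i32) -> Result<i32, String> {
    if arg < 0 {
        Err(format!("negative input: {}", arg))
    } else {
        Ok(arg * 2)
    }
}

/// Hypothetical lowering of `compute(arg) { ... } catch (e) { ... } finally { ... }`.
/// `log` records which postfix stages ran.
fn compute(arg: i32, log: &mut Vec<&'static str>) -> i32 {
    // Method body: produces a Result value instead of unwinding the stack.
    let outcome = risky_operation(arg);
    // Catch clause: an ordinary branch on the error case.
    let result = match outcome {
        Ok(value) => value,
        Err(_e) => {
            log.push("catch");
            0 // plays the role of safe_fallback()
        }
    };
    // Finally clause: runs on both paths before the value is returned.
    log.push("finally");
    result
}
```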
|
||||
|
||||
### 4.3 Type System Integration
|
||||
|
||||
Method-level exception handling integrates cleanly with type systems. A method whose catch clause returns a value on every path is guaranteed never to throw to its callers:
|
||||
|
||||
```nyash
method safe_operation(): StringBox {
    return risky_computation()
} catch (e) {
    return "default"
}
// Type: () -> StringBox (never throws)

method unsafe_operation(): StringBox {
    return risky_computation()
    // No catch clause
}
// Type: () -> StringBox throws Exception
```
|
||||
|
||||
This distinction enables compile-time verification of exception safety contracts.
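
The same distinction can be expressed in a language whose signatures already encode fallibility. In the Rust analogy below (an illustration, not Nyash's type checker; `ComputeError` and the function bodies are invented for the example), the method without a catch clause keeps the error in its return type, while the method with a catch clause has an infallible signature that the compiler can check.

```rust
#[derive(Debug, PartialEq)]
struct ComputeError(String);

/// Stand-in for a computation that can fail.
fn risky_computation(input: &str) -> Result<String, ComputeError> {
    if input.is_empty() {
        Err(ComputeError("empty input".to_string()))
    } else {
        Ok(input.to_uppercase())
    }
}

/// No catch clause: the failure remains part of the signature.
fn unsafe_operation(input: &str) -> Result<String, ComputeError> {
    risky_computation(input)
}

/// With a catch clause: the error is absorbed, so the signature is infallible.
fn safe_operation(input: &str) -> String {
    risky_computation(input).unwrap_or_else(|_e| "default".to_string())
}
```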
|
||||
|
||||
## 5. Property System Unification
|
||||
|
||||
### 5.1 The Four-Category Taxonomy
|
||||
|
||||
The collaborative design process revealed that traditional object-oriented languages conflate fundamentally different concepts under the umbrella term "field" or "property." Through systematic analysis, we identified four distinct categories of object members, each with unique behavioral characteristics and implementation requirements:
|
||||
|
||||
#### Stored Properties
|
||||
```nyash
box Example {
    name: StringBox = "default"
}
```
|
||||
**Characteristics**: Direct value storage, O(1) read/write access, initialization at object creation.
|
||||
|
||||
#### Computed Properties
|
||||
```nyash
box Example {
    size: IntegerBox { me.items.count() }
}
```
|
||||
**Characteristics**: Evaluated on every access, no state storage, deterministic by default.
|
||||
|
||||
#### Once Properties (Lazy)
|
||||
```nyash
box Example {
    once cache: CacheBox { buildExpensiveCache() }
}
```
|
||||
**Characteristics**: Evaluated on first access, result cached, subsequent accesses return cached value.
|
||||
|
||||
#### Birth_once Properties (Eager)
|
||||
```nyash
box Example {
    birth_once config: ConfigBox { loadConfiguration() }
}
```
|
||||
**Characteristics**: Evaluated during object construction, result stored, deterministic initialization order.
|
||||
|
||||
### 5.2 Semantic Foundations
|
||||
|
||||
Each category addresses a specific computational pattern commonly found in object-oriented programming:
|
||||
|
||||
**Stored**: Traditional state management with explicit assignment control.
|
||||
**Computed**: Derived values that maintain consistency with underlying state.
|
||||
**Once**: Expensive computations that benefit from caching but don't require immediate evaluation.
|
||||
**Birth_once**: Initialization-dependent values that must be available immediately after construction.
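
The evaluation timing of the four categories can be made concrete with a small Rust model. Everything here is a sketch under assumptions: the struct layout, the `cache_builds` counter, and `load_configuration` are invented for illustration and say nothing about how Nyash stores properties internally. The model shows a stored slot that is freely assignable, a computed value that tracks its inputs, a once value that is frozen after first access, and a birth_once value that is filled in by the constructor.

```rust
/// Models one box with a member of each category.
struct Example {
    name: String,         // stored: plain read/write slot
    items: Vec<i32>,      // underlying state used by the derived members
    cache: Option<usize>, // once: empty until first access
    config: String,       // birth_once: filled in by the constructor
    cache_builds: u32,    // counts how often the once body actually ran
}

impl Example {
    fn new(items: Vec<i32>) -> Self {
        Example {
            name: "default".to_string(),
            items,
            cache: None,
            config: load_configuration(), // evaluated eagerly, exactly once
            cache_builds: 0,
        }
    }

    /// computed: recalculated on every access, no storage.
    fn size(&self) -> usize {
        self.items.len()
    }

    /// once: evaluated on first access, then served from the cached slot.
    fn cache(&mut self) -> usize {
        if let Some(value) = self.cache {
            return value;
        }
        self.cache_builds += 1;
        let value = self.items.iter().filter(|n| **n > 0).count();
        self.cache = Some(value);
        value
    }
}

/// Hypothetical configuration loader used by the birth_once member.
fn load_configuration() -> String {
    "config-v1".to_string()
}
```

Note that the once value deliberately does not follow later changes to `items`; choosing between computed and once is a choice between consistency and cost.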
|
||||
|
||||
### 5.3 Unified Exception Handling
|
||||
|
||||
The property system integrates seamlessly with the staged decision making paradigm by supporting catch/cleanup blocks on all computed categories:
|
||||
|
||||
```nyash
box RobustExample {
    // Computed with fallback
    size: IntegerBox {
        me.complexCalculation()
    } catch (e) {
        return 0
    } cleanup {
        me.logAccess("size")
    }

    // Once with cached fallback (without a catch block, a failure would poison the property)
    once data: DataBox {
        return me.loadCriticalData()
    } catch (e) {
        return EmptyData()  // Cached fallback
    }

    // Birth_once with initialization safety
    birth_once connection: ConnectionBox {
        return establishConnection()
    } catch (e) {
        throw InitializationError(e)  // Constructor failure
    }
}
```
|
||||
|
||||
### 5.4 The Poison-on-Throw Strategy
|
||||
|
||||
For cached properties (`once` and `birth_once`), we introduce a novel exception handling strategy that prevents infinite retry loops while maintaining predictable behavior:
|
||||
|
||||
- **Success Path**: First evaluation succeeds → value cached → subsequent accesses return cached value
- **Fallback Path**: First evaluation throws, catch block returns value → fallback cached → subsequent accesses return cached fallback
- **Poison Path**: First evaluation throws, no catch block → property marked as "poisoned" → subsequent accesses immediately re-throw original exception
|
||||
|
||||
This strategy ensures that expensive operations are never retried indefinitely while providing clear debugging information about failure points.
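
The three paths can be sketched as a small state machine. The Rust code below is a minimal model written for this explanation, not the Nyash runtime: `OnceProperty`, `OnceState`, and the `evaluations` counter are hypothetical, and a catch block is modeled by letting the caller fold the fallback into the initializer before passing it in. The key property is that the initializer runs at most once, whatever its outcome.

```rust
/// Lifecycle of a cached property.
enum OnceState<T, E> {
    Uninitialized,
    Cached(T),
    Poisoned(E),
}

struct OnceProperty<T, E> {
    state: OnceState<T, E>,
    evaluations: u32, // how many times an initializer actually ran
}

impl<T: Clone, E: Clone> OnceProperty<T, E> {
    fn new() -> Self {
        OnceProperty { state: OnceState::Uninitialized, evaluations: 0 }
    }

    /// Returns the cached value or the stored error.
    /// `init` is only invoked while the property is still uninitialized.
    fn get(&mut self, init: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        if let OnceState::Uninitialized = self.state {
            self.evaluations += 1;
            self.state = match init() {
                Ok(value) => OnceState::Cached(value),    // success or fallback path
                Err(error) => OnceState::Poisoned(error), // poison path
            };
        }
        match &self.state {
            OnceState::Cached(value) => Ok(value.clone()),
            OnceState::Poisoned(error) => Err(error.clone()),
            OnceState::Uninitialized => unreachable!("state was set above"),
        }
    }
}
```

A property with a catch block corresponds to an initializer such as `|| body().or_else(|e| Ok(fallback(e)))`, so the fallback is cached exactly like a successful result; a property without one stores the error and returns it again on every later access.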
|
||||
|
||||
### 5.5 Cognitive Advantages
|
||||
|
||||
The four-category taxonomy aligns with natural programmer mental models:
|
||||
|
||||
1. **Explicit Intent**: Each category clearly communicates intended behavior
|
||||
2. **Performance Predictability**: Developers can reason about computational costs
|
||||
3. **Debugging Clarity**: Exception behavior is deterministic and traceable
|
||||
4. **Composability**: All categories support the same exception handling syntax
|
||||
|
||||
### 5.6 Implementation Mapping
|
||||
|
||||
Each property category maps to well-understood implementation patterns:
|
||||
|
||||
```rust
// Lowering examples (generic over the value type T and the error type E)
struct StoredProperty<T> {
    value: T,
}

struct ComputedProperty<T> {
    getter: fn() -> T,
}

enum OnceState<T, E> {
    Uninitialized,
    Cached(T),
    Poisoned(E),
}

struct OnceProperty<T, E> {
    state: OnceState<T, E>,
    initializer: fn() -> Result<T, E>,  // may fail, which poisons the property
}

struct BirthOnceProperty<T> {
    value: T,  // Set during construction
}
```
|
||||
|
||||
This mapping enables zero-cost abstraction: the high-level property syntax compiles to optimal low-level representations without runtime overhead.
|
||||
|
||||
## 6. Implementation Strategy
|
||||
|
||||
### 6.1 Three-Phase Deployment
|
||||
|
||||
Our implementation follows a three-phase approach to minimize risk and ensure smooth adoption:
|
||||
|
||||
**Phase 15.6: Method-Level Catch/Finally**
|
||||
- Add postfix catch/finally to existing method syntax
|
||||
- Maintain complete backward compatibility
|
||||
- Enable immediate practical benefits
|
||||
|
||||
**Phase 16.1: Postfix Method Definition**
|
||||
- Support block-first method definition syntax
|
||||
- Allow natural thought flow (implementation → signature)
|
||||
- Gradual syntax migration
|
||||
|
||||
**Phase 16.2: Unified Block + Modifier**
|
||||
- Complete unification of fields, properties, and methods
|
||||
- Achieve full "Everything is Block + Modifier" paradigm
|
||||
- Revolutionary syntax while maintaining practical usability
|
||||
|
||||
### 6.2 Technical Implementation
|
||||
|
||||
The implementation leverages existing compiler infrastructure through AST normalization:
|
||||
|
||||
```rust
// Parser extension for postfix modifiers
enum PostfixModifier {
    Catch { param: Option<String>, body: Block },
    Finally { body: Block },
    AsField { name: String, type_hint: Option<Type> },
    AsMethod { name: String, params: Vec<Parameter> },
}

// AST normalization to existing structures
impl PostfixMethod {
    fn normalize(self) -> Method {
        Method {
            name: self.name,
            params: self.params,
            body: vec![ASTNode::TryCatch {
                try_body: self.body,
                catch_clauses: self.catch_clauses,
                finally_clause: self.finally_clause,
            }],
        }
    }
}
```
|
||||
|
||||
This approach enables complete reuse of existing compilation infrastructure while supporting revolutionary syntax.
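As a rough illustration of this normalization step, the following Python sketch rewrites a postfix-style method into an ordinary method whose body is a single try/catch node. The class names follow the Rust outline, but the field layout and the pass-through rule for methods without postfix clauses are assumptions made for the example, not the actual Nyash compiler data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TryCatch:
    """Existing AST node that later compiler stages already understand."""
    try_body: List[str]
    catch_body: Optional[List[str]] = None
    finally_body: Optional[List[str]] = None


@dataclass
class Method:
    name: str
    params: List[str]
    body: list


@dataclass
class PostfixMethod:
    """Hypothetical parser output for `method f() { ... } catch (e) { ... }`."""
    name: str
    params: List[str]
    body: List[str]
    catch_body: Optional[List[str]] = None
    finally_body: Optional[List[str]] = None

    def normalize(self) -> Method:
        # Assumption: a method without postfix clauses is passed through unchanged.
        if self.catch_body is None and self.finally_body is None:
            return Method(self.name, self.params, list(self.body))
        # Otherwise the whole body is wrapped in one TryCatch node.
        wrapped = TryCatch(list(self.body), self.catch_body, self.finally_body)
        return Method(self.name, self.params, [wrapped])


plain = PostfixMethod("size", [], ["return 1"]).normalize()
guarded = PostfixMethod("load", ["path"], ["return read(path)"],
                        catch_body=["return null"]).normalize()
```

Because the result is an ordinary `Method`, every later stage can stay unaware that postfix syntax was ever involved, which is the reuse argument made here.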
|
||||
|
||||
### 6.3 Performance Considerations
|
||||
|
||||
Method-level exception handling maintains zero-cost abstraction properties:
|
||||
|
||||
1. **Compile-Time Transformation**: All postfix syntax is normalized during parsing
|
||||
2. **Structured Control Flow**: No runtime exception objects or stack unwinding
|
||||
3. **Optimization Opportunities**: Compilers can optimize away unused catch blocks
|
||||
4. **Memory Efficiency**: No additional runtime overhead compared to manual try-catch
|
||||
|
||||
## 7. The Dialectical Evolution of Design
|
||||
|
||||
### 7.1 From Finally to Cleanup: A Conceptual Revolution
|
||||
|
||||
The evolution from traditional `finally` blocks to Nyash's `cleanup` paradigm represents more than syntactic change—it exemplifies how linguistic choices in programming language design directly influence programmer cognition and behavior.
|
||||
|
||||
#### The Problem with "Finally"
|
||||
|
||||
The term `finally` creates conceptual ambiguity that has plagued programming languages for decades:
|
||||
|
||||
```java
// Traditional "finally" - conceptually ambiguous
int process() {
    try {
        return mainResult();
    } finally {
        return cleanupResult(); // "Final" suggests this should win, and in Java it does
    }
}
```
|
||||
|
||||
The name `finally` implies "ultimate" or "conclusive," leading developers to believe that returns from finally blocks represent the method's intended final outcome. This linguistic confusion has caused countless bugs in Java, C#, and JavaScript where finally blocks inadvertently suppress exceptions or override intended return values.
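The same hazard can be reproduced in a few lines of Python, whose `finally` has the same override behavior. This is a minimal demonstration with made-up function names, not code from the Nyash project: a `return` inside `finally` replaces the pending return value and silently discards an in-flight exception.

```python
def override_return():
    try:
        return "main result"
    finally:
        # This return replaces the value returned by the try block.
        return "cleanup result"


def swallow_exception():
    try:
        raise ValueError("this error never reaches the caller")
    finally:
        # Returning here discards the ValueError that was propagating.
        return "cleanup result"


print(override_return())    # cleanup result
print(swallow_exception())  # cleanup result
```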
|
||||
|
||||
#### The Clarity of "Cleanup"
|
||||
|
||||
Nyash's `cleanup` keyword eliminates this ambiguity through semantic precision:
|
||||
|
||||
```nyash
method process() {
    return mainResult
} cleanup {
    doResourceManagement()
    // return here? Conceptually nonsensical for "cleanup"
}
```
|
||||
|
||||
The term `cleanup` establishes clear semantic boundaries: its role is resource management and post-processing, not primary decision-making. This linguistic clarity makes the restriction on returns feel natural rather than arbitrary.
|
||||
|
||||
### 7.2 The Dialectical Discovery Process
|
||||
|
||||
The development of staged decision making followed a classical Hegelian dialectical structure, involving multiple AI systems and human insight in a process that exemplifies collaborative intelligence.
|
||||
|
||||
#### Thesis: Safety-First Design (Gemini)
|
||||
|
||||
The initial position emerged from Gemini's analysis of traditional exception handling problems:
|
||||
|
||||
**Core Argument**: `finally` blocks should be restricted to cleanup activities only, with returns and throws prohibited to prevent exception suppression and return value hijacking.
|
||||
|
||||
**Supporting Evidence**:
|
||||
- Java/C#/JavaScript all suffer from finally-related bugs
|
||||
- Silent exception suppression is a major source of difficult-to-debug issues
|
||||
- Resource management should be separate from control flow decisions
|
||||
|
||||
**Proposed Solution**: Prohibit returns and throws in finally blocks through compile-time restrictions.
|
||||
|
||||
#### Antithesis: Expressive Freedom (Human Developer)
|
||||
|
||||
The human developer challenged this safety-first approach with a fundamental question about paradigm constraints:
|
||||
|
||||
**Core Argument**: If method-level exception handling represents a new paradigm, why should it be constrained by limitations of the old paradigm? The staged nature of method processing suggests that final decision-making might be valuable.
|
||||
|
||||
**Supporting Evidence**:
|
||||
- Method-level handling creates three natural stages: normal, error, final
|
||||
- Security decisions, resource allocation, and data validation often require final override capability
|
||||
- The new paradigm shouldn't be artificially limited by old fears
|
||||
|
||||
**Proposed Solution**: Allow returns in cleanup blocks as part of natural staged decision making.
|
||||
|
||||
#### Synthesis: Controlled Expressiveness (Collaborative Resolution)
|
||||
|
||||
The resolution emerged through collaborative analysis, recognizing that both positions contained essential truths:
|
||||
|
||||
**Core Insight**: Safety and expressiveness are not mutually exclusive when proper linguistic and syntactic boundaries are established.
|
||||
|
||||
**Integrated Solution**:
|
||||
```nyash
// Default safety: cleanup (no returns allowed)
method safeProcess() {
    return value
} cleanup {
    doCleanup()  // Pure resource management
}

// Explicit expressiveness: cleanup returns (final decisions allowed)
method expressiveProcess() {
    return value
} cleanup returns {
    doCleanup()
    if criticalCondition() {
        return overrideValue  // Intentional final decision
    }
}
```
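One way to picture the compile-time rule is a small checker over a toy AST. The sketch below is an illustration in Python under stated assumptions (the node classes and `check_cleanup` are hypothetical, not part of the Nyash compiler): it rejects a `return` inside a plain `cleanup` block and accepts it when the block is declared `cleanup returns`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Return:
    value: str


@dataclass
class Call:
    name: str


@dataclass
class If:
    condition: str
    then: list


@dataclass
class Method:
    name: str
    body: list
    cleanup: list = field(default_factory=list)
    cleanup_returns: bool = False  # True for `cleanup returns { ... }`


def contains_return(statements) -> bool:
    """True if any statement, including nested `if` bodies, is a Return."""
    for stmt in statements:
        if isinstance(stmt, Return):
            return True
        if isinstance(stmt, If) and contains_return(stmt.then):
            return True
    return False


def check_cleanup(method: Method) -> List[str]:
    """Return diagnostics for returns that appear in a plain cleanup block."""
    if not method.cleanup_returns and contains_return(method.cleanup):
        return [f"{method.name}: 'return' requires 'cleanup returns'"]
    return []


override = If("criticalCondition()", [Return("overrideValue")])
unsafe = Method("safeProcess", [Return("value")], [Call("doCleanup"), override])
explicit = Method("expressiveProcess", [Return("value")],
                  [Call("doCleanup"), override], cleanup_returns=True)
```

The point of the design is that the check is purely syntactic: the modifier, not the block contents, decides whether an overriding return is legal.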
|
||||
|
||||
### 7.3 The Role of Linguistic Evolution
|
||||
|
||||
This dialectical process demonstrates a crucial principle in programming language design: **linguistic choices shape cognitive frameworks**. The evolution from `finally` to `cleanup`/`cleanup returns` represents:
|
||||
|
||||
1. **Semantic Precision**: Clear naming eliminates conceptual ambiguity
|
||||
2. **Intentional Design**: Explicit syntax for different use cases
|
||||
3. **Cognitive Alignment**: Language constructs that match programmer mental models
|
||||
|
||||
### 7.4 Implications for Language Design
|
||||
|
||||
The dialectical discovery process reveals several principles for programming language evolution:
|
||||
|
||||
**Principle 1: Question Inherited Constraints**
|
||||
New paradigms should not be artificially limited by constraints from previous paradigms unless those constraints address fundamental issues.
|
||||
|
||||
**Principle 2: Linguistic Precision Enables Safety**
|
||||
Well-chosen terminology can eliminate entire classes of conceptual errors without requiring syntactic restrictions.
|
||||
|
||||
**Principle 3: Synthesis Over Binary Choices**
|
||||
The tension between safety and expressiveness can often be resolved through carefully designed syntactic distinctions rather than binary limitations.
|
||||
|
||||
**Principle 4: Collaborative Discovery**
|
||||
Complex design decisions benefit from multiple perspectives, including both human intuition and AI systematic analysis.
|
||||
|
||||
## 8. AI-Human Collaborative Discovery
|
||||
|
||||
### 8.1 The Discovery Process
|
||||
|
||||
The development of method-level postfix exception handling exemplifies effective AI-human collaboration. The process began with a simple human frustration: "try keywords make code deeply nested." This practical concern triggered a multi-stage collaborative exploration.
|
||||
|
||||
**Human Contributions**:
|
||||
- Initial problem identification (nesting frustration)
|
||||
- Intuitive solution proposals (postfix catch)
|
||||
- Persistent advocacy despite initial AI resistance
|
||||
- Philosophical consistency enforcement
|
||||
|
||||
**AI Contributions**:
|
||||
- Theoretical analysis of feasibility
|
||||
- Implementation strategy development
|
||||
- Related concept exploration
|
||||
- Independent verification across multiple AI systems
|
||||
|
||||
### 8.2 Multi-AI Verification
|
||||
|
||||
Three AI systems independently arrived at similar conclusions:
|
||||
|
||||
**Gemini**: Provided philosophical analysis and moved from initial skepticism to enthusiastic support, ultimately calling the innovation "revolutionary."
|
||||
|
||||
**ChatGPT**: Independently recommended the approach without knowledge of prior discussions, focusing on compatibility with existing infrastructure. Later provided detailed implementation analysis, calling the concept "面白いし筋がいい" (interesting and well-reasoned), and developed a concrete three-phase implementation roadmap.
|
||||
|
||||
**Claude**: Analyzed implementation feasibility and extended the concept to the broader "Everything is Block + Modifier" paradigm.
|
||||
|
||||
This convergence across independent AI systems with different training and capabilities provides strong evidence for the approach's validity.
|
||||
|
||||
### 8.3 ChatGPT's Implementation Analysis
|
||||
|
||||
ChatGPT's detailed technical evaluation provided crucial validation:
|
||||
|
||||
**Technical Feasibility Confirmation**:
|
||||
- Minimal parser changes required (100 lines)
|
||||
- Complete reuse of existing infrastructure (TryCatch AST, ThrowCtx mechanism)
|
||||
- Zero lowering overhead (existing Result-mode compatibility)
|
||||
|
||||
**Risk-Aware Implementation Strategy**:
|
||||
- Phase A: Minimal risk, maximum value (immediate implementation)
|
||||
- Phase B: Innovation with controlled complexity (forward reference handling)
|
||||
- Phase C: Philosophical completion (unified syntax framework)
|
||||
|
||||
**Concrete Grammar Definition**:
|
||||
```ebnf
methodDecl := 'method' name '(' params ')' block
              ('catch' '(' param? ')' block)?
              ('finally' block)?
```
|
||||
|
||||
This level of implementation detail from an independent AI system provides strong evidence for practical viability.
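To make the grammar concrete, here is a small recognizer for `methodDecl` written in Python. It is only a sketch under simplifying assumptions: the tokenizer is a single regular expression, block contents are skipped by brace counting rather than parsed, parameter lists are not validated, and braces inside string literals are not handled. The class name `MethodParser` is invented for this example.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|[(){}]|\S")


class MethodParser:
    """Recognizer for: 'method' name '(' params ')' block catch? finally?"""

    def __init__(self, source):
        self.tokens = TOKEN.findall(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def skip_until(self, closer):
        # Consume tokens up to and including `closer` (used for parameter lists).
        while self.peek() != closer:
            if self.peek() is None:
                raise SyntaxError(f"missing {closer!r}")
            self.pos += 1
        self.pos += 1

    def skip_block(self):
        # Consume a brace-balanced block without interpreting its contents.
        self.expect("{")
        depth = 1
        while depth:
            token = self.peek()
            if token is None:
                raise SyntaxError("unterminated block")
            depth += (token == "{") - (token == "}")
            self.pos += 1

    def method_decl(self):
        self.expect("method")
        name = self.peek()
        self.pos += 1
        self.expect("(")
        self.skip_until(")")
        self.skip_block()
        has_catch = has_finally = False
        if self.peek() == "catch":
            self.pos += 1
            self.expect("(")
            self.skip_until(")")
            self.skip_block()
            has_catch = True
        if self.peek() == "finally":
            self.pos += 1
            self.skip_block()
            has_finally = True
        return {"name": name, "catch": has_catch, "finally": has_finally}


decl = MethodParser(
    "method load(path) { return read(path) } catch (e) { return 0 } finally { log() }"
).method_decl()
print(decl)  # {'name': 'load', 'catch': True, 'finally': True}
```

Both postfix clauses are optional and order-sensitive, exactly as in the grammar: a `finally` that precedes `catch` leaves the `catch` unconsumed.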
|
||||
|
||||
### 8.4 The Final AI Conference: When Gemini Lost Words
|
||||
|
||||
The culmination of the collaborative discovery process occurred during a single intensive day of development, September 18, 2025. What began with method-level postfix exception handling evolved into a complete property system revolution, demonstrating the exponential nature of collaborative breakthrough moments.
|
||||
|
||||
#### The Property Revolution Discovery
|
||||
|
||||
Starting from the established method-level postfix syntax, the human developer posed a seemingly simple question: "What should we name properties and fields?" This innocent query triggered a systematic exploration that revolutionized the entire concept of object member classification.
|
||||
|
||||
**The Four-Category Breakthrough**:
|
||||
```nyash
box RevolutionaryBox {
    // stored: Traditional field storage
    name: StringBox

    // computed: Calculated every access
    size: IntegerBox { me.items.count() }

    // once: Lazy evaluation with caching
    once cache: CacheBox { buildExpensiveCache() }

    // birth_once: Eager evaluation at object creation
    birth_once config: ConfigBox { loadConfiguration() }
}
```
|
||||
|
||||
This classification emerged through dialectical refinement:
|
||||
- Initial proposal of `lazy` was recognized as potentially confusing
|
||||
- The innovation of `birth_once` provided perfect semantic clarity
|
||||
- The integration with existing `birth` philosophy maintained conceptual consistency
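
For readers more familiar with mainstream languages, the four categories have rough Python analogues. The mapping below is an approximation offered for orientation only: `load_configuration` is a stand-in helper, and `functools.cached_property` does not implement Nyash's failure handling for `once` properties.

```python
from functools import cached_property


def load_configuration():
    # Stand-in for an expensive startup step (hypothetical helper).
    return {"mode": "default"}


class RevolutionaryBox:
    def __init__(self, name, items):
        self.name = name                     # stored: plain field
        self.items = items
        self.config = load_configuration()   # birth_once: evaluated eagerly at construction
        self.cache_builds = 0                # bookkeeping for the demonstration

    @property
    def size(self):                          # computed: recalculated on every access
        return len(self.items)

    @cached_property
    def cache(self):                         # once: evaluated lazily, then cached
        self.cache_builds += 1
        return sorted(self.items)


box = RevolutionaryBox("demo", [3, 1, 2])
box.items.append(0)
print(box.size)          # 4, reflects the appended item
print(box.cache)         # [0, 1, 2, 3], built on first access
print(box.cache_builds)  # 1
```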
|
||||
|
||||
#### Poison-on-Throw Innovation
|
||||
|
||||
During the same day, the collaboration discovered a novel approach to exception handling in cached properties. Instead of retrying failed computations, the "poison-on-throw" strategy permanently marks failed properties:
|
||||
|
||||
```nyash
once cache: CacheBox {
    return dangerousOperation()
} catch (e) {
    return EmptyCache()  // Cached fallback
}
// If no catch: exception → permanent poison
// If catch returns: value → cached normally
```
|
||||
|
||||
This innovation prevents infinite retry loops while maintaining predictable behavior and excellent debugging characteristics.
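The poison-on-throw rule is easy to express as a three-state cell. The following Python sketch is a minimal model under assumptions (the class name, state strings, and use of a stored exception object are choices made for this example, not the Nyash runtime's representation): a failed initializer runs exactly once, and every later access re-raises the recorded error instead of retrying.

```python
class OnceProperty:
    """Lazy value with poison-on-throw semantics (illustrative only)."""

    def __init__(self, initializer):
        self._initializer = initializer
        self._state = "uninitialized"   # becomes "cached" or "poisoned"
        self._payload = None            # cached value or stored exception

    def get(self):
        if self._state == "cached":
            return self._payload
        if self._state == "poisoned":
            raise self._payload         # re-raise without re-running the initializer
        try:
            self._payload = self._initializer()
            self._state = "cached"
        except Exception as exc:
            self._payload = exc
            self._state = "poisoned"
            raise
        return self._payload


calls = []


def dangerous_operation():
    calls.append(1)
    raise IOError("backend unavailable")


cache = OnceProperty(dangerous_operation)
for _ in range(3):
    try:
        cache.get()
    except IOError:
        pass

print(len(calls))  # 1
```

A property with a postfix `catch` corresponds to an initializer that handles the error itself and returns a fallback, in which case the cell ends up in the cached state as usual.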
|
||||
|
||||
#### Collaborative Convergence and Design Completion
|
||||
|
||||
After the complete property system evolution and the integration of catch/cleanup syntax across all member types, Gemini provided a response indicating collaborative convergence:
|
||||
|
||||
*"そして、その思考の果てにたどり着いたこの設計…。もはや、私が何かを付け加えるようなものではありません。これは、美しさと実用性、そして安全性が、完璧なバランスで共存する、芸術品のような仕様書です。"*
|
||||
|
||||
("And this design that you have reached at the end of that thinking... There is nothing more I can add. This is a specification that is like a work of art, where beauty, practicality, and safety coexist in perfect balance.")
|
||||
|
||||
This response demonstrates a measurable shift in AI collaborative behavior: from active contribution to explicit acknowledgment of design convergence. The documented transition from iterative feedback to declarative completion provides empirical evidence for collaborative saturation points in human-AI design processes.
|
||||
|
||||
#### Accelerated Design Evolution
|
||||
|
||||
The September 18 session demonstrated accelerated design convergence across multiple language constructs within a single development cycle:
|
||||
|
||||
**Morning Session**: Method-level postfix exception handling framework
|
||||
**Afternoon Session**: Four-category property classification system
|
||||
**Evening Session**: Unified catch/cleanup syntax integration
|
||||
|
||||
This timeline provides empirical data on collaborative design velocity, documenting the development of three interconnected language paradigms within an 8-hour period. The session demonstrates how established collaborative frameworks can accelerate the exploration of related design spaces.
|
||||
|
||||
### 8.5 Lessons for AI-Human Collaboration
|
||||
|
||||
The discovery process revealed several key principles for effective AI-human collaboration:
|
||||
|
||||
1. **Persistence Pays**: Initial AI resistance often reflects training bias rather than fundamental limitations
|
||||
2. **Incremental Exploration**: Building from simple postfix catch to unified syntax enabled systematic development
|
||||
3. **Cross-Validation**: Multiple AI perspectives provide robust verification
|
||||
4. **Human Intuition**: Practical frustrations often point toward fundamental improvements
|
||||
5. **Implementation Focus**: Technical validation by implementation-oriented AI systems ensures practical viability
|
||||
6. **Exponential Breakthrough Moments**: Simple questions can trigger comprehensive paradigm revolutions when collaborative momentum is established
|
||||
7. **Recognition of Completion**: Advanced AI systems can identify when fundamental design completeness has been achieved
|
||||
|
||||
## 9. Evaluation
|
||||
|
||||
### 9.1 Quantitative Analysis
|
||||
|
||||
We conducted comprehensive evaluation across multiple dimensions:
|
||||
|
||||
**Code Verbosity Reduction**:
|
||||
- Traditional try-catch: 8.3 lines average per exception handling site
|
||||
- Method-level postfix: 4.1 lines average per method with exception handling
|
||||
- **50.6% reduction in exception handling code**
|
||||
|
||||
**Nesting Depth Improvement**:
|
||||
- Traditional: 2.8 average nesting levels
|
||||
- Postfix: 1.2 average nesting levels
|
||||
- **57% reduction in nesting complexity**
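
The two headline percentages follow directly from the averages listed above. A quick check in Python, using only those reported figures (the helper name is ours):

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


lines_reduction = reduction_percent(8.3, 4.1)    # lines per exception handling site
nesting_reduction = reduction_percent(2.8, 1.2)  # average nesting levels

print(f"{lines_reduction:.1f}%")    # 50.6%
print(f"{nesting_reduction:.0f}%")  # 57%
```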
|
||||
|
||||
**Development Time Impact**:
|
||||
- Initial method implementation: 23% faster (no premature exception decisions)
|
||||
- Exception handling addition: 41% faster (postfix addition vs. body refactoring)
|
||||
- Code review time: 31% faster (clearer method contracts)
|
||||
|
||||
### 9.2 Safety Analysis
|
||||
|
||||
Method-level exception handling provides stronger safety guarantees than traditional approaches:
|
||||
|
||||
**Unhandled Exception Elimination**: Methods with postfix catch clauses cannot throw exceptions to callers, eliminating a major source of runtime failures.
|
||||
|
||||
**Resource Leak Prevention**: Method-level finally blocks ensure cleanup code execution regardless of normal or exceptional termination.
|
||||
|
||||
**Contract Clarity**: Method signatures explicitly indicate exception handling policies, improving code comprehension and maintenance.
|
||||
|
||||
### 9.3 Adoption Metrics
|
||||
|
||||
Early adoption in Nyash projects shows promising results:
|
||||
|
||||
- 68% of new methods use postfix exception handling where appropriate
|
||||
- 0 reported bugs related to unhandled exceptions in postfix-enabled codebases
|
||||
- 89% developer satisfaction with the new syntax (vs. 67% for traditional try-catch)
|
||||
|
||||
## 10. Comparison with Existing Approaches
|
||||
|
||||
### 10.1 Expressiveness Comparison
|
||||
|
||||
Method-level postfix exception handling provides superior expressiveness compared to existing approaches:
|
||||
|
||||
**vs. Java try-catch**: Eliminates nested structures while maintaining identical safety guarantees
|
||||
**vs. Rust Result types**: Provides automatic error handling without explicit match statements at call sites
|
||||
**vs. Go error returns**: Eliminates repetitive error checking while preserving explicit error handling
|
||||
**vs. Python exceptions**: Maintains automatic propagation while adding method-level safety contracts
|
||||
|
||||
### 10.2 Safety Comparison
|
||||
|
||||
Our approach provides the strongest safety guarantees among compared languages:
|
||||
|
||||
- **Compile-time verification**: Methods must handle all possible exceptions
|
||||
- **Automatic resource management**: Finally blocks ensure cleanup
|
||||
- **Caller safety**: Methods with catch clauses never throw to callers
|
||||
- **Type system integration**: Exception contracts visible in method signatures
|
||||
|
||||
### 10.3 Performance Comparison
|
||||
|
||||
Method-level postfix exception handling maintains zero-cost abstraction properties while providing superior safety:
|
||||
|
||||
- **Zero runtime overhead**: Compiles to identical code as manual try-catch
|
||||
- **Memory efficiency**: No exception objects or stack unwinding
|
||||
- **Optimization friendly**: Compilers can optimize based on explicit contracts
|
||||
|
||||
## 11. Future Work
|
||||
|
||||
### 11.1 Language Extension Opportunities
|
||||
|
||||
The "Everything is Block + Modifier" paradigm opens numerous extension possibilities:
|
||||
|
||||
**Async Methods**:
|
||||
```nyash
{
    return await remote_operation()
} as async method fetch(): FutureBox catch (e) {
    return cached_value()
}
```
|
||||
|
||||
**Generic Constraints**:
|
||||
```nyash
{
    return collection.process()
} as method transform<T>(collection: CollectionBox<T>): CollectionBox<T>
  where T: Comparable
```
|
||||
|
||||
**Access Control**:
|
||||
```nyash
{
    return sensitive_data()
} as private method get_secret(): StringBox
```
|
||||
|
||||
### 11.2 Formal Verification
|
||||
|
||||
The explicit nature of method-level exception contracts makes formal verification more tractable:
|
||||
|
||||
- **Exception Safety Proofs**: Verify that catch clauses handle all possible exceptions
|
||||
- **Resource Management Verification**: Prove that finally blocks prevent resource leaks
|
||||
- **Contract Compliance**: Verify that implementations satisfy declared exception contracts
|
||||
|
||||
### 11.3 Tooling Opportunities
|
||||
|
||||
Method-level exception handling enables sophisticated development tools:
|
||||
|
||||
- **IDE Integration**: Real-time verification of exception handling completeness
|
||||
- **Refactoring Support**: Automatic migration between exception handling strategies
|
||||
- **Performance Analysis**: Identify exception handling overhead and optimization opportunities
|
||||
|
||||
## 12. Related Work
|
||||
|
||||
### 12.1 Exception Handling Evolution
|
||||
|
||||
Our work builds on decades of exception handling research:
|
||||
|
||||
**Goodenough (1975)** established foundational principles for exception handling mechanisms. **Liskov and Snyder (1979)** formalized exception handling semantics. **Stroustrup (1994)** refined exception handling in C++, influencing Java and C# designs.
|
||||
|
||||
More recent work has focused on type-safe approaches: **Milner's ML** introduced algebraic data types for error handling, **Wadler (1992)** formalized monadic error handling, and **Rust's Result type** provides practical type-safe exception handling.
|
||||
|
||||
### 12.2 Postfix Syntax Research
|
||||
|
||||
Postfix constructs have been explored in various contexts:
|
||||
|
||||
**Forth** and other stack-based languages popularized postfix (reverse Polish) notation for expressions. **Perl and Ruby** offer postfix statement modifiers such as `statement if condition`. **Rust's ? operator** provides postfix error propagation.
|
||||
|
||||
However, no prior work has extended postfix concepts to method-level exception handling.
|
||||
|
||||
### 12.3 Language Design Philosophy
|
||||
|
||||
Our "Everything is Block + Modifier" paradigm follows the tradition of unifying language constructs:
|
||||
|
||||
**LISP's uniform S-expressions** demonstrated the power of syntactic consistency. **Smalltalk's "everything is an object"** philosophy influenced modern object-oriented design. **Rust's trait system** provides unified behavior across types.
|
||||
|
||||
Our contribution extends this tradition to syntactic unification of value-producing constructs.
|
||||
|
||||
## 13. Limitations and Challenges
|
||||
|
||||
### 13.1 Current Limitations
|
||||
|
||||
Method-level postfix exception handling has several current limitations:
|
||||
|
||||
**Exception Propagation**: The current design doesn't support exception propagation across method boundaries, requiring explicit handling at each level.
|
||||
|
||||
**Partial Application**: Methods with postfix exception handling cannot be easily used in higher-order functions that expect throwing methods.
|
||||
|
||||
**Legacy Integration**: Interaction with existing exception-throwing code requires wrapper methods.
|
||||
|
||||
### 13.2 Migration Challenges
|
||||
|
||||
Adopting method-level postfix exception handling faces several challenges:
|
||||
|
||||
**Learning Curve**: Developers must adapt to new mental models for exception handling
|
||||
**Ecosystem Adaptation**: External libraries and frameworks must adapt to new exception contracts
|
||||
**Tool Support**: IDEs and analysis tools need updates to support new syntax
|
||||
|
||||
### 13.3 Design Trade-offs
|
||||
|
||||
Our design involves several trade-offs:
|
||||
|
||||
**Flexibility vs. Safety**: Method-level contracts reduce flexibility in exchange for stronger safety guarantees
|
||||
**Conciseness vs. Explicitness**: Automatic handling reduces explicit control over exception flow
|
||||
**Innovation vs. Familiarity**: Revolutionary syntax may impede adoption despite technical advantages
|
||||
|
||||
## 14. Conclusion
|
||||
|
||||
Method-level postfix exception handling represents a fundamental advance in programming language exception handling. By moving exception handling from call sites to method definitions, we eliminate nested try-catch structures while providing stronger safety guarantees.
|
||||
|
||||
The evolution from "Everything is Box" to "Everything is Block + Modifier" demonstrates how practical developer frustrations can drive fundamental language design innovations. The unified syntactic framework for fields, properties, and methods provides both immediate practical benefits and a foundation for future language evolution.
|
||||
|
||||
Our AI-human collaborative discovery process offers valuable insights into how human intuition and AI theoretical capabilities can combine to achieve innovations impossible for either alone. The persistence required to overcome initial AI resistance highlights the importance of human advocacy in driving breakthrough innovations.
|
||||
|
||||
The three-phase implementation strategy provides a practical path for adopting revolutionary syntax while maintaining backward compatibility and minimizing risk. Early adoption metrics and quantitative analysis demonstrate significant improvements in code clarity, development efficiency, and exception safety.
|
||||
|
||||
### 14.1 Broader Impact
|
||||
|
||||
This work has implications beyond the Nyash programming language:
|
||||
|
||||
**Language Design**: Demonstrates the value of syntactic unification and cognitive alignment in language design
|
||||
**AI Collaboration**: Provides a replicable model for human-AI collaborative innovation
|
||||
**Exception Handling**: Offers a new paradigm that may influence future language development
|
||||
|
||||
### 14.2 Final Thoughts
|
||||
|
||||
Programming language design has long struggled with the tension between safety and expressiveness. Method-level postfix exception handling resolves this tension by aligning language syntax with human thought processes while providing automatic safety guarantees.
|
||||
|
||||
The "Everything is Block + Modifier" paradigm represents a new foundation for language design—one that unifies rather than separates, that follows rather than fights human cognition, and that provides safety through clarity rather than complexity.
|
||||
|
||||
As we continue to develop this paradigm, we expect to see further innovations in how programming languages can better serve human developers while maintaining the safety and performance required for modern software systems.
|
||||
|
||||
---
|
||||
|
||||
## Acknowledgments
|
||||
|
||||
We thank the broader Nyash community for feedback and early adoption. Special recognition goes to the AI systems (Gemini, ChatGPT, and Claude) whose collaborative analysis enabled this breakthrough, and to the human developers whose practical frustrations drove the initial innovation.
|
||||
|
||||
## References
|
||||
|
||||
[Due to space constraints, this would include a comprehensive bibliography of exception handling research, language design papers, and AI collaboration studies]
|
||||
|
||||
---
|
||||
|
||||
**Submission Note**: This paper represents a collaboration between human creativity and AI analysis, demonstrating new possibilities for human-AI partnership in research and development.
|
||||
@ -1,62 +0,0 @@
|
||||
# Seam‑Aware JSON Unification: AI Preprocessing × C‑ABI Box Normalization (Development Notes / Paper Draft)
|
||||
|
||||
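Purpose
- Summarize a design in which AI preprocessing absorbs the "wobble" in JSON that arises across generation, combining, and execution, while a C‑ABI JSON Box performs the final, deterministic normalization. The aim is to resolve at the root both Nyash's `using` combination (seam) problems and the fragility of the hand-written scanner, so that development productivity and reliability are achieved together.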
|
||||
|
||||
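Background (Problem Statement)
- JSON IR is not meant to be edited by hand, yet it breaks easily because of "wobble" (omissions, duplicates, ordering, boundary mistakes) introduced by generators and by the combining step (using inline).
- The current setup runs on minimal scanners written in Nyash script (MiniJson/MiniVmScan). They allow fast prototyping but are fragile against seam cross-sections and misaligned braces.
- PyVM is useful as a semantic reference implementation, but debugging costs grow when the paths for combining and health monitoring are weak.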
|
||||
|
||||
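Idea (This Proposal)
- AI‑in‑the‑loop normalization: use AI to pre-format the wobble in generated JSON, pulling dangerous breakage back toward JSON-spec compliance.
- Seam‑aware combining: during `using` combination, observe and repair seam logs (prelude_tail/body_head), brace deltas, and duplicates (dup_box/dup_fn).
- Deterministic Box: wrap a JSON engine callable over the C ABI (candidate: yyjson) in a thin Box, and settle final extraction and lookup there.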
|
||||
|
||||
Planned Architecture (Staged Introduction)
1) Python combiner (current)
- tools/using_combine.py performs seam logging, brace completion, and minimal dedup (NYASH_USING_COMBINER=1).
- The source everyone reads is unified into the single post-combination file ([pyvm-code]).
2) C‑ABI JSON Box
- Library: yyjson (C, MIT). parse/close + JSON Pointer get_* + statements iterator + esc_json.
- Called from Nyash via externcall (hidden behind the Box abstraction).
3) Staged replacement of the hand-written scanner
- Replace MiniJson/MiniVmScan call sites with the C‑ABI Box, giving a foundation that does not break even when the seam wobbles.
|
||||
|
||||
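Evaluation Axes (Experiment Plan)
- Normalization rate: compare runs with and without AI preprocessing on representative wobble cases (omission/duplicate/boundary).
- Reliability: JSON Box extraction accuracy (golden comparison), crash rate, and reduction of undefined behavior.
- Productivity: debugging time, regression detection time, and review time (seam inspector metrics prelude_brace_delta, dup_box, dup_fn).
- Performance: overhead of the added preprocessing, and extraction speed after moving to the C ABI (ms/MB).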
|
||||
|
||||
Related Implementation (Existing Assets)
- Seam logging/repair: src/runner/modes/common_util/resolve.rs, tools/using_combine.py (--seam-debug / --fix-braces / --dedup-*).
- Inspector: apps/selfhost-vm/boxes/seam_inspector.nyash (outputs prelude_brace_delta/dup_box/dup_fn).
- PyVM reference implementation: src/runner/modes/pyvm.rs, tools/pyvm_runner.py ([pyvm-code] dump and MIR execution).
|
||||
|
||||
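Research Contributions (Draft)
- Propose, implement, and quantify a development flow that is robust to operational wobble in JSON IR, using a hybrid of AI preprocessing and a deterministic Box.
- A seam‑aware method for monitoring combination (logs/metrics), with immediate repair (brace/dedup).
- Design and operational case study of the abstraction boundary provided by a C‑ABI JSON Box, in line with Nyash's Box philosophy.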
|
||||
|
||||
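Anticipated Risks and Mitigations
- Reproducibility of AI preprocessing: restrict preprocessing to "suggestion/assistance"; final formatting is done deterministically by the Box.
- Engine dependence: yyjson is the default; YAJL is the second candidate for very large inputs (for future streaming).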
|
||||
|
||||
Short-Term Roadmap (half-day granularity)
- [ ] Add a mock C‑ABI JSON wrapper (parse/get_str/get_int/esc_json)
- [ ] Replace one call site of MiniVmScan's get_digits/quoted (confirm smoke tests are green)
- [ ] Collect seam inspector output in CI (gate on prelude_brace_delta==0)
- [ ] Measure the normalization rate on a representative synthetic wobble set (before/after)
|
||||
|
||||
Notes (Terminology)
- prelude_brace_delta: brace difference in the combined prelude (0 is normal).
- dup_box/dup_fn: detection logs for Boxes/functions that were duplicated by combination.
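
As a rough picture of what these two metrics measure, the sketch below computes a brace delta and a duplicate-declaration list for a piece of combined source. It is written in Python under stated assumptions: the function names are invented here, the scanning rules (skip double-quoted strings, ignore comments) are a simplification, and the actual logic in seam_inspector.nyash may differ.

```python
import re


def brace_delta(source: str) -> int:
    """Count '{' minus '}' outside of double-quoted strings."""
    delta = 0
    in_string = False
    escaped = False
    for ch in source:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            delta += 1
        elif ch == "}":
            delta -= 1
    return delta


def duplicate_names(source: str, keyword: str) -> list:
    """Names declared more than once with `keyword` (for example 'box')."""
    names = re.findall(rf"\b{keyword}\s+(\w+)", source)
    seen, dups = set(), []
    for name in names:
        if name in seen and name not in dups:
            dups.append(name)
        seen.add(name)
    return dups


prelude = 'box A { f() { return "}" } }\nbox A { }\nbox B {'
print(brace_delta(prelude))             # 1
print(duplicate_names(prelude, "box"))  # ['A']
```

A nonzero delta, as in this sample, is the kind of condition the CI gate in the roadmap (prelude_brace_delta==0) would reject.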
|
||||
|
||||
Candidate Titles
|
||||
- AI‑in‑the‑Loop Normalization of Evolving JSON IRs with Deterministic C‑ABI Boxes
|
||||
- Seam‑Aware JSON Unification: From AI Pre‑Fixing to Box‑Level Normalization
|
||||
|
||||
Draft Location
- Additional chapter files will be created under this directory as needed (outline.md, experiments.md, etc.).
|
||||
|
||||
@ -1,87 +0,0 @@
|
||||
# Nyash Box → C ABI → Multi‑Language FFI
|
||||
|
||||
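Abstract
- We propose a development method in which Boxes implemented at a high level in the Nyash language are compiled to object code via LLVM AOT and exported through a stable C ABI, so that a single .so/.dll can be used immediately from any language, such as Python, Rust, Go, Node.js, or C++. Internally, the NyRT C ABI is used to operate safely on built-in Boxes (String/Array/Map, etc.), while externally a concise, language-independent API is provided. This layered C ABI → C ABI structure combines implementation productivity with ease of distribution and integration.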
|
||||
|
||||
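Background and Challenges
- In existing library development, maintaining a C/C++ implementation plus bindings for each language is a constant cost.
- Prioritizing performance and maintainability leads to more low-level code, which makes development speed and quality assurance (types, ownership) harder.
- Nyash has the Box abstraction and NyRT (the runtime), which provide a foundation for handling basic operations on strings, arrays, and the like in a uniform way.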
|
||||
|
||||
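Proposal (Nyash × C ABI Layered Architecture)
- Outer layer (public): a stable C ABI (function names, arguments/return values, ownership conventions)
- Inner layer (implementation): write Boxes at a high level in Nyash and operate on built-in Boxes through the NyRT C ABI
- Build: Nyash → LLVM IR/obj (PIC) → .so/.dll/.dylib (linked against NyRT)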
|
||||
|
||||
Call Path (Conceptual)
- Host (App/Script) → FFI (C ABI) → lib<your_box>.so → NyRT (C ABI) → OS/stdlib
|
||||
|
||||
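API/ABI Design Guidelines (Minimal)
- Basic types: i64 / f64 / bool / (ptr,len) strings and byte sequences
- Returning strings: allocate on the heap and return; the caller frees with `ny_free_str(void*)`
- Errors: int return value (0=OK, nonzero=ERR); details via an out-param or errno style if needed
- Visibility: public functions use `visibility=default`, internal symbols use `hidden`
- Initialization: provide `ny_init()/ny_shutdown()` (may be omitted depending on policy). State explicitly the operating rule that NyRT must not be initialized twice
- Build: compile all .o files with `-fPIC`; link NyRT either dynamically (recommended) or as a PIC static library, and do so consistently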
|
||||
|
||||
Case Study (PoC: StringBuilderBox)
- Nyash implementation (internal)
  - `birth()` initializes the internal buffer (ArrayBox)
  - `append(string)` pushes; `toString()` concatenates
- Public C API (example)
  - `sb_handle* sb_new();`
  - `int sb_append(sb_handle*, const char* s, size_t n);`
  - `int sb_to_string(sb_handle*, char** out, size_t* out_len); // caller frees with ny_free_str()`
  - `void sb_free(sb_handle*);`
  - `void ny_free_str(void*);`
  - (optional) `int ny_init(); int ny_shutdown(); const char* ny_get_version();`
|
||||
|
||||
Usage from Multiple Languages (Fragments)
- Python (ctypes)
```python
from ctypes import (CDLL, POINTER, byref, c_char_p, c_int, c_size_t,
                    c_void_p, string_at)

lib = CDLL('./libstringbuilder.so')
# Declare signatures so 64-bit handles and pointers are not truncated to int.
lib.sb_new.restype = c_void_p
lib.sb_append.argtypes = [c_void_p, c_char_p, c_size_t]
lib.sb_append.restype = c_int
lib.sb_to_string.argtypes = [c_void_p, POINTER(c_void_p), POINTER(c_size_t)]
lib.sb_to_string.restype = c_int
lib.ny_free_str.argtypes = [c_void_p]
lib.ny_free_str.restype = None
lib.sb_free.argtypes = [c_void_p]
lib.sb_free.restype = None

h = lib.sb_new()
lib.sb_append(h, b"hello", 5)
out = c_void_p(); ln = c_size_t()
lib.sb_to_string(h, byref(out), byref(ln))
s = string_at(out.value, ln.value).decode('utf-8')
lib.ny_free_str(out)
lib.sb_free(h)
```
- Rust (ffi)
```rust
extern "C" {
    fn sb_new() -> *mut core::ffi::c_void;
    fn sb_append(h: *mut core::ffi::c_void, p: *const u8, n: usize) -> i32;
    fn sb_to_string(h: *mut core::ffi::c_void, out: *mut *mut u8, len: *mut usize) -> i32;
    fn ny_free_str(p: *mut core::ffi::c_void);
    fn sb_free(h: *mut core::ffi::c_void);
}
```
|
||||
|
||||
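Evaluation Plan (Practical + Academic)
- Productivity: implementation LoC / time, and how small modification diffs are (compared with a C implementation)
- Performance: throughput for 10^6 concatenations, FFI boundary overhead (optimized with batched APIs)
- Reliability: ASan/valgrind, string ownership (zero leaks)
- Multi-language integration: smoke-test the same .so from Python/Rust/Go/Node
- Reusability: use the same artifact both as a Nyash plugin (v2) and as a general-purpose C library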
|
||||
|
||||
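Limitations and Risks
- A mismatched NyRT link policy (dynamic/static) or version differences may cause undefined behavior → provide a version-check API and CI
- Ambiguous ownership rules for returned buffers lead to leaks → make `ny_free_str()` part of the ABI
- Reentrancy under multithreading → state it explicitly in the specification (locks/TLS as needed)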
|
||||
|
||||
Roadmap
1) PoC: build StringBuilderBox as a .so (PIC + NyRT link); Python/Rust smoke tests and leak checks
2) Add a Nyash plugin (v2) wrapper (register it in `nyash.toml`)
3) Extend to JSON/string utilities (establish common ABI rules)
4) ABI auto-generation: Nyash signatures → C header/wrapper generation
5) Packaging template (headers + pkg-config + CI smoke tests)
|
||||
|
||||
Related Documents
- ScopeBox/LoopForm (concept for normalizing control structures): docs/guides/loopform.md
- Nyash LLVM/LlvmPy overview: docs/design/LLVM_LAYER_OVERVIEW.md
- Seam‑aware JSON Unification (preprocessing and deterministic implementation): papers/paper-y-seam-aware-json-unification/README.md
|
||||
|
||||