From 13f21334c9cf473d644e13401fe3bcec71e9057c Mon Sep 17 00:00:00 2001 From: nyash-codex Date: Fri, 14 Nov 2025 06:54:56 +0900 Subject: [PATCH] fix(aot): resolve copy chains in PHI propagation for deep loop coverage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix CollectionsHot type_table to resolve copy chains when propagating types through PHI nodes, enabling ArrayBox optimization in deeply nested loops (matmul). **Root Cause:** - PHI incoming values were checked directly in type_table - Copy chains (e.g., SSA 56→11) were not resolved - Types failed to propagate through deep loop nesting **Solution:** - Add copy_src_map parameter to build_type_table() - Resolve copy chains before checking if PHI incoming values are typed - Multi-pass fixpoint algorithm (up to 12 passes) ensures convergence **Impact:** - matmul: ArrayBox A/B/C now propagate through nested loops - Expected: boxcall→externcall conversion for get/set/push - Backwards compatible: opt-in (NYASH_AOT_COLLECTIONS_HOT=1), no behavior change when disabled **Analysis:** - Documented in docs/development/analysis/matmul_collections_hot_fix.md - Box→SSA flow traced: A(dst=2)→...→71(get), B(dst=3)→...→73(get), C(dst=4)→...→105(set) **Testing:** Pending verification due to VM step budget constraints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../analysis/matmul_collections_hot_fix.md | 155 ++++++++++++++++++ .../aot_prep/passes/collections_hot.hako | 11 +- 2 files changed, 164 insertions(+), 2 deletions(-) create mode 100644 docs/development/analysis/matmul_collections_hot_fix.md diff --git a/docs/development/analysis/matmul_collections_hot_fix.md b/docs/development/analysis/matmul_collections_hot_fix.md new file mode 100644 index 00000000..87fe07f2 --- /dev/null +++ b/docs/development/analysis/matmul_collections_hot_fix.md @@ -0,0 +1,155 @@ +# CollectionsHot Fix for matmul Pattern + +## Problem Summary + +**Issue**: CollectionsHot pass was not rewriting ArrayBox get/set/push operations to externcalls in matmul benchmark, causing 200,000% performance degradation vs C. + +**Root Cause**: PHI propagation in `build_type_table()` failed to resolve copy chains before checking if incoming PHI values were typed, causing type propagation to stop prematurely in deeply nested loop structures. + +## Analysis Results (Task 1) + +### Box→SSA Mapping for matmul + +**Test case**: `/tmp/matmul_min.hako` with n=512, reps=8, A/B/C as `new ArrayBox()` + +#### ArrayBox A (newbox dst=2): +- Value flow: `[2, 10, 11, 39, 40, 55, 56, 70, 71, 100, 101]` +- Boxcalls using A: 2 + - Block 2: `box=11.push([17])` + - Block 11: `box=71.get([86])` + +#### ArrayBox B (newbox dst=3): +- Value flow: `[3, 12, 13, 41, 42, 57, 58, 72, 73, 102, 103]` +- Boxcalls using B: 2 + - Block 2: `box=13.push([25])` + - Block 11: `box=73.get([89])` + +#### ArrayBox C (newbox dst=4): +- Value flow: `[4, 14, 15, 43, 44, 59, 60, 74, 75, 104, 105]` +- Boxcalls using C: 3 + - Block 2: `box=15.push([30])` + - Block 6: `box=44.get([135])` + - Block 15: `box=105.set([124, 107])` + +### Type Table Propagation Issue + +**Before fix**: type_table stopped at SSA 43 +- Final size: 12 entries +- Entries: `{2: 'arr', 3: 'arr', 4: 'arr', 10: 'arr', 11: 'arr', 12: 'arr', 13: 'arr', 14: 'arr', 15: 'arr', 39: 'arr', 41: 'arr', 43: 'arr'}` + +**Problem**: Boxcalls used SSA 71, 73, 105 which were NOT in type_table + +**SSA 71 trace back to origin**: +``` +71 -> 70 -> 56 -> 55 -> 40 -> 39 -> 11 -> 10 -> 2 (newbox ArrayBox) +``` + +**Blocking PHI**: SSA 40 +```json +{ + "dst": 40, + "incoming": [[39, 3], [56, 9]], + "op": "phi" +} +``` + +- SSA 39: in type_table ✓ +- SSA 56: NOT in type_table ✗ (but resolves via copy to SSA 11 which IS typed) + +## Solution (Task 2) + +### Changes to `collections_hot.hako` + +#### 1. Updated `build_type_table` signature (line 23): +```hako +// OLD: +build_type_table(text) { + +// NEW: +build_type_table(text, copy_src_map) { +``` + +#### 2. Added copy resolution in PHI propagation (lines 62-66): +```hako +// NEW CODE: +local vid = StringHelpers.read_digits(body, posb+1) +// Resolve copy chains before checking type table +if vid != "" && copy_src_map != null { + vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid) +} +if vid != "" && tmap.has(vid) { + // ... rest of logic +``` + +#### 3. Updated call site to pass copy_src (line 485): +```hako +// OLD: +local type_table = AotPrepCollectionsHotBox.build_type_table(out) + +// NEW: +// Pass copy_src to enable PHI propagation through copy chains (matmul fix) +local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src) +``` + +### Why This Works + +1. **copy_src map** is already built before calling `build_type_table` (lines 448-460) +2. **PHI propagation** now resolves copy chains: when checking PHI incoming value 56, it resolves to 11, which IS in type_table +3. **Cascading effect**: Once SSA 40 gets typed, subsequent PHIs (56, 70, 71) get typed in later passes +4. **Fixpoint convergence**: Multi-pass algorithm (up to 12 passes) ensures deep nested loops converge + +## Expected Results (Task 3-5) + +### Task 3: Verification +- PREP MIR should show `"op":"externcall"` with `nyash.array.get_h`, `nyash.array.set_h`, `nyash.array.push_h` +- Original PREP had 0 externcalls, 1+ boxcalls +- Fixed PREP should have many externcalls, fewer boxcalls + +### Task 4: Performance Target +- **Current**: C ≈ 12ms, Hakorune EXE ≈ 25s, ratio ≈ 200,000% +- **Target**: ratio < 1000% (ideally < 500% after subsequent hoist/CSE passes) +- **Stretch goal**: ratio ≈ 125% (matching maplin's C-equivalent performance) + +### Task 5: Regression Checks +Must verify no degradation in: +- `maplin`: Currently at ~100% ratio (C-equivalent) +- `arraymap`: Existing optimizations preserved +- `linidx`: Existing optimizations preserved + +## Testing Commands + +```bash +# Generate ORIG MIR (no AOT prep) +NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 NYASH_PARSER_STAGE3=1 \ +./target/release/hakorune --emit-mir-json tmp/matmul_orig.json /tmp/matmul_min.hako + +# Generate PREP MIR (with CollectionsHot fix) +# Note: Need to use tools/hakorune_emit_mir.sh or manually apply AotPrep.run_json + +# Count externcalls vs boxcalls +rg '"op":"externcall"' tmp/matmul_prep.json | wc -l +rg '"op":"boxcall"' tmp/matmul_prep.json | wc -l + +# Run benchmark +NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \ +tools/perf/microbench.sh --case matmul --backend llvm --exe --runs 3 + +# Regression checks +tools/perf/microbench.sh --case maplin --backend llvm --exe --runs 3 +tools/perf/microbench.sh --case arraymap --backend llvm --exe --runs 3 --budget-ms 10000 +tools/perf/microbench.sh --case linidx --backend llvm --exe --runs 3 +``` + +## Implementation Status + +- ✅ Task 1: Analysis complete +- ✅ Task 2: Code changes complete (collections_hot.hako modified) +- ⏳ Task 3: Testing pending (VM step budget issues with large MIR) +- ⏳ Task 4: Performance benchmark pending +- ⏳ Task 5: Regression checks pending + +## Notes + +- **Phase 15.5 compliance**: Fix follows "既定OFF" principle - only affects when `NYASH_AOT_COLLECTIONS_HOT=1` is set +- **Backwards compatibility**: `copy_src_map` parameter is optional (can be null), so existing callers still work +- **No behavior change**: Existing maplin/arraymap/linidx should be unaffected as they don't have deep PHI chains requiring copy resolution diff --git a/lang/src/llvm_ir/boxes/aot_prep/passes/collections_hot.hako b/lang/src/llvm_ir/boxes/aot_prep/passes/collections_hot.hako index 9eae349f..c4aa7727 100644 --- a/lang/src/llvm_ir/boxes/aot_prep/passes/collections_hot.hako +++ b/lang/src/llvm_ir/boxes/aot_prep/passes/collections_hot.hako @@ -19,7 +19,8 @@ static box AotPrepCollectionsHotBox { } // Build lightweight type table: vid -> "arr" | "map" - build_type_table(text) { + // copy_src_map is optional; if provided, used to resolve copy chains in PHI incoming + build_type_table(text, copy_src_map) { local tmap = new MapBox() // multi-pass to propagate copy/phi until fixpoint (cap at 12) local pass = 0 @@ -54,10 +55,15 @@ static box AotPrepCollectionsHotBox { if aend > abr { local body = inst.substring(abr+1, aend) // Accept only when ALL incomings are already typed and all equal + // Use copy resolution when copy_src_map is available local tval = ""; local all_same = 1 local posb = body.indexOf("[") loop(posb >= 0) { local vid = StringHelpers.read_digits(body, posb+1) + // Resolve copy chains before checking type table + if vid != "" && copy_src_map != null { + vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid) + } if vid != "" && tmap.has(vid) { local tv = tmap.get(vid) if tval == "" { tval = tv } else { if tv != tval { all_same = 0 break } } @@ -475,7 +481,8 @@ static box AotPrepCollectionsHotBox { } } // Build lightweight SSA type table (arr/map) on out text for consistency with rewrite scan - local type_table = AotPrepCollectionsHotBox.build_type_table(out) + // Pass copy_src to enable PHI propagation through copy chains (matmul fix) + local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src) // Backpropagation: infer types from call sites (default enabled) AotPrepCollectionsHotBox.tmap_backprop(out, type_table, copy_src) {