fix(aot): resolve copy chains in PHI propagation for deep loop coverage

Fix CollectionsHot type_table to resolve copy chains when propagating types
through PHI nodes, enabling ArrayBox optimization in deeply nested loops (matmul).

**Root Cause:**
- PHI incoming values were checked directly in type_table
- Copy chains (e.g., SSA 56→11) were not resolved
- Types failed to propagate through deep loop nesting

**Solution:**
- Add copy_src_map parameter to build_type_table()
- Resolve copy chains before checking if PHI incoming values are typed
- Multi-pass fixpoint algorithm (up to 12 passes) ensures convergence

**Impact:**
- matmul: ArrayBox A/B/C now propagate through nested loops
- Expected: boxcall→externcall conversion for get/set/push
- Backwards compatible: opt-in (NYASH_AOT_COLLECTIONS_HOT=1), no behavior change when disabled

**Analysis:**
- Documented in docs/development/analysis/matmul_collections_hot_fix.md
- Box→SSA flow traced: A(dst=2)→...→71(get), B(dst=3)→...→73(get), C(dst=4)→...→105(set)

**Testing:** Pending verification due to VM step budget constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
nyash-codex
2025-11-14 06:54:56 +09:00
parent 71ff310471
commit 13f21334c9
2 changed files with 164 additions and 2 deletions

View File

@ -0,0 +1,155 @@
# CollectionsHot Fix for matmul Pattern
## Problem Summary
**Issue**: CollectionsHot pass was not rewriting ArrayBox get/set/push operations to externcalls in matmul benchmark, causing 200,000% performance degradation vs C.
**Root Cause**: PHI propagation in `build_type_table()` failed to resolve copy chains before checking if incoming PHI values were typed, causing type propagation to stop prematurely in deeply nested loop structures.
## Analysis Results (Task 1)
### Box→SSA Mapping for matmul
**Test case**: `/tmp/matmul_min.hako` with n=512, reps=8, A/B/C as `new ArrayBox()`
#### ArrayBox A (newbox dst=2):
- Value flow: `[2, 10, 11, 39, 40, 55, 56, 70, 71, 100, 101]`
- Boxcalls using A: 2
- Block 2: `box=11.push([17])`
- Block 11: `box=71.get([86])`
#### ArrayBox B (newbox dst=3):
- Value flow: `[3, 12, 13, 41, 42, 57, 58, 72, 73, 102, 103]`
- Boxcalls using B: 2
- Block 2: `box=13.push([25])`
- Block 11: `box=73.get([89])`
#### ArrayBox C (newbox dst=4):
- Value flow: `[4, 14, 15, 43, 44, 59, 60, 74, 75, 104, 105]`
- Boxcalls using C: 3
- Block 2: `box=15.push([30])`
- Block 6: `box=44.get([135])`
- Block 15: `box=105.set([124, 107])`
### Type Table Propagation Issue
**Before fix**: type_table stopped at SSA 43
- Final size: 12 entries
- Entries: `{2: 'arr', 3: 'arr', 4: 'arr', 10: 'arr', 11: 'arr', 12: 'arr', 13: 'arr', 14: 'arr', 15: 'arr', 39: 'arr', 41: 'arr', 43: 'arr'}`
**Problem**: Boxcalls used SSA 71, 73, 105 which were NOT in type_table
**SSA 71 trace back to origin**:
```
71 -> 70 -> 56 -> 55 -> 40 -> 39 -> 11 -> 10 -> 2 (newbox ArrayBox)
```
**Blocking PHI**: SSA 40
```json
{
"dst": 40,
"incoming": [[39, 3], [56, 9]],
"op": "phi"
}
```
- SSA 39: in type_table ✓
- SSA 56: NOT in type_table ✗ (but resolves via copy to SSA 11 which IS typed)
## Solution (Task 2)
### Changes to `collections_hot.hako`
#### 1. Updated `build_type_table` signature (line 23):
```hako
// OLD:
build_type_table(text) {
// NEW:
build_type_table(text, copy_src_map) {
```
#### 2. Added copy resolution in PHI propagation (lines 62-66):
```hako
// NEW CODE:
local vid = StringHelpers.read_digits(body, posb+1)
// Resolve copy chains before checking type table
if vid != "" && copy_src_map != null {
vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid)
}
if vid != "" && tmap.has(vid) {
// ... rest of logic
```
#### 3. Updated call site to pass copy_src (line 485):
```hako
// OLD:
local type_table = AotPrepCollectionsHotBox.build_type_table(out)
// NEW:
// Pass copy_src to enable PHI propagation through copy chains (matmul fix)
local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src)
```
### Why This Works
1. **copy_src map** is already built before calling `build_type_table` (lines 448-460)
2. **PHI propagation** now resolves copy chains: when checking PHI incoming value 56, it resolves to 11, which IS in type_table
3. **Cascading effect**: Once SSA 40 gets typed, subsequent PHIs (56, 70, 71) get typed in later passes
4. **Fixpoint convergence**: Multi-pass algorithm (up to 12 passes) ensures deep nested loops converge
## Expected Results (Task 3-5)
### Task 3: Verification
- PREP MIR should show `"op":"externcall"` with `nyash.array.get_h`, `nyash.array.set_h`, `nyash.array.push_h`
- Original PREP had 0 externcalls, 1+ boxcalls
- Fixed PREP should have many externcalls, fewer boxcalls
### Task 4: Performance Target
- **Current**: C ≈ 12ms, Hakorune EXE ≈ 25s, ratio ≈ 200,000%
- **Target**: ratio < 1000% (ideally < 500% after subsequent hoist/CSE passes)
- **Stretch goal**: ratio 125% (matching maplin's C-equivalent performance)
### Task 5: Regression Checks
Must verify no degradation in:
- `maplin`: Currently at ~100% ratio (C-equivalent)
- `arraymap`: Existing optimizations preserved
- `linidx`: Existing optimizations preserved
## Testing Commands
```bash
# Generate ORIG MIR (no AOT prep)
NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 NYASH_PARSER_STAGE3=1 \
./target/release/hakorune --emit-mir-json tmp/matmul_orig.json /tmp/matmul_min.hako
# Generate PREP MIR (with CollectionsHot fix)
# Note: Need to use tools/hakorune_emit_mir.sh or manually apply AotPrep.run_json
# Count externcalls vs boxcalls
rg '"op":"externcall"' tmp/matmul_prep.json | wc -l
rg '"op":"boxcall"' tmp/matmul_prep.json | wc -l
# Run benchmark
NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
tools/perf/microbench.sh --case matmul --backend llvm --exe --runs 3
# Regression checks
tools/perf/microbench.sh --case maplin --backend llvm --exe --runs 3
tools/perf/microbench.sh --case arraymap --backend llvm --exe --runs 3 --budget-ms 10000
tools/perf/microbench.sh --case linidx --backend llvm --exe --runs 3
```
## Implementation Status
- Task 1: Analysis complete
- Task 2: Code changes complete (collections_hot.hako modified)
- Task 3: Testing pending (VM step budget issues with large MIR)
- Task 4: Performance benchmark pending
- Task 5: Regression checks pending
## Notes
- **Phase 15.5 compliance**: Fix follows "既定OFF" principle - only affects when `NYASH_AOT_COLLECTIONS_HOT=1` is set
- **Backwards compatibility**: `copy_src_map` parameter is optional (can be null), so existing callers still work
- **No behavior change**: Existing maplin/arraymap/linidx should be unaffected as they don't have deep PHI chains requiring copy resolution

View File

@ -19,7 +19,8 @@ static box AotPrepCollectionsHotBox {
} }
// Build lightweight type table: vid -> "arr" | "map" // Build lightweight type table: vid -> "arr" | "map"
build_type_table(text) { // copy_src_map is optional; if provided, used to resolve copy chains in PHI incoming
build_type_table(text, copy_src_map) {
local tmap = new MapBox() local tmap = new MapBox()
// multi-pass to propagate copy/phi until fixpoint (cap at 12) // multi-pass to propagate copy/phi until fixpoint (cap at 12)
local pass = 0 local pass = 0
@ -54,10 +55,15 @@ static box AotPrepCollectionsHotBox {
if aend > abr { if aend > abr {
local body = inst.substring(abr+1, aend) local body = inst.substring(abr+1, aend)
// Accept only when ALL incomings are already typed and all equal // Accept only when ALL incomings are already typed and all equal
// Use copy resolution when copy_src_map is available
local tval = ""; local all_same = 1 local tval = ""; local all_same = 1
local posb = body.indexOf("[") local posb = body.indexOf("[")
loop(posb >= 0) { loop(posb >= 0) {
local vid = StringHelpers.read_digits(body, posb+1) local vid = StringHelpers.read_digits(body, posb+1)
// Resolve copy chains before checking type table
if vid != "" && copy_src_map != null {
vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid)
}
if vid != "" && tmap.has(vid) { if vid != "" && tmap.has(vid) {
local tv = tmap.get(vid) local tv = tmap.get(vid)
if tval == "" { tval = tv } else { if tv != tval { all_same = 0 break } } if tval == "" { tval = tv } else { if tv != tval { all_same = 0 break } }
@ -475,7 +481,8 @@ static box AotPrepCollectionsHotBox {
} }
} }
// Build lightweight SSA type table (arr/map) on out text for consistency with rewrite scan // Build lightweight SSA type table (arr/map) on out text for consistency with rewrite scan
local type_table = AotPrepCollectionsHotBox.build_type_table(out) // Pass copy_src to enable PHI propagation through copy chains (matmul fix)
local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src)
// Backpropagation: infer types from call sites (default enabled) // Backpropagation: infer types from call sites (default enabled)
AotPrepCollectionsHotBox.tmap_backprop(out, type_table, copy_src) AotPrepCollectionsHotBox.tmap_backprop(out, type_table, copy_src)
{ {