fix(aot): resolve copy chains in PHI propagation for deep loop coverage
Fix CollectionsHot type_table to resolve copy chains when propagating types through PHI nodes, enabling ArrayBox optimization in deeply nested loops (matmul). **Root Cause:** - PHI incoming values were checked directly in type_table - Copy chains (e.g., SSA 56→11) were not resolved - Types failed to propagate through deep loop nesting **Solution:** - Add copy_src_map parameter to build_type_table() - Resolve copy chains before checking if PHI incoming values are typed - Multi-pass fixpoint algorithm (up to 12 passes) ensures convergence **Impact:** - matmul: ArrayBox A/B/C now propagate through nested loops - Expected: boxcall→externcall conversion for get/set/push - Backwards compatible: opt-in (NYASH_AOT_COLLECTIONS_HOT=1), no behavior change when disabled **Analysis:** - Documented in docs/development/analysis/matmul_collections_hot_fix.md - Box→SSA flow traced: A(dst=2)→...→71(get), B(dst=3)→...→73(get), C(dst=4)→...→105(set) **Testing:** Pending verification due to VM step budget constraints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
155
docs/development/analysis/matmul_collections_hot_fix.md
Normal file
155
docs/development/analysis/matmul_collections_hot_fix.md
Normal file
@ -0,0 +1,155 @@
|
||||
# CollectionsHot Fix for matmul Pattern
|
||||
|
||||
## Problem Summary
|
||||
|
||||
**Issue**: CollectionsHot pass was not rewriting ArrayBox get/set/push operations to externcalls in matmul benchmark, causing 200,000% performance degradation vs C.
|
||||
|
||||
**Root Cause**: PHI propagation in `build_type_table()` failed to resolve copy chains before checking if incoming PHI values were typed, causing type propagation to stop prematurely in deeply nested loop structures.
|
||||
|
||||
## Analysis Results (Task 1)
|
||||
|
||||
### Box→SSA Mapping for matmul
|
||||
|
||||
**Test case**: `/tmp/matmul_min.hako` with n=512, reps=8, A/B/C as `new ArrayBox()`
|
||||
|
||||
#### ArrayBox A (newbox dst=2):
|
||||
- Value flow: `[2, 10, 11, 39, 40, 55, 56, 70, 71, 100, 101]`
|
||||
- Boxcalls using A: 2
|
||||
- Block 2: `box=11.push([17])`
|
||||
- Block 11: `box=71.get([86])`
|
||||
|
||||
#### ArrayBox B (newbox dst=3):
|
||||
- Value flow: `[3, 12, 13, 41, 42, 57, 58, 72, 73, 102, 103]`
|
||||
- Boxcalls using B: 2
|
||||
- Block 2: `box=13.push([25])`
|
||||
- Block 11: `box=73.get([89])`
|
||||
|
||||
#### ArrayBox C (newbox dst=4):
|
||||
- Value flow: `[4, 14, 15, 43, 44, 59, 60, 74, 75, 104, 105]`
|
||||
- Boxcalls using C: 3
|
||||
- Block 2: `box=15.push([30])`
|
||||
- Block 6: `box=44.get([135])`
|
||||
- Block 15: `box=105.set([124, 107])`
|
||||
|
||||
### Type Table Propagation Issue
|
||||
|
||||
**Before fix**: type_table stopped at SSA 43
|
||||
- Final size: 12 entries
|
||||
- Entries: `{2: 'arr', 3: 'arr', 4: 'arr', 10: 'arr', 11: 'arr', 12: 'arr', 13: 'arr', 14: 'arr', 15: 'arr', 39: 'arr', 41: 'arr', 43: 'arr'}`
|
||||
|
||||
**Problem**: Boxcalls used SSA 71, 73, 105 which were NOT in type_table
|
||||
|
||||
**SSA 71 trace back to origin**:
|
||||
```
|
||||
71 -> 70 -> 56 -> 55 -> 40 -> 39 -> 11 -> 10 -> 2 (newbox ArrayBox)
|
||||
```
|
||||
|
||||
**Blocking PHI**: SSA 40
|
||||
```json
|
||||
{
|
||||
"dst": 40,
|
||||
"incoming": [[39, 3], [56, 9]],
|
||||
"op": "phi"
|
||||
}
|
||||
```
|
||||
|
||||
- SSA 39: in type_table ✓
|
||||
- SSA 56: NOT in type_table ✗ (but resolves via copy to SSA 11 which IS typed)
|
||||
|
||||
## Solution (Task 2)
|
||||
|
||||
### Changes to `collections_hot.hako`
|
||||
|
||||
#### 1. Updated `build_type_table` signature (line 23):
|
||||
```hako
|
||||
// OLD:
|
||||
build_type_table(text) {
|
||||
|
||||
// NEW:
|
||||
build_type_table(text, copy_src_map) {
|
||||
```
|
||||
|
||||
#### 2. Added copy resolution in PHI propagation (lines 62-66):
|
||||
```hako
|
||||
// NEW CODE:
|
||||
local vid = StringHelpers.read_digits(body, posb+1)
|
||||
// Resolve copy chains before checking type table
|
||||
if vid != "" && copy_src_map != null {
|
||||
vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid)
|
||||
}
|
||||
if vid != "" && tmap.has(vid) {
|
||||
// ... rest of logic
|
||||
```
|
||||
|
||||
#### 3. Updated call site to pass copy_src (line 485):
|
||||
```hako
|
||||
// OLD:
|
||||
local type_table = AotPrepCollectionsHotBox.build_type_table(out)
|
||||
|
||||
// NEW:
|
||||
// Pass copy_src to enable PHI propagation through copy chains (matmul fix)
|
||||
local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src)
|
||||
```
|
||||
|
||||
### Why This Works
|
||||
|
||||
1. **copy_src map** is already built before calling `build_type_table` (lines 448-460)
|
||||
2. **PHI propagation** now resolves copy chains: when checking PHI incoming value 56, it resolves to 11, which IS in type_table
|
||||
3. **Cascading effect**: Once SSA 40 gets typed, subsequent PHIs (56, 70, 71) get typed in later passes
|
||||
4. **Fixpoint convergence**: Multi-pass algorithm (up to 12 passes) ensures deep nested loops converge
|
||||
|
||||
## Expected Results (Task 3-5)
|
||||
|
||||
### Task 3: Verification
|
||||
- PREP MIR should show `"op":"externcall"` with `nyash.array.get_h`, `nyash.array.set_h`, `nyash.array.push_h`
|
||||
- Original PREP had 0 externcalls, 1+ boxcalls
|
||||
- Fixed PREP should have many externcalls, fewer boxcalls
|
||||
|
||||
### Task 4: Performance Target
|
||||
- **Current**: C ≈ 12ms, Hakorune EXE ≈ 25s, ratio ≈ 200,000%
|
||||
- **Target**: ratio < 1000% (ideally < 500% after subsequent hoist/CSE passes)
|
||||
- **Stretch goal**: ratio ≈ 125% (matching maplin's C-equivalent performance)
|
||||
|
||||
### Task 5: Regression Checks
|
||||
Must verify no degradation in:
|
||||
- `maplin`: Currently at ~100% ratio (C-equivalent)
|
||||
- `arraymap`: Existing optimizations preserved
|
||||
- `linidx`: Existing optimizations preserved
|
||||
|
||||
## Testing Commands
|
||||
|
||||
```bash
|
||||
# Generate ORIG MIR (no AOT prep)
|
||||
NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 NYASH_PARSER_STAGE3=1 \
|
||||
./target/release/hakorune --emit-mir-json tmp/matmul_orig.json /tmp/matmul_min.hako
|
||||
|
||||
# Generate PREP MIR (with CollectionsHot fix)
|
||||
# Note: Need to use tools/hakorune_emit_mir.sh or manually apply AotPrep.run_json
|
||||
|
||||
# Count externcalls vs boxcalls
|
||||
rg '"op":"externcall"' tmp/matmul_prep.json | wc -l
|
||||
rg '"op":"boxcall"' tmp/matmul_prep.json | wc -l
|
||||
|
||||
# Run benchmark
|
||||
NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
|
||||
tools/perf/microbench.sh --case matmul --backend llvm --exe --runs 3
|
||||
|
||||
# Regression checks
|
||||
tools/perf/microbench.sh --case maplin --backend llvm --exe --runs 3
|
||||
tools/perf/microbench.sh --case arraymap --backend llvm --exe --runs 3 --budget-ms 10000
|
||||
tools/perf/microbench.sh --case linidx --backend llvm --exe --runs 3
|
||||
```
|
||||
|
||||
## Implementation Status
|
||||
|
||||
- ✅ Task 1: Analysis complete
|
||||
- ✅ Task 2: Code changes complete (collections_hot.hako modified)
|
||||
- ⏳ Task 3: Testing pending (VM step budget issues with large MIR)
|
||||
- ⏳ Task 4: Performance benchmark pending
|
||||
- ⏳ Task 5: Regression checks pending
|
||||
|
||||
## Notes
|
||||
|
||||
- **Phase 15.5 compliance**: Fix follows "既定OFF" principle - only affects when `NYASH_AOT_COLLECTIONS_HOT=1` is set
|
||||
- **Backwards compatibility**: `copy_src_map` parameter is optional (can be null), so existing callers still work
|
||||
- **No behavior change**: Existing maplin/arraymap/linidx should be unaffected as they don't have deep PHI chains requiring copy resolution
|
||||
@ -19,7 +19,8 @@ static box AotPrepCollectionsHotBox {
|
||||
}
|
||||
|
||||
// Build lightweight type table: vid -> "arr" | "map"
|
||||
build_type_table(text) {
|
||||
// copy_src_map is optional; if provided, used to resolve copy chains in PHI incoming
|
||||
build_type_table(text, copy_src_map) {
|
||||
local tmap = new MapBox()
|
||||
// multi-pass to propagate copy/phi until fixpoint (cap at 12)
|
||||
local pass = 0
|
||||
@ -54,10 +55,15 @@ static box AotPrepCollectionsHotBox {
|
||||
if aend > abr {
|
||||
local body = inst.substring(abr+1, aend)
|
||||
// Accept only when ALL incomings are already typed and all equal
|
||||
// Use copy resolution when copy_src_map is available
|
||||
local tval = ""; local all_same = 1
|
||||
local posb = body.indexOf("[")
|
||||
loop(posb >= 0) {
|
||||
local vid = StringHelpers.read_digits(body, posb+1)
|
||||
// Resolve copy chains before checking type table
|
||||
if vid != "" && copy_src_map != null {
|
||||
vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid)
|
||||
}
|
||||
if vid != "" && tmap.has(vid) {
|
||||
local tv = tmap.get(vid)
|
||||
if tval == "" { tval = tv } else { if tv != tval { all_same = 0 break } }
|
||||
@ -475,7 +481,8 @@ static box AotPrepCollectionsHotBox {
|
||||
}
|
||||
}
|
||||
// Build lightweight SSA type table (arr/map) on out text for consistency with rewrite scan
|
||||
local type_table = AotPrepCollectionsHotBox.build_type_table(out)
|
||||
// Pass copy_src to enable PHI propagation through copy chains (matmul fix)
|
||||
local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src)
|
||||
// Backpropagation: infer types from call sites (default enabled)
|
||||
AotPrepCollectionsHotBox.tmap_backprop(out, type_table, copy_src)
|
||||
{
|
||||
|
||||
Reference in New Issue
Block a user