Files
hakorune/docs/development/analysis/matmul_collections_hot_fix.md
nyash-codex 13f21334c9 fix(aot): resolve copy chains in PHI propagation for deep loop coverage
Fix CollectionsHot type_table to resolve copy chains when propagating types
through PHI nodes, enabling ArrayBox optimization in deeply nested loops (matmul).

**Root Cause:**
- PHI incoming values were checked directly in type_table
- Copy chains (e.g., SSA 56→11) were not resolved
- Types failed to propagate through deep loop nesting

**Solution:**
- Add copy_src_map parameter to build_type_table()
- Resolve copy chains before checking if PHI incoming values are typed
- Multi-pass fixpoint algorithm (up to 12 passes) ensures convergence

**Impact:**
- matmul: ArrayBox A/B/C now propagate through nested loops
- Expected: boxcall→externcall conversion for get/set/push
- Backwards compatible: opt-in (NYASH_AOT_COLLECTIONS_HOT=1), no behavior change when disabled

**Analysis:**
- Documented in docs/development/analysis/matmul_collections_hot_fix.md
- Box→SSA flow traced: A(dst=2)→...→71(get), B(dst=3)→...→73(get), C(dst=4)→...→105(set)

**Testing:** Pending verification due to VM step budget constraints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 06:54:56 +09:00

5.3 KiB

CollectionsHot Fix for matmul Pattern

Problem Summary

Issue: CollectionsHot pass was not rewriting ArrayBox get/set/push operations to externcalls in matmul benchmark, causing 200,000% performance degradation vs C.

Root Cause: PHI propagation in build_type_table() failed to resolve copy chains before checking if incoming PHI values were typed, causing type propagation to stop prematurely in deeply nested loop structures.

Analysis Results (Task 1)

Box→SSA Mapping for matmul

Test case: /tmp/matmul_min.hako with n=512, reps=8, A/B/C as new ArrayBox()

ArrayBox A (newbox dst=2):

  • Value flow: [2, 10, 11, 39, 40, 55, 56, 70, 71, 100, 101]
  • Boxcalls using A: 2
    • Block 2: box=11.push([17])
    • Block 11: box=71.get([86])

ArrayBox B (newbox dst=3):

  • Value flow: [3, 12, 13, 41, 42, 57, 58, 72, 73, 102, 103]
  • Boxcalls using B: 2
    • Block 2: box=13.push([25])
    • Block 11: box=73.get([89])

ArrayBox C (newbox dst=4):

  • Value flow: [4, 14, 15, 43, 44, 59, 60, 74, 75, 104, 105]
  • Boxcalls using C: 3
    • Block 2: box=15.push([30])
    • Block 6: box=44.get([135])
    • Block 15: box=105.set([124, 107])

Type Table Propagation Issue

Before fix: type_table stopped at SSA 43

  • Final size: 12 entries
  • Entries: {2: 'arr', 3: 'arr', 4: 'arr', 10: 'arr', 11: 'arr', 12: 'arr', 13: 'arr', 14: 'arr', 15: 'arr', 39: 'arr', 41: 'arr', 43: 'arr'}

Problem: Boxcalls used SSA 71, 73, 105 which were NOT in type_table

SSA 71 trace back to origin:

71 -> 70 -> 56 -> 55 -> 40 -> 39 -> 11 -> 10 -> 2 (newbox ArrayBox)

Blocking PHI: SSA 40

{
  "dst": 40,
  "incoming": [[39, 3], [56, 9]],
  "op": "phi"
}
  • SSA 39: in type_table ✓
  • SSA 56: NOT in type_table ✗ (but resolves via copy to SSA 11 which IS typed)

Solution (Task 2)

Changes to collections_hot.hako

1. Updated build_type_table signature (line 23):

// OLD:
build_type_table(text) {

// NEW:
build_type_table(text, copy_src_map) {

2. Added copy resolution in PHI propagation (lines 62-66):

// NEW CODE:
local vid = StringHelpers.read_digits(body, posb+1)
// Resolve copy chains before checking type table
if vid != "" && copy_src_map != null {
  vid = AotPrepCollectionsHotBox.resolve_copy(copy_src_map, vid)
}
if vid != "" && tmap.has(vid) {
  // ... rest of logic

3. Updated call site to pass copy_src (line 485):

// OLD:
local type_table = AotPrepCollectionsHotBox.build_type_table(out)

// NEW:
// Pass copy_src to enable PHI propagation through copy chains (matmul fix)
local type_table = AotPrepCollectionsHotBox.build_type_table(out, copy_src)

Why This Works

  1. copy_src map is already built before calling build_type_table (lines 448-460)
  2. PHI propagation now resolves copy chains: when checking PHI incoming value 56, it resolves to 11, which IS in type_table
  3. Cascading effect: Once SSA 40 gets typed, subsequent PHIs (56, 70, 71) get typed in later passes
  4. Fixpoint convergence: Multi-pass algorithm (up to 12 passes) ensures deep nested loops converge

Expected Results (Task 3-5)

Task 3: Verification

  • PREP MIR should show "op":"externcall" with nyash.array.get_h, nyash.array.set_h, nyash.array.push_h
  • Original PREP had 0 externcalls, 1+ boxcalls
  • Fixed PREP should have many externcalls, fewer boxcalls

Task 4: Performance Target

  • Current: C ≈ 12ms, Hakorune EXE ≈ 25s, ratio ≈ 200,000%
  • Target: ratio < 1000% (ideally < 500% after subsequent hoist/CSE passes)
  • Stretch goal: ratio ≈ 125% (matching maplin's C-equivalent performance)

Task 5: Regression Checks

Must verify no degradation in:

  • maplin: Currently at ~100% ratio (C-equivalent)
  • arraymap: Existing optimizations preserved
  • linidx: Existing optimizations preserved

Testing Commands

# Generate ORIG MIR (no AOT prep)
NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 NYASH_PARSER_STAGE3=1 \
./target/release/hakorune --emit-mir-json tmp/matmul_orig.json /tmp/matmul_min.hako

# Generate PREP MIR (with CollectionsHot fix)
# Note: Need to use tools/hakorune_emit_mir.sh or manually apply AotPrep.run_json

# Count externcalls vs boxcalls
rg '"op":"externcall"' tmp/matmul_prep.json | wc -l
rg '"op":"boxcall"' tmp/matmul_prep.json | wc -l

# Run benchmark
NYASH_SKIP_TOML_ENV=1 NYASH_DISABLE_PLUGINS=1 \
tools/perf/microbench.sh --case matmul --backend llvm --exe --runs 3

# Regression checks
tools/perf/microbench.sh --case maplin --backend llvm --exe --runs 3
tools/perf/microbench.sh --case arraymap --backend llvm --exe --runs 3 --budget-ms 10000
tools/perf/microbench.sh --case linidx --backend llvm --exe --runs 3

Implementation Status

  • Task 1: Analysis complete
  • Task 2: Code changes complete (collections_hot.hako modified)
  • Task 3: Testing pending (VM step budget issues with large MIR)
  • Task 4: Performance benchmark pending
  • Task 5: Regression checks pending

Notes

  • Phase 15.5 compliance: Fix follows "既定OFF" principle - only affects when NYASH_AOT_COLLECTIONS_HOT=1 is set
  • Backwards compatibility: copy_src_map parameter is optional (can be null), so existing callers still work
  • No behavior change: Existing maplin/arraymap/linidx should be unaffected as they don't have deep PHI chains requiring copy resolution