# Phase 131-3: MIR→LLVM Lowering Inventory **Date**: 2025-12-14 **Purpose**: Identify what is broken in the LLVM (Python llvmlite) lowering pipeline using a few representative cases, and record evidence + next actions. ## Test Cases & Results | Case | File | Emit | Link | Run | Notes | |------|------|------|------|-----|-------| | A | `apps/tests/phase87_llvm_exe_min.hako` | ✅ | ✅ | ✅ | **PASS** - Simple return 42, no BoxCall, exit code verified | | B | `apps/tests/loop_min_while.hako` | ❌ | - | - | **TAG-EMIT** - Loop generates invalid LLVM IR (observed: PHI placement/order issue; also mentions empty block) | | B2 | `/tmp/case_b_simple.hako` | ✅ | ✅ | ✅ | **PASS** - Simple print(42) without loop works | | C | `apps/tests/llvm_stage3_loop_only.hako` | ❌ | - | - | **TAG-EMIT** - Complex loop (break/continue) fails JoinIR pattern matching | ## Root Causes Identified ### 1. TAG-EMIT: Loop PHI → Invalid LLVM IR (Case B) **File**: `apps/tests/loop_min_while.hako` **Code**: ```nyash static box Main { main() { local i = 0 loop(i < 3) { print(i) i = i + 1 } return 0 } } ``` **MIR Compilation**: SUCCESS (Pattern 1 JoinIR lowering works) ``` [joinir/pattern1] Generated JoinIR for Simple While Pattern [joinir/pattern1] Functions: main, loop_step, k_exit 📊 MIR Module compiled successfully! 📊 Functions: 4 ``` **LLVM Harness Failure**: ``` RuntimeError: LLVM IR parsing error :35:1: error: expected instruction opcode bb4: ^ ``` **Observed invalid IR snippet**: ```llvm bb3: ret i64 %"ret_phi_17" ← Terminator FIRST (INVALID!) %"ret_phi_17" = phi i64 [0, %"bb6"] ← PHI AFTER terminator ``` **What we know**: - LLVM IR requires: **PHI nodes first**, then non-PHI instructions, then terminator last. - The harness lowers blocks (including terminators), then wires PHIs, then runs a safety pass: - `src/llvm_py/builders/function_lower.py` calls `_lower_blocks(...)` → `_finalize_phis(builder)` → `_enforce_terminators(...)`. - Per-block lowering explicitly lowers terminators after body ops: - `src/llvm_py/builders/block_lower.py` splits `body_ops` and `term_ops`, then lowers `term_ops` after `body_ops`. - PHIs are created/wired during finalize via `ensure_phi(...)`: - `src/llvm_py/phi_wiring/wiring.py` (positions PHI “at block head”, and logs when a terminator already exists). This strongly suggests an **emission ordering / insertion-position** problem in the harness, not a MIR generation bug. The exact failure mode still needs to be confirmed by tracing where the PHI is inserted relative to the terminator in the failing block. **Where to inspect next (code pointers)**: - Harness pipeline ordering: `src/llvm_py/builders/function_lower.py` - Terminator emission: `src/llvm_py/builders/block_lower.py` - PHI insertion rules + debug: `src/llvm_py/phi_wiring/wiring.py` (`NYASH_PHI_ORDERING_DEBUG=1`) - “Empty block” safety pass (separate concern): `src/llvm_py/builders/function_lower.py:_enforce_terminators` --- ### 2. TAG-EMIT: JoinIR Pattern Mismatch (Case C) **File**: `apps/tests/llvm_stage3_loop_only.hako` **Code**: ```nyash static box Main { main() { local counter = 0 loop (true) { counter = counter + 1 if counter == 3 { break } continue } print("Result: " + counter) return 0 } } ``` **MIR Compilation**: FAILURE ``` ❌ MIR compilation error: [joinir/freeze] Loop lowering failed: JoinIR does not support this pattern, and LoopBuilder has been removed. Function: main Hint: This loop pattern is not supported. All loops must use JoinIR lowering. ``` **Diagnosis**: - `loop(true)` with `break`/`continue` doesn't match Pattern 1-4 - LoopBuilder fallback was removed (Phase 33 cleanup) - JoinIR Pattern coverage gap: needs Pattern 5 or Pattern variant for infinite loops with early exit **Location**: `src/mir/builder/control_flow/joinir/router.rs` - pattern matching logic --- ## Success Cases ### Case A: Minimal (No BoxCall, No Loop) - **EMIT**: ✅ Object generated successfully - **LINK**: ✅ Linked with NyKernel runtime - **RUN**: ✅ Exit code 42 verified - **Validation**: Full LLVM exe line SSOT confirmed working ### Case B2: Simple BoxCall (No Loop) - **EMIT**: ✅ Object generated successfully - **LINK**: ✅ Linked with NyKernel runtime - **RUN**: ✅ `print(42)` executes (loop-free path) - **Validation**: BoxCall → ExternCall lowering works correctly ## Next Steps ### Priority 1: Fix TAG-EMIT (PHI After Terminator Bug) ⚠️ CRITICAL **Target**: Case B (`loop_min_while.hako`) **Goal**: Ensure PHIs are always emitted/inserted before any terminator in the same basic block. **Candidate approach** (docs-only; implementation to be decided): - Split lowering into multi-pass so that PHI placeholders exist before terminators are emitted, or delay terminator emission until after PHI finalization: - (A) Predeclare PHIs at block creation time (placeholders), then emit body ops, then wire incomings, then emit terminators. - (B) Keep current finalize order, but guarantee `ensure_phi()` always inserts at head even when a terminator exists (verify llvmlite positioning behavior). **Primary files to look at for the fix**: - `src/llvm_py/builders/function_lower.py` (pass ordering) - `src/llvm_py/builders/block_lower.py` (terminator emission split point) - `src/llvm_py/phi_wiring/wiring.py` (PHI insertion positioning) --- ### Priority 2: Fix TAG-EMIT (JoinIR Pattern Coverage) **Target**: Case C (`llvm_stage3_loop_only.hako`) **Approach**: 1. Analyze `loop(true) { ... break ... continue }` control flow 2. Design JoinIR Pattern variant (Pattern 1.5 or Pattern 5?) 3. Implement pattern in `src/mir/builder/control_flow/joinir/patterns/` 4. Update router to match this pattern **Files**: - `src/mir/builder/control_flow/joinir/router.rs` - add pattern matching - `src/mir/builder/control_flow/joinir/patterns/` - new pattern module **Expected**: Infinite loops with break/continue should lower to JoinIR --- ### Priority 3: Comprehensive Loop Coverage Test **After** P1+P2 fixed: **Test Matrix**: ```bash # Pattern 1: Simple while apps/tests/loop_min_while.hako # Pattern 2: Infinite loop + break apps/tests/llvm_stage3_loop_only.hako # Pattern 3: Loop with if-phi apps/tests/loop_if_phi.hako # Pattern 4: Nested loops apps/tests/nested_loop_inner_break_isolated.hako ``` All should pass: EMIT ✅ LINK ✅ RUN ✅ --- ## Box Theory Modularization Feedback ### LLVM Line SSOT Analysis #### ✅ Good: Single Entry Point - `tools/build_llvm.sh` is the SSOT for LLVM exe line - Clear 4-phase pipeline: Build → Emit → Link → Run - Env vars control compiler mode (`NYASH_LLVM_COMPILER=harness|crate`) #### ❌ Bad: Harness Duplication Risk - Python harness: `src/llvm_py/llvm_builder.py` (~2000 lines) - Rust crate: `crates/nyash-llvm-compiler/` (separate implementation) - Both translate MIR14→LLVM, risk of divergence #### 🔧 Recommendation: Harness as Box ``` Box: LLVMCompilerBox - Method: compile_to_object(mir_json: str, output: str) - Default impl: Python harness (llvmlite) - Alternative impl: Rust crate (inkwell - deprecated) - Interface: MIR JSON v1 schema (fixed contract) ``` **Benefits**: - Single interface definition - Easy A/B testing (Python vs Rust) - Plugin architecture: external LLVM backends --- ### Duplication Found: BB Emission Logic **Location 1**: `src/llvm_py/llvm_builder.py:400-600` **Location 2**: (likely) `crates/nyash-llvm-compiler/src/codegen/` (if crate path is used) **Problem**: Empty BB handling differs between harness and crate path **Solution**: Box-first extraction ```rust // Extract to: src/mir/llvm_ir_validator.rs pub fn validate_basic_blocks(blocks: &[BasicBlock]) -> Result<(), String> { for bb in blocks { if bb.instructions.is_empty() && bb.terminator.is_none() { return Err(format!("Empty BB detected: {:?}", bb.id)); } } Ok(()) } ``` Call this validator **before** harness invocation (in Rust MIR emission path). --- ### Legacy Deletion Candidates #### 1. LoopBuilder Remnants (Phase 33 cleanup incomplete?) **Search**: `grep -r "LoopBuilder" src/mir/builder/control_flow/` **Action**: Verify no dead imports/comments remain #### 2. Unreachable BB Emission Code **Location**: `src/llvm_py/llvm_builder.py` **Check**: Does harness skip `"reachable": false` blocks from MIR JSON? **Action**: If not, add filter before BB emission **Code snippet to check**: ```python # src/llvm_py/llvm_builder.py (approx line 450) for block in function["blocks"]: if block.get("reachable") == False: # ← Add this check? continue self.emit_basic_block(block) ``` --- ## Validation: build_llvm.sh SSOT Conformance ### ✅ Confirmed SSOT Behaviors 1. **Feature selection**: `NYASH_LLVM_FEATURE=llvm` (default harness) vs `llvm-inkwell-legacy` 2. **Compiler mode**: `NYASH_LLVM_COMPILER=harness` (default) vs `crate` (ny-llvmc) 3. **Object caching**: `NYASH_LLVM_SKIP_EMIT=1` for pre-generated .o files 4. **Runtime selection**: `NYASH_LLVM_NYRT=crates/nyash_kernel/target/release` ### ❌ Missing SSOT: Error Logs - Python harness errors go to stderr (lost after build_llvm.sh exits) - No env var for `NYASH_LLVM_HARNESS_LOG=/tmp/llvm_harness.log` **Recommendation**: ```bash # In build_llvm.sh, line ~118: HARNESS_LOG="${NYASH_LLVM_HARNESS_LOG:-/tmp/nyash_llvm_harness_$$.log}" NYASH_LLVM_OBJ_OUT="$OBJ" NYASH_LLVM_USE_HARNESS=1 \ "$BIN" --backend llvm "$INPUT" 2>&1 | tee "$HARNESS_LOG" ``` --- ## Timeline Estimate - **P1 (Loop PHI → LLVM IR fix)**: 1-2 hours (harness BB emission logic) - **P2 (JoinIR pattern coverage)**: 3-4 hours (pattern design + implementation) - **P3 (Comprehensive test)**: 1 hour (run matrix + verify) **Total**: 5-7 hours to full LLVM loop support --- ## Executive Summary ### What We Found (1.5 hours of investigation) **✅ Case A (Minimal)**: PASS - Simple return works perfectly - EMIT ✅ LINK ✅ RUN ✅ - Validates: Build pipeline, NyKernel runtime, basic MIR→LLVM lowering **❌ Case B (Loop+PHI)**: TAG-EMIT failure - **PHI after terminator bug** - **Root Cause**: Function lowering emits terminators BEFORE finalizing PHIs - **Impact**: ALL loops with PHI nodes fail to compile - **Fix Complexity**: Medium (2-3 hours) - requires multi-pass block emission - **Files**: `src/llvm_py/builders/function_lower.py`, `block_lower.py` **✅ Case B2 (BoxCall)**: PASS - print() without loops works - EMIT ✅ LINK ✅ RUN ✅ - Validates: BoxCall→ExternCall lowering, runtime ABI **❌ Case C (Break/Continue)**: TAG-EMIT failure - **JoinIR pattern gap** - **Root Cause**: `loop(true) { break }` pattern not recognized by JoinIR router - **Impact**: Infinite loops with early exit fail at MIR compilation - **Fix Complexity**: Medium-High (3-4 hours) - requires new JoinIR pattern - **Files**: `src/mir/builder/control_flow/joinir/router.rs`, new pattern module --- ### Critical Path to LLVM Loop Support 1. **Fix PHI ordering** (P1) - Enables Pattern 1 loops (simple while) 2. **Add JoinIR Pattern 5** (P2) - Enables infinite loops with break/continue 3. **Comprehensive test** (P3) - Validate all loop patterns **Total Effort**: 5-7 hours to full LLVM loop support --- ### Box Theory Modularization Insights #### ✅ Good: LLVM Line SSOT - `tools/build_llvm.sh` is well-structured (4-phase pipeline) - Clear separation: Emit → Link → Run - Environment variables control behavior cleanly #### ⚠️ Risk: Harness Duplication - Python harness (`src/llvm_py/`) vs Rust crate (`crates/nyash-llvm-compiler/`) - Both implement MIR14→LLVM, risk of divergence - **Recommendation**: Box-ify with interface contract (MIR JSON v1 schema) #### 🔧 Technical Debt Found 1. **PHI emission ordering** - Architectural issue, not a quick fix 2. **Unreachable block handling** - MIR JSON marks all blocks `reachable: false` (may be stale metadata) 3. **Error logging** - Python harness errors lost after build_llvm.sh exits --- ## Appendix: Test Commands ### Case A (Minimal - PASS) ```bash tools/build_llvm.sh apps/tests/phase87_llvm_exe_min.hako -o tmp/case_a tmp/case_a echo $? # Expected: 42 ``` ### Case B (Loop PHI - FAIL at EMIT) ```bash tools/build_llvm.sh apps/tests/loop_min_while.hako -o tmp/case_b # Error: empty bb4 in LLVM IR ``` ### Case B2 (Simple BoxCall - PASS) ```bash cat > /tmp/case_b_simple.hako << 'EOF' static box Main { main() { print(42) return 0 } } EOF tools/build_llvm.sh /tmp/case_b_simple.hako -o tmp/case_b2 tmp/case_b2 # Output: (empty, but executes without crash) ``` ### Case C (Complex Loop - FAIL at MIR) ```bash tools/build_llvm.sh apps/tests/llvm_stage3_loop_only.hako -o tmp/case_c # Error: JoinIR pattern not supported ``` --- ## MIR JSON Inspection (Case B Debug) ```bash # Generate MIR JSON ./target/release/hakorune --emit-mir-json /tmp/case_b.json --backend mir apps/tests/loop_min_while.hako # Check for unreachable blocks jq '.cfg.functions[] | select(.name=="main") | .blocks[] | select(.reachable==false)' /tmp/case_b.json # Inspect bb4 (the problematic block) jq '.cfg.functions[] | select(.name=="main") | .blocks[] | select(.id==4)' /tmp/case_b.json ``` --- ## Success Criteria **Phase 131-3 Complete** when: 1. ✅ Case A continues to pass (regression prevention) 2. ✅ Case B (loop_min_while.hako) compiles to valid LLVM IR and runs 3. ✅ Case B2 continues to pass (BoxCall regression prevention) 4. ✅ Case C (llvm_stage3_loop_only.hako) lowers to JoinIR and runs 5. ✅ All 4 cases produce correct output 6. ✅ No plugin errors (or plugin errors are benign/documented) **Definition of Done**: - All test cases: EMIT ✅ LINK ✅ RUN ✅ - Exit codes match expected values - Output matches expected output (where applicable)