hakorune/docs/development/current/main/investigations/phase132-case-c-llvm-exe.md
nyash-codex 3f58f34592 feat(llvm): Phase 132-P0 - block_end_values tuple-key fix for cross-function isolation
## Problem
`block_end_values` used block ID only as key, causing collisions when
multiple functions share the same block IDs (e.g., bb0 in both
condition_fn and main).

## Root Cause
- condition_fn's bb0 → block_end_values[0]
- main's bb0 → block_end_values[0] (OVERWRITES!)
- PHI resolution gets wrong snapshot → dominance error

## Solution (Box-First principle)
Change key from `int` to `Tuple[str, int]` (func_name, block_id):

```python
# Before
block_end_values: Dict[int, Dict[int, ir.Value]]

# After
block_end_values: Dict[Tuple[str, int], Dict[int, ir.Value]]
```
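
The collision can be reproduced outside the builder. The following is a minimal sketch, assuming plain strings stand in for `ir.Value` objects; the helper names `record_by_block_id` and `record_by_tuple_key` are hypothetical and do not exist in the codebase.

```python
from typing import Dict, Tuple

# Assumption: strings stand in for llvmlite ir.Value objects.
Snapshot = Dict[int, str]
FunctionBlocks = Dict[str, Dict[int, Snapshot]]


def record_by_block_id(functions: FunctionBlocks) -> Dict[int, Snapshot]:
    """Old scheme: keyed by block_id only, so a later function overwrites an earlier one."""
    table: Dict[int, Snapshot] = {}
    for _func_name, blocks in functions.items():
        for block_id, snapshot in blocks.items():
            table[block_id] = snapshot
    return table


def record_by_tuple_key(functions: FunctionBlocks) -> Dict[Tuple[str, int], Snapshot]:
    """New scheme: keyed by (func_name, block_id), so each function keeps its own entry."""
    table: Dict[Tuple[str, int], Snapshot] = {}
    for func_name, blocks in functions.items():
        for block_id, snapshot in blocks.items():
            table[(func_name, block_id)] = snapshot
    return table


# Both functions have a bb0, as in the condition_fn / main example.
functions: FunctionBlocks = {
    "condition_fn": {0: {1: "cond_v1"}},
    "main": {0: {1: "main_v1"}},
}

old_table = record_by_block_id(functions)
new_table = record_by_tuple_key(functions)
```

With the int key only one snapshot survives for bb0; with the tuple key both are kept.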

## Files Modified (Python - 6 files)

1. `llvm_builder.py` - Type annotation update
2. `function_lower.py` - Pass func_name to lower_blocks
3. `block_lower.py` - Use tuple keys for snapshot save/load
4. `resolver.py` - Add func_name parameter to resolve_incoming
5. `wiring.py` - Thread func_name through PHI wiring
6. `phi_manager.py` - Debug traces

## Files Modified (Rust - cleanup)

- Removed deprecated `loop_to_join.rs` (297 lines deleted)
- Updated pattern lowerers for cleaner exit handling
- Added lifecycle management improvements

## Verification

- Pattern 1: VM RC: 3, LLVM Result: 3 (no regression)
- ⚠️ Case C: Still has dominance error (separate root cause)
  - Needs additional scope fixes (phi_manager, resolver caches)

## Design Principles

- **Box-First**: Each function is an isolated Box with scoped state
- **SSOT**: (func_name, block_id) uniquely identifies block snapshots
- **Fail-Fast**: No cross-function state contamination

## Known Issues (Phase 132-P1)

Other function-local state needs the same treatment:
- phi_manager.predeclared
- resolver caches (i64_cache, ptr_cache, etc.)
- builder._jump_only_blocks
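
One possible way to give this state the same isolation is to group it per function. The sketch below is an assumption, not the planned design: `FunctionLocalState` and `ScopedStates` are hypothetical names, and the field types are guesses based on the names listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class FunctionLocalState:
    """Hypothetical container for state that must not leak across functions."""
    predeclared: Dict[Tuple[int, int], object] = field(default_factory=dict)
    i64_cache: Dict[Tuple[int, int], object] = field(default_factory=dict)
    ptr_cache: Dict[Tuple[int, int], object] = field(default_factory=dict)
    jump_only_blocks: Set[int] = field(default_factory=set)


class ScopedStates:
    """Hands out one FunctionLocalState per function name."""

    def __init__(self) -> None:
        self._by_func: Dict[str, FunctionLocalState] = {}

    def for_function(self, func_name: str) -> FunctionLocalState:
        # A fresh, empty state is created the first time a function is seen.
        return self._by_func.setdefault(func_name, FunctionLocalState())


states = ScopedStates()
states.for_function("condition_fn").predeclared[(0, 1)] = "phi_cond"
main_state = states.for_function("main")
```

Here an entry registered while lowering `condition_fn` is not visible from the state used for `main`.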

## Documentation

- docs/development/current/main/investigations/phase132-p0-case-c-root-cause.md
- docs/development/current/main/investigations/phase132-p0-tuple-key-implementation.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 05:36:50 +09:00


Phase 132-P0: Case C (Infinite Loop with Early Exit) - LLVM EXE Investigation

Date

2025-12-15

Status

🔴 FAILED - LLVM executable returns wrong result

Summary

Testing apps/tests/llvm_stage3_loop_only.hako (Pattern 5: InfiniteEarlyExit) in LLVM EXE mode reveals a critical bug in how the exit PHI is used: the LLVM executable prints Result: 0 where the VM prints Result: 3.

Test File

apps/tests/llvm_stage3_loop_only.hako

static box Main {
  main() {
    local counter = 0
    loop (true) {
      counter = counter + 1
      if counter == 3 { break }
      continue
    }
    print("Result: " + counter)
    return 0
  }
}

Expected Behavior

  • VM execution: Result: 3
  • LLVM EXE: Result: 3 (should match VM)

Actual Behavior

  • VM execution: Result: 3
  • LLVM EXE: Result: 0

Root Cause Analysis

MIR Structure (Correct)

bb3:  ; Exit block
    1: %1: Integer = phi [%8, bb6]  ; ✅ PHI correctly receives counter
    1: %16: String = const "Result: "
    1: %17: String = copy %16
    1: %18: Integer = copy %1       ; ✅ MIR uses %1 (PHI result)
    1: %19: Box("StringBox") = %17 Add %18
    1: %20: Box("StringBox") = copy %19
    1: call_global print(%20)
    1: %21: Integer = const 0
    1: ret %21

MIR is correct: The exit block (bb3) has a PHI node that receives the counter value from bb6, and subsequent instructions correctly use %1.

LLVM IR (BUG)

bb3:
  %"phi_1" = phi  i64 [%"add_8", %"bb6"]           ; ✅ PHI created correctly
  %".2" = getelementptr inbounds [9 x i8], [9 x i8]* @".str.main.16", i32 0, i32 0
  %"const_str_h_16" = call i64 @"nyash.box.from_i8_string"(i8* %".2")
  %"bin_h2p_r_19" = call i8* @"nyash.string.to_i8p_h"(i64 0)  ; ❌ Uses 0 instead of %"phi_1"
  %"concat_is_19" = call i8* @"nyash.string.concat_is"(i64 0, i8* %"bin_h2p_r_19")  ; ❌ Uses 0
  %"concat_box_19" = call i64 @"nyash.box.from_i8_string"(i8* %"concat_is_19")
  call void @"ny_check_safepoint"()
  %"unified_global_print" = call i64 @"print"(i64 0)  ; ❌ Uses 0
  ret i64 0

Bug identified: The PHI node %"phi_1" is created correctly and receives the counter value from %"add_8". However, all subsequent uses of ValueId(1) are hardcoded to i64 0 instead of using %"phi_1".

Hypothesis

The Python LLVM builder is not correctly resolving ValueId(1) when lowering instructions in bb3. Possible causes:

  1. vmap issue: The PHI node is created and stored in self.vmap[1] during setup_phi_placeholders, but when lowering instructions in bb3, vmap_cur may not contain the PHI.

  2. Resolution fallback: When resolve_i64_strict fails to find ValueId(1), it falls back to ir.Constant(i64, 0).

  3. Block-local vmap initialization: vmap_cur is initialized with dict(builder.vmap) at the start of each block, but something may be preventing the PHI from being included.
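
To make the second hypothesis concrete, here is a minimal model of a lookup chain that silently degrades to a zero constant. It is a sketch under assumptions: strings stand in for LLVM values, the function `resolve_i64` and its `strict` flag are hypothetical, and the real `resolve_i64_strict` may differ in signature and behavior.

```python
from typing import Callable, Dict, Optional

ZERO = "i64 0"  # stands in for ir.Constant(i64, 0)


def resolve_i64(value_id: int,
                vmap_cur: Dict[int, str],
                global_vmap: Dict[int, str],
                resolver: Optional[Callable[[int], Optional[str]]] = None,
                strict: bool = False) -> str:
    """Look up value_id in the block-local map, then the global map, then a resolver."""
    if value_id in vmap_cur:
        return vmap_cur[value_id]
    if value_id in global_vmap:
        return global_vmap[value_id]
    if resolver is not None:
        found = resolver(value_id)
        if found is not None:
            return found
    if strict:
        # Assumed strict behavior: fail loudly instead of emitting a constant.
        raise KeyError(f"ValueId({value_id}) could not be resolved")
    return ZERO  # Silent fallback: the caller cannot tell this from a real 0.


missing = resolve_i64(1, {}, {})
present = resolve_i64(1, {1: "%phi_1"}, {})
```

In this model a missing entry is indistinguishable from a genuine zero unless the lookup fails loudly, which matches the `i64 0` operands seen in the emitted IR.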

Investigation Steps

Step 1: Verify PHI Creation

Confirmed: PHI is created in LLVM IR at bb3

Step 2: Check MIR Exit PHI Generation

Confirmed: MIR has correct exit PHI with debug logs:

[DEBUG-177] Phase 246-EX: Block BasicBlockId(1) has jump_args metadata: [ValueId(1004)]
[DEBUG-177] Phase 246-EX: Remapped jump_args: [ValueId(8)]
[DEBUG-177] Phase 246-EX: exit_phi_inputs from jump_args[0]: (BasicBlockId(6), ValueId(8))
[DEBUG-177] Phase 246-EX-P5: Added loop_var 'counter' to carrier_inputs: (BasicBlockId(6), ValueId(8))
[DEBUG-177] Exit block PHI (carrier 'counter'): ValueId(1) = phi [(BasicBlockId(6), ValueId(8))]

Step 3: Trace vmap Resolution

🔄 In progress: Running with NYASH_LLVM_VMAP_TRACE=1 to see if PHI is in vmap_cur

Code Locations

Python LLVM Builder

  • PHI placeholder creation: /home/tomoaki/git/hakorune-selfhost/src/llvm_py/llvm_builder.py:276-368
    • Line 343: self.vmap[dst0] = ph0 - PHI stored in global vmap
  • Block lowering: /home/tomoaki/git/hakorune-selfhost/src/llvm_py/builders/block_lower.py
    • Line 335: vmap_cur = dict(builder.vmap) - Copy global vmap to block-local
  • Instruction lowering: /home/tomoaki/git/hakorune-selfhost/src/llvm_py/builders/instruction_lower.py
    • Line 8: vmap_ctx = getattr(owner, '_current_vmap', owner.vmap) - Use block-local vmap
  • Value resolution: /home/tomoaki/git/hakorune-selfhost/src/llvm_py/utils/values.py:11-56
    • resolve_i64_strict - Checks vmap, global_vmap, then resolver
    • Falls back to ir.Constant(i64, 0) if all fail (line 53)

Rust MIR Generation

  • Exit PHI generation: src/mir/join_ir/lowering/simple_while_minimal.rs
    • Pattern 5 (InfiniteEarlyExit) lowering

Root Cause Identified

Bug Location

/home/tomoaki/git/hakorune-selfhost/src/llvm_py/llvm_builder.py:342-343

ph0 = b0.phi(self.i64, name=f"phi_{dst0}")
self.vmap[dst0] = ph0
# ❌ MISSING: self.phi_manager.register_phi(bid0, dst0, ph0)

Failure Chain

  1. PHI Creation (lines 342-343): PHI is created and stored in self.vmap[1]
  2. PHI Registration (MISSING): PHI is never registered via phi_manager.register_phi()
  3. Block Lowering (block_lower.py:325): filter_vmap_preserve_phis is called
  4. PHI Filtering (phi_manager.py:filter_vmap_preserve_phis): Checks is_phi_owned(3, 1)
  5. Ownership Check (phi_manager.py:is_phi_owned): Looks for (3, 1) in predeclared dict
  6. Not Found: PHI was never registered, so (3, 1) is not in predeclared
  7. PHI Filtered Out: PHI is removed from vmap_cur
  8. Value Resolution Fails: Instructions can't find ValueId(1) and fall back to ir.Constant(i64, 0)
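
The chain above can be replayed in a small standalone model. This is a sketch, not the real phi_manager: `FakePhi`, `PhiManagerModel`, and `lower_exit_block` are hypothetical, and the method signatures are assumptions chosen to mirror the names used in the steps.

```python
from typing import Dict, Tuple


class FakePhi:
    """Stand-in for an llvmlite PHI instruction."""

    def __init__(self, name: str) -> None:
        self.name = name


class PhiManagerModel:
    """Simplified model of the ownership bookkeeping described in the failure chain."""

    def __init__(self) -> None:
        self.predeclared: Dict[Tuple[int, int], FakePhi] = {}

    def register_phi(self, block_id: int, value_id: int, phi: FakePhi) -> None:
        self.predeclared[(block_id, value_id)] = phi

    def is_phi_owned(self, block_id: int, value_id: int) -> bool:
        return (block_id, value_id) in self.predeclared

    def filter_vmap_preserve_phis(self, vmap: Dict[int, object], block_id: int) -> Dict[int, object]:
        """Keep non-PHI values; keep a PHI only if block_id owns it."""
        return {
            vid: val
            for vid, val in vmap.items()
            if not isinstance(val, FakePhi) or self.is_phi_owned(block_id, vid)
        }


def lower_exit_block(register: bool) -> str:
    """Return the operand that bb3 would use for ValueId(1)."""
    manager = PhiManagerModel()
    phi = FakePhi("phi_1")
    vmap: Dict[int, object] = {1: phi}           # step 1: PHI stored in the global vmap
    if register:
        manager.register_phi(3, 1, phi)          # step 2: the registration that was missing
    vmap_cur = manager.filter_vmap_preserve_phis(vmap, block_id=3)  # steps 3-7
    value = vmap_cur.get(1)
    return value.name if isinstance(value, FakePhi) else "i64 0"    # step 8: fallback


without_registration = lower_exit_block(register=False)
with_registration = lower_exit_block(register=True)
```

Without registration the model yields the zero constant; with registration the PHI survives the filter and is used as the operand.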

Fix Strategy

Option A: Add PHI Registration (Recommended)

Add self.phi_manager.register_phi(bid0, dst0, ph0) after line 343 in llvm_builder.py:setup_phi_placeholders:

if not is_phi:
    ph0 = b0.phi(self.i64, name=f"phi_{dst0}")
    self.vmap[dst0] = ph0
    # ✅ FIX: Register PHI for filter_vmap_preserve_phis
    self.phi_manager.register_phi(int(bid0), int(dst0), ph0)

This ensures PHIs are included in vmap_cur when lowering their defining block.

Verification Plan

  1. Add register_phi call in setup_phi_placeholders
  2. Rebuild and test: NYASH_LLVM_STRICT=1 tools/build_llvm.sh apps/tests/llvm_stage3_loop_only.hako -o /tmp/case_c
  3. Execute: /tmp/case_c should output Result: 3
  4. Check LLVM IR: Should use %"phi_1" instead of 0

Acceptance Criteria

  • LLVM EXE output: Result: 3
  • LLVM IR uses %"phi_1" instead of 0
  • STRICT mode passes without fallback warnings
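
The second criterion could be checked mechanically against the emitted IR text. The helper below is a hedged sketch: `phi_is_used` is a hypothetical name, it relies on a naive comment strip at the first `;`, and `FIXED_IR` is an assumed shape of the corrected output rather than real compiler output.

```python
def phi_is_used(ir_text: str, phi_name: str) -> bool:
    """Return True if the named PHI appears as an operand outside its own definition."""
    token = f'%"{phi_name}"'
    for line in ir_text.splitlines():
        # Drop trailing IR comments so a mention inside a comment does not count.
        code = line.split(";", 1)[0].strip()
        if code.startswith(f"{token} ="):
            continue  # This is the definition of the PHI, not a use.
        if token in code:
            return True
    return False


# Excerpt of the buggy IR from this investigation.
BUGGY_IR = '''bb3:
  %"phi_1" = phi  i64 [%"add_8", %"bb6"]
  %"bin_h2p_r_19" = call i8* @"nyash.string.to_i8p_h"(i64 0)  ; uses 0 instead of %"phi_1"
'''

# Assumed shape of the IR after the fix (illustrative only).
FIXED_IR = '''bb3:
  %"phi_1" = phi  i64 [%"add_8", %"bb6"]
  %"bin_h2p_r_19" = call i8* @"nyash.string.to_i8p_h"(i64 %"phi_1")
'''

buggy_uses_phi = phi_is_used(BUGGY_IR, "phi_1")
fixed_uses_phi = phi_is_used(FIXED_IR, "phi_1")
```

A check like this only confirms that the PHI has at least one use; it does not verify that every former `i64 0` operand was replaced.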