Files
hakorune/docs/development/current/main/investigations/phase132-case-c-llvm-exe.md
nyash-codex 3f58f34592 feat(llvm): Phase 132-P0 - block_end_values tuple-key fix for cross-function isolation
## Problem
`block_end_values` used block ID only as key, causing collisions when
multiple functions share the same block IDs (e.g., bb0 in both
condition_fn and main).

## Root Cause
- condition_fn's bb0 → block_end_values[0]
- main's bb0 → block_end_values[0] (OVERWRITES!)
- PHI resolution gets wrong snapshot → dominance error

## Solution (Box-First principle)
Change key from `int` to `Tuple[str, int]` (func_name, block_id):

```python
# Before
block_end_values: Dict[int, Dict[int, ir.Value]]

# After
block_end_values: Dict[Tuple[str, int], Dict[int, ir.Value]]
```

## Files Modified (Python - 6 files)

1. `llvm_builder.py` - Type annotation update
2. `function_lower.py` - Pass func_name to lower_blocks
3. `block_lower.py` - Use tuple keys for snapshot save/load
4. `resolver.py` - Add func_name parameter to resolve_incoming
5. `wiring.py` - Thread func_name through PHI wiring
6. `phi_manager.py` - Debug traces

## Files Modified (Rust - cleanup)

- Removed deprecated `loop_to_join.rs` (297 lines deleted)
- Updated pattern lowerers for cleaner exit handling
- Added lifecycle management improvements

## Verification

-  Pattern 1: VM RC: 3, LLVM Result: 3 (no regression)
- ⚠️ Case C: Still has dominance error (separate root cause)
  - Needs additional scope fixes (phi_manager, resolver caches)

## Design Principles

- **Box-First**: Each function is an isolated Box with scoped state
- **SSOT**: (func_name, block_id) uniquely identifies block snapshots
- **Fail-Fast**: No cross-function state contamination

## Known Issues (Phase 132-P1)

Other function-local state needs same treatment:
- phi_manager.predeclared
- resolver caches (i64_cache, ptr_cache, etc.)
- builder._jump_only_blocks

## Documentation

- docs/development/current/main/investigations/phase132-p0-case-c-root-cause.md
- docs/development/current/main/investigations/phase132-p0-tuple-key-implementation.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 05:36:50 +09:00

170 lines
6.7 KiB
Markdown

# Phase 132-P0: Case C (Infinite Loop with Early Exit) - LLVM EXE Investigation
## Date
2025-12-15
## Status
🔴 **FAILED** - LLVM executable returns wrong result
## Summary
Testing `apps/tests/llvm_stage3_loop_only.hako` (Pattern 5: InfiniteEarlyExit) in LLVM EXE mode reveals a critical exit PHI usage bug.
## Test File
`apps/tests/llvm_stage3_loop_only.hako`
```nyash
static box Main {
main() {
local counter = 0
loop (true) {
counter = counter + 1
if counter == 3 { break }
continue
}
print("Result: " + counter)
return 0
}
}
```
## Expected Behavior
- **VM execution**: `Result: 3`
- **LLVM EXE**: `Result: 3` (should match VM)
## Actual Behavior
- **VM execution**: `Result: 3`
- **LLVM EXE**: `Result: 0`
## Root Cause Analysis
### MIR Structure (Correct)
```mir
bb3: ; Exit block
1: %1: Integer = phi [%8, bb6] ; ✅ PHI correctly receives counter
1: %16: String = const "Result: "
1: %17: String = copy %16
1: %18: Integer = copy %1 ; ✅ MIR uses %1 (PHI result)
1: %19: Box("StringBox") = %17 Add %18
1: %20: Box("StringBox") = copy %19
1: call_global print(%20)
1: %21: Integer = const 0
1: ret %21
```
**MIR is correct**: The exit block (bb3) has a PHI node that receives the counter value from bb6, and subsequent instructions correctly use `%1`.
### LLVM IR (BUG)
```llvm
bb3:
%"phi_1" = phi i64 [%"add_8", %"bb6"] ; ✅ PHI created correctly
%".2" = getelementptr inbounds [9 x i8], [9 x i8]* @".str.main.16", i32 0, i32 0
%"const_str_h_16" = call i64 @"nyash.box.from_i8_string"(i8* %".2")
%"bin_h2p_r_19" = call i8* @"nyash.string.to_i8p_h"(i64 0) ; ❌ Uses 0 instead of %"phi_1"
%"concat_is_19" = call i8* @"nyash.string.concat_is"(i64 0, i8* %"bin_h2p_r_19") ; ❌ Uses 0
%"concat_box_19" = call i64 @"nyash.box.from_i8_string"(i8* %"concat_is_19")
call void @"ny_check_safepoint"()
%"unified_global_print" = call i64 @"print"(i64 0) ; ❌ Uses 0
ret i64 0
```
**Bug identified**: The PHI node `%"phi_1"` is created correctly and receives the counter value from `%"add_8"`. However, **all subsequent uses of ValueId(1) are hardcoded to `i64 0` instead of using `%"phi_1"`**.
### Hypothesis
The Python LLVM builder is not correctly resolving ValueId(1) when lowering instructions in bb3. Possible causes:
1. **vmap issue**: The PHI node is created and stored in `self.vmap[1]` during `setup_phi_placeholders`, but when lowering instructions in bb3, `vmap_cur` may not contain the PHI.
2. **Resolution fallback**: When `resolve_i64_strict` fails to find ValueId(1), it falls back to `ir.Constant(i64, 0)`.
3. **Block-local vmap initialization**: `vmap_cur` is initialized with `dict(builder.vmap)` at the start of each block, but something may be preventing the PHI from being included.
## Investigation Steps
### Step 1: Verify PHI Creation
**Confirmed**: PHI is created in LLVM IR at bb3
### Step 2: Check MIR Exit PHI Generation
**Confirmed**: MIR has correct exit PHI with debug logs:
```
[DEBUG-177] Phase 246-EX: Block BasicBlockId(1) has jump_args metadata: [ValueId(1004)]
[DEBUG-177] Phase 246-EX: Remapped jump_args: [ValueId(8)]
[DEBUG-177] Phase 246-EX: exit_phi_inputs from jump_args[0]: (BasicBlockId(6), ValueId(8))
[DEBUG-177] Phase 246-EX-P5: Added loop_var 'counter' to carrier_inputs: (BasicBlockId(6), ValueId(8))
[DEBUG-177] Exit block PHI (carrier 'counter'): ValueId(1) = phi [(BasicBlockId(6), ValueId(8))]
```
### Step 3: Trace vmap Resolution
🔄 **In progress**: Running with `NYASH_LLVM_VMAP_TRACE=1` to see if PHI is in vmap_cur
## Code Locations
### Python LLVM Builder
- PHI placeholder creation: `/home/tomoaki/git/hakorune-selfhost/src/llvm_py/llvm_builder.py:276-368`
- Line 343: `self.vmap[dst0] = ph0` - PHI stored in global vmap
- Block lowering: `/home/tomoaki/git/hakorune-selfhost/src/llvm_py/builders/block_lower.py`
- Line 335: `vmap_cur = dict(builder.vmap)` - Copy global vmap to block-local
- Instruction lowering: `/home/tomoaki/git/hakorune-selfhost/src/llvm_py/builders/instruction_lower.py`
- Line 8: `vmap_ctx = getattr(owner, '_current_vmap', owner.vmap)` - Use block-local vmap
- Value resolution: `/home/tomoaki/git/hakorune-selfhost/src/llvm_py/utils/values.py:11-56`
- `resolve_i64_strict` - Checks vmap, global_vmap, then resolver
- Falls back to `ir.Constant(i64, 0)` if all fail (line 53)
### Rust MIR Generation
- Exit PHI generation: `src/mir/join_ir/lowering/simple_while_minimal.rs`
- Pattern 5 (InfiniteEarlyExit) lowering
## Root Cause Identified
### Bug Location
`/home/tomoaki/git/hakorune-selfhost/src/llvm_py/llvm_builder.py:342-343`
```python
ph0 = b0.phi(self.i64, name=f"phi_{dst0}")
self.vmap[dst0] = ph0
# ❌ MISSING: self.phi_manager.register_phi(bid0, dst0, ph0)
```
### Failure Chain
1. **PHI Creation** (line 342-343): PHI is created and stored in `self.vmap[1]`
2. **PHI Registration** (MISSING): PHI is **never registered** via `phi_manager.register_phi()`
3. **Block Lowering** (block_lower.py:325): `filter_vmap_preserve_phis` is called
4. **PHI Filtering** (phi_manager.py:filter_vmap_preserve_phis): Checks `is_phi_owned(3, 1)`
5. **Ownership Check** (phi_manager.py:is_phi_owned): Looks for `(3, 1)` in `predeclared` dict
6. **Not Found** ❌: PHI was never registered, so `(3, 1)` is not in `predeclared`
7. **PHI Filtered Out**: PHI is removed from `vmap_cur`
8. **Value Resolution Fails**: Instructions can't find ValueId(1), fall back to `ir.Constant(i64, 0)`
### Fix Strategy
**Option A: Add PHI Registration** (Recommended)
Add `self.phi_manager.register_phi(bid0, dst0, ph0)` after line 343 in `llvm_builder.py:setup_phi_placeholders`:
```python
if not is_phi:
ph0 = b0.phi(self.i64, name=f"phi_{dst0}")
self.vmap[dst0] = ph0
# ✅ FIX: Register PHI for filter_vmap_preserve_phis
self.phi_manager.register_phi(int(bid0), int(dst0), ph0)
```
This ensures PHIs are included in `vmap_cur` when lowering their defining block.
### Verification Plan
1. Add `register_phi` call in `setup_phi_placeholders`
2. Rebuild and test: `NYASH_LLVM_STRICT=1 tools/build_llvm.sh apps/tests/llvm_stage3_loop_only.hako -o /tmp/case_c`
3. Execute: `/tmp/case_c` should output `Result: 3`
4. Check LLVM IR: Should use `%"phi_1"` instead of `0`
## Acceptance Criteria
- ✅ LLVM EXE output: `Result: 3`
- ✅ LLVM IR uses `%"phi_1"` instead of `0`
- ✅ STRICT mode passes without fallback warnings
## Related Documents
- [Phase 132 Plan](/home/tomoaki/git/hakorune-selfhost/docs/development/current/main/phase132-plan.md)
- [Phase 131-3 LLVM Lowering Inventory](/home/tomoaki/git/hakorune-selfhost/docs/development/current/main/phase131-3-llvm-lowering-inventory.md)
- [Simple While Minimal Lowering](/home/tomoaki/git/hakorune-selfhost/src/mir/join_ir/lowering/simple_while_minimal.rs)