## Problem

`block_end_values` used block ID only as key, causing collisions when multiple functions share the same block IDs (e.g., bb0 in both condition_fn and main).

## Root Cause

- condition_fn's bb0 → block_end_values[0]
- main's bb0 → block_end_values[0] (OVERWRITES!)
- PHI resolution gets wrong snapshot → dominance error

## Solution (Box-First principle)

Change key from `int` to `Tuple[str, int]` (func_name, block_id):

```python
# Before
block_end_values: Dict[int, Dict[int, ir.Value]]

# After
block_end_values: Dict[Tuple[str, int], Dict[int, ir.Value]]
```

## Files Modified (Python - 6 files)

1. `llvm_builder.py` - Type annotation update
2. `function_lower.py` - Pass func_name to lower_blocks
3. `block_lower.py` - Use tuple keys for snapshot save/load
4. `resolver.py` - Add func_name parameter to resolve_incoming
5. `wiring.py` - Thread func_name through PHI wiring
6. `phi_manager.py` - Debug traces

## Files Modified (Rust - cleanup)

- Removed deprecated `loop_to_join.rs` (297 lines deleted)
- Updated pattern lowerers for cleaner exit handling
- Added lifecycle management improvements

## Verification

- ✅ Pattern 1: VM RC: 3, LLVM Result: 3 (no regression)
- ⚠️ Case C: Still has dominance error (separate root cause)
  - Needs additional scope fixes (phi_manager, resolver caches)

## Design Principles

- **Box-First**: Each function is an isolated Box with scoped state
- **SSOT**: (func_name, block_id) uniquely identifies block snapshots
- **Fail-Fast**: No cross-function state contamination

## Known Issues (Phase 132-P1)

Other function-local state needs same treatment:

- phi_manager.predeclared
- resolver caches (i64_cache, ptr_cache, etc.)
- builder._jump_only_blocks

## Documentation

- docs/development/current/main/investigations/phase132-p0-case-c-root-cause.md
- docs/development/current/main/investigations/phase132-p0-tuple-key-implementation.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# Phase 132-P0 Case C Root Cause Investigation

**Date**: 2025-12-15
**Status**: Root cause identified, fix designed
**Priority**: P0 (blocks LLVM EXE execution)

## Problem Statement

Case C (Pattern 5 + print concat) LLVM EXE fails with a dominance error:

```
RuntimeError: Instruction does not dominate all uses!
%phi_1 = phi i64 [ %add_8, %bb6 ]
%phi_3 = phi i64 [ %phi_1, %bb0 ], [ %add_8, %bb7 ]
```

`%phi_3` in bb4 uses `%phi_1` on the incoming edge from bb0, but `%phi_1` is defined in bb3, which does not dominate bb0.

## Investigation Process

### Step 0: IR Dump Confirmation

Generated IR shows:
```llvm
bb0:
  br label %bb4

bb3:
  %phi_1 = phi i64 [%add_8, %bb6]  ; Defined in bb3
  ...

bb4:
  %phi_3 = phi i64 [%phi_1, %bb0], [%add_8, %bb7]  ; Uses %phi_1 from bb0!
```

MIR shows correct structure:
- bb0: `ValueId(1) = const 0`
- bb3: `ValueId(1) = PHI [bb6, ValueId(8)]`
- bb4: `ValueId(3) = PHI [(bb0, ValueId(2)), (bb7, ValueId(8))]`

**Key observation**: Same ValueId(1) used in different blocks is normal (SSA allows this), but LLVM builder is confusing them!

### Step 1: VMAP Trace Analysis

VMAP trace showed:
```
[vmap/id] Pass A bb0 snapshot id=139925440649984 keys=[0, 1]
[vmap/id] Pass A bb0 snapshot id=139925440650112 keys=[1, 2, 3]
```

Two different bb0 snapshots! But there is only one bb0 in the `main` function.

### Root Cause Discovery

Checked all functions in MIR JSON:
```json
{
  "name": "condition_fn",
  "blocks": [0]
},
{
  "name": "main",
  "blocks": [0, 3, 4, 5, 6, 7]
}
```

**BINGO**: Two functions have bb0!
- `condition_fn` has bb0 (first snapshot)
- `main` has bb0 (second snapshot, overwrites first)

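The manual check above could be automated with a short scan over the MIR JSON. The following is a minimal sketch, assuming the simplified shape shown in the excerpt (a list of functions, each with a `name` and a list of integer block IDs under `blocks`); the real MIR JSON layout may differ, and `find_shared_block_ids` is a hypothetical helper, not part of the codebase.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Assumed, simplified MIR JSON: mirrors the excerpt, wrapped in a list.
MIR_JSON = """
[
  {"name": "condition_fn", "blocks": [0]},
  {"name": "main", "blocks": [0, 3, 4, 5, 6, 7]}
]
"""


def find_shared_block_ids(functions: List[dict]) -> Dict[int, List[str]]:
    """Return {block_id: [function names]} for IDs used by more than one function."""
    owners: Dict[int, List[str]] = defaultdict(list)
    for func in functions:
        for bid in func["blocks"]:
            owners[bid].append(func["name"])
    # Keep only the block IDs that appear in two or more functions.
    return {bid: names for bid, names in owners.items() if len(names) > 1}


shared = find_shared_block_ids(json.loads(MIR_JSON))
print(shared)  # {0: ['condition_fn', 'main']}
```

Any non-empty result means that a map keyed by bare block ID cannot hold one snapshot per block.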
### Root Cause

**`block_end_values` uses `block_id` as key instead of `(function_name, block_id)` tuple**

**Problem flow**:
1. Process `condition_fn` bb0 → `block_end_values[0] = {0: ..., 1: ...}`
2. Process `main` bb0 → `block_end_values[0] = {1: ..., 2: ..., 3: ...}` (OVERWRITES!)
3. Process `main` bb4's PHI → resolve incoming ValueId(1) from bb0
4. `resolve_incoming(pred_block_id=0, value_id=1)` looks up `block_end_values[0][1]`
5. Gets `main`'s bb0 ValueId(1) (which is a copy of the PHI) instead of const 0!

**Result**: bb4's PHI gets `%phi_1` (bb3's PHI) instead of `i64 0` (bb0's const), causing the dominance error.

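Steps 1 and 2 of the problem flow can be reproduced in isolation. This is a minimal sketch: plain strings stand in for `ir.Value` objects, and the snapshot contents are placeholders chosen only to match the key sets seen in the VMAP trace.

```python
from typing import Dict, Tuple

Snapshot = Dict[int, str]  # value_id -> stand-in for an ir.Value

# Placeholder snapshots; only the key sets come from the trace.
cond_bb0: Snapshot = {0: "cond.v0", 1: "cond.v1"}
main_bb0: Snapshot = {1: "main.v1", 2: "main.v2", 3: "main.v3"}

# Keyed by block ID only: the second write replaces the first.
by_block_id: Dict[int, Snapshot] = {}
by_block_id[0] = cond_bb0
by_block_id[0] = main_bb0
print(len(by_block_id), sorted(by_block_id[0]))  # 1 [1, 2, 3]

# Keyed by (function name, block ID): both snapshots survive.
by_func_and_block: Dict[Tuple[str, int], Snapshot] = {}
by_func_and_block[("condition_fn", 0)] = cond_bb0
by_func_and_block[("main", 0)] = main_bb0
print(len(by_func_and_block))  # 2
```

With the bare integer key, nothing signals that a snapshot was lost, which is why the failure only surfaces later as a verifier error.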
## Solution Design

### Change 1: Tuple-Key block_end_values

**Old**:
```python
block_end_values: Dict[int, Dict[int, ir.Value]] = {}
block_end_values[bid] = snap
```

**New**:
```python
block_end_values: Dict[Tuple[str, int], Dict[int, ir.Value]] = {}
block_end_values[(func_name, bid)] = snap
```

### Change 2: Thread function name through call chain

**Files to modify**:
1. `llvm_builder.py` - Type annotation
2. `function_lower.py` - Pass `func.name` to `lower_blocks`
3. `block_lower.py` - Accept `func_name` parameter, use tuple keys
4. `resolver.py` - Update `resolve_incoming` to accept `func_name`
5. `phi_wiring/wiring.py` - Update `wire_incomings` to use tuple keys

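The threading described in this list might look roughly like the sketch below. The function names `lower_blocks` and `resolve_incoming` come from the list, but their signatures here are assumptions made for illustration; the real functions take other arguments (builder, resolver state) that are not shown in this document. Strings again stand in for `ir.Value`.

```python
from typing import Dict, Optional, Tuple

BlockKey = Tuple[str, int]   # (func_name, block_id)
Snapshot = Dict[int, str]    # value_id -> stand-in for an ir.Value


def lower_blocks(
    func_name: str,
    blocks: Dict[int, Snapshot],
    block_end_values: Dict[BlockKey, Snapshot],
) -> None:
    """Hypothetical signature: save each block's end-of-block snapshot under a tuple key."""
    for bid, vmap in blocks.items():
        block_end_values[(func_name, bid)] = dict(vmap)


def resolve_incoming(
    func_name: str,
    pred_block_id: int,
    value_id: int,
    block_end_values: Dict[BlockKey, Snapshot],
) -> Optional[str]:
    """Hypothetical signature: look up a value only within the current function."""
    snap = block_end_values.get((func_name, pred_block_id))
    return None if snap is None else snap.get(value_id)


block_end_values: Dict[BlockKey, Snapshot] = {}
lower_blocks("condition_fn", {0: {0: "cond.v0", 1: "cond.v1"}}, block_end_values)
lower_blocks("main", {0: {1: "main.v1", 2: "main.v2"}}, block_end_values)

print(resolve_incoming("main", 0, 1, block_end_values))          # main.v1
print(resolve_incoming("condition_fn", 0, 1, block_end_values))  # cond.v1
```

Passing `func_name` explicitly keeps the lookup independent of processing order, at the cost of one extra parameter at every call site.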
### Change 3: Verifier (STRICT mode)

Add collision detection:
```python
# Pseudocode: `key`, `bid`, and `current_func` come from the enclosing
# snapshot-save code; `find_function_for_key` is a helper still to be written.
if key in block_end_values and STRICT:
    existing_func = find_function_for_key(key)
    if existing_func != current_func:
        raise RuntimeError(
            f"Block ID collision: bb{bid} exists in both "
            f"{existing_func} and {current_func}"
        )
```

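One way to make the collision check concrete is to record which function owns each bare block ID in a side table. The sketch below applies that idea to an ID-keyed store, which is the layout where such a collision can occur; `save_snapshot`, `owners`, and the `strict` parameter are hypothetical names, since the source does not show how STRICT mode is configured.

```python
from typing import Dict

Snapshot = Dict[int, str]  # value_id -> stand-in for an ir.Value


def save_snapshot(
    store: Dict[int, Snapshot],
    owners: Dict[int, str],
    func_name: str,
    bid: int,
    snap: Snapshot,
    strict: bool = True,
) -> None:
    """Save a snapshot under a bare block ID, failing fast on cross-function reuse."""
    existing_func = owners.get(bid)
    if strict and existing_func is not None and existing_func != func_name:
        raise RuntimeError(
            f"Block ID collision: bb{bid} exists in both "
            f"{existing_func} and {func_name}"
        )
    owners[bid] = func_name
    store[bid] = snap


store: Dict[int, Snapshot] = {}
owners: Dict[int, str] = {}
save_snapshot(store, owners, "condition_fn", 0, {0: "cond.v0"})
try:
    save_snapshot(store, owners, "main", 0, {1: "main.v1"})
except RuntimeError as exc:
    message = str(exc)
    print(message)  # Block ID collision: bb0 exists in both condition_fn and main
```

Once the store itself is keyed by `(func_name, block_id)`, this situation cannot arise, so a check like this is mainly useful as a guard against code paths that still use bare block IDs.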
## Acceptance Criteria

1. **Pattern 1** (Phase 132): Still passes (regression test)
2. **Case C** (Pattern 5): Builds and executes correctly
3. **VM/LLVM parity**: Both produce same result
4. **STRICT mode**: No collisions, no fallback to 0

## Implementation Status

- [x] Root cause identified
- [x] Solution designed
- [ ] Tuple-key implementation
- [ ] STRICT verifier
- [ ] Acceptance testing
- [ ] Documentation update

## Related Documents

- Task: `/home/tomoaki/git/hakorune-selfhost/CURRENT_TASK.md` (Phase 132-P0)
- Inventory: `/home/tomoaki/git/hakorune-selfhost/docs/development/current/main/phase131-3-llvm-lowering-inventory.md`

## Key Insight

**"Same block ID in different functions is a FEATURE, not a bug"**

MIR reuses block IDs across functions (bb0 is common entry block). The LLVM builder MUST namespace `block_end_values` by function to avoid collisions.

This is a **Box-First principle violation**: `block_end_values` should have been scoped per-function from the start (encapsulation boundary).