hakorune/docs/development/current/main/investigations/phase132-p0-case-c-root-cause.md
nyash-codex 3f58f34592 feat(llvm): Phase 132-P0 - block_end_values tuple-key fix for cross-function isolation
## Problem
`block_end_values` used block ID only as key, causing collisions when
multiple functions share the same block IDs (e.g., bb0 in both
condition_fn and main).

## Root Cause
- condition_fn's bb0 → block_end_values[0]
- main's bb0 → block_end_values[0] (OVERWRITES!)
- PHI resolution gets wrong snapshot → dominance error

## Solution (Box-First principle)
Change key from `int` to `Tuple[str, int]` (func_name, block_id):

```python
# Before
block_end_values: Dict[int, Dict[int, ir.Value]]

# After
block_end_values: Dict[Tuple[str, int], Dict[int, ir.Value]]
```

## Files Modified (Python - 6 files)

1. `llvm_builder.py` - Type annotation update
2. `function_lower.py` - Pass func_name to lower_blocks
3. `block_lower.py` - Use tuple keys for snapshot save/load
4. `resolver.py` - Add func_name parameter to resolve_incoming
5. `wiring.py` - Thread func_name through PHI wiring
6. `phi_manager.py` - Debug traces

## Files Modified (Rust - cleanup)

- Removed deprecated `loop_to_join.rs` (297 lines deleted)
- Updated pattern lowerers for cleaner exit handling
- Added lifecycle management improvements

## Verification

- Pattern 1: VM RC: 3, LLVM Result: 3 (no regression)
- ⚠️ Case C: Still has dominance error (separate root cause)
  - Needs additional scope fixes (phi_manager, resolver caches)

## Design Principles

- **Box-First**: Each function is an isolated Box with scoped state
- **SSOT**: (func_name, block_id) uniquely identifies block snapshots
- **Fail-Fast**: No cross-function state contamination

## Known Issues (Phase 132-P1)

Other function-local state needs same treatment:
- phi_manager.predeclared
- resolver caches (i64_cache, ptr_cache, etc.)
- builder._jump_only_blocks

## Documentation

- docs/development/current/main/investigations/phase132-p0-case-c-root-cause.md
- docs/development/current/main/investigations/phase132-p0-tuple-key-implementation.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 05:36:50 +09:00


Phase 132-P0 Case C Root Cause Investigation

Date: 2025-12-15
Status: Root cause identified, fix designed
Priority: P0 (blocks LLVM EXE execution)

Problem Statement

Case C (Pattern 5 + print concat): the LLVM EXE build fails with a dominance error:

RuntimeError: Instruction does not dominate all uses!
  %phi_1 = phi i64 [ %add_8, %bb6 ]
  %phi_3 = phi i64 [ %phi_1, %bb0 ], [ %add_8, %bb7 ]

%phi_3 in bb4 takes %phi_1 as its incoming value on the edge from bb0, but %phi_1 is defined in bb3, which does not dominate bb0.

Investigation Process

Step 0: IR Dump Confirmation

Generated IR shows:

bb0:
  br label %bb4

bb3:
  %phi_1 = phi i64 [%add_8, %bb6]    ; Defined in bb3
  ...

bb4:
  %phi_3 = phi i64 [%phi_1, %bb0], [%add_8, %bb7]  ; Uses %phi_1 from bb0!

MIR shows correct structure:

  • bb0: ValueId(1) = const 0
  • bb3: ValueId(1) = PHI [(bb6, ValueId(8))]
  • bb4: ValueId(3) = PHI [(bb0, ValueId(2)), (bb7, ValueId(8))]

Key observation: The same ValueId(1) appearing in different blocks is normal (SSA allows this), but the LLVM builder is confusing them!

Step 1: VMAP Trace Analysis

VMAP trace showed:

[vmap/id] Pass A bb0 snapshot id=139925440649984 keys=[0, 1]
[vmap/id] Pass A bb0 snapshot id=139925440650112 keys=[1, 2, 3]

Two different bb0 snapshots! But the main function has only one bb0.
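
Spotting the duplicate snapshot by eye is easy with two lines but harder in a long trace. A minimal sketch of automating that check, assuming every trace line has exactly the shape of the two lines quoted above (`snapshots_per_block` and `TRACE_RE` are hypothetical names, not part of the builder):

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Matches lines shaped like the trace excerpt above (assumed format).
TRACE_RE = re.compile(r"\[vmap/id\] Pass A bb(\d+) snapshot id=(\d+)")


def snapshots_per_block(trace: str) -> Dict[int, Set[int]]:
    """Group the distinct snapshot ids seen for each block ID."""
    seen: Dict[int, Set[int]] = defaultdict(set)
    for match in TRACE_RE.finditer(trace):
        block_id, snap_id = int(match.group(1)), int(match.group(2))
        seen[block_id].add(snap_id)
    return dict(seen)


trace = """\
[vmap/id] Pass A bb0 snapshot id=139925440649984 keys=[0, 1]
[vmap/id] Pass A bb0 snapshot id=139925440650112 keys=[1, 2, 3]
"""

# Blocks that were snapshotted under more than one object id.
duplicates = {
    bid: ids for bid, ids in snapshots_per_block(trace).items() if len(ids) > 1
}
print(sorted(duplicates))  # [0]
```

A block ID with more than one snapshot id is only a hint: it says the same integer was used as a key more than once, not which functions were involved.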

Root Cause Discovery

Checked all functions in MIR JSON:

{
  "name": "condition_fn",
  "blocks": [0]
},
{
  "name": "main",
  "blocks": [0, 3, 4, 5, 6, 7]
}

BINGO: Two functions have bb0!

  • condition_fn has bb0 (first snapshot)
  • main has bb0 (second snapshot, overwrites first)
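
The same cross-function check can be scripted. This is a sketch under the assumption that each function entry carries a `name` and a flat list of integer block IDs under `blocks`, as in the simplified excerpt above; the real MIR JSON schema may differ, and `find_shared_block_ids` is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict, List


def find_shared_block_ids(functions: List[dict]) -> Dict[int, List[str]]:
    """Map each block ID used by more than one function to those functions."""
    owners: Dict[int, List[str]] = defaultdict(list)
    for func in functions:
        for block_id in func["blocks"]:
            owners[block_id].append(func["name"])
    return {bid: names for bid, names in owners.items() if len(names) > 1}


# Simplified function list mirroring the excerpt above (assumed shape).
functions = [
    {"name": "condition_fn", "blocks": [0]},
    {"name": "main", "blocks": [0, 3, 4, 5, 6, 7]},
]

shared = find_shared_block_ids(functions)
print(shared)  # {0: ['condition_fn', 'main']}
```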

Root Cause

block_end_values uses block_id alone as its key instead of a (function_name, block_id) tuple.

Problem flow:

  1. Process condition_fn bb0 → block_end_values[0] = {0: ..., 1: ...}
  2. Process main bb0 → block_end_values[0] = {1: ..., 2: ..., 3: ...} (OVERWRITES!)
  3. Process main bb4's PHI → resolve incoming ValueId(1) from bb0
  4. resolve_incoming(pred_block_id=0, value_id=1) looks up block_end_values[0][1]
  5. Gets main's bb0 ValueId(1) (which is a copy of the PHI) instead of const 0!

Result: bb4's PHI gets %phi_1 (bb3's PHI) instead of i64 0 (bb0's const), causing the dominance error.
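
The overwrite in steps 1 and 2 can be reproduced with plain dictionaries. In this sketch the strings are placeholders for the builder's `ir.Value` objects, and the snapshot contents are invented to match only the key sets seen in the trace (`[0, 1]` and `[1, 2, 3]`):

```python
from typing import Dict, Tuple

# value_id -> value; strings stand in for ir.Value in this sketch.
Snapshot = Dict[int, str]

# Keyed by block ID only: the second write replaces the first.
by_block: Dict[int, Snapshot] = {}
by_block[0] = {0: "cond_v0", 1: "cond_v1"}                # condition_fn bb0
by_block[0] = {1: "main_v1", 2: "main_v2", 3: "main_v3"}  # main bb0 overwrites

# Keyed by (func_name, block_id): both snapshots survive.
by_func_block: Dict[Tuple[str, int], Snapshot] = {}
by_func_block[("condition_fn", 0)] = {0: "cond_v0", 1: "cond_v1"}
by_func_block[("main", 0)] = {1: "main_v1", 2: "main_v2", 3: "main_v3"}

print(len(by_block))       # 1
print(len(by_func_block))  # 2
```

With the integer key, condition_fn's snapshot is no longer reachable once main's bb0 has been processed; with the tuple key, each function keeps its own entry.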

Solution Design

Change 1: Tuple-Key block_end_values

Old:

block_end_values: Dict[int, Dict[int, ir.Value]] = {}
block_end_values[bid] = snap

New:

block_end_values: Dict[Tuple[str, int], Dict[int, ir.Value]] = {}
block_end_values[(func_name, bid)] = snap

Change 2: Thread function name through call chain

Files to modify:

  1. llvm_builder.py - Type annotation
  2. function_lower.py - Pass func.name to lower_blocks
  3. block_lower.py - Accept func_name parameter, use tuple keys
  4. resolver.py - Update resolve_incoming to accept func_name
  5. phi_wiring/wiring.py - Update wire_incomings to use tuple keys
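
To illustrate what threading the function name through the call chain might look like, here is a reduced sketch. The function names follow the list above, but the signatures, the `store` parameter, and the string placeholders for `ir.Value` are assumptions for illustration and not the actual code in those files:

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[int, str]  # value_id -> value; strings stand in for ir.Value
BlockEndValues = Dict[Tuple[str, int], Snapshot]


def lower_blocks(store: BlockEndValues, func_name: str,
                 blocks: Dict[int, Snapshot]) -> None:
    """Save one end-of-block snapshot per block, keyed by (func_name, bid)."""
    for bid, snap in blocks.items():
        store[(func_name, bid)] = dict(snap)


def resolve_incoming(store: BlockEndValues, func_name: str,
                     pred_block_id: int, value_id: int) -> Optional[str]:
    """Look up value_id in the predecessor's snapshot of the same function."""
    snap = store.get((func_name, pred_block_id), {})
    return snap.get(value_id)


store: BlockEndValues = {}
lower_blocks(store, "condition_fn", {0: {0: "cond_v0", 1: "cond_v1"}})
lower_blocks(store, "main", {0: {1: "main_v1", 2: "main_v2", 3: "main_v3"}})

print(resolve_incoming(store, "condition_fn", 0, 1))  # cond_v1
print(resolve_incoming(store, "main", 0, 1))          # main_v1
print(resolve_incoming(store, "main", 0, 0))          # None
```

Because every lookup carries `func_name`, a PHI in main can only ever see snapshots saved while lowering main, regardless of the order in which functions are processed.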

Change 3: Verifier (STRICT mode)

Add collision detection:

if key in block_end_values and STRICT:
    existing_func = find_function_for_key(key)
    if existing_func != current_func:
        raise RuntimeError(
            f"Block ID collision: bb{bid} exists in both "
            f"{existing_func} and {current_func}"
        )
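
The fragment above leaves `key`, `STRICT`, and `find_function_for_key` undefined. One self-contained way to express the same idea, as a sketch only: keep a side table recording which function first wrote each block ID, and raise in strict mode when a different function writes the same ID. `check_block_owner`, `BlockIdCollisionError`, and the `owners` table are hypothetical names, and this guards a store that is still keyed by bare block ID:

```python
from typing import Dict


class BlockIdCollisionError(RuntimeError):
    """Raised when two functions write a snapshot under the same block ID."""


def check_block_owner(owners: Dict[int, str], current_func: str, bid: int,
                      strict: bool = True) -> None:
    """Record the first writer of bid; in strict mode, reject a second function."""
    existing_func = owners.get(bid)
    if strict and existing_func is not None and existing_func != current_func:
        raise BlockIdCollisionError(
            f"Block ID collision: bb{bid} exists in both "
            f"{existing_func} and {current_func}"
        )
    owners.setdefault(bid, current_func)


owners: Dict[int, str] = {}
check_block_owner(owners, "condition_fn", 0)  # first writer, accepted
check_block_owner(owners, "condition_fn", 0)  # same function again, accepted

try:
    check_block_owner(owners, "main", 0)      # different function, rejected
    message = ""
except BlockIdCollisionError as exc:
    message = str(exc)

print(message)  # Block ID collision: bb0 exists in both condition_fn and main
```

Once the store itself is keyed by (func_name, block_id), this situation cannot arise by construction, so a check like this is mainly useful as a diagnostic for any remaining state that is still keyed by block ID alone.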

Acceptance Criteria

  1. Pattern 1 (Phase 132): Still passes (regression test)
  2. Case C (Pattern 5): Builds and executes correctly
  3. VM/LLVM parity: Both produce same result
  4. STRICT mode: No collisions, no fallback to 0

Implementation Status

  • Root cause identified
  • Solution designed
  • Tuple-key implementation
  • STRICT verifier
  • Acceptance testing
  • Documentation update
  • Task: /home/tomoaki/git/hakorune-selfhost/CURRENT_TASK.md (Phase 132-P0)
  • Inventory: /home/tomoaki/git/hakorune-selfhost/docs/development/current/main/phase131-3-llvm-lowering-inventory.md

Key Insight

"Same block ID in different functions is a FEATURE, not a bug"

MIR reuses block IDs across functions (bb0 is common entry block). The LLVM builder MUST namespace block_end_values by function to avoid collisions.

This is a Box-First principle violation: block_end_values should have been scoped per-function from the start (encapsulation boundary).