Files

nyash-codex 1510dcb7d8 fix(llvm): Phase 131-6 調査 - TAG-RUN 3バグ発見（1修正/1部分/1未修正）

Phase 131-6: Infinite Loop Bug 調査完了

発見したバグ（3件）:
1. Bug #1: Copy-to-PHI 命令（SSA 違反）✅ 修正完了
   - instruction_rewriter.rs: PHI destination への Copy をスキップ
2. Bug #2: Type Inference 混同（String vs Integer）⚠️ 部分修正
   - binop.py: force_string ロジック削除
3. Bug #3: SSA Dominance Violation ❌ 未修正
   - MIR builder が定義前に値を使用（根本問題）

変更ファイル:
- src/mir/builder/control_flow/joinir/merge/instruction_rewriter.rs:
  - Lines 428-443: header PHI への Copy スキップ追加
- src/llvm_py/instructions/binop.py:
  - Lines 128-159: force_string 削除、Phase 131-6 コメント追加
- docs/development/current/main/phase131-3-llvm-lowering-inventory.md:
  - 3バグの詳細追記

テスト結果:
- Case A/B2: ✅ 退行なし
- Case B: ❌ infinite loop 継続（Bug #3 が原因）
- Simple Add: ❌ 0 を出力（期待: 1）

Next: Phase 131-6 続き - MIR SSA dominance 根治

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-14 06:52:50 +09:00

19 KiB

Raw Blame History

Phase 131-3: MIR→LLVM Lowering Inventory

Date: 2025-12-14 Purpose: Identify what is broken in the LLVM (Python llvmlite) lowering pipeline using a few representative cases, and record evidence + next actions.

Test Cases & Results

Case	File	Emit	Link	Run	Notes
A	`apps/tests/phase87_llvm_exe_min.hako`	✅	✅	✅	PASS - Simple return 42, no BoxCall, exit code verified
B	`apps/tests/loop_min_while.hako`	✅	✅	❌	TAG-RUN - EMIT/LINK fixed (Phase 131-5), infinite loop in runtime (PHI update bug)
B2	`/tmp/case_b_simple.hako`	✅	✅	✅	PASS - Simple print(42) without loop works
C	`apps/tests/llvm_stage3_loop_only.hako`	❌	-	-	TAG-EMIT - Complex loop (break/continue) fails JoinIR pattern matching

Root Causes Identified

1. TAG-EMIT: Loop PHI → Invalid LLVM IR (Case B)

File: apps/tests/loop_min_while.hako

Code:

static box Main {
  main() {
    local i = 0
    loop(i < 3) {
      print(i)
      i = i + 1
    }
    return 0
  }
}

MIR Compilation: SUCCESS (Pattern 1 JoinIR lowering works)

[joinir/pattern1] Generated JoinIR for Simple While Pattern
[joinir/pattern1] Functions: main, loop_step, k_exit
📊 MIR Module compiled successfully!
📊 Functions: 4

LLVM Harness Failure:

RuntimeError: LLVM IR parsing error
<string>:35:1: error: expected instruction opcode
bb4:
^

Observed invalid IR snippet:

bb3:
  ret i64 %"ret_phi_17"          ← Terminator FIRST (INVALID!)
  %"ret_phi_17" = phi  i64 [0, %"bb6"]  ← PHI AFTER terminator

What we know:

LLVM IR requires: PHI nodes first, then non-PHI instructions, then terminator last.
The harness lowers blocks (including terminators), then wires PHIs, then runs a safety pass:
- src/llvm_py/builders/function_lower.py calls _lower_blocks(...) → _finalize_phis(builder) → _enforce_terminators(...).
Per-block lowering explicitly lowers terminators after body ops:
- src/llvm_py/builders/block_lower.py splits body_ops and term_ops, then lowers term_ops after body_ops.
PHIs are created/wired during finalize via ensure_phi(...):
- src/llvm_py/phi_wiring/wiring.py (positions PHI “at block head”, and logs when a terminator already exists).

This strongly suggests an emission ordering / insertion-position problem in the harness, not a MIR generation bug. The exact failure mode still needs to be confirmed by tracing where the PHI is inserted relative to the terminator in the failing block.

Where to inspect next (code pointers):

Harness pipeline ordering: src/llvm_py/builders/function_lower.py
Terminator emission: src/llvm_py/builders/block_lower.py
PHI insertion rules + debug: src/llvm_py/phi_wiring/wiring.py (NYASH_PHI_ORDERING_DEBUG=1)
“Empty block” safety pass (separate concern): src/llvm_py/builders/function_lower.py:_enforce_terminators

✅ FIXED (Phase 131-4): Multi-pass block lowering architecture

Solution implemented:

Pass A: Lower non-terminator instructions (body ops only)
Pass B: Finalize PHIs (wire incoming edges) - happens in function_lower.py
Pass C: Lower deferred terminators (after PHIs are placed)

Key changes:

src/llvm_py/builders/block_lower.py:
- Split lower_blocks() to defer terminators
- Added lower_terminators() function for Pass C
- Deferred terminators stored in builder._deferred_terminators
src/llvm_py/builders/function_lower.py:
- Updated pass ordering: Pass A → Pass B → Pass C
- Added call to _lower_terminators() after _finalize_phis()
src/llvm_py/instructions/ret.py:
- Added _disable_phi_synthesis flag check
- Prevents PHI creation during Pass C (terminators should only use existing values)

Result:

Case B EMIT now succeeds ✅
Generated LLVM IR is valid (PHIs before terminators)
No regression in Cases A and B2

2. TAG-LINK: Symbol Name Mismatch (Case B) - ✅ FIXED (Phase 131-5)

File: apps/tests/loop_min_while.hako

Link Error:

/usr/bin/ld: /home/tomoaki/git/hakorune-selfhost/target/aot_objects/loop_min_while.o: in function `condition_fn':
<string>:(.text+0x99): undefined reference to `nyash_console_log'

Root Cause: Python harness was converting dots to underscores in symbol names.

Generated symbol: nyash_console_log (underscores)
NyKernel exports: nyash.console.log (dots)
ELF symbol tables support dots in symbol names - no conversion needed!

Fix Applied (Phase 131-5):

File: src/llvm_py/instructions/externcall.py
Removed dot-to-underscore conversion (lines 54-58)
Now uses symbol names directly as exported by NyKernel
Result: Case B LINK ✅ (no more undefined reference errors)

Verification:

# NyKernel symbols (dots)
$ objdump -t target/release/libnyash_kernel.a | grep console
nyash.console.log
nyash.console.log_handle
print (alias to nyash.console.log)

# LLVM IR now emits (dots - matching!)
declare i64 @nyash.console.log(i8*)

Status: TAG-LINK completely resolved. Case B now passes EMIT ✅ LINK ✅

3. TAG-RUN: Loop Infinite Iteration (Case B) - 🔍 NEW ISSUE

File: apps/tests/loop_min_while.hako

Expected Behavior:

$ ./target/release/hakorune apps/tests/loop_min_while.hako
0
1
2
RC: 0

Actual Behavior (LLVM):

$ /tmp/loop_min_while
0
0
0
... (infinite loop, prints 0 forever)

Diagnosis:

Loop counter i is not being updated correctly
PHI node receives correct values but store/load may be broken
String conversion creates new handles (seen in trace: from_i8_string -> N)
Loop condition (i < 3) always evaluates to true

Hypothesis: PHI value is computed correctly but not written back to memory location, causing i = i + 1 to have no effect.

Next Steps:

Inspect generated LLVM IR for store instructions after PHI
Check if PHI value is being used in subsequent stores
Verify loop increment instruction sequence

Files to investigate:

src/llvm_py/instructions/store.py - Store instruction lowering
src/llvm_py/phi_wiring/wiring.py - PHI value propagation
target/aot_objects/loop_min_while.ll - Generated LLVM IR (if saved)

4. TAG-EMIT: JoinIR Pattern Mismatch (Case C)

File: apps/tests/llvm_stage3_loop_only.hako

Code:

static box Main {
  main() {
    local counter = 0
    loop (true) {
      counter = counter + 1
      if counter == 3 { break }
      continue
    }
    print("Result: " + counter)
    return 0
  }
}

MIR Compilation: FAILURE

❌ MIR compilation error: [joinir/freeze] Loop lowering failed:
   JoinIR does not support this pattern, and LoopBuilder has been removed.
Function: main
Hint: This loop pattern is not supported. All loops must use JoinIR lowering.

Diagnosis:

loop(true) with break/continue doesn't match Pattern 1-4
LoopBuilder fallback was removed (Phase 33 cleanup)
JoinIR Pattern coverage gap: needs Pattern 5 or Pattern variant for infinite loops with early exit

Location: src/mir/builder/control_flow/joinir/router.rs - pattern matching logic

Success Cases

Case A: Minimal (No BoxCall, No Loop)

EMIT: ✅ Object generated successfully
LINK: ✅ Linked with NyKernel runtime
RUN: ✅ Exit code 42 verified
Validation: Full LLVM exe line SSOT confirmed working

Case B2: Simple BoxCall (No Loop)

EMIT: ✅ Object generated successfully
LINK: ✅ Linked with NyKernel runtime
RUN: ✅ print(42) executes (loop-free path)
Validation: BoxCall → ExternCall lowering works correctly

Next Steps

✅ Priority 1: COMPLETED - Fix TAG-EMIT (PHI After Terminator Bug)

Target: Case B (loop_min_while.hako)

Status: ✅ FIXED in Phase 131-4 (see Root Cause #1 above)

Result: Case B EMIT now succeeds. Multi-pass block lowering architecture working.

✅ Priority 2: COMPLETED - Fix TAG-LINK (Symbol Name Mismatch)

Target: Case B (loop_min_while.hako)

Status: ✅ FIXED in Phase 131-5 (see Root Cause #2 above)

Approach Taken:

Investigated NyKernel exported symbols → found dots in names (nyash.console.log)
Identified Python harness converting dots to underscores (WRONG!)
Removed conversion - ELF supports dots natively
Verified with objdump and test execution

Files Modified:

src/llvm_py/instructions/externcall.py - Removed dot-to-underscore conversion

Result: Case B now passes EMIT ✅ LINK ✅ (but RUN fails - see Priority 3)

🔥 Priority 3: Fix TAG-RUN (Loop Infinite Iteration) - IN PROGRESS (Phase 131-6)

Target: Case B (loop_min_while.hako)

Issue: Loop counter not updating, causes infinite loop printing 0

Phase 131-6 Investigation Results:

Bug #1: MIR Copy-to-PHI (FIXED)

Location: src/mir/builder/control_flow/joinir/merge/instruction_rewriter.rs lines 419-440
Problem: Parameter binding was generating Copy { dst: PHI_dst, src: value } in loop latch
Fix: Added check to skip Copy when dst is a header PHI destination
Status: ✅ Copy instruction removed from block 7

Bug #2: Type Inference - String vs Integer (PARTIAL FIX)

Location: src/llvm_py/instructions/binop.py lines 128-153
Problem: MIR marks i + 1 as dst_type: StringBox (forward-looking hint from print(i) usage)
Impact: Python harness was doing string concatenation instead of integer addition
Fix Attempted: Removed force_string logic that trusted dst_type hint
Status: ⚠️ Partially fixed but infinite loop persists

Bug #3: Instruction Ordering Violation (DISCOVERED)

Location: MIR builder (instruction scheduling)
Problem: Copy instructions emitted AFTER values are used (violates SSA dominance)
Example: %6 = %4 + %5 appears before %5 = copy %3
Impact: LLVM requires strict SSA form, Rust VM tolerates it
Status: ❌ Not yet addressed

Additional Findings

Simple test /tmp/test_simple_add.hako (i=0; i=i+1; print(i)) also fails (prints 0 not 1)
Issue exists even without loops, suggesting fundamental binop/type problem
String tagging propagation may be marking integer PHI values as strings

Next Steps:

Trace resolver.is_stringish() to see if integers are being marked as strings
Fix MIR instruction scheduling to respect SSA dominance
Consider runtime type checking instead of compile-time inference

Files Modified:

src/mir/builder/control_flow/joinir/merge/instruction_rewriter.rs (Phase 131-6 fix)
src/llvm_py/instructions/binop.py (Phase 131-6 partial fix)

Priority 4: Fix TAG-EMIT (JoinIR Pattern Coverage)

Target: Case C (llvm_stage3_loop_only.hako)

Approach:

Analyze loop(true) { ... break ... continue } control flow
Design JoinIR Pattern variant (Pattern 1.5 or Pattern 5?)
Implement pattern in src/mir/builder/control_flow/joinir/patterns/
Update router to match this pattern

Files:

src/mir/builder/control_flow/joinir/router.rs - add pattern matching
src/mir/builder/control_flow/joinir/patterns/ - new pattern module

Expected: Infinite loops with break/continue should lower to JoinIR

Priority 5: Comprehensive Loop Coverage Test

After P3+P4 fixed:

Test Matrix:

# Pattern 1: Simple while
apps/tests/loop_min_while.hako

# Pattern 2: Infinite loop + break
apps/tests/llvm_stage3_loop_only.hako

# Pattern 3: Loop with if-phi
apps/tests/loop_if_phi.hako

# Pattern 4: Nested loops
apps/tests/nested_loop_inner_break_isolated.hako

All should pass: EMIT ✅ LINK ✅ RUN ✅

Box Theory Modularization Feedback

LLVM Line SSOT Analysis

✅ Good: Single Entry Point

tools/build_llvm.sh is the SSOT for LLVM exe line
Clear 4-phase pipeline: Build → Emit → Link → Run
Env vars control compiler mode (NYASH_LLVM_COMPILER=harness|crate)

❌ Bad: Harness Duplication Risk

Python harness: src/llvm_py/llvm_builder.py (~2000 lines)
Rust crate: crates/nyash-llvm-compiler/ (separate implementation)
Both translate MIR14→LLVM, risk of divergence

🔧 Recommendation: Harness as Box

Box: LLVMCompilerBox
  - Method: compile_to_object(mir_json: str, output: str)
  - Default impl: Python harness (llvmlite)
  - Alternative impl: Rust crate (inkwell - deprecated)
  - Interface: MIR JSON v1 schema (fixed contract)

Benefits:

Single interface definition
Easy A/B testing (Python vs Rust)
Plugin architecture: external LLVM backends

Duplication Found: BB Emission Logic

Location 1: src/llvm_py/llvm_builder.py:400-600 Location 2: (likely) crates/nyash-llvm-compiler/src/codegen/ (if crate path is used)

Problem: Empty BB handling differs between harness and crate path

Solution: Box-first extraction

// Extract to: src/mir/llvm_ir_validator.rs
pub fn validate_basic_blocks(blocks: &[BasicBlock]) -> Result<(), String> {
    for bb in blocks {
        if bb.instructions.is_empty() && bb.terminator.is_none() {
            return Err(format!("Empty BB detected: {:?}", bb.id));
        }
    }
    Ok(())
}

Call this validator before harness invocation (in Rust MIR emission path).

Legacy Deletion Candidates

1. LoopBuilder Remnants (Phase 33 cleanup incomplete?)

Search: grep -r "LoopBuilder" src/mir/builder/control_flow/ Action: Verify no dead imports/comments remain

2. Unreachable BB Emission Code

Location: src/llvm_py/llvm_builder.py Check: Does harness skip "reachable": false blocks from MIR JSON? Action: If not, add filter before BB emission

Code snippet to check:

# src/llvm_py/llvm_builder.py (approx line 450)
for block in function["blocks"]:
    if block.get("reachable") == False:  # ← Add this check?
        continue
    self.emit_basic_block(block)

Validation: build_llvm.sh SSOT Conformance

✅ Confirmed SSOT Behaviors

Feature selection: NYASH_LLVM_FEATURE=llvm (default harness) vs llvm-inkwell-legacy
Compiler mode: NYASH_LLVM_COMPILER=harness (default) vs crate (ny-llvmc)
Object caching: NYASH_LLVM_SKIP_EMIT=1 for pre-generated .o files
Runtime selection: NYASH_LLVM_NYRT=crates/nyash_kernel/target/release

❌ Missing SSOT: Error Logs

Python harness errors go to stderr (lost after build_llvm.sh exits)
No env var for NYASH_LLVM_HARNESS_LOG=/tmp/llvm_harness.log

Recommendation:

# In build_llvm.sh, line ~118:
HARNESS_LOG="${NYASH_LLVM_HARNESS_LOG:-/tmp/nyash_llvm_harness_$$.log}"
NYASH_LLVM_OBJ_OUT="$OBJ" NYASH_LLVM_USE_HARNESS=1 \
  "$BIN" --backend llvm "$INPUT" 2>&1 | tee "$HARNESS_LOG"

Timeline Estimate

P1 (Loop PHI → LLVM IR fix): 1-2 hours (harness BB emission logic)
P2 (JoinIR pattern coverage): 3-4 hours (pattern design + implementation)
P3 (Comprehensive test): 1 hour (run matrix + verify)

Total: 5-7 hours to full LLVM loop support

Executive Summary

Phase 131-5 Results (TAG-LINK Fix Complete!)

✅ Case A (Minimal): PASS - Simple return works perfectly

EMIT ✅ LINK ✅ RUN ✅
Validates: Build pipeline, NyKernel runtime, basic MIR→LLVM lowering

⚠️ Case B (Loop+PHI): EMIT ✅ LINK ✅ RUN ❌

Phase 131-4: Fixed TAG-EMIT (PHI after terminator) ✅
Phase 131-5: Fixed TAG-LINK (symbol name mismatch) ✅
NEW ISSUE: TAG-RUN (infinite loop - counter not updating) ❌
Progress: 2/3 milestones achieved, runtime bug discovered

✅ Case B2 (BoxCall): PASS - print() without loops works

EMIT ✅ LINK ✅ RUN ✅
Validates: BoxCall→ExternCall lowering, runtime ABI

❌ Case C (Break/Continue): TAG-EMIT failure - JoinIR pattern gap

Root Cause: loop(true) { break } pattern not recognized by JoinIR router
Status: Unchanged from Phase 131-3

Phase 131-5 Achievements

✅ Fixed TAG-LINK (Symbol Name Mismatch):

Investigation: Used objdump to discover NyKernel exports symbols with dots
Root Cause: Python harness was converting nyash.console.log → nyash_console_log
Fix: Removed dot-to-underscore conversion in externcall.py
Verification: Case B now links successfully against NyKernel
No Regression: Cases A and B2 still pass

Files Modified:

src/llvm_py/instructions/externcall.py (4 lines removed)

Impact: All ExternCall symbols now match NyKernel exports exactly.

Critical Path Update

✅ Fix PHI ordering (P1 - Phase 131-4) - DONE
✅ Fix symbol mapping (P2 - Phase 131-5) - DONE
🔥 Fix loop runtime bug (P3 - NEW) - IN PROGRESS
⏳ Add JoinIR Pattern 5 (P4) - PENDING
⏳ Comprehensive test (P5) - PENDING

Total Effort So Far: ~3 hours (Investigation + 2 fixes) Remaining: ~4-6 hours (Runtime bug + Pattern 5 + Testing)

Box Theory Modularization Insights

✅ Good: LLVM Line SSOT

tools/build_llvm.sh is well-structured (4-phase pipeline)
Clear separation: Emit → Link → Run
Environment variables control behavior cleanly

⚠️ Risk: Harness Duplication

Python harness (src/llvm_py/) vs Rust crate (crates/nyash-llvm-compiler/)
Both implement MIR14→LLVM, risk of divergence
Recommendation: Box-ify with interface contract (MIR JSON v1 schema)

🔧 Technical Debt Found

PHI emission ordering - Architectural issue, not a quick fix
Unreachable block handling - MIR JSON marks all blocks reachable: false (may be stale metadata)
Error logging - Python harness errors lost after build_llvm.sh exits

Appendix: Test Commands

Case A (Minimal - PASS)

tools/build_llvm.sh apps/tests/phase87_llvm_exe_min.hako -o tmp/case_a
tmp/case_a
echo $?  # Expected: 42

Case B (Loop PHI - FAIL at EMIT)

tools/build_llvm.sh apps/tests/loop_min_while.hako -o tmp/case_b
# Error: empty bb4 in LLVM IR

Case B2 (Simple BoxCall - PASS)

cat > /tmp/case_b_simple.hako << 'EOF'
static box Main {
    main() {
        print(42)
        return 0
    }
}
EOF
tools/build_llvm.sh /tmp/case_b_simple.hako -o tmp/case_b2
tmp/case_b2
# Output: (empty, but executes without crash)

Case C (Complex Loop - FAIL at MIR)

tools/build_llvm.sh apps/tests/llvm_stage3_loop_only.hako -o tmp/case_c
# Error: JoinIR pattern not supported

MIR JSON Inspection (Case B Debug)

# Generate MIR JSON
./target/release/hakorune --emit-mir-json /tmp/case_b.json --backend mir apps/tests/loop_min_while.hako

# Check for unreachable blocks
jq '.cfg.functions[] | select(.name=="main") | .blocks[] | select(.reachable==false)' /tmp/case_b.json

# Inspect bb4 (the problematic block)
jq '.cfg.functions[] | select(.name=="main") | .blocks[] | select(.id==4)' /tmp/case_b.json

Success Criteria

Phase 131-5 Complete when:

✅ Case A continues to pass (regression prevention) - VERIFIED
⚠️ Case B (loop_min_while.hako) compiles to valid LLVM IR and links - PARTIAL (EMIT ✅ LINK ✅ RUN ❌)
✅ Case B2 continues to pass (BoxCall regression prevention) - VERIFIED
❌ Case C (llvm_stage3_loop_only.hako) lowers to JoinIR and runs - NOT YET
⚠️ All 4 cases produce correct output - PARTIAL (2/4 passing)
⚠️ No plugin errors (or plugin errors are benign/documented) - ACCEPTABLE (plugin errors don't affect AOT execution)

Definition of Done:

All test cases: EMIT ✅ LINK ✅ RUN ✅
Exit codes match expected values
Output matches expected output (where applicable)

19 KiB Raw Blame History

Phase 131-3: MIR→LLVM Lowering Inventory

Test Cases & Results

Root Causes Identified

1. TAG-EMIT: Loop PHI → Invalid LLVM IR (Case B)

2. TAG-LINK: Symbol Name Mismatch (Case B) - ✅ FIXED (Phase 131-5)

3. TAG-RUN: Loop Infinite Iteration (Case B) - 🔍 NEW ISSUE

4. TAG-EMIT: JoinIR Pattern Mismatch (Case C)

Success Cases

Case A: Minimal (No BoxCall, No Loop)

Case B2: Simple BoxCall (No Loop)

Next Steps

✅ Priority 1: COMPLETED - Fix TAG-EMIT (PHI After Terminator Bug)

✅ Priority 2: COMPLETED - Fix TAG-LINK (Symbol Name Mismatch)

🔥 Priority 3: Fix TAG-RUN (Loop Infinite Iteration) - IN PROGRESS (Phase 131-6)

Bug #1: MIR Copy-to-PHI (FIXED)

Bug #2: Type Inference - String vs Integer (PARTIAL FIX)

Bug #3: Instruction Ordering Violation (DISCOVERED)

Additional Findings

Priority 4: Fix TAG-EMIT (JoinIR Pattern Coverage)

Priority 5: Comprehensive Loop Coverage Test

Box Theory Modularization Feedback

LLVM Line SSOT Analysis

✅ Good: Single Entry Point

❌ Bad: Harness Duplication Risk

🔧 Recommendation: Harness as Box

Duplication Found: BB Emission Logic

Legacy Deletion Candidates

1. LoopBuilder Remnants (Phase 33 cleanup incomplete?)

2. Unreachable BB Emission Code

Validation: build_llvm.sh SSOT Conformance

✅ Confirmed SSOT Behaviors

❌ Missing SSOT: Error Logs

Timeline Estimate

Executive Summary

Phase 131-5 Results (TAG-LINK Fix Complete!)

Phase 131-5 Achievements

Critical Path Update

Box Theory Modularization Insights

✅ Good: LLVM Line SSOT

⚠️ Risk: Harness Duplication

🔧 Technical Debt Found

Appendix: Test Commands

Case A (Minimal - PASS)

Case B (Loop PHI - FAIL at EMIT)

Case B2 (Simple BoxCall - PASS)

Case C (Complex Loop - FAIL at MIR)

MIR JSON Inspection (Case B Debug)

Success Criteria

19 KiB

Raw Blame History