Files
hakorune/docs/development/current/main/phase87-selfhost-llvm-exe-line.md
nyash-codex 5709026812 docs(phase131): Phase 131-3 完了 - LLVM lowering 棚卸し(3ケース)
Phase 131-3 完了: MIR→LLVM lowering 棚卸し

テスト結果マトリックス:
- Case A (phase87_llvm_exe_min.hako):  PASS (baseline)
- Case B (loop_min_while.hako):  TAG-EMIT (PHI after terminator)
- Case B2 (print(42) simple):  PASS (BoxCall works)
- Case C (llvm_stage3_loop_only.hako):  TAG-EMIT (JoinIR pattern gap)

Critical Bugs:
1. Bug #1: PHI After Terminator (Case B)
   - 原因: function_lower.py が terminator を PHI より先に emit
   - 修正: 4-pass block emission (2-3h)

2. Bug #2: JoinIR Pattern Gap (Case C)
   - 原因: loop(true) { break } パターンが JoinIR 未対応
   - 修正: Pattern 5 設計・実装 (3-4h)

Next Actions:
- P1 (推奨): PHI ordering 修正 → 80% のループを有効化
- P2: JoinIR Pattern 5 → infinite loop 対応

ドキュメント:
- phase131-3-llvm-lowering-inventory.md: 詳細棚卸し結果
- phase87-selfhost-llvm-exe-line.md: LLVM IR parsing error 追記
- CURRENT_TASK.md: phase131-3 参照追加

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 05:55:21 +09:00

754 lines
19 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase 87: LLVM Exe Line SSOT (2025-12-13)
## Goal
Establish single source of truth for `.hako → .o → executable → execution` pipeline.
**SSOT Tool**: `tools/build_llvm.sh`
## Prerequisites
1. **llvm-config-18** available:
```bash
llvm-config-18 --version
# Expected: 18.x.x
```
2. **hakorune built with LLVM features**:
```bash
cargo build --release --features llvm
./target/release/hakorune --version
# Check --backend llvm available in --help
```
3. **Python llvmlite** (for LLVM harness):
```bash
python3 -c "import llvmlite; print(llvmlite.__version__)"
# Expected: 0.40.0 or newer
```
## Compiler Modes
`tools/build_llvm.sh` supports two compiler modes for LLVM object generation:
### Harness (Default) - Production Ready
**Python llvmlite-based LLVM IR generation**
- **Stability**: ✅ Proven stable, battle-tested
- **Build Time**: Fast (~1-3s for minimal programs)
- **Dependencies**: Python 3, llvmlite, LLVM 18
- **Use Case**: Default for all production builds
**Enable** (default behavior):
```bash
# Explicit mode selection (optional):
NYASH_LLVM_COMPILER=harness tools/build_llvm.sh program.hako -o output
# Default (no env var needed):
tools/build_llvm.sh program.hako -o output
```
**How it works**:
1. `hakorune --backend llvm` invokes Python harness
2. `src/llvm_py/llvm_builder.py` generates LLVM IR via llvmlite
3. `llc-18` compiles IR to object file
### Crate (Experimental) - Rust-native Compiler
**Pure Rust LLVM IR generation via crates/nyash-llvm-compiler**
- **Stability**: ⚠️ Experimental, under active development
- **Build Time**: Slower (~5-10s, requires crate compilation)
- **Dependencies**: LLVM 18 dev libraries, Rust toolchain
- **Use Case**: Advanced users, development/testing
**Enable**:
```bash
NYASH_LLVM_COMPILER=crate tools/build_llvm.sh program.hako -o output
```
**How it works**:
1. `hakorune --emit-mir-json` generates MIR JSON
2. `ny-llvmc` (Rust crate) reads JSON and emits LLVM IR
3. `llc-18` compiles IR to object file
**Advanced: Direct exe emission** (experimental):
```bash
NYASH_LLVM_COMPILER=crate NYASH_LLVM_EMIT=exe \
tools/build_llvm.sh program.hako -o output
# Skips separate linking step, emits executable directly
```
### Mode Comparison Table
| Feature | Harness (Default) | Crate (Experimental) |
|---------|------------------|----------------------|
| **Stability** | ✅ Production ready | ⚠️ Experimental |
| **Build Time** | Fast (1-3s) | Moderate (5-10s) |
| **Dependencies** | Python + llvmlite | LLVM dev + Rust |
| **MIR JSON** | Internal | Explicit generation |
| **Direct exe** | ❌ Not supported | ✅ Experimental |
| **Recommended For** | All users | Advanced/dev only |
**Default recommendation**: Use harness mode (no env vars needed).
## Environment Variables Reference
環境変数の一覧は [`docs/reference/environment-variables.md`](../../reference/environment-variables.md#llvm-build-pipeline) の「LLVM Build Pipeline」セクションを参照してください。
主要な環境変数(クイックリファレンス):
- `NYASH_LLVM_COMPILER`: コンパイラモード (`harness` または `crate`)
- `NYASH_CLI_VERBOSE=1`: 詳細ビルド出力を有効化
- `NYASH_LLVM_ONLY_OBJ=1`: オブジェクト生成後に停止
- その他14変数の詳細は上記SSOTドキュメントを参照
## Standard Procedure
**Build and execute** a .hako program to native executable:
```bash
# Step 1: Build .hako → executable
tools/build_llvm.sh apps/tests/your_program.hako -o tmp/your_program
# Step 2: Execute
./tmp/your_program
echo $? # Check exit code
```
**What it does**:
1. Compiles `.hako` → MIR (hakorune compiler)
2. MIR → LLVM IR (llvmlite harness, `src/llvm_py/`)
3. LLVM IR → object file `.o` (llvm tools)
4. Links object → executable (clang)
## Example: Minimal Program
**File**: `apps/tests/phase87_llvm_exe_min.hako`
```nyash
static box Main {
main() {
return 42
}
}
```
**Build**:
```bash
tools/build_llvm.sh apps/tests/phase87_llvm_exe_min.hako -o tmp/phase87_test
```
**Execute**:
```bash
./tmp/phase87_test
echo $? # Output: 42
```
## Detailed Pipeline Explanation
### Step 1: Hako → MIR JSON
**Command** (internal to build_llvm.sh):
```bash
./target/release/hakorune --emit-mir-json tmp/program.json program.hako
```
**Output**: MIR JSON representation of the program
### Step 2: MIR JSON → LLVM IR
**Command** (internal to build_llvm.sh):
```bash
python3 src/llvm_py/llvm_builder.py tmp/program.json -o tmp/program.ll
```
**Output**: LLVM IR text file (.ll)
### Step 3: LLVM IR → Object File
**Command** (internal to build_llvm.sh):
```bash
llc-18 tmp/program.ll -o tmp/program.o -filetype=obj
```
**Output**: Object file (.o)
### Step 4: Object File → Executable
**Command** (internal to build_llvm.sh):
```bash
clang-18 tmp/program.o -o tmp/program
```
**Output**: Native executable
## Success/Failure Criteria
### Success Indicators
A successful build via `tools/build_llvm.sh` exhibits:
**1. Exit Code**: `0`
```bash
tools/build_llvm.sh program.hako -o output
echo $? # Should output: 0
```
**2. All 4 Steps Complete**:
```
[1/4] Building hakorune (feature selectable) ...
[2/4] Emitting object (.o) via LLVM backend ...
[3/4] Building Nyash Kernel static runtime ...
[4/4] Linking output ...
✅ Done: output
```
**3. Executable Generated**:
```bash
ls -lh output
# Should exist and be executable
file output
# Output: ELF 64-bit LSB executable, x86-64, dynamically linked
```
**4. Executable Runs**:
```bash
./output
echo $?
# Should match expected exit code (e.g., 42 for phase87_llvm_exe_min.hako)
```
### Failure Modes
`build_llvm.sh` uses distinct exit codes for different failure types:
| Exit Code | Meaning | Common Cause |
|-----------|---------|--------------|
| **0** | ✅ Success | Build completed normally |
| **1** | Usage error | Missing input file or invalid arguments |
| **2** | Missing dependency | `llvm-config-18` not found |
| **3** | Compilation failure | Object file not generated (MIR/LLVM IR error) |
| **Other** | System/linking error | Linking failure, missing libraries |
**Exit Code 1** - Usage Error:
```bash
tools/build_llvm.sh
# Output: Usage: tools/build_llvm.sh <input.hako> [-o <output>]
# Exit: 1
```
**Exit Code 2** - Missing LLVM:
```bash
# When llvm-config-18 not installed
tools/build_llvm.sh program.hako -o output
# Output: error: llvm-config-18 not found (install LLVM 18 dev).
# Exit: 2
```
**Exit Code 3** - Object Generation Failed:
```bash
# When MIR/LLVM IR compilation fails
tools/build_llvm.sh bad_program.hako -o output
# Output: error: object not generated: target/aot_objects/bad_program.o
# Exit: 3
```
### Validation Commands
**Verify object file validity**:
```bash
# Check object file exists and has correct format
file target/aot_objects/program.o
# Expected: ELF 64-bit relocatable, x86-64
# Check object file symbols
nm target/aot_objects/program.o | grep -E '(main|nyash_)'
# Should show exported symbols
```
**Verify LLVM IR validity** (when using crate mode with JSON):
```bash
# Step 1: Generate LLVM IR (manual)
NYASH_LLVM_COMPILER=crate NYASH_LLVM_MIR_JSON=tmp/test.json \
tools/build_llvm.sh program.hako -o output
# Step 2: Validate LLVM IR
llvm-as-18 tmp/test.ll -o /dev/null
# Should complete without errors
# Step 3: Disassemble and inspect
llvm-dis-18 target/aot_objects/program.o -o - | less
# Should show valid LLVM IR
```
**Verify MIR JSON validity** (crate mode):
```bash
# Ensure MIR JSON is well-formed
jq . tmp/test.json > /dev/null
echo $? # Should output: 0
# Optional: Schema validation
NYASH_LLVM_VALIDATE_JSON=1 NYASH_LLVM_COMPILER=crate \
tools/build_llvm.sh program.hako -o output
```
### Build Time Expectations
Typical build times for `phase87_llvm_exe_min.hako` (minimal program):
| Step | Expected Time | Notes |
|------|---------------|-------|
| **[1/4] Build hakorune** | ~0.5-2s | Incremental build (release) |
| **[2/4] Emit object** | ~1-2s | Harness mode (llvmlite) |
| | ~5-10s | Crate mode (ny-llvmc) |
| **[3/4] Build Nyash Kernel** | ~1-3s | Incremental build (release) |
| **[4/4] Linking** | ~0.2-0.5s | Native linker (cc/clang) |
| **Total** | ~3-8s | Harness mode |
| | ~7-15s | Crate mode |
**First build**: Add ~30-60s for initial `cargo build --release` compilation.
**Performance factors**:
- **Parallel builds**: `-j 24` used by default (see `build_llvm.sh`)
- **Incremental builds**: `CARGO_INCREMENTAL=1` enabled
- **Cache hits**: Subsequent builds much faster (~1-3s total)
**Troubleshooting slow builds**:
```bash
# Check cargo cache status
cargo clean --release -p nyash-rust
cargo clean --release -p nyash-llvm-compiler
# Rebuild with timing information
time tools/build_llvm.sh program.hako -o output
# Verbose output for bottleneck analysis
NYASH_CLI_VERBOSE=1 time tools/build_llvm.sh program.hako -o output
```
## Troubleshooting
### Issue: llvm-config-18 not found
**Symptom**: `build_llvm.sh` fails with "llvm-config-18: command not found"
**Solution**:
```bash
# Ubuntu/Debian:
sudo apt-get install llvm-18-dev llvm-18-tools
# macOS (Homebrew):
brew install llvm@18
export PATH="/opt/homebrew/opt/llvm@18/bin:$PATH"
# WSL (Ubuntu):
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18
```
### Issue: Python llvmlite not found
**Symptom**: `ModuleNotFoundError: No module named 'llvmlite'`
**Solution**:
```bash
pip3 install llvmlite
# If system-wide install fails, use virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install llvmlite
```
### Issue: Linking fails
**Symptom**: `ld: symbol(s) not found for architecture`
**Check**:
- Ensure clang-18 is installed
- Verify LLVM 18 libraries available:
```bash
llvm-config-18 --libdir
ls $(llvm-config-18 --libdir)
```
### Issue: MIR compilation error
**Symptom**: hakorune fails to compile .hako to MIR
### Issue: LLVM IR parsing errorexpected instruction opcode / PHI placement
**Symptom**: llvmlite が生成した LLVM IR の parse に失敗する(例: `expected instruction opcode`)。
**Next**:
- まず棚卸しと代表ケース表を確認: `docs/development/current/main/phase131-3-llvm-lowering-inventory.md`
- 典型例: ループ + PHI が絡むケースで “PHI が terminator の後に出る” など、LLVM IR の不変条件違反が起きる
**Debug**:
```bash
# Test MIR generation manually:
./target/release/hakorune --emit-mir-json test.json test.hako
# Check error messages:
cat test.json # Should be valid JSON
```
### Issue: LLVM IR generation error
**Symptom**: llvm_builder.py fails
**Debug**:
```bash
# Run Python builder manually:
python3 src/llvm_py/llvm_builder.py test.json -o test.ll
# Check LLVM IR validity:
llvm-as-18 test.ll -o /dev/null
# Should complete without errors
```
### Debugging Build Pipeline
When `build_llvm.sh` fails, use these techniques to isolate the problem:
#### Enable Verbose Mode
**Global verbose output**:
```bash
NYASH_CLI_VERBOSE=1 tools/build_llvm.sh program.hako -o output
# Shows detailed command execution via set -x
```
**Step-by-step verbosity**:
```bash
# Verbose hakorune compilation
NYASH_CLI_VERBOSE=1 ./target/release/hakorune --emit-mir-json tmp/debug.json program.hako
# Verbose Python LLVM builder
python3 -v src/llvm_py/llvm_builder.py tmp/debug.json -o tmp/debug.ll
# Verbose LLVM compilation
llc-18 -debug tmp/debug.ll -o tmp/debug.o -filetype=obj
# Verbose linking
cc -v tmp/debug.o -L crates/nyash_kernel/target/release -lnyash_kernel -o output
```
#### Manual Step Tracing
**Isolate each step** to find exact failure point:
**Step 1: Test MIR emission**:
```bash
./target/release/hakorune --emit-mir-json tmp/test.json program.hako
echo $? # Should be 0
jq . tmp/test.json # Validate JSON
```
**Step 2: Test LLVM IR generation**:
```bash
# Harness mode (default)
NYASH_LLVM_USE_HARNESS=1 ./target/release/hakorune --backend llvm program.hako
# Check exit code
# Crate mode
cargo build --release -p nyash-llvm-compiler
./target/release/ny-llvmc --in tmp/test.json --out tmp/test.o
file tmp/test.o # Should be ELF object
```
**Step 3: Test object compilation**:
```bash
# If .ll file available (crate mode intermediate)
llc-18 -filetype=obj tmp/test.ll -o tmp/test.o
file tmp/test.o # Verify ELF format
nm tmp/test.o # Check symbols
```
**Step 4: Test linking**:
```bash
# Ensure Nyash Kernel built
cd crates/nyash_kernel && cargo build --release
cd ../..
# Manual link
cc tmp/test.o \
-L crates/nyash_kernel/target/release \
-Wl,--whole-archive -lnyash_kernel -Wl,--no-whole-archive \
-lpthread -ldl -lm -o tmp/manual_output
# Test execution
./tmp/manual_output
echo $?
```
#### Save Intermediate Files
**Preserve all build artifacts** for inspection:
```bash
# Create debug directory
mkdir -p debug_build
# Step 1: Emit MIR JSON
./target/release/hakorune --emit-mir-json debug_build/program.json program.hako
# Step 2: Generate LLVM IR (harness mode, manual Python call)
python3 src/llvm_py/llvm_builder.py debug_build/program.json -o debug_build/program.ll
# Step 3: Compile to object
llc-18 debug_build/program.ll -o debug_build/program.o -filetype=obj
# Step 4: Link
cc debug_build/program.o \
-L crates/nyash_kernel/target/release \
-Wl,--whole-archive -lnyash_kernel -Wl,--no-whole-archive \
-lpthread -ldl -lm -o debug_build/program
# Inspect all intermediate files
ls -lh debug_build/
file debug_build/*
```
**Inspect saved artifacts**:
```bash
# View MIR JSON structure
jq '.functions[] | {name: .name, blocks: .blocks | length}' debug_build/program.json
# View LLVM IR
less debug_build/program.ll
# Disassemble object file
objdump -d debug_build/program.o | less
# Check symbols
nm debug_build/program.o
nm debug_build/program
```
#### Common LLVM IR Issues
**Problem 1: Invalid function signature**
```
error: expected type '...' but found '...'
```
**Diagnosis**: MIR → LLVM IR type mismatch
**Fix**: Check MIR JSON `functions[].signature`, ensure correct types
**Problem 2: Undefined symbol**
```
error: undefined reference to 'nyash_...'
```
**Diagnosis**: Missing Nyash Kernel runtime symbols
**Fix**:
```bash
# Rebuild Nyash Kernel
cd crates/nyash_kernel && cargo clean && cargo build --release
# Verify symbols available
nm crates/nyash_kernel/target/release/libnyash_kernel.a | grep nyash_
```
**Problem 3: Invalid IR instruction**
```
error: invalid IR instruction '...'
```
**Diagnosis**: Python llvm_builder.py bug or unsupported MIR instruction
**Fix**:
```bash
# Check LLVM IR syntax
llvm-as-18 -o /dev/null debug_build/program.ll
# Error message shows exact line number
# Inspect problematic instruction
sed -n '<line>p' debug_build/program.ll
```
**Problem 4: Linking failure**
```
ld: symbol(s) not found for architecture x86_64
```
**Diagnosis**: Missing system libraries or incorrect link order
**Fix**:
```bash
# Check what symbols are needed
nm -u debug_build/program.o
# Verify Nyash Kernel provides them
nm crates/nyash_kernel/target/release/libnyash_kernel.a | grep <symbol>
# If system library missing, add to NYASH_LLVM_LIBS
NYASH_LLVM_LIBS="-lmissing_lib" tools/build_llvm.sh program.hako -o output
```
#### Environment Variables for Debugging
Combine multiple debugging flags:
```bash
# Maximum verbosity + preserve artifacts
NYASH_CLI_VERBOSE=1 \
NYASH_LLVM_COMPILER=crate \
NYASH_LLVM_MIR_JSON=/tmp/debug.json \
NYASH_LLVM_VALIDATE_JSON=1 \
tools/build_llvm.sh program.hako -o /tmp/debug_output
# Then inspect intermediate files
ls -lh /tmp/debug*
jq . /tmp/debug.json
cat /tmp/debug.ll
```
**Recommended debugging workflow**:
1. Enable `NYASH_CLI_VERBOSE=1` for initial diagnosis
2. Use manual step tracing to isolate failure
3. Save intermediate files for inspection
4. Check LLVM IR validity with `llvm-as-18`
5. Verify object symbols with `nm`
6. Test linking manually with verbose `cc -v`
## What NOT to Do
❌ **DO NOT** create custom link procedures:
- Scattered linking logic across multiple scripts
- Manual `clang` invocations outside `build_llvm.sh`
- Duplicate .o → exe pipelines
❌ **DO NOT** bypass build_llvm.sh:
- Direct llvm_builder.py invocations for production
- Custom shell scripts for one-off builds
- Hardcoded paths in makefiles
✅ **DO** use `tools/build_llvm.sh` for all LLVM exe generation
## Integration Test
**Location**: `tools/smokes/v2/profiles/integration/apps/phase87_llvm_exe_min.sh`
**What it tests**:
- Full pipeline: .hako → exe → execution
- Exit code verification (42)
- SKIP if LLVM unavailable (graceful degradation)
**Run manually**:
```bash
tools/smokes/v2/run.sh --profile integration --filter 'phase87_llvm_exe_min\.sh'
```
**Expected outcomes**:
- **PASS**: If llvm-config-18 available → exit code 42 verified
- **SKIP**: If llvm-config-18 not found → graceful skip message
## Why Exit Code 42?
- **Stdout-independent**: Works even if stdout is redirected/buffered
- **Deterministic**: No parsing required, simple integer comparison
- **Traditional**: Unix convention for testable exit codes
- **Minimal**: No dependencies on print/console boxes
## Advanced Usage
### Custom output location
```bash
# Default: output to tmp/
tools/build_llvm.sh program.hako -o custom/path/binary
# Ensure directory exists first:
mkdir -p custom/path
```
### Debugging build steps
**Set verbose mode** (if supported by build_llvm.sh):
```bash
VERBOSE=1 tools/build_llvm.sh program.hako -o output
```
**Check intermediate files**:
```bash
# MIR JSON:
ls -lh tmp/*.json
# LLVM IR:
ls -lh tmp/*.ll
# Object file:
ls -lh tmp/*.o
```
### Comparing with VM backend
**VM execution** (interpreted):
```bash
./target/release/hakorune --backend vm program.hako
echo $?
```
**LLVM execution** (native):
```bash
tools/build_llvm.sh program.hako -o tmp/program
./tmp/program
echo $?
```
**Should produce identical exit codes** for correct programs.
## Performance Characteristics
**Build time**: ~1-3 seconds for minimal programs
- .hako → MIR: ~100ms
- MIR → LLVM IR: ~500ms
- LLVM IR → .o: ~1s
- Linking: ~200ms
**Execution time**: Native speed (no VM overhead)
- Typical speedup: 10-100x vs VM backend
- No JIT warmup required
- Full LLVM optimizations applied
## Related Documentation
- LLVM Python harness: `src/llvm_py/README.md`
- MIR spec: `docs/reference/mir/INSTRUCTION_SET.md`
- Integration smokes: `tools/smokes/v2/README.md`
- build_llvm.sh implementation: `tools/build_llvm.sh` (read source for details)
## SSOT Principle
**Single Source of Truth**:
- ONE script: `tools/build_llvm.sh`
- ONE pipeline: .hako → MIR → LLVM IR → .o → exe
- ONE integration test: `phase87_llvm_exe_min.sh`
**Benefits**:
- Maintainability: Update one script, not scattered logic
- Consistency: All LLVM builds use same pipeline
- Testability: Single smoke test covers full pipeline
- Documentation: One canonical reference
**Anti-patterns to avoid**:
- Multiple competing build scripts
- Copy-pasted linking commands
- Ad-hoc shell scripts for "quick builds"
## Status
- ✅ SSOT established: `tools/build_llvm.sh`
- ✅ Integration smoke added: `phase87_llvm_exe_min.sh`
- ✅ Documentation complete
- ✅ Prerequisites verified: llvm-config-18, llvmlite, LLVM features
- 🎯 Production ready: Use for all LLVM native compilations
## Future Enhancements (Out of Scope for Phase 87)
- **Optimization levels**: -O0, -O1, -O2, -O3 flags
- **Debug symbols**: -g flag support
- **Static linking**: --static flag
- **Cross-compilation**: --target flag
- **LTO**: Link-time optimization support
**Current scope**: Baseline SSOT pipeline establishment only.