refactor(joinir): Phase 287 P0.1 - Move verification to debug_assertions

- Move verify_no_phi_dst_overwrite() to debug_assertions.rs
- Move verify_phi_inputs_defined() to debug_assertions.rs
- Move verify_joinir_contracts() to debug_assertions.rs
- Remove duplicate get_instruction_dst() from mod.rs
- mod.rs: 1,555 → ~1,380 lines (-176 lines)
- Semantic invariance: 154/154 smoke tests PASS, Pattern6 RC:9

Phase 287 P0: Big Files Refactoring (semantics-preserving)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-27 10:10:59 +09:00
parent a04b48416e
commit 433e1d45c0
5 changed files with 416 additions and 193 deletions


@ -8,6 +8,7 @@
- `./tools/smokes/v2/run.sh --profile quick` stays at 154/154 PASS
- Entry point: `docs/development/current/main/phases/phase-188.3/README.md`
- Next instructions (interleaved refactor): `docs/development/current/main/phases/phase-188.3/P2-REFACTORING-INSTRUCTIONS.md`
- Next instructions (splitting big files): `docs/development/current/main/phases/phase-287/P0-BIGFILES-REFACTORING-INSTRUCTIONS.md`
**2025-12-27: Phase 188.2 complete**
- Adopted StepTree's `max_loop_depth` as the SSOT (Option A)


@ -0,0 +1,137 @@
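# Phase 287 P0: Big Files Refactoring Instructions (semantics-preserving / reducing heuristics)
**Date**: 2025-12-27
**Scope**: Split the "big files" on the Rust side to reduce reliance on guessing (heuristics)
**Non-goals**: New features, changes to default behavior, adding silent fallbacks, adding env vars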
---
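## Goal (SSOT)
- Reduce the places that are "decided by guessing", and document boundaries, contracts, and entry points explicitly as the SSOT.
- Split with **no semantic change**, in preparation for future extensions (Pattern6 generalization / stronger merge).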
---
## Prerequisites (current state)
- The Pattern6 incidents (undef / infinite loop) are already pinned down in the SSOT:
  - latch is recorded only for `TailCallKind::BackEdge`
  - entry-like means "only the entry block of JoinIR main"
  - a double latch fails fast via `debug_assert!`
- Entry point: `docs/development/current/main/phases/phase-188.3/P2-REFACTORING-INSTRUCTIONS.md`
---
## Inventory of big files (reproduction command)
```bash
find src -name '*.rs' -print0 | xargs -0 wc -l | sort -nr | head -50
```
Observed in this session (files over 500 lines):
- 16 files
---
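## Priorities (worth doing right now)
### 1) `merge/mod.rs` (1,555 lines): top priority
Current state: the merge coordinator, value remap, contract verification, and header PHI construction all live in one file.
Approach: peel off the "mostly pure" parts first and move `mod.rs` toward being an orchestrator.
Detailed execution plan:
- `docs/development/current/main/phases/phase-287/P0-MERGE_MOD_MODULARIZATION_PLAN.md`
#### Target structure (proposal)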
```
src/mir/builder/control_flow/joinir/merge/
├── mod.rs # orchestrator (public API and wiring only)
├── entry_selector.rs # loop header / entry block selection (SSOT)
├── header_phi_prebuild.rs # LoopHeaderPhiBuilder invocation (SSOT)
├── boundary_logging.rs # verbose/debug only (unified on trace)
└── verification/ # groups the "contract verification"
    ├── mod.rs
    └── ... (move / thin out the existing contract_checks.rs)
```
#### SSOT (the guessing removed here)
- Move loop header guessing onto the boundary:
  - Prefer `JoinInlineBoundary.loop_header_func_name`; use the legacy heuristic only when it is absent
- Forbid "always-on logging" and unify on `trace.stderr_if(..., debug/verbose)`
#### Acceptance criteria
- `cargo build --release` passes
- `./tools/smokes/v2/run.sh --profile quick` passes
- The diff is limited to "moves + entry point unification" (no semantic change)
---
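### 2) `merge/instruction_rewriter.rs` (1,297 lines): "don't touch" is the right call for now
Current state: a 3-stage Scan → Plan → Apply pipeline lives in one file and is locally complex.
Approach: **this comes right after Pattern6, so do not make large moves now** (the regression cost is high).
#### What can still be done safely (no semantic change)
- Consolidate policy in one place as the SSOT (already partly done)
  - latch recording: `src/mir/builder/control_flow/joinir/merge/rewriter/latch_incoming_recorder.rs`
  - tail-call classification: `src/mir/builder/control_flow/joinir/merge/tail_call_classifier.rs`
- Move "loop header guessing" onto the boundary SSOT (already implemented)
#### Future split plan (not now)
- Physically split into `scanner.rs` / `planner/` / `applicator.rs`
- Make `RewriteContext` the SSOT and reduce the arguments passed between stages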
---
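### 3) `patterns/ast_feature_extractor.rs` (1,148 lines): low difficulty, high payoff
Current state: several "detectors" live together. They are pure functions, so a physical split is safe.
Approach: create `pattern_recognizers/` and make it "1 recognizer = 1 question".
#### Target structure (proposal)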
```
src/mir/builder/control_flow/joinir/patterns/
├── ast_feature_extractor.rs # facade (re-export and glue)
└── pattern_recognizers/
    ├── mod.rs
    ├── continue_break.rs
    ├── infinite_loop.rs
    ├── if_else_phi.rs
    ├── carrier_count.rs
    └── ... (align with the existing extracted recognizers)
```
#### Acceptance criteria
- Keep the existing public function signatures (minimize the diff on the caller side)
- `cargo build --release` + quick PASS
- Unit tests can be "thin" (1-2 per recognizer)
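
A minimal sketch of the facade idea above, under stated assumptions: `Stmt` is a hypothetical miniature AST, and `has_break` / `has_continue` are illustrative names, not the existing public functions. It only shows how recognizers can move into `pattern_recognizers/` while callers keep importing from one place.

```rust
/// Hypothetical miniature AST, for illustration only.
pub enum Stmt {
    Break,
    Continue,
    If { then_body: Vec<Stmt>, else_body: Vec<Stmt> },
    Other,
}

mod pattern_recognizers {
    /// One recognizer module answers one question about a loop body.
    pub mod continue_break {
        use super::super::Stmt;

        /// Does the body contain a `break` at any nesting depth?
        pub fn has_break(body: &[Stmt]) -> bool {
            body.iter().any(|stmt| match stmt {
                Stmt::Break => true,
                Stmt::If { then_body, else_body } => {
                    has_break(then_body) || has_break(else_body)
                }
                _ => false,
            })
        }

        /// Does the body contain a `continue` at any nesting depth?
        pub fn has_continue(body: &[Stmt]) -> bool {
            body.iter().any(|stmt| match stmt {
                Stmt::Continue => true,
                Stmt::If { then_body, else_body } => {
                    has_continue(then_body) || has_continue(else_body)
                }
                _ => false,
            })
        }
    }
}

// Facade: the signatures are re-exported unchanged, so call sites do not move.
pub use pattern_recognizers::continue_break::{has_break, has_continue};
```

Because the recognizers are pure functions over the AST, a thin unit test per recognizer only needs to build a small body and check the answer.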
---
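## Pin the contracts with small tests (just one each, recommended)
"Pinning bb numbers or ValueIds" is unstable, so pin the contracts with **structural tests** instead.
- Double latch set detection (debug): a `#[should_panic]` test on `LoopHeaderPhiInfo::set_latch_incoming()`
- tail-call classification boundary: in `classify_tail_call()`, "entry-like but target != loop_step is not LoopEntry"
(Pinning by grepping MIR strings is a last resort, because block numbers shift easily.)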
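
The first structural test could look roughly like the sketch below. This is an assumption-heavy illustration: the `LoopHeaderPhiInfo` here is a simplified stand-in, and the `(block, value)` arguments of `set_latch_incoming` are hypothetical, since the real signature is not shown in this document.

```rust
/// Hypothetical, simplified stand-in for the real `LoopHeaderPhiInfo`.
#[derive(Default)]
struct LoopHeaderPhiInfo {
    /// (predecessor block id, value id); plain integers for illustration.
    latch_incoming: Option<(u32, u32)>,
}

impl LoopHeaderPhiInfo {
    /// Records the latch edge; a second call fails fast in debug builds.
    fn set_latch_incoming(&mut self, block: u32, value: u32) {
        debug_assert!(
            self.latch_incoming.is_none(),
            "latch incoming set twice: {:?} then ({}, {})",
            self.latch_incoming,
            block,
            value
        );
        self.latch_incoming = Some((block, value));
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // `debug_assert!` compiles to nothing in release builds,
    // so the test itself is gated on debug assertions.
    #[test]
    #[cfg(debug_assertions)]
    #[should_panic(expected = "latch incoming set twice")]
    fn double_latch_is_rejected_in_debug() {
        let mut info = LoopHeaderPhiInfo::default();
        info.set_latch_incoming(1, 10);
        info.set_latch_incoming(2, 20);
    }
}
```

The test pins the contract (one latch per header) without mentioning concrete block numbers or ValueIds from a real lowering.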
---
## Verification steps (every time)
```bash
cargo build --release
./target/release/hakorune --backend vm apps/tests/phase1883_nested_minimal.hako # RC=9
./tools/smokes/v2/run.sh --profile quick
```


@ -0,0 +1,123 @@
# Phase 287 P0: `merge/mod.rs` Modularization Plan (semantics-preserving)
**Date**: 2025-12-27
**Status**: Planning Complete (Ready for Implementation)
**Parent**: Phase 287 (Big Files Refactoring)
**Goal**: Reduce `src/mir/builder/control_flow/joinir/merge/mod.rs` to "wiring only" (no semantic change)
---
## Prerequisites (recent SSOT)
- The Pattern6 merge/latch incidents are already captured in the SSOT
  - latch is recorded only for `TailCallKind::BackEdge`
  - entry-like means "only the entry block of JoinIR main"
  - a double latch fails fast via `debug_assert!`
- Loop header guessing can now prefer the boundary SSOT
  - `JoinInlineBoundary.loop_header_func_name` (explicit)
  - legacy heuristic only when it is absent (backward compatibility)
---
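## Current problems
- `src/mir/builder/control_flow/joinir/merge/mod.rs` is huge (mixed responsibilities)
  - orchestrator / header PHI construction / entry selection / value remap / boundary logging / contract verification
- Mixing "guessing (heuristics)" with "contracts (SSOT)" makes regressions likely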
---
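## Goal (Target State)
- `merge/mod.rs` is the orchestrator only (public API + wiring)
- Pure / semi-pure logic moves out into local modules
- SSOT: prefer `boundary.loop_header_func_name`; the fallback exists "only for compatibility"
- No semantic change (no behavior change, no added silent fallback)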
---
## Proposed directory structure (draft)
```
src/mir/builder/control_flow/joinir/merge/
├── mod.rs # orchestrator only
├── entry_selector.rs # loop header func selection (SSOT)
├── header_phi_prebuild.rs # header PHI pre-construction (wiring)
├── value_remapper.rs # small pure helper (only if needed)
├── boundary_logging.rs # unified on trace (debug/verbose only)
└── verification/
    ├── mod.rs
    ├── phi_dst_checks.rs
    ├── carrier_checks.rs
    └── value_usage_checks.rs
```
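Note:
- The "verification SSOT" overlaps with the existing `merge/contract_checks.rs`, so if it is moved, the safe route is to **unify the entry point in `verification/mod.rs`** and keep the old name as a re-export for compatibility.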
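
The re-export route mentioned in the note can be sketched as follows. The module layout mirrors the proposal, but `verify_carrier_count` is a hypothetical check invented for illustration; it is not one of the existing functions.

```rust
mod merge {
    /// New single entry point for contract verification.
    pub mod verification {
        /// Hypothetical check, standing in for a function moved out of the old module.
        pub fn verify_carrier_count(expected: usize, actual: usize) -> Result<(), String> {
            if expected == actual {
                Ok(())
            } else {
                Err(format!(
                    "carrier count mismatch: expected {expected}, got {actual}"
                ))
            }
        }
    }

    /// Old module name kept as a thin re-export, so existing call sites still compile.
    pub mod contract_checks {
        pub use super::verification::verify_carrier_count;
    }
}
```

Both `merge::verification::verify_carrier_count` and `merge::contract_checks::verify_carrier_count` resolve to the same function, so the move itself changes no behavior and old paths can be migrated later.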
---
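## Implementation phases (Bottom-Up)
### Phase 1: Extract `verification/` (low risk)
- Target: move only **pure functions** (no side effects on the builder)
- To avoid a role conflict with the existing `contract_checks.rs`, do not add responsibilities at the call sites after the move
Verification: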
- `cargo build --release`
- `./tools/smokes/v2/run.sh --profile quick`
### Phase 2: Extract small helpers (low risk)
- Create `value_remapper.rs` only "if there really is duplication" (do not over-build)
- Call it only from `mod.rs` (avoid spreading dependencies)
### Phase 3: `entry_selector.rs` (medium risk / SSOT)
SSOT:
1) If `boundary.loop_header_func_name` is set, use it
2) Otherwise use the legacy heuristic (the first function after excluding `MAIN` and `boundary.continuation_func_ids`)
Notes (important):
- Excluding by "name match on k_exit" is not acceptable (for continuations the SSOT is `boundary.continuation_func_ids`)
- The target is not JoinIR functions; align with the world that merge actually handles, `MirModule.functions: BTreeMap<String, MirFunction>`
Verification:
- fixture: `./target/release/hakorune --backend vm apps/tests/phase1883_nested_minimal.hako` (RC=9)
- quick: `./tools/smokes/v2/run.sh --profile quick`
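
The two-step SSOT for Phase 3 can be sketched as below. This is an illustration under assumptions, not the actual selector: `Boundary` is a stand-in carrying only the two fields named above, `continuation_func_ids` is assumed to be a set of function names, and the value of `MAIN` is hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the boundary fields this plan refers to.
struct Boundary {
    loop_header_func_name: Option<String>,
    /// Assumed here to hold function names; the real element type may differ.
    continuation_func_ids: BTreeSet<String>,
}

/// Hypothetical name of the JoinIR main function.
const MAIN: &str = "main";

/// Step 1: the explicit boundary value wins.
/// Step 2: legacy heuristic, kept only for backward compatibility.
fn select_loop_header<'a, F>(
    functions: &'a BTreeMap<String, F>,
    boundary: &'a Boundary,
) -> Option<&'a str> {
    if let Some(name) = boundary.loop_header_func_name.as_deref() {
        return Some(name);
    }
    // Continuations are excluded via the boundary set, never by matching "k_exit".
    functions
        .keys()
        .map(String::as_str)
        .find(|name| *name != MAIN && !boundary.continuation_func_ids.contains(*name))
}
```

Since `BTreeMap` iterates in key order, "the first function" in the fallback is deterministic for a given set of function names.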
### Phase 4: `header_phi_prebuild.rs` (medium risk)
- Treat this as "wiring (orchestrator helper)" rather than "pure"
- Take what the entry point needs (remapper / builder / boundary / loop_header_func_name, etc.) as explicit arguments
### Phase 5: `boundary_logging.rs` (low risk)
- Unify on `trace.stderr_if(..., debug/verbose)`
- Forbid "always-on logs" (avoid adding noise to quick)
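
A minimal sketch of flag-gated logging, to illustrate the "no always-on logs" rule. The real `trace.stderr_if` signature is not shown in this document, so `Trace`, `emit_if`, and `log_boundary` below are hypothetical names; the sink is generic only so the behavior can be checked without touching stderr.

```rust
use std::io::Write;

/// Hypothetical tracer; the project's actual trace API may look different.
struct Trace<W: Write> {
    sink: W,
}

impl<W: Write> Trace<W> {
    /// Writes `msg` only when `enabled` is true (for example debug or verbose mode).
    fn emit_if(&mut self, msg: &str, enabled: bool) {
        if enabled {
            // Logging is best-effort here, so write errors are ignored.
            let _ = writeln!(self.sink, "{msg}");
        }
    }
}

/// All boundary logging goes through one gated helper.
fn log_boundary<W: Write>(trace: &mut Trace<W>, header: &str, verbose: bool) {
    trace.emit_if(&format!("[joinir/merge] loop header = {header}"), verbose);
}
```

With the flag off nothing is written, which is what keeps the quick profile output free of new noise.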
### Phase 6: Final cleanup of `merge/mod.rs` (low risk)
- Keep only the public API and the "sequencing"
- Limit changes to "moves + entry point unification"
---
## Verification (every phase)
```bash
cargo build --release
./target/release/hakorune --backend vm apps/tests/phase1883_nested_minimal.hako # RC=9
./tools/smokes/v2/run.sh --profile quick
```
Note:
- "0 warnings" is unrealistic given the current state of the repository, so the acceptance criteria are **0 errors** and **no new always-on logs**.
---
## Risk notes
- `merge/mod.rs` and `merge/instruction_rewriter.rs` both tend to carry their own "entry selection", so do not duplicate the SSOT (share the selector if possible, but without large moves).


@ -438,3 +438,156 @@ pub(super) fn verify_header_phi_dsts_not_redefined(
}
}
}
/// Phase 204-2: Verify PHI dst ValueIds are not overwritten by subsequent instructions in header block
///
/// # Contract
///
/// PHI instructions must appear first in a basic block, and their dst ValueIds
/// must not be overwritten by any subsequent non-PHI instructions in the same block.
///
/// # Panics
///
/// Panics if:
/// - PHI instruction appears after non-PHI instructions
/// - Non-PHI instruction overwrites a PHI dst in the header block
#[cfg(debug_assertions)]
pub(super) fn verify_no_phi_dst_overwrite(
func: &MirFunction,
header_block: BasicBlockId,
loop_info: &LoopHeaderPhiInfo,
) {
if loop_info.carrier_phis.is_empty() {
return; // No PHIs to verify
}
let header_block_data = func.blocks.get(&header_block).unwrap_or_else(|| {
panic!(
"[JoinIRVerifier] Header block {} not found ({} blocks in func)",
header_block,
func.blocks.len()
)
});
// 1. Collect all PHI dsts
let phi_dsts: std::collections::HashSet<ValueId> = loop_info
.carrier_phis
.values()
.map(|entry| entry.phi_dst)
.collect();
// 2. Check instructions after PHI definitions
let mut after_phis = false;
for instr in &header_block_data.instructions {
match instr {
MirInstruction::Phi { dst, .. } => {
// PHI instructions come first in basic block
if after_phis {
panic!(
"[JoinIRVerifier] PHI instruction {:?} appears after non-PHI instructions in block {}",
dst, header_block.0
);
}
}
_ => {
after_phis = true;
// Check if this instruction writes to a PHI dst
if let Some(dst) = get_instruction_dst(instr) {
if phi_dsts.contains(&dst) {
panic!(
"[JoinIRVerifier/Phase204] PHI dst {:?} is overwritten by instruction in header block {}: {:?}",
dst, header_block.0, instr
);
}
}
}
}
}
}
/// Verify PHI inputs are defined (Phase 204-3 - Conservative sanity checks)
///
/// # Checks
///
/// 1. PHI inputs have reasonable ValueId values (< threshold)
/// 2. No obviously undefined values (e.g., suspiciously large IDs)
///
/// # Note
///
/// Full data-flow analysis (DFA) verification is deferred to Phase 205+.
/// This function only performs conservative sanity checks.
///
/// # Panics
///
/// Panics in debug mode if suspicious PHI inputs are detected.
#[cfg(debug_assertions)]
pub(super) fn verify_phi_inputs_defined(
func: &MirFunction,
header_block: BasicBlockId,
) {
let header_block_data = func.blocks.get(&header_block).unwrap_or_else(|| {
panic!(
"[JoinIRVerifier] Header block {} not found ({} blocks in func)",
header_block,
func.blocks.len()
)
});
for instr in &header_block_data.instructions {
if let MirInstruction::Phi {
dst,
inputs,
type_hint: _,
} = instr
{
for (value_id, pred_block) in inputs {
// Conservative sanity check: ValueId should not be suspiciously large
// Phase 201 JoinValueSpace uses regions:
// - PHI Reserved: 0-99
// - Param: 100-999
// - Local: 1000+
// - Reasonable max: 100000 (arbitrary but catches obvious bugs)
if value_id.0 >= 100000 {
panic!(
"[JoinIRVerifier/Phase204-3] PHI {:?} has suspiciously large input {:?} from predecessor block {:?}",
dst, value_id, pred_block
);
}
}
}
}
}
/// Verify all loop contracts for a merged JoinIR function
///
/// This is the main entry point for verification. It runs all checks
/// and panics if any contract violation is found.
///
/// # Panics
///
/// Panics in debug mode if any contract violation is detected.
#[cfg(debug_assertions)]
pub(super) fn verify_joinir_contracts(
func: &MirFunction,
header_block: BasicBlockId,
exit_block: BasicBlockId,
loop_info: &LoopHeaderPhiInfo,
boundary: &JoinInlineBoundary,
) {
// Phase 135 P1 Step 1: Verify condition_bindings consistency (before merge)
verify_condition_bindings_consistent(boundary);
verify_loop_header_phis(func, header_block, loop_info, boundary);
verify_no_phi_dst_overwrite(func, header_block, loop_info); // Phase 204-2
verify_phi_inputs_defined(func, header_block); // Phase 204-3
verify_exit_line(func, exit_block, boundary);
verify_valueid_regions(loop_info, boundary); // Phase 205-4
// Phase 135 P1 Step 2: Verify header PHI dsts not redefined (after merge)
let phi_dsts: std::collections::HashSet<_> = loop_info
.carrier_phis
.values()
.map(|entry| entry.phi_dst)
.collect();
verify_header_phi_dsts_not_redefined(func, header_block, &phi_dsts);
}


@ -1209,7 +1209,7 @@ pub(in crate::mir::builder) fn merge_joinir_mir_blocks(
{
if let Some(boundary) = boundary {
if let Some(ref func) = builder.scope_ctx.current_function {
verify_joinir_contracts(
debug_assertions::verify_joinir_contracts(
func,
entry_block,
exit_block_id,
@ -1361,195 +1361,4 @@ fn remap_values(
Ok(())
}
/// Verify that PHI dst values are not overwritten by later instructions (Phase 204-2)
///
/// # Checks
///
/// 1. PHI instructions define dst values in header block
/// 2. No subsequent instruction in the same block overwrites these dsts
///
/// # Rationale
///
/// PHI dst overwrite violates SSA invariant (single assignment) and causes undefined behavior.
/// While Phase 201 JoinValueSpace prevents ValueId collisions, manual coding errors can still
/// occur during pattern lowering.
///
/// # Panics
///
/// Panics in debug mode if PHI dst is overwritten by a later instruction.
#[cfg(debug_assertions)]
fn verify_no_phi_dst_overwrite(
func: &crate::mir::MirFunction,
header_block: crate::mir::BasicBlockId,
loop_info: &LoopHeaderPhiInfo,
) {
if loop_info.carrier_phis.is_empty() {
return; // No PHIs to verify
}
let header_block_data = func.blocks.get(&header_block).unwrap_or_else(|| {
panic!(
"[JoinIRVerifier] Header block {} not found ({} blocks in func)",
header_block,
func.blocks.len()
)
});
// 1. Collect all PHI dsts
let phi_dsts: std::collections::HashSet<crate::mir::ValueId> = loop_info
.carrier_phis
.values()
.map(|entry| entry.phi_dst)
.collect();
// 2. Check instructions after PHI definitions
let mut after_phis = false;
for instr in &header_block_data.instructions {
match instr {
crate::mir::MirInstruction::Phi { dst, .. } => {
// PHI instructions come first in basic block
if after_phis {
panic!(
"[JoinIRVerifier] PHI instruction {:?} appears after non-PHI instructions in block {}",
dst, header_block.0
);
}
}
_ => {
after_phis = true;
// Check if this instruction writes to a PHI dst
if let Some(dst) = get_instruction_dst(instr) {
if phi_dsts.contains(&dst) {
panic!(
"[JoinIRVerifier/Phase204] PHI dst {:?} is overwritten by instruction in header block {}: {:?}",
dst, header_block.0, instr
);
}
}
}
}
}
}
/// Helper: Extract dst ValueId from MirInstruction (Phase 204-2)
#[cfg(debug_assertions)]
fn get_instruction_dst(instr: &crate::mir::MirInstruction) -> Option<crate::mir::ValueId> {
use crate::mir::MirInstruction;
match instr {
// Instructions with always-present dst
MirInstruction::Const { dst, .. }
| MirInstruction::Load { dst, .. }
| MirInstruction::UnaryOp { dst, .. }
| MirInstruction::BinOp { dst, .. }
| MirInstruction::Compare { dst, .. }
| MirInstruction::TypeOp { dst, .. }
| MirInstruction::NewBox { dst, .. }
| MirInstruction::NewClosure { dst, .. }
| MirInstruction::Copy { dst, .. }
| MirInstruction::Cast { dst, .. }
| MirInstruction::TypeCheck { dst, .. }
| MirInstruction::Phi { dst, .. }
| MirInstruction::ArrayGet { dst, .. }
| MirInstruction::RefNew { dst, .. }
| MirInstruction::RefGet { dst, .. }
| MirInstruction::WeakNew { dst, .. }
| MirInstruction::WeakLoad { dst, .. }
| MirInstruction::WeakRef { dst, .. }
| MirInstruction::FutureNew { dst, .. }
| MirInstruction::Await { dst, .. } => Some(*dst),
// Instructions with Option<ValueId> dst
MirInstruction::BoxCall { dst, .. }
| MirInstruction::ExternCall { dst, .. }
| MirInstruction::Call { dst, .. }
| MirInstruction::PluginInvoke { dst, .. } => *dst,
// Instructions without dst (side-effects only)
_ => None,
}
}
/// Verify PHI inputs are defined (Phase 204-3 - Conservative sanity checks)
///
/// # Checks
///
/// 1. PHI inputs have reasonable ValueId values (< threshold)
/// 2. No obviously undefined values (e.g., suspiciously large IDs)
///
/// # Note
///
/// Full data-flow analysis (DFA) verification is deferred to Phase 205+.
/// This function only performs conservative sanity checks.
///
/// # Panics
///
/// Panics in debug mode if suspicious PHI inputs are detected.
#[cfg(debug_assertions)]
fn verify_phi_inputs_defined(
func: &crate::mir::MirFunction,
header_block: crate::mir::BasicBlockId,
) {
let header_block_data = func.blocks.get(&header_block).unwrap_or_else(|| {
panic!(
"[JoinIRVerifier] Header block {} not found ({} blocks in func)",
header_block,
func.blocks.len()
)
});
for instr in &header_block_data.instructions {
if let crate::mir::MirInstruction::Phi {
dst,
inputs,
type_hint: _,
} = instr
{
for (value_id, pred_block) in inputs {
// Conservative sanity check: ValueId should not be suspiciously large
// Phase 201 JoinValueSpace uses regions:
// - PHI Reserved: 0-99
// - Param: 100-999
// - Local: 1000+
// - Reasonable max: 100000 (arbitrary but catches obvious bugs)
if value_id.0 >= 100000 {
panic!(
"[JoinIRVerifier/Phase204-3] PHI {:?} has suspiciously large input {:?} from predecessor block {:?}",
dst, value_id, pred_block
);
}
}
}
}
}
/// Verify all loop contracts for a merged JoinIR function
///
/// This is the main entry point for verification. It runs all checks
/// and panics if any contract violation is found.
///
/// # Panics
///
/// Panics in debug mode if any contract violation is detected.
#[cfg(debug_assertions)]
fn verify_joinir_contracts(
func: &crate::mir::MirFunction,
header_block: crate::mir::BasicBlockId,
exit_block: crate::mir::BasicBlockId,
loop_info: &LoopHeaderPhiInfo,
boundary: &JoinInlineBoundary,
) {
// Phase 135 P1 Step 1: Verify condition_bindings consistency (before merge)
debug_assertions::verify_condition_bindings_consistent(boundary);
debug_assertions::verify_loop_header_phis(func, header_block, loop_info, boundary);
verify_no_phi_dst_overwrite(func, header_block, loop_info); // Phase 204-2
verify_phi_inputs_defined(func, header_block); // Phase 204-3
debug_assertions::verify_exit_line(func, exit_block, boundary);
debug_assertions::verify_valueid_regions(loop_info, boundary); // Phase 205-4
// Phase 135 P1 Step 2: Verify header PHI dsts not redefined (after merge)
let phi_dsts: std::collections::HashSet<_> = loop_info
.carrier_phis
.values()
.map(|entry| entry.phi_dst)
.collect();
debug_assertions::verify_header_phi_dsts_not_redefined(func, header_block, &phi_dsts);
}
// Phase 287 P0.1: Verification functions moved to debug_assertions.rs