# Phase 4: merge_joinir_mir_blocks Breakdown Guide **CRITICAL PHASE** - This is the most complex modularization task. **Context**: `merge_joinir_mir_blocks()` is a 714-line monster function that performs 6 distinct operations. This guide provides step-by-step instructions for breaking it down safely. --- ## Overview ### Current State - **File**: `src/mir/builder/control_flow.rs` - **Function**: `merge_joinir_mir_blocks()` (lines 864-1578) - **Size**: 714 lines (44% of the entire file!) - **Complexity**: 6 distinct phases, multiple data structures, critical to JoinIR integration ### Target State - **Directory**: `src/mir/builder/control_flow/joinir/merge/` - **Files**: 7 files (1 coordinator + 6 sub-modules) - **Average size**: ~100-150 lines per file - **Maintainability**: Each phase in its own file --- ## Function Analysis ### Current Structure (Lines 864-1578) ```rust fn merge_joinir_mir_blocks( &mut self, mir_module: &MirModule, boundary: Option<&JoinInlineBoundary>, debug: bool, ) -> Result, String> { // Phase 1: Block ID Allocation (lines 864-923) // - Create JoinIrIdRemapper // - Allocate new block IDs for all functions // - Map function entry blocks // Phase 2: Value Collection (lines 931-971) // - Collect all ValueIds used across functions // - Track Const String → function name mapping // - Collect function parameters for tail call conversion // Phase 3: ValueId Remapping (lines 973-1100) // - Allocate new ValueIds for all collected values // - Create remapping table // - Handle determinism (BTreeSet/BTreeMap usage) // Phase 4: Block Merging & Instruction Rewriting (lines 1102-1400) // - For each function in sorted order: // - For each block in sorted order: // - Rewrite instructions with remapped IDs // - Convert Call → Jump for intra-module calls // - Convert Return → Jump to exit block // - Handle tail call optimization // - Insert merged blocks into current_function // Phase 5: Exit PHI Construction (lines 1402-1500) // - Collect return values from all functions // - Create exit block with PHI node // - Return exit PHI value // Phase 6: Boundary Reconnection (lines 1502-1578) // - If boundary with host_outputs specified: // - Map exit PHI back to host variable_map // - Update SSA slots in variable_map // - Reconnect inlined code to host context } ``` --- ## Proposed Module Breakdown ### 1. `merge/mod.rs` - Coordinator (~100 lines) **Purpose**: Orchestrate the 6 phases without implementing logic. ```rust //! JoinIR MIR Block Merging Coordinator //! //! This module coordinates the merging of JoinIR-generated MIR functions //! into the host MIR builder. The process is broken into 6 phases: //! //! 1. Block ID allocation //! 2. Value collection //! 3. ValueId remapping //! 4. Instruction rewriting //! 5. Exit PHI construction //! 6. Boundary reconnection use crate::mir::{MirModule, ValueId}; use crate::mir::join_ir::lowering::inline_boundary::JoinInlineBoundary; mod id_remapper; mod block_allocator; mod value_collector; mod instruction_rewriter; mod exit_phi_builder; use id_remapper::JoinIrIdRemapper; pub(in crate::mir::builder) fn merge_joinir_mir_blocks( builder: &mut crate::mir::builder::MirBuilder, mir_module: &MirModule, boundary: Option<&JoinInlineBoundary>, debug: bool, ) -> Result, String> { // Phase 1: Allocate block IDs let mut remapper = block_allocator::allocate_blocks(builder, mir_module, debug)?; // Phase 2: Collect values let (used_values, value_to_func_name, function_params) = value_collector::collect_values(mir_module, &mut remapper, debug)?; // Phase 3: Remap ValueIds id_remapper::remap_values(builder, &used_values, &mut remapper, debug)?; // Phase 4: Merge blocks and rewrite instructions let exit_block_id = instruction_rewriter::merge_and_rewrite( builder, mir_module, &mut remapper, &value_to_func_name, &function_params, debug, )?; // Phase 5: Build exit PHI let exit_phi_value = exit_phi_builder::build_exit_phi( builder, mir_module, &remapper, exit_block_id, debug, )?; // Phase 6: Reconnect boundary (if specified) if let Some(boundary) = boundary { reconnect_boundary(builder, boundary, exit_phi_value, debug)?; } Ok(exit_phi_value) } fn reconnect_boundary( builder: &mut crate::mir::builder::MirBuilder, boundary: &JoinInlineBoundary, exit_phi_value: Option, debug: bool, ) -> Result<(), String> { // Phase 6 implementation (simple, can stay in mod.rs) // ... (lines 1502-1578 logic) Ok(()) } ``` --- ### 2. `merge/id_remapper.rs` - ID Remapping (~150 lines) **Purpose**: Manage ValueId and BlockId remapping. ```rust //! JoinIR ID Remapper //! //! Handles the mapping of ValueIds and BlockIds from JoinIR functions //! to the host MIR builder's ID space. use crate::mir::{BasicBlockId, ValueId}; use std::collections::BTreeMap; /// Remapper for JoinIR → Host MIR ID translation pub(super) struct JoinIrIdRemapper { /// (function_name, old_block_id) → new_block_id block_map: BTreeMap<(String, BasicBlockId), BasicBlockId>, /// old_value_id → new_value_id value_map: BTreeMap, /// function_name → entry_block_id function_entry_map: BTreeMap, } impl JoinIrIdRemapper { pub(super) fn new() -> Self { Self { block_map: BTreeMap::new(), value_map: BTreeMap::new(), function_entry_map: BTreeMap::new(), } } pub(super) fn set_block(&mut self, func_name: String, old_id: BasicBlockId, new_id: BasicBlockId) { self.block_map.insert((func_name, old_id), new_id); } pub(super) fn get_block(&self, func_name: &str, old_id: BasicBlockId) -> Option { self.block_map.get(&(func_name.to_string(), old_id)).copied() } pub(super) fn set_value(&mut self, old_id: ValueId, new_id: ValueId) { self.value_map.insert(old_id, new_id); } pub(super) fn get_value(&self, old_id: ValueId) -> Option { self.value_map.get(&old_id).copied() } pub(super) fn set_function_entry(&mut self, func_name: String, entry_block: BasicBlockId) { self.function_entry_map.insert(func_name, entry_block); } pub(super) fn get_function_entry(&self, func_name: &str) -> Option { self.function_entry_map.get(func_name).copied() } // ... (other helper methods from lines 873-959) } pub(super) fn remap_values( builder: &mut crate::mir::builder::MirBuilder, used_values: &std::collections::BTreeSet, remapper: &mut JoinIrIdRemapper, debug: bool, ) -> Result<(), String> { // Phase 3 logic (lines 973-1100) // - Allocate new ValueIds // - Store in remapper Ok(()) } ``` --- ### 3. `merge/block_allocator.rs` - Block Allocation (~100 lines) **Purpose**: Allocate block IDs for all JoinIR functions. ```rust //! JoinIR Block ID Allocator //! //! Allocates new BasicBlockIds for all blocks in JoinIR functions //! to avoid ID conflicts with the host MIR builder. use crate::mir::{BasicBlockId, MirModule}; use super::id_remapper::JoinIrIdRemapper; pub(super) fn allocate_blocks( builder: &mut crate::mir::builder::MirBuilder, mir_module: &MirModule, debug: bool, ) -> Result { let mut remapper = JoinIrIdRemapper::new(); if debug { eprintln!("[merge/block_allocator] Allocating block IDs for {} functions", mir_module.functions.len()); } // DETERMINISM FIX: Sort functions by name let mut functions: Vec<_> = mir_module.functions.iter().collect(); functions.sort_by_key(|(name, _)| name.as_str()); for (func_name, func) in functions { if debug { eprintln!("[merge/block_allocator] Function: {}", func_name); } // DETERMINISM FIX: Sort blocks by ID let mut blocks: Vec<_> = func.blocks.iter().collect(); blocks.sort_by_key(|(id, _)| id.0); for (old_block_id, _) in blocks { let new_block_id = builder.block_gen.next(); remapper.set_block(func_name.clone(), *old_block_id, new_block_id); if debug { eprintln!("[merge/block_allocator] Block: {:?} → {:?}", old_block_id, new_block_id); } } // Store function entry block let entry_block_new = remapper.get_block(func_name, func.entry_block) .ok_or_else(|| format!("Entry block not found for {}", func_name))?; remapper.set_function_entry(func_name.clone(), entry_block_new); } if debug { eprintln!("[merge/block_allocator] Allocation complete"); } Ok(remapper) } ``` --- ### 4. `merge/value_collector.rs` - Value Collection (~100 lines) **Purpose**: Collect all ValueIds used across JoinIR functions. ```rust //! JoinIR Value Collector //! //! Collects all ValueIds used in JoinIR functions for remapping. use crate::mir::{ValueId, MirModule}; use super::id_remapper::JoinIrIdRemapper; use std::collections::{BTreeSet, HashMap}; pub(super) fn collect_values( mir_module: &MirModule, remapper: &mut JoinIrIdRemapper, debug: bool, ) -> Result<(BTreeSet, HashMap, HashMap>), String> { let mut used_values = BTreeSet::new(); let mut value_to_func_name = HashMap::new(); let mut function_params = HashMap::new(); if debug { eprintln!("[merge/value_collector] Collecting values from {} functions", mir_module.functions.len()); } for (func_name, func) in &mir_module.functions { // Collect function parameters function_params.insert(func_name.clone(), func.params.clone()); for block in func.blocks.values() { // Collect values from instructions let block_values = collect_values_in_block(block); used_values.extend(block_values); // Track Const String → function name mapping for inst in &block.instructions { if let crate::mir::MirInstruction::Const { dst, value } = inst { if let crate::mir::types::ConstValue::String(s) = value { // Check if this is a function name if mir_module.functions.contains_key(s) { value_to_func_name.insert(*dst, s.clone()); used_values.insert(*dst); } } } } } } if debug { eprintln!("[merge/value_collector] Collected {} values", used_values.len()); } Ok((used_values, value_to_func_name, function_params)) } fn collect_values_in_block(block: &crate::mir::BasicBlock) -> Vec { // Helper function from lines 948-970 // ... (extract logic from remapper.collect_values_in_block) vec![] } ``` --- ### 5. `merge/instruction_rewriter.rs` - Instruction Rewriting (~150 lines) **Purpose**: Rewrite instructions and merge blocks. ```rust //! JoinIR Instruction Rewriter //! //! Rewrites JoinIR instructions with remapped IDs and merges blocks //! into the host MIR builder. use crate::mir::{BasicBlockId, ValueId, MirModule, MirInstruction, BasicBlock}; use super::id_remapper::JoinIrIdRemapper; use std::collections::HashMap; pub(super) fn merge_and_rewrite( builder: &mut crate::mir::builder::MirBuilder, mir_module: &MirModule, remapper: &mut JoinIrIdRemapper, value_to_func_name: &HashMap, function_params: &HashMap>, debug: bool, ) -> Result { // Create exit block let exit_block_id = builder.block_gen.next(); if debug { eprintln!("[merge/instruction_rewriter] Merging blocks from {} functions", mir_module.functions.len()); eprintln!("[merge/instruction_rewriter] Exit block: {:?}", exit_block_id); } // DETERMINISM FIX: Sort functions by name let mut functions: Vec<_> = mir_module.functions.iter().collect(); functions.sort_by_key(|(name, _)| name.as_str()); for (func_name, func) in functions { merge_function_blocks( builder, func_name, func, remapper, value_to_func_name, function_params, exit_block_id, debug, )?; } Ok(exit_block_id) } fn merge_function_blocks( builder: &mut crate::mir::builder::MirBuilder, func_name: &str, func: &crate::mir::MirFunction, remapper: &mut JoinIrIdRemapper, value_to_func_name: &HashMap, function_params: &HashMap>, exit_block_id: BasicBlockId, debug: bool, ) -> Result<(), String> { // DETERMINISM FIX: Sort blocks by ID let mut blocks: Vec<_> = func.blocks.iter().collect(); blocks.sort_by_key(|(id, _)| id.0); for (old_block_id, old_block) in blocks { let new_block_id = remapper.get_block(func_name, *old_block_id) .ok_or_else(|| format!("Block ID not found: {:?}", old_block_id))?; let new_block = rewrite_block( old_block, remapper, value_to_func_name, function_params, exit_block_id, debug, )?; builder.current_function.as_mut() .ok_or("No current function")? .blocks.insert(new_block_id, new_block); } Ok(()) } fn rewrite_block( old_block: &BasicBlock, remapper: &JoinIrIdRemapper, value_to_func_name: &HashMap, function_params: &HashMap>, exit_block_id: BasicBlockId, debug: bool, ) -> Result { let mut new_instructions = Vec::new(); // Rewrite each instruction (lines 1102-1300) for inst in &old_block.instructions { let new_inst = rewrite_instruction( inst, remapper, value_to_func_name, function_params, exit_block_id, debug, )?; new_instructions.push(new_inst); } // Rewrite terminator (lines 1300-1400) let new_terminator = rewrite_terminator( &old_block.terminator, remapper, exit_block_id, debug, )?; Ok(BasicBlock { instructions: new_instructions, terminator: new_terminator, }) } fn rewrite_instruction( inst: &MirInstruction, remapper: &JoinIrIdRemapper, value_to_func_name: &HashMap, function_params: &HashMap>, exit_block_id: BasicBlockId, debug: bool, ) -> Result { // Instruction rewriting logic (lines 1102-1300) // - Remap ValueIds // - Convert Call → Jump for intra-module calls // - Handle tail call optimization Ok(inst.clone()) // Placeholder } fn rewrite_terminator( terminator: &Option, remapper: &JoinIrIdRemapper, exit_block_id: BasicBlockId, debug: bool, ) -> Result, String> { // Terminator rewriting logic (lines 1300-1400) // - Remap block IDs // - Convert Return → Jump to exit block Ok(terminator.clone()) // Placeholder } ``` --- ### 6. `merge/exit_phi_builder.rs` - Exit PHI Construction (~100 lines) **Purpose**: Construct exit PHI node from return values. ```rust //! JoinIR Exit PHI Builder //! //! Constructs the exit block PHI node that merges return values //! from all inlined JoinIR functions. use crate::mir::{BasicBlockId, ValueId, MirModule, MirInstruction, BasicBlock}; use super::id_remapper::JoinIrIdRemapper; pub(super) fn build_exit_phi( builder: &mut crate::mir::builder::MirBuilder, mir_module: &MirModule, remapper: &JoinIrIdRemapper, exit_block_id: BasicBlockId, debug: bool, ) -> Result, String> { if debug { eprintln!("[merge/exit_phi_builder] Building exit PHI"); } // Collect return values from all functions (lines 1402-1450) let mut return_values = Vec::new(); for (func_name, func) in &mir_module.functions { for (old_block_id, block) in &func.blocks { if let Some(MirInstruction::Return { value: Some(old_value) }) = &block.terminator { // Remap the return value let new_value = remapper.get_value(*old_value) .ok_or_else(|| format!("Return value not remapped: {:?}", old_value))?; // Remap the block ID (predecessor of exit block) let new_block_id = remapper.get_block(func_name, *old_block_id) .ok_or_else(|| format!("Block ID not remapped: {:?}", old_block_id))?; return_values.push((new_block_id, new_value)); } } } if return_values.is_empty() { if debug { eprintln!("[merge/exit_phi_builder] No return values, creating void exit block"); } // Create empty exit block builder.current_function.as_mut() .ok_or("No current function")? .blocks.insert(exit_block_id, BasicBlock { instructions: vec![], terminator: None, }); return Ok(None); } // Create PHI node (lines 1450-1500) let phi_value = builder.value_gen.next(); let phi_inst = MirInstruction::Phi { dst: phi_value, incoming: return_values.iter() .map(|(block_id, value_id)| (*block_id, *value_id)) .collect(), }; // Create exit block with PHI builder.current_function.as_mut() .ok_or("No current function")? .blocks.insert(exit_block_id, BasicBlock { instructions: vec![phi_inst], terminator: None, }); if debug { eprintln!("[merge/exit_phi_builder] Exit PHI created: {:?}", phi_value); } Ok(Some(phi_value)) } ``` --- ## Implementation Steps ### Step 1: Create Directory Structure (5 min) ```bash mkdir -p src/mir/builder/control_flow/joinir/merge ``` ### Step 2: Create Skeleton Files (10 min) ```bash # Create empty files with module documentation touch src/mir/builder/control_flow/joinir/merge/mod.rs touch src/mir/builder/control_flow/joinir/merge/id_remapper.rs touch src/mir/builder/control_flow/joinir/merge/block_allocator.rs touch src/mir/builder/control_flow/joinir/merge/value_collector.rs touch src/mir/builder/control_flow/joinir/merge/instruction_rewriter.rs touch src/mir/builder/control_flow/joinir/merge/exit_phi_builder.rs ``` ### Step 3: Extract id_remapper.rs (30 min) - Copy `JoinIrIdRemapper` struct definition - Move helper methods - Add `remap_values()` function - Test compilation: `cargo build --release` ### Step 4: Extract block_allocator.rs (30 min) - Copy block allocation logic (lines 888-923) - Create `allocate_blocks()` function - Test compilation: `cargo build --release` ### Step 5: Extract value_collector.rs (30 min) - Copy value collection logic (lines 931-971) - Create `collect_values()` function - Test compilation: `cargo build --release` ### Step 6: Extract instruction_rewriter.rs (1.5 hours) - Copy instruction rewriting logic (lines 1102-1400) - Create `merge_and_rewrite()` function - Create helper functions for instruction/terminator rewriting - Test compilation: `cargo build --release` ### Step 7: Extract exit_phi_builder.rs (30 min) - Copy exit PHI logic (lines 1402-1500) - Create `build_exit_phi()` function - Test compilation: `cargo build --release` ### Step 8: Create mod.rs Coordinator (1 hour) - Create coordinator function - Wire up all sub-modules - Move boundary reconnection logic (lines 1502-1578) - Test compilation: `cargo build --release` ### Step 9: Update control_flow/mod.rs (15 min) - Update imports to use new `merge` module - Remove old `merge_joinir_mir_blocks()` implementation - Test compilation: `cargo build --release` ### Step 10: Comprehensive Testing (1 hour) ```bash # Build verification cargo build --release cargo test --lib # Smoke tests tools/smokes/v2/run.sh --profile quick # Debug trace verification NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge" # Determinism check (run 3 times) for i in 1 2 3; do echo "=== Run $i ===" cargo test --release test_loop_patterns 2>&1 | grep "test result" done ``` --- ## Verification Checklist - [ ] All 267+ tests pass - [ ] Build time ≤ current (no regression) - [ ] Debug traces still appear with `NYASH_OPTION_C_DEBUG=1` - [ ] No compiler warnings - [ ] No clippy warnings (`cargo clippy --all-targets`) - [ ] Determinism test passes (3 consecutive runs produce identical results) - [ ] Smoke tests pass for all patterns (1/2/3) --- ## Rollback Procedure If Phase 4 fails: ```bash # 1. Remove new directory rm -rf src/mir/builder/control_flow/joinir/merge # 2. Restore original control_flow.rs git checkout src/mir/builder/control_flow.rs # 3. Verify build cargo build --release cargo test --lib # 4. Document failure echo "$(date): Phase 4 rollback due to: $REASON" >> docs/development/refactoring/rollback_log.txt ``` --- ## Common Pitfalls ### Pitfall 1: HashMap Non-Determinism **Problem**: Using `HashMap` for iteration causes non-deterministic ValueId allocation. **Solution**: Use `BTreeMap` and `BTreeSet` for all ID mappings. ### Pitfall 2: Missing Value Remapping **Problem**: Forgetting to remap a ValueId causes "value not found" errors. **Solution**: Systematically collect ALL values before remapping phase. ### Pitfall 3: Block Order Matters **Problem**: Processing blocks in random order causes test failures. **Solution**: Always sort by name/ID before iteration. ### Pitfall 4: Forgetting to Update Imports **Problem**: Old imports cause compilation failures. **Solution**: Update all imports in a single commit after extraction. --- ## Success Criteria - ✅ 714-line function → 6 focused modules (100-150 lines each) - ✅ All tests pass (no regressions) - ✅ Debug traces work (`NYASH_OPTION_C_DEBUG=1`) - ✅ Determinism maintained (BTreeMap/BTreeSet usage) - ✅ Code is easier to understand (each phase isolated) - ✅ Future maintenance easier (modify one phase at a time) --- ## Timeline - **Total Estimated Effort**: 6 hours - **Buffer**: +2 hours for unexpected issues - **Recommended Schedule**: 2 days (3 hours/day) **Day 1**: - Steps 1-5 (directory setup, id_remapper, block_allocator, value_collector) - Verification after each step **Day 2**: - Steps 6-10 (instruction_rewriter, exit_phi_builder, coordinator, testing) - Comprehensive verification --- ## Conclusion Phase 4 is the most critical modularization task. By breaking down the 714-line monster function into 6 focused modules, we: 1. **Improve maintainability** - Each phase in its own file 2. **Reduce complexity** - 714 lines → 100-150 lines per file 3. **Enable future development** - Easy to modify individual phases 4. **Maintain stability** - Zero breaking changes, backward compatible **Next Steps**: 1. Review this guide 2. Set aside 2 days for implementation 3. Execute steps 1-10 systematically 4. Run comprehensive verification 5. Commit success or rollback if needed **Questions?** Refer to the full implementation plan or open an issue. --- **Document Version**: 1.0 **Created**: 2025-12-05 **Author**: Claude Code (AI-assisted planning) **Status**: Ready for execution