diff --git a/docs/development/refactoring/README.md b/docs/development/refactoring/README.md new file mode 100644 index 00000000..2176da98 --- /dev/null +++ b/docs/development/refactoring/README.md @@ -0,0 +1,352 @@ +# Modularization Implementation Resources + +This directory contains comprehensive plans and guides for modularizing large source files in the Nyash codebase. + +--- + +## Overview + +**Goal**: Break down 3 oversized files (3,854 lines total) into 37 focused modules with clear separation of concerns. + +**Priority**: +1. **control_flow.rs** (1,632 lines) - **HIGHEST** (blocking Pattern 4+ development) +2. **generic_case_a.rs** (1,056 lines) - **MEDIUM** (high code deduplication potential) +3. **loopform_builder.rs** (1,166 lines) - **LOWER** (already partially modularized) + +--- + +## Documents in This Directory + +### 1. [modularization-implementation-plan.md](./modularization-implementation-plan.md) ⭐ **START HERE** +**Comprehensive implementation plan** covering all 3 files. + +**Contents**: +- Executive summary +- Phase-by-phase migration plans for each file +- Public API changes +- Build verification strategies +- Risk assessment matrices +- Implementation effort breakdowns +- Success criteria + +**Who should read this**: Anyone implementing the modularization. + +**Estimated read time**: 30 minutes + +--- + +### 2. [modularization-quick-start.md](./modularization-quick-start.md) 🚀 **QUICK REFERENCE** +**TL;DR checklist version** of the implementation plan. + +**Contents**: +- Step-by-step checklists for each phase +- Verification commands +- Emergency rollback commands +- Timeline and milestones + +**Who should read this**: Developers actively working on modularization. + +**Estimated read time**: 10 minutes + +--- + +### 3. [modularization-directory-structure.md](./modularization-directory-structure.md) 📊 **VISUAL GUIDE** +**Visual directory structure diagrams** showing before/after states. + +**Contents**: +- Directory tree diagrams for all 3 files +- Metrics comparison tables +- Import path changes +- Navigation improvement examples +- File size distribution charts + +**Who should read this**: Anyone wanting to understand the proposed structure. + +**Estimated read time**: 15 minutes + +--- + +### 4. [phase4-merge-function-breakdown.md](./phase4-merge-function-breakdown.md) 🔥 **CRITICAL PHASE** +**Detailed implementation guide** for Phase 4 (merge_joinir_mir_blocks breakdown). + +**Contents**: +- Function analysis (714 lines → 6 modules) +- Detailed module breakdowns with code examples +- Step-by-step implementation steps (10 steps) +- Verification checklist +- Common pitfalls and solutions +- Rollback procedure + +**Who should read this**: Developers working on control_flow.rs Phase 4. + +**Estimated read time**: 20 minutes + +--- + +## Quick Navigation + +### I want to... + +#### **Start the modularization** +→ Read [modularization-implementation-plan.md](./modularization-implementation-plan.md) (full plan) +→ Use [modularization-quick-start.md](./modularization-quick-start.md) (checklist) + +#### **Understand the proposed structure** +→ Read [modularization-directory-structure.md](./modularization-directory-structure.md) (visual guide) + +#### **Work on Phase 4 (merge function)** +→ Read [phase4-merge-function-breakdown.md](./phase4-merge-function-breakdown.md) (detailed guide) + +#### **Get approval for the plan** +→ Share [modularization-implementation-plan.md](./modularization-implementation-plan.md) (comprehensive) +→ Use [modularization-directory-structure.md](./modularization-directory-structure.md) (visual support) + +#### **Estimate effort** +→ See "Implementation Effort Breakdown" in [modularization-implementation-plan.md](./modularization-implementation-plan.md) + +#### **Assess risks** +→ See "Risk Assessment" sections in [modularization-implementation-plan.md](./modularization-implementation-plan.md) + +--- + +## Recommended Reading Order + +### For Implementers (Developers) +1. **Quick Start** - [modularization-quick-start.md](./modularization-quick-start.md) (10 min) +2. **Full Plan** - [modularization-implementation-plan.md](./modularization-implementation-plan.md) (30 min) +3. **Phase 4 Guide** - [phase4-merge-function-breakdown.md](./phase4-merge-function-breakdown.md) (when ready for Phase 4) + +### For Reviewers (Team Leads) +1. **Visual Guide** - [modularization-directory-structure.md](./modularization-directory-structure.md) (15 min) +2. **Full Plan** - [modularization-implementation-plan.md](./modularization-implementation-plan.md) (30 min) +3. **Quick Start** - [modularization-quick-start.md](./modularization-quick-start.md) (verification commands) + +### For Stakeholders (Management) +1. **Executive Summary** - First page of [modularization-implementation-plan.md](./modularization-implementation-plan.md) (5 min) +2. **Metrics Comparison** - Tables in [modularization-directory-structure.md](./modularization-directory-structure.md) (5 min) + +--- + +## Key Metrics + +### control_flow.rs +- **Lines**: 1,632 → 1,850 (+13% for clarity) +- **Files**: 1 → 19 +- **Largest file**: 1,632 → 180 (-89%) +- **Effort**: 12.5 hours + +### generic_case_a.rs +- **Lines**: 1,056 → 1,470 (+39% for clarity) +- **Files**: 3 → 7 +- **Largest file**: 1,056 → 500 (-53%) +- **Effort**: 3.5 hours + +### loopform_builder.rs +- **Lines**: 1,166 → 1,450 (+24% for clarity) +- **Files**: 5 → 11 +- **Largest file**: 1,166 → 200 (-83%) +- **Effort**: 4 hours + +### Total +- **Lines**: 3,854 → 4,770 (+24% for clarity, distributed across 37 files) +- **Files**: 9 → 37 +- **Total Effort**: 20 hours (2-3 weeks) + +--- + +## Implementation Timeline + +### Week 1: control_flow.rs Phases 1-3 (Low Risk) +- **Monday**: Phase 1 (Debug utilities) - 30 min +- **Tuesday**: Phase 2 (Pattern lowerers) - 2 hours +- **Wednesday**: Phase 3 (JoinIR routing) - 1.5 hours +- **Thursday-Friday**: Verification and buffer + +**Deliverable**: Pattern lowerers and routing isolated + +### Week 2: control_flow.rs Phase 4 (High Risk) +- **Monday-Tuesday**: Phase 4 (merge function) - 6 hours +- **Wednesday**: Buffer for issues +- **Thursday-Friday**: Phases 5-7 (Exception, utils, cleanup) - 2.5 hours + +**Deliverable**: control_flow.rs fully modularized + +### Week 3: generic_case_a.rs (Optional) +- **Monday-Tuesday**: generic_case_a.rs Phases 1-5 - 3.5 hours +- **Wednesday**: Buffer +- **Thursday-Friday**: Documentation & final verification + +**Deliverable**: generic_case_a.rs fully modularized + +### Future: loopform_builder.rs (After Pattern 4+) +- **Timing**: After Pattern 4/5/6 development stabilizes +- **Effort**: 4 hours +- **Priority**: Lower (already partially modularized) + +--- + +## Success Criteria + +### Quantitative +- ✅ All 267+ tests pass (no regressions) +- ✅ Build time ≤ current (no increase) +- ✅ Largest file < 250 lines (vs 1,632 before) +- ✅ Average file size < 150 lines + +### Qualitative +- ✅ Code is easier to navigate +- ✅ New patterns can be added without modifying 1,600-line files +- ✅ Debug traces remain functional +- ✅ Documentation is clear and helpful + +### Process +- ✅ Zero breaking changes at any phase +- ✅ Each phase can be rolled back independently +- ✅ Commits are small and focused +- ✅ CI/CD passes after every commit + +--- + +## Verification Commands + +### Quick Verification (after each phase) +```bash +cargo build --release +cargo test --lib +``` + +### Comprehensive Verification (after critical phases) +```bash +cargo build --release --all-features +cargo test --release +cargo clippy --all-targets +tools/smokes/v2/run.sh --profile quick +``` + +### Debug Trace Verification (Phase 4 only) +```bash +NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir" +``` + +--- + +## Emergency Rollback + +### control_flow.rs +```bash +rm -rf src/mir/builder/control_flow/ +git checkout src/mir/builder/control_flow.rs +cargo build --release && cargo test --lib +``` + +### generic_case_a.rs +```bash +rm -rf src/mir/join_ir/lowering/generic_case_a/ +git checkout src/mir/join_ir/lowering/generic_case_a*.rs +cargo build --release && cargo test --lib +``` + +### loopform_builder.rs +```bash +rm -rf src/mir/phi_core/loopform/ +git checkout src/mir/phi_core/loopform*.rs +cargo build --release && cargo test --lib +``` + +--- + +## Why Modularize? + +### Current Pain Points +1. **714-line merge function** - Impossible to understand without hours of study +2. **1,632-line control_flow.rs** - Pattern 4+ would add another 500+ lines +3. **Merge conflicts** - Multiple developers editing the same giant file +4. **Hard to debug** - `NYASH_OPTION_C_DEBUG` traces are buried in massive files +5. **Hard to test** - Can't test individual phases in isolation + +### Benefits After Modularization +1. **100-150 line modules** - Easy to understand at a glance +2. **19 focused files** - Each with a single responsibility +3. **Isolated changes** - Modify one phase without affecting others +4. **Easy debugging** - Jump to specific module for traces +5. **Testable** - Can unit test individual modules + +### ROI (Return on Investment) +- **Time investment**: 20 hours (2-3 weeks) +- **Time saved**: ~5 hours/month on maintenance (conservatively) +- **Breakeven**: 4 months +- **Long-term benefit**: Much easier Pattern 4/5/6 development + +--- + +## Implementation Order Justification + +### Why control_flow.rs First? +1. **Blocking Pattern 4+** - Currently blocking new pattern development +2. **Highest pain** - 714-line merge function is the biggest code smell +3. **Sets the pattern** - Establishes the modularization template for others +4. **Most benefit** - Reduces merge conflicts immediately + +### Why generic_case_a.rs Second? +1. **High code deduplication** - 4 similar lowerers can be separated +2. **Already partially split** - Companion files already exist +3. **Medium priority** - Not blocking, but would improve maintainability + +### Why loopform_builder.rs Last? +1. **Already partially modularized** - Phase 191 did most of the work +2. **Lower priority** - Not blocking anything +3. **Can wait** - Best done after Pattern 4+ development stabilizes + +--- + +## Questions & Concerns + +### "Is this worth the effort?" +**Yes.** 20 hours investment for ongoing maintenance benefits. Breakeven in 4 months. + +### "Will this break anything?" +**No.** Zero breaking changes, backward compatible at every phase. Full test suite verification. + +### "Can we roll back if needed?" +**Yes.** Each phase can be rolled back independently with simple git commands. + +### "What if we only do control_flow.rs?" +**Still valuable.** That's where the highest pain is. Do that first, others can wait. + +### "Who should implement this?" +**Experienced developer** familiar with MIR builder and JoinIR integration. Phase 4 requires careful attention. + +--- + +## Next Steps + +1. **Review this README** - Understand the resources available +2. **Read the full plan** - [modularization-implementation-plan.md](./modularization-implementation-plan.md) +3. **Get approval** - Share with team leads +4. **Create a branch** - `refactor/modularize-control-flow` +5. **Start Phase 1** - Use [modularization-quick-start.md](./modularization-quick-start.md) + +--- + +## Document Status + +- **Created**: 2025-12-05 +- **Status**: Ready for review and implementation +- **Maintainer**: Claude Code (AI-assisted planning) +- **Next Review**: After Week 1 completion + +--- + +## Feedback + +If you have questions or suggestions about this modularization plan: + +1. **Open an issue** - Tag with `refactoring` label +2. **Update the plan** - Submit a PR with improvements +3. **Document lessons learned** - Add notes to this README + +**Contact**: Open a discussion in the team channel + +--- + +**Happy Modularizing!** 🚀 diff --git a/docs/development/refactoring/modularization-directory-structure.md b/docs/development/refactoring/modularization-directory-structure.md new file mode 100644 index 00000000..09f330b2 --- /dev/null +++ b/docs/development/refactoring/modularization-directory-structure.md @@ -0,0 +1,426 @@ +# Modularization Directory Structure Diagrams + +Visual reference for the proposed directory structures after modularization. + +--- + +## 1. control_flow.rs Modularization + +### Before (Current State) +``` +src/mir/builder/ +├── control_flow.rs (1,632 lines) ⚠️ MONOLITH +├── if_form.rs +├── loops.rs +└── ... (other files) +``` + +### After (Proposed Structure) +``` +src/mir/builder/ +├── control_flow/ +│ ├── mod.rs (~150 lines) ✅ Entry points +│ │ ├── pub(super) fn cf_block() +│ │ ├── pub(super) fn cf_if() +│ │ ├── pub(super) fn cf_loop() +│ │ ├── pub(super) fn cf_try_catch() +│ │ └── pub(super) fn cf_throw() +│ │ +│ ├── debug.rs (~50 lines) ✅ Debug utilities +│ │ └── fn trace_varmap() +│ │ +│ ├── utils.rs (~50 lines) ✅ Utility functions +│ │ └── fn extract_loop_variable_from_condition() +│ │ +│ ├── joinir/ +│ │ ├── mod.rs (~100 lines) ✅ JoinIR coordinator +│ │ │ +│ │ ├── routing.rs (~150 lines) ✅ Routing logic +│ │ │ ├── fn try_cf_loop_joinir() +│ │ │ └── fn cf_loop_joinir_impl() +│ │ │ +│ │ ├── merge/ +│ │ │ ├── mod.rs (~100 lines) ✅ Merge coordinator +│ │ │ │ └── pub fn merge_joinir_mir_blocks() +│ │ │ │ +│ │ │ ├── id_remapper.rs (~150 lines) ✅ ID remapping +│ │ │ │ ├── struct JoinIrIdRemapper +│ │ │ │ ├── fn create_remapper() +│ │ │ │ └── fn remap_ids() +│ │ │ │ +│ │ │ ├── block_allocator.rs (~100 lines) ✅ Block allocation +│ │ │ │ └── fn allocate_blocks() +│ │ │ │ +│ │ │ ├── value_collector.rs (~100 lines) ✅ Value collection +│ │ │ │ └── fn collect_values() +│ │ │ │ +│ │ │ ├── instruction_rewriter.rs (~150 lines) ✅ Instruction rewriting +│ │ │ │ ├── fn rewrite_instructions() +│ │ │ │ └── fn convert_call_to_jump() +│ │ │ │ +│ │ │ └── exit_phi_builder.rs (~100 lines) ✅ Exit PHI construction +│ │ │ └── fn build_exit_phi() +│ │ │ +│ │ └── patterns/ +│ │ ├── mod.rs (~50 lines) ✅ Pattern dispatcher +│ │ │ └── pub fn dispatch_pattern() +│ │ │ +│ │ ├── pattern1_minimal.rs (~150 lines) ✅ Pattern 1 lowering +│ │ │ └── pub fn cf_loop_pattern1_minimal() +│ │ │ +│ │ ├── pattern2_with_break.rs (~130 lines) ✅ Pattern 2 lowering +│ │ │ └── pub fn cf_loop_pattern2_with_break() +│ │ │ +│ │ └── pattern3_with_if_phi.rs (~180 lines) ✅ Pattern 3 lowering +│ │ └── pub fn cf_loop_pattern3_with_if_phi() +│ │ +│ └── exception/ +│ ├── mod.rs (~50 lines) ✅ Exception API +│ │ +│ ├── try_catch.rs (~150 lines) ✅ try/catch impl +│ │ └── pub fn cf_try_catch() +│ │ +│ └── throw.rs (~30 lines) ✅ throw impl +│ └── pub fn cf_throw() +│ +├── if_form.rs +├── loops.rs +└── ... (other files - unchanged) +``` + +### Metrics Comparison + +| Metric | Before | After | Change | +|--------|--------|-------|--------| +| **Total lines** | 1,632 | ~1,850 | +13% (for clarity) | +| **Number of files** | 1 | 19 | +1800% | +| **Largest file** | 1,632 | ~180 | -89% | +| **Average file size** | 1,632 | ~97 | -94% | +| **Functions per file** | 13 | 1-2 | Much clearer | + +### Benefits +- ✅ **714-line merge function → 6 focused modules** (100-150 lines each) +- ✅ **Pattern lowerers isolated** → Easy to add Pattern 4/5/6 +- ✅ **Debug traces easier to locate** → NYASH_OPTION_C_DEBUG output clearer +- ✅ **Merge conflicts reduced** → Changes isolated to specific files +- ✅ **Code navigation improved** → Jump to definition works better + +--- + +## 2. generic_case_a.rs Modularization + +### Before (Current State) +``` +src/mir/join_ir/lowering/ +├── generic_case_a.rs (1,056 lines) ⚠️ LARGE +├── generic_case_a_entry_builder.rs (4,828 bytes) +├── generic_case_a_whitespace_check.rs (4,552 bytes) +└── ... (other files) +``` + +### After (Proposed Structure) +``` +src/mir/join_ir/lowering/ +├── generic_case_a/ +│ ├── mod.rs (~100 lines) ✅ Public API +│ │ ├── pub use skip_ws::lower_case_a_skip_ws_with_scope +│ │ ├── pub use trim::lower_case_a_trim_with_scope +│ │ ├── pub use append_defs::lower_case_a_append_defs_with_scope +│ │ └── pub use stage1_using_resolver::... +│ │ +│ ├── skip_ws.rs (~220 lines) ✅ skip_ws lowerer +│ │ ├── pub fn lower_case_a_skip_ws_with_scope() +│ │ └── fn lower_case_a_skip_ws_core() +│ │ +│ ├── trim.rs (~500 lines) ✅ trim lowerer +│ │ ├── pub fn lower_case_a_trim_with_scope() +│ │ └── fn lower_case_a_trim_core() +│ │ +│ ├── append_defs.rs (~170 lines) ✅ append_defs lowerer +│ │ ├── pub fn lower_case_a_append_defs_with_scope() +│ │ └── fn lower_case_a_append_defs_core() +│ │ +│ ├── stage1_using_resolver.rs (~180 lines) ✅ stage1 lowerer +│ │ ├── pub fn lower_case_a_stage1_usingresolver_with_scope() +│ │ └── fn lower_case_a_stage1_usingresolver_core() +│ │ +│ ├── entry_builder.rs (~150 lines) ✅ (moved from parent) +│ │ └── struct EntryFunctionBuilder +│ │ +│ └── whitespace_check.rs (~150 lines) ✅ (moved from parent) +│ └── fn check_whitespace() +│ +└── ... (other files - unchanged) +``` + +### Metrics Comparison + +| Metric | Before | After | Change | +|--------|--------|-------|--------| +| **Total lines** | 1,056 | ~1,470 | +39% (for clarity) | +| **Number of files** | 3 (scattered) | 7 (organized) | +133% | +| **Largest file** | 1,056 | ~500 | -53% | +| **Average file size** | 352 | ~210 | -40% | +| **Functions per file** | 4 | 1-2 | Much clearer | + +### Benefits +- ✅ **Each lowerer in its own file** → Easy to maintain +- ✅ **Companion files integrated** → All Case A logic in one directory +- ✅ **trim.rs still large (500 lines)** → Could be further split if needed +- ✅ **Clear public API** → mod.rs shows what's exported + +--- + +## 3. loopform_builder.rs Modularization + +### Before (Current State - After Phase 191) +``` +src/mir/phi_core/ +├── loopform_builder.rs (1,166 lines) ⚠️ LARGE +├── loopform_context.rs (✅ already modularized) +├── loopform_variable_models.rs (✅ already modularized) +├── loopform_utils.rs (✅ already modularized) +├── loopform_exit_phi.rs (✅ already modularized) +└── ... (other files) +``` + +### After (Proposed Structure) +``` +src/mir/phi_core/ +├── loopform/ +│ ├── mod.rs (~100 lines) ✅ Public API +│ │ ├── pub use context::LoopFormContext +│ │ ├── pub use variable_models::{CarrierVariable, PinnedVariable} +│ │ ├── pub use exit_phi::build_exit_phis_for_control +│ │ └── pub use builder_core::LoopFormBuilder +│ │ +│ ├── context.rs (~150 lines) ✅ (existing) +│ │ └── pub struct LoopFormContext +│ │ +│ ├── variable_models.rs (~150 lines) ✅ (existing) +│ │ ├── pub struct CarrierVariable +│ │ ├── pub struct PinnedVariable +│ │ └── pub struct LoopBypassFlags +│ │ +│ ├── utils.rs (~100 lines) ✅ (existing) +│ │ ├── pub fn is_loopform_debug_enabled() +│ │ └── pub fn get_loop_bypass_flags() +│ │ +│ ├── exit_phi.rs (~150 lines) ✅ (existing) +│ │ └── pub fn build_exit_phis_for_control() +│ │ +│ ├── passes/ +│ │ ├── mod.rs (~50 lines) ✅ 4-pass coordinator +│ │ │ └── pub fn run_4_pass_architecture() +│ │ │ +│ │ ├── pass1_discovery.rs (~150 lines) ✅ Variable discovery +│ │ │ └── pub fn discover_variables() +│ │ │ +│ │ ├── pass2_header_phi.rs (~150 lines) ✅ Header PHI +│ │ │ └── pub fn build_header_phi() +│ │ │ +│ │ ├── pass3_latch.rs (~100 lines) ✅ Latch processing +│ │ │ └── pub fn process_latch() +│ │ │ +│ │ └── pass4_exit_phi.rs (~150 lines) ✅ Exit PHI +│ │ └── pub fn build_exit_phi() +│ │ +│ └── builder_core.rs (~200 lines) ✅ Core builder +│ └── pub struct LoopFormBuilder +│ +└── ... (other files - unchanged) +``` + +### Metrics Comparison + +| Metric | Before | After | Change | +|--------|--------|-------|--------| +| **Total lines** | 1,166 (main file) | ~1,450 | +24% (for clarity) | +| **Number of files** | 5 (partially modularized) | 11 (fully modularized) | +120% | +| **Largest file** | 1,166 | ~200 | -83% | +| **Average file size** | 233 | ~132 | -43% | + +### Benefits +- ✅ **Completes Phase 191 modularization** → Finishes what was started +- ✅ **4-pass architecture explicit** → Each pass in its own file +- ✅ **Already partially modularized** → Lower risk than control_flow.rs +- ✅ **Clear separation of concerns** → Context, models, passes, builder + +--- + +## File Size Distribution Comparison + +### control_flow.rs +``` +Before: +█████████████████████████████████████████████████████████████████ 1,632 lines + +After: +merge/instruction_rewriter.rs: ███████ 150 lines +pattern1_minimal.rs: ███████ 150 lines +routing.rs: ███████ 150 lines +pattern3_with_if_phi.rs: ████████ 180 lines +try_catch.rs: ███████ 150 lines +... (14 more files < 150 lines each) +``` + +### generic_case_a.rs +``` +Before: +███████████████████████████████████████████████████ 1,056 lines + +After: +trim.rs: ████████████████████████ 500 lines +skip_ws.rs: ██████████ 220 lines +stage1_using_resolver.rs: ████████ 180 lines +append_defs.rs: ████████ 170 lines +... (3 more files < 150 lines each) +``` + +### loopform_builder.rs +``` +Before: +████████████████████████████████████████████████████████ 1,166 lines + +After: +builder_core.rs: █████████ 200 lines +context.rs: ███████ 150 lines +variable_models.rs: ███████ 150 lines +exit_phi.rs: ███████ 150 lines +pass1_discovery.rs: ███████ 150 lines +... (6 more files < 150 lines each) +``` + +--- + +## Import Path Changes + +### control_flow.rs + +#### Before +```rust +use crate::mir::builder::MirBuilder; + +impl MirBuilder { + pub(super) fn cf_loop(...) -> Result { + // 1,632 lines of code + } +} +``` + +#### After +```rust +// src/mir/builder/control_flow/mod.rs +use crate::mir::builder::MirBuilder; + +impl MirBuilder { + pub(super) fn cf_loop(...) -> Result { + // Delegates to routing module + joinir::routing::try_cf_loop_joinir(self, ...) + } +} + +// src/mir/builder/control_flow/joinir/routing.rs +pub(in crate::mir::builder::control_flow) fn try_cf_loop_joinir( + builder: &mut MirBuilder, + ... +) -> Result, String> { + // 150 lines of focused code +} +``` + +### generic_case_a.rs + +#### Before +```rust +// src/mir/join_ir/lowering/generic_case_a.rs +pub(crate) fn lower_case_a_skip_ws_with_scope(...) -> Option { + // 200+ lines +} +``` + +#### After +```rust +// src/mir/join_ir/lowering/generic_case_a/mod.rs +pub(crate) use skip_ws::lower_case_a_skip_ws_with_scope; + +// src/mir/join_ir/lowering/generic_case_a/skip_ws.rs +pub(crate) fn lower_case_a_skip_ws_with_scope(...) -> Option { + // 220 lines +} +``` + +### loopform_builder.rs + +#### Before +```rust +// src/mir/phi_core/loopform_builder.rs +pub use loopform_context::LoopFormContext; + +pub fn build_exit_phis_for_control(...) { + // 1,166 lines +} +``` + +#### After +```rust +// src/mir/phi_core/loopform/mod.rs +pub use context::LoopFormContext; +pub use exit_phi::build_exit_phis_for_control; + +// src/mir/phi_core/loopform/exit_phi.rs +pub fn build_exit_phis_for_control(...) { + // 150 lines +} +``` + +--- + +## Navigation Improvements + +### Before (Monolith Files) +``` +Developer: "Where is the merge function?" +Answer: "control_flow.rs, line 864-1578 (search through 1,632 lines)" + +Developer: "Where is Pattern 3 lowering?" +Answer: "control_flow.rs, line 696-863 (search through 1,632 lines)" + +Developer: "What does merge_joinir_mir_blocks do?" +Answer: "Read 714 lines to understand" +``` + +### After (Modularized) +``` +Developer: "Where is the merge function?" +Answer: "control_flow/joinir/merge/mod.rs (100 lines coordinator)" + +Developer: "Where is Pattern 3 lowering?" +Answer: "control_flow/joinir/patterns/pattern3_with_if_phi.rs (180 lines)" + +Developer: "What does merge_joinir_mir_blocks do?" +Answer: "Read merge/mod.rs (100 lines) → delegates to 6 sub-modules" +``` + +--- + +## Conclusion + +The modularization dramatically improves code organization: + +- **control_flow.rs**: 1,632 lines → 19 files (avg 97 lines) +- **generic_case_a.rs**: 1,056 lines → 7 files (avg 210 lines) +- **loopform_builder.rs**: 1,166 lines → 11 files (avg 132 lines) + +**Total Impact**: 3,854 lines → 37 focused modules + +**Developer Experience**: +- ✅ Easier navigation (jump to definition works better) +- ✅ Clearer separation of concerns +- ✅ Less merge conflicts +- ✅ Easier to add new patterns/lowerers +- ✅ Better code review experience (smaller diffs) + +--- + +**Last Updated**: 2025-12-05 diff --git a/docs/development/refactoring/modularization-implementation-plan.md b/docs/development/refactoring/modularization-implementation-plan.md new file mode 100644 index 00000000..c315ef89 --- /dev/null +++ b/docs/development/refactoring/modularization-implementation-plan.md @@ -0,0 +1,960 @@ +# Modularization Implementation Plan for Large Source Files + +## Executive Summary + +This plan details the modularization of three oversized source files in the Nyash MIR builder: +- **control_flow.rs** (1,632 lines) - **HIGHEST PRIORITY** +- **generic_case_a.rs** (1,056 lines) - **MEDIUM PRIORITY** +- **loopform_builder.rs** (1,166 lines) - **LOWER PRIORITY** + +The strategy prioritizes **control_flow.rs** first due to its critical role in JoinIR integration and ongoing Pattern 4+ development. Each modularization is broken into incremental phases that maintain backward compatibility, with clear verification points and rollback procedures. + +**Total Estimated Effort**: 15-20 hours across 2-3 weeks +**Key Principle**: Zero breaking changes, backward compatible at every phase + +--- + +## Why control_flow.rs First? + +**control_flow.rs** is the integration point for JoinIR lowering patterns, actively being extended for Pattern 4+. Modularizing now: + +1. **Prevents future pain** - Pattern 4/5/6 would add another 500+ lines to an already massive file +2. **Sets the pattern** - Establishes the modularization template for other files +3. **Reduces merge conflicts** - Isolates pattern-specific changes to dedicated files +4. **Improves debuggability** - `NYASH_OPTION_C_DEBUG` traces become easier to locate +5. **Currently blocking** - The 714-line `merge_joinir_mir_blocks()` function is a code smell that makes maintenance difficult + +--- + +# 1. control_flow.rs Modularization (HIGHEST PRIORITY) + +## Current State + +**File**: `src/mir/builder/control_flow.rs` (1,632 lines) + +**Functions** (13 total): +- `trace_varmap()` (8 lines) - Debug utility +- `cf_block()` (4 lines) - Block entry point +- `cf_if()` (10 lines) - If entry point +- `cf_loop()` (80 lines) - Loop entry point +- `try_cf_loop_joinir()` (91 lines) - JoinIR routing logic +- `cf_loop_joinir_impl()` (247 lines) - JoinIR pattern dispatcher +- `cf_loop_pattern1_minimal()` (143 lines) - Pattern 1 lowering +- `cf_loop_pattern2_with_break()` (120 lines) - Pattern 2 lowering +- `cf_loop_pattern3_with_if_phi()` (168 lines) - Pattern 3 lowering +- **`merge_joinir_mir_blocks()` (714 lines) - LARGEST FUNCTION** ⚠️ +- `cf_try_catch()` (138 lines) - Exception handling +- `extract_loop_variable_from_condition()` (31 lines) - Utility +- `cf_throw()` (23 lines) - Throw entry point + +**Key Issues**: +- `merge_joinir_mir_blocks()` is 714 lines (44% of the file!) +- Pattern lowerers (1/2/3) are isolated but scattered +- JoinIR integration logic mixed with entry points +- No clear separation between routing and implementation + +--- + +## Proposed New Structure + +``` +src/mir/builder/control_flow/ +├── mod.rs (~150 lines) - Public API, entry points +├── debug.rs (~50 lines) - Debug utilities (trace_varmap) +├── joinir/ +│ ├── mod.rs (~100 lines) - JoinIR integration coordinator +│ ├── routing.rs (~150 lines) - try_cf_loop_joinir, dispatcher +│ ├── merge/ +│ │ ├── mod.rs (~100 lines) - merge_joinir_mir_blocks entry point +│ │ ├── id_remapper.rs (~150 lines) - ValueId/BlockId remapping +│ │ ├── block_allocator.rs (~100 lines) - Block ID allocation +│ │ ├── value_collector.rs (~100 lines) - Value collection phase +│ │ ├── instruction_rewriter.rs (~150 lines) - Instruction rewriting +│ │ └── exit_phi_builder.rs (~100 lines) - Exit PHI construction +│ └── patterns/ +│ ├── mod.rs (~50 lines) - Pattern dispatcher +│ ├── pattern1_minimal.rs (~150 lines) - Pattern 1 lowering +│ ├── pattern2_with_break.rs (~130 lines) - Pattern 2 lowering +│ └── pattern3_with_if_phi.rs (~180 lines) - Pattern 3 lowering +├── exception/ +│ ├── mod.rs (~50 lines) - Exception handling API +│ ├── try_catch.rs (~150 lines) - try/catch implementation +│ └── throw.rs (~30 lines) - throw implementation +└── utils.rs (~50 lines) - extract_loop_variable, etc. + +Total: ~1,850 lines (13% increase for clarity, but distributed across 19 files) +Average file size: ~97 lines (vs 1,632 lines monolith) +``` + +--- + +## Phase-by-Phase Migration Plan + +### Phase 1: Extract Debug Utilities (30 min) + +**Goal**: Move `trace_varmap()` to a dedicated debug module. + +**Steps**: +1. Create `src/mir/builder/control_flow/debug.rs` +2. Move `trace_varmap()` implementation +3. Update `mod.rs` to re-export `pub(super) use debug::*;` +4. Run verification + +**Files Created**: +- `src/mir/builder/control_flow/debug.rs` (~50 lines) + +**Files Modified**: +- `src/mir/builder/control_flow.rs` → `.../control_flow/mod.rs` + +**Verification**: +```bash +cargo build --release +cargo test --lib +tools/smokes/v2/run.sh --profile quick --filter "loop_*" +``` + +**Rollback**: Delete `debug.rs`, revert imports in `mod.rs` + +**Estimated Effort**: 30 minutes + +--- + +### Phase 2: Extract Pattern Lowerers (2 hours) + +**Goal**: Move Pattern 1/2/3 lowering functions to dedicated files. + +**Steps**: +1. Create `src/mir/builder/control_flow/joinir/patterns/` directory +2. Create `patterns/mod.rs` with dispatcher +3. Move `cf_loop_pattern1_minimal()` to `pattern1_minimal.rs` +4. Move `cf_loop_pattern2_with_break()` to `pattern2_with_break.rs` +5. Move `cf_loop_pattern3_with_if_phi()` to `pattern3_with_if_phi.rs` +6. Update imports in `mod.rs` +7. Run verification + +**Files Created**: +- `control_flow/joinir/patterns/mod.rs` (~50 lines) +- `control_flow/joinir/patterns/pattern1_minimal.rs` (~150 lines) +- `control_flow/joinir/patterns/pattern2_with_break.rs` (~130 lines) +- `control_flow/joinir/patterns/pattern3_with_if_phi.rs` (~180 lines) + +**Public API Changes**: None (all functions are already `fn`, not `pub(super) fn`) + +**Verification**: +```bash +cargo build --release +cargo test --lib -- --include-ignored +NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako +tools/smokes/v2/run.sh --profile quick --filter "loop_*" +``` + +**Rollback**: Delete `patterns/` directory, revert imports + +**Estimated Effort**: 2 hours + +--- + +### Phase 3: Extract JoinIR Routing Logic (1.5 hours) + +**Goal**: Move `try_cf_loop_joinir()` and `cf_loop_joinir_impl()` to routing module. + +**Steps**: +1. Create `control_flow/joinir/routing.rs` +2. Move `try_cf_loop_joinir()` implementation +3. Move `cf_loop_joinir_impl()` implementation +4. Update imports in `mod.rs` +5. Run verification + +**Files Created**: +- `control_flow/joinir/routing.rs` (~150 lines) + +**Files Modified**: +- `control_flow/mod.rs` (update imports) + +**Verification**: +```bash +cargo build --release +cargo test --release +HAKO_JOINIR_PRINT_TOKENS_MAIN=1 ./target/release/nyash test_program.hako +``` + +**Rollback**: Delete `routing.rs`, revert imports + +**Estimated Effort**: 1.5 hours + +--- + +### Phase 4: Break Down merge_joinir_mir_blocks (6 hours) ⚠️ **CRITICAL** + +**Goal**: Split the 714-line monster function into 6 logical modules. + +**Background**: `merge_joinir_mir_blocks()` performs 6 distinct phases: +1. **Block ID allocation** (lines 864-923) +2. **Value collection** (lines 931-971) +3. **Block merging** (lines 973-1100) +4. **Instruction rewriting** (lines 1102-1400) +5. **Exit PHI construction** (lines 1402-1500) +6. **Boundary reconnection** (lines 1502-1578) + +**Steps**: +1. Create `control_flow/joinir/merge/` directory structure +2. Extract `id_remapper.rs` - ID remapping utilities +3. Extract `block_allocator.rs` - Block ID allocation logic +4. Extract `value_collector.rs` - Value collection phase +5. Extract `instruction_rewriter.rs` - Instruction transformation +6. Extract `exit_phi_builder.rs` - Exit PHI construction +7. Create `merge/mod.rs` as the coordinator +8. Update imports +9. Run comprehensive verification + +**Files Created**: +- `control_flow/joinir/merge/mod.rs` (~100 lines) - Coordinator +- `control_flow/joinir/merge/id_remapper.rs` (~150 lines) +- `control_flow/joinir/merge/block_allocator.rs` (~100 lines) +- `control_flow/joinir/merge/value_collector.rs` (~100 lines) +- `control_flow/joinir/merge/instruction_rewriter.rs` (~150 lines) +- `control_flow/joinir/merge/exit_phi_builder.rs` (~100 lines) + +**Public API**: +```rust +// control_flow/joinir/merge/mod.rs +pub(in crate::mir::builder) fn merge_joinir_mir_blocks( + builder: &mut MirBuilder, + mir_module: &MirModule, + boundary: Option<&JoinInlineBoundary>, + debug: bool, +) -> Result, String> { + // Coordinator that calls the 6 sub-modules + let remapper = id_remapper::create_remapper(builder, mir_module, debug)?; + let values = value_collector::collect_values(mir_module, &remapper, debug)?; + // ... etc +} +``` + +**Verification** (CRITICAL - run ALL tests): +```bash +# Step 1: Build verification +cargo build --release +cargo test --lib + +# Step 2: Smoke tests (ALL patterns) +tools/smokes/v2/run.sh --profile quick + +# Step 3: Debug trace verification +NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir" + +# Step 4: Regression check (run 3 times for determinism) +for i in 1 2 3; do + echo "=== Run $i ===" + cargo test --release test_loop_patterns 2>&1 | grep "test result" +done + +# Step 5: Full integration test +cargo test --release --all-features +``` + +**Rollback**: +```bash +# If anything breaks, immediately: +rm -rf src/mir/builder/control_flow/joinir/merge +git checkout src/mir/builder/control_flow.rs +cargo build --release && cargo test +``` + +**Risk Mitigation**: +- Keep the original `merge_joinir_mir_blocks()` as a comment at the top of `merge/mod.rs` +- Add `#[cfg(test)]` unit tests for each sub-module +- Use feature flag `NYASH_USE_LEGACY_MERGE=1` for emergency fallback (optional) + +**Estimated Effort**: 6 hours (most complex phase) + +--- + +### Phase 5: Extract Exception Handling (1 hour) + +**Goal**: Move `cf_try_catch()` and `cf_throw()` to exception module. + +**Steps**: +1. Create `control_flow/exception/` directory +2. Move `cf_try_catch()` to `try_catch.rs` +3. Move `cf_throw()` to `throw.rs` +4. Create `exception/mod.rs` as coordinator +5. Update imports +6. Run verification + +**Files Created**: +- `control_flow/exception/mod.rs` (~50 lines) +- `control_flow/exception/try_catch.rs` (~150 lines) +- `control_flow/exception/throw.rs` (~30 lines) + +**Verification**: +```bash +cargo build --release +cargo test --lib -- exception +``` + +**Rollback**: Delete `exception/` directory, revert imports + +**Estimated Effort**: 1 hour + +--- + +### Phase 6: Extract Utilities (30 min) + +**Goal**: Move `extract_loop_variable_from_condition()` to utils module. + +**Steps**: +1. Create `control_flow/utils.rs` +2. Move utility functions +3. Update imports +4. Run verification + +**Files Created**: +- `control_flow/utils.rs` (~50 lines) + +**Verification**: +```bash +cargo build --release +cargo test --lib +``` + +**Rollback**: Delete `utils.rs`, revert imports + +**Estimated Effort**: 30 minutes + +--- + +### Phase 7: Final Cleanup & Documentation (1 hour) + +**Goal**: Clean up `mod.rs`, add module documentation, verify all imports. + +**Steps**: +1. Review `control_flow/mod.rs` for clarity +2. Add module-level documentation to each file +3. Ensure all `pub(super)` visibility is correct +4. Run final comprehensive verification +5. Update CLAUDE.md with new structure + +**Documentation Template**: +```rust +//! Pattern 1 Minimal Loop Lowering +//! +//! This module implements the simplest JoinIR loop lowering pattern: +//! - Single loop variable +//! - No break statements +//! - Simple condition (i < N) +//! +//! Used by: minimal_ssa_skip_ws, simple while loops +//! Phase: 188 (Pattern 1 implementation) +``` + +**Verification**: +```bash +cargo build --release --all-features +cargo test --release +cargo clippy --all-targets +tools/smokes/v2/run.sh --profile integration +``` + +**Estimated Effort**: 1 hour + +--- + +## Public API Changes + +### Before (control_flow.rs) +```rust +impl MirBuilder { + pub(super) fn cf_block(...) -> Result + pub(super) fn cf_if(...) -> Result + pub(super) fn cf_loop(...) -> Result + pub(super) fn cf_try_catch(...) -> Result + pub(super) fn cf_throw(...) -> Result +} +``` + +### After (control_flow/mod.rs) +```rust +// Re-export all entry points (NO CHANGE to public API) +pub(super) use entry_points::{cf_block, cf_if, cf_loop, cf_try_catch, cf_throw}; + +// Internal modules (not exposed outside control_flow) +mod debug; +mod utils; +mod joinir; +mod exception; +mod entry_points; +``` + +**Guarantee**: Zero breaking changes - all `pub(super)` functions remain accessible from `MirBuilder`. + +--- + +## Build Verification Strategy + +### After Each Phase: + +```bash +# 1. Compilation check +cargo build --release +echo "Build: $?" >> /tmp/modularization_log.txt + +# 2. Unit tests +cargo test --lib +echo "Unit tests: $?" >> /tmp/modularization_log.txt + +# 3. Integration tests +cargo test --release +echo "Integration tests: $?" >> /tmp/modularization_log.txt + +# 4. Smoke tests (only for Phases 2-4) +tools/smokes/v2/run.sh --profile quick --filter "loop_*" +echo "Smoke tests: $?" >> /tmp/modularization_log.txt + +# 5. Debug trace verification (for Phase 4) +NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | tee /tmp/debug_trace.txt +grep -q "merge_joinir" /tmp/debug_trace.txt && echo "Debug trace: OK" +``` + +### Failure Handling: +```bash +# If any step fails, STOP immediately +if [ $? -ne 0 ]; then + echo "FAILURE detected at Phase $PHASE" + git status + echo "Run rollback procedure? (y/n)" + # ... rollback steps +fi +``` + +--- + +## Risk Assessment + +| Risk | Likelihood | Impact | Mitigation | Detection | +|------|------------|--------|------------|-----------| +| **Breaking imports** | Medium | High | Incremental phases, test after each | `cargo build` fails | +| **Merge function breakage** | Low | Critical | Keep original as comment, feature flag | Smoke tests fail | +| **Performance regression** | Very Low | Medium | No algorithmic changes | Benchmark before/after | +| **Debug trace changes** | Low | Low | Verify `NYASH_OPTION_C_DEBUG` output | Manual inspection | +| **HashMap non-determinism** | Very Low | Low | Already using BTreeMap in critical paths | Run tests 3x | + +### Critical Mitigations: +1. **Phase 4 (merge function)**: Keep original function as comment +2. **All phases**: Commit after each successful phase +3. **Rollback plan**: Document exact `git checkout` commands for each file + +--- + +## Implementation Effort Breakdown + +| Phase | Description | Effort | Risk | +|-------|-------------|--------|------| +| Phase 1 | Debug utilities | 30 min | Low | +| Phase 2 | Pattern lowerers | 2 hours | Low | +| Phase 3 | JoinIR routing | 1.5 hours | Low | +| **Phase 4** | **merge_joinir_mir_blocks** | **6 hours** | **Medium** | +| Phase 5 | Exception handling | 1 hour | Low | +| Phase 6 | Utilities | 30 min | Low | +| Phase 7 | Cleanup & docs | 1 hour | Low | +| **Total** | **control_flow.rs** | **12.5 hours** | - | + +**Recommended Schedule**: +- **Week 1**: Phases 1-3 (4 hours total) - Low risk warmup +- **Week 2**: Phase 4 (6 hours) - Dedicated time for merge function +- **Week 3**: Phases 5-7 (2.5 hours) - Final cleanup + +--- + +## Success Criteria + +- ✅ All 267 tests pass (no regressions) +- ✅ Build time ≤ current (no increase) +- ✅ `control_flow/mod.rs` is < 200 lines (88% reduction) +- ✅ Largest single file is < 200 lines (vs 714 lines before) +- ✅ Debug traces still work (`NYASH_OPTION_C_DEBUG=1`) +- ✅ Smoke tests pass for all patterns (1/2/3) +- ✅ No HashMap non-determinism introduced +- ✅ Code is easier to navigate (measured by developer feedback) + +--- + +# 2. generic_case_a.rs Modularization (MEDIUM PRIORITY) + +## Current State + +**File**: `src/mir/join_ir/lowering/generic_case_a.rs` (1,056 lines) + +**Functions** (4 public lowerers): +- `lower_case_a_skip_ws_with_scope()` + `lower_case_a_skip_ws_core()` (~203 lines) +- `lower_case_a_trim_with_scope()` + `lower_case_a_trim_core()` (~479 lines) +- `lower_case_a_append_defs_with_scope()` + `lower_case_a_append_defs_core()` (~156 lines) +- `lower_case_a_stage1_usingresolver_with_scope()` + `lower_case_a_stage1_usingresolver_core()` (~167 lines) + +**Key Observation**: This file already has companion files: +- `generic_case_a_entry_builder.rs` (4,828 bytes) +- `generic_case_a_whitespace_check.rs` (4,552 bytes) + +**Strategy**: Complete the modularization pattern by splitting the 4 lowerers into separate files. + +--- + +## Proposed New Structure + +``` +src/mir/join_ir/lowering/generic_case_a/ +├── mod.rs (~100 lines) - Public API +├── skip_ws.rs (~220 lines) - skip_ws lowerer +├── trim.rs (~500 lines) - trim lowerer +├── append_defs.rs (~170 lines) - append_defs lowerer +├── stage1_using_resolver.rs (~180 lines) - stage1 using resolver +├── entry_builder.rs (~150 lines) - (moved from parent) +└── whitespace_check.rs (~150 lines) - (moved from parent) + +Total: ~1,470 lines (39% increase for clarity, distributed across 7 files) +Average: ~210 lines per file +``` + +--- + +## Phase-by-Phase Migration Plan + +### Phase 1: Create Directory Structure (15 min) + +**Steps**: +1. Create `src/mir/join_ir/lowering/generic_case_a/` directory +2. Create `mod.rs` with public API exports +3. Move existing companion files into directory +4. Update parent `mod.rs` imports +5. Run verification + +**Verification**: +```bash +cargo build --release +``` + +**Rollback**: Delete directory, revert parent `mod.rs` + +**Estimated Effort**: 15 minutes + +--- + +### Phase 2: Extract skip_ws Lowerer (45 min) + +**Steps**: +1. Create `generic_case_a/skip_ws.rs` +2. Move `lower_case_a_skip_ws_with_scope()` and `_core()` +3. Add module documentation +4. Update `mod.rs` imports +5. Run verification + +**Verification**: +```bash +cargo build --release +cargo test --lib -- skip_ws +``` + +**Estimated Effort**: 45 minutes + +--- + +### Phase 3: Extract trim Lowerer (1 hour) + +**Steps**: +1. Create `generic_case_a/trim.rs` +2. Move `lower_case_a_trim_with_scope()` and `_core()` +3. Add documentation +4. Update imports +5. Run verification + +**Verification**: +```bash +cargo build --release +cargo test --lib -- trim +``` + +**Estimated Effort**: 1 hour + +--- + +### Phase 4: Extract append_defs & stage1 Lowerers (1 hour) + +**Steps**: +1. Create `generic_case_a/append_defs.rs` +2. Create `generic_case_a/stage1_using_resolver.rs` +3. Move respective functions +4. Add documentation +5. Update imports +6. Run verification + +**Verification**: +```bash +cargo build --release +cargo test --release +tools/smokes/v2/run.sh --profile quick --filter "funcscanner_*" +``` + +**Estimated Effort**: 1 hour + +--- + +### Phase 5: Final Cleanup (30 min) + +**Steps**: +1. Add module-level documentation +2. Verify all imports are clean +3. Run comprehensive tests +4. Update documentation + +**Verification**: +```bash +cargo build --release --all-features +cargo test --release +``` + +**Estimated Effort**: 30 minutes + +--- + +## Public API Changes + +### Before +```rust +// src/mir/join_ir/lowering/generic_case_a.rs +pub(crate) fn lower_case_a_skip_ws_with_scope(...) -> Option +pub(crate) fn lower_case_a_trim_with_scope(...) -> Option +// ... etc +``` + +### After +```rust +// src/mir/join_ir/lowering/generic_case_a/mod.rs +pub(crate) use skip_ws::lower_case_a_skip_ws_with_scope; +pub(crate) use trim::lower_case_a_trim_with_scope; +pub(crate) use append_defs::lower_case_a_append_defs_with_scope; +pub(crate) use stage1_using_resolver::lower_case_a_stage1_usingresolver_with_scope; +``` + +**Guarantee**: Zero breaking changes - all `pub(crate)` functions remain accessible. + +--- + +## Implementation Effort Breakdown + +| Phase | Description | Effort | Risk | +|-------|-------------|--------|------| +| Phase 1 | Directory setup | 15 min | Low | +| Phase 2 | skip_ws lowerer | 45 min | Low | +| Phase 3 | trim lowerer | 1 hour | Low | +| Phase 4 | append_defs & stage1 | 1 hour | Low | +| Phase 5 | Cleanup | 30 min | Low | +| **Total** | **generic_case_a.rs** | **3.5 hours** | - | + +--- + +## Success Criteria + +- ✅ All tests pass +- ✅ `generic_case_a/mod.rs` is < 150 lines +- ✅ Each lowerer is in a dedicated file +- ✅ Companion files integrated into directory +- ✅ Documentation added to all modules + +--- + +# 3. loopform_builder.rs Modularization (LOWER PRIORITY) + +## Current State + +**File**: `src/mir/phi_core/loopform_builder.rs` (1,166 lines) + +**Status**: Already partially modularized in Phase 191! + +**Existing Structure** (Phase 191): +``` +src/mir/phi_core/ +├── loopform_builder.rs (1,166 lines) - Main coordinator +├── loopform_context.rs - ValueId management +├── loopform_variable_models.rs - CarrierVariable, PinnedVariable +├── loopform_utils.rs - Debug and bypass utilities +├── loopform_exit_phi.rs - Exit PHI builder +``` + +**Remaining Work**: The main `loopform_builder.rs` still contains implementation logic that should be moved to dedicated modules. + +--- + +## Proposed New Structure + +``` +src/mir/phi_core/loopform/ +├── mod.rs (~100 lines) - Public API +├── context.rs (~150 lines) - (existing loopform_context.rs) +├── variable_models.rs (~150 lines) - (existing loopform_variable_models.rs) +├── utils.rs (~100 lines) - (existing loopform_utils.rs) +├── exit_phi.rs (~150 lines) - (existing loopform_exit_phi.rs) +├── passes/ +│ ├── mod.rs (~50 lines) - 4-pass architecture coordinator +│ ├── pass1_discovery.rs (~150 lines) - Variable discovery +│ ├── pass2_header_phi.rs (~150 lines) - Header PHI construction +│ ├── pass3_latch.rs (~100 lines) - Latch block processing +│ └── pass4_exit_phi.rs (~150 lines) - Exit PHI construction +└── builder_core.rs (~200 lines) - Core builder logic + +Total: ~1,450 lines (24% increase for clarity, distributed across 11 files) +Average: ~132 lines per file +``` + +--- + +## Phase-by-Phase Migration Plan + +### Phase 1: Directory Structure (30 min) + +**Steps**: +1. Create `src/mir/phi_core/loopform/` directory +2. Move existing modular files into directory +3. Create `mod.rs` with re-exports +4. Update parent `mod.rs` imports +5. Run verification + +**Verification**: +```bash +cargo build --release +cargo test --lib -- loopform +``` + +**Estimated Effort**: 30 minutes + +--- + +### Phase 2: Extract 4-Pass Architecture (2 hours) + +**Steps**: +1. Create `loopform/passes/` directory +2. Identify the 4 passes in `loopform_builder.rs` +3. Extract each pass to dedicated file +4. Create `passes/mod.rs` as coordinator +5. Update imports +6. Run verification + +**Verification**: +```bash +cargo build --release +cargo test --release -- loopform +NYASH_LOOPFORM_DEBUG=1 ./target/release/nyash test_loop.hako +``` + +**Estimated Effort**: 2 hours + +--- + +### Phase 3: Extract Core Builder Logic (1 hour) + +**Steps**: +1. Create `loopform/builder_core.rs` +2. Move remaining builder logic +3. Update imports +4. Run verification + +**Verification**: +```bash +cargo build --release +cargo test --release +``` + +**Estimated Effort**: 1 hour + +--- + +### Phase 4: Final Cleanup (30 min) + +**Steps**: +1. Add module documentation +2. Verify all re-exports +3. Run comprehensive tests + +**Verification**: +```bash +cargo build --release --all-features +cargo test --release +tools/smokes/v2/run.sh --profile quick --filter "phi_*" +``` + +**Estimated Effort**: 30 minutes + +--- + +## Public API Changes + +### Before +```rust +// src/mir/phi_core/loopform_builder.rs +pub use loopform_context::LoopFormContext; +pub use loopform_variable_models::{CarrierVariable, PinnedVariable}; +pub fn build_exit_phis_for_control(...) +``` + +### After +```rust +// src/mir/phi_core/loopform/mod.rs +pub use context::LoopFormContext; +pub use variable_models::{CarrierVariable, PinnedVariable}; +pub use exit_phi::build_exit_phis_for_control; +``` + +**Guarantee**: Zero breaking changes. + +--- + +## Implementation Effort Breakdown + +| Phase | Description | Effort | Risk | +|-------|-------------|--------|------| +| Phase 1 | Directory setup | 30 min | Low | +| Phase 2 | 4-pass extraction | 2 hours | Medium | +| Phase 3 | Core builder | 1 hour | Low | +| Phase 4 | Cleanup | 30 min | Low | +| **Total** | **loopform_builder.rs** | **4 hours** | - | + +--- + +## Success Criteria + +- ✅ All tests pass +- ✅ `loopform/mod.rs` is < 150 lines +- ✅ Each pass is in a dedicated file +- ✅ Existing modular files integrated +- ✅ Documentation added + +--- + +# Implementation Order & Timeline + +## Recommended Schedule (3 weeks) + +### Week 1: control_flow.rs Phases 1-3 (Low Risk) +- **Monday**: Phase 1 (Debug utilities) - 30 min +- **Tuesday**: Phase 2 (Pattern lowerers) - 2 hours +- **Wednesday**: Phase 3 (JoinIR routing) - 1.5 hours +- **Verification**: Run full smoke tests at end of week + +### Week 2: control_flow.rs Phase 4 (High Risk) +- **Monday-Tuesday**: Phase 4 (merge_joinir_mir_blocks) - 6 hours +- **Wednesday**: Buffer day for fixing any issues +- **Thursday-Friday**: Phases 5-7 (Exception, utilities, cleanup) - 2.5 hours + +### Week 3: generic_case_a.rs (Optional) +- **Monday-Tuesday**: generic_case_a.rs Phases 1-5 - 3.5 hours +- **Wednesday**: Buffer +- **Thursday-Friday**: Documentation & final verification + +### Future (After Pattern 4+): loopform_builder.rs +- **Timing**: After Pattern 4/5/6 development stabilizes +- **Effort**: 4 hours +- **Priority**: Lower (already partially modularized) + +--- + +## Effort Summary + +| File | Total Effort | Priority | Complexity | Blocking? | +|------|--------------|----------|------------|-----------| +| **control_flow.rs** | **12.5 hours** | **HIGHEST** | **High** | **Yes** (Pattern 4+) | +| **generic_case_a.rs** | **3.5 hours** | **MEDIUM** | **Low** | No | +| **loopform_builder.rs** | **4 hours** | **LOWER** | **Medium** | No | +| **TOTAL** | **20 hours** | - | - | - | + +--- + +## Risk/Mitigation Matrix + +| Risk | Likelihood | Impact | Mitigation | Detection Method | +|------|------------|--------|------------|------------------| +| **Breaking imports** | Medium | High | Incremental phases, test after each | `cargo build` fails | +| **Merge function breakage** | Low | Critical | Keep original as comment, feature flag | Smoke tests fail | +| **Pattern lowerer breakage** | Low | High | Test each pattern independently | Integration tests | +| **Performance regression** | Very Low | Medium | No algorithmic changes | Benchmark suite | +| **Debug trace changes** | Low | Low | Verify `NYASH_OPTION_C_DEBUG` output | Manual inspection | +| **Test failures** | Low | Medium | Run tests after every phase | CI/CD pipeline | +| **Merge conflicts** | Medium | Low | Work on dedicated branch | Git status | + +--- + +## Success Criteria (Global) + +### Quantitative: +- ✅ All 267+ tests pass (no regressions) +- ✅ Build time ≤ current (no increase) +- ✅ Largest single file in modularized areas is < 250 lines +- ✅ Average file size in modularized areas is < 150 lines + +### Qualitative: +- ✅ Code is easier to navigate (developer feedback) +- ✅ New patterns can be added without modifying 1,600-line files +- ✅ Debug traces remain functional +- ✅ Documentation is clear and helpful + +### Process: +- ✅ Zero breaking changes at any phase +- ✅ Each phase can be rolled back independently +- ✅ Commits are small and focused +- ✅ CI/CD passes after every commit + +--- + +## Appendix: Emergency Rollback Procedure + +If anything goes wrong during modularization: + +```bash +# 1. Identify the problematic phase +echo "ROLLBACK: Phase $PHASE failed" + +# 2. Check git status +git status + +# 3. Rollback created files +rm -rf src/mir/builder/control_flow/ # (example) + +# 4. Restore original file +git checkout src/mir/builder/control_flow.rs + +# 5. Verify build +cargo build --release +cargo test --lib + +# 6. Document the issue +echo "$(date): Phase $PHASE rollback due to: $REASON" >> docs/development/refactoring/rollback_log.txt + +# 7. Review and adjust plan +# - Was the failure expected? +# - Do we need to adjust the approach? +# - Should we skip this phase? +``` + +--- + +## Conclusion + +This modularization plan provides a **safe, incremental path** to breaking down 3 large files into maintainable modules. The approach prioritizes: + +1. **Zero breaking changes** - Backward compatible at every step +2. **Clear verification** - Test after each phase +3. **Easy rollback** - Can undo any phase if issues arise +4. **Pattern setting** - control_flow.rs establishes the template for others + +**Next Steps**: +1. Review this plan with the team +2. Get approval for Week 1 (control_flow.rs Phases 1-3) +3. Create a dedicated branch: `refactor/modularize-control-flow` +4. Begin Phase 1! + +**Questions? Concerns?** Open an issue or discuss in the team channel. + +--- + +**Document Version**: 1.0 +**Created**: 2025-12-05 +**Author**: Claude Code (AI-assisted planning) +**Status**: Draft - Awaiting Review diff --git a/docs/development/refactoring/modularization-quick-start.md b/docs/development/refactoring/modularization-quick-start.md new file mode 100644 index 00000000..00519fdd --- /dev/null +++ b/docs/development/refactoring/modularization-quick-start.md @@ -0,0 +1,253 @@ +# Modularization Quick Start Checklist + +This is the **TL;DR** version of the full implementation plan. Use this as a quick reference during execution. + +**Full Plan**: [modularization-implementation-plan.md](./modularization-implementation-plan.md) + +--- + +## 1. control_flow.rs Modularization (12.5 hours) + +### Phase 1: Debug Utilities (30 min) +- [ ] Create `src/mir/builder/control_flow/debug.rs` +- [ ] Move `trace_varmap()` +- [ ] Update `mod.rs` imports +- [ ] Run: `cargo build --release && cargo test --lib` +- [ ] Commit: `git commit -m "refactor: Extract debug utilities from control_flow.rs"` + +### Phase 2: Pattern Lowerers (2 hours) +- [ ] Create `control_flow/joinir/patterns/` directory +- [ ] Move `cf_loop_pattern1_minimal()` to `pattern1_minimal.rs` +- [ ] Move `cf_loop_pattern2_with_break()` to `pattern2_with_break.rs` +- [ ] Move `cf_loop_pattern3_with_if_phi()` to `pattern3_with_if_phi.rs` +- [ ] Create `patterns/mod.rs` dispatcher +- [ ] Run: `cargo build --release && cargo test --lib -- --include-ignored` +- [ ] Test: `NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako` +- [ ] Commit: `git commit -m "refactor: Extract pattern lowerers from control_flow.rs"` + +### Phase 3: JoinIR Routing (1.5 hours) +- [ ] Create `control_flow/joinir/routing.rs` +- [ ] Move `try_cf_loop_joinir()` +- [ ] Move `cf_loop_joinir_impl()` +- [ ] Update imports +- [ ] Run: `cargo build --release && cargo test --release` +- [ ] Commit: `git commit -m "refactor: Extract JoinIR routing logic"` + +### Phase 4: merge_joinir_mir_blocks (6 hours) ⚠️ CRITICAL +- [ ] Create `control_flow/joinir/merge/` directory +- [ ] Extract `id_remapper.rs` (150 lines) +- [ ] Extract `block_allocator.rs` (100 lines) +- [ ] Extract `value_collector.rs` (100 lines) +- [ ] Extract `instruction_rewriter.rs` (150 lines) +- [ ] Extract `exit_phi_builder.rs` (100 lines) +- [ ] Create `merge/mod.rs` coordinator (100 lines) +- [ ] Update all imports +- [ ] Run: `cargo build --release && cargo test --lib` +- [ ] Run: `tools/smokes/v2/run.sh --profile quick` +- [ ] Run: `NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir"` +- [ ] Run determinism test (3x): + ```bash + for i in 1 2 3; do echo "=== Run $i ==="; cargo test --release test_loop_patterns; done + ``` +- [ ] Commit: `git commit -m "refactor: Break down merge_joinir_mir_blocks into modules"` + +### Phase 5: Exception Handling (1 hour) +- [ ] Create `control_flow/exception/` directory +- [ ] Move `cf_try_catch()` to `try_catch.rs` +- [ ] Move `cf_throw()` to `throw.rs` +- [ ] Create `exception/mod.rs` +- [ ] Run: `cargo build --release && cargo test --lib -- exception` +- [ ] Commit: `git commit -m "refactor: Extract exception handling"` + +### Phase 6: Utilities (30 min) +- [ ] Create `control_flow/utils.rs` +- [ ] Move `extract_loop_variable_from_condition()` +- [ ] Run: `cargo build --release && cargo test --lib` +- [ ] Commit: `git commit -m "refactor: Extract utilities"` + +### Phase 7: Final Cleanup (1 hour) +- [ ] Add module-level documentation to all files +- [ ] Review `control_flow/mod.rs` for clarity +- [ ] Verify all `pub(super)` visibility +- [ ] Run: `cargo build --release --all-features` +- [ ] Run: `cargo test --release` +- [ ] Run: `cargo clippy --all-targets` +- [ ] Run: `tools/smokes/v2/run.sh --profile integration` +- [ ] Update `CLAUDE.md` with new structure +- [ ] Commit: `git commit -m "docs: Update control_flow module documentation"` + +--- + +## 2. generic_case_a.rs Modularization (3.5 hours) + +### Phase 1: Directory Setup (15 min) +- [ ] Create `src/mir/join_ir/lowering/generic_case_a/` directory +- [ ] Create `mod.rs` with public API +- [ ] Move `generic_case_a_entry_builder.rs` into directory +- [ ] Move `generic_case_a_whitespace_check.rs` into directory +- [ ] Update parent `mod.rs` imports +- [ ] Run: `cargo build --release` +- [ ] Commit: `git commit -m "refactor: Create generic_case_a directory structure"` + +### Phase 2: Extract skip_ws (45 min) +- [ ] Create `generic_case_a/skip_ws.rs` +- [ ] Move `lower_case_a_skip_ws_with_scope()` and `_core()` +- [ ] Add module documentation +- [ ] Run: `cargo build --release && cargo test --lib -- skip_ws` +- [ ] Commit: `git commit -m "refactor: Extract skip_ws lowerer"` + +### Phase 3: Extract trim (1 hour) +- [ ] Create `generic_case_a/trim.rs` +- [ ] Move `lower_case_a_trim_with_scope()` and `_core()` +- [ ] Add module documentation +- [ ] Run: `cargo build --release && cargo test --lib -- trim` +- [ ] Commit: `git commit -m "refactor: Extract trim lowerer"` + +### Phase 4: Extract append_defs & stage1 (1 hour) +- [ ] Create `generic_case_a/append_defs.rs` +- [ ] Create `generic_case_a/stage1_using_resolver.rs` +- [ ] Move respective functions +- [ ] Add module documentation +- [ ] Run: `cargo build --release && cargo test --release` +- [ ] Run: `tools/smokes/v2/run.sh --profile quick --filter "funcscanner_*"` +- [ ] Commit: `git commit -m "refactor: Extract append_defs and stage1 lowerers"` + +### Phase 5: Final Cleanup (30 min) +- [ ] Add module-level documentation to all files +- [ ] Verify all imports are clean +- [ ] Run: `cargo build --release --all-features && cargo test --release` +- [ ] Commit: `git commit -m "docs: Complete generic_case_a modularization"` + +--- + +## 3. loopform_builder.rs Modularization (4 hours) - FUTURE + +**Status**: Already partially modularized in Phase 191. Remaining work is lower priority. + +### Phase 1: Directory Setup (30 min) +- [ ] Create `src/mir/phi_core/loopform/` directory +- [ ] Move existing modular files into directory +- [ ] Create `mod.rs` with re-exports +- [ ] Run: `cargo build --release && cargo test --lib -- loopform` + +### Phase 2: Extract 4-Pass Architecture (2 hours) +- [ ] Create `loopform/passes/` directory +- [ ] Extract Pass 1: Variable discovery +- [ ] Extract Pass 2: Header PHI construction +- [ ] Extract Pass 3: Latch block processing +- [ ] Extract Pass 4: Exit PHI construction +- [ ] Create `passes/mod.rs` coordinator +- [ ] Run: `cargo build --release && cargo test --release -- loopform` + +### Phase 3: Extract Core Builder (1 hour) +- [ ] Create `loopform/builder_core.rs` +- [ ] Move remaining builder logic +- [ ] Run: `cargo build --release && cargo test --release` + +### Phase 4: Final Cleanup (30 min) +- [ ] Add module documentation +- [ ] Run: `cargo build --release --all-features` +- [ ] Run: `tools/smokes/v2/run.sh --profile quick --filter "phi_*"` + +--- + +## Emergency Rollback Commands + +```bash +# Rollback control_flow.rs modularization +rm -rf src/mir/builder/control_flow/ +git checkout src/mir/builder/control_flow.rs +cargo build --release && cargo test --lib + +# Rollback generic_case_a.rs modularization +rm -rf src/mir/join_ir/lowering/generic_case_a/ +git checkout src/mir/join_ir/lowering/generic_case_a*.rs +cargo build --release && cargo test --lib + +# Rollback loopform_builder.rs modularization +rm -rf src/mir/phi_core/loopform/ +git checkout src/mir/phi_core/loopform*.rs +cargo build --release && cargo test --lib +``` + +--- + +## Verification Commands + +### Quick Verification (after each phase) +```bash +cargo build --release +cargo test --lib +``` + +### Comprehensive Verification (after critical phases) +```bash +cargo build --release --all-features +cargo test --release +cargo clippy --all-targets +tools/smokes/v2/run.sh --profile quick +``` + +### Debug Trace Verification (Phase 4 only) +```bash +NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir" +``` + +### Determinism Check (Phase 4 only) +```bash +for i in 1 2 3; do + echo "=== Run $i ===" + cargo test --release test_loop_patterns 2>&1 | grep "test result" +done +``` + +--- + +## Success Criteria + +- ✅ All 267+ tests pass +- ✅ Build time ≤ current +- ✅ `control_flow/mod.rs` < 200 lines (vs 1,632) +- ✅ Largest file < 250 lines (vs 714) +- ✅ Debug traces still work +- ✅ No HashMap non-determinism +- ✅ Code is easier to navigate + +--- + +## Timeline + +### Week 1: control_flow.rs Phases 1-3 (Low Risk) +- **Total**: 4 hours +- **Risk**: Low +- **Deliverable**: Pattern lowerers and routing isolated + +### Week 2: control_flow.rs Phase 4 (High Risk) +- **Total**: 6 hours +- **Risk**: Medium +- **Deliverable**: merge_joinir_mir_blocks modularized + +### Week 3: control_flow.rs Phases 5-7 + generic_case_a.rs +- **Total**: 6 hours +- **Risk**: Low +- **Deliverable**: control_flow.rs complete, generic_case_a.rs modularized + +--- + +## Questions? + +**Full Plan**: [modularization-implementation-plan.md](./modularization-implementation-plan.md) + +**Open Issues**: +- Create a GitHub issue if you encounter blockers +- Tag with `refactoring` label + +**Need Help?**: +- Check the full plan for detailed explanations +- Review the risk/mitigation matrix +- Use the emergency rollback procedure if needed + +--- + +**Last Updated**: 2025-12-05 +**Status**: Ready to execute diff --git a/docs/development/refactoring/phase4-merge-function-breakdown.md b/docs/development/refactoring/phase4-merge-function-breakdown.md new file mode 100644 index 00000000..5c23e999 --- /dev/null +++ b/docs/development/refactoring/phase4-merge-function-breakdown.md @@ -0,0 +1,789 @@ +# Phase 4: merge_joinir_mir_blocks Breakdown Guide + +**CRITICAL PHASE** - This is the most complex modularization task. + +**Context**: `merge_joinir_mir_blocks()` is a 714-line monster function that performs 6 distinct operations. This guide provides step-by-step instructions for breaking it down safely. + +--- + +## Overview + +### Current State +- **File**: `src/mir/builder/control_flow.rs` +- **Function**: `merge_joinir_mir_blocks()` (lines 864-1578) +- **Size**: 714 lines (44% of the entire file!) +- **Complexity**: 6 distinct phases, multiple data structures, critical to JoinIR integration + +### Target State +- **Directory**: `src/mir/builder/control_flow/joinir/merge/` +- **Files**: 7 files (1 coordinator + 6 sub-modules) +- **Average size**: ~100-150 lines per file +- **Maintainability**: Each phase in its own file + +--- + +## Function Analysis + +### Current Structure (Lines 864-1578) + +```rust +fn merge_joinir_mir_blocks( + &mut self, + mir_module: &MirModule, + boundary: Option<&JoinInlineBoundary>, + debug: bool, +) -> Result, String> { + // Phase 1: Block ID Allocation (lines 864-923) + // - Create JoinIrIdRemapper + // - Allocate new block IDs for all functions + // - Map function entry blocks + + // Phase 2: Value Collection (lines 931-971) + // - Collect all ValueIds used across functions + // - Track Const String → function name mapping + // - Collect function parameters for tail call conversion + + // Phase 3: ValueId Remapping (lines 973-1100) + // - Allocate new ValueIds for all collected values + // - Create remapping table + // - Handle determinism (BTreeSet/BTreeMap usage) + + // Phase 4: Block Merging & Instruction Rewriting (lines 1102-1400) + // - For each function in sorted order: + // - For each block in sorted order: + // - Rewrite instructions with remapped IDs + // - Convert Call → Jump for intra-module calls + // - Convert Return → Jump to exit block + // - Handle tail call optimization + // - Insert merged blocks into current_function + + // Phase 5: Exit PHI Construction (lines 1402-1500) + // - Collect return values from all functions + // - Create exit block with PHI node + // - Return exit PHI value + + // Phase 6: Boundary Reconnection (lines 1502-1578) + // - If boundary with host_outputs specified: + // - Map exit PHI back to host variable_map + // - Update SSA slots in variable_map + // - Reconnect inlined code to host context +} +``` + +--- + +## Proposed Module Breakdown + +### 1. `merge/mod.rs` - Coordinator (~100 lines) + +**Purpose**: Orchestrate the 6 phases without implementing logic. + +```rust +//! JoinIR MIR Block Merging Coordinator +//! +//! This module coordinates the merging of JoinIR-generated MIR functions +//! into the host MIR builder. The process is broken into 6 phases: +//! +//! 1. Block ID allocation +//! 2. Value collection +//! 3. ValueId remapping +//! 4. Instruction rewriting +//! 5. Exit PHI construction +//! 6. Boundary reconnection + +use crate::mir::{MirModule, ValueId}; +use crate::mir::join_ir::lowering::inline_boundary::JoinInlineBoundary; + +mod id_remapper; +mod block_allocator; +mod value_collector; +mod instruction_rewriter; +mod exit_phi_builder; + +use id_remapper::JoinIrIdRemapper; + +pub(in crate::mir::builder) fn merge_joinir_mir_blocks( + builder: &mut crate::mir::builder::MirBuilder, + mir_module: &MirModule, + boundary: Option<&JoinInlineBoundary>, + debug: bool, +) -> Result, String> { + // Phase 1: Allocate block IDs + let mut remapper = block_allocator::allocate_blocks(builder, mir_module, debug)?; + + // Phase 2: Collect values + let (used_values, value_to_func_name, function_params) = + value_collector::collect_values(mir_module, &mut remapper, debug)?; + + // Phase 3: Remap ValueIds + id_remapper::remap_values(builder, &used_values, &mut remapper, debug)?; + + // Phase 4: Merge blocks and rewrite instructions + let exit_block_id = instruction_rewriter::merge_and_rewrite( + builder, + mir_module, + &mut remapper, + &value_to_func_name, + &function_params, + debug, + )?; + + // Phase 5: Build exit PHI + let exit_phi_value = exit_phi_builder::build_exit_phi( + builder, + mir_module, + &remapper, + exit_block_id, + debug, + )?; + + // Phase 6: Reconnect boundary (if specified) + if let Some(boundary) = boundary { + reconnect_boundary(builder, boundary, exit_phi_value, debug)?; + } + + Ok(exit_phi_value) +} + +fn reconnect_boundary( + builder: &mut crate::mir::builder::MirBuilder, + boundary: &JoinInlineBoundary, + exit_phi_value: Option, + debug: bool, +) -> Result<(), String> { + // Phase 6 implementation (simple, can stay in mod.rs) + // ... (lines 1502-1578 logic) + Ok(()) +} +``` + +--- + +### 2. `merge/id_remapper.rs` - ID Remapping (~150 lines) + +**Purpose**: Manage ValueId and BlockId remapping. + +```rust +//! JoinIR ID Remapper +//! +//! Handles the mapping of ValueIds and BlockIds from JoinIR functions +//! to the host MIR builder's ID space. + +use crate::mir::{BasicBlockId, ValueId}; +use std::collections::BTreeMap; + +/// Remapper for JoinIR → Host MIR ID translation +pub(super) struct JoinIrIdRemapper { + /// (function_name, old_block_id) → new_block_id + block_map: BTreeMap<(String, BasicBlockId), BasicBlockId>, + + /// old_value_id → new_value_id + value_map: BTreeMap, + + /// function_name → entry_block_id + function_entry_map: BTreeMap, +} + +impl JoinIrIdRemapper { + pub(super) fn new() -> Self { + Self { + block_map: BTreeMap::new(), + value_map: BTreeMap::new(), + function_entry_map: BTreeMap::new(), + } + } + + pub(super) fn set_block(&mut self, func_name: String, old_id: BasicBlockId, new_id: BasicBlockId) { + self.block_map.insert((func_name, old_id), new_id); + } + + pub(super) fn get_block(&self, func_name: &str, old_id: BasicBlockId) -> Option { + self.block_map.get(&(func_name.to_string(), old_id)).copied() + } + + pub(super) fn set_value(&mut self, old_id: ValueId, new_id: ValueId) { + self.value_map.insert(old_id, new_id); + } + + pub(super) fn get_value(&self, old_id: ValueId) -> Option { + self.value_map.get(&old_id).copied() + } + + pub(super) fn set_function_entry(&mut self, func_name: String, entry_block: BasicBlockId) { + self.function_entry_map.insert(func_name, entry_block); + } + + pub(super) fn get_function_entry(&self, func_name: &str) -> Option { + self.function_entry_map.get(func_name).copied() + } + + // ... (other helper methods from lines 873-959) +} + +pub(super) fn remap_values( + builder: &mut crate::mir::builder::MirBuilder, + used_values: &std::collections::BTreeSet, + remapper: &mut JoinIrIdRemapper, + debug: bool, +) -> Result<(), String> { + // Phase 3 logic (lines 973-1100) + // - Allocate new ValueIds + // - Store in remapper + Ok(()) +} +``` + +--- + +### 3. `merge/block_allocator.rs` - Block Allocation (~100 lines) + +**Purpose**: Allocate block IDs for all JoinIR functions. + +```rust +//! JoinIR Block ID Allocator +//! +//! Allocates new BasicBlockIds for all blocks in JoinIR functions +//! to avoid ID conflicts with the host MIR builder. + +use crate::mir::{BasicBlockId, MirModule}; +use super::id_remapper::JoinIrIdRemapper; + +pub(super) fn allocate_blocks( + builder: &mut crate::mir::builder::MirBuilder, + mir_module: &MirModule, + debug: bool, +) -> Result { + let mut remapper = JoinIrIdRemapper::new(); + + if debug { + eprintln!("[merge/block_allocator] Allocating block IDs for {} functions", + mir_module.functions.len()); + } + + // DETERMINISM FIX: Sort functions by name + let mut functions: Vec<_> = mir_module.functions.iter().collect(); + functions.sort_by_key(|(name, _)| name.as_str()); + + for (func_name, func) in functions { + if debug { + eprintln!("[merge/block_allocator] Function: {}", func_name); + } + + // DETERMINISM FIX: Sort blocks by ID + let mut blocks: Vec<_> = func.blocks.iter().collect(); + blocks.sort_by_key(|(id, _)| id.0); + + for (old_block_id, _) in blocks { + let new_block_id = builder.block_gen.next(); + remapper.set_block(func_name.clone(), *old_block_id, new_block_id); + + if debug { + eprintln!("[merge/block_allocator] Block: {:?} → {:?}", + old_block_id, new_block_id); + } + } + + // Store function entry block + let entry_block_new = remapper.get_block(func_name, func.entry_block) + .ok_or_else(|| format!("Entry block not found for {}", func_name))?; + remapper.set_function_entry(func_name.clone(), entry_block_new); + } + + if debug { + eprintln!("[merge/block_allocator] Allocation complete"); + } + + Ok(remapper) +} +``` + +--- + +### 4. `merge/value_collector.rs` - Value Collection (~100 lines) + +**Purpose**: Collect all ValueIds used across JoinIR functions. + +```rust +//! JoinIR Value Collector +//! +//! Collects all ValueIds used in JoinIR functions for remapping. + +use crate::mir::{ValueId, MirModule}; +use super::id_remapper::JoinIrIdRemapper; +use std::collections::{BTreeSet, HashMap}; + +pub(super) fn collect_values( + mir_module: &MirModule, + remapper: &mut JoinIrIdRemapper, + debug: bool, +) -> Result<(BTreeSet, HashMap, HashMap>), String> { + let mut used_values = BTreeSet::new(); + let mut value_to_func_name = HashMap::new(); + let mut function_params = HashMap::new(); + + if debug { + eprintln!("[merge/value_collector] Collecting values from {} functions", + mir_module.functions.len()); + } + + for (func_name, func) in &mir_module.functions { + // Collect function parameters + function_params.insert(func_name.clone(), func.params.clone()); + + for block in func.blocks.values() { + // Collect values from instructions + let block_values = collect_values_in_block(block); + used_values.extend(block_values); + + // Track Const String → function name mapping + for inst in &block.instructions { + if let crate::mir::MirInstruction::Const { dst, value } = inst { + if let crate::mir::types::ConstValue::String(s) = value { + // Check if this is a function name + if mir_module.functions.contains_key(s) { + value_to_func_name.insert(*dst, s.clone()); + used_values.insert(*dst); + } + } + } + } + } + } + + if debug { + eprintln!("[merge/value_collector] Collected {} values", used_values.len()); + } + + Ok((used_values, value_to_func_name, function_params)) +} + +fn collect_values_in_block(block: &crate::mir::BasicBlock) -> Vec { + // Helper function from lines 948-970 + // ... (extract logic from remapper.collect_values_in_block) + vec![] +} +``` + +--- + +### 5. `merge/instruction_rewriter.rs` - Instruction Rewriting (~150 lines) + +**Purpose**: Rewrite instructions and merge blocks. + +```rust +//! JoinIR Instruction Rewriter +//! +//! Rewrites JoinIR instructions with remapped IDs and merges blocks +//! into the host MIR builder. + +use crate::mir::{BasicBlockId, ValueId, MirModule, MirInstruction, BasicBlock}; +use super::id_remapper::JoinIrIdRemapper; +use std::collections::HashMap; + +pub(super) fn merge_and_rewrite( + builder: &mut crate::mir::builder::MirBuilder, + mir_module: &MirModule, + remapper: &mut JoinIrIdRemapper, + value_to_func_name: &HashMap, + function_params: &HashMap>, + debug: bool, +) -> Result { + // Create exit block + let exit_block_id = builder.block_gen.next(); + + if debug { + eprintln!("[merge/instruction_rewriter] Merging blocks from {} functions", + mir_module.functions.len()); + eprintln!("[merge/instruction_rewriter] Exit block: {:?}", exit_block_id); + } + + // DETERMINISM FIX: Sort functions by name + let mut functions: Vec<_> = mir_module.functions.iter().collect(); + functions.sort_by_key(|(name, _)| name.as_str()); + + for (func_name, func) in functions { + merge_function_blocks( + builder, + func_name, + func, + remapper, + value_to_func_name, + function_params, + exit_block_id, + debug, + )?; + } + + Ok(exit_block_id) +} + +fn merge_function_blocks( + builder: &mut crate::mir::builder::MirBuilder, + func_name: &str, + func: &crate::mir::MirFunction, + remapper: &mut JoinIrIdRemapper, + value_to_func_name: &HashMap, + function_params: &HashMap>, + exit_block_id: BasicBlockId, + debug: bool, +) -> Result<(), String> { + // DETERMINISM FIX: Sort blocks by ID + let mut blocks: Vec<_> = func.blocks.iter().collect(); + blocks.sort_by_key(|(id, _)| id.0); + + for (old_block_id, old_block) in blocks { + let new_block_id = remapper.get_block(func_name, *old_block_id) + .ok_or_else(|| format!("Block ID not found: {:?}", old_block_id))?; + + let new_block = rewrite_block( + old_block, + remapper, + value_to_func_name, + function_params, + exit_block_id, + debug, + )?; + + builder.current_function.as_mut() + .ok_or("No current function")? + .blocks.insert(new_block_id, new_block); + } + + Ok(()) +} + +fn rewrite_block( + old_block: &BasicBlock, + remapper: &JoinIrIdRemapper, + value_to_func_name: &HashMap, + function_params: &HashMap>, + exit_block_id: BasicBlockId, + debug: bool, +) -> Result { + let mut new_instructions = Vec::new(); + + // Rewrite each instruction (lines 1102-1300) + for inst in &old_block.instructions { + let new_inst = rewrite_instruction( + inst, + remapper, + value_to_func_name, + function_params, + exit_block_id, + debug, + )?; + new_instructions.push(new_inst); + } + + // Rewrite terminator (lines 1300-1400) + let new_terminator = rewrite_terminator( + &old_block.terminator, + remapper, + exit_block_id, + debug, + )?; + + Ok(BasicBlock { + instructions: new_instructions, + terminator: new_terminator, + }) +} + +fn rewrite_instruction( + inst: &MirInstruction, + remapper: &JoinIrIdRemapper, + value_to_func_name: &HashMap, + function_params: &HashMap>, + exit_block_id: BasicBlockId, + debug: bool, +) -> Result { + // Instruction rewriting logic (lines 1102-1300) + // - Remap ValueIds + // - Convert Call → Jump for intra-module calls + // - Handle tail call optimization + Ok(inst.clone()) // Placeholder +} + +fn rewrite_terminator( + terminator: &Option, + remapper: &JoinIrIdRemapper, + exit_block_id: BasicBlockId, + debug: bool, +) -> Result, String> { + // Terminator rewriting logic (lines 1300-1400) + // - Remap block IDs + // - Convert Return → Jump to exit block + Ok(terminator.clone()) // Placeholder +} +``` + +--- + +### 6. `merge/exit_phi_builder.rs` - Exit PHI Construction (~100 lines) + +**Purpose**: Construct exit PHI node from return values. + +```rust +//! JoinIR Exit PHI Builder +//! +//! Constructs the exit block PHI node that merges return values +//! from all inlined JoinIR functions. + +use crate::mir::{BasicBlockId, ValueId, MirModule, MirInstruction, BasicBlock}; +use super::id_remapper::JoinIrIdRemapper; + +pub(super) fn build_exit_phi( + builder: &mut crate::mir::builder::MirBuilder, + mir_module: &MirModule, + remapper: &JoinIrIdRemapper, + exit_block_id: BasicBlockId, + debug: bool, +) -> Result, String> { + if debug { + eprintln!("[merge/exit_phi_builder] Building exit PHI"); + } + + // Collect return values from all functions (lines 1402-1450) + let mut return_values = Vec::new(); + + for (func_name, func) in &mir_module.functions { + for (old_block_id, block) in &func.blocks { + if let Some(MirInstruction::Return { value: Some(old_value) }) = &block.terminator { + // Remap the return value + let new_value = remapper.get_value(*old_value) + .ok_or_else(|| format!("Return value not remapped: {:?}", old_value))?; + + // Remap the block ID (predecessor of exit block) + let new_block_id = remapper.get_block(func_name, *old_block_id) + .ok_or_else(|| format!("Block ID not remapped: {:?}", old_block_id))?; + + return_values.push((new_block_id, new_value)); + } + } + } + + if return_values.is_empty() { + if debug { + eprintln!("[merge/exit_phi_builder] No return values, creating void exit block"); + } + + // Create empty exit block + builder.current_function.as_mut() + .ok_or("No current function")? + .blocks.insert(exit_block_id, BasicBlock { + instructions: vec![], + terminator: None, + }); + + return Ok(None); + } + + // Create PHI node (lines 1450-1500) + let phi_value = builder.value_gen.next(); + + let phi_inst = MirInstruction::Phi { + dst: phi_value, + incoming: return_values.iter() + .map(|(block_id, value_id)| (*block_id, *value_id)) + .collect(), + }; + + // Create exit block with PHI + builder.current_function.as_mut() + .ok_or("No current function")? + .blocks.insert(exit_block_id, BasicBlock { + instructions: vec![phi_inst], + terminator: None, + }); + + if debug { + eprintln!("[merge/exit_phi_builder] Exit PHI created: {:?}", phi_value); + } + + Ok(Some(phi_value)) +} +``` + +--- + +## Implementation Steps + +### Step 1: Create Directory Structure (5 min) +```bash +mkdir -p src/mir/builder/control_flow/joinir/merge +``` + +### Step 2: Create Skeleton Files (10 min) +```bash +# Create empty files with module documentation +touch src/mir/builder/control_flow/joinir/merge/mod.rs +touch src/mir/builder/control_flow/joinir/merge/id_remapper.rs +touch src/mir/builder/control_flow/joinir/merge/block_allocator.rs +touch src/mir/builder/control_flow/joinir/merge/value_collector.rs +touch src/mir/builder/control_flow/joinir/merge/instruction_rewriter.rs +touch src/mir/builder/control_flow/joinir/merge/exit_phi_builder.rs +``` + +### Step 3: Extract id_remapper.rs (30 min) +- Copy `JoinIrIdRemapper` struct definition +- Move helper methods +- Add `remap_values()` function +- Test compilation: `cargo build --release` + +### Step 4: Extract block_allocator.rs (30 min) +- Copy block allocation logic (lines 888-923) +- Create `allocate_blocks()` function +- Test compilation: `cargo build --release` + +### Step 5: Extract value_collector.rs (30 min) +- Copy value collection logic (lines 931-971) +- Create `collect_values()` function +- Test compilation: `cargo build --release` + +### Step 6: Extract instruction_rewriter.rs (1.5 hours) +- Copy instruction rewriting logic (lines 1102-1400) +- Create `merge_and_rewrite()` function +- Create helper functions for instruction/terminator rewriting +- Test compilation: `cargo build --release` + +### Step 7: Extract exit_phi_builder.rs (30 min) +- Copy exit PHI logic (lines 1402-1500) +- Create `build_exit_phi()` function +- Test compilation: `cargo build --release` + +### Step 8: Create mod.rs Coordinator (1 hour) +- Create coordinator function +- Wire up all sub-modules +- Move boundary reconnection logic (lines 1502-1578) +- Test compilation: `cargo build --release` + +### Step 9: Update control_flow/mod.rs (15 min) +- Update imports to use new `merge` module +- Remove old `merge_joinir_mir_blocks()` implementation +- Test compilation: `cargo build --release` + +### Step 10: Comprehensive Testing (1 hour) +```bash +# Build verification +cargo build --release +cargo test --lib + +# Smoke tests +tools/smokes/v2/run.sh --profile quick + +# Debug trace verification +NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge" + +# Determinism check (run 3 times) +for i in 1 2 3; do + echo "=== Run $i ===" + cargo test --release test_loop_patterns 2>&1 | grep "test result" +done +``` + +--- + +## Verification Checklist + +- [ ] All 267+ tests pass +- [ ] Build time ≤ current (no regression) +- [ ] Debug traces still appear with `NYASH_OPTION_C_DEBUG=1` +- [ ] No compiler warnings +- [ ] No clippy warnings (`cargo clippy --all-targets`) +- [ ] Determinism test passes (3 consecutive runs produce identical results) +- [ ] Smoke tests pass for all patterns (1/2/3) + +--- + +## Rollback Procedure + +If Phase 4 fails: + +```bash +# 1. Remove new directory +rm -rf src/mir/builder/control_flow/joinir/merge + +# 2. Restore original control_flow.rs +git checkout src/mir/builder/control_flow.rs + +# 3. Verify build +cargo build --release +cargo test --lib + +# 4. Document failure +echo "$(date): Phase 4 rollback due to: $REASON" >> docs/development/refactoring/rollback_log.txt +``` + +--- + +## Common Pitfalls + +### Pitfall 1: HashMap Non-Determinism +**Problem**: Using `HashMap` for iteration causes non-deterministic ValueId allocation. +**Solution**: Use `BTreeMap` and `BTreeSet` for all ID mappings. + +### Pitfall 2: Missing Value Remapping +**Problem**: Forgetting to remap a ValueId causes "value not found" errors. +**Solution**: Systematically collect ALL values before remapping phase. + +### Pitfall 3: Block Order Matters +**Problem**: Processing blocks in random order causes test failures. +**Solution**: Always sort by name/ID before iteration. + +### Pitfall 4: Forgetting to Update Imports +**Problem**: Old imports cause compilation failures. +**Solution**: Update all imports in a single commit after extraction. + +--- + +## Success Criteria + +- ✅ 714-line function → 6 focused modules (100-150 lines each) +- ✅ All tests pass (no regressions) +- ✅ Debug traces work (`NYASH_OPTION_C_DEBUG=1`) +- ✅ Determinism maintained (BTreeMap/BTreeSet usage) +- ✅ Code is easier to understand (each phase isolated) +- ✅ Future maintenance easier (modify one phase at a time) + +--- + +## Timeline + +- **Total Estimated Effort**: 6 hours +- **Buffer**: +2 hours for unexpected issues +- **Recommended Schedule**: 2 days (3 hours/day) + +**Day 1**: +- Steps 1-5 (directory setup, id_remapper, block_allocator, value_collector) +- Verification after each step + +**Day 2**: +- Steps 6-10 (instruction_rewriter, exit_phi_builder, coordinator, testing) +- Comprehensive verification + +--- + +## Conclusion + +Phase 4 is the most critical modularization task. By breaking down the 714-line monster function into 6 focused modules, we: + +1. **Improve maintainability** - Each phase in its own file +2. **Reduce complexity** - 714 lines → 100-150 lines per file +3. **Enable future development** - Easy to modify individual phases +4. **Maintain stability** - Zero breaking changes, backward compatible + +**Next Steps**: +1. Review this guide +2. Set aside 2 days for implementation +3. Execute steps 1-10 systematically +4. Run comprehensive verification +5. Commit success or rollback if needed + +**Questions?** Refer to the full implementation plan or open an issue. + +--- + +**Document Version**: 1.0 +**Created**: 2025-12-05 +**Author**: Claude Code (AI-assisted planning) +**Status**: Ready for execution