docs: Add comprehensive modularization implementation plan

Create 5-document modularization strategy for 3 large source files:
- control_flow.rs (1,632 lines → 19 modules)
- generic_case_a.rs (1,056 lines → 7 modules)
- loopform_builder.rs (1,166 lines → 11 modules)

Documents included:

1. **README.md** (352 lines)
   Navigation hub with overview, metrics, timeline, ROI analysis

2. **modularization-implementation-plan.md** (960 lines)
   Complete 20-hour implementation guide with:
   - Phase-by-phase breakdown (13 phases across 3 files)
   - Hour-by-hour effort estimates
   - Risk assessment matrices
   - Success criteria (quantitative & qualitative)
   - Public API changes (zero breaking changes)

3. **modularization-quick-start.md** (253 lines)
   Actionable checklist with:
   - Copy-paste verification commands
   - Emergency rollback procedures
   - Timeline and milestones

4. **modularization-directory-structure.md** (426 lines)
   Visual guide with:
   - Before/after directory trees
   - File size metrics
   - Import path examples
   - Code quality comparison

5. **phase4-merge-function-breakdown.md** (789 lines)
   Detailed implementation for most complex phase:
   - 714-line merge_joinir_mir_blocks() → 6 focused modules
   - 10 detailed implementation steps
   - Common pitfalls and solutions

Key metrics:
- Total effort: 20 hours (2-3 weeks)
- Files: 9 → 37 focused modules
- Largest file: 1,632 → 180 lines (-89%)
- Backward compatible: Zero breaking changes
- ROI: Breakeven in 4 months (5 hrs/month saved)

Priority: control_flow.rs Phase 1-4 (high impact)
Safety: Fully reversible at each phase

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
nyash-codex
2025-12-05 20:13:31 +09:00
parent caf38dba19
commit 1730fea206
5 changed files with 2780 additions and 0 deletions

View File

@ -0,0 +1,352 @@
# Modularization Implementation Resources
This directory contains comprehensive plans and guides for modularizing large source files in the Nyash codebase.
---
## Overview
**Goal**: Break down 3 oversized files (3,854 lines total) into 37 focused modules with clear separation of concerns.
**Priority**:
1. **control_flow.rs** (1,632 lines) - **HIGHEST** (blocking Pattern 4+ development)
2. **generic_case_a.rs** (1,056 lines) - **MEDIUM** (high code deduplication potential)
3. **loopform_builder.rs** (1,166 lines) - **LOWER** (already partially modularized)
---
## Documents in This Directory
### 1. [modularization-implementation-plan.md](./modularization-implementation-plan.md) ⭐ **START HERE**
**Comprehensive implementation plan** covering all 3 files.
**Contents**:
- Executive summary
- Phase-by-phase migration plans for each file
- Public API changes
- Build verification strategies
- Risk assessment matrices
- Implementation effort breakdowns
- Success criteria
**Who should read this**: Anyone implementing the modularization.
**Estimated read time**: 30 minutes
---
### 2. [modularization-quick-start.md](./modularization-quick-start.md) 🚀 **QUICK REFERENCE**
**TL;DR checklist version** of the implementation plan.
**Contents**:
- Step-by-step checklists for each phase
- Verification commands
- Emergency rollback commands
- Timeline and milestones
**Who should read this**: Developers actively working on modularization.
**Estimated read time**: 10 minutes
---
### 3. [modularization-directory-structure.md](./modularization-directory-structure.md) 📊 **VISUAL GUIDE**
**Visual directory structure diagrams** showing before/after states.
**Contents**:
- Directory tree diagrams for all 3 files
- Metrics comparison tables
- Import path changes
- Navigation improvement examples
- File size distribution charts
**Who should read this**: Anyone wanting to understand the proposed structure.
**Estimated read time**: 15 minutes
---
### 4. [phase4-merge-function-breakdown.md](./phase4-merge-function-breakdown.md) 🔥 **CRITICAL PHASE**
**Detailed implementation guide** for Phase 4 (merge_joinir_mir_blocks breakdown).
**Contents**:
- Function analysis (714 lines → 6 modules)
- Detailed module breakdowns with code examples
- Step-by-step implementation steps (10 steps)
- Verification checklist
- Common pitfalls and solutions
- Rollback procedure
**Who should read this**: Developers working on control_flow.rs Phase 4.
**Estimated read time**: 20 minutes
---
## Quick Navigation
### I want to...
#### **Start the modularization**
→ Read [modularization-implementation-plan.md](./modularization-implementation-plan.md) (full plan)
→ Use [modularization-quick-start.md](./modularization-quick-start.md) (checklist)
#### **Understand the proposed structure**
→ Read [modularization-directory-structure.md](./modularization-directory-structure.md) (visual guide)
#### **Work on Phase 4 (merge function)**
→ Read [phase4-merge-function-breakdown.md](./phase4-merge-function-breakdown.md) (detailed guide)
#### **Get approval for the plan**
→ Share [modularization-implementation-plan.md](./modularization-implementation-plan.md) (comprehensive)
→ Use [modularization-directory-structure.md](./modularization-directory-structure.md) (visual support)
#### **Estimate effort**
→ See "Implementation Effort Breakdown" in [modularization-implementation-plan.md](./modularization-implementation-plan.md)
#### **Assess risks**
→ See "Risk Assessment" sections in [modularization-implementation-plan.md](./modularization-implementation-plan.md)
---
## Recommended Reading Order
### For Implementers (Developers)
1. **Quick Start** - [modularization-quick-start.md](./modularization-quick-start.md) (10 min)
2. **Full Plan** - [modularization-implementation-plan.md](./modularization-implementation-plan.md) (30 min)
3. **Phase 4 Guide** - [phase4-merge-function-breakdown.md](./phase4-merge-function-breakdown.md) (when ready for Phase 4)
### For Reviewers (Team Leads)
1. **Visual Guide** - [modularization-directory-structure.md](./modularization-directory-structure.md) (15 min)
2. **Full Plan** - [modularization-implementation-plan.md](./modularization-implementation-plan.md) (30 min)
3. **Quick Start** - [modularization-quick-start.md](./modularization-quick-start.md) (verification commands)
### For Stakeholders (Management)
1. **Executive Summary** - First page of [modularization-implementation-plan.md](./modularization-implementation-plan.md) (5 min)
2. **Metrics Comparison** - Tables in [modularization-directory-structure.md](./modularization-directory-structure.md) (5 min)
---
## Key Metrics
### control_flow.rs
- **Lines**: 1,632 → 1,850 (+13% for clarity)
- **Files**: 1 → 19
- **Largest file**: 1,632 → 180 (-89%)
- **Effort**: 12.5 hours
### generic_case_a.rs
- **Lines**: 1,056 → 1,470 (+39% for clarity)
- **Files**: 3 → 7
- **Largest file**: 1,056 → 500 (-53%)
- **Effort**: 3.5 hours
### loopform_builder.rs
- **Lines**: 1,166 → 1,450 (+24% for clarity)
- **Files**: 5 → 11
- **Largest file**: 1,166 → 200 (-83%)
- **Effort**: 4 hours
### Total
- **Lines**: 3,854 → 4,770 (+24% for clarity, distributed across 37 files)
- **Files**: 9 → 37
- **Total Effort**: 20 hours (2-3 weeks)
---
## Implementation Timeline
### Week 1: control_flow.rs Phases 1-3 (Low Risk)
- **Monday**: Phase 1 (Debug utilities) - 30 min
- **Tuesday**: Phase 2 (Pattern lowerers) - 2 hours
- **Wednesday**: Phase 3 (JoinIR routing) - 1.5 hours
- **Thursday-Friday**: Verification and buffer
**Deliverable**: Pattern lowerers and routing isolated
### Week 2: control_flow.rs Phase 4 (High Risk)
- **Monday-Tuesday**: Phase 4 (merge function) - 6 hours
- **Wednesday**: Buffer for issues
- **Thursday-Friday**: Phases 5-7 (Exception, utils, cleanup) - 2.5 hours
**Deliverable**: control_flow.rs fully modularized
### Week 3: generic_case_a.rs (Optional)
- **Monday-Tuesday**: generic_case_a.rs Phases 1-5 - 3.5 hours
- **Wednesday**: Buffer
- **Thursday-Friday**: Documentation & final verification
**Deliverable**: generic_case_a.rs fully modularized
### Future: loopform_builder.rs (After Pattern 4+)
- **Timing**: After Pattern 4/5/6 development stabilizes
- **Effort**: 4 hours
- **Priority**: Lower (already partially modularized)
---
## Success Criteria
### Quantitative
- ✅ All 267+ tests pass (no regressions)
- ✅ Build time ≤ current (no increase)
- ✅ Largest file < 250 lines (vs 1,632 before)
- Average file size < 150 lines
### Qualitative
- Code is easier to navigate
- New patterns can be added without modifying 1,600-line files
- Debug traces remain functional
- Documentation is clear and helpful
### Process
- Zero breaking changes at any phase
- Each phase can be rolled back independently
- Commits are small and focused
- CI/CD passes after every commit
---
## Verification Commands
### Quick Verification (after each phase)
```bash
cargo build --release
cargo test --lib
```
### Comprehensive Verification (after critical phases)
```bash
cargo build --release --all-features
cargo test --release
cargo clippy --all-targets
tools/smokes/v2/run.sh --profile quick
```
### Debug Trace Verification (Phase 4 only)
```bash
NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir"
```
---
## Emergency Rollback
### control_flow.rs
```bash
rm -rf src/mir/builder/control_flow/
git checkout src/mir/builder/control_flow.rs
cargo build --release && cargo test --lib
```
### generic_case_a.rs
```bash
rm -rf src/mir/join_ir/lowering/generic_case_a/
git checkout src/mir/join_ir/lowering/generic_case_a*.rs
cargo build --release && cargo test --lib
```
### loopform_builder.rs
```bash
rm -rf src/mir/phi_core/loopform/
git checkout src/mir/phi_core/loopform*.rs
cargo build --release && cargo test --lib
```
---
## Why Modularize?
### Current Pain Points
1. **714-line merge function** - Impossible to understand without hours of study
2. **1,632-line control_flow.rs** - Pattern 4+ would add another 500+ lines
3. **Merge conflicts** - Multiple developers editing the same giant file
4. **Hard to debug** - `NYASH_OPTION_C_DEBUG` traces are buried in massive files
5. **Hard to test** - Can't test individual phases in isolation
### Benefits After Modularization
1. **100-150 line modules** - Easy to understand at a glance
2. **19 focused files** - Each with a single responsibility
3. **Isolated changes** - Modify one phase without affecting others
4. **Easy debugging** - Jump to specific module for traces
5. **Testable** - Can unit test individual modules
### ROI (Return on Investment)
- **Time investment**: 20 hours (2-3 weeks)
- **Time saved**: ~5 hours/month on maintenance (conservatively)
- **Breakeven**: 4 months
- **Long-term benefit**: Much easier Pattern 4/5/6 development
---
## Implementation Order Justification
### Why control_flow.rs First?
1. **Blocking Pattern 4+** - Currently blocking new pattern development
2. **Highest pain** - 714-line merge function is the biggest code smell
3. **Sets the pattern** - Establishes the modularization template for others
4. **Most benefit** - Reduces merge conflicts immediately
### Why generic_case_a.rs Second?
1. **High code deduplication** - 4 similar lowerers can be separated
2. **Already partially split** - Companion files already exist
3. **Medium priority** - Not blocking, but would improve maintainability
### Why loopform_builder.rs Last?
1. **Already partially modularized** - Phase 191 did most of the work
2. **Lower priority** - Not blocking anything
3. **Can wait** - Best done after Pattern 4+ development stabilizes
---
## Questions & Concerns
### "Is this worth the effort?"
**Yes.** 20 hours investment for ongoing maintenance benefits. Breakeven in 4 months.
### "Will this break anything?"
**No.** Zero breaking changes, backward compatible at every phase. Full test suite verification.
### "Can we roll back if needed?"
**Yes.** Each phase can be rolled back independently with simple git commands.
### "What if we only do control_flow.rs?"
**Still valuable.** That's where the highest pain is. Do that first, others can wait.
### "Who should implement this?"
**Experienced developer** familiar with MIR builder and JoinIR integration. Phase 4 requires careful attention.
---
## Next Steps
1. **Review this README** - Understand the resources available
2. **Read the full plan** - [modularization-implementation-plan.md](./modularization-implementation-plan.md)
3. **Get approval** - Share with team leads
4. **Create a branch** - `refactor/modularize-control-flow`
5. **Start Phase 1** - Use [modularization-quick-start.md](./modularization-quick-start.md)
---
## Document Status
- **Created**: 2025-12-05
- **Status**: Ready for review and implementation
- **Maintainer**: Claude Code (AI-assisted planning)
- **Next Review**: After Week 1 completion
---
## Feedback
If you have questions or suggestions about this modularization plan:
1. **Open an issue** - Tag with `refactoring` label
2. **Update the plan** - Submit a PR with improvements
3. **Document lessons learned** - Add notes to this README
**Contact**: Open a discussion in the team channel
---
**Happy Modularizing!** 🚀

View File

@ -0,0 +1,426 @@
# Modularization Directory Structure Diagrams
Visual reference for the proposed directory structures after modularization.
---
## 1. control_flow.rs Modularization
### Before (Current State)
```
src/mir/builder/
├── control_flow.rs (1,632 lines) ⚠️ MONOLITH
├── if_form.rs
├── loops.rs
└── ... (other files)
```
### After (Proposed Structure)
```
src/mir/builder/
├── control_flow/
│ ├── mod.rs (~150 lines) ✅ Entry points
│ │ ├── pub(super) fn cf_block()
│ │ ├── pub(super) fn cf_if()
│ │ ├── pub(super) fn cf_loop()
│ │ ├── pub(super) fn cf_try_catch()
│ │ └── pub(super) fn cf_throw()
│ │
│ ├── debug.rs (~50 lines) ✅ Debug utilities
│ │ └── fn trace_varmap()
│ │
│ ├── utils.rs (~50 lines) ✅ Utility functions
│ │ └── fn extract_loop_variable_from_condition()
│ │
│ ├── joinir/
│ │ ├── mod.rs (~100 lines) ✅ JoinIR coordinator
│ │ │
│ │ ├── routing.rs (~150 lines) ✅ Routing logic
│ │ │ ├── fn try_cf_loop_joinir()
│ │ │ └── fn cf_loop_joinir_impl()
│ │ │
│ │ ├── merge/
│ │ │ ├── mod.rs (~100 lines) ✅ Merge coordinator
│ │ │ │ └── pub fn merge_joinir_mir_blocks()
│ │ │ │
│ │ │ ├── id_remapper.rs (~150 lines) ✅ ID remapping
│ │ │ │ ├── struct JoinIrIdRemapper
│ │ │ │ ├── fn create_remapper()
│ │ │ │ └── fn remap_ids()
│ │ │ │
│ │ │ ├── block_allocator.rs (~100 lines) ✅ Block allocation
│ │ │ │ └── fn allocate_blocks()
│ │ │ │
│ │ │ ├── value_collector.rs (~100 lines) ✅ Value collection
│ │ │ │ └── fn collect_values()
│ │ │ │
│ │ │ ├── instruction_rewriter.rs (~150 lines) ✅ Instruction rewriting
│ │ │ │ ├── fn rewrite_instructions()
│ │ │ │ └── fn convert_call_to_jump()
│ │ │ │
│ │ │ └── exit_phi_builder.rs (~100 lines) ✅ Exit PHI construction
│ │ │ └── fn build_exit_phi()
│ │ │
│ │ └── patterns/
│ │ ├── mod.rs (~50 lines) ✅ Pattern dispatcher
│ │ │ └── pub fn dispatch_pattern()
│ │ │
│ │ ├── pattern1_minimal.rs (~150 lines) ✅ Pattern 1 lowering
│ │ │ └── pub fn cf_loop_pattern1_minimal()
│ │ │
│ │ ├── pattern2_with_break.rs (~130 lines) ✅ Pattern 2 lowering
│ │ │ └── pub fn cf_loop_pattern2_with_break()
│ │ │
│ │ └── pattern3_with_if_phi.rs (~180 lines) ✅ Pattern 3 lowering
│ │ └── pub fn cf_loop_pattern3_with_if_phi()
│ │
│ └── exception/
│ ├── mod.rs (~50 lines) ✅ Exception API
│ │
│ ├── try_catch.rs (~150 lines) ✅ try/catch impl
│ │ └── pub fn cf_try_catch()
│ │
│ └── throw.rs (~30 lines) ✅ throw impl
│ └── pub fn cf_throw()
├── if_form.rs
├── loops.rs
└── ... (other files - unchanged)
```
### Metrics Comparison
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| **Total lines** | 1,632 | ~1,850 | +13% (for clarity) |
| **Number of files** | 1 | 19 | +1800% |
| **Largest file** | 1,632 | ~180 | -89% |
| **Average file size** | 1,632 | ~97 | -94% |
| **Functions per file** | 13 | 1-2 | Much clearer |
### Benefits
-**714-line merge function → 6 focused modules** (100-150 lines each)
-**Pattern lowerers isolated** → Easy to add Pattern 4/5/6
-**Debug traces easier to locate** → NYASH_OPTION_C_DEBUG output clearer
-**Merge conflicts reduced** → Changes isolated to specific files
-**Code navigation improved** → Jump to definition works better
---
## 2. generic_case_a.rs Modularization
### Before (Current State)
```
src/mir/join_ir/lowering/
├── generic_case_a.rs (1,056 lines) ⚠️ LARGE
├── generic_case_a_entry_builder.rs (4,828 bytes)
├── generic_case_a_whitespace_check.rs (4,552 bytes)
└── ... (other files)
```
### After (Proposed Structure)
```
src/mir/join_ir/lowering/
├── generic_case_a/
│ ├── mod.rs (~100 lines) ✅ Public API
│ │ ├── pub use skip_ws::lower_case_a_skip_ws_with_scope
│ │ ├── pub use trim::lower_case_a_trim_with_scope
│ │ ├── pub use append_defs::lower_case_a_append_defs_with_scope
│ │ └── pub use stage1_using_resolver::...
│ │
│ ├── skip_ws.rs (~220 lines) ✅ skip_ws lowerer
│ │ ├── pub fn lower_case_a_skip_ws_with_scope()
│ │ └── fn lower_case_a_skip_ws_core()
│ │
│ ├── trim.rs (~500 lines) ✅ trim lowerer
│ │ ├── pub fn lower_case_a_trim_with_scope()
│ │ └── fn lower_case_a_trim_core()
│ │
│ ├── append_defs.rs (~170 lines) ✅ append_defs lowerer
│ │ ├── pub fn lower_case_a_append_defs_with_scope()
│ │ └── fn lower_case_a_append_defs_core()
│ │
│ ├── stage1_using_resolver.rs (~180 lines) ✅ stage1 lowerer
│ │ ├── pub fn lower_case_a_stage1_usingresolver_with_scope()
│ │ └── fn lower_case_a_stage1_usingresolver_core()
│ │
│ ├── entry_builder.rs (~150 lines) ✅ (moved from parent)
│ │ └── struct EntryFunctionBuilder
│ │
│ └── whitespace_check.rs (~150 lines) ✅ (moved from parent)
│ └── fn check_whitespace()
└── ... (other files - unchanged)
```
### Metrics Comparison
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| **Total lines** | 1,056 | ~1,470 | +39% (for clarity) |
| **Number of files** | 3 (scattered) | 7 (organized) | +133% |
| **Largest file** | 1,056 | ~500 | -53% |
| **Average file size** | 352 | ~210 | -40% |
| **Functions per file** | 4 | 1-2 | Much clearer |
### Benefits
-**Each lowerer in its own file** → Easy to maintain
-**Companion files integrated** → All Case A logic in one directory
-**trim.rs still large (500 lines)** → Could be further split if needed
-**Clear public API** → mod.rs shows what's exported
---
## 3. loopform_builder.rs Modularization
### Before (Current State - After Phase 191)
```
src/mir/phi_core/
├── loopform_builder.rs (1,166 lines) ⚠️ LARGE
├── loopform_context.rs (✅ already modularized)
├── loopform_variable_models.rs (✅ already modularized)
├── loopform_utils.rs (✅ already modularized)
├── loopform_exit_phi.rs (✅ already modularized)
└── ... (other files)
```
### After (Proposed Structure)
```
src/mir/phi_core/
├── loopform/
│ ├── mod.rs (~100 lines) ✅ Public API
│ │ ├── pub use context::LoopFormContext
│ │ ├── pub use variable_models::{CarrierVariable, PinnedVariable}
│ │ ├── pub use exit_phi::build_exit_phis_for_control
│ │ └── pub use builder_core::LoopFormBuilder
│ │
│ ├── context.rs (~150 lines) ✅ (existing)
│ │ └── pub struct LoopFormContext
│ │
│ ├── variable_models.rs (~150 lines) ✅ (existing)
│ │ ├── pub struct CarrierVariable
│ │ ├── pub struct PinnedVariable
│ │ └── pub struct LoopBypassFlags
│ │
│ ├── utils.rs (~100 lines) ✅ (existing)
│ │ ├── pub fn is_loopform_debug_enabled()
│ │ └── pub fn get_loop_bypass_flags()
│ │
│ ├── exit_phi.rs (~150 lines) ✅ (existing)
│ │ └── pub fn build_exit_phis_for_control()
│ │
│ ├── passes/
│ │ ├── mod.rs (~50 lines) ✅ 4-pass coordinator
│ │ │ └── pub fn run_4_pass_architecture()
│ │ │
│ │ ├── pass1_discovery.rs (~150 lines) ✅ Variable discovery
│ │ │ └── pub fn discover_variables()
│ │ │
│ │ ├── pass2_header_phi.rs (~150 lines) ✅ Header PHI
│ │ │ └── pub fn build_header_phi()
│ │ │
│ │ ├── pass3_latch.rs (~100 lines) ✅ Latch processing
│ │ │ └── pub fn process_latch()
│ │ │
│ │ └── pass4_exit_phi.rs (~150 lines) ✅ Exit PHI
│ │ └── pub fn build_exit_phi()
│ │
│ └── builder_core.rs (~200 lines) ✅ Core builder
│ └── pub struct LoopFormBuilder
└── ... (other files - unchanged)
```
### Metrics Comparison
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| **Total lines** | 1,166 (main file) | ~1,450 | +24% (for clarity) |
| **Number of files** | 5 (partially modularized) | 11 (fully modularized) | +120% |
| **Largest file** | 1,166 | ~200 | -83% |
| **Average file size** | 233 | ~132 | -43% |
### Benefits
-**Completes Phase 191 modularization** → Finishes what was started
-**4-pass architecture explicit** → Each pass in its own file
-**Already partially modularized** → Lower risk than control_flow.rs
-**Clear separation of concerns** → Context, models, passes, builder
---
## File Size Distribution Comparison
### control_flow.rs
```
Before:
█████████████████████████████████████████████████████████████████ 1,632 lines
After:
merge/instruction_rewriter.rs: ███████ 150 lines
pattern1_minimal.rs: ███████ 150 lines
routing.rs: ███████ 150 lines
pattern3_with_if_phi.rs: ████████ 180 lines
try_catch.rs: ███████ 150 lines
... (14 more files < 150 lines each)
```
### generic_case_a.rs
```
Before:
███████████████████████████████████████████████████ 1,056 lines
After:
trim.rs: ████████████████████████ 500 lines
skip_ws.rs: ██████████ 220 lines
stage1_using_resolver.rs: ████████ 180 lines
append_defs.rs: ████████ 170 lines
... (3 more files < 150 lines each)
```
### loopform_builder.rs
```
Before:
████████████████████████████████████████████████████████ 1,166 lines
After:
builder_core.rs: █████████ 200 lines
context.rs: ███████ 150 lines
variable_models.rs: ███████ 150 lines
exit_phi.rs: ███████ 150 lines
pass1_discovery.rs: ███████ 150 lines
... (6 more files < 150 lines each)
```
---
## Import Path Changes
### control_flow.rs
#### Before
```rust
use crate::mir::builder::MirBuilder;
impl MirBuilder {
pub(super) fn cf_loop(...) -> Result<ValueId, String> {
// 1,632 lines of code
}
}
```
#### After
```rust
// src/mir/builder/control_flow/mod.rs
use crate::mir::builder::MirBuilder;
impl MirBuilder {
pub(super) fn cf_loop(...) -> Result<ValueId, String> {
// Delegates to routing module
joinir::routing::try_cf_loop_joinir(self, ...)
}
}
// src/mir/builder/control_flow/joinir/routing.rs
pub(in crate::mir::builder::control_flow) fn try_cf_loop_joinir(
builder: &mut MirBuilder,
...
) -> Result<Option<ValueId>, String> {
// 150 lines of focused code
}
```
### generic_case_a.rs
#### Before
```rust
// src/mir/join_ir/lowering/generic_case_a.rs
pub(crate) fn lower_case_a_skip_ws_with_scope(...) -> Option<JoinModule> {
// 200+ lines
}
```
#### After
```rust
// src/mir/join_ir/lowering/generic_case_a/mod.rs
pub(crate) use skip_ws::lower_case_a_skip_ws_with_scope;
// src/mir/join_ir/lowering/generic_case_a/skip_ws.rs
pub(crate) fn lower_case_a_skip_ws_with_scope(...) -> Option<JoinModule> {
// 220 lines
}
```
### loopform_builder.rs
#### Before
```rust
// src/mir/phi_core/loopform_builder.rs
pub use loopform_context::LoopFormContext;
pub fn build_exit_phis_for_control<O: LoopFormOps>(...) {
// 1,166 lines
}
```
#### After
```rust
// src/mir/phi_core/loopform/mod.rs
pub use context::LoopFormContext;
pub use exit_phi::build_exit_phis_for_control;
// src/mir/phi_core/loopform/exit_phi.rs
pub fn build_exit_phis_for_control<O: LoopFormOps>(...) {
// 150 lines
}
```
---
## Navigation Improvements
### Before (Monolith Files)
```
Developer: "Where is the merge function?"
Answer: "control_flow.rs, line 864-1578 (search through 1,632 lines)"
Developer: "Where is Pattern 3 lowering?"
Answer: "control_flow.rs, line 696-863 (search through 1,632 lines)"
Developer: "What does merge_joinir_mir_blocks do?"
Answer: "Read 714 lines to understand"
```
### After (Modularized)
```
Developer: "Where is the merge function?"
Answer: "control_flow/joinir/merge/mod.rs (100 lines coordinator)"
Developer: "Where is Pattern 3 lowering?"
Answer: "control_flow/joinir/patterns/pattern3_with_if_phi.rs (180 lines)"
Developer: "What does merge_joinir_mir_blocks do?"
Answer: "Read merge/mod.rs (100 lines) → delegates to 6 sub-modules"
```
---
## Conclusion
The modularization dramatically improves code organization:
- **control_flow.rs**: 1,632 lines → 19 files (avg 97 lines)
- **generic_case_a.rs**: 1,056 lines → 7 files (avg 210 lines)
- **loopform_builder.rs**: 1,166 lines → 11 files (avg 132 lines)
**Total Impact**: 3,854 lines → 37 focused modules
**Developer Experience**:
- ✅ Easier navigation (jump to definition works better)
- ✅ Clearer separation of concerns
- ✅ Less merge conflicts
- ✅ Easier to add new patterns/lowerers
- ✅ Better code review experience (smaller diffs)
---
**Last Updated**: 2025-12-05

View File

@ -0,0 +1,960 @@
# Modularization Implementation Plan for Large Source Files
## Executive Summary
This plan details the modularization of three oversized source files in the Nyash MIR builder:
- **control_flow.rs** (1,632 lines) - **HIGHEST PRIORITY**
- **generic_case_a.rs** (1,056 lines) - **MEDIUM PRIORITY**
- **loopform_builder.rs** (1,166 lines) - **LOWER PRIORITY**
The strategy prioritizes **control_flow.rs** first due to its critical role in JoinIR integration and ongoing Pattern 4+ development. Each modularization is broken into incremental phases that maintain backward compatibility, with clear verification points and rollback procedures.
**Total Estimated Effort**: 15-20 hours across 2-3 weeks
**Key Principle**: Zero breaking changes, backward compatible at every phase
---
## Why control_flow.rs First?
**control_flow.rs** is the integration point for JoinIR lowering patterns, actively being extended for Pattern 4+. Modularizing now:
1. **Prevents future pain** - Pattern 4/5/6 would add another 500+ lines to an already massive file
2. **Sets the pattern** - Establishes the modularization template for other files
3. **Reduces merge conflicts** - Isolates pattern-specific changes to dedicated files
4. **Improves debuggability** - `NYASH_OPTION_C_DEBUG` traces become easier to locate
5. **Currently blocking** - The 714-line `merge_joinir_mir_blocks()` function is a code smell that makes maintenance difficult
---
# 1. control_flow.rs Modularization (HIGHEST PRIORITY)
## Current State
**File**: `src/mir/builder/control_flow.rs` (1,632 lines)
**Functions** (13 total):
- `trace_varmap()` (8 lines) - Debug utility
- `cf_block()` (4 lines) - Block entry point
- `cf_if()` (10 lines) - If entry point
- `cf_loop()` (80 lines) - Loop entry point
- `try_cf_loop_joinir()` (91 lines) - JoinIR routing logic
- `cf_loop_joinir_impl()` (247 lines) - JoinIR pattern dispatcher
- `cf_loop_pattern1_minimal()` (143 lines) - Pattern 1 lowering
- `cf_loop_pattern2_with_break()` (120 lines) - Pattern 2 lowering
- `cf_loop_pattern3_with_if_phi()` (168 lines) - Pattern 3 lowering
- **`merge_joinir_mir_blocks()` (714 lines) - LARGEST FUNCTION** ⚠️
- `cf_try_catch()` (138 lines) - Exception handling
- `extract_loop_variable_from_condition()` (31 lines) - Utility
- `cf_throw()` (23 lines) - Throw entry point
**Key Issues**:
- `merge_joinir_mir_blocks()` is 714 lines (44% of the file!)
- Pattern lowerers (1/2/3) are isolated but scattered
- JoinIR integration logic mixed with entry points
- No clear separation between routing and implementation
---
## Proposed New Structure
```
src/mir/builder/control_flow/
├── mod.rs (~150 lines) - Public API, entry points
├── debug.rs (~50 lines) - Debug utilities (trace_varmap)
├── joinir/
│ ├── mod.rs (~100 lines) - JoinIR integration coordinator
│ ├── routing.rs (~150 lines) - try_cf_loop_joinir, dispatcher
│ ├── merge/
│ │ ├── mod.rs (~100 lines) - merge_joinir_mir_blocks entry point
│ │ ├── id_remapper.rs (~150 lines) - ValueId/BlockId remapping
│ │ ├── block_allocator.rs (~100 lines) - Block ID allocation
│ │ ├── value_collector.rs (~100 lines) - Value collection phase
│ │ ├── instruction_rewriter.rs (~150 lines) - Instruction rewriting
│ │ └── exit_phi_builder.rs (~100 lines) - Exit PHI construction
│ └── patterns/
│ ├── mod.rs (~50 lines) - Pattern dispatcher
│ ├── pattern1_minimal.rs (~150 lines) - Pattern 1 lowering
│ ├── pattern2_with_break.rs (~130 lines) - Pattern 2 lowering
│ └── pattern3_with_if_phi.rs (~180 lines) - Pattern 3 lowering
├── exception/
│ ├── mod.rs (~50 lines) - Exception handling API
│ ├── try_catch.rs (~150 lines) - try/catch implementation
│ └── throw.rs (~30 lines) - throw implementation
└── utils.rs (~50 lines) - extract_loop_variable, etc.
Total: ~1,850 lines (13% increase for clarity, but distributed across 19 files)
Average file size: ~97 lines (vs 1,632 lines monolith)
```
---
## Phase-by-Phase Migration Plan
### Phase 1: Extract Debug Utilities (30 min)
**Goal**: Move `trace_varmap()` to a dedicated debug module.
**Steps**:
1. Create `src/mir/builder/control_flow/debug.rs`
2. Move `trace_varmap()` implementation
3. Update `mod.rs` to re-export `pub(super) use debug::*;`
4. Run verification
**Files Created**:
- `src/mir/builder/control_flow/debug.rs` (~50 lines)
**Files Modified**:
- `src/mir/builder/control_flow.rs``.../control_flow/mod.rs`
**Verification**:
```bash
cargo build --release
cargo test --lib
tools/smokes/v2/run.sh --profile quick --filter "loop_*"
```
**Rollback**: Delete `debug.rs`, revert imports in `mod.rs`
**Estimated Effort**: 30 minutes
---
### Phase 2: Extract Pattern Lowerers (2 hours)
**Goal**: Move Pattern 1/2/3 lowering functions to dedicated files.
**Steps**:
1. Create `src/mir/builder/control_flow/joinir/patterns/` directory
2. Create `patterns/mod.rs` with dispatcher
3. Move `cf_loop_pattern1_minimal()` to `pattern1_minimal.rs`
4. Move `cf_loop_pattern2_with_break()` to `pattern2_with_break.rs`
5. Move `cf_loop_pattern3_with_if_phi()` to `pattern3_with_if_phi.rs`
6. Update imports in `mod.rs`
7. Run verification
**Files Created**:
- `control_flow/joinir/patterns/mod.rs` (~50 lines)
- `control_flow/joinir/patterns/pattern1_minimal.rs` (~150 lines)
- `control_flow/joinir/patterns/pattern2_with_break.rs` (~130 lines)
- `control_flow/joinir/patterns/pattern3_with_if_phi.rs` (~180 lines)
**Public API Changes**: None (all functions are already `fn`, not `pub(super) fn`)
**Verification**:
```bash
cargo build --release
cargo test --lib -- --include-ignored
NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako
tools/smokes/v2/run.sh --profile quick --filter "loop_*"
```
**Rollback**: Delete `patterns/` directory, revert imports
**Estimated Effort**: 2 hours
---
### Phase 3: Extract JoinIR Routing Logic (1.5 hours)
**Goal**: Move `try_cf_loop_joinir()` and `cf_loop_joinir_impl()` to routing module.
**Steps**:
1. Create `control_flow/joinir/routing.rs`
2. Move `try_cf_loop_joinir()` implementation
3. Move `cf_loop_joinir_impl()` implementation
4. Update imports in `mod.rs`
5. Run verification
**Files Created**:
- `control_flow/joinir/routing.rs` (~150 lines)
**Files Modified**:
- `control_flow/mod.rs` (update imports)
**Verification**:
```bash
cargo build --release
cargo test --release
HAKO_JOINIR_PRINT_TOKENS_MAIN=1 ./target/release/nyash test_program.hako
```
**Rollback**: Delete `routing.rs`, revert imports
**Estimated Effort**: 1.5 hours
---
### Phase 4: Break Down merge_joinir_mir_blocks (6 hours) ⚠️ **CRITICAL**
**Goal**: Split the 714-line monster function into 6 logical modules.
**Background**: `merge_joinir_mir_blocks()` performs 6 distinct phases:
1. **Block ID allocation** (lines 864-923)
2. **Value collection** (lines 931-971)
3. **Block merging** (lines 973-1100)
4. **Instruction rewriting** (lines 1102-1400)
5. **Exit PHI construction** (lines 1402-1500)
6. **Boundary reconnection** (lines 1502-1578)
**Steps**:
1. Create `control_flow/joinir/merge/` directory structure
2. Extract `id_remapper.rs` - ID remapping utilities
3. Extract `block_allocator.rs` - Block ID allocation logic
4. Extract `value_collector.rs` - Value collection phase
5. Extract `instruction_rewriter.rs` - Instruction transformation
6. Extract `exit_phi_builder.rs` - Exit PHI construction
7. Create `merge/mod.rs` as the coordinator
8. Update imports
9. Run comprehensive verification
**Files Created**:
- `control_flow/joinir/merge/mod.rs` (~100 lines) - Coordinator
- `control_flow/joinir/merge/id_remapper.rs` (~150 lines)
- `control_flow/joinir/merge/block_allocator.rs` (~100 lines)
- `control_flow/joinir/merge/value_collector.rs` (~100 lines)
- `control_flow/joinir/merge/instruction_rewriter.rs` (~150 lines)
- `control_flow/joinir/merge/exit_phi_builder.rs` (~100 lines)
**Public API**:
```rust
// control_flow/joinir/merge/mod.rs
pub(in crate::mir::builder) fn merge_joinir_mir_blocks(
builder: &mut MirBuilder,
mir_module: &MirModule,
boundary: Option<&JoinInlineBoundary>,
debug: bool,
) -> Result<Option<ValueId>, String> {
// Coordinator that calls the 6 sub-modules
let remapper = id_remapper::create_remapper(builder, mir_module, debug)?;
let values = value_collector::collect_values(mir_module, &remapper, debug)?;
// ... etc
}
```
**Verification** (CRITICAL - run ALL tests):
```bash
# Step 1: Build verification
cargo build --release
cargo test --lib
# Step 2: Smoke tests (ALL patterns)
tools/smokes/v2/run.sh --profile quick
# Step 3: Debug trace verification
NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir"
# Step 4: Regression check (run 3 times for determinism)
for i in 1 2 3; do
echo "=== Run $i ==="
cargo test --release test_loop_patterns 2>&1 | grep "test result"
done
# Step 5: Full integration test
cargo test --release --all-features
```
**Rollback**:
```bash
# If anything breaks, immediately:
rm -rf src/mir/builder/control_flow/joinir/merge
git checkout src/mir/builder/control_flow.rs
cargo build --release && cargo test
```
**Risk Mitigation**:
- Keep the original `merge_joinir_mir_blocks()` as a comment at the top of `merge/mod.rs`
- Add `#[cfg(test)]` unit tests for each sub-module
- Use feature flag `NYASH_USE_LEGACY_MERGE=1` for emergency fallback (optional)
**Estimated Effort**: 6 hours (most complex phase)
---
### Phase 5: Extract Exception Handling (1 hour)
**Goal**: Move `cf_try_catch()` and `cf_throw()` to exception module.
**Steps**:
1. Create `control_flow/exception/` directory
2. Move `cf_try_catch()` to `try_catch.rs`
3. Move `cf_throw()` to `throw.rs`
4. Create `exception/mod.rs` as coordinator
5. Update imports
6. Run verification
**Files Created**:
- `control_flow/exception/mod.rs` (~50 lines)
- `control_flow/exception/try_catch.rs` (~150 lines)
- `control_flow/exception/throw.rs` (~30 lines)
**Verification**:
```bash
cargo build --release
cargo test --lib -- exception
```
**Rollback**: Delete `exception/` directory, revert imports
**Estimated Effort**: 1 hour
---
### Phase 6: Extract Utilities (30 min)
**Goal**: Move `extract_loop_variable_from_condition()` to utils module.
**Steps**:
1. Create `control_flow/utils.rs`
2. Move utility functions
3. Update imports
4. Run verification
**Files Created**:
- `control_flow/utils.rs` (~50 lines)
**Verification**:
```bash
cargo build --release
cargo test --lib
```
**Rollback**: Delete `utils.rs`, revert imports
**Estimated Effort**: 30 minutes
---
### Phase 7: Final Cleanup & Documentation (1 hour)
**Goal**: Clean up `mod.rs`, add module documentation, verify all imports.
**Steps**:
1. Review `control_flow/mod.rs` for clarity
2. Add module-level documentation to each file
3. Ensure all `pub(super)` visibility is correct
4. Run final comprehensive verification
5. Update CLAUDE.md with new structure
**Documentation Template**:
```rust
//! Pattern 1 Minimal Loop Lowering
//!
//! This module implements the simplest JoinIR loop lowering pattern:
//! - Single loop variable
//! - No break statements
//! - Simple condition (i < N)
//!
//! Used by: minimal_ssa_skip_ws, simple while loops
//! Phase: 188 (Pattern 1 implementation)
```
**Verification**:
```bash
cargo build --release --all-features
cargo test --release
cargo clippy --all-targets
tools/smokes/v2/run.sh --profile integration
```
**Estimated Effort**: 1 hour
---
## Public API Changes
### Before (control_flow.rs)
```rust
impl MirBuilder {
pub(super) fn cf_block(...) -> Result<ValueId, String>
pub(super) fn cf_if(...) -> Result<ValueId, String>
pub(super) fn cf_loop(...) -> Result<ValueId, String>
pub(super) fn cf_try_catch(...) -> Result<ValueId, String>
pub(super) fn cf_throw(...) -> Result<ValueId, String>
}
```
### After (control_flow/mod.rs)
```rust
// Re-export all entry points (NO CHANGE to public API)
pub(super) use entry_points::{cf_block, cf_if, cf_loop, cf_try_catch, cf_throw};
// Internal modules (not exposed outside control_flow)
mod debug;
mod utils;
mod joinir;
mod exception;
mod entry_points;
```
**Guarantee**: Zero breaking changes - all `pub(super)` functions remain accessible from `MirBuilder`.
---
## Build Verification Strategy
### After Each Phase:
```bash
# 1. Compilation check
cargo build --release
echo "Build: $?" >> /tmp/modularization_log.txt
# 2. Unit tests
cargo test --lib
echo "Unit tests: $?" >> /tmp/modularization_log.txt
# 3. Integration tests
cargo test --release
echo "Integration tests: $?" >> /tmp/modularization_log.txt
# 4. Smoke tests (only for Phases 2-4)
tools/smokes/v2/run.sh --profile quick --filter "loop_*"
echo "Smoke tests: $?" >> /tmp/modularization_log.txt
# 5. Debug trace verification (for Phase 4)
NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | tee /tmp/debug_trace.txt
grep -q "merge_joinir" /tmp/debug_trace.txt && echo "Debug trace: OK"
```
### Failure Handling:
```bash
# If any step fails, STOP immediately
if [ $? -ne 0 ]; then
echo "FAILURE detected at Phase $PHASE"
git status
echo "Run rollback procedure? (y/n)"
# ... rollback steps
fi
```
---
## Risk Assessment
| Risk | Likelihood | Impact | Mitigation | Detection |
|------|------------|--------|------------|-----------|
| **Breaking imports** | Medium | High | Incremental phases, test after each | `cargo build` fails |
| **Merge function breakage** | Low | Critical | Keep original as comment, feature flag | Smoke tests fail |
| **Performance regression** | Very Low | Medium | No algorithmic changes | Benchmark before/after |
| **Debug trace changes** | Low | Low | Verify `NYASH_OPTION_C_DEBUG` output | Manual inspection |
| **HashMap non-determinism** | Very Low | Low | Already using BTreeMap in critical paths | Run tests 3x |
### Critical Mitigations:
1. **Phase 4 (merge function)**: Keep original function as comment
2. **All phases**: Commit after each successful phase
3. **Rollback plan**: Document exact `git checkout` commands for each file
---
## Implementation Effort Breakdown
| Phase | Description | Effort | Risk |
|-------|-------------|--------|------|
| Phase 1 | Debug utilities | 30 min | Low |
| Phase 2 | Pattern lowerers | 2 hours | Low |
| Phase 3 | JoinIR routing | 1.5 hours | Low |
| **Phase 4** | **merge_joinir_mir_blocks** | **6 hours** | **Medium** |
| Phase 5 | Exception handling | 1 hour | Low |
| Phase 6 | Utilities | 30 min | Low |
| Phase 7 | Cleanup & docs | 1 hour | Low |
| **Total** | **control_flow.rs** | **12.5 hours** | - |
**Recommended Schedule**:
- **Week 1**: Phases 1-3 (4 hours total) - Low risk warmup
- **Week 2**: Phase 4 (6 hours) - Dedicated time for merge function
- **Week 3**: Phases 5-7 (2.5 hours) - Final cleanup
---
## Success Criteria
- ✅ All 267 tests pass (no regressions)
- ✅ Build time ≤ current (no increase)
-`control_flow/mod.rs` is < 200 lines (88% reduction)
- Largest single file is < 200 lines (vs 714 lines before)
- Debug traces still work (`NYASH_OPTION_C_DEBUG=1`)
- Smoke tests pass for all patterns (1/2/3)
- No HashMap non-determinism introduced
- Code is easier to navigate (measured by developer feedback)
---
# 2. generic_case_a.rs Modularization (MEDIUM PRIORITY)
## Current State
**File**: `src/mir/join_ir/lowering/generic_case_a.rs` (1,056 lines)
**Functions** (4 public lowerers):
- `lower_case_a_skip_ws_with_scope()` + `lower_case_a_skip_ws_core()` (~203 lines)
- `lower_case_a_trim_with_scope()` + `lower_case_a_trim_core()` (~479 lines)
- `lower_case_a_append_defs_with_scope()` + `lower_case_a_append_defs_core()` (~156 lines)
- `lower_case_a_stage1_usingresolver_with_scope()` + `lower_case_a_stage1_usingresolver_core()` (~167 lines)
**Key Observation**: This file already has companion files:
- `generic_case_a_entry_builder.rs` (4,828 bytes)
- `generic_case_a_whitespace_check.rs` (4,552 bytes)
**Strategy**: Complete the modularization pattern by splitting the 4 lowerers into separate files.
---
## Proposed New Structure
```
src/mir/join_ir/lowering/generic_case_a/
├── mod.rs (~100 lines) - Public API
├── skip_ws.rs (~220 lines) - skip_ws lowerer
├── trim.rs (~500 lines) - trim lowerer
├── append_defs.rs (~170 lines) - append_defs lowerer
├── stage1_using_resolver.rs (~180 lines) - stage1 using resolver
├── entry_builder.rs (~150 lines) - (moved from parent)
└── whitespace_check.rs (~150 lines) - (moved from parent)
Total: ~1,470 lines (39% increase for clarity, distributed across 7 files)
Average: ~210 lines per file
```
---
## Phase-by-Phase Migration Plan
### Phase 1: Create Directory Structure (15 min)
**Steps**:
1. Create `src/mir/join_ir/lowering/generic_case_a/` directory
2. Create `mod.rs` with public API exports
3. Move existing companion files into directory
4. Update parent `mod.rs` imports
5. Run verification
**Verification**:
```bash
cargo build --release
```
**Rollback**: Delete directory, revert parent `mod.rs`
**Estimated Effort**: 15 minutes
---
### Phase 2: Extract skip_ws Lowerer (45 min)
**Steps**:
1. Create `generic_case_a/skip_ws.rs`
2. Move `lower_case_a_skip_ws_with_scope()` and `_core()`
3. Add module documentation
4. Update `mod.rs` imports
5. Run verification
**Verification**:
```bash
cargo build --release
cargo test --lib -- skip_ws
```
**Estimated Effort**: 45 minutes
---
### Phase 3: Extract trim Lowerer (1 hour)
**Steps**:
1. Create `generic_case_a/trim.rs`
2. Move `lower_case_a_trim_with_scope()` and `_core()`
3. Add documentation
4. Update imports
5. Run verification
**Verification**:
```bash
cargo build --release
cargo test --lib -- trim
```
**Estimated Effort**: 1 hour
---
### Phase 4: Extract append_defs & stage1 Lowerers (1 hour)
**Steps**:
1. Create `generic_case_a/append_defs.rs`
2. Create `generic_case_a/stage1_using_resolver.rs`
3. Move respective functions
4. Add documentation
5. Update imports
6. Run verification
**Verification**:
```bash
cargo build --release
cargo test --release
tools/smokes/v2/run.sh --profile quick --filter "funcscanner_*"
```
**Estimated Effort**: 1 hour
---
### Phase 5: Final Cleanup (30 min)
**Steps**:
1. Add module-level documentation
2. Verify all imports are clean
3. Run comprehensive tests
4. Update documentation
**Verification**:
```bash
cargo build --release --all-features
cargo test --release
```
**Estimated Effort**: 30 minutes
---
## Public API Changes
### Before
```rust
// src/mir/join_ir/lowering/generic_case_a.rs
pub(crate) fn lower_case_a_skip_ws_with_scope(...) -> Option<JoinModule>
pub(crate) fn lower_case_a_trim_with_scope(...) -> Option<JoinModule>
// ... etc
```
### After
```rust
// src/mir/join_ir/lowering/generic_case_a/mod.rs
pub(crate) use skip_ws::lower_case_a_skip_ws_with_scope;
pub(crate) use trim::lower_case_a_trim_with_scope;
pub(crate) use append_defs::lower_case_a_append_defs_with_scope;
pub(crate) use stage1_using_resolver::lower_case_a_stage1_usingresolver_with_scope;
```
**Guarantee**: Zero breaking changes - all `pub(crate)` functions remain accessible.
---
## Implementation Effort Breakdown
| Phase | Description | Effort | Risk |
|-------|-------------|--------|------|
| Phase 1 | Directory setup | 15 min | Low |
| Phase 2 | skip_ws lowerer | 45 min | Low |
| Phase 3 | trim lowerer | 1 hour | Low |
| Phase 4 | append_defs & stage1 | 1 hour | Low |
| Phase 5 | Cleanup | 30 min | Low |
| **Total** | **generic_case_a.rs** | **3.5 hours** | - |
---
## Success Criteria
- All tests pass
- `generic_case_a/mod.rs` is < 150 lines
- Each lowerer is in a dedicated file
- Companion files integrated into directory
- Documentation added to all modules
---
# 3. loopform_builder.rs Modularization (LOWER PRIORITY)
## Current State
**File**: `src/mir/phi_core/loopform_builder.rs` (1,166 lines)
**Status**: Already partially modularized in Phase 191!
**Existing Structure** (Phase 191):
```
src/mir/phi_core/
├── loopform_builder.rs (1,166 lines) - Main coordinator
├── loopform_context.rs - ValueId management
├── loopform_variable_models.rs - CarrierVariable, PinnedVariable
├── loopform_utils.rs - Debug and bypass utilities
├── loopform_exit_phi.rs - Exit PHI builder
```
**Remaining Work**: The main `loopform_builder.rs` still contains implementation logic that should be moved to dedicated modules.
---
## Proposed New Structure
```
src/mir/phi_core/loopform/
├── mod.rs (~100 lines) - Public API
├── context.rs (~150 lines) - (existing loopform_context.rs)
├── variable_models.rs (~150 lines) - (existing loopform_variable_models.rs)
├── utils.rs (~100 lines) - (existing loopform_utils.rs)
├── exit_phi.rs (~150 lines) - (existing loopform_exit_phi.rs)
├── passes/
│ ├── mod.rs (~50 lines) - 4-pass architecture coordinator
│ ├── pass1_discovery.rs (~150 lines) - Variable discovery
│ ├── pass2_header_phi.rs (~150 lines) - Header PHI construction
│ ├── pass3_latch.rs (~100 lines) - Latch block processing
│ └── pass4_exit_phi.rs (~150 lines) - Exit PHI construction
└── builder_core.rs (~200 lines) - Core builder logic
Total: ~1,450 lines (24% increase for clarity, distributed across 11 files)
Average: ~132 lines per file
```
---
## Phase-by-Phase Migration Plan
### Phase 1: Directory Structure (30 min)
**Steps**:
1. Create `src/mir/phi_core/loopform/` directory
2. Move existing modular files into directory
3. Create `mod.rs` with re-exports
4. Update parent `mod.rs` imports
5. Run verification
**Verification**:
```bash
cargo build --release
cargo test --lib -- loopform
```
**Estimated Effort**: 30 minutes
---
### Phase 2: Extract 4-Pass Architecture (2 hours)
**Steps**:
1. Create `loopform/passes/` directory
2. Identify the 4 passes in `loopform_builder.rs`
3. Extract each pass to dedicated file
4. Create `passes/mod.rs` as coordinator
5. Update imports
6. Run verification
**Verification**:
```bash
cargo build --release
cargo test --release -- loopform
NYASH_LOOPFORM_DEBUG=1 ./target/release/nyash test_loop.hako
```
**Estimated Effort**: 2 hours
---
### Phase 3: Extract Core Builder Logic (1 hour)
**Steps**:
1. Create `loopform/builder_core.rs`
2. Move remaining builder logic
3. Update imports
4. Run verification
**Verification**:
```bash
cargo build --release
cargo test --release
```
**Estimated Effort**: 1 hour
---
### Phase 4: Final Cleanup (30 min)
**Steps**:
1. Add module documentation
2. Verify all re-exports
3. Run comprehensive tests
**Verification**:
```bash
cargo build --release --all-features
cargo test --release
tools/smokes/v2/run.sh --profile quick --filter "phi_*"
```
**Estimated Effort**: 30 minutes
---
## Public API Changes
### Before
```rust
// src/mir/phi_core/loopform_builder.rs
pub use loopform_context::LoopFormContext;
pub use loopform_variable_models::{CarrierVariable, PinnedVariable};
pub fn build_exit_phis_for_control<O: LoopFormOps>(...)
```
### After
```rust
// src/mir/phi_core/loopform/mod.rs
pub use context::LoopFormContext;
pub use variable_models::{CarrierVariable, PinnedVariable};
pub use exit_phi::build_exit_phis_for_control;
```
**Guarantee**: Zero breaking changes.
---
## Implementation Effort Breakdown
| Phase | Description | Effort | Risk |
|-------|-------------|--------|------|
| Phase 1 | Directory setup | 30 min | Low |
| Phase 2 | 4-pass extraction | 2 hours | Medium |
| Phase 3 | Core builder | 1 hour | Low |
| Phase 4 | Cleanup | 30 min | Low |
| **Total** | **loopform_builder.rs** | **4 hours** | - |
---
## Success Criteria
- All tests pass
- `loopform/mod.rs` is < 150 lines
- Each pass is in a dedicated file
- Existing modular files integrated
- Documentation added
---
# Implementation Order & Timeline
## Recommended Schedule (3 weeks)
### Week 1: control_flow.rs Phases 1-3 (Low Risk)
- **Monday**: Phase 1 (Debug utilities) - 30 min
- **Tuesday**: Phase 2 (Pattern lowerers) - 2 hours
- **Wednesday**: Phase 3 (JoinIR routing) - 1.5 hours
- **Verification**: Run full smoke tests at end of week
### Week 2: control_flow.rs Phase 4 (High Risk)
- **Monday-Tuesday**: Phase 4 (merge_joinir_mir_blocks) - 6 hours
- **Wednesday**: Buffer day for fixing any issues
- **Thursday-Friday**: Phases 5-7 (Exception, utilities, cleanup) - 2.5 hours
### Week 3: generic_case_a.rs (Optional)
- **Monday-Tuesday**: generic_case_a.rs Phases 1-5 - 3.5 hours
- **Wednesday**: Buffer
- **Thursday-Friday**: Documentation & final verification
### Future (After Pattern 4+): loopform_builder.rs
- **Timing**: After Pattern 4/5/6 development stabilizes
- **Effort**: 4 hours
- **Priority**: Lower (already partially modularized)
---
## Effort Summary
| File | Total Effort | Priority | Complexity | Blocking? |
|------|--------------|----------|------------|-----------|
| **control_flow.rs** | **12.5 hours** | **HIGHEST** | **High** | **Yes** (Pattern 4+) |
| **generic_case_a.rs** | **3.5 hours** | **MEDIUM** | **Low** | No |
| **loopform_builder.rs** | **4 hours** | **LOWER** | **Medium** | No |
| **TOTAL** | **20 hours** | - | - | - |
---
## Risk/Mitigation Matrix
| Risk | Likelihood | Impact | Mitigation | Detection Method |
|------|------------|--------|------------|------------------|
| **Breaking imports** | Medium | High | Incremental phases, test after each | `cargo build` fails |
| **Merge function breakage** | Low | Critical | Keep original as comment, feature flag | Smoke tests fail |
| **Pattern lowerer breakage** | Low | High | Test each pattern independently | Integration tests |
| **Performance regression** | Very Low | Medium | No algorithmic changes | Benchmark suite |
| **Debug trace changes** | Low | Low | Verify `NYASH_OPTION_C_DEBUG` output | Manual inspection |
| **Test failures** | Low | Medium | Run tests after every phase | CI/CD pipeline |
| **Merge conflicts** | Medium | Low | Work on dedicated branch | Git status |
---
## Success Criteria (Global)
### Quantitative:
- All 267+ tests pass (no regressions)
- Build time current (no increase)
- Largest single file in modularized areas is < 250 lines
- Average file size in modularized areas is < 150 lines
### Qualitative:
- Code is easier to navigate (developer feedback)
- New patterns can be added without modifying 1,600-line files
- Debug traces remain functional
- Documentation is clear and helpful
### Process:
- Zero breaking changes at any phase
- Each phase can be rolled back independently
- Commits are small and focused
- CI/CD passes after every commit
---
## Appendix: Emergency Rollback Procedure
If anything goes wrong during modularization:
```bash
# 1. Identify the problematic phase
echo "ROLLBACK: Phase $PHASE failed"
# 2. Check git status
git status
# 3. Rollback created files
rm -rf src/mir/builder/control_flow/ # (example)
# 4. Restore original file
git checkout src/mir/builder/control_flow.rs
# 5. Verify build
cargo build --release
cargo test --lib
# 6. Document the issue
echo "$(date): Phase $PHASE rollback due to: $REASON" >> docs/development/refactoring/rollback_log.txt
# 7. Review and adjust plan
# - Was the failure expected?
# - Do we need to adjust the approach?
# - Should we skip this phase?
```
---
## Conclusion
This modularization plan provides a **safe, incremental path** to breaking down 3 large files into maintainable modules. The approach prioritizes:
1. **Zero breaking changes** - Backward compatible at every step
2. **Clear verification** - Test after each phase
3. **Easy rollback** - Can undo any phase if issues arise
4. **Pattern setting** - control_flow.rs establishes the template for others
**Next Steps**:
1. Review this plan with the team
2. Get approval for Week 1 (control_flow.rs Phases 1-3)
3. Create a dedicated branch: `refactor/modularize-control-flow`
4. Begin Phase 1!
**Questions? Concerns?** Open an issue or discuss in the team channel.
---
**Document Version**: 1.0
**Created**: 2025-12-05
**Author**: Claude Code (AI-assisted planning)
**Status**: Draft - Awaiting Review

View File

@ -0,0 +1,253 @@
# Modularization Quick Start Checklist
This is the **TL;DR** version of the full implementation plan. Use this as a quick reference during execution.
**Full Plan**: [modularization-implementation-plan.md](./modularization-implementation-plan.md)
---
## 1. control_flow.rs Modularization (12.5 hours)
### Phase 1: Debug Utilities (30 min)
- [ ] Create `src/mir/builder/control_flow/debug.rs`
- [ ] Move `trace_varmap()`
- [ ] Update `mod.rs` imports
- [ ] Run: `cargo build --release && cargo test --lib`
- [ ] Commit: `git commit -m "refactor: Extract debug utilities from control_flow.rs"`
### Phase 2: Pattern Lowerers (2 hours)
- [ ] Create `control_flow/joinir/patterns/` directory
- [ ] Move `cf_loop_pattern1_minimal()` to `pattern1_minimal.rs`
- [ ] Move `cf_loop_pattern2_with_break()` to `pattern2_with_break.rs`
- [ ] Move `cf_loop_pattern3_with_if_phi()` to `pattern3_with_if_phi.rs`
- [ ] Create `patterns/mod.rs` dispatcher
- [ ] Run: `cargo build --release && cargo test --lib -- --include-ignored`
- [ ] Test: `NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako`
- [ ] Commit: `git commit -m "refactor: Extract pattern lowerers from control_flow.rs"`
### Phase 3: JoinIR Routing (1.5 hours)
- [ ] Create `control_flow/joinir/routing.rs`
- [ ] Move `try_cf_loop_joinir()`
- [ ] Move `cf_loop_joinir_impl()`
- [ ] Update imports
- [ ] Run: `cargo build --release && cargo test --release`
- [ ] Commit: `git commit -m "refactor: Extract JoinIR routing logic"`
### Phase 4: merge_joinir_mir_blocks (6 hours) ⚠️ CRITICAL
- [ ] Create `control_flow/joinir/merge/` directory
- [ ] Extract `id_remapper.rs` (150 lines)
- [ ] Extract `block_allocator.rs` (100 lines)
- [ ] Extract `value_collector.rs` (100 lines)
- [ ] Extract `instruction_rewriter.rs` (150 lines)
- [ ] Extract `exit_phi_builder.rs` (100 lines)
- [ ] Create `merge/mod.rs` coordinator (100 lines)
- [ ] Update all imports
- [ ] Run: `cargo build --release && cargo test --lib`
- [ ] Run: `tools/smokes/v2/run.sh --profile quick`
- [ ] Run: `NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir"`
- [ ] Run determinism test (3x):
```bash
for i in 1 2 3; do echo "=== Run $i ==="; cargo test --release test_loop_patterns; done
```
- [ ] Commit: `git commit -m "refactor: Break down merge_joinir_mir_blocks into modules"`
### Phase 5: Exception Handling (1 hour)
- [ ] Create `control_flow/exception/` directory
- [ ] Move `cf_try_catch()` to `try_catch.rs`
- [ ] Move `cf_throw()` to `throw.rs`
- [ ] Create `exception/mod.rs`
- [ ] Run: `cargo build --release && cargo test --lib -- exception`
- [ ] Commit: `git commit -m "refactor: Extract exception handling"`
### Phase 6: Utilities (30 min)
- [ ] Create `control_flow/utils.rs`
- [ ] Move `extract_loop_variable_from_condition()`
- [ ] Run: `cargo build --release && cargo test --lib`
- [ ] Commit: `git commit -m "refactor: Extract utilities"`
### Phase 7: Final Cleanup (1 hour)
- [ ] Add module-level documentation to all files
- [ ] Review `control_flow/mod.rs` for clarity
- [ ] Verify all `pub(super)` visibility
- [ ] Run: `cargo build --release --all-features`
- [ ] Run: `cargo test --release`
- [ ] Run: `cargo clippy --all-targets`
- [ ] Run: `tools/smokes/v2/run.sh --profile integration`
- [ ] Update `CLAUDE.md` with new structure
- [ ] Commit: `git commit -m "docs: Update control_flow module documentation"`
---
## 2. generic_case_a.rs Modularization (3.5 hours)
### Phase 1: Directory Setup (15 min)
- [ ] Create `src/mir/join_ir/lowering/generic_case_a/` directory
- [ ] Create `mod.rs` with public API
- [ ] Move `generic_case_a_entry_builder.rs` into directory
- [ ] Move `generic_case_a_whitespace_check.rs` into directory
- [ ] Update parent `mod.rs` imports
- [ ] Run: `cargo build --release`
- [ ] Commit: `git commit -m "refactor: Create generic_case_a directory structure"`
### Phase 2: Extract skip_ws (45 min)
- [ ] Create `generic_case_a/skip_ws.rs`
- [ ] Move `lower_case_a_skip_ws_with_scope()` and `_core()`
- [ ] Add module documentation
- [ ] Run: `cargo build --release && cargo test --lib -- skip_ws`
- [ ] Commit: `git commit -m "refactor: Extract skip_ws lowerer"`
### Phase 3: Extract trim (1 hour)
- [ ] Create `generic_case_a/trim.rs`
- [ ] Move `lower_case_a_trim_with_scope()` and `_core()`
- [ ] Add module documentation
- [ ] Run: `cargo build --release && cargo test --lib -- trim`
- [ ] Commit: `git commit -m "refactor: Extract trim lowerer"`
### Phase 4: Extract append_defs & stage1 (1 hour)
- [ ] Create `generic_case_a/append_defs.rs`
- [ ] Create `generic_case_a/stage1_using_resolver.rs`
- [ ] Move respective functions
- [ ] Add module documentation
- [ ] Run: `cargo build --release && cargo test --release`
- [ ] Run: `tools/smokes/v2/run.sh --profile quick --filter "funcscanner_*"`
- [ ] Commit: `git commit -m "refactor: Extract append_defs and stage1 lowerers"`
### Phase 5: Final Cleanup (30 min)
- [ ] Add module-level documentation to all files
- [ ] Verify all imports are clean
- [ ] Run: `cargo build --release --all-features && cargo test --release`
- [ ] Commit: `git commit -m "docs: Complete generic_case_a modularization"`
---
## 3. loopform_builder.rs Modularization (4 hours) - FUTURE
**Status**: Already partially modularized in Phase 191. Remaining work is lower priority.
### Phase 1: Directory Setup (30 min)
- [ ] Create `src/mir/phi_core/loopform/` directory
- [ ] Move existing modular files into directory
- [ ] Create `mod.rs` with re-exports
- [ ] Run: `cargo build --release && cargo test --lib -- loopform`
### Phase 2: Extract 4-Pass Architecture (2 hours)
- [ ] Create `loopform/passes/` directory
- [ ] Extract Pass 1: Variable discovery
- [ ] Extract Pass 2: Header PHI construction
- [ ] Extract Pass 3: Latch block processing
- [ ] Extract Pass 4: Exit PHI construction
- [ ] Create `passes/mod.rs` coordinator
- [ ] Run: `cargo build --release && cargo test --release -- loopform`
### Phase 3: Extract Core Builder (1 hour)
- [ ] Create `loopform/builder_core.rs`
- [ ] Move remaining builder logic
- [ ] Run: `cargo build --release && cargo test --release`
### Phase 4: Final Cleanup (30 min)
- [ ] Add module documentation
- [ ] Run: `cargo build --release --all-features`
- [ ] Run: `tools/smokes/v2/run.sh --profile quick --filter "phi_*"`
---
## Emergency Rollback Commands
```bash
# Rollback control_flow.rs modularization
rm -rf src/mir/builder/control_flow/
git checkout src/mir/builder/control_flow.rs
cargo build --release && cargo test --lib
# Rollback generic_case_a.rs modularization
rm -rf src/mir/join_ir/lowering/generic_case_a/
git checkout src/mir/join_ir/lowering/generic_case_a*.rs
cargo build --release && cargo test --lib
# Rollback loopform_builder.rs modularization
rm -rf src/mir/phi_core/loopform/
git checkout src/mir/phi_core/loopform*.rs
cargo build --release && cargo test --lib
```
---
## Verification Commands
### Quick Verification (after each phase)
```bash
cargo build --release
cargo test --lib
```
### Comprehensive Verification (after critical phases)
```bash
cargo build --release --all-features
cargo test --release
cargo clippy --all-targets
tools/smokes/v2/run.sh --profile quick
```
### Debug Trace Verification (Phase 4 only)
```bash
NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge_joinir"
```
### Determinism Check (Phase 4 only)
```bash
for i in 1 2 3; do
echo "=== Run $i ==="
cargo test --release test_loop_patterns 2>&1 | grep "test result"
done
```
---
## Success Criteria
- ✅ All 267+ tests pass
- ✅ Build time ≤ current
- ✅ `control_flow/mod.rs` < 200 lines (vs 1,632)
- ✅ Largest file < 250 lines (vs 714)
- ✅ Debug traces still work
- ✅ No HashMap non-determinism
- ✅ Code is easier to navigate
---
## Timeline
### Week 1: control_flow.rs Phases 1-3 (Low Risk)
- **Total**: 4 hours
- **Risk**: Low
- **Deliverable**: Pattern lowerers and routing isolated
### Week 2: control_flow.rs Phase 4 (High Risk)
- **Total**: 6 hours
- **Risk**: Medium
- **Deliverable**: merge_joinir_mir_blocks modularized
### Week 3: control_flow.rs Phases 5-7 + generic_case_a.rs
- **Total**: 6 hours
- **Risk**: Low
- **Deliverable**: control_flow.rs complete, generic_case_a.rs modularized
---
## Questions?
**Full Plan**: [modularization-implementation-plan.md](./modularization-implementation-plan.md)
**Open Issues**:
- Create a GitHub issue if you encounter blockers
- Tag with `refactoring` label
**Need Help?**:
- Check the full plan for detailed explanations
- Review the risk/mitigation matrix
- Use the emergency rollback procedure if needed
---
**Last Updated**: 2025-12-05
**Status**: Ready to execute

View File

@ -0,0 +1,789 @@
# Phase 4: merge_joinir_mir_blocks Breakdown Guide
**CRITICAL PHASE** - This is the most complex modularization task.
**Context**: `merge_joinir_mir_blocks()` is a 714-line monster function that performs 6 distinct operations. This guide provides step-by-step instructions for breaking it down safely.
---
## Overview
### Current State
- **File**: `src/mir/builder/control_flow.rs`
- **Function**: `merge_joinir_mir_blocks()` (lines 864-1578)
- **Size**: 714 lines (44% of the entire file!)
- **Complexity**: 6 distinct phases, multiple data structures, critical to JoinIR integration
### Target State
- **Directory**: `src/mir/builder/control_flow/joinir/merge/`
- **Files**: 7 files (1 coordinator + 6 sub-modules)
- **Average size**: ~100-150 lines per file
- **Maintainability**: Each phase in its own file
---
## Function Analysis
### Current Structure (Lines 864-1578)
```rust
fn merge_joinir_mir_blocks(
&mut self,
mir_module: &MirModule,
boundary: Option<&JoinInlineBoundary>,
debug: bool,
) -> Result<Option<ValueId>, String> {
// Phase 1: Block ID Allocation (lines 864-923)
// - Create JoinIrIdRemapper
// - Allocate new block IDs for all functions
// - Map function entry blocks
// Phase 2: Value Collection (lines 931-971)
// - Collect all ValueIds used across functions
// - Track Const String → function name mapping
// - Collect function parameters for tail call conversion
// Phase 3: ValueId Remapping (lines 973-1100)
// - Allocate new ValueIds for all collected values
// - Create remapping table
// - Handle determinism (BTreeSet/BTreeMap usage)
// Phase 4: Block Merging & Instruction Rewriting (lines 1102-1400)
// - For each function in sorted order:
// - For each block in sorted order:
// - Rewrite instructions with remapped IDs
// - Convert Call → Jump for intra-module calls
// - Convert Return → Jump to exit block
// - Handle tail call optimization
// - Insert merged blocks into current_function
// Phase 5: Exit PHI Construction (lines 1402-1500)
// - Collect return values from all functions
// - Create exit block with PHI node
// - Return exit PHI value
// Phase 6: Boundary Reconnection (lines 1502-1578)
// - If boundary with host_outputs specified:
// - Map exit PHI back to host variable_map
// - Update SSA slots in variable_map
// - Reconnect inlined code to host context
}
```
---
## Proposed Module Breakdown
### 1. `merge/mod.rs` - Coordinator (~100 lines)
**Purpose**: Orchestrate the 6 phases without implementing logic.
```rust
//! JoinIR MIR Block Merging Coordinator
//!
//! This module coordinates the merging of JoinIR-generated MIR functions
//! into the host MIR builder. The process is broken into 6 phases:
//!
//! 1. Block ID allocation
//! 2. Value collection
//! 3. ValueId remapping
//! 4. Instruction rewriting
//! 5. Exit PHI construction
//! 6. Boundary reconnection
use crate::mir::{MirModule, ValueId};
use crate::mir::join_ir::lowering::inline_boundary::JoinInlineBoundary;
mod id_remapper;
mod block_allocator;
mod value_collector;
mod instruction_rewriter;
mod exit_phi_builder;
use id_remapper::JoinIrIdRemapper;
pub(in crate::mir::builder) fn merge_joinir_mir_blocks(
builder: &mut crate::mir::builder::MirBuilder,
mir_module: &MirModule,
boundary: Option<&JoinInlineBoundary>,
debug: bool,
) -> Result<Option<ValueId>, String> {
// Phase 1: Allocate block IDs
let mut remapper = block_allocator::allocate_blocks(builder, mir_module, debug)?;
// Phase 2: Collect values
let (used_values, value_to_func_name, function_params) =
value_collector::collect_values(mir_module, &mut remapper, debug)?;
// Phase 3: Remap ValueIds
id_remapper::remap_values(builder, &used_values, &mut remapper, debug)?;
// Phase 4: Merge blocks and rewrite instructions
let exit_block_id = instruction_rewriter::merge_and_rewrite(
builder,
mir_module,
&mut remapper,
&value_to_func_name,
&function_params,
debug,
)?;
// Phase 5: Build exit PHI
let exit_phi_value = exit_phi_builder::build_exit_phi(
builder,
mir_module,
&remapper,
exit_block_id,
debug,
)?;
// Phase 6: Reconnect boundary (if specified)
if let Some(boundary) = boundary {
reconnect_boundary(builder, boundary, exit_phi_value, debug)?;
}
Ok(exit_phi_value)
}
fn reconnect_boundary(
builder: &mut crate::mir::builder::MirBuilder,
boundary: &JoinInlineBoundary,
exit_phi_value: Option<ValueId>,
debug: bool,
) -> Result<(), String> {
// Phase 6 implementation (simple, can stay in mod.rs)
// ... (lines 1502-1578 logic)
Ok(())
}
```
---
### 2. `merge/id_remapper.rs` - ID Remapping (~150 lines)
**Purpose**: Manage ValueId and BlockId remapping.
```rust
//! JoinIR ID Remapper
//!
//! Handles the mapping of ValueIds and BlockIds from JoinIR functions
//! to the host MIR builder's ID space.
use crate::mir::{BasicBlockId, ValueId};
use std::collections::BTreeMap;
/// Remapper for JoinIR → Host MIR ID translation
pub(super) struct JoinIrIdRemapper {
/// (function_name, old_block_id) → new_block_id
block_map: BTreeMap<(String, BasicBlockId), BasicBlockId>,
/// old_value_id → new_value_id
value_map: BTreeMap<ValueId, ValueId>,
/// function_name → entry_block_id
function_entry_map: BTreeMap<String, BasicBlockId>,
}
impl JoinIrIdRemapper {
pub(super) fn new() -> Self {
Self {
block_map: BTreeMap::new(),
value_map: BTreeMap::new(),
function_entry_map: BTreeMap::new(),
}
}
pub(super) fn set_block(&mut self, func_name: String, old_id: BasicBlockId, new_id: BasicBlockId) {
self.block_map.insert((func_name, old_id), new_id);
}
pub(super) fn get_block(&self, func_name: &str, old_id: BasicBlockId) -> Option<BasicBlockId> {
self.block_map.get(&(func_name.to_string(), old_id)).copied()
}
pub(super) fn set_value(&mut self, old_id: ValueId, new_id: ValueId) {
self.value_map.insert(old_id, new_id);
}
pub(super) fn get_value(&self, old_id: ValueId) -> Option<ValueId> {
self.value_map.get(&old_id).copied()
}
pub(super) fn set_function_entry(&mut self, func_name: String, entry_block: BasicBlockId) {
self.function_entry_map.insert(func_name, entry_block);
}
pub(super) fn get_function_entry(&self, func_name: &str) -> Option<BasicBlockId> {
self.function_entry_map.get(func_name).copied()
}
// ... (other helper methods from lines 873-959)
}
pub(super) fn remap_values(
builder: &mut crate::mir::builder::MirBuilder,
used_values: &std::collections::BTreeSet<ValueId>,
remapper: &mut JoinIrIdRemapper,
debug: bool,
) -> Result<(), String> {
// Phase 3 logic (lines 973-1100)
// - Allocate new ValueIds
// - Store in remapper
Ok(())
}
```
---
### 3. `merge/block_allocator.rs` - Block Allocation (~100 lines)
**Purpose**: Allocate block IDs for all JoinIR functions.
```rust
//! JoinIR Block ID Allocator
//!
//! Allocates new BasicBlockIds for all blocks in JoinIR functions
//! to avoid ID conflicts with the host MIR builder.
use crate::mir::{BasicBlockId, MirModule};
use super::id_remapper::JoinIrIdRemapper;
pub(super) fn allocate_blocks(
builder: &mut crate::mir::builder::MirBuilder,
mir_module: &MirModule,
debug: bool,
) -> Result<JoinIrIdRemapper, String> {
let mut remapper = JoinIrIdRemapper::new();
if debug {
eprintln!("[merge/block_allocator] Allocating block IDs for {} functions",
mir_module.functions.len());
}
// DETERMINISM FIX: Sort functions by name
let mut functions: Vec<_> = mir_module.functions.iter().collect();
functions.sort_by_key(|(name, _)| name.as_str());
for (func_name, func) in functions {
if debug {
eprintln!("[merge/block_allocator] Function: {}", func_name);
}
// DETERMINISM FIX: Sort blocks by ID
let mut blocks: Vec<_> = func.blocks.iter().collect();
blocks.sort_by_key(|(id, _)| id.0);
for (old_block_id, _) in blocks {
let new_block_id = builder.block_gen.next();
remapper.set_block(func_name.clone(), *old_block_id, new_block_id);
if debug {
eprintln!("[merge/block_allocator] Block: {:?}{:?}",
old_block_id, new_block_id);
}
}
// Store function entry block
let entry_block_new = remapper.get_block(func_name, func.entry_block)
.ok_or_else(|| format!("Entry block not found for {}", func_name))?;
remapper.set_function_entry(func_name.clone(), entry_block_new);
}
if debug {
eprintln!("[merge/block_allocator] Allocation complete");
}
Ok(remapper)
}
```
---
### 4. `merge/value_collector.rs` - Value Collection (~100 lines)
**Purpose**: Collect all ValueIds used across JoinIR functions.
```rust
//! JoinIR Value Collector
//!
//! Collects all ValueIds used in JoinIR functions for remapping.
use crate::mir::{ValueId, MirModule};
use super::id_remapper::JoinIrIdRemapper;
use std::collections::{BTreeSet, HashMap};
pub(super) fn collect_values(
mir_module: &MirModule,
remapper: &mut JoinIrIdRemapper,
debug: bool,
) -> Result<(BTreeSet<ValueId>, HashMap<ValueId, String>, HashMap<String, Vec<ValueId>>), String> {
let mut used_values = BTreeSet::new();
let mut value_to_func_name = HashMap::new();
let mut function_params = HashMap::new();
if debug {
eprintln!("[merge/value_collector] Collecting values from {} functions",
mir_module.functions.len());
}
for (func_name, func) in &mir_module.functions {
// Collect function parameters
function_params.insert(func_name.clone(), func.params.clone());
for block in func.blocks.values() {
// Collect values from instructions
let block_values = collect_values_in_block(block);
used_values.extend(block_values);
// Track Const String → function name mapping
for inst in &block.instructions {
if let crate::mir::MirInstruction::Const { dst, value } = inst {
if let crate::mir::types::ConstValue::String(s) = value {
// Check if this is a function name
if mir_module.functions.contains_key(s) {
value_to_func_name.insert(*dst, s.clone());
used_values.insert(*dst);
}
}
}
}
}
}
if debug {
eprintln!("[merge/value_collector] Collected {} values", used_values.len());
}
Ok((used_values, value_to_func_name, function_params))
}
fn collect_values_in_block(block: &crate::mir::BasicBlock) -> Vec<ValueId> {
// Helper function from lines 948-970
// ... (extract logic from remapper.collect_values_in_block)
vec![]
}
```
---
### 5. `merge/instruction_rewriter.rs` - Instruction Rewriting (~150 lines)
**Purpose**: Rewrite instructions and merge blocks.
```rust
//! JoinIR Instruction Rewriter
//!
//! Rewrites JoinIR instructions with remapped IDs and merges blocks
//! into the host MIR builder.
use crate::mir::{BasicBlockId, ValueId, MirModule, MirInstruction, BasicBlock};
use super::id_remapper::JoinIrIdRemapper;
use std::collections::HashMap;
pub(super) fn merge_and_rewrite(
builder: &mut crate::mir::builder::MirBuilder,
mir_module: &MirModule,
remapper: &mut JoinIrIdRemapper,
value_to_func_name: &HashMap<ValueId, String>,
function_params: &HashMap<String, Vec<ValueId>>,
debug: bool,
) -> Result<BasicBlockId, String> {
// Create exit block
let exit_block_id = builder.block_gen.next();
if debug {
eprintln!("[merge/instruction_rewriter] Merging blocks from {} functions",
mir_module.functions.len());
eprintln!("[merge/instruction_rewriter] Exit block: {:?}", exit_block_id);
}
// DETERMINISM FIX: Sort functions by name
let mut functions: Vec<_> = mir_module.functions.iter().collect();
functions.sort_by_key(|(name, _)| name.as_str());
for (func_name, func) in functions {
merge_function_blocks(
builder,
func_name,
func,
remapper,
value_to_func_name,
function_params,
exit_block_id,
debug,
)?;
}
Ok(exit_block_id)
}
fn merge_function_blocks(
builder: &mut crate::mir::builder::MirBuilder,
func_name: &str,
func: &crate::mir::MirFunction,
remapper: &mut JoinIrIdRemapper,
value_to_func_name: &HashMap<ValueId, String>,
function_params: &HashMap<String, Vec<ValueId>>,
exit_block_id: BasicBlockId,
debug: bool,
) -> Result<(), String> {
// DETERMINISM FIX: Sort blocks by ID
let mut blocks: Vec<_> = func.blocks.iter().collect();
blocks.sort_by_key(|(id, _)| id.0);
for (old_block_id, old_block) in blocks {
let new_block_id = remapper.get_block(func_name, *old_block_id)
.ok_or_else(|| format!("Block ID not found: {:?}", old_block_id))?;
let new_block = rewrite_block(
old_block,
remapper,
value_to_func_name,
function_params,
exit_block_id,
debug,
)?;
builder.current_function.as_mut()
.ok_or("No current function")?
.blocks.insert(new_block_id, new_block);
}
Ok(())
}
fn rewrite_block(
old_block: &BasicBlock,
remapper: &JoinIrIdRemapper,
value_to_func_name: &HashMap<ValueId, String>,
function_params: &HashMap<String, Vec<ValueId>>,
exit_block_id: BasicBlockId,
debug: bool,
) -> Result<BasicBlock, String> {
let mut new_instructions = Vec::new();
// Rewrite each instruction (lines 1102-1300)
for inst in &old_block.instructions {
let new_inst = rewrite_instruction(
inst,
remapper,
value_to_func_name,
function_params,
exit_block_id,
debug,
)?;
new_instructions.push(new_inst);
}
// Rewrite terminator (lines 1300-1400)
let new_terminator = rewrite_terminator(
&old_block.terminator,
remapper,
exit_block_id,
debug,
)?;
Ok(BasicBlock {
instructions: new_instructions,
terminator: new_terminator,
})
}
fn rewrite_instruction(
inst: &MirInstruction,
remapper: &JoinIrIdRemapper,
value_to_func_name: &HashMap<ValueId, String>,
function_params: &HashMap<String, Vec<ValueId>>,
exit_block_id: BasicBlockId,
debug: bool,
) -> Result<MirInstruction, String> {
// Instruction rewriting logic (lines 1102-1300)
// - Remap ValueIds
// - Convert Call → Jump for intra-module calls
// - Handle tail call optimization
Ok(inst.clone()) // Placeholder
}
fn rewrite_terminator(
terminator: &Option<MirInstruction>,
remapper: &JoinIrIdRemapper,
exit_block_id: BasicBlockId,
debug: bool,
) -> Result<Option<MirInstruction>, String> {
// Terminator rewriting logic (lines 1300-1400)
// - Remap block IDs
// - Convert Return → Jump to exit block
Ok(terminator.clone()) // Placeholder
}
```
---
### 6. `merge/exit_phi_builder.rs` - Exit PHI Construction (~100 lines)
**Purpose**: Construct exit PHI node from return values.
```rust
//! JoinIR Exit PHI Builder
//!
//! Constructs the exit block PHI node that merges return values
//! from all inlined JoinIR functions.
use crate::mir::{BasicBlockId, ValueId, MirModule, MirInstruction, BasicBlock};
use super::id_remapper::JoinIrIdRemapper;
pub(super) fn build_exit_phi(
builder: &mut crate::mir::builder::MirBuilder,
mir_module: &MirModule,
remapper: &JoinIrIdRemapper,
exit_block_id: BasicBlockId,
debug: bool,
) -> Result<Option<ValueId>, String> {
if debug {
eprintln!("[merge/exit_phi_builder] Building exit PHI");
}
// Collect return values from all functions (lines 1402-1450)
let mut return_values = Vec::new();
for (func_name, func) in &mir_module.functions {
for (old_block_id, block) in &func.blocks {
if let Some(MirInstruction::Return { value: Some(old_value) }) = &block.terminator {
// Remap the return value
let new_value = remapper.get_value(*old_value)
.ok_or_else(|| format!("Return value not remapped: {:?}", old_value))?;
// Remap the block ID (predecessor of exit block)
let new_block_id = remapper.get_block(func_name, *old_block_id)
.ok_or_else(|| format!("Block ID not remapped: {:?}", old_block_id))?;
return_values.push((new_block_id, new_value));
}
}
}
if return_values.is_empty() {
if debug {
eprintln!("[merge/exit_phi_builder] No return values, creating void exit block");
}
// Create empty exit block
builder.current_function.as_mut()
.ok_or("No current function")?
.blocks.insert(exit_block_id, BasicBlock {
instructions: vec![],
terminator: None,
});
return Ok(None);
}
// Create PHI node (lines 1450-1500)
let phi_value = builder.value_gen.next();
let phi_inst = MirInstruction::Phi {
dst: phi_value,
incoming: return_values.iter()
.map(|(block_id, value_id)| (*block_id, *value_id))
.collect(),
};
// Create exit block with PHI
builder.current_function.as_mut()
.ok_or("No current function")?
.blocks.insert(exit_block_id, BasicBlock {
instructions: vec![phi_inst],
terminator: None,
});
if debug {
eprintln!("[merge/exit_phi_builder] Exit PHI created: {:?}", phi_value);
}
Ok(Some(phi_value))
}
```
---
## Implementation Steps
### Step 1: Create Directory Structure (5 min)
```bash
mkdir -p src/mir/builder/control_flow/joinir/merge
```
### Step 2: Create Skeleton Files (10 min)
```bash
# Create empty files with module documentation
touch src/mir/builder/control_flow/joinir/merge/mod.rs
touch src/mir/builder/control_flow/joinir/merge/id_remapper.rs
touch src/mir/builder/control_flow/joinir/merge/block_allocator.rs
touch src/mir/builder/control_flow/joinir/merge/value_collector.rs
touch src/mir/builder/control_flow/joinir/merge/instruction_rewriter.rs
touch src/mir/builder/control_flow/joinir/merge/exit_phi_builder.rs
```
### Step 3: Extract id_remapper.rs (30 min)
- Copy `JoinIrIdRemapper` struct definition
- Move helper methods
- Add `remap_values()` function
- Test compilation: `cargo build --release`
### Step 4: Extract block_allocator.rs (30 min)
- Copy block allocation logic (lines 888-923)
- Create `allocate_blocks()` function
- Test compilation: `cargo build --release`
### Step 5: Extract value_collector.rs (30 min)
- Copy value collection logic (lines 931-971)
- Create `collect_values()` function
- Test compilation: `cargo build --release`
### Step 6: Extract instruction_rewriter.rs (1.5 hours)
- Copy instruction rewriting logic (lines 1102-1400)
- Create `merge_and_rewrite()` function
- Create helper functions for instruction/terminator rewriting
- Test compilation: `cargo build --release`
### Step 7: Extract exit_phi_builder.rs (30 min)
- Copy exit PHI logic (lines 1402-1500)
- Create `build_exit_phi()` function
- Test compilation: `cargo build --release`
### Step 8: Create mod.rs Coordinator (1 hour)
- Create coordinator function
- Wire up all sub-modules
- Move boundary reconnection logic (lines 1502-1578)
- Test compilation: `cargo build --release`
### Step 9: Update control_flow/mod.rs (15 min)
- Update imports to use new `merge` module
- Remove old `merge_joinir_mir_blocks()` implementation
- Test compilation: `cargo build --release`
### Step 10: Comprehensive Testing (1 hour)
```bash
# Build verification
cargo build --release
cargo test --lib
# Smoke tests
tools/smokes/v2/run.sh --profile quick
# Debug trace verification
NYASH_OPTION_C_DEBUG=1 ./target/release/nyash apps/tests/loop_min_while.hako 2>&1 | grep "merge"
# Determinism check (run 3 times)
for i in 1 2 3; do
echo "=== Run $i ==="
cargo test --release test_loop_patterns 2>&1 | grep "test result"
done
```
---
## Verification Checklist
- [ ] All 267+ tests pass
- [ ] Build time ≤ current (no regression)
- [ ] Debug traces still appear with `NYASH_OPTION_C_DEBUG=1`
- [ ] No compiler warnings
- [ ] No clippy warnings (`cargo clippy --all-targets`)
- [ ] Determinism test passes (3 consecutive runs produce identical results)
- [ ] Smoke tests pass for all patterns (1/2/3)
---
## Rollback Procedure
If Phase 4 fails:
```bash
# 1. Remove new directory
rm -rf src/mir/builder/control_flow/joinir/merge
# 2. Restore original control_flow.rs
git checkout src/mir/builder/control_flow.rs
# 3. Verify build
cargo build --release
cargo test --lib
# 4. Document failure
echo "$(date): Phase 4 rollback due to: $REASON" >> docs/development/refactoring/rollback_log.txt
```
---
## Common Pitfalls
### Pitfall 1: HashMap Non-Determinism
**Problem**: Using `HashMap` for iteration causes non-deterministic ValueId allocation.
**Solution**: Use `BTreeMap` and `BTreeSet` for all ID mappings.
### Pitfall 2: Missing Value Remapping
**Problem**: Forgetting to remap a ValueId causes "value not found" errors.
**Solution**: Systematically collect ALL values before remapping phase.
### Pitfall 3: Block Order Matters
**Problem**: Processing blocks in random order causes test failures.
**Solution**: Always sort by name/ID before iteration.
### Pitfall 4: Forgetting to Update Imports
**Problem**: Old imports cause compilation failures.
**Solution**: Update all imports in a single commit after extraction.
---
## Success Criteria
- ✅ 714-line function → 6 focused modules (100-150 lines each)
- ✅ All tests pass (no regressions)
- ✅ Debug traces work (`NYASH_OPTION_C_DEBUG=1`)
- ✅ Determinism maintained (BTreeMap/BTreeSet usage)
- ✅ Code is easier to understand (each phase isolated)
- ✅ Future maintenance easier (modify one phase at a time)
---
## Timeline
- **Total Estimated Effort**: 6 hours
- **Buffer**: +2 hours for unexpected issues
- **Recommended Schedule**: 2 days (3 hours/day)
**Day 1**:
- Steps 1-5 (directory setup, id_remapper, block_allocator, value_collector)
- Verification after each step
**Day 2**:
- Steps 6-10 (instruction_rewriter, exit_phi_builder, coordinator, testing)
- Comprehensive verification
---
## Conclusion
Phase 4 is the most critical modularization task. By breaking down the 714-line monster function into 6 focused modules, we:
1. **Improve maintainability** - Each phase in its own file
2. **Reduce complexity** - 714 lines → 100-150 lines per file
3. **Enable future development** - Easy to modify individual phases
4. **Maintain stability** - Zero breaking changes, backward compatible
**Next Steps**:
1. Review this guide
2. Set aside 2 days for implementation
3. Execute steps 1-10 systematically
4. Run comprehensive verification
5. Commit success or rollback if needed
**Questions?** Refer to the full implementation plan or open an issue.
---
**Document Version**: 1.0
**Created**: 2025-12-05
**Author**: Claude Code (AI-assisted planning)
**Status**: Ready for execution