🚀 Issue: Nyash Codebase File Size Refactoring - Strategic Module Organization

Priority: 🔧 MEDIUM - Code maintainability and future development efficiency
Impact: Improved code organization, maintainability, and development velocity
Status: Ready for systematic refactoring implementation

📊 Current File Size Problems

🚨 Critical Oversized Files

1. main.rs: 1,490 lines (MASSIVE - Entry point bloat!)
2. parser/mod.rs: 1,461 lines (Parser complexity explosion)  
3. box_trait.rs: 1,456 lines (Box trait definition inflation)
4. interpreter/expressions.rs: 1,166 lines (Expression engine complexity)
5. mir/builder.rs: 1,107 lines (MIR construction complexity)

Total Impact: 35,810 lines, 2,037 functions, 587 classes across the codebase - maintenance nightmare level. The five files above account for 6,680 of those lines.

🎯 Refactoring Strategy (Gemini AI Expert Analysis)

Staged Approach - Risk Mitigation Priority Order

Stage 1: main.rs (1,490 lines) - HIGHEST PRIORITY

Rationale: Entry point with loose coupling - safest to refactor first

Current Problems:

  • CLI argument parsing mixed with execution logic
  • Backend selection (Interpreter/VM/WASM/AOT) embedded in main
  • Benchmark functionality scattered throughout
  • Debug options handling integrated with core logic

Proposed Split:

// NEW FILE: src/cli.rs
// - All clap-based CLI argument definitions and parsing
// - Command-line option structures
// - Help text generation and validation

// NEW FILE: src/runner.rs  
// - Backend selection logic (Interpreter/VM/WASM/AOT)
// - File execution coordination
// - Benchmark runner implementation
// - REPL mode handling

// UPDATED: main.rs (target: <100 lines)
// - Thin entry point only
// - Call cli.rs for argument parsing
// - Pass results to runner.rs for execution
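
A minimal sketch of the split described above, with `cli` and `runner` written as inline modules so the example fits in one file. It uses only the standard library, whereas the real `cli.rs` is expected to be clap-based. The names `Backend`, `CliConfig`, `parse_args`, `run`, and `entry`, and the `--backend` flag, are assumptions made for illustration and are not taken from the Nyash sources.

```rust
// Sketch only: in the real layout these modules would live in
// src/cli.rs and src/runner.rs. All names here are hypothetical.
mod cli {
    /// The four backends named in this issue.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Backend {
        Interpreter,
        Vm,
        Wasm,
        Aot,
    }

    /// Parsed command-line options (hypothetical shape).
    #[derive(Debug, PartialEq, Eq)]
    pub struct CliConfig {
        pub backend: Backend,
        pub file: Option<String>,
    }

    /// Hand-rolled parsing standing in for the clap-based definitions.
    pub fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliConfig, String> {
        let mut backend = Backend::Interpreter;
        let mut file = None;
        let mut iter = args.into_iter();
        while let Some(arg) = iter.next() {
            if arg == "--backend" {
                let name = iter.next().ok_or("--backend needs a value")?;
                backend = match name.as_str() {
                    "interpreter" => Backend::Interpreter,
                    "vm" => Backend::Vm,
                    "wasm" => Backend::Wasm,
                    "aot" => Backend::Aot,
                    other => return Err(format!("unknown backend: {other}")),
                };
            } else {
                file = Some(arg);
            }
        }
        Ok(CliConfig { backend, file })
    }
}

mod runner {
    use super::cli::{Backend, CliConfig};

    /// Selects a backend and reports what would be executed.
    pub fn run(config: &CliConfig) -> String {
        let target = config.file.as_deref().unwrap_or("<repl>");
        match config.backend {
            Backend::Interpreter => format!("interpret {target}"),
            Backend::Vm => format!("vm {target}"),
            Backend::Wasm => format!("wasm {target}"),
            Backend::Aot => format!("aot {target}"),
        }
    }
}

/// What the thin main.rs would do: parse, then hand off to the runner.
pub fn entry(args: Vec<String>) -> Result<String, String> {
    let config = cli::parse_args(args)?;
    Ok(runner::run(&config))
}
```

With this shape, `main` would reduce to collecting `std::env::args().skip(1)` into a `Vec<String>`, calling `entry`, and reporting any error, which is in line with the <100 line target.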

Stage 2: box_trait.rs (1,456 lines) - HIGH PRIORITY

Rationale: Core of the "Everything is Box" design - Box types can be categorized systematically

Current Problems:

  • All 16 Box types crammed into a single file
  • Trait definitions mixed with implementations
  • No logical grouping by functionality

Proposed Reorganization:

// NEW FILE: src/boxes/traits.rs
// - NyashBox core trait definition
// - BoxCore trait and shared interfaces
// - Common box behavior abstractions

// NEW FILE: src/boxes/primitives.rs
// - IntegerBox, StringBox, FloatBox, BoolBox
// - Basic data type implementations

// NEW FILE: src/boxes/collections.rs
// - ArrayBox, MapBox, HashBox
// - Collection-based Box implementations

// NEW FILE: src/boxes/io.rs  
// - SocketBox, FileBox, HTTPBox
// - I/O related Box implementations

// NEW FILE: src/boxes/system.rs
// - ConsoleBox, DebugBox, TimeBox
// - System interaction Box implementations

// UPDATED: src/boxes/mod.rs
// - Module declarations and public API facade
// - Unified Box registration and management
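
The reorganization above could look roughly like the following, again using inline modules in place of separate files. Only the trait name `NyashBox` and the Box type names come from this issue. The trait methods (`type_name`, `to_display`) and the struct layouts are assumptions for illustration and do not reflect the real trait definition.

```rust
// Sketch only: inline modules stand in for the src/boxes/*.rs files.
// The trait methods shown are assumptions, not the real NyashBox API.
mod boxes {
    pub mod traits {
        /// Minimal stand-in for the shared Box trait.
        pub trait NyashBox {
            fn type_name(&self) -> &'static str;
            fn to_display(&self) -> String;
        }
    }

    pub mod primitives {
        use super::traits::NyashBox;

        pub struct IntegerBox(pub i64);
        pub struct StringBox(pub String);

        impl NyashBox for IntegerBox {
            fn type_name(&self) -> &'static str {
                "IntegerBox"
            }
            fn to_display(&self) -> String {
                self.0.to_string()
            }
        }

        impl NyashBox for StringBox {
            fn type_name(&self) -> &'static str {
                "StringBox"
            }
            fn to_display(&self) -> String {
                self.0.clone()
            }
        }
    }

    pub mod collections {
        use super::traits::NyashBox;

        pub struct ArrayBox(pub Vec<Box<dyn NyashBox>>);

        impl NyashBox for ArrayBox {
            fn type_name(&self) -> &'static str {
                "ArrayBox"
            }
            fn to_display(&self) -> String {
                let items: Vec<String> = self.0.iter().map(|b| b.to_display()).collect();
                format!("[{}]", items.join(", "))
            }
        }
    }

    // Facade (mod.rs): callers import from `boxes` without knowing the file layout.
    pub use collections::ArrayBox;
    pub use primitives::{IntegerBox, StringBox};
    pub use traits::NyashBox;
}
```

Because the facade re-exports every type, code elsewhere can keep writing `use crate::boxes::{IntegerBox, NyashBox}` regardless of which submodule a type ends up in.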

Stage 3: parser/mod.rs (1,461 lines) - MEDIUM PRIORITY

Rationale: Self-contained parsing logic - clear separation boundaries

Proposed Split:

// NEW FILE: src/parser/expressions.rs
// - Binary operations, method calls, if expressions
// - All expression parsing logic

// NEW FILE: src/parser/statements.rs  
// - let bindings, return statements, loop constructs
// - All statement parsing logic

// NEW FILE: src/parser/literals.rs
// - Number, string, array literal parsing
// - Literal value construction

// NEW FILE: src/parser/common.rs
// - Whitespace/comment skipping utilities
// - Common parser helper functions

// UPDATED: src/parser/mod.rs
// - Module orchestration and public API
// - Top-level parse function coordination
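
To illustrate how the parser submodules might depend on one another, here is a toy parser for sums of integers such as `1 + 20 + 3`. The module names follow the proposed files, but the functions, their signatures, and the toy grammar are hypothetical and far simpler than the Nyash parser.

```rust
// Sketch only: a toy grammar, `number ('+' number)*`.
// Module names follow the proposed files; the functions are hypothetical.
mod parser {
    pub mod common {
        /// Advances `pos` past ASCII whitespace.
        pub fn skip_whitespace(src: &[u8], pos: &mut usize) {
            while *pos < src.len() && src[*pos].is_ascii_whitespace() {
                *pos += 1;
            }
        }
    }

    pub mod literals {
        /// Parses an unsigned integer literal starting at `pos`.
        pub fn parse_number(src: &[u8], pos: &mut usize) -> Result<i64, String> {
            let start = *pos;
            while *pos < src.len() && src[*pos].is_ascii_digit() {
                *pos += 1;
            }
            if start == *pos {
                return Err(format!("expected number at byte {start}"));
            }
            // The slice holds only ASCII digits, so it is valid UTF-8.
            let text = std::str::from_utf8(&src[start..*pos]).map_err(|e| e.to_string())?;
            text.parse::<i64>().map_err(|e| e.to_string())
        }
    }

    pub mod expressions {
        use super::{common::skip_whitespace, literals::parse_number};

        /// Parses a sum and returns its operands in order.
        pub fn parse_sum(src: &[u8], pos: &mut usize) -> Result<Vec<i64>, String> {
            skip_whitespace(src, pos);
            let mut operands = vec![parse_number(src, pos)?];
            loop {
                skip_whitespace(src, pos);
                if *pos < src.len() && src[*pos] == b'+' {
                    *pos += 1;
                    skip_whitespace(src, pos);
                    operands.push(parse_number(src, pos)?);
                } else {
                    return Ok(operands);
                }
            }
        }
    }

    /// Top-level entry point kept in mod.rs: coordinates the submodules.
    pub fn parse(source: &str) -> Result<Vec<i64>, String> {
        let bytes = source.as_bytes();
        let mut pos = 0;
        let operands = expressions::parse_sum(bytes, &mut pos)?;
        if pos != bytes.len() {
            return Err(format!("unexpected input at byte {pos}"));
        }
        Ok(operands)
    }
}
```

The dependency direction is one-way: `expressions` uses `literals` and `common`, while `mod.rs` only calls into `expressions`. Keeping it that way is what lets `mod.rs` shrink to orchestration.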

Stage 4: interpreter/expressions.rs (1,166 lines)

Rationale: Mirror parser structure for consistency

Proposed Split:

// NEW FILE: src/interpreter/eval_operations.rs
// - Binary/unary operator evaluation
// - Arithmetic and logical operations

// NEW FILE: src/interpreter/eval_calls.rs
// - Method call resolution and execution  
// - Function call handling

// NEW FILE: src/interpreter/eval_control_flow.rs
// - if expression evaluation
// - loop and control flow handling

// UPDATED: src/interpreter/expressions.rs
// - Evaluation dispatcher and coordinator
// - Expression type routing
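
A small sketch of the dispatcher pattern proposed here: `eval` only routes each expression kind to the module that owns it. The `Expr` type, the function names, and the integer-only semantics are assumptions for illustration, and `eval_calls` is omitted to keep the example short.

```rust
// Sketch only: a toy AST and evaluator. Module names mirror the
// proposed files; `Expr` and the function names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

mod eval_operations {
    use super::{eval, Expr};

    /// Binary addition with overflow reported as an error.
    pub fn eval_add(lhs: &Expr, rhs: &Expr) -> Result<i64, String> {
        eval(lhs)?
            .checked_add(eval(rhs)?)
            .ok_or_else(|| "integer overflow".to_string())
    }

    /// Unary negation with overflow reported as an error.
    pub fn eval_neg(operand: &Expr) -> Result<i64, String> {
        eval(operand)?
            .checked_neg()
            .ok_or_else(|| "integer overflow".to_string())
    }
}

mod eval_control_flow {
    use super::{eval, Expr};

    /// Treats any non-zero condition as true (a simplification).
    pub fn eval_if(cond: &Expr, then: &Expr, otherwise: &Expr) -> Result<i64, String> {
        if eval(cond)? != 0 {
            eval(then)
        } else {
            eval(otherwise)
        }
    }
}

/// The dispatcher that would remain in expressions.rs: it only routes.
pub fn eval(expr: &Expr) -> Result<i64, String> {
    match expr {
        Expr::Int(value) => Ok(*value),
        Expr::Add(lhs, rhs) => eval_operations::eval_add(lhs, rhs),
        Expr::Neg(operand) => eval_operations::eval_neg(operand),
        Expr::If(cond, then, otherwise) => eval_control_flow::eval_if(cond, then, otherwise),
    }
}
```

The submodules call back into `eval` for their operands, so the recursion still passes through a single routing point even though the logic is spread across files.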

Stage 5: mir/builder.rs (1,107 lines)

Rationale: Complex but structured - MIR generation corresponds closely to AST node types

Proposed Split:

// NEW FILE: src/mir/builder/expressions.rs
// - AST expression nodes → MIR instructions
// - Expression-specific MIR generation

// NEW FILE: src/mir/builder/statements.rs
// - AST statement nodes → MIR instructions  
// - Statement-specific MIR generation

// NEW FILE: src/mir/builder/variables.rs
// - Variable binding and scope management
// - MIR variable lifecycle handling

// UPDATED: src/mir/builder.rs
// - MirBuilder struct definition
// - Top-level coordination and delegation
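
One way to realize this split in Rust is to keep the `MirBuilder` struct in `builder.rs` and place additional `impl MirBuilder` blocks in the submodules, since inherent impls may be spread across modules of the same crate. The sketch below assumes a toy AST and a two-instruction MIR. `Ast`, `MirInst`, and all method names are hypothetical; only `MirBuilder` is named in this issue.

```rust
// Sketch only: the AST, instruction set, and method names are hypothetical.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    Int(i64),
    Var(String),
    Add(Box<Ast>, Box<Ast>),
    Let(String, Box<Ast>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum MirInst {
    Const { dst: usize, value: i64 },
    Add { dst: usize, lhs: usize, rhs: usize },
}

/// Stays in builder.rs: the struct plus top-level coordination.
#[derive(Default)]
pub struct MirBuilder {
    pub insts: Vec<MirInst>,
    vars: HashMap<String, usize>,
    next_reg: usize,
}

impl MirBuilder {
    /// Allocates the next virtual register.
    fn fresh(&mut self) -> usize {
        let reg = self.next_reg;
        self.next_reg += 1;
        reg
    }

    /// Delegates to the submodule that owns each node kind.
    pub fn build(&mut self, node: &Ast) -> Result<usize, String> {
        match node {
            Ast::Let(name, value) => self.build_let(name, value),
            other => self.build_expr(other),
        }
    }
}

mod variables {
    use super::MirBuilder;

    impl MirBuilder {
        pub(super) fn bind(&mut self, name: &str, reg: usize) {
            self.vars.insert(name.to_string(), reg);
        }

        pub(super) fn lookup(&self, name: &str) -> Result<usize, String> {
            self.vars
                .get(name)
                .copied()
                .ok_or_else(|| format!("undefined variable: {name}"))
        }
    }
}

mod expressions {
    use super::{Ast, MirBuilder, MirInst};

    impl MirBuilder {
        pub(super) fn build_expr(&mut self, node: &Ast) -> Result<usize, String> {
            match node {
                Ast::Int(value) => {
                    let dst = self.fresh();
                    self.insts.push(MirInst::Const { dst, value: *value });
                    Ok(dst)
                }
                Ast::Var(name) => self.lookup(name),
                Ast::Add(lhs, rhs) => {
                    let lhs = self.build_expr(lhs)?;
                    let rhs = self.build_expr(rhs)?;
                    let dst = self.fresh();
                    self.insts.push(MirInst::Add { dst, lhs, rhs });
                    Ok(dst)
                }
                Ast::Let(..) => Err("let is a statement, not an expression".to_string()),
            }
        }
    }
}

mod statements {
    use super::{Ast, MirBuilder};

    impl MirBuilder {
        pub(super) fn build_let(&mut self, name: &str, value: &Ast) -> Result<usize, String> {
            let reg = self.build_expr(value)?;
            self.bind(name, reg);
            Ok(reg)
        }
    }
}
```

Marking the submodule methods `pub(super)` keeps them callable from `builder.rs` and its sibling submodules without adding them to the public API, which fits the "no public API modifications" goal.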

🎯 Implementation Requirements

Architecture Preservation

  • "Everything is Box" philosophy: Maintain the unified Box abstraction
  • Arc-based threading: Preserve the existing thread-safety model
  • Four backend support: Keep Interpreter/VM/WASM/AOT compatibility
  • 16 Box types: Ensure all existing Box functionality is preserved

Quality Standards

  • No functionality changes: Pure refactoring - no behavior modification
  • Compile guarantee: Each stage must compile successfully before the next stage begins
  • Test preservation: All existing tests must continue passing
  • Import cleanup: Remove unused imports revealed by modularization

Rust Best Practices Compliance

  • Module system: Follow conventional Rust module organization
  • Public API design: Minimize exposed implementation details
  • Documentation: Add module-level documentation for new files
  • Error handling: Maintain existing error propagation patterns
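
As a rough illustration of the documentation and visibility points above, the sketch below shows a module with `//!` module-level docs, one public function, and a `pub(crate)` helper that stays out of the public API. The `registry` module, its functions, and the list of names are hypothetical.

```rust
// Sketch only: `registry` and everything inside it are hypothetical names.
pub mod registry {
    //! Box type name lookup.
    //!
    //! Module-level docs like these state what a new file is responsible for.

    /// Box type names recognized by this toy registry.
    const KNOWN_BOXES: [&str; 3] = ["IntegerBox", "StringBox", "ArrayBox"];

    /// Public API: the only function users of the crate can call.
    pub fn is_known_box(name: &str) -> bool {
        match normalize(name) {
            Some(clean) => KNOWN_BOXES.contains(&clean),
            None => false,
        }
    }

    /// Crate-internal helper: other modules in the crate may use it,
    /// but it is not exposed to users of the crate.
    pub(crate) fn normalize(name: &str) -> Option<&str> {
        let trimmed = name.trim();
        if trimmed.is_empty() {
            None
        } else {
            Some(trimmed)
        }
    }
}
```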

🧪 Validation Requirements

After Each Stage

# Compilation check
cargo check --all-targets
cargo build --release

# Functionality verification  
./target/release/nyash test_comprehensive_operators.hako
./target/release/nyash app_dice_rpg.hako
./target/release/nyash --benchmark --iterations 10

# Regression testing
cargo test

File Size Targets (Post-Refactoring)

main.rs: 1,490 → <100 lines (~15x reduction)
parser/mod.rs: 1,461 → <200 lines (~7x reduction)
box_trait.rs: 1,456 → REMOVED (distributed to boxes/* modules)
interpreter/expressions.rs: 1,166 → <300 lines (~4x reduction)
mir/builder.rs: 1,107 → <250 lines (~4x reduction)
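
The reduction factors are the current line count divided by the target, so 1,490 / 100 = 14.9 rounds to the 15x quoted for main.rs. A small helper along these lines could be used when reporting per-stage metrics. The function names are hypothetical, and line counting follows `str::lines`.

```rust
// Sketch only: hypothetical helpers for checking the size targets.

/// Current line count divided by the target line count.
pub fn reduction_factor(current_lines: usize, target_lines: usize) -> f64 {
    current_lines as f64 / target_lines as f64
}

/// Counts lines the way `str::lines` does (a trailing newline adds no line).
pub fn line_count(source: &str) -> usize {
    source.lines().count()
}

/// True when the text is strictly under its `<N lines` target.
pub fn meets_target(source: &str, target_lines: usize) -> bool {
    line_count(source) < target_lines
}
```

In practice the `source` argument would come from `std::fs::read_to_string` on each refactored file.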

📝 Reporting Requirements

Progressive Reporting

  • Stage completion: Report each stage completion with file size metrics
  • Issue discovery: Report any architectural issues discovered during refactoring
  • Import optimization: Document removed unused imports and dependencies
  • Performance impact: Measure compilation time changes per stage

Before/After Analysis

  • File count: Document new files created and their responsibilities
  • Module dependencies: Show new module dependency graph
  • API changes: List any public API modifications (should be none)
  • Build time: Measure compilation performance impact

🚀 Expected Benefits

Developer Experience

  • Navigation: Faster code navigation and understanding
  • Maintenance: Isolated changes with minimal side effects
  • Collaboration: Multiple developers can work on different modules simultaneously
  • Testing: More focused unit testing capabilities

Future Development

  • Extensibility: Easier to add new Box types and functionality
  • Debugging: Clearer separation of concerns for troubleshooting
  • Refactoring: Future refactoring becomes safer and more targeted

🎯 This refactoring is essential for long-term maintainability and development velocity. The staged approach minimizes risk while maximizing organizational benefits.

📋 Critical: Start with Stage 1 (main.rs) as it provides the highest safety margin and immediate developer experience improvement.