# Phase 143: Canonicalizer Adaptation Range Expansion

## Status

- State: 🎉 Complete (P0)

## P0: parse_number Pattern - Break in THEN Clause

### Objective

Expand the canonicalizer to recognize parse_number/digit collection patterns, maximizing the adaptation range before adding new lowering patterns.

### Target Pattern

`tools/selfhost/test_pattern2_parse_number.hako`

```hako
loop(i < num_str.length()) {
    local ch = num_str.substring(i, i + 1)
    local digit_pos = digits.indexOf(ch)

    // Exit on non-digit (break in THEN clause)
    if digit_pos < 0 {
        break
    }

    // Append digit
    result = result + ch
    i = i + 1
}
```

### Pattern Characteristics

**Key Difference from skip_whitespace**:

- **skip_whitespace**: `if cond { update } else { break }` - break in ELSE clause
- **parse_number**: `if invalid_cond { break } body... update` - break in THEN clause

**Structure**:

```
loop(cond) {
    // ... body statements (ch, digit_pos computation)
    if invalid_cond {
        break
    }
    // ... rest statements (result append, carrier update)
    carrier = carrier + const
}
```

### Implementation Summary

#### 1. New Recognizer (`ast_feature_extractor.rs`)

Added `detect_parse_number_pattern()`:

- Detects `if cond { break }` pattern (no else clause)
- Extracts body statements before break check
- Extracts rest statements after break check (including carrier update)
- Returns `ParseNumberInfo { carrier_name, delta, body_stmts, rest_stmts }`

**Lines added**: ~150 lines

#### 2. Canonicalizer Integration (`canonicalizer.rs`)

- Tries parse_number pattern before skip_whitespace pattern
- Builds LoopSkeleton with:
  - Step 1: HeaderCond
  - Step 2: Body (statements before break)
  - Step 3: Body (statements after break, excluding carrier update)
  - Step 4: Update (carrier update)
- Routes to `Pattern2Break` (has_break=true)

**Lines modified**: ~60 lines

#### 3. Export Chain

Added exports through the module hierarchy:

- `ast_feature_extractor.rs` → `ParseNumberInfo` struct
- `patterns/mod.rs` → re-export
- `joinir/mod.rs` → re-export
- `control_flow/mod.rs` → re-export
- `builder.rs` → re-export
- `mir/mod.rs` → final re-export

**Files modified**: 6 files (8 lines total)

#### 4. Unit Tests

Added `test_parse_number_pattern_recognized()` in `canonicalizer.rs`:

- Builds AST for parse_number pattern
- Verifies skeleton structure (4 steps)
- Verifies carrier (name="i", delta=1, role=Counter)
- Verifies exit contract (has_break=true)
- Verifies routing decision (Pattern2Break, no missing_caps)

**Lines added**: ~130 lines

### Acceptance Criteria

- ✅ Canonicalizer creates Skeleton for parse_number loop
- ✅ RoutingDecision.chosen matches router (Pattern2Break)
- ✅ Strict parity OK (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ quick profile not affected
- ✅ Unit test added
- ✅ Documentation created

### Results

#### Parity Verification

```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern2_parse_number.hako
```

**Output**:

```
[loop_canonicalizer] Chosen pattern: Pattern2Break
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern2Break
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern2Break
```

**Status**: ✅ **Green parity** - canonicalizer and router agree

#### Unit Test Results

```bash
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_number_pattern_recognized
```

**Status**: ✅ **PASS**

### Statistics

| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_number) |
| Total patterns supported | 3 (skip_whitespace, parse_number, continue) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~280 |
| Files modified | 8 |
| Unit tests added | 1 |
| Parity status | Green ✅ |

### Comparison: Parse Number vs Skip Whitespace

| Aspect | Skip Whitespace | Parse Number |
|--------|----------------|--------------|
| **Break location** | ELSE clause | THEN clause |
| **Pattern** | `if cond { update } else { break }` | `if invalid { break } rest... update` |
| **Body before if** | Optional | Optional (ch, digit_pos) |
| **Body after if** | None (last statement) | Required (result append) |
| **Carrier update** | In THEN clause | After if statement |
| **Routing** | Pattern2Break | Pattern2Break |
| **Example** | skip_whitespace, trim_leading/trailing | parse_number, digit collection |

### Follow-up Opportunities

#### Immediate (Phase 143 P1-P2)

- [ ] Support parse_string pattern (continue + return combo)
- [ ] Add capability for variable-step updates (escape handling)

#### Future Enhancements

- [ ] Extend recognizer for nested if patterns
- [ ] Support multiple break points (requires new capability)
- [ ] Add signature-based corpus analysis

### Lessons Learned

1. **Break location matters**: THEN vs ELSE clause creates different patterns
2. **rest_stmts extraction**: Need to carefully separate body from carrier update
3. **Export chain**: Requires 6-level re-export (ast → patterns → joinir → control_flow → builder → mir)
4. **Parity first**: Always verify strict parity before claiming success

## SSOT

- **Design**: `docs/development/current/main/design/loop-canonicalizer.md`
- **Recognizer**: `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs`
- **Canonicalizer**: `src/mir/loop_canonicalizer/canonicalizer.rs`
- **Tests**: Test file `tools/selfhost/test_pattern2_parse_number.hako`

---

**Phase 143 P0: Complete** ✅

**Date**: 2025-12-16

**Implemented by**: Claude Code (Sonnet 4.5)
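
## Appendix: Recognizer Sketch

The recognizer and skeleton steps described in the Implementation Summary can be illustrated with a minimal, self-contained sketch. It is not the code in `ast_feature_extractor.rs` or `canonicalizer.rs`: the `Expr`, `Stmt`, and `Step` types, the helpers `is_break_only_if`, `as_const_step`, `build_skeleton`, and `sample_loop_body`, and the function signatures are all simplified, hypothetical stand-ins. Only the field names of `ParseNumberInfo` and the four-step layout are taken from this document.

```rust
// Illustrative sketch only: all types are hypothetical stand-ins for the real AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Opaque, // any expression the recognizer does not inspect
}

#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Local { name: String, init: Expr },
    Assign { target: String, value: Expr },
    If { cond: Expr, then_body: Vec<Stmt>, else_body: Option<Vec<Stmt>> },
    Break,
}

#[derive(Debug, Clone, PartialEq)]
struct ParseNumberInfo {
    carrier_name: String,
    delta: i64,
    body_stmts: Vec<Stmt>, // statements before the break check
    rest_stmts: Vec<Stmt>, // statements after it, including the carrier update
}

#[derive(Debug, Clone, PartialEq)]
enum Step {
    HeaderCond,
    Body(Vec<Stmt>),
    Update { carrier: String, delta: i64 },
}

/// True for `if cond { break }` with no else clause.
fn is_break_only_if(stmt: &Stmt) -> bool {
    match stmt {
        Stmt::If { then_body, else_body: None, .. } => {
            matches!(then_body.as_slice(), [Stmt::Break])
        }
        _ => false,
    }
}

/// Matches `x = x + <int literal>` and returns `(x, literal)`.
fn as_const_step(stmt: &Stmt) -> Option<(String, i64)> {
    if let Stmt::Assign { target, value: Expr::Add(lhs, rhs) } = stmt {
        if let (Expr::Var(v), Expr::Int(n)) = (&**lhs, &**rhs) {
            if v == target {
                return Some((target.clone(), *n));
            }
        }
    }
    None
}

/// Splits the loop body at the first break-only `if`; the last statement
/// after it must be the constant-step carrier update.
fn detect_parse_number_pattern(body: &[Stmt]) -> Option<ParseNumberInfo> {
    let if_idx = body.iter().position(is_break_only_if)?;
    let rest = &body[if_idx + 1..];
    let (carrier_name, delta) = as_const_step(rest.last()?)?;
    Some(ParseNumberInfo {
        carrier_name,
        delta,
        body_stmts: body[..if_idx].to_vec(),
        rest_stmts: rest.to_vec(),
    })
}

/// Lays out the four steps: header, body before break, body after break
/// (carrier update excluded), and the update itself.
fn build_skeleton(info: &ParseNumberInfo) -> Vec<Step> {
    let rest_without_update = info.rest_stmts.split_last().map(|(_, r)| r).unwrap_or(&[]);
    vec![
        Step::HeaderCond,
        Step::Body(info.body_stmts.clone()),
        Step::Body(rest_without_update.to_vec()),
        Step::Update { carrier: info.carrier_name.clone(), delta: info.delta },
    ]
}

/// The loop body of the target pattern, with uninspected expressions elided.
fn sample_loop_body() -> Vec<Stmt> {
    let var = |s: &str| Expr::Var(s.to_string());
    let add = |l: Expr, r: Expr| Expr::Add(Box::new(l), Box::new(r));
    vec![
        Stmt::Local { name: "ch".into(), init: Expr::Opaque },
        Stmt::Local { name: "digit_pos".into(), init: Expr::Opaque },
        Stmt::If { cond: Expr::Opaque, then_body: vec![Stmt::Break], else_body: None },
        Stmt::Assign { target: "result".into(), value: add(var("result"), var("ch")) },
        Stmt::Assign { target: "i".into(), value: add(var("i"), Expr::Int(1)) },
    ]
}

fn main() {
    let info = detect_parse_number_pattern(&sample_loop_body()).expect("pattern should match");
    let steps = build_skeleton(&info);
    // prints "carrier=i delta=1 steps=4"
    println!("carrier={} delta={} steps={}", info.carrier_name, info.delta, steps.len());
}
```

In this sketch a loop whose `if` carries an `else { break }` branch (the skip_whitespace shape) is rejected by `is_break_only_if`, which mirrors the THEN/ELSE distinction drawn in the comparison table. Keeping the carrier update inside `rest_stmts` and stripping it only when building the skeleton follows the split described under Lessons Learned.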