# Phase 143: Canonicalizer Adaptation Range Expansion

## Status

- State: 🎉 Complete (P0)

## P0: parse_number Pattern - Break in THEN Clause

### Objective

Expand the canonicalizer to recognize parse_number/digit collection patterns, maximizing the adaptation range before adding new lowering patterns.

### Target Pattern

`tools/selfhost/test_pattern2_parse_number.hako`

```hako
loop(i < num_str.length()) {
    local ch = num_str.substring(i, i + 1)
    local digit_pos = digits.indexOf(ch)

    // Exit on non-digit (break in THEN clause)
    if digit_pos < 0 {
        break
    }

    // Append digit
    result = result + ch
    i = i + 1
}
```

### Pattern Characteristics

**Key Difference from skip_whitespace**:

- **skip_whitespace**: `if cond { update } else { break }` - break in ELSE clause
- **parse_number**: `if invalid_cond { break } body... update` - break in THEN clause

**Structure**:

```
loop(cond) {
    // ... body statements (ch, digit_pos computation)
    if invalid_cond {
        break
    }
    // ... rest statements (result append, carrier update)
    carrier = carrier + const
}
```

### Implementation Summary

#### 1. New Recognizer (`ast_feature_extractor.rs`)

Added `detect_parse_number_pattern()`:

- Detects `if cond { break }` pattern (no else clause)
- Extracts body statements before break check
- Extracts rest statements after break check (including carrier update)
- Returns `ParseNumberInfo { carrier_name, delta, body_stmts, rest_stmts }`

**Lines added**: ~150 lines

#### 2. Canonicalizer Integration (`canonicalizer.rs`)

- Tries parse_number pattern before skip_whitespace pattern
- Builds LoopSkeleton with:
  - Step 1: HeaderCond
  - Step 2: Body (statements before break)
  - Step 3: Body (statements after break, excluding carrier update)
  - Step 4: Update (carrier update)
- Routes to `Pattern2Break` (has_break=true)

**Lines modified**: ~60 lines

#### 3. Export Chain

Added exports through the module hierarchy:

- `ast_feature_extractor.rs` → `ParseNumberInfo` struct
- `patterns/mod.rs` → re-export
- `joinir/mod.rs` → re-export
- `control_flow/mod.rs` → re-export
- `builder.rs` → re-export
- `mir/mod.rs` → final re-export

**Files modified**: 6 files (8 lines total)

#### 4. Unit Tests

Added `test_parse_number_pattern_recognized()` in `canonicalizer.rs`:

- Builds AST for parse_number pattern
- Verifies skeleton structure (4 steps)
- Verifies carrier (name="i", delta=1, role=Counter)
- Verifies exit contract (has_break=true)
- Verifies routing decision (Pattern2Break, no missing_caps)

**Lines added**: ~130 lines

### Acceptance Criteria

- ✅ Canonicalizer creates Skeleton for parse_number loop
- ✅ RoutingDecision.chosen matches router (Pattern2Break)
- ✅ Strict parity OK (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ quick profile not affected
- ✅ Unit test added
- ✅ Documentation created

### Results

#### Parity Verification

```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern2_parse_number.hako
```

**Output**:

```
[loop_canonicalizer] Chosen pattern: Pattern2Break
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern2Break
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern2Break
```

**Status**: ✅ **Green parity** - canonicalizer and router agree

#### Unit Test Results

```bash
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_number_pattern_recognized
```

**Status**: ✅ **PASS**

### Statistics

| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_number) |
| Total patterns supported | 3 (skip_whitespace, parse_number, continue) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~280 |
| Files modified | 8 |
| Unit tests added | 1 |
| Parity status | Green ✅ |

### Comparison: Parse Number vs Skip Whitespace

| Aspect | Skip Whitespace | Parse Number |
|--------|----------------|--------------|
| **Break location** | ELSE clause | THEN clause |
| **Pattern** | `if cond { update } else { break }` | `if invalid { break } rest... update` |
| **Body before if** | Optional | Optional (ch, digit_pos) |
| **Body after if** | None (last statement) | Required (result append) |
| **Carrier update** | In THEN clause | After if statement |
| **Routing** | Pattern2Break | Pattern2Break |
| **Example** | skip_whitespace, trim_leading/trailing | parse_number, digit collection |

### Follow-up Opportunities

#### Immediate (Phase 143 P1-P2)

- [ ] Support parse_string pattern (continue + return combo)
- [ ] Add capability for variable-step updates (escape handling)

#### Future Enhancements

- [ ] Extend recognizer for nested if patterns
- [ ] Support multiple break points (requires new capability)
- [ ] Add signature-based corpus analysis

### Lessons Learned

1. **Break location matters**: THEN vs ELSE clause creates different patterns
2. **rest_stmts extraction**: Need to carefully separate body from carrier update
3. **Export chain**: Requires 6-level re-export (ast → patterns → joinir → control_flow → builder → mir)
4. **Parity first**: Always verify strict parity before claiming success

## SSOT

- **Design**: `docs/development/current/main/design/loop-canonicalizer.md`
- **Recognizer**: `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs`
- **Canonicalizer**: `src/mir/loop_canonicalizer/canonicalizer.rs`
- **Tests**: Test file `tools/selfhost/test_pattern2_parse_number.hako`

---

## P1: parse_string Pattern - Continue + Return Combo

### Status

✅ Complete (2025-12-16)

### Objective

Expand canonicalizer to recognize parse_string patterns with both `continue` (escape handling) and `return` (quote found).
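Recognizing this combination means finding a `continue` that may sit inside a nested `if`, in addition to a `return` exit. The sketch below illustrates that kind of recursive search on a deliberately simplified AST. The `Stmt` enum and the functions `contains_continue`, `has_return_exit`, and `parse_string_like_body` are hypothetical stand-ins invented for illustration; they are not the project's actual `ASTNode` type or its `has_continue_node()` helper. Not descending into inner loops is likewise an assumption of this sketch.

```rust
/// Hypothetical, simplified statement AST (NOT the project's real `ASTNode`).
#[allow(dead_code)]
#[derive(Debug)]
enum Stmt {
    Assign(String), // assignment to the named variable; right-hand side omitted
    Break,
    Continue,
    Return,
    If { then_body: Vec<Stmt>, else_body: Vec<Stmt> },
    Loop(Vec<Stmt>),
}

/// True if `body` contains a `continue`, searching through nested `if` branches.
/// Assumption of this sketch: inner loops are not searched, because a `continue`
/// inside them would target the inner loop rather than the one being analyzed.
fn contains_continue(body: &[Stmt]) -> bool {
    body.iter().any(|stmt| match stmt {
        Stmt::Continue => true,
        Stmt::If { then_body, else_body } => {
            contains_continue(then_body) || contains_continue(else_body)
        }
        _ => false,
    })
}

/// True if `body` contains a `return`, at top level or inside nested `if` branches.
fn has_return_exit(body: &[Stmt]) -> bool {
    body.iter().any(|stmt| match stmt {
        Stmt::Return => true,
        Stmt::If { then_body, else_body } => {
            has_return_exit(then_body) || has_return_exit(else_body)
        }
        _ => false,
    })
}

/// Builds a body shaped like a parse_string loop: a quote check that returns,
/// an escape check whose nested `if` ends in `continue`, then the regular update.
fn parse_string_like_body() -> Vec<Stmt> {
    vec![
        Stmt::Assign("ch".into()),
        Stmt::If { then_body: vec![Stmt::Return], else_body: vec![] },
        Stmt::If {
            then_body: vec![
                Stmt::Assign("result".into()),
                Stmt::Assign("p".into()),
                Stmt::If {
                    then_body: vec![
                        Stmt::Assign("result".into()),
                        Stmt::Assign("p".into()),
                        Stmt::Continue, // nested continue
                    ],
                    else_body: vec![],
                },
            ],
            else_body: vec![],
        },
        Stmt::Assign("result".into()),
        Stmt::Assign("p".into()),
    ]
}
```

On this body, a shallow scan of only the top-level statements finds no `continue`, whereas the recursive search does, which is why a deep search is needed for this loop shape.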
### Target Pattern

`tools/selfhost/test_pattern4_parse_string.hako`

```hako
loop(p < len) {
    local ch = s.substring(p, p + 1)

    // Check for closing quote (return)
    if ch == "\"" {
        return 0
    }

    // Check for escape sequence (continue)
    if ch == "\\" {
        result = result + ch
        p = p + 1
        if p < len {  // Nested if
            result = result + s.substring(p, p + 1)
            p = p + 1
            continue  // Nested continue
        }
    }

    // Regular character
    result = result + ch
    p = p + 1
}
```

### Pattern Characteristics

**Key Features**:

- Multiple exit types: both `return` and `continue`
- Nested control flow: continue is inside a nested `if`
- Variable step updates: `p++` normally, but `p += 2` on escape

**Structure**:

```
loop(cond) {
    // ... body statements (ch computation)
    if quote_cond {
        return result
    }
    if escape_cond {
        // ... escape handling
        carrier = carrier + step
        if nested_cond {
            // ... nested handling
            carrier = carrier + step
            continue  // Nested continue!
        }
    }
    // ... regular processing
    carrier = carrier + step
}
```

### Implementation Summary

#### 1. New Recognizer (`ast_feature_extractor.rs`)

Added `detect_parse_string_pattern()`:

- Detects `if cond { return }` pattern
- Detects `continue` statement (with recursive search for nested continue)
- Uses `has_continue_node()` helper for deep search
- Returns `ParseStringInfo { carrier_name, delta, body_stmts }`

**Lines added**: ~120 lines

#### 2. Canonicalizer Integration (`canonicalizer.rs`)

- Tries parse_string pattern first (most specific)
- Builds LoopSkeleton with:
  - Step 1: HeaderCond
  - Step 2: Body (statements before exit checks)
  - Step 3: Update (carrier update)
- Sets ExitContract:
  - `has_break = false`
  - `has_continue = true`
  - `has_return = true`
- Routes to `Pattern4Continue` (has both continue and return)

**Lines modified**: ~45 lines

#### 3. Export Chain

Added exports through the module hierarchy:

- `ast_feature_extractor.rs` → `ParseStringInfo` struct + `detect_parse_string_pattern()`
- `patterns/mod.rs` → re-export
- `joinir/mod.rs` → re-export
- `control_flow/mod.rs` → re-export
- `builder.rs` → re-export
- `mir/mod.rs` → final re-export

**Files modified**: 7 files (10 lines total)

#### 4. Unit Tests

Added `test_parse_string_pattern_recognized()` in `canonicalizer.rs`:

- Builds AST for parse_string pattern
- Verifies skeleton structure (3 steps minimum)
- Verifies carrier (name="p", delta=1, role=Counter)
- Verifies exit contract (has_continue=true, has_return=true, has_break=false)
- Verifies routing decision (Pattern4Continue, no missing_caps)

**Lines added**: ~180 lines

### Acceptance Criteria

- ✅ Canonicalizer creates Skeleton for parse_string loop
- ✅ RoutingDecision.chosen matches router (Pattern4Continue)
- ✅ Strict parity green (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ quick profile not affected (unrelated smoke test failure)
- ✅ Unit test added and passing
- ✅ Nested continue detection implemented

### Results

#### Parity Verification

```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern4_parse_string.hako
```

**Output**:

```
[loop_canonicalizer] Skeleton steps: 3
[loop_canonicalizer] Carriers: 1
[loop_canonicalizer] Has exits: true
[loop_canonicalizer] Decision: SUCCESS
[loop_canonicalizer] Chosen pattern: Pattern4Continue
[loop_canonicalizer] Missing caps: []
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern4Continue
```

**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue

#### Unit Test Results

```bash
cargo test --release --lib loop_canonicalizer
```

**Status**: ✅ **All 19 tests PASS**

### Statistics

| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_string) |
| Total patterns supported | 4 (skip_whitespace, parse_number, continue, parse_string) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~300 |
| Files modified | 9 |
| Unit tests added | 1 |
| Parity status | Green ✅ |

### Technical Challenges

1. **Nested Continue Detection**: Required using the recursive `has_continue_node()` helper instead of shallow iteration
2. **Complex Exit Contract**: First pattern with both `has_continue=true` AND `has_return=true`
3. **Variable Step Updates**: The actual loop has variable steps (p++ vs p+=2), but the canonicalizer uses a base delta=1

### Comparison: Parse String vs Other Patterns

| Aspect | Skip Whitespace | Parse Number | Continue | **Parse String** |
|--------|----------------|--------------|----------|------------------|
| **Break** | Yes (ELSE) | Yes (THEN) | No | No |
| **Continue** | No | No | Yes | **Yes** |
| **Return** | No | No | No | **Yes** |
| **Nested control** | No | No | No | **Yes (nested if + continue)** |
| **Routing** | Pattern2Break | Pattern2Break | Pattern4Continue | **Pattern4Continue** |

### Follow-up Opportunities

#### Next Steps (Phase 143 P2-P3)

- [ ] Support parse_array pattern (array element collection)
- [ ] Support parse_object pattern (key-value pair collection)
- [ ] Add capability for true variable-step updates (not just const delta)

#### Future Enhancements

- [ ] Support multiple return points
- [ ] Handle more complex nested patterns
- [ ] Add signature-based corpus analysis for pattern discovery

### Lessons Learned

1. **Nested Detection Required**: Simple shallow iteration isn't enough for real-world patterns
2. **ExitContract Diversity**: Patterns can have multiple exit types simultaneously
3. **Parity vs Execution**: Achieving parity doesn't guarantee runtime success (Pattern4 lowering may need enhancements)
4. **Recursive Helpers**: Reusing existing helpers (`has_continue_node`) is better than duplicating logic

---

**Phase 143 P0: Complete** ✅

**Phase 143 P1: Complete** ✅

**Date**: 2025-12-16

**Implemented by**: Claude Code (Sonnet 4.5)