Add parse_number pattern recognition to canonicalizer, expanding adaptation
range for digit collection loops with break in THEN clause.
## Changes
### New Recognizer (ast_feature_extractor.rs)
- `detect_parse_number_pattern()`: Detects `if invalid { break }` pattern
- `ParseNumberInfo`: Struct for extracted pattern info
- ~150 lines added
### Canonicalizer Integration (canonicalizer.rs)
- parse_number pattern detection runs before skip_whitespace
- LoopSkeleton construction with 4 steps (Header + Body x2 + Update)
- Routes to Pattern2Break (has_break=true)
- ~60 lines modified
### Export Chain (6 files)
- patterns/mod.rs → joinir/mod.rs → control_flow/mod.rs
- builder.rs → mir/mod.rs
- 8 lines total
### Tests
- `test_parse_number_pattern_recognized()`: Unit test for recognition
- Strict parity verification: GREEN (canonical and router agree)
- ~130 lines added
## Pattern Comparison
| Aspect | Skip Whitespace | Parse Number |
|--------|----------------|--------------|
| Break location | ELSE clause | THEN clause |
| Pattern | `if cond { update } else { break }` | `if invalid { break } rest... update` |
| Body after if | None | Required (result append) |
## Results
- ✅ Skeleton creation successful
- ✅ RoutingDecision matches router (Pattern2Break)
- ✅ Strict parity OK (canonicalizer ↔ router agreement)
- ✅ Unit test PASS
- ✅ Manual test: test_pattern2_parse_number.hako executes correctly
## Statistics
- New patterns: 1 (parse_number)
- Total patterns: 3 (skip_whitespace, parse_number, continue)
- Lines added: ~280
- Files modified: 8
- Parity status: Green ✅
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 143: Canonicalizer Adaptation Range Expansion
Status
- State: 🎉 Complete (P0)
P0: parse_number Pattern - Break in THEN Clause
Objective
Expand the canonicalizer to recognize parse_number/digit collection patterns, maximizing the adaptation range before adding new lowering patterns.
Target Pattern
tools/selfhost/test_pattern2_parse_number.hako
loop(i < num_str.length()) {
    local ch = num_str.substring(i, i + 1)
    local digit_pos = digits.indexOf(ch)

    // Exit on non-digit (break in THEN clause)
    if digit_pos < 0 {
        break
    }

    // Append digit
    result = result + ch
    i = i + 1
}
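For readers less familiar with `.hako` syntax, the same control flow can be sketched in Rust. This is only an illustrative translation: the function name `collect_leading_digits` and the returned tuple are assumptions made for the example, not part of the repository.

```rust
/// Collects the leading run of digits from `num_str`, mirroring the .hako
/// loop above: break on the first non-digit, otherwise append and advance.
/// Returns the collected digits and the final value of the carrier `i`.
fn collect_leading_digits(num_str: &str) -> (String, usize) {
    let digits = "0123456789";
    let chars: Vec<char> = num_str.chars().collect();
    let mut result = String::new();
    let mut i = 0; // carrier: advances by a constant step of 1

    while i < chars.len() {
        // Body statements before the break check.
        let ch = chars[i];
        let digit_pos = digits.find(ch);

        // Exit on non-digit (break in THEN clause, no else clause).
        if digit_pos.is_none() {
            break;
        }

        // Rest statements: result append, then carrier update.
        result.push(ch);
        i += 1;
    }
    (result, i)
}
```

The shape to notice is that the `if` holds only the `break`, and both the append and the carrier update sit after it at loop-body level.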
Pattern Characteristics
Key Difference from skip_whitespace:
- skip_whitespace: `if cond { update } else { break }` - break in ELSE clause
- parse_number: `if invalid_cond { break } body... update` - break in THEN clause
Structure:
loop(cond) {
    // ... body statements (ch, digit_pos computation)
    if invalid_cond {
        break
    }
    // ... rest statements (result append, carrier update)
    carrier = carrier + const
}
Implementation Summary
1. New Recognizer (ast_feature_extractor.rs)
Added detect_parse_number_pattern():
- Detects `if cond { break }` pattern (no else clause)
- Extracts body statements before break check
- Extracts rest statements after break check (including carrier update)
- Returns `ParseNumberInfo { carrier_name, delta, body_stmts, rest_stmts }`
Lines added: ~150 lines
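The detection steps listed above can be sketched over a deliberately simplified statement type. The `Stmt` enum here is a hypothetical stand-in for the real AST nodes, and the function body is an assumption about how such a recognizer could work, not the code in `ast_feature_extractor.rs`. Only the `ParseNumberInfo` field names come from the description above.

```rust
/// Hypothetical, simplified loop-body statement (not the real AST type).
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// Any statement the recognizer treats as opaque, e.g. `local ch = ...`.
    Other(String),
    /// `if <cond> { break }` with no else clause.
    IfBreak { cond: String },
    /// `name = name + delta` with a constant delta.
    AddConst { name: String, delta: i64 },
}

/// Field names follow the document; field types are assumptions.
#[derive(Debug, PartialEq)]
struct ParseNumberInfo {
    carrier_name: String,
    delta: i64,
    body_stmts: Vec<Stmt>,
    rest_stmts: Vec<Stmt>,
}

fn detect_parse_number_pattern(body: &[Stmt]) -> Option<ParseNumberInfo> {
    // Locate the break check.
    let break_idx = body
        .iter()
        .position(|s| matches!(s, Stmt::IfBreak { .. }))?;

    // This sketch rejects loops with a second break check.
    let rest = &body[break_idx + 1..];
    if rest.iter().any(|s| matches!(s, Stmt::IfBreak { .. })) {
        return None;
    }

    // The rest must be non-empty and end with the carrier update.
    let (carrier_name, delta) = match rest.last()? {
        Stmt::AddConst { name, delta } => (name.clone(), *delta),
        _ => return None,
    };

    Some(ParseNumberInfo {
        carrier_name,
        delta,
        body_stmts: body[..break_idx].to_vec(),
        rest_stmts: rest.to_vec(), // includes the carrier update
    })
}
```

Returning `Option` lets a caller fall through to the next recognizer when the shape does not match.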
2. Canonicalizer Integration (canonicalizer.rs)
- Tries parse_number pattern before skip_whitespace pattern
- Builds LoopSkeleton with:
- Step 1: HeaderCond
- Step 2: Body (statements before break)
- Step 3: Body (statements after break, excluding carrier update)
- Step 4: Update (carrier update)
- Routes to `Pattern2Break` (has_break=true)
Lines modified: ~60 lines
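As a rough illustration of the four-step layout, the sketch below builds a skeleton from plain strings. `SkeletonStep` and `build_parse_number_skeleton` are hypothetical names chosen for this example; the real `LoopSkeleton` type in `canonicalizer.rs` is likely richer.

```rust
/// Hypothetical step type; statements are plain strings for brevity.
#[derive(Debug, Clone, PartialEq)]
enum SkeletonStep {
    HeaderCond(String),
    Body(Vec<String>),
    Update { carrier: String, delta: i64 },
}

fn owned(stmts: &[&str]) -> Vec<String> {
    stmts.iter().map(|s| s.to_string()).collect()
}

/// Builds Header + Body x2 + Update. `rest_stmts` is expected to end with
/// the carrier update, which is split off into its own Update step.
fn build_parse_number_skeleton(
    header_cond: &str,
    body_stmts: &[&str],
    rest_stmts: &[&str],
    carrier: &str,
    delta: i64,
) -> Option<Vec<SkeletonStep>> {
    // Drop the trailing carrier update from the second Body step.
    let (_update, rest_without_update) = rest_stmts.split_last()?;
    Some(vec![
        SkeletonStep::HeaderCond(header_cond.to_string()),
        SkeletonStep::Body(owned(body_stmts)),
        SkeletonStep::Body(owned(rest_without_update)),
        SkeletonStep::Update {
            carrier: carrier.to_string(),
            delta,
        },
    ])
}
```

Keeping the carrier update out of the second Body step means the update appears exactly once in the skeleton.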
3. Export Chain
Added exports through the module hierarchy:
- `ast_feature_extractor.rs` → `ParseNumberInfo` struct
- `patterns/mod.rs` → re-export
- `joinir/mod.rs` → re-export
- `control_flow/mod.rs` → re-export
- `builder.rs` → re-export
- `mir/mod.rs` → final re-export
Files modified: 6 files (8 lines total)
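The re-export chain can be pictured with inline modules in a single file. The nesting follows the recognizer path listed under SSOT; the struct fields and the exact `pub use` lines are assumptions for illustration, since the real declarations live in separate files.

```rust
// Single-file sketch of a six-level re-export chain using inline modules.
mod mir {
    pub mod builder {
        pub mod control_flow {
            pub mod joinir {
                pub mod patterns {
                    pub mod ast_feature_extractor {
                        // Level 1: the struct is defined here (fields abbreviated).
                        #[derive(Debug, Default)]
                        pub struct ParseNumberInfo {
                            pub carrier_name: String,
                            pub delta: i64,
                        }
                    }
                    // Level 2: patterns/mod.rs
                    pub use self::ast_feature_extractor::ParseNumberInfo;
                }
                // Level 3: joinir/mod.rs
                pub use self::patterns::ParseNumberInfo;
            }
            // Level 4: control_flow/mod.rs
            pub use self::joinir::ParseNumberInfo;
        }
        // Level 5: builder.rs
        pub use self::control_flow::ParseNumberInfo;
    }
    // Level 6: mir/mod.rs (final re-export)
    pub use self::builder::ParseNumberInfo;
}
```

With the chain in place, a consumer such as the canonicalizer can refer to `mir::ParseNumberInfo` without spelling out the full path.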
4. Unit Tests
Added test_parse_number_pattern_recognized() in canonicalizer.rs:
- Builds AST for parse_number pattern
- Verifies skeleton structure (4 steps)
- Verifies carrier (name="i", delta=1, role=Counter)
- Verifies exit contract (has_break=true)
- Verifies routing decision (Pattern2Break, no missing_caps)
Lines added: ~130 lines
Acceptance Criteria
- ✅ Canonicalizer creates Skeleton for parse_number loop
- ✅ RoutingDecision.chosen matches router (Pattern2Break)
- ✅ Strict parity OK (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ quick profile not affected
- ✅ Unit test added
- ✅ Documentation created
Results
Parity Verification
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
tools/selfhost/test_pattern2_parse_number.hako
Output:
[loop_canonicalizer] Chosen pattern: Pattern2Break
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern2Break
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern2Break
Status: ✅ Green parity - canonicalizer and router agree
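Conceptually, the parity check compares the pattern chosen by the canonicalizer with the one chosen by the router. The sketch below is a guess at that comparison, not the project's implementation: `PatternKind::Other`, the `strict` flag behaviour (an error on mismatch), and the mismatch message format are all assumptions. Only the OK message is copied from the output above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PatternKind {
    Pattern2Break,
    /// Placeholder for any other routing result (hypothetical).
    Other,
}

/// Compares the canonicalizer's choice with the router's choice.
/// In strict mode a mismatch is an error; otherwise it is only reported.
fn check_parity(
    func_name: &str,
    canonical: PatternKind,
    actual: PatternKind,
    strict: bool,
) -> Result<String, String> {
    if canonical == actual {
        return Ok(format!(
            "[loop_canonicalizer/PARITY] OK in function '{}': canonical and actual agree on {:?}",
            func_name, canonical
        ));
    }
    // Message format below is an assumption for illustration.
    let msg = format!(
        "[loop_canonicalizer/PARITY] MISMATCH in function '{}': canonical={:?}, actual={:?}",
        func_name, canonical, actual
    );
    if strict {
        Err(msg)
    } else {
        Ok(msg)
    }
}
```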
Unit Test Results
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_number_pattern_recognized
Status: ✅ PASS
Statistics
| Metric | Count |
|---|---|
| New patterns supported | 1 (parse_number) |
| Total patterns supported | 3 (skip_whitespace, parse_number, continue) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~280 |
| Files modified | 8 |
| Unit tests added | 1 |
| Parity status | Green ✅ |
Comparison: Parse Number vs Skip Whitespace
| Aspect | Skip Whitespace | Parse Number |
|---|---|---|
| Break location | ELSE clause | THEN clause |
| Pattern | `if cond { update } else { break }` | `if invalid { break } rest... update` |
| Body before if | Optional | Optional (ch, digit_pos) |
| Body after if | None (last statement) | Required (result append) |
| Carrier update | In THEN clause | After if statement |
| Routing | Pattern2Break | Pattern2Break |
| Example | skip_whitespace, trim_leading/trailing | parse_number, digit collection |
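The distinction in the table comes down to which branch of the `if` holds the lone `break`. A minimal sketch of that check, using a hypothetical `Stmt` type rather than the project's AST:

```rust
/// Hypothetical, simplified statement type for illustration only.
#[derive(Debug)]
enum Stmt {
    Break,
    Update,
    Other,
    If {
        then_body: Vec<Stmt>,
        else_body: Option<Vec<Stmt>>,
    },
}

#[derive(Debug, PartialEq)]
enum BreakLocation {
    /// parse_number shape: `if invalid { break }` with no else clause.
    Then,
    /// skip_whitespace shape: `if cond { update } else { break }`.
    Else,
}

fn is_lone_break(body: &[Stmt]) -> bool {
    matches!(body, [Stmt::Break])
}

fn classify_break_location(stmt: &Stmt) -> Option<BreakLocation> {
    match stmt {
        Stmt::If { then_body, else_body } => match else_body {
            // Break in THEN clause, no else clause.
            None if is_lone_break(then_body) => Some(BreakLocation::Then),
            // Break in ELSE clause, THEN clause does something else.
            Some(e) if is_lone_break(e) && !is_lone_break(then_body) => {
                Some(BreakLocation::Else)
            }
            _ => None,
        },
        _ => None,
    }
}
```

Both shapes route to Pattern2Break, so the classification matters for skeleton construction rather than for the routing decision.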
Follow-up Opportunities
Immediate (Phase 143 P1-P2)
- Support parse_string pattern (continue + return combo)
- Add capability for variable-step updates (escape handling)
Future Enhancements
- Extend recognizer for nested if patterns
- Support multiple break points (requires new capability)
- Add signature-based corpus analysis
Lessons Learned
- Break location matters: THEN vs ELSE clause creates different patterns
- rest_stmts extraction: Need to carefully separate body from carrier update
- Export chain: Requires 6-level re-export (ast → patterns → joinir → control_flow → builder → mir)
- Parity first: Always verify strict parity before claiming success
SSOT
- Design: `docs/development/current/main/design/loop-canonicalizer.md`
- Recognizer: `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs`
- Canonicalizer: `src/mir/loop_canonicalizer/canonicalizer.rs`
- Tests: test file `tools/selfhost/test_pattern2_parse_number.hako`
Phase 143 P0: Complete ✅
Date: 2025-12-16
Implemented by: Claude Code (Sonnet 4.5)