hakorune/docs/development/current/main/phases/phase-143/README.md
nyash-codex d605611a16 feat(canonicalizer): Phase 143-P0 - parse_number pattern support
Add parse_number pattern recognition to canonicalizer, expanding adaptation
range for digit collection loops with break in THEN clause.

## Changes

### New Recognizer (ast_feature_extractor.rs)
- `detect_parse_number_pattern()`: Detects `if invalid { break }` pattern
- `ParseNumberInfo`: Struct for extracted pattern info
- ~150 lines added

### Canonicalizer Integration (canonicalizer.rs)
- Parse_number pattern detection before skip_whitespace
- LoopSkeleton construction with 4 steps (Header + Body x2 + Update)
- Routes to Pattern2Break (has_break=true)
- ~60 lines modified

### Export Chain (6 files)
- patterns/mod.rs → joinir/mod.rs → control_flow/mod.rs
- builder.rs → mir/mod.rs
- 8 lines total

### Tests
- `test_parse_number_pattern_recognized()`: Unit test for recognition
- Strict parity verification: GREEN (canonical and router agree)
- ~130 lines added

## Pattern Comparison

| Aspect | Skip Whitespace | Parse Number |
|--------|----------------|--------------|
| Break location | ELSE clause | THEN clause |
| Pattern | `if cond { update } else { break }` | `if invalid { break } rest... update` |
| Body after if | None | Required (result append) |

## Results

- Skeleton creation successful
- RoutingDecision matches router (Pattern2Break)
- Strict parity OK (canonicalizer ↔ router agreement)
- Unit test PASS
- Manual test: `test_pattern2_parse_number.hako` executes correctly

## Statistics

- New patterns: 1 (parse_number)
- Total patterns: 3 (skip_whitespace, parse_number, continue)
- Lines added: ~280
- Files modified: 8
- Parity status: Green 

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 09:08:37 +09:00


Phase 143: Canonicalizer Adaptation Range Expansion

Status

  • State: 🎉 Complete (P0)

P0: parse_number Pattern - Break in THEN Clause

Objective

Expand the canonicalizer to recognize parse_number/digit collection patterns, maximizing the adaptation range before adding new lowering patterns.

Target Pattern

tools/selfhost/test_pattern2_parse_number.hako

loop(i < num_str.length()) {
  local ch = num_str.substring(i, i + 1)
  local digit_pos = digits.indexOf(ch)

  // Exit on non-digit (break in THEN clause)
  if digit_pos < 0 {
    break
  }

  // Append digit
  result = result + ch
  i = i + 1
}
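
For readers more familiar with Rust than Hako, the loop above can be approximated as follows. This is only an illustrative translation: the function name `collect_leading_digits` is hypothetical, and `digits` is assumed to be the string `"0123456789"`, which the excerpt does not show.

```rust
/// Collects the leading run of digits from `num_str`.
/// Returns the collected digits and the index at which scanning stopped.
/// Hypothetical helper; not part of the hakorune code base.
pub fn collect_leading_digits(num_str: &str) -> (String, usize) {
    // Assumption: `digits` in the Hako test holds the ten decimal digits.
    let digits = "0123456789";
    let chars: Vec<char> = num_str.chars().collect();
    let mut result = String::new();
    let mut i = 0;

    while i < chars.len() {
        let ch = chars[i];
        // Mirrors `digits.indexOf(ch)`; `None` plays the role of a negative index.
        let digit_pos = digits.find(ch);

        // Exit on non-digit (break in THEN clause).
        if digit_pos.is_none() {
            break;
        }

        // Append the digit, then advance the carrier `i`.
        result.push(ch);
        i += 1;
    }

    (result, i)
}
```

Calling `collect_leading_digits("123abc")` stops at the first non-digit, so the carrier ends at the index of `a`.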

Pattern Characteristics

Key Difference from skip_whitespace:

  • skip_whitespace: `if cond { update } else { break }` (break in ELSE clause)
  • parse_number: `if invalid_cond { break } body... update` (break in THEN clause)

Structure:

loop(cond) {
    // ... body statements (ch, digit_pos computation)
    if invalid_cond {
        break
    }
    // ... rest statements (result append, carrier update)
    carrier = carrier + const
}
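
The THEN/ELSE distinction can be illustrated with a small classifier over a toy AST. The `Stmt` and `BreakLocation` types and the `break_location` function are hypothetical names made up for this sketch; the real AST in the code base is not shown in this document and will differ.

```rust
/// Toy statement type (assumption: much simpler than the real AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Any statement this sketch does not need to inspect.
    Other(String),
    Break,
    If {
        then_body: Vec<Stmt>,
        else_body: Option<Vec<Stmt>>,
    },
}

#[derive(Debug, PartialEq)]
pub enum BreakLocation {
    ThenClause,
    ElseClause,
    NoBreak,
}

/// Reports where the first top-level `if` with a direct `break` places it.
pub fn break_location(loop_body: &[Stmt]) -> BreakLocation {
    for stmt in loop_body {
        if let Stmt::If { then_body, else_body } = stmt {
            if then_body.contains(&Stmt::Break) {
                return BreakLocation::ThenClause;
            }
            if let Some(else_stmts) = else_body {
                if else_stmts.contains(&Stmt::Break) {
                    return BreakLocation::ElseClause;
                }
            }
        }
    }
    BreakLocation::NoBreak
}
```

Under this sketch a parse_number body classifies as `ThenClause` and a skip_whitespace body as `ElseClause`, which is the property the two recognizers key on.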

Implementation Summary

1. New Recognizer (ast_feature_extractor.rs)

Added detect_parse_number_pattern():

  • Detects the `if cond { break }` pattern (no else clause)
  • Extracts body statements before the break check
  • Extracts rest statements after the break check (including the carrier update)
  • Returns `ParseNumberInfo { carrier_name, delta, body_stmts, rest_stmts }`

Lines added: ~150 lines
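
A minimal sketch of how such a recognizer could work is shown below. Only the function name and the four `ParseNumberInfo` field names come from this document. The toy `Stmt` type, the `i64` type for `delta`, and the rule that the carrier update must be the last statement are assumptions for illustration, not the actual implementation.

```rust
/// Toy statement type (assumption: stands in for the real AST nodes).
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// Any statement kept as opaque text.
    Other(String),
    /// `name = name + delta`, the carrier update shape.
    AddConst { name: String, delta: i64 },
    Break,
    If {
        then_body: Vec<Stmt>,
        else_body: Option<Vec<Stmt>>,
    },
}

#[derive(Debug, Clone, PartialEq)]
pub struct ParseNumberInfo {
    pub carrier_name: String,
    pub delta: i64,
    pub body_stmts: Vec<Stmt>,
    pub rest_stmts: Vec<Stmt>,
}

pub fn detect_parse_number_pattern(loop_body: &[Stmt]) -> Option<ParseNumberInfo> {
    // Find the first `if cond { break }` that has no else clause.
    let break_idx = loop_body.iter().position(|stmt| {
        matches!(
            stmt,
            Stmt::If { then_body, else_body: None }
                if then_body.len() == 1 && then_body[0] == Stmt::Break
        )
    })?;

    // Split the loop body around the break check.
    let body_stmts = loop_body[..break_idx].to_vec();
    let rest_stmts = loop_body[break_idx + 1..].to_vec();

    // Assumption: the carrier update is the last statement after the check.
    let (carrier_name, delta) = match rest_stmts.last()? {
        Stmt::AddConst { name, delta } => (name.clone(), *delta),
        _ => return None,
    };

    Some(ParseNumberInfo { carrier_name, delta, body_stmts, rest_stmts })
}
```

For the target loop, this sketch would yield `carrier_name = "i"`, `delta = 1`, two body statements, and two rest statements (the append plus the carrier update).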

2. Canonicalizer Integration (canonicalizer.rs)

  • Tries parse_number pattern before skip_whitespace pattern
  • Builds LoopSkeleton with:
    • Step 1: HeaderCond
    • Step 2: Body (statements before break)
    • Step 3: Body (statements after break, excluding carrier update)
    • Step 4: Update (carrier update)
  • Routes to Pattern2Break (has_break=true)

Lines modified: ~60 lines
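
The four-step layout can be sketched as follows. The step names (HeaderCond, Body, Update) and the `has_break` flag come from this document; the concrete `SkeletonStep` and `LoopSkeleton` definitions, the string-based statements, and the `build_parse_number_skeleton` helper are hypothetical simplifications.

```rust
/// Hypothetical, simplified skeleton step; statements are plain strings here.
#[derive(Debug, Clone, PartialEq)]
pub enum SkeletonStep {
    HeaderCond(String),
    Body(Vec<String>),
    Update { carrier_name: String, delta: i64 },
}

#[derive(Debug, Clone, PartialEq)]
pub struct LoopSkeleton {
    pub steps: Vec<SkeletonStep>,
    pub has_break: bool,
}

/// Builds the 4-step skeleton. `rest_stmts` is expected to end with the
/// carrier update, which is split off into its own Update step.
pub fn build_parse_number_skeleton(
    header_cond: &str,
    body_stmts: &[&str],
    rest_stmts: &[&str],
    carrier_name: &str,
    delta: i64,
) -> LoopSkeleton {
    // Drop the trailing carrier update from the second Body step.
    let rest_without_update = &rest_stmts[..rest_stmts.len().saturating_sub(1)];

    LoopSkeleton {
        steps: vec![
            SkeletonStep::HeaderCond(header_cond.to_string()),
            SkeletonStep::Body(body_stmts.iter().map(|s| s.to_string()).collect()),
            SkeletonStep::Body(rest_without_update.iter().map(|s| s.to_string()).collect()),
            SkeletonStep::Update {
                carrier_name: carrier_name.to_string(),
                delta,
            },
        ],
        // A skeleton with a break exit is what routes to Pattern2Break.
        has_break: true,
    }
}
```

Keeping the carrier update in a dedicated step means the second Body step holds only the result append for the target loop.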

3. Export Chain

Added exports through the module hierarchy:

  • ast_feature_extractor.rs → ParseNumberInfo struct
  • patterns/mod.rs → re-export
  • joinir/mod.rs → re-export
  • control_flow/mod.rs → re-export
  • builder.rs → re-export
  • mir/mod.rs → final re-export

Files modified: 6 files (8 lines total)
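
The re-export chain follows the usual Rust `pub use` idiom. The sketch below collapses the six files into nested inline modules so it fits in one snippet. The module nesting is inferred from the recognizer path listed under SSOT, and the struct body is reduced to two fields for brevity, so treat it as an approximation of the real layout.

```rust
pub mod mir {
    pub mod builder {
        pub mod control_flow {
            pub mod joinir {
                pub mod patterns {
                    pub mod ast_feature_extractor {
                        /// Reduced stand-in for the real struct.
                        #[derive(Debug, Default, PartialEq)]
                        pub struct ParseNumberInfo {
                            pub carrier_name: String,
                            pub delta: i64,
                        }
                    }
                    // patterns/mod.rs: re-export
                    pub use self::ast_feature_extractor::ParseNumberInfo;
                }
                // joinir/mod.rs: re-export
                pub use self::patterns::ParseNumberInfo;
            }
            // control_flow/mod.rs: re-export
            pub use self::joinir::ParseNumberInfo;
        }
        // builder.rs: re-export
        pub use self::control_flow::ParseNumberInfo;
    }
    // mir/mod.rs: final re-export
    pub use self::builder::ParseNumberInfo;
}
```

With this chain in place, `mir::ParseNumberInfo` and the fully qualified path name the same type, so callers outside the builder do not need to know where the recognizer lives.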

4. Unit Tests

Added test_parse_number_pattern_recognized() in canonicalizer.rs:

  • Builds AST for parse_number pattern
  • Verifies skeleton structure (4 steps)
  • Verifies carrier (name="i", delta=1, role=Counter)
  • Verifies exit contract (has_break=true)
  • Verifies routing decision (Pattern2Break, no missing_caps)

Lines added: ~130 lines

Acceptance Criteria

  • Canonicalizer creates Skeleton for parse_number loop
  • RoutingDecision.chosen matches router (Pattern2Break)
  • Strict parity OK (canonicalizer and router agree)
  • Default behavior unchanged
  • quick profile not affected
  • Unit test added
  • Documentation created

Results

Parity Verification

NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern2_parse_number.hako

Output:

[loop_canonicalizer]   Chosen pattern: Pattern2Break
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern2Break
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern2Break

Status: Green parity - canonicalizer and router agree
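
Conceptually, the parity check compares the pattern chosen by the canonicalizer with the one chosen by the router. The sketch below is a guess at that logic under the assumption that strict mode turns a mismatch into an error. `check_parity`, `PatternKind::OtherPattern`, and the message wording are hypothetical and do not reproduce the real log format.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PatternKind {
    Pattern2Break,
    /// Placeholder for any other routing target.
    OtherPattern,
}

/// Compares the canonicalizer's choice with the router's actual choice.
/// Assumption: in strict mode a mismatch is an error; otherwise it is
/// only reported.
pub fn check_parity(
    canonical: PatternKind,
    actual: PatternKind,
    strict: bool,
) -> Result<String, String> {
    if canonical == actual {
        return Ok(format!("OK: canonical and actual agree on {:?}", canonical));
    }

    let msg = format!(
        "MISMATCH: canonical={:?}, actual={:?}",
        canonical, actual
    );
    if strict {
        Err(msg)
    } else {
        Ok(msg)
    }
}
```

Green parity corresponds to the `Ok` branch in which both sides report `Pattern2Break`.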

Unit Test Results

cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_number_pattern_recognized

Status: PASS

Statistics

| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_number) |
| Total patterns supported | 3 (skip_whitespace, parse_number, continue) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~280 |
| Files modified | 8 |
| Unit tests added | 1 |
| Parity status | Green |

Comparison: Parse Number vs Skip Whitespace

| Aspect | Skip Whitespace | Parse Number |
|--------|----------------|--------------|
| Break location | ELSE clause | THEN clause |
| Pattern | `if cond { update } else { break }` | `if invalid { break } rest... update` |
| Body before if | Optional | Optional (ch, digit_pos) |
| Body after if | None (last statement) | Required (result append) |
| Carrier update | In THEN clause | After if statement |
| Routing | Pattern2Break | Pattern2Break |
| Example | skip_whitespace, trim_leading/trailing | parse_number, digit collection |

Follow-up Opportunities

Immediate (Phase 143 P1-P2)

  • Support parse_string pattern (continue + return combo)
  • Add capability for variable-step updates (escape handling)

Future Enhancements

  • Extend recognizer for nested if patterns
  • Support multiple break points (requires new capability)
  • Add signature-based corpus analysis

Lessons Learned

  1. Break location matters: THEN vs ELSE clause creates different patterns
  2. rest_stmts extraction: Need to carefully separate body from carrier update
  3. Export chain: Requires 6-level re-export (ast → patterns → joinir → control_flow → builder → mir)
  4. Parity first: Always verify strict parity before claiming success

SSOT

  • Design: docs/development/current/main/design/loop-canonicalizer.md
  • Recognizer: src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs
  • Canonicalizer: src/mir/loop_canonicalizer/canonicalizer.rs
  • Test file: tools/selfhost/test_pattern2_parse_number.hako

Phase 143 P0: Complete
Date: 2025-12-16
Implemented by: Claude Code (Sonnet 4.5)