feat(mir): Phase 143 P1 - Add parse_string pattern to canonicalizer

Expand loop canonicalizer to recognize parse_string patterns with both
continue (escape handling) and return (quote found) statements.

## Implementation

### New Pattern Detection (ast_feature_extractor.rs)
- Add `detect_parse_string_pattern()` function
- Support nested continue detection using `has_continue_node()` helper
- Recognize both return and continue in same loop body
- Return ParseStringInfo { carrier_name, delta, body_stmts }
- ~120 lines added
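
The nested continue detection listed above can be sketched as a recursive walk. This is a minimal illustration, not the code from `ast_feature_extractor.rs`: the `Node` enum is a hypothetical stand-in for the real `ASTNode`, and the choice not to descend into nested loops is an assumption.

```rust
/// Hypothetical, simplified stand-in for the real `ASTNode` enum.
#[derive(Debug, Clone)]
pub enum Node {
    Continue,
    Return,
    Assign { target: String },
    If { then_body: Vec<Node>, else_body: Option<Vec<Node>> },
    Loop { body: Vec<Node> },
}

/// Returns true if `node` is a `continue`, or contains one nested inside
/// any depth of `if` branches.
pub fn has_continue_node(node: &Node) -> bool {
    match node {
        Node::Continue => true,
        Node::If { then_body, else_body } => {
            then_body.iter().any(has_continue_node)
                || else_body
                    .as_ref()
                    .map_or(false, |stmts| stmts.iter().any(has_continue_node))
        }
        // Assumption: a `continue` inside a nested loop targets that inner
        // loop, so the search does not descend into it.
        Node::Loop { .. } => false,
        Node::Return | Node::Assign { .. } => false,
    }
}
```

A loop body would then qualify when any of its top-level statements satisfies `has_continue_node`, for example via `body.iter().any(has_continue_node)`.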

### Canonicalizer Integration (canonicalizer.rs)
- Try parse_string pattern first (most specific)
- Build LoopSkeleton with HeaderCond, Body, Update steps
- Set ExitContract: has_continue=true, has_return=true
- Route to Pattern4Continue (both exits present)
- ~45 lines modified
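
As a rough sketch of why the most specific pattern is tried first: a loop with both `continue` and `return` could also satisfy a looser continue-only check, so ordering decides which match wins. The types and matching rules below are hypothetical stand-ins for illustration, not the contents of `canonicalizer.rs`.

```rust
/// Hypothetical stand-in for the exit summary of a loop body.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ExitContract {
    pub has_break: bool,
    pub has_continue: bool,
    pub has_return: bool,
}

/// Hypothetical labels for which detector matched.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Matched {
    ParseString,
    Continue,
}

/// Assumed rule: parse_string needs both a continue and a return, no break.
fn match_parse_string(c: ExitContract) -> Option<Matched> {
    (c.has_continue && c.has_return && !c.has_break).then_some(Matched::ParseString)
}

/// Assumed rule: the generic continue pattern only needs a continue, no break.
fn match_continue(c: ExitContract) -> Option<Matched> {
    (c.has_continue && !c.has_break).then_some(Matched::Continue)
}

/// Try detectors from most specific to least specific; the first hit wins.
pub fn classify(c: ExitContract) -> Option<Matched> {
    match_parse_string(c).or_else(|| match_continue(c))
}
```

With the order reversed, the continue-only rule would shadow the parse_string rule in this sketch, which is the situation that "most specific first" avoids.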

### Export Chain
- Add re-exports through 7 module levels:
  ast_feature_extractor → patterns → joinir → control_flow → builder → mir
- 10 lines total across 7 files

### Unit Test
- Add `test_parse_string_pattern_recognized()` in canonicalizer.rs
- Verify skeleton structure (3+ steps)
- Verify carrier (name="p", delta=1, role=Counter)
- Verify exit contract (continue=true, return=true, break=false)
- Verify routing decision (Pattern4Continue, no missing_caps)
- ~180 lines added

## Target Pattern
`tools/selfhost/test_pattern4_parse_string.hako`

Pattern structure:
- Check for closing quote → return
- Check for escape sequence → continue (nested inside another if)
- Regular character processing → p++
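
For orientation, a loop with this shape might look like the following Rust sketch. It is an illustrative equivalent only, not the contents of the `.hako` file; the function name, signature, and escape handling are assumptions.

```rust
/// Scans `chars` from index `p` (assumed to be just past the opening quote).
/// Returns the decoded string and the index after the closing quote,
/// or `None` if the input ends before a closing quote is found.
pub fn parse_string(chars: &[char], mut p: usize) -> Option<(String, usize)> {
    let mut out = String::new();
    while p < chars.len() {
        let ch = chars[p];
        // Closing quote: early return from inside the loop.
        if ch == '"' {
            return Some((out, p + 1));
        }
        // Escape sequence: the continue is nested inside a second `if`.
        if ch == '\\' {
            if p + 1 < chars.len() {
                out.push(chars[p + 1]);
                p += 2; // skip the backslash and the escaped character
                continue;
            }
        }
        // Regular character: advance the carrier by the base step.
        out.push(ch);
        p += 1;
    }
    None
}
```

The carrier `p` advances by 1 on the regular path and by 2 on the escape path, which is the variable step the canonicalizer has to account for.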

## Results
- Strict parity green: Pattern4Continue
- All 19 unit tests pass
- Nested continue detection working
- ExitContract correctly set (first pattern with both continue+return)
- Default behavior unchanged

## Technical Challenges
1. Nested continue detection required recursive search
2. First pattern with both has_continue=true AND has_return=true
3. Variable step updates (p++ vs p+=2) are handled by recording the base delta (delta=1)

## Statistics
- New patterns: 1 (parse_string)
- Total patterns: 4 (skip_whitespace, parse_number, continue, parse_string)
- New capabilities: 0 (uses existing ConstStep)
- Lines added: ~300
- Files modified: 9
- Parity status: Green

Phase 143 P1: Complete

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
nyash-codex
2025-12-16 12:37:47 +09:00
parent d605611a16
commit 42339ca77f
9 changed files with 680 additions and 6 deletions

@@ -6,6 +6,7 @@
use crate::ast::ASTNode;
use crate::mir::detect_continue_pattern;
use crate::mir::detect_parse_number_pattern as ast_detect_parse_number;
use crate::mir::detect_parse_string_pattern as ast_detect_parse_string;
use crate::mir::detect_skip_whitespace_pattern as ast_detect;
// ============================================================================
@@ -75,6 +76,39 @@ pub fn try_extract_parse_number_pattern(
})
}
// ============================================================================
// Parse String Pattern (Phase 143-P1)
// ============================================================================
/// Try to extract parse_string pattern from loop
///
/// Pattern structure:
/// ```
/// loop(cond) {
///     // ... body statements (ch computation)
///     if quote_cond {
///         return result
///     }
///     if escape_cond {
///         // ... escape handling
///         carrier = carrier + const
///         continue
///     }
///     // ... regular character handling
///     carrier = carrier + const
/// }
/// ```
///
/// Returns (carrier_name, delta, body_stmts) if pattern matches.
///
/// # Phase 143-P1: Parse String Pattern Detection
///
/// This function delegates to `ast_feature_extractor::detect_parse_string_pattern`
/// for SSOT implementation.
pub fn try_extract_parse_string_pattern(body: &[ASTNode]) -> Option<(String, i64, Vec<ASTNode>)> {
    ast_detect_parse_string(body).map(|info| (info.carrier_name, info.delta, info.body_stmts))
}
// ============================================================================
// Continue Pattern (Phase 142-P1)
// ============================================================================