hakorune/docs/development/current/main/design/pattern-p5b-escape-design.md
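nyash-codex 93e62b1433 docs(phase93): Phase 93 P0 completion record & documentation cleanup
## Added
- docs/development/current/main/phases/phase-93/README.md
  - Phase 93 P0 (ConditionOnly Derived Slot) completion record
  - Details of the implementation and test results

## Updated
- CURRENT_TASK.md: updated for Phase 93 P0 completion
- 10-Now.md: current progress status updated
- 30-Backlog.md: Phase 92/93-related tasks reorganized
- phase-91/92-related documents: moved to historical status and summarized

## Reduced
- 735 lines removed (details consolidated into the README as part of the historical cleanup)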

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 23:30:39 +09:00


Pattern P5b: Escape Sequence Handling

Overview

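Pattern P5b extends JoinIR loop recognition to handle carriers whose step depends on a condition (conditional-step carriers), as found in escape sequence parsing. Each individual delta is still a constant.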

This pattern is essential for:

  • JSON string parsers
  • CSV readers
  • Template engine string processing
  • Any escape-aware text processing loop

Problem Statement

Current Limitation

In Patterns 1-4, carriers always update by a constant delta:

Carrier i: i = i + 1 (always +1)

Escape sequences require conditional increments:

if escape_char { i = i + 2 }  // Skip both escape char and escaped char
else { i = i + 1 }            // Normal increment

Why this matters:

  • Common in string parsing (JSON, CSV, config files)
  • Appears in ~3 selfhost loops
  • Currently forces Fail-Fast (pattern not supported)
  • Could benefit from JoinIR exit-line optimization

Real-World Example: JSON String Reader

loop(i < n) {
    local ch = s.substring(i, i+1)

    if ch == "\"" { break }        // End of string

    if ch == "\\" {
        i = i + 1                  // <-- CONDITIONAL: skip escape char
        ch = s.substring(i, i+1)   // Read escaped character
    }

    out = out + ch                 // Process character
    i = i + 1                      // <-- UNCONDITIONAL: advance
}

Loop progression:

  • Normal case: i advances by 1
  • Escape case: i advances by 2 (skip inside if + final increment)
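The following minimal Rust sketch makes the two step sizes concrete. It mirrors the Hako loop above over ASCII bytes and records how far i advanced in each iteration. The function name read_json_string, byte-level indexing, and the guard for a trailing backslash are illustrative assumptions, not code from the repository.

```rust
/// Mirrors the Hako JSON string reader (ASCII input assumed).
/// Returns (accumulated output, final index, per-iteration advance of `i`).
fn read_json_string(s: &str) -> (String, usize, Vec<usize>) {
    let bytes = s.as_bytes();
    let n = bytes.len();
    let mut i = 0;
    let mut out = String::new();
    let mut steps = Vec::new();
    while i < n {
        let start = i;
        let mut ch = bytes[i];
        if ch == b'"' {
            break; // end of string
        }
        if ch == b'\\' {
            i += 1; // CONDITIONAL: skip the escape char
            if i >= n {
                break; // assumption: stop on a dangling trailing backslash
            }
            ch = bytes[i]; // read the escaped character
        }
        out.push(ch as char); // process character
        i += 1; // UNCONDITIONAL: advance
        steps.push(i - start); // 1 on the normal path, 2 on the escape path
    }
    (out, i, steps)
}
```

For the input a\"b" the steps come out as 1, 2, 1. This is the "+1 normal / +2 escape" progression described above.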

Pattern Definition

Canonical Form

LoopSkeleton {
    steps: [
        HeaderCond(carrier < limit),
        Body(escape_check_block),
        Body(process_block),
        Update(carrier_increments)
    ]
}

Header Contract

Requirement: Bounded loop on single integer carrier

loop(i < n)    ✅ Valid P5b header
loop(i < 100)  ✅ Valid P5b header
loop(i <= n)   ✅ Valid P5b header (edge case)
loop(true)     ❌ Not P5b (unbounded)
loop(i < n && j < m)  ❌ Not P5b (multi-carrier condition)

Carrier: must be the loop variable used in the header condition
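The header contract can be sketched as a small classifier. This version works over a deliberately tiny condition AST. Cond, Operand and p5b_header_carrier are hypothetical names for illustration and are not the canonicalizer's real types.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Var(String),
    Int(i64),
}

#[derive(Debug, Clone, PartialEq)]
enum Cond {
    Lt(Operand, Operand),       // a < b
    Le(Operand, Operand),       // a <= b
    And(Box<Cond>, Box<Cond>),  // a && b
    True,                       // loop(true)
}

/// Returns the carrier name if `cond` is a bounded single-carrier header
/// (`i < limit` or `i <= limit`), otherwise `None`.
fn p5b_header_carrier(cond: &Cond) -> Option<&str> {
    match cond {
        Cond::Lt(Operand::Var(carrier), limit) | Cond::Le(Operand::Var(carrier), limit) => {
            match limit {
                // `i < i` is not a bound on the carrier
                Operand::Var(l) if l == carrier => None,
                _ => Some(carrier.as_str()),
            }
        }
        // `loop(true)` is unbounded; `&&` headers involve multiple carriers
        _ => None,
    }
}
```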

Escape Check Contract

Requirement: Conditional increment based on character test

Escape Detection Block

if ch == escape_char {
    carrier = carrier + <const>   // in-block skip (escape_delta = <const> + normal_delta)
    // Optional: read next character
    ch = s.substring(carrier, carrier+1)
}

Escape character: typically a backslash (written "\\" as a string literal), but it can vary

  • JSON: \\
  • CSV: " (context-dependent)
  • Custom: Any single-character escape

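Escape delta: the total carrier advance on the escape path per iteration (in-block increment + normal_delta)

  • +2: skip the escape marker and the escaped char (common case; the JSON reader does +1 inside the if, then the normal +1)
  • +N: other values possible (e.g. an in-block +2 gives +3)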

Detection Algorithm

  1. Find an if statement in the loop body
  2. Check the condition: ch == literal_char
  3. Extract the escape character: the literal constant
  4. Find the assignment in the if block: carrier = carrier + <const>
  5. Calculate escape_delta: <const> plus normal_delta (1 + 1 = 2 in the JSON reader)
  6. Validate: <const> > 0
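Steps 1 to 4 and the positivity check from step 6 are sketched below over a hypothetical statement AST. Stmt, EscapeCheck and detect_escape_check are illustrative names. The function returns the escape character and the increment found inside the if (the constant from step 4). Combining that increment with normal_delta is left to the caller. The break check if ch == "\"" { break } is also an if on a character, but it is skipped because it does not increment the carrier.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// `target = target + delta`
    IncConst { target: String, delta: i64 },
    /// `if var == "<lit>" { then_body }`
    IfEqChar { var: String, lit: char, then_body: Vec<Stmt> },
    /// Anything else (reads, appends, break, ...)
    Other,
}

#[derive(Debug, PartialEq)]
struct EscapeCheck {
    escape_char: char,
    in_block_delta: i64,
}

fn detect_escape_check(body: &[Stmt], carrier: &str) -> Option<EscapeCheck> {
    for stmt in body {
        // Steps 1-3: an `if ch == literal` statement; the literal is the escape char
        if let Stmt::IfEqChar { lit, then_body, .. } = stmt {
            // Step 4: `carrier = carrier + <const>` inside the then-branch
            let delta = then_body.iter().find_map(|s| match s {
                Stmt::IncConst { target, delta } if target == carrier => Some(*delta),
                _ => None,
            });
            // Step 6: only a forward skip qualifies; otherwise keep scanning
            if let Some(d) = delta.filter(|d| *d > 0) {
                return Some(EscapeCheck { escape_char: *lit, in_block_delta: d });
            }
        }
    }
    None
}
```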

Process Block Contract

Requirement: Character accumulation with optional processing

out = out + ch          ✅ Simple append
result = result + ch    ✅ Any accumulator
s = s + value           ❌ Not append pattern

Accumulator carrier: String-like box supporting append

Update Block Contract

Requirement: Unconditional carrier increment after escape check

carrier = carrier + normal_delta

Normal delta: Almost always +1

  • Defines "normal" loop progress
  • Applied exactly once per iteration, outside the escape block

Detection Algorithm

  1. Find assignment after escape if block
  2. Pattern: carrier = carrier + <const>
  3. Must be unconditional (outside any if block)
  4. Extract normal_delta: The constant
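The update-block detection can be sketched as follows, again over a hypothetical, simplified AST (Stmt and detect_normal_increment are illustrative names). Only top-level statements after the escape if are searched. Increments nested inside any if are therefore ignored, which enforces step 3.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    IncConst { target: String, delta: i64 }, // `target = target + delta`
    If { then_body: Vec<Stmt> },             // any `if` block
    Other,
}

fn detect_normal_increment(body: &[Stmt], carrier: &str) -> Option<i64> {
    // Step 1: locate the escape `if` (the last top-level `if` that increments the carrier)
    let escape_idx = body.iter().rposition(|s| match s {
        Stmt::If { then_body } => then_body
            .iter()
            .any(|t| matches!(t, Stmt::IncConst { target, .. } if target == carrier)),
        _ => false,
    })?;
    // Steps 2-4: first unconditional `carrier = carrier + <const>` after it
    body[escape_idx + 1..].iter().find_map(|s| match s {
        Stmt::IncConst { target, delta } if target == carrier => Some(*delta),
        _ => None,
    })
}
```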

Break Requirement

Requirement: Explicit break on string boundary

if ch == boundary_char { break }

Boundary character: Typically quote "

  • JSON: "
  • Custom strings: Any delimiter

Position in loop: Usually before escape check

Exit Contract for P5b

ExitContract {
    has_break: true,        // Always for escape patterns
    has_continue: false,
    has_return: false,
    carriers: vec![
        CarrierInfo {
            name: "i",      // Loop variable
            deltas: [
                normal_delta,   // e.g., 1
                escape_delta    // e.g., 2
            ]
        },
        CarrierInfo {
            name: "out",    // Accumulator
            pattern: Append
        }
    ]
}

Capability Analysis

Required Capabilities (CapabilityTag)

For Pattern P5b to be JoinIR-compatible, these must be present:

| Capability | Meaning | P5b Requirement | Status |
|---|---|---|---|
| ConstStep | Carrier updates are constants | Required | Both deltas constant |
| SingleBreak | Only one break point | Required | String boundary only |
| PureHeader | Condition has no side effects | Required | i < n is pure |
| OuterLocalCond | Condition doesn't reference locals | ⚠️ Soft requirement | Usually true |
| ExitBindings | Exit block is simple | Required | Break is unconditional |

Missing Capabilities (Fail-Fast Reasons)

If any of these are detected, Pattern P5b is rejected:

| Capability | Why It Blocks P5b | Example |
|---|---|---|
| MultipleBreak | Multiple exit points | if x { break } if y { break } |
| MultipleCarriers | Condition uses multiple variables | loop(i < n && j < m) |
| VariableStep | Deltas aren't constants | i = i + adjustment |
| NestedEscape | Escape check inside another if | if outer { if ch == \\ ... } |
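The two tables can be read as a single gate. The Rust sketch below assumes a CapabilityTag enum shaped after the table rows (the real enum lives in capability_guard.rs and may differ). It treats OuterLocalCond as soft, so that tag is not enforced. Both missing hard requirements and detected blocking tags are reported, in the spirit of missing_caps. That merging is an assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CapabilityTag {
    // Required (OuterLocalCond is a soft requirement and is not enforced here)
    ConstStep,
    SingleBreak,
    PureHeader,
    OuterLocalCond,
    ExitBindings,
    // Blocking (Fail-Fast reasons)
    MultipleBreak,
    MultipleCarriers,
    VariableStep,
    NestedEscape,
}

/// Ok(()) if P5b may be chosen; otherwise the tags that block it.
fn p5b_gate(detected: &[CapabilityTag]) -> Result<(), Vec<CapabilityTag>> {
    let required = [
        CapabilityTag::ConstStep,
        CapabilityTag::SingleBreak,
        CapabilityTag::PureHeader,
        CapabilityTag::ExitBindings,
    ];
    let blocking = [
        CapabilityTag::MultipleBreak,
        CapabilityTag::MultipleCarriers,
        CapabilityTag::VariableStep,
        CapabilityTag::NestedEscape,
    ];
    // Hard requirements that were not detected
    let mut problems: Vec<CapabilityTag> =
        required.iter().copied().filter(|c| !detected.contains(c)).collect();
    // Blocking capabilities that were detected
    problems.extend(blocking.iter().copied().filter(|c| detected.contains(c)));
    if problems.is_empty() {
        Ok(())
    } else {
        Err(problems)
    }
}
```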

Recognition Algorithm

High-Level Steps

  1. Extract header carrier: i from loop(i < n)
  2. Find escape check: if ch == "\\"
  3. Find escape increment: i = i + 1 inside the if (escape_delta = 1 + normal_delta = 2)
  4. Find process block: out = out + ch
  5. Find normal increment: i = i + 1 after the if
  6. Find break condition: if ch == "\"" { break }
  7. Build LoopSkeleton: construct UpdateKind::ConditionalStep { cond, then_delta, else_delta }
  8. Build RoutingDecision: chosen = Pattern2Break (the exit contract takes priority); P5b-specific structural information goes into notes

Pseudo-Code

fn detect_escape_pattern(loop_expr: &Expr) -> Option<EscapePatternInfo> {
    // Step 1: Extract loop variable and loop body
    let (carrier_name, _limit) = extract_header_carrier(loop_expr)?;
    let loop_body = extract_loop_body(loop_expr)?; // helper assumed, like the others

    // Step 2: Find escape check statement
    let escape_stmts = find_escape_check_block(loop_body)?;

    // Step 3: Extract escape character
    let escape_char = extract_escape_literal(escape_stmts)?;

    // Step 4: Extract the increment inside the escape block (i = i + 1 in the JSON reader)
    let escape_skip = extract_escape_increment(escape_stmts, &carrier_name)?;

    // Step 5: Require a character accumulation (out = out + ch)
    let _process_stmts = find_character_accumulation(loop_body)?;

    // Step 6: Extract normal increment
    let normal_delta = extract_normal_increment(loop_body, &carrier_name)?;

    // Step 7: Find break condition
    let break_char = extract_break_literal(loop_body)?;

    // The escape path advances by the in-block skip plus the normal increment
    let escape_delta = escape_skip + normal_delta;

    // Build result
    Some(EscapePatternInfo {
        carrier_name,
        escape_char,
        normal_delta,
        escape_delta,
        break_char,
    })
}

Implementation Location

File: src/mir/loop_canonicalizer/canonicalizer.rs

Function: detect_escape_pattern() (new)

Integration point: canonicalize_loop_expr() main dispatch

Priority: call before detect_skip_whitespace_pattern(), since P5b is the more specific pattern

Skeleton Representation

Standard Layout

LoopSkeleton {
    header: HeaderCond(Condition {
        operator: LessThan,
        left: Var("i"),
        right: Var("n")
    }),

    steps: [
        // Escape check block
        SkeletonStep::Body(vec![
            Expr::If {
                cond: Comparison("ch", Eq, Literal("\\")),
                then_body: [
                    Expr::Assign("i", Add, 1),  // in-block skip; escape_delta = 1 + normal_delta = 2
                    Expr::Assign("ch", Substring("s", Var("i"), Add(Var("i"), 1))),
                ]
            }
        ]),

        // Character accumulation
        SkeletonStep::Body(vec![
            Expr::Assign("out", Append, Var("ch")),
        ]),

        // Normal increment
        SkeletonStep::Update(vec![
            Expr::Assign("i", Add, 1),  // normal_delta
        ]),
    ],

    carriers: vec![
        CarrierSlot {
            name: "i",
            update_kind: UpdateKind::ConditionalStep {
                cond: (ch == "\\"),
                then_delta: 2,
                else_delta: 1,
            },
            // ... other fields (role, etc.)
        },
        CarrierSlot {
            name: "out",
            pattern: Append,
            // ... other fields
        }
    ],

    exit_contract: ExitContract {
        has_break: true,
        // ...
    }
}
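The carrier's UpdateKind::ConditionalStep above can be read as "advance by then_delta when the escape condition holds, else by else_delta". Below is a hypothetical Rust stand-in that omits the cond field. It shows how then_delta is derived from the in-block skip plus the normal increment, which yields then_delta: 2 and else_delta: 1 for the JSON reader.

```rust
/// Hypothetical stand-in for UpdateKind (the `cond` field is omitted).
#[derive(Debug, Clone, PartialEq)]
enum UpdateKind {
    Const(i64),
    ConditionalStep { then_delta: i64, else_delta: i64 },
}

impl UpdateKind {
    /// Per-iteration advance of the carrier, given whether the condition held.
    fn step(&self, cond_holds: bool) -> i64 {
        match *self {
            UpdateKind::Const(d) => d,
            UpdateKind::ConditionalStep { then_delta, else_delta } => {
                if cond_holds { then_delta } else { else_delta }
            }
        }
    }
}

/// Escape path = in-block skip + normal increment; normal path = normal increment.
fn conditional_step(in_block_delta: i64, normal_delta: i64) -> UpdateKind {
    UpdateKind::ConditionalStep {
        then_delta: in_block_delta + normal_delta,
        else_delta: normal_delta,
    }
}
```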

RoutingDecision Output

For Valid P5b Pattern

RoutingDecision {
    chosen: Pattern2Break,
    missing_caps: vec![],
    notes: vec![
        "escape_char: \\",
        "normal_delta: 1",
        "escape_delta: 2",
        "break_char: \"",
        "accumulator: out",
    ],
    confidence: High,
}
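The notes strings can be rendered directly from the detected pattern info. This sketch uses a subset of the EscapePatternInfo fields from the pseudo-code, plus an accumulator name. The pseudo-code does not track the accumulator, so that field is an assumption. routing_notes is a hypothetical helper.

```rust
struct EscapePatternInfo {
    escape_char: char,
    normal_delta: i64,
    escape_delta: i64,
    break_char: char,
    accumulator: String, // assumption: not part of the pseudo-code's struct
}

/// Renders the P5b-specific structure as RoutingDecision notes.
fn routing_notes(info: &EscapePatternInfo) -> Vec<String> {
    vec![
        format!("escape_char: {}", info.escape_char),
        format!("normal_delta: {}", info.normal_delta),
        format!("escape_delta: {}", info.escape_delta),
        format!("break_char: {}", info.break_char),
        format!("accumulator: {}", info.accumulator),
    ]
}
```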

For Invalid/Unsupported Cases

// Multiple escapes detected
RoutingDecision {
    chosen: Unknown,
    missing_caps: vec![CapabilityTag::MultipleBreak],
    notes: vec!["Multiple escape checks found"],
    confidence: Low,
}

// Variable step (not constant)
RoutingDecision {
    chosen: Unknown,
    missing_caps: vec![CapabilityTag::VariableStep],
    notes: vec!["Escape delta is not constant"],
    confidence: Low,
}

Parity Verification

Dev-Only Observation

In src/mir/builder/control_flow/joinir/routing.rs:

  1. Router makes decision using existing Pattern 1-4 logic
  2. Canonicalizer analyzes and detects Pattern P5b
  3. Parity checker compares:
    • Router decision (Pattern 1-4)
    • Canonicalizer decision (Pattern P5b)
  4. If mismatch:
    • Dev mode: Log with reason
    • Strict mode: Fail-Fast with error
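The parity flow above amounts to one comparison plus a mode switch. The sketch below is a minimal illustration: decisions are plain pattern-name strings, None stands for "no pattern matched", and Mode and check_parity are hypothetical names rather than the actual routing.rs API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Dev,
    Strict,
}

/// Dev mode logs a mismatch and continues; strict mode fails fast with the reason.
fn check_parity(router: Option<&str>, canon: Option<&str>, mode: Mode) -> Result<(), String> {
    if router == canon {
        return Ok(()); // parity green
    }
    let reason = format!("parity mismatch: router={:?}, canonicalizer={:?}", router, canon);
    match mode {
        Mode::Dev => {
            eprintln!("{}", reason);
            Ok(())
        }
        Mode::Strict => Err(reason),
    }
}
```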

Expected Outcomes

Case A: Router picks Pattern 1, Canonicalizer picks P5b

  • Router: "Simple bounded loop"
  • Canonicalizer: "Escape sequence pattern detected"
  • Resolution: Canonicalizer is more specific → router will eventually delegate

Case B: Router fails, Canonicalizer succeeds

  • Router: "No pattern matched" (Fail-Fast)
  • Canonicalizer: "Pattern P5b matched"
  • Resolution: P5b is new capability → expected until router updated

Case C: Both agree P5b

  • Router: Pattern P5b
  • Canonicalizer: Pattern P5b
  • Result: Parity green

Test Cases

Minimal Case (test_pattern5b_escape_minimal.hako)

Input: string with one escape sequence
Carriers: single position variable, single accumulator
Deltas: normal=1, escape=2
Output: processed string (escape removed)

Extended Cases (Phase 91 Step 2+)

  1. test_pattern5b_escape_json: JSON string with multiple escapes
  2. test_pattern5b_escape_custom: Custom escape character
  3. test_pattern5b_escape_newline: Escape newline handling
  4. test_pattern5b_escape_fail_multiple: Multiple escapes (should Fail-Fast)
  5. test_pattern5b_escape_fail_variable: Variable delta (should Fail-Fast)

Lowering Strategy (Future Phase 92)

Philosophy: Keep Return Simple

Pattern P5b lowering should:

  1. Reuse Pattern 1-2 lowering for the normal case
  2. Extend it for the conditional increment:
    • PHI for the carrier value after the escape check
    • Separate paths for escape vs normal
  3. Stay self-contained within Pattern5b (no cross-boundary complexity)

Rough Outline

Entry: LoopPrefix
  ↓
Condition: i < n
  ↓
[BRANCH]
  ├→ EscapeBlock
  │   ├→ i = i + escape_delta
  │   └→ ch = substring(i)
  │
  └→ NormalBlock
      ├→ (ch already set)
      └→ noop

  (PHI: i and ch from both branches; ch is reassigned in EscapeBlock)
  ↓
ProcessBlock: out = out + ch
  ↓
UpdateBlock: i = i + 1
  ↓
Condition check...
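The outline can be simulated directly. In this Rust sketch, the if expression plays the role of the branch, and the tuple it yields stands in for the PHI at the join point. Because ch is also reassigned inside EscapeBlock, the sketch merges both i and ch there. lowered_loop, the ASCII byte input, and the stop-on-dangling-escape behavior are assumptions for illustration.

```rust
/// Runs the lowered P5b loop over ASCII bytes and returns the accumulated output.
fn lowered_loop(s: &[u8], escape: u8, boundary: u8, escape_skip: usize, normal_delta: usize) -> String {
    let n = s.len();
    let mut i = 0;
    let mut out = String::new();
    while i < n {                           // Condition: i < n
        let ch = s[i];
        if ch == boundary {
            break;                          // the single break exit
        }
        // [BRANCH] EscapeBlock / NormalBlock; the tuple is the PHI for (i, ch)
        let (i_phi, ch_phi) = if ch == escape {
            let j = i + escape_skip;        // i = i + in-block skip
            match s.get(j) {
                Some(&c) => (j, c),         // ch = substring(i)
                None => break,              // assumption: stop on a dangling escape
            }
        } else {
            (i, ch)                         // ch already set: noop
        };
        out.push(ch_phi as char);           // ProcessBlock: out = out + ch
        i = i_phi + normal_delta;           // UpdateBlock: i = i + 1
    }
    out
}
```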

Future Extensions

Pattern P5c: Multi-Character Escapes

if ch == "\\" {
    i = i + 2  // Skip \x
    if i < n {
        local second = s.substring(i, i+1)
        // Handle \n, \t, \x, etc.
    }
}

Complexity: Requires escape sequence table (not generic)
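To show why P5c needs a table, here is a small Rust sketch of one. Once \n must become a newline rather than the letter n, the escape path can no longer simply append the raw escaped character. The table below covers only a few JSON-style escapes (no \uXXXX, \b, \f) and is an illustrative assumption. unescape_char and decode are hypothetical helpers.

```rust
/// Maps the character after a backslash to the character it denotes.
fn unescape_char(c: char) -> Option<char> {
    match c {
        'n' => Some('\n'),
        't' => Some('\t'),
        'r' => Some('\r'),
        '"' => Some('"'),
        '\\' => Some('\\'),
        '/' => Some('/'),
        _ => None, // unknown escape: the caller decides (e.g. Fail-Fast)
    }
}

/// Decodes a string body; returns None on an unknown or dangling escape.
fn decode(s: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // The escape path consumes two characters but emits one looked-up char
            out.push(unescape_char(chars.next()?)?);
        } else {
            out.push(c);
        }
    }
    Some(out)
}
```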

Pattern P5d: Nested Escape Contexts

// Regex with escaped /, inside JSON string with escaped "
loop(i < n) {
    if ch == "\"" { ... }     // String boundary
    if ch == "\\" {
        if in_regex {
            i = i + 2         // Regex escape
        } else {
            i = i + 1         // String escape
        }
    }
}

Complexity: State-dependent behavior (future work)

References

  • JoinIR Architecture: joinir-architecture-overview.md
  • Loop Canonicalizer: loop-canonicalizer.md
  • CapabilityTag Enum: src/mir/loop_canonicalizer/capability_guard.rs
  • Test Fixture: tools/selfhost/test_pattern5b_escape_minimal.hako
  • Phase 91 Plan: phases/phase-91/README.md

Summary

Pattern P5b enables JoinIR recognition of escape-sequence-aware string parsing loops by:

  1. Extending Canonicalizer to detect conditional increments
  2. Adding exit-line optimization for escape branching
  3. Preserving ExitContract consistency with P1-P4 patterns
  4. Enabling parity verification in strict mode

Status: Design complete, implementation ready for Phase 91 Step 2