## 追加 - docs/development/current/main/phases/phase-93/README.md - Phase 93 P0 (ConditionOnly Derived Slot) 完了記録 - 実装内容・テスト結果の詳細 ## 更新 - CURRENT_TASK.md: Phase 93 P0完了に伴う更新 - 10-Now.md: 現在の進捗状況更新 - 30-Backlog.md: Phase 92/93関連タスク整理 - phase-91/92関連ドキュメント: historical化・要約化 ## 削減 - 735行削減(historical化により詳細をREADMEに集約) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
507 lines
13 KiB
Markdown
507 lines
13 KiB
Markdown
# Pattern P5b: Escape Sequence Handling
|
||
|
||
## Overview
|
||
|
||
**Pattern P5b** extends JoinIR loop recognition to handle **variable-step carriers** in escape sequence parsing.
|
||
|
||
This pattern is essential for:
|
||
- JSON string parsers
|
||
- CSV readers
|
||
- Template engine string processing
|
||
- Any escape-aware text processing loop
|
||
|
||
## Problem Statement
|
||
|
||
### Current Limitation
|
||
|
||
Standard Pattern 1-4 carriers always update by constant deltas:
|
||
```
|
||
Carrier i: i = i + 1 (always +1)
|
||
```
|
||
|
||
Escape sequences require conditional increments:
|
||
```
|
||
if escape_char { i = i + 2 } // Skip both escape char and escaped char
|
||
else { i = i + 1 } // Normal increment
|
||
```
|
||
|
||
**Why this matters**:
|
||
- Common in string parsing (JSON, CSV, config files)
|
||
- Appears in ~3 selfhost loops
|
||
- Currently forces Fail-Fast (pattern not supported)
|
||
- Could benefit from JoinIR exit-line optimization
|
||
|
||
### Real-World Example: JSON String Reader
|
||
|
||
```hako
|
||
loop(i < n) {
|
||
local ch = s.substring(i, i+1)
|
||
|
||
if ch == "\"" { break } // End of string
|
||
|
||
if ch == "\\" {
|
||
i = i + 1 // <-- CONDITIONAL: skip escape char
|
||
ch = s.substring(i, i+1) // Read escaped character
|
||
}
|
||
|
||
out = out + ch // Process character
|
||
i = i + 1 // <-- UNCONDITIONAL: advance
|
||
}
|
||
```
|
||
|
||
Loop progression:
|
||
- Normal case: `i` advances by 1
|
||
- Escape case: `i` advances by 2 (skip inside if + final increment)
|
||
|
||
## Pattern Definition
|
||
|
||
### Canonical Form
|
||
|
||
```
|
||
LoopSkeleton {
|
||
steps: [
|
||
HeaderCond(carrier < limit),
|
||
Body(escape_check_block),
|
||
Body(process_block),
|
||
Update(carrier_increments)
|
||
]
|
||
}
|
||
```
|
||
|
||
### Header Contract
|
||
|
||
**Requirement**: Bounded loop on single integer carrier
|
||
|
||
```
|
||
loop(i < n) ✅ Valid P5b header
|
||
loop(i < 100) ✅ Valid P5b header
|
||
loop(i <= n) ✅ Valid P5b header (edge case)
|
||
loop(true) ❌ Not P5b (unbounded)
|
||
loop(i < n && j < m) ❌ Not P5b (multi-carrier condition)
|
||
```
|
||
|
||
**Carrier**: Must be loop variable used in condition
|
||
|
||
### Escape Check Contract
|
||
|
||
**Requirement**: Conditional increment based on character test
|
||
|
||
#### Escape Detection Block
|
||
|
||
```
|
||
if ch == escape_char {
|
||
carrier = carrier + escape_delta
|
||
// Optional: read next character
|
||
ch = s.substring(carrier, carrier+1)
|
||
}
|
||
```
|
||
|
||
**Escape character**: Typically `\\` (backslash), but can vary
|
||
- JSON: `\\`
|
||
- CSV: `"` (context-dependent)
|
||
- Custom: Any single-character escape
|
||
|
||
**Escape delta**: How far to skip
|
||
- `+1`: Skip just the escape marker
|
||
- `+2`: Skip escape marker + escaped char (common case)
|
||
- `+N`: Other values possible
|
||
|
||
#### Detection Algorithm
|
||
|
||
1. **Find if statement in loop body**
|
||
2. **Check condition**: `ch == literal_char`
|
||
3. **Extract escape character**: The literal constant
|
||
4. **Find assignment in if block**: `carrier = carrier + <const>`
|
||
5. **Calculate escape_delta**: The constant value
|
||
6. **Validate**: Escape delta > 0
|
||
|
||
### Process Block Contract
|
||
|
||
**Requirement**: Character accumulation with optional processing
|
||
|
||
```
|
||
out = out + ch ✅ Simple append
|
||
result = result + ch ✅ Any accumulator
|
||
s = s + value ❌ Not append pattern
|
||
```
|
||
|
||
**Accumulator carrier**: String-like box supporting append
|
||
|
||
### Update Block Contract
|
||
|
||
**Requirement**: Unconditional carrier increment after escape check
|
||
|
||
```
|
||
carrier = carrier + normal_delta
|
||
```
|
||
|
||
**Normal delta**: Almost always `+1`
|
||
- Defines "normal" loop progress
|
||
- Only incremented once per iteration (not in escape block)
|
||
|
||
#### Detection Algorithm
|
||
|
||
1. **Find assignment after escape if block**
|
||
2. **Pattern**: `carrier = carrier + <const>`
|
||
3. **Must be unconditional** (outside any if block)
|
||
4. **Extract normal_delta**: The constant
|
||
|
||
### Break Requirement
|
||
|
||
**Requirement**: Explicit break on string boundary
|
||
|
||
```
|
||
if ch == boundary_char { break }
|
||
```
|
||
|
||
**Boundary character**: Typically quote `"`
|
||
- JSON: `"`
|
||
- Custom strings: Any delimiter
|
||
|
||
**Position in loop**: Usually before escape check
|
||
|
||
### Exit Contract for P5b
|
||
|
||
```rust
|
||
ExitContract {
|
||
has_break: true, // Always for escape patterns
|
||
has_continue: false,
|
||
has_return: false,
|
||
carriers: vec![
|
||
CarrierInfo {
|
||
name: "i", // Loop variable
|
||
deltas: [
|
||
normal_delta, // e.g., 1
|
||
escape_delta // e.g., 2
|
||
]
|
||
},
|
||
CarrierInfo {
|
||
name: "out", // Accumulator
|
||
pattern: Append
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
## Capability Analysis
|
||
|
||
### Required Capabilities (CapabilityTag)
|
||
|
||
For Pattern P5b to be JoinIR-compatible, these must be present:
|
||
|
||
| Capability | Meaning | P5b Requirement | Status |
|
||
|------------|---------|-----------------|--------|
|
||
| `ConstStep` | Carrier updates are constants | ✅ Required | Both deltas constant |
|
||
| `SingleBreak` | Only one break point | ✅ Required | String boundary only |
|
||
| `PureHeader` | Condition has no side effects | ✅ Required | `i < n` is pure |
|
||
| `OuterLocalCond` | Condition doesn't reference locals | ⚠️ Soft req | Usually true |
|
||
| `ExitBindings` | Exit block is simple | ✅ Required | Break is unconditional |
|
||
|
||
### Missing Capabilities (Fail-Fast Reasons)
|
||
|
||
If any of these are detected, Pattern P5b is rejected:
|
||
|
||
| Capability | Why It Blocks P5b | Example |
|
||
|------------|-------------------|---------|
|
||
| `MultipleBreak` | Multiple exit points | `if x { break } if y { break }` |
|
||
| `MultipleCarriers` | Condition uses multiple vars | `loop(i < n && j < m)` |
|
||
| `VariableStep` | Deltas aren't constants | `i = i + adjustment` |
|
||
| `NestedEscape` | Escape check inside other if | `if outer { if ch == \\ ... }` |
|
||
|
||
## Recognition Algorithm
|
||
|
||
### High-Level Steps
|
||
|
||
1. **Extract header carrier**: `i` from `loop(i < n)`
|
||
2. **Find escape check**: `if ch == "\\"`
|
||
3. **Find escape increment**: `i = i + 2` inside if
|
||
4. **Find process block**: `out = out + ch`
|
||
5. **Find normal increment**: `i = i + 1` after if
|
||
6. **Find break condition**: `if ch == "\"" { break }`
|
||
7. **Build LoopSkeleton**: `UpdateKind::ConditionalStep { cond, then_delta, else_delta }` を構築
|
||
8. **Build RoutingDecision**: `chosen = Pattern2Break`(exit contract 優先)。P5b 固有の構造情報は `notes` に載せる
|
||
|
||
### Pseudo-Code
|
||
|
||
```rust
|
||
fn detect_escape_pattern(loop_expr: &Expr) -> Option<EscapePatternInfo> {
|
||
// Step 1: Extract loop variable
|
||
let (carrier_name, limit) = extract_header_carrier(loop_expr)?;
|
||
|
||
// Step 2: Find escape check statement
|
||
let escape_stmts = find_escape_check_block(loop_body)?;
|
||
|
||
// Step 3: Extract escape character
|
||
let escape_char = extract_escape_literal(escape_stmts)?;
|
||
|
||
// Step 4: Extract escape delta
|
||
let escape_delta = extract_escape_increment(escape_stmts, carrier_name)?;
|
||
|
||
// Step 5: Find process statements
|
||
let process_stmts = find_character_accumulation(loop_body)?;
|
||
|
||
// Step 6: Extract normal increment
|
||
let normal_delta = extract_normal_increment(loop_body, carrier_name)?;
|
||
|
||
// Step 7: Find break condition
|
||
let break_char = extract_break_literal(loop_body)?;
|
||
|
||
// Build result
|
||
Some(EscapePatternInfo {
|
||
carrier_name,
|
||
escape_char,
|
||
normal_delta,
|
||
escape_delta,
|
||
break_char,
|
||
})
|
||
}
|
||
```
|
||
|
||
### Implementation Location
|
||
|
||
**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`
|
||
|
||
**Function**: `detect_escape_pattern()` (new)
|
||
|
||
**Integration point**: `canonicalize_loop_expr()` main dispatch
|
||
|
||
**Priority**: Call before `detect_skip_whitespace_pattern()` (more specific)
|
||
|
||
## Skeleton Representation
|
||
|
||
### Standard Layout
|
||
|
||
```
|
||
LoopSkeleton {
|
||
header: HeaderCond(Condition {
|
||
operator: LessThan,
|
||
left: Var("i"),
|
||
right: Var("n")
|
||
}),
|
||
|
||
steps: [
|
||
// Escape check block
|
||
SkeletonStep::Body(vec![
|
||
Expr::If {
|
||
cond: Comparison("ch", Eq, Literal("\\")),
|
||
then_body: [
|
||
Expr::Assign("i", Add, 1), // escape_delta
|
||
Expr::Assign("ch", Substring("s", Var("i"), Add(Var("i"), 1))),
|
||
]
|
||
}
|
||
]),
|
||
|
||
// Character accumulation
|
||
SkeletonStep::Body(vec![
|
||
Expr::Assign("out", Append, Var("ch")),
|
||
]),
|
||
|
||
// Normal increment
|
||
SkeletonStep::Update(vec![
|
||
Expr::Assign("i", Add, 1), // normal_delta
|
||
]),
|
||
],
|
||
|
||
carriers: vec![
|
||
CarrierSlot {
|
||
name: "i",
|
||
update_kind: UpdateKind::ConditionalStep {
|
||
cond: (ch == "\\"),
|
||
then_delta: 2,
|
||
else_delta: 1,
|
||
},
|
||
// ... other fields(role など)
|
||
},
|
||
CarrierSlot {
|
||
name: "out",
|
||
pattern: Append,
|
||
// ... other fields
|
||
}
|
||
],
|
||
|
||
exit_contract: ExitContract {
|
||
has_break: true,
|
||
// ...
|
||
}
|
||
}
|
||
```
|
||
|
||
## RoutingDecision Output
|
||
|
||
### For Valid P5b Pattern
|
||
|
||
```rust
|
||
RoutingDecision {
|
||
chosen: Pattern2Break,
|
||
missing_caps: vec![],
|
||
notes: vec![
|
||
"escape_char: \\",
|
||
"normal_delta: 1",
|
||
"escape_delta: 2",
|
||
"break_char: \"",
|
||
"accumulator: out",
|
||
],
|
||
confidence: High,
|
||
}
|
||
```
|
||
|
||
### For Invalid/Unsupported Cases
|
||
|
||
```rust
|
||
// Multiple escapes detected
|
||
RoutingDecision {
|
||
chosen: Unknown,
|
||
missing_caps: vec![CapabilityTag::MultipleBreak],
|
||
notes: vec!["Multiple escape checks found"],
|
||
confidence: Low,
|
||
}
|
||
|
||
// Variable step (not constant)
|
||
RoutingDecision {
|
||
chosen: Unknown,
|
||
missing_caps: vec![CapabilityTag::VariableStep],
|
||
notes: vec!["Escape delta is not constant"],
|
||
confidence: Low,
|
||
}
|
||
```
|
||
|
||
## Parity Verification
|
||
|
||
### Dev-Only Observation
|
||
|
||
In `src/mir/builder/control_flow/joinir/routing.rs`:
|
||
|
||
1. **Router makes decision** using existing Pattern 1-4 logic
|
||
2. **Canonicalizer analyzes** and detects Pattern P5b
|
||
3. **Parity checker compares**:
|
||
- Router decision (Pattern 1-4)
|
||
- Canonicalizer decision (Pattern P5b)
|
||
4. **If mismatch**:
|
||
- Dev mode: Log with reason
|
||
- Strict mode: Fail-Fast with error
|
||
|
||
### Expected Outcomes
|
||
|
||
**Case A: Router picks Pattern 1, Canonicalizer picks P5b**
|
||
- Router: "Simple bounded loop"
|
||
- Canonicalizer: "Escape sequence pattern detected"
|
||
- **Resolution**: Canonicalizer is more specific → router will eventually delegate
|
||
|
||
**Case B: Router fails, Canonicalizer succeeds**
|
||
- Router: "No pattern matched" (Fail-Fast)
|
||
- Canonicalizer: "Pattern P5b matched"
|
||
- **Resolution**: P5b is new capability → expected until router updated
|
||
|
||
**Case C: Both agree P5b**
|
||
- Router: Pattern P5b
|
||
- Canonicalizer: Pattern P5b
|
||
- **Result**: ✅ Parity green
|
||
|
||
## Test Cases
|
||
|
||
### Minimal Case (test_pattern5b_escape_minimal.hako)
|
||
|
||
**Input**: String with one escape sequence
|
||
**Carrier**: Single position variable, single accumulator
|
||
**Deltas**: normal=1, escape=2
|
||
**Output**: Processed string (escape removed)
|
||
|
||
### Extended Cases (Phase 91 Step 2+)
|
||
|
||
1. **test_pattern5b_escape_json**: JSON string with multiple escapes
|
||
2. **test_pattern5b_escape_custom**: Custom escape character
|
||
3. **test_pattern5b_escape_newline**: Escape newline handling
|
||
4. **test_pattern5b_escape_fail_multiple**: Multiple escapes (should Fail-Fast)
|
||
5. **test_pattern5b_escape_fail_variable**: Variable delta (should Fail-Fast)
|
||
|
||
## Lowering Strategy (Future Phase 92)
|
||
|
||
### Philosophy: Keep Return Simple
|
||
|
||
Pattern P5b lowering should:
|
||
1. **Reuse Pattern 1-2 lowering** for normal case
|
||
2. **Extend for conditional increment**:
|
||
- PHI for carrier value after escape check
|
||
- Separate paths for escape vs normal
|
||
3. **Close within Pattern5b** (no cross-boundary complexity)
|
||
|
||
### Rough Outline
|
||
|
||
```
|
||
Entry: LoopPrefix
|
||
↓
|
||
Condition: i < n
|
||
↓
|
||
[BRANCH]
|
||
├→ EscapeBlock
|
||
│ ├→ i = i + escape_delta
|
||
│ └→ ch = substring(i)
|
||
│
|
||
└→ NormalBlock
|
||
├→ (ch already set)
|
||
└→ noop
|
||
|
||
(PHI: i from both branches)
|
||
↓
|
||
ProcessBlock: out = out + ch
|
||
↓
|
||
UpdateBlock: i = i + 1
|
||
↓
|
||
Condition check...
|
||
```
|
||
|
||
## Future Extensions
|
||
|
||
### Pattern P5c: Multi-Character Escapes
|
||
|
||
```
|
||
if ch == "\\" {
|
||
i = i + 2 // Skip \x
|
||
if i < n {
|
||
local second = s.substring(i, i+1)
|
||
// Handle \n, \t, \x, etc.
|
||
}
|
||
}
|
||
```
|
||
|
||
**Complexity**: Requires escape sequence table (not generic)
|
||
|
||
### Pattern P5d: Nested Escape Contexts
|
||
|
||
```
|
||
// Regex with escaped /, inside JSON string with escaped "
|
||
loop(i < n) {
|
||
if ch == "\"" { ... } // String boundary
|
||
if ch == "\\" {
|
||
if in_regex {
|
||
i = i + 2 // Regex escape
|
||
} else {
|
||
i = i + 1 // String escape
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
**Complexity**: State-dependent behavior (future work)
|
||
|
||
## References
|
||
|
||
- **JoinIR Architecture**: `joinir-architecture-overview.md`
|
||
- **Loop Canonicalizer**: `loop-canonicalizer.md`
|
||
- **CapabilityTag Enum**: `src/mir/loop_canonicalizer/capability_guard.rs`
|
||
- **Test Fixture**: `tools/selfhost/test_pattern5b_escape_minimal.hako`
|
||
- **Phase 91 Plan**: `phases/phase-91/README.md`
|
||
|
||
---
|
||
|
||
## Summary
|
||
|
||
**Pattern P5b** enables JoinIR recognition of escape-sequence-aware string parsing loops by:
|
||
|
||
1. **Extending Canonicalizer** to detect conditional increments
|
||
2. **Adding exit-line optimization** for escape branching
|
||
3. **Preserving ExitContract** consistency with P1-P4 patterns
|
||
4. **Enabling parity verification** in strict mode
|
||
|
||
**Status**: Design complete, implementation ready for Phase 91 Step 2
|