feat(phase-91): JoinIR Selfhost depth-2 advancement - Pattern P5b design & planning

## Overview

Analyzed 34 loops across selfhost codebase to identify JoinIR coverage gaps.
Current readiness: 47% (16/30 loops). Next frontier: Pattern P5b (Escape Handling).

## Current Status

- Phase 91 planning document: Complete
  - Loop inventory across 6 key files
  - Priority ranking: P5b (escape) > P5 (guard) > P6 (nested)
  - Effort estimates and ROI analysis
- Pattern P5b Design: Complete
  - Problem statement (variable-step carriers)
  - Pattern definition with Skeleton layout
  - Recognition algorithm (8-step detection)
  - Capability taxonomy (P5b-specific guards)
  - Lowering strategy (Phase 92 preview)
- Test fixture: Created
  - Minimal escape sequence parser
  - JSON string with backslash escape
- Loop Canonicalizer extended
  - Capability table updated with P5b entries
  - Fail-Fast criteria documented
  - Implementation checklist added

## Key Findings

### Loop Readiness Matrix

| Category | Count | JoinIR Status |
|----------|-------|--------------|
| Pattern 1 (simple bounded) | 16 | ✅ Ready |
| Pattern 2 (with break) | 1 | ⚠️ Partial |
| **Pattern P5b (escape seq)** | ~3 | ❌ NEW |
| Pattern P5 (guard-bounded) | ~2 | ❌ Deferred |
| Pattern P6 (nested loops) | ~8 | ❌ Deferred |

### Top Candidates

1. **P5b**: json_loader.hako:30 (8 lines, high reuse)
   - Effort: 2-3 days (recognition)
   - Impact: Unlocks all escape parsers
2. **P5**: mini_vm_core.hako:541 (204 lines, monolithic)
   - Effort: 1-2 weeks
   - Impact: Major JSON optimization
3. **P6**: seam_inspector.hako:76 (7+ nesting)
   - Effort: 2-3 weeks
   - Impact: Demonstrates nested composition

## Phase 91 Strategy

**Recognition-only phase** (no lowering in P1):
- Step 1: Design & planning ✅
- Step 2: Canonicalizer implementation (detect_escape_pattern)
- Step 3: Unit tests + parity verification
- Step 4: Lowering deferred to Phase 92

## Files Added

- docs/development/current/main/phases/phase-91/README.md - Full analysis & planning
- docs/development/current/main/design/pattern-p5b-escape-design.md - Technical design
- tools/selfhost/test_pattern5b_escape_minimal.hako - Test fixture

## Files Modified

- docs/development/current/main/design/loop-canonicalizer.md
  - Capability table extended with P5b entries
  - Pattern P5b full section added
  - Implementation checklist updated

## Acceptance Criteria (Phase 91 Step 1)

- ✅ Loop inventory complete (34 loops across 6 files)
- ✅ Pattern P5b design document ready
- ✅ Test fixture created
- ✅ Capability taxonomy extended
- ⏳ Implementation deferred (Step 2+)

## References

- JoinIR Architecture: joinir-architecture-overview.md
- Phase 91 Plan: phases/phase-91/README.md
- P5b Design: design/pattern-p5b-escape-design.md

Next: Implement detect_escape_pattern() recognition in Phase 91 Step 2

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

@@ -295,16 +295,26 @@ pub enum CarrierRole {

Even when a Skeleton can be generated, it cannot necessarily be lowered/merged. The following Capabilities decide that:

| Capability | Description | Reason tag when unmet |
|--------------------------|------------------------------------------|-------------------------------------|
| `ConstStepIncrement` | Carrier update is a constant step (i=i+const) | `CAP_MISSING_CONST_STEP` |
| `SingleBreakPoint` | break occurs at a single site only | `CAP_MISSING_SINGLE_BREAK` |
| `SingleContinuePoint` | continue occurs at a single site only | `CAP_MISSING_SINGLE_CONTINUE` |
| `NoSideEffectInHeader` | Loop condition has no side effects | `CAP_MISSING_PURE_HEADER` |
| `OuterLocalCondition` | Condition variables are defined in the outer scope | `CAP_MISSING_OUTER_LOCAL_COND` |
| `ExitBindingsComplete` | Values passed to the boundary are neither missing nor extra | `CAP_MISSING_EXIT_BINDINGS` |
| `CarrierPromotion` | LoopBodyLocal can be promoted | `CAP_MISSING_CARRIER_PROMOTION` |
| `BreakValueConsistent` | break value type is consistent | `CAP_MISSING_BREAK_VALUE_TYPE` |

| Capability | Description | Reason tag when unmet | Applicable patterns |
|--------------------------|------------------------------------------|-------------------------------------|------------|
| `ConstStepIncrement` | Carrier update is a constant step (i=i+const) | `CAP_MISSING_CONST_STEP` | P1-P5 |
| `SingleBreakPoint` | break occurs at a single site only | `CAP_MISSING_SINGLE_BREAK` | P1-P5 |
| `SingleContinuePoint` | continue occurs at a single site only | `CAP_MISSING_SINGLE_CONTINUE` | P4 |
| `NoSideEffectInHeader` | Loop condition has no side effects | `CAP_MISSING_PURE_HEADER` | P1-P5 |
| `OuterLocalCondition` | Condition variables are defined in the outer scope | `CAP_MISSING_OUTER_LOCAL_COND` | P1-P5 |
| `ExitBindingsComplete` | Values passed to the boundary are neither missing nor extra | `CAP_MISSING_EXIT_BINDINGS` | P1-P5 |
| `CarrierPromotion` | LoopBodyLocal can be promoted | `CAP_MISSING_CARRIER_PROMOTION` | P2-P3 |
| `BreakValueConsistent` | break value type is consistent | `CAP_MISSING_BREAK_VALUE_TYPE` | P2-P5 |
| `EscapeSequencePattern` | Escape sequence handling (P5b only) | `CAP_MISSING_ESCAPE_PATTERN` | **P5b** |

**New P5b-related Capabilities**:

| Capability | Description | Required condition |
|--------------------------|------------------------------------------|---------------------------------------|
| `ConstEscapeDelta` | escape_delta is a constant | `if ch == "\\" { i = i + const }` |
| `ConstNormalDelta` | normal_delta is a constant | `i = i + const` (after escape block) |
| `SingleEscapeCheck` | Escape check occurs at a single site only | No multiple escape handlers |
| `ClearBoundaryCondition` | String terminator detection is explicit | `if ch == boundary { break }` |

### Vocabulary Stability

@@ -365,7 +375,7 @@ pub struct RoutingDecision {

---

## First Target Loop: skip_whitespace (Acceptance Criteria)
## Target Loop 1: skip_whitespace (Acceptance Criteria)

### Target File

@@ -410,6 +420,120 @@ loop(p < len) {

---

## Target Loop 2: Pattern P5b - Escape Sequence Handling (New in Phase 91)

### Purpose

Extend JoinIR coverage to loops that handle escape sequences. This is a common pattern in the string processing of JSON/CSV parsers.

### Target File

`tools/selfhost/test_pattern5b_escape_minimal.hako`

```hako
loop(i < n) {
    local ch = s.substring(i, i+1)

    if ch == "\"" { break }  // String boundary

    if ch == "\\" {
        i = i + 1                   // Skip escape character (conditional +2 total)
        ch = s.substring(i, i+1)    // Read escaped character
    }

    out = out + ch    // Process character
    i = i + 1         // Standard increment
}
```

### Characteristics of Pattern P5b

| Property | Description |
|-----|------|
| **Header** | `loop(i < n)` - Bounded loop on string length |
| **Escape Check** | `if ch == escape_char { i = i + escape_delta }` |
| **Normal Increment** | `i = i + 1` (always +1) |
| **Accumulator** | `out = out + char` - String append pattern |
| **Boundary** | `if ch == boundary { break }` - String terminator |
| **Carriers** | Position (`i`), Accumulator (`out`) |
| **Deltas** | normal_delta=1, escape_delta=2 (or variable) |

### Required Capabilities (P5b Extension)

- ✅ `ConstStepIncrement` (normal: i = i + 1)
- ✅ `SingleBreakPoint` (boundary check only)
- ✅ `NoSideEffectInHeader` (`i < n` is pure)
- ✅ `OuterLocalCondition` (`i` and `n` are defined in the outer scope)
- ✅ **`ConstEscapeDelta`** (escape: i = i + 2, etc.) - **P5b only**
- ✅ **`SingleEscapeCheck`** (one escape pattern only) - **P5b only**
- ✅ **`ClearBoundaryCondition`** (explicit boundary detection) - **P5b only**

### Fail-Fast Criteria (Cases P5b Does Not Support)

Fail-Fast if any of the following applies:

1. **Multiple escape checks**: `if ch == "\\" ... if ch2 == "'" ...`
   - Reason: `CAP_MISSING_SINGLE_ESCAPE_CHECK`

2. **Variable step**: `i = i + var` (not a constant)
   - Reason: `CAP_MISSING_CONST_ESCAPE_DELTA`

3. **Near-unconditional loop**: `loop(true)` without clear boundary
   - Reason: `CAP_MISSING_CLEAR_BOUNDARY_CONDITION`

4. **Multiple break points**: the loop exits both at the string boundary and inside escape processing
   - Reason: `CAP_MISSING_SINGLE_BREAK`

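The four Fail-Fast criteria can be read as a function from observed loop facts to reason tags. The following is only a sketch: `LoopFacts` and `missing_caps` are hypothetical names invented for illustration, and only the tag strings come from the list above.

```rust
/// Hypothetical summary of what a scan of the loop body observed.
/// The field names are assumptions, not the canonicalizer's real types.
#[derive(Debug, Default)]
struct LoopFacts {
    escape_check_count: usize,
    escape_delta_is_const: bool,
    has_clear_boundary: bool,
    break_count: usize,
}

/// Returns the reason tags for every Fail-Fast criterion that is violated.
/// An empty result means none of the four criteria apply.
fn missing_caps(facts: &LoopFacts) -> Vec<&'static str> {
    let mut tags = Vec::new();
    if facts.escape_check_count != 1 {
        tags.push("CAP_MISSING_SINGLE_ESCAPE_CHECK");
    }
    if !facts.escape_delta_is_const {
        tags.push("CAP_MISSING_CONST_ESCAPE_DELTA");
    }
    if !facts.has_clear_boundary {
        tags.push("CAP_MISSING_CLEAR_BOUNDARY_CONDITION");
    }
    if facts.break_count != 1 {
        tags.push("CAP_MISSING_SINGLE_BREAK");
    }
    tags
}
```

Collecting every violated tag, rather than stopping at the first, is a design choice of this sketch; it keeps the diagnostic complete when a loop fails for more than one reason.
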
### Recognition Algorithm (High Level)

```
1. Extract header carrier: take i from loop(i < n)
2. Find escape check block: if ch == "\\" { ... }
3. Extract escape delta: i = i + const
4. Find accumulator pattern: out = out + ch
5. Extract normal increment: i = i + 1 (outside the escape if block)
6. Find boundary check: if ch == "\"" { break }
7. Build LoopSkeleton
   - carriers: [i (dual deltas), out (append)]
   - exits: has_break=true
8. RoutingDecision: Pattern5bEscape
```

### Implementation Plan (Phase 91)

**Step 1** (this document):
- [x] Pattern P5b design document complete ✅
- [x] Test fixture created ✅
- [x] Capability definitions added ✅

**Step 2** (Phase 91 main implementation):
- [ ] `detect_escape_pattern()` in Canonicalizer
- [ ] Unit tests (P5b recognition)
- [ ] Parity verification (strict mode)
- [ ] Documentation update

**Step 3** (Phase 92 lowering):
- [ ] Implement the Pattern5bEscape lowerer
- [ ] E2E test with escape fixture
- [ ] VM/LLVM parity verification

### Acceptance Criteria (Phase 91)

1. ✅ Canonicalizer recognizes the escape pattern
2. ✅ RoutingDecision.chosen == Pattern5bEscape
3. ✅ missing_caps == [] (all capabilities satisfied)
4. ✅ Strict parity green (`HAKO_JOINIR_STRICT=1`)
5. ✅ No regressions in existing tests
6. ❌ Lowering moves to Step 3 (Phase 91 covers recognition only)

### References

- **P5b detailed design**: `docs/development/current/main/design/pattern-p5b-escape-design.md`
- **Test fixture**: `tools/selfhost/test_pattern5b_escape_minimal.hako`
- **Phase 91 plan**: `docs/development/current/main/phases/phase-91/README.md`

---

## Addition/Change Checklist

- [ ] Reduce the loop shape being added to a minimal fixture (pin the reproduction)

@@ -0,0 +1,502 @@

# Pattern P5b: Escape Sequence Handling

## Overview

**Pattern P5b** extends JoinIR loop recognition to handle **variable-step carriers** in escape sequence parsing.

This pattern is essential for:
- JSON string parsers
- CSV readers
- Template engine string processing
- Any escape-aware text processing loop

## Problem Statement

### Current Limitation

Standard Pattern 1-4 carriers always update by constant deltas:
```
Carrier i: i = i + 1 (always +1)
```

Escape sequences require conditional increments:
```
if escape_char { i = i + 2 }  // Skip both escape char and escaped char
else { i = i + 1 }            // Normal increment
```

**Why this matters**:
- Common in string parsing (JSON, CSV, config files)
- Appears in ~3 selfhost loops
- Currently forces Fail-Fast (pattern not supported)
- Could benefit from JoinIR exit-line optimization

### Real-World Example: JSON String Reader

```hako
loop(i < n) {
    local ch = s.substring(i, i+1)

    if ch == "\"" { break }  // End of string

    if ch == "\\" {
        i = i + 1                   // <-- CONDITIONAL: skip escape char
        ch = s.substring(i, i+1)    // Read escaped character
    }

    out = out + ch    // Process character
    i = i + 1         // <-- UNCONDITIONAL: advance
}
```

Loop progression:
- Normal case: `i` advances by 1
- Escape case: `i` advances by 2 (skip inside if + final increment)

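To make the two progression cases concrete, the loop can be transliterated into Rust roughly as follows. This is a sketch only: `read_until_quote` is a made-up name, the input is a `char` slice rather than a hako string, and the bounds guard after the backslash is an addition of this sketch that the fixture does not contain.

```rust
/// Reads characters from `start` until an unescaped `"` or end of input.
/// Returns the accumulated text and the index where the loop stopped.
fn read_until_quote(s: &[char], start: usize) -> (String, usize) {
    let n = s.len();
    let mut i = start;
    let mut out = String::new();
    while i < n {
        let mut ch = s[i];
        if ch == '"' {
            break; // string boundary
        }
        if ch == '\\' {
            i += 1; // conditional: skip the escape character
            if i >= n {
                break; // guard added for this sketch: dangling backslash
            }
            ch = s[i]; // read the escaped character
        }
        out.push(ch); // process character
        i += 1; // unconditional advance
    }
    (out, i)
}
```

For the input `a\"b"c` the index sequence is 0, 1, 3, 4: one normal step, one escape step of 2, one more normal step, and then the break at the closing quote.
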
## Pattern Definition

### Canonical Form

```
LoopSkeleton {
    steps: [
        HeaderCond(carrier < limit),
        Body(escape_check_block),
        Body(process_block),
        Update(carrier_increments)
    ]
}
```

### Header Contract

**Requirement**: Bounded loop on single integer carrier

```
loop(i < n)              ✅ Valid P5b header
loop(i < 100)            ✅ Valid P5b header
loop(i <= n)             ✅ Valid P5b header (edge case)
loop(true)               ❌ Not P5b (unbounded)
loop(i < n && j < m)     ❌ Not P5b (multi-carrier condition)
```

**Carrier**: Must be loop variable used in condition

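The header contract amounts to a small pattern match on the loop condition. The sketch below uses a toy condition type; `Operand`, `Cond`, and `header_carrier` are illustrative names and are not the AST types used by the canonicalizer.

```rust
/// Toy operand: a variable or an integer literal (assumed for illustration).
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Var(String),
    Int(i64),
}

/// Toy loop condition covering the five header examples above.
#[derive(Debug, Clone, PartialEq)]
enum Cond {
    Lt(Operand, Operand),
    Le(Operand, Operand),
    And(Box<Cond>, Box<Cond>),
    True,
}

/// Returns the carrier name when the header is `carrier < limit` or
/// `carrier <= limit`; every other shape is rejected.
fn header_carrier(cond: &Cond) -> Option<&str> {
    match cond {
        Cond::Lt(Operand::Var(name), _) | Cond::Le(Operand::Var(name), _) => {
            Some(name.as_str())
        }
        _ => None,
    }
}
```

Rejecting `And` outright matches the "multi-carrier condition" row; a real implementation might want a more specific reason than `None`.
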
### Escape Check Contract

**Requirement**: Conditional increment based on character test

#### Escape Detection Block

```
if ch == escape_char {
    carrier = carrier + escape_delta
    // Optional: read next character
    ch = s.substring(carrier, carrier+1)
}
```

**Escape character**: Typically `\\` (backslash), but can vary
- JSON: `\\`
- CSV: `"` (context-dependent)
- Custom: Any single-character escape

**Escape delta**: How far to skip
- `+1`: Skip just the escape marker
- `+2`: Skip escape marker + escaped char (common case)
- `+N`: Other values possible

#### Detection Algorithm

1. **Find if statement in loop body**
2. **Check condition**: `ch == literal_char`
3. **Extract escape character**: The literal constant
4. **Find assignment in if block**: `carrier = carrier + <const>`
5. **Calculate escape_delta**: The constant value
6. **Validate**: Escape delta > 0

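The six detection steps can be sketched over a toy statement type as shown below. `Stmt`, `EscapeCheck`, and `detect_escape_check` are hypothetical and exist only for this example; the real canonicalizer works on the project's own AST, which is not shown in this document.

```rust
/// Toy loop-body statement (assumed shape, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// `target = target + delta` with a constant delta
    AddConst { target: String, delta: i64 },
    /// `target = target + var` with a non-constant step
    AddVar { target: String, var: String },
    /// `if var == "lit" { body }`
    IfCharEq { var: String, lit: char, body: Vec<Stmt> },
    Break,
    Other,
}

#[derive(Debug, PartialEq)]
struct EscapeCheck {
    escape_char: char,
    escape_delta: i64,
}

/// Steps 1-6: find an `if ch == literal` whose body increments the
/// carrier by a positive constant, and report the literal and the delta.
fn detect_escape_check(body: &[Stmt], carrier: &str) -> Option<EscapeCheck> {
    for stmt in body {
        if let Stmt::IfCharEq { lit, body: then_body, .. } = stmt {
            for inner in then_body {
                if let Stmt::AddConst { target, delta } = inner {
                    if target.as_str() == carrier && *delta > 0 {
                        return Some(EscapeCheck { escape_char: *lit, escape_delta: *delta });
                    }
                }
            }
        }
    }
    None
}
```

In this sketch the boundary check `if ch == "\"" { break }` is skipped naturally because its body contains no carrier increment, and a variable step (`AddVar`) never matches, so such a loop yields `None`.
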
### Process Block Contract

**Requirement**: Character accumulation with optional processing

```
out = out + ch          ✅ Simple append
result = result + ch    ✅ Any accumulator
s = s + value           ❌ Not append pattern
```

**Accumulator carrier**: String-like box supporting append

### Update Block Contract

**Requirement**: Unconditional carrier increment after escape check

```
carrier = carrier + normal_delta
```

**Normal delta**: Almost always `+1`
- Defines "normal" loop progress
- Only incremented once per iteration (not in escape block)

#### Detection Algorithm

1. **Find assignment after escape if block**
2. **Pattern**: `carrier = carrier + <const>`
3. **Must be unconditional** (outside any if block)
4. **Extract normal_delta**: The constant

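A minimal sketch of the normal-increment extraction, again over a toy statement type: `Stmt` and `extract_normal_delta` are illustrative names. The "exactly one top-level increment" rule is an assumption derived from "only incremented once per iteration", and the ordering check (the increment must come after the escape block) is left out to keep the example short.

```rust
/// Toy loop-body statement (assumed shape, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// `target = target + delta` with a constant delta
    AddConst { target: String, delta: i64 },
    /// Any `if` block; its body is never treated as unconditional
    If { body: Vec<Stmt> },
    Other,
}

/// Returns normal_delta when exactly one unconditional (top-level)
/// `carrier = carrier + const` exists in the loop body.
fn extract_normal_delta(body: &[Stmt], carrier: &str) -> Option<i64> {
    let mut found = None;
    for stmt in body {
        // Only top-level statements are inspected, so increments nested
        // inside an `If` are ignored by construction.
        if let Stmt::AddConst { target, delta } = stmt {
            if target.as_str() == carrier {
                if found.is_some() {
                    return None; // more than one unconditional increment
                }
                found = Some(*delta);
            }
        }
    }
    found
}
```
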
### Break Requirement

**Requirement**: Explicit break on string boundary

```
if ch == boundary_char { break }
```

**Boundary character**: Typically quote `"`
- JSON: `"`
- Custom strings: Any delimiter

**Position in loop**: Usually before escape check

### Exit Contract for P5b

```rust
ExitContract {
    has_break: true,           // Always for escape patterns
    has_continue: false,
    has_return: false,
    carriers: vec![
        CarrierInfo {
            name: "i",         // Loop variable
            deltas: [
                normal_delta,  // e.g., 1
                escape_delta   // e.g., 2
            ]
        },
        CarrierInfo {
            name: "out",       // Accumulator
            pattern: Append
        }
    ]
}
```

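The contract above is schematic: its two `CarrierInfo` entries carry different fields. One way to express the same information in compilable Rust is an enum for the update kind. This is a sketch only; the existing `ExitContract` and `CarrierInfo` definitions in the codebase are not shown in this document and may look different, and `CarrierUpdate` and `p5b_exit_contract` are names invented here.

```rust
/// How a carrier changes per iteration (hypothetical modelling).
#[derive(Debug, Clone, PartialEq)]
enum CarrierUpdate {
    /// Position carrier with two constant net deltas
    ConditionalStep { normal_delta: i64, escape_delta: i64 },
    /// Accumulator carrier: `out = out + ch`
    Append,
}

#[derive(Debug, Clone, PartialEq)]
struct CarrierInfo {
    name: String,
    update: CarrierUpdate,
}

#[derive(Debug, Clone, PartialEq)]
struct ExitContract {
    has_break: bool,
    has_continue: bool,
    has_return: bool,
    carriers: Vec<CarrierInfo>,
}

/// Builds the P5b contract: one break, no continue/return, two carriers.
fn p5b_exit_contract(position: &str, accumulator: &str, normal_delta: i64, escape_delta: i64) -> ExitContract {
    ExitContract {
        has_break: true,
        has_continue: false,
        has_return: false,
        carriers: vec![
            CarrierInfo {
                name: position.to_string(),
                update: CarrierUpdate::ConditionalStep { normal_delta, escape_delta },
            },
            CarrierInfo {
                name: accumulator.to_string(),
                update: CarrierUpdate::Append,
            },
        ],
    }
}
```
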
## Capability Analysis

### Required Capabilities (CapabilityTag)

For Pattern P5b to be JoinIR-compatible, these must be present:

| Capability | Meaning | P5b Requirement | Status |
|------------|---------|-----------------|--------|
| `ConstStep` | Carrier updates are constants | ✅ Required | Both deltas constant |
| `SingleBreak` | Only one break point | ✅ Required | String boundary only |
| `PureHeader` | Condition has no side effects | ✅ Required | `i < n` is pure |
| `OuterLocalCond` | Condition doesn't reference locals | ⚠️ Soft req | Usually true |
| `ExitBindings` | Exit block is simple | ✅ Required | Break is unconditional |

### Missing Capabilities (Fail-Fast Reasons)

If any of these are detected, Pattern P5b is rejected:

| Capability | Why It Blocks P5b | Example |
|------------|-------------------|---------|
| `MultipleBreak` | Multiple exit points | `if x { break } if y { break }` |
| `MultipleCarriers` | Condition uses multiple vars | `loop(i < n && j < m)` |
| `VariableStep` | Deltas aren't constants | `i = i + adjustment` |
| `NestedEscape` | Escape check inside other if | `if outer { if ch == "\\" ... }` |

## Recognition Algorithm

### High-Level Steps

1. **Extract header carrier**: `i` from `loop(i < n)`
2. **Find escape check**: `if ch == "\\"`
3. **Find escape increment**: `i = i + 1` inside if (net advance of 2 together with the normal increment)
4. **Find process block**: `out = out + ch`
5. **Find normal increment**: `i = i + 1` after if
6. **Find break condition**: `if ch == "\"" { break }`
7. **Build ExitContract** with both deltas
8. **Build RoutingDecision**: Pattern5bEscape if all present

### Pseudo-Code

```rust
fn detect_escape_pattern(loop_expr: &Expr) -> Option<EscapePatternInfo> {
    // Step 1: Extract loop variable
    let (carrier_name, limit) = extract_header_carrier(loop_expr)?;

    // Step 2: Find escape check statement
    let escape_stmts = find_escape_check_block(loop_body)?;

    // Step 3: Extract escape character
    let escape_char = extract_escape_literal(escape_stmts)?;

    // Step 4: Extract escape delta
    let escape_delta = extract_escape_increment(escape_stmts, carrier_name)?;

    // Step 5: Find process statements
    let process_stmts = find_character_accumulation(loop_body)?;

    // Step 6: Extract normal increment
    let normal_delta = extract_normal_increment(loop_body, carrier_name)?;

    // Step 7: Find break condition
    let break_char = extract_break_literal(loop_body)?;

    // Build result
    Some(EscapePatternInfo {
        carrier_name,
        escape_char,
        normal_delta,
        escape_delta,
        break_char,
    })
}
```

### Implementation Location

**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`

**Function**: `detect_escape_pattern()` (new)

**Integration point**: `canonicalize_loop_expr()` main dispatch

**Priority**: Call before `detect_skip_whitespace_pattern()` (more specific)

## Skeleton Representation

### Standard Layout

```
LoopSkeleton {
    header: HeaderCond(Condition {
        operator: LessThan,
        left: Var("i"),
        right: Var("n")
    }),

    steps: [
        // Escape check block
        SkeletonStep::Body(vec![
            Expr::If {
                cond: Comparison("ch", Eq, Literal("\\")),
                then_body: [
                    Expr::Assign("i", Add, 1),  // in-block increment (net escape_delta = 2)
                    Expr::Assign("ch", Substring("s", Var("i"), Add(Var("i"), 1))),
                ]
            }
        ]),

        // Character accumulation
        SkeletonStep::Body(vec![
            Expr::Assign("out", Append, Var("ch")),
        ]),

        // Normal increment
        SkeletonStep::Update(vec![
            Expr::Assign("i", Add, 1),  // normal_delta
        ]),
    ],

    carriers: vec![
        CarrierSlot {
            name: "i",
            deltas: [1, 2],  // [normal, escape]
            // ... other fields
        },
        CarrierSlot {
            name: "out",
            pattern: Append,
            // ... other fields
        }
    ],

    exit_contract: ExitContract {
        has_break: true,
        // ...
    }
}
```

## RoutingDecision Output

### For Valid P5b Pattern

```rust
RoutingDecision {
    chosen: Pattern5bEscape,
    missing_caps: vec![],
    notes: vec![
        "escape_char: \\",
        "normal_delta: 1",
        "escape_delta: 2",
        "break_char: \"",
        "accumulator: out",
    ],
    confidence: High,
}
```

### For Invalid/Unsupported Cases

```rust
// Multiple escapes detected
RoutingDecision {
    chosen: Unknown,
    missing_caps: vec![CapabilityTag::MultipleBreak],
    notes: vec!["Multiple escape checks found"],
    confidence: Low,
}

// Variable step (not constant)
RoutingDecision {
    chosen: Unknown,
    missing_caps: vec![CapabilityTag::VariableStep],
    notes: vec!["Escape delta is not constant"],
    confidence: Low,
}
```

## Parity Verification

### Dev-Only Observation

In `src/mir/builder/control_flow/joinir/routing.rs`:

1. **Router makes decision** using existing Pattern 1-4 logic
2. **Canonicalizer analyzes** and detects Pattern P5b
3. **Parity checker compares**:
   - Router decision (Pattern 1-4)
   - Canonicalizer decision (Pattern P5b)
4. **If mismatch**:
   - Dev mode: Log with reason
   - Strict mode: Fail-Fast with error

### Expected Outcomes

**Case A: Router picks Pattern 1, Canonicalizer picks P5b**
- Router: "Simple bounded loop"
- Canonicalizer: "Escape sequence pattern detected"
- **Resolution**: Canonicalizer is more specific → router will eventually delegate

**Case B: Router fails, Canonicalizer succeeds**
- Router: "No pattern matched" (Fail-Fast)
- Canonicalizer: "Pattern P5b matched"
- **Resolution**: P5b is new capability → expected until router updated

**Case C: Both agree P5b**
- Router: Pattern P5b
- Canonicalizer: Pattern P5b
- **Result**: ✅ Parity green

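The dev/strict split and the three cases can be sketched as a single comparison function. Everything here is hypothetical: `Pattern`, `Parity`, and `check_parity` are illustrative names, "router fails" is modelled as `None`, and strict mode is passed in as a flag instead of being read from the environment.

```rust
/// Patterns relevant to the three cases above (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pattern {
    Pattern1,
    Pattern5bEscape,
}

#[derive(Debug, PartialEq, Eq)]
enum Parity {
    Green,
    Mismatch { router: Option<Pattern>, canonicalizer: Option<Pattern> },
}

/// Compares the two decisions. `None` stands for "no pattern matched".
/// In strict mode a mismatch is an error; in dev mode it is only logged.
fn check_parity(router: Option<Pattern>, canonicalizer: Option<Pattern>, strict: bool) -> Result<Parity, String> {
    if router == canonicalizer {
        return Ok(Parity::Green);
    }
    let msg = format!(
        "parity mismatch: router={:?}, canonicalizer={:?}",
        router, canonicalizer
    );
    if strict {
        Err(msg) // strict mode: Fail-Fast
    } else {
        eprintln!("[dev] {}", msg); // dev mode: log with reason
        Ok(Parity::Mismatch { router, canonicalizer })
    }
}
```

Under this sketch, Case A and Case B both surface as `Mismatch` in dev mode and as `Err` in strict mode, and only Case C returns `Green`.
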
## Test Cases

### Minimal Case (test_pattern5b_escape_minimal.hako)

**Input**: String with one escape sequence
**Carrier**: Single position variable, single accumulator
**Deltas**: normal=1, escape=2
**Output**: Processed string (escape removed)

### Extended Cases (Phase 91 Step 2+)

1. **test_pattern5b_escape_json**: JSON string with multiple escapes
2. **test_pattern5b_escape_custom**: Custom escape character
3. **test_pattern5b_escape_newline**: Escape newline handling
4. **test_pattern5b_escape_fail_multiple**: Multiple escapes (should Fail-Fast)
5. **test_pattern5b_escape_fail_variable**: Variable delta (should Fail-Fast)

## Lowering Strategy (Future Phase 92)

### Philosophy: Keep Return Simple

Pattern P5b lowering should:
1. **Reuse Pattern 1-2 lowering** for normal case
2. **Extend for conditional increment**:
   - PHI for carrier value after escape check
   - Separate paths for escape vs normal
3. **Close within Pattern5b** (no cross-boundary complexity)

### Rough Outline

```
Entry: LoopPrefix
    ↓
Condition: i < n
    ↓
[BRANCH]
  ├→ EscapeBlock
  │    ├→ i = i + escape_delta
  │    └→ ch = substring(i)
  │
  └→ NormalBlock
       ├→ (ch already set)
       └→ noop

  (PHI: i from both branches)
    ↓
ProcessBlock: out = out + ch
    ↓
UpdateBlock: i = i + 1
    ↓
Condition check...
```

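The branch-and-merge shape of the outline can be mimicked in ordinary Rust by making each arm return the values the PHI would merge. This is a sketch only, not the planned lowerer: `lowered_iteration` is a made-up name, the in-block increment is fixed at 1 as in the fixture, `ch` is merged alongside `i` because it also differs between the arms, and the boundary `break` is left out just as it is in the outline.

```rust
/// One loop iteration with an explicit merge point.
/// Precondition: `i < s.len()` (the loop condition already held).
/// Returns the next value of `i`.
fn lowered_iteration(s: &[char], i: usize, out: &mut String) -> usize {
    let ch = s[i];
    // [BRANCH]: each arm yields the (i, ch) pair that the PHI merges.
    let (i_phi, ch_phi) = if ch == '\\' && i + 1 < s.len() {
        (i + 1, s[i + 1]) // EscapeBlock: skip marker, read escaped char
    } else {
        (i, ch) // NormalBlock: no-op (ch already set)
    };
    out.push(ch_phi); // ProcessBlock: out = out + ch
    i_phi + 1 // UpdateBlock: i = i + 1
}
```

Driving it with `while i < s.len() { i = lowered_iteration(&s, i, &mut out); }` reproduces the net deltas of 1 and 2. A trailing backslash falls through to the normal arm here, which is an assumption of this sketch.
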
## Future Extensions

### Pattern P5c: Multi-Character Escapes

```
if ch == "\\" {
    i = i + 2  // Skip \x
    if i < n {
        local second = s.substring(i, i+1)
        // Handle \n, \t, \x, etc.
    }
}
```

**Complexity**: Requires escape sequence table (not generic)

### Pattern P5d: Nested Escape Contexts

```
// Regex with escaped /, inside JSON string with escaped "
loop(i < n) {
    if ch == "\"" { ... }  // String boundary
    if ch == "\\" {
        if in_regex {
            i = i + 2  // Regex escape
        } else {
            i = i + 1  // String escape
        }
    }
}
```

**Complexity**: State-dependent behavior (future work)

## References

- **JoinIR Architecture**: `joinir-architecture-overview.md`
- **Loop Canonicalizer**: `loop-canonicalizer.md`
- **CapabilityTag Enum**: `src/mir/loop_canonicalizer/capability_guard.rs`
- **Test Fixture**: `tools/selfhost/test_pattern5b_escape_minimal.hako`
- **Phase 91 Plan**: `phases/phase-91/README.md`

---

## Summary

**Pattern P5b** enables JoinIR recognition of escape-sequence-aware string parsing loops by:

1. **Extending Canonicalizer** to detect conditional increments
2. **Adding exit-line optimization** for escape branching
3. **Preserving ExitContract** consistency with P1-P4 patterns
4. **Enabling parity verification** in strict mode

**Status**: Design complete, implementation ready for Phase 91 Step 2