diff --git a/docs/development/current/main/design/loop-canonicalizer.md b/docs/development/current/main/design/loop-canonicalizer.md index ab5dc629..1bd0a6e9 100644 --- a/docs/development/current/main/design/loop-canonicalizer.md +++ b/docs/development/current/main/design/loop-canonicalizer.md @@ -295,16 +295,26 @@ pub enum CarrierRole { Skeleton を生成できても lower/merge できるとは限らない。以下の Capability で判定する: -| Capability | 説明 | 未達時の理由タグ | -|--------------------------|------------------------------------------|-------------------------------------| -| `ConstStepIncrement` | キャリア更新が定数ステップ(i=i+const) | `CAP_MISSING_CONST_STEP` | -| `SingleBreakPoint` | break が単一箇所のみ | `CAP_MISSING_SINGLE_BREAK` | -| `SingleContinuePoint` | continue が単一箇所のみ | `CAP_MISSING_SINGLE_CONTINUE` | -| `NoSideEffectInHeader` | ループ条件に副作用がない | `CAP_MISSING_PURE_HEADER` | -| `OuterLocalCondition` | 条件変数が外側スコープで定義済み | `CAP_MISSING_OUTER_LOCAL_COND` | -| `ExitBindingsComplete` | 境界へ渡す値が過不足ない | `CAP_MISSING_EXIT_BINDINGS` | -| `CarrierPromotion` | LoopBodyLocal を昇格可能 | `CAP_MISSING_CARRIER_PROMOTION` | -| `BreakValueConsistent` | break 値の型が一貫 | `CAP_MISSING_BREAK_VALUE_TYPE` | +| Capability | 説明 | 未達時の理由タグ | Pattern対応 | +|--------------------------|------------------------------------------|-------------------------------------|------------| +| `ConstStepIncrement` | キャリア更新が定数ステップ(i=i+const) | `CAP_MISSING_CONST_STEP` | P1-P5 | +| `SingleBreakPoint` | break が単一箇所のみ | `CAP_MISSING_SINGLE_BREAK` | P1-P5 | +| `SingleContinuePoint` | continue が単一箇所のみ | `CAP_MISSING_SINGLE_CONTINUE` | P4 | +| `NoSideEffectInHeader` | ループ条件に副作用がない | `CAP_MISSING_PURE_HEADER` | P1-P5 | +| `OuterLocalCondition` | 条件変数が外側スコープで定義済み | `CAP_MISSING_OUTER_LOCAL_COND` | P1-P5 | +| `ExitBindingsComplete` | 境界へ渡す値が過不足ない | `CAP_MISSING_EXIT_BINDINGS` | P1-P5 | +| `CarrierPromotion` | LoopBodyLocal を昇格可能 | `CAP_MISSING_CARRIER_PROMOTION` | P2-P3 | +| `BreakValueConsistent` | break 値の型が一貫 | `CAP_MISSING_BREAK_VALUE_TYPE` | P2-P5 | +| 
`EscapeSequencePattern` | エスケープシーケンス対応(P5b専用) | `CAP_MISSING_ESCAPE_PATTERN` | **P5b** | + +**新規 P5b 関連 Capability**: + +| Capability | 説明 | 必須条件 | +|--------------------------|------------------------------------------|---------------------------------------| +| `ConstEscapeDelta` | escape_delta が定数 | `if ch == "\\" { i = i + const }` | +| `ConstNormalDelta` | normal_delta が定数 | `i = i + const` (after escape block) | +| `SingleEscapeCheck` | escape check が単一箇所のみ | 複数の escape 処理がない | +| `ClearBoundaryCondition` | 文字列終端検出が明確 | `if ch == boundary { break }` | ### 語彙の安定性 @@ -365,7 +375,7 @@ pub struct RoutingDecision { --- -## 最初の対象ループ: skip_whitespace(受け入れ基準) +## 対象ループ 1: skip_whitespace(受け入れ基準) ### 対象ファイル @@ -410,6 +420,120 @@ loop(p < len) { --- +## 対象ループ 2: Pattern P5b - Escape Sequence Handling(Phase 91 新規) + +### 目的 + +エスケープシーケンス対応ループを JoinIR 対象に拡大する。JSON/CSV パーサーの文字列処理で共通パターン。 + +### 対象ファイル + +`tools/selfhost/test_pattern5b_escape_minimal.hako` + +```hako +loop(i < n) { + local ch = s.substring(i, i+1) + + if ch == "\"" { break } // String boundary + + if ch == "\\" { + i = i + 1 // Skip escape character (conditional +2 total) + ch = s.substring(i, i+1) // Read escaped character + } + + out = out + ch // Process character + i = i + 1 // Standard increment +} +``` + +### Pattern P5b の特徴 + +| 特性 | 説明 | +|-----|------| +| **Header** | `loop(i < n)` - Bounded loop on string length | +| **Escape Check** | `if ch == escape_char { i = i + escape_delta }` | +| **Normal Increment** | `i = i + 1` (always +1) | +| **Accumulator** | `out = out + char` - String append pattern | +| **Boundary** | `if ch == boundary { break }` - String terminator | +| **Carriers** | Position (`i`), Accumulator (`out`) | +| **Deltas** | normal_delta=1, escape_delta=2 (or variable) | + +### 必要 Capability (P5b 拡張) + +- ✅ `ConstStepIncrement` (normal: i = i + 1) +- ✅ `SingleBreakPoint` (boundary check only) +- ✅ `NoSideEffectInHeader` (i < n は pure) +- ✅ `OuterLocalCondition` (i, n は外側定義) +- ✅ 
**`ConstEscapeDelta`** (escape: i = i + 2, etc.) - **P5b 専用** +- ✅ **`SingleEscapeCheck`** (one escape pattern only) - **P5b 専用** +- ✅ **`ClearBoundaryCondition`** (explicit boundary detection) - **P5b 専用** + +### Fail-Fast 基準 (P5b 非対応のケース) + +以下のいずれかに該当する場合、Fail-Fast: + +1. **複数エスケープチェック**: `if ch == "\\" ... if ch2 == "'" ...` + - 理由: `CAP_MISSING_SINGLE_ESCAPE_CHECK` + +2. **可変ステップ**: `i = i + var` (定数でない) + - 理由: `CAP_MISSING_CONST_ESCAPE_DELTA` + +3. **無条件に近いループ**: `loop(true)` without clear boundary + - 理由: `CAP_MISSING_CLEAR_BOUNDARY_CONDITION` + +4. **複数 break 点**: String boundary + escape processing で exit + - 理由: `CAP_MISSING_SINGLE_BREAK` + +### 認識アルゴリズム (高レベル) + +``` +1. Header carrier 抽出: loop(i < n) から i を取得 +2. Escape check block 発見: if ch == "\" { ... } +3. Escape delta 抽出: i = i + const +4. Accumulator パターン発見: out = out + ch +5. Normal increment 抽出: i = i + 1 (escape if block 外) +6. Boundary check 発見: if ch == "\"" { break } +7. LoopSkeleton 構築 + - carriers: [i (dual deltas), out (append)] + - exits: has_break=true +8. RoutingDecision: Pattern5bEscape +``` + +### 実装予定 (Phase 91) + +**Step 1** (このドキュメント): +- [ ] Pattern P5b 設計書完成 ✅ +- [ ] テストフィクスチャ作成 ✅ +- [ ] Capability 定義追加 ✅ + +**Step 2** (Phase 91 本実装): +- [ ] `detect_escape_pattern()` in Canonicalizer +- [ ] Unit tests (P5b recognition) +- [ ] Parity verification (strict mode) +- [ ] Documentation update + +**Step 3** (Phase 92 lowering): +- [ ] Pattern5bEscape lowerer 実装 +- [ ] E2E test with escape fixture +- [ ] VM/LLVM parity verification + +### 受け入れ基準 (Phase 91) + +1. ✅ Canonicalizer が escape pattern を認識 +2. ✅ RoutingDecision.chosen == Pattern5bEscape +3. ✅ missing_caps == [] (すべての capability 満たす) +4. ✅ Strict parity green (`HAKO_JOINIR_STRICT=1`) +5. ✅ 既存テスト退行なし +6. 
❌ Lowering は Step 3 へ (Phase 91 では recognition のみ) + +### References + +- **P5b 詳細設計**: `docs/development/current/main/design/pattern-p5b-escape-design.md` +- **テストフィクスチャ**: `tools/selfhost/test_pattern5b_escape_minimal.hako` +- **Phase 91 計画**: `docs/development/current/main/phases/phase-91/README.md` + +--- + ## 追加・変更チェックリスト - [ ] 追加するループ形を最小 fixture に落とす(再現固定) diff --git a/docs/development/current/main/design/pattern-p5b-escape-design.md b/docs/development/current/main/design/pattern-p5b-escape-design.md new file mode 100644 index 00000000..4ebc8a85 --- /dev/null +++ b/docs/development/current/main/design/pattern-p5b-escape-design.md @@ -0,0 +1,502 @@ +# Pattern P5b: Escape Sequence Handling + +## Overview + +**Pattern P5b** extends JoinIR loop recognition to handle **variable-step carriers** in escape sequence parsing. + +This pattern is essential for: +- JSON string parsers +- CSV readers +- Template engine string processing +- Any escape-aware text processing loop + +## Problem Statement + +### Current Limitation + +Standard Pattern 1-4 carriers always update by constant deltas: +``` +Carrier i: i = i + 1 (always +1) +``` + +Escape sequences require conditional increments: +``` +if escape_char { i = i + 2 } // Skip both escape char and escaped char +else { i = i + 1 } // Normal increment +``` + +**Why this matters**: +- Common in string parsing (JSON, CSV, config files) +- Appears in ~3 selfhost loops +- Currently forces Fail-Fast (pattern not supported) +- Could benefit from JoinIR exit-line optimization + +### Real-World Example: JSON String Reader + +```hako +loop(i < n) { + local ch = s.substring(i, i+1) + + if ch == "\"" { break } // End of string + + if ch == "\\" { + i = i + 1 // <-- CONDITIONAL: skip escape char + ch = s.substring(i, i+1) // Read escaped character + } + + out = out + ch // Process character + i = i + 1 // <-- UNCONDITIONAL: advance +} +``` + +Loop progression: +- Normal case: `i` advances by 1 +- Escape case: `i` advances by 2 (skip 
inside if + final increment)
+
+## Pattern Definition
+
+### Canonical Form
+
+```
+LoopSkeleton {
+    steps: [
+        HeaderCond(carrier < limit),
+        Body(escape_check_block),
+        Body(process_block),
+        Update(carrier_increments)
+    ]
+}
+```
+
+### Header Contract
+
+**Requirement**: Bounded loop on single integer carrier
+
+```
+loop(i < n)              ✅ Valid P5b header
+loop(i < 100)            ✅ Valid P5b header
+loop(i <= n)             ✅ Valid P5b header (edge case)
+loop(true)               ❌ Not P5b (unbounded)
+loop(i < n && j < m)     ❌ Not P5b (multi-carrier condition)
+```
+
+**Carrier**: Must be loop variable used in condition
+
+### Escape Check Contract
+
+**Requirement**: Conditional increment based on character test
+
+#### Escape Detection Block
+
+```
+if ch == escape_char {
+    carrier = carrier + escape_delta
+    // Optional: read next character
+    ch = s.substring(carrier, carrier+1)
+}
+```
+
+**Escape character**: Typically `\\` (backslash), but can vary
+- JSON: `\\`
+- CSV: `"` (context-dependent)
+- Custom: Any single-character escape
+
+**Escape delta**: How far to skip
+- `+1`: Skip just the escape marker
+- `+2`: Skip escape marker + escaped char (common case)
+- `+N`: Other values possible
+
+#### Detection Algorithm
+
+1. **Find if statement in loop body**
+2. **Check condition**: `ch == literal_char`
+3. **Extract escape character**: The literal constant
+4. **Find assignment in if block**: `carrier = carrier + <const>`
+5. **Calculate escape_delta**: The constant value
+6. 
**Validate**: Escape delta > 0
+
+### Process Block Contract
+
+**Requirement**: Character accumulation with optional processing
+
+```
+out = out + ch           ✅ Simple append
+result = result + ch     ✅ Any accumulator
+s = s + value            ❌ Not append pattern
+```
+
+**Accumulator carrier**: String-like box supporting append
+
+### Update Block Contract
+
+**Requirement**: Unconditional carrier increment after escape check
+
+```
+carrier = carrier + normal_delta
+```
+
+**Normal delta**: Almost always `+1`
+- Defines "normal" loop progress
+- Only incremented once per iteration (not in escape block)
+
+#### Detection Algorithm
+
+1. **Find assignment after escape if block**
+2. **Pattern**: `carrier = carrier + <const>`
+3. **Must be unconditional** (outside any if block)
+4. **Extract normal_delta**: The constant
+
+### Break Requirement
+
+**Requirement**: Explicit break on string boundary
+
+```
+if ch == boundary_char { break }
+```
+
+**Boundary character**: Typically quote `"`
+- JSON: `"`
+- Custom strings: Any delimiter
+
+**Position in loop**: Usually before escape check
+
+### Exit Contract for P5b
+
+```rust
+ExitContract {
+    has_break: true,    // Always for escape patterns
+    has_continue: false,
+    has_return: false,
+    carriers: vec![
+        CarrierInfo {
+            name: "i",  // Loop variable
+            deltas: [
+                normal_delta,  // e.g., 1
+                escape_delta   // e.g., 2
+            ]
+        },
+        CarrierInfo {
+            name: "out",  // Accumulator
+            pattern: Append
+        }
+    ]
+}
+```
+
+## Capability Analysis
+
+### Required Capabilities (CapabilityTag)
+
+For Pattern P5b to be JoinIR-compatible, these must be present:
+
+| Capability | Meaning | P5b Requirement | Status |
+|------------|---------|-----------------|--------|
+| `ConstStep` | Carrier updates are constants | ✅ Required | Both deltas constant |
+| `SingleBreak` | Only one break point | ✅ Required | String boundary only |
+| `PureHeader` | Condition has no side effects | ✅ Required | `i < n` is pure |
+| `OuterLocalCond` | Condition doesn't reference locals | ⚠️ Soft 
req | Usually true |
+| `ExitBindings` | Exit block is simple | ✅ Required | Break is unconditional |
+
+### Missing Capabilities (Fail-Fast Reasons)
+
+If any of these are detected, Pattern P5b is rejected:
+
+| Capability | Why It Blocks P5b | Example |
+|------------|-------------------|---------|
+| `MultipleBreak` | Multiple exit points | `if x { break } if y { break }` |
+| `MultipleCarriers` | Condition uses multiple vars | `loop(i < n && j < m)` |
+| `VariableStep` | Deltas aren't constants | `i = i + adjustment` |
+| `NestedEscape` | Escape check inside other if | `if outer { if ch == "\\" ... }` |
+
+## Recognition Algorithm
+
+### High-Level Steps
+
+1. **Extract header carrier**: `i` from `loop(i < n)`
+2. **Find escape check**: `if ch == "\\"`
+3. **Find escape increment**: `i = i + 1` inside if (net escape delta is 2 once the normal increment is added)
+4. **Find process block**: `out = out + ch`
+5. **Find normal increment**: `i = i + 1` after if
+6. **Find break condition**: `if ch == "\"" { break }`
+7. **Build ExitContract** with both deltas
+8. 
**Build RoutingDecision**: Pattern5bEscape if all present
+
+### Pseudo-Code
+
+```rust
+fn detect_escape_pattern(loop_expr: &Expr) -> Option<EscapePatternInfo> {
+    // Step 1: Extract loop variable
+    let (carrier_name, limit) = extract_header_carrier(loop_expr)?;
+
+    // Step 2: Find escape check statement
+    let escape_stmts = find_escape_check_block(loop_body)?;
+
+    // Step 3: Extract escape character
+    let escape_char = extract_escape_literal(escape_stmts)?;
+
+    // Step 4: Extract escape delta
+    let escape_delta = extract_escape_increment(escape_stmts, carrier_name)?;
+
+    // Step 5: Find process statements
+    let process_stmts = find_character_accumulation(loop_body)?;
+
+    // Step 6: Extract normal increment
+    let normal_delta = extract_normal_increment(loop_body, carrier_name)?;
+
+    // Step 7: Find break condition
+    let break_char = extract_break_literal(loop_body)?;
+
+    // Build result
+    Some(EscapePatternInfo {
+        carrier_name,
+        escape_char,
+        normal_delta,
+        escape_delta,
+        break_char,
+    })
+}
+```
+
+### Implementation Location
+
+**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`
+
+**Function**: `detect_escape_pattern()` (new)
+
+**Integration point**: `canonicalize_loop_expr()` main dispatch
+
+**Priority**: Call before `detect_skip_whitespace_pattern()` (more specific)
+
+## Skeleton Representation
+
+### Standard Layout
+
+```
+LoopSkeleton {
+    header: HeaderCond(Condition {
+        operator: LessThan,
+        left: Var("i"),
+        right: Var("n")
+    }),
+
+    steps: [
+        // Escape check block
+        SkeletonStep::Body(vec![
+            Expr::If {
+                cond: Comparison("ch", Eq, Literal("\\")),
+                then_body: [
+                    Expr::Assign("i", Add, 1),  // in-block step; net escape_delta = 2
+                    Expr::Assign("ch", Substring("s", Var("i"), Add(Var("i"), 1))),
+                ]
+            }
+        ]),
+
+        // Character accumulation
+        SkeletonStep::Body(vec![
+            Expr::Assign("out", Append, Var("ch")),
+        ]),
+
+        // Normal increment
+        SkeletonStep::Update(vec![
+            Expr::Assign("i", Add, 1),  // normal_delta
+        ]),
+    ],
+
+    carriers: vec![
+        CarrierSlot {
+            name: "i",
+            deltas: [1, 
2], // [normal, escape] + // ... other fields + }, + CarrierSlot { + name: "out", + pattern: Append, + // ... other fields + } + ], + + exit_contract: ExitContract { + has_break: true, + // ... + } +} +``` + +## RoutingDecision Output + +### For Valid P5b Pattern + +```rust +RoutingDecision { + chosen: Pattern5bEscape, + missing_caps: vec![], + notes: vec![ + "escape_char: \\", + "normal_delta: 1", + "escape_delta: 2", + "break_char: \"", + "accumulator: out", + ], + confidence: High, +} +``` + +### For Invalid/Unsupported Cases + +```rust +// Multiple escapes detected +RoutingDecision { + chosen: Unknown, + missing_caps: vec![CapabilityTag::MultipleBreak], + notes: vec!["Multiple escape checks found"], + confidence: Low, +} + +// Variable step (not constant) +RoutingDecision { + chosen: Unknown, + missing_caps: vec![CapabilityTag::VariableStep], + notes: vec!["Escape delta is not constant"], + confidence: Low, +} +``` + +## Parity Verification + +### Dev-Only Observation + +In `src/mir/builder/control_flow/joinir/routing.rs`: + +1. **Router makes decision** using existing Pattern 1-4 logic +2. **Canonicalizer analyzes** and detects Pattern P5b +3. **Parity checker compares**: + - Router decision (Pattern 1-4) + - Canonicalizer decision (Pattern P5b) +4. 
**If mismatch**: + - Dev mode: Log with reason + - Strict mode: Fail-Fast with error + +### Expected Outcomes + +**Case A: Router picks Pattern 1, Canonicalizer picks P5b** +- Router: "Simple bounded loop" +- Canonicalizer: "Escape sequence pattern detected" +- **Resolution**: Canonicalizer is more specific → router will eventually delegate + +**Case B: Router fails, Canonicalizer succeeds** +- Router: "No pattern matched" (Fail-Fast) +- Canonicalizer: "Pattern P5b matched" +- **Resolution**: P5b is new capability → expected until router updated + +**Case C: Both agree P5b** +- Router: Pattern P5b +- Canonicalizer: Pattern P5b +- **Result**: ✅ Parity green + +## Test Cases + +### Minimal Case (test_pattern5b_escape_minimal.hako) + +**Input**: String with one escape sequence +**Carrier**: Single position variable, single accumulator +**Deltas**: normal=1, escape=2 +**Output**: Processed string (escape removed) + +### Extended Cases (Phase 91 Step 2+) + +1. **test_pattern5b_escape_json**: JSON string with multiple escapes +2. **test_pattern5b_escape_custom**: Custom escape character +3. **test_pattern5b_escape_newline**: Escape newline handling +4. **test_pattern5b_escape_fail_multiple**: Multiple escapes (should Fail-Fast) +5. **test_pattern5b_escape_fail_variable**: Variable delta (should Fail-Fast) + +## Lowering Strategy (Future Phase 92) + +### Philosophy: Keep Return Simple + +Pattern P5b lowering should: +1. **Reuse Pattern 1-2 lowering** for normal case +2. **Extend for conditional increment**: + - PHI for carrier value after escape check + - Separate paths for escape vs normal +3. **Close within Pattern5b** (no cross-boundary complexity) + +### Rough Outline + +``` +Entry: LoopPrefix + ↓ +Condition: i < n + ↓ +[BRANCH] + ├→ EscapeBlock + │ ├→ i = i + escape_delta + │ └→ ch = substring(i) + │ + └→ NormalBlock + ├→ (ch already set) + └→ noop + + (PHI: i from both branches) + ↓ +ProcessBlock: out = out + ch + ↓ +UpdateBlock: i = i + 1 + ↓ +Condition check... 
+``` + +## Future Extensions + +### Pattern P5c: Multi-Character Escapes + +``` +if ch == "\\" { + i = i + 2 // Skip \x + if i < n { + local second = s.substring(i, i+1) + // Handle \n, \t, \x, etc. + } +} +``` + +**Complexity**: Requires escape sequence table (not generic) + +### Pattern P5d: Nested Escape Contexts + +``` +// Regex with escaped /, inside JSON string with escaped " +loop(i < n) { + if ch == "\"" { ... } // String boundary + if ch == "\\" { + if in_regex { + i = i + 2 // Regex escape + } else { + i = i + 1 // String escape + } + } +} +``` + +**Complexity**: State-dependent behavior (future work) + +## References + +- **JoinIR Architecture**: `joinir-architecture-overview.md` +- **Loop Canonicalizer**: `loop-canonicalizer.md` +- **CapabilityTag Enum**: `src/mir/loop_canonicalizer/capability_guard.rs` +- **Test Fixture**: `tools/selfhost/test_pattern5b_escape_minimal.hako` +- **Phase 91 Plan**: `phases/phase-91/README.md` + +--- + +## Summary + +**Pattern P5b** enables JoinIR recognition of escape-sequence-aware string parsing loops by: + +1. **Extending Canonicalizer** to detect conditional increments +2. **Adding exit-line optimization** for escape branching +3. **Preserving ExitContract** consistency with P1-P4 patterns +4. 
**Enabling parity verification** in strict mode
+
+**Status**: Design complete, implementation ready for Phase 91 Step 2
diff --git a/docs/development/current/main/phases/phase-91/README.md b/docs/development/current/main/phases/phase-91/README.md
new file mode 100644
index 00000000..a994728c
--- /dev/null
+++ b/docs/development/current/main/phases/phase-91/README.md
@@ -0,0 +1,461 @@
+# Phase 91: JoinIR Coverage Expansion (Selfhost depth-2)
+
+## Status
+- 🔍 **Analysis Complete**: Loop inventory across selfhost codebase
+- 📋 **Planning**: Pattern P5b (Escape Handling) candidate selected
+- ⏳ **Implementation**: Deferred to dedicated session
+
+## Executive Summary
+
+**Current JoinIR Readiness**: 53% (16/30 loops in selfhost code)
+
+| Category | Count | Status | Effort |
+|----------|-------|--------|--------|
+| Pattern 1 (simple bounded) | 16 | ✅ Ready | None |
+| Pattern 2 (with break) | 1 | ⚠️ Partial | Low |
+| Pattern P5b (escape handling) | ~3 | ❌ Blocked | Medium |
+| Pattern P5 (guard-bounded) | ~2 | ❌ Blocked | High |
+| Pattern P6 (nested loops) | ~8 | ❌ Blocked | Very High |
+
+---
+
+## Analysis Results
+
+### Loop Inventory by Component
+
+#### File: `apps/selfhost-vm/boxes/json_cur.hako` (3 loops)
+- Lines 9-14: ✅ Pattern 1 (simple bounded)
+- Lines 23-32: ✅ Pattern 1 variant with break
+- Lines 42-57: ✅ Pattern 1 with guard-less loop(true)
+
+#### File: `apps/selfhost-vm/json_loader.hako` (3 loops)
+- Lines 16-22: ✅ Pattern 1 (simple bounded)
+- **Lines 30-37**: ❌ Pattern P5b **CANDIDATE** (escape sequence handling)
+- Lines 43-48: ✅ Pattern 1 (simple bounded)
+
+#### File: `apps/selfhost-vm/boxes/mini_vm_core.hako` (9 loops)
+- Lines 208-231: ⚠️ Pattern 1 variant (with continue)
+- Lines 239-253: ✅ Pattern 1 (with accumulator)
+- Lines 388-400, 493-505: ✅ Pattern 1 (6 bounded search loops)
+- **Lines 541-745**: ❌ Pattern P5 **PRIME CANDIDATE** (guard-bounded, 204-line collect_prints)
+
+#### File: `apps/selfhost-vm/boxes/seam_inspector.hako` (13 
loops) +- Lines 10-26: ✅ Pattern 1 +- Lines 38-42, 116-120, 123-127: ✅ Pattern 1 variants +- **Lines 76-107**: ❌ Pattern P6 (deeply nested, 7+ levels) +- Remaining: Mix of ⚠️ Pattern 1 variants with nested loops + +#### File: `apps/selfhost-vm/boxes/mini_vm_prints.hako` (1 loop) +- Line 118+: ❌ Pattern P5 (guard-bounded multi-case) + +--- + +## Candidate Selection: Priority Order + +### 🥇 **IMMEDIATE CANDIDATE: Pattern P5b (Escape Handling)** + +**Target**: `json_loader.hako:30` - `read_digits_from()` + +**Scope**: 8-line loop + +**Current Structure**: +```nyash +loop(i < n) { + local ch = s.substring(i, i+1) + if ch == "\"" { break } + if ch == "\\" { + i = i + 1 + ch = s.substring(i, i+1) + } + out = out + ch + i = i + 1 +} +``` + +**Pattern Classification**: +- **Header**: `loop(i < n)` +- **Escape Check**: `if ch == "\\" { i = i + 2 instead of i + 1 }` +- **Body**: Append character +- **Carriers**: `i` (position), `out` (buffer) +- **Challenge**: Variable increment (sometimes +1, sometimes +2) + +**Why This Candidate**: +- ✅ **Small scope** (8 lines) - good for initial implementation +- ✅ **High reuse potential** - same pattern appears in multiple parser locations +- ✅ **Moderate complexity** - requires conditional step extension (not fully generic) +- ✅ **Clear benefit** - would unlock escape sequence handling across all string parsers +- ❌ **Scope limitation** - conditional increment not yet in Canonicalizer + +**Effort Estimate**: 2-3 days +- Canonicalizer extension: 4-6 hours +- Pattern recognizer: 2-3 hours +- Lowering implementation: 4-6 hours +- Testing + verification: 2-3 hours + +--- + +### 🥈 **SECOND CANDIDATE: Pattern P5 (Guard-Bounded)** + +**Target**: `mini_vm_core.hako:541` - `collect_prints()` + +**Scope**: 204-line loop (monolithic) + +**Current Structure**: +```nyash +loop(true) { + guard = guard + 1 + if guard > 200 { break } + + local p = index_of_from(json, k_print, pos) + if p < 0 { break } + + // 5 different cases based on JSON type + if 
is_binary_op { ... pos = ... out.push(...) } + if is_compare { ... pos = ... out.push(...) } + if is_literal { ... pos = ... out.push(...) } + if is_function_call { ... pos = ... out.push(...) } + if is_nested { ... pos = ... out.push(...) } + + pos = obj_end + 1 +} +``` + +**Pattern Classification**: +- **Header**: `loop(true)` (unconditional) +- **Guard**: `guard > LIMIT` with increment each iteration +- **Body**: Multiple case-based mutations +- **Carriers**: `pos`, `printed`, `guard`, `out` (ArrayBox) +- **Exit conditions**: Guard exhaustion OR search failure + +**Why This Candidate**: +- ✅ **Monolithic optimization opportunity** - 204 lines of complex control flow +- ✅ **Real-world JSON parsing** - demonstrates practical JoinIR application +- ✅ **High performance impact** - guard counter could be eliminated via SSA +- ❌ **High complexity** - needs new Pattern5 guard-handling variant +- ❌ **Large scope** - would benefit from split into micro-loops first + +**Effort Estimate**: 1-2 weeks +- Design: 2-3 days (pattern definition, contract) +- Implementation: 5-7 days +- Testing + verification: 2-3 days + +**Alternative Strategy**: Could split into 5 micro-loops per case: +```nyash +// Instead of one 204-line loop with 5 cases: +// Create 5 functions, each handling one case: +loop_binary_op() { ... } +loop_compare() { ... } +loop_literal() { ... } +loop_function_call() { ... } +loop_nested() { ... } + +// Then main loop dispatches: +loop(true) { + guard = guard + 1 + if guard > limit { break } + if type == BINARY_OP { loop_binary_op(...) } + ... +} +``` + +This would make each sub-loop Pattern 1-compatible immediately. 
+ +--- + +### 🥉 **THIRD CANDIDATE: Pattern P6 (Nested Loops)** + +**Target**: `seam_inspector.hako:76` - `_scan_boxes()` + +**Scope**: Multi-level nested (7+ nesting levels) + +**Current Structure**: 37-line outer loop containing 6 nested loops + +**Pattern Classification**: +- **Nesting levels**: 7+ +- **Carriers**: Multiple per level (`i`, `j`, `k`, `name`, `pos`, etc.) +- **Exit conditions**: Varied per level (bounds, break, continue) +- **Scope handoff**: Complex state passing between levels + +**Why This Candidate**: +- ✅ **Demonstrates nested composition** - needed for production parsers +- ✅ **Realistic code** - actual box/function scanner +- ❌ **Highest complexity** - requires recursive JoinIR composition +- ❌ **Long-term project** - 2-3 weeks minimum + +**Effort Estimate**: 2-3 weeks +- Design recursive composition: 3-5 days +- Per-level implementation: 7-10 days +- Testing nested composition: 3-5 days + +--- + +## Recommended Immediate Action + +### Phase 91 (This Session): Pattern P5b Planning + +**Objective**: Design Pattern P5b (escape sequence handling) with minimal implementation + +**Steps**: +1. ✅ **Analysis complete** (done by Explore agent) +2. **Design P5b pattern** (canonicalizer contract) +3. **Create minimal fixture** (`test_pattern5b_escape_minimal.hako`) +4. **Extend Canonicalizer** to recognize escape patterns +5. **Plan lowering** (defer implementation to next session) +6. 
**Document P5b architecture** in loop-canonicalizer.md + +**Acceptance Criteria**: +- ✅ Pattern P5b design document complete +- ✅ Minimal escape test fixture created +- ✅ Canonicalizer recognizes escape patterns (dev-only observation) +- ✅ Parity check passes (strict mode) +- ✅ No lowering changes yet (recognition-only phase) + +**Deliverables**: +- `docs/development/current/main/phases/phase-91/README.md` - This document +- `docs/development/current/main/design/pattern-p5b-escape-design.md` - Pattern design (new) +- `tools/selfhost/test_pattern5b_escape_minimal.hako` - Test fixture (new) +- Updated `docs/development/current/main/design/loop-canonicalizer.md` - Capability tags extended + +--- + +## Design: Pattern P5b (Escape Sequence Handling) + +### Motivation + +String parsing commonly requires escape sequence handling: +- Double quotes: `"text with \" escaped quote"` +- Backslashes: `"path\\with\\backslashes"` +- Newlines: `"text with \n newline"` + +Current loops handle this with conditional increment: +```rust +if ch == "\\" { + i = i + 1 // Skip escape character itself + ch = next_char +} +i = i + 1 // Always advance +``` + +This variable-step pattern is **not JoinIR-compatible** because: +- Loop increment is conditional (sometimes +1, sometimes +2) +- Canonicalizer expects constant-delta carriers +- Lowering expects uniform update rules + +### Solution: Pattern P5b Definition + +#### Header Requirement +``` +loop(i < n) // Bounded loop on string length +``` + +#### Escape Check Requirement +``` +if ch == "\\" { + i = i + delta_skip // Skip character (typically +1, +2, or variable) + // Optional: consume escape character + ch = s.substring(i, i+1) +} +``` + +#### After-Escape Requirement +``` +// Standard character processing +out = out + ch +i = i + delta_normal // Standard increment (typically +1) +``` + +#### Skeleton Structure +``` +LoopSkeleton { + steps: [ + HeaderCond(i < n), + Body(escape_check_stmts), + Body(process_char_stmts), + Update(i = i + 
normal_delta, maybe(i = i + skip_delta))
+    ]
+}
+```
+
+#### Carrier Configuration
+- **Primary Carrier**: Loop variable (`i`)
+  - `delta_normal`: +1 (standard case)
+  - `delta_escape`: +1 or +2 (skip escape)
+- **Secondary Carrier**: Accumulator (`out`)
+  - Pattern: `out = out + value`
+
+#### ExitContract
+```
+ExitContract {
+    has_break: true,      // Break on quote detection
+    has_continue: false,
+    has_return: false,
+    carriers: vec![
+        CarrierInfo { name: "i", deltas: [+1, +2] },
+        CarrierInfo { name: "out", pattern: Append }
+    ]
+}
+```
+
+#### Routing Decision
+```
+RoutingDecision {
+    chosen: Pattern5bEscape,
+    structure_notes: ["escape_handling", "variable_step"],
+    missing_caps: []  // All required capabilities present
+}
+```
+
+### Recognition Algorithm
+
+#### AST Inspection Steps
+
+1. **Find escape check**:
+   - Pattern: `if ch == "\\" { ... }`
+   - Extract: Escape character constant
+   - Extract: Increment inside if block
+
+2. **Extract skip delta**:
+   - Pattern: `i = i + <const>`
+   - Calculate: `skip_delta = <const>`
+
+3. **Find normal increment**:
+   - Pattern: `i = i + <const>` (after escape if block)
+   - Calculate: `normal_delta = <const>`
+
+4. **Validate break condition**:
+   - Pattern: `if <ch> == "<boundary_char>" { break }`
+   - Required for string boundary detection
+
+5. 
**Build LoopSkeleton**:
+   - Carriers: `[{name: "i", deltas: [normal, skip]}, ...]`
+   - ExitContract: `has_break=true`
+   - RoutingDecision: `chosen=Pattern5bEscape`
+
+### Implementation Plan
+
+#### Canonicalizer Extension (`src/mir/loop_canonicalizer/canonicalizer.rs`)
+
+Add `detect_escape_pattern()` recognition:
+```rust
+fn detect_escape_pattern(
+    loop_expr: &Expr,
+    carriers: &[String]
+) -> Option<EscapePatternInfo> {
+    // Step 1-5 as above
+    // Return: { escape_char, skip_delta, normal_delta, carrier_name }
+}
+```
+
+Priority: Call before `detect_skip_whitespace_pattern()` (more specific pattern first)
+
+#### Pattern Recognizer Wrapper (`src/mir/loop_canonicalizer/pattern_recognizer.rs`)
+
+Expose `detect_escape_pattern()`:
+```rust
+pub fn try_extract_escape_pattern(
+    loop_expr: &Expr
+) -> Option<(String, i64, i64)> {  // (carrier, normal_delta, skip_delta)
+    // Delegate to canonicalizer detection
+}
+```
+
+#### Test Fixture (`tools/selfhost/test_pattern5b_escape_minimal.hako`)
+
+Minimal reproducible example:
+```nyash
+// Minimal escape sequence parser
+local s = "hello\\\" world"
+local n = s.length()
+local i = 0
+local out = ""
+
+loop(i < n) {
+    local ch = s.substring(i, i+1)
+
+    if ch == "\"" {
+        break
+    }
+
+    if ch == "\\" {
+        i = i + 1  // Skip escape character
+        if i < n {
+            ch = s.substring(i, i+1)
+        }
+    }
+
+    out = out + ch
+    i = i + 1
+}
+
+print(out)  // Should print: hello" world
+```
+
+---
+
+## Files to Modify (Phase 91)
+
+### New Files
+1. `docs/development/current/main/phases/phase-91/README.md` ← You are here
+2. `docs/development/current/main/design/pattern-p5b-escape-design.md` (new - detailed design)
+3. `tools/selfhost/test_pattern5b_escape_minimal.hako` (new - test fixture)
+
+### Modified Files
+1. `docs/development/current/main/design/loop-canonicalizer.md`
+   - Add Pattern P5b to capability matrix
+   - Add recognition algorithm
+   - Add routing decision table
+
+2. 
(Phase 91 Step 2+) `src/mir/loop_canonicalizer/canonicalizer.rs`
+   - Add `detect_escape_pattern()` function
+   - Extend `canonicalize_loop_expr()` to check for escape patterns
+
+3. (Phase 91 Step 2+) `src/mir/loop_canonicalizer/pattern_recognizer.rs`
+   - Add `try_extract_escape_pattern()` wrapper
+
+---
+
+## Next Steps (Future Sessions)
+
+### Phase 91 Step 2: Implementation
+- Implement `detect_escape_pattern()` in Canonicalizer
+- Add unit tests for escape pattern recognition
+- Verify strict parity with router
+
+### Phase 92: Lowering
+- Implement Pattern5bEscape lowerer
+- Handle variable-step carrier updates
+- E2E test with `test_pattern5b_escape_minimal.hako`
+
+### Phase 93: Pattern P5 (Guard-Bounded)
+- Implement Pattern5 for `mini_vm_core.hako:541`
+- Consider micro-loop refactoring alternative
+- Document guard-counter optimization strategy
+
+### Phase 94+: Pattern P6 (Nested Loops)
+- Recursive JoinIR composition for `seam_inspector.hako:76`
+- Cross-level scope/carrier handoff
+
+---
+
+## SSOT References
+
+- **JoinIR Architecture**: `docs/development/current/main/joinir-architecture-overview.md`
+- **Loop Canonicalizer Design**: `docs/development/current/main/design/loop-canonicalizer.md`
+- **Capability Tags**: `src/mir/loop_canonicalizer/capability_guard.rs`
+
+---
+
+## Summary
+
+**Phase 91** establishes the next frontier of JoinIR coverage: **Pattern P5b (Escape Handling)**.
+
+This pattern unlocks:
+- ✅ All string escape parsing loops
+- ✅ Foundation for Pattern P5 (guard-bounded)
+- ✅ Preparation for Pattern P6 (nested loops)
+
+**Current readiness**: 53% (16/30 loops)
+**After Phase 91**: Expected to reach ~60% (18/30 loops)
+**Long-term target**: >90% coverage with P5, P5b, P6 patterns
+
+All acceptance criteria defined. Implementation ready for next session.
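
Note on the AST inspection steps in this README: they can be illustrated with a toy recognizer. What follows is only a minimal sketch under stated assumptions. The `Stmt` enum, its variants, and this body of `detect_escape_pattern` are hypothetical stand-ins for the project's real AST and Canonicalizer code, and `escape_delta` is assumed to mean the net advance on the escape path (in-block step plus normal step), which matches the `escape_delta: 2` note in the design document.

```rust
// Hypothetical toy statement forms; the real Canonicalizer inspects the project's own AST.
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    ReadChar,                                   // ch = s.substring(i, i+1)
    AddConst { var: String, delta: i64 },       // var = var + <const>
    AddVar { var: String, rhs: String },        // var = var + <variable> (non-constant step)
    Append { acc: String },                     // acc = acc + ch
    BreakIfEq { lit: char },                    // if ch == "<lit>" { break }
    IfEq { lit: char, then_body: Vec<Stmt> },   // if ch == "<lit>" { ... }
}

#[derive(Debug, PartialEq)]
struct EscapePatternInfo {
    carrier: String,
    escape_char: char,
    break_char: char,
    normal_delta: i64,
    escape_delta: i64, // assumed: net advance on the escape path
    accumulator: String,
}

fn detect_escape_pattern(body: &[Stmt], carrier: &str) -> Option<EscapePatternInfo> {
    let mut break_char = None;
    let mut escape: Option<(char, i64)> = None;
    let mut normal_delta = None;
    let mut accumulator = None;

    for stmt in body {
        match stmt {
            Stmt::ReadChar => {}
            // Exactly one boundary break is allowed.
            Stmt::BreakIfEq { lit } => {
                if break_char.replace(*lit).is_some() { return None; }
            }
            // Exactly one escape check, containing one constant carrier step.
            Stmt::IfEq { lit, then_body } => {
                let mut step = None;
                for inner in then_body {
                    match inner {
                        Stmt::AddConst { var, delta } if var.as_str() == carrier && *delta > 0 => {
                            if step.replace(*delta).is_some() { return None; }
                        }
                        Stmt::ReadChar => {}
                        _ => return None, // e.g. a variable step: reject
                    }
                }
                if escape.replace((*lit, step?)).is_some() { return None; }
            }
            // The unconditional increment must come after the escape check.
            Stmt::AddConst { var, delta } if var.as_str() == carrier && *delta > 0 => {
                if escape.is_none() || normal_delta.replace(*delta).is_some() { return None; }
            }
            Stmt::Append { acc } => {
                if accumulator.replace(acc.clone()).is_some() { return None; }
            }
            _ => return None,
        }
    }

    let (escape_char, in_block) = escape?;
    let normal = normal_delta?;
    Some(EscapePatternInfo {
        carrier: carrier.to_string(),
        escape_char,
        break_char: break_char?,
        normal_delta: normal,
        escape_delta: in_block + normal,
        accumulator: accumulator?,
    })
}
```

Applied to a statement list shaped like the fixture's loop body, this sketch is expected to report `normal_delta = 1` and `escape_delta = 2`; a body whose escape step is `i = i + k` returns `None`, mirroring the variable-step Fail-Fast case.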
diff --git a/tools/selfhost/test_pattern5b_escape_minimal.hako b/tools/selfhost/test_pattern5b_escape_minimal.hako new file mode 100644 index 00000000..7644e780 --- /dev/null +++ b/tools/selfhost/test_pattern5b_escape_minimal.hako @@ -0,0 +1,59 @@ +// Minimal Pattern P5b (Escape Handling) Test Fixture +// Purpose: Verify JoinIR Canonicalizer recognition of escape sequence patterns +// +// Pattern: loop(i < n) with conditional increment on escape character +// Carriers: i (position), out (accumulator) +// Exit: break on quote character +// +// This pattern is common in string parsing: +// - JSON string readers +// - CSV parsers +// - Template engines +// - Escape sequence handlers + +static box Main { + console: ConsoleBox + + main() { + me.console = new ConsoleBox() + + // Test data: string with escape sequence + // Original: "hello\" world" + // After parsing: hello" world + local s = "hello\\\" world" + local n = s.length() + local i = 0 + local out = "" + + // Pattern P5b: Escape sequence handling loop + // - Header: loop(i < n) + // - Escape check: if ch == "\\" { i = i + 1 } + // - Process: out = out + ch + // - Update: i = i + 1 + loop(i < n) { + local ch = s.substring(i, i + 1) + + // Break on quote (string boundary) + if ch == "\"" { + break + } + + // Handle escape sequence: skip the escape char itself + if ch == "\\" { + i = i + 1 // Skip escape character (i increments by +2 total with final i++) + if i < n { + ch = s.substring(i, i + 1) + } + } + + // Accumulate processed character + out = out + ch + i = i + 1 // Standard increment + } + + // Expected output: hello" world (escape removed) + me.console.log(out) + + return "OK" + } +}
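
The fixture's expected output can be cross-checked outside the Hako toolchain. The sketch below is a Rust transliteration of the loop, assuming that indexing a character vector is equivalent to `s.substring(i, i + 1)`; `scan_escaped` and `ScanResult` are names introduced here for illustration only and do not exist in the repository.

```rust
/// Accumulated text plus the final carrier value of the scan.
#[derive(Debug, PartialEq)]
struct ScanResult {
    out: String,
    end: usize,
}

/// Mirrors the fixture loop: break on `boundary`, step over `escape`, append everything else.
fn scan_escaped(s: &str, escape: char, boundary: char) -> ScanResult {
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    let mut i = 0;
    let mut out = String::new();

    while i < n {
        let mut ch = chars[i];

        // String boundary: the single break point.
        if ch == boundary {
            break;
        }

        // Conditional step: skip the escape character and read the escaped one.
        if ch == escape {
            i += 1;
            if i < n {
                ch = chars[i];
            }
        }

        out.push(ch); // accumulator carrier
        i += 1;       // unconditional step (normal_delta = 1)
    }

    ScanResult { out, end: i }
}
```

With the fixture's input (`hello\" world`, 13 characters), `scan_escaped(input, '\\', '"')` yields `hello" world` and a final index of 13: the escape iteration advances `i` by 2 and every other iteration by 1.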