feat(phase-91): JoinIR Selfhost depth-2 advancement - Pattern P5b design & planning

## Overview
Analyzed 34 loops across selfhost codebase to identify JoinIR coverage gaps.
Current readiness: 47% (16/34 loops). Next frontier: Pattern P5b (Escape Handling).

## Current Status
- Phase 91 planning document: Complete
  - Loop inventory across 6 key files
  - Priority ranking: P5b (escape) > P5 (guard) > P6 (nested)
  - Effort estimates and ROI analysis

- Pattern P5b Design: Complete
  - Problem statement (variable-step carriers)
  - Pattern definition with Skeleton layout
  - Recognition algorithm (8-step detection)
  - Capability taxonomy (P5b-specific guards)
  - Lowering strategy (Phase 92 preview)

- Test fixture: Created
  - Minimal escape sequence parser
  - JSON string with backslash escape

- Loop Canonicalizer extended
  - Capability table updated with P5b entries
  - Fail-Fast criteria documented
  - Implementation checklist added

## Key Findings

### Loop Readiness Matrix
| Category | Count | JoinIR Status |
|----------|-------|--------------|
| Pattern 1 (simple bounded) | 16 |  Ready |
| Pattern 2 (with break) | 1 | ⚠️ Partial |
| **Pattern P5b (escape seq)** | ~3 |  NEW |
| Pattern P5 (guard-bounded) | ~2 |  Deferred |
| Pattern P6 (nested loops) | ~8 |  Deferred |

### Top Candidates
1. **P5b**: json_loader.hako:30 (8 lines, high reuse)
   - Effort: 2-3 days (recognition)
   - Impact: Unlocks all escape parsers

2. **P5**: mini_vm_core.hako:541 (204 lines, monolithic)
   - Effort: 1-2 weeks
   - Impact: Major JSON optimization

3. **P6**: seam_inspector.hako:76 (7+ nesting)
   - Effort: 2-3 weeks
   - Impact: Demonstrates nested composition

## Phase 91 Strategy
**Recognition-only phase** (no lowering in Phase 91):
- Step 1: Design & planning 
- Step 2: Canonicalizer implementation (detect_escape_pattern)
- Step 3: Unit tests + parity verification
- Step 4: Lowering deferred to Phase 92

## Files Added
- docs/development/current/main/phases/phase-91/README.md - Full analysis & planning
- docs/development/current/main/design/pattern-p5b-escape-design.md - Technical design
- tools/selfhost/test_pattern5b_escape_minimal.hako - Test fixture

## Files Modified
- docs/development/current/main/design/loop-canonicalizer.md
  - Capability table extended with P5b entries
  - Pattern P5b full section added
  - Implementation checklist updated

## Acceptance Criteria (Phase 91 Step 1)
-  Loop inventory complete (34 loops across 6 files)
-  Pattern P5b design document ready
-  Test fixture created
-  Capability taxonomy extended
-  Implementation deferred (Step 2+)

## References
- JoinIR Architecture: joinir-architecture-overview.md
- Phase 91 Plan: phases/phase-91/README.md
- P5b Design: design/pattern-p5b-escape-design.md

Next: Implement detect_escape_pattern() recognition in Phase 91 Step 2

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
nyash-codex
2025-12-16 14:22:36 +09:00
parent 26077186aa
commit 9e3b258046
4 changed files with 1157 additions and 11 deletions


@ -295,16 +295,26 @@ pub enum CarrierRole {
Even when a Skeleton can be generated, it cannot necessarily be lowered/merged. The following capabilities decide that:
| Capability | Description | Reason tag when unmet |
|--------------------------|------------------------------------------|-------------------------------------|
| `ConstStepIncrement` | Carrier update is a constant step (i = i + const) | `CAP_MISSING_CONST_STEP` |
| `SingleBreakPoint` | `break` occurs at a single site only | `CAP_MISSING_SINGLE_BREAK` |
| `SingleContinuePoint` | `continue` occurs at a single site only | `CAP_MISSING_SINGLE_CONTINUE` |
| `NoSideEffectInHeader` | Loop condition has no side effects | `CAP_MISSING_PURE_HEADER` |
| `OuterLocalCondition` | Condition variables are defined in the outer scope | `CAP_MISSING_OUTER_LOCAL_COND` |
| `ExitBindingsComplete` | Values passed to the boundary are exactly those required | `CAP_MISSING_EXIT_BINDINGS` |
| `CarrierPromotion` | A LoopBodyLocal can be promoted | `CAP_MISSING_CARRIER_PROMOTION` |
| `BreakValueConsistent` | Break value type is consistent | `CAP_MISSING_BREAK_VALUE_TYPE` |
| Capability | Description | Reason tag when unmet | Patterns |
|--------------------------|------------------------------------------|-------------------------------------|------------|
| `ConstStepIncrement` | Carrier update is a constant step (i = i + const) | `CAP_MISSING_CONST_STEP` | P1-P5 |
| `SingleBreakPoint` | `break` occurs at a single site only | `CAP_MISSING_SINGLE_BREAK` | P1-P5 |
| `SingleContinuePoint` | `continue` occurs at a single site only | `CAP_MISSING_SINGLE_CONTINUE` | P4 |
| `NoSideEffectInHeader` | Loop condition has no side effects | `CAP_MISSING_PURE_HEADER` | P1-P5 |
| `OuterLocalCondition` | Condition variables are defined in the outer scope | `CAP_MISSING_OUTER_LOCAL_COND` | P1-P5 |
| `ExitBindingsComplete` | Values passed to the boundary are exactly those required | `CAP_MISSING_EXIT_BINDINGS` | P1-P5 |
| `CarrierPromotion` | A LoopBodyLocal can be promoted | `CAP_MISSING_CARRIER_PROMOTION` | P2-P3 |
| `BreakValueConsistent` | Break value type is consistent | `CAP_MISSING_BREAK_VALUE_TYPE` | P2-P5 |
| `EscapeSequencePattern` | Escape sequence support (P5b-specific) | `CAP_MISSING_ESCAPE_PATTERN` | **P5b** |
**New P5b-related capabilities**:
| Capability | Description | Required condition |
|--------------------------|------------------------------------------|---------------------------------------|
| `ConstEscapeDelta` | escape_delta is constant | `if ch == "\\" { i = i + const }` |
| `ConstNormalDelta` | normal_delta is constant | `i = i + const` (after escape block) |
| `SingleEscapeCheck` | Escape check occurs at a single site only | No multiple escape handlers |
| `ClearBoundaryCondition` | String terminator detection is explicit | `if ch == boundary { break }` |
### Vocabulary stability
@ -365,7 +375,7 @@ pub struct RoutingDecision {
---
## First target loop: skip_whitespace (acceptance criteria)
## Target loop 1: skip_whitespace (acceptance criteria)
### Target file
@ -410,6 +420,120 @@ loop(p < len) {
---
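## Target loop 2: Pattern P5b - Escape Sequence Handling (new in Phase 91)
### Purpose
Extend the set of JoinIR target loops to cover escape-sequence-aware loops. This is a common pattern in the string handling of JSON/CSV parsers.
### Target file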
`tools/selfhost/test_pattern5b_escape_minimal.hako`
```hako
loop(i < n) {
    local ch = s.substring(i, i+1)
    if ch == "\"" { break }          // String boundary
    if ch == "\\" {
        i = i + 1                    // Skip escape character (conditional +2 total)
        ch = s.substring(i, i+1)     // Read escaped character
    }
    out = out + ch                   // Process character
    i = i + 1                        // Standard increment
}
```
### Characteristics of Pattern P5b
| Property | Description |
|-----|------|
| **Header** | `loop(i < n)` - Bounded loop on string length |
| **Escape Check** | `if ch == escape_char { i = i + escape_delta }` |
| **Normal Increment** | `i = i + 1` (always +1) |
| **Accumulator** | `out = out + char` - String append pattern |
| **Boundary** | `if ch == boundary { break }` - String terminator |
| **Carriers** | Position (`i`), Accumulator (`out`) |
| **Deltas** | normal_delta=1, escape_delta=2 (or variable) |
### Required capabilities (P5b extension)
- `ConstStepIncrement` (normal: i = i + 1)
- `SingleBreakPoint` (boundary check only)
- `NoSideEffectInHeader` (i < n pure)
- `OuterLocalCondition` (i and n are defined in the outer scope)
- **`ConstEscapeDelta`** (escape: i = i + 2, etc.) - **P5b-specific**
- **`SingleEscapeCheck`** (one escape pattern only) - **P5b-specific**
- **`ClearBoundaryCondition`** (explicit boundary detection) - **P5b-specific**
### Fail-Fast criteria (cases P5b does not support)
Fail-Fast if any of the following applies:
1. **Multiple escape checks**: `if ch == "\\" ... if ch2 == "'" ...`
   - Reason: `CAP_MISSING_SINGLE_ESCAPE_CHECK`
2. **Variable step**: `i = i + var` (not a constant)
   - Reason: `CAP_MISSING_CONST_ESCAPE_DELTA`
3. **Near-unconditional loop**: `loop(true)` without clear boundary
   - Reason: `CAP_MISSING_CLEAR_BOUNDARY_CONDITION`
4. **Multiple break points**: String boundary + escape processing exit
   - Reason: `CAP_MISSING_SINGLE_BREAK`
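
The four Fail-Fast criteria can be read as a small mapping from observations to reason tags. The following is a minimal sketch under stated assumptions: the `P5bObservations` struct and the `fail_fast_reasons` function are hypothetical names introduced for illustration and are not part of the Canonicalizer; only the reason tag strings come from the list above.

```rust
/// Hypothetical summary of what a recognizer observed in one loop.
/// Field names are illustrative and not taken from the real Canonicalizer.
#[derive(Debug, Clone, Copy)]
pub struct P5bObservations {
    pub escape_check_count: usize,
    pub escape_delta_is_const: bool,
    pub has_clear_boundary: bool,
    pub break_count: usize,
}

/// Returns the reason tag of every violated criterion, in list order.
/// An empty vector means that no Fail-Fast criterion applies.
pub fn fail_fast_reasons(obs: &P5bObservations) -> Vec<&'static str> {
    let mut reasons = Vec::new();
    if obs.escape_check_count > 1 {
        reasons.push("CAP_MISSING_SINGLE_ESCAPE_CHECK");
    }
    if !obs.escape_delta_is_const {
        reasons.push("CAP_MISSING_CONST_ESCAPE_DELTA");
    }
    if !obs.has_clear_boundary {
        reasons.push("CAP_MISSING_CLEAR_BOUNDARY_CONDITION");
    }
    if obs.break_count > 1 {
        reasons.push("CAP_MISSING_SINGLE_BREAK");
    }
    reasons
}
```

Collecting all reasons rather than stopping at the first is a design choice of this sketch; it would let a rejected loop report every missing capability at once.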
### Recognition algorithm (high level)
```
1. Extract header carrier: take i from loop(i < n)
2. Find escape check block: if ch == "\\" { ... }
3. Extract escape delta: i = i + const
4. Find accumulator pattern: out = out + ch
5. Extract normal increment: i = i + 1 (outside the escape if block)
6. Find boundary check: if ch == "\"" { break }
7. Build LoopSkeleton
   - carriers: [i (dual deltas), out (append)]
   - exits: has_break=true
8. RoutingDecision: Pattern5bEscape
```
### Implementation plan (Phase 91)
**Step 1** (this document):
- [ ] Complete the Pattern P5b design document
- [ ] Create the test fixture
- [ ] Add capability definitions
**Step 2** (Phase 91 main implementation):
- [ ] `detect_escape_pattern()` in Canonicalizer
- [ ] Unit tests (P5b recognition)
- [ ] Parity verification (strict mode)
- [ ] Documentation update
**Step 3** (Phase 92 lowering):
- [ ] Implement the Pattern5bEscape lowerer
- [ ] E2E test with escape fixture
- [ ] VM/LLVM parity verification
### Acceptance criteria (Phase 91)
1. The Canonicalizer recognizes the escape pattern
2. RoutingDecision.chosen == Pattern5bEscape
3. missing_caps == [] (all capabilities satisfied)
4. Strict parity green (`HAKO_JOINIR_STRICT=1`)
5. No regressions in existing tests
6. Lowering is deferred to Step 3 (Phase 91 covers recognition only)
### References
- **P5b detailed design**: `docs/development/current/main/design/pattern-p5b-escape-design.md`
- **Test fixture**: `tools/selfhost/test_pattern5b_escape_minimal.hako`
- **Phase 91 plan**: `docs/development/current/main/phases/phase-91/README.md`
---
## Addition/change checklist
- [ ] Reduce the loop shape being added to a minimal fixture (pin the reproduction)


@ -0,0 +1,502 @@
# Pattern P5b: Escape Sequence Handling
## Overview
**Pattern P5b** extends JoinIR loop recognition to handle **variable-step carriers** in escape sequence parsing.
This pattern is essential for:
- JSON string parsers
- CSV readers
- Template engine string processing
- Any escape-aware text processing loop
## Problem Statement
### Current Limitation
Standard Pattern 1-4 carriers always update by constant deltas:
```
Carrier i: i = i + 1 (always +1)
```
Escape sequences require conditional increments:
```
if escape_char { i = i + 2 } // Skip both escape char and escaped char
else { i = i + 1 } // Normal increment
```
**Why this matters**:
- Common in string parsing (JSON, CSV, config files)
- Appears in ~3 selfhost loops
- Currently forces Fail-Fast (pattern not supported)
- Could benefit from JoinIR exit-line optimization
### Real-World Example: JSON String Reader
```hako
loop(i < n) {
    local ch = s.substring(i, i+1)
    if ch == "\"" { break }          // End of string
    if ch == "\\" {
        i = i + 1                    // <-- CONDITIONAL: skip escape char
        ch = s.substring(i, i+1)     // Read escaped character
    }
    out = out + ch                   // Process character
    i = i + 1                        // <-- UNCONDITIONAL: advance
}
```
Loop progression:
- Normal case: `i` advances by 1
- Escape case: `i` advances by 2 (skip inside if + final increment)
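
To make the two step sizes concrete, here is a minimal Rust sketch that mirrors the hako loop above. It is an illustration, not the fixture itself: the function name `read_escaped` is hypothetical, byte indexing stands in for `s.substring(i, i+1)` (so single-byte characters are assumed), and the guard for a trailing backslash is an addition of this sketch.

```rust
/// Mirrors the loop above over single-byte input and returns the
/// accumulated string together with the final position `i`.
pub fn read_escaped(s: &str) -> (String, usize) {
    let bytes = s.as_bytes();
    let n = bytes.len();
    let mut i = 0;
    let mut out = String::new();
    while i < n {
        let mut ch = bytes[i];
        if ch == b'"' {
            break; // string boundary
        }
        if ch == b'\\' {
            i += 1; // conditional step: skip the escape character
            if i >= n {
                break; // guard added for this sketch: trailing backslash
            }
            ch = bytes[i]; // read the escaped character
        }
        out.push(ch as char); // process character
        i += 1; // unconditional step
    }
    (out, i)
}
```

For the input `a\"b"rest`, the position visits 0, 1, 3 and stops at 4: the escape iteration moves `i` from 1 to 3 (a net advance of 2), while the other iterations advance by 1.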
## Pattern Definition
### Canonical Form
```
LoopSkeleton {
    steps: [
        HeaderCond(carrier < limit),
        Body(escape_check_block),
        Body(process_block),
        Update(carrier_increments)
    ]
}
```
### Header Contract
**Requirement**: Bounded loop on single integer carrier
```
loop(i < n) ✅ Valid P5b header
loop(i < 100) ✅ Valid P5b header
loop(i <= n) ✅ Valid P5b header (edge case)
loop(true) ❌ Not P5b (unbounded)
loop(i < n && j < m) ❌ Not P5b (multi-carrier condition)
```
**Carrier**: Must be loop variable used in condition
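
The header contract can be sketched as a classifier over a tiny condition AST. Everything here is an assumption made for illustration: the `Cond`, `Operand`, and `CmpOp` types and the `header_carrier` function are hypothetical and do not reflect the real MIR or AST types.

```rust
/// Hypothetical, simplified loop-condition AST for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Cond {
    /// `lhs < rhs` or `lhs <= rhs`
    Compare { lhs: Operand, op: CmpOp, rhs: Operand },
    /// `a && b`
    And(Box<Cond>, Box<Cond>),
    /// A literal `true` or `false`
    Bool(bool),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Operand {
    Var(String),
    Int(i64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CmpOp {
    Lt,
    Le,
}

/// Returns the carrier name when the header is a single bounded comparison
/// whose left-hand side is a variable. Compound and constant conditions
/// such as `i < n && j < m` or `true` yield `None`.
pub fn header_carrier(cond: &Cond) -> Option<&str> {
    match cond {
        Cond::Compare { lhs: Operand::Var(name), .. } => Some(name.as_str()),
        _ => None,
    }
}
```

In this sketch both `Lt` and `Le` are accepted, matching the valid headers listed above.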
### Escape Check Contract
**Requirement**: Conditional increment based on character test
#### Escape Detection Block
```
if ch == escape_char {
    carrier = carrier + escape_delta
    // Optional: read next character
    ch = s.substring(carrier, carrier+1)
}
```
**Escape character**: Typically `\\` (backslash), but can vary
- JSON: `\\`
- CSV: `"` (context-dependent)
- Custom: Any single-character escape
**Escape delta**: How far to skip
- `+1`: Skip just the escape marker
- `+2`: Skip escape marker + escaped char (common case)
- `+N`: Other values possible
#### Detection Algorithm
1. **Find if statement in loop body**
2. **Check condition**: `ch == literal_char`
3. **Extract escape character**: The literal constant
4. **Find assignment in if block**: `carrier = carrier + <const>`
5. **Calculate escape_delta**: The constant value
6. **Validate**: Escape delta > 0
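
The six detection steps can be sketched over a toy statement AST. This is a hedged illustration only: the `Stmt` enum and `detect_escape` function are hypothetical and much simpler than whatever the real Canonicalizer operates on.

```rust
/// Hypothetical, simplified statement AST for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// `if var == "lit" { body }`
    IfCharEq { var: String, lit: char, body: Vec<Stmt> },
    /// `target = target + delta` with a constant delta
    AddConst { target: String, delta: i64 },
    /// `target = target + <non-constant>`
    AddVar { target: String },
    /// `break`
    Break,
    /// Anything else
    Other,
}

/// Scans the top-level `if` statements of a loop body. For the first one
/// whose block updates `carrier` by a positive constant, returns
/// `(escape_char, escape_delta)`, where the delta is the in-block constant.
pub fn detect_escape(body: &[Stmt], carrier: &str) -> Option<(char, i64)> {
    body.iter().find_map(|stmt| {
        // Steps 1-3: an `if` comparing against a literal character.
        let Stmt::IfCharEq { lit, body: then_body, .. } = stmt else {
            return None;
        };
        // Steps 4-6: a constant, positive carrier update inside the block.
        then_body.iter().find_map(|inner| match inner {
            Stmt::AddConst { target, delta } if target.as_str() == carrier && *delta > 0 => {
                Some((*lit, *delta))
            }
            _ => None,
        })
    })
}
```

A boundary check such as `if ch == "\"" { break }` is skipped naturally, because its block contains no carrier update.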
### Process Block Contract
**Requirement**: Character accumulation with optional processing
```
out = out + ch ✅ Simple append
result = result + ch ✅ Any accumulator
s = s + value ❌ Not append pattern
```
**Accumulator carrier**: String-like box supporting append
### Update Block Contract
**Requirement**: Unconditional carrier increment after escape check
```
carrier = carrier + normal_delta
```
**Normal delta**: Almost always `+1`
- Defines "normal" loop progress
- Only incremented once per iteration (not in escape block)
#### Detection Algorithm
1. **Find assignment after escape if block**
2. **Pattern**: `carrier = carrier + <const>`
3. **Must be unconditional** (outside any if block)
4. **Extract normal_delta**: The constant
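
A minimal sketch of the normal-increment detection follows. The `TopStmt` enum and `normal_delta` function are hypothetical, and the sketch simplifies "after the escape if block" to "after the last top-level `if`", which holds for the example loop but is an assumption in general.

```rust
/// Hypothetical top-level statement kinds, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum TopStmt {
    /// Any `if` block (its contents are conditional and therefore skipped)
    If,
    /// `target = target + delta` at the top level of the loop body
    AddConst { target: String, delta: i64 },
    /// Anything else
    Other,
}

/// Returns normal_delta: the constant of the first unconditional
/// `carrier = carrier + <const>` that appears after the last `if` block.
pub fn normal_delta(body: &[TopStmt], carrier: &str) -> Option<i64> {
    // Index just past the last top-level `if`; 0 when there is none.
    let start = body
        .iter()
        .rposition(|s| matches!(s, TopStmt::If))
        .map_or(0, |idx| idx + 1);
    body[start..].iter().find_map(|s| match s {
        TopStmt::AddConst { target, delta } if target.as_str() == carrier => Some(*delta),
        _ => None,
    })
}
```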
### Break Requirement
**Requirement**: Explicit break on string boundary
```
if ch == boundary_char { break }
```
**Boundary character**: Typically quote `"`
- JSON: `"`
- Custom strings: Any delimiter
**Position in loop**: Usually before escape check
### Exit Contract for P5b
```rust
ExitContract {
    has_break: true,          // Always for escape patterns
    has_continue: false,
    has_return: false,
    carriers: vec![
        CarrierInfo {
            name: "i",        // Loop variable
            deltas: [
                normal_delta, // e.g., 1
                escape_delta  // e.g., 2
            ]
        },
        CarrierInfo {
            name: "out",      // Accumulator
            pattern: Append
        }
    ]
}
```
## Capability Analysis
### Required Capabilities (CapabilityTag)
For Pattern P5b to be JoinIR-compatible, these must be present:
| Capability | Meaning | P5b Requirement | Status |
|------------|---------|-----------------|--------|
| `ConstStep` | Carrier updates are constants | ✅ Required | Both deltas constant |
| `SingleBreak` | Only one break point | ✅ Required | String boundary only |
| `PureHeader` | Condition has no side effects | ✅ Required | `i < n` is pure |
| `OuterLocalCond` | Condition doesn't reference locals | ⚠️ Soft req | Usually true |
| `ExitBindings` | Exit block is simple | ✅ Required | Break is unconditional |
### Missing Capabilities (Fail-Fast Reasons)
If any of these are detected, Pattern P5b is rejected:
| Capability | Why It Blocks P5b | Example |
|------------|-------------------|---------|
| `MultipleBreak` | Multiple exit points | `if x { break } if y { break }` |
| `MultipleCarriers` | Condition uses multiple vars | `loop(i < n && j < m)` |
| `VariableStep` | Deltas aren't constants | `i = i + adjustment` |
| `NestedEscape` | Escape check inside other if | `if outer { if ch == "\\" ... }` |
## Recognition Algorithm
### High-Level Steps
1. **Extract header carrier**: `i` from `loop(i < n)`
2. **Find escape check**: `if ch == "\\"`
3. **Find escape increment**: `i = i + 1` inside the if block (a net advance of 2 per escape iteration, counting the normal increment)
4. **Find process block**: `out = out + ch`
5. **Find normal increment**: `i = i + 1` after if
6. **Find break condition**: `if ch == "\"" { break }`
7. **Build ExitContract** with both deltas
8. **Build RoutingDecision**: Pattern5bEscape if all present
### Pseudo-Code
```rust
fn detect_escape_pattern(loop_expr: &Expr) -> Option<EscapePatternInfo> {
    // Step 1: Extract loop variable (the limit is not needed for recognition)
    let (carrier_name, _limit) = extract_header_carrier(loop_expr)?;
    // NOTE: `loop_body` below stands for the body of `loop_expr`.
    // Step 2: Find escape check statement
    let escape_stmts = find_escape_check_block(loop_body)?;
    // Step 3: Extract escape character
    let escape_char = extract_escape_literal(&escape_stmts)?;
    // Step 4: Extract escape delta
    let escape_delta = extract_escape_increment(&escape_stmts, &carrier_name)?;
    // Step 5: Find process statements (presence check only)
    let _process_stmts = find_character_accumulation(loop_body)?;
    // Step 6: Extract normal increment
    let normal_delta = extract_normal_increment(loop_body, &carrier_name)?;
    // Step 7: Find break condition
    let break_char = extract_break_literal(loop_body)?;
    // Build result
    Some(EscapePatternInfo {
        carrier_name,
        escape_char,
        normal_delta,
        escape_delta,
        break_char,
    })
}
```
### Implementation Location
**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`
**Function**: `detect_escape_pattern()` (new)
**Integration point**: `canonicalize_loop_expr()` main dispatch
**Priority**: Call before `detect_skip_whitespace_pattern()` (more specific)
## Skeleton Representation
### Standard Layout
```
LoopSkeleton {
    header: HeaderCond(Condition {
        operator: LessThan,
        left: Var("i"),
        right: Var("n")
    }),
    steps: [
        // Escape check block
        SkeletonStep::Body(vec![
            Expr::If {
                cond: Comparison("ch", Eq, Literal("\\")),
                then_body: [
                    Expr::Assign("i", Add, 1), // in-block escape increment (net escape_delta = 2)
                    Expr::Assign("ch", Substring("s", Var("i"), Add(Var("i"), 1))),
                ]
            }
        ]),
        // Character accumulation
        SkeletonStep::Body(vec![
            Expr::Assign("out", Append, Var("ch")),
        ]),
        // Normal increment
        SkeletonStep::Update(vec![
            Expr::Assign("i", Add, 1), // normal_delta
        ]),
    ],
    carriers: vec![
        CarrierSlot {
            name: "i",
            deltas: [1, 2], // [normal, escape]
            // ... other fields
        },
        CarrierSlot {
            name: "out",
            pattern: Append,
            // ... other fields
        }
    ],
    exit_contract: ExitContract {
        has_break: true,
        // ...
    }
}
```
## RoutingDecision Output
### For Valid P5b Pattern
```rust
RoutingDecision {
    chosen: Pattern5bEscape,
    missing_caps: vec![],
    notes: vec![
        "escape_char: \\",
        "normal_delta: 1",
        "escape_delta: 2",
        "break_char: \"",
        "accumulator: out",
    ],
    confidence: High,
}
```
### For Invalid/Unsupported Cases
```rust
// Multiple escapes detected
RoutingDecision {
    chosen: Unknown,
    missing_caps: vec![CapabilityTag::MultipleBreak],
    notes: vec!["Multiple escape checks found"],
    confidence: Low,
}
// Variable step (not constant)
RoutingDecision {
    chosen: Unknown,
    missing_caps: vec![CapabilityTag::VariableStep],
    notes: vec!["Escape delta is not constant"],
    confidence: Low,
}
```
## Parity Verification
### Dev-Only Observation
In `src/mir/builder/control_flow/joinir/routing.rs`:
1. **Router makes decision** using existing Pattern 1-4 logic
2. **Canonicalizer analyzes** and detects Pattern P5b
3. **Parity checker compares**:
- Router decision (Pattern 1-4)
- Canonicalizer decision (Pattern P5b)
4. **If mismatch**:
- Dev mode: Log with reason
- Strict mode: Fail-Fast with error
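
The comparison step can be sketched as a small function. This is a hedged illustration: `Pattern`, `Mode`, and `check_parity` are hypothetical names and not the real types in `routing.rs`. How strict mode is selected (for example via `HAKO_JOINIR_STRICT=1`) is outside the sketch, which takes the mode as a parameter.

```rust
/// Hypothetical pattern labels; the real router and Canonicalizer types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pattern {
    Pattern1,
    Pattern2,
    Pattern5bEscape,
    Unknown,
}

/// Hypothetical verification modes matching the two behaviors listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Dev,
    Strict,
}

/// Compares the two decisions. Agreement returns `Ok(None)`. On a mismatch,
/// dev mode returns `Ok(Some(log_message))` and strict mode returns `Err`.
pub fn check_parity(
    router: Pattern,
    canonicalizer: Pattern,
    mode: Mode,
) -> Result<Option<String>, String> {
    if router == canonicalizer {
        return Ok(None);
    }
    let msg = format!(
        "parity mismatch: router={:?}, canonicalizer={:?}",
        router, canonicalizer
    );
    match mode {
        Mode::Dev => Ok(Some(msg)),
        Mode::Strict => Err(msg),
    }
}
```

Returning the log message instead of printing it keeps the sketch free of side effects; a caller in dev mode would decide where to write it.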
### Expected Outcomes
**Case A: Router picks Pattern 1, Canonicalizer picks P5b**
- Router: "Simple bounded loop"
- Canonicalizer: "Escape sequence pattern detected"
- **Resolution**: Canonicalizer is more specific → router will eventually delegate
**Case B: Router fails, Canonicalizer succeeds**
- Router: "No pattern matched" (Fail-Fast)
- Canonicalizer: "Pattern P5b matched"
- **Resolution**: P5b is new capability → expected until router updated
**Case C: Both agree P5b**
- Router: Pattern P5b
- Canonicalizer: Pattern P5b
- **Result**: ✅ Parity green
## Test Cases
### Minimal Case (test_pattern5b_escape_minimal.hako)
**Input**: String with one escape sequence
**Carrier**: Single position variable, single accumulator
**Deltas**: normal=1, escape=2
**Output**: Processed string (escape removed)
### Extended Cases (Phase 91 Step 2+)
1. **test_pattern5b_escape_json**: JSON string with multiple escapes
2. **test_pattern5b_escape_custom**: Custom escape character
3. **test_pattern5b_escape_newline**: Escape newline handling
4. **test_pattern5b_escape_fail_multiple**: Multiple escapes (should Fail-Fast)
5. **test_pattern5b_escape_fail_variable**: Variable delta (should Fail-Fast)
## Lowering Strategy (Future Phase 92)
### Philosophy: Keep Return Simple
Pattern P5b lowering should:
1. **Reuse Pattern 1-2 lowering** for normal case
2. **Extend for conditional increment**:
- PHI for carrier value after escape check
- Separate paths for escape vs normal
3. **Close within Pattern5b** (no cross-boundary complexity)
### Rough Outline
```
Entry: LoopPrefix
Condition: i < n
[BRANCH]
├→ EscapeBlock
│ ├→ i = i + escape_delta
│ └→ ch = substring(i)
└→ NormalBlock
├→ (ch already set)
└→ noop
(PHI: i from both branches)
ProcessBlock: out = out + ch
UpdateBlock: i = i + 1
Condition check...
```
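
The outline can be approximated in Rust by writing one iteration as a pure step function in which the branch result plays the role of the PHI. This is only a sketch of the shape, not the planned lowering: `State`, `step`, and `run` are hypothetical names, input is assumed to be single-byte, and `escape_delta` here means the in-block increment (1 in the minimal fixture).

```rust
/// Loop state carried between iterations: position and accumulator.
#[derive(Debug, Clone, PartialEq)]
pub struct State {
    pub i: usize,
    pub out: String,
}

/// One iteration in the shape of the outline: branch, merge, process, update.
/// Returns `None` when the loop exits (bound reached, boundary found, or
/// input ending right after an escape character).
pub fn step(s: &[u8], st: State, escape_delta: usize) -> Option<State> {
    if st.i >= s.len() {
        return None; // Condition: i < n
    }
    let ch = s[st.i];
    if ch == b'"' {
        return None; // boundary break
    }
    // [BRANCH] + PHI: each arm yields the (i, ch) pair seen by ProcessBlock.
    let (i_phi, ch_phi) = if ch == b'\\' {
        let i_esc = st.i + escape_delta; // EscapeBlock
        (i_esc, *s.get(i_esc)?)
    } else {
        (st.i, ch) // NormalBlock: noop
    };
    let mut out = st.out;
    out.push(ch_phi as char); // ProcessBlock
    Some(State { i: i_phi + 1, out }) // UpdateBlock
}

/// Drives `step` to completion with the fixture's in-block increment of 1.
pub fn run(s: &str) -> State {
    let mut st = State { i: 0, out: String::new() };
    while let Some(next) = step(s.as_bytes(), st.clone(), 1) {
        st = next;
    }
    st
}
```

Keeping both carrier paths inside a single step function reflects the "close within Pattern5b" goal: nothing outside the step needs to know that two different advances exist.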
## Future Extensions
### Pattern P5c: Multi-Character Escapes
```
if ch == "\\" {
    i = i + 2                              // Skip \x
    if i < n {
        local second = s.substring(i, i+1)
        // Handle \n, \t, \x, etc.
    }
}
```
**Complexity**: Requires escape sequence table (not generic)
### Pattern P5d: Nested Escape Contexts
```
// Regex with escaped /, inside JSON string with escaped "
loop(i < n) {
    if ch == "\"" { ... }      // String boundary
    if ch == "\\" {
        if in_regex {
            i = i + 2          // Regex escape
        } else {
            i = i + 1          // String escape
        }
    }
}
```
**Complexity**: State-dependent behavior (future work)
## References
- **JoinIR Architecture**: `joinir-architecture-overview.md`
- **Loop Canonicalizer**: `loop-canonicalizer.md`
- **CapabilityTag Enum**: `src/mir/loop_canonicalizer/capability_guard.rs`
- **Test Fixture**: `tools/selfhost/test_pattern5b_escape_minimal.hako`
- **Phase 91 Plan**: `phases/phase-91/README.md`
---
## Summary
**Pattern P5b** enables JoinIR recognition of escape-sequence-aware string parsing loops by:
1. **Extending Canonicalizer** to detect conditional increments
2. **Adding exit-line optimization** for escape branching
3. **Preserving ExitContract** consistency with P1-P4 patterns
4. **Enabling parity verification** in strict mode
**Status**: Design complete, implementation ready for Phase 91 Step 2