nyash-codex 7e03cb6425 docs(phase-91): Update Phase 91 README with Step 2 completion status

### Updates

#### 1. Status Section
- Updated Status to reflect all completed steps (1, 2-A/B/D, 2-E)
- Documented parity verification success

#### 2. Completion Status Section (NEW)
- Added dedicated section for Phase 91 Step 2 completion
- Listed all deliverables with checkmarks
- Documented test results: 1062/1062 PASS

#### 3. Next Steps
- Clarified Phase 92 lowering requirements
- Updated timeline expectations

#### 4. Test Fixture Fix
- Fixed syntax error in test_pattern5b_escape_minimal.hako
  (field declarations: changed `console: ConsoleBox` to `console ConsoleBox`)

### Context

Phase 91 Step 2 is now fully complete:
- ✅ AST recognizer (detect_escape_skip_pattern)
- ✅ Canonicalizer integration (UpdateKind::ConditionalStep)
- ✅ Unit tests (test_escape_skip_pattern_recognition)
- ✅ Parity verification (strict mode green)

Ready for Phase 92 lowering implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2025-12-16 14:55:40 +09:00

README.md


Phase 91: JoinIR Coverage Expansion (Selfhost depth-2)

Status

✅ Analysis Complete: Loop inventory across selfhost codebase (Step 1)
✅ Planning Complete: Pattern P5b (Escape Handling) candidate selected (Step 1)
✅ Implementation Complete: AST recognizer, canonicalizer integration, unit tests (Step 2-A/B/D)
✅ Parity Verified: Strict mode green in test_pattern5b_escape_minimal.hako (Step 2-E)
📝 Documentation: Updated Phase 91 README with completion status

Executive Summary

Current JoinIR Readiness: 53% (16/30 loops in selfhost code)

| Category | Count | Status | Effort |
|---|---|---|---|
| Pattern 1 (simple bounded) | 16 | ✅ Ready | None |
| Pattern 2 (with break) | 1 | ⚠️ Partial | Low |
| Pattern P5b (escape handling) | ~3 | ❌ Blocked | Medium |
| Pattern P5 (guard-bounded) | ~2 | ❌ Blocked | High |
| Pattern P6 (nested loops) | ~8 | ❌ Blocked | Very High |

Analysis Results

Loop Inventory by Component

File: `apps/selfhost-vm/boxes/json_cur.hako` (3 loops)

Lines 9-14: ✅ Pattern 1 (simple bounded)
Lines 23-32: ✅ Pattern 1 variant with break
Lines 42-57: ✅ Pattern 1 with guard-less loop(true)

File: `apps/selfhost-vm/json_loader.hako` (3 loops)

Lines 16-22: ✅ Pattern 1 (simple bounded)
Lines 30-37: ❌ Pattern P5b CANDIDATE (escape sequence handling)
Lines 43-48: ✅ Pattern 1 (simple bounded)

File: `apps/selfhost-vm/boxes/mini_vm_core.hako` (9 loops)

Lines 208-231: ⚠️ Pattern 1 variant (with continue)
Lines 239-253: ✅ Pattern 1 (with accumulator)
Lines 388-400, 493-505: ✅ Pattern 1 (6 bounded search loops)
Lines 541-745: ❌ Pattern P5 PRIME CANDIDATE (guard-bounded, 204-line collect_prints)

File: `apps/selfhost-vm/boxes/seam_inspector.hako` (13 loops)

Lines 10-26: ✅ Pattern 1
Lines 38-42, 116-120, 123-127: ✅ Pattern 1 variants
Lines 76-107: ❌ Pattern P6 (deeply nested, 7+ levels)
Remaining: Mix of ⚠️ Pattern 1 variants with nested loops

File: `apps/selfhost-vm/boxes/mini_vm_prints.hako` (1 loop)

Line 118+: ❌ Pattern P5 (guard-bounded multi-case)

Candidate Selection: Priority Order

🥇 IMMEDIATE CANDIDATE: Pattern P5b (Escape Handling)

Target: json_loader.hako:30 - read_digits_from()

Scope: 8-line loop

Current Structure:

loop(i < n) {
  local ch = s.substring(i, i+1)
  if ch == "\"" { break }
  if ch == "\\" {
    i = i + 1
    ch = s.substring(i, i+1)
  }
  out = out + ch
  i = i + 1
}

Pattern Classification:

Header: loop(i < n)
Escape Check: if ch == "\\" { i = i + 2 instead of i + 1 }
Body: Append character
Carriers: i (position), out (buffer)
Challenge: Variable increment (sometimes +1, sometimes +2)
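
For reference, the loop's behaviour can be modelled in a few lines of Rust. This is only a sketch: `read_until_quote` is a hypothetical helper name, not something in the codebase, and stopping on a trailing backslash is an assumption made here to keep the index in bounds.

```rust
/// Reference model of the escape-skipping loop (hypothetical helper).
/// Returns the two carriers: the accumulated text `out` and the final index `i`.
fn read_until_quote(s: &str) -> (String, usize) {
    // Work on chars so indexing mirrors the one-character substring(i, i+1) calls.
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    let mut i = 0;
    let mut out = String::new();

    while i < n {
        let mut ch = chars[i];
        if ch == '"' {
            break; // an unescaped quote ends the string
        }
        if ch == '\\' {
            i += 1; // skip the backslash itself
            if i >= n {
                break; // assumption: a trailing backslash simply ends the scan
            }
            ch = chars[i];
        }
        out.push(ch);
        i += 1; // normal step; the escape path advances by 2 in total
    }
    (out, i)
}
```

Calling it on the raw text `hello\" world` yields the text `hello" world` and a final index of 13, which makes the two step sizes (+1 and +2) visible in a single run.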

Why This Candidate:

✅ Small scope (8 lines) - good for initial implementation
✅ High reuse potential - same pattern appears in multiple parser locations
✅ Moderate complexity - requires conditional step extension (not fully generic)
✅ Clear benefit - would unlock escape sequence handling across all string parsers
❌ Scope limitation - conditional increment not yet in Canonicalizer

Effort Estimate: 2-3 days

Canonicalizer extension: 4-6 hours
Pattern recognizer: 2-3 hours
Lowering implementation: 4-6 hours
Testing + verification: 2-3 hours

🥈 SECOND CANDIDATE: Pattern P5 (Guard-Bounded)

Target: mini_vm_core.hako:541 - collect_prints()

Scope: 204-line loop (monolithic)

Current Structure:

loop(true) {
  guard = guard + 1
  if guard > 200 { break }

  local p = index_of_from(json, k_print, pos)
  if p < 0 { break }

  // 5 different cases based on JSON type
  if is_binary_op { ... pos = ... out.push(...) }
  if is_compare { ... pos = ... out.push(...) }
  if is_literal { ... pos = ... out.push(...) }
  if is_function_call { ... pos = ... out.push(...) }
  if is_nested { ... pos = ... out.push(...) }

  pos = obj_end + 1
}

Pattern Classification:

Header: loop(true) (unconditional)
Guard: guard > LIMIT with increment each iteration
Body: Multiple case-based mutations
Carriers: pos, printed, guard, out (ArrayBox)
Exit conditions: Guard exhaustion OR search failure
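
The shape of a guard-bounded loop, reduced to its two exit conditions, might look like the Rust sketch below. `collect_offsets` and its parameters are hypothetical names chosen for illustration; the real `collect_prints()` body is far larger and is not reproduced here.

```rust
/// Collect the start offset of every occurrence of `key` in `text`,
/// giving up after `limit` iterations (the guard).
fn collect_offsets(text: &str, key: &str, limit: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut pos = 0; // carrier: current search position
    let mut guard = 0; // carrier: iteration counter

    loop {
        guard += 1;
        if guard > limit {
            break; // exit 1: guard exhausted
        }
        let p = match text[pos..].find(key) {
            Some(rel) => pos + rel,
            None => break, // exit 2: search failure
        };
        out.push(p);
        pos = p + key.len(); // resume just past the match
    }
    out
}
```

With a generous limit the loop ends on search failure; with a small limit it ends on the guard, which is the behaviour a Pattern P5 recognizer would need to preserve.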

Why This Candidate:

✅ Monolithic optimization opportunity - 204 lines of complex control flow
✅ Real-world JSON parsing - demonstrates practical JoinIR application
✅ High performance impact - guard counter could be eliminated via SSA
❌ High complexity - needs new Pattern5 guard-handling variant
❌ Large scope - would benefit from split into micro-loops first

Effort Estimate: 1-2 weeks

Design: 2-3 days (pattern definition, contract)
Implementation: 5-7 days
Testing + verification: 2-3 days

Alternative Strategy: Could split into 5 micro-loops per case:

// Instead of one 204-line loop with 5 cases:
// Create 5 functions, each handling one case:
loop_binary_op() { ... }
loop_compare() { ... }
loop_literal() { ... }
loop_function_call() { ... }
loop_nested() { ... }

// Then main loop dispatches:
loop(true) {
  guard = guard + 1
  if guard > limit { break }
  if type == BINARY_OP { loop_binary_op(...) }
  ...
}

This would make each sub-loop Pattern 1-compatible immediately.

🥉 THIRD CANDIDATE: Pattern P6 (Nested Loops)

Target: seam_inspector.hako:76 - _scan_boxes()

Scope: Multi-level nested (7+ nesting levels)

Current Structure: 37-line outer loop containing 6 nested loops

Pattern Classification:

Nesting levels: 7+
Carriers: Multiple per level (i, j, k, name, pos, etc.)
Exit conditions: Varied per level (bounds, break, continue)
Scope handoff: Complex state passing between levels

Why This Candidate:

✅ Demonstrates nested composition - needed for production parsers
✅ Realistic code - actual box/function scanner
❌ Highest complexity - requires recursive JoinIR composition
❌ Long-term project - 2-3 weeks minimum

Effort Estimate: 2-3 weeks

Design recursive composition: 3-5 days
Per-level implementation: 7-10 days
Testing nested composition: 3-5 days

Recommended Immediate Action

Phase 91 (This Session): Pattern P5b Planning

Objective: Design Pattern P5b (escape sequence handling) with minimal implementation

Steps:

1. ✅ Analysis complete (done by Explore agent)
2. Design P5b pattern (canonicalizer contract)
3. Create minimal fixture (test_pattern5b_escape_minimal.hako)
4. Extend Canonicalizer to recognize escape patterns
5. Plan lowering (defer implementation to next session)
6. Document P5b architecture in loop-canonicalizer.md

Acceptance Criteria:

✅ Pattern P5b design document complete
✅ Minimal escape test fixture created
✅ Canonicalizer recognizes escape patterns (dev-only observation)
✅ Parity check passes (strict mode)
✅ No lowering changes yet (recognition-only phase)

Deliverables:

docs/development/current/main/phases/phase-91/README.md - This document
docs/development/current/main/design/pattern-p5b-escape-design.md - Pattern design (new)
tools/selfhost/test_pattern5b_escape_minimal.hako - Test fixture (new)
Updated docs/development/current/main/design/loop-canonicalizer.md - Capability tags extended

Design: Pattern P5b (Escape Sequence Handling)

Motivation

String parsing commonly requires escape sequence handling:

Double quotes: "text with \" escaped quote"
Backslashes: "path\\with\\backslashes"
Newlines: "text with \n newline"

Current loops handle this with conditional increment:

if ch == "\\" {
    i = i + 1  // Skip escape character itself
    ch = next_char
}
i = i + 1  // Always advance

This variable-step pattern is not JoinIR-compatible because:

Loop increment is conditional (sometimes +1, sometimes +2)
Canonicalizer expects constant-delta carriers
Lowering expects uniform update rules

Solution: Pattern P5b Definition

Header Requirement

loop(i < n)  // Bounded loop on string length

Escape Check Requirement

if ch == "\\" {
    i = i + delta_skip  // Skip character (typically +1, +2, or variable)
    // Optional: consume escape character
    ch = s.substring(i, i+1)
}

After-Escape Requirement

// Standard character processing
out = out + ch
i = i + delta_normal  // Standard increment (typically +1)

Skeleton Structure

LoopSkeleton {
  steps: [
    HeaderCond(i < n),
    Body(escape_check_stmts),
    Body(process_char_stmts),
    Update(i = i + normal_delta, maybe(i = i + skip_delta))
  ]
}

Carrier Configuration

Primary Carrier: Loop variable (i)
- delta_normal: +1 (standard case)
- delta_escape: +1 or +2 (skip escape)
Secondary Carrier: Accumulator (out)
- Pattern: out = out + value

ExitContract

ExitContract {
  has_break: true,   // Break on quote detection
  has_continue: false,
  has_return: false,
  carriers: vec![
    CarrierInfo { name: "i", deltas: [+1, +2] },
    CarrierInfo { name: "out", pattern: Append }
  ]
}

Routing Decision

RoutingDecision {
  chosen: Pattern5bEscape,
  structure_notes: ["escape_handling", "variable_step"],
  missing_caps: []  // All required capabilities present
}

Recognition Algorithm

AST Inspection Steps

1. Find escape check:
   - Pattern: if ch == "\\" { ... }
   - Extract: Escape character constant
   - Extract: Increment inside if block
2. Extract skip delta:
   - Pattern: i = i + <const>
   - Calculate: skip_delta = <const>
3. Find normal increment:
   - Pattern: i = i + <const> (after escape if block)
   - Calculate: normal_delta = <const>
4. Validate break condition:
   - Pattern: if <char> == "<quote>" { break }
   - Required for string boundary detection
5. Build LoopSkeleton:
   - Carriers: [{name: "i", deltas: [normal, skip]}, ...]
   - ExitContract: has_break=true
   - RoutingDecision: chosen=Pattern5bEscape
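
The inspection steps above can be sketched over a toy AST. Everything in this block is an assumption made for illustration: the `Stmt` enum is a deliberately tiny stand-in for the real AST, and `EscapePatternInfo` only carries the four fields named in this document.

```rust
/// Toy statement type (hypothetical; the real AST is richer).
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// `if <var> == "<lit>" { break }`
    IfEqBreak { var: String, lit: String },
    /// `if <var> == "<lit>" { <body> }`
    IfEq { var: String, lit: String, body: Vec<Stmt> },
    /// `<var> = <var> + <delta>`
    AddConst { var: String, delta: i64 },
    /// Any statement the recognizer does not care about.
    Other,
}

#[derive(Debug, Clone, PartialEq)]
struct EscapePatternInfo {
    escape_char: String,
    skip_delta: i64,   // increment found inside the escape `if`
    normal_delta: i64, // increment found after the escape `if`
    carrier_name: String,
}

/// Find the first `carrier = carrier + <const>` in a statement list.
fn find_increment(stmts: &[Stmt], carrier: &str) -> Option<i64> {
    stmts.iter().find_map(|s| match s {
        Stmt::AddConst { var, delta } if var.as_str() == carrier => Some(*delta),
        _ => None,
    })
}

fn detect_escape_pattern(body: &[Stmt], carrier: &str) -> Option<EscapePatternInfo> {
    // Step 4: a boundary break such as `if ch == "\"" { break }` is required.
    if !body.iter().any(|s| matches!(s, Stmt::IfEqBreak { .. })) {
        return None;
    }
    // Step 1: locate the escape check and remember where it sits.
    let (idx, escape_char, inner) = body.iter().enumerate().find_map(|(idx, s)| match s {
        Stmt::IfEq { lit, body: inner, .. } => Some((idx, lit.clone(), inner)),
        _ => None,
    })?;
    // Step 2: the increment inside the `if` block gives skip_delta.
    let skip_delta = find_increment(inner, carrier)?;
    // Step 3: the increment after the `if` block gives normal_delta.
    let normal_delta = find_increment(&body[idx + 1..], carrier)?;
    // Step 5 (building the skeleton) is left out; only the extracted facts are returned.
    Some(EscapePatternInfo {
        escape_char,
        skip_delta,
        normal_delta,
        carrier_name: carrier.to_string(),
    })
}
```

Note that `skip_delta` here is the constant written inside the `if` block; the total advance on the escape path is `skip_delta + normal_delta`, because the normal increment still runs afterwards.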

Implementation Plan

Canonicalizer Extension (`src/mir/loop_canonicalizer/canonicalizer.rs`)

Add detect_escape_pattern() recognition:

fn detect_escape_pattern(
    loop_expr: &Expr,
    carriers: &[String]
) -> Option<EscapePatternInfo> {
    // Step 1-5 as above
    // Return: { escape_char, skip_delta, normal_delta, carrier_name }
}

Priority: Call before detect_skip_whitespace_pattern() (more specific pattern first)

Pattern Recognizer Wrapper (`src/mir/loop_canonicalizer/pattern_recognizer.rs`)

Expose detect_escape_pattern():

pub fn try_extract_escape_pattern(
    loop_expr: &Expr
) -> Option<(String, i64, i64)> {  // (carrier, normal_delta, skip_delta)
    // Delegate to canonicalizer detection
}

Test Fixture (`tools/selfhost/test_pattern5b_escape_minimal.hako`)

Minimal reproducible example:

// Minimal escape sequence parser
local s = "hello\\\" world"  // raw text: hello\" world
local n = s.length()
local i = 0
local out = ""

loop(i < n) {
  local ch = s.substring(i, i+1)

  if ch == "\"" {
    break
  }

  if ch == "\\" {
    i = i + 1  // Skip escape character
    if i < n {
      ch = s.substring(i, i+1)
    }
  }

  out = out + ch
  i = i + 1
}

print(out)  // Should print: hello" world

Files to Modify (Phase 91)

New Files

docs/development/current/main/phases/phase-91/README.md ← You are here
docs/development/current/main/design/pattern-p5b-escape-design.md (new - detailed design)
tools/selfhost/test_pattern5b_escape_minimal.hako (new - test fixture)

Modified Files

docs/development/current/main/design/loop-canonicalizer.md
- Add Pattern P5b to capability matrix
- Add recognition algorithm
- Add routing decision table
(Phase 91 Step 2+) src/mir/loop_canonicalizer/canonicalizer.rs
- Add detect_escape_pattern() function
- Extend canonicalize_loop_expr() to check for escape patterns
(Phase 91 Step 2+) src/mir/loop_canonicalizer/pattern_recognizer.rs
- Add try_extract_escape_pattern() wrapper

Completion Status

Phase 91 Step 2: Implementation ✅ COMPLETE

✅ Extended UpdateKind enum with ConditionalStep variant
✅ Implemented detect_escape_skip_pattern() in AST recognizer
✅ Updated canonicalizer to recognize P5b patterns
✅ Added comprehensive unit test: test_escape_skip_pattern_recognition
✅ Verified parity in strict mode (canonical vs actual decision routing)

Key Deliverables:

Updated skeleton_types.rs: ConditionalStep support
Updated ast_feature_extractor.rs: P5b pattern detection
Updated canonicalizer.rs: P5b routing to Pattern2Break + unit test
Updated test_pattern5b_escape_minimal.hako: Fixed syntax errors

Test Results: 1062/1062 tests PASS (including new P5b unit test)

Next Steps (Future Sessions)

Phase 92: Lowering

Implement Pattern5bEscape lowerer in JoinIR
Handle ConditionalStep carrier updates with PHI composition
E2E test with test_pattern5b_escape_minimal.hako
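
As a rough illustration of what merging the two step values could mean, the Rust sketch below computes the read position and the delta as plain values selected by the escape condition, then applies a single update per iteration. This is an assumption about the general idea only, not a description of the JoinIR lowerer; `run_with_merged_step` is a hypothetical name, and stopping on a trailing backslash is a choice made here to stay in bounds.

```rust
/// Escape loop with the conditional step folded into one merged update.
/// Returns the carriers (out, i) at loop exit.
fn run_with_merged_step(s: &str) -> (String, usize) {
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    let mut i = 0usize;
    let mut out = String::new();

    while i < n && chars[i] != '"' {
        let is_escape = chars[i] == '\\';
        // Both candidate values are selected by the same condition,
        // much like a PHI node choosing between two incoming edges.
        let read_at = if is_escape { i + 1 } else { i };
        let delta = if is_escape { 2 } else { 1 };
        if read_at >= n {
            break; // assumption: a trailing backslash ends the scan
        }
        out.push(chars[read_at]);
        i += delta; // the only write to the carrier in the loop body
    }
    (out, i)
}
```

Because `i` is written exactly once per iteration, the update has a uniform shape even though its value depends on the escape condition.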

Phase 93: Pattern P5 (Guard-Bounded)

Implement Pattern5 for mini_vm_core.hako:541
Consider micro-loop refactoring alternative
Document guard-counter optimization strategy

Phase 94+: Pattern P6 (Nested Loops)

Recursive JoinIR composition for seam_inspector.hako:76
Cross-level scope/carrier handoff

SSOT References

JoinIR Architecture: docs/development/current/main/joinir-architecture-overview.md
Loop Canonicalizer Design: docs/development/current/main/design/loop-canonicalizer.md
Capability Tags: src/mir/loop_canonicalizer/capability_guard.rs

Summary

Phase 91 establishes the next frontier of JoinIR coverage: Pattern P5b (Escape Handling).

This pattern unlocks:

✅ All string escape parsing loops
✅ Foundation for Pattern P5 (guard-bounded)
✅ Preparation for Pattern P6 (nested loops)

Current readiness: 53% (16/30 loops)
After Phase 91: Expected to reach ~60% (18/30 loops)
Long-term target: >90% coverage with P5, P5b, P6 patterns

All acceptance criteria defined. Implementation ready for next session.
