hakorune/docs/development/current/main/phases/phase-91
nyash-codex 7e03cb6425 docs(phase-91): Update Phase 91 README with Step 2 completion status
### Updates

#### 1. Status Section
- Updated Status to reflect all completed steps (1, 2-A/B/D, 2-E)
- Documented parity verification success

#### 2. Completion Status Section (NEW)
- Added dedicated section for Phase 91 Step 2 completion
- Listed all deliverables with checkmarks
- Documented test results: 1062/1062 PASS

#### 3. Next Steps
- Clarified Phase 92 lowering requirements
- Updated timeline expectations

#### 4. Test Fixture Fix
- Fixed syntax error in test_pattern5b_escape_minimal.hako
  (field declarations: changed `console: ConsoleBox` to `console ConsoleBox`)

### Context

Phase 91 Step 2 is now fully complete:
- AST recognizer (detect_escape_skip_pattern)
- Canonicalizer integration (UpdateKind::ConditionalStep)
- Unit tests (test_escape_skip_pattern_recognition)
- Parity verification (strict mode green)

Ready for Phase 92 lowering implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-16 14:55:40 +09:00

Phase 91: JoinIR Coverage Expansion (Selfhost depth-2)

Status

  • Analysis Complete: Loop inventory across selfhost codebase (Step 1)
  • Planning Complete: Pattern P5b (Escape Handling) candidate selected (Step 1)
  • Implementation Complete: AST recognizer, canonicalizer integration, unit tests (Step 2-A/B/D)
  • Parity Verified: Strict mode green in test_pattern5b_escape_minimal.hako (Step 2-E)
  • 📝 Documentation: Updated Phase 91 README with completion status

Executive Summary

Current JoinIR Readiness: 53% (16/30 loops in selfhost code)

| Category | Count | Status | Effort |
|---|---|---|---|
| Pattern 1 (simple bounded) | 16 | Ready | None |
| Pattern 2 (with break) | 1 | ⚠️ Partial | Low |
| Pattern P5b (escape handling) | ~3 | Blocked | Medium |
| Pattern P5 (guard-bounded) | ~2 | Blocked | High |
| Pattern P6 (nested loops) | ~8 | Blocked | Very High |

Analysis Results

Loop Inventory by Component

File: apps/selfhost-vm/boxes/json_cur.hako (3 loops)

  • Lines 9-14: Pattern 1 (simple bounded)
  • Lines 23-32: Pattern 1 variant with break
  • Lines 42-57: Pattern 1 with guard-less loop(true)

File: apps/selfhost-vm/json_loader.hako (3 loops)

  • Lines 16-22: Pattern 1 (simple bounded)
  • Lines 30-37: Pattern P5b CANDIDATE (escape sequence handling)
  • Lines 43-48: Pattern 1 (simple bounded)

File: apps/selfhost-vm/boxes/mini_vm_core.hako (9 loops)

  • Lines 208-231: ⚠️ Pattern 1 variant (with continue)
  • Lines 239-253: Pattern 1 (with accumulator)
  • Lines 388-400, 493-505: Pattern 1 (6 bounded search loops)
  • Lines 541-745: Pattern P5 PRIME CANDIDATE (guard-bounded, 204-line collect_prints)

File: apps/selfhost-vm/boxes/seam_inspector.hako (13 loops)

  • Lines 10-26: Pattern 1
  • Lines 38-42, 116-120, 123-127: Pattern 1 variants
  • Lines 76-107: Pattern P6 (deeply nested, 7+ levels)
  • Remaining: Mix of ⚠️ Pattern 1 variants with nested loops

File: apps/selfhost-vm/boxes/mini_vm_prints.hako (1 loop)

  • Line 118+: Pattern P5 (guard-bounded multi-case)

Candidate Selection: Priority Order

🥇 IMMEDIATE CANDIDATE: Pattern P5b (Escape Handling)

Target: json_loader.hako:30 - read_digits_from()

Scope: 8-line loop

Current Structure:

loop(i < n) {
  local ch = s.substring(i, i+1)
  if ch == "\"" { break }
  if ch == "\\" {
    i = i + 1
    ch = s.substring(i, i+1)
  }
  out = out + ch
  i = i + 1
}

Pattern Classification:

  • Header: loop(i < n)
  • Escape Check: if ch == "\\" { ... } adds an extra i = i + 1, so an escaped character advances i by 2 in total instead of 1
  • Body: Append character
  • Carriers: i (position), out (buffer)
  • Challenge: Variable increment (sometimes +1, sometimes +2)
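
To make the variable step concrete, here is a minimal Rust transcription of the loop above. It is a sketch for illustration only: the function name `read_until_quote` is hypothetical, and the `i < n` guard inside the escape branch is borrowed from the test fixture later in this document rather than from json_loader.hako.

```rust
/// Rust transcription of the escape-handling loop (illustrative only).
/// Returns both carriers: the accumulated text `out` and the final index `i`.
fn read_until_quote(s: &str) -> (String, usize) {
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    let mut i = 0;
    let mut out = String::new();
    while i < n {
        let mut ch = chars[i];
        if ch == '"' {
            break; // an unescaped quote ends the string
        }
        if ch == '\\' {
            i += 1; // extra step: skip the backslash itself
            if i < n {
                ch = chars[i]; // take the escaped character verbatim
            }
        }
        out.push(ch);
        i += 1; // normal step, taken on every iteration
    }
    (out, i)
}
```

For the input `hello\" world` the index advances by 1 on ordinary characters and by 2 on the escaped quote, which is exactly the non-constant delta that the Canonicalizer has to model.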

Why This Candidate:

  • Small scope (8 lines) - good for initial implementation
  • High reuse potential - same pattern appears in multiple parser locations
  • Moderate complexity - requires conditional step extension (not fully generic)
  • Clear benefit - would unlock escape sequence handling across all string parsers
  • Scope limitation - conditional increment not yet in Canonicalizer

Effort Estimate: 2-3 days

  • Canonicalizer extension: 4-6 hours
  • Pattern recognizer: 2-3 hours
  • Lowering implementation: 4-6 hours
  • Testing + verification: 2-3 hours

🥈 SECOND CANDIDATE: Pattern P5 (Guard-Bounded)

Target: mini_vm_core.hako:541 - collect_prints()

Scope: 204-line loop (monolithic)

Current Structure:

loop(true) {
  guard = guard + 1
  if guard > 200 { break }

  local p = index_of_from(json, k_print, pos)
  if p < 0 { break }

  // 5 different cases based on JSON type
  if is_binary_op { ... pos = ... out.push(...) }
  if is_compare { ... pos = ... out.push(...) }
  if is_literal { ... pos = ... out.push(...) }
  if is_function_call { ... pos = ... out.push(...) }
  if is_nested { ... pos = ... out.push(...) }

  pos = obj_end + 1
}

Pattern Classification:

  • Header: loop(true) (unconditional)
  • Guard: guard > LIMIT with increment each iteration
  • Body: Multiple case-based mutations
  • Carriers: pos, printed, guard, out (ArrayBox)
  • Exit conditions: Guard exhaustion OR search failure

Why This Candidate:

  • Monolithic optimization opportunity - 204 lines of complex control flow
  • Real-world JSON parsing - demonstrates practical JoinIR application
  • High performance impact - guard counter could be eliminated via SSA
  • High complexity - needs new Pattern5 guard-handling variant
  • Large scope - would benefit from split into micro-loops first

Effort Estimate: 1-2 weeks

  • Design: 2-3 days (pattern definition, contract)
  • Implementation: 5-7 days
  • Testing + verification: 2-3 days

Alternative Strategy: Could split into 5 micro-loops per case:

// Instead of one 204-line loop with 5 cases:
// Create 5 functions, each handling one case:
loop_binary_op() { ... }
loop_compare() { ... }
loop_literal() { ... }
loop_function_call() { ... }
loop_nested() { ... }

// Then main loop dispatches:
loop(true) {
  guard = guard + 1
  if guard > limit { break }
  if type == BINARY_OP { loop_binary_op(...) }
  ...
}

This would make each sub-loop Pattern 1-compatible immediately.
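
The same split can be sketched in Rust. Everything here is hypothetical (the `Item` enum, the `emit_*` helpers, and the pre-tokenized input stand in for the real JSON scanning); the point is only that the dispatcher keeps the guard counter while each helper owns at most one simple bounded loop.

```rust
/// Pre-classified print arguments (hypothetical stand-in for the JSON cases).
enum Item<'a> {
    Literal(&'a str),
    Compare(i64, i64),
    BinaryOp(i64, i64),
}

/// Simple bounded loop: constant +1 step, no break.
/// Assumes `digits` contains only ASCII digits.
fn emit_literal(digits: &str) -> String {
    let bytes = digits.as_bytes();
    let mut i = 0;
    let mut value: u64 = 0;
    while i < bytes.len() {
        value = value * 10 + u64::from(bytes[i] - b'0');
        i += 1;
    }
    value.to_string()
}

fn emit_compare(a: i64, b: i64) -> String {
    if a < b { "1".to_string() } else { "0".to_string() }
}

fn emit_binary_op(a: i64, b: i64) -> String {
    (a + b).to_string()
}

/// Guard-bounded dispatcher: the only loop that still needs Pattern P5.
fn collect_prints(items: &[Item], limit: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut guard = 0;
    let mut pos = 0;
    loop {
        guard += 1;
        if guard > limit {
            break; // guard exhaustion
        }
        let item = match items.get(pos) {
            Some(item) => item,
            None => break, // stands in for "search failed"
        };
        let text = match item {
            Item::Literal(d) => emit_literal(d),
            Item::Compare(a, b) => emit_compare(*a, *b),
            Item::BinaryOp(a, b) => emit_binary_op(*a, *b),
        };
        out.push(text);
        pos += 1;
    }
    out
}
```

Calling `collect_prints` with a limit smaller than the number of items stops early, which mirrors the guard exit of the original loop.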


🥉 THIRD CANDIDATE: Pattern P6 (Nested Loops)

Target: seam_inspector.hako:76 - _scan_boxes()

Scope: Multi-level nested (7+ nesting levels)

Current Structure: 37-line outer loop containing 6 nested loops

Pattern Classification:

  • Nesting levels: 7+
  • Carriers: Multiple per level (i, j, k, name, pos, etc.)
  • Exit conditions: Varied per level (bounds, break, continue)
  • Scope handoff: Complex state passing between levels

Why This Candidate:

  • Demonstrates nested composition - needed for production parsers
  • Realistic code - actual box/function scanner
  • Highest complexity - requires recursive JoinIR composition
  • Long-term project - 2-3 weeks minimum

Effort Estimate: 2-3 weeks

  • Design recursive composition: 3-5 days
  • Per-level implementation: 7-10 days
  • Testing nested composition: 3-5 days

Phase 91 (This Session): Pattern P5b Planning

Objective: Design Pattern P5b (escape sequence handling) with minimal implementation

Steps:

  1. Analysis complete (done by Explore agent)
  2. Design P5b pattern (canonicalizer contract)
  3. Create minimal fixture (test_pattern5b_escape_minimal.hako)
  4. Extend Canonicalizer to recognize escape patterns
  5. Plan lowering (defer implementation to next session)
  6. Document P5b architecture in loop-canonicalizer.md

Acceptance Criteria:

  • Pattern P5b design document complete
  • Minimal escape test fixture created
  • Canonicalizer recognizes escape patterns (dev-only observation)
  • Parity check passes (strict mode)
  • No lowering changes yet (recognition-only phase)

Deliverables:

  • docs/development/current/main/phases/phase-91/README.md - This document
  • docs/development/current/main/design/pattern-p5b-escape-design.md - Pattern design (new)
  • tools/selfhost/test_pattern5b_escape_minimal.hako - Test fixture (new)
  • Updated docs/development/current/main/design/loop-canonicalizer.md - Capability tags extended

Design: Pattern P5b (Escape Sequence Handling)

Motivation

String parsing commonly requires escape sequence handling:

  • Double quotes: "text with \" escaped quote"
  • Backslashes: "path\\with\\backslashes"
  • Newlines: "text with \n newline"

Current loops handle this with conditional increment:

if ch == "\\" {
    i = i + 1  // Skip escape character itself
    ch = next_char
}
i = i + 1  // Always advance

This variable-step pattern is not JoinIR-compatible because:

  • Loop increment is conditional (sometimes +1, sometimes +2)
  • Canonicalizer expects constant-delta carriers
  • Lowering expects uniform update rules

Solution: Pattern P5b Definition

Header Requirement

loop(i < n)  // Bounded loop on string length

Escape Check Requirement

if ch == "\\" {
    i = i + delta_skip  // Skip character (typically +1, +2, or variable)
    // Optional: consume escape character
    ch = s.substring(i, i+1)
}

After-Escape Requirement

// Standard character processing
out = out + ch
i = i + delta_normal  // Standard increment (typically +1)

Skeleton Structure

LoopSkeleton {
  steps: [
    HeaderCond(i < n),
    Body(escape_check_stmts),
    Body(process_char_stmts),
    Update(i = i + normal_delta, maybe(i = i + skip_delta))
  ]
}

Carrier Configuration

  • Primary Carrier: Loop variable (i)
    • delta_normal: +1 (standard case)
    • delta_escape: +1 or +2 inside the escape branch, applied in addition to delta_normal
  • Secondary Carrier: Accumulator (out)
    • Pattern: out = out + value

ExitContract

ExitContract {
  has_break: true,   // Break on quote detection
  has_continue: false,
  has_return: false,
  carriers: vec![
    CarrierInfo { name: "i", deltas: [+1, +2] },
    CarrierInfo { name: "out", pattern: Append }
  ]
}
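
The contract above mixes two kinds of carrier description (`deltas` and `pattern`). One way to give it a single shape is an enum per carrier, sketched below. These types are illustrative stand-ins: `CarrierUpdate` is not the real `UpdateKind` from skeleton_types.rs, whose fields are not shown in this README, and `p5b_exit_contract` is a hypothetical helper.

```rust
/// How a carrier changes per iteration (illustrative, not the real UpdateKind).
#[derive(Debug, Clone, PartialEq)]
enum CarrierUpdate {
    /// Same delta on every iteration (Pattern 1 style).
    ConstStep(i64),
    /// `escape_delta` is added on top of `normal_delta` when the escape branch runs.
    ConditionalStep { normal_delta: i64, escape_delta: i64 },
    /// Accumulator of the form `out = out + value`.
    Append,
}

#[derive(Debug, Clone, PartialEq)]
struct CarrierInfo {
    name: String,
    update: CarrierUpdate,
}

impl CarrierInfo {
    /// Total per-iteration deltas this carrier can take (empty for accumulators).
    fn total_deltas(&self) -> Vec<i64> {
        match &self.update {
            CarrierUpdate::ConstStep(d) => vec![*d],
            CarrierUpdate::ConditionalStep { normal_delta, escape_delta } => {
                vec![*normal_delta, *normal_delta + *escape_delta]
            }
            CarrierUpdate::Append => vec![],
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct ExitContract {
    has_break: bool,
    has_continue: bool,
    has_return: bool,
    carriers: Vec<CarrierInfo>,
}

/// Contract for the escape loop: break on quote, `i` steps by 1 or 2, `out` appends.
fn p5b_exit_contract() -> ExitContract {
    ExitContract {
        has_break: true,
        has_continue: false,
        has_return: false,
        carriers: vec![
            CarrierInfo {
                name: "i".to_string(),
                update: CarrierUpdate::ConditionalStep { normal_delta: 1, escape_delta: 1 },
            },
            CarrierInfo { name: "out".to_string(), update: CarrierUpdate::Append },
        ],
    }
}
```

With this shape, `total_deltas()` on the `i` carrier yields the `[+1, +2]` pair listed in the contract.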

Routing Decision

RoutingDecision {
  chosen: Pattern5bEscape,
  structure_notes: ["escape_handling", "variable_step"],
  missing_caps: []  // All required capabilities present
}

Recognition Algorithm

AST Inspection Steps

  1. Find escape check:

    • Pattern: if ch == "\\" { ... }
    • Extract: Escape character constant
    • Extract: Increment inside if block
  2. Extract skip delta:

    • Pattern: i = i + <const>
    • Calculate: skip_delta = <const>
  3. Find normal increment:

    • Pattern: i = i + <const> (after escape if block)
    • Calculate: normal_delta = <const>
  4. Validate break condition:

    • Pattern: if <char> == "<quote>" { break }
    • Required for string boundary detection
  5. Build LoopSkeleton:

    • Carriers: [{name: "i", deltas: [normal, skip]}, ...]
    • ExitContract: has_break=true
    • RoutingDecision: chosen=Pattern5bEscape
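
The inspection steps can be sketched against a toy AST. This is not the project's recognizer: `Stmt`, `EscapeInfo`, and `detect_escape` are hypothetical, and the toy AST only models the statement shapes the five steps look at.

```rust
/// Toy loop-body statements (hypothetical; the real AST is richer).
#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    /// Any statement the recognizer does not inspect.
    Other,
    /// if ch == "<lit>" { body }
    IfCharEq { lit: char, body: Vec<Stmt> },
    /// var = var + delta
    AddConst { var: String, delta: i64 },
    Break,
}

#[derive(Debug, PartialEq)]
struct EscapeInfo {
    carrier: String,
    escape_char: char,
    quote_char: char,
    skip_delta: i64,
    normal_delta: i64,
}

/// First `var = var + const` in a statement list, if any.
fn first_add(stmts: &[Stmt]) -> Option<(&str, i64)> {
    stmts.iter().find_map(|s| match s {
        Stmt::AddConst { var, delta } => Some((var.as_str(), *delta)),
        _ => None,
    })
}

fn detect_escape(body: &[Stmt]) -> Option<EscapeInfo> {
    // Step 4: break-on-quote check, an `if` whose body is exactly `break`.
    let quote_char = body.iter().find_map(|s| match s {
        Stmt::IfCharEq { lit, body: then_body }
            if matches!(then_body.as_slice(), [Stmt::Break]) => Some(*lit),
        _ => None,
    })?;

    // Steps 1-2: escape check, an `if` that contains a constant increment.
    let (escape_pos, escape_char, carrier, skip_delta) =
        body.iter().enumerate().find_map(|(idx, s)| match s {
            Stmt::IfCharEq { lit, body: then_body } => {
                let (var, delta) = first_add(then_body)?;
                Some((idx, *lit, var.to_string(), delta))
            }
            _ => None,
        })?;

    // Step 3: normal increment of the same carrier after the escape block.
    let normal_delta = body[escape_pos + 1..].iter().find_map(|s| match s {
        Stmt::AddConst { var, delta } if *var == carrier => Some(*delta),
        _ => None,
    })?;

    // Step 5: report what a skeleton builder would need.
    Some(EscapeInfo { carrier, escape_char, quote_char, skip_delta, normal_delta })
}
```

Returning `None` as soon as any step fails keeps the recognizer conservative: a loop without the break-on-quote check, for example, is left for other patterns to claim.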

Implementation Plan

Canonicalizer Extension (src/mir/loop_canonicalizer/canonicalizer.rs)

Add detect_escape_pattern() recognition:

fn detect_escape_pattern(
    loop_expr: &Expr,
    carriers: &[String]
) -> Option<EscapePatternInfo> {
    // Steps 1-5 as above
    // Return: { escape_char, skip_delta, normal_delta, carrier_name }
    todo!("recognition steps 1-5")
}

Priority: Call before detect_skip_whitespace_pattern() (more specific pattern first)

Pattern Recognizer Wrapper (src/mir/loop_canonicalizer/pattern_recognizer.rs)

Expose detect_escape_pattern():

pub fn try_extract_escape_pattern(
    loop_expr: &Expr
) -> Option<(String, i64, i64)> {  // (carrier, normal_delta, skip_delta)
    // Delegate to canonicalizer detection
    todo!("delegate to detect_escape_pattern()")
}

Test Fixture (tools/selfhost/test_pattern5b_escape_minimal.hako)

Minimal reproducible example:

// Minimal escape sequence parser
local s = "hello\\\" world"  // raw text: hello\" world
local n = s.length()
local i = 0
local out = ""

loop(i < n) {
  local ch = s.substring(i, i+1)

  if ch == "\"" {
    break
  }

  if ch == "\\" {
    i = i + 1  // Skip escape character
    if i < n {
      ch = s.substring(i, i+1)
    }
  }

  out = out + ch
  i = i + 1
}

print(out)  // Should print: hello" world

Files to Modify (Phase 91)

New Files

  1. docs/development/current/main/phases/phase-91/README.md ← You are here
  2. docs/development/current/main/design/pattern-p5b-escape-design.md (new - detailed design)
  3. tools/selfhost/test_pattern5b_escape_minimal.hako (new - test fixture)

Modified Files

  1. docs/development/current/main/design/loop-canonicalizer.md

    • Add Pattern P5b to capability matrix
    • Add recognition algorithm
    • Add routing decision table
  2. (Phase 91 Step 2+) src/mir/loop_canonicalizer/canonicalizer.rs

    • Add detect_escape_pattern() function
    • Extend canonicalize_loop_expr() to check for escape patterns
  3. (Phase 91 Step 2+) src/mir/loop_canonicalizer/pattern_recognizer.rs

    • Add try_extract_escape_pattern() wrapper

Completion Status

Phase 91 Step 2: Implementation COMPLETE

  • Extended UpdateKind enum with ConditionalStep variant
  • Implemented detect_escape_skip_pattern() in AST recognizer
  • Updated canonicalizer to recognize P5b patterns
  • Added comprehensive unit test: test_escape_skip_pattern_recognition
  • Verified parity in strict mode (canonical vs actual decision routing)

Key Deliverables:

  • Updated skeleton_types.rs: ConditionalStep support
  • Updated ast_feature_extractor.rs: P5b pattern detection
  • Updated canonicalizer.rs: P5b routing to Pattern2Break + unit test
  • Updated test_pattern5b_escape_minimal.hako: Fixed syntax errors

Test Results: 1062/1062 tests PASS (including new P5b unit test)


Next Steps (Future Sessions)

Phase 92: Lowering

  • Implement Pattern5bEscape lowerer in JoinIR
  • Handle ConditionalStep carrier updates with PHI composition
  • E2E test with test_pattern5b_escape_minimal.hako
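
As a rough picture of what a merged carrier update could look like, the sketch below folds the two increments into one select-style value per iteration. It assumes nothing about the actual JoinIR lowerer: `step` and `scan` are hypothetical names, and a trailing backslash is treated as an ordinary character here.

```rust
/// One loop iteration as a pure function of the carrier `i`.
/// Returns `None` on exit (end of input or unescaped quote),
/// otherwise the next index and the character to append.
fn step(chars: &[char], i: usize) -> Option<(usize, char)> {
    let ch = *chars.get(i)?;
    if ch == '"' {
        return None; // break
    }
    let is_escape = ch == '\\' && i + 1 < chars.len();
    // Select-style merges: each value is chosen once from the two branches,
    // instead of `i` being mutated twice along one path.
    let read_at = if is_escape { i + 1 } else { i };
    let i_next = if is_escape { i + 2 } else { i + 1 };
    Some((i_next, chars[read_at]))
}

/// Drives `step` until it reports an exit; returns the carriers (out, i).
fn scan(s: &str) -> (String, usize) {
    let chars: Vec<char> = s.chars().collect();
    let mut i = 0;
    let mut out = String::new();
    while let Some((i_next, ch)) = step(&chars, i) {
        out.push(ch);
        i = i_next;
    }
    (out, i)
}
```

Each iteration now has exactly one update to `i`, which is the property a uniform update rule needs; the condition moves into the value being assigned.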

Phase 93: Pattern P5 (Guard-Bounded)

  • Implement Pattern5 for mini_vm_core.hako:541
  • Consider micro-loop refactoring alternative
  • Document guard-counter optimization strategy

Phase 94+: Pattern P6 (Nested Loops)

  • Recursive JoinIR composition for seam_inspector.hako:76
  • Cross-level scope/carrier handoff

SSOT References

  • JoinIR Architecture: docs/development/current/main/joinir-architecture-overview.md
  • Loop Canonicalizer Design: docs/development/current/main/design/loop-canonicalizer.md
  • Capability Tags: src/mir/loop_canonicalizer/capability_guard.rs

Summary

Phase 91 establishes the next frontier of JoinIR coverage: Pattern P5b (Escape Handling).

This pattern unlocks:

  • All string escape parsing loops
  • Foundation for Pattern P5 (guard-bounded)
  • Preparation for Pattern P6 (nested loops)

  • Current readiness: 53% (16/30 loops)
  • After Phase 91: expected to reach ~60% (18/30 loops)
  • Long-term target: >90% coverage with P5, P5b, P6 patterns

All acceptance criteria defined. Implementation ready for next session.