feat(joinir): Phase 142 P2 Step 3-A - Pattern4 early return fail-fast

2025-12-16 13:48:30 +09:00
parent 42339ca77f
commit 2674e074b6
8 changed files with 1029 additions and 30 deletions
--- a/docs/development/current/main/phases/phase-143/README.md
+++ b/docs/development/current/main/phases/phase-143/README.md
@@ -383,7 +383,384 @@ cargo test --release --lib loop_canonicalizer --release

 ---

+## P2: parse_array Pattern - Separator + Stop Combo
+
+### Status
+✅ Complete (2025-12-16)
+
+### Objective
+Extend the canonicalizer to recognize parse_array patterns, which combine `continue` (separator handling) with `return` (stop condition).
+
+### Target Pattern
+`tools/selfhost/test_pattern4_parse_array.hako`
+
+```hako
+loop(p < len) {
+  local ch = s.substring(p, p + 1)
+
+  // Check for array end (return)
+  if ch == "]" {
+    if elem.length() > 0 {
+      arr.push(elem)
+    }
+    return 0
+  }
+
+  // Check for separator (continue)
+  if ch == "," {
+    if elem.length() > 0 {
+      arr.push(elem)
+      elem = ""
+    }
+    p = p + 1
+    continue
+  }
+
+  // Accumulate element
+  elem = elem + ch
+  p = p + 1
+}
+```
+
+### Pattern Characteristics
+
+**Key Features**:
+- Multiple exit types: both `return` (stop condition) and `continue` (separator)
+- Separator handling: `,` triggers element save and continue
+- Stop condition: `]` triggers final save and return
+- Same structural pattern as parse_string
+
+**Structure**:
+```
+loop(cond) {
+    // ... body statements (ch computation)
+    if stop_cond {            // ']' for array
+        // ... save final element
+        return result
+    }
+    if separator_cond {       // ',' for array
+        // ... save element, reset accumulator
+        carrier = carrier + step
+        continue
+    }
+    // ... accumulate element
+    carrier = carrier + step
+}
+```
+
+### Implementation Summary
+
+#### Key Discovery: Shared Pattern with parse_string
+
+**No new recognizer needed!** The existing `detect_parse_string_pattern()` already handles both patterns:
+- Both have `return` statement (stop condition)
+- Both have `continue` statement (separator/escape)
+- Both have carrier updates
+- The only difference is semantic: what each condition checks for
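+
+To illustrate why one recognizer can cover both loops, here is a minimal sketch of a structure-only check. It is not the code of `detect_parse_string_pattern()`: the `Stmt` enum, `looks_like_dual_exit()`, and `parse_array_body()` are hypothetical names over a toy AST, and the exact shape the real recognizer requires is an assumption.
+
+```rust
+// Hypothetical toy AST; the real AST types are not shown in this document.
+#[derive(Debug, Clone, PartialEq)]
+enum Stmt {
+    /// Any statement the check ignores (e.g. `local ch = ...`, `arr.push(elem)`).
+    Other,
+    /// `carrier = carrier + step`
+    CarrierUpdate,
+    /// `if <cond> { body }`; the condition is deliberately not modeled.
+    If(Vec<Stmt>),
+    Return,
+    Continue,
+}
+
+/// Assumed shape: a top-level `if` ending in `return`, a top-level `if` that
+/// updates the carrier and ends in `continue`, and a trailing carrier update.
+fn looks_like_dual_exit(body: &[Stmt]) -> bool {
+    let has_stop = body
+        .iter()
+        .any(|s| matches!(s, Stmt::If(b) if b.last() == Some(&Stmt::Return)));
+    let has_separator = body.iter().any(|s| {
+        matches!(s, Stmt::If(b)
+            if b.last() == Some(&Stmt::Continue) && b.contains(&Stmt::CarrierUpdate))
+    });
+    let has_tail_update = body.last() == Some(&Stmt::CarrierUpdate);
+    has_stop && has_separator && has_tail_update
+}
+
+/// Toy encoding of the parse_array loop body shown in "Target Pattern".
+fn parse_array_body() -> Vec<Stmt> {
+    vec![
+        Stmt::Other, // local ch = s.substring(p, p + 1)
+        Stmt::If(vec![Stmt::If(vec![Stmt::Other]), Stmt::Return]), // ch == "]"
+        Stmt::If(vec![
+            Stmt::If(vec![Stmt::Other, Stmt::Other]),
+            Stmt::CarrierUpdate,
+            Stmt::Continue,
+        ]), // ch == ","
+        Stmt::Other,         // elem = elem + ch
+        Stmt::CarrierUpdate, // p = p + 1
+    ]
+}
+```
+
+Because conditions are never inspected, swapping `]` and `,` for other characters leaves the result unchanged, which is the sense in which the two patterns share a recognizer.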
+
+#### Changes Made
+
+1. **Documentation Updates** (~150 lines)
+   - Updated `ast_feature_extractor.rs` to document parse_array support
+   - Updated `pattern_recognizer.rs` wrapper documentation
+   - Updated `canonicalizer.rs` supported patterns list
+   - Added parse_array example to pattern documentation
+
+2. **Unit Test** (~165 lines)
+   - Added `test_parse_array_pattern_recognized()` in `canonicalizer.rs`
+   - Mirrors parse_string test structure with array-specific conditions
+   - Verifies same Pattern4Continue routing
+
+3. **Error Messages** (~5 lines)
+   - Updated error messages to mention parse_array
+
+**Total lines modified**: ~320 (mostly documentation)
+
+### Acceptance Criteria
+
+- ✅ Canonicalizer creates Skeleton for parse_array loop
+- ✅ RoutingDecision.chosen == Pattern4Continue
+- ✅ Strict parity green (canonicalizer and router agree)
+- ✅ Default behavior unchanged
+- ✅ Unit test added and passing
+- ✅ No new capability needed
+
+### Results
+
+#### Parity Verification
+
+```bash
+NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
+  tools/selfhost/test_pattern4_parse_array.hako
+```
+
+**Output**:
+```
+[loop_canonicalizer]   Skeleton steps: 3
+[loop_canonicalizer]   Carriers: 1
+[loop_canonicalizer]   Has exits: true
+[loop_canonicalizer]   Decision: SUCCESS
+[loop_canonicalizer]   Chosen pattern: Pattern4Continue
+[loop_canonicalizer]   Missing caps: []
+[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern4Continue
+```
+
+**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue
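+
+As a rough illustration of what the parity line reports, the sketch below compares the canonicalizer's choice with the router's choice. `PatternKind`, `check_parity()`, the `strict` flag, and the mismatch message are hypothetical; only the OK message format is taken from the output above, and treating a mismatch as an error under `HAKO_JOINIR_STRICT=1` is an assumption.
+
+```rust
+/// Hypothetical stand-in for the router's pattern kinds; only the name
+/// `Pattern4Continue` appears in the log output.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+enum PatternKind {
+    Pattern4Continue,
+    Other,
+}
+
+/// Compare the canonicalizer's decision with the router's decision.
+/// In strict mode (assumed behavior) a disagreement becomes an error;
+/// otherwise it is only reported.
+fn check_parity(
+    func: &str,
+    canonical: PatternKind,
+    actual: PatternKind,
+    strict: bool,
+) -> Result<String, String> {
+    if canonical == actual {
+        return Ok(format!(
+            "[loop_canonicalizer/PARITY] OK in function '{func}': \
+             canonical and actual agree on {canonical:?}"
+        ));
+    }
+    // Hypothetical message format; the real mismatch output is not shown here.
+    let msg = format!(
+        "[loop_canonicalizer/PARITY] MISMATCH in function '{func}': \
+         canonical={canonical:?}, actual={actual:?}"
+    );
+    if strict { Err(msg) } else { Ok(msg) }
+}
+```
+
+Calling `check_parity("main", PatternKind::Pattern4Continue, PatternKind::Pattern4Continue, true)` yields the same OK line as the last line of the output above.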
+
+#### Unit Test Results
+
+```bash
+cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_array_pattern_recognized
+```
+
+**Status**: ✅ **PASS**
+
+### Statistics
+
+| Metric | Count |
+|--------|-------|
+| New patterns supported | 1 (parse_array, shares recognizer with parse_string) |
+| Total patterns supported | 5 (skip_whitespace, parse_number, continue, parse_string, parse_array) |
+| New Capability Tags | 0 (uses existing ConstStep) |
+| Lines added | ~320 (mostly documentation) |
+| Files modified | 3 (canonicalizer.rs, ast_feature_extractor.rs, pattern_recognizer.rs) |
+| Unit tests added | 1 |
+| Parity status | Green ✅ |
+
+### Comparison: Parse String vs Parse Array
+
+| Aspect | Parse String | Parse Array |
+|--------|--------------|-------------|
+| **Stop condition** | `"` (quote) | `]` (array end) |
+| **Separator** | `\` (escape) | `,` (element separator) |
+| **Structure** | continue + return | continue + return |
+| **Recognizer** | `detect_parse_string_pattern()` | **Same recognizer!** |
+| **Routing** | Pattern4Continue | Pattern4Continue |
+| **ExitContract** | has_continue=true, has_return=true | has_continue=true, has_return=true |
+
+### Key Insight: Structural vs Semantic Patterns
+
+**Major Discovery**: parse_string and parse_array are **structurally identical** at the AST level:
+- Both have `if stop_cond { return }`
+- Both have `if separator_cond { continue }`
+- Both have carrier updates
+
+The **semantic difference** (what the conditions check) doesn't matter for pattern recognition!
+
+This demonstrates the power of AST-based pattern matching: we can recognize structural patterns without understanding their semantic meaning.
+
+### Follow-up Opportunities
+
+#### Next Steps (Phase 143 P3)
+- [ ] Support parse_object pattern (likely also shares the same recognizer!)
+- [ ] Document pattern families (structural equivalence classes)
+
+#### Future Enhancements
+- [ ] Generalize to "dual-exit patterns" (continue + return)
+- [ ] Add corpus analysis to discover more structural equivalences
+- [ ] Create pattern taxonomy based on AST structure
+
+### Lessons Learned
+
+1. **Structural Equivalence**: Different semantic patterns can share the same AST structure
+2. **Recognizer Reuse**: One recognizer can handle multiple use cases
+3. **Documentation > Code**: More documentation changes than code changes
+4. **Test Coverage**: Unit tests verify both semantic variants work with the same recognizer
+
+---
+
+## P3: parse_object Pattern - Key-Value Pair Collection
+
+### Status
+✅ Complete (2025-12-16)
+
+### Objective
+Verify that the parse_object pattern (key-value pair collection) is recognized by the existing recognizer, confirming structural equivalence with parse_string/parse_array.
+
+### Target Pattern
+`tools/selfhost/test_pattern4_parse_object.hako`
+
+```hako
+loop(p < s.length()) {
+  // ... optional body statements
+
+  // Check for object end (return)
+  local ch = s.substring(p, p+1)
+  if ch == "}" {
+    return obj  // Stop: object complete
+  }
+
+  // Check for separator (continue)
+  if ch == "," {
+    p = p + 1
+    continue  // Separator: continue to next key-value pair
+  }
+
+  // Regular processing
+  p = p + 1
+}
+```
+
+### Pattern Characteristics
+
+**Key Features**:
+- Multiple exit types: both `return` (stop condition) and `continue` (separator)
+- Separator handling: `,` triggers continue to next pair
+- Stop condition: `}` triggers return with result
+- **Same structural pattern as parse_string/parse_array**
+
+**Structure**:
+```
+loop(cond) {
+    // ... body statements (ch computation)
+    if stop_cond {            // '}' for object
+        return result
+    }
+    if separator_cond {       // ',' for object
+        carrier = carrier + step
+        continue
+    }
+    // ... regular processing
+    carrier = carrier + step
+}
+```
+
+### Implementation Summary
+
+#### Key Discovery: Complete Structural Equivalence
+
+**No new recognizer needed!** The existing `detect_parse_string_pattern()` handles parse_object without modification:
+- Has `return` statement (stop condition: `}`)
+- Has `continue` statement (separator: `,`)
+- Has carrier updates (`p = p + 1`)
+- Only semantic difference is the stop/separator characters
+
+**Pattern Family Confirmed**: parse_string, parse_array, and parse_object are **structurally identical**.
+
+#### Changes Made
+
+1. **Test File Creation** (~50 lines)
+   - Created `tools/selfhost/test_pattern4_parse_object.hako`
+   - Minimal test demonstrating parse_object loop structure
+
+2. **Unit Test** (~170 lines)
+   - Added `test_parse_object_pattern_recognized()` in `canonicalizer.rs`
+   - Mirrors parse_array test structure with object-specific conditions (`}` and `,`)
+   - Verifies same Pattern4Continue routing
+
+3. **Documentation** (this section)
+
+**Total implementation**: ~220 lines (no new recognizer code needed!)
+
+### Acceptance Criteria
+
+- ✅ Canonicalizer creates Skeleton for parse_object loop
+- ✅ RoutingDecision.chosen == Pattern4Continue
+- ✅ RoutingDecision.missing_caps == []
+- ✅ Strict parity green (canonicalizer and router agree)
+- ✅ Default behavior unchanged
+- ✅ Unit test added and passing
+- ✅ No new capability needed
+- ✅ **No new recognizer needed** (existing recognizer handles it)
+
+### Results
+
+#### Parity Verification
+
+```bash
+NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
+  tools/selfhost/test_pattern4_parse_object.hako
+```
+
+**Output**:
+```
+[loop_canonicalizer]   Chosen pattern: Pattern4Continue
+[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern4Continue
+[loop_canonicalizer/PARITY] OK in function 'Main.parse_object_loop/0': canonical and actual agree on Pattern4Continue
+```
+
+**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue
+
+#### Unit Test Results
+
+```bash
+cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_object_pattern_recognized
+```
+
+**Status**: ✅ **PASS**
+
+### Statistics
+
+| Metric | Count |
+|--------|-------|
+| New patterns supported | 1 (parse_object, shares recognizer with parse_string/parse_array) |
+| Total patterns supported | 6 (skip_whitespace, parse_number, continue, parse_string, parse_array, parse_object) |
+| New Capability Tags | 0 (uses existing ConstStep) |
+| Lines added | ~220 (test file + unit test + docs) |
+| Files modified | 2 (canonicalizer.rs, new test file) |
+| Unit tests added | 1 |
+| Parity status | Green ✅ |
+| **New recognizer code** | **0 lines** (complete reuse!) |
+
+### Comparison: Parse String vs Parse Array vs Parse Object
+
+| Aspect | Parse String | Parse Array | Parse Object |
+|--------|--------------|-------------|--------------|
+| **Stop condition** | `"` (quote) | `]` (array end) | `}` (object end) |
+| **Separator** | `\` (escape) | `,` (element separator) | `,` (pair separator) |
+| **Structure** | continue + return | continue + return | continue + return |
+| **Recognizer** | `detect_parse_string_pattern()` | **Same** | **Same** |
+| **Routing** | Pattern4Continue | Pattern4Continue | Pattern4Continue |
+| **ExitContract** | has_continue=true, has_return=true | **Same** | **Same** |
+
+### Key Insight: Structural Pattern Family
+
+**Major Discovery**: parse_string, parse_array, and parse_object form a **structural pattern family**:
+- All have `if stop_cond { return }`
+- All have `if separator_cond { continue }`
+- All have carrier updates
+- **One recognizer handles all three!**
+
+The semantic differences (string quote vs array bracket vs object brace) are invisible at the AST structure level.
+
+**Implication**: AST-based pattern matching creates natural pattern families. When we implement one pattern, we often get multiple variants "for free".
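+
+The family idea can be sketched as follows: derive an exit contract from statement structure alone and observe that the parse_array and parse_object loop bodies produce the same value. The field names `has_continue` and `has_return` come from the comparison table; the `Stmt` enum, `exit_contract()`, and the two body builders are hypothetical, and parse_string is omitted because its loop body is not shown in this document.
+
+```rust
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+struct ExitContract {
+    has_continue: bool,
+    has_return: bool,
+}
+
+// Hypothetical toy AST: conditions and expressions are not modeled at all.
+enum Stmt {
+    Other,
+    If(Vec<Stmt>),
+    Continue,
+    Return,
+}
+
+/// Walk the body (including nested `if` blocks) and record which exits occur.
+fn exit_contract(body: &[Stmt]) -> ExitContract {
+    let mut c = ExitContract::default();
+    for s in body {
+        match s {
+            Stmt::Continue => c.has_continue = true,
+            Stmt::Return => c.has_return = true,
+            Stmt::If(inner) => {
+                let nested = exit_contract(inner);
+                c.has_continue |= nested.has_continue;
+                c.has_return |= nested.has_return;
+            }
+            Stmt::Other => {}
+        }
+    }
+    c
+}
+
+/// Toy encoding of the parse_array loop (stop on `]`, separator `,`).
+fn parse_array_body() -> Vec<Stmt> {
+    vec![
+        Stmt::Other,
+        Stmt::If(vec![Stmt::If(vec![Stmt::Other]), Stmt::Return]),
+        Stmt::If(vec![Stmt::If(vec![Stmt::Other, Stmt::Other]), Stmt::Other, Stmt::Continue]),
+        Stmt::Other,
+        Stmt::Other,
+    ]
+}
+
+/// Toy encoding of the parse_object loop (stop on `}`, separator `,`).
+fn parse_object_body() -> Vec<Stmt> {
+    vec![
+        Stmt::Other,
+        Stmt::If(vec![Stmt::Return]),
+        Stmt::If(vec![Stmt::Other, Stmt::Continue]),
+        Stmt::Other,
+    ]
+}
+```
+
+Both bodies yield `ExitContract { has_continue: true, has_return: true }` even though their nesting depth and statement counts differ, since nothing in the walk looks at which character a condition tests.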
+
+### Coverage Expansion Summary
+
+Phase 143 started with 2 patterns (skip_whitespace, continue) and expanded to 6 patterns:
+- P0: Added parse_number (new recognizer)
+- P1: Added parse_string (new recognizer)
+- P2: Added parse_array (**reused parse_string recognizer**)
+- P3: Added parse_object (**reused parse_string recognizer**)
+
+**Recognizer efficiency**: 2 new recognizers → 4 new patterns supported!
+
+### Follow-up Opportunities
+
+#### Next Steps (Phase 144+)
+- [ ] Document pattern families in design docs
+- [ ] Add corpus analysis to discover more structural equivalences
+- [ ] Create pattern taxonomy based on AST structure
+- [ ] Explore other potential pattern families
+
+#### Future Enhancements
+- [ ] Generalize to "dual-exit patterns" (continue + return)
+- [ ] Support triple-exit patterns (break + continue + return)
+- [ ] Add signature-based pattern discovery
+
+### Lessons Learned
+
+1. **Pattern Families**: Structural equivalence creates natural groupings
+2. **Recognizer Reuse**: Testing existing recognizers before writing new ones saves effort
+3. **Semantic vs Structural**: AST patterns are structural; semantic meaning doesn't affect recognition
+4. **Test-Driven Discovery**: Unit tests verify recognizer generality
+5. **Documentation Value**: Recording discoveries helps future pattern work
+
+---
+
 **Phase 143 P0: Complete** ✅
 **Phase 143 P1: Complete** ✅
+**Phase 143 P2: Complete** ✅
+**Phase 143 P3: Complete** ✅
 **Date**: 2025-12-16
 **Implemented by**: Claude Code (Sonnet 4.5)