feat(joinir): Phase 142 P2 Step 3-A - Pattern4 early return fail-fast
@ -383,7 +383,384 @@ cargo test --release --lib loop_canonicalizer --release

---

## P2: parse_array Pattern - Separator + Stop Combo

### Status
✅ Complete (2025-12-16)

### Objective
Extend canonicalizer to recognize parse_array patterns with both `continue` (separator handling) and `return` (stop condition).

### Target Pattern
`tools/selfhost/test_pattern4_parse_array.hako`

```hako
loop(p < len) {
    local ch = s.substring(p, p + 1)

    // Check for array end (return)
    if ch == "]" {
        if elem.length() > 0 {
            arr.push(elem)
        }
        return 0
    }

    // Check for separator (continue)
    if ch == "," {
        if elem.length() > 0 {
            arr.push(elem)
            elem = ""
        }
        p = p + 1
        continue
    }

    // Accumulate element
    elem = elem + ch
    p = p + 1
}
```
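
For readers less familiar with `.hako`, the control flow of the loop above can be sketched in Rust. This is only an illustration of the semantics as read from the test file: the function name, the tuple return type, and the `closed` flag are assumptions made for the sketch and are not part of the repository.

```rust
/// Collects array elements from `s`, starting at character index `p`.
/// Returns the elements plus a flag telling whether a closing `]` was seen.
/// (Hypothetical helper: the .hako test returns 0 and mutates `arr` instead.)
pub fn parse_array_elements(s: &str, mut p: usize) -> (Vec<String>, bool) {
    let chars: Vec<char> = s.chars().collect();
    let mut arr: Vec<String> = Vec::new();
    let mut elem = String::new();

    while p < chars.len() {
        let ch = chars[p];

        // Stop condition: `]` saves the pending element and returns.
        if ch == ']' {
            if !elem.is_empty() {
                arr.push(elem);
            }
            return (arr, true);
        }

        // Separator: `,` saves the pending element, resets the accumulator,
        // advances the carrier, and continues.
        if ch == ',' {
            if !elem.is_empty() {
                arr.push(std::mem::take(&mut elem));
            }
            p += 1;
            continue;
        }

        // Regular character: accumulate and advance the carrier.
        elem.push(ch);
        p += 1;
    }

    // The loop condition failed before a `]` was found; as in the original
    // loop, the pending element is not saved on this path.
    (arr, false)
}
```

Note that the carrier `p` is advanced on both the `continue` path and the fall-through path, which is the property the canonicalizer relies on when it reports a single carrier.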

### Pattern Characteristics

**Key Features**:
- Multiple exit types: both `return` (stop condition) and `continue` (separator)
- Separator handling: `,` triggers element save and continue
- Stop condition: `]` triggers final save and return
- Same structural pattern as parse_string

**Structure**:
```
loop(cond) {
    // ... body statements (ch computation)
    if stop_cond {           // ']' for array
        // ... save final element
        return result
    }
    if separator_cond {      // ',' for array
        // ... save element, reset accumulator
        carrier = carrier + step
        continue
    }
    // ... accumulate element
    carrier = carrier + step
}
```

### Implementation Summary

#### Key Discovery: Shared Pattern with parse_string

**No new recognizer needed!** The existing `detect_parse_string_pattern()` already handles both patterns:
- Both have a `return` statement (stop condition)
- Both have a `continue` statement (separator/escape)
- Both have carrier updates
- The only difference is semantic: what the two conditions check for
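
Why one recognizer covers both loops can be illustrated with a toy structural check. The sketch below is not `detect_parse_string_pattern()`; its `Stmt` type, function names, and exact matching rules are assumptions chosen for illustration. The point it shows is that the check never reads the condition text.

```rust
/// Toy AST for illustration only; the real AST types are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// `if <cond> { ... }`; the condition is kept as opaque text on purpose.
    If { cond: String, then_body: Vec<Stmt> },
    /// `target = target + <step>`
    CarrierUpdate { target: String },
    Return,
    Continue,
    /// Anything else (locals, pushes, ...).
    Other,
}

/// Returns true when the loop body contains, at the top level:
///   * an `if` whose body ends in `return`,
///   * an `if` whose body updates a carrier and ends in `continue`,
///   * a trailing carrier update.
/// The `cond` field is never inspected.
pub fn looks_like_continue_return_loop(body: &[Stmt]) -> bool {
    let has_return_if = body.iter().any(|s| {
        matches!(s, Stmt::If { then_body, .. }
            if matches!(then_body.last(), Some(Stmt::Return)))
    });
    let has_continue_if = body.iter().any(|s| match s {
        Stmt::If { then_body, .. } => {
            matches!(then_body.last(), Some(Stmt::Continue))
                && then_body
                    .iter()
                    .any(|t| matches!(t, Stmt::CarrierUpdate { .. }))
        }
        _ => false,
    });
    let ends_with_update = matches!(body.last(), Some(Stmt::CarrierUpdate { .. }));
    has_return_if && has_continue_if && ends_with_update
}

/// Builds a loop body of the shared shape with the given condition texts.
pub fn sample_body(stop_cond: &str, separator_cond: &str) -> Vec<Stmt> {
    vec![
        Stmt::Other, // ch computation
        Stmt::If {
            cond: stop_cond.to_string(),
            then_body: vec![Stmt::Other, Stmt::Return],
        },
        Stmt::If {
            cond: separator_cond.to_string(),
            then_body: vec![
                Stmt::CarrierUpdate { target: "p".to_string() },
                Stmt::Continue,
            ],
        },
        Stmt::Other, // accumulate
        Stmt::CarrierUpdate { target: "p".to_string() },
    ]
}
```

Calling `looks_like_continue_return_loop(&sample_body("ch == quote", "ch == backslash"))` and the same function with `"ch == ]"` and `"ch == ,"` gives the same answer, because only the statement shape is examined.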

#### Changes Made

1. **Documentation Updates** (~150 lines)
   - Updated `ast_feature_extractor.rs` to document parse_array support
   - Updated `pattern_recognizer.rs` wrapper documentation
   - Updated `canonicalizer.rs` supported patterns list
   - Added parse_array example to pattern documentation

2. **Unit Test** (~165 lines)
   - Added `test_parse_array_pattern_recognized()` in `canonicalizer.rs`
   - Mirrors parse_string test structure with array-specific conditions
   - Verifies same Pattern4Continue routing

3. **Error Messages** (~5 lines)
   - Updated error messages to mention parse_array

**Total lines modified**: ~320 lines (mostly documentation)

### Acceptance Criteria

- ✅ Canonicalizer creates Skeleton for parse_array loop
- ✅ RoutingDecision.chosen == Pattern4Continue
- ✅ Strict parity green (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ Unit test added and passing
- ✅ No new capability needed

### Results

#### Parity Verification

```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern4_parse_array.hako
```

**Output**:
```
[loop_canonicalizer] Skeleton steps: 3
[loop_canonicalizer] Carriers: 1
[loop_canonicalizer] Has exits: true
[loop_canonicalizer] Decision: SUCCESS
[loop_canonicalizer] Chosen pattern: Pattern4Continue
[loop_canonicalizer] Missing caps: []
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern4Continue
```

**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue
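
The parity idea itself (compare what the canonicalizer chose with what the router actually chose) can be sketched as follows. Everything here is hypothetical: the `PatternKind` enum with its `Unrecognized` variant, the `check_parity` signature, and in particular the assumption that strict mode turns a mismatch into an error while non-strict mode only reports it.

```rust
/// Hypothetical stand-in for the pattern kinds; only `Pattern4Continue`
/// appears in this document, `Unrecognized` is added for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PatternKind {
    Pattern4Continue,
    Unrecognized,
}

/// Compares the canonicalizer's choice with the router's choice.
/// - Ok(true):  both agree.
/// - Ok(false): they disagree, but strict mode is off (assumed behavior).
/// - Err(..):   they disagree and strict mode is on (assumed behavior).
pub fn check_parity(
    canonical: PatternKind,
    actual: PatternKind,
    strict: bool,
) -> Result<bool, String> {
    if canonical == actual {
        return Ok(true);
    }
    if strict {
        Err(format!(
            "parity mismatch: canonical={:?}, actual={:?}",
            canonical, actual
        ))
    } else {
        Ok(false)
    }
}
```

Under these assumptions, the green run above corresponds to `check_parity(PatternKind::Pattern4Continue, PatternKind::Pattern4Continue, true)` returning `Ok(true)`.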

#### Unit Test Results

```bash
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_array_pattern_recognized
```

**Status**: ✅ **PASS**

### Statistics

| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_array, shares recognizer with parse_string) |
| Total patterns supported | 5 (skip_whitespace, parse_number, continue, parse_string, parse_array) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~320 (mostly documentation) |
| Files modified | 3 (canonicalizer.rs, ast_feature_extractor.rs, pattern_recognizer.rs) |
| Unit tests added | 1 |
| Parity status | Green ✅ |

### Comparison: Parse String vs Parse Array

| Aspect | Parse String | Parse Array |
|--------|--------------|-------------|
| **Stop condition** | `"` (quote) | `]` (array end) |
| **Separator** | `\` (escape) | `,` (element separator) |
| **Structure** | continue + return | continue + return |
| **Recognizer** | `detect_parse_string_pattern()` | **Same recognizer!** |
| **Routing** | Pattern4Continue | Pattern4Continue |
| **ExitContract** | has_continue=true, has_return=true | has_continue=true, has_return=true |
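
The `ExitContract` row can be read as the result of a simple walk over the loop body. The following is a minimal sketch under assumptions: the two field names come from the table above, while the `Node` toy AST and the `exit_contract` function are invented for illustration, and nested loops (whose `continue` would belong to the inner loop) are deliberately left out.

```rust
/// Field names taken from the comparison table; the rest is a sketch.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ExitContract {
    pub has_continue: bool,
    pub has_return: bool,
}

/// Toy statement tree (hypothetical, no nested loops).
pub enum Node {
    Return,
    Continue,
    If(Vec<Node>),
    Other,
}

/// Walks the body, descending into `if` bodies, and records which
/// exit statements occur anywhere inside it.
pub fn exit_contract(body: &[Node]) -> ExitContract {
    let mut contract = ExitContract::default();
    for node in body {
        match node {
            Node::Return => contract.has_return = true,
            Node::Continue => contract.has_continue = true,
            Node::If(inner) => {
                let sub = exit_contract(inner);
                contract.has_return |= sub.has_return;
                contract.has_continue |= sub.has_continue;
            }
            Node::Other => {}
        }
    }
    contract
}
```

A body shaped like either loop in the table (one `if` ending in `return`, one `if` ending in `continue`) yields `has_continue = true, has_return = true`, which is why both columns show the same contract.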

### Key Insight: Structural vs Semantic Patterns

**Major Discovery**: parse_string and parse_array are **structurally identical** at the AST level:
- Both have `if stop_cond { return }`
- Both have `if separator_cond { continue }`
- Both have carrier updates

The **semantic difference** (what the conditions check) doesn't matter for pattern recognition!

This demonstrates the power of AST-based pattern matching: the recognizer matches on statement shape alone, so it does not need to know which characters the conditions compare against.
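
One way to make "structurally identical" concrete is to erase every literal and variable name from a loop body and compare what is left. The sketch below does that with a hypothetical toy AST (`Expr`, `Stmt`) and a hypothetical `shape` function; it is not how the canonicalizer is implemented, only an illustration of the equivalence claim.

```rust
/// Toy expression and statement types, invented for this sketch.
pub enum Expr {
    Lit(String),
    Var(String),
    Eq(Box<Expr>, Box<Expr>),
}

pub enum Stmt {
    If(Expr, Vec<Stmt>),
    /// `var = var + step`
    Update(String),
    Return,
    Continue,
}

/// Renders an expression with literal values and variable names erased.
fn expr_shape(e: &Expr) -> String {
    match e {
        Expr::Lit(_) => "lit".to_string(),
        Expr::Var(_) => "var".to_string(),
        Expr::Eq(a, b) => format!("eq({},{})", expr_shape(a), expr_shape(b)),
    }
}

/// Renders a statement list as a shape string that ignores all names
/// and literal values.
pub fn shape(body: &[Stmt]) -> String {
    let parts: Vec<String> = body
        .iter()
        .map(|s| match s {
            Stmt::If(cond, then_body) => {
                format!("if[{}]{{{}}}", expr_shape(cond), shape(then_body))
            }
            Stmt::Update(_) => "update".to_string(),
            Stmt::Return => "return".to_string(),
            Stmt::Continue => "continue".to_string(),
        })
        .collect();
    parts.join(";")
}

/// Builds the shared loop body with the given stop and separator characters.
pub fn dual_exit_loop(stop: &str, sep: &str) -> Vec<Stmt> {
    let ch_eq = |lit: &str| {
        Expr::Eq(
            Box::new(Expr::Var("ch".into())),
            Box::new(Expr::Lit(lit.into())),
        )
    };
    vec![
        Stmt::If(ch_eq(stop), vec![Stmt::Return]),
        Stmt::If(ch_eq(sep), vec![Stmt::Update("p".into()), Stmt::Continue]),
        Stmt::Update("p".into()),
    ]
}
```

With these definitions, `shape(&dual_exit_loop("\"", "\\"))` and `shape(&dual_exit_loop("]", ","))` produce the same string, since the only differences between the two bodies are the literals that `expr_shape` discards.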

### Follow-up Opportunities

#### Next Steps (Phase 143 P3)
- [ ] Support parse_object pattern (likely also shares the same recognizer!)
- [ ] Document pattern families (structural equivalence classes)

#### Future Enhancements
- [ ] Generalize to "dual-exit patterns" (continue + return)
- [ ] Add corpus analysis to discover more structural equivalences
- [ ] Create pattern taxonomy based on AST structure

### Lessons Learned

1. **Structural Equivalence**: Different semantic patterns can share the same AST structure
2. **Recognizer Reuse**: One recognizer can handle multiple use cases
3. **Documentation > Code**: More documentation changes than code changes
4. **Test Coverage**: Unit tests verify both semantic variants work with the same recognizer

---

## P3: parse_object Pattern - Key-Value Pair Collection

### Status
✅ Complete (2025-12-16)

### Objective
Verify that parse_object pattern (key-value pair collection) is recognized by the existing recognizer, maintaining structural equivalence with parse_string/parse_array.

### Target Pattern
`tools/selfhost/test_pattern4_parse_object.hako`

```hako
loop(p < s.length()) {
    // ... optional body statements

    // Check for object end (return)
    local ch = s.substring(p, p+1)
    if ch == "}" {
        return obj  // Stop: object complete
    }

    // Check for separator (continue)
    if ch == "," {
        p = p + 1
        continue  // Separator: continue to next key-value pair
    }

    // Regular processing
    p = p + 1
}
```

### Pattern Characteristics

**Key Features**:
- Multiple exit types: both `return` (stop condition) and `continue` (separator)
- Separator handling: `,` triggers continue to next pair
- Stop condition: `}` triggers return with result
- **Same structural pattern as parse_string/parse_array**

**Structure**:
```
loop(cond) {
    // ... body statements (ch computation)
    if stop_cond {           // '}' for object
        return result
    }
    if separator_cond {      // ',' for object
        carrier = carrier + step
        continue
    }
    // ... regular processing
    carrier = carrier + step
}
```

### Implementation Summary

#### Key Discovery: Complete Structural Equivalence

**No new recognizer needed!** The existing `detect_parse_string_pattern()` handles parse_object without modification:
- Has a `return` statement (stop condition: `}`)
- Has a `continue` statement (separator: `,`)
- Has carrier updates (`p = p + 1`)
- The only semantic difference is the stop/separator characters

**Pattern Family Confirmed**: parse_string, parse_array, and parse_object are **structurally identical**.

#### Changes Made

1. **Test File Creation** (~50 lines)
   - Created `tools/selfhost/test_pattern4_parse_object.hako`
   - Minimal test demonstrating parse_object loop structure

2. **Unit Test** (~170 lines)
   - Added `test_parse_object_pattern_recognized()` in `canonicalizer.rs`
   - Mirrors parse_array test structure with object-specific conditions (`}` and `,`)
   - Verifies same Pattern4Continue routing

3. **Documentation** (this section)

**Total implementation**: ~220 lines (no new recognizer code needed!)

### Acceptance Criteria

- ✅ Canonicalizer creates Skeleton for parse_object loop
- ✅ RoutingDecision.chosen == Pattern4Continue
- ✅ RoutingDecision.missing_caps == []
- ✅ Strict parity green (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ Unit test added and passing
- ✅ No new capability needed
- ✅ **No new recognizer needed** (existing recognizer handles it)

### Results

#### Parity Verification

```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern4_parse_object.hako
```

**Output**:
```
[loop_canonicalizer] Chosen pattern: Pattern4Continue
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern4Continue
[loop_canonicalizer/PARITY] OK in function 'Main.parse_object_loop/0': canonical and actual agree on Pattern4Continue
```

**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue

#### Unit Test Results

```bash
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_object_pattern_recognized
```

**Status**: ✅ **PASS**

### Statistics

| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_object, shares recognizer with parse_string/array) |
| Total patterns supported | 6 (skip_whitespace, parse_number, continue, parse_string, parse_array, parse_object) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~220 (test file + unit test + docs) |
| Files modified | 2 (canonicalizer.rs, new test file) |
| Unit tests added | 1 |
| Parity status | Green ✅ |
| **New recognizer code** | **0 lines** (complete reuse!) |

### Comparison: Parse String vs Parse Array vs Parse Object

| Aspect | Parse String | Parse Array | Parse Object |
|--------|--------------|-------------|--------------|
| **Stop condition** | `"` (quote) | `]` (array end) | `}` (object end) |
| **Separator** | `\` (escape) | `,` (element separator) | `,` (pair separator) |
| **Structure** | continue + return | continue + return | continue + return |
| **Recognizer** | `detect_parse_string_pattern()` | **Same** | **Same** |
| **Routing** | Pattern4Continue | Pattern4Continue | Pattern4Continue |
| **ExitContract** | has_continue=true, has_return=true | **Same** | **Same** |

### Key Insight: Structural Pattern Family

**Major Discovery**: parse_string, parse_array, and parse_object form a **structural pattern family**:
- All have `if stop_cond { return }`
- All have `if separator_cond { continue }`
- All have carrier updates
- **One recognizer handles all three!**

The semantic differences (string quote vs array bracket vs object brace) are invisible at the AST structure level.

**Implication**: AST-based pattern matching creates natural pattern families. When we implement one pattern, we often get multiple variants "for free".
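
A pattern family can be thought of as an equivalence class keyed by structure. The sketch below groups loop names by a structural label; the function, the label strings, and the extra `hypothetical_other` entry are all assumptions used for illustration, and only the `continue+return` grouping of the three parse loops is taken from the comparison table above.

```rust
use std::collections::BTreeMap;

/// Groups `(loop_name, structural_label)` pairs into families keyed by label.
/// The labels are free-form strings here; a real taxonomy would derive them
/// from the AST rather than take them as input.
pub fn group_into_families<'a>(
    loops: &[(&'a str, &'a str)],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut families: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(name, label) in loops {
        // Loops with the same structural label land in the same family.
        families.entry(label).or_default().push(name);
    }
    families
}
```

Feeding it `parse_string`, `parse_array`, and `parse_object`, each labelled `continue+return`, produces a single family with three members, which is the grouping this section describes.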

### Coverage Expansion Summary

Phase 143 started with 3 patterns (skip_whitespace, parse_number, continue) and expanded to 6 patterns:
- P0: Added parse_number (new recognizer)
- P1: Added parse_string (new recognizer)
- P2: Added parse_array (**reused parse_string recognizer**)
- P3: Added parse_object (**reused parse_string recognizer**)

**Recognizer efficiency**: 2 new recognizers → 4 new patterns supported!

### Follow-up Opportunities

#### Next Steps (Phase 144+)
- [ ] Document pattern families in design docs
- [ ] Add corpus analysis to discover more structural equivalences
- [ ] Create pattern taxonomy based on AST structure
- [ ] Explore other potential pattern families

#### Future Enhancements
- [ ] Generalize to "dual-exit patterns" (continue + return)
- [ ] Support triple-exit patterns (break + continue + return)
- [ ] Add signature-based pattern discovery

### Lessons Learned

1. **Pattern Families**: Structural equivalence creates natural groupings
2. **Recognizer Reuse**: Testing existing recognizers before writing new ones saves effort
3. **Semantic vs Structural**: AST patterns are structural; semantic meaning doesn't affect recognition
4. **Test-Driven Discovery**: Unit tests verify recognizer generality
5. **Documentation Value**: Recording discoveries helps future pattern work

---

**Phase 143 P0: Complete** ✅
**Phase 143 P1: Complete** ✅
**Phase 143 P2: Complete** ✅
**Phase 143 P3: Complete** ✅
**Date**: 2025-12-16
**Implemented by**: Claude Code (Sonnet 4.5)