feat(joinir): Phase 142 P2 Step 3-A - Pattern4 early return fail-fast

nyash-codex
2025-12-16 13:48:30 +09:00
parent 42339ca77f
commit 2674e074b6
8 changed files with 1029 additions and 30 deletions

@ -363,3 +363,107 @@ Phase 142 P1 successfully extends the Canonicalizer to recognize continue patter
- Follows existing re-export pattern from Phase 140-P4-A
All acceptance criteria met. ✅
---
## P2: Pattern4 Lowering Extension (IN PROGRESS)
### Objective
Extend Pattern4 lowering to handle "continue + return" patterns found in parse_string/array/object.
### Target Pattern
- `tools/selfhost/test_pattern4_parse_string.hako` - Parse string with continue (escape) + return (quote)
### Pattern4 Lowering Contract (Phase 142 P2)
#### Accepted Minimum Structure
**Return Handling**:
- **Position**: Early return inside one or more if blocks
- **Type**: Scalar return values (complex returns are out of scope)
- **Constraint**: Only the last return in loop body is processed
**Continue Side Updates**:
- **Pattern**: `if cond { carrier = carrier ± 1; continue }`
- **Update**: Constant step only (+1, -1, +2, -2, etc.)
- **Constraint**: Multiple carriers not yet supported
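The constant-step constraint above can be sketched as a small shape check. This is a minimal illustration only: the `Expr` type and the `const_step` helper are hypothetical stand-ins, not the actual AST or carrier analysis used by the Pattern4 lowerer.

```rust
// Hypothetical, simplified expression type; the real AST differs.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

/// Returns the signed step if `rhs` has the shape `carrier + k` or
/// `carrier - k` with a literal integer `k`; returns `None` otherwise
/// (for example when the step is another variable).
fn const_step(carrier: &str, rhs: &Expr) -> Option<i64> {
    match rhs {
        Expr::Add(lhs, step) => match (lhs.as_ref(), step.as_ref()) {
            (Expr::Var(name), Expr::Int(k)) if name == carrier => Some(*k),
            _ => None,
        },
        Expr::Sub(lhs, step) => match (lhs.as_ref(), step.as_ref()) {
            (Expr::Var(name), Expr::Int(k)) if name == carrier => Some(-*k),
            _ => None,
        },
        _ => None,
    }
}
```

Under these assumptions, `p = p + 1` yields `Some(1)` and `p = p - 2` yields `Some(-2)`, while `p = p + n` yields `None` and would fall under the variable-step rejection.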
**Carrier and Payload**:
- **Carrier**: Loop variable used in loop condition
- **Payload**: State updated on non-continue path (e.g., result string)
**Exit Contract**:
- `has_continue = true` (continue pattern exists)
- `has_return = true` (early return exists)
- Both must coexist
#### Unsupported (Fail-Fast)
The following patterns are rejected with explicit error messages:
- [ ] Multiple continue patterns (2+ continue statements)
- [ ] Nested continue-return (continue inside if inside if)
- [ ] Complex return values (returning multiple fields)
- [ ] Variable step updates (escape sequence handling, etc.)
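The accepted structure and the fail-fast list above could be expressed as a single contract check. The sketch below assumes a hypothetical `LoopSummary` struct and hypothetical error strings; neither is taken from the Pattern4 lowerer, and the real error tags may look different.

```rust
// Hypothetical summary of one loop body. Field names are illustrative
// and are not taken from the Pattern4 lowerer.
#[derive(Debug, Clone)]
struct LoopSummary {
    continue_count: usize,
    has_return: bool,
    continue_if_depth: usize, // 1 means `if cond { ...; continue }`
    return_is_scalar: bool,
    step_is_constant: bool,
}

/// Checks the accepted minimum structure and fails fast with a reason.
/// The error strings are placeholders, not the project's real error tags.
fn check_pattern4_contract(s: &LoopSummary) -> Result<(), String> {
    // Exit contract: continue and return must coexist.
    if s.continue_count == 0 || !s.has_return {
        return Err("pattern4/contract: continue and return must both be present".into());
    }
    if s.continue_count > 1 {
        return Err(format!(
            "pattern4/contract: {} continue statements found, only one is supported",
            s.continue_count
        ));
    }
    if s.continue_if_depth > 1 {
        return Err("pattern4/contract: nested continue is not supported".into());
    }
    if !s.return_is_scalar {
        return Err("pattern4/contract: complex return values are not supported".into());
    }
    if !s.step_is_constant {
        return Err("pattern4/contract: variable step updates are not supported".into());
    }
    Ok(())
}
```

Returning `Result<(), String>` keeps every rejection explicit, which matches the fail-fast intent: an unsupported loop never reaches lowering silently.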
### Implementation Strategy
**Step 1**: Clarify Pattern4 contract (this document)
**Step 2**: Add E2E test case
**Step 3**: Extend Pattern4 lowerer
**Step 4**: Consider box-ification / modularization
**Step 5**: Implementation and verification
### Progress
- [x] Step 1: Contract clarification
- [ ] Step 2: Add test case
- [ ] Step 3: Extend lowerer
- [ ] Step 4: Consider box-ification
- [ ] Step 5: Verification complete
### Acceptance Criteria
- ✅ Representative test (parse_string or simple_continue) passes JoinIR lowering
- ✅ Execution results are correct in both VM and LLVM (scope to be determined)
- ✅ No regression in existing tests (phase132_exit_phi_parity, etc.)
- ✅ Unsupported patterns fail fast with reason (error_tags)
- ✅ No new environment variables added (dev-only observation only)
- ✅ Documentation updated
### Files to Modify
1. `docs/development/current/main/phases/phase-142/README.md` - Contract documentation
2. `tools/selfhost/test_pattern4_parse_string_lowering.hako` - Minimal E2E test (new)
3. `src/mir/builder/control_flow/joinir/patterns/pattern4_with_continue.rs` - Lowerer extension
4. `src/mir/builder/control_flow/joinir/patterns/pattern4_carrier_analyzer.rs` - Carrier analysis (if needed)
### Step 3-A: Early Return Fail-Fast (COMPLETE ✅)
**Status**: ✅ COMPLETE - Return detection and explicit error implemented
**Implementation**: Added `has_return_in_body()` helper function to Pattern4 lowerer
- Recursively scans loop body for return statements
- Returns explicit Fail-Fast error when return is detected
- Error message references Phase 142 P2 for future lowering
**Test Results**: All 14 canonicalizer tests PASS (no regressions)
**Key Achievement**: Unsafe silent acceptance is now prevented - early returns explicitly surface as errors with actionable messages.
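A recursive return scan of this kind might look like the following sketch. The `Stmt` enum, the `lower_pattern4` wrapper, and the error text are assumptions made for illustration; only the helper name `has_return_in_body()` comes from this document, and the real helper works on the project's own AST node type. Whether the real scan descends into nested loops is also an assumption here.

```rust
// Hypothetical, simplified statement type; the real AST node type differs.
enum Stmt {
    Assign { target: String },
    If { then_body: Vec<Stmt>, else_body: Vec<Stmt> },
    Loop { body: Vec<Stmt> },
    Continue,
    Return,
}

/// Recursively scans a statement list for any `return`.
fn has_return_in_body(body: &[Stmt]) -> bool {
    body.iter().any(|stmt| match stmt {
        Stmt::Return => true,
        Stmt::If { then_body, else_body } => {
            has_return_in_body(then_body) || has_return_in_body(else_body)
        }
        // Assumption: nested loops are scanned as well.
        Stmt::Loop { body } => has_return_in_body(body),
        Stmt::Assign { .. } | Stmt::Continue => false,
    })
}

/// Stand-in for the lowerer entry point: rejects early returns explicitly
/// instead of accepting them silently. The message text is a placeholder.
fn lower_pattern4(body: &[Stmt]) -> Result<(), String> {
    if has_return_in_body(body) {
        return Err(
            "Pattern4: early return in loop body is not supported yet (see Phase 142 P2)"
                .to_string(),
        );
    }
    Ok(())
}
```

The check runs before any lowering work, so a loop with `if cond { return }` surfaces as an error rather than as incorrect generated code.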
### Step 3-B: Return Path JoinIR Generation (DEFERRED)
**Status**: 🔄 DEFERRED for separate session - Large-scale implementation requires careful design
**Why separate**: JoinIR generation involves responsibility boundary decisions (handling in Pattern4 directly vs delegating to Pattern5) and ExitMeta/payload handling. Keeping it in a separate step makes any failure easier to attribute to a single change.
**Design questions to resolve first**:
1. Should return be handled directly in Pattern4 lowerer, or delegated to Pattern5?
2. How should the return payload be transported through exit/boundary/ExitMeta (can we reuse ContinueReturn assets)?
### SSOT References
- **Design**: `docs/development/current/main/design/loop-canonicalizer.md`
- **JoinIR Architecture**: `docs/development/current/main/joinir-architecture-overview.md`
- **Pattern4 Implementation**: `src/mir/builder/control_flow/joinir/patterns/pattern4_with_continue.rs`

@ -383,7 +383,384 @@ cargo test --release --lib loop_canonicalizer --release
---
## P2: parse_array Pattern - Separator + Stop Combo
### Status
✅ Complete (2025-12-16)
### Objective
Extend canonicalizer to recognize parse_array patterns with both `continue` (separator handling) and `return` (stop condition).
### Target Pattern
`tools/selfhost/test_pattern4_parse_array.hako`
```hako
loop(p < len) {
    local ch = s.substring(p, p + 1)
    // Check for array end (return)
    if ch == "]" {
        if elem.length() > 0 {
            arr.push(elem)
        }
        return 0
    }
    // Check for separator (continue)
    if ch == "," {
        if elem.length() > 0 {
            arr.push(elem)
            elem = ""
        }
        p = p + 1
        continue
    }
    // Accumulate element
    elem = elem + ch
    p = p + 1
}
```
### Pattern Characteristics
**Key Features**:
- Multiple exit types: both `return` (stop condition) and `continue` (separator)
- Separator handling: `,` triggers element save and continue
- Stop condition: `]` triggers final save and return
- Same structural pattern as parse_string
**Structure**:
```
loop(cond) {
    // ... body statements (ch computation)
    if stop_cond {          // ']' for array
        // ... save final element
        return result
    }
    if separator_cond {     // ',' for array
        // ... save element, reset accumulator
        carrier = carrier + step
        continue
    }
    // ... accumulate element
    carrier = carrier + step
}
```
### Implementation Summary
#### Key Discovery: Shared Pattern with parse_string
**No new recognizer needed!** The existing `detect_parse_string_pattern()` already handles both patterns:
- Both have `return` statement (stop condition)
- Both have `continue` statement (separator/escape)
- Both have carrier updates
- Only semantic difference is what the conditions check for
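The reason one recognizer covers both loops can be shown with a structural check that never looks at the conditions. This is a simplified sketch, not the real `detect_parse_string_pattern()`: the `Stmt` enum, `is_dual_exit_pattern`, and `parse_loop_body` are hypothetical names introduced for illustration.

```rust
// Hypothetical, simplified AST. Conditions are kept as opaque strings
// because the structural check never inspects them.
enum Stmt {
    Other,  // any statement the check does not care about
    Update, // carrier = carrier + step
    If { cond: String, body: Vec<Stmt> },
    Continue,
    Return,
}

/// True if the body has an `if` ending in `return`, an `if` ending in
/// `continue`, and a trailing carrier update.
fn is_dual_exit_pattern(body: &[Stmt]) -> bool {
    let ends_with = |stmt: &Stmt, want_return: bool| match stmt {
        Stmt::If { body, .. } => match body.last() {
            Some(Stmt::Return) => want_return,
            Some(Stmt::Continue) => !want_return,
            _ => false,
        },
        _ => false,
    };
    let has_return = body.iter().any(|s| ends_with(s, true));
    let has_continue = body.iter().any(|s| ends_with(s, false));
    let has_update = matches!(body.last(), Some(Stmt::Update));
    has_return && has_continue && has_update
}

/// Builds the shared loop shape; only the two condition strings vary.
fn parse_loop_body(stop_cond: &str, separator_cond: &str) -> Vec<Stmt> {
    vec![
        Stmt::Other,
        Stmt::If { cond: stop_cond.to_string(), body: vec![Stmt::Other, Stmt::Return] },
        Stmt::If { cond: separator_cond.to_string(), body: vec![Stmt::Update, Stmt::Continue] },
        Stmt::Other,
        Stmt::Update,
    ]
}
```

Calling `is_dual_exit_pattern` on `parse_loop_body("ch is quote", "ch is backslash")` and on `parse_loop_body("ch is ]", "ch is ,")` gives the same answer, because the `cond` field is never read.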
#### Changes Made
1. **Documentation Updates** (~150 lines)
- Updated `ast_feature_extractor.rs` to document parse_array support
- Updated `pattern_recognizer.rs` wrapper documentation
- Updated `canonicalizer.rs` supported patterns list
- Added parse_array example to pattern documentation
2. **Unit Test** (~165 lines)
- Added `test_parse_array_pattern_recognized()` in `canonicalizer.rs`
- Mirrors parse_string test structure with array-specific conditions
- Verifies same Pattern4Continue routing
3. **Error Messages** (~5 lines)
- Updated error messages to mention parse_array
**Total lines modified**: ~320 lines (mostly documentation)
### Acceptance Criteria
- ✅ Canonicalizer creates Skeleton for parse_array loop
- ✅ RoutingDecision.chosen == Pattern4Continue
- ✅ Strict parity green (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ Unit test added and passing
- ✅ No new capability needed
### Results
#### Parity Verification
```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
tools/selfhost/test_pattern4_parse_array.hako
```
**Output**:
```
[loop_canonicalizer] Skeleton steps: 3
[loop_canonicalizer] Carriers: 1
[loop_canonicalizer] Has exits: true
[loop_canonicalizer] Decision: SUCCESS
[loop_canonicalizer] Chosen pattern: Pattern4Continue
[loop_canonicalizer] Missing caps: []
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern4Continue
```
**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue
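As a rough illustration of what a parity check does, the sketch below compares the canonicalizer's decision with the router's and formats a line in the style of the output above. The `PatternKind` enum, the `check_parity` function, and the mismatch behavior in strict mode are assumptions for this example; the actual implementation is not shown in this document.

```rust
// Hypothetical pattern enum; only `Pattern4Continue` appears in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PatternKind {
    Pattern4Continue,
    Unrouted, // placeholder for "any other decision"
}

/// Compares the canonical decision with the router's actual decision.
/// Assumption: in strict mode a mismatch is an error, otherwise a warning.
fn check_parity(
    func: &str,
    canonical: PatternKind,
    actual: PatternKind,
    strict: bool,
) -> Result<String, String> {
    if canonical == actual {
        Ok(format!(
            "[loop_canonicalizer/PARITY] OK in function '{func}': canonical and actual agree on {canonical:?}"
        ))
    } else if strict {
        Err(format!(
            "[loop_canonicalizer/PARITY] MISMATCH in function '{func}': canonical={canonical:?}, actual={actual:?}"
        ))
    } else {
        Ok(format!(
            "[loop_canonicalizer/PARITY] WARN in function '{func}': canonical={canonical:?}, actual={actual:?}"
        ))
    }
}
```

With both decisions equal to `Pattern4Continue` and `func` set to `main`, the `Ok` string matches the last line of the output shown above.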
#### Unit Test Results
```bash
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_array_pattern_recognized
```
**Status**: ✅ **PASS**
### Statistics
| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_array, shares recognizer with parse_string) |
| Total patterns supported | 5 (skip_whitespace, parse_number, continue, parse_string, parse_array) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~320 (mostly documentation) |
| Files modified | 3 (canonicalizer.rs, ast_feature_extractor.rs, pattern_recognizer.rs) |
| Unit tests added | 1 |
| Parity status | Green ✅ |
### Comparison: Parse String vs Parse Array
| Aspect | Parse String | Parse Array |
|--------|--------------|-------------|
| **Stop condition** | `"` (quote) | `]` (array end) |
| **Separator** | `\` (escape) | `,` (element separator) |
| **Structure** | continue + return | continue + return |
| **Recognizer** | `detect_parse_string_pattern()` | **Same recognizer!** |
| **Routing** | Pattern4Continue | Pattern4Continue |
| **ExitContract** | has_continue=true, has_return=true | has_continue=true, has_return=true |
### Key Insight: Structural vs Semantic Patterns
**Major Discovery**: parse_string and parse_array are **structurally identical** at the AST level:
- Both have `if stop_cond { return }`
- Both have `if separator_cond { continue }`
- Both have carrier updates
The **semantic difference** (what the conditions check) doesn't matter for pattern recognition!
This demonstrates the power of AST-based pattern matching: we can recognize structural patterns without understanding their semantic meaning.
### Follow-up Opportunities
#### Next Steps (Phase 143 P3)
- [ ] Support parse_object pattern (likely also shares the same recognizer!)
- [ ] Document pattern families (structural equivalence classes)
#### Future Enhancements
- [ ] Generalize to "dual-exit patterns" (continue + return)
- [ ] Add corpus analysis to discover more structural equivalences
- [ ] Create pattern taxonomy based on AST structure
### Lessons Learned
1. **Structural Equivalence**: Different semantic patterns can share the same AST structure
2. **Recognizer Reuse**: One recognizer can handle multiple use cases
3. **Documentation > Code**: More documentation changes than code changes
4. **Test Coverage**: Unit tests verify both semantic variants work with the same recognizer
---
## P3: parse_object Pattern - Key-Value Pair Collection
### Status
✅ Complete (2025-12-16)
### Objective
Verify that parse_object pattern (key-value pair collection) is recognized by the existing recognizer, maintaining structural equivalence with parse_string/parse_array.
### Target Pattern
`tools/selfhost/test_pattern4_parse_object.hako`
```hako
loop(p < s.length()) {
    // ... optional body statements
    // Check for object end (return)
    local ch = s.substring(p, p+1)
    if ch == "}" {
        return obj  // Stop: object complete
    }
    // Check for separator (continue)
    if ch == "," {
        p = p + 1
        continue    // Separator: continue to next key-value pair
    }
    // Regular processing
    p = p + 1
}
```
### Pattern Characteristics
**Key Features**:
- Multiple exit types: both `return` (stop condition) and `continue` (separator)
- Separator handling: `,` triggers continue to next pair
- Stop condition: `}` triggers return with result
- **Same structural pattern as parse_string/parse_array**
**Structure**:
```
loop(cond) {
    // ... body statements (ch computation)
    if stop_cond {          // '}' for object
        return result
    }
    if separator_cond {     // ',' for object
        carrier = carrier + step
        continue
    }
    // ... regular processing
    carrier = carrier + step
}
```
### Implementation Summary
#### Key Discovery: Complete Structural Equivalence
**No new recognizer needed!** The existing `detect_parse_string_pattern()` handles parse_object perfectly:
- Has `return` statement (stop condition: `}`)
- Has `continue` statement (separator: `,`)
- Has carrier updates (`p = p + 1`)
- Only semantic difference is the stop/separator characters
**Pattern Family Confirmed**: parse_string, parse_array, and parse_object are **structurally identical**.
#### Changes Made
1. **Test File Creation** (~50 lines)
- Created `tools/selfhost/test_pattern4_parse_object.hako`
- Minimal test demonstrating parse_object loop structure
2. **Unit Test** (~170 lines)
- Added `test_parse_object_pattern_recognized()` in `canonicalizer.rs`
- Mirrors parse_array test structure with object-specific conditions (`}` and `,`)
- Verifies same Pattern4Continue routing
3. **Documentation** (this section)
**Total implementation**: ~220 lines (no new recognizer code needed!)
### Acceptance Criteria
- ✅ Canonicalizer creates Skeleton for parse_object loop
- ✅ RoutingDecision.chosen == Pattern4Continue
- ✅ RoutingDecision.missing_caps == []
- ✅ Strict parity green (canonicalizer and router agree)
- ✅ Default behavior unchanged
- ✅ Unit test added and passing
- ✅ No new capability needed
- ✅ **No new recognizer needed** (existing recognizer handles it)
### Results
#### Parity Verification
```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
tools/selfhost/test_pattern4_parse_object.hako
```
**Output**:
```
[loop_canonicalizer] Chosen pattern: Pattern4Continue
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern4Continue
[loop_canonicalizer/PARITY] OK in function 'Main.parse_object_loop/0': canonical and actual agree on Pattern4Continue
```
**Status**: ✅ **Green parity** - canonicalizer and router agree on Pattern4Continue
#### Unit Test Results
```bash
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_parse_object_pattern_recognized
```
**Status**: ✅ **PASS**
### Statistics
| Metric | Count |
|--------|-------|
| New patterns supported | 1 (parse_object, shares recognizer with parse_string/array) |
| Total patterns supported | 6 (skip_whitespace, parse_number, continue, parse_string, parse_array, parse_object) |
| New Capability Tags | 0 (uses existing ConstStep) |
| Lines added | ~220 (test file + unit test + docs) |
| Files modified | 2 (canonicalizer.rs, new test file) |
| Unit tests added | 1 |
| Parity status | Green ✅ |
| **New recognizer code** | **0 lines** (complete reuse!) |
### Comparison: Parse String vs Parse Array vs Parse Object
| Aspect | Parse String | Parse Array | Parse Object |
|--------|--------------|-------------|--------------|
| **Stop condition** | `"` (quote) | `]` (array end) | `}` (object end) |
| **Separator** | `\` (escape) | `,` (element separator) | `,` (pair separator) |
| **Structure** | continue + return | continue + return | continue + return |
| **Recognizer** | `detect_parse_string_pattern()` | **Same** | **Same** |
| **Routing** | Pattern4Continue | Pattern4Continue | Pattern4Continue |
| **ExitContract** | has_continue=true, has_return=true | **Same** | **Same** |
### Key Insight: Structural Pattern Family
**Major Discovery**: parse_string, parse_array, and parse_object form a **structural pattern family**:
- All have `if stop_cond { return }`
- All have `if separator_cond { continue }`
- All have carrier updates
- **One recognizer handles all three!**
The semantic differences (string quote vs array bracket vs object brace) are invisible at the AST structure level.
**Implication**: AST-based pattern matching creates natural pattern families. When we implement one pattern, we often get multiple variants "for free".
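One way to make pattern families concrete is to group loops by a structural signature. The sketch below is an assumption-laden illustration: the `Signature` struct and `group_by_signature` function are hypothetical and not part of the canonicalizer, and the three boolean fields are only one possible choice of signature.

```rust
use std::collections::BTreeMap;

// Hypothetical structural signature; the field choice is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Signature {
    has_continue: bool,
    has_return: bool,
    const_step: bool,
}

/// Groups loop names by signature. Loops that share a signature form
/// one structural family and can share one recognizer.
fn group_by_signature(loops: &[(&str, Signature)]) -> BTreeMap<Signature, Vec<String>> {
    let mut families: BTreeMap<Signature, Vec<String>> = BTreeMap::new();
    for (name, sig) in loops {
        families.entry(*sig).or_default().push(name.to_string());
    }
    families
}
```

If parse_string, parse_array, and parse_object are all described as `has_continue = true`, `has_return = true`, `const_step = true`, they land in a single map entry, which is the "one recognizer handles all three" observation in data form.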
### Coverage Expansion Summary
Phase 143 started with 2 patterns (skip_whitespace, continue) and expanded to 6 patterns:
- P0: Added parse_number (new recognizer)
- P1: Added parse_string (new recognizer)
- P2: Added parse_array (**reused parse_string recognizer**)
- P3: Added parse_object (**reused parse_string recognizer**)
**Recognizer efficiency**: 2 new recognizers → 4 new patterns supported!
### Follow-up Opportunities
#### Next Steps (Phase 144+)
- [ ] Document pattern families in design docs
- [ ] Add corpus analysis to discover more structural equivalences
- [ ] Create pattern taxonomy based on AST structure
- [ ] Explore other potential pattern families
#### Future Enhancements
- [ ] Generalize to "dual-exit patterns" (continue + return)
- [ ] Support triple-exit patterns (break + continue + return)
- [ ] Add signature-based pattern discovery
### Lessons Learned
1. **Pattern Families**: Structural equivalence creates natural groupings
2. **Recognizer Reuse**: Testing existing recognizers before writing new ones saves effort
3. **Semantic vs Structural**: AST patterns are structural; semantic meaning doesn't affect recognition
4. **Test-Driven Discovery**: Unit tests verify recognizer generality
5. **Documentation Value**: Recording discoveries helps future pattern work
---
**Phase 143 P0: Complete**
**Phase 143 P1: Complete**
**Phase 143 P2: Complete**
**Phase 143 P3: Complete**
**Date**: 2025-12-16
**Implemented by**: Claude Code (Sonnet 4.5)