# Phase 142: Canonicalizer Pattern Extension
## Status
- P0: ✅ Complete (trim leading/trailing)
- P1: ✅ Complete (continue pattern)
- P2: 🔄 In progress (Pattern4 lowering extension)
## P0: trim leading/trailing (COMPLETE)
### Objective
Extend Canonicalizer to recognize trim leading/trailing patterns, enabling proper routing through the normalized loop pipeline.
### Target Patterns
- `tools/selfhost/test_pattern3_trim_leading.hako` - `start = start + 1` pattern
- `tools/selfhost/test_pattern3_trim_trailing.hako` - `end = end - 1` pattern
### Acceptance Criteria (All Met ✅)
- ✅ Canonicalizer creates Skeleton for trim_leading/trailing
- ✅ `decision.chosen == Pattern2Break` (ExitContract priority)
- ✅ `decision.missing_caps == []` (no missing capabilities)
- ✅ Strict parity green (`NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1`)
- ✅ Default behavior unchanged
- ✅ Unit tests added
- ✅ Documentation created
### Implementation Summary
#### 1. Pattern Recognizer Generalization
**File**: `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs`
**Changes**:
- Extended `detect_skip_whitespace_pattern()` to accept both `+` and `-` operators
- Added support for negative deltas (e.g., `-1` for `end = end - 1`)
- Maintained backward compatibility with existing skip_whitespace patterns
**Key Logic**:
```rust
// Phase 142 P0: Accept both Add (+1) and Subtract (-1)
let op_multiplier = match operator {
    BinaryOperator::Add => 1,
    BinaryOperator::Subtract => -1,
    _ => return None,
};

// Calculate delta with sign (e.g., +1 or -1)
let delta = const_val * op_multiplier;
```
**Recognized Patterns**:
- skip_whitespace: `p = p + 1` (delta = +1)
- trim_leading: `start = start + 1` (delta = +1)
- trim_trailing: `end = end - 1` (delta = -1)
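
To make the sign handling concrete, here is a minimal, self-contained sketch of the same idea. The `BinaryOperator` enum below is a hypothetical stand-in (the real type lives in the hakorune AST and has more variants), and `signed_delta` is an illustrative helper name, not a function from the codebase.

```rust
/// Hypothetical stand-in for the AST operator type (assumption, not the real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinaryOperator {
    Add,
    Subtract,
    Multiply,
}

/// Returns the signed step for `carrier = carrier <op> const_val`,
/// or `None` when the operator is neither `+` nor `-`.
pub fn signed_delta(operator: BinaryOperator, const_val: i64) -> Option<i64> {
    let op_multiplier = match operator {
        BinaryOperator::Add => 1,
        BinaryOperator::Subtract => -1,
        // Any other operator is not a simple increment/decrement.
        _ => return None,
    };
    Some(const_val * op_multiplier)
}
```

Under these assumptions, `signed_delta(BinaryOperator::Add, 1)` corresponds to `p = p + 1` and `start = start + 1`, while `signed_delta(BinaryOperator::Subtract, 1)` corresponds to `end = end - 1`.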
#### 2. Unit Tests
**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`
**Added Tests**:
- `test_trim_leading_pattern_recognized()` - Verifies `start = start + 1` pattern
- `test_trim_trailing_pattern_recognized()` - Verifies `end = end - 1` pattern
**Test Coverage**:
- Skeleton creation
- Carrier slot creation with correct delta (+1 or -1)
- ExitContract setup (has_break=true)
- RoutingDecision (chosen=Pattern2Break, missing_caps=[])
**Test Results**:
```
running 2 tests
test mir::loop_canonicalizer::canonicalizer::tests::test_trim_leading_pattern_recognized ... ok
test mir::loop_canonicalizer::canonicalizer::tests::test_trim_trailing_pattern_recognized ... ok
test result: ok. 2 passed; 0 failed; 0 ignored
```
#### 3. Manual Verification
**Strict Parity Check**:
```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern3_trim_leading.hako
```
**Output** (trim_leading):
```
[loop_canonicalizer] Decision: SUCCESS
[loop_canonicalizer] Chosen pattern: Pattern2Break
[loop_canonicalizer] Missing caps: []
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern2Break
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern2Break
```
**Output** (trim_trailing):
```
[loop_canonicalizer] Decision: SUCCESS
[loop_canonicalizer] Chosen pattern: Pattern2Break
[loop_canonicalizer] Missing caps: []
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern2Break
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern2Break
```
### Design Principles Applied
#### Box-First Modularization
- Extended existing `detect_skip_whitespace_pattern()` instead of creating new functions
- Maintained SSOT (Single Source of Truth) architecture
- Preserved delegation pattern through `pattern_recognizer.rs` wrapper
#### Incremental Implementation
- Focused on recognizer generalization only
- Did not modify routing or lowering logic
- Kept scope minimal (P0 only)
#### ExitContract Priority
- Pattern choice determined by ExitContract (has_break=true)
- Routes to Pattern2Break (not Pattern3IfPhi)
- Consistent with existing SSOT policy
### Files Modified
1. `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs` (+35 lines, improved comments)
2. `src/mir/loop_canonicalizer/canonicalizer.rs` (+178 lines, 2 new tests)
### Statistics
- **Total changes**: +213 lines
- **Unit tests**: 2 new tests (100% pass)
- **Manual tests**: 2 patterns verified (strict parity green)
- **Build status**: ✅ No errors, no warnings (lib)
### SSOT References
- **Design**: `docs/development/current/main/design/loop-canonicalizer.md`
- **JoinIR Architecture**: `docs/development/current/main/joinir-architecture-overview.md`
- **Pattern Detection**: `ast_feature_extractor.rs` (Phase 140-P4-A SSOT)
### Known Limitations
- Pattern2 variable promotion (A-3 Trim promotion) not yet implemented
- This is expected - Phase 142 P0 only targets recognizer extension
- Promotion will be addressed in future phases
### Next Steps (Future Phases)
- Phase 142 P1: Implement A-3 Trim promotion in Pattern2 handler
- Phase 142 P2: Extend to other loop patterns (Pattern 3/4)
- Phase 142 P3: Add more complex carrier update patterns
### Verification Commands
```bash
# Unit tests
cargo test --release loop_canonicalizer::canonicalizer::tests::test_trim --lib

# Manual verification (trim_leading)
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern3_trim_leading.hako

# Manual verification (trim_trailing)
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern3_trim_trailing.hako
```
### Conclusion
Phase 142 P0 successfully extends the Canonicalizer to recognize trim leading/trailing patterns. The implementation:
- Maintains SSOT architecture
- Passes all unit tests
- Achieves strict parity agreement
- Preserves existing behavior
- Sets foundation for future pattern extensions
All acceptance criteria met. ✅
---
## P1: continue pattern (COMPLETE)
### Objective
Extend Canonicalizer to recognize continue patterns, enabling proper routing through the normalized loop pipeline.
### Target Pattern
- `tools/selfhost/test_pattern4_simple_continue.hako` - Simple continue pattern with carrier update
### Acceptance Criteria (All Met ✅)
- ✅ Canonicalizer creates Skeleton for continue pattern
- ✅ `decision.chosen == Pattern4Continue` (router agreement)
- ✅ `decision.missing_caps == []` (no missing capabilities)
- ✅ Strict parity green (`NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1`)
- ✅ Default behavior unchanged
- ✅ Unit tests added
- ✅ Documentation updated
### Implementation Summary
#### 1. Continue Pattern Detection
**File**: `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs`
**New Function**: `detect_continue_pattern()`
**Pattern Structure**:
```nyash
loop(cond) {
    // ... optional body statements (Body)
    if skip_cond {
        carrier = carrier + const  // Optional update before continue
        continue
    }
    // ... rest of body statements (Rest)
    carrier = carrier + const  // Carrier update
}
```
**Example** (from test_pattern4_simple_continue.hako):
```nyash
loop(i < n) {
    if is_even == 1 {
        i = i + 1  // Update before continue
        continue
    }
    sum = sum + i  // Rest statements
    i = i + 1      // Carrier update
}
```
**Key Logic**:
- Finds if statement containing continue in then_body
- Extracts body statements before the if
- Extracts rest statements after the if
- Detects carrier update (last statement in rest_stmts)
- Returns `ContinuePatternInfo` with carrier name, delta, body_stmts, and rest_stmts
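
The detection steps above can be sketched on a toy AST. Everything in this block is hypothetical: `Stmt`, `ContinueInfoSketch`, and `detect_continue_pattern_sketch` are illustrative names, the real recognizer operates on hakorune's own AST nodes, and the choice to store `rest_stmts` without the trailing carrier update is an assumption made for this sketch.

```rust
/// Toy statement type (hypothetical; not the hakorune AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// `name = name + delta` with a constant, possibly negative, delta.
    Update { name: String, delta: i64 },
    /// Any other statement, kept opaque as source text.
    Other(String),
    /// `if <cond> { then_body }` without an else branch.
    If { then_body: Vec<Stmt> },
    Continue,
}

/// Illustrative result type mirroring the fields listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct ContinueInfoSketch {
    pub carrier_name: String,
    pub delta: i64,
    pub body_stmts: Vec<Stmt>,
    pub rest_stmts: Vec<Stmt>,
}

pub fn detect_continue_pattern_sketch(body: &[Stmt]) -> Option<ContinueInfoSketch> {
    // 1. Find the first `if` whose then-branch contains `continue`.
    let if_idx = body.iter().position(|s| {
        matches!(s, Stmt::If { then_body } if then_body.contains(&Stmt::Continue))
    })?;

    // 2. Statements before the `if` form the Body part.
    let body_stmts = body[..if_idx].to_vec();

    // 3. Statements after the `if`: the last one must be the carrier update.
    let (last, rest) = body[if_idx + 1..].split_last()?;
    let Stmt::Update { name, delta } = last else {
        return None;
    };

    Some(ContinueInfoSketch {
        carrier_name: name.clone(),
        delta: *delta,
        body_stmts,
        // Assumption of this sketch: the carrier update is not repeated here.
        rest_stmts: rest.to_vec(),
    })
}
```

For the `test_pattern4_simple_continue.hako` loop shown earlier, this sketch would report carrier `i` with delta `1`, an empty `body_stmts`, and a single rest statement (`sum = sum + i`). A loop with no `continue` inside an `if`, or with no trailing constant-step update, yields `None`.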
#### 2. Canonicalizer Integration
**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`
**Changes**:
- Added `try_extract_continue_pattern()` call before skip_whitespace check
- Build skeleton with continue pattern structure
- Set `ExitContract` with `has_continue=true, has_break=false`
- Route to `Pattern4Continue`
**Skeleton Structure**:
1. HeaderCond - Loop condition
2. Body - Optional body statements before continue check
3. Body - Rest statements (excluding carrier update)
4. Update - Carrier update step
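
As a rough illustration of the four-step layout, the sketch below assembles a skeleton from already-extracted parts. `SkeletonStep` and `build_continue_skeleton` are hypothetical names, statements are represented as plain strings for brevity, and emitting a `Body` step even when it is empty is an assumption of this sketch rather than confirmed behavior.

```rust
/// Hypothetical, simplified skeleton step (statements kept as source text).
#[derive(Debug, Clone, PartialEq)]
pub enum SkeletonStep {
    HeaderCond(String),
    Body(Vec<String>),
    Update { carrier: String, delta: i64 },
}

/// Builds the Header + Body + Body + Update layout described above.
pub fn build_continue_skeleton(
    cond: &str,
    body_stmts: &[&str],
    rest_stmts: &[&str],
    carrier: &str,
    delta: i64,
) -> Vec<SkeletonStep> {
    vec![
        // 1. Loop condition.
        SkeletonStep::HeaderCond(cond.to_string()),
        // 2. Statements before the continue check (may be empty).
        SkeletonStep::Body(body_stmts.iter().map(|s| s.to_string()).collect()),
        // 3. Rest statements, excluding the carrier update.
        SkeletonStep::Body(rest_stmts.iter().map(|s| s.to_string()).collect()),
        // 4. Carrier update.
        SkeletonStep::Update { carrier: carrier.to_string(), delta },
    ]
}
```

Calling `build_continue_skeleton("i < n", &[], &["sum = sum + i"], "i", 1)` yields four steps, in the order listed above.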
#### 3. Module Re-exports
**Files Modified** (re-export chain):
- `src/mir/builder/control_flow/joinir/patterns/mod.rs` - Added `detect_continue_pattern`, `ContinuePatternInfo`
- `src/mir/builder/control_flow/joinir/mod.rs` - Re-export to joinir level
- `src/mir/builder/control_flow/mod.rs` - Re-export to control_flow level
- `src/mir/builder.rs` - Re-export to builder level
- `src/mir/mod.rs` - Re-export to crate level
**Pattern**: Followed existing SSOT pattern from Phase 140-P4-A
#### 4. Pattern Recognizer Wrapper
**File**: `src/mir/loop_canonicalizer/pattern_recognizer.rs`
**New Function**: `try_extract_continue_pattern()`
- Delegates to `detect_continue_pattern()` from ast_feature_extractor
- Returns tuple: `(carrier_name, delta, body_stmts, rest_stmts)`
- Maintains backward compatibility with existing callsites
#### 5. Unit Tests
**File**: `src/mir/loop_canonicalizer/canonicalizer.rs`
**Added Test**: `test_simple_continue_pattern_recognized()`
- Builds AST: `loop(i < n) { if is_even { i = i + 1; continue } sum = sum + i; i = i + 1 }`
- Verifies skeleton creation with correct structure
- Checks carrier slot (name="i", delta=1)
- Validates ExitContract (has_continue=true, has_break=false)
- Confirms routing decision (Pattern4Continue, missing_caps=[])
**Test Results**:
```
running 8 tests
test mir::loop_canonicalizer::canonicalizer::tests::test_simple_continue_pattern_recognized ... ok
test result: ok. 8 passed; 0 failed; 0 ignored
```
#### 6. Manual Verification
**Strict Parity Check**:
```bash
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern4_simple_continue.hako
```
**Output**:
```
[loop_canonicalizer] Function: main
[loop_canonicalizer] Skeleton steps: 4
[loop_canonicalizer] Carriers: 1
[loop_canonicalizer] Has exits: true
[loop_canonicalizer] Decision: SUCCESS
[loop_canonicalizer] Chosen pattern: Pattern4Continue
[loop_canonicalizer] Missing caps: []
[choose_pattern_kind/PARITY] OK: canonical and actual agree on Pattern4Continue
[loop_canonicalizer/PARITY] OK in function 'main': canonical and actual agree on Pattern4Continue
```
**Status**: ✅ Strict parity green!
### Design Principles Applied
#### Box-First Modularization
- Created dedicated `detect_continue_pattern()` function in ast_feature_extractor
- Maintained SSOT architecture with proper re-export chain
- Followed existing pattern from skip_whitespace detection
#### Incremental Implementation
- Focused on pattern recognition only (P1 scope)
- Did not modify lowering logic (promotion errors during lowering are therefore expected at this stage)
- Kept changes minimal and focused
#### ExitContract Priority
- Pattern choice determined by ExitContract (has_continue=true, has_break=false)
- Routes to Pattern4Continue (not Pattern2 or Pattern3)
- Consistent with existing SSOT policy from Phase 137-5
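
The ExitContract-driven choice documented in P0 and P1 can be summarized in a small sketch. The types and the `choose_by_exit_contract` function are hypothetical simplifications; only the two cases stated in this README are encoded, and every other combination is deliberately left undecided (`None`) because this document does not specify it.

```rust
/// Hypothetical, simplified exit contract.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ExitContract {
    pub has_break: bool,
    pub has_continue: bool,
}

/// Only the pattern kinds discussed in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PatternKind {
    Pattern2Break,
    Pattern4Continue,
}

/// Chooses a pattern from the exit contract alone.
pub fn choose_by_exit_contract(exits: ExitContract) -> Option<PatternKind> {
    match (exits.has_break, exits.has_continue) {
        // P0: trim/skip_whitespace loops exit via break.
        (true, false) => Some(PatternKind::Pattern2Break),
        // P1: continue without break.
        (false, true) => Some(PatternKind::Pattern4Continue),
        // Not covered by this sketch (assumption: decided elsewhere).
        _ => None,
    }
}
```

The point of the design is that the routing decision depends on which exits the loop has, not on the shape of the `if` inside it, which is why the trim loops route to Pattern2Break rather than Pattern3IfPhi.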
### Files Modified
1. `src/mir/builder/control_flow/joinir/patterns/ast_feature_extractor.rs` (+167 lines, new function)
2. `src/mir/loop_canonicalizer/pattern_recognizer.rs` (+35 lines, wrapper function)
3. `src/mir/loop_canonicalizer/canonicalizer.rs` (+103 lines, continue support + unit test)
4. `src/mir/builder/control_flow/joinir/patterns/mod.rs` (+3 lines, re-export)
5. `src/mir/builder/control_flow/joinir/mod.rs` (+3 lines, re-export)
6. `src/mir/builder/control_flow/mod.rs` (+3 lines, re-export)
7. `src/mir/builder.rs` (+2 lines, re-export)
8. `src/mir/mod.rs` (+2 lines, re-export)
### Statistics
- **Total changes**: +318 lines
- **Unit tests**: 1 new test (100% pass)
- **All canonicalizer tests**: 8 passed (100%)
- **Manual tests**: 1 pattern verified (strict parity green)
- **Build status**: ✅ No errors (warnings are pre-existing)
### SSOT References
- **Design**: `docs/development/current/main/design/loop-canonicalizer.md`
- **JoinIR Architecture**: `docs/development/current/main/joinir-architecture-overview.md`
- **Pattern Detection**: `ast_feature_extractor.rs` (Phase 140-P4-A SSOT)
### Known Limitations
- Pattern4 variable promotion (A-3 Trim, A-4 DigitPos) does not yet handle this pattern
- This is expected - Phase 142 P1 only targets recognizer extension
- Promotion will be addressed when Pattern4 lowering is enhanced
### Next Steps (Future Phases)
- Phase 142 P2: Extend Pattern4 lowering to handle recognized continue patterns
- Phase 142 P3: Add more complex continue patterns (multiple carriers, nested conditions)
### Verification Commands
```bash
# Unit tests
cargo test --release --lib loop_canonicalizer::canonicalizer::tests::test_simple_continue_pattern_recognized

# All canonicalizer tests
cargo test --release --lib loop_canonicalizer::canonicalizer::tests

# Manual verification
NYASH_JOINIR_DEV=1 HAKO_JOINIR_STRICT=1 ./target/release/hakorune \
  tools/selfhost/test_pattern4_simple_continue.hako
```
### Conclusion
Phase 142 P1 successfully extends the Canonicalizer to recognize continue patterns. The implementation:
- Maintains SSOT architecture
- Passes all unit tests (8/8)
- Achieves strict parity agreement with router
- Preserves existing behavior
- Follows existing re-export pattern from Phase 140-P4-A
All acceptance criteria met. ✅
---
## P2: Pattern4 Lowering Extension (IN PROGRESS)
### Objective
Extend Pattern4 lowering to handle "continue + return" patterns found in parse_string/array/object.
### Target Pattern
- `tools/selfhost/test_pattern4_parse_string.hako` - Parse string with continue (escape) + return (quote)
### Pattern4 Lowering Contract (Phase 142 P2)
#### Accepted Minimum Structure
**Return Handling**:
- **Position**: Early return inside one or more if blocks
- **Type**: Scalar return values (complex returns are out of scope)
- **Constraint**: Only the last return in loop body is processed
**Continue Side Updates**:
- **Pattern**: `if cond { carrier = carrier ± 1; continue }`
- **Update**: Constant step only (+1, -1, +2, -2, etc.)
- **Constraint**: Multiple carriers not yet supported
**Carrier and Payload**:
- **Carrier**: Loop variable used in loop condition
- **Payload**: State updated on non-continue path (e.g., result string)
**Exit Contract**:
- `has_continue = true` (continue pattern exists)
- `has_return = true` (early return exists)
- Both must coexist
#### Unsupported (Fail-Fast)
The following patterns are rejected with explicit error messages:
- [ ] Multiple continue patterns (2+ continue statements)
- [ ] Nested continue-return (continue inside if inside if)
- [ ] Complex return values (returning multiple fields)
- [ ] Variable step updates (escape sequence handling, etc.)
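
One way such a fail-fast gate could look is sketched below. `Pattern4Features`, `check_pattern4_contract`, and the error strings (including the `[pattern4/unsupported]` prefix) are all hypothetical; the actual error tags and the way the lowerer gathers these facts are not specified in this document.

```rust
/// Hypothetical summary of loop features relevant to the P2 contract.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Pattern4Features {
    pub continue_count: usize,
    pub nested_continue_return: bool,
    pub complex_return_value: bool,
    pub variable_step_update: bool,
}

/// Rejects every unsupported shape with an explicit reason (illustrative messages).
pub fn check_pattern4_contract(f: &Pattern4Features) -> Result<(), String> {
    if f.continue_count >= 2 {
        return Err(format!(
            "[pattern4/unsupported] {} continue statements (at most 1 supported)",
            f.continue_count
        ));
    }
    if f.nested_continue_return {
        return Err("[pattern4/unsupported] nested continue-return".to_string());
    }
    if f.complex_return_value {
        return Err("[pattern4/unsupported] complex return value".to_string());
    }
    if f.variable_step_update {
        return Err("[pattern4/unsupported] variable step update".to_string());
    }
    Ok(())
}
```

Returning a `Result` with a reason, instead of silently falling through to another pattern, is what makes the rejection observable to the caller.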
### Implementation Strategy
**Step 1**: Clarify Pattern4 contract (this document)
**Step 2**: Add E2E test case
**Step 3**: Extend Pattern4 lowerer
**Step 4**: Consider box-ification / modularization
**Step 5**: Implementation and verification
### Progress
- [x] Step 1: Contract clarification
- [ ] Step 2: Add test case
- [ ] Step 3: Extend lowerer
- [ ] Step 4: Consider box-ification
- [ ] Step 5: Verification complete
### Acceptance Criteria
- ✅ Representative test (parse_string or simple_continue) passes JoinIR lowering
- ✅ Execution results are correct in both VM and LLVM (scope to be determined)
- ✅ No regression in existing tests (phase132_exit_phi_parity, etc.)
- ✅ Unsupported patterns fail fast with reason (error_tags)
- ✅ No new environment variables added (dev-only observation only)
- ✅ Documentation updated
### Files to Modify
1. `docs/development/current/main/phases/phase-142/README.md` - Contract documentation
2. `tools/selfhost/test_pattern4_parse_string_lowering.hako` - Minimal E2E test (new)
3. `src/mir/builder/control_flow/joinir/patterns/pattern4_with_continue.rs` - Lowerer extension
4. `src/mir/builder/control_flow/joinir/patterns/pattern4_carrier_analyzer.rs` - Carrier analysis (if needed)
### Step 3-A: Early Return Fail-Fast (COMPLETE ✅)
**Status**: ✅ COMPLETE - Return detection and explicit error implemented
**Implementation**: Added `has_return_in_body()` helper function to Pattern4 lowerer
- Recursively scans loop body for return statements
- Returns explicit Fail-Fast error when return is detected
- Error message references Phase 142 P2 for future lowering
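
A recursive scan of this kind might look like the following sketch. The `Node` type is a toy AST, and `has_return_in_body_sketch` and `reject_early_return` are hypothetical names; whether the real `has_return_in_body()` also descends into else branches and nested loops is an assumption here, as is the wording of the error.

```rust
/// Toy AST node (hypothetical; not the hakorune AST).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Return,
    Continue,
    If { then_body: Vec<Node>, else_body: Option<Vec<Node>> },
    Loop { body: Vec<Node> },
    Other,
}

/// Returns true if any statement, at any nesting depth, is a `return`.
pub fn has_return_in_body_sketch(body: &[Node]) -> bool {
    body.iter().any(|node| match node {
        Node::Return => true,
        Node::If { then_body, else_body } => {
            has_return_in_body_sketch(then_body)
                || else_body
                    .as_deref()
                    .map_or(false, |b| has_return_in_body_sketch(b))
        }
        // Assumption: a return inside a nested loop also counts.
        Node::Loop { body } => has_return_in_body_sketch(body),
        Node::Continue | Node::Other => false,
    })
}

/// Fail-fast gate: surface early returns as an explicit error (illustrative message).
pub fn reject_early_return(body: &[Node]) -> Result<(), String> {
    if has_return_in_body_sketch(body) {
        return Err("early return in loop body is not lowered yet (see Phase 142 P2)".to_string());
    }
    Ok(())
}
```

With this shape, a `return` nested two `if` levels deep is still detected, while a body containing only `continue` passes the gate.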
**Test Results**: All 14 canonicalizer tests PASS (no regressions)
**Key Achievement**: Unsafe silent acceptance is now prevented - early returns explicitly surface as errors with actionable messages.
### Step 3-B: Return Path JoinIR Generation (DEFERRED)
**Status**: 🔄 DEFERRED to a separate session. This is a large-scale implementation that requires careful design.
**Why separate**: JoinIR generation involves responsibility-boundary decisions (handling in Pattern4 directly vs. delegating to Pattern5) and ExitMeta/payload handling. Keeping it separate makes root-cause analysis cleaner if something breaks.
**Design questions to resolve first**:
1. Should return be handled directly in Pattern4 lowerer, or delegated to Pattern5?
2. How should the return payload be transported through exit/boundary/ExitMeta (can the ContinueReturn assets be reused)?
### SSOT References
- **Design**: `docs/development/current/main/design/loop-canonicalizer.md`
- **JoinIR Architecture**: `docs/development/current/main/joinir-architecture-overview.md`
- **Pattern4 Implementation**: `src/mir/builder/control_flow/joinir/patterns/pattern4_with_continue.rs`