Extends DigitPos promotion to generate TWO carriers for Pattern A/B support: - Boolean carrier (is_digit_pos) for break conditions - Integer carrier (digit_value) for NumberAccumulation ## Implementation 1. **DigitPosPromoter** (loop_body_digitpos_promoter.rs) - Generates dual carriers: is_<var> (bool) + <base>_value (int) - Smart naming: "digit_pos" → "digit" (removes "_pos" suffix) 2. **UpdateEnv** (update_env.rs) - Context-aware promoted variable resolution - Priority: <base>_value (int) → is_<var> (bool) → standard - Pass promoted_loopbodylocals from CarrierInfo 3. **Integration** (loop_with_break_minimal.rs) - UpdateEnv constructor updated to pass promoted list ## Test Results - **Before**: 925 tests PASS - **After**: 931 tests PASS (+6 new tests, 0 failures) ## New Tests - test_promoted_variable_resolution_digit_pos - Full dual-value - test_promoted_variable_resolution_fallback_to_bool - Fallback - test_promoted_variable_not_a_carrier - Error handling ## Impact | Pattern | Before | After | |---------|--------|-------| | _parse_number | ✅ Works (bool only) | ✅ Works (bool used, int unused) | | _atoi | ❌ Failed (missing int) | ✅ READY (int carrier available!) | 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
436 lines
13 KiB
Markdown
436 lines
13 KiB
Markdown
# Phase 246-EX: JsonParser _atoi JoinIR Integration
|
|
|
|
## 0. Executive Summary
|
|
|
|
**Goal**: Integrate JsonParser's `_atoi` function into the JoinIR loop lowering system using the existing NumberAccumulation infrastructure.
|
|
|
|
**Status**: Step 0 - Infrastructure confirmed, design document created
|
|
|
|
**Key Finding**: ✅ `UpdateRhs::NumberAccumulation` already exists (Phase 190) and fully supports the `result = result * 10 + digit_pos` pattern!
|
|
|
|
---
|
|
|
|
## 1. _atoi Loop Structure Analysis
|
|
|
|
### Source Code (from `apps/json/jsonparser.hako`)
|
|
|
|
```nyash
|
|
box JsonParser {
|
|
method _atoi(s, len) {
|
|
local result = 0
|
|
local digits = "0123456789"
|
|
local i = 0
|
|
|
|
loop(i < len) {
|
|
local ch = s.substring(i, i + 1)
|
|
local digit_pos = digits.indexOf(ch)
|
|
if digit_pos < 0 { break }
|
|
result = result * 10 + digit_pos
|
|
i = i + 1
|
|
}
|
|
|
|
return result
|
|
}
|
|
}
|
|
```
|
|
|
|
### Loop Components
|
|
|
|
#### Loop Header
|
|
- **Condition**: `i < len`
|
|
- **Type**: Simple comparison (supported by ExprLowerer)
|
|
|
|
#### Loop Variables
|
|
| Variable | Type | Role | Initial Value |
|
|
|----------|------|------|---------------|
|
|
| `i` | position | LoopState (counter) | 0 |
|
|
| `result` | accumulator | LoopState (number accumulation) | 0 |
|
|
|
|
#### Loop Body
|
|
1. **Local declarations**:
|
|
- `ch = s.substring(i, i + 1)` - current character
|
|
- `digit_pos = digits.indexOf(ch)` - digit value (-1 if not digit)
|
|
|
|
2. **Break condition**:
|
|
- `if digit_pos < 0 { break }` - exit on non-digit character
|
|
|
|
3. **Updates**:
|
|
- `result = result * 10 + digit_pos` - number accumulation (NumberAccumulation pattern)
|
|
- `i = i + 1` - position increment (Const increment)
|
|
|
|
#### Captured Variables (Function Parameters)
|
|
- `s` - input string to parse
|
|
- `len` - length of string to process
|
|
- `digits` - pre-loop local (digit lookup string "0123456789")
|
|
|
|
---
|
|
|
|
## 2. Pattern Classification
|
|
|
|
### Pattern Type: **Pattern 2 (Break)**
|
|
|
|
**Rationale**:
|
|
- Single if-break structure
|
|
- No continue statements
|
|
- Early exit condition (non-digit character)
|
|
- Two carriers with different update patterns
|
|
|
|
### Carriers
|
|
|
|
| Carrier | Role | Update Pattern | UpdateExpr Variant |
|
|
|---------|------|----------------|-------------------|
|
|
| `i` | Loop counter | `i = i + 1` | `UpdateExpr::Const(1)` |
|
|
| `result` | Number accumulator | `result = result * 10 + digit_pos` | `UpdateExpr::BinOp { rhs: NumberAccumulation { base: 10, digit_var: "digit_pos" } }` |
|
|
|
|
---
|
|
|
|
## 3. UpdateExpr Infrastructure Confirmation (Phase 190)
|
|
|
|
### 3.1 Existing NumberAccumulation Support
|
|
|
|
**Location**: `/home/tomoaki/git/hakorune-selfhost/src/mir/join_ir/lowering/loop_update_analyzer.rs`
|
|
|
|
#### UpdateRhs Enum (lines 46-64)
|
|
```rust
|
|
pub enum UpdateRhs {
|
|
Const(i64),
|
|
Variable(String),
|
|
StringLiteral(String),
|
|
/// Phase 190: Number accumulation pattern: result = result * base + digit
|
|
NumberAccumulation {
|
|
base: i64,
|
|
digit_var: String,
|
|
},
|
|
Other,
|
|
}
|
|
```
|
|
|
|
✅ **NumberAccumulation variant exists!**
|
|
|
|
### 3.2 Detection Logic (lines 157-192)
|
|
|
|
**Pattern Recognition**:
|
|
```rust
|
|
// Detects: (carrier * base) + digit
|
|
if matches!(operator, BinaryOperator::Add | BinaryOperator::Subtract) {
|
|
if let ASTNode::BinaryOp {
|
|
operator: BinaryOperator::Multiply,
|
|
left: mul_left,
|
|
right: mul_right,
|
|
..
|
|
} = left.as_ref() {
|
|
// Check if multiplication is: carrier * base
|
|
if mul_lhs_name == carrier_name {
|
|
if let ASTNode::Literal { value: LiteralValue::Integer(base), .. } = mul_right.as_ref() {
|
|
if let Some(digit_var) = Self::extract_variable_name(right) {
|
|
// NumberAccumulation pattern detected!
|
|
return Some(UpdateExpr::BinOp {
|
|
lhs: carrier_name.to_string(),
|
|
op,
|
|
rhs: UpdateRhs::NumberAccumulation { base: *base, digit_var },
|
|
});
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
✅ **Exactly matches our pattern: `result = result * 10 + digit_pos`**
|
|
|
|
### 3.3 Emission Logic (carrier_update_emitter.rs)
|
|
|
|
**Location**: `/home/tomoaki/git/hakorune-selfhost/src/mir/join_ir/lowering/carrier_update_emitter.rs` (lines 139-170)
|
|
|
|
```rust
|
|
UpdateRhs::NumberAccumulation { base, digit_var } => {
|
|
// Step 1: Emit const for base
|
|
let base_id = alloc_value();
|
|
instructions.push(JoinInst::Compute(MirLikeInst::Const {
|
|
dst: base_id,
|
|
value: ConstValue::Integer(*base),
|
|
}));
|
|
|
|
// Step 2: Emit multiplication: tmp = carrier * base
|
|
let tmp_id = alloc_value();
|
|
instructions.push(JoinInst::Compute(MirLikeInst::BinOp {
|
|
dst: tmp_id,
|
|
op: BinOpKind::Mul,
|
|
lhs: carrier_param,
|
|
rhs: base_id,
|
|
}));
|
|
|
|
// Step 3: Resolve digit variable
|
|
let digit_id = env.resolve(digit_var).ok_or_else(...)?;
|
|
|
|
// Step 4: Emit addition: result = tmp + digit
|
|
let result = alloc_value();
|
|
instructions.push(JoinInst::Compute(MirLikeInst::BinOp {
|
|
dst: result,
|
|
op: *op, // Add or Subtract
|
|
lhs: tmp_id,
|
|
rhs: digit_id,
|
|
}));
|
|
}
|
|
```
|
|
|
|
✅ **Complete emission infrastructure ready!**
|
|
|
|
### 3.4 Test Coverage
|
|
|
|
**Existing tests** (loop_update_analyzer.rs):
|
|
- ✅ `test_analyze_number_accumulation_base10()` - decimal pattern
|
|
- ✅ `test_analyze_number_accumulation_base2()` - binary pattern
|
|
- ✅ `test_analyze_number_accumulation_wrong_lhs()` - validation
|
|
|
|
**Existing tests** (carrier_update_emitter.rs):
|
|
- ✅ `test_emit_number_accumulation_base10()` - JoinIR emission
|
|
- ✅ `test_emit_number_accumulation_digit_not_found()` - error handling
|
|
|
|
---
|
|
|
|
## 4. Infrastructure Confirmation Results
|
|
|
|
### ✅ Checklist
|
|
|
|
- [x] `UpdateRhs::NumberAccumulation` variant exists
|
|
- [x] `result = result * 10 + digit_pos` pattern fully supported
|
|
- [x] Detection logic handles nested BinaryOp (Mul inside Add)
|
|
- [x] Emission logic generates correct JoinIR sequence
|
|
- [x] Unit tests cover base10 and base2 patterns
|
|
- [x] Error handling for missing digit variable
|
|
|
|
### 🎯 No Extensions Needed!
|
|
|
|
The existing Phase 190 infrastructure is **complete and ready** for _atoi integration. No modifications to UpdateExpr, detection, or emission logic required.
|
|
|
|
---
|
|
|
|
## 5. Integration Requirements
|
|
|
|
### 5.1 Condition Expression Support
|
|
|
|
#### Header Condition: `i < len`
|
|
- **Type**: Simple comparison
|
|
- **Lowering**: ExprLowerer with ConditionEnv
|
|
- **Required vars in env**: `i`, `len`
|
|
|
|
#### Break Condition: `digit_pos < 0`
|
|
- **Type**: Simple comparison
|
|
- **Lowering**: ExprLowerer or digitpos_condition_normalizer
|
|
- **Required vars in env**: `digit_pos`
|
|
|
|
### 5.2 Variable Capture Requirements
|
|
|
|
#### Function Parameters (must be in ConditionEnv)
|
|
- `s` - input string
|
|
- `len` - string length
|
|
|
|
#### Loop-Local Variables (LoopBodyLocal detection required)
|
|
- `digits` - digit lookup string (pre-loop local)
|
|
- `ch` - current character (body-local)
|
|
- `digit_pos` - digit position (body-local, used in break condition)
|
|
|
|
### 5.3 UpdateEnv Resolution
|
|
|
|
The UpdateEnv must resolve:
|
|
1. **Carriers**: `i`, `result` (from carrier params)
|
|
2. **Condition vars**: `len` (from function params)
|
|
3. **Body-locals**: `digit_pos` (from loop body)
|
|
|
|
---
|
|
|
|
## 6. Test Plan
|
|
|
|
### 6.1 E2E Test Cases
|
|
|
|
**File**: `apps/tests/json_atoi_smoke.hako`
|
|
|
|
| Input | Expected Output | Test Case |
|
|
|-------|----------------|-----------|
|
|
| `"0"` | `0` | Single digit zero |
|
|
| `"42"` | `42` | Two digits |
|
|
| `"123"` | `123` | Multiple digits |
|
|
| `"007"` | `7` | Leading zeros |
|
|
| `"123abc"` | `123` | Break at non-digit |
|
|
| `"abc"` | `0` | Immediate break (no digits) |
|
|
| `""` | `0` | Empty string |
|
|
|
|
### 6.2 JoinIR Structure Tests
|
|
|
|
**Verify**:
|
|
1. **UpdateExpr detection**:
|
|
- `i = i + 1` → `UpdateExpr::Const(1)`
|
|
- `result = result * 10 + digit_pos` → `UpdateExpr::BinOp { rhs: NumberAccumulation { base: 10, digit_var: "digit_pos" } }`
|
|
|
|
2. **CarrierInfo**:
|
|
- Both `i` and `result` marked as LoopState
|
|
- Correct initial values (both 0)
|
|
|
|
3. **ExitMeta**:
|
|
- Contains `("i", ...)` and `("result", ...)`
|
|
- Exit ValueIds available for function return
|
|
|
|
4. **JoinIR instructions**:
|
|
- Mul+Add sequence emitted for `result` update
|
|
- Correct base const (10)
|
|
- Correct digit_pos variable resolution
|
|
|
|
### 6.3 MIR Dump Verification
|
|
|
|
**Commands**:
|
|
```bash
|
|
# Basic MIR structure
|
|
./target/release/hakorune --dump-mir apps/tests/json_atoi_smoke.hako
|
|
|
|
# Detailed MIR with effects
|
|
./target/release/hakorune --dump-mir --mir-verbose --mir-verbose-effects apps/tests/json_atoi_smoke.hako
|
|
|
|
# JSON format for detailed analysis
|
|
./target/release/hakorune --emit-mir-json mir.json apps/tests/json_atoi_smoke.hako
|
|
jq '.functions[] | select(.name == "_atoi") | .blocks' mir.json
|
|
```
|
|
|
|
**Expected MIR patterns**:
|
|
- Const instruction for base 10
|
|
- Mul instruction: `%tmp = %result * %base`
|
|
- Add instruction: `%result_next = %tmp + %digit_pos`
|
|
- PHI nodes for both carriers at loop header
|
|
|
|
---
|
|
|
|
## 7. Implementation Strategy (Recommended Steps)
|
|
|
|
### Step 1: Minimal Smoke Test
|
|
- Create simple test case with hardcoded digit loop
|
|
- Verify NumberAccumulation detection works
|
|
- Confirm JoinIR emission is correct
|
|
|
|
### Step 2: LoopBodyLocal Detection
|
|
- Ensure `digit_pos` is detected as loop body local
|
|
- Verify it's available in UpdateEnv during emission
|
|
- Test break condition lowering with `digit_pos < 0`
|
|
|
|
### Step 3: Function Parameter Capture
|
|
- Verify `s` and `len` are captured correctly
|
|
- Test ConditionEnv resolution for header condition `i < len`
|
|
- Ensure ExprLowerer can access function params
|
|
|
|
### Step 4: Full _atoi Integration
|
|
- Test complete `_atoi` function from jsonparser.hako
|
|
- Verify all E2E test cases pass
|
|
- Check MIR dump for correct structure
|
|
|
|
### Step 5: Edge Cases
|
|
- Empty string handling
|
|
- Single character strings
|
|
- Non-digit immediate break
|
|
- Large numbers (overflow consideration)
|
|
|
|
---
|
|
|
|
## 8. Potential Challenges
|
|
|
|
### 8.1 LoopBodyLocal Detection
|
|
|
|
**Issue**: `digit_pos` must be recognized as a loop body local variable that is:
|
|
1. Declared inside loop body (`local digit_pos = ...`)
|
|
2. Used in break condition (`if digit_pos < 0`)
|
|
3. Available in UpdateEnv during result update emission
|
|
|
|
**Solution**: Existing LoopBodyLocalDetector should handle this (Phase 184).
|
|
|
|
### 8.2 Method Call in Loop Body
|
|
|
|
**Issue**: `s.substring(i, i + 1)` and `digits.indexOf(ch)` are method calls with multiple arguments.
|
|
|
|
**Solution**: LoopBodyLocal lowering should handle method calls as "complex expressions" (Phase 184 already supports this via `ExprLowerer`).
|
|
|
|
### 8.3 UpdateEnv Variable Resolution
|
|
|
|
**Issue**: UpdateEnv must resolve variables from three sources:
|
|
- Carriers: `i`, `result`
|
|
- Function params: `s`, `len`
|
|
- Body-locals: `ch`, `digit_pos`
|
|
|
|
**Solution**: Phase 184's `UpdateEnv::new(&cond_env, &body_env)` should handle this if:
|
|
- ConditionEnv contains function params + carriers
|
|
- LoopBodyLocalEnv contains body-local ValueIds
|
|
|
|
---
|
|
|
|
## 9. Success Criteria
|
|
|
|
### ✅ Phase 246-EX Complete When:
|
|
|
|
1. **Detection**: `result = result * 10 + digit_pos` correctly identified as NumberAccumulation
|
|
2. **Emission**: Mul+Add JoinIR sequence generated with correct ValueIds
|
|
3. **Execution**: All E2E test cases produce correct numeric outputs
|
|
4. **Validation**: MIR dump shows expected loop structure with PHI nodes
|
|
5. **Integration**: Works within full JsonParser context (not just isolated test)
|
|
|
|
---
|
|
|
|
## 10. References
|
|
|
|
### Phase 190: NumberAccumulation Infrastructure
|
|
- Detection: `/src/mir/join_ir/lowering/loop_update_analyzer.rs` (lines 157-192)
|
|
- Emission: `/src/mir/join_ir/lowering/carrier_update_emitter.rs` (lines 139-170)
|
|
- Tests: Both files contain comprehensive unit tests
|
|
|
|
### Phase 184: UpdateEnv and LoopBodyLocal
|
|
- UpdateEnv: `/src/mir/join_ir/lowering/update_env.rs`
|
|
- LoopBodyLocal detection: (search for LoopBodyLocalDetector)
|
|
|
|
### Related Phases
|
|
- Phase 176-2: Carrier update emission basics
|
|
- Phase 178: String literal updates (similar multi-carrier pattern)
|
|
- Phase 197: Loop update analyzer extraction
|
|
|
|
---
|
|
|
|
## Appendix A: Quick Reference
|
|
|
|
### Pattern Match: _atoi Loop
|
|
|
|
```
|
|
Loop Type: Pattern 2 (Break)
|
|
Header: i < len
|
|
Break: digit_pos < 0
|
|
Carriers: i (counter), result (accumulator)
|
|
Body-locals: ch, digit_pos
|
|
Captured: s, len, digits (pre-loop)
|
|
|
|
UpdateExpr:
|
|
i -> Const(1)
|
|
result -> BinOp { op: Add, rhs: NumberAccumulation { base: 10, digit_var: "digit_pos" } }
|
|
```
|
|
|
|
### JoinIR Emission Sequence (NumberAccumulation)
|
|
|
|
```
|
|
%base_10 = Const(10)
|
|
%tmp = BinOp(Mul, %result_param, %base_10)
|
|
%digit_id = <resolved from UpdateEnv>
|
|
%result_next = BinOp(Add, %tmp, %digit_id)
|
|
```
|
|
|
|
### Debug Commands
|
|
|
|
```bash
|
|
# Trace variable mapping
|
|
NYASH_TRACE_VARMAP=1 cargo test --release test_json_atoi -- --nocapture
|
|
|
|
# JoinIR debug
|
|
NYASH_JOINIR_DEBUG=1 ./target/release/hakorune apps/tests/json_atoi_smoke.hako 2>&1 | grep "\[trace:"
|
|
|
|
# UpdateExpr detection (if logging added)
|
|
NYASH_UPDATE_ANALYZER_DEBUG=1 ./target/release/hakorune apps/tests/json_atoi_smoke.hako
|
|
```
|
|
|
|
---
|
|
|
|
**Document Status**: ✅ Infrastructure confirmed, ready for implementation
|
|
**Next Step**: Phase 246-EX Step 1 - Minimal smoke test creation
|
|
**Last Updated**: 2025-12-11 (Phase 246-EX Step 0 completion)
|