# Phase 246-EX: JsonParser _atoi JoinIR Integration
## 0. Executive Summary
**Goal**: Integrate JsonParser's `_atoi` function into the JoinIR loop lowering system using the existing NumberAccumulation infrastructure.
**Status**: Step 0 - Infrastructure confirmed, design document created
**Key Finding**: ✅ `UpdateRhs::NumberAccumulation` already exists (Phase 190) and fully supports the `result = result * 10 + digit_pos` pattern!
---
## 1. _atoi Loop Structure Analysis
### Source Code (from `apps/json/jsonparser.hako`)
```nyash
box JsonParser {
    method _atoi(s, len) {
        local result = 0
        local digits = "0123456789"
        local i = 0
        loop(i < len) {
            local ch = s.substring(i, i + 1)
            local digit_pos = digits.indexOf(ch)
            if digit_pos < 0 { break }
            result = result * 10 + digit_pos
            i = i + 1
        }
        return result
    }
}
```
### Loop Components
#### Loop Header
- **Condition**: `i < len`
- **Type**: Simple comparison (supported by ExprLowerer)
#### Loop Variables
| Variable | Type | Role | Initial Value |
|----------|------|------|---------------|
| `i` | position | LoopState (counter) | 0 |
| `result` | accumulator | LoopState (number accumulation) | 0 |
#### Loop Body
1. **Local declarations**:
   - `ch = s.substring(i, i + 1)` - current character
   - `digit_pos = digits.indexOf(ch)` - digit value (-1 if not a digit)
2. **Break condition**:
   - `if digit_pos < 0 { break }` - exit on non-digit character
3. **Updates**:
   - `result = result * 10 + digit_pos` - number accumulation (NumberAccumulation pattern)
   - `i = i + 1` - position increment (Const increment)
#### Captured Variables (Function Parameters and Pre-Loop Locals)
- `s` - input string to parse (function parameter)
- `len` - length of string to process (function parameter)
- `digits` - digit lookup string "0123456789" (pre-loop local, not a parameter)
---
## 2. Pattern Classification
### Pattern Type: **Pattern 2 (Break)**
**Rationale**:
- Single if-break structure
- No continue statements
- Early exit condition (non-digit character)
- Two carriers with different update patterns
### Carriers
| Carrier | Role | Update Pattern | UpdateExpr Variant |
|---------|------|----------------|-------------------|
| `i` | Loop counter | `i = i + 1` | `UpdateExpr::Const(1)` |
| `result` | Number accumulator | `result = result * 10 + digit_pos` | `UpdateExpr::BinOp { rhs: NumberAccumulation { base: 10, digit_var: "digit_pos" } }` |
---
## 3. UpdateExpr Infrastructure Confirmation (Phase 190)
### 3.1 Existing NumberAccumulation Support
**Location**: `/home/tomoaki/git/hakorune-selfhost/src/mir/join_ir/lowering/loop_update_analyzer.rs`
#### UpdateRhs Enum (lines 46-64)
```rust
pub enum UpdateRhs {
    Const(i64),
    Variable(String),
    StringLiteral(String),
    /// Phase 190: Number accumulation pattern: result = result * base + digit
    NumberAccumulation {
        base: i64,
        digit_var: String,
    },
    Other,
}
```
**NumberAccumulation variant exists!**
### 3.2 Detection Logic (lines 157-192)
**Pattern Recognition**:
```rust
// Detects: (carrier * base) + digit (Subtract is accepted as well)
// Excerpt: `mul_lhs_name` and `op` are bound in code elided from this listing.
if matches!(operator, BinaryOperator::Add | BinaryOperator::Subtract) {
    if let ASTNode::BinaryOp {
        operator: BinaryOperator::Multiply,
        left: mul_left,
        right: mul_right,
        ..
    } = left.as_ref() {
        // Check if multiplication is: carrier * base
        if mul_lhs_name == carrier_name {
            if let ASTNode::Literal { value: LiteralValue::Integer(base), .. } = mul_right.as_ref() {
                if let Some(digit_var) = Self::extract_variable_name(right) {
                    // NumberAccumulation pattern detected!
                    return Some(UpdateExpr::BinOp {
                        lhs: carrier_name.to_string(),
                        op,
                        rhs: UpdateRhs::NumberAccumulation { base: *base, digit_var },
                    });
                }
            }
        }
    }
}
```
**Exactly matches our pattern: `result = result * 10 + digit_pos`**
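The detection rule can be tried in isolation with a small sketch. Everything below is a simplified stand-in and not the real analyzer: the toy `Expr` type replaces `ASTNode`, `classify` is a hypothetical name, and only `Add` is handled (the real code also accepts `Subtract`).

```rust
// Toy AST and update types: simplified stand-ins, not the real Hakorune definitions.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum UpdateRhs {
    Const(i64),
    NumberAccumulation { base: i64, digit_var: String },
}

/// Classify the right-hand side of `carrier = <expr>` (hypothetical helper).
fn classify(carrier: &str, expr: &Expr) -> Option<UpdateRhs> {
    // Only additions are considered in this sketch.
    let Expr::Add(left, right) = expr else { return None };
    match (&**left, &**right) {
        // (carrier * base) + digit_var
        (Expr::Mul(mul_left, mul_right), Expr::Var(digit_var)) => {
            match (&**mul_left, &**mul_right) {
                (Expr::Var(name), Expr::Int(base)) if name.as_str() == carrier => {
                    Some(UpdateRhs::NumberAccumulation {
                        base: *base,
                        digit_var: digit_var.clone(),
                    })
                }
                // Multiplication of something other than the carrier: reject.
                _ => None,
            }
        }
        // carrier + constant
        (Expr::Var(name), Expr::Int(n)) if name.as_str() == carrier => {
            Some(UpdateRhs::Const(*n))
        }
        _ => None,
    }
}

/// Small helpers for building expressions.
fn var(name: &str) -> Expr {
    Expr::Var(name.to_string())
}

/// The `_atoi` accumulator update: result * 10 + digit_pos
fn atoi_result_update() -> Expr {
    Expr::Add(
        Box::new(Expr::Mul(Box::new(var("result")), Box::new(Expr::Int(10)))),
        Box::new(var("digit_pos")),
    )
}
```

Calling `classify("result", &atoi_result_update())` yields the `NumberAccumulation` variant with `base: 10` and `digit_var: "digit_pos"`, while passing any other carrier name for the same expression yields `None`, which mirrors the `mul_lhs_name == carrier_name` check above.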
### 3.3 Emission Logic (carrier_update_emitter.rs)
**Location**: `/home/tomoaki/git/hakorune-selfhost/src/mir/join_ir/lowering/carrier_update_emitter.rs` (lines 139-170)
```rust
UpdateRhs::NumberAccumulation { base, digit_var } => {
    // Step 1: Emit const for base
    let base_id = alloc_value();
    instructions.push(JoinInst::Compute(MirLikeInst::Const {
        dst: base_id,
        value: ConstValue::Integer(*base),
    }));
    // Step 2: Emit multiplication: tmp = carrier * base
    let tmp_id = alloc_value();
    instructions.push(JoinInst::Compute(MirLikeInst::BinOp {
        dst: tmp_id,
        op: BinOpKind::Mul,
        lhs: carrier_param,
        rhs: base_id,
    }));
    // Step 3: Resolve digit variable
    let digit_id = env.resolve(digit_var).ok_or_else(...)?;
    // Step 4: Emit addition: result = tmp + digit
    let result = alloc_value();
    instructions.push(JoinInst::Compute(MirLikeInst::BinOp {
        dst: result,
        op: *op, // Add or Subtract
        lhs: tmp_id,
        rhs: digit_id,
    }));
}
```
**Complete emission infrastructure ready!**
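To see that the four steps compute `carrier * base + digit`, the following sketch emits the same Const/Mul/Add sequence into a toy instruction list and interprets it. The `Inst` type, `emit_number_accumulation`, `run`, and `step` are all hypothetical simplifications; the real emitter works on `JoinInst`/`MirLikeInst` and an `UpdateEnv`, and this sketch handles only the `Add` case.

```rust
use std::collections::HashMap;

// Toy stand-ins for JoinIR values and instructions (hypothetical, simplified).
type ValueId = u32;

#[derive(Debug, PartialEq)]
enum Inst {
    Const { dst: ValueId, value: i64 },
    Mul { dst: ValueId, lhs: ValueId, rhs: ValueId },
    Add { dst: ValueId, lhs: ValueId, rhs: ValueId },
}

/// Emit `carrier * base + digit` and return the ValueId of the new carrier value.
fn emit_number_accumulation(
    carrier_param: ValueId,
    base: i64,
    digit_var: &str,
    env: &HashMap<String, ValueId>,
    next_id: &mut ValueId,
    out: &mut Vec<Inst>,
) -> Result<ValueId, String> {
    let mut alloc = || {
        let id = *next_id;
        *next_id += 1;
        id
    };
    // Step 1: const for the base
    let base_id = alloc();
    out.push(Inst::Const { dst: base_id, value: base });
    // Step 2: tmp = carrier * base
    let tmp_id = alloc();
    out.push(Inst::Mul { dst: tmp_id, lhs: carrier_param, rhs: base_id });
    // Step 3: resolve the digit variable, failing if it is unknown
    let digit_id = *env
        .get(digit_var)
        .ok_or_else(|| format!("digit variable '{}' not found", digit_var))?;
    // Step 4: result = tmp + digit
    let result_id = alloc();
    out.push(Inst::Add { dst: result_id, lhs: tmp_id, rhs: digit_id });
    Ok(result_id)
}

/// Minimal interpreter for the toy instructions.
fn run(insts: &[Inst], regs: &mut HashMap<ValueId, i64>) {
    for inst in insts {
        match inst {
            Inst::Const { dst, value } => {
                regs.insert(*dst, *value);
            }
            Inst::Mul { dst, lhs, rhs } => {
                let v = regs[lhs] * regs[rhs];
                regs.insert(*dst, v);
            }
            Inst::Add { dst, lhs, rhs } => {
                let v = regs[lhs] + regs[rhs];
                regs.insert(*dst, v);
            }
        }
    }
}

/// One loop iteration: ValueId 0 holds `result`, ValueId 1 holds `digit_pos`.
fn step(result: i64, digit: i64) -> i64 {
    let env = HashMap::from([("digit_pos".to_string(), 1)]);
    let (mut next_id, mut insts) = (2, Vec::new());
    let out_id =
        emit_number_accumulation(0, 10, "digit_pos", &env, &mut next_id, &mut insts).unwrap();
    let mut regs = HashMap::from([(0, result), (1, digit)]);
    run(&insts, &mut regs);
    regs[&out_id]
}
```

For example, `step(12, 3)` evaluates the emitted sequence to `123`. As in the excerpt above, the digit variable is resolved only after the Const and Mul instructions have been pushed, so a failed lookup returns an error with two instructions already emitted.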
### 3.4 Test Coverage
**Existing tests** (loop_update_analyzer.rs):
- `test_analyze_number_accumulation_base10()` - decimal pattern
- `test_analyze_number_accumulation_base2()` - binary pattern
- `test_analyze_number_accumulation_wrong_lhs()` - validation
**Existing tests** (carrier_update_emitter.rs):
- `test_emit_number_accumulation_base10()` - JoinIR emission
- `test_emit_number_accumulation_digit_not_found()` - error handling
---
## 4. Infrastructure Confirmation Results
### ✅ Checklist
- [x] `UpdateRhs::NumberAccumulation` variant exists
- [x] `result = result * 10 + digit_pos` pattern fully supported
- [x] Detection logic handles nested BinaryOp (Mul inside Add)
- [x] Emission logic generates correct JoinIR sequence
- [x] Unit tests cover base10 and base2 patterns
- [x] Error handling for missing digit variable
### 🎯 No Extensions Needed!
The existing Phase 190 infrastructure is **complete and ready** for _atoi integration. No modifications to UpdateExpr, detection, or emission logic required.
---
## 5. Integration Requirements
### 5.1 Condition Expression Support
#### Header Condition: `i < len`
- **Type**: Simple comparison
- **Lowering**: ExprLowerer with ConditionEnv
- **Required vars in env**: `i`, `len`
#### Break Condition: `digit_pos < 0`
- **Type**: Simple comparison
- **Lowering**: ExprLowerer or digitpos_condition_normalizer
- **Required vars in env**: `digit_pos`
### 5.2 Variable Capture Requirements
#### Function Parameters (must be in ConditionEnv)
- `s` - input string
- `len` - string length
#### Local Variables
- `digits` - digit lookup string (pre-loop local, captured rather than body-local)
- `ch` - current character (body-local, LoopBodyLocal detection required)
- `digit_pos` - digit position (body-local, LoopBodyLocal detection required; used in break condition)
### 5.3 UpdateEnv Resolution
The UpdateEnv must resolve:
1. **Carriers**: `i`, `result` (from carrier params)
2. **Condition vars**: `len` (from function params)
3. **Body-locals**: `digit_pos` (from loop body)
---
## 6. Test Plan
### 6.1 E2E Test Cases
**File**: `apps/tests/json_atoi_smoke.hako`
| Input | Expected Output | Test Case |
|-------|----------------|-----------|
| `"0"` | `0` | Single digit zero |
| `"42"` | `42` | Two digits |
| `"123"` | `123` | Multiple digits |
| `"007"` | `7` | Leading zeros |
| `"123abc"` | `123` | Break at non-digit |
| `"abc"` | `0` | Immediate break (no digits) |
| `""` | `0` | Empty string |
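The expected outputs in this table can be cross-checked against a reference model of the loop. The sketch below transcribes `_atoi` into Rust; `atoi_model` and `CASES` are hypothetical names, and the model assumes ASCII input and `len <= s.len()` so that byte slicing stands in for `substring`.

```rust
/// Reference model of `_atoi(s, len)`.
/// Assumes ASCII input and `len <= s.len()`, so byte offsets equal character positions.
fn atoi_model(s: &str, len: usize) -> i64 {
    let digits = "0123456789";
    let mut result: i64 = 0;
    let mut i = 0;
    while i < len {
        // ch = s.substring(i, i + 1)
        let ch = &s[i..i + 1];
        // digit_pos = digits.indexOf(ch), with -1 for "not found"
        let digit_pos = match digits.find(ch) {
            Some(pos) => pos as i64,
            None => -1,
        };
        if digit_pos < 0 {
            break;
        }
        result = result * 10 + digit_pos;
        i += 1;
    }
    result
}

/// The E2E cases from the table: (input, expected output).
const CASES: [(&str, i64); 7] = [
    ("0", 0),
    ("42", 42),
    ("123", 123),
    ("007", 7),
    ("123abc", 123),
    ("abc", 0),
    ("", 0),
];
```

Running `atoi_model(input, input.len())` over `CASES` reproduces every expected value, which confirms the table is consistent with the source loop. Whether the lowered JoinIR matches it is what the smoke test itself has to establish.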
### 6.2 JoinIR Structure Tests
**Verify**:
1. **UpdateExpr detection**:
   - `i = i + 1` → `UpdateExpr::Const(1)`
   - `result = result * 10 + digit_pos` → `UpdateExpr::BinOp { rhs: NumberAccumulation { base: 10, digit_var: "digit_pos" } }`
2. **CarrierInfo**:
   - Both `i` and `result` marked as LoopState
   - Correct initial values (both 0)
3. **ExitMeta**:
   - Contains `("i", ...)` and `("result", ...)`
   - Exit ValueIds available for function return
4. **JoinIR instructions**:
   - Mul+Add sequence emitted for `result` update
   - Correct base const (10)
   - Correct digit_pos variable resolution
### 6.3 MIR Dump Verification
**Commands**:
```bash
# Basic MIR structure
./target/release/hakorune --dump-mir apps/tests/json_atoi_smoke.hako
# Detailed MIR with effects
./target/release/hakorune --dump-mir --mir-verbose --mir-verbose-effects apps/tests/json_atoi_smoke.hako
# JSON format for detailed analysis
./target/release/hakorune --emit-mir-json mir.json apps/tests/json_atoi_smoke.hako
jq '.functions[] | select(.name == "_atoi") | .blocks' mir.json
```
**Expected MIR patterns**:
- Const instruction for base 10
- Mul instruction: `%tmp = %result * %base`
- Add instruction: `%result_next = %tmp + %digit_pos`
- PHI nodes for both carriers at loop header
---
## 7. Implementation Strategy (Recommended Steps)
### Step 1: Minimal Smoke Test
- Create simple test case with hardcoded digit loop
- Verify NumberAccumulation detection works
- Confirm JoinIR emission is correct
### Step 2: LoopBodyLocal Detection
- Ensure `digit_pos` is detected as loop body local
- Verify it's available in UpdateEnv during emission
- Test break condition lowering with `digit_pos < 0`
### Step 3: Function Parameter Capture
- Verify `s` and `len` are captured correctly
- Test ConditionEnv resolution for header condition `i < len`
- Ensure ExprLowerer can access function params
### Step 4: Full _atoi Integration
- Test complete `_atoi` function from jsonparser.hako
- Verify all E2E test cases pass
- Check MIR dump for correct structure
### Step 5: Edge Cases
- Empty string handling
- Single character strings
- Non-digit immediate break
- Large numbers (overflow consideration)
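For the overflow item, it helps to know where a 64-bit accumulator stops fitting. This document does not state how Hakorune integers behave on overflow, so the sketch below only marks the boundary, assuming an `i64` accumulator (matching the `i64` base in `UpdateRhs::NumberAccumulation`). `checked_step` and `checked_atoi` are hypothetical helpers.

```rust
/// One accumulation step with overflow detection: result * base + digit.
fn checked_step(result: i64, base: i64, digit: i64) -> Option<i64> {
    result.checked_mul(base)?.checked_add(digit)
}

/// Decimal accumulation over leading ASCII digits; `None` signals i64 overflow.
fn checked_atoi(s: &str) -> Option<i64> {
    let mut result: i64 = 0;
    for b in s.bytes() {
        if !b.is_ascii_digit() {
            break; // same early exit as `if digit_pos < 0 { break }`
        }
        result = checked_step(result, 10, (b - b'0') as i64)?;
    }
    Some(result)
}
```

Under this assumption, `"9223372036854775807"` (`i64::MAX`) is the largest input that still accumulates exactly; `"9223372036854775808"` overflows in the final addition, and any 20-digit input overflows in the multiplication. Inputs around that boundary are natural candidates for the large-number test.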
---
## 8. Potential Challenges
### 8.1 LoopBodyLocal Detection
**Issue**: `digit_pos` must be recognized as a loop body local variable that is:
1. Declared inside loop body (`local digit_pos = ...`)
2. Used in break condition (`if digit_pos < 0`)
3. Available in UpdateEnv during result update emission
**Solution**: Existing LoopBodyLocalDetector should handle this (Phase 184).
### 8.2 Method Call in Loop Body
**Issue**: `s.substring(i, i + 1)` and `digits.indexOf(ch)` are method calls in the loop body; `substring` takes two arguments, one of which is itself an expression (`i + 1`).
**Solution**: LoopBodyLocal lowering should handle method calls as "complex expressions" (Phase 184 already supports this via `ExprLowerer`).
### 8.3 UpdateEnv Variable Resolution
**Issue**: UpdateEnv must resolve variables from three sources:
- Carriers: `i`, `result`
- Function params: `s`, `len`
- Body-locals: `ch`, `digit_pos`
**Solution**: Phase 184's `UpdateEnv::new(&cond_env, &body_env)` should handle this if:
- ConditionEnv contains function params + carriers
- LoopBodyLocalEnv contains body-local ValueIds
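The two-layer resolution described here can be sketched as follows. `UpdateEnvSketch` is a hypothetical stand-in for the real `UpdateEnv`; the plain `HashMap` layers and the lookup order (condition env first, then body-locals) are assumptions made for illustration only.

```rust
use std::collections::HashMap;

type ValueId = u32;

/// Hypothetical two-layer resolver: the condition env holds function params
/// and carriers, the body env holds loop-body locals.
struct UpdateEnvSketch<'a> {
    cond_env: &'a HashMap<String, ValueId>,
    body_env: &'a HashMap<String, ValueId>,
}

impl<'a> UpdateEnvSketch<'a> {
    fn new(
        cond_env: &'a HashMap<String, ValueId>,
        body_env: &'a HashMap<String, ValueId>,
    ) -> Self {
        Self { cond_env, body_env }
    }

    /// Look up `name` in the condition env first, then in the body-locals.
    fn resolve(&self, name: &str) -> Option<ValueId> {
        self.cond_env
            .get(name)
            .or_else(|| self.body_env.get(name))
            .copied()
    }
}

/// Example environments for the `_atoi` loop; the ValueIds are arbitrary.
fn atoi_envs() -> (HashMap<String, ValueId>, HashMap<String, ValueId>) {
    let cond_env = HashMap::from([
        ("s".to_string(), 0),
        ("len".to_string(), 1),
        ("i".to_string(), 2),
        ("result".to_string(), 3),
    ]);
    let body_env = HashMap::from([("ch".to_string(), 10), ("digit_pos".to_string(), 11)]);
    (cond_env, body_env)
}
```

With these environments, `resolve("digit_pos")` falls through to the body-local layer, and an unknown name returns `None`, which is the case the `digit_not_found` emitter test exercises.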
---
## 9. Success Criteria
### ✅ Phase 246-EX Complete When:
1. **Detection**: `result = result * 10 + digit_pos` correctly identified as NumberAccumulation
2. **Emission**: Mul+Add JoinIR sequence generated with correct ValueIds
3. **Execution**: All E2E test cases produce correct numeric outputs
4. **Validation**: MIR dump shows expected loop structure with PHI nodes
5. **Integration**: Works within full JsonParser context (not just isolated test)
---
## 10. References
### Phase 190: NumberAccumulation Infrastructure
- Detection: `/src/mir/join_ir/lowering/loop_update_analyzer.rs` (lines 157-192)
- Emission: `/src/mir/join_ir/lowering/carrier_update_emitter.rs` (lines 139-170)
- Tests: Both files contain comprehensive unit tests
### Phase 184: UpdateEnv and LoopBodyLocal
- UpdateEnv: `/src/mir/join_ir/lowering/update_env.rs`
- LoopBodyLocal detection: (search for LoopBodyLocalDetector)
### Related Phases
- Phase 176-2: Carrier update emission basics
- Phase 178: String literal updates (similar multi-carrier pattern)
- Phase 197: Loop update analyzer extraction
---
## Appendix A: Quick Reference
### Pattern Match: _atoi Loop
```
Loop Type: Pattern 2 (Break)
Header: i < len
Break: digit_pos < 0
Carriers: i (counter), result (accumulator)
Body-locals: ch, digit_pos
Captured: s, len, digits (pre-loop)
UpdateExpr:
i -> Const(1)
result -> BinOp { op: Add, rhs: NumberAccumulation { base: 10, digit_var: "digit_pos" } }
```
### JoinIR Emission Sequence (NumberAccumulation)
```
%base_10 = Const(10)
%tmp = BinOp(Mul, %result_param, %base_10)
%digit_id = <resolved from UpdateEnv>
%result_next = BinOp(Add, %tmp, %digit_id)
```
### Debug Commands
```bash
# Trace variable mapping
NYASH_TRACE_VARMAP=1 cargo test --release test_json_atoi -- --nocapture
# JoinIR debug
NYASH_JOINIR_DEBUG=1 ./target/release/hakorune apps/tests/json_atoi_smoke.hako 2>&1 | grep "\[trace:"
# UpdateExpr detection (if logging added)
NYASH_UPDATE_ANALYZER_DEBUG=1 ./target/release/hakorune apps/tests/json_atoi_smoke.hako
```
---
**Document Status**: ✅ Infrastructure confirmed, ready for implementation
**Next Step**: Phase 246-EX Step 1 - Minimal smoke test creation
**Last Updated**: 2025-12-11 (Phase 246-EX Step 0 completion)