nyash-codex 7b0db59100 feat(joinir): Phase 53 - SELFHOST-NORM-DEV-EXPAND implementation

Expanded selfhost dev Normalized target with 2 practical P2/P3 loop variations,
strengthened structural signature axis, and implemented two-stage detection.

Key Changes:

1. Documentation (phase49-selfhost-joinir-depth2-design.md +128 lines):
   - Added Phase 53 section with candidate selection rationale
   - Documented two-stage detector strategy (structural primary + dev-only name guard)
   - Defined structural axis strengthening (carrier count/type, branch patterns)

2. Fixtures (+210 lines):
   - selfhost_args_parse_p2.program.json (60 lines): P2 with String carrier + conditional branching
   - selfhost_stmt_count_p3.program.json (150 lines): P3 with 5 carriers + multi-branch if-else

3. Structured Builders (fixtures.rs +48 lines):
   - build_selfhost_args_parse_p2_structured_for_normalized_dev()
   - build_selfhost_stmt_count_p3_structured_for_normalized_dev()

4. ShapeGuard Two-Stage Detection (shape_guard.rs +80 lines):
   - Added SelfhostArgsParseP2/SelfhostStmtCountP3 to NormalizedDevShape enum
   - Implemented is_selfhost_args_parse_p2(): P2 core family + name guard
   - Implemented is_selfhost_stmt_count_p3(): 2-10 carrier check + name guard
   - Updated capability_for_shape() mappings

5. Bridge Integration (bridge.rs +8 lines, normalized.rs +10 lines):
   - Added shape handlers delegating to existing normalizers
   - Added roundtrip reconstruction handlers

6. Entry Point Registration (ast_lowerer/mod.rs +2 lines):
   - Registered selfhost_args_parse_p2/selfhost_stmt_count_p3 as LoopFrontend routes

7. Dev VM Comparison Tests (normalized_joinir_min.rs +40 lines):
   - normalized_selfhost_args_parse_p2_vm_bridge_direct_matches_structured()
   - normalized_selfhost_stmt_count_p3_vm_bridge_direct_matches_structured()

8. Test Context Fix (dev_env.rs):
   - Added thread-local test context depth counter
   - Fixed deadlock in nested test_ctx() calls via reentrant with_dev_env_if_unset()

Structural Axis Growth:

P2 family:
- Carrier count: 1-3 (unchanged)
- NEW: Type diversity (Integer/String mixed)
- NEW: Conditional branching patterns (Eq-heavy comparisons)

P3 family:
- NEW: Carrier count upper bound: 2-10 (was 2-4)
- NEW: Multi-branch if-else (5+ branches with nested structure)
- NEW: Complex conditional patterns

Test Results:
- normalized_dev: 40/40 PASS (including 2 new tests)
- lib regression: 939 PASS, 56 ignored
- Existing behavior unchanged (normalized_dev feature-gated)

Phase 53 Achievements:
✅ P2/P3 each gained 1 practical variation (2 total)
✅ Two-stage detection: structural primary + dev-only name guard
✅ Structural axis expanded: 4 axes (carrier count/type/Compare/branch patterns)
✅ All tests PASS, no regressions
✅ Test context deadlock fixed (0.04s for 29 tests)

Files Modified: 14 files
Lines Added: ~516 lines (net)
Implementation: Pure additive (feature-gated)

Next Phase (54+):
- Accumulate 6+ loops per P2/P3 family
- Achieve 5+ stable structural axes
- Target < 5% false positive rate
- Then shrink/remove name guard scope

2025-12-12 16:40:20 +09:00

Phase 48: Normalized P4 (Continue) Design

Status: Phase 48-A/B/C COMPLETE (minimal + JsonParser skip_ws continue; the Normalized→MIR direct path and canonical promotion are done)
Date: 2025-12-12 / 2026-01-XX

Goal

Design the Normalized architecture for Pattern4 (continue), extending the unified Normalized infrastructure that already handles P1/P2/P3.

Key insight: P4 mirrors the control flow of P2 (break). Where P2 exits the loop early, P4 jumps to the next iteration early. The infrastructure is the same; only the routing differs.

Background: Unified Normalized Success

Phase 43-47 established unified Normalized for P1/P2/P3:

✅ Pattern1: Simple while loops
✅ Pattern2: Break loops (skip_whitespace, _atoi, _parse_number)
✅ Pattern3: If-sum loops (conditional carrier updates)

Infrastructure proven:

Structured→Normalized→MIR(direct) pipeline
EnvLayout, JpInst/JpOp, StepScheduleBox
ConditionEnv, CarrierInfo, ExitLine
All patterns use same loop_step(env, k_exit) skeleton

Why P4 Uses Same Normalized

Control Flow Comparison

Aspect	P2 (Break)	P4 (Continue)	Difference
Normal flow	Execute body, update carriers, loop	Same	✅ Identical
Early exit	`if (cond) break` → exit loop	`if (cond) continue` → next iteration	Flow direction
Carrier updates	Before break check	After continue check	Order
Infrastructure	ConditionEnv, ExitLine, PHI	Same	✅ Reusable

Key difference: continue = TailCallFn(loop_step, env', k_exit) (immediate recursion) vs break = TailCallKont(k_exit, result) (exit to continuation).

P4 in Normalized JoinIR

// P2 (break) structure:
loop_step(env, k_exit) {
    if (header_cond) {
        // body
        if (break_cond) {
            TailCallKont(k_exit, result)  // Exit early
        }
        // update carriers
        TailCallFn(loop_step, env', k_exit)  // Loop back
    } else {
        TailCallKont(k_exit, result)  // Normal exit
    }
}

// P4 (continue) structure:
loop_step(env, k_exit) {
    if (header_cond) {
        // body
        if (continue_cond) {
            TailCallFn(loop_step, env', k_exit)  // Skip to next iteration ← continue!
        }
        // update carriers (only if NOT continued)
        TailCallFn(loop_step, env'', k_exit)  // Loop back
    } else {
        TailCallKont(k_exit, result)  // Normal exit
    }
}

Observation: continue is just an early TailCallFn call. No new JpInst needed!
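The two skeletons above can be sketched in plain Rust as a trampolined step function. This is only an illustrative model: `Env`, `Step`, `p2_step`, `p4_step`, and `run` are hypothetical names chosen to mirror the prose, not the actual JoinIR types. The sketch shows that break and continue differ only in which tail call the early branch takes.

```rust
/// Hypothetical environment: a loop counter plus one carrier.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Env {
    i: i64,
    count: i64,
}

/// Outcome of one `loop_step` invocation (illustrative, not the real JpInst).
enum Step {
    /// Models TailCallFn(loop_step, env', k_exit): run another iteration.
    TailCallFn(Env),
    /// Models TailCallKont(k_exit, result): leave the loop with a result.
    TailCallKont(i64),
}

/// P2-style step: counts iterations and breaks when `i == stop`.
fn p2_step(env: Env, n: i64, stop: i64) -> Step {
    if env.i < n {
        if env.i == stop {
            return Step::TailCallKont(env.count); // break: exit early
        }
        Step::TailCallFn(Env { i: env.i + 1, count: env.count + 1 })
    } else {
        Step::TailCallKont(env.count) // normal exit
    }
}

/// P4-style step: counts iterations but skips the one where `i == skip`.
fn p4_step(env: Env, n: i64, skip: i64) -> Step {
    if env.i < n {
        if env.i == skip {
            // continue: an early TailCallFn that only advances `i`
            return Step::TailCallFn(Env { i: env.i + 1, ..env });
        }
        Step::TailCallFn(Env { i: env.i + 1, count: env.count + 1 })
    } else {
        Step::TailCallKont(env.count) // normal exit
    }
}

/// Trampoline that drives a step function until it reaches the exit continuation.
fn run(mut env: Env, step: impl Fn(Env) -> Step) -> i64 {
    loop {
        match step(env) {
            Step::TailCallFn(next) => env = next,
            Step::TailCallKont(result) => return result,
        }
    }
}
```

With `n = 5` and the early condition at `i == 2`, the P2 variant returns after two counted iterations, while the P4 variant counts every iteration except the skipped one.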

Target P4 Loops (JsonParser)

Priority Assessment

Loop	Pattern	Complexity	Priority	Rationale
_parse_array (skip whitespace)	P4 minimal	Low	◎ PRIMARY	Simple continue, single carrier (i)
_parse_object (skip whitespace)	P4 minimal	Low	○ Extended	Same as _parse_array
_unescape_string (skip special chars)	P4 mid	Medium	△ Later	String operations, body-local
_parse_string (escape handling)	P4 mid	Medium	△ Later	Complex escape sequences

Phase 48-A Target: _parse_array (skip whitespace)

Example (simplified):

local i = 0
local s = "[1, 2]"
local len = s.length()

loop(i < len) {
    local ch = s.substring(i, i+1)

    if (ch == " " || ch == "\t") {
        i = i + 1
        continue  // Skip whitespace
    }

    // Process non-whitespace character
    // ...
    i = i + 1
}

Characteristics:

Simple condition: ch == " " || ch == "\t" (OR pattern)
Single carrier: i (position counter)
Body-local: ch (character)
continue before carrier update

Normalized shape:

EnvLayout: { i: int }
StepSchedule: [HeaderCond, BodyInit(ch), ContinueCheck, Updates(process), Tail(i++)]
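For readers more at home in Rust, the simplified loop above might be rendered roughly as follows. The function name `skip_ws_collect` and the "collect the character" body are assumptions standing in for the elided processing; the comments map each line to the schedule entries listed above.

```rust
/// Mirrors the simplified _parse_array loop: `i` is the loop carrier and
/// `ch` is body-local. Collecting characters into `out` is an illustrative
/// stand-in for the elided "process non-whitespace character" step.
fn skip_ws_collect(s: &str) -> String {
    let bytes = s.as_bytes(); // assumes ASCII input for simplicity
    let len = bytes.len();
    let mut i = 0;
    let mut out = String::new();
    while i < len {
        // HeaderCond: i < len
        let ch = bytes[i] as char; // BodyInit(ch)
        if ch == ' ' || ch == '\t' {
            // ContinueCheck: advance and skip the processing below
            i += 1;
            continue;
        }
        out.push(ch); // Updates(process)
        i += 1; // Tail(i++)
    }
    out
}
```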

Normalized Components for P4

StepScheduleBox Extension

P2/P3 steps (existing):

enum StepKind {
    HeaderCond,   // loop(cond)
    BodyInit,     // local ch = ...
    BreakCheck,   // if (cond) break  (P2)
    IfCond,       // if (cond) in body  (P3)
    ThenUpdates,  // carrier updates (P3)
    Updates,      // carrier updates
    Tail,         // i = i + 1
}

P4 addition:

enum StepKind {
    // ... existing ...

    ContinueCheck,  // if (cond) continue  (P4)
}

P4 schedule:

// _parse_array skip whitespace pattern
[HeaderCond, BodyInit, ContinueCheck, Updates, Tail]

// vs P2 pattern
[HeaderCond, BodyInit, BreakCheck, Updates, Tail]

// Observation: Same structure, different check semantics!
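The "same structure, different check" observation can be checked mechanically. The following is a minimal sketch, assuming a trimmed-down `StepKind` and two hypothetical helpers (`schedule`, `differing_positions`) that are not part of the real StepScheduleBox API:

```rust
/// Trimmed-down copy of the step kinds used by the P2 and P4 schedules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StepKind {
    HeaderCond,
    BodyInit,
    BreakCheck,
    ContinueCheck,
    Updates,
    Tail,
}

/// Which early-exit flavor the loop uses (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EarlyCheck {
    Break,
    Continue,
}

/// Builds the five-step schedule; only the third slot depends on the pattern.
fn schedule(check: EarlyCheck) -> Vec<StepKind> {
    let check_step = match check {
        EarlyCheck::Break => StepKind::BreakCheck,
        EarlyCheck::Continue => StepKind::ContinueCheck,
    };
    vec![
        StepKind::HeaderCond,
        StepKind::BodyInit,
        check_step,
        StepKind::Updates,
        StepKind::Tail,
    ]
}

/// Returns the indices at which two schedules disagree.
fn differing_positions(a: &[StepKind], b: &[StepKind]) -> Vec<usize> {
    a.iter()
        .zip(b)
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i)
        .collect()
}
```

Comparing `schedule(EarlyCheck::Break)` with `schedule(EarlyCheck::Continue)` yields a single differing index, the check slot.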

JpInst Reuse

No new JpInst needed! P4 uses existing instructions:

// P2 break:
If { cond, then_target: k_exit, else_target: continue_body }

// P4 continue:
If { cond, then_target: loop_step_with_tail, else_target: process_body }

Key: continue = immediate TailCallFn(loop_step, ...), not a new instruction.
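As a rough illustration of this reuse, the sketch below lowers both checks to the same `If` shape and varies only the `then` target. `Target`, `IfInst`, `EarlyExit`, and `lower_check` are hypothetical simplifications for this example; the real JpInst carries more information.

```rust
/// Where control goes after the check (simplified, hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Target {
    /// TailCallKont(k_exit, ...): leave the loop.
    KExit,
    /// TailCallFn(loop_step, ...): start the next iteration.
    LoopStep,
    /// Fall through to the rest of the loop body.
    Body,
}

/// Simplified stand-in for the existing `If` instruction.
#[derive(Debug, Clone, PartialEq)]
struct IfInst {
    cond: String,
    then_target: Target,
    else_target: Target,
}

/// The two early-exit flavors being lowered.
enum EarlyExit {
    Break,
    Continue,
}

/// Lowers an early-exit check; both flavors produce the same instruction kind.
fn lower_check(kind: EarlyExit, cond: &str) -> IfInst {
    let then_target = match kind {
        EarlyExit::Break => Target::KExit,
        EarlyExit::Continue => Target::LoopStep,
    };
    IfInst {
        cond: cond.to_string(),
        then_target,
        else_target: Target::Body,
    }
}
```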

EnvLayout (Same as P2)

P2 example:

struct Pattern2Env {
    i: int,      // loop param
    sum: int,    // carrier
}

P4 example (identical structure):

struct Pattern4Env {
    i: int,      // loop param (position counter)
    // No additional carriers for skip whitespace
}

No new fields needed - P4 carriers work same as P2/P3.

Architecture: Unified Normalized

┌──────────────────────────────────────────┐
│   Structured JoinIR (shared by P1-P4)    │
│  - ConditionEnv (unified for P2/P3/P4)   │
│  - CarrierInfo                           │
│  - ExitLine/Boundary                     │
└──────────────┬───────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────┐
│   Normalized JoinIR (shared by P1-P4)    │  ← P4 lands here too!
│  - EnvLayout (P2 done → P3/P4 extended)  │
│  - JpInst/JpOp (existing ones suffice)   │
│  - StepScheduleBox (adds ContinueCheck)  │
└──────────────┬───────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────┐
│   MIR (shared by P1-P4)                  │
└──────────────────────────────────────────┘

Implementation Strategy

Phase 48-A: Minimal _parse_array skip whitespace (dev-only)

Goal: Prove P4 can use Normalized infrastructure with minimal additions.

Implementation status (48-A completion summary):

✅ Fixture added: pattern4_continue_min.program.json
- A minimal P4 loop that skips i == 2 via continue, provided as Program(JSON).
✅ ShapeGuard extended:
- Added NormalizedDevShape::Pattern4ContinueMinimal, which detects the P4 minimal shape structurally.
✅ StepScheduleBox extended:
- Added StepKind::ContinueCheck and fixed the evaluation order to
  HeaderCond → ContinueCheck → Updates → Tail.
✅ Normalized lowering:
- Implemented normalize_pattern4_continue_minimal(), adding continue support while reusing 95% of the P2 normalization logic.
✅ Tests:
- Added 4 comparison tests for P4 minimal to the Normalized dev suite
  (Structured→Normalized→MIR(direct) vs Structured→MIR / runner / VM bridge).
- 939/939 tests PASS under cargo test --release (as of the Phase 48-A implementation).

Steps:

ShapeGuard: Add Pattern4ContinueMinimal shape
StepScheduleBox: Add ContinueCheck step kind
Normalized lowering:
- Generate If JpInst for continue check
- then_target → immediate TailCallFn(loop_step, ...) (continue)
- else_target → process body, then tail
Test: Verify Structured→Normalized→MIR(direct) matches Structured→MIR

Expected additions:

shape_guard.rs: +1 shape variant
step_schedule.rs: +1 step kind (ContinueCheck)
normalized.rs: +40 lines (normalize_pattern4_continue_minimal)
tests/normalized_joinir_min.rs: +1 P4 test

Dev fixture: Create pattern4_continue_minimal from _parse_array skip whitespace

Phase 48-B: _parse_object, _unescape_string (dev-only)

Status (dev-only): The whitespace-continue loops in _parse_array / _parse_object have been compared via Normalized→MIR(direct).
Fixtures were added as jsonparser_parse_{array,object}_continue_skip_ws.program.json and go through a dev-only route via shape_guard / normalize_for_shape / direct bridge.
_unescape_string has not been started (Phase 48-C or later).

Goal: Once the skip_ws family has stabilized, extend to multiple carriers and string operations (unescape).

Additions:

Multi-carrier EnvLayout (if needed)
String body-local handling (already exists from P2 DigitPos)

Phase 48-C: Canonical promotion

Goal: Move P4 minimal from dev-only to canonical (like P2/P3).

Key Design Decisions

1. Continue = TailCallFn, not new instruction

Rationale: continue is semantically "skip to next iteration", which is exactly what TailCallFn(loop_step, env', k_exit) does in CPS.

Benefit: No new JpInst, reuses existing MIR generation.

2. ContinueCheck step before Updates

Rationale: continue must happen BEFORE carrier updates (skip processing).

P4 step order:

HeaderCond → BodyInit → ContinueCheck → Updates (processing) → Tail (increment)
                             ↓ (if true)
                        TailCallFn (skip Updates)
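To see why the ordering matters, the sketch below interprets a schedule over a toy loop and records which values of `i` reach the Updates step. It assumes that a taken continue still runs Tail (in line with the `loop_step_with_tail` target named earlier) and only skips Updates; `processed_indices` and the trimmed `StepKind` are hypothetical names for this example.

```rust
/// Trimmed-down step kinds for a P4 schedule.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StepKind {
    HeaderCond,
    BodyInit,
    ContinueCheck,
    Updates,
    Tail,
}

/// Runs `loop(i < n)` under the given schedule, continuing when `i == skip`.
/// Returns the values of `i` for which the Updates step actually ran.
/// The schedule must contain HeaderCond and Tail, otherwise this never ends.
fn processed_indices(schedule: &[StepKind], n: i64, skip: i64) -> Vec<i64> {
    let mut i = 0;
    let mut processed = Vec::new();
    'iterations: loop {
        let mut continued = false; // reset at the start of every iteration
        for step in schedule {
            match step {
                StepKind::HeaderCond => {
                    if i >= n {
                        break 'iterations; // normal exit
                    }
                }
                StepKind::BodyInit => {} // no body-locals in this toy loop
                StepKind::ContinueCheck => continued = i == skip,
                StepKind::Updates => {
                    if !continued {
                        processed.push(i); // processing is skipped on continue
                    }
                }
                StepKind::Tail => i += 1,
            }
        }
    }
    processed
}
```

With ContinueCheck placed before Updates, index 2 is skipped for `n = 5`; moving Updates ahead of the check would process every index, which is exactly what the step order is meant to rule out.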

3. Same EnvLayout as P2

Rationale: P4 carriers (position, accumulators) have the same types as P2 carriers.

Benefit: No new EnvLayout design, reuses P2 infrastructure 100%.

Comparison with P2/P3

Component	P2 (Break)	P3 (If-Sum)	P4 (Continue)	Shared?
EnvLayout	✅	✅	✅	✅ Yes
ConditionEnv	✅	✅	✅	✅ Yes
CarrierInfo	✅	✅	✅	✅ Yes
ExitLine	✅	✅	✅	✅ Yes
StepKind	BreakCheck	IfCond, ThenUpdates	ContinueCheck	Additive
JpInst	If, TailCallFn, TailCallKont	✅ Same	✅ Same	✅ Yes
Control flow	Exit early	Conditional update	Skip early	Different

Infrastructure reuse: 95%+ (only StepKind and control flow routing differ)

Testing Strategy

Phase 48-A: Minimal

Test: test_normalized_pattern4_continue_minimal

#[cfg(feature = "normalized_dev")]
#[test]
fn test_normalized_pattern4_continue_minimal() {
    let source = r#"
        local i = 0
        local n = 5
        local count = 0
        loop(i < n) {
            if (i == 2) {
                i = i + 1
                continue
            }
            count = count + 1
            i = i + 1
        }
        print("count = " + count.to_string())
    "#;

    // Compare Structured→MIR vs Normalized→MIR(direct)
    assert_vm_output_matches(source);
}

Expected output:

count = 4  (skipped i==2, so counted 0,1,3,4)

Success Criteria

Phase 48-A complete when:

1. test_normalized_pattern4_continue_minimal passes (dev-only)
2. Structured→Normalized→MIR(direct) output matches Structured→MIR
3. All 938+ tests still pass (no regressions)
4. ShapeGuard can detect Pattern4ContinueMinimal
5. Documentation updated (architecture overview, CURRENT_TASK)

→ Items 1-5 above were all satisfied as of commit 7200309c, so Phase 48-A is complete.

Phase 48-B complete when:

✅ _parse_object, _unescape_string tests pass (dev-only)
✅ Multi-carrier + string operations work in P4 Normalized

Phase 48-C complete when:

✅ P4 minimal promoted to canonical (always Normalized)
✅ Performance validated

Scope Management

In Scope (Phase 48-A):

✅ Minimal P4 (simple continue pattern)
✅ Dev-only Normalized support
✅ Reuse P2/P3 infrastructure (ConditionEnv, CarrierInfo, ExitLine)

Out of Scope (deferred):

❌ Complex P4 patterns (nested if, multiple continue points)
❌ Canonical promotion (Phase 48-C)
❌ Selfhost loops (later phase)

File Impact Estimate

Expected modifications (Phase 48-A):

shape_guard.rs: +20 lines (Pattern4ContinueMinimal shape)
step_schedule.rs: +10 lines (ContinueCheck step kind)
normalized.rs: +40 lines (normalize_pattern4_continue_minimal)
tests/normalized_joinir_min.rs: +30 lines (P4 test)
phase48-norm-p4-design.md: +250 lines (this doc)
joinir-architecture-overview.md: +10 lines (Phase 48 section)
CURRENT_TASK.md: +5 lines (Phase 48 entry)

Total: ~365 lines (+), pure additive (no P1/P2/P3 code changes)

Benefits

Infrastructure reuse: 95% of P2/P3 Normalized code works for P4
Unified pipeline: All patterns (P1/P2/P3/P4) use same Normalized
Incremental rollout: Dev-only → canonical, proven approach from P2/P3
Semantic clarity: continue = immediate TailCallFn (no new concepts)

Next Steps After Phase 48

Phase 48-A implementation: Minimal P4 (continue) dev-only
Phase 48-B: Extended P4 (multi-carrier, string ops)
Phase 48-C: Canonical promotion
Selfhost loops: Complex patterns from selfhost compiler

References

P2 Completion: PHASE_43_245B_NORMALIZED_COMPLETION.md
P3 Design: phase47-norm-p3-design.md
P3 Implementation: Phase 47-A-LOWERING (commit 99bdf93d)
Architecture: joinir-architecture-overview.md
