
nyash-codex 393aaf1500 feat(phase161): Add Analyzer Box design and representative function selection

Phase 161 Task 2 & 3 completion:

**Task 2: Analyzer Box Design** (phase161_analyzer_box_design.md)
- Defined 3 core analyzer Boxes with clear responsibilities:
  1. JsonParserBox: Low-level JSON parsing (reusable utility)
  2. MirAnalyzerBox: Primary MIR v1 semantic analysis (14 methods)
  3. JoinIrAnalyzerBox: JoinIR v0 compatibility layer
- Comprehensive API contracts for all methods:
  - validateSchema(), summarize_function(), list_phis(), list_loops(), list_ifs()
  - propagate_types(), reachability_analysis(), dump methods
- Design principles applied: 箱化 (boxification), 境界作成 (boundary creation), Fail-Fast, 遅延シングルトン (lazy singleton)
- 5-stage implementation roadmap (Phase 161-2 through 161-5)
- Key algorithms documented: PHI detection, loop detection, if detection, type propagation

**Task 3: Representative Function Selection** (phase161_representative_functions.md)
- Formally selected 5 representative functions covering all patterns:
  1. if_simple: Basic if/else with PHI merge (⭐ Simple)
  2. loop_simple: Loop with back edge and loop-carried PHI (⭐ Simple)
  3. if_loop: Nested if inside loop with multiple PHI (⭐⭐ Medium)
  4. loop_break: Loop with break statement and multiple exits (⭐⭐ Medium)
  5. type_prop: Type propagation through loop arithmetic (⭐⭐ Medium)
- Each representative validates specific analyzer capabilities
- Selection criteria documented for future extensibility
- Validation strategy for Phase 161-2+ implementation

Representative test files will be created in local_tests/phase161/
(not committed due to .gitignore, but available for development)

Next: Phase 161 Task 4 - Implement basic MirAnalyzerBox on rep1_if_simple and rep2_loop_simple

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-12-04 19:37:18 +09:00


Phase 161 Task 2: Analyzer Box Design (JoinIrAnalyzerBox / MirAnalyzerBox)

Status: 🎯 DESIGN PHASE - Defining .hako Analyzer Box structure and responsibilities

Objective: Design the foundational .hako Boxes for analyzing Rust JSON MIR/JoinIR data, establishing clear responsibilities and API contracts.

Executive Summary

Phase 161 aims to port JoinIR analysis logic from Rust to .hako. The first step was creating a complete JSON format inventory (Task 1, completed). Now we design the .hako Box architecture that will consume this data.

Key Design Decision: Create TWO specialized Analyzer Boxes with distinct, non-overlapping responsibilities:

MirAnalyzerBox: Analyzes MIR JSON v1 (primary)
JoinIrAnalyzerBox: Analyzes JoinIR JSON v0 (secondary, for compatibility)

Both boxes will share a common JsonParserBox utility for low-level JSON parsing operations.

1. Core Architecture: Box Responsibilities

1.1 JsonParserBox (Shared Utility)

Purpose: Low-level JSON parsing and traversal (reusable across both analyzers)

Scope: Single-minded JSON access without semantic analysis

Responsibilities:

Parse JSON text into MapBox/ArrayBox structure
Provide recursive accessor methods: get(), getArray(), getInt(), getString()
Handle type conversions safely with nullability
Provide iteration helpers: forEach(), map(), filter()

Key Methods:

birth(jsonText)              // Parse JSON from string
get(path: string): any       // Get nested value by dot-notation (e.g., "functions.0.blocks")
getArray(path): ArrayBox     // Get array at path with type safety
getString(path): string      // Get string with default ""
getInt(path): integer        // Get integer with default 0
getBool(path): boolean       // Get boolean with default false

Non-Scope: Semantic analysis, MIR-specific validation, JoinIR-specific knowledge
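The accessor contract above (path lookup plus typed getters with safe defaults) can be sketched in Python as follows. This is only an illustration, not the .hako implementation: the helper names `get_path`, `get_int`, and `get_string` and the `sep` parameter are hypothetical, and plain dicts and lists stand in for MapBox and ArrayBox.

```python
def get_path(root, path, default=None, sep="."):
    """Walk nested dicts/lists by a separator-delimited path (hypothetical helper)."""
    node = root
    if path == "":
        return node
    for key in path.split(sep):
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and key.isdecimal() and int(key) < len(node):
            node = node[int(key)]
        else:
            # Missing key or out-of-range index: fall back to the default.
            return default
    return node


def get_int(root, path, default=0):
    value = get_path(root, path)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    return default


def get_string(root, path, default=""):
    value = get_path(root, path)
    return value if isinstance(value, str) else default


doc = {"functions": [{"name": "main", "blocks": [{"id": 0}]}]}
print(get_string(doc, "functions.0.name"))
print(get_int(doc, "functions.0.blocks.0.id"))
```

Returning a default instead of raising keeps the parser free of semantic decisions; whether a missing field is an error is left to the analyzer boxes.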

1.2 MirAnalyzerBox (Primary Analyzer)

Purpose: Analyze MIR JSON v1 according to Phase 161 specifications

Scope: All MIR-specific analysis operations

Responsibilities:

Schema Validation: Verify MIR JSON has required fields (schema_version, functions, cfg)
Instruction Type Detection: Identify instruction types (14 types in MIR v1)
PHI Detection: Identify PHI instructions and extract incoming values
Loop Detection: Identify loops via backward edge analysis (CFG)
If Detection: Identify conditional branches and PHI merge points
Type Analysis: Propagate type hints through PHI/BinOp/Compare operations
Reachability Analysis: Mark unreachable blocks (dead code detection)

Key Methods (Single-Function Analysis):

birth(mirJsonText)                                   // Parse MIR JSON

// === Schema Validation ===
validateSchema(): boolean                            // Check MIR v1 structure

// === Function-Level Analysis ===
summarize_function(funcIndex: integer): MapBox      // Returns:
                                                    // {
                                                    //   name: string,
                                                    //   params: integer,
                                                    //   blocks: integer,
                                                    //   instructions: integer,
                                                    //   has_loops: boolean,
                                                    //   has_ifs: boolean,
                                                    //   has_phis: boolean
                                                    // }

// === Instruction Detection ===
list_instructions(funcIndex): ArrayBox              // Returns array of:
                                                    // {
                                                    //   block_id: integer,
                                                    //   inst_index: integer,
                                                    //   op: string,
                                                    //   dest: integer (ValueId),
                                                    //   src1, src2: integer (ValueId)
                                                    // }

// === PHI Analysis ===
list_phis(funcIndex): ArrayBox                      // Returns array of PHI instructions:
                                                    // {
                                                    //   block_id: integer,
                                                    //   dest: integer (ValueId),
                                                    //   incoming: ArrayBox of
                                                    //     [value_id, from_block_id]
                                                    // }

// === Loop Detection ===
list_loops(funcIndex): ArrayBox                     // Returns array of loop structures:
                                                    // {
                                                    //   header_block: integer,
                                                    //   exit_block: integer,
                                                    //   back_edge_from: integer,
                                                    //   contains_blocks: ArrayBox
                                                    // }

// === If Detection ===
list_ifs(funcIndex): ArrayBox                       // Returns array of if structures:
                                                    // {
                                                    //   condition_block: integer,
                                                    //   condition_value: integer (ValueId),
                                                    //   true_block: integer,
                                                    //   false_block: integer,
                                                    //   merge_block: integer
                                                    // }

// === Type Analysis ===
propagate_types(funcIndex): MapBox                  // Returns type map:
                                                    // {
                                                    //   value_id: type_string
                                                    //   (e.g., "i64", "void", "boxref")
                                                    // }

// === Control Flow Analysis ===
reachability_analysis(funcIndex): MapBox            // Returns:
                                                    // {
                                                    //   reachable_blocks: ArrayBox,
                                                    //   unreachable_blocks: ArrayBox
                                                    // }

Key Algorithms:

PHI Detection Algorithm

For each block in function:
  For each instruction in block:
    If instruction.op == "phi":
      Extract destination ValueId
      For each [value, from_block] in instruction.incoming:
        Record PHI merge point
      Mark block as PHI merge block
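A minimal Python sketch of the PHI scan above. The JSON field names used here (`blocks`, `id`, `instructions`, `op`, `dest`, `incoming`) are assumptions chosen to match the result shape documented for list_phis(); the real MIR v1 field names may differ.

```python
def list_phis(func):
    """Collect PHI instructions from one function (field names are assumed)."""
    phis = []
    for block in func.get("blocks", []):
        for inst in block.get("instructions", []):
            if inst.get("op") != "phi":
                continue
            phis.append({
                "block_id": block["id"],
                "dest": inst["dest"],
                # Each incoming entry is a (value_id, from_block_id) pair.
                "incoming": [tuple(pair) for pair in inst.get("incoming", [])],
            })
    return phis


def phi_merge_blocks(func):
    """Block ids that contain at least one PHI, i.e. merge points."""
    return sorted({phi["block_id"] for phi in list_phis(func)})
```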

Loop Detection Algorithm (CFG-based)

Build adjacency list from CFG (block → [successor blocks])
For each block B:
  For each successor S of B:
    If S's block_id < B's block_id:   // heuristic: assumes block ids follow control-flow order
      Found backward edge B → S
      S is loop header
      Find all blocks in loop: S plus every block that reaches B without passing through S
      Record loop structure
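The back-edge heuristic can be sketched in Python like this. It is a sketch under assumptions: the CFG is given as a hypothetical `successors` map (block id → list of successor ids), the loop body is collected by walking predecessors backwards from the back-edge source until the header is reached, and exits are reported as a list (`exit_blocks`) because a loop with a break can have more than one.

```python
def find_loops(successors):
    """Detect loops via edges that point to a lower block id (heuristic)."""
    # Invert the successor map so the loop body can be collected backwards.
    predecessors = {block: [] for block in successors}
    for block, succs in successors.items():
        for succ in succs:
            predecessors.setdefault(succ, []).append(block)

    loops = []
    for block in sorted(successors):
        for header in successors[block]:
            if header >= block:
                continue  # not a backward edge under the id-ordering heuristic
            body = {header}
            stack = [block]
            while stack:
                node = stack.pop()
                if node in body:
                    continue
                body.add(node)
                stack.extend(predecessors.get(node, []))
            exits = {s for b in body for s in successors.get(b, []) if s not in body}
            loops.append({
                "header_block": header,
                "back_edge_from": block,
                "contains_blocks": sorted(body),
                "exit_blocks": sorted(exits),
            })
    return loops
```

Note that comparing block ids is only reliable when ids follow control-flow order; a dominator-based check would be the more robust alternative.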

If Detection Algorithm

For each block B with Branch instruction:
  condition = branch.condition (ValueId)
  true_block = branch.targets[0]
  false_block = branch.targets[1]

  For each successor block S of true_block OR false_block:
    If S has PHI instruction with incoming from both true_block AND false_block:
      S is the merge block
      Record if structure
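A Python sketch of the if/merge detection above, for illustration only. It assumes each block ends in a terminator instruction, that a branch carries `condition` and `targets`, and that an unconditional jump carries `target`; these field names and the helpers `successors_of` and `find_merge` are hypothetical.

```python
def successors_of(block):
    """Successor ids taken from the block's last instruction (assumed layout)."""
    insts = block.get("instructions", [])
    term = insts[-1] if insts else {}
    if term.get("op") == "branch":
        return list(term["targets"])
    if term.get("op") == "jump":
        return [term["target"]]
    return []


def find_merge(blocks, true_block, false_block):
    """First successor of either arm whose PHI receives values from both arms."""
    candidates = successors_of(blocks[true_block]) + successors_of(blocks[false_block])
    for cand in candidates:
        for inst in blocks[cand].get("instructions", []):
            if inst.get("op") == "phi":
                sources = {src for _value, src in inst["incoming"]}
                if {true_block, false_block} <= sources:
                    return cand
    return None


def list_ifs(func):
    blocks = {b["id"]: b for b in func["blocks"]}
    result = []
    for block in func["blocks"]:
        insts = block.get("instructions", [])
        if not insts or insts[-1].get("op") != "branch":
            continue
        true_block, false_block = insts[-1]["targets"]
        merge = find_merge(blocks, true_block, false_block)
        if merge is not None:
            result.append({
                "condition_block": block["id"],
                "condition_value": insts[-1]["condition"],
                "true_block": true_block,
                "false_block": false_block,
                "merge_block": merge,
            })
    return result
```

As written, a branch without a matching PHI merge (for example a loop exit test) is skipped rather than reported.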

Type Propagation Algorithm

Initialize: type_map[v] = v.hint (from Const/Compare/BinOp)
Iterate at most 4 times:  // Iteration cap; stop early once type_map no longer changes
  For each PHI instruction:
    incoming_types = [type_map[v] for each [v, _] in phi.incoming]
    Merge types: take most specific common type
    type_map[phi.dest] = merged_type

  For each BinOp/Compare/etc:
    Propagate operand types to result
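The bounded fixed-point loop can be sketched in Python as below. The merge rule is a deliberate simplification and an assumption: a PHI or BinOp result gets a type only when all known operand types agree, since the document does not define "most specific common type". The instruction fields (`op`, `dest`, `hint`, `incoming`, `lhs`, `rhs`) are also hypothetical.

```python
def propagate_types(insts, max_iters=4):
    """Propagate type hints through PHI/BinOp with an iteration cap."""
    # Seed the map from instructions that carry an explicit hint.
    type_map = {i["dest"]: i["hint"] for i in insts if "hint" in i}
    for _ in range(max_iters):
        changed = False
        for inst in insts:
            if inst["op"] == "phi":
                known = {type_map[v] for v, _block in inst["incoming"] if v in type_map}
                merged = next(iter(known)) if len(known) == 1 else None
            elif inst["op"] == "binop":
                lhs = type_map.get(inst["lhs"])
                rhs = type_map.get(inst["rhs"])
                merged = lhs if lhs is not None and lhs == rhs else None
            else:
                continue
            if merged is not None and type_map.get(inst["dest"]) != merged:
                type_map[inst["dest"]] = merged
                changed = True
        if not changed:
            break  # converged before hitting the cap
    return type_map
```

On a simple counting loop (a constant, a loop-carried PHI, and an add) this converges in two passes, well under the cap of four.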

1.3 JoinIrAnalyzerBox (Secondary Analyzer)

Purpose: Analyze JoinIR JSON v0 (CPS-style format)

Scope: JoinIR-specific analysis operations

Responsibilities:

Schema Validation: Verify JoinIR JSON has required fields
Continuation Extraction: Parse CPS-style continuation structures
Direct Conversion to MIR: Transform JoinIR JSON to MIR-compatible format
Backward Compatibility: Support legacy JoinIR analysis workflows

Key Methods:

birth(joinirJsonText)                               // Parse JoinIR JSON

validateSchema(): boolean                            // Check JoinIR v0 structure

// === JoinIR-Specific Analysis ===
list_continuations(funcIndex): ArrayBox            // Returns continuation structures

// === Conversion ===
convert_to_mir(funcIndex): string                  // Returns MIR JSON equivalent
                                                   // (enables reuse of MirAnalyzerBox)

Note on Design: JoinIrAnalyzerBox is intentionally minimal - its primary purpose is converting JoinIR to MIR format, then delegating to MirAnalyzerBox for actual analysis. This avoids code duplication.

2. Shared Infrastructure

2.1 AnalyzerCommonBox (Base Utilities)

Purpose: Common helper methods used by both analyzers

Key Methods:

// === Utility Methods ===
extract_function(funcIndex: integer): MapBox       // Extract single function data
extract_cfg(funcIndex: integer): MapBox             // Extract CFG for block analysis
build_adjacency_list(cfg): MapBox                  // Build block→blocks adjacency

// === Debugging/Tracing ===
set_verbose(enabled: boolean)                      // Enable detailed output
dump_function(funcIndex): string                   // Pretty-print function data
dump_cfg(funcIndex): string                        // Pretty-print CFG

3. Data Flow Architecture

JSON Input (MIR or JoinIR)
    ↓
JsonParserBox (Parse to MapBox/ArrayBox)
    ↓
    ├─→ MirAnalyzerBox → Semantic Analysis
    │       ↓
    │   (PHI detection, loop detection, etc.)
    │       ↓
    │   Analysis Results (ArrayBox/MapBox)
    │
    └─→ JoinIrAnalyzerBox → Convert to MIR
            ↓
        (Transform JoinIR → MIR)
            ↓
        MirAnalyzerBox (reuse)
            ↓
        Analysis Results

4. API Contract: Method Signatures (Finalized)

MirAnalyzerBox

static box MirAnalyzerBox {
    // Parser state
    parsed_mir: MapBox
    json_parser: JsonParserBox

    // Analysis cache
    func_cache: MapBox          // Memoization for expensive operations
    verbose_mode: BoolBox

    // Constructor
    birth(mir_json_text: string) {
        // JsonParserBox declares birth(json_text), not parse(); construct it with the text
        me.json_parser = new JsonParserBox(mir_json_text)
        me.parsed_mir = me.json_parser.root
        me.func_cache = new MapBox()
        me.verbose_mode = false
    }

    // === Validation ===
    validateSchema(): BoolBox {
        // Returns true if MIR v1 schema valid
    }

    // === Analysis Methods ===
    summarize_function(funcIndex: IntegerBox): MapBox {
        // Returns { name, params, blocks, instructions, has_loops, has_ifs, has_phis }
    }

    list_instructions(funcIndex: IntegerBox): ArrayBox {
        // Returns array of { block_id, inst_index, op, dest, src1, src2 }
    }

    list_phis(funcIndex: IntegerBox): ArrayBox {
        // Returns array of { block_id, dest, incoming }
    }

    list_loops(funcIndex: IntegerBox): ArrayBox {
        // Returns array of { header_block, exit_block, back_edge_from, contains_blocks }
    }

    list_ifs(funcIndex: IntegerBox): ArrayBox {
        // Returns array of { condition_block, condition_value, true_block, false_block, merge_block }
    }

    propagate_types(funcIndex: IntegerBox): MapBox {
        // Returns { value_id: type_string }
    }

    reachability_analysis(funcIndex: IntegerBox): MapBox {
        // Returns { reachable_blocks, unreachable_blocks }
    }

    // === Debugging ===
    set_verbose(enabled: BoolBox) { }
    dump_function(funcIndex: IntegerBox): StringBox { }
    dump_cfg(funcIndex: IntegerBox): StringBox { }
}

JsonParserBox

static box JsonParserBox {
    root: MapBox

    birth(json_text: string) {
        // Parse JSON text into MapBox/ArrayBox structure
    }

    get(path: string): any {
        // Get value by dot-notation path
    }

    getArray(path: string): ArrayBox { }
    getString(path: string): string { }
    getInt(path: string): integer { }
    getBool(path: string): boolean { }
}

5. Implementation Strategy

Phase 161-2: Basic MirAnalyzerBox Structure (First Iteration)

Scope: Get basic structure working, focus on summarize_function() and list_instructions()

Implement JsonParserBox (simple recursive MapBox builder)
Implement MirAnalyzerBox.birth() to parse MIR JSON
Implement validateSchema() to verify structure
Implement summarize_function() (basic field extraction)
Implement list_instructions() (iterate blocks, extract instructions)

Success Criteria:

Can parse MIR JSON test files
Can extract function metadata
Can list all instructions in order
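As a rough picture of what this first iteration has to produce, here is a Python sketch of the two methods. The JSON layout (`functions[].name`, `params` as a list, `blocks[].id`, `blocks[].instructions[].op`) is an assumption for illustration; `has_loops` and `has_ifs` are left out because they depend on the control flow analyses planned for Phase 161-3.

```python
def summarize_function(mir, func_index):
    """Basic field extraction for one function (assumed JSON layout)."""
    func = mir["functions"][func_index]
    ops = [inst["op"] for block in func["blocks"] for inst in block["instructions"]]
    return {
        "name": func["name"],
        "params": len(func.get("params", [])),
        "blocks": len(func["blocks"]),
        "instructions": len(ops),
        "has_phis": "phi" in ops,
    }


def list_instructions(mir, func_index):
    """Flatten instructions in block order, keeping their position."""
    func = mir["functions"][func_index]
    return [
        {
            "block_id": block["id"],
            "inst_index": index,
            "op": inst["op"],
            "dest": inst.get("dest"),
        }
        for block in func["blocks"]
        for index, inst in enumerate(block["instructions"])
    ]
```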

Phase 161-3: PHI/Loop/If Detection

Scope: Advanced control flow analysis

Implement list_phis() using pattern matching
Implement list_loops() using CFG and backward edge detection
Implement list_ifs() using condition and merge detection
Test on representative functions

Success Criteria:

Correctly identifies all PHI instructions
Correctly detects loop header and back edges
Correctly identifies if/merge structures

Phase 161-4: Type Propagation

Scope: Type hint system

Implement type extraction from Const/Compare/BinOp
Implement 4-iteration propagation algorithm
Build type map for ValueId

Success Criteria:

Type map captures all reachable types
No type conflicts or inconsistencies

Phase 161-5: Analysis Features

Scope: Extended functionality

Implement reachability analysis (mark unreachable blocks)
Implement dump methods for debugging
Add caching to expensive operations
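Reachability marking is a plain graph traversal. A minimal Python sketch, assuming a hypothetical `successors` map (block id → successor ids) and that the entry block has id 0:

```python
def reachability_analysis(successors, entry=0):
    """Split blocks into reachable and unreachable via depth-first search."""
    reachable = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(successors.get(block, []))
    return {
        "reachable_blocks": sorted(reachable),
        "unreachable_blocks": sorted(set(successors) - reachable),
    }
```

The result is a map with two lists, matching the `{ reachable_blocks, unreachable_blocks }` shape described for reachability_analysis().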

6. Representative Functions for Testing

Per Task 3 selection criteria, these functions will be used for Phase 161-2+ validation:

1. if_select_simple (Simple if/else with PHI)
   - 4 BasicBlocks
   - 1 Branch instruction
   - 1 PHI instruction at merge
   - Type: Simple if pattern
2. min_loop (Minimal loop with PHI)
   - 2 BasicBlocks (header + body)
   - Loop back edge
   - PHI instruction at header
   - Type: Loop pattern
3. skip_ws (From JoinIR, more complex)
   - 6+ BasicBlocks
   - Nested control flow
   - Multiple PHI instructions
   - Type: Complex pattern

Usage: Each will be analyzed by MirAnalyzerBox to verify correctness of detection algorithms.

7. Design Principles Applied

🏗️ 箱にする (Boxification)

Each analyzer box has single responsibility
Clear API boundary (methods) with defined input/output contracts
No shared mutable state between boxes

🌳 境界を作る (Clear Boundaries)

JsonParserBox: Low-level JSON only
MirAnalyzerBox: MIR semantics only
JoinIrAnalyzerBox: JoinIR conversion only
No intermingling of concerns

⚡ Fail-Fast

validateSchema() must either pass or raise an error (no silent failures)
Invalid instruction types cause immediate error
Type propagation inconsistencies detected and reported
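A Fail-Fast schema check could look like the following Python sketch. It assumes the three required fields named earlier (schema_version, functions, cfg) are top-level keys, which the document does not state explicitly; `SchemaError` and `validate_schema` are hypothetical names.

```python
class SchemaError(ValueError):
    """Raised as soon as the MIR JSON is found to be malformed."""


REQUIRED_FIELDS = ("schema_version", "functions", "cfg")


def validate_schema(mir):
    """Return None on success; raise SchemaError on the first problem class found."""
    if not isinstance(mir, dict):
        raise SchemaError("MIR JSON root must be an object")
    missing = [field for field in REQUIRED_FIELDS if field not in mir]
    if missing:
        raise SchemaError("MIR JSON missing required fields: " + ", ".join(missing))
    if not isinstance(mir["functions"], list):
        raise SchemaError("'functions' must be an array")
```

Raising instead of returning false means a caller cannot forget to check the result, which is the point of the Fail-Fast principle.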

🔄 遅延シングルトン (Lazy Evaluation)

Each method computes its result on-demand
Results are cached in func_cache to avoid recomputation
No pre-computation of unnecessary analysis
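The on-demand caching described here amounts to memoizing each analysis per function index. A small Python sketch, with hypothetical names (`CachedAnalyzer`, `_cached`, `count_instructions`) and an assumed JSON layout; the `compute_count` field exists only to make the caching observable:

```python
class CachedAnalyzer:
    """Computes results on first request and reuses them afterwards."""

    def __init__(self, mir):
        self.mir = mir
        self.func_cache = {}
        self.compute_count = 0  # instrumentation for the example only

    def _cached(self, key, compute):
        # Nothing is computed until a method is actually called.
        if key not in self.func_cache:
            self.func_cache[key] = compute()
        return self.func_cache[key]

    def count_instructions(self, func_index):
        def compute():
            self.compute_count += 1
            func = self.mir["functions"][func_index]
            return sum(len(block["instructions"]) for block in func["blocks"])

        return self._cached(("count_instructions", func_index), compute)
```

Keying the cache on both the method name and the function index keeps results from different analyses from colliding in the shared func_cache.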

8. Questions Answered by This Design

Q: Why two separate analyzer boxes? A: MIR and JoinIR have fundamentally different schemas. Separate boxes with clear single responsibilities are easier to test, maintain, and extend.

Q: Why separate JsonParserBox? A: JSON parsing is orthogonal to semantic analysis. Extracting it enables reuse and makes testing easier.

Q: Why caching? A: Control flow analysis is expensive (CFG traversal, reachability). Caching prevents redundant computation when multiple methods query the same data.

Q: Why 4 iterations for type propagation? A: Based on Phase 25 experience, 4 iterations handle most practical programs. Documented in phase161_joinir_analyzer_design.md.

9. Next Steps (Task 3)

Once this design is approved:

Task 3: Formally select 3-5 representative functions that cover all detection patterns
Task 4: Implement basic .hako JsonParserBox and MirAnalyzerBox
Task 5: Create joinir_analyze.sh CLI entry point
Task 6: Test on representative functions
Task 7: Update CURRENT_TASK.md and roadmap

10. References

Phase 161 Task 1: phase161_joinir_analyzer_design.md - JSON schema inventory
Phase 173-B: phase173b-boxification-assessment.md - Boxification design principles
MIR INSTRUCTION_SET: docs/reference/mir/INSTRUCTION_SET.md
Box System: docs/reference/boxes-system/

Status: 🎯 Ready for Task 3 approval and representative function selection
