Phase 161 Task 2 & 3 completion:

**Task 2: Analyzer Box Design** (phase161_analyzer_box_design.md)
- Defined 3 core analyzer Boxes with clear responsibilities:
  1. JsonParserBox: Low-level JSON parsing (reusable utility)
  2. MirAnalyzerBox: Primary MIR v1 semantic analysis (14 methods)
  3. JoinIrAnalyzerBox: JoinIR v0 compatibility layer
- Comprehensive API contracts for all methods:
  - validateSchema(), summarize_function(), list_phis(), list_loops(), list_ifs()
  - propagate_types(), reachability_analysis(), dump methods
- Design principles applied: 箱化 (boxification), 境界作成 (boundary creation), Fail-Fast, 遅延シングルトン (lazy singleton)
- 5-stage implementation roadmap (Phase 161-2 through 161-5)
- Key algorithms documented: PHI detection, loop detection, if detection, type propagation

**Task 3: Representative Function Selection** (phase161_representative_functions.md)
- Formally selected 5 representative functions covering all patterns:
  1. if_simple: Basic if/else with PHI merge (⭐ Simple)
  2. loop_simple: Loop with back edge and loop-carried PHI (⭐ Simple)
  3. if_loop: Nested if inside loop with multiple PHI (⭐⭐ Medium)
  4. loop_break: Loop with break statement and multiple exits (⭐⭐ Medium)
  5. type_prop: Type propagation through loop arithmetic (⭐⭐ Medium)
- Each representative validates specific analyzer capabilities
- Selection criteria documented for future extensibility
- Validation strategy for Phase 161-2+ implementation

Representative test files will be created in local_tests/phase161/ (not committed due to .gitignore, but available for development)

Next: Phase 161 Task 4 - Implement basic MirAnalyzerBox on rep1_if_simple and rep2_loop_simple

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Phase 161 Task 2: Analyzer Box Design (JoinIrAnalyzerBox / MirAnalyzerBox)
Status: 🎯 DESIGN PHASE - Defining .hako Analyzer Box structure and responsibilities
Objective: Design the foundational .hako Boxes for analyzing Rust JSON MIR/JoinIR data, establishing clear responsibilities and API contracts.
Executive Summary
Phase 161 aims to port JoinIR analysis logic from Rust to .hako. The first step was creating a complete JSON format inventory (Task 1, completed). Now we design the .hako Box architecture that will consume this data.
Key Design Decision: Create TWO specialized Analyzer Boxes with distinct, non-overlapping responsibilities:
- MirAnalyzerBox: Analyzes MIR JSON v1 (primary)
- JoinIrAnalyzerBox: Analyzes JoinIR JSON v0 (secondary, for compatibility)
Both boxes will share a common JsonParserBox utility for low-level JSON parsing operations.
1. Core Architecture: Box Responsibilities
1.1 JsonParserBox (Shared Utility)
Purpose: Low-level JSON parsing and traversal (reusable across both analyzers)
Scope: Single-minded JSON access without semantic analysis
Responsibilities:
- Parse JSON text into MapBox/ArrayBox structure
- Provide recursive accessor methods: get(), getArray(), getInt(), getString()
- Handle type conversions safely with nullability
- Provide iteration helpers: forEach(), map(), filter()
Key Methods:
birth(jsonText) // Parse JSON from string
get(path: string): any // Get nested value by path (e.g., "functions/0/blocks")
getArray(path): ArrayBox // Get array at path with type safety
getString(path): string // Get string with default ""
getInt(path): integer // Get integer with default 0
getBool(path): boolean // Get boolean with default false
Non-Scope: Semantic analysis, MIR-specific validation, JoinIR-specific knowledge
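The accessor contract above (path lookup plus typed getters with defaults) can be sketched in Python as follows. This is only an illustration of the intended behavior, not the .hako implementation: the class name JsonPath, the "/" separator, and the snake_case method names are assumptions made for this example.

```python
import json


class JsonPath:
    """Hypothetical Python stand-in for JsonParserBox (illustration only)."""

    def __init__(self, json_text, sep="/"):
        # Assumption: paths are split on "/" as in "functions/0/blocks".
        self.root = json.loads(json_text)
        self.sep = sep

    def get(self, path):
        """Walk the path segment by segment; return None if any segment is missing."""
        node = self.root
        for seg in path.split(self.sep):
            if isinstance(node, dict) and seg in node:
                node = node[seg]
            elif isinstance(node, list) and seg.isdigit() and int(seg) < len(node):
                node = node[int(seg)]
            else:
                return None
        return node

    def get_array(self, path):
        value = self.get(path)
        return value if isinstance(value, list) else []

    def get_string(self, path):
        value = self.get(path)
        return value if isinstance(value, str) else ""

    def get_int(self, path):
        value = self.get(path)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
        return 0

    def get_bool(self, path):
        value = self.get(path)
        return value if isinstance(value, bool) else False
```

Returning a typed default instead of raising keeps the parser free of semantic decisions; whether a missing field is an error is left to the analyzer that calls it.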
1.2 MirAnalyzerBox (Primary Analyzer)
Purpose: Analyze MIR JSON v1 according to Phase 161 specifications
Scope: All MIR-specific analysis operations
Responsibilities:
- Schema Validation: Verify MIR JSON has required fields (schema_version, functions, cfg)
- Instruction Type Detection: Identify instruction types (14 types in MIR v1)
- PHI Detection: Identify PHI instructions and extract incoming values
- Loop Detection: Identify loops via backward edge analysis (CFG)
- If Detection: Identify conditional branches and PHI merge points
- Type Analysis: Propagate type hints through PHI/BinOp/Compare operations
- Reachability Analysis: Mark unreachable blocks (dead code detection)
Key Methods (Single-Function Analysis):
birth(mirJsonText) // Parse MIR JSON
// === Schema Validation ===
validateSchema(): boolean // Check MIR v1 structure
// === Function-Level Analysis ===
summarize_function(funcIndex: integer): MapBox // Returns:
// {
//   name: string,
//   params: integer,
//   blocks: integer,
//   instructions: integer,
//   has_loops: boolean,
//   has_ifs: boolean,
//   has_phis: boolean
// }
// === Instruction Detection ===
list_instructions(funcIndex): ArrayBox // Returns array of:
// {
//   block_id: integer,
//   inst_index: integer,
//   op: string,
//   dest: integer (ValueId),
//   src1, src2: integer (ValueId)
// }
// === PHI Analysis ===
list_phis(funcIndex): ArrayBox // Returns array of PHI instructions:
// {
//   block_id: integer,
//   dest: integer (ValueId),
//   incoming: ArrayBox of
//     [value_id, from_block_id]
// }
// === Loop Detection ===
list_loops(funcIndex): ArrayBox // Returns array of loop structures:
// {
//   header_block: integer,
//   exit_block: integer,
//   back_edge_from: integer,
//   contains_blocks: ArrayBox
// }
// === If Detection ===
list_ifs(funcIndex): ArrayBox // Returns array of if structures:
// {
//   condition_block: integer,
//   condition_value: integer (ValueId),
//   true_block: integer,
//   false_block: integer,
//   merge_block: integer
// }
// === Type Analysis ===
propagate_types(funcIndex): MapBox // Returns type map:
// {
//   value_id: type_string
//   (e.g., "i64", "void", "boxref")
// }
// === Control Flow Analysis ===
reachability_analysis(funcIndex): MapBox // Returns:
// {
//   reachable_blocks: ArrayBox,
//   unreachable_blocks: ArrayBox
// }
Key Algorithms:
PHI Detection Algorithm
For each block in function:
  For each instruction in block:
    If instruction.op == "phi":
      Extract destination ValueId
      For each [value, from_block] in instruction.incoming:
        Record PHI merge point
      Mark block as PHI merge block
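A minimal Python sketch of the PHI scan described above. The function shape (a dict with "blocks", each holding an "id" and an "instructions" list, and phi instructions carrying "dest" and "incoming") is an assumption taken from the field names used in this document, not a confirmed copy of the MIR v1 schema.

```python
def list_phis(func):
    """Collect every phi instruction in a function dict (assumed shape)."""
    phis = []
    for block in func["blocks"]:
        for inst in block["instructions"]:
            if inst["op"] == "phi":
                phis.append({
                    "block_id": block["id"],
                    "dest": inst["dest"],
                    # Each incoming entry is a (value_id, from_block_id) pair.
                    "incoming": [tuple(pair) for pair in inst["incoming"]],
                })
    return phis


def phi_merge_blocks(func):
    """Block ids that contain at least one phi, in first-seen order."""
    seen = []
    for phi in list_phis(func):
        if phi["block_id"] not in seen:
            seen.append(phi["block_id"])
    return seen
```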
Loop Detection Algorithm (CFG-based)
Build adjacency list from CFG (block → [successor blocks])
For each block B:
  For each successor S of B:
    If S's block_id < B's block_id:
      Found backward edge B → S
      S is loop header
      Find all blocks in loop via DFS from S (keeping only blocks that can reach B)
      Record loop structure
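The back-edge heuristic can be sketched in Python as below. Two points are assumptions of this sketch and not statements from the design: the CFG is passed as a plain dict from block id to successor ids, and the loop body is collected by walking predecessors backward from the back-edge source until the header is reached, which is a common way to restrict the body to blocks inside the loop. The exit_blocks field is a list here, whereas the API above names a single exit_block. Note also that comparing block ids only finds back edges when blocks are numbered in program order.

```python
def list_loops(successors):
    """Detect loops from a dict: block_id -> list of successor block ids."""
    # Build the predecessor map needed for the backward walk.
    preds = {b: [] for b in successors}
    for b, succs in successors.items():
        for s in succs:
            preds.setdefault(s, []).append(b)

    loops = []
    for b in sorted(successors):
        for s in successors[b]:
            if s < b:  # back-edge heuristic from the text: target id < source id
                body = {s}
                stack = [b]
                while stack:
                    n = stack.pop()
                    if n not in body:
                        body.add(n)
                        stack.extend(preds.get(n, []))
                exits = {t for n in body for t in successors.get(n, []) if t not in body}
                loops.append({
                    "header_block": s,
                    "back_edge_from": b,
                    "contains_blocks": sorted(body),
                    "exit_blocks": sorted(exits),
                })
    return loops
```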
If Detection Algorithm
For each block B with Branch instruction:
  condition = branch.condition (ValueId)
  true_block = branch.targets[0]
  false_block = branch.targets[1]
  For each successor block S of true_block OR false_block:
    If S has PHI instruction with incoming from both true_block AND false_block:
      S is the merge block
      Record if structure
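A hedged Python sketch of the if/merge detection above. The input shape (a dict from block id to a list of instruction dicts) and the "jump" instruction with a "target" field are assumptions made for this example; only "branch" with "condition" and "targets", and "phi" with "incoming", come from the text.

```python
def list_ifs(blocks):
    """Find branch/merge pairs in a dict: block_id -> list of instruction dicts."""

    def successors(block_id):
        # Assumed terminators: "branch" (two targets) and "jump" (one target).
        for inst in blocks[block_id]:
            if inst["op"] == "branch":
                return list(inst["targets"])
            if inst["op"] == "jump":
                return [inst["target"]]
        return []

    def merges_both(block_id, true_block, false_block):
        # True if some phi in the block has incoming edges from both arms.
        for inst in blocks[block_id]:
            if inst["op"] == "phi":
                sources = {from_block for _, from_block in inst["incoming"]}
                if {true_block, false_block} <= sources:
                    return True
        return False

    ifs = []
    for block_id in sorted(blocks):
        for inst in blocks[block_id]:
            if inst["op"] != "branch":
                continue
            true_block, false_block = inst["targets"]
            for s in successors(true_block) + successors(false_block):
                if merges_both(s, true_block, false_block):
                    ifs.append({
                        "condition_block": block_id,
                        "condition_value": inst["condition"],
                        "true_block": true_block,
                        "false_block": false_block,
                        "merge_block": s,
                    })
                    break
    return ifs
```

As in the pseudocode, a branch is recorded only when a merge block with a matching PHI is found, so a loop-exit branch without such a PHI is not reported as an if.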
Type Propagation Algorithm
Initialize: type_map[v] = v.hint (from Const/Compare/BinOp)
Iterate 4 times: // Maximum iterations before convergence
  For each PHI instruction:
    incoming_types = [type_map[v] for each [v, _] in phi.incoming]
    Merge types: take most specific common type
    type_map[phi.dest] = merged_type
  For each BinOp/Compare/etc:
    Propagate operand types to result
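The bounded iteration can be sketched in Python as follows. Several details are assumptions of this sketch: instructions are given as a flat list, type hints live in a "hint" field, binop operands are named "lhs" and "rhs", and the merge rule is simplified to "all known operand types agree, otherwise unknown" because the text does not define what "most specific common type" means. The early exit when nothing changes is also an addition.

```python
def propagate_types(instructions, max_iters=4):
    """Propagate type hints through phi/binop instructions (assumed shapes)."""
    # Seed the map from instructions that already carry a hint.
    type_map = {}
    for inst in instructions:
        if "hint" in inst:
            type_map[inst["dest"]] = inst["hint"]

    def merge(types):
        # Simplified merge: one agreed known type, else None (unknown/conflict).
        known = {t for t in types if t is not None}
        return known.pop() if len(known) == 1 else None

    for _ in range(max_iters):
        changed = False
        for inst in instructions:
            if inst["op"] == "phi":
                merged = merge(type_map.get(v) for v, _ in inst["incoming"])
            elif inst["op"] == "binop":
                merged = merge(type_map.get(v) for v in (inst["lhs"], inst["rhs"]))
            else:
                continue
            if merged is not None and type_map.get(inst["dest"]) != merged:
                type_map[inst["dest"]] = merged
                changed = True
        if not changed:
            break  # converged before hitting the iteration cap
    return type_map
```

A loop-carried value shows why more than one pass can be needed: the PHI at the loop header depends on a BinOp in the body, which in turn depends on the PHI.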
1.3 JoinIrAnalyzerBox (Secondary Analyzer)
Purpose: Analyze JoinIR JSON v0 (CPS-style format)
Scope: JoinIR-specific analysis operations
Responsibilities:
- Schema Validation: Verify JoinIR JSON has required fields
- Continuation Extraction: Parse CPS-style continuation structures
- Direct Conversion to MIR: Transform JoinIR JSON to MIR-compatible format
- Backward Compatibility: Support legacy JoinIR analysis workflows
Key Methods:
birth(joinirJsonText) // Parse JoinIR JSON
validateSchema(): boolean // Check JoinIR v0 structure
// === JoinIR-Specific Analysis ===
list_continuations(funcIndex): ArrayBox // Returns continuation structures
// === Conversion ===
convert_to_mir(funcIndex): string // Returns MIR JSON equivalent
// (enables reuse of MirAnalyzerBox)
Note on Design: JoinIrAnalyzerBox is intentionally minimal - its primary purpose is converting JoinIR to MIR format, then delegating to MirAnalyzerBox for actual analysis. This avoids code duplication.
2. Shared Infrastructure
2.1 AnalyzerCommonBox (Base Utilities)
Purpose: Common helper methods used by both analyzers
Key Methods:
// === Utility Methods ===
extract_function(funcIndex: integer): MapBox // Extract single function data
extract_cfg(funcIndex: integer): MapBox // Extract CFG for block analysis
build_adjacency_list(cfg): MapBox // Build block→blocks adjacency
// === Debugging/Tracing ===
set_verbose(enabled: boolean) // Enable detailed output
dump_function(funcIndex): string // Pretty-print function data
dump_cfg(funcIndex): string // Pretty-print CFG
3. Data Flow Architecture
JSON Input (MIR or JoinIR)
    ↓
JsonParserBox (Parse to MapBox/ArrayBox)
    ↓
    ├─→ MirAnalyzerBox → Semantic Analysis
    │       ↓
    │   (PHI detection, loop detection, etc.)
    │       ↓
    │   Analysis Results (ArrayBox/MapBox)
    │
    └─→ JoinIrAnalyzerBox → Convert to MIR
            ↓
        (Transform JoinIR → MIR)
            ↓
        MirAnalyzerBox (reuse)
            ↓
        Analysis Results
4. API Contract: Method Signatures (Finalized)
MirAnalyzerBox
static box MirAnalyzerBox {
  // Parser state
  parsed_mir: MapBox
  json_parser: JsonParserBox
  // Analysis cache
  func_cache: MapBox // Memoization for expensive operations
  verbose_mode: BoolBox
  // Constructor
  birth(mir_json_text: string) {
    // Parse once through JsonParserBox.birth(), then keep its root for analysis
    me.json_parser = new JsonParserBox(mir_json_text)
    me.parsed_mir = me.json_parser.root
    me.func_cache = new MapBox()
    me.verbose_mode = false
  }
  // === Validation ===
  validateSchema(): BoolBox {
    // Returns true if MIR v1 schema valid
  }
  // === Analysis Methods ===
  summarize_function(funcIndex: IntegerBox): MapBox {
    // Returns { name, params, blocks, instructions, has_loops, has_ifs, has_phis }
  }
  list_instructions(funcIndex: IntegerBox): ArrayBox {
    // Returns array of { block_id, inst_index, op, dest, src1, src2 }
  }
  list_phis(funcIndex: IntegerBox): ArrayBox {
    // Returns array of { block_id, dest, incoming }
  }
  list_loops(funcIndex: IntegerBox): ArrayBox {
    // Returns array of { header_block, exit_block, back_edge_from, contains_blocks }
  }
  list_ifs(funcIndex: IntegerBox): ArrayBox {
    // Returns array of { condition_block, condition_value, true_block, false_block, merge_block }
  }
  propagate_types(funcIndex: IntegerBox): MapBox {
    // Returns { value_id: type_string }
  }
  reachability_analysis(funcIndex: IntegerBox): MapBox {
    // Returns { reachable_blocks, unreachable_blocks }
  }
  // === Debugging ===
  set_verbose(enabled: BoolBox) { }
  dump_function(funcIndex: IntegerBox): StringBox { }
  dump_cfg(funcIndex: IntegerBox): StringBox { }
}
JsonParserBox
static box JsonParserBox {
  root: MapBox
  birth(json_text: string) {
    // Parse JSON text into MapBox/ArrayBox structure
  }
  get(path: string): any {
    // Get value by path (e.g., "functions/0/blocks")
  }
  getArray(path: string): ArrayBox { }
  getString(path: string): string { }
  getInt(path: string): integer { }
  getBool(path: string): boolean { }
}
5. Implementation Strategy
Phase 161-2: Basic MirAnalyzerBox Structure (First Iteration)
Scope: Get basic structure working, focus on summarize_function() and list_instructions()
- Implement JsonParserBox (simple recursive MapBox builder)
- Implement MirAnalyzerBox.birth() to parse MIR JSON
- Implement validateSchema() to verify structure
- Implement summarize_function() (basic field extraction)
- Implement list_instructions() (iterate blocks, extract instructions)
Success Criteria:
- Can parse MIR JSON test files
- Can extract function metadata
- Can list all instructions in order
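As an illustration of what the Phase 161-2 success criteria amount to, here is a small Python sketch of metadata extraction and ordered instruction listing. The function dict shape ("name", "params" as a list, "blocks" with "id" and "instructions") is assumed for this example; has_loops is omitted because it requires the CFG analysis planned for Phase 161-3.

```python
def list_instructions(func):
    """Flatten instructions in block order, tagging each with its position."""
    listed = []
    for block in func["blocks"]:
        for index, inst in enumerate(block["instructions"]):
            listed.append({
                "block_id": block["id"],
                "inst_index": index,
                "op": inst["op"],
                "dest": inst.get("dest"),  # None for instructions without a result
            })
    return listed


def summarize_function(func):
    """Basic field extraction; has_loops is left out (needs CFG analysis)."""
    insts = list_instructions(func)
    ops = {inst["op"] for inst in insts}
    return {
        "name": func["name"],
        "params": len(func["params"]),
        "blocks": len(func["blocks"]),
        "instructions": len(insts),
        "has_ifs": "branch" in ops,
        "has_phis": "phi" in ops,
    }
```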
Phase 161-3: PHI/Loop/If Detection
Scope: Advanced control flow analysis
- Implement list_phis() using pattern matching
- Implement list_loops() using CFG and backward edge detection
- Implement list_ifs() using condition and merge detection
- Test on representative functions
Success Criteria:
- Correctly identifies all PHI instructions
- Correctly detects loop header and back edges
- Correctly identifies if/merge structures
Phase 161-4: Type Propagation
Scope: Type hint system
- Implement type extraction from Const/Compare/BinOp
- Implement 4-iteration propagation algorithm
- Build type map for ValueId
Success Criteria:
- Type map captures all reachable types
- No type conflicts or inconsistencies
Phase 161-5: Analysis Features
Scope: Extended functionality
- Implement reachability analysis (mark unreachable blocks)
- Implement dump methods for debugging
- Add caching to expensive operations
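Reachability analysis can be sketched as a plain graph traversal. This Python example assumes the CFG is a dict from block id to successor ids and that block 0 is the entry block; neither assumption is stated in the design.

```python
def reachability_analysis(successors, entry=0):
    """Split blocks into reachable/unreachable from the entry block."""
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(successors.get(block, []))
    return {
        "reachable_blocks": sorted(seen),
        "unreachable_blocks": sorted(set(successors) - seen),
    }
```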
6. Representative Functions for Testing
Per Task 3 selection criteria, these functions will be used for Phase 161-2+ validation:
- if_select_simple (Simple if/else with PHI)
  - 4 BasicBlocks
  - 1 Branch instruction
  - 1 PHI instruction at merge
  - Type: Simple if pattern
- min_loop (Minimal loop with PHI)
  - 2 BasicBlocks (header + body)
  - Loop back edge
  - PHI instruction at header
  - Type: Loop pattern
- skip_ws (From JoinIR, more complex)
  - 6+ BasicBlocks
  - Nested control flow
  - Multiple PHI instructions
  - Type: Complex pattern
Usage: Each will be analyzed by MirAnalyzerBox to verify correctness of detection algorithms.
7. Design Principles Applied
🏗️ 箱にする (Boxification)
- Each analyzer box has single responsibility
- Clear API boundary (methods) with defined input/output contracts
- No shared mutable state between boxes
🌳 境界を作る (Clear Boundaries)
- JsonParserBox: Low-level JSON only
- MirAnalyzerBox: MIR semantics only
- JoinIrAnalyzerBox: JoinIR conversion only
- No intermingling of concerns
⚡ Fail-Fast
- validateSchema() must either pass or raise an error (no silent failures)
- Invalid instruction types cause immediate error
- Type propagation inconsistencies detected and reported
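A minimal fail-fast validation sketch in Python, assuming the required top-level fields are exactly the three named earlier (schema_version, functions, cfg). The SchemaError class and the extra check that functions is a list are hypothetical additions for this example.

```python
class SchemaError(ValueError):
    """Hypothetical error type raised when the MIR JSON fails validation."""


REQUIRED_TOP_LEVEL = ("schema_version", "functions", "cfg")


def validate_schema(mir):
    """Return True for a structurally valid root; raise SchemaError otherwise."""
    if not isinstance(mir, dict):
        raise SchemaError("MIR root must be an object")
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in mir]
    if missing:
        # Report every missing field at once instead of stopping at the first.
        raise SchemaError("missing required fields: " + ", ".join(missing))
    if not isinstance(mir["functions"], list):
        raise SchemaError("'functions' must be an array")
    return True
```

Raising instead of returning false makes it impossible for a caller to ignore a malformed input and continue with partial data.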
🔄 遅延シングルトン (Lazy Evaluation)
- Each method computes its result on-demand
- Results are cached in func_cache to avoid recomputation
- No pre-computation of unnecessary analysis
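The on-demand caching described above can be sketched in Python as a memo keyed by method name and function index. CachedAnalyzer and its compute callback are hypothetical names for this illustration; in the design the cache is the func_cache field of MirAnalyzerBox.

```python
class CachedAnalyzer:
    """Compute each (method, func_index) result once, on first request."""

    def __init__(self, compute):
        # compute(method, func_index) performs the actual (expensive) analysis.
        self._compute = compute
        self._cache = {}

    def analyze(self, method, func_index):
        key = (method, func_index)
        if key not in self._cache:
            self._cache[key] = self._compute(method, func_index)
        return self._cache[key]
```

Nothing is computed at construction time, so a caller that only asks for list_phis never pays for loop detection or reachability.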
8. Questions Answered by This Design
Q: Why two separate analyzer boxes? A: MIR and JoinIR have fundamentally different schemas. Separate boxes with clear single responsibilities are easier to test, maintain, and extend.
Q: Why separate JsonParserBox? A: JSON parsing is orthogonal to semantic analysis. Extracting it enables reuse and makes testing easier.
Q: Why caching? A: Control flow analysis is expensive (CFG traversal, reachability). Caching prevents redundant computation when multiple methods query the same data.
Q: Why 4 iterations for type propagation? A: Based on Phase 25 experience, 4 iterations handle most practical programs. Documented in phase161_joinir_analyzer_design.md.
9. Next Steps (Task 3)
Once this design is approved:
- Task 3: Formally select 3-5 representative functions that cover all detection patterns
- Task 4: Implement basic .hako JsonParserBox and MirAnalyzerBox
- Task 5: Create joinir_analyze.sh CLI entry point
- Task 6: Test on representative functions
- Task 7: Update CURRENT_TASK.md and roadmap
10. References
- Phase 161 Task 1: phase161_joinir_analyzer_design.md - JSON schema inventory
- Phase 173-B: phase173b-boxification-assessment.md - Boxification design principles
- MIR INSTRUCTION_SET: docs/reference/mir/INSTRUCTION_SET.md
- Box System: docs/reference/boxes-system/
Status: 🎯 Ready for Task 3 approval and representative function selection