// tools/hako_parser/tokenizer.hako - HakoTokenizerBox (Stage-3 aware tokenizer, MVP)
// Produces tokens with type, lexeme, line, col. Handles strings (escapes), numbers,
// identifiers, and punctuation. Keywords are normalized to upper-case kinds.
// no external deps (self-contained tokenizer)
static box HakoTokenizerBox {
  // Token: Map { type, lexeme, line, col }
  tokenize(text) {
    local out = new ArrayBox()
    if text == null { return out }
    local n = text.length()
    local i = 0
    local line = 1
    local col = 1
    while i < n {
      local ch = text.substring(i,i+1)
      // whitespace and newlines
      if ch == " " || ch == "\t" { i = i + 1; col = col + 1; continue }
      if ch == "\r" { i = i + 1; continue }
      if ch == "\n" { i = i + 1; line = line + 1; col = 1; continue }
      // line comment: // ... (consume until EOL)
      if ch == "/" && i+1 < n && text.substring(i+1,i+2) == "/" {
        // skip until newline; the newline itself is handled by the main loop
        i = i + 2; col = col + 2
        while i < n {
          local c2 = text.substring(i,i+1)
          if c2 == "\n" { break }
          i = i + 1; col = col + 1
        }
        continue
      }
      // block comment /* ... */ (consume until closing, track newlines)
      if ch == "/" && i+1 < n && text.substring(i+1,i+2) == "*" {
        i = i + 2; col = col + 2
        // closed stays 0 if the comment is unterminated; the MVP does not report this
        local closed = 0
        while i < n {
          local c2 = text.substring(i,i+1)
          if c2 == "*" && i+1 < n && text.substring(i+1,i+2) == "/" { i = i + 2; col = col + 2; closed = 1; break }
          if c2 == "\n" { i = i + 1; line = line + 1; col = 1; continue }
          i = i + 1; col = col + 1
        }
        continue
      }
      // string literal "..." with escapes \" \\ \n \t
      if ch == '"' {
        local start_line = line
        local start_col = col
        local buf = ""
        i = i + 1; col = col + 1
        local closed = 0
        while i < n {
          local c3 = text.substring(i,i+1)
          if c3 == '"' { closed = 1; i = i + 1; col = col + 1; break }
          if c3 == "\\" {
            if i+1 < n {
              local esc = text.substring(i+1,i+2)
              if esc == '"' { buf = buf.concat('"') }
              else if esc == "\\" { buf = buf.concat("\\") }
              else if esc == "n" { buf = buf.concat("\n") }
              else if esc == "t" { buf = buf.concat("\t") }
              else { buf = buf.concat(esc) }
              i = i + 2; col = col + 2
              continue
            } else { i = i + 1; col = col + 1; break }
          }
          // raw newline inside the literal: keep it, but advance the line counter
          if c3 == "\n" { buf = buf.concat(c3); i = i + 1; line = line + 1; col = 1; continue }
          buf = buf.concat(c3)
          i = i + 1; col = col + 1
        }
        // report the position where the literal starts; an unterminated literal (closed == 0) is still emitted
        local tok = new MapBox(); tok.set("type","STRING"); tok.set("lexeme", buf); tok.set("line", start_line); tok.set("col", start_col)
        out.push(tok); continue
      }
      // number (integer only for MVP)
      if ch >= "0" && ch <= "9" {
        local start = i; local start_col = col
        while i < n {
          local c4 = text.substring(i,i+1)
          if !(c4 >= "0" && c4 <= "9") { break }
          i = i + 1; col = col + 1
        }
        local lex = text.substring(start, i)
        local tok = new MapBox(); tok.set("type","NUMBER"); tok.set("lexeme", lex); tok.set("line", line); tok.set("col", start_col)
        out.push(tok); continue
      }
      // identifier or keyword
      if me._is_ident_start(ch) == 1 {
        local start = i; local start_col = col
        while i < n {
          local c5 = text.substring(i,i+1)
          if me._is_ident_char(c5) == 0 { break }
          i = i + 1; col = col + 1
        }
        local lex = text.substring(start, i)
        local kind = me._kw_kind(lex)
        local tok = new MapBox(); tok.set("type", kind); tok.set("lexeme", lex); tok.set("line", line); tok.set("col", start_col)
        out.push(tok); continue
      }
      // punctuation / symbols we care about
      local sym_kind = me._sym_kind(ch)
      if sym_kind != null {
        local tok = new MapBox(); tok.set("type", sym_kind); tok.set("lexeme", ch); tok.set("line", line); tok.set("col", col)
        out.push(tok); i = i + 1; col = col + 1; continue
      }
      // unknown char -> emit as PUNC so parser can skip gracefully
      local tok = new MapBox(); tok.set("type","PUNC"); tok.set("lexeme", ch); tok.set("line", line); tok.set("col", col)
      out.push(tok); i = i + 1; col = col + 1
    }
    return out
  }
  // character classes: return 1 for a match, 0 otherwise
  _is_ident_start(c) { if c=="_" {return 1}; if c>="A"&&c<="Z" {return 1}; if c>="a"&&c<="z" {return 1}; return 0 }
  _is_ident_char(c) { if me._is_ident_start(c)==1 { return 1 }; if c>="0"&&c<="9" { return 1 }; return 0 }
  // keyword lexeme -> upper-case kind; anything else is IDENT
  _kw_kind(lex) {
    if lex == "using" { return "USING" }
    if lex == "as" { return "AS" }
    if lex == "static" { return "STATIC" }
    if lex == "box" { return "BOX" }
    if lex == "method" { return "METHOD" }
    if lex == "include" { return "INCLUDE" }
    // Stage-3 tokens (MVP)
    if lex == "while" { return "WHILE" }
    if lex == "for" { return "FOR" }
    if lex == "in" { return "IN" }
    return "IDENT"
  }
  // single-character symbol -> kind; null when the character is not recognized
  _sym_kind(c) {
    if c == "{" { return "LBRACE" }
    if c == "}" { return "RBRACE" }
    if c == "(" { return "LPAREN" }
    if c == ")" { return "RPAREN" }
    if c == "," { return "COMMA" }
    if c == "." { return "DOT" }
    if c == ":" { return "COLON" }
    if c == "=" { return "EQ" }
    if c == ";" { return "SEMI" }
    return null
  }
}
static box HakoTokenizerMain { method main(args) { return 0 } }