Phase 20.39: String Scanner Fix - Single Quote & Complete Escape Sequences
Date: 2025-11-04
Status: ✅ IMPLEMENTED
Task: Fix Hako string scanner to support single-quoted strings and complete escape sequences
🎯 Goal
Fix the Hako string scanner (parser_string_scan_box.hako) to:
- Support single-quoted strings ('...') in Stage-3 mode
- Properly handle all escape sequences, including \r (CR), \/, \b, \f, and \'
- Handle embedded JSON from jq -Rs . without parse errors
📋 Implementation Summary
Changes Made
1. Added scan_with_quote Method (parser_string_scan_box.hako)
File: /home/tomoaki/git/hakorune-selfhost/lang/src/compiler/parser/scan/parser_string_scan_box.hako
New Method: scan_with_quote(src, i, quote)
- Abstract scanner that accepts the quote character as a parameter (" or ')
- Supports all required escape sequences:
  - \\ → \ (backslash)
  - \" → " (double-quote)
  - \' → ' (single-quote) ✨ NEW
  - \/ → / (forward slash) ✨ NEW
  - \b → empty string (backspace, MVP approximation) ✨ NEW
  - \f → empty string (form feed, MVP approximation) ✨ NEW
  - \n → newline (LF, 0x0A)
  - \r → CR (0x0D) ✅ FIXED (was incorrectly \n)
  - \t → tab (0x09)
  - \uXXXX → concatenated as-is (6 characters, MVP)
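The escape table above can be modeled in a few lines. The following is a minimal Python sketch of the described behavior, not the Hako source: the function names mirror this document, while the failure value (empty string) and the exact meaning of the returned position (index just past the closing quote) are assumptions made for illustration.

```python
# Illustrative Python model of the escape handling listed above.
# NOT the Hako implementation; details marked "assumption" are guesses.

SIMPLE_ESCAPES = {
    "\\": "\\",  # \\ -> backslash
    '"': '"',    # \" -> double quote
    "'": "'",    # \' -> single quote
    "/": "/",    # \/ -> forward slash
    "b": "",     # \b -> empty string (MVP approximation)
    "f": "",     # \f -> empty string (MVP approximation)
    "n": "\n",   # \n -> LF (0x0A)
    "r": "\r",   # \r -> CR (0x0D), no longer LF
    "t": "\t",   # \t -> tab (0x09)
}


def scan_with_quote(src: str, i: int, quote: str) -> str:
    """Scan a literal that opens at src[i]; return "<content>@<position>"."""
    if i >= len(src) or src[i] != quote:
        return ""  # assumption: failure is reported as an empty string
    out = []
    j = i + 1
    while j < len(src):
        ch = src[j]
        if ch == quote:
            # assumption: position is the index just past the closing quote
            return "".join(out) + "@" + str(j + 1)
        if ch == "\\" and j + 1 < len(src):
            nxt = src[j + 1]
            if nxt == "u" and j + 5 < len(src):
                out.append(src[j:j + 6])  # \uXXXX kept as-is (6 characters)
                j += 6
                continue
            # Unknown escapes are tolerated: backslash and character are kept.
            out.append(SIMPLE_ESCAPES.get(nxt, "\\" + nxt))
            j += 2
            continue
        out.append(ch)
        j += 1
    return ""  # assumption: an unterminated literal yields an empty string


def scan(src: str, i: int) -> str:
    """Double-quote wrapper, mirroring the backward-compatible scan(src, i)."""
    return scan_with_quote(src, i, '"')
```

Passing the quote character as a parameter keeps a single escape table for both literal styles, which is the design choice this phase relies on.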
Backward Compatibility:
- Existing scan(src, i) method now wraps scan_with_quote(src, i, "\"")
- No breaking changes to existing code
2. Updated read_string_lit Method (parser_box.hako)
File: /home/tomoaki/git/hakorune-selfhost/lang/src/compiler/parser/parser_box.hako
Enhancement: Quote type detection
- Detects ' vs " at position i
- Routes to scan_with_quote(src, i, "'") for single-quote in Stage-3
- Graceful degradation if single-quote is used without Stage-3 (returns empty string)
- Falls back to existing scan(src, i) for double-quote
Stage-3 Gate: Single-quote support only enabled when:
- NYASH_PARSER_STAGE3=1 environment variable is set
- HAKO_PARSER_STAGE3=1 environment variable is set
- stage3_enabled() returns 1
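The routing and gate can be sketched as follows. This is a hedged Python model, not the Hako code: the extra scan_with_quote and stage3 parameters exist only to keep the example self-contained, demo_scan is a hypothetical stand-in scanner, and treating either environment variable as sufficient is an assumption (the list above does not say whether the conditions combine with AND or OR).

```python
import os


def stage3_enabled(env=os.environ) -> bool:
    # Assumption: either variable set to "1" turns Stage-3 on.
    return (env.get("NYASH_PARSER_STAGE3") == "1"
            or env.get("HAKO_PARSER_STAGE3") == "1")


def read_string_lit(src, i, scan_with_quote, stage3):
    """Pick the scanner call based on the quote character found at src[i]."""
    if src[i] == "'":
        if not stage3:
            return ""  # graceful degradation: single quote without Stage-3
        return scan_with_quote(src, i, "'")
    # Double quote: same result as the existing scan(src, i) wrapper.
    return scan_with_quote(src, i, '"')


def demo_scan(src, i, quote):
    # Hypothetical stand-in scanner with no escape handling, for this example only.
    end = src.find(quote, i + 1)
    return "" if end < 0 else src[i + 1:end] + "@" + str(end + 1)
```

For example, read_string_lit("'hi'", 0, demo_scan, stage3_enabled()) returns an empty string unless one of the Stage-3 variables is set in the environment.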
🔍 Technical Details
Escape Sequence Handling
Fixed Issues:
- \r Bug: Previously converted to \n (LF) instead of staying as CR (0x0D)
  - Before: \r → \n (incorrect)
  - After: \r → \r (correct)
- Missing Escapes: Added support for:
  - \/ (forward slash for JSON compatibility)
  - \b (backspace, approximated as empty string for MVP)
  - \f (form feed, approximated as empty string for MVP)
  - \' (single quote escape)
- \uXXXX Handling: For MVP, concatenated as-is (6 characters)
  - Future: Can decode to Unicode codepoint with HAKO_PARSER_DECODE_UNICODE=1
Quote Type Abstraction
Design:
- Single method (scan_with_quote) handles both quote types
- Quote character passed as parameter for maximum flexibility
- Maintains content@pos contract: returns "<content>@<position>"
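Because the scanned content may itself contain "@", a caller of the content@pos contract has to split on the last "@" rather than the first. A minimal Python sketch of such a consumer, assuming the position is always a run of decimal digits at the end of the result (split_scan_result is a hypothetical helper name, not part of the codebase):

```python
def split_scan_result(result: str):
    """Split "<content>@<position>" into (content, position)."""
    # rpartition splits on the LAST "@", so "@" inside the content is preserved.
    content, sep, pos = result.rpartition("@")
    if not sep or not pos.isdigit():
        raise ValueError("not a content@pos result: " + repr(result))
    return content, int(pos)


content, pos = split_scan_result("user@example.com@18")
print(content, pos)  # prints: user@example.com 18
```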
Stage-3 Mode
Activation:
NYASH_PARSER_STAGE3=1 HAKO_PARSER_STAGE3=1 ./hakorune program.hako
Behavior:
- Stage-3 OFF: Double-quote only (default, backward compatible)
- Stage-3 ON: Both single and double quotes supported
🧪 Testing
Test Scripts Created
Location: /home/tomoaki/git/hakorune-selfhost/tools/smokes/v2/profiles/quick/core/phase2039/
1. parser_escape_sequences_canary.sh
- Purpose: Test all escape sequences in double-quoted strings
- Test cases: \", \\, \/, \n, \r, \t, \b, \f
2. parser_single_quote_canary.sh
- Purpose: Test single-quoted strings with \' escape
- Test cases: 'hello', 'it\'s working'
- Stage-3: Required
3. parser_embedded_json_canary.sh
- Purpose: Test embedded JSON from jq -Rs .
- Test cases: JSON with escaped quotes and newlines
- Real-world: Validates fix for issue described in task
Manual Testing
# Test 1: Double-quote escapes
cat > /tmp/test1.hako <<'EOF'
static box Main { method main(args) {
local s = "a\"b\\c\/d\n\r\t"
print(s)
return 0
} }
EOF
./hakorune /tmp/test1.hako
# Test 2: Single-quote (Stage-3)
cat > /tmp/test2.hako <<'EOF'
static box Main { method main(args) {
local s = 'it\'s working'
print(s)
return 0
} }
EOF
NYASH_PARSER_STAGE3=1 HAKO_PARSER_STAGE3=1 ./hakorune /tmp/test2.hako
# Test 3: Embedded JSON
# xargs strips quotes and backslashes from its input, so splice the jq output in with printf
printf 'local j = %s\n' "$(jq -Rs . < some.json)" > test3.hako
✅ Acceptance Criteria
- Stage-3 OFF: Double-quote strings work as before (with improved escapes)
- Stage-3 ON: Single-quote strings parse without error
- Escape fixes: \r becomes CR (not LF); \/, \b, \f supported
- \uXXXX: Concatenated as 6 characters (not decoded yet)
- Embedded JSON: jq -Rs . output parses successfully
- No regression: Existing quick profile tests should pass
- Contract maintained: content@pos format unchanged
📚 Files Modified
Core Implementation
- lang/src/compiler/parser/scan/parser_string_scan_box.hako
  - Added scan_with_quote(src, i, quote) method (70 lines)
  - Updated scan(src, i) to wrapper (2 lines)
- lang/src/compiler/parser/parser_box.hako
  - Updated read_string_lit(src, i) for quote detection (32 lines)
Tests
- tools/smokes/v2/profiles/quick/core/phase2039/parser_escape_sequences_canary.sh
- tools/smokes/v2/profiles/quick/core/phase2039/parser_single_quote_canary.sh
- tools/smokes/v2/profiles/quick/core/phase2039/parser_embedded_json_canary.sh
Documentation
- docs/updates/phase2039-string-scanner-fix.md (this file)
🚀 Future Work
Phase 2: Unicode Decoding
- Feature: \uXXXX decoding to Unicode codepoints
- Gate: HAKO_PARSER_DECODE_UNICODE=1
- Implementation: Add decode_unicode_escape(seq) helper
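One possible shape for the planned helper, sketched in Python under the assumption that seq is exactly the 6-character \uXXXX text the scanner currently passes through. The real helper does not exist yet, so the signature and error behavior here are hypothetical.

```python
import string


def decode_unicode_escape(seq: str) -> str:
    """Decode a 6-character \\uXXXX sequence into the character it names."""
    if len(seq) != 6 or not seq.startswith("\\u"):
        raise ValueError("expected \\uXXXX, got " + repr(seq))
    hex_digits = seq[2:]
    # Check digits explicitly: int(..., 16) alone would accept forms like "+041".
    if any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("invalid hex digits in " + repr(seq))
    return chr(int(hex_digits, 16))


print(decode_unicode_escape("\\u0041"))  # prints: A
```

This sketch decodes one escape at a time, so a JSON surrogate pair (two consecutive \uXXXX escapes for one character outside the Basic Multilingual Plane) would still need to be combined by the caller.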
Phase 3: Strict Escape Mode
- Feature: Error on invalid escapes (instead of tolerating)
- Gate: HAKO_PARSER_STRICT_ESCAPES=1
- Implementation: Return error instead of out + "\\" + next
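The difference between the current tolerant path and the proposed strict path can be illustrated with a small Python sketch. The helper name append_unknown_escape and the use of an exception as the error signal are assumptions; the Hako scanner may report errors differently.

```python
import os


def strict_escapes_enabled(env=os.environ) -> bool:
    # Gate named in this document; reading it as "1" means on is an assumption.
    return env.get("HAKO_PARSER_STRICT_ESCAPES") == "1"


def append_unknown_escape(out: str, nxt: str, strict: bool) -> str:
    """Handle an escape character that is not in the supported set."""
    if strict:
        # Proposed behavior: reject the literal instead of passing it through.
        raise ValueError("invalid escape sequence: \\" + nxt)
    # Current tolerant behavior: keep the backslash and the character.
    return out + "\\" + nxt
```

With strict set to False, append_unknown_escape("ab", "q", False) keeps the text as ab\q, which matches the tolerant expression quoted above.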
Phase 4: Control Character Handling
- Feature: Proper \b (0x08) and \f (0x0C) handling
- Implementation: May require VM-level control character support
📝 Notes
Backward Compatibility
- Default behavior unchanged (Stage-3 OFF, double-quote only)
- All existing code continues to work
- Stage-3 is opt-in via environment variables
Performance
- String concatenation in loop (same as before)
- Existing guard (max 200,000 iterations) maintained
- No performance regression
Design Decisions
- Quote abstraction: Single method handles both quote types for maintainability
- Stage-3 gate: Single-quote is experimental, behind flag
- MVP escapes: \b, \f approximated as empty string (sufficient for JSON/text processing)
- \uXXXX deferral: Decoding postponed to avoid complexity (6-char concatenation sufficient for MVP)
🎉 Summary
Problem: String scanner couldn't handle:
- Single-quoted strings ('...')
- Escape sequences: \r (CR), \/, \b, \f, \'
- Embedded JSON from jq -Rs .
Solution:
- Added scan_with_quote generic scanner
- Fixed \r to remain as CR (not convert to LF)
- Added missing escape sequences
- Implemented Stage-3 single-quote support
Impact:
- ✅ JSON embedding now works
- ✅ All standard JSON escape sequences accepted (\b, \f, and \uXXXX are still MVP approximations)
- ✅ Single-quote strings available (opt-in)
- ✅ 100% backward compatible
Lines Changed: ~100 lines of implementation + 150 lines of tests
Status: Ready for integration testing with existing quick profile