# Nyash Syntax Torture (10 minimal repros)
Date: 2025-09-16
Purpose: stress-test the parser → AST → MIR(Core-13/PURE) → Interpreter/VM/LLVM(AOT) pipeline for consistency.
Each test is **one phenomenon per file**, tiny and deterministic.
## How to run (suggested)
```bash
# 1) Run all modes and compare outputs
bash run_spec_smoke.sh
# 2) PURE mode (surface MIR violations):
NYASH_MIR_CORE13_PURE=1 bash run_spec_smoke.sh
# 3) Extra logging when a case fails:
NYASH_VM_STATS=1 NYASH_VM_STATS_JSON=1 NYASH_VM_DEBUG_BOXCALL=1 bash run_spec_smoke.sh
# For LLVM diagnostics (when applicable):
# NYASH_LLVM_VINVOKE_TRACE=1 NYASH_LLVM_VINVOKE_PREFER_I64=1 bash run_spec_smoke.sh
```
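Conceptually, the suite runs each test under every backend and diffs the single-line outputs. The loop below is a minimal standalone sketch of that idea only: the commented-out `nyash --backend` invocation is a hypothetical placeholder (the real suite drives the backends via `run_spec_smoke.sh`), and a stub command stands in so the sketch runs on its own.

```shell
# Sketch of a cross-backend diff harness. The `nyash --backend` flag shown in
# the comment is an ASSUMPTION, not the real CLI; adapt to the actual runner.
set -u

compare_modes() {
  local file=$1; shift
  local ref=""
  for mode in "$@"; do
    # out=$(nyash --backend "$mode" "$file")   # hypothetical real invocation
    out="stub-output"                          # placeholder so the sketch runs standalone
    if [ -z "$ref" ]; then
      ref=$out                                 # first backend sets the reference line
    elif [ "$out" != "$ref" ]; then
      echo "DIFF in $file ($mode): $out != $ref"
      return 1                                 # fail fast on the first divergence
    fi
  done
  echo "OK $file"
}

compare_modes 01_ops_assoc.hako interp vm llvm
```

Because every test emits exactly one line, a plain string comparison is enough; no structured diffing is needed.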
## Expected outputs (goldens)
We deliberately **print a single line** per test to make diffing trivial.
See the inline comments in each `*.hako` file for the expected line.
## File list
1. `01_ops_assoc.hako` - operator associativity & coercion order
2. `02_deep_parens.hako` - deep parentheses & arithmetic nesting
3. `03_array_map_nested.hako` - nested array/map literals & access
4. `04_map_array_mix.hako` - object/array cross indexing & updates
5. `05_string_concat_unicode.hako` - string/number/Unicode concatenation
6. `06_control_flow_loopform.hako` - break/continue/dispatch shape
7. `07_await_nowait_mix.hako` - nowait/await interleave determinism
8. `08_visibility_access.hako` - private/public & override routing
9. `09_lambda_closure_scope.hako` - closure capture & shadowing
10. `10_match_result_early_return.hako` - early return vs. branch merge
## CI hint
- Add this suite **before** your self-host smokes:
  - `make spec-smoke` -> `make smoke-selfhost`
- Fail fast on any diff across Interpreter/VM/LLVM.
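The ordering above might be wired up as follows in a Makefile. This is a sketch only: this document names `spec-smoke`, `smoke-selfhost`, and `run_spec_smoke.sh`, but the self-host smoke recipe itself is a placeholder assumption.

```make
# Sketch: spec smoke gates the self-host smokes (fail fast on any diff).
spec-smoke:
	bash run_spec_smoke.sh

# Runs only if spec-smoke succeeded; the recipe body is a placeholder --
# substitute the project's real self-host smoke command.
smoke-selfhost: spec-smoke
	@echo "run self-host smokes here"
```

Listing `spec-smoke` as a prerequisite of `smoke-selfhost` means `make smoke-selfhost` enforces the ordering automatically, and a non-zero exit from the spec suite stops the chain.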