## Summary
Investigated OpenAI's new GPT-5-Codex model and Codex GitHub PR review integration capabilities.
## GPT-5-Codex Analysis
### Benchmark Performance (Good)
- SWE-bench Verified: 74.5% (vs GPT-5's 72.8%, +1.7 points)
- Refactoring tasks: 51.3% (vs GPT-5's 33.9%, +17.4 points)
- Code review: Higher developer ratings
### Real-World Issues (Concerning)
- Users report degraded coding performance
- Scripts that previously worked now fail
- Less consistent than GPT-4.5
- Longer response times (minutes vs instant)
- "Creatively and emotionally flat"
- Basic errors (e.g., counting letters incorrectly)
### Key Finding
A classic case of optimizing for benchmarks over real-world usability: the model scores well on tests, but users report worse results in practice.
## Codex GitHub PR Integration
### Setup Process
1. Enable MFA and connect GitHub account
2. Authorize Codex GitHub app for repos
3. Enable "Code review" in repository settings
### Usage Methods
- **Manual**: Comment '@codex review' in PR
- **Automatic**: Triggers when PR moves from draft to ready
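
Because the manual trigger is just a PR comment, it can be scripted. The following is a minimal sketch, assuming the GitHub CLI (`gh`) is installed and authenticated for the repository. The helper name `request_codex_review` and the `DRY_RUN` switch are illustrative only and are not part of Codex.

```shell
# Hypothetical helper: post the '@codex review' comment on a PR via the GitHub CLI.
# Set DRY_RUN=1 to print the command instead of running it.
request_codex_review() {
  local pr="$1"
  if [ -z "$pr" ]; then
    echo "usage: request_codex_review <pr-number>" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gh pr comment $pr --body '@codex review'"
  else
    gh pr comment "$pr" --body "@codex review"
  fi
}
```

Running `DRY_RUN=1 request_codex_review 42` prints the command that would be sent for PR 42 without contacting GitHub.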
### Current Limitations
- One-way communication (doesn't respond to review comments)
- Prefers creating new PRs over updating existing ones
- Better for single-pass reviews than iterative feedback
## 'codex resume' Feature
New session management capability:
- Resume previous codex exec sessions
- Useful for continuing long tasks across days
- Maintains context from interrupted work
🐱 The investigation reveals that while GPT-5-Codex shows benchmark improvements, practical developer experience has declined - a reminder that metrics don't always reflect real-world utility!
Nyash Syntax Torture (10 minimal repros)
Date: 2025-09-16
Purpose: stress-test consistency across parser → AST → MIR(Core-13/PURE) → Interpreter/VM/LLVM(AOT). Each file exercises exactly one phenomenon and is tiny and deterministic.
How to run (suggested)
# 1) Run all modes and compare outputs
bash run_spec_smoke.sh
# 2) PURE mode (surface MIR violations):
NYASH_MIR_CORE13_PURE=1 bash run_spec_smoke.sh
# 3) Extra logging when a case fails:
NYASH_VM_STATS=1 NYASH_VM_STATS_JSON=1 NYASH_VM_DEBUG_BOXCALL=1 bash run_spec_smoke.sh
# For LLVM diagnostics (when applicable):
# NYASH_LLVM_VINVOKE_TRACE=1 NYASH_LLVM_VINVOKE_PREFER_I64=1 bash run_spec_smoke.sh
Expected outputs (goldens)
We deliberately print a single line per test to make diffing trivial.
See inline comments in each *.nyash.
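
The single-line convention can be sketched as a small check. This is an illustration only: `check_golden` is a hypothetical helper, not part of run_spec_smoke.sh, and it assumes the expected line has already been read out of the test file's inline comment.

```shell
# Hypothetical helper: pass only if ACTUAL is exactly one line and equals EXPECTED.
check_golden() {
  local name="$1" expected="$2" actual="$3"
  local lines
  # Count the lines in the captured output (an empty capture counts as one empty line).
  lines=$(printf '%s\n' "$actual" | wc -l | tr -d ' ')
  if [ "$lines" -ne 1 ]; then
    echo "FAIL $name: expected 1 line, got $lines"
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "FAIL $name: expected '$expected', got '$actual'"
    return 1
  fi
  echo "ok $name"
}
```

With one line per test, a mismatch report names the test and shows both values, so no multi-line diff tool is needed.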
File list
- 01_ops_assoc.nyash – operator associativity & coercion order
- 02_deep_parens.nyash – deep parentheses & arithmetic nesting
- 03_array_map_nested.nyash – nested array/map literal & access
- 04_map_array_mix.nyash – object/array cross indexing & updates
- 05_string_concat_unicode.nyash – string/number/Unicode concatenation
- 06_control_flow_loopform.nyash – break/continue/dispatch shape
- 07_await_nowait_mix.nyash – nowait/await interleave determinism
- 08_visibility_access.nyash – private/public & override routing
- 09_lambda_closure_scope.nyash – closure capture & shadowing
- 10_match_result_early_return.nyash – early return vs. branch merge
CI hint
- Add this suite before your self-host smokes:
make spec-smoke -> make smoke-selfhost
- Fail fast on any diff across Interpreter/VM/LLVM.
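
One way to picture the fail-fast rule is the sketch below. It assumes the three backend outputs for a test have already been captured as strings; `compare_backends` is a hypothetical name, and how each backend is actually invoked is left to run_spec_smoke.sh.

```shell
# Hypothetical helper: compare one test's output across the three backends.
# Returns non-zero at the first mismatch so a CI step can stop immediately.
compare_backends() {
  local name="$1" interp="$2" vm="$3" llvm="$4"
  if [ "$interp" != "$vm" ]; then
    echo "DIFF $name: interpreter='$interp' vm='$vm'"
    return 1
  fi
  if [ "$interp" != "$llvm" ]; then
    echo "DIFF $name: interpreter='$interp' llvm='$llvm'"
    return 1
  fi
  echo "same $name"
}
```

In a CI script, calling it as `compare_backends 01_ops_assoc "$a" "$b" "$c" || exit 1` for each test stops the job on the first divergence instead of collecting all failures.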