## Summary
Investigated OpenAI's new GPT-5-Codex model and Codex GitHub PR review integration capabilities.
## GPT-5-Codex Analysis
### Benchmark Performance (Good)
- SWE-bench Verified: 74.5% (vs GPT-5's 72.8%)
- Refactoring tasks: 51.3% (vs GPT-5's 33.9%)
- Code review: Higher developer ratings
### Real-World Issues (Concerning)
- Users report degraded coding performance
- Scripts that previously worked now fail
- Less consistent than GPT-4.5
- Longer response times (minutes vs instant)
- "Creatively and emotionally flat"
- Basic errors (e.g., counting letters incorrectly)
### Key Finding
A classic case of optimizing for benchmarks over real usability: the model scores well on tests but, according to user reports, performs poorly in practice.
## Codex GitHub PR Integration
### Setup Process
1. Enable MFA and connect GitHub account
2. Authorize Codex GitHub app for repos
3. Enable "Code review" in repository settings
### Usage Methods
- **Manual**: Comment '@codex review' on the PR
- **Automatic**: Triggers when a PR moves from draft to ready for review
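
Both triggers can be driven from a script. The following is a minimal sketch, assuming the GitHub CLI (`gh`) is installed and authenticated; the helper names are hypothetical, and only the comment text '@codex review' comes from the notes above. It builds the commands without running them, so they can be inspected first.

```python
import subprocess


def build_review_comment_cmd(pr_number, repo=None):
    """Build a `gh pr comment` command that posts the manual trigger comment.

    Hypothetical helper: only the comment body is taken from the notes.
    """
    cmd = ["gh", "pr", "comment", str(pr_number), "--body", "@codex review"]
    if repo is not None:
        # --repo selects a repository other than the current directory's
        cmd += ["--repo", repo]
    return cmd


def build_mark_ready_cmd(pr_number, repo=None):
    """Build a `gh pr ready` command that moves a draft PR to ready,
    which, per the notes, is what triggers the automatic review."""
    cmd = ["gh", "pr", "ready", str(pr_number)]
    if repo is not None:
        cmd += ["--repo", repo]
    return cmd


def run(cmd):
    """Execute a built command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

Calling `run(build_review_comment_cmd(12))` from inside a checked-out repository would post the comment on PR 12. Whether Codex then responds depends on the setup steps listed earlier having been completed.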
### Current Limitations
- One-way communication (doesn't respond to review comments)
- Prefers creating new PRs over updating existing ones
- Better for single-pass reviews than iterative feedback
## 'codex resume' Feature
New session management capability:
- Resume previous codex exec sessions
- Useful for continuing long tasks across days
- Maintains context from interrupted work
🐱 The investigation reveals that while GPT-5-Codex shows benchmark improvements, practical developer experience has declined - a reminder that metrics don't always reflect real-world utility!
Major implementation by ChatGPT:
- Complete JSON v0 Bridge layer with PHI generation for control flow
- If statement: Merge PHI nodes for variables updated in then/else branches
- Loop statement: Header PHI nodes for loop-carried dependencies
- Python MVP Parser Stage-2: Added local/if/loop/call/method/new support
- Full CFG guarantee: All blocks have proper terminators (branch/jump/return)
- Type metadata for string operations (+, ==, !=)
- Comprehensive PHI smoke tests for nested and edge cases
Together, these changes allow MIR generation without the Rust MIR builder - a massive step towards
eliminating the Rust build dependency!
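
The if-statement merge rule listed above can be illustrated with a small sketch. This is not the bridge layer's actual code: the `Phi` type, the `merge_phis` function, the block labels, and the `%N` value ids are all hypothetical, and it assumes each branch is summarized as a map from variable name to SSA value id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phi:
    var: str         # variable being merged at the join block
    incoming: tuple  # ((predecessor_label, value_id), ...)


def merge_phis(pre, then_env, else_env, then_label="then", else_label="else"):
    """Return PHI nodes for variables whose value differs between branches.

    pre, then_env, else_env map variable name -> SSA value id.
    A variable missing from a branch map keeps its pre-branch value.
    """
    phis = []
    for var in sorted(set(then_env) | set(else_env)):
        if var not in pre:
            # Assumption: a variable first defined inside a branch is
            # branch-local and not visible after the merge.
            continue
        then_val = then_env.get(var, pre[var])
        else_val = else_env.get(var, pre[var])
        if then_val != else_val:
            phis.append(Phi(var, ((then_label, then_val), (else_label, else_val))))
    return phis


# x is reassigned only in the then-branch; y is untouched; tmp is branch-local.
pre = {"x": "%1", "y": "%2"}
then_env = {"x": "%3", "y": "%2"}
else_env = {"x": "%1", "y": "%2", "tmp": "%4"}
result = merge_phis(pre, then_env, else_env)
```

Here `result` holds a single PHI for `x` with incoming values `%3` from the then-block and `%1` from the else-block. Loop-header PHIs follow the same idea, with the loop preheader and the back edge as the two predecessors.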
🎉 ChatGPT spent more than 30 minutes implementing this, nya!
Co-Authored-By: ChatGPT <noreply@openai.com>
Parser improvements:
- Added expression statement fallback in parse_statement() for flexible syntax
- Fixed ternary operator to use PeekExpr instead of If AST (better lowering)
- Added peek_token() check to avoid ?/?: operator conflicts
LLVM Python improvements:
- Added optional ESC_JSON_FIX environment flag for string concatenation
- Improved PHI generation with better default handling
- Enhanced substring tracking for esc_json pattern
Documentation updates:
- Updated language guide with peek expression examples
- Added box theory diagrams to Phase 15 planning
- Clarified peek vs when syntax differences
These changes enable a cleaner parser implementation for self-hosting,
especially for digit conversion, which can now use peek expressions
instead of 19-line if-else chains.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>