🔍 Research: GPT-5-Codex capabilities and GitHub PR integration
## Summary
Investigated OpenAI's new GPT-5-Codex model and Codex GitHub PR review integration capabilities.
## GPT-5-Codex Analysis
### Benchmark Performance (Good)
- SWE-bench Verified: 74.5% (vs GPT-5's 72.8%)
- Refactoring tasks: 51.3% (vs GPT-5's 33.9%)
- Code review: Higher developer ratings
### Real-World Issues (Concerning)
- Users report degraded coding performance
- Scripts that previously worked now fail
- Less consistent than GPT-4.5
- Longer response times (minutes vs instant)
- "Creatively and emotionally flat"
- Basic errors (e.g., counting letters incorrectly)
### Key Finding
A classic case of optimizing for benchmarks over real usability: the model scores well on tests, but user reports describe worse results in practice.
## Codex GitHub PR Integration
### Setup Process
1. Enable MFA and connect GitHub account
2. Authorize Codex GitHub app for repos
3. Enable "Code review" in repository settings
### Usage Methods
- **Manual**: Comment '@codex review' in PR
- **Automatic**: Triggers when PR moves from draft to ready
### Current Limitations
- One-way communication (doesn't respond to review comments)
- Prefers creating new PRs over updating existing ones
- Better for single-pass reviews than iterative feedback
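The manual trigger above is an ordinary PR comment, so it could in principle be scripted. The sketch below builds (but does not send) a request against GitHub's REST endpoint for issue comments, which also handles top-level PR comments. The helper name `build_review_request` and the owner, repo, and token values are hypothetical, and whether Codex reacts to comments posted through the API rather than the web UI is an assumption not verified here.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_review_request(owner: str, repo: str, pr_number: int,
                         token: str) -> urllib.request.Request:
    """Build, without sending, a request that comments '@codex review' on a PR."""
    # Top-level PR comments go through the issue-comments endpoint.
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": "@codex review"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute a real repository and token before sending.
    req = build_review_request("example-org", "example-repo", 1, "<token>")
    print(req.get_method(), req.full_url)
    # To actually post the comment: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch safe to run and easy to inspect before any comment is posted.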
## 'codex resume' Feature
New session management capability:
- Resume previous codex exec sessions
- Useful for continuing long tasks across days
- Maintains context from interrupted work
🐱 The investigation reveals that while GPT-5-Codex shows benchmark improvements, practical developer experience has declined: a reminder that metrics don't always reflect real-world utility!
@@ -18,6 +18,7 @@ Grammar (subset):
   term := unary (('*'|'/') unary)*
   unary := '-' unary | factor
   factor := INT | STRING | IDENT call_tail* | '(' expr ')' | 'new' IDENT '(' args? ')'
+            | '{' map_entries? '}' # map literal (string keys only)
   call_tail:= '.' IDENT '(' args? ')' # method
             | '(' args? ')' # function call
   args := expr (',' expr)*
@@ -45,7 +46,7 @@ def lex(s: str):
         # two-char ops
         if s.startswith('==', i) or s.startswith('!=', i) or s.startswith('<=', i) or s.startswith('>=', i) or s.startswith('&&', i) or s.startswith('||', i):
             out.append(Tok('OP2', s[i:i+2], i)); i+=2; continue
-        if c in '+-*/(){}.,<>=':
+        if c in '+-*/(){}.,<>=[]:':
             out.append(Tok(c, c, i)); i+=1; continue
         if c=='"':
             j=i+1; buf=[]
@@ -144,6 +145,31 @@ class P:
         if self.eat('STR'): return {"type":"Str","value":tok.val}
         if self.eat('('):
             e=self.expr(); self.expect(')'); return e
+        # Array literal: [e1, e2, ...] → Call{name:"array.of", args:[...]}
+        if self.eat('['):
+            args=[]
+            if self.cur().kind != ']':
+                args.append(self.expr())
+                while self.eat(','):
+                    args.append(self.expr())
+            self.expect(']')
+            return {"type":"Call","name":"array.of","args":args}
+        # Map literal: {"k": v, ...} (string keys only) → Call{name:"map.of", args:[Str(k1), v1, ...]}
+        if self.eat('{'):
+            args=[]
+            if self.cur().kind != '}':
+                # first entry
+                k=self.cur(); self.expect('STR');
+                self.expect(':'); v=self.expr();
+                args.append({"type":"Str","value":k.val}); args.append(v)
+                while self.eat(','):
+                    if self.cur().kind == '}':
+                        break
+                    k=self.cur(); self.expect('STR');
+                    self.expect(':'); v=self.expr();
+                    args.append({"type":"Str","value":k.val}); args.append(v)
+            self.expect('}')
+            return {"type":"Call","name":"map.of","args":args}
         if self.eat('NEW'):
             t=self.cur(); self.expect('IDENT'); self.expect('(')
             args=self.args_opt(); self.expect(')')
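To see the effect of the change end to end, here is a minimal, self-contained sketch of the desugaring the diff introduces: array literals become `array.of` calls and map literals become `map.of` calls with alternating `Str` keys and values. It is not the repository's parser. The `Tok` fields, the `"Int"` node type, the `parse` helper, and the stripped-down lexer (integers, unescaped strings, and single-character tokens only) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    kind: str
    val: object
    pos: int


def lex(s: str):
    """Tokenize integers, double-quoted strings, and single-character punctuation."""
    out, i = [], 0
    while i < len(s):
        c = s[i]
        if c.isspace():
            i += 1
            continue
        if c in '+-*/(){}.,<>=[]:':  # widened character set from the diff
            out.append(Tok(c, c, i))
            i += 1
            continue
        if c.isdigit():
            j = i
            while j < len(s) and s[j].isdigit():
                j += 1
            out.append(Tok('INT', int(s[i:j]), i))
            i = j
            continue
        if c == '"':
            j = s.find('"', i + 1)  # assumption: no escape sequences
            if j < 0:
                raise SyntaxError(f"unterminated string at {i}")
            out.append(Tok('STR', s[i + 1:j], i))
            i = j + 1
            continue
        raise SyntaxError(f"unexpected character {c!r} at {i}")
    out.append(Tok('EOF', None, len(s)))
    return out


class P:
    def __init__(self, toks):
        self.toks, self.i = toks, 0

    def cur(self):
        return self.toks[self.i]

    def eat(self, kind):
        if self.cur().kind == kind:
            self.i += 1
            return True
        return False

    def expect(self, kind):
        tok = self.cur()
        if not self.eat(kind):
            raise SyntaxError(f"expected {kind} at {tok.pos}, got {tok.kind}")
        return tok

    def expr(self):
        tok = self.cur()
        if self.eat('INT'):
            return {"type": "Int", "value": tok.val}  # hypothetical node type
        if self.eat('STR'):
            return {"type": "Str", "value": tok.val}
        if self.eat('['):  # [e1, e2, ...] -> Call array.of
            args = []
            if self.cur().kind != ']':
                args.append(self.expr())
                while self.eat(','):
                    args.append(self.expr())
            self.expect(']')
            return {"type": "Call", "name": "array.of", "args": args}
        if self.eat('{'):  # {"k": v, ...} -> Call map.of, trailing comma allowed
            args = []
            while self.cur().kind != '}':
                key = self.expect('STR')
                self.expect(':')
                args += [{"type": "Str", "value": key.val}, self.expr()]
                if not self.eat(','):
                    break
            self.expect('}')
            return {"type": "Call", "name": "map.of", "args": args}
        raise SyntaxError(f"unexpected token {tok.kind} at {tok.pos}")


def parse(src: str):
    p = P(lex(src))
    node = p.expr()
    p.expect('EOF')
    return node


if __name__ == "__main__":
    print(parse('{"xs": [1, 2], "name": "n"}'))
```

Lowering both literal forms to ordinary `Call` nodes means later stages only need to know the `array.of` and `map.of` functions, with no new AST node kinds.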