🔍 Research: GPT-5-Codex capabilities and GitHub PR integration

## Summary

Investigated OpenAI's new GPT-5-Codex model and Codex GitHub PR review integration capabilities.

## GPT-5-Codex Analysis

### Benchmark Performance (Good)

- SWE-bench Verified: 74.5% (vs GPT-5's 72.8%)
- Refactoring tasks: 51.3% (vs GPT-5's 33.9%)
- Code review: Higher developer ratings

### Real-World Issues (Concerning)

- Users report degraded coding performance
- Scripts that previously worked now fail
- Less consistent than GPT-4.5
- Longer response times (minutes vs instant)
- "Creatively and emotionally flat"
- Basic errors (e.g., counting letters incorrectly)

### Key Finding

Classic case of "optimizing for benchmarks vs real usability": scores well on tests but performs poorly in practice.

## Codex GitHub PR Integration

### Setup Process

1. Enable MFA and connect GitHub account
2. Authorize Codex GitHub app for repos
3. Enable "Code review" in repository settings

### Usage Methods

- **Manual**: Comment '@codex review' in PR
- **Automatic**: Triggers when PR moves from draft to ready

### Current Limitations

- One-way communication (doesn't respond to review comments)
- Prefers creating new PRs over updating existing ones
- Better for single-pass reviews than iterative feedback

## 'codex resume' Feature

New session management capability:

- Resume previous codex exec sessions
- Useful for continuing long tasks across days
- Maintains context from interrupted work

🐱 The investigation reveals that while GPT-5-Codex shows benchmark improvements, practical developer experience has declined, a reminder that metrics don't always reflect real-world utility!
2025-09-16 16:28:25 +09:00
parent 47f4ca0e44
commit 63c8fda808
41 changed files with 854 additions and 146 deletions
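
The `tools/ny_parser_mvp.py` hunks in this commit lower array and map literals to ordinary call nodes (`array.of` and `map.of`) instead of introducing new AST node types. The following is a minimal stand-alone sketch of that lowering, not the code from the commit: the `tokenize`, `LiteralParser`, and `parse_literal` names are hypothetical, the tuple-based tokens differ from the parser's `Tok` objects, and the `{"type":"Int",...}` node shape is an assumption (only the `Str` and `Call` shapes appear in the diff). It only handles integer, string, array, and map literals.

```python
import re

# Hypothetical tokenizer: integers, double-quoted strings (no escapes),
# and the punctuation needed for array/map literals.
TOKEN_RE = re.compile(r'\s*(?:(\d+)|"([^"]*)"|([\[\]{}:,]))')


def tokenize(src):
    """Return a list of (kind, value) pairs terminated by ("EOF", None)."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(1) is not None:
            tokens.append(("INT", int(m.group(1))))
        elif m.group(2) is not None:
            tokens.append(("STR", m.group(2)))
        else:
            tokens.append((m.group(3), m.group(3)))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens


class LiteralParser:
    """Recursive-descent parser for the literal subset only (illustrative)."""

    def __init__(self, src):
        self.toks = tokenize(src)
        self.i = 0

    def cur(self):
        return self.toks[self.i]

    def eat(self, kind):
        # Consume the current token only if it has the requested kind.
        if self.cur()[0] == kind:
            self.i += 1
            return True
        return False

    def expect(self, kind):
        tok = self.cur()
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok[0]}")
        self.i += 1
        return tok

    def expr(self):
        kind, val = self.cur()
        if kind == "INT":
            self.i += 1
            return {"type": "Int", "value": val}  # assumed node shape
        if kind == "STR":
            self.i += 1
            return {"type": "Str", "value": val}
        if self.eat("["):
            # [e1, e2, ...] becomes Call array.of(e1, e2, ...)
            args = []
            if self.cur()[0] != "]":
                args.append(self.expr())
                while self.eat(","):
                    args.append(self.expr())
            self.expect("]")
            return {"type": "Call", "name": "array.of", "args": args}
        if self.eat("{"):
            # {"k": v, ...} becomes Call map.of(Str(k), v, ...);
            # keys must be string literals, a trailing comma is accepted.
            args = []
            while self.cur()[0] != "}":
                key = self.expect("STR")
                self.expect(":")
                args.append({"type": "Str", "value": key[1]})
                args.append(self.expr())
                if not self.eat(","):
                    break
            self.expect("}")
            return {"type": "Call", "name": "map.of", "args": args}
        raise SyntaxError(f"unexpected token {kind}")


def parse_literal(src):
    """Parse one literal and require that the whole input was consumed."""
    parser = LiteralParser(src)
    node = parser.expr()
    parser.expect("EOF")
    return node
```

For example, `parse_literal('{"a": [1, 2]}')` yields a `map.of` call whose arguments alternate between `Str` keys and value nodes, with the nested array appearing as an `array.of` call. Flattening keys and values into one argument list keeps the AST schema unchanged, at the cost of requiring the callee to pair arguments up again.
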
--- a/tools/ny_parser_mvp.py
+++ b/tools/ny_parser_mvp.py
@@ -18,6 +18,7 @@ Grammar (subset):
  term     := unary (('*'|'/') unary)*
  unary    := '-' unary | factor
  factor   := INT | STRING | IDENT call_tail* | '(' expr ')' | 'new' IDENT '(' args? ')'
+            | '{' map_entries? '}'               # map literal (string keys only)
  call_tail:= '.' IDENT '(' args? ')'   # method
            | '(' args? ')'             # function call
  args     := expr (',' expr)*
@@ -45,7 +46,7 @@ def lex(s: str):
        # two-char ops
        if s.startswith('==', i) or s.startswith('!=', i) or s.startswith('<=', i) or s.startswith('>=', i) or s.startswith('&&', i) or s.startswith('||', i):
            out.append(Tok('OP2', s[i:i+2], i)); i+=2; continue
-        if c in '+-*/(){}.,<>=':
+        if c in '+-*/(){}.,<>=[]:':
            out.append(Tok(c, c, i)); i+=1; continue
        if c=='"':
            j=i+1; buf=[]
@@ -144,6 +145,31 @@ class P:
        if self.eat('STR'): return {"type":"Str","value":tok.val}
        if self.eat('('):
            e=self.expr(); self.expect(')'); return e
+        # Array literal: [e1, e2, ...] → Call{name:"array.of", args:[...]}
+        if self.eat('['):
+            args=[]
+            if self.cur().kind != ']':
+                args.append(self.expr())
+                while self.eat(','):
+                    args.append(self.expr())
+            self.expect(']')
+            return {"type":"Call","name":"array.of","args":args}
+        # Map literal: {"k": v, ...} (string keys only) → Call{name:"map.of", args:[Str(k1), v1, ...]}
+        if self.eat('{'):
+            args=[]
+            if self.cur().kind != '}':
+                # first entry
+                k=self.cur(); self.expect('STR');
+                self.expect(':'); v=self.expr();
+                args.append({"type":"Str","value":k.val}); args.append(v)
+                while self.eat(','):
+                    if self.cur().kind == '}':
+                        break
+                    k=self.cur(); self.expect('STR');
+                    self.expect(':'); v=self.expr();
+                    args.append({"type":"Str","value":k.val}); args.append(v)
+            self.expect('}')
+            return {"type":"Call","name":"map.of","args":args}
        if self.eat('NEW'):
            t=self.cur(); self.expect('IDENT'); self.expect('(')
            args=self.args_opt(); self.expect(')')