🔍 Research: GPT-5-Codex capabilities and GitHub PR integration

## Summary
Investigated OpenAI's new GPT-5-Codex model and Codex GitHub PR review integration capabilities.

## GPT-5-Codex Analysis

### Benchmark Performance (Good)
- SWE-bench Verified: 74.5% (vs GPT-5's 72.8%)
- Refactoring tasks: 51.3% (vs GPT-5's 33.9%)
- Code review: Higher developer ratings

### Real-World Issues (Concerning)
- Users report degraded coding performance
- Scripts that previously worked now fail
- Less consistent than GPT-4.5
- Longer response times (minutes vs instant)
- "Creatively and emotionally flat"
- Basic errors (e.g., counting letters incorrectly)

### Key Finding
A classic case of optimizing for benchmarks over real-world usability: the model scores well on tests, but users report that it performs poorly in practice.

## Codex GitHub PR Integration

### Setup Process
1. Enable MFA and connect GitHub account
2. Authorize Codex GitHub app for repos
3. Enable "Code review" in repository settings

### Usage Methods
- **Manual**: Comment '@codex review' in PR
- **Automatic**: Triggers when PR moves from draft to ready
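
The two triggers above can be sketched as a small predicate over GitHub webhook events. This is only an illustration of the rules as described here, not how Codex itself is implemented. The function name `should_trigger_review` and the constant `REVIEW_COMMAND` are hypothetical. The event names and payload fields follow GitHub's `issue_comment` and `pull_request` webhook shapes.

```python
from typing import Any, Mapping

# Hypothetical constant: the manual trigger phrase quoted in the notes above.
REVIEW_COMMAND = "@codex review"


def should_trigger_review(event_name: str, payload: Mapping[str, Any]) -> bool:
    """Return True if a webhook event matches one of the two described triggers."""
    # Manual: a newly created comment on a pull request that contains the command.
    # GitHub delivers PR comments as issue_comment events; the issue object
    # carries a "pull_request" key only when the issue is a pull request.
    if event_name == "issue_comment":
        if payload.get("action") != "created":
            return False
        issue = payload.get("issue", {})
        body = payload.get("comment", {}).get("body", "")
        return "pull_request" in issue and REVIEW_COMMAND in body.lower()

    # Automatic: the pull request moves from draft to ready for review.
    if event_name == "pull_request":
        return payload.get("action") == "ready_for_review"

    return False
```

A comment on a plain issue, or a `pull_request` event with any other action such as `opened`, would not match either rule in this sketch.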

### Current Limitations
- One-way communication (doesn't respond to review comments)
- Prefers creating new PRs over updating existing ones
- Better for single-pass reviews than iterative feedback

## 'codex resume' Feature
New session management capability:
- Resume previous codex exec sessions
- Useful for continuing long tasks across days
- Maintains context from interrupted work

🐱 The investigation shows that while GPT-5-Codex improves on benchmarks, the reported developer experience has declined: a reminder that metrics don't always reflect real-world utility!
Selfhosting Dev
2025-09-16 16:28:25 +09:00
parent 47f4ca0e44
commit 63c8fda808
41 changed files with 854 additions and 146 deletions


@@ -110,35 +110,64 @@ class NyashLLVMBuilder:
# Create ny_main wrapper if necessary
has_ny_main = any(f.name == 'ny_main' for f in self.module.functions)
main_fn = None
# Prefer static box entry: Main.main/1; fallback to plain main (0-arity)
fn_main_box = None
fn_main_plain = None
for f in self.module.functions:
if f.name == 'main':
main_fn = f
break
if main_fn is not None:
# Hide the user main to avoid conflict with NyRT's main symbol
if f.name == 'Main.main/1':
fn_main_box = f
elif f.name == 'main':
fn_main_plain = f
target_fn = fn_main_box or fn_main_plain
if target_fn is not None and not has_ny_main:
# Hide the target to avoid symbol conflicts
try:
main_fn.linkage = 'private'
target_fn.linkage = 'private'
except Exception:
pass
if not has_ny_main:
# i32 ny_main() { return (i32) main(); }
ny_main_ty = ir.FunctionType(self.i32, [])
ny_main = ir.Function(self.module, ny_main_ty, name='ny_main')
entry = ny_main.append_basic_block('entry')
b = ir.IRBuilder(entry)
if len(main_fn.args) == 0:
rv = b.call(main_fn, [], name='call_user_main')
# i64 ny_main() { return (i64) Main.main(args) | main(); }
ny_main_ty = ir.FunctionType(self.i64, [])
ny_main = ir.Function(self.module, ny_main_ty, name='ny_main')
entry = ny_main.append_basic_block('entry')
b = ir.IRBuilder(entry)
if fn_main_box is not None:
# Build default args = new ArrayBox() via nyash.env.box.new_i64x
i64 = self.i64
i8 = self.i8
i8p = self.i8p
# Declare callee
callee = None
for f in self.module.functions:
if f.name == 'nyash.env.box.new_i64x':
callee = f
break
if callee is None:
callee = ir.Function(self.module, ir.FunctionType(i64, [i8p, i64, i64, i64, i64, i64]), name='nyash.env.box.new_i64x')
# Create "ArrayBox\0" global
sbytes = b"ArrayBox\0"
arr_ty = ir.ArrayType(i8, len(sbytes))
g = ir.GlobalVariable(self.module, arr_ty, name='.ny_main_arraybox')
g.linkage = 'private'
g.global_constant = True
g.initializer = ir.Constant(arr_ty, bytearray(sbytes))
c0 = ir.Constant(self.i32, 0)
ptr = b.gep(g, [c0, c0], inbounds=True)
zero = ir.Constant(i64, 0)
args_handle = b.call(callee, [ptr, zero, zero, zero, zero, zero], name='ny_main_args')
rv = b.call(fn_main_box, [args_handle], name='call_Main_main_1')
else:
# Plain main() fallback
if len(fn_main_plain.args) == 0:
rv = b.call(fn_main_plain, [], name='call_user_main')
else:
# If signature mismatches, return 0
rv = ir.Constant(self.i64, 0)
if hasattr(rv, 'type') and isinstance(rv.type, ir.IntType) and rv.type.width != 32:
rv32 = b.trunc(rv, self.i32) if rv.type.width > 32 else b.zext(rv, self.i32)
b.ret(rv32)
elif hasattr(rv, 'type') and isinstance(rv.type, ir.IntType) and rv.type.width == 32:
b.ret(rv)
else:
b.ret(ir.Constant(self.i32, 0))
if hasattr(rv, 'type') and isinstance(rv.type, ir.IntType) and rv.type.width != 64:
rv64 = b.trunc(rv, self.i64) if rv.type.width > 64 else b.zext(rv, self.i64)
b.ret(rv64)
elif hasattr(rv, 'type') and isinstance(rv.type, ir.IntType) and rv.type.width == 64:
b.ret(rv)
else:
b.ret(ir.Constant(self.i64, 0))
ir_text = str(self.module)
# Optional IR dump to file for debugging
@@ -159,7 +188,7 @@ class NyashLLVMBuilder:
def _create_dummy_main(self) -> str:
"""Create dummy ny_main that returns 0"""
ny_main_ty = ir.FunctionType(self.i32, [])
ny_main_ty = ir.FunctionType(self.i64, [])
ny_main = ir.Function(self.module, ny_main_ty, name="ny_main")
block = ny_main.append_basic_block(name="entry")
builder = ir.IRBuilder(block)