## Summary
Investigated OpenAI's new GPT-5-Codex model and Codex GitHub PR review integration capabilities.
## GPT-5-Codex Analysis
### Benchmark Performance (Good)
- SWE-bench Verified: 74.5% (vs GPT-5's 72.8%)
- Refactoring tasks: 51.3% (vs GPT-5's 33.9%)
- Code review: Higher developer ratings
### Real-World Issues (Concerning)
- Users report degraded coding performance
- Scripts that previously worked now fail
- Less consistent than GPT-4.5
- Longer response times (minutes vs instant)
- "Creatively and emotionally flat"
- Basic errors (e.g., counting letters incorrectly)
### Key Finding
Classic case of "optimizing for benchmarks vs real usability" - scores well on tests but performs poorly in practice.
## Codex GitHub PR Integration
### Setup Process
1. Enable MFA and connect GitHub account
2. Authorize Codex GitHub app for repos
3. Enable "Code review" in repository settings
### Usage Methods
- **Manual**: Comment '@codex review' in PR
- **Automatic**: Triggers when PR moves from draft to ready
### Current Limitations
- One-way communication (doesn't respond to review comments)
- Prefers creating new PRs over updating existing ones
- Better for single-pass reviews than iterative feedback
## 'codex resume' Feature
New session management capability:
- Resume previous codex exec sessions
- Useful for continuing long tasks across days
- Maintains context from interrupted work
🐱 The investigation reveals that while GPT-5-Codex shows benchmark improvements, practical developer experience has declined - a reminder that metrics don't always reflect real-world utility\!
- Add plugin host initialization for LLVM mode (fixes method_id injection)
- Add FileBox method_id injection test
- Enhance MIR14 stability test
- Add warning for Mock LLVM implementation
- Create build scripts for JIT/LLVM with proper settings (24 threads, timeout)
- Update CLAUDE.md with build instructions and FileBox test results
Note: FileBox file I/O still not working in LLVM execution (separate issue)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Keep essential information within 500 lines (now 395 lines)
- Maintain important syntax examples and development principles
- Move detailed information to appropriate docs files:
- Development practices → docs/guides/development-practices.md
- Testing guide → docs/guides/testing-guide.md
- Claude issues → docs/tools/claude-issues.md
- Add proper links to all referenced documentation
- Balance between minimal entry point and practical usability
- Add MIR26 doc≡code sync test (tests/mir_instruction_set_sync.rs)
- Quiet snapshots; filter plugin/net logs; golden all green
- Delegate VM phi selection to LoopExecutor (borrow-safe)
- ResultBox migration: remove legacy box_trait::ResultBox paths
- VM BoxRef arithmetic fallbacks via toString().parse::<i64>()
- Bridge BoxCall(InstanceBox) to Class.method/arity in VM
- Fix Effects purity (READ -> readonly, not pure)
- Mark Catch as CONTROL to prevent DCE; Try/Catch test green
- Add env-gated debug logs (effects, verifier, mir-printer, trycatch, ref, bin)
- Update CURRENT_TASK with progress and next steps
- Add e2e_vm_http_status_404/500 tests to verify HTTP status handling
- ResultBox properly returns Ok(Response) for HTTP errors, Err for connection failures
- Create dynamic-plugin-flow.md documenting MIR→VM→Registry→Plugin flow
- Add vm-stats test files for HTTP 404/500 status codes
- Update net-plugin.md with HTTP error handling clarification
- Create E2E_TESTS.md documenting all E2E test behaviors
- Add mir-26-instruction-diet.md for MIR optimization plans
- Add vm-stats-cookbook.md for VM statistics usage guide
- Update MIR verifier to properly track self-assignment patterns
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix MIR builder to correctly return phi values in if-else statements
- Add extract_assigned_var helper to handle Program blocks
- Update variable mapping after phi node creation
- Fixes 'Value %14 not set' error in e2e_vm_http_client_error_result
- Fix HTTP plugin error handling for connection failures
- Return error string instead of handle on tcp_ok=false
- Enable proper ResultBox(Err) creation for failed connections
- Now r.isOk() correctly returns false for connection errors
- Add VM fallback for toString() on any Box type
- Ensures error messages can be displayed even without specific implementation
- All e2e_vm_http_client_error_result tests now pass
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
✅ All E2E tests passing:
- e2e_vm_plugin_filebox_copy_from_handle: Handle arguments (tag=8) working
- e2e_vm_http_get_basic: GET with ResultBox.get_value() working
- e2e_vm_http_post_and_headers: POST with status/headers/body verified (201:V:R)
Key achievement:
- VM already had ResultBox methods (is_ok/get_value/get_error) implemented
- HTTP plugin methods now correctly return values through ResultBox
- Full VM×Plugin integration verified
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add VM instruction statistics (--vm-stats, --vm-stats-json)
- Fix missing fields in ast.rs test code (public_fields, private_fields)
- Add CliConfig fields for VM statistics
- Enable TLV debug logging in plugin_loader_v2
- Successfully test FileBox handle passing and HTTP plugin creation
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major changes:
- Add --vm-stats and --vm-stats-json CLI flags for VM instruction profiling
- Implement instruction counting by opcode type with JSON output support
- Add enhanced TLV debug logging with NYASH_DEBUG_PLUGIN=1 environment variable
- Fix missing fields in CliConfig and ASTNode::BoxDeclaration for test compatibility
- Improve plugin method call error messages with argument count/type details
This enables MIR→VM conversion health checks and supports the Phase 8.6 VM optimization goals.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add VM→Plugin and Plugin→VM boundary logging for method calls
- Enhanced debug output for tracking plugin invocation flow
- Added return value type/instance logging for better debugging
- Improved plugin communication traceability
This helps debug issues like the recent 45-day HTTP response debugging saga.
All logging is controlled by NYASH_DEBUG_PLUGIN=1 environment variable.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed the method ID order in HttpRequestBox configuration to match plugin implementation:
- path: method_id 1 (was incorrectly 2)
- readBody: method_id 2 (was incorrectly 3)
- respond: method_id 3 (was incorrectly 1)
This resolves the 45-day debugging issue where req.respond(resp) was calling
the wrong plugin method, causing HTTP responses to have empty bodies.
All E2E tests now pass:
- e2e_http_stub_end_to_end ✅
- e2e_http_multiple_requests_order ✅
- e2e_http_post_and_headers ✅
- e2e_http_server_restart ✅
- e2e_http_server_shutdown_and_restart ✅🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add returns_result = true to HttpServerBox methods (start, stop, accept)
- Update all E2E tests to use .get_value() for Result handling
- Prepare for gradual Result-based error handling migration
This implements the first phase of ChatGPT5's Result正規化 design,
starting with network-related methods as agreed.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add returns_result.md documenting Result正規化 (returns_result = true)
- Add when-pattern-matching.md with future pattern matching syntax design
- Update E2E tests to use get_value() for HTTP responses
- Update plugin system README with B案 (Result-based) support
- Remove singleton from HttpServerBox and SocketServerBox for stability
This prepares for gradual migration to Result-based error handling,
starting with Net plugin methods, as agreed with ChatGPT5's design.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add e2e_plugin_net_additional.rs with parallel server tests
- Fix test to properly handle request objects (no double accept)
- Add comprehensive net-plugin documentation
- Implement debug tracing for method calls
- Enhance plugin lifecycle documentation
- Improve error handling in plugin loader
- Add leak tracking infrastructure (for future use)
- Update language spec with latest plugin features
This enhances test coverage for concurrent HTTP servers and improves
the overall plugin system documentation and debugging capabilities.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fix path parsing for absolute URLs (http://...) in net plugin
- Add Integer(tag=5) i64 compatibility for port arguments
- Implement proper accept() wait loop (up to 5 seconds)
- Add start_seq for reliable server routing
- Remove singleton pattern from HTTP/Socket servers
- Add Socket E2E tests with timeout support
- Enable NYASH_NET_LOG environment variable for debugging
All 5 HTTP E2E tests now pass consistently\!
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add e2e_http_post_and_headers: Full POST request with headers test
- Verify client POST with body data ('DATA')
- Server reads request body and responds with custom status (201)
- Custom headers (X-Test: V) properly set and retrieved
- Complete request/response cycle validation: '201:V:R' ✅
- All 4 HTTP plugin tests passing
HTTP POSTとヘッダー操作のE2Eテスト追加
- POSTリクエストのボディ送受信確認
- カスタムステータスコード(201 Created)
- HTTPヘッダーの設定と取得
- 完全なHTTPプロトコル機能の検証
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Add e2e_http_server_restart: Verify server can stop/start and handle requests (returns 'B')
- Add e2e_http_server_shutdown_and_restart: Verify plugin shutdown/reinit works (returns 'Y')
- Fix active server tracking with ACTIVE_SERVER_ID for proper request routing
- All 3 HTTP plugin E2E tests now passing ✅
HTTPサーバー再起動とシャットダウンテスト追加
- サーバーのstop/start再起動が正常動作
- プラグインshutdown後の再初期化も確認
- アクティブサーバー管理で適切なリクエストルーティング実現
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Implement full HTTP flow test: server → client → accept → respond → readBody
- All HTTP Box types working correctly (HttpServerBox, HttpClientBox, HttpRequestBox, HttpResponseBox)
- Handle type encoding for plugin method arguments working properly
- Test validates complete HTTP request/response cycle
- Net plugin E2E test passing ✅
HTTPネットワークプラグインのE2Eテスト追加
- サーバー起動からレスポンス読み取りまでの完全なフロー検証
- Handle型引数のTLVエンコーディングも正常動作
- 非同期HTTPフローの完全動作確認
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Add singleton support for plugin boxes (e.g., CounterBox)
- Implement shutdown_plugins_v2() for controlled plugin lifecycle
- Plugin instances now shared across multiple new() calls
- Shutdown properly releases and allows re-initialization
- All singleton E2E tests passing ✅
ChatGPT5による高度なプラグインライフサイクル管理実装
- シングルトンパターンでプラグインインスタンス共有
- 明示的なshutdownでリソース解放と再初期化対応
- Nyashの統一ライフサイクルポリシー維持
Note: ast.rs test failures are due to rapid development pace -
tests need updating for new BoxDeclaration fields (private_fields, public_fields)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Add reserved name guard test to prevent non-builtin factories from hijacking builtin names
- Add Handle TLV encoding/decoding test for FileBox copyFrom method
- Add CounterBox plugin tests for inc/get operations and clone/share behavior
- All unified registry E2E tests passing ✅
統一レジストリシステムの包括的なE2Eテスト追加
- ビルトイン名保護テスト
- Handle型TLVエンコーディングテスト
- CounterBoxプラグインテスト
- 全テスト成功確認
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
- Added e2e_plugin_interop test: Verifies plugin Box (EchoBox) can be stored in and retrieved from MapBox
- Added vm_e2e test: Validates VM backend execution with plugin boxes
- Both tests pass successfully, confirming unified registry works across different execution backends
- Interop test demonstrates seamless integration between plugin and builtin boxes
Test results:
- e2e_interop_mapbox_store_plugin_box: ✅ (returns "ok")
- vm_e2e_adder_box: ✅ (VM execution successful)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Added is_builtin_factory() method to BoxFactory trait
- Added register_box_factory() method to NyashInterpreter for dynamic factory registration
- Added is_valid_type() and has_type() methods for unified type checking
- Fixed e2e tests by removing 'return' statements (Nyash doesn't require them at top level)
- Created TestPluginFactory with EchoBox and AdderBox for testing
- All e2e tests now passing: EchoBox returns "hi", AdderBox returns "42"
This commit finalizes the unified registry system with full support for
dynamic factory registration, enabling plugins and custom Box types to be
seamlessly integrated into the interpreter at runtime.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Updated mir_phase7_async_ops.rs to use NewBox/BoxCall instead of FutureNew/Await
- Updated mir_phase6_vm_ref_ops.rs to use new instruction format
- Moved deprecated test wasm_poc2_box_operations.rs to archive
Part of Phase 5-4 completion
- Archive old documentation and test files to `docs/archive/` and `local_tests/`.
- Remove various temporary and old files from the project root.
- Add `nekocode-rust` analysis tool and its output files (`nekocode/`, `.nekocode_sessions/`, `analysis.json`).
- Minor updates to `apps/chip8_nyash/chip8_emulator.nyash` and `local_tests` files.
This commit cleans up the repository and sets the stage for further code modularization efforts, particularly in the `src/interpreter` and `src/parser` modules, based on recent analysis.