252 lines
9.8 KiB
Markdown
252 lines
9.8 KiB
Markdown
# Codex (GPT-5)の技術分析:Python→Nyashトランスパイルの実装詳細
|
||
|
||
2025-08-30 - OpenAI Codex v0.25.0による技術的深掘り
|
||
|
||
## Big Picture
|
||
|
||
- **Purpose**: Generate Nyash source from Python to leverage Nyash's optimizer, keep debugging simple, and "think in Nyash" early without locking into MIR details.
|
||
- **Core idea**: Normalize Python to a small CorePy IR, then lower to idiomatic Nyash with a thin "Py runtime shim" in Nyash that preserves Python's dynamic semantics where required.
|
||
|
||
## AST Conversion
|
||
|
||
### Two-stage lowering
|
||
```
|
||
Python AST → CorePy IR → Nyash AST
|
||
```
|
||
|
||
CorePy should be expression-friendly, desugared, and semantics-explicit (e.g., with→try/finally, for→iterator loop, boolean ops→if/else).
|
||
|
||
### Visitor + environment
|
||
Implement a node visitor that carries scope info (locals/free/globals), evaluation order, and source spans. Use Python's `symtable` to seed symbol kinds; validate against your own pass.
|
||
|
||
### Semantic shims
|
||
Lower Python ops to Nyash intrinsics that preserve semantics:
|
||
- `py_truthy(x)`
|
||
- `py_getattr(o,"a")`
|
||
- `py_setattr(o,"a",v)`
|
||
- `py_binop("add", a, b)`
|
||
- `py_cmp("lt", a, b)`
|
||
- `py_iter(x)`
|
||
- `py_call(f, args, kwargs)`
|
||
- `py_slice(x, i, j, k)`
|
||
|
||
### Boxes and cells
|
||
Model Python variables and closures with Box/Cell objects. Rule of thumb: locals are unboxed unless captured or aliased; promote to Box when needed. Everything-is-Box in Nyash aligns well with Python's mutability/aliasing.
|
||
|
||
### Control flow
|
||
Normalize to a small set: `if`, `block`, `loop`, `break`, `continue`, `try/catch/finally`, `throw`. Lower `and`/`or` with short-circuit temp; turn comprehensions into explicit loops with dedicated inner scope.
|
||
|
||
## Transpile Quality
|
||
|
||
### Readability vs optimize
|
||
Offer modes. Default emits idiomatic Nyash constructs and meaningful identifiers, comments with source spans, and simple temporaries. "Optimize" mode switches to `py_*` intrinsics fusion and fewer temps.
|
||
|
||
### Idiomatic Nyash
|
||
Prefer Nyash control constructs over procedural labels. Use native blocks for `if/else`, `match` if Nyash has it; only fall back to runtime calls where semantics demand.
|
||
|
||
### Stable pretty-printer
|
||
Round-trip friendly formatter with consistent whitespace, trailing comma rules, and deterministic temp naming (`_t1`, `_t2…`). Keep def/class emitted in declaration-order.
|
||
|
||
### Debug info
|
||
Attach `span(file, line, col)` to every IR node, carry through to Nyash AST, then emit a sidecar source map. Optionally embed lightweight `#line` directives or inline comments per statement in debug builds.
|
||
|
||
## Python Feature Mapping
|
||
|
||
### Default args
|
||
Evaluate at def-time; store tuple/dict on the function object. At call-time, fill missing with stored defaults. Beware mutable defaults: do not clone; reuse exact object.
|
||
|
||
### LEGB scoping
|
||
Build symbol table with flags for `global`/`nonlocal`. Emit closure "cells" (Boxes) for free vars; functions capture a vector of cells. Globals map to the module dict; builtins fallback when name miss in global.
|
||
|
||
### for/else, while/else
|
||
Introduce `broken=false`. On `break`, set and exit; after loop, `if !broken { else_block }`.
|
||
|
||
### Comprehensions
|
||
Create inner function/scope per comprehension (Python 3 semantics). Assignment targets exist only in that scope. Preserve evaluation order and late binding behavior.
|
||
|
||
### With statement
|
||
Desugar to try/finally per Python spec: evaluate context expr, call `__enter__`, bind target, run body, always call `__exit__`, and suppress exception only if `__exit__` returns truthy.
|
||
|
||
### Decorators
|
||
Evaluate bottom-up at def-time: `fn = decoN(...(deco1(fn)))` then assign back. Keep evaluation order of decorator expressions.
|
||
|
||
### Generators
|
||
Lower to a state machine object implementing Nyash's iterator protocol, with saved instruction pointer, stack slots, and exception injection (`throw`, `close`). Support `yield from` by delegation trampoline.
|
||
|
||
### Pattern matching (PEP 634)
|
||
If supported by Nyash, map directly; else lower to nested guards and extractor calls in a `py_match` helper library.
|
||
|
||
### Data model
|
||
Attribute access honors descriptors; method binding creates bound objects; arithmetic and comparisons dispatch to `__op__`/`__rop__` and rich comparisons; truthiness via `__bool__`/`__len__`.
|
||
|
||
## Performance Opportunities
|
||
|
||
### At transpile-time
|
||
- Constant fold literals, f-strings (format plan precomputation), simple arithmetic if types are literal.
|
||
- Invariant hoisting for loop-invariant comprehensions and attribute chains when no side-effects (guarded).
|
||
- Direct calls to Nyash intrinsics for selected builtins (`len`, `isinstance`, `range`) only if not shadowed (prove via symbol table).
|
||
- Peephole: collapse nested `py_truthy(py_truthy(x))`, merge adjacent `if` with literal conditions, drop dead temporaries.
|
||
|
||
### Defer to Nyash compiler
|
||
- Inlining across Nyash functions, register allocation, loop unrolling, vectorization, constant propagation at MIR level.
|
||
- DCE/CSE once `py_*` helpers are inlined or annotated as pure/idempotent where legal.
|
||
|
||
### Types as hints
|
||
- Consume Python annotations/`typing` to emit specialized fast paths: e.g., `int` → direct Nyash integer ops, else fallback to `py_binop`. Plumb types through IR as optional metadata for MIR to exploit.
|
||
- Profile-guided guards: optional mode emits guards around hot calls to enable Nyash JIT/AOT to speculate and deopt to generic `py_*`.
|
||
|
||
## Error Handling & Debug
|
||
|
||
### Source maps
|
||
Emit a compact mapping (e.g., VLQ JSON) from Nyash line/col → Python original; include segment mappings per expression for precise stepping.
|
||
|
||
### Exception rewriting
|
||
Wrap Nyash runtime entrypoints to translate stack frames via the source map, hiding frames from helpers (`py_*`) unless verbose mode is on.
|
||
|
||
### Stage diagnostics
|
||
- CorePy dump: toggle to print normalized IR with spans.
|
||
- Nyash preview: post-format preview with original Python line hints.
|
||
- Trace toggles: selective tracing of `py_call`, `py_getattr`, iteration; throttle to avoid noise.
|
||
|
||
### Friendly messages
|
||
On unsupported nodes or ambiguous semantics, show minimal repro, Python snippet, and link to a doc page. Include symbol table excerpt when scoping fails.
|
||
|
||
## Architecture & DX
|
||
|
||
### Pass pipeline
|
||
```
|
||
Parse Python AST → Symbol table → Normalize to CorePy →
|
||
Scope/closure analysis → Type/meta attach → Lower to Nyash AST →
|
||
Optimize (peephole/simplify) → Pretty-print + source map
|
||
```
|
||
|
||
### Runtime shim (`nyash/lib/py_runtime.ny`)
|
||
Core APIs:
|
||
- `py_call(f, pos, kw, star, dstar)`
|
||
- `py_truthy(x)`
|
||
- `py_getattr/py_setattr`
|
||
- `py_binop(op, a, b)`
|
||
- `py_cmp(op,a,b)`
|
||
- `py_iter(x)`
|
||
- `py_next(it)`
|
||
- `py_slice(x,i,j,k)`
|
||
- `py_with(mgr, body_fn)`
|
||
- `py_raise`
|
||
- `py_is`
|
||
- `py_eq`
|
||
|
||
Data model support: descriptor get/set, method binding, MRO lookup, exception hierarchy, StopIteration protocol.
|
||
|
||
Perf annotations: mark pure or inline candidates; keep stable ABI.
|
||
|
||
### CLI/flags
|
||
Modes:
|
||
- `--readable`
|
||
- `--optimized`
|
||
- `--debug`
|
||
- `--emit-sourcemap`
|
||
- `--dump-corepy`
|
||
- `--strict-builtins`
|
||
|
||
Caching: hash of Python AST + flags to cache Nyash output, source map, and diagnostics.
|
||
|
||
Watch/incremental: re-transpile changed modules, preserve source map continuity.
|
||
|
||
### Tests
|
||
- Golden tests: Python snippet → Nyash output diff, with normalization.
|
||
- Differential: run under CPython vs Nyash runtime for functional parity on a corpus (unit/property tests).
|
||
- Conformance: edge cases (scoping, descriptors, generators, exceptions) and evaluation order tests.
|
||
|
||
## Pitfalls & Remedies
|
||
|
||
### Evaluation order
|
||
Python's left-to-right arg eval, starred/unpacking, and kw conflict checks. Enforce by sequencing temps precisely before `py_call`.
|
||
|
||
### Shadowing builtins/globals
|
||
Only specialize when proven not shadowed in any reachable scope. Provide `--strict-builtins` to disable specialization unless guaranteed.
|
||
|
||
### Identity vs equality
|
||
`is` is reference identity; avoid folding or substituting.
|
||
|
||
### Integer semantics
|
||
Python's bigints; ensure Nyash numeric layer matches or route to bigints in `py_*`.
|
||
|
||
## Future Extensibility
|
||
|
||
### Plugins
|
||
Pass manager with hooks (`before_lower`, `after_lower`, `on_node_<Type>`). Allow project-local rewrites and macros, with access to symbol/type info.
|
||
|
||
### Custom rules
|
||
DSL for pattern→rewrite with predicates (types, purity), e.g., rewrite `dataclass` patterns to Nyash records.
|
||
|
||
### Multi-language
|
||
Keeping the Nyash script as a stable contract invites other frontends (e.g., a subset of JS/TypeScript or Lua) to target Nyash; keep `py_*` separate from language-agnostic intrinsics to avoid contamination.
|
||
|
||
### Gradual migration
|
||
As Nyash grows Pythonic libraries, progressively replace `py_*` with native Nyash idioms; keep a compatibility layer for mixed projects.
|
||
|
||
## Concrete Translation Sketches
|
||
|
||
### Attribute
|
||
```python
|
||
a.b
|
||
```
|
||
→
|
||
```nyash
|
||
py_getattr(a, "b")
|
||
```
|
||
|
||
### Call
|
||
```python
|
||
f(x, y=1, *as, **kw)
|
||
```
|
||
→
|
||
```nyash
|
||
py_call(f, [x], {"y":1}, as, kw)
|
||
```
|
||
|
||
### If
|
||
```python
|
||
if a and b:
|
||
```
|
||
→
|
||
```nyash
|
||
let _t=py_truthy(a); if _t { if py_truthy(b) { ... } }
|
||
```
|
||
|
||
### For/else
|
||
```python
|
||
for x in xs:
|
||
if cond:
|
||
break
|
||
else:
|
||
else_block
|
||
```
|
||
→
|
||
```nyash
|
||
let _it = py_iter(xs);
|
||
let _broken=false;
|
||
loop {
|
||
let _n = py_next(_it) catch StopIteration { break };
|
||
x = _n;
|
||
...
|
||
if cond { _broken=true; break }
|
||
}
|
||
if !_broken { else_block }
|
||
```
|
||
|
||
### With
|
||
Evaluate mgr, call `__enter__`, bind val; try body; on exception, call `__exit__(type,e,tb)` and suppress if returns true; finally call `__exit__(None,None,None)` when no exception.
|
||
|
||
### Decorators
|
||
```nyash
|
||
let f = <def>;
|
||
f = decoN(...(deco1(f)));
|
||
name = f
|
||
```
|
||
|
||
## Why Nyash Script First
|
||
|
||
- **Debuggability**: Human-readable Nyash is easier to inspect, diff, and map errors to; source maps stay small and precise.
|
||
- **Optimization leverage**: Nyash compiler/MIR can keep improving independently; your Python frontend benefits automatically.
|
||
- **Ecosystem fit**: Generates idiomatic Nyash that other tools (formatters, linters, analyzers) can understand; fosters a consistent DX. |