252 lines
9.8 KiB
Markdown
252 lines
9.8 KiB
Markdown
|
|
# Codex (GPT-5)の技術分析:Python→Nyashトランスパイルの実装詳細
|
|||
|
|
|
|||
|
|
2025-08-30 - OpenAI Codex v0.25.0による技術的深掘り
|
|||
|
|
|
|||
|
|
## Big Picture
|
|||
|
|
|
|||
|
|
- **Purpose**: Generate Nyash source from Python to leverage Nyash's optimizer, keep debugging simple, and "think in Nyash" early without locking into MIR details.
|
|||
|
|
- **Core idea**: Normalize Python to a small CorePy IR, then lower to idiomatic Nyash with a thin "Py runtime shim" in Nyash that preserves Python's dynamic semantics where required.
|
|||
|
|
|
|||
|
|
## AST Conversion
|
|||
|
|
|
|||
|
|
### Two-stage lowering
|
|||
|
|
```
|
|||
|
|
Python AST → CorePy IR → Nyash AST
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
CorePy should be expression-friendly, desugared, and semantics-explicit (e.g., with→try/finally, for→iterator loop, boolean ops→if/else).
|
|||
|
|
|
|||
|
|
### Visitor + environment
|
|||
|
|
Implement a node visitor that carries scope info (locals/free/globals), evaluation order, and source spans. Use Python's `symtable` to seed symbol kinds; validate against your own pass.
|
|||
|
|
|
|||
|
|
### Semantic shims
|
|||
|
|
Lower Python ops to Nyash intrinsics that preserve semantics:
|
|||
|
|
- `py_truthy(x)`
|
|||
|
|
- `py_getattr(o,"a")`
|
|||
|
|
- `py_setattr(o,"a",v)`
|
|||
|
|
- `py_binop("add", a, b)`
|
|||
|
|
- `py_cmp("lt", a, b)`
|
|||
|
|
- `py_iter(x)`
|
|||
|
|
- `py_call(f, args, kwargs)`
|
|||
|
|
- `py_slice(x, i, j, k)`
|
|||
|
|
|
|||
|
|
### Boxes and cells
|
|||
|
|
Model Python variables and closures with Box/Cell objects. Rule of thumb: locals are unboxed unless captured or aliased; promote to Box when needed. Everything-is-Box in Nyash aligns well with Python's mutability/aliasing.
|
|||
|
|
|
|||
|
|
### Control flow
|
|||
|
|
Normalize to a small set: `if`, `block`, `loop`, `break`, `continue`, `try/catch/finally`, `throw`. Lower `and`/`or` with short-circuit temp; turn comprehensions into explicit loops with dedicated inner scope.
|
|||
|
|
|
|||
|
|
## Transpile Quality
|
|||
|
|
|
|||
|
|
### Readability vs optimize
|
|||
|
|
Offer modes. Default emits idiomatic Nyash constructs and meaningful identifiers, comments with source spans, and simple temporaries. "Optimize" mode switches to `py_*` intrinsics fusion and fewer temps.
|
|||
|
|
|
|||
|
|
### Idiomatic Nyash
|
|||
|
|
Prefer Nyash control constructs over procedural labels. Use native blocks for `if/else`, `match` if Nyash has it; only fall back to runtime calls where semantics demand.
|
|||
|
|
|
|||
|
|
### Stable pretty-printer
|
|||
|
|
Round-trip friendly formatter with consistent whitespace, trailing comma rules, and deterministic temp naming (`_t1`, `_t2…`). Keep def/class emitted in declaration-order.
|
|||
|
|
|
|||
|
|
### Debug info
|
|||
|
|
Attach `span(file, line, col)` to every IR node, carry through to Nyash AST, then emit a sidecar source map. Optionally embed lightweight `#line` directives or inline comments per statement in debug builds.
|
|||
|
|
|
|||
|
|
## Python Feature Mapping
|
|||
|
|
|
|||
|
|
### Default args
|
|||
|
|
Evaluate at def-time; store tuple/dict on the function object. At call-time, fill missing with stored defaults. Beware mutable defaults: do not clone; reuse exact object.
|
|||
|
|
|
|||
|
|
### LEGB scoping
|
|||
|
|
Build symbol table with flags for `global`/`nonlocal`. Emit closure "cells" (Boxes) for free vars; functions capture a vector of cells. Globals map to the module dict; builtins fallback when name miss in global.
|
|||
|
|
|
|||
|
|
### for/else, while/else
|
|||
|
|
Introduce `broken=false`. On `break`, set and exit; after loop, `if !broken { else_block }`.
|
|||
|
|
|
|||
|
|
### Comprehensions
|
|||
|
|
Create inner function/scope per comprehension (Python 3 semantics). Assignment targets exist only in that scope. Preserve evaluation order and late binding behavior.
|
|||
|
|
|
|||
|
|
### With statement
|
|||
|
|
Desugar to try/finally per Python spec: evaluate context expr, call `__enter__`, bind target, run body, always call `__exit__`, and suppress exception only if `__exit__` returns truthy.
|
|||
|
|
|
|||
|
|
### Decorators
|
|||
|
|
Evaluate bottom-up at def-time: `fn = decoN(...(deco1(fn)))` then assign back. Keep evaluation order of decorator expressions.
|
|||
|
|
|
|||
|
|
### Generators
|
|||
|
|
Lower to a state machine object implementing Nyash's iterator protocol, with saved instruction pointer, stack slots, and exception injection (`throw`, `close`). Support `yield from` by delegation trampoline.
|
|||
|
|
|
|||
|
|
### Pattern matching (PEP 634)
|
|||
|
|
If supported by Nyash, map directly; else lower to nested guards and extractor calls in a `py_match` helper library.
|
|||
|
|
|
|||
|
|
### Data model
|
|||
|
|
Attribute access honors descriptors; method binding creates bound objects; arithmetic and comparisons dispatch to `__op__`/`__rop__` and rich comparisons; truthiness via `__bool__`/`__len__`.
|
|||
|
|
|
|||
|
|
## Performance Opportunities
|
|||
|
|
|
|||
|
|
### At transpile-time
|
|||
|
|
- Constant fold literals, f-strings (format plan precomputation), simple arithmetic if types are literal.
|
|||
|
|
- Invariant hoisting for loop-invariant comprehensions and attribute chains when no side-effects (guarded).
|
|||
|
|
- Direct calls to Nyash intrinsics for selected builtins (`len`, `isinstance`, `range`) only if not shadowed (prove via symbol table).
|
|||
|
|
- Peephole: collapse nested `py_truthy(py_truthy(x))`, merge adjacent `if` with literal conditions, drop dead temporaries.
|
|||
|
|
|
|||
|
|
### Defer to Nyash compiler
|
|||
|
|
- Inlining across Nyash functions, register allocation, loop unrolling, vectorization, constant propagation at MIR level.
|
|||
|
|
- DCE/CSE once `py_*` helpers are inlined or annotated as pure/idempotent where legal.
|
|||
|
|
|
|||
|
|
### Types as hints
|
|||
|
|
- Consume Python annotations/`typing` to emit specialized fast paths: e.g., `int` → direct Nyash integer ops, else fallback to `py_binop`. Plumb types through IR as optional metadata for MIR to exploit.
|
|||
|
|
- Profile-guided guards: optional mode emits guards around hot calls to enable Nyash JIT/AOT to speculate and deopt to generic `py_*`.
|
|||
|
|
|
|||
|
|
## Error Handling & Debug
|
|||
|
|
|
|||
|
|
### Source maps
|
|||
|
|
Emit a compact mapping (e.g., VLQ JSON) from Nyash line/col → Python original; include segment mappings per expression for precise stepping.
|
|||
|
|
|
|||
|
|
### Exception rewriting
|
|||
|
|
Wrap Nyash runtime entrypoints to translate stack frames via the source map, hiding frames from helpers (`py_*`) unless verbose mode is on.
|
|||
|
|
|
|||
|
|
### Stage diagnostics
|
|||
|
|
- CorePy dump: toggle to print normalized IR with spans.
|
|||
|
|
- Nyash preview: post-format preview with original Python line hints.
|
|||
|
|
- Trace toggles: selective tracing of `py_call`, `py_getattr`, iteration; throttle to avoid noise.
|
|||
|
|
|
|||
|
|
### Friendly messages
|
|||
|
|
On unsupported nodes or ambiguous semantics, show minimal repro, Python snippet, and link to a doc page. Include symbol table excerpt when scoping fails.
|
|||
|
|
|
|||
|
|
## Architecture & DX
|
|||
|
|
|
|||
|
|
### Pass pipeline
|
|||
|
|
```
|
|||
|
|
Parse Python AST → Symbol table → Normalize to CorePy →
|
|||
|
|
Scope/closure analysis → Type/meta attach → Lower to Nyash AST →
|
|||
|
|
Optimize (peephole/simplify) → Pretty-print + source map
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Runtime shim (`nyash/lib/py_runtime.ny`)
|
|||
|
|
Core APIs:
|
|||
|
|
- `py_call(f, pos, kw, star, dstar)`
|
|||
|
|
- `py_truthy(x)`
|
|||
|
|
- `py_getattr/py_setattr`
|
|||
|
|
- `py_binop(op, a, b)`
|
|||
|
|
- `py_cmp(op,a,b)`
|
|||
|
|
- `py_iter(x)`
|
|||
|
|
- `py_next(it)`
|
|||
|
|
- `py_slice(x,i,j,k)`
|
|||
|
|
- `py_with(mgr, body_fn)`
|
|||
|
|
- `py_raise`
|
|||
|
|
- `py_is`
|
|||
|
|
- `py_eq`
|
|||
|
|
|
|||
|
|
Data model support: descriptor get/set, method binding, MRO lookup, exception hierarchy, StopIteration protocol.
|
|||
|
|
|
|||
|
|
Perf annotations: mark pure or inline candidates; keep stable ABI.
|
|||
|
|
|
|||
|
|
### CLI/flags
|
|||
|
|
Modes:
|
|||
|
|
- `--readable`
|
|||
|
|
- `--optimized`
|
|||
|
|
- `--debug`
|
|||
|
|
- `--emit-sourcemap`
|
|||
|
|
- `--dump-corepy`
|
|||
|
|
- `--strict-builtins`
|
|||
|
|
|
|||
|
|
Caching: hash of Python AST + flags to cache Nyash output, source map, and diagnostics.
|
|||
|
|
|
|||
|
|
Watch/incremental: re-transpile changed modules, preserve source map continuity.
|
|||
|
|
|
|||
|
|
### Tests
|
|||
|
|
- Golden tests: Python snippet → Nyash output diff, with normalization.
|
|||
|
|
- Differential: run under CPython vs Nyash runtime for functional parity on a corpus (unit/property tests).
|
|||
|
|
- Conformance: edge cases (scoping, descriptors, generators, exceptions) and evaluation order tests.
|
|||
|
|
|
|||
|
|
## Pitfalls & Remedies
|
|||
|
|
|
|||
|
|
### Evaluation order
|
|||
|
|
Python's left-to-right arg eval, starred/unpacking, and kw conflict checks. Enforce by sequencing temps precisely before `py_call`.
|
|||
|
|
|
|||
|
|
### Shadowing builtins/globals
|
|||
|
|
Only specialize when proven not shadowed in any reachable scope. Provide `--strict-builtins` to disable specialization unless guaranteed.
|
|||
|
|
|
|||
|
|
### Identity vs equality
|
|||
|
|
`is` is reference identity; avoid folding or substituting.
|
|||
|
|
|
|||
|
|
### Integer semantics
|
|||
|
|
Python's bigints; ensure Nyash numeric layer matches or route to bigints in `py_*`.
|
|||
|
|
|
|||
|
|
## Future Extensibility
|
|||
|
|
|
|||
|
|
### Plugins
|
|||
|
|
Pass manager with hooks (`before_lower`, `after_lower`, `on_node_<Type>`). Allow project-local rewrites and macros, with access to symbol/type info.
|
|||
|
|
|
|||
|
|
### Custom rules
|
|||
|
|
DSL for pattern→rewrite with predicates (types, purity), e.g., rewrite `dataclass` patterns to Nyash records.
|
|||
|
|
|
|||
|
|
### Multi-language
|
|||
|
|
Keeping the Nyash script as a stable contract invites other frontends (e.g., a subset of JS/TypeScript or Lua) to target Nyash; keep `py_*` separate from language-agnostic intrinsics to avoid contamination.
|
|||
|
|
|
|||
|
|
### Gradual migration
|
|||
|
|
As Nyash grows Pythonic libraries, progressively replace `py_*` with native Nyash idioms; keep a compatibility layer for mixed projects.
|
|||
|
|
|
|||
|
|
## Concrete Translation Sketches
|
|||
|
|
|
|||
|
|
### Attribute
|
|||
|
|
```python
|
|||
|
|
a.b
|
|||
|
|
```
|
|||
|
|
→
|
|||
|
|
```nyash
|
|||
|
|
py_getattr(a, "b")
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Call
|
|||
|
|
```python
|
|||
|
|
f(x, y=1, *as, **kw)
|
|||
|
|
```
|
|||
|
|
→
|
|||
|
|
```nyash
|
|||
|
|
py_call(f, [x], {"y":1}, as, kw)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### If
|
|||
|
|
```python
|
|||
|
|
if a and b:
|
|||
|
|
```
|
|||
|
|
→
|
|||
|
|
```nyash
|
|||
|
|
let _t=py_truthy(a); if _t { if py_truthy(b) { ... } }
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### For/else
|
|||
|
|
```python
|
|||
|
|
for x in xs:
|
|||
|
|
if cond:
|
|||
|
|
break
|
|||
|
|
else:
|
|||
|
|
else_block
|
|||
|
|
```
|
|||
|
|
→
|
|||
|
|
```nyash
|
|||
|
|
let _it = py_iter(xs);
|
|||
|
|
let _broken=false;
|
|||
|
|
loop {
|
|||
|
|
let _n = py_next(_it) catch StopIteration { break };
|
|||
|
|
x = _n;
|
|||
|
|
...
|
|||
|
|
if cond { _broken=true; break }
|
|||
|
|
}
|
|||
|
|
if !_broken { else_block }
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### With
|
|||
|
|
Evaluate mgr, call `__enter__`, bind val; try body; on exception, call `__exit__(type,e,tb)` and suppress if returns true; finally call `__exit__(None,None,None)` when no exception.
|
|||
|
|
|
|||
|
|
### Decorators
|
|||
|
|
```nyash
|
|||
|
|
let f = <def>;
|
|||
|
|
f = decoN(...(deco1(f)));
|
|||
|
|
name = f
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Why Nyash Script First
|
|||
|
|
|
|||
|
|
- **Debuggability**: Human-readable Nyash is easier to inspect, diff, and map errors to; source maps stay small and precise.
|
|||
|
|
- **Optimization leverage**: Nyash compiler/MIR can keep improving independently; your Python frontend benefits automatically.
|
|||
|
|
- **Ecosystem fit**: Generates idiomatic Nyash that other tools (formatters, linters, analyzers) can understand; fosters a consistent DX.
|