Files
hakorune/docs/phases/phase-10.7/archive/codex-analysis-technical-implementation.md

252 lines
9.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Codex (GPT-5)の技術分析Python→Nyashトランスパイルの実装詳細
2025-08-30 - OpenAI Codex v0.25.0による技術的深掘り
## Big Picture
- **Purpose**: Generate Nyash source from Python to leverage Nyash's optimizer, keep debugging simple, and "think in Nyash" early without locking into MIR details.
- **Core idea**: Normalize Python to a small CorePy IR, then lower to idiomatic Nyash with a thin "Py runtime shim" in Nyash that preserves Python's dynamic semantics where required.
## AST Conversion
### Two-stage lowering
```
Python AST → CorePy IR → Nyash AST
```
CorePy should be expression-friendly, desugared, and semantics-explicit (e.g., with→try/finally, for→iterator loop, boolean ops→if/else).
### Visitor + environment
Implement a node visitor that carries scope info (locals/free/globals), evaluation order, and source spans. Use Python's `symtable` to seed symbol kinds; validate against your own pass.
### Semantic shims
Lower Python ops to Nyash intrinsics that preserve semantics:
- `py_truthy(x)`
- `py_getattr(o,"a")`
- `py_setattr(o,"a",v)`
- `py_binop("add", a, b)`
- `py_cmp("lt", a, b)`
- `py_iter(x)`
- `py_call(f, args, kwargs)`
- `py_slice(x, i, j, k)`
### Boxes and cells
Model Python variables and closures with Box/Cell objects. Rule of thumb: locals are unboxed unless captured or aliased; promote to Box when needed. Everything-is-Box in Nyash aligns well with Python's mutability/aliasing.
### Control flow
Normalize to a small set: `if`, `block`, `loop`, `break`, `continue`, `try/catch/finally`, `throw`. Lower `and`/`or` with short-circuit temp; turn comprehensions into explicit loops with dedicated inner scope.
## Transpile Quality
### Readability vs optimize
Offer modes. Default emits idiomatic Nyash constructs and meaningful identifiers, comments with source spans, and simple temporaries. "Optimize" mode switches to `py_*` intrinsics fusion and fewer temps.
### Idiomatic Nyash
Prefer Nyash control constructs over procedural labels. Use native blocks for `if/else`, `match` if Nyash has it; only fall back to runtime calls where semantics demand.
### Stable pretty-printer
Round-trip friendly formatter with consistent whitespace, trailing comma rules, and deterministic temp naming (`_t1`, `_t2…`). Keep def/class emitted in declaration-order.
### Debug info
Attach `span(file, line, col)` to every IR node, carry through to Nyash AST, then emit a sidecar source map. Optionally embed lightweight `#line` directives or inline comments per statement in debug builds.
## Python Feature Mapping
### Default args
Evaluate at def-time; store tuple/dict on the function object. At call-time, fill missing with stored defaults. Beware mutable defaults: do not clone; reuse exact object.
### LEGB scoping
Build symbol table with flags for `global`/`nonlocal`. Emit closure "cells" (Boxes) for free vars; functions capture a vector of cells. Globals map to the module dict; builtins fallback when name miss in global.
### for/else, while/else
Introduce `broken=false`. On `break`, set and exit; after loop, `if !broken { else_block }`.
### Comprehensions
Create inner function/scope per comprehension (Python 3 semantics). Assignment targets exist only in that scope. Preserve evaluation order and late binding behavior.
### With statement
Desugar to try/finally per Python spec: evaluate context expr, call `__enter__`, bind target, run body, always call `__exit__`, and suppress exception only if `__exit__` returns truthy.
### Decorators
Evaluate bottom-up at def-time: `fn = decoN(...(deco1(fn)))` then assign back. Keep evaluation order of decorator expressions.
### Generators
Lower to a state machine object implementing Nyash's iterator protocol, with saved instruction pointer, stack slots, and exception injection (`throw`, `close`). Support `yield from` by delegation trampoline.
### Pattern matching (PEP 634)
If supported by Nyash, map directly; else lower to nested guards and extractor calls in a `py_match` helper library.
### Data model
Attribute access honors descriptors; method binding creates bound objects; arithmetic and comparisons dispatch to `__op__`/`__rop__` and rich comparisons; truthiness via `__bool__`/`__len__`.
## Performance Opportunities
### At transpile-time
- Constant fold literals, f-strings (format plan precomputation), simple arithmetic if types are literal.
- Invariant hoisting for loop-invariant comprehensions and attribute chains when no side-effects (guarded).
- Direct calls to Nyash intrinsics for selected builtins (`len`, `isinstance`, `range`) only if not shadowed (prove via symbol table).
- Peephole: collapse nested `py_truthy(py_truthy(x))`, merge adjacent `if` with literal conditions, drop dead temporaries.
### Defer to Nyash compiler
- Inlining across Nyash functions, register allocation, loop unrolling, vectorization, constant propagation at MIR level.
- DCE/CSE once `py_*` helpers are inlined or annotated as pure/idempotent where legal.
### Types as hints
- Consume Python annotations/`typing` to emit specialized fast paths: e.g., `int` → direct Nyash integer ops, else fallback to `py_binop`. Plumb types through IR as optional metadata for MIR to exploit.
- Profile-guided guards: optional mode emits guards around hot calls to enable Nyash JIT/AOT to speculate and deopt to generic `py_*`.
## Error Handling & Debug
### Source maps
Emit a compact mapping (e.g., VLQ JSON) from Nyash line/col → Python original; include segment mappings per expression for precise stepping.
### Exception rewriting
Wrap Nyash runtime entrypoints to translate stack frames via the source map, hiding frames from helpers (`py_*`) unless verbose mode is on.
### Stage diagnostics
- CorePy dump: toggle to print normalized IR with spans.
- Nyash preview: post-format preview with original Python line hints.
- Trace toggles: selective tracing of `py_call`, `py_getattr`, iteration; throttle to avoid noise.
### Friendly messages
On unsupported nodes or ambiguous semantics, show minimal repro, Python snippet, and link to a doc page. Include symbol table excerpt when scoping fails.
## Architecture & DX
### Pass pipeline
```
Parse Python AST → Symbol table → Normalize to CorePy →
Scope/closure analysis → Type/meta attach → Lower to Nyash AST →
Optimize (peephole/simplify) → Pretty-print + source map
```
### Runtime shim (`nyash/lib/py_runtime.ny`)
Core APIs:
- `py_call(f, pos, kw, star, dstar)`
- `py_truthy(x)`
- `py_getattr/py_setattr`
- `py_binop(op, a, b)`
- `py_cmp(op,a,b)`
- `py_iter(x)`
- `py_next(it)`
- `py_slice(x,i,j,k)`
- `py_with(mgr, body_fn)`
- `py_raise`
- `py_is`
- `py_eq`
Data model support: descriptor get/set, method binding, MRO lookup, exception hierarchy, StopIteration protocol.
Perf annotations: mark pure or inline candidates; keep stable ABI.
### CLI/flags
Modes:
- `--readable`
- `--optimized`
- `--debug`
- `--emit-sourcemap`
- `--dump-corepy`
- `--strict-builtins`
Caching: hash of Python AST + flags to cache Nyash output, source map, and diagnostics.
Watch/incremental: re-transpile changed modules, preserve source map continuity.
### Tests
- Golden tests: Python snippet → Nyash output diff, with normalization.
- Differential: run under CPython vs Nyash runtime for functional parity on a corpus (unit/property tests).
- Conformance: edge cases (scoping, descriptors, generators, exceptions) and evaluation order tests.
## Pitfalls & Remedies
### Evaluation order
Python's left-to-right arg eval, starred/unpacking, and kw conflict checks. Enforce by sequencing temps precisely before `py_call`.
### Shadowing builtins/globals
Only specialize when proven not shadowed in any reachable scope. Provide `--strict-builtins` to disable specialization unless guaranteed.
### Identity vs equality
`is` is reference identity; avoid folding or substituting.
### Integer semantics
Python's bigints; ensure Nyash numeric layer matches or route to bigints in `py_*`.
## Future Extensibility
### Plugins
Pass manager with hooks (`before_lower`, `after_lower`, `on_node_<Type>`). Allow project-local rewrites and macros, with access to symbol/type info.
### Custom rules
DSL for pattern→rewrite with predicates (types, purity), e.g., rewrite `dataclass` patterns to Nyash records.
### Multi-language
Keeping the Nyash script as a stable contract invites other frontends (e.g., a subset of JS/TypeScript or Lua) to target Nyash; keep `py_*` separate from language-agnostic intrinsics to avoid contamination.
### Gradual migration
As Nyash grows Pythonic libraries, progressively replace `py_*` with native Nyash idioms; keep a compatibility layer for mixed projects.
## Concrete Translation Sketches
### Attribute
```python
a.b
```
```nyash
py_getattr(a, "b")
```
### Call
```python
f(x, y=1, *as, **kw)
```
```nyash
py_call(f, [x], {"y":1}, as, kw)
```
### If
```python
if a and b:
```
```nyash
let _t=py_truthy(a); if _t { if py_truthy(b) { ... } }
```
### For/else
```python
for x in xs:
if cond:
break
else:
else_block
```
```nyash
let _it = py_iter(xs);
let _broken=false;
loop {
let _n = py_next(_it) catch StopIteration { break };
x = _n;
...
if cond { _broken=true; break }
}
if !_broken { else_block }
```
### With
Evaluate mgr, call `__enter__`, bind val; try body; on exception, call `__exit__(type,e,tb)` and suppress if returns true; finally call `__exit__(None,None,None)` when no exception.
### Decorators
```nyash
let f = <def>;
f = decoN(...(deco1(f)));
name = f
```
## Why Nyash Script First
- **Debuggability**: Human-readable Nyash is easier to inspect, diff, and map errors to; source maps stay small and precise.
- **Optimization leverage**: Nyash compiler/MIR can keep improving independently; your Python frontend benefits automatically.
- **Ecosystem fit**: Generates idiomatic Nyash that other tools (formatters, linters, analyzers) can understand; fosters a consistent DX.