Remove automatic WeakNew conversion and enforce strict compile-time type checking for weak field assignments.

Only 3 assignment types allowed:
1. Result of weak(x) call (WeakRef type)
2. Existing WeakRef variable (e.g., me.parent = other.parent)
3. Void/null (clear operation)

**Implementation**:
- Added MirType::WeakRef to type system (src/mir/types.rs)
- Track WeakRef type in emit_weak_new() even in pure mode
- Weak field reads return WeakRef without auto-upgrade
- Removed automatic WeakNew conversion from field writes
- Implemented check_weak_field_assignment() with actionable errors
- Fixed null literal type tracking (Phase 285A1.1: Unknown → Void)

**Testing**:
- 5 test fixtures (3 OK, 2 NG cases) - all passing
- Smoke test: phase285_weak_field_vm.sh
- Error messages guide users to use weak() or null

**Documentation**:
- Updated lifecycle.md SSOT with weak field contract

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
336 lines
14 KiB
Markdown
# Box Lifecycle and Finalization (SSOT)

Status: SSOT (single source of truth; language-level), with implementation status notes.

This document defines the Nyash object lifecycle model: lexical scope, ownership (strong/weak), finalization (`fini()`), and what is (and is not) guaranteed across backends.

## Terms

- **Binding**: a local variable slot (created by `local`) that points to a value.
- **Box value**: an object reference (user-defined / builtin / plugin).
- **Strong reference**: an owning reference that contributes to keeping the object alive.
- **Weak reference**: a non-owning reference; it does not keep the object alive and may become dead.
- **Finalization (`fini`)**: a logical end-of-life hook. It is not “physical deallocation”.

## 0) Two-layer model (resource vs memory)

Nyash separates two concerns:

- **Resource lifecycle (deterministic)**: `fini()` defines *logical* end-of-life and must be safe and explicit.
- **Heap memory reclamation (non-deterministic)**: physical memory is reclaimed by the runtime implementation (typically reference counting). Timing is not part of the language semantics.

This split lets Nyash keep “Box Theory” (箱理論) simple:
- Programs must use `fini()` (or sugar that guarantees it) to deterministically release external resources (fd/socket/native handles).
- Programs must not rely on GC timing for correctness.

## 1) Scope model (locals)

- `local` is block-scoped: the binding exists from its declaration to the end of the lexical block (`{ ... }`).
- Leaving a block drops its bindings immediately (including inner `{}` blocks).
- Dropping a binding reduces strong ownership held by that binding. It may or may not physically deallocate the object (depends on other strong references).

This is the “variable lifetime” rule. Object lifetime is defined below.

## 2) Object lifetime (strong / weak)

### Strong ownership

- A strong reference keeps the object alive.
- When the last strong reference to an object disappears, the object becomes eligible for physical destruction by the runtime.
  - In typical implementations this is immediate (reference-counted drop) for acyclic graphs, but the language does not require immediacy.

### Weak references

Weak references exist to avoid cycles and to represent back-pointers safely.

Language-level guidance:
- Locals and return values are typically strong.
- Back-pointers / caches / parent links that would create cycles should be weak.

Required property:
- A weak reference never keeps the object alive.

Observable operations (surface-level; exact API depends on the box type):
- “Is alive?” check.
- Weak-to-strong conversion (may fail): `weak_to_strong()`.

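The required property and the two observable operations can be illustrated outside Nyash. The following is a minimal Python sketch, not Nyash runtime code: the standard `weakref` module stands in for `WeakRef`, and the names `Box`, `is_alive`, and the free function `weak_to_strong` are hypothetical helpers chosen for illustration. It relies on CPython's reference counting, which reclaims an acyclic object as soon as its last strong reference disappears; the language itself does not require that immediacy.

```python
import weakref


class Box:
    """Stand-in for a user-defined box (hypothetical name)."""


def is_alive(w):
    """'Is alive?' check: True while some strong reference still exists."""
    return w() is not None


def weak_to_strong(w):
    """Weak-to-strong conversion: returns the target, or None on failure."""
    return w()


strong = Box()
w = weakref.ref(strong)        # the weak reference does not own the box

alive_before = is_alive(w)
same_object = weak_to_strong(w) is strong

strong = None                  # drop the last strong reference
alive_after = is_alive(w)      # the weak reference alone kept nothing alive
upgraded = weak_to_strong(w)   # conversion fails safely instead of dangling
```

This sketch only models the Alive and Freed states; the Dead state produced by `fini()` has no direct Python counterpart here.
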
## 3) Finalization (`fini`) — what it means

`fini()` is a **logical** termination hook:
- After `fini()` has executed successfully for an object, the object must be treated as unusable (use-after-fini is an error).
- `fini()` must be **idempotent** (calling it multiple times is allowed and must not double-free resources).
  - This supports “external force fini” and best-effort cleanup paths safely.

### Fail-fast after `fini`

After an object is finalized, operations must fail fast (use-after-fini).
Permitted exceptions (optional, per type) are strictly observational operations such as identity / debug string.

### Object states (Alive / Dead / Freed)

Nyash distinguishes:

- **Alive**: normal state; methods/fields are usable.
- **Dead**: finalized by `fini()`; object identity may still exist but is not usable.
- **Freed**: physically destroyed by the runtime (implementation detail).

State transitions (conceptual):

- `Alive --fini()--> Dead --(runtime)--> Freed`
- `Alive --(runtime)--> Freed`

SSOT rule:
- `fini()` is the only operation that creates the **Dead** state.
- Runtime reclamation does not imply `fini()` was executed.

### Dead: allowed vs forbidden operations

Allowed on **Dead** (minimal set):
- Debug/observation: `toString`, `typeName`, `id` (if provided)
- Identity checks: `==` (identity only), and identity-based hashing if the type supports hashing

Forbidden on **Dead** (Fail-Fast, UseAfterFini):
- Field read/write
- Method calls
- ByRef (`RefGet/RefSet`) operations
- Conversions / truthiness (`if dead_box { ... }` is an error)
- Creating new weak references from a dead object (`weak(dead)` is an error)

### Finalization precedence

When finalization is triggered (by explicit call or by an owning context; see below):
1) If the object is already finalized, do nothing (idempotent).
2) Run user-defined `fini()` if present.
3) Run automatic cascade finalization for remaining **strong-owned fields** (weak fields are skipped).
4) Clear fields / invalidate internal state.

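The four precedence steps, together with the fail-fast rule, can be sketched as a small Python model. This is an illustration under assumptions, not the Nyash runtime: the class `Box`, the hook `on_fini`, the dictionaries `strong_fields` and `weak_fields`, and the exception name `UseAfterFini` are all hypothetical. Setting the `finalized` flag before running the user hook is a choice made by this sketch to guard against re-entrant calls; the specification above does not state when the flag flips.

```python
import weakref


class UseAfterFini(Exception):
    """Raised when a finalized (Dead) box is used."""


class Box:
    def __init__(self, name):
        self.name = name
        self.strong_fields = {}    # owning references to other boxes
        self.weak_fields = {}      # weakref.ref values, non-owning
        self.finalized = False
        self.log = []

    def on_fini(self):
        """User-defined hook; records that it ran."""
        self.log.append(f"fini {self.name}")

    def fini(self):
        if self.finalized:         # 1) already finalized: do nothing
            return
        self.finalized = True      # sketch-specific re-entrancy guard
        self.on_fini()             # 2) user-defined fini
        for child in self.strong_fields.values():
            child.fini()           # 3) cascade over strong-owned fields only
        self.strong_fields.clear() # 4) clear fields
        self.weak_fields.clear()   #    (weak targets are never finalized)

    def get(self, field):
        if self.finalized:         # fail fast on a Dead box
            raise UseAfterFini(self.name)
        return self.strong_fields[field]


parent = Box("parent")
child = Box("child")
observer = Box("observer")
parent.strong_fields["child"] = child
parent.weak_fields["observer"] = weakref.ref(observer)

parent.fini()
parent.fini()                      # idempotent: the hook does not run again
```

After this runs, `child` has been finalized through the cascade, while `observer`, reachable only through a weak field, is untouched.
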
### Weak references are non-owning

Weak references are values (`WeakRef`) that can be stored in locals or fields:
- They are **not** part of ownership.
- Automatic cascade finalization must not follow weak references.
- Calling `fini()` “through” a weak reference is invalid (non-owning references cannot decide the target’s lifetime).

## 4) Ownership and “escaping” out of a scope

Nyash distinguishes “dropping a binding” from “finalizing an object”.

Finalization is tied to **ownership**, not merely being in scope.

### Owning contexts

An object is considered owned by one of these contexts:
- A local binding (typical case).
- A strong-owned field of another object.
- A module/global registry entry (e.g., `env.modules`).
- A runtime host handle / singleton registry (typical for plugins).

### Escapes (ownership moves)

If a value is moved into a longer-lived owning context before the current scope ends, then the current scope must not finalize it.

Common escape paths:
- Assigning into an enclosing-scope binding (updates the owner).
- Returning via `outbox` (ownership moves to the caller).
- Storing into a strong-owned field of an object that outlives the scope.
- Publishing into global/module registries.

This rule is what keeps “scope finalization” from breaking shared references.

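The escape rule can be pictured with a toy ownership tracker. The Python sketch below is a conceptual model only: `Scope`, `own`, `move_out`, and `exit` are hypothetical names, and it assumes an implementation that finalizes what a scope still owns when the scope ends, which the language does not guarantee by itself.

```python
class Resource:
    def __init__(self, name):
        self.name = name
        self.finalized = False

    def fini(self):
        self.finalized = True


class Scope:
    """Hypothetical owning context for the locals of one block."""

    def __init__(self):
        self.owned = []

    def own(self, obj):
        self.owned.append(obj)
        return obj

    def move_out(self, obj):
        """Escape: ownership leaves this scope, so it must not finalize obj."""
        self.owned.remove(obj)
        return obj

    def exit(self):
        # Finalize only what this scope still owns, newest first.
        for obj in reversed(self.owned):
            obj.fini()
        self.owned.clear()


def make_resources():
    scope = Scope()
    temp = scope.own(Resource("temp"))
    result = scope.own(Resource("result"))
    escaped = scope.move_out(result)   # e.g. handed to the caller
    scope.exit()
    return temp, escaped


temp, escaped = make_resources()
```

Here `temp` is finalized by its scope, while `escaped` stays usable because its ownership moved before the scope ended.
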
## 4.1) What is guaranteed to run automatically

Language guarantee (deterministic):
- Only **explicit cleanup constructs** guarantee cleanup execution for all exits (return/break/continue/error).

Recommended SSOT surface:
- `cleanup` blocks (Stage‑3): attach cleanup code structurally.
- Future sugar may exist (`defer`, RAII-style `using`), but it must lower to `cleanup` semantics.

Non-guarantees:
- “Leaving a block” does not by itself guarantee `fini()` execution for an object, because aliasing/escaping is allowed.
- GC must not call `fini()` as part of the language semantics.

### `cleanup` (block-postfix) — the deterministic “defer”

The primary guaranteed cleanup construct is block-postfix `cleanup` (Stage‑3):

```nyash
{
    local f = open(path)
    do_work(f)
} cleanup {
    f.fini()
}
```

SSOT semantics:
- The `cleanup` block runs exactly once on every exit path from the attached block (normal fallthrough, `return`, `break`, `continue`, and errors).
- The `cleanup` block executes *before* the block’s locals are dropped, and can reference locals from that block.
- `cleanup` must not change the meaning of the program aside from running its code; it is not implicit GC/finalization.
Note:
- `cleanup` may appear with or without `catch`. It always runs after `catch` (if present).

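For readers more familiar with `try`/`finally`, the exit-path guarantee resembles the following Python sketch. It is an analogy, not how Nyash lowers `cleanup`: the function `run` and its `mode` parameter are hypothetical, and Python locals are function-scoped, so the rule that cleanup runs before the block's locals are dropped is only approximated.

```python
def run(mode, log):
    """Run a block with attached cleanup; `mode` selects the exit path."""
    for _ in range(1):
        try:
            f = "resource"             # a local of the attached block
            log.append("body")
            if mode == "return":
                return "returned"
            if mode == "break":
                break
            if mode == "error":
                raise ValueError("boom")
        except ValueError:
            log.append("catch")        # cleanup runs after the catch part
        finally:
            # Runs once on every exit path and can still read the local `f`.
            log.append(f"cleanup {f}")
    return "done"
```

Calling `run` with `"normal"`, `"return"`, `"break"`, or `"error"` appends exactly one `cleanup resource` entry to `log` in each case, and in the error case it appears after `catch`.
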
## 4.2) Weak references (surface model)

Weak references exist to avoid strong cycles and to model back-pointers.

SSOT operations:
- `weak(x)` produces a `WeakRef` to `x` (x must be Alive).
- `weakRef.weak_to_strong()` returns the target box if it is usable, otherwise `null` (none).
  - It returns `null` if the target is **Dead** (finalized) or **Freed** (collected).
  - Note: `null` and `void` are equivalent at runtime (SSOT: `docs/reference/language/types.md`).

WeakRef in fields:
- Reading a field that stores a `WeakRef` yields a `WeakRef`. It does not auto-upgrade.

Recommended usage pattern:
```nyash
local x = w.weak_to_strong()
if x != null {
    ...
}
```

WeakRef equality:
- `WeakRef` carries a stable target token (conceptually: `WeakToken`).
- `w1 == w2` compares tokens. This is independent of Alive/Dead/Freed.
- "dead==dead" is true only when both weakrefs point to the same original target token.

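Token-based equality can be sketched in Python as follows. This is a conceptual model, not the runtime's representation: the classes `Target` and `WeakRef`, the integer `token` attribute, and the counter that issues tokens are assumptions made for illustration. As with other Python sketches, it relies on CPython reclaiming an object when its last strong reference is dropped.

```python
import itertools
import weakref

_next_token = itertools.count(1)


class Target:
    """Stand-in box that receives a stable token on creation (assumption)."""

    def __init__(self):
        self.token = next(_next_token)


class WeakRef:
    def __init__(self, target):
        self.token = target.token      # stable; outlives the target
        self._ref = weakref.ref(target)

    def weak_to_strong(self):
        return self._ref()             # None once the target is gone

    def __eq__(self, other):
        # Compare tokens only; liveness of the target plays no role.
        return isinstance(other, WeakRef) and self.token == other.token

    def __hash__(self):
        return hash(self.token)


a = Target()
b = Target()
w1, w2, w3 = WeakRef(a), WeakRef(a), WeakRef(b)
a = None
b = None                               # both targets are now gone

same_target_equal = (w1 == w2)         # same original target token
different_target_equal = (w1 == w3)    # both dead, but different tokens
```

Even though every weak reference here fails to upgrade, `w1 == w2` holds and `w1 == w3` does not, which is the "dead==dead" rule stated above.
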
### Weak Field Assignment Contract (Phase 285A1)

Weak fields enforce strict type requirements at compile time:

**Allowed assignments** (3 cases):
1. **Explicit weak reference**: `me.parent = weak(p)`
2. **WeakRef variable**: `me.parent = other.parent` (where `other.parent` is weak field)
3. **Void**: `me.parent = Void` (clear operation; null is sugar for Void)

**Forbidden assignments** (Fail-Fast compile error):
- Direct BoxRef: `me.parent = p` where `p` is BoxRef
- Primitives: `me.parent = 42`
- Any non-WeakRef type without explicit `weak()`

**Error message example**:
```
Cannot assign Box (NodeBox) to weak field 'Tree.parent'.
Use weak(...) to create weak reference: me.parent = weak(value)
```

**Rationale**: Explicit `weak()` calls make the semantic difference between strong and weak references visible. This prevents:
- Accidental strong references in weak fields (reference cycles)
- Confusion about object lifetime and ownership
- Silent bugs from automatic conversions

**Example**:
```nyash
box Node {
    weak parent

    set_parent(p) {
        // ❌ me.parent = p          // Compile error
        // ✅ me.parent = weak(p)    // Explicit weak()
        // ✅ me.parent = Void       // Clear operation (SSOT: Void primary)
    }

    copy_parent(other: Node) {
        // ✅ me.parent = other.parent  // WeakRef → WeakRef
    }
}
```

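The shape of the contract check can be sketched in Python. This is a simplified model under assumptions, not the compiler's actual code: the `ValueType` enum and the function `check_weak_assign` are hypothetical, and only the Box error text is taken from the example message above; the wording used for other rejected types is a guess.

```python
from enum import Enum, auto


class ValueType(Enum):
    """Simplified static types of the right-hand side (hypothetical)."""
    WEAK_REF = auto()
    VOID = auto()
    BOX = auto()
    INTEGER = auto()


def check_weak_assign(owner, field, value_type, type_name=""):
    """Return None if the assignment is allowed, else an error message."""
    if value_type in (ValueType.WEAK_REF, ValueType.VOID):
        return None                     # weak(x), WeakRef variable, or Void
    kind = "Box" if value_type is ValueType.BOX else "value"
    detail = f" ({type_name})" if type_name else ""
    return (
        f"Cannot assign {kind}{detail} to weak field '{owner}.{field}'.\n"
        f"Use weak(...) to create weak reference: me.{field} = weak(value)"
    )


ok_weak = check_weak_assign("Tree", "parent", ValueType.WEAK_REF)
ok_void = check_weak_assign("Tree", "parent", ValueType.VOID)
err_box = check_weak_assign("Tree", "parent", ValueType.BOX, "NodeBox")
err_int = check_weak_assign("Tree", "parent", ValueType.INTEGER)
```

The point of the sketch is that the decision depends only on the static type of the assigned value, which is why the check can run at compile time.
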
## 5) Cycles and GC (language-level policy)

### Cycles

Nyash allows object graphs; strong cycles can exist unless the program avoids them.

Policy:
- Programs should use **weak** references for back-pointers / parent links to avoid strong cycles.
- If a strong cycle exists, memory reclamation is not guaranteed (it may leak). This is allowed behavior in “no cycle collector” mode.

Important: weak references themselves do not require tracing GC.
- They require a runtime liveness mechanism (e.g., an `Rc/Weak`-style control block) so that “weak_to_strong” can succeed/fail safely.

### GC modes

GC is treated as an optimization/diagnostics facility, not as a semantic requirement. In this document, “GC” means cycle collection / tracing, not the basic reference-count drop.

- **GC off**: reference-counted reclamation still applies for non-cyclic ownership graphs; strong cycles may leak.
- **GC on**: the runtime may additionally reclaim unreachable cycles eventually; timing is not guaranteed.

Invariant:
- Whether GC is on or off must not change *program meaning*, except for observability related to resource/memory timing (which must not be relied upon for correctness).

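CPython happens to have the same split between reference counting and an optional cycle collector, so the two GC modes can be demonstrated with a short Python sketch. It is an analogy for the policy above, not a description of any Nyash backend; `Node` and `make_pair` are hypothetical names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_pair(use_weak_back_pointer):
    """Link a -> b and b -> a, then return a weak probe on `a`.

    All strong locals are dropped when the function returns.
    """
    a, b = Node(), Node()
    a.other = b
    b.other = weakref.ref(a) if use_weak_back_pointer else a
    return weakref.ref(a)


gc.disable()                              # model "GC off": refcounting only
try:
    weak_probe = make_pair(use_weak_back_pointer=True)
    strong_probe = make_pair(use_weak_back_pointer=False)
    weak_reclaimed = weak_probe() is None       # no cycle: reclaimed at once
    cycle_leaked = strong_probe() is not None   # strong cycle: still allocated
    gc.collect()                          # model "GC on": cycle collector runs
    cycle_reclaimed = strong_probe() is None
finally:
    gc.enable()
```

The pair linked through a weak back-pointer is reclaimed by reference counting alone, while the strongly linked pair survives until a cycle collection runs. Neither outcome changes what the program computes, which is the invariant stated above.
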
## 6) ByRef (`RefGet/RefSet`) — borrowed slot references (non-owning)

Nyash has an internal “ByRef” concept (MIR `RefGet/RefSet`) used to access and mutate fields through a **borrowed reference to a storage slot**.

Intended use cases:
- Field get/set lowering with visibility checks (public/private) and delegation (from/override).
- Passing a “mutable reference” to runtime helpers or plugin calls without copying large values.

SSOT constraints:
- ByRef is **non-owning**: it does not keep the target alive and does not affect strong/weak counts.
- ByRef is **non-escaping**: it must not be stored in fields/arrays/maps, returned, captured by closures, or placed into global registries.
- ByRef is **scope-bound**: it is only valid within the dynamic extent where it was produced (typically a single statement or call lowering).
- Using ByRef on **Dead/Freed** targets is an error (UseAfterFini / dangling ByRef).

These constraints keep “Box Theory” (箱理論) simple: ownership is strong/weak; ByRef is a temporary access mechanism only.

## 7) Diagnostics (non-normative)

Runtimes may provide diagnostics to help validate lifecycle rules (example: reporting remaining strong roots or non-finalized objects at process exit). These diagnostics are not part of language semantics and must be off by default.

## 8) Implementation status (non-normative)

This section documents current backend reality so we can detect drift as bugs.

### Feature Matrix (Phase 285A0 update)

| Feature | VM | LLVM | WASM |
|---------|-----|------|------|
| WeakRef (`weak(x)`, `weak_to_strong()`) | ✅ | ❌ unsupported (285A1) | ❌ unsupported |
| Leak Report (`NYASH_LEAK_LOG`) | ✅ | ⚠️ partial (not yet) | ❌ |

### Notes

- **Block-scoped locals** are the language model (`local` drops at `}`), but the *observable* effects depend on where the last strong reference is held.
- **WeakRef** (Phase 285A0): VM backend fully supports `weak(x)` and `weak_to_strong()`. LLVM harness support is planned for Phase 285A1.
- **WASM backend** currently treats MIR `WeakNew/WeakLoad` as plain copies (weak behaves like strong). This does not satisfy the SSOT weak semantics yet (see also: `docs/guides/wasm-guide/planning/unsupported_features.md`).
- **Leak Report** (Phase 285): `NYASH_LEAK_LOG={1|2}` prints exit-time diagnostics showing global roots still held (modules, host_handles, plugin_boxes). See `docs/reference/environment-variables.md`.
- Conformance gaps (any backend differences from this document) must be treated as bugs and tracked explicitly; do not "paper over" differences by changing this SSOT without a decision.

See also:
- `docs/reference/language/variables-and-scope.md` (binding scoping and assignment resolution)
- `docs/reference/boxes-system/memory-finalization.md` (design notes; must not contradict this SSOT)

## 9) Validation recipes (non-normative)

WeakRef behavior (weak_to_strong must fail safely):
```nyash
box SomeBox { }
static box Main {
    main() {
        local x = new SomeBox()
        local w = weak(x)
        x = null
        local y = w.weak_to_strong()
        if y == null { print("ok: dropped") }
    }
}
```

Cycle avoidance (use weak for back-pointers):
```nyash
box Node { next_weak }
static box Main {
    main() {
        local a = new Node()
        local b = new Node()
        a.next_weak = weak(b)
        b.next_weak = weak(a)
        return 0
    }
}
```