C7 Alloc Hotpath Flattening (design memo)
=========================================

Goals
-----
- Make the C7 alloc path as close to straight-line code as possible.
- Minimise branches/indirections on the steady-state hit path (UC/TLS/Warm are already stable).
- Keep Box boundaries intact; isolate feature gates to a single lookup.

Current shape (simplified)
--------------------------
1. size→class LUT → `class_idx = 7` for the 1024B path.
2. Route/Policy checks (`tiny_route_get`, `tiny_policy_get`) gate UC/Warm/Page.
3. UC pop: the hit path shares code with miss/refill and includes stats/guards.
4. TLS/Warm engagement happens behind the UC miss boundary.
5. Multiple helper calls on the hit path (Gate box, Policy box, UC helpers).

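Step 1 can reduce to a single indexed load once the table is built. The sketch below assumes a hypothetical power-of-two class layout (class k serves blocks of up to `8 << k` bytes, so class 7 covers 513..1024 B) and an 8-byte LUT granularity; the real tiny class table may differ, and `tiny_class_lut_init`/`tiny_size_to_class` are illustrative names, not the allocator's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: class k serves blocks of up to (8 << k) bytes,
 * so class 7 covers 513..1024 B. The real tiny class table may differ. */
#define TINY_MAX_SIZE 1024u
#define TINY_LUT_LEN  (TINY_MAX_SIZE / 8u + 1u)   /* one entry per 8 B step */

static uint8_t g_size_to_class[TINY_LUT_LEN];

/* Fill the LUT once at init; the hot path then does one indexed load. */
static void tiny_class_lut_init(void) {
    for (size_t i = 0; i < TINY_LUT_LEN; i++) {
        size_t rounded = i * 8u;          /* largest size that maps to slot i */
        uint8_t cls = 0;
        while ((8u << cls) < rounded) cls++;
        g_size_to_class[i] = cls;
    }
}

/* size -> class: round up to the next 8 B step and index the table. */
static inline int tiny_size_to_class(size_t size) {
    assert(size <= TINY_MAX_SIZE);
    return g_size_to_class[(size + 7u) >> 3];
}
```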
Target shape
------------
1. size→class LUT (unchanged).
2. One policy snapshot: `const TinyClassPolicy* pol = tiny_policy_get(7);`
3. One route decision: the C7 fast path assumes Tiny→UC→TLS/Warm is enabled.
4. Specialised hit path:
   - Inline `tiny_unified_cache_pop_fast_c7()` that touches only the hot cache lines.
   - Stats optional/sampled (no atomic on every hit).
   - No feature/env reads.
5. Miss path remains boxed and guarded; it enters the existing refill flow unchanged.

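A minimal sketch of the target shape, assuming a per-thread LIFO array as the C7 unified cache and `malloc` as a stand-in for both the generic path and the existing refill flow. The `TinyClassPolicy` fields (including `uc_enabled`), the `tiny_policy_get` stub, `tiny_c7_alloc_miss`, `tiny_c7_free`, the capacities, and the signatures given to `malloc_tiny_fast_c7_inline` and `tiny_unified_cache_pop_fast_c7` are all hypothetical. The hit path reads only thread-local state and publishes stats to a shared atomic once every `STATS_SAMPLE` hits; the miss path is marked `noinline`/`cold` (GCC/Clang attributes) so it stays out of the hit path's code layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define C7_CLASS      7
#define C7_BLOCK_SIZE 1024u
#define UC_CAP        64u    /* hypothetical cache capacity */
#define REFILL_BATCH  32u    /* hypothetical refill batch size */
#define STATS_SAMPLE  256u   /* publish hit stats once per 256 hits */

/* Hypothetical policy record: only the flags this sketch needs. */
typedef struct {
    bool uc_enabled, warm_enabled, page_box_enabled;
} TinyClassPolicy;

static const TinyClassPolicy g_c7_policy = { true, true, true };

/* Stub for the Policy box lookup; the real signature may differ. */
static const TinyClassPolicy* tiny_policy_get(int class_idx) {
    (void)class_idx;
    return &g_c7_policy;
}

/* Hypothetical per-thread C7 unified cache: a LIFO array of free blocks. */
typedef struct { void* slots[UC_CAP]; uint32_t count; } TinyUC;

static _Thread_local TinyUC   t_uc_c7;
static _Thread_local uint32_t t_hits_c7;   /* plain TLS counter, no atomics */
static atomic_ulong           g_hits_c7;   /* shared; updated once per sample */

/* Hit path: thread-local state only, except one relaxed atomic add per
 * STATS_SAMPLE hits; no env or feature reads. */
static inline void* tiny_unified_cache_pop_fast_c7(void) {
    TinyUC* uc = &t_uc_c7;
    if (__builtin_expect(uc->count == 0, 0)) return NULL;
    if (__builtin_expect(++t_hits_c7 == STATS_SAMPLE, 0)) {
        atomic_fetch_add_explicit(&g_hits_c7, STATS_SAMPLE, memory_order_relaxed);
        t_hits_c7 = 0;
    }
    return uc->slots[--uc->count];
}

/* Miss path kept out of line; stands in for the existing boxed refill flow,
 * which is where guards and Debug/Policy logging would live. */
static __attribute__((noinline, cold)) void* tiny_c7_alloc_miss(void) {
    TinyUC* uc = &t_uc_c7;
    while (uc->count < REFILL_BATCH) {
        void* blk = malloc(C7_BLOCK_SIZE);
        if (blk == NULL) break;
        uc->slots[uc->count++] = blk;
    }
    return uc->count ? uc->slots[--uc->count] : NULL;
}

/* Fast lane: one policy snapshot, one route decision, then hit or miss. */
static inline void* malloc_tiny_fast_c7_inline(void) {
    const TinyClassPolicy* pol = tiny_policy_get(C7_CLASS);
    if (__builtin_expect(!pol->uc_enabled, 0))
        return malloc(C7_BLOCK_SIZE);          /* generic-path stand-in */
    void* p = tiny_unified_cache_pop_fast_c7();
    return p ? p : tiny_c7_alloc_miss();
}

/* Free side, for completeness: push back to the TLS cache when there is room. */
static inline void tiny_c7_free(void* p) {
    TinyUC* uc = &t_uc_c7;
    if (uc->count < UC_CAP) uc->slots[uc->count++] = p;
    else free(p);
}
```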
Possible refactors
------------------
- Add `malloc_tiny_fast_c7_inline(...)` as a `static inline` used only when `class_idx == 7`.
- Precompute `pol->warm_enabled`/`pol->page_box_enabled` once per thread and reuse them.
- Split UC helpers into `*_hit_fast` and `*_miss` variants to keep the hit-path CFG tiny.

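One way to precompute the flags once per thread, assuming a hypothetical `TinyClassPolicy` holding just the two fields named above and a stub `tiny_policy_get`: the first call on each thread performs the lookup and packs the result into a thread-local byte, and later calls cost one TLS load plus a well-predicted branch. `c7_flags`, the `C7F_*` bits, and `t_policy_lookups` are illustrative names. Because the snapshot is never refreshed, this assumes C7 policy is fixed after startup; a runtime policy change would need an epoch or similar invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy record with only the two flags named in the memo. */
typedef struct { bool warm_enabled, page_box_enabled; } TinyClassPolicy;

static const TinyClassPolicy g_policy[8] = { [7] = { true, false } };
static _Thread_local unsigned t_policy_lookups;  /* illustration: slow lookups on this thread */

/* Stub for the Policy box lookup; the real signature may differ. */
static const TinyClassPolicy* tiny_policy_get(int class_idx) {
    t_policy_lookups++;
    return &g_policy[class_idx];
}

/* Per-thread C7 snapshot packed into one byte; READY marks it initialised. */
enum { C7F_READY = 1u << 0, C7F_WARM = 1u << 1, C7F_PAGE_BOX = 1u << 2 };
static _Thread_local uint8_t t_c7_flags;

static inline uint8_t c7_flags(void) {
    uint8_t f = t_c7_flags;
    if (__builtin_expect(f & C7F_READY, 1)) return f;  /* steady state: one TLS load */
    const TinyClassPolicy* pol = tiny_policy_get(7);   /* first call on this thread only */
    f = C7F_READY;
    if (pol->warm_enabled)     f |= C7F_WARM;
    if (pol->page_box_enabled) f |= C7F_PAGE_BOX;
    t_c7_flags = f;
    return f;
}
```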
Trade-offs / checks
-------------------
- Keep the Box boundaries (Gate/Route/Policy), but allow an inline “fast lane” for C7.
- Ensure Debug/Policy logging stays in the slow/miss path only.
- Validate with IPC and ops/s after implementation; the target is +10–15% for C7-heavy mixes.

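A minimal ops/s harness for the validation step, assuming the allocator under test is linked in or interposed as `malloc` (for example via `LD_PRELOAD`). It times steady-state 1024 B malloc/free pairs, which mostly exercises the C7 hit path rather than a realistic C7-heavy mix; IPC can be collected for the same run with `perf stat -e cycles,instructions`. The function name and iteration count are illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Times `iters` malloc/free pairs of `size` bytes and returns ops/s. */
static double bench_alloc_free_ops(size_t size, size_t iters) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) {
        void* volatile p = malloc(size);  /* volatile keeps the pair from being elided */
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0 ? (double)iters / secs : 0.0;
}
```

Running the same harness before and after the change, on the same machine and build flags, gives the ops/s ratio to compare against the +10–15% target.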