# Phase 87: Inline Slots Overflow Observation Results

## Objective

Measure inline slots overflow frequency (C3/C4/C5/C6) to determine if Phase 88 (batch drain optimization) is worth implementing.

## Observation Setup

- **Workload**: Mixed SSOT (WS=400, 16-1024B allocation sizes)
- **Operations**: 20,000,000 random alloc/free operations
- **Runs**: single-run observation (OBSERVE binary)
- **Configuration**:
  - Route assignments: LEGACY for all C0-C7
  - Inline slots: C4/C5/C6 enabled (Phase 75/76), fixed mode ON (Phase 78), switch dispatch ON (Phase 80)

## Critical Fix (measurement correctness)

An earlier observation run reported `PUSH TOTAL/POP TOTAL = 0` for all classes.
That was **not** valid evidence that inline slots were unused.
The root cause was **telemetry compile gating**:

- `tiny_inline_slots_overflow_enabled()` is a header-only hot-path check.
- The original implementation relied on a `#define` inside `tiny_inline_slots_overflow_stats_box.c`. A `#define` in a `.c` file is visible only in that translation unit, so the header-only check saw the macro as undefined in every other translation unit.
- Fix: introduce `HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED` in `core/hakmem_build_flags.h` and make the enabled check depend on it.
- The OBSERVE build now enables it via the Makefile: `bench_random_mixed_hakmem_observe` adds `-DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1`.

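The gating fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual hakmem source: only the macro name and `tiny_inline_slots_overflow_enabled()` come from this document. The function body, the default value of `0`, and the `demo_record_push()` helper with its counter are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would live in a shared build-flags header (the document names
 * core/hakmem_build_flags.h). Because every translation unit includes it,
 * they all agree on the value. The default of 0 is an assumption. */
#ifndef HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
#define HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED 0
#endif

/* Header-only hot-path check. The body is an assumption: the real check
 * may also consult runtime state. */
static inline bool tiny_inline_slots_overflow_enabled(void) {
#if HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
    return true;
#else
    return false;
#endif
}

/* Hypothetical counter and call site, for illustration only. */
static uint64_t g_demo_push_total;

static inline void demo_record_push(void) {
    /* When the macro is 0 in this translation unit, the branch is dead
     * code and the counter never moves, which matches the
     * "PUSH TOTAL = 0" symptom of the earlier run. */
    if (tiny_inline_slots_overflow_enabled()) {
        g_demo_push_total++;
    }
}
```

Building this sketch with `-DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1` makes the counter advance; building it without the flag leaves the counter at zero regardless of how often the path is executed.
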
## Verified Result: inline slots **are** being called (WS=400 SSOT)

### Total Operation Counts (Verification)

```
PUSH TOTAL (Free Path Attempts):
  C4: 687,564
  C5: 1,373,605
  C6: 2,750,862
  TOTAL (C4-C6): 4,812,031

POP TOTAL (Alloc Path Attempts):
  C4: 687,564
  C5: 1,373,605
  C6: 2,750,862
  TOTAL (C4-C6): 4,812,031
```

This confirms:

- ✅ `tiny_legacy_fallback_free_base_with_env()` is being executed (LEGACY fallback path).
- ✅ C4/C5/C6 inline slots push/pop are active in the LEGACY fallback/hot alloc paths.

## Overflow / Underflow Rates (WS=400 SSOT)

```
PUSH FULL (Free Path Ring Overflow):
  TOTAL: 0 (0.00%)

POP EMPTY (Alloc Path Ring Underflow):
  TOTAL: 168 (0.003%)
```

Interpretation:

- WS=400 SSOT is a **near-perfect steady state** for C4/C5/C6 inline slots.
- Overflow batching ROI is effectively zero: `push_full=0`, `pop_empty≈0.003%`.

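The percentages reported above follow directly from the raw counts. A small sketch of the arithmetic, assuming the rate is defined as events divided by total attempts on the same path; the helper name and constant names are hypothetical:

```c
#include <stdint.h>

/* Event rate as a percentage of total attempts; 0 when nothing was attempted. */
static double rate_percent(uint64_t events, uint64_t total) {
    return total ? 100.0 * (double)events / (double)total : 0.0;
}

/* Counts copied from the WS=400 SSOT observation (C4-C6 totals). */
static const uint64_t PUSH_TOTAL_C4_C6 = 4812031;
static const uint64_t POP_TOTAL_C4_C6  = 4812031;
static const uint64_t PUSH_FULL_TOTAL  = 0;
static const uint64_t POP_EMPTY_TOTAL  = 168;
```

With these inputs, `rate_percent(POP_EMPTY_TOTAL, POP_TOTAL_C4_C6)` is 168 / 4,812,031, or roughly 0.0035%, which the report shows as 0.003%. `rate_percent(PUSH_FULL_TOTAL, PUSH_TOTAL_C4_C6)` is exactly zero.
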
## Phase 88 ROI Decision: **NO-GO**

### Recommendation

**DO NOT IMPLEMENT Phase 88 (Batch Drain Optimization)**

### Rationale

1. **Overflow is essentially absent**: `push_full=0`, `pop_empty≈0.003%`.
2. **Batch drain overhead would dominate**: any additional logic is far more likely to incur layout/branch tax than to save work.
3. **This is already the desirable state**: inline slots are sized correctly for WS=400 SSOT.

### Cost-Benefit Analysis

- **Implementation Cost**: high (batch logic, tests, ongoing maintenance)
- **Benefit Under SSOT**: ~0% (overflow frequency too low)
- **Risk**: layout tax / regression in a hot-path-heavy code region

### Alternative Path (If overflow work is desired)

Use a research workload that intentionally produces misses/overflow (e.g. larger WS), and re-run this observation.
Do not use WS=400 SSOT for that validation.

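To illustrate why a larger working set would be needed to exercise overflow at all, here is a toy fixed-capacity ring with the same four counters. It is a sketch only: the capacity of 8, the `demo_` names, and the burst pattern are assumptions, and the real inline slots layout is not described in this document.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_CAP 8u  /* hypothetical capacity, not the real slot count */

typedef struct {
    void    *slots[DEMO_RING_CAP];
    uint32_t head;    /* index of the oldest entry */
    uint32_t count;   /* number of occupied slots */
    uint64_t push_total, push_full;
    uint64_t pop_total,  pop_empty;
} demo_ring_t;

/* Free path: returns 1 if the pointer was cached, 0 on overflow. */
static int demo_ring_push(demo_ring_t *r, void *p) {
    r->push_total++;
    if (r->count == DEMO_RING_CAP) { r->push_full++; return 0; }
    r->slots[(r->head + r->count) % DEMO_RING_CAP] = p;
    r->count++;
    return 1;
}

/* Alloc path: returns a cached pointer, or NULL on underflow. */
static void *demo_ring_pop(demo_ring_t *r) {
    r->pop_total++;
    if (r->count == 0) { r->pop_empty++; return NULL; }
    void *p = r->slots[r->head];
    r->head = (r->head + 1) % DEMO_RING_CAP;
    r->count--;
    return p;
}

/* Free `burst` blocks back to back, then allocate `burst` blocks. */
static void demo_burst(demo_ring_t *r, uint32_t burst) {
    static char dummy;  /* stand-in for a real block */
    for (uint32_t i = 0; i < burst; i++) (void)demo_ring_push(r, &dummy);
    for (uint32_t i = 0; i < burst; i++) (void)demo_ring_pop(r);
}
```

In this toy model a burst that fits within the capacity leaves `push_full` and `pop_empty` at zero, while a burst of 12 against a capacity of 8 records 4 overflows and 4 underflows. A research workload for Phase 88 would need to produce the second kind of pattern.
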
## Implementation Artifacts

### Files Created

- `core/box/tiny_inline_slots_overflow_stats_box.h` - Telemetry box header
- `core/box/tiny_inline_slots_overflow_stats_box.c` - Telemetry implementation
- `core/front/tiny_c{3,4,5,6}_inline_slots.h` - Updated with total counter calls

### Telemetry Infrastructure

- Atomic counters for thread-safe measurement
- Compile-time enabled (always in observation builds)
- Zero overhead when disabled (checked at init time)
- Percentage calculations for overflow rates

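A minimal sketch of what per-class atomic counters could look like with C11 `<stdatomic.h>`. The struct layout, the `demo_` names, and the choice of relaxed ordering are assumptions for illustration; the actual telemetry box may be organized differently.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8  /* C0-C7, matching the class range named in this report */

typedef struct {
    atomic_uint_fast64_t push_total;
    atomic_uint_fast64_t push_full;
    atomic_uint_fast64_t pop_total;
    atomic_uint_fast64_t pop_empty;
} demo_overflow_stats_t;

/* Static storage duration, so every counter starts at zero. */
static demo_overflow_stats_t g_demo_stats[DEMO_NUM_CLASSES];

/* Relaxed ordering is enough for pure event counting: each increment is
 * atomic, and no other memory is published through the counter. */
static inline void demo_stat_inc(atomic_uint_fast64_t *counter) {
    atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

static inline uint64_t demo_stat_read(atomic_uint_fast64_t *counter) {
    return (uint64_t)atomic_load_explicit(counter, memory_order_relaxed);
}
```

A call site would look like `demo_stat_inc(&g_demo_stats[4].push_total);` on the C4 free path, with the totals read once at exit to print the report.
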
## Conclusion

**Phase 87 observation (with fixed telemetry gating) confirms that inline slots are active and overflow is negligible for WS=400 SSOT.**
Phase 88 is therefore correctly frozen as NO-GO for SSOT performance work.

### Score: NO-GO ✗

- Expected Improvement: ~0% (overflow extremely rare)
- Actual Improvement: N/A (measurement-only)
- Implementation Burden: High (new code path, batch logic)
- Recommendation: Archive Phase 88; revisit only if a research workload (e.g. larger WS) shows measurable overflow