# Phase 87: Inline Slots Overflow Observation Results
## Objective
Measure inline slots overflow frequency (C3/C4/C5/C6) to determine if Phase 88 (batch drain optimization) is worth implementing.
## Observation Setup
- **Workload**: Mixed SSOT (WS=400, 16-1024B allocation sizes)
- **Operations**: 20,000,000 random alloc/free operations
- **Runs**: single-run observation (OBSERVE binary)
- **Configuration**:
  - Route assignments: LEGACY for all C0-C7
  - Inline slots: C4/C5/C6 enabled (Phase 75/76), fixed mode ON (Phase 78), switch dispatch ON (Phase 80)
## Critical Fix (measurement correctness)
An earlier observation run reported `PUSH TOTAL/POP TOTAL = 0` for all classes.
That was **not** valid evidence that inline slots were unused.
Root cause was **telemetry compile gating**:
- `tiny_inline_slots_overflow_enabled()` is a header-only hot-path check.
- The original implementation relied on a `#define` inside `tiny_inline_slots_overflow_stats_box.c`, which is visible only in that translation unit. Every other translation unit that included the header therefore saw the check as disabled.
- Fix: introduce `HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED` in `core/hakmem_build_flags.h` and make the enabled check depend on it.
- OBSERVE build now enables it via Makefile: `bench_random_mixed_hakmem_observe` adds `-DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1`.
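
The gating fix described above can be sketched as follows. This is a minimal illustration, not the actual hakmem code: the macro name and the function name come from this document, but the function body, the default value of the flag, and the `sketch_*` counter and hook are assumptions made for the example.

```c
/* Sketch of a shared build-flags header (the role of core/hakmem_build_flags.h).
 * Assumption: the flag defaults to 0 unless the build passes
 * -DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1. */
#ifndef HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
#define HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED 0
#endif

/* Header-only hot-path check. It is inlined into every translation unit that
 * includes it, so the macro must be visible in all of them. A #define placed
 * inside a single .c file is seen only by that file. */
static inline int tiny_inline_slots_overflow_enabled(void) {
#if HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
    return 1;
#else
    return 0; /* the counting branch below is dead code in this configuration */
#endif
}

/* Hypothetical counter and hook, for illustration only. */
static unsigned long sketch_push_total;

static inline void sketch_record_push(void) {
    if (tiny_inline_slots_overflow_enabled()) {
        sketch_push_total++;
    }
}
```

With the flag left at its default, `sketch_record_push()` leaves the counter at zero, which is the same symptom as the earlier `PUSH TOTAL/POP TOTAL = 0` run.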
## Verified Result: inline slots **are** being called (WS=400 SSOT)
### Total Operation Counts (Verification)
```
PUSH TOTAL (Free Path Attempts):
  C4:             687,564
  C5:           1,373,605
  C6:           2,750,862
  TOTAL (C4-C6): 4,812,031
POP TOTAL (Alloc Path Attempts):
  C4:             687,564
  C5:           1,373,605
  C6:           2,750,862
  TOTAL (C4-C6): 4,812,031
```
This confirms:
- ✅ `tiny_legacy_fallback_free_base_with_env()` is being executed (LEGACY fallback path).
- ✅ C4/C5/C6 inline slots push/pop are active in the LEGACY fallback/hot alloc paths.
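
The totals above can be cross-checked with a few lines of C. This is a sketch for sanity-checking the report, not part of the telemetry box. The per-class values are copied from the output above, and the helper names (`sketch_sum3`, `sketch_all_nonzero`) are hypothetical.

```c
#include <stdint.h>

/* Per-class totals copied from the observation output, in order C4, C5, C6. */
static const uint64_t sketch_push_totals[3] = { 687564, 1373605, 2750862 };
static const uint64_t sketch_pop_totals[3]  = { 687564, 1373605, 2750862 };

/* Sum of the three per-class counters. */
static uint64_t sketch_sum3(const uint64_t v[3]) {
    return v[0] + v[1] + v[2];
}

/* Returns 1 when every class recorded traffic. An all-zero result is the
 * failure mode described under "Critical Fix" and should be treated as a
 * telemetry problem, not as evidence that inline slots are unused. */
static int sketch_all_nonzero(const uint64_t v[3]) {
    for (int i = 0; i < 3; i++) {
        if (v[i] == 0) {
            return 0;
        }
    }
    return 1;
}
```

Both arrays sum to 4,812,031, which matches the reported `TOTAL (C4-C6)` lines.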
## Overflow / Underflow Rates (WS=400 SSOT)
```
PUSH FULL (Free Path Ring Overflow):
  TOTAL: 0 (0.00%)
POP EMPTY (Alloc Path Ring Underflow):
  TOTAL: 168 (0.003%)
```
Interpretation:
- WS=400 SSOT is a **near-perfect steady state** for C4/C5/C6 inline slots.
- Overflow batching ROI is effectively zero: `push_full=0`, `pop_empty≈0.003%`.
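
The percentages can be reproduced as shown below. This sketch assumes the denominator for the underflow rate is the C4-C6 `POP TOTAL` (4,812,031), which is consistent with the reported 0.003% but is not stated explicitly in the output. The function name is hypothetical.

```c
#include <stdint.h>

/* Event rate as a percentage of total attempts.
 * Returns 0.0 for an empty denominator to avoid dividing by zero. */
static double sketch_rate_percent(uint64_t events, uint64_t total) {
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * (double)events / (double)total;
}
```

For example, `sketch_rate_percent(168, 4812031)` is about 0.0035, and `sketch_rate_percent(0, 4812031)` is exactly 0.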
## Phase 88 ROI Decision: **NO-GO**
### Recommendation
**DO NOT IMPLEMENT Phase 88 (Batch Drain Optimization)**
### Rationale
1. **Overflow is essentially absent**: `push_full=0`, `pop_empty≈0.003%`.
2. **Batch drain overhead would dominate**: any additional logic is far more likely to incur layout/branch tax than to save work.
3. **This is already the desirable state**: inline slots are sized correctly for WS=400 SSOT.
### Cost-Benefit Analysis
- **Implementation Cost**: high (batch logic, tests, ongoing maintenance)
- **Benefit Under SSOT**: ~0% (overflow frequency too low)
- **Risk**: layout tax / regression in a hot-path-heavy code region
### Alternative Path (If overflow work is desired)
Use a research workload that intentionally produces misses/overflow (e.g. larger WS), and re-run this observation.
Do not use WS=400 SSOT for that validation.
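
To see why a different workload is needed, consider a toy fixed-capacity ring with the same four counters. This is an illustrative model only: the capacity, the struct layout, and all `sketch_*` names are assumptions and do not reflect the real inline slots implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_CAP 8u /* hypothetical capacity, not the real slot count */

typedef struct {
    void    *slots[SKETCH_RING_CAP];
    uint32_t head;  /* index of the next slot to pop */
    uint32_t count; /* number of occupied slots */
    uint64_t push_total, push_full, pop_total, pop_empty;
} sketch_ring_t;

/* Free path: returns 1 on success, 0 when the ring is full (overflow). */
static int sketch_push(sketch_ring_t *r, void *p) {
    r->push_total++;
    if (r->count == SKETCH_RING_CAP) {
        r->push_full++;
        return 0;
    }
    r->slots[(r->head + r->count) % SKETCH_RING_CAP] = p;
    r->count++;
    return 1;
}

/* Alloc path: returns NULL when the ring is empty (underflow). */
static void *sketch_pop(sketch_ring_t *r) {
    r->pop_total++;
    if (r->count == 0) {
        r->pop_empty++;
        return NULL;
    }
    void *p = r->slots[r->head];
    r->head = (r->head + 1) % SKETCH_RING_CAP;
    r->count--;
    return p;
}

/* Free `burst` blocks back-to-back, then allocate `burst` blocks. */
static void sketch_burst(sketch_ring_t *r, unsigned burst) {
    static char dummy; /* stand-in for a freed block */
    for (unsigned i = 0; i < burst; i++) {
        (void)sketch_push(r, &dummy);
    }
    for (unsigned i = 0; i < burst; i++) {
        (void)sketch_pop(r);
    }
}
```

In this model a burst of 4 on an empty ring records no overflow or underflow, while a burst of 12 records 4 of each. A workload has to push the ring past its capacity in this way before batch draining has anything to optimize.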
## Implementation Artifacts
### Files Created
- `core/box/tiny_inline_slots_overflow_stats_box.h` - Telemetry box header
- `core/box/tiny_inline_slots_overflow_stats_box.c` - Telemetry implementation
- `core/front/tiny_c{3,4,5,6}_inline_slots.h` - Updated with total counter calls
### Telemetry Infrastructure
- Atomic counters for thread-safe measurement
- Compile-time enabled (always in observation builds)
- Zero overhead when disabled (checked at init time)
- Percentage calculations for overflow rates
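
A minimal sketch of the atomic-counter part, using C11 `<stdatomic.h>`. The struct layout, the relaxed memory ordering, and all `sketch_*` names are assumptions for illustration. The actual telemetry box may be organized differently.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NUM_CLASSES 8 /* C0-C7, matching the route assignment range */

/* One set of counters per size class. */
typedef struct {
    _Atomic uint64_t push_total;
    _Atomic uint64_t push_full;
    _Atomic uint64_t pop_total;
    _Atomic uint64_t pop_empty;
} sketch_overflow_stats_t;

/* Static storage is zero-initialized, so all counters start at 0. */
static sketch_overflow_stats_t g_sketch_stats[SKETCH_NUM_CLASSES];

/* Increment a counter. Relaxed ordering is enough here because the counters
 * are only summed for reporting and never used to synchronize other data. */
static inline void sketch_count(_Atomic uint64_t *c) {
    atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* Read a counter for reporting. */
static inline uint64_t sketch_read(_Atomic uint64_t *c) {
    return atomic_load_explicit(c, memory_order_relaxed);
}
```

A hot-path hook would call `sketch_count(&g_sketch_stats[cls].push_total)` on each attempt, and the report would read the values back with `sketch_read()`.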
## Conclusion
**Phase 87 observation (with fixed telemetry gating) confirms that inline slots are active and overflow is negligible for WS=400 SSOT.**
Phase 88 is therefore correctly frozen as NO-GO for SSOT performance work.
### Score: NO-GO ✗
- Expected Improvement: ~0% (overflow extremely rare)
- Actual Improvement: N/A (measurement-only)
- Implementation Burden: High (new code path, batch logic)
- Recommendation: Archive Phase 88; revisit only if a research workload shows measurable inline slots overflow