
Moe Charm (CI) 2013514f7b Working state before pushing to cyu remote

2025-12-19 03:45:01 +09:00

3.9 KiB


Phase 87: Inline Slots Overflow Observation Results

Objective

Measure how often inline slots overflow (C3/C4/C5/C6) to determine whether Phase 88 (batch drain optimization) is worth implementing.

Observation Setup

Workload: Mixed SSOT (WS=400, 16-1024B allocation sizes)
Operations: 20,000,000 random alloc/free operations
Runs: single-run observation (OBSERVE binary)
Configuration:
- Route assignments: LEGACY for all C0-C7
- Inline slots: C4/C5/C6 enabled (Phase 75/76), fixed mode ON (Phase 78), switch dispatch ON (Phase 80)

Critical Fix (measurement correctness)

An earlier observation run reported PUSH TOTAL = 0 and POP TOTAL = 0 for every class. That result was a telemetry artifact, not valid evidence that inline slots were unused. The root cause was compile-time gating of the telemetry:

- tiny_inline_slots_overflow_enabled() is a header-only hot-path check, so it is compiled into every translation unit that includes the header.
- The original implementation relied on a #define inside tiny_inline_slots_overflow_stats_box.c. A #define in a .c file is visible only in that translation unit, so it did not apply to the other translation units that include the header.
- Fix: introduce HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED in core/hakmem_build_flags.h and make the enabled check depend on it.
- The OBSERVE build now enables it via the Makefile: bench_random_mixed_hakmem_observe adds -DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1.
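
The gating fix described above can be sketched roughly as follows. This is an illustration, not the actual hakmem source: only the macro name and the function name come from this report. The runtime flag g_overflow_stats_runtime_on and the function body are assumptions.

```c
/* Sketch only. The macro and function names appear in this report;
 * the runtime flag and the body are assumed for illustration. */
#include <stdbool.h>

/* Would live in a shared header (e.g. core/hakmem_build_flags.h) so that
 * every translation unit sees the same value. Default: compiled out.
 * The OBSERVE build overrides it with
 * -DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1. */
#ifndef HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
#define HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED 0
#endif

/* Hypothetical runtime switch, assumed to be set once at init. */
static bool g_overflow_stats_runtime_on = true;

/* Header-only hot-path check. When the macro is 0 the expression is a
 * compile-time constant false, so the compiler can drop the counter code
 * guarded by this check. */
static inline bool tiny_inline_slots_overflow_enabled(void) {
    return HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
           && g_overflow_stats_runtime_on;
}
```

Because the macro is defined in a header rather than inside one .c file, the check evaluates the same way in every file that includes it, which is the property the original implementation lacked.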

Verified Result: inline slots are being called (WS=400 SSOT)

Total Operation Counts (Verification)

PUSH TOTAL (Free Path Attempts):
  C4: 687,564
  C5: 1,373,605
  C6: 2,750,862
  TOTAL (C4-C6): 4,812,031

POP TOTAL (Alloc Path Attempts):
  C4: 687,564
  C5: 1,373,605
  C6: 2,750,862
  TOTAL (C4-C6): 4,812,031

This confirms:

✅ tiny_legacy_fallback_free_base_with_env() is being executed (LEGACY fallback path).
✅ C4/C5/C6 inline slots push/pop are active in the LEGACY fallback/hot alloc paths.

Overflow / Underflow Rates (WS=400 SSOT)

PUSH FULL (Free Path Ring Overflow):
  TOTAL: 0 (0.00%)

POP EMPTY (Alloc Path Ring Underflow):
  TOTAL: 168 (0.003%)
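
To make the four counters concrete, here is a minimal sketch of a fixed-capacity slot ring that counts push/pop attempts, full pushes and empty pops. All names (SlotRing, slot_ring_push, slot_ring_pop, SLOT_RING_CAP), the FIFO order and the capacity are hypothetical. The real inline slots in core/front/ may be laid out quite differently.

```c
/* Sketch only: a hypothetical ring, not the hakmem inline slots layout. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_RING_CAP 4u /* tiny capacity, chosen for illustration */

typedef struct {
    void    *slots[SLOT_RING_CAP];
    uint32_t head;       /* index of the next element to pop */
    uint32_t count;      /* number of occupied slots */
    uint64_t push_total; /* free-path attempts */
    uint64_t push_full;  /* attempts rejected because the ring was full */
    uint64_t pop_total;  /* alloc-path attempts */
    uint64_t pop_empty;  /* attempts that found the ring empty */
} SlotRing;

/* Free path: returns false on overflow; the caller would then fall back
 * to the slower path. */
static inline bool slot_ring_push(SlotRing *r, void *p) {
    r->push_total++;
    if (r->count == SLOT_RING_CAP) {
        r->push_full++;
        return false;
    }
    r->slots[(r->head + r->count) % SLOT_RING_CAP] = p;
    r->count++;
    return true;
}

/* Alloc path: returns NULL on underflow. */
static inline void *slot_ring_pop(SlotRing *r) {
    r->pop_total++;
    if (r->count == 0) {
        r->pop_empty++;
        return NULL;
    }
    void *p = r->slots[r->head];
    r->head = (r->head + 1) % SLOT_RING_CAP;
    r->count--;
    return p;
}
```

In these terms, PUSH FULL = 0 means no free ever hit the full branch, and POP EMPTY = 168 means 168 allocations took the NULL branch and had to be served elsewhere.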

Interpretation:

WS=400 SSOT is a near-perfect steady state for C4/C5/C6 inline slots.
Overflow batching ROI is effectively zero: push_full=0, pop_empty≈0.003%.
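
The percentages can be reproduced from the raw counts. The helper below is a sketch with a hypothetical name (overflow_rate_percent), and it assumes the denominator is the matching TOTAL count (POP TOTAL for POP EMPTY, PUSH TOTAL for PUSH FULL).

```c
/* Sketch only: hypothetical helper for turning event counts into rates. */
#include <stdint.h>

/* Returns events/total as a percentage; 0.0 when there were no attempts. */
static inline double overflow_rate_percent(uint64_t events, uint64_t total) {
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * (double)events / (double)total;
}
```

With the numbers above, overflow_rate_percent(168, 4812031) is roughly 0.0035, which the report shows as 0.003%, and overflow_rate_percent(0, 4812031) is exactly 0.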

Phase 88 ROI Decision: NO-GO

Recommendation

DO NOT IMPLEMENT Phase 88 (Batch Drain Optimization)

Rationale

Overflow is essentially absent: push_full=0, pop_empty≈0.003%.
Batch drain overhead would dominate: any additional logic is far more likely to incur layout/branch tax than to save work.
This is already the desirable state: inline slots are sized correctly for WS=400 SSOT.

Cost-Benefit Analysis

Implementation Cost: high (batch logic, tests, ongoing maintenance)
Benefit Under SSOT: ~0% (overflow frequency too low)
Risk: layout tax / regression in a hot-path-heavy code region

Alternative Path (If overflow work is desired)

Use a research workload that intentionally produces misses/overflow (e.g. larger WS), and re-run this observation. Do not use WS=400 SSOT for that validation.

Implementation Artifacts

Files Created / Updated

core/box/tiny_inline_slots_overflow_stats_box.h - Telemetry box header
core/box/tiny_inline_slots_overflow_stats_box.c - Telemetry implementation
core/front/tiny_c{3,4,5,6}_inline_slots.h - Updated with total counter calls

Telemetry Infrastructure

Atomic counters for thread-safe measurement
Enabled at compile time via HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED (always on in OBSERVE builds)
Zero overhead when disabled (checked at init time)
Percentage calculations for overflow rates
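
As a rough illustration of the atomic counters listed above, the sketch below uses C11 atomics with relaxed ordering. The struct layout and the names OverflowStats, g_stats, stats_inc and stats_read are assumptions, not the contents of tiny_inline_slots_overflow_stats_box.h.

```c
/* Sketch only: hypothetical telemetry counters using C11 <stdatomic.h>. */
#include <stdatomic.h>
#include <stdint.h>

#define TINY_CLASS_COUNT 8 /* classes C0-C7 */

typedef struct {
    _Atomic uint64_t push_total[TINY_CLASS_COUNT];
    _Atomic uint64_t push_full[TINY_CLASS_COUNT];
    _Atomic uint64_t pop_total[TINY_CLASS_COUNT];
    _Atomic uint64_t pop_empty[TINY_CLASS_COUNT];
} OverflowStats;

/* Static storage duration, so every counter starts at zero. */
static OverflowStats g_stats;

/* Relaxed ordering is enough for statistics: each increment is atomic,
 * and no other memory accesses need to be ordered around it. */
static inline void stats_inc(_Atomic uint64_t *counter) {
    atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

static inline uint64_t stats_read(_Atomic uint64_t *counter) {
    return atomic_load_explicit(counter, memory_order_relaxed);
}
```

A call such as stats_inc(&g_stats.push_total[4]) would sit behind the enabled check in the hot path, so that builds without the telemetry flag pay nothing for it.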

Conclusion

Phase 87 observation (with fixed telemetry gating) confirms that inline slots are active and overflow is negligible for WS=400 SSOT. Phase 88 is therefore correctly frozen as NO-GO for SSOT performance work.

Score: NO-GO ✗

Expected Improvement: ~0% (overflow extremely rare)
Actual Improvement: N/A (measurement-only)
Implementation Burden: High (new code path, batch logic)
Recommendation: Archive Phase 88; revisit only if a research workload shows measurable overflow
