hakmem/docs/analysis/PHASE87_OBSERVATION_RESULTS.md
2025-12-19 03:45:01 +09:00

Phase 87: Inline Slots Overflow Observation Results

Objective

Measure inline slots overflow frequency (C3/C4/C5/C6) to determine if Phase 88 (batch drain optimization) is worth implementing.

Observation Setup

  • Workload: Mixed SSOT (WS=400, 16-1024B allocation sizes)
  • Operations: 20,000,000 random alloc/free operations
  • Runs: single-run observation (OBSERVE binary)
  • Configuration:
    • Route assignments: LEGACY for all C0-C7
    • Inline slots: C4/C5/C6 enabled (Phase 75/76), fixed mode ON (Phase 78), switch dispatch ON (Phase 80)

Critical Fix (measurement correctness)

An earlier observation run reported PUSH TOTAL/POP TOTAL = 0 for all classes. That result was a measurement artifact, not valid evidence that inline slots were unused. The root cause was how the telemetry was compile-gated:

  • tiny_inline_slots_overflow_enabled() is a header-only hot-path check.
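  • The original implementation relied on a #define inside tiny_inline_slots_overflow_stats_box.c. A macro defined in one .c file is not visible to other translation units, so every other file that included the header compiled the check as disabled.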
  • Fix: introduce HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED in core/hakmem_build_flags.h and make the enabled check depend on it.
  • OBSERVE build now enables it via Makefile: bench_random_mixed_hakmem_observe adds -DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1.
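The gating fix can be sketched as follows. This is a minimal illustration, not the actual hakmem source: only the flag name and tiny_inline_slots_overflow_enabled() come from the text above, while the default-to-0 fallback, the function body, and every `sketch_*` name are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed shape of the shared flag in core/hakmem_build_flags.h:
 * OFF by default, switched ON by the build with
 * -DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1. */
#ifndef HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
#define HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED 0
#endif

/* Header-only check. Because the flag comes from a shared header or the
 * compiler command line, every translation unit sees the same value.
 * A #define local to one .c file would not have that property. */
static inline int tiny_inline_slots_overflow_enabled(void) {
    return HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED;
}

/* Hypothetical counter and hot-path hook, for illustration only. */
static _Atomic uint64_t sketch_push_total;

static inline void sketch_count_push(void) {
    if (tiny_inline_slots_overflow_enabled()) {
        atomic_fetch_add_explicit(&sketch_push_total, 1, memory_order_relaxed);
    }
}

/* Sanity helper: with stats compiled out, the counter must stay at zero,
 * which is the symptom the earlier run showed. */
static inline void sketch_check_gating(void) {
    if (!tiny_inline_slots_overflow_enabled()) {
        assert(atomic_load(&sketch_push_total) == 0);
    }
}
```

Since the flag is a compile-time constant, the compiler can drop the counter update entirely when the flag is 0, so a build without the flag reports all-zero totals regardless of how often the hot path runs.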

Verified Result: inline slots are being called (WS=400 SSOT)

Total Operation Counts (Verification)

PUSH TOTAL (Free Path Attempts):
  C4: 687,564
  C5: 1,373,605
  C6: 2,750,862
  TOTAL (C4-C6): 4,812,031

POP TOTAL (Alloc Path Attempts):
  C4: 687,564
  C5: 1,373,605
  C6: 2,750,862
  TOTAL (C4-C6): 4,812,031

This confirms:

  • tiny_legacy_fallback_free_base_with_env() is being executed (LEGACY fallback path).
  • C4/C5/C6 inline slots push/pop are active in the LEGACY fallback/hot alloc paths.
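As a cross-check on the totals above, the per-class counts can be summed and compared mechanically. The sketch below hard-codes the numbers from this report; the `sketch_*` names are hypothetical and not part of hakmem.

```c
#include <assert.h>
#include <stdint.h>

enum { SKETCH_NCLASS = 3 };  /* C4, C5, C6 */

/* Per-class totals copied from the observation output in this report. */
static const uint64_t sketch_push_total[SKETCH_NCLASS] = { 687564, 1373605, 2750862 };
static const uint64_t sketch_pop_total[SKETCH_NCLASS]  = { 687564, 1373605, 2750862 };

static inline uint64_t sketch_sum(const uint64_t *v) {
    uint64_t s = 0;
    for (int i = 0; i < SKETCH_NCLASS; i++) s += v[i];
    return s;
}

/* Returns 1 when every class has as many pop attempts as push attempts. */
static inline int sketch_balanced(void) {
    for (int i = 0; i < SKETCH_NCLASS; i++) {
        if (sketch_push_total[i] != sketch_pop_total[i]) return 0;
    }
    return 1;
}

/* Asserts that the reported grand totals match the per-class rows. */
static inline void sketch_verify_totals(void) {
    assert(sketch_sum(sketch_push_total) == 4812031);
    assert(sketch_sum(sketch_pop_total) == 4812031);
    assert(sketch_balanced());
}
```

Non-zero push and pop totals that match in every class are what distinguish this run from the earlier one, where all totals were 0.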

Overflow / Underflow Rates (WS=400 SSOT)

PUSH FULL (Free Path Ring Overflow):
  TOTAL: 0 (0.00%)

POP EMPTY (Alloc Path Ring Underflow):
  TOTAL: 168 (0.003%)

Interpretation:

  • WS=400 SSOT is a near-perfect steady state for C4/C5/C6 inline slots.
  • Overflow batching ROI is effectively zero: push_full=0, pop_empty≈0.003%.
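The percentages above follow directly from the counters. A minimal sketch of that calculation, assuming each rate is events divided by attempts for the same path (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Rate of `events` (push_full or pop_empty) as a percentage of `attempts`
 * (push_total or pop_total). Returns 0.0 when there were no attempts. */
static inline double sketch_rate_percent(uint64_t events, uint64_t attempts) {
    assert(events <= attempts);
    return attempts ? 100.0 * (double)events / (double)attempts : 0.0;
}
```

With the figures from this run, `sketch_rate_percent(168, 4812031)` gives roughly 0.0035%, which the report shows as 0.003%, and `sketch_rate_percent(0, 4812031)` gives exactly 0.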

Phase 88 ROI Decision: NO-GO

Recommendation

DO NOT IMPLEMENT Phase 88 (Batch Drain Optimization)

Rationale

  1. Overflow is essentially absent: push_full=0, pop_empty≈0.003%.
  2. Batch drain overhead would dominate: any additional logic is far more likely to incur layout/branch tax than to save work.
  3. This is already the desirable state: inline slots are sized correctly for WS=400 SSOT.

Cost-Benefit Analysis

  • Implementation Cost: high (batch logic, tests, ongoing maintenance)
  • Benefit Under SSOT: ~0% (overflow frequency too low)
  • Risk: layout tax / regression in a hot-path-heavy code region

Alternative Path (If overflow work is desired)

Use a research workload that intentionally produces misses/overflow (e.g. larger WS), and re-run this observation. Do not use WS=400 SSOT for that validation.

Implementation Artifacts

Files Created / Updated

  • core/box/tiny_inline_slots_overflow_stats_box.h - Telemetry box header
  • core/box/tiny_inline_slots_overflow_stats_box.c - Telemetry implementation
  • core/front/tiny_c{3,4,5,6}_inline_slots.h - Updated with total counter calls

Telemetry Infrastructure

  • Atomic counters for thread-safe measurement
  • Compile-time gated by HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED (always enabled in OBSERVE builds)
  • Zero overhead when disabled (checked at init time)
  • Percentage calculations for overflow rates
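To make the four counters concrete, here is a minimal sketch of a fixed-capacity slot ring that counts attempts, overflows, and underflows with relaxed atomics. It is not hakmem's inline slots implementation: the capacity, struct layout, and all `sketch_*` names are assumptions, and the ring itself is treated as single-owner so that only the counters are atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_CAP 4u  /* hypothetical capacity; must be a power of two */

typedef struct {
    void    *slots[SKETCH_RING_CAP];
    uint32_t head;  /* next index to pop  */
    uint32_t tail;  /* next index to push */
    _Atomic uint64_t push_total, push_full, pop_total, pop_empty;
} sketch_ring_t;

/* Free path: returns 1 if the pointer was stored, 0 if the ring was full
 * (the caller would then fall back to a slower path). */
static inline int sketch_push(sketch_ring_t *r, void *p) {
    assert(r != NULL);
    atomic_fetch_add_explicit(&r->push_total, 1, memory_order_relaxed);
    if (r->tail - r->head == SKETCH_RING_CAP) {
        atomic_fetch_add_explicit(&r->push_full, 1, memory_order_relaxed);
        return 0;
    }
    r->slots[r->tail++ % SKETCH_RING_CAP] = p;
    return 1;
}

/* Alloc path: returns a cached pointer, or NULL if the ring was empty. */
static inline void *sketch_pop(sketch_ring_t *r) {
    assert(r != NULL);
    atomic_fetch_add_explicit(&r->pop_total, 1, memory_order_relaxed);
    if (r->head == r->tail) {
        atomic_fetch_add_explicit(&r->pop_empty, 1, memory_order_relaxed);
        return NULL;
    }
    return r->slots[r->head++ % SKETCH_RING_CAP];
}
```

In this model, PUSH FULL counts pushes rejected because the ring was at capacity, and POP EMPTY counts pops that found nothing cached. Relaxed ordering is enough here because the counters are only read for reporting and never used for synchronization.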

Conclusion

Phase 87 observation (with fixed telemetry gating) confirms that inline slots are active and overflow is negligible for WS=400 SSOT. Phase 88 is therefore correctly frozen as NO-GO for SSOT performance work.

Score: NO-GO ✗

  • Expected Improvement: ~0% (overflow extremely rare)
  • Actual Improvement: N/A (measurement-only)
  • Implementation Burden: High (new code path, batch logic)
  • Recommendation: Archive Phase 88; revisit only if a workload with measurable inline slots overflow becomes a target