hakmem/docs/analysis/PHASE22_RESEARCH_BOX_PRUNE_1_DESIGN.md
Moe Charm (CI) 8052e8b320 Phase 24-26: Hot path atomic telemetry prune (+2.00% cumulative)
Summary:
- Phase 24 (alloc stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness)
- Total: 11 atomics compiled-out, +2.00% improvement

Phase 24: OBSERVE tax prune (tiny_class_stats_box.h)
- Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0)
- Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_*
- Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s)

Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h)
- Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0)
- Wrapped g_free_ss_enter atomic in free hot path
- Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s)

Phase 26: Hot path diagnostic atomics prune
- Added 5 compile gates for low-frequency error counters:
  - HAKMEM_TINY_C7_FREE_COUNT_COMPILED
  - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED
  - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED
  - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED
  - HAKMEM_TINY_HDR_META_FAST_COMPILED
- Result: -0.33% NEUTRAL (within noise, kept for cleanliness)

Alignment with mimalloc principles:
- "No atomics on hot path" - telemetry moved to compile-time opt-in
- Fixed per-op tax elimination
- Production builds: maximum performance (atomics compiled-out)
- Research builds: full diagnostics (COMPILED=1)

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 05:35:11 +09:00

Phase 22: Research Box Prune (compile-out default-OFF boxes)

Goal

Remove per-op overhead from default-OFF research boxes by compiling them out of hot paths.

This targets the pattern:

  • the feature is default OFF
  • but the hot path still pays for an if (enabled()) check and/or carries the extra generated code
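
As a minimal sketch of this pattern and of the compile-out fix, the fragment below guards a research path with both a build-time gate and a runtime gate. All names in it (RESEARCH_BOX_COMPILED, the RESEARCH_BOX environment variable, research_box_enabled, research_box_try_alloc, hot_alloc) are hypothetical and chosen only for illustration; they are not taken from the hakmem sources.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical compile gate: 0 removes the research box from the build. */
#ifndef RESEARCH_BOX_COMPILED
#define RESEARCH_BOX_COMPILED 0
#endif

#if RESEARCH_BOX_COMPILED
/* Runtime gate, default OFF. Reads a hypothetical ENV variable once.
 * The cache is not thread-safe; this is a sketch only. */
static int research_box_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char *e = getenv("RESEARCH_BOX");
        cached = (e != NULL && e[0] == '1') ? 1 : 0;
    }
    return cached;
}

/* Placeholder for the experimental path; always misses here. */
static void *research_box_try_alloc(size_t size) {
    (void)size;
    return NULL;
}
#endif

/* Stand-in for the existing, always-present allocation path. */
static void *baseline_alloc(size_t size) {
    return malloc(size);
}

void *hot_alloc(size_t size) {
#if RESEARCH_BOX_COMPILED
    /* Only research builds pay for this branch and its code. */
    if (research_box_enabled()) {
        void *p = research_box_try_alloc(size);
        if (p != NULL) return p;
    }
#endif
    return baseline_alloc(size);
}
```

With the gate at 0 the preprocessor drops the branch entirely, so hot_alloc reduces to a direct call to baseline_alloc and the default build contains no trace of the research box.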

Box Theory framing

  • Treat this as a build-time box boundary:
    • default build: research boxes compiled-out (zero runtime overhead)
    • research build: boxes compiled-in (runtime ENV controls allowed)
  • Rollback is build-flag only (no behavioral risk in default build).

Scope (v1)

Phase 14: Tiny tcache (intrusive LIFO)

Compile gate:

  • HAKMEM_TINY_TCACHE_COMPILED=0/1 (default: 0)

Integration points:

  • core/front/tiny_unified_cache.h:
    • wrap tiny_tcache_try_push/pop() callsites with #if HAKMEM_TINY_TCACHE_COMPILED
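
The wrapping described above might look roughly like the following sketch. Only the gate name HAKMEM_TINY_TCACHE_COMPILED and the function names tiny_tcache_try_push/pop come from this note; their signatures, the single-list layout, and the demo_cache_* stand-in for the unified cache are assumptions made for illustration. The runtime ENV check is omitted for brevity.

```c
#include <stddef.h>

#ifndef HAKMEM_TINY_TCACHE_COMPILED
#define HAKMEM_TINY_TCACHE_COMPILED 0
#endif

#if HAKMEM_TINY_TCACHE_COMPILED
/* Assumed shape: one intrusive LIFO per thread. The first word of a
 * free block stores the pointer to the next free block. */
static _Thread_local void *g_tcache_head = NULL;

static int tiny_tcache_try_push(void *block) {
    *(void **)block = g_tcache_head;
    g_tcache_head = block;
    return 1;
}

static void *tiny_tcache_try_pop(void) {
    void *block = g_tcache_head;
    if (block != NULL) g_tcache_head = *(void **)block;
    return block;
}
#endif

/* Hypothetical stand-in for the existing cache: a small array stack. */
#define DEMO_CACHE_CAP 8u
static void *g_demo_slots[DEMO_CACHE_CAP];
static size_t g_demo_count = 0;

int demo_cache_push(void *block) {
#if HAKMEM_TINY_TCACHE_COMPILED
    if (tiny_tcache_try_push(block)) return 1;
#endif
    if (g_demo_count == DEMO_CACHE_CAP) return 0; /* full: caller falls back */
    g_demo_slots[g_demo_count++] = block;
    return 1;
}

void *demo_cache_pop(void) {
#if HAKMEM_TINY_TCACHE_COMPILED
    void *hit = tiny_tcache_try_pop();
    if (hit != NULL) return hit;
#endif
    if (g_demo_count == 0) return NULL; /* empty: caller refills */
    return g_demo_slots[--g_demo_count];
}
```

In the default build neither the tcache functions nor their callsites exist in the object code, which is the property this phase is after.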

Phase 15: UnifiedCache FIFO↔LIFO mode switch

Compile gate:

  • HAKMEM_TINY_UNIFIED_LIFO_COMPILED=0/1 (default: 0)

Integration points:

  • core/box/tiny_front_hot_box.h:
    • wrap tiny_unified_lifo_enabled() mode check + LIFO fast path with #if HAKMEM_TINY_UNIFIED_LIFO_COMPILED
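
A sketch of how the mode check could be fenced off is shown below. The gate name, the HAKMEM_TINY_UNIFIED_LIFO environment variable, and tiny_unified_lifo_enabled() are named in this note; the function's signature, the way the variable is parsed, and the DemoRing type with its demo_ring_* helpers are assumptions for illustration and do not reflect the actual UnifiedCache layout.

```c
#include <stddef.h>
#include <stdlib.h>

#ifndef HAKMEM_TINY_UNIFIED_LIFO_COMPILED
#define HAKMEM_TINY_UNIFIED_LIFO_COMPILED 0
#endif

#define DEMO_RING_CAP 8u /* power of two so index masking works */

typedef struct {
    void *slots[DEMO_RING_CAP];
    unsigned head; /* next slot to pop in FIFO mode */
    unsigned tail; /* next slot to push */
} DemoRing;

#if HAKMEM_TINY_UNIFIED_LIFO_COMPILED
/* Assumed signature and parsing; cached read, not thread-safe. */
static int tiny_unified_lifo_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char *e = getenv("HAKMEM_TINY_UNIFIED_LIFO");
        cached = (e != NULL && e[0] == '1') ? 1 : 0;
    }
    return cached;
}
#endif

int demo_ring_push(DemoRing *r, void *block) {
    if (r->tail - r->head == DEMO_RING_CAP) return 0; /* full */
    r->slots[r->tail++ & (DEMO_RING_CAP - 1u)] = block;
    return 1;
}

void *demo_ring_pop(DemoRing *r) {
    if (r->tail == r->head) return NULL; /* empty */
#if HAKMEM_TINY_UNIFIED_LIFO_COMPILED
    /* Research builds only: take the most recently pushed block. */
    if (tiny_unified_lifo_enabled()) {
        return r->slots[--r->tail & (DEMO_RING_CAP - 1u)];
    }
#endif
    /* Default FIFO path: take the oldest block. */
    return r->slots[r->head++ & (DEMO_RING_CAP - 1u)];
}
```

With the gate at 0, demo_ring_pop contains only the FIFO path, so neither the mode check nor the LIFO branch is emitted.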

Implementation notes

  • Compile gates live in core/hakmem_build_flags.h.
  • Runtime ENV gates (HAKMEM_TINY_TCACHE, HAKMEM_TINY_UNIFIED_LIFO) remain valid for research builds (i.e. when the compile gate is 1).
  • Default builds keep these features fully absent from hot paths.
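
The entries in core/hakmem_build_flags.h could plausibly take the following form. This is a sketch rather than the file's actual contents: the two gate names and their default of 0 come from this note, while the value checks and the demo_research_boxes_compiled helper are illustrative additions.

```c
/* Default both research boxes to compiled-out unless the build
 * overrides them, e.g. with -DHAKMEM_TINY_TCACHE_COMPILED=1. */
#ifndef HAKMEM_TINY_TCACHE_COMPILED
#define HAKMEM_TINY_TCACHE_COMPILED 0
#endif

#ifndef HAKMEM_TINY_UNIFIED_LIFO_COMPILED
#define HAKMEM_TINY_UNIFIED_LIFO_COMPILED 0
#endif

/* Illustrative sanity checks: reject anything other than 0 or 1. */
#if (HAKMEM_TINY_TCACHE_COMPILED != 0) && (HAKMEM_TINY_TCACHE_COMPILED != 1)
#error "HAKMEM_TINY_TCACHE_COMPILED must be 0 or 1"
#endif

#if (HAKMEM_TINY_UNIFIED_LIFO_COMPILED != 0) && (HAKMEM_TINY_UNIFIED_LIFO_COMPILED != 1)
#error "HAKMEM_TINY_UNIFIED_LIFO_COMPILED must be 0 or 1"
#endif

/* Hypothetical helper: how many research boxes this build contains. */
int demo_research_boxes_compiled(void) {
    return HAKMEM_TINY_TCACHE_COMPILED + HAKMEM_TINY_UNIFIED_LIFO_COMPILED;
}
```

Defining the gates through #ifndef keeps rollback a pure build-flag change: a research build passes the -D overrides, and the default build needs no flags at all.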

A/B plan

Use the standard Mixed A/B:

  • scripts/run_mixed_10_cleanenv.sh

Compare:

  • Phase 21 baseline (HOTFULL=1, compile gates OFF → default)
  • Phase 21 + Phase 22 (compile gates OFF, so the research-box callsites are compiled out)

GO/NO-GO

  • GO: Mixed 10-run mean +1.0% or more
  • NEUTRAL: within ±1.0%
  • NO-GO: -1.0% or worse
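
The decision rule above can be sketched as a small helper that turns two sets of throughput samples into a verdict. The function and type names are hypothetical, and the handling of the exact ±1.0% boundary (treated as GO and NO-GO respectively, with NEUTRAL strictly in between) is an assumption, since the thresholds as written overlap at that point.

```c
#include <stddef.h>

typedef enum {
    VERDICT_NO_GO = -1,
    VERDICT_NEUTRAL = 0,
    VERDICT_GO = 1
} Verdict;

/* Arithmetic mean of n samples; returns 0.0 for an empty set. */
static double demo_mean(const double *v, size_t n) {
    double sum = 0.0;
    if (n == 0) return 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Percent change of the candidate mean relative to the baseline mean.
 * Returns 0.0 if the baseline mean is zero to avoid dividing by it. */
double ab_delta_percent(const double *base, const double *cand, size_t n) {
    double mb = demo_mean(base, n);
    double mc = demo_mean(cand, n);
    if (mb == 0.0) return 0.0;
    return (mc - mb) * 100.0 / mb;
}

/* Apply the thresholds: >= +1.0 is GO, <= -1.0 is NO-GO, else NEUTRAL. */
Verdict ab_verdict(double delta_percent) {
    if (delta_percent >= 1.0) return VERDICT_GO;
    if (delta_percent <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

For a Mixed 10-run comparison, the ten ops/s figures from each configuration would be passed as base and cand with n = 10, and the resulting delta fed to ab_verdict.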