hakmem/docs/analysis/PHASE23_DEFAULT_OFF_TAX_PRUNE_1_DESIGN.md
Moe Charm (CI) 8052e8b320 Phase 24-26: Hot path atomic telemetry prune (+2.00% cumulative)
Summary:
- Phase 24 (alloc stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness)
- Total: 11 atomics compiled-out, +2.00% improvement (Phase 24 + Phase 25; Phase 26 is within noise)

Phase 24: OBSERVE tax prune (tiny_class_stats_box.h)
- Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0)
- Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_*
- Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s)

Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h)
- Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0)
- Wrapped g_free_ss_enter atomic in free hot path
- Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s)

Phase 26: Hot path diagnostic atomics prune
- Added 5 compile gates for low-frequency error counters:
  - HAKMEM_TINY_C7_FREE_COUNT_COMPILED
  - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED
  - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED
  - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED
  - HAKMEM_TINY_HDR_META_FAST_COMPILED
- Result: -0.33% NEUTRAL (within noise, kept for cleanliness)

Alignment with mimalloc principles:
- "No atomics on hot path" - telemetry moved to compile-time opt-in
- Fixed per-op tax elimination
- Production builds: maximum performance (atomics compiled-out)
- Research builds: full diagnostics (COMPILED=1)

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 05:35:11 +09:00


Phase 23: Per-op Default-OFF Tax Prune (compile-out write-once + unified-cache measurement)

Status: NEUTRAL (compile gates are kept; no link-out)

Problem statement

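This re-applies the pattern identified earlier in Phase 22 (Research Box Prune):

  • a research feature is default OFF, yet
  • the hot path still pays for an if (enabled()) check, a TLS read, or a small branch on every call.

Once alloc/free have become fast enough, this kind of fixed per-op tax tends to be what is left. Because the cost is fixed, it accounts for a larger share of each operation as the rest of the path gets cheaper.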
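A minimal C sketch of the pattern described above, assuming a lazily cached, per-thread ENV read. The knob name HAKMEM_EXAMPLE_KNOB and the functions example_knob_enabled() and example_hot_op() are hypothetical illustrations, not hakmem code. With the knob left OFF, every call to example_hot_op() still performs the TLS load and the branch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ENV knob (illustrative only, not hakmem's actual code). */
static _Thread_local int t_example_knob = -1;  /* -1: not read yet, 0: OFF, 1: ON */

static inline int example_knob_enabled(void) {
    if (t_example_knob < 0) {  /* cold: first call on this thread reads the ENV */
        const char *e = getenv("HAKMEM_EXAMPLE_KNOB");
        t_example_knob = (e != NULL && strcmp(e, "1") == 0);
        assert(t_example_knob == 0 || t_example_knob == 1);
    }
    return t_example_knob;     /* hot: a TLS load plus a compare on every call */
}

/* Per-op hot path: while the knob stays OFF (the default), this still pays
 * the TLS read and the branch on every call. That is the fixed per-op tax. */
static inline unsigned example_hot_op(unsigned x) {
    if (example_knob_enabled()) {
        x ^= 0xA5u;  /* stand-in for research-only work */
    }
    return x + 1u;
}
```

The check is cheap but unconditional. It only disappears when the code around it is removed at build time.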

Goal

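Allow default-OFF knobs to be compiled out, pushing the fixed tax on both hot and cold paths toward zero.

  • compile-out: #if HAKMEM_*_COMPILED (the winning approach from Phase 22; used here)
  • link-out: dropping the .o from the Makefile (NO-GO in Phase 22-2; not used)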
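A minimal sketch of the compile-out approach. It assumes a gate that defaults to 0 and is overridden with -D...=1 for research builds. HAKMEM_EXAMPLE_KNOB_COMPILED and the example_* functions are hypothetical; only the #if HAKMEM_*_COMPILED pattern comes from this document. In the default build the preprocessor drops the ENV gate entirely, so the hot path carries no TLS read or branch for it. Link-out is not illustrated, since Phase 22-2 rejected it.

```c
#include <assert.h>

/* Hypothetical compile gate. Default 0 = compiled out; a research build passes
 * -DHAKMEM_EXAMPLE_KNOB_COMPILED=1 (e.g. through EXTRA_CFLAGS). */
#ifndef HAKMEM_EXAMPLE_KNOB_COMPILED
#define HAKMEM_EXAMPLE_KNOB_COMPILED 0
#endif

static_assert(HAKMEM_EXAMPLE_KNOB_COMPILED == 0 || HAKMEM_EXAMPLE_KNOB_COMPILED == 1,
              "HAKMEM_EXAMPLE_KNOB_COMPILED must be 0 or 1");

#if HAKMEM_EXAMPLE_KNOB_COMPILED
#include <stdlib.h>
#include <string.h>
/* Only research builds contain the ENV gate at all. */
static int example_knob_enabled(void) {
    static _Thread_local int cached = -1;  /* -1: ENV not read yet on this thread */
    if (cached < 0) {
        const char *e = getenv("HAKMEM_EXAMPLE_KNOB");
        cached = (e != NULL && strcmp(e, "1") == 0);
    }
    return cached;
}
#endif

static inline unsigned example_hot_op(unsigned x) {
#if HAKMEM_EXAMPLE_KNOB_COMPILED
    if (example_knob_enabled()) {
        x ^= 0xA5u;  /* stand-in for research-only work */
    }
#endif
    /* Default build: no TLS read and no branch here, just the real work. */
    return x + 1u;
}
```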

Scope (v1)

A) Phase 5 E5-2: Header Write-Once

Compile gate:

  • HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED=0/1 (default: 0)

Effect:

  • Even while HAKMEM_TINY_HEADER_WRITE_ONCE stays default OFF, tiny_header_finalize_alloc() evaluates the ENV gate on every call. Compiling the feature out removes that fixed tax.

Targets:

  • core/box/tiny_header_box.h: tiny_header_finalize_alloc()
  • core/front/tiny_unified_cache.c: unified_cache_prefill_headers()
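A sketch of how write-once might sit behind the compile gate. It assumes a 1-byte header that is either written on every alloc (the default) or prefilled once on refill and then skipped. The header layout, EXAMPLE_HEADER_WRITE_ONCE_COMPILED and the example_* functions are hypothetical stand-ins for tiny_header_finalize_alloc() and unified_cache_prefill_headers(), whose real signatures are not shown here. With the gate at 0, finalize becomes an unconditional store with no runtime check.

```c
#include <assert.h>
#include <stdint.h>

#ifndef EXAMPLE_HEADER_WRITE_ONCE_COMPILED
#define EXAMPLE_HEADER_WRITE_ONCE_COMPILED 0   /* default: compiled out */
#endif

/* Hypothetical 1-byte header: magic in the high nibble, class index in the low nibble. */
#define EXAMPLE_HDR_MAGIC 0xA0u

static inline uint8_t example_make_header(int class_idx) {
    assert(class_idx >= 0 && class_idx < 16);
    return (uint8_t)(EXAMPLE_HDR_MAGIC | (unsigned)class_idx);
}

#if EXAMPLE_HEADER_WRITE_ONCE_COMPILED
/* Runtime knob; a real build would set this from an ENV variable at init. */
static int g_example_write_once = 0;

/* Refill side: write each block's header once, up front. */
static void example_prefill_headers(uint8_t **blocks, int n, int class_idx) {
    for (int i = 0; i < n; i++) {
        *blocks[i] = example_make_header(class_idx);
    }
}
#endif

/* Alloc side: returns the user pointer just past the 1-byte header. */
static inline void *example_header_finalize_alloc(void *base, int class_idx) {
    uint8_t *hdr = (uint8_t *)base;
#if EXAMPLE_HEADER_WRITE_ONCE_COMPILED
    if (g_example_write_once) {
        return hdr + 1;  /* header already written by example_prefill_headers() */
    }
#endif
    *hdr = example_make_header(class_idx);  /* default build: unconditional store, no gate */
    return hdr + 1;
}
```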

B) Unified Cache measurement (ENV-gated instrumentation)

Compile gate:

  • HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED=0/1 (default: 0)

Effect:

  • Compiles out both the unified_cache_measure_check() call on the hot path and the measurement code on the refill side.

Targets:

  • core/front/tiny_unified_cache.h: hit-path measurement update (already guarded by #if)
  • core/front/tiny_unified_cache.c: refill-side measurement
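A sketch of one way to compile out hit-path measurement, assuming a simple array-backed cache. Instead of wrapping each call site in #if, the compiled-out branch supplies constant-false inline stubs, so the if (example_uc_measure_check()) test at the call site folds away. This stub technique is an alternative to the #if guard mentioned for tiny_unified_cache.h. All names here (EXAMPLE_UC_MEASURE_COMPILED, EXAMPLE_UC_MEASURE, example_cache and the example_* functions) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#ifndef EXAMPLE_UC_MEASURE_COMPILED
#define EXAMPLE_UC_MEASURE_COMPILED 0   /* default: compiled out */
#endif

#if EXAMPLE_UC_MEASURE_COMPILED
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
static _Atomic uint64_t g_example_uc_hits;

static int example_uc_measure_check(void) {
    static _Thread_local int cached = -1;  /* -1: ENV not read yet on this thread */
    if (cached < 0) {
        const char *e = getenv("EXAMPLE_UC_MEASURE");
        cached = (e != NULL && e[0] == '1');
    }
    return cached;
}
static void example_uc_record_hit(void) {
    atomic_fetch_add_explicit(&g_example_uc_hits, 1, memory_order_relaxed);
}
#else
/* Compiled out: constant-false stubs, so the call site folds away entirely. */
static inline int example_uc_measure_check(void) { return 0; }
static inline void example_uc_record_hit(void) {}
#endif

#define EXAMPLE_UC_CAP 8

typedef struct {
    void *slots[EXAMPLE_UC_CAP];
    int count;
} example_cache;

/* Hit path: pop one cached block; NULL means a miss (a real cache would refill). */
static inline void *example_cache_pop(example_cache *c) {
    assert(c->count >= 0 && c->count <= EXAMPLE_UC_CAP);
    if (c->count == 0) {
        return NULL;
    }
    void *p = c->slots[--c->count];
    if (example_uc_measure_check()) {
        example_uc_record_hit();
    }
    return p;
}
```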

Box Theory framing

  • BuildFlagsBox (core/hakmem_build_flags.h) defines the compile-time boundary.
  • Rollback is by build flag only (reversible at build time, not at runtime).
  • The link set is fixed (no .o files are removed).
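A sketch of what the BuildFlagsBox side could look like for the two gates in this phase. The gate names and their default of 0 come from this document. The #ifndef defaults, the static_assert checks and hakmem_example_research_build() are assumptions about shape, not the actual contents of core/hakmem_build_flags.h. Rolling back means rebuilding with -D...=1 (for example through EXTRA_CFLAGS), never a runtime switch.

```c
#include <assert.h>

/* Each gate defaults to 0 (compiled out) unless the build passes -D...=1. */
#ifndef HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED
#define HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED 0
#endif

#ifndef HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED
#define HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED 0
#endif

/* Reject values other than 0 or 1 at compile time. */
static_assert(HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED == 0 ||
              HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED == 1,
              "HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED must be 0 or 1");
static_assert(HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED == 0 ||
              HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED == 1,
              "HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED must be 0 or 1");

/* Hypothetical helper: non-zero when any research-only gate is compiled in. */
static inline int hakmem_example_research_build(void) {
    return HAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED ||
           HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED;
}
```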

A/B plan (build-level)

Principle: same code; only the compile gates are toggled.

  1. baseline (default, compiled out)
    • make clean && make -j bench_random_mixed_hakmem
    • scripts/run_mixed_10_cleanenv.sh
  2. compiled-in (research)
    • make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_HEADER_WRITE_ONCE_COMPILED=1 -DHAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED=1' bench_random_mixed_hakmem
    • scripts/run_mixed_10_cleanenv.sh
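The arithmetic behind comparing the two arms, sketched in C under two assumptions:

  • each arm is summarized by the mean of its runs;
  • the delta is the baseline (compiled-out) throughput relative to the compiled-in throughput, so a positive value means compiling the knobs out is faster.

The sign convention is inferred from the Phase 24/25 results in the commit message above. The output format of scripts/run_mixed_10_cleanenv.sh is not shown here, and ab_mean() and ab_delta_pct() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Mean throughput over n runs (e.g. the runs of one A/B arm). */
static double ab_mean(const double *ops_per_sec, size_t n) {
    assert(ops_per_sec != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += ops_per_sec[i];
    }
    return sum / (double)n;
}

/* Percent change of the baseline (compiled-out) arm relative to the
 * compiled-in arm; positive means compiling the knobs out is faster. */
static double ab_delta_pct(double baseline_mean, double compiled_in_mean) {
    assert(compiled_in_mean > 0.0);
    return (baseline_mean - compiled_in_mean) / compiled_in_mean * 100.0;
}
```

Plugging in the Phase 24 numbers (baseline 56.675M, compiled-in 56.151M ops/s) gives ab_delta_pct(56.675, 56.151) ≈ 0.933, which is the +0.93% reported in the commit message.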

GO/NO-GO

Because this kind of “prune” also shifts code layout, verdicts are applied conservatively:

  • GO: +0.5% or more
  • NEUTRAL: within ±0.5%
  • NO-GO: -0.5% or less (revert recommended)
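The thresholds above written as a classifier. This is a minimal sketch that takes the percent delta, assuming the same baseline-vs-compiled-in convention. It treats the exact ±0.5% boundaries as GO and NO-GO, following "or more" and "or less". ab_classify() and ab_verdict are hypothetical names.

```c
#include <assert.h>

typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } ab_verdict;

/* delta_pct: baseline (compiled-out) vs compiled-in throughput, in percent. */
static ab_verdict ab_classify(double delta_pct) {
    assert(delta_pct == delta_pct);   /* NaN compares unequal to itself */
    if (delta_pct >= 0.5) {
        return VERDICT_GO;            /* +0.5% or more */
    }
    if (delta_pct <= -0.5) {
        return VERDICT_NO_GO;         /* -0.5% or less: revert recommended */
    }
    return VERDICT_NEUTRAL;           /* strictly inside ±0.5% */
}
```

With the commit message's figures, Phase 24 (+0.93%) and Phase 25 (+1.07%) classify as GO and Phase 26 (-0.33%) as NEUTRAL, consistent with the recorded verdicts.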