Summary:
- Phase 24 (alloc stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness)
- Total: 11 atomics compiled-out, +2.00% improvement

Phase 24: OBSERVE tax prune (tiny_class_stats_box.h)
- Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0)
- Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_*
- Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s)

Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h)
- Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0)
- Wrapped g_free_ss_enter atomic in free hot path
- Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s)

Phase 26: Hot path diagnostic atomics prune
- Added 5 compile gates for low-frequency error counters:
  - HAKMEM_TINY_C7_FREE_COUNT_COMPILED
  - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED
  - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED
  - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED
  - HAKMEM_TINY_HDR_META_FAST_COMPILED
- Result: -0.33% NEUTRAL (within noise, kept for cleanliness)

Alignment with mimalloc principles:
- "No atomics on hot path" - telemetry moved to compile-time opt-in
- Fixed per-op tax elimination
- Production builds: maximum performance (atomics compiled-out)
- Research builds: full diagnostics (COMPILED=1)

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Phase 24: OBSERVE Tax Prune (compile out the hot-path atomics in tiny_class_stats)
Status: ✅ GO (keep the default: compiled-out)
Problem statement
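The Tiny hot path still contains atomic increments that exist only for observation (OBSERVE):
- core/box/tiny_class_stats_box.h: tiny_class_stats_on_*() calls atomic_fetch_add_explicit()
Observation serves research and diagnostics, so leaving it in as an always-on cost (a fixed tax) is a disadvantage, also by mimalloc's standards.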
Goal
Compile out the observation-only atomics to bring the fixed tax on the hot path toward zero.
- ✅ compile-out: #if HAKMEM_*_COMPILED (the winning approach from Phase 22)
- ❌ link-out: removing the .o from the Makefile (the NO-GO from Phase 22-2)
Scope (v1)
Targets (5 sites):
- tiny_class_stats_on_uc_miss(ci)
- tiny_class_stats_on_warm_hit(ci)
- tiny_class_stats_on_shared_lock(ci)
- tiny_class_stats_on_tls_carve_attempt(ci)
- tiny_class_stats_on_tls_carve_success(ci)
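Design (Box Theory)
BuildFlagsBox (compile-time boundary)
- core/hakmem_build_flags.h: HAKMEM_TINY_CLASS_STATS_COMPILED=0/1 (default: 0)
API unchanged (reversible / does not pollute the structure)
- The function signatures of tiny_class_stats_on_*() are kept
- When compiled out, the functions are no-ops (the unused-argument warning is suppressed with (void)ci;)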
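
A minimal sketch of the compile gate described above, not the project's actual header. The flag name, the function name, and the (void)ci; idiom come from this note; the counter array, SKETCH_NUM_CLASSES, the relaxed memory order, and the sketch_uc_miss_count() reader are assumptions made only so the example stands alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Per this note, the flag is defined in core/hakmem_build_flags.h (default: 0). */
#ifndef HAKMEM_TINY_CLASS_STATS_COMPILED
#define HAKMEM_TINY_CLASS_STATS_COMPILED 0
#endif

/* Hypothetical: class count and counter storage are invented for this sketch. */
#define SKETCH_NUM_CLASSES 8
static _Atomic uint64_t g_sketch_uc_miss[SKETCH_NUM_CLASSES];

/* Same signature in both builds, so call sites never change. */
static inline void tiny_class_stats_on_uc_miss(int ci) {
#if HAKMEM_TINY_CLASS_STATS_COMPILED
    assert(ci >= 0 && ci < SKETCH_NUM_CLASSES);
    /* Assumed memory order; the note only names atomic_fetch_add_explicit(). */
    atomic_fetch_add_explicit(&g_sketch_uc_miss[ci], 1, memory_order_relaxed);
#else
    (void)ci; /* compiled out: no atomic, no unused-parameter warning */
#endif
}

/* Hypothetical reader, used only to observe the effect of the gate. */
static inline uint64_t sketch_uc_miss_count(int ci) {
    assert(ci >= 0 && ci < SKETCH_NUM_CLASSES);
    return atomic_load_explicit(&g_sketch_uc_miss[ci], memory_order_relaxed);
}
```

With the default of 0 the call compiles to nothing and the counter stays at zero; building with -DHAKMEM_TINY_CLASS_STATS_COMPILED=1 restores the increment without touching any caller.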
A/B plan (build-level)
- baseline (default compile-out)
  make clean && make -j bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
- compiled-in (for research)
  make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_CLASS_STATS_COMPILED=1' bench_random_mixed_hakmem
  scripts/run_mixed_10_cleanenv.sh
GO/NO-GO (conservative operation)
A "prune" of this kind also changes code layout, so the verdict is applied conservatively:
- GO: +0.5% or more
- NEUTRAL: within ±0.5%
- NO-GO: -0.5% or less (revert recommended)
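
The thresholds above can be sketched as a small classifier. The helper names and the enum are hypothetical. Dividing by the compiled-in throughput is an assumption, chosen because it reproduces the reported Phase 24 figure (56.675M vs 56.151M ops/s gives about +0.93%). Treating exactly ±0.5% as GO and NO-GO respectively is also an assumption about the boundary cases.

```c
#include <assert.h>

/* Hypothetical verdict type for this sketch. */
typedef enum { SKETCH_NO_GO = -1, SKETCH_NEUTRAL = 0, SKETCH_GO = 1 } sketch_verdict;

/* Percent gain of the compiled-out baseline over the compiled-in build. */
static inline double sketch_delta_percent(double baseline_ops, double compiled_in_ops) {
    assert(compiled_in_ops > 0.0);
    return (baseline_ops / compiled_in_ops - 1.0) * 100.0;
}

/* Apply the conservative thresholds: GO >= +0.5%, NO-GO <= -0.5%, else NEUTRAL. */
static inline sketch_verdict sketch_classify(double delta_percent) {
    if (delta_percent >= 0.5) return SKETCH_GO;
    if (delta_percent <= -0.5) return SKETCH_NO_GO;
    return SKETCH_NEUTRAL;
}
```

For example, sketch_classify(sketch_delta_percent(56.675, 56.151)) yields SKETCH_GO, while a delta of -0.33% falls in the NEUTRAL band.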