hakmem/docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md
Moe Charm (CI) 8052e8b320 Phase 24-26: Hot path atomic telemetry prune (+2.00% cumulative)
Summary:
- Phase 24 (alloc stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness)
- Total: 11 atomics compiled-out, +2.00% improvement

Phase 24: OBSERVE tax prune (tiny_class_stats_box.h)
- Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0)
- Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_*
- Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s)

Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h)
- Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0)
- Wrapped g_free_ss_enter atomic in free hot path
- Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s)

Phase 26: Hot path diagnostic atomics prune
- Added 5 compile gates for low-frequency error counters:
  - HAKMEM_TINY_C7_FREE_COUNT_COMPILED
  - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED
  - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED
  - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED
  - HAKMEM_TINY_HDR_META_FAST_COMPILED
- Result: -0.33% NEUTRAL (within noise, kept for cleanliness)

Alignment with mimalloc principles:
- "No atomics on hot path" - telemetry moved to compile-time opt-in
- Fixed per-op tax elimination
- Production builds: maximum performance (atomics compiled-out)
- Research builds: full diagnostics (COMPILED=1)

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 05:35:11 +09:00


Performance Targets (numeric targets for catching up with mimalloc)

Goal: pin down the path to winning, covering not only speed but also syscall behavior, memory stability, and long-run stability.

Current snapshot (2025-12-16, local)

Measurement conditions (the canonical setup for reproduction):

  • hakmem: scripts/run_mixed_10_cleanenv.sh (ITERS=20000000 WS=400, profile=MIXED_TINYV3_C7_SAFE)
  • system/mimalloc: ./bench_random_mixed_system 20000000 400 1 / ./bench_random_mixed_mi 20000000 400 1 (10 runs each)
  • same-binary libc: HAKMEM_FORCE_LIBC_ALLOC=1 scripts/run_mixed_10_cleanenv.sh (10 runs)
  • Git: HEAD=4d9429e14

Results (10-run mean/median):

| allocator | mean (M ops/s) | median (M ops/s) | ratio vs mimalloc (mean) |
|---|---:|---:|---:|
| hakmem | 54.646 | 54.671 | 46.2% |
| libc (same binary) | 76.257 | 76.661 | 64.5% |
| system (separate) | 81.540 | 81.801 | 69.0% |
| mimalloc (separate) | 118.176 | 118.497 | 100% |

Notes:

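  • system/mimalloc are measured in separate binaries, so they are reference values that include layout differences (text size / I-cache).
  • libc (same binary) is measured with HAKMEM_FORCE_LIBC_ALLOC=1 and serves as a guide for comparison under an identical layout.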
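
The ratio column in the snapshot can be reproduced from the mean values alone. The following is a minimal sketch, assuming the ratio is simply each allocator's mean divided by the mimalloc (separate) mean; the dictionary and function names are illustrative and not part of hakmem.

```python
# Mean throughput in M ops/s, copied from the snapshot table.
MEANS = {
    "hakmem": 54.646,
    "libc (same binary)": 76.257,
    "system (separate)": 81.540,
    "mimalloc (separate)": 118.176,
}


def ratio_pct(value, reference):
    """Return value as a percentage of reference."""
    return 100.0 * value / reference


# Ratio of every allocator against the mimalloc mean.
ratios = {
    name: ratio_pct(mean, MEANS["mimalloc (separate)"])
    for name, mean in MEANS.items()
}

for name, pct in ratios.items():
    print(f"{name:20s} {pct:5.1f}%")
```

The same helper can be pointed at a different reference, for example `ratio_pct(MEANS["hakmem"], MEANS["libc (same binary)"])`, to get a same-layout comparison.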

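1) Speed (relative targets)

Premise: compare hakmem vs mimalloc in the same binary (separate-binary comparisons are distorted by layout differences).

Recommended milestones (Mixed 16-1024B):

  • M1: 55% of mimalloc (stabilize the current range)
  • M2: 60% of mimalloc (realistic short-term target)
  • M3: 65-70% of mimalloc (the boundary where larger structural changes tend to become necessary)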
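
As a rough illustration of how the milestones translate into concrete numbers, the sketch below maps a measured ratio to the highest milestone reached and converts a milestone into a required throughput. It assumes M3 is read as 65-70% and uses its lower edge; the names are hypothetical. Using the separate-binary mimalloc mean as the reference is only a stand-in, since the premise calls for a same-binary comparison.

```python
# Milestone thresholds as a percentage of mimalloc throughput.
# Assumption: M3 is represented by the lower edge of its 65-70% range.
MILESTONES = [("M1", 55.0), ("M2", 60.0), ("M3", 65.0)]


def highest_milestone(ratio_pct):
    """Return the name of the highest milestone met, or None if none is met."""
    reached = None
    for name, threshold in MILESTONES:
        if ratio_pct >= threshold:
            reached = name
    return reached


def required_mops(reference_mops, threshold_pct):
    """Throughput (M ops/s) needed to hit threshold_pct of the reference."""
    return reference_mops * threshold_pct / 100.0


if __name__ == "__main__":
    mimalloc_mean = 118.176  # separate-binary mean from the snapshot
    for name, threshold in MILESTONES:
        print(name, f"{required_mops(mimalloc_mean, threshold):.3f} M ops/s")
    print("current:", highest_milestone(46.2))
```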

2) Syscall budget (OS churn)

Ideal for the Tiny hot path:

  • In steady state (after warmup), mmap/munmap/madvise = 0 (or "nearly 0")

Guideline (acceptable):

  • mmap+munmap+madvise total of at most 1 call per 1e8 ops (= 1e-8 / op)

Current:

  • HAKMEM_SS_OS_STATS=1 (Mixed, iters=200000000 ws=400):
    • [SS_OS_STATS] alloc=9 free=11 madvise=9 madvise_disabled=0 mmap_total=9 fallback_mmap=0 huge_alloc=0
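
To compare a stats line against the budget, the counters have to be turned into a per-op rate. The sketch below parses the line shown above and does that arithmetic. It rests on assumptions that the source does not confirm: that `mmap_total`, `free`, and `madvise` approximate the mmap, munmap, and madvise call counts respectively, and that one iteration counts as one op. The function names are illustrative.

```python
import re


def parse_ss_os_stats(line):
    """Parse 'key=value' pairs from an [SS_OS_STATS] line into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def syscalls_per_op(stats, ops):
    # Assumption: mmap_total ~ mmap calls, free ~ munmap calls,
    # madvise ~ madvise calls. The real counter semantics may differ.
    total = stats["mmap_total"] + stats["free"] + stats["madvise"]
    return total / ops


LINE = (
    "[SS_OS_STATS] alloc=9 free=11 madvise=9 madvise_disabled=0 "
    "mmap_total=9 fallback_mmap=0 huge_alloc=0"
)
BUDGET_PER_OP = 1e-8  # 1 call per 1e8 ops

stats = parse_ss_os_stats(LINE)
rate = syscalls_per_op(stats, ops=200_000_000)
print(f"{rate:.2e} calls/op, within budget: {rate <= BUDGET_PER_OP}")
```

Note that these counters cover the whole run, including warmup, so the resulting rate is an upper bound on the steady-state rate that the budget is defined for.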

How to observe (either one):

  • Internal: HAKMEM_SS_OS_STATS=1 ([SS_OS_STATS]: madvise/disabled, etc.)
  • External: syscall events in perf stat, or strace -c (use a short run and look only at the counts)

3) Memory stability (RSS / fragmentation)

Minimum requirements (Mixed / fixed-ws soak):

  • RSS does not increase monotonically over time
  • RSS drift stays within +5% over a 1-hour soak (guideline)
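
The two requirements above can be checked mechanically once RSS has been sampled at regular intervals during the soak. The following is a minimal sketch, assuming drift is defined as the relative change from the first to the last post-warmup sample; that definition and all names are assumptions, and the sample values are hypothetical rather than measured.

```python
def rss_drift_pct(samples):
    """Relative change from the first to the last RSS sample, in percent."""
    first, last = samples[0], samples[-1]
    return 100.0 * (last - first) / first


def is_monotonic_increase(samples):
    """True if every sample is strictly larger than the previous one."""
    return all(b > a for a, b in zip(samples, samples[1:]))


def rss_stable(samples, max_drift_pct=5.0):
    """Both requirements: bounded drift and no strictly monotonic growth."""
    return (
        rss_drift_pct(samples) <= max_drift_pct
        and not is_monotonic_increase(samples)
    )


if __name__ == "__main__":
    # Hypothetical RSS samples in KiB, taken after warmup (not measured data).
    example = [100_000, 102_000, 101_500, 101_800]
    print(f"drift={rss_drift_pct(example):+.2f}% stable={rss_stable(example)}")
```

A strict sample-to-sample check is sensitive to noise with few samples; smoothing or a fitted slope may be a better fit for a real soak script.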

Current:

  • TBD (the soak template will be scripted later)

Recommended metrics:

  • RSS (peak / steady)
  • page faults (must not keep growing)
  • allocator-internal "in-use / committed" ratio (if available)

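4) Long-run stability (performance and consistency)

Minimum requirements:

  • ops/s does not drop by 5% or more over a 30-60 minute soak
  • CV (coefficient of variation) stays within ~1-2% (consistent with current practice)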
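
Both requirements reduce to simple statistics over per-interval throughput samples. The sketch below assumes CV is the sample standard deviation divided by the mean, reads the CV bound as 2%, and measures the drop from the first to the last sample; which variant the project actually uses is not stated, and the names are illustrative.

```python
import statistics


def cv_pct(samples):
    """Coefficient of variation in percent (sample stdev / mean)."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


def throughput_change_pct(start_ops, end_ops):
    """Relative change in throughput from start to end, in percent."""
    return 100.0 * (end_ops - start_ops) / start_ops


def long_run_ok(samples, max_cv_pct=2.0, max_drop_pct=5.0):
    """CV within bound and throughput not down by max_drop_pct or more."""
    change = throughput_change_pct(samples[0], samples[-1])
    return cv_pct(samples) <= max_cv_pct and change > -max_drop_pct


if __name__ == "__main__":
    # Hypothetical per-interval throughput in M ops/s (not measured data).
    example = [54.0, 54.5, 54.2, 54.4]
    print(f"CV={cv_pct(example):.2f}% ok={long_run_ok(example)}")
```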

Current:

  • Mixed 10-run (snapshot above): CV ≈ 0.91% (mean 54.646M / min 53.608M / max 55.311M)

5) Decision rules (operations)

  • Runtime changes (ENV only): GO threshold +1.0% (Mixed 10-run mean)
  • Build-level changes (compile-out type): GO threshold +0.5% (allowing for layout jitter)
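
The thresholds above can be applied as a small helper. This is a sketch under two assumptions: the improvement is measured relative to the reference build's mean, and meeting the threshold exactly counts as GO. The document does not define where NEUTRAL ends and NO-GO begins, so the helper only answers GO or not. The inputs in the usage lines are the Phase 24 and Phase 25 figures quoted in the commit message.

```python
# GO thresholds in percent, by kind of change.
GO_THRESHOLD_PCT = {"runtime": 1.0, "build": 0.5}


def improvement_pct(candidate_mops, reference_mops):
    """Relative improvement of candidate over reference, in percent."""
    return 100.0 * (candidate_mops - reference_mops) / reference_mops


def is_go(delta_pct, change_kind):
    """True if the improvement meets the GO threshold for this kind of change."""
    return delta_pct >= GO_THRESHOLD_PCT[change_kind]


if __name__ == "__main__":
    # Phase 24: compiled-out baseline vs compiled-in (build-level change).
    phase24 = improvement_pct(56.675, 56.151)
    # Phase 25: same comparison for the free-path stats.
    phase25 = improvement_pct(57.017, 56.415)
    for label, delta in (("Phase 24", phase24), ("Phase 25", phase25)):
        print(f"{label}: {delta:+.2f}% GO={is_go(delta, 'build')}")
```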