hakmem/docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md
Moe Charm (CI) 8052e8b320 Phase 24-26: Hot path atomic telemetry prune (+2.00% cumulative)
Summary:
- Phase 24 (alloc stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness)
- Total: 11 atomics compiled-out, +2.00% improvement

Phase 24: OBSERVE tax prune (tiny_class_stats_box.h)
- Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0)
- Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_*
- Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s)

Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h)
- Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0)
- Wrapped g_free_ss_enter atomic in free hot path
- Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s)

Phase 26: Hot path diagnostic atomics prune
- Added 5 compile gates for low-frequency error counters:
  - HAKMEM_TINY_C7_FREE_COUNT_COMPILED
  - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED
  - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED
  - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED
  - HAKMEM_TINY_HDR_META_FAST_COMPILED
- Result: -0.33% NEUTRAL (within noise, kept for cleanliness)

Alignment with mimalloc principles:
- "No atomics on hot path" - telemetry moved to compile-time opt-in
- Fixed per-op tax elimination
- Production builds: maximum performance (atomics compiled-out)
- Research builds: full diagnostics (COMPILED=1)

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-16 05:35:11 +09:00

# Performance Targets (numeric goals for tracking mimalloc)
Purpose: pin down what "winning" means, covering not only speed but also **syscalls / memory stability / long-run stability**.
## Current snapshot (2025-12-16, local)
Measurement conditions (the canonical setup for reproduction):
- hakmem: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`, profile=`MIXED_TINYV3_C7_SAFE`)
- system/mimalloc: `./bench_random_mixed_system 20000000 400 1` / `./bench_random_mixed_mi 20000000 400 1` (10 runs each)
- same-binary libc: `HAKMEM_FORCE_LIBC_ALLOC=1 scripts/run_mixed_10_cleanenv.sh` (10 runs)
- Git: `HEAD=4d9429e14`
Results (10-run mean/median):
| allocator | mean (M ops/s) | median (M ops/s) | ratio vs mimalloc (mean) |
|----------|-----------------|------------------|--------------------------|
| hakmem | 54.646 | 54.671 | 46.2% |
| libc (same binary) | 76.257 | 76.661 | 64.5% |
| system (separate) | 81.540 | 81.801 | 69.0% |
| mimalloc (separate)| 118.176| 118.497 | 100% |
Notes:
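- `system/mimalloc` are measured as separate binaries, so treat them as a **reference that includes layout (text size / I-cache) differences**.
- `libc (same binary)` is obtained with `HAKMEM_FORCE_LIBC_ALLOC=1` and serves as a rough comparison on an identical layout.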
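
The ratio column can be recomputed from the reported means. The following is a minimal sketch; the names `MEANS` and `ratio_vs` are illustrative and not part of hakmem.

```python
# Means (M ops/s) copied from the results table.
# MEANS and ratio_vs are hypothetical helper names, not hakmem APIs.
MEANS = {
    "hakmem": 54.646,
    "libc (same binary)": 76.257,
    "system (separate)": 81.540,
    "mimalloc (separate)": 118.176,
}


def ratio_vs(reference: str, means: dict) -> dict:
    """Return each allocator's mean as a percentage of the reference mean."""
    ref = means[reference]
    return {name: round(100.0 * m / ref, 1) for name, m in means.items()}


RATIOS = ratio_vs("mimalloc (separate)", MEANS)
for name, pct in RATIOS.items():
    print(f"{name:22s} {pct:5.1f}%")
```

Calling the same helper with `"libc (same binary)"` as the reference gives the same-layout comparison instead of the cross-binary one.
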
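## 1) Speed (relative targets)
Premise: compare hakmem vs mimalloc within the **same binary** (cross-binary comparisons are distorted by layout differences).
Recommended milestones (Mixed 16-1024B):
- M1: **55%** of mimalloc (stabilize the current range)
- M2: **60%** of mimalloc (realistic short-term goal)
- M3: **65-70%** of mimalloc (the boundary where larger structural changes tend to become necessary)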
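
As a rough illustration, the milestones can be checked mechanically once same-binary throughput numbers exist. The sketch below assumes the lower edge of the M3 band (65%) counts as reaching M3; `MILESTONES` and `highest_milestone` are hypothetical names.

```python
# Milestone thresholds as fractions of mimalloc throughput.
# Assumption: M3 is a band, and its lower edge (65%) is used as the threshold.
MILESTONES = [("M1", 0.55), ("M2", 0.60), ("M3", 0.65)]


def highest_milestone(hakmem_ops: float, mimalloc_ops: float):
    """Return the name of the highest milestone reached, or None if below M1."""
    ratio = hakmem_ops / mimalloc_ops
    reached = None
    for name, threshold in MILESTONES:
        if ratio >= threshold:
            reached = name
    return reached


# Illustrative inputs only; real inputs should come from a same-binary run.
print(highest_milestone(62.0, 100.0))
```
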
## 2) Syscall budget (OS churn)
Ideal for the Tiny hot path:
- In steady state (after warmup), **mmap/munmap/madvise = 0** (or "nearly 0")
Guideline (acceptable):
- Combined `mmap+munmap+madvise` count of **at most 1 per 1e8 ops** (= 1e-8 / op)
Current:
- `HAKMEM_SS_OS_STATS=1` (Mixed, `iters=200000000 ws=400`):
  - `[SS_OS_STATS] alloc=9 free=11 madvise=9 madvise_disabled=0 mmap_total=9 fallback_mmap=0 huge_alloc=0`
How to observe (either one):
- Internal: the `[SS_OS_STATS]` line printed with `HAKMEM_SS_OS_STATS=1` (madvise/disabled, etc.)
- External: syscall events in `perf stat`, or `strace -c` (use a short run and look only at the counts)
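
The budget can be compared against an `[SS_OS_STATS]` line with a small parser. This is only a sketch under stated assumptions: it treats `mmap_total`, `free`, and `madvise` as proxies for mmap, munmap, and madvise calls (the mapping of counters to syscalls is not confirmed by this document), and the helper names are hypothetical. The counters cover the whole run including warmup, so the result is an upper bound rather than a steady-state rate.

```python
import re

BUDGET_PER_OP = 1e-8  # "at most 1 per 1e8 ops"


def parse_ss_os_stats(line: str) -> dict:
    """Parse the key=value counters from an [SS_OS_STATS] log line."""
    if "[SS_OS_STATS]" not in line:
        raise ValueError("not an [SS_OS_STATS] line")
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def events_per_op(counts: dict, fields, ops: int) -> float:
    """Sum the selected counters and divide by the number of operations."""
    return sum(counts[f] for f in fields) / ops


LINE = ("[SS_OS_STATS] alloc=9 free=11 madvise=9 madvise_disabled=0 "
        "mmap_total=9 fallback_mmap=0 huge_alloc=0")
COUNTS = parse_ss_os_stats(LINE)

# Assumption: these three counters approximate mmap + munmap + madvise.
FIELDS = ("mmap_total", "free", "madvise")
RATE = events_per_op(COUNTS, FIELDS, ops=200_000_000)
print(f"whole-run rate: {RATE:.2e} events/op (budget {BUDGET_PER_OP:.0e})")
```

To judge the steady-state target, the counters would need to be sampled after warmup and differenced, which this sketch does not do.
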
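## 3) Memory stability (RSS / fragmentation)
Minimum conditions (Mixed / fixed-ws soak):
- RSS does **not increase monotonically over time**
- RSS drift stays within **+5%** over a 1-hour soak (guideline)
Current:
- TBD (a soak template will be scripted later)
Recommended metrics:
- RSS (peak / steady)
- page faults (must not keep growing)
- allocator-internal "inuse / committed" ratio (if available)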
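
Until the soak template exists, the two minimum conditions can be expressed as a check over periodic RSS samples. This is a simplified sketch: it reads "monotonic increase" as every sample being larger than the previous one, measures drift from the first to the last sample, and assumes the samples were taken after warmup. All names and sample values are hypothetical.

```python
def rss_drift_pct(samples):
    """Drift of the last sample relative to the first, in percent."""
    first, last = samples[0], samples[-1]
    return 100.0 * (last - first) / first


def strictly_increasing(samples):
    """True if every sample is larger than the one before it."""
    return all(b > a for a, b in zip(samples, samples[1:]))


def rss_ok(samples, max_drift_pct=5.0):
    """Apply both minimum conditions: no monotonic growth, drift within limit."""
    return (not strictly_increasing(samples)
            and rss_drift_pct(samples) <= max_drift_pct)


# Hypothetical RSS samples in KiB, not measured data.
STABLE = [100_000, 102_000, 101_500, 103_000]
LEAKY = [100_000, 103_000, 106_000, 109_000]
print(rss_ok(STABLE), rss_ok(LEAKY))
```

A real soak would likely need a noise-tolerant trend test (for example a fitted slope) instead of the strict comparison used here.
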
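## 4) Long-run stability (performance and consistency)
Minimum conditions:
- Over a 30-60 minute soak, ops/s does **not drop by 5% or more**
- CV (coefficient of variation) stays within **~1-2%** (consistent with current practice)
Current:
- Mixed 10-run (the snapshot above): CV ≈ 0.91% (mean 54.646M / min 53.608M / max 55.311M)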
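
For reference, CV is the standard deviation divided by the mean. The sketch below uses the sample standard deviation; whether the reported 0.91% used the sample or population form is not stated here, so that choice is an assumption. The run values are made up for illustration and are not the raw data behind the snapshot.

```python
import statistics


def cv_percent(samples):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical 10-run throughputs in M ops/s (NOT the document's raw data).
RUNS = [54.1, 54.9, 55.3, 53.6, 54.7, 54.8, 55.0, 54.2, 54.6, 55.1]
CV = cv_percent(RUNS)
print(f"CV = {CV:.2f}%  within the 2% bound: {CV <= 2.0}")
```
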
## 5) Decision rules (operations)
- Runtime-only changes (ENV): GO threshold +1.0% (Mixed 10-run mean)
- Build-level changes (compile-out): GO threshold +0.5% (allowing for layout jitter)
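
The two thresholds translate into a simple comparison of 10-run means. A minimal sketch follows, with hypothetical names (`GO_THRESHOLD_PCT`, `gain_pct`, `is_go`); it only answers "GO or not" and does not model how non-GO results are classified further. The example inputs are the Phase 24 figures quoted in the commit summary (56.675M with the stats compiled out vs 56.151M compiled in).

```python
# GO thresholds in percent, keyed by the kind of change.
GO_THRESHOLD_PCT = {"runtime": 1.0, "build": 0.5}


def gain_pct(candidate_mean: float, reference_mean: float) -> float:
    """Relative gain of the candidate over the reference, in percent."""
    return 100.0 * (candidate_mean / reference_mean - 1.0)


def is_go(candidate_mean: float, reference_mean: float, kind: str) -> bool:
    """True if the gain meets the GO threshold for this kind of change."""
    return gain_pct(candidate_mean, reference_mean) >= GO_THRESHOLD_PCT[kind]


# Phase 24 means (M ops/s) from the commit summary; a build-level change.
PHASE24_GAIN = gain_pct(56.675, 56.151)
print(f"gain = {PHASE24_GAIN:+.2f}%  GO: {is_go(56.675, 56.151, 'build')}")
```

The same gain would fall short of the +1.0% bar if the change were runtime-only, which is why the kind of change has to be passed explicitly.
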