Summary:
- Phase 24 (alloc stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL (code cleanliness)
- Total: 11 atomics compiled-out, +2.00% improvement

Phase 24: OBSERVE tax prune (tiny_class_stats_box.h)
- Added HAKMEM_TINY_CLASS_STATS_COMPILED (default: 0)
- Wrapped 5 stats functions: uc_miss, warm_hit, shared_lock, tls_carve_*
- Result: +0.93% (baseline 56.675M vs compiled-in 56.151M ops/s)

Phase 25: Tiny free stats prune (tiny_superslab_free.inc.h)
- Added HAKMEM_TINY_FREE_STATS_COMPILED (default: 0)
- Wrapped g_free_ss_enter atomic in free hot path
- Result: +1.07% (baseline 57.017M vs compiled-in 56.415M ops/s)

Phase 26: Hot path diagnostic atomics prune
- Added 5 compile gates for low-frequency error counters:
  - HAKMEM_TINY_C7_FREE_COUNT_COMPILED
  - HAKMEM_TINY_HDR_MISMATCH_LOG_COMPILED
  - HAKMEM_TINY_HDR_META_MISMATCH_COMPILED
  - HAKMEM_TINY_METRIC_BAD_CLASS_COMPILED
  - HAKMEM_TINY_HDR_META_FAST_COMPILED
- Result: -0.33% NEUTRAL (within noise, kept for cleanliness)

Alignment with mimalloc principles:
- "No atomics on hot path" - telemetry moved to compile-time opt-in
- Fixed per-op tax elimination
- Production builds: maximum performance (atomics compiled-out)
- Research builds: full diagnostics (COMPILED=1)

Generated with Claude Code https://claude.com/claude-code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

80 lines
3.3 KiB
Markdown
# Performance Targets (numeric goals for tracking mimalloc)

Purpose: pin down the path to winning not on speed alone, but also on **syscalls / memory stability / long-run stability**.

## Current snapshot (2025-12-16, local)

Measurement conditions (the reference for reproduction):

- hakmem: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`, profile=`MIXED_TINYV3_C7_SAFE`)
- system/mimalloc: `./bench_random_mixed_system 20000000 400 1` / `./bench_random_mixed_mi 20000000 400 1` (10 runs each)
- same-binary libc: `HAKMEM_FORCE_LIBC_ALLOC=1 scripts/run_mixed_10_cleanenv.sh` (10 runs)
- Git: `HEAD=4d9429e14`

Results (10-run mean/median):

| allocator           | mean (M ops/s) | median (M ops/s) | ratio vs mimalloc (mean) |
|---------------------|----------------|------------------|--------------------------|
| hakmem              | 54.646         | 54.671           | 46.2%                    |
| libc (same binary)  | 76.257         | 76.661           | 64.5%                    |
| system (separate)   | 81.540         | 81.801           | 69.0%                    |
| mimalloc (separate) | 118.176        | 118.497          | 100%                     |

Notes:
- `system/mimalloc` are measured as separate binaries, so treat them as a **reference that includes layout differences (text size / I-cache)**.
- `libc (same binary)` is obtained with `HAKMEM_FORCE_LIBC_ALLOC=1` and serves as a rough comparison on an identical layout.

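The ratio column in the results table is simply each mean divided by the mimalloc mean. The following is a minimal sketch of that arithmetic; the dictionary and function names are made up for illustration and are not part of the hakmem scripts.

```python
# Mean throughput in M ops/s, copied from the snapshot table.
MEAN_MOPS = {
    "hakmem": 54.646,
    "libc (same binary)": 76.257,
    "system (separate)": 81.540,
    "mimalloc (separate)": 118.176,
}


def ratio_vs(reference, means):
    """Return each allocator's mean as a percentage of the reference mean."""
    ref = means[reference]
    return {name: 100.0 * value / ref for name, value in means.items()}


if __name__ == "__main__":
    for name, pct in ratio_vs("mimalloc (separate)", MEAN_MOPS).items():
        print(f"{name:20s} {pct:5.1f}%")
```

Because the mimalloc figure comes from a separate binary, these percentages carry the layout caveat from the notes and are best read as approximate.
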
## 1) Speed (relative targets)

Premise: compare hakmem vs mimalloc in the **same binary** (cross-binary comparisons are distorted by layout differences).

Recommended milestones (Mixed 16–1024B):

- M1: **55%** of mimalloc (stabilize the current range)
- M2: **60%** of mimalloc (realistic short-term target)
- M3: **65–70%** of mimalloc (the boundary where larger structural changes tend to become necessary)

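To make the milestones concrete, the percentages can be turned into absolute throughput targets. The sketch below assumes the separate-binary mimalloc mean from the snapshot (118.176 M ops/s) as the reference, which is only a stand-in: the premise above calls for a same-binary reference, so the resulting numbers are indicative at best. All names are hypothetical.

```python
# Assumption: separate-binary means from the snapshot stand in for the
# same-binary reference that this section actually asks for.
MIMALLOC_MEAN_MOPS = 118.176
HAKMEM_MEAN_MOPS = 54.646

# Milestone fractions from the list above; M3 is a range, so both ends appear.
MILESTONES = {"M1": 0.55, "M2": 0.60, "M3 (low)": 0.65, "M3 (high)": 0.70}


def milestone_targets(reference_mops, milestones):
    """Absolute throughput (M ops/s) needed to reach each milestone."""
    return {name: frac * reference_mops for name, frac in milestones.items()}


def required_gain_pct(current_mops, target_mops):
    """Relative speedup (in percent) needed to move from current to target."""
    return 100.0 * (target_mops / current_mops - 1.0)


if __name__ == "__main__":
    for name, target in milestone_targets(MIMALLOC_MEAN_MOPS, MILESTONES).items():
        gain = required_gain_pct(HAKMEM_MEAN_MOPS, target)
        print(f"{name:10s} target={target:7.3f} M ops/s  gain needed={gain:+.1f}%")
```
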
## 2) Syscall budget (OS churn)

Ideal for the Tiny hot path:
- In steady state (after warmup), **mmap/munmap/madvise = 0** (or "nearly 0")

Rule of thumb (acceptable):
- `mmap+munmap+madvise` combined is **at most 1 call per 1e8 ops** (= 1e-8 / op)

Current:
- `HAKMEM_SS_OS_STATS=1` (Mixed, `iters=200000000 ws=400`):
  - `[SS_OS_STATS] alloc=9 free=11 madvise=9 madvise_disabled=0 mmap_total=9 fallback_mmap=0 huge_alloc=0`

How to observe (either one):
- Internal: the `[SS_OS_STATS]` line printed with `HAKMEM_SS_OS_STATS=1` (madvise/disabled, etc.)
- External: syscall events from `perf stat`, or `strace -c` (a short run, just to read the counts)

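A per-op rate can be derived from the `[SS_OS_STATS]` line and compared against the 1e-8 / op guideline. The sketch below parses the sample line quoted above. Which counters correspond to which syscalls is an assumption here: only `mmap_total` and `madvise` are summed, because the line does not name a munmap counter explicitly. The counters presumably also include warmup, so the result is not a pure steady-state figure.

```python
import re

SAMPLE_LINE = (
    "[SS_OS_STATS] alloc=9 free=11 madvise=9 madvise_disabled=0 "
    "mmap_total=9 fallback_mmap=0 huge_alloc=0"
)
BUDGET_PER_OP = 1e-8  # guideline: at most 1 call per 1e8 ops


def parse_ss_os_stats(line):
    """Parse the key=value pairs of an [SS_OS_STATS] line into a dict of ints."""
    if not line.startswith("[SS_OS_STATS]"):
        raise ValueError("not an SS_OS_STATS line")
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def calls_per_op(stats, fields, total_ops):
    """Sum the chosen counters and divide by the number of benchmark ops."""
    return sum(stats[name] for name in fields) / total_ops


if __name__ == "__main__":
    stats = parse_ss_os_stats(SAMPLE_LINE)
    # Assumption: these two fields count real syscalls; adjust as needed.
    rate = calls_per_op(stats, ("mmap_total", "madvise"), 200_000_000)
    verdict = "within" if rate <= BUDGET_PER_OP else "over"
    print(f"{rate:.2e} calls/op ({verdict} budget)")
```
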
## 3) Memory stability (RSS / fragmentation)

Minimum requirements (Mixed / fixed-ws soak):
- RSS does **not increase monotonically over time**
- RSS drift over a 1-hour soak stays **within +5%** (rule of thumb)

Current:
- TBD (the soak template will be scripted later)

Recommended metrics:
- RSS (peak / steady)
- page faults (must not keep growing)
- the allocator's internal "inuse / committed" ratio (if available)

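Since the soak script is still TBD, the following is only a sketch of how the two minimum requirements could be checked once RSS samples exist. It assumes the samples are taken at regular intervals after warmup, that drift is measured from the first to the last sample, and that "monotonic increase" means every sample is larger than the previous one. The sample values are invented purely to exercise the functions.

```python
def rss_drift_pct(samples_kib):
    """Drift of the last RSS sample relative to the first, in percent."""
    first, last = samples_kib[0], samples_kib[-1]
    return 100.0 * (last - first) / first


def is_strictly_increasing(samples):
    """True if every sample is larger than the one before it."""
    return all(b > a for a, b in zip(samples, samples[1:]))


def passes_memory_gate(samples_kib, max_drift_pct=5.0):
    """Apply both minimum requirements: no monotonic growth, drift within limit."""
    if is_strictly_increasing(samples_kib):
        return False
    return rss_drift_pct(samples_kib) <= max_drift_pct


if __name__ == "__main__":
    # Hypothetical RSS samples in KiB, not measured data.
    steady = [102400, 103100, 102900, 103400, 103000]
    leaking = [100000, 102000, 104000, 106000]
    print("steady :", passes_memory_gate(steady))
    print("leaking:", passes_memory_gate(leaking))
```

A strict sample-to-sample comparison is sensitive to noise; a real soak check would likely smooth the series or fit a slope before deciding.
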
## 4) Long-run stability (performance and consistency)

Minimum requirements:
- ops/s does **not drop by 5% or more** over a 30–60 minute soak
- CV (coefficient of variation) stays within **~1–2%** (consistent with current practice)

Current:
- Mixed 10-run (the snapshot above): CV ≈ 0.91% (mean 54.646M / min 53.608M / max 55.311M)

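CV is the standard deviation divided by the mean. The sketch below shows that calculation alongside the throughput-drop check. The individual run values behind the 0.91% figure are not listed in this document, so the samples are placeholders, and the use of the sample (n-1) standard deviation is an assumption about how the figure was computed.

```python
import statistics


def cv_percent(samples):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


def throughput_change_pct(start_mops, end_mops):
    """Relative change in ops/s from the start to the end of a soak, in percent."""
    return 100.0 * (end_mops - start_mops) / start_mops


def passes_longrun_gate(samples, max_cv_pct=2.0, max_drop_pct=5.0):
    """CV within the limit and the end-to-start drop smaller than max_drop_pct."""
    change = throughput_change_pct(samples[0], samples[-1])
    return cv_percent(samples) <= max_cv_pct and change > -max_drop_pct


if __name__ == "__main__":
    # Placeholder throughput samples in M ops/s, not measured data.
    runs = [54.0, 55.0, 56.0]
    print(f"CV = {cv_percent(runs):.2f}%")
    print("gate:", passes_longrun_gate(runs))
```
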
## 5) Decision rules (operations)

- Runtime changes (ENV only): GO threshold +1.0% (Mixed 10-run mean)
- Build-level changes (compile-out type): GO threshold +0.5% (allowing for layout jitter)
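
The two thresholds can be applied mechanically to a pair of 10-run means. The sketch below uses the Phase 24 figures quoted in the commit summary at the top (56.151M compiled-in vs 56.675M compiled-out). Treating "meets the threshold" as `>=` is an assumption, and the function names are illustrative only.

```python
# GO thresholds in percent, keyed by the kind of change.
GO_THRESHOLD_PCT = {"runtime": 1.0, "build": 0.5}


def improvement_pct(before_mops, after_mops):
    """Relative change of the 10-run mean, in percent of the 'before' value."""
    return 100.0 * (after_mops - before_mops) / before_mops


def is_go(kind, delta_pct):
    """True if the measured delta meets the GO threshold for this kind of change."""
    return delta_pct >= GO_THRESHOLD_PCT[kind]


if __name__ == "__main__":
    # Phase 24: stats compiled in (before) vs compiled out (after).
    delta = improvement_pct(56.151, 56.675)
    print(f"delta = {delta:+.2f}%  build-level GO: {is_go('build', delta)}")
```

The same delta would fall short of the +1.0% bar if it came from a runtime (ENV-only) change.
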