hakmem/docs/BENCH_REPORT_2025_11_09.md
Moe Charm (CI) 70ad1ffb87 Tiny: Enable P0→FC direct path for class7 (1KB) by default + docs
- Class7 (1KB): P0 direct-to-FastCache now default ON (HAKMEM_TINY_P0_DIRECT_FC_C7 unset or not '0').
- Keep A/B gates: HAKMEM_TINY_P0_ENABLE, HAKMEM_TINY_P0_DIRECT_FC (class5), HAKMEM_TINY_P0_DIRECT_FC_C7 (class7),
  HAKMEM_TINY_P0_DRAIN_THRESH (default 32), HAKMEM_TINY_P0_NO_DRAIN, HAKMEM_TINY_P0_LOG.
- P0 batch now supports class7 direct fill in addition to class5: gather (drain thresholded → freelist pop → linear carve)
  without writing into objects, then bulk-push into FC, update meta/active counters once.
- Docs: Update direct-FC defaults (class5+class7 ON) in docs/TINY_P0_BATCH_REFILL.md.
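The gather-then-bulk-push order described above can be sketched in C. This is a simplified model, not hakmem's code. The names `SlabSketch`, `FastCacheSketch`, `p0_refill_direct_fc` and `P0_BATCH_MAX` are hypothetical. The remote queue and freelist are modeled as pointer stacks rather than whatever intrusive lists the real allocator uses. The sketch shows three things: the three gather sources in order (thresholded drain, freelist pop, linear carve), that no block memory is written, and that the FastCache count and the active counter are each updated once per batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P0_BATCH_MAX 64  /* hypothetical upper bound on one refill */

/* Hypothetical, simplified per-class slab state (not hakmem's structs). */
typedef struct {
    void   **remote;          /* blocks freed by other threads */
    size_t   remote_count;
    void   **freelist;        /* local freelist, modeled as a pointer stack */
    size_t   freelist_count;
    size_t   freelist_cap;
    uint8_t *carve_cur;       /* next never-used byte in the slab */
    uint8_t *carve_end;
    size_t   block_size;
    size_t   active;          /* blocks currently handed out */
} SlabSketch;

/* Hypothetical per-class FastCache: a bounded pointer stack. */
typedef struct {
    void   **slots;
    size_t   count;
    size_t   cap;
} FastCacheSketch;

/* Gather up to `want` blocks without writing into any block, then
 * bulk-push them into the FastCache and bump `active` once. */
size_t p0_refill_direct_fc(SlabSketch *s, FastCacheSketch *fc,
                           size_t want, size_t drain_thresh) {
    void *batch[P0_BATCH_MAX];
    size_t n = 0;

    assert(s->block_size > 0 && fc->count <= fc->cap);
    if (want > fc->cap - fc->count) want = fc->cap - fc->count;
    if (want > P0_BATCH_MAX) want = P0_BATCH_MAX;

    /* 1. Drain remote frees into the freelist only at/above the threshold. */
    if (s->remote_count >= drain_thresh) {
        assert(s->freelist_count + s->remote_count <= s->freelist_cap);
        while (s->remote_count > 0)
            s->freelist[s->freelist_count++] = s->remote[--s->remote_count];
    }
    /* 2. Pop from the local freelist. */
    while (n < want && s->freelist_count > 0)
        batch[n++] = s->freelist[--s->freelist_count];
    /* 3. Linearly carve fresh blocks from untouched slab space. */
    while (n < want && (size_t)(s->carve_end - s->carve_cur) >= s->block_size) {
        batch[n++] = s->carve_cur;
        s->carve_cur += s->block_size;
    }
    /* 4. Bulk push into the FastCache, then one counter update each. */
    for (size_t i = 0; i < n; i++)
        fc->slots[fc->count + i] = batch[i];
    fc->count += n;
    s->active += n;
    return n;
}
```

Passing a very large `drain_thresh` (for example `SIZE_MAX`) skips the drain step, which is one way to model `HAKMEM_TINY_P0_NO_DRAIN`. Keeping the gathered pointers in a local array until the end is what allows the FastCache and `active` to be touched only once per batch.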

Notes
- Use tools/bench_rs_from_files.sh for RS(hakmem/system) to compare runs across CPUs.
- Next: parameter sweep for class7 (FC cap/batch limit/drain threshold) and perf counters A/B.
2025-11-09 23:15:02 +09:00

Bench Report — 2025-11-09 (Tiny P0=ON, Release)
Summary
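- Tiny (Random Mixed, 1T): stable with P0 ON. 256B ≈ 2.84M ops/s, 1024B ≈ 2.63M ops/s.
- System comparison (same bench): 256B ≈ 58.08M ops/s, 1024B ≈ 49.36M ops/s (note: reflects differences in implementation and optimization, e.g. branching and tcache).
- Pool TLS: HAKMEM > System (about +18% at 1T, about +2% at 4T).
- MidLarge / Larson: overall stable. Details will follow in an addendum (to be aggregated with an additional extraction script).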
Tiny — Random Mixed (1T, 100k)
- HAKMEM 256B: Throughput = 2,842,497 ops/s (0.035s)
- HAKMEM 1024B: Throughput = 2,627,861 ops/s (0.038s)
- System 256B: Throughput = 58,078,114 ops/s (0.002s)
- System 1024B: Throughput = 49,361,582 ops/s (0.002s)
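To put the Tiny gap in proportion, the figures above can be turned into ratios. A minimal C sketch; `throughput_ratio` is a hypothetical helper, and the inputs are the ops/s values listed above.

```c
#include <assert.h>

/* Fraction of the baseline's throughput that a candidate achieves. */
double throughput_ratio(double candidate_ops_per_s, double baseline_ops_per_s) {
    assert(baseline_ops_per_s > 0.0);
    return candidate_ops_per_s / baseline_ops_per_s;
}
```

With these inputs, `throughput_ratio(2842497.0, 58078114.0)` and `throughput_ratio(2627861.0, 49361582.0)` show HAKMEM reaching roughly 4.9% (256B) and 5.3% (1024B) of the system allocator's throughput in this run, i.e. the system allocator is about 20x and 19x faster.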
Pool TLS (852KB)
- HAKMEM 1T (100k, 256): 5,979,774 ops/s (0.017s)
- HAKMEM 4T (50k, 256): 13,315,913 ops/s (0.015s)
- System 1T (100k, 256): 5,056,446 ops/s (0.020s)
- System 4T (50k, 256): 13,022,558 ops/s (0.015s)
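The +18% / +2% figures in the summary follow from the Pool TLS numbers above. A minimal C sketch of the calculation, with `percent_gain` as a hypothetical helper:

```c
#include <assert.h>

/* Percentage by which throughput `a` exceeds throughput `b` (negative if slower). */
double percent_gain(double a_ops_per_s, double b_ops_per_s) {
    assert(b_ops_per_s > 0.0);
    return (a_ops_per_s / b_ops_per_s - 1.0) * 100.0;
}
```

`percent_gain(5979774.0, 5056446.0)` gives about +18.3% for 1T, and `percent_gain(13315913.0, 13022558.0)` gives about +2.3% for 4T.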
Notes
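- The current Random Mixed numbers and those in the earlier mimalloc reports come from different benchmark types and scales, so absolute ops/s comparisons between them are for reference only.
- mimalloc (past, Random-style microbenchmark): 16.53 → 24.00M ops/s (the target to reach through design work and staged optimization).
- This report publishes the numbers after functional stabilization of the Tiny/P0 line. There is further room for improvement via A/B tests of branch hints and of prioritizing classes 5/6 in the front stage.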
Runtime Switches (Tiny P0)
- Default ON: HAKMEM_TINY_P0_ENABLE unset or not '0'
- OFF: HAKMEM_TINY_P0_ENABLE=0 or HAKMEM_TINY_P0_DISABLE=1
- Disable remote drain (to isolate its effect): HAKMEM_TINY_P0_NO_DRAIN=1
- P0 log: HAKMEM_TINY_P0_LOG=1 (logs whether active_delta and taken are consistent)
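One way to read the switch semantics above, as a C sketch. Assumptions: "not '0'" means the value is not exactly `"0"`; an `=1`-style flag counts as set only when its value is exactly `"1"`; the `HAKMEM_TINY_P0_DRAIN_THRESH` default of 32 comes from the commit notes above; and `NO_DRAIN` is modeled as an infinite threshold (`SIZE_MAX`). The helper names are hypothetical, and hakmem's own parsing may differ (for example, by checking only the first character).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* "Default ON" switch: on unless set to exactly "0" (assumed reading). */
static bool default_on(const char *v) { return v == NULL || strcmp(v, "0") != 0; }

/* "=1" opt-in flag: on only when set to exactly "1" (assumed reading). */
static bool opt_in(const char *v) { return v != NULL && strcmp(v, "1") == 0; }

/* P0 is enabled unless HAKMEM_TINY_P0_ENABLE=0 or HAKMEM_TINY_P0_DISABLE=1. */
bool p0_enabled_from(const char *enable, const char *disable) {
    return default_on(enable) && !opt_in(disable);
}

/* Remote-drain threshold; SIZE_MAX stands for "never drain" (NO_DRAIN=1). */
size_t p0_drain_thresh_from(const char *no_drain, const char *thresh) {
    if (opt_in(no_drain)) return SIZE_MAX;
    if (thresh != NULL) {
        char *end;
        unsigned long t = strtoul(thresh, &end, 10);
        if (end != thresh && *end == '\0') return (size_t)t;  /* valid number */
    }
    return 32;  /* default from the commit notes */
}

/* Read the real environment variables and apply the rules above. */
bool p0_enabled(void) {
    return p0_enabled_from(getenv("HAKMEM_TINY_P0_ENABLE"),
                           getenv("HAKMEM_TINY_P0_DISABLE"));
}

size_t p0_drain_thresh(void) {
    return p0_drain_thresh_from(getenv("HAKMEM_TINY_P0_NO_DRAIN"),
                                getenv("HAKMEM_TINY_P0_DRAIN_THRESH"));
}
```

Splitting each rule into a pure function on the raw `getenv()` result keeps the decision logic testable without touching the process environment.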
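Appendix — Past mimalloc results (for reference)
- MIMALLOC_KEY_FINDINGS.md: HAKMEM 16.53M ops/s → mimalloc 24.21M ops/s (at the time)
- MIMALLOC_ANALYSIS_REPORT.md: sets reaching 24.00M ops/s through staged optimization as the goal
- The current Random Mixed bench differs in conditions and implementation, so treat relative comparisons as reference only. A same-scenario A/B (system / HAKMEM / mimalloc linked directly) will be prepared separately.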