hakmem/docs/benchmarks/LARSON_TINY_PERF_2025-11-02.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00

# Larson Tiny Contention: perf summary (2025-11-02)
- Target: 8-128B, chunks=1024, rounds=1, seed=12345, duration=2s
- Binaries: `larson_system`, `larson_mi`, `larson_hakmem` (directly linked; no LD_PRELOAD)
- HAKMEM env: `HAKMEM_QUIET=1 HAKMEM_DISABLE_BATCH=1 HAKMEM_TINY_META_ALLOC=1 HAKMEM_TINY_META_FREE=1`
- Scripts:
  - Run: `scripts/run_larson.sh -d 2 -t 1,4`
  - Perf: `scripts/run_larson_perf.sh` (output: `scripts/bench_results/larson_perf_*.txt`)

## Throughput (ops/sec)
- 1T: system ~14.7M / mimalloc ~16.8M / HAKMEM ~2.4M
- 4T: system ~16.8M / mimalloc ~16.8M / HAKMEM ~4.2M

HAKMEM outperforms mimalloc on Mid/Large multi-threaded workloads, but falls far behind on the Tiny, high-contention Larson benchmark.

## perf stat highlights (4T, 2s)

Output: `scripts/bench_results/larson_perf_{system,mimalloc,hakmem}_4T_2s_8-128.txt`

- HAKMEM
  - page-faults: ~0.91M (13.1K/sec)
  - IPC: ~0.92, branch-miss: ~7.5%, L1d-miss: ~4.4%
  - user ~0.98s / sys ~3.81s (sys dominates)
  - Observation: SuperSlab touches and zeroes many fresh pages (more PFs, more sys time)
- mimalloc
  - page-faults: ~0.087M (1.3K/sec)
  - IPC: ~0.77, branch-miss: ~7.3%, L1d-miss: ~6.6%
- system
  - page-faults: ~0.078M (1.18K/sec)
  - IPC: ~0.93, branch-miss: ~5.9%, L1d-miss: ~4.7%
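
Putting the figures above side by side makes the gap easier to read. These are simple ratios of the rounded values listed in this section, so the results are approximate:

$$
\frac{t_{\mathrm{sys}}}{t_{\mathrm{user}} + t_{\mathrm{sys}}} = \frac{3.81}{0.98 + 3.81} \approx 0.80,
\qquad
\frac{\mathrm{PF}_{\mathrm{HAKMEM}}}{\mathrm{PF}_{\mathrm{mimalloc}}} = \frac{0.91}{0.087} \approx 10.5,
\qquad
\frac{\mathrm{PF}_{\mathrm{HAKMEM}}}{\mathrm{PF}_{\mathrm{system}}} = \frac{0.91}{0.078} \approx 11.7
$$

In other words, roughly 80% of HAKMEM's CPU time in this run is spent in the kernel, and it takes about ten times as many page faults as either baseline.
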
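## perf report (HAKMEM, 4T)

The top samples are in the kernel (page-fault handling) and `memset`. On the userland side, `hak_free_at`, `hak_tiny_alloc{,_slow}` and similar functions show up only as small contributors.
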
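## Interpretation and next optimizations

- The main cause under Tiny high contention is: insufficient reuse → excessive page touches/faults → increased sys time.
- Memory-side penalties (PF/cache) dominate over the micro-cost differences in HAKMEM's free/alloc paths.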
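
The "fresh page touch → page fault" effect can be seen in isolation with a small Linux-only sketch. It is not HAKMEM code: `touch_twice` and `minor_faults` are hypothetical helpers that map anonymous memory, write to it twice, and report the minor-fault count for each pass via `getrusage`. The first pass should fault roughly once per page, while the second pass (reuse of already-touched memory) should fault rarely or not at all.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Map `bytes` of fresh anonymous memory, write it twice, and report the
 * minor faults of the first pass (fresh pages) and second pass (reuse).
 * Returns 0 on success, -1 if the mapping fails. */
static int touch_twice(size_t bytes, long *first, long *reuse) {
    assert(first != NULL && reuse != NULL);
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    long f0 = minor_faults();
    memset(p, 1, bytes);          /* first touch: kernel maps and zeroes pages */
    long f1 = minor_faults();
    memset(p, 2, bytes);          /* reuse: pages are already resident */
    long f2 = minor_faults();

    munmap(p, bytes);
    *first = f1 - f0;
    *reuse = f2 - f1;
    return 0;
}
```

Calling `touch_twice(1 << 20, &first, &reuse)` and printing both counters shows the asymmetry. An allocator that keeps handing out fresh pages stays on the expensive first-pass side.
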
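
Improvement proposals (in priority order):

- Tiny tcache (SLL; 32/64/128B; small cap): return blocks immediately and reuse them immediately to cut PFs
- SuperSlab target queue: when prefix pending exceeds a threshold, enqueue onto a per-class work queue so that draining makes progress even when the owner is absent
- In parallel: Mid registry sharding plus a lock-free read side; L25/Mid page-end prefix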
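
The first proposal, a small per-thread cache of singly linked free lists, might look roughly like the sketch below. It is an illustration under stated assumptions, not HAKMEM's implementation: every name (`tcache`, `tc_push`, `tc_pop`, `tc_demo`) is hypothetical, and the cap of 16 blocks per class is an arbitrary stand-in for "small cap". The point is that a block freed by a thread is handed straight back to that thread's next allocation of the same class, so no new page is touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not HAKMEM's actual code. */
#define TC_CLASSES 3              /* size classes: 32B, 64B, 128B */
#define TC_CAP     16             /* small per-class cap (assumed value) */

typedef struct tc_node { struct tc_node *next; } tc_node;

typedef struct {
    tc_node *head[TC_CLASSES];    /* one singly linked list (SLL) per class */
    unsigned count[TC_CLASSES];
} tcache;

static _Thread_local tcache tc;   /* one cache per thread, so no locking */

/* Map a request size to a class index, or -1 if the cache does not handle it. */
static int tc_class(size_t size) {
    if (size == 0 || size > 128) return -1;
    if (size <= 32) return 0;
    if (size <= 64) return 1;
    return 2;
}

/* Free path: returns 1 if the block was cached, 0 if the caller must
 * fall back to the regular (slow) free path. */
static int tc_push(void *p, size_t size) {
    int c = tc_class(size);
    if (c < 0 || tc.count[c] >= TC_CAP) return 0;
    tc_node *n = (tc_node *)p;
    n->next = tc.head[c];
    tc.head[c] = n;
    tc.count[c]++;
    return 1;
}

/* Alloc path: returns a cached block, or NULL if the caller must
 * fall back to the regular (slow) alloc path. */
static void *tc_pop(size_t size) {
    int c = tc_class(size);
    if (c < 0 || tc.head[c] == NULL) return NULL;
    tc_node *n = tc.head[c];
    tc.head[c] = n->next;
    tc.count[c]--;
    return n;
}

/* Usage: a block freed by this thread is returned by its next alloc. */
static void tc_demo(void) {
    static union { tc_node n; char raw[32]; } blk;
    int cached = tc_push(&blk, 32);
    void *again = tc_pop(32);
    assert(cached && again == (void *)&blk);
}
```

The list is LIFO, so the most recently freed (and most likely cache-hot) block is reused first. The cap bounds how much memory each thread can hold back from other threads.
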

## Reproduction steps

```bash
make larson_hakmem larson_system larson_mi
scripts/run_larson.sh -d 2 -t 1,4
scripts/run_larson_perf.sh
```