From 24fad8f72fe25f53a7edd2cd70432b46ea004d9b Mon Sep 17 00:00:00 2001 From: "Moe Charm (CI)" Date: Sat, 29 Nov 2025 11:28:51 +0900 Subject: [PATCH] docs: Add comprehensive allocator benchmark comparison (Phase 3) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Benchmark Results: - bench_random_mixed: hakmem 56.8M, system 84.5M, mimalloc 107M - bench_tiny_hot: hakmem 81.0M, system 156.3M - bench_mid_large_mt: hakmem 9.94M, system 8.40M (hakmem wins! +18.3%) Key Findings: 1. Tiny allocations: hakmem is 0.52x slower than mimalloc (main weakness) 2. Mid/Large MT: hakmem is 1.18x faster than system (strength!) 3. Identified Tiny Front as optimization target for Phase 4 This benchmark comparison informed the Phase 4 optimization strategy: - Focus on Tiny Front bottleneck (15-20 branches) - Target: 2x improvement via PGO + Hot/Cold separation + Config optimization - Expected: 56.8M → 110M+ ops/s (closing gap with mimalloc) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- BENCHMARK_COMPARISON_PHASE3.md | 157 +++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 BENCHMARK_COMPARISON_PHASE3.md diff --git a/BENCHMARK_COMPARISON_PHASE3.md b/BENCHMARK_COMPARISON_PHASE3.md new file mode 100644 index 00000000..4b6f1586 --- /dev/null +++ b/BENCHMARK_COMPARISON_PHASE3.md @@ -0,0 +1,157 @@ +# HAKMEM Allocator - Benchmark Comparison (Phase 3) + +**測定日**: 2025-11-29 +**Phase**: Phase 3 (mincore完全削除後) +**Commit**: d78baf41c - Phase 3: Remove mincore() syscall completely + +--- + +## 📊 ベンチマーク結果サマリー + +### 1. bench_random_mixed (混合ランダムサイズ: 16-1024B) + +**Parameters**: 1M iterations, working set=256 + +| Allocator | Throughput | vs hakmem | +|-----------|------------|-----------| +| **hakmem** | **56.8M ops/s** | 1.00x | +| system (glibc) | 84.5M ops/s | **1.49x faster** | +| mimalloc | 107.0M ops/s | **1.88x faster** | + +**Result**: ❌ hakmem が最下位 (system の 67%, mimalloc の 53%) + +--- + +### 2. bench_tiny_hot (固定サイズ: 64B, ホットパス) + +**Parameters**: 64B allocations, batch=100, cycles=100K (20M total ops) + +| Allocator | Throughput | Latency | vs hakmem | +|-----------|------------|---------|-----------| +| **hakmem** | **81.0M ops/s** | 12.35 ns/op | 1.00x | +| system (glibc) | 156.3M ops/s | 6.40 ns/op | **1.93x faster** | +| mimalloc | N/A (lib error) | N/A | - | + +**Result**: ❌ hakmem が遅い (system の 52%) + +--- + +### 3. bench_mid_large_mt (Mid/Large マルチスレッド) + +**Parameters**: Default (multithreaded allocations) + +| Allocator | Throughput | vs hakmem | +|-----------|------------|-----------| +| **hakmem** | **9.94M ops/s** | 1.00x ← **Winner!** | +| system (glibc) | 8.40M ops/s | 0.85x (slower) | +| mimalloc | timeout | - | + +**Result**: ✅ **hakmem が勝利!** (system より +18.3% 速い) + +--- + +## 🎯 総合評価 + +### 強み (hakmem が勝つケース) +- ✅ **Mid/Large マルチスレッド**: system より +18.3% 速い +- ✅ **効率性**: IPC, Branch Miss率, Cache Miss率は優秀(前回分析より) + +### 弱み (hakmem が負けるケース) +- ❌ **Small 混合ランダム**: mimalloc の 53%, system の 67% +- ❌ **Tiny ホットパス**: system の 52% + +--- + +## 📈 Phase 2 → Phase 3 の変化 + +### bench_random_mixed (ws=256) +- **Phase 2**: 51.3M ops/s (commit 0ce20bb83) +- **Phase 3**: 56.8M ops/s (commit d78baf41c) +- **Change**: **+10.7% 改善** ✓ (mincore削除の効果) + +### Phase 3の変更内容 +``` +Phase 3: Remove mincore() syscall completely +- Makefile: DISABLE_MINCORE 設定削除 +- hak_free_api.inc.h: mincore ロジック削除 (~60 lines) +- external_guard_box.h: is_mapped() を常に 1 を返すように簡略化 +``` + +**Performance Impact**: +10.7% improvement (mincore overhead eliminated) + +--- + +## 🔍 ボトルネック分析 (from PERF_ANALYSIS_EXECUTIVE_SUMMARY.md) + +### Top 3 問題点 + +#### 1位: SuperSlab 初期化の Page Fault (23.83% CPU時間) +- `shared_pool_acquire_slab()` → memset(1MB-2MB) → page fault +- **原因**: mmap が既にゼロページを返すのに、4つの memset() を実行 +- **対策**: Lines 912-915 in hakmem_tiny_superslab.c をコメントアウト +- **推定効果**: +10-15% throughput + +#### 2位: mincore() syscall overhead (17.94% CPU時間) ← **Phase 3で解決済み** +- **Phase 2**: 毎回のfreeでmincore()呼び出し (TLSキャッシュあり) +- **Phase 3**: 完全削除 → +10.7% 改善確認済み ✓ + +#### 3位: getenv() overhead (9.86% CPU時間) +- 毎回のmalloc/freeでgetenv()呼び出し +- **対策**: TLSキャッシュ、または __attribute__((constructor))で初期化 + +--- + +## 🚀 Phase 4 最適化ターゲット + +### Priority 1: SuperSlab memset() 削除 (Expected: +10-15%) +```bash +# Lines 912-915 in hakmem_tiny_superslab.c をコメントアウト +# mmap(MAP_ANONYMOUS) は既にゼロ初期化済み +``` + +### Priority 2: getenv() TLS キャッシュ (Expected: +5-8%) +```bash +# malloc/free の高頻度 getenv() 呼び出しをTLSキャッシュ化 +``` + +### Priority 3: Branch Optimization/PGO (Expected: +3-5%) +```bash +# Profile-Guided Optimization + LIKELY/UNLIKELY hints +``` + +**Total Expected**: +18-28% improvement → **Target: 67-73M ops/s** (bench_random_mixed) + +--- + +## 📌 次のアクション + +### Option A: Phase 4 最適化開始 +1. SuperSlab memset() 削除 (+10-15%) +2. getenv() TLSキャッシュ化 (+5-8%) +3. PGO/branch optimization (+3-5%) + +### Option B: 詳細プロファイル (Phase 3 baseline) +```bash +# 現在のPhase 3ベースラインでperf record +perf record -F 9999 -g ./bench_random_mixed_hakmem 5000000 256 42 +perf report --stdio --no-children --sort symbol +``` + +### Option C: mimalloc との徹底比較 +- mimalloc のビルド修正(libmimalloc.so.1 パス設定) +- 全ベンチマークで3回実行して統計取得 + +--- + +## 🔗 参考ドキュメント + +- `CHECKPOINT_PHASE2_COMPLETE.md` - Phase 2 Box化完了 +- `PHASE2_PERF_ANALYSIS.md` - Phase 2 パフォーマンス分析 +- `PERF_ANALYSIS_EXECUTIVE_SUMMARY.md` - ボトルネック分析 +- `perf_phase2_*.txt` - Phase 2 詳細プロファイル結果 + +--- + +Generated: 2025-11-29 +Phase: 3 (mincore removal complete) +Next: Phase 4 optimization targets (+18-28% expected)