Phase 83-1 + Allocator Comparison: Switch dispatch fixed (NO-GO +0.32%), PROFILE correction, SCORECARD update

Key changes:

- Phase 83-1: Switch dispatch fixed mode (tiny_inline_slots_switch_dispatch_fixed_box) - NO-GO (marginal +0.32%, branch reduction negligible)
  Reason: lazy-init pattern already optimal, Phase 78-1 pattern shows diminishing returns

- Allocator comparison baseline update (10-run SSOT, WS=400, ITERS=20M):
  tcmalloc: 115.26M (92.33% of mimalloc)
  jemalloc: 97.39M (77.96% of mimalloc)
  system: 85.20M (68.24% of mimalloc)
  mimalloc: 124.82M (baseline)

- hakmem PROFILE correction: scripts/run_mixed_10_cleanenv.sh + run_allocator_quick_matrix.sh
  PROFILE explicitly set to MIXED_TINYV3_C7_SAFE for hakmem measurements
  Result: baseline stabilized to 55.53M (44.46% of mimalloc)
  Previous unstable measurement (35.57M) was due to profile leak

- Documentation:
  * PERFORMANCE_TARGETS_SCORECARD.md: Reference allocators + M1/M2 milestone status
  * PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md: Phase 83-1 analysis (NO-GO)
  * ALLOCATOR_COMPARISON_QUICK_RUNBOOK.md: Quick comparison procedure
  * ALLOCATOR_COMPARISON_SSOT.md: Detailed SSOT methodology

- M2 milestone status: 44.46% (target 55%, gap -10.54pp) - structural improvements needed

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -53,17 +53,60 @@ Note:

| allocator | mean (M ops/s) | median (M ops/s) | ratio vs mimalloc (mean) | CV |
|----------|-----------------|------------------|--------------------------|-----|
| **mimalloc (separate)** | **120.979** | 120.967 | **100%** | 0.90% |
| jemalloc (LD_PRELOAD) | 96.06 | 97.00 | 79.73% | 2.93% |
| system (separate) | 85.10 | 85.24 | 70.65% | 1.01% |
| **mimalloc (separate)** | **124.82** | 124.71 | **100%** | 1.10% |
| **tcmalloc (LD_PRELOAD)** | **115.26** | 115.51 | **92.33%** | 1.22% |
| **jemalloc (LD_PRELOAD)** | **97.39** | 97.88 | **77.96%** | 1.29% |
| **system (separate)** | **85.20** | 85.40 | **68.24%** | 1.98% |
| libc (same binary) | 76.26 | 76.66 | 63.30% | (old) |
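
The mean, median, and CV columns can be derived from per-run throughput samples. The following is a minimal sketch, not the repository's actual script: it assumes one value per line (M ops/s) and defines CV as sample standard deviation (n - 1 denominator) divided by the mean, which the source does not specify. The helper name `summarize_runs` and the sample numbers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize per-run throughput read from stdin.
# Input: one number per line. Output: "mean median cv_percent".
summarize_runs() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      mean = sum / NR
      # Median of the already sorted samples.
      if (NR % 2) median = v[(NR + 1) / 2]
      else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
      # Sample standard deviation (n - 1 denominator); an assumption.
      for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      printf "%.2f %.2f %.2f\n", mean, median, 100 * sd / mean
    }'
}

# Made-up samples, not measurements from this repository.
printf '%s\n' 100 102 98 101 99 | summarize_runs
```

Reporting the median next to the mean makes a single outlier run visible, and a CV of around 1 to 2% (as in the table) indicates that the 10-run mean is reasonably stable.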

Notes:
- **Phase 59b rebase**: mimalloc updated (120.466M → 120.979M, +0.43% variation)
- `system/mimalloc/jemalloc` are measured as separate binaries, so they are a **reference that includes layout (text size/I-cache) differences**
- **2025-12-18 Update (corrected)**: tcmalloc/jemalloc/system measurements complete (10-run Random Mixed, WS=400, ITERS=20M, SEED=1)
  - tcmalloc: 115.26M ops/s (92.33% of mimalloc) ✓
  - jemalloc: 97.39M ops/s (77.96% of mimalloc)
  - system: 85.20M ops/s (68.24% of mimalloc)
  - mimalloc: 124.82M ops/s (baseline)
  - Measurement script: `scripts/run_allocator_quick_matrix.sh` (hakmem via run_mixed_10_cleanenv.sh)
  - **Fix**: hakmem measurements now set HAKMEM_PROFILE explicitly → results return to the SSOT range
- `system/mimalloc/jemalloc/tcmalloc` are measured as separate binaries, so they are a **reference that includes layout (text size/I-cache) differences**
- `tcmalloc (LD_PRELOAD)` is installed from gperftools (`/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so`)
- `libc (same binary)` uses `HAKMEM_FORCE_LIBC_ALLOC=1` and serves as a rough same-layout comparison point (measured before Phase 48)
- **Use the FAST build for mimalloc comparisons** (the gate overhead of Standard is a tax specific to hakmem)
- **First jemalloc measurement**: 79.73% of mimalloc (Phase 59 baseline; a strong competitor, 9% faster than system)
- Comparison procedure (SSOT): `docs/analysis/ALLOCATOR_COMPARISON_SSOT.md`
- **Same-binary comparison (minimizes layout differences)**: `scripts/run_allocator_preload_matrix.sh` (fixed `bench_random_mixed_system` binary + swapped `LD_PRELOAD`)
  - Caution: this path differs from hakmem's SSOT (`bench_random_mixed_hakmem*`); it is a drop-in wrapper reference
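
The PROFILE fix noted above comes down to never letting the benchmark inherit a profile from the calling shell. The sketch below shows one way to do that with `env -i`. It is an illustration under assumptions, not the contents of `scripts/run_mixed_10_cleanenv.sh`: the wrapper name `run_with_profile` and the stale value `SOMETHING_ELSE` are hypothetical, and a stand-in command replaces the real benchmark binary.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with a scrubbed environment and an
# explicitly chosen HAKMEM_PROFILE, so settings from the caller's shell
# cannot leak into the measurement.
run_with_profile() {
  local profile="$1"; shift
  # env -i starts from an empty environment; only PATH and the profile pass through.
  env -i PATH="$PATH" HAKMEM_PROFILE="$profile" "$@"
}

# Simulate a stale setting left over in the calling shell (placeholder value).
export HAKMEM_PROFILE=SOMETHING_ELSE

# Stand-in for the benchmark: print the profile the child process actually sees.
run_with_profile MIXED_TINYV3_C7_SAFE sh -c 'echo "profile=$HAKMEM_PROFILE"'
```

Passing the profile on the command line of every measurement makes the configuration part of the recorded invocation, which is what allows two runs to be compared.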

## Allocator Comparison (bench_allocators_compare.sh, small-scale reference)

Caution:
- This is a **small-scale reference** based on `--scenario mixed` of `bench_allocators_*` (a simple mix of 8B..1MB).
- It is **distinct** from the Mixed 16–1024B SSOT (`scripts/run_mixed_10_cleanenv.sh`), so do not conflate it with the FAST baseline or the milestones.

Run (example):
```bash
make bench
JEMALLOC_SO=/path/to/libjemalloc.so.2 \
TCMALLOC_SO=/path/to/libtcmalloc.so \
scripts/bench_allocators_compare.sh --scenario mixed --iterations 50
```

Results (2025-12-18, mixed, iterations=50):

| allocator | ops/sec (M) | vs mimalloc (Phase 69 ref) | vs system | soft_pf | RSS (MB) |
|----------|--------------|----------------------------|-----------|---------|----------|
| tcmalloc (LD_PRELOAD) | 34.56 | 28.6% | 11.2x | 3,842 | 21.5 |
| jemalloc (LD_PRELOAD) | 24.33 | 20.1% | 7.9x | 143 | 3.8 |
| hakmem (linked) | 16.85 | 13.9% | 5.4x | 4,701 | 46.5 |
| system (linked) | 3.09 | 2.6% | 1.0x | 68,590 | 19.6 |

Supplementary notes:
- `soft_pf`/`RSS` come from `getrusage()` (on Linux, `ru_maxrss` is in KB).
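
Because `ru_maxrss` is reported in KB on Linux, the RSS (MB) column needs a unit conversion. The sketch below shows that conversion and a shell-level way to read the same two counters for a child process. It rests on several assumptions: the divisor is 1024 (the source does not say whether 1024 or 1000 is used), `soft_pf` corresponds to minor page faults, and GNU time is installed at `/usr/bin/time`. Both function names and the sample value are hypothetical.

```bash
#!/usr/bin/env bash
# Convert a ru_maxrss value (kilobytes on Linux) to megabytes, one decimal.
# Assumption: 1 MB = 1024 KB.
kb_to_mb() {
  awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 }'
}

# Assumption: GNU time at /usr/bin/time. Its %M and %R format specifiers
# report max RSS (KB) and minor page faults of the child; output goes to stderr.
measure_rss_and_soft_pf() {
  /usr/bin/time -f 'max_rss_kb=%M soft_pf=%R' "$@"
}

# Illustrative value only: 22016 KB / 1024.
kb_to_mb 22016
```

A call such as `measure_rss_and_soft_pf ./some_benchmark` (placeholder binary name) would print both counters after the benchmark exits.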

## Allocator Comparison (Random Mixed, 10-run, WS=400, reference)

Caution:
- Separate-binary comparisons mix in a layout tax.
- To **prioritize same-binary comparison (LD_PRELOAD)**, use `scripts/run_allocator_preload_matrix.sh`.
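
The same-binary approach keeps one benchmark executable fixed and swaps only the preloaded allocator library, so code layout stays constant across rows. The loop below is a rough sketch of that idea and not the contents of `scripts/run_allocator_preload_matrix.sh`: the function names are hypothetical, and a shell function stands in for the benchmark binary so the example runs without any allocator installed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run one fixed benchmark under several allocators by
# swapping LD_PRELOAD. Each argument after the benchmark is name=path;
# an empty path means "no preload" (the default allocator of the binary).
run_preload_matrix() {
  local bench="$1"; shift
  local entry name lib
  for entry in "$@"; do
    name="${entry%%=*}"
    lib="${entry#*=}"
    printf '== %s ==\n' "$name"
    if [ -n "$lib" ]; then
      LD_PRELOAD="$lib" "$bench"
    else
      # Make sure a preload inherited from the caller does not leak in.
      ( unset LD_PRELOAD; "$bench" )
    fi
  done
}

# Stand-in "benchmark": only reports which library it would be preloaded with.
fake_bench() { echo "LD_PRELOAD=${LD_PRELOAD:-none}"; }

run_preload_matrix fake_bench \
  system= \
  tcmalloc=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so
```

With a real binary in place of `fake_bench`, each section would contain that allocator's benchmark output; as the notes above point out, this path is a drop-in wrapper reference and is not the same as hakmem's SSOT path.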

## 1) Speed (relative targets)

@@ -71,14 +114,16 @@ Notes:

Recommended milestones (Mixed 16–1024B, FAST build):

| Milestone | Target | Current (FAST v3 + PGO Phase 69) | Status |
| Milestone | Target | Current (2025-12-18, corrected) | Status |
|-----------|--------|-----------------------------------|--------|
| M1 | **50%** of mimalloc | 51.77% | 🟢 **EXCEEDED** (Phase 69, Warm Pool Size=16, ENV-only) |
| M2 | **55%** of mimalloc | - | 🔴 Not reached (+3.23pp remaining, Phase 69+ ongoing) |
| M1 | **50%** of mimalloc | 44.46% | 🟡 **Not reached** (measured after the PROFILE fix) |
| M2 | **55%** of mimalloc | 44.46% | 🔴 **Not reached** (Gap: -10.54pp) |
| M3 | **60%** of mimalloc | - | 🔴 Not reached (structural rework required) |
| M4 | **65–70%** of mimalloc | - | 🔴 Not reached (structural rework required) |

**Current status:** FAST v3 + PGO (Phase 69) = 62.63M ops/s = 51.77% of mimalloc (Warm Pool Size=16, ENV-only, verified over 10 runs)
**Current status:** hakmem (FAST PGO) (2025-12-18) = 55.53M ops/s = 44.46% of mimalloc (Random Mixed, WS=400, ITERS=20M, 10-run)

⚠️ **Important**: the Phase 69 baseline (62.63M = 51.77%) may reflect outdated measurement conditions. The new baseline after the explicit-PROFILE fix is 44.46% (M1 not reached).

**Phase 68 PGO promotion (Phase 66 → Phase 68 upgrade):**
- Phase 66 baseline: 60.89M ops/s = 50.32% (+3.0% mean, 3-run stable)
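
The milestone gaps quoted in the table are plain differences in percentage points between the current ratio and each target. A small sketch of that arithmetic follows, taking the 44.46% figure from the corrected 2025-12-18 row; the helper name `milestone_gap_pp` is hypothetical, and M4 is left out because its target is a range.

```bash
#!/usr/bin/env bash
# Gap between the current ratio and a milestone target, in percentage points.
# Negative means the target has not been reached yet.
milestone_gap_pp() {
  awk -v cur="$1" -v target="$2" 'BEGIN { printf "%+.2f\n", cur - target }'
}

current=44.46   # % of mimalloc, from the corrected 2025-12-18 measurement

# name:target pairs taken from the milestone table.
for m in M1:50 M2:55 M3:60; do
  printf '%s gap: %spp\n' "${m%%:*}" "$(milestone_gap_pp "$current" "${m#*:}")"
done
```

The M2 line reproduces the -10.54pp gap stated in the table, and the same subtraction puts M1 at -5.54pp under the corrected baseline.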