# mimalloc Gap Summary (Phase v11b-1)
## Current Status: 2025-12-12
### Throughput Comparison (Mixed 16-1024B, ws=400, 10M iter)
| Allocator | Throughput | vs mimalloc |
|-----------|------------|-------------|
| mimalloc | 65.5M ops/s | 1.00x |
| hakmem v11b-1 | 50.7M ops/s | **0.77x** |
### Progress Summary
| Phase | Throughput (ops/s) | vs mimalloc | Key Change |
|-------|--------------------|-------------|------------|
| v11a-4 | 38.6M | 0.59x | baseline |
| v11a-5 | 45.4M | 0.69x | alloc path: single switch + C7 early-exit |
| v11b-1 | 50.7M | **0.77x** | free path: single switch + C7 early-exit |
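
The "single switch + C7 early-exit" change listed for v11a-5 and v11b-1 can be pictured with a minimal C sketch. Everything concrete here is an assumption made for illustration and is not taken from the hakmem source: the eight power-of-two classes (C0 = 8 B up to C7 = 1024 B), the `hk_` names, and the routing enum. The idea shown is only the shape of the dispatch: test class 7 first, then send every other class through one dense `switch`.

```c
#include <stddef.h>

/* Hypothetical class index for the largest front-end class. */
enum { HK_C7 = 7 };

/* Hypothetical routing result, used only to make the dispatch observable. */
typedef enum {
    HK_ROUTE_SMALL,   /* C0-C6: general front-end path */
    HK_ROUTE_C7,      /* C7: dedicated path taken by the early exit */
    HK_ROUTE_BACKEND  /* out-of-range class: fall back to the backend */
} hk_route_t;

/* Map a request size to an assumed power-of-two class, or -1 if out of range. */
static inline int hk_size_to_class(size_t size) {
    if (size == 0 || size > 1024) return -1;
    int cls = 0;
    size_t cap = 8;                 /* capacity of C0 */
    while (cap < size) {            /* double until the request fits */
        cap <<= 1;
        cls++;
    }
    return cls;
}

/* Shared dispatch shape for both the alloc and the free path. */
static inline hk_route_t hk_route(int cls) {
    /* Early exit: C7 is tested before the general dispatch. */
    if (cls == HK_C7) return HK_ROUTE_C7;

    /* One dense switch over the remaining classes; compilers can
       lower this to a jump table or a single range check. */
    switch (cls) {
    case 0: case 1: case 2: case 3: case 4: case 5: case 6:
        return HK_ROUTE_SMALL;
    default:
        return HK_ROUTE_BACKEND;
    }
}
```

In a real allocator each `case` would call the class-specific alloc or free routine rather than return a tag; the tag is used here only so the control flow can be checked in isolation.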
### perf stat Comparison (Mixed 16-1024B, v11a-5 data)
| Metric | mimalloc | hakmem | Ratio (hakmem / mimalloc) |
|--------|----------|--------|---------------------------|
| cycles | ~500M | 1.04B | 2.1x |
| instructions | ~920M | 2.2B | 2.4x |
| cache-misses | ~90K | 408K | 4.5x |
| branch-misses | ~6.3M | 14.5M | 2.3x |
### Next Target
**Front-end optimization is complete for both the alloc and free paths. The next target is the backend core or cache locality.**

Candidates:
1. **Cache locality**: the 4.5x cache-miss ratio is the largest gap → TLS page prefetch, hot page reuse
2. **Instruction count reduction**: 2.4x gap → inlining, macro expansion
3. Design of the small band (C2-C3) for small-object v7
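
As a rough illustration of the first candidate, the sketch below combines hot page reuse with a prefetch of the next free block. It is one possible reading of "TLS page prefetch, hot page reuse", not the planned hakmem design: the `hk_block_t` and `hk_page_t` layouts, the single per-thread `hk_hot_page` pointer, and all function names are hypothetical. `__builtin_prefetch` is the GCC/Clang builtin, and `_Thread_local` requires C11.

```c
#include <stddef.h>

/* Hypothetical intrusive free-list node stored inside each free block. */
typedef struct hk_block {
    struct hk_block *next;
} hk_block_t;

/* Hypothetical per-page header. */
typedef struct hk_page {
    hk_block_t *free;  /* LIFO free list: last freed block is reused first */
    size_t used;       /* number of blocks currently handed out */
} hk_page_t;

/* Assumed policy: each thread remembers the page it freed into last. */
static _Thread_local hk_page_t *hk_hot_page;

/* Pop one block from a page, or return NULL if the page has no free block. */
static inline void *hk_page_alloc(hk_page_t *page) {
    hk_block_t *b = page->free;
    if (!b) return NULL;
    page->free = b->next;
    /* Warm the cache line of the block the next allocation will return. */
    if (b->next) __builtin_prefetch(b->next, 1, 3);
    page->used++;
    return b;
}

/* Push a block back and mark its page as this thread's hot page. */
static inline void hk_page_free(hk_page_t *page, void *p) {
    hk_block_t *b = (hk_block_t *)p;
    b->next = page->free;
    page->free = b;
    page->used--;
    hk_hot_page = page;
}

/* Fast path: try the hot page first; a real allocator would fall back
   to a slower page search when this returns NULL. */
static inline void *hk_alloc_hot(void) {
    return hk_hot_page ? hk_page_alloc(hk_hot_page) : NULL;
}
```

Whether this pattern actually lowers the cache-miss count would need to be confirmed with the same perf stat run used for the table above.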
### Key Insight
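- Converting the dispatch to a switch (jump table) is effective on both the alloc and free paths.
- Front-end optimization alone improved throughput by 31% from v11a-4 to v11b-1 (38.6M → 50.7M ops/s).
- The remaining gap to mimalloc comes mainly from cache-misses (4.5x) and instructions (2.4x).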
---
**Date**: 2025-12-12
**Phase**: v11b-1 complete