Update CLAUDE.md: Document +621% performance improvement and accurate benchmark results
## Performance Summary

### Random Mixed 256B (10M iterations) - 3-way comparison

```
🥇 mimalloc:      107.11M ops/s (fastest)
🥈 System malloc:  93.87M ops/s (baseline)
🥉 HAKMEM:         65.24M ops/s (69.5% of System, 60.9% of mimalloc)
```

**HAKMEM Improvement**: 9.05M → 65.24M ops/s (+621%!) 🚀

### Full Benchmark Comparison

```
Benchmark         │ HAKMEM       │ System malloc │ mimalloc      │ Rank
------------------+--------------+---------------+---------------+------
Random Mixed 256B │ 65.24M ops/s │ 93.87M ops/s  │ 107.11M ops/s │ 🥉 3rd
Fixed Size 256B   │ 41.95M ops/s │ 105.7M ops/s  │ -             │ ❌ Needs work
Mid-Large 8KB     │ 10.74M ops/s │ 7.85M ops/s   │ -             │ 🥇 1st (+37%)
```

## What Changed Today (2025-11-21~22)

### Bug Fixes

1. **C7 Stride Upgrade Fix**: Complete 1024B→2048B transition
   - Fixed local stride table omission
   - Disabled false positive NXT_MISALIGN checks
   - Removed redundant geometry validations
2. **C7 TLS SLL Corruption Fix**: Protected next pointer from user data overwrites
   - Changed C7 offset 1→0 (isolated next pointer from user-accessible area)
   - Limited header restoration to C1-C6 only
   - Removed premature slab release
   - **Result**: 100% corruption elimination (0 errors / 200K iterations) ✅

### Performance Optimizations (+621%!)

3. **Enabled 3 critical optimizations by default**:
   - `HAKMEM_SS_EMPTY_REUSE=1` - Empty slab reuse (syscall reduction)
   - `HAKMEM_TINY_UNIFIED_CACHE=1` - Unified TLS cache (hit rate improvement)
   - `HAKMEM_FRONT_GATE_UNIFIED=1` - Unified front gate (dispatch reduction)
   - **Result**: 9.05M → 65.24M ops/s (+621%!) 🚀

## Current Status

**Strengths**:
- ✅ Random Mixed: 65M ops/s (competitive, 3rd place)
- ✅ Mid-Large 8KB: 10.74M ops/s (beating System by 37%!)
- ✅ Stability: 100% corruption-free

**Needs Work**:
- ❌ Fixed Size 256B: 42M vs System 106M (2.5x slower)
- ⚠️ Larson MT: Needs investigation (stability)
- 📈 Gap to mimalloc: Need +64% to match (65M → 107M)

## Next Goals

1. **System malloc parity** (94M ops/s): Need +44% improvement
2. **mimalloc parity** (107M ops/s): Need +64% improvement
3. **Fixed Size optimization**: Investigate 10% regression

📊 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
29
CLAUDE.md
@@ -15,12 +15,25 @@
 
 ### Benchmark Results (Random Mixed 256B, 10M iterations)
 ```
-HAKMEM (Current): 9.4M ops/s (measured)
-System malloc: 89.0M ops/s (baseline)
-Performance gap: 9.5x slower (10.6% of target)
+🥇 mimalloc: 107.11M ops/s (fastest)
+🥈 System malloc: 93.87M ops/s (baseline)
+🥉 HAKMEM: 65.24M ops/s (69.5% of System)
+
+HAKMEM improvement: 9.05M → 65.24M ops/s (+621%!) 🚀
 ```
 
-### 🔧 Today's Fixes (2025-11-21~22)
+### Full Benchmark Comparison
+```
+Benchmark         │ HAKMEM       │ System malloc │ mimalloc      │ Rank
+------------------+--------------+---------------+---------------+------
+Random Mixed 256B │ 65.24M ops/s │ 93.87M ops/s  │ 107.11M ops/s │ 🥉 3rd
+Fixed Size 256B   │ 41.95M ops/s │ 105.7M ops/s  │ -             │ ❌ Needs work
+Mid-Large 8KB     │ 10.74M ops/s │ 7.85M ops/s   │ -             │ 🥇 1st (+37%)
+```
+
+### 🔧 Today's Fixes and Optimizations (2025-11-21~22)
+
+**Bug Fixes**:
 1. **C7 Stride Upgrade Fix**: Complete fix of the 1024B→2048B stride transition
    - Found and fixed a missed update to the local stride table
    - Disabled the false positive NXT_MISALIGN check
@@ -30,8 +43,14 @@ System malloc: 89.0M ops/s (baseline)
    - Changed C7 offset 1→0 (isolated the next pointer from the user-accessible area)
    - Limited header restoration to C1-C6 only
    - Removed premature slab release
+   - **Result**: 100% corruption elimination (0 errors / 200K iterations) ✅
 
-3. **Result**: 100% corruption elimination (0 errors / 200K iterations) ✅
+**Performance Optimizations** (+621% improvement!):
+3. **Enabled 3 optimizations by default**:
+   - `HAKMEM_SS_EMPTY_REUSE=1` - Empty slab reuse (syscall reduction)
+   - `HAKMEM_TINY_UNIFIED_CACHE=1` - Unified TLS cache (hit rate improvement)
+   - `HAKMEM_FRONT_GATE_UNIFIED=1` - Unified front gate (dispatch reduction)
+   - **Result**: 9.05M → 65.24M ops/s (+621%!) 🚀
 
 ### 📊 The Truth About Performance Measurement (documentation error correction)
 