From 3ad1e4c3fe34d168e4526f5d7a85baf6645fb8d4 Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Sat, 22 Nov 2025 01:41:06 +0900
Subject: [PATCH] Update CLAUDE.md: Document +621% performance improvement and
 accurate benchmark results
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Performance Summary

### Random Mixed 256B (10M iterations) - 3-way comparison

```
🥇 mimalloc:      107.11M ops/s (fastest)
🥈 System malloc:  93.87M ops/s (baseline)
🥉 HAKMEM:         65.24M ops/s (69.5% of System, 60.9% of mimalloc)
```

**HAKMEM Improvement**: 9.05M → 65.24M ops/s (+621%!) 🚀

### Full Benchmark Comparison

```
Benchmark         │ HAKMEM      │ System malloc │ mimalloc     │ Rank
------------------+-------------+---------------+--------------+------
Random Mixed 256B │ 65.24M ops/s│ 93.87M ops/s  │ 107.11M ops/s│ 🥉 3rd
Fixed Size 256B   │ 41.95M ops/s│ 105.7M ops/s  │ -            │ ❌ Needs work
Mid-Large 8KB     │ 10.74M ops/s│ 7.85M ops/s   │ -            │ 🥇 1st (+37%)
```

## What Changed Today (2025-11-21~22)

### Bug Fixes

1. **C7 Stride Upgrade Fix**: Complete 1024B→2048B transition
   - Fixed local stride table omission
   - Disabled false positive NXT_MISALIGN checks
   - Removed redundant geometry validations

2. **C7 TLS SLL Corruption Fix**: Protected next pointer from user data overwrites
   - Changed C7 offset 1→0 (isolated next pointer from user-accessible area)
   - Limited header restoration to C1-C6 only
   - Removed premature slab release
   - **Result**: 100% corruption elimination (0 errors / 200K iterations) ✅

### Performance Optimizations (+621%!)

3. **Enabled 3 critical optimizations by default**:
   - `HAKMEM_SS_EMPTY_REUSE=1` - Empty slab reuse (syscall reduction)
   - `HAKMEM_TINY_UNIFIED_CACHE=1` - Unified TLS cache (hit rate improvement)
   - `HAKMEM_FRONT_GATE_UNIFIED=1` - Unified front gate (dispatch reduction)
   - **Result**: 9.05M → 65.24M ops/s (+621%!) 🚀

## Current Status

**Strengths**:
- ✅ Random Mixed: 65M ops/s (competitive, 3rd place)
- ✅ Mid-Large 8KB: 10.74M ops/s (beating System by 37%!)
- ✅ Stability: 100% corruption-free

**Needs Work**:
- ❌ Fixed Size 256B: 42M vs System 106M (2.5x slower)
- ⚠️ Larson MT: Needs investigation (stability)
- 📈 Gap to mimalloc: Need +64% to match (65M → 107M)

## Next Goals

1. **System malloc parity** (94M ops/s): Need +44% improvement
2. **mimalloc parity** (107M ops/s): Need +64% improvement
3. **Fixed Size optimization**: Investigate 10% regression

📊 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 CLAUDE.md | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 5cfdedb7..82a45d5d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -15,12 +15,25 @@
 
 ### Benchmark results (Random Mixed 256B, 10M iterations)
 ```
-HAKMEM (Current): 9.4M ops/s (measured)
-System malloc: 89.0M ops/s (baseline)
-Performance gap: 9.5x slower (10.6% of target)
+🥇 mimalloc:      107.11M ops/s (fastest)
+🥈 System malloc:  93.87M ops/s (baseline)
+🥉 HAKMEM:         65.24M ops/s (69.5% of System)
+
+HAKMEM improvement: 9.05M → 65.24M ops/s (+621%!) 🚀
 ```
 
-### 🔧 Today's fixes (2025-11-21~22)
+### Full benchmark comparison
+```
+Benchmark         │ HAKMEM      │ System malloc │ mimalloc     │ Rank
+------------------+-------------+---------------+--------------+------
+Random Mixed 256B │ 65.24M ops/s│ 93.87M ops/s  │ 107.11M ops/s│ 🥉 3rd
+Fixed Size 256B   │ 41.95M ops/s│ 105.7M ops/s  │ -            │ ❌ Needs work
+Mid-Large 8KB     │ 10.74M ops/s│ 7.85M ops/s   │ -            │ 🥇 1st (+37%)
+```
+
+### 🔧 Today's fixes and optimizations (2025-11-21~22)
+
+**Bug fixes**:
 1. **C7 Stride Upgrade Fix**: Complete fix of the 1024B→2048B stride transition
    - Found and fixed a missed update to the local stride table
    - Disabled the false positive NXT_MISALIGN check
@@ -30,8 +43,14 @@ System malloc: 89.0M ops/s (baseline)
    - Changed C7 offset 1→0 (isolates the next pointer outside the user-accessible area)
    - Limited header restoration to C1-C6 only
    - Removed premature slab release
+   - **Result**: 100% corruption elimination (0 errors / 200K iterations) ✅
 
-3. **Result**: 100% corruption elimination (0 errors / 200K iterations) ✅
+**Performance optimizations** (+621% improvement!):
+3. **Enabled 3 optimizations by default**:
+   - `HAKMEM_SS_EMPTY_REUSE=1` - Empty slab reuse (fewer syscalls)
+   - `HAKMEM_TINY_UNIFIED_CACHE=1` - Unified TLS cache (higher hit rate)
+   - `HAKMEM_FRONT_GATE_UNIFIED=1` - Unified front gate (less dispatch)
+   - **Result**: 9.05M → 65.24M ops/s (+621%!) 🚀
 
 ### 📊 The truth about performance measurements (documentation error corrections)
 