Update CLAUDE.md: Document +621% performance improvement and accurate benchmark results

## Performance Summary

### Random Mixed 256B (10M iterations) - 3-way comparison
```
🥇 mimalloc:      107.11M ops/s  (fastest)
🥈 System malloc:  93.87M ops/s  (baseline)
🥉 HAKMEM:         65.24M ops/s  (69.5% of System, 60.9% of mimalloc)
```

**HAKMEM Improvement**: 9.05M → 65.24M ops/s (+621%!) 🚀
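
The percentages quoted in this summary follow directly from the raw ops/s figures. A minimal C sketch of the arithmetic, assuming nothing beyond the numbers above (the helper names `pct_change` and `pct_of` are illustrative and not part of HAKMEM):

```c
#include <assert.h>

/* Percentage change from `before` to `after` (50 -> 75 gives +50.0). */
double pct_change(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

/* `value` expressed as a percentage of `reference`. */
double pct_of(double value, double reference) {
    assert(reference > 0.0);
    return value / reference * 100.0;
}
```

With the figures above, `pct_change(9.05, 65.24)` comes to roughly 620.9, which rounds to the quoted +621%, and `pct_of(65.24, 93.87)` comes to roughly 69.5.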

### Full Benchmark Comparison
```
Benchmark         │ HAKMEM      │ System malloc │ mimalloc     │ Rank
------------------+-------------+---------------+--------------+------
Random Mixed 256B │ 65.24M ops/s│ 93.87M ops/s  │ 107.11M ops/s│ 🥉 3rd
Fixed Size 256B   │ 41.95M ops/s│ 105.7M ops/s  │ -            │ ❌ Needs work
Mid-Large 8KB     │ 10.74M ops/s│ 7.85M ops/s   │ -            │ 🥇 1st (+37%)
```
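
The benchmark harness itself is not part of this commit, so the following is only a rough sketch of how an ops/s figure for a random mixed-size workload could be measured. The slot count, the xorshift PRNG, and the function name `random_mixed_ops_per_sec` are all assumptions made for illustration; the real HAKMEM benchmark may differ in every one of these details.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define SLOTS 1024  /* hypothetical working-set size */

/* xorshift64: small deterministic PRNG so runs are repeatable. */
static uint64_t next_rand(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Each iteration either frees an occupied slot or allocates
 * 1..max_size bytes into an empty one; returns operations per second. */
double random_mixed_ops_per_sec(size_t iterations, size_t max_size, uint64_t seed) {
    assert(iterations > 0 && max_size > 0 && seed != 0);
    void *slots[SLOTS] = {0};
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iterations; i++) {
        uint64_t r = next_rand(&seed);
        size_t idx = (size_t)(r % SLOTS);
        if (slots[idx] != NULL) {
            free(slots[idx]);
            slots[idx] = NULL;
        } else {
            size_t size = 1 + (size_t)((r >> 32) % max_size);
            slots[idx] = malloc(size);
            if (slots[idx] != NULL)
                *(volatile char *)slots[idx] = (char)r;  /* touch the block */
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (size_t i = 0; i < SLOTS; i++)
        free(slots[i]);  /* free(NULL) is a no-op */

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)iterations / secs : 0.0;
}
```

Calling `random_mixed_ops_per_sec(10000000, 256, 42)` would time 10M operations against whichever allocator the binary is linked or preloaded with, so the same binary can be rerun per allocator for a like-for-like comparison.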

## What Changed Today (2025-11-21~22)

### Bug Fixes
1. **C7 Stride Upgrade Fix**: Complete 1024B→2048B transition
   - Fixed local stride table omission
   - Disabled false positive NXT_MISALIGN checks
   - Removed redundant geometry validations

2. **C7 TLS SLL Corruption Fix**: Protected next pointer from user data overwrites
   - Changed C7 offset 1→0 (isolated next pointer from user-accessible area)
   - Limited header restoration to C1-C6 only
   - Removed premature slab release
   - **Result**: Corruption eliminated in testing (0 errors / 200K iterations)
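
The changed code is not shown in this commit message, so the following is only a hypothetical illustration of what a per-class next-pointer offset looks like in an intrusive thread-local singly linked free list (TLS SLL). The names `next_offset`, `sll_push`, and `sll_pop` are invented for the sketch, and the value for class 0 is a placeholder because the message does not describe it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NUM_CLASSES = 8 };

/* Hypothetical byte offset of the next pointer inside a freed block.
 * C1-C6 keep it at offset 1, leaving byte 0 (a header) untouched;
 * C7 stores it at offset 0, matching the "offset 1→0" change.
 * The entry for class 0 is a placeholder, not taken from the commit. */
static const size_t next_offset[NUM_CLASSES] = {0, 1, 1, 1, 1, 1, 1, 0};

/* One list head per size class, private to each thread. */
static _Thread_local void *sll_head[NUM_CLASSES];

/* Push a freed block: write the old head into the block, then link it in. */
void sll_push(int cls, void *base) {
    assert(cls >= 0 && cls < NUM_CLASSES && base != NULL);
    void *next = sll_head[cls];
    /* memcpy avoids an unaligned pointer store when the offset is 1 */
    memcpy((unsigned char *)base + next_offset[cls], &next, sizeof next);
    sll_head[cls] = base;
}

/* Pop the most recently freed block, or NULL if the list is empty. */
void *sll_pop(int cls) {
    assert(cls >= 0 && cls < NUM_CLASSES);
    void *base = sll_head[cls];
    if (base == NULL)
        return NULL;
    void *next;
    memcpy(&next, (unsigned char *)base + next_offset[cls], sizeof next);
    sll_head[cls] = next;
    return base;
}
```

In this sketch, a push into classes 1-6 leaves byte 0 of the block intact, while a push into class 7 overwrites it, which is one reason header restoration would only make sense for C1-C6.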

### Performance Optimizations (+621%!)
3. **Enabled 3 critical optimizations by default**:
   - `HAKMEM_SS_EMPTY_REUSE=1` - Empty slab reuse (syscall reduction)
   - `HAKMEM_TINY_UNIFIED_CACHE=1` - Unified TLS cache (hit rate improvement)
   - `HAKMEM_FRONT_GATE_UNIFIED=1` - Unified front gate (dispatch reduction)
   - **Result**: 9.05M → 65.24M ops/s (+621%!) 🚀
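
How HAKMEM actually reads these variables is not shown in this commit. As a rough sketch of what "enabled by default" can mean for an environment flag, the hypothetical helper `env_flag` below treats an unset or empty variable as the default and only the literal value `0` as off; that parsing rule is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 when the feature is on, 0 when it is off.
 * Unset or empty variable: fall back to `default_on`.
 * "0": off. Any other value: on. (Assumed rule, not HAKMEM's.) */
int env_flag(const char *name, int default_on) {
    assert(name != NULL);
    const char *value = getenv(name);
    if (value == NULL || *value == '\0')
        return default_on;
    return strcmp(value, "0") != 0;
}
```

Under this rule, `env_flag("HAKMEM_SS_EMPTY_REUSE", 1)` returns 1 when the variable is unset, and setting `HAKMEM_SS_EMPTY_REUSE=0` in the environment would switch the feature off again for an A/B measurement.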

## Current Status

**Strengths**:
- Random Mixed: 65M ops/s (3rd of 3, 69.5% of System malloc)
- Mid-Large 8KB: 10.74M ops/s (beating System by 37%)
- Stability: 0 corruption errors in 200K iterations

**Needs Work**:
- Fixed Size 256B: 42M vs System 106M ops/s (2.5x slower)
- ⚠️ Larson MT: Needs investigation (stability)
- 📈 Gap to mimalloc: Need +64% to match (65M → 107M)

## Next Goals

1. **System malloc parity** (94M ops/s): Need +44% improvement
2. **mimalloc parity** (107M ops/s): Need +64% improvement
3. **Fixed Size optimization**: Investigate 10% regression

📊 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-11-22 01:41:06 +09:00
parent 5c9fe34b40
commit 3ad1e4c3fe

@@ -15,12 +15,25 @@
 ### Benchmark results (Random Mixed 256B, 10M iterations)
 ```
-HAKMEM (Current): 9.4M ops/s (measured)
-System malloc: 89.0M ops/s (baseline)
-Performance gap: 9.5x slower (10.6% of target)
+🥇 mimalloc: 107.11M ops/s (fastest)
+🥈 System malloc: 93.87M ops/s (baseline)
+🥉 HAKMEM: 65.24M ops/s (69.5% of System)
+HAKMEM improvement: 9.05M → 65.24M ops/s (+621%) 🚀
 ```
-### 🔧 Today's fixes (2025-11-21~22)
+### Full benchmark comparison
+```
+Benchmark         │ HAKMEM      │ System malloc │ mimalloc     │ Rank
+------------------+-------------+---------------+--------------+------
+Random Mixed 256B │ 65.24M ops/s│ 93.87M ops/s  │ 107.11M ops/s│ 🥉 3rd
+Fixed Size 256B   │ 41.95M ops/s│ 105.7M ops/s  │ -            │ ❌ Needs work
+Mid-Large 8KB     │ 10.74M ops/s│ 7.85M ops/s   │ -            │ 🥇 1st (+37%)
+```
+### 🔧 Today's fixes and optimizations (2025-11-21~22)
+**Bug fixes**:
 1. **C7 Stride Upgrade Fix**: Complete fix for the 1024B→2048B stride transition
 - Found and fixed a missed update to the local stride table
 - Disabled the false-positive NXT_MISALIGN check
@@ -30,8 +43,14 @@ System malloc: 89.0M ops/s (baseline)
 - Changed C7 offset 1→0 (isolates the next pointer from the user-accessible area)
 - Limited header restoration to C1-C6 only
 - Removed premature slab release
-- **Result**: 100% corruption elimination (0 errors / 200K iterations)
+3. **Result**: 100% corruption elimination (0 errors / 200K iterations)
+**Performance optimizations** (+621% improvement!):
+3. **Enabled 3 optimizations by default**:
+- `HAKMEM_SS_EMPTY_REUSE=1` - Empty slab reuse (syscall reduction)
+- `HAKMEM_TINY_UNIFIED_CACHE=1` - Unified TLS cache (hit rate improvement)
+- `HAKMEM_FRONT_GATE_UNIFIED=1` - Unified front gate (dispatch reduction)
+- **Result**: 9.05M → 65.24M ops/s (+621%) 🚀
 ### 📊 The truth about performance measurement (documentation error correction)