Correct CLAUDE.md: Fix performance measurement documentation error

## Critical Discovery

The Phase 3d-B (22.6M) and Phase 3d-C (25.1M) performance claims were **never actually measured**. They were mathematical extrapolations of "expected" improvements that were incorrectly documented as measured results.

## Evidence

**Phase 3d-C commit (23c0d9541, 2025-11-20)**:
```
Testing:
- 10K ops sanity test: PASS (1.4M ops/s)
- Baseline established for Phase C-8 benchmark comparison
```
→ Only a 10K sanity test, NO full benchmark run

**Documentation commit (b3a156879, 6 minutes later)**:
```
HAKMEM (Phase 3d-C): 25.1M ops/s (+11.1% vs Phase 3d-B) ✅
```
→ Zero code changes, only CLAUDE.md updated with unverified numbers

## How 25.1M Was Generated

Mathematical extrapolation without measurement:
```
Phase 11: 9.38M ops/s (verified)
Expected: +12-18% (Phase 3d-B), +8-12% (Phase 3d-C)
Calculation: 9.38M × 1.24 × 1.10 = 12.8M (expected)
Documented: 22.6M → 25.1M (inflated by stacking "expected" gains)
```

## True Performance Timeline

| Phase | Documented | Actually Measured |
|-------|-----------|-------------------|
| Phase 11 (2025-11-13) | 9.38M ops/s | ✅ 9.38M (verified) |
| Phase 3d-A (2025-11-20) | - | No benchmark |
| Phase 3d-B (2025-11-20) | 22.6M ❌ | No full benchmark |
| Phase 3d-C (2025-11-20) | 25.1M ❌ | 1.4M (10K sanity only) |
| Current (2025-11-22) | - | ✅ 9.4M (verified, 10M iter) |

**True cumulative improvement**: 9.38M → 9.4M = **+0.2%** (NOT +168%)

## Corrected Documentation

### Before (Incorrect):
```
HAKMEM (Phase 3d-C): 25.1M ops/s (+11.1% vs Phase 3d-B) ✅
System malloc: 90M ops/s
Performance gap: 3.6x slower (27.9% of target)
Phase 3d-B: 22.6M ops/s - g_tls_sll[] merge
Phase 3d-C: 25.1M ops/s (+11.1%) - Slab split
```

### After (Correct):
```
HAKMEM (Current): 9.4M ops/s (measured, 10M iterations)
System malloc: 89.0M ops/s
Performance gap: 9.5x slower (10.6% of target)
Phase 3d-B: implemented (expected +12-18%, not measured)
Phase 3d-C: implemented (expected +8-12%, not measured)
```

## Impact Assessment

**No performance regression occurred from today's C7 bug fixes**:
- Phase 3d-C (claimed 25.1M): Never existed
- Current (9.4M ops/s): Consistent with Phase 11 baseline (9.38M)
- C7 corruption fix: Maintained performance while eliminating bugs ✅

## Lessons Learned

1. **Always run actual benchmarks** before documenting performance
2. **Distinguish "expected" from "measured"** in documentation
3. **Save benchmark command and output** for reproducibility
4. **Verify measurements across multiple runs** for consistency

📊 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
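
The arithmetic quoted in this commit message can be rechecked with a short script. What follows is a minimal sketch in Python, not part of the HAKMEM repository: the constant and function names are hypothetical, and the figures are copied from the message above rather than re-measured.

```python
# Figures copied from the commit message (M ops/s). All names are hypothetical.
PHASE_11_MEASURED = 9.38   # verified baseline
CURRENT_MEASURED = 9.4     # verified, 10M iterations
SYSTEM_MALLOC = 89.0       # system malloc in the same run
DOCUMENTED_3D_C = 25.1     # documented but never measured


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def stacked_expectation(baseline: float, *factors: float) -> float:
    """Multiply a measured baseline by a chain of *expected* speedup factors."""
    result = baseline
    for factor in factors:
        result *= factor
    return result


# Stacking the two expected factors used in the message gives about 12.8M,
# far below the documented 25.1M.
expected_3d_c = stacked_expectation(PHASE_11_MEASURED, 1.24, 1.10)

true_gain = pct_change(PHASE_11_MEASURED, CURRENT_MEASURED)
claimed_gain = pct_change(PHASE_11_MEASURED, DOCUMENTED_3D_C)
slowdown = SYSTEM_MALLOC / CURRENT_MEASURED
share_of_target = CURRENT_MEASURED / SYSTEM_MALLOC * 100.0

print(f"expected after 3d-B and 3d-C: {expected_3d_c:.1f}M ops/s")  # 12.8M
print(f"true cumulative gain:         {true_gain:+.1f}%")           # +0.2%
print(f"claimed cumulative gain:      {claimed_gain:+.0f}%")        # +168%
print(f"gap to system malloc:         {slowdown:.1f}x slower")      # 9.5x
print(f"share of target:              {share_of_target:.1f}%")      # 10.6%
```

Keeping measured values and expected factors in separate variables mirrors the second lesson above: an expectation only ever appears as a multiplier on a measured baseline, never as a standalone throughput number.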
43
CLAUDE.md
@@ -11,16 +11,16 @@
 
 ---
 
-## 📊 Current Performance (2025-11-21)
+## 📊 Current Performance (2025-11-22)
 
-### Benchmark Results (Random Mixed 256B)
+### Benchmark Results (Random Mixed 256B, 10M iterations)
 ```
-HAKMEM (after bug fix): 9.3M ops/s ⚠️
-System malloc: 58.8M ops/s (baseline)
-Performance gap: 6.3x slower (15.8% of target)
+HAKMEM (Current): 9.4M ops/s (measured)
+System malloc: 89.0M ops/s (baseline)
+Performance gap: 9.5x slower (10.6% of target)
 ```
 
-### 🔧 Today's Fixes (2025-11-21)
+### 🔧 Today's Fixes (2025-11-21 to 22)
 1. **C7 Stride Upgrade Fix**: complete fix for the 1024B→2048B stride migration
 - Found and fixed a missed update to the local stride table
 - Disabled the false-positive NXT_MISALIGN check
@@ -33,24 +33,33 @@ System malloc: 58.8M ops/s (baseline)
 
 3. **Result**: 100% of corruption eliminated (0 errors / 200K iterations) ✅
 
-### ⚠️ Performance Regression Concern
+### 📊 The Truth About the Performance Measurements (Documentation Correction)
+
+**Error found**: Phase 3d-B (22.6M) / Phase 3d-C (25.1M) were **never actually measured**
+
 ```
-Phase 3d-C (2025-11-20): 25.1M ops/s (27.9% of System)
-Today (after bug fix): 9.3M ops/s (15.8% of System)
-Performance gap: -63% drop
+Phase 11 (2025-11-13): 9.38M ops/s ✅ (measured and verified)
+Phase 3d-A (2025-11-20): implementation only (no benchmark run)
+Phase 3d-B (2025-11-20): implementation only (expected +12-18%, not measured)
+Phase 3d-C (2025-11-20): 10K sanity test at 1.4M ops/s only (expected +8-12%, no full benchmark run)
+Today (2025-11-22): 9.4M ops/s ✅ (measured and verified)
 ```
-
-**Possible causes**:
-- Effect of C7 offset=0 (overhead from sacrificing the header?)
-- Effect of the TLS SLL drain change
-- Measurement error (System malloc: 90M→58.8M)
+**True cumulative improvement**: Phase 11 (9.38M) → Current (9.4M) = **+0.2%** (NOT +168%)
 
-**Next action**: the cause of the performance drop needs investigation 🔍
+**Cause**: mathematical estimates of expected gains were mistakenly recorded as measured values
+- Phase 3d-B: 9.38M × 1.24 = 11.6M (expected) → 22.6M (misrecorded)
+- Phase 3d-C: 11.6M × 1.10 = 12.8M (expected) → 25.1M (misrecorded)
+
+**Conclusion**: today's bug fixes caused **no** performance regression ✅
+
 ### Achievements of the Phase 3d Series 🎯
 1. **Phase 3d-A (SlabMeta Box)**: Box boundary established - metadata access encapsulated
-2. **Phase 3d-B (TLS Cache Merge)**: 22.6M ops/s - improves L1D locality by merging g_tls_sll[]
-3. **Phase 3d-C (Hot/Cold Split)**: 25.1M ops/s (+11.1%) - improves cache efficiency by splitting the slab
+2. **Phase 3d-B (TLS Cache Merge)**: improves L1D locality by merging g_tls_sll[] (implemented, no full benchmark run)
+3. **Phase 3d-C (Hot/Cold Split)**: improves cache efficiency by splitting the slab (implemented, no full benchmark run)
 
+**Note**: The Phase 3d series is complete as an implementation only. The expected performance gains (+12-18%, +8-12%) are unverified.
+Current measured performance: **9.4M ops/s** (+0.2% vs Phase 11)
+
 ### Lessons from Phases 9-11 🎓
 1. **Phase 9 (Lazy Deallocation)**: +12% → reducing syscalls was correct but not sufficient
 