Files
hakmem/docs/status/CURRENT_TASK.md

343 lines
13 KiB
Markdown
Raw Normal View History

# CURRENT TASK - Critical Bugs Fixed
**Last Updated**: 2025-11-29
**Branch**: `master` @ 6d40dc741
**Scope**: Header Corruption + Segfault 根治完了 - 完全安定化達成
---
## 🎉 2025-11-29 UPDATE: CRITICAL BUGS RESOLVED
### ✅ 完了した修正
#### 1. Header Corruption Bug (Class 1) - **根治完了**
- **症状**: `[TLS_SLL_HDR_RESET] cls=1 got=0x00 expect=0xa1`
- **原因**: freelist → TLS SLL の2パスで header 未復元
- **修正**: 両パスに header restoration 追加
- **結果**: 20-thread Larson で header corruption **完全消滅**
**Commits:**
- `3c6c76cb1` - box_carve_and_push_with_freelist() fix
- `a94344c1a` - tiny_drain_freelist_to_sll_once() fix
#### 2. Segmentation Fault Bug - **根治完了**
- **症状**: larson_hakmem が intermittent に SEGV (~50% 確率)
- **原因**: superslab_allocate() の implicit int declaration → pointer corruption via sign extension
- **修正**: 2ファイルに `#include "box/ss_allocation_box.h"` 追加
- **結果**: larson_hakmem が完全安定動作 ✅
**Commit:**
- `6d40dc741` - Add missing superslab_allocate() declaration
### 🔬 Task Agent の貢献
- Header corruption: 全 freelist paths を網羅的調査、dead code path まで発見
- Segfault: gdb + coredump で Assembly レベル解析、sign extension メカニズムを特定
### 📊 現在の安定性
| Test | Status |
|------|--------|
| Larson 1T | ✅ 安定動作 (51.95M ops/s) |
| Larson 4T | ✅ 安定動作 (header validation 有効) |
| Larson 20T | ✅ Header corruption 0 errors |
| Random Mixed | ✅ 安定動作 (66.82M ops/s) |
| SuperSlab expansion | ✅ Segfault 消滅 |
---
## 📋 Stable Master Established (2025-11-26)
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
**Branch**: `master` (formerly `larson-master-rebuild`)
**Scope**: 安定版 master 確立完了 - Larson 動作 + 67M ops/s
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
---
## 🎯 現状サマリ
### ✅ 新 master 性能(安定版)
| Benchmark | Performance | Status |
|-----------|-------------|--------|
| Larson 1T | **51.95M ops/s** | ✅ 安定動作 (0% crash) |
| Random Mixed 256B | **66.82M ops/s** | ✅ 安定動作 |
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
**Branch**: `master` @ d26dd092b
**Architecture**: E1-CORRECT (C0,C7 offset=0; C1-C6 offset=1)
### 📚 旧 master 保存(参考用)
- **Branch**: `master-80M-unstable` @ 328a6b722
- Random Mixed: ~80M ops/s
- Larson: **100% クラッシュ** (Step 2.5 バグ)
- Architecture: UNIFIED-HEADER (全クラス offset=1)
- **80M 達成経路**: `PERFORMANCE_HISTORY_62M_TO_80M.md` 参照
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
---
## 📋 作業計画
### Phase 0: 安定ベースライン確立 ✅ DONE
- [x] `larson-fix` ブランチから `larson-master-rebuild` 作成
- [x] Larson 動作確認 (51M ops/s)
- [x] Random Mixed 動作確認 (62M ops/s)
### Phase 1: クリーンアップ & 安定化 ✅ DONE
**目標**: 安定状態でコードベースを整理
#### 1.1 Cherry-pick 済み7コミット
- [x] `9793f17d6` レガシーコード削除 (-1,159 LOC)
- [x] `cc0104c4e` テストファイル削除 (-1,750 LOC)
- [x] `416930eb6` バックアップファイル削除 (-1,072 KB)
- [x] `225b6fcc7` 死コード削除: UltraHot, RingCache等 (-1,844 LOC)
- [x] `2c99afa49` 学習システムバグドキュメント
- [x] `328a6b722` Larsonバグ分析更新
- [x] `0143e0fed` CONFIGURATION.md 追加
#### 1.2 追加最適化
- [x] `a2e65716b` tiny_get_max_size inline化 (+2M ops/s期待値)
- [x] `d35504163` Superslab Min-Keep ポート(後にリバート)
- [x] `bea839add` Min-Keep リバートLarson 安定化)
- [x] `d26dd092b` Performance History ドキュメント作成
#### 1.3 master 確立
- [x] 旧 master を `master-80M-unstable` にバックアップ
- [x] master ブランチを安定版 (d26dd092b) に更新
- [x] Larson 0% crash 確認 (51.95M ops/s)
- [x] Random Mixed 67M ops/s 確認
### Phase 2: 性能最適化ポート 📊 PENDING
**目標**: 62M → 80M+ ops/s 回復
#### 2.1 簡単なチューニング(独立・低リスク)
- [ ] `e81fe783d` tiny_get_max_size inline化 (+2M)
- [ ] `04a60c316` Superslab/SharedPool チューニング (+1M)
- [ ] `392d29018` Unified Cache容量チューニング (+1M)
- [ ] `dcd89ee88` Stage 1 lock-free (+0.3M)
#### 2.2 本丸UNIFIED-HEADER
- [ ] `472b6a60b` Phase UNIFIED-HEADER (+17%, C7ヘッダ統一)
- [ ] `d26519f67` UNIFIED-HEADERバグ修正 (+15-41%)
- [ ] `165c33bc2` Larsonフォールバック修正必要なら
#### 2.3 スキップ対象
-`03d321f6b` Phase 27 Ultra-Inline → **-10~15%回帰**
- ❌ Step 2.5関連コミット → **Larsonクラッシュの原因**
### Phase 3: 検証 & マージ 🔀 PENDING
- [ ] Larson 10回平均ベンチマーク
- [ ] Random Mixed 10回平均ベンチマーク
- [ ] master ブランチ更新
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
---
## 🔍 根本原因分析
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
### Larson クラッシュの原因
**First Bad Commit**: `19c1abfe7` "Fix Unified Cache TLS SLL bypass"
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
Step 2.5 が TLS_SLL_PUSH_DUP を「修正」するために追加されたが:
1. TLS_SLL_PUSH_DUP は実際には発生しないベースで10M回テスト済み
2. Step 2.5 がマルチスレッド環境で cross-thread ownership 問題を引き起こす
3. 結論:**不要な「修正」が Larson を壊した**
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
### 80M 達成の主要因
| コミット | 内容 | 改善幅 |
|---------|------|--------|
| `472b6a60b` | UNIFIED-HEADER (C7統一) | **+17%** |
| `d26519f67` | UH バグ修正 | +15-41% |
| その他チューニング | inline, policy等 | +4-5M |
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
---
## 📁 関連ファイル
### 修正対象
- `core/front/tiny_unified_cache.c` - Step 2.5 なしのまま維持
- `core/tiny_free_fast_v2.inc.h` - LARSON_FIX 関連
- `core/box/ptr_conversion_box.h` - UNIFIED-HEADER で変更予定
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
### ドキュメント
- `LEARNING_SYSTEM_BUGS_P0.md` - 学習システムバグ記録
- `CONFIGURATION.md` - ENV変数リファレンス
- `PROFILES.md` - 性能プロファイル
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
---
## ✅ 完了マイルストーン
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
1. **Larson 安定化** - 51M ops/s で動作 ✅
2. **Cherry-pick Phase 1** - 7コミット完了 ✅
3. **ベースライン確立** - 62M/51M で安定 ✅
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
---
Phase FREE-FRONT-V3-1: Free route snapshot infrastructure + build fix Summary: ======== Implemented Phase FREE-FRONT-V3 infrastructure to optimize free hotpath by: 1. Creating snapshot-based route decision table (consolidating route logic) 2. Removing redundant ENV checks from hot path 3. Preparing for future integration into hak_free_at() Key Changes: ============ 1. NEW FILES: - core/box/free_front_v3_env_box.h: Route snapshot definition & API - core/box/free_front_v3_env_box.c: Snapshot initialization & caching 2. Infrastructure Details: - FreeRouteSnapshotV3: Maps class_idx → free_route_kind for all 8 classes - Routes defined: LEGACY, TINY_V3, CORE_V6_C6, POOL_V1 - ENV-gated initialization (HAKMEM_TINY_FREE_FRONT_V3_ENABLED, default OFF) - Per-thread TLS caching to avoid repeated ENV reads 3. Design Goals: - Consolidate tiny_route_for_class() results into snapshot table - Remove C7 ULTRA / v4 / v5 / v6 ENV checks from hot path - Limit lookup (ss_fast_lookup/slab_index_for) to paths that truly need it - Clear ownership boundary: front v3 handles routing, downstream handles free 4. Phase Plan: - v3-1 ✅ COMPLETE: Infrastructure (snapshot table, ENV initialization, TLS cache) - v3-2 (INFRASTRUCTURE ONLY): Placeholder integration in hak_free_api.inc.h - v3-3 (FUTURE): Full integration + benchmark A/B to measure hotpath improvement 5. BUILD FIX: - Added missing core/box/c7_meta_used_counter_box.o to OBJS_BASE in Makefile - This symbol was referenced but not linked, causing undefined reference errors - Benchmark targets now build cleanly without LTO Status: ======= - Build: ✅ PASS (bench_allocators_hakmem builds without errors) - Integration: Currently DISABLED (default OFF, ready for v3-2 phase) - No performance impact: Infrastructure-only, hotpath unchanged Future Work: ============ - Phase v3-2: Integrate snapshot routing into hak_free_at() main path - Phase v3-3: Measure free hotpath performance improvement (target: 1-2% less branch mispredict) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 19:17:30 +09:00
## Phase FREE-LEGACY-OPT シリーズ2025-12-11
### Phase FREE-LEGACY-OPT-4-1: Legacy per-class 分析 ✅ 完了
**目的**: Legacy fallback 49.2% の内訳を per-class で分析
**測定結果Mixed 16-1024B**:
- **C6 (513-1024B)**: 51.4% (137,319 / 266,942 Legacy calls)
- C5 (257-512B): 25.8%
- C4 (129-256B): 13.0%
- C3 (65-128B): 6.5%
- C2 (33-64B): 3.3%
- C0/C1/C7: 0.0%
**最大ターゲット**: C6 が Legacy の過半数を占める
**詳細**: `docs/analysis/FREE_LEGACY_PATH_ANALYSIS.md` 参照
### Phase FREE-LEGACY-OPT-4-2: C6_ULTRA_FREE_BOX 実装(進行中)
**目的**: C6 の free だけを C7 ULTRA 風 TLS キャッシュで受け、Legacy fallback を半減
**実装範囲**:
- C6 専用・free 専用alloc は既存ルートのまま)
- TLS に `c6_freelist[32]` + `c6_count` + segment range check
- ENV: `HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=0`(研究箱、デフォルト OFF
**期待効果**:
- Legacy fallback: 49.2% → 24-27%C6 分を削減)
- Mixed throughput: +5-8% 改善44.8M → 47-48M ops/s
---
## 🎯 次のアクション
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
### 現時点での選択肢
1. **Option A: 現状維持(推奨)**
- master @ 67M ops/s (Larson 安定)
- 80M の知見は `PERFORMANCE_HISTORY_62M_TO_80M.md``master-80M-unstable` に保存済み
- Phase 2 (性能最適化) は将来の作業として保留
2. **Option B: UNIFIED-HEADER ポート(高難度)**
- 80M 達成の主要因(+17% + +15-41%
- E1-CORRECT との互換性問題あり
- 大規模な書き換えが必要
- 詳細: `PERFORMANCE_HISTORY_62M_TO_80M.md` Section "Option 3"
3. **Option C: Step 2.5 Revert失敗済み**
- master-80M-unstable から Step 2.5 をリバート
- 複雑な conflict (33行変更) で35+ 回失敗済み
- 推奨しない
Phase FREE-DISPATCHER-OPT-1: free dispatcher 統計計測 **目的**: free dispatcher(29%)の内訳を細分化して計測。 **実装内容**: - FreeDispatchStats 構造体追加(ENV: HAKMEM_FREE_DISPATCH_STATS, default 0) - カウンタ: total_calls / domain (tiny/mid/large) / route (ultra/legacy/pool/v6) / env_checks / route_for_class_calls - hak_free_at / tiny_route_for_class / tiny_route_snapshot_init にカウンタ埋め込み - 挙動変更なし(計測のみ、ENV OFF 時は overhead ゼロ) **計測結果**: Mixed 16-1024B (1M iter, ws=400): - total=8,081, route_calls=267,967, env_checks=9 - BENCH_FAST_FRONT により大半は早期リターン - route_for_class は主に alloc 側で呼ばれる(267k calls vs 8k frees) - ENV check は初期化時の 9回のみ(snapshot 効果) C6-heavy (257-768B, 1M iter, ws=400): - total=500,099, route_calls=1,034, env_checks=9 - fg_classify_domain に到達する free が多い - route_for_class 呼び出しは極小(snapshot 効果) **結論**: - ENV check は既に十分最適化されている(初期化時のみ) - route_for_class は alloc 側での呼び出しが主で、free 側は snapshot で O(1) - 次フェーズ(OPT-2)では別のアプローチを検討 **ドキュメント追加**: - docs/analysis/FREE_DISPATCHER_ANALYSIS.md(新規) - CURRENT_TASK.md に Phase FREE-DISPATCHER-OPT-1 セクション追加 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-11 21:21:40 +09:00
---
## Phase PERF-ULTRA-REBASE-2: C4-C7 ULTRA free最適化後のperf再計測 (2025-12-11)
**計測対象**: Mixed 16-1024B, C6-heavy
### 計測条件
- **Mixed 16-1024B**:
- ENV: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE`
- ULTRA: C4-C7 全て ON
- ワークロード: iters=1M, ws=400
- perf: -F 5000 --call-graph dwarf -e cycles:u
- **C6-heavy**:
- ENV: `HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1`
- pool v1 + C6/C5/C4 ULTRA
- ワークロード: 1 thread, iters=1M, ws=400
### スループット
- **Mixed**: 42.63M ops/s前回 31.61M から大幅改善、但し ws=400 vs 8192の差
- **C6-heavy**: 26.49M ops/s
### 新しい最大ホットスポットMixed
**WARNING: サンプル数が極めて少ない238 samples**
| 順位 | 関数 | self% | 分類 |
|------|------|-------|------|
| #1 | free | 27.88% | ベンチマーク wrapper |
| #2 | tiny_alloc_gate_fast | 23.57% | alloc gate/front |
| #3 | main | 17.66% | ベンチマーク harness |
| #4 | malloc | 6.94% | ベンチマーク wrapper |
| #5 | header 書き込み合計 | 8.07% | tiny_region_id_write_header |
**ULTRA 関連関数が全て不可視**: tiny_c7_ultra_alloc や ULTRA free 群が上位20に入っていない
### 前フェーズPERF-ULTRA-FREE-OPT-1との比較
- **前回ws=8192, iters=10M**: C7 ULTRA alloc 7.66%, ULTRA free群 5.41%, so_alloc 3.60%
- **今回ws=400, iters=1M**: これらが全て不可視(< 0.63%
- **サンプル数**: 前回1842 samples vs 今回238 samples約1/8
### 結論
- **計測条件の再検討が必要**: ワークロードが軽すぎiters=1M, ws=400
- perf サンプル数が238と極めて少なく、統計的信頼性が低い
- ベンチマーク時間が0.023s23msと極めて短く、allocator 本体が可視化されていない
- **次の最大ボトルネックを確定できず**:
- gate/front23.57%)と header8.07%)が可視範囲での上位
- ULTRA alloc/free の実態が見えていない
- **推奨される次の手順**:
- iters を 10M に増やす、または ws を 8192 に拡大して再計測
- サンプル数を1000+に増やして統計的信頼性を確保
### C6-heavy 分析
- **pool v1 が支配的**: alloc 14.46% + free 19.89% = 34.35%
- **ULTRA は可視化されず**: C6/C5/C4 ULTRA 関数が上位に入っていない
- **pthread_once が8.21%**: 初期化同期のオーバーヘッドが目立つ(ワークロードが軽い証拠)
### 出力物
1. **perf_ultra_mixed_v2.txt** - Mixed の perf report 完全出力238 samples
2. **perf_ultra_c6_v2.txt** - C6-heavy の perf report 完全出力292 samples
3. **更新済み TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md** - 解析結果とWARNINGを記載
4. **更新済み CURRENT_TASK.md** - 本セクション
---
## Phase PERF-ULTRA-REBASE-2 後の次フェーズ候補(要判断)
### 判断保留理由
- **サンプル不足**: 現状の perf 結果では allocator 本体のボトルネックが可視化されていない
- **再計測が必須**: より重いワークロードiters=10M, ws=8192で再測定してから判断すべき
### パターン分析(参考:前回 PERF-ULTRA-REBASE-1 の結果に基づく)
#### パターン A: tiny_c7_ultra_alloc が依然トップの場合
- **前回の値**: 7.66% self最大ホットスポット
- 理由: ULTRA alloc の内部構造refill/page_metaに起因
- 次フェーズ案: **Phase PERF-ULTRA-REFILL-OPT-1**
- C7 ULTRA の refill pathcold 側)を最適化
- segment 取得ロジックの軽量化
- page_meta 管理の簡略化
- 判断: C7 refill を弄る技術的難度と効果のバランスを検討
#### パターン B: so_alloc_fast / page_of が浮いてきた場合
- **前回の値**: so_alloc系 3.60%, page_of系 2.74%
- 理由: ULTRA が C4-C7 を完全カバーし、v3 backend 側が relative に大きくなった
- 次フェーズ案: **Phase PERF-ULTRA-V3-OPT-1**
- so_alloc_v3 の hotpath 簡略化
- page_ofptr→page 解決)の高速化
- header write の最適化(既に thin は確認済み)
- 判断: v3 backend 自体を軽くするか、別 backendv4/v6を考えるか
#### パターン C: 残り legacy (C2/C3) が顕著な場合
- 確認: legacy 総数が何%か、C2/C3 の比率を確認
- 判断基準:
- C2/C3 合計が全体の 5% 以上 → C3 ULTRA を検討する価値あり
- C2/C3 合計が 2% 未満 → 後回しL1 汚染のリスクが高い)
#### パターン D: tiny_alloc_gate_fast / header が上位の場合(今回観測)
- **今回の値**: gate 23.57%, header 8.07%
- 理由: ワークロードが軽すぎて allocator 本体が見えていない可能性が高い
- 次フェーズ案: **Phase PERF-ULTRA-REBASE-3再計測**
- より重いワークロードiters=10M, ws=8192で再測定
- サンプル数1000+を確保して統計的信頼性を高める
- その上でパターンA/B/Cのどれかに該当するか判断
### 推奨される判断フロー
1. **Phase PERF-ULTRA-REBASE-3再計測を実施**
- iters=10M, ws=8192 で Mixed を再計測
- perf record 時間を長くしてサンプル数を確保
2. **再計測後の perf_ultra_mixed_v3.txt の #1/#2 を見る**
- パターン A/B/C/D のどれに該当するか判定
3. **該当フェーズ案を実装 or 研究箱化の判断**
- A: ULTRA refill 最適化(高難度、効果大)
- B: v3 backend 最適化(中難度、効果中)
- C: C2/C3 ULTRA 追加(低難度、効果不明)
- D: 再計測(必須)