Files
hakmem/core/box/free_local_box.c

125 lines
5.7 KiB
C
Raw Normal View History

CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
#include "free_local_box.h"
#include "free_publish_box.h"
#include "hakmem_tiny.h"
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7), changing the pointer contract. However, many locations still called slab_index_for() with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one slab index calculations that corrupted memory. Root Cause: - USER pointer = BASE + 1 (returned by malloc, points past header) - BASE pointer = storage start (where 1-byte header is written) - slab_index_for() expects BASE pointer for correct slab boundary calculations - Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption Impact Before Fix: - bench_random_mixed crashes at ~14K iterations with SEGV - Massive C7 alignment check failures (wrong slab classification) - Memory corruption from writing to wrong slab freelists Fixes Applied (8 locations): 1. core/hakmem_tiny_free.inc:137 - Added USER→BASE conversion before slab_index_for() 2. core/hakmem_tiny_ultra_simple.inc:148 - Added USER→BASE conversion before slab_index_for() 3. core/tiny_free_fast.inc.h:220 - Added USER→BASE conversion before slab_index_for() 4-5. core/tiny_free_magazine.inc.h:126,315 - Added USER→BASE conversion before slab_index_for() (2 locations) 6. core/box/free_local_box.c:14,22,62 - Added USER→BASE conversion before slab_index_for() - Fixed delta calculation to use BASE instead of USER - Fixed debug logging to use BASE instead of USER 7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret) - Added USER→BASE conversion before slab_index_for() (2 calls) - Fixed delta calculation to use BASE instead of USER - This function is called on EVERY allocation in debug builds Results After Fix: ✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement) ✅ C7 alignment check failures eliminated (was: 100% failure rate) ✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%) ✅ No segfaults for workloads up to ~33K allocations Remaining Issue: ❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014) - Different bug from USER/BASE conversion issues - Likely capacity/boundary condition (further investigation needed) Testing: - bench_random_mixed_hakmem 1K-66K iterations: PASS - bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug) - bench_fixed_size_hakmem 200K iterations: PASS 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
#include "tiny_next_ptr_box.h" // Phase E1-CORRECT: Box API
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
void tiny_free_local_box(SuperSlab* ss, int slab_idx, TinySlabMeta* meta, void* ptr, uint32_t my_tid) {
extern _Atomic uint64_t g_free_local_box_calls;
atomic_fetch_add_explicit(&g_free_local_box_calls, 1, memory_order_relaxed);
if (!(ss && ss->magic == SUPERSLAB_MAGIC)) return;
if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss)) return;
(void)my_tid;
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7), changing the pointer contract. However, many locations still called slab_index_for() with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one slab index calculations that corrupted memory. Root Cause: - USER pointer = BASE + 1 (returned by malloc, points past header) - BASE pointer = storage start (where 1-byte header is written) - slab_index_for() expects BASE pointer for correct slab boundary calculations - Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption Impact Before Fix: - bench_random_mixed crashes at ~14K iterations with SEGV - Massive C7 alignment check failures (wrong slab classification) - Memory corruption from writing to wrong slab freelists Fixes Applied (8 locations): 1. core/hakmem_tiny_free.inc:137 - Added USER→BASE conversion before slab_index_for() 2. core/hakmem_tiny_ultra_simple.inc:148 - Added USER→BASE conversion before slab_index_for() 3. core/tiny_free_fast.inc.h:220 - Added USER→BASE conversion before slab_index_for() 4-5. core/tiny_free_magazine.inc.h:126,315 - Added USER→BASE conversion before slab_index_for() (2 locations) 6. core/box/free_local_box.c:14,22,62 - Added USER→BASE conversion before slab_index_for() - Fixed delta calculation to use BASE instead of USER - Fixed debug logging to use BASE instead of USER 7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret) - Added USER→BASE conversion before slab_index_for() (2 calls) - Fixed delta calculation to use BASE instead of USER - This function is called on EVERY allocation in debug builds Results After Fix: ✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement) ✅ C7 alignment check failures eliminated (was: 100% failure rate) ✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%) ✅ No segfaults for workloads up to ~33K allocations Remaining Issue: ❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014) - Different bug from USER/BASE conversion issues - Likely capacity/boundary condition (further investigation needed) Testing: - bench_random_mixed_hakmem 1K-66K iterations: PASS - bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug) - bench_fixed_size_hakmem 200K iterations: PASS 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
// ✅ Phase E1-CORRECT: ALL classes have headers, calculate BASE pointer once
void* base = (void*)((uint8_t*)ptr - 1);
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7), changing the pointer contract. However, many locations still called slab_index_for() with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one slab index calculations that corrupted memory. Root Cause: - USER pointer = BASE + 1 (returned by malloc, points past header) - BASE pointer = storage start (where 1-byte header is written) - slab_index_for() expects BASE pointer for correct slab boundary calculations - Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption Impact Before Fix: - bench_random_mixed crashes at ~14K iterations with SEGV - Massive C7 alignment check failures (wrong slab classification) - Memory corruption from writing to wrong slab freelists Fixes Applied (8 locations): 1. core/hakmem_tiny_free.inc:137 - Added USER→BASE conversion before slab_index_for() 2. core/hakmem_tiny_ultra_simple.inc:148 - Added USER→BASE conversion before slab_index_for() 3. core/tiny_free_fast.inc.h:220 - Added USER→BASE conversion before slab_index_for() 4-5. core/tiny_free_magazine.inc.h:126,315 - Added USER→BASE conversion before slab_index_for() (2 locations) 6. core/box/free_local_box.c:14,22,62 - Added USER→BASE conversion before slab_index_for() - Fixed delta calculation to use BASE instead of USER - Fixed debug logging to use BASE instead of USER 7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret) - Added USER→BASE conversion before slab_index_for() (2 calls) - Fixed delta calculation to use BASE instead of USER - This function is called on EVERY allocation in debug builds Results After Fix: ✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement) ✅ C7 alignment check failures eliminated (was: 100% failure rate) ✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%) ✅ No segfaults for workloads up to ~33K allocations Remaining Issue: ❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014) - Different bug from USER/BASE conversion issues - Likely capacity/boundary condition (further investigation needed) Testing: - bench_random_mixed_hakmem 1K-66K iterations: PASS - bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug) - bench_fixed_size_hakmem 200K iterations: PASS 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
int actual_idx = slab_index_for(ss, base);
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
if (actual_idx != slab_idx) {
tiny_failfast_abort_ptr("free_local_box_idx", ss, slab_idx, ptr, "slab_idx_mismatch");
} else {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
uint8_t cls = (meta && meta->class_idx < TINY_NUM_CLASSES) ? meta->class_idx : 0;
size_t blk = g_tiny_class_sizes[cls];
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
uint8_t* slab_base = tiny_slab_base_for(ss, slab_idx);
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7), changing the pointer contract. However, many locations still called slab_index_for() with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one slab index calculations that corrupted memory. Root Cause: - USER pointer = BASE + 1 (returned by malloc, points past header) - BASE pointer = storage start (where 1-byte header is written) - slab_index_for() expects BASE pointer for correct slab boundary calculations - Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption Impact Before Fix: - bench_random_mixed crashes at ~14K iterations with SEGV - Massive C7 alignment check failures (wrong slab classification) - Memory corruption from writing to wrong slab freelists Fixes Applied (8 locations): 1. core/hakmem_tiny_free.inc:137 - Added USER→BASE conversion before slab_index_for() 2. core/hakmem_tiny_ultra_simple.inc:148 - Added USER→BASE conversion before slab_index_for() 3. core/tiny_free_fast.inc.h:220 - Added USER→BASE conversion before slab_index_for() 4-5. core/tiny_free_magazine.inc.h:126,315 - Added USER→BASE conversion before slab_index_for() (2 locations) 6. core/box/free_local_box.c:14,22,62 - Added USER→BASE conversion before slab_index_for() - Fixed delta calculation to use BASE instead of USER - Fixed debug logging to use BASE instead of USER 7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret) - Added USER→BASE conversion before slab_index_for() (2 calls) - Fixed delta calculation to use BASE instead of USER - This function is called on EVERY allocation in debug builds Results After Fix: ✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement) ✅ C7 alignment check failures eliminated (was: 100% failure rate) ✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%) ✅ No segfaults for workloads up to ~33K allocations Remaining Issue: ❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014) - Different bug from USER/BASE conversion issues - Likely capacity/boundary condition (further investigation needed) Testing: - bench_random_mixed_hakmem 1K-66K iterations: PASS - bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug) - bench_fixed_size_hakmem 200K iterations: PASS 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
uintptr_t delta = (uintptr_t)base - (uintptr_t)slab_base;
Phase 6-2.3~6-2.5: Critical bug fixes + SuperSlab optimization (WIP) ## Phase 6-2.3: Fix 4T Larson crash (active counter bug) ✅ **Problem:** 4T Larson crashed with "free(): invalid pointer", OOM errors **Root cause:** core/hakmem_tiny_refill_p0.inc.h:103 - P0 batch refill moved freelist blocks to TLS cache - Active counter NOT incremented → double-decrement on free - Counter underflows → SuperSlab appears full → OOM → crash **Fix:** Added ss_active_add(tls->ss, from_freelist); **Result:** 4T stable at 838K ops/s ✅ ## Phase 6-2.4: Fix SEGV in random_mixed/mid_large_mt benchmarks ✅ **Problem:** bench_random_mixed_hakmem, bench_mid_large_mt_hakmem → immediate SEGV **Root cause #1:** core/box/hak_free_api.inc.h:92-95 - "Guess loop" dereferenced unmapped memory when registry lookup failed **Root cause #2:** core/box/hak_free_api.inc.h:115 - Header magic check dereferenced unmapped memory **Fix:** 1. Removed dangerous guess loop (lines 92-95) 2. Added hak_is_memory_readable() check before dereferencing header (core/hakmem_internal.h:277-294 - uses mincore() syscall) **Result:** - random_mixed (2KB): SEGV → 2.22M ops/s ✅ - random_mixed (4KB): SEGV → 2.58M ops/s ✅ - Larson 4T: no regression (838K ops/s) ✅ ## Phase 6-2.5: Performance investigation + SuperSlab fix (WIP) ⚠️ **Problem:** Severe performance gaps (19-26x slower than system malloc) **Investigation:** Task agent identified root cause - hak_is_memory_readable() syscall overhead (100-300 cycles per free) - ALL frees hit unmapped_header_fallback path - SuperSlab lookup NEVER called - Why? g_use_superslab = 0 (disabled by diet mode) **Root cause:** core/hakmem_tiny_init.inc:104-105 - Diet mode (default ON) disables SuperSlab - SuperSlab defaults to 1 (hakmem_config.c:334) - BUT diet mode overrides it to 0 during init **Fix:** Separate SuperSlab from diet mode - SuperSlab: Performance-critical (fast alloc/free) - Diet mode: Memory efficiency (magazine capacity limits only) - Both are independent features, should not interfere **Status:** ⚠️ INCOMPLETE - New SEGV discovered after fix - SuperSlab lookup now works (confirmed via debug output) - But benchmark crashes (Exit 139) after ~20 lookups - Needs further investigation **Files modified:** - core/hakmem_tiny_init.inc:99-109 - Removed diet mode override - PERFORMANCE_INVESTIGATION_REPORT.md - Task agent analysis (303x instruction gap) **Next steps:** - Investigate new SEGV (likely SuperSlab free path bug) - OR: Revert Phase 6-2.5 changes if blocking progress 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 20:31:01 +09:00
if (blk == 0 || (delta % blk) != 0) {
tiny_failfast_abort_ptr("free_local_box_align", ss, slab_idx, ptr, "misaligned");
} else if (meta && delta / blk >= meta->capacity) {
tiny_failfast_abort_ptr("free_local_box_range", ss, slab_idx, ptr, "out_of_capacity");
}
}
}
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
void* prev = meta->freelist;
// FREELIST CORRUPTION DEBUG: Validate pointer before writing
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
uint8_t cls = (meta && meta->class_idx < TINY_NUM_CLASSES) ? meta->class_idx : 0;
size_t blk = g_tiny_class_sizes[cls];
uint8_t* base_ss = (uint8_t*)ss;
uint8_t* slab_base = tiny_slab_base_for(ss, slab_idx);
// Verify prev pointer is valid (if not NULL)
if (prev != NULL) {
uintptr_t prev_addr = (uintptr_t)prev;
uintptr_t slab_addr = (uintptr_t)slab_base;
// Check if prev is within this slab
if (prev_addr < (uintptr_t)base_ss || prev_addr >= (uintptr_t)base_ss + (2*1024*1024)) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
fprintf(stderr, "[FREE_CORRUPT] prev=%p outside SuperSlab ss=%p slab=%d\n",
prev, ss, slab_idx);
tiny_failfast_abort_ptr("free_local_prev_range", ss, slab_idx, ptr, "prev_outside_ss");
}
// Check alignment of prev
if ((prev_addr - slab_addr) % blk != 0) {
fprintf(stderr, "[FREE_CORRUPT] prev=%p misaligned (cls=%u slab=%d blk=%zu offset=%zu)\n",
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
prev, cls, slab_idx, blk, (size_t)(prev_addr - slab_addr));
fprintf(stderr, "[FREE_CORRUPT] Writing from ptr=%p, freelist was=%p\n", ptr, prev);
tiny_failfast_abort_ptr("free_local_prev_misalign", ss, slab_idx, ptr, "prev_misaligned");
}
}
fprintf(stderr, "[FREE_VERIFY] cls=%u slab=%d ptr=%p prev=%p (offset_ptr=%zu offset_prev=%zu)\n",
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
cls, slab_idx, ptr, prev,
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7), changing the pointer contract. However, many locations still called slab_index_for() with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one slab index calculations that corrupted memory. Root Cause: - USER pointer = BASE + 1 (returned by malloc, points past header) - BASE pointer = storage start (where 1-byte header is written) - slab_index_for() expects BASE pointer for correct slab boundary calculations - Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption Impact Before Fix: - bench_random_mixed crashes at ~14K iterations with SEGV - Massive C7 alignment check failures (wrong slab classification) - Memory corruption from writing to wrong slab freelists Fixes Applied (8 locations): 1. core/hakmem_tiny_free.inc:137 - Added USER→BASE conversion before slab_index_for() 2. core/hakmem_tiny_ultra_simple.inc:148 - Added USER→BASE conversion before slab_index_for() 3. core/tiny_free_fast.inc.h:220 - Added USER→BASE conversion before slab_index_for() 4-5. core/tiny_free_magazine.inc.h:126,315 - Added USER→BASE conversion before slab_index_for() (2 locations) 6. core/box/free_local_box.c:14,22,62 - Added USER→BASE conversion before slab_index_for() - Fixed delta calculation to use BASE instead of USER - Fixed debug logging to use BASE instead of USER 7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret) - Added USER→BASE conversion before slab_index_for() (2 calls) - Fixed delta calculation to use BASE instead of USER - This function is called on EVERY allocation in debug builds Results After Fix: ✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement) ✅ C7 alignment check failures eliminated (was: 100% failure rate) ✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%) ✅ No segfaults for workloads up to ~33K allocations Remaining Issue: ❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014) - Different bug from USER/BASE conversion issues - Likely capacity/boundary condition (further investigation needed) Testing: - bench_random_mixed_hakmem 1K-66K iterations: PASS - bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug) - bench_fixed_size_hakmem 200K iterations: PASS 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
(size_t)((uintptr_t)base - (uintptr_t)slab_base),
prev ? (size_t)((uintptr_t)prev - (uintptr_t)slab_base) : 0);
}
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
// Use per-slab class for freelist linkage
uint8_t cls = (meta && meta->class_idx < TINY_NUM_CLASSES) ? meta->class_idx : 0;
tiny_next_write(cls, ptr, prev); // Phase E1-CORRECT: Box API with shared pool
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
meta->freelist = ptr;
// FREELIST CORRUPTION DEBUG: Verify write succeeded
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
void* readback = tiny_next_read(cls, ptr); // Phase E1-CORRECT: Box API
if (readback != prev) {
fprintf(stderr, "[FREE_CORRUPT] Wrote prev=%p to ptr=%p but read back %p!\n",
prev, ptr, readback);
fprintf(stderr, "[FREE_CORRUPT] Memory corruption detected during freelist push\n");
tiny_failfast_abort_ptr("free_local_readback", ss, slab_idx, ptr, "write_corrupted");
}
}
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
tiny_failfast_log("free_local_box", cls, ss, meta, ptr, prev);
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
// BUGFIX: Memory barrier to ensure freelist visibility before used decrement
// Without this, other threads can see new freelist but old used count (race)
atomic_thread_fence(memory_order_release);
// Optional freelist mask update on first push
do {
static int g_mask_en = -1;
if (__builtin_expect(g_mask_en == -1, 0)) {
const char* e = getenv("HAKMEM_TINY_FREELIST_MASK");
g_mask_en = (e && *e && *e != '0') ? 1 : 0;
}
if (__builtin_expect(g_mask_en, 0) && prev == NULL) {
uint32_t bit = (1u << slab_idx);
atomic_fetch_or_explicit(&ss->freelist_mask, bit, memory_order_release);
}
} while (0);
// Track local free (debug helpers may be no-op)
tiny_remote_track_on_local_free(ss, slab_idx, ptr, "local_free", my_tid);
meta->used--;
ss_active_dec_one(ss);
// DEBUG LOGGING - Track when used reaches 0
static int dbg = -1;
if (__builtin_expect(dbg == -1, 0)) {
const char* e = getenv("HAKMEM_SS_FREE_DEBUG");
dbg = (e && *e && *e != '0') ? 1 : 0;
}
if (dbg == 1 && meta->used == 0) {
fprintf(stderr, "[FREE_LOCAL_BOX] meta->used=0 detected: cls=%u ss=%p slab=%d\n",
cls, (void*)ss, slab_idx);
}
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
if (prev == NULL) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
// First-free → advertise slab to adopters using per-slab class
uint8_t cls0 = (meta && meta->class_idx < TINY_NUM_CLASSES) ? meta->class_idx : 0;
tiny_free_publish_first_free((int)cls0, ss, slab_idx);
CRITICAL FIX: TLS 未初期化による 4T SEGV を完全解消 **問題:** - Larson 4T で 100% SEGV (1T は 2.09M ops/s で完走) - System/mimalloc は 4T で 33.52M ops/s 正常動作 - SS OFF + Remote OFF でも 4T で SEGV **根本原因: (Task agent ultrathink 調査結果)** ``` CRASH: mov (%r15),%r13 R15 = 0x6261 ← ASCII "ba" (ゴミ値、未初期化TLS) ``` Worker スレッドの TLS 変数が未初期化: - `__thread void* g_tls_sll_head[TINY_NUM_CLASSES];` ← 初期化なし - pthread_create() で生成されたスレッドでゼロ初期化されない - NULL チェックが通過 (0x6261 != NULL) → dereference → SEGV **修正内容:** 全 TLS 配列に明示的初期化子 `= {0}` を追加: 1. **core/hakmem_tiny.c:** - `g_tls_sll_head[TINY_NUM_CLASSES] = {0}` - `g_tls_sll_count[TINY_NUM_CLASSES] = {0}` - `g_tls_live_ss[TINY_NUM_CLASSES] = {0}` - `g_tls_bcur[TINY_NUM_CLASSES] = {0}` - `g_tls_bend[TINY_NUM_CLASSES] = {0}` 2. **core/tiny_fastcache.c:** - `g_tiny_fast_cache[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_count[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_head[TINY_FAST_CLASS_COUNT] = {0}` - `g_tiny_fast_free_count[TINY_FAST_CLASS_COUNT] = {0}` 3. **core/hakmem_tiny_magazine.c:** - `g_tls_mags[TINY_NUM_CLASSES] = {0}` 4. **core/tiny_sticky.c:** - `g_tls_sticky_ss[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_idx[TINY_NUM_CLASSES][TINY_STICKY_RING] = {0}` - `g_tls_sticky_pos[TINY_NUM_CLASSES] = {0}` **効果:** ``` Before: 1T: 2.09M ✅ | 4T: SEGV 💀 After: 1T: 2.41M ✅ | 4T: 4.19M ✅ (+15% 1T, SEGV解消) ``` **テスト:** ```bash # 1 thread: 完走 ./larson_hakmem 2 8 128 1024 1 12345 1 → Throughput = 2,407,597 ops/s ✅ # 4 threads: 完走(以前は SEGV) ./larson_hakmem 2 8 128 1024 1 12345 4 → Throughput = 4,192,155 ops/s ✅ ``` **調査協力:** Task agent (ultrathink mode) による完璧な根本原因特定 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 01:27:04 +09:00
}
}