Fix SEGV in Shared Pool Stage 1: Add NULL check for freed SuperSlab

Problem: Race condition causing NULL pointer dereference - Thread A: Pushes slot to freelist → frees SuperSlab → ss=NULL - Thread B: Pops stale slot from freelist → loads ss=NULL → CRASH at Line 584 Symptoms (bench_fixed_size_hakmem): - workset=64, iterations >= 2150: SEGV (NULL dereference) - Crash happened after ~67 drain cycles (interval=2048) - Affected ALL size classes at high churn (not workset-specific) Root Cause: core/hakmem_shared_pool.c Line 564-584 - Stage 1 loads SuperSlab pointer (Line 564) but missing NULL check - Stage 2 already has this NULL check (Line 618-622) but Stage 1 missed it - Classic race: freelist slot points to freed SuperSlab Solution: Add defensive NULL check in Stage 1 (13 lines) - Check if ss==NULL after atomic load (Line 569-576) - On NULL: unlock mutex, goto stage2_fallback - Matches Stage 2's existing pattern (consistency) Results (bench_fixed_size 16B): - Before: workset=64 10K iter → SEGV (core dump) - After: workset=64 10K iter → 28M ops/s ✅ - After: workset=64 100K iter → 44M ops/s ✅ (high load stable) Not related to Phase 13-B TinyHeapV2 supply hook - Crash reproduces with HAKMEM_TINY_HEAP_V2=0 - Pre-existing bug in Phase 12 shared pool implementation Credit: Discovered and analyzed by Task agent (general-purpose) Report: BENCH_FIXED_SIZE_WORKSET64_CRASH_REPORT.md 🤝 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 13:38:22 +09:00
parent 0d42913efe
commit 52cd7c5543
1 changed files with 13 additions and 0 deletions
--- a/core/hakmem_shared_pool.c
+++ b/core/hakmem_shared_pool.c
@ -563,6 +563,18 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
            // RACE FIX: Load SuperSlab pointer atomically (consistency)
            SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);

+            // RACE FIX: Check if SuperSlab was freed (NULL pointer)
+            // This can happen if Thread A freed the SuperSlab after pushing slot to freelist,
+            // but Thread B popped the stale slot before the freelist was cleared.
+            if (!ss) {
+                // SuperSlab freed - skip and fall through to Stage 2/3
+                if (g_lock_stats_enabled == 1) {
+                    atomic_fetch_add(&g_lock_release_count, 1);
+                }
+                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+                goto stage2_fallback;
+            }
+
            if (dbg_acquire == 1) {
                fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                        class_idx, (void*)ss, reuse_slot_idx);
@ -598,6 +610,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    }

+stage2_fallback:
    // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
    // P0-5: Lock-free atomic CAS claiming (no mutex needed for slot state transition!)
    // RACE FIX: Read ss_meta_count atomically (now properly declared as _Atomic)