Fix SEGV in Shared Pool Stage 1: Add NULL check for freed SuperSlab
Problem: Race condition causing a NULL pointer dereference
- Thread A: Pushes slot to freelist → frees SuperSlab → ss=NULL
- Thread B: Pops stale slot from freelist → loads ss=NULL → CRASH at Line 584

Symptoms (bench_fixed_size_hakmem):
- workset=64, iterations >= 2150: SEGV (NULL dereference)
- Crash happened after ~67 drain cycles (interval=2048)
- Affected ALL size classes at high churn (not workset-specific)

Root Cause: core/hakmem_shared_pool.c Line 564-584
- Stage 1 loads the SuperSlab pointer (Line 564) but lacks a NULL check
- Stage 2 already has this NULL check (Line 618-622); Stage 1 missed it
- Classic race: a freelist slot points to an already-freed SuperSlab

Solution: Add defensive NULL check in Stage 1 (13 lines)
- Check if ss==NULL after the atomic load (Line 569-576)
- On NULL: unlock mutex, goto stage2_fallback
- Matches Stage 2's existing pattern (consistency)

Results (bench_fixed_size 16B):
- Before: workset=64 10K iter → SEGV (core dump)
- After: workset=64 10K iter → 28M ops/s ✅
- After: workset=64 100K iter → 44M ops/s ✅ (stable under high load)

Not related to the Phase 13-B TinyHeapV2 supply hook:
- Crash reproduces with HAKMEM_TINY_HEAP_V2=0
- Pre-existing bug in the Phase 12 shared pool implementation

Credit: Discovered and analyzed by Task agent (general-purpose)
Report: BENCH_FIXED_SIZE_WORKSET64_CRASH_REPORT.md

🤝 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
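For illustration, a minimal sketch of the racing interleaving described above. All names here (SlotMeta, free_list_push, free_list_pop, superslab_free) are hypothetical stand-ins, not the actual hakmem API; only the shape of the race mirrors the report:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct SuperSlab { int slabs[8]; } SuperSlab;

typedef struct {
    _Atomic(SuperSlab*) ss;   /* cleared to NULL once the owner is freed */
} SlotMeta;

/* Hypothetical freelist primitives, assumed to exist for this sketch. */
extern void      free_list_push(SlotMeta* slot);
extern SlotMeta* free_list_pop(void);
extern void      superslab_free(SuperSlab* ss);

/* Thread A: retires a slot, frees its SuperSlab, then clears the pointer. */
void thread_a(SlotMeta* slot, SuperSlab* owner) {
    free_list_push(slot);                        /* (1) slot becomes visible */
    superslab_free(owner);                       /* (2) backing memory gone  */
    atomic_store_explicit(&slot->ss, NULL,
                          memory_order_relaxed); /* (3) ss = NULL            */
}

/* Thread B: can pop the stale slot any time after (1). */
void thread_b(void) {
    SlotMeta* slot = free_list_pop();
    SuperSlab* ss = atomic_load_explicit(&slot->ss, memory_order_relaxed);
    ss->slabs[0] = 0;   /* SEGV once (3) has run: ss == NULL */
}
```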
@@ -563,6 +563,18 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         // RACE FIX: Load SuperSlab pointer atomically (consistency)
         SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
 
+        // RACE FIX: Check if SuperSlab was freed (NULL pointer)
+        // This can happen if Thread A freed the SuperSlab after pushing slot to freelist,
+        // but Thread B popped the stale slot before the freelist was cleared.
+        if (!ss) {
+            // SuperSlab freed - skip and fall through to Stage 2/3
+            if (g_lock_stats_enabled == 1) {
+                atomic_fetch_add(&g_lock_release_count, 1);
+            }
+            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+            goto stage2_fallback;
+        }
+
         if (dbg_acquire == 1) {
             fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                     class_idx, (void*)ss, reuse_slot_idx);
@@ -598,6 +610,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         pthread_mutex_unlock(&g_shared_pool.alloc_lock);
     }
 
+stage2_fallback:
     // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
     // P0-5: Lock-free atomic CAS claiming (no mutex needed for slot state transition!)
     // RACE FIX: Read ss_meta_count atomically (now properly declared as _Atomic)
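The defensive pattern the fix follows, as a self-contained sketch. Slot, pool_lock, and try_reuse_slot are illustrative names, not the real hakmem symbols; only the load-then-NULL-check-then-fallback shape mirrors the actual change:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <pthread.h>

typedef struct SuperSlab SuperSlab;   /* opaque; only pointers are used here */

typedef struct {
    _Atomic(SuperSlab*) ss;   /* owning SuperSlab; NULL once it is freed */
} Slot;

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the SuperSlab to reuse, or NULL to signal "fall through to the
 * next allocation stage". */
static SuperSlab* try_reuse_slot(Slot* slot)
{
    pthread_mutex_lock(&pool_lock);

    /* Load the owner atomically; a racing free may have cleared it. */
    SuperSlab* ss = atomic_load_explicit(&slot->ss, memory_order_relaxed);
    if (!ss) {
        /* Stale slot: the SuperSlab was freed after the slot was pushed.
         * Release the lock and let the caller take the fallback path. */
        pthread_mutex_unlock(&pool_lock);
        return NULL;
    }

    /* ... claim the slot while still holding the lock ... */
    pthread_mutex_unlock(&pool_lock);
    return ss;
}
```

Per the commit message, this is the same shape Stage 2 already used, which is why the fix simply mirrors the existing pattern rather than introducing a new one.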