From 52cd7c55432d5ce69fa4113cb582f9cff6f7a927 Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Sat, 15 Nov 2025 13:38:22 +0900
Subject: [PATCH] Fix SEGV in Shared Pool Stage 1: Add NULL check for freed
 SuperSlab
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem: Race condition causing NULL pointer dereference
- Thread A: Pushes slot to freelist → frees SuperSlab → ss=NULL
- Thread B: Pops stale slot from freelist → loads ss=NULL → CRASH at Line 584

Symptoms (bench_fixed_size_hakmem):
- workset=64, iterations >= 2150: SEGV (NULL dereference)
- Crash happened after ~67 drain cycles (interval=2048)
- Affected ALL size classes at high churn (not workset-specific)

Root Cause: core/hakmem_shared_pool.c Line 564-584
- Stage 1 loads SuperSlab pointer (Line 564) but missing NULL check
- Stage 2 already has this NULL check (Line 618-622) but Stage 1 missed it
- Classic race: freelist slot points to freed SuperSlab

Solution: Add defensive NULL check in Stage 1 (13 lines)
- Check if ss==NULL after atomic load (Line 569-576)
- On NULL: unlock mutex, goto stage2_fallback
- Matches Stage 2's existing pattern (consistency)

Results (bench_fixed_size 16B):
- Before: workset=64 10K iter → SEGV (core dump)
- After: workset=64 10K iter → 28M ops/s ✅
- After: workset=64 100K iter → 44M ops/s ✅ (high load stable)

Not related to Phase 13-B TinyHeapV2 supply hook
- Crash reproduces with HAKMEM_TINY_HEAP_V2=0
- Pre-existing bug in Phase 12 shared pool implementation

Credit: Discovered and analyzed by Task agent (general-purpose)
Report: BENCH_FIXED_SIZE_WORKSET64_CRASH_REPORT.md

🤝 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 core/hakmem_shared_pool.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/core/hakmem_shared_pool.c b/core/hakmem_shared_pool.c
index daaeb6c2..71a5129e 100644
--- a/core/hakmem_shared_pool.c
+++ b/core/hakmem_shared_pool.c
@@ -563,6 +563,18 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
     // RACE FIX: Load SuperSlab pointer atomically (consistency)
     SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
 
+    // RACE FIX: Check if SuperSlab was freed (NULL pointer)
+    // This can happen if Thread A freed the SuperSlab after pushing slot to freelist,
+    // but Thread B popped the stale slot before the freelist was cleared.
+    if (!ss) {
+        // SuperSlab freed - skip and fall through to Stage 2/3
+        if (g_lock_stats_enabled == 1) {
+            atomic_fetch_add(&g_lock_release_count, 1);
+        }
+        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+        goto stage2_fallback;
+    }
+
     if (dbg_acquire == 1) {
         fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                 class_idx, (void*)ss, reuse_slot_idx);
@@ -598,6 +610,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         pthread_mutex_unlock(&g_shared_pool.alloc_lock);
     }
 
+stage2_fallback:
     // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
     // P0-5: Lock-free atomic CAS claiming (no mutex needed for slot state transition!)
     // RACE FIX: Read ss_meta_count atomically (now properly declared as _Atomic)
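
For reference, a minimal self-contained sketch of the stale-slot scenario the commit
message describes (this is an illustration, not part of the patch). FakeSuperSlab,
FakeSlot, thread_a and thread_b are hypothetical simplifications, not hakmem's real
types or functions, and the two threads are joined back to back so the freed
(ss == NULL) state is reached deterministically; in hakmem the same state is produced
by the concurrent Thread A / Thread B interleaving. Build with something like
`cc -std=c11 -pthread sketch.c`.

/* Illustrative sketch only. FakeSuperSlab/FakeSlot are stand-ins for
 * SuperSlab and a shared-pool freelist slot; names are hypothetical. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int slab_count; } FakeSuperSlab;         /* stand-in for SuperSlab       */
typedef struct { _Atomic(FakeSuperSlab*) ss; } FakeSlot;  /* stand-in for a freelist slot */

static FakeSlot g_slot;

/* "Thread A": the slot has already been pushed to the freelist; free its
 * SuperSlab and publish NULL, leaving a stale slot behind. */
static void* thread_a(void* arg) {
    (void)arg;
    FakeSuperSlab* ss = atomic_exchange_explicit(&g_slot.ss, NULL, memory_order_acq_rel);
    free(ss);
    return NULL;
}

/* "Thread B": pops the stale slot. Without a NULL check this would
 * dereference NULL (the Stage 1 SEGV); with it, the caller falls back,
 * mirroring the patch's `if (!ss) { ... goto stage2_fallback; }`. */
static void* thread_b(void* arg) {
    (void)arg;
    FakeSuperSlab* ss = atomic_load_explicit(&g_slot.ss, memory_order_acquire);
    if (!ss) {
        puts("stale slot (SuperSlab already freed) -> fall back to Stage 2");
        return NULL;
    }
    printf("reusing SuperSlab with %d slabs\n", ss->slab_count);
    return NULL;
}

int main(void) {
    FakeSuperSlab* ss = malloc(sizeof *ss);
    if (!ss) return 1;
    ss->slab_count = 32;
    atomic_store(&g_slot.ss, ss);

    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_join(a, NULL);   /* A finishes first: the slot is now stale */
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(b, NULL);
    return 0;
}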