Fix SEGV in Shared Pool Stage 1: Add NULL check for freed SuperSlab

Problem: Race condition causing NULL pointer dereference
- Thread A: Pushes slot to freelist → frees SuperSlab → ss=NULL
- Thread B: Pops stale slot from freelist → loads ss=NULL → CRASH at Line 584

Symptoms (bench_fixed_size_hakmem):
- workset=64, iterations >= 2150: SEGV (NULL dereference)
- Crash happened after ~67 drain cycles (interval=2048)
- Affected ALL size classes at high churn (not workset-specific)

Root Cause: core/hakmem_shared_pool.c Line 564-584
- Stage 1 loads SuperSlab pointer (Line 564) but missing NULL check
- Stage 2 already has this NULL check (Line 618-622) but Stage 1 missed it
- Classic race: freelist slot points to freed SuperSlab

Solution: Add defensive NULL check in Stage 1 (13 lines)
- Check if ss==NULL after atomic load (Line 569-576)
- On NULL: unlock mutex, goto stage2_fallback
- Matches Stage 2's existing pattern (consistency)

Results (bench_fixed_size 16B):
- Before: workset=64 10K iter → SEGV (core dump)
- After:  workset=64 10K iter → 28M ops/s 
- After:  workset=64 100K iter → 44M ops/s  (high load stable)

Not related to Phase 13-B TinyHeapV2 supply hook
- Crash reproduces with HAKMEM_TINY_HEAP_V2=0
- Pre-existing bug in Phase 12 shared pool implementation

Credit: Discovered and analyzed by Task agent (general-purpose)
Report: BENCH_FIXED_SIZE_WORKSET64_CRASH_REPORT.md

🤝 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-11-15 13:38:22 +09:00
parent 0d42913efe
commit 52cd7c5543

View File

@ -563,6 +563,18 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
// RACE FIX: Load SuperSlab pointer atomically (consistency)
SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
// RACE FIX: Check if SuperSlab was freed (NULL pointer)
// This can happen if Thread A freed the SuperSlab after pushing slot to freelist,
// but Thread B popped the stale slot before the freelist was cleared.
if (!ss) {
// SuperSlab freed - skip and fall through to Stage 2/3
if (g_lock_stats_enabled == 1) {
atomic_fetch_add(&g_lock_release_count, 1);
}
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
goto stage2_fallback;
}
if (dbg_acquire == 1) {
fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
class_idx, (void*)ss, reuse_slot_idx);
@ -598,6 +610,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
}
stage2_fallback:
// ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
// P0-5: Lock-free atomic CAS claiming (no mutex needed for slot state transition!)
// RACE FIX: Read ss_meta_count atomically (now properly declared as _Atomic)