From 52cd7c55432d5ce69fa4113cb582f9cff6f7a927 Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Sat, 15 Nov 2025 13:38:22 +0900
Subject: [PATCH] Fix SEGV in Shared Pool Stage 1: Add NULL check for freed
 SuperSlab
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem: Race condition causing NULL pointer dereference
- Thread A: Pushes slot to freelist → frees SuperSlab → ss=NULL
- Thread B: Pops stale slot from freelist → loads ss=NULL → CRASH at Line 584

Symptoms (bench_fixed_size_hakmem):
- workset=64, iterations >= 2150: SEGV (NULL dereference)
- Crash happened after ~67 drain cycles (interval=2048)
- Affected ALL size classes at high churn (not workset-specific)

Root Cause: core/hakmem_shared_pool.c Line 564-584
- Stage 1 loads SuperSlab pointer (Line 564) but missing NULL check
- Stage 2 already has this NULL check (Line 618-622) but Stage 1 missed it
- Classic race: freelist slot points to freed SuperSlab

Solution: Add defensive NULL check in Stage 1 (13 lines)
- Check if ss==NULL after atomic load (Line 569-576)
- On NULL: unlock mutex, goto stage2_fallback
- Matches Stage 2's existing pattern (consistency)

Results (bench_fixed_size 16B):
- Before: workset=64 10K iter → SEGV (core dump)
- After: workset=64 10K iter → 28M ops/s ✅
- After: workset=64 100K iter → 44M ops/s ✅ (high load stable)

Not related to Phase 13-B TinyHeapV2 supply hook
- Crash reproduces with HAKMEM_TINY_HEAP_V2=0
- Pre-existing bug in Phase 12 shared pool implementation

Credit: Discovered and analyzed by Task agent (general-purpose)
Report: BENCH_FIXED_SIZE_WORKSET64_CRASH_REPORT.md

🤝 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 core/hakmem_shared_pool.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/core/hakmem_shared_pool.c b/core/hakmem_shared_pool.c
index daaeb6c2..71a5129e 100644
--- a/core/hakmem_shared_pool.c
+++ b/core/hakmem_shared_pool.c
@@ -563,6 +563,18 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
     // RACE FIX: Load SuperSlab pointer atomically (consistency)
     SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
 
+    // RACE FIX: Check if SuperSlab was freed (NULL pointer)
+    // This can happen if Thread A freed the SuperSlab after pushing slot to freelist,
+    // but Thread B popped the stale slot before the freelist was cleared.
+    if (!ss) {
+        // SuperSlab freed - skip and fall through to Stage 2/3
+        if (g_lock_stats_enabled == 1) {
+            atomic_fetch_add(&g_lock_release_count, 1);
+        }
+        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+        goto stage2_fallback;
+    }
+
     if (dbg_acquire == 1) {
         fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                 class_idx, (void*)ss, reuse_slot_idx);
@@ -598,6 +610,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         pthread_mutex_unlock(&g_shared_pool.alloc_lock);
     }
 
+stage2_fallback:
     // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
     // P0-5: Lock-free atomic CAS claiming (no mutex needed for slot state transition!)
     // RACE FIX: Read ss_meta_count atomically (now properly declared as _Atomic)
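
For reference, a minimal self-contained sketch of the stale-slot scenario the commit
message describes (this is an illustration, not part of the patch). FakeSuperSlab,
FakeSlot, thread_a and thread_b are hypothetical simplifications, not hakmem's real
types or functions, and the two threads are joined back to back so the freed
(ss == NULL) state is reached deterministically; in hakmem the same state is produced
by the concurrent Thread A / Thread B interleaving. Build with something like
`cc -std=c11 -pthread sketch.c`.

/* Illustrative sketch only. FakeSuperSlab/FakeSlot are stand-ins for
 * SuperSlab and a shared-pool freelist slot; names are hypothetical. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { int slab_count; } FakeSuperSlab;         /* stand-in for SuperSlab       */
typedef struct { _Atomic(FakeSuperSlab*) ss; } FakeSlot;  /* stand-in for a freelist slot */

static FakeSlot g_slot;

/* "Thread A": the slot has already been pushed to the freelist; free its
 * SuperSlab and publish NULL, leaving a stale slot behind. */
static void* thread_a(void* arg) {
    (void)arg;
    FakeSuperSlab* ss = atomic_exchange_explicit(&g_slot.ss, NULL, memory_order_acq_rel);
    free(ss);
    return NULL;
}

/* "Thread B": pops the stale slot. Without a NULL check this would
 * dereference NULL (the Stage 1 SEGV); with it, the caller falls back,
 * mirroring the patch's `if (!ss) { ... goto stage2_fallback; }`. */
static void* thread_b(void* arg) {
    (void)arg;
    FakeSuperSlab* ss = atomic_load_explicit(&g_slot.ss, memory_order_acquire);
    if (!ss) {
        puts("stale slot (SuperSlab already freed) -> fall back to Stage 2");
        return NULL;
    }
    printf("reusing SuperSlab with %d slabs\n", ss->slab_count);
    return NULL;
}

int main(void) {
    FakeSuperSlab* ss = malloc(sizeof *ss);
    if (!ss) return 1;
    ss->slab_count = 32;
    atomic_store(&g_slot.ss, ss);

    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_join(a, NULL);   /* A finishes first: the slot is now stale */
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(b, NULL);
    return 0;
}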