# bench_fixed_size_hakmem Crash Report: workset=64 Race Condition

**Date**: 2025-11-15
**Status**: 🔴 **ROOT CAUSE IDENTIFIED** - Race condition in Stage 1 (lock-free freelist reuse)

---

## Executive Summary

`bench_fixed_size_hakmem` crashes with SEGV when `workset=64` and `iterations >= 2150`:

```bash
# Works fine:
./out/release/bench_fixed_size_hakmem 10000 16 60   # OK
./out/release/bench_fixed_size_hakmem 2100 16 64    # OK

# Crashes:
./out/release/bench_fixed_size_hakmem 2150 16 64    # SEGV
./out/release/bench_fixed_size_hakmem 10000 16 64   # SEGV
```

**Root Cause**: NULL pointer dereference in `shared_pool_acquire_slab()` Stage 1, caused by a race between:
- Thread A releasing a SuperSlab (sets `sp_meta->ss = NULL`, frees the memory)
- Thread B reusing a slot from the freelist (loads the stale `sp_meta`, whose `ss` is now NULL)

---

## Crash Details

### Stack Trace

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005a12b89a770b in shared_pool_acquire_slab.constprop ()

Crashing instruction:
=> or %r15d,0x14(%r14)

Register state:
r14 = 0x0 (NULL pointer!)
```

**Disassembly context** (line 572 in `hakmem_shared_pool.c`):

```asm
0x5a12b89a770b: or %r15d,0x14(%r14)   ; Tries to access ss->slab_bitmap (offset 0x14)
                                      ; r14 = ss = NULL → SEGV
```

### Debug Log Output

```
[SP_ACQUIRE_STAGE2_LOCKFREE] class=2 claimed UNUSED slot (ss=0x791110200000 slab=31)
[SP_ACQUIRE_STAGE3] class=2 new SuperSlab (ss=0x79110fe00000 from_lru=0)
[SP_ACQUIRE_STAGE1_LOCKFREE] class=2 reusing EMPTY slot (ss=(nil) slab=0)   ← CRASH HERE
```

**Smoking gun**: the last line shows that Stage 1 got `ss=(nil)` but still tried to use it.
---

## Root Cause Analysis

### The Race Condition

**File**: `core/hakmem_shared_pool.c`
**Function**: `shared_pool_acquire_slab()` (lines 514-738)

**Race Timeline**:

| Time | Thread A (Releasing Slab) | Thread B (Acquiring Slab) |
|------|---------------------------|---------------------------|
| T0 | `shared_pool_release_slab(ss, idx)` called | - |
| T1 | Line 840: `sp_freelist_push_lockfree(class, meta, idx)` | - |
| | (Slot pushed to freelist, ss still valid) | - |
| T2 | Line 850: Detects `active_slots == 0` | - |
| T3 | Line 862: `atomic_store(&meta->ss, NULL)` | - |
| T4 | Line 870: `superslab_free(ss)` (memory freed) | - |
| T5 | - | `shared_pool_acquire_slab(class, ...)` called |
| T6 | - | Line 548: `sp_freelist_pop_lockfree()` **pops stale meta** |
| T7 | - | Line 564: `ss = atomic_load(&meta->ss)` **ss = NULL!** |
| T8 | - | Lines 566-569: Debug log shows `ss=(nil)` |
| T9 | - | Line 572: `ss->slab_bitmap \|= ...` **SEGV!** |

### Vulnerable Code Path

**Stage 1 (Lock-Free Freelist Reuse)** in `shared_pool_acquire_slab()`:

```c
// Lines 548-592 (hakmem_shared_pool.c)
if (sp_freelist_pop_lockfree(class_idx, &reuse_meta, &reuse_slot_idx)) {
    // ...
    pthread_mutex_lock(&g_shared_pool.alloc_lock);

    // Activate slot under mutex (slot state transition requires protection)
    if (sp_slot_mark_active(reuse_meta, reuse_slot_idx, class_idx) == 0) {
        // ⚠️ BUG: Load ss atomically, but NO NULL CHECK!
        SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);

        if (dbg_acquire == 1) {
            fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                    class_idx, (void*)ss, reuse_slot_idx);
        }

        // ❌ CRASH HERE: ss can be NULL if the SuperSlab was freed after push but before pop
        ss->slab_bitmap |= (1u << reuse_slot_idx);  // Line 572: NULL dereference!
        // ...
    }
}
```

**Why the NULL check is missing:**

The code assumes:
1. If `sp_freelist_pop_lockfree()` returns true → slot is valid
2.
   If `sp_slot_mark_active()` succeeds → the SuperSlab must still exist

**But this is wrong** because:
1. The slot was pushed to the freelist while the SuperSlab was still valid (line 840)
2. The SuperSlab was freed AFTER the push but BEFORE the pop (lines 862-870)
3. The freelist node contains a stale `sp_meta` pointer whose `ss` is now NULL

### Why Stage 2 Doesn't Crash

**Stage 2 (Lock-Free UNUSED Slot Claiming)** has proper NULL handling:

```c
// Lines 613-622 (hakmem_shared_pool.c)
int claimed_idx = sp_slot_claim_lockfree(meta, class_idx);
if (claimed_idx >= 0) {
    SuperSlab* ss = atomic_load_explicit(&meta->ss, memory_order_acquire);
    if (!ss) {
        // ✅ CORRECT: Skip if SuperSlab was freed
        continue;
    }
    // ... safe to use ss
}
```

This check was added in a previous RACE FIX but **was not applied to Stage 1**.

---

## Why workset=64 Specifically?

The crash is **NOT** specific to workset=64, but rather to **total operations × drain frequency**:

### Crash Threshold Analysis

| workset | iterations | Total Ops | Crash? | Drain Cycles (÷2048) |
|---------|-----------|-----------|--------|---------------------|
| 60 | 10000 | 600,000 | ❌ OK | 293 |
| 64 | 2100 | 134,400 | ❌ OK | 66 |
| 64 | 2150 | 137,600 | ✅ CRASH | 67 |
| 64 | 10000 | 640,000 | ✅ CRASH | 313 |

**Pattern**: The crash appears around **2150 iterations** (137,600 ops, ~67 drain cycles).

**Why this threshold?**

1. **TLS SLL drain interval** = 2048 (default)
2. At ~2150 iterations:
   - The first major drain cycle completes (~67 drains)
   - Many slabs are released to the shared pool
   - The freelist accumulates many freed slots
   - Some SuperSlabs become completely empty → freed
   - The race window opens: the freelist holds slots whose SuperSlabs are freed
3.
   **workset=64** amplifies the issue:
   - A larger working set means more concurrent allocations
   - More slabs active → more slabs released during drain
   - Higher probability of hitting the race window

---

## Reproduction

### Minimal Repro

```bash
cd /mnt/workdisk/public_share/hakmem

# Crash reliably:
./out/release/bench_fixed_size_hakmem 2150 16 64

# Debug logging (shows ss=(nil)):
HAKMEM_SS_ACQUIRE_DEBUG=1 ./out/release/bench_fixed_size_hakmem 2150 16 64
```

**Expected Output** (last lines before crash):

```
[SP_ACQUIRE_STAGE2_LOCKFREE] class=2 claimed UNUSED slot (ss=0x... slab=31)
[SP_ACQUIRE_STAGE3] class=2 new SuperSlab (ss=0x... from_lru=0)
[SP_ACQUIRE_STAGE1_LOCKFREE] class=2 reusing EMPTY slot (ss=(nil) slab=0)
Segmentation fault (core dumped)
```

### Testing Boundaries

```bash
# Find exact crash threshold:
for i in {2100..2200..10}; do
  ./out/release/bench_fixed_size_hakmem $i 16 64 >/dev/null 2>&1 \
    && echo "iters=$i: OK" \
    || echo "iters=$i: CRASH"
done

# Output:
# iters=2100: OK
# iters=2110: OK
# ...
# iters=2140: OK
# iters=2150: CRASH   ← First crash
```

---

## Recommended Fix

**File**: `core/hakmem_shared_pool.c`
**Function**: `shared_pool_acquire_slab()`
**Lines**: 562-592 (Stage 1)

### Patch (Minimal, 5 lines)

```diff
--- a/core/hakmem_shared_pool.c
+++ b/core/hakmem_shared_pool.c
@@ -561,6 +561,12 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
     // Activate slot under mutex (slot state transition requires protection)
     if (sp_slot_mark_active(reuse_meta, reuse_slot_idx, class_idx) == 0) {
         // RACE FIX: Load SuperSlab pointer atomically (consistency)
         SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
+        // RACE FIX: Check if SuperSlab was freed between push and pop
+        if (!ss) {
+            // SuperSlab freed after slot was pushed to freelist - skip and fall through
+            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+            goto stage2_fallback;  // Try Stage 2 (UNUSED slots) or Stage 3 (new SS)
+        }
         if (dbg_acquire == 1) {
             fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
@@ -598,6 +604,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         pthread_mutex_unlock(&g_shared_pool.alloc_lock);
     }

+stage2_fallback:
     // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
```

### Alternative Fix (No goto, +10 lines)

If `goto` is undesirable, wrap Stages 2+3 in a helper function or use a flag:

```c
// After line 564:
SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
if (!ss) {
    // SuperSlab was freed - release lock and continue to Stage 2
    if (g_lock_stats_enabled == 1) {
        atomic_fetch_add(&g_lock_release_count, 1);
    }
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    // Fall through to Stage 2 below (no goto needed)
} else {
    // ... existing code (lines 566-591)
}
```

---

## Verification Plan

### Test Cases

```bash
# 1.
# Original crash case (must pass after fix):
./out/release/bench_fixed_size_hakmem 2150 16 64
./out/release/bench_fixed_size_hakmem 10000 16 64

# 2. Boundary cases (all must pass):
./out/release/bench_fixed_size_hakmem 2100 16 64
./out/release/bench_fixed_size_hakmem 3000 16 64
./out/release/bench_fixed_size_hakmem 10000 16 128

# 3. Other size classes (regression test):
./out/release/bench_fixed_size_hakmem 10000 256 128
./out/release/bench_fixed_size_hakmem 10000 1024 128

# 4. Stress test (100K iterations, various worksets):
for ws in 32 64 96 128 192 256; do
  echo "Testing workset=$ws..."
  ./out/release/bench_fixed_size_hakmem 100000 16 $ws || echo "FAIL: workset=$ws"
done
```

### Debug Validation

After applying the fix, verify with debug logging:

```bash
HAKMEM_SS_ACQUIRE_DEBUG=1 ./out/release/bench_fixed_size_hakmem 2150 16 64 2>&1 | \
  grep "ss=(nil)"

# Expected: No output (no NULL ss should reach Stage 1 activation)
```

---

## Impact Assessment

### Severity: **CRITICAL (P0)**

- **Reliability**: Crash in production workloads with high allocation churn
- **Frequency**: Deterministic after ~2150 iterations (workload-dependent)
- **Scope**: Affects all allocations using the shared pool (Phase 12+)

### Affected Components

1. **Shared SuperSlab Pool** (`core/hakmem_shared_pool.c`)
   - Stage 1 lock-free freelist reuse path
2. **TLS SLL Drain** (indirectly)
   - Triggers the slab releases that populate the freelist
3. **All benchmarks using fixed worksets**
   - `bench_fixed_size_hakmem`
   - Potentially `bench_random_mixed_hakmem` with high churn

### Pre-Existing or Phase 13-B?

**Pre-existing bug** in the Phase 12 shared pool implementation. **Not caused by Phase 13-B changes** (TinyHeapV2 supply hook):

- The crash reproduces with `HAKMEM_TINY_HEAP_V2=0` (HeapV2 disabled)
- The root cause is in the Stage 1 freelist logic (lines 562-592)
- Phase 13-B only added a supply hook in `tiny_free_fast_v2.inc.h` (a separate code path)

---

## Related Issues

### Similar Bugs Fixed Previously

1.
   **Stage 2 NULL check** (lines 618-622):
   - Added in a previous RACE FIX commit
   - Comment: "SuperSlab was freed between claiming and loading"
   - **Same pattern, but Stage 1 was missed!**
2. **sp_meta->ss NULL store** (line 862):
   - Added in RACE FIX: "Set meta->ss to NULL BEFORE unlocking mutex"
   - Correctly prevents Stage 2 from accessing the freed SuperSlab
   - **But the Stage 1 freelist can still hold stale pointers**

### Design Flaw: Freelist Lifetime Management

The root issue is **decoupled lifetimes**:

- Freelist nodes live in a global pool (`g_free_node_pool`, never freed)
- SuperSlabs are dynamically freed (line 870: `superslab_free(ss)`)
- There is no mechanism to invalidate freelist nodes when a SuperSlab is freed

**Potential long-term fixes** (beyond this patch):

1. **Generation counter** in `SharedSSMeta`:
   - Increment on each SuperSlab allocation/free
   - Each freelist node stores the generation number observed at push time
   - The pop path checks whether the generation still matches (stale node → skip)
2. **Lazy freelist cleanup**:
   - Before freeing a SuperSlab, scan the freelist and remove matching nodes
   - Requires lock-free list traversal or a fallback to the mutex
3.
   **Reference counting** on `SharedSSMeta`:
   - Increment when pushing to the freelist
   - Decrement when popping or when freeing the SuperSlab
   - Only free the SuperSlab when the refcount reaches 0

---

## Files Involved

### Primary Bug Location

- `/mnt/workdisk/public_share/hakmem/core/hakmem_shared_pool.c`
  - Lines 562-592: Stage 1 (lock-free freelist reuse) - **MISSING NULL CHECK**
  - Lines 618-622: Stage 2 (lock-free unused claiming) - **HAS NULL CHECK** ✅
  - Line 840: `sp_freelist_push_lockfree()` - pushes the slot to the freelist
  - Line 862: Sets `sp_meta->ss = NULL` before freeing the SuperSlab
  - Line 870: `superslab_free(ss)` - frees the SuperSlab memory

### Related Files (Context)

- `/mnt/workdisk/public_share/hakmem/benchmarks/src/fixed/bench_fixed_size.c`
  - Benchmark that triggers the crash (workset=64 pattern)
- `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_drain_box.h`
  - TLS SLL drain interval (2048) - affects when slabs are released
- `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_free.inc.h`
  - Lines 234-235: Calls `shared_pool_release_slab()` when a slab is empty

---

## Summary

### What Happened

1. **workset=64, iterations=2150** creates high allocation churn
2. After ~67 drain cycles, many slabs are released to the shared pool
3. Some SuperSlabs become completely empty → freed
4. The freelist contains slots whose SuperSlabs are already freed (`ss = NULL`)
5. Stage 1 pops a stale slot, loads `ss = NULL`, and crashes on the dereference

### Why It Wasn't Caught Earlier

1. **Low iteration count** in normal testing (< 2000 iterations)
2. **Stage 2 already has a NULL check** - it was assumed Stage 1 was also safe
3.
   **The race window is small** - it only opens when:
   - The freelist is non-empty (needs prior releases)
   - A SuperSlab becomes completely empty (all slots freed)
   - Another thread pops before the SuperSlab is reallocated

### The Fix

Add a NULL check in Stage 1 after loading `ss`, matching Stage 2's pattern:

```c
SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
if (!ss) {
    // SuperSlab freed - skip and fall through to Stage 2/3
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    goto stage2_fallback;  // or return and retry
}
```

**Impact**: Minimal overhead (one NULL check per Stage 1 hit); fixes a critical crash.

---

## Action Items

- [ ] Apply the minimal NULL check patch to `shared_pool_acquire_slab()` Stage 1
- [ ] Rebuild and test the crash cases (workset=64, iterations=2150/10000)
- [ ] Run the stress test (100K iterations, worksets 32-256)
- [ ] Verify with debug logging (no `ss=(nil)` in Stage 1)
- [ ] Consider a long-term fix (generation counter or refcounting)
- [ ] Update `CURRENT_TASK.md` with the fix status

---

**Report End**