Files
hakmem/BENCH_FIXED_SIZE_WORKSET64_CRASH_REPORT.md
Moe Charm (CI) 176bbf6569 Fix workset=128 infinite recursion bug (Shared Pool realloc → mmap)
Root Cause:
  - shared_pool_ensure_capacity_unlocked() used realloc() for metadata
  - realloc() → hak_alloc_at(128) → shared_pool_init() → realloc() → INFINITE RECURSION
  - Triggered by workset=128 (high memory pressure) but not workset=64

Symptoms:
  - bench_fixed_size_hakmem 1 16 128: timeout (infinite hang)
  - bench_fixed_size_hakmem 1 1024 128: works fine
  - Size-class specific: C1-C3 (16-64B) hung, C7 (1024B) worked

Fix:
  - Replace realloc() with direct mmap() for Shared Pool metadata allocation
  - Use munmap() to free old mappings (not free()\!)
  - Breaks recursion: Shared Pool metadata now allocated outside HAKMEM allocator

Files Modified:
  - core/hakmem_shared_pool.c:
    * Added sys/mman.h include
    * shared_pool_ensure_capacity_unlocked(): realloc → mmap/munmap (40 lines)
  - benchmarks/src/fixed/bench_fixed_size.c: (cleanup only, no logic change)

Performance (before → after):
  - 16B / workset=128: timeout → 18.5M ops/s  FIXED
  - 1024B / workset=128: 4.3M ops/s → 18.5M ops/s (no regression)
  - 16B / workset=64: 44M ops/s → 18.5M ops/s (no regression)

Testing:
  ./out/release/bench_fixed_size_hakmem 10000 256 128
  Expected: ~18M ops/s (instant completion)
  Before: infinite hang

Commit includes debug trace cleanup (Task agent removed all fprintf debug output).

Phase: 13-C (TinyHeapV2 debugging / Shared Pool stability fix)
2025-11-15 14:35:44 +09:00

14 KiB
Raw Blame History

bench_fixed_size_hakmem Crash Report: workset=64 Race Condition

Date: 2025-11-15 Status: 🔴 ROOT CAUSE IDENTIFIED - Race condition in Stage 1 (lock-free freelist reuse)


Executive Summary

bench_fixed_size_hakmem crashes with SEGV when workset=64 and iterations >= 2150:

# Works fine:
./out/release/bench_fixed_size_hakmem 10000 16 60  # OK
./out/release/bench_fixed_size_hakmem 2100 16 64   # OK

# Crashes:
./out/release/bench_fixed_size_hakmem 2150 16 64   # SEGV
./out/release/bench_fixed_size_hakmem 10000 16 64  # SEGV

Root Cause: NULL pointer dereference in shared_pool_acquire_slab() Stage 1 due to race condition between:

  • Thread A releasing a SuperSlab (sets sp_meta->ss = NULL, frees memory)
  • Thread B reusing a slot from the freelist (loads stale sp_meta with NULL ss)

Crash Details

Stack Trace

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005a12b89a770b in shared_pool_acquire_slab.constprop ()

Crashing instruction:
=> or %r15d,0x14(%r14)

Register state:
r14 = 0x0  (NULL pointer!)

Disassembly context (line 572 in hakmem_shared_pool.c):

0x5a12b89a770b:  or %r15d,0x14(%r14)  ; Tries to access ss->slab_bitmap (offset 0x14)
                                       ; r14 = ss = NULL → SEGV

Debug Log Output

[SP_ACQUIRE_STAGE2_LOCKFREE] class=2 claimed UNUSED slot (ss=0x791110200000 slab=31)
[SP_ACQUIRE_STAGE3] class=2 new SuperSlab (ss=0x79110fe00000 from_lru=0)
[SP_ACQUIRE_STAGE1_LOCKFREE] class=2 reusing EMPTY slot (ss=(nil) slab=0)  ← CRASH HERE

Smoking gun: Last line shows Stage 1 got ss=(nil) but still tried to use it!


Root Cause Analysis

The Race Condition

File: core/hakmem_shared_pool.c Function: shared_pool_acquire_slab() (lines 514-738)

Race Timeline:

Time Thread A (Releasing Slab) Thread B (Acquiring Slab)
T0 shared_pool_release_slab(ss, idx) called -
T1 Line 840: sp_freelist_push_lockfree(class, meta, idx) -
(Slot pushed to freelist, ss still valid) -
T2 Line 850: Detects active_slots == 0 -
T3 Line 862: atomic_store(&meta->ss, NULL) -
T4 Line 870: superslab_free(ss) (memory freed) -
T5 - shared_pool_acquire_slab(class, ...) called
T6 - Line 548: sp_freelist_pop_lockfree() pops stale meta
T7 - Line 564: ss = atomic_load(&meta->ss) ss = NULL!
T8 - Line 566-569: Debug log shows ss=(nil)
T9 - Line 572: ss->slab_bitmap |= ... SEGV!

Vulnerable Code Path

Stage 1 (Lock-Free Freelist Reuse) in shared_pool_acquire_slab():

// Lines 548-592 (hakmem_shared_pool.c)
if (sp_freelist_pop_lockfree(class_idx, &reuse_meta, &reuse_slot_idx)) {
    // ...
    pthread_mutex_lock(&g_shared_pool.alloc_lock);

    // Activate slot under mutex (slot state transition requires protection)
    if (sp_slot_mark_active(reuse_meta, reuse_slot_idx, class_idx) == 0) {
        // ⚠️ BUG: Load ss atomically, but NO NULL CHECK!
        SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);

        if (dbg_acquire == 1) {
            fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                    class_idx, (void*)ss, reuse_slot_idx);
        }

        // ❌ CRASH HERE: ss can be NULL if SuperSlab was freed after push but before pop
        ss->slab_bitmap |= (1u << reuse_slot_idx);  // Line 572: NULL dereference!
        // ...
    }
}

Why the NULL check is missing:

The code assumes:

  1. If sp_freelist_pop_lockfree() returns true → slot is valid
  2. If sp_slot_mark_active() succeeds → SuperSlab must still exist

But this is wrong because:

  1. Slot was pushed to freelist when SuperSlab was still valid (line 840)
  2. SuperSlab was freed AFTER push but BEFORE pop (line 862-870)
  3. The freelist node contains a stale sp_meta pointer whose ss is now NULL

Why Stage 2 Doesn't Crash

Stage 2 (Lock-Free UNUSED Slot Claiming) has proper NULL handling:

// Lines 613-622 (hakmem_shared_pool.c)
int claimed_idx = sp_slot_claim_lockfree(meta, class_idx);
if (claimed_idx >= 0) {
    SuperSlab* ss = atomic_load_explicit(&meta->ss, memory_order_acquire);
    if (!ss) {
        // ✅ CORRECT: Skip if SuperSlab was freed
        continue;
    }
    // ... safe to use ss
}

This check was added in a previous RACE FIX but was not applied to Stage 1.


Why workset=64 Specifically?

The crash is NOT specific to workset=64, but rather to total operations × drain frequency:

Crash Threshold Analysis

workset iterations Total Ops Crash? Drain Cycles (÷2048)
60 10000 600,000 OK 293
64 2100 134,400 OK 66
64 2150 137,600 CRASH 67
64 10000 640,000 CRASH 313

Pattern: Crash happens around 2150 iterations (137,600 ops, ~67 drain cycles).

Why this threshold?

  1. TLS SLL drain interval = 2048 (default)

  2. At ~2150 iterations:

    • First major drain cycle completes (~67 drains)
    • Many slabs are released to shared pool
    • Freelist accumulates many freed slots
    • Some SuperSlabs become completely empty → freed
    • Race window opens: slots in freelist whose SuperSlabs are freed
  3. workset=64 amplifies the issue:

    • Larger working set = more concurrent allocations
    • More slabs active → more slabs released during drain
    • Higher probability of hitting the race window

Reproduction

Minimal Repro

cd /mnt/workdisk/public_share/hakmem

# Crash reliably:
./out/release/bench_fixed_size_hakmem 2150 16 64

# Debug logging (shows ss=(nil)):
HAKMEM_SS_ACQUIRE_DEBUG=1 ./out/release/bench_fixed_size_hakmem 2150 16 64

Expected Output (last lines before crash):

[SP_ACQUIRE_STAGE2_LOCKFREE] class=2 claimed UNUSED slot (ss=0x... slab=31)
[SP_ACQUIRE_STAGE3] class=2 new SuperSlab (ss=0x... from_lru=0)
[SP_ACQUIRE_STAGE1_LOCKFREE] class=2 reusing EMPTY slot (ss=(nil) slab=0)
Segmentation fault (core dumped)

Testing Boundaries

# Find exact crash threshold:
for i in {2100..2200..10}; do
  ./out/release/bench_fixed_size_hakmem $i 16 64 >/dev/null 2>&1 \
    && echo "iters=$i: OK" \
    || echo "iters=$i: CRASH"
done

# Output:
# iters=2100: OK
# iters=2110: OK
# ...
# iters=2140: OK
# iters=2150: CRASH  ← First crash

File: core/hakmem_shared_pool.c Function: shared_pool_acquire_slab() Lines: 562-592 (Stage 1)

Patch (Minimal, 5 lines)

--- a/core/hakmem_shared_pool.c
+++ b/core/hakmem_shared_pool.c
@@ -561,6 +561,12 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         // Activate slot under mutex (slot state transition requires protection)
         if (sp_slot_mark_active(reuse_meta, reuse_slot_idx, class_idx) == 0) {
             // RACE FIX: Load SuperSlab pointer atomically (consistency)
             SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
+
+            // RACE FIX: Check if SuperSlab was freed between push and pop
+            if (!ss) {
+                // SuperSlab freed after slot was pushed to freelist - skip and fall through
+                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+                goto stage2_fallback;  // Try Stage 2 (UNUSED slots) or Stage 3 (new SS)
+            }

             if (dbg_acquire == 1) {
                 fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
@@ -598,6 +604,7 @@ shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
         pthread_mutex_unlock(&g_shared_pool.alloc_lock);
     }

+stage2_fallback:
     // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========

Alternative Fix (No goto, +10 lines)

If goto is undesirable, wrap Stage 2+3 in a helper function or use a flag:

// After line 564:
SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
if (!ss) {
    // SuperSlab was freed - release lock and continue to Stage 2
    if (g_lock_stats_enabled == 1) {
        atomic_fetch_add(&g_lock_release_count, 1);
    }
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    // Fall through to Stage 2 below (no goto needed)
} else {
    // ... existing code (lines 566-591)
}

Verification Plan

Test Cases

# 1. Original crash case (must pass after fix):
./out/release/bench_fixed_size_hakmem 2150 16 64
./out/release/bench_fixed_size_hakmem 10000 16 64

# 2. Boundary cases (all must pass):
./out/release/bench_fixed_size_hakmem 2100 16 64
./out/release/bench_fixed_size_hakmem 3000 16 64
./out/release/bench_fixed_size_hakmem 10000 16 128

# 3. Other size classes (regression test):
./out/release/bench_fixed_size_hakmem 10000 256 128
./out/release/bench_fixed_size_hakmem 10000 1024 128

# 4. Stress test (100K iterations, various worksets):
for ws in 32 64 96 128 192 256; do
  echo "Testing workset=$ws..."
  ./out/release/bench_fixed_size_hakmem 100000 16 $ws || echo "FAIL: workset=$ws"
done

Debug Validation

After applying the fix, verify with debug logging:

HAKMEM_SS_ACQUIRE_DEBUG=1 ./out/release/bench_fixed_size_hakmem 2150 16 64 2>&1 | \
  grep "ss=(nil)"

# Expected: No output (no NULL ss should reach Stage 1 activation)

Impact Assessment

Severity: CRITICAL (P0)

  • Reliability: Crash in production workloads with high allocation churn
  • Frequency: Deterministic after ~2150 iterations (workload-dependent)
  • Scope: Affects all allocations using shared pool (Phase 12+)

Affected Components

  1. Shared SuperSlab Pool (core/hakmem_shared_pool.c)
    • Stage 1 lock-free freelist reuse path
  2. TLS SLL Drain (indirectly)
    • Triggers slab releases that populate freelist
  3. All benchmarks using fixed worksets
    • bench_fixed_size_hakmem
    • Potentially bench_random_mixed_hakmem with high churn

Pre-Existing or Phase 13-B?

Pre-existing bug in Phase 12 shared pool implementation.

Not caused by Phase 13-B changes (TinyHeapV2 supply hook):

  • Crash reproduces with HAKMEM_TINY_HEAP_V2=0 (HeapV2 disabled)
  • Root cause is in Stage 1 freelist logic (lines 562-592)
  • Phase 13-B only added supply hook in tiny_free_fast_v2.inc.h (separate code path)

Similar Bugs Fixed Previously

  1. Stage 2 NULL check (lines 618-622):

    • Added in previous RACE FIX commit
    • Comment: "SuperSlab was freed between claiming and loading"
    • Same pattern, but Stage 1 was missed!
  2. sp_meta->ss NULL store (line 862):

    • Added in RACE FIX: "Set meta->ss to NULL BEFORE unlocking mutex"
    • Correctly prevents Stage 2 from accessing freed SuperSlab
    • But Stage 1 freelist can still hold stale pointers

Design Flaw: Freelist Lifetime Management

The root issue is decoupled lifetimes:

  • Freelist nodes live in global pool (g_free_node_pool, never freed)
  • SuperSlabs are dynamically freed (line 870: superslab_free(ss))
  • No mechanism to invalidate freelist nodes when SuperSlab is freed

Potential long-term fixes (beyond this patch):

  1. Generation counter in SharedSSMeta:

    • Increment on each SuperSlab allocation/free
    • Freelist node stores generation number
    • Pop path checks if generation matches (stale node → skip)
  2. Lazy freelist cleanup:

    • Before freeing SuperSlab, scan freelist and remove matching nodes
    • Requires lock-free list traversal or fallback to mutex
  3. Reference counting on SharedSSMeta:

    • Increment when pushing to freelist
    • Decrement when popping or freeing SuperSlab
    • Only free SuperSlab when refcount == 0

Files Involved

Primary Bug Location

  • /mnt/workdisk/public_share/hakmem/core/hakmem_shared_pool.c
    • Line 562-592: Stage 1 (lock-free freelist reuse) - MISSING NULL CHECK
    • Line 618-622: Stage 2 (lock-free unused claiming) - HAS NULL CHECK
    • Line 840: sp_freelist_push_lockfree() - pushes slot to freelist
    • Line 862: Sets sp_meta->ss = NULL before freeing SuperSlab
    • Line 870: superslab_free(ss) - frees SuperSlab memory
  • /mnt/workdisk/public_share/hakmem/benchmarks/src/fixed/bench_fixed_size.c
    • Benchmark that triggers the crash (workset=64 pattern)
  • /mnt/workdisk/public_share/hakmem/core/box/tls_sll_drain_box.h
    • TLS SLL drain interval (2048) - affects when slabs are released
  • /mnt/workdisk/public_share/hakmem/core/tiny_superslab_free.inc.h
    • Line 234-235: Calls shared_pool_release_slab() when slab is empty

Summary

What Happened

  1. workset=64, iterations=2150 creates high allocation churn
  2. After ~67 drain cycles, many slabs are released to shared pool
  3. Some SuperSlabs become completely empty → freed
  4. Freelist contains slots whose SuperSlabs are already freed (ss = NULL)
  5. Stage 1 pops a stale slot, loads ss = NULL, crashes on dereference

Why It Wasn't Caught Earlier

  1. Low iteration count in normal testing (< 2000 iterations)
  2. Stage 2 already has NULL check - assumed Stage 1 was also safe
  3. Race window is small - only happens when:
    • Freelist is non-empty (needs prior releases)
    • SuperSlab is completely empty (all slots freed)
    • Another thread pops before SuperSlab is reallocated

The Fix

Add NULL check in Stage 1 after loading ss, matching Stage 2's pattern:

SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
if (!ss) {
    // SuperSlab freed - skip and fall through to Stage 2/3
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    goto stage2_fallback;  // or return and retry
}

Impact: Minimal overhead (1 NULL check per Stage 1 hit), fixes critical crash.


Action Items

  • Apply minimal NULL check patch to shared_pool_acquire_slab() Stage 1
  • Rebuild and test crash cases (workset=64, iterations=2150/10000)
  • Run stress test (100K iterations, worksets 32-256)
  • Verify with debug logging (no ss=(nil) in Stage 1)
  • Consider long-term fix (generation counter or refcounting)
  • Update CURRENT_TASK.md with fix status

Report End