# SEGFAULT Investigation Report - bench_random_mixed & bench_mid_large_mt

**Date**: 2025-11-07
**Status**: ✅ ROOT CAUSE IDENTIFIED
**Priority**: CRITICAL

---

## Executive Summary

**Problem**: `bench_random_mixed_hakmem` and `bench_mid_large_mt_hakmem` crash with SEGV (exit 139) when direct-linked, but work fine with LD_PRELOAD.

**Root Cause**: **SuperSlab registry lookup failures** cause headerless tiny allocations to be misidentified as having HAKMEM headers during free(), leading to:
1. Invalid memory reads at `ptr - HEADER_SIZE` → SEGV
2. Memory leaks when `g_invalid_free_mode=1` skips frees
3. Eventual memory exhaustion or corruption

**Why LD_PRELOAD Works**: LD_PRELOAD defaults to `g_invalid_free_mode=0` (fallback to libc), which masks the issue by routing failed frees to `__libc_free()`.

**Why Direct-Link Crashes**: Direct-link defaults to `g_invalid_free_mode=1` (skip invalid frees), which silently leaks memory until exhaustion.

---

## Reproduction

### Crashes (Direct-Link)

```bash
./bench_random_mixed_hakmem 50000 2048 123
# → Segmentation fault (exit 139)

./bench_mid_large_mt_hakmem 4 40000 2048 42
# → Segmentation fault (exit 139)
```

**Error Output**:
```
[hakmem] ERROR: Invalid magic 0x0 (expected 0x48414B4D)
[hakmem] ERROR: Invalid magic 0x0 (expected 0x48414B4D)
... (hundreds of errors)
free(): invalid pointer
Segmentation fault (core dumped)
```

### Works Fine (LD_PRELOAD)

```bash
LD_PRELOAD=./libhakmem_asan.so ./bench_random_mixed_system 200000 4096 1234567
# → 5.7M ops/s ✅
```

### Crash Threshold

- **Small workloads**: ≤20K ops with 512 slots → Works
- **Large workloads**: ≥25K ops with 2048 slots → Crashes immediately
- **Pattern**: Scales with working set size (more live objects = more failures)

---

## Technical Analysis

### 1. Allocation Flow (Working)

```
malloc(size) [size ≤ 1KB]
    ↓
hak_alloc_at(size)
    ↓
hak_tiny_alloc_fast_wrapper(size)
    ↓
tiny_alloc_fast(size)
    ↓ [TLS freelist miss]
    ↓
hak_tiny_alloc_slow(size)
    ↓
hak_tiny_alloc_superslab(class_idx)
    ↓
✅ Returns pointer WITHOUT header (SuperSlab allocation)
```

### 2. Free Flow (Broken)

```
free(ptr)
    ↓
hak_free_at(ptr, 0, site)
    ↓
[SS-first free path] hak_super_lookup(ptr)
    ↓ ❌ Lookup FAILS (should succeed!)
    ↓
[Fallback] Try mid/L25 lookup → Fails
    ↓
[Fallback] Header dispatch:
    void* raw = (char*)ptr - HEADER_SIZE;  // ← ptr has NO header!
    AllocHeader* hdr = (AllocHeader*)raw;  // ← Invalid pointer
    if (hdr->magic != HAKMEM_MAGIC) {      // ← ⚠️ SEGV or reads 0x0
        // g_invalid_free_mode = 1 (direct-link)
        goto done;                         // ← ❌ MEMORY LEAK!
    }
```

**Key Bug**: When the SuperSlab lookup fails for a tiny allocation, the code assumes a HAKMEM header precedes the pointer and tries to read it. Tiny allocations are **headerless**, so this read touches memory that does not belong to the allocation.

### 3. Why SuperSlab Lookup Fails

Based on testing:

```bash
# Default (crashes with "Invalid magic 0x0")
./bench_random_mixed_hakmem 25000 2048 123
# → Hundreds of "Invalid magic" errors

# With SuperSlab explicitly enabled (no "Invalid magic" errors, but still SEGVs)
HAKMEM_TINY_USE_SUPERSLAB=1 ./bench_random_mixed_hakmem 25000 2048 123
# → SEGV without "Invalid magic" errors
```

**Hypothesis**: When `HAKMEM_TINY_USE_SUPERSLAB` is not explicitly set, there may be a code path where:
1. Tiny allocations succeed (from some non-SuperSlab path)
2. But they're not registered in the SuperSlab registry
3. So lookups fail during free

**Possible causes**:
- **Configuration bug**: `g_use_superslab` may be uninitialized or overridden
- **TLS allocation path**: There may be a TLS-only allocation path that bypasses SuperSlab
- **Magazine/HotMag path**: Allocations from magazine layers might not come from SuperSlab
- **Registry capacity**: Registry might be full (unlikely with SUPER_REG_SIZE=262144)

### 4. Direct-Link vs LD_PRELOAD Behavior

**LD_PRELOAD** (`hak_core_init.inc.h:147-164`):
```c
if (ldpre && strstr(ldpre, "libhakmem.so")) {
    g_ldpreload_mode = 1;
    g_invalid_free_mode = 0;  // ← Fallback to libc
}
```
- Defaults to `g_invalid_free_mode=0` (fallback mode)
- Invalid frees → `__libc_free(ptr)` → **masks the bug** (may work if ptr was originally from libc)

**Direct-Link**:
```c
else {
    g_invalid_free_mode = 1;  // ← Skip invalid frees
}
```
- Defaults to `g_invalid_free_mode=1` (skip mode)
- Invalid frees → `goto done` → **silent memory leak**
- Accumulated leaks → memory exhaustion → SEGV

---

## GDB Analysis

### Backtrace

```
Thread 1 "bench_random_mi" received signal SIGSEGV, Segmentation fault.
0x000055555555eb40 in free ()

#0  0x000055555555eb40 in free ()
#1  0xffffffffffffffff in ?? ()
...
#8  0x00005555555587e1 in main ()

Registers:
rax 0x555556c9d040  (some address)
rbp 0x7ffff6e00000  (pointer being freed - page-aligned!)
rdi 0x0             (NULL!)
rip 0x55555555eb40
```

### Disassembly at Crash Point (free+2176)

```asm
0xab40 <+2176>: mov -0x28(%rbp),%ecx   # Load header magic
0xab43 <+2179>: cmp $0x48414B4D,%ecx   # Compare with HAKMEM_MAGIC
0xab49 <+2185>: je  0xabd0             # Jump if magic matches
```

**Key observation**:
- `rbp = 0x7ffff6e00000` (page-aligned, likely start of mmap region)
- Trying to read from `rbp - 0x28 = 0x7ffff6dfffd8`
- Because `rbp` sits on a page boundary, the read lands in the preceding page; if that page is not mapped, the access raises SEGV

---

## Proposed Fix

### Option A: Safe Header Read (Recommended)

Add a safety check before reading the header:

```c
// hak_free_api.inc.h, line 78-88 (header dispatch)

// BEFORE: Unsafe header read
void* raw = (char*)ptr - HEADER_SIZE;
AllocHeader* hdr = (AllocHeader*)raw;
if (hdr->magic != HAKMEM_MAGIC) { ... }

// AFTER: Safe fallback for tiny allocations
// If SuperSlab lookup failed for a tiny-sized allocation,
// assume it's an invalid free or was already freed
{
    // Check if this could be a tiny allocation (size ≤ 1KB)
    // Heuristic: If SuperSlab/Mid/L25 lookup all failed, and we're here,
    // either it's a libc allocation with header, or a leaked tiny allocation

    // Try to safely read header magic
    void* raw = (char*)ptr - HEADER_SIZE;
    AllocHeader* hdr = (AllocHeader*)raw;

    // If magic is valid, proceed with header dispatch
    if (hdr->magic == HAKMEM_MAGIC) {
        // Header exists, dispatch normally
        if (HAK_ENABLED_CACHE(HAKMEM_FEATURE_BIGCACHE) && hdr->class_bytes >= 2097152) {
            if (hak_bigcache_put(ptr, hdr->size, hdr->alloc_site)) goto done;
        }
        switch (hdr->method) {
            case ALLOC_METHOD_MALLOC: __libc_free(raw); break;
            case ALLOC_METHOD_MMAP: /* ... */ break;
            // ...
        }
    } else {
        // Invalid magic - could be:
        // 1. Tiny allocation where SuperSlab lookup failed
        // 2. Already freed pointer
        // 3. Pointer from external library
        if (g_invalid_free_log) {
            fprintf(stderr, "[hakmem] WARNING: free() of pointer %p with invalid magic 0x%X (expected 0x%X)\n",
                    ptr, hdr->magic, HAKMEM_MAGIC);
            fprintf(stderr, "[hakmem] Possible causes: tiny allocation lookup failure, double-free, or external pointer\n");
        }

        // In direct-link mode, do NOT leak - try to return to tiny pool
        // as a best-effort recovery
        if (!g_ldpreload_mode) {
            // Attempt to route to tiny free (may succeed if it's a valid tiny allocation)
            hak_tiny_free(ptr);  // Will validate internally
        } else {
            // LD_PRELOAD mode: fallback to libc (may be mixed allocation)
            if (g_invalid_free_mode == 0) {
                __libc_free(ptr);  // Not raw! ptr itself
            }
        }
    }
}
goto done;
```

### Option B: Fix SuperSlab Lookup Root Cause

Investigate why SuperSlab lookups are failing:

1. **Add comprehensive logging**:
   ```c
   // At allocation time
   fprintf(stderr, "[ALLOC_DEBUG] ptr=%p class=%d from_superslab=%d\n",
           ptr, class_idx, from_superslab);

   // At free time
   // Casts keep the arguments matched to %p (void*) and %llx (unsigned long long)
   SuperSlab* ss = hak_super_lookup(ptr);
   fprintf(stderr, "[FREE_DEBUG] ptr=%p lookup=%p magic=%llx\n",
           ptr, (void*)ss, (unsigned long long)(ss ? ss->magic : 0));
   ```

2. **Check TLS allocation paths**:
   - Verify all paths through `tiny_alloc_fast_pop()` come from SuperSlab
   - Check if magazine/HotMag allocations are properly registered
   - Verify TLS SLL allocations are from registered SuperSlabs

3. **Verify registry initialization**:
   ```c
   // At startup
   fprintf(stderr, "[INIT] g_super_reg_initialized=%d g_use_superslab=%d\n",
           g_super_reg_initialized, g_use_superslab);
   ```

### Option C: Force SuperSlab Path

Simplify the allocation path to always use SuperSlab:

```c
// Disable competing paths that might bypass SuperSlab
g_hotmag_enable = 0;    // Disable HotMag
g_tls_list_enable = 0;  // Disable TLS List
g_tls_sll_enable = 1;   // Enable TLS SLL (SuperSlab-backed)
```

---

## Immediate Workaround

For users hitting this bug:

```bash
# Workaround 1: Use LD_PRELOAD (masks the issue)
LD_PRELOAD=./libhakmem.so your_benchmark

# Workaround 2: Force SuperSlab (may still crash, but different symptoms)
HAKMEM_TINY_USE_SUPERSLAB=1 ./your_benchmark

# Workaround 3: Disable tiny allocator (fallback to libc)
HAKMEM_WRAP_TINY=0 ./your_benchmark
```

---

## Next Steps

1. **Implement Option A (Safe Header Read)** - Immediate fix to prevent SEGV
2. **Add logging to identify root cause** - Why are SuperSlab lookups failing?
3. **Fix underlying issue** - Ensure all tiny allocations are SuperSlab-backed
4. **Add regression tests** - Prevent future breakage

---

## Files to Modify

1. `/mnt/workdisk/public_share/hakmem/core/box/hak_free_api.inc.h` - Lines 78-120 (header dispatch logic)
2. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny.c` - Add allocation path logging
3. `/mnt/workdisk/public_share/hakmem/core/tiny_alloc_fast.inc.h` - Verify SuperSlab usage
4. `/mnt/workdisk/public_share/hakmem/core/hakmem_super_registry.c` - Add lookup diagnostics

---

## Related Issues

- **Phase 6-2.3**: Active counter bug fix (freed blocks not tracked)
- **Sanitizer Fix**: Similar TLS initialization ordering issues
- **LD_PRELOAD vs Direct-Link**: Behavioral differences in error handling

---

## Verification

After fix, verify:

```bash
# Should complete without errors
./bench_random_mixed_hakmem 50000 2048 123
./bench_mid_large_mt_hakmem 4 40000 2048 42

# Should see no "Invalid magic" errors
HAKMEM_INVALID_FREE_LOG=1 ./bench_random_mixed_hakmem 50000 2048 123
```
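
### Appendix: Page-Boundary Check Sketch

The GDB analysis shows the faulting load reading 0x28 bytes below a page-aligned pointer, and the Option A code still dereferences `hdr->magic` before it decides how to dispatch. The sketch below shows only the address arithmetic that could flag such a read before it happens. It is not hakmem code: the helper names are hypothetical, the 0x28-byte header size is an assumption taken from the `-0x28(%rbp)` operand in the disassembly (the real `HEADER_SIZE` may differ), and a 64-bit address space is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: the crash-site load reads the magic at
 * ptr - 0x28, so a 0x28-byte header is used. The real HEADER_SIZE may differ. */
#define SKETCH_HEADER_SIZE ((size_t)0x28)

/* Hypothetical helper (not part of hakmem): address at which a header
 * would start if one preceded ptr. */
static inline uintptr_t sketch_header_addr(uintptr_t ptr, size_t header_size) {
    return ptr - header_size;
}

/* Hypothetical helper: returns true when the header would start on a
 * different page than ptr itself. In that case the header read touches a
 * page that the allocation does not prove to be mapped. */
static inline bool sketch_header_on_other_page(uintptr_t ptr,
                                               size_t header_size,
                                               size_t page_size) {
    /* page_size must be a non-zero power of two for the mask below. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    /* A pointer smaller than the header size would wrap around zero. */
    if (ptr < header_size) {
        return true;
    }

    uintptr_t page_mask = ~((uintptr_t)page_size - 1);
    return (sketch_header_addr(ptr, header_size) & page_mask) != (ptr & page_mask);
}
```

With the register value from the backtrace (`rbp = 0x7ffff6e00000`) and a 4096-byte page, `sketch_header_on_other_page` returns true, which matches the observed fault. A free path could use such a predicate to avoid the blind header read for page-aligned pointers, or to first confirm through the allocator's own region tracking that the preceding page is mapped. Whether that fits the real dispatch code is left open here.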