# SEGFAULT Investigation Report - bench_random_mixed & bench_mid_large_mt

**Date**: 2025-11-07
**Status**: ✅ ROOT CAUSE IDENTIFIED
**Priority**: CRITICAL

---

## Executive Summary

**Problem**: `bench_random_mixed_hakmem` and `bench_mid_large_mt_hakmem` crash with SEGV (exit 139) when direct-linked, but run cleanly under LD_PRELOAD.

**Root Cause**: **SuperSlab registry lookup failures** cause headerless tiny allocations to be misidentified as header-carrying HAKMEM allocations during `free()`, leading to:
1. Invalid memory reads at `ptr - HEADER_SIZE` → SEGV
2. Memory leaks when `g_invalid_free_mode=1` skips the free
3. Eventual memory exhaustion or corruption

**Why LD_PRELOAD Works**: LD_PRELOAD defaults to `g_invalid_free_mode=0` (fallback to libc), which masks the issue by routing failed frees to `__libc_free()`.

**Why Direct-Link Crashes**: Direct-link defaults to `g_invalid_free_mode=1` (skip invalid frees), so every misidentified block is silently leaked until memory is exhausted.

---

## Reproduction

### Crashes (Direct-Link)
```bash
./bench_random_mixed_hakmem 50000 2048 123
# → Segmentation fault (exit 139)

./bench_mid_large_mt_hakmem 4 40000 2048 42
# → Segmentation fault (exit 139)
```

**Error Output**:
```
[hakmem] ERROR: Invalid magic 0x0 (expected 0x48414B4D)
[hakmem] ERROR: Invalid magic 0x0 (expected 0x48414B4D)
... (hundreds of errors)
free(): invalid pointer
Segmentation fault (core dumped)
```

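Exit status 139 is how a shell reports death by signal: 128 plus the signal number, and 139 - 128 = 11, which is `SIGSEGV` on Linux. The decoding can be sketched as below. The helper name `signal_from_shell_status` is hypothetical and not part of HAKMEM.

```c
#include <assert.h>
#include <signal.h>

/* Shells report "killed by signal N" as exit status 128 + N.
 * Returns the signal number, or 0 if the status is an ordinary exit code. */
static int signal_from_shell_status(int status)
{
    assert(status >= 0 && status <= 255);   /* shell statuses fit in one byte */
    return status > 128 ? status - 128 : 0;
}
```

Calling `signal_from_shell_status(139)` yields the value of `SIGSEGV`, which is a quick way for a test script to tell a crash apart from a benchmark that exited with its own error code.
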
### Works Fine (LD_PRELOAD)
```bash
LD_PRELOAD=./libhakmem_asan.so ./bench_random_mixed_system 200000 4096 1234567
# → 5.7M ops/s ✅
```

### Crash Threshold
- **Small workloads**: ≤20K ops with 512 slots → Works
- **Large workloads**: ≥25K ops with 2048 slots → Crashes immediately
- **Pattern**: Scales with working set size (more live objects = more failures)

---

## Technical Analysis

### 1. Allocation Flow (Working)
```
malloc(size) [size ≤ 1KB]
  ↓
hak_alloc_at(size)
  ↓
hak_tiny_alloc_fast_wrapper(size)
  ↓
tiny_alloc_fast(size)
  ↓ [TLS freelist miss]
  ↓
hak_tiny_alloc_slow(size)
  ↓
hak_tiny_alloc_superslab(class_idx)
  ↓
✅ Returns pointer WITHOUT header (SuperSlab allocation)
```

### 2. Free Flow (Broken)
```
free(ptr)
  ↓
hak_free_at(ptr, 0, site)
  ↓
[SS-first free path] hak_super_lookup(ptr)
  ↓ ❌ Lookup FAILS (should succeed!)
  ↓
[Fallback] Try mid/L25 lookup → Fails
  ↓
[Fallback] Header dispatch:
    void* raw = (char*)ptr - HEADER_SIZE;  // ← ptr has NO header!
    AllocHeader* hdr = (AllocHeader*)raw;  // ← Invalid pointer
    if (hdr->magic != HAKMEM_MAGIC) {      // ← ⚠️ SEGV or reads 0x0
        // g_invalid_free_mode = 1 (direct-link)
        goto done;                         // ← ❌ MEMORY LEAK!
    }
```

**Key Bug**: When the SuperSlab lookup fails for a tiny allocation, the code assumes a HAKMEM header sits in front of the block and tries to read it. Tiny allocations are **headerless**, so the read lands on whatever precedes the block: unrelated data (the `0x0` magic in the log) or an unmapped page (SEGV).

### 3. Why SuperSlab Lookup Fails

Based on testing:
```bash
# Default (crashes with "Invalid magic 0x0")
./bench_random_mixed_hakmem 25000 2048 123
# → Hundreds of "Invalid magic" errors

# With SuperSlab explicitly enabled (no "Invalid magic" errors, but still SEGVs)
HAKMEM_TINY_USE_SUPERSLAB=1 ./bench_random_mixed_hakmem 25000 2048 123
# → SEGV without "Invalid magic" errors
```

**Hypothesis**: When `HAKMEM_TINY_USE_SUPERSLAB` is not explicitly set, there may be a code path where:
1. Tiny allocations succeed (from some non-SuperSlab path)
2. But they're not registered in the SuperSlab registry
3. So lookups fail during free

**Possible causes**:
- **Configuration bug**: `g_use_superslab` may be uninitialized or overridden
- **TLS allocation path**: There may be a TLS-only allocation path that bypasses SuperSlab
- **Magazine/HotMag path**: Allocations from magazine layers might not come from SuperSlab
- **Registry capacity**: Registry might be full (unlikely with SUPER_REG_SIZE=262144)

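The hypothesis above can be illustrated with a toy registry: a block is only recognized at free time if the region it lives in was registered at allocation time. This is a sketch under loud assumptions, not HAKMEM's implementation. The 2 MiB alignment, the table size, the hash, and all `toy_*` names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: sizes and hashing are assumptions, not HAKMEM's values. */
#define TOY_SS_SIZE  ((uintptr_t)1 << 21)   /* assume 2 MiB aligned regions */
#define TOY_REG_SIZE 1024u                  /* power of two for cheap masking */

static uintptr_t toy_registry[TOY_REG_SIZE]; /* 0 means "empty slot" */

static uintptr_t toy_base_of(const void *p)
{
    return (uintptr_t)p & ~(TOY_SS_SIZE - 1);
}

static size_t toy_hash(uintptr_t base)
{
    return (size_t)((base >> 21) & (TOY_REG_SIZE - 1));
}

/* Record a region base with linear probing; returns 0 if the table is full. */
static int toy_register(const void *region_base)
{
    uintptr_t base = toy_base_of(region_base);
    assert(base != 0);                       /* 0 is reserved for empty slots */
    for (size_t i = 0; i < TOY_REG_SIZE; i++) {
        size_t slot = (toy_hash(base) + i) & (TOY_REG_SIZE - 1);
        if (toy_registry[slot] == 0 || toy_registry[slot] == base) {
            toy_registry[slot] = base;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if ptr lies in a registered region, 0 otherwise. */
static int toy_lookup(const void *ptr)
{
    uintptr_t base = toy_base_of(ptr);
    if (base == 0) return 0;
    for (size_t i = 0; i < TOY_REG_SIZE; i++) {
        size_t slot = (toy_hash(base) + i) & (TOY_REG_SIZE - 1);
        if (toy_registry[slot] == base) return 1;
        if (toy_registry[slot] == 0)    return 0;   /* probe chain ended */
    }
    return 0;
}
```

In this model, a pointer handed out by any path that never calls `toy_register` for its region makes `toy_lookup` return 0 at free time, which is the same shape of failure the hypothesis describes.
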
### 4. Direct-Link vs LD_PRELOAD Behavior

**LD_PRELOAD** (`hak_core_init.inc.h:147-164`):
```c
if (ldpre && strstr(ldpre, "libhakmem.so")) {
    g_ldpreload_mode = 1;
    g_invalid_free_mode = 0;  // ← Fallback to libc
}
```
- Defaults to `g_invalid_free_mode=0` (fallback mode)
- Invalid frees → `__libc_free(ptr)` → **masks the bug** (may work if ptr was originally from libc)

**Direct-Link**:
```c
else {
    g_invalid_free_mode = 1;  // ← Skip invalid frees
}
```
- Defaults to `g_invalid_free_mode=1` (skip mode)
- Invalid frees → `goto done` → **silent memory leak**
- Accumulated leaks → memory exhaustion → SEGV

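The two excerpts above amount to a single decision on the `LD_PRELOAD` string. The sketch below restates it as a pure function so the outcome for a given value can be checked in isolation. The function and enum names are hypothetical, and it assumes the substring test is exactly the `strstr` call shown in the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { INVALID_FREE_FALLBACK = 0, INVALID_FREE_SKIP = 1 };

/* Mirrors the excerpted check: fall back to libc only when the
 * LD_PRELOAD value contains the substring "libhakmem.so". */
static int pick_invalid_free_mode(const char *ld_preload)
{
    if (ld_preload && strstr(ld_preload, "libhakmem.so"))
        return INVALID_FREE_FALLBACK;
    return INVALID_FREE_SKIP;
}

/* Human-readable name for log lines. */
static const char *invalid_free_mode_name(int mode)
{
    assert(mode == INVALID_FREE_FALLBACK || mode == INVALID_FREE_SKIP);
    return mode == INVALID_FREE_FALLBACK ? "fallback to libc" : "skip (leak)";
}
```

One consequence of a plain substring test is that a differently named build such as `libhakmem_asan.so` does not contain `libhakmem.so` and would select skip mode in this sketch. Whether the real initialization code handles such names elsewhere is not shown in this report.
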
---

## GDB Analysis

### Backtrace
```
Thread 1 "bench_random_mi" received signal SIGSEGV, Segmentation fault.
0x000055555555eb40 in free ()

#0 0x000055555555eb40 in free ()
#1 0xffffffffffffffff in ?? ()
...
#8 0x00005555555587e1 in main ()

Registers:
rax 0x555556c9d040 (some address)
rbp 0x7ffff6e00000 (pointer being freed - page-aligned!)
rdi 0x0 (NULL!)
rip 0x55555555eb40 <free+2176>
```

### Disassembly at Crash Point (free+2176)
```asm
0xab40 <+2176>: mov -0x28(%rbp),%ecx    # Load header magic
0xab43 <+2179>: cmp $0x48414B4D,%ecx    # Compare with HAKMEM_MAGIC
0xab49 <+2185>: je 0xabd0 <free+2320>   # Jump if magic matches
```

**Key observation**:
- `rbp = 0x7ffff6e00000` (page-aligned, likely start of mmap region)
- The faulting load reads from `rbp - 0x28 = 0x7ffff6dfffd8`
- Because `rbp` sits exactly on a page boundary, that address lies on the preceding page; if that page is unmapped, the read causes SEGV

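The address arithmetic in the key observation can be checked mechanically. The sketch below tests whether a load at `ptr - offset` falls on an earlier page than `ptr` itself. The function name is hypothetical, `0x28` is the displacement taken from the faulting `mov`, and the 4096-byte page size is an assumption about the test machine.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if (ptr - offset) lies on a different (earlier) page than ptr. */
static int load_hits_previous_page(uintptr_t ptr, uintptr_t offset,
                                   uintptr_t page_size)
{
    /* page_size must be a nonzero power of two for the mask to be valid */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(ptr >= offset);
    uintptr_t target = ptr - offset;
    return (target & ~(page_size - 1)) != (ptr & ~(page_size - 1));
}
```

For the register values above, `load_hits_previous_page(0x7ffff6e00000, 0x28, 4096)` is true, while the same call for a pointer 64 bytes into the page is false. This is consistent with the crash scaling with working set size: more live blocks means more chances that a misidentified block starts a mapping.
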
---

## Proposed Fix

### Option A: Safe Header Read (Recommended)
Validate the header before dispatching on it, and handle an invalid magic explicitly instead of silently skipping the free:

```c
// hak_free_api.inc.h, line 78-88 (header dispatch)

// BEFORE: Unsafe header read
void* raw = (char*)ptr - HEADER_SIZE;
AllocHeader* hdr = (AllocHeader*)raw;
if (hdr->magic != HAKMEM_MAGIC) { ... }

// AFTER: Safe fallback for tiny allocations
// If SuperSlab lookup failed for a tiny-sized allocation,
// assume it's an invalid free or was already freed
{
    // Check if this could be a tiny allocation (size ≤ 1KB)
    // Heuristic: If SuperSlab/Mid/L25 lookup all failed, and we're here,
    // either it's a header-carrying allocation (e.g. libc-backed),
    // or a tiny allocation that is missing from the registry

    // Try to read header magic
    // NOTE: this load is still unguarded. If `raw` falls on an unmapped
    // page (page-aligned ptr), it faults before the magic can be checked.
    void* raw = (char*)ptr - HEADER_SIZE;
    AllocHeader* hdr = (AllocHeader*)raw;

    // If magic is valid, proceed with header dispatch
    if (hdr->magic == HAKMEM_MAGIC) {
        // Header exists, dispatch normally
        if (HAK_ENABLED_CACHE(HAKMEM_FEATURE_BIGCACHE) && hdr->class_bytes >= 2097152) {
            if (hak_bigcache_put(ptr, hdr->size, hdr->alloc_site)) goto done;
        }
        switch (hdr->method) {
            case ALLOC_METHOD_MALLOC: __libc_free(raw); break;
            case ALLOC_METHOD_MMAP: /* ... */ break;
            // ...
        }
    } else {
        // Invalid magic - could be:
        // 1. Tiny allocation where SuperSlab lookup failed
        // 2. Already freed pointer
        // 3. Pointer from external library

        if (g_invalid_free_log) {
            fprintf(stderr, "[hakmem] WARNING: free() of pointer %p with invalid magic 0x%X (expected 0x%X)\n",
                    ptr, hdr->magic, HAKMEM_MAGIC);
            fprintf(stderr, "[hakmem] Possible causes: tiny allocation lookup failure, double-free, or external pointer\n");
        }

        // In direct-link mode, do NOT leak - try to return to tiny pool
        // as a best-effort recovery
        if (!g_ldpreload_mode) {
            // Attempt to route to tiny free (may succeed if it's a valid tiny allocation)
            hak_tiny_free(ptr);  // Will validate internally
        } else {
            // LD_PRELOAD mode: fallback to libc (may be mixed allocation)
            if (g_invalid_free_mode == 0) {
                __libc_free(ptr);  // Not raw! ptr itself
            }
        }
    }
}
goto done;
```

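The header read at `ptr - HEADER_SIZE` can itself fault when that address falls on an unmapped page, as the page-aligned `rbp` in the GDB output suggests. One possible guard, sketched here under assumptions, asks the kernel whether the range is mapped before dereferencing it. It relies on Linux `mincore()`, which fails with `ENOMEM` when the range contains unmapped pages. `range_is_mapped` is a hypothetical helper, not an existing HAKMEM function.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes mincore() and MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if every page overlapping [addr, addr + len) is mapped, else 0.
 * Linux-specific. "Mapped" does not imply readable: PROT_NONE pages pass. */
static int range_is_mapped(const void *addr, size_t len)
{
    assert(len > 0);
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);   /* mincore needs page alignment */
    uintptr_t end   = (uintptr_t)addr + len;           /* exclusive */
    size_t    span  = (size_t)(end - start);
    unsigned char vec[4];                              /* one byte per probed page */

    if (span > 4 * (size_t)page)
        return 0;                                      /* refuse oversized probes */
    return mincore((void *)start, span, vec) == 0;
}
```

A caller in the fallback path could test `range_is_mapped((char*)ptr - HEADER_SIZE, HEADER_SIZE)` and treat a negative answer as "no header here". The check costs a system call, so it only makes sense on the slow path after all registry lookups have failed, and it is racy if another thread unmaps the page between the check and the read.
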
### Option B: Fix SuperSlab Lookup Root Cause
Investigate why SuperSlab lookups are failing:

1. **Add comprehensive logging**:
   ```c
   // At allocation time
   fprintf(stderr, "[ALLOC_DEBUG] ptr=%p class=%d from_superslab=%d\n",
           ptr, class_idx, from_superslab);

   // At free time
   SuperSlab* ss = hak_super_lookup(ptr);
   // Casts keep the arguments matched to %p and %llx
   fprintf(stderr, "[FREE_DEBUG] ptr=%p lookup=%p magic=%llx\n",
           ptr, (void*)ss, ss ? (unsigned long long)ss->magic : 0ULL);
   ```

2. **Check TLS allocation paths**:
   - Verify all paths through `tiny_alloc_fast_pop()` come from SuperSlab
   - Check if magazine/HotMag allocations are properly registered
   - Verify TLS SLL allocations are from registered SuperSlabs

3. **Verify registry initialization**:
   ```c
   // At startup
   fprintf(stderr, "[INIT] g_super_reg_initialized=%d g_use_superslab=%d\n",
           g_super_reg_initialized, g_use_superslab);
   ```

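Per-call `fprintf` logging produces a line for every allocation and free, which can drown the benchmarks named above in output. A lighter complement, sketched below, counts lookup hits and misses and prints one summary line. All `dbg_*` names and the `[LOOKUP_DEBUG]` tag are hypothetical, and the sketch assumes only that a failed lookup is represented by a NULL pointer, as in the free-time logging snippet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong g_dbg_lookup_hit;    /* zero-initialized at program start */
static atomic_ulong g_dbg_lookup_miss;

/* Call right after the registry lookup in the free path. */
static void dbg_count_lookup(const void *lookup_result)
{
    if (lookup_result)
        atomic_fetch_add_explicit(&g_dbg_lookup_hit, 1, memory_order_relaxed);
    else
        atomic_fetch_add_explicit(&g_dbg_lookup_miss, 1, memory_order_relaxed);
}

/* Misses as a percentage of all counted lookups (0 if none were counted). */
static unsigned dbg_miss_percent(void)
{
    unsigned long hit   = atomic_load(&g_dbg_lookup_hit);
    unsigned long miss  = atomic_load(&g_dbg_lookup_miss);
    unsigned long total = hit + miss;
    assert(total >= hit);                /* guards against counter wraparound */
    return total ? (unsigned)(miss * 100 / total) : 0;
}

/* Print one summary line; suitable for registering with atexit(). */
static void dbg_report_lookups(void)
{
    fprintf(stderr, "[LOOKUP_DEBUG] hit=%lu miss=%lu miss_pct=%u\n",
            atomic_load(&g_dbg_lookup_hit), atomic_load(&g_dbg_lookup_miss),
            dbg_miss_percent());
}
```

Relaxed atomics are enough here because the counters are only statistics and do not order any other memory access. Note that a summary registered with `atexit()` is not printed if the process dies from SIGSEGV, so for the crashing runs the report would have to be triggered earlier, for example after a fixed number of frees.
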
### Option C: Force SuperSlab Path
Simplify the allocation path to always use SuperSlab:

```c
// Disable competing paths that might bypass SuperSlab
g_hotmag_enable = 0;    // Disable HotMag
g_tls_list_enable = 0;  // Disable TLS List
g_tls_sll_enable = 1;   // Enable TLS SLL (SuperSlab-backed)
```

---

## Immediate Workaround

For users hitting this bug:

```bash
# Workaround 1: Use LD_PRELOAD (masks the issue)
LD_PRELOAD=./libhakmem.so your_benchmark

# Workaround 2: Force SuperSlab (may still crash, but different symptoms)
HAKMEM_TINY_USE_SUPERSLAB=1 ./your_benchmark

# Workaround 3: Disable tiny allocator (fallback to libc)
HAKMEM_WRAP_TINY=0 ./your_benchmark
```

---

## Next Steps

1. **Implement Option A (Safe Header Read)** - Immediate fix to prevent SEGV
2. **Add logging to identify root cause** - Why are SuperSlab lookups failing?
3. **Fix underlying issue** - Ensure all tiny allocations are SuperSlab-backed
4. **Add regression tests** - Prevent future breakage

---

## Files to Modify

1. `/mnt/workdisk/public_share/hakmem/core/box/hak_free_api.inc.h` - Lines 78-120 (header dispatch logic)
2. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny.c` - Add allocation path logging
3. `/mnt/workdisk/public_share/hakmem/core/tiny_alloc_fast.inc.h` - Verify SuperSlab usage
4. `/mnt/workdisk/public_share/hakmem/core/hakmem_super_registry.c` - Add lookup diagnostics

---

## Related Issues

- **Phase 6-2.3**: Active counter bug fix (freed blocks not tracked)
- **Sanitizer Fix**: Similar TLS initialization ordering issues
- **LD_PRELOAD vs Direct-Link**: Behavioral differences in error handling

---

## Verification

After fix, verify:
```bash
# Should complete without errors
./bench_random_mixed_hakmem 50000 2048 123
./bench_mid_large_mt_hakmem 4 40000 2048 42

# Should see no "Invalid magic" errors
HAKMEM_INVALID_FREE_LOG=1 ./bench_random_mixed_hakmem 50000 2048 123
```