Pool TLS Phase 1.5a SEGV Deep Investigation
Executive Summary
ROOT CAUSE IDENTIFIED: Uninitialized TLS Variable Access
The SEGV occurs BEFORE the Pool TLS free dispatch code (lines 138-171 in hak_free_api.inc.h): the crash happens during a TLS variable access at line 108 of the free() wrapper.
Critical Finding
Evidence:
- Debug fprintf() added at lines 145-146 in hak_free_api.inc.h produces NO output before the SEGV
- GDB shows the crash at movzbl -0x1(%rbp),%edx with rdi = 0x0
- This means the crash happens in the free() wrapper BEFORE reaching Pool TLS dispatch
Exact Crash Location
File: /mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:108
void free(void* ptr) {
    atomic_fetch_add_explicit(&g_free_wrapper_calls, 1, memory_order_relaxed);
    if (!ptr) return;
    if (g_hakmem_lock_depth > 0) {  // ← CRASH HERE (line 108)
        extern void __libc_free(void*);
        __libc_free(ptr);
        return;
    }
    // ... never reached: hak_free_at(ptr, ...) at line 117
}
Analysis:
- g_hakmem_lock_depth is a __thread TLS variable
- When Pool TLS Phase 1 is enabled, TLS initialization ordering changes
- TLS variable access BEFORE initialization → unmapped memory → SEGV
Why Pool TLS Triggers the Bug
Normal build (Pool TLS disabled):
- TLS variables are auto-initialized to 0 on thread creation
- g_hakmem_lock_depth is accessible
- free() wrapper works
Pool TLS build (Phase 1.5a enabled):
- Additional TLS variables added: g_tls_pool_head[7], g_tls_pool_count[7] (pool_tls.c:12-13)
- TLS segment grows significantly
- Thread library may defer TLS initialization (see the TLS-model note below)
- First free() call → TLS not ready → SEGV on g_hakmem_lock_depth access
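Whether the loader reserves TLS eagerly or lazily depends on the TLS model the variables are compiled with. A minimal illustration of the initial-exec workaround (an assumption about the build, not a confirmed property of this tree; the attribute is standard GCC/Clang):
// Shared-library code defaults to the global-dynamic model: the first access goes
// through __tls_get_addr(), which may allocate the module's TLS block lazily.
// initial-exec instead reserves a slot in the static TLS block at thread creation,
// so the variable is mapped before first use.
__attribute__((tls_model("initial-exec")))
__thread void* g_tls_pool_head[POOL_SIZE_CLASSES];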
TLS Variables Inventory
Pool TLS adds (core/pool_tls.c:12-13):
__thread void* g_tls_pool_head[POOL_SIZE_CLASSES]; // 7 * 8 bytes = 56 bytes
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]; // 7 * 4 bytes = 28 bytes
Wrapper TLS variables (core/box/hak_wrappers.inc.h:32-38):
__thread uint64_t g_malloc_total_calls = 0;
__thread uint64_t g_malloc_tiny_size_match = 0;
__thread uint64_t g_malloc_fast_path_tried = 0;
__thread uint64_t g_malloc_fast_path_null = 0;
__thread uint64_t g_malloc_slow_path = 0;
extern __thread void* g_tls_sll_head[TINY_NUM_CLASSES]; // Defined elsewhere
Total TLS burden: 56 + 28 + 40 + (TINY_NUM_CLASSES * 8) = 124+ bytes before counting Tiny TLS cache
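These figures can be pinned down at compile time; a minimal sketch, assuming LP64 (8-byte pointers) and POOL_SIZE_CLASSES == 7 (the asserts are hypothetical, not in the tree):
#include <stdint.h>
_Static_assert(sizeof(void*[7]) == 56, "g_tls_pool_head");
_Static_assert(sizeof(uint32_t[7]) == 28, "g_tls_pool_count");
_Static_assert(sizeof(uint64_t[5]) == 40, "wrapper counters");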
Why Debug Prints Never Appear
Execution flow:
free(ptr)
↓
hak_wrappers.inc.h:105 // free() entry
↓
line 106: g_free_wrapper_calls++ // atomic, works
↓
line 107: if (!ptr) return; // NULL check, works
↓
line 108: if (g_hakmem_lock_depth > 0) // ← SEGV HERE (TLS unmapped)
↓
NEVER REACHES line 117: hak_free_at(ptr, ...)
↓
NEVER REACHES hak_free_api.inc.h:138 (Pool TLS dispatch)
↓
NEVER PRINTS debug output at lines 145-146
GDB Evidence Analysis
From user report:
(gdb) p $rbp
$1 = (void *) 0x7ffff7137017
(gdb) p $rdi
$2 = 0
Crash instruction: movzbl -0x1(%rbp),%edx
Interpretation:
- rdi = 0 suggests free() was called with NULL or a corrupted pointer
- rbp = 0x7ffff7137017 (unmapped address) → likely the TLS segment base before initialization
- movzbl -0x1(%rbp) is reading a TLS variable → unmapped memory → SEGV
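To tie the fault to the TLS segment directly, the faulting address can be compared against the thread's TLS base (a sketch; reading $fs_base requires a reasonably recent GDB and kernel):
(gdb) x/i $pc              # confirm the faulting instruction is the TLS load
(gdb) p/x $fs_base         # TLS base of the current thread on x86-64 Linux
(gdb) p/x $fs_base - $rbp  # a small delta places rbp inside the TLS area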
Root Cause Chain
- Pool TLS Phase 1.5a adds TLS variables (g_tls_pool_head, g_tls_pool_count)
- TLS segment size increases
- Thread library defers TLS allocation (optimization for large TLS segments)
- First free() call occurs BEFORE TLS initialization
- g_hakmem_lock_depth access at line 108 → unmapped memory
- SEGV before reaching Pool TLS dispatch code
Why Pool TLS Disabled Build Works
- Without Pool TLS: TLS segment is smaller
- Thread library initializes TLS immediately on thread creation
- g_hakmem_lock_depth is always accessible
- No SEGV
Missing Initialization
Pool TLS defines a thread-init function but NEVER calls it:
// core/pool_tls.c:104-107
void pool_thread_init(void) {
memset(g_tls_pool_head, 0, sizeof(g_tls_pool_head));
memset(g_tls_pool_count, 0, sizeof(g_tls_pool_count));
}
Search for calls:
grep -r "pool_thread_init" /mnt/workdisk/public_share/hakmem/core/
# Result: ONLY definition, NO calls!
No pthread_key_create + destructor for Pool TLS:
- Other subsystems use pthread_once for TLS initialization (e.g., hakmem_pool.c:81)
- Pool TLS has NO such initialization mechanism
Arena TLS Variables
Additional TLS burden (core/pool_tls_arena.c:7):
__thread PoolChunk g_tls_arena[POOL_SIZE_CLASSES];
Where PoolChunk is:
typedef struct {
void* chunk_base; // 8 bytes
size_t chunk_size; // 8 bytes
size_t offset; // 8 bytes
int growth_level; // 4 bytes (+ 4 padding)
} PoolChunk; // 32 bytes per class
Total Arena TLS: 32 * 7 = 224 bytes
Combined Pool TLS burden: 56 + 28 + 224 = 308 bytes (just for Pool TLS Phase 1.5a)
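The layout behind these numbers can also be asserted at compile time (a sketch, assuming the LP64 padding shown above):
_Static_assert(sizeof(PoolChunk) == 32, "PoolChunk: 3x8 + 4 + 4 padding");
_Static_assert(sizeof(PoolChunk[7]) == 224, "arena TLS total per thread");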
Why This Is a Heisenbug
Timing-dependent:
- If TLS happens to be initialized before first free() → works
- If free() called BEFORE TLS initialization → SEGV
- Larson benchmark allocates BEFORE freeing → high chance TLS is initialized by then
- Single-threaded tests with immediate free → high chance of SEGV
Load-dependent:
- More threads → more TLS segments → higher chance of deferred initialization
- Larger allocations → fewer free() calls → TLS more likely initialized
Recommended Fix
Option A: Explicit TLS Initialization (RECOMMENDED)
Add constructor with priority:
// core/pool_tls.c
#include <pthread.h>

__attribute__((constructor(101)))  // Priority 101 (before main, after libc; 0-100 are reserved)
static void pool_tls_global_init(void) {
    // Force TLS allocation for the main thread
    pool_thread_init();
}

// For pthread threads (not main)
static pthread_once_t g_pool_tls_key_once = PTHREAD_ONCE_INIT;
static pthread_key_t  g_pool_tls_key;

static void pool_tls_pthread_init(void) {
    // pool_thread_cleanup: per-thread destructor, provided alongside
    pthread_key_create(&g_pool_tls_key, pool_thread_cleanup);
}

// Call from pool_alloc/pool_free entry
static inline void ensure_pool_tls_init(void) {
    pthread_once(&g_pool_tls_key_once, pool_tls_pthread_init);
    // Force TLS initialization on first use in this thread
    static __thread int initialized = 0;
    if (!initialized) {
        pool_thread_init();
        pthread_setspecific(g_pool_tls_key, (void*)1);  // non-NULL so the destructor runs
        initialized = 1;
    }
}
Complexity: Medium (3-5 hours)
Risk: Low
Effectiveness: HIGH - guarantees TLS initialization before use
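Wiring the guard into the entry points could look like this (a sketch; the pool_alloc/pool_free signatures follow the Option B sketch below):
void* pool_alloc(size_t size) {
    ensure_pool_tls_init();  // cheap after the first call: one TLS flag test
    // ... rest of function
}
void pool_free(void* ptr) {
    ensure_pool_tls_init();
    // ... rest of function
}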
Option B: Lazy Initialization with Guard
Add guard variable:
// core/pool_tls.c
static __thread int g_pool_tls_ready = 0;
void* pool_alloc(size_t size) {
if (!g_pool_tls_ready) {
pool_thread_init();
g_pool_tls_ready = 1;
}
// ... rest of function
}
void pool_free(void* ptr) {
if (!g_pool_tls_ready) return; // Not our allocation
// ... rest of function
}
Complexity: Low (1-2 hours)
Risk: Medium (the guard is itself a __thread variable, so its first read has the same unmapped-TLS hazard)
Effectiveness: MEDIUM
Option C: Reduce TLS Burden (ALTERNATIVE)
Move TLS variables to heap-allocated per-thread struct:
// core/pool_tls.h
#include <string.h>
#include <sys/mman.h>

typedef struct {
    void*     head[POOL_SIZE_CLASSES];
    uint32_t  count[POOL_SIZE_CLASSES];
    PoolChunk arena[POOL_SIZE_CLASSES];
} PoolTLS;

// Single TLS pointer instead of 3 arrays
static __thread PoolTLS* g_pool_tls = NULL;

static inline PoolTLS* get_pool_tls(void) {
    if (!g_pool_tls) {
        void* p = mmap(NULL, sizeof(PoolTLS), PROT_READ|PROT_WRITE,
                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return NULL;  // caller must handle allocation failure
        memset(p, 0, sizeof(PoolTLS));     // MAP_ANONYMOUS is zero-filled, but be explicit
        g_pool_tls = (PoolTLS*)p;
    }
    return g_pool_tls;
}
Pros:
- TLS burden: 308 bytes → 8 bytes (single pointer)
- Thread library won't defer initialization
- Works with existing wrappers
Cons:
- Extra indirection (~1 cycle penalty)
- Need pthread_key_create for cleanup (sketched below)
Complexity: Medium (4-6 hours)
Risk: Low
Effectiveness: HIGH
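The cleanup the Cons list refers to could look like this (a sketch, assuming the mmap-backed PoolTLS above; pool_tls_destroy and pool_tls_make_key are hypothetical names):
#include <pthread.h>
#include <sys/mman.h>

static pthread_once_t g_pool_tls_once = PTHREAD_ONCE_INIT;
static pthread_key_t  g_pool_tls_cleanup_key;

static void pool_tls_destroy(void* p) {  // runs at thread exit
    if (p) munmap(p, sizeof(PoolTLS));
}

static void pool_tls_make_key(void) {
    pthread_key_create(&g_pool_tls_cleanup_key, pool_tls_destroy);
}

// In get_pool_tls(), after a successful mmap:
//   pthread_once(&g_pool_tls_once, pool_tls_make_key);
//   pthread_setspecific(g_pool_tls_cleanup_key, g_pool_tls);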
Verification Plan
After fix, test:
- Single-threaded immediate free:
./bench_random_mixed_hakmem 1000 8192 1234567
- Multi-threaded stress:
./bench_mid_large_mt_hakmem 4 10000
- Larson (currently works, ensure no regression):
./larson_hakmem 10 8 128 1024 1 12345 4
- Valgrind (helgrind) thread check:
valgrind --tool=helgrind ./bench_random_mixed_hakmem 1000 8192 1234567
Priority: CRITICAL
Why:
- Blocks Pool TLS Phase 1.5a completely
- 100% reproducible in bench_random_mixed
- Root cause is architectural (TLS initialization ordering)
- Fix is required before any Pool TLS testing can proceed
Estimated Fix Time
- Option A (Recommended): 3-5 hours
- Option B (Quick Fix): 1-2 hours (but risky)
- Option C (Robust): 4-6 hours
Recommended: Option A (explicit pthread_once initialization)
Next Steps
- Implement Option A (pthread_once + constructor)
- Test with all benchmarks
- Add TLS initialization trace (env: HAKMEM_POOL_TLS_INIT_TRACE=1)
- Document TLS initialization order in code comments
- Add a unit test for Pool TLS initialization (sketch below)
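The unit test in the last step could start from something like this (a sketch; it makes a thread's very first allocator call a free(), the highest-risk ordering per the Heisenbug analysis):
#include <pthread.h>
#include <stdlib.h>

static void* first_call_is_free(void* arg) {
    free(arg);  // first heap call on this thread exercises the wrapper's TLS access
    return NULL;
}

int main(void) {
    void* p = malloc(64);  // allocate on the main thread
    pthread_t t;
    pthread_create(&t, NULL, first_call_is_free, p);
    pthread_join(t, NULL);
    return 0;
}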
Investigation completed: 2025-11-09
Investigator: Claude Task Agent (Ultrathink mode)
Severity: CRITICAL - architecture bug, not implementation bug
Confidence: 95% (based on the TLS access pattern and GDB evidence)