# Pool TLS Phase 1.5a SEGV Deep Investigation ## Executive Summary **ROOT CAUSE IDENTIFIED: TLS Variable Uninitialized Access** The SEGV occurs **BEFORE** Pool TLS free dispatch code (line 138-171 in `hak_free_api.inc.h`) because the crash happens during **free() wrapper TLS variable access** at line 108. ## Critical Finding **Evidence:** - Debug fprintf() added at lines 145-146 in `hak_free_api.inc.h` - **NO debug output appears** before SEGV - GDB shows crash at `movzbl -0x1(%rbp),%edx` with `rdi = 0x0` - This means: The crash happens in the **free() wrapper BEFORE reaching Pool TLS dispatch** ## Exact Crash Location **File:** `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:108` ```c void free(void* ptr) { atomic_fetch_add_explicit(&g_free_wrapper_calls, 1, memory_order_relaxed); if (!ptr) return; if (g_hakmem_lock_depth > 0) { // ← CRASH HERE (line 108) extern void __libc_free(void*); __libc_free(ptr); return; } ``` **Analysis:** - `g_hakmem_lock_depth` is a **__thread TLS variable** - When Pool TLS Phase 1 is enabled, TLS initialization ordering changes - TLS variable access BEFORE initialization → unmapped memory → **SEGV** ## Why Pool TLS Triggers the Bug **Normal build (Pool TLS disabled):** 1. TLS variables auto-initialized to 0 on thread creation 2. `g_hakmem_lock_depth` accessible 3. free() wrapper works **Pool TLS build (Phase 1.5a enabled):** 1. Additional TLS variables added: `g_tls_pool_head[7]`, `g_tls_pool_count[7]` (pool_tls.c:12-13) 2. TLS segment grows significantly 3. Thread library may defer TLS initialization 4. **First free() call → TLS not ready → SEGV on `g_hakmem_lock_depth` access** ## TLS Variables Inventory **Pool TLS adds (core/pool_tls.c:12-13):** ```c __thread void* g_tls_pool_head[POOL_SIZE_CLASSES]; // 7 * 8 bytes = 56 bytes __thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]; // 7 * 4 bytes = 28 bytes ``` **Wrapper TLS variables (core/box/hak_wrappers.inc.h:32-38):** ```c __thread uint64_t g_malloc_total_calls = 0; __thread uint64_t g_malloc_tiny_size_match = 0; __thread uint64_t g_malloc_fast_path_tried = 0; __thread uint64_t g_malloc_fast_path_null = 0; __thread uint64_t g_malloc_slow_path = 0; extern __thread void* g_tls_sll_head[TINY_NUM_CLASSES]; // Defined elsewhere ``` **Total TLS burden:** 56 + 28 + 40 + (TINY_NUM_CLASSES * 8) = 124+ bytes **before** counting Tiny TLS cache ## Why Debug Prints Never Appear **Execution flow:** ``` free(ptr) ↓ hak_wrappers.inc.h:105 // free() entry ↓ line 106: g_free_wrapper_calls++ // atomic, works ↓ line 107: if (!ptr) return; // NULL check, works ↓ line 108: if (g_hakmem_lock_depth > 0) // ← SEGV HERE (TLS unmapped) ↓ NEVER REACHES line 117: hak_free_at(ptr, ...) ↓ NEVER REACHES hak_free_api.inc.h:138 (Pool TLS dispatch) ↓ NEVER PRINTS debug output at lines 145-146 ``` ## GDB Evidence Analysis **From user report:** ``` (gdb) p $rbp $1 = (void *) 0x7ffff7137017 (gdb) p $rdi $2 = 0 Crash instruction: movzbl -0x1(%rbp),%edx ``` **Interpretation:** - `rdi = 0` suggests free was called with NULL or corrupted pointer - `rbp = 0x7ffff7137017` (unmapped address) → likely **TLS segment base** before initialization - `movzbl -0x1(%rbp)` is trying to read TLS variable → unmapped memory → SEGV ## Root Cause Chain 1. **Pool TLS Phase 1.5a adds TLS variables** (g_tls_pool_head, g_tls_pool_count) 2. **TLS segment size increases** 3. **Thread library defers TLS allocation** (optimization for large TLS segments) 4. **First free() call occurs BEFORE TLS initialization** 5. **`g_hakmem_lock_depth` access at line 108 → unmapped memory** 6. **SEGV before reaching Pool TLS dispatch code** ## Why Pool TLS Disabled Build Works - Without Pool TLS: TLS segment is smaller - Thread library initializes TLS immediately on thread creation - `g_hakmem_lock_depth` is always accessible - No SEGV ## Missing Initialization **Pool TLS defines thread init function but NEVER calls it:** ```c // core/pool_tls.c:104-107 void pool_thread_init(void) { memset(g_tls_pool_head, 0, sizeof(g_tls_pool_head)); memset(g_tls_pool_count, 0, sizeof(g_tls_pool_count)); } ``` **Search for calls:** ```bash grep -r "pool_thread_init" /mnt/workdisk/public_share/hakmem/core/ # Result: ONLY definition, NO calls! ``` **No pthread_key_create + destructor for Pool TLS:** - Other subsystems use `pthread_once` for TLS initialization (e.g., hakmem_pool.c:81) - Pool TLS has NO such initialization mechanism ## Arena TLS Variables **Additional TLS burden (core/pool_tls_arena.c:7):** ```c __thread PoolChunk g_tls_arena[POOL_SIZE_CLASSES]; ``` Where `PoolChunk` is: ```c typedef struct { void* chunk_base; // 8 bytes size_t chunk_size; // 8 bytes size_t offset; // 8 bytes int growth_level; // 4 bytes (+ 4 padding) } PoolChunk; // 32 bytes per class ``` **Total Arena TLS:** 32 * 7 = 224 bytes **Combined Pool TLS burden:** 56 + 28 + 224 = **308 bytes** (just for Pool TLS Phase 1.5a) ## Why This Is a Heisenbug **Timing-dependent:** - If TLS happens to be initialized before first free() → works - If free() called BEFORE TLS initialization → SEGV - Larson benchmark allocates BEFORE freeing → high chance TLS is initialized by then - Single-threaded tests with immediate free → high chance of SEGV **Load-dependent:** - More threads → more TLS segments → higher chance of deferred initialization - Larger allocations → less free() calls → TLS more likely initialized ## Recommended Fix ### Option A: Explicit TLS Initialization (RECOMMENDED) **Add constructor with priority:** ```c // core/pool_tls.c __attribute__((constructor(101))) // Priority 101 (before main, after libc) static void pool_tls_global_init(void) { // Force TLS allocation for main thread pool_thread_init(); } // For pthread threads (not main) static pthread_once_t g_pool_tls_key_once = PTHREAD_ONCE_INIT; static pthread_key_t g_pool_tls_key; static void pool_tls_pthread_init(void) { pthread_key_create(&g_pool_tls_key, pool_thread_cleanup); } // Call from pool_alloc/pool_free entry static inline void ensure_pool_tls_init(void) { pthread_once(&g_pool_tls_key_once, pool_tls_pthread_init); // Force TLS initialization on first use static __thread int initialized = 0; if (!initialized) { pool_thread_init(); pthread_setspecific(g_pool_tls_key, (void*)1); // Mark initialized initialized = 1; } } ``` **Complexity:** Medium (3-5 hours) **Risk:** Low **Effectiveness:** HIGH - guarantees TLS initialization before use ### Option B: Lazy Initialization with Guard **Add guard variable:** ```c // core/pool_tls.c static __thread int g_pool_tls_ready = 0; void* pool_alloc(size_t size) { if (!g_pool_tls_ready) { pool_thread_init(); g_pool_tls_ready = 1; } // ... rest of function } void pool_free(void* ptr) { if (!g_pool_tls_ready) return; // Not our allocation // ... rest of function } ``` **Complexity:** Low (1-2 hours) **Risk:** Medium (guard access itself could SEGV) **Effectiveness:** MEDIUM ### Option C: Reduce TLS Burden (ALTERNATIVE) **Move TLS variables to heap-allocated per-thread struct:** ```c // core/pool_tls.h typedef struct { void* head[POOL_SIZE_CLASSES]; uint32_t count[POOL_SIZE_CLASSES]; PoolChunk arena[POOL_SIZE_CLASSES]; } PoolTLS; // Single TLS pointer instead of 3 arrays static __thread PoolTLS* g_pool_tls = NULL; static inline PoolTLS* get_pool_tls(void) { if (!g_pool_tls) { g_pool_tls = mmap(NULL, sizeof(PoolTLS), PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); memset(g_pool_tls, 0, sizeof(PoolTLS)); } return g_pool_tls; } ``` **Pros:** - TLS burden: 308 bytes → 8 bytes (single pointer) - Thread library won't defer initialization - Works with existing wrappers **Cons:** - Extra indirection (1 cycle penalty) - Need pthread_key_create for cleanup **Complexity:** Medium (4-6 hours) **Risk:** Low **Effectiveness:** HIGH ## Verification Plan **After fix, test:** 1. **Single-threaded immediate free:** ```bash ./bench_random_mixed_hakmem 1000 8192 1234567 ``` 2. **Multi-threaded stress:** ```bash ./bench_mid_large_mt_hakmem 4 10000 ``` 3. **Larson (currently works, ensure no regression):** ```bash ./larson_hakmem 10 8 128 1024 1 12345 4 ``` 4. **Valgrind TLS check:** ```bash valgrind --tool=helgrind ./bench_random_mixed_hakmem 1000 8192 1234567 ``` ## Priority: CRITICAL **Why:** - Blocks Pool TLS Phase 1.5a completely - 100% reproducible in bench_random_mixed - Root cause is architectural (TLS initialization ordering) - Fix is required before any Pool TLS testing can proceed ## Estimated Fix Time - **Option A (Recommended):** 3-5 hours - **Option B (Quick Fix):** 1-2 hours (but risky) - **Option C (Robust):** 4-6 hours **Recommended:** Option A (explicit pthread_once initialization) ## Next Steps 1. Implement Option A (pthread_once + constructor) 2. Test with all benchmarks 3. Add TLS initialization trace (env: HAKMEM_POOL_TLS_INIT_TRACE=1) 4. Document TLS initialization order in code comments 5. Add unit test for Pool TLS initialization --- **Investigation completed:** 2025-11-09 **Investigator:** Claude Task Agent (Ultrathink mode) **Severity:** CRITICAL - Architecture bug, not implementation bug **Confidence:** 95% (high confidence based on TLS access pattern and GDB evidence)