# Larson Double-Free Investigation Report

## Date: 2025-11-27

## Summary

Larson benchmark crashes with TLS_SLL_PUSH_DUP error (double-free detection). Investigation reveals a potential metadata inconsistency causing the same pointer to be allocated twice without a proper free.

## Symptoms

```
[TLS_SLL_PUSH_DUP] cls=1 ptr=0x76e109240430
  last_push_from=hak_tiny_free_fast_v2
  last_pop_from=(null)  ← Never popped from TLS SLL!
  where=hak_tiny_free_fast_v2
```

**Key Observation**: The pointer was pushed to the TLS SLL but never popped, yet it is being freed again.

## Root Cause Analysis

### Eliminated Hypotheses

1. ❌ **Larson Benchmark Bug**: ChatGPT analyzed larson.cpp - no double-free logic found
2. ❌ **Cross-Thread Free**: LARSON_FIX=1 doesn't prevent the crash
3. ❌ **Stale Header**: Fixed freelist header write (commit e4868bf23) but crash persists

### Current Leading Hypothesis: Metadata Inconsistency

**Scenario**:

1. User: `free(P)` → P pushed to TLS SLL (count++)
2. **Without pop**: P somehow reallocated from slab freelist or carve
3. User: `p2 = malloc()` → Returns P (same address!)
4. User: `free(p2)` → Tries to push P to TLS SLL again
5. Duplicate detection: P already in TLS SLL → ABORT

**This requires**:

- `meta->used` count mismatch
- P in both TLS SLL AND slab freelist simultaneously
- Synchronization failure between TLS SLL and slab metadata

## Evidence

### TLS SLL Pop Logic (Suspicious)

File: `core/box/tls_sll_box.h:570-572`

```c
if (g_tls_sll[class_idx].count > 0) {
    g_tls_sll[class_idx].count--;  // Conditional decrement!
}
```

If `count` somehow becomes 0, `head` is still updated but `count` is not decremented, so the two drift apart.

### TLS Drain Leak (Memory Leak Bug)

File: `core/box/tls_sll_drain_box.h:148-154`

```c
SuperSlab* ss = hak_super_lookup(base);
if (!ss || ss->magic != SUPERSLAB_MAGIC) {
    fprintf(stderr, "[TLS_SLL_DRAIN] SKIP: ...\n");
    continue;  // ← Pointer DROPPED without returning to freelist!
}
```

**Critical**: If the SuperSlab lookup fails, the pointer is popped but never returned → memory leak.
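The drain leak above can be illustrated with a toy model. This is a minimal sketch, not the hakmem code: `ToyNode`, `ToyList`, `toy_push`, `toy_pop`, and `toy_drain_leaky` are hypothetical names, and the `lookup_ok` flag simply stands in for the result of the SuperSlab lookup and magic check. It shows that a node skipped with `continue` after being popped ends up in neither list.

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy stand-ins; none of these names come from the hakmem sources. */
typedef struct ToyNode {
    struct ToyNode* next;
    bool lookup_ok;  /* assumption: models whether the SuperSlab lookup succeeds */
} ToyNode;

typedef struct {
    ToyNode* head;
    size_t   count;
} ToyList;

/* Push a node at the head and bump the count. */
static void toy_push(ToyList* l, ToyNode* n) {
    n->next = l->head;
    l->head = n;
    l->count++;
}

/* Unlink and return the head node, or NULL if the list is empty. */
static ToyNode* toy_pop(ToyList* l) {
    ToyNode* n = l->head;
    if (n == NULL) return NULL;
    l->head = n->next;
    l->count--;
    n->next = NULL;
    return n;
}

/* Drain up to `budget` nodes from `tls` into `freelist`.
 * A node whose lookup fails is skipped with `continue`, mirroring the
 * excerpt, so it is tracked by neither list afterwards.
 * Returns the number of nodes lost that way. */
static size_t toy_drain_leaky(ToyList* tls, ToyList* freelist, size_t budget) {
    size_t lost = 0;
    for (size_t i = 0; i < budget; i++) {
        ToyNode* n = toy_pop(tls);
        if (n == NULL) break;
        if (!n->lookup_ok) {
            lost++;      /* popped, but never pushed anywhere */
            continue;
        }
        toy_push(freelist, n);
    }
    return lost;
}
```

With four nodes of which two fail the lookup, the drain returns 2, and the two lists together hold only the two remaining nodes: the invariant "every node is in exactly one list" no longer holds.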
## Fixes Implemented

### 1. Freelist Header Write (commit e4868bf23)

File: `core/tiny_superslab_alloc.inc.h:159-169`

**Problem**: Freelist allocation path didn't write headers

```c
// OLD (buggy)
return block;  // Returns BASE without header

// NEW (fixed)
void* user = tiny_region_id_write_header(block, meta->class_idx);
return user;
```

**Impact**: Prevents stale headers, but doesn't fix the double-free.

### 2. Abort on Duplicate (commit e4868bf23)

File: `core/box/tls_sll_box.h:381`

**Change**: `return true` → `abort()` for diagnostic backtrace

**Impact**: Enables precise root cause identification.

## Root Cause CONFIRMED (2025-11-27)

### TLS Drain Pushback Bug Creates Duplicates!

**File**: `core/box/tls_sll_drain_box.h:148-162`

**Buggy Fix (commit c2f104618)**:

```c
if (!ss || ss->magic != SUPERSLAB_MAGIC) {
    // CRITICAL BUG: Creates duplicates!
    tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
    g_tls_sll[class_idx].head = base;  // ← Pushes to position 0
    g_tls_sll[class_idx].count++;      // ← But pointer ALREADY at position 11!
    break;
}
```

**Scenario**:

1. TLS SLL has pointer at position 11 (count=18)
2. Drain loop pops pointer from TLS SLL (now count=17, but pointer still in chain at position 10)
3. SuperSlab lookup fails (transient state)
4. Pushback adds pointer to position 0 → **NOW AT TWO POSITIONS** (0 and 10)
5. Allocation pops from position 0
6. User frees → tries to push → **duplicate detected at position 10**

**Evidence**:

```
[TLS_SLL_DUP] cls=1 ptr=0x... count=18 scanned=11
```

Pointer found at position 11 during duplicate scan!

**Correct Fix**: DON'T push back when already in TLS SLL. Just **stop draining** when validation fails.

### Priority 3: Enhanced Tracing

Add debug logging to track pointer lifecycle:

1. Malloc: "P allocated from source X"
2. Free: "P freed to TLS SLL"
3. Pop: "P popped from TLS SLL"
4. Drain: "P drained to freelist"
5. Re-alloc: "P reallocated from freelist"

ENV: `HAKMEM_TINY_PTR_TRACE=` to track a specific pointer.

## Crash Rate

- **Before fixes**: 47% (14/30 runs crash)
- **After header fix**: 100% (still crashes, just faster detection)

## References

- Commit: e4868bf23 "Larson crash investigation: Add freelist header write + abort()"
- Previous investigation: Task agent Phase 2 (identified TLS_SLL_PUSH_DUP pattern)
- Larson benchmark analysis: ChatGPT confirmed no user-code bug
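
## Appendix: Sketch of the "Stop Draining" Fix

The "stop draining when validation fails" fix from the confirmed root cause section could look roughly like the sketch below. It is a toy model under stated assumptions, not the code in `tls_sll_drain_box.h`: `SketchNode`, `SketchList`, `sketch_push`, `sketch_drain_stop_on_invalid`, and `sketch_occurrences` are hypothetical names, the `valid` flag stands in for the SuperSlab lookup and magic check, and the sketch assumes the head can be inspected before it is unlinked, so no pushback is ever needed. The actual fix in the tree may be structured differently.

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy stand-ins; none of these names come from the hakmem sources. */
typedef struct SketchNode {
    struct SketchNode* next;
    bool valid;  /* assumption: models the SuperSlab lookup + magic check */
} SketchNode;

typedef struct {
    SketchNode* head;
    size_t      count;
} SketchList;

/* Push a node at the head and bump the count. */
static void sketch_push(SketchList* l, SketchNode* n) {
    n->next = l->head;
    l->head = n;
    l->count++;
}

/* Drain up to `budget` nodes from `tls` into `freelist`.
 * The head is validated BEFORE it is unlinked. On failure the loop stops
 * and the node stays where it was: nothing was popped, so there is
 * nothing to push back. Returns the number of nodes moved. */
static size_t sketch_drain_stop_on_invalid(SketchList* tls,
                                           SketchList* freelist,
                                           size_t budget) {
    size_t drained = 0;
    while (drained < budget && tls->head != NULL) {
        SketchNode* n = tls->head;
        if (!n->valid) break;   /* stop draining, list left untouched */
        tls->head = n->next;    /* unlink only after validation passed */
        tls->count--;
        sketch_push(freelist, n);
        drained++;
    }
    return drained;
}

/* Count how often `target` appears in the list; the walk is bounded by
 * `count` so a corrupted (cyclic) chain cannot loop forever. */
static size_t sketch_occurrences(const SketchList* l, const SketchNode* target) {
    size_t hits = 0;
    size_t steps = 0;
    for (const SketchNode* n = l->head; n != NULL && steps < l->count;
         n = n->next, steps++) {
        if (n == target) hits++;
    }
    return hits;
}
```

The design choice here is that a failed validation never changes the list, so each node stays in exactly one place and `head` and `count` remain in sync. The trade-off is that an invalid head blocks draining of everything behind it until the lookup succeeds again.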