## Root Cause
TLS drain was dropping pointers when SuperSlab lookup or slab_idx validation failed:
- Pop pointer from TLS SLL
- Lookup/validation fails
- continue → LEAK! Pointer never returned to any freelist
## Impact
Memory leak + potential double allocation:
1. Pointer P popped but leaked
2. Same address P reallocated from carve/other source
3. User frees P again → duplicate detection → ABORT
## Fix
**Before (BUGGY)**:
```c
if (!ss || invalid_slab_idx) {
continue; // ← LEAK!
}
```
**After (FIXED)**:
```c
if (!ss || invalid_slab_idx) {
// Push back to TLS SLL head (retry later)
tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
g_tls_sll[class_idx].head = base;
g_tls_sll[class_idx].count++;
break; // Stop draining to avoid infinite retry
}
```
## Files Changed
- core/box/tls_sll_drain_box.h: Fix 2 leak sites (SS lookup + slab_idx validation)
- docs/analysis/LARSON_DOUBLE_FREE_INVESTIGATION.md: Investigation report
## Related
- Larson double-free investigation (47% crash rate)
- Commit e4868bf23: Freelist header write + abort() on duplicate
- ChatGPT analysis: Larson benchmark code is correct (no user bug)
4.0 KiB
Larson Double-Free Investigation Report
Date: 2025-11-27
Summary
Larson benchmark crashes with TLS_SLL_PUSH_DUP error (double-free detection). Investigation reveals potential metadata inconsistency causing same pointer to be allocated twice without proper free.
Symptoms
[TLS_SLL_PUSH_DUP] cls=1 ptr=0x76e109240430
last_push_from=hak_tiny_free_fast_v2
last_pop_from=(null) ← Never popped from TLS SLL!
where=hak_tiny_free_fast_v2
Key Observation: Pointer was pushed to TLS SLL but never popped, yet being freed again.
Root Cause Analysis
Eliminated Hypotheses
- ❌ Larson Benchmark Bug: ChatGPT analyzed larson.cpp - no double-free logic found
- ❌ Cross-Thread Free: LARSON_FIX=1 doesn't prevent the crash
- ❌ Stale Header: Fixed freelist header write (commit
e4868bf23) but crash persists
Current Leading Hypothesis: Metadata Inconsistency
Scenario:
- User:
free(P)→ P pushed to TLS SLL (count++) - Without pop: P somehow reallocated from slab freelist or carve
- User:
p2 = malloc()→ Returns P (same address!) - User:
free(p2)→ Tries to push P to TLS SLL again - Duplicate detection: P already in TLS SLL → ABORT
This requires:
meta->usedcount mismatch- P in both TLS SLL AND slab freelist simultaneously
- Synchronization failure between TLS SLL and slab metadata
Evidence
TLS SLL Pop Logic (Suspicious)
File: core/box/tls_sll_box.h:570-572
if (g_tls_sll[class_idx].count > 0) {
g_tls_sll[class_idx].count--; // Conditional decrement!
}
If count somehow becomes 0, head is updated but count doesn't decrement!
TLS Drain Leak (Memory Leak Bug)
File: core/box/tls_sll_drain_box.h:148-154
SuperSlab* ss = hak_super_lookup(base);
if (!ss || ss->magic != SUPERSLAB_MAGIC) {
fprintf(stderr, "[TLS_SLL_DRAIN] SKIP: ...\n");
continue; // ← Pointer DROPPED without returning to freelist!
}
Critical: If SuperSlab lookup fails, pointer is popped but never returned → memory leak.
Fixes Implemented
1. Freelist Header Write (commit e4868bf23)
File: core/tiny_superslab_alloc.inc.h:159-169
Problem: Freelist allocation path didn't write headers
// OLD (buggy)
return block; // Returns BASE without header
// NEW (fixed)
void* user = tiny_region_id_write_header(block, meta->class_idx);
return user;
Impact: Prevents stale headers, but doesn't fix double-free.
2. Abort on Duplicate (commit e4868bf23)
File: core/box/tls_sll_box.h:381
Change: return true → abort() for diagnostic backtrace
Impact: Enables precise root cause identification.
Next Steps
Priority 1: Verify Metadata Consistency
Add assertions to check:
-
Pointer is in ONLY ONE location at a time:
- TLS SLL
- Slab freelist
- Not both!
-
meta->usedcount matches reality
Priority 2: Fix TLS Drain Leak
File: core/box/tls_sll_drain_box.h:154
Current (buggy):
if (!ss || ss->magic != SUPERSLAB_MAGIC) {
continue; // ← LEAK!
}
Proposed Fix:
if (!ss || ss->magic != SUPERSLAB_MAGIC) {
// Option A: Push back to TLS SLL (retry later)
tls_sll_push(class_idx, base, UINT32_MAX);
break; // Stop draining this class for now
// Option B: Force to remote queue (if lookup fails, assume remote)
// (requires more analysis)
}
Priority 3: Enhanced Tracing
Add debug logging to track pointer lifecycle:
- Malloc: "P allocated from source X"
- Free: "P freed to TLS SLL"
- Pop: "P popped from TLS SLL"
- Drain: "P drained to freelist"
- Re-alloc: "P reallocated from freelist"
ENV: HAKMEM_TINY_PTR_TRACE=<address> to track specific pointer.
Crash Rate
Before fixes: 47% (14/30 runs crash) After header fix: 100% (still crashes, just faster detection)
References
- Commit:
e4868bf23"Larson crash investigation: Add freelist header write + abort()" - Previous investigation: Task agent Phase 2 (identified TLS_SLL_PUSH_DUP pattern)
- Larson benchmark analysis: ChatGPT confirmed no user-code bug