Fix critical TLS drain memory leak causing potential double-free

## Root Cause

The TLS drain dropped pointers whenever the SuperSlab lookup or the slab_idx validation failed:
- Pop pointer from TLS SLL
- Lookup/validation fails
- continue → LEAK! The pointer is never returned to any freelist

## Impact

Memory leak + potential double allocation:
1. Pointer P popped but leaked
2. Same address P reallocated from carve/other source
3. User frees P again → duplicate detection → ABORT

## Fix

**Before (BUGGY)**:
```c
if (!ss || invalid_slab_idx) {
    continue;  // ← LEAK!
}
```

**After (FIXED)**:
```c
if (!ss || invalid_slab_idx) {
    // Push back to TLS SLL head (retry later)
    tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
    g_tls_sll[class_idx].head = base;
    g_tls_sll[class_idx].count++;
    break;  // Stop draining to avoid infinite retry
}
```

## Files Changed

- core/box/tls_sll_drain_box.h: Fix 2 leak sites (SS lookup + slab_idx validation)
- docs/analysis/LARSON_DOUBLE_FREE_INVESTIGATION.md: Investigation report

## Related

- Larson double-free investigation (47% crash rate)
- Commit e4868bf23: Freelist header write + abort() on duplicate
- ChatGPT analysis: Larson benchmark code is correct (no user bug)

Author: Moe Charm (CI)
Date: 2025-11-27 06:49:38 +09:00
Parent: e4868bf236
Commit: c2f104618f
2 changed files with 161 additions and 6 deletions

@@ -146,23 +146,35 @@ static inline uint32_t tiny_tls_sll_drain(int class_idx, uint32_t batch_size) {
         // Resolve SuperSlab/Slab (like slow path does)
         SuperSlab* ss = hak_super_lookup(base);
         if (!ss || ss->magic != SUPERSLAB_MAGIC) {
-            // Invalid SuperSlab - skip this block
+            // CRITICAL FIX: Don't leak pointers when SuperSlab lookup fails!
+            // Problem: Pointer was popped from TLS SLL but not returned anywhere → leak + potential double-alloc
+            // Solution: Push back to TLS SLL (will retry on next drain cycle)
             if (g_debug) {
-                fprintf(stderr, "[TLS_SLL_DRAIN] SKIP: class=%d base=%p (invalid SuperSlab)\n",
+                fprintf(stderr, "[TLS_SLL_DRAIN] PUSHBACK: class=%d base=%p (invalid SuperSlab, will retry)\n",
                         class_idx, base);
             }
-            continue;
+            // Push back to TLS SLL head (retry later when SS becomes valid)
+            extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];
+            tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
+            g_tls_sll[class_idx].head = base;
+            g_tls_sll[class_idx].count++; // Restore count
+            break; // Stop draining this class for now (avoid infinite retry loop)
         }
         // Get slab index
         int slab_idx = slab_index_for(ss, base);
         if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss)) {
-            // Invalid slab index - skip this block
+            // CRITICAL FIX: Don't leak pointers when slab index is invalid!
             if (g_debug) {
-                fprintf(stderr, "[TLS_SLL_DRAIN] SKIP: class=%d base=%p (invalid slab_idx=%d)\n",
+                fprintf(stderr, "[TLS_SLL_DRAIN] PUSHBACK: class=%d base=%p (invalid slab_idx=%d, will retry)\n",
                         class_idx, base, slab_idx);
             }
-            continue;
+            // Push back to TLS SLL head (retry later)
+            extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];
+            tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
+            g_tls_sll[class_idx].head = base;
+            g_tls_sll[class_idx].count++;
+            break;
         }
         // Get slab metadata