Larson double-free investigation: Enhanced diagnostics + Remove buggy drain pushback

**Problem**: The Larson benchmark crashes with `TLS_SLL_DUP` (double-free), with a 100% crash rate in debug builds.

**Root Cause**: The TLS drain pushback code (commit c2f104618) created duplicates by
pushing pointers back onto the TLS SLL while they were still linked into the list chain.

**Diagnostic Enhancements** (ChatGPT + Claude collaboration):
1. **Callsite Tracking**: Track `file:line` for each TLS SLL push (debug builds only)
   - Arrays: `g_tls_sll_push_file[]`, `g_tls_sll_push_line[]`
   - Macro: `tls_sll_push()` automatically records `__FILE__` and `__LINE__`

2. **Enhanced Duplicate Detection**:
   - Scan depth: 64 → 256 nodes (deeper duplicate detection)
   - Error message shows BOTH the current and the previous push locations
   - Calls `ptr_trace_dump_now()` for detailed analysis

3. **Evidence Captured**:
   - Both duplicate pushes came from the same line (221)
   - The pointer was at position 11 in the TLS SLL (count=18, scanned=11)
   - Confirms the pointer was allocated without first being popped from the TLS SLL

**Fix**:
- **core/box/tls_sll_drain_box.h**: Remove pushback code entirely
  - Old: Push back to TLS SLL on validation failure → duplicates!
  - New: Skip pointer (accept rare leak) to avoid duplicates
  - Rationale: SuperSlab lookup failures are transient/rare

**Status**: Fix implemented, ready for testing

**Updated**:
- LARSON_DOUBLE_FREE_INVESTIGATION.md: Root cause confirmed

- Moe Charm (CI), 2025-11-27 07:30:32 +09:00
- parent c2f104618f, commit 8553894171
- 3 changed files with 83 additions and 45 deletions


```diff
@@ -146,35 +146,30 @@ static inline uint32_t tiny_tls_sll_drain(int class_idx, uint32_t batch_size) {
         // Resolve SuperSlab/Slab (like slow path does)
         SuperSlab* ss = hak_super_lookup(base);
         if (!ss || ss->magic != SUPERSLAB_MAGIC) {
-            // CRITICAL FIX: Don't leak pointers when SuperSlab lookup fails!
-            // Problem: Pointer was popped from TLS SLL but not returned anywhere → leak + potential double-alloc
-            // Solution: Push back to TLS SLL (will retry on next drain cycle)
+            // CRITICAL FIX (2025-11-27): Don't push back - causes duplicates!
+            // Problem: Pushback bypasses duplicate checking and creates cycles
+            // Old buggy approach: Push back to TLS SLL → pointer at BOTH position 0 and position N
+            // New approach: Skip this pointer (accept rare leak) to avoid duplicates
+            // Leak is acceptable because SuperSlab lookup failure is transient/rare
             if (g_debug) {
-                fprintf(stderr, "[TLS_SLL_DRAIN] PUSHBACK: class=%d base=%p (invalid SuperSlab, will retry)\n",
+                fprintf(stderr, "[TLS_SLL_DRAIN] SKIP: class=%d base=%p (invalid SuperSlab, pointer leaked)\n",
                         class_idx, base);
             }
-            // Push back to TLS SLL head (retry later when SS becomes valid)
-            extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];
-            tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
-            g_tls_sll[class_idx].head = base;
-            g_tls_sll[class_idx].count++; // Restore count
-            break; // Stop draining this class for now (avoid infinite retry loop)
+            // DO NOT push back - would create duplicate!
+            // Just continue to next pointer
+            continue;
         }
         // Get slab index
         int slab_idx = slab_index_for(ss, base);
         if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss)) {
-            // CRITICAL FIX: Don't leak pointers when slab index is invalid!
+            // CRITICAL FIX (2025-11-27): Don't push back - causes duplicates!
             if (g_debug) {
-                fprintf(stderr, "[TLS_SLL_DRAIN] PUSHBACK: class=%d base=%p (invalid slab_idx=%d, will retry)\n",
+                fprintf(stderr, "[TLS_SLL_DRAIN] SKIP: class=%d base=%p (invalid slab_idx=%d, pointer leaked)\n",
                         class_idx, base, slab_idx);
             }
-            // Push back to TLS SLL head (retry later)
-            extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];
-            tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
-            g_tls_sll[class_idx].head = base;
-            g_tls_sll[class_idx].count++;
-            break;
+            // DO NOT push back - would create duplicate!
+            continue;
         }
         // Get slab metadata
```