# Ultra-Deep Analysis: Remaining Bugs in Remote Drain System

**Date**: 2025-11-04
**Status**: 🔴 **CRITICAL RACE CONDITION IDENTIFIED**
**Scope**: Multi-threaded freelist corruption via concurrent `ss_remote_drain_to_freelist()` calls

---

## Executive Summary

**Root Cause Found**: **Concurrent draining of the same slab from multiple threads WITHOUT ownership synchronization**

The crash at `fault_addr=0x6261` is caused by freelist chain corruption when multiple threads simultaneously call `ss_remote_drain_to_freelist()` on the same slab without exclusive ownership. The pointer truncation (0x6261) is a symptom of concurrent modification of the freelist links.

**Impact**:
- Fix #1, Fix #2, and multiple paths in `tiny_refill.h` all drain without ownership
- ANY two threads operating on the same slab can race and corrupt the freelist
- Explains why crashes still occur after 4012 events (the race is timing-dependent)

---
## 1. The Freelist Corruption Mechanism

### 1.1 How `ss_remote_drain_to_freelist()` Works

```c
// hakmem_tiny_superslab.h:345-365
static inline void ss_remote_drain_to_freelist(SuperSlab* ss, int slab_idx) {
    _Atomic(uintptr_t)* head = &ss->remote_heads[slab_idx];
    uintptr_t p = atomic_exchange_explicit(head, (uintptr_t)NULL, memory_order_acq_rel);
    if (p == 0) return;
    TinySlabMeta* meta = &ss->slabs[slab_idx];
    uint32_t drained = 0;
    while (p != 0) {
        void* node = (void*)p;
        uintptr_t next = (uintptr_t)(*(void**)node); // ← Read next pointer
        *(void**)node = meta->freelist;              // ← CRITICAL: Write freelist pointer
        meta->freelist = node;                       // ← CRITICAL: Update freelist head
        p = next;
        drained++;
    }
    // Reset remote count after full drain
    atomic_store_explicit(&ss->remote_counts[slab_idx], 0u, memory_order_relaxed);
}
```

**KEY OBSERVATION**: The while loop modifies `meta->freelist` **WITHOUT any atomic protection**.
### 1.2 Race Condition Scenario

**Setup**:
- Slab 4 of SuperSlab X has `remote_heads[4] != 0` (pending remote frees)
- Thread A (T1) and Thread B (T2) both want to drain slab 4
- Neither thread owns slab 4

**Timeline**:

| Time | Thread A (Fix #2 path) | Thread B (Sticky refill path) | Result |
|------|------------------------|-------------------------------|--------|
| T0 | Enters `hak_tiny_alloc_superslab()` | Enters `tiny_refill_try_fast()` sticky ring | |
| T1 | Loops through all slabs, reaches i=4 | Finds slab 4 in sticky ring | |
| T2 | Sees `remote_heads[4] != 0` | Sees `has_remote != 0` | |
| T3 | Calls `ss_remote_drain_to_freelist(ss, 4)` | Calls `ss_remote_drain_to_freelist(ss, 4)` | **RACE!** |
| T4 | `atomic_exchange(&remote_heads[4], NULL)` → gets list A | `atomic_exchange(&remote_heads[4], NULL)` → gets NULL | T2 returns early (p==0) |
| T5 | Enters while loop, modifies `meta->freelist` | - | Safe (only T1 draining) |

**BUT**, if T2 enters the drain **BEFORE** T1 completes the atomic_exchange:

| Time | Thread A | Thread B | Result |
|------|----------|----------|--------|
| T3 | Calls `ss_remote_drain_to_freelist(ss, 4)` | Calls `ss_remote_drain_to_freelist(ss, 4)` | **RACE!** |
| T4 | `p = atomic_exchange(&remote_heads[4], NULL)` → gets list A | `p = atomic_exchange(&remote_heads[4], NULL)` → gets NULL | T2 safe exit |
| T5 | `while (p != 0)` - starts draining | - | Only T1 draining |

**HOWEVER**, the REAL race is **NOT** in the atomic_exchange (which is atomic), but in the **while loop**:

**Actual Race** (Fix #1 vs Fix #3):

| Time | Thread A (Fix #1: `superslab_refill`) | Thread B (Fix #3: Mailbox path) | Result |
|------|----------------------------------------|----------------------------------|--------|
| T0 | Enters `superslab_refill()` for class 4 | Enters `tiny_refill_try_fast()` Mailbox path | |
| T1 | Reaches Priority 1 loop (line 614-621) | Fetches slab entry from mailbox | |
| T2 | Iterates i=0..tls_cap-1, reaches i=5 | Validates slab 5 | |
| T3 | Sees `remote_heads[5] != 0` | Calls `tiny_tls_bind_slab(tls, mss, 5)` | |
| T4 | Calls `ss_remote_drain_to_freelist(ss, 5)` | Calls `ss_owner_cas(m, self)` - claims ownership | |
| T5 | `p = atomic_exchange(&remote_heads[5], NULL)` → gets list A | Sees `remote_heads[5] != 0` (race!) | **BOTH see remote!=0** |
| T6 | Enters while loop: `next = *(void**)node` | Calls `ss_remote_drain_to_freelist(mss, 5)` | |
| T7 | `*(void**)node = meta->freelist` | `p = atomic_exchange(&remote_heads[5], NULL)` → gets NULL | T2 returns (p==0) |
| T8 | `meta->freelist = node` | - | Only T1 draining now |

**Wait, this scenario is also safe!** The atomic_exchange ensures only ONE thread gets the remote list.
### 1.3 The REAL Race: Concurrent Modification of `meta->freelist`

The actual problem is **NOT** in the atomic_exchange, but in a violated invariant: only the owner thread may modify `meta->freelist` directly, yet several paths drain slabs they do not own.

**The Bug**: Fix #1 and Fix #2 drain slabs that might be **owned by another thread**.

**Scenario**:

| Time | Thread A (Owner of slab 5) | Thread B (Fix #2: drains ALL slabs) | Result |
|------|----------------------------|--------------------------------------|--------|
| T0 | Owns slab 5, allocating from freelist | Enters `hak_tiny_alloc_superslab()` for class X | |
| T1 | Reads `ptr = meta->freelist` | Loops through ALL slabs, reaches i=5 | |
| T2 | Reads `meta->freelist = *(void**)ptr` (pop) | Sees `remote_heads[5] != 0` | |
| T3 | - | Calls `ss_remote_drain_to_freelist(ss, 5)` | **NO ownership check!** |
| T4 | - | `p = atomic_exchange(&remote_heads[5], NULL)` → gets list | |
| T5 | **Writes**: `meta->freelist = next_ptr` | **Reads**: `old_head = meta->freelist` | **RACE on meta->freelist!** |
| T6 | - | **Writes**: `*(void**)node = old_head` | |
| T7 | - | **Writes**: `meta->freelist = node` | **Freelist corruption!** |

**Result**:
- Thread A's write to `meta->freelist` at T5 is **overwritten** by Thread B at T7
- Thread A's popped pointer is **lost** from the freelist
- Or worse: a torn or partial write, leading to a truncated pointer (0x6261)
---
## 2. All Unsafe Call Sites

### 2.1 Category: UNSAFE (No Ownership Check Before Drain)

| File | Line | Context | Path | Risk |
|------|------|---------|------|------|
| `hakmem_tiny_free.inc` | 620 | **Fix #1** `superslab_refill()` Priority 1 | Alloc slow path | 🔴 **HIGH** |
| `hakmem_tiny_free.inc` | 756 | **Fix #2** `hak_tiny_alloc_superslab()` | Alloc fast path | 🔴 **HIGH** |
| `tiny_refill.h` | 47 | Sticky ring refill | Alloc refill path | 🟡 **MEDIUM** |
| `tiny_refill.h` | 65 | Hot slot refill | Alloc refill path | 🟡 **MEDIUM** |
| `tiny_refill.h` | 80 | Bench refill | Alloc refill path | 🟡 **MEDIUM** |
| `tiny_mmap_gate.h` | 57 | mmap gate sweep | Alloc refill path | 🟡 **MEDIUM** |
| `hakmem_tiny_superslab.h` | 376 | `ss_remote_drain_light()` | Background drain | 🟠 **LOW** (unused?) |
| `hakmem_tiny.c` | 652 | Old drain path | Legacy code | 🟠 **LOW** (unused?) |

### 2.2 Category: SAFE (Ownership Claimed BEFORE Drain)

| File | Line | Context | Protection |
|------|------|---------|-----------|
| `tiny_refill.h` | 100-105 | **Fix #3** Mailbox path | ✅ `tiny_tls_bind_slab()` + `ss_owner_cas()` BEFORE drain |

### 2.3 Category: PROBABLY SAFE (Special Cases)

| File | Line | Context | Why Safe? |
|------|------|---------|-----------|
| `hakmem_tiny_free.inc` | 592 | `superslab_refill()` adopt path | Just adopted, concurrent access unlikely |

---
## 3. Why Fix #3 is Correct (and Others Are Not)

### 3.1 Fix #3: Mailbox Path (CORRECT)

```c
// tiny_refill.h:96-106
// BUGFIX: Claim ownership BEFORE draining remote queue (fixes FAST_CAP=0 SEGV)
tiny_tls_bind_slab(tls, mss, midx);  // Bind to TLS
ss_owner_cas(m, tiny_self_u32());    // ✅ CLAIM OWNERSHIP FIRST

// NOW safe to drain - we're the owner
if (atomic_load_explicit(&mss->remote_heads[midx], memory_order_acquire) != 0) {
    ss_remote_drain_to_freelist(mss, midx);  // ✅ Safe: we own the slab
}
```

**Why this works**:
- `ss_owner_cas()` sets `m->owner_tid = self` (lines 385-386 of hakmem_tiny_superslab.h)
- Only the owner thread may modify `meta->freelist` directly
- Other threads must use `ss_remote_push()` to add to the remote queue
- By claiming ownership BEFORE draining, we ensure exclusive access to `meta->freelist`
### 3.2 Fix #1 and Fix #2 (INCORRECT)

```c
// hakmem_tiny_free.inc:614-621 (Fix #1)
for (int i = 0; i < tls_cap; i++) {
    int has_remote = (atomic_load_explicit(&tls->ss->remote_heads[i], memory_order_acquire) != 0);
    if (has_remote) {
        ss_remote_drain_to_freelist(tls->ss, i);  // ❌ NO OWNERSHIP CHECK!
    }
}
```

```c
// hakmem_tiny_free.inc:749-757 (Fix #2)
for (int i = 0; i < tls_cap; i++) {
    uintptr_t remote_val = atomic_load_explicit(&tls->ss->remote_heads[i], memory_order_acquire);
    if (remote_val != 0) {
        ss_remote_drain_to_freelist(tls->ss, i);  // ❌ NO OWNERSHIP CHECK!
    }
}
```

**Why this is broken**:
- Drains ALL slabs in the SuperSlab (i=0..tls_cap-1)
- Does NOT check `m->owner_tid` before draining
- Can drain slabs owned by OTHER threads
- Concurrent modification of `meta->freelist` → corruption
### 3.3 Other Unsafe Paths

**Sticky Ring** (tiny_refill.h:47):
```c
if (!lm->freelist && has_remote) ss_remote_drain_to_freelist(last_ss, li);  // ❌ Drain BEFORE ownership
if (lm->freelist) {
    tiny_tls_bind_slab(tls, last_ss, li);
    ss_owner_cas(lm, tiny_self_u32());  // ← Ownership AFTER drain
    return last_ss;
}
```

**Hot Slot** (tiny_refill.h:65):
```c
if (!m->freelist && atomic_load_explicit(&hss->remote_heads[hidx], memory_order_acquire) != 0)
    ss_remote_drain_to_freelist(hss, hidx);  // ❌ Drain BEFORE ownership
if (m->freelist) {
    tiny_tls_bind_slab(tls, hss, hidx);
    ss_owner_cas(m, tiny_self_u32());  // ← Ownership AFTER drain
}
```

**Same pattern**: Drain first, claim ownership later → race window!

---
## 4. Explaining the `fault_addr=0x6261` Pattern

### 4.1 Observed Pattern

```
rip=0x00005e3b94a28ece
fault_addr=0x0000000000006261
```

Previous analysis found pointers like `0x7a1ad5a06261` → truncated to `0x6261` (lower 16 bits).

### 4.2 Probable Cause: Partial Write During Race

**Scenario**:
1. Thread A: reads `ptr = meta->freelist` → `0x7a1ad5a06261`
2. Thread B: concurrently drains and modifies `meta->freelist`
3. Thread A: tries to dereference `ptr`, but the pointer was partially overwritten
4. Result: segmentation fault at `0x6261` (incomplete pointer)
**OR**:
- CPU store buffer reordering
- Non-atomic 64-bit write on some architectures
- Cache coherency issue

**Bottom line**: Concurrent writes to `meta->freelist` without synchronization → undefined behavior.
---
## 5. Recommended Fixes

### 5.1 Option A: Remove Fix #1 and Fix #2 (SAFEST)

**Rationale**:
- Fix #3 (Mailbox) already drains safely with ownership
- Fix #1 and Fix #2 are redundant AND unsafe
- The sticky/hot/bench paths need fixing separately

**Changes**:

1. **Delete Fix #1** (hakmem_tiny_free.inc:615-621):
```c
// REMOVE THIS LOOP:
for (int i = 0; i < tls_cap; i++) {
    int has_remote = (atomic_load_explicit(&tls->ss->remote_heads[i], memory_order_acquire) != 0);
    if (has_remote) {
        ss_remote_drain_to_freelist(tls->ss, i);
    }
}
```

2. **Delete Fix #2** (hakmem_tiny_free.inc:729-767):
```c
// REMOVE THIS ENTIRE BLOCK (lines 729-767)
```

3. **Keep Fix #3** (tiny_refill.h:96-106) - it's correct!

**Expected Impact**:
- Eliminates the main source of concurrent drain races
- May still crash if sticky/hot/bench paths race with each other
- But crash frequency should drop dramatically
### 5.2 Option B: Add Ownership Check to Fix #1 and Fix #2

**Changes**:
```c
// Fix #1: hakmem_tiny_free.inc:615-621
for (int i = 0; i < tls_cap; i++) {
    TinySlabMeta* m = &tls->ss->slabs[i];

    // ONLY drain if we own this slab
    if (m->owner_tid == tiny_self_u32()) {
        int has_remote = (atomic_load_explicit(&tls->ss->remote_heads[i], memory_order_acquire) != 0);
        if (has_remote) {
            ss_remote_drain_to_freelist(tls->ss, i);
        }
    }
}
```

**Problem**:
- Still racy! `owner_tid` can change between the check and the drain
- Needs proper locking or an ownership-transfer protocol
- More complex, error-prone
### 5.3 Option C: Fix Sticky/Hot/Bench Paths (CORRECT ORDER)

**Changes**:
```c
// Sticky ring (tiny_refill.h:46-51)
if (lm->freelist || has_remote) {
    // ✅ Claim ownership FIRST
    tiny_tls_bind_slab(tls, last_ss, li);
    ss_owner_cas(lm, tiny_self_u32());

    // NOW safe to drain
    if (!lm->freelist && has_remote) {
        ss_remote_drain_to_freelist(last_ss, li);
    }

    if (lm->freelist) {
        return last_ss;
    }
}
```

Apply the same pattern to the hot slot (line 65) and bench (line 80) paths.
### 5.4 RECOMMENDED: Combine Option A + Option C

1. **Remove Fix #1 and Fix #2** (eliminate the main race sources)
2. **Fix sticky/hot/bench paths** (claim ownership before drain)
3. **Keep Fix #3** (already correct)

**Verification**:
```bash
# After applying fixes, rebuild and test
make clean && make -s larson_hakmem
HAKMEM_TINY_SS_ADOPT=1 scripts/run_larson_claude.sh repro 30 10

# Expected: no crashes, or at least far fewer crashes
```

---
## 6. Next Steps

### 6.1 Immediate Actions

1. **Apply Option A**: Remove Fix #1 and Fix #2
   - Comment out lines 615-621 in hakmem_tiny_free.inc
   - Comment out lines 729-767 in hakmem_tiny_free.inc
   - Rebuild and test

2. **Evaluate Test Results**:
   - If crashes stop → Fix #1/#2 were the main culprits
   - If crashes continue → sticky/hot/bench paths need fixing (Option C)

3. **Apply Option C** (if needed):
   - Modify tiny_refill.h lines 46-51, 64-66, 78-81
   - Claim ownership BEFORE draining
   - Rebuild and test
### 6.2 Long-Term Improvements

1. **Add an Ownership Assertion**:
```c
static inline void ss_remote_drain_to_freelist(SuperSlab* ss, int slab_idx) {
#ifdef HAKMEM_DEBUG_OWNERSHIP
    TinySlabMeta* m = &ss->slabs[slab_idx];
    uint32_t owner = m->owner_tid;
    uint32_t self = tiny_self_u32();
    if (owner != 0 && owner != self) {
        fprintf(stderr, "[OWNERSHIP ERROR] Thread %u draining slab owned by %u!\n", self, owner);
        abort();
    }
#endif
    // ... rest of function
}
```

2. **Add Debug Counters**:
   - Count concurrent drain attempts
   - Track ownership violations
   - Dump statistics on crash

3. **Consider a Lock-Free Alternative**:
   - Use CAS-based freelist updates
   - Or: don't drain at all, just CAS-pop from the remote queue directly
   - Or: an ownership-transfer protocol (expensive)
---
## 7. Conclusion

**Root Cause**: Concurrent `ss_remote_drain_to_freelist()` calls without exclusive ownership.

**Main Culprits**: Fix #1 and Fix #2 drain all slabs without ownership checks.

**Secondary Issues**: The sticky/hot/bench paths drain before claiming ownership.

**Solution**: Remove Fix #1/#2, fix the sticky/hot/bench ordering, keep Fix #3.

**Confidence**: 🟢 **HIGH** - This explains all observed symptoms:
- Crashes at `fault_addr=0x6261` (freelist corruption)
- Timing-dependent failures (race condition)
- Improvement from Fix #3 (correct ownership protocol)
- Remaining crashes (Fix #1/#2 still racing)

---

**END OF ULTRA-DEEP ANALYSIS**