## Changes

### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing `#if !HAKMEM_BUILD_RELEASE` blocks

### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs

### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in `#if !HAKMEM_BUILD_RELEASE`:
  - Node pool exhaustion warning (line 252)
  - SP_META_CAPACITY_ERROR warning (line 421)
  - SP_FIX_GEOMETRY debug logging (line 745)
  - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
  - SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
  - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
  - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
  - SP_ACQUIRE_STAGE3 debug logging (line 1116)
  - SP_SLOT_RELEASE debug logging (line 1245)
  - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
  - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized

## Performance Validation

Before: 51M ops/s (with debug fprintf overhead)
After: 49.1M ops/s (consistent performance, fprintf removed from hot paths)

## Build & Test

```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Task for Other AI: Fix 4T High-Contention Crash (Mixed Allocation Bug)

**Date**: 2025-11-08
**Priority**: CRITICAL
**Status**: BLOCKING production deployment

---

## Executive Summary

**Problem**: 4T high-contention crash with **70% failure rate** (6/20 success)

**Root Cause Identified**: Mixed HAKMEM/libc allocations causing `free(): invalid pointer`

**Your Mission**: Fix the mixed allocation bug to achieve **100% stability**

---

## Background

### Current Status

Phase 7 optimization achieved **excellent performance**:
- Single-threaded: **91.3% of System malloc** (target was 40-55%) ✅
- Multi-threaded low-contention: **100% stable** ✅
- **BUT**: 4T high-contention: **70% crash rate** ❌

### What Works

```bash
# ✅ Works perfectly (100% stable)
./larson_hakmem 1 1 128 1024 1 12345 1   # 1T: 2.74M ops/s
./larson_hakmem 2 8 128 1024 1 12345 2   # 2T: 4.91M ops/s
./larson_hakmem 10 8 128 256 1 12345 4   # 4T low: 251K ops/s

# ❌ Crashes 70% of the time
./larson_hakmem 10 8 128 1024 1 12345 4  # 4T high: 981K ops/s (when it works)
```

### What Breaks

**Crash pattern**:
```
free(): invalid pointer
[DEBUG] superslab_refill returned NULL (OOM) detail:
class=4 prev_ss=(nil) active=0 bitmap=0x00000000
prev_meta=(nil) used=0 cap=0 slab_idx=0
reused_freelist=0 free_idx=-2 errno=12
```

**Sequence of events**:
1. Thread exhausts SuperSlab for class 6 (or 1, 4)
2. `superslab_refill()` fails with OOM (errno=12, ENOMEM)
3. Code falls back to `malloc()` (libc malloc)
4. Now we have **mixed allocations**: some from HAKMEM, some from libc
5. `free()` receives a libc-allocated pointer
6. HAKMEM's free path tries to handle it → **CRASH**

---

## Root Cause Analysis (from Task Agent)

### The Mixed Allocation Problem

**File**: `core/box/hak_alloc_api.inc.h` or similar allocation paths

**Current behavior**:
```c
// Pseudo-code of current allocation path
void* hak_alloc(size_t size) {
    // Try HAKMEM allocation
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;

    // HAKMEM failed (OOM) → fallback to libc malloc
    return malloc(size);  // ← PROBLEM: Now we have mixed allocations!
}

void hak_free(void* ptr) {
    // Try to free as HAKMEM allocation
    if (looks_like_hakmem(ptr)) {
        hakmem_free(ptr);  // ← PROBLEM: What if it's actually from malloc()?
    } else {
        free(ptr);  // ← PROBLEM: What if we guessed wrong?
    }
}
```

**Why this crashes**:
- HAKMEM can't distinguish between HAKMEM-allocated and malloc-allocated pointers
- Header-based detection is unreliable (malloc memory might look like HAKMEM headers)
- Cross-allocation free causes corruption/crashes

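To make the second bullet concrete, here is a minimal sketch of how a header check can misfire. The one-byte header layout, the `HYP_MAGIC` value, and both `hyp_` functions are hypothetical stand-ins, not HAKMEM's actual header format; the point is only that any check based on the bytes in front of a pointer can be satisfied by memory HAKMEM never allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one tag byte stored immediately before the user
 * pointer. High nibble = magic, low nibble = size class.
 * This is NOT HAKMEM's real header format. */
#define HYP_MAGIC 0xA0u

/* Returns 1 if the byte in front of ptr carries the magic nibble. */
static int hyp_looks_like_hakmem(const void *ptr) {
    assert(ptr != NULL);
    const uint8_t *p = (const uint8_t *)ptr;
    return (p[-1] & 0xF0u) == HYP_MAGIC;
}

/* Simulates a foreign (non-HAKMEM) buffer whose first byte holds the given
 * value, then asks the detector about the pointer one byte further in. */
static int hyp_foreign_pointer_misdetected(uint8_t preceding_byte) {
    uint8_t foreign[16] = {0};
    foreign[0] = preceding_byte;
    return hyp_looks_like_hakmem(&foreign[1]);
}
```

Under this hypothetical layout, a foreign pointer preceded by `0xA4` is classified as HAKMEM-owned even though HAKMEM never allocated it, which is the misrouting the bullet describes.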
### Why SuperSlab OOM Happens

**High-contention scenario**:
- 4 threads × 1024 chunks each = 4096 concurrent allocations
- All threads allocate 128B blocks (class 4 or 6)
- SuperSlab runs out of slabs for that class
- No dynamic scaling → OOM

**Evidence**: `bitmap=0x00000000` means all 32 slabs exhausted

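The evidence line reads the bitmap as one bit per slab, with a set bit meaning "free". A minimal sketch of that check, assuming exactly that encoding (`hyp_find_free_slab` and `hyp_claim_slab` are illustrative helpers, not HAKMEM functions):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit i set means slab i of the SuperSlab is free. */

/* Returns the index of the lowest free slab, or -1 if none is free. */
static int hyp_find_free_slab(uint32_t bitmap) {
    if (bitmap == 0) return -1;              /* all 32 slabs in use */
    int idx = 0;
    while ((bitmap & 1u) == 0) {
        bitmap >>= 1;
        idx++;
    }
    return idx;
}

/* Returns the bitmap with slab idx marked as taken. */
static uint32_t hyp_claim_slab(uint32_t bitmap, int idx) {
    assert(idx >= 0 && idx < 32);
    return bitmap & ~(UINT32_C(1) << idx);
}
```

With this encoding, `bitmap == 0` is the only state in which a refill from the current SuperSlab cannot succeed, so it is the natural trigger point for any growth logic.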

---

## Your Mission: 3 Potential Fixes (Choose Best Approach)

### Option A: Disable malloc Fallback (Recommended - Safest)

**Idea**: Make allocation failures explicit instead of silently falling back

**Implementation**:

**File**: Find the allocation path that does malloc fallback (likely `core/box/hak_alloc_api.inc.h` or `core/hakmem_tiny.c`)

**Change**:
```c
// Before (BROKEN):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;

    // Fallback to malloc (causes mixed allocations)
    return malloc(size);  // ❌ BAD
}

// After (SAFE):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (!ptr) {
        // OOM: Log and fail explicitly
        fprintf(stderr, "[HAKMEM] OOM for size=%zu, returning NULL\n", size);
        errno = ENOMEM;
        return NULL;  // ✅ Explicit failure
    }
    return ptr;
}
```

**Pros**:
- Simple and safe
- No mixed allocations
- Caller can handle OOM explicitly

**Cons**:
- Applications must handle NULL returns
- Might break code that assumes malloc never fails

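With Option A, callers see `NULL` plus `errno == ENOMEM` instead of a libc pointer. The sketch below shows a caller that propagates the failure rather than dereferencing the result. It assumes an allocator with the same signature as `hak_alloc` above; `hyp_alloc_fn`, `hyp_failing_alloc`, and `hyp_dup_bytes` are hypothetical names used only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Allocator signature matching hak_alloc(size_t) from the snippet above. */
typedef void *(*hyp_alloc_fn)(size_t);

/* Stand-in for an exhausted allocator: always reports OOM, as Option A would. */
static void *hyp_failing_alloc(size_t size) {
    (void)size;
    errno = ENOMEM;
    return NULL;
}

/* Copies len bytes into a freshly allocated buffer.
 * Returns NULL on OOM and leaves errno as set by the allocator. */
static void *hyp_dup_bytes(hyp_alloc_fn alloc, const void *src, size_t len) {
    assert(alloc != NULL && src != NULL);
    void *dst = alloc(len);
    if (dst == NULL) {
        return NULL;   /* propagate the failure instead of dereferencing */
    }
    memcpy(dst, src, len);
    return dst;
}
```

Code that skips the `NULL` check is exactly what the second "Cons" bullet warns about, so an audit of call sites belongs with this option.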
**Testing**:
```bash
# Should complete without crashes OR fail cleanly with OOM message
./larson_hakmem 10 8 128 1024 1 12345 4
```

---

### Option B: Fix SuperSlab Starvation (Recommended - Best Long-term)

**Idea**: Prevent OOM by dynamically scaling SuperSlab capacity

**Implementation**:

**File**: `core/tiny_superslab_alloc.inc.h` or SuperSlab management code

**Change 1: Detect starvation**:
```c
// In superslab_refill()
if (bitmap == 0x00000000) {
    // All slabs exhausted → try to allocate more
    fprintf(stderr, "[HAKMEM] SuperSlab class %d exhausted, allocating more...\n", class_idx);

    // Allocate a new SuperSlab
    SuperSlab* new_ss = allocate_superslab(class_idx);
    if (new_ss) {
        register_superslab(new_ss);
        // Retry refill from new SuperSlab
        return refill_from_superslab(new_ss, class_idx, count);
    }
}
```

**Change 2: Increase initial capacity for hot classes**:
```c
// In SuperSlab initialization
// Classes 1, 4, 6 are hot in multi-threaded workloads
if (class_idx == 1 || class_idx == 4 || class_idx == 6) {
    initial_slabs = 64;  // Double capacity for hot classes
} else {
    initial_slabs = 32;  // Default
}
```

**Pros**:
- Fixes root cause (OOM)
- No mixed allocations needed
- Scales naturally with workload

**Cons**:
- More complex
- Memory overhead for extra SuperSlabs

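Change 1 boils down to "when every slab in the existing SuperSlabs is taken, chain a new one and retry". The sketch below models that with `calloc` as the backing store. `HypSuperSlab`, `HypClassPool`, and the `hyp_` functions are illustrative names, and the 32-slab bitmap with set bit = free is the same assumption as in the evidence line. It is single-threaded; a real refill path would need the project's own locking or lock-free protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HYP_SLABS_PER_SS 32

typedef struct HypSuperSlab {
    uint32_t free_bitmap;          /* bit i set = slab i is free (assumption) */
    struct HypSuperSlab *next;     /* older SuperSlabs of the same class */
} HypSuperSlab;

typedef struct {
    HypSuperSlab *head;            /* newest SuperSlab first */
    int superslab_count;
} HypClassPool;

/* Takes one free slab from ss; returns its index or -1 if none is free. */
static int hyp_take_slab(HypSuperSlab *ss) {
    for (int i = 0; i < HYP_SLABS_PER_SS; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (ss->free_bitmap & bit) {
            ss->free_bitmap &= ~bit;
            return i;
        }
    }
    return -1;
}

/* Hands out a slab, growing the pool when every SuperSlab is exhausted.
 * Returns -1 only if the backing allocation itself fails. */
static int hyp_pool_acquire_slab(HypClassPool *pool, HypSuperSlab **out_ss) {
    assert(pool != NULL && out_ss != NULL);
    for (HypSuperSlab *ss = pool->head; ss != NULL; ss = ss->next) {
        int idx = hyp_take_slab(ss);
        if (idx >= 0) { *out_ss = ss; return idx; }
    }
    HypSuperSlab *fresh = calloc(1, sizeof *fresh);
    if (fresh == NULL) return -1;          /* genuine OOM */
    fresh->free_bitmap = UINT32_MAX;       /* all 32 slabs free */
    fresh->next = pool->head;
    pool->head = fresh;
    pool->superslab_count++;
    *out_ss = fresh;
    return hyp_take_slab(fresh);
}

/* Acquires n slabs and reports how many SuperSlabs that required
 * (-1 on allocation failure). Frees everything before returning. */
static int hyp_superslabs_needed(int n) {
    HypClassPool pool = {0};
    int result = 0;
    for (int i = 0; i < n; i++) {
        HypSuperSlab *ss;
        if (hyp_pool_acquire_slab(&pool, &ss) < 0) { result = -1; break; }
    }
    if (result == 0) result = pool.superslab_count;
    while (pool.head != NULL) {
        HypSuperSlab *next = pool.head->next;
        free(pool.head);
        pool.head = next;
    }
    return result;
}
```

In this model the 33rd slab request triggers the first growth step, and the malloc fallback is only reachable when the backing allocation itself fails.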
**Testing**:
```bash
# Should complete 100% of the time without OOM
for i in {1..20}; do ./larson_hakmem 10 8 128 1024 1 12345 4; done
```

---

### Option C: Add Allocation Ownership Tracking (Comprehensive)

**Idea**: Track which allocator owns each pointer

**Implementation**:

**File**: `core/box/hak_free_api.inc.h` or free path

**Change 1: Add ownership bitmap**:
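```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

// Global bitmap to track HAKMEM allocations
// Each bit represents a 64KB region
#define OWNERSHIP_REGION_SHIFT 16           // 64KB regions
#define OWNERSHIP_BITMAP_SIZE (1ULL << 20)  // 1M bits = 64GB coverage
static _Atomic uint64_t g_hakmem_ownership_bitmap[OWNERSHIP_BITMAP_SIZE / 64];

// NOTE: user-space addresses on 64-bit systems usually lie far above 64GB,
// so indexing by absolute address needs the bounds checks below. A real
// implementation would index relative to a reserved base address or use a
// sparse structure; out-of-range pointers are reported as "not HAKMEM" here.

// Mark every 64KB region touched by [ptr, ptr + size) as HAKMEM-owned
static inline void mark_hakmem_allocation(void* ptr, size_t size) {
    if (size == 0) return;
    uintptr_t first = (uintptr_t)ptr >> OWNERSHIP_REGION_SHIFT;
    uintptr_t last = ((uintptr_t)ptr + size - 1) >> OWNERSHIP_REGION_SHIFT;
    for (uintptr_t region = first; region <= last; region++) {
        if (region >= OWNERSHIP_BITMAP_SIZE) return;  // outside bitmap coverage
        atomic_fetch_or(&g_hakmem_ownership_bitmap[region / 64],
                        1ULL << (region % 64));
    }
}

// Check if allocation is HAKMEM-owned
static inline int is_hakmem_allocation(void* ptr) {
    uintptr_t region = (uintptr_t)ptr >> OWNERSHIP_REGION_SHIFT;
    if (region >= OWNERSHIP_BITMAP_SIZE) return 0;    // outside bitmap coverage
    uint64_t word = atomic_load(&g_hakmem_ownership_bitmap[region / 64]);
    return (word & (1ULL << (region % 64))) != 0;
}
```
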
**Change 2: Use ownership in free path**:
```c
void hak_free(void* ptr) {
    if (is_hakmem_allocation(ptr)) {
        hakmem_free(ptr);  // ✅ Confirmed HAKMEM
    } else {
        free(ptr);  // ✅ Confirmed libc malloc
    }
}
```

**Pros**:
- Allows mixed allocations safely
- Works with existing malloc fallback

**Cons**:
- Complex to implement correctly
- Memory overhead for bitmap
- Atomic operations on free path

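One way to sidestep the bitmap's memory overhead, swapped in here purely for illustration and not taken from the HAKMEM sources, is to record each HAKMEM-owned address range and test membership on free. The sketch assumes HAKMEM memory comes from a small number of contiguous regions; `HypRange`, `hyp_register_range`, and `hyp_is_owned` are hypothetical names, and the code is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_MAX_RANGES 64

/* Half-open address range [start, end) owned by the allocator. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} HypRange;

static HypRange g_hyp_ranges[HYP_MAX_RANGES];
static size_t g_hyp_range_count;

/* Registers [base, base + size) as allocator-owned.
 * Returns 0 on success, -1 if the table is full. */
static int hyp_register_range(const void *base, size_t size) {
    assert(base != NULL && size > 0);
    if (g_hyp_range_count == HYP_MAX_RANGES) return -1;
    uintptr_t start = (uintptr_t)base;
    g_hyp_ranges[g_hyp_range_count++] = (HypRange){ start, start + size };
    return 0;
}

/* Returns 1 if ptr falls inside any registered range, 0 otherwise. */
static int hyp_is_owned(const void *ptr) {
    uintptr_t addr = (uintptr_t)ptr;
    for (size_t i = 0; i < g_hyp_range_count; i++) {
        if (addr >= g_hyp_ranges[i].start && addr < g_hyp_ranges[i].end) {
            return 1;
        }
    }
    return 0;
}
```

A linear scan is only reasonable while the number of ranges stays small; a sorted array or radix lookup would be the next step, and concurrent registration would need synchronization that this sketch omits.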

---

## Recommendation: **Combine Option A + Option B**

**Phase 1 (Immediate - 1 hour)**: Disable malloc fallback (Option A)
- Quick and safe fix
- Prevents crashes immediately
- Test 4T stability → should be 100%

**Phase 2 (Next - 2-4 hours)**: Fix SuperSlab starvation (Option B)
- Implement dynamic SuperSlab scaling
- Increase capacity for hot classes (1, 4, 6)
- Remove Option A workaround

**Phase 3 (Optional)**: Add ownership tracking (Option C) for defense-in-depth

---

## Testing Requirements

### Test 1: Stability (CRITICAL)

```bash
# Must achieve 100% success rate
for i in {1..20}; do
    echo "Run $i:"
    env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
        ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1 | grep "Throughput"
    # PIPESTATUS[0] is larson_hakmem's exit status; plain $? would report grep's
    echo "Exit code: ${PIPESTATUS[0]}"
done

# Expected: 20/20 success (100%)
```

### Test 2: Performance (No regression)

```bash
# Should maintain ~981K ops/s
env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
    ./larson_hakmem 10 8 128 1024 1 12345 4

# Expected: Throughput ≈ 981K ops/s (same as before)
```

### Test 3: Regression Check

```bash
# Ensure low-contention still works
./larson_hakmem 1 1 128 1024 1 12345 1   # 1T
./larson_hakmem 2 8 128 1024 1 12345 2   # 2T
./larson_hakmem 10 8 128 256 1 12345 4   # 4T low

# Expected: All complete successfully
```

---

## Success Criteria

✅ **4T high-contention stability: 100% (20/20 runs)**
✅ **No performance regression** (≥950K ops/s)
✅ **No crashes or OOM errors**
✅ **1T/2T/4T low-contention still work**

---

## Files to Review/Modify

**Likely files** (search for malloc fallback):
1. `core/box/hak_alloc_api.inc.h` - Main allocation API
2. `core/hakmem_tiny.c` - Tiny allocator implementation
3. `core/tiny_alloc_fast.inc.h` - Fast path allocation
4. `core/tiny_superslab_alloc.inc.h` - SuperSlab allocation
5. `core/hakmem_tiny_refill_p0.inc.h` - Refill logic

**Search commands**:
```bash
# Find malloc fallback
grep -rn "malloc(" core/ | grep -v "//.*malloc"

# Find OOM handling
grep -rn "errno.*ENOMEM\|OOM\|returned NULL" core/

# Find SuperSlab allocation
grep -rn "superslab_refill\|allocate.*superslab" core/
```

---

## Expected Deliverable

**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE7_MIXED_ALLOCATION_FIX.md`

**Required sections**:
1. **Approach chosen** (A, B, C, or combination)
2. **Code changes** (diffs showing before/after)
3. **Why it works** (explanation of fix)
4. **Test results** (20/20 stability test)
5. **Performance impact** (before/after comparison)
6. **Production readiness** (YES/NO verdict)

---

## Context Documents

- `PHASE7_4T_STABILITY_VERIFICATION.md` - Recent stability test (30% success)
- `PHASE7_BUG3_FIX_REPORT.md` - Previous debugging attempts
- `PHASE7_FINAL_BENCHMARK_RESULTS.md` - Overall Phase 7 results
- `CLAUDE.md` - Project history and status

---

## Questions? Debug Hints

**Q: Where is the malloc fallback code?**
A: Search for `malloc(` in `core/box/*.inc.h` and `core/hakmem_tiny*.c`

**Q: How do I rebuild to test the fix?**
A: `make clean && make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem` (note that `make clean` makes this a full rebuild)

**Q: What if Option A causes application crashes?**
A: That's expected if the app doesn't handle malloc failures. Move to Option B.

**Q: How do I know if SuperSlab OOM is fixed?**
A: No more `[DEBUG] superslab_refill returned NULL (OOM)` messages in output

---

**Good luck! Let's achieve 100% stability! 🚀**