# Task for Other AI: Fix 4T High-Contention Crash (Mixed Allocation Bug)

**Date**: 2025-11-08
**Priority**: CRITICAL
**Status**: BLOCKING production deployment

---

## Executive Summary

**Problem**: 4T high-contention crash with **70% failure rate** (6/20 success)

**Root Cause Identified**: Mixed HAKMEM/libc allocations causing `free(): invalid pointer`

**Your Mission**: Fix the mixed allocation bug to achieve **100% stability**

---

## Background

### Current Status

Phase 7 optimization achieved **excellent performance**:
- Single-threaded: **91.3% of System malloc** (target was 40-55%) ✅
- Multi-threaded low-contention: **100% stable** ✅
- **BUT**: 4T high-contention: **70% crash rate** ❌

### What Works

```bash
# ✅ Works perfectly (100% stable)
./larson_hakmem 1 1 128 1024 1 12345 1    # 1T: 2.74M ops/s
./larson_hakmem 2 8 128 1024 1 12345 2    # 2T: 4.91M ops/s
./larson_hakmem 10 8 128 256 1 12345 4    # 4T low: 251K ops/s

# ❌ Crashes 70% of the time
./larson_hakmem 10 8 128 1024 1 12345 4   # 4T high: 981K ops/s (when it works)
```

### What Breaks

**Crash pattern**:
```
free(): invalid pointer
[DEBUG] superslab_refill returned NULL (OOM) detail:
  class=4 prev_ss=(nil) active=0 bitmap=0x00000000
  prev_meta=(nil) used=0 cap=0 slab_idx=0
  reused_freelist=0 free_idx=-2 errno=12
```

**Sequence of events**:
1. Thread exhausts SuperSlab for class 6 (or 1, 4)
2. `superslab_refill()` fails with OOM (errno=12, ENOMEM)
3. Code falls back to `malloc()` (libc malloc)
4. Now we have **mixed allocations**: some from HAKMEM, some from libc
5. `free()` receives a libc-allocated pointer
6. HAKMEM's free path tries to handle it → **CRASH**

---

## Root Cause Analysis (from Task Agent)

### The Mixed Allocation Problem

**File**: `core/box/hak_alloc_api.inc.h` or similar allocation paths

**Current behavior**:
```c
// Pseudo-code of current allocation path
void* hak_alloc(size_t size) {
    // Try HAKMEM allocation
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;

    // HAKMEM failed (OOM) → fallback to libc malloc
    return malloc(size);  // ← PROBLEM: Now we have mixed allocations!
}

void hak_free(void* ptr) {
    // Try to free as HAKMEM allocation
    if (looks_like_hakmem(ptr)) {
        hakmem_free(ptr);  // ← PROBLEM: What if it's actually from malloc()?
    } else {
        free(ptr);         // ← PROBLEM: What if we guessed wrong?
    }
}
```

**Why this crashes**:
- HAKMEM can't distinguish between HAKMEM-allocated and malloc-allocated pointers
- Header-based detection is unreliable (malloc memory might look like HAKMEM headers)
- Cross-allocation free causes corruption/crashes

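To make the header-detection bullet concrete, here is a minimal sketch of how a foreign block can pass a header check. It assumes a hypothetical one-byte header whose high nibble is a magic tag; HAKMEM's real header layout is not shown in this document, and `HDR_MAGIC`, `HDR_MAGIC_MASK`, and `foreign_block_is_misclassified` are illustrative names only. A static buffer stands in for libc-owned memory so the sketch never reads outside an object it owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1-byte header: high nibble = magic tag, low nibble = size class.
 * This is an assumption for illustration, not HAKMEM's actual layout. */
#define HDR_MAGIC      0xA0u
#define HDR_MAGIC_MASK 0xF0u

/* Header-based guess: inspect the byte just before the user pointer. */
static int looks_like_hakmem(const uint8_t *user_ptr) {
    assert(user_ptr != NULL);
    return (user_ptr[-1] & HDR_MAGIC_MASK) == HDR_MAGIC;
}

/* Simulates a block owned by another allocator. The byte before the user
 * pointer is whatever that allocator (or a previous user) left behind;
 * here it happens to be 0xA4, which matches the magic tag by accident. */
static int foreign_block_is_misclassified(void) {
    static uint8_t foreign[32];
    memset(foreign, 0, sizeof foreign);
    foreign[15] = 0xA4;                    /* stray data, not a HAKMEM header */
    const uint8_t *user_ptr = &foreign[16];
    return looks_like_hakmem(user_ptr);    /* nonzero: would go to hakmem_free() */
}
```

With a 4-bit tag, roughly one stray byte value in sixteen passes the check, which is why a guess of this kind cannot be relied on once two allocators share the process.
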
### Why SuperSlab OOM Happens

**High-contention scenario**:
- 4 threads × 1024 chunks each = 4096 concurrent allocations
- All threads allocate 128B blocks (class 4 or 6)
- SuperSlab runs out of slabs for that class
- No dynamic scaling → OOM

**Evidence**: `bitmap=0x00000000` means all 32 slabs exhausted

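The exhaustion condition can be sketched with a fixed 32-bit slab bitmap. This is a simplified model, assuming a set bit means "slab is free" (consistent with `bitmap=0x00000000` meaning exhausted); `SlabMap`, `slab_claim`, and `claims_satisfied` are hypothetical names, not HAKMEM functions.

```c
#include <assert.h>
#include <stdint.h>

#define SLABS_PER_SUPERSLAB 32

/* Assumption: a set bit means the slab is still free. */
typedef struct { uint32_t free_bitmap; } SlabMap;

/* Claims the lowest free slab; returns its index, or -1 when exhausted. */
static int slab_claim(SlabMap *m) {
    assert(m != 0);
    if (m->free_bitmap == 0) return -1;      /* the bitmap=0x00000000 case */
    int idx = 0;
    while (!(m->free_bitmap & (UINT32_C(1) << idx))) idx++;
    m->free_bitmap &= ~(UINT32_C(1) << idx);
    return idx;
}

/* Counts how many of `requests` slab claims one fixed SuperSlab can satisfy. */
static int claims_satisfied(int requests) {
    SlabMap m = { .free_bitmap = UINT32_MAX };
    int ok = 0;
    for (int i = 0; i < requests; i++)
        if (slab_claim(&m) >= 0) ok++;
    return ok;
}
```

In this model, request number 33 fails no matter how much memory the system still has, because the capacity limit is structural rather than a real out-of-memory condition.
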
---

## Your Mission: 3 Potential Fixes (Choose Best Approach)

### Option A: Disable malloc Fallback (Recommended - Safest)

**Idea**: Make allocation failures explicit instead of silently falling back

**Implementation**:

**File**: Find the allocation path that does malloc fallback (likely `core/box/hak_alloc_api.inc.h` or `core/hakmem_tiny.c`)

**Change**:
```c
// Before (BROKEN):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;

    // Fallback to malloc (causes mixed allocations)
    return malloc(size);  // ❌ BAD
}

// After (SAFE):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (!ptr) {
        // OOM: Log and fail explicitly
        fprintf(stderr, "[HAKMEM] OOM for size=%zu, returning NULL\n", size);
        errno = ENOMEM;
        return NULL;  // ✅ Explicit failure
    }
    return ptr;
}
```

**Pros**:
- Simple and safe
- No mixed allocations
- Caller can handle OOM explicitly

**Cons**:
- Applications must handle NULL returns
- Might break code that assumes malloc never fails

**Testing**:
```bash
# Should complete without crashes OR fail cleanly with OOM message
./larson_hakmem 10 8 128 1024 1 12345 4
```

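On the caller side, Option A only helps if NULL is treated as a recoverable condition. The sketch below uses a small fixed pool as a stand-in for `hak_alloc()` after the change; `stub_alloc`, `fill_blocks`, and the pool sizes are hypothetical and exist only to show the pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for hak_alloc() after Option A: a fixed pool that returns NULL
 * with errno = ENOMEM once exhausted. Pool dimensions are arbitrary. */
enum { POOL_BLOCKS = 4, BLOCK_SIZE = 128 };
static unsigned char g_pool[POOL_BLOCKS][BLOCK_SIZE];
static size_t g_used;

static void *stub_alloc(size_t size) {
    if (size > BLOCK_SIZE || g_used == POOL_BLOCKS) {
        errno = ENOMEM;
        return NULL;               /* explicit failure, no libc fallback */
    }
    return g_pool[g_used++];
}

/* Caller that backs off on NULL instead of dereferencing it.
 * Returns how many blocks were actually obtained. */
static size_t fill_blocks(void **out, size_t want) {
    assert(out != NULL);
    size_t got = 0;
    while (got < want) {
        void *p = stub_alloc(BLOCK_SIZE);
        if (p == NULL) break;      /* stop early; errno tells the caller why */
        out[got++] = p;
    }
    return got;
}
```

A benchmark such as Larson may not check allocation results this way, which is one reason a run under Option A can still end early, just with a clean message instead of heap corruption.
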
---

### Option B: Fix SuperSlab Starvation (Recommended - Best Long-term)

**Idea**: Prevent OOM by dynamically scaling SuperSlab capacity

**Implementation**:

**File**: `core/tiny_superslab_alloc.inc.h` or SuperSlab management code

**Change 1: Detect starvation**:
```c
// In superslab_refill()
if (bitmap == 0x00000000) {
    // All slabs exhausted → try to allocate more
    fprintf(stderr, "[HAKMEM] SuperSlab class %d exhausted, allocating more...\n", class_idx);

    // Allocate a new SuperSlab
    SuperSlab* new_ss = allocate_superslab(class_idx);
    if (new_ss) {
        register_superslab(new_ss);
        // Retry refill from new SuperSlab
        return refill_from_superslab(new_ss, class_idx, count);
    }
}
```

**Change 2: Increase initial capacity for hot classes**:
```c
// In SuperSlab initialization
// Classes 1, 4, 6 are hot in multi-threaded workloads
if (class_idx == 1 || class_idx == 4 || class_idx == 6) {
    initial_slabs = 64;  // Double capacity for hot classes
} else {
    initial_slabs = 32;  // Default
}
```

**Pros**:
- Fixes root cause (OOM)
- No mixed allocations needed
- Scales naturally with workload

**Cons**:
- More complex
- Memory overhead for extra SuperSlabs

**Testing**:
```bash
# Should complete 100% of the time without OOM
for i in {1..20}; do ./larson_hakmem 10 8 128 1024 1 12345 4; done
```

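One way to realize the dynamic scaling in Change 1 is to link fixed-size chunks into a list and append a new chunk when every existing one is full. The sketch below is an assumption about how that could look, not the project's implementation: `Chunk`, `ClassHead`, and `class_claim_slab` are hypothetical, `malloc` stands in for whatever backs real SuperSlab memory, and the code is single-threaded, so a real version would need a lock or atomics around the list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types: one chunk tracks 32 slabs; chunks form a linked list. */
typedef struct Chunk {
    uint32_t free_bitmap;          /* assumption: set bit = free slab */
    struct Chunk *next;
} Chunk;

typedef struct { Chunk *head; int chunk_count; } ClassHead;

static Chunk *chunk_new(void) {
    Chunk *c = malloc(sizeof *c);  /* stand-in for SuperSlab backing memory */
    if (c) { c->free_bitmap = UINT32_MAX; c->next = NULL; }
    return c;
}

/* Claims one slab, linking a fresh chunk when all existing chunks are full.
 * Returns a global slab number, or -1 only if the system is out of memory. */
static int class_claim_slab(ClassHead *h) {
    assert(h != NULL);
    int base = 0;
    Chunk **link = &h->head;
    for (; *link; link = &(*link)->next, base += 32) {
        Chunk *c = *link;
        if (c->free_bitmap == 0) continue;       /* this chunk is exhausted */
        int idx = 0;
        while (!(c->free_bitmap & (UINT32_C(1) << idx))) idx++;
        c->free_bitmap &= ~(UINT32_C(1) << idx);
        return base + idx;
    }
    Chunk *fresh = chunk_new();                  /* expand instead of failing */
    if (!fresh) return -1;
    *link = fresh;                               /* append at the list tail */
    h->chunk_count++;
    fresh->free_bitmap &= ~UINT32_C(1);          /* hand out slab 0 of the new chunk */
    return base;
}

/* Releases every chunk; used here only to keep the sketch leak-free. */
static void class_destroy(ClassHead *h) {
    for (Chunk *c = h->head; c; ) { Chunk *n = c->next; free(c); c = n; }
    h->head = NULL;
    h->chunk_count = 0;
}
```

Claiming 100 slabs from an empty `ClassHead` yields slab numbers 0 through 99 across four chunks, so the 32-slab ceiling no longer produces an artificial OOM. Note that a freshly linked chunk must start with a fully initialized bitmap, otherwise the expansion itself reports exhaustion.
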
---

### Option C: Add Allocation Ownership Tracking (Comprehensive)

**Idea**: Track which allocator owns each pointer

**Implementation**:

**File**: `core/box/hak_free_api.inc.h` or free path

**Change 1: Add ownership bitmap**:
```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

// Global bitmap to track HAKMEM allocations
// Each bit represents a 64KB region
#define OWNERSHIP_BITMAP_SIZE (1ULL << 20)  // 1M bits = 64GB coverage
static _Atomic uint64_t g_hakmem_ownership_bitmap[OWNERSHIP_BITMAP_SIZE / 64];

// Map a pointer to its bit index. User-space addresses usually lie far above
// 64GB, so the region number is wrapped to stay in bounds; regions that are
// 64GB apart share a bit, which can produce false positives.
static inline size_t ownership_region(const void* ptr) {
    return (size_t)(((uintptr_t)ptr / (64 * 1024)) % OWNERSHIP_BITMAP_SIZE);
}

// Mark allocation as HAKMEM-owned (marks only the region containing ptr)
static inline void mark_hakmem_allocation(void* ptr, size_t size) {
    (void)size;  // a block spanning several regions needs one mark per region
    size_t region = ownership_region(ptr);
    atomic_fetch_or(&g_hakmem_ownership_bitmap[region / 64], 1ULL << (region % 64));
}

// Check if allocation is HAKMEM-owned
static inline int is_hakmem_allocation(void* ptr) {
    size_t region = ownership_region(ptr);
    uint64_t word = atomic_load(&g_hakmem_ownership_bitmap[region / 64]);
    return (word & (1ULL << (region % 64))) != 0;
}
```

**Change 2: Use ownership in free path**:
```c
void hak_free(void* ptr) {
    if (is_hakmem_allocation(ptr)) {
        hakmem_free(ptr);  // ✅ Confirmed HAKMEM
    } else {
        free(ptr);         // ✅ Confirmed libc malloc
    }
}
```

**Pros**:
- Allows mixed allocations safely
- Works with existing malloc fallback

**Cons**:
- Complex to implement correctly
- Memory overhead for bitmap
- Atomic operations on free path

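Part of the "complex to implement correctly" cost is that an ownership bitmap has to cover every 64KB region a HAKMEM mapping overlaps, not only the one containing its first byte. The following is a minimal sketch of range marking under the same 64KB / 1M-bit assumptions as Option C; `mark_range`, `is_owned`, and `region_of` are hypothetical names, and the modulo wrap means addresses 64GB apart alias to the same bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_SHIFT 16                        /* 64 KiB regions */
#define REGION_COUNT (UINT64_C(1) << 20)       /* 1M regions = 64 GiB window */

static _Atomic uint64_t g_owned[REGION_COUNT / 64];

/* Folds an address into the table. Addresses 64 GiB apart alias, so a hit
 * is only a hint unless the arena is confined to a single window. */
static size_t region_of(uintptr_t addr) {
    return (size_t)((addr >> REGION_SHIFT) % REGION_COUNT);
}

/* Marks every region overlapped by [base, base + size). */
static void mark_range(uintptr_t base, size_t size) {
    assert(size > 0);
    uintptr_t first = base >> REGION_SHIFT;
    uintptr_t last  = (base + size - 1) >> REGION_SHIFT;
    for (uintptr_t r = first; r <= last; r++) {
        size_t idx = (size_t)(r % REGION_COUNT);
        atomic_fetch_or(&g_owned[idx / 64], UINT64_C(1) << (idx % 64));
    }
}

/* Returns 1 if the region containing addr has been marked, else 0. */
static int is_owned(uintptr_t addr) {
    size_t idx = region_of(addr);
    return (int)((atomic_load(&g_owned[idx / 64]) >> (idx % 64)) & 1u);
}
```

Marking at mapping time (once per SuperSlab) rather than per block keeps the atomic write off the allocation fast path; only the read in `is_owned` remains on the free path.
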
---

## Recommendation: **Combine Option A + Option B**

**Phase 1 (Immediate - 1 hour)**: Disable malloc fallback (Option A)
- Quick and safe fix
- Prevents crashes immediately
- Test 4T stability → should be 100%

**Phase 2 (Next - 2-4 hours)**: Fix SuperSlab starvation (Option B)
- Implement dynamic SuperSlab scaling
- Increase capacity for hot classes (1, 4, 6)
- Remove Option A workaround

**Phase 3 (Optional)**: Add ownership tracking (Option C) for defense-in-depth

---

## Testing Requirements

### Test 1: Stability (CRITICAL)

```bash
# Must achieve 100% success rate
for i in {1..20}; do
  echo "Run $i:"
  env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
    ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1 | grep "Throughput"
  # PIPESTATUS[0] is larson_hakmem's exit status; plain $? would report grep's
  echo "Exit code: ${PIPESTATUS[0]}"
done

# Expected: 20/20 success (100%)
```

### Test 2: Performance (No regression)

```bash
# Should maintain ~981K ops/s
env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
  ./larson_hakmem 10 8 128 1024 1 12345 4

# Expected: Throughput ≈ 981K ops/s (same as before)
```

### Test 3: Regression Check

```bash
# Ensure low-contention still works
./larson_hakmem 1 1 128 1024 1 12345 1   # 1T
./larson_hakmem 2 8 128 1024 1 12345 2   # 2T
./larson_hakmem 10 8 128 256 1 12345 4   # 4T low

# Expected: All complete successfully
```

---

## Success Criteria

✅ **4T high-contention stability: 100% (20/20 runs)**
✅ **No performance regression** (≥950K ops/s)
✅ **No crashes or OOM errors**
✅ **1T/2T/4T low-contention still work**

---

## Files to Review/Modify

**Likely files** (search for malloc fallback):
1. `core/box/hak_alloc_api.inc.h` - Main allocation API
2. `core/hakmem_tiny.c` - Tiny allocator implementation
3. `core/tiny_alloc_fast.inc.h` - Fast path allocation
4. `core/tiny_superslab_alloc.inc.h` - SuperSlab allocation
5. `core/hakmem_tiny_refill_p0.inc.h` - Refill logic

**Search commands**:
```bash
# Find malloc fallback
grep -rn "malloc(" core/ | grep -v "//.*malloc"

# Find OOM handling
grep -rn "errno.*ENOMEM\|OOM\|returned NULL" core/

# Find SuperSlab allocation
grep -rn "superslab_refill\|allocate.*superslab" core/
```

---

## Expected Deliverable

**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE7_MIXED_ALLOCATION_FIX.md`

**Required sections**:
1. **Approach chosen** (A, B, C, or combination)
2. **Code changes** (diffs showing before/after)
3. **Why it works** (explanation of fix)
4. **Test results** (20/20 stability test)
5. **Performance impact** (before/after comparison)
6. **Production readiness** (YES/NO verdict)

---

## Context Documents

- `PHASE7_4T_STABILITY_VERIFICATION.md` - Recent stability test (30% success)
- `PHASE7_BUG3_FIX_REPORT.md` - Previous debugging attempts
- `PHASE7_FINAL_BENCHMARK_RESULTS.md` - Overall Phase 7 results
- `CLAUDE.md` - Project history and status

---

## Questions? Debug Hints

**Q: Where is the malloc fallback code?**
A: Search for `malloc(` in `core/box/*.inc.h` and `core/hakmem_tiny*.c`

**Q: How do I test just the fix without rebuilding everything?**
A: Rebuild only the `larson_hakmem` target with the Phase 7 flags: `make clean && make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem`

**Q: What if Option A causes application crashes?**
A: That's expected if the app doesn't handle malloc failures. Move to Option B.

**Q: How do I know if SuperSlab OOM is fixed?**
A: No more `[DEBUG] superslab_refill returned NULL (OOM)` messages in the output

---

**Good luck! Let's achieve 100% stability! 🚀**