# Task for Other AI: Fix 4T High-Contention Crash (Mixed Allocation Bug)
**Date**: 2025-11-08
**Priority**: CRITICAL
**Status**: BLOCKING production deployment
---
## Executive Summary
**Problem**: 4T high-contention crash with **70% failure rate** (6/20 success)
**Root Cause Identified**: Mixed HAKMEM/libc allocations causing `free(): invalid pointer`
**Your Mission**: Fix the mixed allocation bug to achieve **100% stability**
---
## Background
### Current Status
Phase 7 optimization achieved **excellent performance**:
- Single-threaded: **91.3% of System malloc** (target was 40-55%) ✅
- Multi-threaded low-contention: **100% stable**
- **BUT**: 4T high-contention: **70% crash rate**
### What Works
```bash
# ✅ Works perfectly (100% stable)
./larson_hakmem 1 1 128 1024 1 12345 1 # 1T: 2.74M ops/s
./larson_hakmem 2 8 128 1024 1 12345 2 # 2T: 4.91M ops/s
./larson_hakmem 10 8 128 256 1 12345 4 # 4T low: 251K ops/s
# ❌ Crashes 70% of the time
./larson_hakmem 10 8 128 1024 1 12345 4 # 4T high: 981K ops/s (when it works)
```
### What Breaks
**Crash pattern**:
```
free(): invalid pointer
[DEBUG] superslab_refill returned NULL (OOM) detail:
class=4 prev_ss=(nil) active=0 bitmap=0x00000000
prev_meta=(nil) used=0 cap=0 slab_idx=0
reused_freelist=0 free_idx=-2 errno=12
```
**Sequence of events**:
1. Thread exhausts SuperSlab for class 6 (or 1, 4)
2. `superslab_refill()` fails with OOM (errno=12, ENOMEM)
3. Code falls back to `malloc()` (libc malloc)
4. Now we have **mixed allocations**: some from HAKMEM, some from libc
5. `free()` receives a libc-allocated pointer
6. HAKMEM's free path tries to handle it → **CRASH**
---
## Root Cause Analysis (from Task Agent)
### The Mixed Allocation Problem
**File**: `core/box/hak_alloc_api.inc.h` or similar allocation paths
**Current behavior**:
```c
// Pseudo-code of current allocation path
void* hak_alloc(size_t size) {
    // Try HAKMEM allocation
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;
    // HAKMEM failed (OOM) → fallback to libc malloc
    return malloc(size);  // ← PROBLEM: Now we have mixed allocations!
}

void hak_free(void* ptr) {
    // Try to free as HAKMEM allocation
    if (looks_like_hakmem(ptr)) {
        hakmem_free(ptr);  // ← PROBLEM: What if it's actually from malloc()?
    } else {
        free(ptr);         // ← PROBLEM: What if we guessed wrong?
    }
}
```
**Why this crashes**:
- HAKMEM can't distinguish between HAKMEM-allocated and malloc-allocated pointers
- Header-based detection is unreliable (malloc memory might look like HAKMEM headers)
- Cross-allocation free causes corruption/crashes
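
The header-detection problem can be reproduced in isolation. The sketch below is not HAKMEM's real header layout: the one-byte header, the `HYPO_MAGIC` value, and both function names are assumptions made for illustration. It shows only that a byte left in front of a foreign block can satisfy a magic-number check.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 1-byte header: high nibble is a magic tag, low nibble a class index. */
#define HYPO_MAGIC 0xA0u

/* Header-based guess: inspect the byte just before the user pointer. */
static int looks_like_hakmem(const void* ptr) {
    const uint8_t* p = (const uint8_t*)ptr;
    return (p[-1] & 0xF0u) == HYPO_MAGIC;
}

/* Returns 1 if a foreign (non-HAKMEM) block is misclassified as HAKMEM. */
static int demo_false_positive(void) {
    /* Stand-in for memory owned by another allocator: the byte before the
       user pointer is whatever that allocator or a previous user left there. */
    static uint8_t foreign[32];
    memset(foreign, 0, sizeof foreign);
    foreign[15] = 0xA4;             /* leftover byte that happens to match the magic */
    void* user_ptr = &foreign[16];  /* pointer handed to the application */
    return looks_like_hakmem(user_ptr);
}
```

Calling `demo_false_positive()` returns 1 here, which is the situation where `hak_free()` would route a libc pointer into the HAKMEM free path.
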
### Why SuperSlab OOM Happens
**High-contention scenario**:
- 4 threads × 1024 chunks each = 4096 concurrent allocations
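- All threads allocate small blocks of up to 128B, which concentrates load on a few size classes (the log shows class 4; classes 1 and 6 are also affected)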
- SuperSlab runs out of slabs for that class
- No dynamic scaling → OOM
**Evidence**: `bitmap=0x00000000` means all 32 slabs exhausted
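
As a rough illustration of why an all-zero bitmap reads as exhaustion, here is a minimal single-threaded sketch. It assumes that a set bit marks an available slab, which matches the interpretation of the debug line above but is not taken from the HAKMEM sources; `claim_slab` and `drain_all` are hypothetical names.

```c
#include <stdint.h>

/* Assumed convention: a set bit marks a slab that is still available,
   so 0x00000000 means nothing is left in this SuperSlab. */
#define SLABS_PER_SUPERSLAB 32

/* Claim the lowest available slab; returns its index, or -1 when exhausted. */
static int claim_slab(uint32_t* bitmap) {
    for (int i = 0; i < SLABS_PER_SUPERSLAB; i++) {
        uint32_t mask = (uint32_t)1 << i;
        if (*bitmap & mask) {
            *bitmap &= ~mask;  /* slab i is now in use */
            return i;
        }
    }
    return -1;                 /* bitmap == 0: the caller sees OOM */
}

/* Claims slabs until the bitmap is empty; returns how many were handed out. */
static int drain_all(uint32_t* bitmap) {
    int claimed = 0;
    while (claim_slab(bitmap) >= 0) claimed++;
    return claimed;
}
```
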
---
## Your Mission: 3 Potential Fixes (Choose Best Approach)
### Option A: Disable malloc Fallback (Recommended - Safest)
**Idea**: Make allocation failures explicit instead of silently falling back
**Implementation**:
**File**: Find the allocation path that does malloc fallback (likely `core/box/hak_alloc_api.inc.h` or `core/hakmem_tiny.c`)
**Change**:
```c
// Before (BROKEN):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;
    // Fallback to malloc (causes mixed allocations)
    return malloc(size);  // ❌ BAD
}

// After (SAFE):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (!ptr) {
        // OOM: Log and fail explicitly
        fprintf(stderr, "[HAKMEM] OOM for size=%zu, returning NULL\n", size);
        errno = ENOMEM;
        return NULL;  // ✅ Explicit failure
    }
    return ptr;
}
```
**Pros**:
- Simple and safe
- No mixed allocations
- Caller can handle OOM explicitly
**Cons**:
- Applications must handle NULL returns
- Might break code that assumes malloc never fails
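
What "handle NULL returns" means for a caller can be sketched as follows. `hak_alloc_stub`, `g_simulate_oom`, and `copy_string` are hypothetical names; the stub is a toy bump allocator that ignores alignment and exists only so the failure path can be exercised without the real allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the Option A allocator: fails on demand so the error path
   can be exercised. The real hak_alloc() lives in the HAKMEM sources. */
static int g_simulate_oom = 0;           /* hypothetical test switch */
static unsigned char g_arena[256];
static size_t g_arena_used = 0;

static void* hak_alloc_stub(size_t size) {
    if (g_simulate_oom || size > sizeof g_arena - g_arena_used) {
        errno = ENOMEM;
        return NULL;                     /* explicit failure, no libc fallback */
    }
    void* p = &g_arena[g_arena_used];
    g_arena_used += size;
    return p;
}

/* Caller that propagates the failure instead of dereferencing NULL. */
static char* copy_string(const char* src) {
    size_t n = strlen(src) + 1;
    char* dst = hak_alloc_stub(n);
    if (dst == NULL) return NULL;        /* errno is already ENOMEM */
    memcpy(dst, src, n);
    return dst;
}
```

A caller that skips the `dst == NULL` check would crash on the `memcpy`, which is the breakage the second con above refers to.
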
**Testing**:
```bash
# Should complete without crashes OR fail cleanly with OOM message
./larson_hakmem 10 8 128 1024 1 12345 4
```
---
### Option B: Fix SuperSlab Starvation (Recommended - Best Long-term)
**Idea**: Prevent OOM by dynamically scaling SuperSlab capacity
**Implementation**:
**File**: `core/tiny_superslab_alloc.inc.h` or SuperSlab management code
**Change 1: Detect starvation**:
```c
// In superslab_refill()
if (bitmap == 0x00000000) {
    // All slabs exhausted → try to allocate more
    fprintf(stderr, "[HAKMEM] SuperSlab class %d exhausted, allocating more...\n", class_idx);
    // Allocate a new SuperSlab
    SuperSlab* new_ss = allocate_superslab(class_idx);
    if (new_ss) {
        register_superslab(new_ss);
        // Retry refill from new SuperSlab
        return refill_from_superslab(new_ss, class_idx, count);
    }
}
```
**Change 2: Increase initial capacity for hot classes**:
```c
// In SuperSlab initialization
// Classes 1, 4, 6 are hot in multi-threaded workloads
if (class_idx == 1 || class_idx == 4 || class_idx == 6) {
    initial_slabs = 64;  // Double capacity for hot classes
} else {
    initial_slabs = 32;  // Default
}
```
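
The expansion idea in Change 1 can be tried out on a toy model before touching the real code. The sketch below links fixed-size chunks into a list and appends a fresh one when every existing chunk is full. All `Toy*` names are hypothetical, the bitmap convention (set bit = available) is an assumption, and it uses plain `malloc` with no locking, so it only illustrates the control flow.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLABS_PER_CHUNK 32

/* Toy stand-in for a SuperSlab chunk; the real struct is in the HAKMEM sources. */
typedef struct ToyChunk {
    uint32_t free_bitmap;     /* set bit = slab still available (assumed) */
    struct ToyChunk* next;    /* link to the next chunk of the same class */
} ToyChunk;

typedef struct {
    ToyChunk* head;
    int chunk_count;
} ToyClassHead;

static ToyChunk* toy_chunk_new(void) {
    ToyChunk* c = malloc(sizeof *c);
    if (c) { c->free_bitmap = 0xFFFFFFFFu; c->next = NULL; }
    return c;
}

/* Take one slab from any chunk; grow the chain when every chunk is full.
   Returns a global slab number, or -1 only if the system is out of memory. */
static int toy_take_slab(ToyClassHead* h) {
    int base = 0;
    ToyChunk** link = &h->head;
    for (; *link; link = &(*link)->next, base += SLABS_PER_CHUNK) {
        ToyChunk* c = *link;
        for (int i = 0; i < SLABS_PER_CHUNK; i++) {
            uint32_t mask = (uint32_t)1 << i;
            if (c->free_bitmap & mask) { c->free_bitmap &= ~mask; return base + i; }
        }
    }
    ToyChunk* fresh = toy_chunk_new();   /* all chunks exhausted: expand */
    if (!fresh) return -1;
    *link = fresh;
    h->chunk_count++;
    fresh->free_bitmap &= ~(uint32_t)1;  /* hand out slab 0 of the new chunk */
    return base;
}
```

Starting from an empty `ToyClassHead`, 100 consecutive calls return slab numbers 0 through 99 and leave four chunks linked. A fixed 32-slab design would fail on the 33rd request.
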
**Pros**:
- Fixes root cause (OOM)
- No mixed allocations needed
- Scales naturally with workload
**Cons**:
- More complex
- Memory overhead for extra SuperSlabs
**Testing**:
```bash
# Should complete 100% of the time without OOM
for i in {1..20}; do ./larson_hakmem 10 8 128 1024 1 12345 4; done
```
---
### Option C: Add Allocation Ownership Tracking (Comprehensive)
**Idea**: Track which allocator owns each pointer
**Implementation**:
**File**: `core/box/hak_free_api.inc.h` or free path
**Change 1: Add ownership bitmap**:
```c
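#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

// Global bitmap to track HAKMEM allocations
// Each bit represents a 64KB region
#define OWNERSHIP_REGION_SHIFT 16           // 64KB regions
#define OWNERSHIP_BITMAP_SIZE (1ULL << 20)  // 1M bits = 64GB coverage
static _Atomic uint64_t g_hakmem_ownership_bitmap[OWNERSHIP_BITMAP_SIZE / 64];

// Fold a region number into the bitmap. User-space addresses on 64-bit
// systems usually lie far above 64GB, so indexing without the modulo would
// run past the end of the array. Folding can alias two distant regions.
static inline size_t ownership_bit_index(uintptr_t region) {
    return (size_t)(region % OWNERSHIP_BITMAP_SIZE);
}

// Mark allocation as HAKMEM-owned (every 64KB region the block touches)
static inline void mark_hakmem_allocation(void* ptr, size_t size) {
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t first = addr >> OWNERSHIP_REGION_SHIFT;
    uintptr_t last = (addr + (size ? size - 1 : 0)) >> OWNERSHIP_REGION_SHIFT;
    for (uintptr_t r = first; r <= last; r++) {
        size_t idx = ownership_bit_index(r);
        atomic_fetch_or(&g_hakmem_ownership_bitmap[idx / 64], 1ULL << (idx % 64));
    }
}

// Check if allocation is HAKMEM-owned
// Note: a set bit only says the 64KB region contains HAKMEM memory; a libc
// block that shares the region would also test positive.
static inline int is_hakmem_allocation(void* ptr) {
    size_t idx = ownership_bit_index((uintptr_t)ptr >> OWNERSHIP_REGION_SHIFT);
    uint64_t word = atomic_load(&g_hakmem_ownership_bitmap[idx / 64]);
    return (word & (1ULL << (idx % 64))) != 0;
}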
```
**Change 2: Use ownership in free path**:
```c
void hak_free(void* ptr) {
    if (is_hakmem_allocation(ptr)) {
        hakmem_free(ptr);  // ✅ Confirmed HAKMEM
    } else {
        free(ptr);         // ✅ Confirmed libc malloc
    }
}
```
**Pros**:
- Allows mixed allocations safely
- Works with existing malloc fallback
**Cons**:
- Complex to implement correctly
- Memory overhead for bitmap
- Atomic operations on free path
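
If the region bitmap turns out to be too coarse, ownership can also be tracked as exact address ranges. This is a different technique from the bitmap above, offered only as a sketch: it assumes HAKMEM obtains its memory in a small number of large mappings that can be registered once, and `OwnedRange`, `register_range`, and `is_owned` are hypothetical names. The table is not thread-safe as written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registry of address ranges that HAKMEM mapped itself. */
#define MAX_RANGES 64

typedef struct { uintptr_t base; size_t len; } OwnedRange;

static OwnedRange g_ranges[MAX_RANGES];
static int g_range_count = 0;

/* Record a mapping; returns 0 on success, -1 if the table is full. */
static int register_range(void* base, size_t len) {
    if (g_range_count == MAX_RANGES) return -1;
    g_ranges[g_range_count].base = (uintptr_t)base;
    g_ranges[g_range_count].len = len;
    g_range_count++;
    return 0;
}

/* Exact ownership test: true only for addresses inside a registered range. */
static int is_owned(const void* ptr) {
    uintptr_t a = (uintptr_t)ptr;
    for (int i = 0; i < g_range_count; i++) {
        /* Unsigned subtraction wraps when a < base, so one compare covers both bounds. */
        if (a - g_ranges[i].base < g_ranges[i].len) return 1;
    }
    return 0;
}
```
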
---
## Recommendation: **Combine Option A + Option B**
**Phase 1 (Immediate - 1 hour)**: Disable malloc fallback (Option A)
- Quick and safe fix
- Prevents crashes immediately
- Test 4T stability → should be 100%
**Phase 2 (Next - 2-4 hours)**: Fix SuperSlab starvation (Option B)
- Implement dynamic SuperSlab scaling
- Increase capacity for hot classes (1, 4, 6)
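- Revisit the Option A workaround once OOM no longer occurs (re-enabling the malloc fallback would reintroduce mixed allocations)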
**Phase 3 (Optional)**: Add ownership tracking (Option C) for defense-in-depth
---
## Testing Requirements
### Test 1: Stability (CRITICAL)
```bash
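# Must achieve 100% success rate
pass=0
for i in {1..20}; do
  echo "Run $i:"
  out=$(env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
    ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1)
  rc=$?  # exit code of larson_hakmem itself, not of grep
  echo "$out" | grep "Throughput"
  echo "Exit code: $rc"
  [ "$rc" -eq 0 ] && pass=$((pass + 1))
done
echo "Success: $pass/20"
# Expected: 20/20 success (100%)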
```
### Test 2: Performance (No regression)
```bash
# Should maintain ~981K ops/s
env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
./larson_hakmem 10 8 128 1024 1 12345 4
# Expected: Throughput ≈ 981K ops/s (same as before)
```
### Test 3: Regression Check
```bash
# Ensure low-contention still works
./larson_hakmem 1 1 128 1024 1 12345 1 # 1T
./larson_hakmem 2 8 128 1024 1 12345 2 # 2T
./larson_hakmem 10 8 128 256 1 12345 4 # 4T low
# Expected: All complete successfully
```
---
## Success Criteria
- **4T high-contention stability: 100% (20/20 runs)**
- **No performance regression** (≥950K ops/s)
- **No crashes or OOM errors**
- **1T/2T/4T low-contention still work**
---
## Files to Review/Modify
**Likely files** (search for malloc fallback):
1. `core/box/hak_alloc_api.inc.h` - Main allocation API
2. `core/hakmem_tiny.c` - Tiny allocator implementation
3. `core/tiny_alloc_fast.inc.h` - Fast path allocation
4. `core/tiny_superslab_alloc.inc.h` - SuperSlab allocation
5. `core/hakmem_tiny_refill_p0.inc.h` - Refill logic
**Search commands**:
```bash
# Find malloc fallback
grep -rn "malloc(" core/ | grep -v "//.*malloc"
# Find OOM handling
grep -rn "errno.*ENOMEM\|OOM\|returned NULL" core/
# Find SuperSlab allocation
grep -rn "superslab_refill\|allocate.*superslab" core/
```
---
## Expected Deliverable
**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE7_MIXED_ALLOCATION_FIX.md`
**Required sections**:
1. **Approach chosen** (A, B, C, or combination)
2. **Code changes** (diffs showing before/after)
3. **Why it works** (explanation of fix)
4. **Test results** (20/20 stability test)
5. **Performance impact** (before/after comparison)
6. **Production readiness** (YES/NO verdict)
---
## Context Documents
- `PHASE7_4T_STABILITY_VERIFICATION.md` - Recent stability test (30% success)
- `PHASE7_BUG3_FIX_REPORT.md` - Previous debugging attempts
- `PHASE7_FINAL_BENCHMARK_RESULTS.md` - Overall Phase 7 results
- `CLAUDE.md` - Project history and status
---
## Questions? Debug Hints
**Q: Where is the malloc fallback code?**
A: Search for `malloc(` in `core/box/*.inc.h` and `core/hakmem_tiny*.c`
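**Q: How do I rebuild to test just the fix?**
A: `make clean && make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem`. Note that `make clean` forces a full rebuild. If the build flags have not changed since the last build, running the `make ...` part alone should be enough.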
**Q: What if Option A causes application crashes?**
A: That's expected if the app doesn't handle malloc failures. Move to Option B.
**Q: How do I know if SuperSlab OOM is fixed?**
A: No more `[DEBUG] superslab_refill returned NULL (OOM)` messages in output
---
**Good luck! Let's achieve 100% stability! 🚀**