hakmem/TASK_FOR_OTHER_AI.md
Moe Charm (CI) 707056b765 feat: Phase 7 + Phase 2 - Massive performance & stability improvements
Performance Achievements:
- Tiny allocations: +180-280% (21M → 59-70M ops/s random mixed)
- Single-thread: +24% (2.71M → 3.36M ops/s Larson)
- 4T stability: 0% → 95% (19/20 success rate)
- Overall: 91.3% of System malloc average (target was 40-55%) ✓

Phase 7 (Tasks 1-3): Core Optimizations
- Task 1: Header validation removal (Region-ID direct lookup)
- Task 2: Aggressive inline (TLS cache access optimization)
- Task 3: Pre-warm TLS cache (eliminate cold-start penalty)
  Result: +180-280% improvement, 85-146% of System malloc

Critical Bug Fixes:
- Fix 64B allocation crash (size-to-class +1 for header)
- Fix 4T wrapper recursion bugs (BUG #7, #8, #10, #11)
- Remove malloc fallback (30% → 50% stability)

Phase 2a: SuperSlab Dynamic Expansion (CRITICAL)
- Implement mimalloc-style chunk linking
- Unlimited slab expansion (no more OOM at 32 slabs)
- Fix chunk initialization bug (bitmap=0x00000001 after expansion)
  Files: core/hakmem_tiny_superslab.c/h, core/superslab/superslab_types.h
  Result: 50% → 95% stability (19/20 4T success)

Phase 2b: TLS Cache Adaptive Sizing
- Dynamic capacity: 16-2048 slots based on usage
- High-water mark tracking + exponential growth/shrink
- Expected: +3-10% performance, -30-50% memory
  Files: core/tiny_adaptive_sizing.c/h (new)

Phase 2c: BigCache Dynamic Hash Table
- Migrate from fixed 256×8 array to dynamic hash table
- Auto-resize: 256 → 512 → 1024 → 65,536 buckets
- Improved hash function (FNV-1a) + collision chaining
  Files: core/hakmem_bigcache.c/h
  Expected: +10-20% cache hit rate

Design Flaws Analysis:
- Identified 6 components with fixed-capacity bottlenecks
- SuperSlab (CRITICAL), TLS Cache (HIGH), BigCache/L2.5 (MEDIUM)
- Report: DESIGN_FLAWS_ANALYSIS.md (11 chapters)

Documentation:
- 13 comprehensive reports (PHASE*.md, DESIGN_FLAWS*.md)
- Implementation guides, test results, production readiness
- Bug fix reports, root cause analysis

Build System:
- Makefile: phase7 targets, PREWARM_TLS flag
- Auto dependency generation (-MMD -MP) for .inc files

Known Issues:
- 4T stability: 19/20 (95%) - investigating 1 failure for 100%
- L2.5 Pool dynamic sharding: design only (needs 2-3 days integration)

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 17:08:00 +09:00

# Task for Other AI: Fix 4T High-Contention Crash (Mixed Allocation Bug)
**Date**: 2025-11-08
**Priority**: CRITICAL
**Status**: BLOCKING production deployment
---
## Executive Summary
**Problem**: 4T high-contention crash with **70% failure rate** (6/20 success)
**Root Cause Identified**: Mixed HAKMEM/libc allocations causing `free(): invalid pointer`
**Your Mission**: Fix the mixed allocation bug to achieve **100% stability**
---
## Background
### Current Status
Phase 7 optimization achieved **excellent performance**:
- Single-threaded: **91.3% of System malloc** (target was 40-55%) ✅
- Multi-threaded low-contention: **100% stable**
- **BUT**: 4T high-contention: **70% crash rate**
### What Works
```bash
# ✅ Works perfectly (100% stable)
./larson_hakmem 1 1 128 1024 1 12345 1 # 1T: 2.74M ops/s
./larson_hakmem 2 8 128 1024 1 12345 2 # 2T: 4.91M ops/s
./larson_hakmem 10 8 128 256 1 12345 4 # 4T low: 251K ops/s
# ❌ Crashes 70% of the time
./larson_hakmem 10 8 128 1024 1 12345 4 # 4T high: 981K ops/s (when it works)
```
### What Breaks
**Crash pattern**:
```
free(): invalid pointer
[DEBUG] superslab_refill returned NULL (OOM) detail:
class=4 prev_ss=(nil) active=0 bitmap=0x00000000
prev_meta=(nil) used=0 cap=0 slab_idx=0
reused_freelist=0 free_idx=-2 errno=12
```
**Sequence of events**:
1. Thread exhausts SuperSlab for class 6 (or 1, 4)
2. `superslab_refill()` fails with OOM (errno=12, ENOMEM)
3. Code falls back to `malloc()` (libc malloc)
4. Now we have **mixed allocations**: some from HAKMEM, some from libc
5. `free()` receives a libc-allocated pointer
6. HAKMEM's free path tries to handle it → **CRASH**
---
## Root Cause Analysis (from Task Agent)
### The Mixed Allocation Problem
**File**: `core/box/hak_alloc_api.inc.h` or similar allocation paths
**Current behavior**:
```c
// Pseudo-code of current allocation path
void* hak_alloc(size_t size) {
    // Try HAKMEM allocation
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;
    // HAKMEM failed (OOM) → fallback to libc malloc
    return malloc(size); // ← PROBLEM: Now we have mixed allocations!
}

void hak_free(void* ptr) {
    // Try to free as HAKMEM allocation
    if (looks_like_hakmem(ptr)) {
        hakmem_free(ptr); // ← PROBLEM: What if it's actually from malloc()?
    } else {
        free(ptr); // ← PROBLEM: What if we guessed wrong?
    }
}
```
**Why this crashes**:
- HAKMEM can't distinguish between HAKMEM-allocated and malloc-allocated pointers
- Header-based detection is unreliable (malloc memory might look like HAKMEM headers)
- Cross-allocation free causes corruption/crashes
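To make the header false-positive concrete, here is a minimal sketch. The one-byte tag `DEMO_MAGIC` and the helper names are hypothetical stand-ins, not HAKMEM's actual header layout; the only point is that the byte preceding a foreign pointer can match the tag by coincidence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1-byte header tag; HAKMEM's real header layout may differ. */
#define DEMO_MAGIC 0xA5u

/* Header-style check: inspect the byte just before the user pointer. */
static int looks_like_hakmem_demo(const void *user_ptr) {
    assert(user_ptr != NULL);
    const uint8_t *p = (const uint8_t *)user_ptr;
    return p[-1] == DEMO_MAGIC;
}

/* Simulates foreign (non-HAKMEM) memory in which the byte preceding the
 * user pointer holds arbitrary data. Returns the header check's verdict. */
static int foreign_block_verdict(uint8_t preceding_byte) {
    uint8_t foreign[32];
    memset(foreign, 0, sizeof foreign);
    foreign[15] = preceding_byte;      /* arbitrary data, not a header */
    void *user_ptr = &foreign[16];     /* pointer later handed to free() */
    return looks_like_hakmem_demo(user_ptr);
}
```

Calling `foreign_block_verdict(DEMO_MAGIC)` reports the block as HAKMEM-owned even though no HAKMEM header was ever written, which is the misclassification that sends a libc pointer down the wrong free path.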
### Why SuperSlab OOM Happens
**High-contention scenario**:
- 4 threads × 1024 chunks each = 4096 concurrent allocations
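- All threads allocate small blocks of 8-128B (larson's min/max size arguments), which land in a handful of size classes (1, 4, and 6 are the ones seen failing)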
- SuperSlab runs out of slabs for that class
- No dynamic scaling → OOM
**Evidence**: `bitmap=0x00000000` means all 32 slabs exhausted
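The exhaustion condition can be sketched as follows, assuming (as the evidence line implies) that a set bit marks a free slab in a 32-slab SuperSlab. The function names are hypothetical, `__builtin_ctz` is a GCC/Clang builtin, and the sketch is single-threaded; the real refill path would need atomics or locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the lowest free slab, or -1 if none remain.
 * Assumption: a set bit means "free", so bitmap == 0 means exhausted. */
static int demo_pick_free_slab(uint32_t bitmap) {
    if (bitmap == 0) return -1;            /* all 32 slabs in use */
    return __builtin_ctz(bitmap);          /* lowest set bit (GCC/Clang) */
}

/* Claims one slab: clears its bit and returns the index,
 * or returns -1 when the SuperSlab is exhausted. */
static int demo_claim_slab(uint32_t *bitmap) {
    assert(bitmap != NULL);
    int idx = demo_pick_free_slab(*bitmap);
    if (idx >= 0) *bitmap &= ~(UINT32_C(1) << idx);
    return idx;
}
```

Starting from a full bitmap, the 33rd call to `demo_claim_slab` returns -1, which corresponds to the point where `superslab_refill()` reports OOM.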
---
## Your Mission: 3 Potential Fixes (Choose Best Approach)
### Option A: Disable malloc Fallback (Recommended - Safest)
**Idea**: Make allocation failures explicit instead of silently falling back
**Implementation**:
**File**: Find the allocation path that does malloc fallback (likely `core/box/hak_alloc_api.inc.h` or `core/hakmem_tiny.c`)
**Change**:
```c
// Before (BROKEN):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (ptr) return ptr;
    // Fallback to malloc (causes mixed allocations)
    return malloc(size); // ❌ BAD
}

// After (SAFE):
void* hak_alloc(size_t size) {
    void* ptr = hak_tiny_alloc(size);
    if (!ptr) {
        // OOM: Log and fail explicitly.
        // NOTE: stdio may allocate internally; if hak_alloc backs the
        // malloc wrapper, guard this call against re-entry.
        fprintf(stderr, "[HAKMEM] OOM for size=%zu, returning NULL\n", size);
        errno = ENOMEM;
        return NULL; // ✅ Explicit failure
    }
    return ptr;
}
```
**Pros**:
- Simple and safe
- No mixed allocations
- Caller can handle OOM explicitly
**Cons**:
- Applications must handle NULL returns
- Might break code that assumes malloc never fails
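What "handle NULL returns" means on the caller side can be sketched like this. `demo_alloc` is a hypothetical stand-in for `hak_alloc()` after Option A (a fixed pool that fails explicitly once exhausted), and `demo_copy_message` is an illustrative caller, not code from the project.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for hak_alloc() after Option A: serves requests
 * from a fixed pool and fails explicitly once the pool is exhausted. */
static unsigned char demo_pool[256];
static size_t demo_used = 0;

static void *demo_alloc(size_t size) {
    if (size == 0 || size > sizeof demo_pool - demo_used) {
        errno = ENOMEM;
        return NULL;               /* explicit failure, no libc fallback */
    }
    void *p = &demo_pool[demo_used];
    demo_used += size;
    return p;
}

/* Caller-side pattern: check for NULL and propagate the error
 * instead of dereferencing the result unconditionally. */
static int demo_copy_message(const char *msg, char **out) {
    assert(msg != NULL && out != NULL);
    size_t n = strlen(msg) + 1;
    char *buf = demo_alloc(n);
    if (buf == NULL) return -1;    /* errno is ENOMEM at this point */
    memcpy(buf, msg, n);
    *out = buf;
    return 0;
}
```

Code that skips the `buf == NULL` check is the kind that Option A would break, which is why the benchmark itself may need auditing before the fallback is removed.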
**Testing**:
```bash
# Should complete without crashes OR fail cleanly with OOM message
./larson_hakmem 10 8 128 1024 1 12345 4
```
---
### Option B: Fix SuperSlab Starvation (Recommended - Best Long-term)
**Idea**: Prevent OOM by dynamically scaling SuperSlab capacity
**Implementation**:
**File**: `core/tiny_superslab_alloc.inc.h` or SuperSlab management code
**Change 1: Detect starvation**:
```c
// In superslab_refill()
if (bitmap == 0x00000000) {
    // All slabs exhausted → try to allocate more
    fprintf(stderr, "[HAKMEM] SuperSlab class %d exhausted, allocating more...\n", class_idx);
    // Allocate a new SuperSlab
    SuperSlab* new_ss = allocate_superslab(class_idx);
    if (new_ss) {
        register_superslab(new_ss);
        // Retry refill from new SuperSlab
        return refill_from_superslab(new_ss, class_idx, count);
    }
}
```
**Change 2: Increase initial capacity for hot classes**:
```c
// In SuperSlab initialization
// Classes 1, 4, 6 are hot in multi-threaded workloads
if (class_idx == 1 || class_idx == 4 || class_idx == 6) {
    initial_slabs = 64; // Double capacity for hot classes
} else {
    initial_slabs = 32; // Default
}
```
**Pros**:
- Fixes root cause (OOM)
- No mixed allocations needed
- Scales naturally with workload
**Cons**:
- More complex
- Memory overhead for extra SuperSlabs
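One way to picture the dynamic scaling in Option B is a linked list of fixed-size chunks that grows on exhaustion. The sketch below is a single-threaded illustration under stated assumptions: `DemoChunk`, `DemoSlabList`, and both functions are hypothetical, each chunk tracks 32 slabs with a set bit meaning "free", and `malloc` stands in for whatever the real allocator uses to obtain chunk memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chunk: 32 slabs tracked by a bitmap (set bit = free). */
typedef struct DemoChunk {
    uint32_t free_bitmap;
    struct DemoChunk *next;
} DemoChunk;

typedef struct {
    DemoChunk *head;
    size_t chunk_count;
} DemoSlabList;

/* Allocates a fresh chunk with all 32 slabs free and links it in. */
static DemoChunk *demo_expand(DemoSlabList *list) {
    DemoChunk *c = malloc(sizeof *c);   /* stand-in for the real source */
    if (c == NULL) return NULL;
    c->free_bitmap = UINT32_MAX;
    c->next = list->head;
    list->head = c;
    list->chunk_count++;
    return c;
}

/* Claims one slab, growing the list when every chunk is exhausted.
 * Returns 0 on success, -1 only if expansion itself fails. */
static int demo_claim(DemoSlabList *list, DemoChunk **chunk_out, int *slab_out) {
    assert(list != NULL && chunk_out != NULL && slab_out != NULL);
    DemoChunk *c = list->head;
    while (c != NULL && c->free_bitmap == 0) c = c->next;  /* skip full chunks */
    if (c == NULL) {
        c = demo_expand(list);          /* grow instead of reporting OOM */
        if (c == NULL) return -1;
    }
    int idx = __builtin_ctz(c->free_bitmap);   /* GCC/Clang builtin */
    c->free_bitmap &= ~(UINT32_C(1) << idx);
    *chunk_out = c;
    *slab_out = idx;
    return 0;
}
```

With this shape, the 33rd claim allocates a second chunk rather than failing, so OOM is reported only when the underlying memory source is truly exhausted. A multi-threaded version would need atomic bitmap updates and a safe way to publish new chunks.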
**Testing**:
```bash
# Should complete 100% of the time without OOM
for i in {1..20}; do ./larson_hakmem 10 8 128 1024 1 12345 4; done
```
---
### Option C: Add Allocation Ownership Tracking (Comprehensive)
**Idea**: Track which allocator owns each pointer
**Implementation**:
**File**: `core/box/hak_free_api.inc.h` or free path
**Change 1: Add ownership bitmap**:
```c
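#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

// Global bitmap to track HAKMEM allocations
// Each bit represents a 64KB region
#define OWNERSHIP_REGION_SHIFT 16           // 64KB regions
#define OWNERSHIP_BITMAP_SIZE (1ULL << 20)  // 1M bits = 64GB coverage
static _Atomic uint64_t g_hakmem_ownership_bitmap[OWNERSHIP_BITMAP_SIZE / 64];

// Mark every region touched by [ptr, ptr + size) as HAKMEM-owned.
// Returns 0 if the range falls outside the bitmap's 64GB coverage.
// NOTE: typical 64-bit mmap addresses lie far above 64GB, so a real
// implementation needs a larger or multi-level table.
static inline int mark_hakmem_allocation(void* ptr, size_t size) {
    uintptr_t first = (uintptr_t)ptr >> OWNERSHIP_REGION_SHIFT;
    uintptr_t last = ((uintptr_t)ptr + (size ? size - 1 : 0)) >> OWNERSHIP_REGION_SHIFT;
    if (last >= OWNERSHIP_BITMAP_SIZE) return 0;  // would index out of bounds
    for (uintptr_t region = first; region <= last; region++) {
        atomic_fetch_or(&g_hakmem_ownership_bitmap[region / 64], 1ULL << (region % 64));
    }
    return 1;
}

// Check if allocation is HAKMEM-owned
static inline int is_hakmem_allocation(void* ptr) {
    uintptr_t region = (uintptr_t)ptr >> OWNERSHIP_REGION_SHIFT;
    if (region >= OWNERSHIP_BITMAP_SIZE) return 0;  // outside coverage
    uint64_t word = atomic_load(&g_hakmem_ownership_bitmap[region / 64]);
    return (word & (1ULL << (region % 64))) != 0;
}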
```
**Change 2: Use ownership in free path**:
```c
void hak_free(void* ptr) {
    if (is_hakmem_allocation(ptr)) {
        hakmem_free(ptr); // ✅ Confirmed HAKMEM
    } else {
        free(ptr); // ✅ Confirmed libc malloc
    }
}
```
**Pros**:
- Allows mixed allocations safely
- Works with existing malloc fallback
**Cons**:
- Complex to implement correctly
- Memory overhead for bitmap
- Atomic operations on free path
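If the bitmap's costs are a concern, a cheaper variant of ownership tracking is a plain address-range test. This is a different technique from the bitmap and rests on an assumption the source does not state: that all HAKMEM memory comes from one contiguous reserved arena. `DemoArena` and `demo_arena_owns` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the allocator reserves one contiguous arena at startup. */
typedef struct {
    uintptr_t base;   /* first byte of the arena */
    size_t    size;   /* arena length in bytes */
} DemoArena;

/* Returns nonzero if ptr lies inside the arena. */
static int demo_arena_owns(const DemoArena *a, const void *ptr) {
    assert(a != NULL);
    uintptr_t addr = (uintptr_t)ptr;
    /* One unsigned comparison covers both bounds: an address below
     * base wraps around to a huge value and fails the test. */
    return (addr - a->base) < a->size;
}
```

The check needs no atomics and no per-allocation bookkeeping, but it only applies if SuperSlabs are carved from a single reservation; with scattered mappings the bitmap or a multi-level table is still required.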
---
## Recommendation: **Combine Option A + Option B**
**Phase 1 (Immediate - 1 hour)**: Disable malloc fallback (Option A)
- Quick and safe fix
- Prevents crashes immediately
- Test 4T stability → should be 100% crash-free (a run may still fail cleanly with an OOM message)
**Phase 2 (Next - 2-4 hours)**: Fix SuperSlab starvation (Option B)
- Implement dynamic SuperSlab scaling
- Increase capacity for hot classes (1, 4, 6)
- Remove Option A workaround
**Phase 3 (Optional)**: Add ownership tracking (Option C) for defense-in-depth
---
## Testing Requirements
### Test 1: Stability (CRITICAL)
```bash
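# Must achieve 100% success rate
pass=0
for i in {1..20}; do
    echo "Run $i:"
    out=$(env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
        ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1)
    rc=$?  # exit code of larson_hakmem itself, not of grep
    echo "$out" | grep "Throughput"
    echo "Exit code: $rc"
    [ "$rc" -eq 0 ] && pass=$((pass + 1))
done
echo "Success: $pass/20"
# Expected: 20/20 success (100%)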
```
### Test 2: Performance (No regression)
```bash
# Should maintain ~981K ops/s
env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
./larson_hakmem 10 8 128 1024 1 12345 4
# Expected: Throughput ≈ 981K ops/s (same as before)
```
### Test 3: Regression Check
```bash
# Ensure low-contention still works
./larson_hakmem 1 1 128 1024 1 12345 1 # 1T
./larson_hakmem 2 8 128 1024 1 12345 2 # 2T
./larson_hakmem 10 8 128 256 1 12345 4 # 4T low
# Expected: All complete successfully
```
---
## Success Criteria
- **4T high-contention stability: 100% (20/20 runs)**
- **No performance regression** (≥950K ops/s)
- **No crashes or OOM errors**
- **1T/2T/4T low-contention still work**
---
## Files to Review/Modify
**Likely files** (search for malloc fallback):
1. `core/box/hak_alloc_api.inc.h` - Main allocation API
2. `core/hakmem_tiny.c` - Tiny allocator implementation
3. `core/tiny_alloc_fast.inc.h` - Fast path allocation
4. `core/tiny_superslab_alloc.inc.h` - SuperSlab allocation
5. `core/hakmem_tiny_refill_p0.inc.h` - Refill logic
**Search commands**:
```bash
# Find malloc fallback
grep -rn "malloc(" core/ | grep -v "//.*malloc"
# Find OOM handling
grep -rn "errno.*ENOMEM\|OOM\|returned NULL" core/
# Find SuperSlab allocation
grep -rn "superslab_refill\|allocate.*superslab" core/
```
---
## Expected Deliverable
**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE7_MIXED_ALLOCATION_FIX.md`
**Required sections**:
1. **Approach chosen** (A, B, C, or combination)
2. **Code changes** (diffs showing before/after)
3. **Why it works** (explanation of fix)
4. **Test results** (20/20 stability test)
5. **Performance impact** (before/after comparison)
6. **Production readiness** (YES/NO verdict)
---
## Context Documents
- `PHASE7_4T_STABILITY_VERIFICATION.md` - Recent stability test (30% success)
- `PHASE7_BUG3_FIX_REPORT.md` - Previous debugging attempts
- `PHASE7_FINAL_BENCHMARK_RESULTS.md` - Overall Phase 7 results
- `CLAUDE.md` - Project history and status
---
## Questions? Debug Hints
**Q: Where is the malloc fallback code?**
A: Search for `malloc(` in `core/box/*.inc.h` and `core/hakmem_tiny*.c`
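**Q: How do I test just the fix without full rebuild?**
A: Run `make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem` for an incremental build. Prepend `make clean &&` only if the build flags differ from the previous build, since that forces a full rebuild.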
**Q: What if Option A causes application crashes?**
A: That's expected if the app doesn't handle malloc failures. Move to Option B.
**Q: How do I know if SuperSlab OOM is fixed?**
A: No more `[DEBUG] superslab_refill returned NULL (OOM)` messages in output
---
**Good luck! Let's achieve 100% stability! 🚀**