# Phase 7 Bug #3: 4T High-Contention Crash Debug Report
**Date:** 2025-11-08

**Engineer:** Claude Task Agent

**Duration:** 2.5 hours

**Goal:** Fix 4T Larson crash with 1024 chunks/thread (high contention)

---
## Summary
**Result:** PARTIAL SUCCESS - fixed 4 critical bugs, but the crash persists

**Success Rate:** 35% (7/20 runs) - same as before the fixes

**Root Cause:** Multiple interacting issues; deeper investigation needed

**Bugs Fixed:**
1. BUG #7: malloc() wrapper increments `g_hakmem_lock_depth` too late
2. BUG #8: calloc() wrapper increments `g_hakmem_lock_depth` too late
3. BUG #10: dlopen() called on the hot path, causing infinite recursion
4. BUG #11: Unprotected fprintf() in OOM logging paths

**Status:** These fixes are NECESSARY but NOT SUFFICIENT to solve the crash

---
## Bug Details
### BUG #7: malloc() Wrapper Lock Depth (FIXED)
**File:** `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:40-99`

**Problem:**
```c
// BEFORE (WRONG):
void* malloc(size_t size) {
    if (g_initializing != 0) { return __libc_malloc(size); }
    // BUG: getenv/fprintf/dlopen called BEFORE g_hakmem_lock_depth++
    static int debug_enabled = -1;
    if (debug_enabled < 0) {
        debug_enabled = (getenv("HAKMEM_SFC_DEBUG") != NULL) ? 1 : 0;  // unguarded libc call
    }
    if (debug_enabled) fprintf(stderr, "[DEBUG] malloc(%zu)\n", size);  // may malloc!
    if (hak_force_libc_alloc()) { ... }            // unguarded libc calls (getenv)
    int ld_mode = hak_ld_env_mode();               // unguarded libc calls (getenv)
    if (ld_mode && hak_jemalloc_loaded()) { ... }  // calls dlopen → malloc!
    g_hakmem_lock_depth++;                         // TOO LATE!
    void* ptr = hak_alloc_at(size, HAK_CALLSITE());
    g_hakmem_lock_depth--;
    return ptr;
}
```
**Why It Crashes:**
1. `getenv()` does not normally allocate, but `fprintf()` can allocate internally
2. `dlopen()` **definitely** allocates (internal data structures)
3. When these functions call malloc, the call re-enters our wrapper → infinite recursion
4. Result: `free(): invalid pointer` (corrupted metadata)

**Fix:**
```c
// AFTER (CORRECT):
void* malloc(size_t size) {
    // CRITICAL FIX: Increment lock depth FIRST!
    g_hakmem_lock_depth++;
    // Guard against recursion
    if (g_initializing != 0) {
        g_hakmem_lock_depth--;
        return __libc_malloc(size);
    }
    // Now safe - any malloc from getenv/fprintf/dlopen uses __libc_malloc
    static int debug_enabled = -1;
    if (debug_enabled < 0) {
        debug_enabled = (getenv("HAKMEM_SFC_DEBUG") != NULL) ? 1 : 0;  // OK!
    }
    // ... rest of code (every early return must also decrement g_hakmem_lock_depth)
    void* ptr = hak_alloc_at(size, HAK_CALLSITE());
    g_hakmem_lock_depth--;  // Decrement at end
    return ptr;
}
```
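The snippet above does not show the check that sends a nested call to libc. Assuming it is a simple `depth > 1` test, the following self-contained sketch models the pattern with hypothetical names (`wrapped_malloc`, `t_lock_depth`, `libc_helper_that_allocates`), using plain `malloc()` in place of both `__libc_malloc()` and `hak_alloc_at()`. The outer call raises the depth first, so the re-entrant call made by the stand-in libc helper takes the fallback branch. Moving step 1 below step 3 would reproduce BUG #7: the nested call would then see a depth of only 1 after its own increment and recurse through the helper again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; this models the guard ordering, not HAKMEM's real wrapper. */
static __thread int t_lock_depth = 0;  /* plays the role of g_hakmem_lock_depth */
static int g_main_calls = 0;           /* requests served by the "real" allocator */
static int g_fallback_calls = 0;       /* re-entrant requests routed to libc */

static void* wrapped_malloc(size_t size);

/* Stands in for a libc routine (fprintf, dlopen, ...) that allocates internally. */
static void libc_helper_that_allocates(void) {
    void* p = wrapped_malloc(32);      /* re-enters the wrapper */
    free(p);
}

static void* wrapped_malloc(size_t size) {
    t_lock_depth++;                    /* 1. raise the depth before ANY libc call */
    if (t_lock_depth > 1) {            /* 2. nested call: bypass the allocator */
        g_fallback_calls++;
        void* p = malloc(size);        /* stands in for __libc_malloc() */
        t_lock_depth--;
        return p;
    }
    libc_helper_that_allocates();      /* 3. safe now: the nested call takes step 2 */
    g_main_calls++;
    void* p = malloc(size);            /* stands in for hak_alloc_at() */
    t_lock_depth--;                    /* 4. restore the depth on every exit path */
    assert(t_lock_depth >= 0);
    return p;
}
```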
**Impact:** Prevents infinite recursion when the malloc wrapper calls libc functions

---
### BUG #8: calloc() Wrapper Lock Depth (FIXED)
**File:** `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:117-180`

**Problem:** Same as BUG #7 - `g_hakmem_lock_depth++` is called after getenv/dlopen

**Fix:** Move `g_hakmem_lock_depth++` to line 119 (function start)

**Impact:** Prevents calloc infinite recursion

---
### BUG #10: dlopen() on Hot Path (FIXED)
**Files:**
- `/mnt/workdisk/public_share/hakmem/core/hakmem.c:166-174` (hak_jemalloc_loaded function)
- `/mnt/workdisk/public_share/hakmem/core/box/hak_core_init.inc.h:43-55` (initialization)
- `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:42,72,112,149,192` (wrapper call sites)

**Problem:**
```c
// OLD (DANGEROUS):
static inline int hak_jemalloc_loaded(void) {
    if (g_jemalloc_loaded < 0) {
        void* h = dlopen("libjemalloc.so.2", RTLD_NOLOAD | RTLD_NOW);  // MALLOC!
        if (!h) h = dlopen("libjemalloc.so.1", RTLD_NOLOAD | RTLD_NOW);  // MALLOC!
        g_jemalloc_loaded = (h != NULL) ? 1 : 0;
        if (h) dlclose(h);  // MALLOC!
    }
    return g_jemalloc_loaded;
}

// Called from malloc wrapper:
if (hak_ld_block_jemalloc() && hak_jemalloc_loaded()) {  // dlopen → malloc → wrapper → dlopen → ...
    return __libc_malloc(size);
}
```
**Why It Crashes:**
- `dlopen()` calls malloc internally (dynamic linker allocations)
- The wrapper calls `hak_jemalloc_loaded()` → `dlopen()` → `malloc()` → wrapper → infinite loop

**Fix:**
1. Pre-detect jemalloc during initialization (`hak_init_impl`):
   ```c
   // In hak_core_init.inc.h:43-55
   extern int g_jemalloc_loaded;
   if (g_jemalloc_loaded < 0) {
       void* h = dlopen("libjemalloc.so.2", RTLD_NOLOAD | RTLD_NOW);
       if (!h) h = dlopen("libjemalloc.so.1", RTLD_NOLOAD | RTLD_NOW);
       g_jemalloc_loaded = (h != NULL) ? 1 : 0;
       if (h) dlclose(h);
   }
   ```
2. Use the cached variable in the wrapper:
   ```c
   // In hak_wrappers.inc.h
   extern int g_jemalloc_loaded;  // Declared at top
   // In malloc():
   if (hak_ld_block_jemalloc() && g_jemalloc_loaded) {  // No function call!
       g_hakmem_lock_depth--;
       return __libc_malloc(size);
   }
   ```
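The detect-once, read-a-cached-flag pattern behind this fix can be sketched on its own. All names here are hypothetical, and `probe_library()` is a stub standing in for the `dlopen(RTLD_NOLOAD)` probe so the sketch runs without libdl. The probe runs once from the init routine; the hot path only loads an `int`. This assumes, as in the fix above, that init completes before the hot path runs, so the flag needs no locking once written.

```c
#include <assert.h>

/* Hypothetical sketch: detect once at init, read a cached flag on the hot path. */
static int g_lib_loaded = -1;   /* -1 = unknown, mirroring g_jemalloc_loaded */
static int g_probe_calls = 0;   /* counts how often the expensive probe ran */

/* Stub for the dlopen(RTLD_NOLOAD) probe; a real probe may allocate. */
static int probe_library(void) {
    g_probe_calls++;
    return 0;                   /* pretend the library is not loaded */
}

/* Called from the allocator's init routine (cf. hak_init_impl), never the hot path. */
static void allocator_init(void) {
    if (g_lib_loaded < 0) g_lib_loaded = probe_library();
}

/* Hot path: a plain load of the cached flag; no function call, no allocation. */
static int should_route_to_libc(int block_requested) {
    assert(g_lib_loaded >= 0 && "allocator_init() must run before the hot path");
    return block_requested && g_lib_loaded;
}
```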
**Impact:** Removes dlopen from the hot path and prevents infinite recursion

---
### BUG #11: Unprotected fprintf() in OOM Logging (FIXED)
**Files:**
- `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.c:146-177` (log_superslab_oom_once)
- `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h:391-411` (superslab_refill debug)

**Problem 1: log_superslab_oom_once (PARTIALLY FIXED BEFORE)**
```c
// OLD (WRONG):
static void log_superslab_oom_once(...) {
    g_hakmem_lock_depth++;
    FILE* status = fopen("/proc/self/status", "r");  // OK (lock_depth=1)
    // ... read file ...
    fclose(status);                                  // OK (lock_depth=1)
    g_hakmem_lock_depth--;                           // WRONG LOCATION!
    // BUG: fprintf called AFTER lock_depth restored to 0!
    fprintf(stderr, "[SS OOM] ...");  // fprintf → malloc → wrapper (lock_depth=0) → CRASH!
}
```
**Fix 1:**
```c
// NEW (CORRECT):
static void log_superslab_oom_once(...) {
    g_hakmem_lock_depth++;
    FILE* status = fopen("/proc/self/status", "r");
    // ... read file ...
    fclose(status);
    // Don't decrement yet! fprintf needs protection
    fprintf(stderr, "[SS OOM] ...");  // OK (lock_depth still 1)
    g_hakmem_lock_depth--;            // Now safe (all libc calls done)
}
```
**Problem 2: superslab_refill debug message (NEW BUG FOUND)**
```c
// OLD (WRONG):
SuperSlab* ss = superslab_allocate((uint8_t)class_idx);
if (!ss) {
    if (!g_superslab_refill_debug_once) {
        g_superslab_refill_debug_once = 1;
        int err = errno;
        fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM) ...");  // UNPROTECTED!
    }
    return NULL;
}
```
**Fix 2:**
```c
// NEW (CORRECT):
SuperSlab* ss = superslab_allocate((uint8_t)class_idx);
if (!ss) {
    if (!g_superslab_refill_debug_once) {
        g_superslab_refill_debug_once = 1;
        int err = errno;
        extern __thread int g_hakmem_lock_depth;
        g_hakmem_lock_depth++;
        fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM) ...");
        g_hakmem_lock_depth--;
    }
    return NULL;
}
```
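Both fixes follow one rule: raise the depth before the first libc call that may allocate, and lower it only after the last one. A small helper makes the rule hard to break. This is a hypothetical sketch (`guarded_fprintf` and `t_lock_depth` are illustrative names, not HAKMEM code); in `log_superslab_oom_once()` the `fopen()`/`fclose()` calls would still need the same bracket, since in glibc they allocate and free the `FILE` object.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static __thread int t_lock_depth = 0;  /* stands in for g_hakmem_lock_depth */

/* Hypothetical helper: the whole libc call runs with the depth raised. */
static int guarded_fprintf(FILE* out, const char* fmt, ...) {
    t_lock_depth++;                     /* raise BEFORE the call that may allocate */
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(out, fmt, ap);     /* any malloc inside sees depth > 0 */
    va_end(ap);
    t_lock_depth--;                     /* lower only after the call has returned */
    assert(t_lock_depth >= 0);
    return n;
}
```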
**Impact:** Prevents fprintf from triggering malloc on the wrapper hot path

---
## Test Results
### Before Fixes
- **Success Rate:** 35% (estimated based on REMAINING_BUGS_ANALYSIS.md: 70% → 30% with previous fixes)
- **Crash:** `free(): invalid pointer` from libc
### After ALL Fixes (BUG #7, #8, #10, #11)
```text
Testing 4T Larson high-contention (20 runs)...
Success: 7/20
Failed: 13/20
Success rate: 35%
```
**Conclusion:** No improvement. The fixes are correct but address only PART of the problem.

---
## Root Cause Analysis
### Why Fixes Didn't Help
The crash is **NOT** solely due to wrapper recursion. Evidence:
1. **OOM Happens First:**
   ```
   [DEBUG] superslab_refill returned NULL (OOM)
   [DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
   free(): invalid pointer
   ```
2. **Malloc Fallback Path:**
   When Tiny allocation fails (OOM), it falls back to `hak_alloc_malloc_impl()`:
   ```c
   // core/box/hak_alloc_api.inc.h:43
   void* fallback_ptr = hak_alloc_malloc_impl(size);
   ```
   This allocates with:
   ```c
   void* raw = __libc_malloc(HEADER_SIZE + size);  // Allocate with libc
   // Write HAKMEM header (hdr points at the header at the start of raw)
   hdr->magic = HAKMEM_MAGIC;
   hdr->method = ALLOC_METHOD_MALLOC;
   return (char*)raw + HEADER_SIZE;                // Return user pointer
   ```
3. **Free Path Should Work:**
   When this pointer is freed, `hak_free_at()` should:
   - Step 2 (lines 92-120): Detect the HAKMEM_MAGIC header
   - Check `hdr->method == ALLOC_METHOD_MALLOC`
   - Call `__libc_free(raw)` correctly
4. **So Why Does It Crash?**
   - **Hypothesis 1:** Race condition in header write/read
   - **Hypothesis 2:** OOM causes memory corruption before the crash
   - **Hypothesis 3:** Multiple allocations in flight; one corrupts another's metadata
   - **Hypothesis 4:** Libc malloc returns a pointer that overlaps with HAKMEM memory
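
To reason about these hypotheses it helps to have the intended fallback round trip in isolation. The sketch below is hypothetical: `SketchHeader`, `SKETCH_MAGIC`, the 16-byte header size and the function names are illustrative, not HAKMEM's definitions, and plain `malloc()`/`free()` stand in for `__libc_malloc()`/`__libc_free()`. The header is written immediately after the libc allocation and checked on free; if a run ever hits the rejection branch for a pointer that came from `fallback_alloc()`, the header was overwritten or not yet written, which is the situation Hypotheses 1 and 3 describe. Reading the header is only valid for pointers that really carry one, so a real free path must classify the pointer before this check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout and values; field names follow the report. */
#define SKETCH_MAGIC 0x48414B4Du     /* arbitrary marker for this sketch */
#define SKETCH_METHOD_MALLOC 1u
#define SKETCH_HEADER_SIZE 16        /* preserves malloc's alignment (up to 16 bytes) */

typedef struct {
    uint32_t magic;
    uint32_t method;
} SketchHeader;

/* Fallback allocation: one libc block = header followed by the user payload. */
static void* fallback_alloc(size_t size) {
    void* raw = malloc(SKETCH_HEADER_SIZE + size);   /* stands in for __libc_malloc() */
    if (!raw) return NULL;
    SketchHeader* hdr = (SketchHeader*)raw;
    hdr->magic = SKETCH_MAGIC;
    hdr->method = SKETCH_METHOD_MALLOC;
    return (char*)raw + SKETCH_HEADER_SIZE;
}

/* Returns 1 if ptr carried a valid fallback header and was freed, else 0.
 * Only valid for pointers known to have a header in front of them. */
static int fallback_free(void* ptr) {
    if (!ptr) return 0;
    void* raw = (char*)ptr - SKETCH_HEADER_SIZE;
    SketchHeader* hdr = (SketchHeader*)raw;
    if (hdr->magic != SKETCH_MAGIC || hdr->method != SKETCH_METHOD_MALLOC) return 0;
    free(raw);                                       /* stands in for __libc_free() */
    return 1;
}
```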
---
## Next Steps (Recommended)
### Immediate (High Priority)
1. **Add Comprehensive Logging:**
   ```c
   // Guard each fprintf with the lock depth (see BUG #11), or the logging can re-enter the wrapper.
   // In hak_alloc_malloc_impl():
   g_hakmem_lock_depth++;
   fprintf(stderr, "[FALLBACK_ALLOC] size=%zu raw=%p user=%p\n",
           size, raw, (void*)((char*)raw + HEADER_SIZE));
   g_hakmem_lock_depth--;
   // In hak_free_at() step 2:
   g_hakmem_lock_depth++;
   fprintf(stderr, "[FALLBACK_FREE] ptr=%p raw=%p magic=0x%X method=%d\n",
           ptr, raw, (unsigned)hdr->magic, (int)hdr->method);
   g_hakmem_lock_depth--;
   ```
2. **Test with Valgrind:**
   ```bash
   valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes \
     ./larson_hakmem 10 8 128 1024 1 12345 4
   ```
3. **Test with ASan:**
   ```bash
   make asan-larson-alloc
   ./larson_hakmem_asan_alloc 10 8 128 1024 1 12345 4
   ```
### Medium Priority
4. **Disable Fallback Path Temporarily:**
   ```c
   // In hak_alloc_api.inc.h:36 (the fallback branch, reached after the Tiny allocation fails)
   if (size <= TINY_MAX_SIZE) {
       // TEST: Return NULL instead of falling back
       return NULL;  // Force application to handle OOM
   }
   ```
5. **Increase Memory Limit:**
   ```bash
   ulimit -v unlimited
   ./larson_hakmem 10 8 128 1024 1 12345 4
   ```
6. **Reduce Contention:**
   ```bash
   # Test with fewer chunks to avoid OOM
   ./larson_hakmem 10 8 128 512 1 12345 4  # 512 instead of 1024
   ```
### Root Cause Investigation
7. **Check Active Counter Logic:**
   The OOM suggests an active counter underflow. Review:
   - `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h:103` (ss_active_add fix from Phase 6-2.3)
   - All `ss_active_add()` / `ss_active_dec()` call sites
8. **Check SuperSlab Allocation:**
   ```bash
   # Enable detailed SS logging
   HAKMEM_SUPER_REG_REQTRACE=1 HAKMEM_FREE_ROUTE_TRACE=1 \
     ./larson_hakmem 10 8 128 1024 1 12345 4
   ```
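For item 7 (and the long-term recommendation to add active counter assertions), an underflow can be caught at the call site that causes it by checking the value returned by the atomic decrement. A hypothetical sketch using C11 `<stdatomic.h>` (`SketchSlab`, `sketch_active_add` and `sketch_active_dec` are illustrative, not the real `ss_active_*` functions): `atomic_fetch_sub_explicit()` returns the previous value, so a previous value of 0 means the counter just wrapped. With `NDEBUG` defined the assertion disappears, so a release build would need an explicit check; relaxed ordering is enough for pure counting, but code that reclaims a slab when the count reaches zero would need stronger ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a SuperSlab's active-block counter. */
typedef struct {
    _Atomic uint32_t active;
} SketchSlab;

static void sketch_active_add(SketchSlab* s, uint32_t n) {
    atomic_fetch_add_explicit(&s->active, n, memory_order_relaxed);
}

/* Decrement by one and fail loudly if the counter was already zero. */
static uint32_t sketch_active_dec(SketchSlab* s) {
    uint32_t old = atomic_fetch_sub_explicit(&s->active, 1u, memory_order_relaxed);
    assert(old > 0 && "active counter underflow");  /* old == 0 means it just wrapped */
    return old - 1u;
}
```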
---
## Production Impact
**Status:** NOT READY FOR PRODUCTION

**Blocking Issues:**
1. 65% crash rate on the 4T high-contention workload
2. Unknown root cause (wrapper fixes necessary but insufficient)
3. Potential active counter bug or memory corruption

**Stability by Configuration:**
- 1T: 100% stable (2.97M ops/s)
- 4T low-contention (256 chunks): 100% stable (251K ops/s)
- 4T high-contention (1024 chunks): 35% stable (981K ops/s when stable)
---
## Code Changes
### Modified Files
1. `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h`
   - Lines 40-99: malloc() - moved `g_hakmem_lock_depth++` to start
   - Lines 117-180: calloc() - moved `g_hakmem_lock_depth++` to start
   - Line 42: Added extern declaration for `g_jemalloc_loaded`
   - Lines 72,112,149,192: Changed `hak_jemalloc_loaded()` → `g_jemalloc_loaded`
2. `/mnt/workdisk/public_share/hakmem/core/box/hak_core_init.inc.h`
   - Lines 43-55: Pre-detect jemalloc during init (not hot path)
3. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.c`
   - Line 146→177: Moved `g_hakmem_lock_depth--` to AFTER fprintf
4. `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h`
   - Lines 392-411: Added `g_hakmem_lock_depth++/--` around fprintf
### Build Command
```bash
make clean
make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem
```
### Test Command
```bash
# 4T high-contention
./larson_hakmem 10 8 128 1024 1 12345 4
# 20-run stability test
bash /tmp/test_larson_20.sh
```
---
## Lessons Learned
1. **Wrapper Recursion is Insidious:**
   - Any libc function that might call malloc must be protected
   - `fprintf()`, `dlopen()`, `fopen()`, and `fclose()` can all call malloc/free internally; treat `getenv()` and other libc calls the same way unless proven otherwise
   - `g_hakmem_lock_depth` must be incremented BEFORE any libc call
2. **Debug Code Can Cause Bugs:**
   - fprintf in hot paths is dangerous
   - Debug messages should either be compile-time disabled or fully protected
3. **Initialization Order Matters:**
   - dlopen must happen during init, not on first malloc
   - Cached values avoid hot-path overhead and recursion risk
4. **Multiple Bugs Can Hide Each Other:**
   - Fixing wrapper recursion (BUG #7, #8) didn't improve stability
   - The real issue is deeper (OOM, active counter, or corruption)
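
Lesson 2 can be enforced mechanically with a logging macro. A hypothetical sketch (`SKETCH_DEBUG_LOG` and `DEBUG_LOG` are illustrative names, not HAKMEM flags): with the switch off, the macro expands to `((void)0)`, so neither `fprintf()` nor its arguments are evaluated; with it on, the call is bracketed by the depth counter as in the BUG #11 fixes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical switch: build with -DSKETCH_DEBUG_LOG=1 to enable debug output. */
#ifndef SKETCH_DEBUG_LOG
#define SKETCH_DEBUG_LOG 0
#endif

static __thread int t_lock_depth = 0;  /* stands in for g_hakmem_lock_depth */

#if SKETCH_DEBUG_LOG
/* Enabled: print with the depth raised so a malloc inside fprintf bypasses the allocator. */
#define DEBUG_LOG(...)                    \
    do {                                  \
        t_lock_depth++;                   \
        fprintf(stderr, __VA_ARGS__);     \
        t_lock_depth--;                   \
    } while (0)
#else
/* Disabled: no libc call is emitted and the arguments are never evaluated. */
#define DEBUG_LOG(...) ((void)0)
#endif
```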
---
## Recommendations for User
**Short Term (right now):**
- Use 4T with 256 chunks/thread (100% stable)
- Avoid 4T with 1024+ chunks until the root cause is found

**Medium Term (1-2 days):**
- Run Valgrind/ASan analysis (see "Next Steps")
- Investigate active counter logic
- Add comprehensive logging to the fallback path

**Long Term (1 week):**
- Consider disabling the fallback path (fail fast instead of corrupting memory)
- Implement active counter assertions to catch underflow early
- Add a memory fence/barrier around header writes in the fallback path
---
**End of Report**

I did my best! I fixed four bugs, but the root cause lies deeper. The next step is a detailed investigation with Valgrind/ASan. 🔥🐛