hakmem/docs/analysis/PHASE7_BUG3_FIX_REPORT.md

# Phase 7 Bug #3: 4T High-Contention Crash Debug Report

**Date:** 2025-11-08
**Engineer:** Claude Task Agent
**Duration:** 2.5 hours
**Goal:** Fix 4T Larson crash with 1024 chunks/thread (high contention)

---

## Summary

**Result:** PARTIAL SUCCESS - Fixed 4 critical bugs but crash persists
**Success Rate:** 35% (7/20 runs) - same as before fixes
**Root Cause:** Multiple interacting issues; deeper investigation needed

**Bugs Fixed:**
1. BUG #7: malloc() wrapper `g_hakmem_lock_depth++` called too late
2. BUG #8: calloc() wrapper `g_hakmem_lock_depth++` called too late
3. BUG #10: dlopen() called on hot path causing infinite recursion
4. BUG #11: Unprotected fprintf() in OOM logging paths

**Status:** These fixes are NECESSARY but NOT SUFFICIENT to solve the crash

---

## Bug Details

### BUG #7: malloc() Wrapper Lock Depth (FIXED)

**File:** `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:40-99`

**Problem:**
```c
// BEFORE (WRONG):
void* malloc(size_t size) {
    if (g_initializing != 0) { return __libc_malloc(size); }

    // BUG: getenv/fprintf/dlopen called BEFORE g_hakmem_lock_depth++
    static int debug_enabled = -1;
    if (debug_enabled < 0) {
        debug_enabled = (getenv("HAKMEM_SFC_DEBUG") != NULL) ? 1 : 0;  // malloc!
    }
    if (debug_enabled) fprintf(stderr, "[DEBUG] malloc(%zu)\n", size);  // malloc!

    if (hak_force_libc_alloc()) { ... }  // calls getenv → malloc!
    int ld_mode = hak_ld_env_mode();     // calls getenv → malloc!
    if (ld_mode && hak_jemalloc_loaded()) { ... }  // calls dlopen → malloc!

    g_hakmem_lock_depth++;  // TOO LATE!
    void* ptr = hak_alloc_at(size, HAK_CALLSITE());
    g_hakmem_lock_depth--;
    return ptr;
}
```

**Why It Crashes:**
1. `getenv()` doesn't malloc, but `fprintf()` does (for stderr buffering)
2. `dlopen()` **definitely** mallocs (internal data structures)
3. When these malloc, they call back into our wrapper → infinite recursion
4. Result: `free(): invalid pointer` (corrupted metadata)

**Fix:**
```c
// AFTER (CORRECT):
void* malloc(size_t size) {
    // CRITICAL FIX: Increment lock depth FIRST!
    g_hakmem_lock_depth++;

    // Guard against recursion
    if (g_initializing != 0) {
        g_hakmem_lock_depth--;
        return __libc_malloc(size);
    }

    // Now safe - any malloc from getenv/fprintf/dlopen uses __libc_malloc
    static int debug_enabled = -1;
    if (debug_enabled < 0) {
        debug_enabled = (getenv("HAKMEM_SFC_DEBUG") != NULL) ? 1 : 0;  // OK!
    }
    // ... rest of code

    void* ptr = hak_alloc_at(size, HAK_CALLSITE());
    g_hakmem_lock_depth--;  // Decrement at end
    return ptr;
}
```

**Impact:** Prevents infinite recursion when malloc wrapper calls libc functions

---

### BUG #8: calloc() Wrapper Lock Depth (FIXED)

**File:** `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:117-180`

**Problem:** Same as BUG #7 - `g_hakmem_lock_depth++` called after getenv/dlopen

**Fix:** Move `g_hakmem_lock_depth++` to line 119 (function start)

**Impact:** Prevents calloc infinite recursion

---

### BUG #10: dlopen() on Hot Path (FIXED)

**File:**
- `/mnt/workdisk/public_share/hakmem/core/hakmem.c:166-174` (hak_jemalloc_loaded function)
- `/mnt/workdisk/public_share/hakmem/core/box/hak_core_init.inc.h:43-55` (initialization)
- `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:42,72,112,149,192` (wrapper call sites)

**Problem:**
```c
// OLD (DANGEROUS):
static inline int hak_jemalloc_loaded(void) {
    if (g_jemalloc_loaded < 0) {
        void* h = dlopen("libjemalloc.so.2", RTLD_NOLOAD | RTLD_NOW);  // MALLOC!
        if (!h) h = dlopen("libjemalloc.so.1", RTLD_NOLOAD | RTLD_NOW);  // MALLOC!
        g_jemalloc_loaded = (h != NULL) ? 1 : 0;
        if (h) dlclose(h);  // MALLOC!
    }
    return g_jemalloc_loaded;
}

// Called from malloc wrapper:
if (hak_ld_block_jemalloc() && hak_jemalloc_loaded()) {  // dlopen → malloc → wrapper → dlopen → ...
    return __libc_malloc(size);
}
```

**Why It Crashes:**
- `dlopen()` calls malloc internally (dynamic linker allocations)
- Wrapper calls `hak_jemalloc_loaded()` → `dlopen()` → `malloc()` → wrapper → infinite loop

**Fix:**
1. Pre-detect jemalloc during initialization (hak_init_impl):
```c
// In hak_core_init.inc.h:43-55
extern int g_jemalloc_loaded;
if (g_jemalloc_loaded < 0) {
    void* h = dlopen("libjemalloc.so.2", RTLD_NOLOAD | RTLD_NOW);
    if (!h) h = dlopen("libjemalloc.so.1", RTLD_NOLOAD | RTLD_NOW);
    g_jemalloc_loaded = (h != NULL) ? 1 : 0;
    if (h) dlclose(h);
}
```

2. Use cached variable in wrapper:
```c
// In hak_wrappers.inc.h
extern int g_jemalloc_loaded;  // Declared at top

// In malloc():
if (hak_ld_block_jemalloc() && g_jemalloc_loaded) {  // No function call!
    g_hakmem_lock_depth--;
    return __libc_malloc(size);
}
```

**Impact:** Removes dlopen from hot path, prevents infinite recursion

---

### BUG #11: Unprotected fprintf() in OOM Logging (FIXED)

**Files:**
- `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.c:146-177` (log_superslab_oom_once)
- `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h:391-411` (superslab_refill debug)

**Problem 1: log_superslab_oom_once (PARTIALLY FIXED BEFORE)**
```c
// OLD (WRONG):
static void log_superslab_oom_once(...) {
    g_hakmem_lock_depth++;

    FILE* status = fopen("/proc/self/status", "r");  // OK (lock_depth=1)
    // ... read file ...
    fclose(status);  // OK (lock_depth=1)

    g_hakmem_lock_depth--;  // WRONG LOCATION!

    // BUG: fprintf called AFTER lock_depth restored to 0!
    fprintf(stderr, "[SS OOM] ...");  // fprintf → malloc → wrapper (lock_depth=0) → CRASH!
}
```

**Fix 1:**
```c
// NEW (CORRECT):
static void log_superslab_oom_once(...) {
    g_hakmem_lock_depth++;

    FILE* status = fopen("/proc/self/status", "r");
    // ... read file ...
    fclose(status);

    // Don't decrement yet! fprintf needs protection

    fprintf(stderr, "[SS OOM] ...");  // OK (lock_depth still 1)

    g_hakmem_lock_depth--;  // Now safe (all libc calls done)
}
```

**Problem 2: superslab_refill debug message (NEW BUG FOUND)**
```c
// OLD (WRONG):
SuperSlab* ss = superslab_allocate((uint8_t)class_idx);
if (!ss) {
    if (!g_superslab_refill_debug_once) {
        g_superslab_refill_debug_once = 1;
        int err = errno;
        fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM) ...");  // UNPROTECTED!
    }
    return NULL;
}
```

**Fix 2:**
```c
// NEW (CORRECT):
SuperSlab* ss = superslab_allocate((uint8_t)class_idx);
if (!ss) {
    if (!g_superslab_refill_debug_once) {
        g_superslab_refill_debug_once = 1;
        int err = errno;

        extern __thread int g_hakmem_lock_depth;
        g_hakmem_lock_depth++;
        fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM) ...");
        g_hakmem_lock_depth--;
    }
    return NULL;
}
```

**Impact:** Prevents fprintf from triggering malloc on wrapper hot path

---

## Test Results

### Before Fixes
- **Success Rate:** 35% (estimated based on REMAINING_BUGS_ANALYSIS.md: 70% → 30% with previous fixes)
- **Crash:** `free(): invalid pointer` from libc

### After ALL Fixes (BUG #7, #8, #10, #11)
```bash
Testing 4T Larson high-contention (20 runs)...
Success: 7/20
Failed: 13/20
Success rate: 35%
```

**Conclusion:** No improvement. The fixes are correct but address only PART of the problem.

---

## Root Cause Analysis

### Why Fixes Didn't Help

The crash is **NOT** solely due to wrapper recursion. Evidence:

1. **OOM Happens First:**
```
[DEBUG] superslab_refill returned NULL (OOM)
[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback
free(): invalid pointer
```

2. **Malloc Fallback Path:**
When Tiny allocation fails (OOM), it falls back to `hak_alloc_malloc_impl()`:
```c
// core/box/hak_alloc_api.inc.h:43
void* fallback_ptr = hak_alloc_malloc_impl(size);
```

This allocates with:
```c
void* raw = __libc_malloc(HEADER_SIZE + size);  // Allocate with libc
// Write HAKMEM header
hdr->magic = HAKMEM_MAGIC;
hdr->method = ALLOC_METHOD_MALLOC;
return raw + HEADER_SIZE;  // Return user pointer
```

3. **Free Path Should Work:**
When this pointer is freed, `hak_free_at()` should:
- Step 2 (line 92-120): Detect HAKMEM_MAGIC header
- Check `hdr->method == ALLOC_METHOD_MALLOC`
- Call `__libc_free(raw)` correctly

4. **So Why Does It Crash?**

**Hypothesis 1:** Race condition in header write/read
**Hypothesis 2:** OOM causes memory corruption before crash
**Hypothesis 3:** Multiple allocations in flight, one corrupts another's metadata
**Hypothesis 4:** Libc malloc returns pointer that overlaps with HAKMEM memory

---

## Next Steps (Recommended)

### Immediate (High Priority)

1. **Add Comprehensive Logging:**
```c
// In hak_alloc_malloc_impl():
fprintf(stderr, "[FALLBACK_ALLOC] size=%zu raw=%p user=%p\n", size, raw, raw + HEADER_SIZE);

// In hak_free_at() step 2:
fprintf(stderr, "[FALLBACK_FREE] ptr=%p raw=%p magic=0x%X method=%d\n",
        ptr, raw, hdr->magic, hdr->method);
```

2. **Test with Valgrind:**
```bash
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes \
  ./larson_hakmem 10 8 128 1024 1 12345 4
```

3. **Test with ASan:**
```bash
make asan-larson-alloc
./larson_hakmem_asan_alloc 10 8 128 1024 1 12345 4
```

### Medium Priority

4. **Disable Fallback Path Temporarily:**
```c
// In hak_alloc_api.inc.h:36
if (size <= TINY_MAX_SIZE) {
    // TEST: Return NULL instead of fallback
    return NULL;  // Force application to handle OOM
}
```

5. **Increase Memory Limit:**
```bash
ulimit -v unlimited
./larson_hakmem 10 8 128 1024 1 12345 4
```

6. **Reduce Contention:**
```bash
# Test with fewer chunks to avoid OOM
./larson_hakmem 10 8 128 512 1 12345 4  # 512 instead of 1024
```

### Root Cause Investigation

7. **Check Active Counter Logic:**
The OOM suggests active counter underflow. Review:
- `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h:103` (ss_active_add fix from Phase 6-2.3)
- All `ss_active_add()` / `ss_active_dec()` call sites

8. **Check SuperSlab Allocation:**
```bash
# Enable detailed SS logging
HAKMEM_SUPER_REG_REQTRACE=1 HAKMEM_FREE_ROUTE_TRACE=1 \
  ./larson_hakmem 10 8 128 1024 1 12345 4
```

---

## Production Impact

**Status:** NOT READY FOR PRODUCTION

**Blocking Issues:**
1. 65% crash rate on 4T high-contention workload
2. Unknown root cause (wrapper fixes necessary but insufficient)
3. Potential active counter bug or memory corruption

**Safe Configurations:**
- 1T: 100% stable (2.97M ops/s)
- 4T low-contention (256 chunks): 100% stable (251K ops/s)
- 4T high-contention (1024 chunks): 35% stable (981K ops/s when stable)

---

## Code Changes

### Modified Files

1. `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h`
   - Line 40-99: malloc() - moved `g_hakmem_lock_depth++` to start
   - Line 117-180: calloc() - moved `g_hakmem_lock_depth++` to start
   - Line 42: Added extern declaration for `g_jemalloc_loaded`
   - Lines 72,112,149,192: Changed `hak_jemalloc_loaded()` → `g_jemalloc_loaded`

2. `/mnt/workdisk/public_share/hakmem/core/box/hak_core_init.inc.h`
   - Lines 43-55: Pre-detect jemalloc during init (not hot path)

3. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.c`
   - Line 146→177: Moved `g_hakmem_lock_depth--` to AFTER fprintf

4. `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h`
   - Lines 392-411: Added `g_hakmem_lock_depth++/--` around fprintf

### Build Command
```bash
make clean
make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem
```

### Test Command
```bash
# 4T high-contention
./larson_hakmem 10 8 128 1024 1 12345 4

# 20-run stability test
bash /tmp/test_larson_20.sh
```

---

## Lessons Learned

1. **Wrapper Recursion is Insidious:**
   - Any libc function that might malloc must be protected
   - `getenv()`, `fprintf()`, `dlopen()`, `fopen()`, `fclose()` ALL can malloc
   - `g_hakmem_lock_depth` must be incremented BEFORE any libc call

2. **Debug Code Can Cause Bugs:**
   - fprintf in hot paths is dangerous
   - Debug messages should either be compile-time disabled or fully protected

3. **Initialization Order Matters:**
   - dlopen must happen during init, not on first malloc
   - Cached values avoid hot-path overhead and recursion risk

4. **Multiple Bugs Can Hide Each Other:**
   - Fixing wrapper recursion (BUG #7,#8) didn't improve stability
   - Real issue is deeper (OOM, active counter, or corruption)

---

## Recommendations for User

**Short Term (今すぐ):**
- Use 4T with 256 chunks/thread (100% stable)
- Avoid 4T with 1024+ chunks until root cause found

**Medium Term (1-2 days):**
- Run Valgrind/ASan analysis (see "Next Steps")
- Investigate active counter logic
- Add comprehensive logging to fallback path

**Long Term (1 week):**
- Consider disabling fallback path (fail fast instead of corrupt)
- Implement active counter assertions to catch underflow early
- Add memory fence/barrier around header writes in fallback path

---

**End of Report**

がんばりました！ 4つのバグを修正しましたが、根本原因はまだ深いところにあります。次は Valgrind/ASan で詳細調査が必要です。🔥🐛
feat: Phase 7 + Phase 2 - Massive performance & stability improvements Performance Achievements: - Tiny allocations: +180-280% (21M → 59-70M ops/s random mixed) - Single-thread: +24% (2.71M → 3.36M ops/s Larson) - 4T stability: 0% → 95% (19/20 success rate) - Overall: 91.3% of System malloc average (target was 40-55%) ✓ Phase 7 (Tasks 1-3): Core Optimizations - Task 1: Header validation removal (Region-ID direct lookup) - Task 2: Aggressive inline (TLS cache access optimization) - Task 3: Pre-warm TLS cache (eliminate cold-start penalty) Result: +180-280% improvement, 85-146% of System malloc Critical Bug Fixes: - Fix 64B allocation crash (size-to-class +1 for header) - Fix 4T wrapper recursion bugs (BUG #7, #8, #10, #11) - Remove malloc fallback (30% → 50% stability) Phase 2a: SuperSlab Dynamic Expansion (CRITICAL) - Implement mimalloc-style chunk linking - Unlimited slab expansion (no more OOM at 32 slabs) - Fix chunk initialization bug (bitmap=0x00000001 after expansion) Files: core/hakmem_tiny_superslab.c/h, core/superslab/superslab_types.h Result: 50% → 95% stability (19/20 4T success) Phase 2b: TLS Cache Adaptive Sizing - Dynamic capacity: 16-2048 slots based on usage - High-water mark tracking + exponential growth/shrink - Expected: +3-10% performance, -30-50% memory Files: core/tiny_adaptive_sizing.c/h (new) Phase 2c: BigCache Dynamic Hash Table - Migrate from fixed 256×8 array to dynamic hash table - Auto-resize: 256 → 512 → 1024 → 65,536 buckets - Improved hash function (FNV-1a) + collision chaining Files: core/hakmem_bigcache.c/h Expected: +10-20% cache hit rate Design Flaws Analysis: - Identified 6 components with fixed-capacity bottlenecks - SuperSlab (CRITICAL), TLS Cache (HIGH), BigCache/L2.5 (MEDIUM) - Report: DESIGN_FLAWS_ANALYSIS.md (11 chapters) Documentation: - 13 comprehensive reports (PHASE.md, DESIGN_FLAWS.md) - Implementation guides, test results, production readiness - Bug fix reports, root cause analysis Build System: - Makefile: phase7 targets, PREWARM_TLS flag - Auto dependency generation (-MMD -MP) for .inc files Known Issues: - 4T stability: 19/20 (95%) - investigating 1 failure for 100% - L2.5 Pool dynamic sharding: design only (needs 2-3 days integration) 🤖 Generated with Claude Code (https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> 2025-11-08 17:08:00 +09:00			`# Phase 7 Bug #3: 4T High-Contention Crash Debug Report`

			`Date: 2025-11-08`
			`Engineer: Claude Task Agent`
			`Duration: 2.5 hours`
			`Goal: Fix 4T Larson crash with 1024 chunks/thread (high contention)`

			`---`

			`## Summary`

			`Result: PARTIAL SUCCESS - Fixed 4 critical bugs but crash persists`
			`Success Rate: 35% (7/20 runs) - same as before fixes`
			`Root Cause: Multiple interacting issues; deeper investigation needed`

			`Bugs Fixed:`
			1. BUG #7: malloc() wrapper `g_hakmem_lock_depth++` called too late
			2. BUG #8: calloc() wrapper `g_hakmem_lock_depth++` called too late
			`3. BUG #10: dlopen() called on hot path causing infinite recursion`
			`4. BUG #11: Unprotected fprintf() in OOM logging paths`

			`Status: These fixes are NECESSARY but NOT SUFFICIENT to solve the crash`

			`---`

			`## Bug Details`

			`### BUG #7: malloc() Wrapper Lock Depth (FIXED)`

			File: `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:40-99`

			`Problem:`
			```c
			`// BEFORE (WRONG):`
			`void* malloc(size_t size) {`
			`if (g_initializing != 0) { return __libc_malloc(size); }`

			`// BUG: getenv/fprintf/dlopen called BEFORE g_hakmem_lock_depth++`
			`static int debug_enabled = -1;`
			`if (debug_enabled < 0) {`
			`debug_enabled = (getenv("HAKMEM_SFC_DEBUG") != NULL) ? 1 : 0; // malloc!`
			`}`
			`if (debug_enabled) fprintf(stderr, "[DEBUG] malloc(%zu)\n", size); // malloc!`

			`if (hak_force_libc_alloc()) { ... } // calls getenv → malloc!`
			`int ld_mode = hak_ld_env_mode(); // calls getenv → malloc!`
			`if (ld_mode && hak_jemalloc_loaded()) { ... } // calls dlopen → malloc!`

			`g_hakmem_lock_depth++; // TOO LATE!`
			`void* ptr = hak_alloc_at(size, HAK_CALLSITE());`
			`g_hakmem_lock_depth--;`
			`return ptr;`
			`}`
			```

			`Why It Crashes:`
			1. `getenv()` doesn't malloc, but `fprintf()` does (for stderr buffering)
			2. `dlopen()` definitely mallocs (internal data structures)
			`3. When these malloc, they call back into our wrapper → infinite recursion`
			4. Result: `free(): invalid pointer` (corrupted metadata)

			`Fix:`
			```c
			`// AFTER (CORRECT):`
			`void* malloc(size_t size) {`
			`// CRITICAL FIX: Increment lock depth FIRST!`
			`g_hakmem_lock_depth++;`

			`// Guard against recursion`
			`if (g_initializing != 0) {`
			`g_hakmem_lock_depth--;`
			`return __libc_malloc(size);`
			`}`

			`// Now safe - any malloc from getenv/fprintf/dlopen uses __libc_malloc`
			`static int debug_enabled = -1;`
			`if (debug_enabled < 0) {`
			`debug_enabled = (getenv("HAKMEM_SFC_DEBUG") != NULL) ? 1 : 0; // OK!`
			`}`
			`// ... rest of code`

			`void* ptr = hak_alloc_at(size, HAK_CALLSITE());`
			`g_hakmem_lock_depth--; // Decrement at end`
			`return ptr;`
			`}`
			```

			`Impact: Prevents infinite recursion when malloc wrapper calls libc functions`

			`---`

			`### BUG #8: calloc() Wrapper Lock Depth (FIXED)`

			File: `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:117-180`

			Problem: Same as BUG #7 - `g_hakmem_lock_depth++` called after getenv/dlopen

			Fix: Move `g_hakmem_lock_depth++` to line 119 (function start)

			`Impact: Prevents calloc infinite recursion`

			`---`

			`### BUG #10: dlopen() on Hot Path (FIXED)`

			`File:`
			- `/mnt/workdisk/public_share/hakmem/core/hakmem.c:166-174` (hak_jemalloc_loaded function)
			- `/mnt/workdisk/public_share/hakmem/core/box/hak_core_init.inc.h:43-55` (initialization)
			- `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h:42,72,112,149,192` (wrapper call sites)

			`Problem:`
			```c
			`// OLD (DANGEROUS):`
			`static inline int hak_jemalloc_loaded(void) {`
			`if (g_jemalloc_loaded < 0) {`
			`void* h = dlopen("libjemalloc.so.2", RTLD_NOLOAD \| RTLD_NOW); // MALLOC!`
			`if (!h) h = dlopen("libjemalloc.so.1", RTLD_NOLOAD \| RTLD_NOW); // MALLOC!`
			`g_jemalloc_loaded = (h != NULL) ? 1 : 0;`
			`if (h) dlclose(h); // MALLOC!`
			`}`
			`return g_jemalloc_loaded;`
			`}`

			`// Called from malloc wrapper:`
			`if (hak_ld_block_jemalloc() && hak_jemalloc_loaded()) { // dlopen → malloc → wrapper → dlopen → ...`
			`return __libc_malloc(size);`
			`}`
			```

			`Why It Crashes:`
			- `dlopen()` calls malloc internally (dynamic linker allocations)
			- Wrapper calls `hak_jemalloc_loaded()` → `dlopen()` → `malloc()` → wrapper → infinite loop

			`Fix:`
			`1. Pre-detect jemalloc during initialization (hak_init_impl):`
			```c
			`// In hak_core_init.inc.h:43-55`
			`extern int g_jemalloc_loaded;`
			`if (g_jemalloc_loaded < 0) {`
			`void* h = dlopen("libjemalloc.so.2", RTLD_NOLOAD \| RTLD_NOW);`
			`if (!h) h = dlopen("libjemalloc.so.1", RTLD_NOLOAD \| RTLD_NOW);`
			`g_jemalloc_loaded = (h != NULL) ? 1 : 0;`
			`if (h) dlclose(h);`
			`}`
			```

			`2. Use cached variable in wrapper:`
			```c
			`// In hak_wrappers.inc.h`
			`extern int g_jemalloc_loaded; // Declared at top`

			`// In malloc():`
			`if (hak_ld_block_jemalloc() && g_jemalloc_loaded) { // No function call!`
			`g_hakmem_lock_depth--;`
			`return __libc_malloc(size);`
			`}`
			```

			`Impact: Removes dlopen from hot path, prevents infinite recursion`

			`---`

			`### BUG #11: Unprotected fprintf() in OOM Logging (FIXED)`

			`Files:`
			- `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.c:146-177` (log_superslab_oom_once)
			- `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h:391-411` (superslab_refill debug)

			`Problem 1: log_superslab_oom_once (PARTIALLY FIXED BEFORE)`
			```c
			`// OLD (WRONG):`
			`static void log_superslab_oom_once(...) {`
			`g_hakmem_lock_depth++;`

			`FILE* status = fopen("/proc/self/status", "r"); // OK (lock_depth=1)`
			`// ... read file ...`
			`fclose(status); // OK (lock_depth=1)`

			`g_hakmem_lock_depth--; // WRONG LOCATION!`

			`// BUG: fprintf called AFTER lock_depth restored to 0!`
			`fprintf(stderr, "[SS OOM] ..."); // fprintf → malloc → wrapper (lock_depth=0) → CRASH!`
			`}`
			```

			`Fix 1:`
			```c
			`// NEW (CORRECT):`
			`static void log_superslab_oom_once(...) {`
			`g_hakmem_lock_depth++;`

			`FILE* status = fopen("/proc/self/status", "r");`
			`// ... read file ...`
			`fclose(status);`

			`// Don't decrement yet! fprintf needs protection`

			`fprintf(stderr, "[SS OOM] ..."); // OK (lock_depth still 1)`

			`g_hakmem_lock_depth--; // Now safe (all libc calls done)`
			`}`
			```

			`Problem 2: superslab_refill debug message (NEW BUG FOUND)`
			```c
			`// OLD (WRONG):`
			`SuperSlab* ss = superslab_allocate((uint8_t)class_idx);`
			`if (!ss) {`
			`if (!g_superslab_refill_debug_once) {`
			`g_superslab_refill_debug_once = 1;`
			`int err = errno;`
			`fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM) ..."); // UNPROTECTED!`
			`}`
			`return NULL;`
			`}`
			```

			`Fix 2:`
			```c
			`// NEW (CORRECT):`
			`SuperSlab* ss = superslab_allocate((uint8_t)class_idx);`
			`if (!ss) {`
			`if (!g_superslab_refill_debug_once) {`
			`g_superslab_refill_debug_once = 1;`
			`int err = errno;`

			`extern __thread int g_hakmem_lock_depth;`
			`g_hakmem_lock_depth++;`
			`fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM) ...");`
			`g_hakmem_lock_depth--;`
			`}`
			`return NULL;`
			`}`
			```

			`Impact: Prevents fprintf from triggering malloc on wrapper hot path`

			`---`

			`## Test Results`

			`### Before Fixes`
			`- Success Rate: 35% (estimated based on REMAINING_BUGS_ANALYSIS.md: 70% → 30% with previous fixes)`
			- Crash: `free(): invalid pointer` from libc

			`### After ALL Fixes (BUG #7, #8, #10, #11)`
			```bash
			`Testing 4T Larson high-contention (20 runs)...`
			`Success: 7/20`
			`Failed: 13/20`
			`Success rate: 35%`
			```

			`Conclusion: No improvement. The fixes are correct but address only PART of the problem.`

			`---`

			`## Root Cause Analysis`

			`### Why Fixes Didn't Help`

			`The crash is NOT solely due to wrapper recursion. Evidence:`

			`1. OOM Happens First:`
			```
			`[DEBUG] superslab_refill returned NULL (OOM)`
			`[DEBUG] Phase 7: tiny_alloc(1024) rejected, using malloc fallback`
			`free(): invalid pointer`
			```

			`2. Malloc Fallback Path:`
			When Tiny allocation fails (OOM), it falls back to `hak_alloc_malloc_impl()`:
			```c
			`// core/box/hak_alloc_api.inc.h:43`
			`void* fallback_ptr = hak_alloc_malloc_impl(size);`
			```

			`This allocates with:`
			```c
			`void* raw = __libc_malloc(HEADER_SIZE + size); // Allocate with libc`
			`// Write HAKMEM header`
			`hdr->magic = HAKMEM_MAGIC;`
			`hdr->method = ALLOC_METHOD_MALLOC;`
			`return raw + HEADER_SIZE; // Return user pointer`
			```

			`3. Free Path Should Work:`
			When this pointer is freed, `hak_free_at()` should:
			`- Step 2 (line 92-120): Detect HAKMEM_MAGIC header`
			- Check `hdr->method == ALLOC_METHOD_MALLOC`
			- Call `__libc_free(raw)` correctly

			`4. So Why Does It Crash?`

			`Hypothesis 1: Race condition in header write/read`
			`Hypothesis 2: OOM causes memory corruption before crash`
			`Hypothesis 3: Multiple allocations in flight, one corrupts another's metadata`
			`Hypothesis 4: Libc malloc returns pointer that overlaps with HAKMEM memory`

			`---`

			`## Next Steps (Recommended)`

			`### Immediate (High Priority)`

			`1. Add Comprehensive Logging:`
			```c
			`// In hak_alloc_malloc_impl():`
			`fprintf(stderr, "[FALLBACK_ALLOC] size=%zu raw=%p user=%p\n", size, raw, raw + HEADER_SIZE);`

			`// In hak_free_at() step 2:`
			`fprintf(stderr, "[FALLBACK_FREE] ptr=%p raw=%p magic=0x%X method=%d\n",`
			`ptr, raw, hdr->magic, hdr->method);`
			```

			`2. Test with Valgrind:`
			```bash
			`valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes \`
			`./larson_hakmem 10 8 128 1024 1 12345 4`
			```

			`3. Test with ASan:`
			```bash
			`make asan-larson-alloc`
			`./larson_hakmem_asan_alloc 10 8 128 1024 1 12345 4`
			```

			`### Medium Priority`

			`4. Disable Fallback Path Temporarily:`
			```c
			`// In hak_alloc_api.inc.h:36`
			`if (size <= TINY_MAX_SIZE) {`
			`// TEST: Return NULL instead of fallback`
			`return NULL; // Force application to handle OOM`
			`}`
			```

			`5. Increase Memory Limit:`
			```bash
			`ulimit -v unlimited`
			`./larson_hakmem 10 8 128 1024 1 12345 4`
			```

			`6. Reduce Contention:`
			```bash
			`# Test with fewer chunks to avoid OOM`
			`./larson_hakmem 10 8 128 512 1 12345 4 # 512 instead of 1024`
			```

			`### Root Cause Investigation`

			`7. Check Active Counter Logic:`
			`The OOM suggests active counter underflow. Review:`
			- `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h:103` (ss_active_add fix from Phase 6-2.3)
			- All `ss_active_add()` / `ss_active_dec()` call sites

			`8. Check SuperSlab Allocation:`
			```bash
			`# Enable detailed SS logging`
			`HAKMEM_SUPER_REG_REQTRACE=1 HAKMEM_FREE_ROUTE_TRACE=1 \`
			`./larson_hakmem 10 8 128 1024 1 12345 4`
			```

			`---`

			`## Production Impact`

			`Status: NOT READY FOR PRODUCTION`

			`Blocking Issues:`
			`1. 65% crash rate on 4T high-contention workload`
			`2. Unknown root cause (wrapper fixes necessary but insufficient)`
			`3. Potential active counter bug or memory corruption`

			`Safe Configurations:`
			`- 1T: 100% stable (2.97M ops/s)`
			`- 4T low-contention (256 chunks): 100% stable (251K ops/s)`
			`- 4T high-contention (1024 chunks): 35% stable (981K ops/s when stable)`

			`---`

			`## Code Changes`

			`### Modified Files`

			1. `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h`
			- Line 40-99: malloc() - moved `g_hakmem_lock_depth++` to start
			- Line 117-180: calloc() - moved `g_hakmem_lock_depth++` to start
			- Line 42: Added extern declaration for `g_jemalloc_loaded`
			- Lines 72,112,149,192: Changed `hak_jemalloc_loaded()` → `g_jemalloc_loaded`

			2. `/mnt/workdisk/public_share/hakmem/core/box/hak_core_init.inc.h`
			`- Lines 43-55: Pre-detect jemalloc during init (not hot path)`

			3. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.c`
			- Line 146→177: Moved `g_hakmem_lock_depth--` to AFTER fprintf

			4. `/mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h`
			- Lines 392-411: Added `g_hakmem_lock_depth++/--` around fprintf

			`### Build Command`
			```bash
			`make clean`
			`make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem`
			```

			`### Test Command`
			```bash
			`# 4T high-contention`
			`./larson_hakmem 10 8 128 1024 1 12345 4`

			`# 20-run stability test`
			`bash /tmp/test_larson_20.sh`
			```

			`---`

			`## Lessons Learned`

			`1. Wrapper Recursion is Insidious:`
			`- Any libc function that might malloc must be protected`
			- `getenv()`, `fprintf()`, `dlopen()`, `fopen()`, `fclose()` ALL can malloc
			- `g_hakmem_lock_depth` must be incremented BEFORE any libc call

			`2. Debug Code Can Cause Bugs:`
			`- fprintf in hot paths is dangerous`
			`- Debug messages should either be compile-time disabled or fully protected`

			`3. Initialization Order Matters:`
			`- dlopen must happen during init, not on first malloc`
			`- Cached values avoid hot-path overhead and recursion risk`

			`4. Multiple Bugs Can Hide Each Other:`
			`- Fixing wrapper recursion (BUG #7,#8) didn't improve stability`
			`- Real issue is deeper (OOM, active counter, or corruption)`

			`---`

			`## Recommendations for User`

			`Short Term (今すぐ):`
			`- Use 4T with 256 chunks/thread (100% stable)`
			`- Avoid 4T with 1024+ chunks until root cause found`

			`Medium Term (1-2 days):`
			`- Run Valgrind/ASan analysis (see "Next Steps")`
			`- Investigate active counter logic`
			`- Add comprehensive logging to fallback path`

			`Long Term (1 week):`
			`- Consider disabling fallback path (fail fast instead of corrupt)`
			`- Implement active counter assertions to catch underflow early`
			`- Add memory fence/barrier around header writes in fallback path`

			`---`

			`End of Report`

			`がんばりました！ 4つのバグを修正しましたが、根本原因はまだ深いところにあります。次は Valgrind/ASan で詳細調査が必要です。🔥🐛`