hakmem/PHASE15_BUG_ANALYSIS.md
Moe Charm (CI) d378ee11a0 Phase 15: Box BenchMeta separation + ExternalGuard debug + investigation report
- Implement Box BenchMeta pattern in bench_random_mixed.c (BENCH_META_CALLOC/FREE)
- Add enhanced debug logging to external_guard_box.h (caller tracking, FG classification)
- Document investigation in PHASE15_BUG_ANALYSIS.md

Issue: Page-aligned MIDCAND pointer not in SuperSlab registry → ExternalGuard → crash
Hypothesis: May be pre-existing SuperSlab bug (not Phase 15-specific)
Next: Test in Phase 14-C to verify
2025-11-15 23:00:21 +09:00


Phase 15 Bug Analysis - ExternalGuard Crash Investigation

Date: 2025-11-15
Status: ROOT CAUSE IDENTIFIED

Summary

ExternalGuard is being called with a page-aligned pointer (0x7fd8f8202000) that:

  • hak_super_lookup() returns NULL for (the pointer is not in the SuperSlab registry)
  • __libc_free() rejects with "free(): invalid pointer"

Evidence

Crash Log

[ExternalGuard] ptr=0x7fd8f8202000 offset_in_page=0x0 (call #1)
[ExternalGuard] >>> Use: addr2line -e <binary> 0x58b613548275
[ExternalGuard] hak_super_lookup(ptr) = (nil)
[ExternalGuard] ptr=0x7fd8f8202000 delegated to __libc_free
free(): invalid pointer

Caller Identification

Using objdump analysis, caller address 0x...8275 maps to:

  • Function: free() wrapper (at offset 0xb270 in the binary)
  • Source: free(slots) from bench_random_mixed.c line 85

Allocation Analysis

// bench_random_mixed.c line 34:
void** slots = (void**)calloc(256, sizeof(void*));  // = 2048 bytes

calloc(2048) routing (core/box/hak_wrappers.inc.h:282-285):

if (ld_safe_mode_calloc >= 2 || total > TINY_MAX_SIZE) {  // TINY_MAX_SIZE = 1023
    extern void* __libc_calloc(size_t, size_t);
    return __libc_calloc(nmemb, size);  // ← Delegates to libc!
}

Expected: calloc(2048) → __libc_calloc() (delegated to libc)

Root Cause Analysis

Free Path Bug (core/box/hak_wrappers.inc.h)

Lines 147-166: Early classification

ptr_classification_t c = classify_ptr(ptr);
if (is_hakmem_owned) {
    hak_free_at(ptr, ...);  // Path A: HAKMEM allocations
    return;
}

Lines 226-228: FINAL FALLBACK - unconditional routing

g_hakmem_lock_depth++;
hak_free_at(ptr, 0, HAK_CALLSITE());  // ← BUG: Routes ALL pointers!
g_hakmem_lock_depth--;

The Bug: Non-HAKMEM pointers that pass all early-exit checks (lines 171-225) get unconditionally routed to hak_free_at(), even though classify_ptr() returned PTR_KIND_EXTERNAL (not HAKMEM-owned).

Why __libc_free() Rejects the Pointer

Two Hypotheses:

Hypothesis A: The pointer comes from __libc_calloc() (as expected), but something corrupts it before it reaches __libc_free().

  • Test: calloc(256, 8) returned a pointer with offset_in_page = 0x2a0 (not page-aligned)
  • Contradiction: The crash log shows a page-aligned pointer (0x...000)
  • Conclusion: The crashing pointer is NOT the one calloc() returned for slots

Hypothesis B: Pointer is a HAKMEM allocation that classify_ptr() failed to recognize

  • Pool TLS allocations CAN be page-aligned (mmap'd chunks)
  • hak_super_lookup() returns NULL → not in Tiny registry
  • Likely: This is a Pool TLS allocation. However, 2 KB lies below the stated Pool range of 8-52 KB, so it would not be the 2 KB slots array itself.

Verification Tests

Test 1: Pool TLS Allocation Check

# Check if 2KB allocations use Pool TLS
./test/pool_tls_allocation_test 2048

Test 2: classify_ptr() Behavior

void* ptr = calloc(256, sizeof(void*));  // 2048 bytes
ptr_classification_t c = classify_ptr(ptr);
printf("kind=%d (POOL_TLS=%d, EXTERNAL=%d)\n",
       c.kind, PTR_KIND_POOL_TLS, PTR_KIND_EXTERNAL);

Next Steps

Option 1: Fix the free() Wrapper Fallback

Change line 227 to check HAKMEM ownership first:

// Before (BUG):
hak_free_at(ptr, 0, HAK_CALLSITE());  // Routes ALL pointers

// After (FIX):
if (is_hakmem_owned) {
    hak_free_at(ptr, 0, HAK_CALLSITE());
} else {
    extern void __libc_free(void*);
    __libc_free(ptr);  // Proper fallback for libc allocations
}

Problem: is_hakmem_owned is out of scope at line 227 because it is declared inside the block at lines 149-159.

Solution: Hoist is_hakmem_owned to function scope or re-classify at line 226

Option 2: Fix classify_ptr() to Recognize Pool TLS

If the pointer is actually a Pool TLS allocation that is being misclassified:

  • Add Pool TLS registry lookup to classify_ptr()
  • Ensure Pool allocations are properly registered

Option 3: Defer Phase 15 (Current)

Revert to Phase 14-C until the free() wrapper logic is fixed.

User's Insight

"Wait, isn't the mincore SEGV actually a bug discovery, showing that it's being called from the wrong layer?" (translated from Japanese)

Interpretation: ExternalGuard being called is CORRECT behavior. It is detecting that the classification layer does not recognize a HAKMEM pointer (possibly Pool TLS).

Conclusion

Primary Bug: free() wrapper unconditionally routes all pointers to hak_free_at() at line 227, regardless of HAKMEM ownership.

Secondary Bug (suspected): classify_ptr() may fail to recognize Pool TLS allocations, causing them to be misclassified as PTR_KIND_EXTERNAL.

Recommendation: Fix Option 1 (free() wrapper logic) first, then investigate Pool TLS classification if issue persists.