hakmem/docs/analysis/PHASE15_BUG_ROOT_CAUSE_FINAL.md
Moe Charm (CI) a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

Removed ENV variables (BG/HotMag):
- core/hakmem_tiny_init.inc: removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: removed BG spill ENV
- core/tiny_refill.h: BG remote switched to fixed values
- core/hakmem_tiny_slow.inc: removed BG references

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

Documentation cleanup:
- Removed 328 markdown files (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M; stable)
- No functional impact from the ENV cleanup
- Some debug output remains (to be addressed in the next phase)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00


Phase 15 Bug - Root Cause Analysis (FINAL)

Date: 2025-11-15
Status: ROOT CAUSE IDENTIFIED

Summary

Page-aligned Tiny allocations (0x...000) reach ExternalGuard → __libc_free() → crash.

Evidence

Phase 14 vs Phase 15 Behavior

| Phase | Test result | Behavior |
|-------|-------------|----------|
| Phase 14 | PASS (5.69M ops/s) | No crash with the same test |
| Phase 15 | CRASH | ExternalGuard → __libc_free() failure |

Crash Pattern

```text
[ExternalGuard] ptr=0x706c21a00000 offset_in_page=0x0 (page-aligned!)
[ExternalGuard] hak_super_lookup(ptr) = (nil)  ← SuperSlab registry: NOT FOUND
[ExternalGuard] FrontGate classification: domain=MIDCAND
[ExternalGuard] ptr=0x706c21a00000 delegated to __libc_free
free(): invalid pointer  ← CRASH
```

Root Cause

1. Page-Aligned Tiny Allocations Exist

Proof (mathematical):

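  • Block stride = user_size + 1 (1-byte header)
  • Example: 257B stride (class 5)
  • Carved pointer: base + (n × 257), where n is the carved block index
  • User pointer: carved_ptr + 1
  • With a page-aligned base, user_ptr is page-aligned when (n × 257) mod 4096 == 4095
  • Since gcd(257, 4096) = 1, 257 is invertible mod 4096, so a solution always exists: n = 255 (255 × 257 = 65535 = 15 × 4096 + 4095). Whether that index is actually carved depends on the slab's capacity.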

Allocation flow:

```c
// hakmem_tiny.c:160-163
#define HAK_RET_ALLOC(cls, base_ptr) do { \
    *(uint8_t*)(base_ptr) = HEADER_MAGIC | ((cls) & HEADER_CLASS_MASK); \
    return (void*)((uint8_t*)(base_ptr) + 1); /* returns user_ptr */ \
} while(0)
```

If base_ptr = 0x...FFF, then user_ptr = 0x...000 (PAGE-ALIGNED!).

2. Box FrontGate Classifies as MIDCAND (Correct by Design)

front_gate_v2.h:52-59:

```c
// CRITICAL: Same-page guard (header must be in same page as ptr)
uintptr_t offset_in_page = (uintptr_t)ptr & 0xFFF;
if (offset_in_page == 0) {
    // Page-aligned pointer → no header in same page → must be MIDCAND
    result.domain = FG_DOMAIN_MIDCAND;
    return result;
}
```

Reason: the header byte at ptr-1 lives in the previous page, which may be unmapped, so reading it could fault.
Result: Page-aligned Tiny allocations → classified as MIDCAND

3. MIDCAND Routing → SuperSlab Registry Lookup FAILS

hak_free_api.inc.h MIDCAND path:

  1. Mid registry lookup → NULL (not Mid allocation)
  2. L25 registry lookup → NULL (not L25 allocation)
  3. SuperSlab registry lookupNULL (BUG!)
  4. ExternalGuard → __libc_free() → crash

Why SuperSlab lookup fails:

Theory A: Pointer is NOT from hakmem

  • REJECTED: a system malloc test returned no page-aligned pointers for 16-1040B requests, so an external (libc) origin is unlikely

Theory B: SuperSlab is not registered

  • LIKELY: Race condition, registry exhaustion, or allocation before registration

Theory C: Registry lookup bug

  • POSSIBLE: Hash collision, probe limit, or alignment mismatch

4. Why Phase 14 Works but Phase 15 Doesn't

Phase 14: Old classification system (no Box FrontGate/ExternalGuard)

  • Uses different routing logic
  • May have headerless path for page-aligned pointers
  • Different SuperSlab lookup implementation?

Phase 15: New Box architecture

  • Box FrontGate → classifies page-aligned as MIDCAND
  • Box routing → SuperSlab lookup
  • Box ExternalGuard → delegates to __libc_free()CRASH

Fix Options

Option 1: Fix SuperSlab Registry (recommended)

Issue: hak_super_lookup(0x706c21a00000) returns NULL for a valid hakmem allocation.

Root cause options:

  1. SuperSlab not registered (allocation race)
  2. Registry full/hash collision
  3. Lookup alignment mismatch

Investigation needed:

  • Add debug logging to hak_super_register() / hak_super_lookup()
  • Check if SuperSlab exists for this pointer
  • Verify registration happens before user pointer is returned

Fix: Ensure all SuperSlabs are properly registered before returning user pointers.

Option 2: Classify Page-Aligned Tiny Pointers as TINY

Idea: Classify page-aligned Tiny pointers as TINY instead of MIDCAND.

Problems:

  • Can't read header at ptr-1 (page boundary violation)
  • Would need alternative classification (size class lookup?)
  • Violates Box FG design (1-byte header only)

Option 3: Fix ExternalGuard Fallback ⚠️ WORKAROUND

Idea: ExternalGuard should NOT delegate unknown pointers to __libc_free().

Change:

```c
// Before (BUG):
if (!is_mapped) return 0;  // Delegate to __libc_free (crashes!)

// After (FIX):
if (!is_mapped) {
    // Unknown pointer - log and return success (leak vs crash tradeoff)
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1;  // Claim handled (prevent __libc_free crash)
}
```

Cons: Memory leak for genuinely external pointers.

Next Steps

  1. Add SuperSlab Registry Debug Logging

    • Log all hak_super_register() calls
    • Log all hak_super_lookup() failures
    • Track when 0x706c21a00000 is allocated and registered
  2. Verify Registration Timing

    • Ensure SuperSlab is registered BEFORE user pointers are returned
    • Check for race conditions in allocation path
  3. Implement Fix Option 1

    • Fix SuperSlab registry lookup
    • Verify with 100K iterations test

Conclusion

Primary Bug: SuperSlab registry lookup fails for page-aligned Tiny allocations.

Secondary Bug: ExternalGuard unconditionally delegates to __libc_free() (should handle unknown pointers safely).

Recommended Fix: Fix SuperSlab registry (Option 1) + improve ExternalGuard safety (Option 3 as backup).