# Phase 15 Bug - Root Cause Analysis (FINAL)

**Date**: 2025-11-15
**Status**: ROOT CAUSE IDENTIFIED ✅

## Summary

Page-aligned Tiny allocations (`0x...000`) reach ExternalGuard → `__libc_free()` → crash.

## Evidence

### Phase 14 vs Phase 15 Behavior

| Phase | Test Result | Behavior |
|-------|-------------|----------|
| Phase 14 | ✅ PASS (5.69M ops/s) | No crash with the same test |
| Phase 15 | ❌ CRASH | ExternalGuard → `__libc_free()` failure |

### Crash Pattern

```
[ExternalGuard] ptr=0x706c21a00000 offset_in_page=0x0 (page-aligned!)
[ExternalGuard] hak_super_lookup(ptr) = (nil)   ← SuperSlab registry: NOT FOUND
[ExternalGuard] FrontGate classification: domain=MIDCAND
[ExternalGuard] ptr=0x706c21a00000 delegated to __libc_free
free(): invalid pointer   ← CRASH
```

## Root Cause

### 1. Page-Aligned Tiny Allocations Exist

**Proof** (mathematical):
- Block stride = user_size + 1 (1-byte header)
- Example: 257B stride (class 5)
- Carved (header) pointer: `base + (carved_index × 257)`
- User pointer: `carved_ptr + 1`
- With a page-aligned `base`, the user pointer is page-aligned when `(n × 257 + 1) mod 4096 == 0`, i.e. `(n × 257) mod 4096 == 4095`
- Since gcd(257, 4096) = 1, 257 is invertible modulo 4096, so **a solution exists**!

**Allocation flow**:

```c
// hakmem_tiny.c:160-163
#define HAK_RET_ALLOC(cls, base_ptr) do { \
    *(uint8_t*)(base_ptr) = HEADER_MAGIC | ((cls) & HEADER_CLASS_MASK); \
    return (void*)((uint8_t*)(base_ptr) + 1); /* returns user_ptr */ \
} while(0)
```

If `base_ptr = 0x...FFF`, then `user_ptr = 0x...000` (PAGE-ALIGNED!).

### 2. Box FrontGate Classifies as MIDCAND (Correct by Design)

**front_gate_v2.h:52-59**:

```c
// CRITICAL: Same-page guard (header must be in same page as ptr)
uintptr_t offset_in_page = (uintptr_t)ptr & 0xFFF;
if (offset_in_page == 0) {
    // Page-aligned pointer → no header in same page → must be MIDCAND
    result.domain = FG_DOMAIN_MIDCAND;
    return result;
}
```

**Reason**: Reading the header at `ptr-1` would touch the previous page, which may not be mapped (unsafe).

**Result**: Page-aligned Tiny allocations → classified as MIDCAND ✅

### 3. MIDCAND Routing → SuperSlab Registry Lookup FAILS

**hak_free_api.inc.h** MIDCAND path:

1. Mid registry lookup → NULL (not a Mid allocation)
2. L25 registry lookup → NULL (not an L25 allocation)
3. **SuperSlab registry lookup** → **NULL** ❌ (BUG!)
4. ExternalGuard → `__libc_free()` → crash

**Why SuperSlab lookup fails**:

**Theory A**: Pointer is NOT from hakmem
- **REJECTED**: A system malloc test produced no page-aligned pointers for 16-1040B requests, so a page-aligned small pointer is unlikely to come from libc

**Theory B**: SuperSlab is not registered
- **LIKELY**: Race condition, registry exhaustion, or allocation before registration

**Theory C**: Registry lookup bug
- **POSSIBLE**: Hash collision, probe limit, or alignment mismatch

### 4. Why Phase 14 Works but Phase 15 Doesn't

**Phase 14**: Old classification system (no Box FrontGate/ExternalGuard)
- Uses different routing logic
- May have a headerless path for page-aligned pointers
- Possibly a different SuperSlab lookup implementation (unverified)

**Phase 15**: New Box architecture
- Box FrontGate → classifies page-aligned pointers as MIDCAND
- Box routing → SuperSlab lookup
- Box ExternalGuard → delegates to `__libc_free()` → **CRASH**

## Fix Options

### Option 1: Fix SuperSlab Registry Lookup ✅ **RECOMMENDED**

**Issue**: `hak_super_lookup(0x706c21a00000)` returns NULL for a valid hakmem allocation.

**Root cause candidates**:
1. SuperSlab not registered (allocation race)
2. Registry full / hash collision
3. Lookup alignment mismatch

**Investigation needed**:
- Add debug logging to `hak_super_register()` / `hak_super_lookup()`
- Check whether a SuperSlab exists for this pointer
- Verify that registration happens before the user pointer is returned

**Fix**: Ensure every SuperSlab is registered before any user pointer carved from it is returned.

### Option 2: Add Page-Aligned Special Path in FrontGate ❌ NOT RECOMMENDED

**Idea**: Classify page-aligned Tiny pointers as TINY instead of MIDCAND.

**Problems**:
- Can't read the header at `ptr-1` (page boundary violation)
- Would need an alternative classification (size class lookup?)
- Violates Box FG design (1-byte header only)

### Option 3: Fix ExternalGuard Fallback ⚠️ WORKAROUND

**Idea**: ExternalGuard should NOT delegate unknown pointers to `__libc_free()`.

**Change**:

```c
// Before (BUG):
if (!is_mapped) return 0;  // Delegate to __libc_free (crashes!)

// After (FIX):
if (!is_mapped) {
    // Unknown pointer - log and return success (leak vs. crash tradeoff)
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1;  // Claim handled (prevent __libc_free crash)
}
```

**Cons**: Leaks memory for genuinely external pointers.

## Next Steps

1. **Add SuperSlab Registry Debug Logging** ✅
   - Log all `hak_super_register()` calls
   - Log all `hak_super_lookup()` failures
   - Track when `0x706c21a00000` is allocated and registered
2. **Verify Registration Timing**
   - Ensure the SuperSlab is registered BEFORE user pointers are returned
   - Check for race conditions in the allocation path
3. **Implement Fix Option 1**
   - Fix the SuperSlab registry lookup
   - Verify with the 100K-iteration test

## Conclusion

**Primary Bug**: SuperSlab registry lookup fails for page-aligned Tiny allocations.

**Secondary Bug**: ExternalGuard unconditionally delegates unknown pointers to `__libc_free()` (it should handle them safely).

**Recommended Fix**: Fix the SuperSlab registry (Option 1) + improve ExternalGuard safety (Option 3 as a backup).