# Phase 15 Bug - Root Cause Analysis (FINAL)

**Date**: 2025-11-15
**Status**: ROOT CAUSE IDENTIFIED ✅

## Summary

Page-aligned Tiny allocations (`0x...000`) reach ExternalGuard → `__libc_free()` → crash.

## Evidence

### Phase 14 vs Phase 15 Behavior

| Phase | Test Result | Behavior |
|-------|-------------|----------|
| Phase 14 | ✅ PASS (5.69M ops/s) | No crash with same test |
| Phase 15 | ❌ CRASH | ExternalGuard → `__libc_free()` failure |

### Crash Pattern

```
[ExternalGuard] ptr=0x706c21a00000 offset_in_page=0x0 (page-aligned!)
[ExternalGuard] hak_super_lookup(ptr) = (nil)  ← SuperSlab registry: NOT FOUND
[ExternalGuard] FrontGate classification: domain=MIDCAND
[ExternalGuard] ptr=0x706c21a00000 delegated to __libc_free
free(): invalid pointer  ← CRASH
```

## Root Cause

### 1. Page-Aligned Tiny Allocations Exist

**Proof** (mathematical):
- Block stride = user_size + 1 (with 1-byte header)
- Example: 257B stride (class 5)
- Carved pointer: `base + (carved_index × 257)`
- User pointer: `carved_ptr + 1`
- For page-aligned user_ptr (assuming `base` is page-aligned): `(n × 257) mod 4096 == 4095`
- Since gcd(257, 4096) = 1, **a solution exists**: the smallest is n = 255, because 255 × 257 = 65535 = 0xFFFF

**Allocation flow**:
```c
// hakmem_tiny.c:160-163
#define HAK_RET_ALLOC(cls, base_ptr) do { \
    *(uint8_t*)(base_ptr) = HEADER_MAGIC | ((cls) & HEADER_CLASS_MASK); \
    return (void*)((uint8_t*)(base_ptr) + 1); /* returns user_ptr */ \
} while(0)
```

If `base_ptr = 0x...FFF`, then `user_ptr = 0x...000` (PAGE-ALIGNED!).
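
The congruence above can be checked by brute force. The following is a minimal sketch, not code from hakmem: the helper name `first_page_aligned_index` and the `PAGE_BYTES` constant are made up for illustration, and it assumes the slab base is page-aligned and that blocks are carved at `base + n × stride` with the user pointer one byte past the header.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size, matching the 0xFFF mask used by FrontGate */

/* Hypothetical helper: smallest carve index n whose user pointer
 * (base + n*stride + 1) is page-aligned, assuming base is page-aligned.
 * Returns -1 if no index below max_blocks qualifies. */
static long first_page_aligned_index(uint32_t stride, uint32_t max_blocks)
{
    assert(stride > 0);
    for (uint32_t n = 0; n < max_blocks; n++) {
        /* Offset of the user pointer from base: skip n blocks, then the 1-byte header. */
        uint64_t user_off = (uint64_t)n * stride + 1;
        if (user_off % PAGE_BYTES == 0)
            return (long)n;
    }
    return -1;
}
```

For a 257B stride, `first_page_aligned_index(257, 4096)` returns 255, so the user pointer of block 255 sits at `base + 0x10000`. An even stride such as 256 never produces a hit, since `n × 256 + 1` is always odd. Whether a real slab ever carves 256 blocks of this class depends on the slab size, which this note does not state.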

### 2. Box FrontGate Classifies as MIDCAND (Correct by Design)

**front_gate_v2.h:52-59**:
```c
// CRITICAL: Same-page guard (header must be in same page as ptr)
uintptr_t offset_in_page = (uintptr_t)ptr & 0xFFF;
if (offset_in_page == 0) {
    // Page-aligned pointer → no header in same page → must be MIDCAND
    result.domain = FG_DOMAIN_MIDCAND;
    return result;
}
```

**Reason**: Reading the header at `ptr-1` would touch the previous page, which may be unmapped (unsafe).
**Result**: Page-aligned Tiny allocations → classified as MIDCAND ✅

### 3. MIDCAND Routing → SuperSlab Registry Lookup FAILS

**hak_free_api.inc.h** MIDCAND path:
1. Mid registry lookup → NULL (not Mid allocation)
2. L25 registry lookup → NULL (not L25 allocation)
3. **SuperSlab registry lookup** → **NULL** ❌ (BUG!)
4. ExternalGuard → `__libc_free()` → crash

**Why SuperSlab lookup fails**:

**Theory A**: Pointer is NOT from hakmem
- **REJECTED**: System malloc test shows no page-aligned pointers for 16-1040B

**Theory B**: SuperSlab is not registered
- **LIKELY**: Race condition, registry exhaustion, or allocation before registration

**Theory C**: Registry lookup bug
- **POSSIBLE**: Hash collision, probe limit, or alignment mismatch
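
To make the "alignment mismatch" case in Theory C concrete, here is a hedged sketch of how a lookup can miss a registered region. It assumes the registry key is the pointer rounded down to a power-of-two region size. That is an assumption for illustration, and `region_key` and `keys_agree` are hypothetical names, not hakmem functions. The real `hak_super_lookup()` may derive its key differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: registry key = pointer rounded down to a 2^size_log2 boundary. */
static uint64_t region_key(uint64_t ptr, unsigned size_log2)
{
    assert(size_log2 < 64);
    return ptr & ~(((uint64_t)1 << size_log2) - 1);
}

/* Returns 1 when the key computed at registration time (from the region base)
 * equals the key computed at lookup time (from a pointer inside the region). */
static int keys_agree(uint64_t region_base, uint64_t ptr,
                      unsigned register_log2, unsigned lookup_log2)
{
    return region_key(region_base, register_log2) == region_key(ptr, lookup_log2);
}
```

With a region registered at 1 MiB granularity (`register_log2 = 20`) and a lookup that masks at 2 MiB (`lookup_log2 = 21`), a pointer inside a region whose base is 1 MiB-aligned but not 2 MiB-aligned produces a different key, so the lookup returns "not found" even though the region is registered.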

### 4. Why Phase 14 Works but Phase 15 Doesn't

**Phase 14**: Old classification system (no Box FrontGate/ExternalGuard)
- Uses different routing logic
- May have a headerless path for page-aligned pointers
- Possibly a different SuperSlab lookup implementation (unverified)

**Phase 15**: New Box architecture
- Box FrontGate → classifies page-aligned as MIDCAND
- Box routing → SuperSlab lookup returns NULL
- Box ExternalGuard → delegates to `__libc_free()` → **CRASH**

## Fix Options

### Option 1: Fix SuperSlab Registry Lookup ✅ **RECOMMENDED**

**Issue**: `hak_super_lookup(0x706c21a00000)` returns NULL for valid hakmem allocation.

**Root cause options**:
1. SuperSlab not registered (allocation race)
2. Registry full/hash collision
3. Lookup alignment mismatch

**Investigation needed**:
- Add debug logging to `hak_super_register()` / `hak_super_lookup()`
- Check if SuperSlab exists for this pointer
- Verify registration happens before user pointer is returned

**Fix**: Ensure all SuperSlabs are properly registered before returning user pointers.
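
The logging called for above could look roughly like the following toy registry. This is only a sketch under stated assumptions: `toy_register` and `toy_lookup` are hypothetical stand-ins for `hak_super_register()` / `hak_super_lookup()`, whose real signatures and storage scheme are not shown in this note, and the fixed-size linear table is chosen purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_REG_SLOTS 64  /* hypothetical capacity, small enough to exhaust in a test */

static uintptr_t toy_reg[TOY_REG_SLOTS];
static size_t toy_reg_count;

/* Stand-in for registration: log every call, including exhaustion. */
static int toy_register(uintptr_t base)
{
    assert(base != 0);
    if (toy_reg_count == TOY_REG_SLOTS) {
        fprintf(stderr, "[registry] FULL, cannot register base=%p\n", (void *)base);
        return 0;
    }
    toy_reg[toy_reg_count++] = base;
    fprintf(stderr, "[registry] register base=%p slot=%zu\n",
            (void *)base, toy_reg_count - 1);
    return 1;
}

/* Stand-in for lookup: log only the failures, which are the interesting case. */
static int toy_lookup(uintptr_t base)
{
    for (size_t i = 0; i < toy_reg_count; i++) {
        if (toy_reg[i] == base)
            return 1;
    }
    fprintf(stderr, "[registry] lookup MISS base=%p (registered=%zu)\n",
            (void *)base, toy_reg_count);
    return 0;
}
```

Comparing the order of `register` and `MISS` lines for the same base in the log would distinguish "never registered" from "registered after the pointer was already handed out", and a `FULL` line would point at registry exhaustion.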

### Option 2: Add Page-Aligned Special Path in FrontGate ❌ NOT RECOMMENDED

**Idea**: Classify page-aligned Tiny pointers as TINY instead of MIDCAND.

**Problems**:
- Can't read header at `ptr-1` (page boundary violation)
- Would need alternative classification (size class lookup?)
- Violates Box FG design (1-byte header only)

### Option 3: Fix ExternalGuard Fallback ⚠️ WORKAROUND

**Idea**: ExternalGuard should NOT delegate unknown pointers to `__libc_free()`.

**Change**:
```c
// Before (BUG):
if (!is_mapped) return 0;  // Delegate to __libc_free (crashes!)

// After (FIX):
if (!is_mapped) {
    // Unknown pointer - log and return success (leak vs crash tradeoff)
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1;  // Claim handled (prevent __libc_free crash)
}
```

**Cons**: Memory leak for genuinely external pointers.

## Next Steps

1. **Add SuperSlab Registry Debug Logging** ✅
   - Log all `hak_super_register()` calls
   - Log all `hak_super_lookup()` failures
   - Track when `0x706c21a00000` is allocated and registered

2. **Verify Registration Timing**
   - Ensure SuperSlab is registered BEFORE user pointers are returned
   - Check for race conditions in allocation path

3. **Implement Fix Option 1**
   - Fix SuperSlab registry lookup
   - Verify with 100K iterations test

## Conclusion

**Primary Bug**: SuperSlab registry lookup fails for page-aligned Tiny allocations.

**Secondary Bug**: ExternalGuard unconditionally delegates to `__libc_free()` (should handle unknown pointers safely).

**Recommended Fix**: Fix SuperSlab registry (Option 1) + improve ExternalGuard safety (Option 3 as backup).