183 lines
5.3 KiB
Markdown
183 lines
5.3 KiB
Markdown
|
|
# Phase 15 Registry Lookup Investigation
|
|||
|
|
|
|||
|
|
**Date**: 2025-11-15
|
|||
|
|
**Status**: 🔍 ROOT CAUSE IDENTIFIED
|
|||
|
|
|
|||
|
|
## Summary
|
|||
|
|
|
|||
|
|
Page-aligned Tiny allocations reach ExternalGuard → SuperSlab registry lookup FAILS → delegated to `__libc_free()` → crash.
|
|||
|
|
|
|||
|
|
## Critical Findings
|
|||
|
|
|
|||
|
|
### 1. Registry Only Stores ONE SuperSlab
|
|||
|
|
|
|||
|
|
**Evidence**:
|
|||
|
|
```
|
|||
|
|
[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870 magic=5353504c
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Only 1 registration** in entire test run (10K iterations, 100K operations).
|
|||
|
|
|
|||
|
|
### 2. 4MB Address Gap
|
|||
|
|
|
|||
|
|
**Pattern** (consistent across multiple runs):
|
|||
|
|
- **Registry stores**: `0x7d3893c00000` (SuperSlab structure address)
|
|||
|
|
- **Lookup searches**: `0x7d3893800000` (user pointer, 4MB **lower**)
|
|||
|
|
- **Difference**: `0x400000 = 4MB = 2 × SuperSlab size (lg=21, 2MB)`
|
|||
|
|
|
|||
|
|
### 3. User Data Layout
|
|||
|
|
|
|||
|
|
**From code analysis** (`superslab_inline.h:30-35`):
|
|||
|
|
|
|||
|
|
```c
|
|||
|
|
size_t off = SUPERSLAB_SLAB0_DATA_OFFSET + (size_t)slab_idx * SLAB_SIZE;
|
|||
|
|
return (uint8_t*)ss + off;
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**User data is placed AFTER SuperSlab structure**, NOT before!
|
|||
|
|
|
|||
|
|
**Implication**: User pointer `0x7d3893800000` **cannot** belong to SuperSlab `0x7d3893c00000` (4MB higher).
|
|||
|
|
|
|||
|
|
### 4. mmap Alignment Mechanism
|
|||
|
|
|
|||
|
|
**Code** (`hakmem_tiny_superslab.c:280-308`):
|
|||
|
|
|
|||
|
|
```c
|
|||
|
|
size_t alloc_size = ss_size * 2; // Allocate 4MB for 2MB SuperSlab
|
|||
|
|
void* raw = mmap(NULL, alloc_size, ...);
|
|||
|
|
uintptr_t aligned_addr = (raw_addr + ss_mask) & ~ss_mask; // 2MB align
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Scenario**:
|
|||
|
|
- mmap returns `0x7d3893800000` (already 2MB-aligned)
|
|||
|
|
- `aligned_addr = 0x7d3893800000` (no change)
|
|||
|
|
- Prefix size = 0, Suffix = 2MB (munmapped)
|
|||
|
|
- **SuperSlab registered at**: `0x7d3893800000`
|
|||
|
|
|
|||
|
|
**Contradiction**: Registry shows `0x7d3893c00000`, not `0x7d3893800000`!
|
|||
|
|
|
|||
|
|
### 5. Hash Slot Mismatch
|
|||
|
|
|
|||
|
|
**Lookup**:
|
|||
|
|
```
|
|||
|
|
[SUPER_LOOKUP] ptr=0x7d3893800000 lg=21 aligned_base=0x7d3893800000 hash=115868
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Registry**:
|
|||
|
|
```
|
|||
|
|
[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Hash difference**: 115868 vs 115870 (2 slots apart)
|
|||
|
|
**Reason**: Linear probing found different slot due to collision.
|
|||
|
|
|
|||
|
|
## Root Cause Hypothesis
|
|||
|
|
|
|||
|
|
### Option A: Multiple SuperSlabs, Only One Registered
|
|||
|
|
|
|||
|
|
**Theory**: Multiple SuperSlabs allocated, but only the **last one** is logged.
|
|||
|
|
|
|||
|
|
**Problem**: Debug logging should show ALL registrations after fix (ENV check on every call).
|
|||
|
|
|
|||
|
|
### Option B: LRU Cache Reuse
|
|||
|
|
|
|||
|
|
**Theory**: Most SuperSlabs come from LRU cache (already registered), only new allocations are logged.
|
|||
|
|
|
|||
|
|
**Problem**: First few iterations should still show multiple registrations.
|
|||
|
|
|
|||
|
|
### Option C: Pointer is NOT from hakmem
|
|||
|
|
|
|||
|
|
**Theory**: `0x7d3893800000` is allocated by **`__libc_malloc()`**, NOT hakmem.
|
|||
|
|
|
|||
|
|
**Evidence**:
|
|||
|
|
- Box BenchMeta uses `__libc_calloc` for `slots[]` array
|
|||
|
|
- `free(slots[idx])` uses hakmem wrapper
|
|||
|
|
- **But**: `slots[]` array itself is freed with `__libc_free(slots)` (Line 99)
|
|||
|
|
|
|||
|
|
**Contradiction**: `slots[]` should NOT reach hakmem `free()` wrapper.
|
|||
|
|
|
|||
|
|
### Option D: Registry Lookup Bug
|
|||
|
|
|
|||
|
|
**Theory**: SuperSlab **is** registered at `0x7d3893800000`, but lookup fails due to:
|
|||
|
|
1. Hash collision (different slot used during registration vs lookup)
|
|||
|
|
2. Linear probing limit exceeded (SUPER_MAX_PROBE = 8)
|
|||
|
|
3. Alignment mismatch (looking for wrong base address)
|
|||
|
|
|
|||
|
|
## Test Results Comparison
|
|||
|
|
|
|||
|
|
| Phase | Test Result | Behavior |
|
|||
|
|
|-------|-------------|----------|
|
|||
|
|
| Phase 14 | ✅ PASS (5.69M ops/s) | No crash with same test |
|
|||
|
|
| Phase 15 | ❌ CRASH | ExternalGuard → `__libc_free()` failure |
|
|||
|
|
|
|||
|
|
**Conclusion**: Phase 15 Box Separation introduced regression.
|
|||
|
|
|
|||
|
|
## Next Steps
|
|||
|
|
|
|||
|
|
### Investigation Needed
|
|||
|
|
|
|||
|
|
1. **Add more detailed logging**:
|
|||
|
|
- Log ALL mmap calls with returned address
|
|||
|
|
- Log prefix/suffix munmap with exact ranges
|
|||
|
|
- Log final SuperSlab address vs mmap address
|
|||
|
|
- Track which pointers are allocated from which SuperSlab
|
|||
|
|
|
|||
|
|
2. **Verify registry integrity**:
|
|||
|
|
- Dump entire registry before crash
|
|||
|
|
- Check for hash collisions
|
|||
|
|
- Verify linear probing behavior
|
|||
|
|
|
|||
|
|
3. **Test with reduced SuperSlab size**:
|
|||
|
|
- Try lg=20 (1MB) instead of lg=21 (2MB)
|
|||
|
|
- See if 2MB gap still occurs
|
|||
|
|
|
|||
|
|
### Fix Options
|
|||
|
|
|
|||
|
|
#### **Option 1: Fix SuperSlab Registry Lookup** ✅ **RECOMMENDED**
|
|||
|
|
|
|||
|
|
**Issue**: Registry lookup fails for valid hakmem allocations.
|
|||
|
|
|
|||
|
|
**Potential fixes**:
|
|||
|
|
- Increase SUPER_MAX_PROBE from 8 to 16/32
|
|||
|
|
- Use better hash function to reduce collisions
|
|||
|
|
- Store address **range** instead of single base
|
|||
|
|
- Support lookup by any address within SuperSlab region
|
|||
|
|
|
|||
|
|
#### **Option 2: Improve ExternalGuard Safety** ⚠️ **WORKAROUND**
|
|||
|
|
|
|||
|
|
**Current behavior** (DANGEROUS):
|
|||
|
|
```c
|
|||
|
|
if (!is_mapped) return 0; // Delegate to __libc_free → CRASH!
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Safer behavior**:
|
|||
|
|
```c
|
|||
|
|
if (!is_mapped) {
|
|||
|
|
fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
|
|||
|
|
return 1; // Claim handled (leak vs crash tradeoff)
|
|||
|
|
}
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**Pros**: Prevents crash
|
|||
|
|
**Cons**: Memory leak for genuinely external pointers
|
|||
|
|
|
|||
|
|
#### **Option 3: Fix Box FrontGate Classification** ❌ NOT RECOMMENDED
|
|||
|
|
|
|||
|
|
**Idea**: Add special path for page-aligned Tiny pointers.
|
|||
|
|
|
|||
|
|
**Problems**:
|
|||
|
|
- Can't read header at `ptr-1` (page boundary violation)
|
|||
|
|
- Violates 1-byte header design
|
|||
|
|
- Requires alternative classification
|
|||
|
|
|
|||
|
|
## Conclusion
|
|||
|
|
|
|||
|
|
**Primary Issue**: SuperSlab registry lookup fails for page-aligned user pointers.
|
|||
|
|
|
|||
|
|
**Secondary Issue**: ExternalGuard unconditionally delegates unknown pointers to `__libc_free()`.
|
|||
|
|
|
|||
|
|
**Recommended Action**:
|
|||
|
|
1. Fix registry lookup (Option 1)
|
|||
|
|
2. Add ExternalGuard safety (Option 2 as backup)
|
|||
|
|
3. Comprehensive logging to confirm root cause
|