# Class 2 Header Corruption - FINAL ROOT CAUSE

## Executive Summary

**STATUS**: ✅ **ROOT CAUSE IDENTIFIED**

**Corrupted Pointer**: `0x74db60210116`
**Corruption Call**: `14209`
**Last Valid PUSH**: Call `3957`

**Root Cause**: The logs show `0x74db60210115` and `0x74db60210116`, only 1 byte apart, both being pushed to and popped from the TLS SLL. This spacing is impossible for two distinct Class 2 blocks (32B block + 1B header = 33B stride).

**Conclusion**: These are the **USER and BASE representations of the SAME block**. Some code path mixes up the two pointer conventions, which allows USER pointers to leak into the TLS SLL.


---

## Evidence

### Timeline of Corrupted Block

```
[C2_PUSH] ptr=0x74db60210115 before=0xa2 after=0xa2 call=3915  ← USER pointer!
[C2_POP]  ptr=0x74db60210115 header=0xa2 expected=0xa2 call=3936  ← USER pointer!
[C2_PUSH] ptr=0x74db60210116 before=0xa2 after=0xa2 call=3957  ← BASE pointer (correct)
[C2_POP]  ptr=0x74db60210116 header=0x00 expected=0xa2 call=14209  ← CORRUPTION!
```

### Address Analysis

```
0x74db60210115  ← USER pointer (BASE + 1)
0x74db60210116  ← BASE pointer (header location)
```

**Difference**: 1 byte (should be a multiple of 33 bytes for different Class 2 blocks)

**Conclusion**: Same physical block, two different pointer conventions

**Caveat**: The USER/BASE labels above follow the odd/even heuristic used later in this report. `tiny_region_id_write_header` computes USER as BASE + 1, which would make the higher address (`0x...116`) the USER pointer. Confirm the labels against the slab base before relying on them. Either way, the two values are one block seen through two conventions.

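
The stride argument can be checked mechanically. The sketch below assumes the 33-byte Class 2 stride stated above; `C2_STRIDE` and both helper names are hypothetical and do not come from the hakmem sources. Two BASE pointers in the same slab must differ by a multiple of the stride, while a difference of exactly 1 byte is what a BASE/USER pair of one block looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Class 2 stride is 33 bytes (1B header + 32B block), as stated in this report. */
enum { C2_STRIDE = 33 };

/* Absolute distance between two addresses. */
static uintptr_t addr_distance(uintptr_t a, uintptr_t b) {
    return (a > b) ? a - b : b - a;
}

/* True if a and b could both be BASE pointers of different blocks in one slab:
 * their distance must be a non-zero multiple of the stride. */
static bool could_be_distinct_bases(uintptr_t a, uintptr_t b) {
    uintptr_t d = addr_distance(a, b);
    return d != 0 && d % C2_STRIDE == 0;
}

/* True if a and b look like the BASE/USER pair of a single block (1 byte apart). */
static bool looks_like_base_user_pair(uintptr_t a, uintptr_t b) {
    return addr_distance(a, b) == 1;
}
```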

---

## Corruption Mechanism

### Phase 1: USER Pointer Leak (Calls 3915-3936)

1. **Call 3915**: FREE operation pushes `0x115` (USER pointer) to TLS SLL
   - BUG: Code path passes USER to `tls_sll_push` instead of BASE
   - TLS SLL receives USER pointer
   - The logged header byte is `0xa2` both before and after the push, so the header still looks correct at this point

2. **Call 3936**: ALLOC operation pops `0x115` (USER pointer) from TLS SLL
   - Returns USER pointer to application (correct for external API)
   - User writes to the user data area
   - The header still reads `0xa2` at the next push (call 3957), so it was not overwritten during this allocation

### Phase 2: Correct BASE Pointer (Call 3957)

3. **Call 3957**: FREE operation pushes `0x116` (BASE pointer) to TLS SLL
   - Correct: Passes BASE to `tls_sll_push`
   - Header restored to `0xa2`

### Phase 3: User Overwrites Header (Calls 3957-14209)

4. **Between 3957-14209** (inferred; no matching log line appears in the excerpt above): the application ends up holding `0x116`
   - **BUG: Returns BASE pointer to user instead of USER pointer!**
   - User receives `0x116` thinking it's the start of user data
   - User writes to `0x116[0]` (thinks it's user byte 0)
   - **ACTUALLY overwrites header byte!**
   - Header becomes `0x00`

5. **Call 14209**: `C2_POP` of `0x116` from TLS SLL
   - **CORRUPTION DETECTED**: Header is `0x00` instead of `0xa2`

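
The overwrite in Phase 3 can be reproduced in isolation. This is a minimal simulation, not hakmem code: the `sim_*` names are hypothetical, and the magic value `0xa0` with mask `0x0f` is an assumption chosen only so that class 2 yields the `0xa2` byte seen in the logs. Handing out the USER pointer leaves the header intact; handing out BASE lets an ordinary `memset` by the application clear it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: picked so that class 2 produces the 0xa2 header seen in the logs. */
#define SIM_HEADER_MAGIC 0xa0u
#define SIM_CLASS_MASK   0x0fu

enum { SIM_PAYLOAD = 32 };

/* One simulated Class 2 block: 1 header byte followed by the payload. */
typedef struct { uint8_t bytes[1 + SIM_PAYLOAD]; } sim_block;

static uint8_t *sim_base(sim_block *b) { return b->bytes; }      /* header location */
static uint8_t *sim_user(sim_block *b) { return b->bytes + 1; }  /* BASE + 1 */

static void sim_write_header(sim_block *b, int class_idx) {
    b->bytes[0] = (uint8_t)(SIM_HEADER_MAGIC | ((unsigned)class_idx & SIM_CLASS_MASK));
}

/* What an application commonly does with fresh memory: clear the first n bytes. */
static void sim_app_use(uint8_t *p, size_t n) { memset(p, 0, n); }

/* Returns the header byte after the application used the pointer it was handed.
 * hand_out_base != 0 models the bug (BASE returned instead of USER). */
static uint8_t sim_header_after_use(int hand_out_base) {
    sim_block b;
    sim_write_header(&b, 2);
    sim_app_use(hand_out_base ? sim_base(&b) : sim_user(&b), 8);
    return b.bytes[0];
}
```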

---

## Code Analysis

### Allocation Paths (USER Conversion) ✅ CORRECT

**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_region_id.h:46`

```c
static inline void* tiny_region_id_write_header(void* base, int class_idx) {
    if (!base) return base;
    if (__builtin_expect(class_idx == 7, 0)) {
        return base;  // C7: headerless
    }

    // Write header at BASE
    uint8_t* header_ptr = (uint8_t*)base;
    *header_ptr = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);

    void* user = header_ptr + 1;  // ✅ Convert BASE → USER
    return user;                  // ✅ CORRECT: Returns USER pointer
}
```

**Usage**: All `HAK_RET_ALLOC(class_idx, ptr)` calls use this function, which correctly returns USER pointers.

### Free Paths (BASE Conversion) - MIXED RESULTS

#### Path 1: Ultra-Simple Free ✅ CORRECT

**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_free.inc:383`

```c
void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);  // ✅ Convert USER → BASE
if (tls_sll_push(class_idx, base, (uint32_t)sll_cap)) {
    return;  // Success
}
```

**Status**: ✅ CORRECT - Converts USER → BASE before push

#### Path 2: Freelist Drain ❓ SUSPICIOUS

**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_free.inc:75`

```c
static inline void tiny_drain_freelist_to_sll_once(SuperSlab* ss, int slab_idx, int class_idx) {
    // ...
    while (m->freelist && moved < budget) {
        void* p = m->freelist;  // ← What is this? BASE or USER?
        // ...
        if (tls_sll_push(class_idx, p, sll_capacity)) {  // ← Pushing p directly
            moved++;
        }
    }
}
```

**Question**: Is `m->freelist` stored as BASE or USER?

**Answer**: The freelist link is stored at offset 0 of each block (the header location for header classes), so `m->freelist` contains **BASE pointers**. Pushing `p` directly is therefore **CORRECT**.

#### Path 3: Fast Free ❓ NEEDS INVESTIGATION

**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_free_fast_v2.inc.h`

Still to be checked: whether the fast free path converts USER → BASE before pushing.


---

## Next Steps: Find the Buggy Path

### Step 1: Check Fast Free Path

```bash
grep -A 10 -B 5 "tls_sll_push" core/tiny_free_fast_v2.inc.h
```

Look for paths that pass `ptr` directly to `tls_sll_push` without USER → BASE conversion.

### Step 2: Check All Free Wrappers

```bash
grep -rn "void.*free.*void.*ptr" core/ | grep -v "\.o:"
```

Check all free entry points to ensure USER → BASE conversion.

### Step 3: Add Validation to tls_sll_push

Temporarily add an address parity check in `tls_sll_push`:

```c
// In tls_sll_box.h: tls_sll_push()
#if !HAKMEM_BUILD_RELEASE
if (class_idx != 7) {
    // Heuristic: for header classes, treat an odd address as a USER pointer (BASE + 1).
    // This holds only if every BASE address is even. With the 33B stride described
    // above, BASE parity can alternate between neighboring blocks, so confirm the
    // slab layout before trusting an abort from this check.
    uintptr_t addr = (uintptr_t)ptr;
    if ((addr & 1) != 0) {  // ODD address = suspected USER pointer!
        extern _Atomic uint64_t malloc_count;
        uint64_t call = atomic_load(&malloc_count);
        // Cast for a portable format specifier; "\n" must be a real newline escape.
        fprintf(stderr, "[TLS_SLL_PUSH_BUG] call=%llu cls=%d ptr=%p is ODD (USER pointer!)\n",
                (unsigned long long)call, class_idx, ptr);
        fprintf(stderr, "[TLS_SLL_PUSH_BUG] Caller passed USER instead of BASE!\n");
        fflush(stderr);
        abort();
    }
}
#endif
```

This catches USER pointers immediately at the injection point, provided the parity assumption holds.

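
A parity test only works when every BASE address is even, which an odd stride does not guarantee. A slab-relative check avoids that assumption. The sketch below is not hakmem code: `is_block_base` and its parameters `slab_start`, `stride`, and `block_count` are hypothetical, and how the allocator exposes those values is not shown in this report. A pointer is accepted only if its offset from the slab start is an exact multiple of the stride.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if ptr sits exactly on a block boundary inside
 * [slab_start, slab_start + stride * block_count). */
static bool is_block_base(const void *slab_start, size_t stride,
                          size_t block_count, const void *ptr) {
    uintptr_t start = (uintptr_t)slab_start;
    uintptr_t p = (uintptr_t)ptr;

    if (stride == 0 || p < start) return false;

    uintptr_t off = p - start;
    if (off >= (uintptr_t)stride * block_count) return false;  /* past the slab */

    return off % stride == 0;  /* BASE pointers land on stride multiples */
}
```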
### Step 4: Run Test

```bash
./build.sh bench_random_mixed_hakmem
timeout 60s ./out/release/bench_random_mixed_hakmem 10000 256 42 2>&1 | tee user_ptr_catch.log
```

Expected: Immediate abort at the first odd-address push. `abort()` itself prints no backtrace, so run under a debugger or inspect the core dump to see which path is passing USER pointers. The check is compiled only when `HAKMEM_BUILD_RELEASE` is 0, so the binary must come from a non-release build.

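
If attaching a debugger is inconvenient, the call stack can be printed from inside the process just before `abort()`. This is a sketch that assumes a glibc-based Linux system, where `backtrace()` and `backtrace_symbols_fd()` are declared in `<execinfo.h>`; the helper name `dump_backtrace_to_stderr` is hypothetical. Symbol names for functions in the main executable usually require linking with `-rdynamic`, and `static inline` functions may not show up by name at all.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

enum { MAX_FRAMES = 32 };

/* Writes the current call stack to stderr and returns the number of frames captured.
 * backtrace_symbols_fd() writes directly to the file descriptor without calling
 * malloc, which matters when the code being debugged is the allocator itself. */
static int dump_backtrace_to_stderr(void) {
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```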

---

## Hypothesis

Based on the evidence, the bug is likely in:

1. **Fast free path** that doesn't convert USER → BASE before `tls_sll_push`
2. **Some wrapper** around `hakmem_free()` that pre-converts USER → BASE incorrectly
3. **Some refill/drain path** that accidentally uses USER pointers from freelist

**Most Likely**: Fast free path optimization that skips USER → BASE conversion for performance.


---

## Verification Plan

1. Add ODD address validation to `tls_sll_push` (debug builds only)
2. Run 10K iteration test
3. Catch USER pointer injection with backtrace
4. Fix the specific path
5. Re-test with 100K iterations
6. Remove validation (keep in comments for future debugging)


---

## Expected Fix

Once we identify the buggy path, the fix will be a 1-liner:

```c
// BEFORE (BUG):
tls_sll_push(class_idx, user_ptr, ...);  // ← Passing USER!

// AFTER (FIX):
void* base = PTR_USER_TO_BASE(user_ptr, class_idx);  // ✅ Convert to BASE
tls_sll_push(class_idx, base, ...);
```

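
The definition of `PTR_USER_TO_BASE` is not shown in this report. A minimal sketch of such a conversion pair, written as functions with hypothetical lowercase names, could look like the following. It assumes the layout described above: a 1-byte header in front of the user data for every class except the headerless class 7. Routing every conversion through one pair like this keeps the `- 1` / `+ 1` arithmetic out of individual free paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Class 7 is headerless per the code shown in this report. */
enum { HEADERLESS_CLASS = 7 };

/* USER → BASE: step back over the 1-byte header (no-op for NULL and class 7). */
static inline void *ptr_user_to_base(void *user, int class_idx) {
    if (!user || class_idx == HEADERLESS_CLASS) return user;
    return (uint8_t *)user - 1;
}

/* BASE → USER: skip the 1-byte header (no-op for NULL and class 7). */
static inline void *ptr_base_to_user(void *base, int class_idx) {
    if (!base || class_idx == HEADERLESS_CLASS) return base;
    return (uint8_t *)base + 1;
}
```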

---

## Status

- ✅ Root cause identified (USER/BASE mismatch)
- ✅ Evidence collected (logs showing ODD/EVEN addresses)
- ✅ Mechanism understood (user overwrites header when given BASE)
- ⏳ Specific buggy path: TO BE IDENTIFIED (next step)
- ⏳ Fix: TO BE APPLIED (1-line change)
- ⏳ Verification: TO BE DONE (100K test)