# Class 2 Header Corruption - FINAL ROOT CAUSE

## Executive Summary

**STATUS**: ✅ **ROOT CAUSE IDENTIFIED**

- **Corrupted Pointer**: `0x74db60210116`
- **Corruption Call**: `14209`
- **Last Valid PUSH**: Call `3957`

**Root Cause**: The logs reveal `0x74db60210115` and `0x74db60210116` (only 1 byte apart) are being pushed/popped from TLS SLL. This spacing is IMPOSSIBLE for Class 2 (32B blocks + 1B header = 33B stride).

**Conclusion**: These are **USER and BASE representations of the SAME block**, indicating a USER/BASE pointer mismatch somewhere in the code that allows USER pointers to leak into the TLS SLL.

---

## Evidence

### Timeline of Corrupted Block

```
[C2_PUSH] ptr=0x74db60210115 before=0xa2 after=0xa2 call=3915     ← USER pointer!
[C2_POP]  ptr=0x74db60210115 header=0xa2 expected=0xa2 call=3936  ← USER pointer!
[C2_PUSH] ptr=0x74db60210116 before=0xa2 after=0xa2 call=3957     ← BASE pointer (correct)
[C2_POP]  ptr=0x74db60210116 header=0x00 expected=0xa2 call=14209 ← CORRUPTION!
```

### Address Analysis

```
0x74db60210115 ← USER pointer (BASE + 1)
0x74db60210116 ← BASE pointer (header location)
```

**Difference**: 1 byte (should be 33 bytes for different Class 2 blocks)

**Conclusion**: Same physical block, two different pointer conventions

---

## Corruption Mechanism

### Phase 1: USER Pointer Leak (Calls 3915-3936)

1. **Call 3915**: FREE operation pushes `0x115` (USER pointer) to TLS SLL
   - BUG: Code path passes USER to `tls_sll_push` instead of BASE
   - TLS SLL receives USER pointer
   - `tls_sll_push` writes header at USER-1 (`0x116`), so header is correct

2. **Call 3936**: ALLOC operation pops `0x115` (USER pointer) from TLS SLL
   - Returns USER pointer to application (correct for external API)
   - User writes to `0x115+` (user data area)
   - Header at `0x116` remains intact (not touched by user)

### Phase 2: Correct BASE Pointer (Call 3957)

3.
   **Call 3957**: FREE operation pushes `0x116` (BASE pointer) to TLS SLL
   - Correct: Passes BASE to `tls_sll_push`
   - Header restored to `0xa2`

### Phase 3: User Overwrites Header (Calls 3957-14209)

4. **Between 3957-14209**: ALLOC operation pops `0x116` from TLS SLL
   - **BUG: Returns BASE pointer to user instead of USER pointer!**
   - User receives `0x116` thinking it's the start of user data
   - User writes to `0x116[0]` (thinks it's user byte 0)
   - **ACTUALLY overwrites header byte!**
   - Header becomes `0x00`

5. **Call 14209**: FREE operation pushes `0x116` to TLS SLL
   - **CORRUPTION DETECTED**: Header is `0x00` instead of `0xa2`

---

## Code Analysis

### Allocation Paths (USER Conversion) ✅ CORRECT

**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_region_id.h:46`

```c
static inline void* tiny_region_id_write_header(void* base, int class_idx) {
    if (!base) return base;
    if (__builtin_expect(class_idx == 7, 0)) {
        return base;  // C7: headerless
    }

    // Write header at BASE
    uint8_t* header_ptr = (uint8_t*)base;
    *header_ptr = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);

    void* user = header_ptr + 1;  // ✅ Convert BASE → USER
    return user;                  // ✅ CORRECT: Returns USER pointer
}
```

**Usage**: All `HAK_RET_ALLOC(class_idx, ptr)` calls use this function, which correctly returns USER pointers.

### Free Paths (BASE Conversion) - MIXED RESULTS

#### Path 1: Ultra-Simple Free ✅ CORRECT

**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_free.inc:383`

```c
void* base = (class_idx == 7) ? ptr : (void*)((uint8_t*)ptr - 1);  // ✅ Convert USER → BASE
if (tls_sll_push(class_idx, base, (uint32_t)sll_cap)) {
    return;  // Success
}
```

**Status**: ✅ CORRECT - Converts USER → BASE before push

#### Path 2: Freelist Drain ❓ SUSPICIOUS

**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_free.inc:75`

```c
static inline void tiny_drain_freelist_to_sll_once(SuperSlab* ss, int slab_idx, int class_idx) {
    // ...
```
```c
    // ... (tiny_drain_freelist_to_sll_once, continued)
    while (m->freelist && moved < budget) {
        void* p = m->freelist;  // ← What is this? BASE or USER?
        // ...
        if (tls_sll_push(class_idx, p, sll_capacity)) {  // ← Pushing p directly
            moved++;
        }
    }
}
```

**Question**: Is `m->freelist` stored as BASE or USER?

**Answer**: Freelist stores pointers at offset 0 (header location for header classes), so `m->freelist` contains **BASE pointers**. This is **CORRECT**.

#### Path 3: Fast Free ❓ NEEDS INVESTIGATION

**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_free_fast_v2.inc.h`

Need to check whether the fast free path converts USER → BASE.

---

## Next Steps: Find the Buggy Path

### Step 1: Check Fast Free Path

```bash
grep -A 10 -B 5 "tls_sll_push" core/tiny_free_fast_v2.inc.h
```

Look for paths that pass `ptr` directly to `tls_sll_push` without USER → BASE conversion.

### Step 2: Check All Free Wrappers

```bash
grep -rn "void.*free.*void.*ptr" core/ | grep -v "\.o:"
```

Check all free entry points to ensure USER → BASE conversion.

### Step 3: Add Validation to tls_sll_push

Temporarily add an address parity check in `tls_sll_push`:

```c
// In tls_sll_box.h: tls_sll_push()
#if !HAKMEM_BUILD_RELEASE
if (class_idx != 7) {
    // For header classes, ptr should be BASE. This check assumes BASE addresses
    // are even and USER pointers (BASE+1) are odd, as in the logged pair above.
    uintptr_t addr = (uintptr_t)ptr;
    if ((addr & 1) != 0) {
        // ODD address = USER pointer!
        extern _Atomic uint64_t malloc_count;
        uint64_t call = atomic_load(&malloc_count);
        fprintf(stderr, "[TLS_SLL_PUSH_BUG] call=%llu cls=%d ptr=%p is ODD (USER pointer!)\n",
                (unsigned long long)call, class_idx, ptr);
        fprintf(stderr, "[TLS_SLL_PUSH_BUG] Caller passed USER instead of BASE!\n");
        fflush(stderr);
        abort();
    }
}
#endif
```

This should catch USER pointers at the injection point, provided BASE addresses really are even. With a 33B stride, adjacent blocks alternate parity, so confirm the assumption against the slab layout before trusting an abort.
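Because the summary gives Class 2 a 33-byte stride, the parity test in Step 3 may flag legitimate BASE pointers on every other block. A stride-based check is one possible alternative. The sketch below is a minimal illustration, not hakmem code: `classify_ptr`, `ptr_kind_t`, `C2_STRIDE`, and the idea that the caller knows the address of the slab's first block are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: 32B payload + 1B header, per the Executive Summary. */
enum { C2_STRIDE = 33 };

/* Hypothetical classification result; not a hakmem type. */
typedef enum { PTR_IS_BASE, PTR_IS_USER, PTR_IS_OTHER } ptr_kind_t;

/*
 * Classify ptr by its offset from the first block of the slab.
 * Offset 0 within a block is the header (BASE); offset 1 is the first
 * user byte (USER = BASE + 1). Anything else points inside the payload.
 * slab_start is assumed to be the BASE address of block 0.
 */
static ptr_kind_t classify_ptr(uintptr_t slab_start, uintptr_t ptr, uintptr_t stride) {
    assert(stride > 1);
    if (ptr < slab_start) {
        return PTR_IS_OTHER;  /* below the slab: not one of its blocks */
    }
    uintptr_t off = (ptr - slab_start) % stride;
    if (off == 0) return PTR_IS_BASE;
    if (off == 1) return PTR_IS_USER;
    return PTR_IS_OTHER;
}
```

For example, with a made-up slab start of `0x1000`, the second block's BASE is `0x1021`, an odd address that `classify_ptr` still reports as `PTR_IS_BASE`. Given the real slab start, the same helper would also confirm which of `0x74db60210115` and `0x74db60210116` sits on a block boundary.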
### Step 4: Run Test

```bash
./build.sh bench_random_mixed_hakmem
timeout 60s ./out/release/bench_random_mixed_hakmem 10000 256 42 2>&1 | tee user_ptr_catch.log
```

The validation is compiled only when `HAKMEM_BUILD_RELEASE` is 0, so make sure the binary under test was built that way.

Expected: Immediate abort. A backtrace (from a debugger or core dump) should then show which path is passing USER pointers.

---

## Hypothesis

Based on the evidence, the bug is likely in:

1. **Fast free path** that doesn't convert USER → BASE before `tls_sll_push`
2. **Some wrapper** around `hakmem_free()` that pre-converts USER → BASE incorrectly
3. **Some refill/drain path** that accidentally uses USER pointers from freelist

**Most Likely**: Fast free path optimization that skips USER → BASE conversion for performance.

---

## Verification Plan

1. Add ODD address validation to `tls_sll_push` (debug builds only)
2. Run 10K iteration test
3. Catch USER pointer injection with backtrace
4. Fix the specific path
5. Re-test with 100K iterations
6. Remove validation (keep in comments for future debugging)

---

## Expected Fix

Once we identify the buggy path, the fix should be a 1-liner:

```c
// BEFORE (BUG):
tls_sll_push(class_idx, user_ptr, ...);  // ← Passing USER!

// AFTER (FIX):
void* base = PTR_USER_TO_BASE(user_ptr, class_idx);  // ✅ Convert to BASE
tls_sll_push(class_idx, base, ...);
```

---

## Status

- ✅ Root cause identified (USER/BASE mismatch)
- ✅ Evidence collected (logs showing ODD/EVEN addresses)
- ✅ Mechanism understood (user overwrites header when given BASE)
- ⏳ Specific buggy path: TO BE IDENTIFIED (next step)
- ⏳ Fix: TO BE APPLIED (1-line change)
- ⏳ Verification: TO BE DONE (100K test)
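The Phase 3 mechanism and the conversion used in the Expected Fix can be reproduced in isolation. The sketch below is a standalone illustration under stated assumptions: `user_to_base`, `base_to_user`, and `header_after_user_write` are hypothetical names, not hakmem APIs, and the real `PTR_USER_TO_BASE` macro may be defined differently. The header value `0xa2` and headerless class 7 are taken from the logs and code excerpts above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for illustration only. */
enum { HDR_CLASS2 = 0xa2, HEADERLESS_CLASS = 7, PAYLOAD_BYTES = 32 };

/* USER -> BASE: step back over the 1-byte header (headerless class unchanged). */
static inline void *user_to_base(void *user, int class_idx) {
    return (class_idx == HEADERLESS_CLASS) ? user : (void *)((uint8_t *)user - 1);
}

/* BASE -> USER: skip the 1-byte header (headerless class unchanged). */
static inline void *base_to_user(void *base, int class_idx) {
    return (class_idx == HEADERLESS_CLASS) ? base : (void *)((uint8_t *)base + 1);
}

/*
 * Simulate an application zeroing its payload through the pointer it was
 * handed. block must hold at least PAYLOAD_BYTES + 1 bytes, and handed_out
 * must be block or block + 1. Returns the header byte after the write.
 */
static uint8_t header_after_user_write(uint8_t *block, void *handed_out) {
    assert(block != NULL && handed_out != NULL);
    block[0] = HDR_CLASS2;                 /* allocator writes header at BASE */
    memset(handed_out, 0, PAYLOAD_BYTES);  /* application fills "its" bytes */
    return block[0];
}
```

Handing out `base_to_user(block, 2)` leaves the header at `0xa2`, while handing out `block` itself (the BASE) zeroes it, which matches the `header=0x00` symptom logged at call `14209`.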