# FREELIST CORRUPTION ROOT CAUSE ANALYSIS

## Phase 6-2.5 SLAB0_DATA_OFFSET Investigation

### Executive Summary

The freelist corruption observed after changing SLAB0_DATA_OFFSET from 1024 to 2048 is **NOT caused by the offset change**. The root cause is a **use-after-free vulnerability** in the remote free queue combined with **massive double-frees**.

### Timeline

- **Initial symptom:** `[TRC_FAILFAST] stage=freelist_next cls=7 node=0x7e1ff3c1d474`
- **Investigation started:** After Phase 6-2.5 offset change
- **Root cause found:** Use-after-free in `ss_remote_push` + double-frees

### Root Cause Analysis

#### 1. Double-Free Epidemic

```bash
# Test reveals 180+ duplicate freed addresses
HAKMEM_WRAP_TINY=1 ./larson_hakmem 1 1 1024 1024 1 12345 1 | \
  grep "free_local_box" | awk '{print $6}' | sort | uniq -d | wc -l
# Result: 180+ duplicates
```

#### 2. Use-After-Free Vulnerability

**Location:** `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.h:437`

```c
static inline int ss_remote_push(SuperSlab* ss, int slab_idx, void* ptr) {
    // ... validation ...
    do {
        old = atomic_load_explicit(head, memory_order_acquire);
        if (!g_remote_side_enable) {
            *(void**)ptr = (void*)old;  // ← WRITES TO POTENTIALLY ALLOCATED MEMORY!
        }
    } while (!atomic_compare_exchange_weak_explicit(...));
}
```

The next pointer is written into `ptr` without checking that the block is still free from the owner's point of view. If the block has already been drained and reallocated, the write lands in live user data.

#### 3. The Attack Sequence

1. Thread A frees block X → pushed to remote queue (next pointer written)
2. Thread B (owner) drains remote queue → adds X to freelist
3. Thread B allocates X → application starts using it
4. Thread C double-frees X → `ss_remote_push` writes a next pointer into X and **corrupts active user memory**
5. User writes data including `0x6261` pattern
6. Freelist traversal interprets user data as next pointer → **CRASH**

### Evidence

#### Corrupted Pointers

- `0x7c1b4a606261` - User data ending with 0x6261 pattern
- `0x6261` - Pure user data, no valid address
- Pattern `0x6261` detected as "TLS guard scribble" in code

#### Debug Output

```
[TRC_FREELIST_LOG] stage=free_local_box cls=7 node=0x7da27ec0b800 next=0x7da27ec0bc00
[TRC_FREELIST_LOG] stage=free_local_box cls=7 node=0x7da27ec0b800 next=0x7da27ec04000
                                                   ^^^^^^^^^^^^^^ SAME ADDRESS FREED TWICE!
```

#### Remote Queue Activity

```
[DEBUG ss_remote_push] Call #1 ss=0x735d23e00000 slab_idx=0
[DEBUG ss_remote_push] Call #2 ss=0x735d23e00000 slab_idx=5
[TRC_FAILFAST] stage=freelist_next cls=7 node=0x6261
```

### Why SLAB0_DATA_OFFSET Change Exposed This

The offset change from 1024 to 2048 did not cause the bug, but it may have:

1. Changed memory layout/timing
2. Made corruption more visible
3. Affected which blocks get double-freed

The bug existed before the change but was latent.

### Attempted Mitigations

#### 1. Enable Safe Free (COMPLETED)

```c
// core/hakmem_tiny.c:39
int g_tiny_safe_free = 1;  // ULTRATHINK FIX: Enable by default
```

**Result:** Still crashes - race condition persists

#### 2. Required Fixes (PENDING)

- Add ownership validation before writing next pointer
- Implement proper memory barriers
- Add atomic state tracking for blocks
- Consider hazard pointers or epoch-based reclamation

### Reproduction

```bash
# Immediate crash with SuperSlab enabled
HAKMEM_WRAP_TINY=1 ./larson_hakmem 1 1 1024 1024 1 12345 1

# Works fine without SuperSlab
HAKMEM_WRAP_TINY=0 ./larson_hakmem 1 1 1024 1024 1 12345 1
```

### Recommendations

1. **IMMEDIATE:** Do not use in production
2. **SHORT-TERM:** Disable remote free queue (`HAKMEM_TINY_DISABLE_REMOTE=1`)
3. **LONG-TERM:** Redesign lock-free MPSC with safe memory reclamation

### Technical Details

#### Memory Layout (Class 7, 1024-byte blocks)

```
SuperSlab base: 0x7c1b4a600000
Slab 0 start:   0x7c1b4a600000 + 2048 = 0x7c1b4a600800
Block 0:        0x7c1b4a600800
Block 1:        0x7c1b4a600c00
Block 42:       0x7c1b4a60b000 (offset 43008 from slab 0 start)
```

#### Validation Points

- Offset 2048 is correct (aligns to 1024-byte blocks)
- `sizeof(SuperSlab) = 1088` exceeds 1024 bytes, so the first 1024-aligned data offset is 2048
- All legitimate blocks ARE properly aligned
- Corruption comes from use-after-free, not misalignment

### Conclusion

The HAKMEM allocator has a **critical memory safety bug** in its lock-free remote free queue. The bug allows:

- Use-after-free corruption
- Double-free vulnerabilities
- Memory corruption of active allocations

This is a **SECURITY VULNERABILITY** that could be exploited for arbitrary code execution.

### Author

Claude Opus 4.1 (ULTRATHINK Mode)

Analysis Date: 2025-11-07
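
### Appendix: Block State Guard Sketch

The pending fix "add atomic state tracking for blocks" can be illustrated with a minimal sketch. It is not the HAKMEM implementation: `DemoSlab`, `guarded_remote_push`, `guarded_drain`, `guarded_mark_allocated`, and the side table `state[]` are hypothetical names, and the layout is a toy stand-in for a real slab. The idea is that a freeing thread must win an ALLOCATED → FREE transition in a side table before it may write the next pointer into the block, so a second free of a block that is still free is rejected without touching memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_SIZE = 1024, BLOCK_COUNT = 8 };
enum { STATE_ALLOCATED = 0, STATE_FREE = 1 };

static_assert(BLOCK_SIZE >= sizeof(uintptr_t), "block must hold a next pointer");

/* Hypothetical stand-in for one slab; NOT the real SuperSlab layout. */
typedef struct {
    unsigned char data[BLOCK_COUNT * BLOCK_SIZE];
    atomic_uchar state[BLOCK_COUNT];  /* side table, kept outside user memory */
    _Atomic(uintptr_t) remote_head;   /* head of the remote free stack, 0 = empty */
} DemoSlab;

/* Returns 0 on success, -1 if ptr is not a block start in this slab,
 * -2 if the block is already free (double free rejected). */
static int guarded_remote_push(DemoSlab *slab, void *ptr) {
    uintptr_t base = (uintptr_t)slab->data;
    uintptr_t p = (uintptr_t)ptr;
    if (p < base || p - base >= sizeof slab->data || (p - base) % BLOCK_SIZE != 0)
        return -1;
    size_t idx = (p - base) / BLOCK_SIZE;

    /* Claim the ALLOCATED -> FREE transition before touching the block. */
    unsigned char expected = STATE_ALLOCATED;
    if (!atomic_compare_exchange_strong_explicit(&slab->state[idx], &expected,
                                                 STATE_FREE, memory_order_acq_rel,
                                                 memory_order_acquire))
        return -2;

    /* Only the thread that won the transition writes the next pointer. */
    uintptr_t old = atomic_load_explicit(&slab->remote_head, memory_order_acquire);
    do {
        memcpy(ptr, &old, sizeof old);
    } while (!atomic_compare_exchange_weak_explicit(&slab->remote_head, &old, p,
                                                    memory_order_release,
                                                    memory_order_relaxed));
    return 0;
}

/* Owner side: detach the whole remote stack and count its blocks. */
static size_t guarded_drain(DemoSlab *slab) {
    uintptr_t node = atomic_exchange_explicit(&slab->remote_head, 0,
                                              memory_order_acquire);
    size_t count = 0;
    while (node != 0) {
        uintptr_t next;
        memcpy(&next, (void *)node, sizeof next);
        node = next;
        count++;
    }
    return count;
}

/* Owner side: called when a block is handed back to the application. */
static void guarded_mark_allocated(DemoSlab *slab, size_t idx) {
    atomic_store_explicit(&slab->state[idx], STATE_ALLOCATED, memory_order_release);
}
```

With a zero-initialized `DemoSlab`, pushing the same block twice returns `0` and then `-2`, and the second call leaves the block contents untouched. This guard covers only the case where the block is still free when the second free arrives. A stale free that arrives after the block has been reallocated (step 4 of the attack sequence) sees `STATE_ALLOCATED` and passes, so closing that window would need something stronger, such as a per-block generation tag or the ownership validation listed under Required Fixes.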