# FREELIST CORRUPTION ROOT CAUSE ANALYSIS

## Phase 6-2.5 SLAB0_DATA_OFFSET Investigation

### Executive Summary

The freelist corruption observed after changing `SLAB0_DATA_OFFSET` from 1024 to 2048 is **NOT caused by the offset change**. The root cause is a **use-after-free vulnerability** in the remote free queue combined with **widespread double-frees** (180+ duplicate freed addresses in a single test run).

### Timeline

- **Initial symptom:** `[TRC_FAILFAST] stage=freelist_next cls=7 node=0x7e1ff3c1d474`
- **Investigation started:** After the Phase 6-2.5 offset change
- **Root cause found:** Use-after-free in `ss_remote_push` + double-frees

### Root Cause Analysis

#### 1. Double-Free Epidemic

```bash
# Test reveals 180+ duplicate freed addresses
HAKMEM_WRAP_TINY=1 ./larson_hakmem 1 1 1024 1024 1 12345 1 | \
  grep "free_local_box" | awk '{print $6}' | sort | uniq -d | wc -l
# Result: 180+ duplicates
```

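The pipeline above counts addresses that appear more than once in the free log. The same counting step can be sketched in C for use inside a test harness. The function name `count_repeated_addrs` and the idea of collecting freed addresses into an array are assumptions made for illustration, not part of HAKMEM. Because an allocator legitimately reuses addresses, a repeat indicates a double-free only when no allocation of that block falls between the two frees; this sketch, like the shell pipeline, does not check that.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for addresses; comparison form avoids subtraction overflow. */
static int cmp_addr(const void *a, const void *b) {
    uintptr_t x = *(const uintptr_t *)a;
    uintptr_t y = *(const uintptr_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts addrs in place and returns how many distinct
 * addresses occur more than once (the figure `sort | uniq -d | wc -l` reports). */
static size_t count_repeated_addrs(uintptr_t *addrs, size_t n) {
    size_t repeated = 0;
    if (n < 2) {
        return 0;
    }
    qsort(addrs, n, sizeof addrs[0], cmp_addr);
    for (size_t i = 1; i < n; i++) {
        /* Count each value once, at the first repeat within its run. */
        if (addrs[i] == addrs[i - 1] && (i == 1 || addrs[i] != addrs[i - 2])) {
            repeated++;
        }
    }
    return repeated;
}
```

Feeding it every `node=` address logged at `stage=free_local_box` should reproduce the duplicate count, assuming the log is complete.
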
#### 2. Use-After-Free Vulnerability

**Location:** `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.h:437`

```c
static inline int ss_remote_push(SuperSlab* ss, int slab_idx, void* ptr) {
    // ... validation ...
    do {
        old = atomic_load_explicit(head, memory_order_acquire);
        if (!g_remote_side_enable) {
            // Stores the queue link inside the block itself. If ptr was
            // already freed and reallocated, this overwrites live user data.
            *(void**)ptr = (void*)old;
        }
    } while (!atomic_compare_exchange_weak_explicit(...));
}
```

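For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of an intrusive lock-free push and drain using C11 atomics. `RemoteQueue`, `remote_push`, and `remote_drain` are hypothetical names chosen for this example; the real `ss_remote_push` has additional validation and a side-table mode that are elided in the excerpt above and not reproduced here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one slab's remote-free head (not HAKMEM's type). */
typedef struct {
    _Atomic uintptr_t head;  /* address of the most recently pushed block, 0 if empty */
} RemoteQueue;

/* Producer side: link the freed block in front of the current head. */
static void remote_push(RemoteQueue *q, void *ptr) {
    uintptr_t old = atomic_load_explicit(&q->head, memory_order_acquire);
    do {
        /* In-block write: only safe if the caller really owns ptr. */
        *(uintptr_t *)ptr = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->head, &old, (uintptr_t)ptr,
                 memory_order_release, memory_order_relaxed));
}

/* Owner side: detach the whole chain with a single exchange. */
static void *remote_drain(RemoteQueue *q) {
    return (void *)atomic_exchange_explicit(&q->head, 0, memory_order_acquire);
}
```

The push never checks whether `ptr` is currently free or allocated, so a second free of the same block performs the in-block write regardless of who is using the memory.
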
#### 3. The Attack Sequence

1. Thread A frees block X → X is pushed to the remote queue (next pointer written into X)
2. Thread B (owner) drains the remote queue → X is added to the freelist
3. Thread B allocates X → the application starts using it
4. Thread C frees X a second time → the next-pointer write **corrupts active user memory**
5. The application writes its own data into X, including the `0x6261` pattern
6. Freelist traversal interprets that user data as a next pointer → **CRASH**

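The sequence above can be replayed deterministically on a single thread, which makes the corruption easier to reason about than the racy original. This is a simplified model under stated assumptions: `FreeList`, `fl_push`, `fl_pop`, and `replay_double_free` are hypothetical, the remote queue and the freelist are collapsed into one list, and the user payload is written as a whole word with value `0x6261`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical single-threaded model of an intrusive freelist. */
typedef struct { void *head; } FreeList;

static void fl_push(FreeList *fl, void *blk) {
    memcpy(blk, &fl->head, sizeof fl->head);  /* next pointer lives in the block */
    fl->head = blk;
}

static void *fl_pop(FreeList *fl) {
    void *blk = fl->head;
    if (blk) {
        memcpy(&fl->head, blk, sizeof fl->head);  /* trust the in-block next pointer */
    }
    return blk;
}

/* Replays steps 1-6 and returns the bogus "next" the freelist ends up holding. */
static uintptr_t replay_double_free(void) {
    _Alignas(16) static unsigned char x[1024];
    FreeList fl = { NULL };

    fl_push(&fl, x);                 /* steps 1-2: X freed, drained into the freelist */
    void *user = fl_pop(&fl);        /* step 3: X handed to the application */
    fl_push(&fl, x);                 /* step 4: second free relinks X while in use */

    uintptr_t payload = 0x6261;      /* step 5: application stores its own data */
    memcpy(user, &payload, sizeof payload);

    (void)fl_pop(&fl);               /* step 6: pop X, user data is read as "next" */
    return (uintptr_t)fl.head;
}
```

After the replay the list head holds the application's payload instead of a block address, matching the `node=0x6261` seen in the fail-fast trace.
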
### Evidence

#### Corrupted Pointers

- `0x7c1b4a606261` - User data ending with the `0x6261` pattern
- `0x6261` - Pure user data, no valid address
- Pattern `0x6261` detected as "TLS guard scribble" in code

#### Debug Output

```
[TRC_FREELIST_LOG] stage=free_local_box cls=7 node=0x7da27ec0b800 next=0x7da27ec0bc00
[TRC_FREELIST_LOG] stage=free_local_box cls=7 node=0x7da27ec0b800 next=0x7da27ec04000
                                                   ^^^^^^^^^^^^^^ SAME ADDRESS FREED TWICE!
```

#### Remote Queue Activity

```
[DEBUG ss_remote_push] Call #1 ss=0x735d23e00000 slab_idx=0
[DEBUG ss_remote_push] Call #2 ss=0x735d23e00000 slab_idx=5
[TRC_FAILFAST] stage=freelist_next cls=7 node=0x6261
```

### Why SLAB0_DATA_OFFSET Change Exposed This

The offset change from 1024 to 2048 did not cause the bug, which already existed in latent form. The change may have:

1. Changed memory layout and timing
2. Made the corruption more visible
3. Affected which blocks get double-freed

### Attempted Mitigations

#### 1. Enable Safe Free (COMPLETED)

```c
// core/hakmem_tiny.c:39
int g_tiny_safe_free = 1;  // ULTRATHINK FIX: Enable by default
```

**Result:** Still crashes; the race condition persists.

#### 2. Required Fixes (PENDING)

- Add ownership validation before writing the next pointer
- Implement proper memory barriers
- Add atomic state tracking for blocks
- Consider hazard pointers or epoch-based reclamation

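As one possible shape for the "atomic state tracking" item, the sketch below keeps a state word per block outside the block's own memory and uses a compare-and-swap to decide who may free it. `BlockState`, `block_try_mark_free`, and `block_try_mark_allocated` are hypothetical names, and where such state would live in HAKMEM's metadata is left open. This illustrates the idea and is not the project's fix.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { BLOCK_ALLOCATED = 0, BLOCK_FREE = 1 };

/* Hypothetical out-of-band state word, one per block, stored outside the block. */
typedef struct { atomic_int state; } BlockState;

/* Returns true if this caller won the right to free the block. A second
 * free sees BLOCK_FREE and is rejected before any write into the block. */
static bool block_try_mark_free(BlockState *bs) {
    int expected = BLOCK_ALLOCATED;
    return atomic_compare_exchange_strong_explicit(
        &bs->state, &expected, BLOCK_FREE,
        memory_order_acq_rel, memory_order_acquire);
}

/* Allocation path: flips FREE back to ALLOCATED when the block is handed out. */
static bool block_try_mark_allocated(BlockState *bs) {
    int expected = BLOCK_FREE;
    return atomic_compare_exchange_strong_explicit(
        &bs->state, &expected, BLOCK_ALLOCATED,
        memory_order_acq_rel, memory_order_acquire);
}
```

A free path would call `block_try_mark_free` first and skip the next-pointer write (and report the double-free) when it returns false. The trade-off is one extra atomic operation per alloc and free plus the metadata storage.
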
### Reproduction

```bash
# Immediate crash with SuperSlab enabled
HAKMEM_WRAP_TINY=1 ./larson_hakmem 1 1 1024 1024 1 12345 1

# Works fine without SuperSlab
HAKMEM_WRAP_TINY=0 ./larson_hakmem 1 1 1024 1024 1 12345 1
```

### Recommendations

1. **IMMEDIATE:** Do not use in production
2. **SHORT-TERM:** Disable remote free queue (`HAKMEM_TINY_DISABLE_REMOTE=1`)
3. **LONG-TERM:** Redesign lock-free MPSC with safe memory reclamation

### Technical Details

#### Memory Layout (Class 7, 1024-byte blocks)

```
SuperSlab base: 0x7c1b4a600000
Slab 0 start:   0x7c1b4a600000 + 2048 = 0x7c1b4a600800
Block 0:        0x7c1b4a600800
Block 1:        0x7c1b4a600c00
Block 42:       0x7c1b4a60b000 (offset 43008 from slab 0 start)
```

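The layout arithmetic can be written down as two small helpers: one maps a block index to its address, the other checks whether a pointer is a plausible block start. The offset and block size come from this report. The function names and the `n_blocks` parameter are assumptions for illustration, since the slab's capacity is not stated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB0_DATA_OFFSET 2048u  /* value under investigation in this report */
#define BLOCK_SIZE        1024u  /* class 7 */

/* Address of block idx in slab 0 of the SuperSlab at ss_base. */
static uint64_t block_addr(uint64_t ss_base, size_t idx) {
    return ss_base + SLAB0_DATA_OFFSET + (uint64_t)idx * BLOCK_SIZE;
}

/* True if p sits on a block boundary within the first n_blocks of slab 0.
 * n_blocks is a hypothetical capacity supplied by the caller. */
static bool is_block_start(uint64_t ss_base, size_t n_blocks, uint64_t p) {
    uint64_t start = ss_base + SLAB0_DATA_OFFSET;
    if (p < start) {
        return false;
    }
    uint64_t off = p - start;
    return off % BLOCK_SIZE == 0 && off / BLOCK_SIZE < n_blocks;
}
```

With base `0x7c1b4a600000`, index 42 yields `0x7c1b4a60b000`, while both corrupted values from the Evidence section (`0x7c1b4a606261` and `0x6261`) fail the boundary check.
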
#### Validation Points

- Offset 2048 is correct (it is a multiple of the 1024-byte block size)
- `sizeof(SuperSlab) = 1088` exceeds 1024, so the first 1024-byte-aligned offset past the header is 2048
- All legitimate blocks ARE properly aligned
- Corruption comes from use-after-free, not misalignment

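The offset reasoning amounts to rounding the header size up to the next block boundary. A minimal sketch, assuming the data offset is chosen as the smallest multiple of the block size that clears the header (`round_up_pow2` is a hypothetical helper, not a HAKMEM function):

```c
#include <stddef.h>

/* Round n up to the next multiple of align; align must be a power of two. */
static size_t round_up_pow2(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}
```

For the 1088-byte header and 1024-byte blocks this gives 2048, while a header of exactly 1024 bytes or less would have fit under the old offset.
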
### Conclusion

The HAKMEM allocator has a **critical memory safety bug** in its lock-free remote free queue. The bug allows:

- Use-after-free corruption
- Double-free vulnerabilities
- Memory corruption of active allocations

This is a **SECURITY VULNERABILITY** that could potentially be exploited for arbitrary code execution.

### Author

Claude Opus 4.1 (ULTRATHINK Mode)
Analysis Date: 2025-11-07