Problem:
- Phase 6-2.5 changed SUPERSLAB_SLAB0_DATA_OFFSET from 1024 → 2048
- Fixed sizeof(SuperSlab) mismatch (1088 bytes)
- But 3 locations still used old slab_data_start() + manual offset

This caused:
- Address mismatch between allocation carving and validation
- Freelist corruption false positives
- 53-byte misalignment errors resolved, but new errors appeared

Changes:
1. core/tiny_tls_guard.h:34
   - Validation: slab_data_start() → tiny_slab_base_for()
   - Ensures validation uses same base address as allocation
2. core/hakmem_tiny_refill.inc.h:222
   - Allocation carving: Remove manual +2048 hack
   - Use canonical tiny_slab_base_for()
3. core/hakmem_tiny_refill.inc.h:275
   - Bump allocation: Remove duplicate slab_start calculation
   - Use existing base calculation with tiny_slab_base_for()

Result:
- Consistent use of tiny_slab_base_for() across all paths
- All code uses SUPERSLAB_SLAB0_DATA_OFFSET constant
- Remaining freelist corruption needs deeper investigation (not simple offset bug)

Related commits:
- d2f0d8458: Phase 6-2.5 (constants.h + 2048 offset)
- c9053a43a: Phase 6-2.3~6-2.4 (active counter + SEGV fixes)

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
FREELIST CORRUPTION ROOT CAUSE ANALYSIS
Phase 6-2.5 SLAB0_DATA_OFFSET Investigation
Executive Summary
The freelist corruption after changing SLAB0_DATA_OFFSET from 1024 to 2048 is NOT caused by the offset change. The root cause is a use-after-free vulnerability in the remote free queue combined with massive double-frees.
Timeline
- Initial symptom: [TRC_FAILFAST] stage=freelist_next cls=7 node=0x7e1ff3c1d474
- Investigation started: After Phase 6-2.5 offset change
- Root cause found: Use-after-free in ss_remote_push + double-frees
Root Cause Analysis
1. Double-Free Epidemic
# Test reveals 180+ duplicate freed addresses
HAKMEM_WRAP_TINY=1 ./larson_hakmem 1 1 1024 1024 1 12345 1 | \
grep "free_local_box" | awk '{print $6}' | sort | uniq -d | wc -l
# Result: 180+ duplicates
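The pipeline above counts addresses that appear more than once in the free log. The same check can be sketched in C for use inside a test harness. `count_duplicate_addresses` is a hypothetical helper, not part of HAKMEM, and the sketch assumes the log covers a window in which no address is legitimately reallocated and freed again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Comparison callback for qsort over 64-bit addresses. */
static int cmp_addr(const void* a, const void* b) {
    uint64_t x = *(const uint64_t*)a;
    uint64_t y = *(const uint64_t*)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: number of distinct addresses that occur more than
 * once in addrs[0..n), mirroring `sort | uniq -d | wc -l`.
 * Sorts the array in place. */
static size_t count_duplicate_addresses(uint64_t* addrs, size_t n) {
    assert(addrs != NULL || n == 0);
    if (n == 0) return 0;
    qsort(addrs, n, sizeof addrs[0], cmp_addr);

    size_t dups = 0;
    for (size_t i = 1; i < n; i++) {
        /* Count each repeated value once, at the first repeat of its run. */
        if (addrs[i] == addrs[i - 1] && (i == 1 || addrs[i] != addrs[i - 2])) {
            dups++;
        }
    }
    return dups;
}
```

Feeding it the `node=` values parsed from the `free_local_box` lines should give the same number as the shell command.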
2. Use-After-Free Vulnerability
Location: /mnt/workdisk/public_share/hakmem/core/hakmem_tiny_superslab.h:437
static inline int ss_remote_push(SuperSlab* ss, int slab_idx, void* ptr) {
    // ... validation ...
    do {
        old = atomic_load_explicit(head, memory_order_acquire);
        if (!g_remote_side_enable) {
            *(void**)ptr = (void*)old; // ← WRITES TO POTENTIALLY ALLOCATED MEMORY!
        }
    } while (!atomic_compare_exchange_weak_explicit(...));
}
3. The Attack Sequence
- Thread A frees block X → pushed to remote queue (next pointer written)
- Thread B (owner) drains remote queue → adds X to freelist
- Thread B allocates X → application starts using it
- Thread C double-frees X → corrupts active user memory
- User writes data including the 0x6261 pattern
- Freelist traversal interprets user data as next pointer → CRASH
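The sequence can be reproduced in miniature without threads. The sketch below is a single-threaded model with a hypothetical intrusive freelist (`SketchFreeList`, `sketch_push`, `sketch_pop` are illustrative names, not HAKMEM code). It does not model the remote queue or the race itself, only how a second free of a live block lets user data be read back as a next pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical intrusive freelist: the "next" link lives in the first
 * word of each free block, as in the ss_remote_push excerpt. */
typedef struct { uintptr_t head; } SketchFreeList;

static void sketch_push(SketchFreeList* fl, void* block) {
    assert(block != NULL);
    memcpy(block, &fl->head, sizeof fl->head);  /* write next into the block */
    fl->head = (uintptr_t)block;
}

/* Pops the head block; the new head is whatever the block's first word holds. */
static void* sketch_pop(SketchFreeList* fl) {
    if (fl->head == 0) return NULL;
    void* block = (void*)fl->head;
    memcpy(&fl->head, block, sizeof fl->head);
    return block;
}

/* Returns the bogus "next" value that traversal ends up with. */
static uintptr_t sketch_double_free_demo(void) {
    static uintptr_t block_x[1024 / sizeof(uintptr_t)];  /* one 1024-byte block */
    SketchFreeList fl = { 0 };

    sketch_push(&fl, block_x);        /* first free of X                     */
    void* user = sketch_pop(&fl);     /* owner hands X out again             */
    sketch_push(&fl, block_x);        /* double free while X is in use       */

    uintptr_t payload = 0x6261;       /* user data overwrites the link word  */
    memcpy(user, &payload, sizeof payload);

    (void)sketch_pop(&fl);            /* traversal reads user data as next   */
    return fl.head;
}
```

In this model the freelist head ends up holding the user's payload, which matches the shape of the `node=0x6261` failure reported under Evidence.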
Evidence
Corrupted Pointers
- 0x7c1b4a606261 - User data ending with 0x6261 pattern
- 0x6261 - Pure user data, no valid address
- Pattern 0x6261 detected as "TLS guard scribble" in code
Debug Output
[TRC_FREELIST_LOG] stage=free_local_box cls=7 node=0x7da27ec0b800 next=0x7da27ec0bc00
[TRC_FREELIST_LOG] stage=free_local_box cls=7 node=0x7da27ec0b800 next=0x7da27ec04000
^^^^^^^^^^^ SAME ADDRESS FREED TWICE!
Remote Queue Activity
[DEBUG ss_remote_push] Call #1 ss=0x735d23e00000 slab_idx=0
[DEBUG ss_remote_push] Call #2 ss=0x735d23e00000 slab_idx=5
[TRC_FAILFAST] stage=freelist_next cls=7 node=0x6261
Why SLAB0_DATA_OFFSET Change Exposed This
The offset change from 1024 to 2048 didn't cause the bug; the bug existed before but was latent. The change may have:
- Changed memory layout/timing
- Made corruption more visible
- Affected which blocks get double-freed
Attempted Mitigations
1. Enable Safe Free (COMPLETED)
// core/hakmem_tiny.c:39
int g_tiny_safe_free = 1; // ULTRATHINK FIX: Enable by default
Result: Still crashes - race condition persists
2. Required Fixes (PENDING)
- Add ownership validation before writing next pointer
- Implement proper memory barriers
- Add atomic state tracking for blocks
- Consider hazard pointers or epoch-based reclamation
Reproduction
# Immediate crash with SuperSlab enabled
HAKMEM_WRAP_TINY=1 ./larson_hakmem 1 1 1024 1024 1 12345 1
# Works fine without SuperSlab
HAKMEM_WRAP_TINY=0 ./larson_hakmem 1 1 1024 1024 1 12345 1
Recommendations
- IMMEDIATE: Do not use in production
- SHORT-TERM: Disable remote free queue (HAKMEM_TINY_DISABLE_REMOTE=1)
- LONG-TERM: Redesign lock-free MPSC with safe memory reclamation
Technical Details
Memory Layout (Class 7, 1024-byte blocks)
SuperSlab base: 0x7c1b4a600000
Slab 0 start: 0x7c1b4a600000 + 2048 = 0x7c1b4a600800
Block 0: 0x7c1b4a600800
Block 1: 0x7c1b4a600c00
Block 42: 0x7c1b4a60b000 (offset 43008 from slab 0 start)
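The addresses above follow from base + data offset + index × block size. A minimal sketch of that arithmetic, with illustrative macro and function names and the assumption that the SuperSlab base is itself 1024-byte aligned (true for the base shown here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the layout above; the names are illustrative only. */
#define SKETCH_SLAB0_DATA_OFFSET 2048u
#define SKETCH_CLASS7_BLOCK_SIZE 1024u

/* Address of class-7 block `index` in slab 0 of the SuperSlab at ss_base. */
static uint64_t sketch_class7_block_addr(uint64_t ss_base, size_t index) {
    assert(ss_base % SKETCH_CLASS7_BLOCK_SIZE == 0);  /* sketch assumption */
    return ss_base + SKETCH_SLAB0_DATA_OFFSET
         + (uint64_t)index * SKETCH_CLASS7_BLOCK_SIZE;
}
```

For block 42 this gives 42 × 1024 = 43008 (0xA800) past the slab 0 start, i.e. 0x7c1b4a600800 + 0xA800 = 0x7c1b4a60b000.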
Validation Points
- Offset 2048 is correct (aligns to 1024-byte blocks)
- sizeof(SuperSlab) = 1088 exceeds 1024, so slab 0 data must start at the next 1024-byte boundary, 2048
- All legitimate blocks ARE properly aligned
- Corruption comes from use-after-free, not misalignment
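These points suggest a cheap plausibility test for a freelist node: it must lie at or after the slab 0 data start and sit on a 1024-byte block boundary relative to it. The helper below is a hypothetical sketch, not the project's validator, and it omits an upper bound because the slab size is not given in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: could `ptr` be the start of a class-7 block in
 * slab 0 of the SuperSlab at ss_base? Lower bound and alignment only. */
static bool sketch_is_class7_block_start(uint64_t ss_base, uint64_t ptr) {
    assert(ss_base != 0);
    const uint64_t data_start = ss_base + 2048;  /* SLAB0_DATA_OFFSET */
    const uint64_t block_size = 1024;            /* class 7 */
    if (ptr < data_start) return false;
    return (ptr - data_start) % block_size == 0;
}
```

With base 0x7c1b4a600000, the corrupted value 0x7c1b4a606261 is 0x5a61 bytes past the data start and fails the alignment test, and 0x6261 fails the lower bound, while 0x7c1b4a600c00 passes.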
Conclusion
The HAKMEM allocator has a critical memory safety bug in its lock-free remote free queue. The bug allows:
- Use-after-free corruption
- Double-free vulnerabilities
- Memory corruption of active allocations
This is a SECURITY VULNERABILITY that could be exploited for arbitrary code execution.
Author
Claude Opus 4.1 (ULTRATHINK Mode)
Analysis Date: 2025-11-07