# Final Verdict: HAKMEM Memory Overhead Analysis

## The Real Answer

After deep investigation, the 39.6 MB RSS for 1M × 16B allocations breaks down as follows:

### Component Breakdown

1. **Actual Data**: 15.26 MB (1M × 16B)
2. **Pointer Array**: 7.63 MB (test program's `void** ptrs`)
3. **HAKMEM Overhead**: 16.71 MB
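
The breakdown above is plain subtraction, and a small helper makes it easy to recheck when the measured RSS changes. This is a minimal sketch, assuming the "MB" figures are binary megabytes (MiB) and that the test runs on a 64-bit target with 8-byte pointers; the function names are illustrative and not part of HAKMEM.

```c
#include <assert.h>
#include <stddef.h>

#define MIB (1024.0 * 1024.0)

/* Bytes -> MiB. Assumption: the "MB" figures in this note are MiB. */
static double to_mib(double bytes) { return bytes / MIB; }

/* RSS left over once the payload and the test's pointer array are subtracted. */
static double unexplained_mib(double rss_mib, size_t n_allocs,
                              size_t block_size, size_t ptr_size)
{
    assert(block_size > 0 && ptr_size > 0);
    double data_mib = to_mib((double)n_allocs * (double)block_size);
    double ptrs_mib = to_mib((double)n_allocs * (double)ptr_size);
    return rss_mib - data_mib - ptrs_mib;
}
```

Calling `unexplained_mib(39.6, 1000000, 16, 8)` reproduces the 16.71 MB attributed to HAKMEM: 15.26 MB of payload and 7.63 MB of pointers are removed from the measured 39.6 MB.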

### Where Does the 16.71 MB Come From?

The investigation revealed that **RSS != actual memory allocations** due to:

1. **Page Granularity**: RSS is counted in 4 KB pages
   - Slab size: 64 KB (16 pages)
   - 245 slabs × 16 pages = 3,920 pages
   - 3,920 × 4 KB = 15.31 MB (roughly matches the 15.26 MB of data)

2. **Metadata is Separate**: Bitmaps, slab headers, etc. are allocated separately
   - Primary bitmaps: 122.5 KB
   - Summary bitmaps: 1.9 KB
   - Slab metadata: 21 KB
   - TLS Magazine: 128 KB
   - **Total metadata: ~274 KB**

3. **The Mystery 16 MB**:
   After eliminating all known sources, the remaining 16 MB likely comes from some combination of:
   - **Virtual memory overhead from the system allocator** used by `aligned_alloc()`
   - **TLS and stack overhead** from threading infrastructure
   - **Shared library overhead** (HAKMEM itself as a `.so` file)
   - **Process overhead** (heap arena, etc.)
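
The slab and metadata figures in the list above can be rechecked with a few lines of integer arithmetic. The sketch below assumes 64 KB slabs, 4 KB pages, and that every page of every slab is resident; the helper names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { PAGE_SIZE_B = 4096, SLAB_SIZE_B = 64 * 1024 };

/* Slabs needed to hold n_blocks blocks of block_size bytes, rounded up. */
static size_t slabs_needed(size_t n_blocks, size_t block_size)
{
    assert(block_size > 0 && block_size <= SLAB_SIZE_B);
    size_t per_slab = SLAB_SIZE_B / block_size;
    return (n_blocks + per_slab - 1) / per_slab;
}

/* Resident pages if every page of every slab has been touched. */
static size_t slab_pages(size_t n_slabs)
{
    return n_slabs * (SLAB_SIZE_B / PAGE_SIZE_B);
}

/* Sum of the four metadata figures listed above, in KB. */
static double metadata_kb(void)
{
    return 122.5 + 1.9 + 21.0 + 128.0;
}
```

For 1M blocks of 16 bytes this gives 245 slabs and 3,920 pages, and the listed metadata items sum to about 273.4 KB, in line with the ~274 KB total. Slab pages plus metadata therefore account for well under 16 MB, which is why the remaining 16 MB is still unexplained.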

## The Real Problem: Not What We Thought!

### Initial Hypothesis (WRONG)
- `aligned_alloc()` wastes 64 KB per slab due to alignment

### Evidence Against
- A test with `aligned_alloc(64KB) × 100` added only 1.5 MB of RSS, not 6.4 MB
- This suggests the system allocator handles alignment efficiently
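
An experiment of the kind described above could be reproduced roughly as follows. This is a sketch and not the original test: it assumes Linux (RSS is read from `/proc/self/statm`), a C11 `aligned_alloc()`, and it deliberately never writes to the blocks. The names `rss_bytes` and `measure_aligned_alloc` are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size in bytes from /proc/self/statm (Linux only); -1 on failure. */
static long rss_bytes(void)
{
    long total_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    int got = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (got != 2) return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Stores the RSS change caused by `count` aligned_alloc(align, size) calls
 * in *delta_out. Returns 0 on success, -1 if RSS could not be read. */
static int measure_aligned_alloc(size_t count, size_t align, size_t size,
                                 long *delta_out)
{
    assert(delta_out != NULL && count > 0);
    void **blocks = calloc(count, sizeof *blocks);
    if (!blocks) return -1;

    long before = rss_bytes();
    for (size_t i = 0; i < count; i++)
        blocks[i] = aligned_alloc(align, size);   /* never written to */
    long after = rss_bytes();

    for (size_t i = 0; i < count; i++)
        free(blocks[i]);
    free(blocks);

    if (before < 0 || after < 0) return -1;
    *delta_out = after - before;
    return 0;
}
```

Calling `measure_aligned_alloc(100, 64 * 1024, 64 * 1024, &delta)` mirrors the 100 × 64 KB case. Because the blocks are never written, the result mostly reflects allocator bookkeeping rather than the 6.4 MB requested, so it is worth repeating the run with a `memset()` over each block before drawing conclusions.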

### Actual Problem (CORRECT)
**The benchmark may be fundamentally flawed!**

The test program (`test_memory_usage.c`) touches only ONE BYTE per allocation:
```c
ptrs[i] = malloc(16);
if (ptrs[i]) *(char*)ptrs[i] = 'A';  // Only touches first byte!
```

**RSS only counts touched pages!**

If only the first byte of each 16-byte block is touched, and blocks are packed:
- 256 blocks fit in one 4 KB page (256 × 16 B = 4 KB)
- 1M blocks need at least 3,907 pages
- But if blocks span pages due to slab boundaries...
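
The page arithmetic above can be checked by counting how many distinct pages receive a write when only the first byte of each block is touched. The sketch assumes the blocks are contiguous and densely packed from offset 0, which may not hold across real slab boundaries; the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Distinct pages written when only the first byte of each of n_blocks
 * contiguous, densely packed blocks is touched. Offsets increase
 * monotonically, so counting page-index changes is sufficient. */
static size_t pages_touched_first_byte(size_t n_blocks, size_t block_size,
                                       size_t page_size)
{
    assert(block_size > 0 && page_size > 0);
    size_t pages = 0;
    size_t last_page = (size_t)-1;   /* sentinel: no page seen yet */
    for (size_t i = 0; i < n_blocks; i++) {
        size_t page = (i * block_size) / page_size;
        if (page != last_page) {
            pages++;
            last_page = page;
        }
    }
    return pages;
}
```

Under these assumptions, 1M blocks of 16 bytes touch 3,907 pages, the same number of pages the data occupies, because every 4 KB page contains the first byte of 256 blocks. Writing one byte per block only leaves pages untouched when blocks are larger than a page: ten 8 KB blocks, for example, touch 10 pages out of 20.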

## Revised Analysis

I need to run actual measurements to understand where the overhead truly comes from.

### The Scaling Pattern is Real
```
100K allocs: HAKMEM 221% OH, mimalloc 234% OH → HAKMEM wins!
1M allocs:   HAKMEM 160% OH, mimalloc  65% OH → mimalloc wins!
```

This suggests HAKMEM has:
- **Better fixed overhead** (wins at small scale)
- **Worse variable overhead** (loses at large scale)
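
One way to put rough numbers on the fixed and variable parts is to fit a straight line through the two measurements for each allocator. The sketch below assumes the overhead percentages are relative to the payload (N × 16 bytes) and that overhead grows linearly with N; two data points cannot confirm linearity, so the result is only indicative. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Two-point linear model: overhead_bytes(n) = fixed + per_alloc * n. */
struct overhead_model {
    double fixed_bytes;
    double per_alloc_bytes;
};

/* oh_pct_* is the overhead as a percentage of payload (n * block_size). */
static struct overhead_model fit_overhead(double n_small, double oh_pct_small,
                                          double n_large, double oh_pct_large,
                                          double block_size)
{
    assert(n_large > n_small && n_small > 0.0);
    double oh_small = oh_pct_small / 100.0 * n_small * block_size;
    double oh_large = oh_pct_large / 100.0 * n_large * block_size;
    struct overhead_model m;
    m.per_alloc_bytes = (oh_large - oh_small) / (n_large - n_small);
    m.fixed_bytes = oh_small - m.per_alloc_bytes * n_small;
    return m;
}
```

With the four figures above, `fit_overhead(1e5, 221.0, 1e6, 160.0, 16.0)` gives about 24.5 bytes per allocation and roughly 1 MB fixed for HAKMEM, while `fit_overhead(1e5, 234.0, 1e6, 65.0, 16.0)` gives about 7.4 bytes per allocation and roughly 3 MB fixed for mimalloc. That is consistent with the reading that HAKMEM has the lower fixed cost and the higher per-allocation cost.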

## Conclusion

The document `MEMORY_OVERHEAD_ANALYSIS.md` uses a sound diagnostic methodology but may have jumped to conclusions about `aligned_alloc()`.

The real issue is likely one of:
1. SuperSlab is NOT being used (`g_use_superslab=1` is set, but the path is not active)
2. TLS Magazine is holding too many blocks
3. Slab fragmentation (last slab partially filled)
4. Test methodology issue (RSS vs actual allocations)

**Recommendation**: Run actual instrumented tests with slab counters to see exactly how many slabs are allocated and what their utilization is.
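
The recommended instrumentation could look something like the sketch below. All names here are hypothetical and not taken from HAKMEM; the hooks would have to be called from wherever the allocator obtains a slab and hands out or reclaims a block, and the counters as written are not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters; these names are not taken from HAKMEM. */
struct slab_stats {
    size_t slabs_allocated;   /* incremented whenever a new slab is obtained */
    size_t blocks_in_use;     /* incremented on alloc, decremented on free   */
};

static void stats_on_slab_alloc(struct slab_stats *s)  { s->slabs_allocated++; }
static void stats_on_block_alloc(struct slab_stats *s) { s->blocks_in_use++; }

static void stats_on_block_free(struct slab_stats *s)
{
    assert(s->blocks_in_use > 0);
    s->blocks_in_use--;
}

/* Fraction of block slots in use across all allocated slabs (0.0 to 1.0). */
static double stats_utilization(const struct slab_stats *s,
                                size_t blocks_per_slab)
{
    if (s->slabs_allocated == 0 || blocks_per_slab == 0) return 0.0;
    return (double)s->blocks_in_use /
           ((double)s->slabs_allocated * (double)blocks_per_slab);
}
```

If the 1M-allocation run really uses 245 slabs of 4,096 blocks each, utilization should come out near 99.6%. A much larger slab count or a much lower utilization would point toward fragmentation or an inactive SuperSlab path rather than a measurement artifact.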