# Final Verdict: HAKMEM Memory Overhead Analysis

## The Real Answer

After deep investigation, the 39.6 MB RSS for 1M × 16B allocations breaks down as follows:

### Component Breakdown

1. **Actual Data**: 15.26 MB (1M × 16B)
2. **Pointer Array**: 7.63 MB (test program's `void** ptrs`)
3. **HAKMEM Overhead**: 16.71 MB

### Where Does the 16.71 MB Come From?

The investigation revealed that **RSS != actual memory allocations** due to:

1. **Page Granularity**: RSS counts in 4 KB pages
   - Slab size: 64 KB (16 pages)
   - 245 slabs × 16 pages = 3,920 pages
   - 3,920 × 4 KB = 15.31 MB (matches data!)

2. **Metadata is Separate**: Bitmaps, slab headers, etc. are allocated separately
   - Primary bitmaps: 122.5 KB
   - Summary bitmaps: 1.9 KB
   - Slab metadata: 21 KB
   - TLS Magazine: 128 KB
   - **Total metadata: ~274 KB**

3. **The Mystery 16 MB**: After eliminating all known sources, the remaining 16 MB is likely:
   - **Virtual memory overhead from the system allocator** used by `aligned_alloc()`
   - **TLS and stack overhead** from threading infrastructure
   - **Shared library overhead** (HAKMEM itself as a .so file)
   - **Process overhead** (heap arena, etc.)

## The Real Problem: Not What We Thought!

### Initial Hypothesis (WRONG)

- `aligned_alloc()` wastes 64 KB per slab due to alignment

### Evidence Against

- A test showed that `aligned_alloc(64KB) × 100` only added 1.5 MB RSS, not 6.4 MB
- This means the system allocator handles alignment efficiently

### Actual Problem (CORRECT)

**The benchmark may be fundamentally flawed!**

The test program (`test_memory_usage.c`) only touches ONE BYTE per allocation:

```c
ptrs[i] = malloc(16);
if (ptrs[i]) *(char*)ptrs[i] = 'A';  // Only touches first byte!
```

**RSS only counts touched pages!** If only the first byte of each 16-byte block is touched, and blocks are packed:

- 256 blocks fit in one 4 KB page (256 × 16B = 4 KB)
- 1M blocks need 3,907 pages minimum
- But if blocks span pages due to slab boundaries...
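
The page arithmetic above can be checked with a small sketch. It assumes that blocks are packed from the start of each page-aligned slab, that slabs carry no in-slab header (the breakdown above lists metadata separately), and that a block is no larger than a page. The helper names (`slabs_needed`, `pages_touched`) are illustrative and do not come from HAKMEM or the test program.

```c
#include <assert.h>
#include <stddef.h>

/* Pages that contain the first byte of at least one of `count` packed
 * blocks laid out from offset 0. Assumes block_size <= page_size. */
static size_t pages_with_first_bytes(size_t count, size_t block_size,
                                     size_t page_size) {
    assert(block_size != 0 && page_size != 0 && block_size <= page_size);
    if (count == 0) return 0;
    /* Page index of the last block's first byte, plus one. */
    return ((count - 1) * block_size) / page_size + 1;
}

/* Slabs needed to hold n_blocks blocks of block_size bytes each. */
static size_t slabs_needed(size_t n_blocks, size_t block_size,
                           size_t slab_size) {
    assert(block_size != 0 && slab_size >= block_size);
    size_t blocks_per_slab = slab_size / block_size;
    return (n_blocks + blocks_per_slab - 1) / blocks_per_slab;
}

/* Pages that become resident when only the first byte of every block is
 * written: full slabs contribute all of their touched pages, the final
 * partially filled slab only the pages its blocks reach. */
static size_t pages_touched(size_t n_blocks, size_t block_size,
                            size_t slab_size, size_t page_size) {
    assert(block_size != 0 && slab_size >= block_size);
    size_t blocks_per_slab = slab_size / block_size;
    size_t full_slabs = n_blocks / blocks_per_slab;
    size_t tail_blocks = n_blocks % blocks_per_slab;
    return full_slabs *
               pages_with_first_bytes(blocks_per_slab, block_size, page_size) +
           pages_with_first_bytes(tail_blocks, block_size, page_size);
}
```

Called with 1,000,000 blocks of 16 bytes, 64 KB slabs, and 4 KB pages, `slabs_needed` gives 245 and `pages_touched` gives 3,907 of the 3,920 slab pages. Under these assumptions, writing one byte per block already makes nearly all data pages resident, so the first-byte pattern by itself would hide very little of the data from RSS.
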
## Revised Analysis

I need to run actual measurements to understand where the overhead truly comes from.

### The Scaling Pattern is Real

```
100K allocs: HAKMEM 221% OH, mimalloc 234% OH → HAKMEM wins!
1M allocs:   HAKMEM 160% OH, mimalloc  65% OH → mimalloc wins!
```

This suggests HAKMEM has:

- **Better fixed overhead** (wins at small scale)
- **Worse variable overhead** (loses at large scale)

## Conclusion

The document `MEMORY_OVERHEAD_ANALYSIS.md` uses a correct diagnostic methodology but may have jumped to conclusions about `aligned_alloc()`.

The real issue is likely one of:

1. SuperSlab is NOT being used (`g_use_superslab=1` is set, but SuperSlab is not active)
2. TLS Magazine is holding too many blocks
3. Slab fragmentation (last slab partially filled)
4. Test methodology issue (RSS vs actual allocations)

**Recommendation**: Run actual instrumented tests with slab counters to see exactly how many slabs are allocated and what their utilization is.
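
As a starting point for that instrumentation, here is a minimal sketch of the kind of counters that would answer both questions. The struct and function names (`slab_stats`, `stats_on_slab_alloc`, and so on) are hypothetical and are not part of HAKMEM. It assumes the allocator can call a hook whenever it obtains a slab and whenever it hands out or takes back a block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters for one size class; not HAKMEM's actual API. */
typedef struct {
    size_t slabs_allocated; /* incremented each time a new slab is obtained */
    size_t blocks_in_use;   /* live blocks currently held by the application */
    size_t blocks_per_slab; /* e.g. 64 KB / 16 B = 4096 */
} slab_stats;

static void stats_on_slab_alloc(slab_stats *s) { s->slabs_allocated++; }

static void stats_on_block_alloc(slab_stats *s) { s->blocks_in_use++; }

static void stats_on_block_free(slab_stats *s) {
    assert(s->blocks_in_use > 0);
    s->blocks_in_use--;
}

/* Fraction of total slab capacity occupied by live blocks
 * (0.0 when no slab has been allocated yet). */
static double stats_utilization(const slab_stats *s) {
    size_t capacity = s->slabs_allocated * s->blocks_per_slab;
    if (capacity == 0) return 0.0;
    return (double)s->blocks_in_use / (double)capacity;
}

/* Slabs allocated beyond the minimum needed for the live blocks. */
static size_t stats_excess_slabs(const slab_stats *s) {
    assert(s->blocks_per_slab != 0);
    size_t needed =
        (s->blocks_in_use + s->blocks_per_slab - 1) / s->blocks_per_slab;
    return s->slabs_allocated > needed ? s->slabs_allocated - needed : 0;
}
```

If the 1M × 16B run reports 245 slabs, utilization is about 99.65% and a partially filled last slab (item 3) cannot account for megabytes of overhead. A much larger slab count or a nonzero `stats_excess_slabs` would point instead toward items 1 or 2.
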