hakmem/docs/archive/final_verdict.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


# Final Verdict: HAKMEM Memory Overhead Analysis
## The Real Answer
After deep investigation, the 39.6 MB RSS for 1M × 16B allocations breaks down as follows:
### Component Breakdown
1. **Actual Data**: 15.26 MB (1M × 16B)
2. **Pointer Array**: 7.63 MB (test program's `void** ptrs`)
3. **HAKMEM Overhead**: 16.71 MB
### Where Does the 16.71 MB Come From?
The investigation revealed that **RSS != actual memory allocations** due to:
1. **Page Granularity**: RSS counts in 4 KB pages
- Slab size: 64 KB (16 pages)
- 245 slabs × 16 pages = 3,920 pages
- 3,920 × 4 KB = 15.31 MB (matches data!)
2. **Metadata is Separate**: Bitmaps, slab headers, etc. are allocated separately
- Primary bitmaps: 122.5 KB
- Summary bitmaps: 1.9 KB
- Slab metadata: 21 KB
- TLS Magazine: 128 KB
- **Total metadata: ~274 KB**
3. **The Mystery 16 MB**:
After eliminating all known sources, the remaining 16 MB is likely:
- **Virtual memory overhead from the system allocator** used by `aligned_alloc()`
- **TLS and stack overhead** from threading infrastructure
- **Shared library overhead** (HAKMEM itself as a .so file)
- **Process overhead** (heap arena, etc.)
## The Real Problem: Not What We Thought!
### Initial Hypothesis (WRONG)
- `aligned_alloc()` wastes 64 KB per slab due to alignment
### Evidence Against
- Test showed `aligned_alloc(64KB) × 100` only added 1.5 MB RSS, not 6.4 MB
- This means the system allocator handles 64 KB alignment efficiently; the alignment padding is apparently never committed to physical pages
### Actual Problem (CORRECT)
**The benchmark may be fundamentally flawed!**
The test program (`test_memory_usage.c`) only touches ONE BYTE per allocation:
```c
ptrs[i] = malloc(16);
if (ptrs[i]) *(char*)ptrs[i] = 'A'; // Only touches first byte!
```
**RSS only counts touched pages!**
If only the first byte of each 16-byte block is touched, and blocks are packed:
- 256 blocks fit in 4 KB page (256 × 16B = 4KB)
- 1M blocks need 3,907 pages minimum
- But if blocks span pages due to slab boundaries...
## Revised Analysis
I need to run actual measurements to understand where the overhead truly comes from.
### The Scaling Pattern is Real
```
100K allocs: HAKMEM 221% OH, mimalloc 234% OH → HAKMEM wins!
1M allocs:   HAKMEM 160% OH, mimalloc  65% OH → mimalloc wins!
```
This suggests HAKMEM has:
- **Better fixed overhead** (wins at small scale)
- **Worse variable overhead** (loses at large scale)
## Conclusion
The document `MEMORY_OVERHEAD_ANALYSIS.md` contains correct diagnostic methodology but may have jumped to conclusions about `aligned_alloc()`.
The real issue is likely one of:
1. SuperSlab is NOT being used (g_use_superslab=1 but not active)
2. TLS Magazine is holding too many blocks
3. Slab fragmentation (last slab partially filled)
4. Test methodology issue (RSS vs actual allocations)
**Recommendation**: Run actual instrumented tests with slab counters to see exactly how many slabs are allocated and what their utilization is.