hakmem/docs/archive/final_verdict.md
Latest commit 52386401b3 by Moe Charm (CI): Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

Final Verdict: HAKMEM Memory Overhead Analysis

The Real Answer

After deep investigation, the 39.6 MB RSS for 1M × 16B allocations breaks down as follows:

Component Breakdown

  1. Actual Data: 15.26 MB (1M × 16B)
  2. Pointer Array: 7.63 MB (test program's void** ptrs)
  3. HAKMEM Overhead: 16.71 MB

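As a quick consistency check (assuming 64-bit pointers for the ptrs array), the three components do account for the measured RSS:

1,000,000 × 16 B                  = 15.26 MB   (payload)
1,000,000 ×  8 B (void* entries)  =  7.63 MB   (ptrs array)
39.60 MB - 15.26 MB - 7.63 MB     = 16.71 MB   (left to attribute to HAKMEM)
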
Where Does the 16.71 MB Come From?

The investigation revealed that RSS does not correspond directly to the memory actually allocated, for three reasons:

  1. Page Granularity: RSS is counted in 4 KB pages (a small demonstration follows this list)

    • Slab size: 64 KB (16 pages)
    • 245 slabs × 16 pages = 3,920 pages
    • 3,920 × 4 KB = 15.31 MB (matches data!)
  2. Metadata is Separate: Bitmaps, slab headers, etc. are allocated separately

    • Primary bitmaps: 122.5 KB
    • Summary bitmaps: 1.9 KB
    • Slab metadata: 21 KB
    • TLS Magazine: 128 KB
    • Total metadata: ~274 KB
  3. The Mystery 16 MB: After eliminating all known sources, the remaining ~16.4 MB is likely:

    • Virtual memory overhead from the system allocator used by aligned_alloc()
    • TLS and stack overhead from threading infrastructure
    • Shared library overhead (HAKMEM itself as a .so file)
    • Process overhead (heap arena, etc.)

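Point 1 is easy to demonstrate in isolation. The following standalone sketch (Linux-only, not HAKMEM code; the rss_kb() helper and the 64 KB size are illustrative) maps one slab-sized region, touches a single byte, and reports how much RSS actually grew:

/* Standalone sketch (not HAKMEM code): map one slab-sized 64 KB region,
 * touch a single byte, and observe RSS grow by roughly one 4 KB page. */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Current resident set size in KB, read from /proc/self/statm (Linux). */
static long rss_kb(void) {
    long total = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    int n = fscanf(f, "%ld %ld", &total, &resident);
    fclose(f);
    return n == 2 ? resident * (sysconf(_SC_PAGESIZE) / 1024) : -1;
}

int main(void) {
    long before = rss_kb();
    char *slab = mmap(NULL, 64 * 1024, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slab == MAP_FAILED) return 1;
    slab[0] = 'A';                                    /* touch exactly one byte */
    printf("RSS delta: %ld KB\n", rss_kb() - before); /* ~4 KB, not 64 KB */
    return 0;
}

The mapping reserves 64 KB of address space, but only the touched page becomes resident, which is why 245 slabs can show up in RSS as little more than the data actually written into them.
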
The Real Problem: Not What We Thought!

Initial Hypothesis (WRONG)

  • aligned_alloc() wastes 64 KB per slab due to alignment

Evidence Against

  • A test showed that aligned_alloc(64KB) × 100 added only 1.5 MB of RSS, not 6.4 MB
  • This means the system allocator handles the alignment efficiently (a sketch of such a check follows)
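
A sketch of the kind of check described above (the original test's exact methodology is not shown in this document, so the touch pattern here is an assumption): allocate 100 aligned 64 KB slabs and compare RSS before and after via /proc/self/statm.

/* Sketch of an aligned_alloc RSS check (assumed methodology, not the
 * original test): 100 slab-sized aligned allocations, RSS delta reported. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long rss_kb(void) {                 /* same helper as in the earlier sketch */
    long total = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    int n = fscanf(f, "%ld %ld", &total, &resident);
    fclose(f);
    return n == 2 ? resident * (sysconf(_SC_PAGESIZE) / 1024) : -1;
}

int main(void) {
    long before = rss_kb();
    void *slabs[100];
    for (int i = 0; i < 100; i++) {
        slabs[i] = aligned_alloc(64 * 1024, 64 * 1024);
        if (slabs[i]) ((char *)slabs[i])[0] = 'A';    /* touch the first page only */
    }
    printf("RSS delta for 100 aligned 64 KB slabs: %ld KB\n", rss_kb() - before);
    return 0;
}

If alignment really forced a full 64 KB per slab to become resident, the delta would be on the order of 6.4 MB; a delta in the low single-digit MB range instead points at ordinary allocator bookkeeping.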

Actual Problem (CORRECT)

The benchmark may be fundamentally flawed!

The test program (test_memory_usage.c) only touches ONE BYTE per allocation:

ptrs[i] = malloc(16);
if (ptrs[i]) *(char*)ptrs[i] = 'A';  // Only touches first byte!

RSS only counts touched pages!

If only the first byte of each 16-byte block is touched, and blocks are packed:

  • 256 blocks fit in a 4 KB page (256 × 16 B = 4 KB)
  • 1M blocks therefore need at least 3,907 pages
  • But if blocks straddle page or slab boundaries, more pages get touched than that minimum; a diagnostic that counts the pages the blocks actually land on (sketched below) would settle it
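
One way to measure this directly (a hypothetical diagnostic, not part of test_memory_usage.c) is to record which 4 KB page each returned block lands on and count the distinct pages, then compare against the 3,907-page minimum:

/* Hypothetical diagnostic (not part of test_memory_usage.c): record which
 * 4 KB page each returned block lands on and count the distinct pages. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define N 1000000
#define PAGE_SHIFT 12                               /* 4 KB pages */

static int cmp_uptr(const void *a, const void *b) {
    uintptr_t x = *(const uintptr_t *)a, y = *(const uintptr_t *)b;
    return (x > y) - (x < y);
}

int main(void) {
    void **ptrs = malloc(N * sizeof *ptrs);
    uintptr_t *pages = malloc(N * sizeof *pages);
    if (!ptrs || !pages) return 1;

    for (size_t i = 0; i < N; i++) {
        ptrs[i] = malloc(16);
        if (!ptrs[i]) return 1;
        *(char *)ptrs[i] = 'A';                     /* same touch pattern as the test */
        pages[i] = (uintptr_t)ptrs[i] >> PAGE_SHIFT;
    }

    qsort(pages, N, sizeof *pages, cmp_uptr);
    size_t distinct = 1;
    for (size_t i = 1; i < N; i++)
        if (pages[i] != pages[i - 1]) distinct++;

    printf("distinct payload pages: %zu (minimum for packed 16 B blocks: 3907)\n",
           distinct);
    return 0;
}

If the count comes out near 3,907, the payload is packed tightly and the extra RSS must come from elsewhere; a noticeably larger count would point at slab-boundary fragmentation.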

Revised Analysis

I need to run actual measurements to understand where the overhead truly comes from.

The Scaling Pattern is Real

100K allocs: HAKMEM 221% OH, mimalloc 234% OH → HAKMEM wins!
1M allocs:   HAKMEM 160% OH, mimalloc 65% OH  → mimalloc wins!

This suggests HAKMEM has:

  • Lower fixed overhead (wins at small scale)
  • Higher variable overhead per allocation (loses at large scale)
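
These percentages appear to be measured against the 15.26 MB payload, with the ptrs array counted as overhead; for the 1M HAKMEM run the arithmetic checks out:

(39.60 MB - 15.26 MB) / 15.26 MB ≈ 1.60  →  160% overhead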

Conclusion

The document MEMORY_OVERHEAD_ANALYSIS.md follows sound diagnostic methodology but may have jumped to conclusions about aligned_alloc().

The real issue is likely one of:

  1. SuperSlab is NOT being used (g_use_superslab=1 but not active)
  2. TLS Magazine is holding too many blocks
  3. Slab fragmentation (last slab partially filled)
  4. Test methodology issue (RSS vs actual allocations)

Recommendation: Run actual instrumented tests with slab counters to see exactly how many slabs are allocated and what their utilization is.
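
A minimal sketch of what such instrumentation could look like, with hypothetical counter names and a stand-in driver (none of this is HAKMEM's existing API):

/* Hypothetical slab counters (illustrative names, not HAKMEM's actual
 * internals): count slabs and live blocks, then report utilization. */
#include <stdio.h>
#include <stdatomic.h>

#define SLAB_BYTES  (64 * 1024)
#define BLOCK_BYTES 16

static _Atomic unsigned long g_slabs_allocated;   /* slabs obtained for this size class */
static _Atomic unsigned long g_blocks_live;       /* blocks currently handed out */

/* To be called from the slab and block paths; placement is up to the integrator. */
static inline void count_slab_alloc(void)  { atomic_fetch_add(&g_slabs_allocated, 1); }
static inline void count_block_alloc(void) { atomic_fetch_add(&g_blocks_live, 1); }
static inline void count_block_free(void)  { atomic_fetch_sub(&g_blocks_live, 1); }

static void dump_slab_counters(void) {
    unsigned long slabs  = atomic_load(&g_slabs_allocated);
    unsigned long blocks = atomic_load(&g_blocks_live);
    double capacity = (double)slabs * (SLAB_BYTES / BLOCK_BYTES);
    fprintf(stderr, "slabs=%lu  live_blocks=%lu  utilization=%.1f%%\n",
            slabs, blocks, capacity > 0 ? 100.0 * blocks / capacity : 0.0);
}

int main(void) {                            /* stand-in driver using the figures quoted above */
    for (int i = 0; i < 245; i++) count_slab_alloc();
    for (long i = 0; i < 1000000; i++) count_block_alloc();
    dump_slab_counters();                   /* slabs=245  live_blocks=1000000  utilization=99.6% */
    return 0;
}

Wiring count_slab_alloc()/count_block_alloc()/count_block_free() into the real slab and block paths, instead of the stand-in driver, would confirm whether the 245-slab figure and near-full utilization actually hold.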