hakmem/docs/analysis/PHASE15_REGISTRY_LOOKUP_INVESTIGATION.md

Phase 15 Registry Lookup Investigation

Date: 2025-11-15
Status: 🔍 ROOT CAUSE IDENTIFIED

Summary

Page-aligned Tiny allocations reach ExternalGuard, where the SuperSlab registry lookup FAILS. The pointer is then delegated to __libc_free(), which crashes.

Critical Findings

1. Registry Only Stores ONE SuperSlab

Evidence:

[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870 magic=5353504c

Only one registration appears in the entire test run (10K iterations, 100K operations).

2. 4MB Address Gap

Pattern (consistent across multiple runs):

  • Registry stores: 0x7d3893c00000 (SuperSlab structure address)
  • Lookup searches: 0x7d3893800000 (user pointer, 4MB lower)
  • Difference: 0x400000 = 4MB = 2 × SuperSlab size (lg=21, 2MB)

3. User Data Layout

From code analysis (superslab_inline.h:30-35):

size_t off = SUPERSLAB_SLAB0_DATA_OFFSET + (size_t)slab_idx * SLAB_SIZE;
return (uint8_t*)ss + off;

User data is placed AFTER SuperSlab structure, NOT before!

Implication: User pointer 0x7d3893800000 cannot belong to SuperSlab 0x7d3893c00000 (4MB higher).
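
The implication above can be checked with plain address arithmetic. The following is a minimal sketch, assuming a 64-bit target and a SuperSlab that spans exactly `1 << 21` bytes starting at its base; the helper names `ss_contains` and `ss_distance_in_slabs` are hypothetical and do not come from the hakmem source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SS_LG   21                        /* lg=21 from the SUPER_REG log line */
#define SS_SIZE ((uintptr_t)1 << SS_LG)   /* 2MB */

/* Hypothetical helper: true if ptr lies inside [ss_base, ss_base + SS_SIZE). */
static bool ss_contains(uintptr_t ss_base, uintptr_t ptr) {
    assert((ss_base & (SS_SIZE - 1)) == 0);   /* base must be 2MB-aligned */
    return ptr >= ss_base && ptr - ss_base < SS_SIZE;
}

/* Hypothetical helper: signed distance from the SuperSlab base to ptr,
 * measured in whole SuperSlab sizes (negative means ptr is below the base). */
static intptr_t ss_distance_in_slabs(uintptr_t ss_base, uintptr_t ptr) {
    return ((intptr_t)ptr - (intptr_t)ss_base) / (intptr_t)SS_SIZE;
}
```

For the logged values, `ss_contains(0x7d3893c00000, 0x7d3893800000)` is false and the distance is -2 SuperSlab sizes, which matches the 4MB gap reported above.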

4. mmap Alignment Mechanism

Code (hakmem_tiny_superslab.c:280-308):

size_t alloc_size = ss_size * 2;  // Allocate 4MB for 2MB SuperSlab
void* raw = mmap(NULL, alloc_size, ...);
uintptr_t aligned_addr = (raw_addr + ss_mask) & ~ss_mask;  // 2MB align

Scenario:

  • mmap returns 0x7d3893800000 (already 2MB-aligned)
  • aligned_addr = 0x7d3893800000 (no change)
  • Prefix size = 0, Suffix = 2MB (munmapped)
  • SuperSlab registered at: 0x7d3893800000

Contradiction: Registry shows 0x7d3893c00000, not 0x7d3893800000!
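
To make the scenario reproducible without calling mmap, the alignment step can be isolated as pure arithmetic. This is a sketch under the assumption that the excerpt above is the whole alignment logic (allocate `2 * ss_size`, round up to `ss_size`, trim both ends); `ss_trim_t` and `ss_compute_trim` are illustrative names, not hakmem identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t aligned;  /* first ss_size-aligned address >= raw */
    size_t    prefix;   /* bytes in [raw, aligned), to be munmapped */
    size_t    suffix;   /* bytes after aligned + ss_size, to be munmapped */
} ss_trim_t;

/* Hypothetical helper mirroring the round-up in the excerpt above. */
static ss_trim_t ss_compute_trim(uintptr_t raw, size_t ss_size) {
    assert(ss_size != 0 && (ss_size & (ss_size - 1)) == 0);  /* power of two */
    uintptr_t mask = (uintptr_t)ss_size - 1;
    size_t alloc_size = ss_size * 2;          /* over-allocation, as in the excerpt */
    ss_trim_t t;
    t.aligned = (raw + mask) & ~mask;         /* round raw up to ss_size */
    t.prefix  = (size_t)(t.aligned - raw);
    t.suffix  = alloc_size - t.prefix - ss_size;
    return t;
}
```

With `raw = 0x7d3893800000` and `ss_size = 2MB` this yields the scenario above (prefix 0, suffix 2MB). Under the same formula, an aligned base of 0x7d3893c00000 would require a raw address in the range (0x7d3893a00000, 0x7d3893c00000], so the registered base most likely comes from a different mmap call than the one assumed in the scenario.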

5. Hash Slot Mismatch

Lookup:

[SUPER_LOOKUP] ptr=0x7d3893800000 lg=21 aligned_base=0x7d3893800000 hash=115868

Registry:

[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870

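Hash difference: 115868 vs 115870 (2 slots apart).

Reason (unconfirmed): linear probing may have placed the entry in a different slot after a collision. Note, however, that the lookup and the registration use different base addresses (0x7d3893800000 vs 0x7d3893c00000), so their home slots need not match in the first place.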
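
One way to test the collision explanation is to recompute the home slot of each base. The sketch below assumes a shift-and-mask hash, `(base >> lg) & mask`, over a table of 2^17 slots. Both the hash form and the table size are assumptions chosen for illustration and have not been confirmed against the hakmem source; `home_slot` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define REG_BITS 17                                  /* assumed: 2^17 slots */
#define REG_MASK (((uint64_t)1 << REG_BITS) - 1)

/* Hypothetical hash: drop the low lg bits, then mask to the table size. */
static uint32_t home_slot(uint64_t base, int lg) {
    assert(lg > 0 && lg < 64);
    return (uint32_t)((base >> lg) & REG_MASK);
}
```

Under these assumptions, `home_slot(0x7d3893800000, 21)` is 115868 and `home_slot(0x7d3893c00000, 21)` is 115870, which are exactly the logged values. If the real hash has this form, both entries sit in their home slots and the two-slot difference follows from the 4MB base difference rather than from probing.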

Root Cause Hypothesis

Option A: Multiple SuperSlabs, Only One Registered

Theory: Multiple SuperSlabs allocated, but only the last one is logged.

Problem: With the logging fix in place (ENV check on every call), debug logging should show ALL registrations.

Option B: LRU Cache Reuse

Theory: Most SuperSlabs come from LRU cache (already registered), only new allocations are logged.

Problem: First few iterations should still show multiple registrations.

Option C: Pointer is NOT from hakmem

Theory: 0x7d3893800000 is allocated by __libc_malloc(), NOT hakmem.

Evidence:

  • Box BenchMeta uses __libc_calloc for slots[] array
  • free(slots[idx]) uses hakmem wrapper
  • But: slots[] array itself is freed with __libc_free(slots) (Line 99)

Contradiction: slots[] should NOT reach hakmem free() wrapper.

Option D: Registry Lookup Bug

Theory: SuperSlab is registered at 0x7d3893800000, but lookup fails due to:

  1. Hash collision (different slot used during registration vs lookup)
  2. Linear probing limit exceeded (SUPER_MAX_PROBE = 8)
  3. Alignment mismatch (looking for wrong base address)
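
The three failure modes above can be explored with a toy registry. This is a simplified sketch, not the hakmem implementation: the table size, the `reg_entry_t` layout, and the function names are hypothetical, and only `SUPER_MAX_PROBE = 8` and `lg = 21` are taken from this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_SIZE        1024u   /* hypothetical table size (power of two) */
#define SUPER_MAX_PROBE 8u      /* probe limit quoted above */
#define SS_LG           21
#define SS_MASK         (((uintptr_t)1 << SS_LG) - 1)

typedef struct { uintptr_t base; void *ss; } reg_entry_t;
static reg_entry_t g_reg[REG_SIZE];   /* base == 0 marks an empty slot */

static uint32_t reg_hash(uintptr_t base) {
    return (uint32_t)((base >> SS_LG) & (REG_SIZE - 1));
}

/* Returns 1 on success, 0 if no free slot exists within the probe limit. */
static int reg_register(uintptr_t base, void *ss) {
    assert(base != 0 && (base & SS_MASK) == 0);
    uint32_t h = reg_hash(base);
    for (uint32_t i = 0; i < SUPER_MAX_PROBE; i++) {
        reg_entry_t *e = &g_reg[(h + i) & (REG_SIZE - 1)];
        if (e->base == 0) { e->base = base; e->ss = ss; return 1; }
    }
    return 0;
}

/* Masks ptr down to its 2MB-aligned base, then probes from the home slot. */
static void *reg_lookup(uintptr_t ptr) {
    uintptr_t base = ptr & ~SS_MASK;
    uint32_t h = reg_hash(base);
    for (uint32_t i = 0; i < SUPER_MAX_PROBE; i++) {
        reg_entry_t *e = &g_reg[(h + i) & (REG_SIZE - 1)];
        if (e->base == 0) return NULL;        /* empty slot ends the chain */
        if (e->base == base) return e->ss;
    }
    return NULL;                              /* probe limit exceeded */
}
```

In this model, registering 0x7d3893c00000 and looking up 0x7d3893800000 fails regardless of the probe limit, because the masked base never equals the registered base. That corresponds to failure mode 3; modes 1 and 2 would only matter if the SuperSlab were in fact registered at 0x7d3893800000.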

Test Results Comparison

| Phase | Test Result | Behavior |
|-------|-------------|----------|
| Phase 14 | PASS (5.69M ops/s) | No crash with same test |
| Phase 15 | CRASH | ExternalGuard → __libc_free() failure |

Conclusion: Phase 15 Box Separation introduced regression.

Next Steps

Investigation Needed

  1. Add more detailed logging:

    • Log ALL mmap calls with returned address
    • Log prefix/suffix munmap with exact ranges
    • Log final SuperSlab address vs mmap address
    • Track which pointers are allocated from which SuperSlab
  2. Verify registry integrity:

    • Dump entire registry before crash
    • Check for hash collisions
    • Verify linear probing behavior
  3. Test with reduced SuperSlab size:

    • Try lg=20 (1MB) instead of lg=21 (2MB)
    • See whether the gap stays at 4MB or scales with the SuperSlab size (2MB would mean it is still 2 × SuperSlab size)
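
For the first logging item, one line per mapping that records the raw range, the final SuperSlab address, and both trimmed ranges would be enough to correlate mmap results with registrations. The helper below is a sketch only; the `[SS_MMAP]` tag, the field names, and the function `ss_format_map_log` are hypothetical and not part of the existing hakmem log format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one log line for an over-allocated mapping and its trimmed ranges.
 * Returns the value of snprintf (length that was or would have been written). */
static int ss_format_map_log(char *buf, size_t cap,
                             uintptr_t raw, size_t alloc_size,
                             uintptr_t aligned, size_t ss_size) {
    assert(buf != NULL && cap > 0);
    /* The SuperSlab must lie entirely inside the raw mapping. */
    assert(aligned >= raw && aligned + ss_size <= raw + alloc_size);
    uintptr_t ss_end  = aligned + ss_size;
    uintptr_t raw_end = raw + alloc_size;
    return snprintf(buf, cap,
        "[SS_MMAP] raw=0x%" PRIxPTR " len=0x%zx ss=0x%" PRIxPTR
        " prefix=[0x%" PRIxPTR ",0x%" PRIxPTR ")"
        " suffix=[0x%" PRIxPTR ",0x%" PRIxPTR ")",
        raw, alloc_size, aligned, raw, aligned, ss_end, raw_end);
}
```

The formatted buffer can be passed to `fprintf(stderr, "%s\n", buf)` under the same debug guard as the existing `[SUPER_REG]` output, so that every registration line can be matched to the mapping it came from.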

Fix Options

Option 1: Fix Registry Lookup

Issue: Registry lookup fails for valid hakmem allocations.

Potential fixes:

  • Increase SUPER_MAX_PROBE from 8 to 16/32
  • Use better hash function to reduce collisions
  • Store address range instead of single base
  • Support lookup by any address within SuperSlab region

Option 2: Improve ExternalGuard Safety ⚠️ WORKAROUND

Current behavior (DANGEROUS):

if (!is_mapped) return 0;  // Delegate to __libc_free → CRASH!

Safer behavior:

if (!is_mapped) {
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1;  // Claim handled (leak vs crash tradeoff)
}

Pros: Prevents crash
Cons: Memory leak for genuinely external pointers

Option 3: Special Path for Page-Aligned Tiny Pointers

Idea: Add special path for page-aligned Tiny pointers.

Problems:

  • Can't read header at ptr-1 (page boundary violation)
  • Violates 1-byte header design
  • Requires alternative classification
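
The first problem can be stated as a simple predicate: for a page-aligned user pointer, the byte at ptr-1 belongs to the preceding page, which may not be mapped. The sketch below assumes a 4096-byte page; real code would query `sysconf(_SC_PAGESIZE)`. The name `header_on_previous_page` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u   /* assumption; use sysconf(_SC_PAGESIZE) in real code */

/* True if the 1-byte header at user_ptr - 1 lies on a different page
 * than user_ptr itself, i.e. user_ptr is page-aligned. */
static bool header_on_previous_page(uintptr_t user_ptr) {
    assert(user_ptr != 0);
    uintptr_t header = user_ptr - 1;
    return (header / PAGE_SIZE_ASSUMED) != (user_ptr / PAGE_SIZE_ASSUMED);
}
```

For the pointer in this report, `header_on_previous_page(0x7d3893800000)` is true, so any special path for such pointers has to classify them without touching the header byte.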

Conclusion

Primary Issue: SuperSlab registry lookup fails for page-aligned user pointers.

Secondary Issue: ExternalGuard unconditionally delegates unknown pointers to __libc_free().

Recommended Action:

  1. Fix registry lookup (Option 1)
  2. Add ExternalGuard safety (Option 2 as backup)
  3. Add comprehensive logging to confirm root cause