Phase 15 Registry Lookup Investigation
Date: 2025-11-15
Status: 🔍 ROOT CAUSE IDENTIFIED
Summary
Page-aligned Tiny allocations reach ExternalGuard → SuperSlab registry lookup FAILS → delegated to __libc_free() → crash.
Critical Findings
1. Registry Only Stores ONE SuperSlab
Evidence:
[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870 magic=5353504c
Only 1 registration in entire test run (10K iterations, 100K operations).
2. 4MB Address Gap
Pattern (consistent across multiple runs):
- Registry stores: 0x7d3893c00000 (SuperSlab structure address)
- Lookup searches: 0x7d3893800000 (user pointer, 4MB lower)
- Difference: 0x400000 = 4MB = 2 × SuperSlab size (lg=21, 2MB)
3. User Data Layout
From code analysis (superslab_inline.h:30-35):
size_t off = SUPERSLAB_SLAB0_DATA_OFFSET + (size_t)slab_idx * SLAB_SIZE;
return (uint8_t*)ss + off;
User data is placed AFTER SuperSlab structure, NOT before!
Implication: User pointer 0x7d3893800000 cannot belong to SuperSlab 0x7d3893c00000 (4MB higher).
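The containment argument can be checked mechanically. A minimal sketch, assuming a SuperSlab occupies the half-open range [base, base + 2^lg); `ptr_in_superslab` is a hypothetical helper for illustration, not a hakmem function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if ptr lies inside [ss_base, ss_base + 2^lg).
 * Hypothetical helper; not taken from the hakmem sources. */
static bool ptr_in_superslab(uint64_t ss_base, unsigned lg, uint64_t ptr)
{
    assert(lg < 64);
    uint64_t ss_size = UINT64_C(1) << lg;
    /* ptr below the base can never belong, because user data follows the header. */
    return ptr >= ss_base && (ptr - ss_base) < ss_size;
}
```

With the logged values, `ptr_in_superslab(0x7d3893c00000, 21, 0x7d3893800000)` is false: the pointer sits below the registered base, so it cannot be user data of that SuperSlab.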
4. mmap Alignment Mechanism
Code (hakmem_tiny_superslab.c:280-308):
size_t alloc_size = ss_size * 2; // Allocate 4MB for 2MB SuperSlab
void* raw = mmap(NULL, alloc_size, ...);
uintptr_t aligned_addr = (raw_addr + ss_mask) & ~ss_mask; // 2MB align
Scenario:
- mmap returns 0x7d3893800000 (already 2MB-aligned)
- aligned_addr = 0x7d3893800000 (no change)
- Prefix size = 0, Suffix = 2MB (munmapped)
- SuperSlab registered at: 0x7d3893800000
Contradiction: Registry shows 0x7d3893c00000, not 0x7d3893800000!
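The alignment arithmetic from the scenario can be reproduced in isolation. A sketch under the assumption that the allocation is exactly 2 × SuperSlab size and that prefix and suffix are whatever remains around the aligned block; `ss_carve_t` and `ss_carve` are hypothetical names, and no real mmap is performed:

```c
#include <assert.h>
#include <stdint.h>

/* Result of carving one aligned SuperSlab out of a 2x over-allocation. */
typedef struct {
    uint64_t aligned; /* start of the aligned SuperSlab */
    uint64_t prefix;  /* bytes before it (would be munmapped) */
    uint64_t suffix;  /* bytes after it (would be munmapped) */
} ss_carve_t;

/* Hypothetical helper mirroring the align-up expression quoted above. */
static ss_carve_t ss_carve(uint64_t raw_addr, unsigned lg)
{
    assert(lg < 63);
    uint64_t ss_size = UINT64_C(1) << lg;
    uint64_t ss_mask = ss_size - 1;
    uint64_t alloc_size = ss_size * 2;

    ss_carve_t c;
    c.aligned = (raw_addr + ss_mask) & ~ss_mask; /* round up to 2^lg */
    c.prefix = c.aligned - raw_addr;
    c.suffix = alloc_size - c.prefix - ss_size;
    return c;
}
```

For raw_addr = 0x7d3893800000 and lg = 21 this gives aligned = 0x7d3893800000, prefix = 0, suffix = 0x200000. The aligned address is always less than raw_addr + ss_size, so this raw address cannot produce 0x7d3893c00000; the registered SuperSlab most likely came from a different mmap call.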
5. Hash Slot Mismatch
Lookup:
[SUPER_LOOKUP] ptr=0x7d3893800000 lg=21 aligned_base=0x7d3893800000 hash=115868
Registry:
[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870
Hash difference: 115868 vs 115870 (2 slots apart)
Reason: Linear probing found different slot due to collision.
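The two logged slot numbers can be reproduced arithmetically. A minimal sketch, assuming the home slot is `base >> lg` masked to a power-of-two table; the 17-bit mask is inferred from the log values (any width from 17 to 19 bits yields the same slots), and the real hash function is not shown in this report. `reg_home_slot` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: table size inferred from the logged slots (2^17 entries);
 * the actual hakmem constant is not shown in this report. */
#define REG_SLOTS_LG 17u

/* Hypothetical home-slot computation: SuperSlab index masked to table size. */
static uint32_t reg_home_slot(uint64_t base, unsigned lg)
{
    assert(lg < 64);
    return (uint32_t)((base >> lg) & ((UINT64_C(1) << REG_SLOTS_LG) - 1));
}
```

Under that assumption, base 0x7d3893c00000 maps to slot 115870 and base 0x7d3893800000 maps to slot 115868, matching both log lines. The 2-slot difference then follows directly from the two bases being two 2MB units apart, so a collision is not required to explain it: the lookup may simply be probing for a different base than the one that was registered.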
Root Cause Hypothesis
Option A: Multiple SuperSlabs, Only One Registered
Theory: Multiple SuperSlabs allocated, but only the last one is logged.
Problem: Debug logging should show ALL registrations after fix (ENV check on every call).
Option B: LRU Cache Reuse
Theory: Most SuperSlabs come from LRU cache (already registered), only new allocations are logged.
Problem: First few iterations should still show multiple registrations.
Option C: Pointer is NOT from hakmem
Theory: 0x7d3893800000 is allocated by __libc_malloc(), NOT hakmem.
Evidence:
- Box BenchMeta uses __libc_calloc for slots[] array
- free(slots[idx]) uses hakmem wrapper
- But: slots[] array itself is freed with __libc_free(slots) (Line 99)
Contradiction: slots[] should NOT reach hakmem free() wrapper.
Option D: Registry Lookup Bug
Theory: SuperSlab is registered at 0x7d3893800000, but lookup fails due to:
- Hash collision (different slot used during registration vs lookup)
- Linear probing limit exceeded (SUPER_MAX_PROBE = 8)
- Alignment mismatch (looking for wrong base address)
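The three failure modes listed under Option D can be explored with a toy registry. This is a sketch only: the table size, entry layout, and function names (`reg_register`, `reg_lookup`) are hypothetical, and only the probe limit of 8 comes from the report:

```c
#include <assert.h>
#include <stdint.h>

#define REG_SIZE        1024u /* hypothetical table size (power of two) */
#define SUPER_MAX_PROBE 8u    /* probe limit quoted in the report */

typedef struct { uint64_t base; unsigned lg; } reg_entry_t; /* base == 0: empty */

static reg_entry_t g_reg[REG_SIZE];

static uint32_t reg_hash(uint64_t base, unsigned lg)
{
    return (uint32_t)((base >> lg) & (REG_SIZE - 1));
}

/* Returns 1 on success, 0 if all SUPER_MAX_PROBE candidate slots are taken. */
static int reg_register(uint64_t base, unsigned lg)
{
    uint32_t h = reg_hash(base, lg);
    for (uint32_t i = 0; i < SUPER_MAX_PROBE; i++) {
        reg_entry_t *e = &g_reg[(h + i) & (REG_SIZE - 1)];
        if (e->base == 0 || e->base == base) {
            e->base = base;
            e->lg = lg;
            return 1;
        }
    }
    return 0; /* probe limit exceeded */
}

/* Returns the registered base covering ptr, or 0 if not found. */
static uint64_t reg_lookup(uint64_t ptr, unsigned lg)
{
    uint64_t base = ptr & ~((UINT64_C(1) << lg) - 1); /* align down to 2^lg */
    uint32_t h = reg_hash(base, lg);
    for (uint32_t i = 0; i < SUPER_MAX_PROBE; i++) {
        const reg_entry_t *e = &g_reg[(h + i) & (REG_SIZE - 1)];
        if (e->base == base && e->lg == lg)
            return base;
    }
    return 0;
}
```

In this model, registering 0x7d3893c00000 and looking up 0x7d3893800000 fails even though the probe window passes over the registered slot, because the stored base does not match the aligned-down pointer. A ninth base hashing to the same home slot is rejected at registration time, which would show up as a missing entry rather than a lookup bug.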
Test Results Comparison
| Phase | Test Result | Behavior |
|---|---|---|
| Phase 14 | ✅ PASS (5.69M ops/s) | No crash with same test |
| Phase 15 | ❌ CRASH | ExternalGuard → __libc_free() failure |
Conclusion: Phase 15 Box Separation introduced regression.
Next Steps
Investigation Needed
- Add more detailed logging:
  - Log ALL mmap calls with returned address
  - Log prefix/suffix munmap with exact ranges
  - Log final SuperSlab address vs mmap address
  - Track which pointers are allocated from which SuperSlab
- Verify registry integrity:
  - Dump entire registry before crash
  - Check for hash collisions
  - Verify linear probing behavior
- Test with reduced SuperSlab size:
  - Try lg=20 (1MB) instead of lg=21 (2MB)
  - See if 2MB gap still occurs
Fix Options
Option 1: Fix SuperSlab Registry Lookup ✅ RECOMMENDED
Issue: Registry lookup fails for valid hakmem allocations.
Potential fixes:
- Increase SUPER_MAX_PROBE from 8 to 16/32
- Use better hash function to reduce collisions
- Store address range instead of single base
- Support lookup by any address within SuperSlab region
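The last two potential fixes (storing an address range, and resolving any address inside the region) could look roughly like the following. This is an illustrative sketch with hypothetical names and a linear scan; a production registry would likely want a sorted array or radix structure instead:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 64u /* hypothetical capacity */

typedef struct { uint64_t start; uint64_t end; } ss_range_t; /* [start, end) */

static ss_range_t g_ranges[MAX_RANGES];
static size_t g_range_count;

/* Records [start, start + size). Returns 1 on success, 0 if full or size is 0. */
static int range_register(uint64_t start, uint64_t size)
{
    if (g_range_count == MAX_RANGES || size == 0)
        return 0;
    assert(start + size > start); /* reject address wrap-around */
    g_ranges[g_range_count].start = start;
    g_ranges[g_range_count].end = start + size;
    g_range_count++;
    return 1;
}

/* Returns the start of the range containing ptr, or 0 if none contains it. */
static uint64_t range_lookup(uint64_t ptr)
{
    for (size_t i = 0; i < g_range_count; i++) {
        if (ptr >= g_ranges[i].start && ptr < g_ranges[i].end)
            return g_ranges[i].start;
    }
    return 0;
}
```

Because the lookup compares against explicit bounds, it does not depend on the caller deriving the same aligned base that was used at registration time.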
Option 2: Improve ExternalGuard Safety ⚠️ WORKAROUND
Current behavior (DANGEROUS):
if (!is_mapped) return 0; // Delegate to __libc_free → CRASH!
Safer behavior:
if (!is_mapped) {
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1; // Claim handled (leak vs crash tradeoff)
}
Pros: Prevents crash
Cons: Memory leak for genuinely external pointers
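A self-contained version of the safer behavior might look like this. The report does not show how `is_mapped` is computed, so this sketch assumes a Linux `mincore()` probe on the page containing the pointer; `page_is_mapped` and `external_guard_try_free` are hypothetical names:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes mincore() in glibc under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page containing ptr is mapped, 0 otherwise.
 * ASSUMPTION: Linux mincore(); any failure is treated as "not mapped". */
static int page_is_mapped(const void *ptr)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    uintptr_t start = (uintptr_t)ptr & ~((uintptr_t)page - 1);
    unsigned char vec;
    return mincore((void *)start, (size_t)page, &vec) == 0;
}

/* Returns 1 if the pointer was handled here (deliberately leaked),
 * 0 if the caller should pass it on to __libc_free(). */
static int external_guard_try_free(void *ptr)
{
    if (!page_is_mapped(ptr)) {
        fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
        return 1; /* leak instead of crashing */
    }
    return 0;
}
```

A mapped page only shows that the address is readable, not that libc owns it, so this reduces crashes on unmapped pointers but does not make delegation safe in general.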
Option 3: Fix Box FrontGate Classification ❌ NOT RECOMMENDED
Idea: Add special path for page-aligned Tiny pointers.
Problems:
- Can't read header at ptr-1 (page boundary violation)
- Violates 1-byte header design
- Requires alternative classification
Conclusion
Primary Issue: SuperSlab registry lookup fails for page-aligned user pointers.
Secondary Issue: ExternalGuard unconditionally delegates unknown pointers to __libc_free().
Recommended Action:
- Fix registry lookup (Option 1)
- Add ExternalGuard safety (Option 2 as backup)
- Comprehensive logging to confirm root cause