hakmem/docs/analysis/PHASE15_REGISTRY_LOOKUP_INVESTIGATION.md
Moe Charm (CI) a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

ENV variables removed (BG/HotMag family):
- core/hakmem_tiny_init.inc: removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: removed BG spill ENV
- core/tiny_refill.h: BG remote replaced with fixed values
- core/hakmem_tiny_slow.inc: removed BG refs

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

Documentation cleanup:
- Deleted 328 markdown files (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M; stable)
- No functional impact from the ENV cleanup
- Some debug output remains (to be addressed in the next phase)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00


# Phase 15 Registry Lookup Investigation
**Date**: 2025-11-15
**Status**: 🔍 ROOT CAUSE IDENTIFIED
## Summary
Page-aligned Tiny allocations reach ExternalGuard → SuperSlab registry lookup FAILS → delegated to `__libc_free()` → crash.
## Critical Findings
### 1. Registry Only Stores ONE SuperSlab
**Evidence**:
```
[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870 magic=5353504c
```
**Only 1 registration** in entire test run (10K iterations, 100K operations).
### 2. 4MB Address Gap
**Pattern** (consistent across multiple runs):
- **Registry stores**: `0x7d3893c00000` (SuperSlab structure address)
- **Lookup searches**: `0x7d3893800000` (user pointer, 4MB **lower**)
- **Difference**: `0x400000 = 4MB = 2 × SuperSlab size (lg=21, 2MB)`
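The gap arithmetic can be checked mechanically. The helper below is a minimal sketch; `gap_in_superslabs` is a hypothetical name, not a hakmem function. It expresses the distance between the registered base and the looked-up pointer in units of the SuperSlab size (`1 << lg` bytes).

```c
#include <stdint.h>

/* Hypothetical helper (not part of hakmem): distance between the
 * registered base and the looked-up pointer, in SuperSlab-sized units.
 * Assumes registered >= looked_up, as in the logs above. */
static inline uint64_t gap_in_superslabs(uint64_t registered,
                                         uint64_t looked_up,
                                         unsigned lg)
{
    uint64_t gap_bytes = registered - looked_up; /* 0x400000 for the logged pair */
    return gap_bytes >> lg;                      /* divide by (1 << lg) */
}
```

For the logged addresses and `lg=21`, this gives a gap of two SuperSlab sizes, matching the 4MB figure above.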
### 3. User Data Layout
**From code analysis** (`superslab_inline.h:30-35`):
```c
size_t off = SUPERSLAB_SLAB0_DATA_OFFSET + (size_t)slab_idx * SLAB_SIZE;
return (uint8_t*)ss + off;
```
**User data is placed AFTER SuperSlab structure**, NOT before!
**Implication**: User pointer `0x7d3893800000` **cannot** belong to SuperSlab `0x7d3893c00000` (4MB higher).
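The containment argument can be written as a range check. This is a sketch under the assumption that a SuperSlab occupies exactly `[base, base + (1 << lg))`; `ptr_in_superslab` is a hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check (not hakmem code): does ptr fall inside the
 * half-open range [ss_base, ss_base + (1 << lg))?
 * Since user data is placed after the SuperSlab structure, any pointer
 * below ss_base cannot belong to that SuperSlab. */
static bool ptr_in_superslab(uint64_t ss_base, unsigned lg, uint64_t ptr)
{
    uint64_t ss_size = 1ULL << lg;
    return ptr >= ss_base && (ptr - ss_base) < ss_size;
}
```

With the logged values, `0x7d3893800000` lies below `0x7d3893c00000`, so the check rejects it regardless of `slab_idx`.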
### 4. mmap Alignment Mechanism
**Code** (`hakmem_tiny_superslab.c:280-308`):
```c
size_t alloc_size = ss_size * 2; // Allocate 4MB for 2MB SuperSlab
void* raw = mmap(NULL, alloc_size, ...);
uintptr_t aligned_addr = (raw_addr + ss_mask) & ~ss_mask; // 2MB align
```
**Scenario**:
- mmap returns `0x7d3893800000` (already 2MB-aligned)
- `aligned_addr = 0x7d3893800000` (no change)
- Prefix size = 0, Suffix = 2MB (munmapped)
- **SuperSlab registered at**: `0x7d3893800000`
**Contradiction**: Registry shows `0x7d3893c00000`, not `0x7d3893800000`!
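The scenario can be reproduced without calling `mmap` by computing only the trim plan. The sketch below assumes the over-allocate-and-trim scheme quoted above (allocate `2 * ss_size`, round up to an `ss_size` boundary, unmap the rest); `trim_plan` and `plan_trim` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a 2*ss_size mapping would be trimmed. */
typedef struct {
    uint64_t aligned; /* ss_size-aligned SuperSlab base */
    size_t   prefix;  /* bytes to munmap before the base */
    size_t   suffix;  /* bytes to munmap after the SuperSlab */
} trim_plan;

/* ss_size must be a power of two (2MB for lg=21). */
static trim_plan plan_trim(uint64_t raw_addr, size_t ss_size)
{
    uint64_t ss_mask = (uint64_t)ss_size - 1;
    size_t alloc_size = ss_size * 2;

    trim_plan p;
    p.aligned = (raw_addr + ss_mask) & ~ss_mask;    /* round up */
    p.prefix  = (size_t)(p.aligned - raw_addr);     /* always < ss_size */
    p.suffix  = alloc_size - p.prefix - ss_size;
    return p;
}
```

Because the prefix is always smaller than `ss_size`, the aligned base lands in `[raw, raw + 2MB)`. Under this scheme a base 4MB above the raw address would lie outside the mapping, which is consistent with the contradiction noted above.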
### 5. Hash Slot Mismatch
**Lookup**:
```
[SUPER_LOOKUP] ptr=0x7d3893800000 lg=21 aligned_base=0x7d3893800000 hash=115868
```
**Registry**:
```
[SUPER_REG] register base=0x7d3893c00000 lg=21 slot=115870
```
**Hash difference**: 115868 vs 115870 (2 slots apart)
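**Reason**: The two log lines hash **different base addresses**, so a collision is not needed to explain the gap. `0x400000 >> 21 = 2`, which matches the two-slot difference exactly, and both logged slots are consistent with a shift-based hash of the form `(base >> lg)` masked to the table size. The slot mismatch is therefore a consequence of the 4MB address gap, not independent evidence of linear-probing trouble.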
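One way to sanity-check the slot numbers is to recompute them from the logged addresses. The sketch below assumes a shift-and-mask hash and a table of `2^18` entries; neither is confirmed by the source excerpts here, and `reg_slot` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical slot function (assumption, not taken from hakmem):
 * shift away the in-SuperSlab bits, then mask to the table size. */
static inline uint32_t reg_slot(uint64_t base, unsigned lg, unsigned table_bits)
{
    uint64_t mask = (1ULL << table_bits) - 1;
    return (uint32_t)((base >> lg) & mask);
}
```

Under these assumptions `reg_slot(0x7d3893c00000, 21, 18)` yields 115870 and `reg_slot(0x7d3893800000, 21, 18)` yields 115868, the same values as the `[SUPER_REG]` and `[SUPER_LOOKUP]` lines. If the real hash differs, the comparison should be redone against the actual function.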
## Root Cause Hypothesis
### Option A: Multiple SuperSlabs, Only One Registered
**Theory**: Multiple SuperSlabs allocated, but only the **last one** is logged.
**Problem**: Debug logging should show ALL registrations after fix (ENV check on every call).
### Option B: LRU Cache Reuse
**Theory**: Most SuperSlabs come from LRU cache (already registered), only new allocations are logged.
**Problem**: First few iterations should still show multiple registrations.
### Option C: Pointer is NOT from hakmem
**Theory**: `0x7d3893800000` is allocated by **`__libc_malloc()`**, NOT hakmem.
**Evidence**:
- Box BenchMeta uses `__libc_calloc` for `slots[]` array
- `free(slots[idx])` uses hakmem wrapper
- **But**: `slots[]` array itself is freed with `__libc_free(slots)` (Line 99)
**Contradiction**: `slots[]` should NOT reach hakmem `free()` wrapper.
### Option D: Registry Lookup Bug
**Theory**: SuperSlab **is** registered at `0x7d3893800000`, but lookup fails due to:
1. Hash collision (different slot used during registration vs lookup)
2. Linear probing limit exceeded (SUPER_MAX_PROBE = 8)
3. Alignment mismatch (looking for wrong base address)
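The three failure modes can be explored with a toy registry. This is a simplified sketch, not the hakmem implementation: the `demo_*` names, the table size, and the hash are assumptions. Only the probe limit of 8 comes from the text (`SUPER_MAX_PROBE`).

```c
#include <stdint.h>

#define DEMO_TABLE_BITS 18                     /* assumed table size: 2^18 */
#define DEMO_TABLE_SIZE (1u << DEMO_TABLE_BITS)
#define DEMO_MAX_PROBE  8                      /* mirrors SUPER_MAX_PROBE = 8 */

typedef struct { uint64_t base; unsigned lg; } demo_entry; /* base == 0: empty */
static demo_entry demo_table[DEMO_TABLE_SIZE];

static uint32_t demo_hash(uint64_t base, unsigned lg)
{
    return (uint32_t)((base >> lg) & (DEMO_TABLE_SIZE - 1));
}

/* Insert with linear probing; returns the slot used, or -1 if all
 * DEMO_MAX_PROBE candidate slots are occupied (failure mode 2). */
static int demo_register(uint64_t base, unsigned lg)
{
    uint32_t h = demo_hash(base, lg);
    for (uint32_t i = 0; i < DEMO_MAX_PROBE; i++) {
        uint32_t s = (h + i) & (DEMO_TABLE_SIZE - 1);
        if (demo_table[s].base == 0) {
            demo_table[s].base = base;
            demo_table[s].lg = lg;
            return (int)s;
        }
    }
    return -1;
}

/* Look up any pointer: mask down to the candidate base, then probe.
 * Returns the slot, or -1 if no entry with that exact base is found. */
static int demo_lookup(uint64_t ptr, unsigned lg)
{
    uint64_t base = ptr & ~((1ULL << lg) - 1); /* wrong base => failure mode 3 */
    uint32_t h = demo_hash(base, lg);
    for (uint32_t i = 0; i < DEMO_MAX_PROBE; i++) {
        uint32_t s = (h + i) & (DEMO_TABLE_SIZE - 1);
        if (demo_table[s].base == base && demo_table[s].lg == lg)
            return (int)s;
    }
    return -1;
}
```

In this model, registering `0x7d3893c00000` and then looking up `0x7d3893800000` misses even with an empty table, because the masked base never equals the registered one. That points at failure mode 3, or at a pointer that was never in a registered SuperSlab, rather than at the probe limit.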
## Test Results Comparison
| Phase | Test Result | Behavior |
|-------|-------------|----------|
| Phase 14 | ✅ PASS (5.69M ops/s) | No crash with same test |
| Phase 15 | ❌ CRASH | ExternalGuard → `__libc_free()` failure |
**Conclusion**: Phase 15 Box Separation introduced a regression.
## Next Steps
### Investigation Needed
1. **Add more detailed logging**:
- Log ALL mmap calls with returned address
- Log prefix/suffix munmap with exact ranges
- Log final SuperSlab address vs mmap address
- Track which pointers are allocated from which SuperSlab
2. **Verify registry integrity**:
- Dump entire registry before crash
- Check for hash collisions
- Verify linear probing behavior
3. **Test with reduced SuperSlab size**:
- Try lg=20 (1MB) instead of lg=21 (2MB)
- See if the 4MB gap (2 × SuperSlab size) still occurs or shrinks to 2MB
### Fix Options
#### **Option 1: Fix SuperSlab Registry Lookup** ✅ **RECOMMENDED**
**Issue**: Registry lookup fails for valid hakmem allocations.
**Potential fixes**:
- Increase SUPER_MAX_PROBE from 8 to 16/32
- Use better hash function to reduce collisions
- Store address **range** instead of single base
- Support lookup by any address within SuperSlab region
#### **Option 2: Improve ExternalGuard Safety** ⚠️ **WORKAROUND**
**Current behavior** (DANGEROUS):
```c
if (!is_mapped) return 0; // Delegate to __libc_free → CRASH!
```
**Safer behavior**:
```c
if (!is_mapped) {
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1; // Claim handled (leak vs crash tradeoff)
}
**Pros**: Prevents crash
**Cons**: Memory leak for genuinely external pointers
#### **Option 3: Fix Box FrontGate Classification** ❌ NOT RECOMMENDED
**Idea**: Add special path for page-aligned Tiny pointers.
**Problems**:
- Can't read header at `ptr-1` (page boundary violation)
- Violates 1-byte header design
- Requires alternative classification
## Conclusion
**Primary Issue**: SuperSlab registry lookup fails for page-aligned user pointers.
**Secondary Issue**: ExternalGuard unconditionally delegates unknown pointers to `__libc_free()`.
**Recommended Action**:
1. Fix registry lookup (Option 1)
2. Add ExternalGuard safety (Option 2 as backup)
3. Comprehensive logging to confirm root cause