# Phase 15 Bug - Root Cause Analysis (FINAL)
**Date**: 2025-11-15
**Status**: ROOT CAUSE IDENTIFIED ✅
## Summary
Page-aligned Tiny allocations (`0x...000`) are not found in the SuperSlab registry, fall through to ExternalGuard, and are handed to `__libc_free()`, which crashes.
## Evidence
### Phase 14 vs Phase 15 Behavior
| Phase | Test Result | Behavior |
|-------|-------------|----------|
| Phase 14 | ✅ PASS (5.69M ops/s) | No crash with same test |
| Phase 15 | ❌ CRASH | ExternalGuard → `__libc_free()` failure |
### Crash Pattern
```
[ExternalGuard] ptr=0x706c21a00000 offset_in_page=0x0 (page-aligned!)
[ExternalGuard] hak_super_lookup(ptr) = (nil) ← SuperSlab registry: NOT FOUND
[ExternalGuard] FrontGate classification: domain=MIDCAND
[ExternalGuard] ptr=0x706c21a00000 delegated to __libc_free
free(): invalid pointer ← CRASH
```
## Root Cause
### 1. Page-Aligned Tiny Allocations Exist
**Proof** (mathematical):
- Block stride = user_size + 1 (with 1-byte header)
- Example: 257B stride (class 5)
- Carved pointer: `base + (carved_index × 257)`
- User pointer: `carved_ptr + 1`
- For a page-aligned `user_ptr` (taking `base` itself as page-aligned): `(n × 257) mod 4096 == 4095`
- Since gcd(257, 4096) = 1, **a solution exists**!
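The congruence above can be checked by brute force. The following is a minimal sketch, not code from hakmem: the function name `first_page_aligned_index` is hypothetical, and it assumes the slab `base` is page-aligned, as the formula above does. Whether the resulting index is actually reachable depends on how many blocks a slab holds, which this document does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest carve index n for which the user pointer
 * (base + n * stride + 1) is page-aligned, assuming base is page-aligned.
 * Residues of n * stride repeat with a period that divides page_size,
 * so searching n in [0, page_size) is sufficient.
 * Returns -1 when no index satisfies the condition. */
long first_page_aligned_index(uint32_t stride, uint32_t page_size) {
    assert(page_size != 0);
    for (uint32_t n = 0; n < page_size; n++) {
        /* +1 accounts for the 1-byte header in front of the user pointer */
        if (((uint64_t)n * stride + 1) % page_size == 0) {
            return (long)n;
        }
    }
    return -1;
}
```

For a 257-byte stride and 4096-byte pages this returns 255, because 255 × 257 + 1 = 65536 = 16 × 4096. An even stride such as 256 returns -1, since `n * 256 + 1` is always odd.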
**Allocation flow**:
```c
// hakmem_tiny.c:160-163
#define HAK_RET_ALLOC(cls, base_ptr) do { \
    *(uint8_t*)(base_ptr) = HEADER_MAGIC | ((cls) & HEADER_CLASS_MASK); \
    return (void*)((uint8_t*)(base_ptr) + 1); /* returns user_ptr */ \
} while(0)
```
If `base_ptr = 0x...FFF`, then `user_ptr = 0x...000` (PAGE-ALIGNED!).
### 2. Box FrontGate Classifies as MIDCAND (Correct by Design)
**front_gate_v2.h:52-59**:
```c
// CRITICAL: Same-page guard (header must be in same page as ptr)
uintptr_t offset_in_page = (uintptr_t)ptr & 0xFFF;
if (offset_in_page == 0) {
    // Page-aligned pointer → no header in same page → must be MIDCAND
    result.domain = FG_DOMAIN_MIDCAND;
    return result;
}
```
**Reason**: Reading header at `ptr-1` would cross page boundary (unsafe).
**Result**: Page-aligned Tiny allocations → classified as MIDCAND ✅
### 3. MIDCAND Routing → SuperSlab Registry Lookup FAILS
**hak_free_api.inc.h** MIDCAND path:
1. Mid registry lookup → NULL (not Mid allocation)
2. L25 registry lookup → NULL (not L25 allocation)
3. **SuperSlab registry lookup** → **NULL** ❌ (BUG!)
4. ExternalGuard → `__libc_free()` → crash
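The four steps above form a fall-through cascade, which can be sketched as follows. This is an illustrative model only: the types `owner_t` and `registries_t`, the callbacks, and `route_midcand` are hypothetical names, not the actual contents of `hak_free_api.inc.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of routing a MIDCAND pointer. */
typedef enum { OWNER_MID, OWNER_L25, OWNER_SUPERSLAB, OWNER_EXTERNAL } owner_t;

/* Hypothetical lookup callback: returns non-NULL when the registry owns ptr. */
typedef void *(*lookup_fn)(const void *ptr);

typedef struct {
    lookup_fn mid_lookup;
    lookup_fn l25_lookup;
    lookup_fn super_lookup;
} registries_t;

/* Stub registries used to exercise the cascade. */
void *lookup_miss(const void *ptr) { (void)ptr; return NULL; }
void *lookup_hit(const void *ptr)  { return (void *)ptr; }

/* Try each registry in order; only when all three miss does the pointer
 * reach the external path (the one that ends in __libc_free()). */
owner_t route_midcand(const registries_t *r, const void *ptr) {
    assert(r != NULL);
    if (r->mid_lookup(ptr))   return OWNER_MID;
    if (r->l25_lookup(ptr))   return OWNER_L25;
    if (r->super_lookup(ptr)) return OWNER_SUPERSLAB;
    return OWNER_EXTERNAL;
}
```

In this model the crash corresponds to `route_midcand` returning `OWNER_EXTERNAL` for a pointer that hakmem itself allocated, so the SuperSlab lookup is the last line of defense for page-aligned Tiny pointers.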
**Why SuperSlab lookup fails**:
**Theory A**: Pointer is NOT from hakmem
- **REJECTED**: System malloc test shows no page-aligned pointers for 16-1040B
**Theory B**: SuperSlab is not registered
- **LIKELY**: Race condition, registry exhaustion, or allocation before registration
**Theory C**: Registry lookup bug
- **POSSIBLE**: Hash collision, probe limit, or alignment mismatch
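To make the exhaustion and probe-limit theories concrete, here is a toy open-addressing registry in which registration can fail silently. It is a sketch under stated assumptions, not hakmem's registry: every name (`toy_register`, `toy_lookup`, `TOY_MAX_PROBE`, and so on) is hypothetical, and the 2 MiB alignment is chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS     8u   /* tiny table so collisions are easy to force */
#define TOY_MAX_PROBE 2u   /* hypothetical probe limit */
#define TOY_SS_SHIFT  21u  /* assumption: slabs are 2 MiB-aligned */

static uintptr_t toy_table[TOY_SLOTS];  /* 0 marks an empty slot */

/* Base address number k (k >= 1); every such base hashes to bucket 0. */
uintptr_t toy_base(unsigned k) {
    return ((uintptr_t)k * TOY_SLOTS) << TOY_SS_SHIFT;
}

unsigned toy_hash(uintptr_t base) {
    return (unsigned)((base >> TOY_SS_SHIFT) % TOY_SLOTS);
}

/* Returns 1 on success, 0 when every probed slot is already occupied. */
int toy_register(uintptr_t base) {
    unsigned h = toy_hash(base);
    for (unsigned i = 0; i < TOY_MAX_PROBE; i++) {
        unsigned slot = (h + i) % TOY_SLOTS;
        if (toy_table[slot] == 0 || toy_table[slot] == base) {
            toy_table[slot] = base;
            return 1;
        }
    }
    return 0;  /* caller must not hand out pointers from this slab */
}

/* Returns the registered base that contains ptr, or 0 if none is found. */
uintptr_t toy_lookup(uintptr_t ptr) {
    uintptr_t base = ptr & ~(((uintptr_t)1 << TOY_SS_SHIFT) - 1);
    unsigned h = toy_hash(base);
    for (unsigned i = 0; i < TOY_MAX_PROBE; i++) {
        unsigned slot = (h + i) % TOY_SLOTS;
        if (toy_table[slot] == base) return base;
    }
    return 0;
}
```

Registering `toy_base(1)`, `toy_base(2)`, and `toy_base(3)` in that order fills both probed slots with the first two bases, so the third registration returns 0. If the caller ignores that return value and still hands out pointers from the third slab, every later `toy_lookup` on them misses, which matches the symptom seen in the crash log.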
### 4. Why Phase 14 Works but Phase 15 Doesn't
**Phase 14**: Old classification system (no Box FrontGate/ExternalGuard)
- Uses different routing logic
- May have headerless path for page-aligned pointers
- Different SuperSlab lookup implementation?
**Phase 15**: New Box architecture
- Box FrontGate → classifies page-aligned as MIDCAND
- Box routing → SuperSlab lookup
- Box ExternalGuard → delegates to `__libc_free()` → **CRASH**
## Fix Options
### Option 1: Fix SuperSlab Registry Lookup ✅ **RECOMMENDED**
**Issue**: `hak_super_lookup(0x706c21a00000)` returns NULL for valid hakmem allocation.
**Root cause options**:
1. SuperSlab not registered (allocation race)
2. Registry full/hash collision
3. Lookup alignment mismatch
**Investigation needed**:
- Add debug logging to `hak_super_register()` / `hak_super_lookup()`
- Check if SuperSlab exists for this pointer
- Verify registration happens before user pointer is returned
**Fix**: Ensure all SuperSlabs are properly registered before returning user pointers.
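One way to structure the debug logging is to record every registered base and, on each lookup miss, check whether the enclosing base was ever registered. A miss on a previously registered base points at the lookup (Theory C); a miss on a base that was never registered points at registration (Theory B). The sketch below is hypothetical throughout: `reg_trace_t`, `trace_register`, and `trace_lookup_miss` are not hakmem APIs, and the SuperSlab alignment is passed in as `ss_shift` because this document does not state it.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_CAP 64  /* hypothetical capacity of the debug trace */

typedef struct {
    uintptr_t registered[TRACE_CAP];  /* bases seen by trace_register() */
    size_t    n_registered;
    size_t    n_lookup_miss;
} reg_trace_t;

/* Call from the registration path (for example next to hak_super_register()). */
void trace_register(reg_trace_t *t, uintptr_t base) {
    if (t->n_registered < TRACE_CAP) {
        t->registered[t->n_registered++] = base;
    }
    fprintf(stderr, "[SS_REG] register base=0x%" PRIxPTR "\n", base);
}

/* Call when a lookup returned NULL.
 * Returns 1 if the enclosing base was registered earlier (lookup bug),
 * 0 if it was never registered (registration bug or foreign pointer). */
int trace_lookup_miss(reg_trace_t *t, uintptr_t ptr, unsigned ss_shift) {
    uintptr_t base = ptr & ~(((uintptr_t)1 << ss_shift) - 1);
    int was_registered = 0;
    for (size_t i = 0; i < t->n_registered; i++) {
        if (t->registered[i] == base) { was_registered = 1; break; }
    }
    t->n_lookup_miss++;
    fprintf(stderr, "[SS_REG] lookup MISS ptr=0x%" PRIxPTR
            " base=0x%" PRIxPTR " registered=%d\n", ptr, base, was_registered);
    return was_registered;
}
```

A trace like this is not thread-safe as written; in a multithreaded run it would need a lock or per-thread buffers, and it should be compiled out of release builds.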
### Option 2: Add Page-Aligned Special Path in FrontGate ❌ NOT RECOMMENDED
**Idea**: Classify page-aligned Tiny pointers as TINY instead of MIDCAND.
**Problems**:
- Can't read header at `ptr-1` (page boundary violation)
- Would need alternative classification (size class lookup?)
- Violates Box FG design (1-byte header only)
### Option 3: Fix ExternalGuard Fallback ⚠️ WORKAROUND
**Idea**: ExternalGuard should NOT delegate unknown pointers to `__libc_free()`.
**Change**:
```c
// Before (BUG):
if (!is_mapped) return 0; // Delegate to __libc_free (crashes!)
// After (FIX):
if (!is_mapped) {
    // Unknown pointer - log and return success (leak vs crash tradeoff)
    fprintf(stderr, "[ExternalGuard] WARNING: Unknown pointer %p (ignored)\n", ptr);
    return 1; // Claim handled (prevent __libc_free crash)
}
```
**Cons**: Memory leak for genuinely external pointers.
## Next Steps
1. **Add SuperSlab Registry Debug Logging**
- Log all `hak_super_register()` calls
- Log all `hak_super_lookup()` failures
- Track when `0x706c21a00000` is allocated and registered
2. **Verify Registration Timing**
- Ensure SuperSlab is registered BEFORE user pointers are returned
- Check for race conditions in allocation path
3. **Implement Fix Option 1**
- Fix SuperSlab registry lookup
- Verify with the 100K-iteration test
## Conclusion
**Primary Bug**: SuperSlab registry lookup fails for page-aligned Tiny allocations.
**Secondary Bug**: ExternalGuard delegates pointers it cannot identify to `__libc_free()`; it should handle unknown pointers safely instead.
**Recommended Fix**: Fix SuperSlab registry (Option 1) + improve ExternalGuard safety (Option 3 as backup).