hakmem/PHASE15_BUG_ANALYSIS.md
Moe Charm (CI) d378ee11a0 Phase 15: Box BenchMeta separation + ExternalGuard debug + investigation report
- Implement Box BenchMeta pattern in bench_random_mixed.c (BENCH_META_CALLOC/FREE)
- Add enhanced debug logging to external_guard_box.h (caller tracking, FG classification)
- Document investigation in PHASE15_BUG_ANALYSIS.md

Issue: Page-aligned MIDCAND pointer not in SuperSlab registry → ExternalGuard → crash
Hypothesis: May be pre-existing SuperSlab bug (not Phase 15-specific)
Next: Test in Phase 14-C to verify
2025-11-15 23:00:21 +09:00

# Phase 15 Bug Analysis - ExternalGuard Crash Investigation
**Date**: 2025-11-15
**Status**: ROOT CAUSE IDENTIFIED
## Summary
ExternalGuard is being called with a page-aligned pointer (`0x7fd8f8202000`) that:
- is not in the SuperSlab registry (`hak_super_lookup()` returns NULL)
- is rejected by `__libc_free()` as "invalid pointer"
## Evidence
### Crash Log
```
[ExternalGuard] ptr=0x7fd8f8202000 offset_in_page=0x0 (call #1)
[ExternalGuard] >>> Use: addr2line -e <binary> 0x58b613548275
[ExternalGuard] hak_super_lookup(ptr) = (nil)
[ExternalGuard] ptr=0x7fd8f8202000 delegated to __libc_free
free(): invalid pointer
```
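The `offset_in_page=0x0` field in the log is what marks the pointer as page-aligned. A minimal sketch of that check follows, assuming a 4096-byte page; the helper names are illustrative and are not taken from `external_guard_box.h`:

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address within its page; 0 means the address is page-aligned.
 * page_size must be a power of two (4096 is an assumption, not read from the log). */
static inline uintptr_t offset_in_page(uintptr_t addr, uintptr_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & (page_size - 1);
}

/* Convenience wrapper: nonzero when addr sits exactly on a page boundary. */
static inline int is_page_aligned(uintptr_t addr, uintptr_t page_size) {
    return offset_in_page(addr, page_size) == 0;
}
```

Applied to the logged pointer `0x7fd8f8202000`, the low 12 bits are zero, which is consistent with the `offset_in_page=0x0` output.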
### Caller Identification
Using objdump analysis, caller address `0x...8275` maps to:
- **Function**: `free()` wrapper (at offset 0xb270 in the binary)
- **Source**: `free(slots)` from bench_random_mixed.c line 85
### Allocation Analysis
```c
// bench_random_mixed.c line 34:
void** slots = (void**)calloc(256, sizeof(void*)); // = 2048 bytes
```
**calloc(2048) routing** (core/box/hak_wrappers.inc.h:282-285):
```c
if (ld_safe_mode_calloc >= 2 || total > TINY_MAX_SIZE) { // TINY_MAX_SIZE = 1023
    extern void* __libc_calloc(size_t, size_t);
    return __libc_calloc(nmemb, size); // ← Delegates to libc!
}
```
**Expected**: `calloc(2048)` → `__libc_calloc()` (delegated to libc)
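The routing condition can be checked in isolation. The sketch below restates only the predicate from the excerpt, using the `TINY_MAX_SIZE = 1023` value quoted there; the function name and the overflow guard are assumptions added for illustration and are not part of `hak_wrappers.inc.h`:

```c
#include <stddef.h>
#include <stdint.h>

/* Value quoted in the excerpt above; the real definition lives in HAKMEM. */
#define SKETCH_TINY_MAX_SIZE 1023u

/* Returns 1 when calloc(nmemb, size) would be delegated to libc under the
 * quoted condition, 0 when it would stay on the HAKMEM path.
 * Sending an overflowing nmemb * size to libc is an assumption of this
 * sketch; the excerpt does not show how overflow is handled. */
static int sketch_calloc_goes_to_libc(size_t nmemb, size_t size,
                                      int ld_safe_mode_calloc) {
    if (size != 0 && nmemb > SIZE_MAX / size) return 1; /* would overflow */
    size_t total = nmemb * size;
    return ld_safe_mode_calloc >= 2 || total > SKETCH_TINY_MAX_SIZE;
}
```

For the benchmark's `calloc(256, sizeof(void*))`, the total is 2048 bytes on a 64-bit target, which exceeds 1023, so the predicate selects the libc path regardless of the safe-mode setting.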
## Root Cause Analysis
### Free Path Bug (core/box/hak_wrappers.inc.h)
**Lines 147-166**: Early classification
```c
ptr_classification_t c = classify_ptr(ptr);
if (is_hakmem_owned) {
    hak_free_at(ptr, ...); // Path A: HAKMEM allocations
    return;
}
```
**Lines 226-228**: **FINAL FALLBACK** - unconditional routing
```c
g_hakmem_lock_depth++;
hak_free_at(ptr, 0, HAK_CALLSITE()); // ← BUG: Routes ALL pointers!
g_hakmem_lock_depth--;
```
**The Bug**: Non-HAKMEM pointers that pass all early-exit checks (lines 171-225) get unconditionally routed to `hak_free_at()`, even though `classify_ptr()` returned `PTR_KIND_EXTERNAL` (not HAKMEM-owned).
### Why __libc_free() Rejects the Pointer
**Two Hypotheses**:
**Hypothesis A**: Pointer is from `__libc_calloc()` (expected), but something corrupts it before reaching `__libc_free()`
- Test: `calloc(256, 8)` returned a pointer at page offset 0x2a0 (not page-aligned)
- **Contradiction**: Crash log shows page-aligned pointer (0x...000)
- **Conclusion**: Pointer is NOT from `calloc(slots)`
**Hypothesis B**: Pointer is a HAKMEM allocation that `classify_ptr()` failed to recognize
- Pool TLS allocations CAN be page-aligned (mmap'd chunks)
- `hak_super_lookup()` returns NULL → not in Tiny registry
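- **Likely**: This is a Pool TLS allocation (Pool range 8-52KB). The 2KB `slots` request is below that range, which fits the conclusion above that the pointer is not `slots` itself.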
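The claim that mmap'd chunks can be page-aligned follows from `mmap()` itself, which returns page-aligned addresses. A small sketch that demonstrates this on Linux is below; it says nothing about how HAKMEM's Pool TLS actually carves up its chunks, and the function name is hypothetical:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps len bytes of anonymous memory and returns the mapping's offset within
 * its page (0 on success, since mmap returns page-aligned addresses),
 * or -1 if the page size query or the mapping fails. */
static long mmap_chunk_page_offset(size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    void *chunk = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (chunk == MAP_FAILED) return -1;
    long off = (long)((uintptr_t)chunk % (uintptr_t)page);
    munmap(chunk, len);
    return off;
}
```

An allocator that hands out the first byte of such a chunk produces exactly the kind of `0x...000` pointer seen in the crash log, whereas a small glibc heap block carries a chunk header and normally lands at a nonzero page offset.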
## Verification Tests
### Test 1: Pool TLS Allocation Check
```bash
# Check if 2KB allocations use Pool TLS
./test/pool_tls_allocation_test 2048
```
### Test 2: classify_ptr() Behavior
```c
void* ptr = calloc(256, sizeof(void*)); // 2048 bytes
ptr_classification_t c = classify_ptr(ptr);
printf("kind=%d (POOL_TLS=%d, EXTERNAL=%d)\n",
       c.kind, PTR_KIND_POOL_TLS, PTR_KIND_EXTERNAL);
```
## Next Steps
### Option 1: Fix free() Wrapper Logic (Recommended)
Change line 227 to check HAKMEM ownership first:
```c
// Before (BUG):
hak_free_at(ptr, 0, HAK_CALLSITE()); // Routes ALL pointers
// After (FIX):
if (is_hakmem_owned) {
    hak_free_at(ptr, 0, HAK_CALLSITE());
} else {
    extern void __libc_free(void*);
    __libc_free(ptr); // Proper fallback for libc allocations
}
```
**Problem**: `is_hakmem_owned` is out of scope at line 227 (it is declared inside the block at lines 149-159)
**Solution**: Hoist `is_hakmem_owned` to function scope, or re-classify the pointer at line 226
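The hoisting variant can be sketched with stand-in functions. Everything prefixed `sk_` below is hypothetical scaffolding so the example compiles on its own; the `early_path_enabled` flag is an assumption standing in for whatever conditions gate the early block in the real wrapper:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_KIND_HAKMEM, SK_KIND_EXTERNAL } sk_kind_t;

static char sk_arena[256];      /* pretend HAKMEM-owned memory */
static int sk_hakmem_frees = 0; /* calls routed to the HAKMEM path */
static int sk_libc_frees = 0;   /* calls routed to the libc path */

/* Stand-in for classify_ptr(): owned only if inside sk_arena. */
static sk_kind_t sk_classify(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr, lo = (uintptr_t)sk_arena;
    return (p >= lo && p < lo + sizeof sk_arena) ? SK_KIND_HAKMEM
                                                 : SK_KIND_EXTERNAL;
}

static void sk_hak_free_at(void *ptr) { (void)ptr; sk_hakmem_frees++; }
static void sk_libc_free(void *ptr)   { (void)ptr; sk_libc_frees++; }

static void sk_free_wrapper(void *ptr, int early_path_enabled) {
    if (!ptr) return;

    /* Hoisted to function scope so the final fallback can still see it. */
    int is_hakmem_owned = (sk_classify(ptr) == SK_KIND_HAKMEM);

    if (early_path_enabled && is_hakmem_owned) { /* early classification */
        sk_hak_free_at(ptr);
        return;
    }

    /* ... other early-exit checks would sit here ... */

    /* Final fallback: route by ownership instead of unconditionally. */
    if (is_hakmem_owned) {
        sk_hak_free_at(ptr);
    } else {
        sk_libc_free(ptr);
    }
}
```

Classifying once and reusing the result avoids a second lookup at the fallback, which is the main difference from the re-classify alternative.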
### Option 2: Fix classify_ptr() to Recognize Pool TLS
If pointer is actually Pool TLS but misclassified:
- Add Pool TLS registry lookup to `classify_ptr()`
- Ensure Pool allocations are properly registered
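One possible shape for such a lookup is a table of chunk address ranges. The sketch below is an assumption about how a registry could look, not a description of HAKMEM's existing Pool TLS bookkeeping. All names are hypothetical, and it is not thread-safe as written:

```c
#include <stddef.h>
#include <stdint.h>

#define SK_POOL_REG_CAP 64

typedef struct {
    uintptr_t base; /* first byte of the chunk */
    size_t len;     /* chunk length in bytes */
} sk_pool_chunk_t;

static sk_pool_chunk_t sk_pool_reg[SK_POOL_REG_CAP];
static size_t sk_pool_reg_count = 0;

/* Records the range [base, base + len). Returns 0 on success,
 * -1 if the arguments are invalid or the table is full. */
static int sk_pool_register(void *base, size_t len) {
    if (!base || len == 0 || sk_pool_reg_count == SK_POOL_REG_CAP) return -1;
    sk_pool_reg[sk_pool_reg_count].base = (uintptr_t)base;
    sk_pool_reg[sk_pool_reg_count].len = len;
    sk_pool_reg_count++;
    return 0;
}

/* Returns 1 if ptr lies inside any registered chunk, including its very
 * first byte (the page-aligned case), else 0. */
static int sk_pool_owns(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    for (size_t i = 0; i < sk_pool_reg_count; i++) {
        /* Unsigned subtraction wraps to a huge value when p < base,
         * so a single comparison covers both bounds. */
        if (p - sk_pool_reg[i].base < sk_pool_reg[i].len) return 1;
    }
    return 0;
}
```

A classifier could consult `sk_pool_owns()` before falling back to `PTR_KIND_EXTERNAL`. A production version would need per-thread tables or locking, and a faster lookup than a linear scan.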
### Option 3: Defer Phase 15 (Current)
Revert to Phase 14-C until free() wrapper logic is fixed
## User's Insight
> "Hmm, isn't the mincore segfault actually a bug discovery in itself, showing that it is being called from the wrong layer?"
**Interpretation**: ExternalGuard being called is CORRECT behavior - it's detecting that a HAKMEM pointer (Pool TLS?) is not being recognized by the classification layer!
## Conclusion
**Primary Bug**: `free()` wrapper unconditionally routes all pointers to `hak_free_at()` at line 227, regardless of HAKMEM ownership.
**Secondary Bug (suspected)**: `classify_ptr()` may fail to recognize Pool TLS allocations, causing them to be misclassified as `PTR_KIND_EXTERNAL`.
**Recommendation**: Apply Option 1 (fix the `free()` wrapper logic) first, then investigate Pool TLS classification if the issue persists.