# Phase 15 Bug Analysis - ExternalGuard Crash Investigation
**Date**: 2025-11-15
**Status**: ROOT CAUSE IDENTIFIED
## Summary
ExternalGuard is being called with a page-aligned pointer (`0x7fd8f8202000`) that fails both ownership checks:
- `hak_super_lookup()` returns NULL for it (not in the registry)
- `__libc_free()` rejects it as "invalid pointer"
## Evidence
### Crash Log
```
[ExternalGuard] ptr=0x7fd8f8202000 offset_in_page=0x0 (call #1)
[ExternalGuard] >>> Use: addr2line -e <binary> 0x58b613548275
[ExternalGuard] hak_super_lookup(ptr) = (nil)
[ExternalGuard] ptr=0x7fd8f8202000 delegated to __libc_free
free(): invalid pointer
```
### Caller Identification
Using objdump analysis, caller address `0x...8275` maps to:
- **Function**: `free()` wrapper (offset 0xb270 in the binary)
- **Source**: `free(slots)` from bench_random_mixed.c line 85
### Allocation Analysis
```c
// bench_random_mixed.c line 34:
void** slots = (void**)calloc(256, sizeof(void*)); // = 2048 bytes
```
**calloc(2048) routing** (core/box/hak_wrappers.inc.h:282-285):
```c
if (ld_safe_mode_calloc >= 2 || total > TINY_MAX_SIZE) { // TINY_MAX_SIZE = 1023
    extern void* __libc_calloc(size_t, size_t);
    return __libc_calloc(nmemb, size); // ← Delegates to libc!
}
```
**Expected**: `calloc(2048)` → `__libc_calloc()` (delegated to libc)
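The size-based half of that routing decision can be sketched in isolation. This is a minimal illustration, not the wrapper's actual code: `calloc_route_t` and `route_calloc()` are hypothetical names, the `ld_safe_mode_calloc` condition is left out, and the overflow check is an assumption about how `total` would be computed safely.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the TINY_MAX_SIZE = 1023 value quoted above. */
#define TINY_MAX_SIZE 1023

/* Hypothetical names, used only for this illustration. */
typedef enum { ROUTE_TINY, ROUTE_LIBC } calloc_route_t;

/* Decide where a calloc(nmemb, size) request would go, by size alone.
 * The ld_safe_mode_calloc condition of the real wrapper is omitted. */
static calloc_route_t route_calloc(size_t nmemb, size_t size) {
    /* Treat a multiplication overflow as "too large" and defer to libc. */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        return ROUTE_LIBC;
    }
    size_t total = nmemb * size;
    return (total > TINY_MAX_SIZE) ? ROUTE_LIBC : ROUTE_TINY;
}
```

Under these assumptions, `route_calloc(256, sizeof(void*))` yields `ROUTE_LIBC`, which matches the expectation that the `slots` array is a libc allocation.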
## Root Cause Analysis
### Free Path Bug (core/box/hak_wrappers.inc.h)
**Lines 147-166**: Early classification
```c
ptr_classification_t c = classify_ptr(ptr);
// ... is_hakmem_owned is set inside this block (lines 149-159)
if (is_hakmem_owned) {
    hak_free_at(ptr, ...); // Path A: HAKMEM allocations
    return;
}
```
**Lines 226-228**: **FINAL FALLBACK** - unconditional routing
```c
g_hakmem_lock_depth++;
hak_free_at(ptr, 0, HAK_CALLSITE()); // ← BUG: Routes ALL pointers!
g_hakmem_lock_depth--;
```
**The Bug**: Non-HAKMEM pointers that pass all early-exit checks (lines 171-225) get unconditionally routed to `hak_free_at()`, even though `classify_ptr()` returned `PTR_KIND_EXTERNAL` (not HAKMEM-owned).
### Why __libc_free() Rejects the Pointer
**Two Hypotheses**:
**Hypothesis A**: Pointer is from `__libc_calloc()` (expected), but something corrupts it before reaching `__libc_free()`
- Test: `calloc(256, 8)` returned a pointer at page offset 0x2a0 (not page-aligned)
- **Contradiction**: Crash log shows page-aligned pointer (0x...000)
- **Conclusion**: Pointer is NOT from `calloc(slots)`
**Hypothesis B**: Pointer is a HAKMEM allocation that `classify_ptr()` failed to recognize
- Pool TLS allocations CAN be page-aligned (mmap'd chunks)
- `hak_super_lookup()` returns NULL → not in Tiny registry
- **Likely**: This is a Pool TLS allocation. Note that 2 KB lies below the 8-52 KB Pool range, so the size class that produced this pointer still needs to be confirmed.
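The page-alignment argument used against Hypothesis A comes down to masking the low bits of the address. A minimal sketch, assuming 4 KiB pages (the log only shows `offset_in_page=0x0`, so the page size is an assumption) and using hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_ASSUMED ((uintptr_t)4096)

/* Low bits of the address, i.e. its position inside the page. */
static uintptr_t offset_in_page(uintptr_t addr) {
    return addr & (PAGE_SIZE_ASSUMED - 1);
}

/* True when the address sits exactly on a page boundary. */
static bool is_page_aligned(uintptr_t addr) {
    return offset_in_page(addr) == 0;
}
```

The crashing pointer `0x7fd8f8202000` is page-aligned under this check, while any pointer ending in `0x2a0`, like the one the `calloc(256, 8)` test returned, is not.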
## Verification Tests
### Test 1: Pool TLS Allocation Check
```bash
# Check if 2KB allocations use Pool TLS
./test/pool_tls_allocation_test 2048
```
### Test 2: classify_ptr() Behavior
```c
void* ptr = calloc(256, sizeof(void*)); // 2048 bytes
ptr_classification_t c = classify_ptr(ptr);
printf("kind=%d (POOL_TLS=%d, EXTERNAL=%d)\n",
       c.kind, PTR_KIND_POOL_TLS, PTR_KIND_EXTERNAL);
```
## Next Steps
### Option 1: Fix free() Wrapper Logic (Recommended)
Change line 227 to check HAKMEM ownership first:
```c
// Before (BUG):
hak_free_at(ptr, 0, HAK_CALLSITE()); // Routes ALL pointers
// After (FIX):
if (is_hakmem_owned) {
    hak_free_at(ptr, 0, HAK_CALLSITE());
} else {
    extern void __libc_free(void*);
    __libc_free(ptr); // Proper fallback for libc allocations
}
```
**Problem**: `is_hakmem_owned` is out of scope at line 227 (it is declared inside the block at lines 149-159)
**Solution**: Hoist `is_hakmem_owned` to function scope or re-classify at line 226
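The hoisting approach can be illustrated with a toy dispatcher. Everything here is a hypothetical stand-in: `toy_free()`, `toy_is_hakmem_owned()`, the `early_path_enabled` flag and the two static buffers exist only for this sketch, and the function reports which backend would receive the pointer rather than freeing anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in HAKMEM. */
typedef enum { SENT_NOWHERE, SENT_TO_HAKMEM, SENT_TO_LIBC } free_target_t;

static char g_toy_hakmem_region[64];  /* pretend HAKMEM owns this range */
static char g_toy_external[8];        /* pretend libc owns this object  */

/* Toy ownership test: is ptr inside the pretend HAKMEM region? */
static bool toy_is_hakmem_owned(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)g_toy_hakmem_region;
    return p >= lo && p - lo < sizeof g_toy_hakmem_region;
}

/* Returns which backend the pointer would be handed to. */
static free_target_t toy_free(const void *ptr, bool early_path_enabled) {
    if (ptr == NULL) {
        return SENT_NOWHERE;
    }
    /* Hoisted to function scope so the final fallback can still read it. */
    bool is_hakmem_owned = toy_is_hakmem_owned(ptr);

    if (early_path_enabled && is_hakmem_owned) {
        return SENT_TO_HAKMEM;            /* Path A: early classification */
    }

    /* ... the other early-exit checks would sit here ... */

    /* Final fallback: consult ownership instead of routing unconditionally. */
    return is_hakmem_owned ? SENT_TO_HAKMEM : SENT_TO_LIBC;
}
```

With the flag at function scope, an external pointer such as `g_toy_external` reaches the libc branch whether or not the early path fired. Re-classifying at the fallback would give the same result at the cost of a second lookup.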
### Option 2: Fix classify_ptr() to Recognize Pool TLS
If pointer is actually Pool TLS but misclassified:
- Add Pool TLS registry lookup to `classify_ptr()`
- Ensure Pool allocations are properly registered
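A registry lookup of the kind Option 2 describes could look roughly like the following. This is a sketch under stated assumptions, not HAKMEM's Pool TLS design: the fixed-size table, `toy_pool_register()` and `toy_pool_owns()` are all hypothetical, and the code is not thread-safe, so a real version would need per-thread tables or synchronization.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CHUNKS 16  /* arbitrary capacity for the illustration */

/* One registered chunk: [base, base + len). */
typedef struct {
    uintptr_t base;
    size_t    len;
} toy_chunk_t;

static toy_chunk_t g_toy_chunks[TOY_MAX_CHUNKS];
static size_t      g_toy_chunk_count = 0;

/* Record a chunk when it is mapped; fails if the table is full. */
static bool toy_pool_register(uintptr_t base, size_t len) {
    if (g_toy_chunk_count == TOY_MAX_CHUNKS || len == 0) {
        return false;
    }
    g_toy_chunks[g_toy_chunk_count].base = base;
    g_toy_chunks[g_toy_chunk_count].len  = len;
    g_toy_chunk_count++;
    return true;
}

/* Linear scan: does addr fall inside any registered chunk? */
static bool toy_pool_owns(uintptr_t addr) {
    for (size_t i = 0; i < g_toy_chunk_count; i++) {
        /* addr - base < len avoids overflow in base + len. */
        if (addr >= g_toy_chunks[i].base &&
            addr - g_toy_chunks[i].base < g_toy_chunks[i].len) {
            return true;
        }
    }
    return false;
}
```

A classifier that consulted such a table before falling back to "external" would recognize a page-aligned pointer at the very start of a registered chunk, which is the case the crash log suggests.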
### Option 3: Defer Phase 15 (Current)
Revert to Phase 14-C until free() wrapper logic is fixed
## User's Insight
> "Wait, isn't the mincore segfault actually a bug discovery, showing that it is being called from a different layer?"
**Interpretation**: ExternalGuard being called is CORRECT behavior - it's detecting that a HAKMEM pointer (Pool TLS?) is not being recognized by the classification layer!
## Conclusion
**Primary Bug**: `free()` wrapper unconditionally routes all pointers to `hak_free_at()` at line 227, regardless of HAKMEM ownership.
**Secondary Bug (suspected)**: `classify_ptr()` may fail to recognize Pool TLS allocations, causing them to be misclassified as `PTR_KIND_EXTERNAL`.
**Recommendation**: Apply Option 1 (fix the `free()` wrapper logic) first, then investigate Pool TLS classification if the issue persists.