hakmem/docs/analysis/SANITIZER_INVESTIGATION_REPORT.md
Moe Charm (CI) 67fb15f35f Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization)
## Changes

### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks

### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs

### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE:
  - Node pool exhaustion warning (line 252)
  - SP_META_CAPACITY_ERROR warning (line 421)
  - SP_FIX_GEOMETRY debug logging (line 745)
  - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
  - SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
  - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
  - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
  - SP_ACQUIRE_STAGE3 debug logging (line 1116)
  - SP_SLOT_RELEASE debug logging (line 1245)
  - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
  - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized

## Performance Validation

Before: 51M ops/s (with debug fprintf overhead)
After:  49.1M ops/s (consistent performance, fprintf removed from hot paths)

## Build & Test

```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00

HAKMEM Sanitizer Investigation Report

Date: 2025-11-07
Status: Root cause identified
Severity: Critical (immediate SEGV on startup)


Executive Summary

HAKMEM crashes on startup when built with AddressSanitizer (ASan) or ThreadSanitizer (TSan) and the HAKMEM allocator enabled (the -alloc build variants). The root cause is that ASan/TSan initialization calls malloc() before TLS (Thread-Local Storage) is fully initialized, which causes a SEGV when __thread variables are accessed.

Key Finding: ASan's dlsym() call during library initialization triggers HAKMEM's malloc() wrapper, which attempts to access g_hakmem_lock_depth (TLS variable) before TLS is ready.


1. TLS Variables - Complete Inventory

1.1 Core TLS Variables (Recursion Guard)

File: core/hakmem.c:188

__thread int g_hakmem_lock_depth = 0;  // Recursion guard (NOT static!)

First Access: core/box/hak_wrappers.inc.h:42 (in malloc() wrapper)

void* malloc(size_t size) {
    if (__builtin_expect(g_initializing != 0, 0)) {  // ← Line 42
        extern void* __libc_malloc(size_t);
        return __libc_malloc(size);
    }
    // ... later: g_hakmem_lock_depth++; (line 86)
}

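Problem: The check on line 42 reads g_initializing, an ordinary global, so it is safe on its own. The backtrace in section 2.2, however, places the fault at line 40, the function entry, before this check runs. This suggests that TLS is touched implicitly, for example by compiler-generated prologue code that prepares for the TLS accesses later in the function.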

1.2 Other TLS Variables

Wrapper Statistics (hak_wrappers.inc.h:32-36)

__thread uint64_t g_malloc_total_calls = 0;
__thread uint64_t g_malloc_tiny_size_match = 0;
__thread uint64_t g_malloc_fast_path_tried = 0;
__thread uint64_t g_malloc_fast_path_null = 0;
__thread uint64_t g_malloc_slow_path = 0;

Tiny Allocator TLS (hakmem_tiny.c)

__thread int g_tls_live_ss[TINY_NUM_CLASSES] = {0};      // Line 658
__thread void* g_tls_sll_head[TINY_NUM_CLASSES] = {0};   // Line 1019
__thread uint32_t g_tls_sll_count[TINY_NUM_CLASSES] = {0}; // Line 1020
__thread uint8_t* g_tls_bcur[TINY_NUM_CLASSES] = {0};    // Line 1187
__thread uint8_t* g_tls_bend[TINY_NUM_CLASSES] = {0};    // Line 1188

Fast Cache TLS (tiny_fastcache.h:32-54, extern declarations)

extern __thread void* g_tiny_fast_cache[TINY_FAST_CLASS_COUNT];
extern __thread uint32_t g_tiny_fast_count[TINY_FAST_CLASS_COUNT];
// ... 10+ more TLS variables

Other Subsystems TLS

  • SFC Cache: hakmem_tiny_sfc.c:18-19 (2 TLS variables)
  • Sticky Cache: tiny_sticky.c:6-8 (3 TLS arrays)
  • Simple Cache: hakmem_tiny_simple.c:23,26 (2 TLS variables)
  • Magazine: hakmem_tiny_magazine.c:29,37 (2 TLS variables)
  • Mid-Range MT: hakmem_mid_mt.c:37 (1 TLS array)
  • Pool TLS: core/box/pool_tls_types.inc.h:11 (1 TLS array)

Total TLS Variables: 50+ across the codebase


2. dlsym / syscall Initialization Flow

2.1 Intended Initialization Order

File: core/box/hak_core_init.inc.h:29-35

static void hak_init_impl(void) {
    g_initializing = 1;

    // Phase 6.X P0 FIX (2025-10-24): Initialize Box 3 (Syscall Layer) FIRST!
    // This MUST be called before ANY allocation (Tiny/Mid/Large/Learner)
    // dlsym() initializes function pointers to real libc (bypasses LD_PRELOAD)
    hkm_syscall_init();  // ← Line 35
    // ...
}

File: core/hakmem_syscall.c:41-64

void hkm_syscall_init(void) {
    if (g_syscall_initialized) return;  // Idempotent

    // dlsym with RTLD_NEXT: Get NEXT symbol in library chain
    real_malloc = dlsym(RTLD_NEXT, "malloc");   // ← Line 49
    real_calloc = dlsym(RTLD_NEXT, "calloc");
    real_free = dlsym(RTLD_NEXT, "free");
    real_realloc = dlsym(RTLD_NEXT, "realloc");

    if (!real_malloc || !real_calloc || !real_free || !real_realloc) {
        fprintf(stderr, "[hakmem_syscall] FATAL: dlsym failed\n");
        abort();
    }

    g_syscall_initialized = 1;
}

2.2 Actual Execution Order (ASan Build)

GDB Backtrace:

#0  malloc (size=69) at core/box/hak_wrappers.inc.h:40
#1  0x00007ffff7fc7cca in malloc (size=69) at ../include/rtld-malloc.h:56
#2  __GI__dl_exception_create_format (...) at ./elf/dl-exception.c:157
#3  0x00007ffff7fcf3dc in _dl_lookup_symbol_x (undef_name="__isoc99_printf", ...)
#4  0x00007ffff65759c4 in do_sym (..., name="__isoc99_printf", ...) at ./elf/dl-sym.c:146
#5  _dl_sym (handle=<optimized out>, name="__isoc99_printf", ...) at ./elf/dl-sym.c:195
#12 0x00007ffff74e3859 in __interception::GetFuncAddr (name="__isoc99_printf") at interception_linux.cpp:42
#13 __interception::InterceptFunction (name="__isoc99_printf", ...) at interception_linux.cpp:61
#14 0x00007ffff74a1deb in InitializeCommonInterceptors () at sanitizer_common_interceptors.inc:10094
#15 __asan::InitializeAsanInterceptors () at asan_interceptors.cpp:634
#16 0x00007ffff74c063b in __asan::AsanInitInternal () at asan_rtl.cpp:452
#17 0x00007ffff7fc95be in _dl_init (main_map=0x7ffff7ffe2e0, ...) at ./elf/dl-init.c:102
#18 0x00007ffff7fe32ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2

Timeline:

  1. Dynamic linker (ld-linux.so) initializes
  2. ASan runtime initializes (__asan::AsanInitInternal)
  3. ASan intercepts printf family functions
  4. dlsym("__isoc99_printf") calls malloc() internally (glibc rtld-malloc.h:56)
  5. HAKMEM's malloc() wrapper is invoked before hak_init() runs
  6. TLS access SEGV (TLS segment not yet initialized)

2.3 Why HAKMEM_FORCE_LIBC_ALLOC_BUILD Doesn't Help

Current Makefile (line 810-811):

SAN_ASAN_ALLOC_CFLAGS = -O1 -g -fno-omit-frame-pointer -fno-lto \
  -fsanitize=address,undefined -fno-sanitize-recover=all -fstack-protector-strong
# NOTE: Missing -DHAKMEM_FORCE_LIBC_ALLOC_BUILD=1

Expected Behavior (with flag):

#ifdef HAKMEM_FORCE_LIBC_ALLOC_BUILD
void* malloc(size_t size) {
    extern void* __libc_malloc(size_t);
    return __libc_malloc(size);  // Bypass HAKMEM completely
}
#endif

However: Even with HAKMEM_FORCE_LIBC_ALLOC_BUILD=1, the symbol malloc would still be exported, and ASan might still interpose on it. The real fix requires:

  1. Not exporting malloc at all when Sanitizers are active, OR
  2. Using constructor priorities to guarantee TLS initialization before ASan

3. Static Constructor Execution Order

3.1 Current Constructors

File: core/hakmem.c:66

__attribute__((constructor)) static void hakmem_ctor_install_segv(void) {
    const char* dbg = getenv("HAKMEM_DEBUG_SEGV");
    // ... install SIGSEGV handler
}

File: core/tiny_debug_ring.c:204

__attribute__((constructor))
static void hak_debug_ring_ctor(void) {
    // ...
}

File: core/hakmem_tiny_stats.c:66

__attribute__((constructor))
static void hak_tiny_stats_ctor(void) {
    // ...
}

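Problem: None of these constructors specifies a priority. Constructors without a priority run after all prioritized constructors in the same link unit (effectively priority 65535), so they run after most library constructors.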

ASan Constructor Priority: Typically 1 or 100 (very early)

3.2 Constructor Priority Ranges

  • 0-100: Reserved for the implementation (libc, libstdc++, sanitizers); GCC warns when user code requests a priority in this range
  • 101-999: Early initialization (critical infrastructure)
  • 1000-9999: Normal initialization
  • 65535 (default): Late initialization
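The ordering these ranges rely on can be checked with a small experiment. The sketch below is not HAKMEM code; the names `sketch_record`, `sketch_ctor_*`, and `sketch_ctor_order_at` are hypothetical. It assumes GCC or Clang on an ELF target and only demonstrates ordering inside a single executable or shared object. Ordering between different shared objects is decided by the dynamic linker's dependency order, not by these numbers.

```c
#include <stddef.h>

/* Zero-initialized statics are valid before any constructor runs. */
static int    g_ctor_order[3];
static size_t g_ctor_count = 0;

/* Append a tag so the execution order can be inspected later. */
static void sketch_record(int tag) {
    if (g_ctor_count < 3) g_ctor_order[g_ctor_count++] = tag;
}

/* Deliberately declared in reverse: priority decides, not source order. */
__attribute__((constructor))
static void sketch_ctor_default(void) { sketch_record(65535); }

__attribute__((constructor(1000)))
static void sketch_ctor_normal(void)  { sketch_record(1000); }

__attribute__((constructor(101)))
static void sketch_ctor_early(void)   { sketch_record(101); }

/* Returns the tag of the i-th constructor that ran, or -1 if out of range. */
int sketch_ctor_order_at(size_t i) {
    return i < g_ctor_count ? g_ctor_order[i] : -1;
}
```

Calling `sketch_ctor_order_at(0)` from `main()` reports which constructor ran first. Priority 101 is the lowest value user code can request without a compiler warning, which is why the fix options below use it.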

4. Sanitizer Conflict Points

4.1 Symbol Interposition Chain

Without Sanitizer:

Application → malloc() → HAKMEM wrapper → hak_alloc_at()

With ASan (Direct Link):

Application → ASan malloc() → HAKMEM malloc() → TLS access → SEGV
                    ↓
            (during ASan init, TLS not ready!)

Expected (with FORCE_LIBC):

Application → ASan malloc() → __libc_malloc() ✓

LD_PRELOAD (libhakmem_asan.so):

Application → LD_PRELOAD (HAKMEM malloc) → ASan malloc → ...
  • Even worse: HAKMEM wrapper runs before ASan init!

Direct Link (larson_hakmem_asan_alloc):

Application → main() → ...
          ↓
    (ASan init via constructor) → dlsym malloc → HAKMEM malloc → SEGV

4.3 TLS Initialization Timing

Normal Execution:

  1. ELF loader initializes TLS templates
  2. __tls_get_addr() sets up TLS for main thread
  3. Constructors run (can safely access TLS)
  4. main() starts

ASan Execution:

  1. ELF loader initializes TLS templates
  2. ASan constructor runs before application constructors
  3. ASan's dlsym() calls malloc()
  4. HAKMEM malloc accesses TLS → SEGV (TLS not fully initialized!)

Why TLS Fails:

  • ASan's early constructor (priority 1-100) runs during _dl_init()
  • TLS segment may be allocated but not yet associated with the current thread
  • Accessing __thread variable triggers __tls_get_addr() → NULL dereference

5. Existing Workarounds / Comments

5.1 Recursion Guard Design

File: core/hakmem.c:175-192

// Phase 6.15 P1: Remove global lock; keep recursion guard only
// ---------------------------------------------------------------------------
// We no longer serialize all allocations with a single global mutex.
// Instead, each submodule is responsible for its own finegrained locking.
// We keep a perthread recursion guard so that internal use of malloc/free
// within the allocator routes to libc (avoids infinite recursion).
//
// Phase 6.X P0 FIX (2025-10-24): Reverted to simple g_hakmem_lock_depth check
// Box Theory - Layer 1 (API Layer):
//   This guard protects against LD_PRELOAD recursion (Box 1 → Box 1)
//   Box 2 (Core) → Box 3 (Syscall) uses hkm_libc_malloc() (dlsym, no guard needed!)
// NOTE: Removed 'static' to allow access from hakmem_tiny_superslab.c (fopen fix)
__thread int g_hakmem_lock_depth = 0;  // 0 = outermost call

Comment Analysis:

  • Designed for runtime recursion, not initialization-time TLS issues
  • Assumes TLS is already available when malloc() is called
  • dlsym guard mentioned, but not for initialization safety
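To make the guard's intended scope concrete, here is a minimal sketch of a per-thread recursion guard. It is an illustration under stated assumptions, not the HAKMEM implementation: `sketch_guarded_alloc`, `sketch_inner_alloc`, and the two counters are hypothetical, and plain `malloc()` stands in for `__libc_malloc()`. The guard itself lives in TLS, so it only works once TLS is usable, which is exactly the assumption the comments above make.

```c
#include <stddef.h>
#include <stdlib.h>

static __thread int g_sketch_depth = 0;   /* per-thread nesting depth */
static int g_sketch_inner_calls    = 0;   /* times the allocator core ran */
static int g_sketch_fallback_calls = 0;   /* times a nested call was diverted */

static void *sketch_inner_alloc(size_t size);

/* Public entry point: nested calls on the same thread go to the fallback. */
void *sketch_guarded_alloc(size_t size) {
    if (g_sketch_depth > 0) {             /* re-entered from inside the allocator */
        g_sketch_fallback_calls++;
        return malloc(size);              /* stand-in for __libc_malloc() */
    }
    g_sketch_depth++;
    void *p = sketch_inner_alloc(size);
    g_sketch_depth--;
    return p;
}

/* Simulated allocator core that allocates metadata via the public entry. */
static void *sketch_inner_alloc(size_t size) {
    g_sketch_inner_calls++;
    void *meta = sketch_guarded_alloc(16); /* nested call: takes the fallback */
    free(meta);
    return malloc(size);
}
```

One outer call to `sketch_guarded_alloc()` runs the core once and diverts the nested metadata request once, so the core never recurses into itself.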

5.2 Sanitizer Build Flags (Makefile)

Line 799-801 (ASan with FORCE_LIBC):

SAN_ASAN_CFLAGS = -O1 -g -fno-omit-frame-pointer -fno-lto \
  -fsanitize=address,undefined -fno-sanitize-recover=all -fstack-protector-strong \
  -DHAKMEM_FORCE_LIBC_ALLOC_BUILD=1  # ← Bypasses HAKMEM allocator

Line 810-811 (ASan with HAKMEM allocator):

SAN_ASAN_ALLOC_CFLAGS = -O1 -g -fno-omit-frame-pointer -fno-lto \
  -fsanitize=address,undefined -fno-sanitize-recover=all -fstack-protector-strong
# NOTE: Missing -DHAKMEM_FORCE_LIBC_ALLOC_BUILD=1 ← INTENDED for testing!

Design Intent: Allow ASan to instrument HAKMEM's allocator for memory safety testing.

Current Reality: Broken due to TLS initialization order.


6.1 Option A: Constructor Priority (Quick Fix)

Difficulty: Easy
Risk: Low
Effectiveness: High (80% confidence)

Implementation:

File: core/hakmem.c

// PRIORITY 101: Run after ASan (priority ~100), but before default (65535)
__attribute__((constructor(101))) static void hakmem_tls_preinit(void) {
    // Force TLS allocation by touching the variable
    g_hakmem_lock_depth = 0;
    
    // Optional: Pre-initialize dlsym cache
    hkm_syscall_init();
}

// Keep existing constructor for SEGV handler (no priority = runs later)
__attribute__((constructor)) static void hakmem_ctor_install_segv(void) {
    // ... existing code
}

Rationale:

  • Ensures TLS is touched after ASan init but before any malloc calls
  • Forces __tls_get_addr() to run in a safe context
  • Minimal code change

Verification:

make clean
# Add constructor(101) to hakmem.c
make asan-larson-alloc
./larson_hakmem_asan_alloc 1 1 128 1024 1 12345 1
# Should run without SEGV

6.2 Option B: Lazy TLS Initialization (Defensive)

Difficulty: Medium
Risk: Medium (performance impact)
Effectiveness: High (90% confidence)

Implementation:

File: core/box/hak_wrappers.inc.h:40-50

void* malloc(size_t size) {
    // NEW: Check if TLS is initialized using a helper
    if (__builtin_expect(!hak_tls_is_ready(), 0)) {
        extern void* __libc_malloc(size_t);
        return __libc_malloc(size);
    }

    // Existing code...
    if (__builtin_expect(g_initializing != 0, 0)) {
        extern void* __libc_malloc(size_t);
        return __libc_malloc(size);
    }
    // ...
}

New Helper Function:

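// core/hakmem.c
static __thread int g_tls_ready_flag = 0;

// Runs early (priority 101) on the thread that executes constructors.
// NOTE: the flag is per-thread, so threads created later start at 0
// and must set it themselves before the check below can succeed.
__attribute__((constructor(101)))
static void hak_tls_mark_ready(void) {
    g_tls_ready_flag = 1;
}

int hak_tls_is_ready(void) {
    // Relaxed atomic load (not volatile) so each call performs a real read
    return __atomic_load_n(&g_tls_ready_flag, __ATOMIC_RELAXED);
}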

Pros:

  • Safe even if constructor priorities fail
  • Explicit TLS readiness check
  • Falls back to libc if TLS not ready

Cons:

  • Extra branch on malloc hot path (1-2 cycles)
  • Requires touching another TLS variable (g_tls_ready_flag)
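One way to avoid the second drawback is to keep the readiness flag in an ordinary process-wide variable, so the check never touches TLS. The following is a hedged sketch of that variant, not part of the report's proposal: every `sketch_*` name and both counters are hypothetical, plain `malloc()` stands in for both `__libc_malloc()` and the HAKMEM path, and it assumes C11 atomics with GCC or Clang.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Process-wide (not __thread): reading it never goes through TLS. */
static atomic_int g_sketch_ready = 0;
static int g_sketch_early_calls  = 0;  /* calls served before readiness */
static int g_sketch_normal_calls = 0;  /* calls served after readiness */

/* Exposed so the early path can be exercised in a test. */
void sketch_set_ready(int ready) {
    atomic_store_explicit(&g_sketch_ready, ready, memory_order_release);
}

/* Priority 101: the earliest priority user code may request. */
__attribute__((constructor(101)))
static void sketch_mark_ready(void) {
    sketch_set_ready(1);
}

int sketch_is_ready(void) {
    return atomic_load_explicit(&g_sketch_ready, memory_order_acquire);
}

void *sketch_malloc(size_t size) {
    if (__builtin_expect(!sketch_is_ready(), 0)) {
        g_sketch_early_calls++;
        return malloc(size);           /* stand-in for __libc_malloc() */
    }
    g_sketch_normal_calls++;
    return malloc(size);               /* stand-in for the HAKMEM path */
}
```

The trade-off is that a global flag only records that startup has passed the constructor. It says nothing about the TLS state of threads created afterwards, so it addresses the startup window described in this report and nothing more.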

6.3 Option C: Weak Symbol Aliasing (Advanced)

Difficulty: Hard
Risk: High (portability, build system complexity)
Effectiveness: Medium (70% confidence)

Implementation:

File: core/box/hak_wrappers.inc.h

// Weak alias: Allow ASan to override if needed
__attribute__((weak))
void* malloc(size_t size) {
    // ... HAKMEM implementation
}

// Strong symbol for internal use
void* hak_malloc_internal(size_t size) {
    // ... same implementation
}

Pros:

  • Allows ASan to fully control malloc symbol
  • HAKMEM can still use internal allocation

Cons:

  • Complex build interactions
  • May not work with all linker configurations
  • Debugging becomes harder (symbol resolution issues)

6.4 Option D: Disable Wrappers for Sanitizer Builds (Pragmatic)

Difficulty: Easy
Risk: Low
Effectiveness: 100% (but limited scope)

Implementation:

File: Makefile:810-811

# OLD (broken):
SAN_ASAN_ALLOC_CFLAGS = -O1 -g -fno-omit-frame-pointer -fno-lto \
  -fsanitize=address,undefined -fno-sanitize-recover=all -fstack-protector-strong

# NEW (fixed):
SAN_ASAN_ALLOC_CFLAGS = -O1 -g -fno-omit-frame-pointer -fno-lto \
  -fsanitize=address,undefined -fno-sanitize-recover=all -fstack-protector-strong \
  -DHAKMEM_FORCE_LIBC_ALLOC_BUILD=1  # ← Bypass HAKMEM allocator

Rationale:

  • Sanitizer builds should focus on application logic bugs, not allocator bugs
  • HAKMEM allocator can be tested separately without Sanitizers
  • Eliminates all TLS/constructor issues

Pros:

  • Immediate fix (1-line change)
  • Zero risk
  • Sanitizers work as intended

Cons:

  • Cannot test HAKMEM allocator with Sanitizers
  • Defeats purpose of -alloc variants

Recommended Naming:

# Current (misleading):
larson_hakmem_asan_alloc  # Implies HAKMEM allocator is used

# Better naming:
larson_hakmem_asan_libc   # Clarifies libc malloc is used
larson_hakmem_asan_nalloc # "no allocator" (HAKMEM disabled)

Phase 1: Immediate Fix (1 day)

  1. Add -DHAKMEM_FORCE_LIBC_ALLOC_BUILD=1 to SAN_*_ALLOC_CFLAGS (Makefile:810, 823)
  2. Rename binaries for clarity:
    • larson_hakmem_asan_alloc → larson_hakmem_asan_libc
    • larson_hakmem_tsan_alloc → larson_hakmem_tsan_libc
  3. Verify all Sanitizer builds work correctly

Phase 2: Constructor Priority Fix (2-3 days)

  1. Add __attribute__((constructor(101))) to hakmem_tls_preinit()
  2. Test with ASan/TSan/UBSan (allocator enabled)
  3. Document constructor priority ranges in ARCHITECTURE.md

Phase 3: Defensive TLS Check (1 week, optional)

  1. Implement hak_tls_is_ready() helper
  2. Add early exit in malloc() wrapper
  3. Benchmark performance impact (should be < 1%)

Phase 4: Documentation (ongoing)

  1. Update CLAUDE.md with Sanitizer findings
  2. Add "Sanitizer Compatibility" section to README
  3. Document TLS variable inventory

8. Testing Matrix

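| Build Type        | Allocator | Sanitizer  | Expected Result | Actual Result |
|-------------------|-----------|------------|-----------------|---------------|
| asan-larson       | libc      | ASan+UBSan | Pass            | Pass          |
| tsan-larson       | libc      | TSan       | Pass            | Pass          |
| asan-larson-alloc | HAKMEM    | ASan+UBSan | Pass            | SEGV (TLS)    |
| tsan-larson-alloc | HAKMEM    | TSan       | Pass            | SEGV (TLS)    |
| asan-shared-alloc | HAKMEM    | ASan+UBSan | Pass            | SEGV (TLS)    |
| tsan-shared-alloc | HAKMEM    | TSan       | Pass            | SEGV (TLS)    |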

Target: All builds pass after Phase 1 (libc) + Phase 2 (constructor priority)


9. References

  • core/hakmem.c:188 - TLS recursion guard
  • core/box/hak_wrappers.inc.h:40 - malloc wrapper entry point
  • core/box/hak_core_init.inc.h:29 - Initialization flow
  • core/hakmem_syscall.c:41 - dlsym initialization
  • Makefile:799-824 - Sanitizer build flags



10. Conclusion

The HAKMEM Sanitizer crash is a classic initialization order problem exacerbated by ASan's aggressive use of malloc() during dlsym() resolution. The immediate fix is trivial (enable HAKMEM_FORCE_LIBC_ALLOC_BUILD), but enabling Sanitizer instrumentation of HAKMEM itself requires careful constructor priority management.

Recommended Path: Implement Phase 1 (immediate) + Phase 2 (robust) for full Sanitizer support with allocator instrumentation enabled.


Report Author: Claude Code (Sonnet 4.5)
Investigation Date: 2025-11-07
Last Updated: 2025-11-07