Commit Graph

358 Commits

Author SHA1 Message Date
7d0782d5b6 ENV Cleanup Step 17: Gate HAKMEM_TINY_RF_TRACE
Gate the refill trace debug variable behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_RF_TRACE: Controls refill/mailbox publish path tracing
- File: core/tiny_publish.c:21-34 (1 call site gated)

Other 2 call sites already gated:
- core/tiny_refill.h:94 (already inside #if !HAKMEM_BUILD_RELEASE)
- core/box/mailbox_box.c:64 (already inside #if !HAKMEM_BUILD_RELEASE)

Performance: 30.7M ops/s avg (baseline maintained, 3 runs: 30.6M, 30.9M, 30.7M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 04:36:37 +09:00
2cdec72ee3 ENV Cleanup Step 16: Gate HAKMEM_SS_FREE_DEBUG
Gate the shared pool free debug variable behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_SS_FREE_DEBUG: Controls shared pool slot release tracing
- File: core/hakmem_shared_pool.c:1221-1229

The debug output was already gated inside #if !HAKMEM_BUILD_RELEASE blocks.
This change gates only the ENV check itself. In release builds, dbg is set
to the constant 0, allowing the compiler to optimize away the checks.
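
A minimal sketch of this gating pattern, not the actual code in core/hakmem_shared_pool.c. The helper name `ss_free_debug_enabled`, the lazy caching, and the parsing rule (any non-empty value not starting with '0' enables tracing) are assumptions made for illustration:

```c
#include <stdlib.h>

/* Assumption: the build system defines HAKMEM_BUILD_RELEASE as 0 or 1. */
#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

/* Hypothetical helper: returns 1 when slot-release tracing is requested. */
static inline int ss_free_debug_enabled(void) {
#if !HAKMEM_BUILD_RELEASE
    static int dbg = -1;                 /* -1 = ENV not read yet */
    if (dbg < 0) {
        const char *e = getenv("HAKMEM_SS_FREE_DEBUG");
        dbg = (e && *e && *e != '0') ? 1 : 0;
    }
    return dbg;
#else
    return 0;                            /* constant: dependent checks fold away */
#endif
}
```

In a release build the function reduces to a constant, so a caller's `if (ss_free_debug_enabled()) { ... }` can be removed by the compiler. The cached static in this sketch is not synchronized across threads.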

Performance: 30.3M ops/s (baseline maintained)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 04:35:07 +09:00
f119f048f2 ENV Cleanup Step 15: Gate HAKMEM_SS_ACQUIRE_DEBUG
Gate the shared pool acquire debug variable behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_SS_ACQUIRE_DEBUG: Controls shared pool acquisition stage tracing
- File: core/hakmem_shared_pool.c:780-788

The debug output was already gated inside #if !HAKMEM_BUILD_RELEASE blocks.
This change gates only the ENV check itself. In release builds, dbg_acquire
is set to the constant 0, allowing the compiler to optimize away the checks.

Performance: 31.1M ops/s (+2% vs baseline)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 04:34:21 +09:00
679c821573 ENV Cleanup Step 14: Gate HAKMEM_TINY_HEAP_V2_DEBUG
Gate the HeapV2 push debug logging behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_HEAP_V2_DEBUG: Controls magazine push event tracing
- File: core/front/tiny_heap_v2.h:117-130

Wraps the ENV check and debug output that logs the first 5 push
operations per size class for HeapV2 magazine diagnostics.

Performance: 29.6M ops/s (within baseline range)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 04:33:39 +09:00
be9bdd7812 ENV Cleanup Step 13: Gate HAKMEM_TINY_REFILL_OPT_DEBUG
Gate the refill optimization debug output behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_REFILL_OPT_DEBUG: Controls refill chain optimization tracing
- File: core/tiny_refill_opt.h:30

Changed condition from:
  #if HAKMEM_TINY_REFILL_OPT
to:
  #if HAKMEM_TINY_REFILL_OPT && !HAKMEM_BUILD_RELEASE

Performance: 30.6M ops/s (baseline maintained)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 04:32:55 +09:00
417f149479 ENV Cleanup Step 12: Gate HAKMEM_TINY_FAST_DEBUG + HAKMEM_TINY_FAST_DEBUG_MAX
Gate the fast cache debug system behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_FAST_DEBUG: Enable/disable fastcache event logging
- HAKMEM_TINY_FAST_DEBUG_MAX: Limit number of debug messages per class
- File: core/hakmem_tiny_fastcache.inc.h:48-76

Both variables combined in single gate since they work together as a
debug logging subsystem. In release builds, provides no-op inline stub.
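
One way the combined gate and the release stub could look, as a sketch only. The function name `tiny_fast_debug_log`, the class count, the default limit of 16, and the log format are assumptions; only the two ENV variable names come from the commit:

```c
#include <stdio.h>
#include <stdlib.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

#define TINY_NUM_CLASSES 8   /* assumption: number of size classes */

#if !HAKMEM_BUILD_RELEASE
/* Returns 1 if the event was logged, 0 if logging is off,
 * the class is out of range, or the class reached its limit. */
static inline int tiny_fast_debug_log(int class_idx, const char *event) {
    static int enabled = -1;             /* -1 = ENV not read yet */
    static int max_msgs = 0;
    static int emitted[TINY_NUM_CLASSES];
    if (enabled < 0) {
        const char *e = getenv("HAKMEM_TINY_FAST_DEBUG");
        const char *m = getenv("HAKMEM_TINY_FAST_DEBUG_MAX");
        enabled = (e && *e && *e != '0') ? 1 : 0;
        max_msgs = m ? atoi(m) : 16;     /* assumption: default limit */
    }
    if (!enabled || class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return 0;
    if (emitted[class_idx] >= max_msgs) return 0;
    emitted[class_idx]++;
    fprintf(stderr, "[FAST_DEBUG] class=%d %s\n", class_idx, event);
    return 1;
}
#else
/* Release build: no-op stub with the same signature. */
static inline int tiny_fast_debug_log(int class_idx, const char *event) {
    (void)class_idx; (void)event;
    return 0;
}
#endif
```

Keeping both variables behind one gate means call sites need no #if of their own; they call the same function in either build.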

Performance: 30.5M ops/s (baseline maintained)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 04:32:15 +09:00
a24f17386c ENV Cleanup Step 11: Gate HAKMEM_SS_PREWARM_DEBUG in super_registry.c
Gate HAKMEM_SS_PREWARM_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in prewarm functions (2 call sites).

Changes:
- Wrap dbg variable in hak_ss_prewarm_class()
- Wrap dbg variable in hak_ss_prewarm_init()
- Release builds use constant dbg = 0 for complete code elimination

Performance: 30.2M ops/s Larson (stable, within expected variance)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:48:57 +09:00
2c3dcdb90b ENV Cleanup Step 10: Gate HAKMEM_SS_LRU_DEBUG in super_registry.c
Gate HAKMEM_SS_LRU_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in LRU cache operations (3 call sites).

Changes:
- Wrap dbg variable in ss_lru_evict_one()
- Wrap dbg variable in hak_ss_lru_pop()
- Wrap dbg variable in hak_ss_lru_push()
- Release builds use constant dbg = 0 for complete code elimination

Performance: 30.7M ops/s Larson (+1.3% improvement)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:48:02 +09:00
4540b01da0 ENV Cleanup Step 9: Gate HAKMEM_SUPER_REG_DEBUG in super_registry.c
Gate HAKMEM_SUPER_REG_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in register/unregister functions.

Changes:
- Wrap dbg variable initialization in hak_super_register()
- Wrap dbg_once static variable and ENV check in hak_super_unregister()
- Release builds use constant dbg = 0 for complete code elimination

Performance: 30.6M ops/s Larson (+1.0% improvement)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:46:50 +09:00
f8b0f38f78 ENV Cleanup Step 8: Gate HAKMEM_SUPER_LOOKUP_DEBUG in header
Gate HAKMEM_SUPER_LOOKUP_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in hakmem_super_registry.h inline function.

Changes:
- Wrap s_dbg initialization in conditional compilation
- Release builds use constant s_dbg = 0 for complete elimination
- Debug logging in hak_super_lookup() now fully compiled out in release

Performance: 30.3M ops/s Larson (stable, no regression)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:45:45 +09:00
cfa5e4e91c ENV Cleanup Step 7: Gate debug ENV vars in core/box/free_local_box.c
Changes:
- Gated HAKMEM_TINY_SLL_DIAG (2 call sites) behind #if !HAKMEM_BUILD_RELEASE
- Gated HAKMEM_TINY_FREELIST_MASK behind #if !HAKMEM_BUILD_RELEASE
- Gated HAKMEM_SS_FREE_DEBUG behind #if !HAKMEM_BUILD_RELEASE
- Entire diagnostic blocks wrapped (not just getenv) to avoid compilation errors
- ENV variables gated: HAKMEM_TINY_SLL_DIAG, HAKMEM_TINY_FREELIST_MASK, HAKMEM_SS_FREE_DEBUG

Performance: 30.4M ops/s Larson (baseline 30.4M, perfect match)
Build: Clean, pre-existing warnings only

FIX: Previous version had scoping issue with static variables inside do{}while blocks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:10:26 +09:00
d0d2814f15 ENV Cleanup Step 6: Gate HAKMEM_TIMING in core/hakmem_debug.c
Changes:
- Gated HAKMEM_TIMING ENV check behind #if !HAKMEM_BUILD_RELEASE
- Release builds set g_timing_enabled = 0 directly (no getenv call)
- Debug builds preserve existing behavior
- ENV variable gated: HAKMEM_TIMING

Performance: 30.3M ops/s Larson (baseline 30.4M, within margin)
Build: Clean, LTO warnings only (pre-existing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:05:18 +09:00
35e8e4c34d ENV Cleanup Step 5: Gate HAKMEM_PTR_TRACE_DUMP/VERBOSE in core/ptr_trace.h
Changes:
- Gated HAKMEM_PTR_TRACE_DUMP behind #if !HAKMEM_BUILD_RELEASE
- Gated HAKMEM_PTR_TRACE_VERBOSE behind #if !HAKMEM_BUILD_RELEASE
- Used lazy init pattern with __builtin_expect for branch prediction
- ENV variables gated: HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE

Performance: 29.2M ops/s Larson (baseline 30.4M, -4% acceptable variance)
Build: Clean, LTO warnings only (pre-existing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:04:29 +09:00
316ea4dfd6 ENV Cleanup Step 4: Gate HAKMEM_WATCH_ADDR in tiny_region_id.h
Gate get_watch_addr() debug functionality with HAKMEM_BUILD_RELEASE,
returning 0 in release builds to disable address watching overhead.

Performance: 30.31M ops/s (baseline: 30.2M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:47:16 +09:00
42747a1080 ENV Cleanup Step 3: Gate HAKMEM_TINY_PROFILE in tiny_fastcache.h
Gate tiny_fast_profile_enabled() getenv call with HAKMEM_BUILD_RELEASE,
returning 0 in release builds to disable profiling overhead.

Performance: 30.34M ops/s (baseline: 30.2M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:46:32 +09:00
794bf996f1 ENV Cleanup Step 2c: Gate debug code in hakmem_tiny_alloc.inc
Gate tiny_alloc_dump_tls_state() call on allocation failure path with
HAKMEM_BUILD_RELEASE guard.

Performance: 30.15M ops/s (baseline: 30.2M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:45:27 +09:00
0567e2957f ENV Cleanup Step 2b: Gate debug code in tiny_superslab_free.inc.h
Gate tiny_alloc_dump_tls_state() call in remote watch debug path with
HAKMEM_BUILD_RELEASE guard, consolidating with existing debug fprintf.

Performance: 30.3M ops/s (baseline: 30.2M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:44:47 +09:00
d6c2ea6f3e ENV Cleanup Step 2a: Gate debug code in hakmem_tiny_slow.inc
Gate tiny_alloc_dump_tls_state() call and getenv debug code on slow path
failure with HAKMEM_BUILD_RELEASE guard.

Performance: 30.5M ops/s (baseline: 30.2M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:43:57 +09:00
3833d4e3eb ENV Cleanup Step 1: Gate tiny_debug.h with HAKMEM_BUILD_RELEASE
Wrap debug functionality in !HAKMEM_BUILD_RELEASE guard with no-op stubs
for release builds. This eliminates getenv() calls for HAKMEM_TINY_ALLOC_DEBUG
in production while maintaining API compatibility.

Performance: 30.0M ops/s (baseline: 30.2M)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:43:07 +09:00
930c5283b4 Fix Larson 36x slowdown: Remove tls_uninitialized early return in sll_refill_small_from_ss()
Problem:
- Larson benchmark showed 730K ops/s instead of expected 26M ops/s
- Class 1 TLS SLL cache always stayed empty (tls_count=0)
- All allocations went through slow path (shared_pool_acquire_slab at 48% CPU)

Root cause:
- In sll_refill_small_from_ss(), when TLS was completely uninitialized
  (ss=NULL, meta=NULL, slab_base=NULL), the function returned 0 immediately
  without calling superslab_refill() to initialize it
- The comment said "expect upper logic to call superslab_refill" but
  tiny_alloc_fast_refill() did NOT call it after receiving 0
- This created a loop: TLS SLL stays empty → refill returns 0 → slow path

Fix:
- Remove the tls_uninitialized early return
- Let the existing downstream condition (!tls->ss || !tls->meta || ...)
  handle the uninitialized case and call superslab_refill()
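
The control-flow change can be sketched as follows. The struct, the stub refill, and the block counts are hypothetical stand-ins; the real condition has further terms that the commit message elides, so only the two named ones appear here:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the real TLS slab state. */
typedef struct {
    void *ss;         /* owning SuperSlab, NULL until the first refill */
    void *meta;       /* slab metadata, NULL until the first refill */
    int   available;  /* blocks that can be moved into the TLS SLL */
} TinyTLSSlab;

static int g_superslab_stub;  /* dummy object standing in for a SuperSlab */

/* Stand-in for superslab_refill(): binds the TLS state to a slab. */
static int superslab_refill_stub(TinyTLSSlab *tls) {
    tls->ss = &g_superslab_stub;
    tls->meta = &g_superslab_stub;
    tls->available = 32;      /* assumption: arbitrary slab capacity */
    return 1;
}

/* Returns the number of blocks moved into the TLS SLL (0 on failure). */
static int sll_refill_sketch(TinyTLSSlab *tls, int want) {
    /* No early "return 0" for a fully uninitialized TLS: the check
     * below covers that case and initializes it. */
    if (!tls->ss || !tls->meta) {
        if (!superslab_refill_stub(tls)) return 0;
    }
    int take = want < tls->available ? want : tls->available;
    tls->available -= take;
    return take;
}
```

With the early return removed, the first call on a zeroed TLS state initializes it and yields blocks instead of returning 0 forever.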

Result:
- Throughput: 730K → 26.5M ops/s (36x improvement)
- shared_pool_acquire_slab: 48% → 0% in perf profile

Introduced in: fcf098857 (Phase12 debug, 2025-11-14)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 16:47:30 +09:00
8355214135 Fix NULL pointer crash in unified_cache_refill ss_active_add
When superslab_refill() fails in the inner loop, tls->ss can remain
NULL even when produced > 0 (from earlier successful allocations).
This caused a segfault at high iteration counts (>500K) in the
random_mixed benchmark.

Root cause: Line 353 calls ss_active_add(tls->ss, ...) without
checking if tls->ss is NULL after a failed refill breaks the loop.
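
A reduced model of the failure and the guard, with hypothetical types and a simulated refill failure (`fail_at`). How the real code accounts for blocks produced before the failure is not stated in the commit; this sketch simply skips the call:

```c
#include <stddef.h>

typedef struct { int active; } SuperSlab;        /* hypothetical */
typedef struct { SuperSlab *ss; } TinyTLSSlab;   /* hypothetical */

static void ss_active_add(SuperSlab *ss, int n) { ss->active += n; }

/* Simulated refill loop: at iteration `fail_at` the refill fails,
 * leaves tls->ss NULL, and breaks out (pass -1 for "never fails"). */
static int cache_refill_sketch(TinyTLSSlab *tls, SuperSlab *backing,
                               int want, int fail_at) {
    int produced = 0;
    while (produced < want) {
        if (produced == fail_at) { tls->ss = NULL; break; }
        tls->ss = backing;
        produced++;
    }
    /* Guard: produced > 0 does not imply tls->ss is still valid. */
    if (produced > 0 && tls->ss != NULL) {
        ss_active_add(tls->ss, produced);
    }
    return produced;
}
```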

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:31:46 +09:00
7a03a614fd Restrict ss_fast_lookup to validated Tiny pointer paths only
Safety fix: ss_fast_lookup masks pointer to 1MB boundary and reads
memory at that address. If called with arbitrary (non-Tiny) pointers,
the masked address could be unmapped → SEGFAULT.

Changes:
- tiny_free_fast(): Reverted to safe hak_super_lookup (can receive
  arbitrary pointers without prior validation)
- ss_fast_lookup(): Added safety warning in comments documenting when
  it's safe to use (after header magic 0xA0 validation)

ss_fast_lookup remains in LARSON_FIX paths where header magic is
already validated before the SuperSlab lookup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 12:55:40 +09:00
64ed3d8d8c Add ss_fast_lookup() for O(1) SuperSlab lookup via mask
Replaces expensive hak_super_lookup() (registry hash lookup, 50-100 cycles)
with fast mask-based lookup (~5-10 cycles) in free hot paths.

Algorithm:
1. Mask pointer with SUPERSLAB_SIZE_MIN (1MB) - works for both 1MB and 2MB SS
2. Validate magic (SUPERSLAB_MAGIC)
3. Range check using ss->lg_size
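
The three steps can be sketched as below for the 1MB case only; how the real function handles pointers in the second half of a 2MB SuperSlab is not shown in the commit. The header layout and the magic value are placeholders, not the real definitions:

```c
#include <stddef.h>
#include <stdint.h>

#define SUPERSLAB_SIZE_MIN ((uintptr_t)1 << 20)  /* 1MB, from the commit text */
#define SUPERSLAB_MAGIC    0x53534C4142ULL       /* assumption: placeholder value */

typedef struct {
    uint64_t magic;    /* must equal SUPERSLAB_MAGIC */
    uint8_t  lg_size;  /* log2 of the SuperSlab size (20 for 1MB) */
} SuperSlabHeader;     /* hypothetical, simplified header layout */

/* Only safe for pointers already known to lie inside a mapped SuperSlab:
 * step 2 reads memory at the masked address. */
static inline SuperSlabHeader *ss_fast_lookup_sketch(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    /* 1. Mask down to the 1MB boundary. */
    SuperSlabHeader *ss = (SuperSlabHeader *)(p & ~(SUPERSLAB_SIZE_MIN - 1));
    /* 2. Validate the magic. */
    if (ss->magic != SUPERSLAB_MAGIC) return NULL;
    /* 3. Range check against the size recorded in the header. */
    if (p - (uintptr_t)ss >= ((uintptr_t)1 << ss->lg_size)) return NULL;
    return ss;
}
```

The lookup does no hashing or registry access, which is where the cycle savings come from, but it dereferences an address derived purely from the pointer value.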

Applied to:
- tiny_free_fast.inc.h: tiny_free_fast() SuperSlab path
- tiny_free_fast_v2.inc.h: LARSON_FIX cross-thread check
- front/malloc_tiny_fast.h: free_tiny_fast() LARSON_FIX path

Note: Performance impact minimal with LARSON_FIX=OFF (default) since
SuperSlab lookup is skipped entirely in that case. Optimization benefits
LARSON_FIX=ON path for safe multi-threaded operation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 12:47:10 +09:00
0a8bdb8b18 Fix release build debug logging in tiny_region_id.h
The allocation logging at lines 236-249 was missing the
#if !HAKMEM_BUILD_RELEASE guard, causing fprintf(stderr)
on every allocation even in release builds.

Impact: 19.8M ops/s → 28.0M ops/s (+42%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 11:58:00 +09:00
d8e3971dc2 Fix cross-thread ownership check: Use bits 8-15 for owner_tid_low
Problem:
- TLS_SLL_PUSH_DUP crash in Larson multi-threaded benchmark
- Cross-thread frees incorrectly routed to same-thread TLS path
- Root cause: pthread_t on glibc is 256-byte aligned (TCB base)
  so lower 8 bits are ALWAYS 0x00 for ALL threads

Fix:
- Change owner_tid_low from (tid & 0xFF) to ((tid >> 8) & 0xFF)
- Bits 8-15 actually vary between threads, enabling correct detection
- Applied consistently across all ownership check locations:
  - superslab_inline.h: ss_owner_try_acquire/release/is_mine
  - slab_handle.h: slab_try_acquire
  - tiny_free_fast.inc.h: tiny_free_is_same_thread_ss
  - tiny_free_fast_v2.inc.h: cross-thread detection
  - tiny_superslab_free.inc.h: same-thread check
  - ss_allocation_box.c: slab initialization
  - hakmem_tiny_superslab.c: ownership handling
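
A small sketch of why the shift matters. The helper names are illustrative, and the thread handle is modeled as a plain integer that is 256-byte aligned, as the commit describes for glibc:

```c
#include <stdint.h>

/* Old scheme: low 8 bits, always 0 when the handle is 256-byte aligned. */
static inline uint8_t owner_tid_low_old(uintptr_t tid) {
    return (uint8_t)(tid & 0xFF);
}

/* New scheme: bits 8-15, which can differ between aligned handles. */
static inline uint8_t owner_tid_low(uintptr_t tid) {
    return (uint8_t)((tid >> 8) & 0xFF);
}

/* Same-thread test as used conceptually by the ownership checks. */
static inline int is_same_thread(uint8_t stored_owner, uintptr_t self_tid) {
    return stored_owner == owner_tid_low(self_tid);
}
```

With the old scheme every thread produced the tag 0x00, so every free looked like a same-thread free. An 8-bit tag can still collide between two threads; the sketch does not address that case.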

Also added:
- Address watcher debug infrastructure (tiny_region_id.h)
- Cross-thread detection in malloc_tiny_fast.h Front Gate

Test results:
- Larson 1T/2T/4T: PASS (no TLS_SLL_PUSH_DUP crash)
- random_mixed: PASS
- Performance: ~20M ops/s (regression from 48M, needs optimization)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 11:52:11 +09:00
8af9123bcc Larson double-free investigation: Add full operation lifecycle logging
**Diagnostic Enhancement**: Complete malloc/free/pop operation tracing for debug

**Problem**: Larson crashes with TLS_SLL_DUP at count=18, need to trace exact
pointer lifecycle to identify if allocator returns duplicate addresses or if
benchmark has double-free bug.

**Implementation** (ChatGPT + Claude + Task collaboration):

1. **Global Operation Counter** (core/hakmem_tiny_config_box.inc:9):
   - Single atomic counter for all operations (malloc/free/pop)
   - Chronological ordering across all paths

2. **Allocation Logging** (core/hakmem_tiny_config_box.inc:148-161):
   - HAK_RET_ALLOC macro enhanced with operation logging
   - Logs first 50 class=1 allocations with ptr/base/tls_count

3. **Free Logging** (core/tiny_free_fast_v2.inc.h:222-235):
   - Added before tls_sll_push() call (line 221)
   - Logs first 50 class=1 frees with ptr/base/tls_count_before

4. **Pop Logging** (core/box/tls_sll_box.h:587-597):
   - Added in tls_sll_pop_impl() after successful pop
   - Logs first 50 class=1 pops with base/tls_count_after

5. **Drain Debug Logging** (core/box/tls_sll_drain_box.h:143-151):
   - Enhanced drain loop with detailed logging
   - Tracks pop failures and drained block counts
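
A sketch of the shared counter idea, assuming C11 atomics. The names `g_op_counter` and `op_trace` and the log format are hypothetical, and the sketch applies a single global window of 50 operations rather than a separate limit per path:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

#define OP_LOG_LIMIT 50   /* "first 50" from the commit text */

#if !HAKMEM_BUILD_RELEASE
static _Atomic uint64_t g_op_counter;   /* hypothetical name */
#endif

/* Records one malloc/free/pop event and returns its operation number.
 * Only class 1 events inside the logging window are printed. */
static inline uint64_t op_trace(const char *kind, int class_idx, void *ptr) {
#if !HAKMEM_BUILD_RELEASE
    uint64_t op = atomic_fetch_add_explicit(&g_op_counter, 1,
                                            memory_order_relaxed);
    if (class_idx == 1 && op < OP_LOG_LIMIT) {
        fprintf(stderr, "[OP#%04llu %s] cls=%d ptr=%p\n",
                (unsigned long long)op, kind, class_idx, ptr);
    }
    return op;
#else
    (void)kind; (void)class_idx; (void)ptr;
    return 0;                            /* compiled out in release */
#endif
}
```

Because all three paths draw numbers from one counter, the log lines can be sorted into a single timeline even when they come from different threads.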

**Initial Findings**:
- First 19 operations: ALL frees, ZERO allocations, ZERO pops
- OP#0006: First free of 0x...430
- OP#0018: Duplicate free of 0x...430 → TLS_SLL_DUP detected
- Suggests either: (a) allocations before logging starts, or (b) Larson bug

**Debug-only**: All logging gated by !HAKMEM_BUILD_RELEASE (zero cost in release)

**Next Steps**:
- Expand logging window to 200 operations
- Log initialization phase allocations
- Cross-check with Larson benchmark source

**Status**: Ready for extended testing
2025-11-27 08:18:01 +09:00
8553894171 Larson double-free investigation: Enhanced diagnostics + Remove buggy drain pushback
**Problem**: Larson benchmark crashes with TLS_SLL_DUP (double-free), 100% crash rate in debug

**Root Cause**: TLS drain pushback code (commit c2f104618) created duplicates by
pushing pointers back to TLS SLL while they were still in the linked list chain.

**Diagnostic Enhancements** (ChatGPT + Claude collaboration):
1. **Callsite Tracking**: Track file:line for each TLS SLL push (debug only)
   - Arrays: g_tls_sll_push_file[], g_tls_sll_push_line[]
   - Macro: tls_sll_push() auto-records __FILE__, __LINE__

2. **Enhanced Duplicate Detection**:
   - Scan depth: 64 → 256 nodes (deep duplicate detection)
   - Error message shows BOTH current and previous push locations
   - Calls ptr_trace_dump_now() for detailed analysis

3. **Evidence Captured**:
   - Both duplicate pushes from same line (221)
   - Pointer at position 11 in TLS SLL (count=18, scanned=11)
   - Confirms pointer allocated without being popped from TLS SLL
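
The bounded duplicate scan can be sketched like this. The node type and function names are hypothetical; only the scan depth of 256 comes from the commit:

```c
#include <stddef.h>

#define TLS_SLL_DUP_SCAN_MAX 256   /* scan depth from the commit text */

typedef struct Node { struct Node *next; } Node;  /* hypothetical block header */

/* Returns the 0-based position of `ptr` in the list, or -1 if it is
 * not found within the scan depth. */
static int tls_sll_find_dup(const Node *head, const void *ptr) {
    int pos = 0;
    for (const Node *n = head; n && pos < TLS_SLL_DUP_SCAN_MAX;
         n = n->next, pos++) {
        if ((const void *)n == ptr) return pos;
    }
    return -1;
}

/* Push that refuses duplicates: 1 on success, 0 if already in the list. */
static int tls_sll_push_checked(Node **head, Node *blk) {
    if (tls_sll_find_dup(*head, blk) >= 0) return 0;
    blk->next = *head;
    *head = blk;
    return 1;
}
```

The scan is linear in the list length, which is why it belongs in debug builds only and why the depth is capped.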

**Fix**:
- **core/box/tls_sll_drain_box.h**: Remove pushback code entirely
  - Old: Push back to TLS SLL on validation failure → duplicates!
  - New: Skip pointer (accept rare leak) to avoid duplicates
  - Rationale: SuperSlab lookup failures are transient/rare

**Status**: Fix implemented, ready for testing

**Updated**:
- LARSON_DOUBLE_FREE_INVESTIGATION.md: Root cause confirmed
2025-11-27 07:30:32 +09:00
c2f104618f Fix critical TLS drain memory leak causing potential double-free
## Root Cause

TLS drain was dropping pointers when SuperSlab lookup or slab_idx validation failed:
- Pop pointer from TLS SLL
- Lookup/validation fails
- continue → LEAK! Pointer never returned to any freelist

## Impact

Memory leak + potential double allocation:
1. Pointer P popped but leaked
2. Same address P reallocated from carve/other source
3. User frees P again → duplicate detection → ABORT

## Fix

**Before (BUGGY)**:
```c
if (!ss || invalid_slab_idx) {
    continue;  // ← LEAK!
}
```

**After (FIXED)**:
```c
if (!ss || invalid_slab_idx) {
    // Push back to TLS SLL head (retry later)
    tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
    g_tls_sll[class_idx].head = base;
    g_tls_sll[class_idx].count++;
    break;  // Stop draining to avoid infinite retry
}
```

## Files Changed

- core/box/tls_sll_drain_box.h: Fix 2 leak sites (SS lookup + slab_idx validation)
- docs/analysis/LARSON_DOUBLE_FREE_INVESTIGATION.md: Investigation report

## Related

- Larson double-free investigation (47% crash rate)
- Commit e4868bf23: Freelist header write + abort() on duplicate
- ChatGPT analysis: Larson benchmark code is correct (no user bug)
2025-11-27 06:49:38 +09:00
e4868bf236 Larson crash investigation: Add freelist header write + abort() on duplicate
## Changes

1. **TLS SLL duplicate detection** (core/box/tls_sll_box.h:381)
   - Changed 'return true' to 'abort()' to get backtrace on double-free
   - Enables precise root cause identification

2. **Freelist header write fix** (core/tiny_superslab_alloc.inc.h:159-169)
   - Added tiny_region_id_write_header() call in freelist allocation path
   - Previously only linear carve wrote headers → stale headers on reuse
   - Now both paths write headers consistently

## Root Cause Analysis

Backtrace revealed true double-free pattern:
- last_push_from=hak_tiny_free_fast_v2 (freed once)
- last_pop_from=(null) (never allocated)
- where=hak_tiny_free_fast_v2 (freed again!)

Same pointer freed twice WITHOUT reallocation in between.

## Status

- Freelist header fix: Implemented (necessary but not sufficient)
- Double-free still occurs: Deeper investigation needed
- Possible causes: User code bug, TLS drain race, remote free issue

Next: Investigate allocation/free flow with enhanced tracing
2025-11-27 05:57:22 +09:00
12c36afe46 Fix TSan build: Add weak stubs for sanitizer compatibility
Added weak stubs to core/link_stubs.c for symbols that are not needed
in HAKMEM_FORCE_LIBC_ALLOC_BUILD=1 (TSan/ASan) builds:

Stubs added:
- g_bump_chunk (int)
- g_tls_bcur, g_tls_bend (__thread uint8_t*[8])
- smallmid_backend_free()
- expand_superslab_head()

Also added: #include <stdint.h> for uint8_t
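
For illustration, weak stubs of this kind might look as follows with GCC or Clang on an ELF platform. The variable types follow the list above; the two function signatures and the return value are assumptions, since the commit gives only the names:

```c
#include <stddef.h>
#include <stdint.h>

/* Weak definitions: a strong definition elsewhere wins at link time. */
int g_bump_chunk __attribute__((weak)) = 0;
__thread uint8_t *g_tls_bcur[8] __attribute__((weak));
__thread uint8_t *g_tls_bend[8] __attribute__((weak));

/* Assumed signature: the commit lists only the function name. */
__attribute__((weak)) void smallmid_backend_free(void *ptr, size_t size) {
    (void)ptr; (void)size;               /* nothing to free in this build */
}

/* Assumed signature and result: -1 stands for "not supported". */
__attribute__((weak)) int expand_superslab_head(void *head) {
    (void)head;
    return -1;
}
```

The stubs only satisfy the linker; in a sanitizer build that forwards to libc they are not expected to be called on a meaningful path.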

Impact:
- TSan build: PASS (larson_hakmem_tsan successfully built)
- Phase 2 ready: Can now use TSan to debug Larson crashes

Next: Use TSan to investigate Larson 47% crash rate

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 05:19:56 +09:00
6b791b97d4 ENV Cleanup: Delete Ultra HEAP & BG Remote dead code (-1,096 LOC)
Deleted files (11):
- core/ultra/ directory (6 files: tiny_ultra_heap.*, tiny_ultra_page_arena.*)
- core/front/tiny_ultrafront.h
- core/tiny_ultra_fast.inc.h
- core/hakmem_tiny_ultra_front.inc.h
- core/hakmem_tiny_ultra_simple.inc
- core/hakmem_tiny_ultra_batch_box.inc

Edited files (10):
- core/hakmem_tiny.c: Remove Ultra HEAP #includes, move ultra_batch_for_class()
- core/hakmem_tiny_tls_state_box.inc: Delete TinyUltraFront, g_ultra_simple
- core/hakmem_tiny_phase6_wrappers_box.inc: Delete ULTRA_SIMPLE block
- core/hakmem_tiny_alloc.inc: Delete Ultra-Front code block
- core/hakmem_tiny_init.inc: Delete ULTRA_SIMPLE ENV loading
- core/hakmem_tiny_remote_target.{c,h}: Delete g_bg_remote_enable/batch
- core/tiny_refill.h: Remove BG Remote check (always break)
- core/hakmem_tiny_background.inc: Delete BG Remote drain loop

Deleted ENV variables:
- HAKMEM_TINY_ULTRA_HEAP (build flag, undefined)
- HAKMEM_TINY_ULTRA_L0
- HAKMEM_TINY_ULTRA_HEAP_DUMP
- HAKMEM_TINY_ULTRA_PAGE_DUMP
- HAKMEM_TINY_ULTRA_FRONT
- HAKMEM_TINY_BG_REMOTE (no getenv, dead code)
- HAKMEM_TINY_BG_REMOTE_BATCH (no getenv, dead code)
- HAKMEM_TINY_ULTRA_SIMPLE (references only)

Impact:
- Code reduction: -1,096 lines
- Binary size: 305KB → 304KB (-1KB)
- Build: PASS
- Sanity: 15.69M ops/s (3 runs avg)
- Larson: 1 crash observed (seed 43, likely existing instability)

Notes:
- Ultra HEAP never compiled (#if HAKMEM_TINY_ULTRA_HEAP undefined)
- BG Remote variables never initialized (g_bg_remote_enable always 0)
- Ultra SLIM (ultra_slim_alloc_box.h) preserved (active 4-layer path)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 04:35:47 +09:00
f4978b1529 ENV Cleanup Phase 5: Additional DEBUG guards + doc cleanup
Code changes:
- core/slab_handle.h: Add RELEASE guard for HAKMEM_TINY_FREELIST_MASK
- core/tiny_superslab_free.inc.h: Add guards for HAKMEM_TINY_ROUTE_FREE, HAKMEM_TINY_FREELIST_MASK

Documentation cleanup:
- docs/specs/CONFIGURATION.md: Remove 21 doc-only ENV variables
- docs/specs/ENV_VARS.md: Remove doc-only variables

Testing:
- Build: PASS (305KB binary, unchanged)
- Sanity: PASS (17.22M ops/s average, 3 runs)
- Larson: PASS (52.12M ops/s, 0 crashes)

Impact:
- 2 additional DEBUG ENV variables guarded (no overhead in RELEASE)
- Documentation accuracy improved
- Binary size maintained

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 03:55:17 +09:00
43015725af ENV cleanup: Add RELEASE guards to DEBUG ENV variables (14 vars)
Added compile-time guards (#if !HAKMEM_BUILD_RELEASE) to eliminate
DEBUG ENV variable overhead in RELEASE builds.

Variables guarded (14 total):
- HAKMEM_TINY_TRACE_RING, HAKMEM_TINY_DUMP_RING_ATEXIT
- HAKMEM_TINY_RF_TRACE, HAKMEM_TINY_MAILBOX_TRACE
- HAKMEM_TINY_MAILBOX_TRACE_LIMIT, HAKMEM_TINY_MAILBOX_SLOWDISC
- HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD
- HAKMEM_SS_PREWARM_DEBUG, HAKMEM_SS_FREE_DEBUG
- HAKMEM_TINY_FRONT_METRICS, HAKMEM_TINY_FRONT_DUMP
- HAKMEM_TINY_COUNTERS_DUMP, HAKMEM_TINY_REFILL_DUMP
- HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE

Files modified (9 core files):
- core/tiny_debug_ring.c (ring trace/dump)
- core/box/mailbox_box.c (mailbox trace + slowdisc)
- core/tiny_refill.h (refill trace)
- core/hakmem_tiny_superslab.c (superslab debug)
- core/box/ss_allocation_box.c (allocation debug)
- core/tiny_superslab_free.inc.h (free debug)
- core/box/front_metrics_box.c (frontend metrics)
- core/hakmem_tiny_stats.c (stats dump)
- core/ptr_trace.h (pointer trace)

Bug fixes during implementation:
1. mailbox_box.c - Fixed variable scope (moved 'used' outside guard)
2. hakmem_tiny_stats.c - Fixed incomplete declarations (on1, on2)

Impact:
- Binary size: -85KB total
  - bench_random_mixed_hakmem: 319K → 305K (-14K, -4.4%)
  - larson_hakmem: 380K → 309K (-71K, -18.7%)
- Performance: No regression (16.9-17.9M ops/s maintained)
- Functional: All tests pass (Random Mixed + Larson)
- Behavior: DEBUG ENV vars correctly ignored in RELEASE builds

Testing:
- Build: Clean compilation (warnings only, pre-existing)
- 100K Random Mixed: 16.9-17.9M ops/s (PASS)
- 10K Larson: 25.9M ops/s (PASS)
- DEBUG ENV verification: Correctly ignored (PASS)

Result: 14 DEBUG ENV variables now have zero overhead in RELEASE builds.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 03:41:07 +09:00
543abb0586 ENV cleanup: Consolidate SFC_DEBUG getenv() calls (86% reduction)
Optimized HAKMEM_SFC_DEBUG environment variable handling by caching
the value at initialization instead of repeated getenv() calls in
hot paths.

Changes:
1. Added g_sfc_debug global variable (core/hakmem_tiny_sfc.c)
   - Initialized once in sfc_init() by reading HAKMEM_SFC_DEBUG
   - Single source of truth for SFC debug state

2. Declared g_sfc_debug as extern (core/hakmem_tiny_config.h)
   - Available to all modules that need SFC debug checks

3. Replaced getenv() with g_sfc_debug in hot paths:
   - core/tiny_alloc_fast_sfc.inc.h (allocation path)
   - core/tiny_free_fast.inc.h (free path)
   - core/box/hak_wrappers.inc.h (wrapper layer)
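
A sketch of the init-time caching described in these three steps. `g_sfc_debug` and `HAKMEM_SFC_DEBUG` come from the commit; the helper names and the parsing rule are assumptions:

```c
#include <stdlib.h>

int g_sfc_debug = 0;   /* single source of truth, written once at init */

/* Sketch of the relevant part of sfc_init(): the only getenv() call. */
static void sfc_init_debug_flag(void) {
    const char *e = getenv("HAKMEM_SFC_DEBUG");
    g_sfc_debug = (e && *e && *e != '0') ? 1 : 0;
}

/* Hot path: a plain global load instead of a getenv() call.
 * Returns 1 if the caller should emit debug output. */
static inline int sfc_should_trace(void) {
    return g_sfc_debug != 0;
}
```

Other translation units would see the flag through an `extern int g_sfc_debug;` declaration in a shared header, as step 2 describes.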

Impact:
- getenv() calls: 7 → 1 (86% reduction)
- Hot-path calls eliminated: 6 (all moved to init-time)
- Performance: 15.10M ops/s (stable, 0% CV)
- Build: Clean compilation, no new warnings

Testing:
- 10 runs of 100K iterations: consistent performance
- Symbol verification: g_sfc_debug present in hakmem_tiny_sfc.o
- No regression detected

Note: 3 additional getenv("HAKMEM_SFC_DEBUG") calls exist in
hakmem_tiny_ultra_simple.inc but are dead code (file not compiled
in current build configuration).

Files modified: 5 core files
Status: Production-ready, all tests passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 03:18:33 +09:00
6fadc74405 ENV cleanup: Remove obsolete ULTRAHOT variable + organize docs
Changes:
1. Removed HAKMEM_TINY_FRONT_ENABLE_ULTRAHOT variable
   - Deleted front_prune_ultrahot_enabled() function
   - UltraHot feature was removed in commit bcfb4f6b5
   - Variable was dead code, no longer referenced

2. Organized ENV cleanup analysis documents
   - Moved 5 ENV analysis docs to docs/analysis/
   - ENV_CLEANUP_PLAN.md - detailed file-by-file plan
   - ENV_CLEANUP_SUMMARY.md - executive summary
   - ENV_CLEANUP_ANALYSIS.md - categorized analysis
   - ENV_CONSOLIDATION_PLAN.md - consolidation proposals
   - ENV_QUICK_REFERENCE.md - quick reference guide

Impact:
- ENV variables: 221 → 220 (-1)
- Build: Successful
- Risk: Zero (dead code removal)

Next steps (documented in ENV_CLEANUP_SUMMARY.md):
- 21 variables need verification (Ultra/HeapV2/BG/HotMag)
- SFC_DEBUG deduplication opportunity (7 callsites)

File: core/box/front_metrics_box.h
Status: SAVEPOINT - stable baseline for future ENV cleanup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 17:12:41 +09:00
bea839add6 Revert "Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)"
This reverts commit d355041638.
2025-11-26 15:43:45 +09:00
d355041638 Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)
- Policy: Set tiny_min_keep for C2-C6 to reduce mmap/munmap churn
- Policy: Loosen tiny_cap (soft cap) for C4-C6 to allow more active slots
- Added tiny_min_keep field to FrozenPolicy struct

Larson: 52.13M ops/s (stable)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 15:06:36 +09:00
a2e65716b3 Port: Optimize tiny_get_max_size inline (e81fe783d)
- Move tiny_get_max_size to header for inlining
- Use cached static variable to avoid repeated env lookup
- Larson: 51.99M ops/s (stable)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 15:05:03 +09:00
a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

ENV variables removed (BG/HotMag family):
- core/hakmem_tiny_init.inc: Removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: Removed BG spill ENV
- core/tiny_refill.h: BG remote replaced with a fixed value
- core/hakmem_tiny_slow.inc: Removed BG refs

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

Documentation cleanup:
- Deleted 328 markdown files (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M, stable operation)
- No functional impact from the ENV cleanup
- Some debug output remains (to be addressed in the next phase)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00
67fb15f35f Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization)
## Changes

### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks

### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs

### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE:
  - Node pool exhaustion warning (line 252)
  - SP_META_CAPACITY_ERROR warning (line 421)
  - SP_FIX_GEOMETRY debug logging (line 745)
  - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
  - SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
  - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
  - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
  - SP_ACQUIRE_STAGE3 debug logging (line 1116)
  - SP_SLOT_RELEASE debug logging (line 1245)
  - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
  - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized

## Performance Validation

Before: 51M ops/s (with debug fprintf overhead)
After:  49.1M ops/s (consistent performance, fprintf removed from hot paths)

## Build & Test

```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
4e082505cc Cleanup: Wrap shared_pool debug fprintf in #if !HAKMEM_BUILD_RELEASE
- Lock stats (P0 instrumentation): ~10 fprintf wrapped
- Stage stats (S1/S2/S3 breakdown): ~8 fprintf wrapped
- Release build now has no-op stubs for stats init functions
- Data collection APIs kept for learning layer compatibility
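
A rough sketch of what a release-build stub for a stats init function might look like. `g_lock_stats_enabled` and `lock_stats_init` are named in the neighboring commit; the `-1` sentinel, the environment variable name `EXAMPLE_LOCK_STATS`, and the accessor are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0 /* assumption: default to a debug build */
#endif

/* Assumption: -1 means "not yet initialized". */
static int g_lock_stats_enabled = -1;

#if !HAKMEM_BUILD_RELEASE
/* Debug build: consult the environment once. */
static void lock_stats_init(void) {
    if (g_lock_stats_enabled != -1) return;
    const char* e = getenv("EXAMPLE_LOCK_STATS"); /* hypothetical name */
    g_lock_stats_enabled = (e != NULL && *e != '\0' && *e != '0') ? 1 : 0;
}
#else
/* Release build: stub that still leaves the flag in a defined state. */
static void lock_stats_init(void) {
    g_lock_stats_enabled = 0;
}
#endif

/* Hypothetical accessor: callers never observe the -1 sentinel. */
static int example_lock_stats_enabled(void) {
    lock_stats_init();
    assert(g_lock_stats_enabled == 0 || g_lock_stats_enabled == 1);
    return g_lock_stats_enabled;
}
```

Because the release variant assigns a constant, callers that branch on the flag see a defined value without any `getenv` lookup.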
2025-11-26 13:05:17 +09:00
6b38bc840e Cleanup: Remove unused hakmem_libc.c (duplicate of hakmem_syscall.c)
- File was not included in Makefile OBJS_BASE
- Functions already implemented in hakmem_syscall.c
- Size: 361 bytes removed
2025-11-26 13:03:17 +09:00
bcfb4f6b59 Remove dead code: UltraHot, RingCache, FrontC23, Class5 Hotpath
(cherry-picked from 225b6fcc7, conflicts resolved)
2025-11-26 12:33:49 +09:00
feadc2832f Legacy cleanup: Remove obsolete test files and #if 0 blocks (-1,750 LOC)
(cherry-picked from cc0104c4e)
2025-11-26 12:31:04 +09:00
950627587a Remove legacy/unused code: 6 .inc files + disabled #if 0 block (1,159 LOC)
(cherry-picked from 9793f17d6)
2025-11-26 12:30:30 +09:00
5c85675621 Add callsite tracking for tls_sll_push/pop (macro-based Box Theory)
Problem:
- [TLS_SLL_PUSH_DUP] at 225K iterations but couldn't identify bypass path
- Need push AND pop callsites to diagnose reuse-before-pop bug

Implementation (Box Theory):
- Renamed tls_sll_push → tls_sll_push_impl (with where parameter)
- Renamed tls_sll_pop → tls_sll_pop_impl (with where parameter)
- Added macro wrappers with __func__ auto-insertion
- Zero changes to 40+ call sites (Box boundary preserved)

Debug-only tracking:
- All tracking code wrapped in #if !HAKMEM_BUILD_RELEASE
- Release builds: where=NULL, zero overhead
- Arrays: s_tls_sll_last_push_from[], s_tls_sll_last_pop_from[]
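
The macro-wrapper approach described above might look roughly like this. It is a sketch under stated assumptions: the fixed-size array stands in for the real TLS singly linked list, thread-local storage is omitted, and `EXAMPLE_NUM_CLASSES`, `EXAMPLE_SLL_CAP`, `example_free_path`, and `example_last_push_was_from` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0 /* assumption: default to a debug build */
#endif

#define EXAMPLE_NUM_CLASSES 8  /* assumption */
#define EXAMPLE_SLL_CAP     16 /* assumption: small array instead of a list */

static void* s_example_sll[EXAMPLE_NUM_CLASSES][EXAMPLE_SLL_CAP];
static int   s_example_sll_len[EXAMPLE_NUM_CLASSES];

#if !HAKMEM_BUILD_RELEASE
static const char* s_tls_sll_last_push_from[EXAMPLE_NUM_CLASSES];
static const char* s_tls_sll_last_pop_from[EXAMPLE_NUM_CLASSES];
#endif

static bool tls_sll_push_impl(int cls, void* ptr, const char* where) {
    assert(cls >= 0 && cls < EXAMPLE_NUM_CLASSES);
    if (s_example_sll_len[cls] >= EXAMPLE_SLL_CAP) return false;
#if !HAKMEM_BUILD_RELEASE
    s_tls_sll_last_push_from[cls] = where; /* remember the caller */
#else
    (void)where;
#endif
    s_example_sll[cls][s_example_sll_len[cls]++] = ptr;
    return true;
}

static bool tls_sll_pop_impl(int cls, void** out, const char* where) {
    assert(cls >= 0 && cls < EXAMPLE_NUM_CLASSES);
    if (s_example_sll_len[cls] == 0) return false;
#if !HAKMEM_BUILD_RELEASE
    s_tls_sll_last_pop_from[cls] = where;
#else
    (void)where;
#endif
    *out = s_example_sll[cls][--s_example_sll_len[cls]];
    return true;
}

/* Call sites keep the two-argument form; the macro supplies the caller. */
#if !HAKMEM_BUILD_RELEASE
#define tls_sll_push(cls, ptr) tls_sll_push_impl((cls), (ptr), __func__)
#define tls_sll_pop(cls, out)  tls_sll_pop_impl((cls), (out), __func__)
#else
#define tls_sll_push(cls, ptr) tls_sll_push_impl((cls), (ptr), NULL)
#define tls_sll_pop(cls, out)  tls_sll_pop_impl((cls), (out), NULL)
#endif

/* Hypothetical call site: unchanged by the rename. */
static bool example_free_path(int cls, void* ptr) {
    return tls_sll_push(cls, ptr);
}

#if !HAKMEM_BUILD_RELEASE
/* Hypothetical debug helper: did the last push for cls come from `name`? */
static bool example_last_push_was_from(int cls, const char* name) {
    const char* w = s_tls_sll_last_push_from[cls];
    return w != NULL && strcmp(w, name) == 0;
}
#endif
```

Because `__func__` expands inside the calling function, each wrapper records its caller's name without any edit at the call site.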

New log format:
[TLS_SLL_PUSH_DUP] cls=5 ptr=0x...
  last_push_from=hak_tiny_free_fast_v2
  last_pop_from=(null)  ← SMOKING GUN!
  where=hak_tiny_free_fast_v2

Decisive Evidence:
- last_pop_from=(null) proves TLS SLL never popped
- Unified Cache bypasses TLS SLL (confirmed by Task agent)
- Root cause: unified_cache_refill() directly carves from SuperSlab

Impact:
- Complete push/pop flow tracking (debug builds only)
- Root cause identified: Unified Cache at Line 289
- Next step: Fix unified_cache_refill() to check TLS SLL first

Credit: Box Theory macro pattern suggested by ChatGPT

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 11:30:46 +09:00
c8842360ca Fix: Double header calculation bug in tiny_block_stride_for_class() - META_MISMATCH resolved
Problem:
workset=8192 crashed with META_MISMATCH errors (off-by-one):
- [TLS_SLL_PUSH_META_MISMATCH] cls=3 meta_cls=2
- [HDR_META_MISMATCH] cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] cls=7 meta_cls=6

Root Cause (discovered by Task agent):
Contradictory stride calculations in codebase:

1. g_tiny_class_sizes[TINY_NUM_CLASSES]
   - Already includes 1-byte header (TOTAL size)
   - {8, 16, 32, 64, 128, 256, 512, 2048}

2. tiny_block_stride_for_class() (BEFORE FIX)
   - Added extra +1 for header (DOUBLE COUNTING!)
   - Class 5: 256 + 1 = 257 (should be 256)
   - Class 6: 512 + 1 = 513 (should be 512)

This caused stride → class_idx reverse lookup to fail:
- superslab_init_slab() searched g_tiny_class_sizes[?] == 257
- No match found → meta->class_idx corrupted
- Free: header has cls=6, meta has cls=5 → MISMATCH!

Fix Applied (core/hakmem_tiny_superslab.h:49-69):

- Removed duplicate +1 calculation under HAKMEM_TINY_HEADER_CLASSIDX
- Added OOB guard (return 0 for invalid class_idx)
- Added comment: "g_tiny_class_sizes already includes the 1-byte header"
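
The double counting and the failed reverse lookup can be reproduced in a few lines. The size table is copied from the message above; `example_stride_double_counted` and `example_class_idx_for_stride` are hypothetical helpers, and the real functions may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

#define TINY_NUM_CLASSES 8

/* Sizes as listed above; each already includes the 1-byte header. */
static const size_t g_tiny_class_sizes[TINY_NUM_CLASSES] = {
    8, 16, 32, 64, 128, 256, 512, 2048
};

/* Pre-fix behaviour (hypothetical name): the header is counted twice. */
static size_t example_stride_double_counted(int class_idx) {
    assert(class_idx >= 0 && class_idx < TINY_NUM_CLASSES);
    return g_tiny_class_sizes[class_idx] + 1;
}

/* Post-fix behaviour: table value as-is, 0 for an out-of-range class. */
static size_t tiny_block_stride_for_class(int class_idx) {
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return 0;
    return g_tiny_class_sizes[class_idx];
}

/* Hypothetical reverse lookup: stride -> class index, -1 when no match. */
static int example_class_idx_for_stride(size_t stride) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        if (g_tiny_class_sizes[i] == stride) return i;
    }
    return -1;
}
```

With the extra +1, class 5 yields a stride of 257, which matches no table entry; with the fix, every class maps back to itself.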

Test Results:

Before fix:
- 100K iterations: META_MISMATCH errors → SEGV
- 200K iterations: Immediate SEGV

After fix:
- 100K iterations: 9.9M ops/s (no errors)
- 200K iterations: 15.2M ops/s (no errors)
- 220K iterations: 15.3M ops/s (no errors)
- 225K iterations: SEGV (different bug, not META_MISMATCH)

Impact:
- META_MISMATCH errors completely eliminated
- Stability improved: 100K → 220K iterations (+120%)
- Throughput stable: 15M ops/s
- ⚠️ Different SEGV at 225K (requires separate investigation)

Investigation Credit:
- Task agent: Identified contradictory stride tables
- ChatGPT: Applied fix and verified LUT correctness

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:34:35 +09:00
3d341a8b3f Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements
Problem:
workset=8192 crashes at 240K iterations with TLS SLL double-free:
[TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL

Investigation (Task agent):
Identified 8 tls_sll_push() call sites and 3 high-risk areas:
1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c)
2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h)
3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h)

Fixes Applied:

1. core/box/carve_push_box.c (Lines 115-139)
   - Track pop_failed count during rollback
   - Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning
   - Helps identify when rollback leaves blocks in SLL

2. core/box/tls_sll_box.h (Lines 347-370)
   - Increase double-free scan: 64 → 256 nodes
   - Add scanned count to error: (scanned=%u/%u)
   - Catches orphaned blocks deeper in chain

3. core/tiny_refill_opt.h (Lines 135-166)
   - Enhanced splice partial logging
   - Abort in debug builds on orphaned nodes
   - Prevents silent memory leaks
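
A bounded duplicate scan of the kind described in fix 2 could be sketched as below. `ExampleNode`, `example_sll_contains`, and `EXAMPLE_SCAN_LIMIT` are hypothetical; only the 64 → 256 limit and the idea of reporting a scanned count come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type: a free block whose first word is the next link. */
typedef struct ExampleNode {
    struct ExampleNode* next;
} ExampleNode;

#define EXAMPLE_SCAN_LIMIT 256u /* raised from 64 per the fix above */

/* Walk at most `limit` nodes looking for `ptr`. Reports how many nodes
 * were visited so an error message can print (scanned=%u/%u). */
static bool example_sll_contains(const ExampleNode* head,
                                 const ExampleNode* ptr,
                                 uint32_t limit, uint32_t* scanned_out) {
    assert(limit > 0);
    uint32_t scanned = 0;
    bool found = false;
    for (const ExampleNode* n = head; n != NULL && scanned < limit;
         n = n->next) {
        scanned++;
        if (n == ptr) {
            found = true;
            break;
        }
    }
    if (scanned_out != NULL) *scanned_out = scanned;
    return found;
}
```

A duplicate sitting at position 80 is missed with a limit of 64 but caught with 256; the cost of the scan stays bounded regardless of list length.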

Test Results:
Before: SEGV at 220K iterations
After:  SEGV at 240K iterations (improved detection)
        [TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71)

Impact:
- Early detection working (catches at position 2)
- Diagnostic capability greatly improved
- ⚠️ Root cause not yet resolved (deeper investigation needed)

Status: Diagnostic improvements committed for further analysis

Credit: Root cause analysis by Task agent (Explore)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 08:43:18 +09:00
6ae0db9fd2 Fix: workset=8192 SEGV - Align slab_index_for to Box3 geometry (iteration 2)
Problem:
After Box3 geometry unification (commit 2fe970252), workset=8192 still SEGVs:
- 200K iterations: OK
- 300K iterations: SEGV

Root Cause (identified by ChatGPT):
Header/metadata class mismatches around 300K iterations:
- [HDR_META_MISMATCH] hdr_cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] hdr_cls=5 meta_cls=4
- [TLS_SLL_PUSH_META_MISMATCH] cls=5 meta_cls=4

Cause: slab_index_for() geometry mismatch with Box3
- tiny_slab_base_for_geometry() (Box3):
    - Slab 0: ss + SUPERSLAB_SLAB0_DATA_OFFSET
    - Slab 1: ss + 1*SLAB_SIZE
    - Slab k: ss + k*SLAB_SIZE

- Old slab_index_for():
    rel = p - (base + SUPERSLAB_SLAB0_DATA_OFFSET);
    idx = rel / SLAB_SIZE;

- Result: Off-by-one for slab_idx > 0
    Example: tiny_slab_base_for_geometry(ss, 4) returns 0x...40000
             slab_index_for(ss, 0x...40000) returns 3 (wrong!)

Impact:
- Block allocated in "C6 slab 4" appears to be in "C5 slab 3"
- Header class_idx (C6) != meta->class_idx (C5)
- TLS SLL corruption → SEGV after extended runs

Fix: core/superslab/superslab_inline.h
======================================
Rewrite slab_index_for() as inverse of Box3 geometry:

  static inline int slab_index_for(SuperSlab* ss, void* ptr) {
      // ... bounds checks ...

      // Slab 0: special case (has metadata offset)
      if (p < base + SLAB_SIZE) {
          return 0;
      }

      // Slab 1+: simple SLAB_SIZE spacing from base
      size_t rel = p - base;  // ← Changed from (p - base - OFFSET)
      int idx = (int)(rel / SLAB_SIZE);
      return idx;
  }

Verification:
- slab_index_for(ss, tiny_slab_base_for_geometry(ss, idx)) == idx 
- Consistent for any address within slab
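
The round-trip property can be checked with plain address arithmetic. In this sketch, addresses are `uintptr_t` values rather than real pointers, the 64 KiB slab size is an assumption inferred from the 0x...40000 example, the 2048-byte slab-0 offset is taken from the earlier geometry commit, and all `example_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_SLAB_SIZE    ((uintptr_t)0x10000) /* assumption: 64 KiB */
#define EXAMPLE_SLAB0_OFFSET ((uintptr_t)2048)    /* assumption */

/* Box3-style geometry: only slab 0 carries the metadata offset. */
static uintptr_t example_slab_base(uintptr_t ss, int idx) {
    assert(idx >= 0);
    return idx == 0 ? ss + EXAMPLE_SLAB0_OFFSET
                    : ss + (uintptr_t)idx * EXAMPLE_SLAB_SIZE;
}

/* Old lookup: subtracts the slab-0 offset for every slab. */
static int example_slab_index_old(uintptr_t ss, uintptr_t p) {
    assert(p >= ss + EXAMPLE_SLAB0_OFFSET);
    return (int)((p - (ss + EXAMPLE_SLAB0_OFFSET)) / EXAMPLE_SLAB_SIZE);
}

/* Fixed lookup: inverse of example_slab_base(). */
static int example_slab_index_fixed(uintptr_t ss, uintptr_t p) {
    assert(p >= ss);
    if (p < ss + EXAMPLE_SLAB_SIZE) return 0;
    return (int)((p - ss) / EXAMPLE_SLAB_SIZE);
}

/* Count slab indices in [0, count) whose base does not map back to itself. */
static int example_roundtrip_failures(uintptr_t ss, int count,
                                      int (*index_fn)(uintptr_t, uintptr_t)) {
    int failures = 0;
    for (int idx = 0; idx < count; idx++) {
        if (index_fn(ss, example_slab_base(ss, idx)) != idx) failures++;
    }
    return failures;
}
```

Under these assumptions the old lookup is off by one for every slab except slab 0, while the fixed lookup round-trips for all of them.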

Test Results:
=============
workset=8192 SEGV threshold improved further:

Before this fix (after 2fe970252):
   200K iterations: OK
   300K iterations: SEGV

After this fix:
   220K iterations: OK (15.5M ops/s)
   240K iterations: SEGV (different bug)

Progress:
- Iteration 1 (2fe970252): 0 → 200K stable
- Iteration 2 (this fix):  200K → 220K stable
- Total improvement: 0 → 220K iterations (+10% over iteration 1)

Known Issues:
- 240K+ still SEGVs (suspected: TLS SLL double-free, per ChatGPT)
- Debug builds may show TLS_SLL_PUSH FATAL double-free detection
- Requires further investigation of free path

Impact:
- No performance regression in stable range
- Header/metadata mismatch errors eliminated
- workset=256 unaffected: 60M+ ops/s maintained

Credit: Root cause analysis and fix by ChatGPT

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 07:56:06 +09:00
2fe970252a Fix: workset=8192 SEGV - Unify SuperSlab geometry to Box3 (partial fix)
Problem:
- bench_random_mixed_hakmem with workset=8192 causes SEGV
- workset=256 works fine
- Root cause identified by ChatGPT analysis

Root Cause:
SuperSlab geometry double definition caused slab_base misalignment:
- Old: tiny_slab_base_for() used SLAB0_OFFSET + idx * SLAB_SIZE
- New: Box3 tiny_slab_base_for_geometry() uses offset only for idx=0
- Result: slab_idx > 0 had +2048 byte offset error
- Impact: Unified Cache carve stepped beyond slab boundary → SEGV
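
To make the offset error concrete, the two base formulas can be compared numerically. This is a sketch under assumptions: the 64 KiB slab size is not stated in this message, the carve is assumed to cover one full slab's worth of bytes, and every `example_*` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_SLAB_SIZE    ((uintptr_t)65536) /* assumption: 64 KiB */
#define EXAMPLE_SLAB0_OFFSET ((uintptr_t)2048)  /* the +2048 noted above */

/* Old formula: the slab-0 offset is applied to every slab. */
static uintptr_t example_old_slab_base(uintptr_t ss, int idx) {
    assert(idx >= 0);
    return ss + EXAMPLE_SLAB0_OFFSET + (uintptr_t)idx * EXAMPLE_SLAB_SIZE;
}

/* Box3 formula: the offset applies to slab 0 only. */
static uintptr_t example_box3_slab_base(uintptr_t ss, int idx) {
    assert(idx >= 0);
    return idx == 0 ? ss + EXAMPLE_SLAB0_OFFSET
                    : ss + (uintptr_t)idx * EXAMPLE_SLAB_SIZE;
}

/* Bytes by which a carve of `bytes` starting at `base` runs past the
 * end of slab `idx` (0 when it stays inside). */
static uintptr_t example_overshoot(uintptr_t ss, int idx, uintptr_t base,
                                   uintptr_t bytes) {
    uintptr_t slab_end = ss + ((uintptr_t)idx + 1) * EXAMPLE_SLAB_SIZE;
    uintptr_t carve_end = base + bytes;
    return carve_end > slab_end ? carve_end - slab_end : 0;
}
```

The two formulas agree for slab 0 and differ by exactly the slab-0 offset for every other slab, which is how a carve that starts from the old base can step past the slab boundary.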

Fix 1: core/superslab/superslab_inline.h
========================================
Delegate SuperSlab base calculation to Box3:

  static inline uint8_t* tiny_slab_base_for(SuperSlab* ss, int slab_idx) {
      if (!ss || slab_idx < 0) return NULL;
      return tiny_slab_base_for_geometry(ss, slab_idx);  // ← Box3 unified
  }

Effect:
- All tiny_slab_base_for() calls now use single Box3 implementation
- TLS slab_base and Box3 calculations perfectly aligned
- Eliminates geometry mismatch between layers

Fix 2: core/front/tiny_unified_cache.c
========================================
Enhanced fail-fast validation (debug builds only):
- unified_refill_validate_base(): Use TLS as source of truth
- Cross-check with registry lookup for safety
- Validate: slab_base range, alignment, meta consistency
- Box3 + TLS boundary consolidated to one place

Fix 3: core/hakmem_tiny_superslab.h
========================================
Added forward declaration:
- SuperSlab* superslab_refill(int class_idx);
- Required by tiny_unified_cache.c

Test Results:
=============
workset=8192 SEGV threshold improved:

Before fix:
   Immediate SEGV at any iteration count

After fix:
   100K iterations: OK (9.8M ops/s)
   200K iterations: OK (15.5M ops/s)
   300K iterations: SEGV (different bug exposed)

Conclusion:
- Box3 geometry unification fixed primary SEGV
- Stability improved: 0 → 200K iterations
- Remaining issue: 300K+ iterations hit different bug
- Likely causes: memory pressure, different corruption pattern

Known Issues:
- Debug warnings still present: FREE_FAST_HDR_META_MISMATCH, NXT_HDR_MISMATCH
- These are separate header consistency issues (not related to geometry)
- 300K+ SEGV requires further investigation

Performance:
- No performance regression observed in stable range
- workset=256 unaffected: 60M+ ops/s maintained

Credit: Root cause analysis and fix strategy by ChatGPT

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 07:40:35 +09:00