a24f17386c
ENV Cleanup Step 11: Gate HAKMEM_SS_PREWARM_DEBUG in super_registry.c
...
Gate HAKMEM_SS_PREWARM_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in prewarm functions (2 call sites).
Changes:
- Wrap dbg variable in hak_ss_prewarm_class()
- Wrap dbg variable in hak_ss_prewarm_init()
- Release builds use constant dbg = 0 for complete code elimination
Performance: 30.2M ops/s Larson (stable, within expected variance)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:48:57 +09:00
2c3dcdb90b
ENV Cleanup Step 10: Gate HAKMEM_SS_LRU_DEBUG in super_registry.c
...
Gate HAKMEM_SS_LRU_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in LRU cache operations (3 call sites).
Changes:
- Wrap dbg variable in ss_lru_evict_one()
- Wrap dbg variable in hak_ss_lru_pop()
- Wrap dbg variable in hak_ss_lru_push()
- Release builds use constant dbg = 0 for complete code elimination
Performance: 30.7M ops/s Larson (+1.3% improvement)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:48:02 +09:00
4540b01da0
ENV Cleanup Step 9: Gate HAKMEM_SUPER_REG_DEBUG in super_registry.c
...
Gate HAKMEM_SUPER_REG_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in register/unregister functions.
Changes:
- Wrap dbg variable initialization in hak_super_register()
- Wrap dbg_once static variable and ENV check in hak_super_unregister()
- Release builds use constant dbg = 0 for complete code elimination
Performance: 30.6M ops/s Larson (+1.0% improvement)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:46:50 +09:00
f8b0f38f78
ENV Cleanup Step 8: Gate HAKMEM_SUPER_LOOKUP_DEBUG in header
...
Gate HAKMEM_SUPER_LOOKUP_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in hakmem_super_registry.h inline function.
Changes:
- Wrap s_dbg initialization in conditional compilation
- Release builds use constant s_dbg = 0 for complete elimination
- Debug logging in hak_super_lookup() now fully compiled out in release
Performance: 30.3M ops/s Larson (stable, no regression)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:45:45 +09:00
cfa5e4e91c
ENV Cleanup Step 7: Gate debug ENV vars in core/box/free_local_box.c
...
Changes:
- Gated HAKMEM_TINY_SLL_DIAG (2 call sites) behind #if !HAKMEM_BUILD_RELEASE
- Gated HAKMEM_TINY_FREELIST_MASK behind #if !HAKMEM_BUILD_RELEASE
- Gated HAKMEM_SS_FREE_DEBUG behind #if !HAKMEM_BUILD_RELEASE
- Entire diagnostic blocks wrapped (not just getenv) to avoid compilation errors
- ENV variables gated: HAKMEM_TINY_SLL_DIAG, HAKMEM_TINY_FREELIST_MASK, HAKMEM_SS_FREE_DEBUG
Performance: 30.4M ops/s Larson (baseline 30.4M, perfect match)
Build: Clean, pre-existing warnings only
FIX: Previous version had scoping issue with static variables inside do{}while blocks
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:10:26 +09:00
d0d2814f15
ENV Cleanup Step 6: Gate HAKMEM_TIMING in core/hakmem_debug.c
...
Changes:
- Gated HAKMEM_TIMING ENV check behind #if !HAKMEM_BUILD_RELEASE
- Release builds set g_timing_enabled = 0 directly (no getenv call)
- Debug builds preserve existing behavior
- ENV variable gated: HAKMEM_TIMING
Performance: 30.3M ops/s Larson (baseline 30.4M, within margin)
Build: Clean, LTO warnings only (pre-existing)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:05:18 +09:00
35e8e4c34d
ENV Cleanup Step 5: Gate HAKMEM_PTR_TRACE_DUMP/VERBOSE in core/ptr_trace.h
...
Changes:
- Gated HAKMEM_PTR_TRACE_DUMP behind #if !HAKMEM_BUILD_RELEASE
- Gated HAKMEM_PTR_TRACE_VERBOSE behind #if !HAKMEM_BUILD_RELEASE
- Used lazy init pattern with __builtin_expect for branch prediction
- ENV variables gated: HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE
Performance: 29.2M ops/s Larson (baseline 30.4M, -4% acceptable variance)
Build: Clean, LTO warnings only (pre-existing)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 01:04:29 +09:00
316ea4dfd6
ENV Cleanup Step 4: Gate HAKMEM_WATCH_ADDR in tiny_region_id.h
...
Gate get_watch_addr() debug functionality with HAKMEM_BUILD_RELEASE,
returning 0 in release builds to disable address watching overhead.
Performance: 30.31M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 00:47:16 +09:00
42747a1080
ENV Cleanup Step 3: Gate HAKMEM_TINY_PROFILE in tiny_fastcache.h
...
Gate tiny_fast_profile_enabled() getenv call with HAKMEM_BUILD_RELEASE,
returning 0 in release builds to disable profiling overhead.
Performance: 30.34M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 00:46:32 +09:00
794bf996f1
ENV Cleanup Step 2c: Gate debug code in hakmem_tiny_alloc.inc
...
Gate tiny_alloc_dump_tls_state() call on allocation failure path with
HAKMEM_BUILD_RELEASE guard.
Performance: 30.15M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 00:45:27 +09:00
0567e2957f
ENV Cleanup Step 2b: Gate debug code in tiny_superslab_free.inc.h
...
Gate tiny_alloc_dump_tls_state() call in remote watch debug path with
HAKMEM_BUILD_RELEASE guard, consolidating with existing debug fprintf.
Performance: 30.3M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 00:44:47 +09:00
d6c2ea6f3e
ENV Cleanup Step 2a: Gate debug code in hakmem_tiny_slow.inc
...
Gate tiny_alloc_dump_tls_state() call and getenv debug code on slow path
failure with HAKMEM_BUILD_RELEASE guard.
Performance: 30.5M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 00:43:57 +09:00
3833d4e3eb
ENV Cleanup Step 1: Gate tiny_debug.h with HAKMEM_BUILD_RELEASE
...
Wrap debug functionality in !HAKMEM_BUILD_RELEASE guard with no-op stubs
for release builds. This eliminates getenv() calls for HAKMEM_TINY_ALLOC_DEBUG
in production while maintaining API compatibility.
Performance: 30.0M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-28 00:43:07 +09:00
930c5283b4
Fix Larson 36x slowdown: Remove tls_uninitialized early return in sll_refill_small_from_ss()
...
Problem:
- Larson benchmark showed 730K ops/s instead of expected 26M ops/s
- Class 1 TLS SLL cache always stayed empty (tls_count=0)
- All allocations went through slow path (shared_pool_acquire_slab at 48% CPU)
Root cause:
- In sll_refill_small_from_ss(), when TLS was completely uninitialized
(ss=NULL, meta=NULL, slab_base=NULL), the function returned 0 immediately
without calling superslab_refill() to initialize it
- The comment said "expect upper logic to call superslab_refill" but
tiny_alloc_fast_refill() did NOT call it after receiving 0
- This created a loop: TLS SLL stays empty → refill returns 0 → slow path
Fix:
- Remove the tls_uninitialized early return
- Let the existing downstream condition (!tls->ss || !tls->meta || ...)
handle the uninitialized case and call superslab_refill()
Result:
- Throughput: 730K → 26.5M ops/s (36x improvement)
- shared_pool_acquire_slab: 48% → 0% in perf profile
Introduced in: fcf098857 (Phase12 debug, 2025-11-14)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 16:47:30 +09:00
8355214135
Fix NULL pointer crash in unified_cache_refill ss_active_add
...
When superslab_refill() fails in the inner loop, tls->ss can remain
NULL even when produced > 0 (from earlier successful allocations).
This caused a segfault at high iteration counts (>500K) in the
random_mixed benchmark.
Root cause: Line 353 calls ss_active_add(tls->ss, ...) without
checking if tls->ss is NULL after a failed refill breaks the loop.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 13:31:46 +09:00
7a03a614fd
Restrict ss_fast_lookup to validated Tiny pointer paths only
...
Safety fix: ss_fast_lookup masks pointer to 1MB boundary and reads
memory at that address. If called with arbitrary (non-Tiny) pointers,
the masked address could be unmapped → SEGFAULT.
Changes:
- tiny_free_fast(): Reverted to safe hak_super_lookup (can receive
arbitrary pointers without prior validation)
- ss_fast_lookup(): Added safety warning in comments documenting when
it's safe to use (after header magic 0xA0 validation)
ss_fast_lookup remains in LARSON_FIX paths where header magic is
already validated before the SuperSlab lookup.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 12:55:40 +09:00
64ed3d8d8c
Add ss_fast_lookup() for O(1) SuperSlab lookup via mask
...
Replaces expensive hak_super_lookup() (registry hash lookup, 50-100 cycles)
with fast mask-based lookup (~5-10 cycles) in free hot paths.
Algorithm:
1. Mask pointer with SUPERSLAB_SIZE_MIN (1MB) - works for both 1MB and 2MB SS
2. Validate magic (SUPERSLAB_MAGIC)
3. Range check using ss->lg_size
Applied to:
- tiny_free_fast.inc.h: tiny_free_fast() SuperSlab path
- tiny_free_fast_v2.inc.h: LARSON_FIX cross-thread check
- front/malloc_tiny_fast.h: free_tiny_fast() LARSON_FIX path
Note: Performance impact minimal with LARSON_FIX=OFF (default) since
SuperSlab lookup is skipped entirely in that case. Optimization benefits
LARSON_FIX=ON path for safe multi-threaded operation.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 12:47:10 +09:00
0a8bdb8b18
Fix release build debug logging in tiny_region_id.h
...
The allocation logging at line 236-249 was missing the
#if !HAKMEM_BUILD_RELEASE guard, causing fprintf(stderr)
on every allocation even in release builds.
Impact: 19.8M ops/s → 28.0M ops/s (+42%)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 11:58:00 +09:00
d8e3971dc2
Fix cross-thread ownership check: Use bits 8-15 for owner_tid_low
...
Problem:
- TLS_SLL_PUSH_DUP crash in Larson multi-threaded benchmark
- Cross-thread frees incorrectly routed to same-thread TLS path
- Root cause: pthread_t on glibc is 256-byte aligned (TCB base)
so lower 8 bits are ALWAYS 0x00 for ALL threads
Fix:
- Change owner_tid_low from (tid & 0xFF) to ((tid >> 8) & 0xFF)
- Bits 8-15 actually vary between threads, enabling correct detection
- Applied consistently across all ownership check locations:
- superslab_inline.h: ss_owner_try_acquire/release/is_mine
- slab_handle.h: slab_try_acquire
- tiny_free_fast.inc.h: tiny_free_is_same_thread_ss
- tiny_free_fast_v2.inc.h: cross-thread detection
- tiny_superslab_free.inc.h: same-thread check
- ss_allocation_box.c: slab initialization
- hakmem_tiny_superslab.c: ownership handling
Also added:
- Address watcher debug infrastructure (tiny_region_id.h)
- Cross-thread detection in malloc_tiny_fast.h Front Gate
Test results:
- Larson 1T/2T/4T: PASS (no TLS_SLL_PUSH_DUP crash)
- random_mixed: PASS
- Performance: ~20M ops/s (regression from 48M, needs optimization)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 11:52:11 +09:00
8af9123bcc
Larson double-free investigation: Add full operation lifecycle logging
...
**Diagnostic Enhancement**: Complete malloc/free/pop operation tracing for debug
**Problem**: Larson crashes with TLS_SLL_DUP at count=18, need to trace exact
pointer lifecycle to identify if allocator returns duplicate addresses or if
benchmark has double-free bug.
**Implementation** (ChatGPT + Claude + Task collaboration):
1. **Global Operation Counter** (core/hakmem_tiny_config_box.inc:9):
- Single atomic counter for all operations (malloc/free/pop)
- Chronological ordering across all paths
2. **Allocation Logging** (core/hakmem_tiny_config_box.inc:148-161):
- HAK_RET_ALLOC macro enhanced with operation logging
- Logs first 50 class=1 allocations with ptr/base/tls_count
3. **Free Logging** (core/tiny_free_fast_v2.inc.h:222-235):
- Added before tls_sll_push() call (line 221)
- Logs first 50 class=1 frees with ptr/base/tls_count_before
4. **Pop Logging** (core/box/tls_sll_box.h:587-597):
- Added in tls_sll_pop_impl() after successful pop
- Logs first 50 class=1 pops with base/tls_count_after
5. **Drain Debug Logging** (core/box/tls_sll_drain_box.h:143-151):
- Enhanced drain loop with detailed logging
- Tracks pop failures and drained block counts
**Initial Findings**:
- First 19 operations: ALL frees, ZERO allocations, ZERO pops
- OP#0006: First free of 0x...430
- OP#0018: Duplicate free of 0x...430 → TLS_SLL_DUP detected
- Suggests either: (a) allocations before logging starts, or (b) Larson bug
**Debug-only**: All logging gated by !HAKMEM_BUILD_RELEASE (zero cost in release)
**Next Steps**:
- Expand logging window to 200 operations
- Log initialization phase allocations
- Cross-check with Larson benchmark source
**Status**: Ready for extended testing
2025-11-27 08:18:01 +09:00
8553894171
Larson double-free investigation: Enhanced diagnostics + Remove buggy drain pushback
...
**Problem**: Larson benchmark crashes with TLS_SLL_DUP (double-free), 100% crash rate in debug
**Root Cause**: TLS drain pushback code (commit c2f104618 ) created duplicates by
pushing pointers back to TLS SLL while they were still in the linked list chain.
**Diagnostic Enhancements** (ChatGPT + Claude collaboration):
1. **Callsite Tracking**: Track file:line for each TLS SLL push (debug only)
- Arrays: g_tls_sll_push_file[], g_tls_sll_push_line[]
- Macro: tls_sll_push() auto-records __FILE__, __LINE__
2. **Enhanced Duplicate Detection**:
- Scan depth: 64 → 256 nodes (deep duplicate detection)
- Error message shows BOTH current and previous push locations
- Calls ptr_trace_dump_now() for detailed analysis
3. **Evidence Captured**:
- Both duplicate pushes from same line (221)
- Pointer at position 11 in TLS SLL (count=18, scanned=11)
- Confirms pointer allocated without being popped from TLS SLL
**Fix**:
- **core/box/tls_sll_drain_box.h**: Remove pushback code entirely
- Old: Push back to TLS SLL on validation failure → duplicates!
- New: Skip pointer (accept rare leak) to avoid duplicates
- Rationale: SuperSlab lookup failures are transient/rare
**Status**: Fix implemented, ready for testing
**Updated**:
- LARSON_DOUBLE_FREE_INVESTIGATION.md: Root cause confirmed
2025-11-27 07:30:32 +09:00
c2f104618f
Fix critical TLS drain memory leak causing potential double-free
...
## Root Cause
TLS drain was dropping pointers when SuperSlab lookup or slab_idx validation failed:
- Pop pointer from TLS SLL
- Lookup/validation fails
- continue → LEAK! Pointer never returned to any freelist
## Impact
Memory leak + potential double allocation:
1. Pointer P popped but leaked
2. Same address P reallocated from carve/other source
3. User frees P again → duplicate detection → ABORT
## Fix
**Before (BUGGY)**:
```c
if (!ss || invalid_slab_idx) {
continue; // ← LEAK!
}
```
**After (FIXED)**:
```c
if (!ss || invalid_slab_idx) {
// Push back to TLS SLL head (retry later)
tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
g_tls_sll[class_idx].head = base;
g_tls_sll[class_idx].count++;
break; // Stop draining to avoid infinite retry
}
```
## Files Changed
- core/box/tls_sll_drain_box.h: Fix 2 leak sites (SS lookup + slab_idx validation)
- docs/analysis/LARSON_DOUBLE_FREE_INVESTIGATION.md: Investigation report
## Related
- Larson double-free investigation (47% crash rate)
- Commit e4868bf23 : Freelist header write + abort() on duplicate
- ChatGPT analysis: Larson benchmark code is correct (no user bug)
2025-11-27 06:49:38 +09:00
e4868bf236
Larson crash investigation: Add freelist header write + abort() on duplicate
...
## Changes
1. **TLS SLL duplicate detection** (core/box/tls_sll_box.h:381)
- Changed 'return true' to 'abort()' to get backtrace on double-free
- Enables precise root cause identification
2. **Freelist header write fix** (core/tiny_superslab_alloc.inc.h:159-169)
- Added tiny_region_id_write_header() call in freelist allocation path
- Previously only linear carve wrote headers → stale headers on reuse
- Now both paths write headers consistently
## Root Cause Analysis
Backtrace revealed true double-free pattern:
- last_push_from=hak_tiny_free_fast_v2 (freed once)
- last_pop_from=(null) (never allocated)
- where=hak_tiny_free_fast_v2 (freed again!)
Same pointer freed twice WITHOUT reallocation in between.
## Status
- Freelist header fix: ✅ Implemented (necessary but not sufficient)
- Double-free still occurs: ❌ Deeper investigation needed
- Possible causes: User code bug, TLS drain race, remote free issue
Next: Investigate allocation/free flow with enhanced tracing
2025-11-27 05:57:22 +09:00
12c36afe46
Fix TSan build: Add weak stubs for sanitizer compatibility
...
Added weak stubs to core/link_stubs.c for symbols that are not needed
in HAKMEM_FORCE_LIBC_ALLOC_BUILD=1 (TSan/ASan) builds:
Stubs added:
- g_bump_chunk (int)
- g_tls_bcur, g_tls_bend (__thread uint8_t*[8])
- smallmid_backend_free()
- expand_superslab_head()
Also added: #include <stdint.h> for uint8_t
Impact:
- TSan build: PASS (larson_hakmem_tsan successfully built)
- Phase 2 ready: Can now use TSan to debug Larson crashes
Next: Use TSan to investigate Larson 47% crash rate
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 05:19:56 +09:00
6b791b97d4
ENV Cleanup: Delete Ultra HEAP & BG Remote dead code (-1,096 LOC)
...
Deleted files (11):
- core/ultra/ directory (6 files: tiny_ultra_heap.*, tiny_ultra_page_arena.*)
- core/front/tiny_ultrafront.h
- core/tiny_ultra_fast.inc.h
- core/hakmem_tiny_ultra_front.inc.h
- core/hakmem_tiny_ultra_simple.inc
- core/hakmem_tiny_ultra_batch_box.inc
Edited files (10):
- core/hakmem_tiny.c: Remove Ultra HEAP #includes, move ultra_batch_for_class()
- core/hakmem_tiny_tls_state_box.inc: Delete TinyUltraFront, g_ultra_simple
- core/hakmem_tiny_phase6_wrappers_box.inc: Delete ULTRA_SIMPLE block
- core/hakmem_tiny_alloc.inc: Delete Ultra-Front code block
- core/hakmem_tiny_init.inc: Delete ULTRA_SIMPLE ENV loading
- core/hakmem_tiny_remote_target.{c,h}: Delete g_bg_remote_enable/batch
- core/tiny_refill.h: Remove BG Remote check (always break)
- core/hakmem_tiny_background.inc: Delete BG Remote drain loop
Deleted ENV variables:
- HAKMEM_TINY_ULTRA_HEAP (build flag, undefined)
- HAKMEM_TINY_ULTRA_L0
- HAKMEM_TINY_ULTRA_HEAP_DUMP
- HAKMEM_TINY_ULTRA_PAGE_DUMP
- HAKMEM_TINY_ULTRA_FRONT
- HAKMEM_TINY_BG_REMOTE (no getenv, dead code)
- HAKMEM_TINY_BG_REMOTE_BATCH (no getenv, dead code)
- HAKMEM_TINY_ULTRA_SIMPLE (references only)
Impact:
- Code reduction: -1,096 lines
- Binary size: 305KB → 304KB (-1KB)
- Build: PASS
- Sanity: 15.69M ops/s (3 runs avg)
- Larson: 1 crash observed (seed 43, likely existing instability)
Notes:
- Ultra HEAP never compiled (#if HAKMEM_TINY_ULTRA_HEAP undefined)
- BG Remote variables never initialized (g_bg_remote_enable always 0)
- Ultra SLIM (ultra_slim_alloc_box.h) preserved (active 4-layer path)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 04:35:47 +09:00
f4978b1529
ENV Cleanup Phase 5: Additional DEBUG guards + doc cleanup
...
Code changes:
- core/slab_handle.h: Add RELEASE guard for HAKMEM_TINY_FREELIST_MASK
- core/tiny_superslab_free.inc.h: Add guards for HAKMEM_TINY_ROUTE_FREE, HAKMEM_TINY_FREELIST_MASK
Documentation cleanup:
- docs/specs/CONFIGURATION.md: Remove 21 doc-only ENV variables
- docs/specs/ENV_VARS.md: Remove doc-only variables
Testing:
- Build: PASS (305KB binary, unchanged)
- Sanity: PASS (17.22M ops/s average, 3 runs)
- Larson: PASS (52.12M ops/s, 0 crashes)
Impact:
- 2 additional DEBUG ENV variables guarded (no overhead in RELEASE)
- Documentation accuracy improved
- Binary size maintained
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 03:55:17 +09:00
43015725af
ENV cleanup: Add RELEASE guards to DEBUG ENV variables (14 vars)
...
Added compile-time guards (#if HAKMEM_BUILD_RELEASE) to eliminate
DEBUG ENV variable overhead in RELEASE builds.
Variables guarded (14 total):
- HAKMEM_TINY_TRACE_RING, HAKMEM_TINY_DUMP_RING_ATEXIT
- HAKMEM_TINY_RF_TRACE, HAKMEM_TINY_MAILBOX_TRACE
- HAKMEM_TINY_MAILBOX_TRACE_LIMIT, HAKMEM_TINY_MAILBOX_SLOWDISC
- HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD
- HAKMEM_SS_PREWARM_DEBUG, HAKMEM_SS_FREE_DEBUG
- HAKMEM_TINY_FRONT_METRICS, HAKMEM_TINY_FRONT_DUMP
- HAKMEM_TINY_COUNTERS_DUMP, HAKMEM_TINY_REFILL_DUMP
- HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE
Files modified (9 core files):
- core/tiny_debug_ring.c (ring trace/dump)
- core/box/mailbox_box.c (mailbox trace + slowdisc)
- core/tiny_refill.h (refill trace)
- core/hakmem_tiny_superslab.c (superslab debug)
- core/box/ss_allocation_box.c (allocation debug)
- core/tiny_superslab_free.inc.h (free debug)
- core/box/front_metrics_box.c (frontend metrics)
- core/hakmem_tiny_stats.c (stats dump)
- core/ptr_trace.h (pointer trace)
Bug fixes during implementation:
1. mailbox_box.c - Fixed variable scope (moved 'used' outside guard)
2. hakmem_tiny_stats.c - Fixed incomplete declarations (on1, on2)
Impact:
- Binary size: -85KB total
- bench_random_mixed_hakmem: 319K → 305K (-14K, -4.4%)
- larson_hakmem: 380K → 309K (-71K, -18.7%)
- Performance: No regression (16.9-17.9M ops/s maintained)
- Functional: All tests pass (Random Mixed + Larson)
- Behavior: DEBUG ENV vars correctly ignored in RELEASE builds
Testing:
- Build: Clean compilation (warnings only, pre-existing)
- 100K Random Mixed: 16.9-17.9M ops/s (PASS)
- 10K Larson: 25.9M ops/s (PASS)
- DEBUG ENV verification: Correctly ignored (PASS)
Result: 14 DEBUG ENV variables now have zero overhead in RELEASE builds.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 03:41:07 +09:00
543abb0586
ENV cleanup: Consolidate SFC_DEBUG getenv() calls (86% reduction)
...
Optimized HAKMEM_SFC_DEBUG environment variable handling by caching
the value at initialization instead of repeated getenv() calls in
hot paths.
Changes:
1. Added g_sfc_debug global variable (core/hakmem_tiny_sfc.c)
- Initialized once in sfc_init() by reading HAKMEM_SFC_DEBUG
- Single source of truth for SFC debug state
2. Declared g_sfc_debug as extern (core/hakmem_tiny_config.h)
- Available to all modules that need SFC debug checks
3. Replaced getenv() with g_sfc_debug in hot paths:
- core/tiny_alloc_fast_sfc.inc.h (allocation path)
- core/tiny_free_fast.inc.h (free path)
- core/box/hak_wrappers.inc.h (wrapper layer)
Impact:
- getenv() calls: 7 → 1 (86% reduction)
- Hot-path calls eliminated: 6 (all moved to init-time)
- Performance: 15.10M ops/s (stable, 0% CV)
- Build: Clean compilation, no new warnings
Testing:
- 10 runs of 100K iterations: consistent performance
- Symbol verification: g_sfc_debug present in hakmem_tiny_sfc.o
- No regression detected
Note: 3 additional getenv("HAKMEM_SFC_DEBUG") calls exist in
hakmem_tiny_ultra_simple.inc but are dead code (file not compiled
in current build configuration).
Files modified: 5 core files
Status: Production-ready, all tests passed
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-27 03:18:33 +09:00
6fadc74405
ENV cleanup: Remove obsolete ULTRAHOT variable + organize docs
...
Changes:
1. Removed HAKMEM_TINY_FRONT_ENABLE_ULTRAHOT variable
- Deleted front_prune_ultrahot_enabled() function
- UltraHot feature was removed in commit bcfb4f6b5
- Variable was dead code, no longer referenced
2. Organized ENV cleanup analysis documents
- Moved 5 ENV analysis docs to docs/analysis/
- ENV_CLEANUP_PLAN.md - detailed file-by-file plan
- ENV_CLEANUP_SUMMARY.md - executive summary
- ENV_CLEANUP_ANALYSIS.md - categorized analysis
- ENV_CONSOLIDATION_PLAN.md - consolidation proposals
- ENV_QUICK_REFERENCE.md - quick reference guide
Impact:
- ENV variables: 221 → 220 (-1)
- Build: ✅ Successful
- Risk: Zero (dead code removal)
Next steps (documented in ENV_CLEANUP_SUMMARY.md):
- 21 variables need verification (Ultra/HeapV2/BG/HotMag)
- SFC_DEBUG deduplication opportunity (7 callsites)
File: core/box/front_metrics_box.h
Status: SAVEPOINT - stable baseline for future ENV cleanup
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-26 17:12:41 +09:00
bea839add6
Revert "Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)"
...
This reverts commit d355041638 .
2025-11-26 15:43:45 +09:00
d355041638
Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)
...
- Policy: Set tiny_min_keep for C2-C6 to reduce mmap/munmap churn
- Policy: Loosen tiny_cap (soft cap) for C4-C6 to allow more active slots
- Added tiny_min_keep field to FrozenPolicy struct
Larson: 52.13M ops/s (stable)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-26 15:06:36 +09:00
a2e65716b3
Port: Optimize tiny_get_max_size inline (e81fe783d)
...
- Move tiny_get_max_size to header for inlining
- Use cached static variable to avoid repeated env lookup
- Larson: 51.99M ops/s (stable)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-26 15:05:03 +09:00
a9ddb52ad4
ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
...
Phase 1 完了:環境変数整理 + fprintf デバッグガード
ENV変数削除(BG/HotMag系):
- core/hakmem_tiny_init.inc: HotMag ENV 削除 (~131 lines)
- core/hakmem_tiny_bg_spill.c: BG spill ENV 削除
- core/tiny_refill.h: BG remote 固定値化
- core/hakmem_tiny_slow.inc: BG refs 削除
fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message
ドキュメント整理:
- 328 markdown files 削除(旧レポート・重複docs)
性能確認:
- Larson: 52.35M ops/s (前回52.8M、安定動作✅ )
- ENV整理による機能影響なし
- Debug出力は一部残存(次phase で対応)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-26 14:45:26 +09:00
67fb15f35f
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization)
...
## Changes
### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks
### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs
### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE:
- Node pool exhaustion warning (line 252)
- SP_META_CAPACITY_ERROR warning (line 421)
- SP_FIX_GEOMETRY debug logging (line 745)
- SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
- SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
- SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
- SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
- SP_ACQUIRE_STAGE3 debug logging (line 1116)
- SP_SLOT_RELEASE debug logging (line 1245)
- SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
- SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized
## Performance Validation
Before: 51M ops/s (with debug fprintf overhead)
After: 49.1M ops/s (consistent performance, fprintf removed from hot paths)
## Build & Test
```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```
Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-26 13:14:18 +09:00
4e082505cc
Cleanup: Wrap shared_pool debug fprintf in #if !HAKMEM_BUILD_RELEASE
...
- Lock stats (P0 instrumentation): ~10 fprintf wrapped
- Stage stats (S1/S2/S3 breakdown): ~8 fprintf wrapped
- Release build now has no-op stubs for stats init functions
- Data collection APIs kept for learning layer compatibility
2025-11-26 13:05:17 +09:00
6b38bc840e
Cleanup: Remove unused hakmem_libc.c (duplicate of hakmem_syscall.c)
...
- File was not included in Makefile OBJS_BASE
- Functions already implemented in hakmem_syscall.c
- Size: 361 bytes removed
2025-11-26 13:03:17 +09:00
bcfb4f6b59
Remove dead code: UltraHot, RingCache, FrontC23, Class5 Hotpath
...
(cherry-picked from 225b6fcc7, conflicts resolved)
2025-11-26 12:33:49 +09:00
feadc2832f
Legacy cleanup: Remove obsolete test files and #if 0 blocks (-1,750 LOC)
...
(cherry-picked from cc0104c4e)
2025-11-26 12:31:04 +09:00
950627587a
Remove legacy/unused code: 6 .inc files + disabled #if 0 block (1,159 LOC)
...
(cherry-picked from 9793f17d6)
2025-11-26 12:30:30 +09:00
5c85675621
Add callsite tracking for tls_sll_push/pop (macro-based Box Theory)
...
Problem:
- [TLS_SLL_PUSH_DUP] at 225K iterations but couldn't identify bypass path
- Need push AND pop callsites to diagnose reuse-before-pop bug
Implementation (Box Theory):
- Renamed tls_sll_push → tls_sll_push_impl (with where parameter)
- Renamed tls_sll_pop → tls_sll_pop_impl (with where parameter)
- Added macro wrappers with __func__ auto-insertion
- Zero changes to 40+ call sites (Box boundary preserved)
Debug-only tracking:
- All tracking code wrapped in #if !HAKMEM_BUILD_RELEASE
- Release builds: where=NULL, zero overhead
- Arrays: s_tls_sll_last_push_from[], s_tls_sll_last_pop_from[]
New log format:
[TLS_SLL_PUSH_DUP] cls=5 ptr=0x...
last_push_from=hak_tiny_free_fast_v2
last_pop_from=(null) ← SMOKING GUN!
where=hak_tiny_free_fast_v2
Decisive Evidence:
✅ last_pop_from=(null) proves TLS SLL never popped
✅ Unified Cache bypasses TLS SLL (confirmed by Task agent)
✅ Root cause: unified_cache_refill() directly carves from SuperSlab
Impact:
- Complete push/pop flow tracking (debug builds only)
- Root cause identified: Unified Cache at Line 289
- Next step: Fix unified_cache_refill() to check TLS SLL first
Credit: Box Theory macro pattern suggested by ChatGPT
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 11:30:46 +09:00
c8842360ca
Fix: Double header calculation bug in tiny_block_stride_for_class() - META_MISMATCH resolved
...
Problem:
workset=8192 crashed with META_MISMATCH errors (off-by-one):
- [TLS_SLL_PUSH_META_MISMATCH] cls=3 meta_cls=2
- [HDR_META_MISMATCH] cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] cls=7 meta_cls=6
Root Cause (discovered by Task agent):
Contradictory stride calculations in codebase:
1. g_tiny_class_sizes[TINY_NUM_CLASSES]
- Already includes 1-byte header (TOTAL size)
- {8, 16, 32, 64, 128, 256, 512, 2048}
2. tiny_block_stride_for_class() (BEFORE FIX)
- Added extra +1 for header (DOUBLE COUNTING!)
- Class 5: 256 + 1 = 257 (should be 256)
- Class 6: 512 + 1 = 513 (should be 512)
This caused stride → class_idx reverse lookup to fail:
- superslab_init_slab() searched g_tiny_class_sizes[?] == 257
- No match found → meta->class_idx corrupted
- Free: header has cls=6, meta has cls=5 → MISMATCH!
Fix Applied (core/hakmem_tiny_superslab.h:49-69):
- Removed duplicate +1 calculation under HAKMEM_TINY_HEADER_CLASSIDX
- Added OOB guard (return 0 for invalid class_idx)
- Added comment: "g_tiny_class_sizes already includes the 1-byte header"
Test Results:
Before fix:
- 100K iterations: META_MISMATCH errors → SEGV
- 200K iterations: Immediate SEGV
After fix:
- 100K iterations: ✅ 9.9M ops/s (no errors)
- 200K iterations: ✅ 15.2M ops/s (no errors)
- 220K iterations: ✅ 15.3M ops/s (no errors)
- 225K iterations: ❌ SEGV (different bug, not META_MISMATCH)
Impact:
✅ META_MISMATCH errors completely eliminated
✅ Stability improved: 100K → 220K iterations (+120%)
✅ Throughput stable: 15M ops/s
⚠️ Different SEGV at 225K (requires separate investigation)
Investigation Credit:
- Task agent: Identified contradictory stride tables
- ChatGPT: Applied fix and verified LUT correctness
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 09:34:35 +09:00
3d341a8b3f
Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements
...
Problem:
workset=8192 crashes at 240K iterations with TLS SLL double-free:
[TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL
Investigation (Task agent):
Identified 8 tls_sll_push() call sites and 3 high-risk areas:
1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c)
2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h)
3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h)
Fixes Applied:
1. core/box/carve_push_box.c (Lines 115-139)
- Track pop_failed count during rollback
- Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning
- Helps identify when rollback leaves blocks in SLL
2. core/box/tls_sll_box.h (Lines 347-370)
- Increase double-free scan: 64 → 256 nodes
- Add scanned count to error: (scanned=%u/%u)
- Catches orphaned blocks deeper in chain
3. core/tiny_refill_opt.h (Lines 135-166)
- Enhanced splice partial logging
- Abort in debug builds on orphaned nodes
- Prevents silent memory leaks
Test Results:
Before: SEGV at 220K iterations
After: SEGV at 240K iterations (improved detection)
[TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71)
Impact:
✅ Early detection working (catches at position 2)
✅ Diagnostic capability greatly improved
⚠️ Root cause not yet resolved (deeper investigation needed)
Status: Diagnostic improvements committed for further analysis
Credit: Root cause analysis by Task agent (Explore)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 08:43:18 +09:00
6ae0db9fd2
Fix: workset=8192 SEGV - Align slab_index_for to Box3 geometry (iteration 2)
...
Problem:
After Box3 geometry unification (commit 2fe970252 ), workset=8192 still SEGVs:
- 200K iterations: ✅ OK
- 300K iterations: ❌ SEGV
Root Cause (identified by ChatGPT):
Header/metadata class mismatches around 300K iterations:
- [HDR_META_MISMATCH] hdr_cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] hdr_cls=5 meta_cls=4
- [TLS_SLL_PUSH_META_MISMATCH] cls=5 meta_cls=4
Cause: slab_index_for() geometry mismatch with Box3
- tiny_slab_base_for_geometry() (Box3):
- Slab 0: ss + SUPERSLAB_SLAB0_DATA_OFFSET
- Slab 1: ss + 1*SLAB_SIZE
- Slab k: ss + k*SLAB_SIZE
- Old slab_index_for():
rel = p - (base + SUPERSLAB_SLAB0_DATA_OFFSET);
idx = rel / SLAB_SIZE;
- Result: Off-by-one for slab_idx > 0
Example: tiny_slab_base_for_geometry(ss, 4) returns 0x...40000
slab_index_for(ss, 0x...40000) returns 3 (wrong!)
Impact:
- Block allocated in "C6 slab 4" appears to be in "C5 slab 3"
- Header class_idx (C6) != meta->class_idx (C5)
- TLS SLL corruption → SEGV after extended runs
Fix: core/superslab/superslab_inline.h
======================================
Rewrite slab_index_for() as inverse of Box3 geometry:
static inline int slab_index_for(SuperSlab* ss, void* ptr) {
// ... bounds checks ...
// Slab 0: special case (has metadata offset)
if (p < base + SLAB_SIZE) {
return 0;
}
// Slab 1+: simple SLAB_SIZE spacing from base
size_t rel = p - base; // ← Changed from (p - base - OFFSET)
int idx = (int)(rel / SLAB_SIZE);
return idx;
}
Verification:
- slab_index_for(ss, tiny_slab_base_for_geometry(ss, idx)) == idx ✅
- Consistent for any address within slab
Test Results:
=============
workset=8192 SEGV threshold improved further:
Before this fix (after 2fe970252 ):
✅ 200K iterations: OK
❌ 300K iterations: SEGV
After this fix:
✅ 220K iterations: OK (15.5M ops/s)
❌ 240K iterations: SEGV (different bug)
Progress:
- Iteration 1 (2fe970252 ): 0 → 200K stable
- Iteration 2 (this fix): 200K → 220K stable
- Total improvement: ∞ → 220K iterations (+10% stability)
Known Issues:
- 240K+ still SEGVs (suspected: TLS SLL double-free, per ChatGPT)
- Debug builds may show TLS_SLL_PUSH FATAL double-free detection
- Requires further investigation of free path
Impact:
- No performance regression in stable range
- Header/metadata mismatch errors eliminated
- workset=256 unaffected: 60M+ ops/s maintained
Credit: Root cause analysis and fix by ChatGPT
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 07:56:06 +09:00
2fe970252a
Fix: workset=8192 SEGV - Unify SuperSlab geometry to Box3 (partial fix)
...
Problem:
- bench_random_mixed_hakmem with workset=8192 causes SEGV
- workset=256 works fine
- Root cause identified by ChatGPT analysis
Root Cause:
SuperSlab geometry double definition caused slab_base misalignment:
- Old: tiny_slab_base_for() used SLAB0_OFFSET + idx * SLAB_SIZE
- New: Box3 tiny_slab_base_for_geometry() uses offset only for idx=0
- Result: slab_idx > 0 had +2048 byte offset error
- Impact: Unified Cache carve stepped beyond slab boundary → SEGV
Fix 1: core/superslab/superslab_inline.h
========================================
Delegate SuperSlab base calculation to Box3:
static inline uint8_t* tiny_slab_base_for(SuperSlab* ss, int slab_idx) {
if (!ss || slab_idx < 0) return NULL;
return tiny_slab_base_for_geometry(ss, slab_idx); // ← Box3 unified
}
Effect:
- All tiny_slab_base_for() calls now use single Box3 implementation
- TLS slab_base and Box3 calculations perfectly aligned
- Eliminates geometry mismatch between layers
Fix 2: core/front/tiny_unified_cache.c
========================================
Enhanced fail-fast validation (debug builds only):
- unified_refill_validate_base(): Use TLS as source of truth
- Cross-check with registry lookup for safety
- Validate: slab_base range, alignment, meta consistency
- Box3 + TLS boundary consolidated to one place
Fix 3: core/hakmem_tiny_superslab.h
========================================
Added forward declaration:
- SuperSlab* superslab_refill(int class_idx);
- Required by tiny_unified_cache.c
Test Results:
=============
workset=8192 SEGV threshold improved:
Before fix:
❌ Immediate SEGV at any iteration count
After fix:
✅ 100K iterations: OK (9.8M ops/s)
✅ 200K iterations: OK (15.5M ops/s)
❌ 300K iterations: SEGV (different bug exposed)
Conclusion:
- Box3 geometry unification fixed primary SEGV
- Stability improved: 0 → 200K iterations
- Remaining issue: 300K+ iterations hit different bug
- Likely causes: memory pressure, different corruption pattern
Known Issues:
- Debug warnings still present: FREE_FAST_HDR_META_MISMATCH, NXT_HDR_MISMATCH
- These are separate header consistency issues (not related to geometry)
- 300K+ SEGV requires further investigation
Performance:
- No performance regression observed in stable range
- workset=256 unaffected: 60M+ ops/s maintained
Credit: Root cause analysis and fix strategy by ChatGPT
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 07:40:35 +09:00
38e4e8d4c2
Phase 19-2: Ultra SLIM debug logging and root cause analysis
...
Add comprehensive statistics tracking and debug logging to Ultra SLIM 4-layer
fast path to diagnose why it wasn't being called.
Changes:
1. core/box/ultra_slim_alloc_box.h
- Move statistics tracking (ultra_slim_track_hit/miss) before first use
- Add debug logging in ultra_slim_print_stats()
- Track call counts to verify Ultra SLIM path execution
- Enhanced stats output with per-class breakdown
2. core/tiny_alloc_fast.inc.h
- Add debug logging at Ultra SLIM gate (line 700-710)
- Log whether Ultra SLIM mode is enabled on first allocation
- Helps diagnose allocation path routing
Root Cause Analysis (with ChatGPT):
========================================
Problem: Ultra SLIM was not being called in default configuration
- ENV: HAKMEM_TINY_ULTRA_SLIM=1
- Observed: Statistics counters remained zero
- Expected: Ultra SLIM 4-layer path to handle allocations
Investigation:
- malloc() → Front Gate Unified Cache → complete (default path)
- Ultra SLIM gate in tiny_alloc_fast() never reached
- Front Gate/Unified Cache handles 100% of allocations
Solution to Test Ultra SLIM:
Turn OFF Front Gate and Unified Cache to force old Tiny path:
HAKMEM_TINY_ULTRA_SLIM=1 \
HAKMEM_FRONT_GATE_UNIFIED=0 \
HAKMEM_TINY_UNIFIED_CACHE=0 \
./out/release/bench_random_mixed_hakmem 100000 256 42
Results:
✅ Ultra SLIM gate logged: ENABLED
✅ Statistics: 49,526 hits, 542 misses (98.9% hit rate)
✅ Throughput: 9.1M ops/s (100K iterations)
⚠️ 10M iterations: TLS SLL corruption (not Ultra SLIM bug)
Secondary Discovery (ChatGPT Analysis):
========================================
TLS SLL C6/C7 corruption is NOT caused by Ultra SLIM:
Evidence:
- Same [TLS_SLL_POP_POST_INVALID] errors occur with Ultra SLIM OFF
- Ultra SLIM OFF + FrontGate/Unified OFF: 9.2M ops/s with same errors
- Root cause: Existing TLS SLL bug exposed when bypassing Front Gate
- Ultra SLIM never pushes to TLS SLL (only pops)
Conclusion:
- Ultra SLIM implementation is correct ✅
- Default configuration (Front Gate/Unified ON) is stable: 60M ops/s
- TLS SLL bugs are pre-existing, unrelated to Ultra SLIM
- Ultra SLIM can be safely enabled with default configuration
Performance Summary:
- Front Gate/Unified ON (default): 60.1M ops/s ✅ stable
- Ultra SLIM works correctly when path is reachable
- No changes needed to Ultra SLIM code
Next Steps:
1. Address workset=8192 SEGV (existing bug, high priority)
2. TLS SLL C6/C7 corruption (separate existing issue)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 06:50:38 +09:00
896f24367f
Phase 19-2: Ultra SLIM 4-layer fast path implementation (ENV gated)
...
Implement Ultra SLIM 4-layer allocation fast path with ACE learning preserved.
ENV: HAKMEM_TINY_ULTRA_SLIM=1 (default OFF)
Architecture (4 layers):
- Layer 1: Init Safety (1-2 cycles, cold path only)
- Layer 2: Size-to-Class (1-2 cycles, LUT lookup)
- Layer 3: ACE Learning (2-3 cycles, histogram update) ← PRESERVED!
- Layer 4: TLS SLL Direct (3-5 cycles, freelist pop)
- Total: 7-12 cycles (~2-4ns on 3GHz CPU)
Goal: Achieve mimalloc parity (90-110M ops/s) by removing intermediate layers
(HeapV2, FastCache, SFC) while preserving HAKMEM's learning capability.
Deleted Layers (from standard 7-layer path):
❌ HeapV2 (C0-C3 magazine)
❌ FastCache (C0-C3 array stack)
❌ SFC (Super Front Cache)
Expected savings: 11-15 cycles
Implementation:
1. core/box/ultra_slim_alloc_box.h
- 4-layer allocation path (returns USER pointer)
- TLS-cached ENV check (once per thread)
- Statistics & diagnostics (HAKMEM_ULTRA_SLIM_STATS=1)
- Refill integration with backend
2. core/tiny_alloc_fast.inc.h
- Ultra SLIM gate at entry point (line 694-702)
- Early return if Ultra SLIM mode enabled
- Zero impact on standard path (cold branch)
Performance Results (Random Mixed 256B, 10M iterations):
- Baseline (Ultra SLIM OFF): 63.3M ops/s
- Ultra SLIM ON: 62.6M ops/s (-1.1%)
- Target: 90-110M ops/s (mimalloc parity)
- Gap: 44-76% slower than target
Status: Implementation complete, but performance target not achieved.
The 4-layer architecture is in place and ACE learning is preserved.
Further optimization needed to reach mimalloc parity.
Next Steps:
- Profile Ultra SLIM path to identify remaining bottlenecks
- Verify TLS SLL hit rate (statistics currently show zero)
- Consider further cycle reduction in Layer 3 (ACE learning)
- A/B test with ACE learning disabled to measure impact
Notes:
- Ultra SLIM mode is ENV gated (off by default)
- No impact on standard 7-layer path performance
- Statistics tracking implemented but needs verification
- workset=256 tested and verified working
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 06:16:20 +09:00
707365e43b
Build: Remove tracked .d files (now in .gitignore)
...
Cleanup commit: Remove previously tracked dependency files
- core/box/tiny_near_empty_box.d
- core/hakmem_tiny.d
- core/hakmem_tiny_lifecycle.d
- core/hakmem_tiny_unified_stats.d
- hakmem_tiny_unified_stats.d
These files are build artifacts and should not be tracked.
They are now covered by *.d pattern in .gitignore.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 06:12:31 +09:00
eae0435c03
Adaptive CAS: Single-threaded fast path optimization
...
PROBLEM:
- Atomic freelist (Phase 1) introduced 3-5x overhead in hot path
- CAS loop overhead: 16-27 cycles vs 4-6 cycles (non-atomic)
- Single-threaded workloads pay MT safety cost unnecessarily
SOLUTION:
- Runtime thread detection with g_hakmem_active_threads counter
- Single-threaded (1T): Skip CAS, use relaxed load/store (fast)
- Multi-threaded (2+T): Full CAS loop for MT safety
IMPLEMENTATION:
1. core/hakmem_tiny.c:240 - Added g_hakmem_active_threads atomic counter
2. core/hakmem_tiny.c:248 - Added hakmem_thread_register() for per-thread init
3. core/hakmem_tiny.h:160-163 - Exported thread counter and registration API
4. core/box/hak_alloc_api.inc.h:34 - Call hakmem_thread_register() on first alloc
5. core/box/slab_freelist_atomic.h:58-68 - Adaptive CAS in pop_lockfree()
6. core/box/slab_freelist_atomic.h:118-126 - Adaptive CAS in push_lockfree()
DESIGN:
- Thread counter: Incremented on first allocation per thread
- Fast path check: if (num_threads <= 1) → relaxed ops
- Slow path: Full CAS loop (existing Phase 1 implementation)
- Zero overhead when truly single-threaded
PERFORMANCE:
Random Mixed 256B (Single-threaded):
Before (Phase 1): 16.7M ops/s
After: 14.9M ops/s (-11%, thread counter overhead)
Larson (Single-threaded):
Before: 47.9M ops/s
After: 47.9M ops/s (no change, already fast)
Larson (Multi-threaded 8T):
Before: 48.8M ops/s
After: 48.3M ops/s (-1%, within noise)
MT STABILITY:
1T: 47.9M ops/s ✅
8T: 48.3M ops/s ✅ (zero crashes, stable)
NOTES:
- Expected Larson improvement (0.80M → 1.80M) not observed
- Larson was already fast (47.9M) in Phase 1
- Possible Task investigation used different benchmark
- Adaptive CAS implementation verified and working correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 03:30:47 +09:00
2d01332c7a
Phase 1: Atomic Freelist Implementation - MT Safety Foundation
...
PROBLEM:
- Larson crashes with 3+ threads (SEGV in freelist operations)
- Root cause: Non-atomic TinySlabMeta.freelist access under contention
- Race condition: Multiple threads pop/push freelist concurrently
SOLUTION:
- Made TinySlabMeta.freelist and .used _Atomic for MT safety
- Created lock-free accessor API (slab_freelist_atomic.h)
- Converted 5 critical hot path sites to use atomic operations
IMPLEMENTATION:
1. superslab_types.h:12-13 - Made freelist and used _Atomic
2. slab_freelist_atomic.h (NEW) - Lock-free CAS operations
- slab_freelist_pop_lockfree() - Atomic pop with CAS loop
- slab_freelist_push_lockfree() - Atomic push (template)
- Relaxed load/store for non-critical paths
3. ss_slab_meta_box.h - Box API now uses atomic accessor
4. hakmem_tiny_superslab.c - Atomic init (store_relaxed)
5. tiny_refill_opt.h - trc_pop_from_freelist() uses lock-free CAS
6. hakmem_tiny_refill_p0.inc.h - Atomic used increment + prefetch
PERFORMANCE:
Single-Threaded (Random Mixed 256B):
Before: 25.1M ops/s (Phase 3d-C baseline)
After: 16.7M ops/s (-34%, atomic overhead expected)
Multi-Threaded (Larson):
1T: 47.9M ops/s ✅
2T: 48.1M ops/s ✅
3T: 46.5M ops/s ✅ (was SEGV before)
4T: 48.1M ops/s ✅
8T: 48.8M ops/s ✅ (stable, no crashes)
MT STABILITY:
Before: SEGV at 3+ threads (100% crash rate)
After: Zero crashes (100% stable at 8 threads)
DESIGN:
- Lock-free CAS: 6-10 cycles overhead (vs 20-30 for mutex)
- Relaxed ordering: 0 cycles overhead (same as non-atomic)
- Memory ordering: acquire/release for CAS, relaxed for checks
- Expected regression: <3% single-threaded, +MT stability
NEXT STEPS:
- Phase 2: Convert 40 important sites (TLS-related freelist ops)
- Phase 3: Convert 25 cleanup sites (remaining + documentation)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 02:46:57 +09:00
d8168a2021
Fix C7 TLS SLL header restoration regression + Document Larson MT race condition
...
## Bug Fix: Restore C7 Exception in TLS SLL Push
**File**: `core/box/tls_sll_box.h:309`
**Problem**: Commit 25d963a4a (Code Cleanup) accidentally reverted the C7 fix by changing:
```c
if (class_idx != 0 && class_idx != 7) { // CORRECT (commit 8b67718bf )
if (class_idx != 0) { // BROKEN (commit 25d963a4a )
```
**Impact**: C7 (1024B class) header restoration in TLS SLL push overwrote next pointer at base[0], causing corruption.
**Fix**: Restored `&& class_idx != 7` check to prevent header restoration for C7.
**Why C7 Needs Exception**:
- C7 uses offset=0 (stores next pointer at base[0])
- User pointer is at base+1
- Next pointer MUST NOT be overwritten by header restoration
- C1-C6 use offset=1 (next at base[1]), so base[0] header restoration is safe
## Investigation: Larson MT Race Condition (SEPARATE ISSUE)
**Finding**: Larson still crashes with 3+ threads due to UNRELATED multi-threading race condition in unified cache freelist management.
**Root Cause**: Non-atomic freelist operations in `TinySlabMeta`:
```c
typedef struct TinySlabMeta {
void* freelist; // ❌ NOT ATOMIC
uint16_t used; // ❌ NOT ATOMIC
} TinySlabMeta;
```
**Evidence**:
```
1 thread: ✅ PASS (1.88M - 41.8M ops/s)
2 threads: ✅ PASS (24.6M ops/s)
3 threads: ❌ SEGV (race condition)
4+ threads: ❌ SEGV (race condition)
```
**Status**: C7 fix is CORRECT. Larson crash is separate MT issue requiring atomic freelist implementation.
## Documentation Added
Created comprehensive investigation reports:
- `LARSON_CRASH_ROOT_CAUSE_REPORT.md` - Full technical analysis
- `LARSON_DIAGNOSTIC_PATCH.md` - Implementation guide
- `LARSON_INVESTIGATION_SUMMARY.md` - Executive summary
- `LARSON_QUICK_REF.md` - Quick reference
- `verify_race_condition.sh` - Automated verification script
## Next Steps
Implement atomic freelist operations for full MT safety (7-9 hour effort):
1. Make `TinySlabMeta.freelist` atomic with CAS loop
2. Audit 87 freelist access sites
3. Test with Larson 8+ threads
🔧 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com >
2025-11-22 02:15:34 +09:00