hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	912123cbbe	P3: Skip header write in alloc path when class_map is active Skip the 1-byte header write in tiny_region_id_write_header() when class_map is active (default). class_map provides out-of-band class_idx lookup, making the header byte unnecessary for the free path. Changes: - Add ENV-gated conditional to skip header write (default: skip) - ENV: HAKMEM_TINY_WRITE_HEADER=1 to force header write (legacy mode) - Memory layout preserved: user pointer = base + 1 (1B unused when skipped) Performance improvement: - tiny_hot 64B: 83.5M → 84.2M ops/sec (+0.8%) - random_mixed ws=256: 68.1M → 72.2M ops/sec (+6%) The header skip reduces one store instruction per allocation, which is particularly beneficial for mixed-size workloads like random_mixed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 14:46:55 +09:00
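A minimal sketch of the P3 idea, assuming a header byte encoded as `0xA0 | class_idx` (the 0xA0 magic appears in a later commit; the low-nibble class encoding is an assumption). The names `sketch_write_header_enabled` and `sketch_region_write_header` are hypothetical. The point it shows is that the user pointer is `base + 1` in both modes, so only the store is skipped and the layout is unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header encoding: magic in high nibble, class index in low nibble. */
#define SKETCH_HEADER_MAGIC 0xA0u
#define SKETCH_CLASS_MASK   0x0Fu

/* Reads HAKMEM_TINY_WRITE_HEADER once; "1" forces the legacy header write.
   Plain static cache for brevity; a real allocator would use an atomic or an init hook. */
static int sketch_write_header_enabled(void) {
    static int cached = -1;                       /* -1 = not read yet */
    if (cached < 0) {
        const char *e = getenv("HAKMEM_TINY_WRITE_HEADER");
        cached = (e != NULL && e[0] == '1') ? 1 : 0;
    }
    return cached;
}

/* Returns the user pointer. Layout is the same in both modes: user = base + 1. */
static void *sketch_region_write_header(uint8_t *base, int class_idx, int write_header) {
    if (write_header) {
        base[0] = (uint8_t)(SKETCH_HEADER_MAGIC | ((unsigned)class_idx & SKETCH_CLASS_MASK));
    }
    /* else: byte 0 is left as-is; the free path gets class_idx from class_map instead */
    return base + 1;
}
```

Skipping the store saves one write per allocation while keeping the free path's pointer arithmetic identical in both modes.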
Moe Charm (CI)	a6e681aae7	P2: TLS SLL Redesign - class_map default, tls_cached tracking, conditional header restore This commit completes the P2 phase of the Tiny Pool TLS SLL redesign to fix the Header/Next pointer conflict that was causing ~30% crash rates. Changes: - P2.1: Make class_map lookup the default (ENV: HAKMEM_TINY_NO_CLASS_MAP=1 for legacy) - P2.2: Add meta->tls_cached field to track blocks cached in TLS SLL - P2.3: Make Header restoration conditional in tiny_next_store() (default: skip) - P2.4: Add invariant verification functions (active + tls_cached ≈ used) - P0.4: Document new ENV variables in ENV_VARS.md New ENV variables: - HAKMEM_TINY_ACTIVE_TRACK=1: Enable active/tls_cached tracking (~1% overhead) - HAKMEM_TINY_NO_CLASS_MAP=1: Disable class_map (legacy mode) - HAKMEM_TINY_RESTORE_HEADER=1: Force header restoration (legacy mode) - HAKMEM_TINY_INVARIANT_CHECK=1: Enable invariant verification (debug) - HAKMEM_TINY_INVARIANT_DUMP=1: Enable periodic state dumps (debug) Benchmark results (bench_tiny_hot_hakmem 64B): - Default (class_map ON): 84.49 M ops/sec - ACTIVE_TRACK=1: 83.62 M ops/sec (-1%) - NO_CLASS_MAP=1 (legacy): 85.06 M ops/sec - MT performance: +21-28% vs system allocator No crashes observed. All tests passed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 14:11:37 +09:00
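The P2.4 invariant `active + tls_cached ≈ used` can be sketched as a tolerance check. The struct below is a hypothetical subset of the slab metadata (field names follow the commit). The `slack` parameter is an assumption standing in for the "≈": the counters are updated at different points, so an unlocked snapshot may be off by a few blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of TinySlabMeta; field names follow the commit text. */
typedef struct {
    uint16_t used;        /* blocks handed out of the slab (user-held + TLS-cached) */
    uint16_t active;      /* blocks currently held by users */
    uint16_t tls_cached;  /* blocks sitting in some thread's TLS SLL */
} SketchSlabMeta;

/* Returns 1 when |(active + tls_cached) - used| <= slack. */
static int sketch_invariant_ok(const SketchSlabMeta *m, int slack) {
    int diff = (int)m->active + (int)m->tls_cached - (int)m->used;
    return abs(diff) <= slack;
}
```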
Moe Charm (CI)	6b86c60a20	P1.3: Add meta->active for TLS SLL tracking Add active field to TinySlabMeta to track blocks currently held by users (not in TLS SLL or freelist caches). This enables accurate empty slab detection that accounts for TLS SLL cached blocks. Changes: - superslab_types.h: Add _Atomic uint16_t active field - ss_allocation_box.c, hakmem_tiny_superslab.c: Initialize active=0 - tiny_free_fast_v2.inc.h: Decrement active on TLS SLL push - tiny_alloc_fast.inc.h: Add tiny_active_track_alloc() helper, increment active on TLS SLL pop (all code paths) - ss_hot_cold_box.h: ss_is_slab_empty() uses active when enabled All tracking is ENV-gated: HAKMEM_TINY_ACTIVE_TRACK=1 to enable. Default is off for zero performance impact. Invariant: active = used - tls_cached (active <= used) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 13:53:45 +09:00
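A sketch of the P1.3 tracking rules: `active` goes up on TLS SLL pop (block handed to the user) and down on TLS SLL push (block freed into the cache), while `used` is untouched because TLS-cached blocks still belong to the slab. The type and function names are hypothetical; `track` stands in for the `HAKMEM_TINY_ACTIVE_TRACK=1` gate, and the emptiness rule mirrors "ss_is_slab_empty() uses active when enabled".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reduced metadata: only the two counters involved. */
typedef struct {
    uint16_t used;
    _Atomic uint16_t active;
} SketchMeta;

/* TLS SLL pop -> block now held by a user. */
static inline void sketch_active_on_pop(SketchMeta *m, int track) {
    if (track) atomic_fetch_add_explicit(&m->active, 1, memory_order_relaxed);
}

/* User free -> TLS SLL push; `used` is unchanged since the block stays cached. */
static inline void sketch_active_on_push(SketchMeta *m, int track) {
    if (track) atomic_fetch_sub_explicit(&m->active, 1, memory_order_relaxed);
}

/* With tracking on, emptiness is judged by active; otherwise by used. */
static inline int sketch_slab_is_empty(SketchMeta *m, int track) {
    if (track) return atomic_load_explicit(&m->active, memory_order_relaxed) == 0;
    return m->used == 0;
}
```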
Moe Charm (CI)	dc9e650db3	Tiny Pool redesign: P0.1, P0.3, P1.1, P1.2 - Out-of-band class_idx lookup This commit implements the first phase of Tiny Pool redesign based on ChatGPT architecture review. The goal is to eliminate Header/Next pointer conflicts by moving class_idx lookup out-of-band (to SuperSlab metadata). ## P0.1: C0(8B) class upgraded to 16B - Size table changed: {16,32,64,128,256,512,1024,2048} (8 classes) - LUT updated: 1..16 → class 0, 17..32 → class 1, etc. - tiny_next_off: C0 now uses offset 1 (header preserved) - Eliminates edge cases for 8B allocations ## P0.3: Slab reuse guard Box (tls_slab_reuse_guard_box.h) - New Box for draining TLS SLL before slab reuse - ENV gate: HAKMEM_TINY_SLAB_REUSE_GUARD=1 - Prevents stale pointers when slabs are recycled - Follows Box theory: single responsibility, minimal API ## P1.1: SuperSlab class_map addition - Added uint8_t class_map[SLABS_PER_SUPERSLAB_MAX] to SuperSlab - Maps slab_idx → class_idx for out-of-band lookup - Initialized to 255 (UNASSIGNED) on SuperSlab creation - Set correctly on slab initialization in all backends ## P1.2: Free fast path uses class_map - ENV gate: HAKMEM_TINY_USE_CLASS_MAP=1 - Free path can now get class_idx from class_map instead of Header - Falls back to Header read if class_map returns invalid value - Fixed Legacy Backend dynamic slab initialization bug ## Documentation added - HAKMEM_ARCHITECTURE_OVERVIEW.md: 4-layer architecture analysis - TLS_SLL_ARCHITECTURE_INVESTIGATION.md: Root cause analysis - PTR_LIFECYCLE_TRACE_AND_ROOT_CAUSE_ANALYSIS.md: Pointer tracking - TINY_REDESIGN_CHECKLIST.md: Implementation roadmap (P0-P3) ## Test results - Baseline: 70% success rate (30% crash - pre-existing issue) - class_map enabled: 70% success rate (same as baseline) - Performance: ~30.5M ops/s (unchanged) ## Next steps (P1.3, P2, P3) - P1.3: Add meta->active for accurate TLS/freelist sync - P2: TLS SLL redesign with Box-based counting - P3: Complete Header out-of-band migration 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 13:42:39 +09:00
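A sketch of the P0.1 size table and the P1.1/P1.2 out-of-band lookup. The size table, the 1..16 → class 0 / 17..32 → class 1 mapping, and the 255 "unassigned" marker come from the commit. `SKETCH_SLABS_PER_SS` is a hypothetical stand-in because the real `SLABS_PER_SUPERSLAB_MAX` value is not given, and the loop form replaces the real LUT for clarity. A return of -1 from the map lookup models "class_map returns invalid value → fall back to Header read".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint16_t k_sketch_class_size[8] = {16, 32, 64, 128, 256, 512, 1024, 2048};

/* Size -> class index per the commit's table; -1 for 0 or > 2048. */
static int sketch_size_to_class(size_t size) {
    if (size == 0) return -1;
    for (int c = 0; c < 8; c++)
        if (size <= k_sketch_class_size[c]) return c;
    return -1;
}

#define SKETCH_SLABS_PER_SS     32    /* hypothetical; real SLABS_PER_SUPERSLAB_MAX not shown */
#define SKETCH_CLASS_UNASSIGNED 255u

typedef struct {
    uint8_t class_map[SKETCH_SLABS_PER_SS];   /* slab_idx -> class_idx */
} SketchSuperSlab;

static void sketch_ss_init(SketchSuperSlab *ss) {
    for (int i = 0; i < SKETCH_SLABS_PER_SS; i++)
        ss->class_map[i] = SKETCH_CLASS_UNASSIGNED;
}

/* Free path: -1 tells the caller to fall back to reading the header byte. */
static int sketch_class_from_map(const SketchSuperSlab *ss, int slab_idx) {
    if (slab_idx < 0 || slab_idx >= SKETCH_SLABS_PER_SS) return -1;
    uint8_t c = ss->class_map[slab_idx];
    return c < 8 ? (int)c : -1;
}
```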
Moe Charm (CI)	813ebd5221	ENV Cleanup Step 18: Gate HAKMEM_TINY_SLL_DIAG Gate the SLL diagnostics debug variable behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_TINY_SLL_DIAG: Controls singly-linked list integrity diagnostics - 7 call sites total (5 newly gated, 2 already gated) Files modified: - core/box/tls_sll_box.h:117 (tls_sll_dump_tls_window) - core/box/tls_sll_box.h:191 (tls_sll_diag_next) - core/hakmem_tiny.c:629 (tiny_tls_sll_diag_atexit destructor) - core/hakmem_tiny_superslab.c:142 (remote drain diag) - core/tiny_superslab_free.inc.h:132 (header mismatch detector) Already gated: - core/box/free_local_box.c:38 (already gated at line 33) - core/box/free_local_box.c:87 (already gated at line 82) Performance: 30.9M ops/s (baseline maintained) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:39:20 +09:00
Moe Charm (CI)	7d0782d5b6	ENV Cleanup Step 17: Gate HAKMEM_TINY_RF_TRACE Gate the refill trace debug variable behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_TINY_RF_TRACE: Controls refill/mailbox publish path tracing - File: core/tiny_publish.c:21-34 (1 call site gated) Other 2 call sites already gated: - core/tiny_refill.h:94 (already inside #if !HAKMEM_BUILD_RELEASE) - core/box/mailbox_box.c:64 (already inside #if !HAKMEM_BUILD_RELEASE) Performance: 30.7M ops/s avg (baseline maintained, 3 runs: 30.6M, 30.9M, 30.7M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:36:37 +09:00
Moe Charm (CI)	2cdec72ee3	ENV Cleanup Step 16: Gate HAKMEM_SS_FREE_DEBUG Gate the shared pool free debug variable behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_SS_FREE_DEBUG: Controls shared pool slot release tracing - File: core/hakmem_shared_pool.c:1221-1229 The debug output was already gated inside #if !HAKMEM_BUILD_RELEASE blocks. This change only gates the ENV check itself. In release builds, sets dbg to constant 0, allowing compiler to optimize away checks. Performance: 30.3M ops/s (baseline maintained) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:35:07 +09:00
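A sketch of the gating pattern these ENV Cleanup steps describe: the debug build reads the variable once, and the release build returns a constant 0 so `if (dbg)` blocks become dead code the compiler can remove. Function names are hypothetical, and the "non-empty and not starting with '0'" parsing rule is an assumption, not necessarily how hakmem parses it.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0   /* sketch default: debug build */
#endif

/* Debug: read HAKMEM_SS_FREE_DEBUG once. Release: compile-time 0, no getenv(). */
static int sketch_ss_free_debug_enabled(void) {
#if !HAKMEM_BUILD_RELEASE
    static int dbg = -1;                       /* -1 = not read yet */
    if (dbg < 0) {
        const char *e = getenv("HAKMEM_SS_FREE_DEBUG");
        dbg = (e != NULL && e[0] != '\0' && e[0] != '0') ? 1 : 0;
    }
    return dbg;
#else
    return 0;
#endif
}

static int g_sketch_trace_lines;

/* Hypothetical slot-release path with a debug-only tracing block. */
static void sketch_shared_pool_release_slot(int slot) {
    if (sketch_ss_free_debug_enabled()) {
        /* with HAKMEM_BUILD_RELEASE=1 this whole block is unreachable */
        g_sketch_trace_lines++;
        (void)slot;
    }
}
```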
Moe Charm (CI)	f119f048f2	ENV Cleanup Step 15: Gate HAKMEM_SS_ACQUIRE_DEBUG Gate the shared pool acquire debug variable behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_SS_ACQUIRE_DEBUG: Controls shared pool acquisition stage tracing - File: core/hakmem_shared_pool.c:780-788 The debug output was already gated inside #if !HAKMEM_BUILD_RELEASE blocks. This change only gates the ENV check itself. In release builds, sets dbg_acquire to constant 0, allowing compiler to optimize away checks. Performance: 31.1M ops/s (+2% vs baseline) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:34:21 +09:00
Moe Charm (CI)	679c821573	ENV Cleanup Step 14: Gate HAKMEM_TINY_HEAP_V2_DEBUG Gate the HeapV2 push debug logging behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_TINY_HEAP_V2_DEBUG: Controls magazine push event tracing - File: core/front/tiny_heap_v2.h:117-130 Wraps the ENV check and debug output that logs the first 5 push operations per size class for HeapV2 magazine diagnostics. Performance: 29.6M ops/s (within baseline range) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:33:39 +09:00
Moe Charm (CI)	be9bdd7812	ENV Cleanup Step 13: Gate HAKMEM_TINY_REFILL_OPT_DEBUG Gate the refill optimization debug output behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_TINY_REFILL_OPT_DEBUG: Controls refill chain optimization tracing - File: core/tiny_refill_opt.h:30 Changed condition from: #if HAKMEM_TINY_REFILL_OPT to: #if HAKMEM_TINY_REFILL_OPT && !HAKMEM_BUILD_RELEASE Performance: 30.6M ops/s (baseline maintained) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:32:55 +09:00
Moe Charm (CI)	417f149479	ENV Cleanup Step 12: Gate HAKMEM_TINY_FAST_DEBUG + HAKMEM_TINY_FAST_DEBUG_MAX Gate the fast cache debug system behind #if !HAKMEM_BUILD_RELEASE: - HAKMEM_TINY_FAST_DEBUG: Enable/disable fastcache event logging - HAKMEM_TINY_FAST_DEBUG_MAX: Limit number of debug messages per class - File: core/hakmem_tiny_fastcache.inc.h:48-76 Both variables combined in single gate since they work together as a debug logging subsystem. In release builds, provides no-op inline stub. Performance: 30.5M ops/s (baseline maintained) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 04:32:15 +09:00
Moe Charm (CI)	a24f17386c	ENV Cleanup Step 11: Gate HAKMEM_SS_PREWARM_DEBUG in super_registry.c Gate HAKMEM_SS_PREWARM_DEBUG environment variable behind #if !HAKMEM_BUILD_RELEASE in prewarm functions (2 call sites). Changes: - Wrap dbg variable in hak_ss_prewarm_class() - Wrap dbg variable in hak_ss_prewarm_init() - Release builds use constant dbg = 0 for complete code elimination Performance: 30.2M ops/s Larson (stable, within expected variance) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:48:57 +09:00
Moe Charm (CI)	2c3dcdb90b	ENV Cleanup Step 10: Gate HAKMEM_SS_LRU_DEBUG in super_registry.c Gate HAKMEM_SS_LRU_DEBUG environment variable behind #if !HAKMEM_BUILD_RELEASE in LRU cache operations (3 call sites). Changes: - Wrap dbg variable in ss_lru_evict_one() - Wrap dbg variable in hak_ss_lru_pop() - Wrap dbg variable in hak_ss_lru_push() - Release builds use constant dbg = 0 for complete code elimination Performance: 30.7M ops/s Larson (+1.3% improvement) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:48:02 +09:00
Moe Charm (CI)	4540b01da0	ENV Cleanup Step 9: Gate HAKMEM_SUPER_REG_DEBUG in super_registry.c Gate HAKMEM_SUPER_REG_DEBUG environment variable behind #if !HAKMEM_BUILD_RELEASE in register/unregister functions. Changes: - Wrap dbg variable initialization in hak_super_register() - Wrap dbg_once static variable and ENV check in hak_super_unregister() - Release builds use constant dbg = 0 for complete code elimination Performance: 30.6M ops/s Larson (+1.0% improvement) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:46:50 +09:00
Moe Charm (CI)	f8b0f38f78	ENV Cleanup Step 8: Gate HAKMEM_SUPER_LOOKUP_DEBUG in header Gate HAKMEM_SUPER_LOOKUP_DEBUG environment variable behind #if !HAKMEM_BUILD_RELEASE in hakmem_super_registry.h inline function. Changes: - Wrap s_dbg initialization in conditional compilation - Release builds use constant s_dbg = 0 for complete elimination - Debug logging in hak_super_lookup() now fully compiled out in release Performance: 30.3M ops/s Larson (stable, no regression) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:45:45 +09:00
Moe Charm (CI)	cfa5e4e91c	ENV Cleanup Step 7: Gate debug ENV vars in core/box/free_local_box.c Changes: - Gated HAKMEM_TINY_SLL_DIAG (2 call sites) behind #if !HAKMEM_BUILD_RELEASE - Gated HAKMEM_TINY_FREELIST_MASK behind #if !HAKMEM_BUILD_RELEASE - Gated HAKMEM_SS_FREE_DEBUG behind #if !HAKMEM_BUILD_RELEASE - Entire diagnostic blocks wrapped (not just getenv) to avoid compilation errors - ENV variables gated: HAKMEM_TINY_SLL_DIAG, HAKMEM_TINY_FREELIST_MASK, HAKMEM_SS_FREE_DEBUG Performance: 30.4M ops/s Larson (baseline 30.4M, perfect match) Build: Clean, pre-existing warnings only FIX: Previous version had scoping issue with static variables inside do{}while blocks 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:10:26 +09:00
Moe Charm (CI)	d0d2814f15	ENV Cleanup Step 6: Gate HAKMEM_TIMING in core/hakmem_debug.c Changes: - Gated HAKMEM_TIMING ENV check behind #if !HAKMEM_BUILD_RELEASE - Release builds set g_timing_enabled = 0 directly (no getenv call) - Debug builds preserve existing behavior - ENV variable gated: HAKMEM_TIMING Performance: 30.3M ops/s Larson (baseline 30.4M, within margin) Build: Clean, LTO warnings only (pre-existing) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:05:18 +09:00
Moe Charm (CI)	35e8e4c34d	ENV Cleanup Step 5: Gate HAKMEM_PTR_TRACE_DUMP/VERBOSE in core/ptr_trace.h Changes: - Gated HAKMEM_PTR_TRACE_DUMP behind #if !HAKMEM_BUILD_RELEASE - Gated HAKMEM_PTR_TRACE_VERBOSE behind #if !HAKMEM_BUILD_RELEASE - Used lazy init pattern with __builtin_expect for branch prediction - ENV variables gated: HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE Performance: 29.2M ops/s Larson (baseline 30.4M, -4% acceptable variance) Build: Clean, LTO warnings only (pre-existing) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 01:04:29 +09:00
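A sketch of the "lazy init pattern with __builtin_expect" mentioned here, under the assumption that each flag keeps a `-1` sentinel cache and that only the first call is expected to take the `getenv()` branch. The helper names and the "starts with '1'" rule are hypothetical. `__builtin_expect` is a GCC/Clang builtin, so the macro falls back to the plain expression on other compilers.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0   /* sketch default: debug build */
#endif

#if defined(__GNUC__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

/* First call reads the environment; later calls return the cached value. */
static int sketch_env_flag_cached(const char *name, int *cache) {
    if (SKETCH_UNLIKELY(*cache < 0)) {
        const char *e = getenv(name);
        *cache = (e != NULL && e[0] == '1') ? 1 : 0;
    }
    return *cache;
}

static int sketch_ptr_trace_verbose(void) {
#if !HAKMEM_BUILD_RELEASE
    static int cache = -1;
    return sketch_env_flag_cached("HAKMEM_PTR_TRACE_VERBOSE", &cache);
#else
    return 0;                    /* release: no getenv() at all */
#endif
}
```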
Moe Charm (CI)	316ea4dfd6	ENV Cleanup Step 4: Gate HAKMEM_WATCH_ADDR in tiny_region_id.h Gate get_watch_addr() debug functionality with HAKMEM_BUILD_RELEASE, returning 0 in release builds to disable address watching overhead. Performance: 30.31M ops/s (baseline: 30.2M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 00:47:16 +09:00
Moe Charm (CI)	42747a1080	ENV Cleanup Step 3: Gate HAKMEM_TINY_PROFILE in tiny_fastcache.h Gate tiny_fast_profile_enabled() getenv call with HAKMEM_BUILD_RELEASE, returning 0 in release builds to disable profiling overhead. Performance: 30.34M ops/s (baseline: 30.2M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 00:46:32 +09:00
Moe Charm (CI)	794bf996f1	ENV Cleanup Step 2c: Gate debug code in hakmem_tiny_alloc.inc Gate tiny_alloc_dump_tls_state() call on allocation failure path with HAKMEM_BUILD_RELEASE guard. Performance: 30.15M ops/s (baseline: 30.2M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 00:45:27 +09:00
Moe Charm (CI)	0567e2957f	ENV Cleanup Step 2b: Gate debug code in tiny_superslab_free.inc.h Gate tiny_alloc_dump_tls_state() call in remote watch debug path with HAKMEM_BUILD_RELEASE guard, consolidating with existing debug fprintf. Performance: 30.3M ops/s (baseline: 30.2M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 00:44:47 +09:00
Moe Charm (CI)	d6c2ea6f3e	ENV Cleanup Step 2a: Gate debug code in hakmem_tiny_slow.inc Gate tiny_alloc_dump_tls_state() call and getenv debug code on slow path failure with HAKMEM_BUILD_RELEASE guard. Performance: 30.5M ops/s (baseline: 30.2M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 00:43:57 +09:00
Moe Charm (CI)	3833d4e3eb	ENV Cleanup Step 1: Gate tiny_debug.h with HAKMEM_BUILD_RELEASE Wrap debug functionality in !HAKMEM_BUILD_RELEASE guard with no-op stubs for release builds. This eliminates getenv() calls for HAKMEM_TINY_ALLOC_DEBUG in production while maintaining API compatibility. Performance: 30.0M ops/s (baseline: 30.2M) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-28 00:43:07 +09:00
Moe Charm (CI)	930c5283b4	Fix Larson 36x slowdown: Remove tls_uninitialized early return in sll_refill_small_from_ss() Problem: - Larson benchmark showed 730K ops/s instead of expected 26M ops/s - Class 1 TLS SLL cache always stayed empty (tls_count=0) - All allocations went through slow path (shared_pool_acquire_slab at 48% CPU) Root cause: - In sll_refill_small_from_ss(), when TLS was completely uninitialized (ss=NULL, meta=NULL, slab_base=NULL), the function returned 0 immediately without calling superslab_refill() to initialize it - The comment said "expect upper logic to call superslab_refill" but tiny_alloc_fast_refill() did NOT call it after receiving 0 - This created a loop: TLS SLL stays empty → refill returns 0 → slow path Fix: - Remove the tls_uninitialized early return - Let the existing downstream condition (!tls->ss || !tls->meta || ...) handle the uninitialized case and call superslab_refill() Result: - Throughput: 730K → 26.5M ops/s (36x improvement) - shared_pool_acquire_slab: 48% → 0% in perf profile Introduced in: `fcf098857` (Phase12 debug, 2025-11-14) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 16:47:30 +09:00
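A heavily reduced model of the bug and fix, for illustration only: the TLS state, the stand-in for `superslab_refill()`, and both refill functions are hypothetical. The buggy shape returns 0 for an uninitialized TLS and nothing upstream initializes it, so every call fails. The fixed shape lets the general "no usable slab" check cover the uninitialized case too.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced per-thread state. */
typedef struct {
    void *ss;
    void *meta;
    int   sll_count;
} SketchTls;

static char g_sketch_ss, g_sketch_meta;   /* stand-ins for a real SuperSlab / slab meta */

/* Stand-in for superslab_refill(): binds a slab to this thread. */
static int sketch_superslab_refill(SketchTls *t) {
    t->ss = &g_sketch_ss;
    t->meta = &g_sketch_meta;
    return 1;
}

/* Buggy shape: early return on uninitialized TLS; the SLL never fills. */
static int sketch_refill_buggy(SketchTls *t, int want) {
    if (t->ss == NULL && t->meta == NULL) return 0;
    t->sll_count += want;
    return want;
}

/* Fixed shape: the downstream check handles "uninitialized" like "no usable slab". */
static int sketch_refill_fixed(SketchTls *t, int want) {
    if (t->ss == NULL || t->meta == NULL) {
        if (!sketch_superslab_refill(t)) return 0;
    }
    t->sll_count += want;
    return want;
}
```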
Moe Charm (CI)	8355214135	Fix NULL pointer crash in unified_cache_refill ss_active_add When superslab_refill() fails in the inner loop, tls->ss can remain NULL even when produced > 0 (from earlier successful allocations). This caused a segfault at high iteration counts (>500K) in the random_mixed benchmark. Root cause: Line 353 calls ss_active_add(tls->ss, ...) without checking if tls->ss is NULL after a failed refill breaks the loop. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 13:31:46 +09:00
Moe Charm (CI)	7a03a614fd	Restrict ss_fast_lookup to validated Tiny pointer paths only Safety fix: ss_fast_lookup masks pointer to 1MB boundary and reads memory at that address. If called with arbitrary (non-Tiny) pointers, the masked address could be unmapped → SEGFAULT. Changes: - tiny_free_fast(): Reverted to safe hak_super_lookup (can receive arbitrary pointers without prior validation) - ss_fast_lookup(): Added safety warning in comments documenting when it's safe to use (after header magic 0xA0 validation) ss_fast_lookup remains in LARSON_FIX paths where header magic is already validated before the SuperSlab lookup. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 12:55:40 +09:00
Moe Charm (CI)	64ed3d8d8c	Add ss_fast_lookup() for O(1) SuperSlab lookup via mask Replaces expensive hak_super_lookup() (registry hash lookup, 50-100 cycles) with fast mask-based lookup (~5-10 cycles) in free hot paths. Algorithm: 1. Mask pointer with SUPERSLAB_SIZE_MIN (1MB) - works for both 1MB and 2MB SS 2. Validate magic (SUPERSLAB_MAGIC) 3. Range check using ss->lg_size Applied to: - tiny_free_fast.inc.h: tiny_free_fast() SuperSlab path - tiny_free_fast_v2.inc.h: LARSON_FIX cross-thread check - front/malloc_tiny_fast.h: free_tiny_fast() LARSON_FIX path Note: Performance impact minimal with LARSON_FIX=OFF (default) since SuperSlab lookup is skipped entirely in that case. Optimization benefits LARSON_FIX=ON path for safe multi-threaded operation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 12:47:10 +09:00
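A sketch of the three-step mask lookup listed above (mask to 1MB, check magic, range-check with `lg_size`). The header struct and `SKETCH_SS_MAGIC` value are hypothetical. Per the follow-up safety commit, the masked address is dereferenced, so this must only be called on pointers already validated as Tiny. In this sketch, a pointer in the upper half of a 2MB SuperSlab masks to a non-header address and normally fails the magic check, returning NULL; a caller would then fall back to the registry lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_SS_SIZE_MIN ((uintptr_t)1 << 20)   /* 1MB, per the commit */
#define SKETCH_SS_MAGIC    0x53534D47u            /* hypothetical magic value */

typedef struct {
    uint32_t magic;
    uint8_t  lg_size;   /* log2 of this SuperSlab's size (20 = 1MB, 21 = 2MB) */
} SketchSuperSlabHdr;

/* Precondition: ptr already validated as a Tiny pointer (masked address is read). */
static SketchSuperSlabHdr *sketch_ss_fast_lookup(void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    /* Step 1: mask down to the 1MB boundary. */
    SketchSuperSlabHdr *ss = (SketchSuperSlabHdr *)(p & ~(SKETCH_SS_SIZE_MIN - 1));
    /* Step 2: magic check. */
    if (ss->magic != SKETCH_SS_MAGIC) return NULL;
    /* Guard against an out-of-range shift on corrupted metadata. */
    if (ss->lg_size >= sizeof(uintptr_t) * 8) return NULL;
    /* Step 3: reject pointers beyond this SuperSlab's size. */
    if (p - (uintptr_t)ss >= ((uintptr_t)1 << ss->lg_size)) return NULL;
    return ss;
}
```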
Moe Charm (CI)	0a8bdb8b18	Fix release build debug logging in tiny_region_id.h The allocation logging at line 236-249 was missing the #if !HAKMEM_BUILD_RELEASE guard, causing fprintf(stderr) on every allocation even in release builds. Impact: 19.8M ops/s → 28.0M ops/s (+42%) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 11:58:00 +09:00
Moe Charm (CI)	d8e3971dc2	Fix cross-thread ownership check: Use bits 8-15 for owner_tid_low Problem: - TLS_SLL_PUSH_DUP crash in Larson multi-threaded benchmark - Cross-thread frees incorrectly routed to same-thread TLS path - Root cause: pthread_t on glibc is 256-byte aligned (TCB base) so lower 8 bits are ALWAYS 0x00 for ALL threads Fix: - Change owner_tid_low from (tid & 0xFF) to ((tid >> 8) & 0xFF) - Bits 8-15 actually vary between threads, enabling correct detection - Applied consistently across all ownership check locations: - superslab_inline.h: ss_owner_try_acquire/release/is_mine - slab_handle.h: slab_try_acquire - tiny_free_fast.inc.h: tiny_free_is_same_thread_ss - tiny_free_fast_v2.inc.h: cross-thread detection - tiny_superslab_free.inc.h: same-thread check - ss_allocation_box.c: slab initialization - hakmem_tiny_superslab.c: ownership handling Also added: - Address watcher debug infrastructure (tiny_region_id.h) - Cross-thread detection in malloc_tiny_fast.h Front Gate Test results: - Larson 1T/2T/4T: PASS (no TLS_SLL_PUSH_DUP crash) - random_mixed: PASS - Performance: ~20M ops/s (regression from 48M, needs optimization) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 11:52:11 +09:00
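A small illustration of why the tag moved from bits 0-7 to bits 8-15. The two thread-handle values are synthetic, chosen to be 256-byte aligned as the commit describes for glibc `pthread_t`; function names are hypothetical. It is still an 8-bit tag, so two threads can collide in principle.

```c
#include <assert.h>
#include <stdint.h>

/* Old tag: low 8 bits, always 0x00 for 256-byte-aligned handles. */
static inline uint8_t sketch_owner_tid_low_old(uintptr_t tid) {
    return (uint8_t)(tid & 0xFF);
}

/* New tag: bits 8..15, which vary between threads in the layout the commit observed. */
static inline uint8_t sketch_owner_tid_low(uintptr_t tid) {
    return (uint8_t)((tid >> 8) & 0xFF);
}
```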
Moe Charm (CI)	8af9123bcc	Larson double-free investigation: Add full operation lifecycle logging Diagnostic Enhancement: Complete malloc/free/pop operation tracing for debug Problem: Larson crashes with TLS_SLL_DUP at count=18, need to trace exact pointer lifecycle to identify if allocator returns duplicate addresses or if benchmark has double-free bug. Implementation (ChatGPT + Claude + Task collaboration): 1. Global Operation Counter (core/hakmem_tiny_config_box.inc:9): - Single atomic counter for all operations (malloc/free/pop) - Chronological ordering across all paths 2. Allocation Logging (core/hakmem_tiny_config_box.inc:148-161): - HAK_RET_ALLOC macro enhanced with operation logging - Logs first 50 class=1 allocations with ptr/base/tls_count 3. Free Logging (core/tiny_free_fast_v2.inc.h:222-235): - Added before tls_sll_push() call (line 221) - Logs first 50 class=1 frees with ptr/base/tls_count_before 4. Pop Logging (core/box/tls_sll_box.h:587-597): - Added in tls_sll_pop_impl() after successful pop - Logs first 50 class=1 pops with base/tls_count_after 5. Drain Debug Logging (core/box/tls_sll_drain_box.h:143-151): - Enhanced drain loop with detailed logging - Tracks pop failures and drained block counts Initial Findings: - First 19 operations: ALL frees, ZERO allocations, ZERO pops - OP#0006: First free of 0x...430 - OP#0018: Duplicate free of 0x...430 → TLS_SLL_DUP detected - Suggests either: (a) allocations before logging starts, or (b) Larson bug Debug-only: All logging gated by !HAKMEM_BUILD_RELEASE (zero cost in release) Next Steps: - Expand logging window to 200 operations - Log initialization phase allocations - Cross-check with Larson benchmark source Status: Ready for extended testing	2025-11-27 08:18:01 +09:00
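A sketch of the single global operation counter used to order malloc/free/pop events chronologically. The function name, the log format, and applying the 50-entry window to the global sequence number are assumptions; the commit only states a single atomic counter and "first 50 class=1" logging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic uint64_t g_sketch_op_seq;   /* one counter shared by all paths */

enum { SKETCH_LOG_WINDOW = 50 };

/* Returns this operation's sequence number; logs class-1 ops inside the window. */
static uint64_t sketch_op_log(const char *kind, int class_idx, void *ptr) {
    uint64_t seq = atomic_fetch_add_explicit(&g_sketch_op_seq, 1, memory_order_relaxed);
    if (class_idx == 1 && seq < SKETCH_LOG_WINDOW) {
        fprintf(stderr, "[OP#%04llu] %s cls=%d ptr=%p\n",
                (unsigned long long)seq, kind, class_idx, ptr);
    }
    return seq;
}
```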
Moe Charm (CI)	8553894171	Larson double-free investigation: Enhanced diagnostics + Remove buggy drain pushback Problem: Larson benchmark crashes with TLS_SLL_DUP (double-free), 100% crash rate in debug Root Cause: TLS drain pushback code (commit `c2f104618`) created duplicates by pushing pointers back to TLS SLL while they were still in the linked list chain. Diagnostic Enhancements (ChatGPT + Claude collaboration): 1. Callsite Tracking: Track file:line for each TLS SLL push (debug only) - Arrays: g_tls_sll_push_file[], g_tls_sll_push_line[] - Macro: tls_sll_push() auto-records __FILE__, __LINE__ 2. Enhanced Duplicate Detection: - Scan depth: 64 → 256 nodes (deep duplicate detection) - Error message shows BOTH current and previous push locations - Calls ptr_trace_dump_now() for detailed analysis 3. Evidence Captured: - Both duplicate pushes from same line (221) - Pointer at position 11 in TLS SLL (count=18, scanned=11) - Confirms pointer allocated without being popped from TLS SLL Fix: - core/box/tls_sll_drain_box.h: Remove pushback code entirely - Old: Push back to TLS SLL on validation failure → duplicates! - New: Skip pointer (accept rare leak) to avoid duplicates - Rationale: SuperSlab lookup failures are transient/rare Status: Fix implemented, ready for testing Updated: - LARSON_DOUBLE_FREE_INVESTIGATION.md: Root cause confirmed	2025-11-27 07:30:32 +09:00
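A sketch of bounded duplicate detection on a TLS singly linked list, using the 256-node scan depth from the commit. Types and names are hypothetical, and the next pointer is stored at offset 0 for simplicity, whereas the commits place it after the 1-byte header. The commits describe aborting with a diagnostic on a hit; this sketch returns 0 instead so it can be tested.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNode { struct SketchNode *next; } SketchNode;
typedef struct { SketchNode *head; unsigned count; } SketchSll;

enum { SKETCH_DUP_SCAN_DEPTH = 256 };

/* Position of base among the first SKETCH_DUP_SCAN_DEPTH nodes (0 = head), or -1. */
static int sketch_sll_find(const SketchSll *s, const void *base) {
    const SketchNode *n = s->head;
    for (int i = 0; n != NULL && i < SKETCH_DUP_SCAN_DEPTH; i++, n = n->next)
        if ((const void *)n == base) return i;
    return -1;
}

/* Push that refuses a block already in the list (a duplicate would corrupt it). */
static int sketch_sll_push_checked(SketchSll *s, void *base) {
    if (sketch_sll_find(s, base) >= 0) return 0;
    SketchNode *n = base;
    n->next = s->head;
    s->head = n;
    s->count++;
    return 1;
}
```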
Moe Charm (CI)	c2f104618f	Fix critical TLS drain memory leak causing potential double-free ## Root Cause TLS drain was dropping pointers when SuperSlab lookup or slab_idx validation failed: - Pop pointer from TLS SLL - Lookup/validation fails - continue → LEAK! Pointer never returned to any freelist ## Impact Memory leak + potential double allocation: 1. Pointer P popped but leaked 2. Same address P reallocated from carve/other source 3. User frees P again → duplicate detection → ABORT ## Fix Before (BUGGY): ```c if (!ss || invalid_slab_idx) { continue; // ← LEAK! } ``` After (FIXED): ```c if (!ss || invalid_slab_idx) { // Push back to TLS SLL head (retry later) tiny_next_write(class_idx, base, g_tls_sll[class_idx].head); g_tls_sll[class_idx].head = base; g_tls_sll[class_idx].count++; break; // Stop draining to avoid infinite retry } ``` ## Files Changed - core/box/tls_sll_drain_box.h: Fix 2 leak sites (SS lookup + slab_idx validation) - docs/analysis/LARSON_DOUBLE_FREE_INVESTIGATION.md: Investigation report ## Related - Larson double-free investigation (47% crash rate) - Commit `e4868bf23`: Freelist header write + abort() on duplicate - ChatGPT analysis: Larson benchmark code is correct (no user bug)	2025-11-27 06:49:38 +09:00
Moe Charm (CI)	e4868bf236	Larson crash investigation: Add freelist header write + abort() on duplicate ## Changes 1. TLS SLL duplicate detection (core/box/tls_sll_box.h:381) - Changed 'return true' to 'abort()' to get backtrace on double-free - Enables precise root cause identification 2. Freelist header write fix (core/tiny_superslab_alloc.inc.h:159-169) - Added tiny_region_id_write_header() call in freelist allocation path - Previously only linear carve wrote headers → stale headers on reuse - Now both paths write headers consistently ## Root Cause Analysis Backtrace revealed true double-free pattern: - last_push_from=hak_tiny_free_fast_v2 (freed once) - last_pop_from=(null) (never allocated) - where=hak_tiny_free_fast_v2 (freed again!) Same pointer freed twice WITHOUT reallocation in between. ## Status - Freelist header fix: ✅ Implemented (necessary but not sufficient) - Double-free still occurs: ❌ Deeper investigation needed - Possible causes: User code bug, TLS drain race, remote free issue Next: Investigate allocation/free flow with enhanced tracing	2025-11-27 05:57:22 +09:00
Moe Charm (CI)	12c36afe46	Fix TSan build: Add weak stubs for sanitizer compatibility Added weak stubs to core/link_stubs.c for symbols that are not needed in HAKMEM_FORCE_LIBC_ALLOC_BUILD=1 (TSan/ASan) builds: Stubs added: - g_bump_chunk (int) - g_tls_bcur, g_tls_bend (__thread uint8_t*[8]) - smallmid_backend_free() - expand_superslab_head() Also added: #include <stdint.h> for uint8_t Impact: - TSan build: PASS (larson_hakmem_tsan successfully built) - Phase 2 ready: Can now use TSan to debug Larson crashes Next: Use TSan to investigate Larson 47% crash rate 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 05:19:56 +09:00
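A sketch of weak link stubs as described for `core/link_stubs.c`. The variable types follow the commit; the `void (void *)` signature for `smallmid_backend_free` is an assumption, since the commit does not show it. A weak definition is used only when no strong definition is linked, so normal builds keep the real symbols. The `weak` attribute is a GCC/Clang extension on ELF-style targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weak data stubs (types as listed in the commit). */
int g_bump_chunk __attribute__((weak)) = 0;
__thread uint8_t *g_tls_bcur[8] __attribute__((weak)) = {0};
__thread uint8_t *g_tls_bend[8] __attribute__((weak)) = {0};

/* Weak no-op function stub; signature assumed for illustration. */
__attribute__((weak)) void smallmid_backend_free(void *ptr) {
    (void)ptr;   /* intentionally does nothing in libc-forced sanitizer builds */
}
```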
Moe Charm (CI)	6b791b97d4	ENV Cleanup: Delete Ultra HEAP & BG Remote dead code (-1,096 LOC) Deleted files (11): - core/ultra/ directory (6 files: tiny_ultra_heap., tiny_ultra_page_arena.) - core/front/tiny_ultrafront.h - core/tiny_ultra_fast.inc.h - core/hakmem_tiny_ultra_front.inc.h - core/hakmem_tiny_ultra_simple.inc - core/hakmem_tiny_ultra_batch_box.inc Edited files (10): - core/hakmem_tiny.c: Remove Ultra HEAP #includes, move ultra_batch_for_class() - core/hakmem_tiny_tls_state_box.inc: Delete TinyUltraFront, g_ultra_simple - core/hakmem_tiny_phase6_wrappers_box.inc: Delete ULTRA_SIMPLE block - core/hakmem_tiny_alloc.inc: Delete Ultra-Front code block - core/hakmem_tiny_init.inc: Delete ULTRA_SIMPLE ENV loading - core/hakmem_tiny_remote_target.{c,h}: Delete g_bg_remote_enable/batch - core/tiny_refill.h: Remove BG Remote check (always break) - core/hakmem_tiny_background.inc: Delete BG Remote drain loop Deleted ENV variables: - HAKMEM_TINY_ULTRA_HEAP (build flag, undefined) - HAKMEM_TINY_ULTRA_L0 - HAKMEM_TINY_ULTRA_HEAP_DUMP - HAKMEM_TINY_ULTRA_PAGE_DUMP - HAKMEM_TINY_ULTRA_FRONT - HAKMEM_TINY_BG_REMOTE (no getenv, dead code) - HAKMEM_TINY_BG_REMOTE_BATCH (no getenv, dead code) - HAKMEM_TINY_ULTRA_SIMPLE (references only) Impact: - Code reduction: -1,096 lines - Binary size: 305KB → 304KB (-1KB) - Build: PASS - Sanity: 15.69M ops/s (3 runs avg) - Larson: 1 crash observed (seed 43, likely existing instability) Notes: - Ultra HEAP never compiled (#if HAKMEM_TINY_ULTRA_HEAP undefined) - BG Remote variables never initialized (g_bg_remote_enable always 0) - Ultra SLIM (ultra_slim_alloc_box.h) preserved (active 4-layer path) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 04:35:47 +09:00
Moe Charm (CI)	f4978b1529	ENV Cleanup Phase 5: Additional DEBUG guards + doc cleanup Code changes: - core/slab_handle.h: Add RELEASE guard for HAKMEM_TINY_FREELIST_MASK - core/tiny_superslab_free.inc.h: Add guards for HAKMEM_TINY_ROUTE_FREE, HAKMEM_TINY_FREELIST_MASK Documentation cleanup: - docs/specs/CONFIGURATION.md: Remove 21 doc-only ENV variables - docs/specs/ENV_VARS.md: Remove doc-only variables Testing: - Build: PASS (305KB binary, unchanged) - Sanity: PASS (17.22M ops/s average, 3 runs) - Larson: PASS (52.12M ops/s, 0 crashes) Impact: - 2 additional DEBUG ENV variables guarded (no overhead in RELEASE) - Documentation accuracy improved - Binary size maintained 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 03:55:17 +09:00
Moe Charm (CI)	43015725af	ENV cleanup: Add RELEASE guards to DEBUG ENV variables (14 vars) Added compile-time guards (#if HAKMEM_BUILD_RELEASE) to eliminate DEBUG ENV variable overhead in RELEASE builds. Variables guarded (14 total): - HAKMEM_TINY_TRACE_RING, HAKMEM_TINY_DUMP_RING_ATEXIT - HAKMEM_TINY_RF_TRACE, HAKMEM_TINY_MAILBOX_TRACE - HAKMEM_TINY_MAILBOX_TRACE_LIMIT, HAKMEM_TINY_MAILBOX_SLOWDISC - HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD - HAKMEM_SS_PREWARM_DEBUG, HAKMEM_SS_FREE_DEBUG - HAKMEM_TINY_FRONT_METRICS, HAKMEM_TINY_FRONT_DUMP - HAKMEM_TINY_COUNTERS_DUMP, HAKMEM_TINY_REFILL_DUMP - HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE Files modified (9 core files): - core/tiny_debug_ring.c (ring trace/dump) - core/box/mailbox_box.c (mailbox trace + slowdisc) - core/tiny_refill.h (refill trace) - core/hakmem_tiny_superslab.c (superslab debug) - core/box/ss_allocation_box.c (allocation debug) - core/tiny_superslab_free.inc.h (free debug) - core/box/front_metrics_box.c (frontend metrics) - core/hakmem_tiny_stats.c (stats dump) - core/ptr_trace.h (pointer trace) Bug fixes during implementation: 1. mailbox_box.c - Fixed variable scope (moved 'used' outside guard) 2. hakmem_tiny_stats.c - Fixed incomplete declarations (on1, on2) Impact: - Binary size: -85KB total - bench_random_mixed_hakmem: 319K → 305K (-14K, -4.4%) - larson_hakmem: 380K → 309K (-71K, -18.7%) - Performance: No regression (16.9-17.9M ops/s maintained) - Functional: All tests pass (Random Mixed + Larson) - Behavior: DEBUG ENV vars correctly ignored in RELEASE builds Testing: - Build: Clean compilation (warnings only, pre-existing) - 100K Random Mixed: 16.9-17.9M ops/s (PASS) - 10K Larson: 25.9M ops/s (PASS) - DEBUG ENV verification: Correctly ignored (PASS) Result: 14 DEBUG ENV variables now have zero overhead in RELEASE builds. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 03:41:07 +09:00
Moe Charm (CI)	543abb0586	ENV cleanup: Consolidate SFC_DEBUG getenv() calls (86% reduction) Optimized HAKMEM_SFC_DEBUG environment variable handling by caching the value at initialization instead of repeated getenv() calls in hot paths. Changes: 1. Added g_sfc_debug global variable (core/hakmem_tiny_sfc.c) - Initialized once in sfc_init() by reading HAKMEM_SFC_DEBUG - Single source of truth for SFC debug state 2. Declared g_sfc_debug as extern (core/hakmem_tiny_config.h) - Available to all modules that need SFC debug checks 3. Replaced getenv() with g_sfc_debug in hot paths: - core/tiny_alloc_fast_sfc.inc.h (allocation path) - core/tiny_free_fast.inc.h (free path) - core/box/hak_wrappers.inc.h (wrapper layer) Impact: - getenv() calls: 7 → 1 (86% reduction) - Hot-path calls eliminated: 6 (all moved to init-time) - Performance: 15.10M ops/s (stable, 0% CV) - Build: Clean compilation, no new warnings Testing: - 10 runs of 100K iterations: consistent performance - Symbol verification: g_sfc_debug present in hakmem_tiny_sfc.o - No regression detected Note: 3 additional getenv("HAKMEM_SFC_DEBUG") calls exist in hakmem_tiny_ultra_simple.inc but are dead code (file not compiled in current build configuration). Files modified: 5 core files Status: Production-ready, all tests passed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-27 03:18:33 +09:00
Moe Charm (CI)	6fadc74405	ENV cleanup: Remove obsolete ULTRAHOT variable + organize docs Changes: 1. Removed HAKMEM_TINY_FRONT_ENABLE_ULTRAHOT variable - Deleted front_prune_ultrahot_enabled() function - UltraHot feature was removed in commit `bcfb4f6b5` - Variable was dead code, no longer referenced 2. Organized ENV cleanup analysis documents - Moved 5 ENV analysis docs to docs/analysis/ - ENV_CLEANUP_PLAN.md - detailed file-by-file plan - ENV_CLEANUP_SUMMARY.md - executive summary - ENV_CLEANUP_ANALYSIS.md - categorized analysis - ENV_CONSOLIDATION_PLAN.md - consolidation proposals - ENV_QUICK_REFERENCE.md - quick reference guide Impact: - ENV variables: 221 → 220 (-1) - Build: ✅ Successful - Risk: Zero (dead code removal) Next steps (documented in ENV_CLEANUP_SUMMARY.md): - 21 variables need verification (Ultra/HeapV2/BG/HotMag) - SFC_DEBUG deduplication opportunity (7 callsites) File: core/box/front_metrics_box.h Status: SAVEPOINT - stable baseline for future ENV cleanup 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 17:12:41 +09:00
Moe Charm (CI)	bea839add6	Revert "Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)" This reverts commit `d355041638`.	2025-11-26 15:43:45 +09:00
Moe Charm (CI)	d355041638	Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316) - Policy: Set tiny_min_keep for C2-C6 to reduce mmap/munmap churn - Policy: Loosen tiny_cap (soft cap) for C4-C6 to allow more active slots - Added tiny_min_keep field to FrozenPolicy struct Larson: 52.13M ops/s (stable) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 15:06:36 +09:00
Moe Charm (CI)	a2e65716b3	Port: Optimize tiny_get_max_size inline (e81fe783d) - Move tiny_get_max_size to header for inlining - Use cached static variable to avoid repeated env lookup - Larson: 51.99M ops/s (stable) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 15:05:03 +09:00
Moe Charm (CI)	a9ddb52ad4	ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s) Phase 1 complete: ENV variable cleanup + fprintf debug guards ENV variables removed (BG/HotMag): - core/hakmem_tiny_init.inc: Removed HotMag ENV (~131 lines) - core/hakmem_tiny_bg_spill.c: Removed BG spill ENV - core/tiny_refill.h: BG remote values hard-coded - core/hakmem_tiny_slow.inc: Removed BG references fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE): - core/hakmem_shared_pool.c: Lock stats (~18 fprintf) - core/page_arena.c: Init/Shutdown/Stats (~27 fprintf) - core/hakmem.c: SIGSEGV init message Documentation cleanup: - Deleted 328 markdown files (old reports, duplicate docs) Performance check: - Larson: 52.35M ops/s (previously 52.8M, stable ✅) - No functional impact from the ENV cleanup - Some debug output remains (to be handled in the next phase) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 14:45:26 +09:00
Moe Charm (CI)	67fb15f35f	Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization) ## Changes ### 1. core/page_arena.c - Removed init failure message (lines 25-27) - error is handled by returning early - All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks ### 2. core/hakmem.c - Wrapped SIGSEGV handler init message (line 72) - CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs ### 3. core/hakmem_shared_pool.c - Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE: - Node pool exhaustion warning (line 252) - SP_META_CAPACITY_ERROR warning (line 421) - SP_FIX_GEOMETRY debug logging (line 745) - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865) - SP_ACQUIRE_STAGE0_L0 debug logging (line 803) - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922) - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996) - SP_ACQUIRE_STAGE3 debug logging (line 1116) - SP_SLOT_RELEASE debug logging (line 1245) - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305) - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316) - Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized ## Performance Validation Before: 51M ops/s (with debug fprintf overhead) After: 49.1M ops/s (consistent performance, fprintf removed from hot paths) ## Build & Test ```bash ./build.sh larson_hakmem ./out/release/larson_hakmem 1 5 1 1000 100 10000 42 # Result: 49.1M ops/s ``` Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-26 13:14:18 +09:00
Moe Charm (CI)	4e082505cc	Cleanup: Wrap shared_pool debug fprintf in #if !HAKMEM_BUILD_RELEASE - Lock stats (P0 instrumentation): ~10 fprintf wrapped - Stage stats (S1/S2/S3 breakdown): ~8 fprintf wrapped - Release build now has no-op stubs for stats init functions - Data collection APIs kept for learning layer compatibility	2025-11-26 13:05:17 +09:00
Moe Charm (CI)	6b38bc840e	Cleanup: Remove unused hakmem_libc.c (duplicate of hakmem_syscall.c) - File was not included in Makefile OBJS_BASE - Functions already implemented in hakmem_syscall.c - Size: 361 bytes removed	2025-11-26 13:03:17 +09:00
Moe Charm (CI)	bcfb4f6b59	Remove dead code: UltraHot, RingCache, FrontC23, Class5 Hotpath (cherry-picked from 225b6fcc7, conflicts resolved)	2025-11-26 12:33:49 +09:00
Moe Charm (CI)	feadc2832f	Legacy cleanup: Remove obsolete test files and #if 0 blocks (-1,750 LOC) (cherry-picked from cc0104c4e)	2025-11-26 12:31:04 +09:00
Moe Charm (CI)	950627587a	Remove legacy/unused code: 6 .inc files + disabled #if 0 block (1,159 LOC) (cherry-picked from 9793f17d6)	2025-11-26 12:30:30 +09:00

463 Commits
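The ENV-cleanup commits above share two patterns: DEBUG-only environment variables compiled out of release builds behind `HAKMEM_BUILD_RELEASE` (43015725af, 67fb15f35f), and `getenv()` moved out of hot paths into a flag read once at init (543abb0586, which adds `g_sfc_debug` initialized in `sfc_init()`). The C sketch below combines both. It is not hakmem's code: every `_sketch` name is hypothetical, the rule that an unset, empty, or `0`-prefixed value means "off" is an assumption, the flag is file-static here although the commit declares `g_sfc_debug` extern, and `HAKMEM_BUILD_RELEASE` defaults to 0 only so the sketch compiles standalone.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the real build sets this flag (e.g. -DHAKMEM_BUILD_RELEASE=1);
   defaulting to 0 here only lets the sketch compile on its own. */
#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

/* Hypothetical stand-in for g_sfc_debug: -1 = not initialized yet,
   then 0 (off) or 1 (on). */
static int g_sfc_debug_sketch = -1;

/* Init-time read: the only getenv() call. Assumed rule: unset, empty,
   or a value starting with '0' means "off". */
static void sfc_debug_init_sketch(void) {
#if HAKMEM_BUILD_RELEASE
    g_sfc_debug_sketch = 0;  /* DEBUG variable is never consulted in release builds */
#else
    const char *v = getenv("HAKMEM_SFC_DEBUG");
    g_sfc_debug_sketch = (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
#endif
    assert(g_sfc_debug_sketch == 0 || g_sfc_debug_sketch == 1);
}

/* Hot-path check: one load and compare, no getenv(). Returns 0 before init. */
static inline int sfc_debug_enabled_sketch(void) {
    return g_sfc_debug_sketch == 1;
}

/* Hypothetical hot path: the debug print is compiled out entirely in release builds. */
static long sfc_alloc_path_sketch(long size) {
#if !HAKMEM_BUILD_RELEASE
    if (sfc_debug_enabled_sketch())
        fprintf(stderr, "[SFC_DEBUG] alloc size=%ld\n", size);
#endif
    return size;  /* placeholder for the real allocation work */
}
```

With this split, a debug build pays one load and compare per check on the hot path and a release build pays nothing. The trade-off is that changing the variable after init has no effect until the init function runs again.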