Second freelist path identified by Task exploration agent:
- tiny_drain_freelist_to_sll_once() in hakmem_tiny_free.inc
- Activated via HAKMEM_TINY_DRAIN_TO_SLL environment variable
- Pops blocks from the freelist without restoring their headers
  before the tls_sll_push() call
Fix applied:
1. Added HEADER_MAGIC restoration before tls_sll_push()
in tiny_drain_freelist_to_sll_once() (lines 74-79)
2. Added tiny_region_id.h include for HEADER_MAGIC definition
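A minimal sketch of the fix shape, assuming a 1-byte header at the block base
that encodes HEADER_MAGIC plus the class index (freelist_pop(), the fl handle,
and the tls_sll_push() argument order are illustrative):

```c
#include <stdint.h>
#include "tiny_region_id.h"   /* HEADER_MAGIC */

/* Drain loop after the fix: freelist linkage clobbers the header byte,
 * so restore it before every push to the TLS SLL. */
static void drain_to_sll(void *fl, unsigned class_idx) {
    void *blk;
    while ((blk = freelist_pop(fl)) != NULL) {                     /* hypothetical pop */
        ((uint8_t *)blk)[0] = (uint8_t)(HEADER_MAGIC | class_idx); /* restore header */
        tls_sll_push(class_idx, blk);                              /* now sees a valid header */
    }
}
```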
This completes the header restoration fixes for all known
freelist → TLS SLL code paths:
1. box_carve_and_push_with_freelist() ✓ (commit 3c6c76cb1)
2. tiny_drain_freelist_to_sll_once() ✓ (this commit)
Expected result:
- Eliminates remaining 4-thread header corruption error
- All freelist blocks now have valid headers before TLS SLL push
Note: Encountered segfault in larson_hakmem during testing,
but this appears to be a pre-existing issue unrelated to
header restoration fixes (verified by testing without changes).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
1. Archive unused backend files (ss_legacy_backend_box and ss_unified_backend_box, .c/.h)
- These files were not linked in the build
- Moved to archive/ to reduce confusion
2. Created HAK_RET_ALLOC_BLOCK macro for SuperSlab allocations
- Replaces superslab_return_block() function
- Consistent with existing HAK_RET_ALLOC pattern
- Single source of truth for header writing
- Defined in hakmem_tiny_superslab_internal.h
3. Added header validation on TLS SLL push
- Detects blocks pushed without proper header
- Enabled via HAKMEM_TINY_SLL_VALIDATE_HDR=1 (release)
- Always on in debug builds
- Logs first 10 violations with backtraces
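A sketch of both pieces; the macro body, header_is_valid(), and the logger are
assumptions beyond what this log states (the user pointer = base + 1 layout is
taken from the header-skip commit elsewhere in this log):

```c
/* Single source of truth for header writing; expand only in functions
 * returning void*. */
#define HAK_RET_ALLOC_BLOCK(base, cls)                           \
    do {                                                         \
        tiny_region_id_write_header((uint8_t *)(base), (cls));   \
        return (void *)((uint8_t *)(base) + 1);                  \
    } while (0)

/* Push-time validation: env-gated in release, always on in debug. */
static inline void tls_sll_validate_hdr(void *base, unsigned cls) {
    static _Atomic int violations;
    if (g_validate_hdr && !header_is_valid(base, cls))     /* assumed helpers */
        if (atomic_fetch_add(&violations, 1) < 10)
            log_violation_with_backtrace(base, cls);       /* hypothetical logger */
}
```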
Benefits:
- Easier to track allocation paths
- Catches header bugs at push time
- More maintainable macro-based design
Note: Larson bug still reproduces - header corruption occurs
before push validation can catch it.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major refactoring to improve maintainability and debugging:
1. Split hakmem_tiny_superslab.c (1521 lines) into 7 focused files:
- superslab_allocate.c: SuperSlab allocation/deallocation
- superslab_backend.c: Backend allocation paths (legacy, shared)
- superslab_ace.c: ACE (Adaptive Cache Engine) logic
- superslab_slab.c: Slab initialization and bitmap management
- superslab_cache.c: LRU cache and prewarm cache management
- superslab_head.c: SuperSlabHead management and expansion
- superslab_stats.c: Statistics tracking and debugging
2. Created hakmem_tiny_superslab_internal.h for shared declarations
3. Added superslab_return_block() as single exit point for header writing:
- All backend allocations now go through this helper
- Prevents bugs where headers are forgotten in some paths
- Makes future debugging easier
4. Updated Makefile for new file structure
5. Added header writing to ss_legacy_backend_box.c and
ss_unified_backend_box.c (though not currently linked)
Note: Header corruption bug in Larson benchmark still exists.
Class 1-6 allocations go through TLS refill/carve paths, not backend.
Further investigation needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- smallmid_is_in_range(): Add __attribute__((always_inline))
- mid_is_in_range(): Add __attribute__((always_inline))
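Shape of the change (the attribute is the point; the range-check body is an
illustrative stand-in):

```c
#include <stdint.h>

static inline __attribute__((always_inline))
int smallmid_is_in_range(const void *p) {
    extern uintptr_t g_smallmid_base, g_smallmid_end;   /* assumed globals */
    return (uintptr_t)p >= g_smallmid_base && (uintptr_t)p < g_smallmid_end;
}
```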
Expected: Reduce function call overhead in Front Gate routing
Result: Neutral performance (~72M ops/s, same as Phase 1 final)
Analysis:
- Compiler was already inlining these simple functions with -O3 -flto
- 36M branches identified by perf are NOT from Front Gate routing
- Most branches are inside allocators (tiny_alloc, free, etc.)
- Front Gate optimization had minimal impact, as predicted
Next: SuperSlab size optimization (clear 3-5% benefit expected)
Files:
- core/hakmem_smallmid.h:116-119
- core/hakmem_mid_mt.h:228-231
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem: A large volume of debug logging in release builds was causing
performance degradation in benchmarks (ChatGPT reported 0.73M ops/s vs the
expected 70M+).
Solution: Guard the Ultra SLIM gate debug log with #if !HAKMEM_BUILD_RELEASE.
The log printed once per thread, which is acceptable in debug builds but
should be silent in production.
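The guard pattern applied (message text illustrative):

```c
#if !HAKMEM_BUILD_RELEASE
    fprintf(stderr, "[hakmem] Ultra SLIM gate enabled\n");
#endif
/* In release builds the call is compiled out entirely: no branch, no I/O. */
```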
Performance impact: Logs now suppressed in release builds, reducing I/O
overhead during benchmarks.
Refactored the extremely compressed line 312 (previously 600+ chars) into
properly indented, readable code while preserving identical logic:
- Broke down TLS local freelist spill operation into clear steps
- Added clarifying comment for spill operation
- Improved atomic CAS loop formatting
- No functional changes, only formatting improvements
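A sketch of the spill shape after reformatting, assuming a lock-free singly
linked global freelist (g_freelist_head and the function name are illustrative):

```c
#include <stdatomic.h>

static _Atomic(void *) g_freelist_head;   /* stand-in for the shared list head */

/* Splice the TLS-local chain [head .. tail] onto the shared freelist.
 * On CAS failure old is reloaded, so the tail link is rewritten on each
 * retry; a push-only splice is ABA-safe since old is never dereferenced. */
static void tls_spill_chain(void *head, void *tail) {
    void *old = atomic_load_explicit(&g_freelist_head, memory_order_relaxed);
    do {
        *(void **)tail = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &g_freelist_head, &old, head,
                 memory_order_release, memory_order_relaxed));
}
```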
Performance verified: 16-18M ops/s maintained (same as before)
Replace hardcoded values with named constants for better maintainability:
- ELO_MAX_CPU_NS = 100000.0 (100 microseconds)
- ELO_MAX_PAGE_FAULTS = 1000.0
- ELO_MAX_BYTES_LIVE = 100000000.0 (100 MB)
These constants define the normalization range for ELO score computation.
Moving them to file scope makes them easier to tune and document.
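The constants and the normalization they bound; the clamp is an assumption
about how out-of-range samples are treated:

```c
static const double ELO_MAX_CPU_NS      = 100000.0;      /* 100 microseconds */
static const double ELO_MAX_PAGE_FAULTS = 1000.0;
static const double ELO_MAX_BYTES_LIVE  = 100000000.0;   /* 100 MB */

static double elo_normalize(double value, double max) {
    double x = value / max;
    return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);   /* clamp to [0, 1] */
}
```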
Performance: No change (70.1M ops/s average)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove obsolete comment line referencing deleted Phase E5 code.
The actual code was already removed in the 2025-11-27 cleanup.
Performance: No change (69.7M ops/s)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add unified stats/dump control that allows enabling specific stats
modules using comma-separated values or "all" to enable everything.
New file: core/hakmem_stats_master.h
- HAKMEM_STATS=all: Enable all stats modules
- HAKMEM_STATS=sfc,fast,pool: Enable specific modules
- HAKMEM_STATS_DUMP=1: Dump stats at exit
- hak_stats_check(): Check if module should enable stats
Available stats modules:
sfc, fast, heap, refill, counters, ring, invariant,
pagefault, front, pool, slim, guard, nearempty
Updated files:
- core/hakmem_tiny_sfc.c: Use hak_stats_check() for SFC stats
- core/hakmem_shared_pool.c: Use hak_stats_check() for pool stats
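A plausible shape for hak_stats_check(); result caching and exact matching
semantics are assumptions:

```c
#include <stdlib.h>
#include <string.h>

static int hak_stats_check(const char *module) {
    const char *v = getenv("HAKMEM_STATS");
    if (!v) return 0;
    if (strcmp(v, "all") == 0) return 1;
    size_t n = strlen(module);
    for (const char *p = v; (p = strstr(p, module)) != NULL; p += n) {
        int starts = (p == v) || (p[-1] == ',');      /* token boundary before */
        int ends   = (p[n] == '\0') || (p[n] == ','); /* token boundary after  */
        if (starts && ends) return 1;
    }
    return 0;
}
```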
Performance: No regression (72.9M ops/s)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add unified trace control that allows enabling specific trace modules
using comma-separated values or "all" to enable everything.
New file: core/hakmem_trace_master.h
- HAKMEM_TRACE=all: Enable all trace modules
- HAKMEM_TRACE=ptr,refill,free,mailbox: Enable specific modules
- HAKMEM_TRACE_LEVEL=N: Set trace verbosity (1-3)
- hak_trace_check(): Check if module should enable tracing
Available trace modules:
ptr, refill, superslab, ring, free, mailbox, registry
Priority order:
1. HAKMEM_QUIET=1 → suppress all
2. Specific module ENV (e.g., HAKMEM_PTR_TRACE=1)
3. HAKMEM_TRACE=module1,module2
4. Default → disabled
Updated files:
- core/tiny_refill.h: Use hak_trace_check() for refill tracing
- core/box/mailbox_box.c: Use hak_trace_check() for mailbox tracing
Performance: No regression (72.9M ops/s)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add centralized debug control system that allows enabling all debug
modules at once, while maintaining backwards compatibility with
individual module ENVs.
New file: core/hakmem_debug_master.h
- HAKMEM_DEBUG_ALL=1: Enable all debug modules
- HAKMEM_DEBUG_LEVEL=N: Set debug level (0=off, 1=critical, 2=normal, 3=verbose)
- HAKMEM_QUIET=1: Suppress all debug (highest priority)
- hak_debug_check(): Check if module should enable debug
- hak_is_quiet(): Quick check for quiet mode
Priority order:
1. HAKMEM_QUIET=1 → suppress all
2. Specific module ENV (e.g., HAKMEM_SFC_DEBUG=1)
3. HAKMEM_DEBUG_ALL=1
4. HAKMEM_DEBUG_LEVEL >= threshold
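The priority chain as code (sketch; env_is_1() is a hypothetical helper, and
the level threshold is shown as a parameter):

```c
#include <stdlib.h>

static int env_is_1(const char *name) {
    const char *v = getenv(name);
    return v && v[0] == '1';
}

static int hak_debug_check(const char *module_env, int level_threshold) {
    if (env_is_1("HAKMEM_QUIET"))     return 0;   /* 1. quiet wins       */
    if (env_is_1(module_env))         return 1;   /* 2. per-module ENV   */
    if (env_is_1("HAKMEM_DEBUG_ALL")) return 1;   /* 3. global switch    */
    const char *lvl = getenv("HAKMEM_DEBUG_LEVEL");
    return lvl && atoi(lvl) >= level_threshold;   /* 4. level threshold  */
}
```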
Updated files:
- core/hakmem_elo.c: Use hak_is_quiet() instead of local implementation
- core/hakmem_shared_pool.c: Use hak_debug_check() for lock stats
Performance: No regression (71.5M ops/s maintained)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Critical hot path fix: hakmem_elo.c was calling getenv("HAKMEM_QUIET")
10+ times inside loops, causing 50-100μs overhead per iteration.
Fix: Cache the flag in a static variable with lazy initialization.
- Added is_quiet() helper function with __builtin_expect optimization
- Replaced all 10 inline getenv() calls with is_quiet()
- First call initializes, subsequent calls are just a branch
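The pattern, as described (the race on first call is benign because the
initialization is idempotent):

```c
#include <stdlib.h>

static inline int is_quiet(void) {
    static int cached = -1;                        /* -1 = not yet read */
    if (__builtin_expect(cached < 0, 0))           /* cold path, once   */
        cached = (getenv("HAKMEM_QUIET") != NULL);
    return cached;                                 /* hot path: one branch */
}
```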
This is part of the ENV variable cleanup effort identified by the survey:
- Total ENV variables: 228 (target: ~80)
- getenv() calls in hot paths: CRITICAL issue
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem: Class 0 (8B stride) was using offset 1 for next pointer storage,
but 8B stride cannot fit [1B header][8B next pointer] - it overflows by 1 byte
into the adjacent block.
Fix: Use offset 0 for C0 (same as C7), allowing the header to be overwritten.
This is safe because:
1. class_map provides out-of-band class_idx lookup (header not needed for free)
2. P3 skips header write by default (header byte is unused anyway)
Optimization: Replace branching with bitmask lookup for zero-cost abstraction.
- Old: (class_idx == 0 || class_idx == 7) ? 0u : 1u (branch)
- New: (0x7Eu >> class_idx) & 1u (branchless)
Bit pattern: C0=0, C1-C6=1, C7=0 → 0b01111110 = 0x7E
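The lookup as code (function name is illustrative; the expression is the one
above):

```c
/* Next-pointer offset for class i is bit i of 0x7E. */
static inline unsigned tiny_nextptr_offset(unsigned class_idx) {
    return (0x7Eu >> class_idx) & 1u;   /* C0=0, C1..C6=1, C7=0 */
}
```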
Performance results:
- 8B: 85.19M → 85.61M (+0.5%)
- 16B: 137.43M → 147.31M (+7.2%)
- 64B: 84.21M → 84.90M (+0.8%)
Thanks to ChatGPT for spotting the g_tiny_class_sizes vs tiny_nextptr.h mismatch!
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Skip the 1-byte header write in tiny_region_id_write_header() when class_map
is active (default). class_map provides out-of-band class_idx lookup, making
the header byte unnecessary for the free path.
Changes:
- Add ENV-gated conditional to skip header write (default: skip)
- ENV: HAKMEM_TINY_WRITE_HEADER=1 to force header write (legacy mode)
- Memory layout preserved: user pointer = base + 1 (1B unused when skipped)
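A sketch of the gate; the cached flag, signature, and header encoding are
assumptions:

```c
#include <stdint.h>

extern int g_write_header;   /* cached HAKMEM_TINY_WRITE_HEADER (assumed) */

static inline void tiny_region_id_write_header(uint8_t *base, unsigned cls) {
    if (__builtin_expect(g_write_header, 0))       /* legacy mode only */
        base[0] = (uint8_t)(HEADER_MAGIC | cls);   /* assumed encoding */
    /* default: the byte at base stays unused; class_map supplies cls */
}
```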
Performance improvement:
- tiny_hot 64B: 83.5M → 84.2M ops/sec (+0.8%)
- random_mixed ws=256: 68.1M → 72.2M ops/sec (+6%)
The header skip saves one store instruction per allocation, which is
particularly beneficial for mixed-size workloads like random_mixed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add active field to TinySlabMeta to track blocks currently held by
users (not in TLS SLL or freelist caches). This enables accurate
empty slab detection that accounts for TLS SLL cached blocks.
Changes:
- superslab_types.h: Add _Atomic uint16_t active field
- ss_allocation_box.c, hakmem_tiny_superslab.c: Initialize active=0
- tiny_free_fast_v2.inc.h: Decrement active on TLS SLL push
- tiny_alloc_fast.inc.h: Add tiny_active_track_alloc() helper,
increment active on TLS SLL pop (all code paths)
- ss_hot_cold_box.h: ss_is_slab_empty() uses active when enabled
All tracking is ENV-gated: HAKMEM_TINY_ACTIVE_TRACK=1 to enable.
Default is off for zero performance impact.
Invariant: active = used - tls_cached (active <= used)
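Sketch of the ENV-gated counter updates (pop hands a block to the user, push
takes it back):

```c
#include <stdatomic.h>
#include <stdint.h>

extern int g_active_track;   /* cached HAKMEM_TINY_ACTIVE_TRACK (assumed) */

static inline void tiny_active_track_alloc(_Atomic uint16_t *active) {
    if (g_active_track)
        atomic_fetch_add_explicit(active, 1, memory_order_relaxed);
}

static inline void tiny_active_track_free(_Atomic uint16_t *active) {
    if (g_active_track)
        atomic_fetch_sub_explicit(active, 1, memory_order_relaxed);
}
```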
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Gate the shared pool acquire debug variable behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_SS_ACQUIRE_DEBUG: Controls shared pool acquisition stage tracing
- File: core/hakmem_shared_pool.c:780-788
The debug output itself was already gated inside #if !HAKMEM_BUILD_RELEASE
blocks; this change gates the ENV check as well. In release builds,
dbg_acquire is set to a constant 0, allowing the compiler to optimize the
checks away.
Performance: 31.1M ops/s (+2% vs baseline)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Gate the HeapV2 push debug logging behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_HEAP_V2_DEBUG: Controls magazine push event tracing
- File: core/front/tiny_heap_v2.h:117-130
Wraps the ENV check and debug output that logs the first 5 push
operations per size class for HeapV2 magazine diagnostics.
Performance: 29.6M ops/s (within baseline range)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Gate the fast cache debug system behind #if !HAKMEM_BUILD_RELEASE:
- HAKMEM_TINY_FAST_DEBUG: Enable/disable fastcache event logging
- HAKMEM_TINY_FAST_DEBUG_MAX: Limit number of debug messages per class
- File: core/hakmem_tiny_fastcache.inc.h:48-76
Both variables are combined in a single gate since they work together as a
debug logging subsystem. In release builds, a no-op inline stub is provided.
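Shape of the stub arrangement (the logger name is hypothetical):

```c
#if !HAKMEM_BUILD_RELEASE
void fastcache_debug_log(int cls, const char *event);   /* real logger */
#else
static inline void fastcache_debug_log(int cls, const char *event) {
    (void)cls; (void)event;   /* no-op: no getenv, no I/O in release */
}
#endif
```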
Performance: 30.5M ops/s (baseline maintained)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Wrap debug functionality in !HAKMEM_BUILD_RELEASE guard with no-op stubs
for release builds. This eliminates getenv() calls for HAKMEM_TINY_ALLOC_DEBUG
in production while maintaining API compatibility.
Performance: 30.0M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem:
- Larson benchmark showed 730K ops/s instead of expected 26M ops/s
- Class 1 TLS SLL cache always stayed empty (tls_count=0)
- All allocations went through slow path (shared_pool_acquire_slab at 48% CPU)
Root cause:
- In sll_refill_small_from_ss(), when TLS was completely uninitialized
(ss=NULL, meta=NULL, slab_base=NULL), the function returned 0 immediately
without calling superslab_refill() to initialize it
- The comment said "expect upper logic to call superslab_refill" but
tiny_alloc_fast_refill() did NOT call it after receiving 0
- This created a loop: TLS SLL stays empty → refill returns 0 → slow path
Fix:
- Remove the tls_uninitialized early return
- Let the existing downstream condition (!tls->ss || !tls->meta || ...)
handle the uninitialized case and call superslab_refill()
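Before/after shape of the change (field names from the text above; the return
convention of superslab_refill() is assumed):

```c
/* Before: a fresh TLS bailed out here, and the caller never refilled.
 *     if (!tls->ss && !tls->meta && !tls->slab_base) return 0;
 * After: removed; the existing downstream check covers it and refills. */
if (!tls->ss || !tls->meta || !tls->slab_base) {
    if (!superslab_refill(tls))   /* assumed 0-on-failure convention */
        return 0;
}
```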
Result:
- Throughput: 730K → 26.5M ops/s (36x improvement)
- shared_pool_acquire_slab: 48% → 0% in perf profile
Introduced in: fcf098857 (Phase12 debug, 2025-11-14)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
When superslab_refill() fails in the inner loop, tls->ss can remain
NULL even when produced > 0 (from earlier successful allocations).
This caused a segfault at high iteration counts (>500K) in the
random_mixed benchmark.
Root cause: Line 353 calls ss_active_add(tls->ss, ...) without
checking if tls->ss is NULL after a failed refill breaks the loop.
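A minimal sketch of the missing check around that call site:

```c
/* tls->ss may be NULL when a refill failed mid-loop, even though
 * earlier iterations produced blocks. */
if (produced > 0 && tls->ss != NULL)
    ss_active_add(tls->ss, produced);
```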
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Safety fix: ss_fast_lookup masks the pointer to a 1MB boundary and reads
memory at that address. If called with arbitrary (non-Tiny) pointers,
the masked address could be unmapped → SEGFAULT.
Changes:
- tiny_free_fast(): Reverted to safe hak_super_lookup (can receive
arbitrary pointers without prior validation)
- ss_fast_lookup(): Added safety warning in comments documenting when
it's safe to use (after header magic 0xA0 validation)
ss_fast_lookup remains in LARSON_FIX paths where header magic is
already validated before the SuperSlab lookup.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replaces expensive hak_super_lookup() (registry hash lookup, 50-100 cycles)
with fast mask-based lookup (~5-10 cycles) in free hot paths.
Algorithm:
1. Mask pointer with SUPERSLAB_SIZE_MIN (1MB) - works for both 1MB and 2MB SS
2. Validate magic (SUPERSLAB_MAGIC)
3. Range check using ss->lg_size
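The three steps as code (sketch; the SuperSlab stand-in and magic are
placeholders for the real definitions, and how the 1MB mask resolves interior
addresses of a 2MB SS is not spelled out in this log):

```c
#include <stdint.h>

#define SUPERSLAB_SIZE_MIN (1UL << 20)   /* 1MB, per step 1 */

typedef struct {        /* minimal stand-in; real layout differs */
    uint64_t magic;     /* SUPERSLAB_MAGIC */
    uint8_t  lg_size;   /* log2(superslab size): 20 or 21 */
} SuperSlab;

extern const uint64_t SUPERSLAB_MAGIC;

static inline SuperSlab *ss_fast_lookup(void *p) {
    SuperSlab *ss = (SuperSlab *)((uintptr_t)p & ~(SUPERSLAB_SIZE_MIN - 1)); /* 1 */
    if (ss->magic != SUPERSLAB_MAGIC) return NULL;                           /* 2 */
    if ((uintptr_t)p - (uintptr_t)ss >= ((uintptr_t)1 << ss->lg_size))
        return NULL;                                                         /* 3 */
    return ss;
}
```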
Applied to:
- tiny_free_fast.inc.h: tiny_free_fast() SuperSlab path
- tiny_free_fast_v2.inc.h: LARSON_FIX cross-thread check
- front/malloc_tiny_fast.h: free_tiny_fast() LARSON_FIX path
Note: Performance impact minimal with LARSON_FIX=OFF (default) since
SuperSlab lookup is skipped entirely in that case. Optimization benefits
LARSON_FIX=ON path for safe multi-threaded operation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The allocation logging at lines 236-249 was missing the
#if !HAKMEM_BUILD_RELEASE guard, causing fprintf(stderr)
on every allocation even in release builds.
Impact: 19.8M ops/s → 28.0M ops/s (+42%)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
**Diagnostic Enhancement**: Complete malloc/free/pop operation tracing for debugging
**Problem**: Larson crashes with TLS_SLL_DUP at count=18; we need to trace the
exact pointer lifecycle to determine whether the allocator returns duplicate
addresses or the benchmark has a double-free bug.
**Implementation** (ChatGPT + Claude + Task collaboration):
1. **Global Operation Counter** (core/hakmem_tiny_config_box.inc:9):
- Single atomic counter for all operations (malloc/free/pop)
- Chronological ordering across all paths
2. **Allocation Logging** (core/hakmem_tiny_config_box.inc:148-161):
- HAK_RET_ALLOC macro enhanced with operation logging
- Logs first 50 class=1 allocations with ptr/base/tls_count
3. **Free Logging** (core/tiny_free_fast_v2.inc.h:222-235):
- Added before tls_sll_push() call (line 221)
- Logs first 50 class=1 frees with ptr/base/tls_count_before
4. **Pop Logging** (core/box/tls_sll_box.h:587-597):
- Added in tls_sll_pop_impl() after successful pop
- Logs first 50 class=1 pops with base/tls_count_after
5. **Drain Debug Logging** (core/box/tls_sll_drain_box.h:143-151):
- Enhanced drain loop with detailed logging
- Tracks pop failures and drained block counts
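Sketch of the shared counter (item 1): one atomic gives malloc/free/pop logs a
single chronological sequence:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t g_op_counter;

static inline uint64_t hak_op_next(void) {
    return atomic_fetch_add_explicit(&g_op_counter, 1, memory_order_relaxed);
}
```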
**Initial Findings**:
- First 19 operations: ALL frees, ZERO allocations, ZERO pops
- OP#0006: First free of 0x...430
- OP#0018: Duplicate free of 0x...430 → TLS_SLL_DUP detected
- Suggests either (a) allocations occur before logging starts, or (b) a double-free bug in Larson itself
**Debug-only**: All logging gated by !HAKMEM_BUILD_RELEASE (zero cost in release)
**Next Steps**:
- Expand logging window to 200 operations
- Log initialization phase allocations
- Cross-check with Larson benchmark source
**Status**: Ready for extended testing
**Problem**: Larson benchmark crashes with TLS_SLL_DUP (double-free), 100% crash rate in debug
**Root Cause**: The TLS drain pushback code (commit c2f104618) created duplicates
by pushing pointers back onto the TLS SLL while they were still in the linked-list chain.
**Diagnostic Enhancements** (ChatGPT + Claude collaboration):
1. **Callsite Tracking**: Track file:line for each TLS SLL push (debug only)
- Arrays: g_tls_sll_push_file[], g_tls_sll_push_line[]
- Macro: tls_sll_push() auto-records __FILE__, __LINE__
2. **Enhanced Duplicate Detection**:
- Scan depth: 64 → 256 nodes (deep duplicate detection)
- Error message shows BOTH current and previous push locations
- Calls ptr_trace_dump_now() for detailed analysis
3. **Evidence Captured**:
- Both duplicate pushes from same line (221)
- Pointer at position 11 in TLS SLL (count=18, scanned=11)
- Confirms pointer allocated without being popped from TLS SLL
**Fix**:
- **core/box/tls_sll_drain_box.h**: Remove pushback code entirely
- Old: Push back to TLS SLL on validation failure → duplicates!
- New: Skip pointer (accept rare leak) to avoid duplicates
- Rationale: SuperSlab lookup failures are transient/rare
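Sketch of the corrected drain loop (remote_free() is a hypothetical hand-off;
the key change is skipping instead of re-pushing):

```c
void *p;
while ((p = tls_sll_pop(cls)) != NULL) {
    SuperSlab *ss = hak_super_lookup(p);
    if (!ss) continue;      /* transient failure: accept the rare leak */
    remote_free(ss, p);     /* never push p back onto the TLS SLL */
}
```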
**Status**: Fix implemented, ready for testing
**Updated**:
- LARSON_DOUBLE_FREE_INVESTIGATION.md: Root cause confirmed