316ea4dfd6
ENV Cleanup Step 4: Gate HAKMEM_WATCH_ADDR in tiny_region_id.h
...
Gate get_watch_addr() debug functionality with HAKMEM_BUILD_RELEASE,
returning 0 in release builds to disable address watching overhead.
Performance: 30.31M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:47:16 +09:00
42747a1080
ENV Cleanup Step 3: Gate HAKMEM_TINY_PROFILE in tiny_fastcache.h
...
Gate tiny_fast_profile_enabled() getenv call with HAKMEM_BUILD_RELEASE,
returning 0 in release builds to disable profiling overhead.
Performance: 30.34M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:46:32 +09:00
794bf996f1
ENV Cleanup Step 2c: Gate debug code in hakmem_tiny_alloc.inc
...
Gate tiny_alloc_dump_tls_state() call on allocation failure path with
HAKMEM_BUILD_RELEASE guard.
Performance: 30.15M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:45:27 +09:00
0567e2957f
ENV Cleanup Step 2b: Gate debug code in tiny_superslab_free.inc.h
...
Gate tiny_alloc_dump_tls_state() call in remote watch debug path with
HAKMEM_BUILD_RELEASE guard, consolidating with existing debug fprintf.
Performance: 30.3M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:44:47 +09:00
d6c2ea6f3e
ENV Cleanup Step 2a: Gate debug code in hakmem_tiny_slow.inc
...
Gate tiny_alloc_dump_tls_state() call and getenv debug code on slow path
failure with HAKMEM_BUILD_RELEASE guard.
Performance: 30.5M ops/s (baseline: 30.2M)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:43:57 +09:00
3833d4e3eb
ENV Cleanup Step 1: Gate tiny_debug.h with HAKMEM_BUILD_RELEASE
...
Wrap debug functionality in !HAKMEM_BUILD_RELEASE guard with no-op stubs
for release builds. This eliminates getenv() calls for HAKMEM_TINY_ALLOC_DEBUG
in production while maintaining API compatibility.
Performance: 30.0M ops/s (baseline: 30.2M)
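A minimal sketch of the guard-plus-stub pattern described above. The function names tiny_alloc_debug_enabled() and tiny_alloc_debug_log() are hypothetical, and the default value given to HAKMEM_BUILD_RELEASE exists only so the sketch compiles standalone; the real tiny_debug.h may be organized differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the build system normally defines HAKMEM_BUILD_RELEASE as 0 or 1. */
#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 1
#endif

#if !HAKMEM_BUILD_RELEASE
/* Debug build: consult the environment variable named in the commit message. */
static inline int tiny_alloc_debug_enabled(void) {
    const char *e = getenv("HAKMEM_TINY_ALLOC_DEBUG");
    return (e && *e && *e != '0') ? 1 : 0;
}
static inline void tiny_alloc_debug_log(const char *msg) {
    if (tiny_alloc_debug_enabled()) fprintf(stderr, "[tiny_debug] %s\n", msg);
}
#else
/* Release build: same API, constant results, no getenv() call. */
static inline int tiny_alloc_debug_enabled(void) { return 0; }
static inline void tiny_alloc_debug_log(const char *msg) { (void)msg; }
#endif
```

Callers compile unchanged in both configurations, which is what "maintaining API compatibility" refers to.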
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 00:43:07 +09:00
930c5283b4
Fix Larson 36x slowdown: Remove tls_uninitialized early return in sll_refill_small_from_ss()
...
Problem:
- Larson benchmark showed 730K ops/s instead of expected 26M ops/s
- Class 1 TLS SLL cache always stayed empty (tls_count=0)
- All allocations went through slow path (shared_pool_acquire_slab at 48% CPU)
Root cause:
- In sll_refill_small_from_ss(), when TLS was completely uninitialized
(ss=NULL, meta=NULL, slab_base=NULL), the function returned 0 immediately
without calling superslab_refill() to initialize it
- The comment said "expect upper logic to call superslab_refill" but
tiny_alloc_fast_refill() did NOT call it after receiving 0
- This created a loop: TLS SLL stays empty → refill returns 0 → slow path
Fix:
- Remove the tls_uninitialized early return
- Let the existing downstream condition (!tls->ss || !tls->meta || ...)
handle the uninitialized case and call superslab_refill()
Result:
- Throughput: 730K → 26.5M ops/s (36x improvement)
- shared_pool_acquire_slab: 48% → 0% in perf profile
Introduced in: fcf098857 (Phase12 debug, 2025-11-14)
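The control flow after the fix might look roughly like the sketch below. The TlsSlab struct, the available field, and superslab_refill_stub() are simplified stand-ins invented for illustration; only the field names ss, meta, and slab_base come from the commit message.

```c
#include <stddef.h>

/* Hypothetical, simplified TLS slab state. */
typedef struct {
    void *ss;
    void *meta;
    void *slab_base;
    int   available;   /* blocks that can still be carved (illustrative) */
} TlsSlab;

static int  g_refill_calls = 0;
static char g_fake_slab[64];

/* Stand-in for superslab_refill(): binds a slab to the TLS state. */
static int superslab_refill_stub(TlsSlab *tls) {
    g_refill_calls++;
    tls->ss = tls->meta = tls->slab_base = g_fake_slab;
    tls->available = 16;
    return 1;
}

/* Returns the number of blocks moved into the TLS cache. */
static int sll_refill_sketch(TlsSlab *tls, int want) {
    /* The removed code returned 0 at this point when ss, meta and slab_base
       were all NULL, so an uninitialized TLS never reached the refill below. */
    if (!tls->ss || !tls->meta || !tls->slab_base) {
        if (!superslab_refill_stub(tls)) return 0;
    }
    int take = want < tls->available ? want : tls->available;
    tls->available -= take;
    return take;
}
```

With a zero-initialized TlsSlab, the first call now initializes the slab and returns blocks instead of returning 0 forever.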
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 16:47:30 +09:00
b72519311a
Bench: Include params in output to prevent measurement confusion
...
Output now shows: Throughput = XXX ops/s [iter=N ws=M] time=Xs
This prevents confusion when comparing results measured with different
workset sizes (e.g., ws=256 gives 67M ops/s vs ws=8192 gives 18M ops/s).
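One way to produce a line in that shape is sketched here. The helper name format_bench_result() and the numeric precisions are assumptions; the benchmark's actual print statement is not shown in this log.

```c
#include <stdio.h>

/* Formats a result line that carries its own measurement parameters. */
static int format_bench_result(char *buf, size_t cap, double ops_per_sec,
                               long iters, long ws, double seconds) {
    return snprintf(buf, cap,
                    "Throughput = %.0f ops/s [iter=%ld ws=%ld] time=%.3fs",
                    ops_per_sec, iters, ws, seconds);
}
```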
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:48:21 +09:00
8355214135
Fix NULL pointer crash in unified_cache_refill ss_active_add
...
When superslab_refill() fails in the inner loop, tls->ss can remain
NULL even when produced > 0 (from earlier successful allocations).
This caused a segfault at high iteration counts (>500K) in the
random_mixed benchmark.
Root cause: Line 353 calls ss_active_add(tls->ss, ...) without
checking if tls->ss is NULL after a failed refill breaks the loop.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 13:31:46 +09:00
7a03a614fd
Restrict ss_fast_lookup to validated Tiny pointer paths only
...
Safety fix: ss_fast_lookup masks pointer to 1MB boundary and reads
memory at that address. If called with arbitrary (non-Tiny) pointers,
the masked address could be unmapped → SEGFAULT.
Changes:
- tiny_free_fast(): Reverted to safe hak_super_lookup (can receive
arbitrary pointers without prior validation)
- ss_fast_lookup(): Added safety warning in comments documenting when
it's safe to use (after header magic 0xA0 validation)
ss_fast_lookup remains in LARSON_FIX paths where header magic is
already validated before the SuperSlab lookup.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 12:55:40 +09:00
64ed3d8d8c
Add ss_fast_lookup() for O(1) SuperSlab lookup via mask
...
Replaces expensive hak_super_lookup() (registry hash lookup, 50-100 cycles)
with fast mask-based lookup (~5-10 cycles) in free hot paths.
Algorithm:
1. Mask pointer with SUPERSLAB_SIZE_MIN (1MB) - works for both 1MB and 2MB SS
2. Validate magic (SUPERSLAB_MAGIC)
3. Range check using ss->lg_size
Applied to:
- tiny_free_fast.inc.h: tiny_free_fast() SuperSlab path
- tiny_free_fast_v2.inc.h: LARSON_FIX cross-thread check
- front/malloc_tiny_fast.h: free_tiny_fast() LARSON_FIX path
Note: Performance impact minimal with LARSON_FIX=OFF (default) since
SuperSlab lookup is skipped entirely in that case. Optimization benefits
LARSON_FIX=ON path for safe multi-threaded operation.
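The three steps of the algorithm can be sketched as follows. SuperSlabHdr, SS_MAGIC (a placeholder value), and ss_fast_lookup_sketch() are illustrative names, not the project's definitions, and the sketch assumes the header sits at the masked address. It also assumes the pointer is already known to lie inside a mapped SuperSlab; on an arbitrary pointer the read of ss->magic could fault.

```c
#include <stdint.h>
#include <stddef.h>

#define SS_SIZE_MIN ((uintptr_t)1 << 20)      /* 1MB, per the commit message */
#define SS_MAGIC    0x5355504552534C42ull     /* placeholder, not the real constant */

typedef struct {
    uint64_t magic;
    uint8_t  lg_size;   /* log2 of the SuperSlab size: 20 for 1MB, 21 for 2MB */
} SuperSlabHdr;

static SuperSlabHdr *ss_fast_lookup_sketch(void *p) {
    uintptr_t addr = (uintptr_t)p;
    /* 1. Mask the pointer down to the 1MB boundary. */
    SuperSlabHdr *ss = (SuperSlabHdr *)(addr & ~(SS_SIZE_MIN - 1));
    /* 2. Validate the magic value stored in the header. */
    if (ss->magic != SS_MAGIC) return NULL;
    /* 3. Range-check the pointer against the size recorded in the header. */
    if (addr - (uintptr_t)ss >= ((uintptr_t)1 << ss->lg_size)) return NULL;
    return ss;
}
```

No hash probe or registry access is involved, which is where the cycle savings quoted above would come from.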
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 12:47:10 +09:00
0a8bdb8b18
Fix release build debug logging in tiny_region_id.h
...
The allocation logging at line 236-249 was missing the
#if !HAKMEM_BUILD_RELEASE guard, causing fprintf(stderr)
on every allocation even in release builds.
Impact: 19.8M ops/s → 28.0M ops/s (+42%)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 11:58:00 +09:00
d8e3971dc2
Fix cross-thread ownership check: Use bits 8-15 for owner_tid_low
...
Problem:
- TLS_SLL_PUSH_DUP crash in Larson multi-threaded benchmark
- Cross-thread frees incorrectly routed to same-thread TLS path
- Root cause: pthread_t on glibc is 256-byte aligned (TCB base)
so lower 8 bits are ALWAYS 0x00 for ALL threads
Fix:
- Change owner_tid_low from (tid & 0xFF) to ((tid >> 8) & 0xFF)
- Bits 8-15 actually vary between threads, enabling correct detection
- Applied consistently across all ownership check locations:
- superslab_inline.h: ss_owner_try_acquire/release/is_mine
- slab_handle.h: slab_try_acquire
- tiny_free_fast.inc.h: tiny_free_is_same_thread_ss
- tiny_free_fast_v2.inc.h: cross-thread detection
- tiny_superslab_free.inc.h: same-thread check
- ss_allocation_box.c: slab initialization
- hakmem_tiny_superslab.c: ownership handling
Also added:
- Address watcher debug infrastructure (tiny_region_id.h)
- Cross-thread detection in malloc_tiny_fast.h Front Gate
Test results:
- Larson 1T/2T/4T: PASS (no TLS_SLL_PUSH_DUP crash)
- random_mixed: PASS
- Performance: ~20M ops/s (regression from 48M, needs optimization)
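The effect of the bit shift can be illustrated with two made-up thread IDs that are both 256-byte aligned, as the commit message describes for glibc. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Old extraction: low 8 bits, which are zero for any 256-byte-aligned value. */
static inline uint8_t owner_tid_low_old(uint64_t tid) {
    return (uint8_t)(tid & 0xFF);
}

/* New extraction: bits 8-15, which differ between distinct aligned values. */
static inline uint8_t owner_tid_low_new(uint64_t tid) {
    return (uint8_t)((tid >> 8) & 0xFF);
}
```

For the illustrative values 0x7f0000001700 and 0x7f0000002a00, the old helper returns 0x00 for both, so every thread appears to be the owner; the new helper returns 0x17 and 0x2a.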
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 11:52:11 +09:00
8af9123bcc
Larson double-free investigation: Add full operation lifecycle logging
...
**Diagnostic Enhancement**: Complete malloc/free/pop operation tracing for debug
**Problem**: Larson crashes with TLS_SLL_DUP at count=18, need to trace exact
pointer lifecycle to identify if allocator returns duplicate addresses or if
benchmark has double-free bug.
**Implementation** (ChatGPT + Claude + Task collaboration):
1. **Global Operation Counter** (core/hakmem_tiny_config_box.inc:9):
- Single atomic counter for all operations (malloc/free/pop)
- Chronological ordering across all paths
2. **Allocation Logging** (core/hakmem_tiny_config_box.inc:148-161):
- HAK_RET_ALLOC macro enhanced with operation logging
- Logs first 50 class=1 allocations with ptr/base/tls_count
3. **Free Logging** (core/tiny_free_fast_v2.inc.h:222-235):
- Added before tls_sll_push() call (line 221)
- Logs first 50 class=1 frees with ptr/base/tls_count_before
4. **Pop Logging** (core/box/tls_sll_box.h:587-597):
- Added in tls_sll_pop_impl() after successful pop
- Logs first 50 class=1 pops with base/tls_count_after
5. **Drain Debug Logging** (core/box/tls_sll_drain_box.h:143-151):
- Enhanced drain loop with detailed logging
- Tracks pop failures and drained block counts
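A single shared counter that orders operations across all paths could be sketched like this. The names g_op_counter, OP_LOG_LIMIT, and op_log() are assumptions, and the sketch simplifies the scheme above by applying one global limit rather than a separate first-50 window per operation type.

```c
#include <stdatomic.h>
#include <stdio.h>

#define OP_LOG_LIMIT 50

/* One counter shared by malloc/free/pop so log lines sort chronologically. */
static atomic_ulong g_op_counter;

/* Assigns the next operation number and logs early class-1 operations. */
static unsigned long op_log(const char *kind, int cls, void *ptr) {
    unsigned long id = atomic_fetch_add(&g_op_counter, 1);
    if (cls == 1 && id < OP_LOG_LIMIT) {
        fprintf(stderr, "[OP#%04lu %s] cls=%d ptr=%p\n", id, kind, cls, ptr);
    }
    return id;
}
```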
**Initial Findings**:
- First 19 operations: ALL frees, ZERO allocations, ZERO pops
- OP#0006: First free of 0x...430
- OP#0018: Duplicate free of 0x...430 → TLS_SLL_DUP detected
- Suggests either: (a) allocations before logging starts, or (b) Larson bug
**Debug-only**: All logging gated by !HAKMEM_BUILD_RELEASE (zero cost in release)
**Next Steps**:
- Expand logging window to 200 operations
- Log initialization phase allocations
- Cross-check with Larson benchmark source
**Status**: Ready for extended testing
2025-11-27 08:18:01 +09:00
8553894171
Larson double-free investigation: Enhanced diagnostics + Remove buggy drain pushback
...
**Problem**: Larson benchmark crashes with TLS_SLL_DUP (double-free), 100% crash rate in debug
**Root Cause**: TLS drain pushback code (commit c2f104618) created duplicates by
pushing pointers back to TLS SLL while they were still in the linked list chain.
**Diagnostic Enhancements** (ChatGPT + Claude collaboration):
1. **Callsite Tracking**: Track file:line for each TLS SLL push (debug only)
- Arrays: g_tls_sll_push_file[], g_tls_sll_push_line[]
- Macro: tls_sll_push() auto-records __FILE__, __LINE__
2. **Enhanced Duplicate Detection**:
- Scan depth: 64 → 256 nodes (deep duplicate detection)
- Error message shows BOTH current and previous push locations
- Calls ptr_trace_dump_now() for detailed analysis
3. **Evidence Captured**:
- Both duplicate pushes from same line (221)
- Pointer at position 11 in TLS SLL (count=18, scanned=11)
- Confirms pointer allocated without being popped from TLS SLL
**Fix**:
- **core/box/tls_sll_drain_box.h**: Remove pushback code entirely
- Old: Push back to TLS SLL on validation failure → duplicates!
- New: Skip pointer (accept rare leak) to avoid duplicates
- Rationale: SuperSlab lookup failures are transient/rare
**Status**: Fix implemented, ready for testing
**Updated**:
- LARSON_DOUBLE_FREE_INVESTIGATION.md: Root cause confirmed
2025-11-27 07:30:32 +09:00
c2f104618f
Fix critical TLS drain memory leak causing potential double-free
...
## Root Cause
TLS drain was dropping pointers when SuperSlab lookup or slab_idx validation failed:
- Pop pointer from TLS SLL
- Lookup/validation fails
- continue → LEAK! Pointer never returned to any freelist
## Impact
Memory leak + potential double allocation:
1. Pointer P popped but leaked
2. Same address P reallocated from carve/other source
3. User frees P again → duplicate detection → ABORT
## Fix
**Before (BUGGY)**:
```c
if (!ss || invalid_slab_idx) {
    continue; // ← LEAK!
}
```
**After (FIXED)**:
```c
if (!ss || invalid_slab_idx) {
    // Push back to TLS SLL head (retry later)
    tiny_next_write(class_idx, base, g_tls_sll[class_idx].head);
    g_tls_sll[class_idx].head = base;
    g_tls_sll[class_idx].count++;
    break; // Stop draining to avoid infinite retry
}
```
## Files Changed
- core/box/tls_sll_drain_box.h: Fix 2 leak sites (SS lookup + slab_idx validation)
- docs/analysis/LARSON_DOUBLE_FREE_INVESTIGATION.md: Investigation report
## Related
- Larson double-free investigation (47% crash rate)
- Commit e4868bf23: Freelist header write + abort() on duplicate
- ChatGPT analysis: Larson benchmark code is correct (no user bug)
2025-11-27 06:49:38 +09:00
e4868bf236
Larson crash investigation: Add freelist header write + abort() on duplicate
...
## Changes
1. **TLS SLL duplicate detection** (core/box/tls_sll_box.h:381)
- Changed 'return true' to 'abort()' to get backtrace on double-free
- Enables precise root cause identification
2. **Freelist header write fix** (core/tiny_superslab_alloc.inc.h:159-169)
- Added tiny_region_id_write_header() call in freelist allocation path
- Previously only linear carve wrote headers → stale headers on reuse
- Now both paths write headers consistently
## Root Cause Analysis
Backtrace revealed true double-free pattern:
- last_push_from=hak_tiny_free_fast_v2 (freed once)
- last_pop_from=(null) (never allocated)
- where=hak_tiny_free_fast_v2 (freed again!)
Same pointer freed twice WITHOUT reallocation in between.
## Status
- Freelist header fix: ✅ Implemented (necessary but not sufficient)
- Double-free still occurs: ❌ Deeper investigation needed
- Possible causes: User code bug, TLS drain race, remote free issue
Next: Investigate allocation/free flow with enhanced tracing
2025-11-27 05:57:22 +09:00
12c36afe46
Fix TSan build: Add weak stubs for sanitizer compatibility
...
Added weak stubs to core/link_stubs.c for symbols that are not needed
in HAKMEM_FORCE_LIBC_ALLOC_BUILD=1 (TSan/ASan) builds:
Stubs added:
- g_bump_chunk (int)
- g_tls_bcur, g_tls_bend (__thread uint8_t*[8])
- smallmid_backend_free()
- expand_superslab_head()
Also added: #include <stdint.h> for uint8_t
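A weak definition of this kind, assuming a GCC- or Clang-compatible toolchain, might look like the sketch below. Only g_bump_chunk and its int type come from the list above; example_backend_free_stub() is a hypothetical stand-in, because the real signatures of smallmid_backend_free() and expand_superslab_head() are not shown in this log.

```c
#include <stddef.h>

/* Weak definition: the linker prefers a strong definition if one exists. */
__attribute__((weak)) int g_bump_chunk = 0;

/* Hypothetical weak function stub that does nothing. */
__attribute__((weak)) void example_backend_free_stub(void *ptr, size_t size) {
    (void)ptr;
    (void)size;
}
```

Because the symbols are weak, a build that does link the real objects still resolves to the real definitions without duplicate-symbol errors.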
Impact:
- TSan build: PASS (larson_hakmem_tsan successfully built)
- Phase 2 ready: Can now use TSan to debug Larson crashes
Next: Use TSan to investigate Larson 47% crash rate
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 05:19:56 +09:00
2ec6689dee
Docs: Update ENV variable documentation after Ultra HEAP deletion
...
Updated documentation to reflect commit 6b791b97d deletions:
Removed ENV variables (6):
- HAKMEM_TINY_ULTRA_FRONT
- HAKMEM_TINY_ULTRA_L0
- HAKMEM_TINY_ULTRA_HEAP_DUMP
- HAKMEM_TINY_ULTRA_PAGE_DUMP
- HAKMEM_TINY_BG_REMOTE (no getenv, dead code)
- HAKMEM_TINY_BG_REMOTE_BATCH (no getenv, dead code)
Files updated (5):
- docs/analysis/ENV_CLEANUP_ANALYSIS.md: Updated BG/Ultra counts
- docs/analysis/ENV_QUICK_REFERENCE.md: Updated verification sections
- docs/analysis/ENV_CLEANUP_PLAN.md: Added REMOVED category
- docs/archive/TINY_LEARNING_LAYER.md: Added archive notice
- docs/archive/MAINLINE_INTEGRATION.md: Added archive notice
Changes: +71/-32 lines
Preserved ENV variables:
- HAKMEM_TINY_ULTRA_SLIM (active 4-layer fast path)
- HAKMEM_ULTRA_SLIM_STATS (Ultra SLIM statistics)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 04:51:59 +09:00
6b791b97d4
ENV Cleanup: Delete Ultra HEAP & BG Remote dead code (-1,096 LOC)
...
Deleted files (11):
- core/ultra/ directory (6 files: tiny_ultra_heap.*, tiny_ultra_page_arena.*)
- core/front/tiny_ultrafront.h
- core/tiny_ultra_fast.inc.h
- core/hakmem_tiny_ultra_front.inc.h
- core/hakmem_tiny_ultra_simple.inc
- core/hakmem_tiny_ultra_batch_box.inc
Edited files (10):
- core/hakmem_tiny.c: Remove Ultra HEAP #includes, move ultra_batch_for_class()
- core/hakmem_tiny_tls_state_box.inc: Delete TinyUltraFront, g_ultra_simple
- core/hakmem_tiny_phase6_wrappers_box.inc: Delete ULTRA_SIMPLE block
- core/hakmem_tiny_alloc.inc: Delete Ultra-Front code block
- core/hakmem_tiny_init.inc: Delete ULTRA_SIMPLE ENV loading
- core/hakmem_tiny_remote_target.{c,h}: Delete g_bg_remote_enable/batch
- core/tiny_refill.h: Remove BG Remote check (always break)
- core/hakmem_tiny_background.inc: Delete BG Remote drain loop
Deleted ENV variables:
- HAKMEM_TINY_ULTRA_HEAP (build flag, undefined)
- HAKMEM_TINY_ULTRA_L0
- HAKMEM_TINY_ULTRA_HEAP_DUMP
- HAKMEM_TINY_ULTRA_PAGE_DUMP
- HAKMEM_TINY_ULTRA_FRONT
- HAKMEM_TINY_BG_REMOTE (no getenv, dead code)
- HAKMEM_TINY_BG_REMOTE_BATCH (no getenv, dead code)
- HAKMEM_TINY_ULTRA_SIMPLE (references only)
Impact:
- Code reduction: -1,096 lines
- Binary size: 305KB → 304KB (-1KB)
- Build: PASS
- Sanity: 15.69M ops/s (3 runs avg)
- Larson: 1 crash observed (seed 43, likely existing instability)
Notes:
- Ultra HEAP never compiled (#if HAKMEM_TINY_ULTRA_HEAP undefined)
- BG Remote variables never initialized (g_bg_remote_enable always 0)
- Ultra SLIM (ultra_slim_alloc_box.h) preserved (active 4-layer path)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 04:35:47 +09:00
f4978b1529
ENV Cleanup Phase 5: Additional DEBUG guards + doc cleanup
...
Code changes:
- core/slab_handle.h: Add RELEASE guard for HAKMEM_TINY_FREELIST_MASK
- core/tiny_superslab_free.inc.h: Add guards for HAKMEM_TINY_ROUTE_FREE, HAKMEM_TINY_FREELIST_MASK
Documentation cleanup:
- docs/specs/CONFIGURATION.md: Remove 21 doc-only ENV variables
- docs/specs/ENV_VARS.md: Remove doc-only variables
Testing:
- Build: PASS (305KB binary, unchanged)
- Sanity: PASS (17.22M ops/s average, 3 runs)
- Larson: PASS (52.12M ops/s, 0 crashes)
Impact:
- 2 additional DEBUG ENV variables guarded (no overhead in RELEASE)
- Documentation accuracy improved
- Binary size maintained
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 03:55:17 +09:00
43015725af
ENV cleanup: Add RELEASE guards to DEBUG ENV variables (14 vars)
...
Added compile-time guards (#if HAKMEM_BUILD_RELEASE) to eliminate
DEBUG ENV variable overhead in RELEASE builds.
Variables guarded (14 total):
- HAKMEM_TINY_TRACE_RING, HAKMEM_TINY_DUMP_RING_ATEXIT
- HAKMEM_TINY_RF_TRACE, HAKMEM_TINY_MAILBOX_TRACE
- HAKMEM_TINY_MAILBOX_TRACE_LIMIT, HAKMEM_TINY_MAILBOX_SLOWDISC
- HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD
- HAKMEM_SS_PREWARM_DEBUG, HAKMEM_SS_FREE_DEBUG
- HAKMEM_TINY_FRONT_METRICS, HAKMEM_TINY_FRONT_DUMP
- HAKMEM_TINY_COUNTERS_DUMP, HAKMEM_TINY_REFILL_DUMP
- HAKMEM_PTR_TRACE_DUMP, HAKMEM_PTR_TRACE_VERBOSE
Files modified (9 core files):
- core/tiny_debug_ring.c (ring trace/dump)
- core/box/mailbox_box.c (mailbox trace + slowdisc)
- core/tiny_refill.h (refill trace)
- core/hakmem_tiny_superslab.c (superslab debug)
- core/box/ss_allocation_box.c (allocation debug)
- core/tiny_superslab_free.inc.h (free debug)
- core/box/front_metrics_box.c (frontend metrics)
- core/hakmem_tiny_stats.c (stats dump)
- core/ptr_trace.h (pointer trace)
Bug fixes during implementation:
1. mailbox_box.c - Fixed variable scope (moved 'used' outside guard)
2. hakmem_tiny_stats.c - Fixed incomplete declarations (on1, on2)
Impact:
- Binary size: -85KB total
- bench_random_mixed_hakmem: 319K → 305K (-14K, -4.4%)
- larson_hakmem: 380K → 309K (-71K, -18.7%)
- Performance: No regression (16.9-17.9M ops/s maintained)
- Functional: All tests pass (Random Mixed + Larson)
- Behavior: DEBUG ENV vars correctly ignored in RELEASE builds
Testing:
- Build: Clean compilation (warnings only, pre-existing)
- 100K Random Mixed: 16.9-17.9M ops/s (PASS)
- 10K Larson: 25.9M ops/s (PASS)
- DEBUG ENV verification: Correctly ignored (PASS)
Result: 14 DEBUG ENV variables now have zero overhead in RELEASE builds.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 03:41:07 +09:00
543abb0586
ENV cleanup: Consolidate SFC_DEBUG getenv() calls (86% reduction)
...
Optimized HAKMEM_SFC_DEBUG environment variable handling by caching
the value at initialization instead of repeated getenv() calls in
hot paths.
Changes:
1. Added g_sfc_debug global variable (core/hakmem_tiny_sfc.c)
- Initialized once in sfc_init() by reading HAKMEM_SFC_DEBUG
- Single source of truth for SFC debug state
2. Declared g_sfc_debug as extern (core/hakmem_tiny_config.h)
- Available to all modules that need SFC debug checks
3. Replaced getenv() with g_sfc_debug in hot paths:
- core/tiny_alloc_fast_sfc.inc.h (allocation path)
- core/tiny_free_fast.inc.h (free path)
- core/box/hak_wrappers.inc.h (wrapper layer)
Impact:
- getenv() calls: 7 → 1 (86% reduction)
- Hot-path calls eliminated: 6 (all moved to init-time)
- Performance: 15.10M ops/s (stable, 0% CV)
- Build: Clean compilation, no new warnings
Testing:
- 10 runs of 100K iterations: consistent performance
- Symbol verification: g_sfc_debug present in hakmem_tiny_sfc.o
- No regression detected
Note: 3 additional getenv("HAKMEM_SFC_DEBUG") calls exist in
hakmem_tiny_ultra_simple.inc but are dead code (file not compiled
in current build configuration).
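The init-time caching described in this commit can be sketched as follows. g_sfc_debug and HAKMEM_SFC_DEBUG come from the commit message; parse_flag(), sfc_init_sketch(), and sfc_debug_on() are hypothetical, and the rule used to interpret the variable's value is an assumption.

```c
#include <stdlib.h>

/* Single source of truth, written once during initialization. */
int g_sfc_debug = 0;

/* Assumed rule: unset, empty, or a leading '0' means disabled. */
static int parse_flag(const char *v) {
    return (v && *v && *v != '0') ? 1 : 0;
}

/* Called once at startup; the only place getenv() is used. */
void sfc_init_sketch(void) {
    g_sfc_debug = parse_flag(getenv("HAKMEM_SFC_DEBUG"));
}

/* Hot paths read the cached global instead of calling getenv(). */
static inline int sfc_debug_on(void) {
    return g_sfc_debug;
}
```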
Files modified: 5 core files
Status: Production-ready, all tests passed
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 03:18:33 +09:00
d511084c5b
ENV cleanup: Remove 21 doc-only variables from ENV_VARS.md
...
Removed 21 ENV variables that existed only in documentation with zero
code references (no getenv() calls in source):
Pool/Refill (1):
- HAKMEM_POOL_REFILL_BATCH
Intelligence Engine (3):
- HAKMEM_INT_ENGINE, HAKMEM_INT_EVENT_TS, HAKMEM_INT_SAMPLE
Frontend/FastCache (3):
- HAKMEM_TINY_FRONTEND, HAKMEM_TINY_FASTCACHE, HAKMEM_TINY_FAST
Wrapper/Safety/Debug (5):
- HAKMEM_WRAP_TINY_REFILL, HAKMEM_SAFE_FREE_STRICT, HAKMEM_TINY_GUARD
- HAKMEM_TINY_DEBUG_FAST0, HAKMEM_TINY_DEBUG_REMOTE_GUARD
Optimization/TLS/Memory (9):
- HAKMEM_TINY_QUICK, HAKMEM_USE_REGISTRY
- HAKMEM_TINY_TLS_LIST, HAKMEM_TINY_DRAIN_TO_SLL, HAKMEM_TINY_ALLOC_RING
- HAKMEM_TINY_MEM_DIET, HAKMEM_SLL_MULTIPLIER
- HAKMEM_TINY_PREFETCH, HAKMEM_TINY_SS_RESERVE
Impact:
- ENV_VARS.md: 327 lines → 285 lines (-42 lines, 12.8% reduction)
- Code impact: Zero (documentation-only cleanup)
- Variables were: planned features never implemented, replaced features,
or abandoned experiments
Documentation:
- Added SAFE_TO_DELETE_ENV_VARS.md to docs/analysis/
- Complete analysis of why each variable is obsolete
- Verification proof that variables don't exist in code
File: docs/specs/ENV_VARS.md
Status: Documentation cleanup - no code changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 02:52:35 +09:00
6fadc74405
ENV cleanup: Remove obsolete ULTRAHOT variable + organize docs
...
Changes:
1. Removed HAKMEM_TINY_FRONT_ENABLE_ULTRAHOT variable
- Deleted front_prune_ultrahot_enabled() function
- UltraHot feature was removed in commit bcfb4f6b5
- Variable was dead code, no longer referenced
2. Organized ENV cleanup analysis documents
- Moved 5 ENV analysis docs to docs/analysis/
- ENV_CLEANUP_PLAN.md - detailed file-by-file plan
- ENV_CLEANUP_SUMMARY.md - executive summary
- ENV_CLEANUP_ANALYSIS.md - categorized analysis
- ENV_CONSOLIDATION_PLAN.md - consolidation proposals
- ENV_QUICK_REFERENCE.md - quick reference guide
Impact:
- ENV variables: 221 → 220 (-1)
- Build: ✅ Successful
- Risk: Zero (dead code removal)
Next steps (documented in ENV_CLEANUP_SUMMARY.md):
- 21 variables need verification (Ultra/HeapV2/BG/HotMag)
- SFC_DEBUG deduplication opportunity (7 callsites)
File: core/box/front_metrics_box.h
Status: SAVEPOINT - stable baseline for future ENV cleanup
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 17:12:41 +09:00
963004413a
Update CURRENT_TASK: master branch established as stable baseline
...
Changes:
- Branch updated from larson-master-rebuild to master
- Phase 1 marked as DONE (cleanup & stabilization complete)
- Documented master establishment (d26dd092b)
- Added reference to master-80M-unstable backup branch
- Updated performance numbers (Larson 51.95M, Random Mixed 66.82M)
- Outlined three options for future work
Current state:
- master @ d26dd092b: stable, Larson works (0% crash)
- master-80M-unstable @ 328a6b722: preserved for reference
- PERFORMANCE_HISTORY_62M_TO_80M.md: documents 80M path
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 16:54:36 +09:00
d26dd092bb
Document performance improvements from 62M → 80M ops/s
...
Created comprehensive record of all optimization commits that led to
80M ops/s Random Mixed performance on master branch:
- UNIFIED-HEADER (472b6a60b): +17% improvement
- Bug fixes (d26519f67): +15-41% improvement
- tiny_get_max_size inline (e81fe783d): +2M ops/s
- Min-Keep policy (04a60c316): +1M ops/s
- Unified Cache tuning (392d29018): +1M ops/s
- Stage 1 lock-free (dcd89ee88): +0.3M ops/s
- Shared Pool optimizations: +2-3M ops/s
Also documents:
- Step 2.5 bug (19c1abfe7) causing 100% Larson crash
- Architecture differences (E1-CORRECT vs UNIFIED-HEADER)
- Current state comparison between master and larson-master-rebuild
- Three future options with trade-offs
This documentation preserves knowledge of the 80M optimization path
for future reference when larson-master-rebuild becomes new master.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 16:52:18 +09:00
bea839add6
Revert "Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)"
...
This reverts commit d355041638.
2025-11-26 15:43:45 +09:00
d355041638
Port: Tune Superslab Min-Keep and Shared Pool Soft Caps (04a60c316)
...
- Policy: Set tiny_min_keep for C2-C6 to reduce mmap/munmap churn
- Policy: Loosen tiny_cap (soft cap) for C4-C6 to allow more active slots
- Added tiny_min_keep field to FrozenPolicy struct
Larson: 52.13M ops/s (stable)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 15:06:36 +09:00
a2e65716b3
Port: Optimize tiny_get_max_size inline (e81fe783d)
...
- Move tiny_get_max_size to header for inlining
- Use cached static variable to avoid repeated env lookup
- Larson: 51.99M ops/s (stable)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 15:05:03 +09:00
a9ddb52ad4
ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
...
Phase 1 complete: environment variable cleanup + fprintf debug guards
ENV variables removed (BG/HotMag family):
- core/hakmem_tiny_init.inc: Removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: Removed BG spill ENV
- core/tiny_refill.h: BG remote replaced with a fixed value
- core/hakmem_tiny_slow.inc: Removed BG refs
fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message
Documentation cleanup:
- Deleted 328 markdown files (old reports, duplicate docs)
Performance check:
- Larson: 52.35M ops/s (previously 52.8M; stable ✅)
- No functional impact from the ENV cleanup
- Some debug output remains (to be addressed in the next phase)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00
67fb15f35f
Wrap debug fprintf in !HAKMEM_BUILD_RELEASE guards (Release build optimization)
...
## Changes
### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks
### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs
### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE:
- Node pool exhaustion warning (line 252)
- SP_META_CAPACITY_ERROR warning (line 421)
- SP_FIX_GEOMETRY debug logging (line 745)
- SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
- SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
- SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
- SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
- SP_ACQUIRE_STAGE3 debug logging (line 1116)
- SP_SLOT_RELEASE debug logging (line 1245)
- SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
- SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized
## Performance Validation
Before: 51M ops/s (with debug fprintf overhead)
After: 49.1M ops/s (consistent performance, fprintf removed from hot paths)
## Build & Test
```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```
Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 13:14:18 +09:00
4e082505cc
Cleanup: Wrap shared_pool debug fprintf in #if !HAKMEM_BUILD_RELEASE
...
- Lock stats (P0 instrumentation): ~10 fprintf wrapped
- Stage stats (S1/S2/S3 breakdown): ~8 fprintf wrapped
- Release build now has no-op stubs for stats init functions
- Data collection APIs kept for learning layer compatibility
2025-11-26 13:05:17 +09:00
6b38bc840e
Cleanup: Remove unused hakmem_libc.c (duplicate of hakmem_syscall.c)
...
- File was not included in Makefile OBJS_BASE
- Functions already implemented in hakmem_syscall.c
- Size: 361 bytes removed
2025-11-26 13:03:17 +09:00
9d74f7e57d
Add comprehensive CONFIGURATION.md user documentation
...
(cherry-picked from 0143e0fed, conflict resolved)
2025-11-26 12:34:44 +09:00
ee722b2131
Update Larson bug analysis: root cause identified, larson-fix branch created
...
(cherry-picked from 328a6b722)
2025-11-26 12:34:27 +09:00
8f5a162c41
Document learning system critical bugs discovered during benchmark validation
...
(cherry-picked from 2c99afa49)
2025-11-26 12:34:21 +09:00
bcfb4f6b59
Remove dead code: UltraHot, RingCache, FrontC23, Class5 Hotpath
...
(cherry-picked from 225b6fcc7, conflicts resolved)
2025-11-26 12:33:49 +09:00
a3b80833eb
Legacy cleanup Phase 2a: Remove backup files (-1,072 KB)
...
(cherry-picked from 416930eb6)
2025-11-26 12:31:10 +09:00
feadc2832f
Legacy cleanup: Remove obsolete test files and #if 0 blocks (-1,750 LOC)
...
(cherry-picked from cc0104c4e)
2025-11-26 12:31:04 +09:00
950627587a
Remove legacy/unused code: 6 .inc files + disabled #if 0 block (1,159 LOC)
...
(cherry-picked from 9793f17d6)
2025-11-26 12:30:30 +09:00
5c85675621
Add callsite tracking for tls_sll_push/pop (macro-based Box Theory)
...
Problem:
- [TLS_SLL_PUSH_DUP] at 225K iterations but couldn't identify bypass path
- Need push AND pop callsites to diagnose reuse-before-pop bug
Implementation (Box Theory):
- Renamed tls_sll_push → tls_sll_push_impl (with where parameter)
- Renamed tls_sll_pop → tls_sll_pop_impl (with where parameter)
- Added macro wrappers with __func__ auto-insertion
- Zero changes to 40+ call sites (Box boundary preserved)
Debug-only tracking:
- All tracking code wrapped in #if !HAKMEM_BUILD_RELEASE
- Release builds: where=NULL, zero overhead
- Arrays: s_tls_sll_last_push_from[], s_tls_sll_last_pop_from[]
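The macro-wrapper pattern can be sketched as below. The list layout (s_head[]), NUM_CLASSES, and the return convention are simplified assumptions, and s_last_push_from[] is a shortened stand-in for the tracking array named above. HAKMEM_BUILD_RELEASE defaults to 0 here only so the tracking branch is visible when the sketch is compiled standalone.

```c
#include <stddef.h>

#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0   /* assumption: normally set by the build system */
#endif

#define NUM_CLASSES 8            /* illustrative */

static void *s_head[NUM_CLASSES];
#if !HAKMEM_BUILD_RELEASE
static const char *s_last_push_from[NUM_CLASSES];
#endif

/* Implementation takes an explicit callsite label. */
static int tls_sll_push_impl(int cls, void *node, const char *where) {
    if (cls < 0 || cls >= NUM_CLASSES || node == NULL) return 0;
#if !HAKMEM_BUILD_RELEASE
    s_last_push_from[cls] = where;   /* remember who pushed last */
#else
    (void)where;
#endif
    *(void **)node = s_head[cls];    /* link node in front of the current head */
    s_head[cls] = node;
    return 1;
}

/* Existing callers keep writing tls_sll_push(cls, node); the macro adds the label. */
#if !HAKMEM_BUILD_RELEASE
#define tls_sll_push(cls, node) tls_sll_push_impl((cls), (node), __func__)
#else
#define tls_sll_push(cls, node) tls_sll_push_impl((cls), (node), NULL)
#endif
```

Because __func__ expands at the call site, each caller is labeled with its own function name without any change to the call itself.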
New log format:
[TLS_SLL_PUSH_DUP] cls=5 ptr=0x...
last_push_from=hak_tiny_free_fast_v2
last_pop_from=(null) ← SMOKING GUN!
where=hak_tiny_free_fast_v2
Decisive Evidence:
✅ last_pop_from=(null) proves TLS SLL never popped
✅ Unified Cache bypasses TLS SLL (confirmed by Task agent)
✅ Root cause: unified_cache_refill() directly carves from SuperSlab
Impact:
- Complete push/pop flow tracking (debug builds only)
- Root cause identified: Unified Cache at Line 289
- Next step: Fix unified_cache_refill() to check TLS SLL first
Credit: Box Theory macro pattern suggested by ChatGPT
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 11:30:46 +09:00
c8842360ca
Fix: Double header calculation bug in tiny_block_stride_for_class() - META_MISMATCH resolved
...
Problem:
workset=8192 crashed with META_MISMATCH errors (off-by-one):
- [TLS_SLL_PUSH_META_MISMATCH] cls=3 meta_cls=2
- [HDR_META_MISMATCH] cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] cls=7 meta_cls=6
Root Cause (discovered by Task agent):
Contradictory stride calculations in codebase:
1. g_tiny_class_sizes[TINY_NUM_CLASSES]
- Already includes 1-byte header (TOTAL size)
- {8, 16, 32, 64, 128, 256, 512, 2048}
2. tiny_block_stride_for_class() (BEFORE FIX)
- Added extra +1 for header (DOUBLE COUNTING!)
- Class 5: 256 + 1 = 257 (should be 256)
- Class 6: 512 + 1 = 513 (should be 512)
This caused stride → class_idx reverse lookup to fail:
- superslab_init_slab() searched g_tiny_class_sizes[?] == 257
- No match found → meta->class_idx corrupted
- Free: header has cls=6, meta has cls=5 → MISMATCH!
Fix Applied (core/hakmem_tiny_superslab.h:49-69):
- Removed duplicate +1 calculation under HAKMEM_TINY_HEADER_CLASSIDX
- Added OOB guard (return 0 for invalid class_idx)
- Added comment: "g_tiny_class_sizes already includes the 1-byte header"
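A minimal sketch of the corrected stride function, assuming the table values quoted above. The table contents, TINY_NUM_CLASSES, g_tiny_class_sizes and tiny_block_stride_for_class come from this commit message; demo_class_for_stride() is a hypothetical stand-in for the reverse lookup done in superslab_init_slab(), whose real code is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define TINY_NUM_CLASSES 8

/* Total block sizes as listed in the commit message.
 * g_tiny_class_sizes already includes the 1-byte header. */
static const size_t g_tiny_class_sizes[TINY_NUM_CLASSES] =
    {8, 16, 32, 64, 128, 256, 512, 2048};

/* After the fix: no extra +1, and out-of-range classes return 0. */
static inline size_t tiny_block_stride_for_class(int class_idx) {
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return 0;
    return g_tiny_class_sizes[class_idx];
}

/* Hypothetical reverse lookup: stride -> class index, -1 if no match. */
static inline int demo_class_for_stride(size_t stride) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        if (g_tiny_class_sizes[i] == stride) return i;
    }
    return -1;
}
```

With the old +1, class 6 produced a stride of 513, which matches no table entry, so the reverse lookup could not recover the class. With the fix the round trip stride → class_idx is exact for every class.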
Test Results:
Before fix:
- 100K iterations: META_MISMATCH errors → SEGV
- 200K iterations: Immediate SEGV
After fix:
- 100K iterations: ✅ 9.9M ops/s (no errors)
- 200K iterations: ✅ 15.2M ops/s (no errors)
- 220K iterations: ✅ 15.3M ops/s (no errors)
- 225K iterations: ❌ SEGV (different bug, not META_MISMATCH)
Impact:
✅ META_MISMATCH errors completely eliminated
✅ Stability improved: 100K → 220K iterations (+120%)
✅ Throughput stable: 15M ops/s
⚠️ Different SEGV at 225K (requires separate investigation)
Investigation Credit:
- Task agent: Identified contradictory stride tables
- ChatGPT: Applied fix and verified LUT correctness
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:34:35 +09:00
3d341a8b3f
Fix: TLS SLL double-free diagnostics - Add error handling and detection improvements
...
Problem:
workset=8192 crashes at 240K iterations with TLS SLL double-free:
[TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... already in SLL
Investigation (Task agent):
Identified 8 tls_sll_push() call sites and 3 high-risk areas:
1. HIGH: Carve-Push Rollback pop failures (carve_push_box.c)
2. MEDIUM: Splice partial orphaned nodes (tiny_refill_opt.h)
3. MEDIUM: Incomplete double-free scan - only 64 nodes (tls_sll_box.h)
Fixes Applied:
1. core/box/carve_push_box.c (Lines 115-139)
- Track pop_failed count during rollback
- Log orphaned blocks: [BOX_CARVE_PUSH_ROLLBACK] warning
- Helps identify when rollback leaves blocks in SLL
2. core/box/tls_sll_box.h (Lines 347-370)
- Increase double-free scan: 64 → 256 nodes
- Add scanned count to error: (scanned=%u/%u)
- Catches orphaned blocks deeper in chain
3. core/tiny_refill_opt.h (Lines 135-166)
- Enhanced splice partial logging
- Abort in debug builds on orphaned nodes
- Prevents silent memory leaks
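The bounded double-free scan from fix 2 can be sketched as follows. This is an illustration only: demo_node, demo_sll_contains() and DEMO_SCAN_LIMIT are hypothetical names, and the real check in tls_sll_box.h may be structured differently. Only the 256-node limit and the idea of reporting a scanned count come from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SCAN_LIMIT 256u /* scan depth after this commit (was 64) */

/* Hypothetical freelist node: the next pointer lives inside the free block. */
typedef struct demo_node {
    struct demo_node *next;
} demo_node;

/* Walk at most DEMO_SCAN_LIMIT nodes looking for ptr.
 * Returns true if ptr is already in the list (a double free);
 * *scanned_out receives how many nodes were examined. */
static bool demo_sll_contains(const demo_node *head, const void *ptr,
                              uint32_t *scanned_out) {
    uint32_t scanned = 0;
    bool found = false;
    for (const demo_node *n = head; n != NULL && scanned < DEMO_SCAN_LIMIT;
         n = n->next) {
        scanned++;
        if ((const void *)n == ptr) {
            found = true;
            break;
        }
    }
    if (scanned_out) *scanned_out = scanned;
    return found;
}
```

The cap keeps the debug check O(1) per push in the worst case, at the price of missing duplicates that sit deeper than the limit; a duplicate at the second node is reported with a scanned count of 2, as in the log line above.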
Test Results:
Before: SEGV at 220K iterations
After: SEGV at 240K iterations (improved detection)
[TLS_SLL_PUSH] FATAL double-free: cls=5 ptr=... (scanned=2/71)
Impact:
✅ Early detection working (catches at position 2)
✅ Diagnostic capability greatly improved
⚠️ Root cause not yet resolved (deeper investigation needed)
Status: Diagnostic improvements committed for further analysis
Credit: Root cause analysis by Task agent (Explore)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 08:43:18 +09:00
6ae0db9fd2
Fix: workset=8192 SEGV - Align slab_index_for to Box3 geometry (iteration 2)
...
Problem:
After Box3 geometry unification (commit 2fe970252), workset=8192 still SEGVs:
- 200K iterations: ✅ OK
- 300K iterations: ❌ SEGV
Root Cause (identified by ChatGPT):
Header/metadata class mismatches around 300K iterations:
- [HDR_META_MISMATCH] hdr_cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] hdr_cls=5 meta_cls=4
- [TLS_SLL_PUSH_META_MISMATCH] cls=5 meta_cls=4
Cause: slab_index_for() geometry mismatch with Box3
- tiny_slab_base_for_geometry() (Box3):
- Slab 0: ss + SUPERSLAB_SLAB0_DATA_OFFSET
- Slab 1: ss + 1*SLAB_SIZE
- Slab k: ss + k*SLAB_SIZE
- Old slab_index_for():
rel = p - (base + SUPERSLAB_SLAB0_DATA_OFFSET);
idx = rel / SLAB_SIZE;
- Result: Off-by-one for slab_idx > 0
Example: tiny_slab_base_for_geometry(ss, 4) returns 0x...40000
slab_index_for(ss, 0x...40000) returns 3 (wrong!)
Impact:
- Block allocated in "C6 slab 4" appears to be in "C5 slab 3"
- Header class_idx (C6) != meta->class_idx (C5)
- TLS SLL corruption → SEGV after extended runs
Fix: core/superslab/superslab_inline.h
======================================
Rewrite slab_index_for() as inverse of Box3 geometry:
static inline int slab_index_for(SuperSlab* ss, void* ptr) {
// ... bounds checks ...
// Slab 0: special case (has metadata offset)
if (p < base + SLAB_SIZE) {
return 0;
}
// Slab 1+: simple SLAB_SIZE spacing from base
size_t rel = p - base; // ← Changed from (p - base - OFFSET)
int idx = (int)(rel / SLAB_SIZE);
return idx;
}
Verification:
- slab_index_for(ss, tiny_slab_base_for_geometry(ss, idx)) == idx ✅
- Consistent for any address within slab
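The off-by-one and the round-trip check can be reproduced with plain integer arithmetic. The sketch below assumes SLAB_SIZE is 64 KiB and SUPERSLAB_SLAB0_DATA_OFFSET is 2048 bytes (both values are assumptions for illustration), and all demo_* functions and DEMO_SLABS are hypothetical stand-ins operating on uintptr_t rather than on the real SuperSlab type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB_SIZE ((size_t)64 * 1024)              /* assumed slab size */
#define SUPERSLAB_SLAB0_DATA_OFFSET ((size_t)2048) /* assumed slab 0 offset */
#define DEMO_SLABS 8                               /* hypothetical slab count */

/* Box3-style geometry: only slab 0 is shifted by the metadata offset. */
static inline uintptr_t demo_slab_base(uintptr_t ss, int idx) {
    return idx == 0 ? ss + SUPERSLAB_SLAB0_DATA_OFFSET
                    : ss + (size_t)idx * SLAB_SIZE;
}

/* Old lookup: subtracts the offset for every slab, so the base address of
 * slab k (k > 0) lands in slab k-1. */
static inline int demo_slab_index_old(uintptr_t ss, uintptr_t p) {
    return (int)((p - (ss + SUPERSLAB_SLAB0_DATA_OFFSET)) / SLAB_SIZE);
}

/* New lookup: inverse of demo_slab_base(). Dividing the offset from the
 * SuperSlab start by SLAB_SIZE also yields 0 for every slab 0 address. */
static inline int demo_slab_index_new(uintptr_t ss, uintptr_t p) {
    if (p < ss || p >= ss + (size_t)DEMO_SLABS * SLAB_SIZE) return -1;
    return (int)((p - ss) / SLAB_SIZE);
}
```

Under these assumptions the base of slab 4 is ss + 0x40000; the old lookup computes (0x40000 - 2048) / 0x10000 = 3, while the new one returns 4 and satisfies the round-trip property for every index.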
Test Results:
=============
workset=8192 SEGV threshold improved further:
Before this fix (after 2fe970252):
✅ 200K iterations: OK
❌ 300K iterations: SEGV
After this fix:
✅ 220K iterations: OK (15.5M ops/s)
❌ 240K iterations: SEGV (different bug)
Progress:
- Iteration 1 (2fe970252): 0 → 200K stable
- Iteration 2 (this fix): 200K → 220K stable
- Total improvement: 0 → 220K stable iterations (+10% over iteration 1)
Known Issues:
- 240K+ still SEGVs (suspected: TLS SLL double-free, per ChatGPT)
- Debug builds may show TLS_SLL_PUSH FATAL double-free detection
- Requires further investigation of free path
Impact:
- No performance regression in stable range
- Header/metadata mismatch errors eliminated
- workset=256 unaffected: 60M+ ops/s maintained
Credit: Root cause analysis and fix by ChatGPT
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 07:56:06 +09:00
2fe970252a
Fix: workset=8192 SEGV - Unify SuperSlab geometry to Box3 (partial fix)
...
Problem:
- bench_random_mixed_hakmem with workset=8192 causes SEGV
- workset=256 works fine
- Root cause identified by ChatGPT analysis
Root Cause:
Two conflicting SuperSlab geometry definitions caused slab_base misalignment:
- Old: tiny_slab_base_for() used SLAB0_OFFSET + idx * SLAB_SIZE
- New: Box3 tiny_slab_base_for_geometry() uses offset only for idx=0
- Result: slab_idx > 0 had +2048 byte offset error
- Impact: Unified Cache carve stepped beyond slab boundary → SEGV
Fix 1: core/superslab/superslab_inline.h
========================================
Delegate SuperSlab base calculation to Box3:
static inline uint8_t* tiny_slab_base_for(SuperSlab* ss, int slab_idx) {
if (!ss || slab_idx < 0) return NULL;
return tiny_slab_base_for_geometry(ss, slab_idx); // ← Box3 unified
}
Effect:
- All tiny_slab_base_for() calls now use single Box3 implementation
- TLS slab_base and Box3 calculations perfectly aligned
- Eliminates geometry mismatch between layers
Fix 2: core/front/tiny_unified_cache.c
========================================
Enhanced fail-fast validation (debug builds only):
- unified_refill_validate_base(): Use TLS as source of truth
- Cross-check with registry lookup for safety
- Validate: slab_base range, alignment, meta consistency
- Box3 + TLS boundary consolidated to one place
Fix 3: core/hakmem_tiny_superslab.h
========================================
Added forward declaration:
- SuperSlab* superslab_refill(int class_idx);
- Required by tiny_unified_cache.c
Test Results:
=============
workset=8192 SEGV threshold improved:
Before fix:
❌ Immediate SEGV at any iteration count
After fix:
✅ 100K iterations: OK (9.8M ops/s)
✅ 200K iterations: OK (15.5M ops/s)
❌ 300K iterations: SEGV (different bug exposed)
Conclusion:
- Box3 geometry unification fixed primary SEGV
- Stability improved: 0 → 200K iterations
- Remaining issue: 300K+ iterations hit different bug
- Likely causes: memory pressure, different corruption pattern
Known Issues:
- Debug warnings still present: FREE_FAST_HDR_META_MISMATCH, NXT_HDR_MISMATCH
- These are separate header consistency issues (not related to geometry)
- 300K+ SEGV requires further investigation
Performance:
- No performance regression observed in stable range
- workset=256 unaffected: 60M+ ops/s maintained
Credit: Root cause analysis and fix strategy by ChatGPT
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 07:40:35 +09:00
38e4e8d4c2
Phase 19-2: Ultra SLIM debug logging and root cause analysis
...
Add comprehensive statistics tracking and debug logging to Ultra SLIM 4-layer
fast path to diagnose why it wasn't being called.
Changes:
1. core/box/ultra_slim_alloc_box.h
- Move statistics tracking (ultra_slim_track_hit/miss) before first use
- Add debug logging in ultra_slim_print_stats()
- Track call counts to verify Ultra SLIM path execution
- Enhanced stats output with per-class breakdown
2. core/tiny_alloc_fast.inc.h
- Add debug logging at Ultra SLIM gate (lines 700-710)
- Log whether Ultra SLIM mode is enabled on first allocation
- Helps diagnose allocation path routing
Root Cause Analysis (with ChatGPT):
========================================
Problem: Ultra SLIM was not being called in default configuration
- ENV: HAKMEM_TINY_ULTRA_SLIM=1
- Observed: Statistics counters remained zero
- Expected: Ultra SLIM 4-layer path to handle allocations
Investigation:
- malloc() → Front Gate Unified Cache → complete (default path)
- Ultra SLIM gate in tiny_alloc_fast() never reached
- Front Gate/Unified Cache handles 100% of allocations
Solution to Test Ultra SLIM:
Turn OFF Front Gate and Unified Cache to force old Tiny path:
HAKMEM_TINY_ULTRA_SLIM=1 \
HAKMEM_FRONT_GATE_UNIFIED=0 \
HAKMEM_TINY_UNIFIED_CACHE=0 \
./out/release/bench_random_mixed_hakmem 100000 256 42
Results:
✅ Ultra SLIM gate logged: ENABLED
✅ Statistics: 49,526 hits, 542 misses (98.9% hit rate)
✅ Throughput: 9.1M ops/s (100K iterations)
⚠️ 10M iterations: TLS SLL corruption (not Ultra SLIM bug)
Secondary Discovery (ChatGPT Analysis):
========================================
TLS SLL C6/C7 corruption is NOT caused by Ultra SLIM:
Evidence:
- Same [TLS_SLL_POP_POST_INVALID] errors occur with Ultra SLIM OFF
- Ultra SLIM OFF + FrontGate/Unified OFF: 9.2M ops/s with same errors
- Root cause: Existing TLS SLL bug exposed when bypassing Front Gate
- Ultra SLIM never pushes to TLS SLL (only pops)
Conclusion:
- Ultra SLIM implementation is correct ✅
- Default configuration (Front Gate/Unified ON) is stable: 60M ops/s
- TLS SLL bugs are pre-existing, unrelated to Ultra SLIM
- Ultra SLIM can be safely enabled with default configuration
Performance Summary:
- Front Gate/Unified ON (default): 60.1M ops/s ✅ stable
- Ultra SLIM works correctly when path is reachable
- No changes needed to Ultra SLIM code
Next Steps:
1. Address workset=8192 SEGV (existing bug, high priority)
2. TLS SLL C6/C7 corruption (separate existing issue)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 06:50:38 +09:00
674965080f
Build: Add out/ directory to .gitignore
...
Fix Claude Code performance warning:
- Repository snapshot was tracking 385+ untracked files in out/ directory
- out/ contains build artifacts (binaries, intermediate objects)
- Adding out/ to .gitignore resolves the warning
Impact: Improves Claude Code repository scanning performance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 06:28:53 +09:00
896f24367f
Phase 19-2: Ultra SLIM 4-layer fast path implementation (ENV gated)
...
Implement Ultra SLIM 4-layer allocation fast path with ACE learning preserved.
ENV: HAKMEM_TINY_ULTRA_SLIM=1 (default OFF)
Architecture (4 layers):
- Layer 1: Init Safety (1-2 cycles, cold path only)
- Layer 2: Size-to-Class (1-2 cycles, LUT lookup)
- Layer 3: ACE Learning (2-3 cycles, histogram update) ← PRESERVED!
- Layer 4: TLS SLL Direct (3-5 cycles, freelist pop)
- Total: 7-12 cycles (~2-4ns on 3GHz CPU)
Goal: Achieve mimalloc parity (90-110M ops/s) by removing intermediate layers
(HeapV2, FastCache, SFC) while preserving HAKMEM's learning capability.
Deleted Layers (from standard 7-layer path):
❌ HeapV2 (C0-C3 magazine)
❌ FastCache (C0-C3 array stack)
❌ SFC (Super Front Cache)
Expected savings: 11-15 cycles
Implementation:
1. core/box/ultra_slim_alloc_box.h
- 4-layer allocation path (returns USER pointer)
- TLS-cached ENV check (once per thread)
- Statistics & diagnostics (HAKMEM_ULTRA_SLIM_STATS=1)
- Refill integration with backend
2. core/tiny_alloc_fast.inc.h
- Ultra SLIM gate at entry point (lines 694-702)
- Early return if Ultra SLIM mode enabled
- Zero impact on standard path (cold branch)
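A TLS-cached ENV check of the kind listed above might be sketched like this. Only the variable name HAKMEM_TINY_ULTRA_SLIM comes from this commit; demo_ultra_slim_enabled(), t_ultra_slim_cached and the rule that exactly "1" means enabled are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* -1 = not yet checked on this thread, 0 = off, 1 = on. */
static _Thread_local int t_ultra_slim_cached = -1;

/* Reads the environment once per thread and caches the answer, so the
 * allocation fast path pays only a thread-local load and a compare. */
static inline int demo_ultra_slim_enabled(void) {
    if (t_ultra_slim_cached < 0) {
        const char *e = getenv("HAKMEM_TINY_ULTRA_SLIM");
        /* Assumption: only the exact string "1" enables the mode. */
        t_ultra_slim_cached = (e != NULL && e[0] == '1' && e[1] == '\0') ? 1 : 0;
    }
    return t_ultra_slim_cached;
}
```

A gate at the allocator entry point would then be a single branch on demo_ultra_slim_enabled(). Changing the variable after a thread's first allocation has no effect on that thread, which is the usual trade-off of this caching scheme.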
Performance Results (Random Mixed 256B, 10M iterations):
- Baseline (Ultra SLIM OFF): 63.3M ops/s
- Ultra SLIM ON: 62.6M ops/s (-1.1%)
- Target: 90-110M ops/s (mimalloc parity)
- Gap: target is 44-76% above current throughput (Ultra SLIM ON is 30-43% below target)
Status: Implementation complete, but performance target not achieved.
The 4-layer architecture is in place and ACE learning is preserved.
Further optimization needed to reach mimalloc parity.
Next Steps:
- Profile Ultra SLIM path to identify remaining bottlenecks
- Verify TLS SLL hit rate (statistics currently show zero)
- Consider further cycle reduction in Layer 3 (ACE learning)
- A/B test with ACE learning disabled to measure impact
Notes:
- Ultra SLIM mode is ENV gated (off by default)
- No impact on standard 7-layer path performance
- Statistics tracking implemented but needs verification
- workset=256 tested and verified working
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 06:16:20 +09:00
707365e43b
Build: Remove tracked .d files (now in .gitignore)
...
Cleanup commit: Remove previously tracked dependency files
- core/box/tiny_near_empty_box.d
- core/hakmem_tiny.d
- core/hakmem_tiny_lifecycle.d
- core/hakmem_tiny_unified_stats.d
- hakmem_tiny_unified_stats.d
These files are build artifacts and should not be tracked.
They are now covered by *.d pattern in .gitignore.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 06:12:31 +09:00