Major breakthrough: sh8bench now completes without SIGSEGV!
Added defensive refcounting and failsafe mechanisms to prevent
use-after-free and corruption propagation.
Changes:
1. SuperSlab Refcount Pinning (core/box/tls_sll_box.h)
- tls_sll_push_impl: increment refcount before adding to list
- tls_sll_pop_impl: decrement refcount when removing from list
- Prevents SuperSlab from being freed while TLS SLL holds pointers
2. SuperSlab Release Guards (core/superslab_allocate.c, shared_pool_release.c)
- Check refcount > 0 before freeing SuperSlab
- If refcount > 0, defer release instead of freeing
- Prevents use-after-free when TLS/remote/freelist hold stale pointers
3. TLS SLL Next Pointer Validation (core/box/tls_sll_box.h)
- Detect invalid next pointer during traversal
- Log [TLS_SLL_NEXT_INVALID] when detected
- Drop list to prevent corruption propagation
4. Unified Cache Freelist Validation (core/front/tiny_unified_cache.c)
- Validate freelist head before use
- Log [UNIFIED_FREELIST_INVALID] for corrupted lists
- Defensive drop to prevent bad allocations
5. Early Refcount Decrement Fix (core/tiny_free_fast.inc.h)
- Removed ss_active_dec_one from fast path
- Prevents premature refcount depletion
- Defers decrement to proper cleanup path
Test Results:
✅ sh8bench completes successfully (exit code 0)
✅ No SIGSEGV or ABORT signals
✅ Short runs (5s) crash-free
⚠️ Multiple [TLS_SLL_NEXT_INVALID] / [UNIFIED_FREELIST_INVALID] logged
⚠️ Invalid pointers still present (stale references exist)
Status Analysis:
- Stability: ACHIEVED (no crashes)
- Root Cause: NOT FULLY SOLVED (invalid pointers remain)
- Approach: Defensive + refcount guards working well
Remaining Issues:
❌ Why does SuperSlab get unregistered while TLS SLL holds pointers?
❌ SuperSlab lifecycle: remote_queue / adopt / LRU interactions?
❌ Stale pointers indicate improper SuperSlab lifetime management
Performance Impact:
- Refcount operations: +1-3 cycles per push/pop (minor)
- Validation checks: +2-5 cycles (minor)
- Overall: < 5% overhead estimated
Next Investigation:
- Trace SuperSlab lifecycle (allocation → registration → unregister → free)
- Check remote_queue handling
- Verify adopt/LRU mechanisms
- Correlate stale pointer logs with SuperSlab unregister events
Log Volume Warning:
- May produce many diagnostic logs on long runs
- Consider ENV gating for production
Technical Notes:
- Refcount is per-SuperSlab, not global
- Guards prevent symptom propagation, not root cause
- Root cause is in SuperSlab lifecycle management
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Critical discovery: TLS SLL head itself is getting corrupted with invalid pointers,
not a next-pointer offset issue. Added defensive sanitization and detailed logging.
Changes:
1. tls_sll_sanitize_head() - New defensive function
- Validates TLS head against SuperSlab metadata
- Checks header magic byte consistency
- Resets corrupted list immediately on detection
- Called at push_enter and pop_enter (defensive walls)
2. Enhanced HDR_RESET diagnostics
- Dump both next pointers (offset 0 and tiny_next_off())
- Show first 8 bytes of block (raw dump)
- Include next_off value and pointer values
- Better correlation with SuperSlab metadata
Key Findings from Diagnostic Run (/tmp/sh8_short.log):
- TLS head becomes unregistered garbage value at pop_enter
- Example: head=0x749fe96c0990 meta_cls=255 idx=-1 ss=(nil)
- Sanitize detects and resets the list
- SuperSlab registration is SUCCESSFUL (map_count=4)
- But head gets corrupted AFTER registration
Root Cause Analysis:
✅ NOT a next-pointer offset issue (would be consistent)
❌ TLS head is being OVERWRITTEN by external code
- Candidates: TLS variable collision, memset overflow, stray write
Corruption Pattern:
1. Superslab initialized successfully (verified by map_count)
2. TLS head is initially correct
3. Between registration and pop_enter: head gets corrupted
4. Corruption value is garbage (unregistered pointer)
5. Lower bytes damaged (0xe1/0x31 patterns)
Next Steps:
- Check TLS layout and variable boundaries (stack overflow?)
- Audit all writes to g_tls_sll array
- Look for memset/memcpy operating on wrong range
- Consider thread-local storage fragmentation
Technical Impact:
- Sanitize prevents list propagation (defensive)
- But underlying corruption source remains
- May be in TLS initialization, variable layout, or external overwrite
Performance: Negligible (sanitize is once per pop_enter)
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
ChatGPT's diagnostic changes to address TLS_SLL_HDR_RESET issue.
Current status: Partial mitigation, but root cause remains.
Changes Applied:
1. SuperSlab Registry Fallback (hakmem_super_registry.h)
- Added legacy table probe when hash map lookup misses
- Prevents NULL returns for valid SuperSlabs during initialization
- Status: ✅ Works but may hide underlying registration issues
2. TLS SLL Push Validation (tls_sll_box.h)
- Reject push if SuperSlab lookup returns NULL
- Reject push if class_idx mismatch detected
- Added [TLS_SLL_PUSH_NO_SS] diagnostic message
- Status: ✅ Prevents list corruption (defensive)
3. SuperSlab Allocation Class Fix (superslab_allocate.c)
- Pass actual class_idx to sp_internal_allocate_superslab
- Prevents dummy class=8 causing OOB access
- Status: ✅ Root cause fix for allocation path
4. Debug Output Additions
- First 256 push/pop operations traced
- First 4 mismatches logged with details
- SuperSlab registration state logged
- Status: ✅ Diagnostic tool (not a fix)
5. TLS Hint Box Removed
- Deleted ss_tls_hint_box.{c,h} (Phase 1 optimization)
- Simplified to focus on stability first
- Status: ⏳ Can be re-added after root cause fixed
Current Problem (REMAINS UNSOLVED):
- [TLS_SLL_HDR_RESET] still occurs after ~60 seconds of sh8bench
- Pointer is 16 bytes offset from expected (class 1 → class 2 boundary)
- hak_super_lookup returns NULL for that pointer
- Suggests: Use-After-Free, Double-Free, or pointer arithmetic error
Root Cause Analysis:
- Pattern: Pointer offset by +16 (one class 1 stride)
- Timing: Cumulative problem (appears after 60s, not immediately)
- Location: Header corruption detected during TLS SLL pop
Remaining Issues:
⚠️ Registry fallback is defensive (may hide registration bugs)
⚠️ Push validation prevents symptoms but not root cause
⚠️ 16-byte pointer offset source unidentified
Next Steps for Investigation:
1. Full pointer arithmetic audit (Magazine ⇔ TLS SLL paths)
2. Enhanced logging at HDR_RESET point:
- Expected vs actual pointer value
- Pointer provenance (where it came from)
- Allocation trace for that block
3. Verify Headerless flag is OFF throughout build
4. Check for double-offset application in conversions
Technical Assessment:
- 60% root cause fixes (allocation class, validation)
- 40% defensive mitigation (registry fallback, push rejection)
Performance Impact:
- Registry fallback: +10-30 cycles on cold path (negligible)
- Push validation: +5-10 cycles per push (acceptable)
- Overall: < 2% performance impact estimated
Related Issues:
- Phase 1 TLS Hint Box removed temporarily
- Phase 2 Headerless blocked until stability achieved
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Updated ptr_conversion_box.h: Use TINY_HEADER_SIZE instead of hardcoded -1
- Updated tiny_front_hot_box.h: Use tiny_user_offset() for BASE->USER conversion
- Updated tiny_front_cold_box.h: Use tiny_user_offset() for BASE->USER conversion
- Added tiny_layout_box.h includes to both front box headers
Box theory: Layout parameters now isolated in dedicated Box component.
All offset arithmetic centralized - no scattered +1/-1 arithmetic.
Verified: Build succeeds (make clean && make shared -j8)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem: hak_core_init.inc.h references KPI measurement variables
(g_latency_histogram, g_latency_samples, g_baseline_soft_pf, etc.)
but hakmem.c was including hak_kpi_util.inc.h AFTER hak_core_init.inc.h,
causing undefined reference errors.
Solution: Reorder includes so hak_kpi_util.inc.h (definition) comes
before hak_core_init.inc.h (usage).
Build result: ✅ Success (libhakmem.so 547KB, 0 errors)
Minor changes:
- Added extern __thread declarations for TLS SLL debug variables
- Added signal handler logging for debug_dump_last_push
- Improved hakmem_tiny.c structure for Phase 2 preparation
🤖 Generated with Claude Code + Task Agent
Co-Authored-By: Gemini <gemini@example.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Root cause: unified_cache_refill() accessed cache->slots before initialization
when a size class was first used via the refill path (not pop path).
Fix: Add lazy initialization check at start of unified_cache_refill()
- Check if cache->slots is NULL before accessing
- Call unified_cache_init() if needed
- Return NULL if init fails (graceful degradation)
Also includes:
- ss_cold_start_box.inc.h: Box Pattern for default prewarm settings
- hakmem_super_registry.c: Use static array in prewarm (avoid recursion)
- Default prewarm enabled (1 SuperSlab/class, configurable via ENV)
Test: 8B→16B→Mixed allocation pattern now works correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component. This feature allows for precise identification of the root cause for Out-Of-Memory (OOM) issues related to ACE allocations.
Key changes include:
- **ACE Tracing Implementation**:
- Added environment variable to enable/disable detailed logging of allocation failures.
- Instrumented , , and to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
- Corrected to ensure is properly linked into , resolving an error.
- **LD_PRELOAD Wrapper Adjustments**:
- Investigated and understood the wrapper's behavior under , particularly its interaction with and checks.
- Enabled debugging flags for environment to prevent unintended fallbacks to 's for non-tiny allocations, allowing comprehensive testing of the allocator.
- **Debugging & Verification**:
- Introduced temporary verbose logging to pinpoint execution flow issues within interception and routing. These temporary logs have been removed.
- Created to facilitate testing of the tracing features.
This feature will significantly aid in diagnosing and resolving allocation-related OOM issues in by providing clear insights into the failure pathways.
- Removed Legacy Backend fallback; Shared Pool is now the sole backend.
- Removed Soft Cap limit in Shared Pool to allow full memory management.
- Implemented EMPTY slab recycling with batched meta->used decrement in remote drain.
- Updated tiny_free_local_box to return is_empty status for safe recycling.
- Fixed race condition in release path by removing from legacy list early.
- Achieved 50.3M ops/s in WS8192 benchmark (+200% vs baseline).
**Problem**: Include order dependency prevented using TINY_FRONT_FASTCACHE_ENABLED
macro in hakmem_tiny_refill.inc.h (included before tiny_alloc_fast.inc.h).
**Solution** (from ChatGPT advice):
- Move wrapper functions to tiny_front_config_box.h as static inline
- This makes them available regardless of include order
- Enables dead code elimination in PGO mode for refill path
**Changes**:
1. core/box/tiny_front_config_box.h:
- Add tiny_fastcache_enabled() and sfc_cascade_enabled() as static inline
- These access static global variables via extern declaration
2. core/hakmem_tiny_refill.inc.h:
- Include tiny_front_config_box.h
- Use TINY_FRONT_FASTCACHE_ENABLED macro (line 162)
- Enables dead code elimination in PGO mode
3. core/tiny_alloc_fast.inc.h:
- Remove duplicate wrapper function definitions
- Now uses functions from config box header
**Performance**: 79.8M ops/s (maintained, 77M/81M/81M across 3 runs)
**Design Principle**: Config Box as "single entry point" for Tiny Front policy
- All config checks go through TINY_FRONT_*_ENABLED macros
- Wrapper functions centralized in config box header
- Include order independent (static inline in header)
🐱 Generated with ChatGPT advice for solving include order dependencies
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
1. Archive unused backend files (ss_legacy/unified_backend_box.c/h)
- These files were not linked in the build
- Moved to archive/ to reduce confusion
2. Created HAK_RET_ALLOC_BLOCK macro for SuperSlab allocations
- Replaces superslab_return_block() function
- Consistent with existing HAK_RET_ALLOC pattern
- Single source of truth for header writing
- Defined in hakmem_tiny_superslab_internal.h
3. Added header validation on TLS SLL push
- Detects blocks pushed without proper header
- Enabled via HAKMEM_TINY_SLL_VALIDATE_HDR=1 (release)
- Always on in debug builds
- Logs first 10 violations with backtraces
Benefits:
- Easier to track allocation paths
- Catches header bugs at push time
- More maintainable macro-based design
Note: Larson bug still reproduces - header corruption occurs
before push validation can catch it.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major refactoring to improve maintainability and debugging:
1. Split hakmem_tiny_superslab.c (1521 lines) into 7 focused files:
- superslab_allocate.c: SuperSlab allocation/deallocation
- superslab_backend.c: Backend allocation paths (legacy, shared)
- superslab_ace.c: ACE (Adaptive Cache Engine) logic
- superslab_slab.c: Slab initialization and bitmap management
- superslab_cache.c: LRU cache and prewarm cache management
- superslab_head.c: SuperSlabHead management and expansion
- superslab_stats.c: Statistics tracking and debugging
2. Created hakmem_tiny_superslab_internal.h for shared declarations
3. Added superslab_return_block() as single exit point for header writing:
- All backend allocations now go through this helper
- Prevents bugs where headers are forgotten in some paths
- Makes future debugging easier
4. Updated Makefile for new file structure
5. Added header writing to ss_legacy_backend_box.c and
ss_unified_backend_box.c (though not currently linked)
Note: Header corruption bug in Larson benchmark still exists.
Class 1-6 allocations go through TLS refill/carve paths, not backend.
Further investigation needed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>