ChatGPT's diagnostic changes to address TLS_SLL_HDR_RESET issue.
Current status: Partial mitigation, but root cause remains.
Changes Applied:
1. SuperSlab Registry Fallback (hakmem_super_registry.h)
- Added legacy table probe when hash map lookup misses
- Prevents NULL returns for valid SuperSlabs during initialization
- Status: ✅ Works but may hide underlying registration issues
2. TLS SLL Push Validation (tls_sll_box.h)
- Reject push if SuperSlab lookup returns NULL
- Reject push if class_idx mismatch detected
- Added [TLS_SLL_PUSH_NO_SS] diagnostic message
- Status: ✅ Prevents list corruption (defensive)
3. SuperSlab Allocation Class Fix (superslab_allocate.c)
- Pass actual class_idx to sp_internal_allocate_superslab
- Prevents dummy class=8 causing OOB access
- Status: ✅ Root cause fix for allocation path
4. Debug Output Additions
- First 256 push/pop operations traced
- First 4 mismatches logged with details
- SuperSlab registration state logged
- Status: ✅ Diagnostic tool (not a fix)
5. TLS Hint Box Removed
- Deleted ss_tls_hint_box.{c,h} (Phase 1 optimization)
- Simplified to focus on stability first
- Status: ⏳ Can be re-added after root cause fixed
Current Problem (REMAINS UNSOLVED):
- [TLS_SLL_HDR_RESET] still occurs after ~60 seconds of sh8bench
- Pointer is 16 bytes offset from expected (class 1 → class 2 boundary)
- hak_super_lookup returns NULL for that pointer
- Suggests: Use-After-Free, Double-Free, or pointer arithmetic error
Root Cause Analysis:
- Pattern: Pointer offset by +16 (one class 1 stride)
- Timing: Cumulative problem (appears after 60s, not immediately)
- Location: Header corruption detected during TLS SLL pop
Remaining Issues:
⚠️ Registry fallback is defensive (may hide registration bugs)
⚠️ Push validation prevents symptoms but not root cause
⚠️ 16-byte pointer offset source unidentified
Next Steps for Investigation:
1. Full pointer arithmetic audit (Magazine ⇔ TLS SLL paths)
2. Enhanced logging at HDR_RESET point:
- Expected vs actual pointer value
- Pointer provenance (where it came from)
- Allocation trace for that block
3. Verify Headerless flag is OFF throughout build
4. Check for double-offset application in conversions
Technical Assessment:
- 60% root cause fixes (allocation class, validation)
- 40% defensive mitigation (registry fallback, push rejection)
Performance Impact:
- Registry fallback: +10-30 cycles on cold path (negligible)
- Push validation: +5-10 cycles per push (acceptable)
- Overall: < 2% performance impact estimated
Related Issues:
- Phase 1 TLS Hint Box removed temporarily
- Phase 2 Headerless blocked until stability achieved
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add active field to TinySlabMeta to track blocks currently held by
users (not in TLS SLL or freelist caches). This enables accurate
empty slab detection that accounts for TLS SLL cached blocks.
Changes:
- superslab_types.h: Add _Atomic uint16_t active field
- ss_allocation_box.c, hakmem_tiny_superslab.c: Initialize active=0
- tiny_free_fast_v2.inc.h: Decrement active on TLS SLL push
- tiny_alloc_fast.inc.h: Add tiny_active_track_alloc() helper,
increment active on TLS SLL pop (all code paths)
- ss_hot_cold_box.h: ss_is_slab_empty() uses active when enabled
All tracking is ENV-gated: HAKMEM_TINY_ACTIVE_TRACK=1 to enable.
Default is off for zero performance impact.
Invariant: active = used - tls_cached (active <= used)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>