WIP: Add TLS SLL validation and SuperSlab registry fallback

Diagnostic changes suggested by ChatGPT to address the TLS_SLL_HDR_RESET issue.
Current status: partial mitigation; the root cause has not been identified.

Changes Applied:
1. SuperSlab Registry Fallback (hakmem_super_registry.h)
   - Added legacy table probe when hash map lookup misses
   - Prevents NULL returns for valid SuperSlabs during initialization
   - Status: Works, but may hide underlying registration issues

2. TLS SLL Push Validation (tls_sll_box.h)
   - Reject push if SuperSlab lookup returns NULL
   - Reject push if class_idx mismatch detected
   - Added [TLS_SLL_PUSH_NO_SS] diagnostic message
   - Status: Prevents list corruption (defensive)

3. SuperSlab Allocation Class Fix (superslab_allocate.c)
   - Pass actual class_idx to sp_internal_allocate_superslab
   - Prevents the dummy class=8 from causing an out-of-bounds access to g_ss_ace[]
   - Status: Root cause fix for the allocation path

4. Debug Output Additions
   - First 256 push/pop operations traced
   - First 4 mismatches logged with details
   - SuperSlab registration state logged
   - Status: Diagnostic tool (not a fix)

5. TLS Hint Box Removed
   - Deleted ss_tls_hint_box.{c,h} (Phase 1 optimization)
   - Simplified to focus on stability first
   - Status: Can be re-added after the root cause is fixed
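
Illustration for changes 1 and 2 (not the code in this commit): a minimal sketch of a hash lookup that falls back to a legacy table on a miss, and of a push that is rejected when the lookup fails or the class does not match. Every Demo*/demo_* name, the 2 MiB shift, and the table sizes are assumptions made for the example; the real logic lives in hakmem_super_registry.h and tls_sll_box.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not the hakmem API. */
#define DEMO_SS_SHIFT   21u   /* assumption: 2 MiB-aligned regions */
#define DEMO_HASH_SIZE  64u
#define DEMO_LEGACY_MAX 16u

typedef struct DemoSuperSlab {
    uintptr_t base;       /* start address of the region */
    size_t    size;       /* region length in bytes */
    int       class_idx;  /* size class served by this region */
} DemoSuperSlab;

static DemoSuperSlab* demo_hash[DEMO_HASH_SIZE];     /* fast path */
static DemoSuperSlab* demo_legacy[DEMO_LEGACY_MAX];  /* fallback table */

/* Nonzero when p lies inside ss; a NULL ss never matches. */
static int demo_contains(const DemoSuperSlab* ss, uintptr_t p) {
    return ss && p >= ss->base && p - ss->base < ss->size;
}

/* Hash lookup first; on a miss, probe the legacy table linearly. */
static DemoSuperSlab* demo_lookup(const void* ptr) {
    uintptr_t p = (uintptr_t)ptr;
    DemoSuperSlab* ss = demo_hash[(p >> DEMO_SS_SHIFT) % DEMO_HASH_SIZE];
    if (demo_contains(ss, p)) return ss;
    for (size_t i = 0; i < DEMO_LEGACY_MAX; i++) {
        if (demo_contains(demo_legacy[i], p)) return demo_legacy[i];
    }
    return NULL;
}

typedef enum {
    DEMO_PUSH_OK,
    DEMO_PUSH_NO_SS,          /* lookup returned NULL */
    DEMO_PUSH_CLASS_MISMATCH  /* owner serves a different class */
} DemoPushResult;

typedef struct DemoNode { struct DemoNode* next; } DemoNode;

/* Link ptr into the list only if its owner is known and the class matches. */
static DemoPushResult demo_sll_push(DemoNode** head, void* ptr, int class_idx) {
    assert(head != NULL);
    DemoSuperSlab* ss = demo_lookup(ptr);
    if (!ss) return DEMO_PUSH_NO_SS;
    if (ss->class_idx != class_idx) return DEMO_PUSH_CLASS_MISMATCH;
    DemoNode* n = (DemoNode*)ptr;
    n->next = *head;
    *head = n;
    return DEMO_PUSH_OK;
}
```

A rejected push leaves the list untouched, which is why this guards against corruption without explaining where the bad pointer came from.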

Current Problem (REMAINS UNSOLVED):
- [TLS_SLL_HDR_RESET] still occurs after ~60 seconds of sh8bench
- Pointer is offset 16 bytes from the expected address (class 1 → class 2 boundary)
- hak_super_lookup returns NULL for that pointer
- Suggests a use-after-free, a double free, or a pointer arithmetic error

Root Cause Analysis:
- Pattern: Pointer offset by +16 (one class 1 stride)
- Timing: Cumulative problem (appears after ~60s, not immediately)
- Location: Header corruption detected during TLS SLL pop
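
Illustration of the +16 pattern (a sketch, not hakmem code): a helper that reports which block a pointer falls into and how many bytes past the block start it sits. The DemoSlab layout is hypothetical. The 16-byte class 1 stride comes from the note above; the 32-byte stride used for class 2 in the usage note is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one slab; not the hakmem layout. */
typedef struct DemoSlab {
    uintptr_t base;      /* address of block 0 */
    size_t    stride;    /* bytes per block for this class */
    size_t    capacity;  /* number of blocks */
} DemoSlab;

typedef struct DemoPtrCheck {
    int    in_range;  /* 1 if the pointer lies inside the slab */
    size_t index;     /* block index the pointer falls into */
    size_t residue;   /* bytes past the start of that block; 0 = aligned */
} DemoPtrCheck;

/* Classify p relative to the slab's block grid. */
static DemoPtrCheck demo_check_ptr(const DemoSlab* slab, uintptr_t p) {
    DemoPtrCheck r = {0, 0, 0};
    assert(slab != NULL && slab->stride > 0);
    if (p < slab->base) return r;
    size_t off = (size_t)(p - slab->base);
    if (off >= slab->stride * slab->capacity) return r;
    r.in_range = 1;
    r.index = off / slab->stride;
    r.residue = off % slab->stride;
    return r;
}
```

With a 32-byte stride, a pointer at block 3 plus 16 bytes reports index 3 and residue 16. With a 16-byte stride the same +16 shift lands exactly on block 4 with residue 0, so an alignment check alone cannot see it inside class 1.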

Remaining Issues:
⚠️ Registry fallback is defensive (may hide registration bugs)
⚠️ Push validation prevents symptoms but not root cause
⚠️ 16-byte pointer offset source unidentified

Next Steps for Investigation:
1. Full pointer arithmetic audit (Magazine ⇔ TLS SLL paths)
2. Enhanced logging at HDR_RESET point:
   - Expected vs actual pointer value
   - Pointer provenance (where it came from)
   - Allocation trace for that block
3. Verify Headerless flag is OFF throughout build
4. Check for double-offset application in conversions

Technical Assessment (rough estimate):
- 60% fixes for identified bugs (allocation class, class_idx validation)
- 40% defensive mitigation (registry fallback, push rejection)

Performance Impact:
- Registry fallback: +10-30 cycles on cold path (negligible)
- Push validation: +5-10 cycles per push (acceptable)
- Overall: estimated performance impact < 2%

Related Issues:
- Phase 1 TLS Hint Box removed temporarily
- Phase 2 Headerless blocked until stability achieved

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-12-03 20:42:28 +09:00
parent 2624dcce62
commit 0546454168
27 changed files with 543 additions and 46 deletions

@@ -569,13 +569,30 @@ int sp_freelist_pop_lockfree(int class_idx, SharedSSMeta** meta_out, int* slot_i
 // Allocator helper for SuperSlab (Phase 9-2 Task 1)
+// NOTE: class_idx MUST be a valid tiny class (0-7). Passing an out-of-range
+// value previously went through superslab_allocate(8), which overflowed
+// g_ss_ace[] and could corrupt neighboring globals, leading to missing
+// registry entries and TLS SLL header corruption.
 SuperSlab*
-sp_internal_allocate_superslab(void)
+sp_internal_allocate_superslab(int class_idx)
 {
+    do {
+        static _Atomic uint32_t g_sp_alloc_log = 0;
+        uint32_t shot = atomic_fetch_add_explicit(&g_sp_alloc_log, 1, memory_order_relaxed);
+        if (shot < 4) {
+            fprintf(stderr, "[SP_INTERNAL_ALLOC] class_idx=%d\n", class_idx);
+            fflush(stderr);
+        }
+    } while (0);
+    // Clamp to valid range to avoid out-of-bounds access inside superslab_allocate().
+    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES_SS) {
+        class_idx = TINY_NUM_CLASSES_SS - 1;
+    }
     // Use legacy backend to allocate a SuperSlab (malloc-based)
     extern SuperSlab* superslab_allocate(uint8_t size_class);
-    // Pass 8 as class_idx (dummy, will be overwritten) or larger
-    SuperSlab* ss = superslab_allocate(8);
+    SuperSlab* ss = superslab_allocate((uint8_t)class_idx);
     if (!ss) {
         return NULL;
     }
@@ -596,7 +613,7 @@ shared_pool_acquire_superslab(void)
 {
     // Phase 12: Legacy wrapper?
     // This function seems to be a direct allocation bypass.
-    return sp_internal_allocate_superslab();
+    return sp_internal_allocate_superslab(0);
 }
 void sp_fix_geometry_if_needed(SuperSlab* ss, int slab_idx, int class_idx) {