Created 9 diagnostic and handoff documents (48KB) to guide ChatGPT through systematic diagnosis and fix of the TLS SLL header corruption issue.

Documents Added:
- README_HANDOFF_CHATGPT.md: Master guide explaining 3-doc system
- CHATGPT_CONTEXT_SUMMARY.md: Quick facts & architecture (2-3 min read)
- CHATGPT_HANDOFF_TLS_DIAGNOSIS.md: 7-step procedure (4-8h timeline)
- GEMINI_HANDOFF_SUMMARY.md: Handoff summary for user review
- STATUS_2025_12_03_CURRENT.md: Complete project status snapshot
- TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md: Deep reference (1,150+ lines)
  - 6 root cause patterns with code examples
  - Diagnostic logging instrumentation
  - Fix templates and validation procedures
- TLS_SS_HINT_BOX_DESIGN.md: Phase 1 optimization design (1,148 lines)
- HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md: Test environment setup
- SEGFAULT_INVESTIGATION_FOR_GEMINI.md: Original investigation notes

Problem Context:
- Baseline (Headerless OFF) crashes with [TLS_SLL_HDR_RESET]
- Error: cls=1 base=0x... got=0x31 expect=0xa1
- Blocks Phase 1 validation and Phase 2 progression

Expected Outcome:
- ChatGPT follows 7-step diagnostic process
- Root cause identified (one of 6 patterns)
- Surgical fix (1-5 lines)
- TC1 baseline completes without crashes

🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Context Summary for ChatGPT - TLS SLL Header Corruption Fix
Date: 2025-12-03
Project: hakmem - Custom Memory Allocator
Handoff From: Gemini + Task agent (previous phase)
Current Task: Diagnose and fix TLS SLL header corruption
Status: CRITICAL BLOCKER - Investigation Required
Quick Facts
| Item | Value |
|---|---|
| Problem | Header corruption in TLS SLL during baseline testing |
| Error Message | [TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1 |
| Error Location | core/box/tls_sll_box.h:282-303 |
| Affected Configurations | ALL (shared code path issue) |
| Root Cause | Unknown (6 patterns documented) |
| Fix Type | Surgical (1-5 lines expected) |
| Build Status | ✅ Succeeds |
| Baseline Test Status | ❌ Crashes (SIGSEGV at ~22 seconds) |
What is 0x31 vs 0xa1?
Expected (header magic): 0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx=1)
Got (corruption): 0x31 = ASCII character '1' or some user data
This means: user data sits at the address where the header byte should be.
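The byte arithmetic above can be checked with a small sketch. It assumes the magic occupies the high nibble and that `HEADER_CLASS_MASK` is `0x0f`; the mask value and the `hdr_*` helper names are illustrative assumptions, not taken from the hakmem sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 0xa0 comes from this summary. The mask value 0x0f is an ASSUMPTION. */
#define HEADER_MAGIC      0xa0u
#define HEADER_CLASS_MASK 0x0fu

/* Build the header byte expected for a size class. */
static uint8_t hdr_expected(unsigned class_idx) {
    return (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
}

/* True when the high nibble carries the magic, i.e. the byte looks like a header. */
static int hdr_has_magic(uint8_t b) {
    return (b & (uint8_t)~HEADER_CLASS_MASK) == HEADER_MAGIC;
}

/* True when the byte is printable ASCII, a hint that it is user payload. */
static int hdr_looks_like_text(uint8_t b) {
    return b >= 0x20 && b <= 0x7e;
}

/* Hypothetical helper: print a one-line classification of an observed byte. */
static void hdr_explain(unsigned class_idx, uint8_t got) {
    printf("cls=%u expect=0x%02x got=0x%02x magic=%s text=%s\n",
           class_idx, (unsigned)hdr_expected(class_idx), (unsigned)got,
           hdr_has_magic(got) ? "yes" : "no",
           hdr_looks_like_text(got) ? "yes" : "no");
}
```

Under these assumptions, `0x31` fails the magic check and passes the printable-text check, which is consistent with application payload rather than a stale header from another class.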
Project Architecture (Box Theory)
The hakmem allocator uses a Box Theory architecture where:
- Each component (memory layout, pointer conversion, TLS state) is a separate "box"
- Each box has a single responsibility and clear API boundaries
- Examples:
  - tiny_layout_box.h - Class sizes and header offsets (single source of truth)
  - ptr_conversion_box.h - Pointer type safety (base vs user pointers)
  - tls_sll_box.h - Thread-local singly linked list management
  - tls_ss_hint_box.h - SuperSlab hint cache (Phase 1 optimization)
Recent Changes (Last 5 Commits)
- f3f75ba3d - "Fix Magazine Spill RAW pointer type conversion"
  - Added HAK_BASE_FROM_RAW() wrapper in hakmem_tiny_refill.inc.h:228
  - Status: ✅ Fixed
- 2dc9d5d59 - "Fix include order in hakmem.c"
  - Moved #include "box/hak_kpi_util.inc.h" before hak_core_init.inc.h
  - Status: ✅ Fixed
- 94f9ea51 - "Implement TLS SuperSlab Hint Box (Phase 1)"
  - New header-only cache for recently-used SuperSlabs
  - Status: ✅ Implemented, but only 2.3% performance improvement (target was 15-20%)
- Earlier: Box theory framework, phantom types, etc.
The Remaining Issue: TLS SLL Header Corruption
Symptom
# Build succeeds
$ make clean && make shared -j8
Building libhakmem.so... OK (547KB)
# But baseline test crashes
$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0
Segmentation fault (core dumped)
Timeline
- When Discovered: During Phase 1 benchmarking (2025-12-03)
- Frequency: 100% reproducible with sh8bench
- Scope: Affects baseline (Headerless OFF), so affects all configurations
Error Location
File: core/box/tls_sll_box.h (lines 282-303)
Function: tls_sll_pop_impl()
Operation: Header validation on pop (reads the byte at the block base)
// Simplified logic (actual code has more details)
if (tiny_class_preserves_header(class_idx)) {
    uint8_t* b = (uint8_t*)raw_base;
    uint8_t got = *b;  // Read byte at offset 0
    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
    if (got != expected) {
        fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x\n",
                class_idx, raw_base, got, expected);
        // Reset TLS SLL for this class
    }
}
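When the check fires, the single `got` byte says little about how the header was lost. A hex dump of the neighboring bytes can help tell a one-byte offset error from a longer overflow. The helper below is a hypothetical sketch, not part of hakmem; it assumes the caller knows the bounds of a valid region (for example the owning slab) and clamps the dump to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical diagnostic helper: print up to `span` bytes on each side of
 * offset `base_off` inside a known-valid region [region, region + len).
 * The byte at `base_off` is bracketed. Clamping to the region keeps the dump
 * itself from reading outside it. Returns the number of bytes printed. */
static size_t dump_around_base(FILE* out, const uint8_t* region, size_t len,
                               size_t base_off, size_t span) {
    assert(region != NULL && base_off < len);

    /* First byte to print, clamped to the start of the region. */
    size_t lo = (base_off > span) ? base_off - span : 0;
    /* One past the last byte to print, clamped to the end of the region. */
    size_t hi = (len - base_off > span) ? base_off + span + 1 : len;

    for (size_t i = lo; i < hi; i++) {
        if (i == base_off) {
            fprintf(out, "[%02x]", (unsigned)region[i]);
        } else {
            fprintf(out, " %02x ", (unsigned)region[i]);
        }
    }
    fputc('\n', out);
    return hi - lo;
}
```

A run of identical printable bytes on both sides of the bracketed byte would point toward an overflow or a live allocation, while a valid header one byte earlier would point toward a base/user pointer mix-up.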
Root Cause - Six Documented Patterns
The diagnostic document identifies six possible patterns:
- RAW Pointer vs BASE Pointer - Wrong pointer type passed to tls_sll_push()
- Header Offset Mismatch - Writing at one offset, reading at another
- Atomic Fence Missing - Compiler/CPU reordering of write + push
- Adjacent Block Overflow - User data from previous block overwrites header
- Class Index Mismatch - Push with one class_idx, pop as different class_idx
- Headerless Mode Interference - Mixed header/headerless logic despite OFF flag
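Two of these patterns can be reproduced in a plain byte buffer, which may make the symptom easier to recognize. The sketch below simulates the wrong-pointer and adjacent-overflow cases only. The 16-byte stride, the `0x0f` class mask, and all `sim_*` names are assumptions for illustration and do not come from the hakmem sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEADER_MAGIC 0xa0u
#define CLASS_MASK   0x0fu  /* ASSUMED mask value */
#define BLOCK_SIZE   16u    /* ASSUMED block stride, for illustration only */
#define USER_OFFSET  1u     /* header in byte 0, user data from byte 1 */

/* Write the header byte for class_idx at the block's base. */
static void sim_write_header(uint8_t* base, unsigned class_idx) {
    base[0] = (uint8_t)(HEADER_MAGIC | (class_idx & CLASS_MASK));
}

/* Validate as the pop path does: read byte 0 of whatever pointer was pushed. */
static int sim_header_ok(const uint8_t* pushed, unsigned class_idx) {
    return pushed[0] == (uint8_t)(HEADER_MAGIC | (class_idx & CLASS_MASK));
}

/* Healthy case: the base pointer is pushed and the header is intact. */
static int sim_healthy(void) {
    uint8_t slab[2 * BLOCK_SIZE];
    memset(slab, 0, sizeof slab);
    sim_write_header(slab, 1);
    memcpy(slab + USER_OFFSET, "1234", 4);  /* payload stays inside the block */
    return sim_header_ok(slab, 1);
}

/* Wrong pointer type: the user pointer (base + 1) is pushed instead of the base. */
static int sim_wrong_pointer(void) {
    uint8_t slab[2 * BLOCK_SIZE];
    memset(slab, 0, sizeof slab);
    sim_write_header(slab, 1);
    uint8_t* user = slab + USER_OFFSET;
    memcpy(user, "1234", 4);        /* first payload byte is '1' (0x31) */
    return sim_header_ok(user, 1);  /* byte 0 of the pushed pointer is 0x31 */
}

/* Adjacent overflow: a write in block 0 runs one byte into block 1's header. */
static int sim_overflow(void) {
    uint8_t slab[2 * BLOCK_SIZE];
    memset(slab, 0, sizeof slab);
    uint8_t* b0 = slab;
    uint8_t* b1 = slab + BLOCK_SIZE;
    sim_write_header(b0, 1);
    sim_write_header(b1, 1);
    /* Usable payload is BLOCK_SIZE - USER_OFFSET bytes; write one too many. */
    memset(b0 + USER_OFFSET, '1', BLOCK_SIZE - USER_OFFSET + 1);
    return sim_header_ok(b1, 1);    /* b1[0] has been overwritten with 0x31 */
}
```

Both failure cases produce the same `got=0x31` at validation time, so the observed byte alone does not distinguish them; the surrounding bytes and the pushed address do.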
Your Task
You have two comprehensive documents:
- docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md (THIS FILE'S COMPANION)
  - Step-by-step task breakdown
  - 7-step investigation and fix process
  - Expected validation criteria
- docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md (MAIN REFERENCE - 1,150+ LINES)
  - Deep dive into all 6 root cause patterns
  - Code examples for each pattern
  - Minimal test case template
  - Diagnostic logging instrumentation
  - Fix code templates
  - 7-step validation procedure
Follow the handoff document's steps 1-7 to diagnose and fix this issue.
Build & Test Commands
Quick Build
cd /mnt/workdisk/public_share/hakmem
make clean
make shared -j8
Baseline Test (Should Currently Crash)
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \
grep -E "TLS_SLL_HDR_RESET|Total|Segmentation"
Minimal Test Case (After Creation)
./tests/test_tls_sll_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|PASS|FAIL"
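The minimal test case does not exist yet; the diagnostic guide contains the actual template. As a rough idea of its shape, the sketch below allocates and frees batches of small blocks filled with `'1'` (0x31) so that any payload leaking into a header slot matches the observed byte. The function names, the batch size, and the choice of request size are assumptions; the size that maps to class 1 has to be taken from tiny_layout_box.h.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { BATCH = 64 };  /* ASSUMED batch size, chosen only for illustration */

/* Allocate BATCH blocks of `size` bytes, fill them with '1' (0x31), free them
 * in reverse order, and repeat `rounds` times. Returns 0 on success and 1 if
 * an allocation fails. Corruption itself is reported by the allocator on
 * stderr ([TLS_SLL_HDR_RESET]), not by this function. */
static int tls_sll_minimal_run(size_t size, int rounds) {
    void* blocks[BATCH];
    assert(size > 0 && rounds > 0);

    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < BATCH; i++) {
            blocks[i] = malloc(size);
            if (blocks[i] == NULL) {
                while (i-- > 0) free(blocks[i]);  /* release what we hold */
                return 1;
            }
            memset(blocks[i], '1', size);  /* fill the whole user area */
        }
        for (int i = BATCH - 1; i >= 0; i--) {
            free(blocks[i]);  /* frees feed the thread-local free list */
        }
    }
    return 0;
}

/* Print the PASS/FAIL marker that the grep filter looks for. */
static int tls_sll_minimal_report(int rc) {
    puts(rc == 0 ? "PASS" : "FAIL");
    return rc;
}
```

A `main` that calls `tls_sll_minimal_report(tls_sll_minimal_run(size, rounds))` and is run under `LD_PRELOAD=./libhakmem.so` would fit the grep filter above. A PASS line together with a `TLS_SLL_HDR_RESET` line on stderr would still count as a reproduction.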
Important File Locations
| Path | Purpose |
|---|---|
| core/box/tls_sll_box.h | TLS SLL implementation (error source) |
| core/hakmem_tiny_free.inc | Free path - where headers are written |
| core/hakmem_tiny_refill.inc.h | Magazine spill - recent fix location |
| core/box/ptr_conversion_box.h | Pointer type conversion |
| core/box/tiny_layout_box.h | Class layout definitions |
| core/box/tls_ss_hint_box.h | Phase 1 optimization (new) |
| docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md | YOUR MAIN REFERENCE |
Key Data Structures
TLS SLL Header Structure
typedef struct {
    uint8_t hdr;       // Header: 0xa0 | class_idx
    uint8_t pad;       // Padding/metadata
    uint16_t _unused;  // Alignment
    SuperSlab* next;   // Pointer to next SuperSlab
} TlsSllEntry;
Header Validation
// Expected value for class 1:
expected = 0xa0 | 1 = 0xa1
// What we're seeing:
got = 0x31 = some user data
// This means the header was never written OR was overwritten
Pointer Types in hakmem
The codebase distinguishes between:
hak_base_ptr_t - "Base pointer" pointing to start of allocation (includes header)
hak_user_ptr_t - "User pointer" pointing to user data (after offset adjustment)
Conversion:
user = base + tiny_user_offset(class_idx) // Typically base + 1
base = user - tiny_user_offset(class_idx) // Typically user - 1
Critical: In Headerless mode, the offset is 0, so base == user.
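The base/user distinction can be sketched with two wrapper structs so that the compiler rejects accidental mixing. This is an illustration only: the real `hak_base_ptr_t` and `hak_user_ptr_t` definitions live in ptr_conversion_box.h and may differ, and the `sketch_*` names as well as the `headerless` flag (standing in for `tiny_user_offset(class_idx)`) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distinct struct types: passing one where the other is expected is a
 * compile-time error, unlike with plain void* pointers. */
typedef struct { uint8_t* p; } sketch_base_ptr_t;  /* start of block, header at p[0] */
typedef struct { uint8_t* p; } sketch_user_ptr_t;  /* what the application sees */

/* ASSUMED offsets: 1 header byte with headers enabled, 0 in headerless mode. */
static size_t sketch_user_offset(int headerless) {
    return headerless ? 0u : 1u;
}

/* base -> user: skip over the header (if any). */
static sketch_user_ptr_t sketch_base_to_user(sketch_base_ptr_t b, int headerless) {
    sketch_user_ptr_t u = { b.p + sketch_user_offset(headerless) };
    return u;
}

/* user -> base: step back to the header (if any). */
static sketch_base_ptr_t sketch_user_to_base(sketch_user_ptr_t u, int headerless) {
    sketch_base_ptr_t b = { u.p - sketch_user_offset(headerless) };
    return b;
}
```

With headers enabled the two conversions are inverses that differ by one byte, so a single missed conversion shifts the validation read onto the first payload byte. In headerless mode both conversions are the identity, which is why a bug of this kind can stay hidden there.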
Known Good Patterns (For Reference)
From previous fixes:
// Pattern: Wrapping RAW pointer before TLS SLL push (ALREADY FIXED)
void* p = mag->items[--mag->top].ptr;          // RAW pointer (user offset)
hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);  // Wrap to base pointer
if (!tls_sll_push(class_idx, base_p, cap)) {   // Push base pointer
    // Push rejected: fallback handling is elided in this summary
}
// Pattern: Consistent include order (ALREADY FIXED)
#include "box/hak_kpi_util.inc.h" // Must come first
#include "hak_core_init.inc.h" // Must come after
Success Criteria
| Criteria | Status |
|---|---|
| TLS SLL Header Corruption diagnosed | ❌ In progress |
| Root cause pattern identified | ❌ In progress |
| Minimal reproducer created | ❌ In progress |
| Fix implemented | ❌ In progress |
| sh8bench runs without errors | ❌ GOAL |
| cfrac runs without errors | ❌ GOAL |
| No performance regression | ❌ GOAL |
Previous Phase Context
This project has gone through several phases:
- Phase 0: Initial implementation (completed)
- Phase 1: TLS SuperSlab Hint Box optimization (implemented, needs validation)
- Phase 2: Headerless mode (designed, blocked by current issue)
- Phase 102: MemApi bridge (future)
The current issue blocks validation of Phase 1 and progression to Phase 2.
Timeline Estimate
- Step 1 (Read guide): 15-30 min
- Steps 2-3 (Setup + logging): 1-2 hours
- Step 4 (Diagnostic run): 30 min
- Step 5 (Pattern matching): 1 hour
- Step 6 (Fix implementation): 30 min - 1 hour
- Step 7 (Validation): 1-2 hours
Total: 4-8 hours expected
Next: Start Investigation
👉 Next Action: Read docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md and follow steps 1-7.
The comprehensive diagnostic guide (docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md) contains all the details you need for each pattern and debugging technique.
Questions or blockers? The diagnostic guide has extensive explanations for each pattern.
You're now ready to begin the investigation. Good luck! 🚀