# Context Summary for ChatGPT - TLS SLL Header Corruption Fix

**Date**: 2025-12-03
**Project**: hakmem - Custom Memory Allocator
**Handoff From**: Gemini + Task agent (previous phase)
**Current Task**: Diagnose and fix TLS SLL header corruption
**Status**: CRITICAL BLOCKER - Investigation Required

---

## Quick Facts

| Item | Value |
|------|-------|
| **Problem** | Header corruption in TLS SLL during baseline testing |
| **Error Message** | `[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1` |
| **Error Location** | `core/box/tls_sll_box.h:282-303` |
| **Affected Configurations** | ALL (shared code path issue) |
| **Root Cause** | Unknown (6 patterns documented) |
| **Fix Type** | Surgical (1-5 lines expected) |
| **Build Status** | ✅ Succeeds |
| **Baseline Test Status** | ❌ Crashes (SIGSEGV at ~22 seconds) |

---

## What is 0x31 vs 0xa1?

```
Expected (header magic):  0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx=1)
Got (corruption):         0x31 = ASCII character '1' or some user data

This means: User data exists where header should be.
```

---

## Project Architecture (Box Theory)

The hakmem allocator uses a **Box Theory** architecture where:
- Each component (memory layout, pointer conversion, TLS state) is a separate "box"
- Each box has a single responsibility and clear API boundaries
- Examples:
  - `tiny_layout_box.h` - Class sizes and header offsets (single source of truth)
  - `ptr_conversion_box.h` - Pointer type safety (base vs user pointers)
  - `tls_sll_box.h` - Thread-local single-linked list management
  - `tls_ss_hint_box.h` - SuperSlab hint cache (Phase 1 optimization)

---

## Recent Changes (Last 5 Commits)

1. **f3f75ba3d** - "Fix Magazine Spill RAW pointer type conversion"
   - Added HAK_BASE_FROM_RAW() wrapper in hakmem_tiny_refill.inc.h:228
   - Status: ✅ Fixed

2. **2dc9d5d59** - "Fix include order in hakmem.c"
   - Moved #include "box/hak_kpi_util.inc.h" before hak_core_init.inc.h
   - Status: ✅ Fixed

3. **94f9ea51** - "Implement TLS SuperSlab Hint Box (Phase 1)"
   - New header-only cache for recently-used SuperSlabs
   - Status: ✅ Implemented, but only 2.3% performance improvement (target was 15-20%)

4. Earlier: Box theory framework, phantom types, etc.

---

## The Remaining Issue: TLS SLL Header Corruption

### Symptom

```bash
# Build succeeds
$ make clean && make shared -j8
Building libhakmem.so... OK (547KB)

# But baseline test crashes
$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0
Segmentation fault (core dumped)
```

### Timeline

- **When Discovered**: During Phase 1 benchmarking (2025-12-03)
- **Frequency**: 100% reproducible with sh8bench
- **Scope**: Affects baseline (Headerless OFF), so affects all configurations

### Error Location

**File**: `core/box/tls_sll_box.h` (lines 282-303)
**Function**: `tls_sll_pop_impl()`
**Operation**: Reading header validation

```c
// Simplified logic (actual code has more details)
if (tiny_class_preserves_header(class_idx)) {
    uint8_t* b = (uint8_t*)raw_base;
    uint8_t got = *b;  // Read byte at offset 0
    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));

    if (got != expected) {
        fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x\n",
                class_idx, raw_base, got, expected);
        // Reset TLS SLL for this class
    }
}
```

### Root Cause - Six Documented Patterns

The diagnostic document identifies six possible patterns:

1. **RAW Pointer vs BASE Pointer** - Wrong pointer type passed to tls_sll_push()
2. **Header Offset Mismatch** - Writing at one offset, reading at another
3. **Atomic Fence Missing** - Compiler/CPU reordering of write + push
4. **Adjacent Block Overflow** - User data from previous block overwrites header
5. **Class Index Mismatch** - Push with one class_idx, pop as different class_idx
6. **Headerless Mode Interference** - Mixed header/headerless logic despite OFF flag

---

## Your Task

**You have two comprehensive documents**:

1. **`docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** (THIS FILE'S COMPANION)
   - Step-by-step task breakdown
   - 7-step investigation and fix process
   - Expected validation criteria

2. **`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (MAIN REFERENCE - 1,150+ LINES)
   - Deep dive into all 6 root cause patterns
   - Code examples for each pattern
   - Minimal test case template
   - Diagnostic logging instrumentation
   - Fix code templates
   - 7-step validation procedure

**Follow the handoff document's steps 1-7 to diagnose and fix this issue.**

---

## Build & Test Commands

### Quick Build

```bash
cd /mnt/workdisk/public_share/hakmem
make clean
make shared -j8
```

### Baseline Test (Should Currently Crash)

```bash
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \
  grep -E "TLS_SLL_HDR_RESET|Total|Segmentation"
```

### Minimal Test Case (After Creation)

```bash
./tests/test_tls_sll_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|PASS|FAIL"
```

---

## Important File Locations

| Path | Purpose |
|------|---------|
| `core/box/tls_sll_box.h` | TLS SLL implementation (error source) |
| `core/hakmem_tiny_free.inc` | Free path - where headers are written |
| `core/hakmem_tiny_refill.inc.h` | Magazine spill - recent fix location |
| `core/box/ptr_conversion_box.h` | Pointer type conversion |
| `core/box/tiny_layout_box.h` | Class layout definitions |
| `core/box/tls_ss_hint_box.h` | Phase 1 optimization (new) |
| `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` | YOUR MAIN REFERENCE |

---

## Key Data Structures

### TLS SLL Header Structure

```c
typedef struct {
    uint8_t hdr;        // Header: 0xa0 | class_idx
    uint8_t pad;        // Padding/metadata
    uint16_t _unused;   // Alignment
    SuperSlab* next;    // Pointer to next SuperSlab
} TlsSllEntry;
```

### Header Validation

```c
// Expected value for class 1:
expected = 0xa0 | 1 = 0xa1

// What we're seeing:
got = 0x31 = some user data

// This means the header was never written OR was overwritten
```

---

## Pointer Types in hakmem

The codebase distinguishes between:

```c
hak_base_ptr_t  - "Base pointer" pointing to start of allocation (includes header)
hak_user_ptr_t  - "User pointer" pointing to user data (after offset adjustment)

Conversion:
  user = base + tiny_user_offset(class_idx)  // Typically base + 1
  base = user - tiny_user_offset(class_idx)  // Typically user - 1
```

**Critical**: In Headerless mode, the offset is 0, so base == user.

---

## Known Good Patterns (For Reference)

From previous fixes:

```c
// Pattern: Wrapping RAW pointer before TLS SLL push (ALREADY FIXED)
void* p = mag->items[--mag->top].ptr;           // RAW pointer (user offset)
hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);   // Wrap to base pointer
if (!tls_sll_push(class_idx, base_p, cap)) {    // Push base pointer

// Pattern: Consistent include order (ALREADY FIXED)
#include "box/hak_kpi_util.inc.h"   // Must come first
#include "hak_core_init.inc.h"      // Must come after
```

---

## Success Criteria

| Criteria | Status |
|----------|--------|
| TLS SLL Header Corruption diagnosed | ❌ In progress |
| Root cause pattern identified | ❌ In progress |
| Minimal reproducer created | ❌ In progress |
| Fix implemented | ❌ In progress |
| sh8bench runs without errors | ❌ GOAL |
| cfrac runs without errors | ❌ GOAL |
| No performance regression | ❌ GOAL |

---

## Previous Phase Context

This project has gone through several phases:

- **Phase 0**: Initial implementation (completed)
- **Phase 1**: TLS SuperSlab Hint Box optimization (implemented, needs validation)
- **Phase 2**: Headerless mode (designed, blocked by current issue)
- **Phase 102**: MemApi bridge (future)

The current issue blocks validation of Phase 1 and progression to Phase 2.
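
To make the current issue concrete, the sketch below reproduces the arithmetic behind the `got=0x31 expect=0xa1` log line in isolation. It is not hakmem code: `expected_header`, `first_byte_seen`, and `print_demo` are hypothetical helpers, and the values of `HEADER_CLASS_MASK` (0x0f) and `USER_OFFSET` (1) are assumptions based on the "0xa0 | class_idx" and "typically base + 1" descriptions in this document. It shows what a header check reads when it is handed a user pointer instead of a base pointer (pattern 1), assuming the user data happens to start with the character '1'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed values; only 0xa0 and "base + 1" are quoted in this document. */
#define HEADER_MAGIC      0xa0u
#define HEADER_CLASS_MASK 0x0fu  /* assumption: low nibble holds class_idx */
#define USER_OFFSET       1u     /* assumption: header occupies one byte */

/* Hypothetical helper: header byte expected for a given class. */
static uint8_t expected_header(int class_idx) {
    return (uint8_t)(HEADER_MAGIC | ((unsigned)class_idx & HEADER_CLASS_MASK));
}

/* Builds one class-1 block whose user data starts with '1' and returns the
 * byte a header check would read at the base pointer (use_user_pointer == 0)
 * or at the user pointer (use_user_pointer != 0). */
static uint8_t first_byte_seen(int use_user_pointer) {
    uint8_t block[16] = {0};
    block[0] = expected_header(1);              /* header at offset 0 */
    memcpy(block + USER_OFFSET, "1234567", 8);  /* user data after the header */
    const uint8_t *p = use_user_pointer ? block + USER_OFFSET : block;
    return *p;
}

/* Prints both cases in a format similar to the log line quoted earlier. */
static void print_demo(void) {
    for (int use_user = 0; use_user <= 1; use_user++) {
        uint8_t got = first_byte_seen(use_user);
        printf("%s pointer: got=0x%02x expect=0x%02x %s\n",
               use_user ? "user" : "base", got, expected_header(1),
               got == expected_header(1) ? "ok" : "MISMATCH");
    }
}
```

Calling `print_demo()` from a `main` function reports a match for the base pointer and a mismatch for the user pointer, where the byte read is 0x31, the ASCII code of '1'. This only shows that pattern 1 is consistent with the logged values; it does not establish which of the six patterns is the actual cause.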
---

## Timeline Estimate

- **Step 1 (Read guide)**: 15-30 min
- **Step 2-3 (Setup + logging)**: 1-2 hours
- **Step 4 (Diagnostic run)**: 30 min
- **Step 5 (Pattern matching)**: 1 hour
- **Step 6 (Fix implementation)**: 30 min - 1 hour
- **Step 7 (Validation)**: 1-2 hours

**Total**: 4-8 hours expected

---

## Next: Start Investigation

👉 **Next Action**: Read `docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` and follow steps 1-7.

The comprehensive diagnostic guide (`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`) contains all the details you need for each pattern and debugging technique.

**Questions or blockers?** The diagnostic guide has extensive explanations for each pattern.

---

**You're now ready to begin the investigation. Good luck! 🚀**