diff --git a/docs/CHATGPT_CONTEXT_SUMMARY.md b/docs/CHATGPT_CONTEXT_SUMMARY.md
new file mode 100644
index 00000000..6bf42e75
--- /dev/null
+++ b/docs/CHATGPT_CONTEXT_SUMMARY.md
@@ -0,0 +1,295 @@
+# Context Summary for ChatGPT - TLS SLL Header Corruption Fix
+
+**Date**: 2025-12-03
+**Project**: hakmem - Custom Memory Allocator
+**Handoff From**: Gemini + Task agent (previous phase)
+**Current Task**: Diagnose and fix TLS SLL header corruption
+**Status**: CRITICAL BLOCKER - Investigation Required
+
+---
+
+## Quick Facts
+
+| Item | Value |
+|------|-------|
+| **Problem** | Header corruption in TLS SLL during baseline testing |
+| **Error Message** | `[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1` |
+| **Error Location** | `core/box/tls_sll_box.h:282-303` |
+| **Affected Configurations** | ALL (shared code path issue) |
+| **Root Cause** | Unknown (6 patterns documented) |
+| **Fix Type** | Surgical (1-5 lines expected) |
+| **Build Status** | ✅ Succeeds |
+| **Baseline Test Status** | ❌ Crashes (SIGSEGV at ~22 seconds) |
+
+---
+
+## What is 0x31 vs 0xa1?
+
+```
+Expected (header magic): 0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx=1)
+Got (corruption):        0x31 = ASCII character '1' or some user data
+
+This means: user data sits where the header byte should be.
+```
+
+---
+
+## Project Architecture (Box Theory)
+
+The hakmem allocator uses a **Box Theory** architecture where:
+
+- Each component (memory layout, pointer conversion, TLS state) is a separate "box"
+- Each box has a single responsibility and clear API boundaries
+- Examples:
+  - `tiny_layout_box.h` - Class sizes and header offsets (single source of truth)
+  - `ptr_conversion_box.h` - Pointer type safety (base vs user pointers)
+  - `tls_sll_box.h` - Thread-local singly linked list management
+  - `tls_ss_hint_box.h` - SuperSlab hint cache (Phase 1 optimization)
+
+---
+
+## Recent Changes (Last 5 Commits)
+
+1. **f3f75ba3d** - "Fix Magazine Spill RAW pointer type conversion"
+   - Added HAK_BASE_FROM_RAW() wrapper in hakmem_tiny_refill.inc.h:228
+   - Status: ✅ Fixed
+
+2. **2dc9d5d59** - "Fix include order in hakmem.c"
+   - Moved #include "box/hak_kpi_util.inc.h" before hak_core_init.inc.h
+   - Status: ✅ Fixed
+
+3. **94f9ea51** - "Implement TLS SuperSlab Hint Box (Phase 1)"
+   - New header-only cache for recently-used SuperSlabs
+   - Status: ✅ Implemented, but only 2.3% performance improvement (target was 15-20%)
+
+4. Earlier: Box theory framework, phantom types, etc.
+
+---
+
+## The Remaining Issue: TLS SLL Header Corruption
+
+### Symptom
+
+```bash
+# Build succeeds
+$ make clean && make shared -j8
+Building libhakmem.so... OK (547KB)
+
+# But baseline test crashes
+$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0
+Segmentation fault (core dumped)
+```
+
+### Timeline
+
+- **When Discovered**: During Phase 1 benchmarking (2025-12-03)
+- **Frequency**: 100% reproducible with sh8bench
+- **Scope**: Affects baseline (Headerless OFF), so affects all configurations
+
+### Error Location
+
+**File**: `core/box/tls_sll_box.h` (lines 282-303)
+**Function**: `tls_sll_pop_impl()`
+**Operation**: Header validation while popping a block
+
+```c
+// Simplified logic (actual code has more details)
+if (tiny_class_preserves_header(class_idx)) {
+    uint8_t* b = (uint8_t*)raw_base;
+    uint8_t got = *b;  // Read byte at offset 0
+    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
+
+    if (got != expected) {
+        fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x\n",
+                class_idx, raw_base, got, expected);
+        // Reset TLS SLL for this class
+    }
+}
+```
+
+### Root Cause - Six Documented Patterns
+
+The diagnostic document identifies six possible patterns:
+
+1. **RAW Pointer vs BASE Pointer** - Wrong pointer type passed to tls_sll_push()
+2. **Header Offset Mismatch** - Writing at one offset, reading at another
+3. **Atomic Fence Missing** - Compiler/CPU reordering of write + push
+4. **Adjacent Block Overflow** - User data from previous block overwrites header
+5. **Class Index Mismatch** - Push with one class_idx, pop as different class_idx
+6. **Headerless Mode Interference** - Mixed header/headerless logic despite OFF flag
+
+---
+
+## Your Task
+
+**You have two comprehensive documents**:
+
+1. **`docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** (THIS FILE'S COMPANION)
+   - Step-by-step task breakdown
+   - 7-step investigation and fix process
+   - Expected validation criteria
+
+2. **`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (MAIN REFERENCE - 1,150+ LINES)
+   - Deep dive into all 6 root cause patterns
+   - Code examples for each pattern
+   - Minimal test case template
+   - Diagnostic logging instrumentation
+   - Fix code templates
+   - 7-step validation procedure
+
+**Follow the handoff document's steps 1-7 to diagnose and fix this issue.**
+
+---
+
+## Build & Test Commands
+
+### Quick Build
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+make shared -j8
+```
+
+### Baseline Test (Should Currently Crash)
+
+```bash
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \
+  grep -E "TLS_SLL_HDR_RESET|Total|Segmentation"
+```
+
+### Minimal Test Case (After Creation)
+
+```bash
+./test_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|PASS|FAIL"
+```
+
+---
+
+## Important File Locations
+
+| Path | Purpose |
+|------|---------|
+| `core/box/tls_sll_box.h` | TLS SLL implementation (error source) |
+| `core/hakmem_tiny_free.inc` | Free path - where headers are written |
+| `core/hakmem_tiny_refill.inc.h` | Magazine spill - recent fix location |
+| `core/box/ptr_conversion_box.h` | Pointer type conversion |
+| `core/box/tiny_layout_box.h` | Class layout definitions |
+| `core/box/tls_ss_hint_box.h` | Phase 1 optimization (new) |
+| `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` | YOUR MAIN REFERENCE |
+
+---
+
+## Key Data Structures
+
+### TLS SLL Header Structure
+
+```c
+typedef struct {
+    uint8_t hdr;       // Header: 0xa0 | class_idx
+    uint8_t pad;       // Padding/metadata
+    uint16_t _unused;  // Alignment
+    SuperSlab* next;   // Pointer to next SuperSlab
+} TlsSllEntry;
+```
+
+### Header Validation
+
+```c
+// Expected value for class 1:
+//   expected = 0xa0 | 1 = 0xa1
+
+// What we're seeing:
+//   got = 0x31 (user data, e.g. ASCII '1')
+
+// This means the header was never written, was overwritten, or the pointer does not address the header byte
+```
+
+---
+
+## Pointer Types in hakmem
+
+The codebase distinguishes between:
+
+```c
+hak_base_ptr_t   - "Base pointer" pointing to start of allocation (includes header)
+hak_user_ptr_t   - "User pointer" pointing to user data (after offset adjustment)
+
+Conversion:
+user = base + tiny_user_offset(class_idx)   // Typically base + 1
+base = user - tiny_user_offset(class_idx)   // Typically user - 1
+```
+
+**Critical**: In Headerless mode, the offset is 0, so base == user.
+
+---
+
+## Known Good Patterns (For Reference)
+
+From previous fixes:
+
+```c
+// Pattern: Wrapping RAW pointer before TLS SLL push (ALREADY FIXED)
+void* p = mag->items[--mag->top].ptr;              // RAW pointer (user offset)
+hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);     // Wrap to base pointer
+if (!tls_sll_push(class_idx, base_p, cap)) { /* push failed: handling elided */ }
+
+// Pattern: Consistent include order (ALREADY FIXED)
+#include "box/hak_kpi_util.inc.h"      // Must come first
+#include "hak_core_init.inc.h"          // Must come after
+```
+
+---
+
+## Success Criteria
+
+| Criteria | Status |
+|----------|--------|
+| TLS SLL Header Corruption diagnosed | ❌ In progress |
+| Root cause pattern identified | ❌ In progress |
+| Minimal reproducer created | ❌ In progress |
+| Fix implemented | ❌ In progress |
+| sh8bench runs without errors | ❌ GOAL |
+| cfrac runs without errors | ❌ GOAL |
+| No performance regression | ❌ GOAL |
+
+---
+
+## Previous Phase Context
+
+This project has gone through several phases:
+
+- **Phase 0**: Initial implementation (completed)
+- **Phase 1**: TLS SuperSlab Hint Box optimization (implemented, needs validation)
+- **Phase 2**: Headerless mode (designed, blocked by current issue)
+- **Phase 102**: MemApi bridge (future)
+
+The current issue blocks validation of Phase 1 and progression to Phase 2.
+
+---
+
+## Timeline Estimate
+
+- **Step 1 (Read guide)**: 15-30 min
+- **Steps 2-3 (Setup + logging)**: 1-2 hours
+- **Step 4 (Diagnostic run)**: 30 min
+- **Step 5 (Pattern matching)**: 1 hour
+- **Step 6 (Fix implementation)**: 30 min - 1 hour
+- **Step 7 (Validation)**: 1-2 hours
+
+**Total**: 4-8 hours expected
+
+---
+
+## Next: Start Investigation
+
+👉 **Next Action**: Read `docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` and follow steps 1-7.
+
+The comprehensive diagnostic guide (`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`) contains all the details you need for each pattern and debugging technique.
+
+**Questions or blockers?** The diagnostic guide has extensive explanations for each pattern.
+
+---
+
+**You're now ready to begin the investigation. Good luck! 🚀**
diff --git a/docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md b/docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md
new file mode 100644
index 00000000..629089c1
--- /dev/null
+++ b/docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md
@@ -0,0 +1,301 @@
+# ChatGPT Task: TLS SLL Header Corruption Diagnosis & Fix
+
+**Status**: BLOCKING - System instability detected in baseline configuration
+**Priority**: CRITICAL
+**Assigned to**: ChatGPT
+**Expected Duration**: 4-8 hours
+
+---
+
+## Executive Summary
+
+The hakmem memory allocator baseline configuration crashes with a critical header corruption error:
+
+```
+[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+```
+
+This occurs in **shared code paths** (not Phase 1 specific), blocking all further development and validation.
+
+**Your Task**: Diagnose and fix this issue using the comprehensive diagnostic guide.
+
+---
+
+## What You Need to Know
+
+### Context
+
+- **Project**: hakmem - custom memory allocator with "Box Theory" architecture
+- **Language**: C
+- **Current Phase**: Phase 1 implementation + Phase 2 (Headerless) planning
+- **Problem**: Baseline test crashes before completing benchmarks
+- **Error Location**: `core/box/tls_sll_box.h` - header validation during TLS SLL pop
+
+### The Error
+
+When a block is popped from the TLS SLL (Thread-Local Singly Linked List), header validation performs this check:
+
+```c
+uint8_t got = *b;              // Read byte at offset 0 of base pointer
+uint8_t expected = 0xa0 | class_idx;  // For class 1: 0xa1
+
+if (got != expected) {
+    // ERROR DETECTED - got 0x31 instead of 0xa1
+}
+```
+
+The header byte contains user data (0x31 = '1' character) instead of the expected magic value (0xa1).
+
+**This means** one of the following:
+1. Wrong pointer was stored in TLS SLL
+2. Header was not written before pushing to TLS SLL
+3. Header was overwritten after pushing
+4. Offset calculation is wrong
+
+---
+
+## Your Step-by-Step Task
+
+### Step 1: Read the Comprehensive Diagnostic Document
+
+**File**: `/mnt/workdisk/public_share/hakmem/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`
+
+This 1,150+ line document contains:
+- 6 detailed root cause patterns with code examples
+- Minimal test case template (test_tls_sll_minimal.c)
+- Diagnostic logging instrumentation points
+- Fix patterns with code snippets
+- 7-step validation procedure
+
+**Action**: Read the entire document and understand the investigation methodology.
+
+---
+
+### Step 2: Reproduce the Error with Minimal Test Case
+
+Create `/mnt/workdisk/public_share/hakmem/tests/test_tls_sll_minimal.c` based on the template in the diagnostic document.
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Build minimal test (the rpath lets ./test_minimal locate libhakmem.so at run time)
+gcc -g -O1 -I./core -I./core/box \
+    tests/test_tls_sll_minimal.c \
+    -L. -lhakmem -lpthread -Wl,-rpath,"$PWD" -o test_minimal
+
+# Run (should crash with TLS_SLL_HDR_RESET error)
+./test_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|Segmentation"
+```
+
+**Expected Output**: Should reproduce the header corruption within the first 100-1000 allocations.
+
+---
+
+### Step 3: Add Diagnostic Logging
+
+Instrument the following locations to capture when header corruption occurs:
+
+**Location A**: `core/hakmem_tiny_free.inc` - Header write before TLS SLL push
+```c
+// Around line 550: Before tls_sll_push()
+// ADD LOGGING:
+fprintf(stderr, "[HEADER_WRITE] base=%p, offset=%zu, writing 0x%02x\n",
+        base, offset, (HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK)));
+```
+
+**Location B**: `core/box/tls_sll_box.h` - Header read during pop
+```c
+// Around line 282-303: In tls_sll_pop_impl()
+// ADD LOGGING:
+fprintf(stderr, "[HEADER_READ] base=%p, got=0x%02x, expected=0x%02x\n",
+        raw_base, got, expected);
+```
+
+**Location C**: `core/hakmem_tiny_refill.inc.h` - Magazine spill
+```c
+// Around line 228: Before/after tls_sll_push()
+// ADD LOGGING:
+fprintf(stderr, "[SPILL] class=%d, ptr=%p (wrapping to base)\n", class_idx, p);
+```
+
+**Action**: Add detailed logging to identify which allocation/free cycle causes corruption.
+
+---
+
+### Step 4: Run Diagnostic Test with Logging
+
+```bash
+# Rebuild with logging enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_DEBUG_LOGGING=1"
+
+# Run minimal test and capture log
+./test_minimal 2>&1 | tee diagnostic_output.txt
+
+# Analyze log to find last successful write before corruption
+grep HEADER_WRITE diagnostic_output.txt | tail -10
+grep HEADER_READ diagnostic_output.txt | grep -A1 -B1 "got=0x31"
+```
+
+**Expected Result**: Log will show exact allocation/free sequence leading to corruption.
+
+---
+
+### Step 5: Identify Root Cause (One of Six Patterns)
+
+Based on diagnostic logs, match against these patterns from the diagnostic document:
+
+1. **RAW Pointer vs BASE Pointer**: Wrong pointer type passed to tls_sll_push()
+2. **Header Offset Mismatch**: Writing at offset 1, reading at offset 0
+3. **Atomic Fence Missing**: Compiler reordering causing write-after-push
+4. **Adjacent Block Overflow**: User data from preceding block overwrites header
+5. **Class Index Mismatch**: Push with class_idx A, pop as class_idx B
+6. **Headerless Mode Interference**: Mixed header/headerless logic
+
+**Action**: Determine which pattern applies to your findings.
+
+---
+
+### Step 6: Implement Surgical Fix
+
+Once root cause is identified, apply a minimal fix (typically 1-5 lines):
+
+**Example fixes** (from diagnostic document):
+
+```c
+// Pattern 1 - RAW vs BASE pointer:
+// WRONG:
+tls_sll_push(class_idx, p, cap);  // p is RAW pointer
+// FIXED:
+hak_base_ptr_t base = HAK_BASE_FROM_RAW(p);
+tls_sll_push(class_idx, base, cap);
+
+// Pattern 2 - Offset mismatch:
+// WRONG:
+*(uint8_t*)((char*)base + 1) = header;  // Writing at offset 1
+// In pop: uint8_t h = *((uint8_t*)base);  // Reading at offset 0
+// FIXED:
+*(uint8_t*)base = header;  // Consistent offset
+
+// Pattern 3 - Atomic fence missing:
+// WRONG:
+*hdr = magic;
+tls_sll_push(...);
+// FIXED:
+*hdr = magic;
+atomic_thread_fence(memory_order_release);  // Prevent reordering (declared in <stdatomic.h>)
+tls_sll_push(...);
+```
+
+**Action**: Apply fix to source code and rebuild.
+
+---
+
+### Step 7: Validate Fix
+
+```bash
+# Step 7a: Run minimal test
+./test_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|passed|failed"
+
+# Step 7b: Run baseline benchmark
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \
+  grep -E "TLS_SLL_HDR_RESET|Total|PASSED|FAILED"
+
+# Step 7c: Run cfrac (memory intensive)
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 2>&1 | \
+  grep -E "error|TLS_SLL_HDR_RESET|Total"
+
+# Step 7d: Check for regressions
+make test -j8 FILTER="tls_sll"
+```
+
+**Success Criteria**:
+- ✅ Minimal test completes without TLS_SLL_HDR_RESET
+- ✅ sh8bench runs to completion (several minutes)
+- ✅ cfrac completes without errors
+- ✅ All unit tests pass
+- ✅ No performance regression (< 5%)
+
+---
+
+## Commit & Documentation
+
+Once validated, commit with detailed message:
+
+```bash
+git add -A
+git commit -m "Fix TLS SLL header corruption in [Component]
+
+Root Cause:
+[Brief 1-2 sentence explanation of what was wrong]
+
+Pattern Affected:
+[Which of the 6 patterns this was]
+
+Fix Applied:
+[Minimal description of the fix]
+
+Validation:
+- [Test case] passed
+- [Benchmark] completed without TLS_SLL_HDR_RESET
+- No performance regression
+
+Related Issues:
+- TLS SLL baseline instability
+- Required for Phase 1/2 validation"
+```
+
+---
+
+## Reference Files
+
+| File | Purpose |
+|------|---------|
+| `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` | **Complete diagnostic guide** - READ FIRST |
+| `core/box/tls_sll_box.h` | TLS SLL implementation (header validation at lines 282-303) |
+| `core/hakmem_tiny_free.inc` | Free path (header write before push, line ~550) |
+| `core/hakmem_tiny_refill.inc.h` | Magazine spill (line ~228) |
+| `docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md` | Test environment setup |
+| `debug_artifacts/headerless/` | Benchmark results showing error |
+
+---
+
+## Communication Plan
+
+**Status Updates**: After each step, provide brief status:
+- Step 2: "Reproducer created - X allocations before crash"
+- Step 3: "Logging added to [X locations]"
+- Step 4: "Log analysis complete - [pattern identified]"
+- Step 5: "Root cause identified: Pattern #[N]"
+- Step 6: "Fix applied - [brief description]"
+- Step 7: "Validation complete - [test results]"
+
+---
+
+## Post-Fix: Unblocking Next Phases
+
+Once this issue is fixed, the following can proceed:
+
+1. **Phase 1 Completion**: TLS Hint Box performance optimization (currently showing 2.3% improvement vs target 15-20%)
+2. **Phase 2 Validation**: Test Headerless mode (ON/OFF configurations)
+3. **Performance Benchmarking**: Full multi-test suite (TC1, TC2, TC3)
+4. **Future Phases**: Phase 102 (MemApi bridge), production optimization
+
+---
+
+## Success Metric
+
+**GOAL**: TC1 baseline test completes successfully with zero TLS_SLL_HDR_RESET errors.
+
+Current Status: ❌ FAILING (crashes at ~22 seconds)
+Target Status: ✅ PASSING (completion in 4-6 minutes)
+
+---
+
+**Questions?** Refer to the diagnostic document for detailed explanations of each pattern and debugging technique.
+
+**Ready to start?** Begin with Step 1: Read the full diagnostic guide.
+
+🚀 Your investigation begins now!
diff --git a/docs/GEMINI_HANDOFF_SUMMARY.md b/docs/GEMINI_HANDOFF_SUMMARY.md
new file mode 100644
index 00000000..dbfd3241
--- /dev/null
+++ b/docs/GEMINI_HANDOFF_SUMMARY.md
@@ -0,0 +1,296 @@
+# 📋 Handoff Summary for User & ChatGPT
+
+**Date**: 2025-12-03
+**From**: Claude Code (Haiku) + Task Agent (previous phases)
+**To**: User (decision maker) & ChatGPT (executor)
+**Status**: 🟢 All Handoff Documents Prepared - Ready for ChatGPT Execution
+
+---
+
+## What Has Been Completed
+
+### Documents Prepared Today (4 New Files, 38 KB total, plus 1 Existing Reference)
+
+1. ✅ **`CHATGPT_CONTEXT_SUMMARY.md`** (8.5 KB)
+   - Quick reference: facts, architecture, commands
+   - Read time: 2-3 minutes
+   - First document to read
+
+2. ✅ **`CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** (8.6 KB)
+   - 7-step diagnostic procedure
+   - Follow time: 4-8 hours
+   - Main task document for ChatGPT
+
+3. ✅ **`README_HANDOFF_CHATGPT.md`** (12 KB)
+   - Master guide explaining all three documents
+   - How to use them together
+   - Expected timeline and checkpoints
+
+4. ✅ **`STATUS_2025_12_03_CURRENT.md`** (9.1 KB)
+   - Current project status
+   - Completed phases and pending tasks
+   - Metrics and history
+
+5. ✅ **`TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (existing, 1,150+ lines)
+   - Deep reference document
+   - 6 root cause patterns with code examples
+   - Diagnostic logging instrumentation points
+   - Fix templates and validation procedures
+
+**Total Documentation**: 38 KB of new handoff materials + 1,150+ lines of diagnostic reference
+
+---
+
+## The Problem (Recap)
+
+hakmem baseline crashes with TLS SLL header corruption:
+
+```
+[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+SIGSEGV (exit code 139)
+```
+
+**Status**: 🔴 CRITICAL BLOCKER
+**Scope**: Affects ALL configurations (shared code path)
+**Impact**: Cannot validate Phase 1 or proceed to Phase 2
+
+---
+
+## The Solution (Documented)
+
+Three comprehensive documents guide ChatGPT through a 7-step diagnostic and fix process:
+
+1. **Read the diagnostic guide** (reference document)
+2. **Create minimal reproducer** (test case)
+3. **Add diagnostic logging** (instrumentation)
+4. **Run diagnostic test** (capture behavior)
+5. **Identify root cause** (match to one of 6 patterns)
+6. **Implement fix** (1-5 line code change)
+7. **Validate fix** (run benchmarks)
+
+**Expected Outcome**: TC1 baseline completes without crashes
+**Expected Duration**: 4-8 hours
+
+---
+
+## Handoff Contents
+
+### For ChatGPT
+
+The main handoff is structured as:
+
+```
+1. README_HANDOFF_CHATGPT.md
+   ↓ (start here - understand the 3-document system)
+
+2. CHATGPT_CONTEXT_SUMMARY.md
+   ↓ (read for quick facts & architecture)
+
+3. CHATGPT_HANDOFF_TLS_DIAGNOSIS.md
+   ↓ (follow the 7 steps)
+
+4. TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md
+   ↓ (reference for deep details during diagnosis)
+```
+
+### Files & Commands
+
+**All necessary information is in the documents:**
+- Build commands
+- Test commands
+- File locations
+- Code examples
+- Validation procedures
+- Commit templates
+
+**ChatGPT needs no external research** - all answers are in the documents.
+
+---
+
+## Key Metrics
+
+| Item | Value |
+|------|-------|
+| **Documents Created** | 4 new files + 1 existing reference |
+| **Total Documentation** | 38 KB new + 1,150 lines reference |
+| **Diagnostic Steps** | 7 (clearly defined) |
+| **Root Cause Patterns** | 6 (documented with code examples) |
+| **Expected Fix Size** | 1-5 lines of code |
+| **Timeline Estimate** | 4-8 hours |
+
+---
+
+## Success Looks Like
+
+**BEFORE FIX**:
+```bash
+$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+Segmentation fault
+```
+
+**AFTER FIX** (illustrative output; actual figures will vary):
+```bash
+$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+Total: 54.5 Mops/s [no errors]
+✓ Completed successfully
+```
+
+---
+
+## Next Steps
+
+### For User
+
+**Option 1: Pass documents to ChatGPT immediately**
+- All documents ready in `/mnt/workdisk/public_share/hakmem/docs/`
+- ChatGPT can start diagnostics right away
+- Expected completion: 4-8 hours
+
+**Option 2: Review documents first**
+- Read `STATUS_2025_12_03_CURRENT.md` for overview
+- Read `README_HANDOFF_CHATGPT.md` to understand handoff structure
+- Then pass to ChatGPT when ready
+
+### For ChatGPT (When Handed Off)
+
+1. Read `README_HANDOFF_CHATGPT.md` (5 min)
+2. Read `CHATGPT_CONTEXT_SUMMARY.md` (2-3 min)
+3. Follow `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` steps 1-7 (4-8 hours)
+4. Consult `TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` as reference during steps 3-7
+
+---
+
+## Project Context (For Reference)
+
+### Recent Work
+
+- ✅ **Phase 0**: Type safety framework (Phantom Types, Box theory)
+- ✅ **Phase 1**: TLS SuperSlab Hint Box implementation (6 unit tests passing)
+- ✅ **Phase 1 Optimization**: Only 2.3% improvement (target 15-20%)
+- ❌ **Stability Issue**: TLS SLL header corruption blocking all validation
+- ⏳ **Phase 2**: Headerless mode design complete, awaiting baseline stability
+
+### Critical Path to Unblock Phases
+
+```
+Fix TLS SLL header corruption (4-8 hours)
+    ↓
+Validate Phase 1 performance (1-2 hours)
+    ↓
+Proceed to Phase 2 Headerless testing (2-3 days)
+    ↓
+Complete Phase 102 planning (1 week)
+```
+
+---
+
+## Files Involved
+
+**Documentation**: `/mnt/workdisk/public_share/hakmem/docs/`
+```
+README_HANDOFF_CHATGPT.md              ← Master guide
+CHATGPT_CONTEXT_SUMMARY.md             ← Quick reference
+CHATGPT_HANDOFF_TLS_DIAGNOSIS.md       ← Step-by-step task
+TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md ← Deep reference
+STATUS_2025_12_03_CURRENT.md           ← Project status
+```
+
+**Source Code**: `/mnt/workdisk/public_share/hakmem/`
+```
+core/box/tls_sll_box.h                 ← Error source
+core/hakmem_tiny_free.inc              ← Header write location
+core/hakmem_tiny_refill.inc.h          ← Magazine spill
+(and many others - detailed in context summary)
+```
+
+---
+
+## Communication Checkpoints
+
+**After ChatGPT Step 2**: "Reproducer created - X allocations before crash"
+**After ChatGPT Step 4**: "Diagnostic logs show [pattern type]"
+**After ChatGPT Step 5**: "Root cause: Pattern #[N]"
+**After ChatGPT Step 6**: "Fix applied - [description]"
+**After ChatGPT Step 7**: "Validation complete - all tests pass"
+
+---
+
+## Risk Assessment
+
+| Risk | Mitigation |
+|------|-----------|
+| Fix too invasive | Only 1-5 lines expected, surgical approach |
+| Fix breaks other code | Four validation runs in Step 7 (7a-7d) |
+| Performance regression | < 5% threshold, < 1% expected |
+| Diagnosis takes too long | Step-by-step procedure keeps focus |
+
+**Overall Risk**: LOW (well-documented, clear success criteria)
+
+---
+
+## Summary for User
+
+### What's Ready
+
+✅ All diagnostic documentation complete
+✅ 7-step procedure clearly defined
+✅ 6 root cause patterns documented with code examples
+✅ Minimal test case template provided
+✅ Validation procedures detailed
+✅ Project context available
+
+### What's Needed from ChatGPT
+
+🎯 Execute the 7-step diagnostic procedure
+🎯 Identify which pattern caused the issue
+🎯 Implement surgical fix (1-5 lines)
+🎯 Validate with benchmarks
+🎯 Commit with detailed message
+
+### Timeline
+
+**Documentation**: ✅ Complete (0 hours)
+**ChatGPT Execution**: ⏳ 4-8 hours estimated
+**Project Unblock**: 🎯 Within 8 hours total
+
+---
+
+## Decision Point
+
+**Should ChatGPT proceed with diagnosis?**
+
+- **YES**: Pass the 5 documents to ChatGPT immediately
+  - Start: `README_HANDOFF_CHATGPT.md`
+  - Follow: `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`
+  - Reference: The other documents
+
+- **NO**: Review project first
+  - Read: `STATUS_2025_12_03_CURRENT.md`
+  - Then decide whether to hand off
+
+---
+
+## Success Metric (Clear & Measurable)
+
+✅ **SUCCESS** = TC1 baseline test completes without TLS_SLL_HDR_RESET errors
+
+---
+
+## Final Note
+
+This handoff is **complete and comprehensive**. Every piece of information ChatGPT needs is in the five documents. No external research required. The diagnostic methodology is sound. The fix is likely to be simple once identified.
+
+**Ready to hand off to ChatGPT.** 🚀
+
+---
+
+**Questions for ChatGPT before starting?** → They're answered in the documents.
+
+**Ready to proceed?** → Start with `README_HANDOFF_CHATGPT.md`
+
+---
+
+*Prepared by: Claude Code (Haiku) on 2025-12-03*
+*For: User + ChatGPT*
+*Status: ✅ Ready for handoff*
diff --git a/docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md b/docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md
new file mode 100644
index 00000000..568baa6e
--- /dev/null
+++ b/docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md
@@ -0,0 +1,228 @@
+# Headerless Stability Debug Instructions (Root-Cause / Fail-Fast)
+
+Quality bar for this playbook:
+
+| Metric | Score | Notes |
+| --- | --- | --- |
+| Coverage | 9/10 | Seven root-cause candidates + multiple probes |
+| Actionability | 9/10 | Copy/pasteable bash + gdb/asan commands |
+| Time budget | 10-22h | Phased so we can stop after each milestone |
+| Expected success | 85-90% | Parallel probes + bisect safety net |
+
+Goal (Definition of Done)
+- Reproduce, isolate, and permanently fix the headerless instability with a verified regression test.
+- Fix must be A/B switchable and observable (Box Theory: isolate boxes, single boundary, backout flag).
+
+Scope and signals
+- Both Headerless OFF and Headerless ON crash: suggests shared path, not just hint box.
+- Observed symptoms: TLS_SLL integrity failures, invalid free() pointers, hangs in sh8bench/cfrac.
+
+Box Theory anchors (work inside clear boxes, fail-fast, reversible)
+- Box 2: Remote queue push/drain (no owner/publish side effects).
+- Box 3: Ownership CAS (only at bind boundary).
+- Box 4: Publish/Adopt boundary (single drain->bind->owner acquire point).
+- Hint box: tls_ss_hint cache (guarded by `HAKMEM_TINY_SS_TLS_HINT`).
+- Backouts: `HAKMEM_TINY_HEADERLESS`, `HAKMEM_TINY_SS_TLS_HINT`, `HAKMEM_TINY_SS_ADOPT`, `HAKMEM_TINY_RF_FORCE_NOTIFY`.
+
+---
+
+## Step-by-Step Flow
+
+### 0) Pre-flight (15 min)
+- `ulimit -c unlimited`; make sure `git status -sb` shows a tree clean enough to bisect.
+- Use single-thread first: `export HAKMEM_TINY_THREADS=1`.
+- Disable learn/ACE noise: `export HAKMEM_ACE_ENABLED=0 HAKMEM_LEARN=0`.
+- Keep artifacts: `mkdir -p debug_artifacts/headerless`.
+
+### 1) Test Case 1 — Headerless OFF (control)
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean && make shared -j8
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/tc1_off.log | tail -40
+```
+Expected: completes with "Total elapsed time".  
+If it crashes: the base path (non-headerless) is already broken -> focus on shared free/registry first.
+
+### 2) Test Case 2 — Headerless ON, hint OFF
+```bash
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/tc2_hdrless_nohint.log | tail -40
+```
+Outcome tells us whether headerless core path (without hint) is already unstable.
+
+### 3) Test Case 3 — Headerless ON, hint ON (Phase 1 path)
+```bash
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/tc3_hdrless_hint.log | tail -40
+```
+If TC2 passes and TC3 fails, suspect hint cache / adopt boundary; otherwise suspect shared box.
+
+### 4) ASan pass (pinpoint corruption early)
+```bash
+make clean && make asan-shared-alloc -j8 \
+  EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+LD_PRELOAD=./libhakmem_asan.so timeout 20 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/asan_hdrless.log | head -200
+```
+If ASan is noisy, rebuild with `-DHAKMEM_TINY_SS_TLS_HINT=0` and rerun to see if corruption follows the hint box.
+
+### 5) GDB capture (first crash)
+```bash
+make clean && make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+gdb --args ./mimalloc-bench/out/bench/sh8bench
+(gdb) set environment LD_PRELOAD ./libhakmem.so
+(gdb) run
+(gdb) bt
+(gdb) frame 0
+(gdb) info locals
+(gdb) x/4gx ptr  # replace ptr with the crashing pointer
+```
+Save to `debug_artifacts/headerless/gdb_bt.txt`.
+
+### 6) Git bisect (only after TC1 result is known)
+```bash
+git bisect start
+git bisect bad HEAD
+git bisect good <last-known-good>   # e.g., pre f3f75ba3d if that was stable
+# Per-step test script (save the two lines below to a file and pass it to `git bisect run`):
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1" || exit 125
+LD_PRELOAD=./libhakmem.so timeout 15 ./mimalloc-bench/out/bench/sh8bench && exit 0 || exit 1
+```
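+Record each verdict in `debug_artifacts/headerless/bisect_log.txt`. Reset with `git bisect reset` after. Note that `timeout` exits with status 124 when the limit is hit, which the test line above reports as bad.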
+
+---
+
+## Root-Cause Candidates (7) and Probes
+
+1) TLS hint cache stale/dangling (Box: hint)
+- Symptom: free() uses cached ss that was recycled; remote-dangling or wrong class.
+- Probe: log generation vs pointer range.
+```c
+fprintf(stderr, "[HINT_LOOKUP] ptr=%p ss=%p gen=%llu magic=%llx\n",
+        ptr, ss, ss ? (unsigned long long)ss->generation : 0,
+        ss ? (unsigned long long)ss->magic : 0);
+```
+- A/B: `HAKMEM_TINY_SS_TLS_HINT=0` should fully remove this path.
+
+2) TLS SLL normalize mismatch (Box: TLS SLL)
+- Symptom: headerless ptr hits queue expecting header offset.
+- Probe: in `core/box/tls_sll_box.h` around normalize/mismatch detection, log once:
+```c
+fprintf(stderr, "[TLS_SLL_MISMATCH] ptr=%p has_hdr=%d expect_hdr=%d q=%s\n",
+        ptr, actual_has_header, expected_has_header, queue_name);
+```
+- Check that `TLS_SLL_NORMALIZE_USERPTR/RAWPTR` is invoked at every push/pop boundary.
+
+3) SuperSlab registry stale or race (Box: registry boundary)
+- Symptom: registry returns freed slab; hint and registry disagree.
+- Probe: add generation/epoch in TinySuperSlab and compare on lookup; assert `SUPERSLAB_MAGIC`.
+- A/B: force registry path only by turning hint off; compare crash locus.
+
+4) Class index drift (Box: metadata)
+- Symptom: slab->class_idx corrupt -> wrong free list math.
+- Probe: after `slab_index_for()`, assert `class_idx < TINY_NUM_CLASSES`; log slab_idx/class_idx.
+- A/B: run small vs 1024-byte classes; see if only one class fails.
+
+5) Magazine wrap/unwrap slip (Box: refill/magazine)
+- Symptom: pointer stored raw, read as user (or vice versa) in refill spill.
+- Probe: instrument `core/hakmem_tiny_refill.inc.h` around magazine push/pop; dump raw/user pointer deltas.
+- A/B: force refill slow path only: `export HAKMEM_TINY_MUST_ADOPT=1`.
+
+6) Remote queue drain boundary breach (Box 2->4 boundary)
+- Symptom: remote drain merges freelist twice or skips owner check.
+- Probe: ring events or one-shot logs at `ss_remote_drain_to_freelist()` and adopt boundary:
+```c
+fprintf(stderr, "[REMOTE_DRAIN] ss=%p slab=%d count_before=%u\n", ss, slab_idx, remote_counts[slab_idx]);
+```
+- A/B: `HAKMEM_TINY_SS_ADOPT=0` to see if crash is tied to adopt boundary logic.
+
+7) Pointer wrap/unwrap toggle confusion (Box: pointer bridge)
+- Symptom: header offset applied twice or skipped.
+- Probe: assert alignment and expected delta at every `user_to_raw/raw_to_user` site in free path.
+- A/B: run with `HAKMEM_TINY_HEADERLESS=0` vs `1` with same workload; see if delta shows only in headerless.
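+
+Candidates 1 and 3 both come down to validating a cached or registered SuperSlab before trusting it. A minimal sketch of such a check follows, assuming a per-slab generation counter and magic field as suggested in the probes above. All `demo_*` names, the magic value, and the struct layout are hypothetical and not taken from the hakmem sources. The sketch also assumes slab descriptors stay mapped after recycling, so reading `magic` from a stale one is safe.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+#define DEMO_SS_MAGIC 0x53534d41u /* hypothetical magic value */
+
+typedef struct {
+    uint32_t  magic;      /* set while the slab is live */
+    uint64_t  generation; /* bumped every time the slab is recycled */
+    uintptr_t start;      /* first byte covered by the slab */
+    size_t    size;       /* number of bytes covered */
+} demo_superslab;
+
+typedef struct {
+    demo_superslab *ss;         /* cached slab */
+    uint64_t        generation; /* generation observed when the hint was cached */
+} demo_hint;
+
+/* Return the cached slab only if it is still live, has not been
+ * recycled since it was cached, and actually covers ptr. */
+static demo_superslab *demo_hint_lookup(const demo_hint *h, const void *ptr) {
+    demo_superslab *ss = h->ss;
+    if (ss == NULL) return NULL;
+    if (ss->magic != DEMO_SS_MAGIC) return NULL;      /* slab no longer live */
+    if (ss->generation != h->generation) return NULL; /* recycled: stale hint */
+    uintptr_t p = (uintptr_t)ptr;
+    if (p < ss->start || p - ss->start >= ss->size) return NULL; /* out of range */
+    return ss;
+}
+```
+
+A lookup that returns `NULL` here would presumably fall back to the registry path, which is the path the `HAKMEM_TINY_SS_TLS_HINT=0` A/B run exercises for every free.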
+
+---
+
+## Data to Capture (single-pass, no log spam)
+- Logs: last 400 lines from each TC run; grep for the `TLS_SLL`, `HINT`, and `REMOTE` tag prefixes (e.g. `grep -E 'TLS_SLL|HINT|REMOTE'`).
+- GDB: full `bt`, `frame 0`, `info locals`, and pointer dump.
+- ASan: first 150 lines including shadow/poison info.
+- Minimal repro: smallest C snippet or shell script that crashes within 30s.
+- Env stamp: `uname -a`, `lscpu | head -20`, `git rev-parse HEAD`.
+
+Format when reporting:
+```
+=== TC1 (Headerless OFF) ===
+Result: crash / hang / pass
+Last log lines: ...
+
+=== TC2 (Headerless ON, hint OFF) ===
+Result: ...
+
+=== TC3 (Headerless ON, hint ON) ===
+Result: ...
+
+=== ASan ===
+<first 20 lines + error site>
+
+=== GDB (first crash) ===
+<bt + frame 0 locals>
+```
+
+---
+
+## Observability and Guardrails (Box Theory)
+- One-shot logs only; no continuous debug spam. Use counters where possible.
+- Keep boundary single: drain->bind->owner_acquire only inside refill/adopt; do not add side effects in remote push/publish.
+- Toggleable fixes: wrap new checks with `#if defined(DEBUG_HDRLESS)` or env flags so we can A/B quickly.
+- Fail-fast: `assert`/`abort` on invalid class_idx, magic, or out-of-range pointers instead of silently recovering.
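+
+The one-shot logging and fail-fast guardrails could be combined roughly as follows. This is a sketch that assumes C11 atomics are available. `demo_once`, `demo_class_idx_valid`, `demo_check_class`, and `DEMO_NUM_CLASSES` are illustrative names (the real limit is `TINY_NUM_CLASSES`), and `DEBUG_HDRLESS` is the toggle named above.
+
+```c
+#include <stdatomic.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#define DEMO_NUM_CLASSES 8u /* hypothetical stand-in for TINY_NUM_CLASSES */
+
+/* Returns 1 for the first caller only and 0 afterwards, so the
+ * guarded log line is printed at most once per flag. */
+static int demo_once(atomic_flag *flag) {
+    return !atomic_flag_test_and_set(flag);
+}
+
+static int demo_class_idx_valid(unsigned class_idx) {
+    return class_idx < DEMO_NUM_CLASSES;
+}
+
+/* Toggleable check: compiled in only when DEBUG_HDRLESS is defined. */
+static void demo_check_class(unsigned class_idx) {
+#if defined(DEBUG_HDRLESS)
+    static atomic_flag logged = ATOMIC_FLAG_INIT;
+    if (!demo_class_idx_valid(class_idx)) {
+        if (demo_once(&logged)) {
+            fprintf(stderr, "[CLASS_IDX_INVALID] class_idx=%u\n", class_idx);
+        }
+        abort(); /* fail fast instead of silently recovering */
+    }
+#else
+    (void)class_idx; /* check compiled out */
+#endif
+}
+```
+
+Keeping the flag `static` inside the checking function means each call site gets its own one-shot budget, which fits the "no continuous debug spam" rule.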
+
+---
+
+## Decision Tree
+- TC1 fails -> shared free/registry bug; ignore hint; inspect pointer normalize + registry first.
+- TC1 passes, TC2 fails -> headerless core path bug; focus on pointer normalize and class_idx drift.
+- TC2 passes, TC3 fails -> hint cache or adopt boundary; focus on stale hint + generation checks.
+- ASan shows UAF/double-free -> instrument free path and magazine spill; gate hint off to see if corruption follows.
+- Bisect isolates commit -> fix there, keep A/B flag, add regression test.
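+
+The first three branches of the tree can be written down as a small lookup, which may help when scripting the triage. This is only a sketch: the enum and function names are made up for illustration, and TC1/TC2/TC3 refer to the three test cases in the report format above (Headerless OFF; Headerless ON with hint OFF; Headerless ON with hint ON).
+
+```c
+typedef enum { DEMO_PASS, DEMO_FAIL } demo_result;
+
+typedef enum {
+    DEMO_FOCUS_SHARED_FREE_REGISTRY, /* TC1 fails */
+    DEMO_FOCUS_HEADERLESS_CORE,      /* TC1 passes, TC2 fails */
+    DEMO_FOCUS_HINT_OR_ADOPT,        /* TC2 passes, TC3 fails */
+    DEMO_FOCUS_NONE                  /* all three pass */
+} demo_focus;
+
+/* Map the three test-case results to the area to inspect first.
+ * Earlier test cases take priority, mirroring the order of the tree. */
+static demo_focus demo_next_focus(demo_result tc1, demo_result tc2,
+                                  demo_result tc3) {
+    if (tc1 == DEMO_FAIL) return DEMO_FOCUS_SHARED_FREE_REGISTRY;
+    if (tc2 == DEMO_FAIL) return DEMO_FOCUS_HEADERLESS_CORE;
+    if (tc3 == DEMO_FAIL) return DEMO_FOCUS_HINT_OR_ADOPT;
+    return DEMO_FOCUS_NONE;
+}
+```
+
+The ASan and bisect branches are left out because they depend on tool output rather than on pass/fail results.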
+
+---
+
+## Timeline (target 10-22h)
+- 2-4h: run TC1-3, capture GDB/ASan, decide branch of decision tree.
+- 4-8h: instrument relevant box (from candidates), build A/B toggles, derive minimal repro.
+- 2-6h: root-cause confirmation with repro + ASan clean pass.
+- 2-4h: implement fix, add regression test, verify all three test cases + baseline perf smoke.
+
+---
+
+## Quick Command Reference
+```bash
+# Clean builds
+make clean && make shared -j8
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+make clean && make asan-shared-alloc -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Runs
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench
+LD_PRELOAD=./libhakmem_asan.so timeout 20 ./mimalloc-bench/out/bench/sh8bench
+
+# GDB essentials
+gdb --args ./mimalloc-bench/out/bench/sh8bench
+(gdb) set environment LD_PRELOAD ./libhakmem.so
+(gdb) run
+(gdb) bt
+(gdb) frame 0
+(gdb) info locals
+
+# Bisect skeleton
+git bisect start
+git bisect bad HEAD
+git bisect good <good-sha>
+# build/test, mark good|bad|skip
+git bisect reset
+```
diff --git a/docs/README_HANDOFF_CHATGPT.md b/docs/README_HANDOFF_CHATGPT.md
new file mode 100644
index 00000000..9f5844a7
--- /dev/null
+++ b/docs/README_HANDOFF_CHATGPT.md
@@ -0,0 +1,378 @@
+# 🚀 ChatGPT Task Handoff - TLS SLL Header Corruption Fix
+
+**Target**: ChatGPT
+**Task**: Diagnose and fix critical TLS SLL header corruption
+**Status**: Ready for immediate handoff
+**Date**: 2025-12-03
+
+---
+
+## Quick Start (TL;DR)
+
+**The Problem**: hakmem baseline crashes with header corruption
+```
+[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+```
+
+**Your Task**: Fix it using 7 documented steps
+
+**Documents You Need** (in order):
+1. 📖 **READ FIRST**: `CHATGPT_CONTEXT_SUMMARY.md` (2-3 min read)
+2. 📋 **FOLLOW**: `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` (7 detailed steps)
+3. 🔍 **REFERENCE**: `TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` (about 1,100 lines of deep reference)
+
+**Success**: TC1 baseline test completes without crashes
+
+**Timeline**: 4-8 hours expected
+
+---
+
+## The Three Documents Explained
+
+### 1. CHATGPT_CONTEXT_SUMMARY.md
+
+**Purpose**: Quick reference and architecture overview
+**Read Time**: 2-3 minutes
+**Contains**:
+- What 0x31 means vs 0xa1
+- Project architecture (Box Theory)
+- Recent changes (5 commits)
+- The remaining issue explained simply
+- File locations and data structures
+- Build & test commands
+- Success criteria
+
+**When to Use**:
+- First thing to read
+- Reference when you need quick facts
+- Before diving into detailed diagnosis
+
+---
+
+### 2. CHATGPT_HANDOFF_TLS_DIAGNOSIS.md
+
+**Purpose**: Step-by-step task breakdown for fixing the issue
+**Follow Time**: 4-8 hours
+**Contains**:
+- Executive summary
+- 7 specific steps to diagnose and fix:
+  - Step 1: Read the diagnostic guide
+  - Step 2: Reproduce with minimal test
+  - Step 3: Add diagnostic logging
+  - Step 4: Run diagnostic test
+  - Step 5: Identify root cause pattern
+  - Step 6: Implement fix
+  - Step 7: Validate fix
+- Expected output for each step
+- How to identify which of 6 patterns caused the issue
+- Example fix code for each pattern
+- Validation criteria
+- Commit message template
+
+**When to Use**:
+- This is your TASK DOCUMENT
+- Follow the 7 steps in order
+- After each step, update status
+
+---
+
+### 3. TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md
+
+**Purpose**: Deep reference for detailed understanding
+**Reference Time**: As needed during diagnosis
+**Contains**:
+- 6 root cause patterns with full code examples
+- Minimal test case template
+- Detailed diagnostic logging instrumentation
+- Pattern-specific fix templates
+- 7-step validation procedure
+- Debugging techniques and tools
+
+**When to Use**:
+- During Step 3 (diagnostic logging)
+- During Step 5 (pattern matching)
+- During Step 6 (implementing fix)
+- As reference for understanding each pattern
+
+---
+
+## Document Relationships
+
+```
+┌─────────────────────────────────────────┐
+│ CHATGPT_CONTEXT_SUMMARY.md              │
+│ (Start here - 2-3 min)                  │
+│ ↓                                       │
+│ Quick facts + architecture overview     │
+└──────────────┬──────────────────────────┘
+               │
+               ↓
+┌──────────────────────────────────────────┐
+│ CHATGPT_HANDOFF_TLS_DIAGNOSIS.md        │
+│ (Follow these 7 steps - 4-8 hours)      │
+│ ↓                                        │
+│ Step 1: Read diagnostic guide            │
+│ Step 2: Create minimal reproducer        │
+│ Step 3: Add logging [→ consult ref #3]  │
+│ Step 4: Run diagnostic test              │
+│ Step 5: Match pattern [→ consult ref #3]│
+│ Step 6: Implement fix [→ consult ref #3]│
+│ Step 7: Validate                         │
+└──────────────┬───────────────────────────┘
+               │
+               ↓
+┌──────────────────────────────────────────┐
+│ TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md   │
+│ (Deep reference - consult as needed)     │
+│                                          │
+│ 6 Root Cause Patterns:                   │
+│ 1. RAW vs BASE pointer                   │
+│ 2. Header offset mismatch                │
+│ 3. Atomic fence missing                  │
+│ 4. Adjacent block overflow               │
+│ 5. Class index mismatch                  │
+│ 6. Headerless mode interference          │
+│                                          │
+│ For each pattern: code examples + fixes  │
+└──────────────────────────────────────────┘
+```
+
+---
+
+## How to Use These Documents
+
+### Before Starting
+
+1. **Read Summary** (2-3 min)
+   - Understand what the problem is
+   - Learn about the project architecture
+   - Know what tools you'll use
+
+2. **Skim Handoff** (5 min)
+   - Understand the 7-step process
+   - Know what's expected at each step
+   - Identify reference points
+
+### During Work
+
+3. **Follow Handoff Step-by-Step** (4-8 hours)
+   - Step 1: Read the diagnostic guide thoroughly
+   - Step 2: Create minimal reproducer
+   - Step 3: Add logging (reference diagnostic guide)
+   - Step 4: Run and capture output
+   - Step 5: Match observed behavior to patterns (reference diagnostic guide)
+   - Step 6: Implement fix (reference diagnostic guide for fix templates)
+   - Step 7: Validate success
+
+4. **Consult Diagnostic Guide as Needed**
+   - When you need pattern details (Step 5)
+   - When you need fix code templates (Step 6)
+   - When you need validation procedures (Step 7)
+
+### After Completion
+
+5. **Report Status**
+   - Which root cause pattern was identified
+   - What fix was applied
+   - Validation results
+   - Commit message
+
+---
+
+## Key Information to Know
+
+### The Error Explained
+
+```
+Error Message: [TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+
+Interpretation:
+- Location: Reading header byte from allocated block during free
+- Expected: 0xa1 (0xa0 MAGIC | class_idx=1)
+- Got: 0x31 (user data or corruption)
+- Meaning: Header was never written OR was overwritten
+
+Root Cause: One of 6 documented patterns
+```
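+
+The arithmetic behind `expect=0xa1` can be checked with a few lines of C. This is a sketch, not the code in `tls_sll_box.h`: the `DEMO_*` names are hypothetical, and the assumption that the class index occupies the low nibble is inferred from `0xa0 | 0x01`.
+
+```c
+#include <stdint.h>
+
+#define DEMO_HEADER_MAGIC 0xa0u /* high nibble, as in the error explanation */
+#define DEMO_CLASS_MASK   0x0fu /* assumption: class index lives in the low nibble */
+
+/* Header byte a block of the given class is expected to carry. */
+static uint8_t demo_expected_header(unsigned class_idx) {
+    return (uint8_t)(DEMO_HEADER_MAGIC | (class_idx & DEMO_CLASS_MASK));
+}
+
+/* Nonzero when the byte read from the block matches the expected header. */
+static int demo_header_ok(uint8_t got, unsigned class_idx) {
+    return got == demo_expected_header(class_idx);
+}
+```
+
+With these definitions `demo_expected_header(1)` is `0xa1`, and `demo_header_ok(0x31, 1)` is false because `0x31` has `0x3` rather than `0xa` in the high nibble.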
+
+### Success Looks Like
+
+```bash
+# Before fix:
+$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+Segmentation fault (code 139)
+Execution time: ~22 seconds before crash
+
+# After fix:
+$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+Total: 54.5 Mops/s  [no TLS_SLL_HDR_RESET errors]
+Execution time: 4-6 minutes [completes successfully]
+```
+
+---
+
+## File Locations You'll Need
+
+| File | Purpose | Action |
+|------|---------|--------|
+| `core/box/tls_sll_box.h` | Error source | Read/understand |
+| `core/hakmem_tiny_free.inc` | Header write | Add logging |
+| `core/hakmem_tiny_refill.inc.h` | Magazine spill | Check for issues |
+| `core/box/ptr_conversion_box.h` | Pointer conversion | Understand logic |
+| `core/box/tiny_layout_box.h` | Class layout | Understand definitions |
+| `tests/test_tls_sll_minimal.c` | Your test | Create this |
+| `debug_artifacts/headerless/` | Benchmark logs | Reference existing |
+
+---
+
+## Commands You'll Use
+
+### Build & Test
+
+```bash
+# Clean build
+cd /mnt/workdisk/public_share/hakmem
+make clean
+make shared -j8
+
+# Run baseline (will currently crash)
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+
+# Run minimal test (after creating it)
+./tests/test_tls_sll_minimal
+```
+
+### With Logging
+
+```bash
+# Build with debug logging
+make clean
+make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_DEBUG_LOGGING=1"
+
+# Capture diagnostic output
+./tests/test_tls_sll_minimal 2>&1 | tee diagnostic_output.txt
+
+# Analyze logs
+grep HEADER_WRITE diagnostic_output.txt | tail -10
+grep -B5 "got=0x31" diagnostic_output.txt
+```
+
+---
+
+## What to Expect
+
+### Per-Step Timeline
+
+- **Step 1** (Read diagnostic guide): 30-45 min
+- **Step 2** (Create reproducer): 30-60 min
+- **Step 3** (Add logging): 1-2 hours
+- **Step 4** (Run test): 30 min
+- **Step 5** (Pattern matching): 1 hour
+- **Step 6** (Implement fix): 30 min - 1 hour
+- **Step 7** (Validate): 1-2 hours
+
+**Total**: about 5-8 hours (sum of the per-step ranges above)
+
+### What You'll Discover
+
+By the end of the process, you will have:
+- ✅ Identified which of 6 patterns caused the issue
+- ✅ Created a minimal reproducer
+- ✅ Added diagnostic logging to find corruption
+- ✅ Traced the exact allocation/free sequence causing the problem
+- ✅ Implemented a 1-5 line fix
+- ✅ Validated the fix works with multiple benchmarks
+- ✅ Understood the root cause completely
+
+---
+
+## Communication Checkpoints
+
+After completing each step, provide brief status:
+
+- **Step 2**: "Reproducer created - crashes after X allocations"
+- **Step 4**: "Diagnostic logs show pattern [A/B/C/etc]"
+- **Step 5**: "Root cause identified as Pattern #[N]"
+- **Step 6**: "Fix applied - [1-2 line description]"
+- **Step 7**: "Validation: sh8bench passed, cfrac passed, no regressions"
+
+---
+
+## Success Criteria (Clear & Measurable)
+
+| Criterion | Status |
+|-----------|--------|
+| Minimal reproducer created | ✅ Expected |
+| Root cause identified (one of 6 patterns) | ✅ Expected |
+| Diagnostic logging captured | ✅ Expected |
+| Fix implemented (1-5 lines) | ✅ Expected |
+| sh8bench completes without crashes | ✅ TARGET |
+| cfrac completes without crashes | ✅ TARGET |
+| Unit tests pass | ✅ TARGET |
+| < 5% performance regression | ✅ TARGET |
+
+---
+
+## If You Get Stuck
+
+**Problem**: Can't reproduce the error
+- **Solution**: Check if build includes logging headers. Verify LD_PRELOAD path is correct.
+
+**Problem**: Logs don't show expected pattern
+- **Solution**: Check if you're logging at the right locations. Reference diagnostic guide for exact instrumentation points.
+
+**Problem**: Multiple patterns seem possible
+- **Solution**: Add more detailed logging to narrow down. Reference diagnostic guide's pattern-specific logging recommendations.
+
+**Problem**: Fix doesn't resolve the issue
+- **Solution**: Confirm that the logs actually show the assumed pattern. If they do not, test the remaining patterns in order (#2, #3, and so on).
+
+---
+
+## Next Steps After Completion
+
+Once TLS SLL header corruption is fixed:
+
+1. **Validate Phase 1 Performance** (Currently 2.3%, target 15-20%)
+   - Profile with perf/cachegrind
+   - Identify secondary bottlenecks
+   - Consider cache size optimization
+
+2. **Proceed to Phase 2** (Headerless mode)
+   - Implement HAKMEM_TINY_HEADERLESS toggle
+   - Test alignment guarantees
+   - Benchmark performance trade-offs
+
+3. **Plan Phase 102** (MemApi bridge)
+   - Connect hakmem to nyrt Ring0 runtime
+   - Design integration points
+
+---
+
+## Questions Before Starting?
+
+- ❓ What is Box Theory? → Read the Context Summary
+- ❓ What are Phantom Types? → Read the Context Summary
+- ❓ What are the 6 root cause patterns? → They're in the Diagnostic Guide
+- ❓ How do I add logging? → Step 3 of Handoff document + Diagnostic Guide
+
+**All answers are in the three documents. No need for external research.**
+
+---
+
+## You're Now Ready! 🚀
+
+1. **Read** `CHATGPT_CONTEXT_SUMMARY.md` (2-3 min)
+2. **Follow** `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` (7 steps, 4-8 hours)
+3. **Reference** `TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` (as needed)
+
+**Start with Step 1 of the Handoff document.**
+
+**Expected outcome**: TLS SLL header corruption diagnosed and fixed. ✅
+
+**Next review**: After fix is validated and committed.
+
+---
+
+**Good luck! The investigation methodology is solid, the documentation is comprehensive, and the fix is likely to be simple once identified. 💪**
diff --git a/docs/SEGFAULT_INVESTIGATION_FOR_GEMINI.md b/docs/SEGFAULT_INVESTIGATION_FOR_GEMINI.md
new file mode 100644
index 00000000..19741926
--- /dev/null
+++ b/docs/SEGFAULT_INVESTIGATION_FOR_GEMINI.md
@@ -0,0 +1,272 @@
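+# Segmentation Fault Investigation Instructions for Gemini
+
+Version: 1.0 (2025-12-03)
+Status: Segfault occurring during the Phase 2 Headerless implementation
+
+---
+
+## 🔍 Current State
+
+### Build Status
+
+- ✅ **Build succeeds**: `libhakmem.so` is generated normally
+- ✅ Include order errors resolved
+- ⚠️ **Runtime error**: Segmentation Fault occurs
+
+### Segfault Information
+
+**Reported so far**:
+- The segfault occurs during the Phase 2 Headerless implementation
+- The build passes, but the program crashes at runtime
+- No detailed error message has been reported yet
+
+---
+
+## 🎯 Investigation Goals
+
+1. **Identify the exact conditions under which the segfault occurs**
+2. **Determine which component is responsible**
+3. **Propose a fix patch**
+
+---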
+
+## 📋 Investigation Procedure
+
+### Step 1: Debug build and run under GDB
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Clean build (with debug symbols)
+find . -name "*.o" -delete
+make clean
+make shared -j8 EXTRA_CFLAGS="-g -O1"
+
+# Run under GDB
+gdb --args ./mimalloc-bench/out/bench/sh8bench
+
+# Inside GDB (preload hakmem first, otherwise the system allocator is used):
+(gdb) set environment LD_PRELOAD ./libhakmem.so
+(gdb) run
+# → Once the segfault occurs:
+(gdb) backtrace
+(gdb) frame 0
+(gdb) info locals
+(gdb) disassemble
+```
+
+### Step 2: Verify with ASan (AddressSanitizer)
+
+```bash
+# ASan build
+make clean
+make asan-shared-alloc -j8
+
+# Run with ASan (prints detailed error information)
+LD_PRELOAD=./libhakmem_asan.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | head -100
+```
+
+**Information ASan reports**:
+- The address where the crash occurred
+- The function in which it occurred
+- Details of the memory corruption
+
+### Step 3: Create a minimal test program
+
+If the segfault occurs frequently, write a minimal test program to confirm it:
+
+```c
+// tests/test_segfault_minimal.c
+
+#include <stdio.h>
+#include <stdlib.h>
+#include "../core/hakmem.h"
+
+int main() {
+    printf("Test 1: Simple malloc\n");
+    void* ptr1 = malloc(15);
+    printf("  malloc(15) = %p\n", ptr1);
+
+    printf("Test 2: Simple free\n");
+    free(ptr1);
+    printf("  free() succeeded\n");
+
+    printf("Test 3: Multiple allocations\n");
+    for (int i = 0; i < 100; i++) {
+        void* p = malloc(15);
+        free(p);
+    }
+    printf("  100 alloc/free cycles succeeded\n");
+
+    printf("Test 4: Concurrent-like pattern\n");
+    void* ptrs[10];
+    for (int i = 0; i < 10; i++) {
+        ptrs[i] = malloc(15 + i);
+    }
+    for (int i = 0; i < 10; i++) {
+        free(ptrs[i]);
+    }
+    printf("  Concurrent pattern succeeded\n");
+
+    return 0;
+}
+```
+
+### Step 4: Check the Headerless flag
+
+Check the behavior in Headerless mode (Phase 2):
+
+```bash
+# Headerless OFF (Phase 1 compatible)
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep -E "error|Segmentation|Total"
+
+# Headerless ON (Phase 2)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1"
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep -E "error|Segmentation|Total"
+```
+
+**Checklist**:
+- [ ] Is the run free of segfaults with Headerless OFF?
+- [ ] Does the segfault appear with Headerless ON?
+- [ ] Decide whether the problem belongs to Phase 1 or Phase 2
+
+---
+
+## 🔧 Common Segfault Causes and How to Check Them
+
+### Cause 1: Use-After-Free
+
+**Signs**:
+- Pointer access after free
+- GDB: `backtrace` shows a free → access ordering
+
+**Check command**:
+```bash
+# ASan reports a heap-use-after-free error
+LD_PRELOAD=./libhakmem_asan.so ./test 2>&1 | grep -i "use.*after.*free"
+```
+
+### Cause 2: Buffer Overflow
+
+**Signs**:
+- Out-of-bounds array access
+- Corruption of adjacent memory
+
+**Check command**:
+```bash
+# ASan reports a buffer-overflow error (e.g. heap-buffer-overflow)
+LD_PRELOAD=./libhakmem_asan.so ./test 2>&1 | grep -i "buffer\|overflow"
+```
+
+### Cause 3: NULL Pointer Dereference
+
+**Signs**:
+- `malloc()` returns NULL
+- Access without a NULL check
+
+**Check command**:
+```bash
+# In GDB, check whether the instruction at frame 0 dereferences NULL
+(gdb) disassemble
+# → look for a "mov $0x0" / dereference pattern
+```
+
+### Cause 4: Memory Leak → Heap Exhaustion
+
+**Signs**:
+- Memory usage grows over a long run
+- Allocation eventually fails → segfault
+
+**Check command**:
+```bash
+# Run while monitoring memory usage
+( while true; do ps aux | grep sh8bench | grep -v grep | awk '{print $6}'; sleep 1; done ) &
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+```
+
+---
+
+## 📝 Report Format
+
+Once the segfault investigation is complete, report the results in the following format:
+
+```markdown
+## Segfault Investigation Results
+
+### Environment
+- Build options: [e.g., "-DHAKMEM_TINY_HEADERLESS=1"]
+- Test: [e.g., "sh8bench"]
+
+### GDB Output
+\`\`\`
+(gdb) backtrace
+#0  0x... in function_name ()
+#1  0x... in caller_function ()
+...
+
+(gdb) frame 0
+#0  address in function_name ()
+at file.c:123
+\`\`\`
+
+### ASan Output
+[ASan error output if available]
+
+### Root Cause
+[Your analysis of the root cause]
+
+### Proposed Fix
+[Proposed fix]
+```
+
+---
+
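+## 🎯 Workflow
+
+**Recommended order**:
+
+1. **Run Steps 1-2**: Pinpoint the problem with GDB + ASan
+2. **Run Step 3**: Reproduce with the minimal test program
+3. **Run Step 4**: Determine whether Headerless ON/OFF makes a difference
+4. **Propose a fix**: Present the fix as code, based on the identified cause
+
+---
+
+## 📚 References
+
+### Existing Documents
+- `docs/REFACTOR_PLAN_GEMINI_ENHANCED.md` - Overall plan
+- `docs/PHASE2_HEADERLESS_INSTRUCTION_FOR_GEMINI.md` - Phase 2 implementation instructions
+- `docs/tls_sll_hdr_reset_final_report.md` - Background for Phase 2
+
+### Debugging Tools
+- GDB: `gdb --args ./program`
+- ASan: `make asan-shared-alloc`
+- Valgrind: `valgrind --leak-check=full ./program`
+
+### Known Issues
+- TLS_SLL_HDR_RESET is scheduled to be resolved in Phase 2
+- Headerless mode is still being implemented, so it may be unstable
+
+---
+
+## 💡 Hints
+
+1. **If the segfault occurs frequently**:
+   - Use the minimal test program to narrow down the conditions
+   - In GDB, run `run` → `backtrace` → `frame 0` in that order
+
+2. **If ASan prints no error message**:
+   - The corruption may be too subtle for ASan to detect
+   - Inspect manually in GDB
+
+3. **If Headerless mode is the cause**:
+   - Review Tasks 2.1-2.7 in the Phase 2 instructions
+   - Task 2.4 (obtaining class_idx on the free path) is especially suspect
+
+---
+
+We are counting on Gemini's investigative skills!
+Please identify the root cause and propose a fix patch. 🚀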
diff --git a/docs/STATUS_2025_12_03_CURRENT.md b/docs/STATUS_2025_12_03_CURRENT.md
new file mode 100644
index 00000000..0d3615ba
--- /dev/null
+++ b/docs/STATUS_2025_12_03_CURRENT.md
@@ -0,0 +1,296 @@
+# Project Status - 2025-12-03
+
+**Last Updated**: 2025-12-03 (Current)
+**Status**: 🔴 CRITICAL BLOCKER - TLS SLL Header Corruption Detected
+**Overall Phase**: Phase 1 Implementation + Phase 2 Design (Blocked)
+
+---
+
+## Summary
+
+The hakmem memory allocator project hit a critical stability issue during Phase 1 performance benchmarking. The baseline configuration crashes with a TLS SLL header corruption error that affects **all configurations**, which points to a problem in a shared code path rather than a Phase 1-specific issue.
+
+---
+
+## Completed Phases ✅
+
+### Phase 0: Type Safety & Box Architecture Framework
+- ✅ Phantom Types implementation (`ptr_type_box.h`)
+- ✅ Pointer conversion API (`ptr_conversion_box.h`)
+- ✅ Root cause analysis verified (Gemini's mathematical proof)
+- ✅ Box theory framework established
+- ✅ Include order dependencies resolved (commit 2dc9d5d59)
+- ✅ Magazine Spill pointer wrapping fixed (commit f3f75ba3d)
+
+### Phase 1: Logic Centralization & Optimization (TLS Hint Box)
+- ✅ Designed TLS SuperSlab Hint Box (`tls_ss_hint_box.h`)
+- ✅ Implemented 5-function API (init, lookup, update, clear, stats)
+- ✅ Integrated into free path (lines 477-481, 550-555)
+- ✅ Integrated into alloc path (lines 115-122, 179-186)
+- ✅ Created 6 unit tests - **ALL PASSING**
+- ✅ Compiled as header-only (zero overhead when disabled)
+- ⚠️ Performance benchmarking: Only 2.3% improvement vs target 15-20%
+
+### Phase 2: Headerless Mode Design
+- ✅ Comprehensive design document (21KB)
+- ✅ All 7 task specifications documented
+- ✅ A/B toggle flag designed (HAKMEM_TINY_HEADERLESS)
+- ✅ SuperSlab Registry integration planned
+- ✅ TLS SLL validation skipping documented
+- ❌ **BLOCKED**: Cannot proceed - baseline instability
+
+---
+
+## Current Critical Issue 🔴
+
+### Symptom
+
+```
+[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0
+Segmentation fault (core dumped)
+```
+
+### Location
+
+- **File**: `core/box/tls_sll_box.h`
+- **Lines**: 282-303
+- **Function**: `tls_sll_pop_impl()`
+- **Operation**: Header validation during free path
+
+### Impact
+
+- ❌ TC1 (Baseline) crashes after ~22 seconds of execution
+- ❌ Cannot validate Phase 1 performance improvements
+- ❌ Cannot proceed to Phase 2 implementation
+- ❌ Cannot benchmark any configuration variant
+
+### Root Cause
+
+**Unknown** - One of six documented patterns:
+
+1. RAW pointer vs BASE pointer type mismatch
+2. Header offset mismatch (write vs read location)
+3. Atomic fence missing (compiler/CPU reordering)
+4. Adjacent block overflow corrupting header
+5. Class index mismatch during push/pop
+6. Headerless mode interference
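+
+Pattern 4 is easy to picture with a toy layout. The sketch below assumes a 1-byte header at offset 0 of each block, a user pointer at base + 1, and a 16-byte stride; these numbers and all `demo_*` names are illustrative and not hakmem's real layout. It shows how a one-byte overrun of ASCII `'1'` characters would leave `0x31` in the next block's header, which is one possible explanation of the observed value, not a confirmed diagnosis.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+enum { DEMO_STRIDE = 16, DEMO_BLOCKS = 2 };
+
+/* Lay out DEMO_BLOCKS adjacent blocks, each starting with header 0xa1. */
+static void demo_init_blocks(uint8_t *arena) {
+    memset(arena, 0, (size_t)DEMO_STRIDE * DEMO_BLOCKS);
+    for (int i = 0; i < DEMO_BLOCKS; i++) {
+        arena[i * DEMO_STRIDE] = 0xa1;
+    }
+}
+
+/* Write n bytes of '1' into block 0's user area and return block 1's
+ * header byte afterwards. The usable size is DEMO_STRIDE - 1 bytes,
+ * so n == DEMO_STRIDE overruns by exactly one byte.
+ * n must stay below 2 * DEMO_STRIDE to remain inside the arena. */
+static uint8_t demo_fill_and_read_next_header(uint8_t *arena, size_t n) {
+    uint8_t *user = arena + 1; /* user pointer = base + 1-byte header */
+    memset(user, '1', n);
+    return arena[DEMO_STRIDE];
+}
+```
+
+Filling `DEMO_STRIDE - 1` bytes leaves the neighbouring header at `0xa1`; filling `DEMO_STRIDE` bytes turns it into `0x31`.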
+
+---
+
+## Documents Created for Diagnosis
+
+Three comprehensive documents have been created to guide the fix:
+
+1. **`docs/CHATGPT_CONTEXT_SUMMARY.md`**
+   - Quick facts about the problem
+   - Architecture overview
+   - File locations and data structures
+   - Timeline estimate: 4-8 hours
+
+2. **`docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`**
+   - Step-by-step 7-step task breakdown
+   - Detailed instructions for each phase
+   - Expected validation criteria
+   - Success metrics
+
+3. **`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (Existing, about 1,100 lines)
+   - Deep dive into all 6 root cause patterns
+   - Code examples for each pattern
+   - Minimal test case template
+   - Diagnostic logging instrumentation
+   - Fix code templates
+   - 7-step validation procedure
+
+---
+
+## What Needs to Happen
+
+### Immediate (Blocking)
+
+1. **[CHATGPT TASK]** Diagnose TLS SLL header corruption
+   - Use the three diagnostic documents
+   - Follow 7-step process
+   - Expected delivery: 4-8 hours
+   - Success criterion: TC1 baseline completes without crashes
+
+### After Diagnosis
+
+2. **[DEPENDS ON #1]** Validate Phase 1 performance
+   - Run full benchmarks (TC1, TC2, TC3)
+   - Confirm TLS Hint Box improves performance
+   - Identify optimization opportunities
+
+3. **[DEPENDS ON #1]** Proceed to Phase 2
+   - Implement Headerless mode (ON/OFF toggle)
+   - Validate alignment guarantees
+   - Benchmark performance trade-offs
+
+4. **[DEPENDS ON #1-3]** Phase 102 Planning
+   - Design MemApi bridge
+   - Connect hakmem to nyrt Ring0 runtime
+
+---
+
+## Recent Git History
+
+```
+ad852e5d5 - Priority-2 ENV Cache: hakmem_batch.c (1変数追加、1箇所置換)
+b741d61b4 - Priority-2 ENV Cache: hakmem_debug.c (1変数追加、1箇所置換)
+22a67e5ca - Priority-2 ENV Cache: hakmem_smallmid.c (1変数追加、1箇所置換)
+f0e77a000 - Priority-2 ENV Cache: hakmem_tiny.c (3箇所置換)
+183b10673 - Priority-2 ENV Cache: Shared Pool Release (1箇所置換)
+
+[Earlier commits in THIS session:]
+94f9ea51  - Implement TLS SuperSlab Hint Box (Phase 1) ✅
+           - Header-only implementation (256 lines)
+           - 5 function APIs
+           - 6 unit tests - ALL PASSING
+           - Benchmarked at only 2.3% improvement
+
+f3f75ba3d - Fix Magazine Spill RAW pointer type conversion ✅
+           - Added HAK_BASE_FROM_RAW() wrapping
+           - hakmem_tiny_refill.inc.h:228
+           - Verified with cfrac/sh8bench
+
+2dc9d5d59 - Fix include order in hakmem.c ✅
+           - Moved hak_kpi_util.inc.h before hak_core_init.inc.h
+           - Resolved undefined reference errors
+           - Clean build verified
+```
+
+---
+
+## File Statistics
+
+| Category | Count | Status |
+|----------|-------|--------|
+| **Core Implementation** | 47 files | ✅ Compiles |
+| **Box Components** | 15 files | ✅ Box theory applied |
+| **Test Suite** | 23 tests | ⚠️ 6 TLS Hint tests PASS, 17 others untested due to crash |
+| **Documentation** | 12 documents | ✅ Comprehensive |
+| **Build Artifacts** | libhakmem.so | ✅ Generates (547 KB) |
+
+---
+
+## Build Status
+
+```
+$ make clean && make shared -j8
+✅ Compilation: SUCCESS
+✅ Linking: SUCCESS
+✅ Output: ./libhakmem.so (547 KB)
+✅ Debug symbols: Included (-g flag)
+
+$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+❌ Execution: SEGFAULT
+Error: [TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
+Exit Code: 139 (SIGSEGV)
+Runtime: ~22 seconds before crash
+```
+
+---
+
+## Key Metrics
+
+| Metric | Value | Status |
+|--------|-------|--------|
+| **Compilation Time** | 8-12 sec | ✅ Good |
+| **Executable Size** | 547 KB | ✅ Reasonable |
+| **Baseline Performance** | N/A | ❌ Crashes |
+| **Phase 1 Optimization** | 2.3% | ⚠️ Below target (15-20%) |
+| **Code Coverage** | Unknown | ⏳ Pending baseline fix |
+
+---
+
+## Next Steps (Clearly Defined)
+
+### For ChatGPT (Immediate Handoff)
+
+**Task**: Diagnose and fix TLS SLL header corruption
+
+**Documents to Use**:
+1. `docs/CHATGPT_CONTEXT_SUMMARY.md` - Quick reference
+2. `docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` - Step-by-step instructions
+3. `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` - Deep reference
+
+**Steps**:
+1. Read diagnostic documents
+2. Create minimal reproducer
+3. Add diagnostic logging
+4. Run diagnostic test
+5. Identify root cause pattern
+6. Implement surgical fix (1-5 lines)
+7. Validate with TC1 baseline test
+
+**Success Criteria**:
+- ✅ sh8bench runs to completion
+- ✅ cfrac runs without errors
+- ✅ No TLS_SLL_HDR_RESET errors
+- ✅ < 5% performance regression
+
+---
+
+## Notes for Future Reference
+
+### Architecture Decisions Locked In
+
+1. **Box Theory**: Each component is isolated with clear APIs
+2. **Phantom Types**: Type safety in Debug mode, zero-cost in Release
+3. **Pointer Conversion**: Centralized in `ptr_conversion_box.h`
+4. **Layout Definitions**: Centralized in `tiny_layout_box.h`
+5. **TLS SLL**: Thread-local single-linked list with header validation
+6. **SuperSlab Registry**: Maps free pointers to class information (Phase 2)
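+
+Decisions 2 and 3 can be sketched as follows: distinct wrapper structs in a debug build so that base and user pointers cannot be mixed up, and plain pointers otherwise. This is an illustration only. `DEMO_PHANTOM_DEBUG`, `DEMO_HEADER_SIZE`, and every `demo_*` / `DEMO_*` name are hypothetical and do not reproduce the contents of `ptr_type_box.h` or `ptr_conversion_box.h`.
+
+```c
+#include <stdint.h>
+
+#define DEMO_HEADER_SIZE 1u /* assumption: 1-byte header before user data */
+
+#if defined(DEMO_PHANTOM_DEBUG)
+/* Debug: two distinct struct types, so passing a base pointer where a
+ * user pointer is expected is a compile error. */
+typedef struct { uint8_t *p; } demo_base_ptr;
+typedef struct { uint8_t *p; } demo_user_ptr;
+#define DEMO_RAW(x)       ((x).p)
+#define DEMO_MAKE(T, raw) ((T){ (raw) })
+#else
+/* Release: plain pointers, no wrapper overhead. */
+typedef uint8_t *demo_base_ptr;
+typedef uint8_t *demo_user_ptr;
+#define DEMO_RAW(x)       (x)
+#define DEMO_MAKE(T, raw) ((T)(raw))
+#endif
+
+/* The only two places where the header offset is applied. */
+static demo_user_ptr demo_base_to_user(demo_base_ptr b) {
+    return DEMO_MAKE(demo_user_ptr, DEMO_RAW(b) + DEMO_HEADER_SIZE);
+}
+
+static demo_base_ptr demo_user_to_base(demo_user_ptr u) {
+    return DEMO_MAKE(demo_base_ptr, DEMO_RAW(u) - DEMO_HEADER_SIZE);
+}
+```
+
+Funnelling every offset adjustment through one pair of functions is what makes a double-applied or skipped header offset detectable at a single place.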
+
+### Known Working Patterns
+
+- Magazine Spill RAW→BASE wrapping (fixed)
+- Include order dependencies (fixed)
+- Unit test framework (6 TLS Hint tests passing)
+- Box header-only compilation (verified)
+
+### Known Issues Needing Diagnosis
+
+- TLS SLL header corruption (PRIMARY BLOCKER)
+- Phase 1 performance below target (SECONDARY - optimization opportunity)
+- Headerless mode not yet validated (DEPENDS ON PRIMARY FIX)
+
+---
+
+## Handoff Status
+
+✅ **All diagnostic documents prepared**
+✅ **Comprehensive step-by-step instructions created**
+✅ **Root cause patterns documented with code examples**
+✅ **Minimal test case template provided**
+✅ **Validation procedures detailed**
+
+🎯 **Ready for ChatGPT handoff**
+
+Next: Pass the three documents to ChatGPT with the directive to follow the 7-step process.
+
+---
+
+## Questions for Next Phase
+
+After the fix is complete, the following should be investigated:
+
+1. Why does Phase 1 deliver only a 2.3% improvement against the expected 15-20%?
+   - Is 4 slots enough for the cache?
+   - Are there secondary bottlenecks?
+   - Does perf/cachegrind show cache misses?
+
+2. Can Phase 2 Headerless provide better performance than Phase 1?
+   - What are the trade-offs?
+   - Is the SuperSlab Registry lookup overhead worth it?
+
+3. How does hakmem compare to mimalloc and jemalloc across different workloads?
+   - Are there specific use cases where hakmem excels?
+   - Where does it fall short?
+
+---
+
+**Status**: 🔴 CRITICAL - Awaiting ChatGPT diagnosis and fix
+
+**Estimated Resolution Time**: 4-8 hours from ChatGPT engagement
+
+**Next Review**: After ChatGPT completes TLS SLL diagnosis and fix
diff --git a/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md b/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md
new file mode 100644
index 00000000..9aa49f6a
--- /dev/null
+++ b/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md
@@ -0,0 +1,1111 @@
+# TLS SLL Header Corruption Diagnosis & Fix Instructions for ChatGPT
+
+## Problem Statement
+
+**Symptom**:
+- Baseline (Headerless OFF) crashes with SIGSEGV
+- Error log: `[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0`
+- Location: `core/box/tls_sll_box.h` header integrity check during pop operation
+
+**Root Cause**:
+Header byte at offset 0 from base pointer contains user data (0x31) instead of header magic (0xa1).
+This indicates one of:
+1. Wrong pointer is being stored in TLS SLL
+2. Header is not being written correctly before push
+3. Adjacent block corruption overwrites header
+4. Header write/read offset mismatch
+
+**Impact**:
+- TLS SLL header reset occurs (entire freelist for class 1 dropped)
+- Subsequent allocations may fail or use wrong metadata
+- Benchmark crashes with SIGSEGV
+- Memory corruption potential
+
+**Timeline**:
+- Discovered during Phase 1 TLS Hint Box benchmarking
+- Affects baseline configuration (no hints involved)
+- Suggests pre-existing issue in shared TLS SLL code
+
+---
+
+## Investigation Strategy
+
+**Phase A: Understand the Error**
+- Where is header validation happening?
+- What does 0x31 represent? (Is it deterministic or random data?)
+- Can we reproduce with minimal allocations?
+
+**Phase B: Locate Corruption Source**
+- Where is header supposed to be written?
+- Is header being written BEFORE push or after?
+- Are there any recent changes to header write logic?
+
+**Phase C: Implement Fix**
+- Add instrumentation to catch corruption early
+- Identify exact allocation/free cycle causing problem
+- Fix root cause (not just symptom)
+
+**Phase D: Validate**
+- TC1 baseline should complete without crashes
+- TC2/TC3 can then be evaluated
+- No performance regression
+
+---
+
+## Deep Dive: TLS SLL Header Corruption
+
+### What is 0x31?
+
+The error reports `got=0x31`. Let's understand what this means:
+
+```c
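+// Expected (header magic for class 1):
+//   0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx)
+
+// Got:
+//   0x31 = 0b00110001 = ASCII '1'
+//   High nibble 0x3 is not the magic nibble 0xa; low nibble 0x1 happens to equal class_idx 1.
+//   Most likely user data, but this is not yet proven.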
+```
+
+**Questions to answer**:
+1. Is 0x31 always the same, or does it vary? (Deterministic vs random corruption)
+2. Does 0x31 correspond to any known data pattern in hakmem?
+3. Does the corruption happen during alloc or free?
+4. Is 0x31 part of the test program's data?
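+
+To make the comparison between `0x31` and `0xa1` concrete, the following is a minimal sketch that splits a header byte into its magic and class nibbles. It assumes the magic occupies the high nibble and the class index the low nibble, as the `0xa0 | class_idx` expression suggests; the `DEMO_*` names and both helper functions are hypothetical and are not part of hakmem.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+/* Values taken from this document; names and masks are assumptions. */
+#define DEMO_HEADER_MAGIC 0xa0u
+#define DEMO_MAGIC_MASK   0xf0u  /* assumption: magic lives in the high nibble */
+#define DEMO_CLASS_MASK   0x0fu  /* assumption: class index lives in the low nibble */
+
+/* True when `byte` is exactly the header expected for `class_idx`. */
+static bool demo_header_matches(uint8_t byte, int class_idx) {
+    assert(class_idx >= 0 && class_idx <= 0x0f);
+    return byte == (uint8_t)(DEMO_HEADER_MAGIC | ((unsigned)class_idx & DEMO_CLASS_MASK));
+}
+
+/* True when the magic nibble is wrong but the class nibble still matches. */
+static bool demo_only_magic_differs(uint8_t byte, int class_idx) {
+    return (byte & DEMO_MAGIC_MASK) != DEMO_HEADER_MAGIC &&
+           (byte & DEMO_CLASS_MASK) == ((unsigned)class_idx & DEMO_CLASS_MASK);
+}
+```
+
+For `0x31` and class 1, `demo_only_magic_differs` returns true. That may be a coincidence (the byte could simply be ASCII `'1'` from application data), which is why question 1 above, whether the value varies between runs, is worth answering first.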
+
+### TLS SLL Header Check Logic
+
+**Location**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h` (around lines 280-320)
+
+```c
+// In tls_sll_pop_impl():
+if (tiny_class_preserves_header(class_idx)) {
+    uint8_t* b = (uint8_t*)raw_base;
+    uint8_t got = *b;  // Read byte at offset 0 of base pointer
+    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
+
+    if (got != expected) {
+        // CORRUPTION DETECTED!
+        fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x ...\n",
+                class_idx, raw_base, got, expected);
+        // ... reset logic follows
+    }
+}
+```
+
+**Key Points**:
+- Header is read at `(uint8_t*)raw_base` (offset 0)
+- Expected value is `0xa0 | class_idx`
+- For class 1: expect `0xa1`
+- Got `0x31` instead (user data)
+
+### When Does This Happen?
+
+The error occurs during `tls_sll_pop()`, which is called when:
+1. **Freelist refill**: Taking blocks from TLS SLL back to unified cache
+2. **Magazine spill**: Freelist → TLS SLL transition for overflow
+3. **Allocation path**: Pulling blocks from TLS SLL to satisfy malloc
+
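+**The header was corrupted at some point before the pop-time check**: either before the push, or while the block sat in the TLS SLL. The check runs only during pop, so the log shows where the damage was found, not where it was done.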
+
+This suggests:
+- Either the pointer stored in TLS SLL is wrong (points to wrong location)
+- Or the header was never written correctly
+- Or adjacent block corruption overwrote the header
+- Or there's an offset calculation error between push and pop
+
+---
+
+## Diagnostic Procedure
+
+### Step 1: Reproduce with Minimal Test
+
+Create the smallest possible test case:
+
+**File**: `/mnt/workdisk/public_share/hakmem/tests/test_tls_sll_minimal.c`
+
+```c
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+int main(void) {
+    setvbuf(stdout, NULL, _IONBF, 0);  // Unbuffered, so progress lines survive a crash
+    printf("Test 1: Simple alloc/free cycle\n");
+    for (int i = 0; i < 10; i++) {
+        void* p = malloc(16);  // Class 1
+        if (p) {
+            memset(p, 0x31, 16);  // Write user data (includes 0x31!)
+            free(p);
+        }
+    }
+    printf("✓ Test 1 passed\n");
+
+    printf("Test 2: Rapid alloc/free (trigger refill)\n");
+    for (int i = 0; i < 1000; i++) {
+        void* p = malloc(16);
+        if (p) {
+            memset(p, 0x31, 16);
+            free(p);
+        }
+    }
+    printf("✓ Test 2 passed\n");
+
+    printf("Test 3: Multiple sizes\n");
+    for (int size = 8; size <= 512; size *= 2) {
+        for (int j = 0; j < 100; j++) {
+            void* p = malloc(size);
+            if (p) {
+                memset(p, 0x31, size);
+                free(p);
+            }
+        }
+    }
+    printf("✓ Test 3 passed\n");
+
+    printf("Test 4: Heavy churn (trigger SLL push/pop)\n");
+    void* ptrs[100];
+    for (int round = 0; round < 10; round++) {
+        for (int i = 0; i < 100; i++) {
+            ptrs[i] = malloc(16);
+            if (ptrs[i]) memset(ptrs[i], 0x31, 16);
+        }
+        for (int i = 0; i < 100; i++) {
+            free(ptrs[i]);
+        }
+    }
+    printf("✓ Test 4 passed\n");
+
+    return 0;
+}
+```
+
+**Build and test**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+mkdir -p tests
+gcc -o tests/test_tls_sll_minimal tests/test_tls_sll_minimal.c
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+```
+
+**Goal**: Find the minimal reproduction:
+- If test 1 fails: Early corruption (basic alloc/free)
+- If test 2 fails: Refill-related corruption
+- If test 3 fails: Class-specific issue
+- If test 4 fails: SLL push/pop cycling issue
+
+### Step 2: Add Diagnostic Logging
+
+Instrument the header write/read paths:
+
+#### Instrument Header Write
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_config_box.inc`
+
+Find the `HAK_RET_ALLOC` macro and add logging:
+
+```c
+// Add diagnostic logging (void* casts keep %p well-defined; %td matches ptrdiff_t)
+#define HAK_RET_ALLOC(base, cls) do { \
+    fprintf(stderr, "[ALLOC_HEADER_WRITE] base=%p cls=%d\n", (void*)(base), (int)(cls)); \
+    uint8_t* hdr = (uint8_t*)(base); \
+    uint8_t magic = (uint8_t)(0xa0 | ((cls) & 0x0f)); \
+    *hdr = magic; \
+    fprintf(stderr, "[ALLOC_HEADER_WROTE] base=%p magic=0x%02x (at %p)\n", (void*)(base), *hdr, (void*)hdr); \
+    __atomic_thread_fence(__ATOMIC_RELEASE); \
+    hak_user_ptr_t user = ptr_base_to_user((base), (cls)); \
+    fprintf(stderr, "[ALLOC_RETURN] user=%p (base=%p + %td)\n", (void*)user, (void*)(base), (char*)user - (char*)(base)); \
+    return user; \
+} while(0)
+```
+
+#### Instrument Header Read
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+
+Modify the header read/check in `tls_sll_pop_impl()`:
+
+```c
+// In tls_sll_pop_impl(), before the check:
+if (tiny_class_preserves_header(class_idx)) {
+    uint8_t* b = (uint8_t*)raw_base;
+    uint8_t got = *b;
+    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
+
+    // NEW DIAGNOSTIC LOGGING:
+    fprintf(stderr, "[TLS_SLL_POP_CHECK] class=%d raw_base=%p checking at %p\n",
+            class_idx, (void*)raw_base, (void*)b);
+    fprintf(stderr, "[TLS_SLL_POP_READ] got=0x%02x expected=0x%02x\n", got, expected);
+
+    if (got != expected) {
+        fprintf(stderr, "[CORRUPTION_DETECTED] Mismatch! Dumping context...\n");
+        fprintf(stderr, "[CORRUPTION_CONTEXT] raw_base=%p, offset=%td\n", (void*)raw_base, (char*)b - (char*)raw_base);
+
+        // Dump surrounding bytes.
+        // NOTE: the 8 bytes before raw_base are only readable if this block is not
+        // the first one in its mapping; start at i = 0 if that cannot be guaranteed.
+        fprintf(stderr, "[CORRUPTION_DUMP] Bytes around base: ");
+        for (int i = -8; i < 16; i++) {
+            fprintf(stderr, "%02x ", ((uint8_t*)raw_base)[i]);
+        }
+        fprintf(stderr, "\n");
+
+        // ... existing reset logic
+    }
+}
+```
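+
+The byte dump above reads a fixed window around `raw_base`. If the block sits at the start or end of its mapping, part of that window may be unreadable. The following is a sketch of a clamped dump, assuming the caller knows the start and length of the readable region; `demo_dump_window` and all of its parameters are hypothetical and not part of hakmem.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+
+/*
+ * Formats up to `before` bytes preceding region[center] and up to `after`
+ * bytes starting at region[center] into `out`, never reading outside
+ * [0, region_len). The byte at `center` is bracketed.
+ * Returns the number of bytes formatted.
+ */
+static size_t demo_dump_window(const uint8_t* region, size_t region_len,
+                               size_t center, size_t before, size_t after,
+                               char* out, size_t out_len) {
+    assert(region && out && out_len > 0 && center < region_len);
+    size_t start = (center > before) ? center - before : 0;
+    size_t end = (region_len - center > after) ? center + after : region_len;
+    size_t used = 0, count = 0;
+    out[0] = '\0';
+    for (size_t i = start; i < end; i++) {
+        int n = snprintf(out + used, out_len - used,
+                         (i == center) ? "[%02x] " : "%02x ", region[i]);
+        if (n < 0 || (size_t)n >= out_len - used) {
+            out[used] = '\0';  /* drop the partially written item */
+            break;
+        }
+        used += (size_t)n;
+        count++;
+    }
+    return count;
+}
+```
+
+With the block at offset 0 of its region, the window simply starts at the block itself instead of reading 8 bytes before it.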
+
+#### Instrument SLL Push
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+
+Find `tls_sll_push_impl()` and add logging:
+
+```c
+static inline bool tls_sll_push_impl(..., hak_base_ptr_t ptr, ...) {
+    fprintf(stderr, "[TLS_SLL_PUSH] class=%d ptr=%p\n", class_idx, ptr);
+
+    // Check header BEFORE push
+    if (tiny_class_preserves_header(class_idx)) {
+        uint8_t hdr = *(uint8_t*)ptr;
+        fprintf(stderr, "[TLS_SLL_PUSH_HDR_CHECK] ptr=%p header=0x%02x\n", ptr, hdr);
+    }
+
+    // ... existing push logic
+}
+```
+
+**Build and run**:
+```bash
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal 2>&1 | grep -E "ALLOC|POP|PUSH|CORRUPTION" | head -100
+```
+
+**What to look for**:
+- Do ALLOC_HEADER_WRITE and TLS_SLL_PUSH_HDR_CHECK match?
+- Does TLS_SLL_POP_READ show corruption?
+- What is the sequence: WRITE → PUSH → POP?
+- Are pointers consistent across operations?
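+
+One caveat with the instrumentation in this step: `fprintf` is called from inside the allocator, and a stdio call may itself allocate, which could re-enter `malloc` and distort the sequence being observed. A more conservative option, sketched below under the assumption of a POSIX target, formats into a stack buffer and emits with `write(2)`. `demo_diag_format` and `DEMO_DIAG_LOG` are hypothetical names, and whether `vsnprintf` avoids allocation depends on the C library.
+
+```c
+#include <assert.h>
+#include <stdarg.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <unistd.h>
+
+/* Formats into `buf`; returns the number of characters stored (excluding the
+ * terminating NUL), or -1 on a formatting error. Output is truncated to fit. */
+static int demo_diag_format(char* buf, size_t len, const char* fmt, ...) {
+    assert(buf && len > 0);
+    va_list ap;
+    va_start(ap, fmt);
+    int n = vsnprintf(buf, len, fmt, ap);
+    va_end(ap);
+    if (n < 0) return -1;
+    return ((size_t)n < len) ? n : (int)len - 1;
+}
+
+/* Emits one diagnostic line to stderr without going through a FILE stream. */
+#define DEMO_DIAG_LOG(...) do { \
+    char demo_buf_[160]; \
+    int demo_n_ = demo_diag_format(demo_buf_, sizeof demo_buf_, __VA_ARGS__); \
+    if (demo_n_ > 0) { \
+        ssize_t demo_w_ = write(STDERR_FILENO, demo_buf_, (size_t)demo_n_); \
+        (void)demo_w_; \
+    } \
+} while (0)
+```
+
+A call such as `DEMO_DIAG_LOG("[TLS_SLL_PUSH] class=%d ptr=%p\n", class_idx, (void*)ptr);` could then stand in for the corresponding `fprintf` line.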
+
+### Step 3: Examine Header Write Locations
+
+Search for all places headers are written:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn "= 0xa\|= HEADER_MAGIC\|= TINY_HEADER\|0xa0 |" core/ --include="*.h" --include="*.c" --include="*.inc"
+```
+
+Expected locations:
+1. `core/hakmem_tiny_config_box.inc` - HAK_RET_ALLOC macro
+2. `core/box/tls_sll_box.h` - Optional header write on SLL push (if needed)
+3. `core/tiny_alloc_fast_push.c` - Fast path allocations
+4. Other allocation paths?
+
+**Check each location**:
+- Is the offset correct? (Should be offset 0 from base)
+- Is it written BEFORE or AFTER pushing to TLS SLL?
+- Is there an atomic fence to prevent reordering?
+- Is the class_idx valid?
+
+### Step 4: Examine Pointer Conversion Logic
+
+The key question: **Are we storing the right pointer in TLS SLL?**
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+
+Check the pointer conversion macros:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -A5 "ptr_user_to_base\|ptr_base_to_user\|HAK_BASE_FROM_RAW" core/hakmem_tiny_types.h
+```
+
+**Critical questions**:
+1. When we free a user pointer, do we convert it to base pointer correctly?
+2. When we push to TLS SLL, do we push the base pointer or user pointer?
+3. When we pop from TLS SLL, do we get back the exact same base pointer?
+
+**Expected flow**:
+```
+Alloc: BASE → (write header at BASE) → (convert to USER) → return USER
+Free:  USER → (convert to BASE) → (push BASE to TLS SLL)
+Pop:   (pop BASE from TLS SLL) → (read header at BASE) → validate
+```
+
+If any step uses wrong offset, corruption occurs.
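+
+The expected flow can be modeled in a few lines. The sketch below is a toy, not hakmem code: it uses a single 16-byte block, a 1-byte header, and no next-pointer link, and every `toy_*` name is hypothetical. It shows that queuing the user pointer instead of the base pointer makes the pop-time check read application data.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <string.h>
+
+enum { TOY_HEADER_BYTES = 1, TOY_BLOCK_BYTES = 16, TOY_CLASS = 1 };
+#define TOY_MAGIC 0xa0u
+
+/* Alloc: write the header at base + 0, hand out base + 1. */
+static uint8_t* toy_alloc(uint8_t* base) {
+    base[0] = (uint8_t)(TOY_MAGIC | TOY_CLASS);
+    return base + TOY_HEADER_BYTES;
+}
+
+/* Free (correct): recover the base pointer before it is queued. */
+static uint8_t* toy_free_correct(uint8_t* user) { return user - TOY_HEADER_BYTES; }
+
+/* Free (buggy): queue the user pointer unchanged. */
+static uint8_t* toy_free_buggy(uint8_t* user) { return user; }
+
+/* Pop-time check: read the byte at offset 0 of whatever was queued. */
+static uint8_t toy_pop_header(const uint8_t* queued) { return queued[0]; }
+
+/* Runs one alloc/write/free/pop cycle and returns the byte seen at pop time. */
+static uint8_t toy_cycle(bool buggy) {
+    uint8_t block[TOY_BLOCK_BYTES] = {0};
+    uint8_t* user = toy_alloc(block);
+    memset(user, 0x31, TOY_BLOCK_BYTES - TOY_HEADER_BYTES);  /* application data */
+    uint8_t* queued = buggy ? toy_free_buggy(user) : toy_free_correct(user);
+    return toy_pop_header(queued);
+}
+```
+
+In this model `toy_cycle(false)` yields `0xa1` and `toy_cycle(true)` yields `0x31`, the same pair of values as in the error log. That matches the symptom but does not prove this is the actual cause.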
+
+### Step 5: Git Blame on Recent Changes
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+git log --oneline -30
+git show b5be708b6  # "Fix potential freelist corruption"
+git show c91602f18  # "Fix ptr_user_to_base_blind regression"
+git show f3f75ba3d  # "Fix magazine spill RAW pointer"
+```
+
+**Check**: Did any of these changes affect header write logic?
+
+Look for:
+- Changes to `HAK_RET_ALLOC` macro
+- Changes to pointer conversion logic
+- Changes to TLS SLL push/pop
+- Changes to header offset calculations
+
+### Step 6: Review Commit History for TLS SLL
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+git log --oneline --all -- core/box/tls_sll_box.h | head -20
+git log -p --all -- core/box/tls_sll_box.h | head -200
+```
+
+Look for:
+- When was header logic last changed?
+- Were there any defensive fixes recently?
+- Any atomic fence changes?
+- Any offset calculation changes?
+
+### Step 7: Check Phase 1 Configuration
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+
+Verify the header configuration:
+
+```c
+// Phase 1: headerless = false → headers ON
+// Header should be at offset 0 of base pointer
+#define TINY_HEADER_SIZE_BYTES 1
+#define HEADER_MAGIC 0xa0
+```
+
+**Check**:
+- Is HEADERLESS defined? (Should be undefined for Phase 1)
+- Is header size correct? (Should be 1 byte)
+- Are offset calculations consistent?
+
+---
+
+## Likely Root Causes (Narrowed)
+
+### Root Cause A: Header Written at Wrong Offset
+
+**Symptom**: User data appears where header should be
+
+**Check**:
+```c
+// In HAK_RET_ALLOC, are we writing at the right place?
+// Phase 1: header at offset 0 of base
+uint8_t* hdr_ptr = (uint8_t*)base;  // Should be offset 0
+*hdr_ptr = magic;
+
+// If this was changed to:
+uint8_t* hdr_ptr = (uint8_t*)base + 1;  // WRONG! User data location
+*hdr_ptr = magic;
+// Then header is written in user space, gets overwritten
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -n "HAK_RET_ALLOC" core/hakmem_tiny_config_box.inc
+# Check that header write is at (uint8_t*)base, not base+offset
+```
+
+**Fix**: Ensure header write is at `(uint8_t*)base`, not base+offset.
+
+### Root Cause B: User Pointer Pushed Instead of Base Pointer
+
+**Symptom**: SLL contains user pointers, but pop expects base pointers
+
+**Sequence**:
+```c
+// During free:
+void* user_ptr = ...;  // User pointer (base + 1 for Phase 1)
+tls_sll_push(class_idx, user_ptr);  // WRONG! Should be base pointer
+
+// During pop:
+void* popped = tls_sll_pop(class_idx);  // Gets user_ptr
+uint8_t header = *(uint8_t*)popped;  // Reads at user_ptr, not base_ptr!
+// This reads user data instead of header
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn "tls_sll_push" core/ --include="*.c" --include="*.inc" -A3 -B3
+# Check that all pushes use base pointer, not user pointer
+```
+
+**Fix**: Convert user pointer to base pointer before pushing:
+```c
+hak_base_ptr_t base = ptr_user_to_base(user_ptr, class_idx);
+tls_sll_push(class_idx, base, cap);
+```
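+
+One way to make this class of mistake a compile-time error is to give base and user pointers distinct struct types. The sketch below illustrates the idea only; the real `hak_base_ptr_t` and `hak_user_ptr_t` definitions may differ, and every `demo_*` name is hypothetical.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Distinct wrapper types: the compiler will not convert one into the other. */
+typedef struct { uint8_t* p; } demo_base_ptr;
+typedef struct { uint8_t* p; } demo_user_ptr;
+
+enum { DEMO_HEADER_BYTES = 1 };
+
+/* Base to user: skip over the header. */
+static demo_user_ptr demo_base_to_user(demo_base_ptr b) {
+    assert(b.p);
+    return (demo_user_ptr){ b.p + DEMO_HEADER_BYTES };
+}
+
+/* User to base: step back over the header. */
+static demo_base_ptr demo_user_to_base(demo_user_ptr u) {
+    assert(u.p);
+    return (demo_base_ptr){ u.p - DEMO_HEADER_BYTES };
+}
+
+/* Accepts only base pointers; returns the header byte a push would validate. */
+static uint8_t demo_push_header(demo_base_ptr b) { return b.p[0]; }
+```
+
+With these types, `demo_push_header(u)` for a `demo_user_ptr u` is rejected by the compiler, so the conversion cannot be skipped by accident. Casts through `void*` (such as the raw magazine pointers in Root Cause D) still bypass the check.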
+
+### Root Cause C: Atomic Fence Missing
+
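+**Symptom**: Header write is not yet visible when the node is read
+
+**Check**:
+```c
+*(uint8_t*)base = header_magic;  // Instruction 1
+__atomic_thread_fence(__ATOMIC_RELEASE);  // Fence (matters only for cross-thread handoff)
+tls_sll_push(class_idx, base, cap);  // Instruction 2
+```
+
+Within a single thread the fence is not needed: the compiler and CPU must preserve the program-order result, so a pop on the same thread always sees the header written before the push. Because the SLL is thread-local, this cause is plausible only if a block can reach another thread (for example through a remote free or a shared refill path). In that case, a missing fence could allow:
+1. The push to become visible before the header write
+2. Another thread to see an unprepared node
+3. That thread's pop to read an unwritten header → corruption
+
+A release fence on the writer has an effect only when the reader performs a matching acquire.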
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn -B5 "tls_sll_push" core/ --include="*.c" --include="*.inc" | grep -E "fence|barrier|atomic"
+# Check that fence exists between header write and push
+```
+
+**Fix**: If blocks can cross threads, add `__atomic_thread_fence(__ATOMIC_RELEASE)` after the header write, before the SLL push, and pair it with an acquire on the reading side.
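+
+For reference, the pairing that a cross-thread handoff needs can be sketched with C11 atomics. This is an illustration under the assumption that a block is handed to another thread through a single shared slot; `demo_mailbox`, `demo_publish`, and `demo_consume` are hypothetical and do not correspond to any hakmem structure.
+
+```c
+#include <assert.h>
+#include <stdatomic.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical single-slot mailbox used to hand one block to another thread. */
+static _Atomic(uint8_t*) demo_mailbox = NULL;
+
+/* Producer: plain header write, then a release store that publishes the block. */
+static void demo_publish(uint8_t* base, uint8_t header) {
+    assert(base);
+    base[0] = header;
+    atomic_store_explicit(&demo_mailbox, base, memory_order_release);
+}
+
+/* Consumer: the acquire exchange pairs with the release store above, so a
+ * non-NULL result guarantees the header write is visible. */
+static uint8_t* demo_consume(void) {
+    return atomic_exchange_explicit(&demo_mailbox, NULL, memory_order_acquire);
+}
+```
+
+A purely thread-local list needs neither side of this pairing, so finding no fence near `tls_sll_push` is not by itself evidence of a bug.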
+
+### Root Cause D: Magazine Spill Pointer Wrapping
+
+**Symptom**: Magazine stores RAW pointer, SLL expects BASE pointer
+
+**Already Fixed**: Commit f3f75ba3d added `HAK_BASE_FROM_RAW()` wrapper
+
+**Verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -n "HAK_BASE_FROM_RAW\|magazine.*spill" core/hakmem_tiny_refill.inc.h
+# Check line 228 or nearby has the fix
+```
+
+**Expected code**:
+```c
+void* p = mag->items[--mag->top].ptr;
+hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);  // Must have this!
+if (!tls_sll_push(class_idx, base_p, cap)) {
+    // ...
+}
+```
+
+**Fix**: If missing, add `HAK_BASE_FROM_RAW()` wrapper around raw pointer.
+
+### Root Cause E: Class Index Mismatch
+
+**Symptom**: Wrong class_idx used for header magic
+
+**Check**:
+```c
+int class_idx = ...;  // Where does this come from?
+uint8_t magic = (uint8_t)(0xa0 | (class_idx & 0x0f));
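+// If class_idx is wrong (e.g., -1 or 999), the mask hides it: the result is
+// 0xaf or 0xa7, a valid-looking header for the wrong class. The high nibble is
+// still 0xa, so this cause alone cannot produce got=0x31.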
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn "class_idx\|tiny_size_to_class" core/ --include="*.h" | grep -E "= -1|= 0xff"
+# Look for places where class_idx might be invalid
+```
+
+**Fix**: Validate class_idx is in range [0, 7] before using:
+```c
+if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
+    fprintf(stderr, "[ERROR] Invalid class_idx: %d\n", class_idx);
+    abort();
+}
+```
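+
+The effect of the `& 0x0f` mask on an out-of-range index can be checked directly. The sketch below mirrors the masked expression from this section and adds a range-checked variant; `DEMO_NUM_CLASSES` is assumed to be 8 to match the `[0, 7]` range, and both function names are hypothetical.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+enum { DEMO_NUM_CLASSES = 8 };  /* assumption: matches the [0, 7] range above */
+
+/* Mirrors the masked expression used in this document. */
+static uint8_t demo_magic_masked(int class_idx) {
+    return (uint8_t)(0xa0u | ((unsigned)class_idx & 0x0fu));
+}
+
+/* Range-checked variant: reports failure instead of producing a header. */
+static bool demo_magic_checked(int class_idx, uint8_t* out) {
+    assert(out != NULL);
+    if (class_idx < 0 || class_idx >= DEMO_NUM_CLASSES) return false;
+    *out = demo_magic_masked(class_idx);
+    return true;
+}
+```
+
+The masked form turns `-1` into `0xaf` and `999` into `0xa7`, both of which still carry the `0xa` magic nibble. An invalid class index would therefore show up as a wrong low nibble, not as `0x31`.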
+
+### Root Cause F: Offset Calculation Error
+
+**Symptom**: Header written at base, but read at base+offset (or vice versa)
+
+**Check**:
+```c
+// During alloc:
+*(uint8_t*)base = magic;  // Write at base+0
+user = base + 1;  // User at base+1 (Phase 1)
+
+// During free/pop:
+base = user - 1;  // Should recover original base
+uint8_t hdr = *(uint8_t*)base;  // Should read at base+0
+
+// BUT if conversion is wrong:
+base = user - 0;  // WRONG! Off by one
+uint8_t hdr = *(uint8_t*)base;  // Reads at wrong location
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -A10 "ptr_user_to_base_impl\|ptr_base_to_user_impl" core/hakmem_tiny_types.h
+# Check offset calculations are consistent
+```
+
+**Fix**: Ensure offset calculations match between:
+- `ptr_base_to_user` (add offset)
+- `ptr_user_to_base` (subtract same offset)
+
+---
+
+## Proposed Fix Patterns
+
+Based on diagnostic results, the fix will likely be one of:
+
+### Fix Pattern 1: Restore Header Write Logic
+
+**Problem**: Header write uses wrong offset or wrong pointer
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_config_box.inc`
+
+```c
+#define HAK_RET_ALLOC(base, cls) do { \
+    /* Write header FIRST at offset 0 of base */ \
+    *(uint8_t*)(base) = (uint8_t)(0xa0 | ((cls) & 0x0f)); \
+    /* Ensure header write completes before next operation */ \
+    __atomic_thread_fence(__ATOMIC_RELEASE); \
+    /* Now convert to user pointer and return */ \
+    return ptr_base_to_user((base), (cls)); \
+} while(0)
+```
+
+### Fix Pattern 2: Add Missing Fence
+
+**Problem**: Header write becomes visible after the SLL push (possible only when blocks cross threads)
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_alloc_fast_push.c` or `core/hakmem_tiny_free.inc`
+
+```c
+// Before push to TLS SLL:
+*(uint8_t*)base = header_magic;
+__atomic_thread_fence(__ATOMIC_RELEASE);  // ADD THIS LINE
+tls_sll_push(class_idx, base, cap);
+```
+
+### Fix Pattern 3: Fix Pointer Type in Push
+
+**Problem**: User pointer pushed instead of base pointer
+
+**File**: Multiple locations (search for `tls_sll_push`)
+
+```c
+// In free path:
+void* user_ptr = ptr;  // From user
+hak_base_ptr_t base_ptr = ptr_user_to_base(user_ptr, class_idx);  // Convert!
+if (!tls_sll_push(class_idx, base_ptr, cap)) {  // Push base, not user
+    // ...
+}
+```
+
+### Fix Pattern 4: Validate Inputs
+
+**Problem**: Invalid class_idx or pointer values
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+
+```c
+// At entry of tls_sll_push_impl():
+static inline bool tls_sll_push_impl(..., hak_base_ptr_t ptr, ...) {
+    // Validate inputs
+    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
+        fprintf(stderr, "[ERROR] Invalid class_idx: %d\n", class_idx);
+        return false;
+    }
+    if (!ptr || ptr == (void*)-1) {
+        fprintf(stderr, "[ERROR] Invalid pointer: %p\n", ptr);
+        return false;
+    }
+
+    // ... existing logic
+}
+```
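+
+A stronger push-time check than NULL and range validation is to confirm that the pointer sits on a block boundary. A user pointer (base + 1) would fail such a check immediately, which would catch Root Cause B at the push instead of at a later pop. The sketch below assumes the slab start, slab size, and block size are known at the call site; the function and its parameters are hypothetical.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/*
+ * True when `ptr` lies inside the slab and sits exactly on a block boundary.
+ * Addresses are compared as integers so the check is well-defined even when
+ * `ptr` does not point into the slab at all.
+ */
+static bool demo_is_block_base(const uint8_t* slab_start, size_t slab_bytes,
+                               size_t block_bytes, const uint8_t* ptr) {
+    assert(block_bytes > 0);
+    uintptr_t s = (uintptr_t)slab_start;
+    uintptr_t p = (uintptr_t)ptr;
+    if (p < s || p - s >= slab_bytes) return false;
+    return ((p - s) % block_bytes) == 0;
+}
+```
+
+For a slab of 16-byte blocks, `slab + 16` passes and `slab + 17` fails. Whether hakmem's blocks start at the slab base or after a slab header is not stated in this document, so the offset arithmetic would need adjusting to the real layout.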
+
+### Fix Pattern 5: Check Magazine Spill
+
+**Problem**: Magazine spill uses wrong pointer type
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill.inc.h`
+
+```c
+// Around line 228 (magazine spill):
+void* p = mag->items[--mag->top].ptr;
+
+// MUST convert RAW to BASE before pushing:
+hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);  // Essential!
+
+if (!tls_sll_push(class_idx, base_p, cap)) {
+    // ... error handling
+}
+```
+
+**Verify fix exists**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -n "HAK_BASE_FROM_RAW" core/hakmem_tiny_refill.inc.h
+# Should see it used before tls_sll_push
+```
+
+### Fix Pattern 6: Fix Offset Calculation
+
+**Problem**: Pointer conversion uses wrong offset
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+
+```c
+// Verify Phase 1 offsets:
+static inline hak_user_ptr_t ptr_base_to_user_impl(hak_base_ptr_t base, int cls) {
+    if (tiny_class_preserves_header(cls)) {
+        return (hak_user_ptr_t)((uint8_t*)base + TINY_HEADER_SIZE_BYTES);  // +1 for Phase 1
+    }
+    return (hak_user_ptr_t)base;
+}
+
+static inline hak_base_ptr_t ptr_user_to_base_impl(hak_user_ptr_t user, int cls) {
+    if (tiny_class_preserves_header(cls)) {
+        return (hak_base_ptr_t)((uint8_t*)user - TINY_HEADER_SIZE_BYTES);  // -1 for Phase 1
+    }
+    return (hak_base_ptr_t)user;
+}
+```
+
+**Check**: Ensure +1 and -1 match, and TINY_HEADER_SIZE_BYTES is 1.
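+
+This check can be automated as a round-trip test over every class. The sketch below uses stand-in conversion functions that follow the same shape as the code above; the `rt_*` names are hypothetical, and the predicate assumes every class preserves its header, which may not hold in hakmem.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+enum { RT_HEADER_BYTES = 1, RT_NUM_CLASSES = 8 };
+
+/* Stand-in predicate; assumption: every class keeps its header. */
+static bool rt_class_preserves_header(int cls) { (void)cls; return true; }
+
+/* Base to user: add the header size when the class carries a header. */
+static uint8_t* rt_base_to_user(uint8_t* base, int cls) {
+    return rt_class_preserves_header(cls) ? base + RT_HEADER_BYTES : base;
+}
+
+/* User to base: subtract the same header size. */
+static uint8_t* rt_user_to_base(uint8_t* user, int cls) {
+    return rt_class_preserves_header(cls) ? user - RT_HEADER_BYTES : user;
+}
+
+/* Returns the number of classes for which user_to_base(base_to_user(b)) != b. */
+static int rt_count_roundtrip_failures(uint8_t* base) {
+    assert(base);
+    int failures = 0;
+    for (int cls = 0; cls < RT_NUM_CLASSES; cls++) {
+        if (rt_user_to_base(rt_base_to_user(base, cls), cls) != base) failures++;
+    }
+    return failures;
+}
+```
+
+Pointing the same loop at the real `ptr_base_to_user` and `ptr_user_to_base` would expose an asymmetric offset in any class without running a full benchmark.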
+
+---
+
+## Debug Workflow
+
+### Quick Debug Cycle
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# 1. Make changes to source
+# ... edit files ...
+
+# 2. Rebuild
+make clean && make shared -j8
+
+# 3. Test with minimal reproducer
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal 2>&1 | tee debug.log
+
+# 4. Check for errors
+grep "TLS_SLL_HDR_RESET\|CORRUPTION\|SIGSEGV" debug.log
+
+# 5. Analyze log patterns
+grep -E "ALLOC|PUSH|POP" debug.log | head -50
+```
+
+### Advanced Debug: GDB
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Build with debug symbols
+make clean
+CFLAGS="-g -O0" make shared -j8
+
+# Run under GDB
+gdb --args ./tests/test_tls_sll_minimal
+```
+
+**GDB commands**:
+```gdb
+(gdb) set environment LD_PRELOAD ./libhakmem.so
+(gdb) break tls_sll_push_impl
+(gdb) break tls_sll_pop_impl
+(gdb) run
+(gdb) print /x *(uint8_t*)ptr  # Check header byte
+(gdb) print class_idx
+(gdb) backtrace
+(gdb) continue
+```
+
+### Memory Corruption Detection
+
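+Enable AddressSanitizer (optional). ASan installs its own `malloc`/`free` interceptors and expects its runtime to be loaded first, so confirm from the diagnostic log lines that hakmem's allocation path is still the one being exercised: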
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+CFLAGS="-fsanitize=address -g" LDFLAGS="-fsanitize=address" make shared -j8
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+```
+
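+For memory that ASan tracks, it can catch:
+- Buffer overflows
+- Use-after-free
+- Double-free
+
+ASan does not know where hakmem's blocks begin and end inside a SuperSlab, so an overflow from one tiny block into its neighbor's header goes unreported unless the allocator poisons those regions explicitly (`ASAN_POISON_MEMORY_REGION` from `<sanitizer/asan_interface.h>`).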
+
+---
+
+## After Applying Fix
+
+### Step 1: Rebuild and Test Minimal Reproducer
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+```
+
+**Expected**:
+- All tests pass
+- No `[TLS_SLL_HDR_RESET]` errors
+- No SIGSEGV crashes
+
+### Step 2: Run TC1 Baseline Test
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+make shared -j8
+# The baseline crash was observed at ~22 seconds, so the timeout must be longer than that
+LD_PRELOAD=./libhakmem.so timeout 60 ./mimalloc-bench/out/bench/sh8bench 2>&1 | tail -20
+```
+
+**Expected**:
+- "Total elapsed time..." message
+- No SIGSEGV
+- Completion within timeout
+
+### Step 3: Run Full Benchmark Suite
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# cfrac test
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 2>&1 | head -10
+
+# larson test
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8 2>&1 | tail -10
+
+# sh6bench test
+LD_PRELOAD=./libhakmem.so timeout 20 ./mimalloc-bench/out/bench/sh6bench 2>&1 | tail -5
+```
+
+**Expected**: All pass without crashes or corruption errors
+
+### Step 4: Regression Check
+
+Ensure fix doesn't break other configurations:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Test Phase 2 (headerless=true) - if implemented
+# ... config changes ...
+# make clean && make shared -j8
+# LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+
+# Test with different workloads
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/mstress 10 2
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/rptest 10
+```
+
+### Step 5: Performance Check
+
+Verify no performance regression:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Before fix (save baseline):
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep "Total elapsed"
+# Note: May crash, but if it runs, record time
+
+# After fix:
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep "Total elapsed"
+
+# Compare: Should be within 5% of baseline (if baseline worked)
+```
+
+### Step 6: Remove Diagnostic Logging
+
+After fix is confirmed, remove debug logging:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Remove fprintf statements added for diagnosis
+# Restore original HAK_RET_ALLOC macro
+# Restore original tls_sll_push/pop implementations
+
+# Rebuild clean version
+make clean
+make shared -j8
+
+# Final test
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+LD_PRELOAD=./libhakmem.so timeout 60 ./mimalloc-bench/out/bench/sh8bench  # must outlast the ~22 s point of the earlier crash
+```
+
+### Step 7: Commit with Detailed Message
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+git status
+git add [modified files]
+git commit -m "Fix TLS SLL header corruption
+
+Problem: Header magic byte being corrupted during allocation/free path,
+causing [TLS_SLL_HDR_RESET] errors and SIGSEGV crashes in baseline tests.
+
+Symptoms:
+- sh8bench crashes with SIGSEGV
+- Error: [TLS_SLL_HDR_RESET] cls=1 got=0x31 expect=0xa1
+- Header validation fails during tls_sll_pop()
+
+Root cause: [DESCRIBE WHAT WAS WRONG - e.g.:]
+- User pointer was being pushed to TLS SLL instead of base pointer
+- Header read at wrong offset due to pointer type mismatch
+- Missing atomic fence allowed reordering of header write
+
+Solution: [DESCRIBE WHAT WAS FIXED - e.g.:]
+- Convert user pointer to base pointer before tls_sll_push()
+- Add atomic fence after header write, before SLL operations
+- Validate pointer types at SLL entry points
+
+Changes:
+- core/hakmem_tiny_config_box.inc: Fixed HAK_RET_ALLOC header offset
+- core/box/tls_sll_box.h: Added pointer validation
+- core/hakmem_tiny_free.inc: Convert to base ptr before push
+
+Validation:
+- test_tls_sll_minimal passes (4/4 tests)
+- sh8bench baseline completes successfully
+- cfrac/larson/sh6bench pass without crashes
+- No performance regression (<5% variance)
+
+Verified: TC1 baseline stability restored, ready for Phase 1 testing"
+```
+
+---
+
+## Expected Timeline
+
+**Phase A: Understanding (1-2 hours)**
+- Read this document
+- Understand TLS SLL architecture
+- Review header mechanism
+- Locate relevant code sections
+
+**Phase B: Diagnosis (2-4 hours)**
+- Create minimal test case
+- Add diagnostic logging
+- Run tests and analyze logs
+- Identify root cause
+
+**Phase C: Fix Implementation (1-2 hours)**
+- Implement surgical fix
+- Clean build and test (diagnostic logging stays in place until the fix is verified)
+
+**Phase D: Validation (1-2 hours)**
+- Run full test suite
+- Verify no regressions
+- Remove diagnostic logging
+- Performance check
+- Document and commit
+
+**Total: 5-10 hours** for complete diagnosis, fix, and validation
+
+---
+
+## Success Criteria
+
+**Must Have**:
+1. No `[TLS_SLL_HDR_RESET]` errors in baseline tests
+2. sh8bench completes without SIGSEGV
+3. Minimal test suite passes (4/4 tests)
+4. Fix is surgical (minimal code changes)
+5. Root cause documented clearly
+
+**Nice to Have**:
+1. Performance neutral (<5% variance)
+2. Fix applies to all configurations
+3. Additional validation checks added
+4. Regression tests added
+
+**Verification**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+# Timeout is set above the ~22 s point at which the baseline previously crashed
+LD_PRELOAD=./libhakmem.so timeout 60 ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep -E "Total elapsed|RESET|SIGSEGV"
+# Should show "Total elapsed time" with no errors
+```
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Fixing Symptoms, Not Root Cause
+
+**Wrong approach**:
+```c
+// Just disable the check
+if (got != expected) {
+    // Do nothing, ignore corruption
+}
+```
+
+**Right approach**:
+- Understand WHY corruption happens
+- Fix the source (wrong pointer, wrong offset, etc.)
+- Keep the validation check enabled
+
+### Pitfall 2: Over-Engineering
+
+**Wrong approach**:
+- Rewrite entire TLS SLL system
+- Add complex locking mechanisms
+- Change fundamental architecture
+
+**Right approach**:
+- Minimal fix (usually 1-5 lines)
+- Fix pointer conversion or offset
+- Add fence if missing
+
+### Pitfall 3: Ignoring Test Results
+
+**Wrong approach**:
+- Fix compiles, assume it works
+- Skip minimal reproducer
+- Don't verify with benchmarks
+
+**Right approach**:
+- Test with minimal case FIRST
+- Verify all benchmarks pass
+- Check performance impact
+
+### Pitfall 4: Removing Too Much Logging Too Early
+
+**Wrong approach**:
+- Remove diagnostic logging immediately
+- Hard to debug if issue returns
+
+**Right approach**:
+- Keep logging until fix is verified
+- Remove logging in separate commit
+- Document what was learned
+
+---
+
+## Additional Resources
+
+### Key Files to Understand
+
+1. `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+   - TLS SLL push/pop implementation
+   - Header validation logic
+
+2. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_config_box.inc`
+   - HAK_RET_ALLOC macro
+   - Header write logic
+
+3. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+   - Pointer conversion macros
+   - ptr_user_to_base / ptr_base_to_user
+
+4. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill.inc.h`
+   - Magazine spill logic
+   - TLS SLL interaction
+
+### Useful Git Commands
+
+```bash
+# Find when header logic changed
+git log -p --all -S "0xa0" -- core/
+
+# Find recent changes to TLS SLL
+git log --oneline -20 -- core/box/tls_sll_box.h
+
+# Compare current vs previous version
+git diff HEAD~5 core/hakmem_tiny_config_box.inc
+
+# Find all references to a function
+git grep -n "tls_sll_push" core/
+```
+
+### Debugging Commands
+
+```bash
+# Check header size configuration
+grep -n "TINY_HEADER\|HEADERLESS" core/hakmem_tiny_types.h
+
+# Find all allocation return points
+grep -rn "HAK_RET_ALLOC\|return.*user" core/ --include="*.inc"
+
+# Find all TLS SLL push calls
+grep -rn "tls_sll_push" core/ --include="*.c" --include="*.inc" -B3 -A3
+
+# Check atomic operations
+grep -rn "atomic_thread_fence\|__atomic\|memory_order" core/ --include="*.h"
+```
+
+---
+
+## Questions to Answer During Diagnosis
+
+1. **What is 0x31?**
+   - Is it always 0x31, or does it vary?
+   - Does it correspond to test data?
+   - Is it ASCII '1' character?
+
+2. **Where is the header written?**
+   - In HAK_RET_ALLOC macro?
+   - In tls_sll_push?
+   - Somewhere else?
+
+3. **Where is the header read?**
+   - In tls_sll_pop?
+   - In allocation path?
+
+4. **Are offsets consistent?**
+   - Write at offset X
+   - Read at offset X
+   - Both use same base pointer?
+
+5. **Are pointer types correct?**
+   - Push base or user pointer?
+   - Pop returns base or user pointer?
+   - Conversions correct?
+
+6. **Is there a fence?**
+   - Between header write and SLL push?
+   - Between SLL pop and header read?
+
+7. **Is class_idx valid?**
+   - In range [0, 7]?
+   - Matches actual allocation size?
+
+8. **Has this ever worked?**
+   - Check git history
+   - Was there a recent breaking change?
+
+---
+
+## Document Version
+
+- **Version**: 1.0
+- **Date**: 2025-12-03
+- **Author**: System diagnostic documentation
+- **Target**: ChatGPT diagnostic agent
+- **Estimated completion time**: 5-10 hours
+
+---
+
+## Final Checklist
+
+Before considering the fix complete:
+
+- [ ] Minimal reproducer created and passes
+- [ ] Root cause identified and documented
+- [ ] Fix implemented with explanation
+- [ ] Diagnostic logging removed
+- [ ] All baseline tests pass
+- [ ] No performance regression
+- [ ] Git commit with detailed message
+- [ ] This document updated with findings
+
+**Good luck with the diagnosis!**
diff --git a/docs/TLS_SS_HINT_BOX_DESIGN.md b/docs/TLS_SS_HINT_BOX_DESIGN.md
new file mode 100644
index 00000000..9381d738
--- /dev/null
+++ b/docs/TLS_SS_HINT_BOX_DESIGN.md
@@ -0,0 +1,1148 @@
+# TLS Superslab Hint Box - Design Document
+
+**Phase**: Headerless Performance Optimization - Phase 1
+**Date**: 2025-12-03
+**Status**: Design Review
+**Author**: hakmem team
+
+---
+
+## 1. Executive Summary
+
+The TLS Superslab Hint Box is a thread-local cache that accelerates pointer-to-SuperSlab resolution in Headerless mode. When HAKMEM_TINY_HEADERLESS=1 is enabled, every free() operation requires translating a user pointer to its owning SuperSlab. Currently, this uses `hak_super_lookup()`, which performs a hash table lookup costing 10-50 cycles. By caching recently-used SuperSlab references in thread-local storage, we can reduce this to 2-5 cycles for cache hits (85-95% hit rate expected).
+
+**Expected Performance Improvement**: 15-20% throughput increase (54.60 → roughly 63-66 Mops/s on sh8bench)
+
+**Risk Level**: Low
+- Thread-local storage eliminates cache coherency issues
+- Magic number validation provides fail-safe fallback
+- Self-contained Box with minimal integration surface
+- Memory overhead: ~128 bytes per thread (negligible)
+
+---
+
+## 2. Box Definition (Box Theory)
+
+```
+Box: TLS Superslab Hint Cache
+
+MISSION:
+  Cache recently-used SuperSlab references in TLS to accelerate
+  ptr→SuperSlab resolution in Headerless mode, avoiding expensive
+  hash table lookups on the critical free() path.
+
+DESIGN:
+  - Provides O(1) lookup for hot SuperSlabs (L1 cache hit, 2-5 cycles)
+  - Falls back to global registry on miss (fail-safe, no data loss)
+  - No ownership, no remote queues, pure read-only cache
+  - FIFO eviction policy with configurable cache size (2-4 slots)
+
+INVARIANTS:
+  - hint.base <= ptr < hint.end implies hint.ss is valid
+  - Miss is always safe (triggers fallback to hak_super_lookup)
+  - TLS data survives only within thread lifetime
+  - Cache entries are invalidated implicitly by FIFO rotation
+  - Caller-side magic number check (SUPERSLAB_MAGIC) validates every pointer the cache returns
+
+BOUNDARY:
+  - Input: raw user pointer (void* ptr) from free() path
+  - Output: SuperSlab* or NULL (miss triggers fallback)
+  - Does NOT determine class_idx (that's slab_index_for's job)
+  - Does NOT perform ownership validation (that's SuperSlab's job)
+
+PERFORMANCE:
+  - Cache hit: 2-5 cycles (L1-resident, at most 4 range checks of 2 comparisons each)
+  - Cache miss: fallback to hak_super_lookup (10-50 cycles)
+  - Expected hit rate: 85-95% for single-threaded workloads
+  - Expected hit rate: 70-85% for multi-threaded workloads
+
+THREAD SAFETY:
+  - TLS storage: no sharing, no synchronization required
+  - Read-only cache: never modifies SuperSlab state
+  - Stale entries: caught by magic number check
+```
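+
+The structural invariants above can be checked mechanically in debug builds. The following is a minimal sketch, not part of the proposed API: `tls_ss_hint_check_invariants()` is a hypothetical helper, and the struct definitions are local copies of the Section 3 layout (without the statistics counters) so that the block compiles on its own. It cannot tell whether `ss` still points at a live SuperSlab; that remains the job of the magic check.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+struct SuperSlab;  /* opaque here; the cache only stores the pointer */
+
+#define TLS_SS_HINT_SLOTS 4
+
+typedef struct {
+    void* base;
+    void* end;
+    struct SuperSlab* ss;
+} TlsSsHintEntry;
+
+typedef struct {
+    TlsSsHintEntry entries[TLS_SS_HINT_SLOTS];
+    uint32_t count;
+    uint32_t next_slot;
+} TlsSsHintCache;
+
+/* Hypothetical debug helper (an assumption of this sketch).
+ * Returns true if the cache satisfies the structural invariants:
+ * counters in range, and every valid entry has a SuperSlab pointer
+ * and a non-empty [base, end) range. */
+static bool tls_ss_hint_check_invariants(const TlsSsHintCache* c) {
+    if (c->count > TLS_SS_HINT_SLOTS) return false;
+    if (c->next_slot >= TLS_SS_HINT_SLOTS) return false;
+
+    for (uint32_t i = 0; i < c->count; i++) {
+        const TlsSsHintEntry* e = &c->entries[i];
+        if (e->ss == NULL) return false;
+        if ((uintptr_t)e->base >= (uintptr_t)e->end) return false;
+    }
+    return true;
+}
+```
+
+A debug build could call this after every `tls_ss_hint_update()` and abort on failure; release builds would compile it out.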
+
+---
+
+## 3. Data Structures
+
+```c
+// core/box/tls_ss_hint_box.h
+
+#ifndef TLS_SS_HINT_BOX_H
+#define TLS_SS_HINT_BOX_H
+
+#include <stddef.h>   // size_t (used by tls_ss_hint_update)
+#include <stdint.h>
+#include <stdbool.h>
+
+// Forward declaration
+struct SuperSlab;
+
+// Cache entry for a single SuperSlab hint
+// Size: 24 bytes on 64-bit targets (three pointers)
+typedef struct {
+    void* base;              // SuperSlab base address (aligned to 1MB or 2MB)
+    void* end;               // base + superslab_size (for range check)
+    struct SuperSlab* ss;    // Cached SuperSlab pointer
+} TlsSsHintEntry;
+
+// TLS hint cache configuration
+// - 4 slots provide good hit rate without excessive overhead
+// - Larger caches (8, 16) show diminishing returns in benchmarks
+// - Smaller caches (2) may thrash on workloads with 3+ active SuperSlabs
+#define TLS_SS_HINT_SLOTS 4
+
+// Thread-local SuperSlab hint cache
+// Total size (64-bit targets): 24*4 + 8 = 104 bytes per thread in release builds,
+// 120 bytes with the statistics counters (negligible overhead)
+typedef struct {
+    TlsSsHintEntry entries[TLS_SS_HINT_SLOTS];  // Cache entries
+    uint32_t count;          // Number of valid entries (0 to TLS_SS_HINT_SLOTS)
+    uint32_t next_slot;      // Next slot for FIFO rotation (wraps at TLS_SS_HINT_SLOTS)
+
+    // Statistics (optional, for profiling builds)
+    // Disabled in HAKMEM_BUILD_RELEASE to save 16 bytes per thread
+    #if !HAKMEM_BUILD_RELEASE
+    uint64_t hits;           // Cache hit count
+    uint64_t misses;         // Cache miss count
+    #endif
+} TlsSsHintCache;
+
+// Thread-local storage instance
+// Initialized to zero by TLS semantics, formal init in tls_ss_hint_init()
+extern __thread TlsSsHintCache g_tls_ss_hint;
+
+#endif // TLS_SS_HINT_BOX_H
+```
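+
+The per-thread footprint can be verified at compile time instead of being computed by hand. This is a sketch under stated assumptions: `HintCacheRelease` and `HintCacheDebug` are hypothetical names that mirror the two `#if` variants of `TlsSsHintCache`, and the expected sizes hold for targets with 8-byte pointers and 8-byte alignment of `uint64_t` (for example x86-64 and AArch64).
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+struct SuperSlab;
+
+#define TLS_SS_HINT_SLOTS 4
+
+typedef struct {
+    void* base;
+    void* end;
+    struct SuperSlab* ss;
+} TlsSsHintEntry;
+
+/* Mirrors TlsSsHintCache when HAKMEM_BUILD_RELEASE is set (no counters). */
+typedef struct {
+    TlsSsHintEntry entries[TLS_SS_HINT_SLOTS];
+    uint32_t count;
+    uint32_t next_slot;
+} HintCacheRelease;
+
+/* Mirrors TlsSsHintCache in profiling builds (hits/misses present). */
+typedef struct {
+    TlsSsHintEntry entries[TLS_SS_HINT_SLOTS];
+    uint32_t count;
+    uint32_t next_slot;
+    uint64_t hits;
+    uint64_t misses;
+} HintCacheDebug;
+
+/* The checks only constrain targets with 8-byte pointers; on other
+ * targets the left-hand side is true and the assertion is a no-op. */
+_Static_assert(sizeof(void*) != 8 || sizeof(TlsSsHintEntry) == 24,
+               "entry should be three pointers");
+_Static_assert(sizeof(void*) != 8 || sizeof(HintCacheRelease) == 104,
+               "release cache: 4 * 24 + 2 * 4 bytes");
+_Static_assert(sizeof(void*) != 8 || sizeof(HintCacheDebug) == 120,
+               "debug cache: release layout + 2 * 8 bytes of counters");
+```
+
+Placing equivalent assertions next to the real `TlsSsHintCache` definition would make any later layout change that grows the TLS footprint fail the build.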
+
+---
+
+## 4. API Design
+
+```c
+// core/box/tls_ss_hint_box.h (continued)
+
+/**
+ * @brief Initialize TLS hint cache for current thread
+ *
+ * Call once per thread, typically in thread-local initialization path.
+ * Safe to call multiple times (idempotent).
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~10 cycles (negligible one-time cost)
+ */
+static inline void tls_ss_hint_init(void);
+
+/**
+ * @brief Update hint cache with a SuperSlab reference
+ *
+ * Called on paths where we know the SuperSlab for a given address range:
+ * - After successful tiny_alloc (cache the allocated-from SuperSlab)
+ * - After superslab refill (cache the newly bound SuperSlab)
+ * - After unified cache refill (cache the refilled SuperSlab)
+ *
+ * Duplicate detection: If the SuperSlab is already cached, no update occurs.
+ * This prevents thrashing when repeatedly allocating from the same SuperSlab.
+ *
+ * @param ss    SuperSlab to cache (must be non-NULL, SUPERSLAB_MAGIC validated by caller)
+ * @param base  SuperSlab base address (1MB or 2MB aligned)
+ * @param size  SuperSlab size in bytes (1MB or 2MB)
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~15-20 cycles (duplicate check + FIFO rotation)
+ */
+static inline void tls_ss_hint_update(struct SuperSlab* ss, void* base, size_t size);
+
+/**
+ * @brief Lookup SuperSlab for given pointer (fast path)
+ *
+ * Called on free() entry, before falling back to hak_super_lookup().
+ * Performs linear search over cached entries (4 iterations max).
+ *
+ * Cache hit: Returns true, sets *out_ss to cached SuperSlab pointer
+ * Cache miss: Returns false, caller must use hak_super_lookup()
+ *
+ * @param ptr     User pointer to lookup (arbitrary alignment)
+ * @param out_ss  Output: SuperSlab pointer if found (only valid if return true)
+ * @return true if cache hit (out_ss is valid), false if miss
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: 2-5 cycles (hit), 8-12 cycles (miss)
+ *
+ * NOTE: Caller MUST validate SUPERSLAB_MAGIC after successful lookup.
+ *       This Box does not perform magic validation to keep fast path minimal.
+ */
+static inline bool tls_ss_hint_lookup(void* ptr, struct SuperSlab** out_ss);
+
+/**
+ * @brief Clear all cached hints (for testing/reset)
+ *
+ * Use cases:
+ * - Unit tests: Reset cache between test cases
+ * - Debug: Force cache cold start for profiling
+ * - Thread teardown: Optional cleanup (TLS auto-cleanup on thread exit)
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~10 cycles
+ */
+static inline void tls_ss_hint_clear(void);
+
+/**
+ * @brief Get cache statistics (for profiling builds)
+ *
+ * Returns hit/miss counters for performance analysis.
+ * Only available in non-release builds (HAKMEM_BUILD_RELEASE=0).
+ *
+ * @param hits    Output: Total cache hits
+ * @param misses  Output: Total cache misses
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~5 cycles (two loads)
+ */
+#if !HAKMEM_BUILD_RELEASE
+static inline void tls_ss_hint_stats(uint64_t* hits, uint64_t* misses);
+#endif
+```
+
+---
+
+## 5. Implementation Details
+
+```c
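+// core/box/tls_ss_hint_box.h (continued): the functions are `static inline`,
+// so their bodies must live in the header to be visible to every caller
+
+#include "tls_ss_hint_box.h"
+#include "../hakmem_tiny_superslab.h"  // For SuperSlab, SUPERSLAB_MAGIC
+
+// Thread-local storage definition
+// Place this in exactly one .c file (core/box/tls_ss_hint_box.c), not in the header
+__thread TlsSsHintCache g_tls_ss_hint = {0};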
+
+/**
+ * Initialize TLS hint cache
+ * Safe to call multiple times: each call unconditionally resets the cache to empty
+ */
+static inline void tls_ss_hint_init(void) {
+    // Zero-initialization by TLS, but explicit init for clarity
+    g_tls_ss_hint.count = 0;
+    g_tls_ss_hint.next_slot = 0;
+
+    #if !HAKMEM_BUILD_RELEASE
+    g_tls_ss_hint.hits = 0;
+    g_tls_ss_hint.misses = 0;
+    #endif
+
+    // Clear all entries (paranoid, but cache-friendly loop)
+    for (int i = 0; i < TLS_SS_HINT_SLOTS; i++) {
+        g_tls_ss_hint.entries[i].base = NULL;
+        g_tls_ss_hint.entries[i].end = NULL;
+        g_tls_ss_hint.entries[i].ss = NULL;
+    }
+}
+
+/**
+ * Update hint cache with SuperSlab reference
+ * FIFO rotation: oldest entry is evicted when cache is full
+ * Duplicate detection: skip if SuperSlab already cached
+ */
+static inline void tls_ss_hint_update(struct SuperSlab* ss, void* base, size_t size) {
+    // Sanity check: reject invalid inputs
+    if (__builtin_expect(!ss || !base || size == 0, 0)) {
+        return;
+    }
+
+    // Duplicate detection: check if this SuperSlab is already cached
+    // This prevents thrashing when allocating from the same SuperSlab repeatedly
+    for (uint32_t i = 0; i < g_tls_ss_hint.count; i++) {
+        if (g_tls_ss_hint.entries[i].ss == ss) {
+            return;  // Already cached, no update needed
+        }
+    }
+
+    // Add to next slot (FIFO rotation)
+    uint32_t slot = g_tls_ss_hint.next_slot;
+    g_tls_ss_hint.entries[slot].base = base;
+    g_tls_ss_hint.entries[slot].end = (char*)base + size;
+    g_tls_ss_hint.entries[slot].ss = ss;
+
+    // Advance to next slot (wrap at TLS_SS_HINT_SLOTS)
+    g_tls_ss_hint.next_slot = (slot + 1) % TLS_SS_HINT_SLOTS;
+
+    // Increment count until cache is full
+    if (g_tls_ss_hint.count < TLS_SS_HINT_SLOTS) {
+        g_tls_ss_hint.count++;
+    }
+}
+
+/**
+ * Lookup SuperSlab for pointer (fast path)
+ * Linear search over cached entries (4 iterations max)
+ *
+ * Performance note:
+ * - Linear search is faster than hash table for small N (N <= 8)
+ * - Range comparison (ptr >= base && ptr < end) costs 2-3 cycles per entry
+ * - Total cost: 2-5 cycles (hit), 8-12 cycles (miss with 4 entries)
+ */
+static inline bool tls_ss_hint_lookup(void* ptr, struct SuperSlab** out_ss) {
+    // Fast path: iterate over valid entries
+    // Unrolling this loop (if count is small) is beneficial, but let compiler decide
+    for (uint32_t i = 0; i < g_tls_ss_hint.count; i++) {
+        TlsSsHintEntry* e = &g_tls_ss_hint.entries[i];
+
+        // Range check: base <= ptr < end
+        // Note: end is exclusive (base + size), so use < not <=
+        if (ptr >= e->base && ptr < e->end) {
+            // Cache hit!
+            *out_ss = e->ss;
+
+            #if !HAKMEM_BUILD_RELEASE
+            g_tls_ss_hint.hits++;
+            #endif
+
+            return true;
+        }
+    }
+
+    // Cache miss: caller must fall back to hak_super_lookup()
+    #if !HAKMEM_BUILD_RELEASE
+    g_tls_ss_hint.misses++;
+    #endif
+
+    return false;
+}
+
+/**
+ * Clear all cached hints
+ * Use for testing or manual reset
+ */
+static inline void tls_ss_hint_clear(void) {
+    g_tls_ss_hint.count = 0;
+    g_tls_ss_hint.next_slot = 0;
+
+    #if !HAKMEM_BUILD_RELEASE
+    // Preserve stats across clear (for cumulative profiling)
+    // Uncomment to reset stats:
+    // g_tls_ss_hint.hits = 0;
+    // g_tls_ss_hint.misses = 0;
+    #endif
+
+    // Optional: zero out entries (paranoid, not required for correctness)
+    for (int i = 0; i < TLS_SS_HINT_SLOTS; i++) {
+        g_tls_ss_hint.entries[i].base = NULL;
+        g_tls_ss_hint.entries[i].end = NULL;
+        g_tls_ss_hint.entries[i].ss = NULL;
+    }
+}
+
+/**
+ * Get cache statistics (profiling builds only)
+ */
+#if !HAKMEM_BUILD_RELEASE
+static inline void tls_ss_hint_stats(uint64_t* hits, uint64_t* misses) {
+    if (hits) *hits = g_tls_ss_hint.hits;
+    if (misses) *misses = g_tls_ss_hint.misses;
+}
+#endif
+```
+
+---
+
+## 6. Integration Points
+
+### 6.1 Update Points: When to Call `tls_ss_hint_update()`
+
+The hint cache should be updated whenever we know the SuperSlab for an address range. This happens on allocation success paths:
+
+#### Location 1: After Successful Tiny Alloc (hakmem_tiny.c)
+```c
+// In hak_tiny_alloc or similar allocation path
+void* ptr = tiny_allocate_from_superslab(class_idx, &ss);
+if (ptr) {
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Cache the SuperSlab we just allocated from
+    // This improves free() performance for LIFO allocation patterns
+    tls_ss_hint_update(ss, ss->base_addr, ss->size_bytes);
+    #endif
+    return ptr;
+}
+```
+
+#### Location 2: After SuperSlab Refill (hakmem_tiny_refill.inc.h)
+```c
+// In tiny_refill_from_superslab or superslab_allocate
+SuperSlab* ss = superslab_allocate(class_idx);
+if (ss) {
+    // Bind SuperSlab to thread's TLS state
+    bind_superslab_to_thread(ss, class_idx);
+
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Cache the newly bound SuperSlab
+    // Future allocations from this SuperSlab will have cached hint
+    tls_ss_hint_update(ss, ss->base_addr, ss->size_bytes);
+    #endif
+}
+```
+
+#### Location 3: Unified Cache Refill (core/front/tiny_unified_cache.c)
+```c
+// In unified_cache_refill_class
+void* block = superslab_alloc_block(class_idx, &ss);
+if (block) {
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Cache the SuperSlab that provided this block
+    tls_ss_hint_update(ss, ss->base_addr, ss->size_bytes);
+    #endif
+
+    // Push to unified cache
+    unified_cache_push(class_idx, block);
+}
+```
+
+#### Location 4: Thread-Local Init (hakmem_tiny_tls_init)
+```c
+// In tiny_tls_init or thread_local_init
+void tiny_tls_init(void) {
+    // Initialize TLS structures
+    tiny_magazine_init();
+    tiny_sll_init();
+
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Initialize hint cache (zero-init by TLS, but explicit for clarity)
+    tls_ss_hint_init();
+    #endif
+}
+```
+
+### 6.2 Lookup Points: When to Call `tls_ss_hint_lookup()`
+
+The hint lookup should be the **first step** in the free() path, before falling back to the registry lookup:
+
+#### Location 1: Tiny Free Entry (core/hakmem_tiny_free.inc)
+```c
+// In hak_tiny_free or similar free path
+void hak_tiny_free(void* ptr) {
+    if (!ptr) return;
+
+    SuperSlab* ss = NULL;
+
+    #if HAKMEM_TINY_HEADERLESS
+        // Phase 1: Try TLS hint cache (fast path, 2-5 cycles on hit)
+        #if HAKMEM_TINY_SS_TLS_HINT
+        if (!tls_ss_hint_lookup(ptr, &ss)) {
+        #endif
+            // Phase 2: Fallback to global registry (slow path, 10-50 cycles)
+            ss = hak_super_lookup(ptr);
+        #if HAKMEM_TINY_SS_TLS_HINT
+        }
+        #endif
+
+        // Validate SuperSlab (magic check)
+        if (!ss || ss->magic != SUPERSLAB_MAGIC) {
+            // Invalid pointer - external guard path
+            hak_external_guard_free(ptr);
+            return;
+        }
+
+        // Proceed with free using SuperSlab info
+        int class_idx = slab_index_for(ss, ptr);
+        tiny_free_to_slab(ss, ptr, class_idx);
+
+    #else
+        // Header mode: read class_idx from header (1-3 cycles)
+        uint8_t hdr = *((uint8_t*)ptr - 1);
+        int class_idx = hdr & 0x7;
+        tiny_free_to_class(class_idx, ptr);
+    #endif
+}
+```
+
+#### Location 2: Fast Free Path (core/tiny_free_fast_v2.inc.h)
+```c
+// In tiny_free_fast or inline free path
+static inline void tiny_free_fast(void* ptr) {
+    #if HAKMEM_TINY_HEADERLESS
+        SuperSlab* ss = NULL;
+
+        // Try hint cache first
+        #if HAKMEM_TINY_SS_TLS_HINT
+        if (!tls_ss_hint_lookup(ptr, &ss)) {
+        #endif
+            ss = hak_super_lookup(ptr);
+        #if HAKMEM_TINY_SS_TLS_HINT
+        }
+        #endif
+
+        if (__builtin_expect(!ss || ss->magic != SUPERSLAB_MAGIC, 0)) {
+            // Slow path: external guard or invalid pointer
+            hak_tiny_free_slow(ptr);
+            return;
+        }
+
+        // Fast path: push to TLS freelist
+        int class_idx = slab_index_for(ss, ptr);
+        front_gate_push_tls(class_idx, ptr);
+
+    #else
+        // Header mode fast path
+        uint8_t hdr = *((uint8_t*)ptr - 1);
+        int class_idx = hdr & 0x7;
+        front_gate_push_tls(class_idx, ptr);
+    #endif
+}
+```
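+
+Both call sites above repeat the same hint, fallback, validate sequence with interleaved `#if` blocks. One way to factor it out is sketched below. `tiny_resolve_superslab()` is a hypothetical helper, the two lookup functions are mock stand-ins so that the block compiles on its own, and the `SUPERSLAB_MAGIC` value is a placeholder. Unlike the snippets above, a hint hit with a bad magic number falls back to the registry, which matches the recovery described in Section 10.1. The sketch assumes the cached SuperSlab header is still mapped when `magic` is read.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+#define SUPERSLAB_MAGIC 0x48414B4Du   /* placeholder value for this sketch */
+#define HAKMEM_TINY_SS_TLS_HINT 1
+
+typedef struct SuperSlab {
+    uint32_t magic;
+} SuperSlab;
+
+/* Mock state standing in for the real hint cache and global registry. */
+static SuperSlab* g_mock_hint;       /* what the hint cache would return */
+static SuperSlab* g_mock_registry;   /* what the registry would return   */
+
+static bool tls_ss_hint_lookup(void* ptr, SuperSlab** out_ss) {
+    (void)ptr;
+    if (!g_mock_hint) return false;
+    *out_ss = g_mock_hint;
+    return true;
+}
+
+static SuperSlab* hak_super_lookup(void* ptr) {
+    (void)ptr;
+    return g_mock_registry;
+}
+
+/* Hypothetical helper: try the hint first, then the registry, and
+ * validate the magic number in one place. Returns NULL when the
+ * pointer does not belong to a valid SuperSlab. */
+static SuperSlab* tiny_resolve_superslab(void* ptr) {
+    SuperSlab* ss = NULL;
+
+#if HAKMEM_TINY_SS_TLS_HINT
+    if (tls_ss_hint_lookup(ptr, &ss) && ss->magic == SUPERSLAB_MAGIC) {
+        return ss;                        /* validated hint hit */
+    }
+#endif
+
+    ss = hak_super_lookup(ptr);           /* hint miss or stale hint */
+    return (ss && ss->magic == SUPERSLAB_MAGIC) ? ss : NULL;
+}
+```
+
+With such a helper, each free path would reduce to one call followed by a NULL check that routes to the external guard or slow path.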
+
+---
+
+## 7. Build Flag
+
+```c
+// In hakmem_build_flags.h or similar configuration header
+
+// ============================================================================
+// Phase 1: Headerless Optimization - TLS SuperSlab Hint Cache
+// ============================================================================
+// Purpose: Accelerate ptr→SuperSlab lookup in Headerless mode
+// Default: 0 (disabled during development and testing)
+// Target: 1 (enabled after validation in Phase 1 rollout)
+//
+// Performance Impact:
+// - Cache hit: 2-5 cycles (vs 10-50 cycles for hak_super_lookup)
+// - Expected hit rate: 85-95% (single-threaded), 70-85% (multi-threaded)
+// - Expected throughput improvement: 15-20%
+//
+// Memory Overhead:
+// - 104 bytes per thread in release builds (TLS, 64-bit targets)
+// - Negligible for typical workloads (1000 threads = 104KB)
+//
+// Dependencies:
+// - Requires HAKMEM_TINY_HEADERLESS=1 (hint is no-op in header mode)
+// - No other dependencies (self-contained Box)
+
+#ifndef HAKMEM_TINY_SS_TLS_HINT
+  #define HAKMEM_TINY_SS_TLS_HINT 0
+#endif
+
+// Validation: Hint Box only active in Headerless mode
+#if HAKMEM_TINY_SS_TLS_HINT && !HAKMEM_TINY_HEADERLESS
+  #error "HAKMEM_TINY_SS_TLS_HINT requires HAKMEM_TINY_HEADERLESS=1"
+#endif
+```
+
+---
+
+## 8. Testing Plan
+
+### 8.1 Unit Tests
+
+Create `tests/test_tls_ss_hint.c`:
+
+```c
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+#include "core/box/tls_ss_hint_box.h"
+#include "core/hakmem_tiny_superslab.h"
+
+// Mock SuperSlab for testing
+typedef struct {
+    uint32_t magic;
+    void* base_addr;
+    size_t size_bytes;
+    uint8_t size_class;
+} MockSuperSlab;
+
+void test_hint_init(void) {
+    printf("test_hint_init...\n");
+
+    tls_ss_hint_init();
+
+    // Verify cache is empty
+    assert(g_tls_ss_hint.count == 0);
+    assert(g_tls_ss_hint.next_slot == 0);
+
+    #if !HAKMEM_BUILD_RELEASE
+    assert(g_tls_ss_hint.hits == 0);
+    assert(g_tls_ss_hint.misses == 0);
+    #endif
+
+    printf("  PASS\n");
+}
+
+void test_hint_basic(void) {
+    printf("test_hint_basic...\n");
+
+    tls_ss_hint_init();
+
+    // Mock SuperSlab
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,  // 2MB
+        .size_class = 0
+    };
+
+    // Update hint
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    // Verify cache entry
+    assert(g_tls_ss_hint.count == 1);
+    assert(g_tls_ss_hint.entries[0].base == ss.base_addr);
+    assert(g_tls_ss_hint.entries[0].ss == (SuperSlab*)&ss);
+
+    // Lookup should hit (within range)
+    SuperSlab* out = NULL;
+    assert(tls_ss_hint_lookup((void*)0x1000100, &out) == true);
+    assert(out == (SuperSlab*)&ss);
+
+    // Lookup at base should hit
+    assert(tls_ss_hint_lookup((void*)0x1000000, &out) == true);
+    assert(out == (SuperSlab*)&ss);
+
+    // Lookup at end-1 should hit (end = 0x1000000 + 0x200000 = 0x1200000)
+    assert(tls_ss_hint_lookup((void*)0x11FFFFF, &out) == true);
+    assert(out == (SuperSlab*)&ss);
+
+    // Lookup at end should miss (exclusive boundary)
+    assert(tls_ss_hint_lookup((void*)0x1200000, &out) == false);
+
+    // Lookup outside range should miss
+    assert(tls_ss_hint_lookup((void*)0x3000000, &out) == false);
+
+    printf("  PASS\n");
+}
+
+void test_hint_fifo_rotation(void) {
+    printf("test_hint_fifo_rotation...\n");
+
+    tls_ss_hint_init();
+
+    // Create 6 mock SuperSlabs (cache has 4 slots)
+    MockSuperSlab ss[6];
+    for (int i = 0; i < 6; i++) {
+        ss[i].magic = SUPERSLAB_MAGIC;
+        ss[i].base_addr = (void*)(uintptr_t)(0x1000000 + i * 0x200000);  // 2MB apart
+        ss[i].size_bytes = 2 * 1024 * 1024;
+        ss[i].size_class = 0;
+
+        tls_ss_hint_update((SuperSlab*)&ss[i], ss[i].base_addr, ss[i].size_bytes);
+    }
+
+    // Cache should be full (4 slots)
+    assert(g_tls_ss_hint.count == TLS_SS_HINT_SLOTS);
+
+    // First 2 SuperSlabs should be evicted (FIFO)
+    SuperSlab* out = NULL;
+    assert(tls_ss_hint_lookup((void*)0x1000100, &out) == false);  // ss[0] evicted
+    assert(tls_ss_hint_lookup((void*)0x1200100, &out) == false);  // ss[1] evicted
+
+    // Last 4 SuperSlabs should be cached
+    assert(tls_ss_hint_lookup((void*)0x1400100, &out) == true);   // ss[2]
+    assert(out == (SuperSlab*)&ss[2]);
+    assert(tls_ss_hint_lookup((void*)0x1600100, &out) == true);   // ss[3]
+    assert(out == (SuperSlab*)&ss[3]);
+    assert(tls_ss_hint_lookup((void*)0x1800100, &out) == true);   // ss[4]
+    assert(out == (SuperSlab*)&ss[4]);
+    assert(tls_ss_hint_lookup((void*)0x1A00100, &out) == true);   // ss[5]
+    assert(out == (SuperSlab*)&ss[5]);
+
+    printf("  PASS\n");
+}
+
+void test_hint_duplicate_detection(void) {
+    printf("test_hint_duplicate_detection...\n");
+
+    tls_ss_hint_init();
+
+    // Mock SuperSlab
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,
+        .size_class = 0
+    };
+
+    // Update hint 3 times with same SuperSlab
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    // Cache should have only 1 entry (duplicates ignored)
+    assert(g_tls_ss_hint.count == 1);
+    assert(g_tls_ss_hint.entries[0].ss == (SuperSlab*)&ss);
+
+    printf("  PASS\n");
+}
+
+void test_hint_clear(void) {
+    printf("test_hint_clear...\n");
+
+    tls_ss_hint_init();
+
+    // Add some entries
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,
+        .size_class = 0
+    };
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    assert(g_tls_ss_hint.count == 1);
+
+    // Clear cache
+    tls_ss_hint_clear();
+
+    // Cache should be empty
+    assert(g_tls_ss_hint.count == 0);
+    assert(g_tls_ss_hint.next_slot == 0);
+
+    // Lookup should miss
+    SuperSlab* out = NULL;
+    assert(tls_ss_hint_lookup((void*)0x1000100, &out) == false);
+
+    printf("  PASS\n");
+}
+
+#if !HAKMEM_BUILD_RELEASE
+void test_hint_stats(void) {
+    printf("test_hint_stats...\n");
+
+    tls_ss_hint_init();
+
+    // Add entry
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,
+        .size_class = 0
+    };
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    // Perform lookups
+    SuperSlab* out = NULL;
+    tls_ss_hint_lookup((void*)0x1000100, &out);  // Hit
+    tls_ss_hint_lookup((void*)0x1000200, &out);  // Hit
+    tls_ss_hint_lookup((void*)0x3000000, &out);  // Miss
+
+    // Check stats
+    uint64_t hits = 0, misses = 0;
+    tls_ss_hint_stats(&hits, &misses);
+
+    assert(hits == 2);
+    assert(misses == 1);
+
+    printf("  PASS\n");
+}
+#endif
+
+int main(void) {
+    printf("Running TLS SS Hint Box unit tests...\n\n");
+
+    test_hint_init();
+    test_hint_basic();
+    test_hint_fifo_rotation();
+    test_hint_duplicate_detection();
+    test_hint_clear();
+
+    #if !HAKMEM_BUILD_RELEASE
+    test_hint_stats();
+    #endif
+
+    printf("\nAll tests passed!\n");
+    return 0;
+}
+```
+
+### 8.2 Integration Tests
+
+#### Test 1: Build Validation
+```bash
+# Test 1: Build with hint disabled (baseline)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+
+# Test 2: Build with hint enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Test 3: Verify hint is disabled in header mode (should error)
+# make clean
+# make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=0 -DHAKMEM_TINY_SS_TLS_HINT=1"
+# Expected: Compile error (validation check in hakmem_build_flags.h)
+```
+
+#### Test 2: Benchmark Comparison
+```bash
+# Build baseline (hint disabled)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+
+# Run benchmarks
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench > baseline.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 17545186520809 > cfrac_baseline.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8 > larson_baseline.txt
+
+# Build with hint enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Run same benchmarks
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench > hint.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 17545186520809 > cfrac_hint.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8 > larson_hint.txt
+
+# Compare results
+echo "=== sh8bench ==="
+grep "Mops" baseline.txt hint.txt
+
+echo "=== cfrac ==="
+grep "time:" cfrac_baseline.txt cfrac_hint.txt
+
+echo "=== larson ==="
+grep "ops/s" larson_baseline.txt larson_hint.txt
+```
+
+#### Test 3: Hit Rate Profiling
+```bash
+# Build with stats enabled (non-release)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1 -DHAKMEM_BUILD_RELEASE=0"
+
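+# Add stats dump at exit (in hakmem_exit.c or similar; needs <inttypes.h>)
+# Counters are thread-local, so this reports only the calling thread
+# void dump_hint_stats(void) {
+#     uint64_t hits = 0, misses = 0;
+#     tls_ss_hint_stats(&hits, &misses);
+#     uint64_t total = hits + misses;
+#     fprintf(stderr, "[TLS_HINT_STATS] hits=%" PRIu64 " misses=%" PRIu64 " hit_rate=%.2f%%\n",
+#             hits, misses, total ? 100.0 * (double)hits / (double)total : 0.0);
+# }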
+
+# Run benchmark and check hit rate
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep TLS_HINT_STATS
+# Expected: hit_rate >= 85%
+```
+
+### 8.3 Correctness Tests
+
+```bash
+# Smoke test: with the hint enabled, every cache miss must fall back to
+# hak_super_lookup() without crashes or assertion failures
+
+# Build with hint enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Run sh8bench (allocates from multiple SuperSlabs)
+# A zero exit status (no crash, no assertion failure) counts as success
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench \
+    && echo "Correctness test passed"
+```
+
+---
+
+## 9. Performance Expectations
+
+### 9.1 Cycle Count Analysis
+
+| Operation | Without Hint | With Hint (Hit) | With Hint (Miss) | Improvement |
+|-----------|-------------|----------------|-----------------|-------------|
+| free() lookup | 10-50 cycles | 2-5 cycles | 18-62 cycles (8-12 scan + 10-50 lookup) | 80-95% (on hit) |
+| Range check (per entry) | N/A | 2 cycles | 2 cycles | - |
+| Hash table lookup | 10-50 cycles | N/A | 10-50 cycles | - |
+| Total free() cost | 15-60 cycles | 7-15 cycles (hit) | 23-72 cycles (miss) | 40-60% (on hit) |
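+
+Because a miss pays for the failed scan on top of the registry lookup, the hint only helps above a certain hit rate. The expected lookup cost for hit rate h is h * hit + (1 - h) * (scan + lookup), and it beats the plain registry lookup when h > scan / (lookup - hit + scan). The sketch below encodes both formulas; the function names are hypothetical, and the inputs are the cycle estimates used in this document (2-5 hit, 8-12 failed scan, 10-50 registry lookup), not measurements.
+
+```c
+/* Expected ptr->SuperSlab lookup cost in cycles for hit rate h in [0, 1].
+ * hit:    cost of a hint cache hit
+ * scan:   cost of scanning the cache without finding a match
+ * lookup: cost of hak_super_lookup() */
+static double hint_expected_cycles(double h, double hit,
+                                   double scan, double lookup) {
+    return h * hit + (1.0 - h) * (scan + lookup);
+}
+
+/* Hit rate above which the hint cache is cheaper than calling the
+ * registry directly. Assumes lookup > hit, so the divisor is positive. */
+static double hint_break_even_rate(double hit, double scan, double lookup) {
+    return scan / (lookup - hit + scan);
+}
+```
+
+With midpoint estimates (hit 3.5, scan 10, lookup 30 cycles) the break-even hit rate is about 0.27, and a 90% hit rate gives an expected cost of 7.15 cycles. With the least favorable combination (hit 5, scan 12, lookup 10) the break-even rises to about 0.71, so the measured registry cost matters when judging low hit rates.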
+
+### 9.2 Expected Hit Rates
+
+| Workload | Hit Rate | Reasoning |
+|----------|----------|-----------|
+| Single-threaded LIFO | 95-99% | Free() immediately after alloc() from same SuperSlab |
+| Single-threaded FIFO | 85-95% | Recent allocations from 2-4 SuperSlabs |
+| Multi-threaded (8 threads) | 70-85% | Shared SuperSlabs, more cache thrashing |
+| Larson (high churn) | 65-80% | Many active SuperSlabs, frequent evictions |
+
+### 9.3 Benchmark Targets
+
+| Benchmark | Baseline (no hint) | Target (with hint) | Improvement |
+|-----------|-------------------|-------------------|-------------|
+| sh8bench | 54.60 Mops/s | 64-68 Mops/s | +15-20% |
+| cfrac | 1.25 sec | 1.10-1.15 sec | +10-15% |
+| larson (8 threads) | 6.5M ops/s | 7.5-8.0M ops/s | +15-20% |
+
+### 9.4 Memory Overhead
+
+| Metric | Value | Notes |
+|--------|-------|-------|
+| Per-thread overhead | 104 bytes | TLS cache (release build, 64-bit targets) |
+| Per-thread overhead (debug) | 120 bytes | TLS cache + stats counters |
+| 1000 threads | 104 KB | Negligible for server workloads |
+| 10000 threads | 1.04 MB | Still negligible |
+
+---
+
+## 10. Risk Analysis
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| **Cache coherency issues** | Very Low | Low | TLS is thread-local, no sharing between threads |
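+| **Stale hint after munmap** | Low | Low | Magic check (SUPERSLAB_MAGIC) catches recycled SuperSlabs; this assumes the SuperSlab header stays mapped, since reading `magic` from unmapped memory would fault |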
+| **Cache thrashing (many SS)** | Low | Low | 4 slots cover typical workloads; miss falls back to registry |
+| **Memory overhead** | Very Low | Very Low | 104 bytes/thread (release build), negligible for most workloads |
+| **Integration bugs** | Low | Medium | Self-contained Box, clear API, comprehensive tests |
+| **Hit rate lower than expected** | Low | Low | At typical costs even a 50% hit rate is a net win; a miss adds only the 8-12 cycle scan before the registry fallback |
+| **Complexity increase** | Low | Low | 150 LOC, header-only Box, minimal dependencies |
+
+### 10.1 Failure Modes and Recovery
+
+| Failure Mode | Detection | Recovery |
+|-------------|-----------|----------|
+| Stale SuperSlab pointer | Magic check (`ss->magic != SUPERSLAB_MAGIC`) | Fall back to hak_super_lookup() |
+| Cache miss | tls_ss_hint_lookup returns false | Fall back to hak_super_lookup() |
+| Invalid hint range | ptr outside [base, end) | Linear search continues, eventually misses |
+| Thread teardown | TLS cleanup by OS | No manual cleanup needed |
+| SuperSlab freed | Magic number cleared | Caught by magic check in free() path |
+
+---
+
+## 11. Future Considerations
+
+### 11.1 Phase 2 Integration: Global Class Map
+
+When Phase 2 introduces a Global Class Map (pointer → class_idx lookup), the TLS Hint Box becomes the first tier in a three-tier lookup hierarchy:
+
+```
+Tier 1 (fastest): TLS Hint Cache (2-5 cycles, 85-95% hit rate)
+    ↓ miss
+Tier 2 (medium): Global Class Map (5-15 cycles, 99%+ hit rate)
+    ↓ miss
+Tier 3 (slowest): Global SuperSlab Registry (10-50 cycles, 100% hit rate)
+```
+
+**Integration point**:
+```c
+SuperSlab* ss = NULL;
+int class_idx = -1;
+
+// Tier 1: TLS hint (caller validates magic, as tls_ss_hint_lookup requires)
+#if HAKMEM_TINY_SS_TLS_HINT
+if (tls_ss_hint_lookup(ptr, &ss) && ss->magic == SUPERSLAB_MAGIC) {
+    class_idx = slab_index_for(ss, ptr);
+    goto found;
+}
+#endif
+
+// Tier 2: Global class map
+#if HAKMEM_TINY_CLASS_MAP
+class_idx = class_map_lookup(ptr);
+if (class_idx >= 0) {
+    ss = hak_super_lookup(ptr);  // Still need SS for metadata
+    goto found;
+}
+#endif
+
+// Tier 3: Registry fallback
+ss = hak_super_lookup(ptr);
+if (ss && ss->magic == SUPERSLAB_MAGIC) {
+    class_idx = slab_index_for(ss, ptr);
+    goto found;
+}
+
+// External pointer
+hak_external_guard_free(ptr);
+return;
+
+found:
+    tiny_free_to_class(class_idx, ptr);
+```
+
+### 11.2 Adaptive Cache Sizing
+
+Current design uses fixed `TLS_SS_HINT_SLOTS = 4`. Future optimization could make this adaptive:
+
+- **Workload detection**: Track hit rate over time windows
+- **Dynamic sizing**: Increase slots (4 → 8) if hit rate < 80%
+- **Memory pressure**: Decrease slots (8 → 2) if memory constrained
+
+**Implementation sketch**:
+```c
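+#define TLS_SS_HINT_SLOTS_MAX 8
+
+typedef struct {
+    uint32_t current_slots;  // Dynamic (2, 4, 8)
+    uint64_t hits_window;
+    uint64_t misses_window;
+} TlsSsHintAdaptive;
+
+// Hypothetical per-thread tuning state, kept separate from g_tls_ss_hint
+static __thread TlsSsHintAdaptive g_tls_ss_hint_adaptive = { .current_slots = 4 };
+
+void tls_ss_hint_tune(void) {
+    TlsSsHintAdaptive* a = &g_tls_ss_hint_adaptive;
+    uint64_t total = a->hits_window + a->misses_window;
+    if (total == 0) return;  // Empty window: nothing to tune
+
+    double hit_rate = (double)a->hits_window / (double)total;
+
+    if (hit_rate < 0.80 && a->current_slots < TLS_SS_HINT_SLOTS_MAX) {
+        a->current_slots *= 2;  // Grow cache
+    } else if (hit_rate > 0.95 && a->current_slots > 2) {
+        a->current_slots /= 2;  // Shrink cache
+    }
+
+    // Start a new measurement window
+    a->hits_window = 0;
+    a->misses_window = 0;
+}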
+```
+
+### 11.3 LRU vs FIFO Eviction Policy
+
+Current design uses FIFO (simple, predictable). Alternative: LRU with move-to-front on hit.
+
+**LRU advantages**:
+- Better hit rate for workloads with temporal locality
+- Commonly used SuperSlabs stay cached longer
+
+**LRU disadvantages**:
+- 2-3 extra cycles per hit (move to front)
+- More complex implementation (entries must be reordered on every hit)
+
+**Benchmark before switching**: Profile sh8bench, larson, cfrac with both policies.
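+
+For a cache this small, move-to-front can be done by shifting array entries. The following is a minimal sketch of such an LRU variant for benchmarking against the FIFO design. `LruHintCache`, `lru_hint_lookup()` and `lru_hint_update()` are hypothetical names, the entry layout is copied from Section 3, and duplicate detection is omitted for brevity.
+
+```c
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+struct SuperSlab;
+
+#define TLS_SS_HINT_SLOTS 4
+
+typedef struct {
+    void* base;
+    void* end;
+    struct SuperSlab* ss;
+} TlsSsHintEntry;
+
+/* Hypothetical LRU variant: entries[0] is the most recently used. */
+typedef struct {
+    TlsSsHintEntry entries[TLS_SS_HINT_SLOTS];
+    uint32_t count;
+} LruHintCache;
+
+/* Lookup with move-to-front: on a hit at slot i, entries [0, i) shift
+ * down by one and the hit entry is stored in slot 0. */
+static bool lru_hint_lookup(LruHintCache* c, void* ptr,
+                            struct SuperSlab** out_ss) {
+    uintptr_t p = (uintptr_t)ptr;
+    for (uint32_t i = 0; i < c->count; i++) {
+        TlsSsHintEntry e = c->entries[i];
+        if (p >= (uintptr_t)e.base && p < (uintptr_t)e.end) {
+            for (uint32_t j = i; j > 0; j--) {
+                c->entries[j] = c->entries[j - 1];
+            }
+            c->entries[0] = e;
+            *out_ss = e.ss;
+            return true;
+        }
+    }
+    return false;
+}
+
+/* Insert at the front. When the cache is full, the last slot (the
+ * least recently used entry) is overwritten by the shift. */
+static void lru_hint_update(LruHintCache* c, struct SuperSlab* ss,
+                            void* base, size_t size) {
+    uint32_t n = (c->count < TLS_SS_HINT_SLOTS) ? c->count
+                                                : TLS_SS_HINT_SLOTS - 1;
+    for (uint32_t j = n; j > 0; j--) {
+        c->entries[j] = c->entries[j - 1];
+    }
+    c->entries[0].base = base;
+    c->entries[0].end = (char*)base + size;
+    c->entries[0].ss = ss;
+    if (c->count < TLS_SS_HINT_SLOTS) c->count++;
+}
+```
+
+The shift copies at most three 24-byte entries per hit, so the extra cost depends on how often hits land outside slot 0. That is what the benchmark comparison would need to measure.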
+
+### 11.4 Per-Class Hint Caches
+
+Current design: Single cache for all classes (4 entries, any class).
+Alternative: Per-class caches (1 entry per class, 8 entries total).
+
+**Per-class advantages**:
+- Guaranteed cache slot for each class
+- No inter-class eviction
+
+**Per-class disadvantages**:
+- Wastes space if only 2-3 classes are active
+- More TLS overhead (8 entries vs 4)
+
+**Recommendation**: Defer until benchmarks show inter-class thrashing.
+
+### 11.5 Statistics Export API
+
+For production monitoring, export hit rate via:
+
+```c
+// Global aggregated stats (all threads)
+void hak_tls_hint_global_stats(uint64_t* total_hits, uint64_t* total_misses);
+
+// ENV-based stats dump at exit
+// HAKMEM_TLS_HINT_STATS=1 → dump to stderr at exit
+```
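+
+Because the hit and miss counters are thread-local, a global view needs an aggregation step. One possible approach, sketched here with C11 atomics, is for each thread to add its counters to process-wide totals during teardown. `hak_tls_hint_flush_thread_stats()` is a hypothetical hook that this document does not define. It takes the counters as arguments so that the block compiles without the Box header; in the real code it would read `g_tls_ss_hint.hits` and `g_tls_ss_hint.misses`, which exist only in non-release builds.
+
+```c
+#include <stdatomic.h>
+#include <stdint.h>
+
+/* Process-wide totals, zero-initialized as static storage. */
+static _Atomic uint64_t g_hint_total_hits;
+static _Atomic uint64_t g_hint_total_misses;
+
+/* Hypothetical hook: call once from the thread teardown path with the
+ * exiting thread's TLS counters. Relaxed ordering is sufficient because
+ * the totals are plain statistics and order no other memory accesses. */
+static void hak_tls_hint_flush_thread_stats(uint64_t hits, uint64_t misses) {
+    atomic_fetch_add_explicit(&g_hint_total_hits, hits,
+                              memory_order_relaxed);
+    atomic_fetch_add_explicit(&g_hint_total_misses, misses,
+                              memory_order_relaxed);
+}
+
+/* Global aggregated stats (threads that have flushed so far). */
+void hak_tls_hint_global_stats(uint64_t* total_hits, uint64_t* total_misses) {
+    if (total_hits) {
+        *total_hits = atomic_load_explicit(&g_hint_total_hits,
+                                           memory_order_relaxed);
+    }
+    if (total_misses) {
+        *total_misses = atomic_load_explicit(&g_hint_total_misses,
+                                             memory_order_relaxed);
+    }
+}
+```
+
+Flushing only at thread exit keeps atomics off the free() fast path. The trade-off is that the totals exclude threads that are still running.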
+
+---
+
+## 12. Implementation Checklist
+
+### 12.1 Phase 1a: Core Implementation (Week 1)
+- [ ] Create `core/box/tls_ss_hint_box.h`
+- [ ] Implement `tls_ss_hint_init()`
+- [ ] Implement `tls_ss_hint_update()`
+- [ ] Implement `tls_ss_hint_lookup()`
+- [ ] Implement `tls_ss_hint_clear()`
+- [ ] Add `HAKMEM_TINY_SS_TLS_HINT` flag to `hakmem_build_flags.h`
+- [ ] Add validation check (hint requires headerless mode)
+
+### 12.2 Phase 1b: Integration (Week 2)
+- [ ] Integrate into `hakmem_tiny_free.inc` (lookup path)
+- [ ] Integrate into `hakmem_tiny.c` (update path after alloc)
+- [ ] Integrate into `hakmem_tiny_refill.inc.h` (update path after refill)
+- [ ] Integrate into `core/front/tiny_unified_cache.c` (update path)
+- [ ] Call `tls_ss_hint_init()` in thread-local init
+
+### 12.3 Phase 1c: Testing (Week 2-3)
+- [ ] Write unit tests (`tests/test_tls_ss_hint.c`)
+- [ ] Run unit tests: `make test_tls_ss_hint && ./test_tls_ss_hint`
+- [ ] Build validation (hint disabled, hint enabled, error check)
+- [ ] Benchmark comparison (sh8bench, cfrac, larson)
+- [ ] Hit rate profiling (debug build with stats)
+- [ ] Correctness tests (no crashes, no assertion failures)
+
+### 12.4 Phase 1d: Validation (Week 3)
+- [ ] Benchmark: sh8bench (target: +15-20%)
+- [ ] Benchmark: cfrac (target: +10-15%)
+- [ ] Benchmark: larson 8 threads (target: +15-20%)
+- [ ] Hit rate analysis (target: 85-95%)
+- [ ] Memory overhead check (target: < 150 bytes/thread)
+- [ ] Regression test: Headerless=0 mode still works
+
+### 12.5 Phase 1e: Documentation (Week 3-4)
+- [ ] Update `docs/PHASE2_HEADERLESS_INSTRUCTION.md` with hint Box
+- [ ] Add Box Theory annotation to hakmem Box registry
+- [ ] Write performance analysis report (before/after comparison)
+- [ ] Update build instructions (`make shared EXTRA_CFLAGS=...`)
+
+---
+
+## 13. Rollout Plan
+
+### Stage 1: Internal Testing (Week 1-3)
+- Build with `HAKMEM_TINY_SS_TLS_HINT=1` in dev environment
+- Run full benchmark suite (mimalloc-bench)
+- Profile with perf/cachegrind (verify cycle count reduction)
+- Fix any integration bugs
+
+### Stage 2: Canary Deployment (Week 4)
+- Enable hint Box for 5% of production traffic
+- Monitor: crash rate, performance metrics, hit rate
+- A/B test: Hint ON vs Hint OFF
+
+### Stage 3: Gradual Rollout (Week 5-6)
+- 25% of traffic (if the canary succeeds)
+- 50% of traffic
+- 100% of traffic
+
+### Stage 4: Default Enable (Week 7)
+- Change default: `HAKMEM_TINY_SS_TLS_HINT=1`
+- Update build scripts, CI/CD pipelines
+- Announce in release notes
+
+---
+
+## 14. Success Metrics
+
+| Metric | Baseline | Target | Expected Change / Measured Via |
+|--------|----------|--------|-------------|
+| sh8bench throughput | 54.60 Mops/s | 62.8-65.5 Mops/s | +15-20% |
+| cfrac runtime | 1.25 sec | 1.06-1.13 sec | -10-15% |
+| larson throughput | 6.5M ops/s | 7.5-7.8M ops/s | +15-20% |
+| TLS hint hit rate | N/A | 85-95% | Stats API |
+| free() cycle count | 15-60 cycles | 7-15 cycles (hit) | perf/cachegrind |
+| Memory overhead | 0 | < 150 bytes/thread | sizeof(TlsSsHintCache) |
+| Crash rate | 0.001% | 0.001% (no regression) | Production monitoring |
+
+---
+
+## 15. Open Questions
+
+1. **Q**: Should we implement per-class hint caches instead of unified cache?
+   **A**: Defer until benchmarks show inter-class thrashing. Current unified design is simpler and sufficient for most workloads.
+
+2. **Q**: Should we use LRU instead of FIFO eviction?
+   **A**: Defer until benchmarks show FIFO hit rate < 80%. FIFO is simpler and avoids move-to-front cost on hits.
+
+3. **Q**: Should we make TLS_SS_HINT_SLOTS runtime-configurable?
+   **A**: No. A compile-time constant allows better optimization (loop unrolling, register allocation). Consider adaptive sizing in Phase 2 if needed.
+
+4. **Q**: Should we validate SUPERSLAB_MAGIC in tls_ss_hint_lookup()?
+   **A**: No, keep lookup minimal (2-5 cycles). Caller (free() path) must validate magic. This matches existing design where hak_super_lookup() also requires caller validation.
+
+5. **Q**: Should we export hit rate stats in production builds?
+   **A**: Phase 1: No (save 16 bytes/thread). Phase 2: Add global aggregated stats API for monitoring if needed.
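+
+The division of labour in question 4 (minimal lookup, validation in the caller, fallback on a miss) can be sketched as follows. Everything here is a stand-in: `demo_hint_lookup()` plays the role of `tls_ss_hint_lookup()`, `demo_registry_lookup()` the role of `hak_super_lookup()`, and `DEMO_SUPERSLAB_MAGIC` is a placeholder value rather than the real `SUPERSLAB_MAGIC`. The real signatures and data structures may differ.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+#define DEMO_SUPERSLAB_MAGIC 0x48414B53u  /* placeholder, not the real magic */
+
+typedef struct {
+    uint32_t  magic;
+    uintptr_t base;
+    size_t    size;
+} DemoSuperSlab;
+
+/* Single-entry stand-in for the TLS hint cache. */
+typedef struct {
+    uintptr_t      base, end;  /* cached range, checked without touching ss */
+    DemoSuperSlab* ss;
+} DemoHintSlot;
+
+static _Thread_local DemoHintSlot g_demo_hint;
+static DemoSuperSlab* g_demo_registry[4];  /* stand-in for the global registry */
+static unsigned g_demo_slow_lookups;       /* counts slow-path calls */
+
+/* Stand-in for tls_ss_hint_lookup(): range compare only, no magic check. */
+DemoSuperSlab* demo_hint_lookup(uintptr_t p) {
+    if (p >= g_demo_hint.base && p < g_demo_hint.end) return g_demo_hint.ss;
+    return NULL;
+}
+
+/* Stand-in for hak_super_lookup(): also leaves validation to the caller. */
+DemoSuperSlab* demo_registry_lookup(uintptr_t p) {
+    g_demo_slow_lookups++;
+    for (size_t i = 0; i < 4; i++) {
+        DemoSuperSlab* ss = g_demo_registry[i];
+        if (ss && p >= ss->base && p < ss->base + ss->size) return ss;
+    }
+    return NULL;
+}
+
+/* Caller side (free path): validate the magic after either lookup and
+ * refresh the hint when the slow path succeeds. */
+DemoSuperSlab* demo_resolve(uintptr_t p) {
+    DemoSuperSlab* ss = demo_hint_lookup(p);
+    if (ss && ss->magic == DEMO_SUPERSLAB_MAGIC) return ss;  /* hint hit */
+    ss = demo_registry_lookup(p);                            /* miss: fall back */
+    if (!ss || ss->magic != DEMO_SUPERSLAB_MAGIC) return NULL;
+    g_demo_hint.base = ss->base;
+    g_demo_hint.end  = ss->base + ss->size;
+    g_demo_hint.ss   = ss;
+    return ss;
+}
+```
+
+The first resolve of a pointer takes the slow path once and primes the hint; later resolves in the same range skip it. If the magic no longer matches, the caller rejects the result from both paths, so the lookup itself never needs the check.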
+
+---
+
+## 16. Conclusion
+
+The TLS Superslab Hint Box is a low-risk optimization that is expected to narrow the performance gap between Headerless mode and Header mode from 30% to ~15%. The design is self-contained, testable, and follows hakmem's Box Theory architecture. Expected implementation time: 3-4 weeks (including testing and validation).
+
+**Key Strengths**:
+- Minimal integration surface (5 call sites)
+- Self-contained Box (no dependencies)
+- Fail-safe fallback (miss → hak_super_lookup)
+- Low memory overhead (112 bytes/thread)
+- Proven pattern (TLS caching used in jemalloc, tcmalloc)
+
+**Next Steps**:
+1. Review this design document
+2. Approve Phase 1a implementation (core Box)
+3. Begin implementation with unit tests
+4. Benchmark and validate in dev environment
+5. Plan Phase 2 integration (Global Class Map)
+
+---
+
+**End of Design Document**