diff --git a/docs/CHATGPT_CONTEXT_SUMMARY.md b/docs/CHATGPT_CONTEXT_SUMMARY.md new file mode 100644 index 00000000..6bf42e75 --- /dev/null +++ b/docs/CHATGPT_CONTEXT_SUMMARY.md @@ -0,0 +1,295 @@ +# Context Summary for ChatGPT - TLS SLL Header Corruption Fix + +**Date**: 2025-12-03 +**Project**: hakmem - Custom Memory Allocator +**Handoff From**: Gemini + Task agent (previous phase) +**Current Task**: Diagnose and fix TLS SLL header corruption +**Status**: CRITICAL BLOCKER - Investigation Required + +--- + +## Quick Facts + +| Item | Value | +|------|-------| +| **Problem** | Header corruption in TLS SLL during baseline testing | +| **Error Message** | `[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1` | +| **Error Location** | `core/box/tls_sll_box.h:282-303` | +| **Affected Configurations** | ALL (shared code path issue) | +| **Root Cause** | Unknown (6 patterns documented) | +| **Fix Type** | Surgical (1-5 lines expected) | +| **Build Status** | ✅ Succeeds | +| **Baseline Test Status** | ❌ Crashes (SIGSEGV at ~22 seconds) | + +--- + +## What is 0x31 vs 0xa1? + +``` +Expected (header magic): 0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx=1) +Got (corruption): 0x31 = ASCII character '1' or some user data + +This means: User data exists where header should be. +``` + +--- + +## Project Architecture (Box Theory) + +The hakmem allocator uses a **Box Theory** architecture where: + +- Each component (memory layout, pointer conversion, TLS state) is a separate "box" +- Each box has a single responsibility and clear API boundaries +- Examples: + - `tiny_layout_box.h` - Class sizes and header offsets (single source of truth) + - `ptr_conversion_box.h` - Pointer type safety (base vs user pointers) + - `tls_sll_box.h` - Thread-local single-linked list management + - `tls_ss_hint_box.h` - SuperSlab hint cache (Phase 1 optimization) + +--- + +## Recent Changes (Last 5 Commits) + +1. 
**f3f75ba3d** - "Fix Magazine Spill RAW pointer type conversion" + - Added HAK_BASE_FROM_RAW() wrapper in hakmem_tiny_refill.inc.h:228 + - Status: ✅ Fixed + +2. **2dc9d5d59** - "Fix include order in hakmem.c" + - Moved #include "box/hak_kpi_util.inc.h" before hak_core_init.inc.h + - Status: ✅ Fixed + +3. **94f9ea51** - "Implement TLS SuperSlab Hint Box (Phase 1)" + - New header-only cache for recently-used SuperSlabs + - Status: ✅ Implemented, but only 2.3% performance improvement (target was 15-20%) + +4. Earlier: Box theory framework, phantom types, etc. + +--- + +## The Remaining Issue: TLS SLL Header Corruption + +### Symptom + +```bash +# Build succeeds +$ make clean && make shared -j8 +Building libhakmem.so... OK (547KB) + +# But baseline test crashes +$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0 +Segmentation fault (core dumped) +``` + +### Timeline + +- **When Discovered**: During Phase 1 benchmarking (2025-12-03) +- **Frequency**: 100% reproducible with sh8bench +- **Scope**: Affects baseline (Headerless OFF), so affects all configurations + +### Error Location + +**File**: `core/box/tls_sll_box.h` (lines 282-303) +**Function**: `tls_sll_pop_impl()` +**Operation**: Reading header validation + +```c +// Simplified logic (actual code has more details) +if (tiny_class_preserves_header(class_idx)) { + uint8_t* b = (uint8_t*)raw_base; + uint8_t got = *b; // Read byte at offset 0 + uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK)); + + if (got != expected) { + fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x\n", + class_idx, raw_base, got, expected); + // Reset TLS SLL for this class + } +} +``` + +### Root Cause - Six Documented Patterns + +The diagnostic document identifies six possible patterns: + +1. **RAW Pointer vs BASE Pointer** - Wrong pointer type passed to tls_sll_push() +2. 
**Header Offset Mismatch** - Writing at one offset, reading at another +3. **Atomic Fence Missing** - Compiler/CPU reordering of write + push +4. **Adjacent Block Overflow** - User data from previous block overwrites header +5. **Class Index Mismatch** - Push with one class_idx, pop as different class_idx +6. **Headerless Mode Interference** - Mixed header/headerless logic despite OFF flag + +--- + +## Your Task + +**You have two comprehensive documents**: + +1. **`docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** (THIS FILE'S COMPANION) + - Step-by-step task breakdown + - 7-step investigation and fix process + - Expected validation criteria + +2. **`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (MAIN REFERENCE - 1,150+ LINES) + - Deep dive into all 6 root cause patterns + - Code examples for each pattern + - Minimal test case template + - Diagnostic logging instrumentation + - Fix code templates + - 7-step validation procedure + +**Follow the handoff document's steps 1-7 to diagnose and fix this issue.** + +--- + +## Build & Test Commands + +### Quick Build + +```bash +cd /mnt/workdisk/public_share/hakmem +make clean +make shared -j8 +``` + +### Baseline Test (Should Currently Crash) + +```bash +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \ + grep -E "TLS_SLL_HDR_RESET|Total|Segmentation" +``` + +### Minimal Test Case (After Creation) + +```bash +./tests/test_tls_sll_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|PASS|FAIL" +``` + +--- + +## Important File Locations + +| Path | Purpose | +|------|---------| +| `core/box/tls_sll_box.h` | TLS SLL implementation (error source) | +| `core/hakmem_tiny_free.inc` | Free path - where headers are written | +| `core/hakmem_tiny_refill.inc.h` | Magazine spill - recent fix location | +| `core/box/ptr_conversion_box.h` | Pointer type conversion | +| `core/box/tiny_layout_box.h` | Class layout definitions | +| `core/box/tls_ss_hint_box.h` | Phase 1 optimization (new) | +| 
`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` | YOUR MAIN REFERENCE | + +--- + +## Key Data Structures + +### TLS SLL Header Structure + +```c +typedef struct { + uint8_t hdr; // Header: 0xa0 | class_idx + uint8_t pad; // Padding/metadata + uint16_t _unused; // Alignment + SuperSlab* next; // Pointer to next SuperSlab +} TlsSllEntry; +``` + +### Header Validation + +```c +// Expected value for class 1: +expected = 0xa0 | 1 = 0xa1 + +// What we're seeing: +got = 0x31 = some user data + +// This means the header was never written OR was overwritten +``` + +--- + +## Pointer Types in hakmem + +The codebase distinguishes between: + +```c +hak_base_ptr_t - "Base pointer" pointing to start of allocation (includes header) +hak_user_ptr_t - "User pointer" pointing to user data (after offset adjustment) + +Conversion: +user = base + tiny_user_offset(class_idx) // Typically base + 1 +base = user - tiny_user_offset(class_idx) // Typically user - 1 +``` + +**Critical**: In Headerless mode, the offset is 0, so base == user. 
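The sketch below illustrates the base/user split described above. It makes several assumptions. The phantom types are modeled as thin single-member structs, although hakmem's real `hak_base_ptr_t`/`hak_user_ptr_t` may differ (for example, plain typedefs in release builds). `HEADER_MAGIC` is taken to be `0xa0` and `HEADER_CLASS_MASK` a 4-bit `0x0f`. The offset is keyed on a Headerless flag rather than on `tiny_user_offset(class_idx)`. All `sketch_*` names are hypothetical. The point is that distinct wrapper types turn "user pointer passed where a base pointer is expected" (Pattern 1) into a compile error, instead of a header read at the wrong byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: magic in the high nibble, class index in the low nibble. */
#define SKETCH_HEADER_MAGIC      0xa0u
#define SKETCH_HEADER_CLASS_MASK 0x0fu

/* Hypothetical phantom types: identical layout, distinct C types, so handing a
 * user pointer to a function that wants a base pointer does not compile. */
typedef struct { uint8_t* p; } sketch_base_ptr_t;
typedef struct { uint8_t* p; } sketch_user_ptr_t;

/* Assumed: 1 header byte when headers are on, 0 in Headerless mode. */
static size_t sketch_user_offset(int headerless) {
    return headerless ? 0u : 1u;
}

static sketch_user_ptr_t sketch_base_to_user(sketch_base_ptr_t b, int headerless) {
    sketch_user_ptr_t u = { b.p + sketch_user_offset(headerless) };
    return u;
}

static sketch_base_ptr_t sketch_user_to_base(sketch_user_ptr_t u, int headerless) {
    sketch_base_ptr_t b = { u.p - sketch_user_offset(headerless) };
    return b;
}

/* Expected header byte: 0xa0 | 1 == 0xa1 for class 1.
 * Fails fast if the class index does not fit in the mask. */
static uint8_t sketch_header_byte(int class_idx) {
    assert(class_idx >= 0 && ((unsigned)class_idx & ~SKETCH_HEADER_CLASS_MASK) == 0);
    return (uint8_t)(SKETCH_HEADER_MAGIC | ((unsigned)class_idx & SKETCH_HEADER_CLASS_MASK));
}

/* The header lives at offset 0 of the BASE pointer. */
static void sketch_write_header(sketch_base_ptr_t b, int class_idx) {
    b.p[0] = sketch_header_byte(class_idx);
}

static int sketch_header_ok(sketch_base_ptr_t b, int class_idx) {
    return b.p[0] == sketch_header_byte(class_idx);
}
```

With headers on, `sketch_base_to_user` applied to a 16-byte buffer `block` yields `block + 1`, and the class-1 header byte is `0xa1`. With Headerless on, it yields `block` itself. A Headerless pointer that reaches a header-checking path therefore reads user data instead of `0xa1`.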
+
+---
+
+## Known Good Patterns (For Reference)
+
+From previous fixes:
+
+```c
+// Pattern: Wrapping RAW pointer before TLS SLL push (ALREADY FIXED)
+void* p = mag->items[--mag->top].ptr;           // RAW pointer (user offset)
+hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);   // Wrap to base pointer
+if (!tls_sll_push(class_idx, base_p, cap)) {    // Push base pointer
+    /* push failed: handling elided */
+}
+
+// Pattern: Consistent include order (ALREADY FIXED)
+#include "box/hak_kpi_util.inc.h"   // Must come first
+#include "hak_core_init.inc.h"      // Must come after
+```
+
+---
+
+## Success Criteria
+
+| Criterion | Status |
+|-----------|--------|
+| TLS SLL Header Corruption diagnosed | ❌ In progress |
+| Root cause pattern identified | ❌ In progress |
+| Minimal reproducer created | ❌ In progress |
+| Fix implemented | ❌ In progress |
+| sh8bench runs without errors | ❌ GOAL |
+| cfrac runs without errors | ❌ GOAL |
+| No performance regression | ❌ GOAL |
+
+---
+
+## Previous Phase Context
+
+This project has gone through several phases:
+
+- **Phase 0**: Initial implementation (completed)
+- **Phase 1**: TLS SuperSlab Hint Box optimization (implemented, needs validation)
+- **Phase 2**: Headerless mode (designed, blocked by current issue)
+- **Phase 102**: MemApi bridge (future)
+
+The current issue blocks validation of Phase 1 and progression to Phase 2.
+
+---
+
+## Timeline Estimate
+
+- **Step 1 (Read guide)**: 15-30 min
+- **Steps 2-3 (Setup + logging)**: 1-2 hours
+- **Step 4 (Diagnostic run)**: 30 min
+- **Step 5 (Pattern matching)**: 1 hour
+- **Step 6 (Fix implementation)**: 30 min - 1 hour
+- **Step 7 (Validation)**: 1-2 hours
+
+**Total**: 4-8 hours expected
+
+---
+
+## Next: Start Investigation
+
+👉 **Next Action**: Read `docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` and follow steps 1-7.
+
+The comprehensive diagnostic guide (`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`) contains all the details you need for each pattern and debugging technique.
+ +**Questions or blockers?** The diagnostic guide has extensive explanations for each pattern. + +--- + +**You're now ready to begin the investigation. Good luck! 🚀** diff --git a/docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md b/docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md new file mode 100644 index 00000000..629089c1 --- /dev/null +++ b/docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md @@ -0,0 +1,301 @@ +# ChatGPT Task: TLS SLL Header Corruption Diagnosis & Fix + +**Status**: BLOCKING - System instability detected in baseline configuration +**Priority**: CRITICAL +**Assigned to**: Claude (ChatGPT model) +**Expected Duration**: 4-8 hours + +--- + +## Executive Summary + +The hakmem memory allocator baseline configuration crashes with a critical header corruption error: + +``` +[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1 +``` + +This occurs in **shared code paths** (not Phase 1 specific), blocking all further development and validation. + +**Your Task**: Diagnose and fix this issue using the comprehensive diagnostic guide. + +--- + +## What You Need to Know + +### Context + +- **Project**: hakmem - custom memory allocator with "Box Theory" architecture +- **Language**: C +- **Current Phase**: Phase 1 implementation + Phase 2 (Headerless) planning +- **Problem**: Baseline test crashes before completing benchmarks +- **Error Location**: `core/box/tls_sll_box.h` - header validation during TLS SLL pop + +### The Error + +When a block is popped from the TLS SLL (Thread-Local Single-Linked List), the header validation checks: + +```c +uint8_t got = *b; // Read byte at offset 0 of base pointer +uint8_t expected = 0xa0 | class_idx; // For class 1: 0xa1 + +if (got != expected) { + // ERROR DETECTED - got 0x31 instead of 0xa1 +} +``` + +The header byte contains user data (0x31 = '1' character) instead of the expected magic value (0xa1). + +**This means**: Either: +1. Wrong pointer was stored in TLS SLL +2. Header was not written before pushing to TLS SLL +3. 
Header was overwritten after pushing
+4. Offset calculation is wrong
+
+---
+
+## Your Step-by-Step Task
+
+### Step 1: Read the Comprehensive Diagnostic Document
+
+**File**: `/mnt/workdisk/public_share/hakmem/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`
+
+This 1,150+ line document contains:
+- 6 detailed root cause patterns with code examples
+- Minimal test case template (test_tls_sll_minimal.c)
+- Diagnostic logging instrumentation points
+- Fix patterns with code snippets
+- 7-step validation procedure
+
+**Action**: Read the entire document and understand the investigation methodology.
+
+---
+
+### Step 2: Reproduce the Error with Minimal Test Case
+
+Create `/mnt/workdisk/public_share/hakmem/tests/test_tls_sll_minimal.c` based on the template in the diagnostic document.
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Build minimal test; the rpath lets ./test_minimal find libhakmem.so at run time
+gcc -g -O1 -I./core -I./core/box \
+    tests/test_tls_sll_minimal.c \
+    -L. -lhakmem -lpthread -Wl,-rpath,"$PWD" -o test_minimal
+
+# Run (should crash with TLS_SLL_HDR_RESET error)
+./test_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET"
+# The shell's "Segmentation fault" message never enters the pipe, so check the status
+echo "test_minimal exit status: ${PIPESTATUS[0]}"   # 139 = 128 + SIGSEGV
+```
+
+**Expected Output**: The test should reproduce the header corruption within the first 100-1,000 allocations.
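If the template is not at hand, the workload below is one possible starting point for `test_tls_sll_minimal.c`. It is not the template from the diagnostic document, and `churn_and_verify` is a hypothetical name. It assumes 16-byte requests map to tiny class 1 (verify this against `tiny_layout_box.h`). It fills every user byte with ASCII `'1'` (0x31), so a header overwritten by user data would surface as `got=0x31`. It also checks each block's fill before freeing it. When the binary is linked against `libhakmem.so` (or run under `LD_PRELOAD`), `malloc`/`free` go through hakmem. With glibc's allocator, it should report zero disturbed blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHURN_WINDOW 64 /* live blocks held at once (arbitrary choice) */

/* Returns 1 if every byte of p[0..n) equals fill, else 0. */
static int block_intact(const unsigned char* p, size_t n, unsigned char fill) {
    for (size_t i = 0; i < n; i++)
        if (p[i] != fill) return 0;
    return 1;
}

/* Allocates/frees `iterations` blocks of `size` bytes through a sliding window,
 * freeing in a scrambled order so LIFO reuse interleaves with fresh blocks.
 * Returns how many live blocks had their '1' fill disturbed (e.g. the allocator
 * wrote list metadata into memory it had also handed out), or -1 on OOM. */
static long churn_and_verify(size_t iterations, size_t size) {
    unsigned char* live[CHURN_WINDOW] = {0};
    long disturbed = 0;
    uint32_t rng = 12345u;                        /* fixed seed: deterministic order */

    assert(size > 0);
    for (size_t i = 0; i < iterations; i++) {
        rng = rng * 1103515245u + 12345u;         /* simple LCG */
        size_t slot = (rng >> 16) % CHURN_WINDOW;

        if (live[slot]) {                          /* free the previous occupant */
            if (!block_intact(live[slot], size, '1')) disturbed++;
            free(live[slot]);
        }
        live[slot] = malloc(size);
        if (!live[slot]) return -1;
        memset(live[slot], '1', size);             /* 0x31 in every user byte */
    }
    for (size_t s = 0; s < CHURN_WINDOW; s++) {    /* drain the window */
        if (live[s]) {
            if (!block_intact(live[s], size, '1')) disturbed++;
            free(live[s]);
        }
    }
    return disturbed;
}
```

A `main` that calls `churn_and_verify(100000, 16)` and returns non-zero on a non-zero result is enough for the build command above. Under hakmem, the main signal is the `[TLS_SLL_HDR_RESET]` line or the exit status. The return value additionally catches allocator metadata written into a block that was still live.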
+
+---
+
+### Step 3: Add Diagnostic Logging
+
+Instrument the following locations to capture when header corruption occurs:
+
+**Location A**: `core/hakmem_tiny_free.inc` - Header write before TLS SLL push
+```c
+// Around line 550: Before tls_sll_push()
+// ADD LOGGING:
+fprintf(stderr, "[HEADER_WRITE] base=%p, offset=%zu, writing 0x%02x\n",
+        base, offset, (HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK)));
+```
+
+**Location B**: `core/box/tls_sll_box.h` - Header read during pop
+```c
+// Around lines 282-303: In tls_sll_pop_impl()
+// ADD LOGGING:
+fprintf(stderr, "[HEADER_READ] base=%p, got=0x%02x, expected=0x%02x\n",
+        raw_base, got, expected);
+```
+
+**Location C**: `core/hakmem_tiny_refill.inc.h` - Magazine spill
+```c
+// Around line 228: Before/after tls_sll_push()
+// ADD LOGGING:
+fprintf(stderr, "[SPILL] class=%d, ptr=%p (wrapping to base)\n", class_idx, p);
+```
+
+**Action**: Add detailed logging to identify which allocation/free cycle causes corruption. Wrap these calls in `#if HAKMEM_TINY_DEBUG_LOGGING` so that the flag passed in Step 4 actually controls them.
+
+---
+
+### Step 4: Run Diagnostic Test with Logging
+
+```bash
+# Rebuild with logging enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_DEBUG_LOGGING=1"
+
+# Run minimal test and capture log
+./test_minimal 2>&1 | tee diagnostic_output.txt
+
+# Last successful header writes
+grep HEADER_WRITE diagnostic_output.txt | tail -10
+
+# First corrupted read plus the 20 log lines before it; unlike grepping HEADER_READ
+# lines first, this keeps the preceding HEADER_WRITE/SPILL events in view
+grep -n -m1 -B20 "got=0x31" diagnostic_output.txt
+```
+
+**Expected Result**: The log should show the exact allocation/free sequence leading to the corruption.
+
+---
+
+### Step 5: Identify Root Cause (One of Six Patterns)
+
+Based on the diagnostic logs, match against these patterns from the diagnostic document:
+
+1. **RAW Pointer vs BASE Pointer**: Wrong pointer type passed to tls_sll_push()
+2. **Header Offset Mismatch**: Writing at offset 1, reading at offset 0
+3. **Atomic Fence Missing**: Compiler reordering causing write-after-push
+4. **Adjacent Block Overflow**: User data from preceding block overwrites header
+5. **Class Index Mismatch**: Push with class_idx A, pop as class_idx B
+6. **Headerless Mode Interference**: Mixed header/headerless logic
+
+**Action**: Determine which pattern applies to your findings.
+
+---
+
+### Step 6: Implement Surgical Fix
+
+Once the root cause is identified, apply a minimal fix (typically 1-5 lines):
+
+**Example fixes** (from diagnostic document):
+
+```c
+#include <stdatomic.h>   // atomic_thread_fence (Pattern 3)
+
+// Pattern 1 - RAW vs BASE pointer:
+// WRONG:
+tls_sll_push(class_idx, p, cap);      // p is RAW pointer
+// FIXED:
+hak_base_ptr_t base = HAK_BASE_FROM_RAW(p);
+tls_sll_push(class_idx, base, cap);
+
+// Pattern 2 - Offset mismatch:
+// WRONG:
+*(uint8_t*)((char*)base + 1) = header;  // Writing at offset 1
+// In pop: uint8_t h = *((uint8_t*)base); // Reading at offset 0
+// FIXED:
+*(uint8_t*)base = header;  // Consistent offset
+
+// Pattern 3 - Atomic fence missing:
+// Only plausible if the block reaches ANOTHER thread (e.g. via remote free)
+// while still linked. Within one thread, program order already guarantees that
+// a later pop sees the earlier header store, so a purely thread-local
+// push/pop never needs a fence.
+// WRONG:
+*hdr = magic;
+tls_sll_push(...);
+// FIXED:
+*hdr = magic;
+atomic_thread_fence(memory_order_release);  // Pairs with an acquire on the consuming thread
+tls_sll_push(...);
+```
+
+**Action**: Apply the fix to the source code and rebuild.
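Before picking a fix, it helps to check which patterns can produce the exact byte in the log at all. The following is a standalone simulation, not hakmem code. It assumes two adjacent class-1 blocks with a 16-byte stride, a 1-byte `0xa1` header at offset 0 of each base, and user data filled with ASCII `'1'`. Under those assumptions, two different patterns produce the same failure. In Pattern 1, the user pointer is treated as the base. In Pattern 4, a 16-byte write lands in the preceding block's 15-byte user area. Both make the pop-side check read `0x31` where it expects `0xa1`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_STRIDE 16u    /* assumed class-1 block size including header */
#define SIM_HDR    0xa1u  /* 0xa0 magic | class 1 */

/* Two adjacent blocks carved from one simulated slab. */
typedef struct { uint8_t bytes[2 * SIM_STRIDE]; } SimSlab;

/* Header at offset 0 of each block, user area filled with '1' (0x31). */
static void sim_init(SimSlab* s) {
    for (unsigned blk = 0; blk < 2; blk++) {
        uint8_t* base = s->bytes + blk * SIM_STRIDE;
        base[0] = SIM_HDR;
        memset(base + 1, '1', SIM_STRIDE - 1);
    }
}

/* What pop-side validation reads: the byte at the pointer it believes is the base. */
static uint8_t sim_pop_reads(const uint8_t* assumed_base) {
    return assumed_base[0];
}

/* Pattern 1: the USER pointer (base + 1) is pushed as if it were the base. */
static uint8_t sim_pattern1_raw_as_base(SimSlab* s) {
    sim_init(s);
    const uint8_t* user = s->bytes + 1;
    return sim_pop_reads(user);
}

/* Pattern 4: block 0's owner writes one byte past its 15-byte user area,
 * clobbering block 1's header; block 1 is then validated at the right address. */
static uint8_t sim_pattern4_overflow(SimSlab* s) {
    sim_init(s);
    assert(s->bytes[SIM_STRIDE] == SIM_HDR);    /* header intact before the overflow */
    uint8_t* user0 = s->bytes + 1;
    memset(user0, '1', SIM_STRIDE);             /* 16 bytes into a 15-byte area */
    return sim_pop_reads(s->bytes + SIM_STRIDE);
}
```

Because both patterns print the same `got=0x31 expect=0xa1`, the log line alone cannot tell them apart. The Step 4 trace of earlier writes to the same address is what separates them.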
+ +--- + +### Step 7: Validate Fix + +```bash +# Step 7a: Run minimal test +./test_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|passed|failed" + +# Step 7b: Run baseline benchmark +make clean +make shared -j8 +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \ + grep -E "TLS_SLL_HDR_RESET|Total|PASSED|FAILED" + +# Step 7c: Run cfrac (memory intensive) +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 2>&1 | \ + grep -E "error|TLS_SLL_HDR_RESET|Total" + +# Step 7d: Check for regressions +make test -j8 FILTER="tls_sll" +``` + +**Success Criteria**: +- ✅ Minimal test completes without TLS_SLL_HDR_RESET +- ✅ sh8bench runs to completion (several minutes) +- ✅ cfrac completes without errors +- ✅ All unit tests pass +- ✅ No performance regression (< 5%) + +--- + +## Commit & Documentation + +Once validated, commit with detailed message: + +```bash +git add -A +git commit -m "Fix TLS SLL header corruption in [Component] + +Root Cause: +[Brief 1-2 sentence explanation of what was wrong] + +Pattern Affected: +[Which of the 6 patterns this was] + +Fix Applied: +[Minimal description of the fix] + +Validation: +- [Test case] passed +- [Benchmark] completed without TLS_SLL_HDR_RESET +- No performance regression + +Related Issues: +- TLS SLL baseline instability +- Required for Phase 1/2 validation" +``` + +--- + +## Reference Files + +| File | Purpose | +|------|---------| +| `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` | **Complete diagnostic guide** - READ FIRST | +| `core/box/tls_sll_box.h` | TLS SLL implementation (header validation at lines 282-303) | +| `core/hakmem_tiny_free.inc` | Free path (header write before push, lines ~550) | +| `core/hakmem_tiny_refill.inc.h` | Magazine spill (lines ~228) | +| `docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md` | Test environment setup | +| `debug_artifacts/headerless/` | Benchmark results showing error | + +--- + +## Communication Plan + +**Status Updates**: After each step, provide brief status: +- Step 
2: "Reproducer created - X allocations before crash" +- Step 3: "Logging added to [X locations]" +- Step 4: "Log analysis complete - [pattern identified]" +- Step 5: "Root cause identified: Pattern #[N]" +- Step 6: "Fix applied - [brief description]" +- Step 7: "Validation complete - [test results]" + +--- + +## Post-Fix: Unblocking Next Phases + +Once this issue is fixed, the following can proceed: + +1. **Phase 1 Completion**: TLS Hint Box performance optimization (currently showing 2.3% improvement vs target 15-20%) +2. **Phase 2 Validation**: Test Headerless mode (ON/OFF configurations) +3. **Performance Benchmarking**: Full multi-test suite (TC1, TC2, TC3) +4. **Future Phases**: Phase 102 (MemApi bridge), production optimization + +--- + +## Success Metric + +**GOAL**: TC1 baseline test completes successfully with zero TLS_SLL_HDR_RESET errors. + +Current Status: ❌ FAILING (crashes at ~22 seconds) +Target Status: ✅ PASSING (completion in 4-6 minutes) + +--- + +**Questions?** Refer to the diagnostic document for detailed explanations of each pattern and debugging technique. + +**Ready to start?** Begin with Step 1: Read the full diagnostic guide. + +🚀 Your investigation begins now! diff --git a/docs/GEMINI_HANDOFF_SUMMARY.md b/docs/GEMINI_HANDOFF_SUMMARY.md new file mode 100644 index 00000000..dbfd3241 --- /dev/null +++ b/docs/GEMINI_HANDOFF_SUMMARY.md @@ -0,0 +1,296 @@ +# 📋 Handoff Summary for User & ChatGPT + +**Date**: 2025-12-03 +**From**: Claude Code (Haiku) + Task Agent (previous phases) +**To**: User (decision maker) & ChatGPT (executor) +**Status**: 🟢 All Handoff Documents Prepared - Ready for ChatGPT Execution + +--- + +## What Has Been Completed + +### Documents Created Today (5 Files, 38 KB total) + +1. ✅ **`CHATGPT_CONTEXT_SUMMARY.md`** (8.5 KB) + - Quick reference: facts, architecture, commands + - Read time: 2-3 minutes + - First document to read + +2. 
✅ **`CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** (8.6 KB) + - 7-step diagnostic procedure + - Follow time: 4-8 hours + - Main task document for ChatGPT + +3. ✅ **`README_HANDOFF_CHATGPT.md`** (12 KB) + - Master guide explaining all three documents + - How to use them together + - Expected timeline and checkpoints + +4. ✅ **`STATUS_2025_12_03_CURRENT.md`** (9.1 KB) + - Current project status + - Completed phases and pending tasks + - Metrics and history + +5. ✅ **`TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (existing, 1,150+ lines) + - Deep reference document + - 6 root cause patterns with code examples + - Diagnostic logging instrumentation points + - Fix templates and validation procedures + +**Total Documentation**: 38 KB of new handoff materials + 1,150+ lines of diagnostic reference + +--- + +## The Problem (Recap) + +hakmem baseline crashes with TLS SLL header corruption: + +``` +[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1 +SIGSEGV (exit code 139) +``` + +**Status**: 🔴 CRITICAL BLOCKER +**Scope**: Affects ALL configurations (shared code path) +**Impact**: Cannot validate Phase 1 or proceed to Phase 2 + +--- + +## The Solution (Documented) + +Three comprehensive documents guide ChatGPT through a 7-step diagnostic and fix process: + +1. **Read context** (summary document) +2. **Create minimal reproducer** (test case) +3. **Add diagnostic logging** (instrumentation) +4. **Run diagnostic test** (capture behavior) +5. **Identify root cause** (match to one of 6 patterns) +6. **Implement fix** (1-5 line code change) +7. **Validate fix** (run benchmarks) + +**Expected Outcome**: TC1 baseline completes without crashes +**Expected Duration**: 4-8 hours + +--- + +## Handoff Contents + +### For ChatGPT + +The main handoff is structured as: + +``` +1. README_HANDOFF_CHATGPT.md + ↓ (start here - understand the 3-document system) + +2. CHATGPT_CONTEXT_SUMMARY.md + ↓ (read for quick facts & architecture) + +3. 
CHATGPT_HANDOFF_TLS_DIAGNOSIS.md + ↓ (follow the 7 steps) + +4. TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md + ↓ (reference for deep details during diagnosis) +``` + +### Files & Commands + +**All necessary information is in the documents:** +- Build commands +- Test commands +- File locations +- Code examples +- Validation procedures +- Commit templates + +**ChatGPT needs no external research** - all answers are in the documents. + +--- + +## Key Metrics + +| Item | Value | +|------|-------| +| **Documents Created** | 5 files | +| **Total Documentation** | 38 KB new + 1,150 lines reference | +| **Diagnostic Steps** | 7 (clearly defined) | +| **Root Cause Patterns** | 6 (documented with code examples) | +| **Expected Fix Size** | 1-5 lines of code | +| **Timeline Estimate** | 4-8 hours | + +--- + +## Success Looks Like + +**BEFORE FIX**: +```bash +$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1 +Segmentation fault +``` + +**AFTER FIX**: +```bash +$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +Total: 54.5 Mops/s [no errors] +✓ Completed successfully +``` + +--- + +## Next Steps + +### For User + +**Option 1: Pass documents to ChatGPT immediately** +- All documents ready in `/mnt/workdisk/public_share/hakmem/docs/` +- ChatGPT can start diagnostics right away +- Expected completion: 4-8 hours + +**Option 2: Review documents first** +- Read `STATUS_2025_12_03_CURRENT.md` for overview +- Read `README_HANDOFF_CHATGPT.md` to understand handoff structure +- Then pass to ChatGPT when ready + +### For ChatGPT (When Handed Off) + +1. Read `README_HANDOFF_CHATGPT.md` (5 min) +2. Read `CHATGPT_CONTEXT_SUMMARY.md` (2-3 min) +3. Follow `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` steps 1-7 (4-8 hours) +4. 
Consult `TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` as reference during steps 3-7 + +--- + +## Project Context (For Reference) + +### Recent Work + +- ✅ **Phase 0**: Type safety framework (Phantom Types, Box theory) +- ✅ **Phase 1**: TLS SuperSlab Hint Box implementation (6 unit tests passing) +- ✅ **Phase 1 Optimization**: Only 2.3% improvement (target 15-20%) +- ❌ **Stability Issue**: TLS SLL header corruption blocking all validation +- ⏳ **Phase 2**: Headerless mode design complete, awaiting baseline stability + +### Critical Path to Unblock Phases + +``` +Fix TLS SLL header corruption (4-8 hours) + ↓ +Validate Phase 1 performance (1-2 hours) + ↓ +Proceed to Phase 2 Headerless testing (2-3 days) + ↓ +Complete Phase 102 planning (1 week) +``` + +--- + +## Files Involved + +**Documentation**: `/mnt/workdisk/public_share/hakmem/docs/` +``` +README_HANDOFF_CHATGPT.md ← Master guide +CHATGPT_CONTEXT_SUMMARY.md ← Quick reference +CHATGPT_HANDOFF_TLS_DIAGNOSIS.md ← Step-by-step task +TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md ← Deep reference +STATUS_2025_12_03_CURRENT.md ← Project status +``` + +**Source Code**: `/mnt/workdisk/public_share/hakmem/` +``` +core/box/tls_sll_box.h ← Error source +core/hakmem_tiny_free.inc ← Header write location +core/hakmem_tiny_refill.inc.h ← Magazine spill +(and many others - detailed in context summary) +``` + +--- + +## Communication Checkpoints + +**After ChatGPT Step 2**: "Reproducer created - X allocations before crash" +**After ChatGPT Step 4**: "Diagnostic logs show [pattern type]" +**After ChatGPT Step 5**: "Root cause: Pattern #[N]" +**After ChatGPT Step 6**: "Fix applied - [description]" +**After ChatGPT Step 7**: "Validation complete - all tests pass" + +--- + +## Risk Assessment + +| Risk | Mitigation | +|------|-----------| +| Fix too invasive | Only 1-5 lines expected, surgical approach | +| Fix breaks other code | 6 validation tests in Step 7 | +| Performance regression | < 5% threshold, < 1% expected | +| Diagnosis takes too 
long | Step-by-step procedure keeps focus | + +**Overall Risk**: LOW (well-documented, clear success criteria) + +--- + +## Summary for User + +### What's Ready + +✅ All diagnostic documentation complete +✅ 7-step procedure clearly defined +✅ 6 root cause patterns documented with code examples +✅ Minimal test case template provided +✅ Validation procedures detailed +✅ Project context available + +### What's Needed from ChatGPT + +🎯 Execute the 7-step diagnostic procedure +🎯 Identify which pattern caused the issue +🎯 Implement surgical fix (1-5 lines) +🎯 Validate with benchmarks +🎯 Commit with detailed message + +### Timeline + +**Documentation**: ✅ Complete (0 hours) +**ChatGPT Execution**: ⏳ 4-8 hours estimated +**Project Unblock**: 🎯 Within 8 hours total + +--- + +## Decision Point + +**Should ChatGPT proceed with diagnosis?** + +- **YES**: Pass the 5 documents to ChatGPT immediately + - Start: `README_HANDOFF_CHATGPT.md` + - Follow: `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` + - Reference: The other documents + +- **NO**: Review project first + - Read: `STATUS_2025_12_03_CURRENT.md` + - Then decide to handoff + +--- + +## Success Metric (Clear & Measurable) + +✅ **SUCCESS** = TC1 baseline test completes without TLS_SLL_HDR_RESET errors + +--- + +## Final Note + +This handoff is **complete and comprehensive**. Every piece of information ChatGPT needs is in the five documents. No external research required. The diagnostic methodology is sound. The fix is likely to be simple once identified. + +**Ready to hand off to ChatGPT.** 🚀 + +--- + +**Questions for ChatGPT before starting?** → They're answered in the documents. 
+ +**Ready to proceed?** → Start with `README_HANDOFF_CHATGPT.md` + +--- + +*Prepared by: Claude Code (Haiku) on 2025-12-03* +*For: User + ChatGPT* +*Status: ✅ Ready for handoff* diff --git a/docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md b/docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md new file mode 100644 index 00000000..568baa6e --- /dev/null +++ b/docs/HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md @@ -0,0 +1,228 @@ +# Headerless Stability Debug Instructions (Root-Cause / Fail-Fast) + +Quality bar for this playbook: + +| Metric | Score | Notes | +| --- | --- | --- | +| Coverage | 9/10 | Seven root-cause candidates + multiple probes | +| Actionability | 9/10 | Copy/pasteable bash + gdb/asan commands | +| Time budget | 10-22h | Phased so we can stop after each milestone | +| Expected success | 85-90% | Parallel probes + bisect safety net | + +Goal (Definition of Done) +- Reproduce, isolate, and permanently fix the headerless instability with a verified regression test. +- Fix must be A/B switchable and observable (Box Theory: isolate boxes, single boundary, backout flag). + +Scope and signals +- Both Headerless OFF and Headerless ON crash: suggests shared path, not just hint box. +- Observed symptoms: TLS_SLL integrity failures, invalid free() pointers, hangs in sh8bench/cfrac. + +Box Theory anchors (work inside clear boxes, fail-fast, reversible) +- Box 2: Remote queue push/drain (no owner/publish side effects). +- Box 3: Ownership CAS (only at bind boundary). +- Box 4: Publish/Adopt boundary (single drain->bind->owner acquire point). +- Hint box: tls_ss_hint cache (guarded by `HAKMEM_TINY_SS_TLS_HINT`). +- Backouts: `HAKMEM_TINY_HEADERLESS`, `HAKMEM_TINY_SS_TLS_HINT`, `HAKMEM_TINY_SS_ADOPT`, `HAKMEM_TINY_RF_FORCE_NOTIFY`. + +--- + +## Step-by-Step Flow + +### 0) Pre-flight (15 min) +- `ulimit -c unlimited`; ensure `git status -sb` clean enough to bisect. +- Use single-thread first: `export HAKMEM_TINY_THREADS=1`. 
+- Disable learn/ACE noise: `export HAKMEM_ACE_ENABLED=0 HAKMEM_LEARN=0`.
+- Keep artifacts: `mkdir -p debug_artifacts/headerless`.
+
+### 1) Test Case 1: Headerless OFF (control)
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean && make shared -j8
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/tc1_off.log | tail -40
+echo "exit: ${PIPESTATUS[0]}"   # 0 = pass, 124 = killed by timeout, 139 = SIGSEGV
+```
+Expected: completes with "Total elapsed time".
+If it crashes: the base path (non-headerless) is already broken -> focus on shared free/registry first.
+An exit status of 124 means `timeout` killed the run: record it as a hang (or raise the limit if a full run legitimately takes longer), never as a pass.
+
+### 2) Test Case 2: Headerless ON, hint OFF
+```bash
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/tc2_hdrless_nohint.log | tail -40
+echo "exit: ${PIPESTATUS[0]}"
+```
+The outcome shows whether the headerless core path (without the hint) is already unstable.
+
+### 3) Test Case 3: Headerless ON, hint ON (Phase 1 path)
+```bash
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/tc3_hdrless_hint.log | tail -40
+echo "exit: ${PIPESTATUS[0]}"
+```
+If TC2 passes and TC3 fails, suspect the hint cache / adopt boundary; otherwise suspect a shared box.
+
+### 4) ASan pass (pinpoint corruption early)
+```bash
+make clean && make asan-shared-alloc -j8 \
+  EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+LD_PRELOAD=./libhakmem_asan.so timeout 20 ./mimalloc-bench/out/bench/sh8bench \
+  2>&1 | tee debug_artifacts/headerless/asan_hdrless.log | head -200
+```
+If ASan aborts at startup with "ASan runtime does not come first in initial library list", the uninstrumented benchmark binary needs the runtime preloaded first: `LD_PRELOAD="$(gcc -print-file-name=libasan.so) ./libhakmem_asan.so"`.
+If ASan is noisy, rebuild with `-DHAKMEM_TINY_SS_TLS_HINT=0` to see whether the corruption follows the hint box.
+
+### 5) GDB capture (first crash)
+```bash
+make clean && make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+gdb --args ./mimalloc-bench/out/bench/sh8bench
+(gdb) set exec-wrapper env 'LD_PRELOAD=./libhakmem.so'
+(gdb) run
+(gdb) bt
+(gdb) frame 0
+(gdb) info locals
+(gdb) x/4gx ptr    # replace ptr with the crashing pointer
+```
+`set exec-wrapper env 'LD_PRELOAD=...'` applies the preload to the benchmark only. `set environment LD_PRELOAD` would also preload it into the shell that GDB uses to start the program.
+Save to `debug_artifacts/headerless/gdb_bt.txt`. A non-interactive capture is `gdb -batch -ex "set exec-wrapper env 'LD_PRELOAD=./libhakmem.so'" -ex run -ex bt -ex "info locals" --args ./mimalloc-bench/out/bench/sh8bench > debug_artifacts/headerless/gdb_bt.txt 2>&1`.
+
+### 6) Git bisect (only after TC1 result is known)
+```bash
+git bisect start
+git bisect bad HEAD
+git bisect good <known-good-commit>   # e.g. f3f75ba3d^ (the commit before f3f75ba3d), if that was stable
+cat > /tmp/hakmem_bisect.sh <<'EOF'
+make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1" || exit 125
+# Map any failure (including 139 = SIGSEGV) to 1: git bisect run aborts on codes above 127
+LD_PRELOAD=./libhakmem.so timeout 15 ./mimalloc-bench/out/bench/sh8bench && exit 0 || exit 1
+EOF
+git bisect run sh /tmp/hakmem_bisect.sh
+git bisect log > debug_artifacts/headerless/bisect_log.txt
+git bisect reset
+```
+Run the per-step check through `git bisect run`. Typing these `exit` commands at an interactive prompt would close the shell instead of reporting a verdict. The script lives in `/tmp` so that checkouts cannot delete it. `git bisect log` saves every verdict to `debug_artifacts/headerless/bisect_log.txt` before the reset.
+
+---
+
+## Root-Cause Candidates (7) and Probes
+
+1) TLS hint cache stale/dangling (Box: hint)
+- Symptom: free() uses cached ss that was recycled; remote-dangling or wrong class.
+- Probe: log generation vs pointer range.
+```c
+fprintf(stderr, "[HINT_LOOKUP] ptr=%p ss=%p gen=%llu magic=%llx\n",
+        ptr, ss, ss ? (unsigned long long)ss->generation : 0,
+        ss ? (unsigned long long)ss->magic : 0);
+```
+- A/B: `HAKMEM_TINY_SS_TLS_HINT=0` should fully remove this path.
+
+2) TLS SLL normalize mismatch (Box: TLS SLL)
+- Symptom: headerless ptr hits queue expecting header offset.
+- Probe: in `core/box/tls_sll_box.h` around normalize/mismatch detection, log once:
+```c
+fprintf(stderr, "[TLS_SLL_MISMATCH] ptr=%p has_hdr=%d expect_hdr=%d q=%s\n",
+        ptr, actual_has_header, expected_has_header, queue_name);
+```
+- Check that `TLS_SLL_NORMALIZE_USERPTR/RAWPTR` is invoked at every push/pop boundary.
+
+3) SuperSlab registry stale or race (Box: registry boundary)
+- Symptom: registry returns freed slab; hint and registry disagree.
+- Probe: add generation/epoch in TinySuperSlab and compare on lookup; assert `SUPERSLAB_MAGIC`. +- A/B: force registry path only by turning hint off; compare crash locus. + +4) Class index drift (Box: metadata) +- Symptom: slab->class_idx corrupt -> wrong free list math. +- Probe: after `slab_index_for()`, assert `class_idx < TINY_NUM_CLASSES`; log slab_idx/class_idx. +- A/B: run small vs 1024-byte classes; see if only one class fails. + +5) Magazine wrap/unwrap slip (Box: refill/magazine) +- Symptom: pointer stored raw, read as user (or vice versa) in refill spill. +- Probe: instrument `core/hakmem_tiny_refill.inc` around magazine push/pop; dump raw/user pointer deltas. +- A/B: force refill slow path only: `export HAKMEM_TINY_MUST_ADOPT=1`. + +6) Remote queue drain boundary breach (Box 2->4 boundary) +- Symptom: remote drain merges freelist twice or skips owner check. +- Probe: ring events or one-shot logs at `ss_remote_drain_to_freelist()` and adopt boundary: +```c +fprintf(stderr, "[REMOTE_DRAIN] ss=%p slab=%d count_before=%u\n", ss, slab_idx, remote_counts[slab_idx]); +``` +- A/B: `HAKMEM_TINY_SS_ADOPT=0` to see if crash is tied to adopt boundary logic. + +7) Pointer wrap/unwrap toggle confusion (Box: pointer bridge) +- Symptom: header offset applied twice or skipped. +- Probe: assert alignment and expected delta at every `user_to_raw/raw_to_user` site in free path. +- A/B: run with `HAKMEM_TINY_HEADERLESS=0` vs `1` with same workload; see if delta shows only in headerless. + +--- + +## Data to Capture (single-pass, no log spam) +- Logs: last 400 lines from each TC run; grep for `[TLS_SLL]`, `[HINT]`, `[REMOTE]`. +- GDB: full `bt`, `frame 0`, `info locals`, and pointer dump. +- ASan: first 150 lines including shadow/poison info. +- Minimal repro: smallest C snippet or shell script that crashes within 30s. +- Env stamp: `uname -a`, `lscpu | head -20`, `git rev-parse HEAD`. 
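The "single-pass, no log spam" rule can be enforced with a print-once-then-count helper. Below is a minimal sketch, assuming C11 `<stdatomic.h>` is available. `ONE_SHOT_LOG`, the counter names, `note_hdr_reset`, and `dump_probe_counters` are hypothetical and not existing hakmem symbols.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Print only the first occurrence of an event; later hits just bump the counter.
 * `counter` must be an _Atomic unsigned long. Relaxed ordering is enough
 * because the value is only used for counting. */
#define ONE_SHOT_LOG(counter, ...)                                           \
    do {                                                                     \
        if (atomic_fetch_add_explicit(&(counter), 1UL, memory_order_relaxed) \
            == 0UL)                                                          \
            fprintf(stderr, __VA_ARGS__);                                    \
    } while (0)

/* Hypothetical per-probe counters (zero-initialized as statics). */
static _Atomic unsigned long g_tls_sll_hdr_reset;
static _Atomic unsigned long g_hint_lookup_miss;

static void note_hdr_reset(int cls, void *base, unsigned got, unsigned expect) {
    assert(cls >= 0);  /* fail fast on a nonsensical class index */
    ONE_SHOT_LOG(g_tls_sll_hdr_reset,
                 "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x (first hit)\n",
                 cls, base, got, expect);
}

/* Emit one totals line instead of per-event output. */
static void dump_probe_counters(void) {
    fprintf(stderr, "[PROBE_TOTALS] hdr_reset=%lu hint_miss=%lu\n",
            atomic_load(&g_tls_sll_hdr_reset), atomic_load(&g_hint_lookup_miss));
}
```

Registering `dump_probe_counters` with `atexit()` yields exactly one totals line per run. The first-hit line plus the totals fit comfortably in the "last 400 lines" capture.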
+ +Format when reporting: +``` +=== TC1 (Headerless OFF) === +Result: crash / hang / pass +Last log lines: ... + +=== TC2 (Headerless ON, hint OFF) === +Result: ... + +=== TC3 (Headerless ON, hint ON) === +Result: ... + +=== ASan === + + +=== GDB (first crash) === + +``` + +--- + +## Observability and Guardrails (Box Theory) +- One-shot logs only; no continuous debug spam. Use counters where possible. +- Keep boundary single: drain->bind->owner_acquire only inside refill/adopt; do not add side effects in remote push/publish. +- Toggleable fixes: wrap new checks with `#if defined(DEBUG_HDRLESS)` or env flags so we can A/B quickly. +- Fail-fast: `assert`/`abort` on invalid class_idx, magic, or out-of-range pointers instead of silently recovering. + +--- + +## Decision Tree +- TC1 fails -> shared free/registry bug; ignore hint; inspect pointer normalize + registry first. +- TC1 passes, TC2 fails -> headerless core path bug; focus on pointer normalize and class_idx drift. +- TC2 passes, TC3 fails -> hint cache or adopt boundary; focus on stale hint + generation checks. +- ASan shows UAF/double-free -> instrument free path and magazine spill; gate hint off to see if corruption follows. +- Bisect isolates commit -> fix there, keep A/B flag, add regression test. + +--- + +## Timeline (target 10-22h) +- 2-4h: run TC1-3, capture GDB/ASan, decide branch of decision tree. +- 4-8h: instrument relevant box (from candidates), build A/B toggles, derive minimal repro. +- 2-6h: root-cause confirmation with repro + ASan clean pass. +- 2-4h: implement fix, add regression test, verify all three test cases + baseline perf smoke. 
+ +--- + +## Quick Command Reference +```bash +# Clean builds +make clean && make shared -j8 +make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0" +make clean && make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1" +make clean && make asan-shared-alloc -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1" + +# Runs +LD_PRELOAD=./libhakmem.so timeout 30 ./mimalloc-bench/out/bench/sh8bench +LD_PRELOAD=./libhakmem_asan.so timeout 20 ./mimalloc-bench/out/bench/sh8bench + +# GDB essentials +gdb --args ./mimalloc-bench/out/bench/sh8bench +(gdb) set environment LD_PRELOAD ./libhakmem.so +(gdb) run +(gdb) bt +(gdb) frame 0 +(gdb) info locals + +# Bisect skeleton +git bisect start +git bisect bad HEAD +git bisect good +# build/test, mark good|bad|skip +git bisect reset +``` diff --git a/docs/README_HANDOFF_CHATGPT.md b/docs/README_HANDOFF_CHATGPT.md new file mode 100644 index 00000000..9f5844a7 --- /dev/null +++ b/docs/README_HANDOFF_CHATGPT.md @@ -0,0 +1,378 @@ +# 🚀 ChatGPT Task Handoff - TLS SLL Header Corruption Fix + +**Target**: Claude (ChatGPT model) +**Task**: Diagnose and fix critical TLS SLL header corruption +**Status**: Ready for immediate handoff +**Date**: 2025-12-03 + +--- + +## Quick Start (TL;DR) + +**The Problem**: hakmem baseline crashes with header corruption +``` +[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1 +``` + +**Your Task**: Fix it using 7 documented steps + +**Documents You Need** (in order): +1. 📖 **READ FIRST**: `CHATGPT_CONTEXT_SUMMARY.md` (2-3 min read) +2. 📋 **FOLLOW**: `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` (7 detailed steps) +3. 🔍 **REFERENCE**: `TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` (1,150 lines of deep reference) + +**Success**: TC1 baseline test completes without crashes + +**Timeline**: 4-8 hours expected + +--- + +## The Three Documents Explained + +### 1. 
CHATGPT_CONTEXT_SUMMARY.md + +**Purpose**: Quick reference and architecture overview +**Read Time**: 2-3 minutes +**Contains**: +- What 0x31 means vs 0xa1 +- Project architecture (Box Theory) +- Recent changes (5 commits) +- The remaining issue explained simply +- File locations and data structures +- Build & test commands +- Success criteria + +**When to Use**: +- First thing to read +- Reference when you need quick facts +- Before diving into detailed diagnosis + +--- + +### 2. CHATGPT_HANDOFF_TLS_DIAGNOSIS.md + +**Purpose**: Step-by-step task breakdown for fixing the issue +**Follow Time**: 4-8 hours +**Contains**: +- Executive summary +- 7 specific steps to diagnose and fix: + - Step 1: Read the diagnostic guide + - Step 2: Reproduce with minimal test + - Step 3: Add diagnostic logging + - Step 4: Run diagnostic test + - Step 5: Identify root cause pattern + - Step 6: Implement fix + - Step 7: Validate fix +- Expected output for each step +- How to identify which of 6 patterns caused the issue +- Example fix code for each pattern +- Validation criteria +- Commit message template + +**When to Use**: +- This is your TASK DOCUMENT +- Follow the 7 steps in order +- After each step, update status + +--- + +### 3. 
TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md + +**Purpose**: Deep reference for detailed understanding +**Reference Time**: As needed during diagnosis +**Contains**: +- 6 root cause patterns with full code examples +- Minimal test case template +- Detailed diagnostic logging instrumentation +- Pattern-specific fix templates +- 7-step validation procedure +- Debugging techniques and tools + +**When to Use**: +- During Step 3 (diagnostic logging) +- During Step 5 (pattern matching) +- During Step 6 (implementing fix) +- As reference for understanding each pattern + +--- + +## Document Relationships + +``` +┌─────────────────────────────────────────┐ +│ CHATGPT_CONTEXT_SUMMARY.md │ +│ (Start here - 2-3 min) │ +│ ↓ │ +│ Quick facts + architecture overview │ +└──────────────┬──────────────────────────┘ + │ + ↓ +┌──────────────────────────────────────────┐ +│ CHATGPT_HANDOFF_TLS_DIAGNOSIS.md │ +│ (Follow these 7 steps - 4-8 hours) │ +│ ↓ │ +│ Step 1: Read diagnostic guide │ +│ Step 2: Create minimal reproducer │ +│ Step 3: Add logging [→ consult ref #3] │ +│ Step 4: Run diagnostic test │ +│ Step 5: Match pattern [→ consult ref #3]│ +│ Step 6: Implement fix [→ consult ref #3]│ +│ Step 7: Validate │ +└──────────────┬───────────────────────────┘ + │ + ↓ +┌──────────────────────────────────────────┐ +│ TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md │ +│ (Deep reference - consult as needed) │ +│ │ +│ 6 Root Cause Patterns: │ +│ 1. RAW vs BASE pointer │ +│ 2. Header offset mismatch │ +│ 3. Atomic fence missing │ +│ 4. Adjacent block overflow │ +│ 5. Class index mismatch │ +│ 6. Headerless mode interference │ +│ │ +│ For each pattern: code examples + fixes │ +└──────────────────────────────────────────┘ +``` + +--- + +## How to Use These Documents + +### Before Starting + +1. **Read Summary** (2-3 min) + - Understand what the problem is + - Learn about the project architecture + - Know what tools you'll use + +2. 
**Skim Handoff** (5 min) + - Understand the 7-step process + - Know what's expected at each step + - Identify reference points + +### During Work + +3. **Follow Handoff Step-by-Step** (4-8 hours) + - Step 1: Read the diagnostic guide thoroughly + - Step 2: Create minimal reproducer + - Step 3: Add logging (reference diagnostic guide) + - Step 4: Run and capture output + - Step 5: Match observed behavior to patterns (reference diagnostic guide) + - Step 6: Implement fix (reference diagnostic guide for fix templates) + - Step 7: Validate success + +4. **Consult Diagnostic Guide as Needed** + - When you need pattern details (Step 5) + - When you need fix code templates (Step 6) + - When you need validation procedures (Step 7) + +### After Completion + +5. **Report Status** + - Which root cause pattern was identified + - What fix was applied + - Validation results + - Commit message + +--- + +## Key Information to Know + +### The Error Explained + +``` +Error Message: [TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1 + +Interpretation: +- Location: Reading header byte from allocated block during free +- Expected: 0xa1 (0xa0 MAGIC | class_idx=1) +- Got: 0x31 (user data or corruption) +- Meaning: Header was never written OR was overwritten + +Root Cause: One of 6 documented patterns +``` + +### Success Looks Like + +```bash +# Before fix: +$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +[TLS_SLL_HDR_RESET] cls=1 base=0x... 
got=0x31 expect=0xa1 +Segmentation fault (code 139) +Execution time: ~22 seconds before crash + +# After fix: +$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +Total: 54.5 Mops/s [no TLS_SLL_HDR_RESET errors] +Execution time: 4-6 minutes [completes successfully] +``` + +--- + +## File Locations You'll Need + +| File | Purpose | Action | +|------|---------|--------| +| `core/box/tls_sll_box.h` | Error source | Read/understand | +| `core/hakmem_tiny_free.inc` | Header write | Add logging | +| `core/hakmem_tiny_refill.inc.h` | Magazine spill | Check for issues | +| `core/box/ptr_conversion_box.h` | Pointer conversion | Understand logic | +| `core/box/tiny_layout_box.h` | Class layout | Understand definitions | +| `tests/test_tls_sll_minimal.c` | Your test | Create this | +| `debug_artifacts/headerless/` | Benchmark logs | Reference existing | + +--- + +## Commands You'll Use + +### Build & Test + +```bash +# Clean build +cd /mnt/workdisk/public_share/hakmem +make clean +make shared -j8 + +# Run baseline (will currently crash) +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench + +# Run minimal test (after creating it) +./tests/test_tls_sll_minimal +``` + +### With Logging + +```bash +# Build with debug logging +make clean +make shared -j8 EXTRA_CFLAGS="-g -O1 -DHAKMEM_TINY_DEBUG_LOGGING=1" + +# Capture diagnostic output +./test_tls_sll_minimal 2>&1 | tee diagnostic_output.txt + +# Analyze logs +grep HEADER_WRITE diagnostic_output.txt | tail -10 +grep -B5 "got=0x31" diagnostic_output.txt +``` + +--- + +## What to Expect + +### Per-Step Timeline + +- **Step 1** (Read diagnostic guide): 30-45 min +- **Step 2** (Create reproducer): 30-60 min +- **Step 3** (Add logging): 1-2 hours +- **Step 4** (Run test): 30 min +- **Step 5** (Pattern matching): 1 hour +- **Step 6** (Implement fix): 30 min - 1 hour +- **Step 7** (Validate): 1-2 hours + +**Total**: 4-8 hours + +### What You'll Discover + +By the end of the process, you will have: +- ✅ Identified 
which of 6 patterns caused the issue +- ✅ Created a minimal reproducer +- ✅ Added diagnostic logging to find corruption +- ✅ Traced the exact allocation/free sequence causing the problem +- ✅ Implemented a 1-5 line fix +- ✅ Validated the fix works with multiple benchmarks +- ✅ Understood the root cause completely + +--- + +## Communication Checkpoints + +After completing each step, provide brief status: + +**Step 2**: "Reproducer created - crashes after X allocations" +**Step 4**: "Diagnostic logs show pattern [A/B/C/etc]" +**Step 5**: "Root cause identified as Pattern #[N]" +**Step 6**: "Fix applied - [1-2 line description]" +**Step 7**: "Validation: sh8bench passed, cfrac passed, no regressions" + +--- + +## Success Criteria (Clear & Measurable) + +| Criterion | Status | +|-----------|--------| +| Minimal reproducer created | ✅ Expected | +| Root cause identified (one of 6 patterns) | ✅ Expected | +| Diagnostic logging captured | ✅ Expected | +| Fix implemented (1-5 lines) | ✅ Expected | +| sh8bench completes without crashes | ✅ TARGET | +| cfrac completes without crashes | ✅ TARGET | +| Unit tests pass | ✅ TARGET | +| < 5% performance regression | ✅ TARGET | + +--- + +## If You Get Stuck + +**Problem**: Can't reproduce the error +- **Solution**: Check if build includes logging headers. Verify LD_PRELOAD path is correct. + +**Problem**: Logs don't show expected pattern +- **Solution**: Check if you're logging at the right locations. Reference diagnostic guide for exact instrumentation points. + +**Problem**: Multiple patterns seem possible +- **Solution**: Add more detailed logging to narrow down. Reference diagnostic guide's pattern-specific logging recommendations. + +**Problem**: Fix doesn't resolve the issue +- **Solution**: Validate that logging shows the assumed pattern. May need to test a different pattern. Try pattern #2, #3, etc. in order. + +--- + +## Next Steps After Completion + +Once TLS SLL header corruption is fixed: + +1. 
**Validate Phase 1 Performance** (Currently 2.3%, target 15-20%) + - Profile with perf/cachegrind + - Identify secondary bottlenecks + - Consider cache size optimization + +2. **Proceed to Phase 2** (Headerless mode) + - Implement HAKMEM_TINY_HEADERLESS toggle + - Test alignment guarantees + - Benchmark performance trade-offs + +3. **Plan Phase 102** (MemApi bridge) + - Connect hakmem to nyrt Ring0 runtime + - Design integration points + +--- + +## Questions Before Starting? + +- ❓ What is Box Theory? → Read the Context Summary +- ❓ What are Phantom Types? → Read the Context Summary +- ❓ What are the 6 root cause patterns? → They're in the Diagnostic Guide +- ❓ How do I add logging? → Step 3 of Handoff document + Diagnostic Guide + +**All answers are in the three documents. No need for external research.** + +--- + +## You're Now Ready! 🚀 + +1. **Read** `CHATGPT_CONTEXT_SUMMARY.md` (2-3 min) +2. **Follow** `CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` (7 steps, 4-8 hours) +3. **Reference** `TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` (as needed) + +**Start with Step 1 of the Handoff document.** + +**Expected outcome**: TLS SLL header corruption diagnosed and fixed. ✅ + +**Next review**: After fix is validated and committed. + +--- + +**Good luck! The investigation methodology is solid, the documentation is comprehensive, and the fix is likely to be simple once identified. 💪** diff --git a/docs/SEGFAULT_INVESTIGATION_FOR_GEMINI.md b/docs/SEGFAULT_INVESTIGATION_FOR_GEMINI.md new file mode 100644 index 00000000..19741926 --- /dev/null +++ b/docs/SEGFAULT_INVESTIGATION_FOR_GEMINI.md @@ -0,0 +1,272 @@ +# Segmentation Fault Investigation Instructions for Gemini + +Version: 1.0 (2025-12-03) +Status: Segfault occurred during the Phase 2 Headerless implementation + +--- + +## 🔍 Current State + +### Build Status + +- ✅ **Build succeeds**: `libhakmem.so` is generated correctly +- ✅ Include-order error resolved +- ⚠️ **Runtime error**: Segmentation fault occurs + +### Segfault Information + +**Reported so far**: +- Segfault occurred during the Phase 2 Headerless implementation +- The build passes, but the program crashes at runtime +- No detailed error message has been reported yet + +--- + +## 🎯 Investigation Goals + +1.
**Identify the exact conditions under which the segfault occurs** +2. **Determine which component is responsible** +3. **Propose a fix patch** + +--- + +## 📋 Investigation Procedure + +### Step 1: Debug Build & Run Under GDB + +```bash +cd /mnt/workdisk/public_share/hakmem + +# Clean build (with debug symbols) +find . -name "*.o" -delete +make clean +make shared -j8 EXTRA_CFLAGS="-g -O1" + +# Run under GDB (LD_PRELOAD must be set inside GDB, otherwise the benchmark runs on glibc malloc) +gdb -ex 'set environment LD_PRELOAD ./libhakmem.so' --args ./mimalloc-bench/out/bench/sh8bench + +# Inside GDB: +(gdb) run +# → When the segfault occurs: +(gdb) backtrace +(gdb) frame 0 +(gdb) info locals +(gdb) disassemble +``` + +### Step 2: Verify with ASan (AddressSanitizer) + +```bash +# ASan build +make clean +make asan-shared-alloc -j8 + +# ASan run (prints detailed error information) +LD_PRELOAD=./libhakmem_asan.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | head -100 +``` + +**What ASan reports**: +- The address where the crash occurred +- The function in which it occurred +- Details of the memory corruption + +### Step 3: Write a Minimal Test Program + +If the segfault occurs frequently, write a minimal test program to narrow it down: + +```c +// tests/test_segfault_minimal.c + +#include <stdio.h> +#include <stdlib.h> +#include "../core/hakmem.h" + +int main() { + printf("Test 1: Simple malloc\n"); + void* ptr1 = malloc(15); + printf(" malloc(15) = %p\n", ptr1); + + printf("Test 2: Simple free\n"); + free(ptr1); + printf(" free() succeeded\n"); + + printf("Test 3: Multiple allocations\n"); + for (int i = 0; i < 100; i++) { + void* p = malloc(15); + free(p); + } + printf(" 100 alloc/free cycles succeeded\n"); + + printf("Test 4: Concurrent-like pattern\n"); + void* ptrs[10]; + for (int i = 0; i < 10; i++) { + ptrs[i] = malloc(15 + i); + } + for (int i = 0; i < 10; i++) { + free(ptrs[i]); + } + printf(" Concurrent pattern succeeded\n"); + + return 0; +} +``` + +### Step 4: Check the Headerless Flag + +Verify behavior in Headerless mode (Phase 2): + +```bash +# Headerless OFF (Phase 1 compatible) +make clean +make shared -j8 +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep -E "error|Segmentation|Total" + +# Headerless ON (Phase 2) +make clean +make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1" +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep -E "error|Segmentation|Total" +``` + +**Checklist**:
- [ ] Does the segfault disappear with Headerless OFF? +- [ ] Does the segfault appear with Headerless ON? +- [ ] Determine whether the problem belongs to Phase 1 or Phase 2 + +--- + +## 🔧 Common Segfault Causes and How to Check Them + +### Cause 1: Use-After-Free + +**Symptoms**: +- A pointer is accessed after it has been freed +- GDB shows only the faulting access; the ASan report also prints the stack that freed the block ("freed by thread ... here") + +**Check command**: +```bash +# ASan reports a heap-use-after-free error +LD_PRELOAD=./libhakmem_asan.so ./test 2>&1 | grep -i "use.*after.*free" +``` + +### Cause 2: Buffer Overflow + +**Symptoms**: +- Out-of-bounds array access +- Corruption of adjacent memory + +**Check command**: +```bash +# ASan reports a heap-buffer-overflow error +LD_PRELOAD=./libhakmem_asan.so ./test 2>&1 | grep -i "buffer\|overflow" +``` + +### Cause 3: NULL Pointer Dereference + +**Symptoms**: +- `malloc()` returns NULL +- The result is accessed without a NULL check + +**Check command**: +```bash +# In GDB, check whether the instruction at frame 0 dereferences NULL +(gdb) disassemble +# → look for a `mov $0x0` into a register followed by a dereference through it; `info registers` confirms which register is 0 +``` + +### Cause 4: Memory Leak → Heap Exhaustion + +**Symptoms**: +- Memory usage grows over a long run +- Eventually an allocation fails → segfault + +**Check command**: +```bash +# Run while monitoring memory usage (column 6 of `ps aux` is RSS in KiB; stop the monitor with `kill %1` afterwards) +( while true; do ps aux | grep sh8bench | grep -v grep | awk '{print $6}'; sleep 1; done ) & +LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +``` + +--- + +## 📝 Investigation Report Format + +When the segfault investigation is complete, report using the following format: + +```markdown +## Segfault Investigation Results + +### Environment +- Build options: [e.g., "-DHAKMEM_TINY_HEADERLESS=1"] +- Test: [e.g., "sh8bench"] + +### GDB Information +\`\`\` +(gdb) backtrace +#0 0x... in function_name () +#1 0x... in caller_function () +... + +(gdb) frame 0 +#0 address in function_name () +at file.c:123 +\`\`\` + +### ASan Output +[ASan error output if available] + +### Root Cause +[Your analysis of the root cause] + +### Proposed Fix +[Proposed fix] +``` + +--- + +## 🎯 Workflow + +**Recommended order**: + +1. **Run Steps 1-2**: locate the problem with GDB + ASan +2. **Run Step 3**: reproduce with the minimal test program +3. **Run Step 4**: compare Headerless ON vs OFF +4.
**Propose a fix**: present code for a fix based on the identified cause + +--- + +## 📚 References + +### Existing Documents +- `docs/REFACTOR_PLAN_GEMINI_ENHANCED.md` - overall plan +- `docs/PHASE2_HEADERLESS_INSTRUCTION_FOR_GEMINI.md` - Phase 2 implementation instructions +- `docs/tls_sll_hdr_reset_final_report.md` - background for Phase 2 + +### Debugging Tools +- GDB: `gdb --args ./program` +- ASan: `make asan-shared-alloc` +- Valgrind: `valgrind --leak-check=full ./program` + +### Known Issues +- TLS_SLL_HDR_RESET is planned to be resolved in Phase 2 +- Headerless mode is still being implemented and may be unstable + +--- + +## 💡 Hints + +1. **If the segfault occurs frequently**: + - Narrow down the conditions with the minimal test program + - In GDB, run `run` → `backtrace` → `frame 0` in that order + +2. **If ASan reports no error**: + - The corruption may be too subtle for ASan to detect + - Inspect memory manually in GDB + +3. **If Headerless mode is the cause**: + - Review Tasks 2.1-2.7 in the Phase 2 instructions + - Task 2.4 (obtaining class_idx on the free path) is especially suspect + +--- + +We're counting on Gemini's investigative skills! +Please identify the root cause and propose a fix patch. 🚀 diff --git a/docs/STATUS_2025_12_03_CURRENT.md b/docs/STATUS_2025_12_03_CURRENT.md new file mode 100644 index 00000000..0d3615ba --- /dev/null +++ b/docs/STATUS_2025_12_03_CURRENT.md @@ -0,0 +1,296 @@ +# Project Status - 2025-12-03 + +**Last Updated**: 2025-12-03 (Current) +**Status**: 🔴 CRITICAL BLOCKER - TLS SLL Header Corruption Detected +**Overall Phase**: Phase 1 Implementation + Phase 2 Design (Blocked) + +--- + +## Summary + +The hakmem memory allocator project has hit a critical stability issue during Phase 1 performance benchmarking. The baseline configuration crashes with a TLS SLL header corruption error that affects **all configurations**, which points to a shared code path rather than a Phase 1-specific issue.
+ +--- + +## Completed Phases ✅ + +### Phase 0: Type Safety & Box Architecture Framework +- ✅ Phantom Types implementation (`ptr_type_box.h`) +- ✅ Pointer conversion API (`ptr_conversion_box.h`) +- ✅ Root cause analysis verified (Gemini's mathematical proof) +- ✅ Box theory framework established +- ✅ Include order dependencies resolved (commit 2dc9d5d59) +- ✅ Magazine Spill pointer wrapping fixed (commit f3f75ba3d) + +### Phase 1: Logic Centralization & Optimization (TLS Hint Box) +- ✅ Designed TLS SuperSlab Hint Box (`tls_ss_hint_box.h`) +- ✅ Implemented 5-function API (init, lookup, update, clear, stats) +- ✅ Integrated into free path (lines 477-481, 550-555) +- ✅ Integrated into alloc path (lines 115-122, 179-186) +- ✅ Created 6 unit tests - **ALL PASSING** +- ✅ Compiled as header-only (zero overhead when disabled) +- ⚠️ Performance benchmarking: Only 2.3% improvement vs target 15-20% + +### Phase 2: Headerless Mode Design +- ✅ Comprehensive design document (21KB) +- ✅ All 7 task specifications documented +- ✅ A/B toggle flag designed (HAKMEM_TINY_HEADERLESS) +- ✅ SuperSlab Registry integration planned +- ✅ TLS SLL validation skipping documented +- ❌ **BLOCKED**: Cannot proceed - baseline instability + +--- + +## Current Critical Issue 🔴 + +### Symptom + +``` +[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0 +Segmentation fault (core dumped) +``` + +### Location + +- **File**: `core/box/tls_sll_box.h` +- **Lines**: 282-303 +- **Function**: `tls_sll_pop_impl()` +- **Operation**: Header validation during free path + +### Impact + +- ❌ TC1 (Baseline) crashes after ~22 seconds of execution +- ❌ Cannot validate Phase 1 performance improvements +- ❌ Cannot proceed to Phase 2 implementation +- ❌ Cannot benchmark any configuration variant + +### Root Cause + +**Unknown** - One of six documented patterns: + +1. RAW pointer vs BASE pointer type mismatch +2. Header offset mismatch (write vs read location) +3. 
Atomic fence missing (compiler/CPU reordering) +4. Adjacent block overflow corrupting header +5. Class index mismatch during push/pop +6. Headerless mode interference + +--- + +## Documents Created for Diagnosis + +Three comprehensive documents have been created to guide the fix: + +1. **`docs/CHATGPT_CONTEXT_SUMMARY.md`** + - Quick facts about the problem + - Architecture overview + - File locations and data structures + - Timeline estimate: 4-8 hours + +2. **`docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** + - Step-by-step 7-step task breakdown + - Detailed instructions for each phase + - Expected validation criteria + - Success metrics + +3. **`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (Existing, 1,150+ lines) + - Deep dive into all 6 root cause patterns + - Code examples for each pattern + - Minimal test case template + - Diagnostic logging instrumentation + - Fix code templates + - 7-step validation procedure + +--- + +## What Needs to Happen + +### Immediate (Blocking) + +1. **[CHATGPT TASK]** Diagnose TLS SLL header corruption + - Use the three diagnostic documents + - Follow 7-step process + - Expected delivery: 4-8 hours + - Success criterion: TC1 baseline completes without crashes + +### After Diagnosis + +2. **[DEPENDS ON #1]** Validate Phase 1 performance + - Run full benchmarks (TC1, TC2, TC3) + - Confirm TLS Hint Box improves performance + - Identify optimization opportunities + +3. **[DEPENDS ON #1]** Proceed to Phase 2 + - Implement Headerless mode (ON/OFF toggle) + - Validate alignment guarantees + - Benchmark performance trade-offs + +4. 
**[DEPENDS ON #1-3]** Phase 102 Planning + - Design MemApi bridge + - Connect hakmem to nyrt Ring0 runtime + +--- + +## Recent Git History + +``` +ad852e5d5 - Priority-2 ENV Cache: hakmem_batch.c (1変数追加、1箇所置換) +b741d61b4 - Priority-2 ENV Cache: hakmem_debug.c (1変数追加、1箇所置換) +22a67e5ca - Priority-2 ENV Cache: hakmem_smallmid.c (1変数追加、1箇所置換) +f0e77a000 - Priority-2 ENV Cache: hakmem_tiny.c (3箇所置換) +183b10673 - Priority-2 ENV Cache: Shared Pool Release (1箇所置換) + +[Earlier commits in THIS session:] +94f9ea51 - Implement TLS SuperSlab Hint Box (Phase 1) ✅ + - Header-only implementation (256 lines) + - 5 function APIs + - 6 unit tests - ALL PASSING + - Benchmarked at only 2.3% improvement + +f3f75ba3d - Fix Magazine Spill RAW pointer type conversion ✅ + - Added HAK_BASE_FROM_RAW() wrapping + - hakmem_tiny_refill.inc.h:228 + - Verified with cfrac/sh8bench + +2dc9d5d59 - Fix include order in hakmem.c ✅ + - Moved hak_kpi_util.inc.h before hak_core_init.inc.h + - Resolved undefined reference errors + - Clean build verified +``` + +--- + +## File Statistics + +| Category | Count | Status | +|----------|-------|--------| +| **Core Implementation** | 47 files | ✅ Compiles | +| **Box Components** | 15 files | ✅ Box theory applied | +| **Test Suite** | 23 tests | ⚠️ 6 TLS Hint tests PASS, 17 others untested due to crash | +| **Documentation** | 12 documents | ✅ Comprehensive | +| **Build Artifacts** | libhakmem.so | ✅ Generates (547 KB) | + +--- + +## Build Status + +``` +$ make clean && make shared -j8 +✅ Compilation: SUCCESS +✅ Linking: SUCCESS +✅ Output: ./libhakmem.so (547 KB) +✅ Debug symbols: Included (-g flag) + +$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench +❌ Execution: SEGFAULT +Error: [TLS_SLL_HDR_RESET] cls=1 base=0x... 
got=0x31 expect=0xa1 +Exit Code: 139 (SIGSEGV) +Runtime: ~22 seconds before crash +``` + +--- + +## Key Metrics + +| Metric | Value | Status | +|--------|-------|--------| +| **Compilation Time** | 8-12 sec | ✅ Good | +| **Executable Size** | 547 KB | ✅ Reasonable | +| **Baseline Performance** | N/A | ❌ Crashes | +| **Phase 1 Optimization** | 2.3% | ⚠️ Below target (15-20%) | +| **Code Coverage** | Unknown | ⏳ Pending baseline fix | + +--- + +## Next Steps (Clearly Defined) + +### For ChatGPT (Immediate Handoff) + +**Task**: Diagnose and fix TLS SLL header corruption + +**Documents to Use**: +1. `docs/CHATGPT_CONTEXT_SUMMARY.md` - Quick reference +2. `docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` - Step-by-step instructions +3. `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` - Deep reference + +**Steps**: +1. Read diagnostic documents +2. Create minimal reproducer +3. Add diagnostic logging +4. Run diagnostic test +5. Identify root cause pattern +6. Implement surgical fix (1-5 lines) +7. Validate with TC1 baseline test + +**Success Criterion**: +- ✅ sh8bench runs to completion +- ✅ cfrac runs without errors +- ✅ No TLS_SLL_HDR_RESET errors +- ✅ < 5% performance regression + +--- + +## Notes for Future Reference + +### Architecture Decisions Locked In + +1. **Box Theory**: Each component is isolated with clear APIs +2. **Phantom Types**: Type safety in Debug mode, zero-cost in Release +3. **Pointer Conversion**: Centralized in `ptr_conversion_box.h` +4. **Layout Definitions**: Centralized in `tiny_layout_box.h` +5. **TLS SLL**: Thread-local single-linked list with header validation +6. 
**SuperSlab Registry**: Maps free pointers to class information (Phase 2) + +### Known Working Patterns + +- Magazine Spill RAW→BASE wrapping (fixed) +- Include order dependencies (fixed) +- Unit test framework (6 TLS Hint tests passing) +- Box header-only compilation (verified) + +### Known Issues Needing Diagnosis + +- TLS SLL header corruption (PRIMARY BLOCKER) +- Phase 1 performance below target (SECONDARY - optimization opportunity) +- Headerless mode not yet validated (DEPENDS ON PRIMARY FIX) + +--- + +## Handoff Status + +✅ **All diagnostic documents prepared** +✅ **Comprehensive step-by-step instructions created** +✅ **Root cause patterns documented with code examples** +✅ **Minimal test case template provided** +✅ **Validation procedures detailed** + +🎯 **Ready for ChatGPT handoff** + +Next: Pass the three documents to ChatGPT with the directive to follow the 7-step process. + +--- + +## Questions for Next Phase + +After the fix is complete, the following should be investigated: + +1. Why is Phase 1 performance only 2.3% improvement vs expected 15-20%? + - Is 4 slots enough for the cache? + - Are there secondary bottlenecks? + - Does perf/cachegrind show cache misses? + +2. Can Phase 2 Headerless provide better performance than Phase 1? + - What are the trade-offs? + - Is the SuperSlab Registry lookup overhead worth it? + +3. How does hakmem compare to mimalloc and jemalloc across different workloads? + - Are there specific use cases where hakmem excels? + - Where does it fall short? 
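For question 1, the first numbers to collect are the hint hit rate per lookup and how often a cached hint turns out to be stale. The sketch below is a hypothetical 4-slot, round-robin, thread-local range cache whose entries are tagged with a generation counter. `toy_superslab`, `hint_lookup`, `hint_update`, and the counters are invented names and do not mirror the real `tls_ss_hint_box.h` API. The sketch also assumes that a recycled SuperSlab stays mapped, because reading `ss->generation` from unmapped memory would itself fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HINT_SLOTS 4   /* the slot count question 1 asks about */

typedef struct {
    uintptr_t base;        /* first byte covered by this slab */
    size_t size;           /* bytes covered */
    uint64_t generation;   /* assumed to be bumped whenever the slab is recycled */
} toy_superslab;

typedef struct {
    toy_superslab *ss;
    uint64_t gen_at_insert;  /* generation observed when the hint was cached */
} hint_slot;

static _Thread_local hint_slot g_hints[HINT_SLOTS];
static _Thread_local unsigned g_hint_next;  /* round-robin victim index */
static _Thread_local unsigned long g_hint_hits, g_hint_misses, g_hint_stale;

/* Returns the cached slab covering p, or NULL (caller falls back to the registry). */
static toy_superslab *hint_lookup(uintptr_t p) {
    for (unsigned i = 0; i < HINT_SLOTS; i++) {
        toy_superslab *ss = g_hints[i].ss;
        if (ss == NULL || p < ss->base || p - ss->base >= ss->size) continue;
        if (ss->generation != g_hints[i].gen_at_insert) {
            g_hints[i].ss = NULL;  /* slab was recycled: drop the stale hint */
            g_hint_stale++;
            continue;
        }
        g_hint_hits++;
        return ss;
    }
    g_hint_misses++;
    return NULL;
}

static void hint_update(toy_superslab *ss) {
    g_hints[g_hint_next].ss = ss;
    g_hints[g_hint_next].gen_at_insert = ss->generation;
    g_hint_next = (g_hint_next + 1) % HINT_SLOTS;
}

static double hint_hit_rate(void) {
    unsigned long total = g_hint_hits + g_hint_misses;
    return total ? (double)g_hint_hits / (double)total : 0.0;
}
```

If the hit rate is already high with 4 slots, adding slots is unlikely to close the gap to the 15-20% target, and the remaining cost is probably elsewhere. If misses dominate, re-measuring with 8 or 16 slots is a cheap next experiment.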
+ +--- + +**Status**: 🔴 CRITICAL - Awaiting ChatGPT diagnosis and fix + +**Estimated Resolution Time**: 4-8 hours from ChatGPT engagement + +**Next Review**: After ChatGPT completes TLS SLL diagnosis and fix diff --git a/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md b/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md new file mode 100644 index 00000000..9aa49f6a --- /dev/null +++ b/docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md @@ -0,0 +1,1111 @@ +# TLS SLL Header Corruption Diagnosis & Fix Instructions for ChatGPT + +## Problem Statement + +**Symptom**: +- Baseline (Headerless OFF) crashes with SIGSEGV +- Error log: `[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0` +- Location: `core/box/tls_sll_box.h` header integrity check during pop operation + +**Root Cause**: +Header byte at offset 0 from base pointer contains user data (0x31) instead of header magic (0xa1). +This indicates one of: +1. Wrong pointer is being stored in TLS SLL +2. Header is not being written correctly before push +3. Adjacent block corruption overwrites header +4. Header write/read offset mismatch + +**Impact**: +- TLS SLL header reset occurs (entire freelist for class 1 dropped) +- Subsequent allocations may fail or use wrong metadata +- Benchmark crashes with SIGSEGV +- Memory corruption potential + +**Timeline**: +- Discovered during Phase 1 TLS Hint Box benchmarking +- Affects baseline configuration (no hints involved) +- Suggests pre-existing issue in shared TLS SLL code + +--- + +## Investigation Strategy + +**Phase A: Understand the Error** +- Where is header validation happening? +- What does 0x31 represent? (Is it deterministic or random data?) +- Can we reproduce with minimal allocations? + +**Phase B: Locate Corruption Source** +- Where is header supposed to be written? +- Is header being written BEFORE push or after? +- Are there any recent changes to header write logic? 
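To make the Phase A/B questions concrete, here is a toy model of the header contract. It assumes the layout the error message implies: one header byte at BASE (offset 0) holding `0xa0 | class_idx`, with the USER pointer at BASE + 1. The `0x0f` class mask and all `toy_*` names are assumptions, not hakmem definitions. The model also distinguishes "right magic, wrong class", which points toward a class-index mismatch, from "not a header at all", which points toward user data or an offset error, as with `0x31`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_HEADER_MAGIC      0xa0u  /* from the error text: 0xa1 = 0xa0 | 1 */
#define TOY_HEADER_CLASS_MASK 0x0fu  /* assumption: low nibble holds class_idx */

static uint8_t toy_expected_header(unsigned class_idx) {
    return (uint8_t)(TOY_HEADER_MAGIC | (class_idx & TOY_HEADER_CLASS_MASK));
}

/* Toy layout: [header byte][user bytes...]; USER = BASE + 1. */
static uint8_t *toy_base_to_user(uint8_t *base) { return base + 1; }
static uint8_t *toy_user_to_base(uint8_t *user) { return user - 1; }

/* Push-side contract: (re)write the header through the BASE pointer. */
static void toy_write_header(uint8_t *base, unsigned class_idx) {
    assert(class_idx <= TOY_HEADER_CLASS_MASK);  /* fail fast on class drift */
    base[0] = toy_expected_header(class_idx);
}

/* Pop-side check, mirroring the one quoted from tls_sll_pop_impl(). */
static int toy_header_ok(const uint8_t *base, unsigned class_idx) {
    return base[0] == toy_expected_header(class_idx);
}

typedef enum { HDR_OK, HDR_WRONG_CLASS, HDR_NOT_A_HEADER } toy_hdr_verdict;

/* Splits a failed check into "valid magic, wrong class" vs "not a header". */
static toy_hdr_verdict toy_classify_header(uint8_t got, unsigned class_idx) {
    if (got == toy_expected_header(class_idx)) return HDR_OK;
    if ((got & (uint8_t)~TOY_HEADER_CLASS_MASK) == TOY_HEADER_MAGIC) return HDR_WRONG_CLASS;
    return HDR_NOT_A_HEADER;
}

/* Simulates the RAW/BASE-vs-USER mix-up: the caller fills `n` bytes of user
 * data but was handed the BASE pointer, so the first byte lands on the header. */
static void toy_buggy_user_write(uint8_t *base, int byte, size_t n) {
    memset(base, byte, n);  /* correct code would write to toy_base_to_user(base) */
}
```

In this model, writing 16 bytes of `0x31` through the USER pointer leaves the header intact. The same write through BASE reproduces `got=0x31 expect=0xa1`, and `toy_classify_header` reports it as "not a header" rather than a class mismatch.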
+
+**Phase C: Implement Fix**
+- Add instrumentation to catch corruption early
+- Identify the exact allocation/free cycle causing the problem
+- Fix the root cause (not just the symptom)
+
+**Phase D: Validate**
+- TC1 baseline should complete without crashes
+- TC2/TC3 can then be evaluated
+- No performance regression
+
+---
+
+## Deep Dive: TLS SLL Header Corruption
+
+### What is 0x31?
+
+The error reports `got=0x31`. Here is what the two values mean:
+
+```
+// Expected (header magic for class 1):
+0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx)
+
+// Got:
+0x31 = 0b00110001
+     = ASCII '1' character
+     = Some piece of user data or metadata
+```
+
+**Questions to answer**:
+1. Is 0x31 always the same, or does it vary? (Deterministic vs random corruption)
+2. Does 0x31 correspond to any known data pattern in hakmem?
+3. Does the corruption happen during alloc or free?
+4. Is 0x31 part of the test program's data?
+
+### TLS SLL Header Check Logic
+
+**Location**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h` (around lines 280-320)
+
+```c
+// In tls_sll_pop_impl():
+if (tiny_class_preserves_header(class_idx)) {
+    uint8_t* b = (uint8_t*)raw_base;
+    uint8_t got = *b;  // Read byte at offset 0 of base pointer
+    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
+
+    if (got != expected) {
+        // CORRUPTION DETECTED!
+        fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x ...\n",
+                class_idx, raw_base, got, expected);
+        // ... reset logic follows
+    }
+}
+```
+
+**Key Points**:
+- The header is read at `(uint8_t*)raw_base` (offset 0)
+- The expected value is `0xa0 | class_idx`
+- For class 1: expect `0xa1`
+- Got `0x31` instead (user data)
+
+### When Does This Happen?
+
+The error occurs during `tls_sll_pop()`, which is called when:
+1. **Freelist refill**: Taking blocks from the TLS SLL back to the unified cache
+2. **Magazine spill**: Freelist → TLS SLL transition for overflow
+3. **Allocation path**: Pulling blocks from the TLS SLL to satisfy malloc
+
+**The header must have been corrupted before the pop** (either before the push or while the block sat in the SLL), but the corruption is only detected AFTER the pop.
+
+This suggests one of:
+- The pointer stored in the TLS SLL is wrong (points to the wrong location)
+- The header was never written correctly
+- Adjacent block corruption overwrote the header
+- There is an offset calculation error between push and pop
+
+---
+
+## Diagnostic Procedure
+
+### Step 1: Reproduce with Minimal Test
+
+Create the smallest possible test case:
+
+**File**: `/mnt/workdisk/public_share/hakmem/tests/test_tls_sll_minimal.c`
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int main(void) {
+    printf("Test 1: Simple alloc/free cycle\n");
+    for (int i = 0; i < 10; i++) {
+        void* p = malloc(16);  // Class 1
+        if (p) {
+            memset(p, 0x31, 16);  // Write user data (includes 0x31!)
+            free(p);
+        }
+    }
+    printf("✓ Test 1 passed\n");
+
+    printf("Test 2: Rapid alloc/free (trigger refill)\n");
+    for (int i = 0; i < 1000; i++) {
+        void* p = malloc(16);
+        if (p) {
+            memset(p, 0x31, 16);
+            free(p);
+        }
+    }
+    printf("✓ Test 2 passed\n");
+
+    printf("Test 3: Multiple sizes\n");
+    for (int size = 8; size <= 512; size *= 2) {
+        for (int j = 0; j < 100; j++) {
+            void* p = malloc(size);
+            if (p) {
+                memset(p, 0x31, size);
+                free(p);
+            }
+        }
+    }
+    printf("✓ Test 3 passed\n");
+
+    printf("Test 4: Heavy churn (trigger SLL push/pop)\n");
+    void* ptrs[100];
+    for (int round = 0; round < 10; round++) {
+        for (int i = 0; i < 100; i++) {
+            ptrs[i] = malloc(16);
+            if (ptrs[i]) memset(ptrs[i], 0x31, 16);
+        }
+        for (int i = 0; i < 100; i++) {
+            free(ptrs[i]);
+        }
+    }
+    printf("✓ Test 4 passed\n");
+
+    return 0;
+}
+```
+
+**Build and test**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+mkdir -p tests
+gcc -o tests/test_tls_sll_minimal tests/test_tls_sll_minimal.c
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+```
+
+**Goal**: Find the minimal reproduction:
+- If test 1 fails: Early corruption (basic alloc/free)
+- If test 2 fails: Refill-related corruption
+- If test 3 fails: Class-specific issue
+- If test 4 fails: SLL push/pop cycling issue
+
+### Step 2: Add Diagnostic Logging
+
+Instrument the header write/read paths:
+
+#### Instrument Header Write
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_config_box.inc`
+
+Find the `HAK_RET_ALLOC` macro and add logging:
+
+```c
+// Add diagnostic logging
+#define HAK_RET_ALLOC(base, cls) do { \
+    fprintf(stderr, "[ALLOC_HEADER_WRITE] base=%p cls=%d\n", base, cls); \
+    uint8_t* hdr = (uint8_t*)(base); \
+    uint8_t magic = (uint8_t)(0xa0 | ((cls) & 0x0f)); \
+    *hdr = magic; \
+    fprintf(stderr, "[ALLOC_HEADER_WROTE] base=%p magic=0x%02x (at %p)\n", base, *hdr, hdr); \
+    __atomic_thread_fence(__ATOMIC_RELEASE); \
+    hak_user_ptr_t user = ptr_base_to_user(base, cls); \
+    fprintf(stderr, "[ALLOC_RETURN] user=%p (base=%p + %td)\n", user, base, (char*)user - (char*)base); \
+    return user; \
+} while(0)
+```
+
+#### Instrument Header Read
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+
+Modify the header read/check in `tls_sll_pop_impl()`:
+
+```c
+// In tls_sll_pop_impl(), before the check:
+if (tiny_class_preserves_header(class_idx)) {
+    uint8_t* b = (uint8_t*)raw_base;
+    uint8_t got = *b;
+    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
+
+    // NEW DIAGNOSTIC LOGGING:
+    fprintf(stderr, "[TLS_SLL_POP_CHECK] class=%d raw_base=%p checking at %p\n",
+            class_idx, raw_base, b);
+    fprintf(stderr, "[TLS_SLL_POP_READ] got=0x%02x expected=0x%02x\n", got, expected);
+
+    if (got != expected) {
+        fprintf(stderr, "[CORRUPTION_DETECTED] Mismatch! Dumping context...\n");
+        fprintf(stderr, "[CORRUPTION_CONTEXT] raw_base=%p, offset=%td\n", raw_base, (char*)b - (char*)raw_base);
+
+        // Dump surrounding bytes (assumes raw_base-8 .. raw_base+15 is mapped)
+        fprintf(stderr, "[CORRUPTION_DUMP] Bytes around base: ");
+        for (int i = -8; i < 16; i++) {
+            fprintf(stderr, "%02x ", ((uint8_t*)raw_base)[i]);
+        }
+        fprintf(stderr, "\n");
+
+        // ... existing reset logic
+    }
+}
+```
+
+#### Instrument SLL Push
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+
+Find `tls_sll_push_impl()` and add logging:
+
+```c
+static inline bool tls_sll_push_impl(..., hak_base_ptr_t ptr, ...) {
+    fprintf(stderr, "[TLS_SLL_PUSH] class=%d ptr=%p\n", class_idx, ptr);
+
+    // Check header BEFORE push
+    if (tiny_class_preserves_header(class_idx)) {
+        uint8_t hdr = *(uint8_t*)ptr;
+        fprintf(stderr, "[TLS_SLL_PUSH_HDR_CHECK] ptr=%p header=0x%02x\n", ptr, hdr);
+    }
+
+    // ... existing push logic
+}
+```
+
+**Build and run**:
+```bash
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal 2>&1 | grep -E "ALLOC|POP|PUSH|CORRUPTION" | head -100
+```
+
+**What to look for**:
+- Do ALLOC_HEADER_WRITE and TLS_SLL_PUSH_HDR_CHECK match?
+- Does TLS_SLL_POP_READ show corruption?
+- What is the sequence: WRITE → PUSH → POP?
+- Are pointers consistent across operations?
+
+### Step 3: Examine Header Write Locations
+
+Search for all places headers are written:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn "= 0xa\|= HEADER_MAGIC\|= TINY_HEADER\|0xa0 |" core/ --include="*.h" --include="*.c" --include="*.inc"
+```
+
+Expected locations:
+1. `core/hakmem_tiny_config_box.inc` - HAK_RET_ALLOC macro
+2. `core/box/tls_sll_box.h` - Optional header write on SLL push (if needed)
+3. `core/tiny_alloc_fast_push.c` - Fast path allocations
+4. Other allocation paths?
+
+**Check each location**:
+- Is the offset correct? (Should be offset 0 from base)
+- Is it written BEFORE or AFTER pushing to TLS SLL?
+- Is there an atomic fence to prevent reordering?
+- Is the class_idx valid?
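+
+The push-side and pop-side checks above can be combined into a small, self-contained sketch that narrows down *when* the header goes bad. This is a toy model, not hakmem code. `ModelSll`, `model_push`, and `model_pop` are hypothetical names, and the layout is assumed: a 1-byte header at `base[0]`, with the next link stored at `base + 1` so the header survives while the block is on the list. If the push-side check passes but the pop-side check fails, the header was overwritten while the block sat in the SLL (for example, a use-after-free write or a neighbour overflow). If the push-side check already fails, the bad pointer or missing header came from the free path.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Toy model only: names and layout are hypothetical, not hakmem's API. */
+#define MODEL_HEADER_MAGIC 0xa0u
+#define MODEL_CLASS_MASK   0x0fu
+
+typedef struct {
+    uint8_t *head;   /* base pointer of the first free block */
+    unsigned count;  /* number of blocks on the list */
+} ModelSll;
+
+static uint8_t model_expected_hdr(int cls) {
+    return (uint8_t)(MODEL_HEADER_MAGIC | ((unsigned)cls & MODEL_CLASS_MASK));
+}
+
+/* Assumed layout: the next link lives at base + 1, after the header byte. */
+static void model_store_next(uint8_t *base, uint8_t *next) {
+    memcpy(base + 1, &next, sizeof next);
+}
+
+static uint8_t *model_load_next(const uint8_t *base) {
+    uint8_t *next;
+    memcpy(&next, base + 1, sizeof next);
+    return next;
+}
+
+/* Returns 0 on success, -1 if the header is already wrong at push time. */
+static int model_push(ModelSll *s, uint8_t *base, int cls) {
+    assert(s != NULL && base != NULL);
+    if (base[0] != model_expected_hdr(cls)) {
+        fprintf(stderr, "[PUSH_HDR_BAD] base=%p got=0x%02x expect=0x%02x\n",
+                (void *)base, base[0], model_expected_hdr(cls));
+        return -1;
+    }
+    model_store_next(base, s->head);
+    s->head = base;
+    s->count++;
+    return 0;
+}
+
+/* Returns 0 and sets *out on success, -1 if the header was corrupted while
+ * the block was on the list, -2 if the list is empty. */
+static int model_pop(ModelSll *s, int cls, uint8_t **out) {
+    uint8_t *base = s->head;
+    if (base == NULL) {
+        return -2;
+    }
+    if (base[0] != model_expected_hdr(cls)) {
+        fprintf(stderr, "[POP_HDR_BAD] base=%p got=0x%02x expect=0x%02x\n",
+                (void *)base, base[0], model_expected_hdr(cls));
+        return -1;
+    }
+    s->head = model_load_next(base);
+    s->count--;
+    *out = base;
+    return 0;
+}
+```
+
+In hakmem, the equivalent checks would live in `tls_sll_push_impl()` (the "Instrument SLL Push" logging above) and in the existing pop-side check. The model only shows how the two outcomes split the search space.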
+
+### Step 4: Examine Pointer Conversion Logic
+
+The key question: **Are we storing the right pointer in TLS SLL?**
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+
+Check the pointer conversion macros:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -A5 "ptr_user_to_base\|ptr_base_to_user\|HAK_BASE_FROM_RAW" core/hakmem_tiny_types.h
+```
+
+**Critical questions**:
+1. When we free a user pointer, do we convert it to base pointer correctly?
+2. When we push to TLS SLL, do we push the base pointer or user pointer?
+3. When we pop from TLS SLL, do we get back the exact same base pointer?
+
+**Expected flow**:
+```
+Alloc: BASE → (write header at BASE) → (convert to USER) → return USER
+Free:  USER → (convert to BASE) → (push BASE to TLS SLL)
+Pop:   (pop BASE from TLS SLL) → (read header at BASE) → validate
+```
+
+If any step uses wrong offset, corruption occurs.
+
+### Step 5: Git Blame on Recent Changes
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+git log --oneline -30
+git show b5be708b6  # "Fix potential freelist corruption"
+git show c91602f18  # "Fix ptr_user_to_base_blind regression"
+git show f3f75ba3d  # "Fix magazine spill RAW pointer"
+```
+
+**Check**: Did any of these changes affect header write logic?
+
+Look for:
+- Changes to `HAK_RET_ALLOC` macro
+- Changes to pointer conversion logic
+- Changes to TLS SLL push/pop
+- Changes to header offset calculations
+
+### Step 6: Review Commit History for TLS SLL
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+git log --oneline --all -- core/box/tls_sll_box.h | head -20
+git log -p --all -- core/box/tls_sll_box.h | head -200
+```
+
+Look for:
+- When was header logic last changed?
+- Were there any defensive fixes recently?
+- Any atomic fence changes?
+- Any offset calculation changes?
+
+### Step 7: Check Phase 1 Configuration
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+
+Verify the header configuration:
+
+```c
+// Phase 1: headerless = false → headers ON
+// Header should be at offset 0 of base pointer
+#define TINY_HEADER_SIZE_BYTES 1
+#define HEADER_MAGIC 0xa0
+```
+
+**Check**:
+- Is HEADERLESS defined? (Should be undefined for Phase 1)
+- Is header size correct? (Should be 1 byte)
+- Are offset calculations consistent?
+
+---
+
+## Likely Root Causes (Narrowed)
+
+### Root Cause A: Header Written at Wrong Offset
+
+**Symptom**: User data appears where header should be
+
+**Check**:
+```c
+// In HAK_RET_ALLOC, are we writing at the right place?
+// Phase 1: header at offset 0 of base
+uint8_t* hdr_ptr = (uint8_t*)base;  // Should be offset 0
+*hdr_ptr = magic;
+
+// If this was changed to:
+uint8_t* hdr_ptr = (uint8_t*)base + 1;  // WRONG! User data location
+*hdr_ptr = magic;
+// Then header is written in user space, gets overwritten
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -n "HAK_RET_ALLOC" core/hakmem_tiny_config_box.inc
+# Check that header write is at (uint8_t*)base, not base+offset
+```
+
+**Fix**: Ensure header write is at `(uint8_t*)base`, not base+offset.
+
+### Root Cause B: User Pointer Pushed Instead of Base Pointer
+
+**Symptom**: SLL contains user pointers, but pop expects base pointers
+
+**Sequence**:
+```c
+// During free:
+void* user_ptr = ...;  // User pointer (base + 1 for Phase 1)
+tls_sll_push(class_idx, user_ptr);  // WRONG! Should be base pointer
+
+// During pop:
+void* popped = tls_sll_pop(class_idx);  // Gets user_ptr
+uint8_t header = *(uint8_t*)popped;  // Reads at user_ptr, not base_ptr!
+// This reads user data instead of header
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn "tls_sll_push" core/ --include="*.c" --include="*.inc" -A3 -B3
+# Check that all pushes use base pointer, not user pointer
+```
+
+**Fix**: Convert user pointer to base pointer before pushing:
+```c
+hak_base_ptr_t base = ptr_user_to_base(user_ptr, class_idx);
+tls_sll_push(class_idx, base, cap);
+```
+
+### Root Cause C: Atomic Fence Missing
+
+**Symptom**: Compiler/CPU reorders the header write after the SLL push (only observable by a different thread)
+
+**Check**:
+```c
+*(uint8_t*)base = header_magic;            // Instruction 1
+__atomic_thread_fence(__ATOMIC_RELEASE);   // Release fence
+tls_sll_push(class_idx, base);             // Instruction 2
+```
+
+If the fence is missing, the CPU/compiler might:
+1. Schedule the push before the header write
+2. Let another thread see an unprepared node in the SLL (a thread always observes its own stores in program order, so this requires cross-thread access, e.g. remote frees)
+3. Let the pop read an unwritten header → corruption
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn -B5 "tls_sll_push" core/ --include="*.c" --include="*.inc" | grep -E "fence|barrier|atomic"
+# Check that fence exists between header write and push
+```
+
+**Fix**: Add `__atomic_thread_fence(__ATOMIC_RELEASE)` after the header write, before the SLL push (paired with an acquire on the consuming side). For a purely thread-local push/pop, a fence will not change the outcome.
+
+### Root Cause D: Magazine Spill Pointer Wrapping
+
+**Symptom**: Magazine stores RAW pointer, SLL expects BASE pointer
+
+**Already Fixed**: Commit f3f75ba3d added `HAK_BASE_FROM_RAW()` wrapper
+
+**Verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -n "HAK_BASE_FROM_RAW\|magazine.*spill" core/hakmem_tiny_refill.inc.h
+# Check line 228 or nearby has the fix
+```
+
+**Expected code**:
+```c
+void* p = mag->items[--mag->top].ptr;
+hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);  // Must have this!
+if (!tls_sll_push(class_idx, base_p, cap)) {
+    // ...
+}
+```
+
+**Fix**: If missing, add the `HAK_BASE_FROM_RAW()` wrapper around the raw pointer.
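+
+The toy model below makes Root Cause B concrete by reproducing the exact `got=0x31 expect=0xa1` pair. It assumes the Phase 1 layout (a 1-byte header at `base[0]`, with the user pointer at base + 1). It also assumes the application fills its buffer with `0x31`, as the minimal test in Step 1 does. `model_alloc`, `model_user_fill`, `model_free_pointer`, and `model_pop_header` are hypothetical names, not hakmem functions.
+
+- Pushing the converted base pointer makes the pop-side check read `0xa1`.
+- Pushing the unconverted user pointer makes it read the first user byte, `0x31`.
+
+An off-by-one in `ptr_user_to_base` (Root Cause F) would have the same effect.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Toy model only (hypothetical names). Phase 1 layout: header at base[0]. */
+#define MODEL_HEADER_MAGIC 0xa0u
+#define MODEL_HEADER_SIZE  1
+
+/* "Allocate" inside buf: write the header, hand out the user pointer. */
+static uint8_t *model_alloc(uint8_t *buf, int cls) {
+    assert(cls >= 0 && cls < 16);
+    buf[0] = (uint8_t)(MODEL_HEADER_MAGIC | (unsigned)cls);
+    return buf + MODEL_HEADER_SIZE;
+}
+
+/* The application writes ASCII '1' (0x31) into its buffer. */
+static void model_user_fill(uint8_t *user, size_t n) {
+    memset(user, 0x31, n);
+}
+
+/* What free() would push: convert != 0 applies user->base (correct);
+ * convert == 0 pushes the raw user pointer (Root Cause B). */
+static uint8_t *model_free_pointer(uint8_t *user, int convert) {
+    return convert ? user - MODEL_HEADER_SIZE : user;
+}
+
+/* The pop-side check reads byte 0 of whatever pointer was pushed. */
+static uint8_t model_pop_header(const uint8_t *pushed) {
+    return pushed[0];
+}
+```
+
+The model only shows that this bug would produce exactly the logged bytes. Whether hakmem actually takes the unconverted path is what the `tls_sll_push` grep in Root Cause B is meant to establish.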
+
+### Root Cause E: Class Index Mismatch
+
+**Symptom**: Wrong class_idx used for header magic
+
+**Check**:
+```c
+int class_idx = ...;  // Where does this come from?
+uint8_t magic = (uint8_t)(0xa0 | (class_idx & 0x0f));
+// If class_idx is wrong (e.g., -1 or 999), magic will be wrong
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -rn "class_idx\|tiny_size_to_class" core/ --include="*.h" | grep -E "= -1|= 0xff"
+# Look for places where class_idx might be invalid
+```
+
+**Fix**: Validate that class_idx is in range [0, 7] before using it:
+```c
+if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
+    fprintf(stderr, "[ERROR] Invalid class_idx: %d\n", class_idx);
+    abort();
+}
+```
+
+### Root Cause F: Offset Calculation Error
+
+**Symptom**: Header written at base, but read at base+offset (or vice versa)
+
+**Check**:
+```c
+// During alloc:
+*(uint8_t*)base = magic;  // Write at base+0
+user = base + 1;          // User at base+1 (Phase 1)
+
+// During free/pop:
+base = user - 1;                // Should recover original base
+uint8_t hdr = *(uint8_t*)base;  // Should read at base+0
+
+// BUT if conversion is wrong:
+base = user - 0;                // WRONG! Off by one
+uint8_t hdr = *(uint8_t*)base;  // Reads at wrong location
+```
+
+**How to verify**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -A10 "ptr_user_to_base_impl\|ptr_base_to_user_impl" core/hakmem_tiny_types.h
+# Check offset calculations are consistent
+```
+
+**Fix**: Ensure offset calculations match between:
+- `ptr_base_to_user` (add offset)
+- `ptr_user_to_base` (subtract same offset)
+
+---
+
+## Proposed Fix Patterns
+
+Based on diagnostic results, the fix will likely be one of:
+
+### Fix Pattern 1: Restore Header Write Logic
+
+**Problem**: Header write uses wrong offset or wrong pointer
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_config_box.inc`
+
+```c
+#define HAK_RET_ALLOC(base, cls) do { \
+    /* Write header FIRST at offset 0 of base */ \
+    *(uint8_t*)(base) = (uint8_t)(0xa0 | ((cls) & 0x0f)); \
+    /* Order the header store before later stores (matters only across threads) */ \
+    __atomic_thread_fence(__ATOMIC_RELEASE); \
+    /* Now convert to user pointer and return */ \
+    return ptr_base_to_user((base), (cls)); \
+} while(0)
+```
+
+### Fix Pattern 2: Add Missing Fence
+
+**Problem**: Compiler reorders header write after SLL push (only relevant if another thread can observe the block; see Root Cause C)
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/tiny_alloc_fast_push.c` or `core/hakmem_tiny_free.inc`
+
+```c
+// Before push to TLS SLL:
+*(uint8_t*)base = header_magic;
+__atomic_thread_fence(__ATOMIC_RELEASE);  // ADD THIS LINE
+tls_sll_push(class_idx, base, cap);
+```
+
+### Fix Pattern 3: Fix Pointer Type in Push
+
+**Problem**: User pointer pushed instead of base pointer
+
+**File**: Multiple locations (search for `tls_sll_push`)
+
+```c
+// In free path:
+void* user_ptr = ptr;  // From user
+hak_base_ptr_t base_ptr = ptr_user_to_base(user_ptr, class_idx);  // Convert!
+if (!tls_sll_push(class_idx, base_ptr, cap)) {  // Push base, not user
+    // ...
+}
+```
+
+### Fix Pattern 4: Validate Inputs
+
+**Problem**: Invalid class_idx or pointer values
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+
+```c
+// At entry of tls_sll_push_impl():
+static inline bool tls_sll_push_impl(..., hak_base_ptr_t ptr, ...) {
+    // Validate inputs
+    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
+        fprintf(stderr, "[ERROR] Invalid class_idx: %d\n", class_idx);
+        return false;
+    }
+    if (!ptr || ptr == (void*)-1) {
+        fprintf(stderr, "[ERROR] Invalid pointer: %p\n", ptr);
+        return false;
+    }
+
+    // ... existing logic
+}
+```
+
+### Fix Pattern 5: Check Magazine Spill
+
+**Problem**: Magazine spill uses wrong pointer type
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill.inc.h`
+
+```c
+// Around line 228 (magazine spill):
+void* p = mag->items[--mag->top].ptr;
+
+// MUST convert RAW to BASE before pushing:
+hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);  // Essential!
+
+if (!tls_sll_push(class_idx, base_p, cap)) {
+    // ... error handling
+}
+```
+
+**Verify fix exists**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+grep -n "HAK_BASE_FROM_RAW" core/hakmem_tiny_refill.inc.h
+# Should see it used before tls_sll_push
+```
+
+### Fix Pattern 6: Fix Offset Calculation
+
+**Problem**: Pointer conversion uses wrong offset
+
+**File**: `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+
+```c
+// Verify Phase 1 offsets:
+static inline hak_user_ptr_t ptr_base_to_user_impl(hak_base_ptr_t base, int cls) {
+    if (tiny_class_preserves_header(cls)) {
+        return (hak_user_ptr_t)((uint8_t*)base + TINY_HEADER_SIZE_BYTES);  // +1 for Phase 1
+    }
+    return (hak_user_ptr_t)base;
+}
+
+static inline hak_base_ptr_t ptr_user_to_base_impl(hak_user_ptr_t user, int cls) {
+    if (tiny_class_preserves_header(cls)) {
+        return (hak_base_ptr_t)((uint8_t*)user - TINY_HEADER_SIZE_BYTES);  // -1 for Phase 1
+    }
+    return (hak_base_ptr_t)user;
+}
+```
+
+**Check**: Ensure +1 and -1 match, and TINY_HEADER_SIZE_BYTES is 1.
+
+---
+
+## Debug Workflow
+
+### Quick Debug Cycle
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# 1. Make changes to source
+# ... edit files ...
+
+# 2. Rebuild
+make clean && make shared -j8
+
+# 3. Test with minimal reproducer
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal 2>&1 | tee debug.log
+
+# 4. Check for errors (a segfault is reported by the shell, not written to debug.log)
+grep "TLS_SLL_HDR_RESET\|CORRUPTION\|SIGSEGV" debug.log
+
+# 5. Analyze log patterns
+grep -E "ALLOC|PUSH|POP" debug.log | head -50
+```
+
+### Advanced Debug: GDB
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Build with debug symbols
+make clean
+CFLAGS="-g -O0" make shared -j8
+
+# Run under GDB
+gdb --args ./tests/test_tls_sll_minimal
+```
+
+**GDB commands**:
+```gdb
+(gdb) set environment LD_PRELOAD ./libhakmem.so
+(gdb) break tls_sll_push_impl
+(gdb) break tls_sll_pop_impl
+(gdb) run
+(gdb) print /x *(uint8_t*)ptr  # Check header byte
+(gdb) print class_idx
+(gdb) backtrace
+(gdb) continue
+```
+
+### Memory Corruption Detection
+
+Enable AddressSanitizer. Caveat: ASan installs its own malloc/free interceptors and expects its runtime to be loaded first, so preloading an ASan-built allocator into an uninstrumented binary may abort at startup or bypass hakmem's malloc.
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+CFLAGS="-fsanitize=address -g" LDFLAGS="-fsanitize=address" make shared -j8
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+```
+
+ASan can catch (for memory it tracks):
+- Buffer overflows
+- Use-after-free
+- Double-free
+- Other invalid memory accesses
+
+---
+
+## After Applying Fix
+
+### Step 1: Rebuild and Test Minimal Reproducer
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+```
+
+**Expected**:
+- All tests pass
+- No `[TLS_SLL_HDR_RESET]` errors
+- No SIGSEGV crashes
+
+### Step 2: Run TC1 Baseline Test
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+make clean
+make shared -j8
+LD_PRELOAD=./libhakmem.so timeout 20 ./mimalloc-bench/out/bench/sh8bench 2>&1 | tail -20
+```
+
+**Expected**:
+- "Total elapsed time..." message
+- No SIGSEGV
+- Completion within the timeout (the original crash was reported at ~22 seconds, so a 20-second timeout can hide it; raise the timeout if needed)
+
+### Step 3: Run Full Benchmark Suite
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# cfrac test
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 2>&1 | head -10
+
+# larson test
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8 2>&1 | tail -10
+
+# sh6bench test
+LD_PRELOAD=./libhakmem.so timeout 20 ./mimalloc-bench/out/bench/sh6bench 2>&1 | tail -5
+```
+
+**Expected**: All pass without crashes or corruption errors
+
+### Step 4: Regression Check
+
+Ensure the fix doesn't break other configurations:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Test Phase 2 (headerless=true) - if implemented
+# ... config changes ...
+# make clean && make shared -j8
+# LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+
+# Test with different workloads
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/mstress 10 2
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/rptest 10
+```
+
+### Step 5: Performance Check
+
+Verify there is no performance regression:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Before fix (save baseline):
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep "Total elapsed"
+# Note: May crash, but if it runs, record time
+
+# After fix:
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep "Total elapsed"
+
+# Compare: Should be within 5% of baseline (if baseline worked)
+```
+
+### Step 6: Remove Diagnostic Logging
+
+After the fix is confirmed, remove the debug logging:
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+
+# Remove fprintf statements added for diagnosis
+# Restore original HAK_RET_ALLOC macro
+# Restore original tls_sll_push/pop implementations
+
+# Rebuild clean version
+make clean
+make shared -j8
+
+# Final test
+LD_PRELOAD=./libhakmem.so ./tests/test_tls_sll_minimal
+LD_PRELOAD=./libhakmem.so timeout 20 ./mimalloc-bench/out/bench/sh8bench
+```
+
+### Step 7: Commit with Detailed Message
+
+```bash
+cd /mnt/workdisk/public_share/hakmem
+git status
+git add [modified files]
+git commit -m "Fix TLS SLL header corruption
+
+Problem: Header magic byte being corrupted during allocation/free path,
+causing [TLS_SLL_HDR_RESET] errors and SIGSEGV crashes in baseline tests.
+
+Symptoms:
+- sh8bench crashes with SIGSEGV
+- Error: [TLS_SLL_HDR_RESET] cls=1 got=0x31 expect=0xa1
+- Header validation fails during tls_sll_pop()
+
+Root cause: [DESCRIBE WHAT WAS WRONG - e.g.:]
+- User pointer was being pushed to TLS SLL instead of base pointer
+- Header read at wrong offset due to pointer type mismatch
+- Missing atomic fence allowed reordering of header write
+
+Solution: [DESCRIBE WHAT WAS FIXED - e.g.:]
+- Convert user pointer to base pointer before tls_sll_push()
+- Add atomic fence after header write, before SLL operations
+- Validate pointer types at SLL entry points
+
+Changes:
+- core/hakmem_tiny_config_box.inc: Fixed HAK_RET_ALLOC header offset
+- core/box/tls_sll_box.h: Added pointer validation
+- core/hakmem_tiny_free.inc: Convert to base ptr before push
+
+Validation:
+- test_tls_sll_minimal passes (4/4 tests)
+- sh8bench baseline completes successfully
+- cfrac/larson/sh6bench pass without crashes
+- No performance regression (<2% variance)
+
+Verified: TC1 baseline stability restored, ready for Phase 1 testing"
+```
+
+---
+
+## Expected Timeline
+
+**Phase A: Understanding (1-2 hours)**
+- Read this document
+- Understand TLS SLL architecture
+- Review header mechanism
+- Locate relevant code sections
+
+**Phase B: Diagnosis (2-4 hours)**
+- Create minimal test case
+- Add diagnostic logging
+- Run tests and analyze logs
+- Identify root cause
+
+**Phase C: Fix Implementation (1-2 hours)**
+- Implement surgical fix
+- Remove diagnostic logging
+- Clean build and test
+
+**Phase D: Validation (1-2 hours)**
+- Run full test suite
+- Verify no regressions
+- Performance check
+- Document and commit
+
+**Total: 5-10 hours** for complete diagnosis, fix, and validation
+
+---
+
+## Success Criteria
+
+**Must Have**:
+1. No `[TLS_SLL_HDR_RESET]` errors in baseline tests
+2. sh8bench completes without SIGSEGV
+3. Minimal test suite passes (4/4 tests)
+4. Fix is surgical (minimal code changes)
+5. Root cause documented clearly
+
+**Nice to Have**:
+1. Performance neutral (<5% variance)
+2. Fix applies to all configurations
+3. Additional validation checks added
+4. Regression tests added
+
+**Verification**:
+```bash
+cd /mnt/workdisk/public_share/hakmem
+LD_PRELOAD=./libhakmem.so timeout 20 ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep -E "Total elapsed|RESET|SIGSEGV"
+# Should show "Total elapsed time" with no RESET lines (a crash is reported by the shell, not in the grep output)
+```
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Fixing Symptoms, Not Root Cause
+
+**Wrong approach**:
+```c
+// Just disable the check
+if (got != expected) {
+    // Do nothing, ignore corruption
+}
+```
+
+**Right approach**:
+- Understand WHY corruption happens
+- Fix the source (wrong pointer, wrong offset, etc.)
+- Keep the validation check enabled
+
+### Pitfall 2: Over-Engineering
+
+**Wrong approach**:
+- Rewrite entire TLS SLL system
+- Add complex locking mechanisms
+- Change fundamental architecture
+
+**Right approach**:
+- Minimal fix (usually 1-5 lines)
+- Fix pointer conversion or offset
+- Add fence if missing
+
+### Pitfall 3: Ignoring Test Results
+
+**Wrong approach**:
+- Fix compiles, assume it works
+- Skip minimal reproducer
+- Don't verify with benchmarks
+
+**Right approach**:
+- Test with minimal case FIRST
+- Verify all benchmarks pass
+- Check performance impact
+
+### Pitfall 4: Removing Too Much Logging Too Early
+
+**Wrong approach**:
+- Remove diagnostic logging immediately
+- Hard to debug if issue returns
+
+**Right approach**:
+- Keep logging until fix is verified
+- Remove logging in separate commit
+- Document what was learned
+
+---
+
+## Additional Resources
+
+### Key Files to Understand
+
+1. `/mnt/workdisk/public_share/hakmem/core/box/tls_sll_box.h`
+   - TLS SLL push/pop implementation
+   - Header validation logic
+
+2. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_config_box.inc`
+   - HAK_RET_ALLOC macro
+   - Header write logic
+
+3. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_types.h`
+   - Pointer conversion macros
+   - ptr_user_to_base / ptr_base_to_user
+
+4. `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill.inc.h`
+   - Magazine spill logic
+   - TLS SLL interaction
+
+### Useful Git Commands
+
+```bash
+# Find when header logic changed
+git log -p --all -S "0xa0" -- core/
+
+# Find recent changes to TLS SLL
+git log --oneline -20 -- core/box/tls_sll_box.h
+
+# Compare current vs previous version
+git diff HEAD~5 core/hakmem_tiny_config_box.inc
+
+# Find all references to a function
+git grep -n "tls_sll_push" core/
+```
+
+### Debugging Commands
+
+```bash
+# Check header size configuration
+grep -n "TINY_HEADER\|HEADERLESS" core/hakmem_tiny_types.h
+
+# Find all allocation return points
+grep -rn "HAK_RET_ALLOC\|return.*user" core/ --include="*.inc"
+
+# Find all TLS SLL push calls
+grep -rn "tls_sll_push" core/ --include="*.c" --include="*.inc" -B3 -A3
+
+# Check atomic operations
+grep -rn "atomic_thread_fence\|__atomic\|memory_order" core/ --include="*.h"
+```
+
+---
+
+## Questions to Answer During Diagnosis
+
+1. **What is 0x31?**
+   - Is it always 0x31, or does it vary?
+   - Does it correspond to test data?
+   - Is it ASCII '1' character?
+
+2. **Where is the header written?**
+   - In HAK_RET_ALLOC macro?
+   - In tls_sll_push?
+   - Somewhere else?
+
+3. **Where is the header read?**
+   - In tls_sll_pop?
+   - In allocation path?
+
+4. **Are offsets consistent?**
+   - Write at offset X
+   - Read at offset X
+   - Both use same base pointer?
+
+5. **Are pointer types correct?**
+   - Push base or user pointer?
+   - Pop returns base or user pointer?
+   - Conversions correct?
+
+6. **Is there a fence?**
+   - Between header write and SLL push?
+   - Between SLL pop and header read?
+
+7. **Is class_idx valid?**
+   - In range [0, 7]?
+   - Matches actual allocation size?
+
+8. **Has this ever worked?**
+   - Check git history
+   - Was there a recent breaking change?
+
+---
+
+## Document Version
+
+- **Version**: 1.0
+- **Date**: 2025-12-03
+- **Author**: System diagnostic documentation
+- **Target**: ChatGPT diagnostic agent
+- **Estimated completion time**: 5-10 hours
+
+---
+
+## Final Checklist
+
+Before considering the fix complete:
+
+- [ ] Minimal reproducer created and passes
+- [ ] Root cause identified and documented
+- [ ] Fix implemented with explanation
+- [ ] Diagnostic logging removed
+- [ ] All baseline tests pass
+- [ ] No performance regression
+- [ ] Git commit with detailed message
+- [ ] This document updated with findings
+
+**Good luck with the diagnosis!**
diff --git a/docs/TLS_SS_HINT_BOX_DESIGN.md b/docs/TLS_SS_HINT_BOX_DESIGN.md
new file mode 100644
index 00000000..9381d738
--- /dev/null
+++ b/docs/TLS_SS_HINT_BOX_DESIGN.md
@@ -0,0 +1,1148 @@
+# TLS Superslab Hint Box - Design Document
+
+**Phase**: Headerless Performance Optimization - Phase 1
+**Date**: 2025-12-03
+**Status**: Design Review
+**Author**: hakmem team
+
+---
+
+## 1. Executive Summary
+
+The TLS Superslab Hint Box is a thread-local cache that accelerates pointer-to-SuperSlab resolution in Headerless mode. When HAKMEM_TINY_HEADERLESS=1 is enabled, every free() operation requires translating a user pointer to its owning SuperSlab. Currently, this uses `hak_super_lookup()`, which performs a hash table lookup costing 10-50 cycles. By caching recently-used SuperSlab references in thread-local storage, we can reduce this to 2-5 cycles for cache hits (85-95% hit rate expected).
+
+**Expected Performance Improvement**: 15-20% throughput increase (54.60 → ~63-66 Mops/s on sh8bench)
+
+**Risk Level**: Low
+- Thread-local storage eliminates cache coherency issues
+- Magic number validation provides fail-safe fallback
+- Self-contained Box with minimal integration surface
+- Memory overhead: ~104-120 bytes per thread (negligible)
+
+---
+
+## 2. Box Definition (Box Theory)
+
+```
+Box: TLS Superslab Hint Cache
+
+MISSION:
+  Cache recently-used SuperSlab references in TLS to accelerate
+  ptr→SuperSlab resolution in Headerless mode, avoiding expensive
+  hash table lookups on the critical free() path.
+
+DESIGN:
+  - Provides O(1) lookup for hot SuperSlabs (L1 cache hit, 2-5 cycles)
+  - Falls back to global registry on miss (fail-safe, no data loss)
+  - No ownership, no remote queues, pure read-only cache
+  - FIFO eviction policy with configurable cache size (2-4 slots)
+
+INVARIANTS:
+  - hint.base <= ptr < hint.end implies hint.ss is valid
+  - Miss is always safe (triggers fallback to hak_super_lookup)
+  - TLS data survives only within thread lifetime
+  - Cache entries are invalidated implicitly by FIFO rotation
+  - Magic number check (SUPERSLAB_MAGIC) validates all pointers
+
+BOUNDARY:
+  - Input: raw user pointer (void* ptr) from free() path
+  - Output: SuperSlab* or NULL (miss triggers fallback)
+  - Does NOT determine class_idx (that's slab_index_for's job)
+  - Does NOT perform ownership validation (that's SuperSlab's job)
+
+PERFORMANCE:
+  - Cache hit: 2-5 cycles (L1 cache hit, up to 4 range checks)
+  - Cache miss: fallback to hak_super_lookup (10-50 cycles)
+  - Expected hit rate: 85-95% for single-threaded workloads
+  - Expected hit rate: 70-85% for multi-threaded workloads
+
+THREAD SAFETY:
+  - TLS storage: no sharing, no synchronization required
+  - Read-only cache: never modifies SuperSlab state
+  - Stale entries: caught by magic number check
+```
+
+---
+
+## 3. Data Structures
+
+```c
+// core/box/tls_ss_hint_box.h
+
+#ifndef TLS_SS_HINT_BOX_H
+#define TLS_SS_HINT_BOX_H
+
+#include <stdbool.h>
+#include <stddef.h>   // size_t
+#include <stdint.h>
+
+// Forward declaration
+struct SuperSlab;
+
+// Cache entry for a single SuperSlab hint
+// Size: 24 bytes on 64-bit (cache-friendly, fits in 1 cache line with metadata)
+typedef struct {
+    void* base;              // SuperSlab base address (aligned to 1MB or 2MB)
+    void* end;               // base + superslab_size (for range check)
+    struct SuperSlab* ss;    // Cached SuperSlab pointer
+} TlsSsHintEntry;
+
+// TLS hint cache configuration
+// - 4 slots provide good hit rate without excessive overhead
+// - Larger caches (8, 16) show diminishing returns in benchmarks
+// - Smaller caches (2) may thrash on workloads with 3+ active SuperSlabs
+#define TLS_SS_HINT_SLOTS 4
+
+// Thread-local SuperSlab hint cache
+// Total size (64-bit): 24*4 + 8 = 104 bytes per thread, 120 bytes with stats (negligible overhead)
+typedef struct {
+    TlsSsHintEntry entries[TLS_SS_HINT_SLOTS];  // Cache entries
+    uint32_t count;      // Number of valid entries (0 to TLS_SS_HINT_SLOTS)
+    uint32_t next_slot;  // Next slot for FIFO rotation (wraps at TLS_SS_HINT_SLOTS)
+
+    // Statistics (optional, for profiling builds)
+    // Disabled in HAKMEM_BUILD_RELEASE to save 16 bytes per thread
+    #if !HAKMEM_BUILD_RELEASE
+    uint64_t hits;    // Cache hit count
+    uint64_t misses;  // Cache miss count
+    #endif
+} TlsSsHintCache;
+
+// Thread-local storage instance
+// Initialized to zero by TLS semantics, formal init in tls_ss_hint_init()
+extern __thread TlsSsHintCache g_tls_ss_hint;
+
+#endif // TLS_SS_HINT_BOX_H
+```
+
+---
+
+## 4. API Design
+
+```c
+// core/box/tls_ss_hint_box.h (continued)
+
+/**
+ * @brief Initialize TLS hint cache for current thread
+ *
+ * Call once per thread, typically in thread-local initialization path.
+ * Safe to call multiple times (idempotent).
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~10 cycles (negligible one-time cost)
+ */
+static inline void tls_ss_hint_init(void);
+
+/**
+ * @brief Update hint cache with a SuperSlab reference
+ *
+ * Called on paths where we know the SuperSlab for a given address range:
+ * - After successful tiny_alloc (cache the allocated-from SuperSlab)
+ * - After superslab refill (cache the newly bound SuperSlab)
+ * - After unified cache refill (cache the refilled SuperSlab)
+ *
+ * Duplicate detection: If the SuperSlab is already cached, no update occurs.
+ * This prevents thrashing when repeatedly allocating from the same SuperSlab.
+ *
+ * @param ss   SuperSlab to cache (must be non-NULL, SUPERSLAB_MAGIC validated by caller)
+ * @param base SuperSlab base address (1MB or 2MB aligned)
+ * @param size SuperSlab size in bytes (1MB or 2MB)
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~15-20 cycles (duplicate check + FIFO rotation)
+ */
+static inline void tls_ss_hint_update(struct SuperSlab* ss, void* base, size_t size);
+
+/**
+ * @brief Lookup SuperSlab for given pointer (fast path)
+ *
+ * Called on free() entry, before falling back to hak_super_lookup().
+ * Performs linear search over cached entries (4 iterations max).
+ *
+ * Cache hit: Returns true, sets *out_ss to cached SuperSlab pointer
+ * Cache miss: Returns false, caller must use hak_super_lookup()
+ *
+ * @param ptr    User pointer to lookup (arbitrary alignment)
+ * @param out_ss Output: SuperSlab pointer if found (only valid if return true)
+ * @return true if cache hit (out_ss is valid), false if miss
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: 2-5 cycles (hit), 8-12 cycles (miss)
+ *
+ * NOTE: Caller MUST validate SUPERSLAB_MAGIC after successful lookup.
+ * This Box does not perform magic validation to keep fast path minimal.
+ */
+static inline bool tls_ss_hint_lookup(void* ptr, struct SuperSlab** out_ss);
+
+/**
+ * @brief Clear all cached hints (for testing/reset)
+ *
+ * Use cases:
+ *   - Unit tests: Reset cache between test cases
+ *   - Debug: Force cache cold start for profiling
+ *   - Thread teardown: Optional cleanup (TLS auto-cleanup on thread exit)
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~10 cycles
+ */
+static inline void tls_ss_hint_clear(void);
+
+/**
+ * @brief Get cache statistics (for profiling builds)
+ *
+ * Returns the calling thread's hit/miss counters for performance analysis
+ * (the counters live in TLS, so other threads are not included).
+ * Only available in non-release builds (HAKMEM_BUILD_RELEASE=0).
+ *
+ * @param hits    Output: Total cache hits
+ * @param misses  Output: Total cache misses
+ *
+ * Thread Safety: TLS, no synchronization required
+ * Performance: ~5 cycles (two loads)
+ */
+#if !HAKMEM_BUILD_RELEASE
+static inline void tls_ss_hint_stats(uint64_t* hits, uint64_t* misses);
+#endif
+```
+
+---
+
+## 5. Implementation Details
+
+```c
+// core/box/tls_ss_hint_box.c: defines the TLS instance.
+// NOTE: the static inline function bodies below must be placed in
+// tls_ss_hint_box.h (header-only Box). A static inline function defined in a
+// .c file is not visible to any other translation unit.
+
+#include "tls_ss_hint_box.h"
+#include "../hakmem_tiny_superslab.h"  // For SuperSlab, SUPERSLAB_MAGIC
+
+// Thread-local storage definition (must appear in exactly one .c file)
+__thread TlsSsHintCache g_tls_ss_hint = {0};
+
+/**
+ * Initialize TLS hint cache
+ * Safe to call multiple times: every call resets the cache to the same empty state
+ */
+static inline void tls_ss_hint_init(void) {
+    // Zero-initialized by TLS, but explicit init for clarity
+    g_tls_ss_hint.count = 0;
+    g_tls_ss_hint.next_slot = 0;
+
+    #if !HAKMEM_BUILD_RELEASE
+    g_tls_ss_hint.hits = 0;
+    g_tls_ss_hint.misses = 0;
+    #endif
+
+    // Clear all entries (paranoid, but cache-friendly loop)
+    for (int i = 0; i < TLS_SS_HINT_SLOTS; i++) {
+        g_tls_ss_hint.entries[i].base = NULL;
+        g_tls_ss_hint.entries[i].end = NULL;
+        g_tls_ss_hint.entries[i].ss = NULL;
+    }
+}
+
+/**
+ * Update hint cache with SuperSlab reference
+ * FIFO rotation: oldest entry is evicted when cache is full
+ * Duplicate detection: skip if SuperSlab already cached
+ */
+static inline void tls_ss_hint_update(struct SuperSlab* ss, void* base, size_t size) {
+    // Sanity check: reject invalid inputs
+    if (__builtin_expect(!ss || !base || size == 0, 0)) {
+        return;
+    }
+
+    // Duplicate detection: check if this SuperSlab is already cached
+    // This prevents thrashing when allocating from the same SuperSlab repeatedly
+    for (uint32_t i = 0; i < g_tls_ss_hint.count; i++) {
+        if (g_tls_ss_hint.entries[i].ss == ss) {
+            return;  // Already cached, no update needed
+        }
+    }
+
+    // Add to next slot (FIFO rotation)
+    uint32_t slot = g_tls_ss_hint.next_slot;
+    g_tls_ss_hint.entries[slot].base = base;
+    g_tls_ss_hint.entries[slot].end = (char*)base + size;
+    g_tls_ss_hint.entries[slot].ss = ss;
+
+    // Advance to next slot (wrap at TLS_SS_HINT_SLOTS)
+    g_tls_ss_hint.next_slot = (slot + 1) % TLS_SS_HINT_SLOTS;
+
+    // Increment count until cache is full
+    if (g_tls_ss_hint.count < TLS_SS_HINT_SLOTS) {
+        g_tls_ss_hint.count++;
+    }
+}
+
+/**
+ * Lookup SuperSlab for pointer (fast path)
+ * Linear search over cached entries (4 iterations max)
+ *
+ * Performance note:
+ *   - Linear search is expected to beat a hash table for small N (N <= 8)
+ *   - The per-entry range check (two compares: base <= ptr < end) costs ~2-3 cycles
+ *   - Total cost: 2-5 cycles (hit), 8-12 cycles (miss with 4 entries)
+ */
+static inline bool tls_ss_hint_lookup(void* ptr, struct SuperSlab** out_ss) {
+    // Fast path: iterate over valid entries
+    // Unrolling this loop (if count is small) is beneficial, but let compiler decide
+    for (uint32_t i = 0; i < g_tls_ss_hint.count; i++) {
+        TlsSsHintEntry* e = &g_tls_ss_hint.entries[i];
+
+        // Range check: base <= ptr < end (end is exclusive: base + size).
+        // Compare as integers: relational comparison of pointers into
+        // different objects is undefined behavior in ISO C.
+        uintptr_t p = (uintptr_t)ptr;
+        if (p >= (uintptr_t)e->base && p < (uintptr_t)e->end) {
+            // Cache hit!
+            *out_ss = e->ss;
+
+            #if !HAKMEM_BUILD_RELEASE
+            g_tls_ss_hint.hits++;
+            #endif
+
+            return true;
+        }
+    }
+
+    // Cache miss: caller must fall back to hak_super_lookup()
+    #if !HAKMEM_BUILD_RELEASE
+    g_tls_ss_hint.misses++;
+    #endif
+
+    return false;
+}
+
+/**
+ * Clear all cached hints
+ * Use for testing or manual reset
+ */
+static inline void tls_ss_hint_clear(void) {
+    g_tls_ss_hint.count = 0;
+    g_tls_ss_hint.next_slot = 0;
+
+    #if !HAKMEM_BUILD_RELEASE
+    // Preserve stats across clear (for cumulative profiling)
+    // Uncomment to reset stats:
+    // g_tls_ss_hint.hits = 0;
+    // g_tls_ss_hint.misses = 0;
+    #endif
+
+    // Optional: zero out entries (paranoid, not required for correctness)
+    for (int i = 0; i < TLS_SS_HINT_SLOTS; i++) {
+        g_tls_ss_hint.entries[i].base = NULL;
+        g_tls_ss_hint.entries[i].end = NULL;
+        g_tls_ss_hint.entries[i].ss = NULL;
+    }
+}
+
+/**
+ * Get cache statistics (profiling builds only)
+ * Counters are per-thread: this reports the calling thread only.
+ */
+#if !HAKMEM_BUILD_RELEASE
+static inline void tls_ss_hint_stats(uint64_t* hits, uint64_t* misses) {
+    if (hits) *hits = g_tls_ss_hint.hits;
+    if (misses) *misses = g_tls_ss_hint.misses;
+}
+#endif
+```
+
+---
+
+## 6. Integration Points
+
+### 6.1 Update Points: When to Call `tls_ss_hint_update()`
+
+The hint cache should be updated whenever we know the SuperSlab for an address range. This happens on allocation success paths:
+
+#### Location 1: After Successful Tiny Alloc (hakmem_tiny.c)
+```c
+// In hak_tiny_alloc or similar allocation path
+void* ptr = tiny_allocate_from_superslab(class_idx, &ss);
+if (ptr) {
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Cache the SuperSlab we just allocated from
+    // This improves free() performance for LIFO allocation patterns
+    tls_ss_hint_update(ss, ss->base_addr, ss->size_bytes);
+    #endif
+    return ptr;
+}
+```
+
+#### Location 2: After SuperSlab Refill (hakmem_tiny_refill.inc.h)
+```c
+// In tiny_refill_from_superslab or superslab_allocate
+SuperSlab* ss = superslab_allocate(class_idx);
+if (ss) {
+    // Bind SuperSlab to thread's TLS state
+    bind_superslab_to_thread(ss, class_idx);
+
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Cache the newly bound SuperSlab
+    // Later frees of blocks carved from this SuperSlab can then hit the hint
+    tls_ss_hint_update(ss, ss->base_addr, ss->size_bytes);
+    #endif
+}
+```
+
+#### Location 3: Unified Cache Refill (core/front/tiny_unified_cache.c)
+```c
+// In unified_cache_refill_class
+void* block = superslab_alloc_block(class_idx, &ss);
+if (block) {
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Cache the SuperSlab that provided this block
+    tls_ss_hint_update(ss, ss->base_addr, ss->size_bytes);
+    #endif
+
+    // Push to unified cache
+    unified_cache_push(class_idx, block);
+}
+```
+
+#### Location 4: Thread-Local Init (hakmem_tiny_tls_init)
+```c
+// In tiny_tls_init or thread_local_init
+void tiny_tls_init(void) {
+    // Initialize TLS structures
+    tiny_magazine_init();
+    tiny_sll_init();
+
+    #if HAKMEM_TINY_SS_TLS_HINT
+    // Initialize hint cache (zero-init by TLS, but explicit for clarity)
+    tls_ss_hint_init();
+    #endif
+}
+```
+
+### 6.2 Lookup Points: When to Call `tls_ss_hint_lookup()`
+
+The hint lookup should be the **first step** in the free() path, before falling back to the registry lookup:
+
+#### Location 1: Tiny Free Entry (core/hakmem_tiny_free.inc)
+```c
+// In hak_tiny_free or similar free path
+void hak_tiny_free(void* ptr) {
+    if (!ptr) return;
+
+    SuperSlab* ss = NULL;
+
+    #if HAKMEM_TINY_HEADERLESS
+    // Step 1: Try TLS hint cache (fast path, 2-5 cycles on hit)
+    #if HAKMEM_TINY_SS_TLS_HINT
+    if (!tls_ss_hint_lookup(ptr, &ss)) {
+    #endif
+        // Step 2: Fall back to global registry (slow path, 10-50 cycles)
+        ss = hak_super_lookup(ptr);
+    #if HAKMEM_TINY_SS_TLS_HINT
+    }
+    #endif
+
+    // Validate SuperSlab (magic check)
+    if (!ss || ss->magic != SUPERSLAB_MAGIC) {
+        // Invalid pointer - external guard path
+        hak_external_guard_free(ptr);
+        return;
+    }
+
+    // Proceed with free using SuperSlab info
+    int class_idx = slab_index_for(ss, ptr);
+    tiny_free_to_slab(ss, ptr, class_idx);
+
+    #else
+    // Header mode: read class_idx from header (1-3 cycles)
+    uint8_t hdr = *((uint8_t*)ptr - 1);
+    int class_idx = hdr & 0x7;
+    tiny_free_to_class(class_idx, ptr);
+    #endif
+}
+```
+
+#### Location 2: Fast Free Path (core/tiny_free_fast_v2.inc.h)
+```c
+// In tiny_free_fast or inline free path
+static inline void tiny_free_fast(void* ptr) {
+    #if HAKMEM_TINY_HEADERLESS
+    SuperSlab* ss = NULL;
+
+    // Try hint cache first
+    #if HAKMEM_TINY_SS_TLS_HINT
+    if (!tls_ss_hint_lookup(ptr, &ss)) {
+    #endif
+        ss = hak_super_lookup(ptr);
+    #if HAKMEM_TINY_SS_TLS_HINT
+    }
+    #endif
+
+    if (__builtin_expect(!ss || ss->magic != SUPERSLAB_MAGIC, 0)) {
+        // Slow path: external guard or invalid pointer
+        hak_tiny_free_slow(ptr);
+        return;
+    }
+
+    // Fast path: push to TLS freelist
+    int class_idx = slab_index_for(ss, ptr);
+    front_gate_push_tls(class_idx, ptr);
+
+    #else
+    // Header mode fast path
+    uint8_t hdr = *((uint8_t*)ptr - 1);
+    int class_idx = hdr & 0x7;
+    front_gate_push_tls(class_idx, ptr);
+    #endif
+}
+```
+
+---
+
+## 7. Environment Variable
+
+```c
+// In hakmem_build_flags.h or similar configuration header
+
+// ============================================================================
+// Phase 1: Headerless Optimization - TLS SuperSlab Hint Cache
+// ============================================================================
+// Purpose: Accelerate ptr→SuperSlab lookup in Headerless mode
+// Default: 0 (disabled during development and testing)
+// Target:  1 (enabled after validation in Phase 1 rollout)
+//
+// Performance Impact:
+//   - Cache hit: 2-5 cycles (vs 10-50 cycles for hak_super_lookup)
+//   - Expected hit rate: 85-95% (single-threaded), 70-85% (multi-threaded)
+//   - Expected throughput improvement: 15-20%
+//
+// Memory Overhead:
+//   - 104 bytes per thread (TLS, release build on 64-bit; 120 with stats)
+//   - Negligible for typical workloads (1000 threads = ~104 KB)
+//
+// Dependencies:
+//   - Requires HAKMEM_TINY_HEADERLESS=1 (enabling the hint in header mode is
+//     a compile-time error, see the check below)
+//   - No other dependencies (self-contained Box)
+
+#ifndef HAKMEM_TINY_SS_TLS_HINT
+    #define HAKMEM_TINY_SS_TLS_HINT 0
+#endif
+
+// Validation: Hint Box only active in Headerless mode
+#if HAKMEM_TINY_SS_TLS_HINT && !HAKMEM_TINY_HEADERLESS
+    #error "HAKMEM_TINY_SS_TLS_HINT requires HAKMEM_TINY_HEADERLESS=1"
+#endif
+```
+
+---
+
+## 8. Testing Plan
+
+### 8.1 Unit Tests
+
+Create `/mnt/workdisk/public_share/hakmem/tests/test_tls_ss_hint.c`:
+
+```c
+#undef NDEBUG  // Keep assert() active: several asserts below call the functions under test
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include "core/box/tls_ss_hint_box.h"
+#include "core/hakmem_tiny_superslab.h"
+
+// Mock SuperSlab for testing
+typedef struct {
+    uint32_t magic;
+    void* base_addr;
+    size_t size_bytes;
+    uint8_t size_class;
+} MockSuperSlab;
+
+void test_hint_init(void) {
+    printf("test_hint_init...\n");
+
+    tls_ss_hint_init();
+
+    // Verify cache is empty
+    assert(g_tls_ss_hint.count == 0);
+    assert(g_tls_ss_hint.next_slot == 0);
+
+    #if !HAKMEM_BUILD_RELEASE
+    assert(g_tls_ss_hint.hits == 0);
+    assert(g_tls_ss_hint.misses == 0);
+    #endif
+
+    printf("  PASS\n");
+}
+
+void test_hint_basic(void) {
+    printf("test_hint_basic...\n");
+
+    tls_ss_hint_init();
+
+    // Mock SuperSlab
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,  // 2MB
+        .size_class = 0
+    };
+
+    // Update hint
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    // Verify cache entry
+    assert(g_tls_ss_hint.count == 1);
+    assert(g_tls_ss_hint.entries[0].base == ss.base_addr);
+    assert(g_tls_ss_hint.entries[0].ss == (SuperSlab*)&ss);
+
+    // Lookup should hit (within range)
+    SuperSlab* out = NULL;
+    assert(tls_ss_hint_lookup((void*)0x1000100, &out) == true);
+    assert(out == (SuperSlab*)&ss);
+
+    // Lookup at base should hit
+    assert(tls_ss_hint_lookup((void*)0x1000000, &out) == true);
+    assert(out == (SuperSlab*)&ss);
+
+    // Lookup at end-1 should hit (end = 0x1000000 + 0x200000 = 0x1200000)
+    assert(tls_ss_hint_lookup((void*)0x11FFFFF, &out) == true);
+    assert(out == (SuperSlab*)&ss);
+
+    // Lookup at end should miss (exclusive boundary)
+    assert(tls_ss_hint_lookup((void*)0x1200000, &out) == false);
+
+    // Lookup outside range should miss
+    assert(tls_ss_hint_lookup((void*)0x3000000, &out) == false);
+
+    printf("  PASS\n");
+}
+
+void test_hint_fifo_rotation(void) {
+    printf("test_hint_fifo_rotation...\n");
+
+    tls_ss_hint_init();
+
+    // Create 6 mock SuperSlabs (cache has 4 slots)
+    MockSuperSlab ss[6];
+    for (int i = 0; i < 6; i++) {
+        ss[i].magic = SUPERSLAB_MAGIC;
+        ss[i].base_addr = (void*)(uintptr_t)(0x1000000 + i * 0x200000);  // 2MB apart
+        ss[i].size_bytes = 2 * 1024 * 1024;
+        ss[i].size_class = 0;
+
+        tls_ss_hint_update((SuperSlab*)&ss[i], ss[i].base_addr, ss[i].size_bytes);
+    }
+
+    // Cache should be full (4 slots)
+    assert(g_tls_ss_hint.count == TLS_SS_HINT_SLOTS);
+
+    // First 2 SuperSlabs should be evicted (FIFO)
+    SuperSlab* out = NULL;
+    assert(tls_ss_hint_lookup((void*)0x1000100, &out) == false);  // ss[0] evicted
+    assert(tls_ss_hint_lookup((void*)0x1200100, &out) == false);  // ss[1] evicted
+
+    // Last 4 SuperSlabs should be cached
+    assert(tls_ss_hint_lookup((void*)0x1400100, &out) == true);  // ss[2]
+    assert(out == (SuperSlab*)&ss[2]);
+    assert(tls_ss_hint_lookup((void*)0x1600100, &out) == true);  // ss[3]
+    assert(out == (SuperSlab*)&ss[3]);
+    assert(tls_ss_hint_lookup((void*)0x1800100, &out) == true);  // ss[4]
+    assert(out == (SuperSlab*)&ss[4]);
+    assert(tls_ss_hint_lookup((void*)0x1A00100, &out) == true);  // ss[5]
+    assert(out == (SuperSlab*)&ss[5]);
+
+    printf("  PASS\n");
+}
+
+void test_hint_duplicate_detection(void) {
+    printf("test_hint_duplicate_detection...\n");
+
+    tls_ss_hint_init();
+
+    // Mock SuperSlab
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,
+        .size_class = 0
+    };
+
+    // Update hint 3 times with same SuperSlab
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    // Cache should have only 1 entry (duplicates ignored)
+    assert(g_tls_ss_hint.count == 1);
+    assert(g_tls_ss_hint.entries[0].ss == (SuperSlab*)&ss);
+
+    printf("  PASS\n");
+}
+
+void test_hint_clear(void) {
+    printf("test_hint_clear...\n");
+
+    tls_ss_hint_init();
+
+    // Add some entries
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,
+        .size_class = 0
+    };
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    assert(g_tls_ss_hint.count == 1);
+
+    // Clear cache
+    tls_ss_hint_clear();
+
+    // Cache should be empty
+    assert(g_tls_ss_hint.count == 0);
+    assert(g_tls_ss_hint.next_slot == 0);
+
+    // Lookup should miss
+    SuperSlab* out = NULL;
+    assert(tls_ss_hint_lookup((void*)0x1000100, &out) == false);
+
+    printf("  PASS\n");
+}
+
+#if !HAKMEM_BUILD_RELEASE
+void test_hint_stats(void) {
+    printf("test_hint_stats...\n");
+
+    tls_ss_hint_init();
+
+    // Add entry
+    MockSuperSlab ss = {
+        .magic = SUPERSLAB_MAGIC,
+        .base_addr = (void*)0x1000000,
+        .size_bytes = 2 * 1024 * 1024,
+        .size_class = 0
+    };
+    tls_ss_hint_update((SuperSlab*)&ss, ss.base_addr, ss.size_bytes);
+
+    // Perform lookups
+    SuperSlab* out = NULL;
+    tls_ss_hint_lookup((void*)0x1000100, &out);  // Hit
+    tls_ss_hint_lookup((void*)0x1000200, &out);  // Hit
+    tls_ss_hint_lookup((void*)0x3000000, &out);  // Miss
+
+    // Check stats
+    uint64_t hits = 0, misses = 0;
+    tls_ss_hint_stats(&hits, &misses);
+
+    assert(hits == 2);
+    assert(misses == 1);
+
+    printf("  PASS\n");
+}
+#endif
+
+int main(void) {
+    printf("Running TLS SS Hint Box unit tests...\n\n");
+
+    test_hint_init();
+    test_hint_basic();
+    test_hint_fifo_rotation();
+    test_hint_duplicate_detection();
+    test_hint_clear();
+
+    #if !HAKMEM_BUILD_RELEASE
+    test_hint_stats();
+    #endif
+
+    printf("\nAll tests passed!\n");
+    return 0;
+}
+```
+
+### 8.2 Integration Tests
+
+#### Test 1: Build Validation
+```bash
+# Step 1: Build with hint disabled (baseline)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+
+# Step 2: Build with hint enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Step 3: Verify that enabling the hint in header mode fails to build (should error)
+# make clean
+# make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=0 -DHAKMEM_TINY_SS_TLS_HINT=1"
+# Expected: Compile error (validation check in hakmem_build_flags.h)
+```
+
+#### Test 2: Benchmark Comparison
+```bash
+# Build baseline (hint disabled)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
+
+# Run benchmarks
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench > baseline.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 17545186520809 > cfrac_baseline.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8 > larson_baseline.txt
+
+# Build with hint enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Run same benchmarks
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench > hint.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 17545186520809 > cfrac_hint.txt
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8 > larson_hint.txt
+
+# Compare results
+echo "=== sh8bench ==="
+grep "Mops" baseline.txt hint.txt
+
+echo "=== cfrac ==="
+grep "time:" cfrac_baseline.txt cfrac_hint.txt
+
+echo "=== larson ==="
+grep "ops/s" larson_baseline.txt larson_hint.txt
+```
+
+#### Test 3: Hit Rate Profiling
+```bash
+# Build with stats enabled (non-release)
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1 -DHAKMEM_BUILD_RELEASE=0"
+
+# Add stats dump at exit (in hakmem_exit.c or similar; needs <inttypes.h>).
+# The counters are per-thread TLS, so this reports only the thread that runs it.
+# void dump_hint_stats(void) {
+#     uint64_t hits = 0, misses = 0;
+#     tls_ss_hint_stats(&hits, &misses);
+#     uint64_t total = hits + misses;
+#     fprintf(stderr, "[TLS_HINT_STATS] hits=%" PRIu64 " misses=%" PRIu64 " hit_rate=%.2f%%\n",
+#             hits, misses, total ? 100.0 * (double)hits / (double)total : 0.0);
+# }
+
+# Run benchmark and check hit rate
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | grep TLS_HINT_STATS
+# Expected: hit_rate >= 85%
+```
+
+### 8.3 Correctness Tests
+
+```bash
+# Test with external pointer (should fall back to hak_super_lookup)
+# This tests that cache misses are handled correctly
+
+# Build with hint enabled
+make clean
+make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
+
+# Run sh8bench (allocates from multiple SuperSlabs)
+LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
+
+# No crashes or assertion failures = success
+echo "Correctness test passed"
+```
+
+---
+
+## 9. Performance Expectations
+
+### 9.1 Cycle Count Analysis
+
+| Operation | Without Hint | With Hint (Hit) | With Hint (Miss) | Improvement |
+|-----------|-------------|----------------|-----------------|-------------|
+| free() lookup | 10-50 cycles | 2-5 cycles | 10-50 cycles | 80-95% |
+| Range check (per entry) | N/A | 2 cycles | 2 cycles | - |
+| Hash table lookup | 10-50 cycles | N/A | 10-50 cycles | - |
+| Total free() cost | 15-60 cycles | 7-15 cycles (hit) | 20-65 cycles (miss) | 40-60% |
+
+### 9.2 Expected Hit Rates
+
+| Workload | Hit Rate | Reasoning |
+|----------|----------|-----------|
+| Single-threaded LIFO | 95-99% | free() immediately after alloc() from same SuperSlab |
+| Single-threaded FIFO | 85-95% | Recent allocations from 2-4 SuperSlabs |
+| Multi-threaded (8 threads) | 70-85% | Shared SuperSlabs, more cache thrashing |
+| Larson (high churn) | 65-80% | Many active SuperSlabs, frequent evictions |
+
+### 9.3 Benchmark Targets
+
+| Benchmark | Baseline (no hint) | Target (with hint) | Improvement |
+|-----------|-------------------|-------------------|-------------|
+| sh8bench | 54.60 Mops/s | 64-68 Mops/s | +15-20% |
+| cfrac | 1.25 sec | 1.10-1.15 sec | +10-15% |
+| larson (8 threads) | 6.5M ops/s | 7.5-8.0M ops/s | +15-20% |
+
+### 9.4 Memory Overhead
+
+| Metric | Value | Notes |
+|--------|-------|-------|
+| Per-thread overhead | 104 bytes | TLS cache (release build, 64-bit pointers) |
+| Per-thread overhead (debug) | 120 bytes | TLS cache + stats counters |
+| 1000 threads | ~104 KB | Negligible for server workloads |
+| 10000 threads | ~1.04 MB | Still negligible |
+
+---
+
+## 10. Risk Analysis
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| **Cache coherency issues** | Very Low | Low | TLS is thread-local, no sharing between threads |
+| **Stale hint after munmap** | Low | Low | Magic check (SUPERSLAB_MAGIC) catches freed SuperSlabs only while their memory is still mapped; reading `ss->magic` of an unmapped SuperSlab faults |
+| **Cache thrashing (many SS)** | Low | Low | 4 slots cover typical workloads; miss falls back to registry |
+| **Memory overhead** | Very Low | Very Low | 104 bytes/thread, negligible for most workloads |
+| **Integration bugs** | Low | Medium | Self-contained Box, clear API, comprehensive tests |
+| **Hit rate lower than expected** | Low | Low | Even a 50% hit rate improves performance; a miss only adds the linear scan (~5 cycles, see 9.1) before the registry fallback |
+| **Complexity increase** | Low | Low | 150 LOC, header-only Box, minimal dependencies |
+
+### 10.1 Failure Modes and Recovery
+
+| Failure Mode | Detection | Recovery |
+|-------------|-----------|----------|
+| Stale SuperSlab pointer | Magic check (SUPERSLAB_MAGIC != expected) | Fall back to hak_super_lookup() |
+| Cache miss | tls_ss_hint_lookup returns false | Fall back to hak_super_lookup() |
+| Invalid hint range | ptr outside [base, end) | Linear search continues, eventually misses |
+| Thread teardown | TLS block released at thread exit | No manual cleanup needed |
+| SuperSlab freed | Magic number cleared | Caught by magic check in free() path (only while the memory is still mapped) |
+
+---
+
+## 11. Future Considerations
+
+### 11.1 Phase 2 Integration: Global Class Map
+
+When Phase 2 introduces a Global Class Map (pointer → class_idx lookup), the TLS Hint Box becomes the first tier in a three-tier lookup hierarchy:
+
+```
+Tier 1 (fastest): TLS Hint Cache (2-5 cycles, 85-95% hit rate)
+                  ↓ miss
+Tier 2 (medium):  Global Class Map (5-15 cycles, 99%+ hit rate)
+                  ↓ miss
+Tier 3 (slowest): Global SuperSlab Registry (10-50 cycles, 100% hit rate)
+```
+
+**Integration point**:
+```c
+SuperSlab* ss = NULL;
+int class_idx = -1;
+
+// Tier 1: TLS hint (caller must still validate SUPERSLAB_MAGIC, see Section 4)
+#if HAKMEM_TINY_SS_TLS_HINT
+if (tls_ss_hint_lookup(ptr, &ss) && ss->magic == SUPERSLAB_MAGIC) {
+    class_idx = slab_index_for(ss, ptr);
+    goto found;
+}
+#endif
+
+// Tier 2: Global class map
+#if HAKMEM_TINY_CLASS_MAP
+class_idx = class_map_lookup(ptr);
+if (class_idx >= 0) {
+    ss = hak_super_lookup(ptr);  // Still need SS for metadata
+    goto found;
+}
+#endif
+
+// Tier 3: Registry fallback
+ss = hak_super_lookup(ptr);
+if (ss && ss->magic == SUPERSLAB_MAGIC) {
+    class_idx = slab_index_for(ss, ptr);
+    goto found;
+}
+
+// External pointer
+hak_external_guard_free(ptr);
+return;
+
+found:
+    tiny_free_to_class(class_idx, ptr);
+```
+
+### 11.2 Adaptive Cache Sizing
+
+Current design uses fixed `TLS_SS_HINT_SLOTS = 4`. Future optimization could make this adaptive:
+
+- **Workload detection**: Track hit rate over time windows
+- **Dynamic sizing**: Increase slots (4 → 8) if hit rate < 80%
+- **Memory pressure**: Decrease slots (8 → 2) if memory constrained
+
+**Implementation sketch**:
+```c
+#define TLS_SS_HINT_SLOTS_MAX 8
+
+// Sketch only: entries[] would have to be sized TLS_SS_HINT_SLOTS_MAX, and
+// lookup/update would iterate up to current_slots instead of TLS_SS_HINT_SLOTS.
+typedef struct {
+    uint32_t current_slots;  // Dynamic (2, 4, 8)
+    uint64_t hits_window;    // Hits since the last tune
+    uint64_t misses_window;  // Misses since the last tune
+} TlsSsHintAdaptive;
+
+static __thread TlsSsHintAdaptive g_tls_ss_hint_adapt = { .current_slots = 4 };
+
+void tls_ss_hint_tune(void) {
+    TlsSsHintAdaptive* a = &g_tls_ss_hint_adapt;
+    uint64_t total = a->hits_window + a->misses_window;
+    if (total == 0) return;  // No samples in this window
+
+    double hit_rate = (double)a->hits_window / (double)total;
+    if (hit_rate < 0.80 && a->current_slots < TLS_SS_HINT_SLOTS_MAX) {
+        a->current_slots *= 2;  // Grow cache
+    } else if (hit_rate > 0.95 && a->current_slots > 2) {
+        a->current_slots /= 2;  // Shrink cache
+    }
+    a->hits_window = 0;  // Start a new measurement window
+    a->misses_window = 0;
+}
+```
+
+### 11.3 LRU vs FIFO Eviction Policy
+
+Current design uses FIFO (simple, predictable). Alternative: LRU with move-to-front on hit.
+
+**LRU advantages**:
+- Better hit rate for workloads with temporal locality
+- Commonly used SuperSlabs stay cached longer
+
+**LRU disadvantages**:
+- 2-3 extra cycles per hit (move to front)
+- More complex implementation (doubly-linked list)
+
+**Benchmark before switching**: Profile sh8bench, larson, cfrac with both policies.
+
+### 11.4 Per-Class Hint Caches
+
+Current design: Single cache for all classes (4 entries, any class).
+Alternative: Per-class caches (1 entry per class, 8 entries total).
+
+**Per-class advantages**:
+- Guaranteed cache slot for each class
+- No inter-class eviction
+
+**Per-class disadvantages**:
+- Wastes space if only 2-3 classes are active
+- More TLS overhead (8 entries vs 4)
+
+**Recommendation**: Defer until benchmarks show inter-class thrashing.
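+
+For comparison, the per-class alternative above might look roughly like the following. This is a hypothetical sketch, not part of the Phase 1 design: `TINY_HINT_NUM_CLASSES = 8`, `TlsSsHintPerClass`, and the `tls_ss_hint_pc_*` functions are assumed names, and `struct SuperSlab` stays opaque because the hint never dereferences it. It also makes one trade-off visible: in Headerless mode `free()` does not know the class before the lookup, so every per-class slot still has to be scanned (8 entries instead of 4).
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+struct SuperSlab;  // Opaque: the hint only stores and returns the pointer
+
+#define TINY_HINT_NUM_CLASSES 8  // Assumption: 8 tiny classes (as in 11.4)
+
+typedef struct {
+    uintptr_t base;        // SuperSlab base address
+    uintptr_t end;         // base + size (exclusive); base == end == 0 marks an empty slot
+    struct SuperSlab* ss;  // Cached SuperSlab pointer
+} TlsSsHintPcEntry;
+
+typedef struct {
+    TlsSsHintPcEntry slot[TINY_HINT_NUM_CLASSES];  // Exactly one slot per class
+} TlsSsHintPerClass;
+
+// Zero-initialized per thread, so every slot starts empty
+static __thread TlsSsHintPerClass g_tls_ss_hint_pc;
+
+// Update: overwrite only the slot owned by class_idx; other classes keep their hints
+static inline void tls_ss_hint_pc_update(int class_idx, struct SuperSlab* ss,
+                                         void* base, size_t size) {
+    if (class_idx < 0 || class_idx >= TINY_HINT_NUM_CLASSES || !ss || !base || size == 0) {
+        return;
+    }
+    TlsSsHintPcEntry* e = &g_tls_ss_hint_pc.slot[class_idx];
+    e->base = (uintptr_t)base;
+    e->end = (uintptr_t)base + size;
+    e->ss = ss;
+}
+
+// Lookup: the class of ptr is unknown in free(), so all slots are scanned.
+// Empty slots (base == end == 0) can never satisfy base <= p < end.
+static inline bool tls_ss_hint_pc_lookup(void* ptr, struct SuperSlab** out_ss) {
+    uintptr_t p = (uintptr_t)ptr;
+    for (int i = 0; i < TINY_HINT_NUM_CLASSES; i++) {
+        const TlsSsHintPcEntry* e = &g_tls_ss_hint_pc.slot[i];
+        if (p >= e->base && p < e->end) {
+            *out_ss = e->ss;
+            return true;
+        }
+    }
+    return false;
+}
+
+// Usage: a refill for class 2 does not evict the class 1 hint
+static void tls_ss_hint_pc_example(void) {
+    struct SuperSlab* a = (struct SuperSlab*)(uintptr_t)0x1000000;  // Fake, never dereferenced
+    struct SuperSlab* b = (struct SuperSlab*)(uintptr_t)0x1200000;
+    struct SuperSlab* out = NULL;
+    tls_ss_hint_pc_update(1, a, (void*)(uintptr_t)0x1000000, 0x200000);
+    tls_ss_hint_pc_update(2, b, (void*)(uintptr_t)0x1200000, 0x200000);
+    assert(tls_ss_hint_pc_lookup((void*)(uintptr_t)0x1000100, &out) && out == a);
+    assert(tls_ss_hint_pc_lookup((void*)(uintptr_t)0x1200100, &out) && out == b);
+}
+```
+
+The scan cost is the main reason the unified 4-entry design stays the default: per-class slots remove inter-class eviction but double the number of entries a miss has to check.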
+
+### 11.5 Statistics Export API
+
+For production monitoring, export hit rate via:
+
+```c
+// Global aggregated stats (all threads)
+void hak_tls_hint_global_stats(uint64_t* total_hits, uint64_t* total_misses);
+
+// ENV-based stats dump at exit
+// HAKMEM_TLS_HINT_STATS=1 → dump to stderr at exit
+```
+
+---
+
+## 12. Implementation Checklist
+
+### 12.1 Phase 1a: Core Implementation (Week 1)
+- [ ] Create `core/box/tls_ss_hint_box.h`
+- [ ] Implement `tls_ss_hint_init()`
+- [ ] Implement `tls_ss_hint_update()`
+- [ ] Implement `tls_ss_hint_lookup()`
+- [ ] Implement `tls_ss_hint_clear()`
+- [ ] Add `HAKMEM_TINY_SS_TLS_HINT` flag to `hakmem_build_flags.h`
+- [ ] Add validation check (hint requires headerless mode)
+
+### 12.2 Phase 1b: Integration (Week 2)
+- [ ] Integrate into `hakmem_tiny_free.inc` (lookup path)
+- [ ] Integrate into `hakmem_tiny.c` (update path after alloc)
+- [ ] Integrate into `hakmem_tiny_refill.inc.h` (update path after refill)
+- [ ] Integrate into `core/front/tiny_unified_cache.c` (update path)
+- [ ] Call `tls_ss_hint_init()` in thread-local init
+
+### 12.3 Phase 1c: Testing (Week 2-3)
+- [ ] Write unit tests (`tests/test_tls_ss_hint.c`)
+- [ ] Run unit tests: `make test_tls_ss_hint && ./test_tls_ss_hint`
+- [ ] Build validation (hint disabled, hint enabled, error check)
+- [ ] Benchmark comparison (sh8bench, cfrac, larson)
+- [ ] Hit rate profiling (debug build with stats)
+- [ ] Correctness tests (no crashes, no assertion failures)
+
+### 12.4 Phase 1d: Validation (Week 3)
+- [ ] Benchmark: sh8bench (target: +15-20%)
+- [ ] Benchmark: cfrac (target: +10-15%)
+- [ ] Benchmark: larson 8 threads (target: +15-20%)
+- [ ] Hit rate analysis (target: 85-95%)
+- [ ] Memory overhead check (target: < 150 bytes/thread)
+- [ ] Regression test: Headerless=0 mode still works
+
+### 12.5 Phase 1e: Documentation (Week 3-4)
+- [ ] Update `docs/PHASE2_HEADERLESS_INSTRUCTION.md` with hint Box
+- [ ] Add Box Theory annotation to hakmem Box registry
+- [ ] Write performance analysis report (before/after comparison)
+- [ ] Update build instructions (`make shared EXTRA_CFLAGS=...`)
+
+---
+
+## 13. Rollout Plan
+
+### Stage 1: Internal Testing (Week 1-3)
+- Build with `HAKMEM_TINY_SS_TLS_HINT=1` in dev environment
+- Run full benchmark suite (mimalloc-bench)
+- Profile with perf/cachegrind (verify cycle count reduction)
+- Fix any integration bugs
+
+### Stage 2: Canary Deployment (Week 4)
+- Enable hint Box in 5% of production traffic
+- Monitor: crash rate, performance metrics, hit rate
+- A/B test: Hint ON vs Hint OFF
+
+### Stage 3: Gradual Rollout (Week 5-6)
+- 25% traffic (if canary success)
+- 50% traffic
+- 100% traffic
+
+### Stage 4: Default Enable (Week 7)
+- Change default: `HAKMEM_TINY_SS_TLS_HINT=1`
+- Update build scripts, CI/CD pipelines
+- Announce in release notes
+
+---
+
+## 14. Success Metrics
+
+| Metric | Baseline | Target | Measurement |
+|--------|----------|--------|-------------|
+| sh8bench throughput | 54.60 Mops/s | 64-68 Mops/s | +15-20% |
+| cfrac runtime | 1.25 sec | 1.10-1.15 sec | -10-15% |
+| larson throughput | 6.5M ops/s | 7.5-8.0M ops/s | +15-20% |
+| TLS hint hit rate | N/A | 85-95% | Stats API |
+| free() cycle count | 15-60 cycles | 7-15 cycles (hit) | perf/cachegrind |
+| Memory overhead | 0 | < 150 bytes/thread | sizeof(TlsSsHintCache) |
+| Crash rate | 0.001% | 0.001% (no regression) | Production monitoring |
+
+---
+
+## 15. Open Questions
+
+1. **Q**: Should we implement per-class hint caches instead of unified cache?
+   **A**: Defer until benchmarks show inter-class thrashing. Current unified design is simpler and sufficient for most workloads.
+
+2. **Q**: Should we use LRU instead of FIFO eviction?
+   **A**: Defer until benchmarks show FIFO hit rate < 80%. FIFO is simpler and avoids move-to-front cost on hits.
+
+3. **Q**: Should we make TLS_SS_HINT_SLOTS runtime-configurable?
+   **A**: No. A compile-time constant allows better optimization (loop unrolling, register allocation). Consider adaptive sizing in Phase 2 if needed.
+
+4. **Q**: Should we validate SUPERSLAB_MAGIC in tls_ss_hint_lookup()?
+   **A**: No, keep lookup minimal (2-5 cycles). The caller (free() path) must validate magic. This matches the existing design, where hak_super_lookup() also requires caller validation.
+
+5. **Q**: Should we export hit rate stats in production builds?
+   **A**: Phase 1: No (save 16 bytes/thread). Phase 2: Add global aggregated stats API for monitoring if needed.
+
+---
+
+## 16. Conclusion
+
+The TLS SuperSlab Hint Box is a low-risk, high-reward optimization that is expected to reduce the performance gap between Headerless mode and Header mode from 30% to ~15%. The design is self-contained, testable, and follows hakmem's Box Theory architecture. Expected implementation time: 3-4 weeks (including testing and validation).
+
+**Key Strengths**:
+- Minimal integration surface (5 call sites)
+- Self-contained Box (no dependencies)
+- Fail-safe fallback (miss → hak_super_lookup)
+- Low memory overhead (104 bytes/thread)
+- Proven pattern (TLS caching used in jemalloc, tcmalloc)
+
+**Next Steps**:
+1. Review this design document
+2. Approve Phase 1a implementation (core Box)
+3. Begin implementation with unit tests
+4. Benchmark and validate in dev environment
+5. Plan Phase 2 integration (Global Class Map)
+
+---
+
+**End of Design Document**