hakmem/docs/CHATGPT_CONTEXT_SUMMARY.md
Moe Charm (CI) 2624dcce62 Add comprehensive ChatGPT handoff documentation for TLS SLL diagnosis
Created 9 diagnostic and handoff documents (48KB) to guide ChatGPT through
systematic diagnosis and fix of TLS SLL header corruption issue.

Documents Added:
- README_HANDOFF_CHATGPT.md: Master guide explaining 3-doc system
- CHATGPT_CONTEXT_SUMMARY.md: Quick facts & architecture (2-3 min read)
- CHATGPT_HANDOFF_TLS_DIAGNOSIS.md: 7-step procedure (4-8h timeline)
- GEMINI_HANDOFF_SUMMARY.md: Handoff summary for user review
- STATUS_2025_12_03_CURRENT.md: Complete project status snapshot
- TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md: Deep reference (1,150+ lines)
  - 6 root cause patterns with code examples
  - Diagnostic logging instrumentation
  - Fix templates and validation procedures
- TLS_SS_HINT_BOX_DESIGN.md: Phase 1 optimization design (1,148 lines)
- HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md: Test environment setup
- SEGFAULT_INVESTIGATION_FOR_GEMINI.md: Original investigation notes

Problem Context:
- Baseline (Headerless OFF) crashes with [TLS_SLL_HDR_RESET]
- Error: cls=1 base=0x... got=0x31 expect=0xa1
- Blocks Phase 1 validation and Phase 2 progression

Expected Outcome:
- ChatGPT follows 7-step diagnostic process
- Root cause identified (one of 6 patterns)
- Surgical fix (1-5 lines)
- TC1 baseline completes without crashes

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 20:41:34 +09:00


# Context Summary for ChatGPT - TLS SLL Header Corruption Fix
**Date**: 2025-12-03
**Project**: hakmem - Custom Memory Allocator
**Handoff From**: Gemini + Task agent (previous phase)
**Current Task**: Diagnose and fix TLS SLL header corruption
**Status**: CRITICAL BLOCKER - Investigation Required

---
## Quick Facts
| Item | Value |
|------|-------|
| **Problem** | Header corruption in TLS SLL during baseline testing |
| **Error Message** | `[TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1` |
| **Error Location** | `core/box/tls_sll_box.h:282-303` |
| **Affected Configurations** | ALL (shared code path issue) |
| **Root Cause** | Unknown (6 patterns documented) |
| **Fix Type** | Surgical (1-5 lines expected) |
| **Build Status** | ✅ Succeeds |
| **Baseline Test Status** | ❌ Crashes (SIGSEGV at ~22 seconds) |
---
## What is 0x31 vs 0xa1?
```
Expected (header magic): 0xa1 = 0xa0 (HEADER_MAGIC) | 0x01 (class_idx=1)
Got (corruption): 0x31 = ASCII character '1' or some user data
This means: User data exists where header should be.
```
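The comparison above can be sketched in a few lines of C. This is only an illustration: `HEADER_MAGIC = 0xa0` comes from this summary, while the `0x0f` class mask and the helper names (`header_expected`, `header_has_magic`, `looks_like_ascii`) are assumptions, not names from the hakmem source.

```c
#include <assert.h>
#include <stdint.h>

/* 0xa0 is taken from this summary; the 0x0f mask width is an assumption. */
#define HEADER_MAGIC      0xa0u
#define HEADER_CLASS_MASK 0x0fu

/* Header byte expected at offset 0 of a block of the given class. */
static uint8_t header_expected(unsigned class_idx) {
    return (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
}

/* True if the upper nibble carries the magic, i.e. the byte looks like a header at all. */
static int header_has_magic(uint8_t byte) {
    return (byte & (uint8_t)~HEADER_CLASS_MASK) == HEADER_MAGIC;
}

/* True if the byte is printable ASCII, a hint that user data sits where the header should be. */
static int looks_like_ascii(uint8_t byte) {
    return byte >= 0x20 && byte <= 0x7e;
}

/* Walks through the values from the error message. */
static void header_demo(void) {
    assert(header_expected(1) == 0xa1);  /* expect=0xa1 for cls=1 */
    assert(!header_has_magic(0x31));     /* got=0x31 has no magic nibble */
    assert(looks_like_ascii(0x31));      /* it is printable ASCII */
    assert(0x31 == '1');                 /* specifically the character '1' */
}
```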
---
## Project Architecture (Box Theory)
The hakmem allocator uses a **Box Theory** architecture where:
- Each component (memory layout, pointer conversion, TLS state) is a separate "box"
- Each box has a single responsibility and clear API boundaries
- Examples:
  - `tiny_layout_box.h` - Class sizes and header offsets (single source of truth)
  - `ptr_conversion_box.h` - Pointer type safety (base vs user pointers)
  - `tls_sll_box.h` - Thread-local single-linked list management
  - `tls_ss_hint_box.h` - SuperSlab hint cache (Phase 1 optimization)
---
## Recent Changes (Last 5 Commits)
1. **f3f75ba3d** - "Fix Magazine Spill RAW pointer type conversion"
   - Added `HAK_BASE_FROM_RAW()` wrapper in `hakmem_tiny_refill.inc.h:228`
   - Status: ✅ Fixed
2. **2dc9d5d59** - "Fix include order in hakmem.c"
   - Moved `#include "box/hak_kpi_util.inc.h"` before `hak_core_init.inc.h`
   - Status: ✅ Fixed
3. **94f9ea51** - "Implement TLS SuperSlab Hint Box (Phase 1)"
   - New header-only cache for recently-used SuperSlabs
   - Status: ✅ Implemented, but only 2.3% performance improvement (target was 15-20%)
4. Earlier: Box theory framework, phantom types, etc.
---
## The Remaining Issue: TLS SLL Header Corruption
### Symptom
```bash
# Build succeeds
$ make clean && make shared -j8
Building libhakmem.so... OK (547KB)
# But baseline test crashes
$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0
Segmentation fault (core dumped)
```
### Timeline
- **When Discovered**: During Phase 1 benchmarking (2025-12-03)
- **Frequency**: 100% reproducible with sh8bench
- **Scope**: Affects baseline (Headerless OFF), so affects all configurations
### Error Location
**File**: `core/box/tls_sll_box.h` (lines 282-303)
**Function**: `tls_sll_pop_impl()`
**Operation**: Header validation on pop (reads the header byte at offset 0 of the block)
```c
// Simplified logic (actual code has more details)
if (tiny_class_preserves_header(class_idx)) {
    uint8_t* b = (uint8_t*)raw_base;
    uint8_t got = *b;  // Read byte at offset 0
    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
    if (got != expected) {
        fprintf(stderr, "[TLS_SLL_HDR_RESET] cls=%d base=%p got=0x%02x expect=0x%02x\n",
                class_idx, raw_base, got, expected);
        // Reset TLS SLL for this class
    }
}
```
### Root Cause - Six Documented Patterns
The diagnostic document identifies six possible patterns:
1. **RAW Pointer vs BASE Pointer** - Wrong pointer type passed to tls_sll_push()
2. **Header Offset Mismatch** - Writing at one offset, reading at another
3. **Atomic Fence Missing** - Compiler/CPU reordering of write + push
4. **Adjacent Block Overflow** - User data from previous block overwrites header
5. **Class Index Mismatch** - Push with one class_idx, pop as different class_idx
6. **Headerless Mode Interference** - Mixed header/headerless logic despite OFF flag
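
As an illustration of how Pattern 1 could produce the observed `got=0x31`, the toy model below pushes a user pointer where a base pointer is expected. It is a sketch under stated assumptions, not a diagnosis: the one-byte `USER_OFFSET`, the `0x0f` mask, the `"1234"` payload, and the helper names are all hypothetical and chosen only to mirror the log line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEADER_MAGIC      0xa0u  /* from this summary */
#define HEADER_CLASS_MASK 0x0fu  /* assumed mask width */
#define USER_OFFSET       1      /* "typically base + 1" per this summary */

/* Mirrors the pop-side check: compare the byte at offset 0 with the expected header. */
static int header_ok(const uint8_t *base, unsigned class_idx) {
    uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
    return base[0] == expected;
}

/* The byte the validator would report as "got" if ptr were treated as a base pointer. */
static uint8_t byte_seen_as_header(const uint8_t *ptr) {
    return ptr[0];
}

static void wrong_pointer_demo(void) {
    uint8_t block[16] = {0};
    unsigned class_idx = 1;

    /* Allocation side: write the header at base, hand out base + USER_OFFSET. */
    block[0] = (uint8_t)(HEADER_MAGIC | class_idx);
    uint8_t *user = block + USER_OFFSET;
    memcpy(user, "1234", 4);  /* hypothetical application payload */

    /* Correct free path: convert back to base before validating. */
    assert(header_ok(user - USER_OFFSET, class_idx));

    /* Buggy free path: the user pointer is treated as a base pointer. */
    assert(!header_ok(user, class_idx));
    assert(byte_seen_as_header(user) == 0x31);  /* ASCII '1', same value as in the log */
}
```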
---
## Your Task
**You have two comprehensive documents**:

1. **`docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md`** (THIS FILE'S COMPANION)
   - Step-by-step task breakdown
   - 7-step investigation and fix process
   - Expected validation criteria
2. **`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`** (MAIN REFERENCE - 1,150+ LINES)
   - Deep dive into all 6 root cause patterns
   - Code examples for each pattern
   - Minimal test case template
   - Diagnostic logging instrumentation
   - Fix code templates
   - 7-step validation procedure

**Follow the handoff document's steps 1-7 to diagnose and fix this issue.**

---
## Build & Test Commands
### Quick Build
```bash
cd /mnt/workdisk/public_share/hakmem
make clean
make shared -j8
```
### Baseline Test (Should Currently Crash)
```bash
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench 2>&1 | \
grep -E "TLS_SLL_HDR_RESET|Total|Segmentation"
```
### Minimal Test Case (After Creation)
```bash
./tests/test_tls_sll_minimal 2>&1 | grep -E "TLS_SLL_HDR_RESET|PASS|FAIL"
```
---
## Important File Locations
| Path | Purpose |
|------|---------|
| `core/box/tls_sll_box.h` | TLS SLL implementation (error source) |
| `core/hakmem_tiny_free.inc` | Free path - where headers are written |
| `core/hakmem_tiny_refill.inc.h` | Magazine spill - recent fix location |
| `core/box/ptr_conversion_box.h` | Pointer type conversion |
| `core/box/tiny_layout_box.h` | Class layout definitions |
| `core/box/tls_ss_hint_box.h` | Phase 1 optimization (new) |
| `docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md` | YOUR MAIN REFERENCE |
---
## Key Data Structures
### TLS SLL Header Structure
```c
typedef struct {
    uint8_t hdr;       // Header: 0xa0 | class_idx
    uint8_t pad;       // Padding/metadata
    uint16_t _unused;  // Alignment
    SuperSlab* next;   // Pointer to next SuperSlab
} TlsSllEntry;
```
### Header Validation
```c
// Expected value for class 1:
uint8_t expected = 0xa0 | 1;  // = 0xa1
// What we're seeing:
uint8_t got = 0x31;           // some user data
// This means the header was never written OR was overwritten
```
---
## Pointer Types in hakmem
The codebase distinguishes between:
```c
// hak_base_ptr_t: "base pointer", points to the start of the allocation (includes header)
// hak_user_ptr_t: "user pointer", points to user data (after offset adjustment)
//
// Conversion:
//   user = base + tiny_user_offset(class_idx)   (typically base + 1)
//   base = user - tiny_user_offset(class_idx)   (typically user - 1)
```
**Critical**: In Headerless mode, the offset is 0, so base == user.
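
A minimal sketch of the base/user distinction, assuming struct-wrapped pointers as the type-safety mechanism. The `demo_*` names are hypothetical stand-ins: the real types and `tiny_user_offset(class_idx)` live in `ptr_conversion_box.h` and `tiny_layout_box.h` and may be defined differently. The `headerless` flag here replaces the class-based offset lookup purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for hak_base_ptr_t / hak_user_ptr_t. Wrapping the raw
 * pointer in distinct structs makes mixing the two a compile-time type error. */
typedef struct { uint8_t *p; } demo_base_ptr_t;
typedef struct { uint8_t *p; } demo_user_ptr_t;

/* Assumed offset rule: one header byte normally, zero in Headerless mode. */
static size_t demo_user_offset(int headerless) {
    return headerless ? 0u : 1u;
}

static demo_user_ptr_t demo_base_to_user(demo_base_ptr_t b, int headerless) {
    demo_user_ptr_t u = { b.p + demo_user_offset(headerless) };
    return u;
}

static demo_base_ptr_t demo_user_to_base(demo_user_ptr_t u, int headerless) {
    demo_base_ptr_t b = { u.p - demo_user_offset(headerless) };
    return b;
}

static void ptr_conversion_demo(void) {
    uint8_t block[16];
    demo_base_ptr_t base = { block };

    /* Header mode: user sits one byte past base, and the round trip restores base. */
    demo_user_ptr_t user = demo_base_to_user(base, 0);
    assert(user.p == block + 1);
    assert(demo_user_to_base(user, 0).p == block);

    /* Headerless mode: offset is 0, so base == user. */
    assert(demo_base_to_user(base, 1).p == block);
}
```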
---
## Known Good Patterns (For Reference)
From previous fixes:
```c
// Pattern: Wrapping RAW pointer before TLS SLL push (ALREADY FIXED)
void* p = mag->items[--mag->top].ptr;          // RAW pointer (user offset)
hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);  // Wrap to base pointer
if (!tls_sll_push(class_idx, base_p, cap)) {   // Push base pointer
    // ... push-failure handling elided in this summary
}

// Pattern: Consistent include order (ALREADY FIXED)
#include "box/hak_kpi_util.inc.h"  // Must come first
#include "hak_core_init.inc.h"     // Must come after
```
---
## Success Criteria
| Criteria | Status |
|----------|--------|
| TLS SLL Header Corruption diagnosed | ❌ In progress |
| Root cause pattern identified | ❌ In progress |
| Minimal reproducer created | ❌ In progress |
| Fix implemented | ❌ In progress |
| sh8bench runs without errors | ❌ GOAL |
| cfrac runs without errors | ❌ GOAL |
| No performance regression | ❌ GOAL |
---
## Previous Phase Context
This project has gone through several phases:
- **Phase 0**: Initial implementation (completed)
- **Phase 1**: TLS SuperSlab Hint Box optimization (implemented, needs validation)
- **Phase 2**: Headerless mode (designed, blocked by current issue)
- **Phase 102**: MemApi bridge (future)

The current issue blocks validation of Phase 1 and progression to Phase 2.

---
## Timeline Estimate
- **Step 1 (Read guide)**: 15-30 min
- **Step 2-3 (Setup + logging)**: 1-2 hours
- **Step 4 (Diagnostic run)**: 30 min
- **Step 5 (Pattern matching)**: 1 hour
- **Step 6 (Fix implementation)**: 30 min - 1 hour
- **Step 7 (Validation)**: 1-2 hours

**Total**: 4-8 hours expected

---
## Next: Start Investigation
👉 **Next Action**: Read `docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md` and follow steps 1-7.

The comprehensive diagnostic guide (`docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md`) contains all the details you need for each pattern and debugging technique.

**Questions or blockers?** The diagnostic guide has extensive explanations for each pattern.

---
**You're now ready to begin the investigation. Good luck! 🚀**