hakmem/docs/STATUS_2025_12_03_CURRENT.md
Moe Charm (CI) 2624dcce62 Add comprehensive ChatGPT handoff documentation for TLS SLL diagnosis
Created 9 diagnostic and handoff documents (48KB) to guide ChatGPT through
systematic diagnosis and fix of TLS SLL header corruption issue.

Documents Added:
- README_HANDOFF_CHATGPT.md: Master guide explaining 3-doc system
- CHATGPT_CONTEXT_SUMMARY.md: Quick facts & architecture (2-3 min read)
- CHATGPT_HANDOFF_TLS_DIAGNOSIS.md: 7-step procedure (4-8h timeline)
- GEMINI_HANDOFF_SUMMARY.md: Handoff summary for user review
- STATUS_2025_12_03_CURRENT.md: Complete project status snapshot
- TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md: Deep reference (1,150+ lines)
  - 6 root cause patterns with code examples
  - Diagnostic logging instrumentation
  - Fix templates and validation procedures
- TLS_SS_HINT_BOX_DESIGN.md: Phase 1 optimization design (1,148 lines)
- HEADERLESS_STABILITY_DEBUG_INSTRUCTIONS.md: Test environment setup
- SEGFAULT_INVESTIGATION_FOR_GEMINI.md: Original investigation notes

Problem Context:
- Baseline (Headerless OFF) crashes with [TLS_SLL_HDR_RESET]
- Error: cls=1 base=0x... got=0x31 expect=0xa1
- Blocks Phase 1 validation and Phase 2 progression

Expected Outcome:
- ChatGPT follows 7-step diagnostic process
- Root cause identified (one of 6 patterns)
- Surgical fix (1-5 lines)
- TC1 baseline completes without crashes

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 20:41:34 +09:00


Project Status - 2025-12-03

Last Updated: 2025-12-03 (Current)
Status: 🔴 CRITICAL BLOCKER - TLS SLL Header Corruption Detected
Overall Phase: Phase 1 Implementation + Phase 2 Design (Blocked)


Summary

The hakmem memory allocator project has hit a critical stability problem during Phase 1 performance benchmarking. The baseline configuration (Headerless OFF) crashes with a TLS SLL header corruption error. Because the same crash affects all configurations, the fault most likely lies in a shared code path rather than in Phase 1 specific code.


Completed Phases

Phase 0: Type Safety & Box Architecture Framework

  • Phantom Types implementation (ptr_type_box.h)
  • Pointer conversion API (ptr_conversion_box.h)
  • Root cause analysis verified (Gemini's mathematical proof)
  • Box theory framework established
  • Include order dependencies resolved (commit 2dc9d5d59)
  • Magazine Spill pointer wrapping fixed (commit f3f75ba3d)

Phase 1: Logic Centralization & Optimization (TLS Hint Box)

  • Designed TLS SuperSlab Hint Box (tls_ss_hint_box.h)
  • Implemented 5-function API (init, lookup, update, clear, stats)
  • Integrated into free path (lines 477-481, 550-555)
  • Integrated into alloc path (lines 115-122, 179-186)
  • Created 6 unit tests - ALL PASSING
  • Compiled as header-only (zero overhead when disabled)
  • ⚠️ Performance benchmarking: Only 2.3% improvement vs target 15-20%
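
To make the shape of the hint box concrete, here is a minimal sketch of a thread-local hint cache with the five operations listed above. It is not the contents of tls_ss_hint_box.h. The names (hint_entry, hint_lookup, and so on), the round-robin replacement policy, and the address-range entry layout are assumptions made for illustration; only the five operation names and the 4-slot size come from this status document.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HINT_SLOTS 4u  /* assumption: matches the 4-slot cache mentioned in this doc */

/* Hypothetical hint entry: an address range and the size class it serves. */
typedef struct {
    uintptr_t base;  /* first byte of the cached region */
    uintptr_t end;   /* one past the last byte; 0 means the slot is empty */
    int       cls;   /* size class index */
} hint_entry;

typedef struct {
    hint_entry    slot[HINT_SLOTS];
    unsigned      next;          /* round-robin replacement cursor */
    unsigned long hits, misses;  /* counters reported by hint_stats() */
} hint_cache;

static _Thread_local hint_cache g_hint;  /* one cache per thread, no locking */

static inline void hint_init(void) { memset(&g_hint, 0, sizeof g_hint); }

/* Remember that [base, base + len) belongs to size class cls. */
static inline void hint_update(const void *base, size_t len, int cls) {
    hint_entry *e = &g_hint.slot[g_hint.next];
    e->base = (uintptr_t)base;
    e->end  = e->base + len;
    e->cls  = cls;
    g_hint.next = (g_hint.next + 1u) % HINT_SLOTS;
}

/* Return the cached class for p, or -1 on a miss (caller takes the slow path). */
static inline int hint_lookup(const void *p) {
    uintptr_t a = (uintptr_t)p;
    for (unsigned i = 0; i < HINT_SLOTS; i++) {
        const hint_entry *e = &g_hint.slot[i];
        if (a >= e->base && a < e->end) { g_hint.hits++; return e->cls; }
    }
    g_hint.misses++;
    return -1;
}

/* Drop any entry whose region starts at base (e.g. when the region is released). */
static inline void hint_clear(const void *base) {
    for (unsigned i = 0; i < HINT_SLOTS; i++)
        if (g_hint.slot[i].end != 0 && g_hint.slot[i].base == (uintptr_t)base)
            memset(&g_hint.slot[i], 0, sizeof g_hint.slot[i]);
}

static inline void hint_stats(unsigned long *hits, unsigned long *misses) {
    *hits = g_hint.hits;
    *misses = g_hint.misses;
}
```

In a sketch like this, the hit/miss counters are the first thing to check when the measured gain is small: a low hit rate would suggest the slot count or replacement policy is the limit, while a high hit rate would point to a bottleneck elsewhere.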

Phase 2: Headerless Mode Design

  • Comprehensive design document (21KB)
  • All 7 task specifications documented
  • A/B toggle flag designed (HAKMEM_TINY_HEADERLESS)
  • SuperSlab Registry integration planned
  • TLS SLL validation skipping documented
  • BLOCKED: Cannot proceed - baseline instability

Current Critical Issue 🔴

Symptom

[TLS_SLL_HDR_RESET] cls=1 base=0x7ef296abf8c8 got=0x31 expect=0xa1 count=0
Segmentation fault (core dumped)

Location

  • File: core/box/tls_sll_box.h
  • Lines: 282-303
  • Function: tls_sll_pop_impl()
  • Operation: Header validation during free path
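
For readers unfamiliar with this kind of check, the following is a hedged sketch of what a one-byte header validation could look like. The split into a magic high nibble (0xA) and a class index low nibble is an assumption inferred only from the logged pair cls=1 / expect=0xa1; the real layout lives in the hakmem sources and may differ. All identifiers here are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define HDR_MAGIC      0xA0u  /* assumed magic value in the high nibble */
#define HDR_CLASS_MASK 0x0Fu  /* assumed class index in the low nibble */

/* Header byte a block of size class cls is expected to carry. */
static inline uint8_t hdr_expected(unsigned cls) {
    return (uint8_t)(HDR_MAGIC | (cls & HDR_CLASS_MASK));
}

/* Return 1 if the byte at base matches; otherwise log the mismatch and return 0. */
static inline int hdr_check(const uint8_t *base, unsigned cls) {
    uint8_t got = *base;
    uint8_t expect = hdr_expected(cls);
    if (got == expect) return 1;
    fprintf(stderr, "[HDR_MISMATCH] cls=%u base=%p got=0x%02x expect=0x%02x\n",
            cls, (const void *)base, (unsigned)got, (unsigned)expect);
    return 0;
}
```

Under this assumed layout, got=0x31 fails the check in both nibbles. 0x31 is also the ASCII code for the character '1', which is compatible with user data sitting where a header was expected, but that observation alone does not establish the cause.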

Impact

  • TC1 (Baseline) crashes after ~22 seconds of execution
  • Cannot validate Phase 1 performance improvements
  • Cannot proceed to Phase 2 implementation
  • Cannot benchmark any configuration variant

Root Cause

Unknown. The cause is suspected to be one of six documented patterns:

  1. RAW pointer vs BASE pointer type mismatch
  2. Header offset mismatch (write vs read location)
  3. Atomic fence missing (compiler/CPU reordering)
  4. Adjacent block overflow corrupting header
  5. Class index mismatch during push/pop
  6. Headerless mode interference
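
As an illustration of patterns 1 and 2 only, the sketch below shows how reading the header through the wrong pointer produces a got/expect mismatch. It assumes a one-byte header stored immediately before the user data; that layout and every function name are hypothetical, and nothing here claims this is the actual bug.

```c
#include <stdint.h>

#define HDR_SIZE 1u  /* assumption: one header byte precedes the user data */

/* Write the header at the block base and return the pointer handed to the user. */
static inline uint8_t *block_init(uint8_t *base, uint8_t header) {
    base[0] = header;
    return base + HDR_SIZE;
}

/* Correct conversion: step back over the header to recover the base. */
static inline uint8_t *base_from_user(uint8_t *user) {
    return user - HDR_SIZE;
}

/* Correct read: convert to the base first, then load the header byte. */
static inline uint8_t read_header_ok(uint8_t *user) {
    return *base_from_user(user);
}

/* Buggy read: treats the user pointer as if it were the base, so it
 * returns the first byte of user data instead of the header. */
static inline uint8_t read_header_buggy(uint8_t *user) {
    return user[0];
}
```

If the application stores the character '1' at the start of its buffer, the buggy read returns 0x31 while the correct read still returns the header written by block_init.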

Documents Created for Diagnosis

Three documents guide the diagnosis and fix:

  1. docs/CHATGPT_CONTEXT_SUMMARY.md

    • Quick facts about the problem
    • Architecture overview
    • File locations and data structures
    • Timeline estimate: 4-8 hours
  2. docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md

    • Step-by-step 7-step task breakdown
    • Detailed instructions for each phase
    • Expected validation criteria
    • Success metrics
  3. docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md (Existing, 1,150+ lines)

    • Deep dive into all 6 root cause patterns
    • Code examples for each pattern
    • Minimal test case template
    • Diagnostic logging instrumentation
    • Fix code templates
    • 7-step validation procedure

What Needs to Happen

Immediate (Blocking)

  1. [CHATGPT TASK] Diagnose TLS SLL header corruption
    • Use the three diagnostic documents
    • Follow 7-step process
    • Expected delivery: 4-8 hours
    • Success criterion: TC1 baseline completes without crashes

After Diagnosis

  2. [DEPENDS ON #1] Validate Phase 1 performance

    • Run full benchmarks (TC1, TC2, TC3)
    • Confirm TLS Hint Box improves performance
    • Identify optimization opportunities
  3. [DEPENDS ON #1] Proceed to Phase 2

    • Implement Headerless mode (ON/OFF toggle)
    • Validate alignment guarantees
    • Benchmark performance trade-offs
  4. [DEPENDS ON #1-3] Phase 102 Planning

    • Design MemApi bridge
    • Connect hakmem to nyrt Ring0 runtime

Recent Git History

ad852e5d5 - Priority-2 ENV Cache: hakmem_batch.c (added 1 variable, replaced 1 location)
b741d61b4 - Priority-2 ENV Cache: hakmem_debug.c (added 1 variable, replaced 1 location)
22a67e5ca - Priority-2 ENV Cache: hakmem_smallmid.c (added 1 variable, replaced 1 location)
f0e77a000 - Priority-2 ENV Cache: hakmem_tiny.c (replaced 3 locations)
183b10673 - Priority-2 ENV Cache: Shared Pool Release (replaced 1 location)

[Earlier commits in THIS session:]
94f9ea51  - Implement TLS SuperSlab Hint Box (Phase 1) ✅
           - Header-only implementation (256 lines)
           - 5 function APIs
           - 6 unit tests - ALL PASSING
           - Benchmarked at only 2.3% improvement

f3f75ba3d - Fix Magazine Spill RAW pointer type conversion ✅
           - Added HAK_BASE_FROM_RAW() wrapping
           - hakmem_tiny_refill.inc.h:228
           - Verified with cfrac/sh8bench

2dc9d5d59 - Fix include order in hakmem.c ✅
           - Moved hak_kpi_util.inc.h before hak_core_init.inc.h
           - Resolved undefined reference errors
           - Clean build verified

File Statistics

| Category | Count | Status |
|---|---|---|
| Core Implementation | 47 files | Compiles |
| Box Components | 15 files | Box theory applied |
| Test Suite | 23 tests | ⚠️ 6 TLS Hint tests PASS, 17 others untested due to crash |
| Documentation | 12 documents | Comprehensive |
| Build Artifacts | libhakmem.so | Generates (547 KB) |

Build Status

$ make clean && make shared -j8
✅ Compilation: SUCCESS
✅ Linking: SUCCESS
✅ Output: ./libhakmem.so (547 KB)
✅ Debug symbols: Included (-g flag)

$ LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
❌ Execution: SEGFAULT
Error: [TLS_SLL_HDR_RESET] cls=1 base=0x... got=0x31 expect=0xa1
Exit Code: 139 (SIGSEGV)
Runtime: ~22 seconds before crash

Key Metrics

| Metric | Value | Status |
|---|---|---|
| Compilation Time | 8-12 sec | Good |
| Executable Size | 547 KB | Reasonable |
| Baseline Performance | N/A | Crashes |
| Phase 1 Optimization | 2.3% | ⚠️ Below target (15-20%) |
| Code Coverage | Unknown | Pending baseline fix |

Next Steps (Clearly Defined)

For ChatGPT (Immediate Handoff)

Task: Diagnose and fix TLS SLL header corruption

Documents to Use:

  1. docs/CHATGPT_CONTEXT_SUMMARY.md - Quick reference
  2. docs/CHATGPT_HANDOFF_TLS_DIAGNOSIS.md - Step-by-step instructions
  3. docs/TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md - Deep reference

Steps:

  1. Read diagnostic documents
  2. Create minimal reproducer
  3. Add diagnostic logging
  4. Run diagnostic test
  5. Identify root cause pattern
  6. Implement surgical fix (1-5 lines)
  7. Validate with TC1 baseline test
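
For step 3, one low-overhead way to add diagnostic logging is a small per-thread ring buffer that records each push and pop, so the last events for a corrupted block can be inspected when validation fails. The sketch below is an assumption about how such instrumentation might look, not the logging described in TLS_SLL_HEADER_CORRUPTION_DIAGNOSIS.md; every identifier is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define TRACE_CAP 64u  /* ring capacity; older events are overwritten */

typedef enum { TRACE_PUSH, TRACE_POP } trace_op;

typedef struct {
    trace_op    op;      /* which list operation touched the block */
    int         cls;     /* size class used by that operation */
    const void *base;    /* block base address */
    uint8_t     header;  /* header byte observed at that moment */
} trace_event;

static _Thread_local trace_event   g_trace[TRACE_CAP];
static _Thread_local unsigned long g_trace_n;  /* total events recorded */

/* Append one event, overwriting the oldest once the ring is full. */
static inline void trace_record(trace_op op, int cls, const void *base, uint8_t header) {
    trace_event *e = &g_trace[g_trace_n % TRACE_CAP];
    e->op = op;
    e->cls = cls;
    e->base = base;
    e->header = header;
    g_trace_n++;
}

/* Most recent event still in the ring for this block, or NULL if none. */
static inline const trace_event *trace_last_for(const void *base) {
    unsigned long n = g_trace_n < TRACE_CAP ? g_trace_n : TRACE_CAP;
    for (unsigned long i = 0; i < n; i++) {
        const trace_event *e = &g_trace[(g_trace_n - 1 - i) % TRACE_CAP];
        if (e->base == base) return e;
    }
    return NULL;
}

/* Print retained events from oldest to newest. */
static inline void trace_dump(FILE *out) {
    unsigned long n = g_trace_n < TRACE_CAP ? g_trace_n : TRACE_CAP;
    unsigned long start = g_trace_n - n;
    for (unsigned long i = 0; i < n; i++) {
        const trace_event *e = &g_trace[(start + i) % TRACE_CAP];
        fprintf(out, "#%lu %s cls=%d base=%p hdr=0x%02x\n", start + i,
                e->op == TRACE_PUSH ? "push" : "pop", e->cls, e->base,
                (unsigned)e->header);
    }
}
```

Calling trace_dump(stderr) at the point where the header mismatch is detected would show whether the block was pushed with a different class than it was popped with (pattern 5) or whether the header was already wrong at push time.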

Success Criteria:

  • sh8bench runs to completion
  • cfrac runs without errors
  • No TLS_SLL_HDR_RESET errors
  • < 5% performance regression

Notes for Future Reference

Architecture Decisions Locked In

  1. Box Theory: Each component is isolated with clear APIs
  2. Phantom Types: Type safety in Debug mode, zero-cost in Release
  3. Pointer Conversion: Centralized in ptr_conversion_box.h
  4. Layout Definitions: Centralized in tiny_layout_box.h
  5. TLS SLL: Thread-local singly linked list with header validation
  6. SuperSlab Registry: Maps free pointers to class information (Phase 2)
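
The phantom type idea in decisions 2 and 3 can be sketched in plain C by wrapping each pointer kind in its own single-member struct. This is an illustration under assumptions, not the contents of ptr_type_box.h or ptr_conversion_box.h: the type names, the helper names, and the one-byte header offset are all hypothetical.

```c
#include <stdint.h>

#define HDR_SIZE 1u  /* assumption: one header byte precedes the user data */

/* Distinct wrapper types: the compiler rejects passing one where the other is expected. */
typedef struct { uint8_t *p; } base_ptr;  /* points at the block header */
typedef struct { uint8_t *p; } user_ptr;  /* points at the user data */

/* The only sanctioned conversions between the two pointer kinds. */
static inline user_ptr to_user(base_ptr b) {
    user_ptr u = { b.p + HDR_SIZE };
    return u;
}

static inline base_ptr to_base(user_ptr u) {
    base_ptr b = { u.p - HDR_SIZE };
    return b;
}

/* Header access accepts only a base_ptr, so header_of(some_user_ptr)
 * fails to compile instead of silently reading user data. */
static inline uint8_t header_of(base_ptr b) {
    return *b.p;
}
```

With optimization enabled, compilers typically pass such single-member structs the same way as the bare pointer, which is what makes a zero-cost Release build plausible; whether hakmem instead switches to raw typedefs in Release is not stated in this document.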

Known Working Patterns

  • Magazine Spill RAW→BASE wrapping (fixed)
  • Include order dependencies (fixed)
  • Unit test framework (6 TLS Hint tests passing)
  • Box header-only compilation (verified)

Known Issues Needing Diagnosis

  • TLS SLL header corruption (PRIMARY BLOCKER)
  • Phase 1 performance below target (SECONDARY - optimization opportunity)
  • Headerless mode not yet validated (DEPENDS ON PRIMARY FIX)

Handoff Status

  • All diagnostic documents prepared
  • Comprehensive step-by-step instructions created
  • Root cause patterns documented with code examples
  • Minimal test case template provided
  • Validation procedures detailed

🎯 Ready for ChatGPT handoff

Next: Pass the three documents to ChatGPT with the directive to follow the 7-step process.


Questions for Next Phase

After the fix is complete, the following should be investigated:

  1. Why does Phase 1 deliver only a 2.3% improvement against the expected 15-20%?

    • Is 4 slots enough for the cache?
    • Are there secondary bottlenecks?
    • Does perf/cachegrind show cache misses?
  2. Can Phase 2 Headerless provide better performance than Phase 1?

    • What are the trade-offs?
    • Is the SuperSlab Registry lookup overhead worth it?
  3. How does hakmem compare to mimalloc and jemalloc across different workloads?

    • Are there specific use cases where hakmem excels?
    • Where does it fall short?

Status: 🔴 CRITICAL - Awaiting ChatGPT diagnosis and fix

Estimated Resolution Time: 4-8 hours from ChatGPT engagement

Next Review: After ChatGPT completes TLS SLL diagnosis and fix