Phase 8 Complete: BenchFast crash root cause fixes
Documentation updates:
1. CURRENT_TASK.md:
- Phase 8 complete (TLS→Atomic + Header write fixes)
- Box Theory (箱理論) root cause analysis (3 critical bugs)
- Next phase recommendations (Option C: BenchFast pool expansion)
- Detailed technical explanations for each layer
2. .claude/claude.md:
- Phase 8 achievement summary
- Box Theory (箱理論) 4-principle validation
- Commit references (191e65983, da8f4d2c8)
Key Fixes Documented:
- TLS→Atomic: Cross-thread guard variable (pthread_once bug)
- Header Write: Direct write bypasses P3 optimization (free routing)
- Infrastructure Isolation: __libc_calloc for cache arrays
- Design Fix: Removed unified_cache_init() call
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Claude AI Instructions for HAKMEM Project
🚫 CRITICAL: LD_PRELOAD Benchmark Ban
NEVER suggest or attempt LD_PRELOAD benchmarking
Rule
If you encounter segfaults or issues with LD_PRELOAD:
- STOP immediately - Do not debug
- Remind user: "LD_PRELOAD benchmarks are banned (see LD_PRELOAD_BENCHMARK_BAN.md)"
- Suggest alternative: Use direct linking instead
Why This Rule Exists
- Previous sessions wasted hours debugging LD_PRELOAD segfaults
- Problem is NOT in HAKMEM - it's a glibc limitation
- Industry-wide issue affecting tcmalloc, jemalloc, mimalloc, hardened_malloc
- Trade-off: Making LD_PRELOAD safe requires mincore() checks, which cost a 6.4x performance loss. That is unacceptable.
Correct Approach
# ✅ ALWAYS USE THIS
gcc -o bench bench.c libhakmem.a -lpthread
./bench
# ❌ NEVER USE THIS FOR BENCHMARKING
LD_PRELOAD=./libhakmem.so ./bench
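For illustration, the timing loop inside a directly linked bench.c might look like the sketch below. This is not the project's actual benchmark: the function name `bench_malloc_free`, its parameters, and the error convention are all hypothetical. Because the binary is linked against libhakmem.a, the plain `malloc`/`free` calls resolve to the allocator at link time and no LD_PRELOAD interposition is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from HAKMEM): allocate and free `size` bytes
 * `iters` times and return the elapsed wall-clock time in seconds.
 * Returns -1.0 if size is 0 or an allocation fails. */
double bench_malloc_free(size_t iters, size_t size) {
    struct timespec t0, t1;

    if (size == 0) {
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) {
        volatile unsigned char *p = malloc(size);
        if (p == NULL) {
            return -1.0;
        }
        p[0] = (unsigned char)i;  /* touch the block so the pair is not optimized away */
        free((void *)p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

A `main()` that calls this helper with a few sizes and prints the result would complete the file.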
Reference
See LD_PRELOAD_BENCHMARK_BAN.md for full details including:
- WebSearch evidence (hardened_malloc #98, mimalloc #21, Stack Overflow)
- Historical attempts (Phase 6.15, Phase 8.2)
- Technical root causes (dlsym recursion, printf malloc dependency, glibc edge cases)
Project Context
HAKMEM is a high-performance malloc replacement with:
- L0 Tiny Pool (≤1KiB): TLS magazine + TLS Active Slab
- L1 Mid Pool (1-16KiB): Thread-local cache
- L2 Pool (16-256KiB): Sharded locks + remote free rings
- L2.5 Pool (256KiB-2MiB): Size-class caching
- L3 BigCache (>2MiB): mmap with batch madvise
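The size ranges above can be read as a simple dispatch on the request size. The sketch below is only an illustration of that reading: the enum, the constant names, and `tier_for_size` are hypothetical, and treating each upper bound as inclusive is an assumption (the list only states it explicitly for ≤1KiB and >2MiB).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tier boundaries taken from the list above.
 * Upper bounds are assumed to be inclusive. */
#define KIB       ((size_t)1024)
#define TINY_MAX  (1 * KIB)      /* L0:   <= 1 KiB   */
#define MID_MAX   (16 * KIB)     /* L1:   <= 16 KiB  */
#define POOL_MAX  (256 * KIB)    /* L2:   <= 256 KiB */
#define L25_MAX   (2048 * KIB)   /* L2.5: <= 2 MiB   */

static_assert(TINY_MAX < MID_MAX && MID_MAX < POOL_MAX && POOL_MAX < L25_MAX,
              "tier boundaries must be strictly increasing");

typedef enum {
    TIER_L0_TINY,
    TIER_L1_MID,
    TIER_L2_POOL,
    TIER_L2_5_POOL,
    TIER_L3_BIGCACHE
} hak_tier_t;

/* Map a request size in bytes to the tier that would serve it. */
hak_tier_t tier_for_size(size_t n) {
    if (n <= TINY_MAX) return TIER_L0_TINY;
    if (n <= MID_MAX)  return TIER_L1_MID;
    if (n <= POOL_MAX) return TIER_L2_POOL;
    if (n <= L25_MAX)  return TIER_L2_5_POOL;
    return TIER_L3_BIGCACHE;
}
```

Under these assumptions a 1024-byte request lands in L0, a 1025-byte request in L1, and anything above 2 MiB in L3.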
Current focus: Performance optimization and memory overhead reduction.
Phase 8 Complete (2025-11-30)
Achievement: BenchFast crash root cause fixes, via Box Theory (箱理論) analysis
Key Fixes:
- TLS→Atomic: Guard variable is now atomic instead of thread-local, so it works across all threads (pthread_once bug)
- Header Write: Direct write bypasses P3 optimization (free routing bug)
- Infrastructure Isolation: __libc_calloc for Unified Cache arrays
- Design Fix: Removed unified_cache_init() call (BenchFast uses TLS SLL, not UC)
Box Theory (箱理論) Validation:
- Single Responsibility: Guard protects entire process (not per-thread)
- Clear Contract: BenchFast always writes headers (explicit)
- Observable: Atomic variable visible across all threads
- Composable: Works with pthread_once() threading model
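The TLS→Atomic point can be illustrated in isolation. The sketch below does not reproduce HAKMEM's guard; the variable and function names are hypothetical. It only shows the general property the fix relies on: a thread-local flag set on one thread is invisible to another, while a process-wide atomic flag is visible everywhere.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical guards, for illustration only. */
static _Thread_local int tls_guard = 0;  /* one copy per thread  */
static atomic_int atomic_guard = 0;      /* one copy per process */

/* Worker thread: sets both guards from its own context. */
static void *set_guards(void *arg) {
    (void)arg;
    tls_guard = 1;                   /* only this thread's copy changes */
    atomic_store(&atomic_guard, 1);  /* every thread will observe this  */
    return NULL;
}

/* Run the worker, then report what the calling thread sees.
 * Returns 0 on success, -1 if the thread could not be created or joined. */
int observe_guards(int *tls_seen, int *atomic_seen) {
    pthread_t t;

    if (pthread_create(&t, NULL, set_guards, NULL) != 0) {
        return -1;
    }
    if (pthread_join(t, NULL) != 0) {
        return -1;
    }

    *tls_seen = tls_guard;
    *atomic_seen = atomic_load(&atomic_guard);
    return 0;
}
```

After the call, the caller's view of the thread-local guard is still 0 while the atomic guard reads 1, which is why a guard meant to protect the whole process cannot live in TLS.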
Last Updated: 2025-11-30