Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History

Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-05 12:31:14 +09:00


Claude AI Instructions for HAKMEM Project

🚫 CRITICAL: LD_PRELOAD Benchmark Ban

NEVER suggest or attempt LD_PRELOAD benchmarking

Rule

If you encounter segfaults or issues with LD_PRELOAD:

1. STOP immediately. Do not debug.
2. Remind user: "LD_PRELOAD benchmarks are banned (see LD_PRELOAD_BENCHMARK_BAN.md)"
3. Suggest alternative: Use direct linking instead

Why This Rule Exists

Previous sessions wasted hours debugging LD_PRELOAD segfaults
The problem is NOT in HAKMEM; it is a glibc limitation
This is an industry-wide issue affecting tcmalloc, jemalloc, mimalloc, hardened_malloc
Trade-off: LD_PRELOAD safety requires mincore(), which costs a 6.4x performance loss; that is unacceptable
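
To make the mincore() trade-off concrete, the sketch below shows one way a pointer could be checked before it is touched. It is not taken from HAKMEM: the function name example_page_is_mapped is hypothetical, and Linux semantics are assumed (mincore() fails with ENOMEM when the range is unmapped). Each check is a system call, so running one on every free() is a plausible source of the slowdown described above.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a HAKMEM function).
 * Returns 1 if the page containing ptr is mapped, 0 if it is not,
 * and -1 if the check itself failed for another reason.
 */
static int example_page_is_mapped(const void *ptr) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }

    /* mincore() requires a page-aligned start address, so round down. */
    uintptr_t base = (uintptr_t)ptr & ~((uintptr_t)page - 1);
    unsigned char vec[1]; /* one residency byte per page queried */

    if (mincore((void *)base, (size_t)page, vec) == 0) {
        return 1; /* mapped (resident or not) */
    }
    /* On Linux, ENOMEM means the range contains unmapped memory. */
    return errno == ENOMEM ? 0 : -1;
}
```

An allocator that must tolerate foreign pointers under LD_PRELOAD could call a check like this before reading a block header; with direct linking the check is unnecessary because every pointer passed to free() came from the same allocator.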

Correct Approach

# ✅ ALWAYS USE THIS
gcc -o bench bench.c libhakmem.a -lpthread
./bench

# ❌ NEVER USE THIS FOR BENCHMARKING
LD_PRELOAD=./libhakmem.so ./bench
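
For reference, the core of a bench.c that suits the direct-linking command could look like the sketch below. This is a minimal illustration, not the project's benchmark: the function names are hypothetical, and it assumes libhakmem.a exports the standard malloc/free symbols so that the calls resolve to HAKMEM at link time.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Run `iterations` malloc/free pairs of `size` bytes.
 * Returns the number of allocations that succeeded. */
static size_t bench_alloc_free(size_t iterations, size_t size) {
    if (size == 0) {
        return 0; /* nothing to touch, so skip zero-size requests */
    }
    size_t ok = 0;
    for (size_t i = 0; i < iterations; i++) {
        volatile char *p = malloc(size);
        if (p == NULL) {
            continue;
        }
        p[0] = (char)i; /* touch the block so the pair is not optimized away */
        ok++;
        free((void *)p);
    }
    return ok;
}

/* Current time in seconds on the monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Successful malloc/free pairs per second for one block size. */
static double bench_ops_per_second(size_t iterations, size_t size) {
    double start = now_seconds();
    size_t ok = bench_alloc_free(iterations, size);
    double elapsed = now_seconds() - start;
    return elapsed > 0.0 ? (double)ok / elapsed : 0.0;
}
```

A main() would call bench_ops_per_second() with the desired iteration count and block size and print the result. Because the allocator is linked statically, no dlsym interposition is involved.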

Reference

See LD_PRELOAD_BENCHMARK_BAN.md for full details including:

WebSearch evidence (hardened_malloc #98, mimalloc #21, Stack Overflow)
Historical attempts (Phase 6.15, Phase 8.2)
Technical root causes (dlsym recursion, printf malloc dependency, glibc edge cases)

Project Context

HAKMEM is a high-performance malloc replacement with:

L0 Tiny Pool (≤1KiB): TLS magazine + TLS Active Slab
L1 Mid Pool (1-16KiB): Thread-local cache
L2 Pool (16-256KiB): Sharded locks + remote free rings
L2.5 Pool (256KiB-2MiB): Size-class caching
L3 BigCache (>2MiB): mmap with batch madvise
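
The size tiers above can be sketched as a simple dispatch function. The enum and function names are hypothetical and do not come from the HAKMEM source. Treating each upper bound as inclusive is also an assumption, since the list only states this explicitly for the Tiny Pool.

```c
#include <stddef.h>

/* Illustrative tier labels; not HAKMEM identifiers. */
typedef enum {
    TIER_L0_TINY,     /* <= 1 KiB */
    TIER_L1_MID,      /* 1-16 KiB */
    TIER_L2_POOL,     /* 16-256 KiB */
    TIER_L2_5_POOL,   /* 256 KiB-2 MiB */
    TIER_L3_BIGCACHE  /* > 2 MiB */
} example_tier_t;

#define EXAMPLE_KIB ((size_t)1024)
#define EXAMPLE_MIB (EXAMPLE_KIB * 1024)

/* Map a request size in bytes to the tier that would serve it.
 * Assumption: every upper bound is inclusive. */
static example_tier_t example_classify(size_t size) {
    if (size <= 1 * EXAMPLE_KIB)   return TIER_L0_TINY;
    if (size <= 16 * EXAMPLE_KIB)  return TIER_L1_MID;
    if (size <= 256 * EXAMPLE_KIB) return TIER_L2_POOL;
    if (size <= 2 * EXAMPLE_MIB)   return TIER_L2_5_POOL;
    return TIER_L3_BIGCACHE;
}
```

For example, under these assumptions a 64-byte request maps to the L0 Tiny Pool and a 3 MiB request maps to the L3 BigCache.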

Current focus: Performance optimization and memory overhead reduction.

Last Updated: 2025-10-27