Merge separate g_tls_sll_head[] and g_tls_sll_count[] arrays into a unified TinyTLSSLL struct to improve L1D cache locality.

Expected performance gain: +12-18% from reducing cache line splits (2 loads → 1 load per operation).

Changes:
- core/hakmem_tiny.h: Add TinyTLSSLL type (16B aligned, head+count+pad)
- core/hakmem_tiny.c: Replace separate arrays with g_tls_sll[8]
- core/box/tls_sll_box.h: Update Box API (13 sites) for unified access
- Updated 32+ files: All g_tls_sll_head[i] → g_tls_sll[i].head
- Updated 32+ files: All g_tls_sll_count[i] → g_tls_sll[i].count
- core/hakmem_tiny_integrity.h: Unified canary guards
- core/box/integrity_box.c: Simplified canary validation
- Makefile: Added core/box/tiny_sizeclass_hist_box.o to link

Build: ✅ PASS (10K ops sanity test)
Warnings: Only pre-existing LTO type mismatches (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
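The layout change can be sketched in C. This is a minimal sketch under stated assumptions: head is taken to be a raw free-list pointer and count a uint32_t, and TINY_NUM_CLASSES, sll_push and sll_pop are hypothetical helpers, not the real definitions in core/hakmem_tiny.h or the Box API in core/box/tls_sll_box.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_NUM_CLASSES 8 /* hypothetical name; the commit uses g_tls_sll[8] */

/* Hypothetical layout: field types are assumptions, not core/hakmem_tiny.h.
 * Old layout: g_tls_sll_head[i] and g_tls_sll_count[i] lived in two separate
 * arrays, so one operation could touch two different cache lines. */
typedef struct {
    _Alignas(16) void *head; /* top of this size class's free list */
    uint32_t count;          /* blocks currently on the list */
    uint32_t pad;            /* fills the entry out to 16 bytes */
} TinyTLSSLL;

static_assert(sizeof(TinyTLSSLL) == 16, "one entry per 16 bytes");

/* One thread-local entry per size class. */
static _Thread_local TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];

/* Push a freed block; its first word stores the next pointer. */
static void sll_push(int cls, void *block) {
    *(void **)block = g_tls_sll[cls].head;
    g_tls_sll[cls].head = block;
    g_tls_sll[cls].count++;
}

/* Pop a block, or return NULL if the list is empty. head and count share one
 * 16-byte-aligned entry, which cannot straddle a 64-byte cache line. */
static void *sll_pop(int cls) {
    void *block = g_tls_sll[cls].head;
    if (block == NULL) {
        return NULL;
    }
    g_tls_sll[cls].head = *(void **)block;
    g_tls_sll[cls].count--;
    return block;
}
```

The 16-byte alignment is what makes the locality argument hold: a 16-byte entry that starts on a 16-byte boundary always fits inside one 64-byte line, so a pop that reads head and updates count touches a single line, whereas with split arrays head[i] and count[i] typically sat in different lines.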
8 lines
560 B
Bash
Executable File
#!/bin/bash
echo "Verification runs for Hot_2048 (top performer):"
for i in 1 2 3 4 5; do
  echo -n "Run $i: "
  # Hot_2048 configuration: classes C2 and C3 set to 2048, all other classes to 64.
  result=$(HAKMEM_TINY_UNIFIED_C0=64 HAKMEM_TINY_UNIFIED_C1=64 \
    HAKMEM_TINY_UNIFIED_C2=2048 HAKMEM_TINY_UNIFIED_C3=2048 \
    HAKMEM_TINY_UNIFIED_C4=64 HAKMEM_TINY_UNIFIED_C5=64 \
    HAKMEM_TINY_UNIFIED_C6=64 HAKMEM_TINY_UNIFIED_C7=64 HAKMEM_TINY_UNIFIED_CACHE=1 \
    ./out/release/bench_random_mixed_hakmem 100000 256 42 2>&1 \
    | grep "Throughput" | grep -oP '\d+(?=\s+operations)')
  # Without a match, bc fails and printf may print a misleading 0.00.
  if [ -z "$result" ]; then echo "no Throughput value found"; continue; fi
  # Convert ops/s to millions of ops/s with two decimals.
  echo "scale=2; $result / 1000000" | bc | xargs printf "%.2f M ops/s\n"
done