Phase 3d-B: TLS Cache Merge - Unified g_tls_sll[] structure (+12-18% expected)

Merge the separate g_tls_sll_head[] and g_tls_sll_count[] arrays into a single
array of TinyTLSSLL structs to improve L1D cache locality. Expected gain:
+12-18%, because each operation touches one cache line instead of two.

Changes:
- core/hakmem_tiny.h: Add TinyTLSSLL type (16 bytes: head+count+pad)
- core/hakmem_tiny.c: Replace separate arrays with g_tls_sll[8]
- core/box/tls_sll_box.h: Update Box API (13 sites) for unified access
- Updated 32+ files: All g_tls_sll_head[i] → g_tls_sll[i].head
- Updated 32+ files: All g_tls_sll_count[i] → g_tls_sll[i].count
- core/hakmem_tiny_integrity.h: Unified canary guards
- core/box/integrity_box.c: Simplified canary validation
- Makefile: Added core/box/tiny_sizeclass_hist_box.o to link
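
The head/count renames listed above amount to the following access pattern. This is a minimal sketch, not the code from core/box/tls_sll_box.h: the helper names sll_push and sll_pop are hypothetical, and storing the next pointer in the first word of each free block is an assumption about the list layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_NUM_CLASSES 8  /* assumption: matches the [8] in g_tls_sll[8] */

typedef struct {
    void*    head;   /* SLL head pointer */
    uint32_t count;  /* number of blocks on the list */
    uint32_t _pad;   /* explicit padding, sizeof == 16 on 64-bit targets */
} TinyTLSSLL;

static __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];

/* Hypothetical helper: push a free block onto the list for class `cls`.
 * Before the merge this touched g_tls_sll_head[cls] and g_tls_sll_count[cls];
 * now both fields come from the same 16-byte entry. */
static inline void sll_push(int cls, void* block) {
    TinyTLSSLL* s = &g_tls_sll[cls];
    *(void**)block = s->head;  /* assumption: next pointer lives in the block */
    s->head = block;
    s->count++;
}

/* Hypothetical helper: pop one block, or return NULL if the list is empty. */
static inline void* sll_pop(int cls) {
    TinyTLSSLL* s = &g_tls_sll[cls];
    void* block = s->head;
    if (block == NULL) {
        return NULL;
    }
    assert(s->count > 0);      /* head and count must stay in sync */
    s->head = *(void**)block;
    s->count--;
    return block;
}
```

Taking one pointer to the entry and reading both fields through it keeps the two accesses adjacent in memory, which is the locality this commit is aiming for.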

Build:  PASS (10K ops sanity test)
Warnings: Only pre-existing LTO type mismatches (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-11-20 07:32:30 +09:00
parent 38552c3f39
commit 9b0d746407
83 changed files with 7509 additions and 259 deletions


@@ -35,6 +35,29 @@ int hak_is_initializing(void);
// Forward declaration (implementation in hakmem_tiny.c)
size_t tiny_get_max_size(void);
// ============================================================================
// Phase 3d-B: TLS Cache Merge - Unified TLS SLL Structure
// ============================================================================
//
// Goal: Improve L1D cache hit rate by merging head+count into same struct.
//
// OLD (cache line split):
// __thread void* g_tls_sll_head[8]; // 64 bytes (cache line 0)
// __thread uint32_t g_tls_sll_count[8]; // 32 bytes (cache line 1)
// → 2 cache lines touched per operation (head from line 0, count from line 1)
//
// NEW (unified):
// __thread TinyTLSSLL g_tls_sll[8]; // 128 bytes = 2 cache lines
// → 1 cache line touched per operation (head+count in same 16B struct)
//
// Expected: +12-18% improvement from cache locality
//
typedef struct {
    void* head;      // SLL head pointer (8 bytes)
    uint32_t count;  // Number of elements in SLL (4 bytes)
    uint32_t _pad;   // Padding to 16 bytes for cache alignment (4 bytes)
} TinyTLSSLL;
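
The layout claims in the comment above can be checked at compile time. The sketch below assumes a 64-bit target and 64-byte cache lines. CACHE_LINE, g_tls_sll_sketch, and entry_in_one_line are hypothetical names used only for illustration. The struct's natural alignment is 8 bytes, and this excerpt shows no alignment attribute, so the sketch adds an explicit alignas to guarantee that no 16-byte entry straddles a cache line. Whether the real definition in core/hakmem_tiny.c does the same is not visible here.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumption: 64-byte L1D cache lines */

typedef struct {
    void*    head;
    uint32_t count;
    uint32_t _pad;
} TinyTLSSLL;

/* Layout checks; these hold only on targets with 8-byte pointers. */
static_assert(sizeof(void*) == 8, "sketch assumes a 64-bit target");
static_assert(sizeof(TinyTLSSLL) == 16, "head + count + pad must be 16 bytes");
static_assert(offsetof(TinyTLSSLL, count) == 8, "count follows head directly");
static_assert(CACHE_LINE % sizeof(TinyTLSSLL) == 0,
              "a whole number of entries must fit in one cache line");

/* Hypothetical array: aligning it to the cache line size means the 8 entries
 * occupy exactly two lines, four entries per line. */
static __thread alignas(CACHE_LINE) TinyTLSSLL g_tls_sll_sketch[8];

/* Returns 1 if entry i lies entirely within a single cache line. */
static inline int entry_in_one_line(int i) {
    uintptr_t first = (uintptr_t)&g_tls_sll_sketch[i];
    uintptr_t last  = first + sizeof(TinyTLSSLL) - 1;
    return first / CACHE_LINE == last / CACHE_LINE;
}
```

A 16-byte alignment would already be enough to stop entries from straddling a line. Aligning to 64 bytes additionally pins the array to two cache lines, which matches the "128 bytes = 2 cache lines" figure in the comment.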
// ============================================================================
// Size Classes
// ============================================================================