# Phase 2b: TLS Cache Adaptive Sizing

**Date**: 2025-11-08
**Priority**: 🟡 HIGH - Performance optimization
**Estimated Effort**: 3-5 days
**Status**: Ready for implementation
**Depends on**: Phase 2a (not blocking, can run in parallel)

---

## Executive Summary

**Problem**: The TLS cache has a fixed capacity (256-768 slots) → it cannot adapt to the workload

**Solution**: Implement adaptive sizing with high-water mark tracking

**Expected Result**: Hot classes get more cache → better hit rate → higher throughput

---

## Current Architecture (INEFFICIENT)

### Fixed Capacity

```c
// core/hakmem_tiny.c or similar
#define TLS_SLL_CAP_DEFAULT 256

static __thread int   g_tls_sll_count[TINY_NUM_CLASSES];
static __thread void* g_tls_sll_head[TINY_NUM_CLASSES];

// Fixed capacity for all classes!
// Hot class (e.g., class 4 in Larson) → cache thrashes
// Cold class (e.g., class 0 rarely used) → wastes memory
```

### Why This is Inefficient

**Scenario 1: Hot class (class 4 - 128B allocations)**

```
Larson 4T: 4000+ concurrent 128B allocations
TLS cache capacity: 256 slots
Hit rate: ~6% (256/4000)
Result: Constant refill overhead → poor performance
```

**Scenario 2: Cold class (class 0 - 16B allocations)**

```
Usage: ~10 allocations per minute
TLS cache capacity: 256 slots
Waste: 246 idle slots × 16B = 3936B per thread
```

---

## Proposed Architecture (ADAPTIVE)

### High-Water Mark Tracking

```c
typedef struct TLSCacheStats {
    size_t   capacity;         // Current capacity
    size_t   high_water_mark;  // Peak usage in recent window
    size_t   refill_count;     // Number of refills in recent window
    uint64_t last_adapt_time;  // Timestamp of last adaptation
} TLSCacheStats;

static __thread TLSCacheStats g_tls_cache_stats[TINY_NUM_CLASSES];
```

### Adaptive Sizing Logic

```c
// Periodically adapt cache size based on usage
void adapt_tls_cache_size(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];

    // Update high-water mark (count is an int, so cast before comparing)
    size_t count = (size_t)g_tls_sll_count[class_idx];
    if (count > stats->high_water_mark) {
        stats->high_water_mark = count;
    }

    // Adapt every N refills or M seconds
    uint64_t now = get_timestamp_ns();
    if (stats->refill_count < ADAPT_REFILL_THRESHOLD &&
        (now - stats->last_adapt_time) < ADAPT_TIME_THRESHOLD_NS) {
        return;  // Too soon to adapt
    }

    // Decide: grow, shrink, or keep
    if (stats->high_water_mark > stats->capacity * 0.8) {
        // High usage → grow cache (2x)
        grow_tls_cache(class_idx);
    } else if (stats->high_water_mark < stats->capacity * 0.2) {
        // Low usage → shrink cache (0.5x)
        shrink_tls_cache(class_idx);
    }

    // Reset stats for next window
    stats->high_water_mark = (size_t)g_tls_sll_count[class_idx];
    stats->refill_count = 0;
    stats->last_adapt_time = now;
}
```

---

## Implementation Tasks

### Task 1: Add Adaptive Sizing Stats (1-2 hours)

**File**: `core/hakmem_tiny.c` or TLS cache code

```c
#include <stddef.h>  // size_t
#include <stdint.h>  // uint64_t

// Per-class TLS cache statistics
typedef struct TLSCacheStats {
    size_t   capacity;         // Current capacity
    size_t   high_water_mark;  // Peak usage in recent window
    size_t   refill_count;     // Refills since last adapt
    size_t   shrink_count;     // Shrinks (for debugging)
    size_t   grow_count;       // Grows (for debugging)
    uint64_t last_adapt_time;  // Timestamp of last adaptation
} TLSCacheStats;

static __thread TLSCacheStats g_tls_cache_stats[TINY_NUM_CLASSES];

// Configuration
#define TLS_CACHE_MIN_CAPACITY     16    // Minimum cache size
#define TLS_CACHE_MAX_CAPACITY     2048  // Maximum cache size
#define TLS_CACHE_INITIAL_CAPACITY 64    // Initial size (reduced from 256)
#define ADAPT_REFILL_THRESHOLD     10    // Adapt every 10 refills
#define ADAPT_TIME_THRESHOLD_NS    (1000000000ULL)  // Or every 1 second

// Grow/shrink thresholds
#define GROW_THRESHOLD   0.8  // Grow if usage > 80% of capacity
#define SHRINK_THRESHOLD 0.2  // Shrink if usage < 20% of capacity
```

### Task 2: Implement Grow/Shrink Functions (2-3 hours)

**File**: `core/hakmem_tiny.c`

```c
#include <stdio.h>  // fprintf (debug logging)

// Drain excess blocks back to SuperSlab.
// Defined first because shrink_tls_cache() below calls it.
static void drain_excess_blocks(int class_idx, int count) {
    void** head = &g_tls_sll_head[class_idx];
    int drained = 0;

    while (*head && drained < count) {
        void* block = *head;
        *head = *(void**)block;  // Pop from TLS list

        // Return to SuperSlab (or freelist)
        return_block_to_superslab(block, class_idx);

        drained++;
        g_tls_sll_count[class_idx]--;
    }

    fprintf(stderr, "[TLS_CACHE] Drained %d excess blocks from class %d\n",
            drained, class_idx);
}

// Grow TLS cache capacity (2x)
static void grow_tls_cache(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];

    size_t old_capacity = stats->capacity;
    size_t new_capacity = old_capacity * 2;
    if (new_capacity > TLS_CACHE_MAX_CAPACITY) {
        new_capacity = TLS_CACHE_MAX_CAPACITY;
    }

    if (new_capacity == old_capacity) {
        return;  // Already at max
    }

    stats->capacity = new_capacity;
    stats->grow_count++;

    // Log the real old value; capacity / 2 would be wrong after clamping
    fprintf(stderr, "[TLS_CACHE] Grow class %d: %zu → %zu slots (grow_count=%zu)\n",
            class_idx, old_capacity, new_capacity, stats->grow_count);
}

// Shrink TLS cache capacity (0.5x)
static void shrink_tls_cache(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];

    size_t old_capacity = stats->capacity;
    size_t new_capacity = old_capacity / 2;
    if (new_capacity < TLS_CACHE_MIN_CAPACITY) {
        new_capacity = TLS_CACHE_MIN_CAPACITY;
    }

    if (new_capacity == old_capacity) {
        return;  // Already at min
    }

    // Evict excess blocks if current count > new_capacity
    int count = g_tls_sll_count[class_idx];
    if (count > (int)new_capacity) {
        // Drain excess blocks back to SuperSlab
        int excess = count - (int)new_capacity;
        drain_excess_blocks(class_idx, excess);
    }

    stats->capacity = new_capacity;
    stats->shrink_count++;

    // Log the real old value; capacity * 2 would be wrong after clamping
    fprintf(stderr, "[TLS_CACHE] Shrink class %d: %zu → %zu slots (shrink_count=%zu)\n",
            class_idx, old_capacity, new_capacity, stats->shrink_count);
}
```

### Task 3: Integrate Adaptation into Refill Path (2-3 hours)

**File**: `core/tiny_alloc_fast.inc.h` or refill code

```c
static inline int tiny_alloc_fast_refill(int class_idx) {
    // ... existing refill logic ...

    // Track refill for adaptive sizing
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    stats->refill_count++;

    // Update high-water mark (count is an int, so cast before comparing)
    size_t count = (size_t)g_tls_sll_count[class_idx];
    if (count > stats->high_water_mark) {
        stats->high_water_mark = count;
    }

    // Periodically adapt cache size
    adapt_tls_cache_size(class_idx);

    // ... rest of refill ...
}
```

### Task 4: Implement Adaptation Logic (2-3 hours)

**File**: `core/hakmem_tiny.c`

```c
#include <stdbool.h>  // bool
#include <stdint.h>   // uint64_t
#include <stdio.h>    // fprintf
#include <time.h>     // clock_gettime, CLOCK_MONOTONIC

// Helper: Get timestamp in nanoseconds.
// Defined before adapt_tls_cache_size() so it is declared at the call site.
static inline uint64_t get_timestamp_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

// Adapt TLS cache size based on usage patterns
static void adapt_tls_cache_size(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];

    // Adapt every N refills or M seconds
    uint64_t now = get_timestamp_ns();
    bool should_adapt = (stats->refill_count >= ADAPT_REFILL_THRESHOLD) ||
                        ((now - stats->last_adapt_time) >= ADAPT_TIME_THRESHOLD_NS);

    if (!should_adapt) {
        return;  // Too soon to adapt
    }

    // Calculate usage ratio (capacity is never below TLS_CACHE_MIN_CAPACITY
    // once the stats have been initialized, see Task 5)
    double usage_ratio = (double)stats->high_water_mark / (double)stats->capacity;

    // Decide: grow, shrink, or keep
    if (usage_ratio > GROW_THRESHOLD) {
        // High usage (>80%) → grow cache
        grow_tls_cache(class_idx);
    } else if (usage_ratio < SHRINK_THRESHOLD) {
        // Low usage (<20%) → shrink cache
        shrink_tls_cache(class_idx);
    } else {
        // Moderate usage (20-80%) → keep current size
        fprintf(stderr, "[TLS_CACHE] Keep class %d at %zu slots (usage=%.1f%%)\n",
                class_idx, stats->capacity, usage_ratio * 100.0);
    }

    // Reset stats for next window
    stats->high_water_mark = (size_t)g_tls_sll_count[class_idx];
    stats->refill_count = 0;
    stats->last_adapt_time = now;
}
```

### Task 5: Initialize Adaptive Stats (1 hour)

**File**: `core/hakmem_tiny.c`

```c
void hak_tiny_init(void) {
    // ... existing init ...

    // NOTE: g_tls_cache_stats is __thread, so this loop initializes only the
    // calling thread's copy. Every thread must run the same initialization
    // before its first refill; otherwise its capacity stays 0 and the
    // capacity check in Task 6 rejects every refill.

    // Initialize TLS cache stats for each class
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
        stats->capacity = TLS_CACHE_INITIAL_CAPACITY;  // Start with 64 slots
        stats->high_water_mark = 0;
        stats->refill_count = 0;
        stats->shrink_count = 0;
        stats->grow_count = 0;
        stats->last_adapt_time = get_timestamp_ns();

        // Initialize TLS cache head/count
        g_tls_sll_head[class_idx] = NULL;
        g_tls_sll_count[class_idx] = 0;
    }
}
```

### Task 6: Add Capacity Enforcement (2-3 hours)

**File**: `core/tiny_alloc_fast.inc.h`

```c
static inline int tiny_alloc_fast_refill(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];

    // Don't refill beyond current capacity.
    // Cast capacity to int so the subtraction is signed and cannot wrap.
    int current_count = g_tls_sll_count[class_idx];
    int available_slots = (int)stats->capacity - current_count;

    if (available_slots <= 0) {
        // Cache is full, don't refill
        fprintf(stderr, "[TLS_CACHE] Class %d cache full (%d/%zu), skipping refill\n",
                class_idx, current_count, stats->capacity);
        return -1;  // Signal caller to try again or use slow path
    }

    // Refill only up to capacity
    int want_count = HAKMEM_TINY_REFILL_DEFAULT;  // e.g., 16
    int refill_count = (want_count < available_slots) ? want_count : available_slots;

    // ... existing refill logic with refill_count ...
}
```

---

## Testing Strategy

### Test 1: Adaptive Behavior Verification

```bash
# Enable debug logging
HAKMEM_LOG=1 ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1 | grep "TLS_CACHE"

# Should see (hot class grows, cold class shrinks):
# [TLS_CACHE] Grow class 4: 64 → 128 slots (grow_count=1)
# [TLS_CACHE] Grow class 4: 128 → 256 slots (grow_count=2)
# [TLS_CACHE] Grow class 4: 256 → 512 slots (grow_count=3)
# [TLS_CACHE] Shrink class 0: 64 → 32 slots (shrink_count=1)
```

### Test 2: Performance Improvement

```bash
# Before (fixed capacity)
./larson_hakmem 1 1 128 1024 1 12345 1
# Baseline: 2.71M ops/s

# After (adaptive capacity)
./larson_hakmem 1 1 128 1024 1 12345 1
# Expected: 2.8-3.0M ops/s (+3-10%)
```

### Test 3: Memory Efficiency

```bash
# Run with memory profiling
valgrind --tool=massif ./larson_hakmem 1 1 128 1024 1 12345 1

# Compare peak memory usage
# Fixed: 256 slots × 8 classes × 8B = ~16KB per thread
# Adaptive: ~8KB per thread (cold classes shrink to 16 slots)
```

---

## Success Criteria

- ✅ **Adaptive behavior**: Logs show grow/shrink based on usage
- ✅ **Hot class expansion**: Class 4 grows to 512+ slots under load
- ✅ **Cold class shrinkage**: Class 0 shrinks to 16-32 slots
- ✅ **Performance improvement**: +3-10% on Larson benchmark
- ✅ **Memory efficiency**: 30-50% lower TLS cache memory usage

---

## Deliverable

**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE2B_IMPLEMENTATION_REPORT.md`

**Required sections**:

1. **Adaptive sizing behavior** (logs showing grow/shrink)
2. **Performance comparison** (before/after)
3. **Memory usage comparison** (TLS cache overhead)
4. **Per-class capacity evolution** (graph if possible)
5. **Production readiness** (YES/NO verdict)

---

**Let's make TLS cache adaptive! 🎯**
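
The grow/shrink/keep rule from Task 4 can be checked separately from the allocator. The following is a minimal sketch, assuming the Task 1 limits and thresholds. It reduces the decision to a pure function of the current capacity and the high-water mark of the window that just ended. `next_capacity` and `replay_windows` are hypothetical helper names introduced here for illustration and are not part of hakmem. The sketch leaves out the refill-count and time gating and the block draining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Limits and thresholds copied from the Task 1 configuration. */
#define TLS_CACHE_MIN_CAPACITY 16
#define TLS_CACHE_MAX_CAPACITY 2048
#define GROW_THRESHOLD         0.8
#define SHRINK_THRESHOLD       0.2

/* Hypothetical helper (not part of hakmem): returns the capacity the
 * policy would pick for the next window, given the current capacity and
 * the high-water mark observed during the window that just ended. */
static size_t next_capacity(size_t capacity, size_t high_water_mark)
{
    /* The policy only ever produces capacities inside [MIN, MAX]. */
    assert(capacity >= TLS_CACHE_MIN_CAPACITY &&
           capacity <= TLS_CACHE_MAX_CAPACITY);

    double usage_ratio = (double)high_water_mark / (double)capacity;

    if (usage_ratio > GROW_THRESHOLD) {
        /* High usage: double, clamped to the maximum. */
        size_t grown = capacity * 2;
        return grown > TLS_CACHE_MAX_CAPACITY ? TLS_CACHE_MAX_CAPACITY : grown;
    }
    if (usage_ratio < SHRINK_THRESHOLD) {
        /* Low usage: halve, clamped to the minimum. */
        size_t shrunk = capacity / 2;
        return shrunk < TLS_CACHE_MIN_CAPACITY ? TLS_CACHE_MIN_CAPACITY : shrunk;
    }
    return capacity; /* Moderate usage: keep the current size. */
}

/* Hypothetical driver: applies the policy once per window for a sequence
 * of high-water marks, prints each step, and returns the final capacity. */
static size_t replay_windows(size_t capacity, const size_t *marks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t next = next_capacity(capacity, marks[i]);
        printf("window %zu: hwm=%zu capacity %zu -> %zu\n",
               i, marks[i], capacity, next);
        capacity = next;
    }
    return capacity;
}
```

Feeding `replay_windows` a starting capacity of 64 and high-water marks that saturate the cache in each window (64, 128, 256) gives the 64 → 128 → 256 → 512 progression that Test 1 expects for a hot class. Keeping the rule as a pure function also makes the clamping at 16 and 2048 slots easy to unit-test.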