# Phase 2b: TLS Cache Adaptive Sizing
**Date**: 2025-11-08

**Priority**: 🟡 HIGH - Performance optimization

**Estimated Effort**: 3-5 days

**Status**: Ready for implementation

**Depends on**: Phase 2a (not blocking, can run in parallel)

---
## Executive Summary

**Problem**: The TLS cache has a fixed capacity (256-768 slots) → it cannot adapt to the workload

**Solution**: Implement adaptive sizing with high-water mark tracking

**Expected Result**: Hot classes get more cache → better hit rate → higher throughput

---
## Current Architecture (INEFFICIENT)
### Fixed Capacity
```c
// core/hakmem_tiny.c or similar
#define TLS_SLL_CAP_DEFAULT 256
static __thread int g_tls_sll_count[TINY_NUM_CLASSES];
static __thread void* g_tls_sll_head[TINY_NUM_CLASSES];
// Fixed capacity for all classes!
// Hot class (e.g., class 4 in Larson) → cache thrashes
// Cold class (e.g., class 0 rarely used) → wastes memory
```
### Why This is Inefficient
**Scenario 1: Hot class (class 4 - 128B allocations)**
```
Larson 4T: 4000+ concurrent 128B allocations
TLS cache capacity: 256 slots
Hit rate: ~6% (256/4000)
Result: Constant refill overhead → poor performance
```
**Scenario 2: Cold class (class 0 - 16B allocations)**
```
Usage: ~10 allocations per minute
TLS cache capacity: 256 slots
Worst-case waste (cache filled to capacity): 246 idle blocks × 16B = 3936B per thread
```
---
## Proposed Architecture (ADAPTIVE)
### High-Water Mark Tracking
```c
typedef struct TLSCacheStats {
    size_t capacity;          // Current capacity
    size_t high_water_mark;   // Peak usage in recent window
    size_t refill_count;      // Number of refills in recent window
    uint64_t last_adapt_time; // Timestamp of last adaptation
} TLSCacheStats;
static __thread TLSCacheStats g_tls_cache_stats[TINY_NUM_CLASSES];
```
### Adaptive Sizing Logic
```c
// Periodically adapt cache size based on usage
void adapt_tls_cache_size(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    // Update high-water mark
    if (g_tls_sll_count[class_idx] > stats->high_water_mark) {
        stats->high_water_mark = g_tls_sll_count[class_idx];
    }
    // Adapt every N refills or M seconds
    uint64_t now = get_timestamp_ns();
    if (stats->refill_count < ADAPT_REFILL_THRESHOLD &&
        (now - stats->last_adapt_time) < ADAPT_TIME_THRESHOLD_NS) {
        return; // Too soon to adapt
    }
    // Decide: grow, shrink, or keep
    if (stats->high_water_mark > stats->capacity * 0.8) {
        // High usage → grow cache (2x)
        grow_tls_cache(class_idx);
    } else if (stats->high_water_mark < stats->capacity * 0.2) {
        // Low usage → shrink cache (0.5x)
        shrink_tls_cache(class_idx);
    }
    // Reset stats for next window
    stats->high_water_mark = g_tls_sll_count[class_idx];
    stats->refill_count = 0;
    stats->last_adapt_time = now;
}
```
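The grow/shrink/keep decision can be checked in isolation. The sketch below restates it as a pure function of (capacity, high-water mark), assuming the thresholds and bounds listed under Task 1; `tls_cache_next_capacity` is a hypothetical helper, not an existing hakmem function, and draining and logging are left out.

```c
#include <assert.h>
#include <stddef.h>

// Task 1 values, repeated so the sketch compiles on its own
#define TLS_CACHE_MIN_CAPACITY 16
#define TLS_CACHE_MAX_CAPACITY 2048
#define GROW_THRESHOLD 0.8
#define SHRINK_THRESHOLD 0.2

// Hypothetical pure helper: capacity for the next window, given this
// window's capacity and high-water mark. Grows 2x above 80% usage,
// shrinks 0.5x below 20%, otherwise keeps; result clamped to [MIN, MAX].
static size_t tls_cache_next_capacity(size_t capacity, size_t high_water_mark) {
    assert(capacity > 0); // a zero capacity would make the ratio meaningless
    double usage = (double)high_water_mark / (double)capacity;
    size_t next = capacity;
    if (usage > GROW_THRESHOLD) {
        next = capacity * 2;
        if (next > TLS_CACHE_MAX_CAPACITY) {
            next = TLS_CACHE_MAX_CAPACITY;
        }
    } else if (usage < SHRINK_THRESHOLD) {
        next = capacity / 2;
        if (next < TLS_CACHE_MIN_CAPACITY) {
            next = TLS_CACHE_MIN_CAPACITY;
        }
    }
    return next;
}
```

Fed a hot class whose high-water mark always reaches capacity, it steps 64 → 128 → 256 → 512 → 1024 → 2048 and then stays clamped. A cold class peaking at 5 blocks goes 64 → 32 → 16 and stays at the 16-slot floor, because 5/16 ≈ 31% falls in the keep band.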
---
## Implementation Tasks
### Task 1: Add Adaptive Sizing Stats (1-2 hours)
**File**: `core/hakmem_tiny.c` or TLS cache code
```c
#include <stddef.h>
#include <stdint.h>

// Per-class TLS cache statistics
typedef struct TLSCacheStats {
    size_t capacity;          // Current capacity
    size_t high_water_mark;   // Peak usage in recent window
    size_t refill_count;      // Refills since last adapt
    size_t shrink_count;      // Shrinks (for debugging)
    size_t grow_count;        // Grows (for debugging)
    uint64_t last_adapt_time; // Timestamp of last adaptation
} TLSCacheStats;
static __thread TLSCacheStats g_tls_cache_stats[TINY_NUM_CLASSES];

// Configuration
#define TLS_CACHE_MIN_CAPACITY 16                // Minimum cache size
#define TLS_CACHE_MAX_CAPACITY 2048              // Maximum cache size
#define TLS_CACHE_INITIAL_CAPACITY 64            // Initial size (reduced from 256)
#define ADAPT_REFILL_THRESHOLD 10                // Adapt every 10 refills
#define ADAPT_TIME_THRESHOLD_NS (1000000000ULL)  // Or every 1 second

// Growth thresholds
#define GROW_THRESHOLD 0.8   // Grow if usage > 80% of capacity
#define SHRINK_THRESHOLD 0.2 // Shrink if usage < 20% of capacity
```
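Two properties of these constants can be checked mechanically. Because the minimum, initial, and maximum capacities are all powers of two, repeated 2x/0.5x steps from 64 land exactly on 16 and 2048. And because 2 × `SHRINK_THRESHOLD` < `GROW_THRESHOLD`, a grow (usage drops to roughly 40%) cannot immediately trigger a shrink, and vice versa, assuming the workload is steady across windows. A minimal sketch under those assumptions; `IS_POW2`, `tls_cache_thresholds_have_hysteresis`, and `tls_cache_check_config` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

// Task 1 values, repeated so the sketch compiles on its own
#define TLS_CACHE_MIN_CAPACITY 16
#define TLS_CACHE_MAX_CAPACITY 2048
#define TLS_CACHE_INITIAL_CAPACITY 64
#define GROW_THRESHOLD 0.8
#define SHRINK_THRESHOLD 0.2

// Hypothetical helper: true when x is a non-zero power of two
#define IS_POW2(x) ((x) != 0 && ((x) & ((x) - 1)) == 0)

_Static_assert(TLS_CACHE_MIN_CAPACITY <= TLS_CACHE_INITIAL_CAPACITY &&
               TLS_CACHE_INITIAL_CAPACITY <= TLS_CACHE_MAX_CAPACITY,
               "initial capacity must lie within [min, max]");
_Static_assert(IS_POW2(TLS_CACHE_MIN_CAPACITY) &&
               IS_POW2(TLS_CACHE_INITIAL_CAPACITY) &&
               IS_POW2(TLS_CACHE_MAX_CAPACITY),
               "power-of-two bounds make 2x/0.5x steps land exactly on min/max");

// After a grow triggered above GROW_THRESHOLD, usage falls to just above
// GROW_THRESHOLD / 2; after a shrink triggered below SHRINK_THRESHOLD, it
// rises to just below 2 * SHRINK_THRESHOLD. Neither lands in the opposite
// band (for a steady workload) as long as 2 * SHRINK < GROW.
static bool tls_cache_thresholds_have_hysteresis(void) {
    return 2.0 * SHRINK_THRESHOLD < GROW_THRESHOLD;
}

// Hypothetical debug-build check, e.g. called once from init code
static void tls_cache_check_config(void) {
    assert(tls_cache_thresholds_have_hysteresis());
}
```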
### Task 2: Implement Grow/Shrink Functions (2-3 hours)
**File**: `core/hakmem_tiny.c`
```c
#include <stdio.h>

// Assumed signature (not shown in this doc): hands one block back to its
// SuperSlab/freelist
void return_block_to_superslab(void* block, int class_idx);
// Forward declaration: shrink_tls_cache() calls this before its definition
static void drain_excess_blocks(int class_idx, int count);

// Grow TLS cache capacity (2x)
static void grow_tls_cache(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    size_t old_capacity = stats->capacity;
    size_t new_capacity = old_capacity * 2;
    if (new_capacity > TLS_CACHE_MAX_CAPACITY) {
        new_capacity = TLS_CACHE_MAX_CAPACITY;
    }
    if (new_capacity == old_capacity) {
        return; // Already at max
    }
    stats->capacity = new_capacity;
    stats->grow_count++;
    fprintf(stderr, "[TLS_CACHE] Grow class %d: %zu → %zu slots (grow_count=%zu)\n",
            class_idx, old_capacity, new_capacity, stats->grow_count);
}

// Shrink TLS cache capacity (0.5x)
static void shrink_tls_cache(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    size_t old_capacity = stats->capacity;
    size_t new_capacity = old_capacity / 2;
    if (new_capacity < TLS_CACHE_MIN_CAPACITY) {
        new_capacity = TLS_CACHE_MIN_CAPACITY;
    }
    if (new_capacity == old_capacity) {
        return; // Already at min
    }
    // Evict excess blocks if current count > new_capacity
    // (compared as int to avoid a signed/unsigned mix)
    int count = g_tls_sll_count[class_idx];
    if (count > (int)new_capacity) {
        // Drain excess blocks back to SuperSlab
        drain_excess_blocks(class_idx, count - (int)new_capacity);
    }
    stats->capacity = new_capacity;
    stats->shrink_count++;
    fprintf(stderr, "[TLS_CACHE] Shrink class %d: %zu → %zu slots (shrink_count=%zu)\n",
            class_idx, old_capacity, new_capacity, stats->shrink_count);
}

// Drain excess blocks back to SuperSlab
static void drain_excess_blocks(int class_idx, int count) {
    void** head = &g_tls_sll_head[class_idx];
    int drained = 0;
    while (*head && drained < count) {
        void* block = *head;
        *head = *(void**)block; // Pop from TLS list
        // Return to SuperSlab (or freelist)
        return_block_to_superslab(block, class_idx);
        drained++;
        g_tls_sll_count[class_idx]--;
    }
    fprintf(stderr, "[TLS_CACHE] Drained %d excess blocks from class %d\n", drained, class_idx);
}
```
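The drain loop relies on an intrusive singly linked list: each cached free block stores the next pointer in its own first word. The sketch below models a single size class, with a counting stub in place of `return_block_to_superslab()`, so the push/drain bookkeeping can be checked on its own. `sll_push`, `sll_drain`, and `return_block_stub` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// One size class is modelled; all names here are hypothetical.
static __thread void* g_head; // list head; next pointer lives in each block
static __thread int g_count;  // number of blocks currently cached
static int g_returned;        // blocks handed to the stub below

// Stand-in for return_block_to_superslab(): only counts the call.
static void return_block_stub(void* block) {
    (void)block;
    g_returned++;
}

// Push a free block onto the TLS list.
static void sll_push(void* block) {
    *(void**)block = g_head;
    g_head = block;
    g_count++;
}

// Pop up to `count` blocks and hand each back; returns how many were drained.
static int sll_drain(int count) {
    int drained = 0;
    while (g_head != NULL && drained < count) {
        void* block = g_head;
        g_head = *(void**)block; // unlink before the block leaves the cache
        return_block_stub(block);
        drained++;
        g_count--;
    }
    assert(g_count >= 0);
    return drained;
}
```

Draining from the head returns the most recently freed, likely cache-warm blocks first. Keeping those instead would need a tail pointer or a full walk of the list, which this sketch does not attempt.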
### Task 3: Integrate Adaptation into Refill Path (2-3 hours)
**File**: `core/tiny_alloc_fast.inc.h` or refill code
```c
static inline int tiny_alloc_fast_refill(int class_idx) {
    // ... existing refill logic ...
    // Track refill for adaptive sizing
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    stats->refill_count++;
    // Update high-water mark
    if (g_tls_sll_count[class_idx] > stats->high_water_mark) {
        stats->high_water_mark = g_tls_sll_count[class_idx];
    }
    // Periodically adapt cache size
    adapt_tls_cache_size(class_idx);
    // ... rest of refill ...
}
```
### Task 4: Implement Adaptation Logic (2-3 hours)
**File**: `core/hakmem_tiny.c`
```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Helper: Get timestamp in nanoseconds (defined before its first use)
static inline uint64_t get_timestamp_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

// Adapt TLS cache size based on usage patterns
static void adapt_tls_cache_size(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    if (stats->capacity == 0) {
        return; // Stats not initialized in this thread; avoid dividing by 0
    }
    // Adapt every N refills or M seconds
    uint64_t now = get_timestamp_ns();
    bool should_adapt = (stats->refill_count >= ADAPT_REFILL_THRESHOLD) ||
                        ((now - stats->last_adapt_time) >= ADAPT_TIME_THRESHOLD_NS);
    if (!should_adapt) {
        return; // Too soon to adapt
    }
    // Calculate usage ratio
    double usage_ratio = (double)stats->high_water_mark / (double)stats->capacity;
    // Decide: grow, shrink, or keep
    if (usage_ratio > GROW_THRESHOLD) {
        // High usage (>80%) → grow cache
        grow_tls_cache(class_idx);
    } else if (usage_ratio < SHRINK_THRESHOLD) {
        // Low usage (<20%) → shrink cache
        shrink_tls_cache(class_idx);
    } else {
        // Moderate usage (20-80%) → keep current size
        fprintf(stderr, "[TLS_CACHE] Keep class %d at %zu slots (usage=%.1f%%)\n",
                class_idx, stats->capacity, usage_ratio * 100.0);
    }
    // Reset stats for next window
    stats->high_water_mark = g_tls_sll_count[class_idx];
    stats->refill_count = 0;
    stats->last_adapt_time = now;
}
```
### Task 5: Initialize Adaptive Stats (1 hour)
**File**: `core/hakmem_tiny.c`
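```c
// g_tls_cache_stats is __thread: a loop in the process-wide hak_tiny_init()
// would only initialize the calling thread's copy, and every other thread
// would keep capacity == 0 (so Task 6 would never refill). Initialize once
// per thread on first use instead. tls_cache_init_thread() is a
// hypothetical name.
static void tls_cache_init_thread(void) {
    uint64_t now = get_timestamp_ns();
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
        stats->capacity = TLS_CACHE_INITIAL_CAPACITY; // Start with 64 slots
        stats->high_water_mark = 0;
        stats->refill_count = 0;
        stats->shrink_count = 0;
        stats->grow_count = 0;
        stats->last_adapt_time = now;
    }
    // g_tls_sll_head/g_tls_sll_count are not reset here: __thread storage
    // starts zeroed, and a free() may already have pushed blocks.
}

// At the top of tiny_alloc_fast_refill():
//     if (g_tls_cache_stats[class_idx].capacity == 0) {
//         tls_cache_init_thread(); // first use in this thread
//     }
```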
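Because `g_tls_cache_stats` is `__thread`, initialization has to happen in every thread that uses the cache, not once per process. A thread that skips it sees `capacity == 0`: Task 6's `available_slots` is then never positive, and in Task 4 the usage ratio divides by zero while doubling 0 stays 0. The sketch below demonstrates this with POSIX threads and shows a lazy per-thread initializer; `tls_cache_stats_get` and the reader helpers are hypothetical names, and only `capacity` is modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TLS_CACHE_INITIAL_CAPACITY 64

typedef struct { size_t capacity; } TLSCacheStats; // only the field needed here
static __thread TLSCacheStats g_stats;             // zeroed in every new thread

// Lazy initializer: the first touch in each thread sets the capacity.
// Capacity 0 works as the "uninitialized" marker since the minimum is 16.
static TLSCacheStats* tls_cache_stats_get(void) {
    if (g_stats.capacity == 0) {
        g_stats.capacity = TLS_CACHE_INITIAL_CAPACITY;
    }
    return &g_stats;
}

// Thread body that reads the raw TLS copy without initializing it.
static void* read_raw_capacity(void* out) {
    *(size_t*)out = g_stats.capacity;
    return NULL;
}

// Thread body that goes through the lazy initializer.
static void* read_lazy_capacity(void* out) {
    *(size_t*)out = tls_cache_stats_get()->capacity;
    return NULL;
}

// Runs fn on a fresh thread and returns the capacity it observed.
static size_t capacity_seen_by_new_thread(void* (*fn)(void*)) {
    size_t seen = (size_t)-1;
    pthread_t t;
    int rc = pthread_create(&t, NULL, fn, &seen);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL); // join makes the thread's write visible here
    return seen;
}
```

Initializing on the main thread leaves a second thread's raw copy at 0, while the lazy path yields 64 in that thread too.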
### Task 6: Add Capacity Enforcement (2-3 hours)
**File**: `core/tiny_alloc_fast.inc.h`
```c
static inline int tiny_alloc_fast_refill(int class_idx) {
    TLSCacheStats* stats = &g_tls_cache_stats[class_idx];
    // Don't refill beyond current capacity
    int current_count = g_tls_sll_count[class_idx];
    // Signed arithmetic: capacity is size_t, and the count may exceed it
    int available_slots = (int)stats->capacity - current_count;
    if (available_slots <= 0) {
        // Cache is full, don't refill
        fprintf(stderr, "[TLS_CACHE] Class %d cache full (%d/%zu), skipping refill\n",
                class_idx, current_count, stats->capacity);
        return -1; // Signal caller to try again or use slow path
    }
    // Refill only up to capacity
    int want_count = HAKMEM_TINY_REFILL_DEFAULT; // e.g., 16
    int refill_count = (want_count < available_slots) ? want_count : available_slots;
    // ... existing refill logic with refill_count ...
}
```
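The capacity clamp is easy to get wrong when a `size_t` capacity meets an `int` count. Without a cast, `stats->capacity - current_count` is evaluated in unsigned arithmetic, so if the count ever exceeds capacity (for example, if the free path pushes without a capacity check) the result only comes out negative after an implementation-defined conversion back to `int`. A minimal sketch of the clamp done in signed arithmetic; `tls_refill_budget` is a hypothetical helper, and 16 stands in for `HAKMEM_TINY_REFILL_DEFAULT` as in the comment above.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: how many blocks a refill may add without
// exceeding capacity. Returns 0 when the cache is full or over capacity,
// in which case the caller should take the slow path.
static int tls_refill_budget(size_t capacity, int current_count, int want_count) {
    assert(current_count >= 0 && want_count >= 0);
    long available = (long)capacity - (long)current_count; // may be negative
    if (available <= 0) {
        return 0;
    }
    return (want_count < available) ? want_count : (int)available;
}
```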
---
## Testing Strategy
### Test 1: Adaptive Behavior Verification
```bash
# Enable debug logging
HAKMEM_LOG=1 ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1 | grep "TLS_CACHE"
# Should see:
# [TLS_CACHE] Grow class 4: 64 → 128 slots (grow_count=1)
# [TLS_CACHE] Grow class 4: 128 → 256 slots (grow_count=2)
# [TLS_CACHE] Grow class 4: 256 → 512 slots (grow_count=3)
# [TLS_CACHE] Keep class 0 at 64 slots (usage=5.2%)
```
### Test 2: Performance Improvement
```bash
# Before (build with fixed capacity)
./larson_hakmem 1 1 128 1024 1 12345 1
# Baseline: 2.71M ops/s
# After (build with adaptive capacity)
./larson_hakmem 1 1 128 1024 1 12345 1
# Expected: 2.8-3.0M ops/s (+3-10%)
```
### Test 3: Memory Efficiency
```bash
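# Run with memory profiling
# (--pages-as-heap=yes makes massif count pages from mmap too, not only malloc'd blocks)
valgrind --tool=massif --pages-as-heap=yes ./larson_hakmem 1 1 128 1024 1 12345 1
# Compare peak memory usage
# Fixed: 256 slots × 8 classes × 8B = ~16KB per thread (8B per slot; cached blocks add their block size)
# Adaptive: ~8KB per thread (cold classes shrink to 16 slots)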
```
---
## Success Criteria
- **Adaptive behavior**: Logs show grow/shrink based on usage
- **Hot class expansion**: Class 4 grows to 512+ slots under load
- **Cold class shrinkage**: Class 0 shrinks to 16-32 slots
- **Performance improvement**: +3-10% on the Larson benchmark
- **Memory efficiency**: 30-50% less TLS cache memory

---
## Deliverable
**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE2B_IMPLEMENTATION_REPORT.md`

**Required sections**:
1. **Adaptive sizing behavior** (logs showing grow/shrink)
2. **Performance comparison** (before/after)
3. **Memory usage comparison** (TLS cache overhead)
4. **Per-class capacity evolution** (graph if possible)
5. **Production readiness** (YES/NO verdict)
---
**Let's make TLS cache adaptive! 🎯**