Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)

Implementation:
1. Cache capacity expansion (2-8x per-class)
   - Hot classes (C0-C3): 4x increase (512 slots)
   - Medium classes (C4-C6): 2-3x increase
   - Class 7 (1KB): 2x increase (128 slots)
   - Fast cache: 2x default capacity
2. Refill batch size increase (4-8x)
   - Global default: 16 → 64 (4x)
   - Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
   - Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
   - Class 7: 64 → 128 (2x)
   - SFC refill: 64 → 128 (2x)
3. Adaptive sizing aggressive parameters
   - Grow threshold: 80% → 70% (expand earlier)
   - Shrink threshold: 20% → 10% (shrink less)
   - Growth rate: 2x → 1.5x (smoother growth)
   - Max capacity: 2048 → 4096 (2x ceiling)
   - Adapt frequency: Every 10 → 5 refills (more responsive)

Performance Results (100K iterations):

Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)

After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) ✅
- Syscalls: 1,729 (unchanged) ❌

Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192

Root Cause Analysis:
Bottleneck is NOT TLS/SFC hit rate, but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)

Conclusion:
TLS/SFC tuning cannot solve SuperSlab allocation policy problem.

Next step: Phase 11 SuperSlab Prewarm strategy to eliminate
mmap/munmap during benchmark execution.

ChatGPT review: Strategy validated, Option A (Prewarm) recommended.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
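The adaptive sizing parameters listed under item 3 can be read as a small control loop. The following is a minimal sketch of one way such a loop might look, not the HAKMEM implementation. Only the numeric values (70%, 10%, 1.5x, 4096, every 5 refills) come from the commit message. The struct, the function name, the minimum capacity of 16, the shrink factor of 2/3, and the reading of "threshold" as peak occupancy relative to capacity are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the commit message (Phase 10). Names are hypothetical. */
#define ADAPT_GROW_PCT       70u    /* grow when utilization >= 70% */
#define ADAPT_SHRINK_PCT     10u    /* shrink when utilization <= 10% */
#define ADAPT_MAX_CAP        4096u  /* capacity ceiling */
#define ADAPT_EVERY_REFILLS  5u     /* re-evaluate every 5 refills */
/* Assumption: the commit message does not state a minimum capacity. */
#define ADAPT_MIN_CAP        16u

typedef struct {
    unsigned capacity;    /* current slot capacity */
    unsigned high_water;  /* peak occupancy since the last adaptation */
    unsigned refills;     /* refills since the last adaptation */
} adaptive_cache_t;

/* Call once per refill. Returns the (possibly updated) capacity. */
unsigned adaptive_on_refill(adaptive_cache_t *c)
{
    assert(c != NULL && c->capacity > 0);

    /* Only adapt on every ADAPT_EVERY_REFILLS-th refill. */
    if (++c->refills < ADAPT_EVERY_REFILLS)
        return c->capacity;

    /* Assumption: utilization = peak occupancy / capacity, in percent. */
    unsigned util_pct = c->high_water * 100u / c->capacity;

    if (util_pct >= ADAPT_GROW_PCT) {
        /* Grow by 1.5x, clamped to the ceiling. */
        unsigned grown = c->capacity + c->capacity / 2u;
        c->capacity = grown > ADAPT_MAX_CAP ? ADAPT_MAX_CAP : grown;
    } else if (util_pct <= ADAPT_SHRINK_PCT) {
        /* Assumption: shrink by the inverse of the growth rate (2/3). */
        unsigned shrunk = c->capacity * 2u / 3u;
        c->capacity = shrunk < ADAPT_MIN_CAP ? ADAPT_MIN_CAP : shrunk;
    }

    /* Start a fresh observation window. */
    c->refills = 0;
    c->high_water = 0;
    return c->capacity;
}
```

Under these assumptions, a 128-slot cache whose peak occupancy reached 100 slots (78%) grows to 192 slots on its fifth refill, while a cache already at 4096 slots stays at the ceiling.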
76 lines
2.9 KiB
C
/**
 * hakmem_tiny_config.c
 *
 * Implementation of centralized Tiny configuration constants
 */

#include "hakmem_tiny_config.h"

// ============================================================================
// Fast Cache Configuration
// ============================================================================

// Factory defaults ("aggressive") – mutable at runtime
// Phase 10: Aggressive cache sizing to maximize TLS hit rate
// Hot classes (C0-C3) get 2-4x larger caches to reduce backend transitions
static const uint16_t k_fast_cap_defaults_factory[TINY_NUM_CLASSES] = {
    512,  // Class 0: 8B (2x increase: hot class)
    512,  // Class 1: 16B (2x increase: hot class)
    512,  // Class 2: 32B (2x increase: hot class)
    384,  // Class 3: 64B (3x increase: hot class)
    256,  // Class 4: 128B (2x increase: medium class)
    384,  // Class 5: 256B (1.7x increase: bench-optimized)
    192,  // Class 6: 512B (1.5x increase)
    96    // Class 7: 1KB (2x increase: reduce superslab reliance)
};

uint16_t g_fast_cap_defaults[TINY_NUM_CLASSES] = {
    512, 512, 512, 384, 256, 384, 192, 96
};

void tiny_config_reset_defaults(void) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        g_fast_cap_defaults[i] = k_fast_cap_defaults_factory[i];
    }
}

// ============================================================================
// TLS Magazine Configuration
// ============================================================================

// Default TLS magazine capacities per class
// Phase 10: Aggressive cache sizing for hot classes (C0-C3)
// Goal: Maximize TLS hit rate, reduce backend transitions
int tiny_default_cap(int class_idx) {
    switch (class_idx) {
        case 0: return 512;   // 8B (4x increase: hot class)
        case 1: return 512;   // 16B (4x increase: hot class)
        case 2: return 512;   // 32B (4x increase: hot class)
        case 3: return 384;   // 64B (3x increase: hot class)
        case 4: return 192;   // 128B (2x increase: medium class)
        case 5: return 256;   // 256B (2x increase: medium class)
        case 6: return 192;   // 512B (1.5x increase)
        default: return 128;  // 1KB (2x increase)
    }
}

// Alias for tiny_default_cap
int tiny_mag_default_cap(int class_idx) {
    return tiny_default_cap(class_idx);
}

// Maximum allowed TLS magazine capacities per class
// Phase 10: Raise ceilings to allow aggressive cache growth
int tiny_cap_max_for_class(int class_idx) {
    switch (class_idx) {
        case 0: return 4096;  // 8B (2x increase: allow massive caching)
        case 1: return 4096;  // 16B (4x increase: hot class)
        case 2: return 2048;  // 32B (2.67x increase: hot class)
        case 3: return 1536;  // 64B (3x increase: hot class)
        case 4: return 512;   // 128B (3.2x increase: medium class)
        case 5: return 768;   // 256B (3x increase: medium class)
        case 6: return 384;   // 512B (3x increase)
        default: return 256;  // 1KB (4x increase)
    }
}
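The default and ceiling functions above are naturally used together: a caller picks a capacity, then bounds it by the per-class maximum. The sketch below illustrates that pattern as a standalone snippet. It restates the values of `tiny_default_cap` and `tiny_cap_max_for_class` as tables so that it compiles without the project headers. `SKETCH_NUM_CLASSES`, `sketch_clamp_cap`, and `sketch_defaults_within_max` are hypothetical names, and the class count of 8 is an assumption inferred from the eight entries in the arrays above.

```c
#include <assert.h>

/* Assumption: 8 classes, matching the 8 initializers in the file above. */
#define SKETCH_NUM_CLASSES 8

/* Values copied from tiny_default_cap(). */
static const int k_sketch_default_cap[SKETCH_NUM_CLASSES] = {
    512, 512, 512, 384, 192, 256, 192, 128
};

/* Values copied from tiny_cap_max_for_class(). */
static const int k_sketch_max_cap[SKETCH_NUM_CLASSES] = {
    4096, 4096, 2048, 1536, 512, 768, 384, 256
};

/* Hypothetical helper, not part of hakmem_tiny_config.c:
 * a non-positive request falls back to the class default,
 * anything above the ceiling is clamped to the ceiling. */
int sketch_clamp_cap(int class_idx, int requested)
{
    assert(class_idx >= 0 && class_idx < SKETCH_NUM_CLASSES);
    if (requested <= 0)
        return k_sketch_default_cap[class_idx];
    int max = k_sketch_max_cap[class_idx];
    return requested > max ? max : requested;
}

/* Returns 1 when every default capacity fits under its ceiling. */
int sketch_defaults_within_max(void)
{
    for (int i = 0; i < SKETCH_NUM_CLASSES; i++) {
        if (k_sketch_default_cap[i] > k_sketch_max_cap[i])
            return 0;
    }
    return 1;
}
```

A check like `sketch_defaults_within_max` is cheap to run in a unit test and would catch a future edit that raises a default past its ceiling.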