Phase 10: TLS/SFC aggressive cache tuning (syscall reduction failed)
Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)

Implementation:

1. Cache capacity expansion (2-8x per-class)
   - Hot classes (C0-C3): 4x increase (512 slots)
   - Medium classes (C4-C6): 2-3x increase
   - Class 7 (1KB): 2x increase (128 slots)
   - Fast cache: 2x default capacity

2. Refill batch size increase (4-8x)
   - Global default: 16 → 64 (4x)
   - Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
   - Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
   - Class 7: 64 → 128 (2x)
   - SFC refill: 64 → 128 (2x)

3. Adaptive sizing aggressive parameters
   - Grow threshold: 80% → 70% (expand earlier)
   - Shrink threshold: 20% → 10% (shrink less)
   - Growth rate: 2x → 1.5x (smoother growth)
   - Max capacity: 2048 → 4096 (2x ceiling)
   - Adapt frequency: Every 10 → 5 refills (more responsive)

Performance Results (100K iterations):

Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)

After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) ✅
- Syscalls: 1,729 (unchanged) ❌

Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192

Root Cause Analysis:

Bottleneck is NOT TLS/SFC hit rate, but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)

Conclusion:

TLS/SFC tuning cannot solve SuperSlab allocation policy problem.
Next step: Phase 11 SuperSlab Prewarm strategy to eliminate mmap/munmap
during benchmark execution.

ChatGPT review: Strategy validated, Option A (Prewarm) recommended.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
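Illustration of the ENV overrides above: a minimal sketch of how a refill count such as HAKMEM_TINY_REFILL_COUNT_HOT could be read from the environment. The helper name `refill_count_from_env` and the clamp bounds are assumptions made for this example; the commit does not show how HAKMEM actually parses these variables.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the HAKMEM source): read a refill count
 * from the environment variable `name`. Returns `fallback` when the
 * variable is unset, empty, or not a clean decimal integer; otherwise
 * returns the parsed value clamped to [min_count, max_count]. */
static int refill_count_from_env(const char *name, int fallback,
                                 int min_count, int max_count)
{
    const char *s = getenv(name);
    if (s == NULL || *s == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return fallback;            /* overflow or trailing garbage */

    if (v < min_count)
        return min_count;
    if (v > max_count)
        return max_count;
    return (int)v;
}
```

With HAKMEM_TINY_REFILL_COUNT_HOT=256 exported as in the optimal configuration, a call like `refill_count_from_env("HAKMEM_TINY_REFILL_COUNT_HOT", 128, 1, 1024)` would yield 256; with the variable unset it would fall back to the Phase 10 default of 128.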
@@ -12,17 +12,20 @@
 // ========== Configuration ==========

 // Capacity bounds
-#define TLS_CACHE_MIN_CAPACITY 16       // Minimum cache size
-#define TLS_CACHE_MAX_CAPACITY 2048     // Maximum cache size
-#define TLS_CACHE_INITIAL_CAPACITY 64   // Initial size (reduced from 256)
+// Phase 10: Aggressive adaptive sizing - maximize front cache utilization
+#define TLS_CACHE_MIN_CAPACITY 32       // Minimum cache size (2x increase)
+#define TLS_CACHE_MAX_CAPACITY 4096     // Maximum cache size (2x increase)
+#define TLS_CACHE_INITIAL_CAPACITY 256  // Initial size (4x increase from 64)

 // Adaptation triggers
-#define ADAPT_REFILL_THRESHOLD 10       // Adapt every 10 refills
-#define ADAPT_TIME_THRESHOLD_NS (1000000000ULL)  // Or every 1 second
+// Phase 10: More frequent adaptation to respond quickly to workload changes
+#define ADAPT_REFILL_THRESHOLD 5        // Adapt every 5 refills (was 10)
+#define ADAPT_TIME_THRESHOLD_NS (500000000ULL)   // Or every 0.5 seconds (was 1s)

 // Growth/shrink thresholds
-#define GROW_THRESHOLD 0.8              // Grow if usage > 80% of capacity
-#define SHRINK_THRESHOLD 0.2            // Shrink if usage < 20% of capacity
+// Phase 10: Aggressive growth, conservative shrinkage
+#define GROW_THRESHOLD 0.7              // Grow if usage > 70% of capacity (was 80%)
+#define SHRINK_THRESHOLD 0.1            // Shrink if usage < 10% of capacity (was 20%)

 // ========== Data Structures ==========

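To make the effect of the new thresholds concrete, here is a minimal sketch of a resize decision driven by the Phase 10 constants. The function `adapt_capacity` and its `high_water` argument are hypothetical names introduced for this example. The 1.5x growth step comes from the commit message; halving on shrink is an assumption, since the commit does not state the shrink factor. The real adaptation code is not part of this diff.

```c
#include <stddef.h>

/* Phase 10 values from the diff above. */
#define TLS_CACHE_MIN_CAPACITY 32
#define TLS_CACHE_MAX_CAPACITY 4096
#define GROW_THRESHOLD   0.7
#define SHRINK_THRESHOLD 0.1

/* Hypothetical sketch: given the current capacity and the highest
 * occupancy seen since the last adaptation, return the next capacity. */
static size_t adapt_capacity(size_t capacity, size_t high_water)
{
    if (capacity == 0)
        return TLS_CACHE_MIN_CAPACITY;

    double usage = (double)high_water / (double)capacity;
    size_t next = capacity;

    if (usage > GROW_THRESHOLD)
        next = capacity + capacity / 2;   /* 1.5x growth (commit message) */
    else if (usage < SHRINK_THRESHOLD)
        next = capacity / 2;              /* shrink factor is an assumption */

    /* Keep the result inside the configured capacity bounds. */
    if (next < TLS_CACHE_MIN_CAPACITY)
        next = TLS_CACHE_MIN_CAPACITY;
    if (next > TLS_CACHE_MAX_CAPACITY)
        next = TLS_CACHE_MAX_CAPACITY;
    return next;
}
```

Under these assumptions, a cache at the initial capacity of 256 whose occupancy peaked at 200 slots (about 78%) grows to 384, whereas under the Phase 9 threshold of 80% it would have stayed at 256.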