Phase 10: TLS/SFC aggressive cache tuning (syscall reduction failed)

Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)

Implementation:

1. Cache capacity expansion (1.5-4x per class)
   - Hot classes: C0-C2 4x increase (512 slots), C3 3x increase (384 slots)
   - Medium classes (C4-C6): 1.5-2x increase
   - Class 7 (1KB): 2x increase (128 slots)
   - Fast cache: 1.5-3x default capacity (2x for C0-C2)

2. Refill batch size increase (2-8x)
   - Global default: 16 → 64 (4x)
   - Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
   - Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
   - Class 7: 64 → 128 (2x)
   - SFC refill: 64 → 128 (2x)

3. Adaptive sizing aggressive parameters
   - Grow threshold: 80% → 70% (expand earlier)
   - Shrink threshold: 20% → 10% (shrink less)
   - Growth rate: 2x → 1.5x (smoother growth)
   - Max capacity: 2048 → 4096 (2x ceiling)
   - Adapt frequency: Every 10 → 5 refills (more responsive)
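
A minimal sketch of how the adaptive-sizing parameters above could fit
together. The thresholds, growth rate, ceiling, and adapt frequency are the
Phase 10 values; everything else is an assumption for illustration: the names
(adapt_state_t, adapt_on_refill), the use of high-water utilization as the
measured quantity, the halving on shrink, and the 16-slot floor do not come
from the HAKMEM source.

```c
#include <stdint.h>

/* Phase 10 values from the list above. */
#define ADAPT_GROW_PCT        70u    /* grow at or above 70% utilization */
#define ADAPT_SHRINK_PCT      10u    /* shrink at or below 10% utilization */
#define ADAPT_MAX_CAPACITY    4096u  /* capacity ceiling */
#define ADAPT_EVERY_N_REFILLS 5u     /* re-evaluate every 5 refills */

/* Assumption: a floor so the cache never shrinks to nothing. */
#define ADAPT_MIN_CAPACITY    16u

typedef struct {
    uint32_t capacity;      /* current slot capacity */
    uint32_t refills_seen;  /* refills since the last adaptation */
} adapt_state_t;

/* Called once per refill; high_water is the peak slot usage observed
 * since the last adaptation. Returns the (possibly updated) capacity. */
static uint32_t adapt_on_refill(adapt_state_t *s, uint32_t high_water) {
    if (++s->refills_seen < ADAPT_EVERY_N_REFILLS)
        return s->capacity;
    s->refills_seen = 0;

    uint32_t usage_pct = s->capacity ? (high_water * 100u) / s->capacity : 0u;

    if (usage_pct >= ADAPT_GROW_PCT) {
        /* Growth rate 1.5x, clamped to the ceiling. */
        uint32_t grown = s->capacity + s->capacity / 2u;
        s->capacity = grown > ADAPT_MAX_CAPACITY ? ADAPT_MAX_CAPACITY : grown;
    } else if (usage_pct <= ADAPT_SHRINK_PCT && s->capacity > ADAPT_MIN_CAPACITY) {
        /* Assumption: halve on shrink, clamped to the floor. */
        uint32_t shrunk = s->capacity / 2u;
        s->capacity = shrunk < ADAPT_MIN_CAPACITY ? ADAPT_MIN_CAPACITY : shrunk;
    }
    return s->capacity;
}
```

Under these assumptions a 512-slot cache that peaks at 400 slots (78%) grows
to 768 on the fifth refill, while one that peaks at 40 slots (7%) shrinks
to 256.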

Performance Results (100K iterations):

Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)

After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) 
- Syscalls: 1,729 (unchanged) 
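
The percentages above are relative to the Phase 9 baseline of 9.71M ops/s.
A small helper (pct_change is a hypothetical name, not part of HAKMEM) makes
the arithmetic explicit:

```c
/* Relative change from `before` to `after`, in percent. */
static double pct_change(double before, double after) {
    return (after - before) / before * 100.0;
}
```

With the figures above, pct_change(9.71, 8.77) is about -9.7 and
pct_change(9.71, 9.89) is about +1.9, which is reported as +2%.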

Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192
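
A sketch of how such overrides might be read at startup, assuming the
variables hold plain decimal integers. Only the two variable names and the
defaults (128 hot, 96 mid, 128 for Class 7) come from this commit message;
the helper names, the 1..4096 clamp, and the fallback behavior on malformed
input are hypothetical and not taken from the HAKMEM source.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer, returning `fallback` for NULL, empty, or
 * malformed input and clamping valid values to [lo, hi]. */
static int parse_int_or_default(const char *s, int fallback, int lo, int hi) {
    if (s == NULL || *s == '\0')
        return fallback;
    errno = 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return fallback;
    if (v < lo) return lo;
    if (v > hi) return hi;
    return (int)v;
}

/* Read an integer override from the environment. */
static int env_int_or_default(const char *name, int fallback, int lo, int hi) {
    return parse_int_or_default(getenv(name), fallback, lo, hi);
}

/* Hypothetical per-class selection: C0-C3 are hot, C4-C6 are mid,
 * and Class 7 keeps the fixed value from the commit message. */
static int refill_count_for_class(int class_idx) {
    if (class_idx >= 0 && class_idx <= 3)
        return env_int_or_default("HAKMEM_TINY_REFILL_COUNT_HOT", 128, 1, 4096);
    if (class_idx >= 4 && class_idx <= 6)
        return env_int_or_default("HAKMEM_TINY_REFILL_COUNT_MID", 96, 1, 4096);
    return 128;
}
```

With the exports above in effect, this sketch would return 256 for classes
0-3 and 192 for classes 4-6. Falling back to the default on malformed input,
rather than aborting, is a design choice of the sketch.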

Root Cause Analysis:

Bottleneck is NOT TLS/SFC hit rate, but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)

Conclusion:

TLS/SFC tuning cannot solve the SuperSlab allocation policy problem.
Next step: Phase 11 SuperSlab Prewarm strategy to eliminate
mmap/munmap during benchmark execution.

ChatGPT review: Strategy validated, Option A (Prewarm) recommended.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-11-13 14:25:54 +09:00
parent fb10d1710b
commit 030132f911
6 changed files with 63 additions and 46 deletions


@@ -10,21 +10,22 @@
 // Fast Cache Configuration
 // ============================================================================
-// Factory defaults (“balanced”) mutable at runtime
-// Small classes (0..2) are given higher caps by default to favor hot small-size throughput.
+// Factory defaults ("aggressive") mutable at runtime
+// Phase 10: Aggressive cache sizing to maximize TLS hit rate
+// Hot classes (C0-C3) get 2-4x larger caches to reduce backend transitions
 static const uint16_t k_fast_cap_defaults_factory[TINY_NUM_CLASSES] = {
-    256, // Class 0: 8B (was 128)
-    256, // Class 1: 16B (was 128)
-    256, // Class 2: 32B (was 128)
-    128, // Class 3: 64B (reduced from 512 to limit RSS)
-    128, // Class 4: 128B (trimmed via ACE/TLS caps)
-    224, // Class 5: 256B (bench-optimized default)
-    128, // Class 6: 512B
-    48 // Class 7: 1KB (reduce superslab reliance)
+    512, // Class 0: 8B (2x increase: hot class)
+    512, // Class 1: 16B (2x increase: hot class)
+    512, // Class 2: 32B (2x increase: hot class)
+    384, // Class 3: 64B (3x increase: hot class)
+    256, // Class 4: 128B (2x increase: medium class)
+    384, // Class 5: 256B (1.7x increase: bench-optimized)
+    192, // Class 6: 512B (1.5x increase)
+    96 // Class 7: 1KB (2x increase: reduce superslab reliance)
 };
 uint16_t g_fast_cap_defaults[TINY_NUM_CLASSES] = {
-    256, 256, 256, 128, 128, 224, 128, 48
+    512, 512, 512, 384, 256, 384, 192, 96
 };
 void tiny_config_reset_defaults(void) {
@@ -38,16 +39,18 @@ void tiny_config_reset_defaults(void) {
 // ============================================================================
 // Default TLS magazine capacities per class
+// Phase 10: Aggressive cache sizing for hot classes (C0-C3)
+// Goal: Maximize TLS hit rate, reduce backend transitions
 int tiny_default_cap(int class_idx) {
     switch (class_idx) {
-        case 0: return 128; // 8B
-        case 1: return 128; // 16B
-        case 2: return 128; // 32B
-        case 3: return 128; // 64B (reduced from 512 to limit RSS)
-        case 4: return 96; // 128B (aggressively trimmed to limit RSS)
-        case 5: return 128; // 256B
-        case 6: return 128; // 512B
-        default: return 64; // 1KB
+        case 0: return 512; // 8B (4x increase: hot class)
+        case 1: return 512; // 16B (4x increase: hot class)
+        case 2: return 512; // 32B (4x increase: hot class)
+        case 3: return 384; // 64B (3x increase: hot class)
+        case 4: return 192; // 128B (2x increase: medium class)
+        case 5: return 256; // 256B (2x increase: medium class)
+        case 6: return 192; // 512B (1.5x increase)
+        default: return 128; // 1KB (2x increase)
     }
 }
@@ -57,15 +60,16 @@ int tiny_mag_default_cap(int class_idx) {
 }
 // Maximum allowed TLS magazine capacities per class
+// Phase 10: Raise ceilings to allow aggressive cache growth
 int tiny_cap_max_for_class(int class_idx) {
     switch (class_idx) {
-        case 0: return 2048;
-        case 1: return 1024;
-        case 2: return 768;
-        case 3: return 512;
-        case 4: return 160;
-        case 5: return 256;
-        case 6: return 128;
-        default: return 64;
+        case 0: return 4096; // 8B (2x increase: allow massive caching)
+        case 1: return 4096; // 16B (4x increase: hot class)
+        case 2: return 2048; // 32B (2.67x increase: hot class)
+        case 3: return 1536; // 64B (3x increase: hot class)
+        case 4: return 512; // 128B (3.2x increase: medium class)
+        case 5: return 768; // 256B (3x increase: medium class)
+        case 6: return 384; // 512B (3x increase)
+        default: return 256; // 1KB (4x increase)
     }
 }