Phase 10: TLS/SFC aggressive cache tuning (syscall reduction failed)

Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)

Implementation:

1. Cache capacity expansion (2-8x per-class)
   - Hot classes (C0-C3): 4x increase (512 slots)
   - Medium classes (C4-C6): 2-3x increase
   - Class 7 (1KB): 2x increase (128 slots)
   - Fast cache: 2x default capacity

2. Refill batch size increase (4-8x)
   - Global default: 16 → 64 (4x)
   - Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
   - Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
   - Class 7: 64 → 128 (2x)
   - SFC refill: 64 → 128 (2x)

3. Adaptive sizing aggressive parameters
   - Grow threshold: 80% → 70% (expand earlier)
   - Shrink threshold: 20% → 10% (shrink less)
   - Growth rate: 2x → 1.5x (smoother growth)
   - Max capacity: 2048 → 4096 (2x ceiling)
   - Adapt frequency: Every 10 → 5 refills (more responsive)

Performance Results (100K iterations):

Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)

After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) 
- Syscalls: 1,729 (unchanged) 

Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192

Root Cause Analysis:

Bottleneck is NOT TLS/SFC hit rate, but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)

Conclusion:

TLS/SFC tuning cannot solve SuperSlab allocation policy problem.
Next step: Phase 11 SuperSlab Prewarm strategy to eliminate
mmap/munmap during benchmark execution.

ChatGPT review: Strategy validated, Option A (Prewarm) recommended.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-11-13 14:25:54 +09:00
parent fb10d1710b
commit 030132f911
6 changed files with 63 additions and 46 deletions

View File

@ -466,17 +466,23 @@ void hak_tiny_init(void) {
}
// Front refill count globals
// Phase 10: Set aggressive defaults for hot and mid classes
{
char* g = getenv("HAKMEM_TINY_REFILL_COUNT");
if (g) { int v = atoi(g); if (v < 0) v = 0; if (v > 256) v = 256; g_refill_count_global = v; }
else { g_refill_count_global = 64; } // Phase 10: default 64 (was 16)
char* h = getenv("HAKMEM_TINY_REFILL_COUNT_HOT");
if (h) { int v = atoi(h); if (v < 0) v = 0; if (v > 256) v = 256; g_refill_count_hot = v; }
else { g_refill_count_hot = 128; } // Phase 10: default 128 for hot classes (C0-C3)
char* m = getenv("HAKMEM_TINY_REFILL_COUNT_MID");
if (m) { int v = atoi(m); if (v < 0) v = 0; if (v > 256) v = 256; g_refill_count_mid = v; }
else { g_refill_count_mid = 96; } // Phase 10: default 96 for mid classes (C4-C7)
}
// Sensible default for class 7 (1024B): favor larger refill to reduce refills/syscalls
if (g_refill_count_class[7] == 0) {
g_refill_count_class[7] = 64; // can be overridden by env HAKMEM_TINY_REFILL_COUNT_C7
g_refill_count_class[7] = 128; // Phase 10: increased from 64 to 128
}
{
char* fast_env = getenv("HAKMEM_TINY_FAST");