hakmem/core/hakmem_tiny_config.c
Moe Charm (CI) 030132f911 Phase 10: TLS/SFC aggressive cache tuning (syscall reduction failed)
Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)

Implementation:

1. Cache capacity expansion (2-8x per-class)
   - Hot classes (C0-C3): 4x increase (512 slots)
   - Medium classes (C4-C6): 2-3x increase
   - Class 7 (1KB): 2x increase (128 slots)
   - Fast cache: 2x default capacity

2. Refill batch size increase (4-8x)
   - Global default: 16 → 64 (4x)
   - Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
   - Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
   - Class 7: 64 → 128 (2x)
   - SFC refill: 64 → 128 (2x)

3. Adaptive sizing aggressive parameters
   - Grow threshold: 80% → 70% (expand earlier)
   - Shrink threshold: 20% → 10% (shrink less)
   - Growth rate: 2x → 1.5x (smoother growth)
   - Max capacity: 2048 → 4096 (2x ceiling)
   - Adapt frequency: Every 10 → 5 refills (more responsive)
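Item 3 describes a grow/shrink policy built from four parts: two thresholds, a growth rate, a ceiling, and an evaluation period. Below is a minimal sketch of such a policy. It rests on assumptions the commit does not state:
- "usage" means the peak cache occupancy observed between evaluations;
- shrinking halves the capacity, with a floor of 16 slots.

All names are hypothetical and are not hakmem's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values taken from the Phase 10 commit message. */
#define ADAPT_GROW_PCT      70u   /* grow when peak usage reaches 70% of capacity */
#define ADAPT_SHRINK_PCT    10u   /* shrink when peak usage is at most 10% */
#define ADAPT_MAX_CAP     4096u   /* capacity ceiling (was 2048) */
#define ADAPT_MIN_CAP       16u   /* ASSUMED floor; not stated in the commit */
#define ADAPT_EVERY          5u   /* re-evaluate every 5 refills (was 10) */

typedef struct {
    uint32_t cap;        /* current capacity in slots */
    uint32_t peak_used;  /* high-water mark since the last adaptation */
    uint32_t refills;    /* refills since the last adaptation */
} adapt_state_t;

/* Record one refill; every ADAPT_EVERY refills, grow or shrink the capacity. */
static void adapt_on_refill(adapt_state_t *s, uint32_t used_now) {
    assert(s->cap > 0);
    if (used_now > s->peak_used) s->peak_used = used_now;
    if (++s->refills < ADAPT_EVERY) return;

    uint64_t pct = (uint64_t)s->peak_used * 100u / s->cap;
    if (pct >= ADAPT_GROW_PCT) {
        uint32_t next = s->cap + s->cap / 2;   /* 1.5x growth (was 2x) */
        s->cap = next > ADAPT_MAX_CAP ? ADAPT_MAX_CAP : next;
    } else if (pct <= ADAPT_SHRINK_PCT) {
        uint32_t next = s->cap / 2;            /* ASSUMED shrink factor */
        s->cap = next < ADAPT_MIN_CAP ? ADAPT_MIN_CAP : next;
    }
    s->refills = 0;
    s->peak_used = 0;
}
```

Lowering the grow threshold from 80% to 70% makes the `pct >= ADAPT_GROW_PCT` branch fire earlier. Lowering the shrink threshold from 20% to 10% makes the shrink branch fire less often. This matches the "expand earlier" and "shrink less" notes above.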

Performance Results (100K iterations):

Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)

After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) 
- Syscalls: 1,729 (unchanged) 
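The percentages are relative to the Phase 9 baseline of 9.71M ops/s:
- (8.77 - 9.71) / 9.71 ≈ -9.7% for the default settings.
- (9.89 - 9.71) / 9.71 ≈ +1.85% for the optimal ENV, which the summary rounds to +2%.

The small helper below is an illustrative sketch that makes this arithmetic explicit.

```c
#include <assert.h>

/* Percentage change of `value` relative to `baseline` (negative = slower). */
static double pct_change(double baseline, double value) {
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}
```

For example, `pct_change(9.71, 8.77)` evaluates to about -9.68.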

Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192
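These variables presumably override the per-class refill batch sizes listed under item 2. Below is a minimal sketch of how such an override could be read. It assumes this class grouping:
- C0-C3 use the HOT value;
- C4-C6 use the MID value;
- class 7 uses a fixed 128;
- any other index uses the global default of 64.

The helper names and the 65535 upper bound are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* expose setenv()/unsetenv() for callers that test overrides */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Defaults from the Phase 10 commit message; macro names are hypothetical. */
#define REFILL_DEFAULT_GLOBAL  64
#define REFILL_DEFAULT_HOT    128
#define REFILL_DEFAULT_MID     96
#define REFILL_DEFAULT_C7     128

/* Read a positive decimal integer from the environment, else return fallback. */
static int env_int_or(const char *name, int fallback) {
    assert(name != NULL);
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') return fallback;
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    /* Reject overflow, trailing junk, non-positive values, and (ASSUMED) values > 65535. */
    if (errno != 0 || *end != '\0' || v <= 0 || v > 65535) return fallback;
    return (int)v;
}

/* ASSUMED grouping: C0-C3 hot, C4-C6 mid, C7 fixed, anything else global. */
static int refill_count_for_class(int class_idx) {
    if (class_idx >= 0 && class_idx <= 3)
        return env_int_or("HAKMEM_TINY_REFILL_COUNT_HOT", REFILL_DEFAULT_HOT);
    if (class_idx >= 4 && class_idx <= 6)
        return env_int_or("HAKMEM_TINY_REFILL_COUNT_MID", REFILL_DEFAULT_MID);
    if (class_idx == 7)
        return REFILL_DEFAULT_C7;
    return REFILL_DEFAULT_GLOBAL;
}
```

With the exports above in effect, `refill_count_for_class(0)` would return 256 and `refill_count_for_class(5)` would return 192. A malformed value such as `abc` falls back to the default rather than disabling refill.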

Root Cause Analysis:

The bottleneck is NOT the TLS/SFC hit rate but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)
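The syscall figures are internally consistent:
- the mmap and munmap counts sum to the reported total;
- dividing by system malloc's 9 syscalls gives the 192x gap;
- the 877MB footprint implies 1MB per SuperSlab.

$$
\underbrace{877}_{\text{mmap}} + \underbrace{852}_{\text{munmap}} = 1729,
\qquad \frac{1729}{9} \approx 192.1,
\qquad \frac{877\ \text{MB}}{877\ \text{SuperSlabs}} = 1\ \text{MB per SuperSlab}
$$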

Conclusion:

TLS/SFC tuning cannot solve the SuperSlab allocation-policy problem.
Next step: a Phase 11 SuperSlab Prewarm strategy to eliminate
mmap/munmap calls during benchmark execution.
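Prewarming moves the mmap calls out of the measured region:
1. map a batch of SuperSlab-sized regions once at startup;
2. serve them from a free stack on the hot path;
3. push released regions back instead of unmapping them.

Below is a minimal sketch of that idea. It assumes 1MB SuperSlabs, as implied by 877 SuperSlabs totalling 877MB, and ignores any alignment requirement the real allocator may have. The pool type and function names are hypothetical.

```c
#define _DEFAULT_SOURCE           /* MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* ASSUMED: 1MB SuperSlabs (877 SuperSlabs = 877MB above); names are hypothetical. */
#define PREWARM_SLAB_BYTES ((size_t)1 << 20)
#define PREWARM_MAX        64

typedef struct {
    void  *slabs[PREWARM_MAX];
    size_t count;
} prewarm_pool_t;

/* Startup: mmap regions until the pool holds n of them (capped at PREWARM_MAX). */
static size_t prewarm_fill(prewarm_pool_t *p, size_t n) {
    if (n > PREWARM_MAX) n = PREWARM_MAX;
    while (p->count < n) {
        void *m = mmap(NULL, PREWARM_SLAB_BYTES, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED) break;   /* keep whatever was mapped so far */
        p->slabs[p->count++] = m;
    }
    return p->count;
}

/* Hot path: hand out a prewarmed region with no syscall; NULL when exhausted. */
static void *prewarm_take(prewarm_pool_t *p) {
    return p->count > 0 ? p->slabs[--p->count] : NULL;
}

/* Hot path: keep a released region for reuse instead of calling munmap.
 * Returns 0 when the pool is full, in which case the caller must munmap it. */
static int prewarm_give(prewarm_pool_t *p, void *slab) {
    assert(slab != NULL);
    if (p->count >= PREWARM_MAX) return 0;
    p->slabs[p->count++] = slab;
    return 1;
}
```

This commit does not say how Phase 11 will size or organize the pool. It could be a fixed pool, a per-class pool, or tied into the Phase 9 LRU cache. The sketch only shows why a prewarmed pool keeps mmap/munmap off the benchmark's hot path.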

ChatGPT review: Strategy validated, Option A (Prewarm) recommended.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 14:25:54 +09:00

/**
 * hakmem_tiny_config.c
 *
 * Implementation of centralized Tiny configuration constants
 */
#include "hakmem_tiny_config.h"
// ============================================================================
// Fast Cache Configuration
// ============================================================================
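// Factory defaults ("aggressive"); g_fast_cap_defaults below is the runtime-mutable
// copy, which tiny_config_reset_defaults() restores to these values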
// Phase 10: Aggressive cache sizing to maximize TLS hit rate
// Hot classes (C0-C3) get 2-4x larger caches to reduce backend transitions
static const uint16_t k_fast_cap_defaults_factory[TINY_NUM_CLASSES] = {
    512, // Class 0: 8B (2x increase: hot class)
    512, // Class 1: 16B (2x increase: hot class)
    512, // Class 2: 32B (2x increase: hot class)
    384, // Class 3: 64B (3x increase: hot class)
    256, // Class 4: 128B (2x increase: medium class)
    384, // Class 5: 256B (1.7x increase: bench-optimized)
    192, // Class 6: 512B (1.5x increase)
    96   // Class 7: 1KB (2x increase: reduce superslab reliance)
};

uint16_t g_fast_cap_defaults[TINY_NUM_CLASSES] = {
    512, 512, 512, 384, 256, 384, 192, 96
};

void tiny_config_reset_defaults(void) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        g_fast_cap_defaults[i] = k_fast_cap_defaults_factory[i];
    }
}

// ============================================================================
// TLS Magazine Configuration
// ============================================================================
// Default TLS magazine capacities per class
// Phase 10: Aggressive cache sizing for hot classes (C0-C3)
// Goal: Maximize TLS hit rate, reduce backend transitions
int tiny_default_cap(int class_idx) {
    switch (class_idx) {
    case 0: return 512;  // 8B (4x increase: hot class)
    case 1: return 512;  // 16B (4x increase: hot class)
    case 2: return 512;  // 32B (4x increase: hot class)
    case 3: return 384;  // 64B (3x increase: hot class)
    case 4: return 192;  // 128B (2x increase: medium class)
    case 5: return 256;  // 256B (2x increase: medium class)
    case 6: return 192;  // 512B (1.5x increase)
    default: return 128; // Class 7: 1KB (2x increase); also returned for any other index
    }
}

// Alias for tiny_default_cap
int tiny_mag_default_cap(int class_idx) {
    return tiny_default_cap(class_idx);
}

// Maximum allowed TLS magazine capacities per class
// Phase 10: Raise ceilings to allow aggressive cache growth
int tiny_cap_max_for_class(int class_idx) {
    switch (class_idx) {
    case 0: return 4096; // 8B (2x increase: allow massive caching)
    case 1: return 4096; // 16B (4x increase: hot class)
    case 2: return 2048; // 32B (2.67x increase: hot class)
    case 3: return 1536; // 64B (3x increase: hot class)
    case 4: return 512;  // 128B (3.2x increase: medium class)
    case 5: return 768;  // 256B (3x increase: medium class)
    case 6: return 384;  // 512B (3x increase)
    default: return 256; // Class 7: 1KB (4x increase); also returned for any other index
    }
}
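The file keeps three per-class tables:
- the fast-cache factory defaults, with a mutable runtime copy;
- the default TLS magazine capacity;
- the magazine ceiling.

Two sanity checks follow from this layout: every default magazine capacity should stay at or below its ceiling, and runtime tuning of the fast-cache copy should be undoable. The self-contained sketch below copies the values into local arrays rather than linking against hakmem. `TINY_NUM_CLASSES = 8` and the local array and function names are assumptions, not the project's API.

```c
#include <assert.h>
#include <stdint.h>

#define TINY_NUM_CLASSES 8  /* ASSUMED: matches the eight entries in the tables above */

/* Local copies of this file's values (fast-cache factory table and magazine caps). */
static const uint16_t k_factory[TINY_NUM_CLASSES] = {512, 512, 512, 384, 256, 384, 192, 96};
static uint16_t g_fast_caps[TINY_NUM_CLASSES]      = {512, 512, 512, 384, 256, 384, 192, 96};
static const int k_mag_default[TINY_NUM_CLASSES]   = {512, 512, 512, 384, 192, 256, 192, 128};
static const int k_mag_max[TINY_NUM_CLASSES]       = {4096, 4096, 2048, 1536, 512, 768, 384, 256};

/* Same pattern as tiny_config_reset_defaults(): copy the factory values back. */
static void reset_fast_caps(void) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++) g_fast_caps[i] = k_factory[i];
}

/* First class whose default magazine capacity exceeds its ceiling, or -1 if none. */
static int first_cap_violation(void) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++)
        if (k_mag_default[i] > k_mag_max[i]) return i;
    return -1;
}
```

With the Phase 10 values, `first_cap_violation()` returns -1: every default fits under its raised ceiling. A caller that overwrites `g_fast_caps` entries can restore them with `reset_fast_caps()`, mirroring how `tiny_config_reset_defaults()` restores `g_fast_cap_defaults`.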