hakmem/core/hakmem_tiny_config.h
Moe Charm (CI) 030132f911 Phase 10: TLS/SFC aggressive cache tuning (syscall reduction failed)
Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)

Implementation:

1. Cache capacity expansion (2-8x per-class)
   - Hot classes (C0-C3): 4x increase (512 slots)
   - Medium classes (C4-C6): 2-3x increase
   - Class 7 (1KB): 2x increase (128 slots)
   - Fast cache: 2x default capacity

2. Refill batch size increase (2-8x)
   - Global default: 16 → 64 (4x)
   - Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
   - Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
   - Class 7: 64 → 128 (2x)
   - SFC refill: 64 → 128 (2x)

3. Adaptive sizing aggressive parameters
   - Grow threshold: 80% → 70% (expand earlier)
   - Shrink threshold: 20% → 10% (shrink less)
   - Growth rate: 2x → 1.5x (smoother growth)
   - Max capacity: 2048 → 4096 (2x ceiling)
   - Adapt frequency: Every 10 → 5 refills (more responsive)
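
The adaptive sizing parameters above can be sketched roughly as follows. This is a minimal illustration, not the hakmem implementation: the `adapt_state_t` struct, the `adapt_on_refill` function, the use of peak occupancy as the "utilization" signal, the halving on shrink, and the minimum capacity are all assumptions made for the example. Only the 70% / 10% / 1.5x / 4096 / every-5-refills values come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Phase 10 values taken from the commit message. */
#define ADAPT_GROW_PCT    70    /* grow when utilization >= 70% */
#define ADAPT_SHRINK_PCT  10    /* shrink when utilization <= 10% */
#define ADAPT_MAX_CAP     4096  /* capacity ceiling */
#define ADAPT_INTERVAL    5     /* re-evaluate every 5 refills */
#define ADAPT_MIN_CAP     16    /* ASSUMPTION: floor is not stated in the commit */

/* Hypothetical per-class state; not the real hakmem structure. */
typedef struct {
    size_t   capacity;    /* current cache capacity in slots */
    size_t   high_water;  /* peak occupancy since the last adaptation */
    unsigned refills;     /* refills since the last adaptation */
} adapt_state_t;

/* Call once per refill. Returns 1 if the capacity changed, 0 otherwise. */
static int adapt_on_refill(adapt_state_t *st)
{
    assert(st != NULL && st->capacity > 0);

    if (++st->refills < ADAPT_INTERVAL)
        return 0;                      /* not time to adapt yet */
    st->refills = 0;

    size_t old      = st->capacity;
    size_t util_pct = st->high_water * 100 / old;

    if (util_pct >= ADAPT_GROW_PCT) {
        size_t grown = old + old / 2;  /* 1.5x growth */
        st->capacity = grown > ADAPT_MAX_CAP ? ADAPT_MAX_CAP : grown;
    } else if (util_pct <= ADAPT_SHRINK_PCT) {
        size_t shrunk = old / 2;       /* ASSUMPTION: halve on shrink */
        st->capacity = shrunk < ADAPT_MIN_CAP ? ADAPT_MIN_CAP : shrunk;
    }

    st->high_water = 0;                /* start a new observation window */
    return st->capacity != old;
}
```

Under these assumptions, a 512-slot cache whose peak occupancy reached 400 slots (78%) would grow to 768 slots on the fifth refill, while one that peaked at 200 slots (39%) would stay at 512.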

Performance Results (100K iterations):

Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)

After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) 
- Syscalls: 1,729 (unchanged) 

Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192
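
For illustration, an override such as `HAKMEM_TINY_REFILL_COUNT_HOT` could be read along these lines. The variable names and the 128 / 96 defaults come from this commit message; the helper names `parse_count` and `env_count`, the fallback behavior, and the clamp range are hypothetical and may not match how hakmem actually parses its environment.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal count. Returns `def` when the text is missing or invalid,
 * otherwise the value clamped to [lo, hi]. */
static int parse_count(const char *text, int def, int lo, int hi)
{
    assert(lo <= hi);

    if (text == NULL || *text == '\0')
        return def;

    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return def;                    /* overflow or trailing garbage */

    if (v < lo) return lo;
    if (v > hi) return hi;
    return (int)v;
}

/* Hypothetical lookup helper: read one override from the environment. */
static int env_count(const char *name, int def, int lo, int hi)
{
    return parse_count(getenv(name), def, lo, hi);
}
```

With the exports above in effect, `env_count("HAKMEM_TINY_REFILL_COUNT_HOT", 128, 1, 1024)` would yield 256 and `env_count("HAKMEM_TINY_REFILL_COUNT_MID", 96, 1, 1024)` would yield 192; the 1..1024 clamp is an assumed range, not a documented one.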

Root Cause Analysis:

Bottleneck is NOT TLS/SFC hit rate, but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)

Conclusion:

TLS/SFC tuning cannot solve the SuperSlab allocation policy problem.
Next step: Phase 11 SuperSlab Prewarm strategy to eliminate
mmap/munmap during benchmark execution.
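
One possible shape for such a prewarm step is sketched below. It is not the Phase 11 design, which this commit does not describe: `prewarm_pool_t` and its functions are hypothetical names, and the sketch ignores the 1MB alignment that real SuperSlabs require (plain `mmap` only guarantees page alignment). The idea shown is simply to reserve all chunks with a single `mmap` before the measured loop, then hand them out without further syscalls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define PREWARM_CHUNK_SIZE ((size_t)1 << 20)   /* 1MB, the documented SuperSlab default */

/* Hypothetical pool of pre-mapped chunks; not a hakmem structure. */
typedef struct {
    unsigned char *base;   /* start of the single mapping */
    size_t         total;  /* number of chunks reserved */
    size_t         next;   /* index of the next unused chunk */
} prewarm_pool_t;

/* Reserve `count` chunks with one mmap call. Returns 0 on success, -1 on failure. */
static int prewarm_init(prewarm_pool_t *p, size_t count)
{
    assert(p != NULL && count > 0);

    void *mem = mmap(NULL, count * PREWARM_CHUNK_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    p->base  = mem;
    p->total = count;
    p->next  = 0;
    return 0;
}

/* Hand out the next chunk without a syscall; NULL when the pool is exhausted. */
static void *prewarm_acquire(prewarm_pool_t *p)
{
    if (p->next == p->total)
        return NULL;
    return p->base + (p->next++) * PREWARM_CHUNK_SIZE;
}

/* Release the whole mapping with one munmap call. */
static void prewarm_destroy(prewarm_pool_t *p)
{
    munmap(p->base, p->total * PREWARM_CHUNK_SIZE);
    p->base  = NULL;
    p->total = 0;
    p->next  = 0;
}
```

Anonymous mappings are populated lazily, so this removes the per-SuperSlab `mmap` calls but not the first-touch page faults.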

ChatGPT review: Strategy validated, Option A (Prewarm) recommended.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 14:25:54 +09:00

/**
 * hakmem_tiny_config.h
 *
 * Centralized Configuration for TinyPool (≤1KB allocations)
 * All tunable constants and defaults in one place
 *
 * Created: 2025-11-01
 * Purpose: Simplify tuning and avoid scattered magic numbers
 */
#ifndef HAKMEM_TINY_CONFIG_H
#define HAKMEM_TINY_CONFIG_H
#include <stdint.h>
#include <stddef.h>
#ifdef __cplusplus
extern "C" {
#endif
// ============================================================================
// Size Classes (8 classes: 8B, 16B, 32B, 64B, 128B, 256B, 512B, 1KB)
// ============================================================================
#define TINY_NUM_CLASSES 8
// Size class boundaries (defined in hakmem_tiny.h, referenced here)
extern const size_t g_tiny_class_sizes[TINY_NUM_CLASSES];
// ============================================================================
// Fast Cache Configuration (per-class front-end cache)
// ============================================================================
// Default fast cache capacities per class (mutable so presets/env can tweak)
extern uint16_t g_fast_cap_defaults[TINY_NUM_CLASSES];
// Reset fast cache defaults back to the factory baseline
void tiny_config_reset_defaults(void);
// ============================================================================
// TLS Magazine Configuration (thread-local cache)
// ============================================================================
// Global magazine capacity limit (can be overridden by HAKMEM_TINY_MAG_CAP)
#define TINY_TLS_MAG_CAP 2048
// Default TLS magazine capacities per class
// These are the initial/default values before ACE learning adjusts them
// Implemented in hakmem_tiny_config.c
int tiny_default_cap(int class_idx);
int tiny_mag_default_cap(int class_idx); // Alias for tiny_default_cap
// Maximum allowed TLS magazine capacities per class
// These limits prevent ACE from growing caches too large
// Implemented in hakmem_tiny_config.c
int tiny_cap_max_for_class(int class_idx);
// ============================================================================
// SuperSlab Configuration (1MB aligned chunks)
// ============================================================================
// SuperSlab constants are defined in hakmem_tiny_superslab.h to avoid duplication
// - SUPERSLAB_SIZE: 1MB (default)
// - SLABS_PER_SUPERSLAB: 256 (for 1MB SuperSlab)
// - SUPERSLAB_MAGIC: Magic number for validation
// ============================================================================
// Partial SuperSlab Release Configuration
// ============================================================================
// Enable partial SuperSlab release by default
// When enabled, SuperSlabs with low active block count are released via madvise
#define TINY_SS_PARTIAL_ENABLE_DEFAULT 1
// Partial release interval (every N refills)
#define TINY_SS_PARTIAL_INTERVAL_DEFAULT 4
// Active block threshold for partial release (percentage)
// If active_blocks / capacity < threshold, release the SuperSlab
#define TINY_SS_PARTIAL_THRESHOLD_PCT_DEFAULT 10 // 10%
// ============================================================================
// Refill/Drain Configuration
// ============================================================================
// Number of blocks to refill from SuperSlab to magazine
#define TINY_REFILL_BATCH_SIZE 16
// Number of blocks to drain from magazine to SuperSlab
#define TINY_DRAIN_BATCH_SIZE 16
// ============================================================================
// Remote Free Configuration (cross-thread free)
// ============================================================================
// Remote free list capacity per class
#define TINY_REMOTE_FREE_CAP 64
// Batch size for draining remote free list
#define TINY_REMOTE_DRAIN_BATCH 32
// ============================================================================
// Memory Efficiency Presets
// ============================================================================
// Preset: Balanced (default)
// - Moderate cache sizes
// - Partial release enabled
// - Good balance between performance and RSS
#define TINY_PRESET_BALANCED() tiny_config_reset_defaults()
// Preset: Tight (low memory)
// - Smaller cache sizes
// - Aggressive partial release
// - Optimized for RSS at slight performance cost
// - Sets classes 0-6 only; class 7 (1KB) keeps its current value
#define TINY_PRESET_TIGHT() do { \
    g_fast_cap_defaults[0] = 64; /* 8B */ \
    g_fast_cap_defaults[1] = 64; /* 16B */ \
    g_fast_cap_defaults[2] = 64; /* 32B */ \
    g_fast_cap_defaults[3] = 64; /* 64B */ \
    g_fast_cap_defaults[4] = 64; /* 128B */ \
    g_fast_cap_defaults[5] = 64; /* 256B */ \
    g_fast_cap_defaults[6] = 64; /* 512B */ \
} while(0)
// Preset: Ultra Tight (minimal memory)
// - Minimal cache sizes
// - Maximum RSS reduction
// - Use for memory-constrained environments
// - Sets classes 0-6 only; class 7 (1KB) keeps its current value
#define TINY_PRESET_ULTRA_TIGHT() do { \
    g_fast_cap_defaults[0] = 32; /* 8B */ \
    g_fast_cap_defaults[1] = 32; /* 16B */ \
    g_fast_cap_defaults[2] = 32; /* 32B */ \
    g_fast_cap_defaults[3] = 32; /* 64B */ \
    g_fast_cap_defaults[4] = 32; /* 128B */ \
    g_fast_cap_defaults[5] = 32; /* 256B */ \
    g_fast_cap_defaults[6] = 32; /* 512B */ \
} while(0)
// ============================================================================
// Super Front Cache (SFC) Configuration - Box 5-NEW (Phase 1)
// ============================================================================
// SFC Feature Flag (A/B testing)
// ENV: HAKMEM_SFC_ENABLE (default: 0, OFF)
extern int g_sfc_enabled;
// SFC Default Configuration (can be overridden via ENV)
// Phase 10: Aggressive SFC defaults to maximize front cache hit rate
// ENV: HAKMEM_SFC_CAPACITY (default: 256, range: 16-512)
// ENV: HAKMEM_SFC_REFILL_COUNT (default: 128, range: 8-256)
#define SFC_DEFAULT_CAPACITY 256
#define SFC_DEFAULT_REFILL_COUNT 128
// SFC Per-Class Overrides (optional)
// ENV: HAKMEM_SFC_CAPACITY_CLASS{0..7} (per-class capacity)
// ENV: HAKMEM_SFC_REFILL_COUNT_CLASS{0..7} (per-class refill count)
// SFC Statistics Dump (optional)
// ENV: HAKMEM_SFC_STATS_DUMP=1 (print stats at exit)
// ENV: HAKMEM_SFC_DEBUG=1 (enable debug logging)
// ============================================================================
// Environment Variable Overrides
// ============================================================================
// The following environment variables can override defaults:
//
// - HAKMEM_TINY_MAG_CAP: Global magazine cap limit
// - HAKMEM_TINY_MAG_CAP_C{0..7}: Per-class magazine cap override
// - HAKMEM_TINY_SS_PARTIAL: Enable/disable partial release (0/1)
// - HAKMEM_TINY_SS_PARTIAL_INT: Partial release interval
// - HAKMEM_TINY_SS_PARTIAL_PCT: Partial release threshold percentage
//
// - HAKMEM_SFC_ENABLE: Enable Super Front Cache (0/1, default: 0)
// - HAKMEM_SFC_CAPACITY: Default SFC capacity (16-512, default: 256)
// - HAKMEM_SFC_REFILL_COUNT: Default refill count (8-256, default: 128)
// - HAKMEM_SFC_CAPACITY_CLASS{0..7}: Per-class capacity override
// - HAKMEM_SFC_REFILL_COUNT_CLASS{0..7}: Per-class refill count override
// - HAKMEM_SFC_STATS_DUMP: Print SFC stats at exit (0/1, default: 0)
// - HAKMEM_SFC_DEBUG: Enable SFC debug logging (0/1, default: 0)
//
// Example:
// HAKMEM_TINY_MAG_CAP=512 HAKMEM_TINY_SS_PARTIAL=1 ./my_app
// HAKMEM_SFC_ENABLE=1 HAKMEM_SFC_CAPACITY=192 ./my_app # Test SFC Phase 1
#ifdef __cplusplus
}
#endif
#endif // HAKMEM_TINY_CONFIG_H
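
As a rough illustration of how the partial-release defaults in this header might be applied, the sketch below implements the documented rule "if active_blocks / capacity < threshold, release the SuperSlab" together with the "every N refills" interval. The two constants are copied from the header so the example compiles on its own; `ss_should_release` and `ss_partial_due` are hypothetical helpers, not functions from hakmem.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from hakmem_tiny_config.h. */
#define TINY_SS_PARTIAL_INTERVAL_DEFAULT      4
#define TINY_SS_PARTIAL_THRESHOLD_PCT_DEFAULT 10

/* Hypothetical helper: returns 1 when the active fraction is strictly below
 * the threshold. The comparison active/capacity < pct/100 is cross-multiplied
 * to avoid integer division and its rounding. */
static int ss_should_release(uint32_t active_blocks, uint32_t capacity,
                             uint32_t threshold_pct)
{
    assert(capacity > 0 && threshold_pct <= 100);
    return (uint64_t)active_blocks * 100 < (uint64_t)capacity * threshold_pct;
}

/* Hypothetical helper: returns 1 on every Nth refill (N = interval default). */
static int ss_partial_due(uint64_t refill_count)
{
    return refill_count != 0 &&
           refill_count % TINY_SS_PARTIAL_INTERVAL_DEFAULT == 0;
}
```

With the 10% default, a SuperSlab with 9 of 100 blocks active would qualify for release, while one with exactly 10 of 100 would not, because the comparison is strict.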