Phase 8-Step1-3: Unified Cache hot path optimization (config macro + prewarm + PGO init removal)

Goal: Reduce branches in Unified Cache hot paths (-2 branches per op)
Expected improvement: +2-3% in PGO mode

Changes:
1. Config Macro (Step 1):
   - Added TINY_FRONT_UNIFIED_CACHE_ENABLED macro to tiny_front_config_box.h
   - PGO mode: compile-time constant (1)
   - Normal mode: runtime function call unified_cache_enabled()
   - Replaced unified_cache_enabled() calls in 3 locations:
     * unified_cache_pop() line 142
     * unified_cache_push() line 182
     * unified_cache_pop_or_refill() line 228

2. Function Declaration Fix:
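   - Changed unified_cache_enabled() from static inline to a regular non-static (external-linkage) function
   - Implementation now lives in tiny_unified_cache.c (previously a static inline definition in the .h)
   - Forward declaration added to tiny_front_config_box.h
   - Resolves the conflict between the config box's non-static forward declaration and the header's former static inline definition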

3. Prewarm (Step 2):
   - Added unified_cache_init() call to bench_fast_init()
   - Ensures cache is initialized before benchmark starts
   - Enables PGO builds to remove lazy init checks

4. Conditional Init Removal (Step 3):
   - Wrapped lazy init checks in #if !HAKMEM_TINY_FRONT_PGO
   - PGO builds assume prewarm → no init check needed (-1 branch)
   - Normal builds keep lazy init for safety
   - Applied to 3 functions: unified_cache_pop(), unified_cache_push(), unified_cache_pop_or_refill()
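A minimal, self-contained sketch of how Steps 1 to 3 fit together, assuming HAKMEM_TINY_FRONT_PGO is a 0/1 build flag. The config macro follows the shape described above. The cache itself is hypothetical because this commit does not show the real slot layout. That includes DemoCache, DEMO_CAP, and the demo_* functions that stand in for unified_cache_init(), bench_fast_init(), unified_cache_pop() and unified_cache_push(). The stand-in unified_cache_enabled() also re-reads the environment on every call, unlike the cached version in the diff.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv/unsetenv in the usage example */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 0/1 build flag, e.g. -DHAKMEM_TINY_FRONT_PGO=1 for PGO builds. */
#ifndef HAKMEM_TINY_FRONT_PGO
#define HAKMEM_TINY_FRONT_PGO 0
#endif

/* Step 1: config macro (sketch of the tiny_front_config_box.h idea). */
int unified_cache_enabled(void);
#if HAKMEM_TINY_FRONT_PGO
#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1
#else
#define TINY_FRONT_UNIFIED_CACHE_ENABLED unified_cache_enabled()
#endif

/* Simplified stand-in: re-reads the env var each call (the real one caches it). */
int unified_cache_enabled(void) {
    const char* e = getenv("HAKMEM_TINY_UNIFIED_CACHE");
    return (e && *e == '0') ? 0 : 1;
}

/* Hypothetical single-class, per-thread cache. */
#define DEMO_CAP 4
typedef struct { void** slots; int count; } DemoCache;
static __thread DemoCache g_demo;  /* zero-initialized: slots == NULL */

/* Stands in for unified_cache_init(): idempotent, returns 1 on success. */
int demo_cache_init(void) {
    if (g_demo.slots == NULL) {
        g_demo.slots = calloc(DEMO_CAP, sizeof(void*));
        g_demo.count = 0;
    }
    return g_demo.slots != NULL;
}

/* Step 2: prewarm, standing in for bench_fast_init(). */
void demo_bench_init(void) { (void)demo_cache_init(); }

/* Steps 1 + 3: hot path with both checks made build-dependent. */
void* demo_cache_pop(void) {
    if (!TINY_FRONT_UNIFIED_CACHE_ENABLED) return NULL;  /* PGO: if (!1), folded away */
#if !HAKMEM_TINY_FRONT_PGO
    /* Lazy init: normal builds only. PGO builds assume prewarm. */
    if (__builtin_expect(g_demo.slots == NULL, 0) && !demo_cache_init())
        return NULL;
#endif
    if (g_demo.count == 0) return NULL;                  /* empty */
    return g_demo.slots[--g_demo.count];
}

int demo_cache_push(void* p) {
    if (!TINY_FRONT_UNIFIED_CACHE_ENABLED) return 0;
#if !HAKMEM_TINY_FRONT_PGO
    if (__builtin_expect(g_demo.slots == NULL, 0) && !demo_cache_init())
        return 0;
#endif
    if (g_demo.count == DEMO_CAP) return 0;              /* full */
    g_demo.slots[g_demo.count++] = p;
    return 1;
}
```

In a normal build both checks stay in place, so the sketch works without prewarm. Built with -DHAKMEM_TINY_FRONT_PGO=1, the enabled check folds to if (!1) and the NULL-slots check is compiled out, so demo_bench_init() must run before the first push. Because g_demo is __thread in this sketch, that prewarm only covers the calling thread; any other thread would dereference a NULL slots pointer on its first push.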

Performance Impact:
  PGO mode: -2 branches per operation (enabled check + init check)
  Normal mode: Same as before (runtime checks)

Branch Elimination (PGO):
  Before: if (!unified_cache_enabled()) + if (slots == NULL)
  After:  if (!1) [eliminated] + [init check removed]
  Result: -2 branches in alloc/free hot paths

Files Modified:
  core/box/tiny_front_config_box.h        - Config macro + forward declaration
  core/front/tiny_unified_cache.h         - Config macro usage + PGO conditionals
  core/front/tiny_unified_cache.c         - unified_cache_enabled() implementation
  core/box/bench_fast_box.c               - Prewarm call in bench_fast_init()

Note: BenchFast mode has a pre-existing crash (not caused by these changes)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Moe Charm (CI)
Date:   2025-11-29 17:58:42 +09:00
Parent: 6b75453072
Commit: cfa587c61d
4 changed files with 65 additions and 26 deletions


@@ -31,6 +31,26 @@ __thread uint64_t g_unified_cache_push[TINY_NUM_CLASSES] = {0};
__thread uint64_t g_unified_cache_full[TINY_NUM_CLASSES] = {0};
#endif
// ============================================================================
// Phase 8-Step1-Fix: unified_cache_enabled() implementation (non-static)
// ============================================================================
// Enable flag (default: ON, disable with HAKMEM_TINY_UNIFIED_CACHE=0)
int unified_cache_enabled(void) {
    static int g_enable = -1;
    if (__builtin_expect(g_enable == -1, 0)) {
        const char* e = getenv("HAKMEM_TINY_UNIFIED_CACHE");
        g_enable = (e && *e && *e == '0') ? 0 : 1;  // default ON
#if !HAKMEM_BUILD_RELEASE
        if (g_enable) {
            fprintf(stderr, "[Unified-INIT] unified_cache_enabled() = %d\n", g_enable);
            fflush(stderr);
        }
#endif
    }
    return g_enable;
}
// ============================================================================
// Init (called at thread start or lazy on first access)
// ============================================================================
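The parsing rule inside unified_cache_enabled() is easy to misread. The sketch below isolates it in a helper that takes the raw getenv() result. The helper name unified_cache_flag_from_env is hypothetical and not part of this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper isolating the expression from unified_cache_enabled():
 * returns 0 (cache off) only when the value's first character is '0';
 * NULL (unset), empty, or any other value yields 1 (cache on). */
int unified_cache_flag_from_env(const char* e) {
    /* "*e &&" is implied by "*e == '0'", but kept to match the original expression. */
    return (e && *e && *e == '0') ? 0 : 1;
}
```

Only a value starting with '0' turns the cache off, so HAKMEM_TINY_UNIFIED_CACHE=off or =false leaves it enabled. In the diff, the result is also stored in a function-local static on the first call, so changing the variable afterwards has no effect for the rest of the process.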