Files
hakmem/core/box/tiny_front_config_box.h

171 lines
7.2 KiB
C
Raw Normal View History

Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination) Implement compile-time configuration system for dead code elimination in Tiny allocation hot paths. The Config Box provides dual-mode configuration: - Normal mode: Runtime ENV checks (backward compatible, flexible) - PGO mode: Compile-time constants (dead code elimination, performance) PERFORMANCE: - Baseline (runtime config): 50.32 M ops/s (avg of 5 runs) - Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs) - Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without) - Target: +5-8% (partially achieved) IMPLEMENTATION: 1. core/box/tiny_front_config_box.h (NEW): - Defines TINY_FRONT_*_ENABLED macros for all config checks - PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1) - Normal mode (#else): Macros expand to function calls - Functions remain in their original locations (no code duplication) 2. core/hakmem_build_flags.h: - Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off) - Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" 3. core/box/hak_wrappers.inc.h: - Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED - 2 call sites updated (malloc and free fast paths) - Added config box include EXPECTED DEAD CODE ELIMINATION (PGO mode): if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... } → if (1) { ... } // Constant, always true → Compiler optimizes away the branch, keeps body SCOPE: Currently only front_gate_unified_enabled() is replaced (2 call sites). To achieve full +5-8% target, expand to other config checks: - ultra_slim_mode_enabled() - tiny_heap_v2_enabled() - sfc_cascade_enabled() - tiny_fastcache_enabled() - tiny_metrics_enabled() - tiny_diag_enabled() BUILD USAGE: Normal mode (runtime config, default): make bench_random_mixed_hakmem PGO mode (compile-time config, dead code elimination): make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem BOX PATTERN COMPLIANCE: ✅ Single Responsibility: Configuration management ONLY ✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime) ✅ Observable: Config report function (debug builds) ✅ Safe: Backward compatible (default is normal mode) ✅ Testable: Easy A/B comparison (PGO vs normal builds) WHY +2.7-4.9% (below +5-8% target)? - Limited scope: Only 2 call sites for 1 config function replaced - Lazy init overhead: front_gate_unified_enabled() cached after first call - Need to expand to more config checks for full benefit NEXT STEPS: - Expand config macro usage to other functions (optional) - OR proceed with PGO re-enablement (Final polish) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
// tiny_front_config_box.h - Phase 4-Step3: Tiny Front Config Box
// Purpose: Compile-time configuration for dead code elimination
// Contract: Dual-mode (compile-time fixed vs. runtime ENV checks)
// Performance: Target +5-8% via branch elimination (57.2M → 60-62M ops/s)
//
// Design Principles (Box Pattern):
// 1. Single Responsibility: Configuration management ONLY
// 2. Clear Contract: PGO mode = compile-time constants, Normal mode = runtime checks
// 3. Observable: Config report function (debug builds)
// 4. Safe: Backward compatible (default runtime mode)
// 5. Testable: Easy A/B comparison (PGO vs normal builds)
//
// Usage:
// Normal build (runtime config, backward compatible):
// make bench_random_mixed_hakmem
//
// PGO build (compile-time config, dead code elimination):
// make CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
//
// Expected Benefit:
// - Dead code elimination: Compiler removes disabled code paths
// - Branch reduction: if (CONSTANT_0) { ... } → eliminated
// - I-cache improvement: Smaller code size (no dead branches)
// - Target: +5-8% improvement (even without PGO profiling)
#ifndef TINY_FRONT_CONFIG_BOX_H
#define TINY_FRONT_CONFIG_BOX_H
#include <stdio.h>
#include "../hakmem_build_flags.h"
// ============================================================================
// Build Flag Check (must be defined in hakmem_build_flags.h)
// ============================================================================
#ifndef HAKMEM_TINY_FRONT_PGO
# define HAKMEM_TINY_FRONT_PGO 0
#endif
// ============================================================================
// PGO Mode: Fixed Configuration (Compile-Time Constants)
// ============================================================================
#if HAKMEM_TINY_FRONT_PGO
// PGO-optimized build: All runtime checks become compile-time constants
// Compiler constant folding eliminates dead branches:
// if (TINY_FRONT_HEAP_V2_ENABLED) { ... } // 0 → entire block removed
// if (!TINY_FRONT_SFC_ENABLED) { ... } // !1 → entire block removed
#define TINY_FRONT_ULTRA_SLIM_ENABLED 0 // Disabled (use normal front)
#define TINY_FRONT_HEAP_V2_ENABLED 0 // Disabled (use Unified Cache)
#define TINY_FRONT_SFC_ENABLED 1 // Enabled (SFC cascade)
#define TINY_FRONT_FASTCACHE_ENABLED 0 // Disabled (use Unified Cache)
Phase 7-Step7: Replace g_tls_sll_enable with TINY_FRONT_TLS_SLL_ENABLED macro **Goal**: Enable dead code elimination for TLS SLL checks in PGO mode **Changes**: 1. core/box/tiny_front_config_box.h: - Add TINY_FRONT_TLS_SLL_ENABLED macro (PGO: 1, Normal: tiny_tls_sll_enabled()) - Add tiny_tls_sll_enabled() wrapper function (static inline) 2. core/tiny_alloc_fast.inc.h (5 hot path locations): - Line 220: tiny_heap_v2_refill_mag() - early return check - Line 388: SLIM mode - SLL freelist check - Line 459: tiny_alloc_fast_pop() - Layer 1 SLL check - Line 774: Main alloc path - cached sll_enabled check (most critical!) - Line 815: Generic front - SLL toggle respect 3. core/hakmem_tiny_refill.inc.h (2 locations): - Line 186: bulk_mag_refill_fc() - refill from SLL - Line 213: bulk_mag_to_sll_if_room() - push to SLL **Performance**: 79.9M ops/s (maintained, +0.1M vs Step 6) - Normal mode: Same performance (runtime checks preserved) - PGO mode: Dead code elimination ready (if (!1) → removed by compiler) **Expected PGO benefit**: - Eliminate 7 TLS SLL checks across hot paths - Reduce instruction count in main alloc loop - Better branch prediction (no runtime checks) **Design**: Config Box as single entry point - All TLS SLL checks now use TINY_FRONT_TLS_SLL_ENABLED - Consistent pattern with FASTCACHE/SFC/HEAP_V2 macros - Include order independent (wrapper in config box header) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:35:51 +09:00
#define TINY_FRONT_TLS_SLL_ENABLED 1 // Enabled (TLS SLL freelist)
Phase 8-Step1-3: Unified Cache hot path optimization (config macro + prewarm + PGO init removal) Goal: Reduce branches in Unified Cache hot paths (-2 branches per op) Expected improvement: +2-3% in PGO mode Changes: 1. Config Macro (Step 1): - Added TINY_FRONT_UNIFIED_CACHE_ENABLED macro to tiny_front_config_box.h - PGO mode: compile-time constant (1) - Normal mode: runtime function call unified_cache_enabled() - Replaced unified_cache_enabled() calls in 3 locations: * unified_cache_pop() line 142 * unified_cache_push() line 182 * unified_cache_pop_or_refill() line 228 2. Function Declaration Fix: - Moved unified_cache_enabled() from static inline to non-static - Implementation in tiny_unified_cache.c (was in .h as static inline) - Forward declaration in tiny_front_config_box.h - Resolves declaration conflict between config box and header 3. Prewarm (Step 2): - Added unified_cache_init() call to bench_fast_init() - Ensures cache is initialized before benchmark starts - Enables PGO builds to remove lazy init checks 4. Conditional Init Removal (Step 3): - Wrapped lazy init checks in #if !HAKMEM_TINY_FRONT_PGO - PGO builds assume prewarm → no init check needed (-1 branch) - Normal builds keep lazy init for safety - Applied to 3 functions: unified_cache_pop(), unified_cache_push(), unified_cache_pop_or_refill() Performance Impact: PGO mode: -2 branches per operation (enabled check + init check) Normal mode: Same as before (runtime checks) Branch Elimination (PGO): Before: if (!unified_cache_enabled()) + if (slots == NULL) After: if (!1) [eliminated] + [init check removed] Result: -2 branches in alloc/free hot paths Files Modified: core/box/tiny_front_config_box.h - Config macro + forward declaration core/front/tiny_unified_cache.h - Config macro usage + PGO conditionals core/front/tiny_unified_cache.c - unified_cache_enabled() implementation core/box/bench_fast_box.c - Prewarm call in bench_fast_init() Note: BenchFast mode has pre-existing crash (not caused by these changes) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:58:42 +09:00
#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1 // Enabled (Unified Cache - tcache-style)
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination) Implement compile-time configuration system for dead code elimination in Tiny allocation hot paths. The Config Box provides dual-mode configuration: - Normal mode: Runtime ENV checks (backward compatible, flexible) - PGO mode: Compile-time constants (dead code elimination, performance) PERFORMANCE: - Baseline (runtime config): 50.32 M ops/s (avg of 5 runs) - Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs) - Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without) - Target: +5-8% (partially achieved) IMPLEMENTATION: 1. core/box/tiny_front_config_box.h (NEW): - Defines TINY_FRONT_*_ENABLED macros for all config checks - PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1) - Normal mode (#else): Macros expand to function calls - Functions remain in their original locations (no code duplication) 2. core/hakmem_build_flags.h: - Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off) - Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" 3. core/box/hak_wrappers.inc.h: - Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED - 2 call sites updated (malloc and free fast paths) - Added config box include EXPECTED DEAD CODE ELIMINATION (PGO mode): if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... } → if (1) { ... } // Constant, always true → Compiler optimizes away the branch, keeps body SCOPE: Currently only front_gate_unified_enabled() is replaced (2 call sites). To achieve full +5-8% target, expand to other config checks: - ultra_slim_mode_enabled() - tiny_heap_v2_enabled() - sfc_cascade_enabled() - tiny_fastcache_enabled() - tiny_metrics_enabled() - tiny_diag_enabled() BUILD USAGE: Normal mode (runtime config, default): make bench_random_mixed_hakmem PGO mode (compile-time config, dead code elimination): make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem BOX PATTERN COMPLIANCE: ✅ Single Responsibility: Configuration management ONLY ✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime) ✅ Observable: Config report function (debug builds) ✅ Safe: Backward compatible (default is normal mode) ✅ Testable: Easy A/B comparison (PGO vs normal builds) WHY +2.7-4.9% (below +5-8% target)? - Limited scope: Only 2 call sites for 1 config function replaced - Lazy init overhead: front_gate_unified_enabled() cached after first call - Need to expand to more config checks for full benefit NEXT STEPS: - Expand config macro usage to other functions (optional) - OR proceed with PGO re-enablement (Final polish) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
#define TINY_FRONT_UNIFIED_GATE_ENABLED 1 // Enabled (Front Gate Unification)
#define TINY_FRONT_METRICS_ENABLED 0 // Disabled (no runtime overhead)
#define TINY_FRONT_DIAG_ENABLED 0 // Disabled (no diagnostics)
// Expected code reduction:
// - Ultra SLIM check: 1 branch removed
// - Heap V2 check: 1 branch removed
// - Metrics check: 2-3 branches removed
// - Diag check: 1 branch removed
// Total: 5-7 branches eliminated in hot path
#else
// ============================================================================
// Normal Mode: Runtime Configuration (Backward Compatible)
// ============================================================================
// Normal build: Checks ENV variables or global config state
// Preserves backward compatibility with existing ENV variable interface
//
// NOTE: The actual runtime config functions (ultra_slim_mode_enabled, etc.)
// are defined in their respective modules:
// - front_gate_unified_enabled() → core/front/malloc_tiny_fast.h
// - sfc_cascade_enabled() → core/hakmem_tiny_sfc.h
// - tiny_heap_v2_enabled() → core/front/tiny_heap_v2.h
// - etc.
//
// This config box ONLY defines the macros that expand to function calls.
// The functions themselves are implemented here as static inline to avoid include order issues.
// Phase 7-Step6-Fix: Config wrapper functions (for normal mode)
// These are static inline to access static global variables from any include order
static inline int tiny_fastcache_enabled(void) {
extern int g_fastcache_enable;
return g_fastcache_enable;
}
Phase 7-Step8: Replace SFC/HEAP_V2/ULTRA_SLIM runtime checks with config macros **Goal**: Complete dead code elimination infrastructure for all runtime checks **Changes**: 1. core/box/tiny_front_config_box.h: - Rename sfc_cascade_enabled() → tiny_sfc_enabled() (avoid name collision) - Update TINY_FRONT_SFC_ENABLED macro to use tiny_sfc_enabled() 2. core/tiny_alloc_fast.inc.h (5 locations): - Line 274: tiny_heap_v2_alloc_by_class() - use TINY_FRONT_HEAP_V2_ENABLED - Line 431: SFC TLS cache init - use TINY_FRONT_SFC_ENABLED - Line 678: SFC cascade check - use TINY_FRONT_SFC_ENABLED - Line 740: Ultra SLIM debug check - use TINY_FRONT_ULTRA_SLIM_ENABLED 3. core/hakmem_tiny_free.inc (1 location): - Line 233: Heap V2 free path - use TINY_FRONT_HEAP_V2_ENABLED **Performance**: 79.5M ops/s (maintained, -0.4M vs Step 7, within noise) - Normal mode: Neutral (runtime checks preserved) - PGO mode: Ready for dead code elimination **Total Runtime Checks Replaced (Phase 7)**: - ✅ TINY_FRONT_FASTCACHE_ENABLED: 3 locations (Step 4-6) - ✅ TINY_FRONT_TLS_SLL_ENABLED: 7 locations (Step 7) - ✅ TINY_FRONT_SFC_ENABLED: 2 locations (Step 8) - ✅ TINY_FRONT_HEAP_V2_ENABLED: 2 locations (Step 8) - ✅ TINY_FRONT_ULTRA_SLIM_ENABLED: 1 location (Step 8) **Total**: 15 runtime checks → config macros **PGO Mode Expected Benefit**: - Eliminate 15 runtime checks across hot paths - Reduce branch mispredictions - Smaller code size (dead code removed by compiler) - Better instruction cache locality **Design Complete**: Config Box as single entry point for all Tiny Front policy - Unified macro interface for all feature toggles - Include order independent (static inline wrappers) - Dual-mode support (PGO compile-time vs normal runtime) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:40:05 +09:00
static inline int tiny_sfc_enabled(void) {
extern int g_sfc_enabled;
return g_sfc_enabled;
}
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination) Implement compile-time configuration system for dead code elimination in Tiny allocation hot paths. The Config Box provides dual-mode configuration: - Normal mode: Runtime ENV checks (backward compatible, flexible) - PGO mode: Compile-time constants (dead code elimination, performance) PERFORMANCE: - Baseline (runtime config): 50.32 M ops/s (avg of 5 runs) - Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs) - Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without) - Target: +5-8% (partially achieved) IMPLEMENTATION: 1. core/box/tiny_front_config_box.h (NEW): - Defines TINY_FRONT_*_ENABLED macros for all config checks - PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1) - Normal mode (#else): Macros expand to function calls - Functions remain in their original locations (no code duplication) 2. core/hakmem_build_flags.h: - Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off) - Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" 3. core/box/hak_wrappers.inc.h: - Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED - 2 call sites updated (malloc and free fast paths) - Added config box include EXPECTED DEAD CODE ELIMINATION (PGO mode): if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... } → if (1) { ... } // Constant, always true → Compiler optimizes away the branch, keeps body SCOPE: Currently only front_gate_unified_enabled() is replaced (2 call sites). To achieve full +5-8% target, expand to other config checks: - ultra_slim_mode_enabled() - tiny_heap_v2_enabled() - sfc_cascade_enabled() - tiny_fastcache_enabled() - tiny_metrics_enabled() - tiny_diag_enabled() BUILD USAGE: Normal mode (runtime config, default): make bench_random_mixed_hakmem PGO mode (compile-time config, dead code elimination): make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem BOX PATTERN COMPLIANCE: ✅ Single Responsibility: Configuration management ONLY ✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime) ✅ Observable: Config report function (debug builds) ✅ Safe: Backward compatible (default is normal mode) ✅ Testable: Easy A/B comparison (PGO vs normal builds) WHY +2.7-4.9% (below +5-8% target)? - Limited scope: Only 2 call sites for 1 config function replaced - Lazy init overhead: front_gate_unified_enabled() cached after first call - Need to expand to more config checks for full benefit NEXT STEPS: - Expand config macro usage to other functions (optional) - OR proceed with PGO re-enablement (Final polish) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
Phase 7-Step7: Replace g_tls_sll_enable with TINY_FRONT_TLS_SLL_ENABLED macro **Goal**: Enable dead code elimination for TLS SLL checks in PGO mode **Changes**: 1. core/box/tiny_front_config_box.h: - Add TINY_FRONT_TLS_SLL_ENABLED macro (PGO: 1, Normal: tiny_tls_sll_enabled()) - Add tiny_tls_sll_enabled() wrapper function (static inline) 2. core/tiny_alloc_fast.inc.h (5 hot path locations): - Line 220: tiny_heap_v2_refill_mag() - early return check - Line 388: SLIM mode - SLL freelist check - Line 459: tiny_alloc_fast_pop() - Layer 1 SLL check - Line 774: Main alloc path - cached sll_enabled check (most critical!) - Line 815: Generic front - SLL toggle respect 3. core/hakmem_tiny_refill.inc.h (2 locations): - Line 186: bulk_mag_refill_fc() - refill from SLL - Line 213: bulk_mag_to_sll_if_room() - push to SLL **Performance**: 79.9M ops/s (maintained, +0.1M vs Step 6) - Normal mode: Same performance (runtime checks preserved) - PGO mode: Dead code elimination ready (if (!1) → removed by compiler) **Expected PGO benefit**: - Eliminate 7 TLS SLL checks across hot paths - Reduce instruction count in main alloc loop - Better branch prediction (no runtime checks) **Design**: Config Box as single entry point - All TLS SLL checks now use TINY_FRONT_TLS_SLL_ENABLED - Consistent pattern with FASTCACHE/SFC/HEAP_V2 macros - Include order independent (wrapper in config box header) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:35:51 +09:00
static inline int tiny_tls_sll_enabled(void) {
extern int g_tls_sll_enable;
return g_tls_sll_enable;
}
Phase 8-Step1-3: Unified Cache hot path optimization (config macro + prewarm + PGO init removal) Goal: Reduce branches in Unified Cache hot paths (-2 branches per op) Expected improvement: +2-3% in PGO mode Changes: 1. Config Macro (Step 1): - Added TINY_FRONT_UNIFIED_CACHE_ENABLED macro to tiny_front_config_box.h - PGO mode: compile-time constant (1) - Normal mode: runtime function call unified_cache_enabled() - Replaced unified_cache_enabled() calls in 3 locations: * unified_cache_pop() line 142 * unified_cache_push() line 182 * unified_cache_pop_or_refill() line 228 2. Function Declaration Fix: - Moved unified_cache_enabled() from static inline to non-static - Implementation in tiny_unified_cache.c (was in .h as static inline) - Forward declaration in tiny_front_config_box.h - Resolves declaration conflict between config box and header 3. Prewarm (Step 2): - Added unified_cache_init() call to bench_fast_init() - Ensures cache is initialized before benchmark starts - Enables PGO builds to remove lazy init checks 4. Conditional Init Removal (Step 3): - Wrapped lazy init checks in #if !HAKMEM_TINY_FRONT_PGO - PGO builds assume prewarm → no init check needed (-1 branch) - Normal builds keep lazy init for safety - Applied to 3 functions: unified_cache_pop(), unified_cache_push(), unified_cache_pop_or_refill() Performance Impact: PGO mode: -2 branches per operation (enabled check + init check) Normal mode: Same as before (runtime checks) Branch Elimination (PGO): Before: if (!unified_cache_enabled()) + if (slots == NULL) After: if (!1) [eliminated] + [init check removed] Result: -2 branches in alloc/free hot paths Files Modified: core/box/tiny_front_config_box.h - Config macro + forward declaration core/front/tiny_unified_cache.h - Config macro usage + PGO conditionals core/front/tiny_unified_cache.c - unified_cache_enabled() implementation core/box/bench_fast_box.c - Prewarm call in bench_fast_init() Note: BenchFast mode has pre-existing crash (not caused by these changes) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:58:42 +09:00
// Phase 8-Step1: Unified Cache enabled wrapper
// Forward declaration - actual function is in tiny_unified_cache.c
int unified_cache_enabled(void);
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination) Implement compile-time configuration system for dead code elimination in Tiny allocation hot paths. The Config Box provides dual-mode configuration: - Normal mode: Runtime ENV checks (backward compatible, flexible) - PGO mode: Compile-time constants (dead code elimination, performance) PERFORMANCE: - Baseline (runtime config): 50.32 M ops/s (avg of 5 runs) - Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs) - Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without) - Target: +5-8% (partially achieved) IMPLEMENTATION: 1. core/box/tiny_front_config_box.h (NEW): - Defines TINY_FRONT_*_ENABLED macros for all config checks - PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1) - Normal mode (#else): Macros expand to function calls - Functions remain in their original locations (no code duplication) 2. core/hakmem_build_flags.h: - Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off) - Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" 3. core/box/hak_wrappers.inc.h: - Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED - 2 call sites updated (malloc and free fast paths) - Added config box include EXPECTED DEAD CODE ELIMINATION (PGO mode): if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... } → if (1) { ... } // Constant, always true → Compiler optimizes away the branch, keeps body SCOPE: Currently only front_gate_unified_enabled() is replaced (2 call sites). To achieve full +5-8% target, expand to other config checks: - ultra_slim_mode_enabled() - tiny_heap_v2_enabled() - sfc_cascade_enabled() - tiny_fastcache_enabled() - tiny_metrics_enabled() - tiny_diag_enabled() BUILD USAGE: Normal mode (runtime config, default): make bench_random_mixed_hakmem PGO mode (compile-time config, dead code elimination): make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem BOX PATTERN COMPLIANCE: ✅ Single Responsibility: Configuration management ONLY ✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime) ✅ Observable: Config report function (debug builds) ✅ Safe: Backward compatible (default is normal mode) ✅ Testable: Easy A/B comparison (PGO vs normal builds) WHY +2.7-4.9% (below +5-8% target)? - Limited scope: Only 2 call sites for 1 config function replaced - Lazy init overhead: front_gate_unified_enabled() cached after first call - Need to expand to more config checks for full benefit NEXT STEPS: - Expand config macro usage to other functions (optional) - OR proceed with PGO re-enablement (Final polish) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
// Config macros (runtime function calls)
// These expand to actual function calls in normal mode
Phase 8-Step1-3: Unified Cache hot path optimization (config macro + prewarm + PGO init removal) Goal: Reduce branches in Unified Cache hot paths (-2 branches per op) Expected improvement: +2-3% in PGO mode Changes: 1. Config Macro (Step 1): - Added TINY_FRONT_UNIFIED_CACHE_ENABLED macro to tiny_front_config_box.h - PGO mode: compile-time constant (1) - Normal mode: runtime function call unified_cache_enabled() - Replaced unified_cache_enabled() calls in 3 locations: * unified_cache_pop() line 142 * unified_cache_push() line 182 * unified_cache_pop_or_refill() line 228 2. Function Declaration Fix: - Moved unified_cache_enabled() from static inline to non-static - Implementation in tiny_unified_cache.c (was in .h as static inline) - Forward declaration in tiny_front_config_box.h - Resolves declaration conflict between config box and header 3. Prewarm (Step 2): - Added unified_cache_init() call to bench_fast_init() - Ensures cache is initialized before benchmark starts - Enables PGO builds to remove lazy init checks 4. Conditional Init Removal (Step 3): - Wrapped lazy init checks in #if !HAKMEM_TINY_FRONT_PGO - PGO builds assume prewarm → no init check needed (-1 branch) - Normal builds keep lazy init for safety - Applied to 3 functions: unified_cache_pop(), unified_cache_push(), unified_cache_pop_or_refill() Performance Impact: PGO mode: -2 branches per operation (enabled check + init check) Normal mode: Same as before (runtime checks) Branch Elimination (PGO): Before: if (!unified_cache_enabled()) + if (slots == NULL) After: if (!1) [eliminated] + [init check removed] Result: -2 branches in alloc/free hot paths Files Modified: core/box/tiny_front_config_box.h - Config macro + forward declaration core/front/tiny_unified_cache.h - Config macro usage + PGO conditionals core/front/tiny_unified_cache.c - unified_cache_enabled() implementation core/box/bench_fast_box.c - Prewarm call in bench_fast_init() Note: BenchFast mode has pre-existing crash (not caused by these changes) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:58:42 +09:00
#define TINY_FRONT_ULTRA_SLIM_ENABLED ultra_slim_mode_enabled()
#define TINY_FRONT_HEAP_V2_ENABLED tiny_heap_v2_enabled()
#define TINY_FRONT_SFC_ENABLED tiny_sfc_enabled()
#define TINY_FRONT_FASTCACHE_ENABLED tiny_fastcache_enabled()
#define TINY_FRONT_TLS_SLL_ENABLED tiny_tls_sll_enabled()
#define TINY_FRONT_UNIFIED_CACHE_ENABLED unified_cache_enabled()
#define TINY_FRONT_UNIFIED_GATE_ENABLED front_gate_unified_enabled()
#define TINY_FRONT_METRICS_ENABLED tiny_metrics_enabled()
#define TINY_FRONT_DIAG_ENABLED tiny_diag_enabled()
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination) Implement compile-time configuration system for dead code elimination in Tiny allocation hot paths. The Config Box provides dual-mode configuration: - Normal mode: Runtime ENV checks (backward compatible, flexible) - PGO mode: Compile-time constants (dead code elimination, performance) PERFORMANCE: - Baseline (runtime config): 50.32 M ops/s (avg of 5 runs) - Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs) - Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without) - Target: +5-8% (partially achieved) IMPLEMENTATION: 1. core/box/tiny_front_config_box.h (NEW): - Defines TINY_FRONT_*_ENABLED macros for all config checks - PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1) - Normal mode (#else): Macros expand to function calls - Functions remain in their original locations (no code duplication) 2. core/hakmem_build_flags.h: - Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off) - Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" 3. core/box/hak_wrappers.inc.h: - Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED - 2 call sites updated (malloc and free fast paths) - Added config box include EXPECTED DEAD CODE ELIMINATION (PGO mode): if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... } → if (1) { ... } // Constant, always true → Compiler optimizes away the branch, keeps body SCOPE: Currently only front_gate_unified_enabled() is replaced (2 call sites). To achieve full +5-8% target, expand to other config checks: - ultra_slim_mode_enabled() - tiny_heap_v2_enabled() - sfc_cascade_enabled() - tiny_fastcache_enabled() - tiny_metrics_enabled() - tiny_diag_enabled() BUILD USAGE: Normal mode (runtime config, default): make bench_random_mixed_hakmem PGO mode (compile-time config, dead code elimination): make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem BOX PATTERN COMPLIANCE: ✅ Single Responsibility: Configuration management ONLY ✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime) ✅ Observable: Config report function (debug builds) ✅ Safe: Backward compatible (default is normal mode) ✅ Testable: Easy A/B comparison (PGO vs normal builds) WHY +2.7-4.9% (below +5-8% target)? - Limited scope: Only 2 call sites for 1 config function replaced - Lazy init overhead: front_gate_unified_enabled() cached after first call - Need to expand to more config checks for full benefit NEXT STEPS: - Expand config macro usage to other functions (optional) - OR proceed with PGO re-enablement (Final polish) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
#endif // HAKMEM_TINY_FRONT_PGO
// ============================================================================
// Configuration Helpers
// ============================================================================
// Check if running in PGO-optimized build
static inline int tiny_front_is_pgo_build(void) {
return HAKMEM_TINY_FRONT_PGO;
}
// Get effective configuration (for diagnostics)
static inline void tiny_front_config_report(void) {
#if !HAKMEM_BUILD_RELEASE
fprintf(stderr, "[TINY_FRONT_CONFIG]\n");
fprintf(stderr, " PGO Build: %d\n", HAKMEM_TINY_FRONT_PGO);
fprintf(stderr, " Ultra SLIM: %d\n", TINY_FRONT_ULTRA_SLIM_ENABLED);
fprintf(stderr, " Heap V2: %d\n", TINY_FRONT_HEAP_V2_ENABLED);
fprintf(stderr, " SFC: %d\n", TINY_FRONT_SFC_ENABLED);
fprintf(stderr, " FastCache: %d\n", TINY_FRONT_FASTCACHE_ENABLED);
fprintf(stderr, " Unified Gate: %d\n", TINY_FRONT_UNIFIED_GATE_ENABLED);
fprintf(stderr, " Metrics: %d\n", TINY_FRONT_METRICS_ENABLED);
fprintf(stderr, " Diag: %d\n", TINY_FRONT_DIAG_ENABLED);
fflush(stderr);
#endif
}
// ============================================================================
// Performance Notes
// ============================================================================
// Expected improvements (Phase 4-Step3):
// - Random Mixed 256: 57.2M → 60-62M ops/s (+5-8%)
// - Tiny Hot 64B: Current → +5-8%
//
// Key optimizations:
// 1. Dead code elimination: Compiler removes disabled code paths
// 2. Branch reduction: if (CONSTANT) → compile-time evaluation
// 3. I-cache improvement: Smaller code size (no dead branches)
// 4. Constant propagation: Compiler optimizes based on known values
//
// Trade-offs:
// 1. Binary size: PGO build is specialized (not configurable at runtime)
// 2. Flexibility: PGO build ignores ENV variables (fixed config)
// 3. Testing: Need separate builds for A/B testing (PGO vs normal)
//
// Recommendation:
// - Development: Use normal build (runtime config, flexible)
// - Production: Use PGO build after profiling (maximum performance)
#endif // TINY_FRONT_CONFIG_BOX_H