Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination)
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
|
|
|
// tiny_front_config_box.h - Phase 4-Step3: Tiny Front Config Box
|
|
|
|
|
// Purpose: Compile-time configuration for dead code elimination
|
|
|
|
|
// Contract: Dual-mode (compile-time fixed vs. runtime ENV checks)
|
|
|
|
|
// Performance: Target +5-8% via branch elimination (57.2M → 60-62M ops/s)
|
|
|
|
|
//
|
|
|
|
|
// Design Principles (Box Pattern):
|
|
|
|
|
// 1. Single Responsibility: Configuration management ONLY
|
|
|
|
|
// 2. Clear Contract: PGO mode = compile-time constants, Normal mode = runtime checks
|
|
|
|
|
// 3. Observable: Config report function (debug builds)
|
|
|
|
|
// 4. Safe: Backward compatible (default runtime mode)
|
|
|
|
|
// 5. Testable: Easy A/B comparison (PGO vs normal builds)
|
|
|
|
|
//
|
|
|
|
|
// Usage:
|
|
|
|
|
// Normal build (runtime config, backward compatible):
|
|
|
|
|
// make bench_random_mixed_hakmem
|
|
|
|
|
//
|
|
|
|
|
// PGO build (compile-time config, dead code elimination):
|
|
|
|
|
// make CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
|
|
|
|
|
//
|
|
|
|
|
// Expected Benefit:
|
|
|
|
|
// - Dead code elimination: Compiler removes disabled code paths
|
|
|
|
|
// - Branch reduction: if (CONSTANT_0) { ... } → eliminated
|
|
|
|
|
// - I-cache improvement: Smaller code size (no dead branches)
|
|
|
|
|
// - Target: +5-8% improvement (even without PGO profiling)
|
|
|
|
|
|
|
|
|
|
#ifndef TINY_FRONT_CONFIG_BOX_H
|
|
|
|
|
#define TINY_FRONT_CONFIG_BOX_H
|
|
|
|
|
|
|
|
|
|
#include <stdio.h>
|
|
|
|
|
#include "../hakmem_build_flags.h"
|
|
|
|
|
|
|
|
|
|
// ============================================================================
|
|
|
|
|
// Build Flag Check (must be defined in hakmem_build_flags.h)
|
|
|
|
|
// ============================================================================
|
|
|
|
|
|
|
|
|
|
#ifndef HAKMEM_TINY_FRONT_PGO
|
|
|
|
|
# define HAKMEM_TINY_FRONT_PGO 0
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
|
|
// ============================================================================
|
|
|
|
|
// PGO Mode: Fixed Configuration (Compile-Time Constants)
|
|
|
|
|
// ============================================================================
|
|
|
|
|
|
|
|
|
|
#if HAKMEM_TINY_FRONT_PGO
|
|
|
|
|
|
|
|
|
|
// PGO-optimized build: All runtime checks become compile-time constants
|
|
|
|
|
// Compiler constant folding eliminates dead branches:
|
|
|
|
|
// if (TINY_FRONT_HEAP_V2_ENABLED) { ... } // 0 → entire block removed
|
|
|
|
|
// if (!TINY_FRONT_SFC_ENABLED) { ... } // !1 → entire block removed
|
|
|
|
|
|
|
|
|
|
#define TINY_FRONT_ULTRA_SLIM_ENABLED 0 // Disabled (use normal front)
|
|
|
|
|
#define TINY_FRONT_HEAP_V2_ENABLED 0 // Disabled (use Unified Cache)
|
|
|
|
|
#define TINY_FRONT_SFC_ENABLED 1 // Enabled (SFC cascade)
|
|
|
|
|
#define TINY_FRONT_FASTCACHE_ENABLED 0 // Disabled (use Unified Cache)
|
2025-11-29 17:35:51 +09:00
|
|
|
#define TINY_FRONT_TLS_SLL_ENABLED 1 // Enabled (TLS SLL freelist)
|
2025-11-29 17:58:42 +09:00
|
|
|
#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1 // Enabled (Unified Cache - tcache-style)
|
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination)
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
|
|
|
#define TINY_FRONT_UNIFIED_GATE_ENABLED 1 // Enabled (Front Gate Unification)
|
|
|
|
|
#define TINY_FRONT_METRICS_ENABLED 0 // Disabled (no runtime overhead)
|
|
|
|
|
#define TINY_FRONT_DIAG_ENABLED 0 // Disabled (no diagnostics)
|
|
|
|
|
|
|
|
|
|
// Expected code reduction:
|
|
|
|
|
// - Ultra SLIM check: 1 branch removed
|
|
|
|
|
// - Heap V2 check: 1 branch removed
|
|
|
|
|
// - Metrics check: 2-3 branches removed
|
|
|
|
|
// - Diag check: 1 branch removed
|
|
|
|
|
// Total: 5-7 branches eliminated in hot path
|
|
|
|
|
|
|
|
|
|
#else
|
|
|
|
|
|
|
|
|
|
// ============================================================================
|
|
|
|
|
// Normal Mode: Runtime Configuration (Backward Compatible)
|
|
|
|
|
// ============================================================================
|
|
|
|
|
|
|
|
|
|
// Normal build: Checks ENV variables or global config state
|
|
|
|
|
// Preserves backward compatibility with existing ENV variable interface
|
|
|
|
|
//
|
|
|
|
|
// NOTE: The actual runtime config functions (ultra_slim_mode_enabled, etc.)
|
|
|
|
|
// are defined in their respective modules:
|
|
|
|
|
// - front_gate_unified_enabled() → core/front/malloc_tiny_fast.h
|
|
|
|
|
// - sfc_cascade_enabled() → core/hakmem_tiny_sfc.h
|
|
|
|
|
// - tiny_heap_v2_enabled() → core/front/tiny_heap_v2.h
|
|
|
|
|
// - etc.
|
|
|
|
|
//
|
|
|
|
|
// This config box ONLY defines the macros that expand to function calls.
|
2025-11-29 17:31:32 +09:00
|
|
|
// The functions themselves are implemented here as static inline to avoid include order issues.
|
|
|
|
|
|
|
|
|
|
// Phase 7-Step6-Fix: Config wrapper functions (for normal mode)
|
|
|
|
|
// These are static inline to access static global variables from any include order
|
|
|
|
|
static inline int tiny_fastcache_enabled(void) {
|
|
|
|
|
extern int g_fastcache_enable;
|
|
|
|
|
return g_fastcache_enable;
|
|
|
|
|
}
|
|
|
|
|
|
2025-11-29 17:40:05 +09:00
|
|
|
static inline int tiny_sfc_enabled(void) {
|
2025-11-29 17:31:32 +09:00
|
|
|
extern int g_sfc_enabled;
|
|
|
|
|
return g_sfc_enabled;
|
|
|
|
|
}
|
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination)
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
|
|
|
|
2025-11-29 17:35:51 +09:00
|
|
|
static inline int tiny_tls_sll_enabled(void) {
|
|
|
|
|
extern int g_tls_sll_enable;
|
|
|
|
|
return g_tls_sll_enable;
|
|
|
|
|
}
|
|
|
|
|
|
2025-11-29 17:58:42 +09:00
|
|
|
// Phase 8-Step1: Unified Cache enabled wrapper
|
|
|
|
|
// Forward declaration - actual function is in tiny_unified_cache.c
|
|
|
|
|
int unified_cache_enabled(void);
|
|
|
|
|
|
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination)
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
|
|
|
// Config macros (runtime function calls)
|
|
|
|
|
// These expand to actual function calls in normal mode
|
2025-11-29 17:58:42 +09:00
|
|
|
#define TINY_FRONT_ULTRA_SLIM_ENABLED ultra_slim_mode_enabled()
|
|
|
|
|
#define TINY_FRONT_HEAP_V2_ENABLED tiny_heap_v2_enabled()
|
|
|
|
|
#define TINY_FRONT_SFC_ENABLED tiny_sfc_enabled()
|
|
|
|
|
#define TINY_FRONT_FASTCACHE_ENABLED tiny_fastcache_enabled()
|
|
|
|
|
#define TINY_FRONT_TLS_SLL_ENABLED tiny_tls_sll_enabled()
|
|
|
|
|
#define TINY_FRONT_UNIFIED_CACHE_ENABLED unified_cache_enabled()
|
|
|
|
|
#define TINY_FRONT_UNIFIED_GATE_ENABLED front_gate_unified_enabled()
|
|
|
|
|
#define TINY_FRONT_METRICS_ENABLED tiny_metrics_enabled()
|
|
|
|
|
#define TINY_FRONT_DIAG_ENABLED tiny_diag_enabled()
|
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination)
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
|
|
|
|
|
|
|
|
#endif // HAKMEM_TINY_FRONT_PGO
|
|
|
|
|
|
|
|
|
|
// ============================================================================
|
|
|
|
|
// Configuration Helpers
|
|
|
|
|
// ============================================================================
|
|
|
|
|
|
|
|
|
|
// Check if running in PGO-optimized build
|
|
|
|
|
static inline int tiny_front_is_pgo_build(void) {
|
|
|
|
|
return HAKMEM_TINY_FRONT_PGO;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Get effective configuration (for diagnostics)
|
|
|
|
|
static inline void tiny_front_config_report(void) {
|
2025-12-03 12:43:02 +09:00
|
|
|
#if 0 // !HAKMEM_BUILD_RELEASE
|
|
|
|
|
// Disabled to avoid circular dependency / implicit declaration issues
|
|
|
|
|
// re-enable when include order is fixed
|
Phase 4-Step3: Add Front Config Box (+2.7-4.9% dead code elimination)
Implement compile-time configuration system for dead code elimination in Tiny
allocation hot paths. The Config Box provides dual-mode configuration:
- Normal mode: Runtime ENV checks (backward compatible, flexible)
- PGO mode: Compile-time constants (dead code elimination, performance)
PERFORMANCE:
- Baseline (runtime config): 50.32 M ops/s (avg of 5 runs)
- Config Box (PGO mode): 52.77 M ops/s (avg of 5 runs)
- Improvement: +2.45 M ops/s (+4.87% with outlier, +2.72% without)
- Target: +5-8% (partially achieved)
IMPLEMENTATION:
1. core/box/tiny_front_config_box.h (NEW):
- Defines TINY_FRONT_*_ENABLED macros for all config checks
- PGO mode (#if HAKMEM_TINY_FRONT_PGO): Macros expand to constants (0/1)
- Normal mode (#else): Macros expand to function calls
- Functions remain in their original locations (no code duplication)
2. core/hakmem_build_flags.h:
- Added HAKMEM_TINY_FRONT_PGO build flag (default: 0, off)
- Documentation: Usage with make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1"
3. core/box/hak_wrappers.inc.h:
- Replaced front_gate_unified_enabled() with TINY_FRONT_UNIFIED_GATE_ENABLED
- 2 call sites updated (malloc and free fast paths)
- Added config box include
EXPECTED DEAD CODE ELIMINATION (PGO mode):
if (TINY_FRONT_UNIFIED_GATE_ENABLED) { ... }
→ if (1) { ... } // Constant, always true
→ Compiler optimizes away the branch, keeps body
SCOPE:
Currently only front_gate_unified_enabled() is replaced (2 call sites).
To achieve full +5-8% target, expand to other config checks:
- ultra_slim_mode_enabled()
- tiny_heap_v2_enabled()
- sfc_cascade_enabled()
- tiny_fastcache_enabled()
- tiny_metrics_enabled()
- tiny_diag_enabled()
BUILD USAGE:
Normal mode (runtime config, default):
make bench_random_mixed_hakmem
PGO mode (compile-time config, dead code elimination):
make EXTRA_CFLAGS="-DHAKMEM_TINY_FRONT_PGO=1" bench_random_mixed_hakmem
BOX PATTERN COMPLIANCE:
✅ Single Responsibility: Configuration management ONLY
✅ Clear Contract: Dual-mode (PGO = constants, Normal = runtime)
✅ Observable: Config report function (debug builds)
✅ Safe: Backward compatible (default is normal mode)
✅ Testable: Easy A/B comparison (PGO vs normal builds)
WHY +2.7-4.9% (below +5-8% target)?
- Limited scope: Only 2 call sites for 1 config function replaced
- Lazy init overhead: front_gate_unified_enabled() cached after first call
- Need to expand to more config checks for full benefit
NEXT STEPS:
- Expand config macro usage to other functions (optional)
- OR proceed with PGO re-enablement (Final polish)
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 12:18:37 +09:00
|
|
|
fprintf(stderr, "[TINY_FRONT_CONFIG]\n");
|
|
|
|
|
fprintf(stderr, " PGO Build: %d\n", HAKMEM_TINY_FRONT_PGO);
|
|
|
|
|
fprintf(stderr, " Ultra SLIM: %d\n", TINY_FRONT_ULTRA_SLIM_ENABLED);
|
|
|
|
|
fprintf(stderr, " Heap V2: %d\n", TINY_FRONT_HEAP_V2_ENABLED);
|
|
|
|
|
fprintf(stderr, " SFC: %d\n", TINY_FRONT_SFC_ENABLED);
|
|
|
|
|
fprintf(stderr, " FastCache: %d\n", TINY_FRONT_FASTCACHE_ENABLED);
|
|
|
|
|
fprintf(stderr, " Unified Gate: %d\n", TINY_FRONT_UNIFIED_GATE_ENABLED);
|
|
|
|
|
fprintf(stderr, " Metrics: %d\n", TINY_FRONT_METRICS_ENABLED);
|
|
|
|
|
fprintf(stderr, " Diag: %d\n", TINY_FRONT_DIAG_ENABLED);
|
|
|
|
|
fflush(stderr);
|
|
|
|
|
#endif
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// ============================================================================
|
|
|
|
|
// Performance Notes
|
|
|
|
|
// ============================================================================
|
|
|
|
|
|
|
|
|
|
// Expected improvements (Phase 4-Step3):
|
|
|
|
|
// - Random Mixed 256: 57.2M → 60-62M ops/s (+5-8%)
|
|
|
|
|
// - Tiny Hot 64B: Current → +5-8%
|
|
|
|
|
//
|
|
|
|
|
// Key optimizations:
|
|
|
|
|
// 1. Dead code elimination: Compiler removes disabled code paths
|
|
|
|
|
// 2. Branch reduction: if (CONSTANT) → compile-time evaluation
|
|
|
|
|
// 3. I-cache improvement: Smaller code size (no dead branches)
|
|
|
|
|
// 4. Constant propagation: Compiler optimizes based on known values
|
|
|
|
|
//
|
|
|
|
|
// Trade-offs:
|
|
|
|
|
// 1. Binary size: PGO build is specialized (not configurable at runtime)
|
|
|
|
|
// 2. Flexibility: PGO build ignores ENV variables (fixed config)
|
|
|
|
|
// 3. Testing: Need separate builds for A/B testing (PGO vs normal)
|
|
|
|
|
//
|
|
|
|
|
// Recommendation:
|
|
|
|
|
// - Development: Use normal build (runtime config, flexible)
|
|
|
|
|
// - Production: Use PGO build after profiling (maximum performance)
|
|
|
|
|
|
|
|
|
|
#endif // TINY_FRONT_CONFIG_BOX_H
|