Files
hakmem/core/box/tiny_env_box.c

73 lines
3.7 KiB
C
Raw Normal View History

feat(Phase 1-1): Complete getenv elimination from malloc/free hot paths (+39-42% perf) ## Summary Eliminated all getenv() calls from malloc/free wrappers and allocator hot paths by implementing constructor-based environment variable caching. This achieves 39-42% performance improvement (36s → 22s on sh8bench single-thread). ## Performance Impact - sh8bench 1 thread: 35-36s → 21-22s (+39-42% improvement) 🚀 - sh8bench 8 threads: ~15s (maintained) - getenv overhead: 36.32% → 0% (completely eliminated) ## Changes ### New Files - **core/box/tiny_env_box.{c,h}**: Centralized environment variable cache for Tiny allocator - Caches 43 environment variables (HAKMEM_TINY_*, HAKMEM_SLL_*, HAKMEM_SS_*, etc.) - Constructor-based initialization with atomic CAS for thread safety - Inline accessor tiny_env_cfg() for hot path access - **core/box/wrapper_env_box.{c,h}**: Environment cache for malloc/free wrappers - Caches 3 wrapper variables (HAKMEM_STEP_TRACE, HAKMEM_LD_SAFE, HAKMEM_FREE_WRAP_TRACE) - Constructor priority 101 ensures early initialization - Replaces all lazy-init patterns in wrapper code ### Modified Files - **Makefile**: Added tiny_env_box.o and wrapper_env_box.o to OBJS_BASE and SHARED_OBJS - **core/box/hak_wrappers.inc.h**: - Removed static lazy-init variables (g_step_trace, ld_safe_mode cache) - Replaced with wrapper_env_cfg() lookups (wcfg->step_trace, wcfg->ld_safe_mode) - All getenv() calls eliminated from malloc/free hot paths - **core/hakmem.c**: - Added hak_ld_env_init() with constructor for LD_PRELOAD caching - Added hak_force_libc_ctor() for HAKMEM_FORCE_LIBC_ALLOC* caching - Simplified hak_ld_env_mode() to return cached value only - Simplified hak_force_libc_alloc() to use cached values - Eliminated all getenv/atoi calls from hot paths ## Technical Details ### Constructor Initialization Pattern All environment variables are now read once at library load time using __attribute__((constructor)): ```c __attribute__((constructor(101))) static void wrapper_env_ctor(void) { wrapper_env_init_once(); // Atomic CAS ensures exactly-once init } ``` ### Thread Safety - Atomic compare-and-swap (CAS) ensures single initialization - Spin-wait for initialization completion in multi-threaded scenarios - Memory barriers (memory_order_acq_rel) ensure visibility ### Hot Path Impact Before: Every malloc/free → getenv("LD_PRELOAD") + getenv("HAKMEM_STEP_TRACE") + ... After: Every malloc/free → Single pointer dereference (wcfg->field) ## Next Optimization Target (Phase 1-2) Perf analysis reveals libc fallback accounts for ~51% of cycles: - _int_malloc: 15.04% - malloc: 9.81% - _int_free: 10.07% - malloc_consolidate: 9.27% - unlink_chunk: 6.82% Reducing libc fallback from 51% → 10% could yield additional +25-30% improvement. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-12-02 16:16:51 +09:00
#include "tiny_env_box.h"
#include <stdlib.h>
#include <stdatomic.h>
tiny_env_cfg_t g_tiny_env = {0};
static int env_flag(const char* name, int def_val) {
const char* e = getenv(name);
if (!e || *e == '\0') return def_val;
return (*e != '0');
}
static int env_int(const char* name, int def_val) {
const char* e = getenv(name);
if (!e || *e == '\0') return def_val;
return atoi(e);
}
void tiny_env_init_once(void) {
static _Atomic int init_state = 0; // 0: uninit, 1: inited
int expected = 0;
if (!atomic_compare_exchange_strong_explicit(&init_state, &expected, 1,
memory_order_acq_rel, memory_order_relaxed)) {
return; // already initialized
}
g_tiny_env.tiny_tls_sll = env_flag("HAKMEM_TINY_TLS_SLL", 1);
g_tiny_env.sll_multiplier = env_int("HAKMEM_SLL_MULTIPLIER", 2);
g_tiny_env.tiny_no_front_cache = env_flag("HAKMEM_TINY_NO_FRONT_CACHE", 0);
g_tiny_env.tiny_no_quick = env_flag("HAKMEM_TINY_NO_QUICK", 0);
g_tiny_env.tiny_prefetch = env_flag("HAKMEM_TINY_PREFETCH", 0);
g_tiny_env.front_direct = env_flag("HAKMEM_TINY_FRONT_DIRECT", 0);
g_tiny_env.use_class_map = env_flag("HAKMEM_TINY_NO_CLASS_MAP", 0) ? 0 : 1;
g_tiny_env.route_enable = env_flag("HAKMEM_ROUTE", 0);
g_tiny_env.route_sample_lg = env_int("HAKMEM_ROUTE_SAMPLE_LG", 10);
g_tiny_env.larson_fix = env_flag("HAKMEM_TINY_LARSON_FIX", 0);
g_tiny_env.active_track = env_flag("HAKMEM_TINY_ACTIVE_TRACK", 0);
g_tiny_env.drain_to_sll = env_int("HAKMEM_TINY_DRAIN_TO_SLL", 0);
g_tiny_env.free_to_ss = env_flag("HAKMEM_TINY_FREE_TO_SS", 0);
g_tiny_env.route_free = env_flag("HAKMEM_TINY_ROUTE_FREE", 0);
g_tiny_env.sll_diag = env_flag("HAKMEM_TINY_SLL_DIAG", 0);
g_tiny_env.sll_safeheader = env_flag("HAKMEM_TINY_SLL_SAFEHEADER", 0);
g_tiny_env.sll_ring = env_flag("HAKMEM_TINY_SLL_RING", 0);
g_tiny_env.free_fast = env_flag("HAKMEM_TINY_FREE_FAST", 1);
g_tiny_env.sll_canary_fast = env_flag("HAKMEM_TINY_SLL_CANARY_FAST", 0);
g_tiny_env.ss_free_debug = env_flag("HAKMEM_SS_FREE_DEBUG", 0);
g_tiny_env.ss_adopt = env_flag("HAKMEM_TINY_SS_ADOPT", 1);
g_tiny_env.disable_remote = env_flag("HAKMEM_TINY_DISABLE_REMOTE", 0);
g_tiny_env.freelist_mask = env_flag("HAKMEM_TINY_FREELIST_MASK", 0);
g_tiny_env.alloc_1024_metric = env_flag("HAKMEM_TINY_ALLOC_1024_METRIC", 0);
g_tiny_env.tiny_profile = env_flag("HAKMEM_TINY_PROFILE", 0);
g_tiny_env.tiny_fast_stats = env_flag("HAKMEM_TINY_FAST_STATS", 0);
g_tiny_env.heap_v2_stats = env_flag("HAKMEM_TINY_HEAP_V2_STATS", 0);
g_tiny_env.front_slim = env_flag("HAKMEM_TINY_FRONT_SLIM", 0);
{
int pct = env_int("HAKMEM_SFC_CASCADE_PCT", 50);
if (pct < 0) pct = 0;
if (pct > 100) pct = 100;
g_tiny_env.sfc_cascade_pct = pct;
}
g_tiny_env.sfc_cascade = env_flag("HAKMEM_TINY_SFC_CASCADE", 0);
g_tiny_env.alloc_remote_relax = env_flag("HAKMEM_TINY_ALLOC_REMOTE_RELAX", 0);
g_tiny_env.ss_empty_reuse = env_flag("HAKMEM_SS_EMPTY_REUSE", 1);
g_tiny_env.ss_empty_scan_limit = env_int("HAKMEM_SS_EMPTY_SCAN_LIMIT", 32);
g_tiny_env.ss_acquire_debug = env_flag("HAKMEM_SS_ACQUIRE_DEBUG", 0);
g_tiny_env.tension_drain_enable = env_flag("HAKMEM_TINY_TENSION_DRAIN_ENABLE", 1);
g_tiny_env.tension_drain_threshold = env_int("HAKMEM_TINY_TENSION_DRAIN_THRESHOLD", 1024);
g_tiny_env.alloc_route_shape = env_flag("HAKMEM_TINY_ALLOC_ROUTE_SHAPE", 0);
feat(Phase 1-1): Complete getenv elimination from malloc/free hot paths (+39-42% perf) ## Summary Eliminated all getenv() calls from malloc/free wrappers and allocator hot paths by implementing constructor-based environment variable caching. This achieves 39-42% performance improvement (36s → 22s on sh8bench single-thread). ## Performance Impact - sh8bench 1 thread: 35-36s → 21-22s (+39-42% improvement) 🚀 - sh8bench 8 threads: ~15s (maintained) - getenv overhead: 36.32% → 0% (completely eliminated) ## Changes ### New Files - **core/box/tiny_env_box.{c,h}**: Centralized environment variable cache for Tiny allocator - Caches 43 environment variables (HAKMEM_TINY_*, HAKMEM_SLL_*, HAKMEM_SS_*, etc.) - Constructor-based initialization with atomic CAS for thread safety - Inline accessor tiny_env_cfg() for hot path access - **core/box/wrapper_env_box.{c,h}**: Environment cache for malloc/free wrappers - Caches 3 wrapper variables (HAKMEM_STEP_TRACE, HAKMEM_LD_SAFE, HAKMEM_FREE_WRAP_TRACE) - Constructor priority 101 ensures early initialization - Replaces all lazy-init patterns in wrapper code ### Modified Files - **Makefile**: Added tiny_env_box.o and wrapper_env_box.o to OBJS_BASE and SHARED_OBJS - **core/box/hak_wrappers.inc.h**: - Removed static lazy-init variables (g_step_trace, ld_safe_mode cache) - Replaced with wrapper_env_cfg() lookups (wcfg->step_trace, wcfg->ld_safe_mode) - All getenv() calls eliminated from malloc/free hot paths - **core/hakmem.c**: - Added hak_ld_env_init() with constructor for LD_PRELOAD caching - Added hak_force_libc_ctor() for HAKMEM_FORCE_LIBC_ALLOC* caching - Simplified hak_ld_env_mode() to return cached value only - Simplified hak_force_libc_alloc() to use cached values - Eliminated all getenv/atoi calls from hot paths ## Technical Details ### Constructor Initialization Pattern All environment variables are now read once at library load time using __attribute__((constructor)): ```c __attribute__((constructor(101))) static void wrapper_env_ctor(void) { wrapper_env_init_once(); // Atomic CAS ensures exactly-once init } ``` ### Thread Safety - Atomic compare-and-swap (CAS) ensures single initialization - Spin-wait for initialization completion in multi-threaded scenarios - Memory barriers (memory_order_acq_rel) ensure visibility ### Hot Path Impact Before: Every malloc/free → getenv("LD_PRELOAD") + getenv("HAKMEM_STEP_TRACE") + ... After: Every malloc/free → Single pointer dereference (wcfg->field) ## Next Optimization Target (Phase 1-2) Perf analysis reveals libc fallback accounts for ~51% of cycles: - _int_malloc: 15.04% - malloc: 9.81% - _int_free: 10.07% - malloc_consolidate: 9.27% - unlink_chunk: 6.82% Reducing libc fallback from 51% → 10% could yield additional +25-30% improvement. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-12-02 16:16:51 +09:00
g_tiny_env.inited = 1;
}