Phase 4 E1: ENV Snapshot Consolidation - GO (+3.92% avg, +4.01% median)
Target: Consolidate 3 ENV gate TLS reads → 1 TLS read
- tiny_c7_ultra_enabled_env(): 1.28% self
- tiny_front_v3_enabled(): 1.01% self
- tiny_metadata_cache_enabled(): 0.97% self
- Total overhead: 3.26% self (perf profile analysis)
Implementation:
- core/box/hakmem_env_snapshot_box.h (new): ENV snapshot struct & API
- core/box/hakmem_env_snapshot_box.c (new): TLS snapshot implementation
- core/front/malloc_tiny_fast.h: Migrated 5 call sites to snapshot
- core/box/tiny_legacy_fallback_box.h: Migrated 2 call sites
- core/box/tiny_metadata_cache_hot_box.h: Migrated 1 call site
- core/bench_profile.h: Added hakmem_env_snapshot_refresh_from_env()
- Makefile: Added hakmem_env_snapshot_box.o to build
- ENV gate: HAKMEM_ENV_SNAPSHOT=0/1 (default: 0, research box)
A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (E1=0): 43,617,549 ops/s (avg), 43,562,895 ops/s (median)
- Optimized (E1=1): 45,327,239 ops/s (avg), 45,309,218 ops/s (median)
- Improvement: avg +3.92%, median +4.01%
Decision: GO (+3.92% >= +2.5% threshold)
- Action: Keep as research box (default OFF) for Phase 4
- Next: Consider promotion to default in MIXED_TINYV3_C7_SAFE preset
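The improvement figures and the GO decision above can be reproduced with a short calculation. The sketch below is illustrative only: `improvement_pct()` and `is_go()` are hypothetical helper names, and it assumes the decision is a plain inclusive threshold on the mean gain, as the `>=` in the Decision line suggests.

```c
#include <assert.h>

/* Relative throughput change in percent: (optimized / baseline - 1) * 100. */
static double improvement_pct(double baseline_ops, double optimized_ops) {
    assert(baseline_ops > 0.0);  /* guard against division by zero */
    return (optimized_ops / baseline_ops - 1.0) * 100.0;
}

/* Hypothetical decision rule: GO when the gain meets the threshold. */
static int is_go(double gain_pct, double threshold_pct) {
    return gain_pct >= threshold_pct;
}
```

With the averages reported above, `improvement_pct(43617549.0, 45327239.0)` evaluates to roughly 3.92, and the medians give roughly 4.01; both clear the 2.5 threshold.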
Design Rationale:
- Shape optimizations (B3, D3) reached saturation (+0.56% NEUTRAL)
- Shift to memory/TLS overhead optimization (new optimization frontier)
- Pattern: Similar to existing tiny_front_v3_snapshot (proven approach)
- Expected: +1-3% from 3.26% ENV overhead → Achieved: +3.92%
Technical Details:
- Consolidation: 3 TLS reads → 1 TLS read (~67% reduction)
- Learner interlock: tiny_metadata_cache_eff pre-computed in snapshot
- Version sync: Refreshes on small_policy_v7_version_changed()
- Fallback safety: Existing ENV gates still available when E1=0
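The consolidation idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `DemoEnvSnapshot`, `demo_parse_flag()`, `demo_snapshot_load()`, `demo_use_metadata_cache()` and the `DEMO_*` variable names are all hypothetical stand-ins, and the learner state is passed in as a parameter instead of being queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical snapshot: every gate the hot path needs lives in one struct. */
typedef struct {
    bool c7_ultra;            /* default ON  */
    bool front_v3;            /* default ON  */
    bool metadata_cache;      /* default OFF */
    bool metadata_cache_eff;  /* metadata_cache && !learner, computed once */
} DemoEnvSnapshot;

/* Interpret a 0/1 style value; NULL or empty falls back to the default. */
static bool demo_parse_flag(const char *value, bool default_on) {
    if (value == NULL || *value == '\0') return default_on;
    return *value != '0';
}

/* Slow path: read the environment once and cache the results. */
static void demo_snapshot_load(DemoEnvSnapshot *snap, bool learner_active) {
    assert(snap != NULL);
    snap->c7_ultra       = demo_parse_flag(getenv("DEMO_C7_ULTRA"), true);
    snap->front_v3       = demo_parse_flag(getenv("DEMO_FRONT_V3"), true);
    snap->metadata_cache = demo_parse_flag(getenv("DEMO_METADATA_CACHE"), false);
    /* Learner interlock is folded into the snapshot so callers need no extra check. */
    snap->metadata_cache_eff = snap->metadata_cache && !learner_active;
}

/* Hot path: one struct field read replaces a separate gate function call. */
static bool demo_use_metadata_cache(const DemoEnvSnapshot *snap) {
    return snap->metadata_cache_eff;
}
```

Pre-computing `metadata_cache_eff` at load time means the interlock costs nothing per allocation; the trade-off is that the snapshot must be reloaded whenever the learner state or the environment changes.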
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-14 00:59:12 +09:00
// hakmem_env_snapshot_box.c - Phase 4 E1: ENV Snapshot Consolidation (implementation)
Phase 4 E3-4: ENV Constructor Init (+4.75% GO)
Target: Eliminate E1 lazy init check overhead (3.22% self)
- E1 consolidated ENV gates but lazy check remained in hot path
- Strategy: __attribute__((constructor(101))) for pre-main init
Implementation:
- ENV gate: HAKMEM_ENV_SNAPSHOT_CTOR=0/1 (default 0, research box)
- core/box/hakmem_env_snapshot_box.c: Constructor function added
- Reads ENV before main() when CTOR=1
- Refresh also syncs gate state for bench_profile putenv
- core/box/hakmem_env_snapshot_box.h: Dual-mode enabled check
- CTOR=1 fast path: direct global read (no lazy branch)
- CTOR=0 fallback: legacy lazy init (rollback safe)
- Branch hints adjusted for default OFF baseline
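The dual-mode check described above can be sketched roughly as below. The real header is not shown here, so this is an assumption-laden illustration: `demo_gate`, `demo_ctor_mode`, `demo_read_flag()`, `demo_gate_ctor()`, `demo_snapshot_enabled()` and the `DEMO_*` variable names are hypothetical, and the branch hints simply mirror the "default OFF" note. It relies on the GCC/Clang `constructor` attribute and `__builtin_expect`.

```c
#include <stdlib.h>

/* Hypothetical gate state: -1 = not read yet, 0 = off, 1 = on. */
static int demo_gate = -1;
static int demo_ctor_mode = -1;

static int demo_read_flag(const char *name) {
    const char *v = getenv(name);
    return (v && *v == '1') ? 1 : 0;
}

/* Runs before main(); priorities 0-100 are reserved, so 101 is the earliest user slot. */
__attribute__((constructor(101)))
static void demo_gate_ctor(void) {
    demo_ctor_mode = demo_read_flag("DEMO_SNAPSHOT_CTOR");
    if (demo_ctor_mode) {
        demo_gate = demo_read_flag("DEMO_SNAPSHOT");
    }
}

static inline int demo_snapshot_enabled(void) {
    if (__builtin_expect(demo_ctor_mode == 1, 0)) {
        return demo_gate;  /* constructor mode: gate already resolved */
    }
    if (__builtin_expect(demo_gate == -1, 0)) {
        demo_gate = demo_read_flag("DEMO_SNAPSHOT");  /* legacy lazy init */
    }
    return demo_gate;
}
```

If the check is reached before the constructor has run, `demo_ctor_mode` is still -1 and the function falls through to the lazy path, which keeps the fallback safe.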
A/B Test Results (Mixed, 10-run, 20M iters, E1=1):
- Baseline (CTOR=0): 44.28M ops/s (mean), 44.60M ops/s (median)
- Optimized (CTOR=1): 46.38M ops/s (mean), 46.53M ops/s (median)
- Improvement: +4.75% mean, +4.35% median
Decision: GO (+4.75% >> +0.5% threshold)
- Expected +0.5-1.5%, achieved +4.75%
- Lazy init branch overhead was larger than expected
- Action: Keep as research box (default OFF), evaluate promotion
Phase 4 Cumulative:
- E1 (ENV Snapshot): +3.92%
- E2 (Alloc Per-Class): -0.21% (NEUTRAL, frozen)
- E3-4 (Constructor Init): +4.75%
- Total Phase 4: ~+8.5%
Deliverables:
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_DESIGN.md
- docs/analysis/PHASE4_E3_ENV_CONSTRUCTOR_INIT_NEXT_INSTRUCTIONS.md
- docs/analysis/PHASE4_COMPREHENSIVE_STATUS_ANALYSIS.md
- docs/analysis/PHASE4_EXECUTIVE_SUMMARY.md
- scripts/verify_health_profiles.sh (sanity check script)
- CURRENT_TASK.md (E3-4 complete, next instructions)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-14 02:57:35 +09:00
//
// E3-4 Extension: Constructor init to eliminate lazy check overhead (3.22% self)
#include "hakmem_env_snapshot_box.h"
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include "../hakmem_build_flags.h"

// Forward declare learner check (to avoid circular deps)
extern bool small_learner_v2_enabled(void);

// Global snapshot state (process-wide; these definitions are not thread-local)
HakmemEnvSnapshot g_hakmem_env_snapshot = {0};
int g_hakmem_env_snapshot_ready = 0;

// E3-4: Global gate state (not static local - avoids lazy init overhead)
int g_hakmem_env_snapshot_gate = -1;
int g_hakmem_env_snapshot_ctor_mode = -1;

// E3-4: Constructor - run before main() to init gate without lazy check
__attribute__((constructor(101)))
static void hakmem_env_snapshot_gate_ctor(void) {
    // Read HAKMEM_ENV_SNAPSHOT_CTOR (default OFF for safety)
    const char* ctor_env = getenv("HAKMEM_ENV_SNAPSHOT_CTOR");
    g_hakmem_env_snapshot_ctor_mode = (ctor_env && *ctor_env == '1') ? 1 : 0;

    if (g_hakmem_env_snapshot_ctor_mode) {
        // Constructor mode: init gate now (before any malloc/free calls)
        const char* e = getenv("HAKMEM_ENV_SNAPSHOT");
        g_hakmem_env_snapshot_gate = (e && *e == '1') ? 1 : 0;

#if !HAKMEM_BUILD_RELEASE
        fprintf(stderr, "[E3-4] Constructor init: HAKMEM_ENV_SNAPSHOT_GATE=%d\n",
                g_hakmem_env_snapshot_gate);
        fflush(stderr);
#endif
    }
}

// Internal helper: read all ENV vars and compute effective values
static void hakmem_env_snapshot_load(HakmemEnvSnapshot* snap) {
    // Read HAKMEM_TINY_C7_ULTRA_ENABLED (default: ON)
    const char* c7_env = getenv("HAKMEM_TINY_C7_ULTRA_ENABLED");
    if (c7_env && *c7_env) {
        snap->tiny_c7_ultra_enabled = (*c7_env != '0');
    } else {
        snap->tiny_c7_ultra_enabled = true;  // default: ON
    }

    // Read HAKMEM_TINY_FRONT_V3_ENABLED (default: ON)
    const char* v3_env = getenv("HAKMEM_TINY_FRONT_V3_ENABLED");
    if (v3_env && *v3_env) {
        snap->tiny_front_v3_enabled = (*v3_env != '0');
    } else {
        snap->tiny_front_v3_enabled = true;  // default: ON
    }

    // Read HAKMEM_TINY_METADATA_CACHE (default: OFF)
    const char* cache_env = getenv("HAKMEM_TINY_METADATA_CACHE");
    if (cache_env && *cache_env) {
        snap->tiny_metadata_cache = (*cache_env == '1');
    } else {
        snap->tiny_metadata_cache = false;  // default: OFF
    }

    // Compute effective metadata cache (cache && !learner)
    // Safety: disable if learner v7 is active (learner updates route_kind dynamically)
    bool learner_active = small_learner_v2_enabled();
    snap->tiny_metadata_cache_eff = snap->tiny_metadata_cache && !learner_active;

#if !HAKMEM_BUILD_RELEASE
    fprintf(stderr, "[HAKMEM_ENV_SNAPSHOT] Initialized:\n");
    fprintf(stderr, "  tiny_c7_ultra_enabled: %d\n", snap->tiny_c7_ultra_enabled);
    fprintf(stderr, "  tiny_front_v3_enabled: %d\n", snap->tiny_front_v3_enabled);
    fprintf(stderr, "  tiny_metadata_cache: %d\n", snap->tiny_metadata_cache);
    fprintf(stderr, "  tiny_metadata_cache_eff: %d (learner_active=%d)\n",
            snap->tiny_metadata_cache_eff, learner_active);
    fflush(stderr);
#endif
}

// Initialize snapshot (lazy init on first access)
void hakmem_env_snapshot_init(void) {
    if (g_hakmem_env_snapshot_ready) {
        return;  // already initialized
    }

    hakmem_env_snapshot_load(&g_hakmem_env_snapshot);
    g_hakmem_env_snapshot_ready = 1;
}

// Refresh snapshot from ENV (for bench_profile putenv sync)
// This ensures that after bench_setenv_default() runs, the snapshot is refreshed
void hakmem_env_snapshot_refresh_from_env(void) {
    // Refresh gate state too so bench_profile putenv defaults take effect even if
    // the gate was lazily initialized earlier (e.g. by pre-main malloc/free).
    const char* ctor_env = getenv("HAKMEM_ENV_SNAPSHOT_CTOR");
    g_hakmem_env_snapshot_ctor_mode = (ctor_env && *ctor_env == '1') ? 1 : 0;

    const char* gate_env = getenv("HAKMEM_ENV_SNAPSHOT");
    g_hakmem_env_snapshot_gate = (gate_env && *gate_env == '1') ? 1 : 0;

    hakmem_env_snapshot_load(&g_hakmem_env_snapshot);
    g_hakmem_env_snapshot_ready = 1;

#if !HAKMEM_BUILD_RELEASE
    fprintf(stderr, "[HAKMEM_ENV_SNAPSHOT] Refreshed from ENV (bench_profile sync)\n");
    fflush(stderr);
#endif
}