# Phase 19-3: ENV Snapshot Consolidation — Revised Implementation Instructions ## 0. Design Corrections (from initial analysis) **Key mistakes in initial design**: 1. **Wrong**: Create new TLS ctx **Right**: Use existing `HakmemEnvSnapshot*` from `core/box/hakmem_env_snapshot_box.h` 2. **Wrong**: Option A with static `__thread g_init` (doesn't respect `hakmem_env_snapshot_refresh_from_env()`) **Right**: Pass `const HakmemEnvSnapshot* env` down the call stack (refresh works automatically) 3. **Wrong**: Modify `core/wrapper.c` **Right**: Modify `core/box/hak_wrappers.inc.h` (actual integration point) 4. **Wrong**: Keep `__builtin_expect(hakmem_env_snapshot_enabled(), 0)` **Right**: Remove UNLIKELY hint (Phase 19-1 trap: snapshot is now ON by default, hint is backwards) --- ## 1. Strategy: Simplest Path (3 micro-phases) ### Phase 19-3a: Remove UNLIKELY hint from ENV checks **Problem**: `__builtin_expect(hakmem_env_snapshot_enabled(), 0)` appears in: - `core/front/malloc_tiny_fast.h:236` (C7 ULTRA alloc) - `core/front/malloc_tiny_fast.h:403` (Front V3 free hotcold) - `core/front/malloc_tiny_fast.h:624` (C7 ULTRA free) - `core/front/malloc_tiny_fast.h:830` (C7 ULTRA free larson) - `core/front/malloc_tiny_fast.h:910` (Front V3 free larson) **Issue**: Snapshot is now ON in presets → UNLIKELY hint is backwards (same trap as Phase 19-1 NO-GO) **Fix**: Replace `__builtin_expect(hakmem_env_snapshot_enabled(), 0)` with `hakmem_env_snapshot_enabled()` **Expected**: +0-2% from correct branch prediction **A/B Test**: ```sh # Before (with UNLIKELY hint) scripts/run_mixed_10_cleanenv.sh # After (without hint) scripts/run_mixed_10_cleanenv.sh ``` **GO Criteria**: No regression (±1%) **Status**: DONE ✅ (GO +4.42%) --- ### Phase 19-3b: Pass snapshot down (hot helper pass-down) **Status**: DONE ✅ (GO +2.76% mean / +2.57% median) Results: `docs/analysis/PHASE19_FASTLANE_INSTRUCTION_REDUCTION_3B_ENV_SNAPSHOT_PASSDOWN_AB_TEST_RESULTS.md` **Current state**: Each callee calls `hakmem_env_snapshot_enabled()` independently - 5 calls × 2 TLS reads each = **10 TLS reads/op** **Implementation (landed)**: - `core/front/malloc_tiny_fast.h` - Capture `env` once per hot call and pass it down: - `free_tiny_fast()` / `free_tiny_fast_hot()` capture `env` once - `free_tiny_fast_cold(..., env)` consumes it - `tiny_legacy_fallback_free_base_with_env(..., env)` consumes it - Reuse the same snapshot for alloc route selection: - `tiny_policy_hot_get_route_with_env(class_idx, env)` - `core/box/tiny_legacy_fallback_box.h`: add `tiny_legacy_fallback_free_base_with_env(...)` - `core/box/tiny_metadata_cache_hot_box.h`: add `tiny_policy_hot_get_route_with_env(...)` **Optional extension (if chasing the last alloc-side gate)**: Pass `env` down from `core/box/hak_wrappers.inc.h` entry. Keep the invariant: malloc miss must fall through (do not call `malloc_cold()` directly). **Example (optional)**: ```c // core/box/hak_wrappers.inc.h (malloc wrapper) void* malloc(size_t size) { // Entry-point: read ENV snapshot once const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL; if (fastlane_direct_enabled()) { // NOTE: Keep FastLane safety rule: do not use fast paths before init. if (g_initialized) { void* ptr = malloc_tiny_fast_with_env(size, env); // NEW: pass env if (ptr != NULL) return ptr; // IMPORTANT: malloc miss must fall through to existing wrapper path // (do NOT call malloc_cold() directly; it expects lock_depth to be incremented). } } void* ptr = front_fastlane_try_malloc(size); if (__builtin_expect(ptr != NULL, 1)) return ptr; // Not handled → continue to existing wrapper path below (wrap_shape / lock_depth / init waits / malloc_cold(...)). // (Do not duplicate the full wrapper here; only the env pass-down is new.) /* existing wrapper path */ } // core/box/hak_wrappers.inc.h (free wrapper) void free(void* ptr) { if (__builtin_expect(!ptr, 0)) return; // Entry-point: read ENV snapshot once const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL; if (fastlane_direct_enabled()) { // NOTE: Keep FastLane safety rule: do not use fast paths before init. if (g_initialized) { if (free_tiny_fast_with_env(ptr, env)) return; // NEW: pass env free_cold(ptr); return; } } if (front_fastlane_try_free(ptr)) return; free_cold(ptr, wrapper_env_cfg_fast()); } ``` **Propagate env down**: ```c // core/front/malloc_tiny_fast.h (example) static inline void* malloc_tiny_fast_with_env(size_t size, const HakmemEnvSnapshot* env) { // ... existing logic ... // OLD: if (__builtin_expect(hakmem_env_snapshot_enabled(), 0)) { // NEW: if (env) { if (env) { // Use snapshot from env (NO additional TLS read) if (tiny_c7_ultra_enabled_cached(env)) { // NEW: use cached check // C7 ULTRA path } } // ... rest of logic ... } ``` **Helper functions** (optional, if needed): ```c // core/box/hakmem_env_snapshot_box.h static inline bool tiny_c7_ultra_enabled_cached(const HakmemEnvSnapshot* env) { if (!env) return 0; // Read from snapshot (already in cache, no TLS read) return env->tiny_c7_ultra_enabled; } ``` **Expected**: - TLS reads: 10/op → 2/op (just wrapper entry check) - Instructions: -8.0 to -10.0/op - Throughput: **+3-5%** **ENV Gate** (optional, if conservative rollout needed): ```c // core/box/env_snapshot_consolidation_env_box.h (NEW, if needed) extern _Atomic int g_env_snapshot_consolidation_enabled; static inline bool env_snapshot_consolidation_enabled(void) { return atomic_load_explicit(&g_env_snapshot_consolidation_enabled, memory_order_relaxed); } ``` **Alternative** (no new ENV gate): - Just always pass env down when HAKMEM_FASTLANE_DIRECT=1 - Rely on existing FASTLANE_DIRECT gate for rollback --- ### Phase 19-3c: Propagate to all callees (if 19-3b is GO) **Targets**: - `tiny_legacy_fallback_box.h:28` (ENV snapshot check) - `tiny_metadata_cache_hot_box.h:64` (metadata cache check) - Other helper functions that currently call `hakmem_env_snapshot_enabled()` **Expected cumulative**: **+5-8%** --- ## 2. Box Theory Compliance **Boundary**: - L0 (ENV gate): Optional `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0/1` OR reuse `HAKMEM_FASTLANE_DIRECT=1` - L1 (Hot inline): `const HakmemEnvSnapshot* env` parameter (NULL-safe) - L2 (Fallback): If `env == NULL`, use old path (call `hakmem_env_snapshot_enabled()`) **Rollback**: - Runtime: `HAKMEM_ENV_SNAPSHOT_CONSOLIDATION=0` (or `HAKMEM_FASTLANE_DIRECT=0`) - Compile-time: `#if` guards on env parameter propagation - NULL-safe: `if (!env) { /* old path */ }` **Observability**: - Perf stat: TLS read count reduction - `hakmem_env_snapshot_enabled()` samples should drop from 7% → <1% **Refresh compatibility**: - ✅ Works: Wrapper reads fresh snapshot on each operation - ✅ `bench_profile` putenv() + refresh works (no static cache) - ✅ No version tracking needed --- ## 3. Implementation Steps ### Step 1: Phase 19-3a (UNLIKELY hint removal) **Files to modify**: ```sh # Find all instances grep -n "__builtin_expect(hakmem_env_snapshot_enabled(), 0)" core/front/malloc_tiny_fast.h # Replace with # hakmem_env_snapshot_enabled() ``` **Lines to change**: - `malloc_tiny_fast.h:236` - `malloc_tiny_fast.h:403` - `malloc_tiny_fast.h:624` - `malloc_tiny_fast.h:830` - `malloc_tiny_fast.h:910` **A/B Test**: ```sh # Baseline (before) scripts/run_mixed_10_cleanenv.sh # Optimized (after) scripts/run_mixed_10_cleanenv.sh ``` **GO Criteria**: No regression (±1%) --- ### Step 2: Phase 19-3b (Pass env from wrapper) **2.1 Update wrappers** (`core/box/hak_wrappers.inc.h`) Find malloc wrapper (around line 674): ```c void* malloc(size_t size) { // ADD: Read snapshot once at entry const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL; // ... rest of existing logic ... // Replace malloc_tiny_fast(size) with malloc_tiny_fast_with_env(size, env) } ``` Find free wrapper (around line 188): ```c void free(void* ptr) { // ADD: Read snapshot once at entry const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL; // ... rest of existing logic ... // Replace free_tiny_fast(ptr) with free_tiny_fast_with_env(ptr, env) } ``` **2.2 Add _with_env variants** (`core/front/malloc_tiny_fast.h`) Option A: Rename existing functions (breaking change, requires full propagation) Option B: Add new `_with_env` variants, keep old functions as wrappers (safer) **Recommended: Option B** (incremental migration) ```c // malloc_tiny_fast.h (add new variant) static inline void* malloc_tiny_fast_with_env(size_t size, const HakmemEnvSnapshot* env) { // Replace hakmem_env_snapshot_enabled() checks with (env && ...) // ... existing logic ... } // Keep old function as wrapper (for gradual migration) static inline void* malloc_tiny_fast(size_t size) { const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL; return malloc_tiny_fast_with_env(size, env); } ``` **2.3 Update ENV checks inside _with_env** Replace: ```c if (__builtin_expect(hakmem_env_snapshot_enabled(), 0)) { // C7 ULTRA check } ``` With: ```c if (env) { // Use env snapshot (NO TLS read) if (env->tiny_c7_ultra_enabled) { // C7 ULTRA path } } ``` **2.4 Build & Verify** ```sh make clean && make -j bench_random_mixed_hakmem # Should compile without errors ``` **2.5 A/B Test** ```sh # Baseline (19-3a only) scripts/run_mixed_10_cleanenv.sh # Optimized (19-3b with env passing) scripts/run_mixed_10_cleanenv.sh ``` **GO Criteria**: +3% minimum --- ### Step 3: Phase 19-3c (Propagate to helpers) **If 19-3b is GO**, propagate env to: - `tiny_legacy_fallback_box.h` - `tiny_metadata_cache_hot_box.h` - Other helper functions **Expected cumulative**: +5-8% --- ## 4. Safety Checklist - [ ] NULL-safe: All `if (env)` checks handle NULL correctly - [ ] Refresh works: Wrapper reads fresh snapshot each call - [ ] Rollback: Can disable via ENV or compile flag - [ ] No static cache: No version tracking needed - [ ] Existing code paths preserved: Old functions still work --- ## 5. Expected Performance (Cumulative) | Phase | TLS reads/op | Instructions/op | Throughput | |-------|--------------|-----------------|------------| | Baseline (19-1b) | 10 | 169.45 | 52.06M ops/s | | 19-3a (hint fix) | 10 | ~169 | +0-2% | | 19-3b (env pass) | 2 | ~159-161 | +3-5% | | 19-3c (helpers) | 2 | ~157-159 | +5-8% | **Target after Phase 19-3c**: - Throughput: **54.7-56.2M ops/s** (vs 52.06M baseline) - Instructions/op: **157-159** (vs 169.45 baseline) - Gap to libc (135.92): **+15-17%** (vs +24.6% before 19-3) --- ## 6. Perf Validation **Before Phase 19-3**: ```sh perf stat -e cycles,instructions -- ./bench_random_mixed_hakmem 200000000 400 1 # Instructions: ~169.45/op ``` **After Phase 19-3c**: ```sh perf stat -e cycles,instructions -- ./bench_random_mixed_hakmem 200000000 400 1 # Instructions: ~157-159/op (target: -10.0 reduction) ``` **Perf record validation**: ```sh perf record -g -- ./bench_random_mixed_hakmem 50000000 400 1 perf report --stdio --no-children | grep hakmem_env_snapshot_enabled # Should show <1% samples (down from 7%) ``` --- ## 7. Risk Assessment **Phase 19-3a**: **LOW** - Simple search-replace - No algorithmic changes - Worst case: ±0% (no-op) **Phase 19-3b**: **MEDIUM** - API signature changes (add env parameter) - Mechanical changes (low semantic risk) - Rollback: Keep old functions as wrappers **Phase 19-3c**: **MEDIUM** - Wider propagation - More functions to update - Same rollback strategy --- ## 8. Timeline - **Phase 19-3a**: 1-2 hours (search-replace + A/B) - **Phase 19-3b**: 4-6 hours (wrapper + _with_env variants + A/B) - **Phase 19-3c**: 3-4 hours (helper propagation + A/B) **Total**: 8-12 hours (1-2 days part-time) --- ## 9. Next Steps After This Phase If Phase 19-3 achieves +5-8%: - Current: 52.06M ops/s - After 19-3: ~54.7-56.2M ops/s - Gap to libc (79.72M): ~+42-46% **Remaining candidates** (from Phase 19-1 Design): - Candidate C: Stats removal (+3-5%, already done in BENCH_MINIMAL) - Candidate D: Header inline (+2-3%) - Candidate E: Route fast path (+2-3%) **Or**: Re-profile with `perf record` to find next hot path (self% ≥ 5%)