# Phase 19-2: FASTLANE_DIRECT Promotion + Rebaseline (Next Instructions)

## 0. Status (where we are)

- Phase 19-1b (FASTLANE_DIRECT) is **GO**: throughput **+5.88%** with **-15.23% instr/op** and **-19.36% branches/op**.
- Safety hardening completed:
  - `!g_initialized` → the direct path is skipped (fail-fast, same rule as Front FastLane).
  - A malloc miss no longer calls `malloc_cold()` directly. It falls through to the normal wrapper path, which preserves the `g_hakmem_lock_depth` invariants.
  - The ENV cache is a single global `_Atomic`, so a `bench_profile` refresh is visible to the wrappers.

## 1. Promotion policy (Box Theory)

- Keep rollback simple:
  - `HAKMEM_FASTLANE_DIRECT=0` → disabled (falls back to the Phase 6 FastLane wrapper path).
  - `HAKMEM_FASTLANE_DIRECT=1` → enabled (tries direct `malloc_tiny_fast()` / `free_tiny_fast()` first).
- Promotion level:
  - **Preset promotion** (recommended): set `HAKMEM_FASTLANE_DIRECT=1` in the `MIXED_TINYV3_C7_SAFE` and `C6_HEAVY_LEGACY_POOLV1` presets.
  - Keep the **ENV default = 0** (opt-in) until real-world/LD_PRELOAD validation is done.

## 2. Required verification (same-binary A/B)

### 2.1 Mixed (10-run, clean env)

Baseline:

```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_FASTLANE_DIRECT=0 scripts/run_mixed_10_cleanenv.sh
```

Optimized:

```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_FASTLANE_DIRECT=1 scripts/run_mixed_10_cleanenv.sh
```

GO/NO-GO (mean throughput, optimized vs. baseline):

- GO: **+1.0%** or higher
- NEUTRAL: within **±1.0%** → keep as preset-only (do not flip the global default)
- NO-GO: **≤ -1.0%** → revert the preset promotion

### 2.2 C6-heavy (5-run)

Run each command 5 times:

```sh
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 HAKMEM_FASTLANE_DIRECT=0 ./bench_mid_large_mt_hakmem 1 1000000 400 1
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 HAKMEM_FASTLANE_DIRECT=1 ./bench_mid_large_mt_hakmem 1 1000000 400 1
```

## 3. Perf stat capture (root-cause guardrails)

Run both A/B arms (`HAKMEM_FASTLANE_DIRECT=0` and `=1`) with:

```sh
perf stat -e cycles,instructions,branches,branch-misses,L1-icache-load-misses,iTLB-load-misses,dTLB-load-misses -- \
  ./bench_random_mixed_hakmem 200000000 400 1
```

Checklist:

- `instructions/op` and `branches/op` must improve (expected, given the Phase 19-1b results).
- iTLB/dTLB misses may worsen. Accept this only if throughput still improves.

## 4. Next target selection (after promotion)

Once Phase 19-2 is stable, re-run `perf record` on Mixed and pick the next box from functions with **self% ≥ 5%**:

- If `unified_cache_push/pop` rises: focus on the **UnifiedCache data path** (touch fewer cache lines).
- If `tiny_header_finalize_alloc` rises: focus on the **header finalize path**. Treat it as high NO-GO risk, because prior header work was often NEUTRAL.
- If ENV checks reappear in the hot path: consider **Phase 19-3 (ENV check consolidation)**, but keep it in a separate research box.