# Phase 19-2: FASTLANE_DIRECT Promotion + Rebaseline (Next Instructions)

## 0. Status (where we are)

- Phase 19-1b (FASTLANE_DIRECT) is **GO**: throughput **+5.88%** with **-15.23% instr/op** and **-19.36% branches/op**.
- Safety hardening completed:
  - `!g_initialized` → direct path is skipped (fail-fast, same rule as Front FastLane).
  - malloc miss no longer calls `malloc_cold()` directly; it falls through to the normal wrapper path (preserves `g_hakmem_lock_depth` invariants).
  - ENV cache is a single global `_Atomic` so `bench_profile` refresh affects wrappers.

## 1. Promotion policy (Box Theory)

- Keep rollback simple:
  - `HAKMEM_FASTLANE_DIRECT=0` → disable (falls back to the Phase 6 FastLane wrapper path).
  - `HAKMEM_FASTLANE_DIRECT=1` → enable (direct `malloc_tiny_fast()` / `free_tiny_fast()` first).
- Promotion level:
  - **Preset promotion** (recommended): set `HAKMEM_FASTLANE_DIRECT=1` in the `MIXED_TINYV3_C7_SAFE` and `C6_HEAVY_LEGACY_POOLV1` presets.
  - Keep **ENV default = 0** (opt-in) until real-world/LD_PRELOAD validation is done.

## 2. Required verification (same-binary A/B)

### 2.1 Mixed (10-run, clean env)

Baseline:
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_FASTLANE_DIRECT=0 scripts/run_mixed_10_cleanenv.sh
```

Optimized:
```sh
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_FASTLANE_DIRECT=1 scripts/run_mixed_10_cleanenv.sh
```

GO/NO-GO (Optimized mean relative to Baseline mean):
- GO: mean **+1.0%** or higher
- NEUTRAL: within **±1.0%** → keep as preset-only (do not flip global default)
- NO-GO: **≤ -1.0%** → revert preset promotion
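
The thresholds above can be applied mechanically once both means are known. The following is a minimal sketch, not part of the hakmem scripts: `mean` and `verdict` are hypothetical helper names. It assumes that higher throughput is better, that you extract one throughput number per run yourself (the output format of `run_mixed_10_cleanenv.sh` is not assumed here), and that NEUTRAL means strictly between -1.0% and +1.0%.

```sh
# Hypothetical helpers; neither function exists in the hakmem tree.

# mean: average of the numbers read from stdin, one per line.
mean() {
  awk '{ sum += $1; n++ } END { if (n == 0) exit 1; printf "%.6f\n", sum / n }'
}

# verdict BASE OPT: compare two mean throughputs and print the
# GO / NEUTRAL / NO-GO label followed by the relative change in percent.
verdict() {
  awk -v base="$1" -v opt="$2" 'BEGIN {
    if (base <= 0) { print "ERROR"; exit 1 }
    delta = (opt - base) / base * 100
    if (delta >= 1.0)       v = "GO"
    else if (delta <= -1.0) v = "NO-GO"
    else                    v = "NEUTRAL"
    printf "%s %.2f%%\n", v, delta
  }'
}
```

For example, `verdict 100 105.88` prints `GO 5.88%`, and `verdict 100 100.5` prints `NEUTRAL 0.50%`.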

### 2.2 C6-heavy (5-run)

```sh
# Baseline (direct path disabled)
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 HAKMEM_FASTLANE_DIRECT=0 ./bench_mid_large_mt_hakmem 1 1000000 400 1
# Optimized (direct path enabled)
HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 HAKMEM_FASTLANE_DIRECT=1 ./bench_mid_large_mt_hakmem 1 1000000 400 1
```
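
The heading asks for five runs per setting, while each command above is a single invocation. A minimal sketch of a repeat helper follows; `run_n` is a hypothetical name, not an existing hakmem script, and it only repeats the command (it does not parse or average the benchmark output).

```sh
# Hypothetical helper: run_n N CMD [ARGS...]
# Runs CMD N times in sequence and stops at the first failing run.
run_n() {
  _n=$1
  shift
  _i=1
  while [ "$_i" -le "$_n" ]; do
    "$@" || return 1
    _i=$((_i + 1))
  done
}

# Usage (env passes the variables to the benchmark rather than to the shell function):
#   run_n 5 env HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 HAKMEM_FASTLANE_DIRECT=0 \
#     ./bench_mid_large_mt_hakmem 1 1000000 400 1
#   run_n 5 env HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 HAKMEM_FASTLANE_DIRECT=1 \
#     ./bench_mid_large_mt_hakmem 1 1000000 400 1
```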

## 3. Perf stat capture (root-cause guardrails)

Run both A/B legs (`HAKMEM_FASTLANE_DIRECT=0` and `=1`) with:
```sh
perf stat -e cycles,instructions,branches,branch-misses,L1-icache-load-misses,iTLB-load-misses,dTLB-load-misses -- \
  ./bench_random_mixed_hakmem 200000000 400 1
```

Checklist:
- `instructions/op` and `branches/op` must improve, i.e. decrease (expected)
- iTLB/dTLB misses may worsen; accept only if throughput still improves
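
The per-op figures in the checklist come from dividing each raw counter by the number of operations. The sketch below shows one way to do that from `perf stat` CSV output (`-x,`). `per_op` is a hypothetical helper, and the operation count is supplied by the caller; treating the benchmark's first argument as the operation count is an assumption, not something this document states.

```sh
# Hypothetical helper: per_op CSV_FILE OPS
# Reads a file written by `perf stat -x, -o CSV_FILE ...` and prints
# "<event> <count per op>" for each counter. In that CSV layout field 1 is the
# count and field 3 is the event name; rows whose count is not a plain integer
# (comment lines, "<not supported>", "<not counted>") are skipped.
per_op() {
  awk -F, -v ops="$2" '
    $1 ~ /^[0-9]+$/ && ops > 0 { printf "%s %.4f\n", $3, $1 / ops }
  ' "$1"
}

# Usage, once per A/B leg:
#   HAKMEM_FASTLANE_DIRECT=0 perf stat -x, -o direct0.csv -e instructions,branches -- \
#     ./bench_random_mixed_hakmem 200000000 400 1
#   per_op direct0.csv 200000000
```

Comparing the two legs' `instructions` and `branches` rows then gives the instr/op and branches/op deltas directly.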

## 4. Next target selection (after promotion)

After Phase 19-2 is stable, re-run `perf record` on Mixed and choose the next box by **self% ≥ 5%**:
- If `unified_cache_push/pop` rises: focus on **UnifiedCache data-path** (touch fewer cache lines).
- If `tiny_header_finalize_alloc` rises: focus on **header finalize path** (but treat as high NO-GO risk; prior header work was often NEUTRAL).
- If ENV checks reappear in hot path: consider **Phase 19-3 (ENV check consolidation)**, but keep it in a separate research box.
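
The self% ≥ 5% cut can be taken from a text report rather than read off by eye. The sketch below is a hypothetical filter (`hot_symbols` is not an existing script). It assumes the row layout of `perf report --stdio --no-children`, where the first column is the self overhead percentage and the last column is the symbol, and it assumes C symbol names without spaces.

```sh
# Hypothetical helper: hot_symbols [MIN_PERCENT]
# Reads `perf report --stdio --no-children` text on stdin and prints
# "<self%> <symbol>" for every row at or above MIN_PERCENT (default 5).
hot_symbols() {
  awk -v min="${1:-5}" '
    /^#/ { next }                      # skip report header comments
    $1 ~ /^[0-9.]+%$/ {
      pct = $1
      sub(/%/, "", pct)                # strip the trailing percent sign
      if (pct + 0 >= min + 0) print pct, $NF
    }
  '
}

# Usage:
#   perf report --stdio --no-children | hot_symbols 5
```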