# Phase 47: FAST Front "PGO mode" A/B Test Results

## Executive Summary

**Decision: NEUTRAL**

- **Mean improvement**: +0.27% (below the +0.5% GO threshold)
- **Median improvement**: +1.02% (positive signal)
- **Verdict**: Within noise range; no actionable performance gain
- **Side effects**: Higher variance in the treatment group (CV 2.32% vs 1.23%)

## Background

### Objective

Apply `HAKMEM_TINY_FRONT_PGO=1` to the FAST build to evaluate whether a compile-time fixed config (which eliminates runtime gate branches) yields a measurable performance improvement.

### Expected Outcome (from instructions)

- Original instruction estimate: **+3~8%**
- Revised expectation (based on Phase 46A lessons): **+0.5~2.0%**
- Rationale: Modern CPUs predict branches well; layout tax is a real risk

### Hypothesis

Converting runtime gate checks (e.g., `unified_cache_enabled()`) to compile-time constants should:

- Eliminate 5-7 branches in the hot path
- Improve I-cache density
- Enable better constant propagation

## Implementation

### Changes Made

1. **Makefile**: Added new target `bench_random_mixed_hakmem_fast_pgo`
   - Build flags: `-DHAKMEM_BENCH_MINIMAL=1 -DHAKMEM_TINY_FRONT_PGO=1`
   - Location: `/mnt/workdisk/public_share/hakmem/Makefile` (lines 662-670)
2. **Config Mechanism**: `core/box/tiny_front_config_box.h`
   - Normal mode: Runtime gate functions (e.g., `unified_cache_enabled()`)
   - PGO mode: Compile-time constants (e.g., `#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1`)

### PGO Fixed Config Values

```c
#define TINY_FRONT_ULTRA_SLIM_ENABLED     0  // Disabled
#define TINY_FRONT_HEAP_V2_ENABLED        0  // Disabled
#define TINY_FRONT_SFC_ENABLED            1  // Enabled
#define TINY_FRONT_FASTCACHE_ENABLED      0  // Disabled
#define TINY_FRONT_TLS_SLL_ENABLED        1  // Enabled
#define TINY_FRONT_UNIFIED_CACHE_ENABLED  1  // Enabled
#define TINY_FRONT_UNIFIED_GATE_ENABLED   1  // Enabled
#define TINY_FRONT_METRICS_ENABLED        0  // Disabled
#define TINY_FRONT_DIAG_ENABLED           0  // Disabled
```

## A/B Test Results

### Methodology

- **Baseline**: `bench_random_mixed_hakmem_minimal` (FAST v3: `BENCH_MINIMAL=1`)
- **Treatment**: `bench_random_mixed_hakmem_fast_pgo` (FAST v3 + PGO: `BENCH_MINIMAL=1 + TINY_FRONT_PGO=1`)
- **Iterations**: 10 runs per variant
- **Workload**: 20M ops, WS=400, random mixed allocation pattern

### Raw Data

All values are throughput in ops/s, listed in run order.

#### Baseline (FAST - BENCH_MINIMAL only)

```
60378212, 60412333, 60126097, 60557230, 59593446,
59503095, 59686129, 58695907, 58750183, 58687807
```

#### Treatment (FAST+PGO - BENCH_MINIMAL + TINY_FRONT_PGO)

```
61083082, 60515989, 60785621, 61251824, 61135770,
57473378, 58233393, 59070853, 58446760, 59977402
```

### Statistical Summary

| Metric             | Baseline (ops/s)     | Treatment (ops/s)      | Delta      |
|--------------------|----------------------|------------------------|------------|
| **Mean**           | 59,639,044           | 59,797,407             | **+0.27%** |
| **Median**         | 59,639,788           | 60,246,696             | **+1.02%** |
| **Stdev (sample)** | 732,715 (CV 1.23%)   | 1,385,809 (CV 2.32%)   | CV +89%    |
| **Min**            | 58,687,807           | 57,473,378             | -2.1%      |
| **Max**            | 60,557,230           | 61,251,824             | +1.1%      |

### Decision Criteria

| Verdict | Mean delta    | Action   | Result |
|---------|---------------|----------|--------|
| GO      | ≥ +0.5%       | Accept   | ❌     |
| NEUTRAL | within ±0.5%  | Research | ✅     |
| NO-GO   | ≤ -0.5%       | Revert   | ❌     |

**Actual**: Mean +0.27% → **NEUTRAL**

## Analysis

### Observations

1. **Mean vs Median divergence**:
   - Mean: +0.27% (borderline noise)
   - Median: +1.02% (positive signal, above threshold)
   - Interpretation: The median suggests a possible small gain, but the mean is sensitive to the low outliers in the treatment group
2. **Variance increase**:
   - Baseline CV: 1.23%
   - Treatment CV: 2.32% (+89% relative increase)
   - Possible causes:
     - Layout tax (code rearrangement affecting I-cache/alignment)
     - Workload interaction with fixed config
     - Run-to-run noise amplification
3. **Outlier in treatment**:
   - Run 6: 57.47M ops/s (lowest across both groups)
   - Suggests potential instability or a cache thrashing event

### Why NEUTRAL (not GO)?

1. **Mean below threshold**: +0.27% < +0.5% decision boundary
2. **High variance**: Treatment CV is about 1.9× the baseline (2.32% vs 1.23%), which indicates measurement uncertainty
3. **Phase 46A lesson**: Small positive signals can mask layout tax, so a conservative threshold is required
4. **Reproducibility concern**: The wide spread in the treatment group reduces confidence

### Why not NO-GO?

- Median improvement (+1.02%) is positive and above threshold
- No systematic regression pattern (just higher variance)
- A genuine small gain may be obscured by the variance

## Health Check

**Status**: ✅ PASS

- Command: `make perf_observe` (1 run)
- Outcome: No crashes, assertions, or integrity failures
- Throughput (OBSERVE build): 48.27M ops/s (about 19% below the FAST baseline mean, in line with the expected ~20% slowdown)
- Health profiles: Both C6_HEAVY and C7_SAFE passed

## Comparison with Phase 46A

| Aspect                 | Phase 46A (`always_inline`) | Phase 47 (PGO mode)         |
|------------------------|-----------------------------|-----------------------------|
| **Hypothesis**         | Inline hot function         | Compile-time gates          |
| **Expected gain**      | +1~2%                       | +0.5~2.0%                   |
| **Actual mean**        | -0.68% (NO-GO)              | +0.27% (NEUTRAL)            |
| **Actual median**      | +0.17%                      | +1.02%                      |
| **Variance**           | Similar to baseline         | ~1.9× baseline CV           |
| **Binary size change** | None (inline ≈ non-inline)  | Unknown (not measured)      |
| **Lesson**             | Layout tax real risk        | Variance amplification risk |

### Key Insight

In both phases the median is positive while the mean falls short of the GO threshold (Phase 46A: median +0.17%, mean -0.68%; Phase 47: median +1.02%, mean +0.27%). This pattern suggests:

- A genuine micro-optimization may be present (median)
- Layout tax or variance offsets it in the mean
- The conservative threshold (±0.5% on the mean) is justified

## Recommendations

### 1. Keep as Research Box (Current Status)

- **Action**: Leave the `bench_random_mixed_hakmem_fast_pgo` target in the Makefile for future experiments
- **Rationale**: Median +1.02% suggests potential; it may combine well with other optimizations
- **Do NOT**: Make it the default or promote it to the FAST standard build

### 2. Future Investigation (Optional)

If pursuing further:

1. **Increase sample size**: 20-30 runs per variant to tighten the estimate of the mean
2. **Profile-guided analysis**: Check whether the variance correlates with:
   - Cache miss patterns (`perf stat -e cache-misses`)
   - Branch misprediction (`perf stat -e branch-misses`)
   - TLB misses (`perf stat -e dTLB-load-misses`)
3. **Binary size/layout analysis**:
   ```bash
   size bench_random_mixed_hakmem_minimal bench_random_mixed_hakmem_fast_pgo
   objdump -d ... | analyze_layout.py
   ```
4. **Workload sensitivity**:
   - Test on different allocation patterns (C6-heavy, C7-safe, etc.)
   - Check whether the variance is workload-specific

### 3. DO NOT Promote (Current Verdict)

- **Reason**: Mean +0.27% is within the ±0.5% noise threshold
- **Risk**: High variance (2.32% CV) suggests instability
- **Box Theory**: The FAST build should be a stable baseline, not an experimental one

## Lessons Learned

1. **Branch prediction is effective**: Eliminating 5-7 branches yielded only a +0.27% mean gain
2. **Layout tax is a real risk**: The variance increase (~1.9× CV) is consistent with code rearrangement side effects, although binary layout was not measured in this phase
3. **Conservative thresholds justified**: The ±0.5% mean threshold filters out noise
4. **Median-positive ≠ actionable**: Both mean and median must be above threshold for a GO decision

## Files Modified

1. **Makefile**: Added `bench_random_mixed_hakmem_fast_pgo` target (lines 662-670)
   - Build flags: `EXTRA_CFLAGS='-DHAKMEM_BENCH_MINIMAL=1 -DHAKMEM_TINY_FRONT_PGO=1'`
2. **No code changes**: PGO mode uses the existing `tiny_front_config_box.h` infrastructure

## Next Steps

### If NEUTRAL (Current)

- Document in scorecard as "NEUTRAL - research box retained"
- Monitor future phases for synergy opportunities

### If Future GO Signal Emerges

1. Run extended validation (30+ runs)
2. Profile binary layout changes
3. Test across multiple workloads
4. Update scorecard and promote to FAST standard

## Appendix: Test Commands

### Baseline (FAST)

```bash
make bench_random_mixed_hakmem_minimal
BENCH_BIN=./bench_random_mixed_hakmem_minimal scripts/run_mixed_10_cleanenv.sh
```

### Treatment (FAST+PGO)

```bash
make bench_random_mixed_hakmem_fast_pgo
BENCH_BIN=./bench_random_mixed_hakmem_fast_pgo scripts/run_mixed_10_cleanenv.sh
```

### Health Check

```bash
make perf_observe
```

## References

- **Phase 47 Instructions**: `docs/analysis/PHASE47_FAST_FRONT_PGO_MODE_INSTRUCTIONS.md`
- **Phase 46A Results**: `docs/analysis/PHASE46A_TINY_REGION_ID_WRITE_HEADER_ALWAYS_INLINE_RESULTS.md`
- **Box Theory**: `docs/analysis/PHASE2_STRUCTURAL_CHANGES_NEXT_INSTRUCTIONS.md`
- **Config Box**: `core/box/tiny_front_config_box.h`
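
## Appendix: Recomputing the Statistical Summary

The figures in the Statistical Summary and the verdict in Decision Criteria can be re-derived from the raw data with a few lines of Python. The sketch below is illustrative and not part of the hakmem repository: the helper names (`pct_delta`, `cv_pct`, `classify`, `summarize`) are hypothetical, and it assumes the reported stdev is the sample standard deviation (n-1 denominator) and that the ±0.5% bands apply to the mean delta only.

```python
from statistics import mean, median, stdev

# Throughput samples in ops/s, copied from the Raw Data section.
BASELINE = [60378212, 60412333, 60126097, 60557230, 59593446,
            59503095, 59686129, 58695907, 58750183, 58687807]
TREATMENT = [61083082, 60515989, 60785621, 61251824, 61135770,
             57473378, 58233393, 59070853, 58446760, 59977402]

GO_THRESHOLD_PCT = 0.5  # decision boundary on the mean delta, in percent


def pct_delta(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


def cv_pct(samples):
    """Coefficient of variation (sample stdev / mean), in percent."""
    return stdev(samples) / mean(samples) * 100.0


def classify(mean_delta_pct, threshold=GO_THRESHOLD_PCT):
    """Map a mean delta onto the GO / NEUTRAL / NO-GO bands."""
    if mean_delta_pct >= threshold:
        return "GO"
    if mean_delta_pct <= -threshold:
        return "NO-GO"
    return "NEUTRAL"


def summarize(baseline, treatment):
    """Compute the deltas, CVs, and verdict for one A/B comparison."""
    mean_delta = pct_delta(mean(treatment), mean(baseline))
    return {
        "mean_delta_pct": mean_delta,
        "median_delta_pct": pct_delta(median(treatment), median(baseline)),
        "baseline_cv_pct": cv_pct(baseline),
        "treatment_cv_pct": cv_pct(treatment),
        "verdict": classify(mean_delta),
    }


summary = summarize(BASELINE, TREATMENT)
for key, value in summary.items():
    # Numeric fields are printed with two decimals; the verdict is a string.
    print(key, f"{value:+.2f}" if isinstance(value, float) else value)
```

With the raw data above, this reproduces a mean delta of about +0.27%, a median delta of about +1.02%, CVs of about 1.23% and 2.32%, and a NEUTRAL verdict. Rerunning it on a larger sample (the 20-30 runs suggested under Future Investigation) only requires replacing the two lists.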