# Phase 47 — FAST Front "PGO mode" A/B Test Results

## Executive Summary

**Decision: NEUTRAL**

- **Mean improvement**: +0.27% (below +0.5% threshold)
- **Median improvement**: +1.02% (positive signal)
- **Verdict**: Within noise range; no actionable performance gain
- **Side effects**: Higher variance in treatment group (2.32% vs 1.23% CV)

## Background

### Objective

Apply `HAKMEM_TINY_FRONT_PGO=1` to the FAST build to evaluate whether a compile-time fixed config (which eliminates runtime gate branches) yields a measurable performance improvement.

### Expected Outcome (from instructions)

- Original instruction estimate: **+3~8%**
- Revised expectation (based on Phase 46A lessons): **+0.5~2.0%**
  - Rationale: Modern CPUs predict branches well; layout tax is a real risk

### Hypothesis

By converting runtime gate checks (e.g., `unified_cache_enabled()`) to compile-time constants:
- Eliminate 5-7 branches in hot path
- Improve I-cache density
- Enable better constant propagation

## Implementation

### Changes Made

1. **Makefile**: Added new target `bench_random_mixed_hakmem_fast_pgo`
   - Build flags: `-DHAKMEM_BENCH_MINIMAL=1 -DHAKMEM_TINY_FRONT_PGO=1`
   - Location: `/mnt/workdisk/public_share/hakmem/Makefile` (lines 662-670)

2. **Config Mechanism**: `core/box/tiny_front_config_box.h`
   - Normal mode: Runtime gate functions (e.g., `unified_cache_enabled()`)
   - PGO mode: Compile-time constants (e.g., `#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1`)
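
The two modes can be illustrated with a minimal sketch. It is not the actual contents of `tiny_front_config_box.h`: the flag `g_unified_cache_flag`, the type `demo_path`, and the function `demo_pick_path` are hypothetical, and mapping the macro to the runtime gate in normal mode is an assumption about how the header is organized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime gate; the real
 * unified_cache_enabled() is defined elsewhere in hakmem. */
static bool g_unified_cache_flag = true;
static inline bool unified_cache_enabled(void) { return g_unified_cache_flag; }

#if defined(HAKMEM_TINY_FRONT_PGO) && HAKMEM_TINY_FRONT_PGO
/* PGO mode: the gate is a literal, so the compiler can fold the branch. */
#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1
#else
/* Normal mode (assumed mapping): the macro forwards to the runtime gate. */
#define TINY_FRONT_UNIFIED_CACHE_ENABLED unified_cache_enabled()
#endif

typedef enum { PATH_UNIFIED_CACHE, PATH_FALLBACK } demo_path;

/* Hypothetical hot-path selector. In PGO mode the condition is the
 * constant 1, so the fallback return becomes dead code. */
static inline demo_path demo_pick_path(size_t size) {
    assert(size > 0);
    if (TINY_FRONT_UNIFIED_CACHE_ENABLED) return PATH_UNIFIED_CACHE;
    return PATH_FALLBACK;
}
```

Built without `-DHAKMEM_TINY_FRONT_PGO=1`, `demo_pick_path` follows the runtime flag; built with it, the call site no longer contains a gate check at all.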

### PGO Fixed Config Values

```c
#define TINY_FRONT_ULTRA_SLIM_ENABLED    0   // Disabled
#define TINY_FRONT_HEAP_V2_ENABLED       0   // Disabled
#define TINY_FRONT_SFC_ENABLED           1   // Enabled
#define TINY_FRONT_FASTCACHE_ENABLED     0   // Disabled
#define TINY_FRONT_TLS_SLL_ENABLED       1   // Enabled
#define TINY_FRONT_UNIFIED_CACHE_ENABLED 1   // Enabled
#define TINY_FRONT_UNIFIED_GATE_ENABLED  1   // Enabled
#define TINY_FRONT_METRICS_ENABLED       0   // Disabled
#define TINY_FRONT_DIAG_ENABLED          0   // Disabled
```

## A/B Test Results

### Methodology

- **Baseline**: `bench_random_mixed_hakmem_minimal` (FAST v3: `BENCH_MINIMAL=1`)
- **Treatment**: `bench_random_mixed_hakmem_fast_pgo` (FAST v3 + PGO: `BENCH_MINIMAL=1 + TINY_FRONT_PGO=1`)
- **Iterations**: 10 runs per variant
- **Workload**: 20M ops, WS=400, random mixed allocation pattern

### Raw Data

#### Baseline (FAST - BENCH_MINIMAL only)
```
60378212, 60412333, 60126097, 60557230, 59593446,
59503095, 59686129, 58695907, 58750183, 58687807
```

#### Treatment (FAST+PGO - BENCH_MINIMAL + TINY_FRONT_PGO)
```
61083082, 60515989, 60785621, 61251824, 61135770,
57473378, 58233393, 59070853, 58446760, 59977402
```

### Statistical Summary

| Metric          | Baseline (ops/s) | Treatment (ops/s) | Delta      |
|-----------------|------------------|-------------------|------------|
| **Mean**        | 59,639,044       | 59,797,407        | **+0.27%** |
| **Median**      | 59,639,788       | 60,246,696        | **+1.02%** |
| **Stdev (CV)**  | 732,715 (1.23%)  | 1,385,809 (2.32%) | +89% CV    |
| **Min**         | 58,687,807       | 57,473,378        | -2.1%      |
| **Max**         | 60,557,230       | 61,251,824        | +1.1%      |
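
The summary values can be recomputed from the raw data with a short sketch like the one below. It assumes the sample standard deviation (n - 1 divisor), which matches the figures in the table. All function and type names are illustrative, and a small Newton iteration stands in for `sqrt()` so the sketch links without `-lm`. The Welch t statistic is an addition for context and was not part of the original test procedure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RUNS 64

typedef struct { double mean, median, stdev, cv_pct; } run_stats;

/* Raw ops/s values copied from the Raw Data section, in listed order. */
static const double BASELINE[10] = {
    60378212, 60412333, 60126097, 60557230, 59593446,
    59503095, 59686129, 58695907, 58750183, 58687807};
static const double TREATMENT[10] = {
    61083082, 60515989, 60785621, 61251824, 61135770,
    57473378, 58233393, 59070853, 58446760, 59977402};

/* Newton's method square root; sqrt() from <math.h> is the usual choice. */
static double sqrt_newton(double v) {
    if (v <= 0.0) return 0.0;
    double x = v;
    for (int i = 0; i < 64; i++) x = 0.5 * (x + v / x);
    return x;
}

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Mean, median, sample standard deviation (n - 1), and CV in percent. */
static run_stats summarize(const double *runs, size_t n) {
    assert(n >= 2 && n <= MAX_RUNS);
    double sorted[MAX_RUNS];
    memcpy(sorted, runs, n * sizeof *runs);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    run_stats s;
    double sum = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++) sum += runs[i];
    s.mean = sum / (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = runs[i] - s.mean;
        ss += d * d;
    }
    s.median = (n % 2) ? sorted[n / 2]
                       : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    s.stdev = sqrt_newton(ss / (double)(n - 1));
    s.cv_pct = 100.0 * s.stdev / s.mean;
    return s;
}

/* Relative change of treatment over baseline, in percent. */
static double delta_pct(double base, double treat) {
    return 100.0 * (treat - base) / base;
}

/* Welch's t statistic for the difference in means (unequal variances). */
static double welch_t(run_stats a, size_t na, run_stats b, size_t nb) {
    double se2 = a.stdev * a.stdev / (double)na
               + b.stdev * b.stdev / (double)nb;
    return (b.mean - a.mean) / sqrt_newton(se2);
}
```

Calling `summarize(BASELINE, 10)` and `summarize(TREATMENT, 10)` reproduces the table, and `delta_pct` on the means and medians gives +0.27% and +1.02%. `welch_t` on the two summaries comes out at roughly 0.32, well below the value of about 2 usually needed to call a difference significant at this sample size, which is consistent with the NEUTRAL verdict.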

### Decision Criteria

| Verdict | Mean delta   | Action   | Met? |
|---------|--------------|----------|------|
| GO      | ≥ +0.5%      | Accept   | ❌   |
| NEUTRAL | within ±0.5% | Research | ✅   |
| NO-GO   | ≤ -0.5%      | Revert   | ❌   |

**Actual**: Mean +0.27% → **NEUTRAL**
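
The decision rule can be written as a small classifier. This is a sketch of the criteria in the table, not code from the repository; the names `verdict`, `classify_mean_delta`, and `DECISION_THRESHOLD_PCT` are hypothetical, and deltas exactly at ±0.5% are assigned to GO and NO-GO as the table's inequalities indicate.

```c
#include <assert.h>

typedef enum {
    VERDICT_NO_GO = -1,
    VERDICT_NEUTRAL = 0,
    VERDICT_GO = 1
} verdict;

/* Threshold on the mean delta, in percent (from the criteria table). */
#define DECISION_THRESHOLD_PCT 0.5

/* Map a mean throughput delta (percent, treatment vs baseline) to a verdict. */
static verdict classify_mean_delta(double mean_delta_pct) {
    /* NaN is the only value that compares unequal to itself. */
    assert(mean_delta_pct == mean_delta_pct);
    if (mean_delta_pct >= DECISION_THRESHOLD_PCT) return VERDICT_GO;
    if (mean_delta_pct <= -DECISION_THRESHOLD_PCT) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}

/* Human-readable label for reports and scorecards. */
static const char *verdict_name(verdict v) {
    switch (v) {
    case VERDICT_GO:    return "GO";
    case VERDICT_NO_GO: return "NO-GO";
    default:            return "NEUTRAL";
    }
}
```

With this rule, `classify_mean_delta(0.27)` yields NEUTRAL for Phase 47, and the Phase 46A mean of -0.68% yields NO-GO.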

## Analysis

### Observations

1. **Mean vs Median divergence**:
   - Mean: +0.27% (borderline noise)
   - Median: +1.02% (positive signal, above threshold)
   - Interpretation: The median suggests a possible small gain, but the mean is pulled down by the low runs in the treatment group

2. **Variance increase**:
   - Baseline CV: 1.23%
   - Treatment CV: 2.32% (+89% relative increase)
   - Possible causes:
     - Layout tax (code rearrangement affecting I-cache/alignment)
     - Workload interaction with fixed config
     - Run-to-run noise amplification

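3. **Outlier in treatment**:
   - Run 6: 57.47M ops/s (lowest across both groups)
   - Runs 6-10 (57.47M-59.98M) all fall below runs 1-5 (60.52M-61.25M), so this may be drift during the session rather than a single event; the cause (instability, cache thrashing, or system noise) was not investigated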
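
One quick way to check whether the variance comes from drift across the session rather than isolated outliers is to compare the first and second halves of each series. The sketch below assumes the raw values are listed in run order (consistent with "Run 6" being the sixth treatment value); the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Raw ops/s values copied from the Raw Data section, assumed in run order. */
static const double BASELINE[10] = {
    60378212, 60412333, 60126097, 60557230, 59593446,
    59503095, 59686129, 58695907, 58750183, 58687807};
static const double TREATMENT[10] = {
    61083082, 60515989, 60785621, 61251824, 61135770,
    57473378, 58233393, 59070853, 58446760, 59977402};

/* Mean of runs[begin, end). */
static double mean_range(const double *runs, size_t begin, size_t end) {
    assert(end > begin);
    double sum = 0.0;
    for (size_t i = begin; i < end; i++) sum += runs[i];
    return sum / (double)(end - begin);
}

/* Percent change from the first-half mean to the second-half mean. */
static double half_drift_pct(const double *runs, size_t n) {
    assert(n >= 2);
    double first = mean_range(runs, 0, n / 2);
    double second = mean_range(runs, n / 2, n);
    return 100.0 * (second - first) / first;
}
```

Under that run-order assumption, `half_drift_pct` gives about -1.9% for the baseline and about -3.8% for the treatment. Both within-series shifts are larger than the +0.27% mean delta between the groups, which supports treating the result as noise-limited.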

### Why NEUTRAL (not GO)?

1. **Mean below threshold**: +0.27% < +0.5% decision boundary
2. **High variance**: Treatment CV is about 1.9× baseline (2.32% vs 1.23%), so a +0.27% mean delta cannot be distinguished from run-to-run noise
3. **Phase 46A lesson**: Small positive signals can mask layout tax; require conservative threshold
4. **Reproducibility concern**: Wide spread in treatment group reduces confidence

### Why not NO-GO?

- Median improvement (+1.02%) is positive and above threshold
- No systematic regression pattern (just higher variance)
- Possibility of genuine small gain obscured by variance

## Health Check

**Status**: ✅ PASS

- Command: `make perf_observe` (1 run)
- Outcome: No crashes, assertions, or integrity failures
- Throughput (OBSERVE build): 48.27M ops/s (expected ~20% slower than FAST)
- Health profiles: Both C6_HEAVY and C7_SAFE passed

## Comparison with Phase 46A

| Aspect                  | Phase 46A (`always_inline`) | Phase 47 (PGO mode) |
|-------------------------|------------------------------|---------------------|
| **Hypothesis**          | Inline hot function          | Compile-time gates  |
| **Expected gain**       | +1~2%                        | +0.5~2.0%           |
| **Actual mean**         | -0.68% (NO-GO)               | +0.27% (NEUTRAL)    |
| **Actual median**       | +0.17%                       | +1.02%              |
| **Variance**            | Similar to baseline          | 2× baseline         |
| **Binary size change**  | None (inline ≈ non-inline)   | Unknown (not measured) |
| **Lesson**              | Layout tax real risk         | Variance amplification risk |

### Key Insight

Both phases show a **positive median alongside a weaker mean** (Phase 46A: -0.68% mean, +0.17% median; Phase 47: +0.27% mean, +1.02% median). This pattern suggests:
- A genuine micro-optimization may be present (median)
- But layout tax or variance offsets it in the mean
- Conservative threshold (±0.5% mean) is justified

## Recommendations

### 1. Keep as Research Box (Current Status)

- **Action**: Leave `bench_random_mixed_hakmem_fast_pgo` target in Makefile for future experiments
- **Rationale**: Median +1.02% suggests potential; may combine well with other optimizations
- **Do NOT**: Make default or promote to FAST standard build

### 2. Future Investigation (Optional)

If pursuing further:

1. **Increase sample size**: 20-30 runs to reduce variance noise
2. **Hardware-counter profiling**: Check whether variance correlates with:
   - Cache miss patterns (`perf stat -e cache-misses`)
   - Branch misprediction (`perf stat -e branch-misses`)
   - TLB misses (`perf stat -e dTLB-load-misses`)

3. **Binary size/layout analysis**:
   ```bash
   size bench_random_mixed_hakmem_minimal bench_random_mixed_hakmem_fast_pgo
   objdump -d ... | analyze_layout.py
   ```

4. **Workload sensitivity**:
   - Test on different allocation patterns (C6-heavy, C7-safe, etc.)
   - Check if variance is workload-specific

### 3. DO NOT Promote (Current Verdict)

- **Reason**: Mean +0.27% within ±0.5% noise threshold
- **Risk**: High variance (2.32% CV) suggests instability
- **Box Theory**: The FAST build should be a stable baseline, not an experimental one

## Lessons Learned

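1. **Branch prediction is effective**: Even eliminating 5-7 branches yielded only +0.27% mean (+1.02% median)
2. **Layout tax is a plausible risk**: The variance increase (CV 1.23% → 2.32%) is consistent with code rearrangement side effects, though binary layout was not measured in this phase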
3. **Conservative thresholds justified**: ±0.5% mean threshold filters out noise
4. **Median-positive ≠ actionable**: Need both mean and median above threshold for GO decision

## Files Modified

1. **Makefile**: Added `bench_random_mixed_hakmem_fast_pgo` target (lines 662-670)
   - Build flags: `EXTRA_CFLAGS='-DHAKMEM_BENCH_MINIMAL=1 -DHAKMEM_TINY_FRONT_PGO=1'`

2. **No code changes**: PGO mode uses existing `tiny_front_config_box.h` infrastructure

## Next Steps

### If NEUTRAL (Current)

- Document in scorecard as "NEUTRAL - research box retained"
- Monitor future phases for synergy opportunities

### If Future GO Signal Emerges

1. Run extended validation (30+ runs)
2. Profile binary layout changes
3. Test across multiple workloads
4. Update scorecard and promote to FAST standard

## Appendix: Test Commands

### Baseline (FAST)
```bash
make bench_random_mixed_hakmem_minimal
BENCH_BIN=./bench_random_mixed_hakmem_minimal scripts/run_mixed_10_cleanenv.sh
```

### Treatment (FAST+PGO)
```bash
make bench_random_mixed_hakmem_fast_pgo
BENCH_BIN=./bench_random_mixed_hakmem_fast_pgo scripts/run_mixed_10_cleanenv.sh
```

### Health Check
```bash
make perf_observe
```

## References

- **Phase 47 Instructions**: `docs/analysis/PHASE47_FAST_FRONT_PGO_MODE_INSTRUCTIONS.md`
- **Phase 46A Results**: `docs/analysis/PHASE46A_TINY_REGION_ID_WRITE_HEADER_ALWAYS_INLINE_RESULTS.md`
- **Box Theory**: `docs/analysis/PHASE2_STRUCTURAL_CHANGES_NEXT_INSTRUCTIONS.md`
- **Config Box**: `core/box/tiny_front_config_box.h`