# Phase 75-5: PGO Regeneration (C5/C6 Inline Slots Aware) — Next Instructions
**Status**: NEXT (HIGH PRIORITY)

## Goal

Rebuild the FAST PGO SSOT binary (`bench_random_mixed_hakmem_minimal_pgo`) with a training profile that matches the **current promoted defaults**:

- `HAKMEM_WARM_POOL_SIZE=16`
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`

This rebuild is required because Phase 75-4 observed a large gap (8.82M ops/s, about 14%) between:

- **Phase 69 historical FAST baseline** (62.63M ops/s)
- **Phase 75-4 current FAST PGO Point A baseline** (53.81M ops/s)

## SSOT Rules

- Use `scripts/run_mixed_10_cleanenv.sh` as the harness.
- Always pin the binary explicitly via `BENCH_BIN=...` to avoid Standard/FAST confusion.
- Keep comparisons within the **same binary** when judging a single knob (C5/C6 OFF/ON).

## Step 1: Prepare training commands (C5/C6 ON)

Pick one of these approaches (A is preferred):

### A) Training uses the harness (preferred)

Ensure the training workload exports the same knob values:

```bash
export HAKMEM_WARM_POOL_SIZE=16
export HAKMEM_TINY_C5_INLINE_SLOTS=1
export HAKMEM_TINY_C6_INLINE_SLOTS=1
```
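Since the point of this phase is a profile trained with exactly these values, a cheap safeguard is to refuse to start training when any knob is missing or different. A minimal sketch; `check_pgo_knobs` is a hypothetical helper, not part of the repo:

```bash
# Hypothetical preflight guard (not part of the repo).
# It checks the *exported* environment via printenv, which is what child
# processes such as the PGO training run inherit; unexported shell
# variables are deliberately treated as missing.
# Usage: check_pgo_knobs && make pgo-fast-full
check_pgo_knobs() {
    local pair name want have status=0
    for pair in HAKMEM_WARM_POOL_SIZE=16 \
                HAKMEM_TINY_C5_INLINE_SLOTS=1 \
                HAKMEM_TINY_C6_INLINE_SLOTS=1; do
        name=${pair%%=*}                     # part before the first '='
        want=${pair#*=}                      # part after the first '='
        have=$(printenv "$name" || true)     # empty if not exported
        if [ "$have" != "$want" ]; then
            echo "knob mismatch: $name='$have' (expected '$want')" >&2
            status=1
        fi
    done
    return "$status"
}
```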
Then, from the same shell so the exported values are inherited, run the existing PGO training target (the target is repo-specific; for example):

```bash
make pgo-fast-full
```

### B) Hard-pin knobs inside PGO training config (if needed)

If the training driver does not inherit the environment cleanly, update the PGO training config script to set these values explicitly:

- `HAKMEM_WARM_POOL_SIZE=16`
- `HAKMEM_TINY_C5_INLINE_SLOTS=1`
- `HAKMEM_TINY_C6_INLINE_SLOTS=1`

## Step 2: Validate the rebuilt binary
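Before benchmarking, it can help to confirm that the PGO build actually replaced the binary, so that a stale `bench_random_mixed_hakmem_minimal_pgo` is not measured by mistake. A minimal sketch, assuming you create a stamp file immediately before starting training; `assert_rebuilt` and the stamp path are hypothetical:

```bash
# Hypothetical freshness guard (not part of the repo).
# Usage:
#   touch /tmp/pgo_train.stamp && make pgo-fast-full
#   assert_rebuilt ./bench_random_mixed_hakmem_minimal_pgo /tmp/pgo_train.stamp
assert_rebuilt() {
    local bin=$1 stamp=$2
    if [ ! -e "$stamp" ]; then
        echo "stamp file not found: $stamp" >&2
        return 1
    fi
    if [ ! -x "$bin" ]; then
        echo "missing or non-executable binary: $bin" >&2
        return 1
    fi
    # -nt: true if $bin's modification time is newer than $stamp's.
    if [ ! "$bin" -nt "$stamp" ]; then
        echo "stale binary: $bin is not newer than $stamp" >&2
        return 1
    fi
}
```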
Run the Mixed SSOT 10-run benchmark on the FAST PGO binary:

```bash
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
```
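If you need to compute or cross-check these statistics yourself, they can be derived from the per-run throughputs. A minimal sketch, assuming the ten values (M ops/s) have already been extracted one per line (the harness's output format is not reproduced here) and taking CV as sample standard deviation divided by the mean; the repo may define CV differently, and `summarize_runs` is hypothetical:

```bash
# Hypothetical helper (not part of the repo): reads one throughput value per
# line on stdin and prints the mean, the median, and the CV, here defined as
# sample standard deviation / mean, in percent. Needs at least two values.
# Usage: summarize_runs < throughputs.txt
summarize_runs() {
    LC_ALL=C sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            n = NR
            mean = sum / n
            # Values arrive sorted, so the median is the middle element
            # (or the average of the two middle elements when n is even).
            if (n % 2) median = v[(n + 1) / 2]
            else       median = (v[n / 2] + v[n / 2 + 1]) / 2
            for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
            sd = sqrt(ss / (n - 1))
            printf "mean=%.2f median=%.2f cv=%.2f%%\n", mean, median, 100 * sd / mean
        }'
}
```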
Record the mean, median, and CV (coefficient of variation), and update the scorecard baseline if the new numbers improve on it.

## Step 3: Re-run Phase 75-4 matrix on FAST PGO (sanity)

Run the 4-point matrix on the FAST PGO binary to:

- confirm Point D > Point A
- quantify additivity (the B and C contributions)

```bash
# Point A: C5 OFF, C6 OFF
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh

# Point B: C5 ON, C6 OFF
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=0 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh

# Point C: C5 OFF, C6 ON
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=0 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh

# Point D: C5 ON, C6 ON
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo \
HAKMEM_TINY_C5_INLINE_SLOTS=1 HAKMEM_TINY_C6_INLINE_SLOTS=1 RUNS=10 \
scripts/run_mixed_10_cleanenv.sh
```
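With the four means recorded, the D vs A check and the additivity question reduce to arithmetic on gains relative to Point A. A minimal sketch, assuming the point labels follow the command order above (A = both OFF, B = C5 only, C = C6 only, D = both ON) and that each argument is a mean throughput; `matrix_report` is hypothetical. "Additive" here means the sum of the B and C gains, so a D gain close to that sum suggests the two knobs combine roughly additively:

```bash
# Hypothetical helper (not part of the repo).
# Arguments: mean throughput of points A, B, C, D (same units, e.g. M ops/s).
# Prints each point's gain over A and the gain implied by simple additivity,
# then classifies D vs A against the +1.0% threshold used in the GO gate.
# GO additionally requires the baseline to recover; that is judged separately.
matrix_report() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
        gb = 100 * (b - a) / a    # B: C5 ON only
        gc = 100 * (c - a) / a    # C: C6 ON only
        gd = 100 * (d - a) / a    # D: both ON (D vs A)
        printf "B=%+.2f%% C=%+.2f%% D=%+.2f%% additive=%+.2f%%\n", gb, gc, gd, gb + gc
        if (gd >= 1.0)    verdict = "meets the +1.0% threshold"
        else if (gd > 0)  verdict = "positive but below +1.0%"
        else if (gd < 0)  verdict = "negative (NO-GO)"
        else              verdict = "no change"
        print "D vs A: " verdict
    }'
}
```

All gains are measured against Point A from the same binary, in line with the SSOT rule of judging each knob within one binary.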
## Step 4: If the regression persists, run layout tax forensics

Compare the Phase 69 best binary (first argument) with the rebuilt binary (second argument):

```bash
./scripts/box/layout_tax_forensics_box.sh \
./bench_random_mixed_hakmem_minimal_pgo_phase69_best \
./bench_random_mixed_hakmem_minimal_pgo
```
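Once the forensics script has reported how the counters moved, the two numeric thresholds in the classification that follows can be applied mechanically. A minimal sketch, assuming you pass the percentage changes of the current binary relative to the Phase 69 best binary by hand (the script's output format is not described here); `classify_layout_tax` is hypothetical, and cache/dTLB counters are left to manual judgment because no threshold is given for them:

```bash
# Hypothetical classifier (not part of the repo).
# $1 = IPC change in percent (e.g. -4.2 means IPC dropped 4.2%)
# $2 = branch-miss change in percent (e.g. 12 means 12% more misses)
# Prints every bucket whose threshold is crossed.
classify_layout_tax() {
    awk -v ipc="$1" -v br="$2" 'BEGIN {
        hit = 0
        if (ipc < -3) { print "IPC drop >3%: text layout / inlining / code placement"; hit = 1 }
        if (br > 10)  { print "branch-miss spike >10%: hint mismatch / control-flow reshaping"; hit = 1 }
        if (!hit)     print "no IPC/branch threshold crossed: inspect cache/dTLB counters"
    }'
}
```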
Then classify the regression by which counters moved:

- IPC drop (>3%) → text layout / inlining / code placement issue
- branch-miss spike (>10%) → hint mismatch / control-flow reshaping
- cache/dTLB spike → data layout / TLS bloat / spill

## GO/NO-GO Gates

- **GO**: the FAST PGO baseline recovers significantly (target: close to the Phase 69 level of 62.63M ops/s), and Phase 75-4 D vs A remains ≥ +1.0%.
- **NEUTRAL**: D vs A stays positive but the baseline is still low → keep investigating the training config.
- **NO-GO**: D vs A becomes negative → revert or rework the inline slots integration for FAST builds.