Phase 68: PGO training set diversification (seed/WS expansion)
Changes:
- scripts/box/pgo_fast_profile_config.sh: Expanded WS patterns (3→5) and seeds (1→3) for reduced overfitting and better production workload representativeness
- PERFORMANCE_TARGETS_SCORECARD.md: Phase 68 baseline promoted (61.614M = 50.93%)
- CURRENT_TASK.md: Phase 68 marked complete, Phase 67a (layout tax forensics) set Active

Results:
- 10-run verification: +1.19% vs Phase 66 baseline (GO, >+1.0% threshold)
- M1 milestone: 50.93% of mimalloc (target 50%, exceeded by +0.93pp)
- Stability: 10-run mean/median with <2.1% CV

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
79
docs/analysis/PHASE65_HOT_SYMBOL_ORDERING_1_INSTRUCTIONS.md
Normal file
@@ -0,0 +1,79 @@
# Phase 65: Hot Symbol Ordering (targeting the layout tax)

Background:
- As Phase 64 showed, merely reducing code or applying DCE can worsen IPC, branch prediction, and cache behavior through the layout tax.
- `-ffunction-sections/--gc-sections` tends to be destructive, per the Phase 18 precedent.
- Phase 65 therefore **"orders instead of trimming"**: it uses the linker's symbol ordering to place hot text contiguously and stabilize the I-cache/BTB.

Goal:
- Aim for **+1 to 5%** on Mixed FAST (`bench_random_mixed_hakmem_minimal`).
- No link-out or physical deletion (this keeps Box Theory's "reversible" and "single boundary" properties together with a stable layout).

Success criteria:
- Mixed 10-run mean of **+2.0% or more = GO** (the threshold is set high because this is a build-level change)
- Within ±2.0% = NEUTRAL (keep as a research build)
- -2.0% or less = NO-GO (revert)
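
The three thresholds above can be written down as a small helper so that the verdict is applied the same way on every run. This is only a sketch: `classify_delta` is a hypothetical name, not an existing project script, and it assumes the input is the Mixed 10-run mean delta in percent, with exactly +2.0 counted as GO and exactly -2.0 counted as NO-GO.

```shell
#!/usr/bin/env bash
# classify_delta PCT  (hypothetical helper, not part of the repository)
# PCT is the Mixed 10-run mean delta versus baseline, in percent.
classify_delta() {
  awk -v d="$1" 'BEGIN {
    if (d + 0 >= 2.0)       print "GO"        # +2.0% or more
    else if (d + 0 <= -2.0) print "NO-GO"     # -2.0% or less: revert
    else                    print "NEUTRAL"   # inside the +/-2.0% band
  }'
}

classify_delta 2.4    # prints GO
classify_delta -0.7   # prints NEUTRAL
classify_delta -2.0   # prints NO-GO
```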

---

## Step 0: Prerequisites

- baseline build:
  - `make bench_random_mixed_hakmem_minimal`
- baseline run:
  - `BENCH_BIN=./bench_random_mixed_hakmem_minimal scripts/run_mixed_10_cleanenv.sh`

---

## Step 1: Create the hot symbol list (doing it by hand is fine)

1) `mkdir -p build`

2) Create `build/hot_syms.txt` (example):

```
malloc
free
front_fastlane_try_malloc
front_fastlane_try_free
malloc_tiny_fast
free_tiny_fast
tiny_c7_ultra_alloc
tiny_c7_ultra_free
tiny_region_id_write_header
unified_cache_push
unified_cache_pop
small_policy_v7_snapshot
```

Rules:
- Limit the list to the top 10 to 30 symbols by perf self% (with too many entries the order file itself becomes noise)
- Include the **leaves that are actually hot**, not just wrapper names
- Check function names with, for example, `nm -n ./bench_random_mixed_hakmem_minimal | rg ' T '`
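
The name check in the last rule can be automated. The sketch below assumes the `nm -n` output has been saved to a file; `missing_hot_syms` and the `build/nm.txt` path are hypothetical names introduced here for illustration. It accepts both `T` (global) and `t` (local, for example `static`) text symbols, which is an assumption that goes beyond the `rg ' T '` filter above. A name that was fully inlined will not appear in either form.

```shell
#!/usr/bin/env bash
# missing_hot_syms NM_LISTING HOT_LIST  (hypothetical helper)
# NM_LISTING: saved output of `nm -n <binary>`.
# HOT_LIST:   one symbol name per line, e.g. build/hot_syms.txt.
# Prints every hot name that has no text-section entry in the listing.
missing_hot_syms() {
  awk '
    # First file: remember every symbol whose type column is T or t.
    FILENAME == ARGV[1] { if ($2 == "T" || $2 == "t") text[$3] = 1; next }
    # Second file: report non-empty lines whose name was not seen.
    NF && !($1 in text) { print $1 }
  ' "$1" "$2"
}
```

Typical use would be `nm -n ./bench_random_mixed_hakmem_minimal > build/nm.txt` followed by `missing_hot_syms build/nm.txt build/hot_syms.txt`. Empty output means every listed name exists as a text symbol.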

---

## Step 2: ordered FAST build

- `make bench_random_mixed_hakmem_fast_ordered`
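
One way to sanity-check that the ordered build really packed the hot text is to compare the address range covered by the hot symbols in both binaries. The following is a rough sketch under stated assumptions: `hot_text_span` is a hypothetical helper, it reads a saved `nm -n` listing, and it measures only the distance between the lowest and highest hot start address (the size of the last function is not included, and cold functions sitting between hot ones are not detected).

```shell
#!/usr/bin/env bash
# hot_text_span NM_LISTING HOT_LIST  (hypothetical helper)
# Prints "<count> <span_bytes>": how many hot symbols were found as text
# symbols, and the distance between their lowest and highest start address.
hot_text_span() {
  local nm_listing=$1 hot_list=$2
  local addr type name lo=0 hi=0 n=0
  while read -r addr type name; do
    [[ $type == [Tt] ]] || continue                   # text symbols only
    grep -qxF -- "$name" "$hot_list" || continue      # hot names only
    addr=$((16#$addr))                                # hex -> decimal
    (( n == 0 || addr < lo )) && lo=$addr
    (( n == 0 || addr > hi )) && hi=$addr
    n=$((n + 1))
  done < "$nm_listing"
  echo "$n $((hi - lo))"
}
```

If the ordering took effect, the span reported for `bench_random_mixed_hakmem_fast_ordered` should be noticeably smaller than for the baseline binary with the same hot list. A similar span in both suggests the order file was not applied.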

---

## Step 3: A/B (Mixed 10-run)

baseline:
- `BENCH_BIN=./bench_random_mixed_hakmem_minimal scripts/run_mixed_10_cleanenv.sh`

treatment:
- `BENCH_BIN=./bench_random_mixed_hakmem_fast_ordered scripts/run_mixed_10_cleanenv.sh`

perf stat is required (200M iters recommended):
- `perf stat -e cycles,instructions,branches,branch-misses,iTLB-load-misses,dTLB-load-misses,cache-misses -- ...`
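
The A/B verdict depends on the relative change of the 10-run mean. A minimal sketch of that arithmetic follows. It assumes the per-run throughput values have already been extracted into two plain files with one number per line; the output format of `scripts/run_mixed_10_cleanenv.sh` is not described here, so that extraction step, the file layout, and the `delta_pct` name are all assumptions.

```shell
#!/usr/bin/env bash
# delta_pct BASE_FILE TREAT_FILE  (hypothetical helper)
# Each file holds one throughput value per line (one line per run).
# Prints the change of the treatment mean over the baseline mean in percent.
delta_pct() {
  awk '
    FILENAME == ARGV[1] && NF { bsum += $1; bn++ }
    FILENAME == ARGV[2] && NF { tsum += $1; tn++ }
    END {
      if (!bn || !tn) exit 1          # refuse to divide by an empty sample
      b = bsum / bn; t = tsum / tn
      printf "%+.2f\n", (t - b) / b * 100
    }
  ' "$1" "$2"
}
```

For example, a baseline file containing `100` and `100` and a treatment file containing `102` and `104` gives `+3.00`.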

---

## Rollback

- Go back to `make bench_random_mixed_hakmem_minimal` (the ordered build may stay as a research build)
- `build/hot_syms.txt` may be deleted (but take care that the layout difference caused by deleting it does not skew benchmark comparisons)