Phase 69: Refill tuning completion (Warm Pool Size=16 optimized)
- Promoted Warm Pool Size=16 as the new baseline (+3.26% gain).
- Updated PERFORMANCE_TARGETS_SCORECARD.md with Phase 69 results.
- Updated scripts/run_mixed_10_cleanenv.sh and core/bench_profile.h to use HAKMEM_WARM_POOL_SIZE=16 by default.
- Clarified that TINY_REFILL_BATCH_SIZE is not currently connected.

@@ -2,11 +2,11 @@

## 0) Current canonical references

- **Canonical build for performance comparison**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) ✓ **Promoted in Phase 68** (seed/WS diversified)
- **Canonical build for performance comparison**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) ✓ **Promoted in Phase 69** (Warm Pool Size=16)
- **Canonical build for safety/compatibility**: Standard build (`make bench_random_mixed_hakmem`)
- **Canonical build for observation**: OBSERVE build (`make perf_observe`)
- **Scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` (M1 achieved and exceeded: 50.93% vs 50% target)
- **Canonical measurement (Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`)
- **Scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` (M1 achieved and exceeded: 51.77% vs 50% target; +3.23pp remaining to M2)
- **Canonical measurement (Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`, `HAKMEM_WARM_POOL_SIZE=16` by default)

## 1) Current status (key points)

@@ -18,7 +18,10 @@
  - Baseline: `bench_random_mixed_hakmem_minimal_pgo` = 60.89M ops/s = 50.32% (initial PGO)
- Phase 68 (PGO training set optimization): **GO & promotion complete** ✓
  - Verification: +1.19% vs Phase 66 over 10 runs (GO: exceeds the +1.0% threshold)
  - New baseline: `bench_random_mixed_hakmem_minimal_pgo` (upgraded) = 61.614M ops/s = **50.93%** (exceeds the 50% target by +0.93pp)
  - Baseline (upgraded): `bench_random_mixed_hakmem_minimal_pgo` = 61.614M ops/s = **50.93%** (exceeds the 50% target by +0.93pp)
- Phase 69 (Refill tuning: Warm Pool Size optimization): **Strong GO & promotion complete** ✓✓✓
  - Verification: +3.26% vs Phase 68 over 10 runs (Strong GO: exceeds the +3.0% threshold)
  - New baseline: `bench_random_mixed_hakmem_minimal_pgo` (upgraded) = 62.63M ops/s = **51.77%** (exceeds M1 by +1.77pp; +3.23pp remaining to M2)

## 2) Next instructions (Active)

@@ -64,32 +67,63 @@

- ✓ Created `docs/analysis/PHASE69_REFILL_TUNING_0_DESIGN.md`
- ✓ Identified tunable parameters:
  - `TINY_REFILL_BATCH_SIZE` (16 → 32/64/128)
  - `HAKMEM_TINY_REFILL_COUNT_MID` / `HAKMEM_TINY_REFILL_COUNT_HOT` (the actual refill amount, ENV-only)
  - Unified Cache C5-C7 capacity (128 → 256/512)
  - Warm Pool size (12 → 16/24)
- ✓ Drafted the sweep plan (single-parameter → combined optimization)
- ✓ Defined the risk assessment and decision criteria

**Phase 69-1 (Active): Run sweep**
**Phase 69-1: Run sweep** ✅ **Complete**

- **Goal**: +3-6% (shortest path to the M2 55% target)
- **Measures** (in priority order):
  1. **Warm Pool Size** (ENV-only, no recompile): 12 → 16 → 24
     - Expected: +0.5-1.0% (registry scan reduction)
  2. **Unified Cache C5-C7** (ENV-only, no recompile): 128 → 256 → 512
     - Expected: +1-2% (miss rate reduction for mid-size allocations)
  3. **Refill Batch Size** (requires PGO rebuild): 16 → 32 → 64
     - Expected: +1-3% (refill frequency reduction)
- **Procedure**:
  - Run 10 runs per parameter with `scripts/run_mixed_10_cleanenv.sh`
  - On failure, run `scripts/box/layout_tax_forensics_box.sh` to classify the cause
- **Decision criteria**:
  - GO: +1.0% (first tier)
  - "Strong GO": +3.0% or more (promoted as the core of the push toward M2)
- ✓ Baseline (Phase 68 PGO): 60.65M ops/s (10-run mean)
- ✓ Warm Pool Size sweep:
  - Size=16: **62.63M ops/s (+3.26%, Strong GO)** ✓✓✓ **Winner**
  - Size=24: 62.37M ops/s (+2.84%, GO)
- ✓ Unified Cache C5-C7 sweep:
  - Cache=256: 61.92M ops/s (+2.09%, GO)
  - Cache=512: 61.80M ops/s (+1.89%, GO)
- ✓ Combined optimization check:
  - Warm=16 + Cache=256: 62.35M ops/s (+2.81%, non-additive)
- ✓ The "Refill Batch Size sweep" is invalid (knob not connected):
  - `TINY_REFILL_BATCH_SIZE` has no call site in the current Tiny front, so it does not function as a performance knob
  - Reference: `docs/analysis/PHASE69_REFILL_TUNING_3C_REFILL_BATCH_KNOB_AUDIT.md`
- **Results**: `docs/analysis/PHASE69_REFILL_TUNING_1_RESULTS.md`
- **Winning setting**: **Warm Pool Size=16 (ENV-only, +3.26%, Strong GO)**

**Phase 69-2 (follow-up): Apply the winning setting to the baseline**
- Apply the winning parameters to `pgo_fast_profile_config.sh` / `core/hakmem_tiny_config.h`
- Rebuild with `make pgo-fast-full` → promote the baseline
**Phase 69-2: Apply the winning setting to the baseline** ✅ **Complete**

- ✓ Added `HAKMEM_WARM_POOL_SIZE=16` as the default in `scripts/run_mixed_10_cleanenv.sh`
- ✓ Added `bench_setenv_default("HAKMEM_WARM_POOL_SIZE","16")` to the `MIXED_TINYV3_C7_SAFE` preset in `core/bench_profile.h`
- ✓ Added the new baseline to `PERFORMANCE_TARGETS_SCORECARD.md`:
  - Phase 69 baseline: 62.63M ops/s = 51.77% of mimalloc
  - M1 (50%) achievement: **EXCEEDED** (+1.77pp above target)
  - M2 (55%) progress: Gap reduced to +3.23pp
- ✓ Rollback: set `HAKMEM_WARM_POOL_SIZE=12` or remove the ENV variable

**New baseline**: 62.63M ops/s = **51.77%** of mimalloc (+3.26% over Phase 68; +3.23pp remaining to M2)
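
The headline figures follow from simple arithmetic on the 10-run means. Below is a minimal C sketch, assuming the +3.26% is measured against the 60.65M ops/s Phase 68 sweep baseline and that the M2 gap is a plain percentage-point difference. The function names are illustrative and do not exist in the repository.

```c
#include <assert.h>
#include <stdio.h>

/* Relative throughput gain in percent: (treatment / baseline - 1) * 100. */
double relative_gain_pct(double baseline_mops, double treatment_mops) {
    assert(baseline_mops > 0.0);
    return (treatment_mops / baseline_mops - 1.0) * 100.0;
}

/* Remaining gap to a milestone, in percentage points (pp). */
double gap_to_target_pp(double target_pct, double current_pct) {
    return target_pct - current_pct;
}

/* Hypothetical helper: print the Phase 69 summary from the numbers quoted here. */
void print_phase69_summary(void) {
    printf("gain vs Phase 68 sweep baseline: %+.2f%%\n",
           relative_gain_pct(60.65, 62.63));
    printf("gap to M2 (55%%): %+.2fpp\n",
           gap_to_target_pp(55.0, 51.77));
}
```

Note that the gain is a relative change in ops/s, while the milestone gap is an absolute difference between two percentages of mimalloc throughput; the two are not interchangeable.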

---

**Phase 69-3 (next candidate): refill count (ENV-only) sweep OR another sweep**

- **Option A (recommended)**: ENV sweep of the refill count (no code changes)
  - Sweep `HAKMEM_TINY_REFILL_COUNT_MID` (C4–C7) over 64/96/128/160…
  - Sweep `HAKMEM_TINY_REFILL_COUNT_HOT` (C0–C3) the same way (note: it interacts with WarmPool/UnifiedCache)
  - Decision: by 10-run mean, GO (+1.0%) / Strong GO (+3.0%) / NO-GO (-1.0%)

- **Option B**: Fine sweep of the Unified Cache (ENV-only)
  - Sweep C5/C6/C7 over 192/256/320… (the 256/512 points in Phase 69-1 were coarse)
  - Isolate the cause of the non-additivity with WarmPool=16

- **Option C**: Introduce a new compile-time knob (deferred)
  - `TINY_REFILL_BATCH_SIZE` is not connected, so do not pursue it as-is
  - If needed, create a separate SSOT and implement it (Phase 70+)

- **Option D**: Optimization in a different direction (shortest path to M2: 55%)
  - Remaining gap: +3.23pp (51.77% → 55%)
  - Phase 67b (boundary inline/unroll tuning)
  - Optimize the top 50 hot functions
  - Re-tune the PGO profile
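
The GO / Strong GO / NO-GO decision used for these sweeps can be sketched as a small classifier over the 10-run mean delta. This is an illustration only: the type and function names are hypothetical, the inclusive boundaries are an assumption, and the NEUTRAL label for the band between -1.0% and +1.0% is borrowed from the scorecard's Phase 47 entry.

```c
/* Verdicts for a sweep result, ordered from worst to best. */
typedef enum {
    VERDICT_NO_GO,
    VERDICT_NEUTRAL,
    VERDICT_GO,
    VERDICT_STRONG_GO
} sweep_verdict;

/* Thresholds quoted in this note (10-run mean delta, in percent). */
#define GO_THRESHOLD_PCT          1.0
#define STRONG_GO_THRESHOLD_PCT   3.0
#define NO_GO_THRESHOLD_PCT     (-1.0)

/* Classify a delta; boundaries are treated as inclusive (assumption). */
sweep_verdict classify_delta(double delta_pct) {
    if (delta_pct >= STRONG_GO_THRESHOLD_PCT) return VERDICT_STRONG_GO;
    if (delta_pct >= GO_THRESHOLD_PCT)        return VERDICT_GO;
    if (delta_pct <= NO_GO_THRESHOLD_PCT)     return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}

/* Human-readable name for logging. */
const char *verdict_name(sweep_verdict v) {
    switch (v) {
    case VERDICT_STRONG_GO: return "Strong GO";
    case VERDICT_GO:        return "GO";
    case VERDICT_NO_GO:     return "NO-GO";
    default:                return "NEUTRAL";
    }
}
```

Applied to the Phase 69-1 results, Size=16 (+3.26%) classifies as Strong GO, while Size=24 (+2.84%) and both Unified Cache settings classify as GO.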

---

Makefile
@@ -1060,6 +1060,12 @@ clean:
	rm -f $(OBJS) $(TARGET) $(BENCH_HAKMEM_OBJS) $(BENCH_SYSTEM_OBJS) $(BENCH_HAKMEM) $(BENCH_SYSTEM) $(SHARED_OBJS) $(SHARED_LIB) *.csv libhako_ffi_stub.a hako_ffi_stub.o
	rm -f bench_comprehensive.o bench_comprehensive_hakmem bench_comprehensive_system
	rm -f bench_tiny bench_tiny.o bench_tiny_mt bench_tiny_mt.o test_mf2 test_mf2.o bench_tiny_hakmem
	rm -f bench_random_mixed_hakmem.o bench_random_mixed_system.o bench_random_mixed_mi.o
	rm -f bench_tiny_hot_hakmem.o bench_tiny_hot_system.o bench_tiny_hot_mi.o bench_mi_force.o
	rm -f bench_random_mixed_hakmem bench_random_mixed_system bench_random_mixed_mi bench_random_mixed_hakx
	rm -f bench_random_mixed_hakmem_minimal bench_random_mixed_hakmem_minimal_pgo
	rm -f bench_random_mixed_hakmem_fast_fixed bench_random_mixed_hakmem_fast_pruned bench_random_mixed_hakmem_fast_pgo
	rm -f bench_tiny_hot_hakmem bench_tiny_hot_system bench_tiny_hot_mi bench_tiny_hot_hakmi bench_tiny_hot_hakx bench_tiny_hot_hakx_p0 bench_tiny_hot_direct

# Help
help:

@@ -103,6 +103,8 @@ static inline void bench_apply_mixed_tinyv3_c7_common(void) {
    bench_setenv_default("HAKMEM_TINY_STATIC_ROUTE", "1");
    // Phase 3 D1: Free route cache (TLS cache for free path routing, +2.19% proven)
    bench_setenv_default("HAKMEM_FREE_STATIC_ROUTE", "1");
    // Phase 69-1: Warm Pool Size=16 (+3.26% Strong GO, ENV-only)
    bench_setenv_default("HAKMEM_WARM_POOL_SIZE", "16");
}

static inline void bench_apply_profile(void) {

@@ -83,7 +83,10 @@ int tiny_cap_max_for_class(int class_idx);
// Refill/Drain Configuration
// ============================================================================

// Number of blocks to refill from SuperSlab to magazine
// Number of blocks to refill from SuperSlab to magazine.
// NOTE: As of 2025-12, this macro is not wired into the active Tiny front path
// (no call sites). Keep it for documentation/future work, but do not treat it as
// a performance knob until it is connected to the refill logic.
#define TINY_REFILL_BATCH_SIZE 16

// Number of blocks to drain from magazine to SuperSlab

@@ -27,7 +27,8 @@ Comparison with mimalloc is done with the **FAST build** (for Standard, the fixed tax
| FAST v3 | 58.478 | 58.876 | 48.34% | Old baseline (Phase 59b rebase). Superseded as the performance reference → Phase 66 PGO |
| FAST v3 + PGO | 59.80 | 60.25 | 49.41% | Phase 47: NEUTRAL (+0.27% mean, +1.02% median, research box) |
| **FAST v3 + PGO (Phase 66)** | **60.89** | **61.35** | **50.32%** | **GO: +3.0% mean (verified 3 times, stable within <±1%)**. Phase 66 PGO initial baseline |
| **FAST v3 + PGO (Phase 68)** | **61.614** | **61.924** | **50.93%** | **GO: +1.19% vs Phase 66** ✓ (seed/WS diversification) → **Promoted: new FAST baseline** ✓ |
| **FAST v3 + PGO (Phase 68)** | **61.614** | **61.924** | **50.93%** | **GO: +1.19% vs Phase 66** ✓ (seed/WS diversification) |
| **FAST v3 + PGO (Phase 69)** | **62.63** | **63.38** | **51.77%** | **Strong GO: +3.26% vs Phase 68** ✓✓✓ (Warm Pool Size=16, ENV-only) → **Promoted: new FAST baseline** ✓ |
| Standard | 53.50 | - | 44.21% | Safety/compatibility baseline (measured before Phase 48, needs rebase) |
| OBSERVE | TBD | - | - | Diagnostic counters ON |

@@ -65,14 +66,14 @@ Notes:

Recommended milestones (Mixed 16–1024B, FAST build):

| Milestone | Target | Current (FAST v3 + PGO Phase 68) | Status |
| Milestone | Target | Current (FAST v3 + PGO Phase 69) | Status |
|-----------|--------|-----------------------------------|--------|
| M1 | **50%** of mimalloc | 50.93% | 🟢 **EXCEEDED** (Phase 68 PGO, 10-run verified) |
| M2 | **55%** of mimalloc | - | 🔴 Not reached (structural changes required) |
| M1 | **50%** of mimalloc | 51.77% | 🟢 **EXCEEDED** (Phase 69, Warm Pool Size=16, ENV-only) |
| M2 | **55%** of mimalloc | - | 🔴 Not reached (+3.23pp remaining, Phase 69+ in progress) |
| M3 | **60%** of mimalloc | - | 🔴 Not reached (structural changes required) |
| M4 | **65–70%** of mimalloc | - | 🔴 Not reached (structural changes required) |

**Current status:** FAST v3 + PGO (Phase 68) = 61.614M ops/s = 50.93% of mimalloc (seed/WS diversified, 10-run verified)
**Current status:** FAST v3 + PGO (Phase 69) = 62.63M ops/s = 51.77% of mimalloc (Warm Pool Size=16, ENV-only, 10-run verified)

**Phase 68 PGO promotion (Phase 66 → Phase 68 upgrade):**
- Phase 66 baseline: 60.89M ops/s = 50.32% (+3.0% mean, 3-run stable)
@@ -93,6 +94,26 @@ Notes:
- More representative of production workload variance
- Higher confidence in baseline stability

**Phase 69 PGO promotion (Phase 68 → Phase 69 upgrade):**
- Phase 68 baseline: 61.614M ops/s = 50.93% (+1.19% vs Phase 66, 10-run verified)
- Phase 69 baseline: 62.63M ops/s = 51.77% (+3.26% vs Phase 68, 10-run verified)
- Parameter change: Warm Pool Size 12 → 16 (ENV-only, zero code changes)
- M1 (50%) achievement: **EXCEEDED** (+1.77pp above target, vs +0.93pp in Phase 68)
- M2 (55%) progress: Gap reduced to +3.23pp (from +4.07pp in Phase 68)

**Phase 69 Benefits Over Phase 68:**
- +3.26% improvement from warm pool optimization (Strong GO threshold exceeded)
- ENV-only change (zero layout tax risk, fully reversible)
- Reduced registry O(N) scan overhead via larger warm pool
- Non-additive with other optimizations (Warm Pool Size=16 alone is optimal)
- Single strongest parameter improvement in refill tuning sweep

**Phase 69 Implementation:**
- Warm Pool Size: 12 → 16 SuperSlabs/class
- ENV variable: `HAKMEM_WARM_POOL_SIZE=16` (default in MIXED_TINYV3_C7_SAFE preset)
- Rollback: Set `HAKMEM_WARM_POOL_SIZE=12` or remove ENV variable
- Results: `docs/analysis/PHASE69_REFILL_TUNING_1_RESULTS.md`
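
The rollback path works because the preset only supplies a default: a value already present in the environment takes precedence. The body of `bench_setenv_default` is not shown in this commit, so the following is a sketch of the semantics its name suggests, built on POSIX `setenv` with `overwrite = 0`. Both function names are hypothetical, and the fallback of 12 assumes the compile-time default listed in the design doc.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/* Sketch of "default" semantics: set the variable only if it is not set yet. */
void setenv_default_sketch(const char *name, const char *value) {
    assert(name != NULL && value != NULL);
    /* overwrite = 0: an existing value in the environment wins. */
    setenv(name, value, 0);
}

/* Hypothetical reader: parse the pool size, falling back to 12 when the
 * variable is unset, empty, or not a sane positive integer. */
int warm_pool_size_from_env(void) {
    const char *s = getenv("HAKMEM_WARM_POOL_SIZE");
    if (s == NULL || *s == '\0') return 12;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    /* The upper bound is an arbitrary sanity cap for this sketch. */
    if (*end != '\0' || v <= 0 || v > 1024) return 12;
    return (int)v;
}
```

With these semantics, exporting `HAKMEM_WARM_POOL_SIZE=12` before launching the benchmark keeps 12 even though the preset requests 16.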

Note: the reference values for `mimalloc/system/jemalloc` drift with the environment, so re-baseline them periodically.
- Phase 48 complete: `docs/analysis/PHASE48_REBASE_ALLOCATORS_AND_STABILITY_SUITE_RESULTS.md`
- Phase 59 complete: `docs/analysis/PHASE59_50PERCENT_RECOVERY_BASELINE_REBASE_RESULTS.md`

@@ -33,7 +33,7 @@ malloc() → Unified Cache (C2/C3: 2048 slots, C5-C7: 128 slots)
↓
Warm Pool (12 SuperSlabs per class)
↓ MISS
sll_refill_batch_from_ss(class_idx, max_take=16)
sll_refill_*_from_ss(class_idx, max_take=refill_count(class_idx))
↓
trc_pop_from_freelist() OR trc_linear_carve()
↓
@@ -44,12 +44,11 @@ malloc() → Unified Cache (C2/C3: 2048 slots, C5-C7: 128 slots)

| Component | Parameter | Current Value | Location |
|-----------|-----------|---------------|----------|
| **Refill Batch Size** | `TINY_REFILL_BATCH_SIZE` | 16 | `core/hakmem_tiny_config.h:87` |
| **Refill Count (hot/mid/global)** | `HAKMEM_TINY_REFILL_COUNT_*` | hot=128, mid=96, global=64 (defaults) | `core/hakmem_tiny_init.inc:270` |
| **Unified Cache C2/C3** | `unified_capacity(2/3)` | 2048 slots | `core/front/tiny_unified_cache.h:129` |
| **Unified Cache C5-C7** | `unified_capacity(5-7)` | 128 slots | `core/front/tiny_unified_cache.h:131` |
| **Unified Cache Others** | `unified_capacity(0/1/4)` | 64 slots | `core/front/tiny_unified_cache.h:128` |
| **Warm Pool Size** | `TINY_WARM_POOL_MAX_PER_CLASS` | 12 | `core/front/tiny_warm_pool.h:46` |
| **Drain Batch Size** | `TINY_DRAIN_BATCH_SIZE` | 16 | `core/hakmem_tiny_config.h:90` |

### 1.3 Refill Overhead Breakdown

@@ -72,35 +71,31 @@ Total Refill Cost = (refill_count × fixed_overhead) + (blocks_carved × per_blo

## 2. Tunable Parameters (Phase 69 Sweep Plan)

### 2.1 Refill Batch Size Sweep
### 2.1 Refill Count Sweep (ENV-only)

**Parameter**: `TINY_REFILL_BATCH_SIZE` (global default for all classes)
**Parameter**:
- `HAKMEM_TINY_REFILL_COUNT_HOT` (classes C0–C3)
- `HAKMEM_TINY_REFILL_COUNT_MID` (classes C4–C7)
- `HAKMEM_TINY_REFILL_COUNT` (fallback/global)

**Current**: 16 (conservative, optimized for RSS)

**Sweep Range**: [16, 32, 64, 128]
**Current defaults** (when ENV unset):
- hot=128, mid=96, global=64

**Rationale**:
- 16 → 32: Expected +1-2% (fewer refill calls)
- 32 → 64: Expected +1-2% (diminishing returns, cache pressure increases)
- 64 → 128: Expected +0-1% (likely NO-GO due to cache thrashing)
- Smaller counts reduce per-refill work but increase refill frequency.
- Larger counts reduce refill frequency but may increase chain-building cost and TLS pressure.
- Sweep is cheap (ENV-only) and reversible.
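
The frequency side of this trade-off can be made concrete with ceiling division. The sketch below assumes an idealized model in which every block is served by a refill and each refill delivers exactly `refill_count` blocks; real runs will differ because frees return blocks to the caches. The function name is illustrative.

```c
#include <assert.h>

/* Number of refill calls needed to supply `blocks` allocations when each
 * refill delivers at most `refill_count` blocks (ceiling division). */
unsigned long refills_needed(unsigned long blocks, unsigned long refill_count) {
    assert(refill_count > 0);
    return (blocks + refill_count - 1) / refill_count;
}
```

Under this model, supplying 1000 blocks takes 16 refills at a count of 64, 11 at the mid default of 96, and 8 at 128, so each step up buys a smaller reduction in refill calls while the per-refill work keeps growing.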

**A/B Test Method**:
```bash
# Baseline (16)
make clean && make bench_random_mixed_hakmem_minimal_pgo
ITERS=20000000 WS=400 RUNS=10 scripts/run_mixed_10_cleanenv.sh
# Baseline
RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh

# Treatment (32)
sed -i 's/TINY_REFILL_BATCH_SIZE 16/TINY_REFILL_BATCH_SIZE 32/' core/hakmem_tiny_config.h
make clean && make pgo-fast-full
ITERS=20000000 WS=400 RUNS=10 scripts/run_mixed_10_cleanenv.sh

# Compare results
# Treatment examples (pick one axis at a time)
HAKMEM_TINY_REFILL_COUNT_MID=128 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_REFILL_COUNT_MID=64 RUNS=10 BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh
```

**Expected Winner**: 32 (balance between refill frequency and cache locality)

---

### 2.2 Unified Cache Capacity Sweep (C5-C7 Focus)
@@ -183,27 +178,18 @@ HAKMEM_WARM_POOL_SIZE=24 # Treatment 2
   - Expected: +1-2% (miss rate reduction for mid-size allocations)
   - Risk: Low (ENV-only change)

3. **Refill Batch Size** (requires recompile + PGO):
   ```bash
   for BATCH in 16 32 64; do
     sed -i "s/TINY_REFILL_BATCH_SIZE .*/TINY_REFILL_BATCH_SIZE $BATCH/" core/hakmem_tiny_config.h
     make pgo-fast-full
     RUNS=10 scripts/run_mixed_10_cleanenv.sh
   done
   ```
   - Expected: +1-3% (refill frequency reduction)
   - Risk: Medium (requires PGO rebuild, potential layout tax)
3. **Refill Batch Size** (removed, 2025-12 audit):
   `TINY_REFILL_BATCH_SIZE` is not connected to the current Tiny front and does not function as a knob.
   The actual refill amount is controlled via `HAKMEM_TINY_REFILL_COUNT_*` (ENV-only).

### 3.2 Phase 69-2: Combined Optimization (Best Settings)

After identifying winners from Phase 69-1, combine them:

```bash
# Example: batch=32, C5-C7=256, warm_pool=16
# Example: C5-C7=256, warm_pool=16
HAKMEM_WARM_POOL_SIZE=16 \
HAKMEM_TINY_UNIFIED_C5=256 HAKMEM_TINY_UNIFIED_C6=256 HAKMEM_TINY_UNIFIED_C7=256 \
make pgo-fast-full # with TINY_REFILL_BATCH_SIZE=32

RUNS=10 scripts/run_mixed_10_cleanenv.sh
```

@@ -289,8 +275,9 @@ atomic_fetch_add(&g_refill_count_total, 1, memory_order_relaxed);
   - Warm pool size: 12 → 16 → 24
   - Unified cache C5-C7: 128 → 256 → 512

2. **Batch Size Sweep** (requires PGO rebuild):
   - TINY_REFILL_BATCH_SIZE: 16 → 32 → 64
2. **Refill Count Sweep** (ENV-only):
   - `HAKMEM_TINY_REFILL_COUNT_MID`: 64 → 96 → 128 → 160 (example)
   - `HAKMEM_TINY_REFILL_COUNT_HOT`: 96 → 128 → 160 (example)

### Follow-Up (Phase 69-2)

@@ -300,7 +287,6 @@ atomic_fetch_add(&g_refill_count_total, 1, memory_order_relaxed);

4. **Baseline Promotion** (if Strong GO):
   - Update `pgo_fast_profile_config.sh` with winning ENV vars
   - Update `core/hakmem_tiny_config.h` with winning batch size
   - Re-run `make pgo-fast-full` to bake optimizations into baseline

---

@@ -39,6 +39,8 @@ export HAKMEM_FASTLANE_DIRECT=${HAKMEM_FASTLANE_DIRECT:-1}
# NOTE: Phase 9/10 are promoted (bench_profile defaults to 1). Keep cleanenv aligned by default.
export HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=${HAKMEM_FREE_TINY_FAST_MONO_DUALHOT:-1}
export HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT=${HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT:-1}
# NOTE: Phase 69-1 winner (Warm Pool Size=16, +3.26% Strong GO, ENV-only)
export HAKMEM_WARM_POOL_SIZE=${HAKMEM_WARM_POOL_SIZE:-16}

for i in $(seq 1 "${runs}"); do
  echo "=== Run ${i}/${runs} ==="