# Phase 4 Improvement Roadmap

## Overview

Phase 4 introduced a 3.6% performance regression, so we will recover the lost throughput in stages.

**Current state**:
- Phase 3: 391 M ops/sec
- Phase 4: 373-380 M ops/sec
- **Regression**: -3.6%

**Targets**:
- Phase 4.1 (Quick Win): 385-390 M ops/sec (+1-2%)
- Phase 4.2 (Gating): 390-395 M ops/sec (back to Phase 3 level)
- Phase 4.3 (Batching): 395-400 M ops/sec (beyond Phase 3)

---

## Phase 4.1: Quick Win (Option A+B)

### Goals
- Implementation time: **5-10 minutes**
- Expected gain: **+1-2%** (385-390 M ops/sec)
- Risk: **Low**

### Implementation

#### Option A: Reduce redundant memory accesses

**Before**:
```c
// owner->class_idx is read twice in the comparison
// (and a third time in stats_record_free below)
int is_tls_active = (owner == g_tls_active_slab_a[owner->class_idx] ||
                     owner == g_tls_active_slab_b[owner->class_idx]);

if (is_tls_active && !mini_mag_is_full(&owner->mini_mag)) {
    mini_mag_push(&owner->mini_mag, it.ptr);
    stats_record_free(owner->class_idx);
    continue;
}
```

**After**:
```c
// Read class_idx once and reuse it
uint8_t cidx = owner->class_idx;
TinySlab* tls_a = g_tls_active_slab_a[cidx];
TinySlab* tls_b = g_tls_active_slab_b[cidx];

if ((owner == tls_a || owner == tls_b) &&
    !mini_mag_is_full(&owner->mini_mag)) {
    mini_mag_push(&owner->mini_mag, it.ptr);
    stats_record_free(cidx);
    continue;
}
```

**Improvements**:
- `owner->class_idx`: 3 reads → 1 read
- `g_tls_active_slab_a/b[cidx]`: can be hoisted out of the loop when every spilled item shares the same class index
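
The hoisting mentioned in the last bullet can be sketched as follows. This is a minimal illustration, assuming the spill handles a single size class so that the class index is loop-invariant. `SketchSlab`, `sketch_active_a/b`, and `sketch_count_tls_active` are hypothetical stand-ins, not names from `hakmem_tiny.c`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinySlab; only the field used here. */
typedef struct SketchSlab {
    uint8_t class_idx;
} SketchSlab;

enum { SKETCH_NUM_CLASSES = 8 };

/* Hypothetical stand-ins for g_tls_active_slab_a / g_tls_active_slab_b. */
static SketchSlab* sketch_active_a[SKETCH_NUM_CLASSES];
static SketchSlab* sketch_active_b[SKETCH_NUM_CLASSES];

/* Counts how many owners in a single-class spill are TLS-active slabs.
 * The two table loads happen once, before the loop, instead of per item. */
static size_t sketch_count_tls_active(SketchSlab* const* owners, size_t n,
                                      uint8_t class_idx) {
    SketchSlab* const tls_a = sketch_active_a[class_idx];
    SketchSlab* const tls_b = sketch_active_b[class_idx];
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        const SketchSlab* owner = owners[i];
        if (owner == NULL) continue;  /* mirrors `if (!owner) continue;` */
        if (owner == tls_a || owner == tls_b) hits++;
    }
    return hits;
}
```

If a spill can mix size classes, the loads have to stay inside the loop, as in the **After** snippet.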

#### Option B: Branch prediction hint

```c
// When spilling from the TLS Magazine, the item most likely returns to a TLS-active slab
if (__builtin_expect((owner == tls_a || owner == tls_b) &&
                     !mini_mag_is_full(&owner->mini_mag), 1)) {
    mini_mag_push(&owner->mini_mag, it.ptr);
    stats_record_free(cidx);
    continue;
}
```

**Improvements**:
- Can reduce branch mispredictions when the hint matches runtime behavior
- Tells the compiler which path is hot, so it can lay out the likely path as the fall-through
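
`__builtin_expect` is a GCC/Clang builtin, so a thin wrapper keeps the code portable to other compilers. The sketch below shows one common way to write such a wrapper. The macro and function names are hypothetical and do not come from hakmem.

```c
#include <stddef.h>

/* Hypothetical wrapper macros; fall back to a plain boolean elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define SKETCH_LIKELY(x)   (!!(x))
#  define SKETCH_UNLIKELY(x) (!!(x))
#endif

/* Counts non-NULL pointers; the NULL case is marked as the cold path. */
static size_t sketch_count_non_null(void* const* ptrs, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (SKETCH_UNLIKELY(ptrs[i] == NULL)) continue;
        count++;
    }
    return count;
}
```

The `!!(x)` normalizes any truthy value to 1, so the hint compares against the expected constant correctly.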

### Where to change

**File**: `hakmem_tiny.c`
**Function**: `hak_tiny_free_with_slab()`
**Lines**: 890-922

### Verification

```bash
# Build with Phase 4.1 applied
make clean && make bench_tiny

# Run the benchmark (3 times)
./bench_tiny 2>&1 | tail -5
./bench_tiny 2>&1 | tail -5
./bench_tiny 2>&1 | tail -5

# Expected result: 385-390 M ops/sec
```

### Success criteria

- **Minimum**: 380 M ops/sec (no worse than now)
- **Target**: 385-390 M ops/sec (+1-2%)
- **Ideal**: 390+ M ops/sec (Phase 3 level)

---

## Phase 4.2: High-water Gate (Option E-1)

### Goals
- Implementation time: **10-20 minutes**
- Expected gain: **+2-5%** (390-395 M ops/sec)
- Risk: **Low to medium**

### Implementation

#### High-water gate logic

**Concept**:
> When the TLS Magazine is at or above the high-water mark (≥75%), skip Phase 4 entirely.
> Reason: the next alloc will be served from the TLS Magazine, so pushing into the mini-mag is wasted work.

**Implementation**:
```c
// Added at the top of hak_tiny_free_with_slab()
void hak_tiny_free_with_slab(...) {
    // ... existing preamble ...

    // Phase 4.2: high-water gate
    int tls_occ = mag->count;        // current TLS Magazine occupancy
    int tls_cap = TLS_MAG_CAPACITY;  // 2048

    if (tls_occ >= (tls_cap * 3 / 4)) {
        // High-water: Phase 4 disabled
        // Write every item straight to the bitmap (existing logic)
        for (int i = 0; i < mag->count; i++) {
            PoolItem it = mag->items[i];
            TinySlab* owner = hak_tiny_owner_slab(it.ptr);
            if (!owner) continue;

            // Spill to the bitmap (reuses the existing logic)
            // ... bitmap operations ...
        }

        // Statistics
        g_tiny_pool.phase4_gate_skip[class_idx]++;
        return;
    }

    // Low-water: run Phase 4 (existing logic)
    for (int i = 0; i < mag->count; i++) {
        PoolItem it = mag->items[i];
        TinySlab* owner = hak_tiny_owner_slab(it.ptr);
        if (!owner) continue;

        // Logic optimized in Phase 4.1
        uint8_t cidx = owner->class_idx;
        TinySlab* tls_a = g_tls_active_slab_a[cidx];
        TinySlab* tls_b = g_tls_active_slab_b[cidx];

        if (__builtin_expect((owner == tls_a || owner == tls_b) &&
                             !mini_mag_is_full(&owner->mini_mag), 1)) {
            mini_mag_push(&owner->mini_mag, it.ptr);
            stats_record_free(cidx);
            g_tiny_pool.phase4_mini_push[cidx]++;
            continue;
        }

        // Spill to the bitmap
        // ... bitmap operations ...
        g_tiny_pool.phase4_bitmap_spill[cidx]++;
    }

    g_tiny_pool.phase4_spill_count[class_idx]++;
}
```

#### Constants

```c
// In hakmem_tiny.h or hakmem_tiny.c
#define TLS_MAG_CAPACITY 2048
#define TLS_MAG_HIGH_WATER (TLS_MAG_CAPACITY * 3 / 4)  // 1536
#define TLS_MAG_LOW_WATER  (TLS_MAG_CAPACITY / 4)      // 512
```
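
The gate condition can be isolated into a small predicate, which makes the 75% boundary easy to unit-test. The sketch below reuses the constant values above. The function name `sketch_gate_should_skip` is hypothetical.

```c
#include <stdbool.h>

/* Values mirror the constants above. */
#define TLS_MAG_CAPACITY   2048
#define TLS_MAG_HIGH_WATER (TLS_MAG_CAPACITY * 3 / 4)  /* 1536 */

/* Returns true when occupancy is at or above 75% of capacity,
 * i.e. when the spill should bypass the mini-mag push. */
static inline bool sketch_gate_should_skip(int tls_occupancy) {
    return tls_occupancy >= TLS_MAG_HIGH_WATER;
}
```

With these values the gate opens at exactly 1536 items: 1535 still runs Phase 4, 1536 skips it.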

#### Additional statistics

```c
// hakmem_tiny.h: add to the TinyPool struct
typedef struct {
    // Existing
    uint64_t alloc_count[TINY_NUM_CLASSES];
    uint64_t free_count[TINY_NUM_CLASSES];
    uint64_t slab_count[TINY_NUM_CLASSES];

    // Phase 4 measurement (new)
    uint64_t phase4_spill_count[TINY_NUM_CLASSES];   // Phase 4 decision count
    uint64_t phase4_mini_push[TINY_NUM_CLASSES];     // successful mini-mag pushes
    uint64_t phase4_bitmap_spill[TINY_NUM_CLASSES];  // bitmap spills
    uint64_t phase4_gate_skip[TINY_NUM_CLASSES];     // high-water skips
} TinyPool;
```

```c
// hakmem_tiny.c: initialization
void hak_tiny_init(void) {
    // ... existing initialization ...

    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        g_tiny_pool.phase4_spill_count[i] = 0;
        g_tiny_pool.phase4_mini_push[i] = 0;
        g_tiny_pool.phase4_bitmap_spill[i] = 0;
        g_tiny_pool.phase4_gate_skip[i] = 0;
    }
}
```

### Verification

```bash
# Build with Phase 4.2 applied
make clean && make bench_tiny

# Run the benchmark (3 times)
./bench_tiny 2>&1 | tail -5
./bench_tiny 2>&1 | tail -5
./bench_tiny 2>&1 | tail -5

# Check the statistics (after implementation)
# Add the Phase 4 statistics to hak_tiny_print_stats() first
./test_mf2 2>&1 | grep -A 10 "Phase 4"

# Expected result: 390-395 M ops/sec (Phase 3 level)
```

### Success criteria

- **Minimum**: 385 M ops/sec (Phase 4.1 level maintained)
- **Target**: 390-395 M ops/sec (back to Phase 3 level)
- **Ideal**: 395+ M ops/sec (beyond Phase 3)

### Revert decision

If throughput is still below 385 M ops/sec after Phase 4.2:
- **Revert all of Phase 4**
- Go back to Phase 3 (391 M ops/sec)
- Consider the pull-based approach

---

## Phase 4.3: Per-slab Batching (Option E-2)

### Goals
- Implementation time: **30-40 minutes**
- Expected gain: **+2-5%** (395-400 M ops/sec)
- Risk: **Medium to high** (complex implementation)

### Implementation

#### Per-slab grouping

**Concept**:
> Group the 256 spilled items by slab.
> The is_tls_active check drops from 256 times to once per slab (1-8 times).

**Data structure**:
```c
#define SLAB_BUCKETS 32  // number of buckets for linear probing

typedef struct {
    TinySlab* owner;
    void* ptrs[256];
    int count;
} SlabBucket;
```
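
Before adopting this layout it may be worth checking how much stack the bucket table takes, since it is declared as a local array and zero-initialized on every call. The sketch below computes the size. `TinySlab` is left opaque, and `sketch_bucket_table_bytes` is a hypothetical helper name.

```c
#include <stddef.h>

/* Opaque stand-in; only the pointer size matters here. */
typedef struct TinySlab TinySlab;

#define SLAB_BUCKETS 32

typedef struct {
    TinySlab* owner;
    void* ptrs[256];
    int count;
} SlabBucket;

/* Bytes occupied by a full `SlabBucket buckets[SLAB_BUCKETS]` table. */
static inline size_t sketch_bucket_table_bytes(void) {
    return sizeof(SlabBucket) * SLAB_BUCKETS;
}
```

On a typical 64-bit target (8-byte pointers, 4-byte `int`), each bucket is 2,064 bytes including padding, so the table is 66,048 bytes (about 64.5 KiB). Clearing that much memory per spill may offset part of the expected gain, so smaller per-bucket arrays or index lists could be considered.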

**Implementation**:
```c
void hak_tiny_free_with_slab_batched(...) {
    // Phase 4.2: high-water gate
    int tls_occ = mag->count;
    if (tls_occ >= TLS_MAG_HIGH_WATER) {
        fast_spill_all_to_bitmap(mag);
        return;
    }

    // Phase 4.3: per-slab batching
    SlabBucket buckets[SLAB_BUCKETS] = {0};

    // 1st pass: group items by slab
    for (int i = 0; i < mag->count; i++) {
        PoolItem it = mag->items[i];
        TinySlab* owner = hak_tiny_owner_slab(it.ptr);
        if (!owner) continue;

        // Linear probing, bounded so a full table cannot loop forever
        size_t hash = ((uintptr_t)owner >> 6) & (SLAB_BUCKETS - 1);
        int probes = 0;
        while (buckets[hash].owner && buckets[hash].owner != owner &&
               ++probes < SLAB_BUCKETS) {
            hash = (hash + 1) & (SLAB_BUCKETS - 1);
        }

        if ((buckets[hash].owner && buckets[hash].owner != owner) ||
            buckets[hash].count >= 256) {
            // Table or bucket is full: spill this item straight to the bitmap
            // ... bitmap operations ...
            continue;
        }

        if (!buckets[hash].owner) {
            buckets[hash].owner = owner;
        }
        buckets[hash].ptrs[buckets[hash].count++] = it.ptr;
    }

    // 2nd pass: process each slab (the check runs once per slab)
    for (int b = 0; b < SLAB_BUCKETS; b++) {
        if (!buckets[b].owner) continue;

        TinySlab* slab = buckets[b].owner;
        uint8_t cidx = slab->class_idx;
        TinySlab* tls_a = g_tls_active_slab_a[cidx];
        TinySlab* tls_b = g_tls_active_slab_b[cidx];

        int is_tls_active = (slab == tls_a || slab == tls_b);
        int room = mini_capacity(&slab->mini_mag) - mini_count(&slab->mini_mag);
        int take = is_tls_active ? min(room, buckets[b].count) : 0;

        // Push to the mini-mag in one batch
        for (int i = 0; i < take; i++) {
            mini_mag_push(&slab->mini_mag, buckets[b].ptrs[i]);
            stats_record_free(cidx);
            g_tiny_pool.phase4_mini_push[cidx]++;
        }

        // Spill the remainder to the bitmap in one batch
        for (int i = take; i < buckets[b].count; i++) {
            // ... bitmap operations ...
            g_tiny_pool.phase4_bitmap_spill[cidx]++;
        }
    }

    g_tiny_pool.phase4_spill_count[class_idx]++;
}
```
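
The linear probe in the grouping pass needs a stopping rule for the case where more distinct slabs show up than there are buckets. Below is a minimal sketch of a bounded lookup that reports "table full" so the caller can fall back to a direct bitmap spill. `SketchBucket`, `SKETCH_BUCKETS`, and `sketch_bucket_find` are hypothetical names, and the table is shrunk to 4 entries purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUCKETS 4  /* must be a power of two; small for illustration */

typedef struct {
    const void* owner;  /* NULL marks an empty bucket */
    int count;
} SketchBucket;

/* Returns the bucket index for `owner`, claiming an empty bucket if needed,
 * or -1 when every bucket is already taken by a different owner. */
static int sketch_bucket_find(SketchBucket* buckets, const void* owner) {
    size_t idx = ((uintptr_t)owner >> 6) & (SKETCH_BUCKETS - 1);
    for (int probes = 0; probes < SKETCH_BUCKETS; probes++) {
        if (buckets[idx].owner == NULL) {
            buckets[idx].owner = owner;
            return (int)idx;
        }
        if (buckets[idx].owner == owner) {
            return (int)idx;
        }
        idx = (idx + 1) & (SKETCH_BUCKETS - 1);
    }
    return -1;
}
```

A return value of -1 means the item cannot be grouped and should take the ordinary bitmap path.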

#### Helper functions

```c
// Mini-magazine capacity and current item count
static inline int mini_capacity(PageMiniMag* mag) {
    return mag->capacity;
}

static inline int mini_count(PageMiniMag* mag) {
    return mag->count;
}

// min macro (each argument may be evaluated twice, so avoid side effects)
#define min(a, b) ((a) < (b) ? (a) : (b))
```

### Verification

```bash
# Build with Phase 4.3 applied
make clean && make bench_tiny

# Run the benchmark (5 times)
for i in {1..5}; do
    ./bench_tiny 2>&1 | tail -5
done

# Check the statistics
./test_mf2 2>&1 | grep -A 10 "Phase 4"

# Expected result: 395-400 M ops/sec (beyond Phase 3)
```

### Success criteria

- **Minimum**: 390 M ops/sec (Phase 4.2 level maintained)
- **Target**: 395-400 M ops/sec (beyond Phase 3)
- **Ideal**: 400+ M ops/sec (5% improvement)

---

## Phase 4.4: Pull-based Inversion (Future)

### Goals
- Implementation time: **1-2 hours**
- Expected gain: **addresses the root cause**
- Risk: **High** (architecture change)

### Concept

**Current (push model)**:
- The free side (spill) pushes items back into the mini-mag
- Every spilled item pays the overhead
- The benefit shows up on the allocation side (and is uncertain)

**Proposed (pull model)**:
- The allocation side pulls from the mini-mag only when needed
- Zero overhead on the free side
- Allocation latency rises slightly (trade-off)

### Where to implement

**File**: `hakmem_tiny.c`
**Function**: `hak_tiny_alloc()`
**Timing**: immediately before the bitmap scan

### Implementation sketch

```c
void* hak_tiny_alloc(size_t size) {
    int class_idx = hak_tiny_size_to_class(size);
    if (class_idx < 0) return NULL;

    // 1. TLS Magazine (fast path)
    if (!mini_mag_is_empty(&g_tls_mag[class_idx])) {
        return mini_mag_pop(&g_tls_mag[class_idx]);
    }

    // 2. TLS Active Slabs (medium path)
    TinySlab* tls = g_tls_active_slab_a[class_idx];
    if (!(tls && tls->free_count > 0)) {
        tls = g_tls_active_slab_b[class_idx];
    }

    if (tls && tls->free_count > 0) {
        // Phase 4.4: pull from the page mini-mag
        if (!mini_mag_is_empty(&tls->mini_mag)) {
            void* p = mini_mag_pop(&tls->mini_mag);
            if (p) {
                // Phase 4.4: refill the TLS Magazine from the page mini-mag.
                // The TLS Magazine is empty at this point, so move up to 16
                // more items into it for the allocations that follow.
                (void)mini_pull_batch(&tls->mini_mag, &g_tls_mag[class_idx], 16);
                stats_record_alloc(class_idx);
                return p;
            }
        }

        // Fallback: bitmap scan (existing logic)
        // ...
    }

    // 3. Global pool (existing logic)
    // ...
}
```
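
`mini_pull_batch()` is not defined anywhere in this roadmap. One possible shape for it is sketched below, assuming both magazines are simple LIFO arrays. `SketchMag`, `SKETCH_MAG_CAP`, and `sketch_pull_batch` are hypothetical names, and the real `PageMiniMag` layout may differ.

```c
#include <stddef.h>

#define SKETCH_MAG_CAP 32

/* Hypothetical LIFO magazine; not the real PageMiniMag. */
typedef struct {
    void* items[SKETCH_MAG_CAP];
    int count;
} SketchMag;

/* Moves up to `max_items` pointers from `src` to `dst`.
 * Stops early when `src` runs empty or `dst` is full.
 * Returns the number of pointers moved. */
static int sketch_pull_batch(SketchMag* src, SketchMag* dst, int max_items) {
    int moved = 0;
    while (moved < max_items && src->count > 0 &&
           dst->count < SKETCH_MAG_CAP) {
        dst->items[dst->count++] = src->items[--src->count];
        moved++;
    }
    return moved;
}
```

Returning the moved count lets the caller distinguish "refilled" from "source was empty" without inspecting either magazine.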

### Changes on the free side

```c
void hak_tiny_free_with_slab(...) {
    // Phase 4.4: remove the push-model logic
    // Write every item straight to the bitmap (simpler)

    for (int i = 0; i < mag->count; i++) {
        PoolItem it = mag->items[i];
        TinySlab* owner = hak_tiny_owner_slab(it.ptr);
        if (!owner) continue;

        // Spill to the bitmap (existing logic)
        // ... bitmap operations ...
    }
}
```

### Trade-off

**Pros**:
- Free latency is stable (no overhead)
- The allocation side is in control (pulls only when needed)

**Cons**:
- Allocation latency rises slightly (cost of pulling from the mini-mag)
- More complex to implement

---

## Measurement and Diagnostics

### Statistics output

```c
#include <inttypes.h>  // PRIu64
#include <stdio.h>

void hak_tiny_print_phase4_stats(void) {
    printf("========================================\n");
    printf("Phase 4 Statistics\n");
    printf("========================================\n");

    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        uint64_t spill = g_tiny_pool.phase4_spill_count[i];
        uint64_t mini = g_tiny_pool.phase4_mini_push[i];
        uint64_t bitmap = g_tiny_pool.phase4_bitmap_spill[i];
        uint64_t gate = g_tiny_pool.phase4_gate_skip[i];

        // A gated spill returns before phase4_spill_count is incremented,
        // so the total number of spill calls is spill + gate.
        uint64_t calls = spill + gate;
        if (calls == 0) continue;

        // Guard against 0/0 when every call was gated
        uint64_t items = mini + bitmap;
        double mini_ratio = items ? (double)mini / items * 100 : 0.0;
        double bitmap_ratio = items ? 100 - mini_ratio : 0.0;
        double gate_ratio = (double)gate / calls * 100;

        printf("Class %d (%zu B):\n", i, g_tiny_class_sizes[i]);
        printf("  Spill count:    %" PRIu64 "\n", spill);
        printf("  Mini-mag push:  %" PRIu64 " (%.1f%%)\n", mini, mini_ratio);
        printf("  Bitmap spill:   %" PRIu64 " (%.1f%%)\n", bitmap, bitmap_ratio);
        printf("  Gate skip:      %" PRIu64 " (%.1f%%)\n", gate, gate_ratio);
        printf("\n");
    }
}
```
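
The ratios in this report divide by counters that can legitimately be zero, for example when every spill for a class is gated. A small helper, sketched below under the hypothetical name `sketch_percent`, keeps the zero-denominator guard in one place.

```c
#include <stdint.h>

/* Returns part/total as a percentage, or 0.0 when total is zero. */
static inline double sketch_percent(uint64_t part, uint64_t total) {
    return total ? (double)part * 100.0 / (double)total : 0.0;
}
```

For example, `sketch_percent(mini, mini + bitmap)` yields the mini-mag share without producing NaN on an idle class.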

### Benchmark comparison

```bash
# Phase 3 (baseline)
git checkout <phase3-commit>
make clean && make bench_tiny
./bench_tiny > results_phase3.txt

# Phase 4.1
git checkout <phase4.1-commit>
make clean && make bench_tiny
./bench_tiny > results_phase4_1.txt

# Phase 4.2
git checkout <phase4.2-commit>
make clean && make bench_tiny
./bench_tiny > results_phase4_2.txt

# Phase 4.3
git checkout <phase4.3-commit>
make clean && make bench_tiny
./bench_tiny > results_phase4_3.txt

# Compare each phase against the baseline (diff takes exactly two files)
for f in results_phase4_*.txt; do
    diff results_phase3.txt "$f"
done
```

---

## Success Criteria Summary

| Phase | Implementation time | Expected performance | Risk | Required |
|-------|---------------------|----------------------|------|----------|
| 4.1 (A+B) | 5-10 min | 385-390 M ops/sec | Low | ✅ Yes |
| 4.2 (Gate) | 10-20 min | 390-395 M ops/sec | Low to medium | ✅ Yes |
| 4.3 (Batch) | 30-40 min | 395-400 M ops/sec | Medium to high | ⚠️ Conditional |
| 4.4 (Pull) | 1-2 hours | Root-cause fix | High | ❌ Future |

**Revert condition**:
- Phase 4.2 still below 385 M ops/sec → revert all of Phase 4

**Continue condition**:
- Phase 4.2 reaches >= 390 M ops/sec → proceed to Phase 4.3
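
The two conditions can be written as a single decision function, which also makes the gap between them visible. The roadmap states no rule for results from 385 up to just under 390 M ops/sec. The sketch below labels that band "hold", which is an assumption. All names are hypothetical.

```c
/* Thresholds are the ones stated above, in M ops/sec. */
typedef enum {
    SKETCH_REVERT_PHASE4,   /* below 385: revert all of Phase 4 */
    SKETCH_HOLD,            /* 385 to just under 390: no rule stated (assumed) */
    SKETCH_PROCEED_TO_4_3   /* 390 or above: continue with Phase 4.3 */
} SketchDecision;

/* Maps the measured Phase 4.2 throughput to the next step. */
static SketchDecision sketch_decide_after_phase4_2(double mops) {
    if (mops < 385.0) return SKETCH_REVERT_PHASE4;
    if (mops >= 390.0) return SKETCH_PROCEED_TO_4_3;
    return SKETCH_HOLD;
}
```

Feeding it the median of several benchmark runs, rather than a single run, would make the decision less sensitive to noise.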

---

## Timeline

**Day 1 (today)**:
1. ✅ Documentation (done)
2. 🔄 Implement Phase 4.1 (5-10 min)
3. 🔄 Verify Phase 4.1 (5 min)
4. 🔄 Implement Phase 4.2 (10-20 min)
5. 🔄 Verify Phase 4.2 (5 min)
6. ✅ Commit
7. 📊 Summarize results

**Day 2 (conditional)**:
- Only if Phase 4.2 succeeds
- Implement and verify Phase 4.3

**Future**:
- Phase 4.4 (pull model) to be evaluated separately

---

## References

- PHASE4_REGRESSION_ANALYSIS.md (detailed analysis)
- ChatGPT Pro advice (2025-10-26)