# HAKMEM Build Guide
## Quick Start
### Normal Build
```bash
make bench_comprehensive_hakmem
./bench_comprehensive_hakmem
```
**Expected**: 200-220M ops/sec
### PGO-Optimized Build (Recommended)
```bash
./build_pgo.sh
./bench_comprehensive_hakmem
```
**Expected**: 300-350M ops/sec (+50-75% faster!)
### Shared Library (LD_PRELOAD) PGO Build
```bash
# Step 1: Collect a profile using the instrumented shared library
make pgo-profile-shared
# Step 2: Build the PGO-optimized shared library
make pgo-build-shared
# Run (preload the library into the system-malloc benchmark)
HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system
```
Expected: faster than the normal build even as a shared library (results vary by environment)
---
## PGO Build Script
### Usage
```bash
./build_pgo.sh [command]
```
### Commands
| Command | Description | When to Use |
|---------|-------------|-------------|
| `all` | Full PGO build (default) | First time, or after code changes |
| `clean` | Clean previous builds | Before rebuilding |
| `profile` | Build instrumented + collect profile | Step 1: Profile collection |
| `build` | Build optimized using profile | Step 2: After profile exists |
### Example Workflow
#### Full automated build (recommended)
```bash
./build_pgo.sh
```
#### Manual step-by-step
```bash
# Step 1: Collect profile
./build_pgo.sh profile
# Step 2: Build optimized
./build_pgo.sh build
```
---
## Performance Comparison
| Build Type | 128B Long-lived | Best Result | Use Case |
|------------|----------------|-------------|----------|
| **Normal** | 210M ops/s | 222M ops/s | Debug, development |
| **PGO** | 314M ops/s | 342M ops/s | Production, benchmarks |
Latest (Phase 9.3+ tiny fast-path):
- Direct (PGO): 400M+ ops/s confirmed (`bench_comprehensive_hakmem`)
- System malloc baseline: ~410M ops/s (environment-dependent)
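
To make the relative gain in the table above concrete, the percentage can be computed directly from the ops/s figures. This is a minimal sketch; `speedup_pct` is a hypothetical helper introduced here for illustration and is not part of the HAKMEM tree.

```bash
# speedup_pct BASE OPT: percentage gain of OPT over BASE (same unit for both).
# Hypothetical helper for illustration only.
speedup_pct() {
    awk -v base="$1" -v opt="$2" \
        'BEGIN { printf "%.1f\n", (opt - base) / base * 100 }'
}

speedup_pct 210 314   # 128B Long-lived: Normal vs PGO
speedup_pct 222 342   # Best Result:     Normal vs PGO
```

With the table's numbers this prints 49.5 and 54.1, so the PGO build is roughly 50% faster than the normal build on these two rows.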
## What is PGO?
**Profile-Guided Optimization (PGO)** is a compiler optimization technique:
1. **Phase 1 (Profile)**: Build instrumented binary, run representative workload
2. **Phase 2 (Optimize)**: Rebuild with profile data, compiler optimizes hot paths
**Benefits**:
- Better branch prediction
- Improved code layout (hot paths together)
- Inlining decisions based on actual usage
- +50-75% performance improvement
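
The two phases can be sketched with plain GCC on a single source file. This is an illustration of the general technique only: `pgo_two_phase` and its arguments are hypothetical, the `-O2` level is an assumption, and the flags HAKMEM actually uses are defined in `build_pgo.sh` and the Makefile, which this guide does not reproduce.

```bash
# pgo_two_phase SRC OUT: generic two-phase PGO build of one C file with GCC.
# Hypothetical helper; not the HAKMEM build recipe.
pgo_two_phase() {
    src=$1
    out=$2
    # Make sure a bare name like "demo" is executed from the current directory.
    case $out in */*) ;; *) out=./$out ;; esac

    # Phase 1 (Profile): build an instrumented binary and run the workload.
    # Running it writes *.gcda profile files.
    gcc -O2 -fprofile-generate -o "$out" "$src" || return 1
    "$out" >/dev/null || return 1

    # Phase 2 (Optimize): rebuild with the same command line, swapping
    # -fprofile-generate for -fprofile-use so GCC finds the collected profile.
    gcc -O2 -fprofile-use -o "$out" "$src"
}
```

A call such as `pgo_two_phase workload.c workload` (file names are placeholders) leaves an optimized `./workload` behind. Keeping the two compile commands identical apart from the profile flag matters because GCC derives the profile file name from the command line.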
## Requirements
- GCC with LTO/PGO support (gcc 7+)
- ~2 minutes for full PGO build
- 200MB disk space for profile data (*.gcda files)
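
The compiler requirement can be checked before starting a build. The sketch below assumes `gcc` on `PATH` is the compiler the Makefile will use; `gcc_major_at_least` is a hypothetical helper, not something shipped with HAKMEM.

```bash
# gcc_major_at_least VERSION MIN: succeed if VERSION's major number is >= MIN.
# Hypothetical helper; VERSION may be "11" or "11.4.0".
gcc_major_at_least() {
    major=${1%%.*}
    [ "$major" -ge "$2" ] 2>/dev/null
}

if command -v gcc >/dev/null 2>&1; then
    if gcc_major_at_least "$(gcc -dumpversion)" 7; then
        echo "gcc is new enough for the PGO build"
    else
        echo "gcc is older than 7; the PGO build may not work"
    fi
else
    echo "gcc not found"
fi
```

On systems where `gcc` is an alias for another compiler, `-dumpversion` may report an unrelated number, so treat a failure here as a prompt to check by hand.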
## Troubleshooting
### "Profile data not generated"
```bash
# Make sure you run the instrumented binary
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
```
### "No profile data found"
```bash
# Run profile step first
./build_pgo.sh profile
```
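
Both messages above come down to whether `*.gcda` files exist, so counting them is a quick first check. This is a sketch under the assumption that the profile files are written somewhere under the build directory; `count_gcda` is a hypothetical helper.

```bash
# count_gcda [DIR]: print the number of *.gcda profile files under DIR
# (default: current directory). Hypothetical helper for illustration.
count_gcda() {
    find "${1:-.}" -name '*.gcda' -type f | wc -l | tr -d ' '
}

n=$(count_gcda .)
if [ "$n" -eq 0 ]; then
    echo "no profile data: run ./build_pgo.sh profile first"
else
    echo "found $n profile file(s)"
fi
```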
### Clean start
```bash
./build_pgo.sh clean
./build_pgo.sh all
```
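### Benchmark Tips (Tiny)
- Enable Tiny: `export HAKMEM_WRAP_TINY=1`
- Observation and learning features are OFF by default to avoid overhead. Set the environment variables only when you want them ON.
  - Learning: `HAKMEM_LEARN=1`
  - By default, `HAKMEM_SITE_RULES` and `HAKMEM_PROF` are unset (OFF).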
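### Tiny Modes and FLINT (Front-end + Deferred Intelligence)
- Ultra Tiny (SLL-only, experimental)
  - Enable: `HAKMEM_TINY_ULTRA=1`
  - Validation ON/OFF: `HAKMEM_TINY_ULTRA_VALIDATE=0/1` (0 is recommended when measuring performance)
  - Parameters (per-class overrides):
    - `HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N`
    - `HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N`
  - Visualization: `bash scripts/run_ultra_debug_sweep.sh 60000 200`
- FLINT (Fast Lightweight INTelligence, experimental)
  - FRONT (ultra-lightweight FastCache): `HAKMEM_TINY_FRONTEND=1`
  - INT (deferred intelligence: event aggregation + background thread): `HAKMEM_INT_ENGINE=1`
  - Note: minimizes the hot path and makes learning asynchronous. Currently experimental; A/B comparison against Ultra and the normal mode is recommended.
- TinyQuickSlot (minimal front-end): `HAKMEM_TINY_QUICK=1`
  - A 6-entry stack of 64 B per class. On a hit, only one cache line is touched before returning.
  - On a miss, a small refill (SLL→Quick, Magazine→Quick) preserves locality.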
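
The per-class overrides take one variable per class, `C0` through `C7`, so a small loop saves typing when sweeping a single value across all classes. This is a minimal sketch: `set_ultra_per_class` is a hypothetical helper, and the values 64 and 256 are placeholders, not recommended settings.

```bash
# set_ultra_per_class PREFIX VALUE: export PREFIX0 .. PREFIX7 = VALUE.
# Hypothetical helper; the numeric values below are placeholders.
set_ultra_per_class() {
    for c in 0 1 2 3 4 5 6 7; do
        export "${1}${c}=${2}"
    done
}

export HAKMEM_TINY_ULTRA=1
set_ultra_per_class HAKMEM_TINY_ULTRA_BATCH_C 64
set_ultra_per_class HAKMEM_TINY_ULTRA_SLL_CAP_C 256

# Show what a benchmark started from this shell would inherit.
env | grep '^HAKMEM_TINY_ULTRA' | sort
```

Individual classes can still be overridden afterwards with a plain `export`, for example to give one class a different cap during an A/B run.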
### Scripts (with CSV output)
- Direct-link comprehensive comparison (HAKMEM vs mimalloc): `bash scripts/run_comprehensive_pair.sh`
- Tiny triad (HAKMEM/System/mimalloc): `bash scripts/run_tiny_hot_triad.sh 80000`
- Random mixed triad: `bash scripts/run_random_mixed_matrix.sh 120000`
- Ultra visualization: `bash scripts/run_ultra_debug_sweep.sh 60000 200`
- Ultra parameter sweep: `bash scripts/sweep_ultra_params.sh 40000 150`
### Fast Build Target (experimental)
```bash
make bench_fast
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
```
Note: this build trims unwind tables and similar metadata. Combining it with PGO is recommended.
---
## Advanced: Manual PGO Build
If you prefer to use the Makefile directly:
```bash
# Step 1: Profile collection
make pgo-profile
# Step 2: Optimized build
make pgo-build
# Run benchmark
./bench_comprehensive_hakmem
```
---
## Phase 8.4 Achievement
```
🏆 342M ops/s NEW RECORD! (+8.2% vs Step 3d baseline)
Top 5 Results (PGO Build):
1. 64B FIFO: 342M ops/s 🥇
2. 64B Interleaved: 342M ops/s 🥈
3. 64B Long-lived: 342M ops/s 🥉
4. 32B Long-lived: 341M ops/s
5. 128B FIFO: 341M ops/s
```
**Design**: Zero hot-path overhead ACE observer
- Removed all ACE counters from alloc/free paths (600M+ operations)
- Background Learner thread observation (1-second interval)
- Registry-based scan using existing `meta->used` field
---
Generated with [Claude Code](https://claude.com/claude-code) - Phase 8.4