# HAKMEM Build Guide
## Quick Start
### Normal Build
```bash
make bench_comprehensive_hakmem
./bench_comprehensive_hakmem
```
**Expected**: 200-220M ops/sec
### PGO-Optimized Build (Recommended)
```bash
./build_pgo.sh
./bench_comprehensive_hakmem
```
**Expected**: 300-350M ops/sec (+50-75% faster!)
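### Shared Library (LD_PRELOAD) PGO Build
```bash
# Step 1: Collect a profile using the instrumented shared library
make pgo-profile-shared
# Step 2: Build the PGO-optimized shared library
make pgo-build-shared
# Run: preload into the system-malloc build of the benchmark
HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system
```
**Expected**: Faster than the normal build even as a shared library (results vary by environment)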
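
One pitfall with `LD_PRELOAD` is worth guarding against: on glibc-based systems, a wrong library path makes the dynamic loader print an error and carry on without the library, so the benchmark ends up measuring the system allocator. A minimal sketch of a guard follows; the function name `run_preloaded` is hypothetical and not part of the HAKMEM scripts.

```bash
# Hypothetical helper (not part of HAKMEM): refuse to run when the library
# to preload is missing, instead of letting the loader silently skip it.
run_preloaded() {
    lib="$1"
    shift
    if [ ! -f "$lib" ]; then
        echo "error: $lib not found" >&2
        return 1
    fi
    # env passes both variables to the benchmark process only.
    env HAKMEM_WRAP_TINY=1 LD_PRELOAD="$lib" "$@"
}
```

Usage: `run_preloaded ./libhakmem.so ./bench_comprehensive_system`.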
---
## PGO Build Script
### Usage
```bash
./build_pgo.sh [command]
```
### Commands
| Command | Description | When to Use |
|---------|-------------|-------------|
| `all` | Full PGO build (default) | First time, or after code changes |
| `clean` | Clean previous builds | Before rebuilding |
| `profile` | Build instrumented + collect profile | Step 1: Profile collection |
| `build` | Build optimized using profile | Step 2: After profile exists |
### Example Workflow
#### Full automated build (recommended)
```bash
./build_pgo.sh
```
#### Manual step-by-step
```bash
# Step 1: Collect profile
./build_pgo.sh profile
# Step 2: Build optimized
./build_pgo.sh build
```
---
## Performance Comparison
| Build Type | 128B Long-lived | Best Result | Use Case |
|------------|----------------|-------------|----------|
| **Normal** | 210M ops/s | 222M ops/s | Debug, development |
| **PGO** | 314M ops/s | 342M ops/s | Production, benchmarks |

Latest (Phase 9.3+ tiny fast-path):
- Direct (PGO): 400M+ ops/s, confirmed with `bench_comprehensive_hakmem`
- System malloc baseline: ~410M ops/s (environment-dependent)
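
The relative gain between two builds is (new - baseline) / baseline. The sketch below recomputes it for the Normal and PGO rows of the table and can be reused with your own measurements. The function name `speedup_pct` is hypothetical, and `awk` is assumed to be installed.

```bash
# Hypothetical helper: percentage gain of a new throughput over a baseline.
# Both arguments must use the same unit (here: M ops/s).
speedup_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) * 100 / base }'
}

speedup_pct 210 314   # 128B Long-lived: Normal vs PGO
speedup_pct 222 342   # Best Result:     Normal vs PGO
```

For the table values this prints 49.5 and 54.1, roughly +50% and +54%.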
## What is PGO?
**Profile-Guided Optimization (PGO)** is a compiler optimization technique:
1. **Phase 1 (Profile)**: Build instrumented binary, run representative workload
2. **Phase 2 (Optimize)**: Rebuild with profile data, compiler optimizes hot paths
**Benefits**:
- Better branch prediction
- Improved code layout (hot paths together)
- Inlining decisions based on actual usage
- +50-75% higher throughput in this project's benchmarks (see Quick Start)
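
In GCC, the two phases correspond to the `-fprofile-generate` and `-fprofile-use` flags. The sketch below applies them to a throwaway C program in a temporary directory. It illustrates the general technique only and makes no claim about the exact flags that `build_pgo.sh` or the Makefile pass; the names `pgo_demo` and `demo.c` are hypothetical.

```bash
# Illustrative only: generic two-phase PGO with GCC on a toy program.
pgo_demo() {
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found, skipping"; return 0; }
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1
        cat > demo.c <<'EOF'
#include <stdio.h>
int main(void) {
    long sum = 0;
    for (long i = 0; i < 1000000; i++) {
        if (i % 3 == 0) sum += i;   /* a branch for the profile to describe */
    }
    printf("%ld\n", sum);
    return 0;
}
EOF
        # Phase 1: instrumented build, then run a representative workload.
        gcc -O2 -fprofile-generate demo.c -o demo || exit 1
        ./demo >/dev/null || exit 1   # writes the *.gcda profile data
        # Phase 2: rebuild so the compiler can read the collected profile.
        gcc -O2 -fprofile-use demo.c -o demo || exit 1
        ./demo
    )
    status=$?
    rm -rf "$workdir"
    return "$status"
}

pgo_demo
```

Both compile commands use the same source and output names so that the second build looks for the profile where the first run wrote it.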
## Requirements
- GCC with LTO/PGO support (gcc 7+)
- ~2 minutes for full PGO build
- 200MB disk space for profile data (*.gcda files)
## Troubleshooting
### "Profile data not generated"
```bash
# Make sure you run the instrumented binary
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
```
### "No profile data found"
```bash
# Run profile step first
./build_pgo.sh profile
```
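
Because the profile is stored as `*.gcda` files (see Requirements), checking for them before the `build` step is a quick way to tell which of the two errors above applies. A minimal sketch follows; the function name `has_profile_data` is hypothetical, and searching the current directory is an assumption, since this guide does not say where `build_pgo.sh` writes its profile.

```bash
# Hypothetical helper: succeeds if at least one .gcda file exists under
# the given directory (default: current directory).
has_profile_data() {
    [ -n "$(find "${1:-.}" -name '*.gcda' -type f 2>/dev/null | head -n 1)" ]
}

if has_profile_data .; then
    echo "profile data found"
else
    echo "no profile data; run ./build_pgo.sh profile first"
fi
```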
### Clean start
```bash
./build_pgo.sh clean
./build_pgo.sh all
```
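### Benchmark Tips (for Tiny)
- Enable Tiny: `export HAKMEM_WRAP_TINY=1`
- Observation and learning features are OFF by default (to avoid overhead). Set the environment variables only when you want them ON.
  - Learning: `HAKMEM_LEARN=1`
  - By default, `HAKMEM_SITE_RULES` and `HAKMEM_PROF` are unset (OFF)
### Tiny Modes and FLINT (Front-End + Deferred Intelligence)
- Ultra Tiny (SLL-only, experimental)
  - Enable: `HAKMEM_TINY_ULTRA=1`
  - Validation ON/OFF: `HAKMEM_TINY_ULTRA_VALIDATE=0/1` (0 recommended when measuring performance)
  - Parameters (per-class overrides):
    - `HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N`
    - `HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N`
  - Visualization: `bash scripts/run_ultra_debug_sweep.sh 60000 200`
- FLINT (Fast Lightweight INTelligence, experimental)
  - FRONT (ultra-lightweight FastCache): `HAKMEM_TINY_FRONTEND=1`
  - INT (deferred intelligence; event aggregation on a background thread): `HAKMEM_INT_ENGINE=1`
  - Note: The hot path is kept minimal and learning runs asynchronously. Currently experimental; A/B comparison against Ultra and the normal mode is recommended.
- TinyQuickSlot (minimal front-end): `HAKMEM_TINY_QUICK=1`
  - A 6-entry stack occupying 64 B per class. On a hit, only one cache line is touched before returning.
  - On a miss, small refills (SLL→Quick, Magazine→Quick) preserve locality.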
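
The `C{0..7}` suffix in the Ultra Tiny parameter names stands for one variable per class, 0 through 7. The sketch below sets all eight of each at once. The function name `set_ultra_params` and the values 64 and 256 are purely illustrative and are not recommended settings.

```bash
# Hypothetical helper: apply one batch size and one SLL cap to all 8 classes.
set_ultra_params() {
    batch="$1"
    cap="$2"
    for c in 0 1 2 3 4 5 6 7; do
        export "HAKMEM_TINY_ULTRA_BATCH_C${c}=${batch}"
        export "HAKMEM_TINY_ULTRA_SLL_CAP_C${c}=${cap}"
    done
}

export HAKMEM_WRAP_TINY=1 HAKMEM_TINY_ULTRA=1 HAKMEM_TINY_ULTRA_VALIDATE=0
set_ultra_params 64 256   # illustrative values only
env | grep '^HAKMEM_TINY_ULTRA' | sort
```

Individual classes can still be overridden afterwards by exporting a single variable, for example `HAKMEM_TINY_ULTRA_BATCH_C3`.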
### Script Collection (with CSV Output)
- Direct-link comprehensive comparison (HAKMEM vs mimalloc): `bash scripts/run_comprehensive_pair.sh`
- Tiny triad (HAKMEM/System/mimalloc): `bash scripts/run_tiny_hot_triad.sh 80000`
- Random mixed triad: `bash scripts/run_random_mixed_matrix.sh 120000`
- Ultra visualization: `bash scripts/run_ultra_debug_sweep.sh 60000 200`
- Ultra parameter sweep: `bash scripts/sweep_ultra_params.sh 40000 150`
### Fast Build Target (Experimental)
```bash
make bench_fast
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
```
Note: This build strips unwind tables and similar data. Using it together with PGO is recommended.
---
## Advanced: Manual PGO Build
If you prefer to use the Makefile directly:
```bash
# Step 1: Profile collection
make pgo-profile
# Step 2: Optimized build
make pgo-build
# Run benchmark
./bench_comprehensive_hakmem
```
---
## Phase 8.4 Achievement
```
🏆 342M ops/s NEW RECORD! (+8.2% vs Step 3d baseline)
Top 5 Results (PGO Build):
1. 64B FIFO: 342M ops/s 🥇
2. 64B Interleaved: 342M ops/s 🥈
3. 64B Long-lived: 342M ops/s 🥉
4. 32B Long-lived: 341M ops/s
5. 128B FIFO: 341M ops/s
```
**Design**: Zero hot-path overhead ACE observer
- Removed all ACE counters from alloc/free paths (600M+ operations)
- Background Learner thread observation (1-second interval)
- Registry-based scan using existing `meta->used` field
---
Generated with [Claude Code](https://claude.com/claude-code) - Phase 8.4