HAKMEM Build Guide
Quick Start
Normal Build
make bench_comprehensive_hakmem
./bench_comprehensive_hakmem
Expected: 200-220M ops/sec
PGO-Optimized Build (Recommended)
./build_pgo.sh
./bench_comprehensive_hakmem
Expected: 300-350M ops/sec (+50-75% faster!)
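Shared Library (LD_PRELOAD) PGO Build
# Step 1: Collect a profile with an instrumented shared library
make pgo-profile-shared
# Step 2: Build the PGO-optimized shared library
make pgo-build-shared
# Run it by preloading the library into the system-malloc benchmark binary
HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system
Expected: faster than the normal build even as a shared library (results vary by environment)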
PGO Build Script
Usage
./build_pgo.sh [command]
Commands
| Command | Description | When to Use |
|---|---|---|
| all | Full PGO build (default) | First time, or after code changes |
| clean | Clean previous builds | Before rebuilding |
| profile | Build instrumented binary + collect profile | Step 1: Profile collection |
| build | Build optimized binary using the profile | Step 2: After the profile exists |
Example Workflow
Full automated build (recommended)
./build_pgo.sh
Manual step-by-step
# Step 1: Collect profile
./build_pgo.sh profile
# Step 2: Build optimized
./build_pgo.sh build
Performance Comparison
| Build Type | 128B Long-lived | Best Result | Use Case |
|---|---|---|---|
| Normal | 210M ops/s | 222M ops/s | Debug, development |
| PGO | 314M ops/s | 342M ops/s | Production, benchmarks |
Latest (Phase 9.3+ tiny fast-path):
- Direct link (PGO): 400M+ ops/s confirmed (bench_comprehensive_hakmem)
- System malloc baseline: ~410M ops/s (environment-dependent)
What is PGO?
Profile-Guided Optimization (PGO) is a compiler optimization technique:
- Phase 1 (Profile): Build instrumented binary, run representative workload
- Phase 2 (Optimize): Rebuild with profile data, compiler optimizes hot paths
Benefits:
- Better branch prediction
- Improved code layout (hot paths together)
- Inlining decisions based on actual usage
- +50-75% throughput improvement in this project's benchmarks
Requirements
- GCC with LTO/PGO support (gcc 7+)
- ~2 minutes for full PGO build
- 200MB disk space for profile data (*.gcda files)
Troubleshooting
"Profile data not generated"
# Make sure you run the instrumented binary
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
"No profile data found"
# Run profile step first
./build_pgo.sh profile
Clean start
./build_pgo.sh clean
./build_pgo.sh all
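Benchmark Tips (Tiny)
- Enable Tiny: export HAKMEM_WRAP_TINY=1
- Observation and learning features are OFF by default (to avoid overhead). Set the environment variables below only when you want them ON:
  - Learning: HAKMEM_LEARN=1
  - By default, HAKMEM_SITE_RULES and HAKMEM_PROF are unset (OFF)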
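Tiny Modes and FLINT (Front + Deferred Intelligence)
- Ultra Tiny (SLL-only, experimental)
  - Enable: HAKMEM_TINY_ULTRA=1
  - Validation ON/OFF: HAKMEM_TINY_ULTRA_VALIDATE=0/1 (use 0 when measuring performance)
  - Per-class parameter overrides: HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N, HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N
  - Visualization: bash scripts/run_ultra_debug_sweep.sh 60000 200
- FLINT (Fast Lightweight INTelligence, experimental)
  - FRONT (ultra-lightweight FastCache): HAKMEM_TINY_FRONTEND=1
  - INT (deferred intelligence: event aggregation + background thread): HAKMEM_INT_ENGINE=1
  - Note: the hot path is kept minimal and learning runs asynchronously. Still experimental; A/B comparison against Ultra and the normal mode is recommended.
- TinyQuickSlot (minimal front end): HAKMEM_TINY_QUICK=1
  - A 6-entry stack occupying 64B per class. On a hit, only one cache line is touched before returning.
  - On a miss, small refills (SLL→Quick, Magazine→Quick) preserve locality.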
Scripts (with CSV output)
- Direct-link comprehensive comparison (HAKMEM vs mimalloc): bash scripts/run_comprehensive_pair.sh
- Tiny triad (HAKMEM/System/mimalloc): bash scripts/run_tiny_hot_triad.sh 80000
- Random mixed triad: bash scripts/run_random_mixed_matrix.sh 120000
- Ultra visualization: bash scripts/run_ultra_debug_sweep.sh 60000 200
- Ultra parameter sweep: bash scripts/sweep_ultra_params.sh 40000 150
Fast Build Target (Experimental)
make bench_fast
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
Note: this build trims unwind tables and similar metadata. Combining it with PGO is recommended.
Advanced: Manual PGO Build
If you prefer to use the Makefile directly:
# Step 1: Profile collection
make pgo-profile
# Step 2: Optimized build
make pgo-build
# Run benchmark
./bench_comprehensive_hakmem
Phase 8.4 Achievement
🏆 342M ops/s NEW RECORD! (+8.2% vs Step 3d baseline)
Top 5 Results (PGO Build):
1. 64B FIFO: 342M ops/s 🥇
2. 64B Interleaved: 342M ops/s 🥈
3. 64B Long-lived: 342M ops/s 🥉
4. 32B Long-lived: 341M ops/s
5. 128B FIFO: 341M ops/s
Design: Zero hot-path overhead ACE observer
- Removed all ACE counters from alloc/free paths (600M+ operations)
- Background Learner thread observation (1-second interval)
- Registry-based scan using the existing meta->used field
Generated with Claude Code - Phase 8.4