hakmem/docs/archive/BUILD_README.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
2025-11-05 12:31:14 +09:00

HAKMEM Build Guide

Quick Start

Normal Build

make bench_comprehensive_hakmem
./bench_comprehensive_hakmem

Expected: 200-220M ops/sec

PGO Build

./build_pgo.sh
./bench_comprehensive_hakmem

Expected: 300-350M ops/sec (50-75% faster than the normal build)

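Shared Library (LD_PRELOAD) PGO Build

# Step 1: Collect a profile with the instrumented shared library
make pgo-profile-shared

# Step 2: Build the PGO-optimized shared library
make pgo-build-shared

# Run: preload the library into the system-malloc benchmark
HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system

Expected: faster than the normal build even as a shared library (results vary by environment)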
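One thing worth guarding against when benchmarking through LD_PRELOAD: if the library path is wrong, the glibc dynamic loader prints an error and ignores the preload, so the benchmark quietly measures the system allocator instead. The sketch below is a hypothetical wrapper (the name `preload_run` is not part of HAKMEM) that refuses to run when the library file is missing.

```shell
#!/bin/sh
# Run a command with a shared library preloaded, refusing to run if the
# library file is missing. preload_run is a hypothetical helper name.
# Usage: preload_run LIB command [args...]
preload_run() {
    lib=$1
    shift
    if [ ! -f "$lib" ]; then
        echo "preload_run: $lib not found (build it first)" >&2
        return 1
    fi
    # Same environment as the command in this guide, applied to one process only.
    HAKMEM_WRAP_TINY=1 LD_PRELOAD="$lib" "$@"
}

# Falls through to the message when the library has not been built yet.
preload_run ./libhakmem.so ./bench_comprehensive_system ||
    echo "benchmark skipped"
```

Because the variables are set only for the wrapped command, the rest of the shell session keeps using the system allocator.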


PGO Build Script

Usage

./build_pgo.sh [command]

Commands

| Command | Description | When to Use |
|---------|-------------|-------------|
| all | Full PGO build (default) | First time, or after code changes |
| clean | Clean previous builds | Before rebuilding |
| profile | Build instrumented + collect profile | Step 1: Profile collection |
| build | Build optimized using profile | Step 2: After profile exists |

Example Workflow

./build_pgo.sh

Manual step-by-step

# Step 1: Collect profile
./build_pgo.sh profile

# Step 2: Build optimized
./build_pgo.sh build

Performance Comparison

| Build Type | 128B Long-lived | Best Result | Use Case |
|------------|-----------------|-------------|----------|
| Normal | 210M ops/s | 222M ops/s | Debug, development |
| PGO | 314M ops/s | 342M ops/s | Production, benchmarks |
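To make the comparison concrete, the relative gain can be computed directly from the table figures. The helper below is a small sketch; `speedup_pct` is a name introduced here for illustration.

```shell
#!/bin/sh
# Percentage speedup of a new throughput over a baseline throughput.
# Usage: speedup_pct BASE NEW   (both in the same unit, e.g. M ops/s)
speedup_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Figures taken from the Normal and PGO rows of the table.
echo "128B Long-lived: +$(speedup_pct 210 314)%"
echo "Best result:     +$(speedup_pct 222 342)%"
```

For these rows the gains come out at about +49.5% and +54.1%, which sits at the lower end of the 50-75% range quoted elsewhere in this guide.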

Latest (Phase 9.3+ tiny fast-path):

  • Direct (PGO): 400M+ ops/s (confirmed with bench_comprehensive_hakmem)
  • System malloc baseline: ~410M ops/s (environment-dependent)

What is PGO?

Profile-Guided Optimization (PGO) is a compiler optimization technique:

  1. Phase 1 (Profile): Build instrumented binary, run representative workload
  2. Phase 2 (Optimize): Rebuild with profile data, compiler optimizes hot paths

Benefits:

  • Better branch prediction
  • Improved code layout (hot paths together)
  • Inlining decisions based on actual usage
  • +50-75% throughput improvement on the HAKMEM benchmarks in this guide
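The two phases can be tried in isolation on a throwaway program. The sketch below uses GCC's `-fprofile-generate` and `-fprofile-use` flags on a hypothetical `demo.c`. It is not what `build_pgo.sh` does internally, which may use different flags and targets.

```shell
#!/bin/sh
# Two-phase PGO on a throwaway C file (hypothetical demo.c, not part of HAKMEM).
pgo_demo() {
    command -v gcc >/dev/null 2>&1 || { echo "gcc not found, skipping"; return 0; }
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1
        cat > demo.c <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++) {
        if (i % 3 == 0) sum += i;   /* branch the profile will describe */
    }
    printf("%lu\n", sum);
    return 0;
}
EOF
        # Phase 1: instrumented build; the run writes demo.gcda next to demo.o.
        gcc -O2 -fprofile-generate -c demo.c -o demo.o &&
        gcc -fprofile-generate demo.o -o demo_instrumented &&
        ./demo_instrumented >/dev/null &&
        # Phase 2: rebuild the same object, now guided by the collected profile.
        gcc -O2 -fprofile-use -c demo.c -o demo.o &&
        gcc demo.o -o demo_optimized &&
        ./demo_optimized
    )
    status=$?
    rm -rf "$workdir"
    return "$status"
}

pgo_demo
```

Compiling to an object file in both phases keeps the profile file name stable, so the second compile finds the data written by the instrumented run.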

Requirements

  • GCC with LTO/PGO support (gcc 7+)
  • ~2 minutes for full PGO build
  • 200MB disk space for profile data (*.gcda files)

Troubleshooting

"Profile data not generated"

# Make sure you run the instrumented binary
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem

"No profile data found"

# Run profile step first
./build_pgo.sh profile
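A quick way to tell which of the two situations applies is to look for `*.gcda` files before running the build step. The check below is a sketch. It assumes the profile data lands somewhere under the current directory, which this guide does not confirm, and `have_profile_data` is a name made up for illustration.

```shell
#!/bin/sh
# Succeeds if at least one *.gcda profile file exists under the given directory.
# have_profile_data is a hypothetical helper; the search root is an assumption.
have_profile_data() {
    dir=${1:-.}
    [ -n "$(find "$dir" -name '*.gcda' -type f 2>/dev/null | head -n 1)" ]
}

if have_profile_data .; then
    echo "profile data found: ready for the build step"
else
    echo "no profile data: run the profile step first"
fi
```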

Clean start

./build_pgo.sh clean
./build_pgo.sh all

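Benchmark Tips (for Tiny)

  • Enable Tiny: export HAKMEM_WRAP_TINY=1
  • Observation and learning features are off by default to avoid overhead. Set their environment variables only when you want them on.
    • Learning: HAKMEM_LEARN=1
    • By default, HAKMEM_SITE_RULES and HAKMEM_PROF are unset (off)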
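Since these switches are plain environment variables, comparing a run with learning off against one with it on only needs two invocations with different overrides. The helper below is a sketch; `run_variant` is a hypothetical name, and the probe command stands in for the real benchmark binary.

```shell
#!/bin/sh
# Run one command under a labelled set of environment overrides.
# Usage: run_variant LABEL "VAR=value ..." command [args...]
run_variant() {
    label=$1
    overrides=$2
    shift 2
    echo "== $label =="
    # Word splitting of $overrides is intentional: each VAR=value pair
    # becomes a separate argument to env.
    # shellcheck disable=SC2086
    env $overrides "$@"
}

# Stand-in for the benchmark: reports which switches the process sees.
probe='echo "HAKMEM_WRAP_TINY=${HAKMEM_WRAP_TINY:-unset} HAKMEM_LEARN=${HAKMEM_LEARN:-unset}"'

run_variant "learning off (default)" "HAKMEM_WRAP_TINY=1" sh -c "$probe"
run_variant "learning on" "HAKMEM_WRAP_TINY=1 HAKMEM_LEARN=1" sh -c "$probe"
```

For a real comparison, replace `sh -c "$probe"` with `./bench_comprehensive_hakmem`. Passing the overrides through `env` keeps them out of the calling shell, so one variant cannot leak into the next.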

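Tiny Modes and FLINT (Front End + Deferred Intelligence)

  • Ultra Tiny (SLL-only, experimental)
    • Enable: HAKMEM_TINY_ULTRA=1
    • Validation on/off: HAKMEM_TINY_ULTRA_VALIDATE=0/1 (0 recommended when measuring performance)
    • Parameters (per-class overrides):
      • HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N
      • HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N
    • Visualization: bash scripts/run_ultra_debug_sweep.sh 60000 200
  • FLINT (Fast Lightweight INTelligence, experimental)
    • FRONT (ultra-lightweight FastCache): HAKMEM_TINY_FRONTEND=1
    • INT (deferred intelligence: event aggregation on a background thread): HAKMEM_INT_ENGINE=1
    • Note: keeps the hot path minimal and makes learning asynchronous. Currently experimental; A/B comparison against Ultra and the normal mode is recommended.
    • TinyQuickSlot (minimal front end): HAKMEM_TINY_QUICK=1
      • A 6-entry stack occupying 64 B per class. On a hit, it touches only one cache line before returning.
      • On a miss, it refills a small batch via SLL→Quick or Magazine→Quick to preserve locality.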
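The per-class Ultra parameters expand to eight variables each (C0 through C7), which is tedious to type by hand. The loop below is a sketch of setting them uniformly. `set_ultra_per_class` is a hypothetical helper, and the values 32 and 128 are placeholders rather than tuned or recommended settings.

```shell
#!/bin/sh
# Export one Ultra Tiny parameter for every class index 0..7.
# set_ultra_per_class is a hypothetical helper name.
# Usage: set_ultra_per_class PREFIX VALUE
set_ultra_per_class() {
    prefix=$1
    value=$2
    for class in 0 1 2 3 4 5 6 7; do
        export "${prefix}_C${class}=${value}"
    done
}

export HAKMEM_TINY_ULTRA=1
export HAKMEM_TINY_ULTRA_VALIDATE=0                 # validation off for timing runs
set_ultra_per_class HAKMEM_TINY_ULTRA_BATCH 32      # placeholder value
set_ultra_per_class HAKMEM_TINY_ULTRA_SLL_CAP 128   # placeholder value

# Show what a benchmark started from this shell would inherit.
env | grep '^HAKMEM_TINY_ULTRA' | sort
```

To give a single class a different value, export that one variable again after the loop, for example `export HAKMEM_TINY_ULTRA_BATCH_C0=64` (again an arbitrary number).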

Scripts (with CSV output)

  • Direct-link comprehensive comparison (HAKMEM vs mimalloc): bash scripts/run_comprehensive_pair.sh
  • Tiny triad (HAKMEM/System/mimalloc): bash scripts/run_tiny_hot_triad.sh 80000
  • Random mixed triad: bash scripts/run_random_mixed_matrix.sh 120000
  • Ultra visualization: bash scripts/run_ultra_debug_sweep.sh 60000 200
  • Ultra parameter sweep: bash scripts/sweep_ultra_params.sh 40000 150

Fast Build Target (experimental)

make bench_fast
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem

Note: this build strips unwind tables and similar metadata. Using it together with PGO is recommended.


Advanced: Manual PGO Build

If you prefer to use the Makefile directly:

# Step 1: Profile collection
make pgo-profile

# Step 2: Optimized build
make pgo-build

# Run benchmark
./bench_comprehensive_hakmem

Phase 8.4 Achievement

🏆 342M ops/s NEW RECORD! (+8.2% vs Step 3d baseline)

Top 5 Results (PGO Build):
1. 64B FIFO:        342M ops/s 🥇
2. 64B Interleaved: 342M ops/s 🥈
3. 64B Long-lived:  342M ops/s 🥉
4. 32B Long-lived:  341M ops/s
5. 128B FIFO:       341M ops/s

Design: Zero hot-path overhead ACE observer

  • Removed all ACE counters from alloc/free paths (600M+ operations)
  • Background Learner thread observation (1-second interval)
  • Registry-based scan using existing meta->used field

Generated with Claude Code - Phase 8.4