Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History

2025-11-05 12:31:14 +09:00

HAKMEM Build Guide

Quick Start

Normal Build

make bench_comprehensive_hakmem
./bench_comprehensive_hakmem

Expected: 200-220M ops/sec

PGO-Optimized Build (Recommended)

./build_pgo.sh
./bench_comprehensive_hakmem

Expected: 300-350M ops/sec (about 50-75% faster than the normal build)

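Shared Library (LD_PRELOAD) PGO Build

# Step 1: Collect a profile using the instrumented shared library
make pgo-profile-shared

# Step 2: Build the PGO-optimized shared library
make pgo-build-shared

# Run (preload HAKMEM into the system-malloc benchmark binary)
HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system

Expected: faster than the normal build even as a shared library (results vary by environment)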
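
On glibc-based systems, a missing LD_PRELOAD library only produces a loader warning, so the benchmark can silently end up measuring the system allocator. A minimal sketch of a guard against that, assuming a hypothetical helper name `run_with_preload` that is not part of the HAKMEM scripts:

```sh
# Hypothetical helper (not part of HAKMEM): refuse to run when the
# preload library is missing, instead of silently measuring system malloc.
run_with_preload() {
    lib=$1
    shift
    if [ ! -f "$lib" ]; then
        echo "error: $lib not found; run 'make pgo-build-shared' first" >&2
        return 1
    fi
    # env sets both variables only for the launched command.
    env HAKMEM_WRAP_TINY=1 LD_PRELOAD="$lib" "$@"
}

# Usage:
#   run_with_preload ./libhakmem.so ./bench_comprehensive_system
```

The helper keeps both variables scoped to the single benchmark process, so the surrounding shell is not affected by the preload.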

PGO Build Script

Usage

./build_pgo.sh [command]

Commands

Command	Description	When to Use
`all`	Full PGO build (default)	First time, or after code changes
`clean`	Clean previous builds	Before rebuilding
`profile`	Build instrumented + collect profile	Step 1: Profile collection
`build`	Build optimized using profile	Step 2: After profile exists

Example Workflow

Full automated build (recommended)

./build_pgo.sh

Manual step-by-step

# Step 1: Collect profile
./build_pgo.sh profile

# Step 2: Build optimized
./build_pgo.sh build

Performance Comparison

Build Type	128B Long-lived	Best Result	Use Case
Normal	210M ops/s	222M ops/s	Debug, development
PGO	314M ops/s	342M ops/s	Production, benchmarks
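
The relative gain implied by the table can be checked with a little arithmetic. The sketch below uses a hypothetical helper, `speedup_pct`, and takes its inputs straight from the two rows above:

```sh
# Percentage change from a baseline throughput (b) to a new throughput (n).
speedup_pct() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) / b * 100 }'
}

speedup_pct 210 314   # 128B long-lived column: prints 49.5
speedup_pct 222 342   # best-result column:     prints 54.1
```

For these two columns the PGO build comes out roughly 50-54% faster than the normal build.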

Latest (Phase 9.3+ tiny fast-path):

Direct (PGO): 400M+ ops/s confirmed (bench_comprehensive_hakmem)
System malloc baseline: ~410M ops/s (environment-dependent)

What is PGO?

Profile-Guided Optimization (PGO) is a compiler optimization technique:

Phase 1 (Profile): Build instrumented binary, run representative workload
Phase 2 (Optimize): Rebuild with profile data, compiler optimizes hot paths

Benefits:

Better branch prediction
Improved code layout (hot paths together)
Inlining decisions based on actual usage
+50-75% performance improvement
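
The two phases map onto GCC's `-fprofile-generate` and `-fprofile-use` options. The sketch below only prints a generic command sequence for a single-file program. The `-O2` level and the file names are illustrative assumptions, and whether HAKMEM's Makefile uses exactly these flags is not confirmed here:

```sh
# Print (do not execute) a generic GCC two-phase PGO command sequence.
# The -O2 level and the file names are illustrative assumptions.
pgo_plan() {
    src=$1
    bin=$2
    echo "gcc -O2 -fprofile-generate $src -o $bin"   # Phase 1: instrumented build
    echo "./$bin"                                    # run a representative workload; writes *.gcda
    echo "gcc -O2 -fprofile-use $src -o $bin"        # Phase 2: rebuild using the profile
}

pgo_plan demo.c demo
```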

Requirements

GCC with LTO/PGO support (gcc 7+)
~2 minutes for full PGO build
200MB disk space for profile data (*.gcda files)
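
The compiler requirement can be checked before starting a build. A minimal sketch, assuming a hypothetical helper `gcc_major_at_least` that compares only the major version component:

```sh
# Succeeds when the major component of a version string is >= the minimum.
gcc_major_at_least() {
    major=${1%%.*}          # "7.5.0" -> "7", "13" -> "13"
    [ "$major" -ge "$2" ]
}

# Usage (requires gcc on PATH):
#   gcc_major_at_least "$(gcc -dumpversion)" 7 || echo "gcc 7+ required" >&2
```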

Troubleshooting

"Profile data not generated"

# Make sure you run the instrumented binary
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem

"No profile data found"

# Run profile step first
./build_pgo.sh profile
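
Both messages come down to whether any `*.gcda` files exist after the instrumented run. GCC writes them when the instrumented program exits normally. A small sketch for checking this, assuming the profile files land somewhere under the directory you pass in (`count_gcda` is a hypothetical helper name):

```sh
# Count GCC profile data files (*.gcda) under a directory tree.
count_gcda() {
    find "$1" -type f -name '*.gcda' | wc -l | tr -d '[:space:]'
}

# Usage:
#   [ "$(count_gcda .)" -gt 0 ] || echo "no profile data; run ./build_pgo.sh profile" >&2
```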

Clean start

./build_pgo.sh clean
./build_pgo.sh all

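Benchmarking Tips (for Tiny)

Enable Tiny: export HAKMEM_WRAP_TINY=1
Observation and learning features are OFF by default (to avoid overhead). Set the environment variables only when you want them ON.
- Learning: HAKMEM_LEARN=1
- By default, HAKMEM_SITE_RULES and HAKMEM_PROF are unset (OFF)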

Tiny Modes and FLINT (Front + Deferred Intelligence)

Ultra Tiny (SLL-only, experimental)
- Enable: HAKMEM_TINY_ULTRA=1
- Validation toggle: HAKMEM_TINY_ULTRA_VALIDATE=0/1 (0 recommended when measuring performance)
- Parameters (per-class overrides):
  - HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N
  - HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N
- Visualization: bash scripts/run_ultra_debug_sweep.sh 60000 200
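
Setting sixteen per-class variables by hand is error-prone, so a loop can help. This is a minimal sketch that assumes `C{0..7}` stands for the literal suffixes `C0` through `C7`. The helper name and the values 16 and 64 are placeholders, not tuned recommendations:

```sh
# Export the same batch size and SLL capacity for every Tiny class 0..7.
# Assumes C{0..7} in the variable names expands to the suffixes C0 .. C7.
set_ultra_per_class() {
    batch=$1
    cap=$2
    for c in 0 1 2 3 4 5 6 7; do
        export "HAKMEM_TINY_ULTRA_BATCH_C${c}=${batch}"
        export "HAKMEM_TINY_ULTRA_SLL_CAP_C${c}=${cap}"
    done
}

# Usage (16 and 64 are placeholder values):
#   export HAKMEM_TINY_ULTRA=1 HAKMEM_TINY_ULTRA_VALIDATE=0
#   set_ultra_per_class 16 64
```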
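
FLINT (Fast Lightweight INTelligence, experimental)
- FRONT (ultra-lightweight FastCache): HAKMEM_TINY_FRONTEND=1
- INT (deferred intelligence: event aggregation + background thread): HAKMEM_INT_ENGINE=1
- Note: minimizes the hot path and makes learning asynchronous. Currently experimental (A/B comparison against Ultra and the normal mode is recommended).
- TinyQuickSlot (minimal front end): HAKMEM_TINY_QUICK=1
  - A 6-entry stack occupying 64B per class. On a hit, only one cache line is touched before returning.
  - On a miss, a small refill (SLL→Quick, Magazine→Quick) preserves locality.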
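
For an A/B comparison, each mode can be expressed as a set of environment assignments applied to the same benchmark binary. The sketch below is one way to organize that. The helper name `mode_env` is hypothetical, and combining every mode with HAKMEM_WRAP_TINY=1 is an assumption rather than something the guide states:

```sh
# Print the environment assignments for one Tiny mode.
# Assumption: every mode also needs HAKMEM_WRAP_TINY=1.
mode_env() {
    case $1 in
        normal) echo "HAKMEM_WRAP_TINY=1" ;;
        ultra)  echo "HAKMEM_WRAP_TINY=1 HAKMEM_TINY_ULTRA=1 HAKMEM_TINY_ULTRA_VALIDATE=0" ;;
        flint)  echo "HAKMEM_WRAP_TINY=1 HAKMEM_TINY_FRONTEND=1 HAKMEM_INT_ENGINE=1" ;;
        *)      echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Usage: run the same benchmark once per mode.
#   for m in normal ultra flint; do
#       env $(mode_env "$m") ./bench_comprehensive_hakmem
#   done
```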

Scripts (with CSV output)

Direct-link comprehensive comparison (HAKMEM vs mimalloc): bash scripts/run_comprehensive_pair.sh
Tiny triad (HAKMEM/System/mimalloc): bash scripts/run_tiny_hot_triad.sh 80000
Random mixed triad: bash scripts/run_random_mixed_matrix.sh 120000
Ultra visualization: bash scripts/run_ultra_debug_sweep.sh 60000 200
Ultra parameter sweep: bash scripts/sweep_ultra_params.sh 40000 150

Fast Build Target (experimental)

make bench_fast
HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem

Note: this build strips unwind tables and similar data. Combining it with PGO is recommended.

Advanced: Manual PGO Build

If you prefer to use the Makefile directly:

# Step 1: Profile collection
make pgo-profile

# Step 2: Optimized build
make pgo-build

# Run benchmark
./bench_comprehensive_hakmem

Phase 8.4 Achievement

🏆 342M ops/s NEW RECORD! (+8.2% vs Step 3d baseline)

Top 5 Results (PGO Build):
1. 64B FIFO:        342M ops/s 🥇
2. 64B Interleaved: 342M ops/s 🥈
3. 64B Long-lived:  342M ops/s 🥉
4. 32B Long-lived:  341M ops/s
5. 128B FIFO:       341M ops/s

Design: Zero hot-path overhead ACE observer

Removed all ACE counters from alloc/free paths (600M+ operations)
Background Learner thread observation (1-second interval)
Registry-based scan using existing meta->used field

Generated with Claude Code - Phase 8.4
