Files
hakmem/core/hakmem_phase7_config.h

138 lines
5.0 KiB
C
Raw Normal View History

Phase 7 Task 3: Pre-warm TLS cache (+180-280% improvement!) MAJOR SUCCESS: HAKMEM now achieves 85-92% of System malloc on tiny allocations (128-512B) and BEATS System at 146% on 1024B allocations! Performance Results: - Random Mixed 128B: 21M → 59M ops/s (+181%) 🚀 - Random Mixed 256B: 19M → 70M ops/s (+268%) 🚀 - Random Mixed 512B: 21M → 68M ops/s (+224%) 🚀 - Random Mixed 1024B: 21M → 65M ops/s (+210%, 146% of System!) 🏆 - Larson 1T: 2.68M ops/s (stable, no regression) Implementation: 1. Task 3a: Remove profiling overhead in release builds - Wrapped RDTSC calls in #if !HAKMEM_BUILD_RELEASE - Compiler can eliminate profiling code completely - Effect: +2% (2.68M → 2.73M Larson) 2. Task 3b: Simplify refill logic - Use constants from hakmem_build_flags.h - TLS cache already optimal - Effect: No regression 3. Task 3c: Pre-warm TLS cache (GAME CHANGER!) - Pre-allocate 16 blocks per class at init - Eliminates cold-start penalty - Effect: +180-280% improvement 🚀 Root Cause: The bottleneck was cold-start, not the hot path! First allocation in each class triggered a SuperSlab refill (100+ cycles). Pre-warming eliminated this penalty, revealing Phase 7's true potential. Files Modified: - core/hakmem_tiny.c: Pre-warm function implementation - core/box/hak_core_init.inc.h: Pre-warm initialization call - core/tiny_alloc_fast.inc.h: Profiling overhead removal - core/hakmem_phase7_config.h: Task 3 constants (NEW) - core/hakmem_build_flags.h: Phase 7 feature flags - Makefile: PREWARM_TLS flag, phase7 targets - CLAUDE.md: Phase 7 success summary - PHASE7_TASK3_RESULTS.md: Comprehensive results report (NEW) Build: make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 phase7-bench 🎉 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 12:54:52 +09:00
// hakmem_phase7_config.h - Phase 7 定数・パラメータ集約ヘッダー
// Purpose: Phase 7の重要な定数数値・閾値を一箇所に集約忘れないように
// Usage: Phase 7のコードから include される
//
// 注意: コンパイル時フラグON/OFFは hakmem_build_flags.h で定義
// このファイルは数値定数・パラメータのみ!
#ifndef HAKMEM_PHASE7_CONFIG_H
#define HAKMEM_PHASE7_CONFIG_H
#include "hakmem_build_flags.h" // Phase 7 フラグを取得
// ========================================
// 【重要】フラグと定数の役割分担
// ========================================
//
// hakmem_build_flags.h (既存):
// - コンパイル時 ON/OFF フラグ
// - HAKMEM_TINY_HEADER_CLASSIDX (Task 1)
// - HAKMEM_TINY_AGGRESSIVE_INLINE (Task 2)
// - HAKMEM_TINY_PREWARM_TLS (Task 3)
// - HAKMEM_TINY_REFILL_DEFAULT (16)
//
// hakmem_phase7_config.h (このファイル):
// - Phase 7 専用の数値定数・閾値
// - 性能目標値
// - チューニングパラメータ
// - ドキュメント・使い方
// ========================================
// ========================================
// Phase 7 重要定数(チューニングパラメータ)
// ========================================
// Refill count 範囲hakmem_build_flags.h で HAKMEM_TINY_REFILL_DEFAULT=16 が定義済み)
// 環境変数 HAKMEM_TINY_REFILL_COUNT で上書き可能
#ifndef HAKMEM_TINY_REFILL_MIN
# define HAKMEM_TINY_REFILL_MIN 8
#endif
#ifndef HAKMEM_TINY_REFILL_MAX
# define HAKMEM_TINY_REFILL_MAX 256
#endif
// TLS cache capacity デフォルト値
// 小さすぎる: 頻繁な refill → 遅い
// 大きすぎる: メモリ浪費、cache miss 増加
#ifndef HAKMEM_TINY_TLS_CAP_DEFAULT
# define HAKMEM_TINY_TLS_CAP_DEFAULT 64
#endif
// Pre-warm count (Task 3)
// 初期化時に各クラスに何個のブロックを事前割り当てするか
#ifndef HAKMEM_TINY_PREWARM_COUNT
# define HAKMEM_TINY_PREWARM_COUNT 16
#endif
// ========================================
// Phase 7 Header Magic (Task 1)
// ========================================
// Note: これらの定数は tiny_region_id.h でも定義されています
// ここは参照・ドキュメント用です
// Header format: 1 byte before each block
// Bits 0-3: class_idx (0-15, only 0-7 used for Tiny)
// Bits 4-7: magic (0xA for validation)
// 実装: core/tiny_region_id.h:36-37 を参照
// ========================================
// Phase 7 Performance Targets
// ========================================
// Target: 40-55% of System malloc (27-37M ops/s on typical hardware)
// Current baseline: 21M ops/s (31% of System)
// After Tasks 1-5: 27-37M ops/s (40-55% of System) ← 目標!
#ifndef HAKMEM_PHASE7_TARGET_MIN_PERCENT
# define HAKMEM_PHASE7_TARGET_MIN_PERCENT 40 // 最低目標: 40% of System
#endif
#ifndef HAKMEM_PHASE7_TARGET_MAX_PERCENT
# define HAKMEM_PHASE7_TARGET_MAX_PERCENT 55 // 最高目標: 55% of System
#endif
// ========================================
// Phase 7 環境変数リスト(ドキュメント用)
// ========================================
// Runtime tunable via environment variables:
//
// HAKMEM_TINY_REFILL_COUNT=<n> 全クラスの refill count
// HAKMEM_TINY_REFILL_COUNT_HOT=<n> class 0-3 の refill count
// HAKMEM_TINY_REFILL_COUNT_MID=<n> class 4-7 の refill count
// HAKMEM_TINY_REFILL_COUNT_C0=<n> class 0 の refill count (個別設定)
// HAKMEM_TINY_REFILL_COUNT_C1=<n> class 1 の refill count
// ... (C2-C7も同様)
//
// HAKMEM_TINY_TLS_CAP=<n> TLS cache capacity (default: 64)
// HAKMEM_TINY_PREWARM=<0|1> Pre-warm TLS cache at init
// HAKMEM_TINY_PROFILE=<0|1> Enable profiling counters
//
// Example:
// HAKMEM_TINY_REFILL_COUNT=32 ./bench_random_mixed_hakmem 100000 128 1234567
// ========================================
// Phase 7 ステータス2025-11-08 現在)
// ========================================
// Task 1: ✅ COMPLETE (Skip magic validation in release)
// Task 2: ✅ COMPLETE (Aggressive inline TLS macros)
// Task 3: 🔄 IN PROGRESS (Pre-warm + refill simplification)
// Task 4: ⏳ PENDING (PGO)
// Task 5: ⏳ PENDING (Full validation)
// Task 6: ✅ COMPLETE (このファイル!)
// ========================================
// 使い方(忘れないように!)
// ========================================
// 1. 開発中(デバッグ):
// make clean && make bench_random_mixed_hakmem larson_hakmem
//
// 2. Phase 7 最適化テスト:
// make phase7-bench
//
// 3. Phase 7 完全ビルド:
// make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1
Phase 7 Task 3: Pre-warm TLS cache (+180-280% improvement!) MAJOR SUCCESS: HAKMEM now achieves 85-92% of System malloc on tiny allocations (128-512B) and BEATS System at 146% on 1024B allocations! Performance Results: - Random Mixed 128B: 21M → 59M ops/s (+181%) 🚀 - Random Mixed 256B: 19M → 70M ops/s (+268%) 🚀 - Random Mixed 512B: 21M → 68M ops/s (+224%) 🚀 - Random Mixed 1024B: 21M → 65M ops/s (+210%, 146% of System!) 🏆 - Larson 1T: 2.68M ops/s (stable, no regression) Implementation: 1. Task 3a: Remove profiling overhead in release builds - Wrapped RDTSC calls in #if !HAKMEM_BUILD_RELEASE - Compiler can eliminate profiling code completely - Effect: +2% (2.68M → 2.73M Larson) 2. Task 3b: Simplify refill logic - Use constants from hakmem_build_flags.h - TLS cache already optimal - Effect: No regression 3. Task 3c: Pre-warm TLS cache (GAME CHANGER!) - Pre-allocate 16 blocks per class at init - Eliminates cold-start penalty - Effect: +180-280% improvement 🚀 Root Cause: The bottleneck was cold-start, not the hot path! First allocation in each class triggered a SuperSlab refill (100+ cycles). Pre-warming eliminated this penalty, revealing Phase 7's true potential. Files Modified: - core/hakmem_tiny.c: Pre-warm function implementation - core/box/hak_core_init.inc.h: Pre-warm initialization call - core/tiny_alloc_fast.inc.h: Profiling overhead removal - core/hakmem_phase7_config.h: Task 3 constants (NEW) - core/hakmem_build_flags.h: Phase 7 feature flags - Makefile: PREWARM_TLS flag, phase7 targets - CLAUDE.md: Phase 7 success summary - PHASE7_TASK3_RESULTS.md: Comprehensive results report (NEW) Build: make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 phase7-bench 🎉 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 12:54:52 +09:00
// bench_random_mixed_hakmem larson_hakmem
//
// 4. PGO ビルド (Task 4):
// make PROFILE_GEN=1 bench_random_mixed_hakmem
// ./bench_random_mixed_hakmem 100000 128 1234567 # プロファイル収集
// make clean
// make PROFILE_USE=1 HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1
Phase 7 Task 3: Pre-warm TLS cache (+180-280% improvement!) MAJOR SUCCESS: HAKMEM now achieves 85-92% of System malloc on tiny allocations (128-512B) and BEATS System at 146% on 1024B allocations! Performance Results: - Random Mixed 128B: 21M → 59M ops/s (+181%) 🚀 - Random Mixed 256B: 19M → 70M ops/s (+268%) 🚀 - Random Mixed 512B: 21M → 68M ops/s (+224%) 🚀 - Random Mixed 1024B: 21M → 65M ops/s (+210%, 146% of System!) 🏆 - Larson 1T: 2.68M ops/s (stable, no regression) Implementation: 1. Task 3a: Remove profiling overhead in release builds - Wrapped RDTSC calls in #if !HAKMEM_BUILD_RELEASE - Compiler can eliminate profiling code completely - Effect: +2% (2.68M → 2.73M Larson) 2. Task 3b: Simplify refill logic - Use constants from hakmem_build_flags.h - TLS cache already optimal - Effect: No regression 3. Task 3c: Pre-warm TLS cache (GAME CHANGER!) - Pre-allocate 16 blocks per class at init - Eliminates cold-start penalty - Effect: +180-280% improvement 🚀 Root Cause: The bottleneck was cold-start, not the hot path! First allocation in each class triggered a SuperSlab refill (100+ cycles). Pre-warming eliminated this penalty, revealing Phase 7's true potential. Files Modified: - core/hakmem_tiny.c: Pre-warm function implementation - core/box/hak_core_init.inc.h: Pre-warm initialization call - core/tiny_alloc_fast.inc.h: Profiling overhead removal - core/hakmem_phase7_config.h: Task 3 constants (NEW) - core/hakmem_build_flags.h: Phase 7 feature flags - Makefile: PREWARM_TLS flag, phase7 targets - CLAUDE.md: Phase 7 success summary - PHASE7_TASK3_RESULTS.md: Comprehensive results report (NEW) Build: make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 phase7-bench 🎉 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 12:54:52 +09:00
// bench_random_mixed_hakmem
#endif // HAKMEM_PHASE7_CONFIG_H