Implemented Hot/Cold Path separation using Box pattern for Tiny allocations:

Performance Improvement (without PGO):
- Baseline (Phase 26-A): 53.3 M ops/s
- Hot/Cold Box (Phase 4-Step2): 57.2 M ops/s
- Gain: +7.3% (+3.9 M ops/s)

Implementation:
1. core/box/tiny_front_hot_box.h - Ultra-fast hot path (1 branch)
   - Removed range check (caller guarantees valid class_idx)
   - Inline cache hit path with branch prediction hints
   - Debug metrics with zero overhead in Release builds

2. core/box/tiny_front_cold_box.h - Slow cold path (noinline, cold)
   - Refill logic (batch allocation from SuperSlab)
   - Drain logic (batch free to SuperSlab)
   - Error reporting and diagnostics

3. core/front/malloc_tiny_fast.h - Updated to use Hot/Cold Boxes
   - Hot path: tiny_hot_alloc_fast() (1 branch: cache empty check)
   - Cold path: tiny_cold_refill_and_alloc() (noinline, cold attribute)
   - Clear separation improves i-cache locality

Branch Analysis:
- Baseline: 4-5 branches in hot path (range check + cache check + refill logic mixed)
- Hot/Cold Box: 1 branch in hot path (cache empty check only)
- Reduction: 3-4 branches eliminated from hot path

Design Principles (Box Pattern):
✅ Single Responsibility: Hot path = cache hit only, Cold path = refill/errors
✅ Clear Contract: Hot returns NULL on miss, Cold handles miss
✅ Observable: Debug metrics (TINY_HOT_METRICS_*) gated by NDEBUG
✅ Safe: Branch prediction hints (TINY_HOT_LIKELY/UNLIKELY)
✅ Testable: Isolated hot/cold paths, easy A/B testing

PGO Status:
- Temporarily disabled (build issues with __gcov_merge_time_profile)
- Will re-enable PGO in future commit after resolving gcc/lto issues
- Current benchmarks are without PGO (fair A/B comparison)

Other Changes:
- .gitignore: Added *.d files (dependency files, auto-generated)
- Makefile: PGO targets temporarily disabled (show informational message)
- build_pgo.sh: Temporarily disabled (show "PGO paused" message)

Next: Phase 4-Step3 (Front Config Box, target +5-8%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
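The hot/cold contract described in the commit message (hot path returns NULL on a miss, cold path handles the miss) can be illustrated with a minimal sketch. This is not the code in tiny_front_hot_box.h or tiny_front_cold_box.h: every `sketch_*` / `SKETCH_*` name, the cache layout, the size classes, and the use of `malloc` as a stand-in for the SuperSlab backend are assumptions made for illustration only. It assumes GCC or Clang for `__builtin_expect` and the `noinline`/`cold` attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for TINY_HOT_LIKELY / TINY_HOT_UNLIKELY. */
#define SKETCH_LIKELY(x)   __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)

#define SKETCH_CLASSES      8   /* assumed number of tiny size classes */
#define SKETCH_CACHE_CAP    16  /* assumed per-class cache capacity    */
#define SKETCH_REFILL_BATCH 8   /* assumed refill batch size           */

/* Assumed cache layout: a small LIFO stack of free blocks per class. */
typedef struct {
    void  *slots[SKETCH_CACHE_CAP];
    size_t count;
} sketch_cache_t;

static sketch_cache_t g_cache[SKETCH_CLASSES];
static size_t g_refills; /* counts cold-path entries, for observation */

/* Hot path: a single branch (cache empty check) in Release builds.
 * The caller guarantees class_idx is valid; the assert only documents
 * that contract and compiles away when NDEBUG is defined. */
static inline void *sketch_hot_alloc(int class_idx) {
    assert(class_idx >= 0 && class_idx < SKETCH_CLASSES);
    sketch_cache_t *c = &g_cache[class_idx];
    if (SKETCH_LIKELY(c->count > 0)) {
        return c->slots[--c->count];
    }
    return NULL; /* miss: the cold path takes over */
}

/* Cold path: kept out of line so refill code does not share
 * instruction-cache lines with the hot path. malloc() stands in
 * for a batch allocation from the real backend. */
static __attribute__((noinline, cold))
void *sketch_cold_refill_and_alloc(int class_idx) {
    sketch_cache_t *c = &g_cache[class_idx];
    size_t block_size = (size_t)16 << class_idx; /* assumed size classes */
    g_refills++;
    /* Refill in a batch, never exceeding the cache capacity. */
    for (size_t i = 0;
         i < SKETCH_REFILL_BATCH && c->count < SKETCH_CACHE_CAP; i++) {
        void *p = malloc(block_size);
        if (p == NULL) {
            break;
        }
        c->slots[c->count++] = p;
    }
    if (SKETCH_UNLIKELY(c->count == 0)) {
        return NULL; /* backend exhausted */
    }
    return c->slots[--c->count];
}

/* Front end: try the hot path, fall back to the cold path on a miss. */
static inline void *sketch_alloc(int class_idx) {
    void *p = sketch_hot_alloc(class_idx);
    if (SKETCH_LIKELY(p != NULL)) {
        return p;
    }
    return sketch_cold_refill_and_alloc(class_idx);
}
```

In this sketch the first `sketch_alloc(0)` call misses and refills a batch; later calls for the same class are served from the cache until it drains again, so the refill cost is amortized over the batch.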
144 lines
2.0 KiB
Plaintext
# Build artifacts
*.o
*.so
*.a
*.exe
*.d
out/
bench_allocators
bench_asan
test_hakmem
test_evo
test_p2
test_sizeclass_dist
vm_profile
vm_profile_system
pf_test
memset_test

# Benchmark outputs
*.log
*.csv

# Windows Zone.Identifier files
*:Zone.Identifier

# Editor/IDE files
.vscode/
.idea/
*.swp
*~

# Python cache
__pycache__/
*.pyc
*.pyo

# Core dumps
core.*

# PGO profile data
*.gcda
*.gcno

# Binaries - benchmark executables
bench_allocators
bench_comprehensive_hakmem
bench_comprehensive_hakmi
bench_comprehensive_hakx
bench_comprehensive_mi
bench_comprehensive_system
bench_mid_large_hakmem
bench_mid_large_hakx
bench_mid_large_mi
bench_mid_large_mt_hakmem
bench_mid_large_mt_hakx
bench_mid_large_mt_mi
bench_mid_large_mt_system
bench_mid_large_system
bench_random_mixed_hakmi
bench_random_mixed_hakx
bench_random_mixed_mi
bench_random_mixed_system
bench_tiny_hot_direct
bench_tiny_hot_hakmi
bench_tiny_hot_hakx
bench_tiny_hot_mi
bench_tiny_hot_system
bench_fragment_stress_hakmem
bench_fragment_stress_mi
bench_fragment_stress_system
bench_burst_pause_hakmem
bench_burst_pause_mi
bench_burst_pause_system
test_offset
test_simple_mt
print_tiny_stats

# Benchmark results (keep in benchmarks/ directory)
*.txt
!benchmarks/*.md

# Perf data
perf.data
perf.data.old
perf_*.data
perf_*.data.old
# Perf data directory (organized)
perf_data/

# Local benchmark result directories
bench_results/

# Backup files
*.backup

# Temporary files
.tmp_*
*.tmp

# Archive directories
bench_results_archive/
.backup_*/

# External dependencies
glibc-*/
*.zip
*.tar.gz

# Memory measurement script
measure_memory.sh

# Additional perf data patterns
*perf.data
*perf.data.old
perf_data_*/

# Large log files
logs/*.err
logs/*.log
guard_*.log
asan_*.log
ubsan_*.log
*.err

# Worktrees (embedded git repos)
worktrees/

# Binary executables
larson_hakmem
larson_hakmem_asan
larson_hakmem_ubsan
larson_hakmem_tsan
bench_tiny_hot_hakmem
test_*

# All benchmark binaries
larson_*
bench_*

# Benchmark result files
benchmarks/results/snapshot_*/
*.out
*.d