Implemented automated Profile-Guided Optimization workflow using Box pattern:

Performance Improvement:
- Baseline: 57.0 M ops/s
- PGO-optimized: 60.6 M ops/s
- Gain: +6.25% (within expected +5-10% range)

Implementation:
1. scripts/box/pgo_tiny_profile_config.sh - 5 representative workloads
2. scripts/box/pgo_tiny_profile_box.sh - Automated profile collection
3. Makefile PGO targets:
   - pgo-tiny-profile: Build instrumented binaries
   - pgo-tiny-collect: Collect .gcda profile data
   - pgo-tiny-build: Build optimized binaries
   - pgo-tiny-full: Complete workflow (profile → collect → build → test)
4. Makefile help target: Added PGO instructions for discoverability

Design:
- Boxification (Box化): Single responsibility, clear contracts
- Deterministic: Fixed seeds (42) for reproducibility
- Safe: Validation, error detection, timeout protection (30s/workload)
- Observable: Progress reporting, .gcda verification (33 files generated)

Workload Coverage:
- Random mixed: 3 working set sizes (128/256/512 slots)
- Tiny hot: 2 size classes (16B/64B)
- Total: 5 workloads covering hot/cold paths

Documentation:
- PHASE4_STEP1_COMPLETE.md - Completion report
- CURRENT_TASK.md - Phase 4 roadmap (Step 1 complete ✓)
- docs/design/PHASE4_TINY_FRONT_BOX_DESIGN.md - Complete Phase 4 design

Next: Phase 4-Step2 (Hot/Cold Path Box, target +10-15%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
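The profile → collect → build cycle named in the Makefile targets can be sketched in shell. This is only an illustration under stated assumptions: it assumes GCC (whose `-fprofile-generate` instrumentation writes the `.gcda` files mentioned above), and the helper names `pgo_flags` and `count_gcda`, the `-O2` level, and the `bench`/`bench.c` file names are hypothetical and not taken from the project's Makefile.

```shell
#!/bin/bash
# Sketch only: the real Makefile flags are not shown in this commit.

# Print the compiler flags for a given PGO phase (hypothetical helper).
pgo_flags() {
    case "$1" in
        profile) echo "-O2 -fprofile-generate" ;;  # instrumented build
        build)   echo "-O2 -fprofile-use" ;;       # optimized build using .gcda data
        *)       echo "unknown phase: $1" >&2; return 1 ;;
    esac
}

# Count .gcda files under a directory (hypothetical helper),
# e.g. to check that the collection step produced profile data.
count_gcda() {
    find "${1:-.}" -type f -name '*.gcda' | wc -l | tr -d ' '
}

# Dry run: print the commands each phase would correspond to.
echo "gcc $(pgo_flags profile) -o bench bench.c   # profile"
echo "./bench                                      # collect"
echo "gcc $(pgo_flags build) -o bench bench.c     # build"
```

A collection step could compare `count_gcda .` against zero before the optimized build is attempted, so that a build without profile data fails early instead of silently producing an unoptimized binary.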
44 lines
1.4 KiB
Bash
Executable File
#!/bin/bash
# Box: PGO Profile Configuration
# Purpose: Define representative workloads for Tiny Front
# Contract: Provides workload definitions for PGO profile collection

# Binaries to profile
PGO_BINARIES=(
    "./bench_random_mixed_hakmem"
    "./bench_tiny_hot_hakmem"
)

# Representative workloads (deterministic seeds for reproducibility)
# Design: Cover diverse allocation patterns for optimal PGO data
PGO_WORKLOADS=(
    # Random mixed: Common case (medium working set)
    # - Most representative of general allocation patterns
    # - 256 slots = moderate cache pressure
    "./bench_random_mixed_hakmem 5000000 256 42"

    # Random mixed: Smaller working set (higher cache hit)
    # - Exercises hot TLS SLL path heavily
    # - 128 slots = higher hit rate
    "./bench_random_mixed_hakmem 5000000 128 42"

    # Random mixed: Larger working set (more diverse)
    # - Exercises refill and cold paths more
    # - 512 slots = more SuperSlab allocations
    "./bench_random_mixed_hakmem 5000000 512 42"

    # Tiny hot path: 16B allocations
    # - Class 0 (smallest) intensive
    # - High allocation frequency
    "./bench_tiny_hot_hakmem 16 100 60000"

    # Tiny hot path: 64B allocations
    # - Class 3 (common size) intensive
    # - Typical small object pattern
    "./bench_tiny_hot_hakmem 64 100 60000"
)

# Configuration summary
PGO_WORKLOAD_COUNT=${#PGO_WORKLOADS[@]}
PGO_BINARY_COUNT=${#PGO_BINARIES[@]}
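A consumer of this configuration needs to run each entry of `PGO_WORKLOADS` as a command line and detect failures and hangs. The following is a minimal sketch of that loop, assuming GNU coreutils `timeout` (which exits with status 124 when the limit is hit). The function name `run_workloads` and the demo commands are hypothetical; the actual logic in `pgo_tiny_profile_box.sh` is not shown here and may differ.

```shell
#!/bin/bash
# Sketch only: run_workloads is a hypothetical helper, not the project's script.

# Usage: run_workloads <timeout-seconds> <command-string>...
# Each command string has the form "binary arg1 arg2 ...", as in PGO_WORKLOADS.
run_workloads() {
    local limit="$1"; shift
    local failed=0 cmd status
    for cmd in "$@"; do
        # Intentional word splitting: the entry is one string holding
        # the binary and its arguments.
        timeout "$limit" $cmd >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then
            echo "TIMEOUT: $cmd"
            failed=$((failed + 1))
        elif [ "$status" -ne 0 ]; then
            echo "FAILED ($status): $cmd"
            failed=$((failed + 1))
        else
            echo "OK: $cmd"
        fi
    done
    # Succeed only if every workload succeeded.
    [ "$failed" -eq 0 ]
}

# Demo with harmless stand-in commands instead of the benchmark binaries.
DEMO_WORKLOADS=("true" "sleep 5" "false")
run_workloads 1 "${DEMO_WORKLOADS[@]}" || echo "at least one workload failed"
```

With the real configuration, the call would look like `source scripts/box/pgo_tiny_profile_config.sh` followed by `run_workloads 30 "${PGO_WORKLOADS[@]}"`, matching the 30s per-workload limit stated in the commit message.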