Phase 4 Perf Profiling - Files Index
Date: 2025-12-14
Status: Complete
Created Documents
1. Primary Analysis
File: /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md
Size: ~5000 words
Contents:
- Detailed perf report breakdown
- Candidate analysis (tiny_alloc_gate_fast, free_tiny_fast_cold, ENV gates)
- Shape optimization plateau analysis
- E1 implementation plan (ENV snapshot consolidation)
- Alternative targets (E2/E3/E4)
2. Executive Summary
File: /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md
Size: ~3000 words
Contents:
- Executive summary
- Top hotspots analysis
- Selected target (E1 ENV Snapshot Consolidation)
- Implementation roadmap
- Success criteria checklist
3. Files Index (This Document)
File: /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PROFILING_FILES_INDEX.md
Contents:
- List of all created/modified files
- Quick reference guide
Modified Documents
1. CURRENT_TASK.md
File: /mnt/workdisk/public_share/hakmem/CURRENT_TASK.md
Changes:
- Added Phase 4 perf profiling summary (lines 3-39)
- Key findings: ENV gate overhead (3.26%), shape plateau analysis
- Next target: Phase 4 E1 - ENV Snapshot Consolidation
Perf Data Artifacts
1. Raw Perf Data
File: /mnt/workdisk/public_share/hakmem/perf.data
Format: Binary (perf record output)
Size: 0.059 MB
Samples: 922 @ 999Hz
Command:
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1
2. Perf Report (Full)
File: /tmp/perf_report_full.txt
Format: Text (perf report --stdio output)
Contents: Full symbol-sorted report with self% breakdown
3. Perf Summary
File: /tmp/perf_summary.txt
Format: Text (quick reference)
Contents: Top hotspots, selected target, perf command reference
Key Findings
ENV Gate Overhead (3.26% Combined)
- tiny_c7_ultra_enabled_env(): 1.28%
- tiny_front_v3_enabled(): 1.01%
- tiny_metadata_cache_enabled(): 0.97%
Root Cause: 3 separate TLS reads + lazy init checks on every hot path call
Shape Optimization Plateau
- B3 (Routing Shape): +2.89% (first pass)
- D3 (Alloc Gate Shape): +0.56% NEUTRAL (diminishing returns)
- Lesson: branch prediction is saturated; the next frontier is caching and structural changes
Selected Next Target
Phase 4 E1: ENV Snapshot Consolidation
- Expected gain: +3.0-3.5%
- Approach: Consolidate all ENV gates into single TLS snapshot struct
- Precedent: tiny_front_v3_snapshot (proven pattern)
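The consolidation approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the hakmem implementation: the type `env_snapshot_t`, the functions `env_flag`, `env_snapshot_init`, and `env_snapshot_get`, the `HAKMEM_EXAMPLE_*` variable names, and the "off by default" values are all hypothetical. The point it shows is that the hot path pays for one TLS access and one lazy-init check instead of three.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot of the three priority ENV gates named above.
 * Field names are illustrative, not taken from the hakmem sources. */
typedef struct {
    bool initialized;     /* single lazy-init flag for all gates */
    bool c7_ultra;
    bool front_v3;
    bool metadata_cache;
} env_snapshot_t;

/* One thread-local object replaces three separate per-gate TLS slots (C11). */
static _Thread_local env_snapshot_t g_env_snapshot;

/* Read a boolean flag from the environment.
 * Unset or empty yields the default; "0" disables; anything else enables. */
static bool env_flag(const char *name, bool default_value) {
    const char *v = getenv(name);
    if (v == NULL || *v == '\0') {
        return default_value;
    }
    return strcmp(v, "0") != 0;
}

/* Cold path: runs once per thread and resolves every gate together.
 * The variable names and defaults below are assumptions for this sketch. */
static void env_snapshot_init(env_snapshot_t *s) {
    s->c7_ultra       = env_flag("HAKMEM_EXAMPLE_C7_ULTRA", false);
    s->front_v3       = env_flag("HAKMEM_EXAMPLE_FRONT_V3", false);
    s->metadata_cache = env_flag("HAKMEM_EXAMPLE_METADATA_CACHE", false);
    s->initialized    = true;
}

/* Hot path: one TLS access and one branch, then plain field loads. */
static inline const env_snapshot_t *env_snapshot_get(void) {
    env_snapshot_t *s = &g_env_snapshot;
    if (!s->initialized) {
        env_snapshot_init(s);
    }
    return s;
}
```

A call site would then fetch the snapshot once, for example `const env_snapshot_t *env = env_snapshot_get();`, and test `env->c7_ultra`, `env->front_v3`, and `env->metadata_cache` as ordinary struct fields.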
Quick Navigation
Detailed Analysis
cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md
Executive Summary
cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md
Current Task Status
head -100 /mnt/workdisk/public_share/hakmem/CURRENT_TASK.md
Perf Commands (Re-run)
# Profile
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1
# Report (top 80)
perf report --stdio --no-children --sort=symbol | head -80
# Annotate specific function
perf annotate --stdio tiny_alloc_gate_fast.lto_priv.0 | head -100
Next Steps
- Phase 4 E1 Implementation (2-3 days):
  - Create core/box/hakmem_env_snapshot_box.h/c
  - Migrate priority ENV gates (C7 ultra, front_v3, metadata_cache)
  - Refactor ~14 call sites
  - A/B test (Mixed 10-run, target +2.5%)
  - Health check, promote to default if GO
- Phase 4 E2 (SECONDARY, defer until E1 complete):
  - Per-class alloc fast path specialization
  - Expected gain: +2-3%
- Phase 4 E3 (TERTIARY, extends E1):
  - Free path ENV gate consolidation
  - Expected gain: +0.4-0.6%
References
- Baseline: 46.37M ops/s (MIXED_TINYV3_C7_SAFE, Phase 3 + D1)
- Target: 47.8M ops/s (+3.0% via E1)
- Profile: MIXED_TINYV3_C7_SAFE (20M iterations, ws=400)
- Workload: bench_random_mixed_hakmem (50% alloc / 50% free)
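The baseline and target figures above are related by simple arithmetic, which the following sketch makes explicit. The helper names `gain_percent`, `target_throughput`, and `mean` are hypothetical and only illustrate how a multi-run A/B result could be reduced to a single percentage; they are not part of the hakmem benchmark tooling.

```c
#include <stddef.h>

/* Relative gain of candidate over baseline, in percent. */
static double gain_percent(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}

/* Throughput that corresponds to a given percentage gain over baseline. */
static double target_throughput(double baseline, double gain_pct) {
    return baseline * (1.0 + gain_pct / 100.0);
}

/* Arithmetic mean of n samples, e.g. ops/s from repeated benchmark runs.
 * Returns 0.0 for an empty input to avoid dividing by zero. */
static double mean(const double *xs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += xs[i];
    }
    return n ? sum / (double)n : 0.0;
}
```

With the numbers listed here, `target_throughput(46.37, 3.0)` is about 47.76M ops/s, which rounds to the stated 47.8M target, and `gain_percent(46.37, 47.8)` is just under 3.1%.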
Status: COMPLETE - Ready for Phase 4 E1
Date: 2025-12-14