Phase 4 Perf Profiling - Files Index

Date: 2025-12-14
Status: Complete

Created Documents

1. Primary Analysis

File: /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md
Size: ~5000 words
Contents:

  • Detailed perf report breakdown
  • Candidate analysis (tiny_alloc_gate_fast, free_tiny_fast_cold, ENV gates)
  • Shape optimization plateau analysis
  • E1 implementation plan (ENV snapshot consolidation)
  • Alternative targets (E2/E3/E4)

2. Executive Summary

File: /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md
Size: ~3000 words
Contents:

  • Executive summary
  • Top hotspots analysis
  • Selected target (E1 ENV Snapshot Consolidation)
  • Implementation roadmap
  • Success criteria checklist

3. Files Index (This Document)

File: /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PROFILING_FILES_INDEX.md
Contents:

  • List of all created/modified files
  • Quick reference guide

Modified Documents

1. CURRENT_TASK.md

File: /mnt/workdisk/public_share/hakmem/CURRENT_TASK.md
Changes:

  • Added Phase 4 perf profiling summary (lines 3-39)
  • Key findings: ENV gate overhead (3.26%), shape plateau analysis
  • Next target: Phase 4 E1 - ENV Snapshot Consolidation

Perf Data Artifacts

1. Raw Perf Data

File: /mnt/workdisk/public_share/hakmem/perf.data
Format: Binary (perf record output)
Size: 0.059 MB
Samples: 922 @ 999 Hz
Command:

HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1

2. Perf Report (Full)

File: /tmp/perf_report_full.txt
Format: Text (perf report --stdio output)
Contents: Full symbol-sorted report with self% breakdown

3. Perf Summary

File: /tmp/perf_summary.txt
Format: Text (quick reference)
Contents: Top hotspots, selected target, perf command reference

Key Findings

ENV Gate Overhead (3.26% Combined)

  1. tiny_c7_ultra_enabled_env(): 1.28%
  2. tiny_front_v3_enabled(): 1.01%
  3. tiny_metadata_cache_enabled(): 0.97%

Root Cause: 3 separate TLS reads + lazy init checks on every hot path call
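
The per-gate pattern behind this root cause can be sketched roughly as follows. This is an illustration only, not hakmem's actual code: the `sketch_*` function names, the `HAKMEM_SKETCH_*` environment variable names, and the -1/0/1 encoding are all assumptions. Each gate keeps its own thread-local cache, so a hot path that consults all three gates pays for three TLS loads and three "initialized yet?" branches per call.

```c
#include <stdlib.h>

/* Hypothetical helper macro: defines one lazily initialized ENV gate.
 * Names and the -1/0/1 encoding are illustrative assumptions. */
#define SKETCH_ENV_GATE(fn, var, envname)                  \
    static _Thread_local int var = -1;                     \
    static inline int fn(void) {                           \
        if (var < 0) {                                     \
            const char *v = getenv(envname);               \
            var = (v != NULL && v[0] == '1') ? 1 : 0;      \
        }                                                  \
        return var;                                        \
    }

/* Three independent gates, each with its own TLS slot and lazy-init check. */
SKETCH_ENV_GATE(sketch_c7_ultra_enabled, g_sketch_c7_ultra,
                "HAKMEM_SKETCH_C7_ULTRA")
SKETCH_ENV_GATE(sketch_front_v3_enabled, g_sketch_front_v3,
                "HAKMEM_SKETCH_FRONT_V3")
SKETCH_ENV_GATE(sketch_metadata_cache_enabled, g_sketch_metadata_cache,
                "HAKMEM_SKETCH_METADATA_CACHE")

/* A hot path that needs all three flags performs three TLS reads
 * and three init checks on every call. */
static inline int sketch_hot_path_flags(void) {
    return (sketch_c7_ultra_enabled() << 0)
         | (sketch_front_v3_enabled() << 1)
         | (sketch_metadata_cache_enabled() << 2);
}
```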

Shape Optimization Plateau

  • B3 (Routing Shape): +2.89% (first pass)
  • D3 (Alloc Gate Shape): +0.56% NEUTRAL (diminishing returns)
  • Lesson: Branch prediction is saturated; the next frontier is caching and structural changes

Selected Next Target

Phase 4 E1: ENV Snapshot Consolidation

  • Expected gain: +3.0-3.5%
  • Approach: Consolidate all ENV gates into single TLS snapshot struct
  • Precedent: tiny_front_v3_snapshot (proven pattern)
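
A minimal sketch of what a consolidated snapshot might look like, assuming one thread-local struct that is filled once and then read through a single pointer. The type, field, function, and environment variable names below are hypothetical, as are the default values; the actual layout of hakmem_env_snapshot_box and of tiny_front_v3_snapshot may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical snapshot struct: one TLS object holding all gate results. */
typedef struct {
    bool inited;          /* set once the fields below are populated */
    bool c7_ultra;
    bool front_v3;
    bool metadata_cache;
} env_snapshot_sketch_t;

static _Thread_local env_snapshot_sketch_t g_env_snapshot_sketch;

/* Parse a boolean ENV flag; unset or empty falls back to the default. */
static bool env_flag_sketch(const char *name, bool dflt) {
    const char *v = getenv(name);
    if (v == NULL || v[0] == '\0') {
        return dflt;
    }
    return v[0] != '0';
}

/* Cold path: read every gate once per thread. */
static void env_snapshot_sketch_init(env_snapshot_sketch_t *s) {
    s->c7_ultra       = env_flag_sketch("HAKMEM_SKETCH_C7_ULTRA", false);
    s->front_v3       = env_flag_sketch("HAKMEM_SKETCH_FRONT_V3", false);
    s->metadata_cache = env_flag_sketch("HAKMEM_SKETCH_METADATA_CACHE", false);
    s->inited = true;
}

/* Hot path: one TLS access and one init check cover all three gates. */
static inline const env_snapshot_sketch_t *env_snapshot_sketch_get(void) {
    env_snapshot_sketch_t *s = &g_env_snapshot_sketch;
    if (!s->inited) {
        env_snapshot_sketch_init(s);
    }
    return s;
}
```

A call site would fetch the pointer once, for example `const env_snapshot_sketch_t *env = env_snapshot_sketch_get();`, and then test `env->c7_ultra`, `env->front_v3`, and `env->metadata_cache` as plain field loads.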

Quick Navigation

Detailed Analysis

cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md

Executive Summary

cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md

Current Task Status

head -100 /mnt/workdisk/public_share/hakmem/CURRENT_TASK.md

Perf Commands (Re-run)

# Profile
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1

# Report (top 80)
perf report --stdio --no-children --sort=symbol | head -80

# Annotate specific function
perf annotate --stdio tiny_alloc_gate_fast.lto_priv.0 | head -100

Next Steps

  1. Phase 4 E1 Implementation (2-3 days):

    • Create core/box/hakmem_env_snapshot_box.h/c
    • Migrate priority ENV gates (C7 ultra, front_v3, metadata_cache)
    • Refactor ~14 call sites
    • A/B test (Mixed 10-run, target +2.5%)
    • Health check, promote to default if GO
  2. Phase 4 E2 (SECONDARY, defer until E1 complete):

    • Per-class alloc fast path specialization
    • Expected gain: +2-3%
  3. Phase 4 E3 (TERTIARY, extends E1):

    • Free path ENV gate consolidation
    • Expected gain: +0.4-0.6%
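
The A/B step in the E1 plan (Mixed 10-run, target +2.5%) comes down to comparing mean throughput between baseline and candidate runs. The helper below is a hedged sketch of that arithmetic; the function names are hypothetical and it is not the project's actual benchmark harness. It uses a plain mean with no outlier handling.

```c
#include <stddef.h>

/* Mean throughput (e.g. in M ops/s) over n runs; returns 0 for n == 0. */
static double sketch_mean_ops(const double *runs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += runs[i];
    }
    return n ? sum / (double)n : 0.0;
}

/* Relative gain of the candidate over the baseline, in percent. */
static double sketch_gain_percent(const double *baseline,
                                  const double *candidate, size_t n) {
    double base = sketch_mean_ops(baseline, n);
    if (base <= 0.0) {
        return 0.0;
    }
    return (sketch_mean_ops(candidate, n) / base - 1.0) * 100.0;
}

/* GO when the measured gain reaches the target (e.g. 2.5 for +2.5%). */
static int sketch_is_go(double gain_pct, double target_pct) {
    return gain_pct >= target_pct;
}
```

With the numbers quoted in this document, a gain of +0.56% (the D3 result) would fall short of a 2.5% target, while the expected E1 gain of +3.0% would meet it.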

References

  • Baseline: 46.37M ops/s (MIXED_TINYV3_C7_SAFE, Phase 3 + D1)
  • Target: 47.8M ops/s (+3.0% via E1)
  • Profile: MIXED_TINYV3_C7_SAFE (20M iterations, ws=400)
  • Workload: bench_random_mixed_hakmem (50% alloc / 50% free)

Status: COMPLETE - Ready for Phase 4 E1
Date: 2025-12-14