# Phase 4 Perf Profiling - Files Index

**Date**: 2025-12-14
**Status**: Complete

## Created Documents

### 1. Primary Analysis
**File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md`
**Size**: ~5000 words
**Contents**:
- Detailed perf report breakdown
- Candidate analysis (tiny_alloc_gate_fast, free_tiny_fast_cold, ENV gates)
- Shape optimization plateau analysis
- E1 implementation plan (ENV snapshot consolidation)
- Alternative targets (E2/E3/E4)

### 2. Executive Summary
**File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md`
**Size**: ~3000 words
**Contents**:
- Executive summary
- Top hotspots analysis
- Selected target (E1 ENV Snapshot Consolidation)
- Implementation roadmap
- Success criteria checklist

### 3. Files Index (This Document)
**File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PROFILING_FILES_INDEX.md`
**Contents**:
- List of all created/modified files
- Quick reference guide

## Modified Documents

### 1. CURRENT_TASK.md
**File**: `/mnt/workdisk/public_share/hakmem/CURRENT_TASK.md`
**Changes**:
- Added Phase 4 perf profiling summary (lines 3-39)
- Key findings: ENV gate overhead (3.26%), shape plateau analysis
- Next target: Phase 4 E1 - ENV Snapshot Consolidation

## Perf Data Artifacts

### 1. Raw Perf Data
**File**: `/mnt/workdisk/public_share/hakmem/perf.data`
**Format**: Binary (`perf record` output)
**Size**: 0.059 MB
**Samples**: 922 @ 999 Hz
**Command**:
```bash
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1
```

### 2. Perf Report (Full)
**File**: `/tmp/perf_report_full.txt`
**Format**: Text (`perf report --stdio` output)
**Contents**: Full symbol-sorted report with self% breakdown

### 3. Perf Summary
**File**: `/tmp/perf_summary.txt`
**Format**: Text (quick reference)
**Contents**: Top hotspots, selected target, perf command reference

## Key Findings

### ENV Gate Overhead (3.26% Combined)
1. `tiny_c7_ultra_enabled_env()`: 1.28%
2. `tiny_front_v3_enabled()`: 1.01%
3. `tiny_metadata_cache_enabled()`: 0.97%

**Root Cause**: Three separate TLS reads plus lazy-init checks on every hot-path call

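The root cause above can be pictured with a minimal C sketch. It is an illustration only: the `demo_*` function names, the `DEMO_*` environment variable names, and the default values are hypothetical stand-ins, not the actual HAKMEM gates. Each gate keeps its own thread-local cache, so a hot path that consults all three pays three TLS reads and three "already resolved?" branches per call.

```c
#include <stdlib.h>
#include <string.h>

/* Parse an ENV flag: unset or empty -> default, "0" -> off, anything else -> on. */
static int demo_env_flag(const char *name, int default_on) {
    const char *v = getenv(name);
    if (v == NULL || *v == '\0') return default_on;
    return strcmp(v, "0") != 0;
}

/* One thread-local cache per gate; -1 means "not resolved yet".
 * Variable and ENV names are hypothetical. */
static _Thread_local int g_demo_c7_ultra = -1;
static _Thread_local int g_demo_front_v3 = -1;
static _Thread_local int g_demo_meta_cache = -1;

static inline int demo_c7_ultra_enabled(void) {
    if (g_demo_c7_ultra < 0) g_demo_c7_ultra = demo_env_flag("DEMO_C7_ULTRA", 0);
    return g_demo_c7_ultra;
}

static inline int demo_front_v3_enabled(void) {
    if (g_demo_front_v3 < 0) g_demo_front_v3 = demo_env_flag("DEMO_FRONT_V3", 1);
    return g_demo_front_v3;
}

static inline int demo_metadata_cache_enabled(void) {
    if (g_demo_meta_cache < 0) g_demo_meta_cache = demo_env_flag("DEMO_META_CACHE", 0);
    return g_demo_meta_cache;
}

/* Hypothetical hot path: three TLS reads and three lazy-init checks per call. */
static inline int demo_hot_path_gate_mask(void) {
    return (demo_c7_ultra_enabled() << 0)
         | (demo_front_v3_enabled() << 1)
         | (demo_metadata_cache_enabled() << 2);
}
```

After the first call on a thread the `getenv` cost is gone, but the per-gate TLS read and branch remain on every call, which is the kind of overhead the percentages above would be measuring under this assumption.
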
### Shape Optimization Plateau
- B3 (Routing Shape): +2.89% (first pass)
- D3 (Alloc Gate Shape): +0.56% NEUTRAL (diminishing returns)
- **Lesson**: Branch prediction is saturated; the next frontier is caching and structural changes

### Selected Next Target
**Phase 4 E1**: ENV Snapshot Consolidation
- Expected gain: +3.0-3.5%
- Approach: Consolidate all ENV gates into a single TLS snapshot struct
- Precedent: `tiny_front_v3_snapshot` (proven pattern)

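The consolidation approach listed above could look roughly like the following C sketch. It is not the planned `hakmem_env_snapshot_box` code: the struct layout, the `demo_*` names, the `DEMO_*` environment variables, and the defaults are all assumptions made for illustration. The idea it shows is one thread-local struct resolved once, so the hot path does a single TLS access and a single readiness check, then reads plain fields.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot: one field per migrated gate, plus a ready flag. */
typedef struct {
    bool ready;
    bool c7_ultra;
    bool front_v3;
    bool metadata_cache;
} demo_env_snapshot_t;

/* Zero-initialized per thread, so ready starts out false. */
static _Thread_local demo_env_snapshot_t g_demo_env_snap;

/* Unset or empty -> default, "0" -> off, anything else -> on. */
static bool demo_env_flag(const char *name, bool default_on) {
    const char *v = getenv(name);
    if (v == NULL || *v == '\0') return default_on;
    return strcmp(v, "0") != 0;
}

/* Cold path: resolve every gate once per thread. */
static void demo_env_snapshot_init(demo_env_snapshot_t *s) {
    s->c7_ultra       = demo_env_flag("DEMO_C7_ULTRA", false);
    s->front_v3       = demo_env_flag("DEMO_FRONT_V3", true);
    s->metadata_cache = demo_env_flag("DEMO_META_CACHE", false);
    s->ready          = true;
}

/* Hot path: one TLS access and one branch, then plain field reads. */
static inline const demo_env_snapshot_t *demo_env_snapshot(void) {
    demo_env_snapshot_t *s = &g_demo_env_snap;
    if (!s->ready) demo_env_snapshot_init(s);
    return s;
}
```

A call site would fetch the pointer once, for example `const demo_env_snapshot_t *env = demo_env_snapshot();`, and then test `env->c7_ultra`, `env->front_v3`, and `env->metadata_cache` without further lazy-init checks.
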
## Quick Navigation

### Detailed Analysis
```bash
cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md
```

### Executive Summary
```bash
cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md
```

### Current Task Status
```bash
head -100 /mnt/workdisk/public_share/hakmem/CURRENT_TASK.md
```

### Perf Commands (Re-run)
```bash
# Profile
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1

# Report (top 80)
perf report --stdio --no-children --sort=symbol | head -80

# Annotate specific function
perf annotate --stdio tiny_alloc_gate_fast.lto_priv.0 | head -100
```

## Next Steps

1. **Phase 4 E1 Implementation** (2-3 days):
   - Create `core/box/hakmem_env_snapshot_box.h/c`
   - Migrate priority ENV gates (C7 ultra, front_v3, metadata_cache)
   - Refactor ~14 call sites
   - A/B test (Mixed 10-run, target +2.5%)
   - Health check, promote to default if GO

2. **Phase 4 E2** (SECONDARY, defer until E1 complete):
   - Per-class alloc fast path specialization
   - Expected gain: +2-3%

3. **Phase 4 E3** (TERTIARY, extends E1):
   - Free path ENV gate consolidation
   - Expected gain: +0.4-0.6%

## References

- **Baseline**: 46.37M ops/s (MIXED_TINYV3_C7_SAFE, Phase 3 + D1)
- **Target**: 47.8M ops/s (+3.0% via E1)
- **Profile**: MIXED_TINYV3_C7_SAFE (20M iterations, ws=400)
- **Workload**: bench_random_mixed_hakmem (50% alloc / 50% free)

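As a sanity check on the baseline and target figures above, the small C sketch below averages a set of throughput runs and computes the relative gain of a candidate over a baseline. The helper names are hypothetical, and the GO threshold is passed in by the caller rather than taken from any HAKMEM tooling.

```c
#include <stddef.h>

/* Arithmetic mean of n throughput samples (e.g. M ops/s per run). */
static double demo_mean(const double *xs, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += xs[i];
    return n ? sum / (double)n : 0.0;
}

/* Relative gain of candidate over baseline, in percent. */
static double demo_gain_percent(double baseline, double candidate) {
    return (candidate / baseline - 1.0) * 100.0;
}

/* Returns 1 if the measured gain reaches threshold_pct, else 0. */
static int demo_meets_target(double baseline, double candidate, double threshold_pct) {
    return demo_gain_percent(baseline, candidate) >= threshold_pct;
}
```

With the figures in this document, `demo_gain_percent(46.37, 47.8)` comes out at roughly 3.08%, consistent with the rounded +3.0% target (46.37 × 1.03 ≈ 47.76).
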
---

**Status**: COMPLETE - Ready for Phase 4 E1
**Date**: 2025-12-14