# Phase 4 Perf Profiling - Files Index
**Date**: 2025-12-14
**Status**: Complete
## Created Documents
### 1. Primary Analysis
**File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md`
**Size**: ~5000 words
**Contents**:
- Detailed perf report breakdown
- Candidate analysis (tiny_alloc_gate_fast, free_tiny_fast_cold, ENV gates)
- Shape optimization plateau analysis
- E1 implementation plan (ENV snapshot consolidation)
- Alternative targets (E2/E3/E4)
### 2. Executive Summary
**File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md`
**Size**: ~3000 words
**Contents**:
- Executive summary
- Top hotspots analysis
- Selected target (E1 ENV Snapshot Consolidation)
- Implementation roadmap
- Success criteria checklist
### 3. Files Index (This Document)
**File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PROFILING_FILES_INDEX.md`
**Contents**:
- List of all created/modified files
- Quick reference guide
## Modified Documents
### 1. CURRENT_TASK.md
**File**: `/mnt/workdisk/public_share/hakmem/CURRENT_TASK.md`
**Changes**:
- Added Phase 4 perf profiling summary (lines 3-39)
- Key findings: ENV gate overhead (3.26%), shape plateau analysis
- Next target: Phase 4 E1 - ENV Snapshot Consolidation
## Perf Data Artifacts
### 1. Raw Perf Data
**File**: `/mnt/workdisk/public_share/hakmem/perf.data`
**Format**: Binary (perf record output)
**Size**: 0.059 MB
**Samples**: 922 @ 999Hz
**Command**:
```bash
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1
```
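As a rough sanity check on this capture, the sample count and sampling frequency give an estimate of how much on-CPU time the profile covers. The helper below is a minimal sketch: `estimated_cpu_seconds` is a hypothetical name, not part of hakmem or perf, and it assumes perf held close to the requested frequency for the whole run.

```c
#include <assert.h>

/* Hypothetical helper: approximate on-CPU time covered by a perf capture.
 * Assumes the sampler ran at roughly the requested frequency throughout. */
static inline double estimated_cpu_seconds(unsigned long samples, double freq_hz) {
    assert(freq_hz > 0.0);
    return (double)samples / freq_hz;
}
```

With the values recorded here, `estimated_cpu_seconds(922, 999.0)` comes to roughly 0.92 seconds.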
### 2. Perf Report (Full)
**File**: `/tmp/perf_report_full.txt`
**Format**: Text (perf report --stdio output)
**Contents**: Full symbol-sorted report with self% breakdown
### 3. Perf Summary
**File**: `/tmp/perf_summary.txt`
**Format**: Text (quick reference)
**Contents**: Top hotspots, selected target, perf command reference
## Key Findings
### ENV Gate Overhead (3.26% Combined)
1. `tiny_c7_ultra_enabled_env()`: 1.28%
2. `tiny_front_v3_enabled()`: 1.01%
3. `tiny_metadata_cache_enabled()`: 0.97%
**Root Cause**: Each gate performs its own TLS read and lazy-init check, so every hot-path call pays for three of each
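The per-gate pattern described above can be pictured roughly as follows. This is an illustrative sketch only, not the hakmem source: the macro, the gate names, and the `EXAMPLE_GATE_*` environment variables are all placeholders, and the real gates may differ in how they parse and cache their values.

```c
#include <stdlib.h>

/* Assumed shape of a lazily initialized gate: each gate caches its result
 * in its own thread-local int, where -1 means "not read yet".
 * All names here are placeholders, not hakmem identifiers. */
#define DEFINE_ENV_GATE(fn, var)                                         \
    static _Thread_local int fn##_cache = -1;                            \
    static inline int fn(void) {                                         \
        if (fn##_cache < 0) {                                            \
            const char *v = getenv(var);                                 \
            fn##_cache = (v != NULL && *v != '\0' && *v != '0');         \
        }                                                                \
        return fn##_cache;                                               \
    }

DEFINE_ENV_GATE(example_gate_a, "EXAMPLE_GATE_A")
DEFINE_ENV_GATE(example_gate_b, "EXAMPLE_GATE_B")
DEFINE_ENV_GATE(example_gate_c, "EXAMPLE_GATE_C")

/* A hot path that consults all three gates pays three TLS reads and
 * three lazy-init branches per call, even after every cache is warm. */
static inline int example_hot_path_flags(void) {
    return (example_gate_a() << 0)
         | (example_gate_b() << 1)
         | (example_gate_c() << 2);
}
```

Each branch is cheap and well predicted on its own; the cost comes from repeating the read and check for every gate on every call.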
### Shape Optimization Plateau
- B3 (Routing Shape): +2.89% (first pass)
- D3 (Alloc Gate Shape): +0.56% NEUTRAL (diminishing returns)
- **Lesson**: Branch prediction appears saturated; the next frontier is caching and structural changes
### Selected Next Target
**Phase 4 E1**: ENV Snapshot Consolidation
- Expected gain: +3.0-3.5%
- Approach: Consolidate all ENV gates into single TLS snapshot struct
- Precedent: `tiny_front_v3_snapshot` (proven pattern)
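A minimal sketch of what a consolidated snapshot could look like is shown below. It is an assumption-laden illustration, not the planned E1 implementation: the struct layout, the `env_snapshot` accessor, and the `EXAMPLE_*` variable names are hypothetical, and the actual `tiny_front_v3_snapshot` precedent may be organized differently.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical consolidated snapshot: one thread-local struct holds every
 * gate, so the hot path does one readiness check and then plain field reads. */
typedef struct {
    bool ready;           /* set once after the environment has been read */
    bool c7_ultra;
    bool front_v3;
    bool metadata_cache;
} env_snapshot_t;

static _Thread_local env_snapshot_t g_env_snapshot;

/* Parse a boolean environment variable; unset or empty yields the default. */
static inline bool env_flag(const char *name, bool dflt) {
    const char *v = getenv(name);
    if (v == NULL || *v == '\0') return dflt;
    return *v != '0';
}

/* Cold path: read all gates in one place. Variable names are placeholders. */
static inline void env_snapshot_init(env_snapshot_t *s) {
    s->c7_ultra       = env_flag("EXAMPLE_C7_ULTRA", false);
    s->front_v3       = env_flag("EXAMPLE_FRONT_V3", false);
    s->metadata_cache = env_flag("EXAMPLE_METADATA_CACHE", false);
    s->ready          = true;
}

/* Hot path: a single lazy-init branch, shared by all gates. */
static inline const env_snapshot_t *env_snapshot(void) {
    env_snapshot_t *s = &g_env_snapshot;
    if (!s->ready) env_snapshot_init(s);
    return s;
}
```

A call site would fetch the pointer once, for example `const env_snapshot_t *env = env_snapshot();`, and then test `env->c7_ultra` or `env->front_v3` without further initialization checks.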
## Quick Navigation
### Detailed Analysis
```bash
cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_ANALYSIS.md
```
### Executive Summary
```bash
cat /mnt/workdisk/public_share/hakmem/docs/analysis/PHASE4_PERF_PROFILE_FINAL_REPORT.md
```
### Current Task Status
```bash
head -100 /mnt/workdisk/public_share/hakmem/CURRENT_TASK.md
```
### Perf Commands (Re-run)
```bash
# Profile
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
perf record -F 999 -- ./bench_random_mixed_hakmem 40000000 400 1
# Report (top 80)
perf report --stdio --no-children --sort=symbol | head -80
# Annotate specific function
perf annotate --stdio tiny_alloc_gate_fast.lto_priv.0 | head -100
```
## Next Steps
1. **Phase 4 E1 Implementation** (2-3 days):
   - Create `core/box/hakmem_env_snapshot_box.h/c`
   - Migrate priority ENV gates (C7 ultra, front_v3, metadata_cache)
   - Refactor ~14 call sites
   - A/B test (Mixed 10-run, target +2.5%)
   - Health check, promote to default if GO
2. **Phase 4 E2** (SECONDARY, defer until E1 complete):
   - Per-class alloc fast path specialization
   - Expected gain: +2-3%
3. **Phase 4 E3** (TERTIARY, extends E1):
   - Free path ENV gate consolidation
   - Expected gain: +0.4-0.6%
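The A/B gate in the E1 step (10 runs, target +2.5%) could be evaluated along these lines. This is a sketch under assumptions: it compares simple means of throughput across runs and treats anything at or above the target as GO. The function names are hypothetical, and the project's actual GO/NEUTRAL criteria are not specified in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arithmetic mean of n throughput measurements (n must be > 0). */
static inline double mean_of(const double *xs, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += xs[i];
    return sum / (double)n;
}

/* Relative gain of candidate over baseline, in percent of the baseline mean. */
static inline double ab_gain_percent(const double *baseline, const double *candidate,
                                     size_t n) {
    double b = mean_of(baseline, n);
    assert(b > 0.0);
    return (mean_of(candidate, n) - b) / b * 100.0;
}

/* Assumed decision rule: GO when the mean gain reaches the target. */
static inline bool ab_is_go(const double *baseline, const double *candidate,
                            size_t n, double target_percent) {
    return ab_gain_percent(baseline, candidate, n) >= target_percent;
}
```

For the plan above, `n` would be 10 and `target_percent` would be 2.5. Comparing means alone ignores run-to-run variance, so a real gate may also want a median or a spread check.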
## References
- **Baseline**: 46.37M ops/s (MIXED_TINYV3_C7_SAFE, Phase 3 + D1)
- **Target**: 47.8M ops/s (+3.0% via E1)
- **Profile**: MIXED_TINYV3_C7_SAFE (20M iterations, ws=400)
- **Workload**: bench_random_mixed_hakmem (50% alloc / 50% free)
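The target figure follows from the baseline and the expected relative gain. The helper below is a small illustrative sketch of that arithmetic; `projected_throughput` is a hypothetical name, not a hakmem function.

```c
#include <assert.h>

/* Throughput after applying a relative gain, in the same unit as the baseline. */
static inline double projected_throughput(double baseline, double gain_percent) {
    assert(baseline > 0.0);
    return baseline * (1.0 + gain_percent / 100.0);
}
```

With the numbers above, `projected_throughput(46.37, 3.0)` is about 47.76, which rounds to the 47.8M ops/s target.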
---
**Status**: COMPLETE - Ready for Phase 4 E1