|
|
2e3fcc92af
|
Final Session Report: Comprehensive HAKMEM Performance Profiling & Optimization
## Session Complete ✅
Comprehensive profiling session analyzing HAKMEM allocator performance with three major phases:
### Phase 1: Profiling Investigation
- Answered user's 3 questions about prefault, CPU layers, and L1 caches
- Discovered TLB misses NOT from SuperSlab allocations
- THP/PREFAULT optimizations have ZERO measurable effect
- Page zeroing appears to be kernel-level, not user-controllable
### Phase 2: Implementation & Testing
- Implemented lazy zeroing via MADV_DONTNEED
- Result: -0.5% (worse due to syscall overhead)
- Discovered that 11.65% page zeroing is not controllable
- Profiling % doesn't always equal optimization opportunity
## Key Discoveries
1. **Prefault Box:** Works but only +2.6% benefit (marginal)
2. **User Code:** Only <1% CPU (not bottleneck)
3. **TLB Misses:** From TLS/libc, not allocations (THP useless)
4. **Page Zeroing:** Kernel-level (can't control from user-space)
5. **Profiling Lesson:** 11.65% visible ≠ controllable overhead
## Performance Reality
- **Current:** 1.06M ops/s (Random Mixed)
- **With tweaks:** 1.10-1.15M ops/s max (+10-15% theoretical)
- **vs Tiny Hot:** 89M ops/s (80x gap - architectural, unbridgeable)
## Deliverables
6 comprehensive analysis reports created:
1. Comprehensive Profiling Analysis
2. Profiling Insights & Recommendations (Task investigation)
3. Phase 1 Test Results (TLB/THP analysis)
4. Session Summary Findings
5. Lazy Zeroing Implementation Results
6. Final Session Report (this)
Plus: 1 working implementation (lazy zeroing), 2 git commits
## Conclusion
HAKMEM allocator is well-designed. Kernel memory overhead (63% of cycles)
is not controllable from user-space. Random Mixed at 1.06-1.15M ops/s
represents realistic ceiling for this workload class.
The biggest discovery: not all profile percentages are optimization opportunities.
Some bottlenecks are kernel-level and simply not controllable from user-space.
🐱 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-12-04 20:52:48 +09:00 |
|