feat: Add ACE allocation failure tracing and debug hooks

This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component. This feature allows for precise identification of the root cause for Out-Of-Memory (OOM) issues related to ACE allocations.

Key changes include:

- **ACE Tracing Implementation**:
  - Added environment variable to enable/disable detailed logging of allocation failures.
  - Instrumented , , and to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
  - Corrected to ensure is properly linked into , resolving an error.
- **LD_PRELOAD Wrapper Adjustments**:
  - Investigated and understood the wrapper's behavior under , particularly its interaction with and checks.
  - Enabled debugging flags for environment to prevent unintended fallbacks to 's for non-tiny allocations, allowing comprehensive testing of the allocator.
- **Debugging & Verification**:
  - Introduced temporary verbose logging to pinpoint execution flow issues within interception and routing. These temporary logs have been removed.
  - Created to facilitate testing of the tracing features.

This feature will significantly aid in diagnosing and resolving allocation-related OOM issues in by providing clear insights into the failure pathways.
246
PHASE8_VISUAL_SUMMARY.md
Normal file
@ -0,0 +1,246 @@
# Phase 8 Comprehensive Benchmark - Visual Summary

## Performance Comparison Charts

### Working Set 256 (Hot Cache) - Bar Chart

```
HAKMEM   ████████████████████████████████████████ 79.2 M ops/s (1.00x)
System   ███████████████████████████████████████████ 86.7 M ops/s (1.09x) ↑ 9%
mimalloc ██████████████████████████████████████████████████████████ 114.9 M ops/s (1.45x) ↑ 45%
```

### Working Set 8192 (Realistic Workload) - Bar Chart

```
HAKMEM   ████ 16.5 M ops/s (1.00x)
System   ██████████████ 57.1 M ops/s (3.46x) ↑ 246%
mimalloc ████████████████████████ 96.5 M ops/s (5.85x) ↑ 485%
```
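
The multipliers in both charts can be reproduced from the raw throughput figures. The following is a minimal sketch that uses only the numbers shown above; the helper name `relative_to` is illustrative and not part of HAKMEM.

```python
# Throughput in M ops/s, copied from the two bar charts.
THROUGHPUT = {
    "WS256": {"HAKMEM": 79.2, "System": 86.7, "mimalloc": 114.9},
    "WS8192": {"HAKMEM": 16.5, "System": 57.1, "mimalloc": 96.5},
}


def relative_to(baseline: str, results: dict[str, float]) -> dict[str, float]:
    """Return each allocator's throughput as a multiple of the baseline's."""
    base = results[baseline]
    return {name: round(value / base, 2) for name, value in results.items()}


for workload, results in THROUGHPUT.items():
    for name, ratio in relative_to("HAKMEM", results).items():
        # The percentage is the ratio expressed as a gain over the baseline.
        print(f"{workload:7s} {name:9s} {ratio:.2f}x ({(ratio - 1) * 100:+.0f}%)")
```

Using HAKMEM as the baseline matches the charts, where HAKMEM is pinned at 1.00x.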

## Scalability Comparison

### Performance Degradation (WS256 → WS8192)

```
mimalloc ████ 1.19x degradation [EXCELLENT]
System   ██████ 1.52x degradation [GOOD]
HAKMEM   ███████████████████ 4.80x degradation [CRITICAL ISSUE]
```
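
The degradation factor appears to be the WS256 throughput divided by the WS8192 throughput. A short sketch under that assumption, using the figures from the bar charts in this summary (the function name is illustrative):

```python
# (WS256, WS8192) throughput in M ops/s, taken from the charts in this summary.
RESULTS = {
    "mimalloc": (114.9, 96.5),
    "System": (86.7, 57.1),
    "HAKMEM": (79.2, 16.5),
}


def degradation(ws256: float, ws8192: float) -> float:
    """How many times slower the large working set runs than the small one."""
    return ws256 / ws8192


# Sort from least to most degraded, mirroring the order of the chart.
for name, (hot, large) in sorted(RESULTS.items(), key=lambda kv: degradation(*kv[1])):
    print(f"{name:9s} {degradation(hot, large):.2f}x degradation")
```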

## Performance Gap Analysis

### Cycle Budget (Estimated at 3.5 GHz)

| Allocator | Cycles/Op | Extra Cycles vs Best |
|-----------|-----------|----------------------|
| mimalloc  | 36        | 0 (baseline)         |
| System    | 61        | +25 (+69%)           |
| HAKMEM    | 212       | +176 (+489%)         |

**HAKMEM uses 176 extra cycles per operation compared to mimalloc!**
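
The table values are consistent with dividing the assumed 3.5 GHz clock by the WS8192 throughput of each allocator. A minimal sketch of that estimate follows; it ignores frequency scaling and any time spent outside the allocator, so it is an approximation, not a measurement.

```python
CLOCK_HZ = 3.5e9  # assumed clock frequency, as stated in the section heading

# WS8192 throughput in M ops/s, from the bar chart in this summary.
WS8192_MOPS = {"mimalloc": 96.5, "System": 57.1, "HAKMEM": 16.5}


def cycles_per_op(mops: float, clock_hz: float = CLOCK_HZ) -> float:
    """Estimated CPU cycles spent per operation at the given throughput."""
    return clock_hz / (mops * 1e6)


cycles = {name: round(cycles_per_op(mops)) for name, mops in WS8192_MOPS.items()}
best = min(cycles.values())

for name, value in cycles.items():
    extra = value - best
    print(f"{name:9s} {value:4d} cycles/op  extra: {extra} ({extra / best:+.0%})")
```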

### Where Are The Cycles Going?

```
Estimated cycle breakdown for HAKMEM WS8192:

SuperSlab Lookup: ████████████████ 50-80 cycles
Legacy Fallback:  ██████████████ 30-50 cycles (when triggered)
Fragmentation:    ███████████ 30-50 cycles
TLS Drain Logic:  ███ 10-15 cycles
Actual Work:      ████████ 30-40 cycles
─────────────────────────
Total: ~212 cycles/operation

mimalloc for comparison:
Optimized Fast Path: ████████ 36 cycles total
```

## Priority Ranking

### Critical Issues (Must Fix)

```
1. SuperSlab Scaling       Priority: CRITICAL   Impact: System is 246% faster at WS8192
   └─ 4.8x degradation vs 1.5x for System malloc
   └─ "shared_fail→legacy" messages indicate capacity issues

2. Fragmentation           Priority: HIGH       Impact: 30-50 cycles/op
   └─ SuperSlab list becomes inefficient at scale

3. TLB Pressure            Priority: HIGH       Impact: Unknown, likely high
   └─ Many 512KB SuperSlabs → TLB misses
```
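
To put the TLB concern in rough numbers, the sketch below counts how many pages one SuperSlab spans. It assumes 4 KiB base pages and 2 MiB huge pages (common on x86-64 Linux, but not stated in this summary) and treats "512KB" as 512 KiB.

```python
KIB = 1024
MIB = 1024 * KIB

BASE_PAGE = 4 * KIB  # assumption: 4 KiB base pages
HUGE_PAGE = 2 * MIB  # assumption: 2 MiB huge pages


def pages_spanned(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (and so distinct TLB entries) needed to map a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


print("512KB SuperSlab:", pages_spanned(512 * KIB, BASE_PAGE), "base pages")
print("2MB SuperSlab:  ", pages_spanned(2 * MIB, BASE_PAGE), "base pages, or",
      pages_spanned(2 * MIB, HUGE_PAGE), "huge page")
```

Whether a 2MB SuperSlab is actually backed by a single huge page depends on alignment and OS configuration, so this shows the best case and does not predict the gain.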

### Important Issues (Should Fix)

```
4. TLS Drain Overhead      Priority: MEDIUM     Impact: 9.4% on hot cache
   └─ Affects even best-case performance

5. Fast Path Efficiency    Priority: MEDIUM     Impact: 9.4% on hot cache
   └─ Need more aggressive inlining
```

### Nice-to-Have

```
6. Metadata Optimization   Priority: LOW        Impact: Unknown
   └─ Reduce cache pollution from slab metadata
```

## Competitive Position

### Current Status: Phase 8

```
Tier 1 (Production-Ready):
mimalloc ████████████████████████ 96.5 M ops/s
System   ██████████████ 57.1 M ops/s

Tier 2 (Needs Work):
(empty)

Tier 3 (Experimental):
HAKMEM   ████ 16.5 M ops/s ← YOU ARE HERE
```

### Target for Phase 12 (6 months)

```
Tier 1 (Production-Ready):
mimalloc ████████████████████████ 96.5 M ops/s
HAKMEM   ████████████████████ 80+ M ops/s ← TARGET
System   ██████████████ 57.1 M ops/s

Goal: Match or exceed System malloc, get within 20% of mimalloc
```

## Decision Matrix for Phase 9

### Option A: Fix SuperSlab Architecture (Recommended)

**Pros**:
- Preserve existing work
- Targeted fixes may yield big gains
- Debug logs provide clear direction

**Cons**:
- May be fundamentally flawed architecture
- Risk of incremental fixes not solving core issue

**Time estimate**: 2-3 weeks
**Success probability**: 60%

### Option B: Hybrid Architecture

**Pros**:
- Keep TLS fast path (working well)
- Replace SuperSlab backend with proven design
- Best of both worlds

**Cons**:
- Major refactoring required
- Lose SuperSlab work
- Integration complexity

**Time estimate**: 4-6 weeks
**Success probability**: 75%

### Option C: Start Over (Not Recommended Yet)

**Pros**:
- Clean slate
- Can copy proven designs (mimalloc, jemalloc)

**Cons**:
- Lose all current work
- No learning from mistakes
- 3+ months delay

**Time estimate**: 3-4 months
**Success probability**: 85% (but high cost)

## Recommended Path Forward

### Phase 9: SuperSlab Deep Dive (2 weeks)

**Week 1: Investigation**
- Add comprehensive profiling
- Measure cache/TLB misses
- Analyze fragmentation patterns
- Understand "shared_fail→legacy" root cause

**Week 2: Targeted Fixes**
- Implement hash table for SuperSlab lookup
- Experiment with larger SuperSlabs (1-2MB)
- Optimize fragmentation handling
- Add better capacity management
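
The hash-table lookup proposed for Week 2 could look roughly like the sketch below. It assumes every SuperSlab is aligned to its own size, so the owning slab's base address can be recovered by masking the low bits of any pointer inside it. That alignment is an assumption, and `SuperSlabIndex` with its methods is a hypothetical name, not HAKMEM's actual API.

```python
from typing import Optional

SUPERSLAB_SIZE = 512 * 1024  # 512KB, the SuperSlab size mentioned in this summary


class SuperSlabIndex:
    """Hypothetical O(1) address-to-SuperSlab lookup keyed by aligned base."""

    def __init__(self, slab_size: int = SUPERSLAB_SIZE) -> None:
        if slab_size <= 0 or slab_size & (slab_size - 1):
            raise ValueError("slab_size must be a power of two")
        self._offset_mask = slab_size - 1
        self._by_base: dict[int, str] = {}

    def register(self, base: int, slab: str) -> None:
        """Record a slab; its base must be aligned to the slab size."""
        if base & self._offset_mask:
            raise ValueError("base address is not slab-aligned")
        self._by_base[base] = slab

    def lookup(self, addr: int) -> Optional[str]:
        """Return the slab that contains addr, or None if it is unknown."""
        base = addr & ~self._offset_mask  # clear the in-slab offset bits
        return self._by_base.get(base)


index = SuperSlabIndex()
index.register(0x7F0000080000, "slab-A")
print(index.lookup(0x7F0000080000 + 0x1234))  # an address inside slab-A
print(index.lookup(0x7F0000100000))           # an address in no registered slab
```

A single hashed probe replaces a walk over the SuperSlab list, which is the cost the 50-80 cycle lookup estimate points at. A C implementation would more likely use a fixed-size open-addressing table than a general-purpose map.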

**Success criteria**:
- WS8192: 16.5 → 35+ M ops/s (2x improvement)
- Understand root cause even if fix incomplete

### Phase 10: Decision Point

**If Phase 9 successful (>35 M ops/s)**:
- Continue with SuperSlab optimizations
- Focus on fast path improvements
- Target: 50 M ops/s by Phase 12

**If Phase 9 unsuccessful (<30 M ops/s)**:
- Switch to Hybrid Architecture (Option B)
- Keep TLS layer, replace backend
- Target: 60 M ops/s by Phase 14
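
The decision rule can be written down as a small function. The thresholds come from the text above. The text does not say what happens between 30 and 35 M ops/s, so this sketch reports that band as undecided instead of guessing; the function name and return labels are illustrative.

```python
def phase10_decision(ws8192_mops: float) -> str:
    """Map the Phase 9 WS8192 result (M ops/s) to the Phase 10 path."""
    if ws8192_mops > 35.0:
        return "continue-superslab"  # keep optimizing the SuperSlab design
    if ws8192_mops < 30.0:
        return "switch-to-hybrid"    # Option B: keep TLS layer, replace backend
    return "undecided"               # 30-35 M ops/s is not covered by the plan


for result in (16.5, 32.0, 40.0):
    print(f"{result:5.1f} M ops/s -> {phase10_decision(result)}")
```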

## Key Metrics to Track

### Performance Metrics
- [ ] WS256 throughput (target: 85+ M ops/s)
- [ ] WS8192 throughput (target: 35+ M ops/s)
- [ ] Degradation ratio (target: <2.5x)

### Architecture Metrics
- [ ] SuperSlab lookup latency (target: <20 cycles)
- [ ] Cache miss rate (target: <5%)
- [ ] TLB miss rate (target: <1%)
- [ ] Fragmentation ratio (target: <20%)

### Debug Metrics
- [ ] "shared_fail→legacy" events (target: 0)
- [ ] TLS_SLL_HDR_RESET events (target: 0)
- [ ] Average SuperSlab count (target: <10 at WS8192)
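
One way to keep this checklist honest is to evaluate it mechanically after each benchmark run. The sketch below covers only the three performance metrics, with targets from the list above and the Phase 8 figures from this summary as sample input; the metric keys and the `check_targets` helper are hypothetical.

```python
import operator

# Metric key -> (comparison, target). "85+" is read as >=, "<2.5x" as <.
TARGETS = {
    "ws256_mops": (operator.ge, 85.0),
    "ws8192_mops": (operator.ge, 35.0),
    "degradation": (operator.lt, 2.5),
}


def check_targets(measured: dict[str, float]) -> dict[str, bool]:
    """Return, per metric, whether the measured value meets its target."""
    return {name: compare(measured[name], target)
            for name, (compare, target) in TARGETS.items()}


# Phase 8 baseline taken from the charts in this summary.
phase8 = {"ws256_mops": 79.2, "ws8192_mops": 16.5, "degradation": 79.2 / 16.5}

for name, ok in check_targets(phase8).items():
    print(f"- [{'x' if ok else ' '}] {name}")
```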

## Conclusion

**Phase 8 Status**: COMPLETE
- ✓ Comprehensive benchmarks executed
- ✓ Statistical analysis completed
- ✓ Root cause hypotheses identified
- ✓ Clear path forward defined

**Phase 9 Ready**: YES
- Clear investigation targets
- Specific metrics to measure
- Decision criteria established

**Confidence Level**: HIGH
- Data is robust (low variance)
- Gaps are well-understood
- Multiple viable paths forward

---

**Next Action**: Begin Phase 9 - SuperSlab Deep Dive and Profiling

**Timeline**:
- Phase 9: 2 weeks (investigation + targeted fixes)
- Phase 10: 1 week (decision point + planning)
- Phase 11-12: 3-4 weeks (major optimizations)
- Target completion: 6-8 weeks to production-ready

**Risk Level**: MEDIUM
- SuperSlab may be unfixable → fallback to Hybrid (Option B)
- Hybrid adds 2-3 weeks but higher success probability
- Total timeline stays within 10 weeks worst case