This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component. This feature allows for precise identification of the root cause for Out-Of-Memory (OOM) issues related to ACE allocations.

Key changes include:

- **ACE Tracing Implementation**:
  - Added environment variable to enable/disable detailed logging of allocation failures.
  - Instrumented , , and to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
  - Corrected to ensure is properly linked into , resolving an error.
- **LD_PRELOAD Wrapper Adjustments**:
  - Investigated and understood the wrapper's behavior under , particularly its interaction with and checks.
  - Enabled debugging flags for environment to prevent unintended fallbacks to 's for non-tiny allocations, allowing comprehensive testing of the allocator.
- **Debugging & Verification**:
  - Introduced temporary verbose logging to pinpoint execution flow issues within interception and routing. These temporary logs have been removed.
  - Created to facilitate testing of the tracing features.

This feature will significantly aid in diagnosing and resolving allocation-related OOM issues in by providing clear insights into the failure pathways.
================================================================================
Phase 8 Comprehensive Allocator Comparison - Analysis
================================================================================

## Working Set 256 (Hot cache, Phase 7 comparison)

| Allocator      | Avg (M ops/s) | StdDev (%) | Min - Max (M ops/s) | vs HAKMEM |
|----------------|---------------|------------|---------------------|-----------|
| HAKMEM Phase 8 | 79.2          | ± 2.4%     | 77.0 - 81.2         | 1.00x     |
| System malloc  | 86.7          | ± 1.0%     | 85.3 - 87.5         | 1.09x     |
| mimalloc       | 114.9         | ± 1.2%     | 112.5 - 116.2       | 1.45x     |

## Working Set 8192 (Realistic workload)

| Allocator      | Avg (M ops/s) | StdDev (%) | Min - Max (M ops/s) | vs HAKMEM |
|----------------|---------------|------------|---------------------|-----------|
| HAKMEM Phase 8 | 16.5          | ± 2.5%     | 15.8 - 16.9         | 1.00x     |
| System malloc  | 57.1          | ± 1.3%     | 56.1 - 57.8         | 3.46x     |
| mimalloc       | 96.5          | ± 0.9%     | 95.5 - 97.7         | 5.85x     |

================================================================================
Performance Analysis
================================================================================

### 1. Working Set 256 (Hot Cache) Results

- HAKMEM Phase 8: 79.2 M ops/s
- System malloc: 86.7 M ops/s (1.09x faster)
- mimalloc: 114.9 M ops/s (1.45x faster)

HAKMEM is **8.6% slower** than System malloc and **31.1% slower** than mimalloc. Equivalently, System malloc is 9.4% faster and mimalloc is 45.2% faster than HAKMEM.

### 2. Working Set 8192 (Realistic Workload) Results

- HAKMEM Phase 8: 16.5 M ops/s
- System malloc: 57.1 M ops/s (3.46x faster)
- mimalloc: 96.5 M ops/s (5.85x faster)

HAKMEM is **71.1% slower** than System malloc and **82.9% slower** than mimalloc. Equivalently, System malloc is 246.0% faster and mimalloc is 484.9% faster than HAKMEM.

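The "x faster" ratios and the "percent slower" figures are two views of the same pair of averages, and they are easy to confuse: a throughput can never be more than 100% slower than another. A minimal Python sketch of both conversions, using the rounded averages from the tables above (the function names are illustrative, and results from rounded averages can differ in the last digit from figures computed on the raw runs):

```python
def speedup(other: float, baseline: float) -> float:
    """How many times higher `other` throughput is than `baseline`."""
    return other / baseline


def percent_slower(baseline: float, other: float) -> float:
    """How far `baseline` falls below `other`, as a percentage of `other`."""
    return (1.0 - baseline / other) * 100.0


# Rounded averages in M ops/s, copied from the tables in this report.
AVERAGES = {
    "WS256": {"hakmem": 79.2, "system": 86.7, "mimalloc": 114.9},
    "WS8192": {"hakmem": 16.5, "system": 57.1, "mimalloc": 96.5},
}

for ws, avg in AVERAGES.items():
    for rival in ("system", "mimalloc"):
        ratio = speedup(avg[rival], avg["hakmem"])
        gap = percent_slower(avg["hakmem"], avg[rival])
        print(f"{ws}: {rival} is {ratio:.2f}x faster; HAKMEM is {gap:.1f}% slower")
```

The two views are linked by `percent_slower = (1 - 1/speedup) * 100`, so a 3.46x speedup corresponds to roughly a 71% gap, not 246%.
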
================================================================================
Critical Observations
================================================================================

### HAKMEM Performance Gap Analysis

Performance degradation from WS256 to WS8192:

- HAKMEM: 4.80x slowdown (79.2 → 16.5 M ops/s)
- System: 1.52x slowdown (86.7 → 57.1 M ops/s)
- mimalloc: 1.19x slowdown (114.9 → 96.5 M ops/s)

HAKMEM degrades **3.16x MORE** than System malloc
HAKMEM degrades **4.03x MORE** than mimalloc

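The slowdown factors above are the WS256 average divided by the WS8192 average, and the "x MORE" figures are the ratio of HAKMEM's factor to each competitor's. A short Python sketch of that arithmetic, assuming the rounded averages from the tables (variable names are illustrative):

```python
# (WS256 average, WS8192 average) in M ops/s, from the tables in this report.
AVERAGES = {
    "hakmem": (79.2, 16.5),
    "system": (86.7, 57.1),
    "mimalloc": (114.9, 96.5),
}


def slowdown(hot: float, large: float) -> float:
    """Factor by which throughput drops when moving to the larger working set."""
    return hot / large


factors = {name: slowdown(hot, large) for name, (hot, large) in AVERAGES.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x slowdown")

# How much more HAKMEM degrades than each competitor.
for rival in ("system", "mimalloc"):
    print(f"HAKMEM degrades {factors['hakmem'] / factors[rival]:.2f}x more than {rival}")
```
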
### Key Issues Identified

1. **Hot Cache Performance (WS256)**:
   - HAKMEM: 79.2 M ops/s
   - Gap: -8.6% vs System, -31.1% vs mimalloc
   - Issue: Fast-path overhead (TLS drain, SuperSlab lookup)

2. **Realistic Workload Performance (WS8192)**:
   - HAKMEM: 16.5 M ops/s
   - Gap: -71.1% vs System, -82.9% vs mimalloc
   - Issue: SEVERE - SuperSlab scaling, fragmentation, TLB pressure

3. **Scalability Problem**:
   - HAKMEM loses 4.8x performance with larger working sets
   - System loses only 1.5x
   - mimalloc loses only 1.2x
   - Suspected root cause: SuperSlab architecture doesn't scale well

================================================================================
Recommendations for Phase 9+
================================================================================

### CRITICAL PRIORITY: Fix WS8192 Performance Gap

The 71-83% performance gap at realistic working sets is UNACCEPTABLE.

**Immediate Actions Required:**

1. **Investigate SuperSlab Scaling (Phase 9)**
   - Profile: Why does performance collapse with larger working sets?
   - Hypothesis: SuperSlab lookup overhead, fragmentation, or TLB misses
   - Debug logs show 'shared_fail→legacy' messages, which point to shared slab exhaustion

2. **Optimize Fast Path (Phase 10)**
   - Even at WS256, HAKMEM trails System malloc by 8.6% and mimalloc by 31.1%
   - Profile TLS drain overhead
   - Consider reducing drain frequency or lazy draining

3. **Consider Alternative Architectures (Phase 11)**
   - Current SuperSlab model may be fundamentally flawed
   - Benchmark shows 4.8x degradation vs 1.5x for System malloc
   - May need hybrid approach: TLS fast path + different backend

4. **Specific Debug Actions**
   - Analyze '[SS_BACKEND] shared_fail→legacy' logs
   - Measure SuperSlab hit rate at different working set sizes
   - Profile cache misses and TLB misses

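One way to start on the log analysis listed above is to count how often the `shared_fail→legacy` marker appears at each working-set size. A minimal Python sketch, assuming the debug output has been captured as text lines; the sample lines and the `fallback_counts` helper are hypothetical, and only the marker substring comes from this report:

```python
from typing import Dict, Iterable

# Substring quoted in this report; the rest of each log line is an assumption.
MARKER = "shared_fail→legacy"


def count_marker(lines: Iterable[str], marker: str = MARKER) -> int:
    """Count log lines that contain the fallback marker."""
    return sum(1 for line in lines if marker in line)


def fallback_counts(logs_by_ws: Dict[int, Iterable[str]]) -> Dict[int, int]:
    """Map each working-set size to its number of fallback log lines."""
    return {ws: count_marker(lines) for ws, lines in logs_by_ws.items()}


# Hypothetical captured output, for illustration only.
sample_logs = {
    256: ["[SS_BACKEND] shared_ok"],
    8192: [
        "[SS_BACKEND] shared_fail→legacy",
        "[SS_BACKEND] shared_ok",
        "[SS_BACKEND] shared_fail→legacy",
    ],
}

for ws, count in fallback_counts(sample_logs).items():
    print(f"WS{ws}: {count} fallback line(s)")
```

A count that grows sharply between WS256 and WS8192 would support the shared slab exhaustion hypothesis; a flat count would point elsewhere.
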
================================================================================
Raw Data (for reproducibility)
================================================================================

hakmem_256    : [78480676, 78099247, 77034450, 81120430, 81206714]
system_256    : [87329938, 86497843, 87514376, 85308713, 86630819]
mimalloc_256  : [115842807, 115180313, 116209200, 112542094, 114950573]
hakmem_8192   : [16504443, 15799180, 16916987, 16687009, 16582555]
system_8192   : [56095157, 57843156, 56999206, 57717254, 56720055]
mimalloc_8192 : [96824532, 96117137, 95521242, 97733856, 96327554]

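The summary tables can be recomputed from these runs. A minimal Python sketch, assuming the values are operations per second and that the StdDev column is the sample standard deviation expressed as a percentage of the mean (that assumption reproduces the reported ± 2.4% for `hakmem_256`); the `summarize` helper is illustrative:

```python
import statistics

# Raw runs in ops/s, copied from this report.
RAW = {
    "hakmem_256": [78480676, 78099247, 77034450, 81120430, 81206714],
    "system_256": [87329938, 86497843, 87514376, 85308713, 86630819],
    "mimalloc_256": [115842807, 115180313, 116209200, 112542094, 114950573],
    "hakmem_8192": [16504443, 15799180, 16916987, 16687009, 16582555],
    "system_8192": [56095157, 57843156, 56999206, 57717254, 56720055],
    "mimalloc_8192": [96824532, 96117137, 95521242, 97733856, 96327554],
}


def summarize(runs):
    """Return mean, min, max in M ops/s and sample stddev as a percent of the mean."""
    mean = statistics.mean(runs)
    return {
        "avg": mean / 1e6,
        "stddev_pct": statistics.stdev(runs) / mean * 100.0,
        "min": min(runs) / 1e6,
        "max": max(runs) / 1e6,
    }


for name, runs in RAW.items():
    s = summarize(runs)
    print(f"{name:14s} avg={s['avg']:.1f} ±{s['stddev_pct']:.1f}% "
          f"range={s['min']:.1f}-{s['max']:.1f} M ops/s")
```

With only five runs per configuration, the spread figures are rough indicators rather than tight confidence bounds.
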
================================================================================
Analysis Complete
================================================================================