This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component. This feature allows for precise identification of the root cause for Out-Of-Memory (OOM) issues related to ACE allocations.

Key changes include:

- **ACE Tracing Implementation**:
  - Added environment variable to enable/disable detailed logging of allocation failures.
  - Instrumented , , and to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
  - Corrected to ensure is properly linked into , resolving an error.
- **LD_PRELOAD Wrapper Adjustments**:
  - Investigated and understood the wrapper's behavior under , particularly its interaction with and checks.
  - Enabled debugging flags for environment to prevent unintended fallbacks to 's for non-tiny allocations, allowing comprehensive testing of the allocator.
- **Debugging & Verification**:
  - Introduced temporary verbose logging to pinpoint execution flow issues within interception and routing. These temporary logs have been removed.
  - Created to facilitate testing of the tracing features.

This feature will significantly aid in diagnosing and resolving allocation-related OOM issues in by providing clear insights into the failure pathways.
Phase 8 Comprehensive Benchmark - Visual Summary
Performance Comparison Charts
Working Set 256 (Hot Cache) - Bar Chart
HAKMEM ████████████████████████████████████████ 79.2 M ops/s (1.00x)
System ███████████████████████████████████████████ 86.7 M ops/s (1.09x) ↑ 9%
mimalloc ██████████████████████████████████████████████████████████ 114.9 M ops/s (1.45x) ↑ 45%
Working Set 8192 (Realistic Workload) - Bar Chart
HAKMEM ████ 16.5 M ops/s (1.00x)
System ██████████████ 57.1 M ops/s (3.46x) ↑ 246%
mimalloc ████████████████████████ 96.5 M ops/s (5.85x) ↑ 485%
Scalability Comparison
Performance Degradation (WS256 → WS8192)
mimalloc ████ 1.19x degradation [EXCELLENT]
System ██████ 1.52x degradation [GOOD]
HAKMEM ███████████████████ 4.80x degradation [CRITICAL ISSUE]
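The multipliers and degradation ratios in the charts above follow directly from the throughput figures. A minimal Python sketch that recomputes them is shown here; the dictionary layout and function names are illustrative and not part of the benchmark harness.

```python
# Throughput in M ops/s, copied from the two bar charts above.
THROUGHPUT = {
    "HAKMEM":   {"ws256": 79.2,  "ws8192": 16.5},
    "System":   {"ws256": 86.7,  "ws8192": 57.1},
    "mimalloc": {"ws256": 114.9, "ws8192": 96.5},
}


def speedup_vs(baseline, name, ws):
    """Throughput of `name` relative to `baseline` for one working set."""
    return THROUGHPUT[name][ws] / THROUGHPUT[baseline][ws]


def degradation(name):
    """Factor by which throughput drops going from WS256 to WS8192."""
    return THROUGHPUT[name]["ws256"] / THROUGHPUT[name]["ws8192"]


if __name__ == "__main__":
    for name in THROUGHPUT:
        rel = speedup_vs("HAKMEM", name, "ws8192")
        print(f"{name:9s} WS8192 vs HAKMEM: {rel:.2f}x, "
              f"degradation: {degradation(name):.2f}x")
```

Keeping the raw throughput in one table makes it easy to regenerate every derived number when the benchmark is rerun.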
Performance Gap Analysis
Cycle Budget (WS8192, Estimated at 3.5 GHz)
| Allocator | Cycles/Op | Extra Cycles vs Best |
|---|---|---|
| mimalloc | 36 | 0 (baseline) |
| System | 61 | +25 (+69%) |
| HAKMEM | 212 | +176 (+489%) |
HAKMEM uses 176 extra cycles per operation compared to mimalloc!
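The cycle figures are consistent with dividing the assumed clock rate by the WS8192 throughput. The sketch below reproduces the table under that assumption; the helper names are hypothetical, and the result is an estimate that ignores frequency scaling and benchmark-loop overhead.

```python
CLOCK_HZ = 3.5e9  # 3.5 GHz, the clock rate assumed in the table above

# WS8192 throughput in M ops/s, from the benchmark charts.
WS8192_MOPS = {"mimalloc": 96.5, "System": 57.1, "HAKMEM": 16.5}


def cycles_per_op(mops, clock_hz=CLOCK_HZ):
    """Estimated cycles per operation at a given throughput (M ops/s)."""
    return clock_hz / (mops * 1e6)


def cycle_table(throughput):
    """Return {name: (cycles, extra_cycles, extra_percent)} vs the best allocator."""
    cycles = {name: round(cycles_per_op(m)) for name, m in throughput.items()}
    best = min(cycles.values())
    return {
        name: (c, c - best, round(100 * (c - best) / best))
        for name, c in cycles.items()
    }


if __name__ == "__main__":
    for name, (c, extra, pct) in cycle_table(WS8192_MOPS).items():
        print(f"{name:9s} {c:4d} cycles/op  +{extra} (+{pct}%)")
```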
Where Are The Cycles Going?
Estimated cycle breakdown for HAKMEM WS8192:
SuperSlab Lookup: ████████████████ 50-80 cycles
Legacy Fallback: ██████████████ 30-50 cycles (when triggered)
Fragmentation: ███████████ 30-50 cycles
TLS Drain Logic: ███ 10-15 cycles
Actual Work: ████████ 30-40 cycles
─────────────────────────
Total: ~212 cycles/operation
mimalloc for comparison:
Optimized Fast Path: ████████ 36 cycles total
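As a sanity check on the breakdown, the per-component ranges can be summed to confirm that they bracket the ~212 cycles/operation total. This is a sketch using only the estimates listed above; the variable names are illustrative.

```python
# Estimated (low, high) cycle ranges for HAKMEM at WS8192, from the breakdown above.
BREAKDOWN = {
    "SuperSlab Lookup": (50, 80),
    "Legacy Fallback":  (30, 50),   # only when triggered
    "Fragmentation":    (30, 50),
    "TLS Drain Logic":  (10, 15),
    "Actual Work":      (30, 40),
}

MEASURED_TOTAL = 212  # cycles/op derived from 16.5 M ops/s at 3.5 GHz

low_total = sum(low for low, _ in BREAKDOWN.values())
high_total = sum(high for _, high in BREAKDOWN.values())

if __name__ == "__main__":
    print(f"estimated range: {low_total}-{high_total} cycles/op")
    print(f"measured total within range: {low_total <= MEASURED_TOTAL <= high_total}")
```

The measured total sits in the upper part of the range, which fits the legacy fallback being triggered often at this working-set size, although the breakdown itself remains an estimate until profiled.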
Priority Ranking
Critical Issues (Must Fix)
1. SuperSlab Scaling Priority: CRITICAL Impact: System is 3.46x faster (+246%) at WS8192
└─ 4.8x degradation vs 1.5x for System malloc
└─ "shared_fail→legacy" messages indicate capacity issues
2. Fragmentation Priority: HIGH Impact: 30-50 cycles/op
└─ SuperSlab list becomes inefficient at scale
3. TLB Pressure Priority: HIGH Impact: Unknown, likely high
└─ Many 512KB SuperSlabs → TLB misses
Important Issues (Should Fix)
4. TLS Drain Overhead Priority: MEDIUM Impact: ~9% gap vs System on hot cache
└─ Affects even best-case performance
5. Fast Path Efficiency Priority: MEDIUM Impact: ~9% gap vs System on hot cache
└─ Need more aggressive inlining
Nice-to-Have
6. Metadata Optimization Priority: LOW Impact: Unknown
└─ Reduce cache pollution from slab metadata
Competitive Position
Current Status: Phase 8
Tier 1 (Production-Ready):
mimalloc ████████████████████████ 96.5 M ops/s
System ██████████████ 57.1 M ops/s
Tier 2 (Needs Work):
(empty)
Tier 3 (Experimental):
HAKMEM ████ 16.5 M ops/s ← YOU ARE HERE
Target for Phase 12 (6 months)
Tier 1 (Production-Ready):
mimalloc ████████████████████████ 96.5 M ops/s
HAKMEM ████████████████████ 80+ M ops/s ← TARGET
System ██████████████ 57.1 M ops/s
Goal: Match or exceed System malloc, get within 20% of mimalloc
Decision Matrix for Phase 9
Option A: Fix SuperSlab Architecture (Recommended)
Pros:
- Preserve existing work
- Targeted fixes may yield big gains
- Debug logs provide clear direction
Cons:
- May be fundamentally flawed architecture
- Risk of incremental fixes not solving core issue
Time estimate: 2-3 weeks Success probability: 60%
Option B: Hybrid Architecture
Pros:
- Keep TLS fast path (working well)
- Replace SuperSlab backend with proven design
- Best of both worlds
Cons:
- Major refactoring required
- Lose SuperSlab work
- Integration complexity
Time estimate: 4-6 weeks Success probability: 75%
Option C: Start Over (Not Recommended Yet)
Pros:
- Clean slate
- Can copy proven designs (mimalloc, jemalloc)
Cons:
- Lose all current work
- No learning from mistakes
- 3+ months delay
Time estimate: 3-4 months Success probability: 85% (but high cost)
Recommended Path Forward
Phase 9: SuperSlab Deep Dive (2 weeks)
Week 1: Investigation
- Add comprehensive profiling
- Measure cache/TLB misses
- Analyze fragmentation patterns
- Understand "shared_fail→legacy" root cause
Week 2: Targeted Fixes
- Implement hash table for SuperSlab lookup
- Experiment with larger SuperSlabs (1-2MB)
- Optimize fragmentation handling
- Add better capacity management
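The hash-table lookup proposed for Week 2 can be illustrated with a small model. The sketch below is written in Python for brevity and is not HAKMEM code: it assumes SuperSlabs are aligned to their own power-of-two size, so that masking a pointer yields the base address used as the hash key. The class and method names are hypothetical.

```python
SUPERSLAB_SIZE = 512 * 1024  # 512 KB, the SuperSlab size mentioned under TLB Pressure


class SuperSlabRegistry:
    """Maps an aligned SuperSlab base address to its metadata in O(1) on average."""

    def __init__(self, slab_size=SUPERSLAB_SIZE):
        # The masking trick below only works for power-of-two sizes.
        if slab_size <= 0 or slab_size & (slab_size - 1):
            raise ValueError("slab_size must be a power of two")
        self.slab_size = slab_size
        self._by_base = {}

    def _base_of(self, addr):
        # Clear the low bits to get the start of the enclosing aligned region.
        return addr & ~(self.slab_size - 1)

    def register(self, base, slab):
        if base != self._base_of(base):
            raise ValueError("SuperSlab base must be aligned to slab_size")
        self._by_base[base] = slab

    def unregister(self, base):
        self._by_base.pop(base, None)

    def lookup(self, addr):
        """Return the SuperSlab owning `addr`, or None if it is not tracked."""
        return self._by_base.get(self._base_of(addr))


if __name__ == "__main__":
    registry = SuperSlabRegistry()
    registry.register(0x200000, "slab-A")
    print(registry.lookup(0x200000 + 1234))
    print(registry.lookup(0x200000 + SUPERSLAB_SIZE))
```

Compared with walking a list of SuperSlabs, the cost of a lookup no longer grows with the number of live SuperSlabs, which is the property the <20 cycle lookup target depends on. Changing `slab_size` to 1 MB or 2 MB models the larger-SuperSlab experiment without altering the lookup logic.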
Success criteria:
- WS8192: 16.5 → 35+ M ops/s (2x improvement)
- Understand root cause even if fix incomplete
Phase 10: Decision Point
If Phase 9 successful (>35 M ops/s):
- Continue with SuperSlab optimizations
- Focus on fast path improvements
- Target: 50 M ops/s by Phase 12
If Phase 9 unsuccessful (<30 M ops/s):
- Switch to Hybrid Architecture (Option B)
- Keep TLS layer, replace backend
- Target: 60 M ops/s by Phase 14
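The decision rule can be written down as a small function. Note that the thresholds above leave results between 30 and 35 M ops/s unclassified; the sketch below returns "inconclusive" for that band, which is an assumption of this example rather than something the plan specifies. The function name and return labels are likewise illustrative.

```python
SUCCESS_THRESHOLD_MOPS = 35.0  # Phase 9 counts as successful above this
FAILURE_THRESHOLD_MOPS = 30.0  # Phase 9 counts as unsuccessful below this


def phase10_decision(ws8192_mops):
    """Map the Phase 9 WS8192 throughput (M ops/s) to a Phase 10 direction."""
    if ws8192_mops > SUCCESS_THRESHOLD_MOPS:
        return "continue-superslab"   # keep optimizing SuperSlab, focus on fast path
    if ws8192_mops < FAILURE_THRESHOLD_MOPS:
        return "switch-to-hybrid"     # Option B: keep TLS layer, replace backend
    # The plan does not say what happens in the 30-35 band (assumption).
    return "inconclusive"


if __name__ == "__main__":
    for result in (16.5, 32.0, 40.0):
        print(f"{result:5.1f} M ops/s -> {phase10_decision(result)}")
```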
Key Metrics to Track
Performance Metrics
- WS256 throughput (target: 85+ M ops/s)
- WS8192 throughput (target: 35+ M ops/s)
- Degradation ratio (target: <2.5x)
Architecture Metrics
- SuperSlab lookup latency (target: <20 cycles)
- Cache miss rate (target: <5%)
- TLB miss rate (target: <1%)
- Fragmentation ratio (target: <20%)
Debug Metrics
- "shared_fail→legacy" events (target: 0)
- TLS_SLL_HDR_RESET events (target: 0)
- Average SuperSlab count (target: <10 at WS8192)
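One way to track these targets across benchmark runs is a small table of comparisons. In the sketch below the metric keys are hypothetical names chosen for this example, "85+" is read as "at least 85", and only metrics present in a given measurement are checked, since most architecture and debug metrics have not been measured yet.

```python
import operator

# (comparison, target) per metric; keys are illustrative names, values come from the lists above.
TARGETS = {
    "ws256_mops":                 (operator.ge, 85.0),
    "ws8192_mops":                (operator.ge, 35.0),
    "degradation_ratio":          (operator.lt, 2.5),
    "superslab_lookup_cycles":    (operator.lt, 20),
    "cache_miss_rate_pct":        (operator.lt, 5.0),
    "tlb_miss_rate_pct":          (operator.lt, 1.0),
    "fragmentation_pct":          (operator.lt, 20.0),
    "shared_fail_legacy_events":  (operator.eq, 0),
    "tls_sll_hdr_reset_events":   (operator.eq, 0),
    "avg_superslab_count_ws8192": (operator.lt, 10),
}


def unmet_targets(measured):
    """Return the sorted names of measured metrics that miss their target."""
    return sorted(
        name
        for name, value in measured.items()
        if not TARGETS[name][0](value, TARGETS[name][1])
    )


if __name__ == "__main__":
    # Phase 8 results for the three performance metrics.
    phase8 = {"ws256_mops": 79.2, "ws8192_mops": 16.5, "degradation_ratio": 4.8}
    print(unmet_targets(phase8))
```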
Conclusion
Phase 8 Status: COMPLETE
- ✓ Comprehensive benchmarks executed
- ✓ Statistical analysis completed
- ✓ Root cause hypotheses identified
- ✓ Clear path forward defined
Phase 9 Ready: YES
- Clear investigation targets
- Specific metrics to measure
- Decision criteria established
Confidence Level: HIGH
- Data is robust (low variance)
- Gaps are well-understood
- Multiple viable paths forward
Next Action: Begin Phase 9 - SuperSlab Deep Dive and Profiling
Timeline:
- Phase 9: 2 weeks (investigation + targeted fixes)
- Phase 10: 1 week (decision point + planning)
- Phase 11-12: 3-4 weeks (major optimizations)
- Target completion: 6-8 weeks to production-ready
Risk Level: MEDIUM
- SuperSlab may be unfixable → fallback to Hybrid (Option B)
- Hybrid adds 2-3 weeks but higher success probability
- Total timeline stays within 10 weeks worst case