hakmem/PHASE8_VISUAL_SUMMARY.md
Moe Charm (CI) 4ef0171bc0 feat: Add ACE allocation failure tracing and debug hooks
This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component. This feature allows for precise identification of the root cause for Out-Of-Memory (OOM) issues related to ACE allocations.

Key changes include:
- **ACE Tracing Implementation**:
  - Added  environment variable to enable/disable detailed logging of allocation failures.
  - Instrumented , , and  to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
  - Corrected  to ensure  is properly linked into , resolving an  error.
- **LD_PRELOAD Wrapper Adjustments**:
  - Investigated and understood the  wrapper's behavior under , particularly its interaction with  and  checks.
  - Enabled debugging flags for  environment to prevent unintended fallbacks to 's  for non-tiny allocations, allowing comprehensive testing of the  allocator.
- **Debugging & Verification**:
  - Introduced temporary verbose logging to pinpoint execution flow issues within  interception and  routing. These temporary logs have been removed.
  - Created  to facilitate testing of the tracing features.

This feature will significantly aid in diagnosing and resolving allocation-related OOM issues in  by providing clear insights into the failure pathways.
2025-12-01 16:37:59 +09:00


Phase 8 Comprehensive Benchmark - Visual Summary

Performance Comparison Charts

Working Set 256 (Hot Cache) - Bar Chart

HAKMEM    ████████████████████████████████████████ 79.2 M ops/s (1.00x)
System    ███████████████████████████████████████████ 86.7 M ops/s (1.09x) ↑ 9%
mimalloc  ██████████████████████████████████████████████████████████ 114.9 M ops/s (1.45x) ↑ 45%

Working Set 8192 (Realistic Workload) - Bar Chart

HAKMEM    ████ 16.5 M ops/s (1.00x)
System    ██████████████ 57.1 M ops/s (3.46x) ↑ 246%
mimalloc  ████████████████████████ 96.5 M ops/s (5.85x) ↑ 485%

Scalability Comparison

Performance Degradation (WS256 → WS8192)

mimalloc  ████ 1.19x degradation  [EXCELLENT]
System    ██████ 1.52x degradation  [GOOD]
HAKMEM    ███████████████████ 4.80x degradation  [CRITICAL ISSUE]
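
The degradation figures above are the WS256 throughput divided by the WS8192 throughput. A minimal sketch that reproduces them from the chart values (the dictionary layout and function names are illustrative, not part of HAKMEM):

```python
# Throughput in M ops/s, copied from the two bar charts above.
THROUGHPUT = {
    "HAKMEM":   {"ws256": 79.2,  "ws8192": 16.5},
    "System":   {"ws256": 86.7,  "ws8192": 57.1},
    "mimalloc": {"ws256": 114.9, "ws8192": 96.5},
}

def degradation(allocator: str) -> float:
    """WS256 throughput divided by WS8192 throughput (higher is worse)."""
    t = THROUGHPUT[allocator]
    return t["ws256"] / t["ws8192"]

def relative_to_hakmem(allocator: str, working_set: str) -> float:
    """Throughput of an allocator as a multiple of HAKMEM's."""
    return THROUGHPUT[allocator][working_set] / THROUGHPUT["HAKMEM"][working_set]

for name in THROUGHPUT:
    print(f"{name:9s} {degradation(name):.2f}x degradation")
```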

Performance Gap Analysis

Cycle Budget (Estimated at 3.5 GHz)

Allocator   Cycles/Op   Extra Cycles vs Best
mimalloc    36          0 (baseline)
System      61          +25 (+69%)
HAKMEM      212         +176 (+489%)

HAKMEM spends an estimated 176 extra cycles per operation compared to mimalloc!
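
The cycle estimates follow from dividing the assumed clock rate by the WS8192 throughput. A small sketch of that arithmetic, assuming the 3.5 GHz figure from the heading and a single core running at that clock (function and variable names are illustrative):

```python
CLOCK_HZ = 3.5e9  # assumed clock rate, as stated in the heading above

# WS8192 throughput in M ops/s, from the bar chart.
WS8192_MOPS = {"mimalloc": 96.5, "System": 57.1, "HAKMEM": 16.5}

def cycles_per_op(mops: float, clock_hz: float = CLOCK_HZ) -> float:
    """Estimated cycles per operation at the given throughput."""
    return clock_hz / (mops * 1e6)

best = min(cycles_per_op(v) for v in WS8192_MOPS.values())
for name, mops in WS8192_MOPS.items():
    c = cycles_per_op(mops)
    print(f"{name:9s} {c:6.1f} cycles/op  (+{c - best:.1f} vs best)")
```

The percentages in the table are derived from the rounded cycle counts (176 / 36 ≈ 489%). With unrounded values the HAKMEM overhead comes out near 485%, which matches the throughput gap in the WS8192 chart.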

Where Are The Cycles Going?

Estimated cycle breakdown for HAKMEM WS8192:

SuperSlab Lookup:      ████████████████ 50-80 cycles
Legacy Fallback:       ██████████████ 30-50 cycles (when triggered)
Fragmentation:         ███████████ 30-50 cycles
TLS Drain Logic:       ███ 10-15 cycles
Actual Work:           ████████ 30-40 cycles
                       ─────────────────────────
Total:                 ~212 cycles/operation

mimalloc for comparison:
Optimized Fast Path:   ████████ 36 cycles total
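
As a sanity check, the component ranges in the HAKMEM breakdown can be summed to confirm that the ~212 cycle total falls inside the estimated envelope. A minimal sketch using only the ranges listed above:

```python
# (low, high) cycle estimates per component, copied from the breakdown above.
BREAKDOWN = {
    "SuperSlab Lookup": (50, 80),
    "Legacy Fallback":  (30, 50),   # only when triggered
    "Fragmentation":    (30, 50),
    "TLS Drain Logic":  (10, 15),
    "Actual Work":      (30, 40),
}

low = sum(lo for lo, _ in BREAKDOWN.values())
high = sum(hi for _, hi in BREAKDOWN.values())
work_lo, work_hi = BREAKDOWN["Actual Work"]

print(f"total estimate: {low}-{high} cycles/op")
print(f"overhead excluding actual work: {low - work_lo}-{high - work_hi} cycles/op")
```

The measured 212 cycles/op sits toward the upper end of the 150-235 range, which is consistent with the legacy fallback being triggered often at WS8192.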

Priority Ranking

Critical Issues (Must Fix)

1. SuperSlab Scaling          Priority: CRITICAL    Impact: 3.46x slower than System at WS8192 (System is 246% faster)
   └─ 4.8x degradation vs 1.5x for System malloc
   └─ "shared_fail→legacy" messages indicate capacity issues

2. Fragmentation             Priority: HIGH        Impact: 30-50 cycles/op
   └─ SuperSlab list becomes inefficient at scale

3. TLB Pressure              Priority: HIGH        Impact: Unknown, likely high
   └─ Many 512KB SuperSlabs → TLB misses
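
To see why many 512KB SuperSlabs could pressure the TLB, it helps to count the page-table entries each one needs. The sketch below assumes 4 KiB base pages and a hypothetical 1536-entry second-level TLB. Both values are assumptions for illustration and were not measured on the benchmark machine.

```python
KIB = 1024
SUPERSLAB_BYTES = 512 * KIB   # SuperSlab size quoted in the list above
PAGE_BYTES = 4 * KIB          # assumption: 4 KiB base pages
TLB_ENTRIES = 1536            # assumption: hypothetical second-level TLB capacity

def pages_per_superslab(superslab_bytes: int = SUPERSLAB_BYTES,
                        page_bytes: int = PAGE_BYTES) -> int:
    """Number of pages (and thus TLB entries) one SuperSlab spans."""
    return -(-superslab_bytes // page_bytes)  # ceiling division

def superslabs_within_tlb_reach(entries: int = TLB_ENTRIES) -> int:
    """How many fully touched SuperSlabs fit in the TLB at once."""
    return entries // pages_per_superslab()

print(pages_per_superslab(), "pages per SuperSlab")
print(superslabs_within_tlb_reach(), "SuperSlabs within TLB reach")
```

Under these assumptions only about a dozen fully touched SuperSlabs fit within TLB reach, so a working set spread across more of them would be expected to miss. Actual TLB miss rates still need to be measured.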

Important Issues (Should Fix)

4. TLS Drain Overhead        Priority: MEDIUM      Impact: part of the 9.4% hot-cache gap to System
   └─ Affects even best-case performance

5. Fast Path Efficiency      Priority: MEDIUM      Impact: part of the 9.4% hot-cache gap to System
   └─ Need more aggressive inlining

Nice-to-Have

6. Metadata Optimization     Priority: LOW         Impact: Unknown
   └─ Reduce cache pollution from slab metadata

Competitive Position

Current Status: Phase 8

Tier 1 (Production-Ready):
  mimalloc   ████████████████████████ 96.5 M ops/s
  System     ██████████████ 57.1 M ops/s

Tier 2 (Needs Work):
  (empty)

Tier 3 (Experimental):
  HAKMEM     ████ 16.5 M ops/s  ← YOU ARE HERE

Target for Phase 12 (6 months)

Tier 1 (Production-Ready):
  mimalloc   ████████████████████████ 96.5 M ops/s
  HAKMEM     ████████████████████ 80+ M ops/s  ← TARGET
  System     ██████████████ 57.1 M ops/s

Goal: Match or exceed System malloc, get within 20% of mimalloc
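
The goal can be expressed as a simple predicate on WS8192 throughput. This sketch reads "within 20% of mimalloc" as reaching at least 80% of mimalloc's throughput, which is an interpretation rather than something the report states explicitly:

```python
MIMALLOC_MOPS = 96.5  # WS8192, from the chart above
SYSTEM_MOPS = 57.1    # WS8192, from the chart above

def meets_phase12_goal(hakmem_mops: float) -> bool:
    """True if HAKMEM matches System and reaches >= 80% of mimalloc."""
    matches_system = hakmem_mops >= SYSTEM_MOPS
    within_20_percent = hakmem_mops >= 0.8 * MIMALLOC_MOPS  # about 77.2 M ops/s
    return matches_system and within_20_percent

print(meets_phase12_goal(80.0))  # the 80+ M ops/s target
print(meets_phase12_goal(16.5))  # current Phase 8 result
```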

Decision Matrix for Phase 9

Option A: Continue with SuperSlab

Pros:

  • Preserve existing work
  • Targeted fixes may yield big gains
  • Debug logs provide clear direction

Cons:

  • May be fundamentally flawed architecture
  • Risk of incremental fixes not solving core issue

Time estimate: 2-3 weeks
Success probability: 60%

Option B: Hybrid Architecture

Pros:

  • Keep TLS fast path (working well)
  • Replace SuperSlab backend with proven design
  • Best of both worlds

Cons:

  • Major refactoring required
  • Lose SuperSlab work
  • Integration complexity

Time estimate: 4-6 weeks
Success probability: 75%

Option C: Clean-Slate Rewrite

Pros:

  • Clean slate
  • Can copy proven designs (mimalloc, jemalloc)

Cons:

  • Lose all current work
  • No learning from mistakes
  • 3+ months delay

Time estimate: 3-4 months
Success probability: 85% (but high cost)

Phase 9: SuperSlab Deep Dive (2 weeks)

Week 1: Investigation

  • Add comprehensive profiling
  • Measure cache/TLB misses
  • Analyze fragmentation patterns
  • Understand "shared_fail→legacy" root cause

Week 2: Targeted Fixes

  • Implement hash table for SuperSlab lookup
  • Experiment with larger SuperSlabs (1-2MB)
  • Optimize fragmentation handling
  • Add better capacity management
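
The hash-table lookup idea can be sketched as follows: mask a pointer down to its SuperSlab base address, then look that base up in a hash map in O(1) instead of walking a list. This is a conceptual illustration in Python, not HAKMEM code. It assumes SuperSlabs are aligned to their own size, and the class and method names are hypothetical.

```python
SUPERSLAB_SIZE = 512 * 1024  # 512KB, as quoted in this report

class SuperSlabRegistry:
    """Maps SuperSlab base addresses to metadata (assumes size-aligned slabs)."""

    def __init__(self, size: int = SUPERSLAB_SIZE) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("SuperSlab size must be a power of two")
        self._offset_mask = size - 1
        self._by_base: dict[int, object] = {}

    def register(self, base: int, meta: object) -> None:
        if base & self._offset_mask:
            raise ValueError("base address is not SuperSlab-aligned")
        self._by_base[base] = meta

    def lookup(self, ptr: int):
        """Return metadata for the SuperSlab containing ptr, or None."""
        base = ptr & ~self._offset_mask  # clear the in-slab offset bits
        return self._by_base.get(base)

registry = SuperSlabRegistry()
registry.register(0x7F0000080000, "slab-A")
print(registry.lookup(0x7F0000080000 + 12345))
print(registry.lookup(0x7F0000100000))
```

In a C allocator the same idea would typically use a fixed-size open-addressing table keyed on the masked address. Whether that reaches the <20 cycle lookup target is something the Phase 9 profiling would have to confirm.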

Success criteria:

  • WS8192: 16.5 → 35+ M ops/s (2x improvement)
  • Understand root cause even if fix incomplete

Phase 10: Decision Point

If Phase 9 successful (>35 M ops/s):

  • Continue with SuperSlab optimizations
  • Focus on fast path improvements
  • Target: 50 M ops/s by Phase 12

If Phase 9 unsuccessful (<30 M ops/s):

  • Switch to Hybrid Architecture (Option B)
  • Keep TLS layer, replace backend
  • Target: 60 M ops/s by Phase 14
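
The decision rule can be written down as a small function. The report defines outcomes for results above 35 and below 30 M ops/s but not for the range in between, so this sketch returns "undecided" there. That label and the return strings are assumptions made for illustration.

```python
def phase10_decision(ws8192_mops: float) -> str:
    """Map the Phase 9 WS8192 result (M ops/s) to the Phase 10 path."""
    if ws8192_mops > 35:
        return "continue-superslab"   # keep optimizing SuperSlab
    if ws8192_mops < 30:
        return "switch-to-hybrid"     # Option B: keep TLS layer, replace backend
    return "undecided"                # 30-35 M ops/s is not covered by the report

for result in (16.5, 32.0, 40.0):
    print(result, "->", phase10_decision(result))
```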

Key Metrics to Track

Performance Metrics

  • WS256 throughput (target: 85+ M ops/s)
  • WS8192 throughput (target: 35+ M ops/s)
  • Degradation ratio (target: <2.5x)

Architecture Metrics

  • SuperSlab lookup latency (target: <20 cycles)
  • Cache miss rate (target: <5%)
  • TLB miss rate (target: <1%)
  • Fragmentation ratio (target: <20%)

Debug Metrics

  • "shared_fail→legacy" events (target: 0)
  • TLS_SLL_HDR_RESET events (target: 0)
  • Average SuperSlab count (target: <10 at WS8192)
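
These targets lend themselves to an automated check after each benchmark run. The sketch below encodes the targets listed above and reports which ones a set of measurements misses. The metric key names are hypothetical, and "85+" and "35+" are read as "at least". Only the three Phase 8 values known from this report are used as sample input.

```python
import operator

# metric name -> (comparison, threshold); thresholds come from the lists above.
TARGETS = {
    "ws256_mops":                (operator.ge, 85.0),
    "ws8192_mops":               (operator.ge, 35.0),
    "degradation_ratio":         (operator.lt, 2.5),
    "superslab_lookup_cycles":   (operator.lt, 20),
    "cache_miss_pct":            (operator.lt, 5.0),
    "tlb_miss_pct":              (operator.lt, 1.0),
    "fragmentation_pct":         (operator.lt, 20.0),
    "shared_fail_legacy_events": (operator.eq, 0),
    "tls_sll_hdr_reset_events":  (operator.eq, 0),
    "avg_superslab_count":       (operator.lt, 10),
}

def unmet_targets(measured: dict) -> list:
    """Return the names of measured metrics that miss their target."""
    missed = []
    for name, value in measured.items():
        compare, threshold = TARGETS[name]
        if not compare(value, threshold):
            missed.append(name)
    return missed

# Phase 8 results taken from this report; other metrics are not yet measured.
phase8 = {"ws256_mops": 79.2, "ws8192_mops": 16.5, "degradation_ratio": 4.8}
print(unmet_targets(phase8))
```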

Conclusion

Phase 8 Status: COMPLETE

  • ✓ Comprehensive benchmarks executed
  • ✓ Statistical analysis completed
  • ✓ Root cause hypotheses identified
  • ✓ Clear path forward defined

Phase 9 Ready: YES

  • Clear investigation targets
  • Specific metrics to measure
  • Decision criteria established

Confidence Level: HIGH

  • Data is robust (low variance)
  • Gaps are well-understood
  • Multiple viable paths forward

Next Action: Begin Phase 9 - SuperSlab Deep Dive and Profiling

Timeline:

  • Phase 9: 2 weeks (investigation + targeted fixes)
  • Phase 10: 1 week (decision point + planning)
  • Phase 11-12: 3-4 weeks (major optimizations)
  • Target completion: 6-8 weeks to production-ready

Risk Level: MEDIUM

  • SuperSlab may be unfixable → fallback to Hybrid (Option B)
  • Hybrid adds 2-3 weeks but higher success probability
  • Total timeline stays within 10 weeks worst case