This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component. This feature allows for precise identification of the root cause for Out-Of-Memory (OOM) issues related to ACE allocations.
Key changes include:
- **ACE Tracing Implementation**:
- Added environment variable to enable/disable detailed logging of allocation failures.
- Instrumented , , and to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
- Corrected to ensure is properly linked into , resolving an error.
- **LD_PRELOAD Wrapper Adjustments**:
- Investigated and understood the wrapper's behavior under , particularly its interaction with and checks.
- Enabled debugging flags for environment to prevent unintended fallbacks to 's for non-tiny allocations, allowing comprehensive testing of the allocator.
- **Debugging & Verification**:
- Introduced temporary verbose logging to pinpoint execution flow issues within interception and routing. These temporary logs have been removed.
- Created to facilitate testing of the tracing features.
This feature will significantly aid in diagnosing and resolving allocation-related OOM issues in by providing clear insights into the failure pathways.
- Root cause: header-based class indexing (HEADER_CLASSIDX=1) wrote a 1-byte
header during allocation, but linear carve/refill and initial slab capacity
still used bare class block sizes. This mismatch could overrun slab usable
space and corrupt freelists, causing reproducible SEGV at ~100k iters.
Changes
- Superslab: compute capacity with effective stride (block_size + header for
classes 0..6; class7 remains headerless) in superslab_init_slab(). Add a
debug-only bound check in superslab_alloc_from_slab() to fail fast if carve
would exceed usable bytes.
- Refill (non-P0 and P0): use header-aware stride for all linear carving and
TLS window bump operations. Ensure alignment/validation in tiny_refill_opt.h
also uses stride, not raw class size.
- Drain: keep existing defense-in-depth for remote sentinel and sanitize nodes
before splicing into freelist (already present).
Notes
- This unifies the memory layout across alloc/linear-carve/refill with a single
stride definition and keeps class7 (1024B) headerless as designed.
- Debug builds add fail-fast checks; release builds remain lean.
Next
- Re-run Tiny benches (256/1024B) in debug to confirm stability, then in
release. If any remaining crash persists, bisect with HAKMEM_TINY_P0_BATCH_REFILL=0
to isolate P0 batch carve, and continue reducing branch-miss as planned.