# Profiling Results (2025‑10‑22)

This document stores the current observed overheads from lightweight sampling runs. Throughput is ignored; we focus on avg ns per path.

Environment: larson, LD_PRELOAD hakmem, sampling profiler ON (HAKMEM_PROF=1), sample rates as indicated.

## 10s reference (subset)

- BURST 10s, 1T
  - tiny_alloc: avg ~28.8 ns (samples ~65k)
- BURST 10s, 4T
  - malloc_alloc: avg ~98.7 ns (samples ~16k)
- LOOP 10s, 4T
  - malloc_alloc: avg ~40.9 ns (samples ~35k)

## 2s sweep (1/256), 1T/4T (mid/large ranges)

- Mid 2–32KiB, 1T
  - ace_alloc: ~26–31 ns
  - malloc_alloc: ~150–220 ns
- Mid 2–32KiB, 4T
  - ace_alloc: ~31–35 ns
  - malloc_alloc: ~250–315 ns
- Large 64KiB–1MiB, 1T
  - ace_alloc: ~32–58 ns → 31–50 ns (after W_MAX tuning)
  - malloc_alloc: ~960–1,690 ns → 840–1,330 ns (slight drop)
- Large 64KiB–1MiB, 4T
  - ace_alloc: ~52–72 ns
  - malloc_alloc: ~2.5–4.1 µs

Notes:

- Tiny path healthy: tiny_alloc ~18–29 ns, tiny_reg_lookup ~17–23 ns.
- Registry registration ~0.6–1.3 µs (rare: slab creation only).
- Pool/L25 lock/refill present (instrumented) but low sample count in 2s runs; use focused ranges and higher sampling for deeper analysis.

## 1s sweep (1/256), 1T (W_MAX_mid=1.40, W_MAX_large=1.30)

- Mid 2–32KiB, 1T
  - ace_alloc: ~26.6 ns
  - malloc_alloc: ~153.2 ns (slight drop)
- Large 64KiB–1MiB, 1T
  - ace_alloc: ~31.9–50.6 ns
  - malloc_alloc: ~927.9–1,334.1 ns (modest drop)
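
For readers unfamiliar with how figures like "avg ns per path" at a 1/256 sample rate are typically produced, the following is a minimal sketch of a 1-in-N sampling accumulator. It is an illustration only and is not taken from hakmem: the type `prof_path_t` and the functions `prof_should_sample`, `prof_record`, and `prof_avg_ns` are hypothetical names, and the real profiler behind `HAKMEM_PROF=1` may select samples and aggregate them differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-path accumulator (not hakmem's actual structure). */
typedef struct {
    uint64_t calls;     /* every call through the path, timed or not */
    uint64_t samples;   /* calls that were actually timed */
    uint64_t total_ns;  /* sum of the timed durations, in nanoseconds */
} prof_path_t;

/* Returns 1 for every `rate`-th call (rate = 256 models a 1/256 sample
 * rate), starting with the first call; returns 0 otherwise. */
static inline int prof_should_sample(prof_path_t *p, uint64_t rate) {
    assert(rate > 0);
    return (p->calls++ % rate) == 0;
}

/* Adds one timed call to the accumulator. */
static inline void prof_record(prof_path_t *p, uint64_t elapsed_ns) {
    p->samples++;
    p->total_ns += elapsed_ns;
}

/* Average over timed calls only; 0.0 when nothing was sampled. */
static inline double prof_avg_ns(const prof_path_t *p) {
    return p->samples ? (double)p->total_ns / (double)p->samples : 0.0;
}
```

In use, a caller would check `prof_should_sample` on entry to a path, read a monotonic clock (for example `clock_gettime(CLOCK_MONOTONIC, ...)`) before and after the work only when it returns 1, and pass the difference to `prof_record`. Because the average is taken over timed calls alone, paths with few samples in a short run yield noisy averages, which is why the notes above suggest focused ranges and higher sampling for the low-count paths.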