# Profiling Results (2025-10-22)

This document records the overheads currently observed in lightweight sampling runs. Throughput is ignored; the focus is the average latency in ns per code path.

Environment: larson benchmark with hakmem loaded via LD_PRELOAD, sampling profiler enabled (HAKMEM_PROF=1), sample rates as indicated.

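To make "avg ns per path" concrete, here is a minimal sketch of 1-in-N sampling, assuming the profiler times only every N-th call on a path and averages over the timed calls. The `SampledTimer` class and its fields are hypothetical names for illustration; this is not hakmem's actual implementation.

```python
import time


class SampledTimer:
    """Hypothetical 1-in-N sampling timer (illustration only, not hakmem code)."""

    def __init__(self, every_n):
        self.every_n = every_n  # e.g. 256 for a 1/256 sample rate
        self.calls = 0          # every call seen on this path
        self.samples = 0        # calls that were actually timed
        self.total_ns = 0       # summed duration of the timed calls

    def run(self, fn, *args):
        self.calls += 1
        if self.calls % self.every_n != 0:
            return fn(*args)    # unsampled call: no clock reads
        start = time.perf_counter_ns()
        result = fn(*args)
        self.total_ns += time.perf_counter_ns() - start
        self.samples += 1
        return result

    def avg_ns(self):
        # Average over timed calls only; 0.0 if nothing was sampled yet.
        return self.total_ns / self.samples if self.samples else 0.0


timer = SampledTimer(every_n=256)
for size in range(1024):
    timer.run(bytearray, size)  # stand-in for an allocation path
print(timer.calls, timer.samples, round(timer.avg_ns(), 1))
```

With 1,024 calls and a 1/256 rate, only 4 calls are timed, which is why short runs on rare paths produce few samples.
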
## 10s reference (subset)

- BURST 10s, 1T
  - tiny_alloc: avg ~28.8 ns (samples ~65k)
- BURST 10s, 4T
  - malloc_alloc: avg ~98.7 ns (samples ~16k)
- LOOP 10s, 4T
  - malloc_alloc: avg ~40.9 ns (samples ~35k)

## 2s sweep (1/256), 1T/4T (mid/large ranges)

- Mid 2–32KiB, 1T
  - ace_alloc: ~26–31 ns
  - malloc_alloc: ~150–220 ns
- Mid 2–32KiB, 4T
  - ace_alloc: ~31–35 ns
  - malloc_alloc: ~250–315 ns

- Large 64KiB–1MiB, 1T
  - ace_alloc: ~32–58 ns → 31–50 ns (after W_MAX tuning)
  - malloc_alloc: ~960–1,690 ns → 840–1,330 ns (slight drop, about 12.5% to 21% at the range bounds)
- Large 64KiB–1MiB, 4T
  - ace_alloc: ~52–72 ns
  - malloc_alloc: ~2.5–4.1 µs

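The before/after ranges for the Large 1T case can be turned into relative changes with a few lines of arithmetic. This is a small sketch; the `pct_change` helper and the dictionary layout are illustrative choices, and the numbers are copied from the bullets above.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# (low, high) ns ranges for Large 64KiB–1MiB, 1T: before and after W_MAX tuning
large_1t = {
    "ace_alloc": ((32, 58), (31, 50)),
    "malloc_alloc": ((960, 1690), (840, 1330)),
}

changes = {}
for path, (before, after) in large_1t.items():
    low = pct_change(before[0], after[0])
    high = pct_change(before[1], after[1])
    changes[path] = (low, high)
    print(f"{path}: low {low:+.1f}%, high {high:+.1f}%")
```

Comparing range bounds this way is only a rough indicator, since the low and high ends of each range may come from different size classes or runs.
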
Notes:
- Tiny path is healthy: tiny_alloc ~18–29 ns, tiny_reg_lookup ~17–23 ns.
- Registry registration costs ~0.6–1.3 µs, but is rare (slab creation only).
- Pool/L25 lock and refill paths are instrumented, but they yield few samples in 2s runs; use focused size ranges and a higher sampling rate for deeper analysis.

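The low sample counts on rare paths follow directly from the sampling rate. Below is a rough sketch of the arithmetic, assuming simple 1-in-N sampling; the event count is a hypothetical figure rather than a measured one, and how hakmem's sample rate is actually configured is not shown here.

```python
def expected_samples(events, every_n):
    """Expected number of timed samples when 1 in every_n events is sampled."""
    return events / every_n


def every_n_for(events, target_samples):
    """Largest sampling interval that still yields about target_samples samples."""
    return max(1, events // target_samples)


# Hypothetical number of lock/refill events in a short run (not a measured value)
rare_events = 5_000

print(expected_samples(rare_events, 256))  # samples collected at a 1/256 rate
print(every_n_for(rare_events, 1_000))     # interval needed for ~1,000 samples
```

Under these assumed numbers, a 1/256 rate leaves fewer than 20 samples, so either the run has to be longer or the interval much smaller before the averages are worth reading.
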
## 1s sweep (1/256), 1T (W_MAX_mid=1.40, W_MAX_large=1.30)

- Mid 2–32KiB, 1T
  - ace_alloc: ~26.6 ns
  - malloc_alloc: ~153.2 ns (slight drop)
- Large 64KiB–1MiB, 1T
  - ace_alloc: ~31.9–50.6 ns
  - malloc_alloc: ~927.9–1,334.1 ns (modest drop)