hakmem/docs/archive/PROFILING_RESULTS_2025_10_22.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00

# Profiling Results (2025-10-22)
This document records the overheads observed in lightweight sampling-profiler runs. Throughput is ignored; we focus on the average latency (ns) of each instrumented path.

Environment: larson benchmark, hakmem loaded via LD_PRELOAD, sampling profiler enabled (HAKMEM_PROF=1), sample rates as indicated per section.
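
The "avg ns per path" figures come from a sampling profiler. The sketch below is a minimal illustration of 1-in-N sampling in C, not hakmem's actual HAKMEM_PROF implementation: every call bumps a counter, only one call in 256 (the "1/256" rate used in the sweeps) is timed with `clock_gettime(CLOCK_MONOTONIC)`, and the reported average is the sum of sampled durations divided by the sample count. The names `prof_site_t`, `prof_should_sample`, `prof_record`, `prof_avg_ns`, and `timed_malloc` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical 1-in-256 sampler (2^8), matching the "1/256" rate. */
#define PROF_SAMPLE_SHIFT 8u
#define PROF_SAMPLE_MASK ((1ull << PROF_SAMPLE_SHIFT) - 1ull)

typedef struct {
    uint64_t calls;    /* every call through the path */
    uint64_t samples;  /* calls that were actually timed */
    uint64_t total_ns; /* sum of the timed durations */
} prof_site_t;

static uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Counts the call; returns nonzero for the 1st, 257th, 513th, ... call. */
static int prof_should_sample(prof_site_t *s) {
    return (s->calls++ & PROF_SAMPLE_MASK) == 0;
}

static void prof_record(prof_site_t *s, uint64_t ns) {
    s->samples++;
    s->total_ns += ns;
}

/* Average over sampled calls only (0.0 if nothing was sampled yet). */
static double prof_avg_ns(const prof_site_t *s) {
    return s->samples ? (double)s->total_ns / (double)s->samples : 0.0;
}

/* Example: wrap an allocation path; unsampled calls pay only a counter bump. */
static void *timed_malloc(prof_site_t *site, size_t n) {
    if (!prof_should_sample(site)) return malloc(n);
    uint64_t t0 = now_ns();
    void *p = malloc(n);
    prof_record(site, now_ns() - t0);
    return p;
}
```

The counters here are plain fields, so this sketch assumes one `prof_site_t` per thread; the 4T runs would need per-thread sites (or atomics) merged at report time.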

## 10s reference (subset)

- BURST 10s, 1T
  - tiny_alloc: avg ~28.8 ns (samples ~65k)
- BURST 10s, 4T
  - malloc_alloc: avg ~98.7 ns (samples ~16k)
- LOOP 10s, 4T
  - malloc_alloc: avg ~40.9 ns (samples ~35k)

## 2s sweep (1/256 sampling), 1T/4T (mid/large ranges)

- Mid 2–32KiB, 1T
  - ace_alloc: ~26–31 ns
  - malloc_alloc: ~150–220 ns
- Mid 2–32KiB, 4T
  - ace_alloc: ~31–35 ns
  - malloc_alloc: ~250–315 ns
- Large 64KiB–1MiB, 1T
  - ace_alloc: ~32–58 ns → 31–50 ns (after W_MAX tuning)
  - malloc_alloc: ~960–1,690 ns → 840–1,330 ns (slight drop)
- Large 64KiB–1MiB, 4T
  - ace_alloc: ~52–72 ns
  - malloc_alloc: ~2.5–4.1 µs
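
For the Large 64KiB–1MiB, 1T case, the relative change at each end of the reported ranges, before versus after the tuning noted on those lines, is plain arithmetic on the figures above:

$$
\begin{aligned}
\text{malloc\_alloc (low end)} &: \tfrac{840-960}{960} = -12.5\% \\
\text{malloc\_alloc (high end)} &: \tfrac{1330-1690}{1690} \approx -21.3\% \\
\text{ace\_alloc (low end)} &: \tfrac{31-32}{32} \approx -3.1\% \\
\text{ace\_alloc (high end)} &: \tfrac{50-58}{58} \approx -13.8\%
\end{aligned}
$$

These compare endpoints of approximate ranges rather than paired measurements, so they indicate the size of the shift only roughly.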

Notes:

- Tiny path is healthy: tiny_alloc ~18–29 ns, tiny_reg_lookup ~17–23 ns.
- Registry registration costs ~0.6–1.3 µs but is rare (slab creation only).
- Pool/L25 lock and refill paths are instrumented, but the 2s runs collected few samples for them; use focused size ranges and higher sampling rates for deeper analysis.

## 1s sweep (1/256 sampling), 1T (W_MAX_mid=1.40, W_MAX_large=1.30)

- Mid 2–32KiB, 1T
  - ace_alloc: ~26.6 ns
  - malloc_alloc: ~153.2 ns (slight drop)
- Large 64KiB–1MiB, 1T
  - ace_alloc: ~31.9–50.6 ns
  - malloc_alloc: ~927.9–1,334.1 ns (modest drop)
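
This document does not define W_MAX_mid or W_MAX_large. The sketch below assumes W_MAX caps the round-up ratio when a request is served from a pooled size class: a request of `req` bytes may use class `c` only if `c <= W_MAX * req`. The class table and `pick_class` are hypothetical and only illustrate that trade-off; they are not hakmem's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size-class table (illustrative values only). */
static const size_t k_classes[] = {2048, 4096, 8192, 16384, 32768};
#define N_CLASSES (sizeof k_classes / sizeof k_classes[0])

/* Assumption: W_MAX bounds class_size / request.
 * Returns the smallest class c >= req if c <= w_max * req, else 0
 * (the caller would then fall back to another allocation path).
 * Larger classes are never tried: they would waste even more. */
static size_t pick_class(size_t req, double w_max) {
    assert(req > 0 && w_max >= 1.0);
    for (size_t i = 0; i < N_CLASSES; i++) {
        size_t c = k_classes[i];
        if (c < req) continue;
        return ((double)c <= w_max * (double)req) ? c : 0;
    }
    return 0; /* request larger than every class */
}
```

Under this assumption, a smaller W_MAX means less internal fragmentation per object but more requests rejected by the pool and sent to a fallback path; with power-of-two classes, a 2500-byte request fits the 4096 class at W_MAX = 1.70 but not at 1.40.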