Files
hakmem/core
Claude b64cfc055e Implement Option A: Fast Path priority optimization (Phase 6-4)
Changes:
- Reorder malloc() to prioritize Fast Path (initialized + tiny size check first)
- Move Fast Path check before all guard checks (recursion, LD_PRELOAD, etc.)
- Optimize free() with same strategy (initialized check first)
- Add branch prediction hints (__builtin_expect)

Implementation:
- malloc(): Fast Path now executes with 3 branches total
  - Branch 1+2: g_initialized && size <= TINY_FAST_THRESHOLD
  - Branch 3: tiny_fast_alloc() cache hit check
  - Slow Path: All guard checks moved after Fast Path miss

- free(): Fast Path with 1-2 branches
  - Branch 1: g_initialized check
  - Direct to hak_free_at() on normal case

Performance Results (Larson benchmark, size=8-128B):

Single-thread (threads=1):
- Before: 0.46M ops/s (10.7% of system malloc)
- After:  0.65M ops/s (15.4% of system malloc)
- Change: +42% improvement ✓

Multi-thread (threads=4):
- Before: 1.81M ops/s (25.0% of system malloc)
- After:  1.44M ops/s (19.9% of system malloc)
- Change: -20% regression ✗

Analysis:
- ST improvement shows Fast Path optimization works
- MT regression suggests contention or cache issues
- Did not meet target (+200-400%), further optimization needed

Next Steps:
- Investigate MT regression (cache coherency?)
- Consider more aggressive inlining
- Explore Option B (Refill optimization)
2025-11-05 04:44:50 +00:00
..