896f24367f
Phase 19-2: Ultra SLIM 4-layer fast path implementation (ENV gated)
Implement Ultra SLIM 4-layer allocation fast path with ACE learning preserved.
ENV: HAKMEM_TINY_ULTRA_SLIM=1 (default OFF)
Architecture (4 layers):
- Layer 1: Init Safety (1-2 cycles, cold path only)
- Layer 2: Size-to-Class (1-2 cycles, LUT lookup)
- Layer 3: ACE Learning (2-3 cycles, histogram update) ← PRESERVED!
- Layer 4: TLS SLL Direct (3-5 cycles, freelist pop)
- Total: 7-12 cycles (~2-4ns on a 3GHz CPU); see the sketch below
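For concreteness, a minimal C sketch of how the four layers compose; the
identifiers (g_size_class_lut, ace_histogram, tls_sll_head, ultra_slim_refill)
are illustrative assumptions, not the actual symbols in
core/box/ultra_slim_alloc_box.h:

    #include <stddef.h>

    // Illustrative declarations; real definitions live in core/box/ultra_slim_alloc_box.h
    extern const unsigned char g_size_class_lut[];  // Layer 2: size -> class (8-byte granularity assumed)
    extern __thread unsigned int ace_histogram[];   // Layer 3: per-class allocation counts
    extern __thread void *tls_sll_head[];           // Layer 4: per-class singly linked freelist heads
    void *ultra_slim_refill(int cls);               // cold path: pull a batch from the backend

    static inline void *ultra_slim_alloc(size_t size) {
        // Layer 1 (init safety) runs on the cold path only and is omitted here.
        int cls = g_size_class_lut[(size + 7) >> 3];  // Layer 2: LUT lookup
        ace_histogram[cls]++;                         // Layer 3: ACE learning (histogram update)
        void *node = tls_sll_head[cls];               // Layer 4: TLS SLL direct pop
        if (node != NULL) {
            tls_sll_head[cls] = *(void **)node;       // next pointer stored in the free block
            return node;                              // USER pointer, no header work on the hot path
        }
        return ultra_slim_refill(cls);                // miss: refill from the backend
    }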
Goal: Achieve mimalloc parity (90-110M ops/s) by removing intermediate layers
(HeapV2, FastCache, SFC) while preserving HAKMEM's learning capability.
Deleted Layers (from standard 7-layer path):
❌ HeapV2 (C0-C3 magazine)
❌ FastCache (C0-C3 array stack)
❌ SFC (Super Front Cache)
Expected savings: 11-15 cycles
Implementation:
1. core/box/ultra_slim_alloc_box.h
- 4-layer allocation path (returns USER pointer)
- TLS-cached ENV check (once per thread)
- Statistics & diagnostics (HAKMEM_ULTRA_SLIM_STATS=1)
- Refill integration with backend
2. core/tiny_alloc_fast.inc.h
- Ultra SLIM gate at entry point (lines 694-702)
- Early return when Ultra SLIM mode is enabled
- Zero impact on the standard path (cold branch); see the sketch below
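A sketch of the TLS-cached ENV check and the entry-point gate, assuming
placeholder names (tls_ultra_slim_mode, ultra_slim_enabled, tiny_alloc_fast),
not the actual identifiers in core/tiny_alloc_fast.inc.h:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    void *ultra_slim_alloc(size_t size);   // the 4-layer path sketched earlier

    // -1 = not checked yet, 0 = off, 1 = on; resolved once per thread
    static __thread int tls_ultra_slim_mode = -1;

    static inline int ultra_slim_enabled(void) {
        if (__builtin_expect(tls_ultra_slim_mode < 0, 0)) {   // cold: first call on this thread
            const char *e = getenv("HAKMEM_TINY_ULTRA_SLIM");
            tls_ultra_slim_mode = (e != NULL && strcmp(e, "1") == 0);
        }
        return tls_ultra_slim_mode;
    }

    void *tiny_alloc_fast(size_t size) {                // placeholder name for the real entry point
        if (__builtin_expect(ultra_slim_enabled(), 0))  // cold branch; default OFF leaves the standard path untouched
            return ultra_slim_alloc(size);              // early return: Ultra SLIM 4-layer path
        /* ... standard 7-layer path continues here ... */
        return NULL;                                    // stand-in for the standard path in this sketch
    }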
Performance Results (Random Mixed 256B, 10M iterations):
- Baseline (Ultra SLIM OFF): 63.3M ops/s
- Ultra SLIM ON: 62.6M ops/s (-1.1%)
- Target: 90-110M ops/s (mimalloc parity)
- Gap: target throughput is 44-76% higher than measured (1.44-1.76x)
Status: Implementation complete, but performance target not achieved.
The 4-layer architecture is in place and ACE learning is preserved.
Further optimization needed to reach mimalloc parity.
Next Steps:
- Profile Ultra SLIM path to identify remaining bottlenecks
- Verify TLS SLL hit rate (statistics currently show zero)
- Consider further cycle reduction in Layer 3 (ACE learning)
- A/B test with ACE learning disabled to measure impact
Notes:
- Ultra SLIM mode is ENV gated (off by default)
- No impact on standard 7-layer path performance
- Statistics tracking is implemented but needs verification (see the counter sketch below)
- workset=256 tested and verified working
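For the statistics verification mentioned above, a sketch of the kind of
per-thread hit/refill counters HAKMEM_ULTRA_SLIM_STATS=1 could report; the
counter and function names are hypothetical, not the actual implementation:

    #include <stdio.h>

    // Hypothetical per-thread counters for the Ultra SLIM path
    static __thread unsigned long ultra_slim_hits;     // Layer 4 freelist pops
    static __thread unsigned long ultra_slim_refills;  // cold-path refills

    static void ultra_slim_dump_stats(void) {
        unsigned long total = ultra_slim_hits + ultra_slim_refills;
        if (total == 0) {
            fprintf(stderr, "[ultra_slim] no events recorded (matches the zero hit rate observed)\n");
            return;
        }
        fprintf(stderr, "[ultra_slim] hits=%lu refills=%lu hit_rate=%.1f%%\n",
                ultra_slim_hits, ultra_slim_refills,
                100.0 * ultra_slim_hits / total);
    }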
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 06:16:20 +09:00