Phase 19-2: Ultra SLIM 4-layer fast path implementation (ENV gated)
Implement Ultra SLIM 4-layer allocation fast path with ACE learning preserved. ENV: HAKMEM_TINY_ULTRA_SLIM=1 (default OFF) Architecture (4 layers): - Layer 1: Init Safety (1-2 cycles, cold path only) - Layer 2: Size-to-Class (1-2 cycles, LUT lookup) - Layer 3: ACE Learning (2-3 cycles, histogram update) ← PRESERVED! - Layer 4: TLS SLL Direct (3-5 cycles, freelist pop) - Total: 7-12 cycles (~2-4ns on 3GHz CPU) Goal: Achieve mimalloc parity (90-110M ops/s) by removing intermediate layers (HeapV2, FastCache, SFC) while preserving HAKMEM's learning capability. Deleted Layers (from standard 7-layer path): ❌ HeapV2 (C0-C3 magazine) ❌ FastCache (C0-C3 array stack) ❌ SFC (Super Front Cache) Expected savings: 11-15 cycles Implementation: 1. core/box/ultra_slim_alloc_box.h - 4-layer allocation path (returns USER pointer) - TLS-cached ENV check (once per thread) - Statistics & diagnostics (HAKMEM_ULTRA_SLIM_STATS=1) - Refill integration with backend 2. core/tiny_alloc_fast.inc.h - Ultra SLIM gate at entry point (line 694-702) - Early return if Ultra SLIM mode enabled - Zero impact on standard path (cold branch) Performance Results (Random Mixed 256B, 10M iterations): - Baseline (Ultra SLIM OFF): 63.3M ops/s - Ultra SLIM ON: 62.6M ops/s (-1.1%) - Target: 90-110M ops/s (mimalloc parity) - Gap: 44-76% slower than target Status: Implementation complete, but performance target not achieved. The 4-layer architecture is in place and ACE learning is preserved. Further optimization needed to reach mimalloc parity. Next Steps: - Profile Ultra SLIM path to identify remaining bottlenecks - Verify TLS SLL hit rate (statistics currently show zero) - Consider further cycle reduction in Layer 3 (ACE learning) - A/B test with ACE learning disabled to measure impact Notes: - Ultra SLIM mode is ENV gated (off by default) - No impact on standard 7-layer path performance - Statistics tracking implemented but needs verification - workset=256 tested and verified working 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@ -33,6 +33,7 @@
|
||||
#include "front/tiny_heap_v2.h" // Front-V2: TLS magazine (tcache-like) front
|
||||
#include "hakmem_tiny_lazy_init.inc.h" // Phase 22: Lazy per-class initialization
|
||||
#include "box/tiny_sizeclass_hist_box.h" // Phase 3-4: Tiny size class histogram (ACE learning)
|
||||
#include "box/ultra_slim_alloc_box.h" // Phase 19-2: Ultra SLIM 4-layer fast path
|
||||
#include <stdio.h>
|
||||
#include <stdatomic.h>
|
||||
|
||||
@ -690,6 +691,16 @@ static inline void* tiny_alloc_fast(size_t size) {
|
||||
// Phase 22: Global init (once per process)
|
||||
lazy_init_global();
|
||||
|
||||
// ========== Phase 19-2: Ultra SLIM 4-Layer Fast Path ==========
|
||||
// ENV: HAKMEM_TINY_ULTRA_SLIM=1
|
||||
// Expected: 90-110M ops/s (mimalloc parity)
|
||||
// Architecture: Init Safety + Size-to-Class + ACE Learning + TLS SLL Direct
|
||||
// Note: ACE learning preserved (HAKMEM's differentiator vs mimalloc)
|
||||
if (__builtin_expect(ultra_slim_mode_enabled(), 0)) {
|
||||
return ultra_slim_alloc_with_refill(size);
|
||||
}
|
||||
// ========== End Phase 19-2: Ultra SLIM ==========
|
||||
|
||||
// 1. Size → class index (inline, fast)
|
||||
int class_idx = hak_tiny_size_to_class(size);
|
||||
|
||||
|
||||
Reference in New Issue
Block a user