Files
hakmem/core/hakmem_tiny_refill.inc.h
Moe Charm (CI) 03df05ec75 Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash)
## Summary
Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address
SuperSlab allocation churn (877 SuperSlabs → 100-200 target).

## Implementation (ChatGPT + Claude)
1. **Metadata changes** (superslab_types.h; see the sketch after this list):
   - Added class_idx to TinySlabMeta (per-slab dynamic class)
   - Removed size_class from SuperSlab (no longer per-SuperSlab)
   - Changed owner_tid (16-bit) → owner_tid_low (8-bit)

2. **Shared Pool** (hakmem_shared_pool.{h,c}; usage sketch after this list):
   - Global pool shared by all size classes
   - shared_pool_acquire_slab() - Get free slab for class_idx
   - shared_pool_release_slab() - Return slab when empty
   - Per-class hints for fast path optimization

3. **Integration** (23 files modified; before/after sketch after this list):
   - Updated all ss->size_class → meta->class_idx
   - Updated all meta->owner_tid → meta->owner_tid_low
   - superslab_refill() now uses shared pool
   - Free path releases empty slabs back to pool

4. **Build system** (Makefile):
   - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE
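
For orientation, a minimal sketch of the reshaped per-slab metadata from item 1. Field widths are inferred from how hakmem_tiny_refill.inc.h uses them (carved/used/capacity as 16-bit counters, class_idx as uint8_t); the authoritative definition lives in superslab_types.h.

```c
/* Sketch only - not the real definition (see superslab_types.h). */
typedef struct TinySlabMeta {
    void*    freelist;      /* per-slab free-list head; NULL while in linear carve mode */
    uint16_t carved;        /* blocks carved so far (monotonic) */
    uint16_t used;          /* live blocks currently handed out from this slab */
    uint16_t capacity;      /* total blocks this slab can hold */
    uint8_t  class_idx;     /* NEW: per-slab dynamic size class */
    uint8_t  owner_tid_low; /* NEW: low 8 bits of the owning thread id (was 16-bit owner_tid) */
} TinySlabMeta;
```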
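
Item 2's acquire/release API, as a rough sketch of the intended flow. Only the two function names come from this commit; the parameter lists and the helper name refill_from_shared_pool are assumptions.

```c
/* Hypothetical refill flow - prototypes and exact parameters are assumptions;
 * only the two shared_pool_* names come from hakmem_shared_pool.h. */
static int refill_from_shared_pool(int class_idx)
{
    SuperSlab*    ss       = NULL;
    int           slab_idx = -1;
    /* assumed signature: returns the per-slab meta (or NULL) and reports ss/slab_idx */
    TinySlabMeta* meta = shared_pool_acquire_slab(class_idx, &ss, &slab_idx);
    if (!meta) return 0;
    meta->class_idx = (uint8_t)class_idx;  /* slab is bound to the requesting class on acquire */
    /* ... carve blocks / push them onto the TLS SLL, as superslab_refill() does ... */
    if (meta->used == 0) {
        /* empty again: hand the slab back so any class can reuse it (assumed signature) */
        shared_pool_release_slab(ss, slab_idx);
    }
    return 1;
}
```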
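
And the mechanical rename from item 3, shown as an illustrative before/after of the edit repeated across the 23 files:

```c
/* Phase 11 (per-SuperSlab class, 16-bit owner): */
/*   int      cls = ss->size_class;   */
/*   uint16_t tid = meta->owner_tid;  */

/* Phase 12 (per-slab dynamic class, 8-bit owner): */
int     cls = meta->class_idx;
uint8_t tid = meta->owner_tid_low;
```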

## Status: ⚠️ Build OK, Runtime CRASH

**Build**: SUCCESS
- All 23 files compile without errors
- Only warnings: superslab_allocate type mismatch (legacy code)

**Runtime**: SEGFAULT
- Crash location: sll_refill_small_from_ss()
- Exit code: 139 (SIGSEGV)
- Test case: ./bench_random_mixed_hakmem 1000 256 42

## Known Issues
1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue
2. **Legacy superslab_allocate()** still exists (type mismatch warning)
3. **Remaining TODOs** from design doc:
   - SuperSlab physical layout integration
   - slab_handle.h cleanup
   - Remove old per-class head implementation

## Next Steps
1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss)
2. Fix shared_pool_acquire_slab() or superslab_init_slab()
3. Basic functionality test (1K → 100K iterations)
4. Measure SuperSlab count reduction (877 → 100-200)
5. Performance benchmark (+650-860% expected)

## Files Changed (25 files)
core/box/free_local_box.c
core/box/free_remote_box.c
core/box/front_gate_classifier.c
core/hakmem_super_registry.c
core/hakmem_tiny.c
core/hakmem_tiny_bg_spill.c
core/hakmem_tiny_free.inc
core/hakmem_tiny_lifecycle.inc
core/hakmem_tiny_magazine.c
core/hakmem_tiny_query.c
core/hakmem_tiny_refill.inc.h
core/hakmem_tiny_superslab.c
core/hakmem_tiny_superslab.h
core/hakmem_tiny_tls_ops.h
core/slab_handle.h
core/superslab/superslab_inline.h
core/superslab/superslab_types.h
core/tiny_debug.h
core/tiny_free_fast.inc.h
core/tiny_free_magazine.inc.h
core/tiny_remote.c
core/tiny_superslab_alloc.inc.h
core/tiny_superslab_free.inc.h
Makefile

## New Files (3 files)
PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md
core/hakmem_shared_pool.c
core/hakmem_shared_pool.h

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00


// hakmem_tiny_refill.inc.h
// Phase 2D-1: Hot-path inline functions - Refill operations
//
// This file contains hot-path refill functions for various allocation tiers.
// These functions are extracted from hakmem_tiny.c to improve maintainability and
// reduce the main file size by approximately 280 lines.
//
// Functions handle:
// - tiny_fast_refill_and_take: Fast cache refill (lines 584-622, 39 lines)
// - quick_refill_from_sll: Quick slot refill from SLL (lines 918-936, 19 lines)
// - quick_refill_from_mag: Quick slot refill from magazine (lines 938-949, 12 lines)
// - sll_refill_small_from_ss: SLL refill from superslab (lines 952-996, 45 lines)
// - superslab_tls_bump_fast: TLS bump allocation (lines 1016-1060, 45 lines)
// - frontend_refill_fc: Frontend fast cache refill (lines 1063-1106, 44 lines)
// - bulk_mag_to_sll_if_room: Magazine to SLL bulk transfer (lines 1133-1154, 22 lines)
// - ultra_refill_sll: Ultra-mode SLL refill (lines 1178-1233, 56 lines)
#ifndef HAKMEM_TINY_REFILL_INC_H
#define HAKMEM_TINY_REFILL_INC_H
#include "hakmem_tiny.h"
#include "hakmem_tiny_superslab.h"
#include "hakmem_tiny_magazine.h"
#include "hakmem_tiny_tls_list.h"
#include "tiny_box_geometry.h" // Box 3: Geometry & Capacity Calculator
#include "hakmem_super_registry.h" // For hak_super_lookup (Debug validation)
#include "superslab/superslab_inline.h" // For slab_index_for/ss_slabs_capacity (Debug validation)
#include "box/tls_sll_box.h" // Box TLS-SLL: Safe SLL operations API
#include "hakmem_tiny_integrity.h" // PRIORITY 1-4: Corruption detection
#include "box/tiny_next_ptr_box.h" // Box API: Next pointer read/write
#include <stdint.h>
#include <pthread.h>
#include <stdlib.h>
// External declarations for TLS variables and globals
extern int g_fast_enable;
extern uint16_t g_fast_cap[TINY_NUM_CLASSES];
extern __thread void* g_fast_head[TINY_NUM_CLASSES];
extern __thread uint16_t g_fast_count[TINY_NUM_CLASSES];
extern int g_tls_list_enable;
extern int g_tls_sll_enable;
extern __thread void* g_tls_sll_head[TINY_NUM_CLASSES];
extern __thread uint32_t g_tls_sll_count[TINY_NUM_CLASSES];
extern int g_use_superslab;
extern int g_ultra_bump_shadow;
extern int g_bump_chunk;
extern __thread uint8_t* g_tls_bcur[TINY_NUM_CLASSES];
extern __thread uint8_t* g_tls_bend[TINY_NUM_CLASSES];
extern int g_fastcache_enable;
extern int g_quick_enable;
// External variable declarations
// Note: TinyTLSSlab, TinyFastCache, and TinyQuickSlot types must be defined before including this file
extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES];
extern TinyPool g_tiny_pool;
extern PaddedLock g_tiny_class_locks[TINY_NUM_CLASSES];
extern __thread TinyFastCache g_fast_cache[TINY_NUM_CLASSES];
extern __thread TinyQuickSlot g_tls_quick[TINY_NUM_CLASSES];
// Frontend fill target
extern _Atomic uint32_t g_frontend_fill_target[TINY_NUM_CLASSES];
// Debug counters
#if HAKMEM_DEBUG_COUNTERS
extern uint64_t g_bump_hits[TINY_NUM_CLASSES];
extern uint64_t g_bump_arms[TINY_NUM_CLASSES];
extern uint64_t g_path_refill_calls[TINY_NUM_CLASSES];
extern uint64_t g_ultra_refill_calls[TINY_NUM_CLASSES];
#define HAK_PATHDBG_INC(arr, idx) do { if (g_path_debug_enabled) { (arr)[(idx)]++; } } while(0)
#define HAK_ULTRADBG_INC(arr, idx) do { (arr)[(idx)]++; } while(0)
extern int g_path_debug_enabled;
#else
#define HAK_PATHDBG_INC(arr, idx) do { (void)(idx); } while(0)
#define HAK_ULTRADBG_INC(arr, idx) do { (void)(idx); } while(0)
#endif
// Tracepoint macros
#ifndef HAK_TP1
#define HAK_TP1(name, idx) do { (void)(idx); } while(0)
#endif
// Forward declarations for functions used in this file
static inline void* tiny_fast_pop(int class_idx);
static inline int tiny_fast_push(int class_idx, void* ptr);
static inline int tls_refill_from_tls_slab(int class_idx, TinyTLSList* tls, uint32_t want);
static inline uint32_t sll_cap_for_class(int class_idx, uint32_t mag_cap);
SuperSlab* superslab_refill(int class_idx);
static void* slab_data_start(SuperSlab* ss, int slab_idx);
static inline uint8_t* tiny_slab_base_for(SuperSlab* ss, int slab_idx);
void ss_active_add(SuperSlab* ss, uint32_t n);
static inline void ss_active_inc(SuperSlab* ss);
static TinySlab* allocate_new_slab(int class_idx);
static void move_to_full_list(int class_idx, struct TinySlab* target_slab);
static int hak_tiny_find_free_block(TinySlab* slab);
static void hak_tiny_set_used(TinySlab* slab, int block_idx);
static inline int ultra_batch_for_class(int class_idx);
static inline int ultra_sll_cap_for_class(int class_idx);
// Note: tiny_small_mags_init_once and tiny_mag_init_if_needed are declared in hakmem_tiny_magazine.h
static void eventq_push(int class_idx, uint32_t size);
// Debug-only: Validate that a base node belongs to the expected Tiny SuperSlab and is stride-aligned
// IMPORTANT: This is expensive validation, ONLY enabled in DEBUG builds
#if !HAKMEM_BUILD_RELEASE && 0 // Disabled by default even in debug (change '&& 0' to '&& 1' to enable)
static inline void tiny_debug_validate_node_base(int class_idx, void* node, const char* where) {
if ((uintptr_t)node < 4096) {
fprintf(stderr, "[SLL_NODE_SMALL] %s: node=%p cls=%d\n", where, node, class_idx);
abort();
}
SuperSlab* ss = hak_super_lookup(node);
if (!ss) {
fprintf(stderr, "[SLL_NODE_UNKNOWN] %s: node=%p cls=%d\n", where, node, class_idx);
abort();
}
// Phase 12: class_idx now lives in the per-slab TinySlabMeta, so resolve the slab first.
int slab_idx = slab_index_for(ss, node);
if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss)) {
fprintf(stderr, "[SLL_NODE_SLAB_OOB] %s: node=%p slab_idx=%d cap=%d\n", where, node, slab_idx, ss_slabs_capacity(ss));
abort();
}
// NOTE: this block is compiled out (&& 0); the per-slab meta lookup for slab_idx
// still needs to be wired up here before the class check can be re-enabled.
TinySlabMeta* meta = NULL; // TODO: fetch ss's per-slab meta for slab_idx (Phase 12)
int ocls = meta ? (int)meta->class_idx : -1;
if (ocls == 7 || ocls != class_idx) {
fprintf(stderr, "[SLL_NODE_CLASS_MISMATCH] %s: node=%p cls=%d owner_cls=%d\n", where, node, class_idx, ocls);
abort();
}
uint8_t* base = tiny_slab_base_for_geometry(ss, slab_idx);
size_t usable = tiny_usable_bytes_for_slab(slab_idx);
size_t stride = tiny_stride_for_class(ocls);
uintptr_t a = (uintptr_t)node;
if (a < (uintptr_t)base || a >= (uintptr_t)base + usable) {
fprintf(stderr, "[SLL_NODE_RANGE] %s: node=%p base=%p usable=%zu\n", where, node, base, usable);
abort();
}
size_t off = (size_t)(a - (uintptr_t)base);
if (off % stride != 0) {
fprintf(stderr, "[SLL_NODE_MISALIGNED] %s: node=%p off=%zu stride=%zu base=%p\n", where, node, off, stride, base);
abort();
}
}
#else
static inline void tiny_debug_validate_node_base(int class_idx, void* node, const char* where) { (void)class_idx; (void)node; (void)where; }
#endif
// Fast cache refill and take operation
static inline void* tiny_fast_refill_and_take(int class_idx, TinyTLSList* tls) {
// Phase 1: C0-C3 prefer the headerless array stack (FastCache) for lowest latency
if (__builtin_expect(g_fastcache_enable && class_idx <= 3, 1)) {
void* fc = fastcache_pop(class_idx);
if (fc) {
extern unsigned long long g_front_fc_hit[];
g_front_fc_hit[class_idx]++;
return fc;
} else {
extern unsigned long long g_front_fc_miss[];
g_front_fc_miss[class_idx]++;
}
}
// For class5 hotpath, skip direct Front (SFC/SLL) and rely on TLS List path
extern int g_tiny_hotpath_class5;
if (!(g_tiny_hotpath_class5 && class_idx == 5)) {
void* direct = tiny_fast_pop(class_idx);
if (direct) return direct;
}
uint16_t cap = g_fast_cap[class_idx];
if (cap == 0) return NULL;
uint16_t count = g_fast_count[class_idx];
uint16_t need = cap > count ? (uint16_t)(cap - count) : 0;
if (need == 0) return NULL;
uint32_t have = tls->count;
if (have < need) {
uint32_t want = need - have;
uint32_t thresh = tls_list_refill_threshold(tls);
if (want < thresh) want = thresh;
tls_refill_from_tls_slab(class_idx, tls, want);
}
void* batch_head = NULL;
void* batch_tail = NULL;
uint32_t taken = tls_list_bulk_take(tls, need, &batch_head, &batch_tail, class_idx);
if (taken == 0u || batch_head == NULL) {
return NULL;
}
void* ret = batch_head;
void* node = tiny_next_read(class_idx, ret);
uint32_t remaining = (taken > 0u) ? (taken - 1u) : 0u;
while (node && remaining > 0u) {
void* next = tiny_next_read(class_idx, node);
int pushed = 0;
if (__builtin_expect(g_fastcache_enable && class_idx <= 3, 1)) {
// Headerless array stack for hottest tiny classes
pushed = fastcache_push(class_idx, node);
} else {
// For class5 hotpath, keep leftovers in TLS List (not SLL)
extern int g_tiny_hotpath_class5;
if (__builtin_expect(g_tiny_hotpath_class5 && class_idx == 5, 0)) {
tls_list_push_fast(tls, node, 5);
pushed = 1;
} else {
pushed = tiny_fast_push(class_idx, node);
}
}
if (pushed) { node = next; remaining--; }
else {
// Push failed, return remaining to TLS (preserve order)
tls_list_bulk_put(tls, node, batch_tail, remaining, class_idx);
// ✅ FIX #16: Return BASE pointer (not USER)
// Caller will apply HAK_RET_ALLOC which does BASE → USER conversion
return ret;
}
}
// ✅ FIX #16: Return BASE pointer (not USER)
// Caller will apply HAK_RET_ALLOC which does BASE → USER conversion
return ret;
}
// Quick slot refill from SLL
static inline int quick_refill_from_sll(int class_idx) {
if (!g_tls_sll_enable) return 0;
TinyQuickSlot* qs = &g_tls_quick[class_idx];
int room = (int)(QUICK_CAP - qs->top);
if (room <= 0) return 0;
// Limit burst to a tiny constant to reduce loop/branches
if (room > 2) room = 2;
int filled = 0;
while (room > 0) {
// CRITICAL: Use Box TLS-SLL API to avoid race condition (rbp=0xa0 SEGV)
void* head = NULL;
if (!tls_sll_pop(class_idx, &head)) break;
// One-shot validation for the first pop
#if !HAKMEM_BUILD_RELEASE
do { static _Atomic int once = 0; int exp = 0; if (atomic_compare_exchange_strong(&once, &exp, 1)) { tiny_debug_validate_node_base(class_idx, head, "quick_refill_from_sll"); } } while (0);
#endif
qs->items[qs->top++] = head;
room--; filled++;
}
if (filled > 0) HAK_TP1(quick_refill_sll, class_idx);
if (filled > 0) {
extern unsigned long long g_front_quick_hit[];
g_front_quick_hit[class_idx]++;
}
return filled;
}
// Quick slot refill from magazine
static inline int quick_refill_from_mag(int class_idx) {
TinyTLSMag* mag = &g_tls_mags[class_idx];
if (mag->top <= 0) return 0;
TinyQuickSlot* qs = &g_tls_quick[class_idx];
int room = (int)(QUICK_CAP - qs->top);
if (room <= 0) return 0;
// Only a single transfer from magazine to minimize overhead
int take = (mag->top > 0 && room > 0) ? 1 : 0;
for (int i = 0; i < take; i++) { qs->items[qs->top++] = mag->items[--mag->top].ptr; }
if (take > 0) HAK_TP1(quick_refill_mag, class_idx);
return take;
}
// Box 3 wrapper: verify linear carve stays within slab usable bytes (Fail-Fast)
// DEPRECATED: Use tiny_carve_guard_verbose() from Box 3 directly
static inline int tiny_linear_carve_guard(TinyTLSSlab* tls,
TinySlabMeta* meta,
size_t stride,
uint32_t reserve,
const char* stage) {
if (!tls || !meta) return 0;
int class_idx = (tls->meta && tls->meta->class_idx < TINY_NUM_CLASSES)
? (int)tls->meta->class_idx
: -1;
return tiny_carve_guard_verbose(stage,
class_idx,
tls->slab_idx,
meta->carved,
meta->used,
meta->capacity,
stride,
reserve);
}
// Refill a few nodes directly into TLS SLL from TLS-cached SuperSlab (owner-thread only)
// Note: If HAKMEM_TINY_P0_BATCH_REFILL is enabled, sll_refill_batch_from_ss is used instead
#ifdef HAKMEM_TINY_PHASE6_BOX_REFACTOR
__attribute__((noinline)) int sll_refill_small_from_ss(int class_idx, int max_take) {
#else
static inline int sll_refill_small_from_ss(int class_idx, int max_take) {
#endif
HAK_CHECK_CLASS_IDX(class_idx, "sll_refill_small_from_ss");
atomic_fetch_add(&g_integrity_check_class_bounds, 1);
if (!g_use_superslab || max_take <= 0)
return 0;
TinyTLSSlab* tls = &g_tls_slabs[class_idx];
if (!tls->ss || !tls->meta || tls->meta->class_idx != (uint8_t)class_idx) {
if (!superslab_refill(class_idx))
return 0;
tls = &g_tls_slabs[class_idx];
if (!tls->ss || !tls->meta || tls->meta->class_idx != (uint8_t)class_idx)
return 0;
}
TinySlabMeta* meta = tls->meta;
uint32_t sll_cap = sll_cap_for_class(class_idx, (uint32_t)TINY_TLS_MAG_CAP);
int room = (int)sll_cap - (int)g_tls_sll_count[class_idx];
if (room <= 0)
return 0;
int take = max_take < room ? max_take : room;
int taken = 0;
size_t bs = tiny_stride_for_class(class_idx);
while (taken < take) {
void* p = NULL;
if (meta->freelist) {
p = meta->freelist;
meta->freelist = tiny_next_read(class_idx, p);
meta->used++;
ss_active_inc(tls->ss);
} else if (meta->carved < meta->capacity) {
if (!tiny_linear_carve_guard(tls, meta, bs, 1, "sll_refill_small"))
abort();
uint8_t* slab_start = tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
p = tiny_block_at_index(slab_start, meta->carved, bs);
meta->carved++;
meta->used++;
ss_active_inc(tls->ss);
} else {
if (!superslab_refill(class_idx))
break;
tls = &g_tls_slabs[class_idx];
meta = tls->meta;
if (!tls->ss || !meta || meta->class_idx != (uint8_t)class_idx)
break;
continue;
}
if (!p)
break;
if (!tls_sll_push(class_idx, p, sll_cap)) {
// SLL full; stop without complex rollback.
break;
}
taken++;
}
return taken;
}
// Ultra-Bump TLS shadow try: returns pointer when a TLS bump window is armed
// or can be armed by reserving a small chunk from the current SuperSlab meta.
static inline void* superslab_tls_bump_fast(int class_idx) {
if (!g_ultra_bump_shadow || !g_use_superslab) return NULL;
// Serve from armed TLS window if present
uint8_t* cur = g_tls_bcur[class_idx];
if (__builtin_expect(cur != NULL, 0)) {
uint8_t* end = g_tls_bend[class_idx];
// ✅ FIX #13B: Use stride (not user size) to match window arming (line 516)
// ROOT CAUSE: Window is carved with stride spacing, but fast path advanced by user size,
// causing misalignment and missing headers on blocks after the first one.
size_t bs = g_tiny_class_sizes[class_idx];
#if HAKMEM_TINY_HEADER_CLASSIDX
if (class_idx != 7) bs += 1; // stride = user_size + header
#endif
if (__builtin_expect(cur <= end - bs, 1)) {
g_tls_bcur[class_idx] = cur + bs;
#if HAKMEM_DEBUG_COUNTERS
g_bump_hits[class_idx]++;
#endif
HAK_TP1(bump_hit, class_idx);
// ✅ FIX #13: Write header and return BASE pointer
// ROOT CAUSE: Bump allocations didn't write headers, causing corruption when freed.
// SOLUTION: Write header to carved block before returning BASE.
// IMPORTANT: Return BASE (not USER) - caller will convert via HAK_RET_ALLOC.
#if HAKMEM_TINY_HEADER_CLASSIDX
if (class_idx != 7) {
*cur = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);
}
#endif
return (void*)cur; // Return BASE (caller converts to USER via HAK_RET_ALLOC)
}
// Window exhausted
g_tls_bcur[class_idx] = NULL;
g_tls_bend[class_idx] = NULL;
}
// Arm a new window from TLS-cached SuperSlab meta (linear mode only)
TinyTLSSlab* tls = &g_tls_slabs[class_idx];
TinySlabMeta* meta = tls->meta;
if (!meta || meta->freelist != NULL) return NULL; // linear mode only
// Use monotonic 'carved' for window arming
uint16_t carved = meta->carved;
uint16_t cap = meta->capacity;
if (carved >= cap) return NULL;
uint32_t avail = (uint32_t)cap - (uint32_t)carved;
uint32_t chunk = (g_bump_chunk > 0 ? (uint32_t)g_bump_chunk : 1u);
if (chunk > avail) chunk = avail;
// Box 3: Get stride and slab base
size_t bs = tiny_stride_for_class(tls->meta ? tls->meta->class_idx : 0);
uint8_t* base = tls->slab_base ? tls->slab_base : tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
if (__builtin_expect(!tiny_linear_carve_guard(tls, meta, bs, chunk, "tls_bump"), 0)) {
abort();
}
uint8_t* start = base + ((size_t)carved * bs);
// Reserve the chunk: advance carved and used accordingly
meta->carved = (uint16_t)(carved + (uint16_t)chunk);
meta->used = (uint16_t)(meta->used + (uint16_t)chunk);
// Account all reserved blocks as active in SuperSlab
ss_active_add(tls->ss, chunk);
#if HAKMEM_DEBUG_COUNTERS
g_bump_arms[class_idx]++;
#endif
g_tls_bcur[class_idx] = start + bs;
g_tls_bend[class_idx] = start + (size_t)chunk * bs;
// ✅ FIX #13: Write header and return BASE pointer
#if HAKMEM_TINY_HEADER_CLASSIDX
if (class_idx != 7) {
*start = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);
}
#endif
return (void*)start; // Return BASE (caller converts to USER via HAK_RET_ALLOC)
}
// Frontend: refill FastCache directly from TLS active slab (owner-only) or adopt a slab
static inline int frontend_refill_fc(int class_idx) {
TinyFastCache* fc = &g_fast_cache[class_idx];
int room = TINY_FASTCACHE_CAP - fc->top;
if (room <= 0) return 0;
// Target refill (conservative for safety)
int need = ultra_batch_for_class(class_idx);
int tgt = atomic_load_explicit(&g_frontend_fill_target[class_idx], memory_order_relaxed);
if (tgt > 0 && tgt < need) need = tgt;
if (need > room) need = room;
if (need <= 0) return 0;
int filled = 0;
// Step A: First bulk transfer from TLS SLL to FastCache (lock-free, O(1))
// CRITICAL: Use Box TLS-SLL API to avoid race condition (rbp=0xa0 SEGV)
if (g_tls_sll_enable) {
while (need > 0) {
void* h = NULL;
if (!tls_sll_pop(class_idx, &h)) break;
// One-shot validation for the first pop into FastCache
#if !HAKMEM_BUILD_RELEASE
do { static _Atomic int once_fc = 0; int exp2 = 0; if (atomic_compare_exchange_strong(&once_fc, &exp2, 1)) { tiny_debug_validate_node_base(class_idx, h, "frontend_refill_fc"); } } while (0);
#endif
fc->items[fc->top++] = h;
need--; filled++;
if (fc->top >= TINY_FASTCACHE_CAP) break;
}
}
// Step B: If still not enough, transfer from TLS Magazine (lock-free, O(1))
if (need > 0) {
tiny_small_mags_init_once();
if (class_idx > 3) tiny_mag_init_if_needed(class_idx);
TinyTLSMag* mag = &g_tls_mags[class_idx];
while (need > 0 && mag->top > 0 && fc->top < TINY_FASTCACHE_CAP) {
void* p = mag->items[--mag->top].ptr;
fc->items[fc->top++] = p;
need--; filled++;
}
}
if (filled > 0) {
eventq_push(class_idx, (uint32_t)g_tiny_class_sizes[class_idx]);
HAK_PATHDBG_INC(g_path_refill_calls, class_idx);
return 1;
}
return 0;
}
// Move up to 'n' items from TLS magazine to SLL if SLL has room (lock-free).
static inline int bulk_mag_to_sll_if_room(int class_idx, TinyTLSMag* mag, int n) {
if (g_tls_list_enable) return 0;
if (!g_tls_sll_enable || n <= 0) return 0;
uint32_t cap = sll_cap_for_class(class_idx, (uint32_t)mag->cap);
uint32_t have = g_tls_sll_count[class_idx];
if (have >= cap) return 0;
int room = (int)(cap - have);
int avail = mag->top;
// Hysteresis: avoid frequent tiny moves; take at least 8 if possible
int take = (n < room ? n : room);
if (take < 8 && avail >= 8 && room >= 8) take = 8;
if (take > avail) take = avail;
if (take <= 0) return 0;
int moved = 0;
for (int i = 0; i < take; i++) {
void* p = mag->items[--mag->top].ptr;
if (!tls_sll_push(class_idx, p, cap)) {
// No more room: put this item back in the magazine and stop
mag->top++; // undo pop
break;
}
moved++;
}
HAK_PATHDBG_INC(g_path_refill_calls, class_idx);
return moved; // actual number of items moved (may be less than 'take' if the SLL filled up)
}
// Ultra-mode (SLL-only) refill operation
static inline void ultra_refill_sll(int class_idx) {
int need = ultra_batch_for_class(class_idx);
HAK_ULTRADBG_INC(g_ultra_refill_calls, class_idx);
int sll_cap = ultra_sll_cap_for_class(class_idx);
pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m;
pthread_mutex_lock(lock);
TinySlab* slab = g_tiny_pool.free_slabs[class_idx];
if (!slab) {
slab = allocate_new_slab(class_idx);
if (slab) {
slab->next = g_tiny_pool.free_slabs[class_idx];
g_tiny_pool.free_slabs[class_idx] = slab;
}
}
if (slab) {
// Box 3: Get stride (block size + header, except C7 which is headerless)
size_t bs = tiny_stride_for_class(class_idx);
int remaining = need;
while (remaining > 0 && slab->free_count > 0) {
if ((int)g_tls_sll_count[class_idx] >= sll_cap) break;
int first = hak_tiny_find_free_block(slab);
if (first < 0) break;
// Allocate the first found block
hak_tiny_set_used(slab, first);
slab->free_count--;
void* p0 = (char*)slab->base + ((size_t)first * bs);
if (!tls_sll_push(class_idx, p0, (uint32_t)sll_cap)) {
// SLL saturated; stop refilling
break;
}
remaining--;
// Try to allocate more from the same word to amortize scanning
int word_idx = first / 64;
uint64_t used = slab->bitmap[word_idx];
uint64_t free_bits = ~used;
while (remaining > 0 && free_bits && slab->free_count > 0) {
if ((int)g_tls_sll_count[class_idx] >= sll_cap) break;
int bit_idx = __builtin_ctzll(free_bits);
int block_idx = word_idx * 64 + bit_idx;
hak_tiny_set_used(slab, block_idx);
slab->free_count--;
void* p = (char*)slab->base + ((size_t)block_idx * bs);
if (!tls_sll_push(class_idx, p, (uint32_t)sll_cap)) {
break;
}
remaining--;
// Update free_bits for next iteration
used = slab->bitmap[word_idx];
free_bits = ~used;
}
if (slab->free_count == 0) {
move_to_full_list(class_idx, slab);
break;
}
}
}
pthread_mutex_unlock(lock);
}
#endif // HAKMEM_TINY_REFILL_INC_H