# Warm Pool Implementation - Quick-Start Guide

## 2025-12-04

---

## 🎯 TL;DR

**Objective:** Add per-thread warm SuperSlab pools to eliminate registry scan on cache miss.

**Expected Result:** +40-50% performance (1.06M → 1.5M+ ops/s)

**Code Changes:** ~300 lines total
- 1 new header file (80 lines)
- 3 files modified (unified_cache, malloc_tiny_fast, superslab_registry)

**Time Estimate:** 2-3 days

---

## 📋 Implementation Roadmap

### Step 1: Create Warm Pool Header (30 mins)

**File:** `core/front/tiny_warm_pool.h` (NEW)

```c
#ifndef HAK_TINY_WARM_POOL_H
#define HAK_TINY_WARM_POOL_H

#include <stddef.h>  // NULL
#include <stdint.h>
#include "../hakmem_tiny_config.h"
#include "../superslab/superslab_types.h"

// Maximum warm SuperSlabs per thread per class
#define TINY_WARM_POOL_MAX_PER_CLASS 4

typedef struct {
    SuperSlab* slabs[TINY_WARM_POOL_MAX_PER_CLASS];
    int32_t count;
} TinyWarmPool;

// Per-thread warm pool (one per class)
extern __thread TinyWarmPool g_tiny_warm_pool[TINY_NUM_CLASSES];

// Initialize once per thread (lazy).
// Caveat: this function is static inline, so every translation unit that
// includes this header gets its own `initialized` flag. If more than one
// translation unit calls it, the reset below can run again on the same thread.
// The definition in Step 2 already zero-initializes the array.
static inline void tiny_warm_pool_init_once(void) {
    static __thread int initialized = 0;
    if (!initialized) {
        for (int i = 0; i < TINY_NUM_CLASSES; i++) {
            g_tiny_warm_pool[i].count = 0;
        }
        initialized = 1;
    }
}

// O(1) pop from warm pool
// Returns: most recently pushed SuperSlab*, or NULL if the pool is empty
static inline SuperSlab* tiny_warm_pool_pop(int class_idx) {
    if (g_tiny_warm_pool[class_idx].count > 0) {
        return g_tiny_warm_pool[class_idx].slabs[--g_tiny_warm_pool[class_idx].count];
    }
    return NULL;
}

// O(1) push to warm pool
// Returns: 1 if pushed, 0 if pool full (caller should free to LRU)
static inline int tiny_warm_pool_push(int class_idx, SuperSlab* ss) {
    if (g_tiny_warm_pool[class_idx].count < TINY_WARM_POOL_MAX_PER_CLASS) {
        g_tiny_warm_pool[class_idx].slabs[g_tiny_warm_pool[class_idx].count++] = ss;
        return 1;
    }
    return 0;
}

// Get current count (for metrics)
static inline int tiny_warm_pool_count(int class_idx) {
    return g_tiny_warm_pool[class_idx].count;
}

#endif // HAK_TINY_WARM_POOL_H
```
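
The pool is a fixed-capacity LIFO stack, so the SuperSlab pushed most recently is the first one handed back, and a push beyond capacity is rejected. The following is a minimal, self-contained sketch of that behaviour. `DemoSlab`, `DemoPool`, `demo_push`, `demo_pop` and `demo_last_in_first_out` are stand-ins invented for illustration (the real code uses `SuperSlab` and `g_tiny_warm_pool`), and the pool is a plain struct rather than a TLS array so that the sketch compiles in isolation.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_POOL_CAP 4  /* mirrors TINY_WARM_POOL_MAX_PER_CLASS */

typedef struct { int id; } DemoSlab;  /* hypothetical stand-in for SuperSlab */

typedef struct {
    DemoSlab* slabs[DEMO_POOL_CAP];
    int count;
} DemoPool;

/* Push: returns 1 on success, 0 when the pool is full. */
static int demo_push(DemoPool* p, DemoSlab* s) {
    if (p->count >= DEMO_POOL_CAP) return 0;
    p->slabs[p->count++] = s;
    return 1;
}

/* Pop: returns the most recently pushed slab, or NULL when empty. */
static DemoSlab* demo_pop(DemoPool* p) {
    return p->count > 0 ? p->slabs[--p->count] : NULL;
}

/* Pushes up to `n` slabs (ids 0..n-1), then pops once.
 * Returns the id that comes back, or -1 if the pool was empty. */
static int demo_last_in_first_out(int n) {
    static DemoSlab storage[DEMO_POOL_CAP + 1];
    DemoPool pool = { {NULL}, 0 };
    for (int i = 0; i < n && i <= DEMO_POOL_CAP; i++) {
        storage[i].id = i;
        demo_push(&pool, &storage[i]);  /* a push beyond capacity is rejected */
    }
    DemoSlab* top = demo_pop(&pool);
    return top ? top->id : -1;
}
```

Calling `demo_last_in_first_out(3)` returns 2 (the last slab pushed), while `demo_last_in_first_out(5)` returns 3 because the fifth push does not fit into a four-slot pool.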

### Step 2: Define Thread-Local Variable (5 mins)

**File:** `core/hakmem_tiny.c` (or a new `core/front/tiny_warm_pool.c`)

The header from Step 1 only declares the pool with `extern`. Define it in exactly one source file:

```c
#include "tiny_warm_pool.h"  // adjust the path relative to the source file

// Per-thread warm pools (one pool per class), zero-initialized
__thread TinyWarmPool g_tiny_warm_pool[TINY_NUM_CLASSES] = {0};
```

### Step 3: Modify unified_cache_refill() (60 mins)

**File:** `core/front/tiny_unified_cache.h`

**Current Implementation:**
```c
static inline void unified_cache_refill(int class_idx) {
    // Find first HOT SuperSlab in per-class registry
    for (int i = 0; i < g_super_reg_by_class_count[class_idx]; i++) {
        SuperSlab* ss = g_super_reg_by_class[class_idx][i];
        if (ss_tier_is_hot(ss)) {
            // Carve and refill cache
            carve_blocks_from_superslab(ss, class_idx,
                                        &g_unified_cache[class_idx]);
            return;
        }
    }
    // Not found → cold path (allocate new SuperSlab)
    allocate_new_superslab_and_carve(class_idx);
}
```

**New Implementation (with Warm Pool):**
```c
#include "tiny_warm_pool.h"

static inline void unified_cache_refill(int class_idx) {
    // 1. Initialize warm pool on first use (per-thread)
    tiny_warm_pool_init_once();

    // 2. Try warm pool first (no locks, O(1))
    SuperSlab* ss = tiny_warm_pool_pop(class_idx);
    if (ss) {
        // SuperSlab already HOT (pre-qualified)
        // No tier check needed, just carve
        carve_blocks_from_superslab(ss, class_idx,
                                    &g_unified_cache[class_idx]);
        return;
    }

    // 3. Fall back to registry scan (only if warm pool empty)
    for (int i = 0; i < g_super_reg_by_class_count[class_idx]; i++) {
        SuperSlab* candidate = g_super_reg_by_class[class_idx][i];
        if (ss_tier_is_hot(candidate)) {
            // Carve blocks
            carve_blocks_from_superslab(candidate, class_idx,
                                        &g_unified_cache[class_idx]);

            // Refill warm pool for next miss
            // (look at the next 2 registry entries and keep the HOT ones)
            for (int j = i + 1; j < g_super_reg_by_class_count[class_idx] && j < i + 3; j++) {
                SuperSlab* extra = g_super_reg_by_class[class_idx][j];
                if (ss_tier_is_hot(extra)) {
                    tiny_warm_pool_push(class_idx, extra);
                }
            }
            return;
        }
    }

    // 4. Registry exhausted → cold path (allocate new SuperSlab)
    allocate_new_superslab_and_carve(class_idx);
}
```
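
The lookup order in the new `unified_cache_refill()` is warm pool first, then registry scan, then the cold path. The sketch below models only that decision against a small fake registry. Every name in it (`RefillPath`, `FakeSlab`, `FakePool`, `demo_refill`, `demo_refill_sequence`) is hypothetical, carving is omitted, and the pool is a plain struct instead of TLS.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PATH_WARM_POOL, PATH_REGISTRY, PATH_COLD } RefillPath;

typedef struct { int hot; } FakeSlab;  /* hypothetical: models only the HOT tier flag */

#define FAKE_POOL_CAP 4

typedef struct {
    FakeSlab* slabs[FAKE_POOL_CAP];
    int count;
} FakePool;

/* Decide where a refill is served from; prefill the pool on a registry hit. */
static RefillPath demo_refill(FakePool* pool, FakeSlab* registry, int reg_count) {
    /* 1. Warm pool: O(1) */
    if (pool->count > 0) {
        pool->count--;
        return PATH_WARM_POOL;
    }
    /* 2. Registry scan: O(N) */
    for (int i = 0; i < reg_count; i++) {
        if (!registry[i].hot) continue;
        /* Look at the next two entries and keep the HOT ones for later misses. */
        for (int j = i + 1; j < reg_count && j < i + 3; j++) {
            if (registry[j].hot && pool->count < FAKE_POOL_CAP) {
                pool->slabs[pool->count++] = &registry[j];
            }
        }
        return PATH_REGISTRY;
    }
    /* 3. Nothing HOT: the caller would allocate a new SuperSlab. */
    return PATH_COLD;
}

/* Runs four refills against a registry of {cold, hot, hot, hot} and
 * encodes the path taken by each refill as one decimal digit. */
static int demo_refill_sequence(void) {
    FakeSlab registry[4] = { {0}, {1}, {1}, {1} };
    FakePool pool = { {NULL}, 0 };
    int code = 0;
    for (int k = 0; k < 4; k++) {
        code = code * 10 + (int)demo_refill(&pool, registry, 4);
    }
    return code;
}
```

In this model the first refill scans the registry and parks two HOT slabs, the next two refills are served from the pool, and the fourth has to scan again, so `demo_refill_sequence()` returns 1001.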

### Step 4: Initialize Warm Pool in malloc_tiny_fast() (20 mins)

**File:** `core/front/malloc_tiny_fast.h`

Ensure the warm pool is initialized on the first malloc call:

```c
// In malloc_tiny_fast() or tiny_hot_alloc_fast():
// `need_init` is a placeholder for a per-thread "not yet initialized" flag;
// it is not defined elsewhere in this guide.
if (__builtin_expect(g_tiny_warm_pool[0].count == 0 && need_init, 0)) {
    tiny_warm_pool_init_once();
}
```

Or, more simply, let `unified_cache_refill()` call `tiny_warm_pool_init_once()` (as shown in Step 3), which keeps the extra branch off the malloc fast path.

### Step 5: Add to SuperSlab Cleanup (30 mins)

**File:** `core/hakmem_super_registry.h` or `core/hakmem_tiny.h`

When a SuperSlab becomes empty (no active objects), add it to the warm pool if there is room:

```c
// In ss_slab_meta free path (when last object freed):
if (ss_slab_meta_active_count(slab_meta) == 0) {
    // SuperSlab is now empty
    SuperSlab* ss = ss_from_slab_meta(slab_meta);
    int class_idx = ss_slab_meta_class_get(slab_meta);

    // Try to add to warm pool for next allocation
    if (!tiny_warm_pool_push(class_idx, ss)) {
        // Warm pool full, return to LRU cache
        ss_cache_put(ss);
    }
}
```

### Step 6: Add Optional Environment Variables (15 mins)

**File:** `core/hakmem_tiny.h` or `core/front/tiny_warm_pool.h`

```c
#include <stdlib.h>  // getenv, atoi

// Check warm pool size via environment (for tuning)
static inline int warm_pool_max_per_class(void) {
    static int max = -1;
    if (max == -1) {
        int v = TINY_WARM_POOL_MAX_PER_CLASS;
        const char* env = getenv("HAKMEM_WARM_POOL_SIZE");
        if (env) {
            v = atoi(env);
            // slabs[] holds TINY_WARM_POOL_MAX_PER_CLASS entries, so a larger
            // value would overflow the array. To allow sizes up to 16, size
            // slabs[] for 16 entries and raise this bound accordingly.
            if (v < 1 || v > TINY_WARM_POOL_MAX_PER_CLASS) v = TINY_WARM_POOL_MAX_PER_CLASS;
        }
        max = v;  // publish only the validated value
    }
    return max;
}

// Use in tiny_warm_pool_push() (replaces the Step 1 version):
static inline int tiny_warm_pool_push(int class_idx, SuperSlab* ss) {
    int capacity = warm_pool_max_per_class();
    if (g_tiny_warm_pool[class_idx].count < capacity) {
        g_tiny_warm_pool[class_idx].slabs[g_tiny_warm_pool[class_idx].count++] = ss;
        return 1;
    }
    return 0;
}
```
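
`atoi()` cannot tell a malformed value from a legitimate one and silently accepts trailing junk such as `"2x"`. A stricter variant, sketched here under the assumption that invalid input should fall back to the default, parses with `strtol()` and rejects anything outside the array capacity. `parse_warm_pool_size` and its parameters are names made up for this example and are not part of hakmem; a caller would pass `getenv("HAKMEM_WARM_POOL_SIZE")` as `text`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a pool-size string.
 * Returns `fallback` when the text is missing, is not a number,
 * has trailing characters, or lies outside [1, capacity]. */
static int parse_warm_pool_size(const char* text, int capacity, int fallback) {
    if (text == NULL || *text == '\0') return fallback;
    char* end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') return fallback;
    if (v < 1 || v > capacity) return fallback;
    return (int)v;
}
```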

---

## 🔍 Testing Checklist

### Unit Tests

```c
// In test/test_warm_pool.c (NEW)
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include "tiny_warm_pool.h"  // adjust the include path as needed

void test_warm_pool_pop_empty(void) {
    // Verify pop on empty returns NULL
    SuperSlab* ss = tiny_warm_pool_pop(0);
    assert(ss == NULL);
}

void test_warm_pool_push_pop(void) {
    // Verify push then pop returns same
    SuperSlab* test_ss = (SuperSlab*)0x123456;  // dummy pointer, never dereferenced
    tiny_warm_pool_push(0, test_ss);
    SuperSlab* popped = tiny_warm_pool_pop(0);
    assert(popped == test_ss);
}

void test_warm_pool_capacity(void) {
    // Verify pool respects capacity
    for (int i = 0; i < TINY_WARM_POOL_MAX_PER_CLASS + 1; i++) {
        SuperSlab* ss = (SuperSlab*)malloc(sizeof(SuperSlab));
        int pushed = tiny_warm_pool_push(0, ss);
        if (i < TINY_WARM_POOL_MAX_PER_CLASS) {
            assert(pushed == 1);  // Should succeed
        } else {
            assert(pushed == 0);  // Should fail when full
            free(ss);             // Rejected pointer is still owned by the test
        }
    }
    // Drain the pool so later tests start from an empty state
    SuperSlab* ss;
    while ((ss = tiny_warm_pool_pop(0)) != NULL) free(ss);
}

// Thread body: with per-thread pools, each thread sees only its own push
static void* warm_pool_thread_func(void* arg) {
    assert(tiny_warm_pool_count(0) == 0);
    assert(tiny_warm_pool_push(0, (SuperSlab*)arg) == 1);
    assert(tiny_warm_pool_count(0) == 1);
    assert(tiny_warm_pool_pop(0) == (SuperSlab*)arg);
    return NULL;
}

void test_warm_pool_per_thread(void) {
    // Verify thread isolation
    pthread_t t1, t2;
    pthread_create(&t1, NULL, warm_pool_thread_func, (void*)0x1000);
    pthread_create(&t2, NULL, warm_pool_thread_func, (void*)0x2000);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    // Each thread should have independent warm pools
}
```

### Integration Tests

```bash
# Run existing benchmark suite
./bench_allocators_hakmem bench_random_mixed_hakmem 1000000 256 42

# Compare before/after:
#   Before: 1.06M ops/s
#   After:  1.5M+ ops/s (target +40%)

# Run other benchmarks to verify no regression
./bench_allocators_hakmem bench_tiny_hot     # Should be ~89M ops/s
./bench_allocators_hakmem bench_tiny_cold    # Should be similar to baseline
./bench_allocators_hakmem bench_random_mid   # Should improve
```
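
The "+40%" target follows directly from the two throughput figures, and the same figures give the matching reduction in time per operation. A small helper makes the arithmetic explicit; the function names are illustrative only.

```c
#include <assert.h>

/* Relative throughput gain in percent: (after / before - 1) * 100. */
static double throughput_gain_pct(double before_ops, double after_ops) {
    return (after_ops / before_ops - 1.0) * 100.0;
}

/* Reduction in time per operation in percent: (1 - before / after) * 100. */
static double time_per_op_reduction_pct(double before_ops, double after_ops) {
    return (1.0 - before_ops / after_ops) * 100.0;
}
```

With 1.06M and 1.5M ops/s this works out to roughly a 41.5% throughput gain, which corresponds to about 29% less time per operation.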

### Performance Metrics

```bash
# With perf profiling
HAKMEM_WARM_POOL_SIZE=4 perf record -F 5000 -e cycles \
    ./bench_allocators_hakmem bench_random_mixed_hakmem 1000000 256 42

# Expected to see:
# - Fewer cycles spent in unified_cache_refill
# - Reduced registry scan overhead
# - Increased warm pool pop hits
```

---

## 📊 Success Criteria

| Metric | Current | Target | Status |
|--------|---------|--------|--------|
| Random Mixed ops/s | 1.06M | 1.5M+ | ✓ Target |
| Warm pool hit rate | N/A | > 90% | ✓ New metric |
| Tiny Hot ops/s | 89M | 89M | ✓ No regression |
| Memory per thread | ~256KB | < 400KB | ✓ Acceptable |
| All tests pass | ✓ | ✓ | ✓ Verify |

---

## 🚀 Quick Build & Test

```bash
# After code changes, compile and test:

cd /mnt/workdisk/public_share/hakmem

# Build
make clean && make

# Test warm pool directly
make test_warm_pool
./test_warm_pool

# Benchmark
./bench_allocators_hakmem bench_random_mixed_hakmem 1000000 256 42

# Profile
perf record -F 5000 -e cycles \
    ./bench_allocators_hakmem bench_random_mixed_hakmem 1000000 256 42
perf report
```

---

## 🔧 Debugging Tips

### Verify Warm Pool is Active

Add debug output to warm pool operations:

```c
#include <stdio.h>

#if !HAKMEM_BUILD_RELEASE
// Drop-in replacement for tiny_warm_pool_pop() that logs every hit.
// It returns the popped SuperSlab so the caller can still use it.
static SuperSlab* warm_pool_pop_debug(int class_idx) {
    SuperSlab* ss = tiny_warm_pool_pop(class_idx);
    if (ss) {
        fprintf(stderr, "[WarmPool] Pop class=%d, count=%d\n",
                class_idx, g_tiny_warm_pool[class_idx].count);
    }
    return ss;
}
#endif
```

### Check Warm Pool Hit Rate

```c
#include <inttypes.h>  // PRIu64
#include <stdio.h>

// Per-thread counters (plain increments, no atomics needed)
__thread uint64_t g_warm_pool_hits = 0;
__thread uint64_t g_warm_pool_misses = 0;

// Add to refill (in place of the plain pop in unified_cache_refill())
SuperSlab* ss = tiny_warm_pool_pop(class_idx);
if (ss) {
    g_warm_pool_hits++;    // Hit
} else {
    g_warm_pool_misses++;  // Miss
}

// Print at end of benchmark (reports the calling thread's counters only)
uint64_t total = g_warm_pool_hits + g_warm_pool_misses;
fprintf(stderr, "Warm pool: %" PRIu64 " hits, %" PRIu64 " misses (%.1f%% hit rate)\n",
        g_warm_pool_hits, g_warm_pool_misses,
        total ? 100.0 * (double)g_warm_pool_hits / (double)total : 0.0);
```
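
Because the counters are thread-local, a single `fprintf` at the end of a multi-threaded benchmark reports only one thread's numbers. One way to get a process-wide hit rate, sketched here with C11 atomics, is to have each thread add its local counters to shared totals once, for example just before it exits. The names `g_total_hits`, `g_total_misses`, `warm_pool_stats_flush` and `warm_pool_hit_rate_pct` are assumptions for this example and do not exist in hakmem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Process-wide totals (static storage, so they start at zero). */
static _Atomic uint64_t g_total_hits;
static _Atomic uint64_t g_total_misses;

/* Call once per thread with that thread's local counters. */
static void warm_pool_stats_flush(uint64_t local_hits, uint64_t local_misses) {
    atomic_fetch_add_explicit(&g_total_hits, local_hits, memory_order_relaxed);
    atomic_fetch_add_explicit(&g_total_misses, local_misses, memory_order_relaxed);
}

/* Hit rate in percent over everything flushed so far; 0.0 if nothing was recorded. */
static double warm_pool_hit_rate_pct(void) {
    uint64_t h = atomic_load_explicit(&g_total_hits, memory_order_relaxed);
    uint64_t m = atomic_load_explicit(&g_total_misses, memory_order_relaxed);
    uint64_t total = h + m;
    return total ? 100.0 * (double)h / (double)total : 0.0;
}
```

The atomics are touched once per thread rather than once per refill, so the hot path keeps its plain thread-local increments.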

### Measure Registry Scan Reduction

Profile before/after to verify:
- Fewer calls to registry scan loop
- Reduced cycles in `unified_cache_refill()`
- Increased warm pool pop calls

---

## 📝 Commit Message Template

```
Add warm pool optimization for 40% performance improvement

- New: tiny_warm_pool.h with per-thread SuperSlab pools
- Modify: unified_cache_refill() to use warm pool (O(1) pop)
- Modify: SuperSlab cleanup to add to warm pool
- Env: HAKMEM_WARM_POOL_SIZE for tuning (default: 4)

Benefits:
- Eliminates registry O(N) scan on cache miss
- 40-50% improvement on Random Mixed (1.06M → 1.5M+ ops/s)
- No regression in other workloads
- Minimal per-thread memory overhead (<200KB)

Testing:
- Unit tests for warm pool operations
- Benchmark validation: Random Mixed +40%
- No regression in Tiny Hot, Tiny Cold
- Thread safety verified

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
```

---

## 🎓 Key Design Decisions

### Why 4 SuperSlabs per Class?

```
Trade-off: Working set size vs warm pool effectiveness

Too small (1-2):
- Less memory: ✓
- High miss rate: ✗ (frequently falls back to registry)

Right size (4):
- Memory: ~8-32 KB per class × 32 classes = 256-1024 KB
- Hit rate: ~90% (captures typical working set)
- Sweet spot: ✓

Too large (8+):
- More memory: ✗ (unnecessary TLS bloat)
- Marginal benefit: ✗ (diminishing returns)
```
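
Separately from whatever memory the pooled SuperSlabs themselves retain, the TLS bookkeeping added by the pool array is small: four pointers and a counter per class. The sketch below computes it with `sizeof`. The struct is copied from Step 1 under a different name (`TinyWarmPoolSketch`), `SuperSlab` is left opaque, and `warm_pool_tls_bytes` is a helper invented for this example; the class count of 32 is taken from the estimate above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SuperSlab SuperSlab;  /* opaque here; only pointers are stored */

#define POOL_SLOTS 4  /* mirrors TINY_WARM_POOL_MAX_PER_CLASS */

typedef struct {
    SuperSlab* slabs[POOL_SLOTS];
    int32_t count;
} TinyWarmPoolSketch;

/* Bytes of TLS used by the pool array for `num_classes` size classes. */
static size_t warm_pool_tls_bytes(size_t num_classes) {
    return num_classes * sizeof(TinyWarmPoolSketch);
}
```

On a typical 64-bit target the struct occupies 40 bytes (32 bytes of pointers plus a counter padded to pointer alignment), so 32 classes need 1280 bytes of TLS per thread.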

### Why Thread-Local Storage?

```
Options:
1. Global pool (lock-protected) → Contention
2. Per-thread pool (TLS) → No locks, thread-safe ✓
3. Hybrid (mostly TLS) → Complexity

Chosen: Per-thread TLS
- Fast path: No locks
- Correctness: Thread-safe by design
- Simplicity: No synchronization needed
```

### Why Batched Tier Check?

```
Current: Check tier on every refill (expensive)
Proposed: Check tier periodically (every 64 pops)

Cost:
- Rare case: SuperSlab changes tier while in warm pool
- Detection: Caught on next batch check (at most 64 pops later)
- Fallback: Registry scan still validates

Benefit:
- Reduces unnecessary tier checks
- Improves cache refill performance
```
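
One possible reading of "check tier every 64 pops" is sketched below: the pool keeps a pop counter and re-validates the popped SuperSlab only when the counter reaches the interval. A slab that fails validation is dropped and the caller receives `NULL`, which sends it to the registry scan. This is an illustration under assumptions, not the hakmem implementation: `BatchSlab`, `BatchPool`, `batch_push`, `batch_pop`, `batch_count_rejections` and the `hot` field are hypothetical stand-ins for `SuperSlab`, the warm pool and `ss_tier_is_hot()`.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_POOL_CAP 4
#define TIER_CHECK_INTERVAL 64  /* from the text: validate every 64 pops */

typedef struct { int hot; } BatchSlab;

typedef struct {
    BatchSlab* slabs[BATCH_POOL_CAP];
    int count;
    unsigned pops_since_check;
} BatchPool;

static int batch_push(BatchPool* p, BatchSlab* s) {
    if (p->count >= BATCH_POOL_CAP) return 0;
    p->slabs[p->count++] = s;
    return 1;
}

/* Pop with a periodic tier check. Returns NULL when the pool is empty or
 * when the periodic check finds that the popped slab is no longer HOT. */
static BatchSlab* batch_pop(BatchPool* p) {
    if (p->count == 0) return NULL;
    BatchSlab* s = p->slabs[--p->count];
    if (++p->pops_since_check >= TIER_CHECK_INTERVAL) {
        p->pops_since_check = 0;
        if (!s->hot) return NULL;  /* stale entry: drop it, caller rescans */
    }
    return s;
}

/* Pushes and pops a no-longer-HOT slab `n` times and counts how many
 * pops were rejected by the periodic check. */
static int batch_count_rejections(int n) {
    BatchSlab stale = { 0 };
    BatchPool pool = { {NULL}, 0, 0 };
    int rejected = 0;
    for (int i = 0; i < n; i++) {
        batch_push(&pool, &stale);
        if (batch_pop(&pool) == NULL) rejected++;
    }
    return rejected;
}
```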

---

## 📚 Related Files

**Core Implementation:**
- `core/front/tiny_warm_pool.h` (NEW - this guide)
- `core/front/tiny_unified_cache.h` (MODIFY - call warm pool)
- `core/front/malloc_tiny_fast.h` (MODIFY - init warm pool)

**Supporting:**
- `core/hakmem_super_registry.h` (UNDERSTAND - how registry works)
- `core/box/ss_tier_box.h` (UNDERSTAND - tier management)
- `core/superslab/superslab_types.h` (REFERENCE - SuperSlab struct)

**Testing:**
- `bench_allocators_hakmem` (BENCHMARK)
- `test/test_*.c` (ADD warm pool tests)

---

## ✅ Implementation Checklist

- [ ] Create `core/front/tiny_warm_pool.h`
- [ ] Define `__thread g_tiny_warm_pool[]` in one source file
- [ ] Modify `unified_cache_refill()` in `tiny_unified_cache.h`
- [ ] Add `tiny_warm_pool_init_once()` call in malloc hot path
- [ ] Add warm pool push on SuperSlab cleanup
- [ ] Add optional environment variable tuning
- [ ] Write unit tests for warm pool operations
- [ ] Compile and verify no errors
- [ ] Run benchmark: Random Mixed ops/s improvement
- [ ] Verify no regression in other workloads
- [ ] Measure warm pool hit rate (target > 90%)
- [ ] Profile CPU cycles (target ~40-50% reduction)
- [ ] Create commit with summary above
- [ ] Update documentation if needed

---

## 📞 Questions or Issues?

If you encounter:

1. **Compilation errors:** Check includes, particularly `superslab_types.h`
2. **Low hit rate (<80%):** Increase pool size via `HAKMEM_WARM_POOL_SIZE` (the `slabs[]` array must be large enough for the requested size)
3. **Memory bloat:** Verify pool size is <= 4 slots per class
4. **No performance gain:** Check that the warm pool is actually being used (add debug output)
5. **Regression in other tests:** Verify that the registry fallback path still works

---

**Status:** Ready to implement
**Expected Timeline:** 2-3 development days
**Estimated Performance Gain:** +40-50% (1.06M → 1.5M+ ops/s)