# Phase 12: Shared SuperSlab Pool - Design Document

**Date**: 2025-11-13
**Goal**: System malloc parity (90M ops/s) via mimalloc-style shared SuperSlab architecture
**Expected Impact**: SuperSlab count 877 → 100-200 (-70-80%), +650-860% performance

---

## 🎯 Problem Statement

### Root Cause: Fixed Size Class Architecture

**Current Design** (Phase 11):

```c
// SuperSlab is bound to ONE size class
struct SuperSlab {
    uint8_t size_class;  // FIXED at allocation time (0-7)
    // ... 32 slabs, all for the SAME class
};

// 8 independent SuperSlabHead structures (one per class)
SuperSlabHead g_superslab_heads[8];  // Each class manages its own pool
```

**Problem**:
- Benchmark (100K iterations, 256B): **877 SuperSlabs allocated**
- Memory usage: 877MB (877 × 1MB SuperSlabs)
- Metadata overhead: 877 × ~2KB headers = ~1.8MB
- **Each size class independently allocates SuperSlabs** → massive churn

**Why 877?**:

```
Class 0 (8B):    ~100 SuperSlabs
Class 1 (16B):   ~120 SuperSlabs
Class 2 (32B):   ~150 SuperSlabs
Class 3 (64B):   ~180 SuperSlabs
Class 4 (128B):  ~140 SuperSlabs
Class 5 (256B):  ~187 SuperSlabs  ← Target class for benchmark
Class 6 (512B):  ~80 SuperSlabs
Class 7 (1KB):   ~20 SuperSlabs
Total:           877 SuperSlabs
```

**Performance Impact**:
- Massive metadata traversal overhead
- Poor cache locality (877 scattered 1MB regions)
- Excessive TLB pressure
- SuperSlab allocation churn dominates runtime

---

## 🚀 Solution: Shared SuperSlab Pool (mimalloc-style)

### Core Concept

**New Design** (Phase 12):

```c
// SuperSlab is NOT bound to any class - slabs are dynamically assigned
struct SuperSlab {
    // NO size_class field! Each slab has its own class_idx
    uint8_t  active_slabs;  // Number of active slabs (any class)
    uint32_t slab_bitmap;   // 32-bit bitmap (1=active, 0=free)
    // ... 32 slabs, EACH can be a different size class
};

// Single global pool (shared by all classes)
typedef struct SharedSuperSlabPool {
    SuperSlab** slabs;          // Array of all SuperSlabs
    uint32_t    total_count;    // Total SuperSlabs allocated
    uint32_t    active_count;   // SuperSlabs with active slabs
    pthread_mutex_t lock;       // Allocation lock

    // Per-class hints (fast path optimization)
    SuperSlab* class_hints[8];  // Last known SuperSlab with free space per class
} SharedSuperSlabPool;
```

### Per-Slab Dynamic Class Assignment

**Old** (TinySlabMeta):

```c
// Slab metadata (16 bytes) - class_idx inherited from SuperSlab
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint16_t owner_tid;
} TinySlabMeta;
```

**New** (Phase 12):

```c
// Slab metadata (16 bytes) - class_idx is PER-SLAB
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;      // NEW: Dynamic class assignment (0-7, 255=unassigned)
    uint8_t  owner_tid_low;  // Truncated to 8-bit (from 16-bit)
} TinySlabMeta;
```

**Size preserved**: Still 16 bytes (no growth!)

---

## 📐 Architecture Changes

### 1. SuperSlab Structure (superslab_types.h)

**Remove**:

```c
uint8_t size_class;  // DELETE - no longer per-SuperSlab
```

**Add** (optional, for debugging):

```c
uint8_t mixed_slab_count;  // Number of slabs with different class_idx (stats)
```

### 2. TinySlabMeta Structure (superslab_types.h)

**Modify**:

```c
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;      // NEW: 0-7 for active, 255=unassigned
    uint8_t  owner_tid_low;  // Changed from uint16_t owner_tid
} TinySlabMeta;
```

### 3. Shared Pool Structure (NEW: hakmem_shared_pool.h)

```c
// Global shared pool (singleton)
typedef struct SharedSuperSlabPool {
    SuperSlab** slabs;          // Dynamic array of SuperSlab pointers
    uint32_t    capacity;       // Array capacity (grows as needed)
    uint32_t    total_count;    // Total SuperSlabs allocated
    uint32_t    active_count;   // SuperSlabs with >0 active slabs
    pthread_mutex_t alloc_lock; // Lock for slab allocation

    // Per-class hints (lock-free read, updated under lock)
    SuperSlab* class_hints[TINY_NUM_CLASSES];

    // LRU cache integration (Phase 9)
    SuperSlab* lru_head;
    SuperSlab* lru_tail;
    uint32_t   lru_count;
} SharedSuperSlabPool;

// Global singleton
extern SharedSuperSlabPool g_shared_pool;

// API
void shared_pool_init(void);
SuperSlab* shared_pool_acquire_superslab(void);  // Get/allocate SuperSlab
int  shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out);
void shared_pool_release_slab(SuperSlab* ss, int slab_idx);
```

### 4. Allocation Flow (NEW)

**Old Flow** (Phase 11):

```
1. TLS cache miss for class C
2. Check g_superslab_heads[C].current_chunk
3. If no space → allocate NEW SuperSlab for class C
4. All 32 slabs in new SuperSlab belong to class C
```

**New Flow** (Phase 12):

```
1. TLS cache miss for class C
2. Check g_shared_pool.class_hints[C]
3. If hint has free slab → assign that slab to class C (set class_idx=C)
4. If no hint:
   a. Scan g_shared_pool.slabs[] for any SuperSlab with a free slab
   b. If found → assign slab to class C
   c. If not found → allocate NEW SuperSlab (add to pool)
5. Update class_hints[C] for fast path
```

**Key Benefit**: A NEW SuperSlab is only allocated when ALL existing SuperSlabs are full!
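The new flow above can be illustrated with a compact, single-threaded sketch. Everything here (`MiniSuperSlab`, `MiniPool`, `claim_slab`, `pool_acquire_slab`, the fixed 64-entry array) is a simplified stand-in for the real `SuperSlab`/`SharedSuperSlabPool` types, with `alloc_lock`, TLS caching, and the LRU list omitted:

```c
#include <stdint.h>
#include <stdlib.h>

#define SLABS_PER_SS     32
#define NUM_CLASSES      8
#define CLASS_UNASSIGNED 255

/* Simplified SuperSlab: bitmap of assigned slabs + per-slab class. */
typedef struct MiniSuperSlab {
    uint32_t slab_bitmap;                  /* bit i set = slab i assigned */
    uint8_t  class_idx[SLABS_PER_SS];      /* 0-7 assigned, 255 = free */
} MiniSuperSlab;

/* Simplified shared pool: fixed capacity is enough for the sketch. */
typedef struct MiniPool {
    MiniSuperSlab* slabs[64];
    uint32_t       total_count;
    MiniSuperSlab* class_hints[NUM_CLASSES];
} MiniPool;

static MiniPool g_pool;

/* Claim one free slab in ss for class c; returns slab index or -1 if full. */
static int claim_slab(MiniSuperSlab* ss, int c) {
    for (int i = 0; i < SLABS_PER_SS; i++) {
        if (!(ss->slab_bitmap & (1u << i))) {
            ss->slab_bitmap |= (1u << i);
            ss->class_idx[i] = (uint8_t)c;   /* dynamic class assignment */
            return i;
        }
    }
    return -1;
}

/* Steps 2-5 of the new flow: hint -> pool scan -> new SuperSlab. */
static int pool_acquire_slab(int c, MiniSuperSlab** ss_out) {
    int idx;
    MiniSuperSlab* hint = g_pool.class_hints[c];
    if (hint && (idx = claim_slab(hint, c)) >= 0) { *ss_out = hint; return idx; }

    for (uint32_t s = 0; s < g_pool.total_count; s++) {
        MiniSuperSlab* ss = g_pool.slabs[s];
        if ((idx = claim_slab(ss, c)) >= 0) {
            g_pool.class_hints[c] = ss;      /* remember for the fast path */
            *ss_out = ss;
            return idx;
        }
    }

    /* All existing SuperSlabs full: only now allocate a new one. */
    if (g_pool.total_count >= 64) return -1;
    MiniSuperSlab* ss = calloc(1, sizeof(*ss));
    if (!ss) return -1;
    for (int i = 0; i < SLABS_PER_SS; i++) ss->class_idx[i] = CLASS_UNASSIGNED;
    g_pool.slabs[g_pool.total_count++] = ss;
    g_pool.class_hints[c] = ss;
    *ss_out = ss;
    return claim_slab(ss, c);
}
```

In steady state the hint hit at the top makes the common path scan-free; in the real design only the pool scan and the new-SuperSlab path would take `alloc_lock`.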
---

## 🔧 Implementation Plan

### Phase 12-1: Dynamic Slab Metadata ✅ (Current Task)

**Files to modify**:
- `core/superslab/superslab_types.h` - Add `class_idx` to TinySlabMeta
- `core/superslab/superslab_types.h` - Remove `size_class` from SuperSlab

**Changes**:

```c
// TinySlabMeta: Add class_idx field
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;      // NEW: 0-7 for active, 255=UNASSIGNED
    uint8_t  owner_tid_low;  // Changed from uint16_t
} TinySlabMeta;

// SuperSlab: Remove size_class
typedef struct SuperSlab {
    uint64_t magic;
    // uint8_t size_class;   // REMOVED!
    uint8_t active_slabs;
    uint8_t lg_size;
    uint8_t _pad0;
    // ... rest unchanged
} SuperSlab;
```

**Compatibility shim** (temporary, for gradual migration):

```c
// Provide a backward-compatible size_class accessor
static inline int superslab_get_class(SuperSlab* ss, int slab_idx) {
    return ss->slabs[slab_idx].class_idx;
}
```

### Phase 12-2: Shared Pool Infrastructure

**New files**: `core/hakmem_shared_pool.h`, `core/hakmem_shared_pool.c`

**Functionality**:
- `shared_pool_init()` - Initialize global pool
- `shared_pool_acquire_slab()` - Get free slab for class_idx
- `shared_pool_release_slab()` - Mark slab as free (class_idx=255)
- `shared_pool_gc()` - Garbage collect empty SuperSlabs

**Data structure**:

```c
// Global pool (singleton)
SharedSuperSlabPool g_shared_pool = {
    .slabs = NULL,
    .capacity = 0,
    .total_count = 0,
    .active_count = 0,
    .alloc_lock = PTHREAD_MUTEX_INITIALIZER,
    .class_hints = {NULL},
    .lru_head = NULL,
    .lru_tail = NULL,
    .lru_count = 0
};
```

### Phase 12-3: Refill Path Integration

**Files to modify**:
- `core/hakmem_tiny_refill_p0.inc.h` - Update to use shared pool
- `core/tiny_superslab_alloc.inc.h` - Replace per-class allocation with shared pool

**Key changes**:

```c
// OLD: superslab_refill(int class_idx)
static SuperSlab* superslab_refill_old(int class_idx) {
    SuperSlabHead* head = &g_superslab_heads[class_idx];
    // ... allocate SuperSlab for class_idx only
}

// NEW: superslab_refill(int class_idx) - use shared pool
static SuperSlab* superslab_refill_new(int class_idx) {
    SuperSlab* ss = NULL;
    int slab_idx = -1;

    // Try to acquire a free slab from the shared pool
    if (shared_pool_acquire_slab(class_idx, &ss, &slab_idx) == 0) {
        // SUCCESS: Got a slab assigned to class_idx
        return ss;
    }

    // FAILURE: All SuperSlabs full, need to allocate a new one
    // (This should be RARE after the pool grows to steady state)
    return NULL;
}
```

### Phase 12-4: Free Path Integration

**Files to modify**:
- `core/tiny_free_fast.inc.h` - Update to handle dynamic class_idx
- `core/tiny_superslab_free.inc.h` - Update to release slabs back to the pool

**Key changes**:

```c
// OLD: Free assumes the slab belongs to ss->size_class
static inline void hak_tiny_free_superslab_old(void* ptr, SuperSlab* ss) {
    int class_idx = ss->size_class;  // FIXED class
    // ... free logic
}

// NEW: Free reads class_idx from slab metadata
static inline void hak_tiny_free_superslab_new(void* ptr, SuperSlab* ss, int slab_idx) {
    int class_idx = ss->slabs[slab_idx].class_idx;  // DYNAMIC class
    // ... free logic

    // If the slab becomes empty, release it back to the pool
    if (ss->slabs[slab_idx].used == 0) {
        shared_pool_release_slab(ss, slab_idx);
        ss->slabs[slab_idx].class_idx = 255;  // Mark as unassigned
    }
}
```

### Phase 12-5: Testing & Benchmarking

**Validation**:
1. **Correctness**: Run bench_fixed_size_hakmem, 100K iterations (all classes)
2. **SuperSlab count**: Monitor g_shared_pool.total_count (expect 100-200)
3. **Performance**: bench_random_mixed_hakmem (expect 70-90M ops/s)

**Expected results**:

| Metric | Phase 11 (Before) | Phase 12 (After) | Improvement |
|--------|-------------------|------------------|-------------|
| SuperSlab count | 877 | 100-200 | -70-80% |
| Memory usage | 877MB | 100-200MB | -70-80% |
| Metadata overhead | ~1.8MB | ~0.2-0.4MB | -78-89% |
| Performance | 9.38M ops/s | 70-90M ops/s | +650-860% |

---

## ⚠️ Risk Analysis

### Complexity Risks

1. **Concurrency**: Shared pool requires careful locking
   - **Mitigation**: Per-class hints reduce contention (lock-free fast path)
2. **Fragmentation**: Mixed classes in the same SuperSlab may increase fragmentation
   - **Mitigation**: Smart slab assignment (prefer same-class SuperSlabs)
3. **Debugging**: Dynamic class_idx makes debugging harder
   - **Mitigation**: Add runtime validation (class_idx sanity checks)

### Performance Risks

1. **Lock contention**: The shared pool lock may become a bottleneck
   - **Mitigation**: Per-class hints + fast path bypass the lock 90%+ of the time
2. **Cache misses**: Accessing distant SuperSlabs may reduce locality
   - **Mitigation**: LRU cache keeps hot SuperSlabs resident

---

## 📊 Success Metrics

### Primary Goals
1. **SuperSlab count**: 877 → 100-200 (-70-80%) ✅
2. **Performance**: 9.38M → 70-90M ops/s (+650-860%) ✅
3. **Memory usage**: 877MB → 100-200MB (-70-80%) ✅

### Stretch Goals
1. **System malloc parity**: 90M ops/s (100% of target) 🎯
2. **Scalability**: Maintain performance with 4T+ threads
3. **Fragmentation**: <10% internal fragmentation

---

## 🔄 Migration Strategy

### Phase 12-1: Metadata (Low Risk)
- Add `class_idx` to TinySlabMeta (16B size preserved)
- Remove `size_class` from SuperSlab
- Add backward-compatible shim

### Phase 12-2: Infrastructure (Medium Risk)
- Implement shared pool (NEW code, isolated)
- No changes to existing paths yet

### Phase 12-3: Integration (High Risk)
- Update refill path to use the shared pool
- Update free path to handle dynamic class_idx
- **Critical**: Extensive testing required

### Phase 12-4: Cleanup (Low Risk)
- Remove per-class SuperSlabHead structures
- Remove backward-compatible shims
- Final optimization pass

---

## 📝 Next Steps

### Immediate (Phase 12-1)
1. ✅ Update `superslab_types.h` - Add `class_idx` to TinySlabMeta
2. ✅ Update `superslab_types.h` - Remove `size_class` from SuperSlab
3. Add backward-compatible shim `superslab_get_class()`
4. Fix compilation errors (grep for `ss->size_class`)

### Next (Phase 12-2)
1. Implement `hakmem_shared_pool.h/c`
2. Write unit tests for the shared pool
3. Integrate with LRU cache (Phase 9)

### Then (Phase 12-3+)
1. Update refill path
2. Update free path
3. Benchmark & validate
4. Clean up & optimize

---

**Status**: 🚧 Phase 12-1 (Metadata) - IN PROGRESS
**Expected completion**: Phase 12-1 today, Phase 12-2 tomorrow, Phase 12-3 the day after
**Total estimated time**: 3-4 days for full implementation
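As a starting point for the Phase 12-2 unit tests, the release path (Phase 12-4) and the empty-SuperSlab check behind `shared_pool_gc()` can be exercised in isolation. `MiniSuperSlab`, `mini_release_slab`, and `mini_superslab_is_empty` are simplified illustrative stand-ins, not the real API:

```c
#include <stdbool.h>
#include <stdint.h>

#define CLASS_UNASSIGNED 255  /* matches the 255=unassigned convention above */

/* Simplified SuperSlab: bitmap of assigned slabs + per-slab class. */
typedef struct MiniSuperSlab {
    uint32_t slab_bitmap;     /* bit i set = slab i assigned to some class */
    uint8_t  class_idx[32];   /* 0-7 when assigned, 255 when free */
} MiniSuperSlab;

/* Phase 12-4: when a slab's used count reaches 0, return it to the pool. */
static void mini_release_slab(MiniSuperSlab* ss, int slab_idx) {
    ss->class_idx[slab_idx] = CLASS_UNASSIGNED;  /* class is dynamic again */
    ss->slab_bitmap &= ~(1u << slab_idx);        /* clear the active bit */
}

/* shared_pool_gc() candidate check: a SuperSlab with no assigned slabs
 * can be reclaimed (or parked on the Phase 9 LRU list). */
static bool mini_superslab_is_empty(const MiniSuperSlab* ss) {
    return ss->slab_bitmap == 0;
}
```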