# Phase 12: Shared SuperSlab Pool - Design Document
**Date**: 2025-11-13
**Goal**: System malloc parity (90M ops/s) via mimalloc-style shared SuperSlab architecture
**Expected Impact**: SuperSlab count 877 → 100-200 (-77-89%), +650-860% performance

---
## 🎯 Problem Statement
### Root Cause: Fixed Size Class Architecture
**Current Design** (Phase 11):
```c
// SuperSlab is bound to ONE size class
struct SuperSlab {
    uint8_t size_class;  // FIXED at allocation time (0-7)
    // ... 32 slabs, all for the SAME class
};

// 8 independent SuperSlabHead structures (one per class)
SuperSlabHead g_superslab_heads[8];  // Each class manages its own pool
```
**Problem**:
- Benchmark (100K iterations, 256B): **877 SuperSlabs allocated**
- Memory usage: 877MB (877 × 1MB SuperSlabs)
- Metadata overhead: 877 × ~2KB headers = ~1.8MB
- **Each size class independently allocates SuperSlabs** → massive churn
**Why 877?**:
```
Class 0 (8B): ~100 SuperSlabs
Class 1 (16B): ~120 SuperSlabs
Class 2 (32B): ~150 SuperSlabs
Class 3 (64B): ~180 SuperSlabs
Class 4 (128B): ~140 SuperSlabs
Class 5 (256B): ~187 SuperSlabs ← Target class for benchmark
Class 6 (512B): ~80 SuperSlabs
Class 7 (1KB): ~20 SuperSlabs
Total: 877 SuperSlabs (measured; the rounded per-class estimates above sum to ~977)
```
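The footprint figures above follow from simple arithmetic. The sketch below reproduces them, assuming 1MB per SuperSlab and roughly 2KB of header per SuperSlab as stated in the text. The helper names (`footprint_mb`, `header_overhead_kb`, `percent_change`) are hypothetical and not part of hakmem.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants, taken from the figures quoted in this document. */
enum {
    SUPERSLAB_SIZE_KB   = 1024, /* 1MB per SuperSlab */
    SUPERSLAB_HEADER_KB = 2     /* ~2KB header per SuperSlab (approximate) */
};

/* Total mapped memory in MB for n SuperSlabs. */
static uint32_t footprint_mb(uint32_t n) {
    return n * SUPERSLAB_SIZE_KB / 1024;
}

/* Total header overhead in KB for n SuperSlabs. */
static uint32_t header_overhead_kb(uint32_t n) {
    return n * SUPERSLAB_HEADER_KB;
}

/* Relative change from `before` to `after` in whole percent,
 * truncated toward zero (C integer division). */
static int percent_change(int before, int after) {
    assert(before != 0);
    return (after - before) * 100 / before;
}

/* footprint_mb(877)        -> 877  (MB)
 * header_overhead_kb(877)  -> 1754 (KB, which the text rounds to ~1.8MB)
 * percent_change(877, 200) -> -77
 * percent_change(877, 100) -> -88  (-88.6 before truncation) */
```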
**Performance Impact**:
- Massive metadata traversal overhead
- Poor cache locality (877 scattered 1MB regions)
- Excessive TLB pressure
- SuperSlab allocation churn dominates runtime
---
## 🚀 Solution: Shared SuperSlab Pool (mimalloc-style)
### Core Concept
**New Design** (Phase 12):
```c
// SuperSlab is NOT bound to any class - slabs are dynamically assigned
struct SuperSlab {
    // NO size_class field! Each slab has its own class_idx
    uint8_t  active_slabs;  // Number of active slabs (any class)
    uint32_t slab_bitmap;   // 32-bit bitmap (1=active, 0=free)
    // ... 32 slabs, EACH can be a different size class
};

// Single global pool (shared by all classes)
typedef struct SharedSuperSlabPool {
    SuperSlab** slabs;         // Array of all SuperSlabs
    uint32_t total_count;      // Total SuperSlabs allocated
    uint32_t active_count;     // SuperSlabs with active slabs
    pthread_mutex_t lock;      // Allocation lock

    // Per-class hints (fast path optimization)
    SuperSlab* class_hints[8]; // Last known SuperSlab with free space per class
} SharedSuperSlabPool;
```
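The `slab_bitmap` field is what makes "any free slab" cheap to locate. As a minimal sketch, assuming the 1=active / 0=free convention from the struct above, the bitmap helpers could look like this. The function names are hypothetical, and a portable loop is used rather than a compiler builtin.

```c
#include <assert.h>
#include <stdint.h>

#define SLABS_PER_SUPERSLAB 32 /* from the design: 32 slabs per SuperSlab */

/* Return the index of the lowest free slab (bit == 0),
 * or -1 if all 32 slabs are active. */
static int slab_bitmap_find_free(uint32_t bitmap) {
    for (int i = 0; i < SLABS_PER_SUPERSLAB; i++) {
        if ((bitmap & (UINT32_C(1) << i)) == 0) {
            return i;
        }
    }
    return -1;
}

/* Return the bitmap with slab idx marked active. */
static uint32_t slab_bitmap_set(uint32_t bitmap, int idx) {
    assert(idx >= 0 && idx < SLABS_PER_SUPERSLAB);
    return bitmap | (UINT32_C(1) << idx);
}

/* Return the bitmap with slab idx marked free. */
static uint32_t slab_bitmap_clear(uint32_t bitmap, int idx) {
    assert(idx >= 0 && idx < SLABS_PER_SUPERSLAB);
    return bitmap & ~(UINT32_C(1) << idx);
}
```

A full bitmap (`0xFFFFFFFF`) is the signal to move on to the next SuperSlab in the pool.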
### Per-Slab Dynamic Class Assignment
**Old** (TinySlabMeta):
```c
// Slab metadata (16 bytes) - class_idx inherited from SuperSlab
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint16_t owner_tid;
} TinySlabMeta;
```
**New** (Phase 12):
```c
// Slab metadata (16 bytes) - class_idx is PER-SLAB
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;      // NEW: Dynamic class assignment (0-7, 255=unassigned)
    uint8_t  owner_tid_low;  // Truncated to 8-bit (from 16-bit)
} TinySlabMeta;
```
**Size preserved**: Still 16 bytes on 64-bit targets, where `void*` is 8 bytes (no growth!)

---
## 📐 Architecture Changes
### 1. SuperSlab Structure (superslab_types.h)
**Remove**:
```c
uint8_t size_class; // DELETE - no longer per-SuperSlab
```
**Add** (optional, for debugging):
```c
uint8_t mixed_slab_count; // Number of slabs with different class_idx (stats)
```
### 2. TinySlabMeta Structure (superslab_types.h)
**Modify**:
```c
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;      // NEW: 0-7 for active, 255=unassigned
    uint8_t  owner_tid_low;  // Changed from uint16_t owner_tid
} TinySlabMeta;
```
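The layout claim can be pinned down at compile time so that a later field change cannot silently grow the struct. The sketch below assumes an LP64 target (8-byte pointers). `TINY_CLASS_UNASSIGNED` and `owner_tid_truncate` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_CLASS_UNASSIGNED 255 /* hypothetical name for the 255 sentinel */

typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;     /* 0-7, or TINY_CLASS_UNASSIGNED */
    uint8_t  owner_tid_low; /* low 8 bits of the owner thread id */
} TinySlabMeta;

/* Assumption: 8-byte pointers. A 32-bit build would give 12 bytes
 * and fail this check. */
static_assert(sizeof(TinySlabMeta) == 16, "TinySlabMeta must stay 16 bytes");
static_assert(offsetof(TinySlabMeta, class_idx) == 14,
              "class_idx reuses the first byte of the old owner_tid slot");

/* Keep only the low 8 bits of a thread id. Thread ids that differ only
 * in higher bits become indistinguishable after truncation. */
static uint8_t owner_tid_truncate(uint32_t tid) {
    return (uint8_t)(tid & 0xFFu);
}
```

The truncation is the price of the new field: ids such as `0x001` and `0x101` both map to `1`, so any ownership check based on `owner_tid_low` alone has to tolerate false matches.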
### 3. Shared Pool Structure (NEW: hakmem_shared_pool.h)
```c
// Global shared pool (singleton)
typedef struct SharedSuperSlabPool {
    SuperSlab** slabs;           // Dynamic array of SuperSlab pointers
    uint32_t capacity;           // Array capacity (grows as needed)
    uint32_t total_count;        // Total SuperSlabs allocated
    uint32_t active_count;       // SuperSlabs with >0 active slabs
    pthread_mutex_t alloc_lock;  // Lock for slab allocation

    // Per-class hints (lock-free read, updated under lock)
    SuperSlab* class_hints[TINY_NUM_CLASSES];

    // LRU cache integration (Phase 9)
    SuperSlab* lru_head;
    SuperSlab* lru_tail;
    uint32_t lru_count;
} SharedSuperSlabPool;

// Global singleton
extern SharedSuperSlabPool g_shared_pool;

// API
void shared_pool_init(void);
SuperSlab* shared_pool_acquire_superslab(void);  // Get/allocate SuperSlab
int shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out);
void shared_pool_release_slab(SuperSlab* ss, int slab_idx);
```
### 4. Allocation Flow (NEW)
**Old Flow** (Phase 11):
```
1. TLS cache miss for class C
2. Check g_superslab_heads[C].current_chunk
3. If no space → allocate NEW SuperSlab for class C
4. All 32 slabs in new SuperSlab belong to class C
```
**New Flow** (Phase 12):
```
1. TLS cache miss for class C
2. Check g_shared_pool.class_hints[C]
3. If hint has free slab → assign that slab to class C (set class_idx=C)
4. If no hint:
a. Scan g_shared_pool.slabs[] for any SuperSlab with free slab
b. If found → assign slab to class C
c. If not found → allocate NEW SuperSlab (add to pool)
5. Update class_hints[C] for fast path
```
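The Phase 12 flow above can be sketched as a single function. This is a simplified, single-threaded illustration rather than the actual `shared_pool_acquire_slab()`: the types are trimmed, the pool array has a fixed cap (`POOL_MAX`, an assumption), `calloc` stands in for the real SuperSlab allocation, and locking is omitted (a real version would hold `alloc_lock` for steps 2-5). All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_CLASSES         8
#define SLABS_PER_SUPERSLAB 32
#define CLASS_UNASSIGNED    255
#define POOL_MAX            64 /* fixed cap to keep the sketch short */

typedef struct { uint8_t class_idx; } SlabMeta; /* trimmed TinySlabMeta */
typedef struct {
    uint32_t slab_bitmap; /* 1=active, 0=free */
    SlabMeta slabs[SLABS_PER_SUPERSLAB];
} SuperSlab;
typedef struct {
    SuperSlab* slabs[POOL_MAX];
    uint32_t   total_count;
    SuperSlab* class_hints[NUM_CLASSES];
} Pool;

/* Claim the lowest free slab of ss for class_idx.
 * Returns the slab index, or -1 if the SuperSlab is full. */
static int claim_free_slab(SuperSlab* ss, int class_idx) {
    for (int i = 0; i < SLABS_PER_SUPERSLAB; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (!(ss->slab_bitmap & bit)) {
            ss->slab_bitmap |= bit;
            ss->slabs[i].class_idx = (uint8_t)class_idx;
            return i;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if no slab could be obtained. */
static int pool_acquire_slab(Pool* p, int class_idx,
                             SuperSlab** ss_out, int* idx_out) {
    assert(class_idx >= 0 && class_idx < NUM_CLASSES);
    SuperSlab* ss = p->class_hints[class_idx];           /* step 2 */
    int idx = ss ? claim_free_slab(ss, class_idx) : -1;  /* step 3 */
    for (uint32_t i = 0; idx < 0 && i < p->total_count; i++) {
        ss = p->slabs[i];                                /* step 4a */
        idx = claim_free_slab(ss, class_idx);            /* step 4b */
    }
    if (idx < 0) {                                       /* step 4c */
        if (p->total_count == POOL_MAX) return -1;
        ss = calloc(1, sizeof *ss);
        if (!ss) return -1;
        for (int i = 0; i < SLABS_PER_SUPERSLAB; i++)
            ss->slabs[i].class_idx = CLASS_UNASSIGNED;
        p->slabs[p->total_count++] = ss;
        idx = claim_free_slab(ss, class_idx);
    }
    p->class_hints[class_idx] = ss;                      /* step 5 */
    *ss_out = ss;
    *idx_out = idx;
    return 0;
}
```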
**Key Benefit**: A NEW SuperSlab is allocated only when ALL existing SuperSlabs are full!

---
## 🔧 Implementation Plan
### Phase 12-1: Dynamic Slab Metadata ✅ (Current Task)
**Files to modify**:
- `core/superslab/superslab_types.h` - Add `class_idx` to TinySlabMeta
- `core/superslab/superslab_types.h` - Remove `size_class` from SuperSlab
**Changes**:
```c
// TinySlabMeta: Add class_idx field
typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;
    uint16_t capacity;
    uint16_t carved;
    uint8_t  class_idx;      // NEW: 0-7 for active, 255=UNASSIGNED
    uint8_t  owner_tid_low;  // Changed from uint16_t
} TinySlabMeta;

// SuperSlab: Remove size_class
typedef struct SuperSlab {
    uint64_t magic;
    // uint8_t size_class;  // REMOVED!
    uint8_t  active_slabs;
    uint8_t  lg_size;
    uint8_t  _pad0;
    // ... rest unchanged
} SuperSlab;
```
**Compatibility shim** (temporary, for gradual migration):
```c
// Provide backward-compatible size_class accessor
static inline int superslab_get_class(SuperSlab* ss, int slab_idx) {
    return ss->slabs[slab_idx].class_idx;
}
```
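Because the shim now reads a per-slab field that can hold the 255 sentinel, a checked variant may be useful during migration. The following is a sketch under assumptions: the types are trimmed to the fields used, and `superslab_get_class_checked` is a hypothetical name, not an existing hakmem function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_NUM_CLASSES    8
#define SLABS_PER_SUPERSLAB 32
#define CLASS_UNASSIGNED    255

typedef struct { uint8_t class_idx; } TinySlabMeta;          /* trimmed */
typedef struct { TinySlabMeta slabs[SLABS_PER_SUPERSLAB]; } SuperSlab;

/* Return the slab's class (0-7), or -1 if ss is NULL, slab_idx is out
 * of range, or the slab is unassigned. In debug builds, a class_idx
 * that is neither a valid class nor the sentinel trips the assert;
 * with NDEBUG it is reported as -1 as well. */
static int superslab_get_class_checked(const SuperSlab* ss, int slab_idx) {
    if (ss == NULL || slab_idx < 0 || slab_idx >= SLABS_PER_SUPERSLAB) {
        return -1;
    }
    uint8_t c = ss->slabs[slab_idx].class_idx;
    assert(c < TINY_NUM_CLASSES || c == CLASS_UNASSIGNED);
    return (c < TINY_NUM_CLASSES) ? (int)c : -1;
}
```

Callers that previously read `ss->size_class` unconditionally would need to handle the `-1` case, which did not exist when the class was fixed per SuperSlab.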
### Phase 12-2: Shared Pool Infrastructure
**New files**: `core/hakmem_shared_pool.h`, `core/hakmem_shared_pool.c`
**Functionality**:
- `shared_pool_init()` - Initialize global pool
- `shared_pool_acquire_slab()` - Get free slab for class_idx
- `shared_pool_release_slab()` - Mark slab as free (class_idx=255)
- `shared_pool_gc()` - Garbage collect empty SuperSlabs
**Data structure**:
```c
// Global pool (singleton)
SharedSuperSlabPool g_shared_pool = {
    .slabs = NULL,
    .capacity = 0,
    .total_count = 0,
    .active_count = 0,
    .alloc_lock = PTHREAD_MUTEX_INITIALIZER,
    .class_hints = {NULL},
    .lru_head = NULL,
    .lru_tail = NULL,
    .lru_count = 0
};
```
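Since `slabs` starts as `NULL` with `capacity = 0`, the first registration has to grow the array. A minimal sketch of that growth step follows, assuming a doubling policy and an initial capacity of 16 (both assumptions, since the design only says the array "grows as needed"). `PoolArray` and `pool_array_push` are hypothetical names, and the caller is assumed to hold `alloc_lock`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SuperSlab { uint32_t slab_bitmap; } SuperSlab; /* trimmed */

typedef struct {
    SuperSlab** slabs;       /* dynamic array of SuperSlab pointers */
    uint32_t    capacity;    /* allocated entries */
    uint32_t    total_count; /* used entries */
} PoolArray;

/* Append ss to the pool array, doubling the capacity when it is full.
 * Returns 0 on success, or -1 on allocation failure, in which case the
 * pool is left unchanged. */
static int pool_array_push(PoolArray* p, SuperSlab* ss) {
    assert(p != NULL && ss != NULL);
    if (p->total_count == p->capacity) {
        uint32_t new_cap = p->capacity ? p->capacity * 2 : 16;
        /* realloc(NULL, n) behaves like malloc(n), so the empty pool
         * needs no special case. */
        SuperSlab** grown = realloc(p->slabs, (size_t)new_cap * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        p->slabs = grown;
        p->capacity = new_cap;
    }
    p->slabs[p->total_count++] = ss;
    return 0;
}
```

One design consequence: `realloc` may move the array, so any reader that scans `slabs[]` must also hold the lock. Only the `class_hints[]` pointers, which refer to SuperSlabs rather than to array slots, remain valid across a resize.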
### Phase 12-3: Refill Path Integration
**Files to modify**:
- `core/hakmem_tiny_refill_p0.inc.h` - Update to use shared pool
- `core/tiny_superslab_alloc.inc.h` - Replace per-class allocation with shared pool
**Key changes**:
```c
// OLD: superslab_refill(int class_idx)
static SuperSlab* superslab_refill_old(int class_idx) {
    SuperSlabHead* head = &g_superslab_heads[class_idx];
    // ... allocate SuperSlab for class_idx only
}

// NEW: superslab_refill(int class_idx) - use shared pool
static SuperSlab* superslab_refill_new(int class_idx) {
    SuperSlab* ss = NULL;
    int slab_idx = -1;

    // Try to acquire a free slab from shared pool
    if (shared_pool_acquire_slab(class_idx, &ss, &slab_idx) == 0) {
        // SUCCESS: Got a slab assigned to class_idx
        return ss;
    }

    // FAILURE: All SuperSlabs full, need to allocate new one
    // (This should be RARE after pool grows to steady-state)
    return NULL;
}
```
### Phase 12-4: Free Path Integration
**Files to modify**:
- `core/tiny_free_fast.inc.h` - Update to handle dynamic class_idx
- `core/tiny_superslab_free.inc.h` - Update to release slabs back to pool
**Key changes**:
```c
// OLD: Free assumes slab belongs to ss->size_class
static inline void hak_tiny_free_superslab_old(void* ptr, SuperSlab* ss) {
    int class_idx = ss->size_class;  // FIXED class
    // ... free logic
}

// NEW: Free reads class_idx from slab metadata
static inline void hak_tiny_free_superslab_new(void* ptr, SuperSlab* ss, int slab_idx) {
    int class_idx = ss->slabs[slab_idx].class_idx;  // DYNAMIC class
    // ... free logic

    // If slab becomes empty, release back to pool.
    // shared_pool_release_slab() marks the slab unassigned (class_idx=255).
    // Do not write class_idx after the release: another thread may already
    // have re-acquired the slab and assigned it a new class.
    if (ss->slabs[slab_idx].used == 0) {
        shared_pool_release_slab(ss, slab_idx);
    }
}
```
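The release step itself can be sketched as follows. This is an illustration under assumptions, not the actual `shared_pool_release_slab()`: the types are trimmed, `release_slab` is a hypothetical name, and the caller is assumed to hold the pool lock. The point it shows is ordering: `class_idx` is reset before the bitmap bit is cleared, so the slab is never visible as free while still carrying a stale class.

```c
#include <assert.h>
#include <stdint.h>

#define SLABS_PER_SUPERSLAB 32
#define CLASS_UNASSIGNED    255

typedef struct {
    uint16_t used;
    uint8_t  class_idx;
} TinySlabMeta; /* trimmed to the fields used here */

typedef struct {
    uint8_t      active_slabs;
    uint32_t     slab_bitmap; /* 1=active, 0=free */
    TinySlabMeta slabs[SLABS_PER_SUPERSLAB];
} SuperSlab;

/* Return an empty slab to the free state.
 * Returns 1 if the whole SuperSlab is now empty (a candidate for the
 * LRU cache or for shared_pool_gc()), otherwise 0. */
static int release_slab(SuperSlab* ss, int slab_idx) {
    assert(slab_idx >= 0 && slab_idx < SLABS_PER_SUPERSLAB);
    uint32_t bit = UINT32_C(1) << slab_idx;
    assert(ss->slab_bitmap & bit);          /* must currently be active */
    assert(ss->slabs[slab_idx].used == 0);  /* must hold no live blocks */

    ss->slabs[slab_idx].class_idx = CLASS_UNASSIGNED; /* unassign first */
    ss->slab_bitmap &= ~bit;                          /* then publish as free */
    ss->active_slabs--;
    return ss->active_slabs == 0;
}
```

The return value gives the caller a cheap way to decide when a SuperSlab should move to the LRU list, without rescanning the bitmap.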
### Phase 12-5: Testing & Benchmarking
**Validation**:
1. **Correctness**: Run bench_fixed_size_hakmem 100K iterations (all classes)
2. **SuperSlab count**: Monitor g_shared_pool.total_count (expect 100-200)
3. **Performance**: bench_random_mixed_hakmem (expect 70-90M ops/s)
**Expected results**:
| Metric | Phase 11 (Before) | Phase 12 (After) | Improvement |
|--------|-------------------|------------------|-------------|
| SuperSlab count | 877 | 100-200 | -77-89% |
| Memory usage | 877MB | 100-200MB | -77-89% |
| Metadata overhead | ~1.8MB | ~0.2-0.4MB | -78-89% |
| Performance | 9.38M ops/s | 70-90M ops/s | +650-860% |
---
## ⚠️ Risk Analysis
### Complexity Risks
1. **Concurrency**: Shared pool requires careful locking
- **Mitigation**: Per-class hints reduce contention (lock-free fast path)
2. **Fragmentation**: Mixed classes in same SuperSlab may increase fragmentation
- **Mitigation**: Smart slab assignment (prefer same-class SuperSlabs)
3. **Debugging**: Dynamic class_idx makes debugging harder
- **Mitigation**: Add runtime validation (class_idx sanity checks)
### Performance Risks
1. **Lock contention**: Shared pool lock may become bottleneck
- **Mitigation**: Per-class hints + fast path are expected to bypass the lock in 90%+ of allocations
2. **Cache misses**: Accessing distant SuperSlabs may reduce locality
- **Mitigation**: LRU cache keeps hot SuperSlabs resident
---
## 📊 Success Metrics
### Primary Goals
1. **SuperSlab count**: 877 → 100-200 (-77-89%) ✅
2. **Performance**: 9.38M → 70-90M ops/s (+650-860%) ✅
3. **Memory usage**: 877MB → 100-200MB (-77-89%) ✅
### Stretch Goals
1. **System malloc parity**: 90M ops/s (100% of target) 🎯
2. **Scalability**: Maintain performance with 4+ threads
3. **Fragmentation**: <10% internal fragmentation
---
## 🔄 Migration Strategy
### Phase 12-1: Metadata (Low Risk)
- Add `class_idx` to TinySlabMeta (16B preserved)
- Remove `size_class` from SuperSlab
- Add backward-compatible shim
### Phase 12-2: Infrastructure (Medium Risk)
- Implement shared pool (NEW code, isolated)
- No changes to existing paths yet
### Phase 12-3/12-4: Integration (High Risk)
- Update refill path to use shared pool
- Update free path to handle dynamic class_idx
- **Critical**: Extensive testing required
### After Phase 12-5: Cleanup (Low Risk)
- Remove per-class SuperSlabHead structures
- Remove backward-compatible shims
- Final optimization pass
---
## 📝 Next Steps
### Immediate (Phase 12-1)
1. Update `superslab_types.h` - Add `class_idx` to TinySlabMeta
2. Update `superslab_types.h` - Remove `size_class` from SuperSlab
3. Add backward-compatible shim `superslab_get_class()`
4. Fix compilation errors (grep for `ss->size_class`)
### Next (Phase 12-2)
1. Implement `hakmem_shared_pool.h/c`
2. Write unit tests for shared pool
3. Integrate with LRU cache (Phase 9)
### Then (Phase 12-3+)
1. Update refill path
2. Update free path
3. Benchmark & validate
4. Cleanup & optimize
---
**Status**: 🚧 Phase 12-1 (Metadata) - IN PROGRESS
**Expected completion**: Phase 12-1 today, Phase 12-2 tomorrow, Phase 12-3 the day after
**Total estimated time**: 3-4 days for full implementation