# Bitmap vs Free List: Design Tradeoffs

**Date**: 2025-10-26
**Context**: Evaluating architectural choices for hakmem Tiny Pool optimization
**Purpose**: Understand tradeoffs before deciding whether to adopt mimalloc's free list approach

---

## Executive Summary

### The Core Question

**Should hakmem abandon bitmap allocation in favor of mimalloc's intrusive free list?**

**Answer**: It depends on **project goals**:
- **If goal = production speed**: Free list wins (roughly 3-4 ns faster per allocation, see Section 3.1)
- **If goal = research/diagnostics**: Bitmap wins (visibility, safety, flexibility)
- **If goal = both**: Hybrid approach possible (see Section 6)

---

## 1. Architecture Comparison

### Bitmap Approach (Current hakmem)

```c
// Metadata: Separate bitmap (1 bit per block, 1 = free)
typedef struct TinySlab {
    uint64_t bitmap[16];   // 16 x 64 bits = 1024 blocks
    uint8_t* base;         // Data region
    uint16_t free_count;   // O(1) empty check
    // ... block_size, diagnostics, ownership, stats ...
} TinySlab;

// Allocation: Find-first-set (caller has checked free_count > 0)
void* alloc_from_bitmap(TinySlab* s) {
    int word_idx = find_first_nonzero(s->bitmap);          // ~3 ns
    int bit_idx  = __builtin_ctzll(s->bitmap[word_idx]);   // ~1 ns
    s->bitmap[word_idx] &= ~(1ULL << bit_idx);             // ~1 ns
    s->free_count--;
    return s->base + (word_idx * 64 + bit_idx) * s->block_size;
}
// Cost: 5-6 ns (bitmap scan + bit extraction)
```

**Key Properties**:
- ✅ Metadata separate from data
- ✅ Random access to allocation state
- ✅ O(1) slab-level statistics (free_count, fixed-size bitmap scan)
- ⚠️ 5-6 ns overhead per allocation

### Free List Approach (mimalloc)

```c
// Metadata: Intrusive next-pointer in free blocks
typedef struct Block {
    struct Block* next;  // 8 bytes IN the data region
} Block;

typedef struct Page {
    Block* local_free;   // LIFO stack head
    // ... minimal metadata ...
} Page;

// Allocation: Pop from LIFO
void* alloc_from_freelist(Page* p) {
    Block* b = p->local_free;  // ~0.5 ns (L1 hit)
    p->local_free = b->next;   // ~0.5 ns (L1 hit)
    return b;
}
// Cost: 1-2 ns (two pointer operations)
```

**Key Properties**:
- ✅ Zero metadata overhead (uses free blocks themselves)
- ✅ Minimal CPU overhead (1-2 pointer ops)
- ⚠️ Intrusive (overwrites first 8 bytes of free blocks)
- ⚠️ No random access (must traverse list)

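
The pop shown above presupposes that every free block has already been threaded onto the list. As a minimal sketch (not mimalloc's actual code), the following shows one way the intrusive list could be built and used. The names `page_init`, `page_alloc`, and `page_free` are hypothetical, and the sketch assumes `block_size` is at least `sizeof(Block)` and keeps blocks pointer-aligned. A caller would hand `page_init` a raw region once, then alternate `page_alloc` and `page_free`; the most recently freed block is always returned first.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Block { struct Block* next; } Block;
typedef struct Page  { Block* local_free; } Page;

/* Thread every block of a fresh region onto the free list.
 * Pushing from the last block to the first leaves block 0 at the head. */
void page_init(Page* p, void* region, size_t block_size, size_t nblocks) {
    uint8_t* base = (uint8_t*)region;
    p->local_free = NULL;
    for (size_t i = nblocks; i-- > 0; ) {
        Block* b = (Block*)(base + i * block_size);
        b->next = p->local_free;
        p->local_free = b;
    }
}

/* Pop the head of the list; returns NULL when the page is exhausted. */
void* page_alloc(Page* p) {
    Block* b = p->local_free;
    if (!b) return NULL;
    p->local_free = b->next;
    return b;
}

/* Push a block back; this overwrites the first bytes of the freed block. */
void page_free(Page* p, void* ptr) {
    Block* b = (Block*)ptr;
    b->next = p->local_free;
    p->local_free = b;
}
```
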
---

## 2. Bitmap Advantages

### 2.1 Observability and Diagnostics

**Bitmap**: Complete allocation state visible at a glance
```c
// Print slab state (fixed 1024-bit scan, no pointer chasing)
void print_slab_state(TinySlab* s) {
    printf("Slab free pattern: ");
    for (int i = 0; i < 1024; i++) {
        printf("%c", is_free(s->bitmap, i) ? '.' : 'X');
    }
    // Output: "X...XX.X.XX....." (visual fragmentation pattern)
}
```

**Free List**: Must traverse entire list
```c
// Print page state (O(n) traversal)
void print_page_state(Page* p) {
    int count = 0;
    Block* b = p->local_free;
    while (b) { count++; b = b->next; }
    printf("Free blocks: %d (locations unknown)\n", count);
    // Output: "Free blocks: 42" (no spatial information)
}
```

**Impact**:
- ✅ **Bitmap**: Can detect fragmentation patterns, hot spots, allocation clustering
- ⚠️ **Free List**: Only knows count, not spatial distribution

### 2.2 Memory Safety and Debugging

**Bitmap**: Freed memory can be immediately zeroed
```c
void free_to_bitmap(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    s->free_count++;
    memset(ptr, 0, s->block_size);  // Safe: no metadata in block
}
// Use-after-free detection: accessing 0-filled memory likely crashes early
```

**Free List**: Next-pointer remains in freed memory
```c
void free_to_list(Page* p, void* ptr) {
    Block* b = (Block*)ptr;
    b->next = p->local_free;  // Writes to freed memory!
    p->local_free = b;
}
// Use-after-free: might corrupt next-pointer, causing subtle bugs later
```

**Impact**:
- ✅ **Bitmap**: Easier debugging (freed memory is clean)
- ✅ **Bitmap**: Better ASAN/Valgrind integration (can mark freed)
- ⚠️ **Free List**: Next-pointer corruption can cause cascading failures

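
The bitmap also makes invalid and double frees cheap to detect, because the allocation state of a block can be checked before it is changed. The snippets in this section call `block_index` without defining it; the sketch below is one plausible definition, not hakmem's actual helper. `checked_free`, its return codes, and the `SLAB_BLOCKS` constant are assumptions introduced for illustration, with bit value 1 meaning "free" as in the allocation code of Section 1.

```c
#include <stddef.h>
#include <stdint.h>

#define SLAB_BLOCKS 1024

typedef struct TinySlab {
    uint64_t bitmap[SLAB_BLOCKS / 64];  /* 1 = free, 0 = allocated */
    uint8_t* base;                      /* start of the data region */
    size_t   block_size;                /* bytes per block */
    uint16_t free_count;
} TinySlab;

/* Map a pointer to its block index.
 * Returns -1 if ptr lies outside the slab or is not block-aligned. */
int block_index(const TinySlab* s, const void* ptr) {
    uintptr_t p  = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)s->base;
    if (p < lo || p >= lo + (uintptr_t)SLAB_BLOCKS * s->block_size) return -1;
    size_t off = (size_t)(p - lo);
    if (off % s->block_size != 0) return -1;
    return (int)(off / s->block_size);
}

/* Free with validation: 0 = ok, -1 = invalid pointer, -2 = double free. */
int checked_free(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    if (idx < 0) return -1;
    uint64_t mask = 1ULL << (idx % 64);
    if (s->bitmap[idx / 64] & mask) return -2;  /* bit already set: block is free */
    s->bitmap[idx / 64] |= mask;
    s->free_count++;
    return 0;
}
```

An intrusive free list has no equivalent constant-time check: detecting a double free would require walking the list or keeping separate state.
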
### 2.3 Ownership Tracking and Validation

**Bitmap**: Can track per-block metadata
```c
typedef struct TinySlab {
    uint64_t bitmap[16];         // Allocation state
    uint8_t  owner[1024];        // Per-block owner thread ID
    uint32_t alloc_time[1024];   // Allocation timestamp
} TinySlab;

// Validate ownership on free
void free_with_validation(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    if (s->owner[idx] != current_thread()) {
        fprintf(stderr, "ERROR: Cross-thread free without handoff!\n");
        // Can detect bugs immediately
    }
}
```

**Free List**: No per-block metadata (intrusive design)
```c
// Cannot store per-block metadata without external hash table
// Owner validation requires separate data structure
```

**Impact**:
- ✅ **Bitmap**: Can implement rich diagnostics (owner, timestamp, call-site)
- ✅ **Bitmap**: Validates invariants at allocation/free time
- ⚠️ **Free List**: Requires external data structures for diagnostics

### 2.4 Statistics and Profiling

**Bitmap**: O(1) slab-level queries
```c
// All O(1) operations
uint16_t free_count = slab->free_count;
bool is_empty = (free_count == 1024);
bool is_full = (free_count == 0);
float utilization = 1.0 - (free_count / 1024.0);

// Fragmentation analysis (O(n) but rare)
int longest_run = find_longest_free_run(slab->bitmap);
```

**Free List**: Requires traversal
```c
// Count requires O(n) traversal
int free_count = 0;
for (Block* b = page->local_free; b; b = b->next) {
    free_count++;
}
// Cannot determine fragmentation without traversal
```

**Impact**:
- ✅ **Bitmap**: Fast statistics collection (research-friendly)
- ✅ **Bitmap**: Can analyze allocation patterns
- ⚠️ **Free List**: Statistics require expensive traversal or external counters

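
The fragmentation query `find_longest_free_run` is referenced above but not shown. A minimal sketch of how it might look follows, assuming the 16-word bitmap from Section 1 with bit value 1 meaning "free". `count_free` and the `SLAB_WORDS` and `SLAB_BLOCKS` constants are hypothetical additions, and `__builtin_popcountll` is a GCC/Clang builtin like the `__builtin_ctzll` already used in this document.

```c
#include <stdint.h>

#define SLAB_WORDS  16
#define SLAB_BLOCKS (SLAB_WORDS * 64)

/* Length of the longest run of consecutive free blocks (1 bits).
 * Runs may cross 64-bit word boundaries. */
int find_longest_free_run(const uint64_t bitmap[SLAB_WORDS]) {
    int best = 0, run = 0;
    for (int i = 0; i < SLAB_BLOCKS; i++) {
        int is_free = (int)((bitmap[i / 64] >> (i % 64)) & 1u);
        if (is_free) {
            run++;
            if (run > best) best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* Recount free blocks from the bitmap itself, e.g. to cross-check free_count. */
int count_free(const uint64_t bitmap[SLAB_WORDS]) {
    int n = 0;
    for (int w = 0; w < SLAB_WORDS; w++) {
        n += __builtin_popcountll(bitmap[w]);
    }
    return n;
}
```

Comparing `count_free(slab->bitmap)` against `slab->free_count` is also a cheap consistency check for debugging builds.
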
### 2.5 Concurrent Access Visibility

**Bitmap**: Can inspect remote thread state
```c
// Diagnostic thread can scan all slabs
void print_global_state(void) {
    for (int tid = 0; tid < MAX_THREADS; tid++) {
        for (int class = 0; class < 8; class++) {
            TinySlab* s = get_slab(tid, class);
            // Snapshot of free_count and bitmap (may be slightly stale)
            printf("Thread %d Class %d: %d/%d free\n",
                   tid, class, s->free_count, 1024);
        }
    }
}
```

**Free List**: Cannot safely inspect remote thread's local_free
```c
// Diagnostic thread CANNOT read local_free (race condition)
// Must use external atomic counters (defeats purpose)
```

**Impact**:
- ✅ **Bitmap**: Can build monitoring dashboards, live profilers
- ✅ **Bitmap**: Supports cross-thread adoption decisions (CDA)
- ⚠️ **Free List**: Opaque to external observers

### 2.6 Research and Experimentation

**Bitmap**: Easy to modify allocation policy
```c
// Experiment: Best-fit instead of first-fit
int find_best_fit_block(TinySlab* s, int requested_run) {
    // Scan bitmap for smallest run >= requested_run
    // Easy to implement alternative allocation strategies
    return -1;  // Placeholder: search not implemented in this outline
}

// Experiment: Locality-aware allocation
int find_nearest_free(TinySlab* s, void* previous_alloc) {
    int prev_idx = block_index(s, previous_alloc);
    // Search bitmap for nearby free blocks (cache locality)
    return -1;  // Placeholder: search not implemented in this outline
}
```

**Free List**: Policy locked to LIFO
```c
// Always LIFO (most recently freed = next allocated)
// Cannot experiment with other policies without major restructuring
```

**Impact**:
- ✅ **Bitmap**: Flexible research platform (try different allocation strategies)
- ✅ **Bitmap**: Can experiment with locality, fragmentation reduction
- ⚠️ **Free List**: Fixed policy (LIFO only)

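
To make the locality-aware experiment concrete, here is a minimal sketch of the search it describes: start at the index of the previous allocation and widen the search one block at a time in both directions. The name `find_nearest_free_idx`, the helper `bit_is_free`, and the tie-breaking rule (the lower index wins) are assumptions for illustration, again with bit value 1 meaning "free". It takes a block index rather than a pointer, so a caller would pass the result of `block_index` for the previous allocation.

```c
#include <stdint.h>

#define SLAB_BLOCKS 1024

static int bit_is_free(const uint64_t* bitmap, int idx) {
    return (int)((bitmap[idx / 64] >> (idx % 64)) & 1u);
}

/* Return the index of the free block closest to prev_idx,
 * or -1 if the slab has no free block. On a tie the lower index wins. */
int find_nearest_free_idx(const uint64_t* bitmap, int prev_idx) {
    for (int d = 0; d < SLAB_BLOCKS; d++) {
        int lo = prev_idx - d;
        int hi = prev_idx + d;
        if (lo >= 0 && bit_is_free(bitmap, lo)) return lo;
        if (hi < SLAB_BLOCKS && bit_is_free(bitmap, hi)) return hi;
    }
    return -1;
}
```
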
---

## 3. Free List Advantages

### 3.1 Raw Performance

**Numbers from ANALYSIS_SUMMARY.md**:
- **Bitmap**: 5-6 ns per allocation (find-first-set + bit extraction)
- **Free List**: 1-2 ns per allocation (two pointer operations)
- **Gap**: **3-4 ns per allocation (roughly 2.5-6x faster)**

**Why Free List Wins**:
```c
// Bitmap: 4 operations
int word_idx = find_first_nonzero(bitmap);                   // 2-3 ns (unpredictable branch)
int bit_idx  = ctzll(bitmap[word_idx]);                      // 1 ns (CPU instruction)
bitmap[word_idx] &= ~(1ULL << bit_idx);                      // 1 ns (bit clear)
void* ptr = base + (word_idx * 64 + bit_idx) * block_size;   // 1 ns (arithmetic)
// Total: 5-6 ns

// Free List: 2 pointer operations plus the return
Block* b = page->local_free;   // 0.5 ns (L1 hit)
page->local_free = b->next;    // 0.5 ns (L1 hit)
return b;                      // 0.5 ns
// Total: 1.5 ns
```

### 3.2 Cache Efficiency

**Free List**: Excellent temporal locality
```c
// Recently freed block = next allocated (LIFO)
// Likely still in L1 cache (3-5 cycles)
Block* b = page->local_free;  // Cache hit!
```

**Bitmap**: Poorer temporal locality
```c
// Allocated block may be anywhere in slab
// Bitmap access + block access = 2 cache lines
int idx = find_first_set(...);         // Cache line 1 (bitmap)
void* ptr = base + idx * block_size;   // Cache line 2 (block)
```

**Impact**:
- ✅ **Free List**: Better L1 cache hit rate (~95%+)
- ⚠️ **Bitmap**: More cache line touches (~2x)

### 3.3 Memory Overhead

**Free List**: Zero per-block metadata
```c
typedef struct Page {
    Block*   local_free;  // 8 bytes
    uint16_t capacity;    // 2 bytes
    // Total: 10 bytes of fields for the entire page (before padding)
} Page;
```

**Bitmap**: 1 bit per block (+ supporting metadata)
```c
typedef struct TinySlab {
    uint64_t bitmap[16];   // 128 bytes (1024 blocks)
    uint16_t free_count;   // 2 bytes
    uint8_t* base;         // 8 bytes
    // Total: 138 bytes minimum (before padding)
} TinySlab;
// For 8-byte blocks: 1024 * 8 = 8KB data, 138B metadata = 1.7% overhead
```

**Impact**:
- ✅ **Free List**: ~0.1% overhead
- ⚠️ **Bitmap**: ~1-2% overhead

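
The percentages above are for the smallest (8-byte) size class, which is the worst case for the bitmap: the metadata size is fixed per slab, so the relative overhead shrinks as blocks get larger. The small helper below reproduces the arithmetic; `metadata_overhead_pct` is a hypothetical name introduced only for this worked example.

```c
#include <stddef.h>

/* Metadata size as a percentage of the slab's data region. */
double metadata_overhead_pct(size_t metadata_bytes, size_t nblocks, size_t block_size) {
    return 100.0 * (double)metadata_bytes / (double)(nblocks * block_size);
}
```

With the figures from this section, `metadata_overhead_pct(138, 1024, 8)` is 138 / 8192, about 1.7%, and `metadata_overhead_pct(10, 1024, 8)` is 10 / 8192, about 0.12%. For 64-byte blocks the bitmap figure drops to 138 / 65536, about 0.21%.
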
### 3.4 Simplicity

**Free List**: Minimal code complexity
```c
// Entire allocation logic: 4 lines
void* alloc(Page* p) {
    Block* b = p->local_free;
    if (!b) return NULL;
    p->local_free = b->next;
    return b;
}
```

**Bitmap**: More complex
```c
// Allocation logic: 10+ lines
void* alloc(TinySlab* s) {
    if (s->free_count == 0) return NULL;
    for (int i = 0; i < 16; i++) {
        if (s->bitmap[i] == 0) continue;  // Skip full words (no free bits)
        int bit_idx = __builtin_ctzll(s->bitmap[i]);
        s->bitmap[i] &= ~(1ULL << bit_idx);
        s->free_count--;
        return s->base + (i * 64 + bit_idx) * s->block_size;
    }
    return NULL;  // Should never reach
}
```

**Impact**:
- ✅ **Free List**: Easier to understand, maintain, optimize
- ⚠️ **Bitmap**: More code paths, more potential for bugs

---

## 4. Real-World Use Cases

### When Bitmap Wins

**Scenario 1: Memory Debugging Tools**
```c
// AddressSanitizer, Valgrind integration
// Can mark freed blocks immediately
// (__asan_poison_memory_region is declared in <sanitizer/asan_interface.h>)
void free_with_asan(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    __asan_poison_memory_region(ptr, s->block_size);  // Safe!
}
```

**Scenario 2: Research Allocators**
```c
// Experimenting with allocation strategies
// e.g., hakmem's ELO learning, call-site profiling
void* alloc_with_learning(TinySlab* s, void* site) {
    int idx = find_best_block_for_site(s, site);   // Bitmap enables this
    // Can implement custom heuristics
    s->bitmap[idx / 64] &= ~(1ULL << (idx % 64));  // Mark allocated
    return s->base + idx * s->block_size;
}
```

**Scenario 3: Diagnostic Dashboards**
```c
// Real-time monitoring (e.g., allocator profiler UI)
// Can scan all slabs without stopping allocation
void update_dashboard() {
    for_each_slab(slab) {
        dashboard_update(slab->free_count, slab->bitmap);
        // No disruption to allocation threads
    }
}
```

### When Free List Wins

**Scenario 1: Production Web Servers**
```c
// High-throughput servers that link a fast allocator such as mimalloc
// Every nanosecond counts (millions of allocations/sec)
// Diagnostics = rare, speed = always
```

**Scenario 2: Latency-Sensitive Systems**
```c
// HFT, real-time systems
// Predictable 1-2ns allocation critical
// Bitmap's 5-6ns too variable
```

**Scenario 3: Memory-Constrained Embedded**
```c
// 1.7% bitmap overhead unacceptable
// Every byte matters
```

---

## 5. Quantitative Comparison

| Metric | Bitmap | Free List | Winner |
|--------|--------|-----------|--------|
| **Performance** | | | |
| Allocation latency | 5-6 ns | 1-2 ns | Free List (3-4ns faster) |
| Cache efficiency | 2 cache lines | 1 cache line | Free List |
| Branch mispredicts | 1-2 per alloc | 0-1 per alloc | Free List |
| **Memory** | | | |
| Metadata overhead | 1-2% | ~0.1% | Free List |
| Block size impact | +128B per slab | +8B per page | Free List |
| **Diagnostics** | | | |
| Observability | Full state visible | Opaque (count only) | Bitmap |
| Debugging | Easy (zeroed free) | Hard (pointer corruption) | Bitmap |
| Statistics | O(1) queries | O(n) traversal | Bitmap |
| Profiling | Per-block tracking | External hash table | Bitmap |
| **Flexibility** | | | |
| Allocation policy | Pluggable (first-fit, best-fit, etc.) | LIFO only | Bitmap |
| Research | Easy experimentation | Fixed design | Bitmap |
| Monitoring | Non-intrusive scanning | Requires external counters | Bitmap |
| **Safety** | | | |
| Use-after-free detection | Good (zeroed memory) | Poor (pointer corruption) | Bitmap |
| ASAN/Valgrind integration | Excellent | Limited | Bitmap |
| Cross-thread validation | Easy | Requires external state | Bitmap |
| **Complexity** | | | |
| Code size | ~100 lines | ~20 lines | Free List |
| Maintainability | Moderate | High | Free List |
| Optimization potential | Limited (bitmap scan) | High (2 pointers) | Free List |

**Overall**:
- **Production speed**: Free List wins (3-4ns faster, simpler)
- **Research/diagnostics**: Bitmap wins (visibility, flexibility, safety)

---

## 6. Hybrid Approaches

### Option 1: Dual-Mode Allocator

```c
#ifdef HAKMEM_DIAGNOSTIC_MODE
// Bitmap mode (slow but visible)
void* alloc() { return alloc_bitmap(); }
#else
// Free list mode (fast production)
void* alloc() { return alloc_freelist(); }
#endif
```

**Pros**: Best of both worlds
**Cons**: Maintenance burden (two code paths)

### Option 2: Shadow Bitmap

```c
// Fast path: Free list
Block* b = page->local_free;
page->local_free = b->next;

// Diagnostic path: Update shadow bitmap (async)
if (unlikely(diagnostic_enabled)) {
    shadow_bitmap_record(page, b);  // Non-blocking queue
}
```

**Pros**: Fast path unaffected, diagnostics available
**Cons**: Shadow state may lag, memory overhead

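
As a rough illustration of Option 2, the sketch below keeps a shadow bitmap next to the free list and updates it only when a runtime flag is set. It deliberately simplifies the idea: the update is synchronous rather than queued, so it does not model the async recording or the lag mentioned above. The `shadow`, `base`, and `block_size` fields and the function names are hypothetical, and in the shadow bitmap a 1 bit means "allocated". `__builtin_expect` is the GCC/Clang builtin that `likely`/`unlikely` macros are commonly defined with.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Block { struct Block* next; } Block;

typedef struct Page {
    Block*   local_free;   /* LIFO free list head (fast path) */
    uint8_t* base;         /* start of the data region */
    size_t   block_size;   /* bytes per block */
    uint64_t shadow[16];   /* diagnostic view: 1 = allocated */
} Page;

static bool diagnostic_enabled = false;

static size_t shadow_index(const Page* p, const void* ptr) {
    return (size_t)((const uint8_t*)ptr - p->base) / p->block_size;
}

void* alloc_with_shadow(Page* p) {
    Block* b = p->local_free;
    if (!b) return NULL;
    p->local_free = b->next;
    if (__builtin_expect(diagnostic_enabled, 0)) {
        size_t idx = shadow_index(p, b);
        p->shadow[idx / 64] |= 1ULL << (idx % 64);     /* mark allocated */
    }
    return b;
}

void free_with_shadow(Page* p, void* ptr) {
    if (__builtin_expect(diagnostic_enabled, 0)) {
        size_t idx = shadow_index(p, ptr);
        p->shadow[idx / 64] &= ~(1ULL << (idx % 64));  /* mark free */
    }
    Block* b = (Block*)ptr;
    b->next = p->local_free;
    p->local_free = b;
}
```

With the flag off, the only added cost on each path is one predictable branch. Note that the shadow bitmap is only accurate for blocks allocated after diagnostics were enabled.
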
### Option 3: Adaptive Strategy

```c
// Use bitmap for slabs with high churn (diagnostic value)
// Use free list for stable slabs (performance critical)
if (slab->churn_rate > THRESHOLD) {
    use_bitmap_mode(slab);
} else {
    use_freelist_mode(slab);
}
```

**Pros**: Dynamic optimization
**Cons**: Complex, runtime overhead

---

## 7. Recommendations for hakmem

### Context: hakmem's Goals (from ANALYSIS_SUMMARY.md)

> **hakmem's Philosophy** (research PoC):
> - "Flexible architecture: research platform for learning"
> - "Trade performance for visibility (ownership tracking, per-class stats)"
> - "Novel features: call-site profiling, ELO learning, evolution tracking"

### Recommendation: **Keep Bitmap for Tiny Pool**

**Reasons**:
1. ✅ **Research value**: hakmem's ELO learning, call-site profiling **require** per-block tracking
2. ✅ **Diagnostics**: Ownership tracking, CDA decision-making benefit from bitmap visibility
3. ✅ **Trade-off is acceptable**: 5-6ns overhead is worth the flexibility for a research allocator
4. ⚠️ **But optimize around it**: Remove statistics overhead, simplify hot path (my original P1-P2)

### Alternative: **Adopt Free List for Tiny Pool**

**Reasons**:
1. ✅ **Performance**: Closes 3-4ns of the 69ns gap
2. ✅ **Proven**: mimalloc's design is battle-tested
3. ✅ **Simplicity**: Easier to maintain, optimize
4. ⚠️ **But lose research features**: Must find alternative ways to track per-block metadata

### Compromise: **Hybrid Approach**

**Proposal**:
```c
void* tiny_alloc_slow(Page* p);  // Defined below in both build modes

// Fast path: Free list (mimalloc-style)
void* tiny_alloc_fast(Page* p) {
    Block* b = p->local_free;
    if (likely(b)) {
        p->local_free = b->next;
        return b;
    }
    return tiny_alloc_slow(p);
}

// Slow path: refill; diagnostic builds also record into the shadow bitmap
void* tiny_alloc_slow(Page* p) {
    void* ptr = refill_from_partial(p);
#ifdef HAKMEM_DIAGNOSTIC_MODE
    diagnostic_record_alloc(p, ptr);  // Async, non-blocking
#endif
    return ptr;
}
```

**Benefits**:
- Fast path: 1-2ns (mimalloc speed)
- Diagnostic mode: Optional bitmap tracking (research features)
- Production mode: Zero overhead

---

## 8. Decision Matrix

| Priority | Bitmap | Free List | Hybrid |
|----------|--------|-----------|--------|
| **Speed is #1 goal** | ❌ | ✅ | ✅ |
| **Research/diagnostics #1** | ✅ | ❌ | ⚠️ (complex) |
| **Simplicity #1** | ⚠️ | ✅ | ❌ |
| **Memory efficiency #1** | ❌ | ✅ | ⚠️ |
| **Flexibility #1** | ✅ | ❌ | ✅ |

**For hakmem specifically**:
- If **goal = beat mimalloc**: Free List
- If **goal = research platform**: Bitmap
- If **goal = both**: Hybrid (complex but feasible)

---

## 9. Conclusion

### The Fundamental Tradeoff

**Bitmap = Observatory, Free List = Race Car**

- **Bitmap**: Sacrifices 3-4ns for complete visibility and flexibility
- **Free List**: Sacrifices observability for raw speed

### For hakmem's Context

Based on ANALYSIS_SUMMARY.md, hakmem's goals include:
- "Call-site profiling" → **Requires per-block tracking** → Bitmap advantage
- "ELO learning" → **Requires allocation history** → Bitmap advantage
- "Evolution tracking" → **Requires observability** → Bitmap advantage

**Verdict**: **Bitmap is the right choice for hakmem's research goals**

### But Optimize Around It

Instead of abandoning bitmap:
1. ✅ **Remove statistics overhead** (ChatGPT Pro's P1) → saves ~10ns
2. ✅ **Simplify hot path** (my original P1-P2) → saves ~15ns
3. ✅ **Keep bitmap** → Preserve research features

**Expected**: 83ns → 58-65ns (still about 4x slower than mimalloc, but research features intact)

---

**Last Updated**: 2025-10-26
**Status**: Analysis complete
**Next**: Decide strategy based on project priorities