# Bitmap vs Free List: Design Tradeoffs
**Date**: 2025-10-26
**Context**: Evaluating architectural choices for hakmem Tiny Pool optimization
**Purpose**: Understand tradeoffs before deciding whether to adopt mimalloc's free list approach
---
## Executive Summary
### The Core Question
**Should hakmem abandon bitmap allocation in favor of mimalloc's intrusive free list?**
**Answer**: It depends on **project goals**:
- **If goal = production speed**: Free list wins (5-10ns faster)
- **If goal = research/diagnostics**: Bitmap wins (visibility, safety, flexibility)
- **If goal = both**: Hybrid approach possible (see Section 6)
---
## 1. Architecture Comparison
### Bitmap Approach (Current hakmem)
```c
// Metadata: separate bitmap (1 bit per block)
typedef struct TinySlab {
    uint64_t bitmap[16];   // 1024 blocks = 1024 bits
    uint8_t* base;         // Data region
    uint16_t free_count;   // O(1) empty check
    // ... diagnostics, ownership, stats ...
} TinySlab;

// Allocation: find-first-set
void* alloc_from_bitmap(TinySlab* s) {
    int word_idx = find_first_nonzero(s->bitmap);        // ~3 ns
    int bit_idx  = __builtin_ctzll(s->bitmap[word_idx]); // ~1 ns
    s->bitmap[word_idx] &= ~(1ULL << bit_idx);           // ~1 ns
    return s->base + (word_idx * 64 + bit_idx) * block_size;
}
// Cost: 5-6 ns (bitmap scan + bit extraction)
```
**Key Properties**:
- ✅ Metadata separate from data
- ✅ Random access to allocation state
- ✅ O(1) slab-level statistics (free_count, bitmap scan)
- ⚠️ 5-6 ns overhead per allocation
### Free List Approach (mimalloc)
```c
// Metadata: intrusive next-pointer stored in the free blocks themselves
typedef struct Block {
    struct Block* next;    // 8 bytes IN the data region
} Block;

typedef struct Page {
    Block* local_free;     // LIFO stack head
    // ... minimal metadata ...
} Page;

// Allocation: pop from LIFO
void* alloc_from_freelist(Page* p) {
    Block* b = p->local_free;  // ~0.5 ns (L1 hit)
    p->local_free = b->next;   // ~0.5 ns (L1 hit)
    return b;
}
// Cost: 1-2 ns (two pointer operations)
```
**Key Properties**:
- ✅ Zero metadata overhead (uses free blocks themselves)
- ✅ Minimal CPU overhead (1-2 pointer ops)
- ⚠️ Intrusive (overwrites first 8 bytes of free blocks)
- ⚠️ No random access (must traverse list)
---
## 2. Bitmap Advantages
### 2.1 Observability and Diagnostics
**Bitmap**: Complete allocation state visible at a glance
```c
// Print slab state (fixed 1024-bit scan, no traversal of live objects)
void print_slab_state(TinySlab* s) {
    printf("Slab free pattern: ");
    for (int i = 0; i < 1024; i++) {
        printf("%c", is_free(s->bitmap, i) ? '.' : 'X');
    }
    printf("\n");
    // Output: "X...XX.X.XX....." (visual fragmentation pattern)
}
```
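The `is_free` helper used above is not defined anywhere in this document; a minimal sketch, assuming the Section 1 convention that a set bit means the block is free:

```c
#include <stdbool.h>
#include <stdint.h>

// Convention from Section 1: bit i set == block i is free.
static bool is_free(const uint64_t* bitmap, int block_idx) {
    return (bitmap[block_idx / 64] >> (block_idx % 64)) & 1ULL;
}
```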
**Free List**: Must traverse entire list
```c
// Print page state (O(n) traversal)
void print_page_state(Page* p) {
    int count = 0;
    for (Block* b = p->local_free; b; b = b->next) count++;
    printf("Free blocks: %d (locations unknown)\n", count);
    // Output: "Free blocks: 42" (no spatial information)
}
}
```
**Impact**:
- ✅ **Bitmap**: Can detect fragmentation patterns, hot spots, and allocation clustering
- ⚠️ **Free List**: Only knows the count, not the spatial distribution
### 2.2 Memory Safety and Debugging
**Bitmap**: Freed memory can be immediately zeroed
```c
void free_to_bitmap(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    s->free_count++;
    memset(ptr, 0, block_size);  // Safe: no metadata lives inside the block
}
// Use-after-free detection: reads of zero-filled memory tend to crash early
```
**Free List**: Next-pointer remains in freed memory
```c
void free_to_list(Page* p, void* ptr) {
    Block* b = (Block*)ptr;
    b->next = p->local_free;  // Writes to freed memory!
    p->local_free = b;
}
// Use-after-free: may corrupt the next-pointer, causing subtle bugs later
```
**Impact**:
- ✅ **Bitmap**: Easier debugging (freed memory is clean)
- ✅ **Bitmap**: Better ASAN/Valgrind integration (freed blocks can be marked)
- ⚠️ **Free List**: Next-pointer corruption can cause cascading failures
### 2.3 Ownership Tracking and Validation
**Bitmap**: Can track per-block metadata
```c
typedef struct TinySlab {
    uint64_t bitmap[16];       // Allocation state
    uint8_t  owner[1024];      // Per-block owner thread ID
    uint32_t alloc_time[1024]; // Allocation timestamp
} TinySlab;

// Validate ownership on free
void free_with_validation(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    if (s->owner[idx] != current_thread()) {
        fprintf(stderr, "ERROR: Cross-thread free without handoff!\n");
        // The bug is detected immediately
    }
}
```
**Free List**: No per-block metadata (intrusive design)
```c
// Cannot store per-block metadata without external hash table
// Owner validation requires separate data structure
```
**Impact**:
- ✅ **Bitmap**: Can implement rich diagnostics (owner, timestamp, call-site)
- ✅ **Bitmap**: Validates invariants at allocation/free time
- ⚠️ **Free List**: Requires external data structures for diagnostics
### 2.4 Statistics and Profiling
**Bitmap**: O(1) slab-level queries
```c
// All O(1) operations
uint16_t free_count = slab->free_count;
bool is_empty     = (free_count == 1024);
bool is_full      = (free_count == 0);
float utilization = 1.0f - (free_count / 1024.0f);

// Fragmentation analysis (O(n), but run rarely)
int longest_run = find_longest_free_run(slab->bitmap);
```
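`find_longest_free_run` above is left undefined; a minimal sketch, assuming a 1024-block slab and the set-bit-means-free convention from Section 1:

```c
#include <stdint.h>

// Longest run of consecutive free (set) bits in a 1024-block bitmap.
// A plain O(n) bit walk: fine for an occasional fragmentation report,
// far too slow for the allocation hot path.
static int find_longest_free_run(const uint64_t* bitmap) {
    int longest = 0, run = 0;
    for (int i = 0; i < 1024; i++) {
        if ((bitmap[i / 64] >> (i % 64)) & 1ULL) {
            if (++run > longest) longest = run;
        } else {
            run = 0;
        }
    }
    return longest;
}
```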
**Free List**: Requires traversal
```c
// Counting requires an O(n) traversal
int free_count = 0;
for (Block* b = page->local_free; b; b = b->next) {
    free_count++;
}
// Fragmentation cannot be determined without a full traversal
```
**Impact**:
- ✅ **Bitmap**: Fast statistics collection (research-friendly)
- ✅ **Bitmap**: Allocation patterns can be analyzed
- ⚠️ **Free List**: Statistics require expensive traversal or external counters
### 2.5 Concurrent Access Visibility
**Bitmap**: Can inspect remote thread state
```c
// A diagnostic thread can scan all slabs
void print_global_state() {
    for (int tid = 0; tid < MAX_THREADS; tid++) {
        for (int class = 0; class < 8; class++) {
            TinySlab* s = get_slab(tid, class);
            // Instant visibility of free_count and the bitmap
            printf("Thread %d Class %d: %d/%d free\n",
                   tid, class, s->free_count, 1024);
        }
    }
}
```
**Free List**: Cannot safely inspect remote thread's local_free
```c
// Diagnostic thread CANNOT read local_free (race condition)
// Must use external atomic counters (defeats purpose)
```
**Impact**:
- ✅ **Bitmap**: Can back monitoring dashboards and live profilers
- ✅ **Bitmap**: Supports cross-thread adoption decisions (CDA)
- ⚠️ **Free List**: Opaque to external observers
### 2.6 Research and Experimentation
**Bitmap**: Easy to modify allocation policy
```c
// Experiment: best-fit instead of first-fit
int find_best_fit_block(TinySlab* s, int requested_run) {
    // Scan the bitmap for the smallest free run >= requested_run
    // Alternative allocation strategies are easy to prototype
}

// Experiment: locality-aware allocation
int find_nearest_free(TinySlab* s, void* previous_alloc) {
    int prev_idx = block_index(s, previous_alloc);
    // Search the bitmap for nearby free blocks (cache locality)
}
```
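The locality-aware search can be made concrete at the index level; a sketch of a hypothetical `find_nearest_free_idx` (not hakmem code), searching outward from the previous allocation under the Section 1 bit convention:

```c
#include <stdint.h>

// Nearest free block to a previous allocation's index, searching
// outward in both directions. Returns -1 if the slab is full.
static int find_nearest_free_idx(const uint64_t* bitmap, int prev_idx) {
    for (int d = 0; d < 1024; d++) {
        int lo = prev_idx - d, hi = prev_idx + d;
        if (lo >= 0   && ((bitmap[lo / 64] >> (lo % 64)) & 1ULL)) return lo;
        if (hi < 1024 && ((bitmap[hi / 64] >> (hi % 64)) & 1ULL)) return hi;
    }
    return -1;
}
```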
**Free List**: Policy locked to LIFO
```c
// Always LIFO (most recently freed = next allocated)
// Cannot experiment with other policies without major restructuring
```
**Impact**:
- ✅ **Bitmap**: Flexible research platform (different allocation strategies can be tried)
- ✅ **Bitmap**: Supports experiments in locality and fragmentation reduction
- ⚠️ **Free List**: Fixed policy (LIFO only)
---
## 3. Free List Advantages
### 3.1 Raw Performance
**Numbers from ANALYSIS_SUMMARY.md**:
- **Bitmap**: 5-6 ns per allocation (find-first-set + bit extraction)
- **Free List**: 1-2 ns per allocation (two pointer operations)
- **Gap**: **3-4 ns per allocation (2-6x faster)**
**Why Free List Wins**:
```c
// Bitmap: 5 operations
int word_idx = find_first_nonzero(bitmap);   // 2-3 ns (unpredictable branch)
int bit_idx  = ctzll(bitmap[word_idx]);      // 1 ns (CPU instruction)
bitmap[word_idx] &= ~(1ULL << bit_idx);      // 1 ns (bit clear)
void* ptr = base + (word_idx * 64 + bit_idx) * block_size;  // 1 ns (arithmetic)
// Total: ~5 ns

// Free list: 2 operations
Block* b = page->local_free;   // 0.5 ns (L1 hit)
page->local_free = b->next;    // 0.5 ns (L1 hit)
return b;                      // 0.5 ns
// Total: ~1.5 ns
```
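Despite the cost difference, the two fast paths are behaviorally interchangeable; a self-contained sketch (hypothetical 64-block arena, not hakmem code) checking that both hand out every block exactly once and then report exhaustion:

```c
#include <stdint.h>
#include <stddef.h>

#define NBLOCKS 64
#define BLKSZ   16

typedef struct Block { struct Block* next; } Block;

static uint8_t  heap[NBLOCKS * BLKSZ];
static uint64_t bitmap;        // one word: 64 blocks, set bit = free
static Block*   local_free;

static void* alloc_bitmap(void) {
    if (bitmap == 0) return NULL;
    int idx = __builtin_ctzll(bitmap);   // find-first-set (GCC/Clang builtin)
    bitmap &= ~(1ULL << idx);            // mark allocated
    return heap + (size_t)idx * BLKSZ;
}

static void* alloc_freelist(void) {
    Block* b = local_free;               // LIFO pop
    if (!b) return NULL;
    local_free = b->next;
    return b;
}

static void init_freelist(void) {        // thread all blocks into a LIFO stack
    local_free = NULL;
    for (int i = NBLOCKS - 1; i >= 0; i--) {
        Block* b = (Block*)(heap + (size_t)i * BLKSZ);
        b->next = local_free;
        local_free = b;
    }
}
```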
### 3.2 Cache Efficiency
**Free List**: Excellent temporal locality
```c
// Recently freed block = next allocated (LIFO)
// Likely still in L1 cache (3-5 cycles)
Block* b = page->local_free; // Cache hit!
```
**Bitmap**: Poorer temporal locality
```c
// Allocated block may be anywhere in slab
// Bitmap access + block access = 2 cache lines
int idx = find_first_set(...); // Cache line 1 (bitmap)
void* ptr = base + idx * block_size; // Cache line 2 (block)
```
**Impact**:
- ✅ **Free List**: Better L1 cache hit rate (~95%+)
- ⚠️ **Bitmap**: More cache line touches (~2x)
### 3.3 Memory Overhead
**Free List**: Zero metadata
```c
typedef struct Page {
    Block* local_free;   // 8 bytes
    uint16_t capacity;   // 2 bytes
    // Total: ~10 bytes for the entire page
} Page;
```
**Bitmap**: 1 bit per block (+ supporting metadata)
```c
typedef struct TinySlab {
    uint64_t bitmap[16]; // 128 bytes (1024 blocks)
    uint16_t free_count; // 2 bytes
    uint8_t* base;       // 8 bytes
    // Total: 138 bytes minimum
} TinySlab;
// For 8-byte blocks: 1024 * 8 = 8 KB data, 138 B metadata = ~1.7% overhead
```
**Impact**:
- ✅ **Free List**: ~0.1% overhead
- ⚠️ **Bitmap**: ~1-2% overhead
### 3.4 Simplicity
**Free List**: Minimal code complexity
```c
// Entire allocation logic: four lines
void* alloc(Page* p) {
    Block* b = p->local_free;
    if (!b) return NULL;
    p->local_free = b->next;
    return b;
}
```
**Bitmap**: More complex
```c
// Allocation logic: 15+ lines
void* alloc(TinySlab* s) {
    if (s->free_count == 0) return NULL;
    for (int i = 0; i < 16; i++) {
        if (s->bitmap[i] == 0) continue;  // Skip fully allocated words
        int bit_idx = __builtin_ctzll(s->bitmap[i]);
        s->bitmap[i] &= ~(1ULL << bit_idx);
        s->free_count--;
        return s->base + (i * 64 + bit_idx) * s->block_size;
    }
    return NULL;  // Unreachable: free_count > 0 guarantees a set bit
}
```
**Impact**:
- ✅ **Free List**: Easier to understand, maintain, and optimize
- ⚠️ **Bitmap**: More code paths, more potential for bugs
---
## 4. Real-World Use Cases
### When Bitmap Wins
**Scenario 1: Memory Debugging Tools**
```c
// AddressSanitizer / Valgrind integration:
// freed blocks can be poisoned immediately
void free_with_asan(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    __asan_poison_memory_region(ptr, block_size);  // Safe!
}
```
**Scenario 2: Research Allocators**
```c
// Experimenting with allocation strategies,
// e.g., hakmem's ELO learning and call-site profiling
void alloc_with_learning(TinySlab* s, void* site) {
    int idx = find_best_block_for_site(s, site);  // The bitmap enables this
    // Custom heuristics can be implemented here
}
```
**Scenario 3: Diagnostic Dashboards**
```c
// Real-time monitoring (e.g., an allocator profiler UI):
// all slabs can be scanned without stopping allocation
void update_dashboard() {
    for_each_slab(slab) {
        dashboard_update(slab->free_count, slab->bitmap);
        // No disruption to the allocating threads
    }
}
```
### When Free List Wins
**Scenario 1: Production Web Servers**
```c
// mimalloc in WebKit, nginx, etc.
// Every nanosecond counts (millions of allocations/sec)
// Diagnostics = rare, speed = always
```
**Scenario 2: Latency-Sensitive Systems**
```c
// HFT, real-time systems
// Predictable 1-2ns allocation critical
// Bitmap's 5-6ns too variable
```
**Scenario 3: Memory-Constrained Embedded**
```c
// 1.7% bitmap overhead unacceptable
// Every byte matters
```
---
## 5. Quantitative Comparison
| Metric | Bitmap | Free List | Winner |
|--------|--------|-----------|--------|
| **Performance** | | | |
| Allocation latency | 5-6 ns | 1-2 ns | Free List (3-4 ns faster) |
| Cache efficiency | 2 cache lines | 1 cache line | Free List |
| Branch mispredicts | 1-2 per alloc | 0-1 per alloc | Free List |
| **Memory** | | | |
| Metadata overhead | 1-2% | ~0.1% | Free List |
| Block size impact | +128 B per slab | +8 B per page | Free List |
| **Diagnostics** | | | |
| Observability | Full state visible | Opaque (count only) | Bitmap |
| Debugging | Easy (zeroed free) | Hard (pointer corruption) | Bitmap |
| Statistics | O(1) queries | O(n) traversal | Bitmap |
| Profiling | Per-block tracking | External hash table | Bitmap |
| **Flexibility** | | | |
| Allocation policy | Pluggable (first-fit, best-fit, etc.) | LIFO only | Bitmap |
| Research | Easy experimentation | Fixed design | Bitmap |
| Monitoring | Non-intrusive scanning | Requires external counters | Bitmap |
| **Safety** | | | |
| Use-after-free detection | Good (zeroed memory) | Poor (pointer corruption) | Bitmap |
| ASAN/Valgrind integration | Excellent | Limited | Bitmap |
| Cross-thread validation | Easy | Requires external state | Bitmap |
| **Complexity** | | | |
| Code size | ~100 lines | ~20 lines | Free List |
| Maintainability | Moderate | High | Free List |
| Optimization potential | Limited (bitmap scan) | High (2 pointers) | Free List |
**Overall**:
- **Production speed**: Free List wins (3-4ns faster, simpler)
- **Research/diagnostics**: Bitmap wins (visibility, flexibility, safety)
---
## 6. Hybrid Approaches
### Option 1: Dual-Mode Allocator
```c
#ifdef HAKMEM_DIAGNOSTIC_MODE
// Bitmap mode (slow but visible)
void* alloc() { return alloc_bitmap(); }
#else
// Free list mode (fast production)
void* alloc() { return alloc_freelist(); }
#endif
```
**Pros**: Best of both worlds
**Cons**: Maintenance burden (two code paths)
### Option 2: Shadow Bitmap
```c
// Fast path: Free list
Block* b = page->local_free;
page->local_free = b->next;
// Diagnostic path: Update shadow bitmap (async)
if (unlikely(diagnostic_enabled)) {
shadow_bitmap_record(page, b); // Non-blocking queue
}
```
**Pros**: Fast path unaffected, diagnostics available
**Cons**: Shadow state may lag, memory overhead
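`shadow_bitmap_record` can be sketched as a bounded ring buffer that the fast path pushes into and a diagnostic thread drains into the shadow bitmap. This is an illustration, not hakmem code: a real single-producer/single-consumer version would need atomic head/tail indices.

```c
#include <stdint.h>
#include <stdbool.h>

#define RING_SIZE 256

static uint16_t ring[RING_SIZE];
static unsigned head, tail;          // head = producer, tail = consumer

// Hot path: record an allocated block index without ever blocking.
static bool shadow_bitmap_record(uint16_t block_idx) {
    unsigned next = (head + 1) % RING_SIZE;
    if (next == tail) return false;  // ring full: drop the event
    ring[head] = block_idx;
    head = next;
    return true;
}

// Diagnostic thread: apply queued events to the shadow bitmap
// (set bit = free, as in Section 1, so clearing marks "allocated").
static void shadow_drain(uint64_t* shadow) {
    while (tail != head) {
        uint16_t idx = ring[tail];
        shadow[idx / 64] &= ~(1ULL << (idx % 64));
        tail = (tail + 1) % RING_SIZE;
    }
}
```

Dropping events when the ring is full keeps the fast path non-blocking at the cost of an approximate shadow state, which matches the "shadow state may lag" caveat above.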
### Option 3: Adaptive Strategy
```c
// Use bitmap for slabs with high churn (diagnostic value)
// Use free list for stable slabs (performance critical)
if (slab->churn_rate > THRESHOLD) {
    use_bitmap_mode(slab);
} else {
    use_freelist_mode(slab);
}
```
**Pros**: Dynamic optimization
**Cons**: Complex, runtime overhead
---
## 7. Recommendations for hakmem
### Context: hakmem's Goals (from ANALYSIS_SUMMARY.md)
> **hakmem's Philosophy** (research PoC):
> - "Flexible architecture: research platform for learning"
> - "Trade performance for visibility (ownership tracking, per-class stats)"
> - "Novel features: call-site profiling, ELO learning, evolution tracking"
### Recommendation: **Keep Bitmap for Tiny Pool**
**Reasons**:
1. ✅ **Research value**: hakmem's ELO learning and call-site profiling **require** per-block tracking
2. ✅ **Diagnostics**: Ownership tracking and CDA decision-making benefit from bitmap visibility
3. ✅ **Trade-off is acceptable**: a 5-6 ns overhead is worth the flexibility for a research allocator
4. ⚠️ **But optimize around it**: Remove statistics overhead and simplify the hot path (my original P1-P2)
### Alternative: **Adopt Free List for Tiny Pool**
**Reasons**:
1. ✅ **Performance**: Closes 3-4 ns of the 69 ns gap
2. ✅ **Proven**: mimalloc's design is battle-tested
3. ✅ **Simplicity**: Easier to maintain and optimize
4. ⚠️ **But research features are lost**: Alternative ways to track per-block metadata would be needed
### Compromise: **Hybrid Approach**
**Proposal**:
```c
// Fast path: free list (mimalloc-style)
void* tiny_alloc_fast(Page* p) {
    Block* b = p->local_free;
    if (likely(b)) {
        p->local_free = b->next;
        return b;
    }
    return tiny_alloc_slow(p);
}

// Diagnostic mode: enable the shadow bitmap
#ifdef HAKMEM_DIAGNOSTIC_MODE
void* tiny_alloc_slow(Page* p) {
    void* ptr = refill_from_partial(p);
    diagnostic_record_alloc(p, ptr);  // Async, non-blocking
    return ptr;
}
#endif
```
**Benefits**:
- Fast path: 1-2ns (mimalloc speed)
- Diagnostic mode: Optional bitmap tracking (research features)
- Production mode: Zero overhead
---
## 8. Decision Matrix
| Priority | Bitmap | Free List | Hybrid |
|----------|--------|-----------|--------|
| **Speed is #1 goal** | ❌ | ✅ | ✅ |
| **Research/diagnostics #1** | ✅ | ❌ | ⚠️ (complex) |
| **Simplicity #1** | ⚠️ | ✅ | ❌ |
| **Memory efficiency #1** | ❌ | ✅ | ⚠️ |
| **Flexibility #1** | ✅ | ❌ | ✅ |
**For hakmem specifically**:
- If **goal = beat mimalloc**: Free List
- If **goal = research platform**: Bitmap
- If **goal = both**: Hybrid (complex but feasible)
---
## 9. Conclusion
### The Fundamental Tradeoff
**Bitmap = Observatory, Free List = Race Car**
- **Bitmap**: Sacrifices 3-4ns for complete visibility and flexibility
- **Free List**: Sacrifices observability for raw speed
### For hakmem's Context
Based on ANALYSIS_SUMMARY.md, hakmem's goals include:
- "Call-site profiling" → **Requires per-block tracking** → Bitmap advantage
- "ELO learning" → **Requires allocation history** → Bitmap advantage
- "Evolution tracking" → **Requires observability** → Bitmap advantage
**Verdict**: **Bitmap is the right choice for hakmem's research goals**
### But Optimize Around It
Instead of abandoning bitmap:
1. ✅ **Remove statistics overhead** (ChatGPT Pro's P1) → +10 ns
2. ✅ **Simplify the hot path** (my original P1-P2) → +15 ns
3. ✅ **Keep the bitmap** → Research features preserved
**Expected**: 83ns → 58-65ns (still 4x slower than mimalloc, but research features intact)
---
**Last Updated**: 2025-10-26
**Status**: Analysis complete
**Next**: Decide strategy based on project priorities