# Bitmap vs Free List: Design Tradeoffs

**Date**: 2025-10-26
**Context**: Evaluating architectural choices for hakmem Tiny Pool optimization
**Purpose**: Understand tradeoffs before deciding whether to adopt mimalloc's free list approach

---

## Executive Summary

### The Core Question

**Should hakmem abandon bitmap allocation in favor of mimalloc's intrusive free list?**

**Answer**: It depends on **project goals**:

- **If goal = production speed**: Free list wins (3-4 ns faster per allocation)
- **If goal = research/diagnostics**: Bitmap wins (visibility, safety, flexibility)
- **If goal = both**: Hybrid approach possible (see Section 6)

---

## 1. Architecture Comparison

### Bitmap Approach (Current hakmem)

```c
// Metadata: Separate bitmap (1 bit per block; set bit = free)
typedef struct TinySlab {
    uint64_t bitmap[16];   // 1024 blocks = 1024 bits
    uint8_t* base;         // Data region
    uint16_t free_count;   // O(1) empty check
    // ... diagnostics, ownership, stats ...
} TinySlab;

// Allocation: Find-first-set
void* alloc_from_bitmap(TinySlab* s) {
    int word_idx = find_first_nonzero(s->bitmap);         // ~3 ns
    int bit_idx  = __builtin_ctzll(s->bitmap[word_idx]);  // ~1 ns
    s->bitmap[word_idx] &= ~(1ULL << bit_idx);            // ~1 ns
    return s->base + (word_idx * 64 + bit_idx) * block_size;
}
// Cost: 5-6 ns (bitmap scan + bit extraction)
```

**Key Properties**:
- ✅ Metadata separate from data
- ✅ Random access to allocation state
- ✅ O(1) slab-level statistics (free_count, bitmap scan)
- ⚠️ 5-6 ns overhead per allocation

### Free List Approach (mimalloc)

```c
// Metadata: Intrusive next-pointer in free blocks
typedef struct Block {
    struct Block* next;   // 8 bytes IN the data region
} Block;

typedef struct Page {
    Block* local_free;    // LIFO stack head
    // ... minimal metadata ...
} Page;

// Allocation: Pop from LIFO
void* alloc_from_freelist(Page* p) {
    Block* b = p->local_free;   // ~0.5 ns (L1 hit)
    p->local_free = b->next;    // ~0.5 ns (L1 hit)
    return b;
}
// Cost: 1-2 ns (two pointer operations)
```

**Key Properties**:
- ✅ Zero metadata overhead (uses free blocks themselves)
- ✅ Minimal CPU overhead (1-2 pointer ops)
- ⚠️ Intrusive (overwrites first 8 bytes of free blocks)
- ⚠️ No random access (must traverse list)

---

## 2. Bitmap Advantages

### 2.1 Observability and Diagnostics

**Bitmap**: Complete allocation state visible at a glance

```c
// Print slab state (single pass over the bitmap)
void print_slab_state(TinySlab* s) {
    printf("Slab free pattern: ");
    for (int i = 0; i < 1024; i++) {
        printf("%c", is_free(s->bitmap, i) ? '.' : 'X');
    }
    // Output: "X...XX.X.XX....." (visual fragmentation pattern)
}
```

**Free List**: Must traverse entire list

```c
// Print page state (O(n) traversal)
void print_page_state(Page* p) {
    int count = 0;
    Block* b = p->local_free;
    while (b) {
        count++;
        b = b->next;
    }
    printf("Free blocks: %d (locations unknown)\n", count);
    // Output: "Free blocks: 42" (no spatial information)
}
```

**Impact**:
- ✅ **Bitmap**: Can detect fragmentation patterns, hot spots, allocation clustering
- ⚠️ **Free List**: Only knows the count, not the spatial distribution

### 2.2 Memory Safety and Debugging

**Bitmap**: Freed memory can be immediately zeroed

```c
void free_to_bitmap(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    memset(ptr, 0, block_size);   // Safe: no metadata in block
}
// Use-after-free detection: accessing 0-filled memory likely crashes early
```

**Free List**: Next-pointer remains in freed memory

```c
void free_to_list(Page* p, void* ptr) {
    Block* b = (Block*)ptr;
    b->next = p->local_free;   // Writes to freed memory!
    p->local_free = b;
}
// Use-after-free: might corrupt next-pointer, causing subtle bugs later
```

**Impact**:
- ✅ **Bitmap**: Easier debugging (freed memory is clean)
- ✅ **Bitmap**: Better ASAN/Valgrind integration (can mark freed)
- ⚠️ **Free List**: Next-pointer corruption can cause cascading failures

### 2.3 Ownership Tracking and Validation

**Bitmap**: Can track per-block metadata

```c
typedef struct TinySlab {
    uint64_t bitmap[16];        // Allocation state
    uint8_t  owner[1024];       // Per-block owner thread ID
    uint32_t alloc_time[1024];  // Allocation timestamp
} TinySlab;

// Validate ownership on free
void free_with_validation(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    if (s->owner[idx] != current_thread()) {
        fprintf(stderr, "ERROR: Cross-thread free without handoff!\n");
        // Can detect bugs immediately
    }
}
```

**Free List**: No per-block metadata (intrusive design)

```c
// Cannot store per-block metadata without external hash table
// Owner validation requires separate data structure
```

**Impact**:
- ✅ **Bitmap**: Can implement rich diagnostics (owner, timestamp, call-site)
- ✅ **Bitmap**: Validates invariants at allocation/free time
- ⚠️ **Free List**: Requires external data structures for diagnostics

### 2.4 Statistics and Profiling

**Bitmap**: O(1) slab-level queries

```c
// All O(1) operations
uint16_t free_count = slab->free_count;
bool is_empty = (free_count == 1024);
bool is_full = (free_count == 0);
float utilization = 1.0 - (free_count / 1024.0);

// Fragmentation analysis (O(n) but rare)
int longest_run = find_longest_free_run(slab->bitmap);
```

**Free List**: Requires traversal

```c
// Count requires O(n) traversal
int free_count = 0;
for (Block* b = page->local_free; b; b = b->next) {
    free_count++;
}
// Cannot determine fragmentation without traversal
```

**Impact**:
- ✅ **Bitmap**: Fast statistics collection (research-friendly)
- ✅ **Bitmap**: Can analyze allocation patterns
- ⚠️ **Free List**: Statistics require expensive traversal or
external counters

### 2.5 Concurrent Access Visibility

**Bitmap**: Can inspect remote thread state

```c
// Diagnostic thread can scan all slabs
void print_global_state() {
    for (int tid = 0; tid < MAX_THREADS; tid++) {
        for (int class = 0; class < 8; class++) {
            TinySlab* s = get_slab(tid, class);
            // Instant visibility of free_count, bitmap
            printf("Thread %d Class %d: %d/%d free\n",
                   tid, class, s->free_count, 1024);
        }
    }
}
```

**Free List**: Cannot safely inspect a remote thread's local_free

```c
// Diagnostic thread CANNOT read local_free (race condition)
// Must use external atomic counters (defeats the purpose)
```

**Impact**:
- ✅ **Bitmap**: Can build monitoring dashboards, live profilers
- ✅ **Bitmap**: Supports cross-thread adoption decisions (CDA)
- ⚠️ **Free List**: Opaque to external observers

### 2.6 Research and Experimentation

**Bitmap**: Easy to modify allocation policy

```c
// Experiment: Best-fit instead of first-fit
int find_best_fit_block(TinySlab* s, int requested_run) {
    // Scan bitmap for the smallest run >= requested_run
    // Easy to implement alternative allocation strategies
}

// Experiment: Locality-aware allocation
int find_nearest_free(TinySlab* s, void* previous_alloc) {
    int prev_idx = block_index(s, previous_alloc);
    // Search bitmap for nearby free blocks (cache locality)
}
```

**Free List**: Policy locked to LIFO

```c
// Always LIFO (most recently freed = next allocated)
// Cannot experiment with other policies without major restructuring
```

**Impact**:
- ✅ **Bitmap**: Flexible research platform (try different allocation strategies)
- ✅ **Bitmap**: Can experiment with locality, fragmentation reduction
- ⚠️ **Free List**: Fixed policy (LIFO only)

---

## 3. Free List Advantages

### 3.1 Raw Performance

**Numbers from ANALYSIS_SUMMARY.md**:
- **Bitmap**: 5-6 ns per allocation (find-first-set + bit extraction)
- **Free List**: 1-2 ns per allocation (two pointer operations)
- **Gap**: **3-4 ns per allocation (2-6x faster)**

**Why Free List Wins**:

```c
// Bitmap: 4 operations
int word_idx = find_first_nonzero(bitmap);   // 2-3 ns (unpredictable branch)
int bit_idx  = ctzll(bitmap[word_idx]);      // 1 ns (CPU instruction)
bitmap[word_idx] &= ~(1ULL << bit_idx);      // 1 ns (bit clear)
void* ptr = base + index * block_size;       // 1 ns (arithmetic)
// Total: 5 ns

// Free List: 2 pointer operations
Block* b = page->local_free;    // 0.5 ns (L1 hit)
page->local_free = b->next;     // 0.5 ns (L1 hit)
return b;                       // 0.5 ns
// Total: 1.5 ns
```

### 3.2 Cache Efficiency

**Free List**: Excellent temporal locality

```c
// Recently freed block = next allocated (LIFO)
// Likely still in L1 cache (3-5 cycles)
Block* b = page->local_free;   // Cache hit!
```

**Bitmap**: Poorer temporal locality

```c
// Allocated block may be anywhere in the slab
// Bitmap access + block access = 2 cache lines
int idx = find_first_set(...);          // Cache line 1 (bitmap)
void* ptr = base + idx * block_size;    // Cache line 2 (block)
```

**Impact**:
- ✅ **Free List**: Better L1 cache hit rate (~95%+)
- ⚠️ **Bitmap**: More cache line touches (~2x)

### 3.3 Memory Overhead

**Free List**: Near-zero metadata

```c
typedef struct Page {
    Block*   local_free;   // 8 bytes
    uint16_t capacity;     // 2 bytes
    // Total: 10 bytes for the entire page
} Page;
```

**Bitmap**: 1 bit per block (+ supporting metadata)

```c
typedef struct TinySlab {
    uint64_t bitmap[16];   // 128 bytes (1024 blocks)
    uint16_t free_count;   // 2 bytes
    uint8_t* base;         // 8 bytes
    // Total: 138 bytes minimum
} TinySlab;
// For 8-byte blocks: 1024 * 8 = 8KB data, 138B metadata = 1.7% overhead
```

**Impact**:
- ✅ **Free List**: ~0.1% overhead
- ⚠️ **Bitmap**: ~1-2% overhead

### 3.4 Simplicity

**Free List**: Minimal code complexity

```c
// Entire allocation logic: a few lines
void* alloc(Page* p) {
    Block* b = p->local_free;
    if (!b) return NULL;
    p->local_free = b->next;
    return b;
}
```

**Bitmap**: More complex

```c
// Allocation logic: 15+ lines
void* alloc(TinySlab* s) {
    if (s->free_count == 0) return NULL;
    for (int i = 0; i < 16; i++) {
        if (s->bitmap[i] == 0) continue;   // Skip empty words
        int bit_idx = __builtin_ctzll(s->bitmap[i]);
        s->bitmap[i] &= ~(1ULL << bit_idx);
        s->free_count--;
        return s->base + (i * 64 + bit_idx) * s->block_size;
    }
    return NULL;   // Should never be reached
}
```

**Impact**:
- ✅ **Free List**: Easier to understand, maintain, optimize
- ⚠️ **Bitmap**: More code paths, more potential for bugs

---

## 4. Real-World Use Cases

### When Bitmap Wins

**Scenario 1: Memory Debugging Tools**

```c
// AddressSanitizer, Valgrind integration
// Can mark freed blocks immediately
void free_with_asan(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    __asan_poison_memory_region(ptr, block_size);   // Safe!
}
```

**Scenario 2: Research Allocators**

```c
// Experimenting with allocation strategies
// e.g., hakmem's ELO learning, call-site profiling
void alloc_with_learning(TinySlab* s, void* site) {
    int idx = find_best_block_for_site(s, site);   // Bitmap enables this
    // Can implement custom heuristics
}
```

**Scenario 3: Diagnostic Dashboards**

```c
// Real-time monitoring (e.g., allocator profiler UI)
// Can scan all slabs without stopping allocation
void update_dashboard() {
    for_each_slab(slab) {
        dashboard_update(slab->free_count, slab->bitmap);
        // No disruption to allocation threads
    }
}
```

### When Free List Wins

**Scenario 1: Production Web Servers**

```c
// mimalloc in WebKit, nginx, etc.
// Every nanosecond counts (millions of allocations/sec)
// Diagnostics = rare, speed = always
```

**Scenario 2: Latency-Sensitive Systems**

```c
// HFT, real-time systems
// Predictable 1-2 ns allocation critical
// Bitmap's 5-6 ns too variable
```

**Scenario 3: Memory-Constrained Embedded**

```c
// 1.7% bitmap overhead unacceptable
// Every byte matters
```

---

## 5. Quantitative Comparison

| Metric | Bitmap | Free List | Winner |
|--------|--------|-----------|--------|
| **Performance** | | | |
| Allocation latency | 5-6 ns | 1-2 ns | Free List (3-4 ns faster) |
| Cache efficiency | 2 cache lines | 1 cache line | Free List |
| Branch mispredicts | 1-2 per alloc | 0-1 per alloc | Free List |
| **Memory** | | | |
| Metadata overhead | 1-2% | ~0.1% | Free List |
| Block size impact | +128B per slab | +8B per page | Free List |
| **Diagnostics** | | | |
| Observability | Full state visible | Opaque (count only) | Bitmap |
| Debugging | Easy (zeroed free) | Hard (pointer corruption) | Bitmap |
| Statistics | O(1) queries | O(n) traversal | Bitmap |
| Profiling | Per-block tracking | External hash table | Bitmap |
| **Flexibility** | | | |
| Allocation policy | Pluggable (first-fit, best-fit, etc.) | LIFO only | Bitmap |
| Research | Easy experimentation | Fixed design | Bitmap |
| Monitoring | Non-intrusive scanning | Requires external counters | Bitmap |
| **Safety** | | | |
| Use-after-free detection | Good (zeroed memory) | Poor (pointer corruption) | Bitmap |
| ASAN/Valgrind integration | Excellent | Limited | Bitmap |
| Cross-thread validation | Easy | Requires external state | Bitmap |
| **Complexity** | | | |
| Code size | ~100 lines | ~20 lines | Free List |
| Maintainability | Moderate | High | Free List |
| Optimization potential | Limited (bitmap scan) | High (2 pointers) | Free List |

**Overall**:
- **Production speed**: Free List wins (3-4 ns faster, simpler)
- **Research/diagnostics**: Bitmap wins (visibility, flexibility, safety)

---

## 6. Hybrid Approaches

### Option 1: Dual-Mode Allocator

```c
#ifdef HAKMEM_DIAGNOSTIC_MODE
// Bitmap mode (slow but visible)
void* alloc() { return alloc_bitmap(); }
#else
// Free list mode (fast production)
void* alloc() { return alloc_freelist(); }
#endif
```

**Pros**: Best of both worlds
**Cons**: Maintenance burden (two code paths)

### Option 2: Shadow Bitmap

```c
// Fast path: Free list
Block* b = page->local_free;
page->local_free = b->next;

// Diagnostic path: Update shadow bitmap (async)
if (unlikely(diagnostic_enabled)) {
    shadow_bitmap_record(page, b);   // Non-blocking queue
}
```

**Pros**: Fast path unaffected, diagnostics available
**Cons**: Shadow state may lag, memory overhead

### Option 3: Adaptive Strategy

```c
// Use bitmap for slabs with high churn (diagnostic value)
// Use free list for stable slabs (performance critical)
if (slab->churn_rate > THRESHOLD) {
    use_bitmap_mode(slab);
} else {
    use_freelist_mode(slab);
}
```

**Pros**: Dynamic optimization
**Cons**: Complex, runtime overhead

---

## 7. Recommendations for hakmem

### Context: hakmem's Goals (from ANALYSIS_SUMMARY.md)

> **hakmem's Philosophy** (research PoC):
> - "Flexible architecture: research platform for learning"
> - "Trade performance for visibility (ownership tracking, per-class stats)"
> - "Novel features: call-site profiling, ELO learning, evolution tracking"

### Recommendation: **Keep Bitmap for Tiny Pool**

**Reasons**:
1. ✅ **Research value**: hakmem's ELO learning, call-site profiling **require** per-block tracking
2. ✅ **Diagnostics**: Ownership tracking, CDA decision-making benefit from bitmap visibility
3. ✅ **Trade-off is acceptable**: 5-6 ns overhead is worth the flexibility for a research allocator
4. ⚠️ **But optimize around it**: Remove statistics overhead, simplify the hot path (my original P1-P2)

### Alternative: **Adopt Free List for Tiny Pool**

**Reasons**:
1. ✅ **Performance**: Closes 3-4 ns of the 69 ns gap
2. ✅ **Proven**: mimalloc's design is battle-tested
3. ✅ **Simplicity**: Easier to maintain, optimize
4. ⚠️ **But lose research features**: Must find alternative ways to track per-block metadata

### Compromise: **Hybrid Approach**

**Proposal**:

```c
// Fast path: Free list (mimalloc-style)
void* tiny_alloc_fast(Page* p) {
    Block* b = p->local_free;
    if (likely(b)) {
        p->local_free = b->next;
        return b;
    }
    return tiny_alloc_slow(p);
}

// Diagnostic mode: Enable shadow bitmap
#ifdef HAKMEM_DIAGNOSTIC_MODE
void* tiny_alloc_slow(Page* p) {
    void* ptr = refill_from_partial(p);
    diagnostic_record_alloc(p, ptr);   // Async, non-blocking
    return ptr;
}
#endif
```

**Benefits**:
- Fast path: 1-2 ns (mimalloc speed)
- Diagnostic mode: Optional bitmap tracking (research features)
- Production mode: Zero overhead

---

## 8. Decision Matrix

| Priority | Bitmap | Free List | Hybrid |
|----------|--------|-----------|--------|
| **Speed is #1 goal** | ❌ | ✅ | ✅ |
| **Research/diagnostics #1** | ✅ | ❌ | ⚠️ (complex) |
| **Simplicity #1** | ⚠️ | ✅ | ❌ |
| **Memory efficiency #1** | ❌ | ✅ | ⚠️ |
| **Flexibility #1** | ✅ | ❌ | ✅ |

**For hakmem specifically**:
- If **goal = beat mimalloc**: Free List
- If **goal = research platform**: Bitmap
- If **goal = both**: Hybrid (complex but feasible)

---

## 9. Conclusion

### The Fundamental Tradeoff

**Bitmap = Observatory, Free List = Race Car**

- **Bitmap**: Sacrifices 3-4 ns for complete visibility and flexibility
- **Free List**: Sacrifices observability for raw speed

### For hakmem's Context

Based on ANALYSIS_SUMMARY.md, hakmem's goals include:
- "Call-site profiling" → **Requires per-block tracking** → Bitmap advantage
- "ELO learning" → **Requires allocation history** → Bitmap advantage
- "Evolution tracking" → **Requires observability** → Bitmap advantage

**Verdict**: **Bitmap is the right choice for hakmem's research goals**

### But Optimize Around It

Instead of abandoning the bitmap:
1. ✅ **Remove statistics overhead** (ChatGPT Pro's P1) → +10ns
2. ✅ **Simplify hot path** (my original P1-P2) → +15ns
3. ✅ **Keep bitmap** → Preserve research features

**Expected**: 83ns → 58-65ns (still ~4x slower than mimalloc, but research features intact)

---

**Last Updated**: 2025-10-26
**Status**: Analysis complete
**Next**: Decide strategy based on project priorities