# Bitmap vs Free List: Design Tradeoffs

**Date**: 2025-10-26
**Context**: Evaluating architectural choices for hakmem Tiny Pool optimization
**Purpose**: Understand tradeoffs before deciding whether to adopt mimalloc's free list approach

---

## Executive Summary

### The Core Question

**Should hakmem abandon bitmap allocation in favor of mimalloc's intrusive free list?**

**Answer**: It depends on **project goals**:

- **If goal = production speed**: Free list wins (3-4 ns faster per allocation)
- **If goal = research/diagnostics**: Bitmap wins (visibility, safety, flexibility)
- **If goal = both**: Hybrid approach possible (see Section 6)

---

## 1. Architecture Comparison

### Bitmap Approach (Current hakmem)

```c
// Metadata: Separate bitmap (1 bit per block; set bit = free)
typedef struct TinySlab {
    uint64_t bitmap[16];   // 1024 blocks = 1024 bits
    uint8_t* base;         // Data region
    uint16_t free_count;   // O(1) empty check
    // ... diagnostics, ownership, stats ...
} TinySlab;

// Allocation: Find-first-set
void* alloc_from_bitmap(TinySlab* s) {
    int word_idx = find_first_nonzero(s->bitmap);         // ~3 ns
    int bit_idx  = __builtin_ctzll(s->bitmap[word_idx]);  // ~1 ns
    s->bitmap[word_idx] &= ~(1ULL << bit_idx);            // ~1 ns
    return s->base + (word_idx * 64 + bit_idx) * block_size;
}
// Cost: 5-6 ns (bitmap scan + bit extraction)
```

**Key Properties**:
- ✅ Metadata separate from data
- ✅ Random access to allocation state
- ✅ O(1) slab-level statistics (free_count, bitmap scan)
- ⚠️ 5-6 ns overhead per allocation

### Free List Approach (mimalloc)

```c
// Metadata: Intrusive next-pointer in free blocks
typedef struct Block {
    struct Block* next;   // 8 bytes IN the data region
} Block;

typedef struct Page {
    Block* local_free;    // LIFO stack head
    // ... minimal metadata ...
} Page;

// Allocation: Pop from LIFO
void* alloc_from_freelist(Page* p) {
    Block* b = p->local_free;   // ~0.5 ns (L1 hit)
    p->local_free = b->next;    // ~0.5 ns (L1 hit)
    return b;
}
// Cost: 1-2 ns (two pointer operations)
```

**Key Properties**:
- ✅ Zero metadata overhead (uses free blocks themselves)
- ✅ Minimal CPU overhead (1-2 pointer ops)
- ⚠️ Intrusive (overwrites first 8 bytes of free blocks)
- ⚠️ No random access (must traverse list)

---

## 2. Bitmap Advantages

### 2.1 Observability and Diagnostics

**Bitmap**: Complete allocation state visible at a glance

```c
// Print slab state (single pass over the bitmap)
void print_slab_state(TinySlab* s) {
    printf("Slab free pattern: ");
    for (int i = 0; i < 1024; i++) {
        printf("%c", is_free(s->bitmap, i) ? '.' : 'X');
    }
    // Output: "X...XX.X.XX....." (visual fragmentation pattern)
}
```

**Free List**: Must traverse entire list

```c
// Print page state (O(n) traversal)
void print_page_state(Page* p) {
    int count = 0;
    Block* b = p->local_free;
    while (b) {
        count++;
        b = b->next;
    }
    printf("Free blocks: %d (locations unknown)\n", count);
    // Output: "Free blocks: 42" (no spatial information)
}
```

**Impact**:
- ✅ **Bitmap**: Can detect fragmentation patterns, hot spots, allocation clustering
- ⚠️ **Free List**: Only knows the count, not the spatial distribution

### 2.2 Memory Safety and Debugging

**Bitmap**: Freed memory can be immediately zeroed

```c
void free_to_bitmap(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    memset(ptr, 0, block_size);   // Safe: no metadata in block
}
// Use-after-free detection: accessing 0-filled memory likely crashes early
```

**Free List**: Next-pointer remains in freed memory

```c
void free_to_list(Page* p, void* ptr) {
    Block* b = (Block*)ptr;
    b->next = p->local_free;   // Writes to freed memory!
    p->local_free = b;
}
// Use-after-free: might corrupt next-pointer, causing subtle bugs later
```

**Impact**:
- ✅ **Bitmap**: Easier debugging (freed memory is clean)
- ✅ **Bitmap**: Better ASAN/Valgrind integration (can mark freed)
- ⚠️ **Free List**: Next-pointer corruption can cause cascading failures

### 2.3 Ownership Tracking and Validation

**Bitmap**: Can track per-block metadata

```c
typedef struct TinySlab {
    uint64_t bitmap[16];        // Allocation state
    uint8_t  owner[1024];       // Per-block owner thread ID
    uint32_t alloc_time[1024];  // Allocation timestamp
} TinySlab;

// Validate ownership on free
void free_with_validation(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    if (s->owner[idx] != current_thread()) {
        fprintf(stderr, "ERROR: Cross-thread free without handoff!\n");
        // Can detect bugs immediately
    }
}
```

**Free List**: No per-block metadata (intrusive design)

```c
// Cannot store per-block metadata without external hash table
// Owner validation requires separate data structure
```

**Impact**:
- ✅ **Bitmap**: Can implement rich diagnostics (owner, timestamp, call-site)
- ✅ **Bitmap**: Validates invariants at allocation/free time
- ⚠️ **Free List**: Requires external data structures for diagnostics

### 2.4 Statistics and Profiling

**Bitmap**: O(1) slab-level queries

```c
// All O(1) operations
uint16_t free_count = slab->free_count;
bool is_empty = (free_count == 1024);
bool is_full = (free_count == 0);
float utilization = 1.0 - (free_count / 1024.0);

// Fragmentation analysis (O(n) but rare)
int longest_run = find_longest_free_run(slab->bitmap);
```

**Free List**: Requires traversal

```c
// Count requires O(n) traversal
int free_count = 0;
for (Block* b = page->local_free; b; b = b->next) {
    free_count++;
}
// Cannot determine fragmentation without traversal
```

**Impact**:
- ✅ **Bitmap**: Fast statistics collection (research-friendly)
- ✅ **Bitmap**: Can analyze allocation patterns
- ⚠️ **Free List**: Statistics require expensive traversal or
external counters

### 2.5 Concurrent Access Visibility

**Bitmap**: Can inspect remote thread state

```c
// Diagnostic thread can scan all slabs
void print_global_state() {
    for (int tid = 0; tid < MAX_THREADS; tid++) {
        for (int class = 0; class < 8; class++) {
            TinySlab* s = get_slab(tid, class);
            // Instant visibility of free_count, bitmap
            printf("Thread %d Class %d: %d/%d free\n",
                   tid, class, s->free_count, 1024);
        }
    }
}
```

**Free List**: Cannot safely inspect a remote thread's local_free

```c
// Diagnostic thread CANNOT read local_free (race condition)
// Must use external atomic counters (defeats the purpose)
```

**Impact**:
- ✅ **Bitmap**: Can build monitoring dashboards, live profilers
- ✅ **Bitmap**: Supports cross-thread adoption decisions (CDA)
- ⚠️ **Free List**: Opaque to external observers

### 2.6 Research and Experimentation

**Bitmap**: Easy to modify allocation policy

```c
// Experiment: Best-fit instead of first-fit
int find_best_fit_block(TinySlab* s, int requested_run) {
    // Scan bitmap for the smallest run >= requested_run
    // Easy to implement alternative allocation strategies
}

// Experiment: Locality-aware allocation
int find_nearest_free(TinySlab* s, void* previous_alloc) {
    int prev_idx = block_index(s, previous_alloc);
    // Search bitmap for nearby free blocks (cache locality)
}
```

**Free List**: Policy locked to LIFO

```c
// Always LIFO (most recently freed = next allocated)
// Cannot experiment with other policies without major restructuring
```

**Impact**:
- ✅ **Bitmap**: Flexible research platform (try different allocation strategies)
- ✅ **Bitmap**: Can experiment with locality, fragmentation reduction
- ⚠️ **Free List**: Fixed policy (LIFO only)

---

## 3. Free List Advantages

### 3.1 Raw Performance

**Numbers from ANALYSIS_SUMMARY.md**:
- **Bitmap**: 5-6 ns per allocation (find-first-set + bit extraction)
- **Free List**: 1-2 ns per allocation (two pointer operations)
- **Gap**: **3-4 ns per allocation (2-6x faster)**

**Why Free List Wins**:

```c
// Bitmap: 4 operations
int word_idx = find_first_nonzero(bitmap);   // 2-3 ns (unpredictable branch)
int bit_idx  = ctzll(bitmap[word_idx]);      // 1 ns (CPU instruction)
bitmap[word_idx] &= ~(1ULL << bit_idx);      // 1 ns (bit clear)
void* ptr = base + index * block_size;       // 1 ns (arithmetic)
// Total: 5 ns

// Free List: 2 pointer operations
Block* b = page->local_free;    // 0.5 ns (L1 hit)
page->local_free = b->next;     // 0.5 ns (L1 hit)
return b;                       // 0.5 ns
// Total: 1.5 ns
```

### 3.2 Cache Efficiency

**Free List**: Excellent temporal locality

```c
// Recently freed block = next allocated (LIFO)
// Likely still in L1 cache (3-5 cycles)
Block* b = page->local_free;   // Cache hit!
```

**Bitmap**: Poorer temporal locality

```c
// Allocated block may be anywhere in the slab
// Bitmap access + block access = 2 cache lines
int idx = find_first_set(...);          // Cache line 1 (bitmap)
void* ptr = base + idx * block_size;    // Cache line 2 (block)
```

**Impact**:
- ✅ **Free List**: Better L1 cache hit rate (~95%+)
- ⚠️ **Bitmap**: More cache line touches (~2x)

### 3.3 Memory Overhead

**Free List**: Near-zero metadata

```c
typedef struct Page {
    Block*   local_free;   // 8 bytes
    uint16_t capacity;     // 2 bytes
    // Total: 10 bytes for the entire page
} Page;
```

**Bitmap**: 1 bit per block (+ supporting metadata)

```c
typedef struct TinySlab {
    uint64_t bitmap[16];   // 128 bytes (1024 blocks)
    uint16_t free_count;   // 2 bytes
    uint8_t* base;         // 8 bytes
    // Total: 138 bytes minimum
} TinySlab;
// For 8-byte blocks: 1024 * 8 = 8KB data, 138B metadata = 1.7% overhead
```

**Impact**:
- ✅ **Free List**: ~0.1% overhead
- ⚠️ **Bitmap**: ~1-2% overhead

### 3.4 Simplicity

**Free List**: Minimal code complexity

```c
// Entire allocation logic: a few lines
void* alloc(Page* p) {
    Block* b = p->local_free;
    if (!b) return NULL;
    p->local_free = b->next;
    return b;
}
```

**Bitmap**: More complex

```c
// Allocation logic: 15+ lines
void* alloc(TinySlab* s) {
    if (s->free_count == 0) return NULL;
    for (int i = 0; i < 16; i++) {
        if (s->bitmap[i] == 0) continue;   // Skip empty words
        int bit_idx = __builtin_ctzll(s->bitmap[i]);
        s->bitmap[i] &= ~(1ULL << bit_idx);
        s->free_count--;
        return s->base + (i * 64 + bit_idx) * s->block_size;
    }
    return NULL;   // Should never be reached
}
```

**Impact**:
- ✅ **Free List**: Easier to understand, maintain, optimize
- ⚠️ **Bitmap**: More code paths, more potential for bugs

---

## 4. Real-World Use Cases

### When Bitmap Wins

**Scenario 1: Memory Debugging Tools**

```c
// AddressSanitizer, Valgrind integration
// Can mark freed blocks immediately
void free_with_asan(TinySlab* s, void* ptr) {
    int idx = block_index(s, ptr);
    s->bitmap[idx / 64] |= (1ULL << (idx % 64));
    __asan_poison_memory_region(ptr, block_size);   // Safe!
}
```

**Scenario 2: Research Allocators**

```c
// Experimenting with allocation strategies
// e.g., hakmem's ELO learning, call-site profiling
void alloc_with_learning(TinySlab* s, void* site) {
    int idx = find_best_block_for_site(s, site);   // Bitmap enables this
    // Can implement custom heuristics
}
```

**Scenario 3: Diagnostic Dashboards**

```c
// Real-time monitoring (e.g., allocator profiler UI)
// Can scan all slabs without stopping allocation
void update_dashboard() {
    for_each_slab(slab) {
        dashboard_update(slab->free_count, slab->bitmap);
        // No disruption to allocation threads
    }
}
```

### When Free List Wins

**Scenario 1: Production Web Servers**

```c
// mimalloc in WebKit, nginx, etc.
// Every nanosecond counts (millions of allocations/sec)
// Diagnostics = rare, speed = always
```

**Scenario 2: Latency-Sensitive Systems**

```c
// HFT, real-time systems
// Predictable 1-2 ns allocation critical
// Bitmap's 5-6 ns too variable
```

**Scenario 3: Memory-Constrained Embedded**

```c
// 1.7% bitmap overhead unacceptable
// Every byte matters
```

---

## 5. Quantitative Comparison

| Metric | Bitmap | Free List | Winner |
|--------|--------|-----------|--------|
| **Performance** | | | |
| Allocation latency | 5-6 ns | 1-2 ns | Free List (3-4 ns faster) |
| Cache efficiency | 2 cache lines | 1 cache line | Free List |
| Branch mispredicts | 1-2 per alloc | 0-1 per alloc | Free List |
| **Memory** | | | |
| Metadata overhead | 1-2% | ~0.1% | Free List |
| Block size impact | +128B per slab | +8B per page | Free List |
| **Diagnostics** | | | |
| Observability | Full state visible | Opaque (count only) | Bitmap |
| Debugging | Easy (zeroed free) | Hard (pointer corruption) | Bitmap |
| Statistics | O(1) queries | O(n) traversal | Bitmap |
| Profiling | Per-block tracking | External hash table | Bitmap |
| **Flexibility** | | | |
| Allocation policy | Pluggable (first-fit, best-fit, etc.) | LIFO only | Bitmap |
| Research | Easy experimentation | Fixed design | Bitmap |
| Monitoring | Non-intrusive scanning | Requires external counters | Bitmap |
| **Safety** | | | |
| Use-after-free detection | Good (zeroed memory) | Poor (pointer corruption) | Bitmap |
| ASAN/Valgrind integration | Excellent | Limited | Bitmap |
| Cross-thread validation | Easy | Requires external state | Bitmap |
| **Complexity** | | | |
| Code size | ~100 lines | ~20 lines | Free List |
| Maintainability | Moderate | High | Free List |
| Optimization potential | Limited (bitmap scan) | High (2 pointers) | Free List |

**Overall**:
- **Production speed**: Free List wins (3-4 ns faster, simpler)
- **Research/diagnostics**: Bitmap wins (visibility, flexibility, safety)

---

## 6. Hybrid Approaches

### Option 1: Dual-Mode Allocator

```c
#ifdef HAKMEM_DIAGNOSTIC_MODE
// Bitmap mode (slow but visible)
void* alloc() { return alloc_bitmap(); }
#else
// Free list mode (fast production)
void* alloc() { return alloc_freelist(); }
#endif
```

**Pros**: Best of both worlds
**Cons**: Maintenance burden (two code paths)

### Option 2: Shadow Bitmap

```c
// Fast path: Free list
Block* b = page->local_free;
page->local_free = b->next;

// Diagnostic path: Update shadow bitmap (async)
if (unlikely(diagnostic_enabled)) {
    shadow_bitmap_record(page, b);   // Non-blocking queue
}
```

**Pros**: Fast path unaffected, diagnostics available
**Cons**: Shadow state may lag, memory overhead

### Option 3: Adaptive Strategy

```c
// Use bitmap for slabs with high churn (diagnostic value)
// Use free list for stable slabs (performance critical)
if (slab->churn_rate > THRESHOLD) {
    use_bitmap_mode(slab);
} else {
    use_freelist_mode(slab);
}
```

**Pros**: Dynamic optimization
**Cons**: Complex, runtime overhead

---

## 7. Recommendations for hakmem

### Context: hakmem's Goals (from ANALYSIS_SUMMARY.md)

> **hakmem's Philosophy** (research PoC):
> - "Flexible architecture: research platform for learning"
> - "Trade performance for visibility (ownership tracking, per-class stats)"
> - "Novel features: call-site profiling, ELO learning, evolution tracking"

### Recommendation: **Keep Bitmap for Tiny Pool**

**Reasons**:
1. ✅ **Research value**: hakmem's ELO learning, call-site profiling **require** per-block tracking
2. ✅ **Diagnostics**: Ownership tracking, CDA decision-making benefit from bitmap visibility
3. ✅ **Trade-off is acceptable**: 5-6 ns overhead is worth the flexibility for a research allocator
4. ⚠️ **But optimize around it**: Remove statistics overhead, simplify the hot path (my original P1-P2)

### Alternative: **Adopt Free List for Tiny Pool**

**Reasons**:
1. ✅ **Performance**: Closes 3-4 ns of the 69 ns gap
2. ✅ **Proven**: mimalloc's design is battle-tested
3. ✅ **Simplicity**: Easier to maintain, optimize
4. ⚠️ **But lose research features**: Must find alternative ways to track per-block metadata

### Compromise: **Hybrid Approach**

**Proposal**:

```c
// Fast path: Free list (mimalloc-style)
void* tiny_alloc_fast(Page* p) {
    Block* b = p->local_free;
    if (likely(b)) {
        p->local_free = b->next;
        return b;
    }
    return tiny_alloc_slow(p);
}

// Diagnostic mode: Enable shadow bitmap
#ifdef HAKMEM_DIAGNOSTIC_MODE
void* tiny_alloc_slow(Page* p) {
    void* ptr = refill_from_partial(p);
    diagnostic_record_alloc(p, ptr);   // Async, non-blocking
    return ptr;
}
#endif
```

**Benefits**:
- Fast path: 1-2 ns (mimalloc speed)
- Diagnostic mode: Optional bitmap tracking (research features)
- Production mode: Zero overhead

---

## 8. Decision Matrix

| Priority | Bitmap | Free List | Hybrid |
|----------|--------|-----------|--------|
| **Speed is #1 goal** | ❌ | ✅ | ✅ |
| **Research/diagnostics #1** | ✅ | ❌ | ⚠️ (complex) |
| **Simplicity #1** | ⚠️ | ✅ | ❌ |
| **Memory efficiency #1** | ❌ | ✅ | ⚠️ |
| **Flexibility #1** | ✅ | ❌ | ✅ |

**For hakmem specifically**:
- If **goal = beat mimalloc**: Free List
- If **goal = research platform**: Bitmap
- If **goal = both**: Hybrid (complex but feasible)

---

## 9. Conclusion

### The Fundamental Tradeoff

**Bitmap = Observatory, Free List = Race Car**

- **Bitmap**: Sacrifices 3-4 ns for complete visibility and flexibility
- **Free List**: Sacrifices observability for raw speed

### For hakmem's Context

Based on ANALYSIS_SUMMARY.md, hakmem's goals include:
- "Call-site profiling" → **Requires per-block tracking** → Bitmap advantage
- "ELO learning" → **Requires allocation history** → Bitmap advantage
- "Evolution tracking" → **Requires observability** → Bitmap advantage

**Verdict**: **Bitmap is the right choice for hakmem's research goals**

### But Optimize Around It

Instead of abandoning the bitmap:
1. ✅ **Remove statistics overhead** (ChatGPT Pro's P1) → +10ns
2. ✅ **Simplify hot path** (my original P1-P2) → +15ns
3. ✅ **Keep bitmap** → Preserve research features

**Expected**: 83ns → 58-65ns (still ~4x slower than mimalloc, but research features intact)

---

**Last Updated**: 2025-10-26
**Status**: Analysis complete
**Next**: Decide strategy based on project priorities