# Phase 2a: SuperSlab Dynamic Expansion Implementation

**Date**: 2025-11-08
**Priority**: 🔴 CRITICAL - BLOCKING 100% stability
**Estimated Effort**: 7-10 days
**Status**: Ready for implementation

---

## Executive Summary

**Problem**: SuperSlab uses fixed 32-slab array → OOM under 4T high-contention
**Solution**: Implement mimalloc-style chunk linking → slab capacity grows on demand, bounded only by available memory
**Expected Result**: 50% → 100% stability (20/20 success rate)

---

## Current Architecture (BROKEN)

### File: `core/superslab/superslab_types.h:82`

```c
typedef struct SuperSlab {
    Slab slabs[SLABS_PER_SUPERSLAB_MAX];  // ← FIXED 32 slabs! Cannot grow!
    uint32_t bitmap;                       // ← 32 bits = 32 slabs max
    size_t total_active_blocks;
    int class_idx;
    // ...
} SuperSlab;
```

### Why This Fails

**4T high-contention scenario**:
```
Thread 1: allocates from slabs[0-7]   → bitmap bits 0-7 = 0
Thread 2: allocates from slabs[8-15]  → bitmap bits 8-15 = 0
Thread 3: allocates from slabs[16-23] → bitmap bits 16-23 = 0
Thread 4: allocates from slabs[24-31] → bitmap bits 24-31 = 0

→ bitmap = 0x00000000 (all slabs busy)
→ superslab_refill() returns NULL
→ OOM → malloc fallback (now disabled) → CRASH
```

**Evidence from logs**:
```
[DEBUG] superslab_refill returned NULL (OOM) detail:
  class=4 prev_ss=(nil) active=0 bitmap=0x00000000
  prev_meta=(nil) used=0 cap=0 slab_idx=0
  reused_freelist=0 free_idx=-2 errno=12
```
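
To make the failure mode concrete, here is a minimal single-threaded sketch of the fixed-bitmap behavior. `ToySuperSlab`, `toy_claim_slab`, and `toy_simulate` are hypothetical names used only for illustration; the sketch assumes, as in the scenario above, that a set bit means a free slab.

```c
#include <stdint.h>

/* Toy model of the fixed design: one 32-bit bitmap, bit = 1 means "slab is free". */
typedef struct {
    uint32_t bitmap;
} ToySuperSlab;

/* Claim the lowest free slab; returns its index, or -1 when all 32 are busy. */
static int toy_claim_slab(ToySuperSlab *ss) {
    if (ss->bitmap == 0) {
        return -1;                          /* the condition reported as OOM */
    }
    int idx = __builtin_ctz(ss->bitmap);    /* GCC/Clang builtin, as used in this document */
    ss->bitmap &= ~(UINT32_C(1) << idx);
    return idx;
}

/* Simulate `threads` threads that each hold `per_thread` slabs at once.
 * Returns the number of claims that failed. */
static int toy_simulate(int threads, int per_thread) {
    ToySuperSlab ss = { .bitmap = UINT32_MAX };
    int failures = 0;
    for (int t = 0; t < threads; t++) {
        for (int s = 0; s < per_thread; s++) {
            if (toy_claim_slab(&ss) < 0) {
                failures++;
            }
        }
    }
    return failures;
}
```

With 4 simulated threads holding 8 slabs each, all 32 claims succeed. As soon as each thread wants a ninth slab, the four extra claims fail, and nothing in the fixed layout can satisfy them.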

---

## Proposed Architecture (mimalloc-style)

### Design Pattern: Linked Chunks

**Inspiration**: mimalloc uses linked segments, jemalloc uses linked chunks

```c
typedef struct SuperSlabChunk {
    Slab slabs[32];                    // Fixed 32 slabs per chunk
    struct SuperSlabChunk* next;       // ← Link to next chunk
    uint32_t bitmap;                   // 32 bits for this chunk's slabs
    size_t total_active_blocks;        // Active blocks in this chunk
    int class_idx;
} SuperSlabChunk;

typedef struct SuperSlabHead {
    SuperSlabChunk* first_chunk;       // Head of chunk list
    SuperSlabChunk* current_chunk;     // Current chunk for allocation
    size_t total_chunks;               // Total chunks allocated
    int class_idx;
    pthread_mutex_t expansion_lock;    // Protect chunk list expansion
} SuperSlabHead;
```

### Allocation Flow

```
1. superslab_refill() called
   ↓
2. Try current_chunk
   ↓
3. bitmap == 0x00000000? (all slabs busy)
   ↓ YES
4. Try current_chunk->next
   ↓ NULL (no next chunk)
5. Allocate new chunk via mmap
   ↓
6. current_chunk->next = new_chunk
   ↓
7. current_chunk = new_chunk
   ↓
8. Refill from new_chunk
   ↓ SUCCESS
9. Return blocks to caller
```
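
The nine steps above can be sketched as a small single-threaded model. This is an illustration under stated assumptions, not the HAKMEM implementation: `ToyChunk`, `ToyHead`, `toy_expand`, `toy_refill`, and `toy_destroy` are hypothetical names, `calloc` stands in for `mmap`, and a "refill" is reduced to claiming one slab index.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyChunk {
    struct ToyChunk *next;
    uint32_t bitmap;                /* bit = 1: slab is free */
} ToyChunk;

typedef struct {
    ToyChunk *first_chunk;
    ToyChunk *current_chunk;
    size_t total_chunks;
} ToyHead;

/* Steps 5-7: allocate a chunk (calloc stands in for mmap) and link it at the tail. */
static ToyChunk *toy_expand(ToyHead *head) {
    ToyChunk *chunk = calloc(1, sizeof *chunk);
    if (!chunk) return NULL;
    chunk->bitmap = UINT32_MAX;
    if (head->current_chunk) {
        ToyChunk *tail = head->current_chunk;
        while (tail->next) tail = tail->next;
        tail->next = chunk;
    } else {
        head->first_chunk = chunk;
    }
    head->current_chunk = chunk;
    head->total_chunks++;
    return chunk;
}

/* Steps 1-9: claim one slab, walking forward and expanding when needed.
 * Returns the slab index within the chosen chunk, or -1 if expansion fails. */
static int toy_refill(ToyHead *head) {
    ToyChunk *chunk = head->current_chunk;
    while (chunk && chunk->bitmap == 0) {       /* steps 3-4: skip exhausted chunks */
        chunk = chunk->next;
    }
    if (!chunk) {
        chunk = toy_expand(head);               /* steps 5-7 */
        if (!chunk) return -1;
    }
    head->current_chunk = chunk;
    int idx = __builtin_ctz(chunk->bitmap);     /* step 8 */
    chunk->bitmap &= ~(UINT32_C(1) << idx);
    return idx;                                 /* step 9 */
}

/* Release every chunk in the list. */
static void toy_destroy(ToyHead *head) {
    ToyChunk *chunk = head->first_chunk;
    while (chunk) {
        ToyChunk *next = chunk->next;
        free(chunk);
        chunk = next;
    }
    head->first_chunk = head->current_chunk = NULL;
    head->total_chunks = 0;
}
```

Starting from an empty head, the first 32 calls to `toy_refill()` drain one chunk; the 33rd finds no chunk with free slabs, links a second chunk, and succeeds from it.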

### Visual Representation

```
Before (BROKEN):
┌─────────────────────────────────┐
│ SuperSlab (2MB)                 │
│ slabs[32] ← FIXED!              │
│ [0][1][2]...[31]                │
│ bitmap = 0x00000000 → OOM 💥    │
└─────────────────────────────────┘

After (DYNAMIC):
┌─────────────────────────────────────────────────┐
│ SuperSlabHead                                   │
│ ├─ first_chunk ───┐                             │
│ └─ current_chunk ─│───────────────────────┐     │
└───────────────────│───────────────────────│─────┘
                    ▼                       ▼
          ┌─────────────────────┐      ┌─────────────────────┐
          │ Chunk 1 (2MB)       │ ───► │ Chunk 2 (2MB)       │ ───► ...
          │ slabs[32]           │ next │ slabs[32]           │ next
          │ bitmap=0x00000000   │      │ bitmap=0xFFFFFFFF   │
          └─────────────────────┘      └─────────────────────┘
           (all busy)                   (has free slabs!)
```

---

## Implementation Tasks

### Task 1: Define New Data Structures (2-3 hours)

**File**: `core/superslab/superslab_types.h`

**Changes**:

1. **Rename existing `SuperSlab` → `SuperSlabChunk`**:
```c
typedef struct SuperSlabChunk {
    Slab slabs[32];                    // Keep 32 slabs per chunk
    struct SuperSlabChunk* next;       // NEW: Link to next chunk
    uint32_t bitmap;
    size_t total_active_blocks;
    int class_idx;

    // Existing fields...
} SuperSlabChunk;
```

2. **Add new `SuperSlabHead`**:
```c
typedef struct SuperSlabHead {
    SuperSlabChunk* first_chunk;       // Head of chunk list
    SuperSlabChunk* current_chunk;     // Current chunk for fast allocation
    size_t total_chunks;               // Total chunks in list
    int class_idx;

    // Thread safety
    pthread_mutex_t expansion_lock;    // Protect chunk list expansion
} SuperSlabHead;
```

3. **Update global registry**:
```c
// Before:
extern SuperSlab* g_superslab_registry[MAX_SUPERSLABS];

// After:
extern SuperSlabHead* g_superslab_heads[TINY_NUM_CLASSES];
```

---

### Task 2: Implement Chunk Allocation (3-4 hours)

**File**: `core/superslab/superslab_alloc.c` (new file or add to existing)

**Function 1: Allocate new chunk**:
```c
// Allocate a new SuperSlabChunk via mmap
static SuperSlabChunk* alloc_new_chunk(int class_idx) {
    size_t chunk_size = SUPERSLAB_SIZE;  // 2MB

    // mmap new chunk
    void* raw = mmap(NULL, chunk_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) {
        fprintf(stderr, "[HAKMEM] CRITICAL: Failed to mmap new SuperSlabChunk for class %d (errno=%d)\n",
                class_idx, errno);
        return NULL;
    }

    // Initialize chunk structure
    SuperSlabChunk* chunk = (SuperSlabChunk*)raw;
    chunk->next = NULL;
    chunk->bitmap = 0xFFFFFFFF;  // All 32 slabs available
    chunk->total_active_blocks = 0;
    chunk->class_idx = class_idx;

    // Initialize slabs
    size_t block_size = class_to_size(class_idx);
    init_slabs_in_chunk(chunk, block_size);

    return chunk;
}
```

**Function 2: Link new chunk to head**:
```c
// Expand SuperSlabHead by linking new chunk
static int expand_superslab_head(SuperSlabHead* head) {
    if (!head) return -1;

    // Allocate new chunk
    SuperSlabChunk* new_chunk = alloc_new_chunk(head->class_idx);
    if (!new_chunk) {
        return -1;  // True OOM (system out of memory)
    }

    // Thread-safe linking
    pthread_mutex_lock(&head->expansion_lock);

    if (head->current_chunk) {
        // Link at end of list
        SuperSlabChunk* tail = head->current_chunk;
        while (tail->next) {
            tail = tail->next;
        }
        tail->next = new_chunk;
    } else {
        // First chunk
        head->first_chunk = new_chunk;
    }

    // Update current chunk to new chunk
    head->current_chunk = new_chunk;
    size_t total = ++head->total_chunks;  // Snapshot under the lock for logging

    pthread_mutex_unlock(&head->expansion_lock);

    fprintf(stderr, "[HAKMEM] Expanded SuperSlabHead for class %d: %zu chunks now\n",
            head->class_idx, total);

    return 0;
}
```
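
One point the function above leaves open: if two threads see the same exhausted chunk at the same time, both call `expand_superslab_head()` and two chunks are appended where one would do. A possible refinement, sketched here under assumptions, is to pass in the chunk the caller saw as exhausted and re-check it under the lock. `DemoChunk`, `DemoHead`, `demo_head_init`, and `demo_expand_if_needed` are hypothetical names, `calloc` stands in for `mmap`, and `current_chunk` is assumed to always be the tail of the list.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DemoChunk {
    struct DemoChunk *next;
    uint32_t bitmap;                 /* bit = 1: slab is free */
} DemoChunk;

typedef struct {
    DemoChunk *first_chunk;
    DemoChunk *current_chunk;        /* assumed to be the tail of the list */
    size_t total_chunks;
    pthread_mutex_t expansion_lock;
} DemoHead;

static void demo_head_init(DemoHead *head) {
    head->first_chunk = NULL;
    head->current_chunk = NULL;
    head->total_chunks = 0;
    pthread_mutex_init(&head->expansion_lock, NULL);
}

/* `seen` is the chunk the caller found exhausted (NULL if the list was empty).
 * Returns 0 when a newer current_chunk exists afterwards, -1 on allocation failure. */
static int demo_expand_if_needed(DemoHead *head, DemoChunk *seen) {
    int rc = 0;
    pthread_mutex_lock(&head->expansion_lock);
    if (head->current_chunk == seen) {          /* nobody expanded since we looked */
        DemoChunk *chunk = calloc(1, sizeof *chunk);
        if (!chunk) {
            rc = -1;
        } else {
            chunk->bitmap = UINT32_MAX;
            if (seen) {
                seen->next = chunk;             /* seen is the tail by assumption */
            } else {
                head->first_chunk = chunk;
            }
            head->current_chunk = chunk;
            head->total_chunks++;
        }
    }
    /* Otherwise another thread already linked a chunk; the caller just retries. */
    pthread_mutex_unlock(&head->expansion_lock);
    return rc;
}
```

The trade-off in this sketch is that the allocation happens while the lock is held, which keeps the logic simple but serializes expansion. Allocating outside the lock and discarding the chunk after losing the race is the alternative.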

---

### Task 3: Update Refill Logic (4-5 hours)

**File**: `core/tiny_superslab_alloc.inc.h` or wherever `superslab_refill()` is defined

**Modify `superslab_refill()` to try all chunks**:

```c
// Before (BROKEN):
void* superslab_refill(int class_idx, int count) {
    SuperSlab* ss = get_superslab_for_class(class_idx);
    if (!ss) return NULL;

    if (ss->bitmap == 0x00000000) {
        // All slabs busy → OOM!
        return NULL;  // ← CRASH HERE
    }

    // Try to refill from this SuperSlab
    return refill_from_superslab(ss, count);
}

// After (DYNAMIC):
void* superslab_refill(int class_idx, int count) {
    SuperSlabHead* head = g_superslab_heads[class_idx];
    if (!head) {
        // Initialize head for this class (first time)
        head = init_superslab_head(class_idx);
        if (!head) return NULL;
        g_superslab_heads[class_idx] = head;
    }

    SuperSlabChunk* chunk = head->current_chunk;

    // Try current chunk first (fast path)
    if (chunk && chunk->bitmap != 0x00000000) {
        return refill_from_chunk(chunk, count);
    }

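    // Current chunk exhausted: reuse an older chunk whose slabs have been freed
    // (chunks are only appended, never unlinked, so the walk follows stable links)
    for (chunk = head->first_chunk; chunk; chunk = chunk->next) {
        if (chunk->bitmap != 0x00000000) {
            return refill_from_chunk(chunk, count);
        }
    }

    // Every chunk is exhausted, try to expand
    fprintf(stderr, "[DEBUG] SuperSlabChunk exhausted for class %d (bitmap=0x00000000), expanding...\n", class_idx);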

    if (expand_superslab_head(head) < 0) {
        fprintf(stderr, "[HAKMEM] CRITICAL: Failed to expand SuperSlabHead for class %d\n", class_idx);
        return NULL;  // True system OOM
    }

    // Retry refill from new chunk
    chunk = head->current_chunk;
    if (!chunk || chunk->bitmap == 0x00000000) {
        fprintf(stderr, "[HAKMEM] CRITICAL: New chunk still has no free slabs for class %d\n", class_idx);
        return NULL;
    }

    return refill_from_chunk(chunk, count);
}
```

**Helper function**:
```c
// Refill from a specific chunk
static void* refill_from_chunk(SuperSlabChunk* chunk, int count) {
    if (!chunk || chunk->bitmap == 0x00000000) return NULL;

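    // Use existing P0 optimization (ctz-based slab selection)
    void* refilled = NULL;  // Placeholder: set by the existing refill logic
    uint32_t mask = chunk->bitmap;
    while (mask && count > 0) {
        int slab_idx = __builtin_ctz(mask);
        mask &= ~(1u << slab_idx);

        Slab* slab = &chunk->slabs[slab_idx];
        // Try to acquire slab and refill
        // ... existing refill logic (must decrement count as blocks are taken,
        //     otherwise the loop only ends once mask reaches 0)
        (void)slab;
    }

    return refilled;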
}
```
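
The helper reads `chunk->bitmap` as a plain `uint32_t`, and this document does not show how a slab bit is actually claimed when several threads refill from the same chunk. As a sketch only, assuming the bitmap were declared `_Atomic`, a claim could use a C11 compare-and-swap loop. `claim_lowest_free_slab` and `release_slab` are hypothetical names, not existing HAKMEM functions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically claim the lowest set bit of *bitmap (bit = 1 means "slab is free").
 * Returns the claimed slab index, or -1 if no bit is set. */
static int claim_lowest_free_slab(_Atomic uint32_t *bitmap) {
    uint32_t old = atomic_load_explicit(bitmap, memory_order_relaxed);
    while (old != 0) {
        int idx = __builtin_ctz(old);
        uint32_t desired = old & ~(UINT32_C(1) << idx);
        if (atomic_compare_exchange_weak_explicit(bitmap, &old, desired,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            return idx;             /* this thread now owns slab idx */
        }
        /* The failed CAS stored the current value in `old`; retry with it. */
    }
    return -1;
}

/* Mark a slab as free again. */
static void release_slab(_Atomic uint32_t *bitmap, int idx) {
    atomic_fetch_or_explicit(bitmap, UINT32_C(1) << idx, memory_order_release);
}
```

Because the loop recomputes the lowest free bit after every failed exchange, two threads racing on the same bitmap end up with different slab indices rather than the same one.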

---

### Task 4: Update Initialization (2-3 hours)

**File**: `core/hakmem_tiny.c` or initialization code

**Modify `hak_tiny_init()`**:

```c
static SuperSlabHead* init_superslab_head(int class_idx);  // Defined below

void hak_tiny_init(void) {
    // Initialize SuperSlabHead for each class
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        SuperSlabHead* head = init_superslab_head(class_idx);
        if (!head) {
            fprintf(stderr, "[HAKMEM] CRITICAL: Failed to initialize SuperSlabHead for class %d\n", class_idx);
            abort();
        }
        g_superslab_heads[class_idx] = head;
    }
}

// Initialize SuperSlabHead with initial chunk(s)
static SuperSlabHead* init_superslab_head(int class_idx) {
    SuperSlabHead* head = calloc(1, sizeof(SuperSlabHead));
    if (!head) return NULL;

    head->class_idx = class_idx;
    head->total_chunks = 0;
    pthread_mutex_init(&head->expansion_lock, NULL);

    // Allocate initial chunk(s)
    int initial_chunks = 1;

    // Hot classes (1, 4, 6) get 2 initial chunks
    if (class_idx == 1 || class_idx == 4 || class_idx == 6) {
        initial_chunks = 2;
    }

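    for (int i = 0; i < initial_chunks; i++) {
        if (expand_superslab_head(head) < 0) {
            fprintf(stderr, "[HAKMEM] CRITICAL: Failed to allocate initial chunk %d for class %d\n", i, class_idx);
            // Unmap chunks linked so far so a partial init does not leak them
            for (SuperSlabChunk* c = head->first_chunk; c; ) {
                SuperSlabChunk* next = c->next;
                munmap(c, SUPERSLAB_SIZE);
                c = next;
            }
            pthread_mutex_destroy(&head->expansion_lock);
            free(head);
            return NULL;
        }
    }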

    return head;
}
```
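
Task 3's `superslab_refill()` also creates a missing head lazily, and two threads reaching that branch together would each build a head and overwrite the other's pointer. If lazy initialization is kept alongside `hak_tiny_init()`, one way to publish the head safely is a compare-and-swap on the slot. The sketch below is an assumption-laden illustration: `LazyHead`, `g_lazy_heads`, `lazy_get_head`, and `LAZY_NUM_CLASSES` are hypothetical stand-ins, and the head is reduced to a single field.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define LAZY_NUM_CLASSES 8              /* stand-in for TINY_NUM_CLASSES */

typedef struct {
    int class_idx;                      /* a real head would also own chunks and a lock */
} LazyHead;

/* Static storage is zero-initialized, so every slot starts out NULL. */
static _Atomic(LazyHead *) g_lazy_heads[LAZY_NUM_CLASSES];

/* Return the head for class_idx, creating it on first use.
 * Returns NULL for an out-of-range class or on allocation failure. */
static LazyHead *lazy_get_head(int class_idx) {
    if (class_idx < 0 || class_idx >= LAZY_NUM_CLASSES) return NULL;

    LazyHead *head = atomic_load_explicit(&g_lazy_heads[class_idx],
                                          memory_order_acquire);
    if (head) return head;              /* fast path: already published */

    LazyHead *fresh = calloc(1, sizeof *fresh);
    if (!fresh) return NULL;
    fresh->class_idx = class_idx;

    LazyHead *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(&g_lazy_heads[class_idx],
                                                &expected, fresh,
                                                memory_order_acq_rel,
                                                memory_order_acquire)) {
        return fresh;                   /* this thread published the head */
    }
    free(fresh);                        /* lost the race: discard our copy */
    return expected;                    /* the failed CAS stored the winner here */
}
```

The losing thread pays for one wasted allocation, which is acceptable for a structure this small. For a head that maps 2MB chunks during setup, initializing every class up front in `hak_tiny_init()` avoids the wasted work entirely.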

---

### Task 5: Update Free Path (2-3 hours)

**File**: `core/hakmem_tiny_free.inc` or free path code

**Modify free to find correct chunk**:

```c
void hak_tiny_free(void* ptr) {
    if (!ptr) return;

    // Determine class_idx from header or registry
    int class_idx = get_class_idx_for_ptr(ptr);
    if (class_idx < 0) {
        fprintf(stderr, "[HAKMEM] Invalid free: ptr=%p not in any SuperSlab\n", ptr);
        return;
    }

    // Find which chunk this ptr belongs to
    SuperSlabHead* head = g_superslab_heads[class_idx];
    if (!head) {
        fprintf(stderr, "[HAKMEM] Invalid free: no SuperSlabHead for class %d\n", class_idx);
        return;
    }

    SuperSlabChunk* chunk = head->first_chunk;
    while (chunk) {
        // Check if ptr is within this chunk's memory range
        uintptr_t chunk_start = (uintptr_t)chunk;
        uintptr_t chunk_end = chunk_start + SUPERSLAB_SIZE;
        uintptr_t ptr_addr = (uintptr_t)ptr;

        if (ptr_addr >= chunk_start && ptr_addr < chunk_end) {
            // Found the chunk, free to it
            free_to_chunk(chunk, ptr);
            return;
        }

        chunk = chunk->next;
    }

    fprintf(stderr, "[HAKMEM] Invalid free: ptr=%p not found in any chunk for class %d\n", ptr, class_idx);
}
```
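
The list walk above makes every free O(number of chunks) for that class. A common alternative in segment-based allocators is to map each chunk at an address aligned to its own size, so the owning chunk can be recovered from any interior pointer with a single mask. The sketch below assumes that design; plain `mmap` only guarantees page alignment, so it over-allocates and trims. `map_aligned_chunk`, `chunk_base_of`, and `DEMO_CHUNK_SIZE` are hypothetical names, with `DEMO_CHUNK_SIZE` standing in for `SUPERSLAB_SIZE`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1               /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define DEMO_CHUNK_SIZE ((size_t)2 << 20)   /* 2MB, stand-in for SUPERSLAB_SIZE */

/* Map `size` bytes at an address aligned to `size` (a power of two that is a
 * multiple of the page size) by mapping twice as much and trimming the excess. */
static void *map_aligned_chunk(size_t size) {
    size_t span = size * 2;
    uint8_t *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + size - 1) & ~((uintptr_t)size - 1);
    size_t lead = (size_t)(aligned - addr);     /* unused bytes before the chunk */
    size_t trail = span - lead - size;          /* unused bytes after the chunk */

    if (lead) munmap(raw, lead);
    if (trail) munmap((uint8_t *)aligned + size, trail);
    return (void *)aligned;
}

/* Recover the chunk base from any pointer inside an aligned chunk. */
static void *chunk_base_of(const void *ptr) {
    return (void *)((uintptr_t)ptr & ~((uintptr_t)DEMO_CHUNK_SIZE - 1));
}
```

The mask is only meaningful for pointers already known to come from a chunk, so a validity check such as the `class_idx` lookup at the top of `hak_tiny_free()` is still needed before trusting the result.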

---

### Task 6: Update Registry (3-4 hours)

**File**: Registry code (wherever SuperSlab registry is managed)

**Replace flat registry with per-class heads**:

```c
// Before:
SuperSlab* g_superslab_registry[MAX_SUPERSLABS];

// After:
SuperSlabHead* g_superslab_heads[TINY_NUM_CLASSES];
```

**Update registry lookup**:

```c
// Before:
SuperSlab* find_superslab_for_ptr(void* ptr) {
    for (int i = 0; i < MAX_SUPERSLABS; i++) {
        SuperSlab* ss = g_superslab_registry[i];
        if (ptr_in_range(ptr, ss)) return ss;
    }
    return NULL;
}

// After:
SuperSlabChunk* find_chunk_for_ptr(void* ptr, int* out_class_idx) {
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        SuperSlabHead* head = g_superslab_heads[class_idx];
        if (!head) continue;

        SuperSlabChunk* chunk = head->first_chunk;
        while (chunk) {
            if (ptr_in_chunk_range(ptr, chunk)) {
                if (out_class_idx) *out_class_idx = class_idx;
                return chunk;
            }
            chunk = chunk->next;
        }
    }
    return NULL;
}
```

---

## Testing Strategy

### Test 1: Build Verification

```bash
# Rebuild with new architecture
make clean
make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 larson_hakmem

# Check for compilation errors
echo $?  # Should be 0
```

### Test 2: Single-Thread Stability

```bash
# Should work perfectly (no change in behavior)
./larson_hakmem 1 1 128 1024 1 12345 1

# Expected: 2.68-2.71M ops/s (no regression)
```

### Test 3: 4T High-Contention (CRITICAL)

```bash
# Run 20 times, count successes
success=0
for i in {1..20}; do
  echo "=== Run $i ==="
  env HAKMEM_TINY_USE_SUPERSLAB=1 HAKMEM_TINY_MEM_DIET=0 \
    ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1 | tee phase2a_run_$i.log

  if grep -q "Throughput" phase2a_run_$i.log; then
    success=$((success + 1))
    echo "✓ Success ($success/20)"
  else
    echo "✗ Failed"
  fi
done

echo "Final: $success/20 success rate"

# TARGET: 20/20 (100%)
# Current baseline: 10/20 (50%)
```

### Test 4: Chunk Expansion Verification

```bash
# Enable debug logging
HAKMEM_LOG=1 ./larson_hakmem 10 8 128 1024 1 12345 4 2>&1 | grep "Expanded SuperSlabHead"

# Should see:
# [HAKMEM] Expanded SuperSlabHead for class 4: 2 chunks now
# [HAKMEM] Expanded SuperSlabHead for class 4: 3 chunks now
# ...
```

### Test 5: Memory Leak Check

```bash
# Valgrind test (may be slow)
valgrind --leak-check=full --show-leak-kinds=all \
  ./larson_hakmem 1 1 128 1024 1 12345 1 2>&1 | tee valgrind_phase2a.log

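# Check for leaks (Memcheck only tracks malloc-family allocations, such as the
# calloc'd SuperSlabHead structs; mmap'd chunks do not appear in this report)
grep "definitely lost" valgrind_phase2a.log
# Should report 0 bytes, or print nothing if Valgrind found no leaks at all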
```

---

## Success Criteria

- ✅ **Compilation**: No errors, no warnings
- ✅ **Single-thread**: 2.68-2.71M ops/s (no regression)
- ✅ **4T stability**: **20/20 (100%)** ← KEY METRIC
- ✅ **Chunk expansion**: Logs show multiple chunks allocated
- ✅ **No memory leaks**: Valgrind clean
- ✅ **Performance**: 4T throughput ≥981K ops/s (the throughput of runs that currently succeed)

---

## Deliverable

**Report file**: `/mnt/workdisk/public_share/hakmem/PHASE2A_IMPLEMENTATION_REPORT.md`

**Required sections**:
1. **Architecture changes** (SuperSlab → SuperSlabChunk + SuperSlabHead)
2. **Code diffs** (all modified files)
3. **Test results** (20/20 stability test)
4. **Performance comparison** (before/after)
5. **Chunk expansion behavior** (how many chunks allocated under load)
6. **Memory usage** (overhead per chunk, total memory)
7. **Production readiness** (YES/NO verdict)

---

## Files to Create/Modify

**New files**:
1. `core/superslab/superslab_alloc.c` - Chunk allocation functions

**Modified files**:
1. `core/superslab/superslab_types.h` - SuperSlabChunk + SuperSlabHead
2. `core/tiny_superslab_alloc.inc.h` - Refill logic with expansion
3. `core/hakmem_tiny_free.inc` - Free path with chunk lookup
4. `core/hakmem_tiny.c` - Initialization with SuperSlabHead
5. Registry code - Update to per-class heads

**Estimated LOC**: 500-800 lines (new code + modifications)

---

## Risk Mitigation

**Risk 1: Performance regression**
- Mitigation: Keep fast path (current_chunk) unchanged
- Single-chunk case should be identical to before

**Risk 2: Thread safety issues**
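- Mitigation: Hold `expansion_lock` only while linking a chunk; the refill and free paths read `current_chunk` and `next` without it, so those pointers must be published safely (for example, with release/acquire atomics)
- Slab-level atomics unchanged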

**Risk 3: Memory overhead**
- Each chunk: 2MB (same as before)
- SuperSlabHead: ~64 bytes per class
- Total overhead: negligible

**Risk 4: Complexity**
- Mitigation: Follow mimalloc pattern (proven design)
- Keep chunk size fixed (2MB) for simplicity

---

**Let's implement Phase 2a and achieve 100% stability! 🚀**