# Phase 6.15: Quick Reference Card

**Full Details**: See [PHASE_6.15_PLAN.md](PHASE_6.15_PLAN.md) (1008 lines)

---

## 📊 **The Problem**

```
Current State: hakmem is THREAD-UNSAFE

1-thread:  15.1M ops/sec ✅ Excellent
4-thread:   3.3M ops/sec ❌ -78% collapse!

Root Cause: no locking anywhere (grep pthread_mutex *.c → 0 results)
```

---

## 🎯 **The Solution (3 Steps)**

| Step | What | Time | Expected Result |
|------|------|------|----------------|
| **1** | Fix docs | 1h | 67.9M measurement error documented |
| **2** | P0 Safety Lock | 2-3h | 4T = 13-15M (safe, no scaling) |
| **3** | TLS Performance | 8-10h | 4T = 15-20M (TLS gave +381% in Phase 6.13) |

---

## 📋 **Step-by-Step Execution**

### **Day 1 Morning: Step 1 (1 hour)**

```bash
cd apps/experiments/hakmem-poc

# 1. Edit PHASE_6.14_COMPLETION_REPORT.md
# Add section explaining 67.9M measurement issue
# Add thread safety warning

# 2. Edit CURRENT_TASK.md
# Move Phase 6.14 to completed
# Add Phase 6.15 as current focus

# 3. Verify
grep "67.9M\|Thread Safety" PHASE_6.14_COMPLETION_REPORT.md
grep "Phase 6.15" CURRENT_TASK.md
```

---

### **Day 1 Afternoon: Step 2 - P0 Safety Lock (2-3 hours)**

#### **Implementation (30 min)**

**File**: `hakmem.c`

```c
// After line 22: add pthread.h
#include <pthread.h>

// After line 58: add the global lock (default, non-recursive mutex)
static pthread_mutex_t g_hakmem_lock = PTHREAD_MUTEX_INITIALIZER;
#define HAKMEM_LOCK()   pthread_mutex_lock(&g_hakmem_lock)
#define HAKMEM_UNLOCK() pthread_mutex_unlock(&g_hakmem_lock)

// Rename the existing bodies to *_internal. They must never call the
// public wrappers below, or the non-recursive lock will self-deadlock.
static void* hak_alloc_at_internal(size_t size, uintptr_t site_id);
static void  hak_free_at_internal(void* ptr, uintptr_t site_id);

// Wrap hak_alloc_at (find ~line 300-400)
void* hak_alloc_at(size_t size, uintptr_t site_id) {
    HAKMEM_LOCK();
    void* ptr = hak_alloc_at_internal(size, site_id);
    HAKMEM_UNLOCK();
    return ptr;
}

// Wrap hak_free_at
void hak_free_at(void* ptr, uintptr_t site_id) {
    if (!ptr) return;  // free(NULL) is a no-op; no lock needed
    HAKMEM_LOCK();
    hak_free_at_internal(ptr, site_id);
    HAKMEM_UNLOCK();
}
```
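Before editing `hakmem.c`, the wrapper pattern can be tried in isolation. The sketch below is hypothetical and self-contained: `toy_alloc_internal`/`toy_free_internal` stand in for the renamed `hak_*_internal` functions (here just `malloc`/`free` plus a shared counter), and `toy_stress` is an illustrative driver, not part of hakmem. One caveat carries over to the real change: `PTHREAD_MUTEX_INITIALIZER` gives a default, non-recursive mutex, so if an internal function re-enters a public wrapper (for example through an interposed `malloc` while LD_PRELOADed), relocking is undefined behavior per POSIX and on glibc typically deadlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* One global lock, mirroring the P0 design. */
static pthread_mutex_t g_toy_lock = PTHREAD_MUTEX_INITIALIZER;
#define TOY_LOCK()   pthread_mutex_lock(&g_toy_lock)
#define TOY_UNLOCK() pthread_mutex_unlock(&g_toy_lock)

/* Shared allocator state: a plain counter is enough to expose lost updates. */
static long g_live_blocks = 0;

/* Unsynchronized internals (hypothetical stand-ins for hak_*_internal). */
static void* toy_alloc_internal(size_t size, uintptr_t site_id) {
    (void)site_id;
    void* p = malloc(size);           /* placeholder backing store */
    if (p) g_live_blocks++;
    return p;
}

static void toy_free_internal(void* ptr, uintptr_t site_id) {
    (void)site_id;
    free(ptr);
    g_live_blocks--;
}

/* Public entry points: each call runs entirely under the global lock. */
void* toy_alloc_at(size_t size, uintptr_t site_id) {
    TOY_LOCK();
    void* ptr = toy_alloc_internal(size, site_id);
    TOY_UNLOCK();
    return ptr;
}

void toy_free_at(void* ptr, uintptr_t site_id) {
    if (!ptr) return;                 /* NULL needs no lock */
    TOY_LOCK();
    toy_free_internal(ptr, site_id);
    TOY_UNLOCK();
}

static void* toy_worker(void* arg) {
    int iters = *(const int*)arg;
    for (int i = 0; i < iters; i++) {
        void* p = toy_alloc_at(32, (uintptr_t)i);
        toy_free_at(p, (uintptr_t)i);
    }
    return NULL;
}

/* Runs nthreads (1..16) workers and returns the final live-block count. */
long toy_stress(int nthreads, int iters) {
    pthread_t tids[16];
    assert(nthreads >= 1 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, toy_worker, &iters);
        assert(rc == 0);
        (void)rc;                     /* silence unused warning under NDEBUG */
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tids[i], NULL);
    return g_live_blocks;             /* 0 when every update was serialized */
}
```

With the `TOY_LOCK()`/`TOY_UNLOCK()` pair removed, the 4-thread count can drift away from 0, which is the lost-update symptom the P0 lock is meant to rule out.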

#### **Testing (1.5 hours)**

```bash
# Build
make clean && make shared

# Test 1: larson 1T/4T (30 min)
cd /tmp/mimalloc-bench/bench/larson

# 1-thread
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 1
# Expected: 13-15M ops/sec

# 4-thread
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 4
# Expected: 13-15M ops/sec (same as 1T, no crashes!)

# Test 2: Helgrind (20 min)
# LD_PRELOAD must precede valgrind; valgrind keeps it for the client process
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
valgrind --tool=helgrind ./larson 0 8 1024 1000 1 12345 4
# Expected: ERROR SUMMARY: 0 errors

# Test 3: Stability (10 min)
for i in {1..10}; do
  LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
  ./larson 0 8 1024 10000 1 12345 4 || { echo "run $i failed"; break; }
done
# Expected: 10/10 runs succeed
```
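larson measures throughput but says little about corruption. A small, hypothetical smoke test (not part of hakmem or mimalloc-bench; `smoke_run` and its parameters are illustrative) can complement it: each thread fills its blocks with a per-thread byte and checks the pattern before freeing, so a block handed to two owners at once tends to show up as a non-zero count. Unlike larson it only frees on the allocating thread, so it does not cover cross-thread frees.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SMOKE_SLOTS 256                /* live blocks held per thread */

typedef struct {
    int id;                            /* thread index; id + 1 is the fill byte */
    int rounds;                        /* free-then-allocate replacements */
    long corrupted;                    /* blocks whose contents changed while owned */
} smoke_arg;

/* Tiny per-thread LCG so each thread's slot/size sequence is deterministic. */
static uint32_t smoke_next(uint32_t* s) {
    *s = *s * 1103515245u + 12345u;
    return *s >> 16;
}

static void* smoke_worker(void* p) {
    smoke_arg* a = p;
    unsigned char* blk[SMOKE_SLOTS] = {0};
    size_t len[SMOKE_SLOTS] = {0};
    uint32_t seed = (uint32_t)a->id + 1;
    unsigned char fill = (unsigned char)(a->id + 1);

    for (int r = 0; r < a->rounds; r++) {
        int i = (int)(smoke_next(&seed) % SMOKE_SLOTS);
        if (blk[i]) {
            /* Our pattern must be intact before the block goes back. */
            for (size_t k = 0; k < len[i]; k++) {
                if (blk[i][k] != fill) { a->corrupted++; break; }
            }
            free(blk[i]);
        }
        len[i] = 8 + smoke_next(&seed) % 1017;   /* 8..1024 bytes */
        blk[i] = malloc(len[i]);
        if (blk[i]) memset(blk[i], fill, len[i]);
    }
    for (int i = 0; i < SMOKE_SLOTS; i++) free(blk[i]);
    return NULL;
}

/* Runs nthreads (1..16) workers; returns the total corrupted-block count. */
long smoke_run(int nthreads, int rounds) {
    pthread_t tids[16];
    smoke_arg args[16];
    long total = 0;
    assert(nthreads >= 1 && nthreads <= 16);
    for (int t = 0; t < nthreads; t++) {
        args[t] = (smoke_arg){ .id = t, .rounds = rounds, .corrupted = 0 };
        int rc = pthread_create(&tids[t], NULL, smoke_worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].corrupted;
    }
    return total;
}
```

Call `smoke_run(4, 1000000)` from a small `main`, link with `-pthread`, and run the binary with the same LD_PRELOAD prefix as the larson runs; any non-zero result is a failure.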

#### **Documentation (15 min)**

Create `PHASE_6.15_P0_RESULTS.md` with benchmark results.

---

### **Day 2: Step 3 - P1 Tiny Pool TLS (2 hours)**

**File**: `hakmem_tiny.c`

**Pattern** (copy from `hakmem_l25_pool.c:26`):
```c
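// File scope: one cached slab per size class, per thread
static __thread TinySlab* tls_tiny_cache[TINY_NUM_CLASSES] = {NULL};

// Inside hak_tiny_alloc(), once class_idx is known.
// The P0 wrapper in hak_alloc_at() must no longer hold g_hakmem_lock
// around this path; otherwise the fast path is still serialized and the
// HAKMEM_LOCK() below self-deadlocks on the non-recursive mutex.
TinySlab* slab = tls_tiny_cache[class_idx];
if (slab && slab->free_count > 0) {
    // Fast path: no lock; safe only while no other thread can
    // modify this slab (e.g. via a free) at the same time
    return alloc_from_slab(slab, class_idx);
}

// TLS miss: refill from the global pool under the lock
HAKMEM_LOCK();
// ... refill logic: find or create a slab with free blocks ...
// tls_tiny_cache[class_idx] = <refilled slab>;
HAKMEM_UNLOCK();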
```

**Test**: larson 4T → expect 12-15M ops/sec
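A self-contained sketch of the same TLS-first, lock-on-refill idea, assuming a simple per-class free list instead of hakmem's `TinySlab` (all `toy_*` names, the four size classes, and the batch size of 32 are hypothetical). One design choice is deliberate: frees push onto the freeing thread's own cache rather than back into an owning slab, which sidesteps the race where a remote free and a lock-free fast-path allocation touch the same `free_count`. This is one option, not necessarily what `hakmem_tiny.c` should do.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NUM_CLASSES  4             /* hypothetical 16/32/64/128-byte classes */
#define TOY_REFILL_BATCH 32            /* blocks moved per locked refill */

typedef struct ToyBlock { struct ToyBlock* next; } ToyBlock;

static pthread_mutex_t g_pool_lock = PTHREAD_MUTEX_INITIALIZER;
static ToyBlock* g_pool[TOY_NUM_CLASSES];               /* shared, lock-protected */

static __thread ToyBlock* tls_cache[TOY_NUM_CLASSES];   /* per-thread, lock-free */
static __thread unsigned long tls_hits, tls_misses;

static size_t toy_class_size(int cls) { return (size_t)16 << cls; }

/* Slow path: under the lock, move up to one batch into this thread's cache. */
static void toy_refill(int cls) {
    pthread_mutex_lock(&g_pool_lock);
    for (int i = 0; i < TOY_REFILL_BATCH; i++) {
        ToyBlock* b = g_pool[cls];
        if (b) {
            g_pool[cls] = b->next;
        } else {
            b = malloc(toy_class_size(cls));   /* backing store for the sketch */
            if (!b) break;
        }
        b->next = tls_cache[cls];
        tls_cache[cls] = b;
    }
    pthread_mutex_unlock(&g_pool_lock);
}

void* toy_tiny_alloc(int cls) {
    ToyBlock* b = tls_cache[cls];
    if (b) {                           /* fast path: no lock */
        tls_cache[cls] = b->next;
        tls_hits++;
        return b;
    }
    tls_misses++;
    toy_refill(cls);                   /* one lock acquisition per batch */
    b = tls_cache[cls];
    if (b) tls_cache[cls] = b->next;
    return b;
}

/* Frees go to the *calling* thread's cache, so a remotely freed block just
 * migrates to that thread; no shared per-slab counter is touched. */
void toy_tiny_free(void* p, int cls) {
    if (!p) return;
    ToyBlock* b = p;
    b->next = tls_cache[cls];
    tls_cache[cls] = b;
}

typedef struct { int iters; unsigned long hits, misses; } toy_run;

static void* toy_tiny_worker(void* p) {
    toy_run* r = p;
    void* held[8];
    for (int i = 0; i < r->iters; i++) {
        for (int k = 0; k < 8; k++) held[k] = toy_tiny_alloc(k % TOY_NUM_CLASSES);
        for (int k = 0; k < 8; k++) toy_tiny_free(held[k], k % TOY_NUM_CLASSES);
    }
    r->hits = tls_hits;                /* this thread's counters only */
    r->misses = tls_misses;
    return NULL;
}

/* Runs nthreads (1..16) fresh threads, iters >= 1; returns the TLS hit rate. */
double toy_tiny_run(int nthreads, int iters) {
    pthread_t tids[16];
    toy_run runs[16];
    unsigned long hits = 0, misses = 0;
    assert(nthreads >= 1 && nthreads <= 16);
    for (int t = 0; t < nthreads; t++) {
        runs[t] = (toy_run){ .iters = iters };
        int rc = pthread_create(&tids[t], NULL, toy_tiny_worker, &runs[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        hits += runs[t].hits;
        misses += runs[t].misses;
    }
    return (double)hits / (double)(hits + misses);
}
```

Blocks still cached when a thread exits are leaked in this sketch; a fuller version would need a thread-exit flush (for example a `pthread_key_create` destructor) that returns them to `g_pool`.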

---

### **Day 3-4: P2 L2 Pool TLS (3 hours)**

**File**: `hakmem_pool.c`

**Same pattern** as Tiny Pool (above)

**Test**: larson 4T → expect 15-18M ops/sec

---

### **Day 5: P3 L2.5 Pool TLS (3 hours)**

**File**: `hakmem_l25_pool.c`

**Existing**: Line 26 already has `__thread L25Block* tls_l25_cache[5];`

**Add**: Refill/eviction logic in alloc/free functions

**Test**: larson 4T → expect 18-22M ops/sec
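L2.5 blocks are larger, so an unbounded TLS list could hoard memory in one thread. A minimal sketch of one refill/eviction policy, assuming a per-class cap and half-cap batch transfers to and from a locked global list; the `l25s_*` names, `L25S_TLS_CAP = 64`, and the counters are hypothetical, not taken from `hakmem_l25_pool.c`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define L25S_CLASSES 5                 /* mirrors tls_l25_cache[5] */
#define L25S_TLS_CAP 64                /* hypothetical per-class TLS limit */
#define L25S_BATCH   (L25S_TLS_CAP / 2)

typedef struct L25SBlock { struct L25SBlock* next; } L25SBlock;

static pthread_mutex_t g_l25s_lock = PTHREAD_MUTEX_INITIALIZER;
static L25SBlock* g_l25s_pool[L25S_CLASSES];      /* shared, lock-protected */
static size_t g_l25s_pool_count[L25S_CLASSES];

static __thread L25SBlock* tls_l25s[L25S_CLASSES];
static __thread size_t tls_l25s_count[L25S_CLASSES];

/* Refill: pull up to L25S_BATCH blocks from the global list under the lock. */
static void l25s_refill(int cls) {
    pthread_mutex_lock(&g_l25s_lock);
    while (g_l25s_pool[cls] && tls_l25s_count[cls] < L25S_BATCH) {
        L25SBlock* b = g_l25s_pool[cls];
        g_l25s_pool[cls] = b->next;
        g_l25s_pool_count[cls]--;
        b->next = tls_l25s[cls];
        tls_l25s[cls] = b;
        tls_l25s_count[cls]++;
    }
    pthread_mutex_unlock(&g_l25s_lock);
}

/* Evict: detach L25S_BATCH blocks locally, then splice them in one locked step. */
static void l25s_evict(int cls) {
    L25SBlock* head = tls_l25s[cls];
    L25SBlock* tail = head;
    for (int i = 1; i < L25S_BATCH; i++) tail = tail->next;
    tls_l25s[cls] = tail->next;
    tls_l25s_count[cls] -= L25S_BATCH;
    pthread_mutex_lock(&g_l25s_lock);
    tail->next = g_l25s_pool[cls];
    g_l25s_pool[cls] = head;
    g_l25s_pool_count[cls] += L25S_BATCH;
    pthread_mutex_unlock(&g_l25s_lock);
}

void* l25s_alloc(int cls, size_t size) {
    if (!tls_l25s[cls]) l25s_refill(cls);
    L25SBlock* b = tls_l25s[cls];
    if (!b)                            /* global list empty too: fresh block */
        return malloc(size < sizeof(L25SBlock) ? sizeof(L25SBlock) : size);
    tls_l25s[cls] = b->next;
    tls_l25s_count[cls]--;
    return b;
}

void l25s_free(void* p, int cls) {
    if (!p) return;
    L25SBlock* b = p;
    b->next = tls_l25s[cls];
    tls_l25s[cls] = b;
    if (++tls_l25s_count[cls] > L25S_TLS_CAP) l25s_evict(cls);
}

size_t l25s_tls_count(int cls) { return tls_l25s_count[cls]; }

size_t l25s_global_count(int cls) {
    pthread_mutex_lock(&g_l25s_lock);
    size_t n = g_l25s_pool_count[cls];
    pthread_mutex_unlock(&g_l25s_lock);
    return n;
}
```

Moving half the cap at a time gives hysteresis: after an eviction the thread holds 33 blocks, so it can do 32 more frees or 33 allocations before taking the lock again, instead of locking on every call near the limit.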

---

## 📊 **Performance Roadmap**

```
Before P0:  1T = 15.1M  4T = 3.3M  (-78%) ← UNSAFE
After P0:   1T = 13-15M 4T = 13-15M (+294-355%) ← SAFE, no scaling
After P1:   1T = 13-15M 4T = 12-15M (+264-355%) ← 95% TLS hit
After P2:   1T = 13-15M 4T = 15-18M (+355-445%) ← 90% TLS hit
After P3:   1T = 13-15M 4T = 18-22M (+445-567%) ← Full TLS

Phase 6.13 Validation:
            1T = 17.8M  4T = 15.9M (+381%) ✅ PROVEN
```
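Every percentage in the roadmap is 4-thread throughput measured against the 3.3M ops/sec pre-P0 baseline. For the P0 row, for example:

```math
\Delta_{4T} = \left(\frac{T_{4T}}{3.3\,\mathrm{M\ ops/s}} - 1\right) \times 100\%,
\qquad
\frac{13}{3.3} - 1 \approx 2.94 \;\Rightarrow\; +294\%,
\qquad
\frac{15}{3.3} - 1 \approx 3.55 \;\Rightarrow\; +355\%
```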

---

## ✅ **Success Criteria**

**P0 (Minimum)**:
- ✅ 4T ≥ 13M ops/sec
- ✅ Helgrind: 0 data races
- ✅ 10/10 stability runs

**P0+P1+P2 (Target)**:
- ✅ 4T ≥ 15M ops/sec
- ✅ TLS hit rate ≥ 90%
- ✅ 1T regression ≤ 15% (1T ≥ 12.8M ops/sec)

**All Phases (Stretch)**:
- ✅ 4T ≥ 18M ops/sec
- ✅ 16T ≥ 11.6M ops/sec
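Checking the "TLS hit rate ≥ 90%" criterion needs a counter that does not itself reintroduce contention. One hypothetical approach (the `stats_*` names, the 4096-event flush interval, and the use of C11 `<stdatomic.h>` are assumptions, not existing hakmem code) counts hits and misses in thread-local variables and publishes them to global atomics only every few thousand events and at thread exit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STATS_FLUSH_EVERY 4096         /* events per thread between global updates */

/* Global totals: touched once per flush, never on every allocation. */
static atomic_ulong g_tls_hits, g_tls_misses;

/* Per-thread counters: plain increments, no synchronization. */
static __thread unsigned long t_hits, t_misses;

/* Publish this thread's counters; also call it before a thread exits. */
void stats_flush_thread(void) {
    atomic_fetch_add_explicit(&g_tls_hits, t_hits, memory_order_relaxed);
    atomic_fetch_add_explicit(&g_tls_misses, t_misses, memory_order_relaxed);
    t_hits = t_misses = 0;
}

/* Call with hit=true on the TLS fast path, hit=false on the refill path. */
void stats_record(bool hit) {
    if (hit) t_hits++; else t_misses++;
    if (t_hits + t_misses >= STATS_FLUSH_EVERY) stats_flush_thread();
}

/* Hit rate over flushed events, in [0, 1]; 0 when nothing was recorded.
 * Intended for reporting after worker threads have flushed and joined. */
double stats_hit_rate(void) {
    unsigned long h = atomic_load(&g_tls_hits);
    unsigned long m = atomic_load(&g_tls_misses);
    return (h + m) ? (double)h / (double)(h + m) : 0.0;
}

/* P0+P1+P2 target from the success criteria: TLS hit rate >= 90%. */
bool stats_meets_target(void) { return stats_hit_rate() >= 0.90; }
```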

---

## 🚨 **Critical Findings**

1. **67.9M ops/sec = Measurement Error**
   - Actual: 15.1M (1T), 3.3M (4T)
   - Fix: Update Phase 6.14 report

2. **4-thread collapse = Thread-unsafe**
   - NOT a feature, NOT expected
   - Zero `pthread_mutex` in codebase
   - Fix: P0 global lock (30 min)

3. **TLS is validated (+381%)**
   - Phase 6.13 proved 4T = 15.9M ops/sec
   - NOT the cause of Phase 6.11.5 regression
   - Real culprit: Slab Registry (Phase 6.12.1)

---

## 📁 **Document Map**

```
PHASE_6.15_PLAN.md              - Full implementation guide (1008 lines)
PHASE_6.15_SUMMARY.md           - Executive summary (152 lines)
PHASE_6.15_QUICK_REF.md         - Quick reference card (YOU ARE HERE)

THREAD_SAFETY_SOLUTION.md       - Complete analysis (Option A/B/C)
PHASE_6.13_INITIAL_RESULTS.md   - TLS validation proof
PHASE_6.14_COMPLETION_REPORT.md - Thread issue discovery
```

---

## 🔧 **Common Commands**

```bash
# Build hakmem
cd apps/experiments/hakmem-poc
make clean && make shared

# larson benchmark (4-thread)
cd /tmp/mimalloc-bench/bench/larson
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 4

# Helgrind race detection
# LD_PRELOAD must precede valgrind; valgrind keeps it for the client process
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
valgrind --tool=helgrind ./larson 0 8 1024 1000 1 12345 4

# Check pthread usage
grep -n "pthread" ~/git/hakorune-selfhost/apps/experiments/hakmem-poc/*.c
```

---

## 📞 **Need Help?**

- **Detailed steps**: See [PHASE_6.15_PLAN.md](PHASE_6.15_PLAN.md)
- **Technical analysis**: See [THREAD_SAFETY_SOLUTION.md](THREAD_SAFETY_SOLUTION.md)
- **Validation proof**: See [PHASE_6.13_INITIAL_RESULTS.md](PHASE_6.13_INITIAL_RESULTS.md)

---

**Status**: ✅ Ready to execute
**Total Time**: 12-13 hours (6 days)
**Expected ROI**: 6-15x improvement (3.3M → 20-50M ops/sec)