# Phase 6.15: Quick Reference Card

**Full Details**: See [PHASE_6.15_PLAN.md](PHASE_6.15_PLAN.md) (1008 lines)

---

## 📊 **The Problem**

```
Current State: hakmem is THREAD-UNSAFE

1-thread:  15.1M ops/sec  ✅ Excellent
4-thread:   3.3M ops/sec  ❌ -78% collapse!

Root Cause: no locking anywhere (grep pthread_mutex *.c → 0 results)
```

---

## 🎯 **The Solution (3 Steps)**

| Step | What | Time | Expected Result |
|------|------|------|----------------|
| **1** | Fix docs | 1h | Clarity on 67.9M issue |
| **2** | P0 Safety Lock | 2-3h | 4T = 13-15M (safe, no scaling) |
| **3** | TLS Performance | 8-10h | 4T = 15-20M (+381% proven) |

---

## 📋 **Step-by-Step Execution**

### **Day 1 Morning: Step 1 (1 hour)**

```bash
cd apps/experiments/hakmem-poc

# 1. Edit PHASE_6.14_COMPLETION_REPORT.md
#    Add section explaining 67.9M measurement issue
#    Add thread safety warning

# 2. Edit CURRENT_TASK.md
#    Move Phase 6.14 to completed
#    Add Phase 6.15 as current focus

# 3. Verify (-E enables alternation; the dot is escaped so it matches literally)
grep -E "67\.9M|Thread Safety" PHASE_6.14_COMPLETION_REPORT.md
grep "Phase 6.15" CURRENT_TASK.md
```

---

### **Day 1 Afternoon: Step 2 - P0 Safety Lock (2-3 hours)**

#### **Implementation (30 min)**

**File**: `hakmem.c`

```c
// After line 22: Add pthread.h
#include <pthread.h>

// After line 58: Add global lock
static pthread_mutex_t g_hakmem_lock = PTHREAD_MUTEX_INITIALIZER;
#define HAKMEM_LOCK()   pthread_mutex_lock(&g_hakmem_lock)
#define HAKMEM_UNLOCK() pthread_mutex_unlock(&g_hakmem_lock)

// Wrap hak_alloc_at (find ~line 300-400)
// Note: this is a default (non-recursive) mutex. The *_internal functions
// must not call back into the locked wrappers; relocking from the same
// thread is undefined behavior and usually deadlocks.
void* hak_alloc_at(size_t size, uintptr_t site_id) {
    HAKMEM_LOCK();
    void* ptr = hak_alloc_at_internal(size, site_id);  // Rename old function
    HAKMEM_UNLOCK();
    return ptr;
}

// Wrap hak_free_at
void hak_free_at(void* ptr, uintptr_t site_id) {
    if (!ptr) return;  // Nothing to free: skip the lock entirely
    HAKMEM_LOCK();
    hak_free_at_internal(ptr, site_id);  // Rename old function
    HAKMEM_UNLOCK();
}
```
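
The locking pattern above can be tried in isolation before touching `hakmem.c`. The sketch below is a minimal, self-contained stand-in and not hakmem code: `g_live_blocks`, `demo_alloc`, `demo_free`, and `demo_run` are hypothetical names, and a plain counter takes the place of the allocator's shared state. It illustrates that once every public entry point takes the same global mutex, concurrent callers no longer lose updates.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the allocator's shared state. */
static long g_live_blocks = 0;
static pthread_mutex_t g_demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unsynchronized internals, analogous to the renamed *_internal functions. */
static void demo_alloc_internal(void) { g_live_blocks++; }
static void demo_free_internal(void)  { g_live_blocks--; }

/* Locked public wrappers, analogous to hak_alloc_at / hak_free_at. */
void demo_alloc(void) {
    pthread_mutex_lock(&g_demo_lock);
    demo_alloc_internal();
    pthread_mutex_unlock(&g_demo_lock);
}

void demo_free(void) {
    pthread_mutex_lock(&g_demo_lock);
    demo_free_internal();
    pthread_mutex_unlock(&g_demo_lock);
}

/* Each iteration allocates twice and frees once: net +1 live block. */
static void* demo_worker(void* arg) {
    long iters = *(const long*)arg;
    for (long i = 0; i < iters; i++) {
        demo_alloc();
        demo_alloc();
        demo_free();
    }
    return NULL;
}

/* Runs n_threads workers (1..16) and returns the final live-block count,
 * or -1 if the arguments are out of range or a thread could not start. */
long demo_run(int n_threads, long iters) {
    pthread_t tids[16];
    int started = 0;
    if (n_threads < 1 || n_threads > 16) return -1;
    g_live_blocks = 0;
    for (; started < n_threads; started++) {
        if (pthread_create(&tids[started], NULL, demo_worker, &iters) != 0) break;
    }
    /* Always join what was started so no worker outlives `iters`. */
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    return started == n_threads ? g_live_blocks : -1;
}
```

With the lock in place, `demo_run(4, 100000)` returns 400000. Build with `-pthread`; dropping the lock calls from the wrappers typically makes the total drift between runs, which is the kind of lost update the P0 lock is meant to rule out.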

#### **Testing (1.5 hours)**

```bash
# Build
make clean && make shared

# Test 1: larson 1T/4T (30 min)
cd /tmp/mimalloc-bench/bench/larson

# 1-thread
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
  ./larson 0 8 1024 10000 1 12345 1
# Expected: 13-15M ops/sec

# 4-thread
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
  ./larson 0 8 1024 10000 1 12345 4
# Expected: 13-15M ops/sec (same as 1T, no crashes!)

# Test 2: Helgrind (20 min)
# LD_PRELOAD must come before valgrind; placed after valgrind's options it
# would be treated as the program to run.
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
  valgrind --tool=helgrind \
  ./larson 0 8 1024 1000 1 12345 4
# Expected: ERROR SUMMARY: 0 errors

# Test 3: Stability (10 min)
for i in {1..10}; do
  LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
    ./larson 0 8 1024 10000 1 12345 4 || exit 1
done
# Expected: 10/10 runs succeed
```

#### **Documentation (15 min)**

Create `PHASE_6.15_P0_RESULTS.md` with benchmark results.

---

### **Day 2: Step 3 - P1 Tiny Pool TLS (2 hours)**

**File**: `hakmem_tiny.c`

**Pattern** (copy from `hakmem_l25_pool.c:26`):
```c
// Add TLS cache
static __thread TinySlab* tls_tiny_cache[TINY_NUM_CLASSES] = {NULL};

// TLS fast path in hak_tiny_alloc()
TinySlab* slab = tls_tiny_cache[class_idx];
if (slab && slab->free_count > 0) {
    // Fast path: no lock needed
    // (holds only while this thread is the slab's sole user)
    return alloc_from_slab(slab, class_idx);
}

// TLS miss: refill from global (locked)
HAKMEM_LOCK();
// ... refill logic ...
HAKMEM_UNLOCK();
```
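
The fragment above leaves the refill step elided. As a rough illustration of the whole fast-path/slow-path shape, here is a self-contained sketch under stated assumptions: it uses a simple per-thread free list of fixed-size blocks rather than hakmem's `TinySlab`, and `DemoBlock`, `DEMO_BLOCK_SIZE`, `DEMO_REFILL`, and the `demo_*` functions are all hypothetical names. Only the refill touches shared state, so only the refill takes the lock.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_BLOCK_SIZE 64  /* hypothetical size class */
#define DEMO_REFILL     8   /* assumed number of blocks moved per refill */

typedef struct DemoBlock { struct DemoBlock* next; } DemoBlock;

static pthread_mutex_t g_pool_lock = PTHREAD_MUTEX_INITIALIZER;
static DemoBlock* g_pool_head = NULL;        /* shared; guarded by g_pool_lock */
static __thread DemoBlock* tls_head = NULL;  /* private to the calling thread */

/* Slow path: move up to DEMO_REFILL blocks into the TLS list in one locked
 * batch, taking them from the global pool or from malloc if it is empty. */
static void demo_refill(void) {
    pthread_mutex_lock(&g_pool_lock);
    for (int i = 0; i < DEMO_REFILL; i++) {
        DemoBlock* b = g_pool_head;
        if (b) g_pool_head = b->next;
        else   b = malloc(DEMO_BLOCK_SIZE);
        if (!b) break;  /* out of memory: keep whatever was gathered */
        b->next = tls_head;
        tls_head = b;
    }
    pthread_mutex_unlock(&g_pool_lock);
}

/* Fast path: pop from the TLS list without locking; refill on a miss. */
void* demo_tiny_alloc(void) {
    if (!tls_head) {
        demo_refill();
        if (!tls_head) return NULL;
    }
    DemoBlock* b = tls_head;
    tls_head = b->next;
    return b;
}

/* Free: push back onto the calling thread's list, again without locking. */
void demo_tiny_free(void* p) {
    if (!p) return;
    DemoBlock* b = p;
    b->next = tls_head;
    tls_head = b;
}

/* Number of blocks currently cached by the calling thread. */
size_t demo_tls_count(void) {
    size_t n = 0;
    for (DemoBlock* b = tls_head; b; b = b->next) n++;
    return n;
}
```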

**Test**: larson 4T → expect 12-15M ops/sec

---

### **Day 3-4: P2 L2 Pool TLS (3 hours)**

**File**: `hakmem_pool.c`

**Same pattern** as Tiny Pool (above): lock-free TLS fast path, locked refill from the global pool on a miss

**Test**: larson 4T → expect 15-18M ops/sec

---

### **Day 5: P3 L2.5 Pool TLS (3 hours)**

**File**: `hakmem_l25_pool.c`

**Existing**: Line 26 already has `__thread L25Block* tls_l25_cache[5];`

**Add**: Refill/eviction logic in alloc/free functions
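
One possible shape for the eviction half of that logic is sketched below. This is an assumption-laden illustration, not the layout of `hakmem_l25_pool.c`: `DemoNode`, `DEMO_TLS_CAP`, and the `demo_cache_*` functions are hypothetical, and the policy (evict down to half the cap in one locked batch) is just one reasonable choice. The idea is that frees stay lock-free until the per-thread cache grows past its cap.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_TLS_CAP 4  /* hypothetical per-thread cache limit */

typedef struct DemoNode { struct DemoNode* next; } DemoNode;

static pthread_mutex_t g_shared_lock = PTHREAD_MUTEX_INITIALIZER;
static DemoNode* g_shared_head = NULL;  /* guarded by g_shared_lock */
static size_t g_shared_count = 0;       /* guarded by g_shared_lock */
static __thread DemoNode* tls_list = NULL;
static __thread size_t tls_count = 0;

/* Free path: cache locally; once over the cap, evict down to half the cap
 * in a single locked batch so the lock is taken rarely. */
void demo_cache_free(DemoNode* n) {
    n->next = tls_list;
    tls_list = n;
    tls_count++;
    if (tls_count <= DEMO_TLS_CAP) return;

    pthread_mutex_lock(&g_shared_lock);
    while (tls_count > DEMO_TLS_CAP / 2) {
        DemoNode* victim = tls_list;
        tls_list = victim->next;
        tls_count--;
        victim->next = g_shared_head;
        g_shared_head = victim;
        g_shared_count++;
    }
    pthread_mutex_unlock(&g_shared_lock);
}

/* Alloc path: TLS first; on a miss, take one node from the shared pool. */
DemoNode* demo_cache_alloc(void) {
    DemoNode* n = tls_list;
    if (n) {
        tls_list = n->next;
        tls_count--;
        return n;
    }
    pthread_mutex_lock(&g_shared_lock);
    n = g_shared_head;
    if (n) {
        g_shared_head = n->next;
        g_shared_count--;
    }
    pthread_mutex_unlock(&g_shared_lock);
    return n;  /* NULL when both caches are empty */
}

size_t demo_tls_size(void) { return tls_count; }

size_t demo_shared_size(void) {
    pthread_mutex_lock(&g_shared_lock);
    size_t n = g_shared_count;
    pthread_mutex_unlock(&g_shared_lock);
    return n;
}
```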

**Test**: larson 4T → expect 18-22M ops/sec

---

## 📊 **Performance Roadmap**
```
Before P0:  1T = 15.1M    4T = 3.3M    (-78%)       ← UNSAFE
After P0:   1T = 13-15M   4T = 13-15M  (+294-355%)  ← SAFE, no scaling
After P1:   1T = 13-15M   4T = 12-15M  (+264-355%)  ← 95% TLS hit
After P2:   1T = 13-15M   4T = 15-18M  (+355-445%)  ← 90% TLS hit
After P3:   1T = 13-15M   4T = 18-22M  (+445-567%)  ← Full TLS

Phase 6.13 Validation:
            1T = 17.8M    4T = 15.9M   (+381%)      ✅ PROVEN

All figures in ops/sec. -78% is 4T vs 1T; "+" gains are 4T vs the 3.3M pre-P0 baseline.
```
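
The percentages in the roadmap can be rechecked with a few lines of arithmetic. The sketch below is only a convenience for doing so; `pct_change` and `print_gain` are hypothetical helper names, and the 3.3M baseline is the pre-P0 4-thread figure quoted above. For example, 13M against 3.3M gives (13 - 3.3) / 3.3 ≈ +294%, and 3.3M against the 15.1M single-thread figure gives about -78%. The Phase 6.13 figure of 15.9M works out to roughly +381.8%, which the roadmap writes as +381%.

```c
#include <stdio.h>

/* Percentage change of `value` relative to `baseline`. */
double pct_change(double baseline, double value) {
    return (value - baseline) / baseline * 100.0;
}

/* Prints the 4T gain range for one phase, relative to the pre-P0
 * 4-thread baseline (3.3M ops/sec, taken from the roadmap). */
void print_gain(const char* phase, double lo_mops, double hi_mops) {
    const double baseline_4t = 3.3;
    printf("%s: %+.0f%% to %+.0f%%\n", phase,
           pct_change(baseline_4t, lo_mops),
           pct_change(baseline_4t, hi_mops));
}
```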

---

## ✅ **Success Criteria**

**P0 (Minimum)**:
- ✅ 4T ≥ 13M ops/sec
- ✅ Helgrind: 0 data races
- ✅ 10/10 stability runs

**P0+P1+P2 (Target)**:
- ✅ 4T ≥ 15M ops/sec
- ✅ TLS hit rate ≥ 90%
- ✅ 1T regression ≤ 15%

**All Phases (Stretch)**:
- ✅ 4T ≥ 18M ops/sec
- ✅ 16T ≥ 11.6M ops/sec
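
The "TLS hit rate ≥ 90%" criterion needs some way to count hits and misses. One low-overhead possibility, sketched here as an assumption rather than as hakmem's actual instrumentation, is a pair of per-thread counters bumped on the alloc path; `DemoTlsStats`, `demo_record`, `demo_hit_rate`, and `demo_meets_target` are hypothetical names. The counters are thread-local, so updating them needs no lock, but the rate they report covers the calling thread only.

```c
#include <stdbool.h>

typedef struct {
    unsigned long hits;    /* allocations served from the TLS cache */
    unsigned long misses;  /* allocations that fell through to the locked path */
} DemoTlsStats;

static __thread DemoTlsStats tls_stats;  /* zero-initialized per thread */

/* Call once per allocation with whether the TLS fast path served it. */
void demo_record(bool hit) {
    if (hit) tls_stats.hits++;
    else     tls_stats.misses++;
}

/* Hit rate of the calling thread in percent; 0 if nothing was recorded. */
double demo_hit_rate(void) {
    unsigned long total = tls_stats.hits + tls_stats.misses;
    if (total == 0) return 0.0;
    return 100.0 * (double)tls_stats.hits / (double)total;
}

/* True when the calling thread's hit rate reaches target_pct. */
bool demo_meets_target(double target_pct) {
    return demo_hit_rate() >= target_pct;
}
```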

---

## 🚨 **Critical Findings**

1. **67.9M ops/sec = Measurement Error**
   - Actual: 15.1M (1T), 3.3M (4T)
   - Fix: Update Phase 6.14 report

2. **4-thread collapse = Thread-unsafe**
   - NOT a feature, NOT expected
   - Zero `pthread_mutex` in codebase
   - Fix: P0 global lock (30 min to implement)

3. **TLS is validated (+381%)**
   - Phase 6.13 proved 4T = 15.9M ops/sec
   - NOT the cause of Phase 6.11.5 regression
   - Real culprit: Slab Registry (Phase 6.12.1)

---

## 📁 **Document Map**

```
PHASE_6.15_PLAN.md              - Full implementation guide (1008 lines)
PHASE_6.15_SUMMARY.md           - Executive summary (152 lines)
PHASE_6.15_QUICK_REF.md         - Quick reference card (YOU ARE HERE)

THREAD_SAFETY_SOLUTION.md       - Complete analysis (Option A/B/C)
PHASE_6.13_INITIAL_RESULTS.md   - TLS validation proof
PHASE_6.14_COMPLETION_REPORT.md - Thread issue discovery
```

---

## 🔧 **Common Commands**

```bash
# Build hakmem
cd apps/experiments/hakmem-poc
make clean && make shared

# larson benchmark (4-thread)
cd /tmp/mimalloc-bench/bench/larson
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
  ./larson 0 8 1024 10000 1 12345 4

# Helgrind race detection (LD_PRELOAD goes before valgrind, not after its options)
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
  valgrind --tool=helgrind \
  ./larson 0 8 1024 1000 1 12345 4

# Check pthread usage
grep -n "pthread" apps/experiments/hakmem-poc/*.c
```

---

## 📞 **Need Help?**

- **Detailed steps**: See [PHASE_6.15_PLAN.md](PHASE_6.15_PLAN.md)
- **Technical analysis**: See [THREAD_SAFETY_SOLUTION.md](THREAD_SAFETY_SOLUTION.md)
- **Validation proof**: See [PHASE_6.13_INITIAL_RESULTS.md](PHASE_6.13_INITIAL_RESULTS.md)

---

**Status**: ✅ Ready to execute
**Total Time**: 12-13 hours (6 days)
**Expected ROI**: 6-15x improvement (3.3M → 20-50M ops/sec)