hakmem/docs/archive/PHASE_6.15_QUICK_REF.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


# Phase 6.15: Quick Reference Card
**Full Details**: See [PHASE_6.15_PLAN.md](PHASE_6.15_PLAN.md) (1008 lines)

---
## 📊 **The Problem**
```
Current State: hakmem is THREAD-UNSAFE
1-thread: 15.1M ops/sec ✅ Excellent
4-thread: 3.3M ops/sec ❌ -78% collapse!
Root Cause: grep pthread_mutex *.c → 0 results
```
---
## 🎯 **The Solution (3 Steps)**
| Step | What | Time | Expected Result |
|------|------|------|----------------|
| **1** | Fix docs | 1h | Clarity on 67.9M issue |
| **2** | P0 Safety Lock | 2-3h | 4T = 13-15M (safe, no scaling) |
| **3** | TLS Performance | 8-10h | 4T = 15-20M (+381% proven) |
---
## 📋 **Step-by-Step Execution**
### **Day 1 Morning: Step 1 (1 hour)**
```bash
cd apps/experiments/hakmem-poc
# 1. Edit PHASE_6.14_COMPLETION_REPORT.md
# Add section explaining 67.9M measurement issue
# Add thread safety warning
# 2. Edit CURRENT_TASK.md
# Move Phase 6.14 to completed
# Add Phase 6.15 as current focus
# 3. Verify
grep "67.9M\|Thread Safety" PHASE_6.14_COMPLETION_REPORT.md
grep "Phase 6.15" CURRENT_TASK.md
```
---
### **Day 1 Afternoon: Step 2 - P0 Safety Lock (2-3 hours)**
#### **Implementation (30 min)**
**File**: `hakmem.c`
```c
// After line 22: Add pthread.h
#include <pthread.h>

// After line 58: Add global lock
static pthread_mutex_t g_hakmem_lock = PTHREAD_MUTEX_INITIALIZER;
#define HAKMEM_LOCK()   pthread_mutex_lock(&g_hakmem_lock)
#define HAKMEM_UNLOCK() pthread_mutex_unlock(&g_hakmem_lock)

// Wrap hak_alloc_at (find ~line 300-400)
void* hak_alloc_at(size_t size, uintptr_t site_id) {
    HAKMEM_LOCK();
    // Old hak_alloc_at body, renamed to hak_alloc_at_internal
    void* ptr = hak_alloc_at_internal(size, site_id);
    HAKMEM_UNLOCK();
    return ptr;
}

// Wrap hak_free_at
void hak_free_at(void* ptr, uintptr_t site_id) {
    if (!ptr) return;  // NULL needs no lock
    HAKMEM_LOCK();
    // Old hak_free_at body, renamed to hak_free_at_internal
    hak_free_at_internal(ptr, site_id);
    HAKMEM_UNLOCK();
}
```
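
The wrapper pattern above can be tried in isolation before touching `hakmem.c`. The following is a minimal sketch under stated assumptions: the `toy_*` names, the 64-block pool, and the worker loop are hypothetical stand-ins, not hakmem code. It only shows that an unlocked free list becomes safe to share once every public entry point takes one global mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_BLOCKS      64   /* hypothetical pool size */
#define TOY_MAX_THREADS 16

typedef struct ToyBlock {
    struct ToyBlock* next;
} ToyBlock;

static ToyBlock g_toy_pool[TOY_BLOCKS];
static ToyBlock* g_toy_free = NULL;  /* shared free list */
static pthread_mutex_t g_toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Rebuild the free list from the static pool (call before starting threads). */
static void toy_init(void) {
    g_toy_free = NULL;
    for (int i = 0; i < TOY_BLOCKS; i++) {
        g_toy_pool[i].next = g_toy_free;
        g_toy_free = &g_toy_pool[i];
    }
}

/* Unlocked internals: only safe while the caller holds g_toy_lock. */
static void* toy_alloc_internal(void) {
    ToyBlock* b = g_toy_free;
    if (b) g_toy_free = b->next;
    return b;
}

static void toy_free_internal(void* p) {
    ToyBlock* b = (ToyBlock*)p;
    b->next = g_toy_free;
    g_toy_free = b;
}

/* Public wrappers: same shape as the P0 patch (lock, call internal, unlock). */
void* toy_alloc(void) {
    pthread_mutex_lock(&g_toy_lock);
    void* p = toy_alloc_internal();
    pthread_mutex_unlock(&g_toy_lock);
    return p;
}

void toy_free(void* p) {
    if (!p) return;
    pthread_mutex_lock(&g_toy_lock);
    toy_free_internal(p);
    pthread_mutex_unlock(&g_toy_lock);
}

/* Count the blocks currently on the free list. */
static size_t toy_free_count(void) {
    size_t n = 0;
    pthread_mutex_lock(&g_toy_lock);
    for (ToyBlock* b = g_toy_free; b; b = b->next) n++;
    pthread_mutex_unlock(&g_toy_lock);
    return n;
}

/* Each worker repeatedly takes one block and returns it. */
static void* toy_worker(void* arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        toy_free(toy_alloc());
    }
    return NULL;
}

/* Run nthreads workers; return how many blocks are free afterwards. */
size_t toy_run(int nthreads) {
    pthread_t tid[TOY_MAX_THREADS];
    if (nthreads > TOY_MAX_THREADS) nthreads = TOY_MAX_THREADS;
    toy_init();
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, toy_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) pthread_join(tid[i], NULL);
    return toy_free_count();
}
```

Calling `toy_run(4)` should return `TOY_BLOCKS`, since every block taken is handed back under the lock. Build with `-pthread`. One caveat to check in the real code: if `hak_alloc_at_internal` can re-enter `hak_alloc_at`, a non-recursive mutex like this one would deadlock.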
#### **Testing (1.5 hours)**
```bash
# Build
make clean && make shared
# Test 1: larson 1T/4T (30 min)
cd /tmp/mimalloc-bench/bench/larson
# 1-thread
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 1
# Expected: 13-15M ops/sec
# 4-thread
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 4
# Expected: 13-15M ops/sec (same as 1T, no crashes!)
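# Test 2: Helgrind (20 min)
# LD_PRELOAD must come before valgrind; placed after it, valgrind
# treats the assignment as the program to run.
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
valgrind --tool=helgrind ./larson 0 8 1024 1000 1 12345 4
# Expected: ERROR SUMMARY: 0 errors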
# Test 3: Stability (10 min)
for i in {1..10}; do
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 4 || exit 1
done
# Expected: 10/10 runs succeed
```
#### **Documentation (15 min)**
Create `PHASE_6.15_P0_RESULTS.md` with benchmark results.

---
### **Day 2: Step 3 - P1 Tiny Pool TLS (2 hours)**
**File**: `hakmem_tiny.c`
**Pattern** (copy from `hakmem_l25_pool.c:26`):
```c
// Add TLS cache
static __thread TinySlab* tls_tiny_cache[TINY_NUM_CLASSES] = {NULL};

// TLS fast path in hak_tiny_alloc()
TinySlab* slab = tls_tiny_cache[class_idx];
if (slab && slab->free_count > 0) {
    // Fast path: no lock needed
    return alloc_from_slab(slab, class_idx);
}

// TLS miss: refill from global (locked)
HAKMEM_LOCK();
// ... refill logic ...
HAKMEM_UNLOCK();
```
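
To make the fast-path/refill split concrete, here is a self-contained sketch under stated assumptions. It swaps the slab pointer for a simpler per-thread free list, and every `demo_*` name, the batch size of 16, and the hit/miss counters are hypothetical, not taken from `hakmem_tiny.c`. The point is only that the lock is taken once per refill rather than once per allocation.

```c
#include <pthread.h>
#include <stddef.h>

#define DEMO_POOL  1024  /* hypothetical global pool size */
#define DEMO_BATCH 16    /* hypothetical blocks moved per refill */

typedef struct DemoBlock {
    struct DemoBlock* next;
} DemoBlock;

static DemoBlock g_demo_pool[DEMO_POOL];
static DemoBlock* g_demo_free = NULL;  /* global list, guarded by the lock */
static pthread_mutex_t g_demo_lock = PTHREAD_MUTEX_INITIALIZER;

static __thread DemoBlock* tls_demo_cache = NULL;  /* per-thread, lock-free */
static __thread unsigned long tls_demo_hits = 0;
static __thread unsigned long tls_demo_misses = 0;

/* Put every pool block on the global list (call once, before any alloc). */
void demo_init(void) {
    pthread_mutex_lock(&g_demo_lock);
    g_demo_free = NULL;
    for (int i = 0; i < DEMO_POOL; i++) {
        g_demo_pool[i].next = g_demo_free;
        g_demo_free = &g_demo_pool[i];
    }
    pthread_mutex_unlock(&g_demo_lock);
}

void* demo_alloc(void) {
    DemoBlock* b = tls_demo_cache;
    if (b) {
        /* Fast path: pop from the thread-local list, no lock needed. */
        tls_demo_cache = b->next;
        tls_demo_hits++;
        return b;
    }

    /* TLS miss: move up to DEMO_BATCH blocks from global under the lock. */
    tls_demo_misses++;
    pthread_mutex_lock(&g_demo_lock);
    for (int i = 0; i < DEMO_BATCH && g_demo_free; i++) {
        DemoBlock* m = g_demo_free;
        g_demo_free = m->next;
        m->next = tls_demo_cache;
        tls_demo_cache = m;
    }
    pthread_mutex_unlock(&g_demo_lock);

    b = tls_demo_cache;
    if (b) tls_demo_cache = b->next;
    return b;  /* NULL when the global pool is exhausted */
}

/* Simplification: a freed block always goes to the calling thread's cache. */
void demo_free(void* p) {
    if (!p) return;
    DemoBlock* b = p;
    b->next = tls_demo_cache;
    tls_demo_cache = b;
}

/* Fraction of this thread's allocations served without taking the lock. */
double demo_hit_rate(void) {
    unsigned long total = tls_demo_hits + tls_demo_misses;
    return total ? (double)tls_demo_hits / (double)total : 0.0;
}
```

With a batch of 16, a thread that only allocates misses once per 16 calls, so its hit rate settles at 15/16. A counter like `demo_hit_rate()` is one way the TLS hit-rate targets in this card could be measured, though the real instrumentation may differ.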
**Test**: larson 4T → expect 12-15M ops/sec

---
### **Day 3-4: P2 L2 Pool TLS (3 hours)**
**File**: `hakmem_pool.c`
**Same pattern** as Tiny Pool (above)
**Test**: larson 4T → expect 15-18M ops/sec

---
### **Day 5: P3 L2.5 Pool TLS (3 hours)**
**File**: `hakmem_l25_pool.c`
**Existing**: Line 26 already has `__thread L25Block* tls_l25_cache[5];`
**Add**: Refill/eviction logic in alloc/free functions
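
One possible shape for the refill/eviction step is sketched below, under stated assumptions. The `sk_*` names, the cap of 8 blocks per class, and the evict-half policy are hypothetical; only the five-entry per-thread array mirrors the existing `tls_l25_cache[5]` declaration. The free path caches blocks locally and spills the excess to a locked global list, and the alloc path refills from that list on a miss.

```c
#include <pthread.h>
#include <stddef.h>

#define SK_CLASSES 5  /* mirrors the [5] in tls_l25_cache */
#define SK_TLS_CAP 8  /* hypothetical per-class cap before eviction */

typedef struct SkBlock {
    struct SkBlock* next;
} SkBlock;

static SkBlock* g_sk_global[SK_CLASSES];  /* guarded by g_sk_lock */
static pthread_mutex_t g_sk_lock = PTHREAD_MUTEX_INITIALIZER;

static __thread SkBlock* tls_sk_cache[SK_CLASSES];
static __thread unsigned tls_sk_count[SK_CLASSES];

/* Free path: push onto the TLS list; over the cap, evict down to half. */
void sk_free(void* p, int cls) {
    if (!p) return;
    SkBlock* b = p;
    b->next = tls_sk_cache[cls];
    tls_sk_cache[cls] = b;
    if (++tls_sk_count[cls] <= SK_TLS_CAP) return;  /* common case: no lock */

    pthread_mutex_lock(&g_sk_lock);
    while (tls_sk_count[cls] > SK_TLS_CAP / 2) {
        SkBlock* e = tls_sk_cache[cls];
        tls_sk_cache[cls] = e->next;
        e->next = g_sk_global[cls];
        g_sk_global[cls] = e;
        tls_sk_count[cls]--;
    }
    pthread_mutex_unlock(&g_sk_lock);
}

/* Alloc path: pop from TLS; on a miss, take one block from global. */
void* sk_alloc(int cls) {
    SkBlock* b = tls_sk_cache[cls];
    if (b) {
        tls_sk_cache[cls] = b->next;
        tls_sk_count[cls]--;
        return b;
    }
    pthread_mutex_lock(&g_sk_lock);
    b = g_sk_global[cls];
    if (b) g_sk_global[cls] = b->next;
    pthread_mutex_unlock(&g_sk_lock);
    return b;  /* NULL: a real allocator would fall back to its slow path */
}

/* Number of blocks cached by the calling thread for this class. */
unsigned sk_tls_count(int cls) {
    return tls_sk_count[cls];
}

/* Number of blocks on the shared list for this class. */
size_t sk_global_count(int cls) {
    size_t n = 0;
    pthread_mutex_lock(&g_sk_lock);
    for (SkBlock* b = g_sk_global[cls]; b; b = b->next) n++;
    pthread_mutex_unlock(&g_sk_lock);
    return n;
}
```

Evicting down to half the cap rather than one block at a time keeps a thread that frees in bursts from taking the lock on every call. Whether that trade-off suits the L2.5 pool is something the larson test below would have to confirm.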
**Test**: larson 4T → expect 18-22M ops/sec

---
## 📊 **Performance Roadmap**
```
Before P0: 1T = 15.1M 4T = 3.3M (-78%) ← UNSAFE
After P0: 1T = 13-15M 4T = 13-15M (+294-355%) ← SAFE, no scaling
After P1: 1T = 13-15M 4T = 12-15M (+264-355%) ← 95% TLS hit
After P2: 1T = 13-15M 4T = 15-18M (+355-445%) ← 90% TLS hit
After P3: 1T = 13-15M 4T = 18-22M (+445-567%) ← Full TLS
Phase 6.13 Validation:
1T = 17.8M 4T = 15.9M (+381%) ✅ PROVEN
```
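
The percentages in the roadmap are all relative to the unsafe 4-thread baseline of 3.3M ops/sec. A small helper makes that arithmetic easy to re-check when new measurements come in. The function names here are hypothetical and the sketch is not part of hakmem.

```c
#include <stdio.h>

/* Percent change of `after` relative to `before`. */
double pct_change(double before, double after) {
    return (after - before) / before * 100.0;
}

/* Print one roadmap row for a 4T range [lo, hi], in M ops/sec. */
void print_roadmap_row(const char* label, double base_4t,
                       double lo, double hi) {
    printf("%-10s 4T = %.1f-%.1fM (%+.0f%% to %+.0f%%)\n",
           label, lo, hi,
           pct_change(base_4t, lo), pct_change(base_4t, hi));
}
```

For example, `pct_change(3.3, 15.9)` gives about 381.8, which is the +381% quoted for Phase 6.13, and `pct_change(15.1, 3.3)` gives about -78.1, the 4-thread collapse.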
---
## ✅ **Success Criteria**
**P0 (Minimum)**:
- ✅ 4T ≥ 13M ops/sec
- ✅ Helgrind: 0 data races
- ✅ 10/10 stability runs
**P0+P1+P2 (Target)**:
- ✅ 4T ≥ 15M ops/sec
- ✅ TLS hit rate ≥ 90%
- ✅ 1T regression ≤ 15%
**All Phases (Stretch)**:
- ✅ 4T ≥ 18M ops/sec
- ✅ 16T ≥ 11.6M ops/sec
---
## 🚨 **Critical Findings**
1. **67.9M ops/sec = Measurement Error**
- Actual: 15.1M (1T), 3.3M (4T)
- Fix: Update Phase 6.14 report
2. **4-thread collapse = Thread-unsafe**
- NOT a feature, NOT expected
- Zero `pthread_mutex` in codebase
- Fix: P0 global lock (30 min)
3. **TLS is validated (+381%)**
- Phase 6.13 proved 4T = 15.9M ops/sec
- NOT the cause of Phase 6.11.5 regression
- Real culprit: Slab Registry (Phase 6.12.1)
---
## 📁 **Document Map**
```
PHASE_6.15_PLAN.md - Full implementation guide (1008 lines)
PHASE_6.15_SUMMARY.md - Executive summary (152 lines)
PHASE_6.15_QUICK_REF.md - Quick reference card (YOU ARE HERE)
THREAD_SAFETY_SOLUTION.md - Complete analysis (Option A/B/C)
PHASE_6.13_INITIAL_RESULTS.md - TLS validation proof
PHASE_6.14_COMPLETION_REPORT.md - Thread issue discovery
```
---
## 🔧 **Common Commands**
```bash
# Build hakmem
cd apps/experiments/hakmem-poc
make clean && make shared
# larson benchmark (4-thread)
cd /tmp/mimalloc-bench/bench/larson
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
./larson 0 8 1024 10000 1 12345 4
# Helgrind race detection (LD_PRELOAD must come before valgrind)
LD_PRELOAD=~/git/hakorune-selfhost/apps/experiments/hakmem-poc/libhakmem.so \
valgrind --tool=helgrind ./larson 0 8 1024 1000 1 12345 4
# Check pthread usage
grep -n "pthread" apps/experiments/hakmem-poc/*.c
```
---
## 📞 **Need Help?**
- **Detailed steps**: See [PHASE_6.15_PLAN.md](PHASE_6.15_PLAN.md)
- **Technical analysis**: See [THREAD_SAFETY_SOLUTION.md](THREAD_SAFETY_SOLUTION.md)
- **Validation proof**: See [PHASE_6.13_INITIAL_RESULTS.md](PHASE_6.13_INITIAL_RESULTS.md)
---
**Status**: ✅ Ready to execute
**Total Time**: 11-14 hours over 5 days (Step 1: 1h, Step 2: 2-3h, Step 3: 8-10h)
**Expected ROI**: roughly 4-7x improvement at 4 threads (3.3M → 13-22M ops/sec across P0-P3, per the Performance Roadmap)