hakmem/docs/analysis/CHATGPT_PRO_RESPONSE_MMAP.md

# ChatGPT Pro Response: mmap vs malloc Strategy
**Date**: 2025-10-21
**Response Time**: ~2 minutes
**Model**: GPT-5 (via codex)
**Status**: ✅ Clear recommendation received
---
## 🎯 **Final Recommendation: GO with Option A**
**Decision**: Switch `POLICY_LARGE_INFREQUENT` to `mmap` with kill-switch guard.
---
## ✅ **Why Option A**
1. **Phase 6.3 requires mmap**: `madvise` is a no-op on `malloc` blocks
2. **BigCache absorbs risk**: 90% hit rate → only 10% hit OS (1538 → 150 faults)
3. **mimalloc's secret**: "keep mapping, lazily reclaim" with MADV_FREE/DONTNEED
4. **Immediate unlock**: Phase 6.3 becomes functional as soon as large blocks are mmap-backed
---
## 🔥 **CRITICAL BUG DISCOVERED in Current Code**
**Problem in `hakmem.c:543`**:
```c
case ALLOC_METHOD_MMAP:
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // Add to batch
    }
    munmap(raw, hdr->size);  // ← BUG! Immediately unmaps
    break;
```
**Why this is wrong**:
- Calls `munmap` immediately after adding the pointer to the batch, leaving the batch with a dangling reference to an unmapped region
- **Negates Phase 6.3 benefit**: the batch can no longer coalesce or defer TLB work
- The TLB flush happens on `munmap`, not on `madvise`
---
## ✅ **Correct Implementation**
### Free Path Logic (Choose ONE):
**Option 1: Cache in BigCache**
```c
// Try BigCache first
if (hak_bigcache_try_insert(ptr, size, site_id)) {
    // Cached! Do NOT munmap
    // Optionally: madvise(MADV_FREE) on insert or eviction
    return;
}
```
**Option 2: Batch for delayed reclaim**
```c
// BigCache full, add to batch
if (size >= BATCH_MIN_SIZE) {
    hak_batch_add(raw, size);
    // Do NOT munmap here!
    // munmap happens on batch flush (coalesced)
    return;
}
```
**Option 3: Immediate unmap (last resort)**
```c
// Cold eviction only
munmap(raw, size);
```
---
## 🎯 **Implementation Plan**
### Phase 1: Minimal Change (1-line)
**File**: `hakmem.c:357`
```c
case POLICY_LARGE_INFREQUENT:
    return alloc_mmap(size);  // Changed from alloc_malloc
```
**Guard with kill-switch**:
```c
#ifdef HAKO_HAKMEM_LARGE_MMAP
    return alloc_mmap(size);
#else
    return alloc_malloc(size);  // Safe fallback
#endif
```
**Env variable**: `HAKO_HAKMEM_LARGE_MMAP=1` (default OFF)
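Alongside the compile-time guard, the same kill-switch can be honored at runtime by reading the environment variable once. A minimal sketch, assuming a helper named `hak_large_mmap_enabled` (illustrative, not existing hakmem code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical runtime mirror of the HAKO_HAKMEM_LARGE_MMAP kill-switch:
// the mmap path is opt-in, malloc stays the default.
static int hak_large_mmap_enabled(void) {
    const char *v = getenv("HAKO_HAKMEM_LARGE_MMAP");
    return v != NULL && strcmp(v, "1") == 0;  // default OFF
}
```

In a real build this would be cached in a `static int` on first use rather than re-queried per allocation.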
### Phase 2: Fix Free Path
**File**: `hakmem.c:543-548`
**Current (WRONG)**:
```c
case ALLOC_METHOD_MMAP:
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);
    }
    munmap(raw, hdr->size);  // ← Remove this!
    break;
```
**Correct**:
```c
case ALLOC_METHOD_MMAP:
    // Try BigCache first
    if (hdr->size >= 1048576) {  // 1 MB threshold
        if (hak_bigcache_try_insert(user_ptr, hdr->size, site_id)) {
            // Cached, skip munmap
            return;
        }
    }
    // BigCache full, add to batch
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);
        // munmap deferred to batch flush
        return;
    }
    // Small block or batching disabled: immediate unmap
    munmap(raw, hdr->size);
    break;
```
### Phase 3: Batch Flush Implementation
**File**: `hakmem_batch.c`
```c
void hak_batch_flush(void) {
    if (batch_count == 0) return;

    // Prefer MADV_FREE where the headers define it (Linux >= 4.5, macOS),
    // fall back to MADV_DONTNEED elsewhere
    for (size_t i = 0; i < batch_count; i++) {
#ifdef MADV_FREE
        madvise(batch[i].ptr, batch[i].size, MADV_FREE);
#else
        madvise(batch[i].ptr, batch[i].size, MADV_DONTNEED);
#endif
    }
    // Optional: munmap on cold eviction
    // (keep the VA mapped for reuse in most cases)
    batch_count = 0;
}
```
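The "keep the VA mapped, lazily reclaim" property the flush relies on can be observed in isolation. The sketch below uses `MADV_DONTNEED` rather than `MADV_FREE`, because `MADV_DONTNEED`'s effect (zero-fill on refault) is directly visible, whereas `MADV_FREE` may retain page contents until memory pressure; `demo_lazy_reclaim` is an illustrative name, and Linux semantics for private anonymous mappings are assumed.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

// Returns 1 if the expected lazy-reclaim semantics are observed:
// after madvise(MADV_DONTNEED) the address stays valid, the page is
// reclaimed, and the next touch refaults it zero-filled.
static int demo_lazy_reclaim(void) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;

    p[0] = 42;                                  // fault the page in
    if (madvise(p, len, MADV_DONTNEED) != 0) {  // drop the page, keep the VA
        munmap(p, len);
        return 0;
    }
    int refaulted_zero = (p[0] == 0);           // zero-filled on refault
    p[0] = 7;                                   // VA reusable, no new mmap
    int reusable = (p[0] == 7);

    munmap(p, len);
    return refaulted_zero && reusable;
}
```

This is exactly why a cached or batched block can be handed back out without a fresh `mmap`: the mapping survives the reclaim.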
---
## 📊 **Expected Performance Gains**
### Metrics Prediction:
| Metric | Current (malloc) | With Option A (mmap) | Improvement |
|--------|------------------|----------------------|-------------|
| **Page faults** | 513 | **120-180** | 65-77% fewer |
| **TLB shootdowns** | ~150 | **3-8** | 95% fewer |
| **Latency (VM)** | 36,647 ns | **24,000-28,000 ns** | **30-45% faster** |
### Success Criteria:
- ✅ Page faults: 120-180 (vs 513 current)
- ✅ Batch flushes: 3-8 per run
- ✅ Latency: 24-28 µs (vs 36.6 µs current)
### Rollback Criteria:
- ❌ Page faults > 500 (BigCache failing)
- ❌ Latency regression (slower than 36,647 ns)
---
## 🛡️ **Risk Mitigation**
### 1. Kill-Switch Guard
```c
// Compile-time: -DHAKO_HAKMEM_LARGE_MMAP=1
// Runtime:      HAKO_HAKMEM_LARGE_MMAP=1 (environment variable)
```
### 2. BigCache Hard Cap
- Limit: 64-256 MB (1-2× working set)
- LRU eviction to batched reclaim
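The hard cap reduces to a byte budget checked on insert. A minimal sketch, with illustrative names (`bigcache_admit`, `BIGCACHE_CAP_BYTES` are not existing hakmem identifiers) and FIFO accounting standing in for the real LRU bookkeeping:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical hard-cap accounting: reject inserts that would exceed the
// byte budget so the caller routes the block to batched reclaim instead.
#define BIGCACHE_CAP_BYTES ((size_t)64 << 20)  // 64 MB, low end of 64-256 MB

static size_t bigcache_bytes = 0;              // bytes currently cached

// Returns 1 and accounts for the block if it fits under the cap,
// 0 if the caller should evict or batch-reclaim instead.
static int bigcache_admit(size_t size) {
    if (bigcache_bytes + size > BIGCACHE_CAP_BYTES)
        return 0;
    bigcache_bytes += size;
    return 1;
}

static void bigcache_evict(size_t size) {      // called on LRU eviction
    bigcache_bytes -= size;
}
```

Evicted blocks would flow into `hak_batch_add` rather than being unmapped directly, keeping the TLB work coalesced.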
### 3. Prefer MADV_FREE
- Lower TLB cost than MADV_DONTNEED
- Better performance on quick reuse
- Linux: `MADV_FREE`, macOS: `MADV_FREE_REUSABLE`
### 4. Observability (Add Counters)
- mmap allocation count
- BigCache hits/misses for mmap
- Batch flush count
- munmap count
- Sample `minflt/majflt` before/after
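The counters above can live in one struct of atomics, with `getrusage` supplying the fault sample; the struct and field names below are illustrative, not existing hakmem identifiers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <sys/resource.h>

// Hypothetical observability counters for the mmap path.
typedef struct {
    atomic_ulong mmap_allocs;      // mmap allocation count
    atomic_ulong bigcache_hits;    // BigCache hits for mmap blocks
    atomic_ulong bigcache_misses;  // BigCache misses for mmap blocks
    atomic_ulong batch_flushes;    // batch flush count
    atomic_ulong munmaps;          // actual munmap calls
} hak_mmap_stats;

static hak_mmap_stats g_stats;

// Sample soft page faults; call before and after a benchmark run and diff.
static long sample_minflt(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_minflt;
}
```

Diffing `sample_minflt()` around the VM scenario gives the 513 → 120-180 comparison directly, without external tooling.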
---
## 🧪 **Test Plan**
### Step 1: Enable mmap with guard
```bash
# Makefile
CFLAGS += -DHAKO_HAKMEM_LARGE_MMAP=1
```
### Step 2: Run VM scenario benchmark
```bash
# 10 runs, measure:
make bench_vm RUNS=10
```
### Step 3: Collect metrics
- BigCache hit% for mmap
- Page faults (expect 120-180)
- Batch flushes (expect 3-8)
- Latency (expect 24-28 µs)
### Step 4: Validate or rollback
```bash
# If page faults > 500 or latency regresses:
CFLAGS += -UHAKO_HAKMEM_LARGE_MMAP # Rollback
```
---
## 🎯 **BigCache + mmap Compatibility**
**ChatGPT Pro confirms: SAFE**
- ✅ mmap blocks can be cached (same as malloc semantics)
- ✅ Content unspecified (matches malloc)
- ✅ Reusable after `MADV_FREE`
**Required changes**:
1. **Allocation**: `hak_bigcache_try_get` serves mmap blocks
2. **Free**: Try BigCache insert first, skip `munmap` if cached
3. **Header**: Keep `ALLOC_METHOD_MMAP` on cached blocks
---
## 🏆 **mimalloc's Secret Revealed**
**How mimalloc wins on VM scenario**:
1. **Keep VA mapped**: Don't `munmap` immediately
2. **Lazy reclaim**: Use `MADV_FREE`/`REUSABLE`
3. **Batch TLB work**: Coalesce reclamation
4. **Per-segment reuse**: Cache large blocks
**Our Option A emulates this**: BigCache + mmap + MADV_FREE + batching
---
## 📋 **Action Items**
### Immediate (Phase 1):
- [ ] Add kill-switch guard (`HAKO_HAKMEM_LARGE_MMAP`)
- [ ] Change line 357: `return alloc_mmap(size);`
- [ ] Test compile
### Critical (Phase 2):
- [ ] Fix free path (remove immediate `munmap`)
- [ ] Implement BigCache insert check
- [ ] Defer `munmap` to batch flush
### Optimization (Phase 3):
- [ ] Switch to `MADV_FREE` (Linux)
- [ ] Add observability counters
- [ ] Implement BigCache hard cap (64-256 MB)
### Validation:
- [ ] Run VM scenario (10 runs)
- [ ] Verify page faults < 200
- [ ] Verify latency 24-28 µs
- [ ] Rollback if metrics fail
---
## 🎯 **Alternative: Option C (ELO)**
**If Option A fails**:
- Extend ELO action space: malloc vs mmap dimension
- Doubles ELO arms (12 → 24 strategies)
- Slower convergence, more complex
**ChatGPT Pro says**: "Overkill right now. Ship Option A with kill-switch first."
---
## 📊 **Summary**
**Decision**: ✅ GO with Option A (mmap + kill-switch)
**Critical Fix**: Remove immediate `munmap` in free path
**Expected Gain**: 30-45% improvement on VM scenario (36.6 → 24-28 µs)
**Next Steps**:
1. Implement Phase 1 (1-line change + guard)
2. Fix Phase 2 (free path)
3. Run VM benchmark
4. Validate or rollback
**Confidence**: HIGH (based on BigCache's 90% hit rate + mimalloc analysis)
---
**Generated**: 2025-10-21 by ChatGPT-5 (via codex exec)
**Status**: Ready for implementation
**Priority**: P0 (unlocks Phase 6.3)