# ChatGPT Pro Response: mmap vs malloc Strategy

**Date**: 2025-10-21

**Response Time**: ~2 minutes

**Model**: GPT-5 (via codex)

**Status**: ✅ Clear recommendation received

---

## 🎯 **Final Recommendation: GO with Option A**

**Decision**: Switch `POLICY_LARGE_INFREQUENT` to `mmap`, behind a kill-switch guard.

---

## ✅ **Why Option A**

1. **Phase 6.3 requires mmap**: `madvise` is a no-op on `malloc` blocks
2. **BigCache absorbs risk**: a 90% hit rate means only 10% of frees reach the OS (1538 → ~150 faults)
3. **mimalloc's secret**: "keep the mapping, lazily reclaim" via MADV_FREE/DONTNEED
4. **Immediate unlock**: Phase 6.3 starts working as soon as the switch lands

---

## 🔥 **CRITICAL BUG DISCOVERED in Current Code**

**Problem in `hakmem.c:543`**:

```c
case ALLOC_METHOD_MMAP:
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // Add to batch
    }
    munmap(raw, hdr->size);             // ← BUG! Immediately unmaps
    break;
```

**Why this is wrong**:

- Calls `munmap` immediately after adding the block to the batch
- **Negates the Phase 6.3 benefit**: the batch can no longer coalesce or defer TLB work
- The TLB flush happens on `munmap`, not on `madvise`

---

## ✅ **Correct Implementation**

### Free Path Logic (Choose ONE):

**Option 1: Cache in BigCache**

```c
// Try BigCache first
if (hak_bigcache_try_insert(ptr, size, site_id)) {
    // Cached! Do NOT munmap
    // Optionally: madvise(MADV_FREE) on insert or eviction
    return;
}
```

**Option 2: Batch for delayed reclaim**

```c
// BigCache full, add to batch
if (size >= BATCH_MIN_SIZE) {
    hak_batch_add(raw, size);
    // Do NOT munmap here!
    // munmap happens on batch flush (coalesced)
    return;
}
```

**Option 3: Immediate unmap (last resort)**

```c
// Cold eviction only
munmap(raw, size);
```

---

## 🎯 **Implementation Plan**

### Phase 1: Minimal Change (1-line)

**File**: `hakmem.c:357`

```c
case POLICY_LARGE_INFREQUENT:
    return alloc_mmap(size);  // Changed from alloc_malloc
```

**Guard with kill-switch**:

```c
#ifdef HAKO_HAKMEM_LARGE_MMAP
    return alloc_mmap(size);
#else
    return alloc_malloc(size);  // Safe fallback
#endif
```

**Env variable**: `HAKO_HAKMEM_LARGE_MMAP=1` (default OFF)
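
If the kill-switch is also meant to be toggled at runtime (the env-variable form above suggests so), a lazy `getenv` check is one sketch; the helper name `large_mmap_enabled` is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Cached result: -1 = not yet read, 0 = off (default), 1 = on. */
static int g_large_mmap_enabled = -1;

/* Runtime kill-switch: HAKO_HAKMEM_LARGE_MMAP=1 enables the mmap path.
 * Any other value, or an unset variable, keeps the safe malloc fallback. */
static int large_mmap_enabled(void) {
    if (g_large_mmap_enabled < 0) {
        const char *v = getenv("HAKO_HAKMEM_LARGE_MMAP");
        g_large_mmap_enabled = (v && strcmp(v, "1") == 0) ? 1 : 0;
    }
    return g_large_mmap_enabled;
}
```

Caching the answer keeps the hot path to a single branch after the first call.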

### Phase 2: Fix Free Path

**File**: `hakmem.c:543-548`

**Current (WRONG)**:

```c
case ALLOC_METHOD_MMAP:
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);
    }
    munmap(raw, hdr->size);  // ← Remove this!
    break;
```

**Correct**:

```c
case ALLOC_METHOD_MMAP:
    // Try BigCache first
    if (hdr->size >= 1048576) {  // 1 MB threshold
        if (hak_bigcache_try_insert(user_ptr, hdr->size, site_id)) {
            // Cached, skip munmap
            return;
        }
    }

    // BigCache full (or block below threshold): add to batch
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);
        // munmap deferred to batch flush
        return;
    }

    // Small block or batching disabled: immediate unmap
    munmap(raw, hdr->size);
    break;
```

### Phase 3: Batch Flush Implementation

**File**: `hakmem_batch.c`

```c
void hak_batch_flush(void) {
    if (batch_count == 0) return;

    // Use MADV_FREE (preferred) or MADV_DONTNEED (fallback)
    for (size_t i = 0; i < batch_count; i++) {
#ifdef __linux__
        madvise(batch[i].ptr, batch[i].size, MADV_FREE);
#else
        madvise(batch[i].ptr, batch[i].size, MADV_DONTNEED);
#endif
    }

    // Optional: munmap on cold eviction
    // (keep the VA mapped for reuse in most cases)

    batch_count = 0;
}
```

---

## 📊 **Expected Performance Gains**

### Metrics Prediction:

| Metric | Current (malloc) | With Option A (mmap) | Improvement |
|--------|------------------|----------------------|-------------|
| **Page faults** | 513 | **120-180** | 65-77% fewer |
| **TLB shootdowns** | ~150 | **3-8** | 95% fewer |
| **Latency (VM)** | 36,647 ns | **24,000-28,000 ns** | **30-45% faster** |

### Success Criteria:
- ✅ Page faults: 120-180 (vs 513 current)
- ✅ Batch flushes: 3-8 per run
- ✅ Latency: 24-28 µs (vs 36.6 µs current)

### Rollback Criteria:
- ❌ Page faults > 500 (BigCache failing)
- ❌ Latency regression (slower than 36,647 ns)

---

## 🛡️ **Risk Mitigation**

### 1. Kill-Switch Guard

```bash
# Compile-time or runtime flag
HAKO_HAKMEM_LARGE_MMAP=1  # Enable mmap path
```

### 2. BigCache Hard Cap
- Limit: 64-256 MB (1-2× the working set)
- LRU eviction feeds the batched reclaim path

### 3. Prefer MADV_FREE
- Lower TLB cost than MADV_DONTNEED
- Better performance on quick reuse
- Linux: `MADV_FREE`, macOS: `MADV_FREE_REUSABLE`

### 4. Observability (Add Counters)
- mmap allocation count
- BigCache hits/misses for mmap
- Batch flush count
- munmap count
- Sample `minflt`/`majflt` before/after

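A minimal version of these counters plus a fault sampler might look like this — a sketch, and every name here is hypothetical:

```c
#include <stdatomic.h>
#include <sys/resource.h>

/* Hypothetical stats block; bump these from the alloc/free/flush paths. */
static _Atomic unsigned long g_mmap_allocs, g_bigcache_hits,
                             g_bigcache_misses, g_batch_flushes, g_munmaps;

/* Sample soft/hard page-fault counts so runs can be compared before/after. */
static void sample_faults(long *minflt, long *majflt) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    *minflt = ru.ru_minflt;
    *majflt = ru.ru_majflt;
}
```

Sampling around the benchmark loop gives the per-run fault delta that the success criteria above depend on.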
---

## 🧪 **Test Plan**

### Step 1: Enable mmap with guard

```bash
# Makefile
CFLAGS += -DHAKO_HAKMEM_LARGE_MMAP=1
```

### Step 2: Run VM scenario benchmark

```bash
# 10 runs, measuring faults, flushes, and latency
make bench_vm RUNS=10
```

### Step 3: Collect metrics
- BigCache hit% for mmap
- Page faults (expect 120-180)
- Batch flushes (expect 3-8)
- Latency (expect 24-28 µs)

### Step 4: Validate or rollback

```bash
# If page faults > 500 or latency regresses:
CFLAGS += -UHAKO_HAKMEM_LARGE_MMAP  # Rollback
```

---

## 🎯 **BigCache + mmap Compatibility**

**ChatGPT Pro confirms: SAFE**

- ✅ mmap blocks can be cached (same semantics as malloc blocks)
- ✅ Contents are unspecified after reuse (matches malloc)
- ✅ Reusable after `MADV_FREE`

**Required changes**:

1. **Allocation**: `hak_bigcache_try_get` serves mmap blocks
2. **Free**: Try a BigCache insert first; skip `munmap` if cached
3. **Header**: Keep `ALLOC_METHOD_MMAP` on cached blocks

---

## 🏆 **mimalloc's Secret Revealed**

**How mimalloc wins the VM scenario**:

1. **Keep the VA mapped**: don't `munmap` immediately
2. **Lazy reclaim**: use `MADV_FREE`/`MADV_FREE_REUSABLE`
3. **Batch TLB work**: coalesce reclamation
4. **Per-segment reuse**: cache large blocks

**Our Option A emulates this**: BigCache + mmap + MADV_FREE + batching

---

## 📋 **Action Items**

### Immediate (Phase 1):
- [ ] Add kill-switch guard (`HAKO_HAKMEM_LARGE_MMAP`)
- [ ] Change line 357: `return alloc_mmap(size);`
- [ ] Test compile

### Critical (Phase 2):
- [ ] Fix free path (remove the immediate `munmap`)
- [ ] Implement BigCache insert check
- [ ] Defer `munmap` to batch flush

### Optimization (Phase 3):
- [ ] Switch to `MADV_FREE` (Linux)
- [ ] Add observability counters
- [ ] Implement BigCache hard cap (64-256 MB)

### Validation:
- [ ] Run VM scenario (10 runs)
- [ ] Verify page faults < 200
- [ ] Verify latency 24-28 µs
- [ ] Roll back if metrics fail

---

## 🎯 **Alternative: Option C (ELO)**

**If Option A fails**:

- Extend the ELO action space with a malloc-vs-mmap dimension
- Doubles the ELO arms (12 → 24 strategies)
- Slower convergence, more complexity

**ChatGPT Pro says**: "Overkill right now. Ship Option A with kill-switch first."

---

## 📊 **Summary**

**Decision**: ✅ GO with Option A (mmap + kill-switch)

**Critical Fix**: Remove the immediate `munmap` in the free path

**Expected Gain**: 30-45% improvement on the VM scenario (36.6 → 24-28 µs)

**Next Steps**:

1. Implement Phase 1 (one-line change + guard)
2. Fix the free path (Phase 2)
3. Run the VM benchmark
4. Validate or roll back

**Confidence**: HIGH (based on BigCache's 90% hit rate and the mimalloc analysis)

---

**Generated**: 2025-10-21 by ChatGPT-5 (via codex exec)

**Status**: Ready for implementation

**Priority**: P0 (unlocks Phase 6.3)