# ChatGPT Pro Analysis: Batch Not Triggered Issue
**Date**: 2025-10-21
**Status**: Implementation correct, coverage issue + one gap
---
## 🎯 **Short Answer**
**This is primarily a benchmark coverage issue, plus one implementation gap.**
Current run never calls the batch path because:
- BigCache intercepts almost all frees
- Eviction callback does direct munmap (bypasses batch)
**Result**: You've already captured **~29% gain** from switching to mmap + BigCache!
Batching will mostly help **cold-churn patterns**, not hit-heavy ones.
---
## 🔍 **Why 0 Blocks Are Batched**
### 1. Free Path Skipped
- Cacheable mmap blocks → BigCache → return early
- `hak_batch_add` (hakmem.c:586) **never runs**
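
A minimal sketch of the shape of this path (the types, fields, and signatures here are assumptions for illustration, not the actual hakmem API):

```c
#include <stdbool.h>
#include <stddef.h>

// Illustrative stand-ins; only hak_batch_add is a real symbol (hakmem.c:586)
typedef struct { size_t size; int method; } hdr_t;
enum { ALLOC_METHOD_MMAP = 1 };
bool bigcache_try_put(void* raw, size_t size);  // true = block now cached
void hak_batch_add(void* raw, size_t size);

// A successful BigCache put returns early, so hak_batch_add never runs
// for cacheable mmap blocks.
static void hak_free_large(void* raw, hdr_t* hdr) {
    if (hdr->method == ALLOC_METHOD_MMAP && bigcache_try_put(raw, hdr->size)) {
        return;  // warm reuse path: batching skipped entirely
    }
    hak_batch_add(raw, hdr->size);  // reached only when the cache declines
}
```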
### 2. Eviction Bypasses Batch
- BigCache eviction callback (hakmem.c:403):
```c
case ALLOC_METHOD_MMAP:
    madvise(raw, hdr->size, MADV_FREE);
    munmap(raw, hdr->size);  // ❌ Direct munmap, not batched
    break;
```
### 3. Too Few Evictions
- VM(10) + `BIGCACHE_RING_CAP=4` → only **1 eviction**
- `BATCH_THRESHOLD=4MB` → a flush fires only after **at least two 2 MB evictions** accumulate (see the batch sketch under Fix 1)
---
## ✅ **Fixes (Structural First)**
### Fix 1: Route Eviction Through Batch
**File**: `hakmem.c:403-407`
**Current (WRONG)**:
```c
case ALLOC_METHOD_MMAP:
    madvise(raw, hdr->size, MADV_FREE);
    munmap(raw, hdr->size);  // ❌ Bypasses batch
    break;
```
**Fixed**:
```c
case ALLOC_METHOD_MMAP:
    // Cold eviction: route large blocks through the batch
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // ✅ Route to batch
    } else {
        // Small blocks: direct munmap
        madvise(raw, hdr->size, MADV_FREE);
        munmap(raw, hdr->size);
    }
    break;
```
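
For context, a minimal sketch of what the batch side could look like (the struct, names, and limits are assumptions, not the real `hak_batch_add`). It shows both why the 4 MB threshold needs two 2 MB evictions before a flush, and why, per ChatGPT Pro's point further below, each flushed block still pays its own TLB flush:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#define BATCH_THRESHOLD (4u * 1024u * 1024u)  // 4 MB: two 2 MB blocks trip a flush
#define BATCH_MAX_ITEMS 64

// Hypothetical accumulator; the real hakmem layout may differ
static struct { void* ptr; size_t size; } g_batch[BATCH_MAX_ITEMS];
static size_t g_batch_count;
static size_t g_batch_bytes;

static void batch_flush(void) {
    for (size_t i = 0; i < g_batch_count; i++) {
        madvise(g_batch[i].ptr, g_batch[i].size, MADV_FREE);  // lighter reclaim first
        munmap(g_batch[i].ptr, g_batch[i].size);              // still one TLB flush each
    }
    g_batch_count = 0;
    g_batch_bytes = 0;
}

static void batch_add(void* ptr, size_t size) {
    g_batch[g_batch_count].ptr = ptr;
    g_batch[g_batch_count].size = size;
    g_batch_count++;
    g_batch_bytes += size;
    // One flush pass replaces N interleaved free-path syscall sequences
    if (g_batch_bytes >= BATCH_THRESHOLD || g_batch_count == BATCH_MAX_ITEMS)
        batch_flush();
}
```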
### Fix 2: Document Boundary
**Add to README**:
> "BigCache retains for warm reuse; on cold eviction, hand off to Batch; only Batch may `munmap`."
Writing this invariant down prevents future code paths from quietly reintroducing direct `munmap` calls.
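
One way to make the rule hard to regress, sketched as a hypothetical debug-only guard (not currently in the tree): funnel every unmap through a helper that asserts it is running inside a batch flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

// Hypothetical guard: batch_flush sets this around its munmap loop,
// so any other call site trips the assert in debug builds.
static _Thread_local bool g_in_batch_flush = false;

static void hak_unmap(void* p, size_t len) {
    assert(g_in_batch_flush && "munmap outside batch flush breaks the release-path rule");
    munmap(p, len);
}
```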
---
## 🧪 **Bench Plan (Exercise Batching)**
### Option 1: Increase Churn
```bash
# Generate 1000 alloc/free ops (100 × 10)
./bench_allocators_hakmem --allocator hakmem-evolving --scenario vm --iterations 100
```
**Expected**:
- Evictions: ~96 (100 allocs - 4 cache slots)
- Batch flushes: ~48 (96 evictions ÷ 2 blocks/flush at 4MB threshold)
- Stats: `Total blocks added > 0`
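
One quick way to check, assuming the counter prints under the `Total blocks added` label quoted above:

```bash
./bench_allocators_hakmem --allocator hakmem-evolving --scenario vm \
  --iterations 100 | grep "Total blocks added"
```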
### Option 2: Reduce Cache Capacity
**File**: `hakmem_bigcache.h:20`
```c
#define BIGCACHE_RING_CAP 2 // Changed from 4
```
**Result**: More evictions with same iterations
---
## 📊 **Performance Expectations**
### Current Gains
- **Previous** (malloc): 36,647 ns
- **Current** (mmap + BigCache): 25,888 ns
- **Improvement**: **29.4%** 🎉
### Expected with Batch Working
**Scenario 1: Cache-Heavy (Current)**
- BigCache 99% hit → batch rarely used
- **Additional gain**: 0-5% (minimal)
**Scenario 2: Cold-Churn Heavy**
- Many evictions, low reuse
- **Additional gain**: 5-15%
- **Total**: 30-40% vs malloc baseline
### Why Limited Gains?
**ChatGPT Pro's Insight**:
> "Each `munmap` still triggers TLB flush individually. Batching helps by:
> 1. Reducing syscall overhead (N calls → 1 batch)
> 2. Using `MADV_FREE` before `munmap` (lighter)
>
> But it does NOT reduce TLB flushes from N→1. Each `munmap(ptr, size)` in the loop still flushes."
**Key Point**: Batching helps with **syscall overhead**, not TLB flush count.
---
## 🎯 **Answers to Your Questions**
### 1. Is the benchmark too small?
**YES**. With `BIGCACHE_RING_CAP=4`:
- Need >4 evictions to see batching
- VM(10) = 1 eviction only
- **Recommendation**: `--iterations 100`
### 2. Should BigCache eviction use batch?
**YES (with size gate)**:
- Large blocks (≥64KB) → batch
- Small blocks → direct munmap
- **Fix**: hakmem.c:403-407
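
If `BATCH_MIN_SIZE` does not exist yet, a definition matching this 64 KB cutoff might look like the following (the name and value placement are assumptions):

```c
// Hypothetical size gate: evicted blocks of at least 64 KB go to the batch
#define BATCH_MIN_SIZE (64u * 1024u)
```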
### 3. Is BigCache capacity too large?
**For testing, yes**:
- Current: 4 slots × 2MB = 8MB
- **For testing**: Reduce to 2 slots
- **For production**: Keep 4 (better hit rate)
### 4. What's the right test scenario?
**Two scenarios needed**:
**A) Cache-Heavy** (current VM):
- Tests BigCache effectiveness
- Batching rarely triggered
**B) Cold-Churn** (new scenario):
```c
// Allocate unique 2 MB blocks with no reuse, then free them all
// (alloc/free stand in for the allocator entry points under test)
for (int i = 0; i < 1000; i++) {
    void* bufs[100];
    for (int j = 0; j < 100; j++) {
        bufs[j] = alloc(2 * 1024 * 1024);
    }
    for (int j = 0; j < 100; j++) {
        free(bufs[j]);
    }
}
```
### 5. Is 29.4% gain good enough?
**ChatGPT Pro says**:
> "You've already hit the predicted range (30-45%). The gain comes from:
> - mmap efficiency for 2MB blocks
> - BigCache eliminating most alloc/free overhead
>
> Batching adds **marginal** benefit in your workload (cache-heavy).
>
> **Recommendation**: Ship current implementation. Batching will help when you add workloads with lower cache hit rates."
---
## 🚀 **Next Steps (Prioritized)**
### Option A: Fix + Quick Test (Recommended)
1. ✅ Fix BigCache eviction (route to batch)
2. ✅ Run `--iterations 100`
3. ✅ Verify batch stats show >0 blocks
4. ✅ Document the architecture
**Time**: 15-30 minutes
### Option B: Comprehensive Testing
1. Fix BigCache eviction
2. Add cold-churn scenario
3. Benchmark: cache-heavy vs cold-churn
4. Generate comparison chart
**Time**: 1-2 hours
### Option C: Ship Current (Fast Track)
1. Accept 29.4% gain
2. Document "batch infrastructure ready"
3. Test batch when cold-churn workloads appear
**Time**: 5 minutes
---
## 💡 **ChatGPT Pro's Final Recommendation**
**Go with Option A**:
> "Fix the eviction callback to complete the implementation, then run `--iterations 100` to confirm batching works. You'll see stats change from 0→96 blocks added.
>
> The performance gain will be modest (0-10% more) because BigCache is already doing its job. But having the complete infrastructure ready is valuable for future workloads with lower cache hit rates.
>
> **Ship with confidence**: 29.4% gain is solid, and the architecture is now correct."
---
## 📋 **Implementation Checklist**
- [ ] Fix BigCache eviction callback (hakmem.c:403)
- [ ] Run `--iterations 100` test
- [ ] Verify batch stats show >0 blocks
- [ ] Document release path architecture
- [ ] Optional: Add cold-churn test scenario
- [ ] Commit with summary
---
**Generated**: 2025-10-21 by ChatGPT-5 (via codex)
**Status**: Ready to fix and test
**Priority**: Medium (complete infrastructure)