# ChatGPT Pro Consultation: mmap vs malloc Strategy
**Date**: 2025-10-21
**Context**: hakmem allocator optimization (Phase 6.2 + 6.3 implementation)
**Time Limit**: 10 minutes
**Question Type**: Architecture decision
---
## 🎯 Core Question
**Should we switch from malloc to mmap for large allocations (POLICY_LARGE_INFREQUENT) to enable Phase 6.3 madvise batching?**
---
## 📊 Current Situation
### What We Built (Phases 6.2 + 6.3)
1. **Phase 6.2: ELO Strategy Selection**
- 12 candidate strategies (512KB-32MB thresholds)
   - Epsilon-greedy selection (10% exploration; sketched after this list)
- Expected: +10-20% on VM scenario
2. **Phase 6.3: madvise Batching**
- Batch MADV_DONTNEED calls (4MB threshold)
- Reduces TLB flush overhead
- Expected: +20-30% on VM scenario
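A minimal sketch of the epsilon-greedy selection step (function and variable names here are hypothetical; the real logic lives in `hakmem_elo.c`):
```c
#include <stdlib.h>

#define NUM_STRATEGIES 12    // 512KB-32MB threshold candidates (Phase 6.2)
#define EPSILON        0.10  // 10% exploration

// Scores play the role of ELO ratings, updated from observed latency.
static double g_elo_score[NUM_STRATEGIES];

static int select_strategy_sketch(void) {
    if ((double)rand() / RAND_MAX < EPSILON)
        return rand() % NUM_STRATEGIES;   // explore: pick a random strategy
    int best = 0;                         // exploit: pick the current ELO leader
    for (int i = 1; i < NUM_STRATEGIES; i++)
        if (g_elo_score[i] > g_elo_score[best])
            best = i;
    return best;
}
```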
### Critical Problem Discovered
**Phase 6.3 doesn't work because all allocations use malloc!**
```c
// hakmem.c:357
static void* allocate_with_policy(size_t size, Policy policy) {
    switch (policy) {
        case POLICY_LARGE_INFREQUENT:
            // ALL ALLOCATIONS USE MALLOC
            return alloc_malloc(size);  // ← was alloc_mmap(size) before
```
**Why this is a problem**:
- madvise(MADV_DONTNEED) is only safe on pages hakmem mapped itself; malloc'd blocks belong to the system allocator (illustrated below)
- Current code: 100% malloc → 0% madvise batching
- The Phase 6.3 implementation is correct, but it is never triggered
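For reference, a minimal illustration of that constraint (standard POSIX calls; the function and its length parameter are ours, not hakmem's):
```c
#include <sys/mman.h>
#include <stddef.h>

// Releasing physical pages is only safe on a mapping we created ourselves.
void release_pages_example(size_t len) {
    void* p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return;

    // ... use the block, then decide to return its pages to the kernel ...
    madvise(p, len, MADV_DONTNEED);  // safe: we own this mapping

    // A malloc'd block's pages belong to the system allocator's heap;
    // calling MADV_DONTNEED on them would corrupt its bookkeeping.
    munmap(p, len);
}
```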
---
## 📜 Key Code Snippets
### 1. Current Allocation Strategy (ALL MALLOC)
```c
// hakmem.c:349-357
static void* allocate_with_policy(size_t size, Policy policy) {
    switch (policy) {
        case POLICY_LARGE_INFREQUENT:
            // CHANGED: Use malloc for all sizes to leverage system allocator's
            // built-in free-list and mmap optimization. Direct mmap() without
            // free-list causes excessive page faults (1538 vs 2 for 10×2MB).
            //
            // Future: Implement per-site mmap cache for true zero-copy large allocs.
            return alloc_malloc(size);  // was: alloc_mmap(size)
        case POLICY_SMALL_FREQUENT:
        case POLICY_MEDIUM:
        case POLICY_DEFAULT:
        default:
            return alloc_malloc(size);
    }
}
```
### 2. BigCache (Implemented for malloc blocks)
```c
// hakmem.c:430-437
// NEW: Try BigCache first (for large allocations)
if (size >= 1048576) {  // 1MB threshold
    void* cached_ptr = NULL;
    if (hak_bigcache_try_get(size, site_id, &cached_ptr)) {
        // Cache hit! Return immediately
        return cached_ptr;
    }
}
```
**Stats from FINAL_RESULTS.md**:
- BigCache hit rate: 90%
- Page faults reduced: 50% (513 vs 1026)
- BigCache caches malloc blocks (not mmap); a sketch of the per-site lookup idea follows
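A rough sketch of what a per-site lookup like `hak_bigcache_try_get` implies (struct and field names are hypothetical; the real structure is in `hakmem_bigcache.c`):
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical shape of a per-site big-block cache slot. Key idea: index by
// (call-site id, size class) so a site that repeatedly allocates 2MB blocks
// gets its own freed block back, with pages already faulted in.
typedef struct {
    uint64_t site_id;     // hash of the allocation call site
    size_t   size_class;  // rounded-up block size
    void*    block;       // cached block, or NULL if the slot is empty
} BigCacheEntry;

static bool bigcache_try_get_sketch(BigCacheEntry* table, size_t nslots,
                                    size_t size, uint64_t site_id, void** out) {
    size_t slot = (size_t)(site_id ^ size) % nslots;  // trivial hash, for illustration
    BigCacheEntry* e = &table[slot];
    if (e->block && e->site_id == site_id && e->size_class >= size) {
        *out = e->block;  // hit: reuse the still-mapped block
        e->block = NULL;
        return true;
    }
    return false;
}
```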
### 3. madvise Batching (Only works on mmap!)
```c
// hakmem.c:543-548
case ALLOC_METHOD_MMAP:
    // Phase 6.3: Batch madvise for mmap blocks ONLY
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // ← Never called!
    }
    munmap(raw, hdr->size);
    break;
```
**Problem**: No blocks have ALLOC_METHOD_MMAP, so batching never triggers.
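For context, the batching idea is to defer MADV_DONTNEED calls and issue them in one burst; a minimal sketch of that defer-and-flush structure (`hak_batch_add` is real, everything else here is assumed; how the flush amortizes TLB shootdowns is up to `hakmem_batch.c`):
```c
#include <sys/mman.h>
#include <stddef.h>

// Illustrative batch buffer: collect freed regions, then flush them together
// so TLB shootdowns happen in one burst instead of one per free.
#define BATCH_CAP 50  // assumed batch size; the doc's math uses groups of ~50

typedef struct { void* addr; size_t len; } BatchSlot;
static BatchSlot g_batch[BATCH_CAP];
static size_t g_batch_count = 0;

static void batch_flush_sketch(void) {
    for (size_t i = 0; i < g_batch_count; i++)
        madvise(g_batch[i].addr, g_batch[i].len, MADV_DONTNEED);
    g_batch_count = 0;
}

static void batch_add_sketch(void* addr, size_t len) {
    g_batch[g_batch_count].addr = addr;
    g_batch[g_batch_count].len  = len;
    if (++g_batch_count == BATCH_CAP)
        batch_flush_sketch();
}
```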
### 4. Historical Context (Why malloc was chosen)
```c
// Comment in hakmem.c:352-356
// CHANGED: Use malloc for all sizes to leverage system allocator's
// built-in free-list and mmap optimization. Direct mmap() without
// free-list causes excessive page faults (1538 vs 2 for 10×2MB).
//
// Future: Implement per-site mmap cache for true zero-copy large allocs.
```
**Before BigCache**:
- Direct mmap: 1538 page faults (10 allocations × 2MB)
- malloc: 2 page faults (system allocator's internal mmap caching)
**After BigCache** (current):
- BigCache hit rate: 90% → only 10% of allocations reach the underlying allocator
- Expected page faults with mmap: 1538 × 10% = ~150 faults
---
## 🤔 Decision Options
### Option A: Switch to mmap (Enable Phase 6.3)
**Change**:
```c
case POLICY_LARGE_INFREQUENT:
return alloc_mmap(size); // 1-line change
```
**Pros**:
- ✅ Phase 6.3 madvise batching works immediately
- ✅ BigCache (90% hit rate) should prevent a page-fault explosion
- ✅ Combined effect: BigCache + madvise batching
- ✅ Expected: ~150 blocks reach the free path; batched in groups of ~50 → ~3 TLB flushes (vs ~150 without batching)
**Cons**:
- ❌ Risk of a page-fault regression if BigCache doesn't perform as expected
- ❌ Need to verify BigCache works with mmap blocks (not just malloc)
**Expected Performance**:
- Page faults: 1538 → 150 (BigCache: 90% hit)
- TLB flushes: 150 → 3-5 (madvise batching: 50× reduction)
- Net speedup: +30-50% on VM scenario
### Option B: Keep malloc (Status quo)
**Pros**:
- ✅ Known good performance (system allocator optimization)
- ✅ No risk of page fault regression
**Cons**:
- ❌ The Phase 6.3 work is completely wasted (no madvise batching)
- ❌ No TLB optimization
- ❌ Can't close the gap with mimalloc (2× faster on VM, with madvise batching a suspected major factor)
### Option C: ELO-based dynamic selection
**Change**:
```c
// ELO selects between malloc and mmap strategies
if (strategy_id < 6) {
    return alloc_malloc(size);
} else {
    return alloc_mmap(size);  // Test mmap with top strategies
}
```
**Pros**:
- ✅ Let ELO learning decide based on actual performance
- ✅ Safe fallback to malloc if mmap performs worse
**Cons**:
- ❌ More complex
- ❌ Slower convergence (need data from both paths)
---
## 📊 Benchmark Data (Current Silver Medal Results)
**From FINAL_RESULTS.md**:
| Allocator | JSON (ns) | MIR (ns) | VM (ns) | MIXED (ns) |
|-----------|-----------|----------|---------|------------|
| mimalloc | 278.5 | 1234.0 | **17725.0** | 512.0 |
| **hakmem-evolving** | 272.0 | 1578.0 | **36647.5** | 739.5 |
| hakmem-baseline | 261.0 | 1690.0 | 36910.5 | 781.5 |
| jemalloc | 489.0 | 1493.0 | 27039.0 | 800.5 |
| system | 253.5 | 1724.0 | 62772.5 | 931.5 |
**Current gap (VM scenario)**:
- hakmem vs mimalloc: **2.07× slower** (36647 / 17725)
- Target with Phase 6.3: **1.3-1.4× slower** (close gap by 30-50%)
**Page faults (VM scenario)**:
- hakmem: 513 (with BigCache)
- system: 1026 (without BigCache)
- BigCache reduces faults by 50%
---
## 🎯 Specific Questions for ChatGPT Pro
1. **Risk Assessment**: Is switching to mmap safe given BigCache's 90% hit rate?
   - Will ~150 page faults (10% miss rate) be an acceptable overhead?
- Is madvise batching (150 → 3-5 TLB flushes) worth the risk?
2. **BigCache + mmap Compatibility**: Any concerns with caching mmap blocks?
- Current: BigCache caches malloc blocks
- Proposed: BigCache caches mmap blocks (same size class)
   - Any hidden issues? (one free-path hazard is sketched after this list)
3. **Alternative Approach**: Should we implement Option C (ELO-based selection)?
- Let ELO choose between malloc and mmap strategies
- Trade-off: complexity vs. safety
4. **mimalloc Analysis**: Does mimalloc use mmap for large allocations?
- How does it achieve 2× speedup on VM scenario?
- Is madvise batching the main factor?
5. **Performance Prediction**: Expected performance with Option A?
- Current: 36,647 ns (malloc, no batching)
- Predicted: ??? ns (mmap + BigCache + madvise batching)
- Is +30-50% gain realistic?
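On question 2, one concrete hazard worth flagging: a cached block must remember how it was allocated so eviction releases it correctly. A hypothetical sketch reusing the `ALLOC_METHOD_MMAP` tag already present in the free path (the malloc-side tag and the struct are assumed):
```c
#include <stdlib.h>
#include <sys/mman.h>

// Hypothetical: cache entries carry the allocation method so that evicting
// an mmap block calls munmap, never free (and vice versa).
typedef enum { ALLOC_METHOD_MALLOC, ALLOC_METHOD_MMAP } AllocMethod;

typedef struct {
    void*       block;
    size_t      size;
    AllocMethod method;
} CachedBlock;

static void bigcache_evict_sketch(CachedBlock* e) {
    if (e->method == ALLOC_METHOD_MMAP)
        munmap(e->block, e->size);  // mmap blocks go back to the kernel
    else
        free(e->block);             // malloc blocks go back to the heap
    e->block = NULL;
}
```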
---
## 🧪 Test Plan (If Option A is chosen)
1. **Switch to mmap** (1-line change)
2. **Run VM scenario benchmark** (10 runs, quick test)
3. **Measure** (a measurement sketch follows this list):
- Page faults (expect ~150, vs 513 with malloc)
- TLB flushes (expect 3-5, vs 150 without batching)
- Latency (expect 25,000-28,000 ns, vs 36,647 ns current)
4. **Rollback if**:
- Page faults > 500 (BigCache not working)
- Latency regression (slower than current)
---
## 📚 Context Files
**Implementation**:
- `hakmem.c`: Main allocator (allocate_with_policy L349)
- `hakmem_bigcache.c`: Per-site cache (90% hit rate)
- `hakmem_batch.c`: madvise batching (Phase 6.3)
- `hakmem_elo.c`: ELO strategy selection (Phase 6.2)
**Documentation**:
- `FINAL_RESULTS.md`: Silver medal results (2nd place / 5 allocators)
- `CHATGPT_FEEDBACK.md`: Your previous recommendations (ACE + ELO + madvise)
- `PHASE_6.2_ELO_IMPLEMENTATION.md`: ELO implementation details
- `PHASE_6.3_MADVISE_BATCHING.md`: madvise batching implementation
---
## 🎯 Recommendation Request
**Please provide**:
1. **Go/No-Go**: Should we switch to mmap (Option A)?
2. **Risk mitigation**: How to safely test without breaking current performance?
3. **Alternative**: If not Option A, what's the best path to gold medal?
4. **Expected gain**: Realistic performance prediction with mmap + batching?
**Time limit**: 10 minutes
**Priority**: HIGH (blocks Phase 6.3 effectiveness)
---
**Generated**: 2025-10-21
**Status**: Awaiting ChatGPT Pro consultation
**Next**: Implement recommended approach