ChatGPT Pro Consultation: mmap vs malloc Strategy
Date: 2025-10-21
Context: hakmem allocator optimization (Phase 6.2 + 6.3 implementation)
Time limit: 10 minutes
Question type: Architecture decision
🎯 Core Question
Should we switch from malloc to mmap for large allocations (POLICY_LARGE_INFREQUENT) to enable Phase 6.3 madvise batching?
📊 Current Situation
What We Built (Phases 6.2 + 6.3)
1. Phase 6.2: ELO Strategy Selection ✅
   - 12 candidate strategies (512KB-32MB thresholds)
   - Epsilon-greedy selection (10% exploration)
   - Expected: +10-20% on VM scenario
2. Phase 6.3: madvise Batching ✅ (sketched below)
   - Batch MADV_DONTNEED calls (4MB threshold)
   - Reduces TLB flush overhead
   - Expected: +20-30% on VM scenario
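For reference, the queue-and-flush idea behind hak_batch_add can be sketched as follows. This is a minimal sketch, not hakmem's implementation: the pending-list layout and capacity are assumptions, and only the hak_batch_add name, MADV_DONTNEED, and the 4MB threshold come from this document.

```c
#include <stddef.h>
#include <sys/mman.h>

#define BATCH_MIN_SIZE (4u << 20)  /* 4MB: smallest block worth batching */
#define BATCH_CAP      32          /* assumed capacity of the pending list */

typedef struct { void* addr; size_t len; } PendingBlock;

static PendingBlock g_pending[BATCH_CAP];
static int g_pending_count = 0;

/* Release every queued block in one pass; grouping the madvise() calls
 * back-to-back is what Phase 6.3 counts on to cut TLB-flush overhead
 * relative to scattered, one-at-a-time releases. */
static void hak_batch_flush(void) {
    for (int i = 0; i < g_pending_count; i++)
        madvise(g_pending[i].addr, g_pending[i].len, MADV_DONTNEED);
    g_pending_count = 0;
}

/* Queue one still-mapped mmap block for deferred MADV_DONTNEED.
 * (The caller applies the BATCH_MIN_SIZE check, as in snippet 3 below.) */
static void hak_batch_add(void* addr, size_t len) {
    if (g_pending_count == BATCH_CAP)
        hak_batch_flush();
    g_pending[g_pending_count++] = (PendingBlock){ addr, len };
}
```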
Critical Problem Discovered
Phase 6.3 doesn't work because all allocations use malloc!
```c
// hakmem.c:357
static void* allocate_with_policy(size_t size, Policy policy) {
    switch (policy) {
        case POLICY_LARGE_INFREQUENT:
            // ALL ALLOCATIONS USE MALLOC
            return alloc_malloc(size);  // ← Was alloc_mmap(size) before
        // ... (full function in snippet 1 below)
    }
}
```
Why this is a problem:
- madvise() only works on mmap blocks, not malloc blocks (illustrated below)
- Current code: 100% malloc → 0% madvise batching
- Phase 6.3's implementation is correct, but never triggered
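Concretely, the allocator cannot simply madvise() memory it got from malloc. A minimal illustration of why (Linux semantics; the 2MB size is arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
    void* p = malloc(2 * 1024 * 1024);
    /* madvise() requires a page-aligned address; malloc() pointers
     * generally are not page-aligned, so this typically fails with
     * EINVAL on Linux. */
    int rc = madvise(p, 2 * 1024 * 1024, MADV_DONTNEED);
    printf("madvise on malloc block: %s\n", rc == 0 ? "ok" : "failed");
    /* Even where it did succeed, MADV_DONTNEED zeroes pages the system
     * allocator still owns, corrupting its internal state. */
    free(p);
    return 0;
}
```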
📜 Key Code Snippets
1. Current Allocation Strategy (ALL MALLOC)
```c
// hakmem.c:349-357
static void* allocate_with_policy(size_t size, Policy policy) {
    switch (policy) {
        case POLICY_LARGE_INFREQUENT:
            // CHANGED: Use malloc for all sizes to leverage system allocator's
            // built-in free-list and mmap optimization. Direct mmap() without
            // free-list causes excessive page faults (1538 vs 2 for 10×2MB).
            //
            // Future: Implement per-site mmap cache for true zero-copy large allocs.
            return alloc_malloc(size);  // was: alloc_mmap(size)
        case POLICY_SMALL_FREQUENT:
        case POLICY_MEDIUM:
        case POLICY_DEFAULT:
        default:
            return alloc_malloc(size);
    }
}
```
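For comparison, a plausible alloc_mmap counterpart is sketched below. The header layout is an assumption inferred from the free path in snippet 3 (which reads hdr->size and dispatches on ALLOC_METHOD_MMAP); hakmem's real header is not shown in this document.

```c
#include <stddef.h>
#include <sys/mman.h>

typedef enum { ALLOC_METHOD_MALLOC, ALLOC_METHOD_MMAP } AllocMethod;

/* Assumed per-block header, placed at the start of the mapping, so that
 * free() can dispatch on hdr->method (cf. the switch in snippet 3). */
typedef struct {
    size_t      size;    /* total mapping size, including this header */
    AllocMethod method;
} AllocHeader;

static void* alloc_mmap(size_t size) {
    size_t total = size + sizeof(AllocHeader);
    void* raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    AllocHeader* hdr = (AllocHeader*)raw;
    hdr->size   = total;
    hdr->method = ALLOC_METHOD_MMAP;
    return (char*)raw + sizeof(AllocHeader);  /* user pointer past the header */
}
```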
2. BigCache (Implemented for malloc blocks)
```c
// hakmem.c:430-437
// NEW: Try BigCache first (for large allocations)
if (size >= 1048576) {  // 1MB threshold
    void* cached_ptr = NULL;
    if (hak_bigcache_try_get(size, site_id, &cached_ptr)) {
        // Cache hit! Return immediately
        return cached_ptr;
    }
}
```
Stats from FINAL_RESULTS.md:
- BigCache hit rate: 90%
- Page faults reduced: 50% (513 vs 1026)
- BigCache caches malloc blocks (not mmap)
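To picture the interface, BigCache can be thought of as a small free list keyed by call site and size. The sketch below is an assumption about its shape: only hak_bigcache_try_get appears in this document; the slot layout, capacity, the uint64_t site_id type, and the put path are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BIGCACHE_SLOTS 64  /* assumed capacity */

typedef struct {
    void*    ptr;
    size_t   size;
    uint64_t site_id;  /* allocation call site that freed the block */
    bool     in_use;
} BigCacheSlot;

static BigCacheSlot g_bigcache[BIGCACHE_SLOTS];

/* On free: stash the block instead of returning it to the OS. */
bool hak_bigcache_put(void* ptr, size_t size, uint64_t site_id) {
    for (int i = 0; i < BIGCACHE_SLOTS; i++) {
        if (!g_bigcache[i].in_use) {
            g_bigcache[i] = (BigCacheSlot){ ptr, size, site_id, true };
            return true;
        }
    }
    return false;  /* cache full: caller releases the block normally */
}

/* On alloc: reuse a block of at least `size` freed by the same site. */
bool hak_bigcache_try_get(size_t size, uint64_t site_id, void** out) {
    for (int i = 0; i < BIGCACHE_SLOTS; i++) {
        if (g_bigcache[i].in_use &&
            g_bigcache[i].site_id == site_id &&
            g_bigcache[i].size >= size) {
            g_bigcache[i].in_use = false;
            *out = g_bigcache[i].ptr;
            return true;
        }
    }
    return false;
}
```

Nothing in this shape cares whether ptr came from malloc or mmap, which is why Option A below expects BigCache to keep working; the open question is whether the real implementation makes the same assumption.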
3. madvise Batching (Only works on mmap!)
```c
// hakmem.c:543-548
case ALLOC_METHOD_MMAP:
    // Phase 6.3: Batch madvise for mmap blocks ONLY
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // ← Never called!
    }
    munmap(raw, hdr->size);
    break;
```
Problem: No blocks have ALLOC_METHOD_MMAP, so batching never triggers.
4. Historical Context (Why malloc was chosen)
```c
// Comment in hakmem.c:352-356
// CHANGED: Use malloc for all sizes to leverage system allocator's
// built-in free-list and mmap optimization. Direct mmap() without
// free-list causes excessive page faults (1538 vs 2 for 10×2MB).
//
// Future: Implement per-site mmap cache for true zero-copy large allocs.
```
Before BigCache:
- Direct mmap: 1538 page faults (10 allocations × 2MB)
- malloc: 2 page faults (system allocator's internal mmap caching)
After BigCache (current):
- BigCache hit rate: 90% → Only 10% of allocations hit actual allocator
- Expected page faults with mmap: 1538 × 10% = ~150 faults
🤔 Decision Options
Option A: Switch to mmap (Enable Phase 6.3)
Change:
```c
case POLICY_LARGE_INFREQUENT:
    return alloc_mmap(size);  // 1-line change
```
Pros:
- ✅ Phase 6.3 madvise batching works immediately
- ✅ BigCache (90% hit) should prevent page fault explosion
- ✅ Combined effect: BigCache + madvise batching
- ✅ Expected: 150 faults → ~3 TLB flushes (150 madvise calls batched ~50 at a time, vs 150 individual flushes without batching)
Cons:
- ❌ Risk of page fault regression if BigCache doesn't work as expected
- ❌ Need to verify BigCache works with mmap blocks (not just malloc)
Expected Performance:
- Page faults: 1538 → 150 (BigCache: 90% hit)
- TLB flushes: 150 → 3-5 (madvise batching: 50× reduction)
- Net speedup: +30-50% on VM scenario
Option B: Keep malloc (Status quo)
Pros:
- ✅ Known good performance (system allocator optimization)
- ✅ No risk of page fault regression
Cons:
- ❌ Phase 6.3 completely wasted (no madvise batching)
- ❌ No TLB optimization
- ❌ Can't compete with mimalloc (2× faster due to madvise batching)
Option C: ELO-based dynamic selection
Change:
```c
// ELO selects between malloc and mmap strategies
if (strategy_id < 6) {
    return alloc_malloc(size);
} else {
    return alloc_mmap(size);  // Test mmap with top strategies
}
```
Pros:
- ✅ Let ELO learning decide based on actual performance
- ✅ Safe fallback to malloc if mmap performs worse
Cons:
- ❌ More complex
- ❌ Slower convergence (need data from both paths)
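As a reference point for what Option C would build on, Phase 6.2's epsilon-greedy selection can be sketched as follows. The strategy table and exploit rule here are illustrative assumptions; only the 10% exploration rate and the 12-strategy count come from this document.

```c
#include <stdlib.h>

#define NUM_STRATEGIES 12     /* 512KB-32MB thresholds (Phase 6.2) */
#define EPSILON        0.10   /* 10% exploration */

static double g_rating[NUM_STRATEGIES];  /* ELO-style score per strategy */

/* Epsilon-greedy: usually exploit the best-rated strategy,
 * occasionally explore a random one to keep ratings fresh. */
static int select_strategy(void) {
    if ((double)rand() / RAND_MAX < EPSILON)
        return rand() % NUM_STRATEGIES;  /* explore */
    int best = 0;
    for (int i = 1; i < NUM_STRATEGIES; i++)
        if (g_rating[i] > g_rating[best])
            best = i;
    return best;                         /* exploit */
}
```

Option C would then map, for example, strategy IDs 0-5 to malloc and 6-11 to mmap, letting the ratings absorb any page-fault regression automatically.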
📊 Benchmark Data (Current Silver Medal Results)
From FINAL_RESULTS.md:
| Allocator | JSON (ns) | MIR (ns) | VM (ns) | MIXED (ns) |
|---|---|---|---|---|
| mimalloc | 278.5 | 1234.0 | 17725.0 | 512.0 |
| hakmem-evolving | 272.0 | 1578.0 | 36647.5 | 739.5 |
| hakmem-baseline | 261.0 | 1690.0 | 36910.5 | 781.5 |
| jemalloc | 489.0 | 1493.0 | 27039.0 | 800.5 |
| system | 253.5 | 1724.0 | 62772.5 | 931.5 |
Current gap (VM scenario):
- hakmem vs mimalloc: 2.07× slower (36647 / 17725)
- Target with Phase 6.3: 1.3-1.4× slower (close gap by 30-50%)
Page faults (VM scenario):
- hakmem: 513 (with BigCache)
- system: 1026 (without BigCache)
- BigCache reduces faults by 50%
🎯 Specific Questions for ChatGPT Pro
1. Risk Assessment: Is switching to mmap safe given BigCache's 90% hit rate?
   - Will 150 page faults (10% miss rate) cause acceptable overhead?
   - Is madvise batching (150 → 3-5 TLB flushes) worth the risk?
2. BigCache + mmap Compatibility: Any concerns with caching mmap blocks?
   - Current: BigCache caches malloc blocks
   - Proposed: BigCache caches mmap blocks (same size class)
   - Any hidden issues?
3. Alternative Approach: Should we implement Option C (ELO-based selection)?
   - Let ELO choose between malloc and mmap strategies
   - Trade-off: complexity vs. safety
4. mimalloc Analysis: Does mimalloc use mmap for large allocations?
   - How does it achieve a 2× speedup on the VM scenario?
   - Is madvise batching the main factor?
5. Performance Prediction: What performance should we expect with Option A?
   - Current: 36,647 ns (malloc, no batching)
   - Predicted: ??? ns (mmap + BigCache + madvise batching)
   - Is a +30-50% gain realistic?
🧪 Test Plan (If Option A is chosen)
1. Switch to mmap (1-line change)
2. Run the VM scenario benchmark (10 runs, quick test)
3. Measure (see the measurement sketch below):
   - Page faults (expect ~150, vs 513 with malloc)
   - TLB flushes (expect 3-5, vs 150 without batching)
   - Latency (expect 25,000-28,000 ns, vs 36,647 ns current)
4. Roll back if:
   - Page faults > 500 (BigCache not working)
   - Latency regresses (slower than current)
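A quick way to get the page-fault numbers for steps 2-3 without perf is getrusage(). The harness below is a minimal sketch; run_vm_scenario() is a placeholder for the actual benchmark entry point.

```c
#include <stdio.h>
#include <sys/resource.h>

extern void run_vm_scenario(void);  /* placeholder for the benchmark body */

int main(void) {
    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    run_vm_scenario();
    getrusage(RUSAGE_SELF, &after);
    /* Minor faults cover anonymous-page first touches (the mmap cost);
     * major faults should stay near zero for an in-memory benchmark. */
    printf("minor faults: %ld\n", after.ru_minflt - before.ru_minflt);
    printf("major faults: %ld\n", after.ru_majflt - before.ru_majflt);
    return 0;
}
```

TLB flush counts are not visible through getrusage; those still need perf (e.g. the tlb:tlb_flush tracepoint).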
📚 Context Files
Implementation:
- hakmem.c: Main allocator (allocate_with_policy, L349)
- hakmem_bigcache.c: Per-site cache (90% hit rate)
- hakmem_batch.c: madvise batching (Phase 6.3)
- hakmem_elo.c: ELO strategy selection (Phase 6.2)
Documentation:
- FINAL_RESULTS.md: Silver medal results (2nd place of 5 allocators)
- CHATGPT_FEEDBACK.md: Your previous recommendations (ACE + ELO + madvise)
- PHASE_6.2_ELO_IMPLEMENTATION.md: ELO implementation details
- PHASE_6.3_MADVISE_BATCHING.md: madvise batching implementation
🎯 Recommendation Request
Please provide:
- Go/No-Go: Should we switch to mmap (Option A)?
- Risk mitigation: How to safely test without breaking current performance?
- Alternative: If not Option A, what's the best path to gold medal?
- Expected gain: Realistic performance prediction with mmap + batching?
Time limit: 10 minutes
Priority: HIGH (blocks Phase 6.3 effectiveness)

Generated: 2025-10-21
Status: Awaiting ChatGPT Pro consultation
Next: Implement recommended approach