# ChatGPT Pro Analysis: Batch Not Triggered Issue

**Date:** 2025-10-21
**Status:** Implementation correct; benchmark coverage issue plus one implementation gap


## 🎯 Short Answer

This is primarily a benchmark coverage issue, plus one implementation gap.

The current run never calls the batch path because:

- BigCache intercepts almost all frees
- The eviction callback does a direct munmap, bypassing the batch

Result: you've already captured a ~29% gain from switching to mmap + BigCache!

Batching will mostly help cold-churn patterns, not hit-heavy ones.


## 🔍 Why 0 Blocks Are Batched

### 1. Free Path Skipped

- Cacheable mmap blocks → BigCache → return early
- hak_batch_add (hakmem.c:586) never runs (see the sketch below)
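
For orientation, a minimal sketch of that early return, assuming a boolean BigCache insertion helper (`bigcache_try_put` is a hypothetical name; only `hak_batch_add` is cited from the source):

```c
#include <stddef.h>

// Hypothetical declarations for the sketch; real signatures may differ.
int  bigcache_try_put(void* raw, size_t size);  // nonzero = cached
void hak_batch_add(void* raw, size_t size);     // hakmem.c:586

void hak_free_large(void* raw, size_t size) {
    if (bigcache_try_put(raw, size)) {
        return;  // BigCache intercepts the free: hak_batch_add() is never reached
    }
    // Only reached when BigCache declines the block
    hak_batch_add(raw, size);  // the path that never runs in the current benchmark
}
```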

### 2. Eviction Bypasses Batch

The BigCache eviction callback (hakmem.c:403) unmaps directly:

```c
case ALLOC_METHOD_MMAP:
    madvise(raw, hdr->size, MADV_FREE);
    munmap(raw, hdr->size);  // ❌ Direct munmap, not batched
    break;
```

### 3. Too Few Evictions

- VM(10) + BIGCACHE_RING_CAP=4 → only 1 eviction
- BATCH_THRESHOLD=4MB needs ≥2 × 2MB evictions to flush (see the sketch below)
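
To see why a single eviction cannot flush, here is a sketch of a threshold-gated batch under the stated constants (the queue internals are assumptions; only the 4MB threshold and the 2MB block size come from the source):

```c
#include <stddef.h>

#define BATCH_THRESHOLD (4UL * 1024 * 1024)  // 4MB flush threshold

void batch_enqueue(void* raw, size_t size);  // assumed internal queue
void hak_batch_flush(void);                  // assumed flush entry point

static size_t g_pending_bytes = 0;  // bytes queued but not yet released

// Hypothetical shape of hak_batch_add(): enqueue, then flush only once
// the threshold is crossed. A lone 2MB eviction leaves 2MB pending, so
// even if eviction were routed here, VM(10) would never trigger a flush.
void hak_batch_add(void* raw, size_t size) {
    batch_enqueue(raw, size);
    g_pending_bytes += size;
    if (g_pending_bytes >= BATCH_THRESHOLD) {  // needs >= 2 evictions of 2MB
        hak_batch_flush();
        g_pending_bytes = 0;
    }
}
```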

## Fixes (Structural First)

### Fix 1: Route Eviction Through Batch

File: hakmem.c:403-407

Current (WRONG):

```c
case ALLOC_METHOD_MMAP:
    madvise(raw, hdr->size, MADV_FREE);
    munmap(raw, hdr->size);  // ❌ Bypasses batch
    break;
```

Fixed:

```c
case ALLOC_METHOD_MMAP:
    // Cold eviction: use batch for large blocks
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // ✅ Route to batch
    } else {
        // Small blocks: direct munmap
        madvise(raw, hdr->size, MADV_FREE);
        munmap(raw, hdr->size);
    }
    break;
```
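
Note: `BATCH_MIN_SIZE` in the fixed version is the size gate from Q2 below (blocks ≥64KB go to the batch, smaller ones are unmapped directly); if the codebase does not define such a constant yet, it has to be introduced alongside the threshold.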

### Fix 2: Document Boundary

Add to the README:

> "BigCache retains for warm reuse; on cold eviction, hand off to Batch; only Batch may munmap."

This prevents regressions.


## 🧪 Bench Plan (Exercise Batching)

### Option 1: Increase Churn

```bash
# Generate 1000 alloc/free ops (100 × 10)
./bench_allocators_hakmem --allocator hakmem-evolving --scenario vm --iterations 100
```

Expected:

- Evictions: ~96 (100 allocs − 4 cache slots)
- Batch flushes: ~48 (96 evictions ÷ 2 blocks per flush at the 4MB threshold)
- Stats: total blocks added > 0
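
One way to assert those expectations mechanically after a run; the stats accessor here is hypothetical (`hak_batch_stats_t` and `hak_batch_get_stats` are assumed names), so adapt it to whatever counters the batch module actually exposes:

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical stats API -- substitute the real counter accessors.
typedef struct {
    unsigned long blocks_added;
    unsigned long flushes;
} hak_batch_stats_t;
extern void hak_batch_get_stats(hak_batch_stats_t* out);

static void check_batch_exercised(void) {
    hak_batch_stats_t st;
    hak_batch_get_stats(&st);
    printf("batch: %lu blocks added, %lu flushes\n", st.blocks_added, st.flushes);
    assert(st.blocks_added > 0);  // fails on VM(10) today, should pass at --iterations 100
}
```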

### Option 2: Reduce Cache Capacity

File: hakmem_bigcache.h:20

```c
#define BIGCACHE_RING_CAP 2  // Changed from 4
```

Result: more evictions with the same iteration count


## 📊 Performance Expectations

### Current Gains

- Previous (malloc): 36,647 ns
- Current (mmap + BigCache): 25,888 ns
- Improvement: 29.4% 🎉

### Expected with Batch Working

**Scenario 1: Cache-Heavy (Current)**

- BigCache at a ~99% hit rate → batch rarely used
- Additional gain: 0-5% (minimal)

**Scenario 2: Cold-Churn Heavy**

- Many evictions, low reuse
- Additional gain: 5-15%
- Total: 30-40% vs the malloc baseline

### Why Limited Gains?

ChatGPT Pro's insight:

> "Each munmap still triggers a TLB flush individually. Batching helps by:
>
> 1. Reducing syscall overhead (N calls → 1 batch)
> 2. Using MADV_FREE before munmap (lighter)
>
> But it does NOT reduce TLB flushes from N to 1. Each munmap(ptr, size) in the loop still flushes."

Key point: batching helps with syscall overhead, not with the TLB flush count.
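
A sketch makes the distinction concrete. The loop shape is an assumption about what a flush does; madvise(2) and munmap(2) are the real syscalls:

```c
#include <stddef.h>
#include <sys/mman.h>

// Assumed shape of a batch flush over n queued blocks: the batch defers
// and amortizes bookkeeping into one pass, but each distinct mapping
// still needs its own munmap(), and each munmap() still triggers a TLB
// shootdown for that range.
static void batch_flush_sketch(void** ptrs, size_t* sizes, int n) {
    for (int i = 0; i < n; i++) {
        madvise(ptrs[i], sizes[i], MADV_FREE);  // lighter: pages reclaimed lazily
        munmap(ptrs[i], sizes[i]);              // one TLB flush per mapping, not per batch
    }
}
```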


## 🎯 Answers to Your Questions

### 1. Is the benchmark too small?

YES. With BIGCACHE_RING_CAP=4:

- Need >4 evictions to see batching
- VM(10) = only 1 eviction
- Recommendation: --iterations 100

### 2. Should BigCache eviction use batch?

YES (with a size gate):

- Large blocks (≥64KB) → batch
- Small blocks → direct munmap
- Fix: hakmem.c:403-407

### 3. Is BigCache capacity too large?

For testing, yes:

- Current: 4 slots × 2MB = 8MB
- For testing: reduce to 2 slots
- For production: keep 4 (better hit rate)

### 4. What's the right test scenario?

Two scenarios are needed:

**A) Cache-Heavy (current VM):**

- Tests BigCache effectiveness
- Batching rarely triggered

**B) Cold-Churn (new scenario):**

```c
// Allocate unique addresses with no reuse window, so every free
// is a cold block and every cached slot gets evicted.
for (int i = 0; i < 1000; i++) {
    void* bufs[100];
    for (int j = 0; j < 100; j++) {
        bufs[j] = alloc(2 * 1024 * 1024);  // 2MB per block
    }
    for (int j = 0; j < 100; j++) {
        free(bufs[j]);
    }
}
```

### 5. Is the 29.4% gain good enough?

ChatGPT Pro says:

> "You've already hit the predicted range (30-45%). The gain comes from:
>
> - mmap efficiency for 2MB blocks
> - BigCache eliminating most alloc/free overhead
>
> Batching adds marginal benefit in your workload (cache-heavy).
>
> Recommendation: Ship the current implementation. Batching will help when you add workloads with lower cache hit rates."


## 🚀 Next Steps (Prioritized)

### Option A: Fix and Verify (Recommended)

1. Fix BigCache eviction (route to batch)
2. Run --iterations 100
3. Verify batch stats show >0 blocks
4. Document the architecture

Time: 15-30 minutes

### Option B: Comprehensive Testing

1. Fix BigCache eviction
2. Add a cold-churn scenario
3. Benchmark: cache-heavy vs cold-churn
4. Generate a comparison chart

Time: 1-2 hours

### Option C: Ship Current (Fast Track)

1. Accept the 29.4% gain
2. Document "batch infrastructure ready"
3. Test batching when cold-churn workloads appear

Time: 5 minutes


## 💡 ChatGPT Pro's Final Recommendation

Go with Option A:

> "Fix the eviction callback to complete the implementation, then run --iterations 100 to confirm batching works. You'll see the stats change from 0 → 96 blocks added.
>
> The performance gain will be modest (0-10% more) because BigCache is already doing its job. But having the complete infrastructure ready is valuable for future workloads with lower cache hit rates.
>
> Ship with confidence: the 29.4% gain is solid, and the architecture is now correct."


## 📋 Implementation Checklist

- [ ] Fix the BigCache eviction callback (hakmem.c:403)
- [ ] Run the --iterations 100 test
- [ ] Verify batch stats show >0 blocks
- [ ] Document the release-path architecture
- [ ] Optional: add a cold-churn test scenario
- [ ] Commit with a summary

**Generated:** 2025-10-21 by ChatGPT-5 (via codex)
**Status:** Ready to fix and test
**Priority:** Medium (complete infrastructure)