ChatGPT Pro Response: mmap vs malloc Strategy
- Date: 2025-10-21
- Response Time: ~2 minutes
- Model: GPT-5 (via codex)
- Status: ✅ Clear recommendation received
🎯 Final Recommendation: GO with Option A
Decision: Switch POLICY_LARGE_INFREQUENT to mmap with kill-switch guard.
✅ Why Option A
- Phase 6.3 requires mmap: `madvise` is a no-op on `malloc` blocks
- BigCache absorbs risk: 90% hit rate → only 10% of frees hit the OS (1538 → 150 faults)
- mimalloc's secret: "keep mapping, lazily reclaim" with MADV_FREE/DONTNEED
- Immediate unlock: Phase 6.3 starts delivering as soon as the switch lands
🔥 CRITICAL BUG DISCOVERED in Current Code
Problem in hakmem.c:543:

```c
case ALLOC_METHOD_MMAP:
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);  // Add to batch
    }
    munmap(raw, hdr->size);             // ← BUG! Immediately unmaps
    break;
```
Why this is wrong:
- Calls `munmap` immediately after adding to the batch
- Negates the Phase 6.3 benefit: the batch cannot coalesce/defray TLB work
- The TLB flush happens on `munmap`, not on `madvise`
✅ Correct Implementation
Free Path Logic (Choose ONE):
Option 1: Cache in BigCache

```c
// Try BigCache first
if (hak_bigcache_try_insert(ptr, size, site_id)) {
    // Cached! Do NOT munmap
    // Optionally: madvise(MADV_FREE) on insert or eviction
    return;
}
```
Option 2: Batch for delayed reclaim

```c
// BigCache full, add to batch
if (size >= BATCH_MIN_SIZE) {
    hak_batch_add(raw, size);
    // Do NOT munmap here!
    // munmap happens on batch flush (coalesced)
    return;
}
```
Option 3: Immediate unmap (last resort)

```c
// Cold eviction only
munmap(raw, size);
```
🎯 Implementation Plan
Phase 1: Minimal Change (1-line)
File: hakmem.c:357

```c
case POLICY_LARGE_INFREQUENT:
    return alloc_mmap(size);  // Changed from alloc_malloc
```

Guard with kill-switch:

```c
#ifdef HAKO_HAKMEM_LARGE_MMAP
    return alloc_mmap(size);
#else
    return alloc_malloc(size);  // Safe fallback
#endif
```

Env variable: `HAKO_HAKMEM_LARGE_MMAP=1` (default OFF)
Phase 2: Fix Free Path
File: hakmem.c:543-548

Current (WRONG):

```c
case ALLOC_METHOD_MMAP:
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);
    }
    munmap(raw, hdr->size);  // ← Remove this!
    break;
```
Correct:

```c
case ALLOC_METHOD_MMAP:
    // Try BigCache first
    if (hdr->size >= 1048576) {  // 1MB threshold
        if (hak_bigcache_try_insert(user_ptr, hdr->size, site_id)) {
            // Cached, skip munmap
            return;
        }
    }
    // BigCache full, add to batch
    if (hdr->size >= BATCH_MIN_SIZE) {
        hak_batch_add(raw, hdr->size);
        // munmap deferred to batch flush
        return;
    }
    // Small or batch disabled, immediate unmap
    munmap(raw, hdr->size);
    break;
```
Phase 3: Batch Flush Implementation
File: hakmem_batch.c

```c
void hak_batch_flush(void) {
    if (batch_count == 0) return;

    // Use MADV_FREE (preferred) or MADV_DONTNEED (fallback)
    for (size_t i = 0; i < batch_count; i++) {
#ifdef __linux__
        madvise(batch[i].ptr, batch[i].size, MADV_FREE);
#else
        madvise(batch[i].ptr, batch[i].size, MADV_DONTNEED);
#endif
    }

    // Optional: munmap on cold eviction
    // (Keep VA mapped for reuse in most cases)
    batch_count = 0;
}
```
📊 Expected Performance Gains
Metrics Prediction:
| Metric | Current (malloc) | With Option A (mmap) | Improvement |
|---|---|---|---|
| Page faults | 513 | 120-180 | 65-77% fewer |
| TLB shootdowns | ~150 | 3-8 | 95% fewer |
| Latency (VM) | 36,647 ns | 24,000-28,000 ns | 30-45% faster |
Success Criteria:
- ✅ Page faults: 120-180 (vs 513 current)
- ✅ Batch flushes: 3-8 per run
- ✅ Latency: 24-28 µs (vs 36.6 µs current)
Rollback Criteria:
- ❌ Page faults > 500 (BigCache failing)
- ❌ Latency regression (slower than 36,647 ns)
🛡️ Risk Mitigation
1. Kill-Switch Guard
```c
// Compile-time or runtime flag
HAKO_HAKMEM_LARGE_MMAP=1  // Enable mmap path
```
2. BigCache Hard Cap
- Limit: 64-256 MB (1-2× working set)
- LRU eviction to batched reclaim
3. Prefer MADV_FREE
- Lower TLB cost than MADV_DONTNEED
- Better performance on quick reuse
- Linux: `MADV_FREE`, macOS: `MADV_FREE_REUSABLE`
4. Observability (Add Counters)
- mmap allocation count
- BigCache hits/misses for mmap
- Batch flush count
- munmap count
- Sample `minflt`/`majflt` before/after
🧪 Test Plan
Step 1: Enable mmap with guard
```make
# Makefile
CFLAGS += -DHAKO_HAKMEM_LARGE_MMAP=1
```
Step 2: Run VM scenario benchmark
```sh
# 10 runs, measure:
make bench_vm RUNS=10
```
Step 3: Collect metrics
- BigCache hit% for mmap
- Page faults (expect 120-180)
- Batch flushes (expect 3-8)
- Latency (expect 24-28 µs)
Step 4: Validate or rollback
```make
# If page faults > 500 or latency regresses:
CFLAGS += -UHAKO_HAKMEM_LARGE_MMAP  # Rollback
```
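The validate-or-rollback decision can be scripted. A sketch of the gate, where the fault count would come from GNU time's `-v` output or the counters above (the bench target and log format are assumptions from this note, not verified against the harness):

```shell
#!/bin/sh
# Rollback gate from the criteria above: >500 page faults means
# BigCache is failing and the mmap path should be rolled back.
check_faults() {
    if [ "$1" -gt 500 ]; then
        echo "ROLLBACK"
    else
        echo "OK"
    fi
}

# Example wiring (GNU time reports fault counts on stderr):
#   /usr/bin/time -v make bench_vm RUNS=10 2> time.log
#   faults=$(awk -F': ' '/Minor.*page faults/ { print $2 }' time.log)
#   check_faults "$faults"
check_faults 150   # prints OK
```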
🎯 BigCache + mmap Compatibility
ChatGPT Pro confirms: SAFE
- ✅ mmap blocks can be cached (same as malloc semantics)
- ✅ Content unspecified (matches malloc)
- ✅ Reusable after `MADV_FREE`

Required changes:
- Allocation: `hak_bigcache_try_get` serves mmap blocks
- Free: try BigCache insert first, skip `munmap` if cached
- Header: keep `ALLOC_METHOD_MMAP` on cached blocks
🏆 mimalloc's Secret Revealed
How mimalloc wins on VM scenario:
- Keep VA mapped: don't `munmap` immediately
- Lazy reclaim: use `MADV_FREE`/`MADV_FREE_REUSABLE`
- Batch TLB work: coalesce reclamation
- Per-segment reuse: cache large blocks

Our Option A emulates this: BigCache + mmap + MADV_FREE + batching
📋 Action Items
Immediate (Phase 1):
- Add kill-switch guard (`HAKO_HAKMEM_LARGE_MMAP`)
- Change line 357: `return alloc_mmap(size);`
- Test compile

Critical (Phase 2):
- Fix free path (remove immediate `munmap`)
- Implement BigCache insert check
- Defer `munmap` to batch flush

Optimization (Phase 3):
- Switch to `MADV_FREE` (Linux)
- Add observability counters
- Implement BigCache hard cap (64-256 MB)
Validation:
- Run VM scenario (10 runs)
- Verify page faults < 200
- Verify latency 24-28 µs
- Rollback if metrics fail
🎯 Alternative: Option C (ELO)
If Option A fails:
- Extend ELO action space: malloc vs mmap dimension
- Doubles ELO arms (12 → 24 strategies)
- Slower convergence, more complex
ChatGPT Pro says: "Overkill right now. Ship Option A with kill-switch first."
📊 Summary
Decision: ✅ GO with Option A (mmap + kill-switch)
Critical Fix: Remove immediate munmap in free path
Expected Gain: 30-45% improvement on VM scenario (36.6 → 24-28 µs)
Next Steps:
- Implement Phase 1 (1-line change + guard)
- Fix Phase 2 (free path)
- Run VM benchmark
- Validate or rollback
Confidence: HIGH (based on BigCache's 90% hit rate + mimalloc analysis)
- Generated: 2025-10-21 by ChatGPT-5 (via codex exec)
- Status: Ready for implementation
- Priority: P0 (unlocks Phase 6.3)