# Phase 7: 4T High-Contention Stability Verification Report

**Date**: 2025-11-08
**Tester**: Claude Task Agent
**Build**: HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1
**Test Scope**: Verify fixes from other AI (Superslab Fail-Fast + wrapper fixes)

---

## Executive Summary

**Verdict**: ❌ **NOT FIXED** (Potentially WORSE)

| Metric | Result | Status |
|--------|--------|--------|
| **Success Rate** | 30% (6/20) | ❌ Worse than before (35%) |
| **Throughput** | 981,138 ops/s (when working) | ✅ Stable |
| **Production Ready** | NO | ❌ Unsafe for deployment |
| **Root Cause** | Mixed HAKMEM/libc allocations | ⚠️ Still present |

**Key Finding**: The Fail-Fast guards did NOT catch any corruption. The crash is a `free(): invalid pointer` abort that occurs after the malloc fallback is triggered, not internal corruption.

---

## 1. Stability Test Results (20 runs)

### Summary Statistics

```
Success: 6/20 (30%)
Failure: 14/20 (70%)
Average Throughput: 981,138 ops/s
Throughput Range: 981,087 - 981,190 ops/s
```

### Comparison with Previous Results

| Metric | Before Fixes | After Fixes | Change |
|--------|--------------|-------------|--------|
| Success Rate | 35% (7/20) | **30% (6/20)** | **-5 pp ❌** |
| Throughput | 981K ops/s | 981K ops/s | 0% |
| 1T Baseline | Unknown | 2,737K ops/s | ✅ OK |
| 2T | Unknown | 4,905K ops/s | ✅ OK |
| 4T Low-Contention | Unknown | 251K ops/s | ⚠️ Slow |

**Conclusion**: The fixes did NOT improve stability. The success rate is one run lower (6 vs. 7 of 20), a difference that is within run-to-run noise at this sample size.

---

## 2. Detailed Test Results

### Success Runs (6/20)

| Run | Throughput | Variation |
|-----|-----------|-----------|
| 3 | 981,189 ops/s | +0.010% |
| 4 | 981,087 ops/s | baseline |
| 7 | 981,087 ops/s | baseline |
| 14 | 981,190 ops/s | +0.010% |
| 15 | 981,087 ops/s | baseline |
| 17 | 981,190 ops/s | +0.010% |

**Observation**: When it works, throughput is extremely stable (±0.01%).

### Failure Runs (14/20)

All failures follow this pattern:

```
1. [DEBUG] Phase 7: tiny_alloc(X) rejected, using malloc fallback
2. free(): invalid pointer
3. [DEBUG] superslab_refill returned NULL (OOM) detail: class=X
4. Core dump (exit code 134)
```

**Common failure classes**: 1, 4, 6 (sizes: 16B, 64B, 512B)

**Pattern**: OOM in specific classes → malloc fallback → mixed allocation → crash

---

## 3. Fail-Fast Guard Results

### Test Configuration

- `HAKMEM_TINY_REFILL_FAILFAST=2` (maximum validation)
- Guards check freelist head bounds and `meta->used` overflow

### Results (5 runs)

| Run | Outcome | Corruption Detected? |
|-----|---------|---------------------|
| 1 | Crash (exit 1) | ❌ No `[ALLOC_CORRUPT]` |
| 2 | Crash (exit 1) | ❌ No `[ALLOC_CORRUPT]` |
| 3 | Crash (exit 1) | ❌ No `[ALLOC_CORRUPT]` |
| 4 | Success (981K ops/s) | ✅ N/A |
| 5 | Success (981K ops/s) | ✅ N/A |

**Critical Finding**:
- **Zero detections** of freelist corruption or metadata overflow
- Crashes still happen with guards enabled
- Guards are working correctly but NOT catching the root cause

**Interpretation**: The bug is NOT in superslab allocation logic. The Fail-Fast guards are correct but irrelevant to this crash.

---

## 4. Performance Analysis

### Low-Contention Regression Check

| Test | Throughput | Status |
|------|-----------|--------|
| 1T baseline | 2,736,909 ops/s | ✅ No regression |
| 2T | 4,905,303 ops/s | ✅ No regression |
| 4T @ 256 chunks | 251,314 ops/s | ⚠️ Significantly slower |

**Observation**:
- Low contention (1T, 2T) works perfectly
- 4T with low allocation count (256 chunks) is very slow but stable
- 4T with high allocation count (1024 chunks) crashes 70% of the time

### Throughput Consistency

When the benchmark completes successfully:
- Mean: 981,138 ops/s
- Stddev: ≈51 ops/s (population standard deviation over the 6 successful runs listed in Section 2, ±0.005%)
- **Extremely stable**, suggesting no race conditions in the hot path

---

## 5. Root Cause Assessment

### What the Other AI Fixed

1. **Superslab Fail-Fast strengthening** (`core/tiny_superslab_alloc.inc.h`):
   - Added freelist head index/capacity validation
   - Added `meta->used` overflow detection
   - **Impact**: Zero (guards never trigger)

2. **Wrapper fixes** (`core/hakmem.c`):
   - `g_hakmem_lock_depth` recursion guard
   - **Impact**: Unknown (not directly related to this crash)

### Why the Fixes Didn't Work

**The guards are protecting against the wrong bug.**

The actual crash sequence:

```
Thread 1: Allocates class 6 blocks → depletes superslab
Thread 2: Allocates class 6 → superslab_refill() → OOM (bitmap=0x00000000)
Thread 2: Falls back to malloc() → mixed allocation
Thread 3: Frees class 6 block → tries to free malloc() pointer → "invalid pointer"
```

**Root Cause**:
- **Superslab starvation** under high contention
- **Malloc fallback mixing** leaves block ownership ambiguous at free time
- **No registry tracking** for malloc-allocated blocks

### Evidence

From failure logs:

```
[DEBUG] superslab_refill returned NULL (OOM) detail: class=6 prev_ss=(nil) active=0 bitmap=0x00000000 prev_meta=(nil) used=0 cap=0 slab_idx=0 reused_freelist=0 free_idx=-2 errno=12
```

**Interpretation**:
- `bitmap=0x00000000`: All 32 slabs are empty (no freelist blocks)
- `prev_ss=(nil)`: No previous superslab to reuse
- `errno=12`: Out of memory (ENOMEM)
- Result: Falls back to `malloc()`, creates mixed allocation

---

## 6. Remaining Issues

### Primary Bug: Mixed Allocation Chaos

**Problem**: HAKMEM and libc malloc allocations get mixed, causing `free()` failures.
**Trigger**: High-contention workload depletes superslabs → malloc fallback
**Frequency**: 70% (14/20 runs)

### Secondary Issue: Superslab Starvation

**Problem**: Under high contention, all 32 slabs in a superslab become empty simultaneously.
**Evidence**: `bitmap=0x00000000` in all failure logs
**Implication**: Need better superslab provisioning or dynamic scaling

### Fail-Fast Guards: Working but Irrelevant

**Status**: ✅ Guards are correctly implemented and NOT triggering
**Conclusion**: The guards protect against corruption that isn't happening. The real bug is architectural (mixed allocations).

---

## 7. Production Readiness Assessment

### Recommendation: **DO NOT DEPLOY**

| Criterion | Status | Reasoning |
|-----------|--------|-----------|
| **Stability** | ❌ FAIL | 70% crash rate in 4T workloads |
| **Correctness** | ❌ FAIL | Mixed allocations cause invalid `free()` calls |
| **Performance** | ✅ PASS | When working, throughput is excellent |
| **Safety** | ❌ FAIL | No way to distinguish HAKMEM/libc allocations |

### Safe Configurations

**Only use HAKMEM for**:
- Single-threaded workloads ✅
- Low-contention multi-threaded (≤2T) ✅
- Fixed allocation sizes (no malloc fallback) ⚠️

**DO NOT use for**:
- High-contention multi-threaded (4T+) ❌
- Production systems requiring stability ❌
- Mixed HAKMEM/libc allocation scenarios ❌

### Known Limitations

1. **4T high-contention**: 70% crash rate
2. **Malloc fallback**: Causes invalid `free()` errors
3. **Superslab starvation**: No recovery mechanism
4. **Class 1, 4, 6**: Most prone to OOM (small sizes, high churn)

---

## 8. Next Steps

### Immediate Actions (Required before production)

1. **Fix Mixed Allocation Bug** (CRITICAL)
   - Option A: Track all allocations in a global registry (memory overhead)
   - Option B: Add header to all allocations (8-16 bytes overhead)
   - Option C: Disable malloc fallback entirely (fail-fast on OOM)

2. **Fix Superslab Starvation** (CRITICAL)
   - Dynamic superslab scaling (allocate new superslab on OOM)
   - Better superslab provisioning strategy
   - Per-thread superslab affinity to reduce contention

3. **Add Allocation Ownership Detection** (CRITICAL)
   - Prevent `free(malloc_ptr)` from reaching the HAKMEM allocator
   - Add magic header or bitmap to distinguish allocation sources

### Long-Term Improvements

1. **Better Contention Handling**
   - Lock-free refill paths
   - Per-core superslab caches
   - Adaptive batch sizes based on contention

2. **Memory Pressure Handling**
   - Graceful degradation on OOM
   - Spill-to-system-malloc with proper tracking
   - Memory reclamation from cold classes

3. **Comprehensive Testing**
   - Stress test with varying thread counts (1-16T)
   - Long-duration stability testing (hours, not seconds)
   - Memory leak detection (Valgrind, ASan)

---

## 9. Comparison Table

| Metric | Before Fixes | After Fixes | Change |
|--------|--------------|-------------|--------|
| **Success Rate** | 35% (7/20) | 30% (6/20) | **-5 pp ❌** |
| **Throughput** | 981K ops/s | 981K ops/s | 0% |
| **1T Regression** | Unknown | 2,737K ops/s | ✅ OK |
| **2T Regression** | Unknown | 4,905K ops/s | ✅ OK |
| **4T Low-Contention** | Unknown | 251K ops/s | ⚠️ Slow but stable |
| **Fail-Fast Triggers** | Unknown | 0 | ✅ No corruption detected |

---

## 10. Conclusion

**The 4T high-contention crash is NOT fixed.**

The other AI's fixes (Fail-Fast guards and wrapper improvements) are correct and valuable for catching future bugs, but they do NOT address the root cause of this crash:

**Root Cause**: Superslab starvation → malloc fallback → mixed allocations → invalid `free()`

**Next Priority**: Fix the mixed allocation bug. Option C (disable the malloc fallback and fail fast on OOM) is the safest short-term solution.

**Production Status**: UNSAFE. Do not deploy for high-contention workloads.
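
To make Options B and C from Section 8 concrete, here is a minimal sketch of header-tagged allocation with an optional libc fallback. It is not HAKMEM code: every name in it (`tagged_alloc`, `tagged_free`, `pool_alloc`, `g_allow_fallback`, the four-slot arena standing in for a superslab) is hypothetical, it is single-threaded, and it assumes every pointer passed to `tagged_free` came from `tagged_alloc`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of these are HAKMEM identifiers. */
#define TAG_MAGIC 0x48414B4Du /* ASCII "HAKM" */

enum { SRC_POOL = 1, SRC_LIBC = 2 };

/* Option B: a 16-byte header in front of every block records its owner. */
typedef struct {
    uint32_t magic;
    uint32_t src;
    uint64_t size;
} alloc_hdr_t;
static_assert(sizeof(alloc_hdr_t) == 16, "header must stay 16 bytes");

/* Toy stand-in for a superslab: four 64-byte slots that can run dry. */
typedef union {
    alloc_hdr_t hdr;
    unsigned char bytes[64];
} slot_t;

static _Alignas(16) slot_t g_arena[4];
static int g_slot_used[4];
static int g_allow_fallback = 1; /* 0 selects Option C: fail fast on OOM */

static void *pool_alloc(size_t total) {
    if (total > sizeof(slot_t)) return NULL;
    for (int i = 0; i < 4; i++) {
        if (!g_slot_used[i]) { g_slot_used[i] = 1; return &g_arena[i]; }
    }
    return NULL; /* models superslab_refill() returning NULL */
}

static void pool_free(void *block) {
    for (int i = 0; i < 4; i++) {
        if (block == (void *)&g_arena[i]) { g_slot_used[i] = 0; return; }
    }
    abort(); /* tagged as pool memory but not inside the arena */
}

void *tagged_alloc(size_t n) {
    if (n > SIZE_MAX - sizeof(alloc_hdr_t)) return NULL;
    size_t total = n + sizeof(alloc_hdr_t);
    uint32_t src = SRC_POOL;
    alloc_hdr_t *h = pool_alloc(total);
    if (!h) {
        if (!g_allow_fallback) return NULL; /* Option C */
        h = malloc(total);                  /* tracked fallback */
        if (!h) return NULL;
        src = SRC_LIBC;
    }
    h->magic = TAG_MAGIC;
    h->src = src;
    h->size = n;
    return h + 1; /* user pointer starts right after the header */
}

/* Returns the source the block was routed to, or 0 if the tag is missing
 * (for example a second free of a pool block). */
int tagged_free(void *p) {
    if (!p) return 0;
    alloc_hdr_t *h = (alloc_hdr_t *)p - 1;
    if (h->magic != TAG_MAGIC) return 0;
    int src = (int)h->src;
    h->magic = 0; /* clear the tag before releasing the block */
    if (src == SRC_POOL) pool_free(h);
    else free(h);
    return src;
}
```

With the tag in place, a block that came from the fallback is always handed back to `free()` and a pool block is always handed back to the pool, so the two never cross. The header check only works for pointers the wrapper itself produced; truly foreign pointers would still need a registry (Option A) or a strict no-fallback policy (Option C).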
---

## Appendix: Test Environment

**System**:
- OS: Linux 6.8.0-65-generic
- CPU: Native architecture (`-march=native`)
- Compiler: gcc with `-O3 -flto`

**Build Flags**:
- `HEADER_CLASSIDX=1`
- `AGGRESSIVE_INLINE=1`
- `PREWARM_TLS=1`
- `HAKMEM_TINY_PHASE6_BOX_REFACTOR=1`

**Test Command**:

```bash
./larson_hakmem 10 8 128 1024 1 12345 4
```

**Parameters**:
- 10 iterations
- 8 threads (4T due to doubling)
- 128 min object size
- 1024 max objects per thread
- Seed: 12345
- 4 threads

**Runtime**: ~17 minutes per successful run

---

**Report Generated**: 2025-11-08
**Verified By**: Claude Task Agent