hakmem/PHASE7_4T_STABILITY_VERIFICATION.md
Moe Charm (CI) 707056b765 feat: Phase 7 + Phase 2 - Massive performance & stability improvements
Performance Achievements:
- Tiny allocations: +180-280% (21M → 59-70M ops/s random mixed)
- Single-thread: +24% (2.71M → 3.36M ops/s Larson)
- 4T stability: 0% → 95% (19/20 success rate)
- Overall: 91.3% of System malloc average (target was 40-55%) ✓

Phase 7 (Tasks 1-3): Core Optimizations
- Task 1: Header validation removal (Region-ID direct lookup)
- Task 2: Aggressive inline (TLS cache access optimization)
- Task 3: Pre-warm TLS cache (eliminate cold-start penalty)
  Result: +180-280% improvement, 85-146% of System malloc

Critical Bug Fixes:
- Fix 64B allocation crash (size-to-class +1 for header)
- Fix 4T wrapper recursion bugs (BUG #7, #8, #10, #11)
- Remove malloc fallback (30% → 50% stability)

Phase 2a: SuperSlab Dynamic Expansion (CRITICAL)
- Implement mimalloc-style chunk linking
- Unlimited slab expansion (no more OOM at 32 slabs)
- Fix chunk initialization bug (bitmap=0x00000001 after expansion)
  Files: core/hakmem_tiny_superslab.c/h, core/superslab/superslab_types.h
  Result: 50% → 95% stability (19/20 4T success)

Phase 2b: TLS Cache Adaptive Sizing
- Dynamic capacity: 16-2048 slots based on usage
- High-water mark tracking + exponential growth/shrink
- Expected: +3-10% performance, -30-50% memory
  Files: core/tiny_adaptive_sizing.c/h (new)

Phase 2c: BigCache Dynamic Hash Table
- Migrate from fixed 256×8 array to dynamic hash table
- Auto-resize: 256 → 512 → 1024 → 65,536 buckets
- Improved hash function (FNV-1a) + collision chaining
  Files: core/hakmem_bigcache.c/h
  Expected: +10-20% cache hit rate

Design Flaws Analysis:
- Identified 6 components with fixed-capacity bottlenecks
- SuperSlab (CRITICAL), TLS Cache (HIGH), BigCache/L2.5 (MEDIUM)
- Report: DESIGN_FLAWS_ANALYSIS.md (11 chapters)

Documentation:
- 13 comprehensive reports (PHASE*.md, DESIGN_FLAWS*.md)
- Implementation guides, test results, production readiness
- Bug fix reports, root cause analysis

Build System:
- Makefile: phase7 targets, PREWARM_TLS flag
- Auto dependency generation (-MMD -MP) for .inc files

Known Issues:
- 4T stability: 19/20 (95%) - investigating 1 failure for 100%
- L2.5 Pool dynamic sharding: design only (needs 2-3 days integration)

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 17:08:00 +09:00


Phase 7: 4T High-Contention Stability Verification Report

Date: 2025-11-08
Tester: Claude Task Agent
Build: HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1
Test Scope: Verify fixes from other AI (Superslab Fail-Fast + wrapper fixes)


Executive Summary

Verdict: NOT FIXED (Potentially WORSE)

| Metric | Result | Status |
| --- | --- | --- |
| Success Rate | 30% (6/20) | Worse than before (35%) |
| Throughput | 981,138 ops/s (when working) | Stable |
| Production Ready | NO | Unsafe for deployment |
| Root Cause | Mixed HAKMEM/libc allocations | ⚠️ Still present |

Key Finding: The Fail-Fast guards did NOT catch any corruption. The crash is caused by "free(): invalid pointer" when malloc fallback is triggered, not by internal corruption.


1. Stability Test Results (20 runs)

Summary Statistics

Success: 6/20 (30%)
Failure: 14/20 (70%)
Average Throughput: 981,138 ops/s
Throughput Range: 981,087 - 981,190 ops/s

Comparison with Previous Results

| Metric | Before Fixes | After Fixes | Change |
| --- | --- | --- | --- |
| Success Rate | 35% (7/20) | 30% (6/20) | -5 percentage points |
| Throughput | 981K ops/s | 981K ops/s | 0% |
| 1T Baseline | Unknown | 2,737K ops/s | OK |
| 2T | Unknown | 4,905K ops/s | OK |
| 4T Low-Contention | Unknown | 251K ops/s | ⚠️ Slow |

Conclusion: The fixes did NOT improve stability. Success rate is slightly worse.


2. Detailed Test Results

Success Runs (6/20)

| Run | Throughput | Variation |
| --- | --- | --- |
| 3 | 981,189 ops/s | +0.010% |
| 4 | 981,087 ops/s | baseline |
| 7 | 981,087 ops/s | baseline |
| 14 | 981,190 ops/s | +0.010% |
| 15 | 981,087 ops/s | baseline |
| 17 | 981,190 ops/s | +0.010% |

Observation: When it works, throughput is extremely stable: all six successful runs fall within about 0.01% of each other.

Failure Runs (14/20)

All failures follow this pattern:

1. [DEBUG] Phase 7: tiny_alloc(X) rejected, using malloc fallback
2. free(): invalid pointer
3. [DEBUG] superslab_refill returned NULL (OOM) detail: class=X
4. Core dump (exit code 134)

Common failure classes: 1, 4, 6 (sizes: 16B, 64B, 512B)

Pattern: OOM in specific classes → malloc fallback → mixed allocation → crash


3. Fail-Fast Guard Results

Test Configuration

  • HAKMEM_TINY_REFILL_FAILFAST=2 (maximum validation)
  • Guards check freelist head bounds and meta->used overflow

Results (5 runs)

| Run | Outcome | Corruption Detected? |
| --- | --- | --- |
| 1 | Crash (exit 1) | No [ALLOC_CORRUPT] |
| 2 | Crash (exit 1) | No [ALLOC_CORRUPT] |
| 3 | Crash (exit 1) | No [ALLOC_CORRUPT] |
| 4 | Success (981K ops/s) | N/A |
| 5 | Success (981K ops/s) | N/A |

Critical Finding:

  • Zero detections of freelist corruption or metadata overflow
  • Crashes still happen with guards enabled
  • Guards are working correctly but NOT catching the root cause

Interpretation: The bug is NOT in superslab allocation logic. The Fail-Fast guards are correct but irrelevant to this crash.


4. Performance Analysis

Low-Contention Regression Check

| Test | Throughput | Status |
| --- | --- | --- |
| 1T baseline | 2,736,909 ops/s | No regression |
| 2T | 4,905,303 ops/s | No regression |
| 4T @ 256 chunks | 251,314 ops/s | ⚠️ Significantly slower |

Observation:

  • Low contention (1T, 2T) works perfectly
  • 4T with low allocation count (256 chunks) is very slow but stable
  • 4T with high allocation count (1024 chunks) crashes 70% of the time

Throughput Consistency

When the benchmark completes successfully:

  • Mean: 981,138 ops/s
  • Stddev: ≈51 ops/s (population standard deviation of the six successful runs, ±0.005%)
  • Extremely stable, suggesting no race conditions in the hot path
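
The consistency figures can be rechecked directly from the six successful runs in Section 2. The sketch below recomputes the mean and the population variance. The helper names are made up for this example, and the square root is left out so the snippet needs no math library.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput (ops/s) of the six successful runs listed in Section 2. */
static const double k_runs[] = {981189, 981087, 981087,
                                981190, 981087, 981190};
static const size_t k_run_count = sizeof k_runs / sizeof k_runs[0];

/* Arithmetic mean of n samples. */
static double sample_mean(const double *x, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum / (double)n;
}

/* Population variance: mean squared deviation from the mean. */
static double population_variance(const double *x, size_t n) {
    double m = sample_mean(x, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - m;
        acc += d * d;
    }
    return acc / (double)n;
}
```

For these six values the mean is about 981,138 ops/s and the population variance is about 2,635 (ops/s)², which corresponds to a standard deviation of roughly 51 ops/s, or about 0.005% of the mean.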

5. Root Cause Assessment

What the Other AI Fixed

  1. Superslab Fail-Fast strengthening (core/tiny_superslab_alloc.inc.h):

    • Added freelist head index/capacity validation
    • Added meta->used overflow detection
    • Impact: Zero (guards never trigger)
  2. Wrapper fixes (core/hakmem.c):

    • g_hakmem_lock_depth recursion guard
    • Impact: Unknown (not directly related to this crash)

Why the Fixes Didn't Work

The guards are protecting against the wrong bug.

The actual crash sequence:

Thread 1: Allocates class 6 blocks → depletes superslab
Thread 2: Allocates class 6 → superslab_refill() → OOM (bitmap=0x00000000)
Thread 2: Falls back to malloc() → mixed allocation
Thread 3: Frees class 6 block → tries to free malloc() pointer → "invalid pointer"

Root Cause:

  • Superslab starvation under high contention
  • Malloc fallback mixing creates allocation ownership chaos
  • No registry tracking for malloc-allocated blocks
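
To illustrate what registry tracking could look like, here is a minimal sketch of an address-range registry that a free() wrapper might consult before deciding which allocator owns a pointer. Everything in it (`region_register`, `region_owns`, the fixed table size) is hypothetical and not part of HAKMEM. It is also not thread-safe as written, so a real version would need locking or a lock-free table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64 /* arbitrary limit for this sketch */

typedef struct {
    uintptr_t base; /* first byte of the region */
    size_t    len;  /* region length in bytes */
} region_t;

static region_t g_regions[MAX_REGIONS];
static size_t   g_region_count;

/* Record a memory range owned by the custom allocator. */
static bool region_register(const void *base, size_t len) {
    if (base == NULL || len == 0 || g_region_count == MAX_REGIONS) {
        return false;
    }
    g_regions[g_region_count].base = (uintptr_t)base;
    g_regions[g_region_count].len  = len;
    g_region_count++;
    return true;
}

/* True if p lies inside a registered range (linear scan). */
static bool region_owns(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    for (size_t i = 0; i < g_region_count; i++) {
        if (addr >= g_regions[i].base &&
            addr - g_regions[i].base < g_regions[i].len) {
            return true;
        }
    }
    return false;
}
```

A free() wrapper built on this would route a pointer to the custom allocator only when `region_owns` returns true and hand everything else to libc, which removes the ambiguity described in the crash sequence.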

Evidence

From failure logs:

[DEBUG] superslab_refill returned NULL (OOM) detail:
  class=6 prev_ss=(nil) active=0 bitmap=0x00000000
  prev_meta=(nil) used=0 cap=0 slab_idx=0
  reused_freelist=0 free_idx=-2 errno=12

Interpretation:

  • bitmap=0x00000000: All 32 slabs are empty (no freelist blocks)
  • prev_ss=(nil): No previous superslab to reuse
  • errno=12: Out of memory (ENOMEM)
  • Result: Falls back to malloc(), creates mixed allocation
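
When triaging many failure logs, the three fields interpreted above can be extracted mechanically. The sketch below is an illustrative helper, not part of HAKMEM: it pulls `class`, `bitmap`, and `errno` out of one log record with `strstr` and `sscanf`, and flags the starvation signature (empty bitmap plus ENOMEM). The struct and function names are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields of interest from one "superslab_refill returned NULL" record. */
typedef struct {
    int          cls;    /* size class that failed to refill */
    unsigned int bitmap; /* slab bitmap as logged */
    int          err;    /* errno value as logged */
} refill_failure_t;

/* Parse class=, bitmap= and errno= from a log record; false if any is missing. */
static bool parse_refill_failure(const char *log, refill_failure_t *out) {
    assert(log != NULL && out != NULL);
    const char *c = strstr(log, "class=");
    const char *b = strstr(log, "bitmap=");
    const char *e = strstr(log, "errno=");
    if (c == NULL || b == NULL || e == NULL) {
        return false;
    }
    return sscanf(c, "class=%d", &out->cls) == 1 &&
           sscanf(b, "bitmap=%x", &out->bitmap) == 1 && /* %x accepts the 0x prefix */
           sscanf(e, "errno=%d", &out->err) == 1;
}

/* Starvation signature described in this report: empty bitmap and ENOMEM. */
static bool is_starvation(const refill_failure_t *f) {
    return f->bitmap == 0u && f->err == ENOMEM;
}
```

Applied to the record quoted above, this yields class 6, bitmap 0, and errno 12, which is ENOMEM on Linux.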

6. Remaining Issues

Primary Bug: Mixed Allocation Chaos

Problem: HAKMEM and libc malloc allocations get mixed, causing free() failures.

Trigger: High-contention workload depletes superslabs → malloc fallback

Frequency: 70% (14/20 runs)

Secondary Issue: Superslab Starvation

Problem: Under high contention, all 32 slabs in a superslab run out of free blocks at the same time, so the refill path has nothing to hand out.

Evidence: bitmap=0x00000000 in all failure logs

Implication: Need better superslab provisioning or dynamic scaling

Fail-Fast Guards: Working but Irrelevant

Status: Guards are correctly implemented and NOT triggering

Conclusion: The guards protect against corruption that isn't happening. The real bug is architectural (mixed allocations).


7. Production Readiness Assessment

Recommendation: DO NOT DEPLOY

| Criterion | Status | Reasoning |
| --- | --- | --- |
| Stability | FAIL | 70% crash rate in 4T workloads |
| Correctness | FAIL | Mixed allocations cause corruption |
| Performance | PASS | When working, throughput is excellent |
| Safety | FAIL | No way to distinguish HAKMEM/libc allocations |

Safe Configurations

Only use HAKMEM for:

  • Single-threaded workloads
  • Low-contention multi-threaded (≤2T)
  • Fixed allocation sizes (no malloc fallback) ⚠️

DO NOT use for:

  • High-contention multi-threaded (4T+)
  • Production systems requiring stability
  • Mixed HAKMEM/libc allocation scenarios

Known Limitations

  1. 4T high-contention: 70% crash rate
  2. Malloc fallback: Causes invalid free() errors
  3. Superslab starvation: No recovery mechanism
  4. Class 1, 4, 6: Most prone to OOM (small sizes, high churn)

8. Next Steps

Immediate Actions (Required before production)

  1. Fix Mixed Allocation Bug (CRITICAL)

    • Option A: Track all allocations in a global registry (memory overhead)
    • Option B: Add header to all allocations (8-16 bytes overhead)
    • Option C: Disable malloc fallback entirely (fail-fast on OOM)
  2. Fix Superslab Starvation (CRITICAL)

    • Dynamic superslab scaling (allocate new superslab on OOM)
    • Better superslab provisioning strategy
    • Per-thread superslab affinity to reduce contention
  3. Add Allocation Ownership Detection (CRITICAL)

    • Prevent free(malloc_ptr) from HAKMEM allocator
    • Add magic header or bitmap to distinguish allocation sources
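
As a rough illustration of Option B combined with ownership detection, the sketch below prepends a 16-byte tagged header to every block so that the free path can tell which allocator produced it. It is a toy under stated assumptions: the tag values, `tagged_alloc`, `tagged_free`, and the 4 KiB bump arena standing in for the superslab are all invented for the example, and the code is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_ARENA   0x48414B4Du /* "HAKM": block came from the toy arena */
#define TAG_SYSTEM  0x4C494243u /* "LIBC": block came from malloc() */
#define ARENA_BYTES 4096u

/* 16-byte header keeps the user pointer 16-byte aligned. */
typedef struct {
    uint32_t tag;
    uint32_t size;
    uint64_t reserved;
} block_hdr_t;

static unsigned char *g_arena;     /* lazily created bump arena */
static size_t         g_arena_off;
static int            g_arena_frees, g_system_frees;

static void *tagged_alloc(size_t n) {
    if (n > UINT32_MAX - 15u) return NULL;
    size_t need = sizeof(block_hdr_t) + ((n + 15u) & ~(size_t)15u);
    block_hdr_t *h;
    if (g_arena == NULL) g_arena = malloc(ARENA_BYTES);
    if (g_arena != NULL && g_arena_off + need <= ARENA_BYTES) {
        h = (block_hdr_t *)(g_arena + g_arena_off); /* arena path */
        g_arena_off += need;
        h->tag = TAG_ARENA;
    } else {
        h = malloc(need);                           /* tracked fallback */
        if (h == NULL) return NULL;
        h->tag = TAG_SYSTEM;
    }
    h->size = (uint32_t)n;
    h->reserved = 0;
    return h + 1;
}

static void tagged_free(void *p) {
    if (p == NULL) return;
    block_hdr_t *h = (block_hdr_t *)p - 1;
    if (h->tag == TAG_ARENA) {
        g_arena_frees++;  /* bump arena: nothing to recycle in this toy */
    } else if (h->tag == TAG_SYSTEM) {
        g_system_frees++;
        free(h);          /* hand the original pointer back to libc */
    } else {
        abort();          /* unknown owner: fail fast instead of guessing */
    }
}
```

With 1000-byte requests, four blocks fit in the arena and the fifth takes the tagged fallback path; freeing all five dispatches each one to the correct allocator. The cost is the per-block header the option list mentions, plus the caveat that a header check only works for pointers that were produced by this wrapper in the first place.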

Long-Term Improvements

  1. Better Contention Handling

    • Lock-free refill paths
    • Per-core superslab caches
    • Adaptive batch sizes based on contention
  2. Memory Pressure Handling

    • Graceful degradation on OOM
    • Spill-to-system-malloc with proper tracking
    • Memory reclamation from cold classes
  3. Comprehensive Testing

    • Stress test with varying thread counts (1-16T)
    • Long-duration stability testing (hours, not seconds)
    • Memory leak detection (Valgrind, ASan)

9. Comparison Table

| Metric | Before Fixes | After Fixes | Change |
| --- | --- | --- | --- |
| Success Rate | 35% (7/20) | 30% (6/20) | -5 percentage points |
| Throughput | 981K ops/s | 981K ops/s | 0% |
| 1T Regression | Unknown | 2,737K ops/s | OK |
| 2T Regression | Unknown | 4,905K ops/s | OK |
| 4T Low-Contention | Unknown | 251K ops/s | ⚠️ Slow but stable |
| Fail-Fast Triggers | Unknown | 0 | No corruption detected |

10. Conclusion

The 4T high-contention crash is NOT fixed.

The other AI's fixes (Fail-Fast guards and wrapper improvements) are correct and valuable for catching future bugs, but they do NOT address the root cause of this crash:

Root Cause: Superslab starvation → malloc fallback → mixed allocations → invalid free()

Next Priority: Fix the mixed allocation bug (Option C: disable malloc fallback and fail-fast on OOM is the safest short-term solution).

Production Status: UNSAFE. Do not deploy for high-contention workloads.


Appendix: Test Environment

System:

  • OS: Linux 6.8.0-65-generic
  • CPU: Native architecture (built with -march=native)
  • Compiler: gcc with -O3 -flto

Build Flags:

  • HEADER_CLASSIDX=1
  • AGGRESSIVE_INLINE=1
  • PREWARM_TLS=1
  • HAKMEM_TINY_PHASE6_BOX_REFACTOR=1

Test Command:

./larson_hakmem 10 8 128 1024 1 12345 4

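Parameters (in order, assuming the standard Larson argument layout):

  • 10: run time in seconds
  • 8: minimum object size (bytes)
  • 128: maximum object size (bytes)
  • 1024: chunks (objects) per thread
  • 1: number of rounds
  • 12345: random seed
  • 4: number of threads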

Runtime: ~17 minutes per successful run


Report Generated: 2025-11-08
Verified By: Claude Task Agent