# Root Cause Analysis: Excessive mmap/munmap During Random_Mixed Benchmark

**Investigation Date**: 2025-11-25
**Status**: COMPLETE - Root Cause Identified
**Severity**: HIGH - 400+ unnecessary syscalls per 100K iteration benchmark

## Executive Summary

SuperSlabs are mmap'd repeatedly (400+ times in a 100K-iteration benchmark) instead of being reused from the LRU cache because **slabs never become completely empty** during the benchmark run. The shared pool calls `shared_pool_release_slab()` only when `meta->used == 0`, and that call is the only path that can populate the LRU cache with SuperSlabs for reuse.

## Evidence

### Debug Logging Results

From a `HAKMEM_SS_LRU_DEBUG=1 HAKMEM_SS_FREE_DEBUG=1` run of the 100K-iteration benchmark:

```
[SS_LRU_INIT] max_cached=256 max_memory_mb=512 ttl_sec=60
[LRU_POP] class=2 (miss) (cache_size=0/256)
[LRU_POP] class=0 (miss) (cache_size=0/256)

<... rest of benchmark with NO LRU_PUSH, SS_FREE, or EMPTY messages ...>
```

**Key observations:**
- Only **2 LRU_POP** calls (both misses)
- **Zero LRU_PUSH** calls → Cache never populated
- **Zero SS_FREE** calls → No SuperSlabs freed to cache
- **Zero "EMPTY detected"** messages → No slab reached the `meta->used == 0` state

### Call Count Analysis

Testing with 100K iterations, ws=256 allocation slots:
- SuperSlab capacity (class 2 = 32B): 1984 blocks per slab
- Expected utilization: ~256 blocks / 1984 = 13%
- Result: Slabs remain 87% empty but never reach `used == 0`

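The utilization figure above is plain arithmetic. A minimal sketch, using only the numbers quoted in this section (the helper name is hypothetical and not part of hakmem):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: percentage of a slab's blocks that are live. */
static double slab_utilization_pct(size_t live_blocks, size_t capacity) {
    assert(capacity > 0);
    return 100.0 * (double)live_blocks / (double)capacity;
}
```

`slab_utilization_pct(256, 1984)` evaluates to roughly 12.9, which the list above rounds to 13%. The point is that a slab can sit at low utilization indefinitely without ever satisfying `used == 0`.
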
## Root Cause: Shared Pool EMPTY Condition Never Triggered

### Code Path Analysis

**File**: `core/box/free_local_box.c` (lines 177-202)

```c
meta->used--;
ss_active_dec_one(ss);

if (meta->used == 0) {  // ← THIS CONDITION NEVER MET
    ss_mark_slab_empty(ss, slab_idx);
    shared_pool_release_slab(ss, slab_idx);  // ← Path to LRU cache
}
```

**Triggering condition for SuperSlab release**: **ALL** slabs in a SuperSlab must have `used == 0`, i.e. `active_slots` must drop to 0:

**File**: `core/box/sp_core_box.inc` (lines 799-836)

```c
if (atomic_load_explicit(&sp_meta->active_slots, ...) == 0) {
    // All slots are EMPTY → SuperSlab can be freed to cache or munmap
    ss_lifetime_on_empty(ss, class_idx);  // → superslab_free() → hak_ss_lru_push()
}
```

### Why Condition Never Triggers During Benchmark

**Workload pattern** (`bench_random_mixed.c` lines 96-137):

1. Allocate to random `slots[0..255]` (ws=256)
2. Free from random `slots[0..255]`
3. Expected steady-state: ~128 allocated, ~128 in freelist
4. Each slab remains partially filled: **never reaches 100% free**

**Concrete timeline (Class 2, 32B allocations)**:
```
Time T0:      Allocate blocks 1, 5, 17, 42 to slots[0..3]
              Slab has: used=4, capacity=1984

Time T1:      Free slot[1] → block 5 freed
              Slab has: used=3, capacity=1984

Time T100000: Free slot[0] → block 1 freed
              Final state: Slab still has used=1, capacity=1984
              Condition meta->used==0? → FALSE
```

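The steady-state argument above can be checked with a small simulation. This is a sketch under simplifying assumptions that are not taken from `bench_random_mixed.c`: each step picks one random slot and toggles it (free if occupied, allocate if empty), all allocations land in a single slab, and SplitMix64 stands in for whatever RNG the benchmark uses. `SimResult` and `simulate_random_mixed` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WS 256  /* working-set size quoted in the workload description */

typedef struct {
    unsigned final_used;    /* live blocks when the run ends */
    unsigned min_used;      /* lowest `used` observed after warm-up */
    unsigned empty_events;  /* times `used` reached 0 after warm-up */
} SimResult;

/* SplitMix64: small public-domain PRNG, used here only for the model. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Toggle one random slot per step and track the slab's live-block count. */
static SimResult simulate_random_mixed(unsigned iterations, unsigned warmup,
                                       uint64_t seed) {
    bool occupied[WS] = { false };
    unsigned used = 0;
    SimResult r = { 0, WS, 0 };

    for (unsigned i = 0; i < iterations; i++) {
        size_t idx = (size_t)(splitmix64(&seed) >> 32) % WS;
        if (occupied[idx]) {
            assert(used > 0);
            occupied[idx] = false;
            used--;                      /* mirrors meta->used-- on free */
        } else {
            occupied[idx] = true;
            used++;
        }
        if (i >= warmup) {
            if (used < r.min_used) r.min_used = used;
            if (used == 0) r.empty_events++;  /* would take the EMPTY path */
        }
    }
    r.final_used = used;
    return r;
}
```

Under this model `used` hovers around 128 once the working set has warmed up, so a call such as `simulate_random_mixed(100000, 1000, 42)` is expected to report `empty_events == 0`: all 256 slots would have to be free at the same time for the EMPTY path to run.
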
## Impact: Allocation Path Forced to Stage 3

Without SuperSlabs in the LRU cache, allocation falls back to Stage 3 (mutex-protected mmap):

**File**: `core/box/sp_core_box.inc` (lines 435-672)

```
Stage 0:   L0 hot slot lookup           → MISS (new workload)
Stage 0.5: EMPTY slab scan              → MISS (registry empty)
Stage 1:   Lock-free per-class list     → MISS (no EMPTY slots yet)
Stage 2:   Lock-free unused slots       → MISS (all in use or partially full)
           [Tension drain attempted...] → No effect
Stage 3:   Allocate new SuperSlab       → shared_pool_allocate_superslab_unlocked()
             ↓
           shared_pool_alloc_raw_superslab()
             ↓
           superslab_allocate()
             ↓
           hak_ss_lru_pop()             → MISS (cache empty)
             ↓
           ss_os_acquire()
             ↓
           mmap(4MB)                    → SYSCALL (unavoidable)
```

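The fallback order above can be pictured as an ordered table of stage functions that is walked until one returns a slab. The sketch below is illustrative only: `StageFn`, `stage_miss`, `stage3_new_superslab`, `acquire_slab`, and the toy `SuperSlab` struct are hypothetical stand-ins, and the real code in `sp_core_box.inc` is not structured this way as far as this document shows.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SuperSlab { int class_idx; } SuperSlab;  /* toy stand-in */
typedef SuperSlab *(*StageFn)(int class_idx);

static unsigned g_stage3_calls;      /* counts simulated mmap fallbacks */
static SuperSlab g_fake_superslab;   /* placeholder result for Stage 3 */

/* Models the state described above: every cheap stage misses. */
static SuperSlab *stage_miss(int class_idx) {
    (void)class_idx;
    return NULL;
}

/* Models Stage 3: the real path would end in mmap() here. */
static SuperSlab *stage3_new_superslab(int class_idx) {
    g_stage3_calls++;
    g_fake_superslab.class_idx = class_idx;
    return &g_fake_superslab;
}

/* Walk the stages in order and return the first hit. */
static SuperSlab *acquire_slab(int class_idx) {
    static const StageFn stages[] = {
        stage_miss,            /* Stage 0:   L0 hot slot lookup   */
        stage_miss,            /* Stage 0.5: EMPTY slab scan      */
        stage_miss,            /* Stage 1:   per-class free list  */
        stage_miss,            /* Stage 2:   unused slots         */
        stage3_new_superslab,  /* Stage 3:   allocate new SuperSlab */
    };
    for (size_t i = 0; i < sizeof stages / sizeof stages[0]; i++) {
        SuperSlab *ss = stages[i](class_idx);
        if (ss != NULL) return ss;
    }
    return NULL;
}
```

With every earlier stage stubbed to miss, each call to `acquire_slab()` increments `g_stage3_calls`, which is the behaviour the benchmark exhibits: any fix has to make one of the earlier stages start returning hits.
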
## Why Recent Commits Made It Worse

### Commit 203886c97: "Fix active_slots EMPTY detection"

Added at lines 189-190 of `free_local_box.c`:
```c
shared_pool_release_slab(ss, slab_idx);
```

**Intent**: Enable proper EMPTY detection to populate the LRU cache

**Unintended consequence**: This new call assumes slabs will become empty, but they don't. Meanwhile:
- The old architecture kept SuperSlabs in `g_superslab_heads[class_idx]` indefinitely
- The new architecture tries to free them (via `shared_pool_release_slab()`) but fails because the EMPTY condition is unreachable

### Architecture Mismatch

**Old approach** (Phase 2a - per-class SuperSlabHead):
- `g_superslab_heads[class_idx]` = linked list of all SuperSlabs for this class
- Scan entire list for available slabs on each allocation
- O(n) but never deallocates during run

**New approach** (Phase 12 - shared pool):
- Try to cache SuperSlabs when completely empty
- LRU management with configurable limits
- But: the completely-empty condition is unreachable with typical workloads

## Missing Piece: Per-Class Registry Population

**File**: `core/box/sp_core_box.inc` (lines 235-282)

```c
if (empty_reuse_enabled) {
    extern SuperSlab* g_super_reg_by_class[TINY_NUM_CLASSES][SUPER_REG_PER_CLASS];
    int reg_size = g_super_reg_class_size[class_idx];
    // Scan for EMPTY slabs...
}
```

**Problem**: `g_super_reg_by_class[][]` is **not populated** because per-class registration was removed in Phase 12:

**File**: `core/hakmem_super_registry.c` (lines 100-104)

```c
// Phase 12: per-class registry not keyed by ss->size_class anymore.
// Keep existing global hash registration only.
pthread_mutex_unlock(&g_super_reg_lock);
return 1;
```

Result: Empty scan always returns 0 hits, Stage 0.5 always misses.

## Timeline of mmap Calls

For 100K iteration benchmark with ws=256:

```
Initialization phase:
  - mmap() Class 2: 1x (SuperSlab allocated for slab 0)
  - mmap() Class 3: 1x (SuperSlab allocated for slab 1)
  - ... (other classes)

Main loop (100K iterations):
  Stage 3 allocations triggered when all Stage 0-2 searches fail:
  - Expected: ~10-20 more SuperSlabs due to fragmentation
  - Actual: ~200+ new SuperSlabs allocated

Result: ~400 total mmap calls (including alignment trimming)
```

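The "alignment trimming" in the result line is not shown in this document. One common way to obtain an aligned region, and a plausible reason each new SuperSlab costs more than one syscall, is to over-map and then unmap the unaligned head and tail. The sketch below assumes that technique on a POSIX system; `map_aligned` and its `syscalls` counter are hypothetical and are not taken from `ss_os_acquire()`.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map `size` bytes aligned to `align` (a power of two that is a multiple of
 * the page size). Counts every mmap/munmap issued in *syscalls. */
static void *map_aligned(size_t size, size_t align, int *syscalls) {
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t span = size + align;  /* over-map so an aligned window must exist */
    uint8_t *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    (*syscalls)++;
    if (raw == MAP_FAILED) return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + align - 1) & ~(uintptr_t)(align - 1);
    size_t head = (size_t)(aligned - addr);   /* bytes before the window */
    size_t tail = span - head - size;         /* bytes after the window  */

    if (head != 0) { munmap(raw, head); (*syscalls)++; }
    if (tail != 0) { munmap((uint8_t *)aligned + size, tail); (*syscalls)++; }
    return (void *)aligned;
}
```

Under this scheme every fresh region costs one `mmap` plus one or two `munmap` calls, so a few hundred new SuperSlabs translate into several hundred syscalls. Serving the request from a populated LRU cache would avoid all of them.
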
## Recommended Fixes

### Priority 1: Enable EMPTY Condition Detection

**Option A1: Lower granularity from SuperSlab to individual slabs**

Change trigger from "all SuperSlab slots empty" to "individual slab empty":

```c
// Current: waits for entire SuperSlab to be empty
if (atomic_load_explicit(&sp_meta->active_slots, ...) == 0)

// Proposed: trigger on individual slab empty
if (meta->used == 0)  // Already there, just needs LRU-compatible handling
```

**Impact**: Each individual empty slab can be recycled immediately, without waiting for the entire SuperSlab to empty.

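To make the difference between the two triggers concrete, here is a toy model that counts how often each one would fire. It is a single-threaded sketch with hypothetical names (`ToySuperSlab`, `toy_alloc`, `toy_free`) and an arbitrary four slabs per SuperSlab; it does not reproduce the atomics or metadata layout of the real shared pool.

```c
#include <assert.h>

#define SLABS_PER_SS 4  /* toy value chosen for the sketch */

typedef struct {
    unsigned used[SLABS_PER_SS];  /* live blocks per slab (like meta->used) */
    unsigned active_slots;        /* slabs with used > 0 */
    unsigned slab_empty_events;   /* proposed per-slab trigger */
    unsigned ss_empty_events;     /* current whole-SuperSlab trigger */
} ToySuperSlab;

static void toy_alloc(ToySuperSlab *ss, int slab_idx) {
    /* First block in a slab makes that slab active. */
    if (ss->used[slab_idx]++ == 0) ss->active_slots++;
}

static void toy_free(ToySuperSlab *ss, int slab_idx) {
    assert(ss->used[slab_idx] > 0);
    if (--ss->used[slab_idx] == 0) {
        ss->slab_empty_events++;        /* proposed: recycle this slab now */
        if (--ss->active_slots == 0)
            ss->ss_empty_events++;      /* current: only when all slabs empty */
    }
}
```

For example, after two allocations in slab 0 and one in slab 1, freeing the block in slab 1 raises `slab_empty_events` to 1 while `ss_empty_events` stays at 0, because slab 0 still holds live blocks. The per-slab trigger can only help on workloads where individual slabs do drain, which is the assumption behind Option A1.
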
### Priority 2: Restore Per-Class Registry or Implement L1 Cache

**Option A2: Rebuild per-class empty slab registry**

```c
// Track empty slabs per-class during free (pseudocode)
if (meta->used == 0) {
    g_sp_empty_slabs_by_class[class_idx].push(ss, slab_idx);
}

// Stage 0.5 reuse (currently broken):
SuperSlab* candidate = g_sp_empty_slabs_by_class[class_idx].pop();
```

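The `.push()`/`.pop()` calls above are pseudocode. A minimal C sketch of such a per-class container follows, as a bounded array stack. Everything beyond the name `g_sp_empty_slabs_by_class` is an assumption made for illustration: the toy `SuperSlab` struct, `EmptySlabRef`, the class count, the capacity, and the lack of locking (a real implementation would need to be thread-safe).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CLASSES     8   /* assumed; stands in for TINY_NUM_CLASSES */
#define EMPTY_STACK_CAP 64  /* arbitrary bound for the sketch */

typedef struct SuperSlab { int id; } SuperSlab;  /* toy stand-in */

typedef struct { SuperSlab *ss; int slab_idx; } EmptySlabRef;

typedef struct {
    EmptySlabRef items[EMPTY_STACK_CAP];
    size_t count;
} EmptySlabStack;

static EmptySlabStack g_sp_empty_slabs_by_class[NUM_CLASSES];

/* Free path: record a slab whose used count just reached 0.
 * Returns false when the stack is full and the slab was not recorded. */
static bool empty_slab_push(int class_idx, SuperSlab *ss, int slab_idx) {
    assert(class_idx >= 0 && class_idx < NUM_CLASSES);
    EmptySlabStack *st = &g_sp_empty_slabs_by_class[class_idx];
    if (st->count == EMPTY_STACK_CAP) return false;
    st->items[st->count++] = (EmptySlabRef){ ss, slab_idx };
    return true;
}

/* Stage 0.5: take the most recently emptied slab. Returns false on a miss. */
static bool empty_slab_pop(int class_idx, EmptySlabRef *out) {
    assert(class_idx >= 0 && class_idx < NUM_CLASSES);
    EmptySlabStack *st = &g_sp_empty_slabs_by_class[class_idx];
    if (st->count == 0) return false;
    *out = st->items[--st->count];
    return true;
}
```

Popping in LIFO order hands back the slab that was emptied most recently, which is the one most likely to still be warm in cache. A FIFO or intrusive list would work equally well if fairness across SuperSlabs matters more.
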
### Priority 3: Reduce Stage 3 Frequency

**Option A3: Increase Slab Capacity or Reduce Working Set Pressure**

This is not practical for benchmarks, but it highlights that the shared pool needs more efficient slab reuse.

## Validation

To confirm fix effectiveness:

```bash
# Before fix: 400+ LRU_POP misses + mmap calls
export HAKMEM_SS_LRU_DEBUG=1 HAKMEM_SS_FREE_DEBUG=1
./out/debug/bench_random_mixed_hakmem 100000 256 42 2>&1 | grep -E "LRU_|SS_FREE|EMPTY|mmap"

# After fix: Multiple LRU_PUSH hits + <50 mmap calls
# Expected: [EMPTY detected] messages + [LRU_PUSH] messages
```

## Files Involved

1. `core/box/free_local_box.c` - Trigger point for EMPTY detection
2. `core/box/sp_core_box.inc` - Stage 3 allocation (mmap fallback)
3. `core/hakmem_super_registry.c` - LRU cache (never populated)
4. `core/hakmem_tiny_superslab.c` - SuperSlab allocation/free
5. `core/box/ss_lifetime_box.h` - Lifetime policy (calls superslab_free)

## Conclusion

The 400+ mmap/munmap calls are a symptom of a shared pool architecture that was not designed for workloads in which slabs never become 100% empty. The LRU cache mechanism exists but never activates because its trigger condition (`active_slots == 0`) is unreachable. The fix requires lowering the trigger granularity, rebuilding the per-class registry, or restructuring the shared pool to support partial-slab reuse.