CRITICAL DISCOVERY: Phase 9 LRU architecturally unreachable due to TLS SLL

Root Cause:
- TLS SLL (thread-local singly-linked free list) fast path (95-99% of frees) does NOT decrement meta->used
- Slabs never appear empty (meta->used never reaches 0)
- superslab_free() never called
- hak_ss_lru_push() never called
- LRU cache utilization: 0% (should be >90%; see the sketch below)
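
For reference, a minimal sketch of the broken fast path (tls_sll_head and tiny_free_fast are illustrative names, not the actual hakmem symbols):

    #include <stdint.h>

    typedef struct TinySlabMeta {
        uint32_t used;       /* live objects in this slab */
        uint8_t  class_idx;
    } TinySlabMeta;

    static __thread void* tls_sll_head;  /* per-thread singly-linked free stack */

    /* Fast path, taken for 95-99% of frees. */
    static void tiny_free_fast(void* p)
    {
        *(void**)p = tls_sll_head;       /* push object onto the TLS SLL ... */
        tls_sll_head = p;
        /* ... and return WITHOUT decrementing the owning slab's
         * meta->used. used never reaches 0, shared_pool_release_slab()
         * never sees an empty slab, and superslab_free() /
         * hak_ss_lru_push() become unreachable. */
    }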

Impact:
- mmap/munmap churn: 6,455 syscalls (74.8% of run time)
- Performance: -94% regression (9.38M → 563K ops/s)
- Phase 9 design goal: FAILED (lazy deallocation non-functional)

Evidence:
- 200K iterations: [LRU_PUSH]=0, [LRU_POP]=877 misses
- Experimental verification with debug logs (diff below) confirms the theory

Solution: Option B - Periodic TLS SLL Drain (sketched below)
- Every 1,024 frees: drain TLS SLL → slab freelist
- Decrement meta->used properly → enable empty detection
- Expected: -96% syscalls, +1,300-1,700% throughput
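
A minimal sketch of the drain, continuing the types and tls_sll_head from the sketch above; the helpers sll_pop, slab_meta_of, slab_freelist_push, owner_ss and slab_idx_of are assumptions for illustration, and only shared_pool_release_slab() is the real entry point shown in the diff below:

    #define TLS_DRAIN_INTERVAL 1024

    static __thread unsigned tls_free_count;

    /* Hypothetical helpers, declared for illustration only. */
    typedef struct SuperSlab SuperSlab;            /* opaque here */
    void*         sll_pop(void** head);            /* pop one object, NULL when empty */
    TinySlabMeta* slab_meta_of(void* p);
    void          slab_freelist_push(TinySlabMeta* m, void* p);
    SuperSlab*    owner_ss(void* p);
    int           slab_idx_of(void* p);
    void          shared_pool_release_slab(SuperSlab* ss, int slab_idx);

    static void tls_sll_drain(void)
    {
        void* p;
        while ((p = sll_pop(&tls_sll_head)) != NULL) {
            TinySlabMeta* meta = slab_meta_of(p);
            slab_freelist_push(meta, p);           /* object back to its slab */
            if (--meta->used == 0)                 /* empty detection works again */
                shared_pool_release_slab(owner_ss(p), slab_idx_of(p));
        }
    }

    static void tiny_free_fast(void* p)
    {
        *(void**)p = tls_sll_head;                 /* fast path unchanged */
        tls_sll_head = p;
        if (++tls_free_count >= TLS_DRAIN_INTERVAL) {
            tls_free_count = 0;                    /* amortized: one drain per 1,024 frees */
            tls_sll_drain();
        }
    }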

Files:
- PHASE9_LRU_ARCHITECTURE_ISSUE.md: Comprehensive analysis (300+ lines)
- Includes design options A/B/C/D with tradeoff analysis

Next: Await ultrathink approval to implement Option B
Author: Moe Charm (CI)
Date:   2025-11-14 06:49:32 +09:00
Commit: f95448c767 (parent: c6a2a6d38a)
6 changed files with 498 additions and 4 deletions


@@ -247,6 +247,13 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
return;
}
// ADD DEBUG LOGGING
static int dbg = -1;
if (__builtin_expect(dbg == -1, 0)) {
const char* e = getenv("HAKMEM_SS_FREE_DEBUG");
dbg = (e && *e && *e != '0') ? 1 : 0;
}
pthread_mutex_lock(&g_shared_pool.alloc_lock);
TinySlabMeta* meta = &ss->slabs[slab_idx];
@@ -256,6 +263,11 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
return;
}
if (dbg == 1) {
fprintf(stderr, "[SS_SLAB_EMPTY] ss=%p slab_idx=%d class=%d used=0 (releasing to pool)\n",
(void*)ss, slab_idx, meta->class_idx);
}
uint32_t bit = (1u << slab_idx);
if (ss->slab_bitmap & bit) {
ss->slab_bitmap &= ~bit;
@@ -276,9 +288,25 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
// We could rescan ss for another matching slab; to keep it cheap, just clear.
g_shared_pool.class_hints[old_class] = NULL;
}
}
// TODO Phase 12-4+: if ss->active_slabs == 0, consider GC / unmap.
// DEBUG: Check if SuperSlab is now completely empty
if (dbg == 1 && ss->active_slabs == 0) {
fprintf(stderr, "[SS_COMPLETELY_EMPTY] ss=%p active_slabs=0 (calling superslab_free)\n",
(void*)ss);
}
// Phase 12-4: Free SuperSlab when it becomes completely empty
if (ss->active_slabs == 0) {
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
// Call superslab_free() to either:
// 1. Cache in LRU (hak_ss_lru_push) - lazy deallocation
// 2. Or munmap if LRU is full - eager deallocation
extern void superslab_free(SuperSlab* ss);
superslab_free(ss);
return;
}
}
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
}
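
Usage note: the logging added above is off by default and is enabled by setting HAKMEM_SS_FREE_DEBUG to any non-empty value other than "0" (e.g. HAKMEM_SS_FREE_DEBUG=1 in the benchmark environment). The flag is sampled once per process via the lazily-initialized dbg variable, so the getenv() cost stays off the hot path; once the Option B drain makes empty slabs observable, the [SS_SLAB_EMPTY] and [SS_COMPLETELY_EMPTY] lines should appear on stderr.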