P0 Lock Contention Analysis: Instrumentation + comprehensive report

**P0-2: Lock Instrumentation** (Complete)
- Add atomic counters to g_shared_pool.alloc_lock
- Track acquire_slab() vs release_slab() separately
- Environment: HAKMEM_SHARED_POOL_LOCK_STATS=1
- Report stats at shutdown via destructor
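The counter scheme above can be sketched as follows. This is illustrative only: the symbol names mirror the ones listed under "Files Modified" but are not guaranteed to match the actual `core/hakmem_shared_pool.c` code, and the real report runs from a destructor at shutdown.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

// Relaxed atomic counters: negligible overhead on the hot path.
static _Atomic unsigned long g_lock_acquire_slab_count = 0;
static _Atomic unsigned long g_lock_release_slab_count = 0;
static int g_lock_stats_enabled = 0;

static void lock_stats_init(void) {
    // Opt-in via HAKMEM_SHARED_POOL_LOCK_STATS=1.
    const char* e = getenv("HAKMEM_SHARED_POOL_LOCK_STATS");
    g_lock_stats_enabled = (e && e[0] == '1');
}

// Called just before taking g_shared_pool.alloc_lock on each path,
// so acquire_slab() and release_slab() contention can be told apart.
static void count_acquire_slab(void) {
    atomic_fetch_add_explicit(&g_lock_acquire_slab_count, 1,
                              memory_order_relaxed);
}
static void count_release_slab(void) {
    atomic_fetch_add_explicit(&g_lock_release_slab_count, 1,
                              memory_order_relaxed);
}

// In the real code this is registered with __attribute__((destructor)).
static void lock_stats_report(void) {
    if (!g_lock_stats_enabled) return;
    fprintf(stderr, "alloc_lock: acquire_slab=%lu release_slab=%lu\n",
            atomic_load(&g_lock_acquire_slab_count),
            atomic_load(&g_lock_release_slab_count));
}
```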

**P0-3: Analysis Results** (Complete)
- 100% contention from acquire_slab() (allocation path)
- 0% from release_slab() (effectively lock-free!)
- Lock rate: 0.206% (TLS hit rate: 99.8%)
- Scaling: 4T→8T = 1.44x (sublinear, lock bottleneck)

**Key Findings**:
- 4T: 330 lock acquisitions / 160K ops
- 8T: 658 lock acquisitions / 320K ops
- futex: 68% of syscall time (from previous strace)
- Bottleneck: acquire_slab 3-stage logic under mutex

**Report**: MID_LARGE_LOCK_CONTENTION_ANALYSIS.md (2.3KB)
- Detailed breakdown by code path
- Root cause analysis (TLS miss → shared pool lock)
- Lock-free implementation roadmap (P0-4/P0-5)
- Expected impact: +50-73% throughput

**Files Modified**:
- core/hakmem_shared_pool.c: +60 lines of instrumentation
  - Atomic counters: g_lock_acquire/release_slab_count
  - lock_stats_init() + lock_stats_report()
  - Per-path tracking in acquire/release functions

**Next Steps**:
- P0-4: Lock-free per-class free lists (Stage 1: LIFO stack CAS)
- P0-5: Lock-free slot claiming (Stage 2: atomic bitmap)
- P0-6: A/B comparison (target: +50-73%)
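
The Stage 1 idea from P0-4 (a per-class LIFO free list maintained with compare-and-swap instead of the pool mutex) can be sketched as a Treiber stack. Names and layout are illustrative, not the real hakmem structures, and a production version must also deal with the ABA problem (e.g. tagged pointers or hazard pointers), which this sketch omits:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct free_node { struct free_node* next; } free_node;

// One such head would exist per size class.
static _Atomic(free_node*) g_free_list = NULL;

// release path: push with CAS, no mutex.
static void push_free(free_node* n) {
    free_node* head = atomic_load_explicit(&g_free_list, memory_order_relaxed);
    do {
        n->next = head;  // link onto the current head
    } while (!atomic_compare_exchange_weak_explicit(
                 &g_free_list, &head, n,
                 memory_order_release, memory_order_relaxed));
}

// allocation path: pop with CAS; NULL means empty -> fall back to slow path.
static free_node* pop_free(void) {
    free_node* head = atomic_load_explicit(&g_free_list, memory_order_acquire);
    while (head &&
           !atomic_compare_exchange_weak_explicit(
               &g_free_list, &head, head->next,
               memory_order_acquire, memory_order_relaxed)) {
        // a failed CAS reloads head; loop retries
    }
    return head;
}
```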

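The Stage 2 idea from P0-5 (claiming slab slots through an atomic bitmap rather than under the mutex) can be sketched with `fetch_or`. This is a minimal single-word sketch with illustrative names, assuming 64 slots per slab; the real layout may differ:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t g_slot_bitmap = 0;  // bit set = slot in use

// Returns a claimed slot index in [0, 64), or -1 if the slab is full.
static int claim_slot(void) {
    uint64_t map = atomic_load_explicit(&g_slot_bitmap, memory_order_relaxed);
    while (map != UINT64_MAX) {
        int slot = __builtin_ctzll(~map);            // lowest free bit
        uint64_t bit = (uint64_t)1 << slot;
        uint64_t prev = atomic_fetch_or_explicit(&g_slot_bitmap, bit,
                                                 memory_order_acq_rel);
        if (!(prev & bit)) return slot;              // we set the bit first
        map = prev | bit;                            // lost the race; retry
    }
    return -1;
}

static void release_slot(int slot) {
    atomic_fetch_and_explicit(&g_slot_bitmap, ~((uint64_t)1 << slot),
                              memory_order_release);
}
```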
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Moe Charm (CI)
Date: 2025-11-14 15:32:07 +09:00
Commit: 29fefa2018 (parent 87f12fe87f)
13 changed files with 1183 additions and 16 deletions


```diff
@@ -196,13 +196,16 @@ void hak_free_at(void* ptr, size_t size, hak_callsite_t site) {
 // Phase 9 gutted hak_is_memory_readable() to always return 1 (unsafe!)
 // We MUST verify memory is mapped before dereferencing AllocHeader.
 //
-// Step A (2025-11-14): TLS page cache to reduce mincore() frequency.
-// - Cache last-checked pages in __thread statics.
-// - Typical case: many frees on the same handful of pages → 90%+ cache hit.
+// A/B Testing (2025-11-14): Add #ifdef guard to measure mincore performance impact.
+// Expected: mincore OFF → +100-200% throughput, but may cause crashes on invalid ptrs.
+// Usage: make DISABLE_MINCORE=1 to disable mincore checks.
 int is_mapped = 0;
+#ifndef HAKMEM_DISABLE_MINCORE_CHECK
 #ifdef __linux__
 {
-    // TLS cache for page→is_mapped
+    // TLS page cache to reduce mincore() frequency.
+    // - Cache last-checked pages in __thread statics.
+    // - Typical case: many frees on the same handful of pages → 90%+ cache hit.
     static __thread void* s_last_page1 = NULL;
     static __thread int s_last_page1_mapped = 0;
     static __thread void* s_last_page2 = NULL;
@@ -237,8 +240,14 @@ void hak_free_at(void* ptr, size_t size, hak_callsite_t site) {
 }
 }
 }
 #else
 is_mapped = 1; // Assume mapped on non-Linux
 #endif
+#else
+// HAKMEM_DISABLE_MINCORE_CHECK=1: Trust internal metadata (registry/headers)
+// Assumes all ptrs reaching this path are valid HAKMEM allocations.
+// WARNING: May crash on invalid ptrs (libc/external allocations without headers).
+is_mapped = 1;
+#endif

 if (!is_mapped) {
```
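
The guarded mincore() probe that the TLS cache amortizes can be sketched in isolation. A minimal sketch, assuming Linux and a hypothetical `page_is_mapped()` helper with a one-entry TLS cache (the real code caches two pages):

```c
#define _DEFAULT_SOURCE  // expose mincore() on glibc
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Returns 1 if the page containing ptr is mapped, 0 otherwise.
static int page_is_mapped(void* ptr) {
    static __thread void* s_last_page = NULL;  // 1-entry TLS cache
    static __thread int   s_last_mapped = 0;

    long psz = sysconf(_SC_PAGESIZE);
    void* page = (void*)((uintptr_t)ptr & ~((uintptr_t)psz - 1));

    if (page == s_last_page) return s_last_mapped;  // cache hit: no syscall

    // mincore() returns 0 for mapped pages and -1/ENOMEM for unmapped ones.
    unsigned char vec;
    int mapped = (mincore(page, (size_t)psz, &vec) == 0);

    s_last_page = page;
    s_last_mapped = mapped;
    return mapped;
}
```

Repeated frees on the same page hit the cached result and skip the syscall entirely, which is where the quoted 90%+ hit rate comes from.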