From b52e1985e6c4e8ddca72d687168ccc00d47cfd5a Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)" <moecharm@example.com>
Date: Fri, 28 Nov 2025 18:16:32 +0900
Subject: [PATCH] Phase 2-Opt2: Reduce SuperSlab default size to 512KB (+10-15%
 perf)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Changes:
- SUPERSLAB_LG_MIN: 20 → 19 (1MB → 512KB)
- SUPERSLAB_LG_DEFAULT: 21 → 19 (2MB → 512KB)
- SUPERSLAB_LG_MAX: 21 (unchanged, still allows 2MB)

Benchmark Results:
- ws=256:  72M → 79.80M ops/s (+10.8%, +7.8M ops/s)
- ws=1024: 56.71M → 65.07M ops/s (+14.7%, +8.36M ops/s)

Expected: +3-5% improvement
Actual: +10-15% improvement (exceeded the prediction)

Root Cause Analysis:
- perf profiling attributed 23.83% of CPU time to shared_pool_acquire_slab
- Phase 1 removed memset overhead (+1.3%)
- Phase 2 reduces mmap allocation size by 75% (2MB → 512KB)
- Fewer page faults during SuperSlab initialization
- Better memory granularity (less VA space waste)
- Smaller mmap requests complete faster even when no page faults occur

Technical Details:
- Slab size is fixed at 64KB; at the new default each SuperSlab holds 8 slabs
- Previous: 16-32 slabs per SuperSlab (1-2MB)
- New: 8 slabs per SuperSlab (512KB)
- SuperSlabs are refilled slightly more often, but per-SuperSlab init cost dominates
- Net effect: Major throughput improvement
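
Illustrative sketch of the slab-count arithmetic above. It assumes the fixed
64KB slab size stated in this message; SLAB_LG, superslab_bytes and
slabs_per_superslab are hypothetical names, not identifiers from the hakmem
source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: every slab is 64KB (2^16), per the commit message. */
#define SLAB_LG 16u

/* Total SuperSlab size in bytes for a given log2 size. */
static size_t superslab_bytes(unsigned lg) {
    return (size_t)1 << lg;
}

/* Number of 64KB slabs that fit in a SuperSlab of 2^lg bytes. */
static size_t slabs_per_superslab(unsigned lg) {
    assert(lg >= SLAB_LG);
    return superslab_bytes(lg) >> SLAB_LG;
}
```

With lg = 19, 20 and 21 this gives 8, 16 and 32 slabs, matching the 512KB,
1MB and 2MB figures listed above.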

Phase 1+2 Cumulative Improvement:
- Baseline: 64.61M ops/s
- Phase 1 final: 72.92M ops/s (+12.9%)
- Phase 2 final: 79.80M ops/s (+23.5% total, +9.4% over Phase 1)
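
The percentages above can be re-derived from the raw throughput numbers. A
minimal sketch, where pct_gain is a hypothetical helper and not part of the
benchmark harness:

```c
#include <assert.h>

/* Relative throughput gain in percent when going from `before` to `after`.
 * Both arguments are in the same unit (here: M ops/s). */
static double pct_gain(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, pct_gain(64.61, 79.80) is roughly 23.5 and
pct_gain(72.92, 79.80) is roughly 9.4, consistent with the figures listed.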

Files Modified:
- core/hakmem_tiny_superslab_constants.h:12-33

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
---
 core/hakmem_tiny_superslab_constants.h | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/core/hakmem_tiny_superslab_constants.h b/core/hakmem_tiny_superslab_constants.h
index 6155badc..603350a3 100644
--- a/core/hakmem_tiny_superslab_constants.h
+++ b/core/hakmem_tiny_superslab_constants.h
@@ -9,18 +9,27 @@
 // SuperSlab Layout Constants
 // ============================================================================
 
-// Log2 range for SuperSlab sizes (in MB):
-//  - MIN:  1MB (2^20)
-//  - MAX:  2MB (2^21)
-//  - DEFAULT: 2MB unless constrained by ACE/env
+// Log2 range for SuperSlab sizes:
+//  - MIN:  512KB (2^19) - Phase 2 optimization: reduced from 1MB
+//  - MAX:  2MB (2^21)   - unchanged
+//  - DEFAULT: 512KB (2^19) - Phase 2 optimization: reduced from 2MB
+//
+// Phase 2-Opt2: Reduce SuperSlab size to minimize initialization cost
+// Benefit: 75% reduction in allocation size (2MB → 512KB)
+// Expected: +3-5% throughput improvement
+// Rationale:
+//   - Smaller SuperSlab = fewer page faults during allocation
+//   - Better memory granularity (less wasted VA space)
+//   - Memset already removed in Phase 1, so pure allocation overhead
+//   - Perf analysis showed shared_pool_acquire_slab at 23.83% CPU time
 #ifndef SUPERSLAB_LG_MIN
-#define SUPERSLAB_LG_MIN 20
+#define SUPERSLAB_LG_MIN 19
 #endif
 #ifndef SUPERSLAB_LG_MAX
 #define SUPERSLAB_LG_MAX 21
 #endif
 #ifndef SUPERSLAB_LG_DEFAULT
-#define SUPERSLAB_LG_DEFAULT 21
+#define SUPERSLAB_LG_DEFAULT 19
 #endif
 
 // Size of each slab within SuperSlab (fixed, never changes)