P-Tier + Tiny Route Policy: Aggressive Superslab Management + Safe Routing

## Phase 1: Utilization-Aware Superslab Tiering (Plan B, implemented)

- Add ss_tier_box.h: Classify SuperSlabs into HOT/DRAINING/FREE tiers by utilization (see the sketch after this list)
  - HOT (>25% utilization): Accepts new allocations
  - DRAINING (0% < utilization ≤ 25%): Drains only, no new allocations
  - FREE (0% utilization): Ready for eager munmap

- Enhanced shared_pool_release_slab():
  - Check tier transition after each slab release
  - If tier→FREE: Force remaining slots to EMPTY and call superslab_free() immediately
  - Bypasses LRU cache to prevent registry bloat from accumulating DRAINING SuperSlabs

- Test results (bench_random_mixed_hakmem):
  - 1M iterations:  ~1.03M ops/s (previously passed)
  - 10M iterations:  ~1.15M ops/s (previously: registry full error)
  - 50M iterations:  ~1.08M ops/s (stress test)
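
Conceptually the tiering reduces to a pure function of utilization. A minimal sketch, assuming a per-SuperSlab block capacity is available: `SSTier` and the tier names match ss_tier_box.h as used in the diff excerpt at the end of this message, but the function name, signature, and threshold arithmetic here are illustrative, not the actual API.

```c
#include <stdint.h>

/* Sketch only: the real ss_tier_box.h defines SSTier; the `capacity`
 * parameter and the threshold arithmetic below are illustrative. */
typedef enum { SS_TIER_HOT, SS_TIER_DRAINING, SS_TIER_FREE } SSTier;

static inline SSTier ss_tier_classify(uint32_t active_blocks, uint32_t capacity) {
    if (active_blocks == 0)
        return SS_TIER_FREE;                      /* 0% utilization: ready for eager munmap */
    if ((uint64_t)active_blocks * 4 > capacity)   /* strictly above 25% utilization */
        return SS_TIER_HOT;                       /* keep accepting new allocations */
    return SS_TIER_DRAINING;                      /* 0% < utilization <= 25%: drain only */
}
```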

## Phase 2: Tiny Front Routing Policy (new Box)

- Add tiny_route_box.h/c: A single 8-byte table mapping size class → routing decision (see the sketch after this list)
  - ROUTE_TINY_ONLY: Tiny front exclusive (no fallback)
  - ROUTE_TINY_FIRST: Try Tiny first, fall back to Pool on failure
  - ROUTE_POOL_ONLY: Skip Tiny entirely

- Profiles selected via the HAKMEM_TINY_PROFILE environment variable:
  - "hot": C0-C3=TINY_ONLY, C4-C6=TINY_FIRST, C7=POOL_ONLY
  - "conservative" (default): All TINY_FIRST
  - "off": All POOL_ONLY (disable Tiny)
  - "full": All TINY_ONLY (microbench mode)

- A/B test results (ws=256, 100k ops random_mixed):
  - Default (conservative): ~2.90M ops/s
  - hot: ~2.65M ops/s (slower than the conservative default)
  - off: ~2.86M ops/s
  - full: ~2.98M ops/s (best by a small margin)
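
A minimal sketch of the table + profile mechanism described above: the ROUTE_* values, profile names, and per-class assignments come from this message, while the table name, helper functions, and allocation-path callbacks are assumptions, not the actual tiny_route_box.h/c code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One byte per tiny size class (C0..C7) = the whole policy in 8 bytes. */
typedef enum { ROUTE_TINY_ONLY = 0, ROUTE_TINY_FIRST = 1, ROUTE_POOL_ONLY = 2 } TinyRoute;

static uint8_t g_tiny_route[8];   /* size class -> TinyRoute */

/* Profile selection via HAKMEM_TINY_PROFILE (names and per-class
 * assignments follow the list above). */
static void tiny_route_init_from_env(void) {
    memset(g_tiny_route, ROUTE_TINY_FIRST, sizeof g_tiny_route);  /* "conservative" default */
    const char* p = getenv("HAKMEM_TINY_PROFILE");
    if (!p) return;
    if (strcmp(p, "hot") == 0) {
        for (int c = 0; c <= 3; c++) g_tiny_route[c] = ROUTE_TINY_ONLY;   /* C0-C3 */
        /* C4-C6 keep TINY_FIRST */
        g_tiny_route[7] = ROUTE_POOL_ONLY;                                /* C7 */
    } else if (strcmp(p, "off") == 0) {
        memset(g_tiny_route, ROUTE_POOL_ONLY, sizeof g_tiny_route);       /* disable Tiny */
    } else if (strcmp(p, "full") == 0) {
        memset(g_tiny_route, ROUTE_TINY_ONLY, sizeof g_tiny_route);       /* microbench mode */
    }
    /* "conservative" or any unrecognized value keeps the all-TINY_FIRST default. */
}

/* Hypothetical use on the allocation path: consult the table, and fall
 * back to the shared pool only when the policy allows it. */
static void* route_alloc(int cls,
                         void* (*tiny_alloc)(int),
                         void* (*pool_alloc)(int)) {
    switch ((TinyRoute)g_tiny_route[cls]) {
    case ROUTE_POOL_ONLY:  return pool_alloc(cls);      /* skip Tiny entirely */
    case ROUTE_TINY_ONLY:  return tiny_alloc(cls);      /* no Pool fallback */
    case ROUTE_TINY_FIRST: {
        void* p = tiny_alloc(cls);
        return p ? p : pool_alloc(cls);                 /* fall back to Pool */
    }
    }
    return NULL;
}
```

An A/B run is then just a matter of exporting the variable before the benchmark, e.g. `HAKMEM_TINY_PROFILE=hot ./bench_random_mixed_hakmem`, with no hot-path code changes between runs.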

## Design Rationale

### Registry Pressure Fix (Plan B)
- Problem: DRAINING-tier SuperSlabs occupied registry slots indefinitely
- Solution: When total_active_blocks reaches 0, free the SuperSlab immediately to release its registry slot (see the diff excerpt at the end of this message)
- Result: No more "registry full" errors under stress

### Routing Policy Box (new)
- Problem: Tiny front routing decisions were scattered across environment checks and ad-hoc branches
- Solution: Centralize routing in a single table and select profiles via the environment variable
- Benefit: Safe A/B testing without touching hot path code
- Future: Integrate with RSS budget/learning layers for dynamic profile switching

## Next Steps (Performance Optimization)
- Profile Tiny front internals (TLS SLL, FastCache, Superslab backend latency)
- Identify the bottleneck behind the gap between the current ~2.9M ops/s and mimalloc's ~100M ops/s
- Consider:
  - Reduce shared pool lock contention
  - Optimize unified cache hit rate
  - Streamline Superslab carving logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Diff excerpt (shared_pool_release_slab() tier check and eager free path):

```c
@@ -2,6 +2,7 @@
#include "hakmem_debug_master.h"
#include "box/ss_slab_meta_box.h"
#include "box/ss_hot_cold_box.h"
#include "box/ss_tier_box.h"             // P-Tier: Utilization-aware tiering
#include "hakmem_env_cache.h"            // Priority-2: ENV cache
#include "superslab/superslab_inline.h"  // superslab_ref_get guard for TLS pins
#include "box/ss_release_guard_box.h"    // Box: SuperSlab Release Guard

@@ -176,6 +177,51 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
#endif
    }

    // P-Tier: Check tier transition after releasing slab
    // This may transition HOT → DRAINING if utilization dropped below threshold
    // or DRAINING → FREE if utilization reached 0
    ss_tier_check_transition(ss);

    // P-Tier Step B: Eager FREE eviction
    // If tier transitioned to FREE (total_active_blocks == 0), immediately try to
    // release the SuperSlab regardless of active_slots. This prevents registry bloat.
    SSTier current_tier = ss_tier_get(ss);
    if (current_tier == SS_TIER_FREE) {
        // Double-check: total_active_blocks should be 0 for FREE tier
        uint32_t active_blocks = atomic_load_explicit(&ss->total_active_blocks, memory_order_acquire);
        if (active_blocks == 0 && ss_release_guard_superslab_can_free(ss)) {
#if !HAKMEM_BUILD_RELEASE
            if (dbg == 1) {
                fprintf(stderr, "[SP_TIER_FREE_EAGER] ss=%p tier=FREE active_slots=%u -> immediate free\n",
                        (void*)ss, sp_meta->active_slots);
            }
#endif
            // Force all remaining slots to EMPTY state for clean metadata
            for (uint32_t i = 0; i < sp_meta->total_slots; i++) {
                SlotState st = atomic_load_explicit(&sp_meta->slots[i].state, memory_order_relaxed);
                if (st == SLOT_ACTIVE) {
                    atomic_store_explicit(&sp_meta->slots[i].state, SLOT_EMPTY, memory_order_relaxed);
                }
            }
            sp_meta->active_slots = 0;
            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_release_count, 1);
            }
            // Clear meta->ss before unlocking (race prevention)
            atomic_store_explicit(&sp_meta->ss, NULL, memory_order_release);
            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
            // Free SuperSlab immediately (bypasses normal active_slots==0 check)
            extern void superslab_free(SuperSlab* ss);
            superslab_free(ss);
            return;
        }
    }

    // Check if SuperSlab is now completely empty (all slots EMPTY or UNUSED)
    if (sp_meta->active_slots == 0) {
#if !HAKMEM_BUILD_RELEASE
```