Phase 26: Front Gate Unification - Tiny allocator fast path (+12.9%)

Implementation:
- New single-layer malloc/free path for Tiny (≤1024B) allocations
- Bypasses 3-layer overhead: malloc → hak_alloc_at (236 lines) → wrapper → tiny_alloc_fast
- Leverages Phase 23 Unified Cache (tcache-style, 2-3 cache misses)
- Safe fallback to normal path on Unified Cache miss

Performance (Random Mixed 256B, 100K iterations):
- Baseline (Phase 26 OFF): 11.33M ops/s
- Phase 26 ON: 12.79M ops/s (+12.9%)
- Prediction (ChatGPT): +10-15% → Actual: +12.9% (perfect match!)

Bug fixes:
- Initialization bug: Added hak_init() call before fast path
- Page boundary SEGV: Added guard for offset_in_page == 0

Also includes Phase 23 debug log fixes:
- Guard C2_CARVE logs with #if !HAKMEM_BUILD_RELEASE
- Guard prewarm logs with #if !HAKMEM_BUILD_RELEASE
- Set Hot_2048 as default capacity (C2/C3=2048, others=64)

Files:
- core/front/malloc_tiny_fast.h: Phase 26 implementation (145 lines)
- core/box/hak_wrappers.inc.h: Fast path integration (+28 lines)
- core/front/tiny_unified_cache.h: Hot_2048 default
- core/tiny_refill_opt.h: C2_CARVE log guard
- core/box/ss_hot_prewarm_box.c: Prewarm log guard
- CURRENT_TASK.md: Phase 26 completion documentation

ENV variables:
- HAKMEM_FRONT_GATE_UNIFIED=1 (enable Phase 26, default: OFF)
- HAKMEM_TINY_UNIFIED_CACHE=1 (Phase 23, required)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-17 05:29:08 +09:00
parent 7311d32574
commit 5b36c1c908
6 changed files with 302 additions and 9 deletions
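The Hot_2048 capacity rule described in the message (C2/C3 = 2048 slots, other classes = 64, clamped to 32..4096 and rounded up to a power of two) can be sketched in isolation as follows. This is a minimal illustration, not the code in core/front/tiny_unified_cache.h: the helper names `default_capacity`, `normalize_capacity`, and `resolve_capacity` are hypothetical, the override is passed in as a string instead of being read with getenv, and non-positive overrides are treated as 0 here, which may differ from the atoi-based parsing in the actual change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: Hot_2048 default. Classes 2 and 3 (128B, 256B)
 * are treated as hot and get 2048 slots; every other class gets 64. */
static size_t default_capacity(int class_idx) {
    return (class_idx == 2 || class_idx == 3) ? 2048 : 64;
}

/* Hypothetical helper: clamp to [32, 4096], then round up to the next
 * power of two so a ring index can be wrapped with a mask. */
static size_t normalize_capacity(size_t cap) {
    if (cap < 32) cap = 32;
    if (cap > 4096) cap = 4096;
    size_t pow2 = 32;
    while (pow2 < cap) pow2 *= 2;
    return pow2;
}

/* Hypothetical helper: an override string (for example the value of
 * HAKMEM_TINY_UNIFIED_C<n>) wins over the default when it is non-empty.
 * Assumption of this sketch: non-positive values are treated as 0 and
 * therefore end up at the minimum capacity. */
static size_t resolve_capacity(int class_idx, const char *override) {
    size_t cap = default_capacity(class_idx);
    if (override && *override) {
        long v = strtol(override, NULL, 10);
        cap = (v > 0) ? (size_t)v : 0;
    }
    return normalize_capacity(cap);
}

/* Small self-check covering the default and the clamp boundaries. */
static void capacity_self_check(void) {
    assert(resolve_capacity(2, NULL) == 2048);
    assert(resolve_capacity(0, "") == 64);
    assert(resolve_capacity(4, "5000") == 4096);
}
```

Under these assumptions, an override of 100 for a cold class resolves to 128 slots, and any override above 4096 is capped at 4096.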
--- a/core/front/tiny_unified_cache.h
+++ b/core/front/tiny_unified_cache.h
@@ -77,18 +77,27 @@ static inline int unified_cache_enabled(void) {
    return g_enable;
 }

-// Per-class capacity (default: 128 for all classes)
+// Per-class capacity (default: Hot_2048 strategy - optimized for 256B workload)
+// Phase 23 Capacity Optimization Result: Hot_2048 = 14.63M ops/s (+43% vs baseline)
+// Hot classes (C2/C3: 128B/256B) get 2048 slots, others get 64 slots
 static inline size_t unified_capacity(int class_idx) {
    static size_t g_cap[TINY_NUM_CLASSES] = {0};
    if (__builtin_expect(g_cap[class_idx] == 0, 0)) {
        char env_name[64];
        snprintf(env_name, sizeof(env_name), "HAKMEM_TINY_UNIFIED_C%d", class_idx);
        const char* e = getenv(env_name);
-        g_cap[class_idx] = (e && *e) ? (size_t)atoi(e) : 128;  // Default: 128
+
+        // Default: Hot_2048 strategy (C2/C3=2048, others=64)
+        size_t default_cap = 64;  // Cold classes
+        if (class_idx == 2 || class_idx == 3) {
+            default_cap = 2048;  // Hot classes (128B, 256B)
+        }
+
+        g_cap[class_idx] = (e && *e) ? (size_t)atoi(e) : default_cap;

        // Round up to power of 2 (for fast modulo)
        if (g_cap[class_idx] < 32) g_cap[class_idx] = 32;
-        if (g_cap[class_idx] > 512) g_cap[class_idx] = 512;
+        if (g_cap[class_idx] > 4096) g_cap[class_idx] = 4096;  // Increased limit for Hot_2048

        // Ensure power of 2
        size_t pow2 = 32;