# ACE-Pool Architecture Investigation Report ## Executive Summary **Root Cause Found:** Bridge classes (40KB, 52KB) are disabled at initialization due to conflicting code paths. The Pool init code expects them from Policy, but Policy disabled them in Phase 6.21. **Fix is trivial: Don't overwrite hardcoded Bridge classes with 0.** ## Part 1: Root Cause Analysis ### The Bug Chain 1. **Policy Phase 6.21 Change:** ```c // core/hakmem_policy.c:53-55 pol->mid_dyn1_bytes = 0; // Disabled (Bridge classes now hardcoded) pol->mid_dyn2_bytes = 0; // Disabled ``` 2. **Pool Init Overwrites Bridge Classes:** ```c // core/box/pool_init_api.inc.h:9-17 if (pol && pol->mid_dyn1_bytes >= POOL_MIN_SIZE && pol->mid_dyn1_bytes <= POOL_MAX_SIZE) { g_class_sizes[5] = pol->mid_dyn1_bytes; } else { g_class_sizes[5] = 0; // ← BRIDGE CLASS 5 (40KB) DISABLED! } ``` 3. **Pool Has Bridge Classes Hardcoded:** ```c // core/hakmem_pool.c:810-817 static size_t g_class_sizes[POOL_NUM_CLASSES] = { POOL_CLASS_2KB, // 2 KB POOL_CLASS_4KB, // 4 KB POOL_CLASS_8KB, // 8 KB POOL_CLASS_16KB, // 16 KB POOL_CLASS_32KB, // 32 KB POOL_CLASS_40KB, // 40 KB (Bridge class 0) ← GETS OVERWRITTEN TO 0! POOL_CLASS_52KB // 52 KB (Bridge class 1) ← GETS OVERWRITTEN TO 0! }; ``` 4. **Result: 33KB Allocation Fails:** - ACE rounds 33KB → 40KB (Bridge class 5) - Pool lookup: `g_class_sizes[5] = 0` → class disabled - Pool returns NULL - Fallback to mmap (1.03M ops/s instead of 50-80M) ### Why Pre-allocation Code Never Runs ```c // core/box/pool_init_api.inc.h:101-106 if (g_class_sizes[5] != 0) { // ← FALSE because g_class_sizes[5] = 0 // Pre-allocation code NEVER executes for (int s = 0; s < prewarm_pages && s < POOL_NUM_SHARDS; s++) { refill_freelist(5, s); } } ``` The pre-allocation code is correct but never runs because the Bridge classes are disabled! ## Part 2: Boxing Analysis ### Current Architecture Problems **1. Conflicting Ownership:** - Policy thinks it owns Bridge class configuration (DYN1/DYN2) - Pool has Bridge classes hardcoded - Pool init overwrites hardcoded values with Policy's 0s **2. Invisible Failures:** - No error when Bridge classes get disabled - No warning when Pool returns NULL - No trace showing why allocation failed **3. Mixed Responsibilities:** - `pool_init_api.inc.h` does both init AND policy configuration - ACE does rounding AND allocation AND fallback - No clear separation of concerns ### Data Flow Tracing ``` 33KB allocation request → hkm_ace_alloc() → round_to_mid_class(33KB, wmax=1.33) → 40KB ✓ → hak_pool_try_alloc(40KB) → hak_pool_init() (pthread_once) → hak_pool_get_class_index(40KB) → Check g_class_sizes[5] = 0 ✗ → Return -1 (not found) → Pool returns NULL → ACE tries Large rounding (fails) → Fallback to mmap ✗ ``` ### Missing Boxes 1. **Configuration Validator Box:** - Should verify Bridge classes are enabled - Should warn if Policy conflicts with Pool 2. **Allocation Router Box:** - Central decision point for allocation strategy - Clear logging of routing decisions 3. **Pool Health Check Box:** - Verify all classes are properly configured - Check if pre-allocation succeeded ## Part 3: Central Checker Box Design ### Proposed Architecture ```c // core/box/ace_pool_checker.h typedef struct { bool ace_enabled; bool pool_initialized; bool bridge_classes_enabled; bool pool_has_pages[POOL_NUM_CLASSES]; size_t class_sizes[POOL_NUM_CLASSES]; const char* last_error; } AcePoolHealthStatus; // Central validation point AcePoolHealthStatus* hak_ace_pool_health_check(void); // Routing with validation void* hak_ace_pool_route_alloc(size_t size, uintptr_t site_id) { // 1. Check health AcePoolHealthStatus* health = hak_ace_pool_health_check(); if (!health->ace_enabled) { LOG("ACE disabled, fallback to system"); return NULL; } // 2. Validate Pool if (!health->pool_initialized) { LOG("Pool not initialized!"); hak_pool_init(); health = hak_ace_pool_health_check(); // Re-check } // 3. Check Bridge classes size_t rounded = round_to_mid_class(size, 1.33, NULL); int class_idx = hak_pool_get_class_index(rounded); if (class_idx >= 0 && health->class_sizes[class_idx] == 0) { LOG("ERROR: Class %d disabled (size=%zu)", class_idx, rounded); return NULL; } // 4. Try allocation with logging LOG("Routing %zu → class %d (size=%zu)", size, class_idx, rounded); void* ptr = hak_pool_try_alloc(rounded, site_id); if (!ptr) { LOG("Pool allocation failed for class %d", class_idx); } return ptr; } ``` ### Integration Points 1. **Replace silent failures with logged checker:** ```c // Before: Silent failure void* p = hak_pool_try_alloc(r, site_id); // After: Checked and logged void* p = hak_ace_pool_route_alloc(size, site_id); ``` 2. **Add health check command:** ```c // In main() or benchmark if (getenv("HAKMEM_HEALTH_CHECK")) { AcePoolHealthStatus* h = hak_ace_pool_health_check(); fprintf(stderr, "ACE: %s\n", h->ace_enabled ? "ON" : "OFF"); fprintf(stderr, "Pool: %s\n", h->pool_initialized ? "OK" : "NOT INIT"); for (int i = 0; i < POOL_NUM_CLASSES; i++) { fprintf(stderr, "Class %d: %zu KB %s\n", i, h->class_sizes[i]/1024, h->class_sizes[i] ? "ENABLED" : "DISABLED"); } } ``` ## Part 4: Immediate Fix ### Quick Fix #1: Don't Overwrite Bridge Classes ```diff // core/box/pool_init_api.inc.h:9-17 - if (pol && pol->mid_dyn1_bytes >= POOL_MIN_SIZE && pol->mid_dyn1_bytes <= POOL_MAX_SIZE) { - g_class_sizes[5] = pol->mid_dyn1_bytes; - } else { - g_class_sizes[5] = 0; - } + // Phase 6.21: Bridge classes are hardcoded, don't overwrite with 0 + if (pol && pol->mid_dyn1_bytes >= POOL_MIN_SIZE && pol->mid_dyn1_bytes <= POOL_MAX_SIZE) { + g_class_sizes[5] = pol->mid_dyn1_bytes; // Only override if Policy provides valid value + } + // Otherwise keep the hardcoded POOL_CLASS_40KB ``` ### Quick Fix #2: Force Bridge Classes (Simpler) ```diff // core/box/pool_init_api.inc.h:7 (in hak_pool_init_impl) static void hak_pool_init_impl(void) { const FrozenPolicy* pol = hkm_policy_get(); + + // Phase 6.21 CRITICAL FIX: Bridge classes are hardcoded, not from Policy + // DO NOT overwrite them with 0! + /* if (pol && pol->mid_dyn1_bytes >= POOL_MIN_SIZE && pol->mid_dyn1_bytes <= POOL_MAX_SIZE) { g_class_sizes[5] = pol->mid_dyn1_bytes; } else { g_class_sizes[5] = 0; } if (pol && pol->mid_dyn2_bytes >= POOL_MIN_SIZE && pol->mid_dyn2_bytes <= POOL_MAX_SIZE) { g_class_sizes[6] = pol->mid_dyn2_bytes; } else { g_class_sizes[6] = 0; } + */ + // Bridge classes stay as initialized in g_class_sizes (40KB, 52KB) ``` ### Quick Fix #3: Add Debug Logging (For Verification) ```diff // core/box/pool_init_api.inc.h:84-95 g_pool.initialized = 1; HAKMEM_LOG("[Pool] Initialized (L2 Hybrid Pool)\n"); + HAKMEM_LOG("[Pool] Class sizes after init:\n"); + for (int i = 0; i < POOL_NUM_CLASSES; i++) { + HAKMEM_LOG(" Class %d: %zu KB %s\n", + i, g_class_sizes[i]/1024, + g_class_sizes[i] ? "ENABLED" : "DISABLED"); + } ``` ## Recommended Actions ### Immediate (NOW): 1. Apply Quick Fix #2 (comment out the overwrite code) 2. Rebuild with debug logging 3. Test: `HAKMEM_ACE_ENABLED=1 ./bench_mid_large_mt_hakmem` 4. Expected: 50-80M ops/s (vs current 1.03M) ### Short-term (1-2 days): 1. Implement Central Checker Box 2. Add health check API 3. Add allocation routing logs ### Long-term (1 week): 1. Refactor Pool/Policy bridge class ownership 2. Separate init from configuration 3. Add comprehensive boxing tests ## Architecture Diagram ``` Current (BROKEN): ================ [Policy] ↓ mid_dyn1=0, mid_dyn2=0 [Pool Init] ↓ Overwrites g_class_sizes[5]=0, [6]=0 [Pool] ↓ Bridge classes DISABLED [ACE Alloc] ↓ 33KB → 40KB (class 5) [Pool Lookup] ↓ g_class_sizes[5]=0 → FAIL [mmap fallback] ← 1.03M ops/s Proposed (FIXED): ================ [Policy] ↓ (Bridge config ignored) [Pool Init] ↓ Keep hardcoded g_class_sizes [Central Checker] ← NEW ↓ Validate all components [Pool] ↓ Bridge classes ENABLED (40KB, 52KB) [ACE Alloc] ↓ 33KB → 40KB (class 5) [Pool Lookup] ↓ g_class_sizes[5]=40KB → SUCCESS [Pool Pages] ← 50-80M ops/s ``` ## Test Commands ```bash # Before fix (current broken state) make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 HAKMEM_ACE_ENABLED=1 ./bench_mid_large_mt_hakmem # Result: 1.03M ops/s (mmap fallback) # After fix (comment out lines 9-17) vim core/box/pool_init_api.inc.h # Comment out lines 9-17 make clean && make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 HAKMEM_ACE_ENABLED=1 ./bench_mid_large_mt_hakmem # Expected: 50-80M ops/s (Pool working!) # With debug verification HAKMEM_LOG_LEVEL=3 HAKMEM_ACE_ENABLED=1 ./bench_mid_large_mt_hakmem 2>&1 | grep "Class 5" # Should show: "Class 5: 40 KB ENABLED" ``` ## Conclusion **The bug is trivial:** Pool init code overwrites hardcoded Bridge classes with 0 because Policy disabled them in Phase 6.21. **The fix is trivial:** Don't overwrite them. Comment out 9 lines. **The impact is massive:** 50-80x performance improvement (1.03M → 50-80M ops/s). **The lesson:** When two components (Policy and Pool) both think they own configuration, silent failures occur. Need better boxing with clear ownership boundaries and validation points.