# Pool TLS + Learning Layer Integration Design

## Executive Summary

**Core Insight** (user's words): "Learn only when growing the cache; push the data and let another thread handle it."

- Learning happens ONLY during refill (cold path)
- Hot path stays ultra-fast (5-6 cycles)
- Learning data is pushed asynchronously to a background thread

## 1. Box Architecture

### Clean Separation Design

```
┌──────────────────────────────────────────────────────────────
│                    HOT PATH (5-6 cycles)
├──────────────────────────────────────────────────────────────
│ Box 1: TLS Freelist (pool_tls.c)
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│ • NO learning code
│ • NO metrics collection
│ • Just pop/push freelists
│
│ API:
│   - pool_alloc_fast(class)      → void*
│   - pool_free_fast(ptr, class)  → void
│   - pool_needs_refill(class)    → bool
└────────────────────────┬─────────────────────────────────────
                         │ Refill trigger (miss)
                         ↓
┌──────────────────────────────────────────────────────────────
│                   COLD PATH (100+ cycles)
├──────────────────────────────────────────────────────────────
│ Box 2: Refill Engine (pool_refill.c)
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│ • Batch allocate from backend
│ • Write headers (if enabled)
│ • Collect metrics HERE
│ • Push learning event (async)
│
│ API:
│   - pool_refill(class)               → int
│   - pool_get_refill_count(class)     → int
│   - pool_notify_refill(class, count) → void
└────────────────────────┬─────────────────────────────────────
                         │ Learning event (async)
                         ↓
┌──────────────────────────────────────────────────────────────
│                BACKGROUND (separate thread)
├──────────────────────────────────────────────────────────────
│ Box 3: ACE Learning (ace_learning.c)
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│ • Consume learning events
│ • Update policies (UCB1, etc.)
│ • Tune refill counts
│ • NO direct interaction with hot path
│
│ API:
│   - ace_push_event(event)   → void
│   - ace_get_policy(class)   → policy
│   - ace_background_thread() → void
└──────────────────────────────────────────────────────────────
```

### Key Design Principles

1. **NO learning code in hot path** - Box 1 is pristine
2. **Metrics collection in refill only** - Box 2 handles all instrumentation
3. **Async learning** - Box 3 runs independently
4. **One-way data flow** - events flow down, policies flow up via shared memory

## 2. Learning Event Design

### Event Structure

```c
typedef struct {
    uint32_t thread_id;      // Which thread triggered the refill
    uint16_t class_idx;      // Size class
    uint16_t refill_count;   // How many blocks were refilled
    uint64_t timestamp_ns;   // When the refill occurred
    uint32_t miss_streak;    // Consecutive misses before the refill
    uint32_t tls_occupancy;  // How full the cache was before the refill
    uint32_t flags;          // FIRST_REFILL, FORCED_DRAIN, etc.
} RefillEvent;
```

### Collection Points (in pool_refill.c ONLY)

```c
static inline void pool_refill_internal(int class_idx) {
    // 1. Capture pre-refill state
    uint32_t old_count   = g_tls_pool_count[class_idx];
    uint32_t miss_streak = g_tls_miss_streak[class_idx];

    // 2. Get refill policy (from ACE or default)
    int refill_count = pool_get_refill_count(class_idx);

    // 3. Batch allocate
    void* chain = backend_batch_alloc(class_idx, refill_count);
    if (!chain) return;  // Refill failed; caller falls back

    // 4. Install in TLS
    pool_splice_chain(class_idx, chain, refill_count);

    // 5. Create learning event (AFTER successful refill)
    RefillEvent event = {
        .thread_id     = pool_get_thread_id(),
        .class_idx     = class_idx,
        .refill_count  = refill_count,
        .timestamp_ns  = pool_get_timestamp(),
        .miss_streak   = miss_streak,
        .tls_occupancy = old_count,
        .flags         = (old_count == 0) ? FIRST_REFILL : 0
    };

    // 6. Push to learning queue (non-blocking)
    ace_push_event(&event);

    // 7. Reset counters
    g_tls_miss_streak[class_idx] = 0;
}
```

## 3. Thread-Crossing Strategy

### Chosen Design: Lock-Free MPSC Queue

**Rationale**: Minimal overhead, no blocking, simple to implement

```c
// Lock-free multi-producer single-consumer queue
typedef struct {
    _Atomic(RefillEvent*) events[LEARNING_QUEUE_SIZE];
    _Atomic uint64_t write_pos;
    uint64_t read_pos;       // Only accessed by the consumer
    _Atomic uint64_t drops;  // Track dropped events (Contract A)
} LearningQueue;

// Producer side (worker threads during refill)
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Contract A: check for a full queue and drop if necessary
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);
        return;  // DROP - never block!
    }

    // Copy event to pre-allocated slot (Contract C: fixed ring buffer)
    RefillEvent* dest = &g_event_pool[slot];
    memcpy(dest, event, sizeof(RefillEvent));

    // Publish (release semantics)
    atomic_store_explicit(&g_queue.events[slot], dest, memory_order_release);
}

// Consumer side (learning thread)
void ace_consume_events(void) {
    while (running) {
        uint64_t slot = g_queue.read_pos % LEARNING_QUEUE_SIZE;
        RefillEvent* event = atomic_load_explicit(
            &g_queue.events[slot], memory_order_acquire);

        if (event) {
            ace_process_event(event);
            atomic_store(&g_queue.events[slot], NULL);
            g_queue.read_pos++;
        } else {
            // No events; sleep briefly
            usleep(1000);  // 1 ms
        }
    }
}
```

### Why Not TLS Accumulation?

- ❌ Requires synchronization points (when to flush?)
- ❌ Delays learning (batch vs. streaming)
- ❌ More complex state management
- ✅ MPSC queue is simpler and proven

## 4. Interface Contracts (Critical Specifications)

### Contract A: Queue Overflow Policy

**Rule**: `ace_push_event()` MUST NEVER BLOCK

**Implementation**:
- If the queue is full: DROP the event silently
- Rationale: hot-path correctness > complete telemetry
- Monitoring: track the drop count for diagnostics

**Code**:

```c
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Check if the slot is still occupied (queue full)
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);  // Track drops
        return;  // DROP - don't wait!
    }

    // Safe to write - copy into the ring buffer
    memcpy(&g_event_pool[slot], event, sizeof(RefillEvent));
    atomic_store_explicit(&g_queue.events[slot], &g_event_pool[slot],
                          memory_order_release);
}
```

### Contract B: Policy Scope Limitation

**Rule**: ACE can ONLY adjust "next refill parameters"

**Allowed**:
- ✅ Refill count for the next miss
- ✅ Drain threshold adjustments
- ✅ Pre-warming at thread init

**FORBIDDEN**:
- ❌ Immediate cache flush
- ❌ Blocking operations
- ❌ Direct TLS manipulation

**Implementation**:
- ACE writes to: `g_refill_policies[class_idx]` (atomic)
- Box 2 reads via: `pool_get_refill_count(class_idx)` (atomic load, no blocking)

**Code**:

```c
// ACE side - writes policy
void ace_update_policy(int class_idx, uint32_t new_count) {
    // ONLY writes to the policy table
    atomic_store(&g_refill_policies[class_idx], new_count);
}

// Box 2 side - reads policy (never blocks)
uint32_t pool_get_refill_count(int class_idx) {
    uint32_t count = atomic_load(&g_refill_policies[class_idx]);
    return count ?
           count : DEFAULT_REFILL_COUNT[class_idx];
}
```

### Contract C: Memory Ownership Model

**Rule**: Clear ownership to prevent use-after-free

**Model**: Fixed Ring Buffer (No Allocations)

```c
// Pre-allocated event pool
static RefillEvent g_event_pool[LEARNING_QUEUE_SIZE];

// Producer (Box 2)
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Check for a full queue (Contract A)
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);
        return;
    }

    // Copy into the fixed slot (no malloc!)
    memcpy(&g_event_pool[slot], event, sizeof(RefillEvent));

    // Publish the pointer
    atomic_store(&g_queue.events[slot], &g_event_pool[slot]);
}

// Consumer (Box 3)
void ace_consume_events(void) {
    uint64_t slot = g_queue.read_pos % LEARNING_QUEUE_SIZE;
    RefillEvent* event = atomic_load(&g_queue.events[slot]);
    if (event) {
        // Process (event lifetime guaranteed by the ring buffer)
        ace_process_event(event);

        // Release the slot
        atomic_store(&g_queue.events[slot], NULL);
        g_queue.read_pos++;
    }
}
```

**Ownership Rules**:
- Producer: COPIES into the ring buffer (the stack event is safe to discard)
- Consumer: READS from the ring buffer (no ownership transfer)
- Ring buffer: OWNS all events (never freed, just reused)

### Contract D: API Boundary Enforcement

**Box 1 API (pool_tls.h)**:

```c
// PUBLIC: Hot path functions
void* pool_alloc(size_t size);
void  pool_free(void* ptr);

// INTERNAL: Only called by Box 2
void  pool_install_chain(int class_idx, void* chain, int count);
```

**Box 2 API (pool_refill.h)**:

```c
// INTERNAL: Refill implementation
void* pool_refill_and_alloc(int class_idx);

// Box 2 is the ONLY box that calls ace_push_event()
// (Enforced by making this wrapper static in pool_refill.c)
static void notify_learning(RefillEvent* event) {
    ace_push_event(event);
}
```

**Box 3 API (ace_learning.h)**:

```c
// POLICY OUTPUT: Box 2 reads these
uint32_t ace_get_refill_count(int class_idx);

// EVENT INPUT: Only Box 2 calls this
void ace_push_event(RefillEvent* event);

// Box 3 NEVER calls Box 1 functions directly
// Box 3 NEVER blocks Box 1 or Box 2
```

**Enforcement Strategy**:
- Separate .c files (no cross-includes except public headers)
- Static functions where appropriate
- Code review checklist in POOL_IMPLEMENTATION_CHECKLIST.md

## 5. Progressive Implementation Plan

### Phase 1: Ultra-Simple TLS (2 days)

**Goal**: 40-60M ops/s without any learning

**Files**:
- `core/pool_tls.c` - TLS freelist implementation
- `core/pool_tls.h` - Public API

**Code** (pool_tls.c):

```c
// Global TLS state (per-thread)
__thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

// Fixed refill counts for Phase 1
static const uint32_t DEFAULT_REFILL_COUNT[POOL_SIZE_CLASSES] = {
    64, 64, 48, 48, 32, 32, 24, 24,  // Small (high frequency)
    16, 16, 12, 12,  8,  8,  8,  8   // Large (lower frequency)
};

static void* pool_refill_and_alloc(int class_idx);  // Cold-path helper (below)

// Ultra-fast allocation (5-6 cycles)
void* pool_alloc_fast(size_t size) {
    int   class_idx = pool_size_to_class(size);
    void* head      = g_tls_pool_head[class_idx];

    if (LIKELY(head)) {
        // Pop from freelist
        g_tls_pool_head[class_idx] = *(void**)head;
        g_tls_pool_count[class_idx]--;

        // Write header if enabled
#if POOL_USE_HEADERS
        *((uint8_t*)head - 1) = POOL_MAGIC | class_idx;
#endif
        return head;
    }

    // Cold path: refill
    return pool_refill_and_alloc(class_idx);
}

// Simple refill (no learning)
static void* pool_refill_and_alloc(int class_idx) {
    int count = DEFAULT_REFILL_COUNT[class_idx];

    // Batch allocate from SuperSlab
    void* chain = ss_batch_carve(class_idx, count);
    if (!chain) return NULL;

    // Pop the first block for the caller
    void* ret = chain;
    chain = *(void**)chain;
    count--;

    // Install the rest in TLS
    g_tls_pool_head[class_idx]  = chain;
    g_tls_pool_count[class_idx] = count;

#if POOL_USE_HEADERS
    *((uint8_t*)ret - 1) = POOL_MAGIC | class_idx;
#endif
    return ret;
}

// Ultra-fast free (5-6 cycles)
void pool_free_fast(void* ptr) {
#if POOL_USE_HEADERS
    uint8_t header = *((uint8_t*)ptr - 1);
    if ((header & 0xF0) != POOL_MAGIC) {
        // Not ours; route elsewhere
        return pool_free_slow(ptr);
    }
    int class_idx = header & 0x0F;
#else
    int class_idx = pool_ptr_to_class(ptr);  // Lookup
#endif

    // Push to freelist
    *(void**)ptr = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;

    // Optional: drain if too full
    if (UNLIKELY(g_tls_pool_count[class_idx] > MAX_TLS_CACHE)) {
        pool_drain_excess(class_idx);
    }
}
```

**Acceptance Criteria**:
- ✅ Larson: 2.5M+ ops/s
- ✅ bench_random_mixed: 40M+ ops/s
- ✅ No learning code present
- ✅ Clean, readable, < 200 LOC

### Phase 2: Metrics Collection (1 day)

**Goal**: Add instrumentation without slowing the hot path

**Changes**:

```c
// Add to TLS state
__thread uint64_t g_tls_pool_hits[POOL_SIZE_CLASSES];
__thread uint64_t g_tls_pool_misses[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_miss_streak[POOL_SIZE_CLASSES];

// In pool_alloc_fast() - hot path
if (LIKELY(head)) {
#ifdef POOL_COLLECT_METRICS
    g_tls_pool_hits[class_idx]++;  // Single increment
#endif
    // ... existing code
}

// In pool_refill_and_alloc() - cold path
g_tls_pool_misses[class_idx]++;
g_tls_miss_streak[class_idx]++;

// New stats function
void pool_print_stats(void) {
    for (int i = 0; i < POOL_SIZE_CLASSES; i++) {
        uint64_t total = g_tls_pool_hits[i] + g_tls_pool_misses[i];
        double hit_rate = total ? (double)g_tls_pool_hits[i] / total : 0.0;
        printf("Class %d: %.2f%% hit rate (%llu hits, %llu misses)\n",
               i, hit_rate * 100,
               (unsigned long long)g_tls_pool_hits[i],
               (unsigned long long)g_tls_pool_misses[i]);
    }
}
```

**Acceptance Criteria**:
- ✅ < 2% performance regression
- ✅ Accurate hit rate reporting
- ✅ Identify hot classes for Phase 3

### Phase 3: Learning Integration (2 days)

**Goal**: Connect ACE learning without touching the hot path

**New Files**:
- `core/ace_learning.c` - Learning thread
- `core/ace_policy.h` - Policy structures

**Integration Points**:

1. **Startup**: Launch the learning thread

```c
void hakmem_init(void) {
    // ... existing init
    ace_start_learning_thread();
}
```

2. **Refill**: Push events

```c
// In pool_refill_and_alloc() - add after a successful refill
RefillEvent event = { /* ... */ };
ace_push_event(&event);  // Non-blocking
```

3. **Policy Application**: Read tuned values

```c
// Replace DEFAULT_REFILL_COUNT with a dynamic lookup
int count = ace_get_refill_count(class_idx);
// Falls back to the default if no policy exists yet
```

**ACE Learning Algorithm** (ace_learning.c):

```c
// UCB1 statistics for exploration vs. exploitation
typedef struct {
    double   total_reward;  // Sum of rewards
    uint64_t play_count;    // Times tried
    uint32_t refill_size;   // Current policy
} ClassPolicy;

static ClassPolicy g_policies[POOL_SIZE_CLASSES];

void ace_process_event(RefillEvent* e) {
    ClassPolicy* p = &g_policies[e->class_idx];

    // Compute reward (inverse of the miss streak)
    double reward = 1.0 / (1.0 + e->miss_streak);

    // Update UCB1 statistics
    p->total_reward += reward;
    p->play_count++;

    // Adjust refill size based on occupancy
    if (e->tls_occupancy < 4) {
        // Cache was nearly empty; increase the refill (×1.5, capped)
        p->refill_size = MIN(p->refill_size + p->refill_size / 2, 256);
    } else if (e->tls_occupancy > 32) {
        // Cache had plenty; decrease the refill (×0.75, floored)
        p->refill_size = MAX(p->refill_size * 3 / 4, 16);
    }

    // Publish the new policy (atomic write)
    atomic_store(&g_refill_policies[e->class_idx], p->refill_size);
}
```

**Acceptance Criteria**:
- ✅ No regression in hot-path performance
- ✅ Refill sizes adapt to the workload
- ✅ Background thread < 1% CPU

## 6. API Specifications

### Box 1: TLS Freelist API

```c
// Public API (pool_tls.h)
void* pool_alloc(size_t size);
void  pool_free(void* ptr);
void  pool_thread_init(void);
void  pool_thread_cleanup(void);

// Internal API (for the refill box)
int  pool_needs_refill(int class_idx);
void pool_install_chain(int class_idx, void* chain, int count);
```

### Box 2: Refill API

```c
// Internal API (pool_refill.h)
void* pool_refill_and_alloc(int class_idx);
int   pool_get_refill_count(int class_idx);
void  pool_drain_excess(int class_idx);

// Backend interface
void* backend_batch_alloc(int class_idx, int count);
void  backend_batch_free(int class_idx, void* chain, int count);
```

### Box 3: Learning API

```c
// Public API (ace_learning.h)
void ace_start_learning_thread(void);
void ace_stop_learning_thread(void);
void ace_push_event(RefillEvent* event);

// Policy API
uint32_t ace_get_refill_count(int class_idx);
void     ace_reset_policies(void);
void     ace_print_stats(void);
```

## 7. Diagnostics and Monitoring

### Queue Health Metrics

```c
typedef struct {
    uint64_t total_events;      // Total events pushed
    uint64_t dropped_events;    // Events dropped due to a full queue
    uint64_t processed_events;  // Events successfully processed
    double   drop_rate;         // drops / total_events
} QueueMetrics;

void ace_compute_metrics(QueueMetrics* m) {
    m->total_events     = atomic_load(&g_queue.write_pos);
    m->dropped_events   = atomic_load(&g_queue.drops);
    m->processed_events = g_queue.read_pos;
    m->drop_rate        = m->total_events
                            ? (double)m->dropped_events / m->total_events
                            : 0.0;

    // Alert if the drop rate exceeds the threshold
    if (m->drop_rate > 0.01) {  // > 1% drops
        fprintf(stderr,
                "WARNING: Queue drop rate %.2f%% - increase LEARNING_QUEUE_SIZE\n",
                m->drop_rate * 100);
    }
}
```

**Target Metrics**:
- Drop rate < 0.1%: normal operation
- Drop rate > 1%: increase LEARNING_QUEUE_SIZE
- Drop rate > 5%: critical - learning degraded

### Policy Stability Metrics

```c
typedef struct {
    uint32_t refill_count;
    uint32_t change_count;    // Times the policy changed
    uint64_t last_change_ns;  // When
                              // last changed
    double   variance;        // Refill count variance
} PolicyMetrics;

void ace_track_policy_stability(int class_idx) {
    static PolicyMetrics metrics[POOL_SIZE_CLASSES];
    PolicyMetrics* m = &metrics[class_idx];

    uint32_t new_count = atomic_load(&g_refill_policies[class_idx]);
    if (new_count != m->refill_count) {
        uint64_t now = get_timestamp_ns();

        // Detect oscillation: measure against the PREVIOUS change time
        // before overwriting it
        uint64_t change_interval = now - m->last_change_ns;
        if (m->change_count > 0 && change_interval < 1000000000) {  // < 1 second
            fprintf(stderr, "WARNING: Class %d policy oscillating\n", class_idx);
        }

        m->refill_count   = new_count;
        m->change_count++;
        m->last_change_ns = now;
    }
}
```

### Debug Flags

```c
// Contract validation
#ifdef POOL_DEBUG_CONTRACTS
#define VALIDATE_CONTRACT_A() do { \
    if (is_blocking_detected()) { \
        panic("Contract A violation: ace_push_event blocked!"); \
    } \
} while (0)

#define VALIDATE_CONTRACT_B() do { \
    if (ace_performed_immediate_action()) { \
        panic("Contract B violation: ACE performed immediate action!"); \
    } \
} while (0)

#define VALIDATE_CONTRACT_D() do { \
    if (box3_called_box1_function()) { \
        panic("Contract D violation: Box3 called Box1 directly!"); \
    } \
} while (0)
#else
#define VALIDATE_CONTRACT_A()
#define VALIDATE_CONTRACT_B()
#define VALIDATE_CONTRACT_D()
#endif

// Drop tracking
#ifdef POOL_DEBUG_DROPS
#define LOG_DROP() fprintf(stderr, "DROP: tid=%lu class=%d @ %s:%d\n", \
                           (unsigned long)pthread_self(), class_idx, \
                           __FILE__, __LINE__)
#else
#define LOG_DROP()
#endif
```

### Runtime Diagnostics Command

```c
void pool_print_diagnostics(void) {
    printf("=== Pool TLS Learning Diagnostics ===\n");

    // Queue health
    QueueMetrics qm;
    ace_compute_metrics(&qm);
    printf("Queue: %llu events, %llu drops (%.2f%%)\n",
           (unsigned long long)qm.total_events,
           (unsigned long long)qm.dropped_events,
           qm.drop_rate * 100);

    // Per-class stats
    for (int i = 0; i < POOL_SIZE_CLASSES; i++) {
        uint32_t refill_count = atomic_load(&g_refill_policies[i]);
        double hit_rate = (double)g_tls_pool_hits[i] /
                          (g_tls_pool_hits[i] + g_tls_pool_misses[i]);
        printf("Class %2d: refill=%3u hit_rate=%.1f%%\n",
               i, refill_count, hit_rate *
               100);
    }

    // Contract violations (if any)
#ifdef POOL_DEBUG_CONTRACTS
    printf("Contract violations: A=%u B=%u C=%u D=%u\n",
           g_contract_a_violations, g_contract_b_violations,
           g_contract_c_violations, g_contract_d_violations);
#endif
}
```

## 8. Risk Analysis

### Performance Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Hot path regression | Feature flags for each phase | Low |
| Learning overhead | Async queue, no blocking | Low |
| Cache line bouncing | TLS data, no sharing | Low |
| Memory overhead | Bounded TLS cache sizes | Medium |

### Complexity Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Box boundary violation | Contract D: separate files, enforced APIs | Medium |
| Deadlock in learning | Contract A: lock-free queue, drops allowed | Low |
| Policy instability | Contract B: only next-refill adjustments | Medium |
| Debug complexity | Per-box debug flags | Low |

### Correctness Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Header corruption | Magic byte validation | Low |
| Double-free | Clear TLS ownership | Low |
| Memory leak | Drain on thread exit | Medium |
| Refill failure | Fallback to system malloc | Low |
| Use-after-free | Contract C: fixed ring buffer, no malloc | Low |

### Contract-Specific Risks

| Risk | Contract | Mitigation |
|------|----------|------------|
| Queue overflow causing blocking | A | Drop events, monitor drop rate |
| Learning thread blocking refill | B | Policy reads are atomic only |
| Event lifetime issues | C | Fixed ring buffer, memcpy semantics |
| Cross-box coupling | D | Separate compilation units, code review |

## 9. Testing Strategy

### Phase 1 Tests
- Unit: TLS alloc/free correctness
- Perf: 40-60M ops/s target
- Stress: multi-threaded consistency

### Phase 2 Tests
- Metrics accuracy validation
- Performance regression < 2%
- Hit rate analysis

### Phase 3 Tests
- Learning convergence
- Policy stability
- Background thread CPU < 1%

### Contract Validation Tests

#### Contract A: Non-Blocking Queue

```c
void test_queue_never_blocks(void) {
    // Overfill the queue (twice its capacity)
    for (int i = 0; i < LEARNING_QUEUE_SIZE * 2; i++) {
        RefillEvent event = { .class_idx = i % 16 };

        uint64_t start = get_cycles();
        ace_push_event(&event);
        uint64_t elapsed = get_cycles() - start;

        // Should never take more than 1000 cycles
        assert(elapsed < 1000);
    }

    // Verify drops were tracked
    assert(atomic_load(&g_queue.drops) > 0);
}
```

#### Contract B: Policy Scope

```c
void test_policy_scope_limited(void) {
    // ACE should only write to the policy table
    uint32_t old_count = g_tls_pool_count[0];

    // Trigger a learning update
    ace_update_policy(0, 128);

    // Verify TLS state unchanged
    assert(g_tls_pool_count[0] == old_count);

    // Verify the policy updated
    assert(ace_get_refill_count(0) == 128);
}
```

#### Contract C: Memory Safety

```c
void test_no_use_after_free(void) {
    RefillEvent stack_event = { .class_idx = 5 };

    // Push the event (should be copied)
    ace_push_event(&stack_event);

    // Modify the stack event
    stack_event.class_idx = 10;

    // Consume the event - should see the original value
    ace_consume_single_event();
    assert(last_processed_class == 5);
}
```

#### Contract D: API Boundaries

```c
// This should fail to compile if the boundaries are correct
#ifdef TEST_CONTRACT_D_VIOLATION
// In ace_learning.c
void bad_function(void) {
    // Should not compile - Box 3 can't call Box 1
    pool_alloc(128);  // VIOLATION!
}
#endif
```

## 10. Implementation Timeline

```
Day 1-2: Phase 1 (Simple TLS)
  - pool_tls.c implementation
  - Basic testing
  - Performance validation

Day 3: Phase 2 (Metrics)
  - Add counters
  - Stats reporting
  - Identify hot classes

Day 4-5: Phase 3 (Learning)
  - ace_learning.c
  - MPSC queue
  - UCB1 algorithm

Day 6: Integration Testing
  - Full system test
  - Performance validation
  - Documentation
```

## Conclusion

This design achieves:

- ✅ **Clean separation**: three distinct boxes with clear boundaries
- ✅ **Simple hot path**: 5-6 cycles for alloc/free
- ✅ **Smart learning**: UCB1 in the background, no hot-path impact
- ✅ **Progressive enhancement**: each phase independently valuable
- ✅ **User's vision**: "Learn only when growing the cache; push the data and let another thread handle it"

**Critical Specifications Now Formalized**:

- ✅ **Contract A**: Queue overflow policy - DROP events, never block
- ✅ **Contract B**: Policy scope limitation - only adjust the next refill
- ✅ **Contract C**: Memory ownership model - fixed ring buffer, no UAF
- ✅ **Contract D**: API boundary enforcement - separate files, no cross-calls

The key insight is that learning during refill (cold path) keeps the hot path pristine while still enabling intelligent adaptation. The lock-free MPSC queue with an explicit drop policy ensures zero contention between workers and the learning thread.

**Ready for Implementation**: all ambiguities resolved, contracts specified, testing defined.