# Pool TLS + Learning Layer Integration Design

## Executive Summary

**Core Insight**: "Learn only when the cache is being grown; push the data and let another thread handle it."

- Learning happens ONLY during refill (cold path)
- Hot path stays ultra-fast (5-6 cycles)
- Learning data is pushed asynchronously to a background thread

## 1. Box Architecture

### Clean Separation Design

```
┌──────────────────────────────────────────────────────────────┐
│  HOT PATH (5-6 cycles)
├──────────────────────────────────────────────────────────────┤
│  Box 1: TLS Freelist (pool_tls.c)
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│  • NO learning code
│  • NO metrics collection
│  • Just pop/push freelists
│
│  API:
│   - pool_alloc_fast(class)           → void*
│   - pool_free_fast(ptr, class)       → void
│   - pool_needs_refill(class)         → bool
└────────────────────────┬─────────────────────────────────────┘
                         │ Refill trigger (miss)
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  COLD PATH (100+ cycles)
├──────────────────────────────────────────────────────────────┤
│  Box 2: Refill Engine (pool_refill.c)
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│  • Batch allocate from backend
│  • Write headers (if enabled)
│  • Collect metrics HERE
│  • Push learning event (async)
│
│  API:
│   - pool_refill(class)               → int
│   - pool_get_refill_count(class)     → int
│   - pool_notify_refill(class, count) → void
└────────────────────────┬─────────────────────────────────────┘
                         │ Learning event (async)
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  BACKGROUND (separate thread)
├──────────────────────────────────────────────────────────────┤
│  Box 3: ACE Learning (ace_learning.c)
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│  • Consume learning events
│  • Update policies (UCB1, etc.)
│  • Tune refill counts
│  • NO direct interaction with hot path
│
│  API:
│   - ace_push_event(event)            → void
│   - ace_get_policy(class)            → policy
│   - ace_background_thread()          → void
└──────────────────────────────────────────────────────────────┘
```

### Key Design Principles

1. **NO learning code in hot path** - Box 1 is pristine
2. **Metrics collection in refill only** - Box 2 handles all instrumentation
3. **Async learning** - Box 3 runs independently
4. **One-way data flow** - Events flow down; policies flow back up via shared memory (see the sketch below)
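
A minimal sketch of the upward path (policies flowing back via shared memory): Box 3 publishes per-class refill counts into an atomic table that Box 2 reads on every refill. The table name `g_refill_policies` follows the later sections; the exact types and helper names here are assumptions, not a fixed ABI.

```c
// Sketch only: shared policy table (assumed layout, not a fixed ABI).
// Box 3 (learning thread) is the only writer; Box 2 (refill) only reads.
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE_CLASSES 16

static _Atomic uint32_t g_refill_policies[POOL_SIZE_CLASSES];  // 0 = "no policy yet"

// Box 3: publish a new refill count for one class (release so the value is visible).
static inline void policy_publish(int class_idx, uint32_t refill_count) {
    atomic_store_explicit(&g_refill_policies[class_idx], refill_count,
                          memory_order_release);
}

// Box 2: read the current policy; fall back to a default when unset.
static inline uint32_t policy_read(int class_idx, uint32_t default_count) {
    uint32_t c = atomic_load_explicit(&g_refill_policies[class_idx],
                                      memory_order_acquire);
    return c ? c : default_count;
}
```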

## 2. Learning Event Design

### Event Structure

```c
typedef struct {
    uint32_t thread_id;       // Which thread triggered the refill
    uint16_t class_idx;       // Size class
    uint16_t refill_count;    // How many blocks were refilled
    uint64_t timestamp_ns;    // When the refill occurred
    uint32_t miss_streak;     // Consecutive misses before this refill
    uint32_t tls_occupancy;   // How full the cache was before the refill
    uint32_t flags;           // FIRST_REFILL, FORCED_DRAIN, etc.
} RefillEvent;
```

### Collection Points (in pool_refill.c ONLY)

```c
static inline void pool_refill_internal(int class_idx) {
    // 1. Capture pre-refill state
    uint32_t old_count   = g_tls_pool_count[class_idx];
    uint32_t miss_streak = g_tls_miss_streak[class_idx];

    // 2. Get refill policy (from ACE or default)
    int refill_count = pool_get_refill_count(class_idx);

    // 3. Batch allocate
    void* chain = backend_batch_alloc(class_idx, refill_count);
    if (!chain) return;  // Backend exhausted - nothing to install or learn from

    // 4. Install in TLS
    pool_splice_chain(class_idx, chain, refill_count);

    // 5. Create learning event (AFTER successful refill)
    RefillEvent event = {
        .thread_id     = pool_get_thread_id(),
        .class_idx     = class_idx,
        .refill_count  = refill_count,
        .timestamp_ns  = pool_get_timestamp(),
        .miss_streak   = miss_streak,
        .tls_occupancy = old_count,
        .flags         = (old_count == 0) ? FIRST_REFILL : 0
    };

    // 6. Push to learning queue (non-blocking)
    ace_push_event(&event);

    // 7. Reset counters
    g_tls_miss_streak[class_idx] = 0;
}
```

## 3. Thread-Crossing Strategy

### Chosen Design: Lock-Free MPSC Queue

**Rationale**: Minimal overhead, no blocking, simple to implement.

```c
// Lock-free multi-producer single-consumer queue
// Requires: <stdatomic.h>, <stdbool.h>, <string.h>, <unistd.h>
typedef struct {
    _Atomic(RefillEvent*) events[LEARNING_QUEUE_SIZE];
    _Atomic uint64_t write_pos;
    uint64_t         read_pos;   // Only accessed by the consumer
    _Atomic uint64_t drops;      // Track dropped events (Contract A)
} LearningQueue;

static LearningQueue g_queue;
static RefillEvent   g_event_pool[LEARNING_QUEUE_SIZE];  // Contract C: fixed ring buffer
static _Atomic bool  g_learning_running = true;

// Producer side (worker threads, during refill)
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Contract A: if the slot is still occupied, the queue is full - drop.
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);
        return;  // DROP - never block!
    }

    // Copy event into the pre-allocated slot (Contract C: fixed ring buffer)
    RefillEvent* dest = &g_event_pool[slot];
    memcpy(dest, event, sizeof(RefillEvent));

    // Publish (release semantics)
    atomic_store_explicit(&g_queue.events[slot], dest, memory_order_release);
}

// Consumer side (learning thread)
void ace_consume_events(void) {
    while (atomic_load(&g_learning_running)) {
        uint64_t slot = g_queue.read_pos % LEARNING_QUEUE_SIZE;
        RefillEvent* event = atomic_load_explicit(
            &g_queue.events[slot], memory_order_acquire);

        if (event) {
            ace_process_event(event);
            atomic_store(&g_queue.events[slot], NULL);
            g_queue.read_pos++;
        } else {
            // No events; sleep briefly
            usleep(1000);  // 1 ms
        }
    }
}
```
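
One small design note, shown as a hedged sketch: if `LEARNING_QUEUE_SIZE` is kept a power of two, the `%` in the producer and consumer reduces to a mask, avoiding a division on the refill path. The concrete value and macro names below are assumptions, not part of the design above.

```c
// Sketch: a power-of-two queue size lets the slot computation be a mask.
#include <stdint.h>

#define LEARNING_QUEUE_SIZE 1024u                  // assumed value; must be a power of two
#define LEARNING_QUEUE_MASK (LEARNING_QUEUE_SIZE - 1u)

_Static_assert((LEARNING_QUEUE_SIZE & (LEARNING_QUEUE_SIZE - 1u)) == 0,
               "LEARNING_QUEUE_SIZE must be a power of two");

static inline uint64_t queue_slot(uint64_t pos) {
    return pos & LEARNING_QUEUE_MASK;              // same result as pos % LEARNING_QUEUE_SIZE
}
```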

### Why Not TLS Accumulation?

- ❌ Requires synchronization points (when to flush?)
- ❌ Delays learning (batch vs. streaming)
- ❌ More complex state management
- ✅ The MPSC queue is simpler and proven

## 4. Interface Contracts (Critical Specifications)

### Contract A: Queue Overflow Policy

**Rule**: `ace_push_event()` MUST NEVER BLOCK.

**Implementation**:
- If the queue is full: DROP the event silently
- Rationale: hot-path correctness > complete telemetry
- Monitoring: track the drop count for diagnostics

**Code**:
```c
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Check whether the slot is still occupied (queue full)
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);  // Track drops
        return;  // DROP - don't wait!
    }

    // Safe to write - copy into the ring buffer
    memcpy(&g_event_pool[slot], event, sizeof(RefillEvent));
    atomic_store_explicit(&g_queue.events[slot], &g_event_pool[slot],
                          memory_order_release);
}
```

### Contract B: Policy Scope Limitation

**Rule**: ACE can ONLY adjust "next refill" parameters.

**Allowed**:
- ✅ Refill count for the next miss
- ✅ Drain threshold adjustments
- ✅ Pre-warming at thread init (see the sketch at the end of this contract)

**FORBIDDEN**:
- ❌ Immediate cache flush
- ❌ Blocking operations
- ❌ Direct TLS manipulation

**Implementation**:
- ACE writes to: `g_refill_policies[class_idx]` (atomic)
- Box 2 reads from: `ace_get_refill_count(class_idx)` (atomic load, no blocking)

**Code**:
```c
// ACE side - writes the policy
void ace_update_policy(int class_idx, uint32_t new_count) {
    // ONLY writes to the policy table
    atomic_store(&g_refill_policies[class_idx], new_count);
}

// Box 2 side - reads the policy (never blocks)
uint32_t pool_get_refill_count(int class_idx) {
    uint32_t count = atomic_load(&g_refill_policies[class_idx]);
    return count ? count : DEFAULT_REFILL_COUNT[class_idx];
}
```
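
Pre-warming at thread init is listed as allowed but has no example elsewhere in this document. A minimal sketch under stated assumptions: a hypothetical `pool_prewarm()` called from `pool_thread_init()`, reusing the refill path and the policy table; the function name and the "hot classes" list are illustrative only.

```c
// Hypothetical sketch: pre-warm the TLS cache for a few hot classes at thread start.
// Assumes pool_get_refill_count(), backend_batch_alloc(), and pool_install_chain()
// from the surrounding design; the class list below is an assumption.
static const int k_prewarm_classes[] = { 0, 1, 2, 3 };   // assumed hot small classes

static void pool_prewarm(void) {
    for (unsigned i = 0; i < sizeof(k_prewarm_classes) / sizeof(k_prewarm_classes[0]); i++) {
        int class_idx = k_prewarm_classes[i];
        int count     = pool_get_refill_count(class_idx);   // policy-driven, never blocks
        void* chain   = backend_batch_alloc(class_idx, count);
        if (chain) {
            pool_install_chain(class_idx, chain, count);     // Box 2 -> Box 1 internal API
        }
    }
}

// Called once per thread, outside the hot path:
//   void pool_thread_init(void) { /* ... */ pool_prewarm(); }
```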

### Contract C: Memory Ownership Model

**Rule**: Clear ownership to prevent use-after-free.

**Model**: Fixed Ring Buffer (No Allocations)

```c
// Pre-allocated event pool
static RefillEvent g_event_pool[LEARNING_QUEUE_SIZE];

// Producer (Box 2)
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Check for a full queue (Contract A)
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);
        return;
    }

    // Copy into the fixed slot (no malloc!)
    memcpy(&g_event_pool[slot], event, sizeof(RefillEvent));

    // Publish the pointer
    atomic_store(&g_queue.events[slot], &g_event_pool[slot]);
}

// Consumer (Box 3)
void ace_consume_events(void) {
    uint64_t slot = g_queue.read_pos % LEARNING_QUEUE_SIZE;
    RefillEvent* event = atomic_load(&g_queue.events[slot]);

    if (event) {
        // Process (event lifetime is guaranteed by the ring buffer)
        ace_process_event(event);

        // Release the slot and advance
        atomic_store(&g_queue.events[slot], NULL);
        g_queue.read_pos++;
    }
}
```

**Ownership Rules**:
- Producer: COPIES into the ring buffer (the stack event is safe to discard)
- Consumer: READS from the ring buffer (no ownership transfer)
- Ring buffer: OWNS all events (never freed, just reused)

### Contract D: API Boundary Enforcement

**Box 1 API (pool_tls.h)**:
```c
// PUBLIC: Hot-path functions
void* pool_alloc(size_t size);
void  pool_free(void* ptr);

// INTERNAL: Only called by Box 2
void pool_install_chain(int class_idx, void* chain, int count);
```

**Box 2 API (pool_refill.h)**:
```c
// INTERNAL: Refill implementation
void* pool_refill_and_alloc(int class_idx);

// Box 2 is the ONLY box that calls ace_push_event()
// (enforced by keeping this wrapper static in pool_refill.c)
static void notify_learning(RefillEvent* event) {
    ace_push_event(event);
}
```

**Box 3 API (ace_learning.h)**:
```c
// POLICY OUTPUT: Box 2 reads these
uint32_t ace_get_refill_count(int class_idx);

// EVENT INPUT: Only Box 2 calls this
void ace_push_event(RefillEvent* event);

// Box 3 NEVER calls Box 1 functions directly
// Box 3 NEVER blocks Box 1 or Box 2
```

**Enforcement Strategy**:
- Separate .c files (no cross-includes except public headers)
- Static functions where appropriate
- Code review checklist in POOL_IMPLEMENTATION_CHECKLIST.md
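
A lightweight compile-time check can back up the review checklist. The sketch below is an assumption layered on conventional include guards (it presumes `ace_learning.h` uses a guard named `ACE_LEARNING_H`; adjust to the real guard): Box 1's translation unit refuses to build if Box 3's header ever leaks into it.

```c
/* pool_tls.c - hedged sketch of a cheap boundary check.
 * Assumes ace_learning.h defines the include guard ACE_LEARNING_H. */
#include "pool_tls.h"

#ifdef ACE_LEARNING_H
#error "Contract D violation: pool_tls.c must not include ace_learning.h"
#endif

/* ... hot-path implementation follows ... */
```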

## 5. Progressive Implementation Plan

### Phase 1: Ultra-Simple TLS (2 days)

**Goal**: 40-60M ops/s without any learning.

**Files**:
- `core/pool_tls.c` - TLS freelist implementation
- `core/pool_tls.h` - Public API

**Code** (pool_tls.c):
```c
// Global TLS state (per-thread)
__thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

// Fixed refill counts for Phase 1
static const uint32_t DEFAULT_REFILL_COUNT[POOL_SIZE_CLASSES] = {
    64, 64, 48, 48, 32, 32, 24, 24,   // Small (high frequency)
    16, 16, 12, 12,  8,  8,  8,  8    // Large (lower frequency)
};

static void* pool_refill_and_alloc(int class_idx);  // cold path, defined below

// Ultra-fast allocation (5-6 cycles)
void* pool_alloc_fast(size_t size) {
    int class_idx = pool_size_to_class(size);
    void* head = g_tls_pool_head[class_idx];

    if (LIKELY(head)) {
        // Pop from the freelist
        g_tls_pool_head[class_idx] = *(void**)head;
        g_tls_pool_count[class_idx]--;

        // Write header if enabled
#if POOL_USE_HEADERS
        *((uint8_t*)head - 1) = POOL_MAGIC | class_idx;
#endif

        return head;
    }

    // Cold path: refill
    return pool_refill_and_alloc(class_idx);
}

// Simple refill (no learning)
static void* pool_refill_and_alloc(int class_idx) {
    int count = DEFAULT_REFILL_COUNT[class_idx];

    // Batch allocate from the SuperSlab backend
    void* chain = ss_batch_carve(class_idx, count);
    if (!chain) return NULL;

    // Pop the first block for the caller
    void* ret = chain;
    chain = *(void**)chain;
    count--;

    // Install the rest in TLS
    g_tls_pool_head[class_idx]  = chain;
    g_tls_pool_count[class_idx] = count;

#if POOL_USE_HEADERS
    *((uint8_t*)ret - 1) = POOL_MAGIC | class_idx;
#endif

    return ret;
}

// Ultra-fast free (5-6 cycles)
void pool_free_fast(void* ptr) {
#if POOL_USE_HEADERS
    uint8_t header = *((uint8_t*)ptr - 1);
    if ((header & 0xF0) != POOL_MAGIC) {
        // Not ours, route elsewhere
        pool_free_slow(ptr);
        return;
    }
    int class_idx = header & 0x0F;
#else
    int class_idx = pool_ptr_to_class(ptr);  // Lookup
#endif

    // Push onto the freelist
    *(void**)ptr = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;

    // Optional: drain if too full
    if (UNLIKELY(g_tls_pool_count[class_idx] > MAX_TLS_CACHE)) {
        pool_drain_excess(class_idx);
    }
}
```
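
The block above leans on a few helpers that are not defined in this document. A minimal sketch of plausible definitions follows; the power-of-two class spacing and the macro forms are assumptions, not the final size-class map.

```c
// Hedged sketch of helpers assumed by pool_tls.c (illustrative only).
#include <stddef.h>
#include <stdint.h>

#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

#define POOL_SIZE_CLASSES 16
#define POOL_MIN_BLOCK    16u          // assumed: class 0 serves 16-byte blocks

// Assumed mapping: class i serves blocks of (POOL_MIN_BLOCK << i) bytes.
static inline int pool_size_to_class(size_t size) {
    int class_idx = 0;
    size_t cap = POOL_MIN_BLOCK;
    while (cap < size && class_idx < POOL_SIZE_CLASSES - 1) {
        cap <<= 1;
        class_idx++;
    }
    return class_idx;
}
```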

**Acceptance Criteria**:
- ✅ Larson: 2.5M+ ops/s
- ✅ bench_random_mixed: 40M+ ops/s
- ✅ No learning code present
- ✅ Clean, readable, < 200 LOC
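
For the ops/s targets, a rough single-thread measurement sketch is shown below. It is not the Larson or bench_random_mixed harness; the fixed 64-byte size and iteration count are placeholders.

```c
// Hedged sketch: rough single-thread throughput check for the Phase 1 fast paths.
// One iteration = one alloc + one free; not a substitute for the real benchmarks.
#include <stdio.h>
#include <time.h>
#include "pool_tls.h"   // assumed Phase 1 header

#define ITERS 10000000ull

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (unsigned long long i = 0; i < ITERS; i++) {
        void* p = pool_alloc_fast(64);   // assumed 64-byte hot class
        if (p) pool_free_fast(p);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1fM ops/s\n", (2.0 * ITERS / secs) / 1e6);  // alloc and free counted separately
    return 0;
}
```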

### Phase 2: Metrics Collection (1 day)

**Goal**: Add instrumentation without slowing the hot path.

**Changes**:
```c
// Add to TLS state
__thread uint64_t g_tls_pool_hits[POOL_SIZE_CLASSES];
__thread uint64_t g_tls_pool_misses[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_miss_streak[POOL_SIZE_CLASSES];

// In pool_alloc_fast() - hot path
if (LIKELY(head)) {
#ifdef POOL_COLLECT_METRICS
    g_tls_pool_hits[class_idx]++;   // Single increment
#endif
    // ... existing code
}

// In pool_refill_and_alloc() - cold path
g_tls_pool_misses[class_idx]++;
g_tls_miss_streak[class_idx]++;

// New stats function
void pool_print_stats(void) {
    for (int i = 0; i < POOL_SIZE_CLASSES; i++) {
        uint64_t total = g_tls_pool_hits[i] + g_tls_pool_misses[i];
        if (total == 0) continue;
        double hit_rate = (double)g_tls_pool_hits[i] / total;
        printf("Class %d: %.2f%% hit rate, %llu misses\n",
               i, hit_rate * 100, (unsigned long long)g_tls_pool_misses[i]);
    }
}
```

**Acceptance Criteria**:
- ✅ < 2% performance regression
- ✅ Accurate hit-rate reporting
- ✅ Identify hot classes for Phase 3

### Phase 3: Learning Integration (2 days)

**Goal**: Connect ACE learning without touching the hot path.

**New Files**:
- `core/ace_learning.c` - Learning thread
- `core/ace_policy.h` - Policy structures

**Integration Points**:

1. **Startup**: Launch the learning thread
   ```c
   void hakmem_init(void) {
       // ... existing init
       ace_start_learning_thread();
   }
   ```

2. **Refill**: Push events
   ```c
   // In pool_refill_and_alloc() - add after a successful refill
   RefillEvent event = { /* ... */ };
   ace_push_event(&event);  // Non-blocking
   ```

3. **Policy Application**: Read tuned values
   ```c
   // Replace DEFAULT_REFILL_COUNT with a dynamic lookup
   int count = ace_get_refill_count(class_idx);
   // Falls back to the default if no policy exists yet
   ```

**ACE Learning Algorithm** (ace_learning.c):
```c
// UCB1-style statistics for exploration vs. exploitation
typedef struct {
    double   total_reward;   // Sum of rewards
    uint64_t play_count;     // Times tried
    uint32_t refill_size;    // Current policy
} ClassPolicy;

static ClassPolicy g_policies[POOL_SIZE_CLASSES];

void ace_process_event(RefillEvent* e) {
    ClassPolicy* p = &g_policies[e->class_idx];

    // Compute reward (inverse of the miss streak)
    double reward = 1.0 / (1.0 + e->miss_streak);

    // Update UCB1 statistics
    p->total_reward += reward;
    p->play_count++;

    // Adjust refill size based on occupancy
    if (e->tls_occupancy < 4) {
        // Cache was nearly empty - increase the refill
        p->refill_size = MIN(p->refill_size * 1.5, 256);
    } else if (e->tls_occupancy > 32) {
        // Cache had plenty - decrease the refill
        p->refill_size = MAX(p->refill_size * 0.75, 16);
    }

    // Publish the new policy (atomic write)
    atomic_store(&g_refill_policies[e->class_idx], p->refill_size);
}
```
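
The structure above stores UCB1 statistics but does not show the selection step. A hedged sketch of how UCB1 could pick among a small set of candidate refill sizes follows; the per-arm bookkeeping and candidate list are assumptions layered on top of `ClassPolicy`, not the final design.

```c
// Sketch: UCB1 selection over a few candidate refill sizes for one class.
// Illustrative only; not the contents of ace_learning.c.
#include <math.h>
#include <stdint.h>

#define NUM_ARMS 4
static const uint32_t k_arm_sizes[NUM_ARMS] = { 16, 32, 64, 128 };

typedef struct {
    double   reward[NUM_ARMS];   // summed reward per candidate size
    uint64_t plays[NUM_ARMS];    // times each candidate was used
    uint64_t total_plays;
} ArmStats;

static uint32_t ucb1_pick_refill(const ArmStats* s) {
    int best = 0;
    double best_score = -1.0;
    for (int a = 0; a < NUM_ARMS; a++) {
        if (s->plays[a] == 0) return k_arm_sizes[a];   // try each arm once first
        double mean  = s->reward[a] / (double)s->plays[a];
        double bonus = sqrt(2.0 * log((double)s->total_plays) / (double)s->plays[a]);
        double score = mean + bonus;                   // UCB1: exploitation + exploration
        if (score > best_score) { best_score = score; best = a; }
    }
    return k_arm_sizes[best];
}
```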

**Acceptance Criteria**:
- ✅ No regression in hot-path performance
- ✅ Refill sizes adapt to the workload
- ✅ Background thread < 1% CPU

## 6. API Specifications

### Box 1: TLS Freelist API

```c
// Public API (pool_tls.h)
void* pool_alloc(size_t size);
void  pool_free(void* ptr);
void  pool_thread_init(void);
void  pool_thread_cleanup(void);

// Internal API (for the refill box)
int  pool_needs_refill(int class_idx);
void pool_install_chain(int class_idx, void* chain, int count);
```

### Box 2: Refill API

```c
// Internal API (pool_refill.h)
void* pool_refill_and_alloc(int class_idx);
int   pool_get_refill_count(int class_idx);
void  pool_drain_excess(int class_idx);

// Backend interface
void* backend_batch_alloc(int class_idx, int count);
void  backend_batch_free(int class_idx, void* chain, int count);
```

### Box 3: Learning API

```c
// Public API (ace_learning.h)
void ace_start_learning_thread(void);
void ace_stop_learning_thread(void);
void ace_push_event(RefillEvent* event);

// Policy API
uint32_t ace_get_refill_count(int class_idx);
void     ace_reset_policies(void);
void     ace_print_stats(void);
```

## 7. Diagnostics and Monitoring

### Queue Health Metrics

```c
typedef struct {
    uint64_t total_events;       // Total events pushed
    uint64_t dropped_events;     // Events dropped due to a full queue
    uint64_t processed_events;   // Events successfully processed
    double   drop_rate;          // drops / total_events
} QueueMetrics;

void ace_compute_metrics(QueueMetrics* m) {
    m->total_events     = atomic_load(&g_queue.write_pos);
    m->dropped_events   = atomic_load(&g_queue.drops);
    m->processed_events = g_queue.read_pos;
    m->drop_rate = m->total_events
                 ? (double)m->dropped_events / m->total_events : 0.0;

    // Alert if the drop rate exceeds the threshold
    if (m->drop_rate > 0.01) {   // > 1% drops
        fprintf(stderr,
                "WARNING: Queue drop rate %.2f%% - increase LEARNING_QUEUE_SIZE\n",
                m->drop_rate * 100);
    }
}
```

**Target Metrics**:
- Drop rate < 0.1%: normal operation
- Drop rate > 1%: increase LEARNING_QUEUE_SIZE
- Drop rate > 5%: critical - learning is degraded

### Policy Stability Metrics

```c
typedef struct {
    uint32_t refill_count;
    uint32_t change_count;     // Times the policy changed
    uint64_t last_change_ns;   // When it last changed
    double   variance;         // Refill-count variance
} PolicyMetrics;

void ace_track_policy_stability(int class_idx) {
    static PolicyMetrics metrics[POOL_SIZE_CLASSES];
    PolicyMetrics* m = &metrics[class_idx];

    uint32_t new_count = atomic_load(&g_refill_policies[class_idx]);
    if (new_count != m->refill_count) {
        uint64_t now = get_timestamp_ns();

        // Detect oscillation: compare against the PREVIOUS change time
        if (m->change_count > 0 && now - m->last_change_ns < 1000000000ull) {  // < 1 second
            fprintf(stderr, "WARNING: Class %d policy oscillating\n", class_idx);
        }

        m->refill_count   = new_count;
        m->change_count++;
        m->last_change_ns = now;
    }
}
```

### Debug Flags

```c
// Contract validation
#ifdef POOL_DEBUG_CONTRACTS
#define VALIDATE_CONTRACT_A() do { \
    if (is_blocking_detected()) { \
        panic("Contract A violation: ace_push_event blocked!"); \
    } \
} while (0)

#define VALIDATE_CONTRACT_B() do { \
    if (ace_performed_immediate_action()) { \
        panic("Contract B violation: ACE performed an immediate action!"); \
    } \
} while (0)

#define VALIDATE_CONTRACT_D() do { \
    if (box3_called_box1_function()) { \
        panic("Contract D violation: Box 3 called Box 1 directly!"); \
    } \
} while (0)
#else
#define VALIDATE_CONTRACT_A()
#define VALIDATE_CONTRACT_B()
#define VALIDATE_CONTRACT_D()
#endif

// Drop tracking
#ifdef POOL_DEBUG_DROPS
#define LOG_DROP() fprintf(stderr, "DROP: tid=%lu class=%d @ %s:%d\n", \
                           (unsigned long)pthread_self(), class_idx, __FILE__, __LINE__)
#else
#define LOG_DROP()
#endif
```

### Runtime Diagnostics Command

```c
void pool_print_diagnostics(void) {
    printf("=== Pool TLS Learning Diagnostics ===\n");

    // Queue health
    QueueMetrics qm;
    ace_compute_metrics(&qm);
    printf("Queue: %llu events, %llu drops (%.2f%%)\n",
           (unsigned long long)qm.total_events,
           (unsigned long long)qm.dropped_events,
           qm.drop_rate * 100);

    // Per-class stats
    for (int i = 0; i < POOL_SIZE_CLASSES; i++) {
        uint32_t refill_count = atomic_load(&g_refill_policies[i]);
        uint64_t total = g_tls_pool_hits[i] + g_tls_pool_misses[i];
        double hit_rate = total ? (double)g_tls_pool_hits[i] / total : 0.0;

        printf("Class %2d: refill=%3u hit_rate=%.1f%%\n",
               i, refill_count, hit_rate * 100);
    }

    // Contract violations (if any)
#ifdef POOL_DEBUG_CONTRACTS
    printf("Contract violations: A=%u B=%u C=%u D=%u\n",
           g_contract_a_violations, g_contract_b_violations,
           g_contract_c_violations, g_contract_d_violations);
#endif
}
```

## 8. Risk Analysis

### Performance Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Hot path regression | Feature flags for each phase | Low |
| Learning overhead | Async queue, no blocking | Low |
| Cache line bouncing | TLS data, no sharing | Low |
| Memory overhead | Bounded TLS cache sizes | Medium |

### Complexity Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Box boundary violation | Contract D: separate files, enforced APIs | Medium |
| Deadlock in learning | Contract A: lock-free queue, drops allowed | Low |
| Policy instability | Contract B: only next-refill adjustments | Medium |
| Debug complexity | Per-box debug flags | Low |

### Correctness Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Header corruption | Magic-byte validation | Low |
| Double-free | Clear TLS ownership | Low |
| Memory leak | Drain on thread exit | Medium |
| Refill failure | Fallback to system malloc | Low |
| Use-after-free | Contract C: fixed ring buffer, no malloc | Low |

### Contract-Specific Risks

| Risk | Contract | Mitigation |
|------|----------|------------|
| Queue overflow causing blocking | A | Drop events, monitor drop rate |
| Learning thread blocking refill | B | Policy reads are atomic only |
| Event lifetime issues | C | Fixed ring buffer, memcpy semantics |
| Cross-box coupling | D | Separate compilation units, code review |

## 9. Testing Strategy

### Phase 1 Tests
- Unit: TLS alloc/free correctness (see the sketch below)
- Perf: 40-60M ops/s target
- Stress: multi-threaded consistency
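
A minimal correctness sketch for the Phase 1 freelist, under stated assumptions (the Phase 1 function names and a 64-byte request are used; it only checks LIFO reuse on one class, not the full matrix):

```c
// Hedged sketch: basic alloc/free round-trip test for the Phase 1 TLS freelist.
#include <assert.h>
#include <string.h>
#include "pool_tls.h"   // assumed Phase 1 header

static void test_alloc_free_roundtrip(void) {
    pool_thread_init();

    // Allocate, write, free, then allocate again: the TLS freelist is LIFO,
    // so the same block should come back for the same size class.
    void* a = pool_alloc_fast(64);
    assert(a != NULL);
    memset(a, 0xAB, 64);

    pool_free_fast(a);
    void* b = pool_alloc_fast(64);
    assert(b == a);   // LIFO reuse from the per-thread freelist

    pool_free_fast(b);
    pool_thread_cleanup();
}
```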

### Phase 2 Tests
- Metrics accuracy validation
- Performance regression < 2%
- Hit-rate analysis

### Phase 3 Tests
- Learning convergence
- Policy stability
- Background thread CPU < 1%

### Contract Validation Tests

#### Contract A: Non-Blocking Queue
```c
void test_queue_never_blocks(void) {
    // Fill the queue completely (and then some)
    for (int i = 0; i < LEARNING_QUEUE_SIZE * 2; i++) {
        RefillEvent event = { .class_idx = i % 16 };
        uint64_t start = get_cycles();
        ace_push_event(&event);
        uint64_t elapsed = get_cycles() - start;

        // Should never take more than 1000 cycles
        assert(elapsed < 1000);
    }

    // Verify drops were tracked
    assert(atomic_load(&g_queue.drops) > 0);
}
```

#### Contract B: Policy Scope
```c
void test_policy_scope_limited(void) {
    // ACE should only write to the policy table
    uint32_t old_count = g_tls_pool_count[0];

    // Trigger a learning update
    ace_update_policy(0, 128);

    // Verify the TLS state is unchanged
    assert(g_tls_pool_count[0] == old_count);

    // Verify the policy was updated
    assert(ace_get_refill_count(0) == 128);
}
```

#### Contract C: Memory Safety
```c
void test_no_use_after_free(void) {
    RefillEvent stack_event = { .class_idx = 5 };

    // Push the event (it must be copied)
    ace_push_event(&stack_event);

    // Modify the stack event
    stack_event.class_idx = 10;

    // Consume the event - the consumer should see the original value
    ace_consume_single_event();
    assert(last_processed_class == 5);
}
```

#### Contract D: API Boundaries
```c
// This should fail to compile if the boundaries are enforced correctly
#ifdef TEST_CONTRACT_D_VIOLATION
// In ace_learning.c
void bad_function(void) {
    // Should not compile - Box 3 can't call Box 1
    pool_alloc(128);   // VIOLATION!
}
#endif
```

## 10. Implementation Timeline

```
Day 1-2: Phase 1 (Simple TLS)
  - pool_tls.c implementation
  - Basic testing
  - Performance validation

Day 3: Phase 2 (Metrics)
  - Add counters
  - Stats reporting
  - Identify hot classes

Day 4-5: Phase 3 (Learning)
  - ace_learning.c
  - MPSC queue
  - UCB1 algorithm

Day 6: Integration Testing
  - Full system test
  - Performance validation
  - Documentation
```

## Conclusion

This design achieves:
- ✅ **Clean separation**: Three distinct boxes with clear boundaries
- ✅ **Simple hot path**: 5-6 cycles for alloc/free
- ✅ **Smart learning**: UCB1 in the background, no hot-path impact
- ✅ **Progressive enhancement**: Each phase is independently valuable
- ✅ **User's vision**: "Learn only when the cache is being grown; push the data and let another thread handle it"

**Critical Specifications Now Formalized**:
- ✅ **Contract A**: Queue overflow policy - DROP events, never block
- ✅ **Contract B**: Policy scope limitation - only adjust the next refill
- ✅ **Contract C**: Memory ownership model - fixed ring buffer, no UAF
- ✅ **Contract D**: API boundary enforcement - separate files, no cross-calls

The key insight is that learning during refill (the cold path) keeps the hot path pristine while still enabling intelligent adaptation. The lock-free MPSC queue with an explicit drop policy ensures zero contention between workers and the learning thread.

**Ready for Implementation**: All ambiguities resolved, contracts specified, testing defined.