# Pool TLS + Learning Layer Integration Design

## Executive Summary

**Core Insight**: "Learn only when the cache is being grown; push the data and let another thread handle it."

- Learning happens ONLY during refill (cold path)
- Hot path stays ultra-fast (5-6 cycles)
- Learning data is pushed asynchronously to a background thread

## 1. Box Architecture

### Clean Separation Design

```
┌──────────────────────────────────────────────────────────────┐
│  HOT PATH (5-6 cycles)
├──────────────────────────────────────────────────────────────┤
│  Box 1: TLS Freelist (pool_tls.c)
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│  • NO learning code
│  • NO metrics collection
│  • Just pop/push freelists
│
│  API:
│   - pool_alloc_fast(class)           → void*
│   - pool_free_fast(ptr, class)       → void
│   - pool_needs_refill(class)         → bool
└────────────────────────┬─────────────────────────────────────┘
                         │ Refill trigger (miss)
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  COLD PATH (100+ cycles)
├──────────────────────────────────────────────────────────────┤
│  Box 2: Refill Engine (pool_refill.c)
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│  • Batch allocate from backend
│  • Write headers (if enabled)
│  • Collect metrics HERE
│  • Push learning event (async)
│
│  API:
│   - pool_refill(class)               → int
│   - pool_get_refill_count(class)     → int
│   - pool_notify_refill(class, count) → void
└────────────────────────┬─────────────────────────────────────┘
                         │ Learning event (async)
                         ↓
┌──────────────────────────────────────────────────────────────┐
│  BACKGROUND (separate thread)
├──────────────────────────────────────────────────────────────┤
│  Box 3: ACE Learning (ace_learning.c)
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
│  • Consume learning events
│  • Update policies (UCB1, etc.)
│  • Tune refill counts
│  • NO direct interaction with hot path
│
│  API:
│   - ace_push_event(event)            → void
│   - ace_get_policy(class)            → policy
│   - ace_background_thread()          → void
└──────────────────────────────────────────────────────────────┘
```

### Key Design Principles

1. **NO learning code in hot path** - Box 1 is pristine
2. **Metrics collection in refill only** - Box 2 handles all instrumentation
3. **Async learning** - Box 3 runs independently
4. **One-way data flow** - Events flow down; policies flow back up via shared memory (see the sketch below)
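
A minimal sketch of the upward path (policies flowing back via shared memory): Box 3 publishes per-class refill counts into an atomic table that Box 2 reads on every refill. The table name `g_refill_policies` follows the later sections; the exact types and helper names here are assumptions, not a fixed ABI.

```c
// Sketch only: shared policy table (assumed layout, not a fixed ABI).
// Box 3 (learning thread) is the only writer; Box 2 (refill) only reads.
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE_CLASSES 16

static _Atomic uint32_t g_refill_policies[POOL_SIZE_CLASSES];  // 0 = "no policy yet"

// Box 3: publish a new refill count for one class (release so the value is visible).
static inline void policy_publish(int class_idx, uint32_t refill_count) {
    atomic_store_explicit(&g_refill_policies[class_idx], refill_count,
                          memory_order_release);
}

// Box 2: read the current policy; fall back to a default when unset.
static inline uint32_t policy_read(int class_idx, uint32_t default_count) {
    uint32_t c = atomic_load_explicit(&g_refill_policies[class_idx],
                                      memory_order_acquire);
    return c ? c : default_count;
}
```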

## 2. Learning Event Design

### Event Structure

```c
typedef struct {
    uint32_t thread_id;       // Which thread triggered the refill
    uint16_t class_idx;       // Size class
    uint16_t refill_count;    // How many blocks were refilled
    uint64_t timestamp_ns;    // When the refill occurred
    uint32_t miss_streak;     // Consecutive misses before this refill
    uint32_t tls_occupancy;   // How full the cache was before the refill
    uint32_t flags;           // FIRST_REFILL, FORCED_DRAIN, etc.
} RefillEvent;
```

### Collection Points (in pool_refill.c ONLY)

```c
static inline void pool_refill_internal(int class_idx) {
    // 1. Capture pre-refill state
    uint32_t old_count   = g_tls_pool_count[class_idx];
    uint32_t miss_streak = g_tls_miss_streak[class_idx];

    // 2. Get refill policy (from ACE or default)
    int refill_count = pool_get_refill_count(class_idx);

    // 3. Batch allocate
    void* chain = backend_batch_alloc(class_idx, refill_count);
    if (!chain) return;  // Backend exhausted - nothing to install or learn from

    // 4. Install in TLS
    pool_splice_chain(class_idx, chain, refill_count);

    // 5. Create learning event (AFTER successful refill)
    RefillEvent event = {
        .thread_id     = pool_get_thread_id(),
        .class_idx     = class_idx,
        .refill_count  = refill_count,
        .timestamp_ns  = pool_get_timestamp(),
        .miss_streak   = miss_streak,
        .tls_occupancy = old_count,
        .flags         = (old_count == 0) ? FIRST_REFILL : 0
    };

    // 6. Push to learning queue (non-blocking)
    ace_push_event(&event);

    // 7. Reset counters
    g_tls_miss_streak[class_idx] = 0;
}
```

## 3. Thread-Crossing Strategy

### Chosen Design: Lock-Free MPSC Queue

**Rationale**: Minimal overhead, no blocking, simple to implement.

```c
// Lock-free multi-producer single-consumer queue
// Requires: <stdatomic.h>, <stdbool.h>, <string.h>, <unistd.h>
typedef struct {
    _Atomic(RefillEvent*) events[LEARNING_QUEUE_SIZE];
    _Atomic uint64_t write_pos;
    uint64_t         read_pos;   // Only accessed by the consumer
    _Atomic uint64_t drops;      // Track dropped events (Contract A)
} LearningQueue;

static LearningQueue g_queue;
static RefillEvent   g_event_pool[LEARNING_QUEUE_SIZE];  // Contract C: fixed ring buffer
static _Atomic bool  g_learning_running = true;

// Producer side (worker threads, during refill)
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Contract A: if the slot is still occupied, the queue is full - drop.
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);
        return;  // DROP - never block!
    }

    // Copy event into the pre-allocated slot (Contract C: fixed ring buffer)
    RefillEvent* dest = &g_event_pool[slot];
    memcpy(dest, event, sizeof(RefillEvent));

    // Publish (release semantics)
    atomic_store_explicit(&g_queue.events[slot], dest, memory_order_release);
}

// Consumer side (learning thread)
void ace_consume_events(void) {
    while (atomic_load(&g_learning_running)) {
        uint64_t slot = g_queue.read_pos % LEARNING_QUEUE_SIZE;
        RefillEvent* event = atomic_load_explicit(
            &g_queue.events[slot], memory_order_acquire);

        if (event) {
            ace_process_event(event);
            atomic_store(&g_queue.events[slot], NULL);
            g_queue.read_pos++;
        } else {
            // No events; sleep briefly
            usleep(1000);  // 1 ms
        }
    }
}
```
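
One small design note, shown as a hedged sketch: if `LEARNING_QUEUE_SIZE` is kept a power of two, the `%` in the producer and consumer reduces to a mask, avoiding a division on the refill path. The concrete value and macro names below are assumptions, not part of the design above.

```c
// Sketch: a power-of-two queue size lets the slot computation be a mask.
#include <stdint.h>

#define LEARNING_QUEUE_SIZE 1024u                  // assumed value; must be a power of two
#define LEARNING_QUEUE_MASK (LEARNING_QUEUE_SIZE - 1u)

_Static_assert((LEARNING_QUEUE_SIZE & (LEARNING_QUEUE_SIZE - 1u)) == 0,
               "LEARNING_QUEUE_SIZE must be a power of two");

static inline uint64_t queue_slot(uint64_t pos) {
    return pos & LEARNING_QUEUE_MASK;              // same result as pos % LEARNING_QUEUE_SIZE
}
```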

### Why Not TLS Accumulation?

- ❌ Requires synchronization points (when to flush?)
- ❌ Delays learning (batch vs. streaming)
- ❌ More complex state management
- ✅ The MPSC queue is simpler and proven

## 4. Interface Contracts (Critical Specifications)

### Contract A: Queue Overflow Policy

**Rule**: `ace_push_event()` MUST NEVER BLOCK.

**Implementation**:
- If the queue is full: DROP the event silently
- Rationale: hot-path correctness > complete telemetry
- Monitoring: track the drop count for diagnostics

**Code**:
```c
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Check whether the slot is still occupied (queue full)
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);  // Track drops
        return;  // DROP - don't wait!
    }

    // Safe to write - copy into the ring buffer
    memcpy(&g_event_pool[slot], event, sizeof(RefillEvent));
    atomic_store_explicit(&g_queue.events[slot], &g_event_pool[slot],
                          memory_order_release);
}
```

### Contract B: Policy Scope Limitation

**Rule**: ACE can ONLY adjust "next refill" parameters.

**Allowed**:
- ✅ Refill count for the next miss
- ✅ Drain threshold adjustments
- ✅ Pre-warming at thread init (see the sketch at the end of this contract)

**FORBIDDEN**:
- ❌ Immediate cache flush
- ❌ Blocking operations
- ❌ Direct TLS manipulation

**Implementation**:
- ACE writes to: `g_refill_policies[class_idx]` (atomic)
- Box 2 reads from: `ace_get_refill_count(class_idx)` (atomic load, no blocking)

**Code**:
```c
// ACE side - writes the policy
void ace_update_policy(int class_idx, uint32_t new_count) {
    // ONLY writes to the policy table
    atomic_store(&g_refill_policies[class_idx], new_count);
}

// Box 2 side - reads the policy (never blocks)
uint32_t pool_get_refill_count(int class_idx) {
    uint32_t count = atomic_load(&g_refill_policies[class_idx]);
    return count ? count : DEFAULT_REFILL_COUNT[class_idx];
}
```
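
Pre-warming at thread init is listed as allowed but has no example elsewhere in this document. A minimal sketch under stated assumptions: a hypothetical `pool_prewarm()` called from `pool_thread_init()`, reusing the refill path and the policy table; the function name and the "hot classes" list are illustrative only.

```c
// Hypothetical sketch: pre-warm the TLS cache for a few hot classes at thread start.
// Assumes pool_get_refill_count(), backend_batch_alloc(), and pool_install_chain()
// from the surrounding design; the class list below is an assumption.
static const int k_prewarm_classes[] = { 0, 1, 2, 3 };   // assumed hot small classes

static void pool_prewarm(void) {
    for (unsigned i = 0; i < sizeof(k_prewarm_classes) / sizeof(k_prewarm_classes[0]); i++) {
        int class_idx = k_prewarm_classes[i];
        int count     = pool_get_refill_count(class_idx);   // policy-driven, never blocks
        void* chain   = backend_batch_alloc(class_idx, count);
        if (chain) {
            pool_install_chain(class_idx, chain, count);     // Box 2 -> Box 1 internal API
        }
    }
}

// Called once per thread, outside the hot path:
//   void pool_thread_init(void) { /* ... */ pool_prewarm(); }
```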

### Contract C: Memory Ownership Model

**Rule**: Clear ownership to prevent use-after-free.

**Model**: Fixed Ring Buffer (No Allocations)

```c
// Pre-allocated event pool
static RefillEvent g_event_pool[LEARNING_QUEUE_SIZE];

// Producer (Box 2)
void ace_push_event(RefillEvent* event) {
    uint64_t pos  = atomic_fetch_add(&g_queue.write_pos, 1);
    uint64_t slot = pos % LEARNING_QUEUE_SIZE;

    // Check for a full queue (Contract A)
    if (atomic_load(&g_queue.events[slot]) != NULL) {
        atomic_fetch_add(&g_queue.drops, 1);
        return;
    }

    // Copy into the fixed slot (no malloc!)
    memcpy(&g_event_pool[slot], event, sizeof(RefillEvent));

    // Publish the pointer
    atomic_store(&g_queue.events[slot], &g_event_pool[slot]);
}

// Consumer (Box 3)
void ace_consume_events(void) {
    uint64_t slot = g_queue.read_pos % LEARNING_QUEUE_SIZE;
    RefillEvent* event = atomic_load(&g_queue.events[slot]);

    if (event) {
        // Process (event lifetime is guaranteed by the ring buffer)
        ace_process_event(event);

        // Release the slot and advance
        atomic_store(&g_queue.events[slot], NULL);
        g_queue.read_pos++;
    }
}
```

**Ownership Rules**:
- Producer: COPIES into the ring buffer (the stack event is safe to discard)
- Consumer: READS from the ring buffer (no ownership transfer)
- Ring buffer: OWNS all events (never freed, just reused)

### Contract D: API Boundary Enforcement

**Box 1 API (pool_tls.h)**:
```c
// PUBLIC: Hot-path functions
void* pool_alloc(size_t size);
void  pool_free(void* ptr);

// INTERNAL: Only called by Box 2
void pool_install_chain(int class_idx, void* chain, int count);
```

**Box 2 API (pool_refill.h)**:
```c
// INTERNAL: Refill implementation
void* pool_refill_and_alloc(int class_idx);

// Box 2 is the ONLY box that calls ace_push_event()
// (enforced by keeping this wrapper static in pool_refill.c)
static void notify_learning(RefillEvent* event) {
    ace_push_event(event);
}
```

**Box 3 API (ace_learning.h)**:
```c
// POLICY OUTPUT: Box 2 reads these
uint32_t ace_get_refill_count(int class_idx);

// EVENT INPUT: Only Box 2 calls this
void ace_push_event(RefillEvent* event);

// Box 3 NEVER calls Box 1 functions directly
// Box 3 NEVER blocks Box 1 or Box 2
```

**Enforcement Strategy**:
- Separate .c files (no cross-includes except public headers)
- Static functions where appropriate
- Code review checklist in POOL_IMPLEMENTATION_CHECKLIST.md
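
A lightweight compile-time check can back up the review checklist. The sketch below is an assumption layered on conventional include guards (it presumes `ace_learning.h` uses a guard named `ACE_LEARNING_H`; adjust to the real guard): Box 1's translation unit refuses to build if Box 3's header ever leaks into it.

```c
/* pool_tls.c - hedged sketch of a cheap boundary check.
 * Assumes ace_learning.h defines the include guard ACE_LEARNING_H. */
#include "pool_tls.h"

#ifdef ACE_LEARNING_H
#error "Contract D violation: pool_tls.c must not include ace_learning.h"
#endif

/* ... hot-path implementation follows ... */
```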

## 5. Progressive Implementation Plan

### Phase 1: Ultra-Simple TLS (2 days)

**Goal**: 40-60M ops/s without any learning.

**Files**:
- `core/pool_tls.c` - TLS freelist implementation
- `core/pool_tls.h` - Public API

**Code** (pool_tls.c):
```c
// Global TLS state (per-thread)
__thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

// Fixed refill counts for Phase 1
static const uint32_t DEFAULT_REFILL_COUNT[POOL_SIZE_CLASSES] = {
    64, 64, 48, 48, 32, 32, 24, 24,   // Small (high frequency)
    16, 16, 12, 12,  8,  8,  8,  8    // Large (lower frequency)
};

static void* pool_refill_and_alloc(int class_idx);  // cold path, defined below

// Ultra-fast allocation (5-6 cycles)
void* pool_alloc_fast(size_t size) {
    int class_idx = pool_size_to_class(size);
    void* head = g_tls_pool_head[class_idx];

    if (LIKELY(head)) {
        // Pop from the freelist
        g_tls_pool_head[class_idx] = *(void**)head;
        g_tls_pool_count[class_idx]--;

        // Write header if enabled
#if POOL_USE_HEADERS
        *((uint8_t*)head - 1) = POOL_MAGIC | class_idx;
#endif

        return head;
    }

    // Cold path: refill
    return pool_refill_and_alloc(class_idx);
}

// Simple refill (no learning)
static void* pool_refill_and_alloc(int class_idx) {
    int count = DEFAULT_REFILL_COUNT[class_idx];

    // Batch allocate from the SuperSlab backend
    void* chain = ss_batch_carve(class_idx, count);
    if (!chain) return NULL;

    // Pop the first block for the caller
    void* ret = chain;
    chain = *(void**)chain;
    count--;

    // Install the rest in TLS
    g_tls_pool_head[class_idx]  = chain;
    g_tls_pool_count[class_idx] = count;

#if POOL_USE_HEADERS
    *((uint8_t*)ret - 1) = POOL_MAGIC | class_idx;
#endif

    return ret;
}

// Ultra-fast free (5-6 cycles)
void pool_free_fast(void* ptr) {
#if POOL_USE_HEADERS
    uint8_t header = *((uint8_t*)ptr - 1);
    if ((header & 0xF0) != POOL_MAGIC) {
        // Not ours, route elsewhere
        pool_free_slow(ptr);
        return;
    }
    int class_idx = header & 0x0F;
#else
    int class_idx = pool_ptr_to_class(ptr);  // Lookup
#endif

    // Push onto the freelist
    *(void**)ptr = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;

    // Optional: drain if too full
    if (UNLIKELY(g_tls_pool_count[class_idx] > MAX_TLS_CACHE)) {
        pool_drain_excess(class_idx);
    }
}
```
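
The block above leans on a few helpers that are not defined in this document. A minimal sketch of plausible definitions follows; the power-of-two class spacing and the macro forms are assumptions, not the final size-class map.

```c
// Hedged sketch of helpers assumed by pool_tls.c (illustrative only).
#include <stddef.h>
#include <stdint.h>

#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

#define POOL_SIZE_CLASSES 16
#define POOL_MIN_BLOCK    16u          // assumed: class 0 serves 16-byte blocks

// Assumed mapping: class i serves blocks of (POOL_MIN_BLOCK << i) bytes.
static inline int pool_size_to_class(size_t size) {
    int class_idx = 0;
    size_t cap = POOL_MIN_BLOCK;
    while (cap < size && class_idx < POOL_SIZE_CLASSES - 1) {
        cap <<= 1;
        class_idx++;
    }
    return class_idx;
}
```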

**Acceptance Criteria**:
- ✅ Larson: 2.5M+ ops/s
- ✅ bench_random_mixed: 40M+ ops/s
- ✅ No learning code present
- ✅ Clean, readable, < 200 LOC
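
For the ops/s targets, a rough single-thread measurement sketch is shown below. It is not the Larson or bench_random_mixed harness; the fixed 64-byte size and iteration count are placeholders.

```c
// Hedged sketch: rough single-thread throughput check for the Phase 1 fast paths.
// One iteration = one alloc + one free; not a substitute for the real benchmarks.
#include <stdio.h>
#include <time.h>
#include "pool_tls.h"   // assumed Phase 1 header

#define ITERS 10000000ull

int main(void) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (unsigned long long i = 0; i < ITERS; i++) {
        void* p = pool_alloc_fast(64);   // assumed 64-byte hot class
        if (p) pool_free_fast(p);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.1fM ops/s\n", (2.0 * ITERS / secs) / 1e6);  // alloc and free counted separately
    return 0;
}
```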

### Phase 2: Metrics Collection (1 day)

**Goal**: Add instrumentation without slowing the hot path.

**Changes**:
```c
// Add to TLS state
__thread uint64_t g_tls_pool_hits[POOL_SIZE_CLASSES];
__thread uint64_t g_tls_pool_misses[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_miss_streak[POOL_SIZE_CLASSES];

// In pool_alloc_fast() - hot path
if (LIKELY(head)) {
#ifdef POOL_COLLECT_METRICS
    g_tls_pool_hits[class_idx]++;   // Single increment
#endif
    // ... existing code
}

// In pool_refill_and_alloc() - cold path
g_tls_pool_misses[class_idx]++;
g_tls_miss_streak[class_idx]++;

// New stats function
void pool_print_stats(void) {
    for (int i = 0; i < POOL_SIZE_CLASSES; i++) {
        uint64_t total = g_tls_pool_hits[i] + g_tls_pool_misses[i];
        if (total == 0) continue;
        double hit_rate = (double)g_tls_pool_hits[i] / total;
        printf("Class %d: %.2f%% hit rate, %llu misses\n",
               i, hit_rate * 100, (unsigned long long)g_tls_pool_misses[i]);
    }
}
```

**Acceptance Criteria**:
- ✅ < 2% performance regression
- ✅ Accurate hit-rate reporting
- ✅ Identify hot classes for Phase 3

### Phase 3: Learning Integration (2 days)

**Goal**: Connect ACE learning without touching the hot path.

**New Files**:
- `core/ace_learning.c` - Learning thread
- `core/ace_policy.h` - Policy structures

**Integration Points**:

1. **Startup**: Launch the learning thread
   ```c
   void hakmem_init(void) {
       // ... existing init
       ace_start_learning_thread();
   }
   ```

2. **Refill**: Push events
   ```c
   // In pool_refill_and_alloc() - add after a successful refill
   RefillEvent event = { /* ... */ };
   ace_push_event(&event);  // Non-blocking
   ```

3. **Policy Application**: Read tuned values
   ```c
   // Replace DEFAULT_REFILL_COUNT with a dynamic lookup
   int count = ace_get_refill_count(class_idx);
   // Falls back to the default if no policy exists yet
   ```

**ACE Learning Algorithm** (ace_learning.c):
```c
// UCB1-style statistics for exploration vs. exploitation
typedef struct {
    double   total_reward;   // Sum of rewards
    uint64_t play_count;     // Times tried
    uint32_t refill_size;    // Current policy
} ClassPolicy;

static ClassPolicy g_policies[POOL_SIZE_CLASSES];

void ace_process_event(RefillEvent* e) {
    ClassPolicy* p = &g_policies[e->class_idx];

    // Compute reward (inverse of the miss streak)
    double reward = 1.0 / (1.0 + e->miss_streak);

    // Update UCB1 statistics
    p->total_reward += reward;
    p->play_count++;

    // Adjust refill size based on occupancy
    if (e->tls_occupancy < 4) {
        // Cache was nearly empty - increase the refill
        p->refill_size = MIN(p->refill_size * 1.5, 256);
    } else if (e->tls_occupancy > 32) {
        // Cache had plenty - decrease the refill
        p->refill_size = MAX(p->refill_size * 0.75, 16);
    }

    // Publish the new policy (atomic write)
    atomic_store(&g_refill_policies[e->class_idx], p->refill_size);
}
```
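
The structure above stores UCB1 statistics but does not show the selection step. A hedged sketch of how UCB1 could pick among a small set of candidate refill sizes follows; the per-arm bookkeeping and candidate list are assumptions layered on top of `ClassPolicy`, not the final design.

```c
// Sketch: UCB1 selection over a few candidate refill sizes for one class.
// Illustrative only; not the contents of ace_learning.c.
#include <math.h>
#include <stdint.h>

#define NUM_ARMS 4
static const uint32_t k_arm_sizes[NUM_ARMS] = { 16, 32, 64, 128 };

typedef struct {
    double   reward[NUM_ARMS];   // summed reward per candidate size
    uint64_t plays[NUM_ARMS];    // times each candidate was used
    uint64_t total_plays;
} ArmStats;

static uint32_t ucb1_pick_refill(const ArmStats* s) {
    int best = 0;
    double best_score = -1.0;
    for (int a = 0; a < NUM_ARMS; a++) {
        if (s->plays[a] == 0) return k_arm_sizes[a];   // try each arm once first
        double mean  = s->reward[a] / (double)s->plays[a];
        double bonus = sqrt(2.0 * log((double)s->total_plays) / (double)s->plays[a]);
        double score = mean + bonus;                   // UCB1: exploitation + exploration
        if (score > best_score) { best_score = score; best = a; }
    }
    return k_arm_sizes[best];
}
```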

**Acceptance Criteria**:
- ✅ No regression in hot-path performance
- ✅ Refill sizes adapt to the workload
- ✅ Background thread < 1% CPU

## 6. API Specifications

### Box 1: TLS Freelist API

```c
// Public API (pool_tls.h)
void* pool_alloc(size_t size);
void  pool_free(void* ptr);
void  pool_thread_init(void);
void  pool_thread_cleanup(void);

// Internal API (for the refill box)
int  pool_needs_refill(int class_idx);
void pool_install_chain(int class_idx, void* chain, int count);
```

### Box 2: Refill API

```c
// Internal API (pool_refill.h)
void* pool_refill_and_alloc(int class_idx);
int   pool_get_refill_count(int class_idx);
void  pool_drain_excess(int class_idx);

// Backend interface
void* backend_batch_alloc(int class_idx, int count);
void  backend_batch_free(int class_idx, void* chain, int count);
```

### Box 3: Learning API

```c
// Public API (ace_learning.h)
void ace_start_learning_thread(void);
void ace_stop_learning_thread(void);
void ace_push_event(RefillEvent* event);

// Policy API
uint32_t ace_get_refill_count(int class_idx);
void     ace_reset_policies(void);
void     ace_print_stats(void);
```

## 7. Diagnostics and Monitoring

### Queue Health Metrics

```c
typedef struct {
    uint64_t total_events;       // Total events pushed
    uint64_t dropped_events;     // Events dropped due to a full queue
    uint64_t processed_events;   // Events successfully processed
    double   drop_rate;          // drops / total_events
} QueueMetrics;

void ace_compute_metrics(QueueMetrics* m) {
    m->total_events     = atomic_load(&g_queue.write_pos);
    m->dropped_events   = atomic_load(&g_queue.drops);
    m->processed_events = g_queue.read_pos;
    m->drop_rate = m->total_events
                 ? (double)m->dropped_events / m->total_events : 0.0;

    // Alert if the drop rate exceeds the threshold
    if (m->drop_rate > 0.01) {   // > 1% drops
        fprintf(stderr,
                "WARNING: Queue drop rate %.2f%% - increase LEARNING_QUEUE_SIZE\n",
                m->drop_rate * 100);
    }
}
```

**Target Metrics**:
- Drop rate < 0.1%: normal operation
- Drop rate > 1%: increase LEARNING_QUEUE_SIZE
- Drop rate > 5%: critical - learning is degraded

### Policy Stability Metrics

```c
typedef struct {
    uint32_t refill_count;
    uint32_t change_count;     // Times the policy changed
    uint64_t last_change_ns;   // When it last changed
    double   variance;         // Refill-count variance
} PolicyMetrics;

void ace_track_policy_stability(int class_idx) {
    static PolicyMetrics metrics[POOL_SIZE_CLASSES];
    PolicyMetrics* m = &metrics[class_idx];

    uint32_t new_count = atomic_load(&g_refill_policies[class_idx]);
    if (new_count != m->refill_count) {
        uint64_t now = get_timestamp_ns();

        // Detect oscillation: compare against the PREVIOUS change time
        if (m->change_count > 0 && now - m->last_change_ns < 1000000000ull) {  // < 1 second
            fprintf(stderr, "WARNING: Class %d policy oscillating\n", class_idx);
        }

        m->refill_count   = new_count;
        m->change_count++;
        m->last_change_ns = now;
    }
}
```

### Debug Flags

```c
// Contract validation
#ifdef POOL_DEBUG_CONTRACTS
#define VALIDATE_CONTRACT_A() do { \
    if (is_blocking_detected()) { \
        panic("Contract A violation: ace_push_event blocked!"); \
    } \
} while (0)

#define VALIDATE_CONTRACT_B() do { \
    if (ace_performed_immediate_action()) { \
        panic("Contract B violation: ACE performed an immediate action!"); \
    } \
} while (0)

#define VALIDATE_CONTRACT_D() do { \
    if (box3_called_box1_function()) { \
        panic("Contract D violation: Box 3 called Box 1 directly!"); \
    } \
} while (0)
#else
#define VALIDATE_CONTRACT_A()
#define VALIDATE_CONTRACT_B()
#define VALIDATE_CONTRACT_D()
#endif

// Drop tracking
#ifdef POOL_DEBUG_DROPS
#define LOG_DROP() fprintf(stderr, "DROP: tid=%lu class=%d @ %s:%d\n", \
                           (unsigned long)pthread_self(), class_idx, __FILE__, __LINE__)
#else
#define LOG_DROP()
#endif
```

### Runtime Diagnostics Command

```c
void pool_print_diagnostics(void) {
    printf("=== Pool TLS Learning Diagnostics ===\n");

    // Queue health
    QueueMetrics qm;
    ace_compute_metrics(&qm);
    printf("Queue: %llu events, %llu drops (%.2f%%)\n",
           (unsigned long long)qm.total_events,
           (unsigned long long)qm.dropped_events,
           qm.drop_rate * 100);

    // Per-class stats
    for (int i = 0; i < POOL_SIZE_CLASSES; i++) {
        uint32_t refill_count = atomic_load(&g_refill_policies[i]);
        uint64_t total = g_tls_pool_hits[i] + g_tls_pool_misses[i];
        double hit_rate = total ? (double)g_tls_pool_hits[i] / total : 0.0;

        printf("Class %2d: refill=%3u hit_rate=%.1f%%\n",
               i, refill_count, hit_rate * 100);
    }

    // Contract violations (if any)
#ifdef POOL_DEBUG_CONTRACTS
    printf("Contract violations: A=%u B=%u C=%u D=%u\n",
           g_contract_a_violations, g_contract_b_violations,
           g_contract_c_violations, g_contract_d_violations);
#endif
}
```

## 8. Risk Analysis

### Performance Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Hot path regression | Feature flags for each phase | Low |
| Learning overhead | Async queue, no blocking | Low |
| Cache line bouncing | TLS data, no sharing | Low |
| Memory overhead | Bounded TLS cache sizes | Medium |

### Complexity Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Box boundary violation | Contract D: separate files, enforced APIs | Medium |
| Deadlock in learning | Contract A: lock-free queue, drops allowed | Low |
| Policy instability | Contract B: only next-refill adjustments | Medium |
| Debug complexity | Per-box debug flags | Low |

### Correctness Risks

| Risk | Mitigation | Severity |
|------|------------|----------|
| Header corruption | Magic-byte validation | Low |
| Double-free | Clear TLS ownership | Low |
| Memory leak | Drain on thread exit | Medium |
| Refill failure | Fallback to system malloc | Low |
| Use-after-free | Contract C: fixed ring buffer, no malloc | Low |

### Contract-Specific Risks

| Risk | Contract | Mitigation |
|------|----------|------------|
| Queue overflow causing blocking | A | Drop events, monitor drop rate |
| Learning thread blocking refill | B | Policy reads are atomic only |
| Event lifetime issues | C | Fixed ring buffer, memcpy semantics |
| Cross-box coupling | D | Separate compilation units, code review |

## 9. Testing Strategy

### Phase 1 Tests
- Unit: TLS alloc/free correctness (see the sketch below)
- Perf: 40-60M ops/s target
- Stress: multi-threaded consistency
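
A minimal correctness sketch for the Phase 1 freelist, under stated assumptions (the Phase 1 function names and a 64-byte request are used; it only checks LIFO reuse on one class, not the full matrix):

```c
// Hedged sketch: basic alloc/free round-trip test for the Phase 1 TLS freelist.
#include <assert.h>
#include <string.h>
#include "pool_tls.h"   // assumed Phase 1 header

static void test_alloc_free_roundtrip(void) {
    pool_thread_init();

    // Allocate, write, free, then allocate again: the TLS freelist is LIFO,
    // so the same block should come back for the same size class.
    void* a = pool_alloc_fast(64);
    assert(a != NULL);
    memset(a, 0xAB, 64);

    pool_free_fast(a);
    void* b = pool_alloc_fast(64);
    assert(b == a);   // LIFO reuse from the per-thread freelist

    pool_free_fast(b);
    pool_thread_cleanup();
}
```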

### Phase 2 Tests
- Metrics accuracy validation
- Performance regression < 2%
- Hit-rate analysis

### Phase 3 Tests
- Learning convergence
- Policy stability
- Background thread CPU < 1%

### Contract Validation Tests

#### Contract A: Non-Blocking Queue
```c
void test_queue_never_blocks(void) {
    // Fill the queue completely (and then some)
    for (int i = 0; i < LEARNING_QUEUE_SIZE * 2; i++) {
        RefillEvent event = { .class_idx = i % 16 };
        uint64_t start = get_cycles();
        ace_push_event(&event);
        uint64_t elapsed = get_cycles() - start;

        // Should never take more than 1000 cycles
        assert(elapsed < 1000);
    }

    // Verify drops were tracked
    assert(atomic_load(&g_queue.drops) > 0);
}
```

#### Contract B: Policy Scope
```c
void test_policy_scope_limited(void) {
    // ACE should only write to the policy table
    uint32_t old_count = g_tls_pool_count[0];

    // Trigger a learning update
    ace_update_policy(0, 128);

    // Verify the TLS state is unchanged
    assert(g_tls_pool_count[0] == old_count);

    // Verify the policy was updated
    assert(ace_get_refill_count(0) == 128);
}
```

#### Contract C: Memory Safety
```c
void test_no_use_after_free(void) {
    RefillEvent stack_event = { .class_idx = 5 };

    // Push the event (it must be copied)
    ace_push_event(&stack_event);

    // Modify the stack event
    stack_event.class_idx = 10;

    // Consume the event - the consumer should see the original value
    ace_consume_single_event();
    assert(last_processed_class == 5);
}
```

#### Contract D: API Boundaries
```c
// This should fail to compile if the boundaries are enforced correctly
#ifdef TEST_CONTRACT_D_VIOLATION
// In ace_learning.c
void bad_function(void) {
    // Should not compile - Box 3 can't call Box 1
    pool_alloc(128);   // VIOLATION!
}
#endif
```

## 10. Implementation Timeline

```
Day 1-2: Phase 1 (Simple TLS)
  - pool_tls.c implementation
  - Basic testing
  - Performance validation

Day 3: Phase 2 (Metrics)
  - Add counters
  - Stats reporting
  - Identify hot classes

Day 4-5: Phase 3 (Learning)
  - ace_learning.c
  - MPSC queue
  - UCB1 algorithm

Day 6: Integration Testing
  - Full system test
  - Performance validation
  - Documentation
```

## Conclusion

This design achieves:
- ✅ **Clean separation**: Three distinct boxes with clear boundaries
- ✅ **Simple hot path**: 5-6 cycles for alloc/free
- ✅ **Smart learning**: UCB1 in the background, no hot-path impact
- ✅ **Progressive enhancement**: Each phase is independently valuable
- ✅ **User's vision**: "Learn only when the cache is being grown; push the data and let another thread handle it"

**Critical Specifications Now Formalized**:
- ✅ **Contract A**: Queue overflow policy - DROP events, never block
- ✅ **Contract B**: Policy scope limitation - only adjust the next refill
- ✅ **Contract C**: Memory ownership model - fixed ring buffer, no UAF
- ✅ **Contract D**: API boundary enforcement - separate files, no cross-calls

The key insight is that learning during refill (the cold path) keeps the hot path pristine while still enabling intelligent adaptation. The lock-free MPSC queue with an explicit drop policy ensures zero contention between workers and the learning thread.

**Ready for Implementation**: All ambiguities resolved, contracts specified, testing defined.