# Pool TLS + Learning Implementation Checklist

## Pre-Implementation Review

### Contract Understanding
- [ ] Read and understand all 4 contracts (A-D) in POOL_TLS_LEARNING_DESIGN.md
- [ ] Identify which contract applies to each code section
- [ ] Review enforcement strategies for each contract

## Phase 1: Ultra-Simple TLS Implementation

### Box 1: TLS Freelist (pool_tls.c)

#### Setup
- [ ] Create `core/pool_tls.c` and `core/pool_tls.h`
- [ ] Define TLS globals: `__thread void* g_tls_pool_head[POOL_SIZE_CLASSES]`
- [ ] Define TLS counts: `__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]`
- [ ] Define default refill counts array

#### Hot Path Implementation
- [ ] Implement `pool_alloc_fast()` - must be 5-6 instructions max
  - [ ] Pop from TLS freelist
  - [ ] Conditional header write (if enabled)
  - [ ] Call refill only on miss
- [ ] Implement `pool_free_fast()` - must be 5-6 instructions max
  - [ ] Header validation (if enabled)
  - [ ] Push to TLS freelist
  - [ ] Optional drain check
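A minimal sketch of the Box 1 hot path. It assumes blocks are linked through their first word (an intrusive freelist) and that Box 1 forward-declares `pool_refill_and_alloc()` instead of including `pool_refill.h`. `POOL_SIZE_CLASSES = 8`, the per-class block sizes, and the malloc-backed stand-in refill are hypothetical. Header write/validation and the drain check are left out to keep the pop/push visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES 8  /* hypothetical class count */

/* Box1 TLS state: one intrusive singly linked freelist per size class. */
__thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Box2 entry point, forward-declared so pool_tls.c needs no pool_refill.h. */
void* pool_refill_and_alloc(int class_idx);

static inline void* pool_alloc_fast(int class_idx) {
    void* head = g_tls_pool_head[class_idx];
    if (__builtin_expect(head != NULL, 1)) {
        g_tls_pool_head[class_idx] = *(void**)head;  /* next pointer lives in the block */
        g_tls_pool_count[class_idx]--;
        return head;
    }
    return pool_refill_and_alloc(class_idx);  /* miss: slow path in Box2 */
}

static inline void pool_free_fast(void* ptr, int class_idx) {
    *(void**)ptr = g_tls_pool_head[class_idx];  /* link freed block in front */
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;
}

/* Hypothetical stand-in for Box2 so this sketch links on its own. */
void* pool_refill_and_alloc(int class_idx) {
    assert(class_idx >= 0 && class_idx < POOL_SIZE_CLASSES);
    return malloc((size_t)16 << class_idx);
}
```

Both fast paths touch only the calling thread's TLS arrays, so they need no atomics or locks.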

#### Contract D Validation
- [ ] Verify Box1 has NO learning code
- [ ] Verify Box1 has NO metrics collection
- [ ] Verify Box1 only exposes public API and internal chain installer
- [ ] No includes of ace_learning.h or pool_refill.h in pool_tls.c

#### Testing
- [ ] Unit test: Allocation/free correctness
- [ ] Performance test: Target 40-60M ops/s
- [ ] Verify hot path is < 10 instructions with objdump

### Box 2: Refill Engine (pool_refill.c)

#### Setup
- [ ] Create `core/pool_refill.c` and `core/pool_refill.h`
- [ ] Include only the public API from `pool_tls.h`
- [ ] Define refill statistics (miss streak, etc.)

#### Refill Implementation
- [ ] Implement `pool_refill_and_alloc()`
  - [ ] Capture pre-refill state
  - [ ] Get refill count (default for Phase 1)
  - [ ] Batch allocate from backend
  - [ ] Install chain in TLS
  - [ ] Return first block
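The refill steps above could look roughly like this. It is a sketch under assumptions: the backend is a hypothetical `backend_alloc_chain()` that carves one `malloc`'d slab into a pre-linked chain, `pool_tls_install_chain()` is an illustrative name for Box 1's internal chain installer, and the default refill counts are placeholder values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES 8  /* hypothetical */

__thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Placeholder Phase 1 defaults: fewer blocks for larger classes. */
static const uint32_t g_default_refill_count[POOL_SIZE_CLASSES] = {
    64, 64, 32, 32, 16, 16, 8, 8
};

/* Box1 internal chain installer: splices a pre-linked chain onto the TLS list. */
void pool_tls_install_chain(int class_idx, void* head, void* tail, uint32_t count) {
    *(void**)tail = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = head;
    g_tls_pool_count[class_idx] += count;
}

/* Hypothetical backend: one slab carved into n linked blocks. */
static void* backend_alloc_chain(int class_idx, uint32_t n, void** tail_out) {
    size_t block = (size_t)16 << class_idx;
    char* slab = malloc(block * n);
    if (!slab) return NULL;
    for (uint32_t i = 0; i + 1 < n; i++)
        *(void**)(slab + i * block) = slab + (i + 1) * block;
    *(void**)(slab + (size_t)(n - 1) * block) = NULL;
    *tail_out = slab + (size_t)(n - 1) * block;
    return slab;
}

void* pool_refill_and_alloc(int class_idx) {
    assert(class_idx >= 0 && class_idx < POOL_SIZE_CLASSES);
    uint32_t before = g_tls_pool_count[class_idx];  /* pre-refill state */
    (void)before;  /* Phase 3 would copy this into the refill event */
    uint32_t n = g_default_refill_count[class_idx];  /* Phase 1: static default */
    void* tail = NULL;
    void* first = backend_alloc_chain(class_idx, n, &tail);
    if (!first) return NULL;
    if (n > 1)  /* blocks 2..n go to TLS, block 1 goes to the caller */
        pool_tls_install_chain(class_idx, *(void**)first, tail, n - 1);
    return first;
}
```

Linking the chain before installing it keeps the TLS update to three stores, whatever the batch size.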

#### Contract B Validation
- [ ] Verify refill NEVER blocks waiting for policy
- [ ] Verify refill only reads atomic policy values
- [ ] No immediate cache manipulation in response to policy changes (a new count applies only at the next refill)

#### Contract C Validation
- [ ] Event created on stack
- [ ] Event data copied, not referenced
- [ ] No dynamic allocation for events

## Phase 2: Metrics Collection

### Metrics Addition
- [ ] Add hit/miss counters to TLS state
- [ ] Add miss streak tracking
- [ ] Instrument hot path (with ifdef guard)
- [ ] Implement `pool_print_stats()`
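One way to keep the metrics out of release builds is a compile-time guard around macros, sketched below. `POOL_METRICS`, the counter names, and the macro names are hypothetical. Because the counters are TLS, `pool_print_stats()` here only reports the calling thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE_CLASSES 8
#define POOL_METRICS 1  /* hypothetical flag; remove to compile counters out */

#ifdef POOL_METRICS
__thread uint64_t g_tls_hits[POOL_SIZE_CLASSES];
__thread uint64_t g_tls_misses[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_miss_streak[POOL_SIZE_CLASSES];
/* A hit resets the streak; a miss extends it. */
#define POOL_STAT_HIT(c)  (g_tls_hits[c]++, g_tls_miss_streak[c] = 0)
#define POOL_STAT_MISS(c) (g_tls_misses[c]++, g_tls_miss_streak[c]++)
#else
#define POOL_STAT_HIT(c)  ((void)0)
#define POOL_STAT_MISS(c) ((void)0)
#endif

void pool_print_stats(void) {
#ifdef POOL_METRICS
    for (int c = 0; c < POOL_SIZE_CLASSES; c++) {
        uint64_t total = g_tls_hits[c] + g_tls_misses[c];
        if (total == 0) continue;
        fprintf(stderr, "class %d: hits=%llu misses=%llu hit_rate=%.2f%% streak=%u\n",
                c, (unsigned long long)g_tls_hits[c],
                (unsigned long long)g_tls_misses[c],
                100.0 * (double)g_tls_hits[c] / (double)total,
                (unsigned)g_tls_miss_streak[c]);
    }
#endif
}
```

With the flag off, both macros expand to `((void)0)`, so the hot path compiles to the same code as Phase 1.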

### Performance Validation
- [ ] Measure regression with metrics enabled
- [ ] Must be < 2% performance impact
- [ ] Verify counters are accurate

## Phase 3: Learning Integration

### Box 3: ACE Learning (ace_learning.c)

#### Setup
- [ ] Create `core/ace_learning.c` and `core/ace_learning.h`
- [ ] Pre-allocate event ring buffer: `RefillEvent g_event_pool[QUEUE_SIZE]`
- [ ] Initialize MPSC queue structure
- [ ] Define policy table: `_Atomic uint32_t g_refill_policies[CLASSES]`

#### MPSC Queue Implementation
- [ ] Implement `ace_push_event()`
  - [ ] Contract A: Check for full queue
  - [ ] Contract A: DROP if full (never block!)
  - [ ] Contract A: Track drops with counter
  - [ ] Contract C: COPY event to ring buffer
  - [ ] Use proper memory ordering
- [ ] Implement `ace_consume_events()`
  - [ ] Read events with acquire semantics
  - [ ] Process and release slots
  - [ ] Sleep when queue empty
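One way to satisfy Contracts A and C is a bounded ring adapted from Dmitry Vyukov's bounded MPMC queue. The consumer side is simplified because only the learning thread dequeues. Each slot carries a sequence number, and producers claim a position with a CAS on the tail. A slot that is still occupied means the ring is full, so the event is dropped and counted. This is a sketch: the `RefillEvent` fields, `QUEUE_SIZE`, `ace_queue_init()`, and `ace_pop_event()` are hypothetical. The push is lock-free rather than wait-free (a producer may retry its CAS after racing another producer), but it never waits for the consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 1024  /* hypothetical; must be a power of two */

typedef struct { uint32_t class_idx, refill_count, miss_streak; } RefillEvent;
typedef struct { _Atomic size_t seq; RefillEvent ev; } Slot;

static Slot g_event_pool[QUEUE_SIZE];  /* pre-allocated, never malloc'd */
static _Atomic size_t g_tail;          /* producers claim positions here */
static size_t g_head;                  /* touched only by the consumer */
static _Atomic uint64_t g_drops;

void ace_queue_init(void) {
    for (size_t i = 0; i < QUEUE_SIZE; i++)
        atomic_store_explicit(&g_event_pool[i].seq, i, memory_order_relaxed);
    atomic_store(&g_tail, 0);
    atomic_store(&g_drops, 0);
    g_head = 0;
}

bool ace_push_event(const RefillEvent* ev) {
    size_t pos = atomic_load_explicit(&g_tail, memory_order_relaxed);
    for (;;) {
        Slot* s = &g_event_pool[pos & (QUEUE_SIZE - 1)];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) {  /* slot free for this position: try to claim it */
            if (atomic_compare_exchange_weak_explicit(&g_tail, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                s->ev = *ev;  /* Contract C: copy into the ring */
                atomic_store_explicit(&s->seq, pos + 1, memory_order_release);
                return true;
            }  /* CAS failed and reloaded pos: retry */
        } else if (diff < 0) {  /* consumer has not freed the slot: full */
            atomic_fetch_add_explicit(&g_drops, 1, memory_order_relaxed);
            return false;  /* Contract A: drop, never wait */
        } else {
            pos = atomic_load_explicit(&g_tail, memory_order_relaxed);
        }
    }
}

/* Single consumer: copy one published event out and release its slot. */
bool ace_pop_event(RefillEvent* out) {
    Slot* s = &g_event_pool[g_head & (QUEUE_SIZE - 1)];
    size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
    if (seq != g_head + 1) return false;  /* empty, or producer not done */
    *out = s->ev;
    atomic_store_explicit(&s->seq, g_head + QUEUE_SIZE, memory_order_release);
    g_head++;
    return true;
}

/* Drains what is published; the learning thread sleeps when this returns 0. */
size_t ace_consume_events(void (*handle)(const RefillEvent*)) {
    RefillEvent ev;
    size_t n = 0;
    while (ace_pop_event(&ev)) { handle(&ev); n++; }
    return n;
}
```

A slot that a producer has claimed but not yet published reads as empty, so the consumer simply picks it up on its next pass.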

#### Contract A Validation
- [ ] Push function NEVER blocks
- [ ] Drops are tracked
- [ ] Drop rate monitoring implemented
- [ ] Warning issued if drop rate > 1%
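The drop-rate check can be computed off the push path from two counters, as sketched below. The `QueueHealth` struct and function names are hypothetical. The 1% threshold is the one from this checklist, and the check is assumed to run periodically on the learning thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ACE_DROP_WARN_RATE 0.01  /* 1% warning threshold */

typedef struct {
    uint64_t pushed;   /* events accepted into the ring */
    uint64_t dropped;  /* events rejected because the ring was full */
} QueueHealth;

static double ace_drop_rate(const QueueHealth* h) {
    uint64_t attempts = h->pushed + h->dropped;
    return attempts ? (double)h->dropped / (double)attempts : 0.0;
}

/* Returns 1 if a warning was issued. Never called from ace_push_event(). */
static int ace_check_queue_health(const QueueHealth* h) {
    double rate = ace_drop_rate(h);
    if (rate > ACE_DROP_WARN_RATE) {
        fprintf(stderr, "[ACE] warning: drop rate %.2f%% (%llu of %llu events)\n",
                rate * 100.0, (unsigned long long)h->dropped,
                (unsigned long long)(h->pushed + h->dropped));
        return 1;
    }
    return 0;
}
```

Dividing by attempts (pushed + dropped) rather than by pushed keeps the rate bounded by 100% even when most events are dropped.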

#### Contract B Validation
- [ ] ACE only writes to policy table
- [ ] No immediate actions taken
- [ ] No direct TLS manipulation
- [ ] No blocking operations

#### Contract C Validation
- [ ] Ring buffer pre-allocated
- [ ] Events copied, not moved
- [ ] No malloc/free in event path
- [ ] Clear slot ownership model

#### Contract D Validation
- [ ] ace_learning.c does NOT include pool_tls.h internals
- [ ] No direct calls to Box1 functions
- [ ] Only `ace_push_event()` and `ace_get_refill_count()` exposed to Box2
- [ ] Verify `notify_learning()` is static in pool_refill.c

#### Learning Algorithm
- [ ] Implement UCB1 or similar
- [ ] Track per-class statistics
- [ ] Gradual policy adjustments
- [ ] Oscillation detection
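The gradual-adjustment and oscillation items can sit on top of whichever selector (UCB1 or otherwise) proposes a target count. The sketch below assumes the selector's output is passed in as `target`. The bounds, the doubling/halving step, and the three-flip hold rule are hypothetical choices, not the design's.

```c
#include <assert.h>
#include <stdint.h>

#define REFILL_MIN 8
#define REFILL_MAX 256
#define OSC_MAX_FLIPS 3  /* hypothetical: hold after this many reversals */

typedef struct {
    uint32_t current;   /* policy currently published */
    int      last_dir;  /* -1, 0, +1: direction of the previous change */
    uint32_t flips;     /* consecutive direction reversals */
} ClassTuner;

/* Moves at most one doubling/halving step toward target, never past it. */
uint32_t tuner_step(ClassTuner* t, uint32_t target) {
    if (t->flips >= OSC_MAX_FLIPS) return t->current;  /* oscillating: hold */
    uint32_t next = t->current;
    int dir = 0;
    if (target > t->current && t->current < REFILL_MAX) {
        next = t->current * 2;
        if (next > target) next = target;
        dir = +1;
    } else if (target < t->current && t->current > REFILL_MIN) {
        next = t->current / 2;
        if (next < target) next = target;
        dir = -1;
    }
    if (next > REFILL_MAX) next = REFILL_MAX;
    if (next < REFILL_MIN) next = REFILL_MIN;
    if (dir != 0) {
        /* A reversal counts as a flip; continuing the same way resets it. */
        t->flips = (t->last_dir != 0 && dir != t->last_dir) ? t->flips + 1 : 0;
        t->last_dir = dir;
    }
    t->current = next;
    return next;
}
```

This sketch holds forever once it detects oscillation. A real tuner would likely release the hold after a cooldown.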

### Integration Points

#### Box2 → Box3 Connection
- [ ] Add event creation in pool_refill_and_alloc()
- [ ] Call ace_push_event() after successful refill
- [ ] Make notify_learning() wrapper static
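A minimal sketch of the Box2 side of this connection, showing Contract C (event built on the stack, copied by the callee) and Contract D (`static` wrapper). The `RefillEvent` fields are hypothetical, and `ace_push_event()` here is a one-slot stand-in so the sketch runs alone. The real one would be Box3's drop-on-full ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t class_idx;
    uint32_t refill_count;
    uint32_t count_before;  /* TLS count captured before the refill */
} RefillEvent;

/* Hypothetical stand-in for Box3: copies *ev into a one-slot buffer. */
static RefillEvent g_last_event;
static bool g_have_event;
bool ace_push_event(const RefillEvent* ev) {
    g_last_event = *ev;  /* copy: the caller's stack frame ends right after */
    g_have_event = true;
    return true;
}

/* static: invisible outside pool_refill.c (Contract D). */
static void notify_learning(int class_idx, uint32_t count_before,
                            uint32_t refill_count) {
    assert(class_idx >= 0);
    RefillEvent ev = {  /* lives on the stack, no malloc (Contract C) */
        .class_idx = (uint32_t)class_idx,
        .refill_count = refill_count,
        .count_before = count_before,
    };
    (void)ace_push_event(&ev);  /* a dropped event is acceptable */
}
```

The push result is ignored on purpose: Box2 has nothing useful to do with a drop, and Box3 already counts drops.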

#### Box2 Policy Reading
- [ ] Replace DEFAULT_REFILL_COUNT with ace_get_refill_count()
- [ ] Atomic read of policy (no blocking)
- [ ] Fallback to default if no policy
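The policy read can be a single relaxed atomic load with a zero-means-unset fallback, as sketched below. It assumes each policy value is self-contained (no other data is published alongside it), so relaxed ordering is sufficient. `DEFAULT_REFILL_COUNT = 32`, `CLASSES = 8`, and the writer `ace_set_refill_count()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CLASSES 8
#define DEFAULT_REFILL_COUNT 32

_Atomic uint32_t g_refill_policies[CLASSES];  /* 0 = no policy published yet */

/* Box3 side: publish a new count for the next refill. */
void ace_set_refill_count(int class_idx, uint32_t n) {
    atomic_store_explicit(&g_refill_policies[class_idx], n, memory_order_relaxed);
}

/* Box2 side: one load, no locks, no waiting on the learner. */
uint32_t ace_get_refill_count(int class_idx) {
    assert(class_idx >= 0 && class_idx < CLASSES);
    uint32_t n = atomic_load_explicit(&g_refill_policies[class_idx],
                                      memory_order_relaxed);
    return n ? n : DEFAULT_REFILL_COUNT;
}
```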

#### Startup
- [ ] Launch learning thread in hakmem_init()
- [ ] Initialize policy table with defaults
- [ ] Verify thread starts successfully
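A possible shape for the startup steps, assuming POSIX threads. The 1 ms idle sleep, the `g_ace_running` flag, and the `hakmem_shutdown()` teardown are hypothetical, and the consume call is left as a comment. If thread creation fails, the allocator keeps running on the default policies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define CLASSES 8
#define DEFAULT_REFILL_COUNT 32

_Atomic uint32_t g_refill_policies[CLASSES];
static atomic_bool g_ace_running;
static pthread_t g_ace_thread;

static void* ace_thread_main(void* arg) {
    (void)arg;
    const struct timespec idle = { .tv_sec = 0, .tv_nsec = 1000000 };  /* 1 ms */
    while (atomic_load_explicit(&g_ace_running, memory_order_acquire)) {
        /* ace_consume_events(...) would run here; sleep when nothing arrived */
        nanosleep(&idle, NULL);
    }
    return NULL;
}

int hakmem_init(void) {
    for (int c = 0; c < CLASSES; c++)  /* defaults before any refill can read */
        atomic_store_explicit(&g_refill_policies[c], DEFAULT_REFILL_COUNT,
                              memory_order_relaxed);
    atomic_store_explicit(&g_ace_running, true, memory_order_release);
    int rc = pthread_create(&g_ace_thread, NULL, ace_thread_main, NULL);
    if (rc != 0) {
        atomic_store(&g_ace_running, false);  /* run without learning */
        return rc;
    }
    return 0;
}

/* Hypothetical teardown, mainly useful in tests. */
void hakmem_shutdown(void) {
    atomic_store_explicit(&g_ace_running, false, memory_order_release);
    pthread_join(g_ace_thread, NULL);
}
```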

## Diagnostics Implementation

### Queue Monitoring
- [ ] Implement drop rate calculation
- [ ] Add queue health metrics structure
- [ ] Periodic health checks

### Debug Flags
- [ ] POOL_DEBUG_CONTRACTS - contract validation
- [ ] POOL_DEBUG_DROPS - log dropped events
- [ ] Add contract violation counters
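Contract checks and violation counters could be compiled in only under `POOL_DEBUG_CONTRACTS`, as in this sketch. The `CONTRACT_CHECK` macro, the enum, and the message format are hypothetical. A `POOL_DEBUG_DROPS` log would follow the same pattern inside the drop branch of the push.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define POOL_DEBUG_CONTRACTS 1  /* normally set by the build, e.g. -DPOOL_DEBUG_CONTRACTS */

enum { CONTRACT_A, CONTRACT_B, CONTRACT_C, CONTRACT_D, CONTRACT_COUNT };

#ifdef POOL_DEBUG_CONTRACTS
static _Atomic uint64_t g_contract_violations[CONTRACT_COUNT];
/* Counts and logs a violation; never aborts, so debug builds keep running. */
#define CONTRACT_CHECK(id, cond)                                      \
    do {                                                              \
        if (!(cond)) {                                                \
            atomic_fetch_add_explicit(&g_contract_violations[id], 1,  \
                                      memory_order_relaxed);          \
            fprintf(stderr, "[contract %c] violated: %s (%s:%d)\n",   \
                    'A' + (id), #cond, __FILE__, __LINE__);           \
        }                                                             \
    } while (0)
#else
#define CONTRACT_CHECK(id, cond) ((void)0)
#endif
```

Counting rather than aborting lets `pool_print_diagnostics()` report a per-contract violation summary at the end of a run.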

### Runtime Diagnostics
- [ ] Implement pool_print_diagnostics()
- [ ] Per-class statistics
- [ ] Queue health report
- [ ] Contract violation summary

## Final Validation

### Performance
- [ ] Larson: 2.5M+ ops/s
- [ ] bench_random_mixed: 40M+ ops/s
- [ ] Background thread < 1% CPU
- [ ] Drop rate < 0.1%

### Correctness
- [ ] No memory leaks (Valgrind)
- [ ] Thread safety verified
- [ ] All contracts validated
- [ ] Stress test passes

### Code Quality
- [ ] Each box in separate .c file
- [ ] Clear API boundaries
- [ ] No cross-box includes
- [ ] < 1000 LOC total

## Sign-off Checklist

### Contract A (Queue Never Blocks)
- [ ] Verified ace_push_event() drops on full
- [ ] Drop tracking implemented
- [ ] No blocking operations in push path
- [ ] Approved by: _____________

### Contract B (Policy Scope Limited)
- [ ] ACE only adjusts next refill count
- [ ] No immediate actions
- [ ] Atomic reads only
- [ ] Approved by: _____________

### Contract C (Memory Ownership Clear)
- [ ] Ring buffer pre-allocated
- [ ] Events copied not moved
- [ ] No use-after-free possible
- [ ] Approved by: _____________

### Contract D (API Boundaries Enforced)
- [ ] Box files separate
- [ ] No improper includes
- [ ] Static functions where needed
- [ ] Approved by: _____________

## Notes

**Remember**: The goal is an ultra-simple hot path (5-6 instructions) with smart learning that never interferes with performance. When in doubt, favor simplicity and speed over completeness of telemetry.

**Key Principle**: "Learn only when the cache grows; push the event and let another thread handle it." Learning happens only during refill, and refill events are pushed asynchronously to another thread.