# Pool TLS + Learning Implementation Checklist

## Pre-Implementation Review

### Contract Understanding
- [ ] Read and understand all 4 contracts (A-D) in POOL_TLS_LEARNING_DESIGN.md
- [ ] Identify which contract applies to each code section
- [ ] Review enforcement strategies for each contract

## Phase 1: Ultra-Simple TLS Implementation

### Box 1: TLS Freelist (pool_tls.c)

#### Setup
- [ ] Create `core/pool_tls.c` and `core/pool_tls.h`
- [ ] Define TLS globals: `__thread void* g_tls_pool_head[POOL_SIZE_CLASSES]`
- [ ] Define TLS counts: `__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]`
- [ ] Define default refill counts array

#### Hot Path Implementation
- [ ] Implement `pool_alloc_fast()` - must be 5-6 instructions max
  - [ ] Pop from TLS freelist
  - [ ] Conditional header write (if enabled)
  - [ ] Call refill only on miss
- [ ] Implement `pool_free_fast()` - must be 5-6 instructions max
  - [ ] Header validation (if enabled)
  - [ ] Push to TLS freelist
  - [ ] Optional drain check

#### Contract D Validation
- [ ] Verify Box1 has NO learning code
- [ ] Verify Box1 has NO metrics collection
- [ ] Verify Box1 only exposes public API and internal chain installer
- [ ] No includes of ace_learning.h or pool_refill.h in pool_tls.c

#### Testing
- [ ] Unit test: Allocation/free correctness
- [ ] Performance test: Target 40-60M ops/s
- [ ] Verify hot path is < 10 instructions with objdump

### Box 2: Refill Engine (pool_refill.c)

#### Setup
- [ ] Create `core/pool_refill.c` and `core/pool_refill.h`
- [ ] Import only pool_tls.h public API
- [ ] Define refill statistics (miss streak, etc.)
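
The Box 1 hot path and its hand-off to the Box 2 refill engine, as listed in the checklist above, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. The values of `POOL_SIZE_CLASSES` and `DEFAULT_REFILL_COUNT`, the `class_size()` helper, the function signatures, and the use of `malloc()` as the backend are all assumptions made for the sketch. Header writes, header validation, and the drain check are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES    4   /* assumed value for this sketch */
#define DEFAULT_REFILL_COUNT 8   /* assumed Phase 1 default */

/* Per-thread freelist heads and counts, one per size class. */
static __thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
static __thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical class-to-size mapping: 64, 128, 256, 512 bytes. */
static size_t class_size(int cls) { return (size_t)64 << cls; }

/* Box 2 stand-in: batch-allocate from the backend (malloc here),
 * install all but one block as a chain in TLS, return the last one. */
static void* pool_refill_and_alloc(int cls) {
    size_t sz = class_size(cls);
    for (uint32_t i = 0; i + 1 < DEFAULT_REFILL_COUNT; i++) {
        void* blk = malloc(sz);
        if (!blk) break;
        *(void**)blk = g_tls_pool_head[cls];  /* link into freelist */
        g_tls_pool_head[cls] = blk;
        g_tls_pool_count[cls]++;
    }
    return malloc(sz);
}

/* Hot path: pop from the TLS freelist, call refill only on a miss. */
static inline void* pool_alloc_fast(int cls) {
    void* head = g_tls_pool_head[cls];
    if (head) {
        g_tls_pool_head[cls] = *(void**)head;
        g_tls_pool_count[cls]--;
        return head;
    }
    return pool_refill_and_alloc(cls);
}

/* Hot path: push the block back onto the TLS freelist. */
static inline void pool_free_fast(void* p, int cls) {
    /* Debug-only check; compiles out when NDEBUG is defined. */
    assert(p != NULL && cls >= 0 && cls < POOL_SIZE_CLASSES);
    *(void**)p = g_tls_pool_head[cls];
    g_tls_pool_head[cls] = p;
    g_tls_pool_count[cls]++;
}
```

The fast paths contain no learning or metrics code, in line with Contract D. Everything slower sits behind the single `pool_refill_and_alloc()` call.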

#### Refill Implementation
- [ ] Implement `pool_refill_and_alloc()`
  - [ ] Capture pre-refill state
  - [ ] Get refill count (default for Phase 1)
  - [ ] Batch allocate from backend
  - [ ] Install chain in TLS
  - [ ] Return first block

#### Contract B Validation
- [ ] Verify refill NEVER blocks waiting for policy
- [ ] Verify refill only reads atomic policy values
- [ ] No immediate cache manipulation

#### Contract C Validation
- [ ] Event created on stack
- [ ] Event data copied, not referenced
- [ ] No dynamic allocation for events

## Phase 2: Metrics Collection

### Metrics Addition
- [ ] Add hit/miss counters to TLS state
- [ ] Add miss streak tracking
- [ ] Instrument hot path (with ifdef guard)
- [ ] Implement `pool_print_stats()`

### Performance Validation
- [ ] Measure regression with metrics enabled
- [ ] Must be < 2% performance impact
- [ ] Verify counters are accurate

## Phase 3: Learning Integration

### Box 3: ACE Learning (ace_learning.c)

#### Setup
- [ ] Create `core/ace_learning.c` and `core/ace_learning.h`
- [ ] Pre-allocate event ring buffer: `RefillEvent g_event_pool[QUEUE_SIZE]`
- [ ] Initialize MPSC queue structure
- [ ] Define policy table: `_Atomic uint32_t g_refill_policies[CLASSES]`

#### MPSC Queue Implementation
- [ ] Implement `ace_push_event()`
  - [ ] Contract A: Check for full queue
  - [ ] Contract A: DROP if full (never block!)
  - [ ] Contract A: Track drops with counter
  - [ ] Contract C: COPY event to ring buffer
  - [ ] Use proper memory ordering
- [ ] Implement `ace_consume_events()`
  - [ ] Read events with acquire semantics
  - [ ] Process and release slots
  - [ ] Sleep when queue empty

#### Contract A Validation
- [ ] Push function NEVER blocks
- [ ] Drops are tracked
- [ ] Drop rate monitoring implemented
- [ ] Warning issued if drop rate > 1%

#### Contract B Validation
- [ ] ACE only writes to policy table
- [ ] No immediate actions taken
- [ ] No direct TLS manipulation
- [ ] No blocking operations

#### Contract C Validation
- [ ] Ring buffer pre-allocated
- [ ] Events copied, not moved
- [ ] No malloc/free in event path
- [ ] Clear slot ownership model

#### Contract D Validation
- [ ] ace_learning.c does NOT include pool_tls.h internals
- [ ] No direct calls to Box1 functions
- [ ] Only ace_push_event() exposed to Box2
- [ ] Make notify_learning() static in pool_refill.c

#### Learning Algorithm
- [ ] Implement UCB1 or similar
- [ ] Track per-class statistics
- [ ] Gradual policy adjustments
- [ ] Oscillation detection

### Integration Points

#### Box2 → Box3 Connection
- [ ] Add event creation in pool_refill_and_alloc()
- [ ] Call ace_push_event() after successful refill
- [ ] Make notify_learning() wrapper static

#### Box2 Policy Reading
- [ ] Replace DEFAULT_REFILL_COUNT with ace_get_refill_count()
- [ ] Atomic read of policy (no blocking)
- [ ] Fallback to default if no policy

#### Startup
- [ ] Launch learning thread in hakmem_init()
- [ ] Initialize policy table with defaults
- [ ] Verify thread starts successfully

## Diagnostics Implementation

### Queue Monitoring
- [ ] Implement drop rate calculation
- [ ] Add queue health metrics structure
- [ ] Periodic health checks

### Debug Flags
- [ ] POOL_DEBUG_CONTRACTS - contract validation
- [ ] POOL_DEBUG_DROPS - log dropped events
- [ ] Add contract violation counters

### Runtime Diagnostics
- [ ] Implement pool_print_diagnostics()
  - [ ] Per-class statistics
  - [ ] Queue health report
  - [ ] Contract violation summary

## Final Validation

### Performance
- [ ] Larson: 2.5M+ ops/s
- [ ] bench_random_mixed: 40M+ ops/s
- [ ] Background thread < 1% CPU
- [ ] Drop rate < 0.1%

### Correctness
- [ ] No memory leaks (Valgrind)
- [ ] Thread safety verified
- [ ] All contracts validated
- [ ] Stress test passes

### Code Quality
- [ ] Each box in separate .c file
- [ ] Clear API boundaries
- [ ] No cross-box includes
- [ ] < 1000 LOC total

## Sign-off Checklist

### Contract A (Queue Never Blocks)
- [ ] Verified ace_push_event() drops on full
- [ ] Drop tracking implemented
- [ ] No blocking operations in push path
- [ ] Approved by: _____________

### Contract B (Policy Scope Limited)
- [ ] ACE only adjusts next refill count
- [ ] No immediate actions
- [ ] Atomic reads only
- [ ] Approved by: _____________

### Contract C (Memory Ownership Clear)
- [ ] Ring buffer pre-allocated
- [ ] Events copied, not moved
- [ ] No use-after-free possible
- [ ] Approved by: _____________

### Contract D (API Boundaries Enforced)
- [ ] Box files separate
- [ ] No improper includes
- [ ] Static functions where needed
- [ ] Approved by: _____________

## Notes

**Remember**: The goal is an ultra-simple hot path (5-6 instructions) with smart learning that never interferes with performance. When in doubt, favor simplicity and speed over completeness of telemetry.

**Key Principle**: "Learn only when the cache grows; push the event and leave the rest to another thread." Learning happens only during refill, and events are pushed asynchronously to another thread.
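
The key principle, together with Contracts A, B, and C, can be sketched as a bounded queue whose producers drop events when it is full, plus an atomic policy read. This is one possible shape and not the design's confirmed implementation. It uses a sequence-numbered ring (a common bounded-queue technique) restricted to a single consumer. The `RefillEvent` fields, `ace_queue_init()`, `ace_pop_event()`, `g_drops`, and the constant values are hypothetical names chosen for illustration. The producer's retry loop repeats only when another producer has made progress, so it never sleeps or waits on the consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 8            /* assumed; must be a power of two */
#define CLASSES 4               /* assumed */
#define DEFAULT_REFILL_COUNT 8  /* assumed */
static_assert((QUEUE_SIZE & (QUEUE_SIZE - 1)) == 0, "power of two");

/* Hypothetical event layout; plain data so it can be copied by value. */
typedef struct { uint32_t class_idx, refill_count, miss_streak; } RefillEvent;
typedef struct { _Atomic size_t seq; RefillEvent ev; } Slot;

static Slot g_event_pool[QUEUE_SIZE];     /* pre-allocated (Contract C) */
static _Atomic size_t g_tail;             /* shared by all producers */
static size_t g_head;                     /* owned by the one consumer */
static _Atomic uint64_t g_drops;          /* drop counter (Contract A) */
static _Atomic uint32_t g_refill_policies[CLASSES];

static void ace_queue_init(void) {
    for (size_t i = 0; i < QUEUE_SIZE; i++)
        atomic_store_explicit(&g_event_pool[i].seq, i, memory_order_relaxed);
}

/* Producer side: copy the event into a free slot, or drop if full. */
static bool ace_push_event(const RefillEvent* ev) {
    size_t pos = atomic_load_explicit(&g_tail, memory_order_relaxed);
    for (;;) {
        Slot* s = &g_event_pool[pos & (QUEUE_SIZE - 1)];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) {
            /* Slot is free: try to claim it. On failure pos is reloaded. */
            if (atomic_compare_exchange_weak_explicit(
                    &g_tail, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                s->ev = *ev;  /* COPY, never store the caller's pointer */
                atomic_store_explicit(&s->seq, pos + 1, memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            /* Queue full: count the drop and return immediately. */
            atomic_fetch_add_explicit(&g_drops, 1, memory_order_relaxed);
            return false;
        } else {
            pos = atomic_load_explicit(&g_tail, memory_order_relaxed);
        }
    }
}

/* Consumer side (learning thread only): read with acquire, release slot. */
static bool ace_pop_event(RefillEvent* out) {
    Slot* s = &g_event_pool[g_head & (QUEUE_SIZE - 1)];
    if (atomic_load_explicit(&s->seq, memory_order_acquire) != g_head + 1)
        return false;  /* empty, or the producer has not published yet */
    *out = s->ev;
    atomic_store_explicit(&s->seq, g_head + QUEUE_SIZE, memory_order_release);
    g_head++;
    return true;
}

/* Box 2 policy read: one atomic load, falling back to the default. */
static uint32_t ace_get_refill_count(uint32_t cls) {
    uint32_t n = atomic_load_explicit(&g_refill_policies[cls],
                                      memory_order_relaxed);
    return n ? n : DEFAULT_REFILL_COUNT;
}
```

In this sketch the refill path would build a `RefillEvent` on its stack and call `ace_push_event()` after a successful refill. The learning thread would drain the queue with `ace_pop_event()` and write only to `g_refill_policies`.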