hakmem/docs/design/POOL_IMPLEMENTATION_CHECKLIST.md
Moe Charm (CI) a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

ENV variables removed (BG/HotMag):
- core/hakmem_tiny_init.inc: removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: removed BG spill ENV
- core/tiny_refill.h: BG remote replaced with fixed values
- core/hakmem_tiny_slow.inc: removed BG refs

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

Documentation cleanup:
- Deleted 328 markdown files (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M; stable)
- No functional impact from the ENV cleanup
- Some debug output remains (to be handled in the next phase)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00

# Pool TLS + Learning Implementation Checklist
## Pre-Implementation Review
### Contract Understanding
- [ ] Read and understand all 4 contracts (A-D) in POOL_TLS_LEARNING_DESIGN.md
- [ ] Identify which contract applies to each code section
- [ ] Review enforcement strategies for each contract
## Phase 1: Ultra-Simple TLS Implementation
### Box 1: TLS Freelist (pool_tls.c)
#### Setup
- [ ] Create `core/pool_tls.c` and `core/pool_tls.h`
- [ ] Define TLS globals: `__thread void* g_tls_pool_head[POOL_SIZE_CLASSES]`
- [ ] Define TLS counts: `__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]`
- [ ] Define default refill counts array
#### Hot Path Implementation
- [ ] Implement `pool_alloc_fast()` - must be 5-6 instructions max
  - [ ] Pop from TLS freelist
  - [ ] Conditional header write (if enabled)
  - [ ] Call refill only on miss
- [ ] Implement `pool_free_fast()` - must be 5-6 instructions max
  - [ ] Header validation (if enabled)
  - [ ] Push to TLS freelist
  - [ ] Optional drain check
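
A minimal sketch of the Box 1 hot path listed above, assuming a singly linked freelist threaded through the first word of each free block. `POOL_SIZE_CLASSES` is given an arbitrary value here, `pool_refill_stub()` is a hypothetical stand-in for the call into Box 2, and the optional header write, header validation, and drain check are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE_CLASSES 8 /* assumed value, for illustration only */

/* Per-thread freelist heads and counts, as named in the Setup items. */
__thread void    *g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical slow path: a real build would call into Box 2 here. */
static void *pool_refill_stub(int class_idx) {
    (void)class_idx;
    return NULL;
}

/* Pop the head block; fall back to the refill path only on a miss. */
static inline void *pool_alloc_fast(int class_idx) {
    void *head = g_tls_pool_head[class_idx];
    if (__builtin_expect(head != NULL, 1)) {
        g_tls_pool_head[class_idx] = *(void **)head; /* next pointer lives in the block */
        g_tls_pool_count[class_idx]--;
        return head;
    }
    return pool_refill_stub(class_idx);
}

/* Push the block back onto this thread's freelist. */
static inline void pool_free_fast(void *ptr, int class_idx) {
    *(void **)ptr = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;
}
```

The list is LIFO, so the most recently freed block is the next one handed out. Blocks must be at least pointer-sized and pointer-aligned for the embedded next pointer to be valid.
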
#### Contract D Validation
- [ ] Verify Box1 has NO learning code
- [ ] Verify Box1 has NO metrics collection
- [ ] Verify Box1 only exposes public API and internal chain installer
- [ ] No includes of ace_learning.h or pool_refill.h in pool_tls.c
#### Testing
- [ ] Unit test: Allocation/free correctness
- [ ] Performance test: Target 40-60M ops/s
- [ ] Verify hot path is < 10 instructions with objdump
### Box 2: Refill Engine (pool_refill.c)
#### Setup
- [ ] Create `core/pool_refill.c` and `core/pool_refill.h`
- [ ] Import only pool_tls.h public API
- [ ] Define refill statistics (miss streak, etc.)
#### Refill Implementation
- [ ] Implement `pool_refill_and_alloc()`
  - [ ] Capture pre-refill state
  - [ ] Get refill count (default for Phase 1)
  - [ ] Batch allocate from backend
  - [ ] Install chain in TLS
  - [ ] Return first block
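
The refill steps above could be sketched roughly as follows. Everything except `pool_refill_and_alloc()`, `DEFAULT_REFILL_COUNT`, and the two TLS arrays is an assumption made for illustration: `pool_class_size()`, `backend_alloc_chain()` (a `malloc`-backed stand-in for the real backend), and `pool_install_chain()` (standing in for Box 1's internal chain installer) are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES    8   /* assumed */
#define DEFAULT_REFILL_COUNT 16u /* assumed Phase 1 default */

__thread void    *g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical class-to-size mapping: 64, 128, 256, ... bytes. */
static size_t pool_class_size(int class_idx) {
    return (size_t)64 << class_idx;
}

/* Hypothetical backend: carve `count` blocks out of one malloc'd slab and
 * link them through their first word. Returns the chain head or NULL.
 * The slab is never freed in this sketch. */
static void *backend_alloc_chain(int class_idx, uint32_t count) {
    size_t sz = pool_class_size(class_idx);
    char *slab = malloc(sz * count);
    if (slab == NULL) return NULL;
    for (uint32_t i = 0; i + 1 < count; i++)
        *(void **)(slab + i * sz) = slab + (i + 1) * sz;
    *(void **)(slab + (count - 1) * sz) = NULL;
    return slab;
}

/* Stand-in for Box 1's chain installer. Called only on a miss, so the
 * existing list is assumed to be empty and is simply overwritten. */
static void pool_install_chain(int class_idx, void *chain, uint32_t count) {
    g_tls_pool_head[class_idx]  = chain;
    g_tls_pool_count[class_idx] = count;
}

void *pool_refill_and_alloc(int class_idx) {
    uint32_t pre_count = g_tls_pool_count[class_idx]; /* pre-refill state */
    (void)pre_count; /* Phase 3 would copy this into a learning event */

    uint32_t want = DEFAULT_REFILL_COUNT; /* fixed default in Phase 1 */
    void *first = backend_alloc_chain(class_idx, want);
    if (first == NULL) return NULL;

    /* Keep the first block for the caller, install the rest in TLS. */
    pool_install_chain(class_idx, *(void **)first, want - 1);
    return first;
}
```

The caller receives one block and the remaining `DEFAULT_REFILL_COUNT - 1` blocks sit on the thread's freelist, so the next allocations of that class hit the fast path.
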
#### Contract B Validation
- [ ] Verify refill NEVER blocks waiting for policy
- [ ] Verify refill only reads atomic policy values
- [ ] No immediate cache manipulation
#### Contract C Validation
- [ ] Event created on stack
- [ ] Event data copied, not referenced
- [ ] No dynamic allocation for events
## Phase 2: Metrics Collection
### Metrics Addition
- [ ] Add hit/miss counters to TLS state
- [ ] Add miss streak tracking
- [ ] Instrument hot path (with ifdef guard)
- [ ] Implement `pool_print_stats()`
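
One way to keep the metrics out of release builds is to route every counter update through macros that compile to nothing when the guard is off. The sketch below assumes a hypothetical `POOL_METRICS` guard, a hypothetical `PoolTlsStats` layout, and hypothetical `POOL_STAT_HIT`/`POOL_STAT_MISS` macros; only `pool_print_stats()` is named in the checklist.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE_CLASSES 8 /* assumed */
#define POOL_METRICS      1 /* hypothetical guard; set to 0 to compile metrics out */

#if POOL_METRICS
typedef struct {
    uint64_t hits;
    uint64_t misses;
    uint32_t miss_streak;     /* consecutive misses since the last hit */
    uint32_t max_miss_streak; /* longest streak seen so far */
} PoolTlsStats;

static __thread PoolTlsStats g_tls_pool_stats[POOL_SIZE_CLASSES];

#define POOL_STAT_HIT(c) do {                          \
        g_tls_pool_stats[(c)].hits++;                  \
        g_tls_pool_stats[(c)].miss_streak = 0;         \
    } while (0)

#define POOL_STAT_MISS(c) do {                         \
        PoolTlsStats *s_ = &g_tls_pool_stats[(c)];     \
        s_->misses++;                                  \
        if (++s_->miss_streak > s_->max_miss_streak)   \
            s_->max_miss_streak = s_->miss_streak;     \
    } while (0)
#else
#define POOL_STAT_HIT(c)  ((void)0)
#define POOL_STAT_MISS(c) ((void)0)
#endif

/* Print the calling thread's counters; a no-op when metrics are disabled. */
void pool_print_stats(void) {
#if POOL_METRICS
    for (int c = 0; c < POOL_SIZE_CLASSES; c++) {
        const PoolTlsStats *s = &g_tls_pool_stats[c];
        if (s->hits + s->misses == 0) continue; /* skip unused classes */
        fprintf(stderr,
                "class %d: hits=%" PRIu64 " misses=%" PRIu64
                " max_miss_streak=%" PRIu32 "\n",
                c, s->hits, s->misses, s->max_miss_streak);
    }
#endif
}
```

Because the counters are thread-local, the hot path needs no atomics; a process-wide report would have to aggregate across threads, which this sketch does not attempt.
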
### Performance Validation
- [ ] Measure regression with metrics enabled
- [ ] Must be < 2% performance impact
- [ ] Verify counters are accurate
## Phase 3: Learning Integration
### Box 3: ACE Learning (ace_learning.c)
#### Setup
- [ ] Create `core/ace_learning.c` and `core/ace_learning.h`
- [ ] Pre-allocate event ring buffer: `RefillEvent g_event_pool[QUEUE_SIZE]`
- [ ] Initialize MPSC queue structure
- [ ] Define policy table: `_Atomic uint32_t g_refill_policies[CLASSES]`
#### MPSC Queue Implementation
- [ ] Implement `ace_push_event()`
  - [ ] Contract A: Check for full queue
  - [ ] Contract A: DROP if full (never block!)
  - [ ] Contract A: Track drops with counter
  - [ ] Contract C: COPY event to ring buffer
  - [ ] Use proper memory ordering
- [ ] Implement `ace_consume_events()`
  - [ ] Read events with acquire semantics
  - [ ] Process and release slots
  - [ ] Sleep when queue empty
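
As an illustration of the queue items above, here is a sketch of a bounded multi-producer, single-consumer ring that drops when full. It uses per-slot sequence numbers (the well-known bounded-queue scheme by Dmitry Vyukov), which is a technique chosen for this example and not necessarily what the design doc prescribes. The `RefillEvent` fields, `EventSlot`, `ace_queue_init()`, `g_ace_dropped`, and the `ace_consume_events()` signature are all assumptions; the sketch also wraps each event in a slot instead of using a bare `RefillEvent` array.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 8u /* assumed; must be a power of two */
#define QUEUE_MASK (QUEUE_SIZE - 1u)

typedef struct { /* hypothetical event payload */
    uint32_t class_idx, refill_count, miss_streak;
} RefillEvent;

typedef struct { /* slot = event + sequence number that encodes ownership */
    _Atomic size_t seq;
    RefillEvent    ev;
} EventSlot;

static EventSlot        g_event_pool[QUEUE_SIZE]; /* pre-allocated, no malloc */
static _Atomic size_t   g_enqueue_pos;            /* shared by all producers */
static size_t           g_dequeue_pos;            /* single consumer only */
static _Atomic uint64_t g_ace_dropped;            /* Contract A drop counter */

void ace_queue_init(void) {
    for (size_t i = 0; i < QUEUE_SIZE; i++)
        atomic_store_explicit(&g_event_pool[i].seq, i, memory_order_relaxed);
    atomic_store_explicit(&g_enqueue_pos, 0, memory_order_relaxed);
    atomic_store_explicit(&g_ace_dropped, 0, memory_order_relaxed);
    g_dequeue_pos = 0;
}

/* Returns true if the event was copied into the ring, false if dropped. */
bool ace_push_event(const RefillEvent *ev) {
    size_t pos = atomic_load_explicit(&g_enqueue_pos, memory_order_relaxed);
    for (;;) {
        EventSlot *s = &g_event_pool[pos & QUEUE_MASK];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) { /* slot is free for this position: try to claim it */
            if (atomic_compare_exchange_weak_explicit(
                    &g_enqueue_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                s->ev = *ev; /* Contract C: copy, never keep the pointer */
                atomic_store_explicit(&s->seq, pos + 1, memory_order_release);
                return true;
            } /* CAS failure reloaded pos; retry with the new value */
        } else if (diff < 0) { /* consumer has not released this slot: full */
            atomic_fetch_add_explicit(&g_ace_dropped, 1, memory_order_relaxed);
            return false;
        } else { /* another producer claimed this position first */
            pos = atomic_load_explicit(&g_enqueue_pos, memory_order_relaxed);
        }
    }
}

/* Drain up to `max` events into `out`; returns how many were copied.
 * A return of 0 means the queue is empty and the caller may sleep. */
size_t ace_consume_events(RefillEvent *out, size_t max) {
    size_t n = 0;
    while (n < max) {
        EventSlot *s = &g_event_pool[g_dequeue_pos & QUEUE_MASK];
        size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
        if (seq != g_dequeue_pos + 1) break; /* nothing published here yet */
        out[n++] = s->ev;
        atomic_store_explicit(&s->seq, g_dequeue_pos + QUEUE_SIZE,
                              memory_order_release); /* release the slot */
        g_dequeue_pos++;
    }
    return n;
}
```

The push path takes no lock and never waits for the consumer, but it is lock-free, not wait-free: a producer can retry when another producer wins the compare-and-swap. If a strict instruction bound matters more than delivery, treating a failed compare-and-swap as a drop is one possible variation.
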
#### Contract A Validation
- [ ] Push function NEVER blocks
- [ ] Drops are tracked
- [ ] Drop rate monitoring implemented
- [ ] Warning issued if drop rate > 1%
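
Drop-rate monitoring can be as small as two counters and a ratio. The following sketch assumes hypothetical counters `g_ace_pushed` and `g_ace_dropped` that the push path would increment, and hypothetical helpers `ace_drop_rate()` and `ace_check_drop_rate()`; the 1% threshold comes from the checklist item above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ACE_DROP_WARN_RATE 0.01 /* warn above 1%, per the checklist */

/* Hypothetical counters, incremented by the push path. */
static _Atomic uint64_t g_ace_pushed;  /* events accepted into the queue */
static _Atomic uint64_t g_ace_dropped; /* events dropped because it was full */

/* Fraction of push attempts that were dropped, in [0, 1]. */
double ace_drop_rate(void) {
    uint64_t ok   = atomic_load_explicit(&g_ace_pushed, memory_order_relaxed);
    uint64_t lost = atomic_load_explicit(&g_ace_dropped, memory_order_relaxed);
    uint64_t total = ok + lost;
    return total ? (double)lost / (double)total : 0.0;
}

/* Returns true and logs a warning when the drop rate exceeds the threshold. */
bool ace_check_drop_rate(void) {
    double rate = ace_drop_rate();
    if (rate > ACE_DROP_WARN_RATE) {
        fprintf(stderr, "[ace] warning: drop rate %.2f%% exceeds %.2f%%\n",
                rate * 100.0, ACE_DROP_WARN_RATE * 100.0);
        return true;
    }
    return false;
}
```

The two loads are not taken as one atomic snapshot, so the ratio is approximate under concurrent pushes, which is normally acceptable for a health metric.
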
#### Contract B Validation
- [ ] ACE only writes to policy table
- [ ] No immediate actions taken
- [ ] No direct TLS manipulation
- [ ] No blocking operations
#### Contract C Validation
- [ ] Ring buffer pre-allocated
- [ ] Events copied, not moved
- [ ] No malloc/free in event path
- [ ] Clear slot ownership model
#### Contract D Validation
- [ ] ace_learning.c does NOT include pool_tls.h internals
- [ ] No direct calls to Box1 functions
- [ ] Only ace_push_event() exposed to Box2
- [ ] Make notify_learning() static in pool_refill.c
#### Learning Algorithm
- [ ] Implement UCB1 or similar
- [ ] Track per-class statistics
- [ ] Gradual policy adjustments
- [ ] Oscillation detection
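
To illustrate the "gradual adjustment" and "oscillation detection" items, here is a deliberately simple step heuristic. It is not UCB1; it only shows one possible shape for per-class state and a damped update. All names and constants (`AceClassState`, `ace_adjust_policy()`, the `ACE_*` thresholds) are hypothetical.

```c
#include <stdint.h>

#define ACE_MIN_REFILL  4u
#define ACE_MAX_REFILL  256u
#define ACE_GROW_STREAK 3u /* miss streak at or above this suggests growing */
#define ACE_MAX_FLIPS   3u /* direction reversals treated as oscillation */
#define ACE_HOLD_ROUNDS 8u /* rounds to freeze the policy after oscillation */

typedef struct {
    uint32_t refill_count; /* current policy for this class */
    int      last_dir;     /* direction of the last change: -1, 0, +1 */
    uint32_t flips;        /* consecutive direction reversals seen */
    uint32_t hold;         /* remaining rounds to keep the policy frozen */
} AceClassState;

/* Returns the (possibly unchanged) refill count after one observation. */
uint32_t ace_adjust_policy(AceClassState *s, uint32_t miss_streak) {
    if (s->hold > 0) { /* frozen after an oscillation */
        s->hold--;
        return s->refill_count;
    }

    int dir = (miss_streak >= ACE_GROW_STREAK) ? +1
            : (miss_streak == 0)               ? -1
            : 0;
    if (dir == 0) return s->refill_count; /* no clear signal */

    if (s->last_dir != 0 && dir != s->last_dir) {
        if (++s->flips >= ACE_MAX_FLIPS) { /* grow/shrink ping-pong detected */
            s->flips = 0;
            s->last_dir = 0;
            s->hold = ACE_HOLD_ROUNDS;
            return s->refill_count;
        }
    } else {
        s->flips = 0;
    }
    s->last_dir = dir;

    /* Gradual change: move by a quarter of the current value, at least 1. */
    uint32_t step = s->refill_count / 4;
    if (step == 0) step = 1;
    uint32_t next;
    if (dir > 0)
        next = s->refill_count + step;
    else
        next = (s->refill_count > step) ? s->refill_count - step : ACE_MIN_REFILL;

    if (next < ACE_MIN_REFILL) next = ACE_MIN_REFILL;
    if (next > ACE_MAX_REFILL) next = ACE_MAX_REFILL;
    s->refill_count = next;
    return next;
}
```

The learning thread would publish the returned value to the policy table; the refill path never calls this function directly, which keeps Contract B intact.
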
### Integration Points
#### Box2 → Box3 Connection
- [ ] Add event creation in pool_refill_and_alloc()
- [ ] Call ace_push_event() after successful refill
- [ ] Make notify_learning() wrapper static
#### Box2 Policy Reading
- [ ] Replace DEFAULT_REFILL_COUNT with ace_get_refill_count()
- [ ] Atomic read of policy (no blocking)
- [ ] Fallback to default if no policy
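
The policy read in Box 2 might look like the sketch below. It assumes that a stored value of 0 means "no policy yet", which is a convention invented for this example, and it adds a hypothetical `ace_set_refill_count()` writer for the learning thread. `CLASSES` and `DEFAULT_REFILL_COUNT` are given arbitrary values.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CLASSES              8   /* assumed */
#define DEFAULT_REFILL_COUNT 16u /* assumed */

/* Policy table from the Box 3 setup; 0 is treated as "no policy yet". */
_Atomic uint32_t g_refill_policies[CLASSES];

/* Box 2 side: a single relaxed load, no locks, no waiting. */
static inline uint32_t ace_get_refill_count(int class_idx) {
    uint32_t n = atomic_load_explicit(&g_refill_policies[class_idx],
                                      memory_order_relaxed);
    return n ? n : DEFAULT_REFILL_COUNT;
}

/* Box 3 side (hypothetical helper): the only writer of the table. */
static inline void ace_set_refill_count(int class_idx, uint32_t n) {
    atomic_store_explicit(&g_refill_policies[class_idx], n,
                          memory_order_relaxed);
}
```

Relaxed ordering is enough here because the refill count is a standalone hint: a refill that reads a slightly stale value is still correct, only less well tuned.
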
#### Startup
- [ ] Launch learning thread in hakmem_init()
- [ ] Initialize policy table with defaults
- [ ] Verify thread starts successfully
## Diagnostics Implementation
### Queue Monitoring
- [ ] Implement drop rate calculation
- [ ] Add queue health metrics structure
- [ ] Periodic health checks
### Debug Flags
- [ ] POOL_DEBUG_CONTRACTS - contract validation
- [ ] POOL_DEBUG_DROPS - log dropped events
- [ ] Add contract violation counters
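
A possible shape for the debug flag and violation counters is sketched below. `POOL_DEBUG_CONTRACTS` is the flag named above; the `PoolContract` enum, the `g_contract_violations` array, and the `POOL_CONTRACT_CHECK()` macro are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#ifndef POOL_DEBUG_CONTRACTS
#define POOL_DEBUG_CONTRACTS 1 /* enabled by default in this sketch */
#endif

typedef enum {
    CONTRACT_A, /* queue never blocks */
    CONTRACT_B, /* policy scope limited */
    CONTRACT_C, /* memory ownership clear */
    CONTRACT_D, /* API boundaries enforced */
    CONTRACT_COUNT
} PoolContract;

/* One violation counter per contract, readable by diagnostics code. */
static _Atomic uint64_t g_contract_violations[CONTRACT_COUNT];

#if POOL_DEBUG_CONTRACTS
/* Count and log a violation when `cond` is false; never aborts. */
#define POOL_CONTRACT_CHECK(id, cond) do {                                  \
        if (!(cond)) {                                                      \
            atomic_fetch_add_explicit(&g_contract_violations[(id)], 1,      \
                                      memory_order_relaxed);                \
            fprintf(stderr, "[pool] contract %c violated: %s (%s:%d)\n",    \
                    'A' + (int)(id), #cond, __FILE__, __LINE__);            \
        }                                                                   \
    } while (0)
#else
#define POOL_CONTRACT_CHECK(id, cond) ((void)0)
#endif
```

A call such as `POOL_CONTRACT_CHECK(CONTRACT_C, event_is_copy)` then costs nothing when the flag is 0, and the counters can feed the contract violation summary in the runtime diagnostics.
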
### Runtime Diagnostics
- [ ] Implement `pool_print_diagnostics()`
  - [ ] Per-class statistics
  - [ ] Queue health report
  - [ ] Contract violation summary
## Final Validation
### Performance
- [ ] Larson: 2.5M+ ops/s
- [ ] bench_random_mixed: 40M+ ops/s
- [ ] Background thread < 1% CPU
- [ ] Drop rate < 0.1%
### Correctness
- [ ] No memory leaks (Valgrind)
- [ ] Thread safety verified
- [ ] All contracts validated
- [ ] Stress test passes
### Code Quality
- [ ] Each box in separate .c file
- [ ] Clear API boundaries
- [ ] No cross-box includes
- [ ] < 1000 LOC total
## Sign-off Checklist
### Contract A (Queue Never Blocks)
- [ ] Verified ace_push_event() drops on full
- [ ] Drop tracking implemented
- [ ] No blocking operations in push path
- [ ] Approved by: _____________
### Contract B (Policy Scope Limited)
- [ ] ACE only adjusts next refill count
- [ ] No immediate actions
- [ ] Atomic reads only
- [ ] Approved by: _____________
### Contract C (Memory Ownership Clear)
- [ ] Ring buffer pre-allocated
- [ ] Events copied not moved
- [ ] No use-after-free possible
- [ ] Approved by: _____________
### Contract D (API Boundaries Enforced)
- [ ] Box files separate
- [ ] No improper includes
- [ ] Static functions where needed
- [ ] Approved by: _____________
## Notes
**Remember**: The goal is an ultra-simple hot path (5-6 instructions) with smart learning that never interferes with performance. When in doubt, favor simplicity and speed over completeness of telemetry.
**Key Principle**: "Learn only when the cache grows: push the event and leave the rest to another thread." In other words, learning happens only during refill, and the event is pushed asynchronously to another thread.