Pool TLS + Learning Implementation Checklist
Pre-Implementation Review
Contract Understanding
- Read and understand all 4 contracts (A-D) in POOL_TLS_LEARNING_DESIGN.md
- Identify which contract applies to each code section
- Review enforcement strategies for each contract
Phase 1: Ultra-Simple TLS Implementation
Box 1: TLS Freelist (pool_tls.c)
Setup
- Create core/pool_tls.c and core/pool_tls.h
- Define TLS globals: __thread void* g_tls_pool_head[POOL_SIZE_CLASSES]
- Define TLS counts: __thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]
- Define default refill counts array
Hot Path Implementation
- Implement pool_alloc_fast() - must be 5-6 instructions max
  - Pop from TLS freelist
  - Conditional header write (if enabled)
  - Call refill only on miss
- Implement pool_free_fast() - must be 5-6 instructions max
  - Header validation (if enabled)
  - Push to TLS freelist
  - Optional drain check
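The pop/push pair above can be sketched as an intrusive singly linked list, where each free block stores the pointer to the next free block in its first word. This is a minimal illustration, not the project's actual code: the value of POOL_SIZE_CLASSES, the 64-byte block size, and pool_refill_stub() are assumptions made so the sketch compiles on its own, and the optional header write/validation and drain check are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES 8   /* assumed value; the checklist does not fix it */

/* TLS state named in the Setup list. */
__thread void    *g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical stand-in for the Box 2 slow path; a real refill would
 * batch-allocate and install a chain instead of calling malloc. */
static void *pool_refill_stub(int class_idx) {
    (void)class_idx;
    return malloc(64);   /* assumed block size, large enough for a pointer */
}

/* Hot path: pop the head of the per-thread freelist, refill only on a miss. */
static inline void *pool_alloc_fast(int class_idx) {
    void *head = g_tls_pool_head[class_idx];
    if (head != NULL) {
        g_tls_pool_head[class_idx] = *(void **)head;   /* next pointer lives in the block */
        g_tls_pool_count[class_idx]--;
        return head;
    }
    return pool_refill_stub(class_idx);
}

/* Hot path: push the block back onto the per-thread freelist. */
static inline void pool_free_fast(void *ptr, int class_idx) {
    *(void **)ptr = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;
}
```

Because the freelist is thread-local, neither function needs atomics or locks, which is what keeps the instruction count small enough to check with objdump.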
Contract D Validation
- Verify Box1 has NO learning code
- Verify Box1 has NO metrics collection
- Verify Box1 only exposes public API and internal chain installer
- No includes of ace_learning.h or pool_refill.h in pool_tls.c
Testing
- Unit test: Allocation/free correctness
- Performance test: Target 40-60M ops/s
- Verify hot path is < 10 instructions with objdump
Box 2: Refill Engine (pool_refill.c)
Setup
- Create core/pool_refill.c and core/pool_refill.h
- Import only pool_tls.h public API
- Define refill statistics (miss streak, etc.)
Refill Implementation
- Implement pool_refill_and_alloc()
  - Capture pre-refill state
  - Get refill count (default for Phase 1)
  - Batch allocate from backend
  - Install chain in TLS
  - Return first block
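The five refill steps above might look roughly like the following. Everything outside the names used in the checklist is an assumption for illustration: backend_batch_alloc() and pool_install_chain() are hypothetical helpers, the backend simply carves blocks out of one malloc'd slab, and the constants are placeholder values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES    8    /* assumed */
#define DEFAULT_REFILL_COUNT 16   /* assumed Phase 1 default */
#define BLOCK_SIZE           64   /* assumed: one fixed block size for the sketch */

__thread void    *g_tls_pool_head[POOL_SIZE_CLASSES];
__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical backend: carve `count` blocks from one slab and link them
 * through their first word. Blocks are not individually free()-able. */
static void *backend_batch_alloc(uint32_t count) {
    char *slab = malloc((size_t)count * BLOCK_SIZE);
    if (slab == NULL) return NULL;
    for (uint32_t i = 0; i < count; i++) {
        void *next = (i + 1 < count) ? slab + (size_t)(i + 1) * BLOCK_SIZE : NULL;
        *(void **)(slab + (size_t)i * BLOCK_SIZE) = next;
    }
    return slab;
}

/* Stand-in for Box 1's internal chain installer. Refill runs only on a
 * miss, so the TLS list is assumed to be empty here. */
static void pool_install_chain(int class_idx, void *chain, uint32_t count) {
    g_tls_pool_head[class_idx]  = chain;
    g_tls_pool_count[class_idx] = count;
}

void *pool_refill_and_alloc(int class_idx) {
    uint32_t pre_count = g_tls_pool_count[class_idx];  /* 1. capture pre-refill state */
    (void)pre_count;                                   /*    (would feed a learning event later) */
    uint32_t want = DEFAULT_REFILL_COUNT;              /* 2. refill count (fixed in Phase 1) */
    void *chain = backend_batch_alloc(want);           /* 3. batch allocate */
    if (chain == NULL) return NULL;
    void *first = chain;
    pool_install_chain(class_idx, *(void **)first, want - 1);  /* 4. install the rest */
    return first;                                      /* 5. hand back the first block */
}
```

Keeping the first block out of the installed chain means the caller's allocation is satisfied without a second trip through the hot path.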
Contract B Validation
- Verify refill NEVER blocks waiting for policy
- Verify refill only reads atomic policy values
- No immediate cache manipulation
Contract C Validation
- Event created on stack
- Event data copied, not referenced
- No dynamic allocation for events
Phase 2: Metrics Collection
Metrics Addition
- Add hit/miss counters to TLS state
- Add miss streak tracking
- Instrument hot path (with ifdef guard)
- Implement pool_print_stats()
Performance Validation
- Measure regression with metrics enabled
- Must be < 2% performance impact
- Verify counters are accurate
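One way to keep the Phase 2 counters out of release builds is to hide them behind macros that compile to nothing when metrics are off. The sketch below assumes a flag named POOL_ENABLE_METRICS, a PoolTlsStats struct, and the POOL_STAT_HIT/POOL_STAT_MISS macro names; none of these come from the checklist, and pool_print_stats() here reports only the calling thread's counters.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE_CLASSES 8        /* assumed */

#ifndef POOL_ENABLE_METRICS        /* hypothetical build flag */
#define POOL_ENABLE_METRICS 1
#endif

#if POOL_ENABLE_METRICS
typedef struct {
    uint64_t hits;
    uint64_t misses;
    uint32_t miss_streak;          /* consecutive misses since the last hit */
} PoolTlsStats;

__thread PoolTlsStats g_tls_pool_stats[POOL_SIZE_CLASSES];

#define POOL_STAT_HIT(c)  do { g_tls_pool_stats[(c)].hits++; \
                               g_tls_pool_stats[(c)].miss_streak = 0; } while (0)
#define POOL_STAT_MISS(c) do { g_tls_pool_stats[(c)].misses++; \
                               g_tls_pool_stats[(c)].miss_streak++; } while (0)
#else
#define POOL_STAT_HIT(c)  ((void)0)
#define POOL_STAT_MISS(c) ((void)0)
#endif

/* Prints the calling thread's counters for every class that saw traffic. */
void pool_print_stats(void) {
#if POOL_ENABLE_METRICS
    for (int c = 0; c < POOL_SIZE_CLASSES; c++) {
        const PoolTlsStats *s = &g_tls_pool_stats[c];
        if (s->hits + s->misses == 0) continue;
        fprintf(stderr, "class %d: hits=%" PRIu64 " misses=%" PRIu64 " streak=%" PRIu32 "\n",
                c, s->hits, s->misses, s->miss_streak);
    }
#endif
}
```

Building once with the flag set to 0 and once with 1 gives the two binaries needed to measure the regression against the 2% budget.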
Phase 3: Learning Integration
Box 3: ACE Learning (ace_learning.c)
Setup
- Create core/ace_learning.c and core/ace_learning.h
- Pre-allocate event ring buffer: RefillEvent g_event_pool[QUEUE_SIZE]
- Initialize MPSC queue structure
- Define policy table: _Atomic uint32_t g_refill_policies[CLASSES]
MPSC Queue Implementation
- Implement ace_push_event()
  - Contract A: Check for full queue
  - Contract A: DROP if full (never block!)
  - Contract A: Track drops with counter
  - Contract C: COPY event to ring buffer
  - Use proper memory ordering
- Implement ace_consume_events()
  - Read events with acquire semantics
  - Process and release slots
  - Sleep when queue empty
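A drop-on-full MPSC ring can be sketched with C11 atomics using a per-slot sequence number, the bounded-queue scheme commonly attributed to Dmitry Vyukov. This is one possible design, not necessarily the one the project uses. The RefillEvent fields, the EventSlot wrapper around each event, ace_queue_init(), and the callback parameter of ace_consume_events() are all assumptions introduced for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 1024            /* assumed; must be a power of two */

typedef struct {                   /* field set is assumed */
    uint32_t class_idx;
    uint32_t refill_count;
    uint32_t miss_streak;
} RefillEvent;

typedef struct {
    _Atomic size_t seq;            /* slot state: == pos means free, == pos+1 means filled */
    RefillEvent    ev;
} EventSlot;

static EventSlot        g_event_pool[QUEUE_SIZE];   /* pre-allocated (Contract C) */
static _Atomic size_t   g_enqueue_pos;
static size_t           g_dequeue_pos;              /* touched by the single consumer only */
static _Atomic uint64_t g_dropped_events;

void ace_queue_init(void) {
    for (size_t i = 0; i < QUEUE_SIZE; i++)
        atomic_store_explicit(&g_event_pool[i].seq, i, memory_order_relaxed);
    atomic_store_explicit(&g_enqueue_pos, 0, memory_order_relaxed);
    g_dequeue_pos = 0;
}

/* Returns false and counts a drop when the ring is full; never waits for the consumer. */
bool ace_push_event(const RefillEvent *ev) {
    size_t pos = atomic_load_explicit(&g_enqueue_pos, memory_order_relaxed);
    for (;;) {
        EventSlot *slot = &g_event_pool[pos & (QUEUE_SIZE - 1)];
        size_t seq = atomic_load_explicit(&slot->seq, memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) {
            /* Slot is free: try to claim it. On failure pos is reloaded and we retry. */
            if (atomic_compare_exchange_weak_explicit(&g_enqueue_pos, &pos, pos + 1,
                                                      memory_order_relaxed,
                                                      memory_order_relaxed)) {
                slot->ev = *ev;    /* copy the event, keep no reference (Contract C) */
                atomic_store_explicit(&slot->seq, pos + 1, memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            /* Slot still holds an unconsumed event: queue is full, so drop (Contract A). */
            atomic_fetch_add_explicit(&g_dropped_events, 1, memory_order_relaxed);
            return false;
        } else {
            /* Another producer claimed this position first: move on. */
            pos = atomic_load_explicit(&g_enqueue_pos, memory_order_relaxed);
        }
    }
}

/* Drains everything currently queued; returns the number of events handled. */
size_t ace_consume_events(void (*handle)(const RefillEvent *)) {
    size_t n = 0;
    for (;;) {
        EventSlot *slot = &g_event_pool[g_dequeue_pos & (QUEUE_SIZE - 1)];
        size_t seq = atomic_load_explicit(&slot->seq, memory_order_acquire);
        if (seq != g_dequeue_pos + 1) break;          /* empty */
        RefillEvent ev = slot->ev;                    /* copy out before releasing the slot */
        atomic_store_explicit(&slot->seq, g_dequeue_pos + QUEUE_SIZE, memory_order_release);
        g_dequeue_pos++;
        if (handle != NULL) handle(&ev);
        n++;
    }
    return n;
}
```

The push path can retry its compare-and-swap when producers race, so it is lock-free rather than wait-free, but it never waits on the consumer. The learning thread would call ace_consume_events() in a loop and sleep whenever it returns 0.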
Contract A Validation
- Push function NEVER blocks
- Drops are tracked
- Drop rate monitoring implemented
- Warning issued if drop rate > 1%
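The drop-rate check can be computed from two counters, taking the rate as dropped events divided by all push attempts. The counter names and the ace_drop_rate()/ace_check_drop_rate() helpers below are hypothetical; only the 1% warning threshold comes from the checklist.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed counters: bumped by the push path on success and on drop. */
static _Atomic uint64_t g_events_pushed;
static _Atomic uint64_t g_events_dropped;

#define DROP_RATE_WARN 0.01   /* warn above 1%, as stated in the checklist */

/* Fraction of push attempts that were dropped; 0.0 when nothing was attempted. */
double ace_drop_rate(void) {
    uint64_t pushed  = atomic_load_explicit(&g_events_pushed,  memory_order_relaxed);
    uint64_t dropped = atomic_load_explicit(&g_events_dropped, memory_order_relaxed);
    uint64_t total   = pushed + dropped;
    return total ? (double)dropped / (double)total : 0.0;
}

/* Returns 1 and logs a warning if the drop rate exceeds the threshold, else 0. */
int ace_check_drop_rate(void) {
    double rate = ace_drop_rate();
    if (rate > DROP_RATE_WARN) {
        fprintf(stderr, "[ace] warning: event drop rate %.2f%% exceeds 1%%\n", rate * 100.0);
        return 1;
    }
    return 0;
}
```

Relaxed loads are enough here because the two counters are only read for reporting, so a slightly stale snapshot is acceptable.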
Contract B Validation
- ACE only writes to policy table
- No immediate actions taken
- No direct TLS manipulation
- No blocking operations
Contract C Validation
- Ring buffer pre-allocated
- Events copied, not moved
- No malloc/free in event path
- Clear slot ownership model
Contract D Validation
- ace_learning.c does NOT include pool_tls.h internals
- No direct calls to Box1 functions
- Only ace_push_event() exposed to Box2
- Make notify_learning() static in pool_refill.c
Learning Algorithm
- Implement UCB1 or similar
- Track per-class statistics
- Gradual policy adjustments
- Oscillation detection
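The "gradual adjustment" and "oscillation detection" items can be illustrated independently of how the target value is chosen. The sketch below is not UCB1: it is a simple step-limited heuristic that moves the refill count toward whatever target the learner proposes and holds the value still once the direction has reversed too many times in a row. All names, limits, and thresholds are assumptions.

```c
#include <stdint.h>

#define REFILL_MIN      8     /* assumed bounds */
#define REFILL_MAX      256
#define MAX_STEP        4     /* largest change per adjustment */
#define OSC_FLIP_LIMIT  2     /* consecutive reversals treated as oscillation */

typedef struct {
    uint32_t refill_count;    /* current policy for one size class */
    int      last_dir;        /* -1, 0, or +1: direction of the previous request */
    uint32_t flips;           /* consecutive direction reversals seen so far */
} ClassPolicyState;

/* Moves refill_count at most MAX_STEP toward target; holds it while oscillating. */
uint32_t policy_adjust(ClassPolicyState *st, uint32_t target) {
    int dir = (target > st->refill_count) - (target < st->refill_count);
    if (dir == 0) {                       /* already at target: reset history */
        st->flips = 0;
        st->last_dir = 0;
        return st->refill_count;
    }
    if (st->last_dir != 0 && dir != st->last_dir)
        st->flips++;                      /* direction reversed */
    else
        st->flips = 0;                    /* same direction: no oscillation */
    st->last_dir = dir;

    if (st->flips >= OSC_FLIP_LIMIT)
        return st->refill_count;          /* oscillating: hold the current value */

    uint32_t gap  = dir > 0 ? target - st->refill_count : st->refill_count - target;
    uint32_t step = gap < MAX_STEP ? gap : MAX_STEP;
    uint32_t next = dir > 0 ? st->refill_count + step : st->refill_count - step;
    if (next < REFILL_MIN) next = REFILL_MIN;
    if (next > REFILL_MAX) next = REFILL_MAX;
    st->refill_count = next;
    return next;
}
```

A bandit such as UCB1 would sit in front of this function and supply the target; the clamp and the hold keep any single decision from swinging the policy abruptly.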
Integration Points
Box2 → Box3 Connection
- Add event creation in pool_refill_and_alloc()
- Call ace_push_event() after successful refill
- Make notify_learning() wrapper static
Box2 Policy Reading
- Replace DEFAULT_REFILL_COUNT with ace_get_refill_count()
- Atomic read of policy (no blocking)
- Fallback to default if no policy
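The policy read on the refill side can be a single relaxed atomic load with a fallback. The sketch assumes that a stored value of 0 means "no policy yet" and that CLASSES and DEFAULT_REFILL_COUNT have the values shown; ace_set_refill_count() is a hypothetical writer included only to show the learner's side.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CLASSES              8    /* assumed */
#define DEFAULT_REFILL_COUNT 16   /* assumed */

/* Policy table from the Box 3 setup list; zero-initialized, and 0 is
 * treated here as "no policy published yet" (an assumed convention). */
_Atomic uint32_t g_refill_policies[CLASSES];

/* Box 2 side: one non-blocking atomic read, with a default fallback. */
static inline uint32_t ace_get_refill_count(int class_idx) {
    uint32_t n = atomic_load_explicit(&g_refill_policies[class_idx], memory_order_relaxed);
    return n != 0 ? n : DEFAULT_REFILL_COUNT;
}

/* Box 3 side (hypothetical helper): the learner only ever writes the table. */
static inline void ace_set_refill_count(int class_idx, uint32_t n) {
    atomic_store_explicit(&g_refill_policies[class_idx], n, memory_order_relaxed);
}
```

Relaxed ordering is sufficient under this design because the value is a self-contained hint: a refill that reads a slightly stale count is still correct, just less tuned.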
Startup
- Launch learning thread in hakmem_init()
- Initialize policy table with defaults
- Verify thread starts successfully
Diagnostics Implementation
Queue Monitoring
- Implement drop rate calculation
- Add queue health metrics structure
- Periodic health checks
Debug Flags
- POOL_DEBUG_CONTRACTS - contract validation
- POOL_DEBUG_DROPS - log dropped events
- Add contract violation counters
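The two debug flags and the violation counters could be wired up as compile-time macros, so that the checks vanish entirely when the flags are off. The macro names POOL_CONTRACT_CHECK and POOL_LOG_DROP, the contract enum, and pool_print_contract_violations() are assumptions for illustration; POOL_DEBUG_CONTRACTS is switched on by default here only so the sketch does something when compiled as-is.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#ifndef POOL_DEBUG_CONTRACTS
#define POOL_DEBUG_CONTRACTS 1    /* on by default only for this sketch */
#endif
#ifndef POOL_DEBUG_DROPS
#define POOL_DEBUG_DROPS 0
#endif

enum { CONTRACT_A, CONTRACT_B, CONTRACT_C, CONTRACT_D, CONTRACT_COUNT };

static _Atomic uint64_t g_contract_violations[CONTRACT_COUNT];

#if POOL_DEBUG_CONTRACTS
/* Counts a violation of contract `id` whenever `cond` is false. */
#define POOL_CONTRACT_CHECK(id, cond)                                       \
    do {                                                                    \
        if (!(cond))                                                        \
            atomic_fetch_add_explicit(&g_contract_violations[(id)], 1,      \
                                      memory_order_relaxed);                \
    } while (0)
#else
#define POOL_CONTRACT_CHECK(id, cond) ((void)0)
#endif

#if POOL_DEBUG_DROPS
#define POOL_LOG_DROP(class_idx) \
    fprintf(stderr, "[pool] dropped event, class %d\n", (int)(class_idx))
#else
#define POOL_LOG_DROP(class_idx) ((void)0)
#endif

/* Prints one line per contract with its violation count. */
void pool_print_contract_violations(void) {
    static const char *const names[CONTRACT_COUNT] = { "A", "B", "C", "D" };
    for (int i = 0; i < CONTRACT_COUNT; i++) {
        unsigned long long n = (unsigned long long)
            atomic_load_explicit(&g_contract_violations[i], memory_order_relaxed);
        fprintf(stderr, "contract %s: %llu violation(s)\n", names[i], n);
    }
}
```

A call such as POOL_CONTRACT_CHECK(CONTRACT_C, ev != NULL) placed in the push path would then cost nothing in a build where the flag is 0.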
Runtime Diagnostics
- Implement pool_print_diagnostics()
- Per-class statistics
- Queue health report
- Contract violation summary
Final Validation
Performance
- Larson: 2.5M+ ops/s
- bench_random_mixed: 40M+ ops/s
- Background thread < 1% CPU
- Drop rate < 0.1%
Correctness
- No memory leaks (Valgrind)
- Thread safety verified
- All contracts validated
- Stress test passes
Code Quality
- Each box in separate .c file
- Clear API boundaries
- No cross-box includes
- < 1000 LOC total
Sign-off Checklist
Contract A (Queue Never Blocks)
- Verified ace_push_event() drops on full
- Drop tracking implemented
- No blocking operations in push path
- Approved by: _____________
Contract B (Policy Scope Limited)
- ACE only adjusts next refill count
- No immediate actions
- Atomic reads only
- Approved by: _____________
Contract C (Memory Ownership Clear)
- Ring buffer pre-allocated
- Events copied not moved
- No use-after-free possible
- Approved by: _____________
Contract D (API Boundaries Enforced)
- Box files separate
- No improper includes
- Static functions where needed
- Approved by: _____________
Notes
Remember: The goal is an ultra-simple hot path (5-6 instructions) with smart learning that never interferes with performance. When in doubt, favor simplicity and speed over completeness of telemetry.
Key Principle: "Learn only when growing the cache; push the event and leave the rest to another thread." Learning happens only during refill and is pushed asynchronously to another thread.