## Changes

### 1. core/page_arena.c
- Removed init failure message (lines 25-27) - error is handled by returning early
- All other fprintf statements already wrapped in existing #if !HAKMEM_BUILD_RELEASE blocks

### 2. core/hakmem.c
- Wrapped SIGSEGV handler init message (line 72)
- CRITICAL: Kept SIGSEGV/SIGBUS/SIGABRT error messages (lines 62-64) - production needs crash logs

### 3. core/hakmem_shared_pool.c
- Wrapped all debug fprintf statements in #if !HAKMEM_BUILD_RELEASE:
  - Node pool exhaustion warning (line 252)
  - SP_META_CAPACITY_ERROR warning (line 421)
  - SP_FIX_GEOMETRY debug logging (line 745)
  - SP_ACQUIRE_STAGE0.5_EMPTY debug logging (line 865)
  - SP_ACQUIRE_STAGE0_L0 debug logging (line 803)
  - SP_ACQUIRE_STAGE1_LOCKFREE debug logging (line 922)
  - SP_ACQUIRE_STAGE2_LOCKFREE debug logging (line 996)
  - SP_ACQUIRE_STAGE3 debug logging (line 1116)
  - SP_SLOT_RELEASE debug logging (line 1245)
  - SP_SLOT_FREELIST_LOCKFREE debug logging (line 1305)
  - SP_SLOT_COMPLETELY_EMPTY debug logging (line 1316)
- Fixed lock_stats_init() for release builds (lines 60-65) - ensure g_lock_stats_enabled is initialized

## Performance Validation
Before: 51M ops/s (with debug fprintf overhead)
After: 49.1M ops/s (consistent performance, fprintf removed from hot paths)

## Build & Test
```bash
./build.sh larson_hakmem
./out/release/larson_hakmem 1 5 1 1000 100 10000 42
# Result: 49.1M ops/s
```

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Pool TLS + Learning Implementation Checklist
Pre-Implementation Review
Contract Understanding
- Read and understand all 4 contracts (A-D) in POOL_TLS_LEARNING_DESIGN.md
- Identify which contract applies to each code section
- Review enforcement strategies for each contract
Phase 1: Ultra-Simple TLS Implementation
Box 1: TLS Freelist (pool_tls.c)
Setup
- Create `core/pool_tls.c` and `core/pool_tls.h`
- Define TLS globals: `__thread void* g_tls_pool_head[POOL_SIZE_CLASSES]`
- Define TLS counts: `__thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES]`
- Define default refill counts array
Hot Path Implementation
- Implement `pool_alloc_fast()` - must be 5-6 instructions max
  - Pop from TLS freelist
  - Conditional header write (if enabled)
  - Call refill only on miss
- Implement `pool_free_fast()` - must be 5-6 instructions max
  - Header validation (if enabled)
  - Push to TLS freelist
  - Optional drain check
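The hot path above can be sketched as a pointer-linked freelist in thread-local storage. This is a minimal illustration, not the project's implementation: the value of `POOL_SIZE_CLASSES`, the `class_idx` parameter, and the `pool_refill_stub()` slow path are assumptions, and the optional header write, header validation, and drain check are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE_CLASSES 8  /* assumed value; the checklist does not state one */

/* Per-thread freelist heads and counts, named as in the Setup items. */
static __thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
static __thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical slow path; in the design this role belongs to Box 2. */
static void* pool_refill_stub(int class_idx) {
    (void)class_idx;
    return NULL;
}

/* Pop the head block; take the refill path only on a miss. */
static inline void* pool_alloc_fast(int class_idx) {
    assert(class_idx >= 0 && class_idx < POOL_SIZE_CLASSES);
    void* head = g_tls_pool_head[class_idx];
    if (head == NULL) {
        return pool_refill_stub(class_idx);
    }
    /* The first word of a free block stores the link to the next block. */
    g_tls_pool_head[class_idx] = *(void**)head;
    g_tls_pool_count[class_idx]--;
    return head;
}

/* Push the block back; its first word becomes the link to the old head. */
static inline void pool_free_fast(void* ptr, int class_idx) {
    assert(class_idx >= 0 && class_idx < POOL_SIZE_CLASSES);
    *(void**)ptr = g_tls_pool_head[class_idx];
    g_tls_pool_head[class_idx] = ptr;
    g_tls_pool_count[class_idx]++;
}
```

The asserts compile away under `NDEBUG`, so a release build keeps only the load, compare, and store sequence. Whether that meets the 5-6 instruction budget still has to be confirmed with objdump on the real build.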
Contract D Validation
- Verify Box1 has NO learning code
- Verify Box1 has NO metrics collection
- Verify Box1 only exposes public API and internal chain installer
- No includes of ace_learning.h or pool_refill.h in pool_tls.c
Testing
- Unit test: Allocation/free correctness
- Performance test: Target 40-60M ops/s
- Verify hot path is < 10 instructions with objdump
Box 2: Refill Engine (pool_refill.c)
Setup
- Create `core/pool_refill.c` and `core/pool_refill.h`
- Import only pool_tls.h public API
- Define refill statistics (miss streak, etc.)
Refill Implementation
- Implement `pool_refill_and_alloc()`
  - Capture pre-refill state
  - Get refill count (default for Phase 1)
  - Batch allocate from backend
  - Install chain in TLS
  - Return first block
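The five refill steps can be sketched roughly as follows. Several pieces are assumptions made only for illustration: `malloc` stands in for the real backend, `class_block_size()` and `pool_install_chain()` are hypothetical helpers, and the constants are placeholder values. The TLS arrays are redeclared here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE_CLASSES    8   /* assumed value */
#define DEFAULT_REFILL_COUNT 16  /* assumed Phase 1 default */

static __thread void*    g_tls_pool_head[POOL_SIZE_CLASSES];
static __thread uint32_t g_tls_pool_count[POOL_SIZE_CLASSES];

/* Hypothetical size mapping: class 0 = 16 bytes, class 1 = 32 bytes, ... */
static size_t class_block_size(int class_idx) {
    return (size_t)16 << class_idx;
}

/* Stand-in for Box 1's internal chain installer. */
static void pool_install_chain(int class_idx, void* chain, uint32_t count) {
    g_tls_pool_head[class_idx]  = chain;
    g_tls_pool_count[class_idx] = count;
}

/* Called only on a miss, so the TLS list for this class is expected to be empty. */
void* pool_refill_and_alloc(int class_idx) {
    assert(class_idx >= 0 && class_idx < POOL_SIZE_CLASSES);

    /* 1. Capture pre-refill state (would feed the learning event in Phase 3). */
    uint32_t pre_count = g_tls_pool_count[class_idx];
    (void)pre_count;

    /* 2. Get the refill count (fixed default in Phase 1). */
    uint32_t want = DEFAULT_REFILL_COUNT;

    /* 3. Batch allocate, linking each block in front of the chain. */
    void* chain = NULL;
    uint32_t got = 0;
    for (; got < want; got++) {
        void* blk = malloc(class_block_size(class_idx));
        if (blk == NULL) break;
        *(void**)blk = chain;
        chain = blk;
    }
    if (chain == NULL) return NULL;

    /* 4. Install everything except the first block in TLS. */
    void* first = chain;
    pool_install_chain(class_idx, *(void**)first, got - 1);

    /* 5. Return the first block to the caller. */
    return first;
}
```

Handing the first block straight to the caller means the allocation that triggered the refill does not need a second pop from the freelist.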
Contract B Validation
- Verify refill NEVER blocks waiting for policy
- Verify refill only reads atomic policy values
- No immediate cache manipulation
Contract C Validation
- Event created on stack
- Event data copied, not referenced
- No dynamic allocation for events
Phase 2: Metrics Collection
Metrics Addition
- Add hit/miss counters to TLS state
- Add miss streak tracking
- Instrument hot path (with ifdef guard)
- Implement `pool_print_stats()`
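One way to keep the instrumentation out of release hot paths is a pair of macros that compile to nothing when metrics are disabled. The flag name `POOL_ENABLE_METRICS`, the counter names, and the output format below are all hypothetical; only `pool_print_stats()` is named in the checklist.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE_CLASSES 8    /* assumed value */
#ifndef POOL_ENABLE_METRICS
#define POOL_ENABLE_METRICS 1  /* hypothetical build flag */
#endif

#if POOL_ENABLE_METRICS
/* Per-thread counters; names are illustrative. */
static __thread uint64_t g_tls_pool_hits[POOL_SIZE_CLASSES];
static __thread uint64_t g_tls_pool_misses[POOL_SIZE_CLASSES];
static __thread uint32_t g_tls_pool_miss_streak[POOL_SIZE_CLASSES];

/* A hit ends the current miss streak; a miss extends it. */
#define POOL_METRIC_HIT(c) \
    do { g_tls_pool_hits[c]++; g_tls_pool_miss_streak[c] = 0; } while (0)
#define POOL_METRIC_MISS(c) \
    do { g_tls_pool_misses[c]++; g_tls_pool_miss_streak[c]++; } while (0)

/* Bounds-checked accessor for code outside the hot path. */
static uint32_t pool_miss_streak(int class_idx) {
    assert(class_idx >= 0 && class_idx < POOL_SIZE_CLASSES);
    return g_tls_pool_miss_streak[class_idx];
}

/* Prints the calling thread's counters only. */
void pool_print_stats(void) {
    for (int c = 0; c < POOL_SIZE_CLASSES; c++) {
        printf("class %d: hits=%llu misses=%llu streak=%u\n", c,
               (unsigned long long)g_tls_pool_hits[c],
               (unsigned long long)g_tls_pool_misses[c],
               (unsigned)pool_miss_streak(c));
    }
}
#else
/* With metrics disabled the hot path carries no extra instructions. */
#define POOL_METRIC_HIT(c)  ((void)0)
#define POOL_METRIC_MISS(c) ((void)0)
void pool_print_stats(void) {}
#endif
```

Building the benchmark once with the flag set to 0 and once with it set to 1 gives the two numbers needed for the < 2% regression check.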
Performance Validation
- Measure regression with metrics enabled
- Must be < 2% performance impact
- Verify counters are accurate
Phase 3: Learning Integration
Box 3: ACE Learning (ace_learning.c)
Setup
- Create `core/ace_learning.c` and `core/ace_learning.h`
- Pre-allocate event ring buffer: `RefillEvent g_event_pool[QUEUE_SIZE]`
- Initialize MPSC queue structure
- Define policy table: `_Atomic uint32_t g_refill_policies[CLASSES]`
MPSC Queue Implementation
- Implement `ace_push_event()`
  - Contract A: Check for full queue
  - Contract A: DROP if full (never block!)
  - Contract A: Track drops with counter
  - Contract C: COPY event to ring buffer
  - Use proper memory ordering
- Implement `ace_consume_events()`
  - Read events with acquire semantics
  - Process and release slots
  - Sleep when queue empty
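A drop-on-full queue that satisfies Contracts A and C could look like the sketch below. It uses a per-slot sequence number in the style of Dmitry Vyukov's bounded queue, which is a technique chosen for this illustration and not necessarily the project's design. The `RefillEvent` fields, `QUEUE_SIZE` value, `g_event_seq`, `ace_queue_init()`, and the `ace_consume_events()` signature are all assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 64  /* assumed value; must be a power of two */

typedef struct {       /* hypothetical event payload */
    uint32_t class_idx;
    uint32_t refill_count;
    uint32_t miss_streak;
} RefillEvent;

static RefillEvent      g_event_pool[QUEUE_SIZE];  /* pre-allocated ring buffer */
static _Atomic size_t   g_event_seq[QUEUE_SIZE];   /* per-slot ownership ticket */
static _Atomic size_t   g_enqueue_pos;
static size_t           g_dequeue_pos;             /* single consumer only */
static _Atomic uint64_t g_event_drops;

void ace_queue_init(void) {
    for (size_t i = 0; i < QUEUE_SIZE; i++)
        atomic_store_explicit(&g_event_seq[i], i, memory_order_relaxed);
    atomic_store_explicit(&g_enqueue_pos, 0, memory_order_relaxed);
    g_dequeue_pos = 0;
}

/* Returns false and counts a drop when the queue is full. Never waits. */
bool ace_push_event(const RefillEvent* ev) {
    size_t pos = atomic_load_explicit(&g_enqueue_pos, memory_order_relaxed);
    for (;;) {
        size_t idx = pos & (QUEUE_SIZE - 1);
        size_t seq = atomic_load_explicit(&g_event_seq[idx], memory_order_acquire);
        intptr_t diff = (intptr_t)seq - (intptr_t)pos;
        if (diff == 0) {
            /* Slot is free: try to claim it. On failure pos is reloaded. */
            if (atomic_compare_exchange_weak_explicit(&g_enqueue_pos, &pos, pos + 1,
                    memory_order_relaxed, memory_order_relaxed)) {
                g_event_pool[idx] = *ev;  /* Contract C: copy, do not reference */
                atomic_store_explicit(&g_event_seq[idx], pos + 1, memory_order_release);
                return true;
            }
        } else if (diff < 0) {
            /* Contract A: queue full, drop and record it. */
            atomic_fetch_add_explicit(&g_event_drops, 1, memory_order_relaxed);
            return false;
        } else {
            /* Another producer claimed this slot first; retry at the new position. */
            pos = atomic_load_explicit(&g_enqueue_pos, memory_order_relaxed);
        }
    }
}

/* Copies up to max events into out and releases their slots. */
size_t ace_consume_events(RefillEvent* out, size_t max) {
    size_t n = 0;
    while (n < max) {
        size_t pos = g_dequeue_pos;
        size_t idx = pos & (QUEUE_SIZE - 1);
        size_t seq = atomic_load_explicit(&g_event_seq[idx], memory_order_acquire);
        if (seq != pos + 1) break;  /* empty, or the producer has not published yet */
        out[n++] = g_event_pool[idx];
        atomic_store_explicit(&g_event_seq[idx], pos + QUEUE_SIZE, memory_order_release);
        g_dequeue_pos = pos + 1;
    }
    return n;
}
```

The retry loop in `ace_push_event()` only repeats when another producer has made progress, so a producer never waits on the consumer. A return value of 0 from `ace_consume_events()` is the point where the learning thread would sleep.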
Contract A Validation
- Push function NEVER blocks
- Drops are tracked
- Drop rate monitoring implemented
- Warning issued if drop rate > 1%
Contract B Validation
- ACE only writes to policy table
- No immediate actions taken
- No direct TLS manipulation
- No blocking operations
Contract C Validation
- Ring buffer pre-allocated
- Events copied, not moved
- No malloc/free in event path
- Clear slot ownership model
Contract D Validation
- ace_learning.c does NOT include pool_tls.h internals
- No direct calls to Box1 functions
- Only ace_push_event() exposed to Box2
- Make notify_learning() static in pool_refill.c
Learning Algorithm
- Implement UCB1 or similar
- Track per-class statistics
- Gradual policy adjustments
- Oscillation detection
Integration Points
Box2 → Box3 Connection
- Add event creation in pool_refill_and_alloc()
- Call ace_push_event() after successful refill
- Make notify_learning() wrapper static
Box2 Policy Reading
- Replace DEFAULT_REFILL_COUNT with ace_get_refill_count()
- Atomic read of policy (no blocking)
- Fallback to default if no policy
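The policy read described above can be as small as one atomic load with a fallback. In this sketch the constants are placeholder values, `ace_set_refill_count()` is a hypothetical writer for Box 3, and treating 0 as "no policy yet" is an assumption; the checklist does not say how a missing policy is represented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CLASSES              8   /* assumed value */
#define DEFAULT_REFILL_COUNT 16  /* assumed value */

/* Policy table as declared in the checklist. Zero-initialized, and 0 is
 * interpreted here as "no policy yet" (an assumption of this sketch). */
static _Atomic uint32_t g_refill_policies[CLASSES];

/* Reader side (Box 2): a single atomic load, no locks, no waiting. */
static inline uint32_t ace_get_refill_count(int class_idx) {
    assert(class_idx >= 0 && class_idx < CLASSES);
    uint32_t n = atomic_load_explicit(&g_refill_policies[class_idx],
                                      memory_order_relaxed);
    return n != 0 ? n : DEFAULT_REFILL_COUNT;
}

/* Writer side (Box 3 only): publishes the count for the next refill. */
static inline void ace_set_refill_count(int class_idx, uint32_t n) {
    assert(class_idx >= 0 && class_idx < CLASSES);
    atomic_store_explicit(&g_refill_policies[class_idx], n,
                          memory_order_relaxed);
}
```

Relaxed ordering is likely sufficient here because each table entry is a self-contained value and a slightly stale count only affects the size of one refill.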
Startup
- Launch learning thread in hakmem_init()
- Initialize policy table with defaults
- Verify thread starts successfully
Diagnostics Implementation
Queue Monitoring
- Implement drop rate calculation
- Add queue health metrics structure
- Periodic health checks
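A possible shape for the health structure and drop rate calculation is sketched here. The `AceQueueHealth` type and both function names are hypothetical. The 1% warning threshold comes from the Contract A validation items; the rate is kept in parts per million to avoid floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {       /* hypothetical queue health snapshot */
    uint64_t pushed;   /* events accepted by the queue */
    uint64_t dropped;  /* events dropped because the queue was full */
} AceQueueHealth;

/* Drop rate in parts per million of all push attempts.
 * The multiplication can overflow once dropped exceeds roughly 1.8e13. */
static uint64_t ace_drop_rate_ppm(const AceQueueHealth* h) {
    assert(h != NULL);
    uint64_t attempts = h->pushed + h->dropped;
    if (attempts == 0) return 0;
    return (h->dropped * 1000000u) / attempts;
}

/* True when the drop rate is strictly above 1% (10000 ppm). */
static bool ace_drop_rate_warn(const AceQueueHealth* h) {
    return ace_drop_rate_ppm(h) > 10000;
}
```

A periodic health check could snapshot the two counters, call `ace_drop_rate_warn()`, and log only when it returns true, which keeps the check cheap on the learning thread.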
Debug Flags
- POOL_DEBUG_CONTRACTS - contract validation
- POOL_DEBUG_DROPS - log dropped events
- Add contract violation counters
Runtime Diagnostics
- Implement pool_print_diagnostics()
- Per-class statistics
- Queue health report
- Contract violation summary
Final Validation
Performance
- Larson: 2.5M+ ops/s
- bench_random_mixed: 40M+ ops/s
- Background thread < 1% CPU
- Drop rate < 0.1%
Correctness
- No memory leaks (Valgrind)
- Thread safety verified
- All contracts validated
- Stress test passes
Code Quality
- Each box in separate .c file
- Clear API boundaries
- No cross-box includes
- < 1000 LOC total
Sign-off Checklist
Contract A (Queue Never Blocks)
- Verified ace_push_event() drops on full
- Drop tracking implemented
- No blocking operations in push path
- Approved by: _____________
Contract B (Policy Scope Limited)
- ACE only adjusts next refill count
- No immediate actions
- Atomic reads only
- Approved by: _____________
Contract C (Memory Ownership Clear)
- Ring buffer pre-allocated
- Events copied not moved
- No use-after-free possible
- Approved by: _____________
Contract D (API Boundaries Enforced)
- Box files separate
- No improper includes
- Static functions where needed
- Approved by: _____________
Notes
Remember: The goal is an ultra-simple hot path (5-6 cycles) with smart learning that never interferes with performance. When in doubt, favor simplicity and speed over completeness of telemetry.
Key Principle: "Learn only when growing the cache; push the event and leave the rest to another thread." Learning happens only during refill, and events are pushed asynchronously to another thread.