Commit Graph

5 Commits

Author SHA1 Message Date
82ba74933a Tiny Step 2: drain interval optimization (default 1024→2048)
Completed A/B testing for TLS SLL drain interval and implemented
optimal default value based on empirical results.

Changes:
- core/box/tls_sll_drain_box.h: Default drain interval 1024 → 2048
- TINY_DRAIN_INTERVAL_AB_REPORT.md: Complete A/B analysis report

Results (100K iterations):
- 256B: 7.68M ops/s (+4.9% vs baseline 7.32M)
- 128B: 8.76M ops/s (+13.6% vs baseline 7.71M)
- Syscalls: Unchanged (2410) - drain affects frontend only

Key Findings:
- Size-dependent optimal intervals discovered (128B→512, 256B→2048)
- Prioritized 256B critical path (classify_ptr 3.65% in perf profile)
- No regression observed; both classes improved

Methodology:
- ENV-only testing (no code changes during A/B)
- Tested intervals: 512, 1024 (baseline), 2048
- Workload: bench_random_mixed_hakmem
- Metrics: Throughput, syscall count (strace -c)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 17:41:26 +09:00
dd613bc93a Drain optimization: Drain ALL blocks to maximize empty detection
Issue:
- Previous drain: only 32 blocks/trigger → slabs partially empty
- Shared pool SuperSlabs mix multiple classes (C0-C7)
- active_slabs only reaches 0 when ALL classes empty
- Result: superslab_free() rarely called, LRU cache unused

Fix:
- Change drain batch_size: 32 → 0 (drain all available)
- Added active_slabs logging in shared_pool_release_slab
- Maximizes chance of SuperSlab becoming completely empty

Performance Impact (ws=4096, 200K iterations):
- Before (batch=32): 5.9M ops/s
- After (batch=all): 6.1M ops/s (+3.4%)
- Baseline improvement: 563K → 6.1M ops/s (+980%!)

Known Issue:
- LRU cache still unused due to Shared Pool design
- SuperSlabs rarely become completely empty (multi-class mixing)
- Requires Shared Pool architecture optimization (Phase 12)

Next: Investigate Shared Pool optimization strategies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:55:51 +09:00
4ffdaae2fc Add empty slab detection to drain: call shared_pool_release_slab
Issue:
- Drain was detecting meta->used==0 but not releasing slabs
- Logic missing: shared_pool_release_slab() call after empty detection
- Result: SuperSlabs not freed, LRU cache not populated

Fix:
- Added shared_pool_release_slab() call when meta->used==0 (line 194)
- Mirrors logic in tiny_superslab_free.inc.h:223-236
- Empty slabs now released to shared pool

Performance Impact (ws=4096, 200K iterations):
- Before (baseline): 563K ops/s
- After this fix: 5.9M ops/s (+950% improvement!)

Note: LRU cache still not populated (investigating next)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:13:00 +09:00
2ef28ee5ab Fix drain box compilation: Use pthread_self() directly
Issue:
- tiny_self_u32() is static inline, cannot be linked from drain box
- Link error: undefined reference to 'tiny_self_u32'

Fix:
- Use pthread_self() directly like hakmem_tiny_superslab.c:917
- Added <pthread.h> include
- Changed extern declaration from size_t to const size_t

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:10:46 +09:00
88f3592ef6 Option B: Periodic TLS SLL Drain - Fix Phase 9 LRU Architecture Issue
Root Cause:
- TLS SLL fast path (95-99% of frees) does NOT decrement meta->used
- Slabs never appear empty → SuperSlabs never freed → LRU never used
- Impact: 6,455 mmap/munmap calls per 200K iterations (74.8% time)
- Performance: -94% regression (9.38M → 563K ops/s)

Solution:
- Periodic drain every N frees (default: 1024) per size class
- Drain path: TLS SLL → slab freelist via tiny_free_local_box()
- This properly decrements meta->used and enables empty detection

Implementation:
1. core/box/tls_sll_drain_box.h - New drain box function
   - tiny_tls_sll_drain(): Pop from TLS SLL, push to slab freelist
   - tiny_tls_sll_try_drain(): Drain trigger with counter
   - ENV: HAKMEM_TINY_SLL_DRAIN_ENABLE=1/0 (default: 1)
   - ENV: HAKMEM_TINY_SLL_DRAIN_INTERVAL=N (default: 1024)
   - ENV: HAKMEM_TINY_SLL_DRAIN_DEBUG=1 (debug logging)

2. core/tiny_free_fast_v2.inc.h - Integrated drain trigger
   - Added drain call after successful TLS SLL push (line 145)
   - Cost: 2-3 cycles per free (counter increment + comparison)
   - Drain triggered every 1024 frees (0.1% overhead)

Expected Impact:
- mmap/munmap: 6,455 → ~100 calls (-96-97%)
- Throughput: 563K → 8-10M ops/s (+1,300-1,700%)
- LRU utilization: 0% → >90% (functional)

Reference: PHASE9_LRU_ARCHITECTURE_ISSUE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-14 07:09:18 +09:00