# Ring Size Analysis: Document Index

## Overview

This directory contains an in-depth analysis of why changing `POOL_TLS_RING_CAP` affects the `mid_large_mt` and `random_mixed` benchmarks differently, and proposes a solution that serves both.

## Documents

### 1. RING_SIZE_SUMMARY.md (Start Here!)
**Length:** 2.4 KB
**Read Time:** 2 minutes

Executive summary with:
- Problem statement
- Root cause explanation
- Solution overview
- Expected results
- Key insights

**Best for:** Quick understanding of the issue and solution.

### 2. RING_SIZE_VISUALIZATION.txt
**Length:** 14 KB
**Read Time:** 5 minutes

Visual guide with ASCII art showing:
- Pool routing diagrams
- TLS memory footprint comparison
- L1 cache pressure visualization
- Performance bar charts
- Implementation roadmap

**Best for:** Visual learners who want to see the problem graphically.

### 3. RING_SIZE_SOLUTION.md
**Length:** 7.6 KB
**Read Time:** 10 minutes

Step-by-step implementation guide with:
- Exact code changes (line numbers)
- sed commands for bulk replacement
- Testing plan with scripts
- Expected performance matrix
- Rollback plan

**Best for:** Implementing the fix.

### 4. RING_SIZE_DEEP_ANALYSIS.md
**Length:** 18 KB
**Read Time:** 30 minutes

Complete technical analysis with 10 sections:
1. Pool routing confirmation
2. TLS memory footprint analysis
3. Why ring size affects benchmarks differently
4. Why Ring=128 hurts BOTH benchmarks
5. Separate ring sizes per pool (solution)
6. Optimal ring size sweep
7. Other bottlenecks analysis
8. Implementation guidance
9. Recommended approach
10. Conclusion + Appendix (cache analysis)

**Best for:** Deep understanding of the root cause and trade-offs.

## Quick Navigation

**Want to:** → **Read:**
- Understand the problem in 2 min → `RING_SIZE_SUMMARY.md`
- See visual diagrams → `RING_SIZE_VISUALIZATION.txt`
- Implement the fix → `RING_SIZE_SOLUTION.md`
- Deep technical dive → `RING_SIZE_DEEP_ANALYSIS.md`

## Key Findings

### Root Cause
`POOL_TLS_RING_CAP` controls the ring size for the L2 Pool (8-32KB) only:
- **mid_large_mt** uses the L2 Pool → benefits from larger rings
- **random_mixed** uses the Tiny Pool → is hurt when the L2 Pool's larger TLS footprint evicts its working set from L1 cache

### Solution
Use separate ring sizes per pool:
- L2 Pool: `POOL_L2_RING_CAP=48` (balanced)
- L2.5 Pool: `POOL_L25_RING_CAP=16` (unchanged)
- Tiny Pool: No ring (freelist-based, unchanged)

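The per-pool split can be sketched in C roughly as follows. This is a minimal illustration, not the code in `core/hakmem_pool.c` or `core/hakmem_l25_pool.c`: only the two macro names and their values come from this document, while the struct layouts and the `l2_ring_push` / `l2_ring_pop` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-pool ring capacities. The defaults mirror the values proposed in this
 * document; a build can still override them, e.g. -DPOOL_L2_RING_CAP=48. */
#ifndef POOL_L2_RING_CAP
#define POOL_L2_RING_CAP 48
#endif
#ifndef POOL_L25_RING_CAP
#define POOL_L25_RING_CAP 16
#endif

#if POOL_L2_RING_CAP <= 0 || POOL_L25_RING_CAP <= 0
#error "ring capacities must be positive"
#endif

/* Hypothetical ring layouts (assumption): the real structs may differ. */
typedef struct {
    void *items[POOL_L2_RING_CAP];
    int   top;                      /* number of cached blocks */
} l2_tls_ring_t;

typedef struct {
    void *items[POOL_L25_RING_CAP];
    int   top;
} l25_tls_ring_t;

/* Push a block into the L2 ring; returns 1 on success, 0 when the ring is full. */
static int l2_ring_push(l2_tls_ring_t *r, void *p) {
    assert(r != NULL);
    if (r->top >= POOL_L2_RING_CAP) return 0;
    r->items[r->top++] = p;
    return 1;
}

/* Pop the most recently pushed block, or NULL when the ring is empty. */
static void *l2_ring_pop(l2_tls_ring_t *r) {
    assert(r != NULL);
    return r->top > 0 ? r->items[--r->top] : NULL;
}
```

Because each pool reads its own macro, growing the L2 ring no longer changes the size of the L2.5 ring, which is the point of the split.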
### Expected Results
| Metric | Ring=16 | Ring=64 | **L2=48** | vs Ring=64 |
|--------|---------|---------|-----------|------------|
| mid_large_mt | 36.04M | 37.22M | **36.8M** | -1.1% |
| random_mixed | 22.5M | 21.29M | **22.5M** | **+5.7%** |
| Average | 29.27M | 29.26M | **29.65M** | **+1.3%** |
| TLS/thread | 2.36 KB | 5.05 KB | **3.4 KB** | **-33%** |

**Win-Win:** Versus the Ring=16 baseline, `mid_large_mt` gains about 2.1% while `random_mixed` is unchanged; versus Ring=64, `random_mixed` recovers 5.7% at a 1.1% cost to `mid_large_mt`.

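The "vs Ring=64" column can be re-derived from the other columns with a small helper. The sketch below is illustrative only; `pct_change` and `mean2` are hypothetical names, and the inputs in the comments are the figures quoted in the table.

```c
#include <assert.h>

/* Relative change of `value` against `reference`, in percent. */
static double pct_change(double reference, double value) {
    assert(reference != 0.0);
    return (value - reference) / reference * 100.0;
}

/* Arithmetic mean of the two benchmark throughputs (the "Average" row). */
static double mean2(double a, double b) {
    return (a + b) / 2.0;
}

/* Worked with the table's figures:
 *   pct_change(37.22, 36.8)  is about  -1.13  (mid_large_mt, L2=48 vs Ring=64)
 *   pct_change(21.29, 22.5)  is about  +5.68  (random_mixed, L2=48 vs Ring=64)
 *   pct_change(5.05, 3.4)    is about -32.67  (TLS per thread)
 *   pct_change(mean2(37.22, 21.29), mean2(36.8, 22.5)) is about +1.35
 */
```

The average comes out slightly above the table's +1.3% because the table rounds the Ring=64 mean to 29.26M before dividing.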
## Implementation Timeline

- Code changes: 30 minutes
- Testing: 2-3 hours
- Documentation: 30 minutes
- **Total: ~4 hours**

## Files to Modify

1. `core/hakmem_pool.c` - Replace `POOL_TLS_RING_CAP` → `POOL_L2_RING_CAP`
2. `core/hakmem_l25_pool.c` - Replace `POOL_TLS_RING_CAP` → `POOL_L25_RING_CAP`
3. `Makefile` - Add `-DPOOL_L2_RING_CAP=48 -DPOOL_L25_RING_CAP=16`

## Success Criteria

- ✓ mid_large_mt: ≥36.5M ops/s (+1.3% vs baseline)
- ✓ random_mixed: ≥22.4M ops/s (within ±1% of baseline)
- ✓ TLS footprint: ≤3.5 KB/thread
- ✓ No regressions in full benchmark suite

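The three numeric criteria lend themselves to an automated check after a benchmark run. The sketch below assumes the measurements have already been collected into a struct; `ring_bench_result_t` and `ring_check_criteria` are hypothetical names, and the fourth criterion (no regressions in the full suite) is not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Measured values from one benchmark run (hypothetical container). */
typedef struct {
    double mid_large_mt_mops;   /* throughput in M ops/s */
    double random_mixed_mops;   /* throughput in M ops/s */
    double tls_kb_per_thread;   /* TLS footprint in KB per thread */
} ring_bench_result_t;

/* Returns the number of numeric success criteria that the run fails.
 * Thresholds are the ones listed in this document. */
static int ring_check_criteria(const ring_bench_result_t *m) {
    assert(m != NULL);
    int failed = 0;
    if (m->mid_large_mt_mops < 36.5) failed++;   /* needs >= 36.5M ops/s */
    if (m->random_mixed_mops < 22.4) failed++;   /* needs >= 22.4M ops/s */
    if (m->tls_kb_per_thread > 3.5)  failed++;   /* needs <= 3.5 KB/thread */
    return failed;
}
```

With the expected L2=48 figures (36.8, 22.5, 3.4) the function returns 0, whereas the Ring=64 figures (37.22, 21.29, 5.05) fail the `random_mixed` and TLS checks.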