hakmem/archive/analysis/RING_SIZE_INDEX.md
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00


# Ring Size Analysis: Document Index
## Overview
This directory contains an in-depth analysis of why changing `POOL_TLS_RING_CAP` affects the `mid_large_mt` and `random_mixed` benchmarks differently, and proposes a per-pool ring-size configuration that serves both.
## Documents
### 1. RING_SIZE_SUMMARY.md (Start Here!)
**Length:** 2.4 KB
**Read Time:** 2 minutes
Executive summary with:
- Problem statement
- Root cause explanation
- Solution overview
- Expected results
- Key insights
**Best for:** Quick understanding of the issue and solution.
### 2. RING_SIZE_VISUALIZATION.txt
**Length:** 14 KB
**Read Time:** 5 minutes
Visual guide with ASCII art showing:
- Pool routing diagrams
- TLS memory footprint comparison
- L1 cache pressure visualization
- Performance bar charts
- Implementation roadmap
**Best for:** Visual learners who want to see the problem graphically.
### 3. RING_SIZE_SOLUTION.md
**Length:** 7.6 KB
**Read Time:** 10 minutes
Step-by-step implementation guide with:
- Exact code changes (line numbers)
- sed commands for bulk replacement
- Testing plan with scripts
- Expected performance matrix
- Rollback plan
**Best for:** Implementing the fix.
### 4. RING_SIZE_DEEP_ANALYSIS.md
**Length:** 18 KB
**Read Time:** 30 minutes
Complete technical analysis with 10 sections:
1. Pool routing confirmation
2. TLS memory footprint analysis
3. Why ring size affects benchmarks differently
4. Why Ring=128 hurts BOTH benchmarks
5. Separate ring sizes per pool (solution)
6. Optimal ring size sweep
7. Other bottlenecks analysis
8. Implementation guidance
9. Recommended approach
10. Conclusion + Appendix (cache analysis)
**Best for:** Deep understanding of the root cause and trade-offs.
## Quick Navigation
| Want to: | Read: |
|----------|-------|
| Understand the problem in 2 min | `RING_SIZE_SUMMARY.md` |
| See visual diagrams | `RING_SIZE_VISUALIZATION.txt` |
| Implement the fix | `RING_SIZE_SOLUTION.md` |
| Deep technical dive | `RING_SIZE_DEEP_ANALYSIS.md` |
## Key Findings
### Root Cause
`POOL_TLS_RING_CAP` controls ring size for L2 Pool (8-32KB) only:
- **mid_large_mt** uses L2 Pool → benefits from larger rings
- **random_mixed** uses Tiny Pool → hurt by L2's TLS growth evicting L1 cache
### Solution
Use separate ring sizes per pool:
- L2 Pool: `POOL_L2_RING_CAP=48` (balanced)
- L2.5 Pool: `POOL_L25_RING_CAP=16` (unchanged)
- Tiny Pool: No ring (freelist-based, unchanged)
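A minimal sketch of how the split could look in C, assuming each ring is a fixed-capacity array of block pointers with a `top` index. Only the two macro names and their values come from this document; the type and function names (`L2Ring`, `L25Ring`, `l2_ring_push`, `l2_ring_pop`) are hypothetical and are not taken from the hakmem sources.

```c
#include <assert.h>
#include <stddef.h>

/* Defaults from this document; a build can override them with -D flags. */
#ifndef POOL_L2_RING_CAP
#define POOL_L2_RING_CAP 48
#endif
#ifndef POOL_L25_RING_CAP
#define POOL_L25_RING_CAP 16
#endif

_Static_assert(POOL_L2_RING_CAP > 0 && POOL_L25_RING_CAP > 0,
               "ring capacities must be positive");

/* Hypothetical per-thread ring layouts; the real structs may differ. */
typedef struct {
    void *items[POOL_L2_RING_CAP];   /* sized by the L2-specific cap */
    int top;                         /* number of cached blocks */
} L2Ring;

typedef struct {
    void *items[POOL_L25_RING_CAP];  /* sized independently of L2 */
    int top;
} L25Ring;

/* Cache a freed block; returns 1 on success, 0 when the ring is full. */
static int l2_ring_push(L2Ring *r, void *p) {
    assert(r != NULL);
    if (r->top >= POOL_L2_RING_CAP) return 0;
    r->items[r->top++] = p;
    return 1;
}

/* Take the most recently cached block; returns NULL when empty. */
static void *l2_ring_pop(L2Ring *r) {
    assert(r != NULL);
    return r->top > 0 ? r->items[--r->top] : NULL;
}
```

Because each struct is sized by its own macro, raising the L2 capacity in this sketch grows only `L2Ring`, leaving the L2.5 footprint untouched.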
### Expected Results
| Metric | Ring=16 | Ring=64 | **L2=48** | L2=48 vs Ring=64 |
|--------|---------|---------|-----------|------------------|
| mid_large_mt (ops/s) | 36.04M | 37.22M | **36.8M** | -1.1% |
| random_mixed (ops/s) | 22.5M | 21.29M | **22.5M** | **+5.7%** |
| Average (ops/s) | 29.27M | 29.26M | **29.65M** | **+1.3%** |
| TLS/thread | 2.36 KB | 5.05 KB | **3.4 KB** | **-33%** |
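**Win-Win:** Compared with Ring=16, `mid_large_mt` gains about 2.1% while `random_mixed` is unchanged; compared with Ring=64, `random_mixed` recovers 5.7% for a 1.1% loss on `mid_large_mt`. The average of the two benchmarks is highest with L2=48.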
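The percentage and average figures in the table can be rechecked with two small helpers. This is only an arithmetic sketch; the function names `pct_change` and `mean2` are illustrative and are not part of hakmem.

```c
#include <assert.h>

/* Percent change of `value` relative to `base`, e.g. L2=48 vs Ring=64. */
static double pct_change(double base, double value) {
    assert(base > 0.0);
    return (value - base) / base * 100.0;
}

/* Arithmetic mean of the two benchmark throughputs (M ops/s). */
static double mean2(double a, double b) {
    return (a + b) / 2.0;
}
```

For example, `pct_change(37.22, 36.8)` is about -1.1, `pct_change(21.29, 22.5)` is about +5.7, and `mean2(36.8, 22.5)` is 29.65, matching the rows above.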
## Implementation Timeline
- Code changes: 30 minutes
- Testing: 2-3 hours
- Documentation: 30 minutes
- **Total: ~3-4 hours**
## Files to Modify
1. `core/hakmem_pool.c` - Replace `POOL_TLS_RING_CAP` with `POOL_L2_RING_CAP`
2. `core/hakmem_l25_pool.c` - Replace `POOL_TLS_RING_CAP` with `POOL_L25_RING_CAP`
3. `Makefile` - Add `-DPOOL_L2_RING_CAP=48 -DPOOL_L25_RING_CAP=16`
## Success Criteria
- ✓ mid_large_mt: ≥36.5M ops/s (+1.3% vs baseline)
- ✓ random_mixed: ≥22.4M ops/s (within ±1% of baseline)
- ✓ TLS footprint: ≤3.5 KB/thread
- ✓ No regressions in full benchmark suite
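
The three numeric criteria above could be checked mechanically after a benchmark run. The sketch below assumes results have already been collected into a struct; `BenchResult` and `meets_success_criteria` are hypothetical names, and the full-suite regression check is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for one measured configuration. */
typedef struct {
    double mid_large_mt_mops;   /* throughput in M ops/s */
    double random_mixed_mops;   /* throughput in M ops/s */
    double tls_kb_per_thread;   /* TLS footprint in KB */
} BenchResult;

/* True when all three numeric thresholds from the list above hold. */
static bool meets_success_criteria(const BenchResult *r) {
    assert(r != NULL);
    return r->mid_large_mt_mops >= 36.5
        && r->random_mixed_mops >= 22.4
        && r->tls_kb_per_thread <= 3.5;
}
```

With the expected L2=48 figures (36.8, 22.5, 3.4) the check passes; with the Ring=64 figures (37.22, 21.29, 5.05) it fails on both `random_mixed` and the TLS footprint.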