# Phase 1: TLS SuperSlab Hint Box - Benchmark Report

## Implementation Summary

**Date**: 2025-12-03
**Status**: Implementation Complete - Benchmarking Required
**Commit**: [Pending]

### What Was Implemented

1. **TLS SuperSlab Hint Box** (`/mnt/workdisk/public_share/hakmem/core/box/tls_ss_hint_box.h`)
   - Header-only Box implementation
   - 4-slot FIFO cache per thread (112 bytes TLS overhead)
   - Inline functions: `tls_ss_hint_init()`, `tls_ss_hint_update()`, `tls_ss_hint_lookup()`, `tls_ss_hint_clear()`
   - Statistics API for debug builds

2. **Build Flag** (`/mnt/workdisk/public_share/hakmem/core/hakmem_build_flags.h`)
   - `HAKMEM_TINY_SS_TLS_HINT` (default: 0, disabled)
   - Validation check: requires `HAKMEM_TINY_HEADERLESS=1`

3. **Integration Points**
   - **Free path** (`core/hakmem_tiny_free.inc`): Lines 477-481, 550-555
     - Fast path hint lookup before expensive `hak_super_lookup()`
   - **Allocation path** (`core/tiny_superslab_alloc.inc.h`): Lines 115-122, 179-186
     - Cache update on successful allocation (both linear and freelist modes)

4. **TLS Variable Definition** (`core/hakmem_tiny_tls_state_box.inc`)
   - `__thread TlsSsHintCache g_tls_ss_hint = {0};`

5. **Unit Tests** (`tests/test_tls_ss_hint.c`)
   - 6 test functions (init, basic lookup, FIFO rotation, duplicate detection, clear, stats)
   - All tests PASSING

6. **Build System**
   - Removed old conflicting `ss_tls_hint_box.c` (different implementation)
   - Updated Makefile to remove compiled object files (header-only design)

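
To make the data structure concrete, here is a minimal sketch of a 4-slot FIFO hint cache with update, lookup, and clear operations. It is not the contents of `tls_ss_hint_box.h`: the type names, field layout, and function names below are assumptions chosen so that the layout adds up to the documented 112 bytes on a 64-bit target, and the real `tls_ss_hint_*()` signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HINT_SLOTS 4

/* Stand-in for the allocator's SuperSlab type; only its address is used. */
typedef struct SuperSlab SuperSlab;

/* One cached address range and its owner (3 x 8 = 24 bytes on LP64). */
typedef struct {
    uintptr_t  base;  /* first byte covered by the SuperSlab */
    uintptr_t  end;   /* one past the last covered byte */
    SuperSlab *ss;    /* owner of [base, end) */
} HintEntry;

/* 4 x 24 bytes of entries + 16 bytes of metadata = 112 bytes (assumed layout). */
typedef struct {
    HintEntry entries[HINT_SLOTS];
    uint32_t  count;      /* number of valid entries, 0..HINT_SLOTS */
    uint32_t  next_slot;  /* FIFO cursor: the slot overwritten next */
    uint32_t  hits;       /* debug statistics */
    uint32_t  misses;
} HintCache;

_Static_assert(sizeof(HintCache) == 112, "layout assumes a 64-bit target");

/* One cache per thread, zero-initialized, so no locking is needed. */
static __thread HintCache g_hint_sketch;

/* Record a SuperSlab after a successful allocation. */
static inline void hint_update(uintptr_t base, uintptr_t end, SuperSlab *ss) {
    assert(ss != NULL && base < end);
    HintCache *c = &g_hint_sketch;
    for (uint32_t i = 0; i < c->count; i++) {
        if (c->entries[i].ss == ss) return;  /* duplicate: leave FIFO order alone */
    }
    c->entries[c->next_slot] = (HintEntry){ base, end, ss };
    c->next_slot = (c->next_slot + 1) % HINT_SLOTS;  /* oldest slot is evicted next */
    if (c->count < HINT_SLOTS) c->count++;
}

/* Return the cached owner of ptr, or NULL so the caller can fall back. */
static inline SuperSlab *hint_lookup(const void *ptr) {
    HintCache *c = &g_hint_sketch;
    uintptr_t p = (uintptr_t)ptr;
    for (uint32_t i = 0; i < c->count; i++) {
        if (p >= c->entries[i].base && p < c->entries[i].end) {
            c->hits++;
            return c->entries[i].ss;
        }
    }
    c->misses++;
    return NULL;
}

/* Drop all entries and statistics for the calling thread. */
static inline void hint_clear(void) {
    g_hint_sketch = (HintCache){0};
}
```

In this sketch the free path would call `hint_lookup(ptr)` first and only call `hak_super_lookup()` when it returns `NULL`, while the allocation path would call `hint_update()` after each successful allocation.
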

---

## Environment

- **CPU**: [Run: `lscpu | grep "Model name"`]
- **OS**: Linux 6.8.0-87-generic
- **Compiler**: gcc (Ubuntu)
- **Build Date**: 2025-12-03
- **Hakmem Commit**: [Run: `git log -1 --oneline`]

---

## Build Validation

### Build 1: Hint Disabled (Baseline)
```bash
make clean
make shared -j8
```
**Result**: ✅ SUCCESS

### Build 2: Hint Enabled
```bash
make clean
make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_SS_TLS_HINT=1 -DHAKMEM_TINY_HEADERLESS=1"
```
**Result**: ✅ SUCCESS
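
Build 2 passes both flags because the hint requires headerless mode. A compile-time check of that dependency could look like the fragment below. This is a sketch of the idea, not the actual contents of `hakmem_build_flags.h`; the error message and the default for `HAKMEM_TINY_HEADERLESS` (assumed to be 0, as in the baseline build) are assumptions.

```c
/* Default the hint to off, matching the documented default of 0. */
#ifndef HAKMEM_TINY_SS_TLS_HINT
#define HAKMEM_TINY_SS_TLS_HINT 0
#endif

/* Assumed default: headerless mode is off unless requested. */
#ifndef HAKMEM_TINY_HEADERLESS
#define HAKMEM_TINY_HEADERLESS 0
#endif

/* Reject the unsupported combination before any code is compiled. */
#if HAKMEM_TINY_SS_TLS_HINT && !HAKMEM_TINY_HEADERLESS
#error "HAKMEM_TINY_SS_TLS_HINT=1 requires HAKMEM_TINY_HEADERLESS=1"
#endif
```

With a check like this, building with only `-DHAKMEM_TINY_SS_TLS_HINT=1` fails immediately instead of producing a misconfigured library.
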

### Unit Tests
```bash
gcc -o tests/test_tls_ss_hint tests/test_tls_ss_hint.c -I./core \
    -DHAKMEM_TINY_SS_TLS_HINT=1 -DHAKMEM_BUILD_RELEASE=0 -DHAKMEM_TINY_HEADERLESS=1
./tests/test_tls_ss_hint
```
**Result**: ✅ ALL 6 TESTS PASSED

---

## Benchmark Results (To Be Run)

### Methodology

Run each benchmark configuration 3 times and take the median:

```bash
# Configuration 1: Baseline (Headerless OFF, Hint OFF)
make clean
make shared -j8
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench

# Configuration 2: Headerless ON, Hint OFF
make clean
make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=0"
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench

# Configuration 3: Headerless ON, Hint ON
make clean
make shared -j8 EXTRA_CFLAGS="-DHAKMEM_TINY_HEADERLESS=1 -DHAKMEM_TINY_SS_TLS_HINT=1"
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/sh8bench
```

### sh8bench (Memory Stress Test)

| Configuration | Time (sec) | Mops/s | Relative to Baseline | Improvement vs Headerless (Hint OFF) |
|---------------|-----------|---------|----------------------|---------------------------|
| Baseline (Headerless OFF, Hint OFF) | TBD | TBD | 100% | - |
| Headerless ON, Hint OFF | TBD | TBD | TBD | 0% |
| Headerless ON, Hint ON | TBD | TBD | TBD | **TBD** |

**Expected**: Headerless with Hint should recover part of the Headerless performance loss, targeting a 15-20% throughput improvement over Headerless with Hint OFF.
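
Once the runs exist, the table columns can be filled in mechanically. The helpers below are a small sketch of that arithmetic: the median of three runs, throughput relative to a reference configuration, and improvement over a reference. The function names are hypothetical and not part of hakmem or mimalloc-bench.

```c
#include <assert.h>

/* Median of three runs: the value that is neither the minimum nor the maximum. */
static double median3(double a, double b, double c) {
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

/* Throughput as a percentage of a reference configuration (100 = equal). */
static double relative_pct(double mops, double reference_mops) {
    assert(reference_mops > 0.0);
    return 100.0 * mops / reference_mops;
}

/* Throughput gain over a reference, in percent (positive = faster). */
static double improvement_pct(double mops, double reference_mops) {
    return relative_pct(mops, reference_mops) - 100.0;
}
```

"Relative to Baseline" would use the Configuration 1 median as the reference, and "Improvement vs Headerless" would use the Configuration 2 median.
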

### cfrac (Factorization Test)

```bash
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/cfrac 17545186520809
```

| Configuration | Status | Time (sec) | Notes |
|---------------|--------|-----------|-------|
| Baseline | TBD | TBD | - |
| Headerless ON, Hint OFF | TBD | TBD | - |
| Headerless ON, Hint ON | TBD | TBD | No regressions expected |

### larson (Multi-threaded Stress)

```bash
LD_PRELOAD=./libhakmem.so ./mimalloc-bench/out/bench/larson 8
```

| Configuration | Status | Ops/sec | Notes |
|---------------|--------|---------|-------|
| Baseline | TBD | TBD | - |
| Headerless ON, Hint OFF | TBD | TBD | - |
| Headerless ON, Hint ON | TBD | TBD | Multi-threaded hit rate: 70-85% |

---

## Performance Analysis

### Expected Hit Rate

Based on design analysis (Section 9 of TLS_SS_HINT_BOX_DESIGN.md):

- **Single-threaded**: 85-95%
- **Multi-threaded**: 70-85%

### Cycle Count Savings

| Operation | Without Hint | With Hint (Hit) | Savings |
|-----------|-------------|----------------|---------|
| ptr→SuperSlab lookup | 10-50 cycles | 2-5 cycles | **80-95%** |
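
The savings in the table apply to hits only; the average cost also depends on the hit rate. The sketch below combines the two, assuming that a miss pays for the failed hint probe and then for the full lookup. That cost model and the function name are assumptions, not something the design document is quoted as stating.

```c
#include <assert.h>

/* Expected cycles per lookup for hit rate h in [0, 1].
 * Assumes a miss costs the hint probe plus the full lookup. */
static double expected_lookup_cycles(double h, double hit_cycles, double full_cycles) {
    assert(h >= 0.0 && h <= 1.0);
    return h * hit_cycles + (1.0 - h) * (hit_cycles + full_cycles);
}
```

As an illustration with made-up mid-range inputs (not measurements): a 90% hit rate, a 3-cycle hit, and a 30-cycle full lookup give 0.9 × 3 + 0.1 × 33 = 6 cycles on average, an 80% reduction from 30. Under this model the hint pays off whenever the hit rate exceeds `hit_cycles / full_cycles`, which is 10% for these inputs.
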

### Memory Overhead

- Per-thread: 112 bytes (4 slots × 24 bytes + 16 bytes metadata)
- 1000 threads: 112 KB (negligible)

---

## Next Steps

1. **Run Benchmarks**: Execute the benchmark suite on a dedicated machine
2. **Measure Hit Rate**: Build with `HAKMEM_BUILD_RELEASE=0` and add a stats dump at exit
3. **Performance Tuning**: If the hit rate is below 80%, consider increasing the slot count to 8
4. **Production Rollout**: If results meet the target (15-20% improvement), enable by default
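
Steps 2 and 3 could be wired up roughly as follows. This is a sketch with hypothetical process-wide counters and function names; the statistics API of the real Hint Box is not shown in this report, so none of these names should be read as its interface.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical aggregated counters (the real ones are per-thread). */
static uint64_t g_hint_hits;
static uint64_t g_hint_misses;

/* Fraction of lookups served by the hint; 0.0 when nothing was recorded. */
static double hint_hit_rate(uint64_t hits, uint64_t misses) {
    uint64_t total = hits + misses;
    return total ? (double)hits / (double)total : 0.0;
}

/* Step 3's rule of thumb: below 80%, consider growing from 4 to 8 slots. */
static int hint_needs_more_slots(double hit_rate) {
    return hit_rate < 0.80;
}

/* Print one summary line to stderr when the process exits. */
static void hint_stats_dump(void) {
    fprintf(stderr,
            "[tls_ss_hint] hits=%" PRIu64 " misses=%" PRIu64 " hit_rate=%.1f%%\n",
            g_hint_hits, g_hint_misses,
            100.0 * hint_hit_rate(g_hint_hits, g_hint_misses));
}

/* Register the dump; returns 0 on success, like atexit(). */
static int hint_stats_install(void) {
    return atexit(hint_stats_dump);
}
```

One caveat: `atexit()` handlers run on the thread that calls `exit()`, so per-thread counters would need to be folded into shared totals (for example when each thread finishes) before the dump is meaningful for multi-threaded benchmarks such as larson.
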

---

## Success Criteria

✅ **Code Quality**
- [x] Header-only Box design (zero runtime overhead when disabled)
- [x] Follows Box Theory architecture
- [x] Comprehensive unit tests (6/6 passing)
- [x] Fail-safe fallback (miss → `hak_super_lookup()`)

✅ **Build System**
- [x] Compiles with hint disabled (default)
- [x] Compiles with hint enabled
- [x] No regressions in existing tests

⏳ **Performance** (Benchmarking Required)
- [ ] sh8bench: +15-20% throughput vs Headerless baseline
- [ ] cfrac: No regressions
- [ ] larson: No regressions, +15-20% ideal case

---

## Risk Assessment

**Risk Level**: Low

- ✅ Thread-local storage (no cache coherency issues)
- ✅ Read-only cache (never modifies SuperSlab state)
- ✅ Magic number validation (catches stale entries)
- ✅ Fail-safe fallback (miss → `hak_super_lookup()`)
- ✅ Minimal integration surface (2 locations modified)
- ✅ Zero overhead when disabled (compile-time flag)
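
The magic-number check and the fail-safe fallback combine into one small decision on the free path, sketched below. The struct, the magic value, and the function names are hypothetical stand-ins. The sketch also assumes that a hinted SuperSlab's memory stays readable after it is retired, so that checking the magic on a stale entry is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SS_MAGIC_SKETCH 0x53534D47u  /* hypothetical magic value */

/* Minimal stand-in for a SuperSlab header. */
typedef struct {
    uint32_t magic;  /* equals SS_MAGIC_SKETCH while the SuperSlab is live */
} SuperSlabSketch;

/* Signature of the authoritative lookup, standing in for hak_super_lookup(). */
typedef SuperSlabSketch *(*full_lookup_fn)(const void *ptr);

/* Trust the hint only if it still carries a valid magic; otherwise fall back. */
static SuperSlabSketch *resolve_superslab(const void *ptr,
                                          SuperSlabSketch *hinted,
                                          full_lookup_fn full_lookup) {
    assert(full_lookup != NULL);
    if (hinted != NULL && hinted->magic == SS_MAGIC_SKETCH) {
        return hinted;        /* fast path: hint hit and still live */
    }
    return full_lookup(ptr);  /* miss or stale entry: authoritative lookup */
}

/* Stub lookup for demonstration: always resolves to one live SuperSlab. */
static SuperSlabSketch g_live_sketch = { SS_MAGIC_SKETCH };
static SuperSlabSketch *full_lookup_stub(const void *ptr) {
    (void)ptr;
    return &g_live_sketch;
}
```

Because the hint is only ever an accelerator, a wrong or missing entry costs one extra comparison and never changes which SuperSlab the free path ends up using.
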

---

## Conclusion

**Implementation Status**: ✅ Complete

The TLS SuperSlab Hint Box has been successfully implemented as a header-only Box with clean integration into the free and allocation paths. All unit tests pass, and the build succeeds in both configurations (hint enabled/disabled).

**Next Action**: Run full benchmark suite to validate performance targets (15-20% improvement over Headerless baseline).

**Recommendation**: If benchmarks show >= 15% improvement with no regressions, merge to master and plan for default enable in Phase 2.

---

**Generated**: 2025-12-03
**Author**: hakmem team