Implement Phase 1: TLS SuperSlab Hint Box for Headerless performance
Design: Cache recently-used SuperSlab references in TLS to accelerate
ptr→SuperSlab resolution in Headerless mode free() path.
## Implementation
### New Box: core/box/tls_ss_hint_box.h
- Header-only Box (4-slot FIFO cache per thread)
- Functions: tls_ss_hint_init(), tls_ss_hint_update(), tls_ss_hint_lookup(), tls_ss_hint_clear()
- Memory overhead: 112 bytes per thread (negligible)
- Statistics API for debug builds (hit/miss counters)
### Integration Points
1. **Free path** (core/hakmem_tiny_free.inc):
- Lines 477-481: Fast path hint lookup before hak_super_lookup()
- Lines 550-555: Second lookup location (fallback path)
- Expected savings: 10-50 cycles → 2-5 cycles on cache hit
2. **Allocation path** (core/tiny_superslab_alloc.inc.h):
- Lines 115-122: Linear allocation return path
- Lines 179-186: Freelist allocation return path
- Cache update on successful allocation
3. **TLS variable** (core/hakmem_tiny_tls_state_box.inc):
- `__thread TlsSsHintCache g_tls_ss_hint = {0};`
### Build System
- **Build flag** (core/hakmem_build_flags.h):
- HAKMEM_TINY_SS_TLS_HINT (default: 0, disabled)
- Validation: requires HAKMEM_TINY_HEADERLESS=1
- **Makefile**:
- Removed old ss_tls_hint_box.o (conflicting implementation)
- Header-only design eliminates compiled object files
### Testing
- **Unit tests** (tests/test_tls_ss_hint.c):
- 6 test functions covering init, lookup, FIFO rotation, duplicates, clear, stats
- All tests PASSING
- **Build validation**:
- ✅ Compiles with hint disabled (default)
- ✅ Compiles with hint enabled (HAKMEM_TINY_SS_TLS_HINT=1)
### Documentation
- **Benchmark report** (docs/PHASE1_TLS_HINT_BENCHMARK.md):
- Implementation summary
- Build validation results
- Benchmark methodology (to be executed)
- Performance analysis framework
## Expected Performance
- **Hit rate**: 85-95% (single-threaded), 70-85% (multi-threaded)
- **Cycle savings**: 80-95% on cache hit (10-50 cycles → 2-5 cycles)
- **Target improvement**: 15-20% throughput increase vs Headerless baseline
- **Memory overhead**: 112 bytes per thread
## Box Theory
**Mission**: Cache hot SuperSlabs to avoid global registry lookup
**Boundary**: ptr → SuperSlab* or NULL (miss)
**Invariant**: hint.base ≤ ptr < hint.end → hit is valid
**Fallback**: Always safe to miss (triggers hak_super_lookup)
**Thread Safety**: TLS storage, no synchronization required
**Risk**: Low (read-only cache, fail-safe fallback, magic validation)
## Next Steps
1. Run full benchmark suite (sh8bench, cfrac, larson)
2. Measure actual hit rate with stats enabled
3. If performance target met (15-20% improvement), enable by default
4. Consider increasing cache slots if hit rate < 80%
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
@ -11,6 +11,9 @@
|
||||
#include "tiny_box_geometry.h" // Box 3: Geometry & Capacity Calculator"
|
||||
#include "tiny_debug_api.h" // Guard/failfast declarations
|
||||
#include "hakmem_env_cache.h" // Priority-2: ENV cache (eliminate syscalls)
|
||||
#if HAKMEM_TINY_SS_TLS_HINT
|
||||
#include "box/tls_ss_hint_box.h" // Phase 1: TLS SuperSlab Hint Cache for Headerless mode
|
||||
#endif
|
||||
|
||||
// ============================================================================
|
||||
// Phase 6.24: Allocate from SuperSlab slab (lazy freelist + linear allocation)
|
||||
@ -112,6 +115,14 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
|
||||
tiny_remote_track_on_alloc(ss, slab_idx, user, "linear_alloc", 0);
|
||||
tiny_remote_assert_not_remote(ss, slab_idx, user, "linear_alloc_ret", 0);
|
||||
}
|
||||
// Phase 1: Update TLS hint cache with this SuperSlab (fast free path optimization)
|
||||
#if HAKMEM_TINY_SS_TLS_HINT
|
||||
{
|
||||
void* ss_base = (void*)ss;
|
||||
size_t ss_size = (size_t)1ULL << ss->lg_size;
|
||||
tls_ss_hint_update(ss, ss_base, ss_size);
|
||||
}
|
||||
#endif
|
||||
return user;
|
||||
}
|
||||
|
||||
@ -167,6 +178,14 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
|
||||
tiny_region_id_write_header(block, meta->class_idx);
|
||||
#else
|
||||
block;
|
||||
#endif
|
||||
// Phase 1: Update TLS hint cache with this SuperSlab (fast free path optimization)
|
||||
#if HAKMEM_TINY_SS_TLS_HINT
|
||||
{
|
||||
void* ss_base = (void*)ss;
|
||||
size_t ss_size = (size_t)1ULL << ss->lg_size;
|
||||
tls_ss_hint_update(ss, ss_base, ss_size);
|
||||
}
|
||||
#endif
|
||||
return user;
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user