Implement Phase 1: TLS SuperSlab Hint Box for Headerless performance

Design: Cache recently-used SuperSlab references in TLS to accelerate
ptr→SuperSlab resolution in Headerless mode free() path.

## Implementation

### New Box: core/box/tls_ss_hint_box.h
- Header-only Box (4-slot FIFO cache per thread)
- Functions: tls_ss_hint_init(), tls_ss_hint_update(), tls_ss_hint_lookup(), tls_ss_hint_clear()
- Memory overhead: 112 bytes per thread (negligible)
- Statistics API for debug builds (hit/miss counters)

### Integration Points

1. **Free path** (core/hakmem_tiny_free.inc):
   - Lines 477-481: Fast path hint lookup before hak_super_lookup()
   - Lines 550-555: Second lookup location (fallback path)
   - Expected savings: 10-50 cycles → 2-5 cycles on cache hit

2. **Allocation path** (core/tiny_superslab_alloc.inc.h):
   - Lines 115-122: Linear allocation return path
   - Lines 179-186: Freelist allocation return path
   - Cache update on successful allocation

3. **TLS variable** (core/hakmem_tiny_tls_state_box.inc):
   - `__thread TlsSsHintCache g_tls_ss_hint = {0};`

### Build System

- **Build flag** (core/hakmem_build_flags.h):
  - HAKMEM_TINY_SS_TLS_HINT (default: 0, disabled)
  - Validation: requires HAKMEM_TINY_HEADERLESS=1

- **Makefile**:
  - Removed old ss_tls_hint_box.o (conflicting implementation)
  - Header-only design eliminates compiled object files

### Testing

- **Unit tests** (tests/test_tls_ss_hint.c):
  - 6 test functions covering init, lookup, FIFO rotation, duplicates, clear, stats
  - All tests PASSING

- **Build validation**:
  -  Compiles with hint disabled (default)
  -  Compiles with hint enabled (HAKMEM_TINY_SS_TLS_HINT=1)

### Documentation

- **Benchmark report** (docs/PHASE1_TLS_HINT_BENCHMARK.md):
  - Implementation summary
  - Build validation results
  - Benchmark methodology (to be executed)
  - Performance analysis framework

## Expected Performance

- **Hit rate**: 85-95% (single-threaded), 70-85% (multi-threaded)
- **Cycle savings**: 80-95% on cache hit (10-50 cycles → 2-5 cycles)
- **Target improvement**: 15-20% throughput increase vs Headerless baseline
- **Memory overhead**: 112 bytes per thread

## Box Theory

**Mission**: Cache hot SuperSlabs to avoid global registry lookup

**Boundary**: ptr → SuperSlab* or NULL (miss)

**Invariant**: hint.base ≤ ptr < hint.end → hit is valid

**Fallback**: Always safe to miss (triggers hak_super_lookup)

**Thread Safety**: TLS storage, no synchronization required

**Risk**: Low (read-only cache, fail-safe fallback, magic validation)

## Next Steps

1. Run full benchmark suite (sh8bench, cfrac, larson)
2. Measure actual hit rate with stats enabled
3. If performance target met (15-20% improvement), enable by default
4. Consider increasing cache slots if hit rate < 80%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-12-03 18:06:24 +09:00
parent d397994b23
commit 94f9ea5104
8 changed files with 802 additions and 7 deletions

View File

@ -11,6 +11,9 @@
#include "tiny_box_geometry.h" // Box 3: Geometry & Capacity Calculator"
#include "tiny_debug_api.h" // Guard/failfast declarations
#include "hakmem_env_cache.h" // Priority-2: ENV cache (eliminate syscalls)
#if HAKMEM_TINY_SS_TLS_HINT
#include "box/tls_ss_hint_box.h" // Phase 1: TLS SuperSlab Hint Cache for Headerless mode
#endif
// ============================================================================
// Phase 6.24: Allocate from SuperSlab slab (lazy freelist + linear allocation)
@ -112,6 +115,14 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
tiny_remote_track_on_alloc(ss, slab_idx, user, "linear_alloc", 0);
tiny_remote_assert_not_remote(ss, slab_idx, user, "linear_alloc_ret", 0);
}
// Phase 1: Update TLS hint cache with this SuperSlab (fast free path optimization)
#if HAKMEM_TINY_SS_TLS_HINT
{
void* ss_base = (void*)ss;
size_t ss_size = (size_t)1ULL << ss->lg_size;
tls_ss_hint_update(ss, ss_base, ss_size);
}
#endif
return user;
}
@ -167,6 +178,14 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
tiny_region_id_write_header(block, meta->class_idx);
#else
block;
#endif
// Phase 1: Update TLS hint cache with this SuperSlab (fast free path optimization)
#if HAKMEM_TINY_SS_TLS_HINT
{
void* ss_base = (void*)ss;
size_t ss_size = (size_t)1ULL << ss->lg_size;
tls_ss_hint_update(ss, ss_base, ss_size);
}
#endif
return user;
}