hakmem/PHASE1_TLS_HINT_BENCHMARK.md at b2724e6f5d848dc96dba4b4b75b332e891e95ca1

Moe Charm (CI) 94f9ea5104 Implement Phase 1: TLS SuperSlab Hint Box for Headerless performance

Design: Cache recently used SuperSlab references in TLS to accelerate
ptr→SuperSlab resolution on the free() path in Headerless mode.

## Implementation

### New Box: core/box/tls_ss_hint_box.h
- Header-only Box (4-slot FIFO cache per thread)
- Functions: tls_ss_hint_init(), tls_ss_hint_update(), tls_ss_hint_lookup(), tls_ss_hint_clear()
- Memory overhead: 112 bytes per thread (negligible)
- Statistics API for debug builds (hit/miss counters)
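
To make the 4-slot FIFO behaviour concrete, here is a minimal sketch of such a cache. It is not the contents of `tls_ss_hint_box.h`: the `HintEntry`/`HintCache` layout, the `hint_*` names, and the duplicate-skipping policy are assumptions made for illustration, and the struct is not meant to reproduce the 112-byte figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HINT_SLOTS 4  /* "4-slot FIFO cache per thread" */

/* Hypothetical layout; the real TlsSsHintCache fields are not shown here. */
typedef struct {
    uintptr_t base;  /* first byte covered by the SuperSlab */
    uintptr_t end;   /* one past the last byte (half-open range) */
    void     *ss;    /* opaque SuperSlab handle; NULL marks an empty slot */
} HintEntry;

typedef struct {
    HintEntry slot[HINT_SLOTS];
    uint32_t  next;  /* FIFO cursor: the slot the next insert overwrites */
} HintCache;

static inline void hint_init(HintCache *c)  { memset(c, 0, sizeof *c); }
static inline void hint_clear(HintCache *c) { memset(c, 0, sizeof *c); }

/* Return the cached SuperSlab covering ptr, or NULL on a miss. */
static inline void *hint_lookup(const HintCache *c, const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    for (int i = 0; i < HINT_SLOTS; i++) {
        const HintEntry *e = &c->slot[i];
        if (e->ss != NULL && p >= e->base && p < e->end) return e->ss;
    }
    return NULL;
}

/* Record a SuperSlab; skip duplicates, otherwise overwrite the oldest slot. */
static inline void hint_update(HintCache *c, void *ss, uintptr_t base, size_t size) {
    assert(ss != NULL && size > 0);
    for (int i = 0; i < HINT_SLOTS; i++)
        if (c->slot[i].ss == ss) return;  /* already cached, keep FIFO order */
    c->slot[c->next] = (HintEntry){ base, base + size, ss };
    c->next = (c->next + 1) % HINT_SLOTS;
}
```

With four entries cached, inserting a fifth SuperSlab evicts the one inserted first; re-inserting an already cached SuperSlab leaves the cursor untouched.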

### Integration Points

1. **Free path** (core/hakmem_tiny_free.inc):
   - Lines 477-481: Fast path hint lookup before hak_super_lookup()
   - Lines 550-555: Second lookup location (fallback path)
   - Expected savings: 10-50 cycles → 2-5 cycles on cache hit

2. **Allocation path** (core/tiny_superslab_alloc.inc.h):
   - Lines 115-122: Linear allocation return path
   - Lines 179-186: Freelist allocation return path
   - Cache update on successful allocation

3. **TLS variable** (core/hakmem_tiny_tls_state_box.inc):
   - `__thread TlsSsHintCache g_tls_ss_hint = {0};`
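
The interplay between the three integration points can be sketched as follows. This is a simplified model, not the code in `hakmem_tiny_free.inc`: `Slab`, `registry_lookup()` (standing in for `hak_super_lookup()`), `resolve_for_free()`, `note_alloc()`, and the single-entry hint are all hypothetical names and simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a SuperSlab; the real struct is not shown here. */
typedef struct { uintptr_t base, end; } Slab;

/* A single-entry hint keeps the sketch short; the Box itself has 4 slots. */
static _Thread_local Slab *t_hint = NULL;

/* Toy global registry standing in for hak_super_lookup(). */
static Slab    *g_registry[8];
static size_t   g_registry_len = 0;
static unsigned g_slow_lookups = 0;  /* counts fallbacks, illustration only */

static Slab *registry_lookup(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    g_slow_lookups++;
    for (size_t i = 0; i < g_registry_len; i++)
        if (p >= g_registry[i]->base && p < g_registry[i]->end)
            return g_registry[i];
    return NULL;
}

/* Free path: try the TLS hint first, fall back to the registry on a miss. */
static Slab *resolve_for_free(const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    Slab *ss = t_hint;
    if (ss != NULL && p >= ss->base && p < ss->end) return ss;  /* hit */
    return registry_lookup(ptr);                                /* miss */
}

/* Allocation path: remember the SuperSlab that just served a block. */
static void note_alloc(Slab *ss) {
    assert(ss != NULL);
    t_hint = ss;
}
```

Because the hint lives in thread-local storage, neither path takes a lock; a miss only costs the extra range check before the normal lookup.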

### Build System

- **Build flag** (core/hakmem_build_flags.h):
  - HAKMEM_TINY_SS_TLS_HINT (default: 0, disabled)
  - Validation: requires HAKMEM_TINY_HEADERLESS=1

- **Makefile**:
  - Removed old ss_tls_hint_box.o (conflicting implementation)
  - Header-only design eliminates compiled object files
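
The flag dependency can be enforced at compile time. The fragment below is a sketch of one way to do it, not a copy of `hakmem_build_flags.h`: the default of 0 for `HAKMEM_TINY_HEADERLESS`, the error text, and the `tls_hint_enabled()` helper are assumptions.

```c
#include <assert.h>

/* The hint is off unless explicitly enabled. The Headerless default of 0
 * is an assumption for this sketch. */
#ifndef HAKMEM_TINY_HEADERLESS
#define HAKMEM_TINY_HEADERLESS 0
#endif
#ifndef HAKMEM_TINY_SS_TLS_HINT
#define HAKMEM_TINY_SS_TLS_HINT 0
#endif

/* Reject the unsupported combination before any code is compiled. */
#if HAKMEM_TINY_SS_TLS_HINT && !HAKMEM_TINY_HEADERLESS
#error "HAKMEM_TINY_SS_TLS_HINT=1 requires HAKMEM_TINY_HEADERLESS=1"
#endif

static_assert(HAKMEM_TINY_SS_TLS_HINT == 0 || HAKMEM_TINY_SS_TLS_HINT == 1,
              "HAKMEM_TINY_SS_TLS_HINT must be 0 or 1");

/* Hypothetical helper: call sites can branch on it and the compiler
 * folds the constant away. */
static inline int tls_hint_enabled(void) {
    return HAKMEM_TINY_SS_TLS_HINT;
}
```

Passing both macros as 1 on the compiler command line (for example with `-D`) enables the hint; enabling only the hint stops the build at the `#error` line.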

### Testing

- **Unit tests** (tests/test_tls_ss_hint.c):
  - 6 test functions covering init, lookup, FIFO rotation, duplicates, clear, stats
  - All tests PASSING

- **Build validation**:
  - ✅ Compiles with hint disabled (default)
  - ✅ Compiles with hint enabled (HAKMEM_TINY_SS_TLS_HINT=1)

### Documentation

- **Benchmark report** (docs/PHASE1_TLS_HINT_BENCHMARK.md):
  - Implementation summary
  - Build validation results
  - Benchmark methodology (to be executed)
  - Performance analysis framework

## Expected Performance

- **Hit rate**: 85-95% (single-threaded), 70-85% (multi-threaded)
- **Cycle savings**: 80-95% on cache hit (10-50 cycles → 2-5 cycles)
- **Target improvement**: 15-20% throughput increase vs Headerless baseline
- **Memory overhead**: 112 bytes per thread
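
As a rough check on these figures, the sketch below computes the expected cost per lookup and the per-hit savings. The cost model is an assumption (a miss pays for the failed hint probe plus the registry lookup), and the function names are hypothetical; none of the numbers are measurements.

```c
#include <assert.h>

/* Expected cycles per ptr -> SuperSlab resolution.
 * Assumed model: a hit costs hit_cycles; a miss costs the failed probe
 * (hit_cycles) plus the registry lookup (registry_cycles). */
static double expected_lookup_cycles(double hit_rate, double hit_cycles,
                                     double registry_cycles) {
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * hit_cycles
         + (1.0 - hit_rate) * (hit_cycles + registry_cycles);
}

/* Fraction of cycles saved on a hit, relative to the registry lookup. */
static double hit_savings(double hit_cycles, double registry_cycles) {
    assert(registry_cycles > 0.0);
    return 1.0 - hit_cycles / registry_cycles;
}
```

Under this model, a 90% hit rate with a 5-cycle hit and a 50-cycle registry lookup averages about 10 cycles per resolution. Pairing the endpoints of the quoted ranges gives per-hit savings between 50% (10 → 5 cycles) and 96% (50 → 2 cycles); the 80-95% figure corresponds to the more favourable pairings.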

## Box Theory

**Mission**: Cache hot SuperSlabs to avoid global registry lookup

**Boundary**: ptr → SuperSlab* or NULL (miss)

**Invariant**: hint.base ≤ ptr < hint.end → hit is valid

**Fallback**: Always safe to miss (triggers hak_super_lookup)

**Thread Safety**: TLS storage, no synchronization required

**Risk**: Low (read-only cache, fail-safe fallback, magic validation)
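
The invariant and the magic validation can be combined into a single check. The sketch below assumes a hypothetical `SlabHdr` with a `magic` field and an invented `SLAB_MAGIC` value; the real SuperSlab header and magic constant are not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB_MAGIC 0x53534C42u  /* invented tag value for this sketch */

/* Hypothetical header; only the fields needed for validation. */
typedef struct {
    uint32_t  magic;
    uintptr_t base;
    uintptr_t end;   /* one past the last byte */
} SlabHdr;

/* Trust a hint only if the slab still carries its magic tag and the
 * pointer lies in the half-open range [base, end). Anything else is
 * reported as a miss so the caller falls back to the full lookup. */
static SlabHdr *hint_validate(SlabHdr *hint, const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    if (hint == NULL) return NULL;
    if (hint->magic != SLAB_MAGIC) return NULL;  /* stale or recycled slab */
    if (p < hint->base || p >= hint->end) return NULL;
    return hint;
}
```

Note that `ptr == end` is a miss: the range is half-open, so a pointer one past the slab belongs to whatever follows it. The sketch also assumes the header memory stays readable for as long as a hint refers to it.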

## Next Steps

1. Run full benchmark suite (sh8bench, cfrac, larson)
2. Measure actual hit rate with stats enabled
3. If performance target met (15-20% improvement), enable by default
4. Consider increasing cache slots if hit rate < 80%

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

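## Benchmark Results (To Be Run)

### sh8bench (Memory Stress Test)

| Configuration | Time (sec) | Mops/s | Relative to Baseline | Improvement vs Headerless |
|---|---|---|---|---|
| Baseline (Headerless OFF, Hint OFF) | TBD | TBD | 100% | - |
| Headerless ON, Hint OFF | TBD | TBD | TBD | 0% |
| Headerless ON, Hint ON | TBD | TBD | TBD | TBD |

### cfrac (Factorization Test)

| Configuration | Status | Time (sec) | Notes |
|---|---|---|---|
| Baseline | TBD | TBD | - |
| Headerless ON, Hint OFF | TBD | TBD | - |
| Headerless ON, Hint ON | TBD | TBD | No regressions expected |

### larson (Multi-threaded Stress)

| Configuration | Status | Ops/sec | Notes |
|---|---|---|---|
| Baseline | TBD | TBD | - |
| Headerless ON, Hint OFF | TBD | TBD | - |
| Headerless ON, Hint ON | TBD | TBD | Multi-threaded hit rate: 70-85% |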


Phase 1: TLS SuperSlab Hint Box - Benchmark Report

Implementation Summary

What Was Implemented

Environment

Build Validation

Build 1: Hint Disabled (Baseline)

Build 2: Hint Enabled

Unit Tests

Benchmark Results (To Be Run)

Methodology

sh8bench (Memory Stress Test)

cfrac (Factorization Test)

larson (Multi-threaded Stress)

Performance Analysis

Expected Hit Rate

Cycle Count Savings

Memory Overhead

Next Steps

Success Criteria

Risk Assessment

Conclusion
