- Implement tiny_page_box.c/h: per-thread page cache between UC and Shared Pool
- Integrate Page Box into Unified Cache refill path
- Remove legacy SuperSlab implementation (merged into smallmid)
- Add HAKMEM_TINY_PAGE_BOX_CLASSES env var for selective class enabling
- Update bench_random_mixed.c with Page Box statistics
Current status: Implementation safe, no regressions.
Page Box ON/OFF shows minimal difference - pool strategy needs tuning.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Key Changes:
- Reduced static capacity from 16 to 12 SuperSlabs per class
- Fixed prefill threshold from hardcoded 4 to match capacity (12)
- Updated environment variable clamping to [1,12]
- This allows warm pool to actually utilize its full capacity
Performance:
- Baseline (post-unified-cache-opt): 4.76M ops/s
- After Phase 1: 4.84M ops/s
- Improvement: +1.6% (expected +15-20%)
Note: Actual improvement lower than expected because the warm pool
bottleneck is only part of the overall allocation path. Unified cache
optimization (+14.9%) already addressed much of the registry scan overhead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Problem: Warm pool had 0% hit rate (only 1 hit per 3976 misses) despite being
implemented, causing all cache misses to go through expensive superslab_refill
registry scans.
Root Cause Analysis:
- Warm pool was initialized once and pushed a single slab after each refill
- When that slab was exhausted, it was discarded (not pushed back)
- Next refill would push another single slab, which was immediately exhausted
- Pool would oscillate between 0 and 1 items, yielding 0% hit rate
Solution: Secondary Prefill on Cache Miss
When warm pool becomes empty, we now do multiple superslab_refills and prefill
the pool with 3 additional HOT superlslabs before attempting to carve. This
builds a working set of slabs that can sustain allocation pressure.
Implementation Details:
- Modified unified_cache_refill() cold path to detect empty pool
- Added prefill loop: when pool count == 0, load 3 extra superlslabs
- Store extra slabs in warm pool, keep 1 in TLS for immediate carving
- Track prefill events in g_warm_pool_stats[].prefilled counter
Results (1M Random Mixed 256B allocations):
- Before: C7 hits=1, misses=3976, hit_rate=0.0%
- After: C7 hits=3929, misses=3143, hit_rate=55.6%
- Throughput: 4.055M ops/s (maintained vs 4.07M baseline)
- Stability: Consistent 55.6% hit rate at 5M allocations (4.102M ops/s)
Performance Impact:
- No regression: throughput remained stable at ~4.1M ops/s
- Registry scan avoided in 55.6% of cache misses (significant savings)
- Warm pool now functioning as intended with strong locality
Configuration:
- TINY_WARM_POOL_MAX_PER_CLASS increased from 4 to 16 to support prefill
- Prefill budget hardcoded to 3 (tunable via env var if needed later)
- All statistics always compiled, ENV-gated printing via HAKMEM_WARM_POOL_STATS=1
Next Steps:
- Monitor for further optimization opportunities (prefill budget tuning)
- Consider adaptive prefill budget based on class-specific hit rates
- Validate at larger allocation counts (10M+ pending registry size fix)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
ChatGPT's diagnostic changes to address TLS_SLL_HDR_RESET issue.
Current status: Partial mitigation, but root cause remains.
Changes Applied:
1. SuperSlab Registry Fallback (hakmem_super_registry.h)
- Added legacy table probe when hash map lookup misses
- Prevents NULL returns for valid SuperSlabs during initialization
- Status: ✅ Works but may hide underlying registration issues
2. TLS SLL Push Validation (tls_sll_box.h)
- Reject push if SuperSlab lookup returns NULL
- Reject push if class_idx mismatch detected
- Added [TLS_SLL_PUSH_NO_SS] diagnostic message
- Status: ✅ Prevents list corruption (defensive)
3. SuperSlab Allocation Class Fix (superslab_allocate.c)
- Pass actual class_idx to sp_internal_allocate_superslab
- Prevents dummy class=8 causing OOB access
- Status: ✅ Root cause fix for allocation path
4. Debug Output Additions
- First 256 push/pop operations traced
- First 4 mismatches logged with details
- SuperSlab registration state logged
- Status: ✅ Diagnostic tool (not a fix)
5. TLS Hint Box Removed
- Deleted ss_tls_hint_box.{c,h} (Phase 1 optimization)
- Simplified to focus on stability first
- Status: ⏳ Can be re-added after root cause fixed
Current Problem (REMAINS UNSOLVED):
- [TLS_SLL_HDR_RESET] still occurs after ~60 seconds of sh8bench
- Pointer is 16 bytes offset from expected (class 1 → class 2 boundary)
- hak_super_lookup returns NULL for that pointer
- Suggests: Use-After-Free, Double-Free, or pointer arithmetic error
Root Cause Analysis:
- Pattern: Pointer offset by +16 (one class 1 stride)
- Timing: Cumulative problem (appears after 60s, not immediately)
- Location: Header corruption detected during TLS SLL pop
Remaining Issues:
⚠️ Registry fallback is defensive (may hide registration bugs)
⚠️ Push validation prevents symptoms but not root cause
⚠️ 16-byte pointer offset source unidentified
Next Steps for Investigation:
1. Full pointer arithmetic audit (Magazine ⇔ TLS SLL paths)
2. Enhanced logging at HDR_RESET point:
- Expected vs actual pointer value
- Pointer provenance (where it came from)
- Allocation trace for that block
3. Verify Headerless flag is OFF throughout build
4. Check for double-offset application in conversions
Technical Assessment:
- 60% root cause fixes (allocation class, validation)
- 40% defensive mitigation (registry fallback, push rejection)
Performance Impact:
- Registry fallback: +10-30 cycles on cold path (negligible)
- Push validation: +5-10 cycles per push (acceptable)
- Overall: < 2% performance impact estimated
Related Issues:
- Phase 1 TLS Hint Box removed temporarily
- Phase 2 Headerless blocked until stability achieved
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Removed Legacy Backend fallback; Shared Pool is now the sole backend.
- Removed Soft Cap limit in Shared Pool to allow full memory management.
- Implemented EMPTY slab recycling with batched meta->used decrement in remote drain.
- Updated tiny_free_local_box to return is_empty status for safe recycling.
- Fixed race condition in release path by removing from legacy list early.
- Achieved 50.3M ops/s in WS8192 benchmark (+200% vs baseline).
- Split core/hakmem_shared_pool.c into acquire/release modules for maintainability.
- Introduced core/hakmem_shared_pool_internal.h for shared internal API.
- Fixed incorrect function name usage (superslab_alloc -> superslab_allocate).
- Increased SUPER_REG_SIZE to 1M to support large working sets (Phase 9-2 fix).
- Updated Makefile.
- Verified with benchmarks.