a6e681aae7
P2: TLS SLL Redesign - class_map default, tls_cached tracking, conditional header restore
This commit completes the P2 phase of the Tiny Pool TLS SLL redesign, which fixes the
Header/Next pointer conflict that caused a ~30% crash rate.
Changes:
- P2.1: Make class_map lookup the default (ENV: HAKMEM_TINY_NO_CLASS_MAP=1 for legacy)
- P2.2: Add meta->tls_cached field to track blocks cached in TLS SLL
- P2.3: Make Header restoration conditional in tiny_next_store() (default: skip)
- P2.4: Add invariant verification functions (active + tls_cached ≈ used)
- P0.4: Document new ENV variables in ENV_VARS.md
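The P2.4 invariant (active + tls_cached ≈ used) can be sketched as below. This is a minimal illustration, not the code from this commit: the struct TinySlabMetaSketch, the function tiny_invariant_holds(), and the slack parameter are all hypothetical names. The slack is one assumed way to model the "≈", since the counters are read separately and a concurrent push or pop can skew a snapshot.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the real slab metadata; only the three
 * counters named in the commit message are modeled. */
typedef struct {
    uint16_t used;                /* blocks carved out of the slab */
    _Atomic uint16_t active;      /* blocks currently held by users */
    _Atomic uint16_t tls_cached;  /* blocks parked in a TLS SLL */
} TinySlabMetaSketch;

/* Returns true when active + tls_cached is within `slack` of used.
 * The slack is an assumption made for this sketch: the counters are
 * loaded one after another, so a snapshot can be briefly inconsistent. */
static bool tiny_invariant_holds(TinySlabMetaSketch *m, uint32_t slack)
{
    uint32_t active = atomic_load_explicit(&m->active, memory_order_relaxed);
    uint32_t cached = atomic_load_explicit(&m->tls_cached, memory_order_relaxed);
    uint32_t sum = active + cached;
    uint32_t used = m->used;
    uint32_t diff = sum > used ? sum - used : used - sum;
    return diff <= slack;
}
```

A debug build could call such a check periodically and dump the three counters when it fails.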
New ENV variables:
- HAKMEM_TINY_ACTIVE_TRACK=1: Enable active/tls_cached tracking (~1% overhead)
- HAKMEM_TINY_NO_CLASS_MAP=1: Disable class_map (legacy mode)
- HAKMEM_TINY_RESTORE_HEADER=1: Force header restoration (legacy mode)
- HAKMEM_TINY_INVARIANT_CHECK=1: Enable invariant verification (debug)
- HAKMEM_TINY_INVARIANT_DUMP=1: Enable periodic state dumps (debug)
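An ENV-gated switch like the ones listed above is commonly read once and cached so the hot path does not call getenv() repeatedly. The sketch below assumes that pattern; env_flag_enabled() and tiny_active_track_enabled() are hypothetical helpers, and the exact parsing rule (first character is '1') is an assumption, not taken from this commit.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 when the variable is set and its value
 * starts with '1', otherwise 0. */
static int env_flag_enabled(const char *name)
{
    const char *v = getenv(name);
    return (v != NULL && v[0] == '1') ? 1 : 0;
}

/* Cached gate: -1 = not read yet, 0 = off, 1 = on.
 * Not synchronized; a real allocator would more likely use an atomic
 * or resolve the flag once during initialization. */
static int tiny_active_track_enabled(void)
{
    static int cached = -1;
    if (cached < 0) {
        cached = env_flag_enabled("HAKMEM_TINY_ACTIVE_TRACK");
    }
    return cached;
}
```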
Benchmark results (bench_tiny_hot_hakmem 64B):
- Default (class_map ON): 84.49 M ops/sec
- ACTIVE_TRACK=1: 83.62 M ops/sec (-1%)
- NO_CLASS_MAP=1 (legacy): 85.06 M ops/sec
- MT performance: +21-28% vs system allocator
No crashes observed. All tests passed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 14:11:37 +09:00
6b86c60a20
P1.3: Add meta->active for TLS SLL tracking
Add active field to TinySlabMeta to track blocks currently held by
users (not in TLS SLL or freelist caches). This enables accurate
empty slab detection that accounts for TLS SLL cached blocks.
Changes:
- superslab_types.h: Add _Atomic uint16_t active field
- ss_allocation_box.c, hakmem_tiny_superslab.c: Initialize active=0
- tiny_free_fast_v2.inc.h: Decrement active on TLS SLL push
- tiny_alloc_fast.inc.h: Add tiny_active_track_alloc() helper, increment active on TLS SLL pop (all code paths)
- ss_hot_cold_box.h: ss_is_slab_empty() uses active when enabled
All tracking is ENV-gated: HAKMEM_TINY_ACTIVE_TRACK=1 to enable.
Default is off for zero performance impact.
Invariant: active = used - tls_cached (active <= used)
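The counter updates described above can be sketched as follows. This is an illustration under assumptions, not the code from this commit: SlabMetaSketch, track_alloc(), track_free(), and slab_is_empty() are hypothetical names, and the rule that a slab counts as empty when active reaches 0 is inferred from the description of ss_is_slab_empty().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, reduced slab metadata. */
typedef struct {
    uint16_t used;            /* blocks handed out by the slab */
    _Atomic uint16_t active;  /* blocks currently held by users */
} SlabMetaSketch;

/* TLS SLL pop: a cached block goes to the user, so active grows. */
static void track_alloc(SlabMetaSketch *m)
{
    uint16_t prev = atomic_fetch_add_explicit(&m->active, 1, memory_order_relaxed);
    assert(prev < m->used);  /* keeps active <= used */
    (void)prev;
}

/* TLS SLL push: the user returned a block to the cache, so active shrinks. */
static void track_free(SlabMetaSketch *m)
{
    uint16_t prev = atomic_fetch_sub_explicit(&m->active, 1, memory_order_relaxed);
    assert(prev > 0);        /* guards against underflow */
    (void)prev;
}

/* With tracking on, blocks that only sit in a TLS SLL no longer make the
 * slab look busy; with tracking off, fall back to used == 0. */
static int slab_is_empty(SlabMetaSketch *m, int active_track)
{
    if (active_track) {
        return atomic_load_explicit(&m->active, memory_order_relaxed) == 0;
    }
    return m->used == 0;
}
```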
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 13:53:45 +09:00
6afaa5703a
Phase 12-1.1: EMPTY Slab Detection + Immediate Reuse (+13% improvement, 10.2M→11.5M ops/s)
Implementation of Task-sensei Priority 1 recommendation: Add empty_mask to SuperSlab
for immediate EMPTY slab detection and reuse, reducing Stage 3 (mmap) overhead.
## Changes
### 1. SuperSlab Structure (core/superslab/superslab_types.h)
- Added `empty_mask` (uint32_t): Bitmap for EMPTY slabs (used==0)
- Added `empty_count` (uint8_t): Quick check for EMPTY slab availability
### 2. EMPTY Detection API (core/box/ss_hot_cold_box.h)
- Added `ss_is_slab_empty()`: Returns true if slab is completely EMPTY
- Added `ss_mark_slab_empty()`: Marks slab as EMPTY (highest reuse priority)
- Added `ss_clear_slab_empty()`: Removes EMPTY state when reactivated
- Updated `ss_update_hot_cold_indices()`: Classify EMPTY/Hot/Cold slabs
- Updated `ss_init_hot_cold()`: Initialize empty_mask/empty_count
### 3. Free Path Integration (core/box/free_local_box.c)
- After `meta->used--`, check if `meta->used == 0`
- If true, call `ss_mark_slab_empty()` to update empty_mask
- Enables immediate EMPTY detection on every free operation
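The mark/clear API and the free-path hook described in sections 2 and 3 might look roughly like the sketch below. It is single-threaded and uses hypothetical `Sketch*` types and `sketch_*` functions; the 32-slab layout is an assumption chosen to match the `uint32_t` mask, and the real code may use atomics.

```c
#include <stdint.h>

#define SKETCH_SLABS_PER_SS 32  /* assumed: one bit per slab in a uint32_t */

typedef struct { uint16_t used; } SketchSlabMeta;

typedef struct {
    uint32_t empty_mask;   /* bit i set => slab i is EMPTY (used == 0) */
    uint8_t  empty_count;  /* number of set bits, for a quick check */
    SketchSlabMeta slabs[SKETCH_SLABS_PER_SS];
} SketchSuperSlab;

/* Set the EMPTY bit once; repeated calls must not inflate the count. */
static void sketch_mark_slab_empty(SketchSuperSlab *ss, int idx)
{
    uint32_t bit = UINT32_C(1) << idx;
    if (!(ss->empty_mask & bit)) {
        ss->empty_mask |= bit;
        ss->empty_count++;
    }
}

/* Drop the EMPTY bit when the slab is reactivated. */
static void sketch_clear_slab_empty(SketchSuperSlab *ss, int idx)
{
    uint32_t bit = UINT32_C(1) << idx;
    if (ss->empty_mask & bit) {
        ss->empty_mask &= ~bit;
        ss->empty_count--;
    }
}

/* Free path: after used--, a slab that reaches zero is marked EMPTY. */
static void sketch_free_local(SketchSuperSlab *ss, int idx)
{
    SketchSlabMeta *meta = &ss->slabs[idx];
    meta->used--;
    if (meta->used == 0) {
        sketch_mark_slab_empty(ss, idx);
    }
}
```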
### 4. Shared Pool Stage 0.5 (core/hakmem_shared_pool.c)
- New Stage 0.5 before Stage 1: Scan existing SuperSlabs for EMPTY slabs
- Iterate over `g_super_reg_by_class[class_idx][]` (first 16 entries)
- Check `ss->empty_count > 0` → scan `empty_mask` with `__builtin_ctz()`
- Reuse EMPTY slab directly, avoiding Stage 3 (mmap/lock overhead)
- ENV control: `HAKMEM_SS_EMPTY_REUSE=1` (default OFF for A/B testing)
- ENV tunable: `HAKMEM_SS_EMPTY_SCAN_LIMIT=N` (default 16 SuperSlabs)
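The Stage 0.5 scan can be sketched as below, assuming a per-class array of SuperSlab pointers. `ScanSuperSlab` and `sketch_stage05_scan()` are hypothetical; only the `empty_count` check, the `__builtin_ctz()` bit scan (a GCC/Clang builtin), and the scan limit come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t empty_mask;   /* bit i set => slab i is EMPTY */
    uint8_t  empty_count;  /* number of EMPTY slabs */
} ScanSuperSlab;

/* Scans at most `scan_limit` registry entries. On a hit, claims the
 * lowest EMPTY slab, stores its index in *slab_idx, and returns the
 * registry index of its SuperSlab. Returns -1 when nothing is found,
 * in which case the caller would fall through to the later stages. */
static int sketch_stage05_scan(ScanSuperSlab *const *reg, size_t n,
                               size_t scan_limit, int *slab_idx)
{
    size_t limit = n < scan_limit ? n : scan_limit;
    for (size_t i = 0; i < limit; i++) {
        ScanSuperSlab *ss = reg[i];
        if (ss == NULL || ss->empty_count == 0 || ss->empty_mask == 0) {
            continue;  /* cheap reject before touching the bitmap */
        }
        /* __builtin_ctz is undefined for 0; the check above rules that out. */
        int idx = __builtin_ctz(ss->empty_mask);
        ss->empty_mask &= ss->empty_mask - 1;  /* clear the lowest set bit */
        ss->empty_count--;
        *slab_idx = idx;
        return (int)i;
    }
    return -1;
}
```

A larger scan limit raises the chance of a hit but adds work to every miss, which is the trade-off `HAKMEM_SS_EMPTY_SCAN_LIMIT` exposes.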
## Performance Results
```
Benchmark: Random Mixed 256B (100K iterations)
OFF (default): 10.2M ops/s (baseline)
ON (ENV=1): 11.5M ops/s (~+13% improvement) ✅
```
## Expected Impact (from Task-sensei analysis)
**Current bottleneck**:
- Stage 1: 2-5% hit rate (free list broken)
- Stage 2: 3-8% hit rate (rare UNUSED)
- Stage 3: 87-95% hit rate (lock + mmap overhead) ← bottleneck
**Expected with Phase 12-1.1**:
- Stage 0.5: 20-40% hit rate (EMPTY scan)
- Stage 1-2: 20-30% hit rate (combined)
- Stage 3: 30-50% hit rate (significantly reduced)
**Theoretical max**: 25M → 55-70M ops/s (+120-180%)
## Current Gap Analysis
**Observed**: 11.5M ops/s (+13% over the 10.2M baseline)
**Expected**: 55-70M ops/s (+120-180% over the 25M pre-regression baseline)
**Gap**: Likely caused by a performance regression or by missing complementary optimizations
Possible causes:
1. Phase 3d-C (25.1M→10.2M) regression - unrelated to this change
2. EMPTY scan overhead (16 SuperSlabs × empty_count check)
3. Missing Priority 2-5 optimizations (Lazy SS deallocation, etc.)
4. Stage 0.5 too conservative (scan_limit=16, should be higher?)
## Usage
```bash
# Enable EMPTY reuse optimization
export HAKMEM_SS_EMPTY_REUSE=1
# Optional: increase scan limit (trade-off: throughput vs latency)
export HAKMEM_SS_EMPTY_SCAN_LIMIT=32
./bench_random_mixed_hakmem 100000 256 42
```
## Next Steps
**Priority 1-A**: Investigate Phase 3d-C→12-1.1 regression (25.1M→10.2M)
**Priority 1-B**: Implement Phase 12-1.2 (Lazy SS deallocation) for complementary effect
**Priority 1-C**: Profile Stage 0.5 overhead (scan_limit tuning)
## Files Modified
Core implementation:
- `core/superslab/superslab_types.h` - empty_mask/empty_count fields
- `core/box/ss_hot_cold_box.h` - EMPTY detection/marking API
- `core/box/free_local_box.c` - Free path EMPTY detection
- `core/hakmem_shared_pool.c` - Stage 0.5 EMPTY scan
Documentation:
- `CURRENT_TASK.md` - Task-sensei investigation report
---
🎯 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Task-sensei (investigation & design analysis)
2025-11-21 04:56:48 +09:00