# Atomic Freelist Implementation Strategy

## Executive Summary

**Good News**: Only **90 freelist access sites** (not 589), making full conversion feasible in 4-6 hours.

**Recommendation**: **Hybrid Approach** - Convert hot paths to lock-free atomic operations, use relaxed ordering for cold paths, skip debug/stats sites entirely.

**Expected Performance Impact**: <3% regression for atomic operations in hot paths.

---

## 1. Accessor Function Design

### Core API (in `core/box/slab_freelist_atomic.h`)

```c
#ifndef SLAB_FREELIST_ATOMIC_H
#define SLAB_FREELIST_ATOMIC_H

#include <stdatomic.h>
#include <stdbool.h>  // bool, used by the is_empty/is_nonempty helpers
#include "../superslab/superslab_types.h"

// ============================================================================
// HOT PATH: Lock-Free Operations (use CAS for push/pop)
// ============================================================================

// Atomic POP (lock-free, for refill hot path)
// Returns NULL if freelist empty
static inline void* slab_freelist_pop_lockfree(TinySlabMeta* meta, int class_idx) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    if (!head) return NULL;

    void* next = tiny_next_read(class_idx, head);
    while (!atomic_compare_exchange_weak_explicit(
        &meta->freelist,
        &head,                  // Expected value (updated on failure)
        next,                   // Desired value
        memory_order_release,   // Success ordering
        memory_order_acquire    // Failure ordering (reload head)
    )) {
        // CAS failed (another thread modified freelist)
        if (!head) return NULL;                  // List became empty
        next = tiny_next_read(class_idx, head);  // Reload next pointer
    }
    return head;
}

// Atomic PUSH (lock-free, for free hot path)
static inline void slab_freelist_push_lockfree(TinySlabMeta* meta, int class_idx, void* node) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        tiny_next_write(class_idx, node, head);  // Link node->next = head
    } while (!atomic_compare_exchange_weak_explicit(
        &meta->freelist,
        &head,                  // Expected value (updated on failure)
        node,                   // Desired value
        memory_order_release,   // Success ordering
        memory_order_relaxed    // Failure ordering
    ));
}

// ============================================================================
// WARM PATH: Relaxed Load/Store (single-threaded or low contention)
// ============================================================================

// Simple load (relaxed ordering for checks/prefetch)
static inline void* slab_freelist_load_relaxed(TinySlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed);
}

// Simple store (relaxed ordering for init/cleanup)
static inline void slab_freelist_store_relaxed(TinySlabMeta* meta, void* value) {
    atomic_store_explicit(&meta->freelist, value, memory_order_relaxed);
}

// NULL check (relaxed ordering)
static inline bool slab_freelist_is_empty(TinySlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed) == NULL;
}

static inline bool slab_freelist_is_nonempty(TinySlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed) != NULL;
}

// ============================================================================
// COLD PATH: Direct Access (for debug/stats - already atomic type)
// ============================================================================

// For printf/debugging: cast to void* for printing
#define SLAB_FREELIST_DEBUG_PTR(meta) \
    ((void*)atomic_load_explicit(&(meta)->freelist, memory_order_relaxed))

#endif // SLAB_FREELIST_ATOMIC_H
```
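
As a quick sanity check of the push/pop pair, the sketch below rebuilds both operations against stand-in definitions and verifies LIFO order on a single thread. `DemoSlabMeta`, `demo_next_read`, and `demo_next_write` are hypothetical placeholders for `TinySlabMeta`, `tiny_next_read`, and `tiny_next_write`; the real helpers take a `class_idx` and may store the link at a different offset. The memory orderings mirror the header above. The sketch only exercises the CAS structure; it says nothing about contention or ABA behavior.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta: only the atomic freelist head. */
typedef struct {
    _Atomic(void*) freelist;
} DemoSlabMeta;

/* Assumption: the next pointer is stored in the first bytes of a free block. */
static void* demo_next_read(const void* block) {
    void* next;
    memcpy(&next, block, sizeof next);
    return next;
}

static void demo_next_write(void* block, void* next) {
    memcpy(block, &next, sizeof next);
}

/* Push: link node->next = head, then publish node with a release CAS. */
void demo_push(DemoSlabMeta* meta, void* node) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        demo_next_write(node, head);
    } while (!atomic_compare_exchange_weak_explicit(
        &meta->freelist, &head, node,
        memory_order_release, memory_order_relaxed));
}

/* Pop: acquire the head, read its next pointer, then swing the head to it. */
void* demo_pop(DemoSlabMeta* meta) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head) {
        void* next = demo_next_read(head);
        if (atomic_compare_exchange_weak_explicit(
                &meta->freelist, &head, next,
                memory_order_release, memory_order_acquire)) {
            return head;
        }
        /* CAS failed: head now holds the current value, so re-check and retry. */
    }
    return NULL;
}

/* Returns 1 if blocks come back in LIFO order and the list ends up empty. */
int demo_lifo_selfcheck(void) {
    DemoSlabMeta meta;
    void* a[2];  /* two dummy free blocks, each large enough to hold a pointer */
    void* b[2];
    atomic_init(&meta.freelist, NULL);

    demo_push(&meta, a);
    demo_push(&meta, b);
    return demo_pop(&meta) == (void*)b
        && demo_pop(&meta) == (void*)a
        && demo_pop(&meta) == NULL;
}
```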

---

## 2. Critical Site List (Top 20 - MUST Convert)

### Tier 1: Ultra-Hot Paths (5-10 ops/allocation)

1. **`core/tiny_superslab_alloc.inc.h:118-145`** - Fast alloc freelist pop
2. **`core/hakmem_tiny_refill_p0.inc.h:252-253`** - P0 batch refill check
3. **`core/box/carve_push_box.c:33-34, 120-121, 128-129`** - Carve rollback push
4. **`core/hakmem_tiny_tls_ops.h:77-85`** - TLS freelist drain

### Tier 2: Hot Paths (1-2 ops/allocation)

5. **`core/tiny_refill_opt.h:199-230`** - Refill chain pop
6. **`core/tiny_free_magazine.inc.h:135-136`** - Magazine free push
7. **`core/box/carve_push_box.c:172-180`** - Freelist carve with push

### Tier 3: Warm Paths (0.1-1 ops/allocation)

8. **`core/refill/ss_refill_fc.h:151-153`** - FC refill pop
9. **`core/hakmem_tiny_tls_ops.h:203`** - TLS freelist init
10. **`core/slab_handle.h:211, 259, 308`** - Slab handle ops

**Total Critical Sites**: ~40-50 (out of 90 total)

---

## 3. Non-Critical Site Strategy

### Skip Entirely (10-15 sites)

- **Debug/Stats**: `core/box/ss_stats_box.c:79`, `core/tiny_debug.h:48`
  - **Reason**: Already atomic type, simple load for printing is fine
  - **Action**: Change `meta->freelist` → `SLAB_FREELIST_DEBUG_PTR(meta)`

- **Initialization** (already protected by single-threaded setup):
  - `core/box/ss_allocation_box.c:66` - Initial freelist setup
  - `core/hakmem_tiny_superslab.c` - SuperSlab init

### Use Relaxed Load/Store (20-30 sites)

- **Condition checks**: `if (meta->freelist)` → `if (slab_freelist_is_nonempty(meta))`
- **Prefetch**: `__builtin_prefetch(&meta->freelist, 0, 3)` → keep as-is (atomic type is fine)
- **Init/cleanup**: `meta->freelist = NULL` → `slab_freelist_store_relaxed(meta, NULL)`

### Convert to Lock-Free (10-20 sites)

- **All POP operations** in hot paths
- **All PUSH operations** in free paths
- **Carve rollback** operations

---

## 4. Phased Implementation Plan

### Phase 1: Hot Paths Only (2-3 hours) 🔥

**Goal**: Fix Larson 8T crash with minimal changes

**Files to modify** (5 files, ~25 sites):
1. `core/tiny_superslab_alloc.inc.h` (fast alloc pop)
2. `core/hakmem_tiny_refill_p0.inc.h` (P0 batch refill)
3. `core/box/carve_push_box.c` (carve/rollback push)
4. `core/hakmem_tiny_tls_ops.h` (TLS drain)
5. Create `core/box/slab_freelist_atomic.h` (accessor API)

**Testing**:
```bash
./build.sh bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42  # Single-threaded baseline
./build.sh larson_hakmem
./out/release/larson_hakmem 8 100000 256  # 8 threads (expect no crash)
```

**Expected Result**: Larson 8T stable, <5% regression on single-threaded

---

### Phase 2: All TLS Paths (2-3 hours) ⚡

**Goal**: Full MT safety for all allocation paths

**Files to modify** (10 files, ~40 sites):
- All files from Phase 1 (complete conversion)
- `core/tiny_refill_opt.h` (refill chain ops)
- `core/tiny_free_magazine.inc.h` (magazine push)
- `core/refill/ss_refill_fc.h` (FC refill)
- `core/slab_handle.h` (slab handle ops)

**Testing**:
```bash
./build.sh bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42  # Baseline check
./build.sh stress_test_mt_hakmem
./out/release/stress_test_mt_hakmem 16 100000  # 16 threads stress test
```

**Expected Result**: All MT tests pass, <3% regression

---

### Phase 3: Cleanup (1-2 hours) 🧹

**Goal**: Convert/document remaining sites

**Files to modify** (5 files, ~25 sites):
- Debug/stats sites: Add `SLAB_FREELIST_DEBUG_PTR()` macro
- Init/cleanup sites: Use `slab_freelist_store_relaxed()`
- Add comments explaining MT safety assumptions

**Testing**:
```bash
make clean && make all  # Full rebuild
./run_all_tests.sh      # Comprehensive test suite
```

**Expected Result**: Clean build, all tests pass

---

## 5. Automated Conversion Script

### Semi-Automated Sed Script

```bash
#!/bin/bash
# atomic_freelist_convert.sh - Phase 1 conversion helper

set -e

# Backup
git stash
git checkout -b atomic-freelist-phase1

# Step 1: Convert NULL checks (read-only, safe)
# The captured prefix is limited to access-path characters (e.g. "tls->") so that
# negated or compound conditions such as "if (!meta->freelist)" are left for manual review.
find core -name "*.c" -o -name "*.h" | xargs sed -i \
  's/if (\([A-Za-z0-9_>.-]*\)meta->freelist)/if (slab_freelist_is_nonempty(\1meta))/g'

# Step 2: Convert condition checks in while loops
find core -name "*.c" -o -name "*.h" | xargs sed -i \
  's/while (\([A-Za-z0-9_>.-]*\)meta->freelist)/while (slab_freelist_is_nonempty(\1meta))/g'

# Step 3: Show remaining manual conversions needed
echo "=== REMAINING MANUAL CONVERSIONS ==="
grep -rn "meta->freelist" core/ --include="*.c" --include="*.h" | \
  grep -v "slab_freelist_" | wc -l

echo "Review changes:"
git diff --stat
echo ""
echo "If good: git commit -am 'Phase 1: Convert freelist NULL checks'"
echo "If bad: git checkout . && git checkout master"
```

**Limitations**:
- Cannot auto-convert POP operations (need CAS loop)
- Cannot auto-convert PUSH operations (need tiny_next_write + CAS)
- Manual review required for all changes

---

## 6. Performance Projection

### Single-Threaded Impact

| Operation | Before | After (Relaxed) | After (CAS) | Overhead |
|-----------|--------|-----------------|-------------|----------|
| Load | 1 cycle | 1 cycle | 1 cycle | 0% |
| Store | 1 cycle | 1 cycle | - | 0% |
| POP (freelist) | 3-5 cycles | - | 8-12 cycles | +60-300% |
| PUSH (freelist) | 3-5 cycles | - | 8-12 cycles | +60-300% |

**Expected Regression**:
- Best case: 0-1% (mostly relaxed loads)
- Worst case: 3-5% (CAS overhead in hot paths)
- Realistic: 2-3% (good branch prediction, low contention)

**Mitigation**: Lock-free CAS is still faster than mutex (20-30 cycles)

### Multi-Threaded Impact

| Metric | Before (Non-Atomic) | After (Atomic) | Change |
|--------|---------------------|----------------|--------|
| Larson 8T | CRASH | Stable | ✅ FIXED |
| Throughput (1T) | 25.1M ops/s | 24.4-24.8M ops/s | -1.2-2.8% |
| Throughput (8T) | CRASH | ~18-20M ops/s | ✅ NEW |
| Scalability | 0% (crashes) | 70-80% | ✅ GAIN |

**Expected Benefit**: Stability + MT scalability >> 2-3% single-threaded cost

---

## 7. Implementation Example (Phase 1)

### Before: `core/tiny_superslab_alloc.inc.h:117-145`

```c
if (__builtin_expect(meta->freelist != NULL, 0)) {
    void* block = meta->freelist;
    if (meta->class_idx != class_idx) {
        meta->freelist = NULL;
        goto bump_path;
    }
    // ... pop logic ...
    meta->freelist = tiny_next_read(meta->class_idx, block);
    return (void*)((uint8_t*)block + 1);
}
```

### After: `core/tiny_superslab_alloc.inc.h:117-145`

```c
if (__builtin_expect(slab_freelist_is_nonempty(meta), 0)) {
    void* block = slab_freelist_pop_lockfree(meta, class_idx);
    if (!block) {
        // Another thread won the race, fall through to bump path
        goto bump_path;
    }
    if (meta->class_idx != class_idx) {
        // Wrong class, return to freelist and go to bump path
        slab_freelist_push_lockfree(meta, class_idx, block);
        goto bump_path;
    }
    return (void*)((uint8_t*)block + 1);
}
```

**Changes**:
- NULL check → `slab_freelist_is_nonempty()`
- Manual pop → `slab_freelist_pop_lockfree()`
- Handle CAS race (block == NULL case)
- Simpler logic (CAS handles next pointer atomically)

---

## 8. Risk Assessment

### Low Risk ✅

- **Phase 1**: Only 5 files, ~25 sites, well-tested patterns
- **Rollback**: Easy (`git checkout master`)
- **Testing**: Can A/B test with env variable
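
One way to make the A/B test concrete is a runtime switch that is read once from the environment. This is a sketch only: the variable name `HAKMEM_FREELIST_ATOMIC` and both helper names are hypothetical and are not existing hakmem settings. The cached read is not itself thread-safe, so the assumption is that `freelist_atomic_enabled()` is first called during single-threaded startup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; not an existing hakmem setting. */
#define FREELIST_MODE_ENV "HAKMEM_FREELIST_ATOMIC"

/* Parse a toggle value: unset or empty keeps the default, "0" disables,
 * anything else enables. */
int freelist_mode_parse(const char* value, int default_on) {
    if (value == NULL || value[0] == '\0') return default_on;
    return strcmp(value, "0") != 0;
}

/* Read the switch once and cache it, so later calls only test an int.
 * Assumption: the first call happens before any worker thread starts. */
int freelist_atomic_enabled(void) {
    static int cached = -1;  /* -1 = not read yet */
    if (cached < 0) {
        cached = freelist_mode_parse(getenv(FREELIST_MODE_ENV), 1);
    }
    return cached;
}
```

With a switch like this, the same binary could be run twice, once with the variable set to `0` and once unset, and the two benchmark results compared.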

### Medium Risk ⚠️

- **Performance**: 2-3% regression possible
- **Subtle bugs**: CAS retry loops need careful review
- **ABA problem**: mitigated by pointer tagging (already in codebase)
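
The ABA item refers to pointer tagging that this document says already exists in the codebase; its actual layout is not shown here. As an illustration of the general idea only, the sketch below packs a 32-bit block index and a 32-bit generation counter into one 64-bit atomic head, so a stale CAS fails even when the same block is back on top of the list. All names (`TaggedList`, `tagged_push`, `tagged_pop`, `tagged_aba_demo`) and the index-based layout are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TAGGED_NIL   0xFFFFFFFFu  /* index value meaning "empty list" */
#define TAGGED_SLOTS 8u

typedef struct {
    _Atomic uint64_t head;               /* (generation << 32) | top index */
    _Atomic uint32_t next[TAGGED_SLOTS]; /* next index for each block */
} TaggedList;

static uint64_t tagged_pack(uint32_t gen, uint32_t idx) {
    return ((uint64_t)gen << 32) | idx;
}

void tagged_init(TaggedList* l) {
    atomic_init(&l->head, tagged_pack(0, TAGGED_NIL));
}

/* Push block idx; every successful update bumps the generation. */
void tagged_push(TaggedList* l, uint32_t idx) {
    uint64_t old = atomic_load_explicit(&l->head, memory_order_relaxed);
    uint64_t desired;
    do {
        /* Link the new block to the current top (low 32 bits of old). */
        atomic_store_explicit(&l->next[idx], (uint32_t)old, memory_order_relaxed);
        desired = tagged_pack((uint32_t)(old >> 32) + 1, idx);
    } while (!atomic_compare_exchange_weak_explicit(
        &l->head, &old, desired,
        memory_order_release, memory_order_relaxed));
}

/* Pop the top block index, or TAGGED_NIL if the list is empty. */
uint32_t tagged_pop(TaggedList* l) {
    uint64_t old = atomic_load_explicit(&l->head, memory_order_acquire);
    for (;;) {
        uint32_t idx = (uint32_t)old;
        if (idx == TAGGED_NIL) return TAGGED_NIL;
        uint32_t nxt = atomic_load_explicit(&l->next[idx], memory_order_relaxed);
        uint64_t desired = tagged_pack((uint32_t)(old >> 32) + 1, nxt);
        if (atomic_compare_exchange_weak_explicit(
                &l->head, &old, desired,
                memory_order_acquire, memory_order_acquire)) {
            return idx;
        }
        /* CAS failed: old was refreshed with the current head; retry. */
    }
}

/* Replays the classic ABA interleaving on one thread.
 * Returns 1 if the stale CAS is rejected, 0 if it would have succeeded. */
int tagged_aba_demo(void) {
    TaggedList l;
    tagged_init(&l);
    tagged_push(&l, 1);  /* list: 1 */
    tagged_push(&l, 0);  /* list: 0 -> 1 */

    /* "Thread A" snapshots the head and prepares to pop block 0. */
    uint64_t stale = atomic_load(&l.head);
    uint64_t desired = tagged_pack((uint32_t)(stale >> 32) + 1,
                                   atomic_load(&l.next[0]));

    /* "Thread B" pops 0, pops 1, then pushes 0 back: same top index. */
    tagged_pop(&l);
    tagged_pop(&l);
    tagged_push(&l, 0);

    /* Block 0 is on top again, but the generation moved on, so the CAS fails. */
    return !atomic_compare_exchange_strong(&l.head, &stale, desired);
}
```

Without the generation half of the word, the final CAS would succeed and install block 1 as the head even though "Thread B" had already taken it off the list.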

### High Risk ❌

- **None**: Atomic type already declared, no ABI changes

---

## 9. Alternative Approaches (Considered)

### Option A: Mutex per Slab (rejected)

**Pros**: Simple, guaranteed correctness
**Cons**: 40-byte overhead per slab, 10-20x performance hit

### Option B: Global Lock (rejected)

**Pros**: Zero code changes, 1-line fix
**Cons**: Serializes all allocation, kills MT performance

### Option C: TLS-Only (rejected)

**Pros**: No atomics needed
**Cons**: Cannot handle remote free (required for MT)

### Option D: Hybrid (SELECTED) ✅

**Pros**: Best performance, incremental implementation
**Cons**: More complex, requires careful memory ordering

---

## 10. Memory Ordering Rationale

### Relaxed (`memory_order_relaxed`)

**Use case**: Single-threaded or benign races (e.g., stats)
**Cost**: 0 cycles (no fence)
**Example**: `if (meta->freelist)` - checking emptiness

### Acquire (`memory_order_acquire`)

**Use case**: Loading pointer before dereferencing
**Cost**: 1-2 cycles (read fence on some architectures)
**Example**: POP freelist head before reading `next` pointer

### Release (`memory_order_release`)

**Use case**: Publishing pointer after setup
**Cost**: 1-2 cycles (write fence on some architectures)
**Example**: PUSH node to freelist after writing `next` pointer

### AcqRel (`memory_order_acq_rel`)

**Use case**: CAS success path (acquire+release)
**Cost**: 2-4 cycles (full fence on some architectures)
**Example**: Not used (separate acquire/release in CAS)

### SeqCst (`memory_order_seq_cst`)

**Use case**: Total ordering required
**Cost**: 5-10 cycles (expensive fence)
**Example**: Not needed for freelist (per-slab ordering sufficient)

**Chosen**: Acquire/Release for CAS, Relaxed for checks (optimal trade-off)
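
The release/acquire pairing can be seen in isolation in the sketch below, which is not freelist code: `Mailbox`, `Payload`, and the `mailbox_*` functions are hypothetical names used only for illustration. The writer fills the payload and then publishes the pointer with a release store; a reader that obtains the pointer through an acquire load is guaranteed to see the payload contents. The relaxed check is a hint only and must not be followed by a dereference. The demo function runs on one thread, so it shows the call pattern rather than proving cross-thread visibility.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload and mailbox; names are illustrative only. */
typedef struct {
    int value;
} Payload;

typedef struct {
    _Atomic(Payload*) slot;
} Mailbox;

/* Writer: fill the payload first, then publish the pointer with release. */
void mailbox_publish(Mailbox* box, Payload* p, int value) {
    p->value = value;  /* plain write, ordered before the release store */
    atomic_store_explicit(&box->slot, p, memory_order_release);
}

/* Reader: the acquire load pairs with the release store, so p->value is
 * visible once the pointer is. Returns fallback if nothing was published. */
int mailbox_consume(Mailbox* box, int fallback) {
    Payload* p = atomic_load_explicit(&box->slot, memory_order_acquire);
    return p ? p->value : fallback;
}

/* Cheap hint: relaxed load, no ordering, result used only as a boolean. */
int mailbox_has_message(Mailbox* box) {
    return atomic_load_explicit(&box->slot, memory_order_relaxed) != NULL;
}

/* Single-threaded walk through the three calls; returns the consumed value. */
int mailbox_demo(void) {
    Mailbox box;
    Payload p;
    atomic_init(&box.slot, NULL);
    if (mailbox_has_message(&box)) return -1;  /* nothing published yet */
    mailbox_publish(&box, &p, 42);
    return mailbox_consume(&box, -1);
}
```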

---

## 11. Testing Strategy

### Phase 1 Tests

```bash
# Baseline (before conversion)
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record: 25.1M ops/s

# After conversion (expect: 24.4-24.8M ops/s)
./out/release/bench_random_mixed_hakmem 10000000 256 42

# MT stability (expect: no crash)
./out/release/larson_hakmem 8 100000 256

# Correctness (expect: 0 errors)
./out/release/bench_fixed_size_hakmem 100000 256 128
./out/release/bench_fixed_size_hakmem 100000 1024 128
```

### Phase 2 Tests

```bash
# Stress test all sizes
for size in 128 256 512 1024; do
  ./out/release/bench_random_mixed_hakmem 1000000 $size 42
done

# MT scaling test
for threads in 1 2 4 8 16; do
  ./out/release/larson_hakmem $threads 100000 256
done
```

### Phase 3 Tests

```bash
# Full test suite
./run_all_tests.sh

# ASan build (detect memory errors such as use-after-free and overflows)
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 100000 256 42

# TSan build (detect data races)
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 8 10000 256
```

---

## 12. Success Criteria

### Phase 1 (Hot Paths)

- ✅ Larson 8T runs without crash (100K iterations)
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ No ASan/TSan warnings
- ✅ Clean build with no warnings

### Phase 2 (All Paths)

- ✅ All MT tests pass (1T, 2T, 4T, 8T, 16T)
- ✅ Single-threaded regression <3% (24.4M+ ops/s)
- ✅ MT scaling 70%+ (8T = 5.6x+ speedup)
- ✅ No memory leaks (Valgrind clean)
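
The "70%+" figure matches the 5.6x number if scaling is read as parallel efficiency, that is, speedup divided by thread count (0.70 × 8 = 5.6). The helpers below, with hypothetical names, make that arithmetic explicit so benchmark output can be checked against the criterion; the interpretation of "scaling" is an assumption inferred from those two numbers.

```c
/* Speedup of an N-thread run relative to the 1-thread run. */
double mt_speedup(double ops_1t, double ops_nt) {
    return ops_nt / ops_1t;
}

/* Parallel efficiency: speedup divided by thread count (1.0 = perfect). */
double mt_efficiency(double ops_1t, double ops_nt, int threads) {
    return mt_speedup(ops_1t, ops_nt) / (double)threads;
}

/* Minimum speedup implied by a target efficiency, e.g. 0.70 at 8 threads. */
double mt_required_speedup(double target_efficiency, int threads) {
    return target_efficiency * (double)threads;
}
```

Both throughput arguments should come from the same benchmark at different thread counts, since the efficiency is only meaningful for like-for-like runs.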

### Phase 3 (Complete)

- ✅ All 90 sites converted or documented
- ✅ Full test suite passes (100% pass rate)
- ✅ Code review approved
- ✅ Documentation updated

---

## 13. Rollback Plan

If Phase 1 fails (>5% regression or instability):

```bash
# Revert to master
git checkout master
git branch -D atomic-freelist-phase1

# Try alternative: Per-slab spinlock (medium overhead)
# Add uint8_t lock field to TinySlabMeta
# Use __sync_lock_test_and_set() for 1-byte spinlock
# Expected: 5-10% overhead, but guaranteed correctness
```
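
A minimal sketch of the 1-byte spinlock fallback, assuming GCC or Clang (the `__sync_*` builtins are compiler extensions). `SlabSpinlock` and the `slab_spin_*` functions are hypothetical names; in the plan above the lock byte would be a field of `TinySlabMeta` rather than a separate struct. The demo function exercises lock, failed trylock, unlock, and successful trylock on one thread.

```c
#include <stdint.h>

/* Hypothetical per-slab lock byte: 0 = unlocked, 1 = locked. */
typedef struct {
    volatile uint8_t lock;
} SlabSpinlock;

/* Try once: returns 1 if the lock was acquired, 0 if it was already held.
 * __sync_lock_test_and_set is an acquire barrier and returns the old value. */
int slab_spin_trylock(SlabSpinlock* s) {
    return __sync_lock_test_and_set(&s->lock, 1) == 0;
}

/* Spin until acquired; polling with plain reads between attempts keeps
 * waiters from hammering the cache line with atomic writes. */
void slab_spin_lock(SlabSpinlock* s) {
    while (!slab_spin_trylock(s)) {
        while (s->lock) {
            /* spin */
        }
    }
}

/* Release: __sync_lock_release writes 0 with release semantics. */
void slab_spin_unlock(SlabSpinlock* s) {
    __sync_lock_release(&s->lock);
}

/* Returns 1 if a held lock rejects trylock and a released lock accepts it. */
int slab_spin_demo(void) {
    SlabSpinlock s = { 0 };
    slab_spin_lock(&s);
    int second = slab_spin_trylock(&s);  /* 0: already held */
    slab_spin_unlock(&s);
    int third = slab_spin_trylock(&s);   /* 1: free again */
    slab_spin_unlock(&s);
    return second == 0 && third == 1;
}
```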

---

## 14. Next Steps

1. **Create accessor header** (`core/box/slab_freelist_atomic.h`) - 30 min
2. **Phase 1 conversion** (5 files, ~25 sites) - 2-3 hours
3. **Test Phase 1** (single + MT tests) - 1 hour
4. **If pass**: Continue to Phase 2
5. **If fail**: Review, fix, or rollback

**Estimated Total Time**: 4-6 hours for full implementation (all 3 phases)

---

## 15. Code Review Checklist

Before merging:

- [ ] All CAS loops handle retry correctly
- [ ] Memory ordering documented for each site
- [ ] No direct `meta->freelist` access remains (except debug)
- [ ] All tests pass (single + MT)
- [ ] ASan/TSan clean
- [ ] Performance regression <3%
- [ ] Documentation updated (CLAUDE.md)

---

## Summary

**Approach**: Hybrid - Lock-free CAS for hot paths, relaxed atomics for cold paths
**Effort**: 4-6 hours (3 phases)
**Risk**: Low (incremental, easy rollback)
**Performance**: 2-3% single-threaded regression, in exchange for MT stability and scalability
**Benefit**: Unlocks MT performance while keeping single-threaded speed within 2-3% of baseline

**Recommendation**: Proceed with Phase 1 (2-3 hours) and evaluate results before committing to full implementation.