# Atomic Freelist Quick Start Guide

## TL;DR

- **Problem**: 589 freelist access sites? → **Actual: 90 sites** (much better!)
- **Solution**: Hybrid approach - lock-free CAS for hot paths, relaxed atomics for cold paths
- **Effort**: 6-9 hours (3 phases)
- **Risk**: Low (incremental, easy rollback)
- **Impact**: -2-3% single-threaded, +MT stability

---

## Step-by-Step Implementation

### Step 1: Read Documentation (15 min)

1. **Strategy**: `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
   - Accessor function design
   - Memory ordering rationale
   - Performance projections

2. **Site Guide**: `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
   - File-by-file conversion instructions
   - Common pitfalls
   - Testing checklist

3. **Analysis**: Run `scripts/analyze_freelist_sites.sh`
   - Validates site counts
   - Shows operation breakdown
   - Estimates effort

---

### Step 2: Create Accessor Header (30 min)

```bash
# Copy template to working file
cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h

# Add the tiny_next_ptr_box.h include to the new header.
# Note: >> appends to the END of the file. If the template does not already
# include it, move the line above the first use (inside the include guard).
echo '#include "tiny_next_ptr_box.h"' >> core/box/slab_freelist_atomic.h

# Verify compile
make clean
make bench_random_mixed_hakmem 2>&1 | grep -i error
```

**Expected**: Clean compile (`grep` prints no error lines)

---

### Step 3: Phase 1 - Hot Paths (2-3 hours)

#### 3.1 Convert NULL Checks (30 min)

**Pattern**: `if (meta->freelist)` → `if (slab_freelist_is_nonempty(meta))`

**Files**:
- `core/tiny_superslab_alloc.inc.h` (4 sites)
- `core/hakmem_tiny_refill_p0.inc.h` (1 site)
- `core/box/carve_push_box.c` (2 sites)
- `core/hakmem_tiny_tls_ops.h` (2 sites)

**Commands**:
```bash
# Add include at top of each file
# (if the file also includes box/tiny_next_ptr_box.h, keep that include first; see Pitfall 3)
# For tiny_superslab_alloc.inc.h:
sed -i '1i#include "box/slab_freelist_atomic.h"' core/tiny_superslab_alloc.inc.h

# Replace NULL checks (review carefully!)
# Do this manually - automated sed is too risky
```

---

#### 3.2 Convert POP Operations (1 hour)

**Pattern**:
```c
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);

// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback;  // Freelist empty or drained by another thread
```

**Files**:
- `core/tiny_superslab_alloc.inc.h:117-145` (1 critical site)
- `core/box/carve_push_box.c:173-174` (1 site)
- `core/hakmem_tiny_tls_ops.h:83-85` (1 site)

**Testing after each file**:
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
```

---

#### 3.3 Convert PUSH Operations (1 hour)

**Pattern**:
```c
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
```

**Files**:
- `core/box/carve_push_box.c` (6 sites - rollback paths)

**Testing**:
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 100000 256 42
```

---

#### 3.4 Phase 1 Final Test (30 min)

```bash
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record ops/s (expect: 24.4-24.8M, vs 25.1M baseline)

# Multi-threaded stability
make larson_hakmem
./out/release/larson_hakmem 8 100000 256
# Expect: No crashes, ~18-20M ops/s

# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
# Expect: No TSan warnings
```

**Success Criteria**:
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ Larson 8T stable (no crashes)
- ✅ No TSan warnings
- ✅ Clean build

**If failed**: Rollback and debug
```bash
git diff > phase1.patch  # Save work
git checkout .           # Revert
# Review phase1.patch and fix issues
```

---

### Step 4: Phase 2 - Warm Paths (2-3 hours)

**Scope**: Convert remaining 40 sites in 10 files

**Files** (in order of priority):
1. `core/tiny_refill_opt.h` (refill chain ops)
2. `core/tiny_free_magazine.inc.h` (magazine push)
3. `core/refill/ss_refill_fc.h` (FC refill)
4. `core/slab_handle.h` (slab handle ops)
5. Files 5-10: remaining files (see `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`)

**Testing** (after each file):
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 100000 256 42
```

**Phase 2 Final Test**:
```bash
# All sizes
for size in 128 256 512 1024; do
  ./out/release/bench_random_mixed_hakmem 1000000 $size 42
done

# MT scaling
for threads in 1 2 4 8 16; do
  ./out/release/larson_hakmem $threads 100000 256
done
```

---

### Step 5: Phase 3 - Cleanup (1-2 hours)

**Scope**: Convert/document remaining 25 sites

#### 5.1 Debug/Stats Sites (30 min)

**Pattern**: `meta->freelist` → `SLAB_FREELIST_DEBUG_PTR(meta)`

**Files**:
- `core/box/ss_stats_box.c`
- `core/tiny_debug.h`
- `core/tiny_remote.c`

---

#### 5.2 Init/Cleanup Sites (30 min)

**Pattern**: `meta->freelist = NULL` → `slab_freelist_store_relaxed(meta, NULL)`

**Files**:
- `core/hakmem_tiny_superslab.c`
- `core/hakmem_smallmid_superslab.c`

---

#### 5.3 Final Verification (30 min)

```bash
# Full rebuild
make clean && make all

# Run all tests
./run_all_tests.sh

# Check for remaining direct accesses
grep -rn "meta->freelist" core/ --include="*.c" --include="*.h" | \
  grep -v "slab_freelist_" | grep -v "SLAB_FREELIST_DEBUG_PTR"
# Expect: 0 results (all converted or documented)
```

---

## Common Pitfalls

### Pitfall 1: Double-Converting POP

```c
// ❌ WRONG: slab_freelist_pop_lockfree already calls tiny_next_read!
void* p = slab_freelist_pop_lockfree(meta, class_idx);
void* next = tiny_next_read(class_idx, p);  // ❌ BUG!

// ✅ RIGHT: Use p directly
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) goto fallback;
use(p);  // ✅ CORRECT
```

### Pitfall 2: Forgetting Race Handling

```c
// ❌ WRONG: Assuming pop always succeeds
void* p = slab_freelist_pop_lockfree(meta, class_idx);
use(p);  // ❌ SEGV if p == NULL!

// ✅ RIGHT: Always check for NULL
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) goto fallback;  // ✅ CORRECT
use(p);
```

### Pitfall 3: Including Header Before Dependencies

```c
// ❌ WRONG: slab_freelist_atomic.h needs tiny_next_ptr_box.h
#include "box/slab_freelist_atomic.h"  // ❌ Compile error!
#include "box/tiny_next_ptr_box.h"

// ✅ RIGHT: Dependencies first
#include "box/tiny_next_ptr_box.h"  // ✅ CORRECT
#include "box/slab_freelist_atomic.h"
```

---

## Performance Expectations

### Single-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2-2.8% |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1-2.9% |

**Acceptable**: <5% regression (relaxed atomics cost ~0%; CAS adds 60-140% overhead per operation but runs rarely)

### Multi-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Larson 8T | CRASH | ~18-20M ops/s | ✅ FIXED |
| MT Scaling (8T) | 0% (crashes) | 70-80% | ✅ GAIN |

**Expected**: Stability + MT scalability >> 2-3% single-threaded cost

---

## Rollback Plan

If Phase 1 fails (>5% regression or instability):

```bash
# Option 1: Revert to master
git checkout master
git branch -D atomic-freelist-phase1

# Option 2: Alternative approach (per-slab spinlock)
# Add uint8_t lock field to TinySlabMeta (1 byte)
# Use __sync_lock_test_and_set() for spinlock (5-10% overhead)
# Guaranteed correctness, simpler implementation
```

---

## Success Criteria

### Phase 1
- ✅ Larson 8T runs without crash (100K iterations)
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ No ASan/TSan warnings

### Phase 2
- ✅ All MT tests pass (1T, 2T, 4T, 8T, 16T)
- ✅ Single-threaded regression <3% (24.4M+ ops/s)
- ✅ MT scaling 70%+ (8T = 5.6x+ speedup)

### Phase 3
- ✅ All 90 sites converted or documented
- ✅ Full test suite passes (100% pass rate)
- ✅ Zero direct `meta->freelist` accesses (except in `slab_freelist_atomic.h`)

---

## Time Budget

| Phase | Description | Files | Sites | Time |
|-------|-------------|-------|-------|------|
| **Prep** | Read docs, setup | - | - | 15 min |
| **Header** | Create accessor API | 1 | - | 30 min |
| **Phase 1** | Hot paths (critical) | 5 | 25 | 2-3h |
| **Phase 2** | Warm paths (important) | 10 | 40 | 2-3h |
| **Phase 3** | Cold paths (cleanup) | 5 | 25 | 1-2h |
| **Total** | | **21** | **90** | **6-9h** |

**Realistic**: 6-9 hours with testing and debugging

---

## Next Steps

1. **Review strategy** (15 min)
   - `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
   - `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`

2. **Run analysis** (5 min)
   ```bash
   ./scripts/analyze_freelist_sites.sh
   ```

3. **Create branch** (2 min)
   ```bash
   git checkout -b atomic-freelist-phase1
   git stash  # Save any uncommitted work
   ```

4. **Create accessor header** (30 min)
   ```bash
   cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h
   # Edit to add includes
   make bench_random_mixed_hakmem  # Test compile
   ```

5. **Start Phase 1** (2-3 hours)
   - Convert 5 files, ~25 sites
   - Test after each file
   - Final test with Larson 8T

6. **Evaluate results**
   - If pass: Continue to Phase 2
   - If fail: Debug or rollback

---

## Support Documents

- **ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md** - Overall strategy, performance analysis
- **ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md** - Detailed conversion instructions
- **core/box/slab_freelist_atomic.h.TEMPLATE** - Accessor API implementation
- **scripts/analyze_freelist_sites.sh** - Automated site analysis

---

## Questions?

**Q: Why not just add a mutex to TinySlabMeta?**
A: 40-byte overhead per slab, 10-20x performance hit. Lock-free CAS is 3-5x faster.

**Q: Why not use a global lock?**
A: It serializes all allocation and kills MT performance. Lock-free allows concurrency.

**Q: Why 3 phases instead of all at once?**
A: Risk management. Phase 1 fixes the Larson crash (2-3h), and you can stop there if needed.

**Q: What if performance regression is >5%?**
A: Rollback to master, review strategy. Consider the spinlock alternative (5-10% overhead, simpler).

**Q: Can I skip Phase 3?**
A: Yes, but you'll have ~25 sites with direct access (debug/stats). Document them clearly.

---

## Recommendation

**Start with Phase 1 (2-3 hours)** and evaluate results:
- If Larson 8T stable + regression <5%: ✅ Continue to Phase 2
- If unstable or regression >5%: ❌ Rollback and review

- **Best case**: 6-9 hours for full MT safety with <3% regression
- **Worst case**: 2-3 hours to prove feasibility, then rollback if needed
- **Risk**: Low (incremental, easy rollback, well-documented)
- **Benefit**: High (MT stability, scalability, future-proof architecture)
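
---

## Appendix: Accessor Sketch

To make the conversion patterns in Steps 3.1-3.3 and 5.2 concrete, here is a minimal sketch of what such accessors might look like with C11 `<stdatomic.h>`. It is not the contents of `slab_freelist_atomic.h.TEMPLATE`: `SketchSlabMeta` and every `sketch_*` name are hypothetical stand-ins, the next pointer is assumed to live in the first word of each free block, and the memory orderings shown are one plausible choice rather than the ones documented in the strategy file. The pop is a plain CAS loop and does nothing about ABA, which a production version would need to consider.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} SketchSlabMeta;

/* Stand-ins for tiny_next_read/tiny_next_write. Assumption: the next pointer
 * is stored in the first word of the block; class_idx is unused here. */
static inline void* sketch_next_read(int class_idx, void* block) {
    (void)class_idx;
    return *(void**)block;
}
static inline void sketch_next_write(int class_idx, void* block, void* next) {
    (void)class_idx;
    *(void**)block = next;
}

/* Cold-path helpers: relaxed load/store, no ordering guarantees. */
static inline int sketch_freelist_is_nonempty(SketchSlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed) != NULL;
}
static inline void sketch_freelist_store_relaxed(SketchSlabMeta* meta, void* v) {
    atomic_store_explicit(&meta->freelist, v, memory_order_relaxed);
}

/* Hot-path pop: returns NULL when the list is (or becomes) empty. */
static inline void* sketch_freelist_pop_lockfree(SketchSlabMeta* meta, int class_idx) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = sketch_next_read(class_idx, head);
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
        /* CAS failed: head now holds the current value; retry. */
    }
    return NULL;
}

/* Hot-path push: links node in front of the current head. */
static inline void sketch_freelist_push_lockfree(SketchSlabMeta* meta, int class_idx,
                                                 void* node) {
    assert(node != NULL);
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        sketch_next_write(class_idx, node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

A call site then follows the Step 3.2 pattern: call the pop, and treat a `NULL` result as "take the fallback path" rather than as an error, since another thread may have drained the list between the emptiness check and the pop.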