# Atomic Freelist Implementation - Documentation Index

## Overview

This directory contains documentation and tooling for converting `TinySlabMeta.freelist` accesses to atomic operations, so that the HAKMEM memory allocator is safe under multi-threaded use.

**Status**: Ready for implementation
**Estimated Effort**: 5-8 hours of conversion work across 3 phases, excluding preparation and testing
**Expected Impact**: 2-3% single-threaded regression in exchange for multi-threaded (MT) stability and scalability

---

## Quick Start

**New to this task?** Start here:

1. **Read**: `ATOMIC_FREELIST_QUICK_START.md` (15 min)
2. **Run**: `./scripts/analyze_freelist_sites.sh` (5 min)
3. **Create**: Accessor header from template (30 min)
4. **Begin**: Phase 1 conversion (2-3 hours)

---

## Documentation Files

### 1. Executive Summary

**File**: `ATOMIC_FREELIST_SUMMARY.md`
**Purpose**: High-level overview of the entire implementation
**Contents**:

- Investigation results (90 sites, not 589)
- Implementation strategy (hybrid approach)
- Performance analysis (2-3% regression expected)
- Risk assessment (low risk, high benefit)
- Timeline and success metrics

**Read this first** for a complete picture.

---

### 2. Implementation Strategy

**File**: `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
**Purpose**: Detailed technical strategy and design decisions
**Contents**:

- Accessor function API design (lock-free CAS + relaxed atomics)
- Critical site list (top 20 sites to convert)
- Non-critical site strategy (skip or use relaxed)
- Phased implementation plan (3 phases)
- Performance projections (single/multi-threaded)
- Memory ordering rationale (acquire/release/relaxed)
- Alternative approaches (mutex, global lock, etc.)

**Use this** when designing the accessor API and planning conversion phases.

---

### 3. Site-by-Site Conversion Guide

**File**: `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
**Purpose**: Line-by-line conversion instructions for all 90 sites
**Contents**:

- Phase 1: 5 files, 25 sites (hot paths)
  - File 1: `core/box/slab_freelist_atomic.h` (CREATE)
  - File 2: `core/tiny_superslab_alloc.inc.h` (8 sites)
  - File 3: `core/hakmem_tiny_refill_p0.inc.h` (3 sites)
  - File 4: `core/box/carve_push_box.c` (10 sites)
  - File 5: `core/hakmem_tiny_tls_ops.h` (4 sites)
- Phase 2: 10 files, 40 sites (warm paths)
- Phase 3: 5 files, 25 sites (cold paths)
- Common pitfalls (double-POP, missing NULL check, etc.)
- Testing checklist per file
- Quick reference card (conversion patterns)

**Use this** during actual code conversion (your primary reference).

---

### 4. Quick Start Guide

**File**: `ATOMIC_FREELIST_QUICK_START.md`
**Purpose**: Step-by-step implementation instructions
**Contents**:

- Step 1: Read documentation (15 min)
- Step 2: Create accessor header (30 min)
- Step 3: Phase 1 conversion (2-3 hours)
- Step 4: Phase 2 conversion (2-3 hours)
- Step 5: Phase 3 cleanup (1-2 hours)
- Common pitfalls and solutions
- Performance expectations
- Rollback plan
- Success criteria

**Use this** as your daily task list during implementation.

---

### 5. Accessor Header Template

**File**: `core/box/slab_freelist_atomic.h.TEMPLATE`
**Purpose**: Complete implementation of atomic accessor API
**Contents**:

- Lock-free CAS operations (`slab_freelist_pop_lockfree`, `slab_freelist_push_lockfree`)
- Relaxed load/store operations (`slab_freelist_load_relaxed`, `slab_freelist_store_relaxed`)
- NULL check helpers (`slab_freelist_is_empty`, `slab_freelist_is_nonempty`)
- Debug macro (`SLAB_FREELIST_DEBUG_PTR`)
- Extensive comments (80+ lines of documentation)
- Conversion examples
- Performance notes
- Testing strategy

**Copy this** to `core/box/slab_freelist_atomic.h` to get started.

---

## Tool Scripts

### 1. Site Analysis Script

**File**: `scripts/analyze_freelist_sites.sh`
**Purpose**: Analyze freelist access patterns in the codebase
**Output**:

- Total site count (90 sites)
- Operation breakdown (POP, PUSH, NULL checks, etc.)
- Files with freelist usage (21 files)
- Phase 1/2/3 file lists
- Lock-protected sites check
- Conversion effort estimates

**Run this** before starting conversion to validate site counts.

```bash
./scripts/analyze_freelist_sites.sh
```

---

### 2. Conversion Verification Script

**File**: `scripts/verify_atomic_freelist_conversion.sh`
**Purpose**: Track conversion progress and detect potential bugs
**Output**:

- Accessor header check (exists, functions defined)
- Direct access count (remaining unconverted sites)
- Converted operations count (by type)
- Conversion progress (0-100%)
- Phase 1/2/3 file check (which files are converted)
- Potential bug detection (double-POP, double-PUSH, missing NULL check)
- Compile status
- Recommendations for next steps

**Run this** frequently during conversion to track progress and catch bugs early.
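
To make the "double-POP" check concrete: the greps in the Troubleshooting section suggest it flags a converted pop call that is still followed by the old manual `tiny_next_read` advance. The sketch below simulates that situation on a single thread. Everything prefixed `sketch_` / `Sketch` is a hypothetical stand-in (for `TinySlabMeta`, `tiny_next_read` and `slab_freelist_pop_lockfree`); the real signatures live in the template and may differ. It also assumes the next pointer is stored in the first word of each free block and ignores the ABA problem.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head. */
typedef struct { _Atomic(void*) freelist; } SketchMeta;

/* Hypothetical stand-in for tiny_next_read().
 * Assumption: the next pointer lives in the first word of a free block. */
static void* sketch_next_read(void* block) { return *(void**)block; }

/* CAS-based pop: retry until the head is swapped to head->next, or the
 * list is empty. Not ABA-safe; a real allocator needs extra guarantees. */
static void* sketch_pop_lockfree(SketchMeta* m) {
    void* head = atomic_load_explicit(&m->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = sketch_next_read(head);
        if (atomic_compare_exchange_weak_explicit(&m->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
        /* CAS failed: head now holds the current value; loop and retry. */
    }
    return NULL;
}

/* Simulates two "threads" interleaving on one real thread.
 * Returns 1 if the same block is handed out twice. */
int sketch_simulate_double_pop(void) {
    void* c[1] = { NULL };
    void* b[1] = { c };
    void* a[1] = { b };          /* freelist: a -> b -> c */
    SketchMeta m;
    atomic_init(&m.freelist, (void*)a);

    void* got_a = sketch_pop_lockfree(&m);  /* thread A pops a */
    void* got_b = sketch_pop_lockfree(&m);  /* thread B pops b meanwhile */

    /* BUG: leftover manual advance from the pre-conversion code in thread A.
     * It overwrites the head with a's stale next pointer, which is b. */
    atomic_store_explicit(&m.freelist, sketch_next_read(got_a),
                          memory_order_relaxed);

    void* again = sketch_pop_lockfree(&m);  /* hands out b a second time */
    return got_a == (void*)a && got_b == (void*)b && again == got_b;
}
```

On a single thread without the interleaved pop, the leftover store is a no-op, which is why this class of bug can pass single-threaded benchmarks and only surface under Larson or TSan.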
```bash
./scripts/verify_atomic_freelist_conversion.sh
```

**Example output**:

```
Progress: 30% (27/90 sites)
[============----------------------------]
Currently working on: Phase 1 (Critical Hot Paths)

✅ No double-POP bugs detected
✅ No double-PUSH bugs detected
✅ Compilation succeeded
```

---

## Implementation Phases

### Phase 1: Critical Hot Paths (2-3 hours)

**Goal**: Fix Larson 8T crash with minimal changes
**Scope**: 5 files, 25 sites
**Files**:

- `core/box/slab_freelist_atomic.h` (CREATE)
- `core/tiny_superslab_alloc.inc.h`
- `core/hakmem_tiny_refill_p0.inc.h`
- `core/box/carve_push_box.c`
- `core/hakmem_tiny_tls_ops.h`

**Success Criteria**:

- ✅ Larson 8T stable (no crashes)
- ✅ Regression <5% (>24.0M ops/s)
- ✅ No TSan warnings

---

### Phase 2: Important Paths (2-3 hours)

**Goal**: Full MT safety for all allocation paths
**Scope**: 10 files, 40 sites
**Files**:

- `core/tiny_refill_opt.h`
- `core/tiny_free_magazine.inc.h`
- `core/refill/ss_refill_fc.h`
- `core/slab_handle.h`
- 6 additional files

**Success Criteria**:

- ✅ All MT tests pass (1T-16T)
- ✅ Regression <3% (>24.4M ops/s)
- ✅ MT scaling 70%+

---

### Phase 3: Cleanup (1-2 hours)

**Goal**: Convert or document the remaining sites
**Scope**: 5 files, 25 sites
**Files**:

- Debug/stats files
- Init/cleanup files
- Verification files

**Success Criteria**:

- ✅ All 90 sites converted or documented
- ✅ Zero direct accesses (except in `slab_freelist_atomic.h`)
- ✅ Full test suite passes

---

## Testing Strategy

### Per-File Testing

After converting each file:

```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
```

### Phase 1 Testing

```bash
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42

# Multi-threaded stability (PRIMARY TEST)
./out/release/larson_hakmem 8 100000 256

# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
```

### Phase 2 Testing

```bash
# All sizes
for size in 128 256 512 1024; do
    ./out/release/bench_random_mixed_hakmem 1000000 "$size" 42
done

# MT scaling
for threads in 1 2 4 8 16; do
    ./out/release/larson_hakmem "$threads" 100000 256
done
```

### Phase 3 Testing

```bash
# Full test suite
make clean && make all
./run_all_tests.sh

# ASan check
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 100000 256 42
```

---

## Performance Expectations

### Single-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2% to -2.8% ✅ |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1% to -2.9% ✅ |

**Acceptable**: <5% regression

### Multi-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Larson 8T | **CRASH** | ~18-20M ops/s | **FIXED** ✅ |
| MT Scaling (8T) | 0% (crashes) | 70-80% | **NEW** ✅ |

**Benefit**: Stability and MT scalability far outweigh the 2-3% single-threaded cost.

---

## Common Patterns

### NULL Check Conversion

```c
// BEFORE:
if (meta->freelist) { ... }

// AFTER:
if (slab_freelist_is_nonempty(meta)) { ... }
```

### POP Operation Conversion

```c
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);

// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback;  // Handle race: another thread emptied the list
```

### PUSH Operation Conversion

```c
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
```

### Initialization Conversion

```c
// BEFORE:
meta->freelist = NULL;

// AFTER:
slab_freelist_store_relaxed(meta, NULL);
```

### Debug Print Conversion

```c
// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);

// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));
```

---

## Troubleshooting

### Issue: Compilation Fails

```bash
# Check if accessor header exists
ls -la core/box/slab_freelist_atomic.h

# Check for missing includes
grep -n "#include.*slab_freelist_atomic.h" core/tiny_superslab_alloc.inc.h

# Rebuild from clean state
make clean && make bench_random_mixed_hakmem
```

### Issue: Larson 8T Still Crashes

```bash
# Check conversion progress
./scripts/verify_atomic_freelist_conversion.sh

# Run with TSan to detect data races
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256 2>&1 | grep -A5 "WARNING"

# Check for double-POP/PUSH bugs (leftover manual next-pointer handling)
grep -r -A1 "slab_freelist_pop_lockfree" core/ | grep "tiny_next_read"
grep -r -B1 "slab_freelist_push_lockfree" core/ | grep "tiny_next_write"
```

### Issue: Performance Regression >5%

```bash
# Verify baseline (before conversion); rebuild so the binary matches master
git stash
git checkout master
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record: 25.1M ops/s

# Check converted version; rebuild again and restore stashed changes
git checkout atomic-freelist-phase1
git stash pop
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Should be: >24.0M ops/s

# If regression >5%, profile hot paths
perf record ./out/release/bench_random_mixed_hakmem 1000000 256 42
perf report
# Look for CAS retry loops or excessive memory ordering
```

---

## Rollback Procedures

### Quick Rollback (if Phase 1 fails)

```bash
git stash
git checkout master
git branch -D atomic-freelist-phase1
# Review issues and retry
```

### Alternative Approach (Spinlock)

If lock-free proves too complex:

```c
// Option: use a 1-byte spinlock instead of lock-free CAS
// Add to TinySlabMeta: uint8_t freelist_lock;
// Lock with __sync_lock_test_and_set(), unlock with __sync_lock_release()
// Expected overhead: 5-10% (vs 2-3% for lock-free)
```

---

## Progress Tracking

Use the verification script to track progress:

```bash
./scripts/verify_atomic_freelist_conversion.sh
```

**Output example**:

```
Progress: 30% (27/90 sites)
[============----------------------------]
Phase 1 files converted: 2/4
Remaining sites: 63

Currently working on: Phase 1 (Critical Hot Paths)
Next step: Convert core/box/carve_push_box.c
```

---

## Success Criteria

### Phase 1 Complete

- [ ] 5 files converted (25 sites)
- [ ] Larson 8T runs 100K iterations without crash
- [ ] Single-threaded regression <5%
- [ ] No TSan warnings
- [ ] Verification script shows about 28% progress (25/90 sites)

### Phase 2 Complete

- [ ] 15 files converted (65 sites)
- [ ] All MT tests pass (1T-16T)
- [ ] Single-threaded regression <3%
- [ ] MT scaling 70%+
- [ ] Verification script shows 72% progress (65/90 sites)

### Phase 3 Complete

- [ ] 21 files converted (90 sites)
- [ ] Zero direct `meta->freelist` accesses
- [ ] Full test suite passes
- [ ] Documentation updated (CLAUDE.md)
- [ ] Verification script shows 100% progress

---

## File Checklist

### Documentation

- [x] `ATOMIC_FREELIST_SUMMARY.md` - Executive summary
- [x] `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md` - Technical strategy
- [x] `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md` - Conversion guide
- [x] `ATOMIC_FREELIST_QUICK_START.md` - Quick start instructions
- [x] `ATOMIC_FREELIST_INDEX.md` - This file

### Templates

- [x] `core/box/slab_freelist_atomic.h.TEMPLATE` - Accessor API

### Tools

- [x] `scripts/analyze_freelist_sites.sh` - Site analysis
- [x] `scripts/verify_atomic_freelist_conversion.sh` - Progress tracker

### Implementation (to be created)

- [ ] `core/box/slab_freelist_atomic.h` - Working accessor API

---

## Contact and Support

If you encounter issues during implementation:

1. **Check documentation**: Review the relevant guide for your current phase
2. **Run verification**: `./scripts/verify_atomic_freelist_conversion.sh`
3. **Review common pitfalls**: See the common pitfalls section of `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
4. **Rollback if needed**: `git checkout master`

---

## Estimated Timeline

| Milestone | Duration | Cumulative |
|-----------|----------|------------|
| **Preparation** | 15 min | 0.25h |
| **Create accessor header** | 30 min | 0.75h |
| **Phase 1 conversion** | 2-3h | 2.75-3.75h |
| **Phase 1 testing** | 30 min | 3.25-4.25h |
| **Phase 2 conversion** | 2-3h | 5.25-7.25h |
| **Phase 2 testing** | 1h | 6.25-8.25h |
| **Phase 3 conversion** | 1-2h | 7.25-10.25h |
| **Phase 3 testing** | 1h | 8.25-11.25h |
| **Total** | | **8.25-11.25h** |

**Minimal viable**: 3.25-4.25 hours (Phase 1 only, fixes Larson crash)
**Full implementation**: 8.25-11.25 hours (all 3 phases, complete MT safety)

---

## Next Steps

**Ready to start?**

1. Read `ATOMIC_FREELIST_QUICK_START.md` (15 min)
2. Run `./scripts/analyze_freelist_sites.sh` (5 min)
3. Copy template: `cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h` (5 min)
4. Edit the copied header to add the required includes (20 min)
5. Test compile: `make bench_random_mixed_hakmem` (5 min)
6. Begin Phase 1 conversion using `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md` (2-3 hours)

**Good luck!** 🚀
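
As orientation for steps 3 to 5, here is a minimal sketch of the kind of push and relaxed helpers this index describes, written with C11 `<stdatomic.h>`. The `sketch_` names are hypothetical stand-ins for `slab_freelist_push_lockfree`, `slab_freelist_load_relaxed`, `slab_freelist_store_relaxed` and `slab_freelist_is_empty`; the struct layout, the missing `class_idx` parameter and the next-pointer placement are assumptions, and the template remains the authoritative version.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta; the real struct has more fields. */
typedef struct { _Atomic(void*) freelist; } SketchMeta;

/* Assumption: the next pointer is stored in the first word of a free block
 * (stand-in for tiny_next_write, without the class_idx argument). */
static inline void sketch_next_write(void* node, void* next) {
    *(void**)node = next;
}

/* Relaxed load: enough for NULL checks, stats and debug prints. */
static inline void* sketch_load_relaxed(SketchMeta* m) {
    return atomic_load_explicit(&m->freelist, memory_order_relaxed);
}

/* Relaxed store: intended for init/cleanup where no other thread races. */
static inline void sketch_store_relaxed(SketchMeta* m, void* v) {
    atomic_store_explicit(&m->freelist, v, memory_order_relaxed);
}

static inline bool sketch_is_empty(SketchMeta* m) {
    return sketch_load_relaxed(m) == NULL;
}

/* CAS-based push: link node to the current head, then publish it with
 * release ordering so the next-pointer write is visible to the popper. */
static inline void sketch_push_lockfree(SketchMeta* m, void* node) {
    void* head = atomic_load_explicit(&m->freelist, memory_order_relaxed);
    do {
        sketch_next_write(node, head);  /* head is refreshed on CAS failure */
    } while (!atomic_compare_exchange_weak_explicit(&m->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Release on the successful CAS pairs with an acquire load on the pop side; the failure ordering can stay relaxed because the loop only needs the refreshed head value before retrying.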