
# Atomic Freelist Quick Start Guide

## TL;DR

**Problem**: The initial estimate was 589 freelist access sites; the actual count is 90 (much better!)
**Solution**: Hybrid approach - lock-free CAS for hot paths, relaxed atomics for cold paths
**Effort**: 5-8 hours across 3 phases (6-9 hours including prep and header setup)
**Risk**: Low (incremental, easy rollback)
**Impact**: about 2-3% slower single-threaded, in exchange for multi-threaded (MT) stability

---

## Step-by-Step Implementation

### Step 1: Read Documentation (15 min)

1. **Strategy**: `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
   - Accessor function design
   - Memory ordering rationale
   - Performance projections

2. **Site Guide**: `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
   - File-by-file conversion instructions
   - Common pitfalls
   - Testing checklist

3. **Analysis**: Run `scripts/analyze_freelist_sites.sh`
   - Validates site counts
   - Shows operation breakdown
   - Estimates effort

---

### Step 2: Create Accessor Header (30 min)

```bash
# Copy template to working file
cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h

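# Make the new header include its dependency, tiny_next_ptr_box.h.
# Insert at the top: appending would place the include after the code that needs it.
sed -i '1i#include "tiny_next_ptr_box.h"' core/box/slab_freelist_atomic.h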

# Verify compile
make clean
make bench_random_mixed_hakmem 2>&1 | grep -i error
```

**Expected**: Clean compile (no errors, so `grep` prints nothing)

---

### Step 3: Phase 1 - Hot Paths (2-3 hours)

#### 3.1 Convert NULL Checks (30 min)

**Pattern**: `if (meta->freelist)` → `if (slab_freelist_is_nonempty(meta))`

**Files**:
- `core/tiny_superslab_alloc.inc.h` (4 sites)
- `core/hakmem_tiny_refill_p0.inc.h` (1 site)
- `core/box/carve_push_box.c` (2 sites)
- `core/hakmem_tiny_tls_ops.h` (2 sites)

**Commands**:
```bash
# Add include at top of each file
# For tiny_superslab_alloc.inc.h:
sed -i '1i#include "box/slab_freelist_atomic.h"' core/tiny_superslab_alloc.inc.h

# Replace NULL checks (review carefully!)
# Do this manually - automated sed is too risky
```
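
As a rough illustration of what the NULL-check accessor could look like, here is a minimal sketch using C11 atomics. It is not the contents of the template: `SlabMetaSketch` is a hypothetical stand-in for `TinySlabMeta`, and the `_sketch` suffix marks the function as illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta; the real struct has more fields. */
typedef struct {
    _Atomic(void*) freelist;  /* head of the per-slab free list */
} SlabMetaSketch;

/* Emptiness hint only: a relaxed load is enough here because the caller
 * still has to pop through an accessor that re-validates the head. */
static inline bool slab_freelist_is_nonempty_sketch(SlabMetaSketch* meta) {
    assert(meta != NULL);
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed) != NULL;
}
```

Because another thread may empty the list right after the check, a `true` result is only a hint; the pop that follows must still handle NULL.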

---

#### 3.2 Convert POP Operations (1 hour)

**Pattern**:
```c
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);

// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback;  // NULL: freelist empty, possibly drained by another thread
```
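
To show why the AFTER form can return NULL even when a prior check saw a non-empty list, here is a minimal sketch of a CAS-based pop. It assumes the next pointer sits in the first word of a free block; `SlabMetaSketch` and `next_read_sketch` are hypothetical stand-ins for `TinySlabMeta` and `tiny_next_read`, and the real accessor may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta. */
typedef struct {
    _Atomic(void*) freelist;
} SlabMetaSketch;

/* Stand-in for tiny_next_read(): assumes the next pointer is stored in the
 * first word of the block. The real helper may use class_idx for the offset. */
static inline void* next_read_sketch(int class_idx, void* block) {
    (void)class_idx;
    return *(void**)block;
}

/* Pop the head with a compare-and-swap retry loop. */
static inline void* slab_freelist_pop_lockfree_sketch(SlabMetaSketch* meta,
                                                      int class_idx) {
    assert(meta != NULL);
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = next_read_sketch(class_idx, head);
        /* On failure the CAS reloads `head` with the current value; retry. */
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL;  /* empty, or another thread took the last block */
}
```

A bare CAS pop like this is exposed to the ABA problem if a block can be popped, reused, and pushed back between the load and the CAS. The sketch does not address that.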

**Files**:
- `core/tiny_superslab_alloc.inc.h:117-145` (1 critical site)
- `core/box/carve_push_box.c:173-174` (1 site)
- `core/hakmem_tiny_tls_ops.h:83-85` (1 site)

**Testing after each file**:
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
```

---

#### 3.3 Convert PUSH Operations (1 hour)

**Pattern**:
```c
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
```
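
The push side can be sketched the same way. This is an illustration under the same assumptions as before (next pointer in the first word of the block, hypothetical `SlabMetaSketch` and `next_write_sketch` stand-ins), not the template's actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta. */
typedef struct {
    _Atomic(void*) freelist;
} SlabMetaSketch;

/* Stand-in for tiny_next_write(): stores `next` in the first word of `node`. */
static inline void next_write_sketch(int class_idx, void* node, void* next) {
    (void)class_idx;
    *(void**)node = next;
}

/* Link `node` in front of the current head, retrying if the head moved. */
static inline void slab_freelist_push_lockfree_sketch(SlabMetaSketch* meta,
                                                      int class_idx,
                                                      void* node) {
    assert(meta != NULL && node != NULL);
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        /* Re-link on every attempt: a failed CAS updates `head`. */
        next_write_sketch(class_idx, node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

The release ordering on success is what makes the `next` pointer written into `node` visible to a thread that later pops it with an acquire load.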

**Files**:
- `core/box/carve_push_box.c` (6 sites - rollback paths)

**Testing**:
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 100000 256 42
```

---

#### 3.4 Phase 1 Final Test (30 min)

```bash
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record ops/s (expect: 24.4-24.8M, vs 25.1M baseline)

# Multi-threaded stability
make larson_hakmem
./out/release/larson_hakmem 8 100000 256
# Expect: No crashes, ~18-20M ops/s

# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
# Expect: No TSan warnings
```

**Success Criteria**:
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ Larson 8T stable (no crashes)
- ✅ No TSan warnings
- ✅ Clean build

**If any check fails**: Roll back and debug
```bash
git diff > phase1.patch  # Save work
git checkout .           # Revert
# Review phase1.patch and fix issues
```

---

### Step 4: Phase 2 - Warm Paths (2-3 hours)

**Scope**: Convert remaining 40 sites in 10 files

**Files** (in order of priority):
1. `core/tiny_refill_opt.h` (refill chain ops)
2. `core/tiny_free_magazine.inc.h` (magazine push)
3. `core/refill/ss_refill_fc.h` (FC refill)
4. `core/slab_handle.h` (slab handle ops)
5-10. Remaining files (see `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`)

**Testing** (after each file):
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 100000 256 42
```

**Phase 2 Final Test**:
```bash
# All sizes
for size in 128 256 512 1024; do
    ./out/release/bench_random_mixed_hakmem 1000000 $size 42
done

# MT scaling
for threads in 1 2 4 8 16; do
    ./out/release/larson_hakmem $threads 100000 256
done
```

---

### Step 5: Phase 3 - Cleanup (1-2 hours)

**Scope**: Convert/document remaining 25 sites

#### 5.1 Debug/Stats Sites (30 min)

**Pattern**: `meta->freelist` → `SLAB_FREELIST_DEBUG_PTR(meta)`

**Files**:
- `core/box/ss_stats_box.c`
- `core/tiny_debug.h`
- `core/tiny_remote.c`

---

#### 5.2 Init/Cleanup Sites (30 min)

**Pattern**: `meta->freelist = NULL` → `slab_freelist_store_relaxed(meta, NULL)`

**Files**:
- `core/hakmem_tiny_superslab.c`
- `core/hakmem_smallmid_superslab.c`
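
For the cold-path patterns in 5.1 and 5.2, the accessors could plausibly be thin wrappers over relaxed atomics. The sketch below is an assumption about their shape, with a hypothetical `SlabMetaSketch` in place of `TinySlabMeta` and `_SKETCH`/`_sketch` suffixes to keep it distinct from the real API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta. */
typedef struct {
    _Atomic(void*) freelist;
} SlabMetaSketch;

/* Relaxed store for init/cleanup: assumes no other thread can reach the
 * slab at this point, so no ordering with other memory is required. */
static inline void slab_freelist_store_relaxed_sketch(SlabMetaSketch* meta,
                                                      void* value) {
    assert(meta != NULL);
    atomic_store_explicit(&meta->freelist, value, memory_order_relaxed);
}

/* Relaxed snapshot for debug/stats output. The value may already be stale
 * when it is printed, so it must not be dereferenced or popped. */
#define SLAB_FREELIST_DEBUG_PTR_SKETCH(meta) \
    atomic_load_explicit(&(meta)->freelist, memory_order_relaxed)
```

Keeping these behind named accessors also makes the final `grep` for direct `meta->freelist` accesses meaningful.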

---

#### 5.3 Final Verification (30 min)

```bash
# Full rebuild
make clean && make all

# Run all tests
./run_all_tests.sh

# Check for remaining direct accesses
grep -rn "meta->freelist" core/ --include="*.c" --include="*.h" | \
  grep -v "slab_freelist_" | grep -v "SLAB_FREELIST_DEBUG_PTR"
# Expect: 0 results (all converted or documented)
```

---

## Common Pitfalls

### Pitfall 1: Double-Converting POP
```c
// ❌ WRONG: slab_freelist_pop_lockfree already calls tiny_next_read!
void* p = slab_freelist_pop_lockfree(meta, class_idx);
void* next = tiny_next_read(class_idx, p);  // ❌ BUG!

// ✅ RIGHT: Use p directly
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) goto fallback;
use(p);  // ✅ CORRECT
```

### Pitfall 2: Forgetting Race Handling
```c
// ❌ WRONG: Assuming pop always succeeds
void* p = slab_freelist_pop_lockfree(meta, class_idx);
use(p);  // ❌ SEGV if p == NULL!

// ✅ RIGHT: Always check for NULL
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) goto fallback;  // ✅ CORRECT
use(p);
```

### Pitfall 3: Including Header Before Dependencies
```c
// ❌ WRONG: slab_freelist_atomic.h needs tiny_next_ptr_box.h
#include "box/slab_freelist_atomic.h"  // ❌ Compile error!
#include "box/tiny_next_ptr_box.h"

// ✅ RIGHT: Dependencies first
#include "box/tiny_next_ptr_box.h"  // ✅ CORRECT
#include "box/slab_freelist_atomic.h"
```

---

## Performance Expectations

### Single-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2% to -2.8% |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1% to -2.9% |

**Acceptable**: <5% regression (relaxed atomics cost ~0%; CAS adds 60-140% overhead but is rare)

### Multi-Threaded

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Larson 8T | CRASH | ~18-20M ops/s | ✅ FIXED |
| MT Scaling (8T) | 0% (crashes) | 70-80% | ✅ GAIN |

**Expected**: Stability and MT scalability far outweigh the 2-3% single-threaded cost
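
The percentages in the tables above can be reproduced with two small helpers. This is a sketch for checking your own measurements; it assumes "MT scaling" means speedup over the single-thread run divided by the thread count, and the function names are made up for illustration.

```c
#include <assert.h>

/* Slowdown of `after` relative to `before`, as a positive percentage. */
static double regression_pct(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* Parallel efficiency: speedup over the 1T run divided by thread count. */
static double scaling_efficiency_pct(double ops_1t, double ops_nt, int threads) {
    assert(ops_1t > 0.0 && threads > 0);
    return (ops_nt / ops_1t) / threads * 100.0;
}
```

For example, `regression_pct(25.1, 24.4)` is about 2.8 and `regression_pct(25.1, 24.8)` is about 1.2, matching the Random Mixed row.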

---

## Rollback Plan

If Phase 1 fails (>5% regression or instability):

```bash
# Option 1: Revert to master
git checkout master
git branch -D atomic-freelist-phase1

# Option 2: Alternative approach (per-slab spinlock)
# Add uint8_t lock field to TinySlabMeta (1 byte)
# Use __sync_lock_test_and_set() for spinlock (5-10% overhead)
# Guaranteed correctness, simpler implementation
```
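
Option 2 could look roughly like the following. It is a sketch only: `SlabMetaLockedSketch` is a hypothetical stand-in for `TinySlabMeta` with the extra lock byte, the next pointer is assumed to live in the first word of a free block, and the GCC/Clang `__sync` builtins are used as the comment above suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinySlabMeta with a 1-byte spinlock added. */
typedef struct {
    void*   freelist;
    uint8_t lock;  /* 0 = unlocked, 1 = locked */
} SlabMetaLockedSketch;

static inline void slab_lock_sketch(SlabMetaLockedSketch* m) {
    /* Atomically writes 1 and returns the old value (acquire barrier);
     * keep spinning while the old value shows the lock was already held. */
    while (__sync_lock_test_and_set(&m->lock, 1)) {
    }
}

static inline void slab_unlock_sketch(SlabMetaLockedSketch* m) {
    __sync_lock_release(&m->lock);  /* writes 0 with a release barrier */
}

/* Pop under the lock: plain loads and stores are safe inside the critical
 * section as long as every other freelist access takes the same lock. */
static inline void* slab_freelist_pop_locked_sketch(SlabMetaLockedSketch* m) {
    assert(m != NULL);
    slab_lock_sketch(m);
    void* block = m->freelist;
    if (block != NULL) {
        m->freelist = *(void**)block;
    }
    slab_unlock_sketch(m);
    return block;
}
```

The trade-off is that every freelist access pays for the lock, including uncontended single-threaded ones, which is where the higher overhead estimate comes from.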

---

## Success Criteria

### Phase 1
- ✅ Larson 8T runs without crash (100K iterations)
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ No ASan/TSan warnings

### Phase 2
- ✅ All MT tests pass (1T, 2T, 4T, 8T, 16T)
- ✅ Single-threaded regression <3% (24.4M+ ops/s)
- ✅ MT scaling 70%+ (8T = 5.6x+ speedup)

### Phase 3
- ✅ All 90 sites converted or documented
- ✅ Full test suite passes (100% pass rate)
- ✅ Zero direct `meta->freelist` accesses (except in atomic.h)

---

## Time Budget

| Phase | Description | Files | Sites | Time |
|-------|-------------|-------|-------|------|
| **Prep** | Read docs, setup | - | - | 15 min |
| **Header** | Create accessor API | 1 | - | 30 min |
| **Phase 1** | Hot paths (critical) | 5 | 25 | 2-3h |
| **Phase 2** | Warm paths (important) | 10 | 40 | 2-3h |
| **Phase 3** | Cold paths (cleanup) | 5 | 25 | 1-2h |
| **Total** | | **21** | **90** | **6-9h** |

**Realistic**: 6-9 hours with testing and debugging

---

## Next Steps

1. **Review strategy** (15 min)
   - `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
   - `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`

2. **Run analysis** (5 min)
   ```bash
   ./scripts/analyze_freelist_sites.sh
   ```

3. **Create branch** (2 min)
   ```bash
   git stash  # Save any uncommitted work first
   git checkout -b atomic-freelist-phase1
   ```

4. **Create accessor header** (30 min)
   ```bash
   cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h
   # Edit to add includes
   make bench_random_mixed_hakmem  # Test compile
   ```

5. **Start Phase 1** (2-3 hours)
   - Convert 5 files, ~25 sites
   - Test after each file
   - Final test with Larson 8T

6. **Evaluate results**
   - If pass: Continue to Phase 2
   - If fail: Debug or rollback

---

## Support Documents

- **ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md** - Overall strategy, performance analysis
- **ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md** - Detailed conversion instructions
- **core/box/slab_freelist_atomic.h.TEMPLATE** - Accessor API implementation
- **scripts/analyze_freelist_sites.sh** - Automated site analysis

---

## Questions?

**Q: Why not just add a mutex to TinySlabMeta?**
A: 40-byte overhead per slab, 10-20x performance hit. Lock-free CAS is 3-5x faster.

**Q: Why not use a global lock?**
A: Serializes all allocation, kills MT performance. Lock-free allows concurrency.

**Q: Why 3 phases instead of all at once?**
A: Risk management. Phase 1 alone fixes the Larson crash (2-3h), so you can stop there if needed.

**Q: What if performance regression is >5%?**
A: Roll back to master and review the strategy. Consider the spinlock alternative (5-10% overhead, simpler).

**Q: Can I skip Phase 3?**
A: Yes, but you'll have ~25 sites with direct access (debug/stats). Document them clearly.

---

## Recommendation

**Start with Phase 1 (2-3 hours)** and evaluate results:
- If Larson 8T stable + regression <5%: ✅ Continue to Phase 2
- If unstable or regression >5%: ❌ Roll back and review

**Best case**: 6-9 hours for full MT safety with <3% regression
**Worst case**: 2-3 hours to prove feasibility, then rollback if needed

**Risk**: Low (incremental, easy rollback, well-documented)
**Benefit**: High (MT stability, scalability, future-proof architecture)
