Atomic Freelist Site-by-Site Conversion Guide
Quick Reference
Total Sites: 90
- Phase 1 (Critical): 25 sites in 5 files
- Phase 2 (Important): 40 sites in 10 files
- Phase 3 (Cleanup): 25 sites in 5 files
Phase 1: Critical Hot Paths (5 files, 25 sites)
File 1: core/box/slab_freelist_atomic.h (NEW)
Action: CREATE new file with accessor API (see ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md section 1)
Lines: ~80
Time: 30 minutes
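The authoritative API lives in ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md; the sketch below is only a reading aid for the call sites in this guide. The SlabMeta layout and the tiny_next_read()/tiny_next_write() prototypes are assumptions inferred from those call sites, not confirmed signatures.
// Minimal sketch, assuming:
//   typedef struct SlabMeta { _Atomic(void*) freelist; /* ... */ } SlabMeta;
//   void* tiny_next_read(unsigned class_idx, void* block);          // read next link
//   void  tiny_next_write(unsigned class_idx, void* block, void* next); // write next link
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static inline void* slab_freelist_load_relaxed(SlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed);
}
static inline bool slab_freelist_is_empty(SlabMeta* meta) {
    return slab_freelist_load_relaxed(meta) == NULL;
}
static inline bool slab_freelist_is_nonempty(SlabMeta* meta) {
    return slab_freelist_load_relaxed(meta) != NULL;
}
static inline void slab_freelist_store_relaxed(SlabMeta* meta, void* head) {
    atomic_store_explicit(&meta->freelist, head, memory_order_relaxed);
}
// Pop: acquire-load the head, read its next link, CAS head -> next.
// Returns NULL when the list is empty or races to empty.
static inline void* slab_freelist_pop_lockfree(SlabMeta* meta, unsigned class_idx) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = tiny_next_read(class_idx, head);
        if (atomic_compare_exchange_weak_explicit(
                &meta->freelist, &head, next,
                memory_order_acq_rel, memory_order_acquire)) {
            return head; // CAS won: this thread owns the block
        }
        // CAS failure reloaded head; retry with the fresh value
    }
    return NULL;
}
// Push: link node in front of the current head, CAS head -> node.
static inline void slab_freelist_push_lockfree(SlabMeta* meta, unsigned class_idx, void* node) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        tiny_next_write(class_idx, node, head); // node->next = head
    } while (!atomic_compare_exchange_weak_explicit(
                 &meta->freelist, &head, node,
                 memory_order_release, memory_order_relaxed));
}
One caveat worth checking against the strategy document: a bare CAS pop like this is exposed to the classic ABA problem if a block can be freed and re-pushed while another thread is mid-pop; whether that can happen here depends on how slabs recycle blocks.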
File 2: core/tiny_superslab_alloc.inc.h (8 sites)
File: /mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h
Site 2.1: Line 26 (NULL check)
// BEFORE:
if (meta->freelist == NULL && meta->used < meta->capacity) {
// AFTER:
if (slab_freelist_is_empty(meta) && meta->used < meta->capacity) {
Reason: Relaxed load for condition check
Site 2.2: Line 38 (remote drain check)
// BEFORE:
if (__builtin_expect(atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire) != 0, 0)) {
// AFTER: (no change - this is remote_heads, not freelist)
Reason: Already using atomic operations correctly
Site 2.3: Line 88 (fast path check)
// BEFORE:
if (__builtin_expect(meta->freelist == NULL && meta->used < meta->capacity, 1)) {
// AFTER:
if (__builtin_expect(slab_freelist_is_empty(meta) && meta->used < meta->capacity, 1)) {
Reason: Relaxed load for fast path condition
Site 2.4: Lines 117-145 (freelist pop - CRITICAL)
// BEFORE:
if (__builtin_expect(meta->freelist != NULL, 0)) {
void* block = meta->freelist;
if (meta->class_idx != class_idx) {
// Class mismatch, abandon freelist
meta->freelist = NULL;
goto bump_path;
}
// Allocate from freelist
meta->freelist = tiny_next_read(meta->class_idx, block);
meta->used = (uint16_t)((uint32_t)meta->used + 1);
ss_active_add(ss, 1);
return (void*)((uint8_t*)block + 1);
}
// AFTER:
if (__builtin_expect(slab_freelist_is_nonempty(meta), 0)) {
// Try lock-free pop
void* block = slab_freelist_pop_lockfree(meta, meta->class_idx);
if (!block) {
// Another thread won the race, fall through to bump path
goto bump_path;
}
if (meta->class_idx != class_idx) {
// Class mismatch, return to freelist and abandon
slab_freelist_push_lockfree(meta, meta->class_idx, block);
slab_freelist_store_relaxed(meta, NULL); // Clear freelist
goto bump_path;
}
// Success
meta->used = (uint16_t)((uint32_t)meta->used + 1);
ss_active_add(ss, 1);
return (void*)((uint8_t*)block + 1);
}
Reason: Lock-free CAS for hot path allocation
CRITICAL: Note that slab_freelist_pop_lockfree() already handles tiny_next_read() internally!
Site 2.5: Line 134 (freelist clear)
// BEFORE:
meta->freelist = NULL;
// AFTER:
slab_freelist_store_relaxed(meta, NULL);
Reason: Relaxed store for initialization
Site 2.6: Line 308 (bump path check)
// BEFORE:
if (meta && meta->freelist == NULL && meta->used < meta->capacity && tls->slab_base) {
// AFTER:
if (meta && slab_freelist_is_empty(meta) && meta->used < meta->capacity && tls->slab_base) {
Reason: Relaxed load for condition check
Site 2.7: Line 351 (freelist update after remote drain)
// BEFORE:
meta->freelist = next;
// AFTER:
slab_freelist_store_relaxed(meta, next);
Reason: Relaxed store after drain (single-threaded context)
Site 2.8: Line 372 (bump path check)
// BEFORE:
if (meta && meta->freelist == NULL && meta->used < meta->capacity && meta->carved < meta->capacity) {
// AFTER:
if (meta && slab_freelist_is_empty(meta) && meta->used < meta->capacity && meta->carved < meta->capacity) {
Reason: Relaxed load for condition check
File 3: core/hakmem_tiny_refill_p0.inc.h (3 sites)
File: /mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h
Site 3.1: Line 101 (prefetch)
// BEFORE:
__builtin_prefetch(&meta->freelist, 0, 3);
// AFTER: (no change)
__builtin_prefetch(&meta->freelist, 0, 3);
Reason: The prefetch only takes the field's address and never reads the atomic value, so no conversion is needed
Site 3.2: Lines 252-253 (freelist check + prefetch)
// BEFORE:
if (meta->freelist) {
__builtin_prefetch(meta->freelist, 0, 3);
}
// AFTER:
if (slab_freelist_is_nonempty(meta)) {
void* head = slab_freelist_load_relaxed(meta);
__builtin_prefetch(head, 0, 3);
}
Reason: Need to load pointer before prefetching (cannot prefetch atomic type directly)
Alternative (if prefetch not critical):
// Simpler: Skip prefetch
if (slab_freelist_is_nonempty(meta)) {
// ... rest of logic
}
Site 3.3: Line ~260 (freelist pop in batch refill)
Context: Need to review full function to find freelist pop logic
grep -A20 "if (meta->freelist)" core/hakmem_tiny_refill_p0.inc.h
Expected Pattern:
// BEFORE:
while (taken < want && meta->freelist) {
void* p = meta->freelist;
meta->freelist = tiny_next_read(class_idx, p);
// ... push to TLS
}
// AFTER:
while (taken < want && slab_freelist_is_nonempty(meta)) {
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break; // Another thread drained it
// ... push to TLS
}
File 4: core/box/carve_push_box.c (10 sites)
File: /mnt/workdisk/public_share/hakmem/core/box/carve_push_box.c
Site 4.1-4.2: Lines 33-34 (rollback push)
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
Reason: Lock-free push for rollback (inside rollback_carved_blocks)
IMPORTANT: slab_freelist_push_lockfree() already calls tiny_next_write() internally!
Site 4.3-4.4: Lines 120-121 (rollback in box_carve_and_push)
// BEFORE:
tiny_next_write(class_idx, popped, meta->freelist);
meta->freelist = popped;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, popped);
Reason: Same as 4.1-4.2
Site 4.5-4.6: Lines 128-129 (rollback remaining)
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
Reason: Same as 4.1-4.2
Site 4.7: Line 172 (freelist carve check)
// BEFORE:
while (pushed < want && meta->freelist) {
// AFTER:
while (pushed < want && slab_freelist_is_nonempty(meta)) {
Reason: Relaxed load for loop condition
Site 4.8: Lines 173-174 (freelist pop)
// BEFORE:
void* p = meta->freelist;
meta->freelist = tiny_next_read(class_idx, p);
// AFTER:
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break; // Freelist exhausted
Reason: Lock-free pop for carve-with-freelist path
Site 4.9-4.10: Lines 179-180 (rollback on push failure)
// BEFORE:
tiny_next_write(class_idx, p, meta->freelist);
meta->freelist = p;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, p);
Reason: Same as 4.1-4.2
File 5: core/hakmem_tiny_tls_ops.h (4 sites)
File: /mnt/workdisk/public_share/hakmem/core/hakmem_tiny_tls_ops.h
Site 5.1: Line 77 (TLS drain check)
// BEFORE:
if (meta->freelist) {
// AFTER:
if (slab_freelist_is_nonempty(meta)) {
Reason: Relaxed load for condition check
Site 5.2: Line 82 (TLS drain loop)
// BEFORE:
while (local < need && meta->freelist) {
// AFTER:
while (local < need && slab_freelist_is_nonempty(meta)) {
Reason: Relaxed load for loop condition
Site 5.3: Lines 83-85 (TLS drain pop)
// BEFORE:
void* node = meta->freelist;
// ... 1 line ...
meta->freelist = tiny_next_read(class_idx, node);
// AFTER:
void* node = slab_freelist_pop_lockfree(meta, class_idx);
if (!node) break; // Freelist exhausted
// ... remove tiny_next_read line ...
Reason: Lock-free pop for TLS drain
Site 5.4: Line 203 (TLS freelist init)
// BEFORE:
meta->freelist = node;
// AFTER:
slab_freelist_store_relaxed(meta, node);
Reason: Relaxed store for initialization (single-threaded context)
Phase 1 Summary
Total Changes:
- 1 new file (`slab_freelist_atomic.h`)
- 5 modified files
- ~25 conversion sites
- ~8 POP operations converted to CAS
- ~6 PUSH operations converted to CAS
- ~11 NULL checks converted to relaxed loads
Time Estimate: 2-3 hours (with testing)
Phase 2: Important Paths (10 files, 40 sites)
File 6: core/tiny_refill_opt.h
Lines 199-230 (refill chain pop)
// BEFORE:
while (taken < want && meta->freelist) {
void* p = meta->freelist;
// ... splice logic ...
meta->freelist = next;
}
// AFTER:
while (taken < want && slab_freelist_is_nonempty(meta)) {
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break;
// ... splice logic (remove next assignment) ...
}
File 7: core/tiny_free_magazine.inc.h
Lines 135-136, 328 (magazine push)
// BEFORE:
tiny_next_write(meta->class_idx, it.ptr, meta->freelist);
meta->freelist = it.ptr;
// AFTER:
slab_freelist_push_lockfree(meta, meta->class_idx, it.ptr);
File 8: core/refill/ss_refill_fc.h
Lines 151-153 (FC refill pop)
// BEFORE:
if (meta->freelist != NULL) {
void* p = meta->freelist;
meta->freelist = tiny_next_read(class_idx, p);
}
// AFTER:
if (slab_freelist_is_nonempty(meta)) {
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) {
// Race: freelist drained, skip
}
}
File 9: core/slab_handle.h
Lines 211, 259, 302, 308, 334 (slab handle ops)
// BEFORE (line 211):
return h->meta->freelist;
// AFTER:
return slab_freelist_load_relaxed(h->meta);
// BEFORE (line 259):
h->meta->freelist = ptr;
// AFTER:
slab_freelist_store_relaxed(h->meta, ptr);
// BEFORE (line 302):
h->meta->freelist = NULL;
// AFTER:
slab_freelist_store_relaxed(h->meta, NULL);
// BEFORE (line 308):
h->meta->freelist = next;
// AFTER:
slab_freelist_store_relaxed(h->meta, next);
// BEFORE (line 334):
return (h->meta->freelist != NULL);
// AFTER:
return slab_freelist_is_nonempty(h->meta);
Files 10-15: Remaining Phase 2 Files
Pattern: Same conversions as above
- NULL checks → `slab_freelist_is_empty()` / `slab_freelist_is_nonempty()`
- Direct loads → `slab_freelist_load_relaxed()`
- Direct stores → `slab_freelist_store_relaxed()`
- POP operations → `slab_freelist_pop_lockfree()`
- PUSH operations → `slab_freelist_push_lockfree()`
Files:
- `core/hakmem_tiny_superslab.c`
- `core/hakmem_tiny_alloc_new.inc`
- `core/hakmem_tiny_free.inc`
- `core/box/ss_allocation_box.c`
- `core/box/free_local_box.c`
- `core/box/integrity_box.c`
Time Estimate: 2-3 hours (with testing)
Phase 3: Cleanup (5 files, 25 sites)
Debug/Stats Sites (no lock-free conversion, printf cast only)
Files:
- `core/box/ss_stats_box.c`
- `core/tiny_debug.h`
- `core/tiny_remote.c`
Change:
// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);
// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));
Reason: Already atomic type, just need explicit cast for printf
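SLAB_FREELIST_DEBUG_PTR is referenced here and in the Quick Reference Card but never defined in this guide; assuming the File 1 accessor API, a plausible one-liner is:
// Hypothetical definition (would live in slab_freelist_atomic.h):
// relaxed load, cast to plain void* so "%p" formatting is well-defined.
#define SLAB_FREELIST_DEBUG_PTR(meta) \
    ((void*)atomic_load_explicit(&(meta)->freelist, memory_order_relaxed))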
Init/Cleanup Sites (RELAXED STORE)
Files:
- `core/hakmem_tiny_superslab.c` (init)
- `core/hakmem_smallmid_superslab.c` (init)
Change:
// BEFORE:
meta->freelist = NULL;
// AFTER:
slab_freelist_store_relaxed(meta, NULL);
Reason: Single-threaded initialization, relaxed is sufficient
Verification Sites (RELAXED LOAD)
Files:
- `core/box/integrity_box.c` (integrity checks)
Change:
// BEFORE:
if (meta->freelist) {
// ... integrity check ...
}
// AFTER:
if (slab_freelist_is_nonempty(meta)) {
// ... integrity check ...
}
Time Estimate: 1-2 hours
Common Pitfalls
Pitfall 1: Double-Converting POP Operations
WRONG:
// ❌ BAD: slab_freelist_pop_lockfree already calls tiny_next_read!
void* p = slab_freelist_pop_lockfree(meta, class_idx);
void* next = tiny_next_read(class_idx, p); // ❌ WRONG!
RIGHT:
// ✅ GOOD: slab_freelist_pop_lockfree returns the popped block directly
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break; // Handle race
// Use p directly
Pitfall 2: Double-Converting PUSH Operations
WRONG:
// ❌ BAD: slab_freelist_push_lockfree already calls tiny_next_write!
tiny_next_write(class_idx, node, meta->freelist); // ❌ WRONG!
slab_freelist_push_lockfree(meta, class_idx, node);
RIGHT:
// ✅ GOOD: slab_freelist_push_lockfree does everything
slab_freelist_push_lockfree(meta, class_idx, node);
Pitfall 3: Forgetting CAS Race Handling
WRONG:
// ❌ BAD: Assuming pop always succeeds
void* p = slab_freelist_pop_lockfree(meta, class_idx);
use(p); // ❌ SEGV if p == NULL!
RIGHT:
// ✅ GOOD: Always check for NULL (race condition)
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) {
// Another thread won the race, handle gracefully
break; // or continue, or goto alternative path
}
use(p);
Pitfall 4: Using Wrong Memory Ordering
WRONG:
// ❌ BAD: seq_cst for a simple check is needlessly strong (and much slower on weakly ordered CPUs)
if (atomic_load_explicit(&meta->freelist, memory_order_seq_cst) != NULL) {
RIGHT:
// ✅ GOOD: Use relaxed for benign checks
if (slab_freelist_is_nonempty(meta)) { // Uses relaxed internally
Relaxed is sufficient here because the check is only a hint: slab_freelist_pop_lockfree() revalidates the head with an acquire load and CAS before using it.
Testing Checklist (Per File)
After converting each file:
# 1. Compile check
make clean
make bench_random_mixed_hakmem 2>&1 | tee build.log
grep -i "error\|warning" build.log
# 2. Single-threaded correctness
./out/release/bench_random_mixed_hakmem 100000 256 42
# 3. Multi-threaded stress (if Phase 1 complete)
./out/release/larson_hakmem 8 10000 256
# 4. ASan check (if available)
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 10000 256 42
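The success metrics below also call for a TSan-clean run; if build.sh supports a tsan mode analogous to the asan mode above (an assumption, not verified here), the check would look like:
# 5. TSan check (assumes build.sh has a 'tsan' mode mirroring 'asan')
./build.sh tsan bench_random_mixed_hakmem
./out/tsan/bench_random_mixed_hakmem 10000 256 42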
Progress Tracking
Use this checklist to track conversion progress:
Phase 1 (Critical)
- [ ] File 1: `core/box/slab_freelist_atomic.h` (CREATE)
- [ ] File 2: `core/tiny_superslab_alloc.inc.h` (8 sites)
- [ ] File 3: `core/hakmem_tiny_refill_p0.inc.h` (3 sites)
- [ ] File 4: `core/box/carve_push_box.c` (10 sites)
- [ ] File 5: `core/hakmem_tiny_tls_ops.h` (4 sites)
- [ ] Phase 1 Testing (Larson 8T)
Phase 2 (Important)
- [ ] File 6: `core/tiny_refill_opt.h` (5 sites)
- [ ] File 7: `core/tiny_free_magazine.inc.h` (3 sites)
- [ ] File 8: `core/refill/ss_refill_fc.h` (3 sites)
- [ ] File 9: `core/slab_handle.h` (7 sites)
- [ ] Files 10-15: Remaining files (22 sites)
- [ ] Phase 2 Testing (MT stress)
Phase 3 (Cleanup)
- [ ] Debug/Stats sites (5 sites)
- [ ] Init/Cleanup sites (10 sites)
- [ ] Verification sites (10 sites)
- [ ] Phase 3 Testing (Full suite)
Quick Reference Card
| Old Pattern | New Pattern | Use Case |
|---|---|---|
| `if (meta->freelist)` | `if (slab_freelist_is_nonempty(meta))` | NULL check |
| `if (meta->freelist == NULL)` | `if (slab_freelist_is_empty(meta))` | Empty check |
| `void* p = meta->freelist;` | `void* p = slab_freelist_load_relaxed(meta);` | Simple load |
| `meta->freelist = NULL;` | `slab_freelist_store_relaxed(meta, NULL);` | Init/clear |
| `void* p = meta->freelist; meta->freelist = next;` | `void* p = slab_freelist_pop_lockfree(meta, cls);` | POP |
| `tiny_next_write(...); meta->freelist = node;` | `slab_freelist_push_lockfree(meta, cls, node);` | PUSH |
| `fprintf("...%p", meta->freelist)` | `fprintf("...%p", SLAB_FREELIST_DEBUG_PTR(meta))` | Debug print |
Time Budget Summary
| Phase | Files | Sites | Time |
|---|---|---|---|
| Phase 1 (Hot) | 5 | 25 | 2-3h |
| Phase 2 (Warm) | 10 | 40 | 2-3h |
| Phase 3 (Cold) | 5 | 25 | 1-2h |
| Total | 20 | 90 | 5-8h |
Add 20% buffer for unexpected issues: 6-10 hours total
Success Metrics
After full conversion:
- ✅ Zero direct `meta->freelist` accesses (except in atomic accessor functions)
- ✅ All tests pass (single + MT)
- ✅ ASan/TSan clean (no data races)
- ✅ Performance regression <3% (single-threaded)
- ✅ Larson 8T stable (no crashes)
- ✅ MT scaling 70%+ (good scalability)
Emergency Rollback
If conversion fails at any phase:
git stash # Save work in progress
git checkout master
git branch -D atomic-freelist-phase1 # Or phase2/phase3
# Review strategy and try alternative approach