hakmem/ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
Moe Charm (CI) 131cdb7b88 Doc: Add benchmark reports, atomic freelist docs, and .gitignore update
Phase 1 Commit: Comprehensive documentation and build system cleanup

Added Documentation:
- BENCHMARK_SUMMARY_20251122.md: Current performance baseline
- COMPREHENSIVE_BENCHMARK_REPORT_20251122.md: Detailed analysis
- LARSON_SLOWDOWN_INVESTIGATION_REPORT.md: Larson benchmark deep dive
- ATOMIC_FREELIST_*.md (5 files): Complete atomic freelist documentation
  - Implementation strategy, quick start, site-by-site guide
  - Index and summary for easy navigation

Added Scripts:
- run_comprehensive_benchmark.sh: Automated benchmark runner
- scripts/analyze_freelist_sites.sh: Freelist analysis tool
- scripts/verify_atomic_freelist_conversion.sh: Conversion verification

Build System:
- Updated .gitignore: Added *.d (build dependency files)
- Cleaned up tracked .d files (will be ignored going forward)

Performance Status (2025-11-22):
- Random Mixed 256B: 59.6M ops/s (VERIFIED WORKING)
- Benchmark command: ./out/release/bench_random_mixed_hakmem 10000000 256 42
- Known issue: workset=8192 causes SEGV (to be fixed separately)

Notes:
- bench_random_mixed.c already tracked, working state confirmed
- Ultra SLIM implementation backed up to /tmp/ (Phase 2 restore pending)
- Documentation covers atomic freelist conversion and benchmarking methodology

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 06:11:55 +09:00


Atomic Freelist Site-by-Site Conversion Guide

Quick Reference

Total Sites: 90
Phase 1 (Critical): 25 sites in 5 files
Phase 2 (Important): 40 sites in 10 files
Phase 3 (Cleanup): 25 sites in 5 files


Phase 1: Critical Hot Paths (5 files, 25 sites)

File 1: core/box/slab_freelist_atomic.h (NEW)

Action: CREATE new file with accessor API (see ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md section 1)

Lines: ~80
Time: 30 minutes


File 2: core/tiny_superslab_alloc.inc.h (8 sites)

File: /mnt/workdisk/public_share/hakmem/core/tiny_superslab_alloc.inc.h

Site 2.1: Line 26 (NULL check)

// BEFORE:
if (meta->freelist == NULL && meta->used < meta->capacity) {

// AFTER:
if (slab_freelist_is_empty(meta) && meta->used < meta->capacity) {

Reason: Relaxed load for condition check


Site 2.2: Line 38 (remote drain check)

// BEFORE:
if (__builtin_expect(atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire) != 0, 0)) {

// AFTER: (no change - this is remote_heads, not freelist)

Reason: Already using atomic operations correctly


Site 2.3: Line 88 (fast path check)

// BEFORE:
if (__builtin_expect(meta->freelist == NULL && meta->used < meta->capacity, 1)) {

// AFTER:
if (__builtin_expect(slab_freelist_is_empty(meta) && meta->used < meta->capacity, 1)) {

Reason: Relaxed load for fast path condition


Site 2.4: Lines 117-145 (freelist pop - CRITICAL)

// BEFORE:
if (__builtin_expect(meta->freelist != NULL, 0)) {
    void* block = meta->freelist;
    if (meta->class_idx != class_idx) {
        // Class mismatch, abandon freelist
        meta->freelist = NULL;
        goto bump_path;
    }

    // Allocate from freelist
    meta->freelist = tiny_next_read(meta->class_idx, block);
    meta->used = (uint16_t)((uint32_t)meta->used + 1);
    ss_active_add(ss, 1);
    return (void*)((uint8_t*)block + 1);
}

// AFTER:
if (__builtin_expect(slab_freelist_is_nonempty(meta), 0)) {
    // Try lock-free pop
    void* block = slab_freelist_pop_lockfree(meta, meta->class_idx);
    if (!block) {
        // Another thread won the race, fall through to bump path
        goto bump_path;
    }

    if (meta->class_idx != class_idx) {
        // Class mismatch, return to freelist and abandon
        slab_freelist_push_lockfree(meta, meta->class_idx, block);
        slab_freelist_store_relaxed(meta, NULL);  // Clear freelist
        goto bump_path;
    }

    // Success
    meta->used = (uint16_t)((uint32_t)meta->used + 1);
    ss_active_add(ss, 1);
    return (void*)((uint8_t*)block + 1);
}

Reason: Lock-free CAS for hot path allocation

CRITICAL: Note that slab_freelist_pop_lockfree() already handles tiny_next_read() internally!


Site 2.5: Line 134 (freelist clear)

// BEFORE:
meta->freelist = NULL;

// AFTER:
slab_freelist_store_relaxed(meta, NULL);

Reason: Relaxed store for initialization


Site 2.6: Line 308 (bump path check)

// BEFORE:
if (meta && meta->freelist == NULL && meta->used < meta->capacity && tls->slab_base) {

// AFTER:
if (meta && slab_freelist_is_empty(meta) && meta->used < meta->capacity && tls->slab_base) {

Reason: Relaxed load for condition check


Site 2.7: Line 351 (freelist update after remote drain)

// BEFORE:
meta->freelist = next;

// AFTER:
slab_freelist_store_relaxed(meta, next);

Reason: Relaxed store after drain (single-threaded context)


Site 2.8: Line 372 (bump path check)

// BEFORE:
if (meta && meta->freelist == NULL && meta->used < meta->capacity && meta->carved < meta->capacity) {

// AFTER:
if (meta && slab_freelist_is_empty(meta) && meta->used < meta->capacity && meta->carved < meta->capacity) {

Reason: Relaxed load for condition check


File 3: core/hakmem_tiny_refill_p0.inc.h (3 sites)

File: /mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h

Site 3.1: Line 101 (prefetch)

// BEFORE:
__builtin_prefetch(&meta->freelist, 0, 3);

// AFTER: (no change)
__builtin_prefetch(&meta->freelist, 0, 3);

Reason: Prefetch works fine with atomic type, no conversion needed


Site 3.2: Lines 252-253 (freelist check + prefetch)

// BEFORE:
if (meta->freelist) {
    __builtin_prefetch(meta->freelist, 0, 3);
}

// AFTER:
if (slab_freelist_is_nonempty(meta)) {
    void* head = slab_freelist_load_relaxed(meta);
    __builtin_prefetch(head, 0, 3);
}

Reason: The head pointer must be loaded (relaxed) first; only then can the block it points to be prefetched

Alternative (if prefetch not critical):

// Simpler: Skip prefetch
if (slab_freelist_is_nonempty(meta)) {
    // ... rest of logic
}

Site 3.3: Line ~260 (freelist pop in batch refill)

Context: Need to review full function to find freelist pop logic

grep -A20 "if (meta->freelist)" core/hakmem_tiny_refill_p0.inc.h

Expected Pattern:

// BEFORE:
while (taken < want && meta->freelist) {
    void* p = meta->freelist;
    meta->freelist = tiny_next_read(class_idx, p);
    // ... push to TLS
}

// AFTER:
while (taken < want && slab_freelist_is_nonempty(meta)) {
    void* p = slab_freelist_pop_lockfree(meta, class_idx);
    if (!p) break;  // Another thread drained it
    // ... push to TLS
}

File 4: core/box/carve_push_box.c (10 sites)

File: /mnt/workdisk/public_share/hakmem/core/box/carve_push_box.c

Site 4.1-4.2: Lines 33-34 (rollback push)

// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);

Reason: Lock-free push for rollback (inside rollback_carved_blocks)

IMPORTANT: slab_freelist_push_lockfree() already calls tiny_next_write() internally!


Site 4.3-4.4: Lines 120-121 (rollback in box_carve_and_push)

// BEFORE:
tiny_next_write(class_idx, popped, meta->freelist);
meta->freelist = popped;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, popped);

Reason: Same as 4.1-4.2


Site 4.5-4.6: Lines 128-129 (rollback remaining)

// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);

Reason: Same as 4.1-4.2


Site 4.7: Line 172 (freelist carve check)

// BEFORE:
while (pushed < want && meta->freelist) {

// AFTER:
while (pushed < want && slab_freelist_is_nonempty(meta)) {

Reason: Relaxed load for loop condition


Site 4.8: Lines 173-174 (freelist pop)

// BEFORE:
void* p = meta->freelist;
meta->freelist = tiny_next_read(class_idx, p);

// AFTER:
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break;  // Freelist exhausted

Reason: Lock-free pop for carve-with-freelist path


Site 4.9-4.10: Lines 179-180 (rollback on push failure)

// BEFORE:
tiny_next_write(class_idx, p, meta->freelist);
meta->freelist = p;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, p);

Reason: Same as 4.1-4.2


File 5: core/hakmem_tiny_tls_ops.h (4 sites)

File: /mnt/workdisk/public_share/hakmem/core/hakmem_tiny_tls_ops.h

Site 5.1: Line 77 (TLS drain check)

// BEFORE:
if (meta->freelist) {

// AFTER:
if (slab_freelist_is_nonempty(meta)) {

Reason: Relaxed load for condition check


Site 5.2: Line 82 (TLS drain loop)

// BEFORE:
while (local < need && meta->freelist) {

// AFTER:
while (local < need && slab_freelist_is_nonempty(meta)) {

Reason: Relaxed load for loop condition


Site 5.3: Lines 83-85 (TLS drain pop)

// BEFORE:
void* node = meta->freelist;
// ... 1 line ...
meta->freelist = tiny_next_read(class_idx, node);

// AFTER:
void* node = slab_freelist_pop_lockfree(meta, class_idx);
if (!node) break;  // Freelist exhausted
// ... remove tiny_next_read line ...

Reason: Lock-free pop for TLS drain


Site 5.4: Line 203 (TLS freelist init)

// BEFORE:
meta->freelist = node;

// AFTER:
slab_freelist_store_relaxed(meta, node);

Reason: Relaxed store for initialization (single-threaded context)


Phase 1 Summary

Total Changes:

  • 1 new file (slab_freelist_atomic.h)
  • 5 modified files
  • ~25 conversion sites
  • ~8 POP operations converted to CAS
  • ~6 PUSH operations converted to CAS
  • ~11 NULL checks converted to relaxed loads

Time Estimate: 2-3 hours (with testing)


Phase 2: Important Paths (10 files, 40 sites)

File 6: core/tiny_refill_opt.h

Lines 199-230 (refill chain pop)

// BEFORE:
while (taken < want && meta->freelist) {
    void* p = meta->freelist;
    // ... splice logic ...
    meta->freelist = next;
}

// AFTER:
while (taken < want && slab_freelist_is_nonempty(meta)) {
    void* p = slab_freelist_pop_lockfree(meta, class_idx);
    if (!p) break;
    // ... splice logic (remove next assignment) ...
}

File 7: core/tiny_free_magazine.inc.h

Lines 135-136, 328 (magazine push)

// BEFORE:
tiny_next_write(meta->class_idx, it.ptr, meta->freelist);
meta->freelist = it.ptr;

// AFTER:
slab_freelist_push_lockfree(meta, meta->class_idx, it.ptr);

File 8: core/refill/ss_refill_fc.h

Lines 151-153 (FC refill pop)

// BEFORE:
if (meta->freelist != NULL) {
    void* p = meta->freelist;
    meta->freelist = tiny_next_read(class_idx, p);
}

// AFTER:
if (slab_freelist_is_nonempty(meta)) {
    void* p = slab_freelist_pop_lockfree(meta, class_idx);
    if (!p) {
        // Race: freelist drained, skip
    }
}

File 9: core/slab_handle.h

Lines 211, 259, 302, 308, 334 (slab handle ops)

// BEFORE (line 211):
return h->meta->freelist;

// AFTER:
return slab_freelist_load_relaxed(h->meta);

// BEFORE (line 259):
h->meta->freelist = ptr;

// AFTER:
slab_freelist_store_relaxed(h->meta, ptr);

// BEFORE (line 302):
h->meta->freelist = NULL;

// AFTER:
slab_freelist_store_relaxed(h->meta, NULL);

// BEFORE (line 308):
h->meta->freelist = next;

// AFTER:
slab_freelist_store_relaxed(h->meta, next);

// BEFORE (line 334):
return (h->meta->freelist != NULL);

// AFTER:
return slab_freelist_is_nonempty(h->meta);

Files 10-15: Remaining Phase 2 Files

Pattern: Same conversions as above

  • NULL checks → slab_freelist_is_empty/nonempty()
  • Direct loads → slab_freelist_load_relaxed()
  • Direct stores → slab_freelist_store_relaxed()
  • POP operations → slab_freelist_pop_lockfree()
  • PUSH operations → slab_freelist_push_lockfree()

Files:

  • core/hakmem_tiny_superslab.c
  • core/hakmem_tiny_alloc_new.inc
  • core/hakmem_tiny_free.inc
  • core/box/ss_allocation_box.c
  • core/box/free_local_box.c
  • core/box/integrity_box.c

Time Estimate: 2-3 hours (with testing)


Phase 3: Cleanup (5 files, 25 sites)

Debug/Stats Sites (NO CONVERSION)

Files:

  • core/box/ss_stats_box.c
  • core/tiny_debug.h
  • core/tiny_remote.c

Change:

// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);

// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));

Reason: Already atomic type, just need explicit cast for printf


Init/Cleanup Sites (RELAXED STORE)

Files:

  • core/hakmem_tiny_superslab.c (init)
  • core/hakmem_smallmid_superslab.c (init)

Change:

// BEFORE:
meta->freelist = NULL;

// AFTER:
slab_freelist_store_relaxed(meta, NULL);

Reason: Single-threaded initialization, relaxed is sufficient


Verification Sites (RELAXED LOAD)

Files:

  • core/box/integrity_box.c (integrity checks)

Change:

// BEFORE:
if (meta->freelist) {
    // ... integrity check ...
}

// AFTER:
if (slab_freelist_is_nonempty(meta)) {
    // ... integrity check ...
}

Time Estimate: 1-2 hours


Common Pitfalls

Pitfall 1: Double-Converting POP Operations

WRONG:

// ❌ BAD: slab_freelist_pop_lockfree already calls tiny_next_read!
void* p = slab_freelist_pop_lockfree(meta, class_idx);
void* next = tiny_next_read(class_idx, p);  // ❌ WRONG!

RIGHT:

// ✅ GOOD: slab_freelist_pop_lockfree returns the popped block directly
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) break;  // Handle race
// Use p directly

Pitfall 2: Double-Converting PUSH Operations

WRONG:

// ❌ BAD: slab_freelist_push_lockfree already calls tiny_next_write!
tiny_next_write(class_idx, node, meta->freelist);  // ❌ WRONG!
slab_freelist_push_lockfree(meta, class_idx, node);

RIGHT:

// ✅ GOOD: slab_freelist_push_lockfree does everything
slab_freelist_push_lockfree(meta, class_idx, node);

Pitfall 3: Forgetting CAS Race Handling

WRONG:

// ❌ BAD: Assuming pop always succeeds
void* p = slab_freelist_pop_lockfree(meta, class_idx);
use(p);  // ❌ SEGV if p == NULL!

RIGHT:

// ✅ GOOD: Always check for NULL (race condition)
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) {
    // Another thread won the race, handle gracefully
    break;  // or continue, or goto alternative path
}
use(p);

Pitfall 4: Using Wrong Memory Ordering

WRONG:

// ❌ BAD: seq_cst for a simple check (needlessly expensive,
// especially on weakly ordered CPUs)
if (atomic_load_explicit(&meta->freelist, memory_order_seq_cst) != NULL) {

RIGHT:

// ✅ GOOD: Use relaxed for benign checks
if (slab_freelist_is_nonempty(meta)) {  // Uses relaxed internally

Testing Checklist (Per File)

After converting each file:

# 1. Compile check
make clean
make bench_random_mixed_hakmem 2>&1 | tee build.log
grep -i "error\|warning" build.log

# 2. Single-threaded correctness
./out/release/bench_random_mixed_hakmem 100000 256 42

# 3. Multi-threaded stress (if Phase 1 complete)
./out/release/larson_hakmem 8 10000 256

# 4. ASan check (if available)
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 10000 256 42

Progress Tracking

Use this checklist to track conversion progress:

Phase 1 (Critical)

  • File 1: core/box/slab_freelist_atomic.h (CREATE)
  • File 2: core/tiny_superslab_alloc.inc.h (8 sites)
  • File 3: core/hakmem_tiny_refill_p0.inc.h (3 sites)
  • File 4: core/box/carve_push_box.c (10 sites)
  • File 5: core/hakmem_tiny_tls_ops.h (4 sites)
  • Phase 1 Testing (Larson 8T)

Phase 2 (Important)

  • File 6: core/tiny_refill_opt.h (5 sites)
  • File 7: core/tiny_free_magazine.inc.h (3 sites)
  • File 8: core/refill/ss_refill_fc.h (3 sites)
  • File 9: core/slab_handle.h (7 sites)
  • Files 10-15: Remaining files (22 sites)
  • Phase 2 Testing (MT stress)

Phase 3 (Cleanup)

  • Debug/Stats sites (5 sites)
  • Init/Cleanup sites (10 sites)
  • Verification sites (10 sites)
  • Phase 3 Testing (Full suite)

Quick Reference Card

| Old Pattern | New Pattern | Use Case |
|---|---|---|
| `if (meta->freelist)` | `if (slab_freelist_is_nonempty(meta))` | NULL check |
| `if (meta->freelist == NULL)` | `if (slab_freelist_is_empty(meta))` | Empty check |
| `void* p = meta->freelist;` | `void* p = slab_freelist_load_relaxed(meta);` | Simple load |
| `meta->freelist = NULL;` | `slab_freelist_store_relaxed(meta, NULL);` | Init/clear |
| `void* p = meta->freelist; meta->freelist = next;` | `void* p = slab_freelist_pop_lockfree(meta, cls);` | POP |
| `tiny_next_write(...); meta->freelist = node;` | `slab_freelist_push_lockfree(meta, cls, node);` | PUSH |
| `fprintf("...%p", meta->freelist)` | `fprintf("...%p", SLAB_FREELIST_DEBUG_PTR(meta))` | Debug print |

Time Budget Summary

| Phase | Files | Sites | Time |
|---|---|---|---|
| Phase 1 (Hot) | 5 | 25 | 2-3h |
| Phase 2 (Warm) | 10 | 40 | 2-3h |
| Phase 3 (Cold) | 5 | 25 | 1-2h |
| Total | 20 | 90 | 5-8h |

Add 20% buffer for unexpected issues: 6-10 hours total


Success Metrics

After full conversion:

  • Zero direct meta->freelist accesses (except in atomic accessor functions)
  • All tests pass (single + MT)
  • ASan/TSan clean (no data races)
  • Performance regression <3% (single-threaded)
  • Larson 8T stable (no crashes)
  • MT scaling 70%+ (good scalability)

Emergency Rollback

If conversion fails at any phase:

git stash  # Save work in progress
git checkout master
git branch -D atomic-freelist-phase1  # Or phase2/phase3
# Review strategy and try alternative approach