hakmem/docs/specs/ATOMIC_FREELIST_INDEX.md
Moe Charm (CI) a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

ENV variables removed (BG/HotMag family):
- core/hakmem_tiny_init.inc: HotMag ENV removed (~131 lines)
- core/hakmem_tiny_bg_spill.c: BG spill ENV removed
- core/tiny_refill.h: BG remote hardcoded to fixed values
- core/hakmem_tiny_slow.inc: BG refs removed

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

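Documentation cleanup:
- 328 markdown files deleted (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M; stable operation)
- No functional impact from the ENV cleanup
- Some debug output remains (to be addressed in the next phase)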

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00


Atomic Freelist Implementation - Documentation Index

Overview

This directory contains comprehensive documentation and tooling for implementing atomic TinySlabMeta.freelist operations to enable multi-threaded safety in the HAKMEM memory allocator.

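Status: Ready for implementation
Estimated Effort: 5-8 hours of conversion work across 3 phases (8.5-11.5 hours including preparation and testing)
Expected Impact: 2-3% single-threaded regression in exchange for multi-threaded (MT) stability and scalability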


Quick Start

New to this task? Start here:

  1. Read: ATOMIC_FREELIST_QUICK_START.md (15 min)
  2. Run: ./scripts/analyze_freelist_sites.sh (5 min)
  3. Create: Accessor header from template (30 min)
  4. Begin: Phase 1 conversion (2-3 hours)

Documentation Files

1. Executive Summary

File: ATOMIC_FREELIST_SUMMARY.md
Purpose: High-level overview of the entire implementation
Contents:

  • Investigation results (90 sites, not 589)
  • Implementation strategy (hybrid approach)
  • Performance analysis (2-3% regression expected)
  • Risk assessment (low risk, high benefit)
  • Timeline and success metrics

Read this first for a complete picture.


2. Implementation Strategy

File: ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md
Purpose: Detailed technical strategy and design decisions
Contents:

  • Accessor function API design (lock-free CAS + relaxed atomics)
  • Critical site list (top 20 sites to convert)
  • Non-critical site strategy (skip or use relaxed)
  • Phased implementation plan (3 phases)
  • Performance projections (single/multi-threaded)
  • Memory ordering rationale (acquire/release/relaxed)
  • Alternative approaches (mutex, global lock, etc.)

Use this when designing the accessor API and planning conversion phases.


3. Site-by-Site Conversion Guide

File: ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
Purpose: Line-by-line conversion instructions for all 90 sites
Contents:

  • Phase 1: 5 files, 25 sites (hot paths)
    • File 1: core/box/slab_freelist_atomic.h (CREATE)
    • File 2: core/tiny_superslab_alloc.inc.h (8 sites)
    • File 3: core/hakmem_tiny_refill_p0.inc.h (3 sites)
    • File 4: core/box/carve_push_box.c (10 sites)
    • File 5: core/hakmem_tiny_tls_ops.h (4 sites)
  • Phase 2: 10 files, 40 sites (warm paths)
  • Phase 3: 5 files, 25 sites (cold paths)
  • Common pitfalls (double-POP, missing NULL check, etc.)
  • Testing checklist per file
  • Quick reference card (conversion patterns)

Use this during actual code conversion (your primary reference).


4. Quick Start Guide

File: ATOMIC_FREELIST_QUICK_START.md
Purpose: Step-by-step implementation instructions
Contents:

  • Step 1: Read documentation (15 min)
  • Step 2: Create accessor header (30 min)
  • Step 3: Phase 1 conversion (2-3 hours)
  • Step 4: Phase 2 conversion (2-3 hours)
  • Step 5: Phase 3 cleanup (1-2 hours)
  • Common pitfalls and solutions
  • Performance expectations
  • Rollback plan
  • Success criteria

Use this as your daily task list during implementation.


5. Accessor Header Template

File: core/box/slab_freelist_atomic.h.TEMPLATE
Purpose: Complete implementation of atomic accessor API
Contents:

  • Lock-free CAS operations (slab_freelist_pop_lockfree, slab_freelist_push_lockfree)
  • Relaxed load/store operations (slab_freelist_load_relaxed, slab_freelist_store_relaxed)
  • NULL check helpers (slab_freelist_is_empty, slab_freelist_is_nonempty)
  • Debug macro (SLAB_FREELIST_DEBUG_PTR)
  • Extensive comments (80+ lines of documentation)
  • Conversion examples
  • Performance notes
  • Testing strategy

Copy this to core/box/slab_freelist_atomic.h to get started.
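
As a rough illustration of what the relaxed accessors and NULL check helpers listed above might look like, here is a minimal C11 sketch. The function and macro names are taken from the list above; `DemoSlabMeta` is a hypothetical stand-in for `TinySlabMeta`, and the bodies are assumptions, not the template's actual contents.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} DemoSlabMeta;

/* Relaxed load: atomic, but imposes no ordering on surrounding accesses. */
static inline void* slab_freelist_load_relaxed(DemoSlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed);
}

/* Relaxed store: intended for init/cleanup paths where no other thread
 * can observe the slab yet (an assumption of this sketch). */
static inline void slab_freelist_store_relaxed(DemoSlabMeta* meta, void* value) {
    atomic_store_explicit(&meta->freelist, value, memory_order_relaxed);
}

/* NULL check helpers built on the relaxed load. */
static inline bool slab_freelist_is_empty(DemoSlabMeta* meta) {
    return slab_freelist_load_relaxed(meta) == NULL;
}

static inline bool slab_freelist_is_nonempty(DemoSlabMeta* meta) {
    return slab_freelist_load_relaxed(meta) != NULL;
}

/* Debug helper: read the head for printing only, never for list manipulation. */
#define SLAB_FREELIST_DEBUG_PTR(meta) slab_freelist_load_relaxed(meta)
```

A relaxed NULL check is only a hint under concurrency: the list can become empty between the check and a later pop, so callers still need to handle a NULL result from the pop itself.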


Tool Scripts

1. Site Analysis Script

File: scripts/analyze_freelist_sites.sh
Purpose: Analyze freelist access patterns in codebase
Output:

  • Total site count (90 sites)
  • Operation breakdown (POP, PUSH, NULL checks, etc.)
  • Files with freelist usage (21 files)
  • Phase 1/2/3 file lists
  • Lock-protected sites check
  • Conversion effort estimates

Run this before starting conversion to validate site counts.

./scripts/analyze_freelist_sites.sh

2. Conversion Verification Script

File: scripts/verify_atomic_freelist_conversion.sh
Purpose: Track conversion progress and detect potential bugs
Output:

  • Accessor header check (exists, functions defined)
  • Direct access count (remaining unconverted sites)
  • Converted operations count (by type)
  • Conversion progress (0-100%)
  • Phase 1/2/3 file check (which files converted)
  • Potential bug detection (double-POP, double-PUSH, missing NULL check)
  • Compile status
  • Recommendations for next steps

Run this frequently during conversion to track progress and catch bugs early.

./scripts/verify_atomic_freelist_conversion.sh

Example output:

Progress: 30% (27/90 sites)
[============----------------------------]
Currently working on: Phase 1 (Critical Hot Paths)

✅ No double-POP bugs detected
✅ No double-PUSH bugs detected
✅ Compilation succeeded

Implementation Phases

Phase 1: Critical Hot Paths (2-3 hours)

Goal: Fix Larson 8T crash with minimal changes
Scope: 5 files, 25 sites
Files:

  • core/box/slab_freelist_atomic.h (CREATE)
  • core/tiny_superslab_alloc.inc.h
  • core/hakmem_tiny_refill_p0.inc.h
  • core/box/carve_push_box.c
  • core/hakmem_tiny_tls_ops.h

Success Criteria:

  • Larson 8T stable (no crashes)
  • Regression <5% (>24.0M ops/s)
  • No TSan warnings

Phase 2: Important Paths (2-3 hours)

Goal: Full MT safety for all allocation paths
Scope: 10 files, 40 sites
Files:

  • core/tiny_refill_opt.h
  • core/tiny_free_magazine.inc.h
  • core/refill/ss_refill_fc.h
  • core/slab_handle.h
  • 6 additional files

Success Criteria:

  • All MT tests pass (1T-16T)
  • Regression <3% (>24.4M ops/s)
  • MT scaling 70%+

Phase 3: Cleanup (1-2 hours)

Goal: Convert/document remaining sites
Scope: 5 files, 25 sites
Files:

  • Debug/stats files
  • Init/cleanup files
  • Verification files

Success Criteria:

  • All 90 sites converted or documented
  • Zero direct meta->freelist accesses (except inside slab_freelist_atomic.h)
  • Full test suite passes

Testing Strategy

Per-File Testing

After converting each file:

make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42

Phase 1 Testing

# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42

# Multi-threaded stability (PRIMARY TEST)
./out/release/larson_hakmem 8 100000 256

# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256

Phase 2 Testing

# All sizes
for size in 128 256 512 1024; do
    ./out/release/bench_random_mixed_hakmem 1000000 $size 42
done

# MT scaling
for threads in 1 2 4 8 16; do
    ./out/release/larson_hakmem $threads 100000 256
done

Phase 3 Testing

# Full test suite
make clean && make all
./run_all_tests.sh

# ASan check
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 100000 256 42

Performance Expectations

Single-Threaded

| Metric | Before | After | Change |
|---|---|---|---|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2% to -2.8% |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1% to -2.9% |

Acceptable: <5% regression
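
The percentages in the table follow from a simple relative difference against the baseline. The sketch below shows that arithmetic and the threshold check; the helper names are hypothetical and not part of the HAKMEM tooling.

```c
#include <assert.h>
#include <stdbool.h>

/* Regression in percent relative to the baseline throughput.
 * Positive means the measured run is slower than the baseline. */
static inline double regression_percent(double baseline_mops, double measured_mops) {
    return (baseline_mops - measured_mops) / baseline_mops * 100.0;
}

/* True if the regression stays strictly below the allowed limit (in percent). */
static inline bool regression_acceptable(double baseline_mops,
                                         double measured_mops,
                                         double limit_percent) {
    return regression_percent(baseline_mops, measured_mops) < limit_percent;
}
```

For example, `regression_percent(25.1, 24.4)` is roughly 2.8, which matches the worst case in the Random Mixed 256B row and passes a 5% limit.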

Multi-Threaded

| Metric | Before | After | Change |
|---|---|---|---|
| Larson 8T | CRASH | ~18-20M ops/s | FIXED |
| MT Scaling (8T) | 0% (crashes) | 70-80% | NEW |

Benefit: The stability and MT scalability gains far outweigh the 2-3% single-threaded cost.


Common Patterns

NULL Check Conversion

// BEFORE:
if (meta->freelist) { ... }

// AFTER:
if (slab_freelist_is_nonempty(meta)) { ... }

POP Operation Conversion

// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);

// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback;  // Handle race
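
To make the "handle race" comment concrete, here is a minimal sketch of how a CAS-based pop could be written with C11 atomics. The name and argument order of `slab_freelist_pop_lockfree` follow the call above; `DemoSlabMeta` and `demo_next_read` are hypothetical stand-ins for `TinySlabMeta` and `tiny_next_read`, and the memory orderings are an assumption rather than the template's documented choice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta. */
typedef struct {
    _Atomic(void*) freelist;
} DemoSlabMeta;

/* Hypothetical stand-in for tiny_next_read: assumes the next pointer is
 * stored in the first word of a free block; class_idx is ignored here. */
static inline void* demo_next_read(int class_idx, void* block) {
    (void)class_idx;
    void* next;
    memcpy(&next, block, sizeof next);
    return next;
}

/* Pop the head block, or return NULL if the list is (or becomes) empty.
 * Note: like any plain CAS pop, this sketch does not guard against ABA
 * when blocks are freed and reused concurrently. */
static inline void* slab_freelist_pop_lockfree(DemoSlabMeta* meta, int class_idx) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = demo_next_read(class_idx, head);
        /* On failure, head is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL; /* Another thread drained the list: caller takes its fallback. */
}
```

The NULL return is why the converted call site needs the `goto fallback` branch even when an earlier NULL check succeeded.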

PUSH Operation Conversion

// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;

// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
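
The push side can be sketched the same way: link the node to the observed head, then publish it with a release CAS. Again, `DemoSlabMeta` and `demo_next_write` are hypothetical stand-ins for `TinySlabMeta` and `tiny_next_write`, and the orderings are an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta. */
typedef struct {
    _Atomic(void*) freelist;
} DemoSlabMeta;

/* Hypothetical stand-in for tiny_next_write: assumes the next pointer is
 * stored in the first word of a free block; class_idx is ignored here. */
static inline void demo_next_write(int class_idx, void* block, void* next) {
    (void)class_idx;
    memcpy(block, &next, sizeof next);
}

/* Push node onto the list head. The release CAS publishes the write of
 * node's next pointer to any thread that later acquires the head. */
static inline void slab_freelist_push_lockfree(DemoSlabMeta* meta, int class_idx, void* node) {
    assert(node != NULL);
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        /* Re-link on every retry: head was refreshed by the failed CAS. */
        demo_next_write(class_idx, node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

Because the accessor writes the next pointer itself, a leftover `tiny_next_write` call next to the converted call site would be redundant, which is the double-PUSH pattern the verification script looks for.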

Initialization Conversion

// BEFORE:
meta->freelist = NULL;

// AFTER:
slab_freelist_store_relaxed(meta, NULL);

Debug Print Conversion

// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);

// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));

Troubleshooting

Issue: Compilation Fails

# Check if accessor header exists
ls -la core/box/slab_freelist_atomic.h

# Check for missing includes
grep -n "#include.*slab_freelist_atomic.h" core/tiny_superslab_alloc.inc.h

# Rebuild from clean state
make clean && make bench_random_mixed_hakmem

Issue: Larson 8T Still Crashes

# Check conversion progress
./scripts/verify_atomic_freelist_conversion.sh

# Run with TSan to detect data races
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256 2>&1 | grep -A5 "WARNING"

# Check for double-POP/PUSH bugs
grep -rn -A1 "slab_freelist_pop_lockfree" core/ | grep "tiny_next_read"
grep -rn -B1 "slab_freelist_push_lockfree" core/ | grep "tiny_next_write"

Issue: Performance Regression >5%

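# Verify baseline (before conversion)
git stash
git checkout master
make clean && make bench_random_mixed_hakmem   # rebuild so the binary matches master
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record: 25.1M ops/s

# Check converted version
git checkout atomic-freelist-phase1
git stash pop                                  # restore uncommitted changes, if any were stashed
make clean && make bench_random_mixed_hakmem   # rebuild so the binary matches the branch
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Should be: >24.0M ops/s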

# If regression >5%, profile hot paths
perf record ./out/release/bench_random_mixed_hakmem 1000000 256 42
perf report
# Look for CAS retry loops or excessive memory ordering

Rollback Procedures

Quick Rollback (if Phase 1 fails)

git stash
git checkout master
git branch -D atomic-freelist-phase1
# Review issues and retry

Alternative Approach (Spinlock)

If lock-free proves too complex:

// Option: Use a 1-byte spinlock instead
// Add to TinySlabMeta: uint8_t freelist_lock;
// Lock with __sync_lock_test_and_set(), unlock with __sync_lock_release()
// Expected overhead: 5-10% (vs 2-3% for lock-free)
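
A minimal sketch of this spinlock fallback, using the GCC/Clang `__sync` builtins named above, could look as follows. `DemoSlabMetaLocked` and the `demo_*` helpers are hypothetical, and the next pointer is assumed to live in the first word of each free block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta with the extra lock byte. */
typedef struct {
    void*   freelist;
    uint8_t freelist_lock; /* 0 = unlocked, 1 = locked */
} DemoSlabMetaLocked;

static inline void demo_freelist_lock(DemoSlabMetaLocked* meta) {
    /* Atomically writes 1 and returns the old value (acquire barrier);
     * spin until the old value was 0, meaning we took the lock. */
    while (__sync_lock_test_and_set(&meta->freelist_lock, 1)) {
        /* busy-wait */
    }
}

static inline void demo_freelist_unlock(DemoSlabMetaLocked* meta) {
    /* Writes 0 with release semantics. */
    __sync_lock_release(&meta->freelist_lock);
}

static inline void* demo_freelist_pop_locked(DemoSlabMetaLocked* meta) {
    demo_freelist_lock(meta);
    void* block = meta->freelist;
    if (block != NULL) {
        /* Advance the head to the next pointer stored in the block. */
        memcpy(&meta->freelist, block, sizeof(void*));
    }
    demo_freelist_unlock(meta);
    return block;
}

static inline void demo_freelist_push_locked(DemoSlabMetaLocked* meta, void* node) {
    assert(node != NULL);
    demo_freelist_lock(meta);
    memcpy(node, &meta->freelist, sizeof(void*)); /* node->next = old head */
    meta->freelist = node;
    demo_freelist_unlock(meta);
}
```

The trade-off is simplicity against contention: every freelist operation serializes on the lock byte, which is consistent with the higher overhead estimate quoted above.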

Progress Tracking

Use the verification script to track progress:

./scripts/verify_atomic_freelist_conversion.sh

Output example:

Progress: 30% (27/90 sites)
[============----------------------------]

Phase 1 files converted: 2/4
Remaining sites: 63

Currently working on: Phase 1 (Critical Hot Paths)
Next step: Convert core/box/carve_push_box.c

Success Criteria

Phase 1 Complete

  • 5 files converted (25 sites)
  • Larson 8T runs 100K iterations without crash
  • Single-threaded regression <5%
  • No TSan warnings
  • Verification script shows about 28% progress (25/90 sites)

Phase 2 Complete

  • 15 files converted (65 sites)
  • All MT tests pass (1T-16T)
  • Single-threaded regression <3%
  • MT scaling 70%+
  • Verification script shows 72% progress

Phase 3 Complete

  • 21 files converted (90 sites)
  • Zero direct meta->freelist accesses outside slab_freelist_atomic.h
  • Full test suite passes
  • Documentation updated (CLAUDE.md)
  • Verification script shows 100% progress

File Checklist

Documentation

  • ATOMIC_FREELIST_SUMMARY.md - Executive summary
  • ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md - Technical strategy
  • ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md - Conversion guide
  • ATOMIC_FREELIST_QUICK_START.md - Quick start instructions
  • ATOMIC_FREELIST_INDEX.md - This file

Templates

  • core/box/slab_freelist_atomic.h.TEMPLATE - Accessor API

Tools

  • scripts/analyze_freelist_sites.sh - Site analysis
  • scripts/verify_atomic_freelist_conversion.sh - Progress tracker

Implementation (to be created)

  • core/box/slab_freelist_atomic.h - Working accessor API

Contact and Support

If you encounter issues during implementation:

  1. Check documentation: Review relevant guide for your current phase
  2. Run verification: ./scripts/verify_atomic_freelist_conversion.sh
  3. Review common pitfalls: See the "Common pitfalls" section of ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
  4. Rollback if needed: git checkout master

Estimated Timeline

| Milestone | Duration | Cumulative |
|---|---|---|
| Preparation | 15 min | 0.25h |
| Create accessor header | 30 min | 0.75h |
| Phase 1 conversion | 2-3h | 3-4h |
| Phase 1 testing | 30 min | 3.5-4.5h |
| Phase 2 conversion | 2-3h | 5.5-7.5h |
| Phase 2 testing | 1h | 6.5-8.5h |
| Phase 3 conversion | 1-2h | 7.5-10.5h |
| Phase 3 testing | 1h | 8.5-11.5h |
| Total | | 8.5-11.5h |

Minimal viable: 3.5-4.5 hours (Phase 1 only, fixes Larson crash)
Full implementation: 8.5-11.5 hours (all 3 phases, complete MT safety)


Next Steps

Ready to start?

  1. Read ATOMIC_FREELIST_QUICK_START.md (15 min)
  2. Run ./scripts/analyze_freelist_sites.sh (5 min)
  3. Copy template: cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h (5 min)
  4. Edit template to add includes (20 min)
  5. Test compile: make bench_random_mixed_hakmem (5 min)
  6. Begin Phase 1 conversion using ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md (2-3 hours)

Good luck! 🚀