Atomic Freelist Implementation - Documentation Index
Overview
This directory contains comprehensive documentation and tooling for implementing atomic TinySlabMeta.freelist operations to enable multi-threaded safety in the HAKMEM memory allocator.
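Status: Ready for implementation
Estimated Effort: 5-8 hours of conversion work across 3 phases (8.5-11.5 hours including preparation and testing)
Expected Impact: 2-3% single-threaded regression in exchange for multi-threaded stability and scalability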
Quick Start
New to this task? Start here:
- Read: ATOMIC_FREELIST_QUICK_START.md (15 min)
- Run: ./scripts/analyze_freelist_sites.sh (5 min)
- Create: Accessor header from template (30 min)
- Begin: Phase 1 conversion (2-3 hours)
Documentation Files
1. Executive Summary
File: ATOMIC_FREELIST_SUMMARY.md
Purpose: High-level overview of the entire implementation
Contents:
- Investigation results (90 sites, not 589)
- Implementation strategy (hybrid approach)
- Performance analysis (2-3% regression expected)
- Risk assessment (low risk, high benefit)
- Timeline and success metrics
Read this first for a complete picture.
2. Implementation Strategy
File: ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md
Purpose: Detailed technical strategy and design decisions
Contents:
- Accessor function API design (lock-free CAS + relaxed atomics)
- Critical site list (top 20 sites to convert)
- Non-critical site strategy (skip or use relaxed)
- Phased implementation plan (3 phases)
- Performance projections (single/multi-threaded)
- Memory ordering rationale (acquire/release/relaxed)
- Alternative approaches (mutex, global lock, etc.)
Use this when designing the accessor API and planning conversion phases.
3. Site-by-Site Conversion Guide
File: ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
Purpose: Line-by-line conversion instructions for all 90 sites
Contents:
- Phase 1: 5 files, 25 sites (hot paths)
  - File 1: core/box/slab_freelist_atomic.h (CREATE)
  - File 2: core/tiny_superslab_alloc.inc.h (8 sites)
  - File 3: core/hakmem_tiny_refill_p0.inc.h (3 sites)
  - File 4: core/box/carve_push_box.c (10 sites)
  - File 5: core/hakmem_tiny_tls_ops.h (4 sites)
- Phase 2: 10 files, 40 sites (warm paths)
- Phase 3: 5 files, 25 sites (cold paths)
- Common pitfalls (double-POP, missing NULL check, etc.)
- Testing checklist per file
- Quick reference card (conversion patterns)
Use this during actual code conversion (your primary reference).
4. Quick Start Guide
File: ATOMIC_FREELIST_QUICK_START.md
Purpose: Step-by-step implementation instructions
Contents:
- Step 1: Read documentation (15 min)
- Step 2: Create accessor header (30 min)
- Step 3: Phase 1 conversion (2-3 hours)
- Step 4: Phase 2 conversion (2-3 hours)
- Step 5: Phase 3 cleanup (1-2 hours)
- Common pitfalls and solutions
- Performance expectations
- Rollback plan
- Success criteria
Use this as your daily task list during implementation.
5. Accessor Header Template
File: core/box/slab_freelist_atomic.h.TEMPLATE
Purpose: Complete implementation of atomic accessor API
Contents:
- Lock-free CAS operations (slab_freelist_pop_lockfree, slab_freelist_push_lockfree)
- Relaxed load/store operations (slab_freelist_load_relaxed, slab_freelist_store_relaxed)
- NULL check helpers (slab_freelist_is_empty, slab_freelist_is_nonempty)
- Debug macro (SLAB_FREELIST_DEBUG_PTR)
- Extensive comments (80+ lines of documentation)
- Conversion examples
- Performance notes
- Testing strategy
Copy this to core/box/slab_freelist_atomic.h to get started.
Tool Scripts
1. Site Analysis Script
File: scripts/analyze_freelist_sites.sh
Purpose: Analyze freelist access patterns in codebase
Output:
- Total site count (90 sites)
- Operation breakdown (POP, PUSH, NULL checks, etc.)
- Files with freelist usage (21 files)
- Phase 1/2/3 file lists
- Lock-protected sites check
- Conversion effort estimates
Run this before starting conversion to validate site counts.
./scripts/analyze_freelist_sites.sh
2. Conversion Verification Script
File: scripts/verify_atomic_freelist_conversion.sh
Purpose: Track conversion progress and detect potential bugs
Output:
- Accessor header check (exists, functions defined)
- Direct access count (remaining unconverted sites)
- Converted operations count (by type)
- Conversion progress (0-100%)
- Phase 1/2/3 file check (which files converted)
- Potential bug detection (double-POP, double-PUSH, missing NULL check)
- Compile status
- Recommendations for next steps
Run this frequently during conversion to track progress and catch bugs early.
./scripts/verify_atomic_freelist_conversion.sh
Example output:
Progress: 30% (27/90 sites)
[============----------------------------]
Currently working on: Phase 1 (Critical Hot Paths)
✅ No double-POP bugs detected
✅ No double-PUSH bugs detected
✅ Compilation succeeded
Implementation Phases
Phase 1: Critical Hot Paths (2-3 hours)
Goal: Fix Larson 8T crash with minimal changes
Scope: 5 files, 25 sites
Files:
- core/box/slab_freelist_atomic.h (CREATE)
- core/tiny_superslab_alloc.inc.h
- core/hakmem_tiny_refill_p0.inc.h
- core/box/carve_push_box.c
- core/hakmem_tiny_tls_ops.h
Success Criteria:
- ✅ Larson 8T stable (no crashes)
- ✅ Regression <5% (>24.0M ops/s)
- ✅ No TSan warnings
Phase 2: Important Paths (2-3 hours)
Goal: Full MT safety for all allocation paths
Scope: 10 files, 40 sites
Files:
- core/tiny_refill_opt.h
- core/tiny_free_magazine.inc.h
- core/refill/ss_refill_fc.h
- core/slab_handle.h
- 6 additional files
Success Criteria:
- ✅ All MT tests pass (1T-16T)
- ✅ Regression <3% (>24.4M ops/s)
- ✅ MT scaling 70%+
Phase 3: Cleanup (1-2 hours)
Goal: Convert/document remaining sites
Scope: 5 files, 25 sites
Files:
- Debug/stats files
- Init/cleanup files
- Verification files
Success Criteria:
- ✅ All 90 sites converted or documented
- ✅ Zero direct accesses (except atomic.h)
- ✅ Full test suite passes
Testing Strategy
Per-File Testing
After converting each file:
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
Phase 1 Testing
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Multi-threaded stability (PRIMARY TEST)
./out/release/larson_hakmem 8 100000 256
# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
Phase 2 Testing
# All sizes
for size in 128 256 512 1024; do
./out/release/bench_random_mixed_hakmem 1000000 $size 42
done
# MT scaling
for threads in 1 2 4 8 16; do
./out/release/larson_hakmem $threads 100000 256
done
Phase 3 Testing
# Full test suite
make clean && make all
./run_all_tests.sh
# ASan check
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 100000 256 42
Performance Expectations
Single-Threaded
| Metric | Before | After | Change |
|---|---|---|---|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2-2.8% ✅ |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1-2.9% ✅ |
Acceptable: <5% regression
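The percentages in the table can be reproduced with a small helper. This is a minimal sketch for checking benchmark numbers by hand; regression_pct and within_budget are hypothetical names and not part of the HAKMEM tooling.

```c
/* Percentage by which throughput dropped, relative to the baseline.
 * Both arguments are in the same unit (for example, M ops/s).
 * A positive result means the "after" run is slower. */
static double regression_pct(double before, double after) {
    return (before - after) / before * 100.0;
}

/* Returns 1 if the regression stays strictly below max_pct, else 0. */
static int within_budget(double before, double after, double max_pct) {
    return regression_pct(before, after) < max_pct;
}
```

For the Random Mixed 256B row, regression_pct(25.1, 24.4) comes out at roughly 2.8 and regression_pct(25.1, 24.8) at roughly 1.2, matching the range in the table.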
Multi-Threaded
| Metric | Before | After | Change |
|---|---|---|---|
| Larson 8T | CRASH | ~18-20M ops/s | FIXED ✅ |
| MT Scaling (8T) | 0% (crashes) | 70-80% | NEW ✅ |
Benefit: Stability + MT scalability >> 2-3% single-threaded cost
Common Patterns
NULL Check Conversion
// BEFORE:
if (meta->freelist) { ... }
// AFTER:
if (slab_freelist_is_nonempty(meta)) { ... }
POP Operation Conversion
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);
// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback; // Handle race
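The POP pattern above replaces a two-step read-and-advance with a single accessor call. The following is a minimal sketch of what such an accessor could look like, not the contents of the real template. It assumes C11 <stdatomic.h>, a freelist head stored as an atomic pointer, and a next pointer kept in the first word of each free block. SketchSlabMeta, sketch_next_read and sketch_freelist_pop are hypothetical names, and the class_idx parameter is omitted for brevity.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta; only the freelist head is modeled. */
typedef struct {
    _Atomic(void *) freelist;
} SketchSlabMeta;

/* Hypothetical stand-in for tiny_next_read(): assumes the next pointer
 * is stored in the first word of each free block. */
static void *sketch_next_read(void *block) {
    return *(void **)block;
}

/* Pop the head block with a CAS loop. Returns NULL if the list is empty,
 * which is why callers need a fallback path after the call. */
static void *sketch_freelist_pop(SketchSlabMeta *meta) {
    void *head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void *next = sketch_next_read(head);
        /* On failure, head is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL;
}
```

A plain CAS pop like this one is exposed to the ABA problem when several threads pop concurrently. This sketch does not address it, and whether the real accessor needs a tag or another mitigation depends on which threads are allowed to pop from a given slab.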
PUSH Operation Conversion
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
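The PUSH pattern above folds the link write and the head update into one call. As a rough sketch under the same kind of assumptions (C11 atomics, next pointer in the first word of the block), a push accessor could be written as below. SketchSlabMeta, sketch_next_write and sketch_freelist_push are hypothetical names, class_idx is omitted, and the real template may differ.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta; only the freelist head is modeled. */
typedef struct {
    _Atomic(void *) freelist;
} SketchSlabMeta;

/* Hypothetical stand-in for tiny_next_write(): assumes the next pointer
 * is stored in the first word of each free block. */
static void sketch_next_write(void *block, void *next) {
    *(void **)block = next;
}

/* Push node onto the list head with a CAS loop. */
static void sketch_freelist_push(SketchSlabMeta *meta, void *node) {
    void *head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        /* Link the node to the head we currently believe in. If the CAS
         * fails, head is refreshed and the link is rewritten on retry. */
        sketch_next_write(node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

The release ordering on the successful CAS is what makes the link written by sketch_next_write visible to a thread that later reads the head with acquire ordering.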
Initialization Conversion
// BEFORE:
meta->freelist = NULL;
// AFTER:
slab_freelist_store_relaxed(meta, NULL);
Debug Print Conversion
// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);
// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));
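The NULL check, initialization, and debug print patterns above only read or write the head and do not need a CAS loop. A minimal sketch of how such helpers could be built on relaxed atomics follows. All sketch_ and SKETCH_ names are hypothetical; the real header may choose different orderings or signatures.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta; only the freelist head is modeled. */
typedef struct {
    _Atomic(void *) freelist;
} SketchSlabMeta;

/* Relaxed load: a snapshot of the head with no ordering guarantees. */
static inline void *sketch_freelist_load_relaxed(SketchSlabMeta *meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed);
}

/* Relaxed store: intended for sites where no other thread can observe
 * the slab yet, such as initialization. */
static inline void sketch_freelist_store_relaxed(SketchSlabMeta *meta, void *v) {
    atomic_store_explicit(&meta->freelist, v, memory_order_relaxed);
}

/* NULL check helpers. The answer is only a hint under concurrency:
 * the list can change right after the check. */
static inline bool sketch_freelist_is_empty(SketchSlabMeta *meta) {
    return sketch_freelist_load_relaxed(meta) == NULL;
}

static inline bool sketch_freelist_is_nonempty(SketchSlabMeta *meta) {
    return sketch_freelist_load_relaxed(meta) != NULL;
}

/* Debug macro: yields a plain void* that is safe to pass to "%p". */
#define SKETCH_FREELIST_DEBUG_PTR(meta) \
    ((void *)atomic_load_explicit(&(meta)->freelist, memory_order_relaxed))
```

Because a non-empty result can be stale by the time the caller acts on it, a POP that follows such a check still has to handle a NULL return.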
Troubleshooting
Issue: Compilation Fails
# Check if accessor header exists
ls -la core/box/slab_freelist_atomic.h
# Check for missing includes
grep -n "#include.*slab_freelist_atomic.h" core/tiny_superslab_alloc.inc.h
# Rebuild from clean state
make clean && make bench_random_mixed_hakmem
Issue: Larson 8T Still Crashes
# Check conversion progress
./scripts/verify_atomic_freelist_conversion.sh
# Run with TSan to detect data races
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256 2>&1 | grep -A5 "WARNING"
# Check for double-POP/PUSH bugs
grep -A1 "slab_freelist_pop_lockfree" core/ -r | grep "tiny_next_read"
grep -B1 "slab_freelist_push_lockfree" core/ -r | grep "tiny_next_write"
Issue: Performance Regression >5%
# Verify baseline (before conversion)
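git stash
git checkout master
make clean && make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record: 25.1M ops/s
# Check converted version (rebuild so the binary matches the branch)
git checkout atomic-freelist-phase1
git stash pop
make clean && make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Should be: >24.0M ops/s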
# If regression >5%, profile hot paths
perf record ./out/release/bench_random_mixed_hakmem 1000000 256 42
perf report
# Look for CAS retry loops or excessive memory ordering
Rollback Procedures
Quick Rollback (if Phase 1 fails)
git stash
git checkout master
git branch -D atomic-freelist-phase1
# Review issues and retry
Alternative Approach (Spinlock)
If lock-free proves too complex:
// Option: Use 1-byte spinlock instead
// Add to TinySlabMeta: uint8_t freelist_lock;
// Use __sync_lock_test_and_set() to lock and __sync_lock_release() to unlock
// Expected overhead: 5-10% (vs 2-3% for lock-free)
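As a rough sketch of the spinlock alternative, assuming the GCC/Clang __sync builtins and a next pointer stored in the first word of each block: SketchLockedMeta and the sketch_ functions below are hypothetical, and a real version would add the lock byte to TinySlabMeta itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinySlabMeta with the extra 1-byte lock. */
typedef struct {
    void   *freelist;
    uint8_t freelist_lock;   /* 0 = unlocked, 1 = locked */
} SketchLockedMeta;

static void sketch_lock(SketchLockedMeta *m) {
    /* __sync_lock_test_and_set returns the previous value and acts as an
     * acquire barrier; spin until we are the one who changed 0 to 1. */
    while (__sync_lock_test_and_set(&m->freelist_lock, 1)) {
        /* busy-wait */
    }
}

static void sketch_unlock(SketchLockedMeta *m) {
    /* Writes 0 with release semantics. */
    __sync_lock_release(&m->freelist_lock);
}

/* POP under the lock: plain loads and stores are enough here. */
static void *sketch_locked_pop(SketchLockedMeta *m) {
    sketch_lock(m);
    void *block = m->freelist;
    if (block != NULL) {
        m->freelist = *(void **)block;
    }
    sketch_unlock(m);
    return block;
}

/* PUSH under the lock. */
static void sketch_locked_push(SketchLockedMeta *m, void *node) {
    sketch_lock(m);
    *(void **)node = m->freelist;
    m->freelist = node;
    sketch_unlock(m);
}
```

The trade-off is simpler reasoning (no CAS retry loops) at the price of the higher overhead estimated above.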
Progress Tracking
Use the verification script to track progress:
./scripts/verify_atomic_freelist_conversion.sh
Output example:
Progress: 30% (27/90 sites)
[============----------------------------]
Phase 1 files converted: 2/4
Remaining sites: 63
Currently working on: Phase 1 (Critical Hot Paths)
Next step: Convert core/box/carve_push_box.c
Success Criteria
Phase 1 Complete
- 5 files converted (25 sites)
- Larson 8T runs 100K iterations without crash
- Single-threaded regression <5%
- No TSan warnings
- Verification script shows about 28% progress (25/90 sites)
Phase 2 Complete
- 15 files converted (65 sites)
- All MT tests pass (1T-16T)
- Single-threaded regression <3%
- MT scaling 70%+
- Verification script shows 72% progress
Phase 3 Complete
- 21 files converted (90 sites)
- Zero direct meta->freelist accesses
- Full test suite passes
- Documentation updated (CLAUDE.md)
- Verification script shows 100% progress
File Checklist
Documentation
- ATOMIC_FREELIST_SUMMARY.md - Executive summary
- ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md - Technical strategy
- ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md - Conversion guide
- ATOMIC_FREELIST_QUICK_START.md - Quick start instructions
- ATOMIC_FREELIST_INDEX.md - This file
Templates
- core/box/slab_freelist_atomic.h.TEMPLATE - Accessor API
Tools
- scripts/analyze_freelist_sites.sh - Site analysis
- scripts/verify_atomic_freelist_conversion.sh - Progress tracker
Implementation (to be created)
- core/box/slab_freelist_atomic.h - Working accessor API
Contact and Support
If you encounter issues during implementation:
- Check documentation: Review the relevant guide for your current phase
- Run verification: ./scripts/verify_atomic_freelist_conversion.sh
- Review common pitfalls: See the common pitfalls section of ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
- Rollback if needed: git checkout master
Estimated Timeline
| Milestone | Duration | Cumulative |
|---|---|---|
| Preparation | 15 min | 0.25h |
| Create accessor header | 30 min | 0.75h |
| Phase 1 conversion | 2-3h | 3-4h |
| Phase 1 testing | 30 min | 3.5-4.5h |
| Phase 2 conversion | 2-3h | 5.5-7.5h |
| Phase 2 testing | 1h | 6.5-8.5h |
| Phase 3 conversion | 1-2h | 7.5-10.5h |
| Phase 3 testing | 1h | 8.5-11.5h |
| Total | | 8.5-11.5h |
Minimal viable: 3.5-4.5 hours (Phase 1 only, fixes Larson crash)
Full implementation: 8.5-11.5 hours (all 3 phases, complete MT safety)
Next Steps
Ready to start?
- Read ATOMIC_FREELIST_QUICK_START.md (15 min)
- Run ./scripts/analyze_freelist_sites.sh (5 min)
- Copy template: cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h (5 min)
- Edit template to add includes (20 min)
- Test compile: make bench_random_mixed_hakmem (5 min)
- Begin Phase 1 conversion using ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md (2-3 hours)
Good luck! 🚀