Atomic Freelist Quick Start Guide
TL;DR
Problem: An initial count suggested 589 freelist access sites; the actual number is 90 (much better!).
Solution: Hybrid approach, with lock-free CAS for hot paths and relaxed atomics for cold paths.
Effort: 5-8 hours across 3 phases (6-9 hours including prep and header setup).
Risk: Low (incremental, easy rollback).
Impact: 2-3% single-threaded regression in exchange for multi-threaded (MT) stability.
Step-by-Step Implementation
Step 1: Read Documentation (15 min)
- Strategy: ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md
  - Accessor function design
  - Memory ordering rationale
  - Performance projections
- Site Guide: ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
  - File-by-file conversion instructions
  - Common pitfalls
  - Testing checklist
- Analysis: Run scripts/analyze_freelist_sites.sh
  - Validates site counts
  - Shows operation breakdown
  - Estimates effort
Step 2: Create Accessor Header (30 min)
# Copy template to working file
cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h
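# Add the tiny_next_ptr_box.h include at the top of the new header
# (appending with >> would place it after the code that depends on it)
sed -i '1i#include "tiny_next_ptr_box.h"' core/box/slab_freelist_atomic.h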
# Verify compile
make clean
make bench_random_mixed_hakmem 2>&1 | grep -i error
Expected: Clean compile (no errors)
Step 3: Phase 1 - Hot Paths (2-3 hours)
3.1 Convert NULL Checks (30 min)
Pattern: if (meta->freelist) → if (slab_freelist_is_nonempty(meta))
Files:
- core/tiny_superslab_alloc.inc.h (4 sites)
- core/hakmem_tiny_refill_p0.inc.h (1 site)
- core/box/carve_push_box.c (2 sites)
- core/hakmem_tiny_tls_ops.h (2 sites)
Commands:
# Add include at top of each file
# For tiny_superslab_alloc.inc.h:
sed -i '1i#include "box/slab_freelist_atomic.h"' core/tiny_superslab_alloc.inc.h
# Replace NULL checks (review carefully!)
# Do this manually - automated sed is too risky
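As a rough illustration of what the NULL-check accessor could look like, here is a minimal sketch. `SketchSlabMeta` and `sketch_freelist_is_nonempty` are hypothetical stand-ins for `TinySlabMeta` and `slab_freelist_is_nonempty`; the real definitions live in the template header and may differ. The sketch assumes a relaxed load is enough because the result is only a hint and the subsequent pop re-checks the head.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} SketchSlabMeta;

/* Returns true if the freelist head was non-NULL at the time of the load.
 * Assumption: relaxed ordering suffices for a hint that is re-validated
 * by the pop that follows. */
static inline bool sketch_freelist_is_nonempty(SketchSlabMeta* meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed) != NULL;
}
```

A call site would then read `if (sketch_freelist_is_nonempty(meta)) { ... }` in place of `if (meta->freelist) { ... }`.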
3.2 Convert POP Operations (1 hour)
Pattern:
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);
// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback; // Handle race
Files:
- core/tiny_superslab_alloc.inc.h:117-145 (1 critical site)
- core/box/carve_push_box.c:173-174 (1 site)
- core/hakmem_tiny_tls_ops.h:83-85 (1 site)
Testing after each file:
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
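To make the POP pattern above more concrete, the following sketch shows one way a CAS-based pop could be written with C11 atomics. It is not the template's implementation: `SketchSlabMeta`, `sketch_next_read`, and `sketch_freelist_pop` are hypothetical names, the `class_idx` parameter is omitted, and the next pointer is assumed to sit in the first bytes of each free block. The sketch also does not address the ABA problem, which a production version would need to consider.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} SketchSlabMeta;

/* Hypothetical stand-in for tiny_next_read(): assumes the next pointer is
 * stored in the first bytes of a free block. */
static inline void* sketch_next_read(void* block) {
    void* next;
    memcpy(&next, block, sizeof next);
    return next;
}

/* Pop the head block, or return NULL if the list is (or becomes) empty. */
static inline void* sketch_freelist_pop(SketchSlabMeta* meta) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void* next = sketch_next_read(head);
        /* On failure, head is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL; /* caller falls back to another allocation path */
}
```

Because the function can return NULL when another thread drains the list first, every caller needs the `if (!block) goto fallback;` check shown in the pattern.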
3.3 Convert PUSH Operations (1 hour)
Pattern:
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
Files:
core/box/carve_push_box.c(6 sites - rollback paths)
Testing:
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 100000 256 42
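The PUSH pattern can be sketched the same way. As before, this is an illustration under assumptions rather than the template's code: `SketchSlabMeta`, `sketch_next_write`, and `sketch_freelist_push` are hypothetical, `class_idx` is omitted, and the next pointer is assumed to occupy the first bytes of the block.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} SketchSlabMeta;

/* Hypothetical stand-in for tiny_next_write(): stores the next pointer in
 * the first bytes of the block. */
static inline void sketch_next_write(void* block, void* next) {
    memcpy(block, &next, sizeof next);
}

/* Push node onto the list head. The link is rewritten on every retry so
 * that node always points at the head value the CAS is about to replace. */
static inline void sketch_freelist_push(SketchSlabMeta* meta, void* node) {
    void* head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        sketch_next_write(node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

The release ordering on success is what makes the link written by `sketch_next_write` visible to a thread that later pops the node with an acquire load.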
3.4 Phase 1 Final Test (30 min)
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record ops/s (expect: 24.4-24.8M, vs 25.1M baseline)
# Multi-threaded stability
make larson_hakmem
./out/release/larson_hakmem 8 100000 256
# Expect: No crashes, ~18-20M ops/s
# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
# Expect: No TSan warnings
Success Criteria:
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ Larson 8T stable (no crashes)
- ✅ No TSan warnings
- ✅ Clean build
If any criterion fails: roll back and debug
git diff > phase1.patch # Save work
git checkout . # Revert
# Review phase1.patch and fix issues
Step 4: Phase 2 - Warm Paths (2-3 hours)
Scope: Convert remaining 40 sites in 10 files
Files (in order of priority):
1. core/tiny_refill_opt.h (refill chain ops)
2. core/tiny_free_magazine.inc.h (magazine push)
3. core/refill/ss_refill_fc.h (FC refill)
4. core/slab_handle.h (slab handle ops)
5-10. Remaining files (see ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md)
Testing (after each file):
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 100000 256 42
Phase 2 Final Test:
# All sizes
for size in 128 256 512 1024; do
    ./out/release/bench_random_mixed_hakmem 1000000 $size 42
done
# MT scaling
for threads in 1 2 4 8 16; do
    ./out/release/larson_hakmem $threads 100000 256
done
Step 5: Phase 3 - Cleanup (1-2 hours)
Scope: Convert/document remaining 25 sites
5.1 Debug/Stats Sites (30 min)
Pattern: meta->freelist → SLAB_FREELIST_DEBUG_PTR(meta)
Files:
- core/box/ss_stats_box.c
- core/tiny_debug.h
- core/tiny_remote.c
5.2 Init/Cleanup Sites (30 min)
Pattern: meta->freelist = NULL → slab_freelist_store_relaxed(meta, NULL)
Files:
- core/hakmem_tiny_superslab.c
- core/hakmem_smallmid_superslab.c
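For the cold-path patterns in 5.1 and 5.2, a relaxed store and a relaxed debug read might look like the sketch below. The names `SketchSlabMeta`, `sketch_freelist_store_relaxed`, and `SKETCH_FREELIST_DEBUG_PTR` are hypothetical counterparts of `slab_freelist_store_relaxed` and `SLAB_FREELIST_DEBUG_PTR`; the actual definitions in the template header may differ.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void*) freelist;
} SketchSlabMeta;

/* Init/cleanup store. Assumption: relaxed ordering is enough because no
 * other thread is expected to race with slab initialization or teardown. */
static inline void sketch_freelist_store_relaxed(SketchSlabMeta* meta, void* v) {
    atomic_store_explicit(&meta->freelist, v, memory_order_relaxed);
}

/* Debug/stats read: a snapshot of the head pointer for printing only.
 * The value may be stale by the time it is used and must not be followed. */
#define SKETCH_FREELIST_DEBUG_PTR(meta) \
    atomic_load_explicit(&(meta)->freelist, memory_order_relaxed)
```

A stats site would then print `SKETCH_FREELIST_DEBUG_PTR(meta)` with `%p` instead of reading `meta->freelist` directly.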
5.3 Final Verification (30 min)
# Full rebuild
make clean && make all
# Run all tests
./run_all_tests.sh
# Check for remaining direct accesses
grep -rn "meta->freelist" core/ --include="*.c" --include="*.h" | \
grep -v "slab_freelist_" | grep -v "SLAB_FREELIST_DEBUG_PTR"
# Expect: 0 results (all converted or documented)
Common Pitfalls
Pitfall 1: Double-Converting POP
// ❌ WRONG: slab_freelist_pop_lockfree already calls tiny_next_read!
void* p = slab_freelist_pop_lockfree(meta, class_idx);
void* next = tiny_next_read(class_idx, p); // ❌ BUG!
// ✅ RIGHT: Use p directly
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) goto fallback;
use(p); // ✅ CORRECT
Pitfall 2: Forgetting Race Handling
// ❌ WRONG: Assuming pop always succeeds
void* p = slab_freelist_pop_lockfree(meta, class_idx);
use(p); // ❌ SEGV if p == NULL!
// ✅ RIGHT: Always check for NULL
void* p = slab_freelist_pop_lockfree(meta, class_idx);
if (!p) goto fallback; // ✅ CORRECT
use(p);
Pitfall 3: Including Header Before Dependencies
// ❌ WRONG: slab_freelist_atomic.h needs tiny_next_ptr_box.h
#include "box/slab_freelist_atomic.h" // ❌ Compile error!
#include "box/tiny_next_ptr_box.h"
// ✅ RIGHT: Dependencies first
#include "box/tiny_next_ptr_box.h" // ✅ CORRECT
#include "box/slab_freelist_atomic.h"
Performance Expectations
Single-Threaded
| Metric | Before | After | Change |
|---|---|---|---|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2% to -2.8% |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1% to -2.9% |
Acceptable: <5% regression (relaxed atomics cost roughly 0%; a CAS adds 60-140% overhead per operation but executes rarely)
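The percentages in the table follow from simple arithmetic on the measured throughput. A small helper, written here as a hypothetical sketch (`percent_change` and `min_throughput` are not part of the codebase), can be used to check a new benchmark run against the regression budget:

```c
/* Percent change from baseline to measured; negative means slower. */
static double percent_change(double baseline, double measured) {
    return (measured - baseline) / baseline * 100.0;
}

/* Lowest throughput that still fits a regression budget given in percent. */
static double min_throughput(double baseline, double max_regression_pct) {
    return baseline * (1.0 - max_regression_pct / 100.0);
}
```

With the 25.1M ops/s baseline, `percent_change(25.1, 24.4)` is about -2.8 and `min_throughput(25.1, 5.0)` is about 23.8, so the 24.0M+ threshold used in the success criteria is slightly stricter than a literal 5% budget.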
Multi-Threaded
| Metric | Before | After | Change |
|---|---|---|---|
| Larson 8T | CRASH | ~18-20M ops/s | ✅ FIXED |
| MT Scaling (8T) | 0% (crashes) | 70-80% | ✅ GAIN |
Expected: the stability and MT scalability gains far outweigh the 2-3% single-threaded cost
Rollback Plan
If Phase 1 fails (>5% regression or instability):
# Option 1: Revert to master
git checkout master
git branch -D atomic-freelist-phase1
# Option 2: Alternative approach (per-slab spinlock)
# Add uint8_t lock field to TinySlabMeta (1 byte)
# Use __sync_lock_test_and_set() for spinlock (5-10% overhead)
# Guaranteed correctness, simpler implementation
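Option 2 could be sketched roughly as follows, using the GCC/Clang `__sync_lock_test_and_set` and `__sync_lock_release` builtins. `SketchLockedSlabMeta` and the `sketch_*` functions are hypothetical; the real `TinySlabMeta` has more fields, and the next pointer is assumed to sit in the first bytes of each free block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TinySlabMeta with the extra 1-byte lock field. */
typedef struct {
    void*   freelist;
    uint8_t lock;      /* 0 = unlocked, 1 = locked */
} SketchLockedSlabMeta;

static inline void sketch_slab_lock(SketchLockedSlabMeta* m) {
    /* Atomically writes 1 and returns the previous value (acquire barrier);
     * a previous value of 1 means another thread holds the lock. */
    while (__sync_lock_test_and_set(&m->lock, 1)) {
        /* spin until the holder releases */
    }
}

static inline void sketch_slab_unlock(SketchLockedSlabMeta* m) {
    __sync_lock_release(&m->lock); /* writes 0 with release semantics */
}

/* Pop under the lock: plain loads and stores are safe inside the
 * critical section. */
static inline void* sketch_locked_pop(SketchLockedSlabMeta* m) {
    sketch_slab_lock(m);
    void* head = m->freelist;
    if (head != NULL) {
        void* next;
        memcpy(&next, head, sizeof next);
        m->freelist = next;
    }
    sketch_slab_unlock(m);
    return head;
}
```

The trade-off is that every freelist operation now pays for the lock, which is where the estimated 5-10% overhead comes from, but the critical section keeps the original non-atomic code unchanged.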
Success Criteria
Phase 1
- ✅ Larson 8T runs without crash (100K iterations)
- ✅ Single-threaded regression <5% (24.0M+ ops/s)
- ✅ No ASan/TSan warnings
Phase 2
- ✅ All MT tests pass (1T, 2T, 4T, 8T, 16T)
- ✅ Single-threaded regression <3% (24.4M+ ops/s)
- ✅ MT scaling 70%+ (8T = 5.6x+ speedup)
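The scaling criterion relates speedup to thread count: efficiency is the N-thread throughput divided by the 1-thread throughput, divided again by N. The helpers below are a hypothetical sketch for checking Larson results against the 70% target; the function names are not from the codebase.

```c
/* Scaling efficiency in the range 0..1: (ops_nt / ops_1t) / threads. */
static double scaling_efficiency(double ops_1t, double ops_nt, int threads) {
    return (ops_nt / ops_1t) / (double)threads;
}

/* Speedup over 1 thread needed to reach a target efficiency. */
static double required_speedup(int threads, double target_efficiency) {
    return (double)threads * target_efficiency;
}
```

For 8 threads and a 70% target, `required_speedup(8, 0.70)` gives 5.6, which matches the "8T = 5.6x+ speedup" figure in the criterion.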
Phase 3
- ✅ All 90 sites converted or documented
- ✅ Full test suite passes (100% pass rate)
- ✅ Zero direct meta->freelist accesses (except in slab_freelist_atomic.h)
Time Budget
| Phase | Description | Files | Sites | Time |
|---|---|---|---|---|
| Prep | Read docs, setup | - | - | 15 min |
| Header | Create accessor API | 1 | - | 30 min |
| Phase 1 | Hot paths (critical) | 5 | 25 | 2-3h |
| Phase 2 | Warm paths (important) | 10 | 40 | 2-3h |
| Phase 3 | Cold paths (cleanup) | 5 | 25 | 1-2h |
| Total | | 21 | 90 | 6-9h |
Realistic: 6-9 hours with testing and debugging
Next Steps
- Review strategy (15 min)
  - ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md
  - ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md
- Run analysis (5 min)
  ./scripts/analyze_freelist_sites.sh
- Create branch (2 min)
  git checkout -b atomic-freelist-phase1
  git stash # Save any uncommitted work
- Create accessor header (30 min)
  cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h
  # Edit to add includes
  make bench_random_mixed_hakmem # Test compile
- Start Phase 1 (2-3 hours)
  - Convert 5 files, ~25 sites
  - Test after each file
  - Final test with Larson 8T
- Evaluate results
  - If pass: Continue to Phase 2
  - If fail: Debug or rollback
Support Documents
- ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md - Overall strategy, performance analysis
- ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md - Detailed conversion instructions
- core/box/slab_freelist_atomic.h.TEMPLATE - Accessor API implementation
- scripts/analyze_freelist_sites.sh - Automated site analysis
Questions?
Q: Why not just add a mutex to TinySlabMeta?
A: 40-byte overhead per slab, 10-20x performance hit. Lock-free CAS is 3-5x faster.
Q: Why not use a global lock?
A: Serializes all allocation, kills MT performance. Lock-free allows concurrency.
Q: Why 3 phases instead of all at once?
A: Risk management. Phase 1 fixes the Larson crash (2-3h), and you can stop there if needed.
Q: What if performance regression is >5%?
A: Roll back to master and review the strategy. Consider the spinlock alternative (5-10% overhead, simpler).
Q: Can I skip Phase 3?
A: Yes, but you'll have ~25 sites with direct access (debug/stats). Document them clearly.
Recommendation
Start with Phase 1 (2-3 hours) and evaluate results:
- If Larson 8T stable + regression <5%: ✅ Continue to Phase 2
- If unstable or regression >5%: ❌ Rollback and review
Best case: 6-9 hours for full MT safety with <3% regression
Worst case: 2-3 hours to prove feasibility, then rollback if needed
Risk: Low (incremental, easy rollback, well-documented)
Benefit: High (MT stability, scalability, future-proof architecture)