hakmem/docs/specs/ATOMIC_FREELIST_INDEX.md
Moe Charm (CI) a9ddb52ad4 ENV cleanup: Remove BG/HotMag vars & guard fprintf (Larson 52.3M ops/s)
Phase 1 complete: environment variable cleanup + fprintf debug guards

ENV variable removal (BG/HotMag family):
- core/hakmem_tiny_init.inc: removed HotMag ENV (~131 lines)
- core/hakmem_tiny_bg_spill.c: removed BG spill ENV
- core/tiny_refill.h: BG remote replaced with a fixed value
- core/hakmem_tiny_slow.inc: removed BG refs

fprintf Debug Guards (#if !HAKMEM_BUILD_RELEASE):
- core/hakmem_shared_pool.c: Lock stats (~18 fprintf)
- core/page_arena.c: Init/Shutdown/Stats (~27 fprintf)
- core/hakmem.c: SIGSEGV init message

Documentation cleanup:
- Deleted 328 markdown files (old reports, duplicate docs)

Performance check:
- Larson: 52.35M ops/s (previously 52.8M, stable operation)
- No functional impact from the ENV cleanup
- Some debug output remains (to be handled in the next phase)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-26 14:45:26 +09:00

# Atomic Freelist Implementation - Documentation Index
## Overview
This directory contains documentation and tooling for converting `TinySlabMeta.freelist` accesses to atomic operations, so that the HAKMEM memory allocator is safe under multi-threaded use.

**Status**: Ready for implementation
**Estimated Effort**: 5-8 hours of conversion work across 3 phases (8.5-11.5 hours including preparation and testing)
**Expected Impact**: 2-3% single-threaded regression, in exchange for multi-threaded stability and scalability

---
## Quick Start
**New to this task?** Start here:
1. **Read**: `ATOMIC_FREELIST_QUICK_START.md` (15 min)
2. **Run**: `./scripts/analyze_freelist_sites.sh` (5 min)
3. **Create**: Accessor header from template (30 min)
4. **Begin**: Phase 1 conversion (2-3 hours)
---
## Documentation Files
### 1. Executive Summary
**File**: `ATOMIC_FREELIST_SUMMARY.md`
**Purpose**: High-level overview of the entire implementation
**Contents**:
- Investigation results (90 sites, not 589)
- Implementation strategy (hybrid approach)
- Performance analysis (2-3% regression expected)
- Risk assessment (low risk, high benefit)
- Timeline and success metrics
**Read this first** for a complete picture.
---
### 2. Implementation Strategy
**File**: `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md`
**Purpose**: Detailed technical strategy and design decisions
**Contents**:
- Accessor function API design (lock-free CAS + relaxed atomics)
- Critical site list (top 20 sites to convert)
- Non-critical site strategy (skip or use relaxed)
- Phased implementation plan (3 phases)
- Performance projections (single/multi-threaded)
- Memory ordering rationale (acquire/release/relaxed)
- Alternative approaches (mutex, global lock, etc.)
**Use this** when designing the accessor API and planning conversion phases.
---
### 3. Site-by-Site Conversion Guide
**File**: `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
**Purpose**: Line-by-line conversion instructions for all 90 sites
**Contents**:
- Phase 1: 5 files, 25 sites (hot paths)
  - File 1: `core/box/slab_freelist_atomic.h` (CREATE)
  - File 2: `core/tiny_superslab_alloc.inc.h` (8 sites)
  - File 3: `core/hakmem_tiny_refill_p0.inc.h` (3 sites)
  - File 4: `core/box/carve_push_box.c` (10 sites)
  - File 5: `core/hakmem_tiny_tls_ops.h` (4 sites)
- Phase 2: 10 files, 40 sites (warm paths)
- Phase 3: 5 files, 25 sites (cold paths)
- Common pitfalls (double-POP, missing NULL check, etc.)
- Testing checklist per file
- Quick reference card (conversion patterns)
**Use this** during actual code conversion (your primary reference).
---
### 4. Quick Start Guide
**File**: `ATOMIC_FREELIST_QUICK_START.md`
**Purpose**: Step-by-step implementation instructions
**Contents**:
- Step 1: Read documentation (15 min)
- Step 2: Create accessor header (30 min)
- Step 3: Phase 1 conversion (2-3 hours)
- Step 4: Phase 2 conversion (2-3 hours)
- Step 5: Phase 3 cleanup (1-2 hours)
- Common pitfalls and solutions
- Performance expectations
- Rollback plan
- Success criteria
**Use this** as your daily task list during implementation.
---
### 5. Accessor Header Template
**File**: `core/box/slab_freelist_atomic.h.TEMPLATE`
**Purpose**: Complete implementation of atomic accessor API
**Contents**:
- Lock-free CAS operations (`slab_freelist_pop_lockfree`, `slab_freelist_push_lockfree`)
- Relaxed load/store operations (`slab_freelist_load_relaxed`, `slab_freelist_store_relaxed`)
- NULL check helpers (`slab_freelist_is_empty`, `slab_freelist_is_nonempty`)
- Debug macro (`SLAB_FREELIST_DEBUG_PTR`)
- Extensive comments (80+ lines of documentation)
- Conversion examples
- Performance notes
- Testing strategy
**Copy this** to `core/box/slab_freelist_atomic.h` to get started.
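
As a rough illustration of what lock-free accessors of this kind might look like, here is a minimal sketch using C11 `<stdatomic.h>`. It is not the contents of the template: `demo_slab_meta`, `demo_freelist_pop`, `demo_freelist_push`, and the next-pointer helpers are hypothetical names, the `class_idx` parameter is omitted, and the sketch assumes each free block stores its next pointer in its first word. A plain CAS pop like this one is exposed to the ABA problem when blocks can be popped and re-pushed concurrently; how the real header handles that is not covered here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for TinySlabMeta: only the freelist head is modeled. */
typedef struct {
    _Atomic(void *) freelist;
} demo_slab_meta;

/* Assumption: a free block keeps its next pointer in its first word.
 * The real allocator goes through tiny_next_read / tiny_next_write. */
static void *demo_next_read(void *block) { return *(void **)block; }
static void demo_next_write(void *block, void *next) { *(void **)block = next; }

/* Pop the head block, or return NULL when the list is empty. */
static void *demo_freelist_pop(demo_slab_meta *meta) {
    void *head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void *next = demo_next_read(head);
        /* On failure, head is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL;
}

/* Push node as the new head; release ordering publishes the next-pointer write. */
static void demo_freelist_push(demo_slab_meta *meta, void *node) {
    void *head = atomic_load_explicit(&meta->freelist, memory_order_relaxed);
    do {
        demo_next_write(node, head);
    } while (!atomic_compare_exchange_weak_explicit(&meta->freelist, &head, node,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```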
---
## Tool Scripts
### 1. Site Analysis Script
**File**: `scripts/analyze_freelist_sites.sh`
**Purpose**: Analyze freelist access patterns in codebase
**Output**:
- Total site count (90 sites)
- Operation breakdown (POP, PUSH, NULL checks, etc.)
- Files with freelist usage (21 files)
- Phase 1/2/3 file lists
- Lock-protected sites check
- Conversion effort estimates
**Run this** before starting conversion to validate site counts.
```bash
./scripts/analyze_freelist_sites.sh
```
---
### 2. Conversion Verification Script
**File**: `scripts/verify_atomic_freelist_conversion.sh`
**Purpose**: Track conversion progress and detect potential bugs
**Output**:
- Accessor header check (exists, functions defined)
- Direct access count (remaining unconverted sites)
- Converted operations count (by type)
- Conversion progress (0-100%)
- Phase 1/2/3 file check (which files converted)
- Potential bug detection (double-POP, double-PUSH, missing NULL check)
- Compile status
- Recommendations for next steps
**Run this** frequently during conversion to track progress and catch bugs early.
```bash
./scripts/verify_atomic_freelist_conversion.sh
```
**Example output**:
```
Progress: 30% (27/90 sites)
[============----------------------------]
Currently working on: Phase 1 (Critical Hot Paths)
✅ No double-POP bugs detected
✅ No double-PUSH bugs detected
✅ Compilation succeeded
```
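The numbers in the example output are related by simple integer arithmetic. The sketch below shows one way a percentage and a 40-character bar could be derived from the site counts. The script's real logic is not shown in this document, so `demo_progress_percent`, `demo_progress_bar`, and the rounding-down behaviour are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define DEMO_BAR_WIDTH 40

/* Integer percentage of converted sites, rounded down. */
static int demo_progress_percent(int converted, int total) {
    return total > 0 ? (converted * 100) / total : 0;
}

/* Writes a "[====----]" style bar into out.
 * out must hold at least DEMO_BAR_WIDTH + 3 bytes. */
static void demo_progress_bar(int converted, int total, char *out) {
    int filled = total > 0 ? (converted * DEMO_BAR_WIDTH) / total : 0;
    if (filled < 0) filled = 0;
    if (filled > DEMO_BAR_WIDTH) filled = DEMO_BAR_WIDTH;
    out[0] = '[';
    memset(out + 1, '=', (size_t)filled);
    memset(out + 1 + filled, '-', (size_t)(DEMO_BAR_WIDTH - filled));
    out[DEMO_BAR_WIDTH + 1] = ']';
    out[DEMO_BAR_WIDTH + 2] = '\0';
}
```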
---
## Implementation Phases
### Phase 1: Critical Hot Paths (2-3 hours)
**Goal**: Fix Larson 8T crash with minimal changes
**Scope**: 5 files, 25 sites
**Files**:
- `core/box/slab_freelist_atomic.h` (CREATE)
- `core/tiny_superslab_alloc.inc.h`
- `core/hakmem_tiny_refill_p0.inc.h`
- `core/box/carve_push_box.c`
- `core/hakmem_tiny_tls_ops.h`
**Success Criteria**:
- ✅ Larson 8T stable (no crashes)
- ✅ Regression <5% (>24.0M ops/s)
- ✅ No TSan warnings
---
### Phase 2: Important Paths (2-3 hours)
**Goal**: Full MT safety for all allocation paths
**Scope**: 10 files, 40 sites
**Files**:
- `core/tiny_refill_opt.h`
- `core/tiny_free_magazine.inc.h`
- `core/refill/ss_refill_fc.h`
- `core/slab_handle.h`
- 6 additional files
**Success Criteria**:
- ✅ All MT tests pass (1T-16T)
- ✅ Regression <3% (>24.4M ops/s)
- ✅ MT scaling 70%+
---
### Phase 3: Cleanup (1-2 hours)
**Goal**: Convert/document remaining sites
**Scope**: 5 files, 25 sites
**Files**:
- Debug/stats files
- Init/cleanup files
- Verification files
**Success Criteria**:
- ✅ All 90 sites converted or documented
- ✅ Zero direct `meta->freelist` accesses (except inside `slab_freelist_atomic.h`)
- ✅ Full test suite passes
---
## Testing Strategy
### Per-File Testing
After converting each file:
```bash
make bench_random_mixed_hakmem
./out/release/bench_random_mixed_hakmem 10000 256 42
```
### Phase 1 Testing
```bash
# Single-threaded baseline
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Multi-threaded stability (PRIMARY TEST)
./out/release/larson_hakmem 8 100000 256
# Race detection
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256
```
### Phase 2 Testing
```bash
# All sizes
for size in 128 256 512 1024; do
./out/release/bench_random_mixed_hakmem 1000000 $size 42
done
# MT scaling
for threads in 1 2 4 8 16; do
./out/release/larson_hakmem $threads 100000 256
done
```
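This document does not define how the "MT scaling" percentage is computed. One common definition, assumed here, is the measured N-thread throughput divided by N times the single-thread throughput. The helper name `demo_scaling_pct` is hypothetical; feed it the Larson results from the loop above.

```c
#include <assert.h>

/* Scaling efficiency in percent: N-thread throughput relative to
 * N times the 1-thread throughput (assumed definition). */
static double demo_scaling_pct(double ops_1t, double ops_nt, int threads) {
    return ops_nt / (ops_1t * (double)threads) * 100.0;
}
```

With illustrative (not measured) inputs of 2.0M ops/s at 1T and 12.0M ops/s at 8T, this yields 75%.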
### Phase 3 Testing
```bash
# Full test suite
make clean && make all
./run_all_tests.sh
# ASan check
./build.sh asan bench_random_mixed_hakmem
./out/asan/bench_random_mixed_hakmem 100000 256 42
```
---
## Performance Expectations
### Single-Threaded
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Random Mixed 256B | 25.1M ops/s | 24.4-24.8M ops/s | -1.2% to -2.8% ✅ |
| Larson 1T | 2.76M ops/s | 2.68-2.73M ops/s | -1.1% to -2.9% ✅ |

**Acceptable**: <5% regression
### Multi-Threaded
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Larson 8T | **CRASH** | ~18-20M ops/s | **FIXED** |
| MT Scaling (8T) | 0% (crashes) | 70-80% | **NEW** |

**Benefit**: Stability and MT scalability far outweigh the 2-3% single-threaded cost
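
The percentages in the tables follow from the throughput figures. The sketch below, with hypothetical helper names, shows the arithmetic. Note that a 5% budget on the 25.1M ops/s baseline works out to about 23.8M ops/s, so the >24.0M ops/s threshold used in the Phase 1 criteria is slightly stricter than 5%.

```c
#include <assert.h>

/* Regression in percent relative to the baseline (positive means slower). */
static double demo_regression_pct(double baseline_mops, double measured_mops) {
    return (baseline_mops - measured_mops) / baseline_mops * 100.0;
}

/* Lowest throughput that still fits a regression budget given in percent. */
static double demo_min_throughput(double baseline_mops, double budget_pct) {
    return baseline_mops * (1.0 - budget_pct / 100.0);
}
```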
---
## Common Patterns
### NULL Check Conversion
```c
// BEFORE:
if (meta->freelist) { ... }
// AFTER:
if (slab_freelist_is_nonempty(meta)) { ... }
```
### POP Operation Conversion
```c
// BEFORE:
void* block = meta->freelist;
meta->freelist = tiny_next_read(class_idx, block);
// AFTER:
void* block = slab_freelist_pop_lockfree(meta, class_idx);
if (!block) goto fallback; // Handle race
```
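The `// Handle race` comment matters because a non-empty check before the pop is only a hint: another thread can drain the list in between, so the pop result still has to be checked. The sketch below simulates that situation on a single thread. All names are hypothetical, `class_idx` is omitted, the next pointer is assumed to live in the block's first word, and `fallback_block` stands in for whatever slower path `goto fallback` leads to.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    _Atomic(void *) freelist;
} demo_slab_meta;

/* Relaxed emptiness check: a hint only, the list may change right afterwards. */
static bool demo_is_nonempty(demo_slab_meta *meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed) != NULL;
}

/* CAS-based pop; returns NULL if the list is (or has become) empty. */
static void *demo_pop(demo_slab_meta *meta) {
    void *head = atomic_load_explicit(&meta->freelist, memory_order_acquire);
    while (head != NULL) {
        void *next = *(void **)head;
        if (atomic_compare_exchange_weak_explicit(&meta->freelist, &head, next,
                                                  memory_order_acquire,
                                                  memory_order_acquire)) {
            return head;
        }
    }
    return NULL;
}

/* Hypothetical allocation path: use the freelist if possible,
 * otherwise return the block supplied by the slower fallback path. */
static void *demo_alloc(demo_slab_meta *meta, void *fallback_block) {
    void *block = demo_pop(meta);
    return block != NULL ? block : fallback_block;
}
```

If the check succeeds and a competing pop then empties the list, `demo_alloc` returns the fallback block instead of dereferencing NULL.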
### PUSH Operation Conversion
```c
// BEFORE:
tiny_next_write(class_idx, node, meta->freelist);
meta->freelist = node;
// AFTER:
slab_freelist_push_lockfree(meta, class_idx, node);
```
### Initialization Conversion
```c
// BEFORE:
meta->freelist = NULL;
// AFTER:
slab_freelist_store_relaxed(meta, NULL);
```
### Debug Print Conversion
```c
// BEFORE:
fprintf(stderr, "freelist=%p", meta->freelist);
// AFTER:
fprintf(stderr, "freelist=%p", SLAB_FREELIST_DEBUG_PTR(meta));
```
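The initialization and debug-print patterns only need plain atomic loads and stores with relaxed ordering. A minimal sketch of such helpers follows, assuming the freelist field becomes an `_Atomic(void *)`. The `demo_` names and `DEMO_FREELIST_DEBUG_PTR` are hypothetical stand-ins, not the definitions from the template.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    _Atomic(void *) freelist;
} demo_slab_meta;

/* Relaxed load: atomic, but imposes no ordering on surrounding accesses. */
static void *demo_load_relaxed(demo_slab_meta *meta) {
    return atomic_load_explicit(&meta->freelist, memory_order_relaxed);
}

/* Relaxed store: suitable when no other thread can observe the slab yet. */
static void demo_store_relaxed(demo_slab_meta *meta, void *value) {
    atomic_store_explicit(&meta->freelist, value, memory_order_relaxed);
}

/* Debug helper: yields a plain void* that can be passed to "%p". */
#define DEMO_FREELIST_DEBUG_PTR(meta) demo_load_relaxed(meta)
```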
---
## Troubleshooting
### Issue: Compilation Fails
```bash
# Check if accessor header exists
ls -la core/box/slab_freelist_atomic.h
# Check for missing includes
grep -n "#include.*slab_freelist_atomic.h" core/tiny_superslab_alloc.inc.h
# Rebuild from clean state
make clean && make bench_random_mixed_hakmem
```
### Issue: Larson 8T Still Crashes
```bash
# Check conversion progress
./scripts/verify_atomic_freelist_conversion.sh
# Run with TSan to detect data races
./build.sh tsan larson_hakmem
./out/tsan/larson_hakmem 4 10000 256 2>&1 | grep -A5 "WARNING"
# Check for double-POP/PUSH bugs
grep -A1 "slab_freelist_pop_lockfree" core/ -r | grep "tiny_next_read"
grep -B1 "slab_freelist_push_lockfree" core/ -r | grep "tiny_next_write"
```
### Issue: Performance Regression >5%
```bash
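# Verify baseline (before conversion)
git stash                                     # set aside uncommitted changes
git checkout master
make clean && make bench_random_mixed_hakmem  # rebuild so the binary matches master
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Record: 25.1M ops/s
# Check converted version
git checkout atomic-freelist-phase1
git stash pop                                 # restore stashed changes (skip if nothing was stashed)
make clean && make bench_random_mixed_hakmem  # rebuild the converted code
./out/release/bench_random_mixed_hakmem 10000000 256 42
# Should be: >24.0M ops/s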
# If regression >5%, profile hot paths
perf record ./out/release/bench_random_mixed_hakmem 1000000 256 42
perf report
# Look for CAS retry loops or excessive memory ordering
```
---
## Rollback Procedures
### Quick Rollback (if Phase 1 fails)
```bash
git stash
git checkout master
git branch -D atomic-freelist-phase1
# Review issues and retry
```
### Alternative Approach (Spinlock)
If lock-free proves too complex:
```c
// Option: use a 1-byte spinlock instead
// Add to TinySlabMeta: uint8_t freelist_lock;
// Acquire with __sync_lock_test_and_set(), release with __sync_lock_release()
// Expected overhead: 5-10% (vs 2-3% for lock-free)
```
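To make the spinlock option more concrete, here is a minimal sketch using the GCC/Clang `__sync` builtins. The struct and function names are hypothetical, the next pointer is assumed to live in the block's first word, and there is no backoff or yield in the spin loop, so treat it as an outline and not as a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TinySlabMeta with the extra lock byte. */
typedef struct {
    void *freelist;
    uint8_t freelist_lock; /* 0 = unlocked, 1 = locked */
} demo_slab_meta;

static void demo_lock(demo_slab_meta *meta) {
    /* Returns the previous value; acts as an acquire barrier. */
    while (__sync_lock_test_and_set(&meta->freelist_lock, 1)) {
        /* spin until the current holder releases the lock */
    }
}

static void demo_unlock(demo_slab_meta *meta) {
    /* Writes 0 with release semantics. */
    __sync_lock_release(&meta->freelist_lock);
}

static void *demo_pop_locked(demo_slab_meta *meta) {
    demo_lock(meta);
    void *block = meta->freelist;
    if (block != NULL) {
        meta->freelist = *(void **)block; /* advance to the next free block */
    }
    demo_unlock(meta);
    return block;
}

static void demo_push_locked(demo_slab_meta *meta, void *node) {
    demo_lock(meta);
    *(void **)node = meta->freelist; /* link node in front of the old head */
    meta->freelist = node;
    demo_unlock(meta);
}
```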
---
## Progress Tracking
Use the verification script to track progress:
```bash
./scripts/verify_atomic_freelist_conversion.sh
```
**Output example**:
```
Progress: 30% (27/90 sites)
[============----------------------------]
Phase 1 files converted: 2/4
Remaining sites: 63
Currently working on: Phase 1 (Critical Hot Paths)
Next step: Convert core/box/carve_push_box.c
```
---
## Success Criteria
### Phase 1 Complete
- [ ] 5 files converted (25 sites)
- [ ] Larson 8T runs 100K iterations without crash
- [ ] Single-threaded regression <5%
- [ ] No TSan warnings
- [ ] Verification script shows ~28% progress (25/90 sites)
### Phase 2 Complete
- [ ] 15 files converted (65 sites)
- [ ] All MT tests pass (1T-16T)
- [ ] Single-threaded regression <3%
- [ ] MT scaling 70%+
- [ ] Verification script shows 72% progress
### Phase 3 Complete
- [ ] 21 files converted (90 sites)
- [ ] Zero direct `meta->freelist` accesses (outside `slab_freelist_atomic.h`)
- [ ] Full test suite passes
- [ ] Documentation updated (CLAUDE.md)
- [ ] Verification script shows 100% progress
---
## File Checklist
### Documentation
- [x] `ATOMIC_FREELIST_SUMMARY.md` - Executive summary
- [x] `ATOMIC_FREELIST_IMPLEMENTATION_STRATEGY.md` - Technical strategy
- [x] `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md` - Conversion guide
- [x] `ATOMIC_FREELIST_QUICK_START.md` - Quick start instructions
- [x] `ATOMIC_FREELIST_INDEX.md` - This file
### Templates
- [x] `core/box/slab_freelist_atomic.h.TEMPLATE` - Accessor API
### Tools
- [x] `scripts/analyze_freelist_sites.sh` - Site analysis
- [x] `scripts/verify_atomic_freelist_conversion.sh` - Progress tracker
### Implementation (to be created)
- [ ] `core/box/slab_freelist_atomic.h` - Working accessor API
---
## Contact and Support
If you encounter issues during implementation:
1. **Check documentation**: Review relevant guide for your current phase
2. **Run verification**: `./scripts/verify_atomic_freelist_conversion.sh`
3. **Review common pitfalls**: See the "Common pitfalls" section of `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md`
4. **Rollback if needed**: `git checkout master`
---
## Estimated Timeline
| Milestone | Duration | Cumulative |
|-----------|----------|------------|
| **Preparation** | 15 min | 0.25h |
| **Create accessor header** | 30 min | 0.75h |
| **Phase 1 conversion** | 2-3h | 3-4h |
| **Phase 1 testing** | 30 min | 3.5-4.5h |
| **Phase 2 conversion** | 2-3h | 5.5-7.5h |
| **Phase 2 testing** | 1h | 6.5-8.5h |
| **Phase 3 conversion** | 1-2h | 7.5-10.5h |
| **Phase 3 testing** | 1h | 8.5-11.5h |
| **Total** | | **8.5-11.5h** |

**Minimal viable**: 3.5-4.5 hours (Phase 1 only, fixes Larson crash)
**Full implementation**: 8.5-11.5 hours (all 3 phases, complete MT safety)
---
## Next Steps
**Ready to start?**
1. Read `ATOMIC_FREELIST_QUICK_START.md` (15 min)
2. Run `./scripts/analyze_freelist_sites.sh` (5 min)
3. Copy template: `cp core/box/slab_freelist_atomic.h.TEMPLATE core/box/slab_freelist_atomic.h` (5 min)
4. Edit the copied header to add the required includes (20 min)
5. Test compile: `make bench_random_mixed_hakmem` (5 min)
6. Begin Phase 1 conversion using `ATOMIC_FREELIST_SITE_BY_SITE_GUIDE.md` (2-3 hours)
**Good luck!** 🚀