# Phase 6.25-6.27: Quick Reference Guide

**Target**: Improve Mid Pool from 47% to 61-68% of mimalloc (4T)

---

## 📋 Implementation Checklist

### Phase 6.25: Refill Batching (~6 hours)

**Goal**: Reduce refill latency by allocating 2-4 pages at once

```bash
# New function in hakmem_pool.c
static int alloc_tls_page_batch(
    int class_idx, int batch_size,
    PoolTLSPage* slots[], int num_slots,
    PoolTLSRing* ring, PoolTLSBin* bin
);

# New env var
HAKMEM_POOL_REFILL_BATCH=2  # Default (conservative)
HAKMEM_POOL_REFILL_BATCH=4  # Aggressive
```

**Files**: `hakmem_pool.c` (~116 LOC)

**Expected**: +10-15% (Mid 1T)
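
A minimal sketch of how the `HAKMEM_POOL_REFILL_BATCH` value could be parsed and clamped to the documented 1-4 range with a default of 2. The helper names `parse_refill_batch()` and `pool_init_refill_batch()` are hypothetical. Only the env var, its range, and the global `g_pool_refill_batch_size` come from this guide, and the real parsing in `hak_pool_init()` may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Documented range and default for HAKMEM_POOL_REFILL_BATCH. */
enum { REFILL_BATCH_MIN = 1, REFILL_BATCH_MAX = 4, REFILL_BATCH_DEFAULT = 2 };

/* Hypothetical helper: turn the raw env string into a batch size.
 * NULL, empty, or non-numeric input falls back to the default;
 * numeric values outside [1, 4] are clamped. */
int parse_refill_batch(const char* s) {
    if (s == NULL || *s == '\0') return REFILL_BATCH_DEFAULT;
    char* end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return REFILL_BATCH_DEFAULT;
    if (v < REFILL_BATCH_MIN) return REFILL_BATCH_MIN;
    if (v > REFILL_BATCH_MAX) return REFILL_BATCH_MAX;
    return (int)v;
}

/* Global named in the troubleshooting notes; assumed to be set once at init. */
int g_pool_refill_batch_size = REFILL_BATCH_DEFAULT;

/* Hypothetical init hook, standing in for the parse step in hak_pool_init(). */
void pool_init_refill_batch(void) {
    g_pool_refill_batch_size =
        parse_refill_batch(getenv("HAKMEM_POOL_REFILL_BATCH"));
    assert(g_pool_refill_batch_size >= REFILL_BATCH_MIN &&
           g_pool_refill_batch_size <= REFILL_BATCH_MAX);
}
```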

---

### Phase 6.26: Lock-Free Refill (~11 hours)

**Goal**: Replace mutex with atomic CAS on freelist

```bash
# Replace in hakmem_pool.c
- PaddedMutex freelist_locks[POOL_NUM_CLASSES][POOL_NUM_SHARDS];
+ atomic_uintptr_t freelist_head[POOL_NUM_CLASSES][POOL_NUM_SHARDS];
+ atomic_uint freelist_count[POOL_NUM_CLASSES][POOL_NUM_SHARDS];

# New functions
freelist_pop_lockfree()
freelist_push_lockfree()
freelist_push_batch_lockfree()
drain_remote_lockfree()
```

**Files**: `hakmem_pool.c` (~140 LOC, net ~100)

**Expected**: +15-20% (Mid 4T)
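
A rough sketch of what the CAS-based push and pop could look like with C11 atomics, shown for a single (class, shard) slot. The `PoolBlock` and `FreelistShard` types and the function signatures are assumptions made for illustration; only the function names and the `atomic_uintptr_t` head plus `atomic_uint` count come from this guide. This is a plain Treiber stack: the pop path is exposed to the ABA problem if a block can be popped and re-pushed while another thread is mid-CAS, so a real implementation would need a tagged pointer or similar guard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block layout: the first word of a free block links to the next. */
typedef struct PoolBlock { struct PoolBlock* next; } PoolBlock;

/* One (class, shard) slot; the guide keeps these as [class][shard] arrays. */
typedef struct {
    atomic_uintptr_t head;   /* top of the free stack, 0 when empty */
    atomic_uint      count;  /* approximate length, updated after each op */
} FreelistShard;

void freelist_push_lockfree(FreelistShard* fl, PoolBlock* b) {
    assert(fl != NULL && b != NULL);
    uintptr_t old = atomic_load_explicit(&fl->head, memory_order_relaxed);
    do {
        b->next = (PoolBlock*)old;  /* link before publishing */
    } while (!atomic_compare_exchange_weak_explicit(
                 &fl->head, &old, (uintptr_t)b,
                 memory_order_release, memory_order_relaxed));
    atomic_fetch_add_explicit(&fl->count, 1u, memory_order_relaxed);
}

PoolBlock* freelist_pop_lockfree(FreelistShard* fl) {
    assert(fl != NULL);
    uintptr_t old = atomic_load_explicit(&fl->head, memory_order_acquire);
    while (old != 0) {
        PoolBlock* top = (PoolBlock*)old;
        /* ABA hazard: top may have been popped and re-pushed by now. */
        uintptr_t next = (uintptr_t)top->next;
        if (atomic_compare_exchange_weak_explicit(
                &fl->head, &old, next,
                memory_order_acquire, memory_order_acquire)) {
            atomic_fetch_sub_explicit(&fl->count, 1u, memory_order_relaxed);
            return top;
        }
        /* On failure, old was reloaded with the current head; retry. */
    }
    return NULL;  /* empty */
}
```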

---

### Phase 6.27: Learner Integration (~5 hours)

**Goal**: Dynamic CAP/W_MAX tuning based on runtime stats

```bash
# Enable existing learner
HAKMEM_LEARN=1
HAKMEM_TARGET_HIT_MID=0.65
HAKMEM_CAP_STEP_MID=8
HAKMEM_CAP_MAX_MID=512

# Optional: W_MAX learning (risky)
HAKMEM_WMAX_LEARN=1
HAKMEM_WMAX_CANDIDATES_MID=1.4,1.5,1.6,1.7
HAKMEM_WMAX_CANARY=1  # Safe exploration
```

**Files**: `hakmem_ace.c` (+15 LOC), `hakmem_learner.c` (+10 LOC)

**Expected**: +5-10% (all workloads)

---

## 🚀 Quick Start (Implementation Order)

### Week 1: Batching + Learner (Parallel)

**Day 1-2: Phase 6.25**
```bash
# 1. Implement batch function
cd /home/tomoaki/git/hakmem
vim hakmem_pool.c  # Add alloc_tls_page_batch() after line 486

# 2. Integrate into alloc path
vim hakmem_pool.c  # Modify line 931 (refill call site)

# 3. Add env var
vim hakmem_pool.c  # Add global + parse in hak_pool_init()

# 4. Test
make clean && make
HAKMEM_POOL_REFILL_BATCH=2 ./test_pool_basic
HAKMEM_POOL_REFILL_BATCH=2 RUNTIME=10 THREADS=1 ./scripts/run_bench_suite.sh
```

**Day 2-3: Phase 6.27**
```bash
# 1. Add ACE waste tracking
vim hakmem_ace.c  # Add hak_ace_get_total_waste()

# 2. Update learner score
vim hakmem_learner.c  # Line 414, add frag penalty

# 3. Test
HAKMEM_LEARN=1 HAKMEM_TARGET_HIT_MID=0.70 RUNTIME=60 THREADS=1,4 \
  ./scripts/run_bench_suite.sh
```

### Week 2: Lock-Free

**Day 1-3: Phase 6.26**
```bash
# 1. Replace data structures
vim hakmem_pool.c  # Lines 276-280, atomics

# 2. Implement lock-free ops
vim hakmem_pool.c  # Add 3 new functions

# 3. Integrate
vim hakmem_pool.c  # Replace lock/unlock with CAS

# 4. Test (CRITICAL: TSan)
make clean && make CFLAGS="-fsanitize=thread"
THREADS=16 DURATION=60 ./test_pool_lockfree_stress

# 5. Benchmark
RUNTIME=10 THREADS=4 ./scripts/run_bench_suite.sh
```

---

## 📊 Expected Results

| Phase | Mid 1T | Mid 4T | vs mimalloc (1T) | vs mimalloc (4T) |
|-------|--------|--------|------------------|------------------|
| Baseline (6.21) | 4.0 M/s | 13.8 M/s | 28% | 47% |
| + 6.25 (Batch) | 4.5 M/s | 14.5 M/s | 31% | 49% |
| + 6.26 (Lock-Free) | 4.6 M/s | 17.0 M/s | 32% | 58% |
| + 6.27 (Learner) | 5.0 M/s | 18.5 M/s | 34% | **63%** ✅ |
| **Target (60-75%)** | 8.8-11.0 M/s | 17.7-22.1 M/s | 60-75% | 60-75% |

- ✅ **4T target met** (projected 63%, within the 61-68% range)
- ❌ **1T still short** (projected 34% vs. a 60% target; needs Phase 6.28: header elimination)
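
The table gives percentages but not the mimalloc throughput they are measured against. As a sanity check, the reference can be backed out of the Target row: 60% corresponds to 8.8 M/s at 1T and 17.7 M/s at 4T, which implies roughly 14.7 M/s and 29.5 M/s for mimalloc. The two helpers below are hypothetical and only restate that arithmetic. With them, 18.5 M/s at 4T comes out at about 62.7% and 5.0 M/s at 1T at about 34.1%, matching the rounded table entries.

```c
#include <assert.h>

/* Reference throughput implied by "hakmem_mops is `ratio` of the reference".
 * Throughputs are in M ops/s; ratio is a fraction in (0, 1]. */
double implied_reference_mops(double hakmem_mops, double ratio) {
    assert(hakmem_mops > 0.0 && ratio > 0.0 && ratio <= 1.0);
    return hakmem_mops / ratio;
}

/* Percentage of the reference reached at a given throughput. */
double percent_of_reference(double hakmem_mops, double reference_mops) {
    assert(hakmem_mops >= 0.0 && reference_mops > 0.0);
    return 100.0 * hakmem_mops / reference_mops;
}
```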

---

## 🧪 Testing Commands

### Correctness Tests

```bash
# Unit test (per phase)
./test_pool_refill_batch   # Phase 6.25
./test_pool_lockfree       # Phase 6.26
./test_pool_learner        # Phase 6.27

# Memory safety
valgrind --leak-check=full ./test_pool_refill_batch
make clean && make CFLAGS="-fsanitize=address"
./test_pool_lockfree

# Thread safety (Phase 6.26 CRITICAL)
make clean && make CFLAGS="-fsanitize=thread"
THREADS=16 DURATION=60 ./test_pool_lockfree_stress
```

### Performance Tests

```bash
# Quick test (3 sec)
RUNTIME=3 THREADS=1,4 ./scripts/run_bench_suite.sh

# Full test (10 sec, production)
RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh

# Stress test (60 sec, stability)
RUNTIME=60 THREADS=1,4,8 ./scripts/run_bench_suite.sh

# Head-to-head comparison
./scripts/head_to_head_large.sh  # vs mimalloc
```

### A/B Testing

```bash
# Baseline (batch=1, no learner)
HAKMEM_POOL_REFILL_BATCH=1 HAKMEM_LEARN=0 \
  RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh \
  > baseline.txt

# Phase 6.25 (batch=2)
HAKMEM_POOL_REFILL_BATCH=2 HAKMEM_LEARN=0 \
  RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh \
  > phase_6_25.txt

# Phase 6.27 (learner)
HAKMEM_POOL_REFILL_BATCH=2 HAKMEM_LEARN=1 HAKMEM_TARGET_HIT_MID=0.65 \
  RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh \
  > phase_6_27.txt

# Compare
grep "Throughput" baseline.txt phase_6_25.txt phase_6_27.txt
```

---

## 🔧 Troubleshooting

### Phase 6.25 Issues

**Symptom**: No performance improvement
- Check: `g_pool_refill_batch_size` value (should be 2-4)
- Check: Pages allocated counter (should increase in batches)
- Check: Ring buffer filling (should hit RING_CAP more often)

**Symptom**: Memory bloat
- Reduce: `HAKMEM_POOL_REFILL_BATCH=2` (from 4)
- Check: Respect CAP limits in batch allocator
- Check: No memory leaks (valgrind)

### Phase 6.26 Issues

**Symptom**: Crash/hang
- Run: ThreadSanitizer (TSan) to find races
- Check: CAS loop cannot spin forever (add a retry limit)
- Check: Memory ordering (acquire/release)

**Symptom**: Slower than mutex version
- Check: CAS retry rate (should be <5%)
- Check: Single-thread overhead (should be minimal)
- Add: Exponential backoff after N retries
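
The retry limit and backoff suggested above might look like the following bounded push. Everything here is illustrative: `freelist_push_bounded()`, `cpu_relax()`, and the three tuning constants are hypothetical names and values, not part of hakmem. The failed-CAS count it reports is the kind of number a retry-rate check (<5%) would be built on. The first few retries are immediate; after that the spin delay doubles up to a cap, and the function gives up after `CAS_RETRY_LIMIT` failures so the caller can take a fallback path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct PoolBlock { struct PoolBlock* next; } PoolBlock;

/* Illustrative tuning knobs, not values taken from hakmem. */
enum { CAS_SPIN_FREE = 4, CAS_RETRY_LIMIT = 64, BACKOFF_MAX_SPINS = 1024 };

/* Portable stand-in for a CPU pause hint (e.g. _mm_pause on x86). */
static inline void cpu_relax(void) {
    atomic_signal_fence(memory_order_seq_cst);
}

/* Push with a bounded CAS loop. Returns false once CAS_RETRY_LIMIT attempts
 * have failed, leaving b unpublished. *failed_cas reports the number of
 * failed attempts so the caller can maintain a retry-rate counter. */
bool freelist_push_bounded(atomic_uintptr_t* head, PoolBlock* b,
                           unsigned* failed_cas) {
    assert(head != NULL && b != NULL && failed_cas != NULL);
    unsigned failures = 0, spins = 1;
    uintptr_t old = atomic_load_explicit(head, memory_order_relaxed);
    for (;;) {
        b->next = (PoolBlock*)old;
        if (atomic_compare_exchange_weak_explicit(
                head, &old, (uintptr_t)b,
                memory_order_release, memory_order_relaxed)) {
            *failed_cas = failures;
            return true;
        }
        if (++failures >= CAS_RETRY_LIMIT) {
            *failed_cas = failures;
            return false;
        }
        if (failures > CAS_SPIN_FREE) {   /* first few retries are immediate */
            for (unsigned i = 0; i < spins; i++) cpu_relax();
            if (spins < BACKOFF_MAX_SPINS) spins *= 2;  /* exponential, capped */
        }
    }
}
```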

**Symptom**: Lost blocks (counter mismatch)
- Check: Batch push count matches list length
- Check: No concurrent modification during CAS
- Add: Invariant checks (debug build)
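
One possible shape for the debug-build invariant: before a batch push publishes a chain, verify that the caller's count matches the actual chain length. The signature of `freelist_push_batch_lockfree()` and the `chain_length()` helper are assumptions for this sketch; the guide only names the function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct PoolBlock { struct PoolBlock* next; } PoolBlock;

/* Count nodes from first through last inclusive.
 * Returns 0 if last is not reachable from first. */
unsigned chain_length(const PoolBlock* first, const PoolBlock* last) {
    unsigned n = 0;
    for (const PoolBlock* p = first; p != NULL; p = p->next) {
        n++;
        if (p == last) return n;
    }
    return 0;
}

/* Splice the private chain first..last (n blocks) onto the shared list. */
void freelist_push_batch_lockfree(atomic_uintptr_t* head, atomic_uint* count,
                                  PoolBlock* first, PoolBlock* last,
                                  unsigned n) {
    /* Debug invariants; compiled out when NDEBUG is defined. */
    assert(head != NULL && count != NULL && first != NULL && last != NULL);
    assert(chain_length(first, last) == n);

    uintptr_t old = atomic_load_explicit(head, memory_order_relaxed);
    do {
        last->next = (PoolBlock*)old;  /* tail of the batch points at old head */
    } while (!atomic_compare_exchange_weak_explicit(
                 head, &old, (uintptr_t)first,
                 memory_order_release, memory_order_relaxed));
    atomic_fetch_add_explicit(count, n, memory_order_relaxed);
}
```

Because the chain is still private to the pushing thread when `chain_length()` walks it, the check itself needs no synchronization.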

### Phase 6.27 Issues

**Symptom**: CAP oscillation
- Increase: `HAKMEM_CAP_DWELL_SEC_MID=5` (from 3)
- Increase: `HAKMEM_LEARN_MIN_SAMPLES=512` (from 256)
- Widen: Target band (0.65 ± 0.03 → 0.65 ± 0.05)

**Symptom**: No CAP changes
- Check: Hit rate is outside the target band (a change needs a delta of more than ±3%)
- Check: Sufficient samples (≥256 per window)
- Lower: `HAKMEM_TARGET_HIT_MID=0.60` (from 0.65)

**Symptom**: W_MAX instability
- Enable: `HAKMEM_WMAX_CANARY=1` (safe exploration)
- Increase: `HAKMEM_WMAX_TRIAL_SEC=10` (from 5)
- Narrow: Candidate range (1.4-1.7 → 1.5-1.6)

---

## 📝 Environment Variables Reference

### Phase 6.25: Batching

| Variable | Default | Range | Description |
|----------|---------|-------|-------------|
| `HAKMEM_POOL_REFILL_BATCH` | 2 | 1-4 | Pages per refill (1=baseline) |

### Phase 6.26: Lock-Free

(No new env vars, pure implementation change)

### Phase 6.27: Learner

| Variable | Default | Range | Description |
|----------|---------|-------|-------------|
| `HAKMEM_LEARN` | 0 | 0-1 | Enable learner (0=off, 1=on) |
| `HAKMEM_TARGET_HIT_MID` | 0.65 | 0.5-0.9 | Target hit rate for Mid Pool |
| `HAKMEM_CAP_STEP_MID` | 4 | 1-16 | CAP increment/decrement size |
| `HAKMEM_CAP_MIN_MID` | 8 | 4-64 | Minimum CAP per class |
| `HAKMEM_CAP_MAX_MID` | 2048 | 128-4096 | Maximum CAP per class |
| `HAKMEM_CAP_DWELL_SEC_MID` | 3 | 1-10 | Stability period (sec) |
| `HAKMEM_LEARN_WINDOW_MS` | 1000 | 500-5000 | Sampling interval (ms) |
| `HAKMEM_LEARN_MIN_SAMPLES` | 256 | 64-1024 | Min samples to trigger update |

**W_MAX Learning** (Optional):

| Variable | Default | Range | Description |
|----------|---------|-------|-------------|
| `HAKMEM_WMAX_LEARN` | 0 | 0-1 | Enable W_MAX exploration |
| `HAKMEM_WMAX_CANDIDATES_MID` | 1.4,1.6,... | CSV list | W_MAX values to try |
| `HAKMEM_WMAX_CANARY` | 1 | 0-1 | Safe exploration (1=on) |
| `HAKMEM_WMAX_TRIAL_SEC` | 5 | 3-15 | Canary trial duration (sec) |
| `HAKMEM_WMAX_ADOPT_PCT` | 0.01 | 0.005-0.05 | Adoption threshold (0.01 = 1%) |

---

## 🎯 Success Criteria

### Must-Have (Release Blockers)

- ✅ No crashes in 60-sec stress test (16T)
- ✅ No memory leaks (valgrind clean)
- ✅ No data races (TSan clean)
- ✅ Mid 4T: ≥17.0 M/s (≥58% of mimalloc)

### Should-Have (Quality Bar)

- ✅ Mid 1T: ≥4.5 M/s (≥31% of mimalloc)
- ✅ Memory footprint: ≤30 MB baseline
- ✅ No regression on Tiny/Large (<5%)

### Nice-to-Have (Stretch Goals)

- ✅ Mid 4T: ≥18.5 M/s (≥63% of mimalloc) ← **TARGET**
- ✅ Learner converges in <60 sec
- ✅ W_MAX learning finds better value

---

## 📚 Related Documents

- **Full Plan**: `PHASE_6.25_6.27_IMPLEMENTATION_PLAN.md` (this directory)
- **Previous Results**: `PHASE_6.21_RESULTS_2025_10_24.md`
- **Env Vars**: `../specs/ENV_VARS.md`
- **Benchmarks**: `../benchmarks/README.md`

---

**Last Updated**: 2025-10-24
**Status**: Ready for Implementation