# Phase 6.25-6.27: Quick Reference Guide

**Target**: Improve Mid Pool throughput from 47% to 61-68% of mimalloc (4T)

---

## 📋 Implementation Checklist

### Phase 6.25: Refill Batching (~6 hours)

**Goal**: Reduce refill latency by allocating 2-4 pages at once

```c
/* New function in hakmem_pool.c */
static int alloc_tls_page_batch(
    int class_idx,
    int batch_size,
    PoolTLSPage* slots[],
    int num_slots,
    PoolTLSRing* ring,
    PoolTLSBin* bin
);

/* New env var:
 *   HAKMEM_POOL_REFILL_BATCH=2   Default (conservative)
 *   HAKMEM_POOL_REFILL_BATCH=4   Aggressive
 */
```

**Files**: `hakmem_pool.c` (~116 LOC)
**Expected**: +10-15% (Mid 1T)

---

### Phase 6.26: Lock-Free Refill (~11 hours)

**Goal**: Replace mutex with atomic CAS on freelist

```diff
# Replace in hakmem_pool.c
- PaddedMutex freelist_locks[POOL_NUM_CLASSES][POOL_NUM_SHARDS];
+ atomic_uintptr_t freelist_head[POOL_NUM_CLASSES][POOL_NUM_SHARDS];
+ atomic_uint freelist_count[POOL_NUM_CLASSES][POOL_NUM_SHARDS];

# New functions
freelist_pop_lockfree()
freelist_push_lockfree()
freelist_push_batch_lockfree()
drain_remote_lockfree()
```

**Files**: `hakmem_pool.c` (~140 LOC, net ~100)
**Expected**: +15-20% (Mid 4T)

---

### Phase 6.27: Learner Integration (~5 hours)

**Goal**: Dynamic CAP/W_MAX tuning based on runtime stats

```bash
# Enable existing learner
HAKMEM_LEARN=1
HAKMEM_TARGET_HIT_MID=0.65
HAKMEM_CAP_STEP_MID=8
HAKMEM_CAP_MAX_MID=512

# Optional: W_MAX learning (risky)
HAKMEM_WMAX_LEARN=1
HAKMEM_WMAX_CANDIDATES_MID=1.4,1.5,1.6,1.7
HAKMEM_WMAX_CANARY=1  # Safe exploration
```

**Files**: `hakmem_ace.c` (+15 LOC), `hakmem_learner.c` (+10 LOC)
**Expected**: +5-10% (all workloads)

---

## 🚀 Quick Start (Implementation Order)

### Week 1: Batching + Learner (Parallel)

**Day 1-2: Phase 6.25**

```bash
# 1. Implement batch function
cd /home/tomoaki/git/hakmem
vim hakmem_pool.c  # Add alloc_tls_page_batch() after line 486

# 2. Integrate into alloc path
vim hakmem_pool.c  # Modify line 931 (refill call site)

# 3. Add env var
vim hakmem_pool.c  # Add global + parse in hak_pool_init()

# 4. Test
make clean && make
HAKMEM_POOL_REFILL_BATCH=2 ./test_pool_basic
HAKMEM_POOL_REFILL_BATCH=2 RUNTIME=10 THREADS=1 ./scripts/run_bench_suite.sh
```

**Day 2-3: Phase 6.27**

```bash
# 1. Add ACE waste tracking
vim hakmem_ace.c  # Add hak_ace_get_total_waste()

# 2. Update learner score
vim hakmem_learner.c  # Line 414, add frag penalty

# 3. Test
HAKMEM_LEARN=1 HAKMEM_TARGET_HIT_MID=0.70 RUNTIME=60 THREADS=1,4 \
  ./scripts/run_bench_suite.sh
```

### Week 2: Lock-Free

**Day 1-3: Phase 6.26**

```bash
# 1. Replace data structures
vim hakmem_pool.c  # Lines 276-280, atomics

# 2. Implement lock-free ops
vim hakmem_pool.c  # Add the lock-free functions listed in the checklist

# 3. Integrate
vim hakmem_pool.c  # Replace lock/unlock with CAS

# 4. Test (CRITICAL: TSan)
make clean && make CFLAGS="-fsanitize=thread"
THREADS=16 DURATION=60 ./test_pool_lockfree_stress

# 5. Benchmark
RUNTIME=10 THREADS=4 ./scripts/run_bench_suite.sh
```

---

## 📊 Expected Results

| Phase | Mid 1T | Mid 4T | vs mimalloc (1T) | vs mimalloc (4T) |
|-------|--------|--------|------------------|------------------|
| Baseline (6.21) | 4.0 M/s | 13.8 M/s | 28% | 47% |
| + 6.25 (Batch) | 4.5 M/s | 14.5 M/s | 31% | 49% |
| + 6.26 (Lock-Free) | 4.6 M/s | 17.0 M/s | 32% | 58% |
| + 6.27 (Learner) | 5.0 M/s | 18.5 M/s | 34% | **63%** ✅ |
| **Target (60-75%)** | 8.8-11.0 M/s | 17.7-22.1 M/s | 60-75% | 60-75% |

✅ **4T target expected to be met** (61-68% range)
❌ **1T expected to remain short** (needs Phase 6.28: header elimination)

---

## 🧪 Testing Commands

### Correctness Tests

```bash
# Unit test (per phase)
./test_pool_refill_batch  # Phase 6.25
./test_pool_lockfree      # Phase 6.26
./test_pool_learner       # Phase 6.27

# Memory safety
valgrind --leak-check=full ./test_pool_refill_batch
make clean && make CFLAGS="-fsanitize=address"
./test_pool_lockfree

# Thread safety (Phase 6.26 CRITICAL)
make clean && make CFLAGS="-fsanitize=thread"
THREADS=16 DURATION=60 ./test_pool_lockfree_stress
```

### Performance Tests

```bash
# Quick test (3 sec)
RUNTIME=3 THREADS=1,4 ./scripts/run_bench_suite.sh

# Full test (10 sec, production)
RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh

# Stress test (60 sec, stability)
RUNTIME=60 THREADS=1,4,8 ./scripts/run_bench_suite.sh

# Head-to-head comparison
./scripts/head_to_head_large.sh  # vs mimalloc
```

### A/B Testing

```bash
# Baseline (batch=1, no learner)
HAKMEM_POOL_REFILL_BATCH=1 HAKMEM_LEARN=0 \
RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh \
  > baseline.txt

# Phase 6.25 (batch=2)
HAKMEM_POOL_REFILL_BATCH=2 HAKMEM_LEARN=0 \
RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh \
  > phase_6_25.txt

# Phase 6.27 (learner)
HAKMEM_POOL_REFILL_BATCH=2 HAKMEM_LEARN=1 HAKMEM_TARGET_HIT_MID=0.65 \
RUNTIME=10 THREADS=1,4 ./scripts/run_bench_suite.sh \
  > phase_6_27.txt

# Compare
grep "Throughput" baseline.txt phase_6_25.txt phase_6_27.txt
```

---

## 🔧 Troubleshooting

### Phase 6.25 Issues

**Symptom**: No performance improvement
- Check: `g_pool_refill_batch_size` value (should be 2-4)
- Check: Pages allocated counter (should increase in batches)
- Check: Ring buffer filling (should hit RING_CAP more often)

**Symptom**: Memory bloat
- Reduce: `HAKMEM_POOL_REFILL_BATCH=2` (from 4)
- Check: Batch allocator respects CAP limits
- Check: No memory leaks (valgrind)

### Phase 6.26 Issues

**Symptom**: Crash/hang
- Run: ThreadSanitizer (TSan) to find races
- Check: CAS loop cannot spin forever (add a retry limit)
- Check: Memory ordering (acquire/release)

**Symptom**: Slower than mutex version
- Check: CAS retry rate (should be <5%)
- Check: Single-thread overhead (should be minimal)
- Add: Exponential backoff after N retries

**Symptom**: Lost blocks (counter mismatch)
- Check: Batch push count matches list length
- Check: No concurrent modification during CAS
- Add: Invariant checks (debug build)

### Phase 6.27 Issues

**Symptom**: CAP oscillation
- Increase: `HAKMEM_CAP_DWELL_SEC_MID=5` (from 3)
- Increase: `HAKMEM_LEARN_MIN_SAMPLES=512` (from 256)
- Widen: Target band (0.65 ± 0.03 → 0.65 ± 0.05)

**Symptom**: No CAP changes
- Check: Hit rate is outside the target band (needs a delta greater than ±3%)
- Check: Sufficient samples (≥256 per window)
- Lower: `HAKMEM_TARGET_HIT_MID=0.60` (from 0.65)

**Symptom**: W_MAX instability
- Enable: `HAKMEM_WMAX_CANARY=1` (safe exploration)
- Increase: `HAKMEM_WMAX_TRIAL_SEC=10` (from 5)
- Narrow: Candidate range (1.4-1.7 → 1.5-1.6)

---

## 📝 Environment Variables Reference

### Phase 6.25: Batching

| Variable | Default | Range | Description |
|----------|---------|-------|-------------|
| `HAKMEM_POOL_REFILL_BATCH` | 2 | 1-4 | Pages per refill (1=baseline) |

### Phase 6.26: Lock-Free

(No new env vars, pure implementation change)

### Phase 6.27: Learner

| Variable | Default | Range | Description |
|----------|---------|-------|-------------|
| `HAKMEM_LEARN` | 0 | 0-1 | Enable learner (0=off, 1=on) |
| `HAKMEM_TARGET_HIT_MID` | 0.65 | 0.5-0.9 | Target hit rate for Mid Pool |
| `HAKMEM_CAP_STEP_MID` | 4 | 1-16 | CAP increment/decrement size |
| `HAKMEM_CAP_MIN_MID` | 8 | 4-64 | Minimum CAP per class |
| `HAKMEM_CAP_MAX_MID` | 2048 | 128-4096 | Maximum CAP per class |
| `HAKMEM_CAP_DWELL_SEC_MID` | 3 | 1-10 | Stability period (sec) |
| `HAKMEM_LEARN_WINDOW_MS` | 1000 | 500-5000 | Sampling interval (ms) |
| `HAKMEM_LEARN_MIN_SAMPLES` | 256 | 64-1024 | Min samples to trigger update |

**W_MAX Learning** (Optional):

| Variable | Default | Range | Description |
|----------|---------|-------|-------------|
| `HAKMEM_WMAX_LEARN` | 0 | 0-1 | Enable W_MAX exploration |
| `HAKMEM_WMAX_CANDIDATES_MID` | 1.4,1.6,... | CSV list | W_MAX values to try |
| `HAKMEM_WMAX_CANARY` | 1 | 0-1 | Safe exploration (1=on) |
| `HAKMEM_WMAX_TRIAL_SEC` | 5 | 3-15 | Canary trial duration |
| `HAKMEM_WMAX_ADOPT_PCT` | 0.01 | 0.005-0.05 | Adoption threshold (1%) |

---

## 🎯 Success Criteria

### Must-Have (Release Blockers)

- ✅ No crashes in 60-sec stress test (16T)
- ✅ No memory leaks (valgrind clean)
- ✅ No data races (TSan clean)
- ✅ Mid 4T: ≥17.0 M/s (≥58% of mimalloc)

### Should-Have (Quality Bar)

- ✅ Mid 1T: ≥4.5 M/s (≥31% of mimalloc)
- ✅ Memory footprint: ≤30 MB baseline
- ✅ No regression on Tiny/Large (<5%)

### Nice-to-Have (Stretch Goals)

- ✅ Mid 4T: ≥18.5 M/s (≥63% of mimalloc) ← **TARGET**
- ✅ Learner converges in <60 sec
- ✅ W_MAX learning finds better value

---

## 📚 Related Documents

- **Full Plan**: `PHASE_6.25_6.27_IMPLEMENTATION_PLAN.md` (this directory)
- **Previous Results**: `PHASE_6.21_RESULTS_2025_10_24.md`
- **Env Vars**: `../specs/ENV_VARS.md`
- **Benchmarks**: `../benchmarks/README.md`

---

**Last Updated**: 2025-10-24
**Status**: Ready for Implementation
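
## Appendix: Lock-Free Freelist Sketch (Phase 6.26)

The Phase 6.26 checklist swaps the per-shard mutex for an atomic head pointer plus a counter. The block below is a minimal sketch of that idea in C11, not the actual `hakmem_pool.c` code: the `Block` and `LockFreeList` types and the `list_push`, `list_push_batch`, and `list_drain` names are hypothetical stand-ins chosen for illustration. It shows a CAS-loop push (single block and pre-linked batch) and a whole-list drain via atomic exchange. A single-node lock-free pop is deliberately left out, since a naive CAS pop is exposed to the ABA problem and would need extra protection such as a tagged pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-block layout: the first word links to the next block. */
typedef struct Block {
    struct Block *next;
} Block;

/* One shard's freelist: head stored as an integer, plus an advisory count. */
typedef struct {
    atomic_uintptr_t head;
    atomic_uint count;
} LockFreeList;

/* Push a pre-linked chain first..last (n blocks) with one CAS loop. */
static void list_push_batch(LockFreeList *l, Block *first, Block *last,
                            unsigned n) {
    assert(first != NULL && last != NULL && n > 0);
    uintptr_t old = atomic_load_explicit(&l->head, memory_order_relaxed);
    do {
        /* Splice the current list behind the chain; on CAS failure `old`
         * is refreshed with the latest head and we retry. */
        last->next = (Block *)old;
    } while (!atomic_compare_exchange_weak_explicit(
        &l->head, &old, (uintptr_t)first,
        memory_order_release, memory_order_relaxed));
    /* The count trails the head update, so it is only approximate
     * while other threads are active. */
    atomic_fetch_add_explicit(&l->count, n, memory_order_relaxed);
}

/* Push a single block. */
static void list_push(LockFreeList *l, Block *b) {
    list_push_batch(l, b, b, 1);
}

/* Detach the entire list in one atomic exchange and return its old head
 * (NULL if empty). Taking everything at once avoids the ABA hazard. */
static Block *list_drain(LockFreeList *l, unsigned *out_n) {
    Block *head = (Block *)atomic_exchange_explicit(
        &l->head, (uintptr_t)0, memory_order_acquire);
    unsigned n = 0;
    for (Block *b = head; b != NULL; b = b->next) {
        n++;
    }
    atomic_fetch_sub_explicit(&l->count, n, memory_order_relaxed);
    if (out_n != NULL) {
        *out_n = n;
    }
    return head;
}
```

A caller would initialize `head` and `count` to zero (for example with `atomic_init`), push freed blocks from any thread, and let the refilling thread call `list_drain` to take the whole chain. Comparing the drained length against the counter in a debug build is one way to implement the "batch push count matches list length" check from the troubleshooting list.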