Files
hakmem/docs/paper/ACE-Alloc/README.md
Moe Charm (CI) 5685c2f4c9 Implement Warm Pool Secondary Prefill Optimization (Phase B-2c Complete)
Problem: Warm pool had 0% hit rate (only 1 hit per 3976 misses) despite being
implemented, causing all cache misses to go through expensive superslab_refill
registry scans.

Root Cause Analysis:
- Warm pool was initialized once and pushed a single slab after each refill
- When that slab was exhausted, it was discarded (not pushed back)
- Next refill would push another single slab, which was immediately exhausted
- Pool would oscillate between 0 and 1 items, yielding 0% hit rate

Solution: Secondary Prefill on Cache Miss
When warm pool becomes empty, we now do multiple superslab_refills and prefill
the pool with 3 additional HOT superlslabs before attempting to carve. This
builds a working set of slabs that can sustain allocation pressure.

Implementation Details:
- Modified unified_cache_refill() cold path to detect empty pool
- Added prefill loop: when pool count == 0, load 3 extra superlslabs
- Store extra slabs in warm pool, keep 1 in TLS for immediate carving
- Track prefill events in g_warm_pool_stats[].prefilled counter

Results (1M Random Mixed 256B allocations):
- Before: C7 hits=1, misses=3976, hit_rate=0.0%
- After:  C7 hits=3929, misses=3143, hit_rate=55.6%
- Throughput: 4.055M ops/s (maintained vs 4.07M baseline)
- Stability: Consistent 55.6% hit rate at 5M allocations (4.102M ops/s)

Performance Impact:
- No regression: throughput remained stable at ~4.1M ops/s
- Registry scan avoided in 55.6% of cache misses (significant savings)
- Warm pool now functioning as intended with strong locality

Configuration:
- TINY_WARM_POOL_MAX_PER_CLASS increased from 4 to 16 to support prefill
- Prefill budget hardcoded to 3 (tunable via env var if needed later)
- All statistics always compiled, ENV-gated printing via HAKMEM_WARM_POOL_STATS=1

Next Steps:
- Monitor for further optimization opportunities (prefill budget tuning)
- Consider adaptive prefill budget based on class-specific hit rates
- Validate at larger allocation counts (10M+ pending registry size fix)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:31:54 +09:00

39 lines
902 B
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

ACEAlloc Paper (Draft)
=======================
Files
-----
- main.md — 論文本文のひな形Markdown
- figs/ — 図の置き場(必要に応じて作成)。
Build (optional)
----------------
Pandoc がある場合、PDF/HTML などへ変換できます。
```
# HTML
pandoc -s main.md -o paper.html
# PDF (pandoc + LaTeX 環境が必要)
pandoc -s main.md -o paper.pdf
```
Repro / Benchmarks
------------------
簡易スイープ(性能と RSS:
```
scripts/sweep_mem_perf.sh both | tee sweep.csv
```
代表的なベンチマークTiny / Mixed:
```
make bench_tiny_hot_hakmem bench_random_mixed_hakmem
HAKMEM_TINY_PROFILE=full ./bench_tiny_hot_hakmem 64 100 60000
HAKMEM_TINY_PROFILE=conservative ./bench_random_mixed_hakmem 2000000 400 42
```
環境変数やプロファイルの詳細は `docs/specs/ENV_VARS.md` を参照してください。