
Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History

Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-05 12:31:14 +09:00

2.6 KiB


ROADMAP (SACS‑3 edition)

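This roadmap lays out the shortest path to catching up with mimalloc while delivering both speed and memory efficiency. The hot path picks the tier from the request size alone (Tiny ≤1 KiB, Mid 1–32 KiB, Large/Big ≥64 KiB), and optimization work is confined to internal tuning of the Mid tier (CAP, bundle count, TC, ring, thresholds).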
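A minimal sketch of the size-only tier decision, assuming the boundaries are inclusive as written (Tiny up to 1 KiB, Mid up to 32 KiB, Large/Big from 64 KiB). The names `tier_t`, `select_tier`, and the boundary macros are hypothetical, and the 32–64 KiB band, which this paragraph does not assign, is returned as a separate `TIER_OTHER` value rather than guessed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tier names; boundaries follow the roadmap text. */
typedef enum {
    TIER_TINY,   /* size <= 1 KiB */
    TIER_MID,    /* 1 KiB < size <= 32 KiB */
    TIER_OTHER,  /* 32 KiB < size < 64 KiB: not assigned in this paragraph */
    TIER_LARGE   /* size >= 64 KiB (Large/Big) */
} tier_t;

#define TINY_MAX  ((size_t)1 << 10)   /* 1 KiB */
#define MID_MAX   ((size_t)32 << 10)  /* 32 KiB */
#define LARGE_MIN ((size_t)64 << 10)  /* 64 KiB */

/* Compile-time check that the boundaries are ordered. */
static_assert(TINY_MAX < MID_MAX && MID_MAX < LARGE_MIN, "tier boundaries out of order");

/* The tier depends on the request size only: no header read, no table lookup. */
static inline tier_t select_tier(size_t size) {
    if (size <= TINY_MAX) return TIER_TINY;
    if (size <= MID_MAX)  return TIER_MID;
    if (size < LARGE_MIN) return TIER_OTHER;
    return TIER_LARGE;
}
```

Because the decision reads nothing but `size`, it compiles to a couple of compares, which is what keeps tier selection cheap enough for the hot path.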

P0 (immediate: speed boost)

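Mid: go headerless (safely)
- Use a page descriptor (64 KiB page → {class_idx, owner_tid}) to look up the class and owner on free
- In hak_free_at, check for Mid before reading the header and go straight to the pool free path (safe even with HDR_LIGHT=2)
- Acceptance criteria: at least +10% on Mid 4T BURST, with no regressions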
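A minimal sketch of the page-descriptor lookup, assuming (hypothetically) that all Mid pages are carved from one contiguous, 64 KiB-aligned reservation, so a pointer maps to its descriptor with a subtraction and a shift. `mid_page_desc_t`, `mid_arena_init`, `mid_page_register`, `mid_lookup`, the arena size, and the `owner_tid == 0` "unused" convention are illustrative, not hakmem's actual code. The idea is that the free path calls a check like `mid_lookup` before reading any header; concurrent updates to the table are not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MID_PAGE_SHIFT 16                         /* 64 KiB pages */
#define MID_PAGE_SIZE  ((uintptr_t)1 << MID_PAGE_SHIFT)
#define MID_MAX_PAGES  1024                       /* hypothetical: 64 MiB arena */

/* One descriptor per 64 KiB page: enough to route a free without a header. */
typedef struct {
    uint16_t class_idx;
    uint32_t owner_tid;                           /* 0 = page not in use */
} mid_page_desc_t;

static uintptr_t       g_mid_base;
static mid_page_desc_t g_mid_desc[MID_MAX_PAGES];

static void mid_arena_init(uintptr_t base) {
    assert((base & (MID_PAGE_SIZE - 1)) == 0);    /* base must be 64 KiB aligned */
    g_mid_base = base;
}

/* Record class and owner when a page is handed to a thread. */
static void mid_page_register(uintptr_t page, uint16_t class_idx, uint32_t owner_tid) {
    size_t idx = (size_t)((page - g_mid_base) >> MID_PAGE_SHIFT);
    assert(idx < MID_MAX_PAGES);
    g_mid_desc[idx].class_idx = class_idx;
    g_mid_desc[idx].owner_tid = owner_tid;
}

/* Free-side check done before touching any header bytes. Returns true and
 * fills *out only if ptr lies in a registered Mid page; false means the
 * caller falls back to its usual header-based path. */
static bool mid_lookup(const void *ptr, mid_page_desc_t *out) {
    uintptr_t p = (uintptr_t)ptr;
    if (p < g_mid_base)
        return false;
    size_t idx = (size_t)((p - g_mid_base) >> MID_PAGE_SHIFT);
    if (idx >= MID_MAX_PAGES || g_mid_desc[idx].owner_tid == 0)
        return false;
    *out = g_mid_desc[idx];
    return true;
}
```

Since the lookup never dereferences `ptr`, it stays safe even when the block carries only a light header or none at all.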
Transfer Cache (TC) optimization
- A/B test HAKMEM_TC_DRAIN_MAX={32,64,128} × POOL_TLS_RING_CAP={8,16} × HAKMEM_TRYLOCK_PROBES={2,3}
- Acceptance criteria: at least +15% on Mid 4T BURST (baseline: current best)
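The A/B grid above is the Cartesian product of the three settings, 3 × 2 × 2 = 12 configurations. A small sketch that prints one environment-variable line per configuration and applies the +15% acceptance gate; `print_tc_grid` and `passes_gate` are hypothetical helpers, and launching each run is left to the existing scripts.

```c
#include <assert.h>
#include <stdio.h>

/* Values from the roadmap; the grid is their Cartesian product. */
static const int kDrainMax[] = {32, 64, 128};  /* HAKMEM_TC_DRAIN_MAX */
static const int kRingCap[]  = {8, 16};        /* POOL_TLS_RING_CAP */
static const int kProbes[]   = {2, 3};         /* HAKMEM_TRYLOCK_PROBES */

#define N_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Print one env-var line per configuration; returns how many were printed. */
static int print_tc_grid(FILE *out) {
    int n = 0;
    for (size_t i = 0; i < N_OF(kDrainMax); i++)
        for (size_t j = 0; j < N_OF(kRingCap); j++)
            for (size_t k = 0; k < N_OF(kProbes); k++) {
                fprintf(out,
                        "HAKMEM_TC_DRAIN_MAX=%d POOL_TLS_RING_CAP=%d HAKMEM_TRYLOCK_PROBES=%d\n",
                        kDrainMax[i], kRingCap[j], kProbes[k]);
                n++;
            }
    return n;
}

/* Acceptance gate: the candidate must beat the baseline (current best)
 * by at least min_gain, e.g. 0.15 for "+15%". Throughput in M ops/s. */
static int passes_gate(double baseline, double candidate, double min_gain) {
    assert(baseline > 0.0);
    return candidate >= baseline * (1.0 + min_gain);
}
```

The same gate expresses the +10% headerless criterion as `passes_gate(best, candidate, 0.10)`.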

P1 (balancing speed and memory efficiency)

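Mid: two TLS active pages instead of one, plus improved adopt
- Assign an owner on the first touch of a shared refill page, raising the TC hit rate
- Running two pages further cuts bitmap scans and round trips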
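A minimal sketch of the two-page TLS scheme under stated assumptions: each page tracks free slots in a single 64-bit bitmap, ownership is claimed on first touch with one compare-and-swap from 0 to the thread id, and a hit in the second page swaps the two so the page with room is tried first next time. The swap policy and all names (`page_t`, `tls_active_t`, `page_adopt`, `tls_alloc`) are hypothetical; remote frees and the TC itself are not modeled, and `__builtin_ctzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page: one 64-bit bitmap, so at most 64 slots per page. */
typedef struct {
    _Atomic uint32_t owner_tid;  /* 0 = unowned shared refill page */
    uint64_t free_bits;          /* bit i set = slot i free; modified by owner only */
    char    *base;
    size_t   slot_size;
} page_t;

/* Two active pages per thread (per size class in a real allocator). */
typedef struct {
    page_t *active[2];
} tls_active_t;

/* Owner assignment on first touch: the first thread to CAS 0 -> tid wins. */
static int page_adopt(page_t *pg, uint32_t tid) {
    if (atomic_load_explicit(&pg->owner_tid, memory_order_relaxed) == tid)
        return 1;                               /* already ours: skip the CAS */
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&pg->owner_tid, &expected, tid);
}

/* Pop the lowest free slot; NULL if the page is full. */
static void *page_pop(page_t *pg) {
    if (pg->free_bits == 0)
        return NULL;
    int i = __builtin_ctzll(pg->free_bits);     /* index of lowest set bit */
    pg->free_bits &= pg->free_bits - 1;         /* clear that bit */
    return pg->base + (size_t)i * pg->slot_size;
}

/* Try active[0], then active[1]. A hit in active[1] swaps the two so the
 * page that still has room is checked first next time. NULL = refill needed. */
static void *tls_alloc(tls_active_t *t, uint32_t tid) {
    assert(tid != 0);                           /* 0 is reserved for "unowned" */
    for (int k = 0; k < 2; k++) {
        page_t *pg = t->active[k];
        if (pg == NULL || !page_adopt(pg, tid))
            continue;                           /* no page, or owned by another thread */
        void *p = page_pop(pg);
        if (p != NULL) {
            if (k == 1) {
                t->active[1] = t->active[0];
                t->active[0] = pg;
            }
            return p;
        }
    }
    return NULL;
}
```

With two pages, one exhausted page no longer forces an immediate refill round trip; the thread keeps allocating from the second page while the first is replenished.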
Tiny/Mid: move all counters to sampling (default 1/256, 1/512 if needed)
- Acceptance criteria: less p99 jitter, no RSS increase
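A minimal sketch of a 1/256 sampled counter, assuming the goal is to keep shared-counter traffic off the hot path: each thread ticks a private counter and touches the shared atomic only once per period, crediting the whole period at that point. `count_alloc_sampled`, `t_events`, and `g_estimated_allocs` are hypothetical names; setting `SAMPLE_SHIFT` to 9 gives the 1/512 variant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Period must be a power of two so the check is a single mask. */
#define SAMPLE_SHIFT 8                         /* 1/256; use 9 for 1/512 */
#define SAMPLE_MASK  ((1u << SAMPLE_SHIFT) - 1)

static_assert(SAMPLE_SHIFT >= 1 && SAMPLE_SHIFT <= 16, "unreasonable sampling period");

static _Thread_local uint32_t t_events;        /* private tick, no sharing */
static _Atomic uint64_t g_estimated_allocs;    /* shared, touched once per period */

static inline void count_alloc_sampled(void) {
    /* Every 256th event adds the full period to the shared counter, so the
     * total trails the true count by fewer than 256 events per thread. */
    if ((++t_events & SAMPLE_MASK) == 0)
        atomic_fetch_add_explicit(&g_estimated_allocs,
                                  (uint64_t)1 << SAMPLE_SHIFT,
                                  memory_order_relaxed);
}
```

The hot path pays one thread-local increment and one predictable branch; the cache line holding the shared counter is written 256 times less often, which is where the p99 jitter reduction would come from.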

P2 (stronger memory efficiency)

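Switch between immediate and deferred return of empty pages
- Fully empty pages are returned with munmap; partially empty ones use MADV_FREE plus a batch threshold
- Scavenge during quiet periods (on a low-priority thread) to keep RSS down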
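A minimal sketch of the two release paths, assuming partially empty pages expose page-aligned free ranges that can be queued and advised in batches. `release_empty_page`, `queue_partial_free`, `madv_batch_t`, `MAX_PENDING`, and the byte threshold are hypothetical; the low-priority scavenger thread is not shown. `MADV_FREE` lets the kernel reclaim pages lazily; on systems without it the sketch falls back to `MADV_DONTNEED`, which drops them eagerly.

```c
#define _DEFAULT_SOURCE              /* glibc: expose madvise, MADV_*, MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifdef MADV_FREE
#define PARTIAL_ADVICE MADV_FREE
#else
#define PARTIAL_ADVICE MADV_DONTNEED /* fallback: frees eagerly, not lazily */
#endif

#define MAX_PENDING 32               /* hypothetical batch capacity */

typedef struct { void *addr; size_t len; } range_t;

typedef struct {
    range_t r[MAX_PENDING];
    size_t  n;          /* queued ranges */
    size_t  bytes;      /* queued bytes */
    size_t  threshold;  /* hypothetical knob: flush once this many bytes are queued */
} madv_batch_t;

/* Fully empty page: hand the whole mapping back right away. */
static int release_empty_page(void *page, size_t len) {
    return munmap(page, len);
}

/* Partially empty page: queue its free, page-aligned range and issue the
 * advice only when the batch reaches the threshold (or fills up), so the
 * syscall cost is paid once per batch. Returns the number of ranges flushed. */
static size_t queue_partial_free(madv_batch_t *b, void *addr, size_t len) {
    assert(b->n < MAX_PENDING);
    b->r[b->n++] = (range_t){ addr, len };
    b->bytes += len;
    if (b->bytes < b->threshold && b->n < MAX_PENDING)
        return 0;
    size_t flushed = b->n;
    for (size_t i = 0; i < b->n; i++)
        (void)madvise(b->r[i].addr, b->r[i].len, PARTIAL_ADVICE); /* advisory: errors ignored */
    b->n = 0;
    b->bytes = 0;
    return flushed;
}
```

Raising the threshold trades a little extra RSS for fewer `madvise` calls; a scavenger running at low priority could also flush a partially filled batch when the process goes quiet.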
NUMA-local supply (where possible)
- Make Mid, L2.5, and page refills node-local to improve p99 latency and TLB efficiency
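A minimal sketch of node-local refill, assuming a per-node stock of ready pages and that the caller already knows its node (on Linux, for example, via getcpu(2); that part is not shown). `node_pool_t`, `refill_page`, `MAX_NODES`, and `PAGES_PER_NODE` are hypothetical, the fallback order is a simple wrap-around over node indices that ignores inter-node distance, and the code is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES      8    /* hypothetical upper bound on NUMA nodes */
#define PAGES_PER_NODE 64   /* hypothetical per-node stock size */

/* Hypothetical per-node stock of ready-to-use pages. */
typedef struct {
    void  *pages[PAGES_PER_NODE];
    size_t count;
} node_pool_t;

static node_pool_t g_pools[MAX_NODES];
static int g_nodes = 1;     /* number of nodes actually present */

/* Refill from the caller's node first; only if that stock is empty, try the
 * other nodes in wrap-around index order. NULL means every stock is empty
 * and the caller must map fresh memory. A real version needs per-node locks. */
static void *refill_page(int local_node) {
    assert(local_node >= 0 && local_node < g_nodes);
    for (int d = 0; d < g_nodes; d++) {
        node_pool_t *p = &g_pools[(local_node + d) % g_nodes];
        if (p->count > 0)
            return p->pages[--p->count];
    }
    return NULL;
}
```

Serving refills from the local node first keeps page memory close to the threads that touch it, which is where the p99 and TLB gains would come from; the remote fallback only kicks in when local stock runs dry.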

P3 (skew optimization / finishing touches)

Site Top‑K micro‑pool (two small private slabs dedicated to each hot (site, class) pair)
- Pays off beyond what TC achieves on heavily skewed workloads (larson-style)
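The roadmap does not say how the Top-K sites are chosen. One standard option, swapped in here as an assumption, is the Space-Saving heavy-hitters algorithm: keep K counters; a tracked key increments its counter, and an untracked key evicts the smallest counter and inherits its count plus one. `topk_t`, `topk_observe`, `topk_is_hot`, `TOPK`, and the idea of keying on a call-site address are hypothetical; the micro-pool slabs themselves are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOPK 4   /* hypothetical K */

typedef struct { uintptr_t site; uint16_t cls; uint64_t count; } topk_entry_t;
typedef struct { topk_entry_t e[TOPK]; size_t used; } topk_t;

/* Space-Saving update for one (site, class) observation. */
static void topk_observe(topk_t *t, uintptr_t site, uint16_t cls) {
    assert(t->used <= TOPK);
    size_t min_i = 0;
    for (size_t i = 0; i < t->used; i++) {
        if (t->e[i].site == site && t->e[i].cls == cls) {
            t->e[i].count++;                 /* already tracked */
            return;
        }
        if (t->e[i].count < t->e[min_i].count)
            min_i = i;                       /* remember the smallest counter */
    }
    if (t->used < TOPK) {                    /* free slot: start at 1 */
        t->e[t->used++] = (topk_entry_t){ site, cls, 1 };
        return;
    }
    /* Evict the smallest counter; the newcomer inherits its count + 1,
     * so counts are upper bounds on true frequency. */
    t->e[min_i] = (topk_entry_t){ site, cls, t->e[min_i].count + 1 };
}

/* A key qualifies for a private micro-pool if tracked with count >= min_count. */
static int topk_is_hot(const topk_t *t, uintptr_t site, uint16_t cls, uint64_t min_count) {
    for (size_t i = 0; i < t->used; i++)
        if (t->e[i].site == site && t->e[i].cls == cls)
            return t->e[i].count >= min_count;
    return 0;
}
```

Space-Saving needs only K fixed slots and never evicts a key whose true frequency is high, which fits the skewed larson-style pattern where a handful of sites dominate.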
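ACE: learning stays strictly in the background (it only updates a frozen policy)
- It tunes CAP, bundle count, W_MAX, TC budget, ring, and thresholds; the hot path only reads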

Benchmark plan and pass criteria

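10 seconds, larson (BURST/LOOP), 1T/4T
Comparison: system, mimalloc, hakmem (environment variables are saved in the summary)
Targets (near-term guide)
- Tiny 4T: within ±10% of mimalloc (already achieved in some scenarios)
- Mid 4T: at least 80% of mimalloc, aiming to exceed 100% (gap to be closed during P0–P1)
Saved to: docs/benchmarks/<timestamp>_HEAD2HEAD*/summary.txt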
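The targets above can be read as ratios of hakmem to mimalloc throughput. A small sketch of that check; the helper names are hypothetical, and it treats the Tiny ±10% band as a lower bound of 0.90 on the assumption that beating mimalloc by more than 10% does not count as a miss.

```c
#include <assert.h>
#include <stdbool.h>

/* hakmem throughput relative to mimalloc (both in M ops/s). */
static double ratio_vs_mimalloc(double hakmem_mops, double mimalloc_mops) {
    assert(mimalloc_mops > 0.0);
    return hakmem_mops / mimalloc_mops;
}

/* Tiny 4T: within 10% of mimalloc (lower bound only, see lead-in). */
static bool tiny_4t_on_target(double hakmem_mops, double mimalloc_mops) {
    return ratio_vs_mimalloc(hakmem_mops, mimalloc_mops) >= 0.90;
}

typedef enum { MID_BELOW_GATE, MID_GATE_MET, MID_STRETCH_MET } mid_status_t;

/* Mid 4T: at least 80% is the gate; above 100% is the stretch goal. */
static mid_status_t mid_4t_status(double hakmem_mops, double mimalloc_mops) {
    double r = ratio_vs_mimalloc(hakmem_mops, mimalloc_mops);
    if (r > 1.0)   return MID_STRETCH_MET;
    if (r >= 0.80) return MID_GATE_MET;
    return MID_BELOW_GATE;
}
```

Expressing each target as a ratio against a run from the same session keeps the verdict comparable across machines and across the saved summaries.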

Scripts

Small suite: RUNTIME=10 THREADS=1,4 scripts/run_bench_suite.sh
Mid fast A/B: RUNTIME=10 THREADS=1,4 PROBES=2,3 RETURNS=2,3 scripts/ab_fast_mid.sh
Head-to-head: see the examples in docs/benchmarks/README.md
