Files

Moe Charm (CI) 52386401b3 Debug Counters Implementation - Clean History

Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-05 12:31:14 +09:00

2025-10-22_COMPARE_MID_2-32KB.md
2025-10-22_HAKMEM_BEST_MID_2-32KB.md
2025-10-22_SWEEP_NOTES.md
BENCH_RESULTS_2025_10_28.md
BENCH_RESULTS_2025_10_29.md
BENCHMARK_PHASE_6.10.1.md
BENCHMARK_RESULTS_CODE_CLEANUP.md
BENCHMARK_RESULTS_PHASE6.3.md
BENCHMARK_RESULTS.md
LARSON_TINY_PERF_2025-11-02.md
MID_MT_BENCH_README.md
README.md

README.md

Benchmarks Docs

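This document defines how benchmarks are run, where results are saved, and how they are named.

Storage location and naming

Sweep results: docs/benchmarks/<YYYY-MM-DD>_SWEEP_NOTES.md
Large raw logs: docs/benchmarks/<YYYY-MM-DD>/<label>_T<threads>.log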
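
The naming rule above can be sketched as a pair of small shell helpers. This is an illustration only: `bench_log_path` and `sweep_notes_path` are hypothetical names, not functions found in the repository's scripts, and the date defaults to today when omitted.

```shell
# Hypothetical helpers (not part of the repository) that follow the naming rule above.

# Usage: bench_log_path <label> <threads> [YYYY-MM-DD]
# Prints the path of a raw log for the given label and thread count.
bench_log_path() {
  # Fall back to today's date when no date argument is given.
  printf 'docs/benchmarks/%s/%s_T%s.log\n' "${3:-$(date +%Y-%m-%d)}" "$1" "$2"
}

# Usage: sweep_notes_path [YYYY-MM-DD]
# Prints the path of the sweep-notes file for the given day.
sweep_notes_path() {
  printf 'docs/benchmarks/%s_SWEEP_NOTES.md\n' "${1:-$(date +%Y-%m-%d)}"
}
```

A caller could create the dated folder before writing, for example with `mkdir -p "$(dirname "$(bench_log_path mid_burst 4)")"`.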

Basic sweep

# 1) Quick pass over representative Tiny/Mid/Large/Big ranges, 1–2 seconds each
scripts/prof_sweep.sh -d 2 -t 1,4 -s 8

# 2) Detailed sweep restricted to the Mid range (e.g. 2–32KB, 1s, 1T/4T)
scripts/prof_sweep.sh -d 1 -t 1,4 -s 7 -m 2048 -M 32768

Representative scenarios (manual)

# 13–15KB, 1T (DYN1 A/B)
LD_PRELOAD=$(readlink -f ./libhakmem.so) HAKMEM_MID_DYN1=0     mimalloc-bench/bench/larson/larson 1 13000 15000 10000 1 12345 1
LD_PRELOAD=$(readlink -f ./libhakmem.so) HAKMEM_MID_DYN1=14336 mimalloc-bench/bench/larson/larson 1 13000 15000 10000 1 12345 1

# Allow L2/L2.5 inside the wrapper
HAKMEM_WRAP_L2=1 HAKMEM_WRAP_L25=1 ...

Scripts (log saving, safe execution)

scripts/save_prof_sweep.sh: saves results automatically into a dated folder (with an external timeout)
scripts/run_bench_suite.sh: small suite for system/mimalloc/hakmem (with an external timeout)
scripts/ab_sweep_mid.sh: Mid-range A/B (CAP × min_bundle × threads, with an external timeout)
scripts/ab_fast_mid.sh: A/B for the Mid fast-return path (trylock probes × ring return div), short run
scripts/ab_rcap_probe_drain.sh: A/B of RING_CAP × PROBES × DRAIN_MAX × TLS_LO_MAX for Mid (short run, includes a rebuild)
scripts/run_larson.sh: reproducible larson runs (burst/loop presets, thread count selection, prints the log tail)
scripts/kill_bench.sh: force-stops leftover processes (TERM→KILL)
scripts/head_to_head_large.sh: Large (64KB–1MB) 10s head-to-head (system/mimalloc/hakmem). Saves the P1/P2 profiles in one go
scripts/ab_l25_tc.sh: A/B of RUN_FACTOR × TC_SPILL on L2.5 (remote, HDR=2), 10s. Logs are saved automatically
scripts/bench_large_profiles.sh: saves the representative Large 10s profiles (P1 best / P2+TC best)
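
The TERM→KILL escalation mentioned for scripts/kill_bench.sh can be sketched for a single PID as follows. This is not the script's actual contents: the function name `stop_gracefully` is hypothetical, and reusing `KILL_GRACE` as the wait time is an assumption.

```shell
# Hypothetical sketch of a TERM -> KILL escalation for one process.
# Usage: stop_gracefully <pid>
stop_gracefully() {
  pid="$1"
  grace="${KILL_GRACE:-2}"   # seconds to wait before escalating (assumed default 2)

  # Ask the process to exit; if the PID is already gone there is nothing to do.
  kill -TERM "$pid" 2>/dev/null || return 0

  i=0
  while [ "$i" -lt "$grace" ]; do
    # kill -0 only tests whether the PID still exists. An unreaped zombie
    # still counts as existing, so the loop may wait the full grace period.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    i=$((i + 1))
  done

  # Still present after the grace period: force termination.
  kill -KILL "$pid" 2>/dev/null || true
}
```

Polling once per second keeps the sketch portable; a real script may prefer `pkill` patterns so that every leftover benchmark process is covered.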

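Common environment variables:

RUNTIME (seconds): measurement time (default 1)
BENCH_TIMEOUT (seconds): wall-clock timeout. Defaults to RUNTIME+3 when unset
KILL_GRACE (seconds): grace period between SIGTERM and SIGKILL (default 2)
For Mid: HAKMEM_POOL_MIN_BUNDLE (recommended 4), HAKMEM_SHARD_MIX=1 (stronger shard distribution)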
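
The three timeout-related variables can be combined as in the sketch below. It assumes GNU coreutils `timeout` is available. The wrapper name `run_with_timeout` is hypothetical, and the repository's scripts may implement the same idea differently.

```shell
# Hypothetical wrapper; assumes GNU coreutils `timeout`.
# Usage: run_with_timeout <command> [args...]
run_with_timeout() {
  runtime="${RUNTIME:-1}"                      # measurement time, default 1
  limit="${BENCH_TIMEOUT:-$((runtime + 3))}"   # wall-clock limit, default RUNTIME+3
  grace="${KILL_GRACE:-2}"                     # SIGTERM -> SIGKILL grace, default 2

  # Send SIGTERM once $limit seconds have elapsed, then SIGKILL if the
  # command is still running $grace seconds after that.
  timeout -k "${grace}s" "${limit}s" "$@"
}
```

GNU `timeout` exits with status 124 when the limit is reached, so a caller can tell a hung run from a normal failure, for example `run_with_timeout scripts/prof_sweep.sh -d 1 -t 1,4 -s 8 || echo "exit status $?"`.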

Examples:

BENCH_TIMEOUT=6 scripts/save_prof_sweep.sh -d 1 -t 1,4 -s 8
RUNTIME=1 THREADS=1,4 BENCH_TIMEOUT=6 scripts/run_bench_suite.sh

# Mid fast A/B (10 seconds, 1T/4T)
RUNTIME=10 THREADS=1,4 PROBES=2,3 RETURNS=2,3 scripts/ab_fast_mid.sh

# Mid ring/probe/drain/LIFO-cap A/B (2 seconds, 1T/4T)
RUNTIME=2 THREADS=1,4 RCAPS=8,16 PROBES=2,3 DRAINS=32,64 LOMAX=256,512 \
  scripts/ab_rcap_probe_drain.sh

# Head-to-head (Tiny/Mid, system vs mimalloc vs hakmem)
export HAKMEM_HDR_LIGHT=1 HAKMEM_POOL_TLS_RING=1 HAKMEM_SHARD_MIX=1 \
       HAKMEM_TRYLOCK_PROBES=3 HAKMEM_RING_RETURN_DIV=3
OUT=docs/benchmarks/$(date +%Y%m%d_%H%M%S)_HEAD2HEAD && mkdir -p "$OUT"
scripts/run_larson.sh -d 10 -p burst -m 8 -M 64    | tee "$OUT/tiny_burst.log"
scripts/run_larson.sh -d 10 -p burst -m 2048 -M 32768 | tee "$OUT/mid_burst.log"

Timing measurement (Debug Timing)

Visualizes hotspots per measurement category (written to stderr). A Debug build is recommended.

Example (Mid 4T, 10s):

make -j4 debug
HAKMEM_TIMING=1 HAKMEM_POOL_TLS_RING=1 HAKMEM_TRYLOCK_PROBES=3 HAKMEM_TLS_LO_MAX=256 \
  LD_PRELOAD=./libhakmem.so mimalloc-bench/bench/larson/larson 10 2048 32768 10000 1 12345 4

Example (Large 4T, 10s, L2.5):

make -j4 debug
HAKMEM_TIMING=1 HAKMEM_WRAP_L25=1 HAKMEM_POOL_TLS_RING=1 HAKMEM_TRYLOCK_PROBES=3 HAKMEM_TLS_LO_MAX=256 \
  LD_PRELOAD=./libhakmem.so mimalloc-bench/bench/larson/larson 10 65536 1048576 10000 1 12345 4

Main categories (excerpt):

Mid(L2): pool_lock, pool_refill, pool_tc_drain, pool_tls_ring_pop, pool_tls_lifo_pop, pool_remote_push, pool_alloc_tls_page
L2.5: l25_lock, l25_refill, l25_tls_ring_pop, l25_tls_lifo_pop, l25_remote_push, l25_alloc_tls_page, l25_shard_steal

## Large (64KB–1MB) benchmark guidance (10s)

Recommended profiles (as of now):
- P1 best (alloc-oriented)
  - `HAKMEM_L25_PREF=remote HAKMEM_L25_RUN_FACTOR=4 HAKMEM_HDR_LIGHT=1 HAKMEM_SHARD_MIX=1`
  - Reference: ~102k ops/s (4T, timing ON)
- P2+TC best (free-oriented; headerless + page descriptors + TC)
  - `HAKMEM_L25_PREF=remote HAKMEM_L25_RUN_FACTOR=4 HAKMEM_HDR_LIGHT=2 HAKMEM_L25_TC_SPILL=16 HAKMEM_SHARD_MIX=1`
  - Reference: ~99k ops/s (4T, timing ON). Favorable for free-heavy patterns

Example run (saves head-to-head results):

./scripts/head_to_head_large.sh # saved to docs/benchmarks/_HEAD2HEAD_LARGE

Parameter A/B (RUN_FACTOR × TC_SPILL):

RUNTIME=10 THREADS=4 ./scripts/ab_l25_tc.sh # saved to docs/benchmarks/_L25_TC_AB

Notes:
- An absolute path is recommended for `LD_PRELOAD` (`readlink -f ./libhakmem.so`)
- Timing (`HAKMEM_TIMING=1`) slows execution, so re-check the final comparison with timing OFF as well

## Troubleshooting (hangs / zombies / runaway processes)

- Add a timeout (prevents hangs)
  - Wrap every long-running run in `timeout ${BENCH_TIMEOUT:-$((RUNTIME+3))}s`
  - `scripts/head_to_head_large.sh` and `scripts/ab_l25_tc.sh` in this repo already apply a timeout
- Check for zombies / identify the parent / clean up
  - Check: `ps -eo pid,ppid,stat,etime,cmd | awk '$3 ~ /Z/ {print}'`
  - Identify the parent: `pstree -sp <PPID>` (if unavailable, `ps -p <PPID> -o pid,ppid,cmd`)
  - Clean up: a zombie cannot be killed. Terminate or restart its parent process appropriately (tmux session, shell, resident tools, etc.)
  - Example: `kill -HUP <PPID>`; if that has no effect, close the session and reconnect
- Stop leftover benchmark processes in bulk
  - Stop larson: `pkill -f 'mimalloc-bench/bench/larson/larson'` (as a last resort, `pkill -9 -f ...`)
- Typical cases (this environment)
  - Large numbers of `<defunct>` `notify_wrapper.` processes have been observed. The parent is often the codex launcher or a shell
  - After long sessions, refreshing tmux or the shell before running A/B tends to give stable results
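
The zombie check from the troubleshooting notes can be wrapped in a small helper that prints each zombie together with its parent PID, which is the process that has to be restarted. This is a sketch: `filter_zombies` and `list_zombies` are hypothetical names, and the `stat` column is assumed to be supported by the local `ps` (it is on Linux procps).

```shell
# Hypothetical helpers, sketched from the `ps | awk` check in the troubleshooting notes.

# Reads "pid ppid stat comm" rows on stdin and reports rows whose
# process state starts with Z (zombie / defunct).
filter_zombies() {
  awk '$3 ~ /^Z/ { print "zombie pid=" $1 " parent=" $2 " cmd=" $4 }'
}

# Lists zombies on the current machine. Each "-o name=" requests a column
# with an empty header, so only data rows reach the filter.
list_zombies() {
  ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
}
```

Because the filter reads standard input, it can also be applied to a saved `ps` snapshot when diagnosing a run after the fact.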
