
Moe Charm (CI) 0b1c825f25 Fix: CRITICAL multi-threaded freelist/remote queue race condition

Root Cause:
===========
Freelist and remote queue contained the SAME blocks, causing use-after-free:

1. Thread A (owner): pops block X from freelist → allocates to user
2. User writes data ("ab") to block X
3. Thread B (remote): free(block X) → adds to remote queue
4. Thread A (later): drains remote queue → *(void**)block_X = chain_head
   → OVERWRITES USER DATA! 💥

The freelist pop path did NOT drain the remote queue first, so blocks could
be simultaneously in both freelist and remote queue.

Fix:
====
Add remote queue drain BEFORE freelist pop in refill path:

core/hakmem_tiny_refill_p0.inc.h:
  - Call _ss_remote_drain_to_freelist_unsafe() BEFORE trc_pop_from_freelist()
  - Add #include "superslab/superslab_inline.h"
  - This ensures freelist and remote queue are mutually exclusive

Test Results:
=============
BEFORE:
  larson_hakmem (4 threads): ❌ SEGV in seconds (freelist corruption)

AFTER:
  larson_hakmem (4 threads): ✅ 931,629 ops/s (1073 sec stable run)
  bench_random_mixed:        ✅ 1,020,163 ops/s (no crashes)

Evidence:
  - Fail-Fast logs showed next pointer corruption: 0x...6261 (ASCII "ab")
  - Single-threaded benchmarks worked (865K ops/s)
  - Multi-threaded Larson crashed immediately
  - Fix eliminates all crashes in both benchmarks

Files:
  - core/hakmem_tiny_refill_p0.inc.h: Add remote drain before freelist pop
  - CURRENT_TASK.md: Document fix details

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-08 01:35:45 +09:00


Current Task – 2025-11-08

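✅ Done: Fix for the remote queue / freelist race condition

Root Cause

In a multi-threaded environment, the freelist and the remote queue could reference the same block, which caused the following race:

Thread A (owner):
- Pops block X from the freelist with trc_pop_from_freelist()
- Allocates block X to the user
- The user writes data ("ab") into block X
Thread B (remote thread):
- free(block X) → pushed onto the remote queue via ss_remote_push()
Thread A (later):
- Runs _ss_remote_drain_to_freelist_unsafe()
- *(void**)block_X = chain_head → overwrites user data! 💥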
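
The last step of the sequence above can be reproduced in isolation. The following is a minimal single-threaded sketch, not HAKMEM code: ToyBlock, toy_link() and toy_read_link() are hypothetical names chosen for illustration. It assumes only that a free block stores its "next" link in its first pointer-sized bytes, as the *(void**)block_X = chain_head write suggests.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte block. While a block is free, its first
 * pointer-sized bytes double as the intrusive "next" link. */
typedef struct { unsigned char bytes[16]; } ToyBlock;

/* Mimics the drain step: link a node into a chain by storing the chain
 * head in the node's first bytes (the *(void**)node = head pattern). */
static void toy_link(ToyBlock* node, void* chain_head) {
    assert(node != NULL);
    memcpy(node->bytes, &chain_head, sizeof chain_head);
}

/* Reads back whatever pointer value currently sits in the link slot. */
static void* toy_read_link(const ToyBlock* node) {
    void* p;
    memcpy(&p, node->bytes, sizeof p);
    return p;
}
```

If the user has written "ab" into a block and toy_link() then runs on that same block, the first bytes hold the chain pointer and the user's data is gone. The same aliasing works in the other direction: a user write into a block that is still linked corrupts the stored next pointer.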

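Discovery Process

SEGV occurred in the Larson benchmark (4 threads)
Fail-Fast diagnostic logs detected next-pointer corruption: 0x79a4eca06261 (ASCII "ab")
Suspected the remote free path (ss_remote_push), but it performs no write into the block because the remote side table is enabled
Found *(void**)node = ... in the chain construction of _ss_remote_drain_to_freelist_unsafe()
Confirmed that the remote queue was not drained before the freelist pop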

Evidence

bench_random_mixed (single-threaded): ✅ works correctly (865K ops/s)
larson_hakmem (4 threads): ❌ SEGV (freelist corruption)
After adding the remote drain: ✅ Larson ran stably for 1073 seconds (931K ops/s)

Implemented Fix

Added a remote queue drain to core/hakmem_tiny_refill_p0.inc.h

// CRITICAL FIX: Drain remote queue BEFORE popping from freelist
// Without this, blocks in both freelist and remote queue can be double-allocated
// (Thread A pops from freelist, Thread B adds to remote queue, Thread A drains remote → overwrites user data)
if (tls->ss && tls->slab_idx >= 0) {
    _ss_remote_drain_to_freelist_unsafe(tls->ss, tls->slab_idx, meta);
}

// Handle freelist items first (usually 0)
TinyRefillChain chain;
uint32_t from_freelist = trc_pop_from_freelist(
    meta, class_idx, ss_base, ss_limit, bs, want, &chain);

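Rationale:

Draining the remote queue into the freelist first removes the overlap between the freelist and the remote queue
This prevents the drain from writing into blocks that have already been allocated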
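
The drain-before-pop ordering can be sketched with a toy slab. This is an illustration under stated assumptions, not the HAKMEM implementation: ToySlab, toy_remote_push(), toy_remote_drain() and toy_alloc() are hypothetical names, the remote queue is modeled as a C11 atomic stack, and the toy stores the remote link inside the block itself, whereas the notes above say the real remote push uses a side table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct ToyNode { struct ToyNode* next; } ToyNode;

typedef struct {
    ToyNode*          freelist;     /* touched by the owner thread only */
    _Atomic(ToyNode*) remote_head;  /* pushed to by non-owner threads */
} ToySlab;

/* Remote free: lock-free push onto the remote stack. */
static void toy_remote_push(ToySlab* s, ToyNode* n) {
    ToyNode* old = atomic_load_explicit(&s->remote_head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->remote_head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Owner: detach the whole remote list and splice it onto the freelist.
 * Returns the number of blocks moved. */
static size_t toy_remote_drain(ToySlab* s) {
    ToyNode* n = atomic_exchange_explicit(&s->remote_head, NULL,
                                          memory_order_acquire);
    size_t moved = 0;
    while (n != NULL) {
        ToyNode* next = n->next;
        n->next = s->freelist;
        s->freelist = n;
        n = next;
        moved++;
    }
    return moved;
}

/* Owner refill path: drain the remote queue first, then pop. */
static ToyNode* toy_alloc(ToySlab* s) {
    assert(s != NULL);
    toy_remote_drain(s);
    ToyNode* n = s->freelist;
    if (n != NULL) s->freelist = n->next;
    return n;
}
```

Because toy_alloc() always empties the remote stack before it pops, a block handed to the user is never still pending in the remote queue of this toy model, so a later drain has nothing live to relink.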

Test Results

Larson benchmark (multi-threaded)

# Before the fix: SEGV (crashes within seconds)
HAKMEM_TINY_USE_SUPERSLAB=1 ./larson_hakmem 2 8 128 1024 1 12345 4
→ ❌ Segmentation fault

# After the fix: stable for 1073 seconds
HAKMEM_TINY_USE_SUPERSLAB=1 ./larson_hakmem 2 8 128 1024 1 12345 4
→ ✅ 931,629 ops/s (no crashes, 1073-second run)

bench_random_mixed (single-threaded)

HAKMEM_TINY_USE_SUPERSLAB=1 ./bench_random_mixed_hakmem 100000 2048 1234567
→ ✅ 1,020,163 ops/s (no crashes)

Modified Files

core/hakmem_tiny_refill_p0.inc.h - added a remote queue drain before the freelist pop
- Inserted a call to _ss_remote_drain_to_freelist_unsafe()
- Added #include "superslab/superslab_inline.h"

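✅ Done (previous): Fix for the double-allocation bug

Root Cause

trc_linear_carve() used meta->used as its cursor, but meta->used decreases when a block is freed, so blocks that were already allocated could be carved again, resulting in double allocation.

Implemented Fix

1. Added a carved field to the TinySlabMeta struct (core/superslab/superslab_types.h)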

typedef struct TinySlabMeta {
    void*    freelist;
    uint16_t used;           // Number of blocks currently in use (goes up and down)
    uint16_t capacity;
    uint16_t carved;         // Number of blocks carved from the linear region (monotonically increasing)
    uint16_t owner_tid;      // Changed from uint32_t to uint16_t
} TinySlabMeta;

2. Fixed trc_linear_carve() (core/tiny_refill_opt.h)

// Before: uses meta->used as the cursor (bug!)
uint8_t* cursor = base + ((size_t)meta->used * bs);
meta->used += batch;

// After: uses meta->carved as the cursor (fixed)
uint8_t* cursor = base + ((size_t)meta->carved * bs);
meta->carved += batch;  // monotonically increasing only
meta->used += batch;    // also update the in-use count
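
The difference between the two cursors can be seen in a small model. The sketch below is hypothetical and heavily simplified: ToyMeta and the toy_* functions are illustrative names, and each carve returns a byte offset instead of a pointer. It assumes only what the snippets above show, namely that used is decremented on free while carved never decreases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t used;    /* live blocks; goes up and down */
    uint16_t carved;  /* blocks ever carved; only goes up */
} ToyMeta;

/* Buggy variant: the carve offset is derived from `used`. */
static size_t toy_carve_by_used(ToyMeta* m, size_t bs, uint16_t batch) {
    size_t offset = (size_t)m->used * bs;
    m->used = (uint16_t)(m->used + batch);
    return offset;
}

/* Fixed variant: the carve offset is derived from `carved`. */
static size_t toy_carve_by_carved(ToyMeta* m, size_t bs, uint16_t batch) {
    size_t offset = (size_t)m->carved * bs;
    m->carved = (uint16_t)(m->carved + batch);
    m->used   = (uint16_t)(m->used + batch);
    return offset;
}

/* Freeing a block lowers `used` but leaves `carved` untouched. */
static void toy_free_one(ToyMeta* m) {
    assert(m->used > 0);
    m->used--;
}
```

With 64-byte blocks, carving two blocks and then freeing the first leaves used at 1. The buggy variant then carves at offset 64, which is the second block and is still live. The fixed variant carves at offset 128, past everything handed out so far.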

Test Results

# Normal mode
./bench_random_mixed_hakmem 100000 2048 1234567
→ ✅ 812,670~1,020,163 ops/s

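Next Steps

Performance benchmarks
- Long-running Larson test (to investigate the registry capacity issue)
- Comparison with mimalloc
Fix the registry capacity issue (Optional)
- Tune SUPER_REG_PER_CLASS
- "Registry full" occurs frequently for Class 4
Clean up diagnostic logs (Optional)
- Optimize Fail-Fast logs for production use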

Commands to Run

# Normal test (single-threaded)
HAKMEM_TINY_USE_SUPERSLAB=1 ./bench_random_mixed_hakmem 100000 2048 1234567

# Larson benchmark (multi-threaded)
HAKMEM_TINY_USE_SUPERSLAB=1 ./larson_hakmem 2 8 128 1024 1 12345 4

# Fail-fast diagnostic mode
HAKMEM_TINY_REFILL_FAILFAST=2 HAKMEM_TINY_USE_SUPERSLAB=1 \
  ./bench_random_mixed_hakmem 50000 2048 1234567
