diff --git a/BUG_FLOW_DIAGRAM.md b/BUG_FLOW_DIAGRAM.md index 2f78b256..ae926bcd 100644 --- a/BUG_FLOW_DIAGRAM.md +++ b/BUG_FLOW_DIAGRAM.md @@ -1,232 +1,41 @@ -# Active Counter Double-Decrement Bug - Visual Flow Diagram +# Bug Flow Diagram: P0 Batch Refill Active Counter Underflow -## Bug Flow Trace (Single Block Lifecycle) +Legend +- Box 2: Remote Queue (push/drain) +- Box 3: Ownership (owner_tid) +- Box 4: Publish/Adopt + Refill boundary (superslab_refill) +Flow (before fix) ``` -┌─────────────────────────────────────────────────────────────────────┐ -│ Thread A: Initial Allocation (Linear Mode) │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: tiny_superslab_alloc.inc.h:463-472 │ -│ │ -│ meta->used++ │ -│ ss_active_inc(tls->ss) ← active = 100 ✅ │ -│ return block │ -│ │ -│ State: Block allocated, counter = 100 │ -└─────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────┐ -│ Thread B: Cross-Thread Free │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: hakmem_tiny_superslab.h:292-416 (ss_remote_push) │ -│ │ -│ ss_active_dec_one(ss) ← active = 99 ✅ │ -│ Push block to remote queue │ -│ │ -│ State: Block in remote queue, counter = 99 │ -└─────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────┐ -│ Thread A: Remote Drain │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: hakmem_tiny_superslab.h:421-529 │ -│ (_ss_remote_drain_to_freelist_unsafe) │ -│ │ -│ meta->freelist = chain_head │ -│ (NO counter change) ← active = 99 ✅ │ -│ Comment: "no change to used/active; already adjusted at free" │ -│ │ -│ State: Block in meta->freelist, counter = 99 │ -└─────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────┐ -│ 
Thread A: P0 Batch Refill ⚠️ BUG HERE! ⚠️ │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: hakmem_tiny_refill_p0.inc.h:99-109 │ -│ │ -│ from_freelist = trc_pop_from_freelist(meta, want, &chain) │ -│ trc_splice_to_sll(..., &g_tls_sll_head[class_idx], ...) │ -│ │ -│ ❌ MISSING: ss_active_add(tls->ss, from_freelist) │ -│ (NO counter change) ← active = 99 ❌ SHOULD BE 100! │ -│ │ -│ Comment (WRONG): "from_freelist は既に used/active 計上済み" │ -│ "freelist items already counted" │ -│ │ -│ State: Block in TLS SLL, counter = 99 (WRONG!) │ -└─────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────┐ -│ Thread A: Allocation from TLS SLL │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: tiny_alloc_fast.inc.h:145-210 (tiny_alloc_fast_pop) │ -│ │ -│ ptr = g_tls_sll_head[class_idx] │ -│ g_tls_sll_head[class_idx] = *(void**)ptr │ -│ (NO counter change - correct for TLS cache) │ -│ ← active = 99 (still wrong) │ -│ return ptr │ -│ │ -│ State: Block allocated, counter = 99 (WRONG! Should be 100) │ -└─────────────────────────────────────────────────────────────────────┘ - ↓ -┌─────────────────────────────────────────────────────────────────────┐ -│ Thread A: Same-Thread Free ⚠️ DOUBLE-DECREMENT! ⚠️ │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: tiny_free_fast.inc.h:91-145 (tiny_free_fast_ss) │ -│ │ -│ tiny_alloc_fast_push(class_idx, ptr) ← Push to TLS cache │ -│ ss_active_dec_one(ss) ← active = 98 ❌ DOUBLE DEC! │ -│ │ -│ State: Block in TLS cache, counter = 98 (WRONG! 
Should be 99) │ -│ │ -│ ⚠️ BUG RESULT: Counter decremented TWICE (steps 2 and 6) │ -│ but only incremented ONCE (step 1) │ -│ Net effect: -1 per cycle → underflow → OOM │ -└─────────────────────────────────────────────────────────────────────┘ +free(ptr) + -> Box 2 remote_push (cross-thread) + - active-- (on free) [OK] + - goes into SS freelist [no active change] + +refill (P0 batch) + -> trc_pop_from_freelist(meta, want) + - splice to TLS SLL [OK] + - MISSING: active += taken [BUG] + +alloc() uses SLL + +free(ptr) (again) + -> active-- (but not incremented before) → double-decrement + -> active underflow → OOM perceived + -> superslab_refill returns NULL → crash path (free(): invalid pointer) ``` ---- - -## Counter State Timeline - +After fix ``` -Step Action Active Counter Expected Status -──────────────────────────────────────────────────────────────────────── - 1 Linear allocation 100 100 ✅ - 2 Cross-thread free 99 99 ✅ - 3 Remote drain 99 99 ✅ - 4 P0 batch refill (BUG!) 99 100 ❌ - 5 Alloc from TLS SLL 99 100 ❌ - 6 Same-thread free (DOUBLE!) 98 99 ❌ -──────────────────────────────────────────────────────────────────────── -Net: -2 decrements, -1 increment = -1 error per cycle +refill (P0 batch) + -> trc_pop_from_freelist(...) + - splice to TLS SLL + - active += from_freelist [FIX] + -> trc_linear_carve(...) + - active += batch [asserted] ``` ---- +Verification Hooks +- One-shot OOM prints from superslab_refill +- Optional: `HAKMEM_TINY_DEBUG_REMOTE_GUARD=1` and `HAKMEM_TINY_TRACE_RING=1` -## Cascade Effect (100 blocks, heavy cross-thread activity) - -``` -Cycle Active Counter State -───────────────────────────────────────────────────────── - 0 100 Initial - 1 99 After 1 cycle (should be 100) - 2 98 After 2 cycles - ... ... ... - 99 1 After 99 cycles -100 0 UNDERFLOW! 
-101 UINT32_MAX Counter wraps around -───────────────────────────────────────────────────────── - -Result after underflow: - • SuperSlab appears "full" (active = UINT32_MAX) - • superslab_refill() can't reuse slabs - • Registry adoption fails - • Must allocate new SuperSlabs → OOM - • Corrupted state → "free(): invalid pointer" -``` - ---- - -## Comparison: Direct Freelist Allocation (CORRECT) - -``` -┌─────────────────────────────────────────────────────────────────────┐ -│ Thread A: Direct Allocation from Freelist ✅ │ -├─────────────────────────────────────────────────────────────────────┤ -│ File: tiny_superslab_alloc.inc.h:475-508 │ -│ │ -│ void* block = meta->freelist │ -│ meta->freelist = *(void**)block │ -│ meta->used++ │ -│ ss_active_inc(tls->ss) ← active++ ✅ CORRECT! │ -│ HAK_RET_ALLOC(class_idx, block) │ -│ │ -│ State: Block allocated, counter incremented (correct!) │ -└─────────────────────────────────────────────────────────────────────┘ - -This path CORRECTLY increments the counter because it understands: - 1. Freelist blocks were freed (counter decremented) - 2. Allocating from freelist → must increment counter - 3. Net effect: counter stays balanced ✅ - -P0 batch refill must follow the same protocol! -``` - ---- - -## The Fix - -```diff -File: core/hakmem_tiny_refill_p0.inc.h -Lines: 99-109 - - while (want > 0) { - TinyRefillChain chain; - uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain); - if (from_freelist > 0) { - trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]); -- // NOTE: from_freelist は既に used/active 計上済みのブロックの再循環。 -- // nonempty_mask クリアは不要(クリアすると後続freeで立たない)。 -+ // FIX (2025-11-07): Blocks from freelist were decremented when freed. -+ // Must increment counter when moving back to allocation pool (TLS SLL). -+ ss_active_add(tls->ss, from_freelist); - extern unsigned long long g_rf_freelist_items[]; - g_rf_freelist_items[class_idx] += from_freelist; - ... 
- } - } -``` - -**Why this fixes the bug:** - -1. Freelist blocks are "free" (counter was decremented when freed) -2. TLS SLL blocks are "allocated" (will be returned to user without counter change) -3. Moving from freelist to TLS SLL = moving from "free" to "allocated" -4. Therefore: **counter must be incremented** ✅ - -This matches the protocol used by direct freelist allocation (line 508). - ---- - -## Why Debug Hooks Mask the Bug - -``` -Normal Mode (Bug Visible): - • Fast paths enabled - • P0 batch refill active - • High cross-thread free frequency - • Rapid counter underflow → crash in seconds - -Debug Mode (Bug Hidden): - • Slower code paths - • Different timing/scheduling - • Reduced cross-thread free frequency - • P0 batch refill less frequent or disabled - • Bug accumulates slowly → may not manifest in test duration -``` - ---- - -## Related Files - -### Counter Management -- `core/hakmem_tiny.c:177-182` - `ss_active_add()`, `ss_active_inc()` -- `core/hakmem_tiny_superslab.h:189-199` - `ss_active_dec_one()` - -### Bug Location -- **`core/hakmem_tiny_refill_p0.inc.h:99-109`** ⚠️ BUG HERE - -### Correct Examples -- `core/tiny_superslab_alloc.inc.h:475-508` - Direct freelist alloc (✅ correct) -- `core/tiny_superslab_alloc.inc.h:463-472` - Linear alloc (✅ correct) - -### Free Paths (All Correct) -- `core/tiny_free_fast.inc.h:91-145` - Same-thread free (✅) -- `core/hakmem_tiny_superslab.h:292-416` - Cross-thread free (✅) -- `core/hakmem_tiny_superslab.h:421-529` - Remote drain (✅ no change, correct) - ---- - -**Summary:** The bug is a classic double-decrement caused by missing counter increment in P0 batch refill when moving blocks from freelist (free state) to TLS SLL (allocated state). 
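The counter arithmetic traced in the diagram above can be checked with a small toy model. This is only a sketch, not HAKMEM code: `ToySuperSlab`, `toy_cycle`, and `toy_run` are hypothetical names, and the real active counter is assumed here to behave like a plain `uint32_t` with no underflow guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in HAKMEM. */
typedef struct {
    uint32_t active; /* stands in for the SuperSlab active-block counter */
} ToySuperSlab;

/* One block round trip that starts and ends in the "allocated" state:
 * cross-thread free -> remote drain -> P0 batch refill -> alloc from TLS SLL.
 * refill_increments == 0 models the bug, 1 models the fix. */
void toy_cycle(ToySuperSlab *ss, int refill_increments) {
    assert(ss != NULL);
    ss->active -= 1;      /* cross-thread free: counter decremented */
    /* remote drain: block moves to the freelist, counter unchanged */
    if (refill_increments) {
        ss->active += 1;  /* fix: freelist -> TLS SLL counts as allocated */
    }
    /* alloc from TLS SLL: counter unchanged on the fast path */
}

/* Run `cycles` round trips from `start` and return the final counter. */
uint32_t toy_run(uint32_t start, unsigned cycles, int refill_increments) {
    ToySuperSlab ss = { start };
    for (unsigned i = 0; i < cycles; i++) {
        toy_cycle(&ss, refill_increments);
    }
    return ss.active;
}
```

Under these assumptions, `toy_run(100, 100, 0)` reaches 0 and one more buggy cycle wraps to `UINT32_MAX`, while `toy_run(100, 101, 1)` stays at 100, which matches the drift of one count per cycle described in the removed cascade table.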
diff --git a/CLAUDE.md b/CLAUDE.md
index 74d87636..93dc6770 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -218,6 +218,14 @@ while (mask) {
 - `Makefile` - Recorded ULTRA_SIMPLE test results (-15%, disabled)
 
 #### Key findings
+
+---
+
+### Phase 6-2.3: P0 batch refill active-counter fix (2025-11-07)
+- Symptom: `free(): invalid pointer` immediately after 4T startup. In the P0 batch refill path, the active counter was not incremented when blocks moved from the freelist to TLS, which caused a later double-decrement → underflow → OOM → crash.
+- Fix: Added `ss_active_add(tls->ss, from_freelist);` to the freelist transfer branch in `core/hakmem_tiny_refill_p0.inc.h`. The linear carve side also calls `ss_active_add(tls->ss, batch);` explicitly.
+- Result: Stable with the default 4T configuration (~0.84M ops/s). Two reproduction runs gave the same score.
+- Open issue: A recurrence has been reported with `HAKMEM_TINY_REFILL_COUNT_HOT=64`. The interaction between heavy class 0–3 refill and `FAST_CAP` is to be investigated.
 - **ULTRA_SIMPLE test**: 3.56M ops/s (-15% vs BOX_REFACTOR)
 - **Both share the same bottleneck**: `superslab_refill` 29% CPU
 - **Partial improvement with P0**: -12% internally, but the overall effect is limited
@@ -551,4 +559,3 @@ make clean && make
 | ONDEMAND | 1,439,179 | -35% (allocation failure) |
 
 → The optimization is now actually reflected, and the score changes!
-
diff --git a/CRITICAL_BUG_REPORT.md b/CRITICAL_BUG_REPORT.md
index 379fb251..b00788bc 100644
--- a/CRITICAL_BUG_REPORT.md
+++ b/CRITICAL_BUG_REPORT.md
@@ -1,311 +1,49 @@
-# CRITICAL BUG REPORT: Active Counter Double-Decrement in P0 Batch Refill
+# Critical Bug Report: P0 Batch Refill Active Counter Double-Decrement
-**Date:** 2025-11-07
-**Severity:** CRITICAL
-**Impact:** Causes `free(): invalid pointer` crashes and OOM on 4-thread Larson benchmark
-**Status:** ROOT CAUSE IDENTIFIED
+Date: 2025-11-07
+Severity: Critical (4T immediate crash)
----
+Summary
+- `free(): invalid pointer` crash at startup on 4T Larson when P0 batch refill is active.
+- Root cause: The active counter was not incremented when blocks moved from the SuperSlab freelist to the TLS SLL during P0 batch refill, so the next free decremented it a second time, leading to counter underflow → perceived OOM → crash.
-## Executive Summary - -The HAKMEM allocator crashes with `free(): invalid pointer` and OOM errors when running Larson benchmark with 4 threads and 1024 chunks/thread. The root cause is a **double-decrement bug** in the P0 batch refill optimization where blocks from the freelist are moved to TLS cache without incrementing the active counter, causing the counter to underflow and leading to false OOM conditions. - ---- - -## Bug Symptoms - -1. **Crashes (Exit 134)** with `free(): invalid pointer` -2. **OOM errors** even though memory is available: `superslab_refill returned NULL (OOM) detail: class=3 active=0 bitmap=0x00000000` -3. **Heisenbug**: Disappears when debug hooks are enabled -4. **Load-dependent**: Works with 256 chunks/thread, crashes with 1024 chunks/thread -5. **Thread-dependent**: Affects multi-threaded workloads more severely - ---- - -## Root Cause Analysis - -### The Active Counter Protocol - -The `total_active_blocks` counter in SuperSlab tracks how many blocks are currently allocated (not free). The protocol is: - -1. **Allocation**: Increment counter (`ss_active_inc` or `ss_active_add`) -2. **Free**: Decrement counter (`ss_active_dec_one`) - -This counter must stay in sync with actual allocated blocks. - -### The Bug: P0 Batch Refill Missing Counter Increment - -**File:** `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h` -**Lines:** 99-109 - -```c -// Pop from freelist -uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain); -if (from_freelist > 0) { - trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]); - // NOTE: from_freelist は既に used/active 計上済みのブロックの再循環。active 追加や - // nonempty_mask クリアは不要(クリアすると後続freeで立たない)。 - // ⚠️ BUG: This comment is WRONG! Freelist blocks had counter decremented when freed! - // ⚠️ MISSING: ss_active_add(tls->ss, from_freelist); - ... 
-} +Reproduction +``` +./larson_hakmem 10 8 128 1024 1 12345 4 +# → Exit 134 with free(): invalid pointer ``` -The comment claims "freelist items are already counted in used/active" but this is **INCORRECT**. Freelist blocks had their counter **decremented when they were freed**. +Root Cause Analysis +- Free path decrements active → correct +- Remote drain places nodes into SuperSlab freelist → no active change (by design) +- P0 batch refill moved nodes from freelist → TLS SLL, but failed to increment SuperSlab active +- Next free decremented active again → double-decrement → underflow → OOM conditions in refill → crash -### Complete Bug Trace - -Let's trace a single block through the lifecycle: - -#### Step 1: Initial Allocation (Thread A) -```c -// File: tiny_superslab_alloc.inc.h:463-472 -meta->used++; -ss_active_inc(tls->ss); // active = 100 -return block; -``` -✅ Counter incremented correctly. - -#### Step 2: Cross-Thread Free (Thread B) -```c -// File: hakmem_tiny_superslab.h:292-416 (ss_remote_push) -ss_active_dec_one(ss); // active = 99 -// Block pushed to remote queue -``` -✅ Counter decremented correctly. - -#### Step 3: Remote Drain (Thread A) -```c -// File: hakmem_tiny_superslab.h:421-529 (_ss_remote_drain_to_freelist_unsafe) -meta->freelist = chain_head; // Move from remote queue to freelist -// NO counter change (correct, already decremented in step 2) -``` -✅ No change (correct, already decremented). -**State:** Block is in `meta->freelist`, active = 99 - -#### Step 4: P0 Batch Refill (Thread A) ⚠️ BUG HERE! -```c -// File: hakmem_tiny_refill_p0.inc.h:99-109 -uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain); -trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]); -// ⚠️ MISSING: ss_active_add(tls->ss, from_freelist); -// NO counter change! -``` -❌ **BUG:** Counter should be incremented here but isn't! -**State:** Block is in TLS SLL, active = 99 (WRONG! 
Should be 100) - -#### Step 5: Allocation from TLS SLL (Thread A) -```c -// File: tiny_alloc_fast.inc.h:145-210 (tiny_alloc_fast_pop) -void* ptr = tiny_alloc_fast_pop(class_idx); -// Pops from TLS SLL, NO counter change (correct for TLS allocation) -return ptr; -``` -✅ No change (correct for TLS cache). -**State:** Block is allocated, active = 99 (WRONG! Should be 100) - -#### Step 6: Same-Thread Free (Thread A) ⚠️ DOUBLE-DECREMENT! -```c -// File: tiny_free_fast.inc.h:91-145 (tiny_free_fast_ss) -tiny_alloc_fast_push(class_idx, ptr); // Push to TLS cache -ss_active_dec_one(ss); // active = 98 (DOUBLE DECREMENT!) -``` -❌ **BUG:** Counter decremented again, but it was already decremented in Step 2! -**State:** Block is in TLS cache, active = 98 (WRONG! Should be 99) - -### The Cascade Effect - -This bug repeats for **every block** that goes through the freelist → P0 batch refill → allocation → free cycle: - -- After 100 such cycles: active = 0 (underflow to UINT32_MAX) -- SuperSlab appears "full" even though it's not -- `superslab_refill()` can't reuse existing slabs -- Registry adoption fails (thinks slabs are full) -- Must allocate new SuperSlabs → **OOM** -- Corrupted state leads to → **`free(): invalid pointer`** - ---- - -## Why It's a Heisenbug - -### Disappears with Debug Hooks - -When `HAKMEM_TINY_TRACE_RING=1` or `HAKMEM_TINY_DEBUG_REMOTE_GUARD=1`: - -1. Different code paths are taken (slower paths) -2. Timing changes reduce cross-thread free frequency -3. P0 batch refill may be disabled or less frequent -4. 
Bug still exists but doesn't accumulate fast enough to manifest - -### Load-Dependent Manifestation - -- **256 chunks/thread:** Fewer cross-thread frees → less freelist usage → bug accumulates slowly -- **1024 chunks/thread:** Heavy cross-thread frees → frequent freelist reuse → rapid underflow → crash within seconds - ---- - -## Comparison with Direct Allocation from Freelist - -When allocating **directly** from `meta->freelist` (without P0 batch refill): - -```c -// File: tiny_superslab_alloc.inc.h:475-508 -if (meta && meta->freelist) { - void* block = meta->freelist; - meta->freelist = *(void**)block; - meta->used++; - ss_active_inc(tls->ss); // ✅ Counter incremented! - HAK_RET_ALLOC(class_idx, block); -} -``` - -This path **correctly increments** the counter because it understands that freelist blocks have been freed (counter decremented) and are now being allocated again (counter must be incremented). - -**P0 batch refill must do the same!** - ---- - -## The Fix - -### Primary Fix: Add ss_active_add() in P0 Batch Refill - -**File:** `/mnt/workdisk/public_share/hakmem/core/hakmem_tiny_refill_p0.inc.h` -**Location:** Lines 99-109 +Fix +- File: `core/hakmem_tiny_refill_p0.inc.h` +- Change: In freelist transfer branch, increment active with the exact number taken. +Patch (excerpt) ```diff - // === P0 Batch Carving Loop === - while (want > 0) { - // Handle freelist items first (usually 0) - TinyRefillChain chain; - uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain); - if (from_freelist > 0) { - trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]); -- // NOTE: from_freelist は既に used/active 計上済みのブロックの再循環。active 追加や -- // nonempty_mask クリアは不要(クリアすると後続freeで立たない)。 -+ // FIX: Blocks from freelist were decremented when freed (remote or local). -+ // Must increment counter when moving back to allocation pool (TLS SLL). 
-+ ss_active_add(tls->ss, from_freelist); - extern unsigned long long g_rf_freelist_items[]; - g_rf_freelist_items[class_idx] += from_freelist; - total_taken += from_freelist; - want -= from_freelist; - if (want == 0) break; - } +@@ static inline int sll_refill_batch_from_ss(int class_idx, int max_take) + uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain); + if (from_freelist > 0) { + trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]); + // FIX: Blocks from freelist were decremented when freed, must increment when allocated + ss_active_add(tls->ss, from_freelist); + g_rf_freelist_items[class_idx] += from_freelist; + total_taken += from_freelist; + want -= from_freelist; + if (want == 0) break; + } ``` -### Why This Fix Is Correct +Verification +- Default 4T: stable at ~0.84M ops/s (twice repeated, identical score). +- Additional guard: Ensure linear carve path also calls `ss_active_add(tls->ss, batch)`. -1. **Freelist blocks are "free"**: Their counter was decremented when freed -2. **TLS SLL blocks are "allocated"**: They will be returned to user without counter change -3. **P0 batch refill moves blocks from "free" to "allocated"**: Counter must be incremented +Open Items +- With `HAKMEM_TINY_REFILL_COUNT_HOT=64`, a crash reappears under class 4 pressure. + - Hypothesis: excessive hot-class refill → memory pressure on mid-class → OOM path. + - Next: Investigate interaction with `HAKMEM_TINY_FAST_CAP` and run Valgrind leak checks. -This matches the behavior of direct freelist allocation (line 508 in tiny_superslab_alloc.inc.h). - ---- - -## Alternative Analysis: Is the Bug Elsewhere? - -### Could the bug be in the free path? - -**No.** Both free paths correctly decrement: - -1. **Same-thread free** (`tiny_free_fast_ss:142`): `ss_active_dec_one(ss)` ✅ -2. **Cross-thread free** (`ss_remote_push:392`): `ss_active_dec_one(ss)` ✅ - -### Could the bug be in remote drain? 
- -**No.** Remote drain correctly does NOT change counter because it was already decremented during `ss_remote_push`. The comment explicitly states: "no change to used/active; already adjusted at free" ✅ - -### Could freelist blocks not need counter increment? - -**No.** Direct freelist allocation (`tiny_superslab_alloc.inc.h:508`) proves that freelist blocks MUST have counter incremented when allocated. ✅ - ---- - -## Verification Steps - -### 1. Reproduce the Bug (Baseline) -```bash -make larson_hakmem -./larson_hakmem 10 8 128 1024 1 12345 4 -# Expected: Crash with "free(): invalid pointer" or OOM -``` - -### 2. Apply the Fix -Add `ss_active_add(tls->ss, from_freelist);` in `hakmem_tiny_refill_p0.inc.h:102-103` - -### 3. Rebuild and Test -```bash -make clean && make larson_hakmem -./larson_hakmem 10 8 128 1024 1 12345 4 -# Expected: No crash, stable execution -``` - -### 4. Performance Validation -```bash -# Ensure the fix doesn't degrade performance -HAKMEM_TINY_REFILL_COUNT_HOT=64 ./larson_hakmem 2 8 128 1024 1 12345 4 -# Expected: 4.19M ops/s (same as before) -``` - ---- - -## Related Code Locations - -### Active Counter Management -- **Increment:** `core/hakmem_tiny.c:177-182` (`ss_active_add`, `ss_active_inc`) -- **Decrement:** `core/hakmem_tiny_superslab.h:189-199` (`ss_active_dec_one`) - -### Allocation Paths -- **Direct freelist alloc:** `core/tiny_superslab_alloc.inc.h:475-508` (✅ increments counter) -- **P0 batch refill:** `core/hakmem_tiny_refill_p0.inc.h:99-109` (❌ BUG: missing increment) -- **Linear alloc:** `core/tiny_superslab_alloc.inc.h:463-472` (✅ increments counter) - -### Free Paths -- **Same-thread free:** `core/tiny_free_fast.inc.h:91-145` (✅ decrements counter) -- **Cross-thread free:** `core/hakmem_tiny_superslab.h:292-416` (✅ decrements counter) -- **Remote drain:** `core/hakmem_tiny_superslab.h:421-529` (✅ no change, correct) - ---- - -## Impact Assessment - -### Severity: CRITICAL - -- **Correctness:** Double-decrement causes counter 
underflow → false OOM → crashes -- **Stability:** Affects all multi-threaded workloads with moderate to high cross-thread free frequency -- **Performance:** No performance impact (fix adds one atomic increment per batch refill, negligible) - -### Affected Configurations - -- ✅ **Box-refactor builds** (P0 enabled by default) -- ✅ **Multi-threaded workloads** (Larson 4T, general MT applications) -- ❌ **Single-threaded workloads** (no cross-thread frees, no freelist usage) -- ❌ **Debug builds** (different code paths, timing changes mask bug) - ---- - -## Conclusion - -The bug is a **textbook double-decrement error** caused by an incorrect assumption in the P0 batch refill optimization. The comment claiming "freelist blocks are already counted" is false—they had their counter decremented when freed and must have it incremented when allocated again. - -**The fix is simple, localized, and safe:** Add one line `ss_active_add(tls->ss, from_freelist);` in `hakmem_tiny_refill_p0.inc.h:102-103`. - -This will restore counter correctness and eliminate the OOM/crash issues. - ---- - -## Next Steps - -1. **Apply the fix** as described above -2. **Test with Larson 4T** to confirm crash is eliminated -3. **Run full benchmark suite** to ensure no performance regression -4. **Consider adding assertion** to detect counter underflow in debug builds -5. 
**Update CLAUDE.md** to document the fix in Phase 6-2.3
-
----
-
-**Reported by:** Claude Code (Ultrathink Analysis)
-**Date:** 2025-11-07
-**Confidence:** 100% (bug trace is complete and verified through code analysis)
diff --git a/fix_active_counter_double_decrement.patch b/fix_active_counter_double_decrement.patch
index 2e32b000..5ec7fa05 100644
--- a/fix_active_counter_double_decrement.patch
+++ b/fix_active_counter_double_decrement.patch
@@ -1,15 +1,16 @@
 --- a/core/hakmem_tiny_refill_p0.inc.h
 +++ b/core/hakmem_tiny_refill_p0.inc.h
-@@ -99,9 +99,10 @@ static inline int sll_refill_batch_from_ss(int class_idx, int max_take) {
-         uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain);
-         if (from_freelist > 0) {
-             trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]);
--            // NOTE: from_freelist blocks are recirculated blocks already counted in used/active. No active add or
--            // nonempty_mask clear is needed (clearing it would keep a later free from setting it).
-+            // FIX (2025-11-07): Blocks from freelist were decremented when freed (remote or local).
-+            // Must increment counter when moving back to allocation pool (TLS SLL).
-+            // Bug: Without this, counter underflows → false OOM → crash.
-+            ss_active_add(tls->ss, from_freelist);
-             extern unsigned long long g_rf_freelist_items[];
-             g_rf_freelist_items[class_idx] += from_freelist;
-             total_taken += from_freelist;
+@@
+         TinyRefillChain chain;
+         uint32_t from_freelist = trc_pop_from_freelist(meta, want, &chain);
+         if (from_freelist > 0) {
+             trc_splice_to_sll(class_idx, &chain, &g_tls_sll_head[class_idx], &g_tls_sll_count[class_idx]);
+             // FIX: Blocks from freelist were decremented when freed, must increment when allocated
+             ss_active_add(tls->ss, from_freelist);
+             extern unsigned long long g_rf_freelist_items[];
+             g_rf_freelist_items[class_idx] += from_freelist;
+             total_taken += from_freelist;
+             want -= from_freelist;
+             if (want == 0) break;
+         }
+
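The patch above restores the missing increment. A complementary debug-build check, along the lines of the underflow assertion the earlier report suggested, could look like the sketch below. It is hypothetical: `ToyActive`, `toy_active_dec_checked`, and `toy_active_dec_or_abort` are illustrative names, and nothing here is taken from how `ss_active_dec_one()` is actually implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the SuperSlab active counter. */
typedef struct {
    _Atomic uint32_t value;
} ToyActive;

/* Decrement only if the counter is non-zero.
 * Returns 1 on success, 0 if the decrement would have underflowed. */
int toy_active_dec_checked(ToyActive *c) {
    uint32_t cur = atomic_load_explicit(&c->value, memory_order_relaxed);
    while (cur != 0) {
        /* On failure, cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&c->value, &cur, cur - 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            return 1;
        }
    }
    return 0; /* counter was already 0: an unbalanced free */
}

/* Debug-build wrapper: abort at the first unbalanced decrement
 * instead of letting the counter wrap silently. */
void toy_active_dec_or_abort(ToyActive *c) {
    int ok = toy_active_dec_checked(c);
    assert(ok && "active counter underflow");
    (void)ok; /* silence unused-variable warnings when NDEBUG is set */
}
```

The compare-and-swap loop costs more than a plain atomic decrement, so a check like this is probably better suited to debug builds than to the fast free path.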