# Free Path Freelist Push Investigation
## Executive Summary
Investigation of the same-thread free path (the freelist push implementation) has identified **ONE CRITICAL BUG** and **MULTIPLE DESIGN ISSUES** that explain the freelist reuse rate problem.
**Critical Finding:** The freelist push itself is performed correctly, but freed blocks are only visible through the mailbox when they are consumed via the refill path; normal allocation pops the freelist directly and never touches the mailbox. This creates a **visibility gap** in the publish/fetch mechanism.
---
## Investigation Flow: free() → alloc()
### Phase 1: Same-Thread Free (freelist push)
**File:** `core/hakmem_tiny_free.inc` (lines 1-608)
**Main Function:** `hak_tiny_free_superslab(void* ptr, SuperSlab* ss)` (lines ~150-300)
#### Fast Path Decision (Line 121):
```c
if (!g_tiny_force_remote && meta->owner_tid != 0 && meta->owner_tid == my_tid) {
    // Same-thread free
    // ...
    tiny_free_local_box(ss, slab_idx, meta, ptr, my_tid);
    // ...
}
```
**Status:** ✓ CORRECT - ownership check is present
#### Freelist Push Implementation
**File:** `core/box/free_local_box.c` (lines 5-36)
```c
void tiny_free_local_box(SuperSlab* ss, int slab_idx, TinySlabMeta* meta, void* ptr, uint32_t my_tid) {
    void* prev = meta->freelist;
    *(void**)ptr = prev;
    meta->freelist = ptr;   // <-- FREELIST PUSH HAPPENS HERE (Line 12)
    // ...
    meta->used--;
    ss_active_dec_one(ss);
    if (prev == NULL) {
        // First-free → publish
        tiny_free_publish_first_free((int)ss->size_class, ss, slab_idx); // Line 34
    }
}
```
**Status:** ✓ CORRECT - freelist push happens unconditionally before publish
#### Publish Mechanism
**File:** `core/box/free_publish_box.c` (lines 23-28)
```c
void tiny_free_publish_first_free(int class_idx, SuperSlab* ss, int slab_idx) {
    tiny_ready_push(class_idx, ss, slab_idx);
    ss_partial_publish(class_idx, ss);
    mailbox_box_publish(class_idx, ss, slab_idx); // Line 28
}
```
**File:** `core/box/mailbox_box.c` (lines 112-122)
```c
void mailbox_box_publish(int class_idx, SuperSlab* ss, int slab_idx) {
    mailbox_box_register(class_idx);
    uintptr_t ent  = ((uintptr_t)ss) | ((uintptr_t)slab_idx & 0x3Fu);
    uint32_t  slot = g_tls_mailbox_slot[class_idx];
    atomic_store_explicit(&g_pub_mailbox_entries[class_idx][slot], ent, memory_order_release);
    g_pub_mail_hits[class_idx]++; // Line 122 - COUNTER INCREMENTED
}
```
**Status:** ✓ CORRECT - publish happens on first-free
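For reference, the published entry packs the `SuperSlab*` pointer and a 6-bit slab index into a single word. A minimal sketch of how `slab_entry_ss()` / `slab_entry_idx()` (used later in the refill path) could decode it, assuming `SuperSlab` allocations are aligned to at least 64 bytes so the low 6 bits are free:
```c
// Sketch only - assumes SuperSlab* is at least 64-byte aligned, so the low
// 6 bits of the packed entry can carry the slab index (0..63).
static inline SuperSlab* slab_entry_ss(uintptr_t ent) {
    return (SuperSlab*)(ent & ~(uintptr_t)0x3Fu);  // strip the index bits
}

static inline int slab_entry_idx(uintptr_t ent) {
    return (int)(ent & 0x3Fu);                     // recover the slab index
}
```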
---
### Phase 2: Refill/Adoption Path (mailbox fetch)
**File:** `core/tiny_refill.h` (lines 136-157)
```c
// For hot tiny classes (0..3), try mailbox first
if (class_idx <= 3) {
    uint32_t self_tid = tiny_self_u32();
    ROUTE_MARK(3);
    uintptr_t mail = mailbox_box_fetch(class_idx); // Line 139
    if (mail) {
        SuperSlab* mss = slab_entry_ss(mail);
        int midx = slab_entry_idx(mail);
        SlabHandle h = slab_try_acquire(mss, midx, self_tid);
        if (slab_is_valid(&h)) {
            if (slab_remote_pending(&h)) {
                slab_drain_remote_full(&h);
            } else if (slab_freelist(&h)) {
                tiny_tls_bind_slab(tls, h.ss, h.slab_idx);
                ROUTE_MARK(4);
                return h.ss; // Success!
            }
        }
    }
}
```
**Status:** ✓ CORRECT - mailbox fetch is called for refill
#### Mailbox Fetch Implementation
**File:** `core/box/mailbox_box.c` (lines 160-207)
```c
uintptr_t mailbox_box_fetch(int class_idx) {
    uint32_t used = atomic_load_explicit(&g_pub_mailbox_used[class_idx], memory_order_acquire);
    // Destructive fetch of first available entry (0..used-1)
    for (uint32_t i = 0; i < used; i++) {
        uintptr_t ent = atomic_exchange_explicit(&g_pub_mailbox_entries[class_idx][i],
                                                 (uintptr_t)0,
                                                 memory_order_acq_rel);
        if (ent) {
            g_rf_hit_mail[class_idx]++; // Line 200 - COUNTER INCREMENTED
            return ent;
        }
    }
    return (uintptr_t)0;
}
```
**Status:** ✓ CORRECT - fetch clears the mailbox entry
---
## Fix Log (2025-11-06)
- P0: Do not clear `nonempty_mask`
  - Change: Removed the logic in `slab_freelist_pop()` (`core/slab_handle.h`) that cleared `nonempty_mask` when a pop left the freelist empty.
  - Reason: Lets a slab that has ever become non-empty be rediscovered, preventing the leak where reuse after free becomes invisible.
- P0: Make adopt_gate TOCTOU-safe
  - Change: Unified every check immediately before bind on `slab_is_safe_to_bind()`. Updated the mailbox/hot/ready/BG aggregation branches in `core/tiny_refill.h`.
  - Change: On the adopt_gate implementation side (`core/hakmem_tiny.c`), always re-verify with `slab_is_safe_to_bind()` after `slab_drain_remote_full()`.
- P1: Add refill item breakdown counters
  - Change: Added `g_rf_freelist_items[]` / `g_rf_carve_items[]` to `core/hakmem_tiny.c`.
  - Change: Count the number of freelist/carve items obtained in `core/hakmem_tiny_refill_p0.inc.h`.
  - Change: Added [Refill Item Sources] to the dump in `core/hakmem_tiny_stats.c`.
- Consolidate the mailbox implementation
  - Change: Removed the old `core/tiny_mailbox.c/.h`. The implementation now lives only in `core/box/mailbox_box.*` (unified into the comprehensive Box).
- Makefile fix
  - Change: Fixed typo `>/devnull` → `>/dev/null`.
### Verification guidelines (SIGUSR1 / exit-time dump)
- Check that mail/reg/ready in [Refill Stage] are not stuck at 0.
- Check the freelist/carve balance in [Refill Item Sources] (a rising freelist count means reuse is flowing).
- If [Publish Hits] / [Publish Pipeline] keep reporting 0, temporarily enable `HAKMEM_TINY_FREE_TO_SS=1` or `HAKMEM_TINY_FREELIST_MASK=1`.
---
## Critical Bug Found
### BUG #1: Freelist Access Without Publish
**Location:** `core/hakmem_tiny_free.inc` (lines 687-695)
**Function:** `superslab_alloc_from_slab()` - Direct freelist pop during allocation
```c
// Freelist mode (after first free())
if (meta->freelist) {
    void* block = meta->freelist;
    meta->freelist = *(void**)block; // Pop from freelist
    meta->used++;
    tiny_remote_track_on_alloc(ss, slab_idx, block, "freelist_alloc", 0);
    tiny_remote_assert_not_remote(ss, slab_idx, block, "freelist_alloc_ret", 0);
    return block; // Direct pop - NO mailbox tracking!
}
```
**Problem:** When allocation directly pops from `meta->freelist`, it completely **bypasses the mailbox layer**. This means:
1. Block is pushed to freelist via `tiny_free_local_box()`
2. Mailbox is published on first-free ✓
3. But if the block is accessed during direct freelist pop, the mailbox entry is never fetched or cleared
4. The mailbox entry remains stale, wasting a slot permanently
**Impact:**
- **Permanent mailbox slot leakage** - Published blocks that are directly popped are never cleared
- **False positive in `g_pub_mail_hits[]`** - count includes blocks that bypassed the fetch path
- **Freelist reuse becomes invisible** to refill metrics because it doesn't go through mailbox_box_fetch()
### BUG #2: Premature Publish Before Freelist Formation
**Location:** `core/box/free_local_box.c` (lines 32-34)
**Issue:** Publish happens only on first-free (prev==NULL)
```c
if (prev == NULL) {
    tiny_free_publish_first_free((int)ss->size_class, ss, slab_idx);
}
```
**Problem:** Once first-free publishes, subsequent pushes (prev!=NULL) are **silent**:
- Block 1 freed: freelist=[1], mailbox published ✓
- Block 2 freed: freelist=[2→1], mailbox NOT updated ⚠️
- Block 3 freed: freelist=[3→2→1], mailbox NOT updated ⚠️
The mailbox only ever contains the first freed block in the slab. If that block is allocated and then freed again, the mailbox entry is not refreshed.
**Impact:**
- Freelist state changes after first-free are not advertised
- Refill can't discover newly available blocks without a full registry scan
- Forces the slower adoption path (registry scan) instead of a mailbox hit
---
## Design Issues
### Issue #1: Missing Freelist State Visibility
The core problem: **`meta->freelist` is not synchronized with the publish state**.
**Current Flow:**
```
free()
  → tiny_free_local_box()
      → meta->freelist = ptr                  (direct write, no sync)
      → if (prev == NULL) mailbox_publish()   (one-time)

refill()
  → Try mailbox_box_fetch()                   (gets only first-free block)
  → If miss, scan registry                    (slow path, O(n))
  → If found, adopt & pop freelist

alloc()
  → superslab_alloc_from_slab()
      → if (meta->freelist) pop               (direct access, bypasses mailbox!)
```
**Missing:** Mailbox consistency check when freelist is accessed
### Issue #2: Adoption vs. Direct Access Race
**Location:** `core/hakmem_tiny_free.inc` (lines 687-695)

**Thread A (owner):**
1. Allocate from SS
2. Free block → freelist=[1]
3. Publish mailbox ✓

**Thread B (refill, intended path):**
4. Refill: Try adopt
5. Mailbox fetch gets [1] ✓
6. Ownership acquire → success
7. But direct alloc bypasses this path!

**Thread A (same thread, direct path):**
8. Alloc again (same thread)
9. Pop from freelist directly → mailbox entry is now stale

**Result:** Mailbox state diverges from actual freelist state
### Issue #3: Ownership Transition Not Tracked
When `meta->owner_tid` changes (cross-thread ownership transfer), freelist is not re-published:
**Location:** `core/hakmem_tiny_free.inc` (lines 120-135)
```c
if (!g_tiny_force_remote && meta->owner_tid != 0 && meta->owner_tid == my_tid) {
    // Same-thread path
} else {
    // Cross-thread path - but NO REPUBLISH if ownership changes
}
```
**Missing:** When ownership transitions to a new thread, the existing freelist should be advertised to that thread
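A minimal sketch of what such a re-advertisement could look like when a new owner adopts a slab; the hook point and helper name are hypothetical, while `slab_freelist()` and `tiny_free_publish_first_free()` are the existing calls shown above:
```c
// Sketch only - hypothetical hook, called after slab_try_acquire() succeeds
// for a slab adopted from another thread. If the slab still carries freed
// blocks, re-advertise it so the new owner's refill path can also find it
// through the mailbox.
static inline void slab_republish_on_adopt(SlabHandle* h) {
    if (slab_freelist(h)) {
        tiny_free_publish_first_free((int)h->ss->size_class, h->ss, h->slab_idx);
    }
}
```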
---
## Metrics Analysis
The counters reveal the issue:
**In `core/box/mailbox_box.c` (Line 122):**
```c
void mailbox_box_publish(int class_idx, SuperSlab* ss, int slab_idx) {
    // ...
    g_pub_mail_hits[class_idx]++; // Published count
}
```
**In `core/box/mailbox_box.c` (Line 200):**
```c
uintptr_t mailbox_box_fetch(int class_idx) {
    // ... (scan loop elided)
    if (ent) {
        g_rf_hit_mail[class_idx]++; // Fetched count
        return ent;
    }
    return (uintptr_t)0;
}
```
**Expected Relationship:** `g_rf_hit_mail[class_idx]` should be ~1.0x of `g_pub_mail_hits[class_idx]`
**Actual Relationship:** Probably 0.1x - 0.5x (many published entries never fetched)
**Explanation:**
- Blocks are published (g_pub_mail_hits++)
- But they're accessed via direct freelist pop (no fetch)
- So g_rf_hit_mail stays low
- Mailbox entries accumulate as garbage
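A small sketch of how this imbalance could be surfaced in a stats dump; the dump function and the `unsigned long` counter type are assumptions, while the counter names are the ones shown above:
```c
#include <stdio.h>

// Sketch only - assumes the counters are plain per-class arrays.
// A fetch/publish ratio well below 1.0 means entries are published but never fetched.
extern unsigned long g_pub_mail_hits[]; // incremented in mailbox_box_publish()
extern unsigned long g_rf_hit_mail[];   // incremented in mailbox_box_fetch()

static void dump_mailbox_publish_fetch_ratio(int num_classes) {
    for (int c = 0; c < num_classes; c++) {
        unsigned long pub = g_pub_mail_hits[c];
        unsigned long fet = g_rf_hit_mail[c];
        double ratio = (pub != 0) ? (double)fet / (double)pub : 0.0;
        fprintf(stderr, "class %d: publish=%lu fetch=%lu fetch/publish=%.2f\n",
                c, pub, fet, ratio);
    }
}
```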
---
## Root Cause Summary
**Root Cause:** The freelist push is functional, but the **visibility mechanism (mailbox) is decoupled** from the **actual freelist access pattern**.
The system assumes refill always goes through mailbox_fetch(), but direct freelist pops bypass this entirely, creating:
1. **Stale mailbox entries** - Published but never fetched
2. **Invisible reuse** - Freed blocks are reused directly without fetch visibility
3. **Metric misalignment** - g_pub_mail_hits >> g_rf_hit_mail
---
## Recommended Fixes
### Fix #1: Clear Stale Mailbox Entry on Direct Pop
**File:** `core/hakmem_tiny_free.inc` (lines 687-695)
**In:** `superslab_alloc_from_slab()`
```c
if (meta->freelist) {
    void* block = meta->freelist;
    meta->freelist = *(void**)block;
    meta->used++;
    // NEW: If this is a mailbox-published slab, clear the entry
    if (slab_idx == 0) { // Only first slab publishes
        // Signal to refill: this slab's mailbox entry may now be stale
        // Option A: Mark as dirty (requires new field)
        // Option B: Clear mailbox on first pop (requires sync)
    }
    return block;
}
```
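A minimal sketch of what Option B could look like; `mailbox_box_clear_if_match()` is a hypothetical helper that mirrors the packing used by `mailbox_box_publish()` and clears a slot only if it still holds this exact slab entry:
```c
// Sketch only - hypothetical helper, not part of the current code base.
// Called from the direct-pop path (before 'return block;') so that a pop
// does not leave a stale published entry behind.
static void mailbox_box_clear_if_match(int class_idx, SuperSlab* ss, int slab_idx) {
    uintptr_t want = ((uintptr_t)ss) | ((uintptr_t)slab_idx & 0x3Fu); // same packing as publish
    uint32_t used = atomic_load_explicit(&g_pub_mailbox_used[class_idx], memory_order_acquire);
    for (uint32_t i = 0; i < used; i++) {
        uintptr_t expected = want;
        // Clear the slot only if it still holds exactly this entry.
        if (atomic_compare_exchange_strong_explicit(&g_pub_mailbox_entries[class_idx][i],
                                                    &expected, (uintptr_t)0,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed)) {
            break;
        }
    }
}
```
The trade-off is one extra scan of the mailbox slots on the direct-pop path, so it may only be worth enabling for the hot classes that actually publish.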
### Fix #2: Republish After Each Free (Aggressive)
**File:** `core/box/free_local_box.c` (lines 32-34)
**Problem:** Only first-free publishes
**Change:**
```c
// Always publish if freelist is non-empty
if (meta->freelist != NULL) {
    tiny_free_publish_first_free((int)ss->size_class, ss, slab_idx);
}
```
**Cost:** More atomic operations, but ensures mailbox is always up-to-date
### Fix #3: Track Freelist Modifications via Atomic
**New Approach:** Use atomic freelist_mask as published state
**File:** `core/box/free_local_box.c` (current lines 15-25)
```c
// Already implemented for first-free - use this more aggressively:
// first free and later frees both set the bit, so the two branches
// collapse into a single unconditional OR.
uint32_t bit = (1u << slab_idx);
atomic_fetch_or_explicit(&ss->freelist_mask, bit, memory_order_release);
```
### Fix #4: Add Freelist Consistency Check in Refill
**File:** `core/tiny_refill.h` (lines ~140-156)
**New Logic:**
```c
// (Assumes this snippet sits inside the refill candidate loop, so
//  'continue' advances to the next mailbox/ready/registry candidate.)
uintptr_t mail = mailbox_box_fetch(class_idx);
if (mail) {
    SuperSlab* mss = slab_entry_ss(mail);
    int midx = slab_entry_idx(mail);
    SlabHandle h = slab_try_acquire(mss, midx, self_tid);
    if (slab_is_valid(&h)) {
        if (slab_freelist(&h)) {
            // NEW: Verify mailbox entry matches actual freelist
            if (h.ss->slabs[h.slab_idx].freelist == NULL) {
                // Stale entry - was already popped directly.
                // Re-publish if more blocks were freed since.
                continue; // Try next candidate
            }
            tiny_tls_bind_slab(tls, h.ss, h.slab_idx);
            return h.ss;
        }
    }
}
```
---
## Testing Recommendations
### Test 1: Mailbox vs. Direct Pop Ratio
Instrument the code to measure:
- `mailbox_fetch_calls` vs `direct_freelist_pops`
- Expected ratio after warmup: Should be ~1:1 if refill path is being used
- Actual ratio: Probably 1:10 or worse (direct pops dominating)
### Test 2: Mailbox Entry Staleness
Enable debug mode and check:
```
HAKMEM_TINY_MAILBOX_TRACE=1 HAKMEM_TINY_RF_TRACE=1 ./larson
```
Examine MBTRACE output:
- Count "publish" events vs "fetch" events
- Any publish without matching fetch = wasted slot
### Test 3: Freelist Reuse Path
Add instrumentation to `superslab_alloc_from_slab()`:
```c
if (meta->freelist) {
    g_direct_freelist_pops[class_idx]++; // New counter
}
```
Compare with refill path:
```c
g_refill_calls[class_idx]++;
```
Verify how allocations split between direct freelist pops (expected to dominate) and refill; if the refill count stays low while direct pops are high, the freelist itself is working and only its visibility is weak.
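A short sketch of how the comparison could be reported at process exit; the counters, the `TINY_NUM_CLASSES` constant, and the `atexit` hook are assumptions matching the instrumentation proposed above:
```c
#include <stdio.h>
#include <stdlib.h>

// Sketch only - assumes the instrumentation counters above exist as
// per-class arrays and that TINY_NUM_CLASSES is the tiny class count.
extern unsigned long g_direct_freelist_pops[];
extern unsigned long g_refill_calls[];

static void dump_direct_vs_refill(void) {
    unsigned long direct_total = 0, refill_total = 0;
    for (int c = 0; c < TINY_NUM_CLASSES; c++) {
        direct_total += g_direct_freelist_pops[c];
        refill_total += g_refill_calls[c];
    }
    fprintf(stderr, "direct freelist pops=%lu refill calls=%lu\n",
            direct_total, refill_total);
}

// Registered once at startup, e.g. atexit(dump_direct_vs_refill);
```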
---
## Code Quality Issues Found
### Issue #1: Unused Function Parameter
**File:** `core/box/free_local_box.c` (line 8)
```c
void tiny_free_local_box(SuperSlab* ss, int slab_idx, TinySlabMeta* meta, void* ptr, uint32_t my_tid) {
    // ...
    (void)my_tid; // Explicitly ignored
}
```
**Why:** The parameter is passed but never used, which suggests a design change in which ownership was computed earlier, before the call.
### Issue #2: Magic Number for First Slab
**File:** `core/hakmem_tiny_free.inc` (line 676)
```c
if (slab_idx == 0) {
    slab_start = (char*)slab_start + 1024; // Magic number!
}
```
Should be:
```c
if (slab_idx == 0) {
    slab_start = (char*)slab_start + sizeof(SuperSlab); // or a named constant
}
```
### Issue #3: Duplicate Freelist Scan Logic
**Locations:**
- `core/hakmem_tiny_free.inc` (line ~45-62): `tiny_remote_queue_contains_guard()`
- `core/hakmem_tiny_free.inc` (line ~50-64): Duplicate in safe_free path
These should be unified into a helper function.
---
## Performance Impact
**Current Situation:**
- Freelist is functional and pushed correctly
- But publish/fetch visibility is weak
- Forces all allocations to use the direct freelist pop (bypassing the refill path)
- This is actually **good** for performance (fewer lock/sync operations)
- But creates **hidden fragmentation** (freelist not reorganized by adopt path)
**After Fix:**
- Expect +5-10% refill path usage (from ~0% to ~5-10%)
- Refill path can reorganize and rebalance
- Better memory locality for hot allocations
- Slightly more atomic operations during free (acceptable trade-off)
---
## Conclusion
**The freelist push IS happening.** The bug is not in the push logic itself, but in:
1. **Visibility Gap:** Pushed blocks are not tracked by mailbox when accessed via direct pop
2. **Incomplete Publish:** Only first-free publishes; later frees are silent
3. **Lack of Republish:** Freelist state changes not advertised to refill path
The fixes are straightforward:
- Re-publish on every free (not just first-free)
- Validate mailbox entries during fetch
- Track direct vs. refill access to find optimal balance
This explains why Larson shows low refill metrics despite high freelist push rate.