Refactor: Split monolithic hakmem_shared_pool.c into acquire/release modules

- Split core/hakmem_shared_pool.c into acquire/release modules for maintainability.
- Introduced core/hakmem_shared_pool_internal.h for the shared internal API.
- Fixed incorrect function name usage (superslab_alloc -> superslab_allocate).
- Increased SUPER_REG_SIZE to 1M to support large working sets (Phase 9-2 fix).
- Updated the Makefile.
- Verified with benchmarks.
CURRENT_TASK.md (380 lines changed)

@@ -1,363 +1,51 @@
**New content:**

# Current Task: Phase 9-2 Refactoring (Complete)

**Date**: 2025-12-01
**Status**: **COMPLETE** (Phase 9-2 & Refactoring)
**Goal**: SuperSlab Unified Management, Stability Fixes, and Code Refactoring

---

## Phase 9-2 Achievements (Completed)

1. **Critical Fixes (Deadlock & OOM)** — see the sketch after this item.
   * **Deadlock**: `shared_pool_acquire_slab` now releases `alloc_lock` before calling `superslab_allocate` (via `sp_internal_allocate_superslab`), preventing lock inversion with `g_super_reg_lock`.
   * **OOM**: Enabled `HAKMEM_TINY_USE_SUPERSLAB=1` by default in `hakmem_build_flags.h`, ensuring fallback to the Legacy Backend when the Shared Pool hits its soft cap.
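A minimal sketch of the deadlock fix, condensed from Stage 3b of `core/hakmem_shared_pool_acquire.c` (shown in full later in this commit):

```c
/* Condensed from shared_pool_acquire_slab(), Stage 3b: drop alloc_lock
 * around superslab_allocate() because that call takes g_super_reg_lock,
 * and holding both locks at once inverted the lock order. */
pthread_mutex_unlock(&g_shared_pool.alloc_lock);            /* drop pool lock first */
SuperSlab* allocated_ss = sp_internal_allocate_superslab(); /* may take g_super_reg_lock */
pthread_mutex_lock(&g_shared_pool.alloc_lock);              /* re-acquire for pool updates */
if (!allocated_ss) {
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    return -1;                                              /* OOM: caller falls back to legacy */
}
```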
2. **SuperSlab Management Unification**
   * **Unified Entry**: introduced the `sp_internal_allocate_superslab` helper to manage a safe allocation flow.
   * **Unified Free**: implemented `remove_superslab_from_legacy_head` to safely remove pointers from legacy lists when freeing via the Shared Pool.

3. **Code Refactoring (Split `hakmem_shared_pool.c`)** — a usage sketch of the new module pair follows this list.
   * **Split Strategy**: divided the monolithic `core/hakmem_shared_pool.c` (1400+ lines) into logical modules:
     * `core/hakmem_shared_pool.c`: initialization, stats, and common helpers.
     * `core/hakmem_shared_pool_acquire.c`: allocation logic (`shared_pool_acquire_slab` and Stages 0.5-3).
     * `core/hakmem_shared_pool_release.c`: deallocation logic (`shared_pool_release_slab`).
     * `core/hakmem_shared_pool_internal.h`: internal shared definitions and prototypes.
   * **Makefile**: updated to compile and link the new files.
   * **Cleanups**: removed unused "L0 Cache" experimental code and fixed incorrect function names (`superslab_alloc` -> `superslab_allocate`).
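A hypothetical caller (not from this commit) illustrating the contract of the split modules; `carve_blocks_into_tls` is an assumed consumer name:

```c
#include "hakmem_shared_pool.h"   /* public API implemented by the split modules */

static int refill_one_slab(int class_idx)
{
    SuperSlab* ss = NULL;
    int slab_idx = -1;
    if (shared_pool_acquire_slab(class_idx, &ss, &slab_idx) != 0)
        return -1;                         /* soft cap or OOM: fall back to legacy backend */
    /* Invariants (from the header comment in hakmem_shared_pool_acquire.c):
     * ss != NULL, slab_idx is in range, and the slab's class_idx matches. */
    carve_blocks_into_tls(ss, slab_idx);   /* hypothetical consumer */
    return 0;
}
```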
---

## Next Phase Candidates (Handover from Phase 9-2)

### 1. Soft Cap (Policy) Tuning
* **Issue**: medium working sets (8192) hit the Shared Pool soft cap easily, causing frequent fallbacks and performance degradation.
* **Action**: review `hakmem_policy.c` and adjust `tiny_cap` or improve the dynamic adjustment logic. The gate this tunes is condensed below.
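The gate itself, condensed from Stage 3 of `core/hakmem_shared_pool_acquire.c` (the full file appears later in this commit); tuning the policy changes what `sp_class_active_limit()` returns:

```c
/* Condensed soft-cap gate: when the per-class active-slot count reaches the
 * policy limit, acquire fails and the caller falls back to the legacy backend. */
uint32_t limit = sp_class_active_limit(class_idx);              /* 0 = uncapped */
if (limit > 0 && g_shared_pool.class_active_slots[class_idx] >= limit) {
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    return -1;                                                  /* soft cap reached */
}
```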
### 2. Fast Path Optimization
* **Issue**: small working sets (256) show 70-88% of SysAlloc performance due to lock/call overhead. The refactoring caused a slight dip (~15%), highlighting the need for optimization.
* **Action**: re-implement a lightweight L0 cache or optimize the lock-free path in the Shared Pool for hot-path performance. Consider inlining hot helpers again via header-only implementations if needed (sketched below).
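A hypothetical sketch of the "lightweight L0 cache" idea (illustrative only; the names, depth, and placement are assumptions, and nothing like this exists in the tree after the cleanup above):

```c
/* Hypothetical header-only L0 cache: a tiny per-thread, per-class stack
 * consulted before the shared-pool path. Being __thread, it needs no locks. */
#define L0_DEPTH 4                       /* illustrative depth */
typedef struct { void* blk[L0_DEPTH]; int top; } TinyL0;
static __thread TinyL0 g_l0[TINY_NUM_CLASSES];

static inline void* l0_pop(int cls) {
    TinyL0* c = &g_l0[cls];
    return (c->top > 0) ? c->blk[--c->top] : NULL;   /* miss => fall through */
}

static inline int l0_push(int cls, void* p) {
    TinyL0* c = &g_l0[cls];
    if (c->top >= L0_DEPTH) return 0;                /* full => normal free path */
    c->blk[c->top++] = p;
    return 1;
}
```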
### 3. Legacy Backend Removal
* **Issue**: the Legacy Backend (`g_superslab_heads`) is still kept for fallback but adds complexity.
* **Action**: plan the complete removal of `g_superslab_heads`, migrating all management to the Shared Pool.
---

## Current Status

* **Build**: passing (clean build verified).
* **Benchmarks**:
  * `HAKMEM_TINY_SS_SHARED=1` (Normal): ~20.0 M ops/s (working, fallback active).
  * `HAKMEM_TINY_SS_SHARED=2` (Strict): ~20.3 M ops/s (working, OOMs on soft cap as expected).
* **Pending**: selection of the next focus area.
**Removed content (previous plan; prose translated from Japanese):**

# Current Task: Phase 9-2 — Plan to Unify SuperSlab State

**Date**: 2025-12-01
**Status**: Runtime bug provisionally resolved (slot synchronization stops the registry exhaustion)
**Goal**: Eliminate the duplicated Legacy/Shared metadata and pursue a root-cause design that consolidates SuperSlab state management in the shared pool.

---

## Background / Symptoms

- With `HAKMEM_TINY_USE_SUPERSLAB=1`: `SuperSlab registry full` — registry entries were never released and ran out.
- Cause: SuperSlabs allocated via the Legacy path were not reflected in the Shared Pool's slot state, so `shared_pool_release_slab()` returned early.
- Stopgap: already fixed so that `sp_meta_sync_slots_from_ss()` synchronizes on detecting a discrepancy and proceeds through EMPTY → free list → registry deregistration.

## Root Cause (Box Theory view)

- Dual state management: the Legacy path and the Shared Pool path each hold SuperSlab state, and they drift out of sync.
- Multiplied boundaries: acquire/free has several boundaries, so EMPTY detection and slot transitions are scattered.

## Goals

1) Consolidate SuperSlab state transitions (UNUSED/ACTIVE/EMPTY) into the Shared Pool's slot state.
2) Concentrate the acquire/free/adopt/drain boundaries in the shared-pool path (with A/B guards so changes can be rolled back).
3) Keep the Legacy backend as a compatibility box, synchronized at its entry point, until it can eventually be deleted.

## Next Steps (procedure)

1. **Design the unified entry point**
   - Design a plan to route `superslab_allocate()` through a thin shared-pool wrapper so registration and `SharedSSMeta` initialization always happen (ON/OFF via env); a sketch follows this list.
2. **Clean up the free path**
   - Clarify responsibilities so that only `shared_pool_release_slab()` handles EMPTY detection from TLS drain / remote / local free.
   - Draft a design that funnels `empty_mask/nonempty_mask/freelist_mask` updates through a single shared-pool internal helper.
3. **Observation and guards**
   - A/B via `HAKMEM_TINY_SS_SHARED` / `HAKMEM_TINY_USE_SUPERSLAB`; one-shot observation via `*_DEBUG`.
   - Dashboard the `shared_fail→legacy` counter and registry occupancy to judge when the migration is complete.
4. **Write the staged convergence plan**
   - Document the stages and rollback conditions for defaulting the Legacy backend to OFF and then deleting it.
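A hedged sketch of the step-1 wrapper (design only, not in the tree; the env flag name is hypothetical, the helpers are the real ones declared in `core/hakmem_shared_pool_internal.h` later in this commit, and locking is elided):

```c
/* Design sketch: force every SuperSlab allocation through the shared pool so
 * registration and SharedSSMeta initialization always happen. */
static SuperSlab* ss_allocate_unified(uint8_t size_class)
{
    static int unified = -1;
    if (__builtin_expect(unified == -1, 0)) {
        const char* e = getenv("HAKMEM_SS_UNIFIED_ENTRY");   /* hypothetical ON/OFF flag */
        unified = (e && *e && *e != '0') ? 1 : 0;
    }
    SuperSlab* ss = superslab_allocate(size_class);
    if (ss && unified) {
        SharedSSMeta* meta = sp_meta_find_or_create(ss);     /* register with the shared pool */
        if (meta) sp_meta_sync_slots_from_ss(meta, ss);      /* keep slot state coherent */
    }
    return ss;
}
```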
## Current Blockers / Risks

- If code keeps growing with Legacy/Shared mixed, synchronization gaps are likely to reappear.
- LRU/EMPTY mask responsibilities are scattered, so consolidating them may have side effects.

## Deliverables

- Design notes: unified entry wrapper, mask-update helper, A/B guard design.
- Minimal patch proposal: introduce the wrapper + consolidate mask updates (code changes in the next step).
- Verification procedure: regression test for registry exhaustion; confirm the `shared_fail→legacy` counter converges.

---

## Commits

### Phase 8 Root Cause Fix

**Commit**: `191e65983`
**Date**: 2025-11-30
**Files**: 3 files, 36 insertions(+), 13 deletions(-)

**Changes**:

1. `bench_fast_box.c` (Layer 0 + Layer 1):
   - Removed the unified_cache_init() call (design misunderstanding)
   - Limited prealloc to 128 blocks/class (actual TLS SLL capacity)
   - Added root-cause comments explaining why unified_cache_init() was wrong

2. `bench_fast_box.h` (Layer 3):
   - Added Box Contract documentation (BenchFast uses TLS SLL, NOT UC)
   - Documented scope separation (workload vs infrastructure allocations)
   - Added a contract-violation example (Phase 8 bug explanation)

3. `tiny_unified_cache.c` (Layer 2):
   - Changed calloc() → __libc_calloc() (infrastructure isolation)
   - Changed free() → __libc_free() (symmetric cleanup)
   - Added defensive-fix comments explaining the infrastructure bypass

### Phase 8-TLS-Fix

**Commit**: `da8f4d2c8`
**Date**: 2025-11-30
**Files**: 3 files, 21 insertions(+), 11 deletions(-)

**Changes**:

1. `bench_fast_box.c` (TLS→Atomic):
   - Changed `__thread int bench_fast_init_in_progress` → `atomic_int g_bench_fast_init_in_progress`
   - Added atomic_load() for reads, atomic_store() for writes
   - Added root-cause comments (pthread_once creates fresh TLS)

2. `bench_fast_box.h` (TLS→Atomic):
   - Updated the extern declaration to match atomic_int
   - Added a Phase 8-TLS-Fix comment explaining cross-thread safety

3. `bench_fast_box.c` (Header Write):
   - Replaced `tiny_region_id_write_header()` → direct write `*(uint8_t*)base = 0xa0 | class_idx`
   - Added a Phase 8-P3-Fix comment explaining the P3 optimization bypass
   - Contract: BenchFast always writes headers (required for free routing)

4. `hak_wrappers.inc.h` (Atomic):
   - Updated the bench_fast_init_in_progress check to use atomic_load()
   - Added a Phase 8-TLS-Fix comment for cross-thread safety

---
## Performance Journey

### Phase-by-Phase Progress

```
Phase 3 (mincore removal):     56.8 M ops/s
Phase 4 (Hot/Cold Box):        57.2 M ops/s (+0.7%)
Phase 5 (Mid MT fix):          52.3 M ops/s (-8.6% regression)
Phase 6 (Lock-free Mid MT):    42.1 M ops/s (Mid MT: +2.65%)
Phase 7-Step1 (Unified front): 80.6 M ops/s (+54.2%!) ⭐
Phase 7-Step4 (Dead code):     81.5 M ops/s (+1.1%) ⭐⭐
Phase 8 (Normal mode):         16.3 M ops/s (working, different workload)

Total improvement: +43.5% (56.8M → 81.5M) from Phase 3
```

**Note**: Phase 8 used a different benchmark (10M iterations, ws=8192) vs Phase 7 (ws=256).
Normal mode performance: 16.3M ops/s (working, no crash).

---
## Technical Details

### Layer 0: Prealloc Capacity Fix

**File**: `core/box/bench_fast_box.c`
**Lines**: 131-148

**Root Cause**:
- Old code preallocated 50,000 blocks/class
- TLS SLL actual capacity: 128 blocks (adaptive sizing limit)
- Lost blocks (beyond 128) caused heap corruption

**Fix**:
```c
// Before:
const uint32_t PREALLOC_COUNT = 50000; // Too large!

// After:
const uint32_t ACTUAL_TLS_SLL_CAPACITY = 128; // Observed actual capacity
for (int cls = 2; cls <= 7; cls++) {
    uint32_t capacity = ACTUAL_TLS_SLL_CAPACITY;
    for (int i = 0; i < (int)capacity; i++) {
        // preallocate...
    }
}
```

### Layer 1: Design Misunderstanding Fix

**File**: `core/box/bench_fast_box.c`
**Lines**: 123-128 (REMOVED)

**Root Cause**:
- BenchFast uses TLS SLL directly (g_tls_sll[])
- Unified Cache is NOT used by BenchFast
- unified_cache_init() created 16KB allocations (infrastructure)
- Later freed by BenchFast → header misclassification → CRASH

**Fix**:
```c
// REMOVED:
// unified_cache_init(); // WRONG! BenchFast uses TLS SLL, not Unified Cache

// Added comment:
// Phase 8 Root Cause Fix: REMOVED unified_cache_init() call
// Reason: BenchFast uses TLS SLL directly, NOT Unified Cache
```

### Layer 2: Infrastructure Isolation

**File**: `core/front/tiny_unified_cache.c`
**Lines**: 61-71 (init), 103-109 (shutdown)

**Strategy**: Dual-Path Separation
- **Workload allocations** (measured): HAKMEM paths (TLS SLL, Unified Cache)
- **Infrastructure allocations** (unmeasured): __libc_calloc/__libc_free

**Fix**:
```c
// Before:
g_unified_cache[cls].slots = (void**)calloc(cap, sizeof(void*));

// After:
extern void* __libc_calloc(size_t, size_t);
g_unified_cache[cls].slots = (void**)__libc_calloc(cap, sizeof(void*));
```

### Layer 3: Box Contract Documentation

**File**: `core/box/bench_fast_box.h`
**Lines**: 13-51

**Added Documentation**:
- BenchFast uses TLS SLL, NOT Unified Cache
- Scope separation (workload vs infrastructure)
- Preconditions and guarantees
- Contract violation example (Phase 8 bug)

### TLS→Atomic Fix

**File**: `core/box/bench_fast_box.c`
**Lines**: 22-27 (declaration), 37, 124, 215 (usage)

**Root Cause**:
```
pthread_once() → creates new thread
New thread has fresh TLS (bench_fast_init_in_progress = 0)
Guard broken → getenv() allocates → freed by __libc_free() → CRASH
```

**Fix**:
```c
// Before (TLS - broken):
__thread int bench_fast_init_in_progress = 0;
if (__builtin_expect(bench_fast_init_in_progress, 0)) { ... }

// After (Atomic - fixed):
atomic_int g_bench_fast_init_in_progress = 0;
if (__builtin_expect(atomic_load(&g_bench_fast_init_in_progress), 0)) { ... }
```

**Box Theory Validation**:
- **Responsibility**: the guard must protect the entire process (not per-thread)
- **Contract**: "No BenchFast allocations during init" (all threads)
- **Observable**: an atomic variable is visible across all threads
- **Composable**: works with the pthread_once() threading model
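A stand-alone illustration (not project code) of why the `__thread` guard failed across threads while an `atomic_int` works:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static __thread int tls_guard = 0;      /* per-thread: a new thread starts at 0 */
static atomic_int atomic_guard = 0;     /* process-wide: one value for all threads */

static void* helper(void* arg) {
    (void)arg;
    /* Prints tls_guard=0 (fresh TLS!) but atomic_guard=1. */
    printf("helper sees tls_guard=%d atomic_guard=%d\n",
           tls_guard, atomic_load(&atomic_guard));
    return NULL;
}

int main(void) {
    tls_guard = 1;                      /* visible only in this thread */
    atomic_store(&atomic_guard, 1);     /* visible everywhere */
    pthread_t t;
    pthread_create(&t, NULL, helper, NULL);
    pthread_join(t, NULL);
    return 0;
}
```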
### Header Write Fix

**File**: `core/box/bench_fast_box.c`
**Lines**: 70-80

**Root Cause**:
- P3 optimization: tiny_region_id_write_header() skips header writes by default
- BenchFast free routing checks the header magic (0xa0-0xa7)
- No header → free() misroutes to __libc_free() → CRASH

**Fix**:
```c
// Before (broken - calls function that skips write):
tiny_region_id_write_header(base, class_idx);
return (void*)((char*)base + 1);

// After (fixed - direct write):
*(uint8_t*)base = (uint8_t)(0xa0 | (class_idx & 0x0f)); // Direct write
return (void*)((char*)base + 1);
```

**Contract**: BenchFast always writes headers (required for free routing)
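For context, a hedged sketch of the free-side check that makes this header write mandatory (simplified; the real classifier lives elsewhere in the front gate):

```c
#include <stdint.h>

/* The allocator returns base + 1, so the header byte sits just below the
 * user pointer. Magic 0xa0 | class_idx encodes classes 0..7 as 0xa0..0xa7. */
static inline int looks_like_benchfast_block(const void* user_ptr) {
    uint8_t hdr = *((const uint8_t*)user_ptr - 1);
    return (hdr & 0xf8) == 0xa0;   /* in 0xa0..0xa7 => route to BenchFast free */
}
```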
---
## Next Phase Options

### Option A: Continue Phase 7 (Steps 5-7) 📦
**Goal**: Remove remaining legacy layers (complete dead code elimination)
**Expected**: Additional +3-5% via further code cleanup
**Duration**: 1-2 days
**Risk**: Low (infrastructure already in place)

**Remaining Steps**:
- Step 5: Compile library with PGO flag (Makefile change)
- Step 6: Verify dead code elimination in assembly
- Step 7: Measure performance improvement

### Option B: PGO Re-enablement 🚀
**Goal**: Re-enable PGO workflow from Phase 4-Step1
**Expected**: +6-13% cumulative (on top of 81.5M)
**Duration**: 2-3 days
**Risk**: Low (proven pattern)

**Current projection**:
- Phase 7 baseline: 81.5 M ops/s
- With PGO: ~86-93 M ops/s (+6-13%)

### Option C: BenchFast Pool Expansion 🏎️
**Goal**: Increase BenchFast pool size for full 10M iteration support
**Expected**: Structural ceiling measurement (30-40M ops/s target)
**Duration**: 1 day
**Risk**: Low (just increase the prealloc count)

**Current status**:
- Pool: 128 blocks/class (768 total)
- Exhaustion: C6/C7 exhaust after ~200 iterations
- Need: ~10,000 blocks/class for 10M iterations (60,000 total)

### Option D: Production Readiness 📊
**Goal**: Comprehensive benchmark suite, deployment guide
**Expected**: Full performance comparison, stability testing
**Duration**: 3-5 days
**Risk**: Low (documentation + testing)

---

## Recommendation

### Top Pick: **Option C (BenchFast Pool Expansion)** 🏎️

**Reasoning**:
1. **Phase 8 fixes working**: TLS→Atomic + Header write proven
2. **Quick win**: just increase ACTUAL_TLS_SLL_CAPACITY to 10,000
3. **Scientific value**: measure the true structural ceiling (no safety costs)
4. **Low risk**: 1-day task, no code changes (just capacity tuning)
5. **Data-driven**: enables comparison vs normal mode (16.3M vs 30-40M expected)

**Expected Result**:
```
Normal mode:    16.3 M ops/s (current)
BenchFast mode: 30-40 M ops/s (target, 2-2.5x faster)
```

**Implementation**:
```c
// core/box/bench_fast_box.c:140
const uint32_t ACTUAL_TLS_SLL_CAPACITY = 10000; // Was 128
```

---

### Second Choice: **Option B (PGO Re-enablement)** 🚀

**Reasoning**:
1. **Proven benefit**: +6.25% in Phase 4-Step1
2. **Cumulative**: would stack with Phase 7 (81.5M baseline)
3. **Low risk**: just fix the build issue
4. **High impact**: ~86-93 M ops/s projected

---

## Current Performance Summary

### bench_random_mixed (16B-1KB, Tiny workload)
```
Phase 7-Step4 (ws=256): 81.5 M ops/s (+55.5% total)
Phase 8 (ws=8192):      16.3 M ops/s (normal mode, working)
```

### bench_mid_mt_gap (1KB-8KB, Mid MT workload, ws=256)
```
After Phase 6-B (lock-free): 42.09 M ops/s (+2.65%)
vs System malloc:            26.8 M ops/s (1.57x faster)
```

### Overall Status
- ✅ **Tiny allocations** (16B-1KB): **81.5 M ops/s** (excellent, +55.5%!)
- ✅ **Mid MT allocations** (1KB-8KB): 42 M ops/s (excellent, 1.57x vs system)
- ✅ **BenchFast mode**: no crash (TLS→Atomic + Header fix working)
- ⏸️ **Large allocations** (32KB-2MB): not benchmarked yet
- ⏸️ **MT workloads**: no MT benchmarks yet

---

## Decision Time

**Choose your next phase**:
- **Option A**: Continue Phase 7 (Steps 5-7, final cleanup)
- **Option B**: PGO re-enablement (recommended for normal builds)
- **Option C**: BenchFast pool expansion (recommended for ceiling measurement)
- **Option D**: Production readiness & benchmarking

**Or**: Celebrate Phase 8 success! 🎉 (Root cause fixes complete!)

---

Updated: 2025-11-30
Phase: 8 COMPLETE (Root Cause Fixes) → 9 PENDING
Previous: Phase 7 (Tiny Front Unification, +55.5%)
Achievement: BenchFast crash investigation and fixes (Box Theory root cause analysis!)
Makefile (8 lines changed)

@@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
 # Targets
 TARGET = test_hakmem
-OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o superslab_allocate.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o superslab_head.o hakmem_smallmid.o hakmem_smallmid_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/unified_batch_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_tls_hint_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o
+OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o superslab_allocate.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o superslab_head.o hakmem_smallmid.o hakmem_smallmid_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/unified_batch_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_tls_hint_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o
 OBJS = $(OBJS_BASE)
 
 # Shared library
 SHARED_LIB = libhakmem.so
-SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o superslab_allocate_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o superslab_head_shared.o hakmem_smallmid_shared.o hakmem_smallmid_superslab_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_local_box_shared.o core/box/free_remote_box_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/unified_batch_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_tls_hint_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_mid_mt_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o
+SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o superslab_allocate_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o superslab_head_shared.o hakmem_smallmid_shared.o hakmem_smallmid_superslab_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_local_box_shared.o core/box/free_remote_box_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/unified_batch_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_tls_hint_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_mid_mt_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o
 
 # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1)
 ifeq ($(POOL_TLS_PHASE1),1)
@@ -250,7 +250,7 @@ endif
 # Benchmark targets
 BENCH_HAKMEM = bench_allocators_hakmem
 BENCH_SYSTEM = bench_allocators_system
-BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o superslab_allocate.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o superslab_head.o hakmem_smallmid.o hakmem_smallmid_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/unified_batch_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_tls_hint_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o
+BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o superslab_allocate.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o superslab_head.o hakmem_smallmid.o hakmem_smallmid_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/unified_batch_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_tls_hint_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o
 BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
 ifeq ($(POOL_TLS_PHASE1),1)
 BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -427,7 +427,7 @@ test-box-refactor: box-refactor
 ./larson_hakmem 10 8 128 1024 1 12345 4
 
 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
-TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o superslab_allocate.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o superslab_head.o hakmem_smallmid.o hakmem_smallmid_superslab.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/unified_batch_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_tls_hint_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o
+TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o superslab_allocate.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o superslab_head.o hakmem_smallmid.o hakmem_smallmid_superslab.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/unified_batch_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_tls_hint_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o
 TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
 ifeq ($(POOL_TLS_PHASE1),1)
 TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
core/hakmem_shared_pool.c: file diff suppressed because it is too large.

core/hakmem_shared_pool_acquire.c (new file, +479 lines)

@@ -0,0 +1,479 @@
```c
#include "hakmem_shared_pool_internal.h"
#include "hakmem_debug_master.h"
#include "hakmem_stats_master.h"
#include "box/ss_slab_meta_box.h"
#include "box/ss_hot_cold_box.h"
#include "box/pagefault_telemetry_box.h"
#include "box/tls_sll_drain_box.h"
#include "box/tls_slab_reuse_guard_box.h"
#include "hakmem_policy.h"

#include <stdlib.h>
#include <stdio.h>
#include <stdatomic.h>

// Stage 0.5: EMPTY slab direct scan (registry-based EMPTY reuse)
// Scan existing SuperSlabs for EMPTY slabs (highest reuse priority) to
// avoid Stage 3 (mmap) when freed slabs are available.
static inline int
sp_acquire_from_empty_scan(int class_idx, SuperSlab** ss_out, int* slab_idx_out, int dbg_acquire)
{
    static int empty_reuse_enabled = -1;
    if (__builtin_expect(empty_reuse_enabled == -1, 0)) {
        const char* e = getenv("HAKMEM_SS_EMPTY_REUSE");
        empty_reuse_enabled = (e && *e && *e == '0') ? 0 : 1; // default ON
    }

    if (!empty_reuse_enabled) {
        return -1;
    }

    extern SuperSlab* g_super_reg_by_class[TINY_NUM_CLASSES][SUPER_REG_PER_CLASS];
    extern int g_super_reg_class_size[TINY_NUM_CLASSES];

    int reg_size = (class_idx < TINY_NUM_CLASSES) ? g_super_reg_class_size[class_idx] : 0;
    static int scan_limit = -1;
    if (__builtin_expect(scan_limit == -1, 0)) {
        const char* e = getenv("HAKMEM_SS_EMPTY_SCAN_LIMIT");
        scan_limit = (e && *e) ? atoi(e) : 32; // default: scan first 32 SuperSlabs (Phase 9-2 tuning)
    }
    if (scan_limit > reg_size) scan_limit = reg_size;

    // Stage 0.5 hit counter for visualization
    static _Atomic uint64_t stage05_hits = 0;
    static _Atomic uint64_t stage05_attempts = 0;
    atomic_fetch_add_explicit(&stage05_attempts, 1, memory_order_relaxed);

    for (int i = 0; i < scan_limit; i++) {
        SuperSlab* ss = g_super_reg_by_class[class_idx][i];
        if (!(ss && ss->magic == SUPERSLAB_MAGIC)) continue;
        if (ss->empty_count == 0) continue; // No EMPTY slabs in this SS

        uint32_t mask = ss->empty_mask;
        while (mask) {
            int empty_idx = __builtin_ctz(mask);
            mask &= (mask - 1); // clear lowest bit

            TinySlabMeta* meta = &ss->slabs[empty_idx];
            if (meta->capacity > 0 && meta->used == 0) {
                tiny_tls_slab_reuse_guard(ss);
                ss_clear_slab_empty(ss, empty_idx);

                meta->class_idx = (uint8_t)class_idx;
                ss->class_map[empty_idx] = (uint8_t)class_idx;

#if !HAKMEM_BUILD_RELEASE
                if (dbg_acquire == 1) {
                    fprintf(stderr,
                            "[SP_ACQUIRE_STAGE0.5_EMPTY] class=%d reusing EMPTY slab (ss=%p slab=%d empty_count=%u)\n",
                            class_idx, (void*)ss, empty_idx, ss->empty_count);
                }
#else
                (void)dbg_acquire;
#endif

                *ss_out = ss;
                *slab_idx_out = empty_idx;
                sp_stage_stats_init();
                if (g_sp_stage_stats_enabled) {
                    atomic_fetch_add(&g_sp_stage1_hits[class_idx], 1);
                }
                atomic_fetch_add_explicit(&stage05_hits, 1, memory_order_relaxed);

                // Stage 0.5 hit rate visualization (every 100 hits)
                uint64_t hits = atomic_load_explicit(&stage05_hits, memory_order_relaxed);
                if (hits % 100 == 1) {
                    uint64_t attempts = atomic_load_explicit(&stage05_attempts, memory_order_relaxed);
                    fprintf(stderr, "[STAGE0.5_STATS] hits=%lu attempts=%lu rate=%.1f%% (scan_limit=%d)\n",
                            hits, attempts, (double)hits * 100.0 / attempts, scan_limit);
                }
                return 0;
            }
        }
    }
    return -1;
}

int
shared_pool_acquire_slab(int class_idx, SuperSlab** ss_out, int* slab_idx_out)
{
    // Phase 12: SP-SLOT Box - 3-Stage Acquire Logic
    //
    // Stage 1: Reuse EMPTY slots from per-class free list (EMPTY→ACTIVE)
    // Stage 2: Find UNUSED slots in existing SuperSlabs
    // Stage 3: Get new SuperSlab (LRU pop or mmap)
    //
    // Invariants:
    // - On success: *ss_out != NULL, 0 <= *slab_idx_out < total_slots
    // - The chosen slab has meta->class_idx == class_idx

    if (!ss_out || !slab_idx_out) {
        return -1;
    }
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES_SS) {
        return -1;
    }

    shared_pool_init();

    // Debug logging / stage stats
#if !HAKMEM_BUILD_RELEASE
    static int dbg_acquire = -1;
    if (__builtin_expect(dbg_acquire == -1, 0)) {
        const char* e = getenv("HAKMEM_SS_ACQUIRE_DEBUG");
        dbg_acquire = (e && *e && *e != '0') ? 1 : 0;
    }
#else
    static const int dbg_acquire = 0;
#endif
    sp_stage_stats_init();

stage1_retry_after_tension_drain:
    // ========== Stage 0.5 (Phase 12-1.1): EMPTY slab direct scan ==========
    // Scan existing SuperSlabs for EMPTY slabs (highest reuse priority) to
    // avoid Stage 3 (mmap) when freed slabs are available.
    if (sp_acquire_from_empty_scan(class_idx, ss_out, slab_idx_out, dbg_acquire) == 0) {
        return 0;
    }

    // ========== Stage 1 (Lock-Free): Try to reuse EMPTY slots ==========
    // P0-4: Lock-free pop from per-class free list (no mutex needed!)
    // Best case: Same class freed a slot, reuse immediately (cache-hot)
    SharedSSMeta* reuse_meta = NULL;
    int reuse_slot_idx = -1;

    if (sp_freelist_pop_lockfree(class_idx, &reuse_meta, &reuse_slot_idx)) {
        // Found EMPTY slot from lock-free list!
        // Now acquire mutex ONLY for slot activation and metadata update

        // P0 instrumentation: count lock acquisitions
        lock_stats_init();
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_acquire_count, 1);
            atomic_fetch_add(&g_lock_acquire_slab_count, 1);
        }

        pthread_mutex_lock(&g_shared_pool.alloc_lock);

        // P0.3: Guard against TLS SLL orphaned pointers before reusing slab
        // RACE FIX: Load SuperSlab pointer atomically BEFORE guard (consistency)
        SuperSlab* ss_guard = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);
        if (ss_guard) {
            tiny_tls_slab_reuse_guard(ss_guard);
        }

        // Activate slot under mutex (slot state transition requires protection)
        if (sp_slot_mark_active(reuse_meta, reuse_slot_idx, class_idx) == 0) {
            // RACE FIX: Load SuperSlab pointer atomically (consistency)
            SuperSlab* ss = atomic_load_explicit(&reuse_meta->ss, memory_order_relaxed);

            // RACE FIX: Check if SuperSlab was freed (NULL pointer)
            // This can happen if Thread A freed the SuperSlab after pushing slot to freelist,
            // but Thread B popped the stale slot before the freelist was cleared.
            if (!ss) {
                // SuperSlab freed - skip and fall through to Stage 2/3
                if (g_lock_stats_enabled == 1) {
                    atomic_fetch_add(&g_lock_release_count, 1);
                }
                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                goto stage2_fallback;
            }

#if !HAKMEM_BUILD_RELEASE
            if (dbg_acquire == 1) {
                fprintf(stderr, "[SP_ACQUIRE_STAGE1_LOCKFREE] class=%d reusing EMPTY slot (ss=%p slab=%d)\n",
                        class_idx, (void*)ss, reuse_slot_idx);
            }
#endif

            // Update SuperSlab metadata
            ss->slab_bitmap |= (1u << reuse_slot_idx);
            ss_slab_meta_class_idx_set(ss, reuse_slot_idx, (uint8_t)class_idx);

            if (ss->active_slabs == 0) {
                // Was empty, now active again
                ss->active_slabs = 1;
                g_shared_pool.active_count++;
            }
            // Track per-class active slots (approximate, under alloc_lock)
            if (class_idx < TINY_NUM_CLASSES_SS) {
                g_shared_pool.class_active_slots[class_idx]++;
            }

            // Update hint
            g_shared_pool.class_hints[class_idx] = ss;

            *ss_out = ss;
            *slab_idx_out = reuse_slot_idx;

            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_release_count, 1);
            }
            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
            if (g_sp_stage_stats_enabled) {
                atomic_fetch_add(&g_sp_stage1_hits[class_idx], 1);
            }
            return 0; // ✅ Stage 1 (lock-free) success
        }

        // Slot activation failed (race condition?) - release lock and fall through
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    }

stage2_fallback:
    // ========== Stage 2 (Lock-Free): Try to claim UNUSED slots ==========
    // P0-5: Lock-free atomic CAS claiming (no mutex needed for slot state transition!)
    // RACE FIX: Read ss_meta_count atomically (now properly declared as _Atomic)
    // No cast needed! memory_order_acquire synchronizes with release in sp_meta_find_or_create
    uint32_t meta_count = atomic_load_explicit(
        &g_shared_pool.ss_meta_count,
        memory_order_acquire
    );

    for (uint32_t i = 0; i < meta_count; i++) {
        SharedSSMeta* meta = &g_shared_pool.ss_metadata[i];

        // Try lock-free claiming (UNUSED → ACTIVE via CAS)
        int claimed_idx = sp_slot_claim_lockfree(meta, class_idx);
        if (claimed_idx >= 0) {
            // RACE FIX: Load SuperSlab pointer atomically (critical for lock-free Stage 2)
            // Use memory_order_acquire to synchronize with release in sp_meta_find_or_create
            SuperSlab* ss = atomic_load_explicit(&meta->ss, memory_order_acquire);
            if (!ss) {
                // SuperSlab was freed between claiming and loading - skip this entry
                continue;
            }

#if !HAKMEM_BUILD_RELEASE
            if (dbg_acquire == 1) {
                fprintf(stderr, "[SP_ACQUIRE_STAGE2_LOCKFREE] class=%d claimed UNUSED slot (ss=%p slab=%d)\n",
                        class_idx, (void*)ss, claimed_idx);
            }
#endif

            // P0 instrumentation: count lock acquisitions
            lock_stats_init();
            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_acquire_count, 1);
                atomic_fetch_add(&g_lock_acquire_slab_count, 1);
            }

            pthread_mutex_lock(&g_shared_pool.alloc_lock);

            // Update SuperSlab metadata under mutex
            ss->slab_bitmap |= (1u << claimed_idx);
            ss_slab_meta_class_idx_set(ss, claimed_idx, (uint8_t)class_idx);

            if (ss->active_slabs == 0) {
                ss->active_slabs = 1;
                g_shared_pool.active_count++;
            }
            if (class_idx < TINY_NUM_CLASSES_SS) {
                g_shared_pool.class_active_slots[class_idx]++;
            }

            // Update hint
            g_shared_pool.class_hints[class_idx] = ss;

            *ss_out = ss;
            *slab_idx_out = claimed_idx;
            sp_fix_geometry_if_needed(ss, claimed_idx, class_idx);

            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_release_count, 1);
            }
            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
            if (g_sp_stage_stats_enabled) {
                atomic_fetch_add(&g_sp_stage2_hits[class_idx], 1);
            }
            return 0; // ✅ Stage 2 (lock-free) success
        }

        // Claim failed (no UNUSED slots in this meta) - continue to next SuperSlab
    }

    // ========== Tension-Based Drain: Try to create EMPTY slots before Stage 3 ==========
    // If TLS SLL has accumulated blocks, drain them to enable EMPTY slot detection
    // This can avoid allocating new SuperSlabs by reusing EMPTY slots in Stage 1
    // ENV: HAKMEM_TINY_TENSION_DRAIN_ENABLE=0 to disable (default=1)
    // ENV: HAKMEM_TINY_TENSION_DRAIN_THRESHOLD=N to set threshold (default=1024)
    {
        static int tension_drain_enabled = -1;
        static uint32_t tension_threshold = 1024;

        if (tension_drain_enabled < 0) {
            const char* env = getenv("HAKMEM_TINY_TENSION_DRAIN_ENABLE");
            tension_drain_enabled = (env == NULL || atoi(env) != 0) ? 1 : 0;

            const char* thresh_env = getenv("HAKMEM_TINY_TENSION_DRAIN_THRESHOLD");
            if (thresh_env) {
                tension_threshold = (uint32_t)atoi(thresh_env);
                if (tension_threshold < 64) tension_threshold = 64;
                if (tension_threshold > 65536) tension_threshold = 65536;
            }
        }

        if (tension_drain_enabled) {
            extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];
            extern uint32_t tiny_tls_sll_drain(int class_idx, uint32_t batch_size);

            uint32_t sll_count = (class_idx < TINY_NUM_CLASSES) ? g_tls_sll[class_idx].count : 0;

            if (sll_count >= tension_threshold) {
                // Drain all blocks to maximize EMPTY slot creation
                uint32_t drained = tiny_tls_sll_drain(class_idx, 0); // 0 = drain all

                if (drained > 0) {
                    // Retry Stage 1 (EMPTY reuse) after drain
                    // Some slabs might have become EMPTY (meta->used == 0)
                    goto stage1_retry_after_tension_drain;
                }
            }
        }
    }

    // ========== Stage 3: Mutex-protected fallback (new SuperSlab allocation) ==========
    // All existing SuperSlabs have no UNUSED slots → need new SuperSlab
    // P0 instrumentation: count lock acquisitions
    lock_stats_init();
    if (g_lock_stats_enabled == 1) {
        atomic_fetch_add(&g_lock_acquire_count, 1);
        atomic_fetch_add(&g_lock_acquire_slab_count, 1);
    }

    pthread_mutex_lock(&g_shared_pool.alloc_lock);

    // ========== Stage 3: Get new SuperSlab ==========
    // Try LRU cache first, then mmap
    SuperSlab* new_ss = NULL;

    // Stage 3a: Try LRU cache
    extern SuperSlab* hak_ss_lru_pop(uint8_t size_class);
    new_ss = hak_ss_lru_pop((uint8_t)class_idx);

    int from_lru = (new_ss != NULL);

    // Stage 3b: If LRU miss, allocate new SuperSlab
    if (!new_ss) {
        // Release the alloc_lock to avoid deadlock with registry during superslab_allocate
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);

        SuperSlab* allocated_ss = sp_internal_allocate_superslab();

        // Re-acquire the alloc_lock
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_acquire_count, 1);
            atomic_fetch_add(&g_lock_acquire_slab_count, 1); // This is part of acquisition path
        }
        pthread_mutex_lock(&g_shared_pool.alloc_lock);

        if (!allocated_ss) {
            // Allocation failed; return now.
            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_release_count, 1);
            }
            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
            return -1; // Out of memory
        }

        new_ss = allocated_ss;

        // Add newly allocated SuperSlab to the shared pool's internal array
        if (g_shared_pool.total_count >= g_shared_pool.capacity) {
            shared_pool_ensure_capacity_unlocked(g_shared_pool.total_count + 1);
            if (g_shared_pool.total_count >= g_shared_pool.capacity) {
                // Pool table expansion failed; leave ss alive (registry-owned),
                // but do not treat it as part of shared_pool.
                // This is a critical error, return early.
                if (g_lock_stats_enabled == 1) {
                    atomic_fetch_add(&g_lock_release_count, 1);
                }
                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                return -1;
            }
        }
        g_shared_pool.slabs[g_shared_pool.total_count] = new_ss;
        g_shared_pool.total_count++;
    }

#if !HAKMEM_BUILD_RELEASE
    if (dbg_acquire == 1 && new_ss) {
        fprintf(stderr, "[SP_ACQUIRE_STAGE3] class=%d new SuperSlab (ss=%p from_lru=%d)\n",
                class_idx, (void*)new_ss, from_lru);
    }
#endif

    if (!new_ss) {
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
        return -1; // ❌ Out of memory
    }

    // Before creating a new SuperSlab, consult learning-layer soft cap.
    // If current active slots for this class already exceed the policy cap,
    // fail early so caller can fall back to legacy backend.
    uint32_t limit = sp_class_active_limit(class_idx);
    if (limit > 0) {
        uint32_t cur = g_shared_pool.class_active_slots[class_idx];
        if (cur >= limit) {
            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_release_count, 1);
            }
            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
            return -1; // Soft cap reached for this class
        }
    }

    // Create metadata for this new SuperSlab
    SharedSSMeta* new_meta = sp_meta_find_or_create(new_ss);
    if (!new_meta) {
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
        return -1; // ❌ Metadata allocation failed
    }

    // Assign first slot to this class
    int first_slot = 0;
    if (sp_slot_mark_active(new_meta, first_slot, class_idx) != 0) {
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
        return -1; // ❌ Should not happen
    }

    // Update SuperSlab metadata
    new_ss->slab_bitmap |= (1u << first_slot);
    ss_slab_meta_class_idx_set(new_ss, first_slot, (uint8_t)class_idx);
    new_ss->active_slabs = 1;
    g_shared_pool.active_count++;
    if (class_idx < TINY_NUM_CLASSES_SS) {
        g_shared_pool.class_active_slots[class_idx]++;
    }

    // Update hint
    g_shared_pool.class_hints[class_idx] = new_ss;

    *ss_out = new_ss;
    *slab_idx_out = first_slot;
    sp_fix_geometry_if_needed(new_ss, first_slot, class_idx);

    if (g_lock_stats_enabled == 1) {
        atomic_fetch_add(&g_lock_release_count, 1);
    }
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
    if (g_sp_stage_stats_enabled) {
        atomic_fetch_add(&g_sp_stage3_hits[class_idx], 1);
    }
    return 0; // ✅ Stage 3 success
}
```
core/hakmem_shared_pool_internal.h (new file, +56 lines)

@@ -0,0 +1,56 @@
#ifndef HAKMEM_SHARED_POOL_INTERNAL_H
#define HAKMEM_SHARED_POOL_INTERNAL_H

#include "hakmem_shared_pool.h"
#include "hakmem_tiny_superslab.h"
#include "hakmem_tiny_superslab_constants.h"
#include <stdatomic.h>
#include <pthread.h>

// Global Shared Pool Instance
extern SharedSuperSlabPool g_shared_pool;

// Lock Statistics
// Counters are always defined, to avoid compilation errors in Release builds
// (usage is guarded by g_lock_stats_enabled, which is 0 in Release).
extern _Atomic uint64_t g_lock_acquire_count;
extern _Atomic uint64_t g_lock_release_count;
extern _Atomic uint64_t g_lock_acquire_slab_count;
extern _Atomic uint64_t g_lock_release_slab_count;
extern int g_lock_stats_enabled;

#if !HAKMEM_BUILD_RELEASE
void lock_stats_init(void);
#else
static inline void lock_stats_init(void) {
    // No-op for release build
}
#endif

// Stage Statistics
extern _Atomic uint64_t g_sp_stage1_hits[TINY_NUM_CLASSES_SS];
extern _Atomic uint64_t g_sp_stage2_hits[TINY_NUM_CLASSES_SS];
extern _Atomic uint64_t g_sp_stage3_hits[TINY_NUM_CLASSES_SS];
extern int g_sp_stage_stats_enabled;
void sp_stage_stats_init(void);

// Internal Helpers (shared between acquire/release/pool)
void shared_pool_ensure_capacity_unlocked(uint32_t min_capacity);
SuperSlab* sp_internal_allocate_superslab(void);

// Slot & Meta Helpers
int sp_slot_mark_active(SharedSSMeta* meta, int slot_idx, int class_idx);
int sp_slot_mark_empty(SharedSSMeta* meta, int slot_idx);
int sp_slot_claim_lockfree(SharedSSMeta* meta, int class_idx);
SharedSSMeta* sp_meta_find_or_create(SuperSlab* ss);
void sp_meta_sync_slots_from_ss(SharedSSMeta* meta, SuperSlab* ss);

// Free List Helpers
int sp_freelist_push_lockfree(int class_idx, SharedSSMeta* meta, int slot_idx);
int sp_freelist_pop_lockfree(int class_idx, SharedSSMeta** meta_out, int* slot_idx_out);

// Policy & Geometry Helpers
uint32_t sp_class_active_limit(int class_idx);
void sp_fix_geometry_if_needed(SuperSlab* ss, int slab_idx, int class_idx);

#endif // HAKMEM_SHARED_POOL_INTERNAL_H
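For orientation, here is how Stage 1 of the acquire path could be assembled from just these helpers. This is a sketch under stated assumptions, not the real Stage 1: the 0-on-success convention for `sp_freelist_pop_lockfree()` and the legality of calling `sp_slot_mark_active()` without `alloc_lock` are both assumptions (the real code may use `sp_slot_claim_lockfree()` here instead).

```c
#include "hakmem_shared_pool_internal.h"

// Sketch: pop an EMPTY slot of this class from the lock-free free list.
static int sp_try_stage1(int class_idx, SuperSlab** ss_out, int* slab_idx_out) {
    SharedSSMeta* meta = NULL;
    int slot_idx = -1;
    while (sp_freelist_pop_lockfree(class_idx, &meta, &slot_idx) == 0) {
        // The release path clears meta->ss before superslab_free(), so a NULL
        // load here means the SuperSlab is already gone; skip the stale entry.
        SuperSlab* ss = atomic_load_explicit(&meta->ss, memory_order_acquire);
        if (!ss) continue;
        // ASSUMPTION: re-activating the popped slot is safe without alloc_lock.
        if (sp_slot_mark_active(meta, slot_idx, class_idx) != 0) continue;
        *ss_out = ss;
        *slab_idx_out = slot_idx;
        if (g_sp_stage_stats_enabled) {
            atomic_fetch_add(&g_sp_stage1_hits[class_idx], 1);
        }
        return 0; // Stage 1 hit: reused an EMPTY slot of the same class
    }
    return -1; // free list exhausted; caller proceeds to Stage 2/3
}
```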
179  core/hakmem_shared_pool_release.c  Normal file
@@ -0,0 +1,179 @@
#include "hakmem_shared_pool_internal.h"
#include "hakmem_debug_master.h"
#include "box/ss_slab_meta_box.h"
#include "box/ss_hot_cold_box.h"

#include <stdlib.h>
#include <stdio.h>
#include <stdatomic.h>

void
shared_pool_release_slab(SuperSlab* ss, int slab_idx)
{
    // Phase 12: SP-SLOT Box - Slot-based Release
    //
    // Flow:
    //   1. Validate inputs and check meta->used == 0
    //   2. Find SharedSSMeta for this SuperSlab
    //   3. Mark slot ACTIVE → EMPTY
    //   4. Push to per-class free list (enables same-class reuse)
    //   5. If all slots EMPTY → superslab_free() → LRU cache

    if (!ss) {
        return;
    }
    if (slab_idx < 0 || slab_idx >= SLABS_PER_SUPERSLAB_MAX) {
        return;
    }

    // Debug logging
#if !HAKMEM_BUILD_RELEASE
    static int dbg = -1;
    if (__builtin_expect(dbg == -1, 0)) {
        const char* e = getenv("HAKMEM_SS_FREE_DEBUG");
        dbg = (e && *e && *e != '0') ? 1 : 0;
    }
#else
    static const int dbg = 0;
#endif

    // P0 instrumentation: count lock acquisitions
    lock_stats_init();
    if (g_lock_stats_enabled == 1) {
        atomic_fetch_add(&g_lock_acquire_count, 1);
        atomic_fetch_add(&g_lock_release_slab_count, 1);
    }

    pthread_mutex_lock(&g_shared_pool.alloc_lock);

    TinySlabMeta* slab_meta = &ss->slabs[slab_idx];
    if (slab_meta->used != 0) {
        // Not actually empty; nothing to do
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
        return;
    }

    uint8_t class_idx = slab_meta->class_idx;

#if !HAKMEM_BUILD_RELEASE
    if (dbg == 1) {
        fprintf(stderr, "[SP_SLOT_RELEASE] ss=%p slab_idx=%d class=%d used=0 (marking EMPTY)\n",
                (void*)ss, slab_idx, class_idx);
    }
#endif

    // Find SharedSSMeta for this SuperSlab
    SharedSSMeta* sp_meta = NULL;
    uint32_t count = atomic_load_explicit(&g_shared_pool.ss_meta_count, memory_order_relaxed);
    for (uint32_t i = 0; i < count; i++) {
        // RACE FIX: Load pointer atomically
        SuperSlab* meta_ss = atomic_load_explicit(&g_shared_pool.ss_metadata[i].ss, memory_order_relaxed);
        if (meta_ss == ss) {
            sp_meta = &g_shared_pool.ss_metadata[i];
            break;
        }
    }

    if (!sp_meta) {
        // SuperSlab not in SP-SLOT system yet - create metadata
        sp_meta = sp_meta_find_or_create(ss);
        if (!sp_meta) {
            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
            return; // Failed to create metadata
        }
    }

    // Mark slot as EMPTY (ACTIVE → EMPTY)
    uint32_t slab_bit = (1u << slab_idx);
    SlotState slot_state = atomic_load_explicit(
        &sp_meta->slots[slab_idx].state,
        memory_order_acquire);
    if (slot_state != SLOT_ACTIVE && (ss->slab_bitmap & slab_bit)) {
        // Legacy path import: rebuild slot states from SuperSlab bitmap/class_map
        sp_meta_sync_slots_from_ss(sp_meta, ss);
        slot_state = atomic_load_explicit(
            &sp_meta->slots[slab_idx].state,
            memory_order_acquire);
    }

    if (slot_state != SLOT_ACTIVE || sp_slot_mark_empty(sp_meta, slab_idx) != 0) {
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
        return; // Slot wasn't ACTIVE
    }

    // Update SuperSlab metadata
    uint32_t bit = (1u << slab_idx);
    if (ss->slab_bitmap & bit) {
        ss->slab_bitmap &= ~bit;
        slab_meta->class_idx = 255; // UNASSIGNED
        // P1.1: Mark class_map as UNASSIGNED when releasing slab
        ss->class_map[slab_idx] = 255;

        if (ss->active_slabs > 0) {
            ss->active_slabs--;
            if (ss->active_slabs == 0 && g_shared_pool.active_count > 0) {
                g_shared_pool.active_count--;
            }
        }
        if (class_idx < TINY_NUM_CLASSES_SS &&
            g_shared_pool.class_active_slots[class_idx] > 0) {
            g_shared_pool.class_active_slots[class_idx]--;
        }
    }

    // P0-4: Push to lock-free per-class free list (enables reuse by same class)
    // Note: push BEFORE releasing mutex (slot state already updated under lock)
    if (class_idx < TINY_NUM_CLASSES_SS) {
        sp_freelist_push_lockfree(class_idx, sp_meta, slab_idx);

#if !HAKMEM_BUILD_RELEASE
        if (dbg == 1) {
            fprintf(stderr, "[SP_SLOT_FREELIST_LOCKFREE] class=%d pushed slot (ss=%p slab=%d) active_slots=%u/%u\n",
                    class_idx, (void*)ss, slab_idx,
                    sp_meta->active_slots, sp_meta->total_slots);
        }
#endif
    }

    // Check if SuperSlab is now completely empty (all slots EMPTY or UNUSED)
    if (sp_meta->active_slots == 0) {
#if !HAKMEM_BUILD_RELEASE
        if (dbg == 1) {
            fprintf(stderr, "[SP_SLOT_COMPLETELY_EMPTY] ss=%p active_slots=0 (calling superslab_free)\n",
                    (void*)ss);
        }
#endif

        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }

        // RACE FIX: Set meta->ss to NULL BEFORE unlocking mutex
        // This prevents Stage 2 from accessing the freed SuperSlab
        atomic_store_explicit(&sp_meta->ss, NULL, memory_order_release);

        pthread_mutex_unlock(&g_shared_pool.alloc_lock);

        // Remove from legacy backend list (if present) to prevent dangling pointers
        extern void remove_superslab_from_legacy_head(SuperSlab* ss);
        remove_superslab_from_legacy_head(ss);

        // Free SuperSlab:
        //   1. Try LRU cache (hak_ss_lru_push) - lazy deallocation
        //   2. Or munmap if LRU is full - eager deallocation
        extern void superslab_free(SuperSlab* ss);
        superslab_free(ss);
        return;
    }

    if (g_lock_stats_enabled == 1) {
        atomic_fetch_add(&g_lock_release_count, 1);
    }
    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
}
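Usage sketch: the intended contract is that every free path (TLS drain, remote free, local free) funnels its EMPTY decision through `shared_pool_release_slab()` rather than touching slot state itself. `tiny_after_free()` below is a hypothetical caller, shown only to illustrate that contract:

```c
#include "hakmem_shared_pool_internal.h"

// Sketch: hand an empty slab back to the shared pool after a free.
static void tiny_after_free(SuperSlab* ss, int slab_idx) {
    TinySlabMeta* m = &ss->slabs[slab_idx];
    if (m->used == 0) {
        // Centralized EMPTY handling: release_slab re-checks used == 0 under
        // alloc_lock, flips the slot ACTIVE → EMPTY, pushes it on the
        // per-class free list, and frees the SuperSlab once active_slots == 0.
        shared_pool_release_slab(ss, slab_idx);
    }
}
```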
@@ -24,7 +24,7 @@
 // Increased from 4096 to 32768 to avoid registry exhaustion under
 // high-churn microbenchmarks (e.g., larson with many active SuperSlabs).
 // Still a power of two for fast masking.
-#define SUPER_REG_SIZE 262144  // Power of 2 for fast modulo (8x larger for workloads)
+#define SUPER_REG_SIZE 1048576 // Power of 2 for fast modulo (1M entries)
 #define SUPER_REG_MASK (SUPER_REG_SIZE - 1)
 #define SUPER_MAX_PROBE 32 // Linear probing limit (increased from 8 for Phase 15 fix)
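Why the power-of-two constraint matters: index math can use `SUPER_REG_MASK` instead of `%`, and `SUPER_MAX_PROBE` bounds the linear probe before the registry reports itself full. A sketch with hypothetical internals (`g_super_registry`, `hash_ptr()`, and the 21-bit alignment shift are illustrative stand-ins, not the real registry code):

```c
#include <stdint.h>
#include <stddef.h>

#define SUPER_REG_SIZE  1048576              // mirrors the new value above
#define SUPER_REG_MASK  (SUPER_REG_SIZE - 1)
#define SUPER_MAX_PROBE 32

static void* g_super_registry[SUPER_REG_SIZE];

static inline size_t hash_ptr(const void* p) {
    // Shift off alignment bits (assumed ~2 MB SuperSlabs), then scramble
    // with Knuth's multiplicative constant.
    return (size_t)(((uintptr_t)p >> 21) * 2654435761u);
}

static int registry_insert(void* ss) {
    size_t idx = hash_ptr(ss) & SUPER_REG_MASK;    // mask == fast modulo
    for (int probe = 0; probe < SUPER_MAX_PROBE; probe++) {
        size_t i = (idx + probe) & SUPER_REG_MASK; // linear probe, wraps via mask
        if (g_super_registry[i] == NULL) {
            g_super_registry[i] = ss;
            return 0;
        }
    }
    return -1; // probe limit hit: registry treated as full
}
```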