diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 6e73da4a..1d180b2b 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,4 +1,4 @@
-## HAKMEM status memo (updated 2025-12-05)
+## HAKMEM status memo (updated 2025-12-05 / reflects C7 Warm/TLS Bind)

### Current state (Tiny / Superslab / Warm Pool)
- Tiny Front / Superslab / Shared Pool are organized into a three-tier HOT/WARM/COLD structure per Box Theory.
@@ -27,10 +27,26 @@
- Added `core/box/tiny_page_box.h` / `core/box/tiny_page_box.c`: a Page Box whose active classes are controlled via `HAKMEM_TINY_PAGE_BOX_CLASSES`.
- `tiny_tls_bind_slab()` now calls `tiny_page_box_on_new_slab()`, registering the C7 slab bound by TLS into the per-thread page pool.
- Added a Page Box path at the head of `unified_cache_refill()`: for C7 it first attempts batch supply from the freelist/carve inside the page currently held by TLS, then falls back to Warm Pool / Shared Pool (the Box boundary keeps the order `Tiny Page Box → Warm Pool → Shared Pool`; see the sketch after this file's diff).
+- Introduced the TLS Bind Box:
+  - Added `ss_tls_bind_one()` to `core/box/ss_tls_bind_box.h`, consolidating the "Superslab + slab_idx → TLS" binding steps (`superslab_init_slab` / setting `meta->class_idx` / `tiny_tls_bind_slab`) in one place.
+  - Both `superslab_refill()` (Shared Pool path) and the experimental Warm Pool path now attach to TLS through this Box.
+- Implemented and validated the C7 Warm/TLS Bind path:
+  - Added C7-specific Warm/TLS Bind modes (0/1/2) to `core/front/tiny_unified_cache.c`; Debug builds switch them via `HAKMEM_WARM_TLS_BIND_C7`.
+    - mode 0: Legacy Warm (legacy/debug use; deprecated for C7 because carve frequently yields 0)
+    - mode 1: Bind-only (production path: Superslabs taken from Warm are bound via the TLS Bind Box)
+    - mode 2: Bind + TLS carve (experimental path carving directly from TLS)
+  - Release builds are always pinned to mode=1; Debug switches via `HAKMEM_WARM_TLS_BIND_C7=0/1/2`.
+- Detailed Warm Pool / Unified Cache instrumentation:
+  - Extended `warm_pool_dbg_box.h` and the Unified Cache measurement hooks so Debug builds can observe, for C7:
+    - Warm pop attempts / hits / actual carve counts
+    - TLS carve attempts / successes / failures
+    - UC misses classified into Warm / TLS / Shared
+  - Added `HAKMEM_BENCH_C7_ONLY=1` to `bench_random_mixed.c`, providing a C7-size-only micro-bench.

### Current performance (Random Mixed, HEAD)
- Conditions: `bench_random_mixed_hakmem 1000000 256 42` (1T, ws=256, RELEASE, 16–1024B)
-  - HAKMEM: ~5.0M ops/s
+  - HAKMEM: ~27.6M ops/s (after the C7 Warm/TLS repair)
  - system malloc: ~90–100M ops/s
  - mimalloc: ~120–130M ops/s
- Conditions: `bench_random_mixed_hakmem 1000000 256 42` +
@@ -38,26 +54,27 @@
  - HAKMEM Tiny Front: ~80–90M ops/s (same order as mimalloc)
- Conditions: `bench_random_mixed_hakmem 1000000 256 42` + `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024` (Tiny C5–C7 only)
-  - HAKMEM: ~4.7–4.8M ops/s
+  - HAKMEM: ~28.0M ops/s (after applying the Warm/TLS guard)
+- Conditions: C7-only micro-bench (Debug, `HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_PROFILE=full HAKMEM_WARM_C7_MAX=8 HAKMEM_WARM_C7_PREFETCH=4`, etc.)
+  - mode 0 (Legacy Warm): ~2.0M ops/s; C7 Warm hits = 0 and many Shared Pool locks (`slab_carve_from_ss` frequently returns 0)
+  - mode 1 (Bind-only): ~20M ops/s (iters=200K, ws=32); Warm hit ≈100%, Shared Pool locks reduced to 5
+  - mode 2 (Bind + TLS carve experiment): on par with or slightly above mode 1 (UC misses increase but concentrate in `uc_miss_tls`, and avg_refill shortens)
+- Conditions: C7-only micro-bench (Release, `HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_PROFILE=full HAKMEM_WARM_C7_MAX=8 HAKMEM_WARM_C7_PREFETCH=4`)
+  - HAKMEM: ~18.8M ops/s (after the forced empty-slab guard + reset; recovered to the same order as Debug)
- Conclusions:
  - The Tiny front itself (8–128B) is fast enough, reaching the same order as mimalloc.
-  - The Tiny C5–C7 path (129–1024B) had a bottleneck of Unified Cache hit=0 and frequent Shared Pool locking,
-    which dominated overall Random Mixed performance.
+  - On the C5–C7 path, the "full C7 slabs were being re-supplied to Warm" problem was fixed by the empty-slab-only guard plus a reset shared by Release/Debug;
+    C7-only Release recovered to ~18.8M ops/s, and Random Mixed Release improved to the 27M class.

-### Next steps (priority tasks: validate and tune the C7 Page Box)
-1. **Measure the effectiveness of the C7 Page Box path**
-   - ENV: run `bench_random_mixed_hakmem 1000000 256 42` with `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024` + `HAKMEM_MEASURE_UNIFIED_CACHE=1` and compare, for C7:
-     - Unified Cache refill count and average cycles
-     - `shared_pool_acquire_slab(C7)` lock count
-   with Page Box ON vs OFF (`HAKMEM_TINY_PAGE_BOX_CLASSES=` unset vs `7`).
-2. **Tune the C7 Unified Cache capacity and batch size**
-   - While varying `HAKMEM_TINY_UNIFIED_C7` and the `max_batch` setting of `unified_cache_refill()`,
-     observe the C7 hit rate, Shared Pool lock count, and throughput with Page Box ON to find the capacity/batch size that suits C7 best.
-3. **Decide whether to extend the Page Box to C5/C6**
-   - If C7 shows a large enough win (sharply fewer Shared Pool locks + higher throughput),
-     try `HAKMEM_TINY_PAGE_BOX_CLASSES=5,6,7` and confirm stability/performance when C5/C6 also become Tiny-Plus.
-   - If nothing breaks, consider moving the default profile toward "C5–C7 Page Box enabled".
+### Next steps (confirm stabilization under broader conditions)
+1. With `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024` and the plain `bench_random_mixed_hakmem 1000000 256 42`,
+   keep confirming that the empty-slab-only guard works without side effects (27–28M ops/s already confirmed in Release).
+2. Documentation updates:
+   - Root cause of C7 Warm being dead only in Release = the Shared Pool re-supplied full C7 slabs without resetting them.
+   - The acquire-side forced empty-slab guard plus the Release/Debug-shared reset recovered C7-only Release to ~18.8M ops/s.
+3. Candidate next phases:
+   - Apply the same Warm/TLS optimization and empty-slab guard to C5/C6, or
+   - Sweep the remaining Random Mixed bottlenecks (Shared Pool locks / wrapper / mid-size path, etc.).

### Notes
- The page-fault problem is resolved to a reasonable level by the Prefault Box + warm-up; the main bottleneck has now moved to the user-space boxes (Unified Cache / free / Pool).
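For orientation, the C7 refill ordering described in the memo above condenses to the sketch below. It is illustrative only: `tiny_page_box_try_refill` is a stand-in name for the Page Box hook, while the other calls (`tiny_warm_pool_pop`, `slab_carve_from_ss`, `tiny_warm_pool_push`, `superslab_refill`, `g_tls_slabs`) are the real ones used later in this patch; telemetry and batching details are omitted.

// Sketch: Box ordering inside unified_cache_refill() for C7 (condensed).
// Order: Tiny Page Box -> Warm Pool -> Shared Pool.
static int refill_order_sketch_c7(void** out, int room) {
    // 1) Tiny Page Box: page-local freelist/carve held by this thread's TLS.
    int produced = tiny_page_box_try_refill(7, out, room);   // stand-in name
    if (produced > 0) return produced;

    // 2) Warm Pool: per-class pool of bindable SuperSlabs (no Shared Pool lock).
    SuperSlab* warm_ss = tiny_warm_pool_pop(7);
    if (warm_ss) {
        produced = slab_carve_from_ss(7, warm_ss, out, room);
        if (produced > 0) {
            tiny_warm_pool_push(7, warm_ss);   // keep the slab warm for the next refill
            return produced;
        }
    }

    // 3) Shared Pool: the locked slow path; acquire and bind a fresh slab,
    //    then carve from the slab now held in TLS.
    if (!superslab_refill(7)) return 0;
    return slab_carve_from_ss(7, g_tls_slabs[7].ss, out, room);
}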
diff --git a/README_PERF_ANALYSIS.md b/README_PERF_ANALYSIS.md
index 37b29419..64b01fc6 100644
--- a/README_PERF_ANALYSIS.md
+++ b/README_PERF_ANALYSIS.md
@@ -1,5 +1,8 @@
# HAKMEM Allocator Performance Analysis Results

+**Latest note (2025-12-05)**: The C7 Warm/TLS Bind production path is unified on Bind-only (mode=1). Debug builds can switch via `HAKMEM_WARM_TLS_BIND_C7=0/1/2`, but Release is always pinned to mode=1. On C7-only workloads, mode=1 is ~4–10x faster than legacy (mode=0); mode=2 is kept as the TLS-carve experiment.
+**Addendum (2025-12-05, Release repair)**: C7 Warm was dead only in Release because full C7 slabs lingered in the Shared Pool and no empty slabs reached the Warm Pool. A guard was introduced so that Acquire accepts only empty slabs for C7 and Release resets the metadata; with it, C7-only Release recovered to ~18.8M ops/s and Random Mixed Release to ~27–28M ops/s.
+
**Analysis date**: 2025-11-28
**Analysis target**: HAKMEM allocator (commit 0ce20bb83)
**Benchmark**: bench_random_mixed (1,000,000 ops, working set=256)
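The addendum above is a two-sided invariant around slab handoff. Below is a minimal, self-contained model of it; the struct mirrors only the `TinySlabMeta` fields the guard inspects, and the real helpers introduced later in this patch are `c7_meta_is_pristine()` (acquire side) and `ss_slab_reset_meta_for_tiny()` (release side).

#include <stdint.h>
#include <stddef.h>

// Model of the TinySlabMeta fields the C7 guard cares about.
typedef struct {
    uint16_t used;     // live blocks handed out
    uint16_t carved;   // blocks linearly carved so far
    void*    freelist; // per-slab free list
} MetaModel;

// Acquire side: C7 accepts a slab only if it is pristine, so a still-full
// slab can never be re-supplied to the Warm Pool (the Release-only bug).
static int pristine(const MetaModel* m) {
    return m->used == 0 && m->carved == 0 && m->freelist == NULL;
}

// Release side: a released C7 slab is normalized before it becomes
// reusable, which is what keeps the acquire-side check true in steady state.
static void reset_for_reuse(MetaModel* m) {
    m->used = 0;
    m->carved = 0;
    m->freelist = NULL;
}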
diff --git a/bench_random_mixed.c b/bench_random_mixed.c
index 8abf2399..319122be 100644
--- a/bench_random_mixed.c
+++ b/bench_random_mixed.c
@@ -13,9 +13,16 @@
#include
#include
#include
+#include <stdatomic.h>
+#define C7_META_COUNTER_DEFINE
+#include "core/box/c7_meta_used_counter_box.h"
+#undef C7_META_COUNTER_DEFINE
+#include "core/box/warm_pool_rel_counters_box.h"

#ifdef USE_HAKMEM
#include "hakmem.h"
+#include "hakmem_build_flags.h"
+#include "core/box/c7_meta_used_counter_box.h"

// Box BenchMeta: Benchmark metadata management (bypass hakmem wrapper)
// Phase 15: Separate BenchMeta (slots array) from CoreAlloc (user workload)
@@ -253,6 +260,38 @@ int main(int argc, char** argv){
        extern void tiny_warm_pool_print_stats_public(void);
        tiny_warm_pool_print_stats_public();

+       #if HAKMEM_BUILD_RELEASE
+       // Minimal Release-side telemetry to verify Warm path usage (C7-only)
+       extern _Atomic uint64_t g_rel_c7_warm_pop;
+       extern _Atomic uint64_t g_rel_c7_warm_push;
+       fprintf(stderr,
+               "[REL_C7_CARVE] attempts=%llu success=%llu zero=%llu\n",
+               (unsigned long long)warm_pool_rel_c7_carve_attempts(),
+               (unsigned long long)warm_pool_rel_c7_carve_successes(),
+               (unsigned long long)warm_pool_rel_c7_carve_zeroes());
+       fprintf(stderr,
+               "[REL_C7_WARM] pop=%llu push=%llu\n",
+               (unsigned long long)atomic_load_explicit(&g_rel_c7_warm_pop, memory_order_relaxed),
+               (unsigned long long)atomic_load_explicit(&g_rel_c7_warm_push, memory_order_relaxed));
+       fprintf(stderr,
+               "[REL_C7_WARM_PREFILL] calls=%llu slabs=%llu\n",
+               (unsigned long long)warm_pool_rel_c7_prefill_calls(),
+               (unsigned long long)warm_pool_rel_c7_prefill_slabs());
+       fprintf(stderr,
+               "[REL_C7_META_USED_INC] total=%llu backend=%llu tls=%llu front=%llu\n",
+               (unsigned long long)c7_meta_used_total(),
+               (unsigned long long)c7_meta_used_backend(),
+               (unsigned long long)c7_meta_used_tls(),
+               (unsigned long long)c7_meta_used_front());
+       #else
+       fprintf(stderr,
+               "[DBG_C7_META_USED_INC] total=%llu backend=%llu tls=%llu front=%llu\n",
+               (unsigned long long)c7_meta_used_total(),
+               (unsigned long long)c7_meta_used_backend(),
+               (unsigned long long)c7_meta_used_tls(),
+               (unsigned long long)c7_meta_used_front());
+       #endif
+
        // Phase 21-1: Ring cache - DELETED (A/B test: OFF is faster)
        // extern void ring_cache_print_stats(void);
        // ring_cache_print_stats();

diff --git a/core/box/c7_meta_used_counter_box.h b/core/box/c7_meta_used_counter_box.h
new file mode 100644
index 00000000..aa57bf10
--- /dev/null
+++ b/core/box/c7_meta_used_counter_box.h
@@ -0,0 +1,59 @@
+// c7_meta_used_counter_box.h
+// Box: C7 meta->used increment counters (shared by Release/Debug)
+#pragma once
+
+#include <stdint.h>
+#include <stdatomic.h>
+
+typedef enum C7MetaUsedSource {
+    C7_META_USED_SRC_UNKNOWN = 0,
+    C7_META_USED_SRC_BACKEND = 1,
+    C7_META_USED_SRC_TLS = 2,
+    C7_META_USED_SRC_FRONT = 3,
+} C7MetaUsedSource;
+
+#ifdef C7_META_COUNTER_DEFINE
+#define C7_META_COUNTER_EXTERN
+#else
+#define C7_META_COUNTER_EXTERN extern
+#endif
+
+C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_total;
+C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_backend;
+C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_tls;
+C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_front;
+
+static inline void c7_meta_used_note(int class_idx, C7MetaUsedSource src) {
+    if (__builtin_expect(class_idx != 7, 1)) {
+        return;
+    }
+    atomic_fetch_add_explicit(&g_c7_meta_used_inc_total, 1, memory_order_relaxed);
+    switch (src) {
+    case C7_META_USED_SRC_BACKEND:
+        atomic_fetch_add_explicit(&g_c7_meta_used_inc_backend, 1, memory_order_relaxed);
+        break;
+    case C7_META_USED_SRC_TLS:
+        atomic_fetch_add_explicit(&g_c7_meta_used_inc_tls, 1, memory_order_relaxed);
+        break;
+    case C7_META_USED_SRC_FRONT:
+        atomic_fetch_add_explicit(&g_c7_meta_used_inc_front, 1, memory_order_relaxed);
+        break;
+    default:
+        break;
+    }
+}
+
+static inline uint64_t c7_meta_used_total(void) {
+    return atomic_load_explicit(&g_c7_meta_used_inc_total, memory_order_relaxed);
+}
+static inline uint64_t c7_meta_used_backend(void) {
+    return atomic_load_explicit(&g_c7_meta_used_inc_backend, memory_order_relaxed);
+}
+static inline uint64_t c7_meta_used_tls(void) {
+    return atomic_load_explicit(&g_c7_meta_used_inc_tls, memory_order_relaxed);
+}
+static inline uint64_t c7_meta_used_front(void) {
+    return atomic_load_explicit(&g_c7_meta_used_inc_front, memory_order_relaxed);
+}
+
+#undef C7_META_COUNTER_EXTERN

diff --git a/core/box/carve_push_box.c b/core/box/carve_push_box.c
index ace1534e..ac47c881 100644
--- a/core/box/carve_push_box.c
+++ b/core/box/carve_push_box.c
@@ -15,6 +15,7 @@
#include "tiny_header_box.h" // Header Box: Single Source of Truth for header operations
#include "../tiny_refill_opt.h" // TinyRefillChain, trc_linear_carve()
#include "../tiny_box_geometry.h" // tiny_stride_for_class(), tiny_slab_base_for_geometry()
+#include "c7_meta_used_counter_box.h"

// External declarations
extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES]; @@ -191,6 +192,7 @@ uint32_t box_carve_and_push_with_freelist(int class_idx, uint32_t want) { void* p = meta->freelist; meta->freelist = tiny_next_read(class_idx, p); meta->used++; + c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT); // CRITICAL FIX: Restore header BEFORE pushing to TLS SLL // Freelist blocks may have stale data at offset 0 diff --git a/core/box/carve_push_box.d b/core/box/carve_push_box.d index d24e142c..ca8f9495 100644 --- a/core/box/carve_push_box.d +++ b/core/box/carve_push_box.d @@ -41,7 +41,7 @@ core/box/carve_push_box.o: core/box/carve_push_box.c \ core/box/../tiny_region_id.h core/box/../hakmem_tiny_integrity.h \ core/box/../box/slab_freelist_atomic.h core/box/tiny_header_box.h \ core/box/../tiny_refill_opt.h core/box/../box/tls_sll_box.h \ - core/box/../tiny_box_geometry.h + core/box/../tiny_box_geometry.h core/box/c7_meta_used_counter_box.h core/box/../hakmem_tiny.h: core/box/../hakmem_build_flags.h: core/box/../hakmem_trace.h: @@ -116,3 +116,4 @@ core/box/tiny_header_box.h: core/box/../tiny_refill_opt.h: core/box/../box/tls_sll_box.h: core/box/../tiny_box_geometry.h: +core/box/c7_meta_used_counter_box.h: diff --git a/core/box/slab_carve_box.h b/core/box/slab_carve_box.h index 8f92c4ce..3d5cf3cd 100644 --- a/core/box/slab_carve_box.h +++ b/core/box/slab_carve_box.h @@ -9,12 +9,15 @@ #include #include +#include +#include #include "../hakmem_tiny_config.h" #include "../hakmem_tiny_superslab.h" #include "../superslab/superslab_inline.h" #include "../tiny_box_geometry.h" #include "../box/tiny_next_ptr_box.h" #include "../box/pagefault_telemetry_box.h" +#include "c7_meta_used_counter_box.h" // ============================================================================ // Slab Carving API (Inline for Hot Path) @@ -46,11 +49,31 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss, // Find an available slab in this SuperSlab int cap = ss_slabs_capacity(ss); + #if HAKMEM_BUILD_RELEASE + static _Atomic int rel_c7_meta_logged = 0; + TinySlabMeta* rel_c7_meta = NULL; + int rel_c7_meta_idx = -1; + #else + static __thread int dbg_c7_meta_logged = 0; + TinySlabMeta* dbg_c7_meta = NULL; + int dbg_c7_meta_idx = -1; + #endif for (int slab_idx = 0; slab_idx < cap; slab_idx++) { TinySlabMeta* meta = &ss->slabs[slab_idx]; // Check if this slab matches our class and has capacity if (meta->class_idx != (uint8_t)class_idx) continue; + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7 && atomic_load_explicit(&rel_c7_meta_logged, memory_order_relaxed) == 0 && !rel_c7_meta) { + rel_c7_meta = meta; + rel_c7_meta_idx = slab_idx; + } + #else + if (class_idx == 7 && dbg_c7_meta_logged == 0 && !dbg_c7_meta) { + dbg_c7_meta = meta; + dbg_c7_meta_idx = slab_idx; + } + #endif if (meta->used >= meta->capacity && !meta->freelist) continue; // Carve blocks from this slab @@ -73,6 +96,7 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss, meta->freelist = next_node; meta->used++; + c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT); } else if (meta->carved < meta->capacity) { // Linear carve @@ -84,6 +108,7 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss, meta->carved++; meta->used++; + c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT); } else { break; // This slab exhausted @@ -99,6 +124,48 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss, // If this slab had no freelist and no carved capacity, continue to next } +#if !HAKMEM_BUILD_RELEASE + static __thread int 
dbg_c7_slab_carve_zero_logs = 0;
+    if (class_idx == 7 && dbg_c7_slab_carve_zero_logs < 10) {
+        fprintf(stderr, "[C7_SLAB_CARVE_ZERO] ss=%p no blocks carved\n", (void*)ss);
+        dbg_c7_slab_carve_zero_logs++;
+    }
+#endif
+    #if HAKMEM_BUILD_RELEASE
+    if (class_idx == 7 &&
+        atomic_load_explicit(&rel_c7_meta_logged, memory_order_relaxed) == 0 &&
+        rel_c7_meta) {
+        size_t bs = tiny_stride_for_class(class_idx);
+        fprintf(stderr,
+                "[REL_C7_CARVE_META] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p stride=%zu slabs_cap=%d\n",
+                (void*)ss,
+                rel_c7_meta_idx,
+                (unsigned)rel_c7_meta->class_idx,
+                (unsigned)rel_c7_meta->used,
+                (unsigned)rel_c7_meta->capacity,
+                (unsigned)rel_c7_meta->carved,
+                rel_c7_meta->freelist,
+                bs,
+                cap);
+        atomic_store_explicit(&rel_c7_meta_logged, 1, memory_order_relaxed);
+    }
+    #else
+    if (class_idx == 7 && dbg_c7_meta_logged == 0 && dbg_c7_meta) {
+        size_t bs = tiny_stride_for_class(class_idx);
+        fprintf(stderr,
+                "[DBG_C7_CARVE_META] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p stride=%zu slabs_cap=%d\n",
+                (void*)ss,
+                dbg_c7_meta_idx,
+                (unsigned)dbg_c7_meta->class_idx,
+                (unsigned)dbg_c7_meta->used,
+                (unsigned)dbg_c7_meta->capacity,
+                (unsigned)dbg_c7_meta->carved,
+                dbg_c7_meta->freelist,
+                bs,
+                cap);
+        dbg_c7_meta_logged = 1;
+    }
+    #endif

    return 0; // No slab in this SuperSlab had available capacity
}

diff --git a/core/box/ss_slab_reset_box.h b/core/box/ss_slab_reset_box.h
new file mode 100644
index 00000000..f58c3d07
--- /dev/null
+++ b/core/box/ss_slab_reset_box.h
@@ -0,0 +1,26 @@
+// ss_slab_reset_box.h
+// Box: Reset TinySlabMeta for reuse (C7 diagnostics-friendly)
+#pragma once
+
+#include "ss_slab_meta_box.h"
+#include "../superslab/superslab_inline.h"
+#include <stdatomic.h>
+
+static inline void ss_slab_reset_meta_for_tiny(SuperSlab* ss,
+                                               int slab_idx,
+                                               int class_idx)
+{
+    if (!ss) return;
+    if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss)) return;
+
+    TinySlabMeta* meta = &ss->slabs[slab_idx];
+    meta->used = 0;
+    meta->carved = 0;
+    meta->freelist = NULL;
+    meta->class_idx = (uint8_t)class_idx;
+    ss->class_map[slab_idx] = (uint8_t)class_idx;
+
+    // Reset remote queue state to avoid stale pending frees on reuse.
+    atomic_store_explicit(&ss->remote_heads[slab_idx], 0, memory_order_relaxed);
+    atomic_store_explicit(&ss->remote_counts[slab_idx], 0, memory_order_relaxed);
+}
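Both handoff directions normalize C7 metadata through this single Box. The call sites appear later in this patch (`c7_reset_and_log_if_needed()` in the acquire path, `shared_pool_release_slab()` in the release path); the pattern at each site reduces to:

// After the caller has validated ss/slab_idx and decided the slab may be reused:
if (class_idx == 7) {
    ss_slab_reset_meta_for_tiny(ss, slab_idx, class_idx);
}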
diff --git a/core/box/ss_tls_bind_box.h b/core/box/ss_tls_bind_box.h
index 008a645d..6fa9eafe 100644
--- a/core/box/ss_tls_bind_box.h
+++ b/core/box/ss_tls_bind_box.h
@@ -13,6 +13,7 @@
#include "../hakmem_tiny_config.h"
#include "../box/tiny_page_box.h" // For tiny_page_box_on_new_slab()
#include <stdio.h>
+#include <stdatomic.h>
// Forward declaration if not included

// CRITICAL FIX: type must match core/hakmem_tiny_config.h (const size_t, not uint16_t)
@@ -64,9 +65,7 @@ static inline int ss_tls_bind_one(int class_idx,
    // superslab_init_slab() only sets it if meta->class_idx==255.
    // We must explicitly set it to the requested class to avoid C0/C7 confusion.
    TinySlabMeta* meta = &ss->slabs[slab_idx];
-#if !HAKMEM_BUILD_RELEASE
    uint8_t old_cls = meta->class_idx;
-#endif
    meta->class_idx = (uint8_t)class_idx;
#if !HAKMEM_BUILD_RELEASE
    if (class_idx == 7 && old_cls != class_idx) {
@@ -75,6 +74,36 @@
    }
#endif

+#if HAKMEM_BUILD_RELEASE
+    static _Atomic int rel_c7_bind_logged = 0;
+    if (class_idx == 7 &&
+        atomic_load_explicit(&rel_c7_bind_logged, memory_order_relaxed) == 0) {
+        fprintf(stderr,
+                "[REL_C7_BIND] ss=%p slab=%d cls=%u cap=%u used=%u carved=%u\n",
+                (void*)ss,
+                slab_idx,
+                (unsigned)meta->class_idx,
+                (unsigned)meta->capacity,
+                (unsigned)meta->used,
+                (unsigned)meta->carved);
+        atomic_store_explicit(&rel_c7_bind_logged, 1, memory_order_relaxed);
+    }
+#else
+    static __thread int dbg_c7_bind_logged = 0;
+    if (class_idx == 7 && dbg_c7_bind_logged == 0) {
+        fprintf(stderr,
+                "[DBG_C7_BIND] ss=%p slab=%d old_cls=%u new_cls=%u cap=%u used=%u carved=%u\n",
+                (void*)ss,
+                slab_idx,
+                (unsigned)old_cls,
+                (unsigned)meta->class_idx,
+                (unsigned)meta->capacity,
+                (unsigned)meta->used,
+                (unsigned)meta->carved);
+        dbg_c7_bind_logged = 1;
+    }
+#endif
+
    // Bind this slab to TLS for fast subsequent allocations.
    // Inline implementation of tiny_tls_bind_slab() to avoid header dependencies.
    // Original logic:
@@ -109,4 +138,4 @@
    return 1;
}

-#endif // HAK_SS_TLS_BIND_BOX_H
\ No newline at end of file
+#endif // HAK_SS_TLS_BIND_BOX_H

diff --git a/core/box/tiny_route_box.c b/core/box/tiny_route_box.c
index 35e6b3ee..7aa64ba2 100644
--- a/core/box/tiny_route_box.c
+++ b/core/box/tiny_route_box.c
@@ -4,6 +4,7 @@
#include
#include
+#include <stdio.h>

// Default: conservative profile (all classes TINY_FIRST).
// This keeps Tiny in the fast path but always allows Pool fallback.
@@ -40,5 +41,16 @@ void tiny_route_init(void)
        // - All classes TINY_FIRST (Tiny is used, but Pool fallback is always available)
        memset(g_tiny_route, ROUTE_TINY_FIRST, sizeof(g_tiny_route));
    }
-}
+    #if HAKMEM_BUILD_RELEASE
+    static int rel_logged = 0;
+    if (!rel_logged) {
+        const char* mode =
+            (g_tiny_route[7] == ROUTE_TINY_ONLY) ? "TINY_ONLY" :
+            (g_tiny_route[7] == ROUTE_TINY_FIRST) ? "TINY_FIRST" :
+            (g_tiny_route[7] == ROUTE_POOL_ONLY) ? "POOL_ONLY" : "UNKNOWN";
+        fprintf(stderr, "[REL_C7_ROUTE] profile=%s route=%s\n", profile, mode);
+        rel_logged = 1;
+    }
+    #endif
+}
"POOL_ONLY" : "UNKNOWN"; + fprintf(stderr, "[REL_C7_ROUTE] via tiny_route_get route=%s\n", mode); + rel_route_logged = 1; + } + } + #endif + return p; } #endif // TINY_ROUTE_BOX_H - diff --git a/core/box/tiny_tls_carve_one_block_box.h b/core/box/tiny_tls_carve_one_block_box.h new file mode 100644 index 00000000..a3af9f0b --- /dev/null +++ b/core/box/tiny_tls_carve_one_block_box.h @@ -0,0 +1,102 @@ +// tiny_tls_carve_one_block_box.h +// Box: Shared TLS carve helper (linear or freelist) for Tiny classes. +#pragma once + +#include "../tiny_tls.h" +#include "../tiny_box_geometry.h" +#include "../tiny_debug_api.h" // tiny_refill_failfast_level(), tiny_failfast_abort_ptr() +#include "c7_meta_used_counter_box.h" // C7 meta->used telemetry (Release/Debug共通) +#include "tiny_next_ptr_box.h" +#include "../superslab/superslab_inline.h" +#include +#include + +#if !HAKMEM_BUILD_RELEASE +extern int g_tiny_safe_free; +extern int g_tiny_safe_free_strict; +#endif + +enum { + TINY_TLS_CARVE_PATH_NONE = 0, + TINY_TLS_CARVE_PATH_LINEAR = 1, + TINY_TLS_CARVE_PATH_FREELIST = 2, +}; + +typedef struct TinyTLSCarveOneResult { + void* block; + int path; +} TinyTLSCarveOneResult; + +// Carve one block from the current TLS slab. +// Returns .block == NULL on failure. path describes which sub-path was taken. +static inline TinyTLSCarveOneResult +tiny_tls_carve_one_block(TinyTLSSlab* tls, int class_idx) +{ + TinyTLSCarveOneResult res = {.block = NULL, .path = TINY_TLS_CARVE_PATH_NONE}; + + if (!tls) return res; + + TinySlabMeta* meta = tls->meta; + if (!meta || !tls->ss || tls->slab_base == NULL) return res; + if (meta->class_idx != (uint8_t)class_idx) return res; + if (tls->slab_idx < 0 || tls->slab_idx >= ss_slabs_capacity(tls->ss)) return res; + + // Freelist pop + if (meta->freelist) { +#if !HAKMEM_BUILD_RELEASE + if (__builtin_expect(g_tiny_safe_free, 0)) { + size_t blk = tiny_stride_for_class(meta->class_idx); + uint8_t* base = tiny_slab_base_for_geometry(tls->ss, tls->slab_idx); + uintptr_t delta = (uintptr_t)meta->freelist - (uintptr_t)base; + int align_ok = ((delta % blk) == 0); + int range_ok = (delta / blk) < meta->capacity; + if (!align_ok || !range_ok) { + if (g_tiny_safe_free_strict) { raise(SIGUSR2); return res; } + return res; + } + } +#endif + void* block = meta->freelist; + meta->freelist = tiny_next_read(class_idx, block); + meta->used++; + c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_TLS); + ss_active_add(tls->ss, 1); + res.block = block; + res.path = TINY_TLS_CARVE_PATH_FREELIST; + return res; + } + + // Linear carve + if (meta->used < meta->capacity) { + size_t block_size = tiny_stride_for_class(meta->class_idx); + void* block = tiny_block_at_index(tls->slab_base, meta->used, block_size); + +#if !HAKMEM_BUILD_RELEASE + if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) { + uintptr_t base_ss = (uintptr_t)tls->ss; + size_t ss_size = (size_t)1ULL << tls->ss->lg_size; + uintptr_t p = (uintptr_t)block; + int in_range = (p >= base_ss) && (p < base_ss + ss_size); + int aligned = ((p - (uintptr_t)tls->slab_base) % block_size) == 0; + int idx_ok = (tls->slab_idx >= 0) && + (tls->slab_idx < ss_slabs_capacity(tls->ss)); + if (!in_range || !aligned || !idx_ok || meta->used + 1 > meta->capacity) { + tiny_failfast_abort_ptr("tls_carve_align", + tls->ss, + tls->slab_idx, + block, + "tiny_tls_carve_one_block"); + } + } +#endif + + meta->used++; + c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_TLS); + ss_active_add(tls->ss, 1); + res.block = block; + res.path = TINY_TLS_CARVE_PATH_LINEAR; + 
return res; + } + + return res; +} diff --git a/core/box/warm_pool_dbg_box.h b/core/box/warm_pool_dbg_box.h new file mode 100644 index 00000000..76924a98 --- /dev/null +++ b/core/box/warm_pool_dbg_box.h @@ -0,0 +1,121 @@ +// warm_pool_dbg_box.h +// Box: Debug-only counters for C7 Warm Pool instrumentation. +#pragma once + +#include +#include + +#if !HAKMEM_BUILD_RELEASE +#ifdef WARM_POOL_DBG_DEFINE +_Atomic uint64_t g_dbg_c7_warm_pop_attempts = 0; +_Atomic uint64_t g_dbg_c7_warm_pop_hits = 0; +_Atomic uint64_t g_dbg_c7_warm_pop_carve = 0; +_Atomic uint64_t g_dbg_c7_tls_carve_attempts = 0; +_Atomic uint64_t g_dbg_c7_tls_carve_success = 0; +_Atomic uint64_t g_dbg_c7_tls_carve_fail = 0; +_Atomic uint64_t g_dbg_c7_uc_miss_warm_refill = 0; +_Atomic uint64_t g_dbg_c7_uc_miss_tls_refill = 0; +_Atomic uint64_t g_dbg_c7_uc_miss_shared_refill = 0; +#else +extern _Atomic uint64_t g_dbg_c7_warm_pop_attempts; +extern _Atomic uint64_t g_dbg_c7_warm_pop_hits; +extern _Atomic uint64_t g_dbg_c7_warm_pop_carve; +extern _Atomic uint64_t g_dbg_c7_tls_carve_attempts; +extern _Atomic uint64_t g_dbg_c7_tls_carve_success; +extern _Atomic uint64_t g_dbg_c7_tls_carve_fail; +extern _Atomic uint64_t g_dbg_c7_uc_miss_warm_refill; +extern _Atomic uint64_t g_dbg_c7_uc_miss_tls_refill; +extern _Atomic uint64_t g_dbg_c7_uc_miss_shared_refill; +#endif + +static inline void warm_pool_dbg_c7_attempt(void) { + atomic_fetch_add_explicit(&g_dbg_c7_warm_pop_attempts, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_hit(void) { + atomic_fetch_add_explicit(&g_dbg_c7_warm_pop_hits, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_carve(void) { + atomic_fetch_add_explicit(&g_dbg_c7_warm_pop_carve, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_tls_attempt(void) { + atomic_fetch_add_explicit(&g_dbg_c7_tls_carve_attempts, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_tls_success(void) { + atomic_fetch_add_explicit(&g_dbg_c7_tls_carve_success, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_tls_fail(void) { + atomic_fetch_add_explicit(&g_dbg_c7_tls_carve_fail, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_uc_miss_warm(void) { + atomic_fetch_add_explicit(&g_dbg_c7_uc_miss_warm_refill, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_uc_miss_tls(void) { + atomic_fetch_add_explicit(&g_dbg_c7_uc_miss_tls_refill, 1, memory_order_relaxed); +} + +static inline void warm_pool_dbg_c7_uc_miss_shared(void) { + atomic_fetch_add_explicit(&g_dbg_c7_uc_miss_shared_refill, 1, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_attempts(void) { + return atomic_load_explicit(&g_dbg_c7_warm_pop_attempts, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_hits(void) { + return atomic_load_explicit(&g_dbg_c7_warm_pop_hits, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_carves(void) { + return atomic_load_explicit(&g_dbg_c7_warm_pop_carve, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_tls_attempts(void) { + return atomic_load_explicit(&g_dbg_c7_tls_carve_attempts, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_tls_successes(void) { + return atomic_load_explicit(&g_dbg_c7_tls_carve_success, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_tls_failures(void) { + return atomic_load_explicit(&g_dbg_c7_tls_carve_fail, memory_order_relaxed); +} + +static inline uint64_t 
warm_pool_dbg_c7_uc_miss_warm_refills(void) { + return atomic_load_explicit(&g_dbg_c7_uc_miss_warm_refill, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_uc_miss_tls_refills(void) { + return atomic_load_explicit(&g_dbg_c7_uc_miss_tls_refill, memory_order_relaxed); +} + +static inline uint64_t warm_pool_dbg_c7_uc_miss_shared_refills(void) { + return atomic_load_explicit(&g_dbg_c7_uc_miss_shared_refill, memory_order_relaxed); +} +#else +static inline void warm_pool_dbg_c7_attempt(void) { } +static inline void warm_pool_dbg_c7_hit(void) { } +static inline void warm_pool_dbg_c7_carve(void) { } +static inline void warm_pool_dbg_c7_tls_attempt(void) { } +static inline void warm_pool_dbg_c7_tls_success(void) { } +static inline void warm_pool_dbg_c7_tls_fail(void) { } +static inline void warm_pool_dbg_c7_uc_miss_warm(void) { } +static inline void warm_pool_dbg_c7_uc_miss_tls(void) { } +static inline void warm_pool_dbg_c7_uc_miss_shared(void) { } +static inline uint64_t warm_pool_dbg_c7_attempts(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_hits(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_carves(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_tls_attempts(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_tls_successes(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_tls_failures(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_uc_miss_warm_refills(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_uc_miss_tls_refills(void) { return 0; } +static inline uint64_t warm_pool_dbg_c7_uc_miss_shared_refills(void) { return 0; } +#endif diff --git a/core/box/warm_pool_prefill_box.h b/core/box/warm_pool_prefill_box.h index dc3f764a..607d317c 100644 --- a/core/box/warm_pool_prefill_box.h +++ b/core/box/warm_pool_prefill_box.h @@ -7,11 +7,51 @@ #define HAK_WARM_POOL_PREFILL_BOX_H #include +#include +#include #include "../hakmem_tiny_config.h" #include "../hakmem_tiny_superslab.h" #include "../tiny_tls.h" #include "../front/tiny_warm_pool.h" #include "../box/warm_pool_stats_box.h" +#include "../box/warm_pool_rel_counters_box.h" + +static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) { + if (!tls || !tls->ss) return; +#if HAKMEM_BUILD_RELEASE + static _Atomic uint32_t rel_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed); + if (n < 4) { + TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx]; + fprintf(stderr, + "[REL_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n", + tag, + (void*)tls->ss, + (unsigned)tls->slab_idx, + (unsigned)meta->class_idx, + (unsigned)meta->used, + (unsigned)meta->capacity, + (unsigned)meta->carved, + meta->freelist); + } +#else + static _Atomic uint32_t dbg_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed); + if (n < 4) { + TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx]; + fprintf(stderr, + "[DBG_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n", + tag, + (void*)tls->ss, + (unsigned)tls->slab_idx, + (unsigned)meta->class_idx, + (unsigned)meta->used, + (unsigned)meta->capacity, + (unsigned)meta->carved, + meta->freelist); + } +#endif +} // Forward declarations extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES]; @@ -45,9 +85,17 @@ extern SuperSlab* superslab_refill(int class_idx); // Performance: Only triggered when pool is empty, cold path cost // static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) { + #if 
HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_rel_c7_prefill_call(); + } + #endif int budget = (tiny_warm_pool_count(class_idx) == 0) ? WARM_POOL_PREFILL_BUDGET : 1; while (budget > 0) { + if (class_idx == 7) { + warm_prefill_log_c7_meta("PREFILL_META", tls); + } if (!tls->ss) { // Need to load a new SuperSlab if (!superslab_refill(class_idx)) { @@ -61,16 +109,75 @@ static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) { break; } + // C7 safety: prefer only pristine slabs (used=0 carved=0 freelist=NULL) + if (class_idx == 7) { + TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx]; + if (meta->class_idx == 7 && + (meta->used > 0 || meta->carved > 0 || meta->freelist != NULL)) { + #if HAKMEM_BUILD_RELEASE + static _Atomic int rel_c7_skip_logged = 0; + if (atomic_load_explicit(&rel_c7_skip_logged, memory_order_relaxed) == 0) { + fprintf(stderr, + "[REL_C7_PREFILL_SKIP_NONEMPTY] ss=%p slab=%u used=%u cap=%u carved=%u freelist=%p\n", + (void*)tls->ss, + (unsigned)tls->slab_idx, + (unsigned)meta->used, + (unsigned)meta->capacity, + (unsigned)meta->carved, + meta->freelist); + atomic_store_explicit(&rel_c7_skip_logged, 1, memory_order_relaxed); + } + #else + static __thread int dbg_c7_skip_logged = 0; + if (dbg_c7_skip_logged < 4) { + fprintf(stderr, + "[DBG_C7_PREFILL_SKIP_NONEMPTY] ss=%p slab=%u used=%u cap=%u carved=%u freelist=%p\n", + (void*)tls->ss, + (unsigned)tls->slab_idx, + (unsigned)meta->used, + (unsigned)meta->capacity, + (unsigned)meta->carved, + meta->freelist); + dbg_c7_skip_logged++; + } + #endif + tls->ss = NULL; // Drop exhausted slab and try another + budget--; + continue; + } + } + if (budget > 1) { // Prefill mode: push to pool and load another tiny_warm_pool_push(class_idx, tls->ss); warm_pool_record_prefilled(class_idx); - tls->ss = NULL; // Force next iteration to refill - budget--; - } else { - // Final slab: keep in TLS for immediate carving - budget = 0; + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_rel_c7_prefill_slab(); } + #else + if (class_idx == 7) { + static __thread int dbg_c7_prefill_logs = 0; + if (dbg_c7_prefill_logs < 8) { + TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx]; + fprintf(stderr, + "[DBG_C7_PREFILL] ss=%p slab=%u used=%u cap=%u carved=%u freelist=%p\n", + (void*)tls->ss, + (unsigned)tls->slab_idx, + (unsigned)meta->used, + (unsigned)meta->capacity, + (unsigned)meta->carved, + meta->freelist); + dbg_c7_prefill_logs++; + } + } + #endif + tls->ss = NULL; // Force next iteration to refill + budget--; + } else { + // Final slab: keep in TLS for immediate carving + budget = 0; + } } return 0; // Success diff --git a/core/box/warm_pool_rel_counters_box.h b/core/box/warm_pool_rel_counters_box.h new file mode 100644 index 00000000..6ce79ae7 --- /dev/null +++ b/core/box/warm_pool_rel_counters_box.h @@ -0,0 +1,64 @@ +// warm_pool_rel_counters_box.h +// Box: Lightweight Release-side counters for C7 Warm/TLS instrumentation. 
+#pragma once
+
+#include <stdint.h>
+#include <stdatomic.h>
+
+#if HAKMEM_BUILD_RELEASE
+#ifdef WARM_POOL_REL_DEFINE
+_Atomic uint64_t g_rel_c7_carve_attempts = 0;
+_Atomic uint64_t g_rel_c7_carve_success = 0;
+_Atomic uint64_t g_rel_c7_carve_zero = 0;
+_Atomic uint64_t g_rel_c7_warm_prefill_calls = 0;
+_Atomic uint64_t g_rel_c7_warm_prefill_slabs = 0;
+#else
+extern _Atomic uint64_t g_rel_c7_carve_attempts;
+extern _Atomic uint64_t g_rel_c7_carve_success;
+extern _Atomic uint64_t g_rel_c7_carve_zero;
+extern _Atomic uint64_t g_rel_c7_warm_prefill_calls;
+extern _Atomic uint64_t g_rel_c7_warm_prefill_slabs;
+#endif
+
+static inline void warm_pool_rel_c7_carve_attempt(void) {
+    atomic_fetch_add_explicit(&g_rel_c7_carve_attempts, 1, memory_order_relaxed);
+}
+static inline void warm_pool_rel_c7_carve_success(void) {
+    atomic_fetch_add_explicit(&g_rel_c7_carve_success, 1, memory_order_relaxed);
+}
+static inline void warm_pool_rel_c7_carve_zero(void) {
+    atomic_fetch_add_explicit(&g_rel_c7_carve_zero, 1, memory_order_relaxed);
+}
+static inline void warm_pool_rel_c7_prefill_call(void) {
+    atomic_fetch_add_explicit(&g_rel_c7_warm_prefill_calls, 1, memory_order_relaxed);
+}
+static inline void warm_pool_rel_c7_prefill_slab(void) {
+    atomic_fetch_add_explicit(&g_rel_c7_warm_prefill_slabs, 1, memory_order_relaxed);
+}
+static inline uint64_t warm_pool_rel_c7_carve_attempts(void) {
+    return atomic_load_explicit(&g_rel_c7_carve_attempts, memory_order_relaxed);
+}
+static inline uint64_t warm_pool_rel_c7_carve_successes(void) {
+    return atomic_load_explicit(&g_rel_c7_carve_success, memory_order_relaxed);
+}
+static inline uint64_t warm_pool_rel_c7_carve_zeroes(void) {
+    return atomic_load_explicit(&g_rel_c7_carve_zero, memory_order_relaxed);
+}
+static inline uint64_t warm_pool_rel_c7_prefill_calls(void) {
+    return atomic_load_explicit(&g_rel_c7_warm_prefill_calls, memory_order_relaxed);
+}
+static inline uint64_t warm_pool_rel_c7_prefill_slabs(void) {
+    return atomic_load_explicit(&g_rel_c7_warm_prefill_slabs, memory_order_relaxed);
+}
+#else
+static inline void warm_pool_rel_c7_carve_attempt(void) { }
+static inline void warm_pool_rel_c7_carve_success(void) { }
+static inline void warm_pool_rel_c7_carve_zero(void) { }
+static inline void warm_pool_rel_c7_prefill_call(void) { }
+static inline void warm_pool_rel_c7_prefill_slab(void) { }
+static inline uint64_t warm_pool_rel_c7_carve_attempts(void) { return 0; }
+static inline uint64_t warm_pool_rel_c7_carve_successes(void) { return 0; }
+static inline uint64_t warm_pool_rel_c7_carve_zeroes(void) { return 0; }
+static inline uint64_t warm_pool_rel_c7_prefill_calls(void) { return 0; }
+static inline uint64_t warm_pool_rel_c7_prefill_slabs(void) { return 0; }
+#endif

diff --git a/core/box/warm_tls_bind_logger_box.h b/core/box/warm_tls_bind_logger_box.h
new file mode 100644
index 00000000..dd958200
--- /dev/null
+++ b/core/box/warm_tls_bind_logger_box.h
@@ -0,0 +1,57 @@
+// warm_tls_bind_logger_box.h
+// Box: Warm TLS Bind experiment logging with simple throttling.
+#pragma once
+
+#include "../hakmem_tiny_superslab.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdatomic.h>
+
+#if !HAKMEM_BUILD_RELEASE
+static _Atomic int g_warm_tls_bind_log_limit = -1;
+static _Atomic int g_warm_tls_bind_log_count = 0;
+
+static inline int warm_tls_bind_log_limit(void) {
+    int limit = atomic_load_explicit(&g_warm_tls_bind_log_limit, memory_order_relaxed);
+    if (__builtin_expect(limit == -1, 0)) {
+        const char* e = getenv("HAKMEM_WARM_TLS_BIND_LOG_MAX");
+        int parsed = (e && *e) ?
atoi(e) : 1; + atomic_store_explicit(&g_warm_tls_bind_log_limit, parsed, memory_order_relaxed); + limit = parsed; + } + return limit; +} + +static inline int warm_tls_bind_log_acquire(void) { + int limit = warm_tls_bind_log_limit(); + int prev = atomic_fetch_add_explicit(&g_warm_tls_bind_log_count, 1, memory_order_relaxed); + return prev < limit; +} + +static inline void warm_tls_bind_log_success(SuperSlab* ss, int slab_idx) { + if (warm_tls_bind_log_acquire()) { + fprintf(stderr, "[WARM_TLS_BIND] C7 bind success: ss=%p slab=%d\n", + (void*)ss, slab_idx); + } +} + +static inline void warm_tls_bind_log_tls_carve(SuperSlab* ss, int slab_idx, void* block) { + if (warm_tls_bind_log_acquire()) { + fprintf(stderr, + "[WARM_TLS_BIND] C7 TLS carve success: ss=%p slab=%d block=%p\n", + (void*)ss, slab_idx, block); + } +} + +static inline void warm_tls_bind_log_tls_fail(SuperSlab* ss, int slab_idx) { + if (warm_tls_bind_log_acquire()) { + fprintf(stderr, + "[WARM_TLS_BIND] C7 TLS carve failed, fallback (ss=%p slab=%d)\n", + (void*)ss, slab_idx); + } +} +#else +static inline void warm_tls_bind_log_success(SuperSlab* ss, int slab_idx) { (void)ss; (void)slab_idx; } +static inline void warm_tls_bind_log_tls_carve(SuperSlab* ss, int slab_idx, void* block) { (void)ss; (void)slab_idx; (void)block; } +static inline void warm_tls_bind_log_tls_fail(SuperSlab* ss, int slab_idx) { (void)ss; (void)slab_idx; } +#endif diff --git a/core/front/tiny_unified_cache.c b/core/front/tiny_unified_cache.c index 26ec29aa..ee8261f2 100644 --- a/core/front/tiny_unified_cache.c +++ b/core/front/tiny_unified_cache.c @@ -12,10 +12,19 @@ #include "../box/ss_slab_meta_box.h" // For ss_active_add() and slab metadata operations #include "../box/warm_pool_stats_box.h" // Box: Warm Pool Statistics Recording (inline) #include "../box/slab_carve_box.h" // Box: Slab Carving (inline O(slabs) scan) +#define WARM_POOL_REL_DEFINE +#include "../box/warm_pool_rel_counters_box.h" // Box: Release-side C7 counters +#undef WARM_POOL_REL_DEFINE +#include "../box/c7_meta_used_counter_box.h" // Box: C7 meta->used increment counters #include "../box/warm_pool_prefill_box.h" // Box: Warm Pool Prefill (secondary optimization) #include "../hakmem_env_cache.h" // Priority-2: ENV cache (eliminate syscalls) #include "../box/tiny_page_box.h" // Tiny-Plus Page Box (C5–C7 initial hook) #include "../box/ss_tls_bind_box.h" // Box: TLS Bind (SuperSlab -> TLS binding) +#include "../box/tiny_tls_carve_one_block_box.h" // Box: TLS carve helper (shared) +#include "../box/warm_tls_bind_logger_box.h" // Box: Warm TLS Bind logging (throttled) +#define WARM_POOL_DBG_DEFINE +#include "../box/warm_pool_dbg_box.h" // Box: Warm Pool C7 debug counters +#undef WARM_POOL_DBG_DEFINE #include #include #include @@ -84,6 +93,12 @@ __thread uint64_t g_unified_cache_push[TINY_NUM_CLASSES] = {0}; __thread uint64_t g_unified_cache_full[TINY_NUM_CLASSES] = {0}; #endif +// Release-side lightweight telemetry (C7 Warm path only) +#if HAKMEM_BUILD_RELEASE +_Atomic uint64_t g_rel_c7_warm_pop = 0; +_Atomic uint64_t g_rel_c7_warm_push = 0; +#endif + // Warm Pool metrics (definition - declared in tiny_warm_pool.h as extern) // Note: These are kept outside !HAKMEM_BUILD_RELEASE for profiling in release builds __thread TinyWarmPoolStats g_warm_pool_stats[TINY_NUM_CLASSES] = {0}; @@ -98,46 +113,36 @@ _Atomic uint64_t g_dbg_warm_pop_attempts = 0; _Atomic uint64_t g_dbg_warm_pop_hits = 0; _Atomic uint64_t g_dbg_warm_pop_empty = 0; _Atomic uint64_t g_dbg_warm_pop_carve_zero = 0; +#endif -// 
Debug-only: cached ENV for Warm TLS Bind (C7) -static int g_warm_tls_bind_mode_c7 = -1; - +// Warm TLS Bind (C7) mode selector +// mode 0: Legacy warm path(デバッグ専用・C7では非推奨) +// mode 1: Bind-only 本番経路(C7 標準) +// mode 2: Bind + TLS carve 実験経路(Debug 専用) +// Release ビルドでは常に mode=1 に固定し、ENV は無視する。 static inline int warm_tls_bind_mode_c7(void) { +#if HAKMEM_BUILD_RELEASE + static int g_warm_tls_bind_mode_c7 = -1; if (__builtin_expect(g_warm_tls_bind_mode_c7 == -1, 0)) { const char* e = getenv("HAKMEM_WARM_TLS_BIND_C7"); - // 0/empty: disabled, 1: bind only, 2: bind + TLS carve one block - g_warm_tls_bind_mode_c7 = (e && *e) ? atoi(e) : 0; + int mode = (e && *e) ? atoi(e) : 1; // default = Bind-only + if (mode < 0) mode = 0; + if (mode > 2) mode = 2; + g_warm_tls_bind_mode_c7 = mode; } return g_warm_tls_bind_mode_c7; -} - -static inline void* warm_tls_carve_one_block(int class_idx) { - TinyTLSSlab* tls = &g_tls_slabs[class_idx]; - TinySlabMeta* meta = tls->meta; - - if (!meta || !tls->ss || tls->slab_base == NULL) return NULL; - if (meta->class_idx != (uint8_t)class_idx) return NULL; - if (tls->slab_idx < 0 || tls->slab_idx >= ss_slabs_capacity(tls->ss)) return NULL; - - if (meta->freelist) { - void* block = meta->freelist; - meta->freelist = tiny_next_read(class_idx, block); - meta->used++; - ss_active_add(tls->ss, 1); - return block; +#else + static int g_warm_tls_bind_mode_c7 = -1; + if (__builtin_expect(g_warm_tls_bind_mode_c7 == -1, 0)) { + const char* e = getenv("HAKMEM_WARM_TLS_BIND_C7"); + int mode = (e && *e) ? atoi(e) : 1; // default = Bind-only + if (mode < 0) mode = 0; + if (mode > 2) mode = 2; + g_warm_tls_bind_mode_c7 = mode; } - - if (meta->used < meta->capacity) { - size_t block_size = tiny_stride_for_class(meta->class_idx); - void* block = tiny_block_at_index(tls->slab_base, meta->used, block_size); - meta->used++; - ss_active_add(tls->ss, 1); - return block; - } - - return NULL; -} + return g_warm_tls_bind_mode_c7; #endif +} // Forward declaration for Warm Pool stats printer (defined later in this file) static inline void tiny_warm_pool_print_stats(void); @@ -157,6 +162,15 @@ int unified_cache_enabled(void) { fprintf(stderr, "[Unified-INIT] unified_cache_enabled() = %d\n", g_enable); fflush(stderr); } +#else + if (g_enable) { + static int printed = 0; + if (!printed) { + fprintf(stderr, "[Rel-Unified] unified_cache_enabled() = %d\n", g_enable); + fflush(stderr); + printed = 1; + } + } #endif } return g_enable; @@ -311,6 +325,32 @@ static inline void tiny_warm_pool_print_stats(void) { (unsigned long long)atomic_load_explicit(&g_dbg_warm_pop_hits, memory_order_relaxed), (unsigned long long)atomic_load_explicit(&g_dbg_warm_pop_empty, memory_order_relaxed), (unsigned long long)atomic_load_explicit(&g_dbg_warm_pop_carve_zero, memory_order_relaxed)); + uint64_t c7_attempts = warm_pool_dbg_c7_attempts(); + uint64_t c7_hits = warm_pool_dbg_c7_hits(); + uint64_t c7_carve = warm_pool_dbg_c7_carves(); + uint64_t c7_tls_attempts = warm_pool_dbg_c7_tls_attempts(); + uint64_t c7_tls_success = warm_pool_dbg_c7_tls_successes(); + uint64_t c7_tls_fail = warm_pool_dbg_c7_tls_failures(); + uint64_t c7_uc_warm = warm_pool_dbg_c7_uc_miss_warm_refills(); + uint64_t c7_uc_tls = warm_pool_dbg_c7_uc_miss_tls_refills(); + uint64_t c7_uc_shared = warm_pool_dbg_c7_uc_miss_shared_refills(); + if (c7_attempts || c7_hits || c7_carve || + c7_tls_attempts || c7_tls_success || c7_tls_fail || + c7_uc_warm || c7_uc_tls || c7_uc_shared) { + fprintf(stderr, + " [DBG_C7] warm_pop_attempts=%llu warm_pop_hits=%llu 
warm_pop_carve=%llu " + "tls_carve_attempts=%llu tls_carve_success=%llu tls_carve_fail=%llu " + "uc_miss_warm=%llu uc_miss_tls=%llu uc_miss_shared=%llu\n", + (unsigned long long)c7_attempts, + (unsigned long long)c7_hits, + (unsigned long long)c7_carve, + (unsigned long long)c7_tls_attempts, + (unsigned long long)c7_tls_success, + (unsigned long long)c7_tls_fail, + (unsigned long long)c7_uc_warm, + (unsigned long long)c7_uc_tls, + (unsigned long long)c7_uc_shared); + } #endif fflush(stderr); } @@ -515,6 +555,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { // - これにより、room <= max_batch <= 512 が常に成り立ち、out[] オーバーランを防止する。 void* out[512]; int produced = 0; + int tls_carved = 0; // Debug bookkeeping: track TLS carve experiment hits // ========== PAGE BOX HOT PATH(Tiny-Plus 層): Try page box FIRST ========== // 将来的に C7 専用の page-level freelist 管理をここに統合する。 @@ -554,10 +595,21 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { // This is the critical optimization - avoid superslab_refill() registry scan #if !HAKMEM_BUILD_RELEASE atomic_fetch_add_explicit(&g_dbg_warm_pop_attempts, 1, memory_order_relaxed); + if (class_idx == 7) { + warm_pool_dbg_c7_attempt(); + } + #endif + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + atomic_fetch_add_explicit(&g_rel_c7_warm_pop, 1, memory_order_relaxed); + } #endif SuperSlab* warm_ss = tiny_warm_pool_pop(class_idx); if (warm_ss) { #if !HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_dbg_c7_hit(); + } // Debug-only: Warm TLS Bind experiment (C7 only) if (class_idx == 7) { int warm_mode = warm_tls_bind_mode_c7(); @@ -577,25 +629,22 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { TinyTLSSlab* tls = &g_tls_slabs[class_idx]; uint32_t tid = (uint32_t)(uintptr_t)pthread_self(); if (ss_tls_bind_one(class_idx, tls, warm_ss, slab_idx, tid)) { - static int logged = 0; - if (!logged) { - fprintf(stderr, "[WARM_TLS_BIND] C7 bind success: ss=%p slab=%d\n", - (void*)warm_ss, slab_idx); - logged = 1; - } + warm_tls_bind_log_success(warm_ss, slab_idx); // Mode 2: carve a single block via TLS fast path if (warm_mode == 2) { - void* tls_block = warm_tls_carve_one_block(class_idx); - if (tls_block) { - fprintf(stderr, - "[WARM_TLS_BIND] C7 TLS carve success: ss=%p slab=%d block=%p\n", - (void*)warm_ss, slab_idx, tls_block); - out[0] = tls_block; + warm_pool_dbg_c7_tls_attempt(); + TinyTLSCarveOneResult tls_carve = + tiny_tls_carve_one_block(tls, class_idx); + if (tls_carve.block) { + warm_tls_bind_log_tls_carve(warm_ss, slab_idx, tls_carve.block); + warm_pool_dbg_c7_tls_success(); + out[0] = tls_carve.block; produced = 1; + tls_carved = 1; } else { - fprintf(stderr, - "[WARM_TLS_BIND] C7 TLS carve failed, fallback\n"); + warm_tls_bind_log_tls_fail(warm_ss, slab_idx); + warm_pool_dbg_c7_tls_fail(); } } } @@ -607,7 +656,21 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { #endif // HOT PATH: Warm pool hit, try to carve directly if (produced == 0) { + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_rel_c7_carve_attempt(); + } + #endif produced = slab_carve_from_ss(class_idx, warm_ss, out, room); + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + if (produced > 0) { + warm_pool_rel_c7_carve_success(); + } else { + warm_pool_rel_c7_carve_zero(); + } + } + #endif if (produced > 0) { // Update active counter for carved blocks ss_active_add(warm_ss, (uint32_t)produced); @@ -615,7 +678,22 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { } if (produced > 0) { + #if !HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + 
warm_pool_dbg_c7_carve(); + if (tls_carved) { + warm_pool_dbg_c7_uc_miss_tls(); + } else { + warm_pool_dbg_c7_uc_miss_warm(); + } + } + #endif // Success! Return SuperSlab to warm pool for next use + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + atomic_fetch_add_explicit(&g_rel_c7_warm_push, 1, memory_order_relaxed); + } + #endif tiny_warm_pool_push(class_idx, warm_ss); // Track warm pool hit (always compiled, ENV-gated printing) @@ -761,6 +839,9 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { } #if !HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_dbg_c7_uc_miss_shared(); + } g_unified_cache_miss[class_idx]++; #endif diff --git a/core/front/tiny_unified_cache.d b/core/front/tiny_unified_cache.d index 041b8e10..6fa3ca67 100644 --- a/core/front/tiny_unified_cache.d +++ b/core/front/tiny_unified_cache.d @@ -40,10 +40,18 @@ core/front/tiny_unified_cache.o: core/front/tiny_unified_cache.c \ core/front/../box/../superslab/superslab_inline.h \ core/front/../box/../tiny_box_geometry.h \ core/front/../box/../box/pagefault_telemetry_box.h \ + core/front/../box/c7_meta_used_counter_box.h \ + core/front/../box/warm_pool_rel_counters_box.h \ core/front/../box/warm_pool_prefill_box.h \ core/front/../box/../tiny_tls.h \ core/front/../box/../box/warm_pool_stats_box.h \ - core/front/../hakmem_env_cache.h core/front/../box/tiny_page_box.h + core/front/../hakmem_env_cache.h core/front/../box/tiny_page_box.h \ + core/front/../box/ss_tls_bind_box.h \ + core/front/../box/../box/tiny_page_box.h \ + core/front/../box/tiny_tls_carve_one_block_box.h \ + core/front/../box/../tiny_debug_api.h \ + core/front/../box/warm_tls_bind_logger_box.h \ + core/front/../box/warm_pool_dbg_box.h core/front/tiny_unified_cache.h: core/front/../hakmem_build_flags.h: core/front/../hakmem_tiny_config.h: @@ -104,8 +112,16 @@ core/front/../box/../hakmem_tiny_superslab.h: core/front/../box/../superslab/superslab_inline.h: core/front/../box/../tiny_box_geometry.h: core/front/../box/../box/pagefault_telemetry_box.h: +core/front/../box/c7_meta_used_counter_box.h: +core/front/../box/warm_pool_rel_counters_box.h: core/front/../box/warm_pool_prefill_box.h: core/front/../box/../tiny_tls.h: core/front/../box/../box/warm_pool_stats_box.h: core/front/../hakmem_env_cache.h: core/front/../box/tiny_page_box.h: +core/front/../box/ss_tls_bind_box.h: +core/front/../box/../box/tiny_page_box.h: +core/front/../box/tiny_tls_carve_one_block_box.h: +core/front/../box/../tiny_debug_api.h: +core/front/../box/warm_tls_bind_logger_box.h: +core/front/../box/warm_pool_dbg_box.h: diff --git a/core/front/tiny_unified_cache.h b/core/front/tiny_unified_cache.h index 0d1e69cf..85e81211 100644 --- a/core/front/tiny_unified_cache.h +++ b/core/front/tiny_unified_cache.h @@ -87,6 +87,10 @@ extern __thread uint64_t g_unified_cache_hit[TINY_NUM_CLASSES]; // Alloc hits extern __thread uint64_t g_unified_cache_miss[TINY_NUM_CLASSES]; // Alloc misses extern __thread uint64_t g_unified_cache_push[TINY_NUM_CLASSES]; // Free pushes extern __thread uint64_t g_unified_cache_full[TINY_NUM_CLASSES]; // Free full (fallback to SuperSlab) +#else +// Release-side lightweight C7 warm path counters (for smoke validation) +extern _Atomic uint64_t g_rel_c7_warm_pop; +extern _Atomic uint64_t g_rel_c7_warm_push; #endif // ============================================================================ diff --git a/core/hakmem_shared_pool_acquire.c b/core/hakmem_shared_pool_acquire.c index d6822bb6..6e2bb543 100644 --- a/core/hakmem_shared_pool_acquire.c +++ 
b/core/hakmem_shared_pool_acquire.c @@ -10,11 +10,145 @@ #include "hakmem_policy.h" #include "hakmem_env_cache.h" // Priority-2: ENV cache #include "front/tiny_warm_pool.h" // Warm Pool: Prefill during registry scans +#include "box/ss_slab_reset_box.h" // Box: Reset slab metadata on reuse (C7 guard) #include #include #include +static inline void c7_log_meta_state(const char* tag, SuperSlab* ss, int slab_idx) { + if (!ss) return; +#if HAKMEM_BUILD_RELEASE + static _Atomic uint32_t rel_c7_meta_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&rel_c7_meta_logs, 1, memory_order_relaxed); + if (n < 8) { + TinySlabMeta* m = &ss->slabs[slab_idx]; + fprintf(stderr, + "[REL_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n", + tag, + (void*)ss, + slab_idx, + (unsigned)m->class_idx, + (unsigned)m->used, + (unsigned)m->capacity, + (unsigned)m->carved, + m->freelist); + } +#else + static _Atomic uint32_t dbg_c7_meta_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&dbg_c7_meta_logs, 1, memory_order_relaxed); + if (n < 8) { + TinySlabMeta* m = &ss->slabs[slab_idx]; + fprintf(stderr, + "[DBG_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n", + tag, + (void*)ss, + slab_idx, + (unsigned)m->class_idx, + (unsigned)m->used, + (unsigned)m->capacity, + (unsigned)m->carved, + m->freelist); + } +#endif +} + +static inline int c7_meta_is_pristine(TinySlabMeta* m) { + return m && m->used == 0 && m->carved == 0 && m->freelist == NULL; +} + +static inline void c7_log_skip_nonempty_acquire(SuperSlab* ss, + int slab_idx, + TinySlabMeta* m, + const char* tag) { + if (!(ss && m)) return; +#if HAKMEM_BUILD_RELEASE + static _Atomic uint32_t rel_c7_skip_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&rel_c7_skip_logs, 1, memory_order_relaxed); + if (n < 4) { + fprintf(stderr, + "[REL_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n", + tag, + (void*)ss, + slab_idx, + (unsigned)m->class_idx, + (unsigned)m->used, + (unsigned)m->capacity, + (unsigned)m->carved, + m->freelist); + } +#else + static _Atomic uint32_t dbg_c7_skip_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&dbg_c7_skip_logs, 1, memory_order_relaxed); + if (n < 4) { + fprintf(stderr, + "[DBG_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n", + tag, + (void*)ss, + slab_idx, + (unsigned)m->class_idx, + (unsigned)m->used, + (unsigned)m->capacity, + (unsigned)m->carved, + m->freelist); + } +#endif +} + +static inline int c7_reset_and_log_if_needed(SuperSlab* ss, + int slab_idx, + int class_idx) { + if (class_idx != 7) { + return 0; + } + + TinySlabMeta* m = &ss->slabs[slab_idx]; + c7_log_meta_state("ACQUIRE_META", ss, slab_idx); + + if (m->class_idx != 255 && m->class_idx != (uint8_t)class_idx) { +#if HAKMEM_BUILD_RELEASE + static _Atomic uint32_t rel_c7_class_mismatch_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&rel_c7_class_mismatch_logs, 1, memory_order_relaxed); + if (n < 4) { + fprintf(stderr, + "[REL_C7_CLASS_MISMATCH] ss=%p slab=%d want=%d have=%u used=%u cap=%u carved=%u\n", + (void*)ss, + slab_idx, + class_idx, + (unsigned)m->class_idx, + (unsigned)m->used, + (unsigned)m->capacity, + (unsigned)m->carved); + } +#else + static _Atomic uint32_t dbg_c7_class_mismatch_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&dbg_c7_class_mismatch_logs, 1, memory_order_relaxed); + if (n < 4) { + fprintf(stderr, + "[DBG_C7_CLASS_MISMATCH] ss=%p slab=%d want=%d have=%u used=%u cap=%u carved=%u freelist=%p\n", + (void*)ss, + slab_idx, + class_idx, + (unsigned)m->class_idx, + 
(unsigned)m->used, + (unsigned)m->capacity, + (unsigned)m->carved, + m->freelist); + } +#endif + return -1; + } + + if (!c7_meta_is_pristine(m)) { + c7_log_skip_nonempty_acquire(ss, slab_idx, m, "SKIP_NONEMPTY_ACQUIRE"); + return -1; + } + + ss_slab_reset_meta_for_tiny(ss, slab_idx, class_idx); + c7_log_meta_state("ACQUIRE", ss, slab_idx); + return 0; +} + // ============================================================================ // Performance Measurement: Shared Pool Lock Contention (ENV-gated) // ============================================================================ @@ -147,7 +281,12 @@ sp_acquire_from_empty_scan(int class_idx, SuperSlab** ss_out, int* slab_idx_out, fprintf(stderr, "[STAGE0.5_STATS] hits=%lu attempts=%lu rate=%.1f%% (scan_limit=%d warm_pool=%d)\n", hits, attempts, (double)hits * 100.0 / attempts, scan_limit, tiny_warm_pool_count(class_idx)); } - return 0; + if (c7_reset_and_log_if_needed(primary_result, primary_slab_idx, class_idx) == 0) { + return 0; + } + primary_result = NULL; + *ss_out = NULL; + *slab_idx_out = -1; } return -1; } @@ -216,6 +355,15 @@ stage1_retry_after_tension_drain: if (ss_guard) { tiny_tls_slab_reuse_guard(ss_guard); + if (class_idx == 7) { + TinySlabMeta* meta = &ss_guard->slabs[reuse_slot_idx]; + if (!c7_meta_is_pristine(meta)) { + c7_log_skip_nonempty_acquire(ss_guard, reuse_slot_idx, meta, "SKIP_NONEMPTY_ACQUIRE"); + sp_freelist_push_lockfree(class_idx, reuse_meta, reuse_slot_idx); + goto stage2_fallback; + } + } + // P-Tier: Skip DRAINING tier SuperSlabs if (!ss_tier_is_hot(ss_guard)) { // DRAINING SuperSlab - skip this slot and fall through to Stage 2 @@ -270,6 +418,15 @@ stage1_retry_after_tension_drain: *ss_out = ss; *slab_idx_out = reuse_slot_idx; + if (c7_reset_and_log_if_needed(ss, reuse_slot_idx, class_idx) != 0) { + *ss_out = NULL; + *slab_idx_out = -1; + if (g_lock_stats_enabled == 1) { + atomic_fetch_add(&g_lock_release_count, 1); + } + pthread_mutex_unlock(&g_shared_pool.alloc_lock); + goto stage2_fallback; + } if (g_lock_stats_enabled == 1) { atomic_fetch_add(&g_lock_release_count, 1); @@ -338,6 +495,19 @@ stage2_fallback: 1, memory_order_relaxed); } + if (class_idx == 7) { + TinySlabMeta* meta = &ss->slabs[claimed_idx]; + if (!c7_meta_is_pristine(meta)) { + c7_log_skip_nonempty_acquire(ss, claimed_idx, meta, "SKIP_NONEMPTY_ACQUIRE"); + sp_slot_mark_empty(hint_meta, claimed_idx); + if (g_lock_stats_enabled == 1) { + atomic_fetch_add(&g_lock_release_count, 1); + } + pthread_mutex_unlock(&g_shared_pool.alloc_lock); + goto stage2_scan; + } + } + // Update SuperSlab metadata under mutex ss->slab_bitmap |= (1u << claimed_idx); ss_slab_meta_class_idx_set(ss, claimed_idx, (uint8_t)class_idx); @@ -353,6 +523,15 @@ stage2_fallback: // Hint is still good, no need to update *ss_out = ss; *slab_idx_out = claimed_idx; + if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) { + *ss_out = NULL; + *slab_idx_out = -1; + if (g_lock_stats_enabled == 1) { + atomic_fetch_add(&g_lock_release_count, 1); + } + pthread_mutex_unlock(&g_shared_pool.alloc_lock); + goto stage2_scan; + } sp_fix_geometry_if_needed(ss, claimed_idx, class_idx); if (g_lock_stats_enabled == 1) { @@ -432,6 +611,19 @@ stage2_scan: 1, memory_order_relaxed); } + if (class_idx == 7) { + TinySlabMeta* meta_slab = &ss->slabs[claimed_idx]; + if (!c7_meta_is_pristine(meta_slab)) { + c7_log_skip_nonempty_acquire(ss, claimed_idx, meta_slab, "SKIP_NONEMPTY_ACQUIRE"); + sp_slot_mark_empty(meta, claimed_idx); + if (g_lock_stats_enabled == 1) { + 
+                    atomic_fetch_add(&g_lock_release_count, 1);
+                }
+                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+                continue;
+            }
+        }
+
         // Update SuperSlab metadata under mutex
         ss->slab_bitmap |= (1u << claimed_idx);
         ss_slab_meta_class_idx_set(ss, claimed_idx, (uint8_t)class_idx);
@@ -449,6 +641,15 @@ stage2_scan:
         *ss_out = ss;
         *slab_idx_out = claimed_idx;
+        if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) {
+            *ss_out = NULL;
+            *slab_idx_out = -1;
+            if (g_lock_stats_enabled == 1) {
+                atomic_fetch_add(&g_lock_release_count, 1);
+            }
+            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+            continue;
+        }
         sp_fix_geometry_if_needed(ss, claimed_idx, class_idx);
 
         if (g_lock_stats_enabled == 1) {
@@ -623,6 +824,15 @@ stage2_scan:
     *ss_out = new_ss;
     *slab_idx_out = first_slot;
+    if (c7_reset_and_log_if_needed(new_ss, first_slot, class_idx) != 0) {
+        *ss_out = NULL;
+        *slab_idx_out = -1;
+        if (g_lock_stats_enabled == 1) {
+            atomic_fetch_add(&g_lock_release_count, 1);
+        }
+        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
+        return -1;
+    }
     sp_fix_geometry_if_needed(new_ss, first_slot, class_idx);
 
     if (g_lock_stats_enabled == 1) {
diff --git a/core/hakmem_shared_pool_release.c b/core/hakmem_shared_pool_release.c
index cdad60c3..71a37b6e 100644
--- a/core/hakmem_shared_pool_release.c
+++ b/core/hakmem_shared_pool_release.c
@@ -6,11 +6,42 @@
 #include "hakmem_env_cache.h"            // Priority-2: ENV cache
 #include "superslab/superslab_inline.h"  // superslab_ref_get guard for TLS pins
 #include "box/ss_release_guard_box.h"    // Box: SuperSlab Release Guard
+#include "box/ss_slab_reset_box.h"       // Box: Reset slab metadata on reuse path
 #include <stdio.h>
 #include <stdatomic.h>
 #include <pthread.h>
 
+static inline void c7_release_log_once(SuperSlab* ss, int slab_idx) {
+#if HAKMEM_BUILD_RELEASE
+    static _Atomic uint32_t rel_c7_release_logs = 0;
+    uint32_t n = atomic_fetch_add_explicit(&rel_c7_release_logs, 1, memory_order_relaxed);
+    if (n < 8) {
+        TinySlabMeta* meta = &ss->slabs[slab_idx];
+        fprintf(stderr,
+                "[REL_C7_RELEASE] ss=%p slab=%d used=%u cap=%u carved=%u\n",
+                (void*)ss, slab_idx,
+                (unsigned)meta->used, (unsigned)meta->capacity, (unsigned)meta->carved);
+    }
+#else
+    static _Atomic uint32_t dbg_c7_release_logs = 0;
+    uint32_t n = atomic_fetch_add_explicit(&dbg_c7_release_logs, 1, memory_order_relaxed);
+    if (n < 8) {
+        TinySlabMeta* meta = &ss->slabs[slab_idx];
+        fprintf(stderr,
+                "[DBG_C7_RELEASE] ss=%p slab=%d used=%u cap=%u carved=%u\n",
+                (void*)ss, slab_idx,
+                (unsigned)meta->used, (unsigned)meta->capacity, (unsigned)meta->carved);
+    }
+#endif
+}
+
 void
 shared_pool_release_slab(SuperSlab* ss, int slab_idx)
 {
@@ -75,6 +106,9 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
     }
 
     uint8_t class_idx = slab_meta->class_idx;
+    if (class_idx == 7) {
+        c7_release_log_once(ss, slab_idx);
+    }
 
     // Guard: if SuperSlab is pinned (TLS/remote references), defer release to avoid
     // class_map=255 while pointers are still in-flight.
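// NOTE: ss_slab_reset_box.h is referenced above but its body is not part of this patch.
// A minimal sketch consistent with the call sites (field names are taken from the
// logging code above; everything else is assumed, not confirmed by the source):
//
//   // box/ss_slab_reset_box.h (hypothetical sketch)
//   static inline void ss_slab_reset_meta_for_tiny(SuperSlab* ss, int slab_idx,
//                                                  int class_idx) {
//       TinySlabMeta* m = &ss->slabs[slab_idx];
//       m->class_idx = (uint8_t)class_idx;  // rebind the slab to the requested class
//       m->used      = 0;                   // no live blocks
//       m->carved    = 0;                   // restart linear carve from the slab base
//       m->freelist  = NULL;                // drop any stale freelist
//   }
//
// With this shape, c7_meta_is_pristine() holds immediately after the reset, which is
// exactly the invariant the acquire-side guard checks before handing a C7 slab to TLS.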
@@ -101,6 +135,39 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
     }
 #endif
 
+    if (class_idx == 7) {
+        ss_slab_reset_meta_for_tiny(ss, slab_idx, class_idx);
+#if HAKMEM_BUILD_RELEASE
+        static _Atomic uint32_t rel_c7_reset_logs = 0;
+        uint32_t rn = atomic_fetch_add_explicit(&rel_c7_reset_logs, 1, memory_order_relaxed);
+        if (rn < 4) {
+            TinySlabMeta* m = &ss->slabs[slab_idx];
+            fprintf(stderr,
+                    "[REL_C7_RELEASE_RESET] ss=%p slab=%d used=%u cap=%u carved=%u freelist=%p\n",
+                    (void*)ss, slab_idx,
+                    (unsigned)m->used, (unsigned)m->capacity,
+                    (unsigned)m->carved, m->freelist);
+        }
+#else
+        static _Atomic uint32_t dbg_c7_reset_logs = 0;
+        uint32_t rn = atomic_fetch_add_explicit(&dbg_c7_reset_logs, 1, memory_order_relaxed);
+        if (rn < 4) {
+            TinySlabMeta* m = &ss->slabs[slab_idx];
+            fprintf(stderr,
+                    "[DBG_C7_RELEASE_RESET] ss=%p slab=%d used=%u cap=%u carved=%u freelist=%p\n",
+                    (void*)ss, slab_idx,
+                    (unsigned)m->used, (unsigned)m->capacity,
+                    (unsigned)m->carved, m->freelist);
+        }
+#endif
+    }
+
     // Find SharedSSMeta for this SuperSlab
     SharedSSMeta* sp_meta = NULL;
     uint32_t count = atomic_load_explicit(&g_shared_pool.ss_meta_count, memory_order_relaxed);
diff --git a/core/hakmem_tiny.c b/core/hakmem_tiny.c
index 9dd79611..248ba875 100644
--- a/core/hakmem_tiny.c
+++ b/core/hakmem_tiny.c
@@ -25,6 +25,7 @@
 #include "front/tiny_heap_v2.h"
 #include "tiny_tls_guard.h"
 #include "tiny_ready.h"
+#include "box/c7_meta_used_counter_box.h"
 #include "hakmem_tiny_tls_list.h"
 #include "hakmem_tiny_remote_target.h"  // Phase 2C-1: Remote target queue
 #include "hakmem_tiny_bg_spill.h"       // Phase 2C-2: Background spill queue
@@ -334,6 +335,7 @@ static inline void* hak_tiny_alloc_superslab_try_fast(int class_idx) {
     size_t block_size = tiny_stride_for_class(meta->class_idx);
     void* block = tls->slab_base + ((size_t)meta->used * block_size);
     meta->used++;
+    c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
     // Track active blocks in SuperSlab for conservative reclamation
     ss_active_inc(tls->ss);
     return block;
diff --git a/core/hakmem_tiny_alloc_new.inc b/core/hakmem_tiny_alloc_new.inc
index 9d7886ea..ac376b18 100644
--- a/core/hakmem_tiny_alloc_new.inc
+++ b/core/hakmem_tiny_alloc_new.inc
@@ -17,6 +17,7 @@
 // Phase E1-CORRECT: Box API for next pointer operations
 #include "box/tiny_next_ptr_box.h"
 #include "front/tiny_heap_v2.h"
+#include "box/c7_meta_used_counter_box.h"
 
 // Debug counters (thread-local)
 static __thread uint64_t g_3layer_bump_hits = 0;
@@ -265,6 +266,7 @@ static void* tiny_alloc_slow_new(int class_idx) {
             meta->freelist = tiny_next_read(node);  // Phase E1-CORRECT: Box API
             items[got++] = node;
             meta->used++;
+            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
         }
 
         // Then linear carve (KEY OPTIMIZATION - direct array fill!)
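// NOTE: c7_meta_used_counter_box.h is included throughout this patch but never shown.
// A plausible minimal sketch, inferred from the call sites (the two SRC values both
// appear below; C7_META_COUNTER_DEFINE matches the define-once guard bench_random_mixed.c
// uses before including this header; all names here are assumptions):
//
//   // box/c7_meta_used_counter_box.h (hypothetical sketch)
//   #include <stdatomic.h>
//   #include <stdint.h>
//   typedef enum { C7_META_USED_SRC_FRONT = 0, C7_META_USED_SRC_BACKEND = 1 } C7MetaUsedSrc;
//   #ifdef C7_META_COUNTER_DEFINE
//   _Atomic uint64_t g_c7_meta_used[2];          // defined once by the bench TU
//   #else
//   extern _Atomic uint64_t g_c7_meta_used[2];
//   #endif
//   static inline void c7_meta_used_note(int class_idx, C7MetaUsedSrc src) {
//       if (__builtin_expect(class_idx == 7, 0))  // count only C7 meta->used increments
//           atomic_fetch_add_explicit(&g_c7_meta_used[src], 1, memory_order_relaxed);
//   }
//
// A relaxed atomic keeps the telemetry cheap enough to leave compiled into the C7 hot
// paths in both Debug and Release builds.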
@@ -285,6 +287,11 @@ static void* tiny_alloc_slow_new(int class_idx) {
             }
 
             meta->used += need;  // Reserve to TLS; not active until returned to user
+            if (class_idx == 7) {
+                for (uint32_t i = 0; i < need; ++i) {
+                    c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
+                }
+            }
         }
 
         if (got == 0) {
diff --git a/core/hakmem_tiny_refill.inc.h b/core/hakmem_tiny_refill.inc.h
index b86baa3f..d023eee8 100644
--- a/core/hakmem_tiny_refill.inc.h
+++ b/core/hakmem_tiny_refill.inc.h
@@ -18,6 +18,7 @@
 #include "tiny_box_geometry.h"
 #include "superslab/superslab_inline.h"  // Provides hak_super_lookup() and SUPERSLAB_MAGIC
 #include "box/tls_sll_box.h"
+#include "box/c7_meta_used_counter_box.h"
 #include "box/tiny_header_box.h"        // Header Box: Single Source of Truth for header operations
 #include "box/tiny_front_config_box.h"  // Phase 7-Step6-Fix: Config macros for dead code elimination
 #include "hakmem_tiny_integrity.h"
@@ -94,6 +95,39 @@ static inline void tiny_debug_validate_node_base(int class_idx, void* node, cons
 }
 #endif
 
+static inline void c7_log_used_assign_cap(TinySlabMeta* meta, int class_idx, const char* tag) {
+    if (__builtin_expect(class_idx != 7, 1)) {
+        return;
+    }
+#if HAKMEM_BUILD_RELEASE
+    static _Atomic uint32_t rel_logs = 0;
+    uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
+    if (n < 4) {
+        fprintf(stderr,
+                "[REL_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
+                tag, (unsigned)meta->used, (unsigned)meta->capacity,
+                (unsigned)meta->carved, meta->freelist);
+    }
+#else
+    static _Atomic uint32_t dbg_logs = 0;
+    uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
+    if (n < 4) {
+        fprintf(stderr,
+                "[DBG_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
+                tag, (unsigned)meta->used, (unsigned)meta->capacity,
+                (unsigned)meta->carved, meta->freelist);
+    }
+#endif
+}
+
 // ========= superslab_tls_bump_fast =========
 // Ultra bump shadow: when the current slab's freelist is empty and carved < capacity,
 // bump-carve a chunk directly from the slab.

     carved = (uint16_t)(carved + (uint16_t)chunk);
     meta->used = (uint16_t)(meta->used + (uint16_t)chunk);
+    if (class_idx == 7) {
+        for (uint32_t i = 0; i < chunk; ++i) {
+            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
+        }
+    }
     ss_active_add(tls->ss, chunk);
 #if HAKMEM_DEBUG_COUNTERS
     g_bump_arms[class_idx]++;
@@ -365,8 +404,10 @@ int sll_refill_small_from_ss(int class_idx, int max_take)
         meta->freelist = next_raw;
         meta->used++;
+        c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
         if (__builtin_expect(meta->used > meta->capacity, 0)) {
             // On anomaly detection, roll back and stop (abort quietly to avoid fail-fast)
+            c7_log_used_assign_cap(meta, class_idx, "FREELIST_OVERRUN");
             meta->used = meta->capacity;
             break;
         }
@@ -414,7 +455,9 @@ int sll_refill_small_from_ss(int class_idx, int max_take)
         meta->carved++;
         meta->used++;
+        c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
         if (__builtin_expect(meta->used > meta->capacity, 0)) {
+            c7_log_used_assign_cap(meta, class_idx, "CARVE_OVERRUN");
             meta->used = meta->capacity;
             break;
         }
diff --git a/core/refill/ss_refill_fc.h b/core/refill/ss_refill_fc.h
index 57a086f8..6eaca762 100644
--- a/core/refill/ss_refill_fc.h
+++ b/core/refill/ss_refill_fc.h
@@ -33,6 +33,7 @@
 #ifndef HEADER_CLASS_MASK
 #define HEADER_CLASS_MASK 0x0F
 #endif
+#include "../box/c7_meta_used_counter_box.h"
 
 // ========================================================================
 // REFILL CONTRACT: ss_refill_fc_fill() - Standard Refill Entry Point
@@ -131,12 +132,14 @@ static inline int ss_refill_fc_fill(int class_idx, int want) {
             p = meta->freelist;
             meta->freelist = tiny_next_read(class_idx, p);
             meta->used++;
+            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
         }
         // Option B: Carve new block (if capacity available)
         else if (meta->carved < meta->capacity) {
             p = (void*)(slab_base + (meta->carved * stride));
             meta->carved++;
             meta->used++;
+            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
         }
         // Option C: Slab exhausted, need new slab
         else {
diff --git a/core/slab_handle.h b/core/slab_handle.h
index 1dde81e2..8f99ef50 100644
--- a/core/slab_handle.h
+++ b/core/slab_handle.h
@@ -9,6 +9,7 @@
 #include "tiny_debug_ring.h"
 #include "tiny_remote.h"
 #include "box/tiny_next_ptr_box.h"  // Box API: next pointer read/write
+#include "box/c7_meta_used_counter_box.h"
 
 extern int g_debug_remote_guard;
 extern int g_tiny_safe_free_strict;
@@ -311,6 +312,7 @@ static inline void* slab_freelist_pop(SlabHandle* h) {
     void* next = tiny_next_read(h->meta->class_idx, ptr);  // Box API: next pointer read
     h->meta->freelist = next;
     h->meta->used++;
+    c7_meta_used_note(h->meta->class_idx, C7_META_USED_SRC_FRONT);
     // Optional freelist mask clear when freelist becomes empty
     do {
         static int g_mask_en2 = -1;
diff --git a/core/superslab_backend.c b/core/superslab_backend.c
index 47ff229d..d38283c4 100644
--- a/core/superslab_backend.c
+++ b/core/superslab_backend.c
@@ -4,6 +4,10 @@
 // Date: 2025-11-28
 
 #include "hakmem_tiny_superslab_internal.h"
+#include "box/c7_meta_used_counter_box.h"
+#include <stdio.h>
+
+static _Atomic uint32_t g_c7_backend_calls = 0;
 
 // Note: Legacy backend moved to archive/superslab_backend_legacy.c (not built).
 
@@ -83,6 +87,20 @@ void* hak_tiny_alloc_superslab_backend_shared(int class_idx)
         return NULL;
     }
 
+    if (class_idx == 7) {
+        uint32_t n = atomic_fetch_add_explicit(&g_c7_backend_calls, 1, memory_order_relaxed);
+        if (n < 8) {
+            fprintf(stderr,
+                    "[REL_C7_BACKEND_CALL] cls=%d meta_cls=%u used=%u cap=%u ss=%p slab=%d\n",
+                    class_idx,
+                    (unsigned)meta->class_idx, (unsigned)meta->used,
+                    (unsigned)meta->capacity,
+                    (void*)ss, slab_idx);
+        }
+    }
+
     // Simple bump allocation within this slab.
     if (meta->used >= meta->capacity) {
         // Slab exhausted: in minimal Phase12-2 backend we do not loop;
@@ -101,6 +119,7 @@ void* hak_tiny_alloc_superslab_backend_shared(int class_idx)
     uint8_t* base = (uint8_t*)ss + slab_base_off + offset;
     meta->used++;
+    c7_meta_used_note(class_idx, C7_META_USED_SRC_BACKEND);
     atomic_fetch_add_explicit(&ss->total_active_blocks, 1, memory_order_relaxed);
 
     HAK_RET_ALLOC_BLOCK_TRACED(class_idx, base, ALLOC_PATH_BACKEND);
diff --git a/core/superslab_slab.c b/core/superslab_slab.c
index ef4f8742..7194003a 100644
--- a/core/superslab_slab.c
+++ b/core/superslab_slab.c
@@ -6,6 +6,7 @@
 #include "hakmem_tiny_superslab_internal.h"
 #include "box/slab_recycling_box.h"
 #include "hakmem_env_cache.h"  // Priority-2: ENV cache (eliminate syscalls)
+#include <stdio.h>
 
 // ============================================================================
 // Remote Drain (MPSC queue to freelist conversion)
@@ -175,6 +176,37 @@ void superslab_init_slab(SuperSlab* ss, int slab_idx, size_t block_size, uint32_
         }
     }
 
+#if HAKMEM_BUILD_RELEASE
+    static _Atomic int rel_c7_init_logged = 0;
+    if (meta->class_idx == 7 &&
+        atomic_load_explicit(&rel_c7_init_logged, memory_order_relaxed) == 0) {
+        fprintf(stderr,
+                "[REL_C7_INIT] ss=%p slab=%d cls=%u cap=%u used=%u carved=%u stride=%zu\n",
+                (void*)ss, slab_idx,
+                (unsigned)meta->class_idx, (unsigned)meta->capacity,
+                (unsigned)meta->used, (unsigned)meta->carved, stride);
+        atomic_store_explicit(&rel_c7_init_logged, 1, memory_order_relaxed);
+    }
+#else
+    static __thread int dbg_c7_init_logged = 0;
+    if (meta->class_idx == 7 && dbg_c7_init_logged == 0) {
+        fprintf(stderr,
+                "[DBG_C7_INIT] ss=%p slab=%d cls=%u cap=%u used=%u carved=%u stride=%zu\n",
+                (void*)ss, slab_idx,
+                (unsigned)meta->class_idx, (unsigned)meta->capacity,
+                (unsigned)meta->used, (unsigned)meta->carved, stride);
+        dbg_c7_init_logged = 1;
+    }
+#endif
+
     superslab_activate_slab(ss, slab_idx);
 }
diff --git a/core/tiny_superslab_alloc.inc.h b/core/tiny_superslab_alloc.inc.h
index b1f1a08e..8662289c 100644
--- a/core/tiny_superslab_alloc.inc.h
+++ b/core/tiny_superslab_alloc.inc.h
@@ -7,6 +7,8 @@
 #include "box/superslab_expansion_box.h"      // Box E: Expansion with TLS state guarantee
 #include "box/tiny_next_ptr_box.h"            // Box API: Next pointer read/write
+#include "box/tiny_tls_carve_one_block_box.h" // Box: Shared TLS carve helper
+#include "box/c7_meta_used_counter_box.h"     // Box: C7 meta->used telemetry
 #include "hakmem_tiny_superslab_constants.h"
 #include "tiny_box_geometry.h"                // Box 3: Geometry & Capacity Calculator
 #include "tiny_debug_api.h"                   // Guard/failfast declarations
@@ -33,6 +35,7 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
     uint8_t* base = tiny_slab_base_for_geometry(ss, slab_idx);
     void* block = tiny_block_at_index(base, meta->used, unit_sz);
     meta->used++;
+    c7_meta_used_note(cls, C7_META_USED_SRC_FRONT);
     ss_active_inc(ss);
     HAK_RET_ALLOC(cls, block);
 }
@@ -105,6 +108,7 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
     }
 #endif
     meta->used++;
+    c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
     void* user =
 #if HAKMEM_TINY_HEADER_CLASSIDX
         tiny_region_id_write_header(block_base, meta->class_idx);
@@ -157,6 +161,7 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
     meta->freelist = tiny_next_read(meta->class_idx, block);
     meta->used++;
+    c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
     if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0) &&
         __builtin_expect(meta->used > meta->capacity, 0)) {
@@ -294,54 +299,33 @@ static inline void* hak_tiny_alloc_superslab(int class_idx) {
     }
 
     // Fast path: linear carve from current TLS slab
-    if (meta && meta->freelist == NULL && meta->used < meta->capacity && tls->slab_base) {
-        size_t block_size = tiny_stride_for_class(meta->class_idx);
-        uint8_t* base = tls->slab_base;
-        void* block = base + ((size_t)meta->used * block_size);
-        meta->used++;
-
-        if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
-            uintptr_t base_ss = (uintptr_t)tls->ss;
-            size_t ss_size = (size_t)1ULL << tls->ss->lg_size;
-            uintptr_t p = (uintptr_t)block;
-            int in_range = (p >= base_ss) && (p < base_ss + ss_size);
-            int aligned = ((p - (uintptr_t)base) % block_size) == 0;
-            int idx_ok = (tls->slab_idx >= 0) &&
-                         (tls->slab_idx < ss_slabs_capacity(tls->ss));
-            if (!in_range || !aligned || !idx_ok || meta->used > meta->capacity) {
-                tiny_failfast_abort_ptr("alloc_ret_align",
-                                        tls->ss,
-                                        tls->slab_idx,
-                                        block,
-                                        "superslab_tls_invariant");
+    if (meta && tls->slab_base) {
+        TinyTLSCarveOneResult carve = tiny_tls_carve_one_block(tls, class_idx);
+        if (carve.block) {
+#if !HAKMEM_BUILD_RELEASE
+            if (__builtin_expect(g_debug_remote_guard, 0)) {
+                const char* tag = (carve.path == TINY_TLS_CARVE_PATH_FREELIST)
+                                      ? "freelist_alloc"
+                                      : "linear_alloc";
+                tiny_remote_track_on_alloc(tls->ss, slab_idx, carve.block, tag, 0);
+                tiny_remote_assert_not_remote(tls->ss, slab_idx, carve.block, tag, 0);
             }
-        }
+#endif
-        ss_active_inc(tls->ss);
-        ROUTE_MARK(11); ROUTE_COMMIT(class_idx, 0x60);
-        HAK_RET_ALLOC(class_idx, block);
-    }
-
-    // Freelist path from current TLS slab
-    if (meta && meta->freelist) {
-        void* block = meta->freelist;
-        if (__builtin_expect(g_tiny_safe_free, 0)) {
-            size_t blk = tiny_stride_for_class(meta->class_idx);
-            uint8_t* base = tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
-            uintptr_t delta = (uintptr_t)block - (uintptr_t)base;
-            int align_ok = ((delta % blk) == 0);
-            int range_ok = (delta / blk) < meta->capacity;
-            if (!align_ok || !range_ok) {
-                if (g_tiny_safe_free_strict) { raise(SIGUSR2); return NULL; }
-                return NULL;
+#if HAKMEM_TINY_SS_TLS_HINT
+            {
+                void* ss_base = (void*)tls->ss;
+                size_t ss_size = (size_t)1ULL << tls->ss->lg_size;
+                tls_ss_hint_update(tls->ss, ss_base, ss_size);
             }
+#endif
+            if (carve.path == TINY_TLS_CARVE_PATH_LINEAR) {
+                ROUTE_MARK(11); ROUTE_COMMIT(class_idx, 0x60);
+            } else if (carve.path == TINY_TLS_CARVE_PATH_FREELIST) {
+                ROUTE_MARK(12); ROUTE_COMMIT(class_idx, 0x61);
+            }
+            HAK_RET_ALLOC(class_idx, carve.block);
         }
-        void* next = tiny_next_read(class_idx, block);
-        meta->freelist = next;
-        meta->used++;
-        ss_active_inc(tls->ss);
-        ROUTE_MARK(12); ROUTE_COMMIT(class_idx, 0x61);
-        HAK_RET_ALLOC(class_idx, block);
     }
 
     // Slow path: acquire a new slab via shared pool
@@ -363,6 +347,7 @@ static inline void* hak_tiny_alloc_superslab(int class_idx) {
     size_t block_size = tiny_stride_for_class(meta->class_idx);
     void* block = tiny_block_at_index(tls->slab_base, meta->used, block_size);
     meta->used++;
+    c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
     ss_active_inc(ss);
     HAK_RET_ALLOC(class_idx, block);
 }
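// NOTE: tiny_tls_carve_one_block_box.h is likewise not part of this patch. The refactor
// above compiles against roughly this shape, inferred from carve.block / carve.path, the
// two ROUTE markers, and the legacy paths it replaces (the TLS field names are assumed;
// treat this as a sketch, not the actual Box):
//
//   // box/tiny_tls_carve_one_block_box.h (hypothetical sketch)
//   typedef enum {
//       TINY_TLS_CARVE_PATH_NONE = 0,
//       TINY_TLS_CARVE_PATH_LINEAR,    // bump-carve: block = base + used * stride
//       TINY_TLS_CARVE_PATH_FREELIST,  // pop the head of meta->freelist
//   } TinyTLSCarveOnePath;
//
//   typedef struct {
//       void*               block;     // NULL when the TLS slab is exhausted
//       TinyTLSCarveOnePath path;      // which path produced the block
//   } TinyTLSCarveOneResult;
//
//   static inline TinyTLSCarveOneResult tiny_tls_carve_one_block(TinyTLS* tls, int class_idx) {
//       TinyTLSCarveOneResult r = { NULL, TINY_TLS_CARVE_PATH_NONE };
//       TinySlabMeta* meta = tls->meta;            // field name assumed
//       if (meta->freelist) {                      // reuse a freed block first
//           r.block = meta->freelist;
//           meta->freelist = tiny_next_read(class_idx, r.block);
//           meta->used++;
//           r.path = TINY_TLS_CARVE_PATH_FREELIST;
//       } else if (meta->used < meta->capacity) {  // otherwise bump-carve from the slab
//           size_t stride = tiny_stride_for_class(class_idx);
//           r.block = tls->slab_base + (size_t)meta->used * stride;
//           meta->used++;
//           r.path = TINY_TLS_CARVE_PATH_LINEAR;
//       }
//       if (r.block) {
//           ss_active_inc(tls->ss);                // both legacy paths did this
//           c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
//       }
//       return r;
//   }
//
// Folding the linear-carve and freelist paths into one Box is what lets the caller above
// keep a single exit point while the ROUTE markers still distinguish the two paths.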