Fix C7 warm/TLS Release path and unify debug instrumentation

2025-12-05 23:41:01 +09:00
parent 96c2988381
commit d17ec46628
29 changed files with 1314 additions and 123 deletions
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@ -1,4 +1,4 @@
-## HAKMEM 状況メモ (2025-12-05 更新)
+## HAKMEM 状況メモ (2025-12-05 更新 / C7 Warm/TLS Bind 反映)
 ### 現在の状態（Tiny / Superslab / Warm Pool）
 - Tiny Front / Superslab / Shared Pool は Box Theory 準拠で 3 層構造に整理済み（HOT/WARM/COLD）。
@ -27,10 +27,26 @@
  - `core/box/tiny_page_box.h` / `core/box/tiny_page_box.c` を追加し、`HAKMEM_TINY_PAGE_BOX_CLASSES` で有効クラスを制御できる Page Box を実装。
  - `tiny_tls_bind_slab()` から `tiny_page_box_on_new_slab()` を呼び出し、TLS が bind した C7 slab を per-thread の page pool に登録。
  - `unified_cache_refill()` の先頭に Page Box 経路を追加し、C7 では「TLS が掴んでいるページ内 freelist/carve」からバッチ供給を試みてから Warm Pool / Shared Pool に落ちるようにした（Box 境界は `Tiny Page Box → Warm Pool → Shared Pool` の順序を維持）。
 - TLS Bind Box の導入:
  - `core/box/ss_tls_bind_box.h` に `ss_tls_bind_one()` を追加し、「Superslab + slab_idx → TLS」のバインド処理（`superslab_init_slab` / `meta->class_idx` 設定 / `tiny_tls_bind_slab`）を 1 箇所に集約。
  - `superslab_refill()`（Shared Pool 経路）および Warm Pool 実験経路から、この Box を経由して TLS に接続するよう統一。
 - C7 Warm/TLS Bind 経路の実装と検証:
  - `core/front/tiny_unified_cache.c` に C7 専用の Warm/TLS Bind モード（0/1/2）を追加し、Debug では `HAKMEM_WARM_TLS_BIND_C7` で切替可能にした。
  - mode 0: Legacy Warm（レガシー/デバッグ用、C7 では carve 0 が多く非推奨）
  - mode 1: Bind-only（Warm から取得した Superslab を TLS Bind Box 経由でバインドする本番経路）
  - mode 2: Bind+TLS carve（TLS から直接 carve する実験経路）
  - Release ビルドでは常に mode=1 固定。Debug では `HAKMEM_WARM_TLS_BIND_C7=0/1/2` で切替。
 - Warm Pool / Unified Cache の詳細計測:
  - `warm_pool_dbg_box.h` と Unified Cache の計測フックを拡張し、C7 向けに
    - Warm pop 試行/ヒット/実 carve 回数
    - TLS carve 試行/成功/失敗
    - UC ミスを Warm/TLS/Shared 別に分類
    を Debug ビルドで観測可能にした。
  - `bench_random_mixed.c` に `HAKMEM_BENCH_C7_ONLY=1` を追加し、C7 サイズ専用の micro-bench を追加。
 ### 性能の現状（Random Mixed, HEAD）
 - 条件: `bench_random_mixed_hakmem 1000000 256 42`（1T, ws=256, RELEASE, 16–1024B）
-  - HAKMEM: 約 5.0M ops/s
+  - HAKMEM: 約 27.6M ops/s（C7 Warm/TLS 修復後）
  - system malloc: 約 90–100M ops/s
  - mimalloc: 約 120–130M ops/s
 - 条件: `bench_random_mixed_hakmem 1000000 256 42` +
@ -38,26 +54,27 @@
  - HAKMEM Tiny Front: 約 80–90M ops/s（mimalloc と同オーダー）
 - 条件: `bench_random_mixed_hakmem 1000000 256 42` +
  `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024`（Tiny C5–C7 のみ）
-  - HAKMEM: 約 4.7–4.8M ops/s
+  - HAKMEM: 約 28.0M ops/s（Warm/TLS ガード適用後）
 - 条件: C7 専用 micro-bench（Debug, `HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_PROFILE=full HAKMEM_WARM_C7_MAX=8 HAKMEM_WARM_C7_PREFETCH=4` ほか）
  - mode 0（Legacy Warm）: 約 2.0M ops/s、C7 Warm ヒット 0・Shared Pool ロック多数（`slab_carve_from_ss` が 0 を頻発）
  - mode 1（Bind-only）: 約 20M ops/s（iters=200K, ws=32）、Warm hit ≈100%・Shared Pool ロック 5 回まで減少
  - mode 2（Bind+TLS carve 実験）: mode 1 と同等〜わずかに上（UC ミスは増えるが `uc_miss_tls` に集中し、avg_refill は短縮）
 - 条件: C7 専用 micro-bench（Release, `HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_PROFILE=full HAKMEM_WARM_C7_MAX=8 HAKMEM_WARM_C7_PREFETCH=4`）
  - HAKMEM: 約 18.8M ops/s（空スラブ強制ガード + リセット導入後、Debug と同オーダーまで回復）
 - 結論:
  - Tiny front 自体（8–128B）は十分速く、mimalloc と同オーダーまで出ている。
-  - 129–1024B の Tiny C5–C7 経路で Unified Cache hit=0 / Shared Pool ロック多発というボトルネックがあり、
+  - C5–C7 経路は「満杯 C7 slab を Warm に再供給していた」問題を空スラブ限定ガード＋Release/Debug 共通リセットで解消し、
-    Random Mixed 全体の性能を支配している。
+    C7-only Release も ~18.8M ops/s に回復。Random Mixed Release も 27M クラスまで改善。
-### 次にやること（優先タスク：C7 Page Box の実効性検証とチューニング）
+### 次にやること（広い条件での安定化確認）
-1. **C7 Page Box 経路の実効性を計測**
+1. `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024` や通常の `bench_random_mixed_hakmem 1000000 256 42` で
-   - ENV: `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024` ＋ `HAKMEM_MEASURE_UNIFIED_CACHE=1` で
+   空スラブ限定ガードが副作用なく動くかを継続確認（現状 Release で 27–28M ops/s を確認済み）。
-     `bench_random_mixed_hakmem 1000000 256 42` を実行し、C7 の:
+2. ドキュメント更新:
-     - Unified Cache refill 回数・平均 cycles
+   - Release だけ C7 Warm が死んでいた根本原因 = 満杯 C7 slab を Shared Pool がリセットせず再供給していた。
-     - `shared_pool_acquire_slab(C7)` のロック回数
+   - Acquire の空スラブ強制ガード＋Release/Debug 共通リセットで C7-only Release が ~18.8M ops/s まで回復した。
-     を、Page Box ON/OFF（`HAKMEM_TINY_PAGE_BOX_CLASSES=` 未設定 vs `7`）で比較する。
+3. 次フェーズ案:
-2. **C7 の Unified Cache 容量・バッチサイズのチューニング**
+   - C5/C6 でも同様の Warm/TLS 最適化・空スラブガードを適用するか、
-   - `HAKMEM_TINY_UNIFIED_C7` と `unified_cache_refill()` の `max_batch` 設定を変えつつ、
+   - Random Mixed 全体のボトルネック（Shared Pool ロック/Wrapper/mid-size path など）を洗うかを選択。
     Page Box ON 時の C7 ヒット率・Shared Pool ロック回数・throughput を観測し、C7 にとって最適な容量/バッチサイズを探る。
 3. **Page Box を C5/C6 に拡張するかの判断**
   - C7 で十分な効果（Shared Pool ロック大幅減 + throughput 向上）が得られた場合、
     `HAKMEM_TINY_PAGE_BOX_CLASSES=5,6,7` を試し、C5/C6 も Tiny-Plus 化したときの安定性・性能を確認する。
   - 問題がなければ、デフォルトプロファイルを「C5–C7 Page Box 有効」に近づけるかを検討する。
 ### メモ
 - ページフォルト問題は Prefault Box + ウォームアップで一定水準まで解消済みで、現在の主ボトルネックはユーザー空間の箱（Unified Cache / free / Pool）側に移っている。
--- a/README_PERF_ANALYSIS.md
+++ b/README_PERF_ANALYSIS.md
@ -1,5 +1,8 @@
 # HAKMEM Allocator Performance Analysis Results
 **最新メモ (2025-12-05)**: C7 Warm/TLS Bind は本番経路を Bind-only (mode=1) に統一。Debug では `HAKMEM_WARM_TLS_BIND_C7=0/1/2` で切替可能だが、Release は常に mode=1 固定。C7-only ワークロードでは mode=1 が legacy (mode=0) 比で ~4–10x 速く、mode=2 は TLS carve 実験として残置。  
 **追記 (2025-12-05, Release 修復)**: Release だけ C7 Warm が死んでいた原因は「満杯 C7 slab が Shared Pool に居残り、空スラブが Warm に渡っていなかった」こと。Acquire で C7 は空スラブ限定、Release でメタをリセットするガードを導入し、C7-only Release で ~18.8M ops/s、Random Mixed Release で ~27–28M ops/s まで回復。
 **分析実施日**: 2025-11-28  
 **分析対象**: HAKMEM allocator (commit 0ce20bb83)  
 **ベンチマーク**: bench_random_mixed (1,000,000 ops, working set=256)  
--- a/bench_random_mixed.c
+++ b/bench_random_mixed.c
@ -13,9 +13,16 @@
 #include <stdint.h>
 #include <time.h>
 #include <string.h>
 #include <stdatomic.h>
 #define C7_META_COUNTER_DEFINE
 #include "core/box/c7_meta_used_counter_box.h"
 #undef C7_META_COUNTER_DEFINE
 #include "core/box/warm_pool_rel_counters_box.h"
 #ifdef USE_HAKMEM
 #include "hakmem.h"
 #include "hakmem_build_flags.h"
 #include "core/box/c7_meta_used_counter_box.h"
 // Box BenchMeta: Benchmark metadata management (bypass hakmem wrapper)
 // Phase 15: Separate BenchMeta (slots array) from CoreAlloc (user workload)
@ -253,6 +260,38 @@ int main(int argc, char** argv){
  extern void tiny_warm_pool_print_stats_public(void);
  tiny_warm_pool_print_stats_public();
  #if HAKMEM_BUILD_RELEASE
  // Minimal Release-side telemetry to verify Warm path usage (C7-only)
  extern _Atomic uint64_t g_rel_c7_warm_pop;
  extern _Atomic uint64_t g_rel_c7_warm_push;
  fprintf(stderr,
          "[REL_C7_CARVE] attempts=%llu success=%llu zero=%llu\n",
          (unsigned long long)warm_pool_rel_c7_carve_attempts(),
          (unsigned long long)warm_pool_rel_c7_carve_successes(),
          (unsigned long long)warm_pool_rel_c7_carve_zeroes());
  fprintf(stderr,
          "[REL_C7_WARM] pop=%llu push=%llu\n",
          (unsigned long long)atomic_load_explicit(&g_rel_c7_warm_pop, memory_order_relaxed),
          (unsigned long long)atomic_load_explicit(&g_rel_c7_warm_push, memory_order_relaxed));
  fprintf(stderr,
          "[REL_C7_WARM_PREFILL] calls=%llu slabs=%llu\n",
          (unsigned long long)warm_pool_rel_c7_prefill_calls(),
          (unsigned long long)warm_pool_rel_c7_prefill_slabs());
  fprintf(stderr,
          "[REL_C7_META_USED_INC] total=%llu backend=%llu tls=%llu front=%llu\n",
          (unsigned long long)c7_meta_used_total(),
          (unsigned long long)c7_meta_used_backend(),
          (unsigned long long)c7_meta_used_tls(),
          (unsigned long long)c7_meta_used_front());
  #else
  fprintf(stderr,
          "[DBG_C7_META_USED_INC] total=%llu backend=%llu tls=%llu front=%llu\n",
          (unsigned long long)c7_meta_used_total(),
          (unsigned long long)c7_meta_used_backend(),
          (unsigned long long)c7_meta_used_tls(),
          (unsigned long long)c7_meta_used_front());
  #endif
  // Phase 21-1: Ring cache - DELETED (A/B test: OFF is faster)
  // extern void ring_cache_print_stats(void);
  // ring_cache_print_stats();
--- a/core/box/c7_meta_used_counter_box.h
+++ b/core/box/c7_meta_used_counter_box.h
@ -0,0 +1,59 @@
 // c7_meta_used_counter_box.h
 // Box: C7 meta->used increment counters (Release/Debug共通)
 #pragma once
 #include <stdatomic.h>
 #include <stdint.h>
 typedef enum C7MetaUsedSource {
    C7_META_USED_SRC_UNKNOWN = 0,
    C7_META_USED_SRC_BACKEND = 1,
    C7_META_USED_SRC_TLS = 2,
    C7_META_USED_SRC_FRONT = 3,
 } C7MetaUsedSource;
 #ifdef C7_META_COUNTER_DEFINE
 #define C7_META_COUNTER_EXTERN
 #else
 #define C7_META_COUNTER_EXTERN extern
 #endif
 C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_total;
 C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_backend;
 C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_tls;
 C7_META_COUNTER_EXTERN _Atomic uint64_t g_c7_meta_used_inc_front;
 static inline void c7_meta_used_note(int class_idx, C7MetaUsedSource src) {
    if (__builtin_expect(class_idx != 7, 1)) {
        return;
    }
    atomic_fetch_add_explicit(&g_c7_meta_used_inc_total, 1, memory_order_relaxed);
    switch (src) {
        case C7_META_USED_SRC_BACKEND:
            atomic_fetch_add_explicit(&g_c7_meta_used_inc_backend, 1, memory_order_relaxed);
            break;
        case C7_META_USED_SRC_TLS:
            atomic_fetch_add_explicit(&g_c7_meta_used_inc_tls, 1, memory_order_relaxed);
            break;
        case C7_META_USED_SRC_FRONT:
            atomic_fetch_add_explicit(&g_c7_meta_used_inc_front, 1, memory_order_relaxed);
            break;
        default:
            break;
    }
 }
 static inline uint64_t c7_meta_used_total(void) {
    return atomic_load_explicit(&g_c7_meta_used_inc_total, memory_order_relaxed);
 }
 static inline uint64_t c7_meta_used_backend(void) {
    return atomic_load_explicit(&g_c7_meta_used_inc_backend, memory_order_relaxed);
 }
 static inline uint64_t c7_meta_used_tls(void) {
    return atomic_load_explicit(&g_c7_meta_used_inc_tls, memory_order_relaxed);
 }
 static inline uint64_t c7_meta_used_front(void) {
    return atomic_load_explicit(&g_c7_meta_used_inc_front, memory_order_relaxed);
 }
 #undef C7_META_COUNTER_EXTERN
--- a/core/box/carve_push_box.c
+++ b/core/box/carve_push_box.c
@ -15,6 +15,7 @@
 #include "tiny_header_box.h"        // Header Box: Single Source of Truth for header operations
 #include "../tiny_refill_opt.h"     // TinyRefillChain, trc_linear_carve()
 #include "../tiny_box_geometry.h"   // tiny_stride_for_class(), tiny_slab_base_for_geometry()
 #include "c7_meta_used_counter_box.h"
 // External declarations
 extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES];
@ -191,6 +192,7 @@ uint32_t box_carve_and_push_with_freelist(int class_idx, uint32_t want) {
        void* p = meta->freelist;
        meta->freelist = tiny_next_read(class_idx, p);
        meta->used++;
        c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
        // CRITICAL FIX: Restore header BEFORE pushing to TLS SLL
        // Freelist blocks may have stale data at offset 0
--- a/core/box/carve_push_box.d
+++ b/core/box/carve_push_box.d
@ -41,7 +41,7 @@ core/box/carve_push_box.o: core/box/carve_push_box.c \
 core/box/../tiny_region_id.h core/box/../hakmem_tiny_integrity.h \
 core/box/../box/slab_freelist_atomic.h core/box/tiny_header_box.h \
 core/box/../tiny_refill_opt.h core/box/../box/tls_sll_box.h \
- core/box/../tiny_box_geometry.h
+ core/box/../tiny_box_geometry.h core/box/c7_meta_used_counter_box.h
 core/box/../hakmem_tiny.h:
 core/box/../hakmem_build_flags.h:
 core/box/../hakmem_trace.h:
@ -116,3 +116,4 @@ core/box/tiny_header_box.h:
 core/box/../tiny_refill_opt.h:
 core/box/../box/tls_sll_box.h:
 core/box/../tiny_box_geometry.h:
 core/box/c7_meta_used_counter_box.h:
--- a/core/box/slab_carve_box.h
+++ b/core/box/slab_carve_box.h
@ -9,12 +9,15 @@
 #include <stdint.h>
 #include <string.h>
 #include <stdio.h>
 #include <stdatomic.h>
 #include "../hakmem_tiny_config.h"
 #include "../hakmem_tiny_superslab.h"
 #include "../superslab/superslab_inline.h"
 #include "../tiny_box_geometry.h"
 #include "../box/tiny_next_ptr_box.h"
 #include "../box/pagefault_telemetry_box.h"
 #include "c7_meta_used_counter_box.h"
 // ============================================================================
 // Slab Carving API (Inline for Hot Path)
@ -46,11 +49,31 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss,
    // Find an available slab in this SuperSlab
    int cap = ss_slabs_capacity(ss);
    #if HAKMEM_BUILD_RELEASE
    static _Atomic int rel_c7_meta_logged = 0;
    TinySlabMeta* rel_c7_meta = NULL;
    int rel_c7_meta_idx = -1;
    #else
    static __thread int dbg_c7_meta_logged = 0;
    TinySlabMeta* dbg_c7_meta = NULL;
    int dbg_c7_meta_idx = -1;
    #endif
    for (int slab_idx = 0; slab_idx < cap; slab_idx++) {
        TinySlabMeta* meta = &ss->slabs[slab_idx];
        // Check if this slab matches our class and has capacity
        if (meta->class_idx != (uint8_t)class_idx) continue;
        #if HAKMEM_BUILD_RELEASE
        if (class_idx == 7 && atomic_load_explicit(&rel_c7_meta_logged, memory_order_relaxed) == 0 && !rel_c7_meta) {
            rel_c7_meta = meta;
            rel_c7_meta_idx = slab_idx;
        }
        #else
        if (class_idx == 7 && dbg_c7_meta_logged == 0 && !dbg_c7_meta) {
            dbg_c7_meta = meta;
            dbg_c7_meta_idx = slab_idx;
        }
        #endif
        if (meta->used >= meta->capacity && !meta->freelist) continue;
        // Carve blocks from this slab
@ -73,6 +96,7 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss,
                meta->freelist = next_node;
                meta->used++;
                c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
            } else if (meta->carved < meta->capacity) {
                // Linear carve
@ -84,6 +108,7 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss,
                meta->carved++;
                meta->used++;
                c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
            } else {
                break;  // This slab exhausted
@ -99,6 +124,48 @@ static inline int slab_carve_from_ss(int class_idx, SuperSlab* ss,
        // If this slab had no freelist and no carved capacity, continue to next
    }
 #if !HAKMEM_BUILD_RELEASE
    static __thread int dbg_c7_slab_carve_zero_logs = 0;
    if (class_idx == 7 && dbg_c7_slab_carve_zero_logs < 10) {
        fprintf(stderr, "[C7_SLAB_CARVE_ZERO] ss=%p no blocks carved\n", (void*)ss);
        dbg_c7_slab_carve_zero_logs++;
    }
 #endif
    #if HAKMEM_BUILD_RELEASE
    if (class_idx == 7 &&
        atomic_load_explicit(&rel_c7_meta_logged, memory_order_relaxed) == 0 &&
        rel_c7_meta) {
        size_t bs = tiny_stride_for_class(class_idx);
        fprintf(stderr,
                "[REL_C7_CARVE_META] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p stride=%zu slabs_cap=%d\n",
                (void*)ss,
                rel_c7_meta_idx,
                (unsigned)rel_c7_meta->class_idx,
                (unsigned)rel_c7_meta->used,
                (unsigned)rel_c7_meta->capacity,
                (unsigned)rel_c7_meta->carved,
                rel_c7_meta->freelist,
                bs,
                cap);
        atomic_store_explicit(&rel_c7_meta_logged, 1, memory_order_relaxed);
    }
    #else
    if (class_idx == 7 && dbg_c7_meta_logged == 0 && dbg_c7_meta) {
        size_t bs = tiny_stride_for_class(class_idx);
        fprintf(stderr,
                "[DBG_C7_CARVE_META] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p stride=%zu slabs_cap=%d\n",
                (void*)ss,
                dbg_c7_meta_idx,
                (unsigned)dbg_c7_meta->class_idx,
                (unsigned)dbg_c7_meta->used,
                (unsigned)dbg_c7_meta->capacity,
                (unsigned)dbg_c7_meta->carved,
                dbg_c7_meta->freelist,
                bs,
                cap);
        dbg_c7_meta_logged = 1;
    }
    #endif
    return 0;  // No slab in this SuperSlab had available capacity
 }
--- a/core/box/ss_slab_reset_box.h
+++ b/core/box/ss_slab_reset_box.h
@ -0,0 +1,26 @@
 // ss_slab_reset_box.h
 // Box: Reset TinySlabMeta for reuse (C7 diagnostics-friendly)
 #pragma once
 #include "ss_slab_meta_box.h"
 #include "../superslab/superslab_inline.h"
 #include <stdatomic.h>
 static inline void ss_slab_reset_meta_for_tiny(SuperSlab* ss,
                                               int slab_idx,
                                               int class_idx)
 {
    if (!ss) return;
    if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss)) return;
    TinySlabMeta* meta = &ss->slabs[slab_idx];
    meta->used = 0;
    meta->carved = 0;
    meta->freelist = NULL;
    meta->class_idx = (uint8_t)class_idx;
    ss->class_map[slab_idx] = (uint8_t)class_idx;
    // Reset remote queue state to avoid stale pending frees on reuse.
    atomic_store_explicit(&ss->remote_heads[slab_idx], 0, memory_order_relaxed);
    atomic_store_explicit(&ss->remote_counts[slab_idx], 0, memory_order_relaxed);
 }
--- a/core/box/ss_tls_bind_box.h
+++ b/core/box/ss_tls_bind_box.h
@ -13,6 +13,7 @@
 #include "../hakmem_tiny_config.h"
 #include "../box/tiny_page_box.h"  // For tiny_page_box_on_new_slab()
 #include <stdio.h>
 #include <stdatomic.h>
 // Forward declaration if not included
 // CRITICAL FIX: type must match core/hakmem_tiny_config.h (const size_t, not uint16_t)
@ -64,9 +65,7 @@ static inline int ss_tls_bind_one(int class_idx,
    // superslab_init_slab() only sets it if meta->class_idx==255.
    // We must explicitly set it to the requested class to avoid C0/C7 confusion.
    TinySlabMeta* meta = &ss->slabs[slab_idx];
 #if !HAKMEM_BUILD_RELEASE
    uint8_t old_cls = meta->class_idx;
 #endif
    meta->class_idx = (uint8_t)class_idx;
 #if !HAKMEM_BUILD_RELEASE
    if (class_idx == 7 && old_cls != class_idx) {
@ -75,6 +74,36 @@ static inline int ss_tls_bind_one(int class_idx,
    }
 #endif
 #if HAKMEM_BUILD_RELEASE
    static _Atomic int rel_c7_bind_logged = 0;
    if (class_idx == 7 &&
        atomic_load_explicit(&rel_c7_bind_logged, memory_order_relaxed) == 0) {
        fprintf(stderr,
                "[REL_C7_BIND] ss=%p slab=%d cls=%u cap=%u used=%u carved=%u\n",
                (void*)ss,
                slab_idx,
                (unsigned)meta->class_idx,
                (unsigned)meta->capacity,
                (unsigned)meta->used,
                (unsigned)meta->carved);
        atomic_store_explicit(&rel_c7_bind_logged, 1, memory_order_relaxed);
    }
 #else
    static __thread int dbg_c7_bind_logged = 0;
    if (class_idx == 7 && dbg_c7_bind_logged == 0) {
        fprintf(stderr,
                "[DBG_C7_BIND] ss=%p slab=%d old_cls=%u new_cls=%u cap=%u used=%u carved=%u\n",
                (void*)ss,
                slab_idx,
                (unsigned)old_cls,
                (unsigned)meta->class_idx,
                (unsigned)meta->capacity,
                (unsigned)meta->used,
                (unsigned)meta->carved);
        dbg_c7_bind_logged = 1;
    }
 #endif
    // Bind this slab to TLS for fast subsequent allocations.
    // Inline implementation of tiny_tls_bind_slab() to avoid header dependencies.
    // Original logic:
@ -109,4 +138,4 @@ static inline int ss_tls_bind_one(int class_idx,
    return 1;
 }
-#endif // HAK_SS_TLS_BIND_BOX_H
+#endif // HAK_SS_TLS_BIND_BOX_H
--- a/core/box/tiny_route_box.c
+++ b/core/box/tiny_route_box.c
@ -4,6 +4,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
 // Default: conservative profile (all classes TINY_FIRST).
 // This keeps Tiny in the fast path but always allows Pool fallback.
@ -40,5 +41,16 @@ void tiny_route_init(void)
        //   - 全クラス TINY_FIRST（Tiny を使うが必ず Pool fallbackあり）
        memset(g_tiny_route, ROUTE_TINY_FIRST, sizeof(g_tiny_route));
    }
 }
    #if HAKMEM_BUILD_RELEASE
    static int rel_logged = 0;
    if (!rel_logged) {
        const char* mode =
            (g_tiny_route[7] == ROUTE_TINY_ONLY)  ? "TINY_ONLY"  :
            (g_tiny_route[7] == ROUTE_TINY_FIRST) ? "TINY_FIRST" :
            (g_tiny_route[7] == ROUTE_POOL_ONLY)  ? "POOL_ONLY"  : "UNKNOWN";
        fprintf(stderr, "[REL_C7_ROUTE] profile=%s route=%s\n", profile, mode);
        rel_logged = 1;
    }
    #endif
 }
--- a/core/box/tiny_route_box.h
+++ b/core/box/tiny_route_box.h
@ -19,6 +19,7 @@
 #define TINY_ROUTE_BOX_H
 #include <stdint.h>
 #include <stdio.h>
 // Routing policy per Tiny class.
 typedef enum {
@ -43,8 +44,21 @@ void tiny_route_init(void);
 // Uses simple array lookup; class_idx is masked to [0,7] defensively.
 static inline TinyRoutePolicy tiny_route_get(int class_idx)
 {
-    return (TinyRoutePolicy)g_tiny_route[class_idx & 7];
+    TinyRoutePolicy p = (TinyRoutePolicy)g_tiny_route[class_idx & 7];
    #if HAKMEM_BUILD_RELEASE
    if ((class_idx & 7) == 7) {
        static int rel_route_logged = 0;
        if (!rel_route_logged) {
            const char* mode =
                (p == ROUTE_TINY_ONLY)  ? "TINY_ONLY"  :
                (p == ROUTE_TINY_FIRST) ? "TINY_FIRST" :
                (p == ROUTE_POOL_ONLY)  ? "POOL_ONLY"  : "UNKNOWN";
            fprintf(stderr, "[REL_C7_ROUTE] via tiny_route_get route=%s\n", mode);
            rel_route_logged = 1;
        }
    }
    #endif
    return p;
 }
 #endif // TINY_ROUTE_BOX_H
--- a/core/box/tiny_tls_carve_one_block_box.h
+++ b/core/box/tiny_tls_carve_one_block_box.h
@ -0,0 +1,102 @@
 // tiny_tls_carve_one_block_box.h
 // Box: Shared TLS carve helper (linear or freelist) for Tiny classes.
 #pragma once
 #include "../tiny_tls.h"
 #include "../tiny_box_geometry.h"
 #include "../tiny_debug_api.h"        // tiny_refill_failfast_level(), tiny_failfast_abort_ptr()
 #include "c7_meta_used_counter_box.h" // C7 meta->used telemetry (Release/Debug共通)
 #include "tiny_next_ptr_box.h"
 #include "../superslab/superslab_inline.h"
 #include <stdatomic.h>
 #include <signal.h>
 #if !HAKMEM_BUILD_RELEASE
 extern int g_tiny_safe_free;
 extern int g_tiny_safe_free_strict;
 #endif
 enum {
    TINY_TLS_CARVE_PATH_NONE = 0,
    TINY_TLS_CARVE_PATH_LINEAR = 1,
    TINY_TLS_CARVE_PATH_FREELIST = 2,
 };
 typedef struct TinyTLSCarveOneResult {
    void* block;
    int path;
 } TinyTLSCarveOneResult;
 // Carve one block from the current TLS slab.
 // Returns .block == NULL on failure. path describes which sub-path was taken.
 static inline TinyTLSCarveOneResult
 tiny_tls_carve_one_block(TinyTLSSlab* tls, int class_idx)
 {
    TinyTLSCarveOneResult res = {.block = NULL, .path = TINY_TLS_CARVE_PATH_NONE};
    if (!tls) return res;
    TinySlabMeta* meta = tls->meta;
    if (!meta || !tls->ss || tls->slab_base == NULL) return res;
    if (meta->class_idx != (uint8_t)class_idx) return res;
    if (tls->slab_idx < 0 || tls->slab_idx >= ss_slabs_capacity(tls->ss)) return res;
    // Freelist pop
    if (meta->freelist) {
 #if !HAKMEM_BUILD_RELEASE
        if (__builtin_expect(g_tiny_safe_free, 0)) {
            size_t blk = tiny_stride_for_class(meta->class_idx);
            uint8_t* base = tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
            uintptr_t delta = (uintptr_t)meta->freelist - (uintptr_t)base;
            int align_ok = ((delta % blk) == 0);
            int range_ok = (delta / blk) < meta->capacity;
            if (!align_ok || !range_ok) {
                if (g_tiny_safe_free_strict) { raise(SIGUSR2); return res; }
                return res;
            }
        }
 #endif
        void* block = meta->freelist;
        meta->freelist = tiny_next_read(class_idx, block);
        meta->used++;
        c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_TLS);
        ss_active_add(tls->ss, 1);
        res.block = block;
        res.path = TINY_TLS_CARVE_PATH_FREELIST;
        return res;
    }
    // Linear carve
    if (meta->used < meta->capacity) {
        size_t block_size = tiny_stride_for_class(meta->class_idx);
        void* block = tiny_block_at_index(tls->slab_base, meta->used, block_size);
 #if !HAKMEM_BUILD_RELEASE
        if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
            uintptr_t base_ss = (uintptr_t)tls->ss;
            size_t ss_size = (size_t)1ULL << tls->ss->lg_size;
            uintptr_t p = (uintptr_t)block;
            int in_range = (p >= base_ss) && (p < base_ss + ss_size);
            int aligned = ((p - (uintptr_t)tls->slab_base) % block_size) == 0;
            int idx_ok = (tls->slab_idx >= 0) &&
                         (tls->slab_idx < ss_slabs_capacity(tls->ss));
            if (!in_range || !aligned || !idx_ok || meta->used + 1 > meta->capacity) {
                tiny_failfast_abort_ptr("tls_carve_align",
                                        tls->ss,
                                        tls->slab_idx,
                                        block,
                                        "tiny_tls_carve_one_block");
            }
        }
 #endif
        meta->used++;
        c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_TLS);
        ss_active_add(tls->ss, 1);
        res.block = block;
        res.path = TINY_TLS_CARVE_PATH_LINEAR;
        return res;
    }
    return res;
 }
--- a/core/box/warm_pool_dbg_box.h
+++ b/core/box/warm_pool_dbg_box.h
@ -0,0 +1,121 @@
 // warm_pool_dbg_box.h
 // Box: Debug-only counters for C7 Warm Pool instrumentation.
 #pragma once
 #include <stdatomic.h>
 #include <stdint.h>
 #if !HAKMEM_BUILD_RELEASE
 #ifdef WARM_POOL_DBG_DEFINE
 _Atomic uint64_t g_dbg_c7_warm_pop_attempts = 0;
 _Atomic uint64_t g_dbg_c7_warm_pop_hits = 0;
 _Atomic uint64_t g_dbg_c7_warm_pop_carve = 0;
 _Atomic uint64_t g_dbg_c7_tls_carve_attempts = 0;
 _Atomic uint64_t g_dbg_c7_tls_carve_success = 0;
 _Atomic uint64_t g_dbg_c7_tls_carve_fail = 0;
 _Atomic uint64_t g_dbg_c7_uc_miss_warm_refill = 0;
 _Atomic uint64_t g_dbg_c7_uc_miss_tls_refill = 0;
 _Atomic uint64_t g_dbg_c7_uc_miss_shared_refill = 0;
 #else
 extern _Atomic uint64_t g_dbg_c7_warm_pop_attempts;
 extern _Atomic uint64_t g_dbg_c7_warm_pop_hits;
 extern _Atomic uint64_t g_dbg_c7_warm_pop_carve;
 extern _Atomic uint64_t g_dbg_c7_tls_carve_attempts;
 extern _Atomic uint64_t g_dbg_c7_tls_carve_success;
 extern _Atomic uint64_t g_dbg_c7_tls_carve_fail;
 extern _Atomic uint64_t g_dbg_c7_uc_miss_warm_refill;
 extern _Atomic uint64_t g_dbg_c7_uc_miss_tls_refill;
 extern _Atomic uint64_t g_dbg_c7_uc_miss_shared_refill;
 #endif
 static inline void warm_pool_dbg_c7_attempt(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_warm_pop_attempts, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_hit(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_warm_pop_hits, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_carve(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_warm_pop_carve, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_tls_attempt(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_tls_carve_attempts, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_tls_success(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_tls_carve_success, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_tls_fail(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_tls_carve_fail, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_uc_miss_warm(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_uc_miss_warm_refill, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_uc_miss_tls(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_uc_miss_tls_refill, 1, memory_order_relaxed);
 }
 static inline void warm_pool_dbg_c7_uc_miss_shared(void) {
    atomic_fetch_add_explicit(&g_dbg_c7_uc_miss_shared_refill, 1, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_attempts(void) {
    return atomic_load_explicit(&g_dbg_c7_warm_pop_attempts, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_hits(void) {
    return atomic_load_explicit(&g_dbg_c7_warm_pop_hits, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_carves(void) {
    return atomic_load_explicit(&g_dbg_c7_warm_pop_carve, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_tls_attempts(void) {
    return atomic_load_explicit(&g_dbg_c7_tls_carve_attempts, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_tls_successes(void) {
    return atomic_load_explicit(&g_dbg_c7_tls_carve_success, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_tls_failures(void) {
    return atomic_load_explicit(&g_dbg_c7_tls_carve_fail, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_uc_miss_warm_refills(void) {
    return atomic_load_explicit(&g_dbg_c7_uc_miss_warm_refill, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_uc_miss_tls_refills(void) {
    return atomic_load_explicit(&g_dbg_c7_uc_miss_tls_refill, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_dbg_c7_uc_miss_shared_refills(void) {
    return atomic_load_explicit(&g_dbg_c7_uc_miss_shared_refill, memory_order_relaxed);
 }
 #else
 static inline void warm_pool_dbg_c7_attempt(void) { }
 static inline void warm_pool_dbg_c7_hit(void) { }
 static inline void warm_pool_dbg_c7_carve(void) { }
 static inline void warm_pool_dbg_c7_tls_attempt(void) { }
 static inline void warm_pool_dbg_c7_tls_success(void) { }
 static inline void warm_pool_dbg_c7_tls_fail(void) { }
 static inline void warm_pool_dbg_c7_uc_miss_warm(void) { }
 static inline void warm_pool_dbg_c7_uc_miss_tls(void) { }
 static inline void warm_pool_dbg_c7_uc_miss_shared(void) { }
 static inline uint64_t warm_pool_dbg_c7_attempts(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_hits(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_carves(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_tls_attempts(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_tls_successes(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_tls_failures(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_uc_miss_warm_refills(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_uc_miss_tls_refills(void) { return 0; }
 static inline uint64_t warm_pool_dbg_c7_uc_miss_shared_refills(void) { return 0; }
 #endif
--- a/core/box/warm_pool_prefill_box.h
+++ b/core/box/warm_pool_prefill_box.h
@ -7,11 +7,51 @@
 #define HAK_WARM_POOL_PREFILL_BOX_H
 #include <stdint.h>
 #include <stdatomic.h>
 #include <stdio.h>
 #include "../hakmem_tiny_config.h"
 #include "../hakmem_tiny_superslab.h"
 #include "../tiny_tls.h"
 #include "../front/tiny_warm_pool.h"
 #include "../box/warm_pool_stats_box.h"
 #include "../box/warm_pool_rel_counters_box.h"
 static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) {
    if (!tls || !tls->ss) return;
 #if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
    if (n < 4) {
        TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx];
        fprintf(stderr,
                "[REL_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (void*)tls->ss,
                (unsigned)tls->slab_idx,
                (unsigned)meta->class_idx,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved,
                meta->freelist);
    }
 #else
    static _Atomic uint32_t dbg_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
    if (n < 4) {
        TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx];
        fprintf(stderr,
                "[DBG_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (void*)tls->ss,
                (unsigned)tls->slab_idx,
                (unsigned)meta->class_idx,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved,
                meta->freelist);
    }
 #endif
 }
 // Forward declarations
 extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES];
@ -45,9 +85,17 @@ extern SuperSlab* superslab_refill(int class_idx);
 // Performance: Only triggered when pool is empty, cold path cost
 //
 static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) {
    #if HAKMEM_BUILD_RELEASE
    if (class_idx == 7) {
        warm_pool_rel_c7_prefill_call();
    }
    #endif
    int budget = (tiny_warm_pool_count(class_idx) == 0) ? WARM_POOL_PREFILL_BUDGET : 1;
    while (budget > 0) {
        if (class_idx == 7) {
            warm_prefill_log_c7_meta("PREFILL_META", tls);
        }
        if (!tls->ss) {
            // Need to load a new SuperSlab
            if (!superslab_refill(class_idx)) {
@ -61,16 +109,75 @@ static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) {
            break;
        }
        // C7 safety: prefer only pristine slabs (used=0 carved=0 freelist=NULL)
        if (class_idx == 7) {
            TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx];
            if (meta->class_idx == 7 &&
                (meta->used > 0 || meta->carved > 0 || meta->freelist != NULL)) {
                #if HAKMEM_BUILD_RELEASE
                static _Atomic int rel_c7_skip_logged = 0;
                if (atomic_load_explicit(&rel_c7_skip_logged, memory_order_relaxed) == 0) {
                    fprintf(stderr,
                            "[REL_C7_PREFILL_SKIP_NONEMPTY] ss=%p slab=%u used=%u cap=%u carved=%u freelist=%p\n",
                            (void*)tls->ss,
                            (unsigned)tls->slab_idx,
                            (unsigned)meta->used,
                            (unsigned)meta->capacity,
                            (unsigned)meta->carved,
                            meta->freelist);
                    atomic_store_explicit(&rel_c7_skip_logged, 1, memory_order_relaxed);
                }
                #else
                static __thread int dbg_c7_skip_logged = 0;
                if (dbg_c7_skip_logged < 4) {
                    fprintf(stderr,
                            "[DBG_C7_PREFILL_SKIP_NONEMPTY] ss=%p slab=%u used=%u cap=%u carved=%u freelist=%p\n",
                            (void*)tls->ss,
                            (unsigned)tls->slab_idx,
                            (unsigned)meta->used,
                            (unsigned)meta->capacity,
                            (unsigned)meta->carved,
                            meta->freelist);
                    dbg_c7_skip_logged++;
                }
                #endif
                tls->ss = NULL;  // Drop exhausted slab and try another
                budget--;
                continue;
            }
        }
        if (budget > 1) {
            // Prefill mode: push to pool and load another
            tiny_warm_pool_push(class_idx, tls->ss);
            warm_pool_record_prefilled(class_idx);
-            tls->ss = NULL;  // Force next iteration to refill
+            #if HAKMEM_BUILD_RELEASE
-            budget--;
+        if (class_idx == 7) {
-        } else {
+            warm_pool_rel_c7_prefill_slab();
            // Final slab: keep in TLS for immediate carving
            budget = 0;
        }
        #else
        if (class_idx == 7) {
            static __thread int dbg_c7_prefill_logs = 0;
            if (dbg_c7_prefill_logs < 8) {
                TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx];
                fprintf(stderr,
                        "[DBG_C7_PREFILL] ss=%p slab=%u used=%u cap=%u carved=%u freelist=%p\n",
                        (void*)tls->ss,
                        (unsigned)tls->slab_idx,
                        (unsigned)meta->used,
                        (unsigned)meta->capacity,
                        (unsigned)meta->carved,
                        meta->freelist);
                dbg_c7_prefill_logs++;
            }
        }
        #endif
        tls->ss = NULL;  // Force next iteration to refill
        budget--;
    } else {
        // Final slab: keep in TLS for immediate carving
        budget = 0;
    }
    }
    return 0;  // Success
--- a/core/box/warm_pool_rel_counters_box.h
+++ b/core/box/warm_pool_rel_counters_box.h
@ -0,0 +1,64 @@
 // warm_pool_rel_counters_box.h
 // Box: Lightweight Release-side counters for C7 Warm/TLS instrumentation.
 #pragma once
 #include <stdatomic.h>
 #include <stdint.h>
 #if HAKMEM_BUILD_RELEASE
 #ifdef WARM_POOL_REL_DEFINE
 _Atomic uint64_t g_rel_c7_carve_attempts = 0;
 _Atomic uint64_t g_rel_c7_carve_success = 0;
 _Atomic uint64_t g_rel_c7_carve_zero = 0;
 _Atomic uint64_t g_rel_c7_warm_prefill_calls = 0;
 _Atomic uint64_t g_rel_c7_warm_prefill_slabs = 0;
 #else
 extern _Atomic uint64_t g_rel_c7_carve_attempts;
 extern _Atomic uint64_t g_rel_c7_carve_success;
 extern _Atomic uint64_t g_rel_c7_carve_zero;
 extern _Atomic uint64_t g_rel_c7_warm_prefill_calls;
 extern _Atomic uint64_t g_rel_c7_warm_prefill_slabs;
 #endif
 static inline void warm_pool_rel_c7_carve_attempt(void) {
    atomic_fetch_add_explicit(&g_rel_c7_carve_attempts, 1, memory_order_relaxed);
 }
 static inline void warm_pool_rel_c7_carve_success(void) {
    atomic_fetch_add_explicit(&g_rel_c7_carve_success, 1, memory_order_relaxed);
 }
 static inline void warm_pool_rel_c7_carve_zero(void) {
    atomic_fetch_add_explicit(&g_rel_c7_carve_zero, 1, memory_order_relaxed);
 }
 static inline void warm_pool_rel_c7_prefill_call(void) {
    atomic_fetch_add_explicit(&g_rel_c7_warm_prefill_calls, 1, memory_order_relaxed);
 }
 static inline void warm_pool_rel_c7_prefill_slab(void) {
    atomic_fetch_add_explicit(&g_rel_c7_warm_prefill_slabs, 1, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_rel_c7_carve_attempts(void) {
    return atomic_load_explicit(&g_rel_c7_carve_attempts, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_rel_c7_carve_successes(void) {
    return atomic_load_explicit(&g_rel_c7_carve_success, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_rel_c7_carve_zeroes(void) {
    return atomic_load_explicit(&g_rel_c7_carve_zero, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_rel_c7_prefill_calls(void) {
    return atomic_load_explicit(&g_rel_c7_warm_prefill_calls, memory_order_relaxed);
 }
 static inline uint64_t warm_pool_rel_c7_prefill_slabs(void) {
    return atomic_load_explicit(&g_rel_c7_warm_prefill_slabs, memory_order_relaxed);
 }
 #else
 static inline void warm_pool_rel_c7_carve_attempt(void) { }
 static inline void warm_pool_rel_c7_carve_success(void) { }
 static inline void warm_pool_rel_c7_carve_zero(void) { }
 static inline void warm_pool_rel_c7_prefill_call(void) { }
 static inline void warm_pool_rel_c7_prefill_slab(void) { }
 static inline uint64_t warm_pool_rel_c7_carve_attempts(void) { return 0; }
 static inline uint64_t warm_pool_rel_c7_carve_successes(void) { return 0; }
 static inline uint64_t warm_pool_rel_c7_carve_zeroes(void) { return 0; }
 static inline uint64_t warm_pool_rel_c7_prefill_calls(void) { return 0; }
 static inline uint64_t warm_pool_rel_c7_prefill_slabs(void) { return 0; }
 #endif
--- a/core/box/warm_tls_bind_logger_box.h
+++ b/core/box/warm_tls_bind_logger_box.h
@ -0,0 +1,57 @@
 // warm_tls_bind_logger_box.h
 // Box: Warm TLS Bind experiment logging with simple throttling.
 #pragma once
 #include "../hakmem_tiny_superslab.h"
 #include <stdatomic.h>
 #include <stdio.h>
 #include <stdlib.h>
 #if !HAKMEM_BUILD_RELEASE
 static _Atomic int g_warm_tls_bind_log_limit = -1;
 static _Atomic int g_warm_tls_bind_log_count = 0;
 static inline int warm_tls_bind_log_limit(void) {
    int limit = atomic_load_explicit(&g_warm_tls_bind_log_limit, memory_order_relaxed);
    if (__builtin_expect(limit == -1, 0)) {
        const char* e = getenv("HAKMEM_WARM_TLS_BIND_LOG_MAX");
        int parsed = (e && *e) ? atoi(e) : 1;
        atomic_store_explicit(&g_warm_tls_bind_log_limit, parsed, memory_order_relaxed);
        limit = parsed;
    }
    return limit;
 }
 static inline int warm_tls_bind_log_acquire(void) {
    int limit = warm_tls_bind_log_limit();
    int prev = atomic_fetch_add_explicit(&g_warm_tls_bind_log_count, 1, memory_order_relaxed);
    return prev < limit;
 }
 static inline void warm_tls_bind_log_success(SuperSlab* ss, int slab_idx) {
    if (warm_tls_bind_log_acquire()) {
        fprintf(stderr, "[WARM_TLS_BIND] C7 bind success: ss=%p slab=%d\n",
                (void*)ss, slab_idx);
    }
 }
 static inline void warm_tls_bind_log_tls_carve(SuperSlab* ss, int slab_idx, void* block) {
    if (warm_tls_bind_log_acquire()) {
        fprintf(stderr,
                "[WARM_TLS_BIND] C7 TLS carve success: ss=%p slab=%d block=%p\n",
                (void*)ss, slab_idx, block);
    }
 }
 static inline void warm_tls_bind_log_tls_fail(SuperSlab* ss, int slab_idx) {
    if (warm_tls_bind_log_acquire()) {
        fprintf(stderr,
                "[WARM_TLS_BIND] C7 TLS carve failed, fallback (ss=%p slab=%d)\n",
                (void*)ss, slab_idx);
    }
 }
 #else
 static inline void warm_tls_bind_log_success(SuperSlab* ss, int slab_idx) { (void)ss; (void)slab_idx; }
 static inline void warm_tls_bind_log_tls_carve(SuperSlab* ss, int slab_idx, void* block) { (void)ss; (void)slab_idx; (void)block; }
 static inline void warm_tls_bind_log_tls_fail(SuperSlab* ss, int slab_idx) { (void)ss; (void)slab_idx; }
 #endif
--- a/core/front/tiny_unified_cache.c
+++ b/core/front/tiny_unified_cache.c
@ -12,10 +12,19 @@
 #include "../box/ss_slab_meta_box.h"         // For ss_active_add() and slab metadata operations
 #include "../box/warm_pool_stats_box.h"      // Box: Warm Pool Statistics Recording (inline)
 #include "../box/slab_carve_box.h"           // Box: Slab Carving (inline O(slabs) scan)
 #define WARM_POOL_REL_DEFINE
 #include "../box/warm_pool_rel_counters_box.h"  // Box: Release-side C7 counters
 #undef WARM_POOL_REL_DEFINE
 #include "../box/c7_meta_used_counter_box.h"    // Box: C7 meta->used increment counters
 #include "../box/warm_pool_prefill_box.h"    // Box: Warm Pool Prefill (secondary optimization)
 #include "../hakmem_env_cache.h"             // Priority-2: ENV cache (eliminate syscalls)
 #include "../box/tiny_page_box.h"           // Tiny-Plus Page Box (C5–C7 initial hook)
 #include "../box/ss_tls_bind_box.h"         // Box: TLS Bind (SuperSlab -> TLS binding)
 #include "../box/tiny_tls_carve_one_block_box.h"  // Box: TLS carve helper (shared)
 #include "../box/warm_tls_bind_logger_box.h"      // Box: Warm TLS Bind logging (throttled)
 #define WARM_POOL_DBG_DEFINE
 #include "../box/warm_pool_dbg_box.h"             // Box: Warm Pool C7 debug counters
 #undef WARM_POOL_DBG_DEFINE
 #include <stdlib.h>
 #include <string.h>
 #include <stdatomic.h>
@ -84,6 +93,12 @@ __thread uint64_t g_unified_cache_push[TINY_NUM_CLASSES] = {0};
 __thread uint64_t g_unified_cache_full[TINY_NUM_CLASSES] = {0};
 #endif
 // Release-side lightweight telemetry (C7 Warm path only)
 #if HAKMEM_BUILD_RELEASE
 _Atomic uint64_t g_rel_c7_warm_pop = 0;
 _Atomic uint64_t g_rel_c7_warm_push = 0;
 #endif
 // Warm Pool metrics (definition - declared in tiny_warm_pool.h as extern)
 // Note: These are kept outside !HAKMEM_BUILD_RELEASE for profiling in release builds
 __thread TinyWarmPoolStats g_warm_pool_stats[TINY_NUM_CLASSES] = {0};
@ -98,46 +113,36 @@ _Atomic uint64_t g_dbg_warm_pop_attempts = 0;
 _Atomic uint64_t g_dbg_warm_pop_hits = 0;
 _Atomic uint64_t g_dbg_warm_pop_empty = 0;
 _Atomic uint64_t g_dbg_warm_pop_carve_zero = 0;
 #endif
-// Debug-only: cached ENV for Warm TLS Bind (C7)
+// Warm TLS Bind (C7) mode selector
-static int g_warm_tls_bind_mode_c7 = -1;
+//   mode 0: Legacy warm path（デバッグ専用・C7では非推奨）
-
+//   mode 1: Bind-only 本番経路（C7 標準）
 //   mode 2: Bind + TLS carve 実験経路（Debug 専用）
 // Release ビルドでは常に mode=1 に固定し、ENV は無視する。
 static inline int warm_tls_bind_mode_c7(void) {
 #if HAKMEM_BUILD_RELEASE
    static int g_warm_tls_bind_mode_c7 = -1;
    if (__builtin_expect(g_warm_tls_bind_mode_c7 == -1, 0)) {
        const char* e = getenv("HAKMEM_WARM_TLS_BIND_C7");
-        // 0/empty: disabled, 1: bind only, 2: bind + TLS carve one block
+        int mode = (e && *e) ? atoi(e) : 1;  // default = Bind-only
-        g_warm_tls_bind_mode_c7 = (e && *e) ? atoi(e) : 0;
+        if (mode < 0) mode = 0;
        if (mode > 2) mode = 2;
        g_warm_tls_bind_mode_c7 = mode;
    }
    return g_warm_tls_bind_mode_c7;
-}
+#else
-
+    static int g_warm_tls_bind_mode_c7 = -1;
-static inline void* warm_tls_carve_one_block(int class_idx) {
+    if (__builtin_expect(g_warm_tls_bind_mode_c7 == -1, 0)) {
-    TinyTLSSlab* tls = &g_tls_slabs[class_idx];
+        const char* e = getenv("HAKMEM_WARM_TLS_BIND_C7");
-    TinySlabMeta* meta = tls->meta;
+        int mode = (e && *e) ? atoi(e) : 1;  // default = Bind-only
-
+        if (mode < 0) mode = 0;
-    if (!meta || !tls->ss || tls->slab_base == NULL) return NULL;
+        if (mode > 2) mode = 2;
-    if (meta->class_idx != (uint8_t)class_idx) return NULL;
+        g_warm_tls_bind_mode_c7 = mode;
    if (tls->slab_idx < 0 || tls->slab_idx >= ss_slabs_capacity(tls->ss)) return NULL;
    if (meta->freelist) {
        void* block = meta->freelist;
        meta->freelist = tiny_next_read(class_idx, block);
        meta->used++;
        ss_active_add(tls->ss, 1);
        return block;
    }
-
+    return g_warm_tls_bind_mode_c7;
    if (meta->used < meta->capacity) {
        size_t block_size = tiny_stride_for_class(meta->class_idx);
        void* block = tiny_block_at_index(tls->slab_base, meta->used, block_size);
        meta->used++;
        ss_active_add(tls->ss, 1);
        return block;
    }
    return NULL;
 }
 #endif
 }
 // Forward declaration for Warm Pool stats printer (defined later in this file)
 static inline void tiny_warm_pool_print_stats(void);
@ -157,6 +162,15 @@ int unified_cache_enabled(void) {
            fprintf(stderr, "[Unified-INIT] unified_cache_enabled() = %d\n", g_enable);
            fflush(stderr);
        }
 #else
        if (g_enable) {
            static int printed = 0;
            if (!printed) {
                fprintf(stderr, "[Rel-Unified] unified_cache_enabled() = %d\n", g_enable);
                fflush(stderr);
                printed = 1;
            }
        }
 #endif
    }
    return g_enable;
@ -311,6 +325,32 @@ static inline void tiny_warm_pool_print_stats(void) {
            (unsigned long long)atomic_load_explicit(&g_dbg_warm_pop_hits, memory_order_relaxed),
            (unsigned long long)atomic_load_explicit(&g_dbg_warm_pop_empty, memory_order_relaxed),
            (unsigned long long)atomic_load_explicit(&g_dbg_warm_pop_carve_zero, memory_order_relaxed));
    uint64_t c7_attempts = warm_pool_dbg_c7_attempts();
    uint64_t c7_hits = warm_pool_dbg_c7_hits();
    uint64_t c7_carve = warm_pool_dbg_c7_carves();
    uint64_t c7_tls_attempts = warm_pool_dbg_c7_tls_attempts();
    uint64_t c7_tls_success = warm_pool_dbg_c7_tls_successes();
    uint64_t c7_tls_fail = warm_pool_dbg_c7_tls_failures();
    uint64_t c7_uc_warm = warm_pool_dbg_c7_uc_miss_warm_refills();
    uint64_t c7_uc_tls = warm_pool_dbg_c7_uc_miss_tls_refills();
    uint64_t c7_uc_shared = warm_pool_dbg_c7_uc_miss_shared_refills();
    if (c7_attempts || c7_hits || c7_carve ||
        c7_tls_attempts || c7_tls_success || c7_tls_fail ||
        c7_uc_warm || c7_uc_tls || c7_uc_shared) {
        fprintf(stderr,
                "  [DBG_C7] warm_pop_attempts=%llu warm_pop_hits=%llu warm_pop_carve=%llu "
                "tls_carve_attempts=%llu tls_carve_success=%llu tls_carve_fail=%llu "
                "uc_miss_warm=%llu uc_miss_tls=%llu uc_miss_shared=%llu\n",
                (unsigned long long)c7_attempts,
                (unsigned long long)c7_hits,
                (unsigned long long)c7_carve,
                (unsigned long long)c7_tls_attempts,
                (unsigned long long)c7_tls_success,
                (unsigned long long)c7_tls_fail,
                (unsigned long long)c7_uc_warm,
                (unsigned long long)c7_uc_tls,
                (unsigned long long)c7_uc_shared);
    }
 #endif
    fflush(stderr);
 }
@ -515,6 +555,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
    //  - これにより、room <= max_batch <= 512 が常に成り立ち、out[] オーバーランを防止する。
    void* out[512];
    int produced = 0;
    int tls_carved = 0;  // Debug bookkeeping: track TLS carve experiment hits
    // ========== PAGE BOX HOT PATH（Tiny-Plus 層）: Try page box FIRST ==========
    // 将来的に C7 専用の page-level freelist 管理をここに統合する。
@ -554,10 +595,21 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
    // This is the critical optimization - avoid superslab_refill() registry scan
    #if !HAKMEM_BUILD_RELEASE
    atomic_fetch_add_explicit(&g_dbg_warm_pop_attempts, 1, memory_order_relaxed);
    if (class_idx == 7) {
        warm_pool_dbg_c7_attempt();
    }
    #endif
    #if HAKMEM_BUILD_RELEASE
    if (class_idx == 7) {
        atomic_fetch_add_explicit(&g_rel_c7_warm_pop, 1, memory_order_relaxed);
    }
    #endif
    SuperSlab* warm_ss = tiny_warm_pool_pop(class_idx);
    if (warm_ss) {
        #if !HAKMEM_BUILD_RELEASE
        if (class_idx == 7) {
            warm_pool_dbg_c7_hit();
        }
        // Debug-only: Warm TLS Bind experiment (C7 only)
        if (class_idx == 7) {
            int warm_mode = warm_tls_bind_mode_c7();
@ -577,25 +629,22 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
                    TinyTLSSlab* tls = &g_tls_slabs[class_idx];
                    uint32_t tid = (uint32_t)(uintptr_t)pthread_self();
                    if (ss_tls_bind_one(class_idx, tls, warm_ss, slab_idx, tid)) {
-                        static int logged = 0;
+                        warm_tls_bind_log_success(warm_ss, slab_idx);
                        if (!logged) {
                            fprintf(stderr, "[WARM_TLS_BIND] C7 bind success: ss=%p slab=%d\n",
                                    (void*)warm_ss, slab_idx);
                            logged = 1;
                        }
                        // Mode 2: carve a single block via TLS fast path
                        if (warm_mode == 2) {
-                            void* tls_block = warm_tls_carve_one_block(class_idx);
+                            warm_pool_dbg_c7_tls_attempt();
-                            if (tls_block) {
+                            TinyTLSCarveOneResult tls_carve =
-                                fprintf(stderr,
+                                tiny_tls_carve_one_block(tls, class_idx);
-                                        "[WARM_TLS_BIND] C7 TLS carve success: ss=%p slab=%d block=%p\n",
+                            if (tls_carve.block) {
-                                        (void*)warm_ss, slab_idx, tls_block);
+                                warm_tls_bind_log_tls_carve(warm_ss, slab_idx, tls_carve.block);
-                                out[0] = tls_block;
+                                warm_pool_dbg_c7_tls_success();
                                out[0] = tls_carve.block;
                                produced = 1;
                                tls_carved = 1;
                            } else {
-                                fprintf(stderr,
+                                warm_tls_bind_log_tls_fail(warm_ss, slab_idx);
-                                        "[WARM_TLS_BIND] C7 TLS carve failed, fallback\n");
+                                warm_pool_dbg_c7_tls_fail();
                            }
                        }
                    }
@ -607,7 +656,21 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
        #endif
        // HOT PATH: Warm pool hit, try to carve directly
        if (produced == 0) {
            #if HAKMEM_BUILD_RELEASE
            if (class_idx == 7) {
                warm_pool_rel_c7_carve_attempt();
            }
            #endif
            produced = slab_carve_from_ss(class_idx, warm_ss, out, room);
            #if HAKMEM_BUILD_RELEASE
            if (class_idx == 7) {
                if (produced > 0) {
                    warm_pool_rel_c7_carve_success();
                } else {
                    warm_pool_rel_c7_carve_zero();
                }
            }
            #endif
            if (produced > 0) {
                // Update active counter for carved blocks
                ss_active_add(warm_ss, (uint32_t)produced);
@ -615,7 +678,22 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
        }
        if (produced > 0) {
            #if !HAKMEM_BUILD_RELEASE
            if (class_idx == 7) {
                warm_pool_dbg_c7_carve();
                if (tls_carved) {
                    warm_pool_dbg_c7_uc_miss_tls();
                } else {
                    warm_pool_dbg_c7_uc_miss_warm();
                }
            }
            #endif
            // Success! Return SuperSlab to warm pool for next use
            #if HAKMEM_BUILD_RELEASE
            if (class_idx == 7) {
                atomic_fetch_add_explicit(&g_rel_c7_warm_push, 1, memory_order_relaxed);
            }
            #endif
            tiny_warm_pool_push(class_idx, warm_ss);
            // Track warm pool hit (always compiled, ENV-gated printing)
@ -761,6 +839,9 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
    }
    #if !HAKMEM_BUILD_RELEASE
    if (class_idx == 7) {
        warm_pool_dbg_c7_uc_miss_shared();
    }
    g_unified_cache_miss[class_idx]++;
    #endif
--- a/core/front/tiny_unified_cache.d
+++ b/core/front/tiny_unified_cache.d
@ -40,10 +40,18 @@ core/front/tiny_unified_cache.o: core/front/tiny_unified_cache.c \
 core/front/../box/../superslab/superslab_inline.h \
 core/front/../box/../tiny_box_geometry.h \
 core/front/../box/../box/pagefault_telemetry_box.h \
 core/front/../box/c7_meta_used_counter_box.h \
 core/front/../box/warm_pool_rel_counters_box.h \
 core/front/../box/warm_pool_prefill_box.h \
 core/front/../box/../tiny_tls.h \
 core/front/../box/../box/warm_pool_stats_box.h \
- core/front/../hakmem_env_cache.h core/front/../box/tiny_page_box.h
+ core/front/../hakmem_env_cache.h core/front/../box/tiny_page_box.h \
 core/front/../box/ss_tls_bind_box.h \
 core/front/../box/../box/tiny_page_box.h \
 core/front/../box/tiny_tls_carve_one_block_box.h \
 core/front/../box/../tiny_debug_api.h \
 core/front/../box/warm_tls_bind_logger_box.h \
 core/front/../box/warm_pool_dbg_box.h
 core/front/tiny_unified_cache.h:
 core/front/../hakmem_build_flags.h:
 core/front/../hakmem_tiny_config.h:
@ -104,8 +112,16 @@ core/front/../box/../hakmem_tiny_superslab.h:
 core/front/../box/../superslab/superslab_inline.h:
 core/front/../box/../tiny_box_geometry.h:
 core/front/../box/../box/pagefault_telemetry_box.h:
 core/front/../box/c7_meta_used_counter_box.h:
 core/front/../box/warm_pool_rel_counters_box.h:
 core/front/../box/warm_pool_prefill_box.h:
 core/front/../box/../tiny_tls.h:
 core/front/../box/../box/warm_pool_stats_box.h:
 core/front/../hakmem_env_cache.h:
 core/front/../box/tiny_page_box.h:
 core/front/../box/ss_tls_bind_box.h:
 core/front/../box/../box/tiny_page_box.h:
 core/front/../box/tiny_tls_carve_one_block_box.h:
 core/front/../box/../tiny_debug_api.h:
 core/front/../box/warm_tls_bind_logger_box.h:
 core/front/../box/warm_pool_dbg_box.h:
--- a/core/front/tiny_unified_cache.h
+++ b/core/front/tiny_unified_cache.h
@ -87,6 +87,10 @@ extern __thread uint64_t g_unified_cache_hit[TINY_NUM_CLASSES];    // Alloc hits
 extern __thread uint64_t g_unified_cache_miss[TINY_NUM_CLASSES];   // Alloc misses
 extern __thread uint64_t g_unified_cache_push[TINY_NUM_CLASSES];   // Free pushes
 extern __thread uint64_t g_unified_cache_full[TINY_NUM_CLASSES];   // Free full (fallback to SuperSlab)
 #else
 // Release-side lightweight C7 warm path counters (for smoke validation)
 extern _Atomic uint64_t g_rel_c7_warm_pop;
 extern _Atomic uint64_t g_rel_c7_warm_push;
 #endif
 // ============================================================================
--- a/core/hakmem_shared_pool_acquire.c
+++ b/core/hakmem_shared_pool_acquire.c
@ -10,11 +10,145 @@
 #include "hakmem_policy.h"
 #include "hakmem_env_cache.h"  // Priority-2: ENV cache
 #include "front/tiny_warm_pool.h"  // Warm Pool: Prefill during registry scans
 #include "box/ss_slab_reset_box.h" // Box: Reset slab metadata on reuse (C7 guard)
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdatomic.h>
 static inline void c7_log_meta_state(const char* tag, SuperSlab* ss, int slab_idx) {
    if (!ss) return;
 #if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_c7_meta_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_c7_meta_logs, 1, memory_order_relaxed);
    if (n < 8) {
        TinySlabMeta* m = &ss->slabs[slab_idx];
        fprintf(stderr,
                "[REL_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (void*)ss,
                slab_idx,
                (unsigned)m->class_idx,
                (unsigned)m->used,
                (unsigned)m->capacity,
                (unsigned)m->carved,
                m->freelist);
    }
 #else
    static _Atomic uint32_t dbg_c7_meta_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_c7_meta_logs, 1, memory_order_relaxed);
    if (n < 8) {
        TinySlabMeta* m = &ss->slabs[slab_idx];
        fprintf(stderr,
                "[DBG_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (void*)ss,
                slab_idx,
                (unsigned)m->class_idx,
                (unsigned)m->used,
                (unsigned)m->capacity,
                (unsigned)m->carved,
                m->freelist);
    }
 #endif
 }
 static inline int c7_meta_is_pristine(TinySlabMeta* m) {
    return m && m->used == 0 && m->carved == 0 && m->freelist == NULL;
 }
 static inline void c7_log_skip_nonempty_acquire(SuperSlab* ss,
                                                int slab_idx,
                                                TinySlabMeta* m,
                                                const char* tag) {
    if (!(ss && m)) return;
 #if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_c7_skip_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_c7_skip_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr,
                "[REL_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (void*)ss,
                slab_idx,
                (unsigned)m->class_idx,
                (unsigned)m->used,
                (unsigned)m->capacity,
                (unsigned)m->carved,
                m->freelist);
    }
 #else
    static _Atomic uint32_t dbg_c7_skip_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_c7_skip_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr,
                "[DBG_C7_%s] ss=%p slab=%d cls=%u used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (void*)ss,
                slab_idx,
                (unsigned)m->class_idx,
                (unsigned)m->used,
                (unsigned)m->capacity,
                (unsigned)m->carved,
                m->freelist);
    }
 #endif
 }
 static inline int c7_reset_and_log_if_needed(SuperSlab* ss,
                                             int slab_idx,
                                             int class_idx) {
    if (class_idx != 7) {
        return 0;
    }
    TinySlabMeta* m = &ss->slabs[slab_idx];
    c7_log_meta_state("ACQUIRE_META", ss, slab_idx);
    if (m->class_idx != 255 && m->class_idx != (uint8_t)class_idx) {
 #if HAKMEM_BUILD_RELEASE
        static _Atomic uint32_t rel_c7_class_mismatch_logs = 0;
        uint32_t n = atomic_fetch_add_explicit(&rel_c7_class_mismatch_logs, 1, memory_order_relaxed);
        if (n < 4) {
            fprintf(stderr,
                    "[REL_C7_CLASS_MISMATCH] ss=%p slab=%d want=%d have=%u used=%u cap=%u carved=%u\n",
                    (void*)ss,
                    slab_idx,
                    class_idx,
                    (unsigned)m->class_idx,
                    (unsigned)m->used,
                    (unsigned)m->capacity,
                    (unsigned)m->carved);
        }
 #else
        static _Atomic uint32_t dbg_c7_class_mismatch_logs = 0;
        uint32_t n = atomic_fetch_add_explicit(&dbg_c7_class_mismatch_logs, 1, memory_order_relaxed);
        if (n < 4) {
            fprintf(stderr,
                    "[DBG_C7_CLASS_MISMATCH] ss=%p slab=%d want=%d have=%u used=%u cap=%u carved=%u freelist=%p\n",
                    (void*)ss,
                    slab_idx,
                    class_idx,
                    (unsigned)m->class_idx,
                    (unsigned)m->used,
                    (unsigned)m->capacity,
                    (unsigned)m->carved,
                    m->freelist);
        }
 #endif
        return -1;
    }
    if (!c7_meta_is_pristine(m)) {
        c7_log_skip_nonempty_acquire(ss, slab_idx, m, "SKIP_NONEMPTY_ACQUIRE");
        return -1;
    }
    ss_slab_reset_meta_for_tiny(ss, slab_idx, class_idx);
    c7_log_meta_state("ACQUIRE", ss, slab_idx);
    return 0;
 }
 // ============================================================================
 // Performance Measurement: Shared Pool Lock Contention (ENV-gated)
 // ============================================================================
@ -147,7 +281,12 @@ sp_acquire_from_empty_scan(int class_idx, SuperSlab** ss_out, int* slab_idx_out,
            fprintf(stderr, "[STAGE0.5_STATS] hits=%lu attempts=%lu rate=%.1f%% (scan_limit=%d warm_pool=%d)\n",
                    hits, attempts, (double)hits * 100.0 / attempts, scan_limit, tiny_warm_pool_count(class_idx));
        }
-        return 0;
+        if (c7_reset_and_log_if_needed(primary_result, primary_slab_idx, class_idx) == 0) {
            return 0;
        }
        primary_result = NULL;
        *ss_out = NULL;
        *slab_idx_out = -1;
    }
    return -1;
 }
@ -216,6 +355,15 @@ stage1_retry_after_tension_drain:
        if (ss_guard) {
            tiny_tls_slab_reuse_guard(ss_guard);
            if (class_idx == 7) {
                TinySlabMeta* meta = &ss_guard->slabs[reuse_slot_idx];
                if (!c7_meta_is_pristine(meta)) {
                    c7_log_skip_nonempty_acquire(ss_guard, reuse_slot_idx, meta, "SKIP_NONEMPTY_ACQUIRE");
                    sp_freelist_push_lockfree(class_idx, reuse_meta, reuse_slot_idx);
                    goto stage2_fallback;
                }
            }
            // P-Tier: Skip DRAINING tier SuperSlabs
            if (!ss_tier_is_hot(ss_guard)) {
                // DRAINING SuperSlab - skip this slot and fall through to Stage 2
@ -270,6 +418,15 @@ stage1_retry_after_tension_drain:
            *ss_out = ss;
            *slab_idx_out = reuse_slot_idx;
            if (c7_reset_and_log_if_needed(ss, reuse_slot_idx, class_idx) != 0) {
                *ss_out = NULL;
                *slab_idx_out = -1;
                if (g_lock_stats_enabled == 1) {
                    atomic_fetch_add(&g_lock_release_count, 1);
                }
                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                goto stage2_fallback;
            }
            if (g_lock_stats_enabled == 1) {
                atomic_fetch_add(&g_lock_release_count, 1);
@ -338,6 +495,19 @@ stage2_fallback:
                                1, memory_order_relaxed);
                        }
                        if (class_idx == 7) {
                            TinySlabMeta* meta = &ss->slabs[claimed_idx];
                            if (!c7_meta_is_pristine(meta)) {
                                c7_log_skip_nonempty_acquire(ss, claimed_idx, meta, "SKIP_NONEMPTY_ACQUIRE");
                                sp_slot_mark_empty(hint_meta, claimed_idx);
                                if (g_lock_stats_enabled == 1) {
                                    atomic_fetch_add(&g_lock_release_count, 1);
                                }
                                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                                goto stage2_scan;
                            }
                        }
                        // Update SuperSlab metadata under mutex
                        ss->slab_bitmap |= (1u << claimed_idx);
                        ss_slab_meta_class_idx_set(ss, claimed_idx, (uint8_t)class_idx);
@ -353,6 +523,15 @@ stage2_fallback:
                        // Hint is still good, no need to update
                        *ss_out = ss;
                        *slab_idx_out = claimed_idx;
                        if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) {
                            *ss_out = NULL;
                            *slab_idx_out = -1;
                            if (g_lock_stats_enabled == 1) {
                                atomic_fetch_add(&g_lock_release_count, 1);
                            }
                            pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                            goto stage2_scan;
                        }
                        sp_fix_geometry_if_needed(ss, claimed_idx, class_idx);
                        if (g_lock_stats_enabled == 1) {
@ -432,6 +611,19 @@ stage2_scan:
                    1, memory_order_relaxed);
            }
            if (class_idx == 7) {
                TinySlabMeta* meta_slab = &ss->slabs[claimed_idx];
                if (!c7_meta_is_pristine(meta_slab)) {
                    c7_log_skip_nonempty_acquire(ss, claimed_idx, meta_slab, "SKIP_NONEMPTY_ACQUIRE");
                    sp_slot_mark_empty(meta, claimed_idx);
                    if (g_lock_stats_enabled == 1) {
                        atomic_fetch_add(&g_lock_release_count, 1);
                    }
                    pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                    continue;
                }
            }
            // Update SuperSlab metadata under mutex
            ss->slab_bitmap |= (1u << claimed_idx);
            ss_slab_meta_class_idx_set(ss, claimed_idx, (uint8_t)class_idx);
@ -449,6 +641,15 @@ stage2_scan:
            *ss_out = ss;
            *slab_idx_out = claimed_idx;
            if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) {
                *ss_out = NULL;
                *slab_idx_out = -1;
                if (g_lock_stats_enabled == 1) {
                    atomic_fetch_add(&g_lock_release_count, 1);
                }
                pthread_mutex_unlock(&g_shared_pool.alloc_lock);
                continue;
            }
            sp_fix_geometry_if_needed(ss, claimed_idx, class_idx);
            if (g_lock_stats_enabled == 1) {
@ -623,6 +824,15 @@ stage2_scan:
    *ss_out = new_ss;
    *slab_idx_out = first_slot;
    if (c7_reset_and_log_if_needed(new_ss, first_slot, class_idx) != 0) {
        *ss_out = NULL;
        *slab_idx_out = -1;
        if (g_lock_stats_enabled == 1) {
            atomic_fetch_add(&g_lock_release_count, 1);
        }
        pthread_mutex_unlock(&g_shared_pool.alloc_lock);
        return -1;
    }
    sp_fix_geometry_if_needed(new_ss, first_slot, class_idx);
    if (g_lock_stats_enabled == 1) {
--- a/core/hakmem_shared_pool_release.c
+++ b/core/hakmem_shared_pool_release.c
@ -6,11 +6,42 @@
 #include "hakmem_env_cache.h"  // Priority-2: ENV cache
 #include "superslab/superslab_inline.h"  // superslab_ref_get guard for TLS pins
 #include "box/ss_release_guard_box.h"    // Box: SuperSlab Release Guard
 #include "box/ss_slab_reset_box.h"       // Box: Reset slab metadata on reuse path
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdatomic.h>
 static inline void c7_release_log_once(SuperSlab* ss, int slab_idx) {
 #if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_c7_release_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_c7_release_logs, 1, memory_order_relaxed);
    if (n < 8) {
        TinySlabMeta* meta = &ss->slabs[slab_idx];
        fprintf(stderr,
                "[REL_C7_RELEASE] ss=%p slab=%d used=%u cap=%u carved=%u\n",
                (void*)ss,
                slab_idx,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved);
    }
 #else
    static _Atomic uint32_t dbg_c7_release_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_c7_release_logs, 1, memory_order_relaxed);
    if (n < 8) {
        TinySlabMeta* meta = &ss->slabs[slab_idx];
        fprintf(stderr,
                "[DBG_C7_RELEASE] ss=%p slab=%d used=%u cap=%u carved=%u\n",
                (void*)ss,
                slab_idx,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved);
    }
 #endif
 }
 void
 shared_pool_release_slab(SuperSlab* ss, int slab_idx)
 {
@ -75,6 +106,9 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
    }
    uint8_t class_idx = slab_meta->class_idx;
    if (class_idx == 7) {
        c7_release_log_once(ss, slab_idx);
    }
    // Guard: if SuperSlab is pinned (TLS/remote references), defer release to avoid
    // class_map=255 while pointers are still in-flight.
@ -101,6 +135,39 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx)
    }
    #endif
    if (class_idx == 7) {
        ss_slab_reset_meta_for_tiny(ss, slab_idx, class_idx);
 #if HAKMEM_BUILD_RELEASE
        static _Atomic uint32_t rel_c7_reset_logs = 0;
        uint32_t rn = atomic_fetch_add_explicit(&rel_c7_reset_logs, 1, memory_order_relaxed);
        if (rn < 4) {
            TinySlabMeta* m = &ss->slabs[slab_idx];
            fprintf(stderr,
                    "[REL_C7_RELEASE_RESET] ss=%p slab=%d used=%u cap=%u carved=%u freelist=%p\n",
                    (void*)ss,
                    slab_idx,
                    (unsigned)m->used,
                    (unsigned)m->capacity,
                    (unsigned)m->carved,
                    m->freelist);
        }
 #else
        static _Atomic uint32_t dbg_c7_reset_logs = 0;
        uint32_t rn = atomic_fetch_add_explicit(&dbg_c7_reset_logs, 1, memory_order_relaxed);
        if (rn < 4) {
            TinySlabMeta* m = &ss->slabs[slab_idx];
            fprintf(stderr,
                    "[DBG_C7_RELEASE_RESET] ss=%p slab=%d used=%u cap=%u carved=%u freelist=%p\n",
                    (void*)ss,
                    slab_idx,
                    (unsigned)m->used,
                    (unsigned)m->capacity,
                    (unsigned)m->carved,
                    m->freelist);
        }
 #endif
    }
    // Find SharedSSMeta for this SuperSlab
    SharedSSMeta* sp_meta = NULL;
    uint32_t count = atomic_load_explicit(&g_shared_pool.ss_meta_count, memory_order_relaxed);
--- a/core/hakmem_tiny.c
+++ b/core/hakmem_tiny.c
@ -25,6 +25,7 @@
 #include "front/tiny_heap_v2.h"
 #include "tiny_tls_guard.h"
 #include "tiny_ready.h"
 #include "box/c7_meta_used_counter_box.h"
 #include "hakmem_tiny_tls_list.h"
 #include "hakmem_tiny_remote_target.h" // Phase 2C-1: Remote target queue
 #include "hakmem_tiny_bg_spill.h"      // Phase 2C-2: Background spill queue
@ -334,6 +335,7 @@ static inline void* hak_tiny_alloc_superslab_try_fast(int class_idx) {
        size_t block_size = tiny_stride_for_class(meta->class_idx);
        void* block = tls->slab_base + ((size_t)meta->used * block_size);
        meta->used++;
        c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
        // Track active blocks in SuperSlab for conservative reclamation
        ss_active_inc(tls->ss);
        return block;
--- a/core/hakmem_tiny_alloc_new.inc
+++ b/core/hakmem_tiny_alloc_new.inc
@ -17,6 +17,7 @@
 // Phase E1-CORRECT: Box API for next pointer operations
 #include "box/tiny_next_ptr_box.h"
 #include "front/tiny_heap_v2.h"
 #include "box/c7_meta_used_counter_box.h"
 // Debug counters (thread-local)
 static __thread uint64_t g_3layer_bump_hits = 0;
@ -265,6 +266,7 @@ static void* tiny_alloc_slow_new(int class_idx) {
        meta->freelist = tiny_next_read(node);  // Phase E1-CORRECT: Box API
        items[got++] = node;
        meta->used++;
        c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
    }
    // Then linear carve (KEY OPTIMIZATION - direct array fill!)
@ -285,6 +287,11 @@ static void* tiny_alloc_slow_new(int class_idx) {
        }
        meta->used += need;  // Reserve to TLS; not active until returned to user
        if (class_idx == 7) {
            for (uint32_t i = 0; i < need; ++i) {
                c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
            }
        }
    }
    if (got == 0) {
--- a/core/hakmem_tiny_refill.inc.h
+++ b/core/hakmem_tiny_refill.inc.h
@ -18,6 +18,7 @@
 #include "tiny_box_geometry.h"
 #include "superslab/superslab_inline.h"  // Provides hak_super_lookup() and SUPERSLAB_MAGIC
 #include "box/tls_sll_box.h"
 #include "box/c7_meta_used_counter_box.h"
 #include "box/tiny_header_box.h"  // Header Box: Single Source of Truth for header operations
 #include "box/tiny_front_config_box.h"  // Phase 7-Step6-Fix: Config macros for dead code elimination
 #include "hakmem_tiny_integrity.h"
@ -94,6 +95,39 @@ static inline void tiny_debug_validate_node_base(int class_idx, void* node, cons
 }
 #endif
 static inline void c7_log_used_assign_cap(TinySlabMeta* meta,
                                          int class_idx,
                                          const char* tag) {
    if (__builtin_expect(class_idx != 7, 1)) {
        return;
    }
 #if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr,
                "[REL_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved,
                meta->freelist);
    }
 #else
    static _Atomic uint32_t dbg_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr,
                "[DBG_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved,
                meta->freelist);
    }
 #endif
 }
 // ========= superslab_tls_bump_fast =========
 //
 // Ultra bump shadow: current slabが freelist 空で carved<capacity のとき、
@ -141,6 +175,11 @@ static inline void* superslab_tls_bump_fast(int class_idx) {
    meta->carved = (uint16_t)(carved + (uint16_t)chunk);
    meta->used = (uint16_t)(meta->used + (uint16_t)chunk);
    if (class_idx == 7) {
        for (uint32_t i = 0; i < chunk; ++i) {
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
        }
    }
    ss_active_add(tls->ss, chunk);
 #if HAKMEM_DEBUG_COUNTERS
    g_bump_arms[class_idx]++;
@ -365,8 +404,10 @@ int sll_refill_small_from_ss(int class_idx, int max_take)
            meta->freelist = next_raw;
            meta->used++;
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
            if (__builtin_expect(meta->used > meta->capacity, 0)) {
                // 異常検出時はロールバックして終了（fail-fast 回避のため静かに中断）
                c7_log_used_assign_cap(meta, class_idx, "FREELIST_OVERRUN");
                meta->used = meta->capacity;
                break;
            }
@ -414,7 +455,9 @@ int sll_refill_small_from_ss(int class_idx, int max_take)
            meta->carved++;
            meta->used++;
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
            if (__builtin_expect(meta->used > meta->capacity, 0)) {
                c7_log_used_assign_cap(meta, class_idx, "CARVE_OVERRUN");
                meta->used = meta->capacity;
                break;
            }
--- a/core/refill/ss_refill_fc.h
+++ b/core/refill/ss_refill_fc.h
@ -33,6 +33,7 @@
 #ifndef HEADER_CLASS_MASK
 #define HEADER_CLASS_MASK 0x0F
 #endif
 #include "../box/c7_meta_used_counter_box.h"
 // ========================================================================
 // REFILL CONTRACT: ss_refill_fc_fill() - Standard Refill Entry Point
@ -131,12 +132,14 @@ static inline int ss_refill_fc_fill(int class_idx, int want) {
            p = meta->freelist;
            meta->freelist = tiny_next_read(class_idx, p);
            meta->used++;
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
        }
        // Option B: Carve new block (if capacity available)
        else if (meta->carved < meta->capacity) {
            p = (void*)(slab_base + (meta->carved * stride));
            meta->carved++;
            meta->used++;
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
        }
        // Option C: Slab exhausted, need new slab
        else {
--- a/core/slab_handle.h
+++ b/core/slab_handle.h
@ -9,6 +9,7 @@
 #include "tiny_debug_ring.h"
 #include "tiny_remote.h"
 #include "box/tiny_next_ptr_box.h"  // Box API: next pointer read/write
 #include "box/c7_meta_used_counter_box.h"
 extern int g_debug_remote_guard;
 extern int g_tiny_safe_free_strict;
@ -311,6 +312,7 @@ static inline void* slab_freelist_pop(SlabHandle* h) {
        void* next = tiny_next_read(h->meta->class_idx, ptr);  // Box API: next pointer read
        h->meta->freelist = next;
        h->meta->used++;
        c7_meta_used_note(h->meta->class_idx, C7_META_USED_SRC_FRONT);
        // Optional freelist mask clear when freelist becomes empty
        do {
            static int g_mask_en2 = -1;
--- a/core/superslab_backend.c
+++ b/core/superslab_backend.c
@ -4,6 +4,10 @@
 // Date: 2025-11-28
 #include "hakmem_tiny_superslab_internal.h"
 #include "box/c7_meta_used_counter_box.h"
 #include <stdatomic.h>
 static _Atomic uint32_t g_c7_backend_calls = 0;
 // Note: Legacy backend moved to archive/superslab_backend_legacy.c (not built).
@ -83,6 +87,20 @@ void* hak_tiny_alloc_superslab_backend_shared(int class_idx)
        return NULL;
    }
    if (class_idx == 7) {
        uint32_t n = atomic_fetch_add_explicit(&g_c7_backend_calls, 1, memory_order_relaxed);
        if (n < 8) {
            fprintf(stderr,
                    "[REL_C7_BACKEND_CALL] cls=%d meta_cls=%u used=%u cap=%u ss=%p slab=%d\n",
                    class_idx,
                    (unsigned)meta->class_idx,
                    (unsigned)meta->used,
                    (unsigned)meta->capacity,
                    (void*)ss,
                    slab_idx);
        }
    }
    // Simple bump allocation within this slab.
    if (meta->used >= meta->capacity) {
        // Slab exhausted: in minimal Phase12-2 backend we do not loop;
@ -101,6 +119,7 @@ void* hak_tiny_alloc_superslab_backend_shared(int class_idx)
    uint8_t* base = (uint8_t*)ss + slab_base_off + offset;
    meta->used++;
    c7_meta_used_note(class_idx, C7_META_USED_SRC_BACKEND);
    atomic_fetch_add_explicit(&ss->total_active_blocks, 1, memory_order_relaxed);
    HAK_RET_ALLOC_BLOCK_TRACED(class_idx, base, ALLOC_PATH_BACKEND);
--- a/core/superslab_slab.c
+++ b/core/superslab_slab.c
@ -6,6 +6,7 @@
 #include "hakmem_tiny_superslab_internal.h"
 #include "box/slab_recycling_box.h"
 #include "hakmem_env_cache.h"  // Priority-2: ENV cache (eliminate syscalls)
 #include <stdio.h>
 // ============================================================================
 // Remote Drain (MPSC queue to freelist conversion)
@ -175,6 +176,37 @@ void superslab_init_slab(SuperSlab* ss, int slab_idx, size_t block_size, uint32_
        }
    }
 #if HAKMEM_BUILD_RELEASE
    static _Atomic int rel_c7_init_logged = 0;
    if (meta->class_idx == 7 &&
        atomic_load_explicit(&rel_c7_init_logged, memory_order_relaxed) == 0) {
        fprintf(stderr,
                "[REL_C7_INIT] ss=%p slab=%d cls=%u cap=%u used=%u carved=%u stride=%zu\n",
                (void*)ss,
                slab_idx,
                (unsigned)meta->class_idx,
                (unsigned)meta->capacity,
                (unsigned)meta->used,
                (unsigned)meta->carved,
                stride);
        atomic_store_explicit(&rel_c7_init_logged, 1, memory_order_relaxed);
    }
 #else
    static __thread int dbg_c7_init_logged = 0;
    if (meta->class_idx == 7 && dbg_c7_init_logged == 0) {
        fprintf(stderr,
                "[DBG_C7_INIT] ss=%p slab=%d cls=%u cap=%u used=%u carved=%u stride=%zu\n",
                (void*)ss,
                slab_idx,
                (unsigned)meta->class_idx,
                (unsigned)meta->capacity,
                (unsigned)meta->used,
                (unsigned)meta->carved,
                stride);
        dbg_c7_init_logged = 1;
    }
 #endif
    superslab_activate_slab(ss, slab_idx);
 }
--- a/core/tiny_superslab_alloc.inc.h
+++ b/core/tiny_superslab_alloc.inc.h
@ -7,6 +7,8 @@
 #include "box/superslab_expansion_box.h"  // Box E: Expansion with TLS state guarantee
 #include "box/tiny_next_ptr_box.h"        // Box API: Next pointer read/write
 #include "box/tiny_tls_carve_one_block_box.h"  // Box: Shared TLS carve helper
 #include "box/c7_meta_used_counter_box.h" // Box: C7 meta->used telemetry
 #include "hakmem_tiny_superslab_constants.h"
 #include "tiny_box_geometry.h"            // Box 3: Geometry & Capacity Calculator"
 #include "tiny_debug_api.h"               // Guard/failfast declarations
@ -33,6 +35,7 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
                uint8_t* base = tiny_slab_base_for_geometry(ss, slab_idx);
                void* block = tiny_block_at_index(base, meta->used, unit_sz);
                meta->used++;
                c7_meta_used_note(cls, C7_META_USED_SRC_FRONT);
                ss_active_inc(ss);
                HAK_RET_ALLOC(cls, block);
            }
@ -105,6 +108,7 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
        }
 #endif
        meta->used++;
        c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
        void* user =
 #if HAKMEM_TINY_HEADER_CLASSIDX
            tiny_region_id_write_header(block_base, meta->class_idx);
@ -157,6 +161,7 @@ static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) {
        meta->freelist = tiny_next_read(meta->class_idx, block);
        meta->used++;
        c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
        if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0) &&
            __builtin_expect(meta->used > meta->capacity, 0)) {
@ -294,54 +299,33 @@ static inline void* hak_tiny_alloc_superslab(int class_idx) {
    }
    // Fast path: linear carve from current TLS slab
-    if (meta && meta->freelist == NULL && meta->used < meta->capacity && tls->slab_base) {
+    if (meta && tls->slab_base) {
-        size_t block_size = tiny_stride_for_class(meta->class_idx);
+        TinyTLSCarveOneResult carve = tiny_tls_carve_one_block(tls, class_idx);
-        uint8_t* base = tls->slab_base;
+        if (carve.block) {
-        void* block = base + ((size_t)meta->used * block_size);
+#if !HAKMEM_BUILD_RELEASE
-        meta->used++;
+            if (__builtin_expect(g_debug_remote_guard, 0)) {
-
+                const char* tag = (carve.path == TINY_TLS_CARVE_PATH_FREELIST)
-        if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
+                                      ? "freelist_alloc"
-            uintptr_t base_ss = (uintptr_t)tls->ss;
+                                      : "linear_alloc";
-            size_t ss_size = (size_t)1ULL << tls->ss->lg_size;
+                tiny_remote_track_on_alloc(tls->ss, slab_idx, carve.block, tag, 0);
-            uintptr_t p = (uintptr_t)block;
+                tiny_remote_assert_not_remote(tls->ss, slab_idx, carve.block, tag, 0);
            int in_range = (p >= base_ss) && (p < base_ss + ss_size);
            int aligned = ((p - (uintptr_t)base) % block_size) == 0;
            int idx_ok = (tls->slab_idx >= 0) &&
                         (tls->slab_idx < ss_slabs_capacity(tls->ss));
            if (!in_range || !aligned || !idx_ok || meta->used > meta->capacity) {
                tiny_failfast_abort_ptr("alloc_ret_align",
                                        tls->ss,
                                        tls->slab_idx,
                                        block,
                                        "superslab_tls_invariant");
            }
-        }
+#endif
-        ss_active_inc(tls->ss);
+#if HAKMEM_TINY_SS_TLS_HINT
-        ROUTE_MARK(11); ROUTE_COMMIT(class_idx, 0x60);
+            {
-        HAK_RET_ALLOC(class_idx, block);
+                void* ss_base = (void*)tls->ss;
-    }
+                size_t ss_size = (size_t)1ULL << tls->ss->lg_size;
-
+                tls_ss_hint_update(tls->ss, ss_base, ss_size);
    // Freelist path from current TLS slab
    if (meta && meta->freelist) {
        void* block = meta->freelist;
        if (__builtin_expect(g_tiny_safe_free, 0)) {
            size_t blk = tiny_stride_for_class(meta->class_idx);
            uint8_t* base = tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
            uintptr_t delta = (uintptr_t)block - (uintptr_t)base;
            int align_ok = ((delta % blk) == 0);
            int range_ok = (delta / blk) < meta->capacity;
            if (!align_ok || !range_ok) {
                if (g_tiny_safe_free_strict) { raise(SIGUSR2); return NULL; }
                return NULL;
            }
 #endif
            if (carve.path == TINY_TLS_CARVE_PATH_LINEAR) {
                ROUTE_MARK(11); ROUTE_COMMIT(class_idx, 0x60);
            } else if (carve.path == TINY_TLS_CARVE_PATH_FREELIST) {
                ROUTE_MARK(12); ROUTE_COMMIT(class_idx, 0x61);
            }
            HAK_RET_ALLOC(class_idx, carve.block);
        }
        void* next = tiny_next_read(class_idx, block);
        meta->freelist = next;
        meta->used++;
        ss_active_inc(tls->ss);
        ROUTE_MARK(12); ROUTE_COMMIT(class_idx, 0x61);
        HAK_RET_ALLOC(class_idx, block);
    }
    // Slow path: acquire a new slab via shared pool
@ -363,6 +347,7 @@ static inline void* hak_tiny_alloc_superslab(int class_idx) {
        size_t block_size = tiny_stride_for_class(meta->class_idx);
        void* block = tiny_block_at_index(tls->slab_base, meta->used, block_size);
        meta->used++;
        c7_meta_used_note(meta->class_idx, C7_META_USED_SRC_FRONT);
        ss_active_inc(ss);
        HAK_RET_ALLOC(class_idx, block);
    }