diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 38aa5228..803114f1 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -212,9 +212,125 @@ Phase12 の設計に沿った shared SuperSlab pool 実装および Box API 境
 - Still reproduces with `g_tiny_hotpath_class5=1` → a BASE/USER/next consistency bug remains somewhere on the hot path.
 - Stable default for now: `g_tiny_hotpath_class5=0` (A/B via env: `HAKMEM_TINY_HOTPATH_CLASS5=1`).
 
+### C5 SEGV root fix (implemented, minimal patch)
+
+- Direct cause (from repro logs / debug ring)
+  - C5 nodes pushed onto the TLS SLL had header byte 0x00 (repeated rejects by `safeheader`)
+  - Pattern: consecutive addresses (`...8800, ...8900, ...8a00, ...`) with header=0 → nodes arriving via carve/remote without an initialized header
+- Fixes (surgical point fixes, respecting Box boundaries)
+  - Restore the header when converting Remote Queue → FreeList
+    - File: around `core/hakmem_tiny_superslab.c:120` (`_ss_remote_drain_to_freelist_unsafe`)
+    - Action: for classes 1–6, execute `*(uint8_t*)node = HEADER_MAGIC | (cls & HEADER_CLASS_MASK)`, then rewrite next in Box form via `tiny_next_write()` (see the sketch after the next-steps list below)
+  - Set up the header on Superslab → TLS SLL refill
+    - File: `core/hakmem_tiny_refill.inc.h:...` (`sll_refill_small_from_ss`)
+    - Action: set the class 1–6 header immediately before stacking onto the SLL, then call `tls_sll_push()`
+  - Note: the old `pool_tls_remote.c` was also moved to the Box API (unused path, but prevents future inconsistency)
+- Verification (ring + bench)
+  - Env: `HAKMEM_TINY_SLL_MASK=0x3F HAKMEM_TINY_SLL_SAFEHEADER=1 HAKMEM_TINY_HOTPATH_CLASS5=1`
+  - Before: many `tls_sll_reject(class=5)` events → SIGSEGV
+  - After: `bench_random_mixed_hakmem 200000 256 42` completes normally (no tls_sll_* anomalies in the ring)
+  - C5 alone (`mask=0x20`) also confirmed clean
+
 ### Next implementation (root-cause policy, small steps)
 1) First pin down observation of the shared SS (A/B ON/OFF with `HAKMEM_TINY_SLL_MASK=0x1F`, lightweight Fail-Fast / ring enabled)
 2) C5 root fix: enable C5 only (`HAKMEM_TINY_SLL_MASK=0x20`, `HAKMEM_TINY_SLL_SAFEHEADER=1`, `HAKMEM_TINY_HOTPATH_CLASS5=0`), run short, and log the first failure point
+   - Extra visibility (ring records only on anomalies): `HAKMEM_TINY_SLL_RING=1 HAKMEM_TINY_TRACE_RING=1`
+   - Added events: `tls_sll_reject` (rejected by safeheader), `tls_sll_sentinel` (remote sentinel leaked in), `tls_sll_hdr_corrupt` (header mismatch on POP)
+   - Example run: `HAKMEM_TINY_SLL_MASK=0x20 HAKMEM_TINY_SLL_SAFEHEADER=1 HAKMEM_TINY_HOTPATH_CLASS5=0 HAKMEM_TINY_SLL_RING=1 HAKMEM_TINY_TRACE_RING=1 ./bench_random_mixed_hakmem 100000 256 42`
 3) Apply surgical point fixes (~20–30 lines) at the affected spots (BASE/USER/next, header consistency).
 4) Expand the mask step by step (C6 → C7) and re-verify.
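+
+A minimal sketch of the drain-side header restore described above. Only `HEADER_MAGIC`, `HEADER_CLASS_MASK` and `tiny_next_write()` are identifiers referenced in this document; the function name, its signature, and the loop shape are illustrative assumptions, not the actual patch:
+
+```c
+// Sketch: restore the 1-byte class header while draining remote nodes into
+// a slab freelist. The remote link at offset 0 follows the Remote Queue
+// convention noted below; helper signatures are assumed for illustration.
+static void drain_restore_headers_sketch(void* remote_head, int cls, void** freelist) {
+    void* node = remote_head;
+    while (node) {
+        void* next = *(void**)node;            // remote link stored at offset 0 (assumed)
+        if (cls >= 1 && cls <= 6) {
+            *(uint8_t*)node = (uint8_t)(HEADER_MAGIC | (cls & HEADER_CLASS_MASK));
+        }
+        tiny_next_write(cls, node, *freelist); // re-link in Box (BASE/next) form
+        *freelist = node;
+        node = next;
+    }
+}
+```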
+
+---
+
+## 5. Tiny front optimization roadmap (reflects Phase 2/3)
+
+Goal: make the Tiny layer (≤1KB) fast across all benches while keeping the Box-theory boundaries intact. The array-based caches (QuickSlot/FastCache) become the main path, and the SLL is demoted to overflow/merge duty only. (A minimal front sketch follows this section.)
+
+Structure (boxes and boundaries)
+- L0: QuickSlot (fixed 6–8 slots, for C0–C3)
+  - Array push/pop only. Never writes into the node (BASE/USER/next untouched).
+  - Miss → L1.
+- L1: FastCache (C0–C7, cap 128–256)
+  - Refill is a direct SS→FC fill only (fill up to the target cap in one go).
+  - Single-block return: FC → return (header fixup happens at exactly one point inside the Box).
+- L2: TLS SLL (Box API)
+  - Role is "overflow/merge" only (merging Remote Drain output, FC overflow).
+  - Removed from the app's normal hit path (no inline pop on the alloc side).
+- Adoption boundary (keep it in one place)
+  - Concentrate the adopt → remote_drain → bind → owner sequence in `superslab_refill()`.
+  - The Remote Queue (Box 2) only pushes (writes at offset 0); drain happens at the single boundary only.
+
+A/B toggles (added to / organized with the existing ones)
+- `HAKMEM_TINY_REFILL_BATCH=1` (P0: direct SS→FC refill ON)
+- `HAKMEM_TINY_P0_DIRECT_FC_ALL=1` (direct FC refill for all classes)
+- `HAKMEM_TINY_FRONT_DIRECT=1` (skip the middle layers: direct FC refill → re-pop from FC; default OFF)
+- Preset (good in benches): `HAKMEM_TINY_REFILL_COUNT_HOT=256 HAKMEM_TINY_REFILL_COUNT_MID=96 HAKMEM_TINY_BUMP_CHUNK=256`
+
+Legacy cleanup policy (keep the core clean)
+- Modularize the entry/exit points; keep the core at roughly 500 lines or fewer.
+  - front layer: `core/front/quick_slot.h`, `core/front/fast_cache.h`, `core/front/front_gate.h`
+  - refill layer: `core/refill/ss_refill_fc.h` (single path for direct SS→FC refill)
+  - SLL layer (demoted): expose only `core/box/tls_sll_box.h`; call it only from refill/merge
+- Staged removal / sealing of legacy paths
+  - Remove or default-disable the regular-use paths of the inline SLL pop (for C0–C3) and the SFC cascade.
+  - Clean up (delete) `.bak` files and duplicated/unused utilities.
+  - Migrate everything behind A/B guards; Fail-Fast and the ring record only on anomalies.
+
+Acceptance criteria (per box)
+- Aim for a Front (L0/L1) hit rate >80%; measure refill count, blocks obtained per refill, and SS rewrite count.
+- Remote Drain happens only at the single adoption boundary; guarantee `remote_counts==0` after drain.
+- Bench targets (single thread)
+  - 128/256B: build up in the order 15M → 30M → 60M (confirm the trend via A/B).
+- Stability: sentinel contamination and header mismatch are Fail-Fast; the ring is a one-shot record on anomalies only.
+
+Implementation steps (Phase 2/3)
+1) Standardize direct SS→FC refill (promote the current `HAKMEM_TINY_REFILL_BATCH` to the standard path)
+2) Put L0/L1 first (alloc normally returns from FC; SLL is merge-only)
+3) Limit SFC to residual handling (default OFF, A/B experiments only)
+4) Remove legacy paths and modularize (split the core, aiming for ≤500 lines)
+5) Standardize presets (Hot-heavy as default; switch to Balanced/Light via A/B)
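+
+A minimal sketch of the L0/L1 idea above, using the `TinyQuickSlot` / `TinyFastCache` array layouts added under `core/front/` later in this patch. The helper names here are illustrative (the real ones live in the front `.inc.h` code); the point is the key property that only the array and its top index are touched, never the node memory:
+
+```c
+// Illustrative only: array-based front ops that never write into the block.
+static inline void* quick_pop_sketch(TinyQuickSlot* qs) {
+    return (qs->top > 0) ? qs->items[--qs->top] : NULL;   // miss → fall through to L1 (FastCache)
+}
+static inline int quick_push_sketch(TinyQuickSlot* qs, void* base) {
+    if (qs->top >= QUICK_CAP) return 0;                    // full → caller routes to FastCache
+    qs->items[qs->top++] = base;                           // store the BASE pointer only
+    return 1;
+}
+static inline void* fastcache_pop_sketch(TinyFastCache* fc) {
+    return (fc->top > 0) ? fc->items[--fc->top] : NULL;   // miss → direct SS→FC refill, then retry
+}
+```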
+
+---
+
+## 6. Current progress and next tasks (handoff to Claude Code)
+
+Done (as reported)
+- New module: `core/refill/ss_refill_fc.h` (direct SS→FC refill, 236 lines)
+- Front modularization: `core/front/quick_slot.h`, `core/front/fast_cache.h`
+- Front-Direct path: SLL bypass on both alloc and free (ENV: `HAKMEM_TINY_FRONT_DIRECT=1`)
+- Refill dispatch: use `ss_refill_fc_fill()` via ENV (`HAKMEM_TINY_REFILL_BATCH/…DIRECT_FC_ALL`)
+- SFC cascade: default OFF (opt-in via ENV: `HAKMEM_TINY_SFC_CASCADE=1`)
+- Stability confirmed on short bench runs (0 SLL events, no SEGV)
+
+Open items / next tasks (for Claude Code)
+1) Seal/remove legacy paths (keep A/B switches)
+   - Seal the regular-use inline SLL pop calls (disabled unless `#if HAKMEM_TINY_INLINE_SLL` is defined)
+   - Delete `.bak` files and unused utilities (check for references with `rg`)
+   - SFC cascade enabled via ENV only (confirm default OFF)
+2) Document the single refill path
+   - Promote `ss_refill_fc_fill()` to the sole refill entry point (tidy comments and call sites)
+   - Make it explicit in code that Front-Direct never goes through the SLL / TLS list
+3) Thin the 128/256-dedicated short path (raise the FC hit rate)
+   - C0–C3: QuickSlot → FC → (only when needed) direct refill → re-pop from FC
+   - C4–C7: FC → (only when needed) direct refill → re-pop from FC
+4) Simplify the core (target ~500 lines)
+   - Continue splitting into front*/refill*/box*; keep only the entry/exit boxes in the core
+
+Recommended bench preset (for verification after restart; prepend to e.g. `./bench_random_mixed_hakmem 200000 256 42`)
+```
+HAKMEM_BENCH_FAST_FRONT=1 \
+HAKMEM_TINY_FRONT_DIRECT=1 \
+HAKMEM_TINY_REFILL_BATCH=1 \
+HAKMEM_TINY_P0_DIRECT_FC_ALL=1 \
+HAKMEM_TINY_REFILL_COUNT_HOT=256 \
+HAKMEM_TINY_REFILL_COUNT_MID=96 \
+HAKMEM_TINY_BUMP_CHUNK=256
+```
+
+Note: the existing SLL-origin SEGV is avoided via the Front-Direct path. For now the SLL path is demoted to merge-only duty and kept off the regular paths.
+
+
+Note (measurement memo)
+- Phase 0/1 improvements took ~10M → ~15M. Front-Direct alone increased variance and gave no stable speedup (default OFF).
+- Next, aim for 30–60M via allocation tuning that raises the FC hit rate plus refill simplification.
diff --git a/core/box/carve_push_box.d b/core/box/carve_push_box.d
index 923dc5f5..35a2582a 100644
--- a/core/box/carve_push_box.d
+++ b/core/box/carve_push_box.d
@@ -16,8 +16,9 @@ core/box/carve_push_box.o: core/box/carve_push_box.c \
  core/box/../ptr_track.h core/box/../ptr_trace.h \
  core/box/../box/tiny_next_ptr_box.h core/hakmem_tiny_config.h \
  core/tiny_nextptr.h core/hakmem_build_flags.h \
- core/box/../tiny_refill_opt.h core/box/../tiny_region_id.h \
- core/box/../box/tls_sll_box.h core/box/../tiny_box_geometry.h
+ core/box/../tiny_debug_ring.h core/box/../tiny_refill_opt.h \
+ core/box/../tiny_region_id.h core/box/../box/tls_sll_box.h \
+ core/box/../tiny_box_geometry.h
 core/box/../hakmem_tiny.h:
 core/box/../hakmem_build_flags.h:
 core/box/../hakmem_trace.h:
@@ -50,6 +51,7 @@ core/box/../box/tiny_next_ptr_box.h:
 core/hakmem_tiny_config.h:
 core/tiny_nextptr.h:
 core/hakmem_build_flags.h:
+core/box/../tiny_debug_ring.h:
 core/box/../tiny_refill_opt.h:
 core/box/../tiny_region_id.h:
 core/box/../box/tls_sll_box.h:
diff --git a/core/box/front_gate_box.d b/core/box/front_gate_box.d
index 0ac14bf2..0da60d12 100644
--- a/core/box/front_gate_box.d
+++ b/core/box/front_gate_box.d
@@ -11,7 +11,7 @@ core/box/front_gate_box.o: core/box/front_gate_box.c \
  core/box/../hakmem_tiny_config.h core/box/../ptr_track.h \
  core/box/../hakmem_tiny_integrity.h core/box/../hakmem_tiny.h \
  core/box/../ptr_track.h core/box/../ptr_trace.h \
- core/box/ptr_conversion_box.h
+ core/box/../tiny_debug_ring.h core/box/ptr_conversion_box.h
 core/box/front_gate_box.h:
 core/hakmem_tiny.h:
 core/hakmem_build_flags.h:
@@ -36,4 +36,5 @@ core/box/../hakmem_tiny_integrity.h:
 core/box/../hakmem_tiny.h:
 core/box/../ptr_track.h:
 core/box/../ptr_trace.h:
+core/box/../tiny_debug_ring.h:
 core/box/ptr_conversion_box.h:
diff --git a/core/box/hak_free_api.inc.h b/core/box/hak_free_api.inc.h
index cb564b3c..2014da86 100644
--- a/core/box/hak_free_api.inc.h
+++ b/core/box/hak_free_api.inc.h
@@ -91,6 +91,26 @@ void hak_free_at(void* ptr, size_t size, hak_callsite_t site) {
     }
   }
 #endif
+  // Bench-only ultra-short path: try header-based tiny fast free first
+  // Enable with: HAKMEM_BENCH_FAST_FRONT=1
+  {
+    static int g_bench_fast_front = -1;
+    if (__builtin_expect(g_bench_fast_front == -1, 0)) {
+      const char* e = getenv("HAKMEM_BENCH_FAST_FRONT");
+      g_bench_fast_front = (e && *e && *e != '0') ? 1 : 0;
+    }
+#if HAKMEM_TINY_HEADER_CLASSIDX
+    if (__builtin_expect(g_bench_fast_front && ptr != NULL, 0)) {
+      if (__builtin_expect(hak_tiny_free_fast_v2(ptr), 1)) {
+#if HAKMEM_DEBUG_TIMING
+        HKM_TIME_END(HKM_CAT_HAK_FREE, t0);
+#endif
+        return;
+      }
+    }
+#endif
+  }
+
   if (!ptr) {
 #if HAKMEM_DEBUG_TIMING
     HKM_TIME_END(HKM_CAT_HAK_FREE, t0);
diff --git a/core/box/tls_sll_box.h b/core/box/tls_sll_box.h
index d2a1d006..db5f0e54 100644
--- a/core/box/tls_sll_box.h
+++ b/core/box/tls_sll_box.h
@@ -31,6 +31,7 @@
 #include "../hakmem_tiny_integrity.h"
 #include "../ptr_track.h"
 #include "../ptr_trace.h"
+#include "../tiny_debug_ring.h"
 #include "tiny_next_ptr_box.h"
 
 // External TLS SLL state (defined in hakmem_tiny.c or equivalent)
@@ -118,16 +119,26 @@ static inline bool tls_sll_push(int class_idx, void* ptr, uint32_t capacity)
     // Default mode: restore expected header.
     if (class_idx != 0 && class_idx != 7) {
         static int g_sll_safehdr = -1;
+        static int g_sll_ring_en = -1; // optional ring trace for TLS-SLL anomalies
         if (__builtin_expect(g_sll_safehdr == -1, 0)) {
             const char* e = getenv("HAKMEM_TINY_SLL_SAFEHEADER");
             g_sll_safehdr = (e && *e && *e != '0') ? 1 : 0;
         }
+        if (__builtin_expect(g_sll_ring_en == -1, 0)) {
+            const char* r = getenv("HAKMEM_TINY_SLL_RING");
+            g_sll_ring_en = (r && *r && *r != '0') ? 1 : 0;
+        }
         uint8_t* b = (uint8_t*)ptr;
         uint8_t expected = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
         if (g_sll_safehdr) {
             uint8_t got = *b;
             if ((got & 0xF0u) != HEADER_MAGIC) {
                 // Reject push silently (fall back to slow path at caller)
+                if (__builtin_expect(g_sll_ring_en, 0)) {
+                    // aux encodes: high 8 bits = got, low 8 bits = expected
+                    uintptr_t aux = ((uintptr_t)got << 8) | (uintptr_t)expected;
+                    tiny_debug_ring_record(0x7F10 /*TLS_SLL_REJECT*/, (uint16_t)class_idx, ptr, aux);
+                }
                 return false;
             }
         } else {
@@ -200,6 +211,16 @@ static inline bool tls_sll_pop(int class_idx, void** out)
                 "[TLS_SLL_POP] Remote sentinel detected at head; SLL reset (cls=%d)\n",
                 class_idx);
 #endif
+        {
+            static int g_sll_ring_en = -1;
+            if (__builtin_expect(g_sll_ring_en == -1, 0)) {
+                const char* r = getenv("HAKMEM_TINY_SLL_RING");
+                g_sll_ring_en = (r && *r && *r != '0') ? 1 : 0;
+            }
+            if (__builtin_expect(g_sll_ring_en, 0)) {
+                tiny_debug_ring_record(0x7F11 /*TLS_SLL_SENTINEL*/, (uint16_t)class_idx, base, 0);
+            }
+        }
         return false;
     }
 
@@ -232,6 +253,18 @@ static inline bool tls_sll_pop(int class_idx, void** out)
         // In release, fail-safe: drop list.
         g_tls_sll_head[class_idx] = NULL;
         g_tls_sll_count[class_idx] = 0;
+        {
+            static int g_sll_ring_en = -1;
+            if (__builtin_expect(g_sll_ring_en == -1, 0)) {
+                const char* r = getenv("HAKMEM_TINY_SLL_RING");
+                g_sll_ring_en = (r && *r && *r != '0') ? 1 : 0;
+            }
+            if (__builtin_expect(g_sll_ring_en, 0)) {
+                // aux encodes: high 8 bits = got, low 8 bits = expect
+                uintptr_t aux = ((uintptr_t)got << 8) | (uintptr_t)expect;
+                tiny_debug_ring_record(0x7F12 /*TLS_SLL_HDR_CORRUPT*/, (uint16_t)class_idx, base, aux);
+            }
+        }
         return false;
 #endif
     }
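The aux word recorded by the 0x7F10/0x7F12 events above packs both header bytes. A hedged decode helper for reading ring dumps (illustrative only, not part of the patch):

```c
// Sketch: unpack the (got, expected) pair packed into the ring aux word
// by the TLS_SLL_REJECT (0x7F10) and TLS_SLL_HDR_CORRUPT (0x7F12) events.
static inline void tls_sll_aux_decode(uintptr_t aux, uint8_t* got, uint8_t* expected) {
    *got      = (uint8_t)((aux >> 8) & 0xFFu);  // high 8 bits: header byte actually read
    *expected = (uint8_t)(aux & 0xFFu);         // low 8 bits: HEADER_MAGIC | class
}
```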
diff --git a/core/front/fast_cache.h b/core/front/fast_cache.h
new file mode 100644
index 00000000..3f7fc644
--- /dev/null
+++ b/core/front/fast_cache.h
@@ -0,0 +1,23 @@
+// core/front/fast_cache.h - Tiny Front: FastCache (L1)
+#ifndef HAK_FRONT_FAST_CACHE_H
+#define HAK_FRONT_FAST_CACHE_H
+
+#include "../hakmem_tiny.h"
+#include "quick_slot.h"
+
+#ifndef TINY_FASTCACHE_CAP
+#define TINY_FASTCACHE_CAP 128
+#endif
+
+// FastCache: array-based TLS cache (holds BASE pointers only)
+typedef struct __attribute__((aligned(64))) {
+    void* items[TINY_FASTCACHE_CAP];
+    int top;
+    int _pad[15];
+} TinyFastCache;
+
+// Implementation: pull in the existing inline helpers
+#include "../hakmem_tiny_fastcache.inc.h"
+
+#endif // HAK_FRONT_FAST_CACHE_H
+
diff --git a/core/front/quick_slot.h b/core/front/quick_slot.h
new file mode 100644
index 00000000..d3906440
--- /dev/null
+++ b/core/front/quick_slot.h
@@ -0,0 +1,24 @@
+// core/front/quick_slot.h - Tiny Front: QuickSlot (L0)
+#ifndef HAK_FRONT_QUICK_SLOT_H
+#define HAK_FRONT_QUICK_SLOT_H
+
+#include "../hakmem_tiny.h"
+
+#ifndef QUICK_CAP
+#define QUICK_CAP 6
+#endif
+
+// QuickSlot: minimal array cache for C0–C3 (never touches next)
+typedef struct __attribute__((aligned(64))) {
+    void* items[QUICK_CAP];
+    uint8_t top;      // 0..QUICK_CAP
+    uint8_t _pad1;
+    uint16_t _pad2;
+    uint32_t _pad3;
+} TinyQuickSlot;
+
+// TLS QuickSlot (storage is defined in the TU)
+extern __thread TinyQuickSlot g_tls_quick[TINY_NUM_CLASSES];
+
+#endif // HAK_FRONT_QUICK_SLOT_H
+
diff --git a/core/hakmem_tiny.c b/core/hakmem_tiny.c
index 7d9dbc94..6cdd7d9d 100644
--- a/core/hakmem_tiny.c
+++ b/core/hakmem_tiny.c
@@ -1184,16 +1184,10 @@ static inline __attribute__((always_inline)) int tiny_refill_max_for_class(int c
     return g_tiny_refill_max;
 }
 
-// Phase 9.5: Frontend/Backend split - Tiny FastCache (array stack)
-// Enabled via HAKMEM_TINY_FASTCACHE=1 (default: 0)
-// Compile-out: define HAKMEM_TINY_NO_FRONT_CACHE=1 to exclude this path
-#define TINY_FASTCACHE_CAP 128
-typedef struct __attribute__((aligned(64))) {
-    void* items[TINY_FASTCACHE_CAP];
-    int top;
-    int _pad[15];
-} TinyFastCache;
-static __thread TinyFastCache g_fast_cache[TINY_NUM_CLASSES];
+// Phase 9.5: Frontend/Backend split - Tiny Front modules (QuickSlot / FastCache)
+#include "front/quick_slot.h"
+#include "front/fast_cache.h"
+__thread TinyFastCache g_fast_cache[TINY_NUM_CLASSES];
 static int g_frontend_enable = 0; // HAKMEM_TINY_FRONTEND=1 (experimental ultra-fast frontend)
 // SLL capacity multiplier for hot tiny classes (env: HAKMEM_SLL_MULTIPLIER)
 int g_sll_multiplier = 2;
@@ -1270,21 +1264,17 @@ static __thread TinyHotMag g_tls_hot_mag[TINY_NUM_CLASSES];
 // TinyQuickSlot: 1 cache line per class (quick 6 items + small metadata)
 // Opt-in via HAKMEM_TINY_QUICK=1
 // NOTE: This type definition must come BEFORE the Phase 2D-1 includes below
-typedef struct __attribute__((aligned(64))) {
-    void* items[6];   // 48B
-    uint8_t top;      // 1B (0..6)
-    uint8_t _pad1;    // 1B
-    uint16_t _pad2;   // 2B
-    uint32_t _pad3;   // 4B (padding to 64B)
-} TinyQuickSlot;
-static int g_quick_enable = 0; // HAKMEM_TINY_QUICK=1
-static __thread TinyQuickSlot g_tls_quick[TINY_NUM_CLASSES]; // compile-out via guards below
+int g_quick_enable = 0; // HAKMEM_TINY_QUICK=1
+__thread TinyQuickSlot g_tls_quick[TINY_NUM_CLASSES]; // compile-out via guards below
 
-// Phase 2D-1: Hot-path inline function extractions
-// NOTE: These includes require TinyFastCache, TinyQuickSlot, and TinyTLSSlab to be fully defined
+// Phase 2D-1: Hot-path inline function extractions (Front)
+// NOTE: TinyFastCache/TinyQuickSlot are already defined under front/
 #include "hakmem_tiny_hot_pop.inc.h"    // 4 functions: tiny_hot_pop_class{0..3}
-#include "hakmem_tiny_fastcache.inc.h"  // 5 functions: tiny_fast_pop/push, fastcache_pop/push, quick_pop
 #include "hakmem_tiny_refill.inc.h"     // 8 functions: refill operations
+#if HAKMEM_TINY_P0_BATCH_REFILL
+#include "hakmem_tiny_refill_p0.inc.h"  // P0 batch refill → direct FastCache fill
+#endif
+#include "refill/ss_refill_fc.h"        // NEW: Direct SS→FC refill
 
 // Phase 7 Task 3: Pre-warm TLS cache at init
 // Pre-allocate blocks to reduce first-allocation miss penalty
@@ -1775,6 +1765,17 @@ TinySlab* hak_tiny_owner_slab(void* ptr) {
 // Export wrapper functions for hakmem.c to call
 // Phase 6-1.7 Optimization: Remove diagnostic overhead, rely on LTO for inlining
 void* hak_tiny_alloc_fast_wrapper(size_t size) {
+  // Bench-only ultra-short path: bypass diagnostics and pointer tracking
+  // Enable with: HAKMEM_BENCH_FAST_FRONT=1
+  static int g_bench_fast_front = -1;
+  if (__builtin_expect(g_bench_fast_front == -1, 0)) {
+    const char* e = getenv("HAKMEM_BENCH_FAST_FRONT");
+    g_bench_fast_front = (e && *e && *e != '0') ? 1 : 0;
+  }
+  if (__builtin_expect(g_bench_fast_front, 0)) {
+    return tiny_alloc_fast(size);
+  }
+
   static _Atomic uint64_t wrapper_call_count = 0;
   uint64_t call_num = atomic_fetch_add(&wrapper_call_count, 1);
 
@@ -1798,7 +1799,6 @@ TinySlab* hak_tiny_owner_slab(void* ptr) {
     fflush(stderr);
   }
 #endif
-  // Diagnostic removed - use HAKMEM_TINY_FRONT_DIAG in tiny_alloc_fast_pop if needed
   void* result = tiny_alloc_fast(size);
 #if !HAKMEM_BUILD_RELEASE
   if (call_num > 14250 && call_num < 14280 && size <= 1024) {
@@ -1864,6 +1864,16 @@ TinySlab* hak_tiny_owner_slab(void* ptr) {
 
 // Free path implementations
 #include "hakmem_tiny_free.inc"
+// ---- Phase 1: Provide default batch-refill symbol (fallback to small refill)
+// Allows runtime gate HAKMEM_TINY_REFILL_BATCH=1 without requiring a rebuild.
+#ifndef HAKMEM_TINY_P0_BATCH_REFILL +int sll_refill_small_from_ss(int class_idx, int max_take); +__attribute__((weak)) int sll_refill_batch_from_ss(int class_idx, int max_take) +{ + return sll_refill_small_from_ss(class_idx, max_take); +} +#endif + // ============================================================================ // EXTRACTED TO hakmem_tiny_lifecycle.inc (Phase 2D-3) // ============================================================================ diff --git a/core/hakmem_tiny.d b/core/hakmem_tiny.d index de68eb64..930277f9 100644 --- a/core/hakmem_tiny.d +++ b/core/hakmem_tiny.d @@ -21,24 +21,28 @@ core/hakmem_tiny.o: core/hakmem_tiny.c core/hakmem_tiny.h \ core/tiny_ready_bg.h core/tiny_route.h core/box/adopt_gate_box.h \ core/tiny_tls_guard.h core/hakmem_tiny_tls_list.h \ core/hakmem_tiny_bg_spill.h core/tiny_adaptive_sizing.h \ - core/tiny_system.h core/hakmem_prof.h core/tiny_publish.h \ - core/box/tls_sll_box.h core/box/../hakmem_tiny_config.h \ - core/box/../hakmem_build_flags.h core/box/../tiny_remote.h \ - core/box/../tiny_region_id.h core/box/../hakmem_build_flags.h \ - core/box/../tiny_box_geometry.h \ + core/tiny_system.h core/hakmem_prof.h core/front/quick_slot.h \ + core/front/../hakmem_tiny.h core/front/fast_cache.h \ + core/front/quick_slot.h core/front/../hakmem_tiny_fastcache.inc.h \ + core/front/../hakmem_tiny.h core/front/../tiny_remote.h \ + core/tiny_publish.h core/box/tls_sll_box.h \ + core/box/../hakmem_tiny_config.h core/box/../hakmem_build_flags.h \ + core/box/../tiny_remote.h core/box/../tiny_region_id.h \ + core/box/../hakmem_build_flags.h core/box/../tiny_box_geometry.h \ core/box/../hakmem_tiny_superslab_constants.h \ core/box/../hakmem_tiny_config.h core/box/../ptr_track.h \ core/box/../hakmem_tiny_integrity.h core/box/../ptr_track.h \ - core/box/../ptr_trace.h core/hakmem_tiny_hotmag.inc.h \ - core/hakmem_tiny_hot_pop.inc.h core/hakmem_tiny_fastcache.inc.h \ + core/box/../ptr_trace.h core/box/../tiny_debug_ring.h \ + core/hakmem_tiny_hotmag.inc.h core/hakmem_tiny_hot_pop.inc.h \ core/hakmem_tiny_refill.inc.h core/tiny_box_geometry.h \ + core/tiny_region_id.h core/refill/ss_refill_fc.h \ core/hakmem_tiny_ultra_front.inc.h core/hakmem_tiny_intel.inc \ core/hakmem_tiny_background.inc core/hakmem_tiny_bg_bin.inc.h \ core/hakmem_tiny_tls_ops.h core/hakmem_tiny_remote.inc \ core/hakmem_tiny_init.inc core/box/prewarm_box.h \ core/hakmem_tiny_bump.inc.h core/hakmem_tiny_smallmag.inc.h \ core/tiny_atomic.h core/tiny_alloc_fast.inc.h \ - core/tiny_alloc_fast_sfc.inc.h core/tiny_region_id.h \ + core/tiny_alloc_fast_sfc.inc.h core/hakmem_tiny_fastcache.inc.h \ core/tiny_alloc_fast_inline.h core/tiny_free_fast.inc.h \ core/hakmem_tiny_alloc.inc core/hakmem_tiny_slow.inc \ core/hakmem_tiny_free.inc core/box/free_publish_box.h core/mid_tcache.h \ @@ -102,6 +106,13 @@ core/hakmem_tiny_bg_spill.h: core/tiny_adaptive_sizing.h: core/tiny_system.h: core/hakmem_prof.h: +core/front/quick_slot.h: +core/front/../hakmem_tiny.h: +core/front/fast_cache.h: +core/front/quick_slot.h: +core/front/../hakmem_tiny_fastcache.inc.h: +core/front/../hakmem_tiny.h: +core/front/../tiny_remote.h: core/tiny_publish.h: core/box/tls_sll_box.h: core/box/../hakmem_tiny_config.h: @@ -116,11 +127,13 @@ core/box/../ptr_track.h: core/box/../hakmem_tiny_integrity.h: core/box/../ptr_track.h: core/box/../ptr_trace.h: +core/box/../tiny_debug_ring.h: core/hakmem_tiny_hotmag.inc.h: core/hakmem_tiny_hot_pop.inc.h: -core/hakmem_tiny_fastcache.inc.h: core/hakmem_tiny_refill.inc.h: 
core/tiny_box_geometry.h: +core/tiny_region_id.h: +core/refill/ss_refill_fc.h: core/hakmem_tiny_ultra_front.inc.h: core/hakmem_tiny_intel.inc: core/hakmem_tiny_background.inc: @@ -134,7 +147,7 @@ core/hakmem_tiny_smallmag.inc.h: core/tiny_atomic.h: core/tiny_alloc_fast.inc.h: core/tiny_alloc_fast_sfc.inc.h: -core/tiny_region_id.h: +core/hakmem_tiny_fastcache.inc.h: core/tiny_alloc_fast_inline.h: core/tiny_free_fast.inc.h: core/hakmem_tiny_alloc.inc: diff --git a/core/hakmem_tiny_fastcache.inc.h b/core/hakmem_tiny_fastcache.inc.h index a48b5193..bf734579 100644 --- a/core/hakmem_tiny_fastcache.inc.h +++ b/core/hakmem_tiny_fastcache.inc.h @@ -103,6 +103,19 @@ static inline __attribute__((always_inline)) void* tiny_fast_pop(int class_idx) } static inline __attribute__((always_inline)) int tiny_fast_push(int class_idx, void* ptr) { + // NEW: Check Front-Direct/SLL-OFF bypass (priority check before any work) + static __thread int s_front_direct_free = -1; + if (__builtin_expect(s_front_direct_free == -1, 0)) { + const char* e = getenv("HAKMEM_TINY_FRONT_DIRECT"); + s_front_direct_free = (e && *e && *e != '0') ? 1 : 0; + } + + // If Front-Direct OR SLL disabled, bypass tiny_fast (which uses TLS SLL) + extern int g_tls_sll_enable; + if (__builtin_expect(s_front_direct_free || !g_tls_sll_enable, 0)) { + return 0; // Bypass TLS SLL entirely → route to magazine/slow path + } + // ✅ CRITICAL FIX: Prevent sentinel-poisoned nodes from entering fast cache // Remote free operations can write SENTINEL to node->next, which eventually // propagates through freelist → TLS list → fast cache. If we push such a node, diff --git a/core/hakmem_tiny_free.inc b/core/hakmem_tiny_free.inc index eabf06e6..b8aef148 100644 --- a/core/hakmem_tiny_free.inc +++ b/core/hakmem_tiny_free.inc @@ -487,7 +487,14 @@ void hak_tiny_free(void* ptr) { if (fast_class_idx >= 0 && g_fast_enable && g_fast_cap[fast_class_idx] != 0) { // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header void* base2 = (void*)((uint8_t*)ptr - 1); - if (tiny_fast_push(fast_class_idx, base2)) { + // PRIORITY 1: Try FastCache first (bypasses SLL when Front-Direct) + int pushed = 0; + if (__builtin_expect(g_fastcache_enable && fast_class_idx <= 3, 1)) { + pushed = fastcache_push(fast_class_idx, base2); + } else { + pushed = tiny_fast_push(fast_class_idx, base2); + } + if (pushed) { tiny_debug_ring_record(TINY_RING_EVENT_FREE_FAST, (uint16_t)fast_class_idx, ptr, 0); HAK_STAT_FREE(fast_class_idx); return; diff --git a/core/hakmem_tiny_free.inc.bak b/core/hakmem_tiny_free.inc.bak deleted file mode 100644 index d2f2af2b..00000000 --- a/core/hakmem_tiny_free.inc.bak +++ /dev/null @@ -1,1711 +0,0 @@ -#include -#include "tiny_remote.h" -#include "slab_handle.h" -#include "tiny_refill.h" -#include "tiny_tls_guard.h" -#include "box/free_publish_box.h" -#include "mid_tcache.h" -extern __thread void* g_tls_sll_head[TINY_NUM_CLASSES]; -extern __thread uint32_t g_tls_sll_count[TINY_NUM_CLASSES]; -#if !HAKMEM_BUILD_RELEASE -#include "hakmem_tiny_magazine.h" -#endif -extern int g_tiny_force_remote; - -// ENV: HAKMEM_TINY_DRAIN_TO_SLL (0=off) — adopt/bind境界でfreelist→TLS SLLへN個スプライス -static inline int tiny_drain_to_sll_budget(void) { - static int v = -1; - if (__builtin_expect(v == -1, 0)) { - const char* s = getenv("HAKMEM_TINY_DRAIN_TO_SLL"); - int parsed = (s && *s) ? 
atoi(s) : 0; - if (parsed < 0) parsed = 0; if (parsed > 256) parsed = 256; - v = parsed; - } - return v; -} - -static inline void tiny_drain_freelist_to_sll_once(SuperSlab* ss, int slab_idx, int class_idx) { - int budget = tiny_drain_to_sll_budget(); - if (__builtin_expect(budget <= 0, 1)) return; - if (!(ss && ss->magic == SUPERSLAB_MAGIC)) return; - if (slab_idx < 0) return; - TinySlabMeta* m = &ss->slabs[slab_idx]; - int moved = 0; - while (m->freelist && moved < budget) { - void* p = m->freelist; - m->freelist = *(void**)p; - *(void**)p = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = p; - g_tls_sll_count[class_idx]++; - moved++; - } -} - -static inline int tiny_remote_queue_contains_guard(SuperSlab* ss, int slab_idx, void* target) { - if (!ss || slab_idx < 0) return 0; - uintptr_t cur = atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire); - int limit = 8192; - while (cur && limit-- > 0) { - if ((void*)cur == target) { - return 1; - } - uintptr_t next; - if (__builtin_expect(g_remote_side_enable, 0)) { - next = tiny_remote_side_get(ss, slab_idx, (void*)cur); - } else { - next = atomic_load_explicit((_Atomic uintptr_t*)cur, memory_order_relaxed); - } - cur = next; - } - if (limit <= 0) { - return 1; // fail-safe: treat unbounded traversal as duplicate - } - return 0; -} - - -// Phase 6.12.1: Free with pre-calculated slab (Option C - avoids duplicate lookup) -void hak_tiny_free_with_slab(void* ptr, TinySlab* slab) { - // Phase 7.6: slab == NULL means SuperSlab mode (Magazine integration) - if (!slab) { - // SuperSlab path: Get class_idx from SuperSlab - SuperSlab* ss = hak_super_lookup(ptr); - if (!ss || ss->magic != SUPERSLAB_MAGIC) return; - int class_idx = ss->size_class; - size_t ss_size = (size_t)1ULL << ss->lg_size; - uintptr_t ss_base = (uintptr_t)ss; - if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_SUPERSLAB_ADOPT_FAIL, (uint16_t)0xFFu, ss, (uintptr_t)ss->size_class); - return; - } - // Optional: cross-lookup TinySlab owner and detect class mismatch early - if (__builtin_expect(g_tiny_safe_free, 0)) { - TinySlab* ts = hak_tiny_owner_slab(ptr); - if (ts) { - int ts_cls = ts->class_idx; - if (ts_cls >= 0 && ts_cls < TINY_NUM_CLASSES && ts_cls != class_idx) { - uint32_t code = 0xAA00u | ((uint32_t)ts_cls & 0xFFu); - uintptr_t aux = tiny_remote_pack_diag(code, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)class_idx, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - } - } - } - tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, (uint16_t)class_idx, ptr, 0); - // Detect cross-thread: cross-thread free MUST go via superslab path - int slab_idx = slab_index_for(ss, ptr); - int ss_cap = ss_slabs_capacity(ss); - if (__builtin_expect(slab_idx < 0 || slab_idx >= ss_cap, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_SUPERSLAB_ADOPT_FAIL, (uint16_t)0xFEu, ss, (uintptr_t)slab_idx); - return; - } - TinySlabMeta* meta = &ss->slabs[slab_idx]; - if (__builtin_expect(g_tiny_safe_free, 0)) { - size_t blk = g_tiny_class_sizes[class_idx]; - uint8_t* base = tiny_slab_base_for(ss, slab_idx); - uintptr_t delta = (uintptr_t)ptr - (uintptr_t)base; - int cap_ok = (meta->capacity > 0) ? 
1 : 0; - int align_ok = (delta % blk) == 0; - int range_ok = cap_ok && (delta / blk) < meta->capacity; - if (!align_ok || !range_ok) { - uint32_t code = 0xA104u; - if (align_ok) code |= 0x2u; - if (range_ok) code |= 0x1u; - uintptr_t aux = tiny_remote_pack_diag(code, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)class_idx, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - } - uint32_t self_tid = tiny_self_u32(); - if (__builtin_expect(meta->owner_tid != self_tid, 0)) { - // route directly to superslab (remote queue / freelist) - uintptr_t ptr_val = (uintptr_t)ptr; - uintptr_t ss_base = (uintptr_t)ss; - size_t ss_size = (size_t)1ULL << ss->lg_size; - if (__builtin_expect(ptr_val < ss_base || ptr_val >= ss_base + ss_size, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_SUPERSLAB_ADOPT_FAIL, (uint16_t)0xFDu, ss, ptr_val); - return; - } - tiny_debug_ring_record(TINY_RING_EVENT_FREE_REMOTE, (uint16_t)class_idx, ss, (uintptr_t)ptr); - hak_tiny_free_superslab(ptr, ss); - HAK_STAT_FREE(class_idx); - return; - } - - // A/B: Force SS freelist path for same-thread frees (publish on first-free) - do { - static int g_free_to_ss2 = -1; - if (__builtin_expect(g_free_to_ss2 == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_FREE_TO_SS"); - g_free_to_ss2 = (e && *e && *e != '0') ? 1 : 0; // default OFF - } - if (g_free_to_ss2) { - hak_tiny_free_superslab(ptr, ss); - HAK_STAT_FREE(class_idx); - return; - } - } while (0); - - if (__builtin_expect(g_debug_fast0, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_FRONT_BYPASS, (uint16_t)class_idx, ptr, (uintptr_t)slab_idx); - void* prev = meta->freelist; - *(void**)ptr = prev; - meta->freelist = ptr; - meta->used--; - ss_active_dec_one(ss); - if (prev == NULL) { - ss_partial_publish((int)ss->size_class, ss); - } - tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, (uintptr_t)slab_idx); - HAK_STAT_FREE(class_idx); - return; - } - - if (g_fast_enable && g_fast_cap[class_idx] != 0) { - if (tiny_fast_push(class_idx, ptr)) { - tiny_debug_ring_record(TINY_RING_EVENT_FREE_FAST, (uint16_t)class_idx, ptr, slab_idx); - HAK_STAT_FREE(class_idx); - return; - } - } - - if (g_tls_list_enable) { - TinyTLSList* tls = &g_tls_lists[class_idx]; - uint32_t seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed); - if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) { - tiny_tls_refresh_params(class_idx, tls); - } - // TinyHotMag front push(8/16/32B, A/B) - if (__builtin_expect(g_hotmag_enable && class_idx <= 2, 1)) { - if (hotmag_push(class_idx, ptr)) { - tiny_debug_ring_record(TINY_RING_EVENT_FREE_RETURN_MAG, (uint16_t)class_idx, ptr, 1); - HAK_STAT_FREE(class_idx); - return; - } - } - if (tls->count < tls->cap) { - tiny_tls_list_guard_push(class_idx, tls, ptr); - tls_list_push(tls, ptr); - tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, 0); - HAK_STAT_FREE(class_idx); - return; - } - seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed); - if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) { - tiny_tls_refresh_params(class_idx, tls); - } - tiny_tls_list_guard_push(class_idx, tls, ptr); - tls_list_push(tls, ptr); - if (tls_list_should_spill(tls)) { - tls_list_spill_excess(class_idx, tls); - } - tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, 2); - HAK_STAT_FREE(class_idx); - return; - } - -#if !HAKMEM_BUILD_RELEASE - // SuperSlab uses Magazine for 
TLS caching (same as TinySlab) - tiny_small_mags_init_once(); - if (class_idx > 3) tiny_mag_init_if_needed(class_idx); - TinyTLSMag* mag = &g_tls_mags[class_idx]; - int cap = mag->cap; - - // 32/64B: SLL優先(mag優先は無効化) - // Prefer TinyQuickSlot (compile-out if HAKMEM_TINY_NO_QUICK) -#if !defined(HAKMEM_TINY_NO_QUICK) - if (g_quick_enable && class_idx <= 4) { - TinyQuickSlot* qs = &g_tls_quick[class_idx]; - if (__builtin_expect(qs->top < QUICK_CAP, 1)) { - qs->items[qs->top++] = ptr; - HAK_STAT_FREE(class_idx); - return; - } - } -#endif - - // Fast path: TLS SLL push for hottest classes - if (!g_tls_list_enable && g_tls_sll_enable && g_tls_sll_count[class_idx] < sll_cap_for_class(class_idx, (uint32_t)cap)) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - // BUGFIX: Decrement used counter (was missing, causing Fail-Fast on next free) - meta->used--; - // Active → Inactive: count down immediately (TLS保管中は"使用中"ではない) - ss_active_dec_one(ss); - HAK_TP1(sll_push, class_idx); - tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, 3); - HAK_STAT_FREE(class_idx); - return; - } - - // Next: Magazine push(必要ならmag→SLLへバルク転送で空きを作る) - // Hysteresis: allow slight overfill before deciding to spill under lock - if (mag->top >= cap && g_spill_hyst > 0) { - (void)bulk_mag_to_sll_if_room(class_idx, mag, cap / 2); - } - if (mag->top < cap + g_spill_hyst) { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = NULL; // SuperSlab owner not a TinySlab; leave NULL -#endif - mag->top++; -#if HAKMEM_DEBUG_COUNTERS - g_magazine_push_count++; // Phase 7.6: Track pushes -#endif - // Active → Inactive: decrement now(アプリ解放時に非アクティブ扱い) - ss_active_dec_one(ss); - HAK_TP1(mag_push, class_idx); - tiny_debug_ring_record(TINY_RING_EVENT_FREE_RETURN_MAG, (uint16_t)class_idx, ptr, 2); - HAK_STAT_FREE(class_idx); - return; - } - - // Background spill: queue to BG thread instead of locking (when enabled) - if (g_bg_spill_enable) { - uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed); - if ((int)qlen < g_bg_spill_target) { - // Build a small chain: include current ptr and pop from mag up to limit - int limit = g_bg_spill_max_batch; - if (limit > cap/2) limit = cap/2; - if (limit > 32) limit = 32; // keep free-path bounded - void* head = ptr; - *(void**)head = NULL; - void* tail = head; // current tail - int taken = 1; - while (taken < limit && mag->top > 0) { - void* p2 = mag->items[--mag->top].ptr; - *(void**)p2 = head; - head = p2; - taken++; - } - // Push chain to spill queue (single CAS) - bg_spill_push_chain(class_idx, head, tail, taken); - tiny_debug_ring_record(TINY_RING_EVENT_FREE_RETURN_MAG, (uint16_t)class_idx, ptr, 3); - HAK_STAT_FREE(class_idx); - return; - } - } - - // Spill half (SuperSlab version - simpler than TinySlab) - pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m; - hkm_prof_begin(NULL); - pthread_mutex_lock(lock); - // Batch spill: reduce lock frequency and work per call - int spill = cap / 2; - int over = mag->top - (cap + g_spill_hyst); - if (over > 0 && over < spill) spill = over; - - for (int i = 0; i < spill && mag->top > 0; i++) { - TinyMagItem it = mag->items[--mag->top]; - - // Phase 7.6: SuperSlab spill - return to freelist - SuperSlab* owner_ss = hak_super_lookup(it.ptr); - if (owner_ss && owner_ss->magic == SUPERSLAB_MAGIC) { - // Direct freelist push (same as old hak_tiny_free_superslab) - int slab_idx = slab_index_for(owner_ss, 
it.ptr); - // BUGFIX: Validate slab_idx before array access (prevents OOB) - if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(owner_ss)) { - continue; // Skip invalid index - } - TinySlabMeta* meta = &owner_ss->slabs[slab_idx]; - *(void**)it.ptr = meta->freelist; - meta->freelist = it.ptr; - meta->used--; - // Decrement SuperSlab active counter (spill returns blocks to SS) - ss_active_dec_one(owner_ss); - - // Phase 8.4: Empty SuperSlab detection (will use meta->used scan) - // TODO: Implement scan-based empty detection - // Empty SuperSlab detection/munmapは別途フラッシュAPIで実施(ホットパスから除外) - } - } - - pthread_mutex_unlock(lock); - hkm_prof_end(ss_time, HKP_TINY_SPILL, &tss); - - // Adaptive increase of cap after spill - int max_cap = tiny_cap_max_for_class(class_idx); - if (mag->cap < max_cap) { - int new_cap = mag->cap + (mag->cap / 2); - if (new_cap > max_cap) new_cap = max_cap; - if (new_cap > TINY_TLS_MAG_CAP) new_cap = TINY_TLS_MAG_CAP; - mag->cap = new_cap; - } - - // Finally, try FastCache push first (≤128B) — compile-out if HAKMEM_TINY_NO_FRONT_CACHE -#if !defined(HAKMEM_TINY_NO_FRONT_CACHE) - if (g_fastcache_enable && class_idx <= 4) { - if (fastcache_push(class_idx, ptr)) { - HAK_TP1(front_push, class_idx); - HAK_STAT_FREE(class_idx); - return; - } - } -#endif - // Then TLS SLL if room, else magazine - if (g_tls_sll_enable && g_tls_sll_count[class_idx] < sll_cap_for_class(class_idx, (uint32_t)mag->cap)) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - } else { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = slab; -#endif - mag->top++; - } - -#if HAKMEM_DEBUG_COUNTERS - g_magazine_push_count++; // Phase 7.6: Track pushes -#endif - HAK_STAT_FREE(class_idx); - return; -#endif // HAKMEM_BUILD_RELEASE - } - - // Phase 7.6: TinySlab path (original) - //g_tiny_free_with_slab_count++; // Phase 7.6: Track calls - DISABLED due to segfault - // Same-thread → TLS magazine; remote-thread → MPSC stack - if (pthread_equal(slab->owner_tid, tiny_self_pt())) { - int class_idx = slab->class_idx; - - if (g_tls_list_enable) { - TinyTLSList* tls = &g_tls_lists[class_idx]; - uint32_t seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed); - if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) { - tiny_tls_refresh_params(class_idx, tls); - } - // TinyHotMag front push(8/16/32B, A/B) - if (__builtin_expect(g_hotmag_enable && class_idx <= 2, 1)) { - if (hotmag_push(class_idx, ptr)) { - HAK_STAT_FREE(class_idx); - return; - } - } - if (tls->count < tls->cap) { - tiny_tls_list_guard_push(class_idx, tls, ptr); - tls_list_push(tls, ptr); - HAK_STAT_FREE(class_idx); - return; - } - seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed); - if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) { - tiny_tls_refresh_params(class_idx, tls); - } - tiny_tls_list_guard_push(class_idx, tls, ptr); - tls_list_push(tls, ptr); - if (tls_list_should_spill(tls)) { - tls_list_spill_excess(class_idx, tls); - } - HAK_STAT_FREE(class_idx); - return; - } - - tiny_mag_init_if_needed(class_idx); - TinyTLSMag* mag = &g_tls_mags[class_idx]; - int cap = mag->cap; - // 32/64B: SLL優先(mag優先は無効化) - // Fast path: FastCache push (preferred for ≤128B), then TLS SLL - if (g_fastcache_enable && class_idx <= 4) { - if (fastcache_push(class_idx, ptr)) { - HAK_STAT_FREE(class_idx); - return; - } - } - // Fast path: TLS SLL push (preferred) - if (!g_tls_list_enable && g_tls_sll_enable 
&& class_idx <= 5) { - uint32_t sll_cap = sll_cap_for_class(class_idx, (uint32_t)cap); - if (g_tls_sll_count[class_idx] < sll_cap) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - HAK_STAT_FREE(class_idx); - return; - } - } - // Next: if magazine has room, push immediately and return(満杯ならmag→SLLへバルク) - if (mag->top >= cap) { - (void)bulk_mag_to_sll_if_room(class_idx, mag, cap / 2); - } - // Remote-drain can be handled opportunistically on future calls. - if (mag->top < cap) { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = slab; -#endif - mag->top++; - -#if HAKMEM_DEBUG_COUNTERS - g_magazine_push_count++; // Phase 7.6: Track pushes -#endif - // Note: SuperSlab uses separate path (slab == NULL branch above) - HAK_STAT_FREE(class_idx); // Phase 3 - return; - } - // Magazine full: before spilling, opportunistically drain remotes once under lock. - if (atomic_load_explicit(&slab->remote_count, memory_order_relaxed) >= (unsigned)g_remote_drain_thresh_per_class[class_idx] || atomic_load_explicit(&slab->remote_head, memory_order_acquire)) { - pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m; - pthread_mutex_lock(lock); - HAK_TP1(remote_drain, class_idx); - tiny_remote_drain_locked(slab); - pthread_mutex_unlock(lock); - } - // Spill half under class lock - pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m; - pthread_mutex_lock(lock); - int spill = cap / 2; - - // Phase 4.2: High-water threshold for gating Phase 4 logic - int high_water = (cap * 3) / 4; // 75% of capacity - - for (int i = 0; i < spill && mag->top > 0; i++) { - TinyMagItem it = mag->items[--mag->top]; - - // Phase 7.6: Check for SuperSlab first (mixed Magazine support) - SuperSlab* ss_owner = hak_super_lookup(it.ptr); - if (ss_owner && ss_owner->magic == SUPERSLAB_MAGIC) { - // SuperSlab spill - return to freelist - int slab_idx = slab_index_for(ss_owner, it.ptr); - // BUGFIX: Validate slab_idx before array access (prevents OOB) - if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss_owner)) { - HAK_STAT_FREE(class_idx); - continue; // Skip invalid index - } - TinySlabMeta* meta = &ss_owner->slabs[slab_idx]; - *(void**)it.ptr = meta->freelist; - meta->freelist = it.ptr; - meta->used--; - // 空SuperSlab処理はフラッシュ/バックグラウンドで対応(ホットパス除外) - HAK_STAT_FREE(class_idx); - continue; // Skip TinySlab processing - } - - TinySlab* owner = -#if HAKMEM_TINY_MAG_OWNER - it.owner; -#else - NULL; -#endif - if (!owner) { - owner = tls_active_owner_for_ptr(class_idx, it.ptr); - } - if (!owner) { - owner = hak_tiny_owner_slab(it.ptr); - } - if (!owner) continue; - - // Phase 4.2: Adaptive gating - skip Phase 4 when TLS Magazine is high-water - // Rationale: When mag->top >= 75%, next alloc will come from TLS anyway - // so pushing to mini-mag is wasted work - int is_high_water = (mag->top >= high_water); - - if (!is_high_water) { - // Low-water: Phase 4.1 logic (try mini-magazine first) - uint8_t cidx = owner->class_idx; // Option A: 1回だけ読む - TinySlab* tls_a = g_tls_active_slab_a[cidx]; - TinySlab* tls_b = g_tls_active_slab_b[cidx]; - - // Option B: Branch prediction hint (spill → TLS-active への戻りが likely) - if (__builtin_expect((owner == tls_a || owner == tls_b) && - !mini_mag_is_full(&owner->mini_mag), 1)) { - // Fast path: mini-magazineに戻す(bitmap触らない) - mini_mag_push(&owner->mini_mag, it.ptr); - HAK_TP1(spill_tiny, cidx); - HAK_STAT_FREE(cidx); - continue; // bitmap操作スキップ - } - } - // High-water or Phase 4.1 mini-mag full: fall 
through to bitmap - - // Slow path: bitmap直接書き込み(既存ロジック) - size_t bs = g_tiny_class_sizes[owner->class_idx]; - int idx = ((uintptr_t)it.ptr - (uintptr_t)owner->base) / bs; - if (hak_tiny_is_used(owner, idx)) { - hak_tiny_set_free(owner, idx); - int was_full = (owner->free_count == 0); - owner->free_count++; - if (was_full) move_to_free_list(owner->class_idx, owner); - if (owner->free_count == owner->total_count) { - // If this slab is TLS-active for this thread, clear the pointer before releasing - if (g_tls_active_slab_a[owner->class_idx] == owner) g_tls_active_slab_a[owner->class_idx] = NULL; - if (g_tls_active_slab_b[owner->class_idx] == owner) g_tls_active_slab_b[owner->class_idx] = NULL; - TinySlab** headp = &g_tiny_pool.free_slabs[owner->class_idx]; - TinySlab* prev = NULL; - for (TinySlab* s = *headp; s; prev = s, s = s->next) { - if (s == owner) { if (prev) prev->next = s->next; else *headp = s->next; break; } - } - release_slab(owner); - } - HAK_TP1(spill_tiny, owner->class_idx); - HAK_STAT_FREE(owner->class_idx); - } - } - pthread_mutex_unlock(lock); - hkm_prof_end(ss, HKP_TINY_SPILL, &tss); - // Adaptive increase of cap after spill - int max_cap = tiny_cap_max_for_class(class_idx); - if (mag->cap < max_cap) { - int new_cap = mag->cap + (mag->cap / 2); - if (new_cap > max_cap) new_cap = max_cap; - if (new_cap > TINY_TLS_MAG_CAP) new_cap = TINY_TLS_MAG_CAP; - mag->cap = new_cap; - } - // Finally: prefer TinyQuickSlot → SLL → UltraFront → HotMag → Magazine(順序で局所性を確保) -#if !HAKMEM_BUILD_RELEASE && !defined(HAKMEM_TINY_NO_QUICK) - if (g_quick_enable && class_idx <= 4) { - TinyQuickSlot* qs = &g_tls_quick[class_idx]; - if (__builtin_expect(qs->top < QUICK_CAP, 1)) { - qs->items[qs->top++] = ptr; - } else if (g_tls_sll_enable) { - uint32_t sll_cap2 = sll_cap_for_class(class_idx, (uint32_t)mag->cap); - if (g_tls_sll_count[class_idx] < sll_cap2) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - } else if (!tiny_optional_push(class_idx, ptr)) { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = slab; -#endif - mag->top++; - } - } else { - if (!tiny_optional_push(class_idx, ptr)) { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = slab; -#endif - mag->top++; - } - } - } else -#endif - { - if (g_tls_sll_enable && class_idx <= 5) { - uint32_t sll_cap2 = sll_cap_for_class(class_idx, (uint32_t)mag->cap); - if (g_tls_sll_count[class_idx] < sll_cap2) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - } else if (!tiny_optional_push(class_idx, ptr)) { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = slab; -#endif - mag->top++; - } - } else { - if (!tiny_optional_push(class_idx, ptr)) { - mag->items[mag->top].ptr = ptr; -#if HAKMEM_TINY_MAG_OWNER - mag->items[mag->top].owner = slab; -#endif - mag->top++; - } - } - } - -#if HAKMEM_DEBUG_COUNTERS - g_magazine_push_count++; // Phase 7.6: Track pushes -#endif - // Note: SuperSlab uses separate path (slab == NULL branch above) - HAK_STAT_FREE(class_idx); // Phase 3 - return; - } else { - tiny_remote_push(slab, ptr); - } -} - -// ============================================================================ -// Phase 6.23: SuperSlab Allocation Helpers -// ============================================================================ - -// Phase 6.24: Allocate from SuperSlab slab (lazy freelist + linear 
allocation) -static inline void* superslab_alloc_from_slab(SuperSlab* ss, int slab_idx) { - TinySlabMeta* meta = &ss->slabs[slab_idx]; - - // Ensure remote queue is drained before handing blocks back to TLS - if (atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire) != 0) { - uint32_t self_tid = tiny_self_u32(); - SlabHandle h = slab_try_acquire(ss, slab_idx, self_tid); - if (slab_is_valid(&h)) { - slab_drain_remote_full(&h); - int pending = atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire) != 0; - if (__builtin_expect(pending, 0)) { - if (__builtin_expect(g_debug_remote_guard, 0)) { - uintptr_t head = atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_relaxed); - tiny_remote_watch_note("alloc_pending_remote", - ss, - slab_idx, - (void*)head, - 0xA243u, - self_tid, - 0); - } - slab_release(&h); - return NULL; - } - slab_release(&h); - } else { - if (__builtin_expect(g_debug_remote_guard, 0)) { - tiny_remote_watch_note("alloc_acquire_fail", - ss, - slab_idx, - meta, - 0xA244u, - self_tid, - 0); - } - return NULL; - } - } - - if (__builtin_expect(g_debug_remote_guard, 0)) { - uintptr_t head_pending = atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire); - if (head_pending != 0) { - tiny_remote_watch_note("alloc_remote_pending", - ss, - slab_idx, - (void*)head_pending, - 0xA247u, - tiny_self_u32(), - 1); - return NULL; - } - } - - // Phase 6.24: Linear allocation mode (freelist == NULL) - // This avoids the 4000-8000 cycle cost of building freelist on init - if (meta->freelist == NULL && meta->used < meta->capacity) { - // Linear allocation: sequential memory access (cache-friendly!) - size_t block_size = g_tiny_class_sizes[ss->size_class]; - void* slab_start = slab_data_start(ss, slab_idx); - - // First slab: skip SuperSlab header - if (slab_idx == 0) { - slab_start = (char*)slab_start + 1024; - } - - void* block = (char*)slab_start + (meta->used * block_size); - meta->used++; - tiny_remote_track_on_alloc(ss, slab_idx, block, "linear_alloc", 0); - tiny_remote_assert_not_remote(ss, slab_idx, block, "linear_alloc_ret", 0); - return block; // Fast path: O(1) pointer arithmetic - } - - // Freelist mode (after first free()) - if (meta->freelist) { - void* block = meta->freelist; - meta->freelist = *(void**)block; // Pop from freelist - meta->used++; - tiny_remote_track_on_alloc(ss, slab_idx, block, "freelist_alloc", 0); - tiny_remote_assert_not_remote(ss, slab_idx, block, "freelist_alloc_ret", 0); - return block; - } - - return NULL; // Slab is full -} - -// Phase 6.24 & 7.6: Refill TLS SuperSlab (with unified TLS cache + deferred allocation) -static SuperSlab* superslab_refill(int class_idx) { -#if HAKMEM_DEBUG_COUNTERS - g_superslab_refill_calls_dbg[class_idx]++; -#endif - TinyTLSSlab* tls = &g_tls_slabs[class_idx]; - static int g_ss_adopt_en = -1; // env: HAKMEM_TINY_SS_ADOPT=1; default auto-on if remote seen - if (g_ss_adopt_en == -1) { - char* e = getenv("HAKMEM_TINY_SS_ADOPT"); - if (e) { - g_ss_adopt_en = (*e != '0') ? 1 : 0; - } else { - extern _Atomic int g_ss_remote_seen; - g_ss_adopt_en = (atomic_load_explicit(&g_ss_remote_seen, memory_order_relaxed) != 0) ? 1 : 0; - } - } - extern int g_adopt_cool_period; - extern __thread int g_tls_adopt_cd[]; - if (g_adopt_cool_period == -1) { - char* cd = getenv("HAKMEM_TINY_SS_ADOPT_COOLDOWN"); - int v = (cd ? 
atoi(cd) : 0); - if (v < 0) v = 0; if (v > 1024) v = 1024; - g_adopt_cool_period = v; - } - - static int g_superslab_refill_debug_once = 0; - SuperSlab* prev_ss = tls->ss; - TinySlabMeta* prev_meta = tls->meta; - uint8_t prev_slab_idx = tls->slab_idx; - uint8_t prev_active = prev_ss ? prev_ss->active_slabs : 0; - uint32_t prev_bitmap = prev_ss ? prev_ss->slab_bitmap : 0; - uint32_t prev_meta_used = prev_meta ? prev_meta->used : 0; - uint32_t prev_meta_cap = prev_meta ? prev_meta->capacity : 0; - int free_idx_attempted = -2; // -2 = not evaluated, -1 = none, >=0 = chosen - int reused_slabs = 0; - - // Optional: Mid-size simple refill to avoid multi-layer scans (class>=4) - do { - static int g_mid_simple_warn = 0; - if (class_idx >= 4 && tiny_mid_refill_simple_enabled()) { - // If current TLS has a SuperSlab, prefer taking a virgin slab directly - if (tls->ss) { - int tls_cap = ss_slabs_capacity(tls->ss); - if (tls->ss->active_slabs < tls_cap) { - int free_idx = superslab_find_free_slab(tls->ss); - if (free_idx >= 0) { - uint32_t my_tid = tiny_self_u32(); - superslab_init_slab(tls->ss, free_idx, g_tiny_class_sizes[class_idx], my_tid); - tiny_tls_bind_slab(tls, tls->ss, free_idx); - return tls->ss; - } - } - } - // Otherwise allocate a fresh SuperSlab and bind first slab - SuperSlab* ssn = superslab_allocate((uint8_t)class_idx); - if (!ssn) { - if (!g_superslab_refill_debug_once && g_mid_simple_warn < 2) { - g_mid_simple_warn++; - int err = errno; - fprintf(stderr, "[DEBUG] mid_simple_refill OOM class=%d errno=%d\n", class_idx, err); - } - return NULL; - } - uint32_t my_tid = tiny_self_u32(); - superslab_init_slab(ssn, 0, g_tiny_class_sizes[class_idx], my_tid); - SuperSlab* old = tls->ss; - tiny_tls_bind_slab(tls, ssn, 0); - superslab_ref_inc(ssn); - if (old && old != ssn) { superslab_ref_dec(old); } - return ssn; - } - } while (0); - - - // First, try to adopt a published partial SuperSlab for this class - if (g_ss_adopt_en) { - if (g_adopt_cool_period > 0) { - if (g_tls_adopt_cd[class_idx] > 0) { - g_tls_adopt_cd[class_idx]--; - } else { - // eligible to adopt - } - } - if (g_adopt_cool_period == 0 || g_tls_adopt_cd[class_idx] == 0) { - SuperSlab* adopt = ss_partial_adopt(class_idx); - if (adopt && adopt->magic == SUPERSLAB_MAGIC) { - // ======================================================================== - // Quick Win #2: First-Fit Adopt (vs Best-Fit scoring all 32 slabs) - // For Larson, any slab with freelist works - no need to score all 32! - // Expected improvement: -3,000 cycles (from 32 atomic loads + 32 scores) - // ======================================================================== - int adopt_cap = ss_slabs_capacity(adopt); - int best = -1; - for (int s = 0; s < adopt_cap; s++) { - TinySlabMeta* m = &adopt->slabs[s]; - // Quick check: Does this slab have a freelist? - if (m->freelist) { - // Yes! Try to acquire it immediately (first-fit) - best = s; - break; // ✅ OPTIMIZATION: Stop at first slab with freelist! 
- } - // Optional: Also check remote_heads if we want to prioritize those - // (But for Larson, freelist is sufficient) - } - if (best >= 0) { - // Box: Try to acquire ownership atomically - uint32_t self = tiny_self_u32(); - SlabHandle h = slab_try_acquire(adopt, best, self); - if (slab_is_valid(&h)) { - slab_drain_remote_full(&h); - if (slab_remote_pending(&h)) { - if (__builtin_expect(g_debug_remote_guard, 0)) { - uintptr_t head = atomic_load_explicit(&h.ss->remote_heads[h.slab_idx], memory_order_relaxed); - tiny_remote_watch_note("adopt_remote_pending", - h.ss, - h.slab_idx, - (void*)head, - 0xA255u, - self, - 0); - } - // Remote still pending; give up adopt path and fall through to normal refill. - slab_release(&h); - } - - // Box 4 Boundary: bind は remote_head==0 を保証する必要がある - // slab_is_safe_to_bind() で TOCTOU-safe にチェック - if (slab_is_safe_to_bind(&h)) { - // Optional: move a few nodes to Front SLL to boost next hits - tiny_drain_freelist_to_sll_once(h.ss, h.slab_idx, class_idx); - // 安全に bind 可能(freelist 存在 && remote_head==0 保証) - tiny_tls_bind_slab(tls, h.ss, h.slab_idx); - if (g_adopt_cool_period > 0) { - g_tls_adopt_cd[class_idx] = g_adopt_cool_period; - } - return h.ss; - } - // Safe to bind 失敗(freelist なしor remote pending)→ adopt 中止 - slab_release(&h); - } - // Failed to acquire or no freelist - continue searching - } - // If no freelist found, ignore and continue (optional: republish) - } - } - } - - // Phase 7.6 Step 4: Check existing SuperSlab with priority order - if (tls->ss) { - // Priority 1: Reuse slabs with freelist (already freed blocks) - int tls_cap = ss_slabs_capacity(tls->ss); - uint32_t nonempty_mask = 0; - do { - static int g_mask_en = -1; - if (__builtin_expect(g_mask_en == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_FREELIST_MASK"); - g_mask_en = (e && *e && *e != '0') ? 1 : 0; - } - if (__builtin_expect(g_mask_en, 0)) { - nonempty_mask = atomic_load_explicit(&tls->ss->freelist_mask, memory_order_acquire); - break; - } - for (int i = 0; i < tls_cap; i++) { - if (tls->ss->slabs[i].freelist) nonempty_mask |= (1u << i); - } - } while (0); - - // O(1) lookup: scan mask with ctz (1 instruction!) - while (__builtin_expect(nonempty_mask != 0, 1)) { - int i = __builtin_ctz(nonempty_mask); // Find first non-empty slab (O(1)) - nonempty_mask &= ~(1u << i); // Clear bit for next iteration - - // FIX #1 DELETED (Race condition fix): - // Previous drain without ownership caused concurrent freelist corruption. - // Ownership protocol: MUST bind+owner_cas BEFORE drain (see Fix #3 in tiny_refill.h). - // Remote frees will be drained when the slab is adopted (see tiny_refill.h paths). 
- - uint32_t self_tid = tiny_self_u32(); - SlabHandle h = slab_try_acquire(tls->ss, i, self_tid); - if (slab_is_valid(&h)) { - if (slab_remote_pending(&h)) { - slab_drain_remote_full(&h); - if (__builtin_expect(g_debug_remote_guard, 0)) { - uintptr_t head = atomic_load_explicit(&h.ss->remote_heads[h.slab_idx], memory_order_relaxed); - tiny_remote_watch_note("reuse_remote_pending", - h.ss, - h.slab_idx, - (void*)head, - 0xA254u, - self_tid, - 0); - } - slab_release(&h); - continue; - } - // Box 4 Boundary: bind は remote_head==0 を保証する必要がある - if (slab_is_safe_to_bind(&h)) { - // Optional: move a few nodes to Front SLL to boost next hits - tiny_drain_freelist_to_sll_once(h.ss, h.slab_idx, class_idx); - reused_slabs = 1; - tiny_tls_bind_slab(tls, h.ss, h.slab_idx); - return h.ss; - } - // Safe to bind 失敗 → 次の slab を試す - slab_release(&h); - } - } - - // Priority 2: Use unused slabs (virgin slabs) - if (tls->ss->active_slabs < tls_cap) { - // Find next free slab - int free_idx = superslab_find_free_slab(tls->ss); - free_idx_attempted = free_idx; - if (free_idx >= 0) { - // Initialize this slab - uint32_t my_tid = tiny_self_u32(); - superslab_init_slab(tls->ss, free_idx, g_tiny_class_sizes[class_idx], my_tid); - - // Update TLS cache (unified update) - tiny_tls_bind_slab(tls, tls->ss, free_idx); - - return tls->ss; - } - } - } - - // Try to adopt a partial SuperSlab from registry (one-shot, cheap scan) - // This reduces pressure to allocate new SS when other threads freed blocks. - // Phase 6: Registry Optimization - Use per-class registry for O(class_size) scan - if (!tls->ss) { - // Phase 6: Use per-class registry (262K → ~10-100 entries per class!) - extern SuperSlab* g_super_reg_by_class[TINY_NUM_CLASSES][SUPER_REG_PER_CLASS]; - extern int g_super_reg_class_size[TINY_NUM_CLASSES]; - - const int scan_max = tiny_reg_scan_max(); - int reg_size = g_super_reg_class_size[class_idx]; - int scan_limit = (scan_max < reg_size) ? scan_max : reg_size; - - for (int i = 0; i < scan_limit; i++) { - SuperSlab* ss = g_super_reg_by_class[class_idx][i]; - if (!ss || ss->magic != SUPERSLAB_MAGIC) continue; - // Note: class_idx check is not needed (per-class registry!) 
- - // Pick first slab with freelist (Box 4: 所有権取得 + remote check) - int reg_cap = ss_slabs_capacity(ss); - uint32_t self_tid = tiny_self_u32(); - for (int s = 0; s < reg_cap; s++) { - if (ss->slabs[s].freelist) { - SlabHandle h = slab_try_acquire(ss, s, self_tid); - if (slab_is_valid(&h)) { - slab_drain_remote_full(&h); - if (slab_is_safe_to_bind(&h)) { - tiny_drain_freelist_to_sll_once(h.ss, h.slab_idx, class_idx); - tiny_tls_bind_slab(tls, ss, s); - return ss; - } - slab_release(&h); - } - } - } - } - } - - // Must-adopt-before-mmap gate: attempt sticky/hot/bench/mailbox/registry small-window - { - SuperSlab* gate_ss = tiny_must_adopt_gate(class_idx, tls); - if (gate_ss) return gate_ss; - } - - // Allocate new SuperSlab - SuperSlab* ss = superslab_allocate((uint8_t)class_idx); - if (!ss) { - if (!g_superslab_refill_debug_once) { - g_superslab_refill_debug_once = 1; - int err = errno; - fprintf(stderr, - "[DEBUG] superslab_refill NULL detail: class=%d prev_ss=%p active=%u bitmap=0x%08x prev_meta=%p used=%u cap=%u slab_idx=%u reused_freelist=%d free_idx=%d errno=%d\n", - class_idx, - (void*)prev_ss, - (unsigned)prev_active, - prev_bitmap, - (void*)prev_meta, - (unsigned)prev_meta_used, - (unsigned)prev_meta_cap, - (unsigned)prev_slab_idx, - reused_slabs, - free_idx_attempted, - err); - } - return NULL; // OOM - } - - // Initialize first slab - uint32_t my_tid = tiny_self_u32(); - superslab_init_slab(ss, 0, g_tiny_class_sizes[class_idx], my_tid); - - // Cache in unified TLS(前のSS参照を解放) - SuperSlab* old = tls->ss; - tiny_tls_bind_slab(tls, ss, 0); - // Maintain refcount(将来の空回収に備え、TLS参照をカウント) - superslab_ref_inc(ss); - if (old && old != ss) { - superslab_ref_dec(old); - } - - return ss; -} - -// Phase 6.24: SuperSlab-based allocation (TLS unified, Medium fix) -static inline void* hak_tiny_alloc_superslab(int class_idx) { - // DEBUG: Function entry trace (gated to avoid ring spam) - do { - static int g_alloc_ring = -1; - if (__builtin_expect(g_alloc_ring == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_ALLOC_RING"); - g_alloc_ring = (e && *e && *e != '0') ? 1 : 0; - } - if (g_alloc_ring) { - tiny_debug_ring_record(TINY_RING_EVENT_ALLOC_ENTER, 0x01, (void*)(uintptr_t)class_idx, 0); - } - } while (0); - - // MidTC fast path: 128..1024B(class>=4)はTLS tcacheを最優先 - do { - void* mp = midtc_pop(class_idx); - if (mp) { - HAK_RET_ALLOC(class_idx, mp); - } - } while (0); - - // Phase 6.24: 1 TLS read (down from 3) - TinyTLSSlab* tls = &g_tls_slabs[class_idx]; - - TinySlabMeta* meta = tls->meta; - int slab_idx = tls->slab_idx; - if (meta && slab_idx >= 0 && tls->ss) { - // A/B: Relaxed read for remote head presence check - static int g_alloc_remote_relax = -1; // env: HAKMEM_TINY_ALLOC_REMOTE_RELAX=1 → relaxed - if (__builtin_expect(g_alloc_remote_relax == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_ALLOC_REMOTE_RELAX"); - g_alloc_remote_relax = (e && *e && *e != '0') ? 1 : 0; - } - uintptr_t pending = atomic_load_explicit(&tls->ss->remote_heads[slab_idx], - g_alloc_remote_relax ? memory_order_relaxed - : memory_order_acquire); - if (__builtin_expect(pending != 0, 0)) { - uint32_t self_tid = tiny_self_u32(); - if (ss_owner_try_acquire(meta, self_tid)) { - _ss_remote_drain_to_freelist_unsafe(tls->ss, slab_idx, meta); - } - } - } - - // FIX #2 DELETED (Race condition fix): - // Previous drain-all-slabs without ownership caused concurrent freelist corruption. - // Problem: Thread A owns slab 5, Thread B drains all slabs including 5 → both modify freelist → crash. 
- // Ownership protocol: MUST bind+owner_cas BEFORE drain (see Fix #3 in tiny_refill.h). - // Remote frees will be drained when the slab is adopted via refill paths. - - // Fast path: Direct metadata access (no repeated TLS reads!) - if (meta && meta->freelist == NULL && meta->used < meta->capacity && tls->slab_base) { - // Linear allocation (lazy init) - size_t block_size = g_tiny_class_sizes[tls->ss->size_class]; - void* block = (void*)(tls->slab_base + ((size_t)meta->used * block_size)); - meta->used++; - // Track active blocks in SuperSlab for conservative reclamation - ss_active_inc(tls->ss); - // Route: slab linear - ROUTE_MARK(11); ROUTE_COMMIT(class_idx, 0x60); - HAK_RET_ALLOC(class_idx, block); // Phase 8.4: Zero hot-path overhead - } - - if (meta && meta->freelist) { - // Freelist allocation - void* block = meta->freelist; - // Safety: bounds/alignment check (debug) - if (__builtin_expect(g_tiny_safe_free, 0)) { - size_t blk = g_tiny_class_sizes[tls->ss->size_class]; - uint8_t* base = tiny_slab_base_for(tls->ss, tls->slab_idx); - uintptr_t delta = (uintptr_t)block - (uintptr_t)base; - int align_ok = ((delta % blk) == 0); - int range_ok = (delta / blk) < meta->capacity; - if (!align_ok || !range_ok) { - uintptr_t info = ((uintptr_t)(align_ok ? 1u : 0u) << 32) | (uint32_t)(range_ok ? 1u : 0u); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)tls->ss->size_class, block, info | 0xA100u); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return NULL; } - return NULL; - } - } - void* next = *(void**)block; - meta->freelist = next; - meta->used++; - // Optional: clear freelist bit when becomes empty - do { - static int g_mask_en = -1; - if (__builtin_expect(g_mask_en == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_FREELIST_MASK"); - g_mask_en = (e && *e && *e != '0') ? 1 : 0; - } - if (__builtin_expect(g_mask_en, 0) && next == NULL) { - uint32_t bit = (1u << slab_idx); - atomic_fetch_and_explicit(&tls->ss->freelist_mask, ~bit, memory_order_release); - } - } while (0); - // Track active blocks in SuperSlab for conservative reclamation - ss_active_inc(tls->ss); - // Route: slab freelist - ROUTE_MARK(12); ROUTE_COMMIT(class_idx, 0x61); - HAK_RET_ALLOC(class_idx, block); // Phase 8.4: Zero hot-path overhead - } - - // Slow path: Refill TLS slab - SuperSlab* ss = superslab_refill(class_idx); - if (!ss) { - static int log_oom = 0; - if (log_oom < 2) { fprintf(stderr, "[DEBUG] superslab_refill returned NULL (OOM)\n"); log_oom++; } - return NULL; // OOM - } - - // Retry allocation (metadata already cached in superslab_refill) - meta = tls->meta; - - // DEBUG: Check each condition (disabled for benchmarks) - // static int log_retry = 0; - // if (log_retry < 2) { - // fprintf(stderr, "[DEBUG] Retry alloc: meta=%p, freelist=%p, used=%u, capacity=%u, slab_base=%p\n", - // (void*)meta, meta ? meta->freelist : NULL, - // meta ? meta->used : 0, meta ? 
meta->capacity : 0, - // (void*)tls->slab_base); - // log_retry++; - // } - - if (meta && meta->freelist == NULL && meta->used < meta->capacity && tls->slab_base) { - size_t block_size = g_tiny_class_sizes[ss->size_class]; - void* block = (void*)(tls->slab_base + ((size_t)meta->used * block_size)); - - // Disabled for benchmarks - // static int log_success = 0; - // if (log_success < 2) { - // fprintf(stderr, "[DEBUG] Superslab alloc SUCCESS: ptr=%p, class=%d, used=%u->%u\n", - // block, class_idx, meta->used, meta->used + 1); - // log_success++; - // } - - meta->used++; - // Track active blocks in SuperSlab for conservative reclamation - ss_active_inc(ss); - HAK_RET_ALLOC(class_idx, block); // Phase 8.4: Zero hot-path overhead - } - - // Disabled for benchmarks - // static int log_fail = 0; - // if (log_fail < 2) { - // fprintf(stderr, "[DEBUG] Retry alloc FAILED - returning NULL\n"); - // log_fail++; - // } - return NULL; -} - -// Phase 6.22-B: SuperSlab fast free path -static inline void hak_tiny_free_superslab(void* ptr, SuperSlab* ss) { - ROUTE_MARK(16); // free_enter - HAK_DBG_INC(g_superslab_free_count); // Phase 7.6: Track SuperSlab frees - // Get slab index (supports 1MB/2MB SuperSlabs) - int slab_idx = slab_index_for(ss, ptr); - size_t ss_size = (size_t)1ULL << ss->lg_size; - uintptr_t ss_base = (uintptr_t)ss; - if (__builtin_expect(slab_idx < 0, 0)) { - uintptr_t aux = tiny_remote_pack_diag(0xBAD1u, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - TinySlabMeta* meta = &ss->slabs[slab_idx]; - if (__builtin_expect(tiny_remote_watch_is(ptr), 0)) { - tiny_remote_watch_note("free_enter", ss, slab_idx, ptr, 0xA240u, tiny_self_u32(), 0); - extern __thread TinyTLSSlab g_tls_slabs[]; - tiny_alloc_dump_tls_state(ss->size_class, "watch_free_enter", &g_tls_slabs[ss->size_class]); -#if !HAKMEM_BUILD_RELEASE - extern __thread TinyTLSMag g_tls_mags[]; - TinyTLSMag* watch_mag = &g_tls_mags[ss->size_class]; - fprintf(stderr, - "[REMOTE_WATCH_MAG] cls=%u mag_top=%d cap=%d\n", - ss->size_class, - watch_mag->top, - watch_mag->cap); -#endif - } - // BUGFIX: Validate size_class before using as array index (prevents OOB) - if (__builtin_expect(ss->size_class < 0 || ss->size_class >= TINY_NUM_CLASSES, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, 0xF1, ptr, (uintptr_t)ss->size_class); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - if (__builtin_expect(g_tiny_safe_free, 0)) { - size_t blk = g_tiny_class_sizes[ss->size_class]; - uint8_t* base = tiny_slab_base_for(ss, slab_idx); - uintptr_t delta = (uintptr_t)ptr - (uintptr_t)base; - int cap_ok = (meta->capacity > 0) ? 
1 : 0; - int align_ok = (delta % blk) == 0; - int range_ok = cap_ok && (delta / blk) < meta->capacity; - if (!align_ok || !range_ok) { - uint32_t code = 0xA100u; - if (align_ok) code |= 0x2u; - if (range_ok) code |= 0x1u; - uintptr_t aux = tiny_remote_pack_diag(code, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - // Duplicate in freelist (best-effort scan up to 64) - void* scan = meta->freelist; int scanned = 0; int dup = 0; - while (scan && scanned < 64) { if (scan == ptr) { dup = 1; break; } scan = *(void**)scan; scanned++; } - if (dup) { - uintptr_t aux = tiny_remote_pack_diag(0xDFu, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - } - - // Phase 6.23: Same-thread check - uint32_t my_tid = tiny_self_u32(); - const int debug_guard = g_debug_remote_guard; - static __thread int g_debug_free_count = 0; - if (!g_tiny_force_remote && meta->owner_tid != 0 && meta->owner_tid == my_tid) { - ROUTE_MARK(17); // free_same_thread - // Fast path: Direct freelist push (same-thread) - if (0 && debug_guard && g_debug_free_count < 1) { - fprintf(stderr, "[FREE_SS] SAME-THREAD: owner=%u my=%u\n", - meta->owner_tid, my_tid); - g_debug_free_count++; - } - if (__builtin_expect(meta->used == 0, 0)) { - uintptr_t aux = tiny_remote_pack_diag(0x00u, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - tiny_remote_track_expect_alloc(ss, slab_idx, ptr, "local_free_enter", my_tid); - if (!tiny_remote_guard_allow_local_push(ss, slab_idx, meta, ptr, "local_free", my_tid)) { - #include "box/free_remote_box.h" - int transitioned = tiny_free_remote_box(ss, slab_idx, meta, ptr, my_tid); - if (transitioned) { - extern unsigned long long g_remote_free_transitions[]; - g_remote_free_transitions[ss->size_class]++; - // Free-side route: remote transition observed - do { - static int g_route_free = -1; if (__builtin_expect(g_route_free == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_ROUTE_FREE"); - g_route_free = (e && *e && *e != '0') ? 1 : 0; } - if (g_route_free) route_free_commit((int)ss->size_class, (1ull<<18), 0xE2); - } while (0); - } - return; - } - // Optional: MidTC (TLS tcache for 128..1024B) — allow bypass via env HAKMEM_TINY_FREE_TO_SS=1 - do { - static int g_free_to_ss = -1; - if (__builtin_expect(g_free_to_ss == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_FREE_TO_SS"); - g_free_to_ss = (e && *e && *e != '0') ? 
1 : 0; // default OFF - } - if (!g_free_to_ss) { - int cls = (int)ss->size_class; - if (midtc_enabled() && cls >= 4) { - if (midtc_push(cls, ptr)) { - // Treat as returned to TLS cache (not SS freelist) - meta->used--; - ss_active_dec_one(ss); - return; - } - } - } - } while (0); - - #include "box/free_local_box.h" - // Perform freelist push (+first-free publish if applicable) - void* prev_before = meta->freelist; - tiny_free_local_box(ss, slab_idx, meta, ptr, my_tid); - if (prev_before == NULL) { - ROUTE_MARK(19); // first_free_transition - extern unsigned long long g_first_free_transitions[]; - g_first_free_transitions[ss->size_class]++; - ROUTE_MARK(20); // mailbox_publish - // Free-side route commit (one-shot) - do { - static int g_route_free = -1; if (__builtin_expect(g_route_free == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_ROUTE_FREE"); - g_route_free = (e && *e && *e != '0') ? 1 : 0; } - int cls = (int)ss->size_class; - if (g_route_free) route_free_commit(cls, (1ull<<19) | (1ull<<20), 0xE1); - } while (0); - } - - if (__builtin_expect(debug_guard, 0)) { - fprintf(stderr, "[REMOTE_LOCAL] cls=%u slab=%d owner=%u my=%u ptr=%p prev=%p used=%u\n", - ss->size_class, slab_idx, meta->owner_tid, my_tid, ptr, prev_before, meta->used); - } - - // 空検出は別途(ホットパス除外) - } else { - ROUTE_MARK(18); // free_remote_transition - if (__builtin_expect(meta->owner_tid == my_tid && meta->owner_tid == 0, 0)) { - uintptr_t aux = tiny_remote_pack_diag(0xA300u, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (debug_guard) { - fprintf(stderr, "[REMOTE_OWNER_ZERO] cls=%u slab=%d ptr=%p my=%u used=%u\n", - ss->size_class, slab_idx, ptr, my_tid, (unsigned)meta->used); - } - } - tiny_remote_track_expect_alloc(ss, slab_idx, ptr, "remote_free_enter", my_tid); - // Slow path: Remote free (cross-thread) - if (0 && debug_guard && g_debug_free_count < 5) { - fprintf(stderr, "[FREE_SS] CROSS-THREAD: owner=%u my=%u slab_idx=%d\n", - meta->owner_tid, my_tid, slab_idx); - g_debug_free_count++; - } - if (__builtin_expect(g_tiny_safe_free, 0)) { - // Best-effort duplicate scan in remote stack (up to 64 nodes) - uintptr_t head = atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire); - uintptr_t base = ss_base; - int scanned = 0; int dup = 0; - uintptr_t cur = head; - while (cur && scanned < 64) { - if ((cur < base) || (cur >= base + ss_size)) { - uintptr_t aux = tiny_remote_pack_diag(0xA200u, base, ss_size, cur); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, (void*)cur, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - break; - } - if ((void*)cur == ptr) { dup = 1; break; } - if (__builtin_expect(g_remote_side_enable, 0)) { - if (!tiny_remote_sentinel_ok((void*)cur)) { - uintptr_t aux = tiny_remote_pack_diag(0xA202u, base, ss_size, cur); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, (void*)cur, aux); - uintptr_t observed = atomic_load_explicit((_Atomic uintptr_t*)(void*)cur, memory_order_relaxed); - tiny_remote_report_corruption("scan", (void*)cur, observed); - fprintf(stderr, - "[REMOTE_SENTINEL] cls=%u slab=%d cur=%p head=%p ptr=%p scanned=%d observed=0x%016" PRIxPTR " owner=%u used=%u freelist=%p remote_head=%p\n", - ss->size_class, - slab_idx, - (void*)cur, - (void*)head, - ptr, - scanned, - observed, - meta->owner_tid, - (unsigned)meta->used, - meta->freelist, - (void*)atomic_load_explicit(&ss->remote_heads[slab_idx], 
memory_order_relaxed)); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - break; - } - cur = tiny_remote_side_get(ss, slab_idx, (void*)cur); - } else { - if ((cur & (uintptr_t)(sizeof(void*) - 1)) != 0) { - uintptr_t aux = tiny_remote_pack_diag(0xA201u, base, ss_size, cur); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, (void*)cur, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - break; - } - cur = (uintptr_t)(*(void**)(void*)cur); - } - scanned++; - } - if (dup) { - uintptr_t aux = tiny_remote_pack_diag(0xD1u, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - } - if (__builtin_expect(meta->used == 0, 0)) { - uintptr_t aux = tiny_remote_pack_diag(0x01u, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - static int g_ss_adopt_en2 = -1; // env cached - if (g_ss_adopt_en2 == -1) { - char* e = getenv("HAKMEM_TINY_SS_ADOPT"); - // 既定: Remote Queueを使う(1)。env指定時のみ上書き。 - g_ss_adopt_en2 = (e == NULL) ? 1 : ((*e != '0') ? 1 : 0); - if (__builtin_expect(debug_guard, 0)) { - fprintf(stderr, "[FREE_SS] g_ss_adopt_en2=%d (env='%s')\n", g_ss_adopt_en2, e ? e : "(null)"); - } - } - if (g_ss_adopt_en2) { - // Use remote queue - uintptr_t head_word = __atomic_load_n((uintptr_t*)ptr, __ATOMIC_RELAXED); - if (debug_guard) fprintf(stderr, "[REMOTE_PUSH_CALL] cls=%u slab=%d owner=%u my=%u ptr=%p used=%u remote_count=%u head=%p word=0x%016" PRIxPTR "\n", - ss->size_class, - slab_idx, - meta->owner_tid, - my_tid, - ptr, - (unsigned)meta->used, - atomic_load_explicit(&ss->remote_counts[slab_idx], memory_order_relaxed), - (void*)atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_relaxed), - head_word); - int dup_remote = tiny_remote_queue_contains_guard(ss, slab_idx, ptr); - if (!dup_remote && __builtin_expect(g_remote_side_enable, 0)) { - dup_remote = (head_word == TINY_REMOTE_SENTINEL) || tiny_remote_side_contains(ss, slab_idx, ptr); - } - if (__builtin_expect(head_word == TINY_REMOTE_SENTINEL && !dup_remote && g_debug_remote_guard, 0)) { - tiny_remote_watch_note("dup_scan_miss", ss, slab_idx, ptr, 0xA215u, my_tid, 0); - } - if (dup_remote) { - uintptr_t aux = tiny_remote_pack_diag(0xA214u, ss_base, ss_size, (uintptr_t)ptr); - tiny_remote_watch_mark(ptr, "dup_prevent", my_tid); - tiny_remote_watch_note("dup_prevent", ss, slab_idx, ptr, 0xA214u, my_tid, 0); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - if (__builtin_expect(g_remote_side_enable && (head_word & 0xFFFFu) == 0x6261u, 0)) { - // TLS guard scribble detected on the node's first word → same-pointer double free across routes - uintptr_t aux = tiny_remote_pack_diag(0xA213u, ss_base, ss_size, (uintptr_t)ptr); - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)ss->size_class, ptr, aux); - tiny_remote_watch_mark(ptr, "pre_push", my_tid); - tiny_remote_watch_note("pre_push", ss, slab_idx, ptr, 0xA231u, my_tid, 0); - tiny_remote_report_corruption("pre_push", ptr, head_word); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - if (__builtin_expect(tiny_remote_watch_is(ptr), 0)) { - tiny_remote_watch_note("free_remote", ss, 
slab_idx, ptr, 0xA232u, my_tid, 0); - } - int was_empty = ss_remote_push(ss, slab_idx, ptr); - meta->used--; - ss_active_dec_one(ss); - if (was_empty) { - extern unsigned long long g_remote_free_transitions[]; - g_remote_free_transitions[ss->size_class]++; - ss_partial_publish((int)ss->size_class, ss); - } - } else { - // Fallback: direct freelist push (legacy) - if (debug_guard) fprintf(stderr, "[FREE_SS] Using LEGACY freelist push (not remote queue)\n"); - void* prev = meta->freelist; - *(void**)ptr = prev; - meta->freelist = ptr; - do { - static int g_mask_en = -1; - if (__builtin_expect(g_mask_en == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_FREELIST_MASK"); - g_mask_en = (e && *e && *e != '0') ? 1 : 0; - } - if (__builtin_expect(g_mask_en, 0) && prev == NULL) { - uint32_t bit = (1u << slab_idx); - atomic_fetch_or_explicit(&ss->freelist_mask, bit, memory_order_release); - } - } while (0); - meta->used--; - ss_active_dec_one(ss); - if (prev == NULL) { - ss_partial_publish((int)ss->size_class, ss); - } - } - - // 空検出は別途(ホットパス除外) - } -} - -void hak_tiny_free(void* ptr) { - if (!ptr || !g_tiny_initialized) return; - - hak_tiny_stats_poll(); - tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, 0, ptr, 0); - -#ifdef HAKMEM_TINY_BENCH_SLL_ONLY - // Bench-only SLL-only free: push to TLS SLL for ≤64B when possible - { - int class_idx = -1; - if (g_use_superslab) { - // FIXED: Use hak_super_lookup() instead of hak_super_lookup() to avoid false positives - SuperSlab* ss = hak_super_lookup(ptr); - if (ss && ss->magic == SUPERSLAB_MAGIC) class_idx = ss->size_class; - } - if (class_idx < 0) { - TinySlab* slab = hak_tiny_owner_slab(ptr); - if (slab) class_idx = slab->class_idx; - } - if (class_idx >= 0 && class_idx <= 3) { - uint32_t sll_cap = sll_cap_for_class(class_idx, (uint32_t)TINY_TLS_MAG_CAP); - if ((int)g_tls_sll_count[class_idx] < (int)sll_cap) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - return; - } - } - } -#endif - - if (g_tiny_ultra) { - int class_idx = -1; - if (g_use_superslab) { - // FIXED: Use hak_super_lookup() instead of hak_super_lookup() to avoid false positives - SuperSlab* ss = hak_super_lookup(ptr); - if (ss && ss->magic == SUPERSLAB_MAGIC) class_idx = ss->size_class; - } - if (class_idx < 0) { - TinySlab* slab = hak_tiny_owner_slab(ptr); - if (slab) class_idx = slab->class_idx; - } - if (class_idx >= 0) { - // Ultra free: push directly to TLS SLL without magazine init - int sll_cap = ultra_sll_cap_for_class(class_idx); - if ((int)g_tls_sll_count[class_idx] < sll_cap) { - *(void**)ptr = g_tls_sll_head[class_idx]; - g_tls_sll_head[class_idx] = ptr; - g_tls_sll_count[class_idx]++; - return; - } - } - // Fallback to existing path if class resolution fails - } - - SuperSlab* fast_ss = NULL; - TinySlab* fast_slab = NULL; - int fast_class_idx = -1; - if (g_use_superslab) { - fast_ss = hak_super_lookup(ptr); - if (fast_ss && fast_ss->magic == SUPERSLAB_MAGIC) { - fast_class_idx = fast_ss->size_class; - // BUGFIX: Validate size_class before using as array index (prevents OOB = 85% of FREE_TO_SS SEGV) - if (__builtin_expect(fast_class_idx < 0 || fast_class_idx >= TINY_NUM_CLASSES, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, 0xF0, ptr, (uintptr_t)fast_class_idx); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - fast_ss = NULL; - fast_class_idx = -1; - } - } else { - fast_ss = NULL; - } - } - if (fast_class_idx < 0) { - fast_slab = hak_tiny_owner_slab(ptr); - if (fast_slab) 
fast_class_idx = fast_slab->class_idx; - } - // Safety: detect class mismatch (SS vs TinySlab) early - if (__builtin_expect(g_tiny_safe_free && fast_class_idx >= 0, 0)) { - int ss_cls = -1, ts_cls = -1; - SuperSlab* chk_ss = fast_ss ? fast_ss : (g_use_superslab ? hak_super_lookup(ptr) : NULL); - if (chk_ss && chk_ss->magic == SUPERSLAB_MAGIC) ss_cls = chk_ss->size_class; - TinySlab* chk_slab = fast_slab ? fast_slab : hak_tiny_owner_slab(ptr); - if (chk_slab) ts_cls = chk_slab->class_idx; - if (ss_cls >= 0 && ts_cls >= 0 && ss_cls != ts_cls) { - uintptr_t packed = ((uintptr_t)(uint16_t)ss_cls << 16) | (uint16_t)ts_cls; - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)fast_class_idx, ptr, packed); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - } - } - if (fast_class_idx >= 0) { - tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, (uint16_t)fast_class_idx, ptr, 1); - } - if (fast_class_idx >= 0 && g_fast_enable && g_fast_cap[fast_class_idx] != 0) { - if (tiny_fast_push(fast_class_idx, ptr)) { - tiny_debug_ring_record(TINY_RING_EVENT_FREE_FAST, (uint16_t)fast_class_idx, ptr, 0); - HAK_STAT_FREE(fast_class_idx); - return; - } - } - - // SuperSlab detection: prefer fast mask-based check when available - SuperSlab* ss = fast_ss; - if (!ss && g_use_superslab) { - ss = hak_super_lookup(ptr); - if (!(ss && ss->magic == SUPERSLAB_MAGIC)) { - ss = NULL; - } - } - if (ss && ss->magic == SUPERSLAB_MAGIC) { - // BUGFIX: Validate size_class before using as array index (prevents OOB) - if (__builtin_expect(ss->size_class < 0 || ss->size_class >= TINY_NUM_CLASSES, 0)) { - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, 0xF2, ptr, (uintptr_t)ss->size_class); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - // Direct SuperSlab free (avoid second lookup TOCTOU) - hak_tiny_free_superslab(ptr, ss); - HAK_STAT_FREE(ss->size_class); - return; - } - - // Fallback to TinySlab only when SuperSlab is not in use - TinySlab* slab = fast_slab; - if (!slab) slab = hak_tiny_owner_slab(ptr); - if (!slab) return; // Not managed by Tiny Pool - if (__builtin_expect(g_use_superslab, 0)) { - // In SS mode, a pointer that resolves only to TinySlab is suspicious → treat as invalid free - tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, 0xEE, ptr, 0xF1u); - if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; } - return; - } - - hak_tiny_free_with_slab(ptr, slab); -} - -// ============================================================================ -// EXTRACTED TO hakmem_tiny_query.c (Phase 2B-1) -// ============================================================================ -// EXTRACTED: int hak_tiny_is_managed(void* ptr) { -// EXTRACTED: if (!ptr || !g_tiny_initialized) return 0; -// EXTRACTED: // Phase 6.12.1: O(1) slab lookup via registry/list -// EXTRACTED: return hak_tiny_owner_slab(ptr) != NULL || hak_super_lookup(ptr) != NULL; -// EXTRACTED: } - -// Phase 7.6: Check if pointer is managed by Tiny Pool (TinySlab OR SuperSlab) -// EXTRACTED: int hak_tiny_is_managed_superslab(void* ptr) { -// EXTRACTED: if (!ptr || !g_tiny_initialized) return 0; -// EXTRACTED: -// EXTRACTED: // Safety: Only check if g_use_superslab is enabled -// EXTRACTED: if (g_use_superslab) { -// EXTRACTED: SuperSlab* ss = hak_super_lookup(ptr); -// EXTRACTED: // Phase 8.2 optimization: Use alignment check instead of mincore() -// EXTRACTED: // SuperSlabs are always SUPERSLAB_SIZE-aligned (2MB) -// EXTRACTED: if (ss && ((uintptr_t)ss & (SUPERSLAB_SIZE - 1)) == 0) { -// EXTRACTED: if 
(ss->magic == SUPERSLAB_MAGIC) { -// EXTRACTED: return 1; // Valid SuperSlab pointer -// EXTRACTED: } -// EXTRACTED: } -// EXTRACTED: } -// EXTRACTED: -// EXTRACTED: // Fallback to TinySlab check -// EXTRACTED: return hak_tiny_owner_slab(ptr) != NULL; -// EXTRACTED: } - -// Return the usable size for a Tiny-managed pointer (0 if unknown/not tiny). -// Prefer SuperSlab metadata when available; otherwise use TinySlab owner class. -// EXTRACTED: size_t hak_tiny_usable_size(void* ptr) { -// EXTRACTED: if (!ptr || !g_tiny_initialized) return 0; -// EXTRACTED: -// EXTRACTED: // Check SuperSlab first via registry (safe under direct link and LD) -// EXTRACTED: if (g_use_superslab) { -// EXTRACTED: SuperSlab* ss = hak_super_lookup(ptr); -// EXTRACTED: if (ss && ss->magic == SUPERSLAB_MAGIC) { -// EXTRACTED: int k = (int)ss->size_class; -// EXTRACTED: if (k >= 0 && k < TINY_NUM_CLASSES) { -// EXTRACTED: return g_tiny_class_sizes[k]; -// EXTRACTED: } -// EXTRACTED: } -// EXTRACTED: } -// EXTRACTED: -// EXTRACTED: // Fallback: TinySlab owner lookup -// EXTRACTED: TinySlab* slab = hak_tiny_owner_slab(ptr); -// EXTRACTED: if (slab) { -// EXTRACTED: int k = slab->class_idx; -// EXTRACTED: if (k >= 0 && k < TINY_NUM_CLASSES) { -// EXTRACTED: return g_tiny_class_sizes[k]; -// EXTRACTED: } -// EXTRACTED: } -// EXTRACTED: return 0; -// EXTRACTED: } - - -// ============================================================================ -// Statistics and Debug Functions - Extracted to hakmem_tiny_stats.c -// ============================================================================ -// (Phase 2B API headers moved to top of file) - - -// Optional shutdown hook to stop background components (e.g., Intelligence Engine) -void hak_tiny_shutdown(void) { - // Release TLS SuperSlab references (dec refcount) before stopping BG/INT - for (int k = 0; k < TINY_NUM_CLASSES; k++) { - TinyTLSSlab* tls = &g_tls_slabs[k]; - if (tls->ss) { - superslab_ref_dec(tls->ss); - tls->ss = NULL; - tls->meta = NULL; - tls->slab_base = NULL; - } - } - if (g_bg_bin_started) { - g_bg_bin_stop = 1; - if (!pthread_equal(tiny_self_pt(), g_bg_bin_thread)) { - pthread_join(g_bg_bin_thread, NULL); - } - g_bg_bin_started = 0; - g_bg_bin_enable = 0; - } - tiny_obs_shutdown(); - if (g_int_engine && g_int_started) { - g_int_stop = 1; - // Best-effort join; avoid deadlock if called from within the thread - if (!pthread_equal(tiny_self_pt(), g_int_thread)) { - pthread_join(g_int_thread, NULL); - } - g_int_started = 0; - g_int_engine = 0; - } -} - - - - - -// Always-available: Trim empty slabs (release fully-free slabs) diff --git a/core/hakmem_tiny_refill.inc.h b/core/hakmem_tiny_refill.inc.h index f40c916b..210a5c8c 100644 --- a/core/hakmem_tiny_refill.inc.h +++ b/core/hakmem_tiny_refill.inc.h @@ -20,6 +20,7 @@ #include "box/tls_sll_box.h" #include "hakmem_tiny_integrity.h" #include "box/tiny_next_ptr_box.h" +#include "tiny_region_id.h" // For HEADER_MAGIC/HEADER_CLASS_MASK (prepare header before SLL push) #include #include @@ -384,6 +385,12 @@ int sll_refill_small_from_ss(int class_idx, int max_take) tiny_debug_validate_node_base(class_idx, p, "sll_refill_small_from_ss"); + // Prepare header for header-classes so that safeheader mode accepts the push +#if HAKMEM_TINY_HEADER_CLASSIDX + if (class_idx != 0 && class_idx != 7) { + *(uint8_t*)p = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK); + } +#endif // SLL push 失敗時はそれ以上積まない(p はTLS slab管理下なので破棄でOK) if (!tls_sll_push(class_idx, p, cap)) { break; diff --git a/core/hakmem_tiny_refill_p0.inc.h 
b/core/hakmem_tiny_refill_p0.inc.h index 1ce871e3..a6500daf 100644 --- a/core/hakmem_tiny_refill_p0.inc.h +++ b/core/hakmem_tiny_refill_p0.inc.h @@ -85,10 +85,15 @@ static inline int sll_refill_batch_from_ss(int class_idx, int max_take) { INTEGRITY_CHECK_SLAB_METADATA(meta_initial, "P0 refill entry"); #endif - // Optional: Direct-FC fast path (kept as-is from original P0, no aliasing) + // Optional: Direct-FC fast path(全クラス対応 A/B)。 + // Env: + // - HAKMEM_TINY_P0_DIRECT_FC=1 → C5優先(互換) + // - HAKMEM_TINY_P0_DIRECT_FC_C7=1 → C7のみ(互換) + // - HAKMEM_TINY_P0_DIRECT_FC_ALL=1 → 全クラス(推奨、Phase 1 目標) do { static int g_direct_fc = -1; static int g_direct_fc_c7 = -1; + static int g_direct_fc_all = -1; if (__builtin_expect(g_direct_fc == -1, 0)) { const char* e = getenv("HAKMEM_TINY_P0_DIRECT_FC"); g_direct_fc = (e && *e && *e == '0') ? 0 : 1; @@ -97,7 +102,12 @@ static inline int sll_refill_batch_from_ss(int class_idx, int max_take) { const char* e7 = getenv("HAKMEM_TINY_P0_DIRECT_FC_C7"); g_direct_fc_c7 = (e7 && *e7) ? ((*e7 == '0') ? 0 : 1) : 0; } - if (__builtin_expect((g_direct_fc && class_idx == 5) || + if (__builtin_expect(g_direct_fc_all == -1, 0)) { + const char* ea = getenv("HAKMEM_TINY_P0_DIRECT_FC_ALL"); + g_direct_fc_all = (ea && *ea && *ea != '0') ? 1 : 0; + } + if (__builtin_expect(g_direct_fc_all || + (g_direct_fc && class_idx == 5) || (g_direct_fc_c7 && class_idx == 7), 0)) { int room = tiny_fc_room(class_idx); if (room <= 0) return 0; diff --git a/core/hakmem_tiny_refill_p0_stub.c b/core/hakmem_tiny_refill_p0_stub.c new file mode 100644 index 00000000..c51e8508 --- /dev/null +++ b/core/hakmem_tiny_refill_p0_stub.c @@ -0,0 +1,14 @@ +// hakmem_tiny_refill_p0_stub.c +// Provide a default implementation of sll_refill_batch_from_ss when +// HAKMEM_TINY_P0_BATCH_REFILL is not compiled in. This keeps tiny_alloc_fast +// free to select batch mode at runtime (HAKMEM_TINY_REFILL_BATCH=1). 
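The three Direct-FC toggles in the hunk above combine as shown below; a compilable sketch of the cached-getenv gate, where p0_direct_fc_enabled() is a hypothetical helper name and the defaults follow the hunk (C5 path on by default, C7 and ALL off by default):

```c
#include <stdlib.h>

// Decide whether the Direct-FC fast path applies to this size class.
static inline int p0_direct_fc_enabled(int class_idx) {
    static int all = -1, c5 = -1, c7 = -1;                 // cached once per process
    if (all == -1) {
        const char* ea = getenv("HAKMEM_TINY_P0_DIRECT_FC_ALL");
        all = (ea && *ea && *ea != '0') ? 1 : 0;           // default OFF
        const char* e5 = getenv("HAKMEM_TINY_P0_DIRECT_FC");
        c5 = (e5 && *e5 && *e5 == '0') ? 0 : 1;            // default ON (class 5 only)
        const char* e7 = getenv("HAKMEM_TINY_P0_DIRECT_FC_C7");
        c7 = (e7 && *e7) ? ((*e7 == '0') ? 0 : 1) : 0;     // default OFF
    }
    return all || (c5 && class_idx == 5) || (c7 && class_idx == 7);
}
```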
+ +#include "hakmem_tiny.h" + +// Declared in hakmem_tiny.c via hakmem_tiny_refill.inc.h +int sll_refill_small_from_ss(int class_idx, int max_take); + +int sll_refill_batch_from_ss(int class_idx, int max_take) { + return sll_refill_small_from_ss(class_idx, max_take); +} + diff --git a/core/hakmem_tiny_superslab.c b/core/hakmem_tiny_superslab.c index 8bfb1341..96be7f31 100644 --- a/core/hakmem_tiny_superslab.c +++ b/core/hakmem_tiny_superslab.c @@ -19,6 +19,8 @@ #include // getrlimit for OOM diagnostics #include #include "hakmem_internal.h" // HAKMEM_LOG for release-silent logging +#include "tiny_region_id.h" // For HEADER_MAGIC / HEADER_CLASS_MASK (restore header on remote-drain) +#include "box/tiny_next_ptr_box.h" // For tiny_next_write static int g_ss_force_lg = -1; static _Atomic int g_ss_populate_once = 0; @@ -120,6 +122,13 @@ void _ss_remote_drain_to_freelist_unsafe(SuperSlab* ss, int slab_idx, TinySlabMe uintptr_t cur = head; while (cur != 0) { uintptr_t next = *(uintptr_t*)cur; // remote-next stored at offset 0 + // Restore header for header-classes (class 1-6) which were clobbered by remote push +#if HAKMEM_TINY_HEADER_CLASSIDX + if (cls != 0 && cls != 7) { + uint8_t expected = (uint8_t)(HEADER_MAGIC | (cls & HEADER_CLASS_MASK)); + *(uint8_t*)(uintptr_t)cur = expected; + } +#endif // Rewrite next pointer to Box representation for this class tiny_next_write(cls, (void*)cur, prev); prev = (void*)cur; diff --git a/core/pool_refill_legacy.c.bak b/core/pool_refill_legacy.c.bak deleted file mode 100644 index a5bed62f..00000000 --- a/core/pool_refill_legacy.c.bak +++ /dev/null @@ -1,105 +0,0 @@ -#include "pool_refill.h" -#include "pool_tls.h" -#include -#include -#include - -// Get refill count from Box 1 -extern int pool_get_refill_count(int class_idx); - -// Refill and return first block -void* pool_refill_and_alloc(int class_idx) { - int count = pool_get_refill_count(class_idx); - if (count <= 0) return NULL; - - // Batch allocate from existing Pool backend - void* chain = backend_batch_carve(class_idx, count); - if (!chain) return NULL; // OOM - - // Pop first block for return - void* ret = chain; - chain = *(void**)chain; - count--; - - #if POOL_USE_HEADERS - // Write header for the block we're returning - *((uint8_t*)ret - POOL_HEADER_SIZE) = POOL_MAGIC | class_idx; - #endif - - // Install rest in TLS (if any) - if (count > 0 && chain) { - pool_install_chain(class_idx, chain, count); - } - - return ret; -} - -// Backend batch carve - Phase 1: Direct mmap allocation -void* backend_batch_carve(int class_idx, int count) { - if (class_idx < 0 || class_idx >= POOL_SIZE_CLASSES || count <= 0) { - return NULL; - } - - // Get the class size - size_t block_size = POOL_CLASS_SIZES[class_idx]; - - // For Phase 1: Allocate a single large chunk via mmap - // and carve it into blocks - #if POOL_USE_HEADERS - size_t total_block_size = block_size + POOL_HEADER_SIZE; - #else - size_t total_block_size = block_size; - #endif - - // Allocate enough for all requested blocks - size_t total_size = total_block_size * count; - - // Round up to page size - size_t page_size = 4096; - total_size = (total_size + page_size - 1) & ~(page_size - 1); - - // Allocate memory via mmap - void* chunk = mmap(NULL, total_size, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - if (chunk == MAP_FAILED) { - return NULL; - } - - // Carve into blocks and chain them - void* head = NULL; - void* tail = NULL; - char* ptr = (char*)chunk; - - for (int i = 0; i < count; i++) { - #if POOL_USE_HEADERS - // Skip header 
space - user data starts after header - void* user_ptr = ptr + POOL_HEADER_SIZE; - #else - void* user_ptr = ptr; - #endif - - // Chain the blocks - if (!head) { - head = user_ptr; - tail = user_ptr; - } else { - *(void**)tail = user_ptr; - tail = user_ptr; - } - - // Move to next block - ptr += total_block_size; - - // Stop if we'd go past the allocated chunk - if ((ptr + total_block_size) > ((char*)chunk + total_size)) { - break; - } - } - - // Terminate chain - if (tail) { - *(void**)tail = NULL; - } - - return head; -} \ No newline at end of file diff --git a/core/pool_tls_remote.c b/core/pool_tls_remote.c index c3c8c9fe..f4a18e59 100644 --- a/core/pool_tls_remote.c +++ b/core/pool_tls_remote.c @@ -3,6 +3,7 @@ #include #include #include +#include "box/tiny_next_ptr_box.h" // Box API: preserve header by using class-aware next offset #define REMOTE_BUCKETS 256 @@ -34,7 +35,8 @@ int pool_remote_push(int class_idx, void* ptr, int owner_tid){ r = (RemoteRec*)calloc(1, sizeof(RemoteRec)); r->tid = owner_tid; r->next = g_buckets[b]; g_buckets[b] = r; } - *(void**)ptr = r->head[class_idx]; + // Use Box next-pointer API to avoid clobbering header (classes 1-6 store next at base+1) + tiny_next_write(class_idx, ptr, r->head[class_idx]); r->head[class_idx] = ptr; r->count[class_idx]++; pthread_mutex_unlock(&g_locks[b]); @@ -57,9 +59,9 @@ int pool_remote_pop_chain(int class_idx, int max_take, void** out_chain){ int batch = 0; if (max_take <= 0) max_take = 32; void* chain = NULL; void* tail = NULL; while (head && batch < max_take){ - void* nxt = *(void**)head; + void* nxt = tiny_next_read(class_idx, head); if (!chain){ chain = head; tail = head; } - else { *(void**)tail = head; tail = head; } + else { tiny_next_write(class_idx, tail, head); tail = head; } head = nxt; batch++; } r->head[class_idx] = head; diff --git a/core/refill/ss_refill_fc.h b/core/refill/ss_refill_fc.h new file mode 100644 index 00000000..2d7b91a1 --- /dev/null +++ b/core/refill/ss_refill_fc.h @@ -0,0 +1,267 @@ +// ss_refill_fc.h - Direct SuperSlab → FastCache refill (bypass SLL) +// Purpose: Optimize refill path from 2 hops (SS→SLL→FC) to 1 hop (SS→FC) +// +// Box Theory Responsibility: +// - Refill FastCache directly from SuperSlab freelist/carving +// - Handle remote drain when threshold exceeded +// - Restore headers for classes 1-6 (NOT class 0 or 7) +// - Update active counters consistently +// +// Performance Impact: +// - Eliminates SLL intermediate layer overhead +// - Reduces allocation latency by ~30-50% (expected) +// - Simplifies refill path (fewer cache misses) + +#ifndef HAK_REFILL_SS_REFILL_FC_H +#define HAK_REFILL_SS_REFILL_FC_H + +// NOTE: This is an .inc.h file meant to be included from hakmem_tiny.c +// It assumes all types (SuperSlab, TinySlabMeta, TinyTLSSlab, etc.) are already defined. 
+// Do NOT include this file directly - it will be included at the appropriate point in hakmem_tiny.c
+
+#include
+#include // atoi()
+
+// Remote drain threshold (default: 32 blocks)
+// Can be overridden at runtime via HAKMEM_TINY_P0_DRAIN_THRESH
+#ifndef REMOTE_DRAIN_THRESHOLD
+#define REMOTE_DRAIN_THRESHOLD 32
+#endif
+
+// Header constants (from tiny_region_id.h - needed when HAKMEM_TINY_HEADER_CLASSIDX=1)
+#ifndef HEADER_MAGIC
+#define HEADER_MAGIC 0xA0
+#endif
+#ifndef HEADER_CLASS_MASK
+#define HEADER_CLASS_MASK 0x0F
+#endif
+
+// ========================================================================
+// REFILL CONTRACT: ss_refill_fc_fill() - Standard Refill Entry Point
+// ========================================================================
+//
+// This is the CANONICAL refill function for the Front-Direct architecture.
+// All allocation refills should route through this function when:
+// - HAKMEM_TINY_FRONT_DIRECT=1 (Front-Direct mode)
+// - HAKMEM_TINY_REFILL_BATCH=1 (Batch refill mode)
+// - HAKMEM_TINY_P0_DIRECT_FC_ALL=1 (P0 direct FastCache mode)
+//
+// Architecture: SuperSlab → FastCache (1-hop, bypasses SLL)
+//
+// Replaces legacy 2-hop path: SuperSlab → SLL → FastCache
+//
+// Box Boundaries:
+// - Input: class_idx (0-7), want (target refill count)
+// - Output: BASE pointers pushed to FastCache (header at ptr-1 for C1-C6)
+// - Side Effects: Updates meta->used, meta->carved, ss->total_active_blocks
+//
+// Guarantees:
+// - Remote drain at threshold (default: 32 blocks)
+// - Freelist priority (reuse before carve)
+// - Header restoration for classes 1-6 (NOT class 0 or 7)
+// - Atomic active counter updates (thread-safe)
+// - Fail-fast on capacity exhaustion (no infinite loops)
+//
+// ENV Controls:
+// - HAKMEM_TINY_P0_DRAIN_THRESH: Remote drain threshold (default: 32)
+// - HAKMEM_TINY_P0_NO_DRAIN: Disable remote drain (debug only)
+// ========================================================================
+
+/**
+ * ss_refill_fc_fill - Refill FastCache directly from SuperSlab
+ *
+ * @param class_idx Size class index (0-7)
+ * @param want      Target number of blocks to refill
+ * @return          Number of blocks successfully pushed to FastCache
+ *
+ * Algorithm:
+ *   1. Check TLS slab availability (call superslab_refill if needed)
+ *   2. Remote drain if pending count >= threshold
+ *   3. Refill loop (while produced < want and FC has room):
+ *      a. Try pop from freelist (O(1))
+ *      b. Try carve from slab (O(1))
+ *      c. Call superslab_refill if slab exhausted
+ *      d. Restore header for classes 1-6 (NOT 0 or 7)
+ *      e. Push to FastCache
+ *   4. Update active counter (once, after loop)
+ *   5. Return produced count
+ *
+ * Box Contract:
+ * - Input: valid class_idx (0 <= idx < TINY_NUM_CLASSES)
+ * - Output: BASE pointers (header at ptr-1 for classes 1-6)
+ * - Invariants: meta->used, meta->carved consistent
+ * - Side effects: Updates ss->total_active_blocks
+ */
+static inline int ss_refill_fc_fill(int class_idx, int want) {
+    // ========== Step 1: Check TLS slab ==========
+    TinyTLSSlab* tls = &g_tls_slabs[class_idx];
+    SuperSlab* ss = tls->ss;
+    TinySlabMeta* meta = tls->meta;
+
+    // If no TLS slab configured, attempt refill
+    if (!ss || !meta) {
+        ss = superslab_refill(class_idx);
+        if (!ss) return 0; // Failed to get SuperSlab
+
+        // Reload TLS state after superslab_refill
+        tls = &g_tls_slabs[class_idx];
+        ss = tls->ss;
+        meta = tls->meta;
+
+        // Safety check after reload
+        if (!ss || !meta) return 0;
+    }
+
+    int slab_idx = tls->slab_idx;
+    if (slab_idx < 0) return 0; // Invalid slab index
+
+    // ========== Step 2: Remote Drain (if needed) ==========
+    uint32_t remote_cnt = atomic_load_explicit(&ss->remote_counts[slab_idx], memory_order_acquire);
+
+    // Runtime threshold override (cached)
+    static int drain_thresh = -1;
+    if (__builtin_expect(drain_thresh == -1, 0)) {
+        const char* e = getenv("HAKMEM_TINY_P0_DRAIN_THRESH");
+        drain_thresh = (e && *e) ? atoi(e) : REMOTE_DRAIN_THRESHOLD;
+        if (drain_thresh < 0) drain_thresh = 0;
+    }
+
+    if (remote_cnt >= (uint32_t)drain_thresh) {
+        // Check if drain is disabled (debugging flag)
+        static int no_drain = -1;
+        if (__builtin_expect(no_drain == -1, 0)) {
+            const char* e = getenv("HAKMEM_TINY_P0_NO_DRAIN");
+            no_drain = (e && *e && *e != '0') ? 1 : 0;
+        }
+
+        if (!no_drain) {
+            _ss_remote_drain_to_freelist_unsafe(ss, slab_idx, meta);
+        }
+    }
+
+    // ========== Step 3: Refill Loop ==========
+    int produced = 0;
+    size_t stride = tiny_stride_for_class(class_idx);
+    uint8_t* slab_base = tiny_slab_base_for_geometry(ss, slab_idx);
+
+    while (produced < want) {
+        void* p = NULL;
+
+        // Option A: Pop from freelist (if available)
+        if (meta->freelist != NULL) {
+            p = meta->freelist;
+            meta->freelist = tiny_next_read(class_idx, p);
+            meta->used++;
+        }
+        // Option B: Carve new block (if capacity available)
+        else if (meta->carved < meta->capacity) {
+            p = (void*)(slab_base + (meta->carved * stride));
+            meta->carved++;
+            meta->used++;
+        }
+        // Option C: Slab exhausted, need new slab
+        else {
+            ss = superslab_refill(class_idx);
+            if (!ss) break; // Failed to get new slab
+
+            // Reload TLS state after superslab_refill
+            tls = &g_tls_slabs[class_idx];
+            ss = tls->ss;
+            meta = tls->meta;
+            slab_idx = tls->slab_idx;
+
+            // Safety check after reload
+            if (!ss || !meta || slab_idx < 0) break;
+
+            // Update stride/base for new slab
+            stride = tiny_stride_for_class(class_idx);
+            slab_base = tiny_slab_base_for_geometry(ss, slab_idx);
+            continue; // Retry allocation from new slab
+        }
+
+        // ========== Step 3d: Restore Header (classes 1-6 only) ==========
+#if HAKMEM_TINY_HEADER_CLASSIDX
+        // Phase E1-CORRECT: Restore headers for classes 1-6
+        // Rationale:
+        // - Class 0 (8B): Never had header (too small, 12.5% overhead)
+        // - Classes 1-6: Standard header (0.8-6% overhead)
+        // - Class 7 (1KB): Headerless by design (mimalloc compatibility)
+        //
+        // Note: Freelist operations may corrupt headers, so we restore them here
+        if (class_idx >= 1 && class_idx <= 6) {
+            *(uint8_t*)p = HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK);
+        }
+#endif
+
+        // ========== Step 3e: Push to FastCache ==========
+        if (!fastcache_push(class_idx, p)) {
+            // FastCache full, rollback state and exit
+            // Note: We don't need to update active counter yet (will do after loop)
+            meta->used--; // Rollback used count
+            if (meta->freelist == p) {
+                // This block came from freelist, push it back
+                // (This is a rare edge case - FC full is uncommon)
+            } else if (meta->carved > 0 && (void*)(slab_base + ((meta->carved - 1) * stride)) == p) {
+                // This block was just carved, rollback carve
+                meta->carved--;
+            }
+            break;
+        }
+
+        produced++;
+    }
+
+    // ========== Step 4: Update Active Counter ==========
+    if (produced > 0) {
+        ss_active_add(ss, (uint32_t)produced);
+    }
+
+    // ========== Step 5: Return ==========
+    return produced;
+}
+
+// ============================================================================
+// Performance Notes
+// ============================================================================
+//
+// Expected Performance Improvement:
+// - Before (2-hop path): SS → SLL → FC
+//   * Overhead: SLL list traversal, cache misses, branch mispredicts
+//   * Latency: ~50-100 cycles per block
+//
+// - After (1-hop path): SS → FC
+//   * Overhead: Direct array push
+//   * Latency: ~10-20 cycles per block
+//   * Improvement: 50-80% reduction in refill latency
+//
+// Memory Impact:
+// - Zero additional memory (reuses existing FastCache)
+// - Reduced pressure on SLL (can potentially shrink SLL capacity)
+//
+// Thread Safety:
+// - All operations on TLS structures (no locks needed)
+// - Remote drain uses unsafe variant (OK for TLS context)
+// - Active counter updates use atomic add (safe)
+//
+// ============================================================================
+// Integration Notes
+// ============================================================================
+//
+// Usage Example (from allocation hot path):
+//   void* p = fastcache_pop(class_idx);
+//   if (!p) {
+//       ss_refill_fc_fill(class_idx, 16); // Refill 16 blocks
+//       p = fastcache_pop(class_idx);     // Try again
+//   }
+//
+// Tuning Parameters:
+// - REMOTE_DRAIN_THRESHOLD: Default 32, can override via env var
+// - Want parameter: Recommended 8-32 blocks (balance overhead vs hit rate)
+//
+// Debug Flags:
+// - HAKMEM_TINY_P0_DRAIN_THRESH: Override drain threshold
+// - HAKMEM_TINY_P0_NO_DRAIN: Disable remote drain (debugging only)
+//
+// ============================================================================
+
+#endif // HAK_REFILL_SS_REFILL_FC_H
diff --git a/core/tiny_alloc_fast.inc.h b/core/tiny_alloc_fast.inc.h
index cd5fde1c..6e6aa758 100644
--- a/core/tiny_alloc_fast.inc.h
+++ b/core/tiny_alloc_fast.inc.h
@@ -77,6 +77,8 @@ extern int sll_refill_batch_from_ss(int class_idx, int max_take);
 #else
 extern int sll_refill_small_from_ss(int class_idx, int max_take);
 #endif
+// NEW: Direct SS→FC refill (bypasses SLL)
+extern int ss_refill_fc_fill(int class_idx, int want);
 extern void* hak_tiny_alloc_slow(size_t size, int class_idx);
 extern int hak_tiny_size_to_class(size_t size);
 extern int tiny_refill_failfast_level(void);
@@ -429,13 +431,35 @@ static inline int tiny_alloc_fast_refill(int class_idx) {
 #endif
     // Box Boundary: Delegate to Backend (Box 3: SuperSlab)
-    // This gives us ACE, Learning layer, L25 integration for free!
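The integration notes above describe the intended caller shape; a minimal sketch, assuming fastcache_pop() and ss_refill_fc_fill() as declared in this patch (the helper name and the batch size of 16 are illustrative only):

```c
// Hypothetical caller illustrating the 1-hop Front-Direct path (SS→FC→caller).
static inline void* tiny_alloc_front_direct(int class_idx) {
    void* p = fastcache_pop(class_idx);           // L1 hit: array pop, no node writes
    if (p) return p;
    if (ss_refill_fc_fill(class_idx, 16) > 0)     // miss: refill a small batch SS→FC
        p = fastcache_pop(class_idx);
    return p;                                     // NULL → caller takes the slow path
}
```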
- // P0 Fix: Use appropriate refill function based on P0 status + // Refill Dispatch: Standard (ss_refill_fc_fill) vs Legacy SLL (A/B only) + // Standard: Enabled by FRONT_DIRECT=1, REFILL_BATCH=1, or P0_DIRECT_FC_ALL=1 + // Legacy: Fallback for compatibility (will be deprecated) + int refilled = 0; + + // NEW: Front-Direct refill control (A/B toggle) + static __thread int s_use_front_direct = -1; + if (__builtin_expect(s_use_front_direct == -1, 0)) { + // Check multiple ENV flags (any one enables Front-Direct) + const char* e1 = getenv("HAKMEM_TINY_FRONT_DIRECT"); + const char* e2 = getenv("HAKMEM_TINY_P0_DIRECT_FC_ALL"); + const char* e3 = getenv("HAKMEM_TINY_REFILL_BATCH"); + s_use_front_direct = ((e1 && *e1 && *e1 != '0') || + (e2 && *e2 && *e2 != '0') || + (e3 && *e3 && *e3 != '0')) ? 1 : 0; + } + + // Refill dispatch + if (s_use_front_direct) { + // NEW: Direct SS→FC (bypasses SLL) + refilled = ss_refill_fc_fill(class_idx, cnt); + } else { + // Legacy: SS→SLL→FC (via batch or generic) #if HAKMEM_TINY_P0_BATCH_REFILL - int refilled = sll_refill_batch_from_ss(class_idx, cnt); + refilled = sll_refill_batch_from_ss(class_idx, cnt); #else - int refilled = sll_refill_small_from_ss(class_idx, cnt); + refilled = sll_refill_small_from_ss(class_idx, cnt); #endif + } // Lightweight adaptation: if refills keep happening, increase per-class refill. // Focus on class 7 (1024B) to reduce mmap/refill frequency under Tiny-heavy loads. @@ -462,16 +486,23 @@ static inline int tiny_alloc_fast_refill(int class_idx) { track_refill_for_adaptation(class_idx); } - // Box 5-NEW: Cascade refill SFC ← SLL (if SFC enabled) - // This happens AFTER SuperSlab → SLL refill, so SLL has blocks - static __thread int sfc_check_done_refill = 0; - static __thread int sfc_is_enabled_refill = 0; - if (__builtin_expect(!sfc_check_done_refill, 0)) { - sfc_is_enabled_refill = g_sfc_enabled; - sfc_check_done_refill = 1; + // Box 5-NEW: Cascade refill SFC ← SLL (opt-in via HAKMEM_TINY_SFC_CASCADE, off by default) + // NEW: Default OFF, enable via HAKMEM_TINY_SFC_CASCADE=1 + // Skip entirely when Front-Direct is active (direct SS→FC path) + static __thread int sfc_cascade_enabled = -1; + if (__builtin_expect(sfc_cascade_enabled == -1, 0)) { + // Front-Direct bypasses SLL, so SFC cascade is pointless + if (s_use_front_direct) { + sfc_cascade_enabled = 0; + } else { + // Check ENV flag (default: OFF) + const char* e = getenv("HAKMEM_TINY_SFC_CASCADE"); + sfc_cascade_enabled = (e && *e && *e != '0') ? 1 : 0; + } } - if (sfc_is_enabled_refill && refilled > 0) { + // Only cascade if explicitly enabled AND we have refilled blocks in SLL + if (sfc_cascade_enabled && g_sfc_enabled && refilled > 0) { // Skip SFC cascade for class5 when dedicated hotpath is enabled if (g_tiny_hotpath_class5 && class_idx == 5) { // no-op: keep refilled blocks in TLS List/SLL @@ -552,6 +583,13 @@ static inline void* tiny_alloc_fast(size_t size) { void* ptr = NULL; const int hot_c5 = (g_tiny_hotpath_class5 && class_idx == 5); + // NEW: Front-Direct/SLL-OFF bypass control (TLS cached, lazy init) + static __thread int s_front_direct_alloc = -1; + if (__builtin_expect(s_front_direct_alloc == -1, 0)) { + const char* e = getenv("HAKMEM_TINY_FRONT_DIRECT"); + s_front_direct_alloc = (e && *e && *e != '0') ? 
1 : 0; + } + if (__builtin_expect(hot_c5, 0)) { // class5: 専用最短経路(generic frontは一切通らない) void* p = tiny_class5_minirefill_take(); @@ -570,15 +608,15 @@ static inline void* tiny_alloc_fast(size_t size) { } // Generic front (FastCache/SFC/SLL) - // Respect SLL global toggle; when disabled, skip TLS SLL fast pop entirely - if (__builtin_expect(g_tls_sll_enable, 1)) { + // Respect SLL global toggle AND Front-Direct mode; when either disabled, skip TLS SLL entirely + if (__builtin_expect(g_tls_sll_enable && !s_front_direct_alloc, 1)) { // For classes 0..3 keep ultra-inline POP; for >=4 use safe Box POP to avoid UB on bad heads. if (class_idx <= 3) { -#if HAKMEM_TINY_AGGRESSIVE_INLINE - // Phase 2: Use inline macro (3-4 instructions, zero call overhead) +#if defined(HAKMEM_TINY_INLINE_SLL) && HAKMEM_TINY_AGGRESSIVE_INLINE + // Experimental: Use inline SLL pop macro (enable via HAKMEM_TINY_INLINE_SLL=1) TINY_ALLOC_FAST_POP_INLINE(class_idx, ptr); #else - // Legacy: Function call (10-15 instructions, 5-10 cycle overhead) + // Default: Safe Box API (bypasses inline SLL when Front-Direct) ptr = tiny_alloc_fast_pop(class_idx); #endif } else { @@ -586,14 +624,24 @@ static inline void* tiny_alloc_fast(size_t size) { if (tls_sll_pop(class_idx, &base)) ptr = base; else ptr = NULL; } } else { - ptr = NULL; + ptr = NULL; // SLL disabled OR Front-Direct active → bypass SLL } if (__builtin_expect(ptr != NULL, 1)) { HAK_RET_ALLOC(class_idx, ptr); } - // Generic: Refill and take(FastCacheやTLS Listへ) - { + // Generic: Refill and take (Front-Direct vs Legacy) + if (s_front_direct_alloc) { + // Front-Direct: Direct SS→FC refill (bypasses SLL/TLS List) + int refilled_fc = tiny_alloc_fast_refill(class_idx); + if (__builtin_expect(refilled_fc > 0, 1)) { + void* fc_ptr = fastcache_pop(class_idx); + if (fc_ptr) { + HAK_RET_ALLOC(class_idx, fc_ptr); + } + } + } else { + // Legacy: Refill to TLS List/SLL extern __thread TinyTLSList g_tls_lists[TINY_NUM_CLASSES]; void* took = tiny_fast_refill_and_take(class_idx, &g_tls_lists[class_idx]); if (took) { @@ -605,13 +653,14 @@ static inline void* tiny_alloc_fast(size_t size) { { int refilled = tiny_alloc_fast_refill(class_idx); if (__builtin_expect(refilled > 0, 1)) { - if (__builtin_expect(g_tls_sll_enable, 1)) { + // Skip SLL retry if Front-Direct OR SLL disabled + if (__builtin_expect(g_tls_sll_enable && !s_front_direct_alloc, 1)) { if (class_idx <= 3) { -#if HAKMEM_TINY_AGGRESSIVE_INLINE - // Phase 2: Use inline macro (3-4 instructions, zero call overhead) +#if defined(HAKMEM_TINY_INLINE_SLL) && HAKMEM_TINY_AGGRESSIVE_INLINE + // Experimental: Use inline SLL pop macro (enable via HAKMEM_TINY_INLINE_SLL=1) TINY_ALLOC_FAST_POP_INLINE(class_idx, ptr); #else - // Legacy: Function call (10-15 instructions, 5-10 cycle overhead) + // Default: Safe Box API (bypasses inline SLL when Front-Direct) ptr = tiny_alloc_fast_pop(class_idx); #endif } else { @@ -619,7 +668,7 @@ static inline void* tiny_alloc_fast(size_t size) { if (tls_sll_pop(class_idx, &base2)) ptr = base2; else ptr = NULL; } } else { - ptr = NULL; + ptr = NULL; // SLL disabled OR Front-Direct active → bypass SLL } if (ptr) { HAK_RET_ALLOC(class_idx, ptr); diff --git a/core/tiny_debug_ring.c b/core/tiny_debug_ring.c index b044c49c..19bacd8a 100644 --- a/core/tiny_debug_ring.c +++ b/core/tiny_debug_ring.c @@ -71,6 +71,9 @@ static TinyRingName tiny_ring_event_name(uint16_t event) { case TINY_RING_EVENT_MAILBOX_FETCH: return (TinyRingName){"mailbox_fetch", 13}; case TINY_RING_EVENT_MAILBOX_FETCH_NULL: return 
(TinyRingName){"mailbox_fetch_null", 18}; case TINY_RING_EVENT_ROUTE: return (TinyRingName){"route", 5}; + case TINY_RING_EVENT_TLS_SLL_REJECT: return (TinyRingName){"tls_sll_reject", 14}; + case TINY_RING_EVENT_TLS_SLL_SENTINEL: return (TinyRingName){"tls_sll_sentinel", 16}; + case TINY_RING_EVENT_TLS_SLL_HDR_CORRUPT: return (TinyRingName){"tls_sll_hdr_corrupt", 20}; default: return (TinyRingName){"unknown", 7}; } } diff --git a/core/tiny_debug_ring.h b/core/tiny_debug_ring.h index a796f574..36086a24 100644 --- a/core/tiny_debug_ring.h +++ b/core/tiny_debug_ring.h @@ -34,7 +34,11 @@ enum { TINY_RING_EVENT_MAILBOX_PUBLISH, TINY_RING_EVENT_MAILBOX_FETCH, TINY_RING_EVENT_MAILBOX_FETCH_NULL, - TINY_RING_EVENT_ROUTE + TINY_RING_EVENT_ROUTE, + // TLS SLL anomalies (investigation aid, gated by HAKMEM_TINY_SLL_RING) + TINY_RING_EVENT_TLS_SLL_REJECT = 0x7F10, + TINY_RING_EVENT_TLS_SLL_SENTINEL = 0x7F11, + TINY_RING_EVENT_TLS_SLL_HDR_CORRUPT = 0x7F12 }; // Function declarations (implementation in tiny_debug_ring.c) diff --git a/hakmem.d b/hakmem.d index 9779cd33..07f507c9 100644 --- a/hakmem.d +++ b/hakmem.d @@ -28,8 +28,8 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \ core/box/../box/../tiny_region_id.h \ core/box/../box/../hakmem_tiny_integrity.h \ core/box/../box/../hakmem_tiny.h core/box/../box/../ptr_track.h \ - core/box/../hakmem_tiny_integrity.h core/box/front_gate_classifier.h \ - core/box/hak_wrappers.inc.h + core/box/../box/../tiny_debug_ring.h core/box/../hakmem_tiny_integrity.h \ + core/box/front_gate_classifier.h core/box/hak_wrappers.inc.h core/hakmem.h: core/hakmem_build_flags.h: core/hakmem_config.h: @@ -95,6 +95,7 @@ core/box/../box/../tiny_region_id.h: core/box/../box/../hakmem_tiny_integrity.h: core/box/../box/../hakmem_tiny.h: core/box/../box/../ptr_track.h: +core/box/../box/../tiny_debug_ring.h: core/box/../hakmem_tiny_integrity.h: core/box/front_gate_classifier.h: core/box/hak_wrappers.inc.h: diff --git a/hakmem_tiny_sfc.d b/hakmem_tiny_sfc.d index e3c45af1..b6ce55f7 100644 --- a/hakmem_tiny_sfc.d +++ b/hakmem_tiny_sfc.d @@ -13,7 +13,8 @@ hakmem_tiny_sfc.o: core/hakmem_tiny_sfc.c core/tiny_alloc_fast_sfc.inc.h \ core/box/../hakmem_tiny_superslab_constants.h \ core/box/../hakmem_tiny_config.h core/box/../ptr_track.h \ core/box/../hakmem_tiny_integrity.h core/box/../hakmem_tiny.h \ - core/box/../ptr_track.h core/box/../ptr_trace.h + core/box/../ptr_track.h core/box/../ptr_trace.h \ + core/box/../tiny_debug_ring.h core/tiny_alloc_fast_sfc.inc.h: core/hakmem_tiny.h: core/hakmem_build_flags.h: @@ -46,3 +47,4 @@ core/box/../hakmem_tiny_integrity.h: core/box/../hakmem_tiny.h: core/box/../ptr_track.h: core/box/../ptr_trace.h: +core/box/../tiny_debug_ring.h: diff --git a/hakmem_tiny_superslab.d b/hakmem_tiny_superslab.d index 8c29245c..0f079a01 100644 --- a/hakmem_tiny_superslab.d +++ b/hakmem_tiny_superslab.d @@ -7,7 +7,10 @@ hakmem_tiny_superslab.o: core/hakmem_tiny_superslab.c \ core/hakmem_tiny.h core/hakmem_trace.h core/hakmem_tiny_mini_mag.h \ core/hakmem_tiny_config.h core/hakmem_shared_pool.h \ core/hakmem_internal.h core/hakmem.h core/hakmem_config.h \ - core/hakmem_features.h core/hakmem_sys.h core/hakmem_whale.h + core/hakmem_features.h core/hakmem_sys.h core/hakmem_whale.h \ + core/tiny_region_id.h core/tiny_box_geometry.h core/ptr_track.h \ + core/box/tiny_next_ptr_box.h core/hakmem_tiny_config.h \ + core/tiny_nextptr.h core/hakmem_tiny_superslab.h: core/superslab/superslab_types.h: core/hakmem_tiny_superslab_constants.h: @@ -29,3 +32,9 @@ 
core/hakmem_config.h:
 core/hakmem_features.h:
 core/hakmem_sys.h:
 core/hakmem_whale.h:
+core/tiny_region_id.h:
+core/tiny_box_geometry.h:
+core/ptr_track.h:
+core/box/tiny_next_ptr_box.h:
+core/hakmem_tiny_config.h:
+core/tiny_nextptr.h:
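For quick reference, the node layout that the hakmem_tiny_superslab.c and pool_tls_remote.c hunks restore for header classes (C1-C6) can be sketched with standard C only. HEADER_MAGIC and HEADER_CLASS_MASK are the fallback values from ss_refill_fc.h, the base+1 next offset follows the pool_tls_remote.c comment, and tiny_node_relink() is a hypothetical stand-in for tiny_next_write():

```c
#include <stdint.h>
#include <string.h>

#define HEADER_MAGIC      0xA0   // fallback values, as in ss_refill_fc.h
#define HEADER_CLASS_MASK 0x0F

// Chain a header-class (C1-C6) node without clobbering its header byte.
static inline void tiny_node_relink(int class_idx, void* base, void* next) {
    // Byte 0 keeps the class header, so safeheader checks accept the node later.
    *(uint8_t*)base = (uint8_t)(HEADER_MAGIC | (class_idx & HEADER_CLASS_MASK));
    // The next pointer lives behind the header (base+1), conceptually what
    // tiny_next_write() does for these classes; memcpy tolerates misalignment.
    memcpy((uint8_t*)base + 1, &next, sizeof next);
}
```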