diff --git a/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md b/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md
new file mode 100644
index 00000000..27292605
--- /dev/null
+++ b/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md
@@ -0,0 +1,238 @@
+# Gemini Enhanced: Tiny Allocator Refactoring Plan
+## Objective: Safe Alignment & Structural Integrity
+
+This document outlines a revised refactoring plan for `hakmem`'s Tiny Allocator.
+It builds on the original ChatGPT proposal but adds specific safeguards against memory bloat and enforces type safety before any major layout change.
+
+**Primary Goal:** Eliminate the `sh8bench` memory corruption (and similar future bugs) caused by ~~misalignment/odd-address returns~~ **header restoration gaps at Box boundaries**, without doubling memory consumption.
+
+> **2025-12-03 Review Notes (Claude Code + Task Agent + Gemini Final Report):**
+> - Phase 0.1-0.2: Already implemented (`ptr_type_box.h`, `ptr_conversion_box.h`)
+> - Phase 0.3: ~~Premise is unverified~~ **VERIFIED by Gemini** (see `tls_sll_hdr_reset_final_report.md`)
+> - Phase 2: ~~"Headerless" strategy is over-engineering~~ **RECONSIDERED** - the alignment guarantee is a legitimate long-term goal
+> - Phase 3.1: ABORT is too aggressive; the current NORMALIZE + log behavior is the correct fail-safe
+>
+> **2025-12-03 Update (Gemini Final Report):**
+> Gemini mathematically proved that sh8bench adds +1 to odd malloc returns:
+> - Log analysis: `node=0xe1` → `user_ptr=0xe2` → expected `0xe1` = +1 delta
+> - ASan doesn't reproduce the bug because it adds a Redzone → alignment guaranteed → no +1 needed
+> - Conclusion: hakmem's odd-address returns cause compatibility issues with some applications
+
+---
+
+## Phase 0: Type Safety & Reproduction (The "Safety Net")
+
+Before changing *how* memory is laid out, we must rigorously define *how pointers are handled* to prevent manual arithmetic errors (`ptr + 1`).
+
+### 0.1. Implement "Phantom Types" (`core/box/ptr_type_box.h`) ✅ DONE
+
+Replace raw `void*` with strictly typed structures in Debug mode. This forces compiler errors on any manual pointer arithmetic.
+
+```c
+// Debug Mode (Strict)
+typedef struct { void* addr; } hak_base_ptr_t; // Internal: starts at the allocation boundary (Header/Metadata)
+typedef struct { void* addr; } hak_user_ptr_t; // External: starts at the user payload (returned to the malloc caller)
+
+// Release Mode (Zero Cost)
+typedef void* hak_base_ptr_t;
+typedef void* hak_user_ptr_t;
+```
+
+**Status:** Implemented in `core/box/ptr_type_box.h`. Used in `tls_sll_box.h`, `free_local_box.h`, etc.
+
+### 0.2. The "Converter Box" API ✅ DONE
+
+Centralize all pointer math. **No other file** should calculate offsets.
+
+* `hak_user_ptr_t hak_base_to_user(hak_base_ptr_t base, int class_idx);`
+* `hak_base_ptr_t hak_user_to_base(hak_user_ptr_t user, int class_idx);`
+
+**Status:** Implemented in `core/box/ptr_conversion_box.h`.
+
+### 0.3. Create `sh8bench` Reproducer ✅ VERIFIED
+
+Create a standalone minimal test (`tests/repro_misalign.c`) that:
+1. Allocates a Tiny block (returning an odd address).
+2. Manually aligns it to an even/16B boundary (simulating `sh8bench` behavior).
+3. Writes past the end (neighbor corruption).
+4. Frees the pointer.
+
+**Status:** The hypothesis has been **mathematically verified** by Gemini (see `docs/tls_sll_hdr_reset_final_report.md`).
+
+**Gemini's Proof (2025-12-03):**
+- Log analysis: `node=0x...e1` in `tls_sll_push()` means `user_ptr = 0x...e2` (since push receives `user_ptr - 1`)
+- Expected malloc return: `0x...e1` (Base + 1 for Class 1)
+- Delta: `0xe2 - 0xe1 = +1`, i.e. sh8bench adds +1 to odd addresses
+
+**Why ASan doesn't reproduce it:**
+- ASan adds a Redzone around allocations → alignment guaranteed (16/32B boundary)
+- With aligned addresses, sh8bench doesn't need to add +1
+- Therefore, no `NORMALIZE_USERPTR` and no neighbor corruption
+
+**Conclusion:** The original plan was correct. The reproducer test should simulate this +1 behavior.
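+
+A minimal sketch of what `tests/repro_misalign.c` could look like is shown below. It is an assumption-laden illustration, not sh8bench itself: the allocation size, loop count, and the odd-address check are placeholders, and the out-of-bounds write is intentional (it is the corruption being reproduced).
+
+```c
+// tests/repro_misalign.c (sketch): simulate an application that "fixes up"
+// an odd malloc() return by adding +1, then uses and frees the shifted pointer.
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int main(void) {
+    for (int i = 0; i < 1024; i++) {
+        size_t sz = 16;                        /* pick a Tiny class (placeholder) */
+        unsigned char *p = malloc(sz);
+        if (!p) return 1;
+
+        unsigned char *q = p;
+        if ((uintptr_t)p & 1)
+            q = p + 1;                         /* sh8bench-style +1 "alignment"   */
+
+        memset(q, 0xAB, sz);                   /* when q == p+1, the last byte     */
+                                               /* lands in the next block's header */
+        free(q);                               /* frees the shifted pointer        */
+    }
+    puts("done: check logs for TLS_SLL_NORMALIZE_USERPTR / TLS_SLL_HDR_RESET");
+    return 0;
+}
+```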
+
+---
+
+## Phase 1: Logic Centralization (The "Cleanup")
+
+Stop the scattered manual offset logic in `tls_sll_box.h`, `tiny_free.inc`, etc.
+
+### 1.1. Adopt Phantom Types Globally 🔄 IN PROGRESS
+
+Refactor `hakmem_tiny.c`, `tls_sll_box.h`, and `tiny_nextptr.h` to accept/return `hak_base_ptr_t` or `hak_user_ptr_t`.
+* **Rule:** `tls_sll_push` / freelist operations MUST use `hak_base_ptr_t`.
+* **Rule:** `malloc` returns / `free` inputs are `hak_user_ptr_t`.
+
+**Status:** Partially done. `tls_sll_box.h` uses `hak_base_ptr_t`. The remaining files still need an audit.
+
+### 1.2. Centralize Layout Logic ❌ PENDING
+
+Deprecate scattered `sizeof(void*)` or `+1` math. Move class layout definitions to `core/box/tiny_layout_box.h`.
+
+**Status:** Not started. Consider merging with the existing `tiny_geometry_box.h`.
+
+---
+
+## Phase 2: Strategic Layout Change (The "Hard Problem") 🔄 RECONSIDERED
+
+> **2025-12-03 Review (Updated with Gemini Final Report):**
+>
+> **Original Assessment:** "alignment is not proven" → **CORRECTED**
+>
+> Gemini's final report (`tls_sll_hdr_reset_final_report.md`) mathematically proved that:
+> 1. sh8bench adds +1 to odd malloc returns (an implicit alignment expectation)
+> 2. This causes neighbor-block corruption → TLS_SLL_HDR_RESET
+> 3. ASan doesn't reproduce the bug because it provides an alignment guarantee
+>
+> **Two Distinct Issues:**
+> - **Issue A (Fixed):** Header restoration gaps at Box boundaries → fixed by commits `3c6c76cb1`, `a94344c1a`, `6154e7656`, `6df1bdec3`
+> - **Issue B (Root Cause):** hakmem returns odd addresses, violating the `alignof(max_align_t)` expectation → **Requires Phase 2**
+>
+> **Recommendation:** Phase 2 is a **legitimate long-term goal** for C standard compliance.
+> However, the current "Atomic Fence + Header Write" approach provides an effective defense.
+
+**Challenge:** Changing the layout to guarantee alignment (e.g., 16B user alignment) usually requires padding, which wastes memory (e.g., 16B Data + 16B Header = 32B stride -> 100% overhead).
+
+### 2.1. Strategy Selection: "Headerless Allocated" (Recommended for Long-Term)
+
+Instead of adding padding, remove the inline header for allocated blocks (see the free-path sketch after §2.2).
+
+* **Free State (Inside Allocator):**
+  * Block contains the `Next Pointer` (and an optional Header if space permits) at offset 0.
+  * Alignment: natural.
+* **Allocated State (User):**
+  * **No inline header.** The user gets `Base + 0`.
+  * Alignment: perfect (same as Base).
+  * **Metadata Recovery:** On `free()`, use the `SuperSlab Registry` or a `Bitmap` to identify the Size Class.
+
+**Status:** Recommended for the long term. The current header-based design works with the defensive measures in place.
+
+### 2.2. Implementation (Version 2 Layout) 📋 PLANNED
+
+**Status:** Planned for future implementation. Priority: Medium (after stability is confirmed).
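+
+A rough sketch of what the §2.1 free path could look like is shown below. `hak_super_lookup()`, the `SuperSlab` fields, and the `tls_sll_push()` signature are assumptions made for illustration; the real registry and SLL interfaces live in the existing Boxes.
+
+```c
+// Headerless free() sketch: recover the size class from a SuperSlab registry
+// instead of an inline header. All names and signatures here are illustrative.
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct SuperSlab {
+    uintptr_t base;        /* start of this slab's block area (assumed field) */
+    size_t    block_size;  /* stride of the class it serves   (assumed field) */
+    int       class_idx;   /* Tiny size class                 (assumed field) */
+} SuperSlab;
+
+extern SuperSlab* hak_super_lookup(void* addr);            /* assumed registry lookup */
+extern void       tls_sll_push(int class_idx, void* base); /* assumed signature       */
+
+static void tiny_free_headerless(void* user_ptr) {
+    SuperSlab* ss = hak_super_lookup(user_ptr);
+    if (!ss) return;  /* not a Tiny block: defer to the non-Tiny free path */
+
+    /* No inline header: user_ptr equals the block base, so the class comes from
+       the registry and the base is recovered by rounding down to the stride. */
+    size_t offset = (size_t)((uintptr_t)user_ptr - ss->base);
+    void*  base   = (void*)(ss->base + (offset / ss->block_size) * ss->block_size);
+
+    tls_sll_push(ss->class_idx, base);
+}
+```
+
+The trade-off is one registry lookup per `free()` in exchange for `Base + 0` user pointers and natural alignment.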
+
+---
+
+## Phase 3: Fail-Fast Boundaries & Validation ⚠️ REVISED
+
+### 3.1. The "Gatekeeper" Box (Revised Approach)
+
+~~In `free()` and `realloc()`:~~
+~~* Check alignment of incoming `user_ptr`.~~
+~~* If `(ptr % ALIGNMENT != 0)`: **ABORT IMMEDIATELY**.~~
+~~* Do not attempt to "fix" or "normalize" the pointer (which masks bugs like `sh8bench`'s).~~
+
+**REVISION:** The current `NORMALIZE_USERPTR` behavior is a **correct fail-safe**, not a bug mask.
+
+**Current behavior (correct):**
+1. Detect the pointer delta via a stride check (`tls_sll_box.h:96`)
+2. Log `[TLS_SLL_NORMALIZE_USERPTR]` with detailed info
+3. Normalize to the correct base pointer
+4. Continue operation
+
+**Why ABORT is wrong:**
+- Loses debugging context (no logs before the crash)
+- Breaks compatibility with existing workloads
+- The normalization is defensive, not masking
+
+**Revised Gatekeeper design:**
+```c
+// In free() entry:
+if (pointer_delta != 0) {
+    // Log detailed info (already implemented)
+    fprintf(stderr, "[TLS_SLL_NORMALIZE_USERPTR] cls=%d node=%p -> base=%p stride=%zu\n", ...);
+
+    // Normalize and continue (fail-safe)
+    base = normalize(user);
+
+    // Optional: track frequency for monitoring
+    atomic_increment(&g_normalize_count);
+}
+```
+
+### 3.2. Distinguish Errors ✅ IMPLEMENTED
+
+Differentiate between:
+* `TLS_SLL_HDR_RESET` (internal/neighbor corruption detected *after* a safe push).
+* ~~`ALIGNMENT_FAULT`~~ → `TLS_SLL_NORMALIZE_USERPTR` (external pointer delta detected *before* processing).
+
+**Status:** Already implemented in the current codebase.
+
+---
+
+## Phase 4: Rollout & Tuning
+
+1. ~~**A/B Testing:** Use `HAKMEM_TINY_LAYOUT=v2` to toggle the new Headerless/Aligned layout.~~
+   **Revised:** A/B testing for header-write behavior already exists via `HAKMEM_TINY_WRITE_HEADER`.
+
+2. ~~**Verify `sh8bench`:** Confirm it crashes with `ALIGNMENT_FAULT`.~~
+   **Revised:** The current behavior is correct: sh8bench runs, with TLS_SLL_HDR_RESET emitted as a warning rather than a crash.
+
+3. **Benchmark:** Ensure header validation doesn't regress performance compared to no-header builds.
+   **Status:** Pending. Use `HAKMEM_TINY_HEADER_CLASSIDX=0` vs the default to compare.
+
+---
+
+## Summary of Changes vs Original Plan
+
+1. **Added Phase 0 (Phantom Types):** ~~Prevents "refactoring bugs" where we mix up Base/User pointers.~~ ✅ **DONE**
+2. ~~**Changed Phase 2 Strategy:** Explicitly recommends **"Headerless"** over "Padding" to avoid 2x memory usage on small blocks.~~ ⚠️ **DEPRIORITIZED** - the root cause was not alignment
+3. ~~**Strict Fail-Fast:** Instead of normalizing bad pointers (current behavior), we explicitly reject them to identify the root cause (external app bug).~~ ⚠️ **REVISED** - the current normalize-and-continue is the correct fail-safe behavior
+
+---
+
+## Appendix: Root Cause Analysis (2025-12-03)
+
+### Verified Root Causes (Fixed)
+
+| Issue | Root Cause | Fix | Commit |
+|-------|-----------|-----|--------|
+| unified_cache_refill SEGFAULT | Compiler reordering the header write after next_read | Move header write first + atomic fence | `6154e7656` |
+| Missing LRU registry entry | SuperSlab popped from the LRU without re-registration | Add `hak_super_register()` after LRU pop | `4cc2d8add` |
+| TLS SLL header corruption | Header not written at Box boundaries | Add header write at freelist→SLL transitions | `3c6c76cb1`, `a94344c1a` |
+| TLS SLL race condition | Missing memory barrier in push | Add atomic fence in push_impl | `6df1bdec3` |
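+
+The first and last rows of the table above are both ordering fixes: the header must be restored before the node is published, and a fence must keep the compiler/CPU from reordering the two. The sketch below illustrates that pattern under assumptions made for this document (a 1-byte class-index header at Base, the next pointer stored just past it, and illustrative helper names); the real code uses the `tiny_nextptr.h` / `tls_sll_box.h` helpers.
+
+```c
+#include <stdatomic.h>
+#include <stdint.h>
+#include <string.h>
+
+/* assumed layout: 1-byte class-index header at Base, next pointer stored in
+   the free block's payload (written via memcpy to stay unaligned-safe) */
+static inline void hdr_write(uint8_t* base, int class_idx) {
+    base[0] = (uint8_t)class_idx;
+}
+static inline void next_write(uint8_t* base, void* next) {
+    memcpy(base + 1, &next, sizeof next);
+}
+
+static void sll_push_safe(void** head, uint8_t* base, int class_idx) {
+    hdr_write(base, class_idx);                 /* 1. restore the header first    */
+    atomic_thread_fence(memory_order_release);  /* 2. forbid reordering past this */
+    next_write(base, *head);                    /* 3. only now link into the SLL  */
+    *head = base;
+}
+```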
+
+### Verified Hypotheses (2025-12-03 Gemini Final Report)
+
+| Hypothesis | Source | Evidence |
+|------------|--------|----------|
+| "sh8bench adds +1 to pointer" | Gemini | ✅ **PROVEN** - log analysis: `node=0xe1` → `user_ptr=0xe2` = +1 delta |
+| "Alignment causes neighbor overwrite" | Gemini | ✅ **PROVEN** - the +1 offset causes a write into the next block's header |
+| "ASan provides alignment guarantee" | Gemini | ✅ **PROVEN** - Redzones force aligned returns → no +1 needed |
+
+### Long-Term Recommendations
+
+| Recommendation | Priority | Rationale |
+|----------------|----------|-----------|
+| "Headerless layout" (Phase 2) | Medium | Guarantees `alignof(max_align_t)` compliance |
+| Current defensive measures | High | Atomic Fence + Header Write effectively mitigates Issue B |
+| Reproducer test | Low | Useful for regression testing but not blocking |
+
+---
+
+## Next Steps
+
+1. **Complete Phase 1.1**: Audit the remaining files for direct `+1` arithmetic
+2. **Investigate the sh8bench source**: Determine its actual memory-access pattern
+3. **Benchmark header overhead**: Compare `HAKMEM_TINY_HEADER_CLASSIDX=1` vs `0` (a minimal timing harness is sketched below)
+4. **Consider Phase 2 only if**: Proven alignment issues emerge in production
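+
+For Next Steps item 3, a harness along the following lines can be run once with `HAKMEM_TINY_HEADER_CLASSIDX=1` and once with `=0`. It is only a sketch: the sizes and iteration count are placeholders rather than tuned benchmark parameters, and the project's real benchmarks remain the reference.
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+
+int main(void) {
+    enum { ITERS = 1000000 };
+    size_t sizes[] = { 8, 16, 32, 64 };        /* a few Tiny classes (placeholders) */
+    struct timespec t0, t1;
+
+    clock_gettime(CLOCK_MONOTONIC, &t0);
+    for (int i = 0; i < ITERS; i++) {
+        void* p = malloc(sizes[i & 3]);
+        if (p) ((volatile char*)p)[0] = 1;     /* touch the block so the pair isn't elided */
+        free(p);
+    }
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+
+    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
+    printf("%.1f ns per malloc/free pair\n", ns / ITERS);
+    return 0;
+}
+```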