diff --git a/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md b/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md
new file mode 100644
index 00000000..27292605
--- /dev/null
+++ b/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md
@@ -0,0 +1,238 @@
+# Gemini Enhanced: Tiny Allocator Refactoring Plan
+## Objective: Safe Alignment & Structural Integrity
+
+This document outlines a revised refactoring plan for `hakmem`'s Tiny Allocator.
+It builds on the original ChatGPT proposal but adds specific safeguards against memory bloat and enforces type safety before any major layout change.
+
+**Primary Goal:** Eliminate the `sh8bench` memory corruption (and similar future bugs) caused by ~~misalignment/odd-address returns~~ **header restoration gaps at Box boundaries**, without doubling memory consumption.
+
+> **2025-12-03 Review Notes (Claude Code + Task Agent + Gemini Final Report):**
+> - Phase 0.1-0.2: Already implemented (`ptr_type_box.h`, `ptr_conversion_box.h`)
+> - Phase 0.3: ~~Premise is unverified~~ **VERIFIED by Gemini** (see `tls_sll_hdr_reset_final_report.md`)
+> - Phase 2: ~~"Headerless" strategy is over-engineering~~ **RECONSIDERED** - the alignment guarantee is a legitimate long-term goal
+> - Phase 3.1: ABORT is too aggressive; the current NORMALIZE + log behavior is the correct fail-safe
+>
+> **2025-12-03 Update (Gemini Final Report):**
+> Gemini mathematically proved that sh8bench adds +1 to odd malloc returns:
+> - Log analysis: `node=0xe1` → `user_ptr=0xe2` → expected `0xe1` = +1 delta
+> - ASan doesn't reproduce the bug because it adds a Redzone → alignment guaranteed → no +1 needed
+> - Conclusion: hakmem's odd-address returns cause compatibility issues with some applications
+
+---
+
+## Phase 0: Type Safety & Reproduction (The "Safety Net")
+
+Before changing *how* memory is laid out, we must rigorously define *how pointers are handled* to prevent manual arithmetic errors (`ptr + 1`).
+
+### 0.1. Implement "Phantom Types" (`core/box/ptr_type_box.h`) ✅ DONE
+
+Replace raw `void*` with strictly typed structures in Debug mode. This forces compiler errors on any manual pointer arithmetic.
+
+```c
+// Debug Mode (Strict)
+typedef struct { void* addr; } hak_base_ptr_t; // Internal: starts at the allocation boundary (Header/Metadata)
+typedef struct { void* addr; } hak_user_ptr_t; // External: starts at the user payload (returned to the malloc caller)
+
+// Release Mode (Zero Cost)
+typedef void* hak_base_ptr_t;
+typedef void* hak_user_ptr_t;
+```
+
+**Status:** Implemented in `core/box/ptr_type_box.h`. Used in `tls_sll_box.h`, `free_local_box.h`, etc.
+
+### 0.2. The "Converter Box" API ✅ DONE
+
+Centralize all pointer math. **No other file** should calculate offsets.
+
+* `hak_user_ptr_t hak_base_to_user(hak_base_ptr_t base, int class_idx);`
+* `hak_base_ptr_t hak_user_to_base(hak_user_ptr_t user, int class_idx);`
+
+**Status:** Implemented in `core/box/ptr_conversion_box.h`.
+
+### 0.3. Create `sh8bench` Reproducer ✅ VERIFIED
+
+Create a standalone minimal test (`tests/repro_misalign.c`) that:
+1. Allocates a Tiny block (returning an odd address).
+2. Manually aligns it to an even/16B boundary (simulating `sh8bench` behavior).
+3. Writes past the end (neighbor corruption).
+4. Frees the pointer.
+
+**Status:** The hypothesis has been **mathematically verified** by Gemini (see `docs/tls_sll_hdr_reset_final_report.md`).
+
+**Gemini's Proof (2025-12-03):**
+- Log analysis: `node=0x...e1` in `tls_sll_push()` means `user_ptr = 0x...e2` (since push receives `user_ptr - 1`)
+- Expected malloc return: `0x...e1` (Base + 1 for Class 1)
+- Delta: `0xe2 - 0xe1 = +1`, i.e. sh8bench adds +1 to odd addresses
+
+**Why ASan doesn't reproduce it:**
+- ASan adds a Redzone around allocations → alignment guaranteed (16/32B boundary)
+- With aligned addresses, sh8bench doesn't need to add +1
+- Therefore, no `NORMALIZE_USERPTR` and no neighbor corruption
+
+**Conclusion:** The original plan was correct. The reproducer test should simulate this +1 behavior.
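+
+A minimal sketch of what `tests/repro_misalign.c` could look like is shown below. It is an assumption-laden illustration, not sh8bench itself: the allocation size, loop count, and the odd-address check are placeholders, and the out-of-bounds write is intentional (it is the corruption being reproduced).
+
+```c
+// tests/repro_misalign.c (sketch): simulate an application that "fixes up"
+// an odd malloc() return by adding +1, then uses and frees the shifted pointer.
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int main(void) {
+    for (int i = 0; i < 1024; i++) {
+        size_t sz = 16;                        /* pick a Tiny class (placeholder) */
+        unsigned char *p = malloc(sz);
+        if (!p) return 1;
+
+        unsigned char *q = p;
+        if ((uintptr_t)p & 1)
+            q = p + 1;                         /* sh8bench-style +1 "alignment"   */
+
+        memset(q, 0xAB, sz);                   /* when q == p+1, the last byte     */
+                                               /* lands in the next block's header */
+        free(q);                               /* frees the shifted pointer        */
+    }
+    puts("done: check logs for TLS_SLL_NORMALIZE_USERPTR / TLS_SLL_HDR_RESET");
+    return 0;
+}
+```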
+
+---
+
+## Phase 1: Logic Centralization (The "Cleanup")
+
+Stop the scattered manual offset logic in `tls_sll_box.h`, `tiny_free.inc`, etc.
+
+### 1.1. Adopt Phantom Types Globally 🔄 IN PROGRESS
+
+Refactor `hakmem_tiny.c`, `tls_sll_box.h`, and `tiny_nextptr.h` to accept/return `hak_base_ptr_t` or `hak_user_ptr_t`.
+* **Rule:** `tls_sll_push` / freelist operations MUST use `hak_base_ptr_t`.
+* **Rule:** `malloc` returns / `free` inputs are `hak_user_ptr_t`.
+
+**Status:** Partially done. `tls_sll_box.h` uses `hak_base_ptr_t`. The remaining files still need an audit.
+
+### 1.2. Centralize Layout Logic ❌ PENDING
+
+Deprecate scattered `sizeof(void*)` or `+1` math. Move class layout definitions to `core/box/tiny_layout_box.h`.
+
+**Status:** Not started. Consider merging with the existing `tiny_geometry_box.h`.
+
+---
+
+## Phase 2: Strategic Layout Change (The "Hard Problem") 🔄 RECONSIDERED
+
+> **2025-12-03 Review (Updated with Gemini Final Report):**
+>
+> **Original Assessment:** "alignment is not proven" → **CORRECTED**
+>
+> Gemini's final report (`tls_sll_hdr_reset_final_report.md`) mathematically proved that:
+> 1. sh8bench adds +1 to odd malloc returns (an implicit alignment expectation)
+> 2. This causes neighbor-block corruption → TLS_SLL_HDR_RESET
+> 3. ASan doesn't reproduce the bug because it provides an alignment guarantee
+>
+> **Two Distinct Issues:**
+> - **Issue A (Fixed):** Header restoration gaps at Box boundaries → fixed by commits `3c6c76cb1`, `a94344c1a`, `6154e7656`, `6df1bdec3`
+> - **Issue B (Root Cause):** hakmem returns odd addresses, violating the `alignof(max_align_t)` expectation → **Requires Phase 2**
+>
+> **Recommendation:** Phase 2 is a **legitimate long-term goal** for C standard compliance.
+> However, the current "Atomic Fence + Header Write" approach provides an effective defense.
+
+**Challenge:** Changing the layout to guarantee alignment (e.g., 16B user alignment) usually requires padding, which wastes memory (e.g., 16B Data + 16B Header = 32B stride -> 100% overhead).
+
+### 2.1. Strategy Selection: "Headerless Allocated" (Recommended for Long-Term)
+
+Instead of adding padding, remove the inline header for allocated blocks (see the free-path sketch after §2.2).
+
+* **Free State (Inside Allocator):**
+  * Block contains the `Next Pointer` (and an optional Header if space permits) at offset 0.
+  * Alignment: natural.
+* **Allocated State (User):**
+  * **No inline header.** The user gets `Base + 0`.
+  * Alignment: perfect (same as Base).
+  * **Metadata Recovery:** On `free()`, use the `SuperSlab Registry` or a `Bitmap` to identify the Size Class.
+
+**Status:** Recommended for the long term. The current header-based design works with the defensive measures in place.
+
+### 2.2. Implementation (Version 2 Layout) 📋 PLANNED
+
+**Status:** Planned for future implementation. Priority: Medium (after stability is confirmed).
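+
+A rough sketch of what the §2.1 free path could look like is shown below. `hak_super_lookup()`, the `SuperSlab` fields, and the `tls_sll_push()` signature are assumptions made for illustration; the real registry and SLL interfaces live in the existing Boxes.
+
+```c
+// Headerless free() sketch: recover the size class from a SuperSlab registry
+// instead of an inline header. All names and signatures here are illustrative.
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct SuperSlab {
+    uintptr_t base;        /* start of this slab's block area (assumed field) */
+    size_t    block_size;  /* stride of the class it serves   (assumed field) */
+    int       class_idx;   /* Tiny size class                 (assumed field) */
+} SuperSlab;
+
+extern SuperSlab* hak_super_lookup(void* addr);            /* assumed registry lookup */
+extern void       tls_sll_push(int class_idx, void* base); /* assumed signature       */
+
+static void tiny_free_headerless(void* user_ptr) {
+    SuperSlab* ss = hak_super_lookup(user_ptr);
+    if (!ss) return;  /* not a Tiny block: defer to the non-Tiny free path */
+
+    /* No inline header: user_ptr equals the block base, so the class comes from
+       the registry and the base is recovered by rounding down to the stride. */
+    size_t offset = (size_t)((uintptr_t)user_ptr - ss->base);
+    void*  base   = (void*)(ss->base + (offset / ss->block_size) * ss->block_size);
+
+    tls_sll_push(ss->class_idx, base);
+}
+```
+
+The trade-off is one registry lookup per `free()` in exchange for `Base + 0` user pointers and natural alignment.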
+
+---
+
+## Phase 3: Fail-Fast Boundaries & Validation ⚠️ REVISED
+
+### 3.1. The "Gatekeeper" Box (Revised Approach)
+
+~~In `free()` and `realloc()`:~~
+~~* Check alignment of incoming `user_ptr`.~~
+~~* If `(ptr % ALIGNMENT != 0)`: **ABORT IMMEDIATELY**.~~
+~~* Do not attempt to "fix" or "normalize" the pointer (which masks bugs like `sh8bench`'s).~~
+
+**REVISION:** The current `NORMALIZE_USERPTR` behavior is a **correct fail-safe**, not a bug mask.
+
+**Current behavior (correct):**
+1. Detect the pointer delta via a stride check (`tls_sll_box.h:96`)
+2. Log `[TLS_SLL_NORMALIZE_USERPTR]` with detailed info
+3. Normalize to the correct base pointer
+4. Continue operation
+
+**Why ABORT is wrong:**
+- Loses debugging context (no logs before the crash)
+- Breaks compatibility with existing workloads
+- The normalization is defensive, not masking
+
+**Revised Gatekeeper design:**
+```c
+// In free() entry:
+if (pointer_delta != 0) {
+    // Log detailed info (already implemented)
+    fprintf(stderr, "[TLS_SLL_NORMALIZE_USERPTR] cls=%d node=%p -> base=%p stride=%zu\n", ...);
+
+    // Normalize and continue (fail-safe)
+    base = normalize(user);
+
+    // Optional: track frequency for monitoring
+    atomic_increment(&g_normalize_count);
+}
+```
+
+### 3.2. Distinguish Errors ✅ IMPLEMENTED
+
+Differentiate between:
+* `TLS_SLL_HDR_RESET` (internal/neighbor corruption detected *after* a safe push).
+* ~~`ALIGNMENT_FAULT`~~ → `TLS_SLL_NORMALIZE_USERPTR` (external pointer delta detected *before* processing).
+
+**Status:** Already implemented in the current codebase.
+
+---
+
+## Phase 4: Rollout & Tuning
+
+1. ~~**A/B Testing:** Use `HAKMEM_TINY_LAYOUT=v2` to toggle the new Headerless/Aligned layout.~~
+   **Revised:** A/B testing for header-write behavior already exists via `HAKMEM_TINY_WRITE_HEADER`.
+
+2. ~~**Verify `sh8bench`:** Confirm it crashes with `ALIGNMENT_FAULT`.~~
+   **Revised:** The current behavior is correct: sh8bench runs, with TLS_SLL_HDR_RESET emitted as a warning rather than a crash.
+
+3. **Benchmark:** Ensure header validation doesn't regress performance compared to no-header builds.
+   **Status:** Pending. Use `HAKMEM_TINY_HEADER_CLASSIDX=0` vs the default to compare.
+
+---
+
+## Summary of Changes vs Original Plan
+
+1. **Added Phase 0 (Phantom Types):** ~~Prevents "refactoring bugs" where we mix up Base/User pointers.~~ ✅ **DONE**
+2. ~~**Changed Phase 2 Strategy:** Explicitly recommends **"Headerless"** over "Padding" to avoid 2x memory usage on small blocks.~~ ⚠️ **DEPRIORITIZED** - the root cause was not alignment
+3. ~~**Strict Fail-Fast:** Instead of normalizing bad pointers (current behavior), we explicitly reject them to identify the root cause (external app bug).~~ ⚠️ **REVISED** - the current normalize-and-continue is the correct fail-safe behavior
+
+---
+
+## Appendix: Root Cause Analysis (2025-12-03)
+
+### Verified Root Causes (Fixed)
+
+| Issue | Root Cause | Fix | Commit |
+|-------|-----------|-----|--------|
+| unified_cache_refill SEGFAULT | Compiler reordering the header write after next_read | Move header write first + atomic fence | `6154e7656` |
+| Missing LRU registry entry | SuperSlab popped from the LRU without re-registration | Add `hak_super_register()` after LRU pop | `4cc2d8add` |
+| TLS SLL header corruption | Header not written at Box boundaries | Add header write at freelist→SLL transitions | `3c6c76cb1`, `a94344c1a` |
+| TLS SLL race condition | Missing memory barrier in push | Add atomic fence in push_impl | `6df1bdec3` |
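+
+The first and last rows of the table above are both ordering fixes: the header must be restored before the node is published, and a fence must keep the compiler/CPU from reordering the two. The sketch below illustrates that pattern under assumptions made for this document (a 1-byte class-index header at Base, the next pointer stored just past it, and illustrative helper names); the real code uses the `tiny_nextptr.h` / `tls_sll_box.h` helpers.
+
+```c
+#include <stdatomic.h>
+#include <stdint.h>
+#include <string.h>
+
+/* assumed layout: 1-byte class-index header at Base, next pointer stored in
+   the free block's payload (written via memcpy to stay unaligned-safe) */
+static inline void hdr_write(uint8_t* base, int class_idx) {
+    base[0] = (uint8_t)class_idx;
+}
+static inline void next_write(uint8_t* base, void* next) {
+    memcpy(base + 1, &next, sizeof next);
+}
+
+static void sll_push_safe(void** head, uint8_t* base, int class_idx) {
+    hdr_write(base, class_idx);                 /* 1. restore the header first    */
+    atomic_thread_fence(memory_order_release);  /* 2. forbid reordering past this */
+    next_write(base, *head);                    /* 3. only now link into the SLL  */
+    *head = base;
+}
+```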
+
+### Verified Hypotheses (2025-12-03 Gemini Final Report)
+
+| Hypothesis | Source | Evidence |
+|------------|--------|----------|
+| "sh8bench adds +1 to pointer" | Gemini | ✅ **PROVEN** - log analysis: `node=0xe1` → `user_ptr=0xe2` = +1 delta |
+| "Alignment causes neighbor overwrite" | Gemini | ✅ **PROVEN** - the +1 offset causes a write into the next block's header |
+| "ASan provides alignment guarantee" | Gemini | ✅ **PROVEN** - Redzones force aligned returns → no +1 needed |
+
+### Long-Term Recommendations
+
+| Recommendation | Priority | Rationale |
+|----------------|----------|-----------|
+| "Headerless layout" (Phase 2) | Medium | Guarantees `alignof(max_align_t)` compliance |
+| Current defensive measures | High | Atomic Fence + Header Write effectively mitigates Issue B |
+| Reproducer test | Low | Useful for regression testing but not blocking |
+
+---
+
+## Next Steps
+
+1. **Complete Phase 1.1**: Audit the remaining files for direct `+1` arithmetic
+2. **Investigate the sh8bench source**: Determine its actual memory-access pattern
+3. **Benchmark header overhead**: Compare `HAKMEM_TINY_HEADER_CLASSIDX=1` vs `0` (a minimal timing harness is sketched below)
+4. **Consider Phase 2 only if**: Proven alignment issues emerge in production
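+
+For Next Steps item 3, a harness along the following lines can be run once with `HAKMEM_TINY_HEADER_CLASSIDX=1` and once with `=0`. It is only a sketch: the sizes and iteration count are placeholders rather than tuned benchmark parameters, and the project's real benchmarks remain the reference.
+
+```c
+#include <stdio.h>
+#include <stdlib.h>
+#include <time.h>
+
+int main(void) {
+    enum { ITERS = 1000000 };
+    size_t sizes[] = { 8, 16, 32, 64 };        /* a few Tiny classes (placeholders) */
+    struct timespec t0, t1;
+
+    clock_gettime(CLOCK_MONOTONIC, &t0);
+    for (int i = 0; i < ITERS; i++) {
+        void* p = malloc(sizes[i & 3]);
+        if (p) ((volatile char*)p)[0] = 1;     /* touch the block so the pair isn't elided */
+        free(p);
+    }
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+
+    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
+    printf("%.1f ns per malloc/free pair\n", ns / ITERS);
+    return 0;
+}
+```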