2025-12-03 11:20:18 +09:00
# Gemini Enhanced: Tiny Allocator Refactoring Plan
## Objective: Safe Alignment & Structural Integrity
This document outlines a revised refactoring plan for `hakmem` 's Tiny Allocator.
It builds upon the original ChatGPT proposal but incorporates specific safeguards against memory bloat and ensures type safety before major layout changes.
**Primary Goal:** Eliminate `sh8bench` memory corruption (and similar future bugs) caused by ~~misalignment/odd-address returns~~ **header restoration gaps at Box boundaries** , without doubling memory consumption.
> **2025-12-03 Review Notes (Claude Code + Task Agent + Gemini Final Report):**
> - Phase 0.1-0.2: Already implemented (`ptr_type_box.h`, `ptr_conversion_box.h`)
> - Phase 0.3: ~~Premise is unverified~~ **VERIFIED by Gemini** (see `tls_sll_hdr_reset_final_report.md`)
> - Phase 2: ~~"Headerless" strategy is over-engineering~~ **RECONSIDERED** - alignment guarantee is legitimate long-term goal
> - Phase 3.1: ABORT is too aggressive; current NORMALIZE + log is correct fail-safe
>
> **2025-12-03 Update (Gemini Final Report):**
> Gemini mathematically proved that sh8bench adds +1 to odd malloc returns:
> - Log analysis: `node=0xe1` → `user_ptr=0xe2` → expected `0xe1` = +1 delta
> - ASan doesn't reproduce because it adds Redzone → alignment guaranteed → no +1 needed
> - Conclusion: hakmem's odd-address returns cause compatibility issues with some applications
---
## Phase 0: Type Safety & Reproduction (The "Safety Net")
Before changing *how* memory is laid out, we must rigorously define *how pointers are handled* to prevent manual arithmetic errors (`ptr + 1` ).
### 0.1. Implement "Phantom Types" (`core/box/ptr_type_box.h`) ✅ DONE
Replace raw `void*` with strictly typed structures in Debug mode. This forces compiler errors on any manual pointer arithmetic.
```c
// Debug Mode (Strict)
typedef struct { void* addr; } hak_base_ptr_t; // Internal: Starts at allocation boundary (Header/Metadata)
typedef struct { void* addr; } hak_user_ptr_t; // External: Starts at User Payload (Returned to malloc caller)
// Release Mode (Zero Cost)
typedef void* hak_base_ptr_t;
typedef void* hak_user_ptr_t;
```
**Status:** Implemented in `core/box/ptr_type_box.h` . Used in `tls_sll_box.h` , `free_local_box.h` , etc.
### 0.2. The "Converter Box" API ✅ DONE
Centralize all pointer math. **No other file** should calculate offsets.
* `hak_user_ptr_t hak_base_to_user(hak_base_ptr_t base, int class_idx);`
* `hak_base_ptr_t hak_user_to_base(hak_user_ptr_t user, int class_idx);`
**Status:** Implemented in `core/box/ptr_conversion_box.h` .
### 0.3. Create `sh8bench` Reproducer ✅ VERIFIED
Create a standalone minimal test (`tests/repro_misalign.c` ) that:
1. Allocates a Tiny block (returning an odd address).
2. Manually aligns it to even/16B boundary (simulating `sh8bench` behavior).
3. Writes past the end (neighbor corruption).
4. Frees the pointer.
**Status:** The hypothesis has been **mathematically verified** by Gemini (see `docs/tls_sll_hdr_reset_final_report.md` ).
**Gemini's Proof (2025-12-03):**
- Log analysis: `node=0x...e1` in `tls_sll_push()` means `user_ptr = 0x...e2` (since push receives `user_ptr - 1` )
- Expected malloc return: `0x...e1` (Base + 1 for Class 1)
- Delta: `0xe2 - 0xe1 = +1` — sh8bench adds +1 to odd addresses
**Why ASan doesn't reproduce:**
- ASan adds Redzone around allocations → alignment guaranteed (16/32B boundary)
- With aligned addresses, sh8bench doesn't need to add +1
- Therefore, no `NORMALIZE_USERPTR` and no neighbor corruption
**Conclusion:** Original plan was correct. The reproducer test should simulate this +1 behavior.
---
## Phase 1: Logic Centralization (The "Cleanup")
Stop scattered manual offset logic in `tls_sll_box.h` , `tiny_free.inc` , etc.
### 1.1. Adopt Phantom Types Globally 🔄 IN PROGRESS
Refactor `hakmem_tiny.c` , `tls_sll_box.h` , and `tiny_nextptr.h` to accept/return `hak_base_ptr_t` or `hak_user_ptr_t` .
* **Rule:** `tls_sll_push` / `freelist` operations MUST use `hak_base_ptr_t` .
* **Rule:** `malloc` returns / `free` inputs are `hak_user_ptr_t` .
**Status:** Partially done. `tls_sll_box.h` uses `hak_base_ptr_t` . Need audit of remaining files.
### 1.2. Centralize Layout Logic ❌ PENDING
Deprecate scattered `sizeof(void*)` or `+1` math. Move class layout definitions to `core/box/tiny_layout_box.h` .
**Status:** Not started. Consider merging with existing `tiny_geometry_box.h` .
---
2025-12-03 17:16:19 +09:00
## Phase 2: Strategic Layout Change (The "Hard Problem") ✅ IMPLEMENTED (2025-12-03)
2025-12-03 11:20:18 +09:00
> **2025-12-03 Review (Updated with Gemini Final Report):**
>
> **Original Assessment:** "alignment is not proven" — **CORRECTED**
>
> Gemini's final report (`tls_sll_hdr_reset_final_report.md`) mathematically proved that:
> 1. sh8bench adds +1 to odd malloc returns (implicit alignment expectation)
> 2. This causes neighbor block corruption → TLS_SLL_HDR_RESET
> 3. ASan doesn't reproduce because it provides alignment guarantee
>
> **Two Distinct Issues:**
> - **Issue A (Fixed):** Header restoration gaps at Box boundaries → Fixed by commits `3c6c76cb1`, `a94344c1a`, `6154e7656`, `6df1bdec3`
> - **Issue B (Root Cause):** hakmem returns odd addresses, violating `alignof(max_align_t)` expectation → **Requires Phase 2**
>
2025-12-03 17:16:19 +09:00
> **Implementation Status:** Phase 2 is now **COMPLETE** - Headerless mode implemented and verified.
> Magazine Spill RAW pointer bug fixed in commit `f3f75ba3d` (missing HAK_BASE_FROM_RAW wrapper).
> Both Headerless ON/OFF modes tested and working correctly.
2025-12-03 11:20:18 +09:00
**Challenge:** Changing the layout to ensure alignment (e.g., 16B user alignment) usually requires padding, which wastes memory (e.g., 16B Data + 16B Header = 32B stride -> 100% overhead).
### 2.1. Strategy Selection: "Headerless Allocated" (Recommended for Long-Term)
Instead of adding padding, remove the inline header for allocated blocks.
* **Free State (Inside Allocator):**
* Block contains `Next Pointer` (and optional Header if space permits) at offset 0.
* Alignment: Natural.
* **Allocated State (User):**
* **No inline header.** User gets `Base + 0` .
* Alignment: Perfect (same as Base).
* **Metadata Recovery:** On `free()` , use `SuperSlab Registry` or `Bitmap` to identify the Size Class.
2025-12-03 17:16:19 +09:00
**Status:** ✅ IMPLEMENTED - Headerless mode working correctly, both ON/OFF modes tested and verified.
2025-12-03 11:20:18 +09:00
### 2.2. Implementation (Version 2 Layout) 📋 PLANNED
**Status:** Planned for future implementation. Priority: Medium (after stability confirmed).
---
## Phase 3: Fail-Fast Boundaries & Validation ⚠️ REVISED
### 3.1. The "Gatekeeper" Box (Revised Approach)
~~In `free()` and `realloc()` :~~
~~* Check alignment of incoming `user_ptr` .~~
~~* If `(ptr % ALIGNMENT != 0)` : **ABORT IMMEDIATELY** .~~
~~* Do not attempt to "fix" or "normalize" the pointer (which masks bugs like `sh8bench` 's).~~
**REVISION:** The current `NORMALIZE_USERPTR` behavior is **correct fail-safe** , not a bug mask.
**Current behavior (correct):**
1. Detect pointer delta via stride check (`tls_sll_box.h:96` )
2. Log `[TLS_SLL_NORMALIZE_USERPTR]` with detailed info
3. Normalize to correct base pointer
4. Continue operation
**Why ABORT is wrong:**
- Loses debugging context (no logs before crash)
- Breaks compatibility with existing workloads
- The normalization is defensive, not masking
**Revised Gatekeeper design:**
```c
// In free() entry:
if (pointer_delta != 0) {
// Log detailed info (already implemented)
fprintf(stderr, "[TLS_SLL_NORMALIZE_USERPTR] cls=%d node=%p -> base=%p stride=%zu\n", ...);
// Normalize and continue (fail-safe)
base = normalize(user);
// Optional: Track frequency for monitoring
atomic_increment(&g_normalize_count);
}
```
### 3.2. Distinguish Errors ✅ IMPLEMENTED
Differentiate between:
* `TLS_SLL_HDR_RESET` (Internal/Neighbor corruption detected *after* safe push).
* ~~`ALIGNMENT_FAULT`~~ → `TLS_SLL_NORMALIZE_USERPTR` (External pointer delta detected *before* processing).
**Status:** Already implemented in current codebase.
---
## Phase 4: Rollout & Tuning
1. ~~**A/B Testing:** Use `HAKMEM_TINY_LAYOUT=v2` to toggle the new Headerless/Aligned layout.~~
**Revised:** A/B testing for header write behavior already exists via `HAKMEM_TINY_WRITE_HEADER` .
2. ~~**Verify `sh8bench`:** Confirm it crashes with `ALIGNMENT_FAULT`~~
**Revised:** Current behavior is correct - sh8bench runs with TLS_SLL_HDR_RESET as warning, not crash.
3. **Benchmark:** Ensure header validation doesn't regress performance compared to no-header builds.
**Status:** Pending. Use `HAKMEM_TINY_HEADER_CLASSIDX=0` vs default to compare.
---
## Summary of Changes vs Original Plan
1. **Added Phase 0 (Phantom Types):** ~~Prevents "refactoring bugs" where we mix up Base/User pointers.~~ ✅ **DONE**
2. ~~**Changed Phase 2 Strategy:** Explicitly recommends **"Headerless"** over "Padding" to avoid 2x memory usage on small blocks.~~ ⚠️ **DEPRIORITIZED** - Root cause was not alignment
3. ~~**Strict Fail-Fast:** Instead of normalizing bad pointers (current behavior), we explicitly reject them to identify the root cause (external app bug).~~ ⚠️ **REVISED** - Current normalize-and-continue is correct fail-safe behavior
---
## Appendix: Root Cause Analysis (2025-12-03)
### Verified Root Causes (Fixed)
| Issue | Root Cause | Fix | Commit |
|-------|-----------|-----|--------|
| unified_cache_refill SEGVAULT | Compiler reordering header write after next_read | Move header write first + atomic fence | `6154e7656` |
| LRU registry未登録 | SuperSlab pop from LRU without re-registration | Add `hak_super_register()` after LRU pop | `4cc2d8add` |
| TLS SLL header corruption | Header not written at Box boundaries | Add header write at freelist→SLL transitions | `3c6c76cb1` , `a94344c1a` |
| TLS SLL race condition | Missing memory barrier in push | Add atomic fence in push_impl | `6df1bdec3` |
2025-12-03 17:16:19 +09:00
| Magazine Spill RAW pointer conversion | Missing HAK_BASE_FROM_RAW() wrapper | Add HAK_BASE_FROM_RAW(p) conversion before tls_sll_push | `f3f75ba3d` |
2025-12-03 11:20:18 +09:00
### Verified Hypotheses (2025-12-03 Gemini Final Report)
| Hypothesis | Source | Evidence |
|------------|--------|----------|
| "sh8bench adds +1 to pointer" | Gemini | ✅ **PROVEN** - Log analysis: `node=0xe1` → `user_ptr=0xe2` = +1 delta |
| "Alignment causes neighbor overwrite" | Gemini | ✅ **PROVEN** - +1 offset causes write to next block's header |
| "ASan provides alignment guarantee" | Gemini | ✅ **PROVEN** - Redzone forces aligned returns → no +1 needed |
### Long-Term Recommendations
| Recommendation | Priority | Rationale |
|----------------|----------|-----------|
| "Headerless layout" (Phase 2) | Medium | Guarantees `alignof(max_align_t)` compliance |
| Current defensive measures | High | Atomic Fence + Header Write effectively mitigates Issue B |
| Reproducer test | Low | Useful for regression testing but not blocking |
---
2025-12-03 17:16:19 +09:00
## Completed Work (2025-12-03)
2025-12-03 11:20:18 +09:00
2025-12-03 17:16:19 +09:00
✅ Phase 0: Type Safety & Reproduction (Complete)
✅ Phase 1: Logic Centralization (Mostly complete - see 1.2 below)
✅ Phase 2: Strategic Layout Change (Complete - Headerless implementation working)
✅ Phase 3.1: Gatekeeper Box (Complete - NORMALIZE + log is correct behavior)
✅ Phase 3.2: Error Distinction (Complete - TLS_SLL_HDR_RESET vs NORMALIZE_USERPTR)
## Remaining Work
1. **Phase 1.2: Centralize Layout Logic** (In Progress)
- Move scattered offset definitions to tiny_layout_box.h
- Ensure all class layout parameters are centralized
2. **Phase 1.1 Audit** (Pending)
- Verify all remaining files use hak_base_ptr_t/hak_user_ptr_t correctly
- Remove any remaining direct `+1` arithmetic in hakmem_tiny.c
3. **Performance Benchmarking** (Pending)
- Compare Headerless ON vs OFF performance
- Verify ≤ 5% performance impact