- Phase 2 Headerless implementation now complete
- Magazine Spill RAW pointer bug fixed in commit f3f75ba3d
- Both Headerless ON/OFF modes verified working
- Reorganized "Next Steps" to reflect completed/remaining work
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
12 KiB
Gemini Enhanced: Tiny Allocator Refactoring Plan
Objective: Safe Alignment & Structural Integrity
This document outlines a revised refactoring plan for hakmem's Tiny Allocator.
It builds upon the original ChatGPT proposal but incorporates specific safeguards against memory bloat and ensures type safety before major layout changes.
Primary Goal: Eliminate sh8bench memory corruption (and similar future bugs) caused by misalignment/odd-address returns header restoration gaps at Box boundaries, without doubling memory consumption.
2025-12-03 Review Notes (Claude Code + Task Agent + Gemini Final Report):
- Phase 0.1-0.2: Already implemented (
ptr_type_box.h,ptr_conversion_box.h)- Phase 0.3:
Premise is unverifiedVERIFIED by Gemini (seetls_sll_hdr_reset_final_report.md)- Phase 2:
"Headerless" strategy is over-engineeringRECONSIDERED - alignment guarantee is legitimate long-term goal- Phase 3.1: ABORT is too aggressive; current NORMALIZE + log is correct fail-safe
2025-12-03 Update (Gemini Final Report): Gemini mathematically proved that sh8bench adds +1 to odd malloc returns:
- Log analysis:
node=0xe1→user_ptr=0xe2→ expected0xe1= +1 delta- ASan doesn't reproduce because it adds Redzone → alignment guaranteed → no +1 needed
- Conclusion: hakmem's odd-address returns cause compatibility issues with some applications
Phase 0: Type Safety & Reproduction (The "Safety Net")
Before changing how memory is laid out, we must rigorously define how pointers are handled to prevent manual arithmetic errors (ptr + 1).
0.1. Implement "Phantom Types" (core/box/ptr_type_box.h) ✅ DONE
Replace raw void* with strictly typed structures in Debug mode. This forces compiler errors on any manual pointer arithmetic.
// Debug Mode (Strict)
typedef struct { void* addr; } hak_base_ptr_t; // Internal: Starts at allocation boundary (Header/Metadata)
typedef struct { void* addr; } hak_user_ptr_t; // External: Starts at User Payload (Returned to malloc caller)
// Release Mode (Zero Cost)
typedef void* hak_base_ptr_t;
typedef void* hak_user_ptr_t;
Status: Implemented in core/box/ptr_type_box.h. Used in tls_sll_box.h, free_local_box.h, etc.
0.2. The "Converter Box" API ✅ DONE
Centralize all pointer math. No other file should calculate offsets.
hak_user_ptr_t hak_base_to_user(hak_base_ptr_t base, int class_idx);hak_base_ptr_t hak_user_to_base(hak_user_ptr_t user, int class_idx);
Status: Implemented in core/box/ptr_conversion_box.h.
0.3. Create sh8bench Reproducer ✅ VERIFIED
Create a standalone minimal test (tests/repro_misalign.c) that:
- Allocates a Tiny block (returning an odd address).
- Manually aligns it to even/16B boundary (simulating
sh8benchbehavior). - Writes past the end (neighbor corruption).
- Frees the pointer.
Status: The hypothesis has been mathematically verified by Gemini (see docs/tls_sll_hdr_reset_final_report.md).
Gemini's Proof (2025-12-03):
- Log analysis:
node=0x...e1intls_sll_push()meansuser_ptr = 0x...e2(since push receivesuser_ptr - 1) - Expected malloc return:
0x...e1(Base + 1 for Class 1) - Delta:
0xe2 - 0xe1 = +1— sh8bench adds +1 to odd addresses
Why ASan doesn't reproduce:
- ASan adds Redzone around allocations → alignment guaranteed (16/32B boundary)
- With aligned addresses, sh8bench doesn't need to add +1
- Therefore, no
NORMALIZE_USERPTRand no neighbor corruption
Conclusion: Original plan was correct. The reproducer test should simulate this +1 behavior.
Phase 1: Logic Centralization (The "Cleanup")
Stop scattered manual offset logic in tls_sll_box.h, tiny_free.inc, etc.
1.1. Adopt Phantom Types Globally 🔄 IN PROGRESS
Refactor hakmem_tiny.c, tls_sll_box.h, and tiny_nextptr.h to accept/return hak_base_ptr_t or hak_user_ptr_t.
- Rule:
tls_sll_push/freelistoperations MUST usehak_base_ptr_t. - Rule:
mallocreturns /freeinputs arehak_user_ptr_t.
Status: Partially done. tls_sll_box.h uses hak_base_ptr_t. Need audit of remaining files.
1.2. Centralize Layout Logic ❌ PENDING
Deprecate scattered sizeof(void*) or +1 math. Move class layout definitions to core/box/tiny_layout_box.h.
Status: Not started. Consider merging with existing tiny_geometry_box.h.
Phase 2: Strategic Layout Change (The "Hard Problem") ✅ IMPLEMENTED (2025-12-03)
2025-12-03 Review (Updated with Gemini Final Report):
Original Assessment: "alignment is not proven" — CORRECTED
Gemini's final report (
tls_sll_hdr_reset_final_report.md) mathematically proved that:
- sh8bench adds +1 to odd malloc returns (implicit alignment expectation)
- This causes neighbor block corruption → TLS_SLL_HDR_RESET
- ASan doesn't reproduce because it provides alignment guarantee
Two Distinct Issues:
- Issue A (Fixed): Header restoration gaps at Box boundaries → Fixed by commits
3c6c76cb1,a94344c1a,6154e7656,6df1bdec3- Issue B (Root Cause): hakmem returns odd addresses, violating
alignof(max_align_t)expectation → Requires Phase 2Implementation Status: Phase 2 is now COMPLETE - Headerless mode implemented and verified. Magazine Spill RAW pointer bug fixed in commit
f3f75ba3d(missing HAK_BASE_FROM_RAW wrapper). Both Headerless ON/OFF modes tested and working correctly.
Challenge: Changing the layout to ensure alignment (e.g., 16B user alignment) usually requires padding, which wastes memory (e.g., 16B Data + 16B Header = 32B stride -> 100% overhead).
2.1. Strategy Selection: "Headerless Allocated" (Recommended for Long-Term)
Instead of adding padding, remove the inline header for allocated blocks.
- Free State (Inside Allocator):
- Block contains
Next Pointer(and optional Header if space permits) at offset 0. - Alignment: Natural.
- Block contains
- Allocated State (User):
- No inline header. User gets
Base + 0. - Alignment: Perfect (same as Base).
- Metadata Recovery: On
free(), useSuperSlab RegistryorBitmapto identify the Size Class.
- No inline header. User gets
Status: ✅ IMPLEMENTED - Headerless mode working correctly, both ON/OFF modes tested and verified.
2.2. Implementation (Version 2 Layout) 📋 PLANNED
Status: Planned for future implementation. Priority: Medium (after stability confirmed).
Phase 3: Fail-Fast Boundaries & Validation ⚠️ REVISED
3.1. The "Gatekeeper" Box (Revised Approach)
In
free() and realloc():* Check alignment of incoming
user_ptr.* If
(ptr % ALIGNMENT != 0): ABORT IMMEDIATELY.* Do not attempt to "fix" or "normalize" the pointer (which masks bugs like sh8bench's).
REVISION: The current NORMALIZE_USERPTR behavior is correct fail-safe, not a bug mask.
Current behavior (correct):
- Detect pointer delta via stride check (
tls_sll_box.h:96) - Log
[TLS_SLL_NORMALIZE_USERPTR]with detailed info - Normalize to correct base pointer
- Continue operation
Why ABORT is wrong:
- Loses debugging context (no logs before crash)
- Breaks compatibility with existing workloads
- The normalization is defensive, not masking
Revised Gatekeeper design:
// In free() entry:
if (pointer_delta != 0) {
// Log detailed info (already implemented)
fprintf(stderr, "[TLS_SLL_NORMALIZE_USERPTR] cls=%d node=%p -> base=%p stride=%zu\n", ...);
// Normalize and continue (fail-safe)
base = normalize(user);
// Optional: Track frequency for monitoring
atomic_increment(&g_normalize_count);
}
3.2. Distinguish Errors ✅ IMPLEMENTED
Differentiate between:
TLS_SLL_HDR_RESET(Internal/Neighbor corruption detected after safe push).→ALIGNMENT_FAULTTLS_SLL_NORMALIZE_USERPTR(External pointer delta detected before processing).
Status: Already implemented in current codebase.
Phase 4: Rollout & Tuning
-
A/B Testing: UseRevised: A/B testing for header write behavior already exists viaHAKMEM_TINY_LAYOUT=v2to toggle the new Headerless/Aligned layout.HAKMEM_TINY_WRITE_HEADER. -
VerifyRevised: Current behavior is correct - sh8bench runs with TLS_SLL_HDR_RESET as warning, not crash.sh8bench: Confirm it crashes withALIGNMENT_FAULT -
Benchmark: Ensure header validation doesn't regress performance compared to no-header builds. Status: Pending. Use
HAKMEM_TINY_HEADER_CLASSIDX=0vs default to compare.
Summary of Changes vs Original Plan
- Added Phase 0 (Phantom Types):
Prevents "refactoring bugs" where we mix up Base/User pointers.✅ DONE Changed Phase 2 Strategy: Explicitly recommends "Headerless" over "Padding" to avoid 2x memory usage on small blocks.⚠️ DEPRIORITIZED - Root cause was not alignmentStrict Fail-Fast: Instead of normalizing bad pointers (current behavior), we explicitly reject them to identify the root cause (external app bug).⚠️ REVISED - Current normalize-and-continue is correct fail-safe behavior
Appendix: Root Cause Analysis (2025-12-03)
Verified Root Causes (Fixed)
| Issue | Root Cause | Fix | Commit |
|---|---|---|---|
| unified_cache_refill SEGVAULT | Compiler reordering header write after next_read | Move header write first + atomic fence | 6154e7656 |
| LRU registry未登録 | SuperSlab pop from LRU without re-registration | Add hak_super_register() after LRU pop |
4cc2d8add |
| TLS SLL header corruption | Header not written at Box boundaries | Add header write at freelist→SLL transitions | 3c6c76cb1, a94344c1a |
| TLS SLL race condition | Missing memory barrier in push | Add atomic fence in push_impl | 6df1bdec3 |
| Magazine Spill RAW pointer conversion | Missing HAK_BASE_FROM_RAW() wrapper | Add HAK_BASE_FROM_RAW(p) conversion before tls_sll_push | f3f75ba3d |
Verified Hypotheses (2025-12-03 Gemini Final Report)
| Hypothesis | Source | Evidence |
|---|---|---|
| "sh8bench adds +1 to pointer" | Gemini | ✅ PROVEN - Log analysis: node=0xe1 → user_ptr=0xe2 = +1 delta |
| "Alignment causes neighbor overwrite" | Gemini | ✅ PROVEN - +1 offset causes write to next block's header |
| "ASan provides alignment guarantee" | Gemini | ✅ PROVEN - Redzone forces aligned returns → no +1 needed |
Long-Term Recommendations
| Recommendation | Priority | Rationale |
|---|---|---|
| "Headerless layout" (Phase 2) | Medium | Guarantees alignof(max_align_t) compliance |
| Current defensive measures | High | Atomic Fence + Header Write effectively mitigates Issue B |
| Reproducer test | Low | Useful for regression testing but not blocking |
Completed Work (2025-12-03)
✅ Phase 0: Type Safety & Reproduction (Complete) ✅ Phase 1: Logic Centralization (Mostly complete - see 1.2 below) ✅ Phase 2: Strategic Layout Change (Complete - Headerless implementation working) ✅ Phase 3.1: Gatekeeper Box (Complete - NORMALIZE + log is correct behavior) ✅ Phase 3.2: Error Distinction (Complete - TLS_SLL_HDR_RESET vs NORMALIZE_USERPTR)
Remaining Work
-
Phase 1.2: Centralize Layout Logic (In Progress)
- Move scattered offset definitions to tiny_layout_box.h
- Ensure all class layout parameters are centralized
-
Phase 1.1 Audit (Pending)
- Verify all remaining files use hak_base_ptr_t/hak_user_ptr_t correctly
- Remove any remaining direct
+1arithmetic in hakmem_tiny.c
-
Performance Benchmarking (Pending)
- Compare Headerless ON vs OFF performance
- Verify ≤ 5% performance impact