hakmem/docs/REFACTOR_PLAN_GEMINI_ENHANCED.md
Moe Charm (CI) 4a2bf30790 Update REFACTOR_PLAN to mark Phase 2 complete and document Magazine Spill fix
- Phase 2 Headerless implementation now complete
- Magazine Spill RAW pointer bug fixed in commit f3f75ba3d
- Both Headerless ON/OFF modes verified working
- Reorganized "Next Steps" to reflect completed/remaining work

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 17:16:19 +09:00


Gemini Enhanced: Tiny Allocator Refactoring Plan

Objective: Safe Alignment & Structural Integrity

This document outlines a revised refactoring plan for hakmem's Tiny Allocator. It builds upon the original ChatGPT proposal but incorporates specific safeguards against memory bloat and ensures type safety before major layout changes.

Primary Goal: Eliminate sh8bench memory corruption (and similar future bugs) caused by misaligned (odd-address) returns and by header restoration gaps at Box boundaries, without doubling memory consumption.

2025-12-03 Review Notes (Claude Code + Task Agent + Gemini Final Report):

  • Phase 0.1-0.2: Already implemented (ptr_type_box.h, ptr_conversion_box.h)
  • Phase 0.3: Premise, originally considered unverified, is now VERIFIED by Gemini (see tls_sll_hdr_reset_final_report.md)
  • Phase 2: "Headerless" strategy, originally judged over-engineering, has been RECONSIDERED: the alignment guarantee is a legitimate long-term goal
  • Phase 3.1: ABORT is too aggressive; the current NORMALIZE + log behavior is the correct fail-safe

2025-12-03 Update (Gemini Final Report): Gemini mathematically proved that sh8bench adds +1 to odd malloc returns:

  • Log analysis: node=0xe1 → user_ptr=0xe2, while the expected malloc return is 0xe1, so the delta is +1
  • ASan doesn't reproduce because it adds Redzone → alignment guaranteed → no +1 needed
  • Conclusion: hakmem's odd-address returns cause compatibility issues with some applications

Phase 0: Type Safety & Reproduction (The "Safety Net")

Before changing how memory is laid out, we must rigorously define how pointers are handled to prevent manual arithmetic errors (ptr + 1).

0.1. Implement "Phantom Types" (core/box/ptr_type_box.h) DONE

Replace raw void* with strictly typed structures in Debug mode. This forces compiler errors on any manual pointer arithmetic.

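```c
// Build guard shown with NDEBUG for illustration; ptr_type_box.h may use its own flag.
#ifndef NDEBUG
// Debug Mode (Strict): distinct struct types, so mixing them or doing arithmetic on them fails to compile
typedef struct { void* addr; } hak_base_ptr_t; // Internal: Starts at allocation boundary (Header/Metadata)
typedef struct { void* addr; } hak_user_ptr_t; // External: Starts at User Payload (Returned to malloc caller)
#else
// Release Mode (Zero Cost)
typedef void* hak_base_ptr_t;
typedef void* hak_user_ptr_t;
#endif
```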

Status: Implemented in core/box/ptr_type_box.h. Used in tls_sll_box.h, free_local_box.h, etc.

0.2. The "Converter Box" API DONE

Centralize all pointer math. No other file should calculate offsets.

  • hak_user_ptr_t hak_base_to_user(hak_base_ptr_t base, int class_idx);
  • hak_base_ptr_t hak_user_to_base(hak_user_ptr_t user, int class_idx);

Status: Implemented in core/box/ptr_conversion_box.h.
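A minimal sketch of what such a converter box could look like, assuming the headerful layout described in this plan where the user pointer sits one header byte past the base (the "Base + 1" offset cited for Class 1). The helper `tiny_hdr_size()` is hypothetical, and the real `ptr_conversion_box.h` may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Debug-mode phantom types, as in ptr_type_box.h. */
typedef struct { void* addr; } hak_base_ptr_t;
typedef struct { void* addr; } hak_user_ptr_t;

/* Hypothetical: a 1-byte inline header precedes the user payload for every
 * class. A headerless build would return 0 here. */
static inline size_t tiny_hdr_size(int class_idx) {
    (void)class_idx;
    return 1;
}

/* The only place allowed to add the header offset. */
static inline hak_user_ptr_t hak_base_to_user(hak_base_ptr_t base, int class_idx) {
    hak_user_ptr_t u = { (uint8_t*)base.addr + tiny_hdr_size(class_idx) };
    return u;
}

/* The only place allowed to subtract it. */
static inline hak_base_ptr_t hak_user_to_base(hak_user_ptr_t user, int class_idx) {
    hak_base_ptr_t b = { (uint8_t*)user.addr - tiny_hdr_size(class_idx) };
    return b;
}
```

The two structs are distinct types. In Debug mode, passing a `hak_user_ptr_t` where a `hak_base_ptr_t` is expected therefore fails to compile, which is the guarantee Phase 0.1 is after.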

0.3. Create sh8bench Reproducer VERIFIED

Create a standalone minimal test (tests/repro_misalign.c) that:

  1. Allocates a Tiny block (returning an odd address).
  2. Manually aligns it to even/16B boundary (simulating sh8bench behavior).
  3. Writes past the end (neighbor corruption).
  4. Frees the pointer.
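The steps can be approximated without linking hakmem by simulating one size class in a static buffer. This is a sketch under stated assumptions: a hypothetical 16-byte stride, a 1-byte header at offset 0 of each block, and the user pointer at `Base + 1`. The real `tests/repro_misalign.c` would go through hakmem's `malloc`/`free` instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_STRIDE   16    /* hypothetical block stride for one Tiny class */
#define SIM_HDR      0xA1  /* hypothetical 1-byte header value */
#define SIM_NBLOCKS  4

static _Alignas(16) uint8_t sim_slab[SIM_STRIDE * SIM_NBLOCKS];

/* Step 1: "allocate" block i; header at base[0], user pointer at base + 1 (odd). */
static uint8_t* sim_alloc(int i) {
    uint8_t* base = sim_slab + (size_t)i * SIM_STRIDE;
    base[0] = SIM_HDR;
    return base + 1;
}

/* Returns 1 if the neighbour block's header was clobbered, 0 otherwise. */
static int repro_misalign(void) {
    for (int i = 0; i < SIM_NBLOCKS; i++)
        sim_slab[(size_t)i * SIM_STRIDE] = SIM_HDR;

    uint8_t* user = sim_alloc(0);
    size_t usable = SIM_STRIDE - 1;          /* payload bytes after the header */

    /* Step 2: the application "re-aligns" an odd pointer by adding +1. */
    uint8_t* app = ((uintptr_t)user & 1u) ? user + 1 : user;

    /* Step 3: fill the full usable size from the shifted pointer;
     * the last byte lands on the next block's header. */
    memset(app, 0x55, usable);

    /* Step 4 (free) would now see a +1 delta and a damaged neighbour. */
    return sim_slab[SIM_STRIDE] != SIM_HDR;
}
```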

Status: The hypothesis has been mathematically verified by Gemini (see docs/tls_sll_hdr_reset_final_report.md).

Gemini's Proof (2025-12-03):

  • Log analysis: node=0x...e1 in tls_sll_push() means user_ptr = 0x...e2 (since push receives user_ptr - 1)
  • Expected malloc return: 0x...e1 (Base + 1 for Class 1)
  • Delta: 0xe2 - 0xe1 = +1 — sh8bench adds +1 to odd addresses

Why ASan doesn't reproduce:

  • ASan adds Redzone around allocations → alignment guaranteed (16/32B boundary)
  • With aligned addresses, sh8bench doesn't need to add +1
  • Therefore, no NORMALIZE_USERPTR and no neighbor corruption

Conclusion: Original plan was correct. The reproducer test should simulate this +1 behavior.


Phase 1: Logic Centralization (The "Cleanup")

Eliminate the manual offset logic scattered across tls_sll_box.h, tiny_free.inc, etc.

1.1. Adopt Phantom Types Globally 🔄 IN PROGRESS

Refactor hakmem_tiny.c, tls_sll_box.h, and tiny_nextptr.h to accept/return hak_base_ptr_t or hak_user_ptr_t.

  • Rule: tls_sll_push / freelist operations MUST use hak_base_ptr_t.
  • Rule: malloc returns / free inputs are hak_user_ptr_t.

Status: Partially done. tls_sll_box.h uses hak_base_ptr_t; the remaining files still need an audit.
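The two rules can be enforced by the function signatures themselves. The following is a minimal sketch that assumes the Debug-mode struct types and a singly linked free list whose next pointer lives at offset 0 of a free block, matching the free-state layout described in Phase 2. `tls_sll_push_sketch`, `tls_sll_pop_sketch`, and `tiny_free_entry_sketch` are hypothetical names, not the actual `tls_sll_box.h` API. The `hdr_size` parameter stands in for `hak_user_to_base()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { void* addr; } hak_base_ptr_t;  /* Debug-mode phantom types */
typedef struct { void* addr; } hak_user_ptr_t;

/* Hypothetical per-thread list head (the real one is per size class). */
static _Thread_local void* g_sll_head;

/* Rule: free-list operations take and return base pointers only. */
static void tls_sll_push_sketch(hak_base_ptr_t base) {
    memcpy(base.addr, &g_sll_head, sizeof g_sll_head);  /* next at offset 0 (assumed) */
    g_sll_head = base.addr;
}

static hak_base_ptr_t tls_sll_pop_sketch(void) {
    hak_base_ptr_t b = { g_sll_head };
    if (b.addr) memcpy(&g_sll_head, b.addr, sizeof g_sll_head);
    return b;
}

/* Rule: the public free path receives a user pointer and converts exactly once. */
static void tiny_free_entry_sketch(hak_user_ptr_t user, size_t hdr_size) {
    hak_base_ptr_t base = { (uint8_t*)user.addr - hdr_size };  /* stand-in for hak_user_to_base() */
    tls_sll_push_sketch(base);
    /* tls_sll_push_sketch(user);  <- would not compile: wrong phantom type */
}
```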

1.2. Centralize Layout Logic PENDING

Deprecate scattered sizeof(void*) or +1 math. Move class layout definitions to core/box/tiny_layout_box.h.

Status: Not started. Consider merging with existing tiny_geometry_box.h.
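A minimal sketch of what a centralized layout table might look like. Every value below is a placeholder: the class count, the power-of-two strides, and the 1-byte header are assumptions for illustration, not hakmem's actual class geometry. `tiny_layout_t` and `g_tiny_layout` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t stride;     /* bytes per block, including any inline header */
    uint8_t  hdr_size;   /* inline header bytes before the user payload (0 = headerless) */
    uint8_t  next_off;   /* offset of the free-list next pointer inside a free block */
} tiny_layout_t;

/* Placeholder geometry: power-of-two strides with a 1-byte header. */
static const tiny_layout_t g_tiny_layout[] = {
    {   8, 1, 0 }, {  16, 1, 0 }, {  32, 1, 0 }, {  64, 1, 0 },
    { 128, 1, 0 }, { 256, 1, 0 }, { 512, 1, 0 }, {1024, 1, 0 },
};
#define TINY_NUM_CLASSES (sizeof g_tiny_layout / sizeof g_tiny_layout[0])

/* Single source of truth for offsets: callers never write "+ 1" themselves. */
static inline size_t tiny_user_offset(int cls) {
    return g_tiny_layout[cls].hdr_size;
}

static inline size_t tiny_usable_size(int cls) {
    return (size_t)g_tiny_layout[cls].stride - g_tiny_layout[cls].hdr_size;
}
```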


Phase 2: Strategic Layout Change (The "Hard Problem") IMPLEMENTED (2025-12-03)

2025-12-03 Review (Updated with Gemini Final Report):

Original Assessment: "alignment is not proven" — CORRECTED

Gemini's final report (tls_sll_hdr_reset_final_report.md) mathematically proved that:

  1. sh8bench adds +1 to odd malloc returns (implicit alignment expectation)
  2. This causes neighbor block corruption → TLS_SLL_HDR_RESET
  3. ASan doesn't reproduce because it provides alignment guarantee

Two Distinct Issues:

  • Issue A (Fixed): Header restoration gaps at Box boundaries → Fixed by commits 3c6c76cb1, a94344c1a, 6154e7656, 6df1bdec3
  • Issue B (Root Cause): hakmem returns odd addresses, violating alignof(max_align_t) expectation → Requires Phase 2

Implementation Status: Phase 2 is now COMPLETE - Headerless mode implemented and verified. Magazine Spill RAW pointer bug fixed in commit f3f75ba3d (missing HAK_BASE_FROM_RAW wrapper). Both Headerless ON/OFF modes tested and working correctly.

Challenge: Changing the layout to ensure alignment (e.g., 16B user alignment) usually requires padding, which wastes memory (e.g., 16B Data + 16B Header = 32B stride -> 100% overhead).
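This explains why even a 1-byte header is expensive. Suppose both the block base and the user pointer must sit on an A-byte boundary. The header must then be padded to a full alignment unit before the payload, so stride = round_up(header, A) + round_up(payload, A). The helper below makes the arithmetic explicit. The function names are illustrative, and the 16-byte values are the example from the paragraph above.

```c
#include <stddef.h>

/* Round n up to a multiple of a (a must be a power of two). */
static size_t round_up(size_t n, size_t a) { return (n + a - 1) & ~(a - 1); }

/* Both base and user pointer a-aligned: the header occupies whole alignment units. */
static size_t padded_stride(size_t payload, size_t hdr, size_t align) {
    return round_up(hdr, align) + round_up(payload, align);
}

/* Overhead in percent relative to the payload. */
static unsigned overhead_pct(size_t payload, size_t stride) {
    return (unsigned)((stride - payload) * 100 / payload);
}
```

With `hdr = 1` and `align = 16`, a 16-byte payload needs a 32-byte stride, which is 100% overhead. With `hdr = 0`, the stride stays at 16 bytes with 0% overhead. This difference is the motivation for removing the inline header.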

Instead of adding padding, remove the inline header for allocated blocks.

  • Free State (Inside Allocator):
    • Block contains Next Pointer (and optional Header if space permits) at offset 0.
    • Alignment: Natural.
  • Allocated State (User):
    • No inline header. User gets Base + 0.
    • Alignment: Perfect (same as Base).
    • Metadata Recovery: On free(), use SuperSlab Registry or Bitmap to identify the Size Class.
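On the allocated side, the only question `free()` must answer is "which class owns this address?". The following is a minimal sketch of a registry lookup. It assumes SuperSlabs are aligned to a fixed power-of-two size, so the owning slab can be found by masking the pointer. `SS_SIZE`, the `ss_registry_*` functions, and the linear table are hypothetical simplifications. A real registry would use a hash or radix structure and would handle concurrency.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_SIZE  ((uintptr_t)1 << 20)   /* hypothetical: 1 MiB, SuperSlab-aligned */
#define SS_MAX   64

typedef struct { uintptr_t base; int class_idx; } ss_entry_t;

static ss_entry_t g_ss_registry[SS_MAX];
static int g_ss_count;

/* Called when a SuperSlab is carved for a size class; -1 on failure. */
static int ss_registry_add(uintptr_t ss_base, int class_idx) {
    if (g_ss_count == SS_MAX || (ss_base & (SS_SIZE - 1)) != 0) return -1;
    g_ss_registry[g_ss_count].base = ss_base;
    g_ss_registry[g_ss_count].class_idx = class_idx;
    g_ss_count++;
    return 0;
}

/* Headerless free(): recover the class from the address alone (-1 = not ours). */
static int ss_registry_class_of(const void* user) {
    uintptr_t ss_base = (uintptr_t)user & ~(SS_SIZE - 1);
    for (int i = 0; i < g_ss_count; i++)
        if (g_ss_registry[i].base == ss_base) return g_ss_registry[i].class_idx;
    return -1;
}
```

Because the user pointer is `Base + 0`, converting back to the base is the identity. The only extra per-free cost is this lookup.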

Status: IMPLEMENTED - Headerless mode working correctly, both ON/OFF modes tested and verified.

2.2. Implementation (Version 2 Layout) 📋 PLANNED

Status: Planned for future implementation. Priority: Medium (after stability confirmed).


Phase 3: Fail-Fast Boundaries & Validation ⚠️ REVISED

3.1. The "Gatekeeper" Box (Revised Approach)

Original proposal, in free() and realloc():

  • Check alignment of incoming user_ptr.
  • If (ptr % ALIGNMENT != 0): ABORT IMMEDIATELY.
  • Do not attempt to "fix" or "normalize" the pointer (which masks bugs like sh8bench's).

REVISION: The current NORMALIZE_USERPTR behavior is a correct fail-safe, not a bug mask.

Current behavior (correct):

  1. Detect pointer delta via stride check (tls_sll_box.h:96)
  2. Log [TLS_SLL_NORMALIZE_USERPTR] with detailed info
  3. Normalize to correct base pointer
  4. Continue operation

Why ABORT is wrong:

  • Loses debugging context (no logs before crash)
  • Breaks compatibility with existing workloads
  • The normalization is defensive, not masking

Revised Gatekeeper design:

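```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

// Optional: track frequency for monitoring
static atomic_ulong g_normalize_count;

// In free() entry. slab_base/stride come from the class geometry; the helper
// name and parameters are illustrative (the real check is in tls_sll_box.h).
static void* gatekeeper_normalize(int cls, void* node, uintptr_t slab_base, size_t stride) {
    size_t pointer_delta = (size_t)(((uintptr_t)node - slab_base) % stride);
    if (pointer_delta == 0)
        return node;                                  // on the block grid: nothing to do
    void* base = (char*)node - pointer_delta;         // normalize to the block start
    // Log detailed info (already implemented)
    fprintf(stderr, "[TLS_SLL_NORMALIZE_USERPTR] cls=%d node=%p -> base=%p stride=%zu\n",
            cls, node, base, stride);
    atomic_fetch_add_explicit(&g_normalize_count, 1, memory_order_relaxed);
    return base;                                      // normalize and continue (fail-safe)
}
```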

3.2. Distinguish Errors IMPLEMENTED

Differentiate between:

  • TLS_SLL_HDR_RESET (Internal/Neighbor corruption detected after safe push).
  • TLS_SLL_NORMALIZE_USERPTR (External pointer delta detected before processing; originally proposed as ALIGNMENT_FAULT).

Status: Already implemented in current codebase.
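The distinction comes down to when each check runs. The pointer-delta check runs on the incoming pointer before any list operation. The header check runs on a block that allocator code has already placed on the TLS SLL, so a mismatch there points to internal or neighbor corruption rather than a bad caller pointer. This is a minimal sketch with hypothetical names, and it assumes a 1-byte header at offset 0 of the block.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    TINY_CHECK_OK = 0,
    TINY_CHECK_NORMALIZE_USERPTR,  /* external: caller pointer is off the block grid */
    TINY_CHECK_HDR_RESET           /* internal: header of a queued block was damaged */
} tiny_check_t;

/* Before processing free(): is the pointer on the block grid of its slab? */
static tiny_check_t check_incoming(uintptr_t node, uintptr_t slab_base, size_t stride) {
    return ((node - slab_base) % stride) ? TINY_CHECK_NORMALIZE_USERPTR : TINY_CHECK_OK;
}

/* After a block is on the TLS SLL: does its header byte still match? */
static tiny_check_t check_queued(const uint8_t* base, uint8_t expected_hdr) {
    return (base[0] == expected_hdr) ? TINY_CHECK_OK : TINY_CHECK_HDR_RESET;
}
```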


Phase 4: Rollout & Tuning

  1. A/B Testing: (Original: use HAKMEM_TINY_LAYOUT=v2 to toggle the new Headerless/Aligned layout.) Revised: A/B testing for header write behavior already exists via HAKMEM_TINY_WRITE_HEADER.

  2. Verify sh8bench: (Original: confirm it crashes with ALIGNMENT_FAULT.) Revised: Current behavior is correct; sh8bench runs with TLS_SLL_HDR_RESET reported as a warning, not a crash.

  3. Benchmark: Ensure header validation doesn't regress performance compared to no-header builds. Status: Pending. Use HAKMEM_TINY_HEADER_CLASSIDX=0 vs default to compare.


Summary of Changes vs Original Plan

  1. Added Phase 0 (Phantom Types): Prevents "refactoring bugs" where we mix up Base/User pointers. DONE
  2. Changed Phase 2 Strategy: Explicitly recommends "Headerless" over "Padding" to avoid 2x memory usage on small blocks. ⚠️ Briefly DEPRIORITIZED (root cause was thought not to be alignment); RECONSIDERED after the Gemini final report and now IMPLEMENTED
  3. Strict Fail-Fast: Instead of normalizing bad pointers (current behavior), we explicitly reject them to identify the root cause (external app bug). ⚠️ REVISED - Current normalize-and-continue is correct fail-safe behavior

Appendix: Root Cause Analysis (2025-12-03)

Verified Root Causes (Fixed)

| Issue | Root Cause | Fix | Commit |
|---|---|---|---|
| unified_cache_refill SEGFAULT | Compiler reordering header write after next_read | Move header write first + atomic fence | 6154e7656 |
| LRU registry registration missing | SuperSlab popped from LRU without re-registration | Add hak_super_register() after LRU pop | 4cc2d8add |
| TLS SLL header corruption | Header not written at Box boundaries | Add header write at freelist→SLL transitions | 3c6c76cb1, a94344c1a |
| TLS SLL race condition | Missing memory barrier in push | Add atomic fence in push_impl | 6df1bdec3 |
| Magazine Spill RAW pointer conversion | Missing HAK_BASE_FROM_RAW() wrapper | Add HAK_BASE_FROM_RAW(p) conversion before tls_sll_push | f3f75ba3d |

Verified Hypotheses (2025-12-03 Gemini Final Report)

| Hypothesis | Source | Evidence |
|---|---|---|
| "sh8bench adds +1 to pointer" | Gemini | PROVEN: log analysis, node=0xe1 → user_ptr=0xe2 = +1 delta |
| "Alignment causes neighbor overwrite" | Gemini | PROVEN: +1 offset causes write to next block's header |
| "ASan provides alignment guarantee" | Gemini | PROVEN: Redzone forces aligned returns → no +1 needed |

Long-Term Recommendations

| Recommendation | Priority | Rationale |
|---|---|---|
| "Headerless layout" (Phase 2) | Medium | Guarantees alignof(max_align_t) compliance |
| Current defensive measures | High | Atomic Fence + Header Write effectively mitigates Issue B |
| Reproducer test | Low | Useful for regression testing but not blocking |

Completed Work (2025-12-03)

  • Phase 0: Type Safety & Reproduction (Complete)
  • Phase 1: Logic Centralization (Mostly complete; see Phase 1.2 below)
  • Phase 2: Strategic Layout Change (Complete; Headerless implementation working)
  • Phase 3.1: Gatekeeper Box (Complete; NORMALIZE + log is the correct behavior)
  • Phase 3.2: Error Distinction (Complete; TLS_SLL_HDR_RESET vs NORMALIZE_USERPTR)

Remaining Work

  1. Phase 1.2: Centralize Layout Logic (In Progress)

    • Move scattered offset definitions to tiny_layout_box.h
    • Ensure all class layout parameters are centralized
  2. Phase 1.1 Audit (Pending)

    • Verify all remaining files use hak_base_ptr_t/hak_user_ptr_t correctly
    • Remove any remaining direct +1 arithmetic in hakmem_tiny.c
  3. Performance Benchmarking (Pending)

    • Compare Headerless ON vs OFF performance
    • Verify ≤ 5% performance impact
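For the benchmarking item, here is a minimal sketch of a Tiny-class microbenchmark and the 5% budget check. It uses only standard `malloc`/`free` and POSIX `clock_gettime`, so it measures whichever allocator the binary is linked or preloaded with. It assumes you run it once per configuration (Headerless ON vs OFF) and compare the two averages by hand or by script. The function names are illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <time.h>

/* Average nanoseconds per malloc+free pair for a fixed tiny size. */
static double bench_tiny_ns(size_t size, long iters) {
    struct timespec t0, t1;
    void* volatile sink;  /* volatile keeps the compiler from eliding the pair */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        sink = malloc(size);
        free(sink);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

/* True if candidate is at most `budget` (e.g. 0.05 = 5%) slower than baseline. */
static int within_budget(double baseline_ns, double candidate_ns, double budget) {
    return candidate_ns <= baseline_ns * (1.0 + budget);
}
```

Record the average from each build, then call `within_budget(off_ns, on_ns, 0.05)` to check the ≤ 5% target.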