Files
hakmem/core/ptr_trace.h
Moe Charm (CI) 72b38bc994 Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets
## Root Cause Analysis (GPT5)

**Physical Layout Constraints**:
- Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed =  IMPOSSIBLE
- Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 =  POSSIBLE
- Class 7: 1KB → offset 0 (compatibility)

**Correct Specification**:
- HAKMEM_TINY_HEADER_CLASSIDX != 0:
  - Class 0, 7: next at offset 0 (overwrites header when on freelist)
  - Class 1-6: next at offset 1 (after header)
- HAKMEM_TINY_HEADER_CLASSIDX == 0:
  - All classes: next at offset 0

**Previous Bug**:
- Attempted "ALL classes offset 1" unification
- Class 0 with offset 1 caused immediate SEGV (9B > 8B block size)
- Mixed 2-arg/3-arg API caused confusion

## Fixes Applied

### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h)
```c
// Correct signatures
void tiny_next_write(int class_idx, void* base, void* next_value)
void* tiny_next_read(int class_idx, const void* base)

// Correct offset calculation
size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1;
```

### 2. Updated 123+ Call Sites Across 34 Files
- hakmem_tiny_hot_pop_v4.inc.h (4 locations)
- hakmem_tiny_fastcache.inc.h (3 locations)
- hakmem_tiny_tls_list.h (12 locations)
- superslab_inline.h (5 locations)
- tiny_fastcache.h (3 locations)
- ptr_trace.h (macro definitions)
- tls_sll_box.h (2 locations)
- + 27 additional files

Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)`
Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)`

### 3. Added Sentinel Detection Guards
- tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next
- tls_list_push(): Block nodes with sentinel in ptr or ptr->next
- Defense-in-depth against remote free sentinel leakage

## Verification (GPT5 Report)

**Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000`

**Results**:
-  Main loop completed successfully
-  Drain phase completed successfully
-  NO SEGV (previous crash at iteration 66151 is FIXED)
- ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers

**Analysis**:
- Class 0 immediate SEGV:  RESOLVED (correct offset 0 now used)
- 66K iteration crash:  RESOLVED (offset consistency fixed)
- Box API conflicts:  RESOLVED (unified 3-arg API)

## Technical Details

### Offset Logic Justification
```
Class 0:  8B block → next pointer (8B) fits ONLY at offset 0
Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header)
Class 2: 32B block → next pointer (8B) fits at offset 1
...
Class 6: 512B block → next pointer (8B) fits at offset 1
Class 7: 1024B block → offset 0 for legacy compatibility
```

### Files Modified (Summary)
- Core API: `box/tiny_next_ptr_box.h`
- Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h`
- TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h`
- SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h`
- Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h`
- Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h`
- Documentation: Multiple Phase E3 reports

## Remaining Work

None for Box API offset bugs - all structural issues resolved.

Future enhancements (non-critical):
- Periodic `grep -R '*(void**)' core/` to detect direct pointer access violations
- Enforce Box API usage via static analysis
- Document offset rationale in architecture docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00

130 lines
5.2 KiB
C
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

// ptr_trace.h - Pointer link read/write tracing (Debug-only, macro-replaceable)
//
// Purpose:
// - Centrally instrument single-linked list next pointer reads/writes
// - Zero overhead in release builds (macros devolve to raw ops)
// - Compact TLS ring for triage; optional dump at exit via env
//
// Usage:
// - Replace direct next ops with PTR_NEXT_WRITE / PTR_NEXT_READ
// - Tag with a short literal (e.g., "tls_push", "tls_pop", "splice")
// - Offsets should be header-aware (C0C6: +1, C7: +0)
//
// Control:
// - Compile-time: HAKMEM_PTR_TRACE (default: 1 for debug, 0 for release)
// - Runtime dump: HAKMEM_PTR_TRACE_DUMP=1 (prints ring at exit)
//
// Phase E1-CORRECT: Uses Box API (tiny_next_ptr_box.h) for offset calculation
#pragma once
#include <stdint.h>
#include <stddef.h>
#include "box/tiny_next_ptr_box.h" // Box API: tiny_next_read/write
#ifndef HAKMEM_PTR_TRACE
# if !HAKMEM_BUILD_RELEASE
# define HAKMEM_PTR_TRACE 1
# else
# define HAKMEM_PTR_TRACE 0
# endif
#endif
#if HAKMEM_PTR_TRACE
#include <stdio.h>
typedef struct {
const char* tag; // literal tag
int class_idx; // tiny class (0..7)
void* node; // node address (base)
void* val; // written/read value
size_t off; // offset used for access
} ptr_trace_ev_t;
#ifndef PTR_TRACE_CAP
# define PTR_TRACE_CAP 256
#endif
static __thread ptr_trace_ev_t g_ptr_trace_ring[PTR_TRACE_CAP];
static __thread uint32_t g_ptr_trace_idx = 0;
static __thread int g_ptr_trace_dump_registered = 0;
static inline void ptr_trace_record(const char* tag, int cls, void* node, void* val, size_t off) {
uint32_t i = g_ptr_trace_idx++;
g_ptr_trace_ring[i & (PTR_TRACE_CAP - 1)] = (ptr_trace_ev_t){ tag, cls, node, val, off };
}
static inline void ptr_trace_dump(void) {
fprintf(stderr, "\n[PTR_TRACE_DUMP] last=%u (cap=%u)\n", g_ptr_trace_idx, (unsigned)PTR_TRACE_CAP);
uint32_t n = (g_ptr_trace_idx < PTR_TRACE_CAP) ? g_ptr_trace_idx : PTR_TRACE_CAP;
for (uint32_t k = 0; k < n; k++) {
const ptr_trace_ev_t* e = &g_ptr_trace_ring[(g_ptr_trace_idx - n + k) & (PTR_TRACE_CAP - 1)];
fprintf(stderr, "[%3u] tag=%s cls=%d node=%p val=%p off=%zu\n",
k, e->tag ? e->tag : "?", e->class_idx, e->node, e->val, e->off);
}
}
static inline void ptr_trace_try_register_dump(void) {
if (g_ptr_trace_dump_registered) return;
g_ptr_trace_dump_registered = 1;
const char* env = getenv("HAKMEM_PTR_TRACE_DUMP");
if (!(env && *env && *env != '0')) return;
atexit(ptr_trace_dump);
}
// Immediate dump (Debug only) — static inline to avoid ODR/link conflicts under LTO
// Only dumps if HAKMEM_PTR_TRACE_VERBOSE=1 to avoid excessive output in debug builds
#if HAKMEM_PTR_TRACE
static inline void ptr_trace_dump_now(const char* reason) {
static int verbose_mode = -1;
if (verbose_mode == -1) {
const char* env = getenv("HAKMEM_PTR_TRACE_VERBOSE");
verbose_mode = (env && *env && *env != '0') ? 1 : 0;
}
if (!verbose_mode) return; // Skip verbose logging unless explicitly enabled
fprintf(stderr, "\n[PTR_TRACE_NOW] reason=%s last=%u (cap=%u)\n",
reason ? reason : "(null)", g_ptr_trace_idx, (unsigned)PTR_TRACE_CAP);
uint32_t n = (g_ptr_trace_idx < PTR_TRACE_CAP) ? g_ptr_trace_idx : PTR_TRACE_CAP;
for (uint32_t k = 0; k < n; k++) {
const ptr_trace_ev_t* e = &g_ptr_trace_ring[(g_ptr_trace_idx - n + k) & (PTR_TRACE_CAP - 1)];
fprintf(stderr, "[%3u] tag=%s cls=%d node=%p val=%p off=%zu\n",
k, e->tag ? e->tag : "?", e->class_idx, e->node, e->val, e->off);
}
}
#else
static inline void ptr_trace_dump_now(const char* reason) { (void)reason; }
#endif
// Phase E1-CORRECT: Use Box API for all next pointer operations
// Box API handles offset calculation internally based on class_idx
#define PTR_NEXT_WRITE(tag, cls, node, off, value) do { \
(void)(off); /* unused, Box API handles offset */ \
tiny_next_write((cls), (node), (value)); \
ptr_trace_record((tag), (cls), (node), (value), 1); \
ptr_trace_try_register_dump(); \
} while(0)
#define PTR_NEXT_READ(tag, cls, node, off, out_var) do { \
(void)(off); /* unused, Box API handles offset */ \
(out_var) = tiny_next_read((cls), (node)); \
ptr_trace_record((tag), (cls), (node), (out_var), 1); \
ptr_trace_try_register_dump(); \
} while(0)
#else // HAKMEM_PTR_TRACE == 0
// Phase E1-CORRECT: Use Box API for all next pointer operations (Release mode)
// Zero cost: Box API functions are static inline with compile-time flag evaluation
// Unified 2-argument API: ALL classes (C0-C7) use offset 1, class_idx no longer needed
#define PTR_NEXT_WRITE(tag, cls, node, off, value) \
do { (void)(tag); (void)(off); tiny_next_write((cls), (node), (value)); } while(0)
#define PTR_NEXT_READ(tag, cls, node, off, out_var) \
do { (void)(tag); (void)(off); (out_var) = tiny_next_read((cls), (node)); } while(0)
// Always provide a stub for release builds so callers can link
static inline void ptr_trace_dump_now(const char* reason) { (void)reason; }
#endif // HAKMEM_PTR_TRACE