hakmem/core/hakmem_tiny_magazine.c
Moe Charm (CI) 72b38bc994 Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets
## Root Cause Analysis (GPT5)

**Physical Layout Constraints**:
- Class 0: 8B = [1B header][7B payload] → next pointer at offset 1 would need 1B + 8B = 9B = IMPOSSIBLE
- Class 1-6: >=16B = [1B header][15B+ payload] → next pointer at offset 1 = POSSIBLE
- Class 7: 1KB → offset 0 (compatibility)

**Correct Specification**:
- HAKMEM_TINY_HEADER_CLASSIDX != 0:
  - Class 0, 7: next at offset 0 (overwrites header when on freelist)
  - Class 1-6: next at offset 1 (after header)
- HAKMEM_TINY_HEADER_CLASSIDX == 0:
  - All classes: next at offset 0
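
The specification above can be sketched as a single offset helper. This is a minimal illustration, not the contents of `tiny_next_ptr_box.h`: the name `sketch_next_offset` is hypothetical, and the flag is defaulted to 1 here only so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the build flag named in the commit message.
 * Defaulted to 1 (header mode) for this standalone sketch only. */
#ifndef HAKMEM_TINY_HEADER_CLASSIDX
#define HAKMEM_TINY_HEADER_CLASSIDX 1
#endif

/* Hypothetical helper: byte offset of the freelist next pointer in a block. */
static inline size_t sketch_next_offset(int class_idx) {
    assert(class_idx >= 0 && class_idx <= 7);
#if HAKMEM_TINY_HEADER_CLASSIDX
    /* Class 0 (8B) has no room for header + pointer; class 7 stays at
     * offset 0 for compatibility. Classes 1-6 skip the 1B header. */
    return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
#else
    /* No header byte: every class stores next at the block base. */
    return 0u;
#endif
}
```

With the flag enabled, classes 0 and 7 map to offset 0 and classes 1 through 6 map to offset 1, which matches the two bullet groups above.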

**Previous Bug**:
- Attempted "ALL classes offset 1" unification
- Class 0 with offset 1 caused immediate SEGV (9B > 8B block size)
- Mixed 2-arg/3-arg API caused confusion

## Fixes Applied

### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h)
```c
// Correct signatures
void tiny_next_write(int class_idx, void* base, void* next_value)
void* tiny_next_read(int class_idx, const void* base)

// Correct offset calculation
size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1;
```

### 2. Updated 123+ Call Sites Across 34 Files
- hakmem_tiny_hot_pop_v4.inc.h (4 locations)
- hakmem_tiny_fastcache.inc.h (3 locations)
- hakmem_tiny_tls_list.h (12 locations)
- superslab_inline.h (5 locations)
- tiny_fastcache.h (3 locations)
- ptr_trace.h (macro definitions)
- tls_sll_box.h (2 locations)
- + 27 additional files

Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)`
Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)`
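
To illustrate why every call site needs `class_idx`, here is a hedged sketch of a 3-argument read/write pair. The `sketch_*` names are hypothetical and the real Box API may be implemented differently; `memcpy` is used here because a pointer stored at offset 1 is unaligned, so a direct `*(void**)` access would not be portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical offset rule, taken from the commit message (header mode). */
static inline size_t sketch_next_off(int class_idx) {
    return (class_idx == 0 || class_idx == 7) ? 0u : 1u;
}

/* Store next_value in the block at the class-specific offset. */
static inline void sketch_next_write(int class_idx, void* base, void* next_value) {
    assert(class_idx >= 0 && class_idx <= 7);
    memcpy((unsigned char*)base + sketch_next_off(class_idx),
           &next_value, sizeof next_value);
}

/* Load the next pointer from the block at the class-specific offset. */
static inline void* sketch_next_read(int class_idx, const void* base) {
    void* next;
    assert(class_idx >= 0 && class_idx <= 7);
    memcpy(&next, (const unsigned char*)base + sketch_next_off(class_idx),
           sizeof next);
    return next;
}
```

Writing with one class index and reading with another would access different bytes, which is the kind of inconsistency a 2-argument call cannot prevent.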

### 3. Added Sentinel Detection Guards
- tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next
- tls_list_push(): Block nodes with sentinel in ptr or ptr->next
- Defense-in-depth against remote free sentinel leakage
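
A guard of this kind might look like the sketch below. The sentinel value `SKETCH_REMOTE_SENTINEL` and the function name are assumptions made for illustration; the commit message does not state the actual sentinel or how `tiny_fast_push()` and `tls_list_push()` reject a node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: placeholder value, not the allocator's real remote-free sentinel. */
#define SKETCH_REMOTE_SENTINEL ((uintptr_t)0xBADA55u)

/* Hypothetical guard: returns 1 if the node may be pushed, 0 if either the
 * node address or its stored next pointer equals the sentinel. */
static int sketch_push_allowed(int class_idx, const void* ptr) {
    void* next;
    size_t off;
    assert(class_idx >= 0 && class_idx <= 7);
    /* Check the address first so a sentinel is never dereferenced. */
    if ((uintptr_t)ptr == SKETCH_REMOTE_SENTINEL) return 0;
    off = (class_idx == 0 || class_idx == 7) ? 0u : 1u;
    memcpy(&next, (const unsigned char*)ptr + off, sizeof next);
    return (uintptr_t)next != SKETCH_REMOTE_SENTINEL;
}
```

A push routine would call the guard before linking the node and skip the push when it returns 0.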

## Verification (GPT5 Report)

**Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000`

**Results**:
- Main loop completed successfully
- Drain phase completed successfully
- NO SEGV (previous crash at iteration 66151 is FIXED)
- ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers

**Analysis**:
- Class 0 immediate SEGV: RESOLVED (correct offset 0 now used)
- 66K iteration crash: RESOLVED (offset consistency fixed)
- Box API conflicts: RESOLVED (unified 3-arg API)

## Technical Details

### Offset Logic Justification
```
Class 0:  8B block → next pointer (8B) fits ONLY at offset 0
Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header)
Class 2: 32B block → next pointer (8B) fits at offset 1
...
Class 6: 512B block → next pointer (8B) fits at offset 1
Class 7: 1024B block → offset 0 for legacy compatibility
```
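
The arithmetic behind the table can be checked mechanically. The sketch below assumes block sizes follow `8 << class_idx` (consistent with the 8B, 16B, 32B, 512B and 1024B rows listed; the elided rows are inferred) and an 8-byte next pointer as stated in the table. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: tiny block size doubles per class, starting at 8B for class 0. */
static size_t sketch_block_size(int class_idx) {
    assert(class_idx >= 0 && class_idx <= 7);
    return (size_t)8 << class_idx;
}

/* Returns 1 if an 8-byte next pointer stored at `offset` stays inside the block. */
static int sketch_next_fits(int class_idx, size_t offset) {
    const size_t next_ptr_size = 8; /* as stated in the table above */
    return offset + next_ptr_size <= sketch_block_size(class_idx);
}
```

For class 0, offset 1 gives 1 + 8 = 9 > 8, so the pointer overruns the block; for class 1, 1 + 8 = 9 <= 16, so it fits.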

### Files Modified (Summary)
- Core API: `box/tiny_next_ptr_box.h`
- Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h`
- TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h`
- SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h`
- Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h`
- Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h`
- Documentation: Multiple Phase E3 reports

## Remaining Work

None for Box API offset bugs - all structural issues resolved.

Future enhancements (non-critical):
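- Periodic `grep -RnF '*(void**)' core/` to detect direct pointer access violations (`-F` matches the pattern as a fixed string, so the `*` characters are not treated as regex operators)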
- Enforce Box API usage via static analysis
- Document offset rationale in architecture docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00


#include "hakmem_tiny_magazine.h"
#include "hakmem_tiny_config.h" // Centralized configuration
#include "hakmem_tiny.h" // For TINY_NUM_CLASSES
#include "hakmem_tiny_superslab.h"
#include "hakmem_super_registry.h" // Phase 1: For hak_super_lookup()
#include "tiny_remote.h"
#include "hakmem_prof.h"
#include "hakmem_internal.h"
#include "box/tiny_next_ptr_box.h" // Box API: Next pointer read/write
#include <pthread.h>
static inline uint32_t tiny_self_u32_guard(void) {
    return (uint32_t)(uintptr_t)pthread_self();
}
// Saturating decrement: never lets total_active_blocks wrap below zero.
static inline void superslab_dec_active_safe(SuperSlab* ss) {
    if (!ss) return;
    uint32_t old = atomic_load_explicit(&ss->total_active_blocks, memory_order_relaxed);
    while (old != 0u) {
        // A failed CAS reloads `old`, so the zero check is re-evaluated on retry.
        if (atomic_compare_exchange_weak_explicit(&ss->total_active_blocks,
                                                  &old,
                                                  old - 1u,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            break;
        }
    }
}
__thread TinyTLSMag g_tls_mags[TINY_NUM_CLASSES] = {0};
// Global cap limiter (can be reduced via env HAKMEM_TINY_MAG_CAP)
int g_mag_cap_limit = TINY_TLS_MAG_CAP;
// Normal-path per-class overrides (env tunables)
int g_mag_cap_override[TINY_NUM_CLASSES] = {0}; // HAKMEM_TINY_MAG_CAP_C{0..7}
__thread int g_tls_small_mags_inited = 0;
// tiny_default_cap() and tiny_cap_max_for_class() now defined as inline functions
// in hakmem_tiny_config.h for centralized configuration
int tiny_effective_cap(int class_idx) {
    // Env override takes precedence per class
    int ov = g_mag_cap_override[class_idx];
    if (ov > 0) return ov;
    return tiny_default_cap(class_idx); // Use centralized config function
}
void tiny_small_mags_init_once(void) {
    if (__builtin_expect(g_tls_small_mags_inited, 1)) return;
    for (int k = 0; k <= 3; k++) {
        TinyTLSMag* m = &g_tls_mags[k];
        if (m->cap == 0) {
            int base = tiny_effective_cap(k);
            int cap = (base < TINY_TLS_MAG_CAP) ? base : TINY_TLS_MAG_CAP;
            if (g_mag_cap_limit < cap) cap = g_mag_cap_limit;
            m->cap = cap;
            m->top = 0;
        }
    }
    g_tls_small_mags_inited = 1;
}
void tiny_mag_init_if_needed(int class_idx) {
    TinyTLSMag* mag = &g_tls_mags[class_idx];
    if (mag->cap == 0) {
        int base = tiny_effective_cap(class_idx);
        int cap = (base < TINY_TLS_MAG_CAP) ? base : TINY_TLS_MAG_CAP;
        if (g_mag_cap_limit < cap) cap = g_mag_cap_limit;
        mag->cap = cap;
        mag->top = 0;
    }
}
// ============================================================================
// ACE Learning Layer: Runtime TLS Capacity Adjustment
// ============================================================================
void hkm_ace_set_tls_capacity(int class_idx, uint32_t capacity) {
    // Validate inputs
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
        return;
    }
    if (capacity < 16 || capacity > (uint32_t)tiny_cap_max_for_class(class_idx)) {
        return;
    }
    // Set override (will be used by new thread-local magazines on next init)
    // Note: Lazy sync implementation is in hakmem_tiny_magazine.h (inlined)
    g_mag_cap_override[class_idx] = (int)capacity;
}
// ============================================================================
// Phase 7.7: Magazine Flush API
// ============================================================================
// Flush Magazine cache for a specific size class
// Forces all cached blocks to be returned to freelists, enabling empty
// SuperSlab detection and deallocation
void hak_tiny_magazine_flush(int class_idx) {
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return;
    // Initialize if needed
    tiny_mag_init_if_needed(class_idx);
    TinyTLSMag* mag = &g_tls_mags[class_idx];
    if (mag->top == 0) return; // Nothing to flush
    // Lock and flush entire Magazine to freelist
    pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m;
    struct timespec tss; int ss_time = hkm_prof_begin(&tss);
    pthread_mutex_lock(lock);
    // Flush ALL blocks (not just half like normal spill)
    int flush_count = mag->top;
    uint32_t self_tid = tiny_self_u32_guard();
    for (int i = 0; i < flush_count; i++) {
        TinyMagItem it = mag->items[--mag->top];
        // Return to SuperSlab freelist
        SuperSlab* owner_ss = hak_super_lookup(it.ptr);
        if (owner_ss && owner_ss->magic == SUPERSLAB_MAGIC) {
            int slab_idx = slab_index_for(owner_ss, it.ptr);
            TinySlabMeta* meta = &owner_ss->slabs[slab_idx];
            if (!tiny_remote_guard_allow_local_push(owner_ss, slab_idx, meta, it.ptr, "mag_flush", self_tid)) {
                (void)ss_remote_push(owner_ss, slab_idx, it.ptr);
                if (meta->used > 0) meta->used--;
                continue;
            }
            tiny_next_write(owner_ss->size_class, it.ptr, meta->freelist);
            meta->freelist = it.ptr;
            meta->used--;
            // Active was decremented at free time
            // Empty-slab detection and release are delegated to the flush APIs (kept off the hot path)
        }
    }
    pthread_mutex_unlock(lock);
    hkm_prof_end(ss_time, HKP_TINY_SPILL, &tss);
}
// Flush all Magazine caches
// Call this when memory needs to be released (e.g., before measuring RSS)
void hak_tiny_magazine_flush_all(void) {
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        hak_tiny_magazine_flush(i);
    }
    hak_tiny_trim();
}