hakmem/core/box/integrity_box.c
Moe Charm (CI) 72b38bc994 Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets
## Root Cause Analysis (GPT5)

**Physical Layout Constraints**:
- Class 0: 8B = [1B header][7B payload] → offset 1 needs 1B header + 8B pointer = 9B = IMPOSSIBLE
- Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = POSSIBLE
- Class 7: 1KB → offset 0 (compatibility)

**Correct Specification**:
- HAKMEM_TINY_HEADER_CLASSIDX != 0:
  - Class 0, 7: next at offset 0 (overwrites header when on freelist)
  - Class 1-6: next at offset 1 (after header)
- HAKMEM_TINY_HEADER_CLASSIDX == 0:
  - All classes: next at offset 0

**Previous Bug**:
- Attempted "ALL classes offset 1" unification
- Class 0 with offset 1 caused immediate SEGV (9B > 8B block size)
- Mixed 2-arg/3-arg API caused confusion

## Fixes Applied

### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h)
```c
// Correct signatures
void tiny_next_write(int class_idx, void* base, void* next_value)
void* tiny_next_read(int class_idx, const void* base)

// Correct offset calculation
size_t offset = (class_idx == 0 || class_idx == 7) ? 0 : 1;
```
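As a rough illustration of how the 3-argument signatures and the offset rule fit together, here is a minimal self-contained sketch. The `sketch_` names are hypothetical and this is not the project's actual implementation. It assumes the header-enabled build (`HAKMEM_TINY_HEADER_CLASSIDX != 0`) and uses `memcpy` because an offset of 1 leaves the next-pointer slot unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of the freelist "next" pointer.
 * Assumes headers are enabled; classes 0 and 7 overwrite the header. */
static size_t sketch_next_offset(int class_idx) {
    assert(class_idx >= 0 && class_idx <= 7);
    return (class_idx == 0 || class_idx == 7) ? 0 : 1;
}

/* Store next_value inside the free block that starts at base. */
static void sketch_next_write(int class_idx, void* base, void* next_value) {
    unsigned char* slot = (unsigned char*)base + sketch_next_offset(class_idx);
    memcpy(slot, &next_value, sizeof next_value); /* safe for unaligned slots */
}

/* Load the next pointer stored inside the free block that starts at base. */
static void* sketch_next_read(int class_idx, const void* base) {
    const unsigned char* slot =
        (const unsigned char*)base + sketch_next_offset(class_idx);
    void* next_value;
    memcpy(&next_value, slot, sizeof next_value);
    return next_value;
}
```

In this sketch, a class 1 block keeps its header byte at `base[0]` after a write, while a class 0 block has its first byte overwritten by the pointer.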

### 2. Updated 123+ Call Sites Across 34 Files
- hakmem_tiny_hot_pop_v4.inc.h (4 locations)
- hakmem_tiny_fastcache.inc.h (3 locations)
- hakmem_tiny_tls_list.h (12 locations)
- superslab_inline.h (5 locations)
- tiny_fastcache.h (3 locations)
- ptr_trace.h (macro definitions)
- tls_sll_box.h (2 locations)
- + 27 additional files

Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)`
Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)`

### 3. Added Sentinel Detection Guards
- tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next
- tls_list_push(): Block nodes with sentinel in ptr or ptr->next
- Defense-in-depth against remote free sentinel leakage
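The guard idea can be sketched roughly as follows. The sentinel value, the `SketchList` type, and the `sketch_guarded_push` name are all assumptions made for illustration; the real `tiny_fast_push()` and `tls_list_push()` may look quite different. For simplicity the sketch stores the next pointer at offset 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel value marking a node owned by the remote-free path. */
#define SKETCH_REMOTE_SENTINEL ((uintptr_t)0xBADA55u)

typedef struct {
    void*  head;   /* first node of the singly linked free list */
    size_t count;  /* number of nodes currently on the list */
} SketchList;

/* Push node unless the node itself or its stored next pointer is the sentinel.
 * Returns true when the node was linked in, false when it was rejected. */
static bool sketch_guarded_push(SketchList* list, void* node) {
    assert(list != NULL);
    if (node == NULL || (uintptr_t)node == SKETCH_REMOTE_SENTINEL) {
        return false; /* never dereference a sentinel */
    }
    void* stored_next;
    memcpy(&stored_next, node, sizeof stored_next);
    if ((uintptr_t)stored_next == SKETCH_REMOTE_SENTINEL) {
        return false; /* node still carries remote-free state */
    }
    memcpy(node, &list->head, sizeof list->head); /* node->next = old head */
    list->head = node;
    list->count++;
    return true;
}
```

Rejecting the node instead of aborting keeps the guard cheap on hot paths; the caller decides how to handle the rejected pointer.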

## Verification (GPT5 Report)

**Test Command**: `./out/release/bench_random_mixed_hakmem --iterations=70000`

**Results**:
- Main loop completed successfully
- Drain phase completed successfully
- NO SEGV (previous crash at iteration 66151 is FIXED)
- ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers

**Analysis**:
- Class 0 immediate SEGV: RESOLVED (correct offset 0 now used)
- 66K iteration crash: RESOLVED (offset consistency fixed)
- Box API conflicts: RESOLVED (unified 3-arg API)

## Technical Details

### Offset Logic Justification
```
Class 0:  8B block → next pointer (8B) fits ONLY at offset 0
Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header)
Class 2: 32B block → next pointer (8B) fits at offset 1
...
Class 6: 512B block → next pointer (8B) fits at offset 1
Class 7: 1024B block → offset 0 for legacy compatibility
```
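The arithmetic behind this table can be checked with a small sketch. `sketch_next_fits` and `SKETCH_NEXT_PTR_BYTES` are hypothetical names, and the 8-byte pointer size is an assumption that matches the 64-bit layout described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed 64-bit target: the freelist next pointer occupies 8 bytes. */
enum { SKETCH_NEXT_PTR_BYTES = 8 };

/* True when an 8-byte next pointer placed at offset stays inside the block. */
static bool sketch_next_fits(size_t block_size, size_t offset) {
    assert(block_size > 0);
    return offset + SKETCH_NEXT_PTR_BYTES <= block_size;
}
```

For class 0, `sketch_next_fits(8, 1)` is false because 1 + 8 = 9 exceeds the 8-byte block, while `sketch_next_fits(8, 0)` is true. Every class from 16B upward passes with offset 1.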

### Files Modified (Summary)
- Core API: `box/tiny_next_ptr_box.h`
- Hot paths: `hakmem_tiny_hot_pop*.inc.h`, `tiny_fastcache.h`
- TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h`
- SuperSlab: `superslab_inline.h`, `tiny_superslab_*.inc.h`
- Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h`
- Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h`
- Documentation: Multiple Phase E3 reports

## Remaining Work

None for Box API offset bugs - all structural issues resolved.

Future enhancements (non-critical):
- Periodic `grep -RF '*(void**)' core/` to detect direct pointer access violations (`-F` matches the pattern as a fixed string, so the asterisks are not treated as regex operators)
- Enforce Box API usage via static analysis
- Document offset rationale in architecture docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 06:50:20 +09:00

484 lines
18 KiB
C

// integrity_box.c - Box I: Integrity Verification System Implementation
// Purpose: Complete implementation of modular integrity checks
// Author: Claude + Task (2025-11-12)
#include "integrity_box.h"
#include "../hakmem_tiny.h"
#include "../superslab/superslab_types.h"
#include "../tiny_box_geometry.h"
#include <stdio.h>
#include <stdlib.h>  // abort()
#include <assert.h>
#include <stdatomic.h>
#include <string.h>
// ============================================================================
// TLS Canary Magic
// ============================================================================
#define TLS_CANARY_MAGIC 0xDEADBEEFDEADBEEFULL
// External canaries from hakmem_tiny.c
extern __thread uint64_t g_tls_canary_before_sll_head;
extern __thread uint64_t g_tls_canary_after_sll_head;
extern __thread uint64_t g_tls_canary_before_sll_count;
extern __thread uint64_t g_tls_canary_after_sll_count;
// ============================================================================
// Global Statistics (atomic for thread safety)
// ============================================================================
static _Atomic uint64_t g_integrity_checks_performed = 0;
static _Atomic uint64_t g_integrity_checks_passed = 0;
static _Atomic uint64_t g_integrity_checks_failed = 0;
static _Atomic uint64_t g_integrity_tls_bounds_checks = 0;
static _Atomic uint64_t g_integrity_freelist_checks = 0;
static _Atomic uint64_t g_integrity_metadata_checks = 0;
static _Atomic uint64_t g_integrity_canary_checks = 0;
static _Atomic uint64_t g_integrity_full_system_checks = 0;
// ============================================================================
// Initialization
// ============================================================================
void integrity_box_init(void) {
    // Reset every counter to zero. The static initializers already zero them;
    // storing explicitly makes repeated initialization well-defined.
    atomic_store(&g_integrity_checks_performed, 0);
    atomic_store(&g_integrity_checks_passed, 0);
    atomic_store(&g_integrity_checks_failed, 0);
    atomic_store(&g_integrity_tls_bounds_checks, 0);
    atomic_store(&g_integrity_freelist_checks, 0);
    atomic_store(&g_integrity_metadata_checks, 0);
    atomic_store(&g_integrity_canary_checks, 0);
    atomic_store(&g_integrity_full_system_checks, 0);
}
// ============================================================================
// Priority 1: TLS Bounds Validation
// ============================================================================
IntegrityResult integrity_validate_tls_bounds(
    uint8_t class_idx,
    const char* context) {
    atomic_fetch_add(&g_integrity_checks_performed, 1);
    atomic_fetch_add(&g_integrity_tls_bounds_checks, 1);
    if (class_idx >= TINY_NUM_CLASSES) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "TLS_BOUNDS_OVERFLOW",
            .file = __FILE__,
            .line = __LINE__,
            .message = "class_idx out of bounds",
            .error_code = INTEGRITY_ERROR_TLS_BOUNDS_OVERFLOW
        };
    }
    atomic_fetch_add(&g_integrity_checks_passed, 1);
    return (IntegrityResult){
        .passed = true,
        .check_name = "TLS_BOUNDS_OK",
        .file = __FILE__,
        .line = __LINE__,
        .message = "TLS bounds check passed",
        .error_code = INTEGRITY_ERROR_OK
    };
}
// ============================================================================
// Priority 2: Freelist Pointer Validation
// ============================================================================
IntegrityResult integrity_validate_freelist_ptr(
    void* ptr,
    void* slab_base,
    void* slab_end,
    uint8_t class_idx,
    const char* context) {
    atomic_fetch_add(&g_integrity_checks_performed, 1);
    atomic_fetch_add(&g_integrity_freelist_checks, 1);
    // NULL is valid (end of freelist)
    if (ptr == NULL) {
        atomic_fetch_add(&g_integrity_checks_passed, 1);
        return (IntegrityResult){
            .passed = true,
            .check_name = "FREELIST_PTR_NULL",
            .file = __FILE__,
            .line = __LINE__,
            .message = "NULL freelist pointer (valid)",
            .error_code = INTEGRITY_ERROR_OK
        };
    }
    // Check pointer is in valid range [slab_base, slab_end)
    if (ptr < slab_base || ptr >= slab_end) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "FREELIST_PTR_OUT_OF_BOUNDS",
            .file = __FILE__,
            .line = __LINE__,
            .message = "Freelist pointer outside slab bounds",
            .error_code = INTEGRITY_ERROR_FREELIST_PTR_OUT_OF_BOUNDS
        };
    }
    // Check stride alignment. ptr >= slab_base was verified above, so the
    // difference is non-negative and can be treated as unsigned.
    size_t stride = tiny_stride_for_class(class_idx);
    size_t offset = (size_t)((const uint8_t*)ptr - (const uint8_t*)slab_base);
    if (offset % stride != 0) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "FREELIST_PTR_MISALIGNED",
            .file = __FILE__,
            .line = __LINE__,
            .message = "Freelist pointer not stride-aligned",
            .error_code = INTEGRITY_ERROR_FREELIST_PTR_MISALIGNED
        };
    }
    atomic_fetch_add(&g_integrity_checks_passed, 1);
    return (IntegrityResult){
        .passed = true,
        .check_name = "FREELIST_PTR_OK",
        .file = __FILE__,
        .line = __LINE__,
        .message = "Freelist pointer valid",
        .error_code = INTEGRITY_ERROR_OK
    };
}
// ============================================================================
// Priority 3: TLS Canary Validation
// ============================================================================
IntegrityResult integrity_validate_tls_canaries(const char* context) {
    atomic_fetch_add(&g_integrity_checks_performed, 1);
    atomic_fetch_add(&g_integrity_canary_checks, 1);
    // Check canary before sll_head array
    if (g_tls_canary_before_sll_head != TLS_CANARY_MAGIC) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "CANARY_CORRUPTED_BEFORE_HEAD",
            .file = __FILE__,
            .line = __LINE__,
            .message = "Canary before g_tls_sll_head corrupted",
            .error_code = INTEGRITY_ERROR_CANARY_CORRUPTED_BEFORE_HEAD
        };
    }
    // Check canary after sll_head array
    if (g_tls_canary_after_sll_head != TLS_CANARY_MAGIC) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "CANARY_CORRUPTED_AFTER_HEAD",
            .file = __FILE__,
            .line = __LINE__,
            .message = "Canary after g_tls_sll_head corrupted",
            .error_code = INTEGRITY_ERROR_CANARY_CORRUPTED_AFTER_HEAD
        };
    }
    // Check canary before sll_count array
    if (g_tls_canary_before_sll_count != TLS_CANARY_MAGIC) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "CANARY_CORRUPTED_BEFORE_COUNT",
            .file = __FILE__,
            .line = __LINE__,
            .message = "Canary before g_tls_sll_count corrupted",
            .error_code = INTEGRITY_ERROR_CANARY_CORRUPTED_BEFORE_COUNT
        };
    }
    // Check canary after sll_count array
    if (g_tls_canary_after_sll_count != TLS_CANARY_MAGIC) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "CANARY_CORRUPTED_AFTER_COUNT",
            .file = __FILE__,
            .line = __LINE__,
            .message = "Canary after g_tls_sll_count corrupted",
            .error_code = INTEGRITY_ERROR_CANARY_CORRUPTED_AFTER_COUNT
        };
    }
    atomic_fetch_add(&g_integrity_checks_passed, 1);
    return (IntegrityResult){
        .passed = true,
        .check_name = "CANARY_OK",
        .file = __FILE__,
        .line = __LINE__,
        .message = "All canaries intact",
        .error_code = INTEGRITY_ERROR_OK
    };
}
// ============================================================================
// Priority ALPHA: Slab Metadata Validation (THE KEY!)
// ============================================================================
SlabMetadataState integrity_capture_slab_metadata(
    const void* meta_ptr,
    void* slab_base,
    uint8_t class_idx) {
    // Cast to TinySlabMeta type
    const TinySlabMeta* meta = (const TinySlabMeta*)meta_ptr;
    SlabMetadataState state = {0};
    if (meta == NULL) {
        // NULL metadata - return invalid state
        state.carved = 0xFFFF;
        state.used = 0xFFFF;
        state.capacity = 0;
        state.freelist = NULL;
        state.slab_base = NULL;
        state.class_idx = class_idx;
        state.free_count = 0xFFFF;
        state.is_virgin = false;
        state.is_full = false;
        state.is_empty = false;
        return state;
    }
    // Capture core fields
    state.carved = meta->carved;
    state.used = meta->used;
    state.capacity = meta->capacity;
    state.freelist = meta->freelist;
    state.slab_base = slab_base;
    state.class_idx = class_idx;
    // Compute derived fields
    if (state.carved >= state.used) {
        state.free_count = state.carved - state.used;
    } else {
        state.free_count = 0xFFFF; // Invalid!
    }
    state.is_virgin = (state.carved == 0);
    state.is_full = (state.carved == state.capacity && state.used == state.capacity);
    state.is_empty = (state.used == 0);
    return state;
}
IntegrityResult integrity_validate_slab_metadata(
    const SlabMetadataState* state,
    const char* context) {
    atomic_fetch_add(&g_integrity_checks_performed, 1);
    atomic_fetch_add(&g_integrity_metadata_checks, 1);
    // Check 1: carved <= capacity
    if (state->carved > state->capacity) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "METADATA_CARVED_OVERFLOW",
            .file = __FILE__,
            .line = __LINE__,
            .message = "carved > capacity (slab corruption)",
            .error_code = INTEGRITY_ERROR_METADATA_CARVED_OVERFLOW
        };
    }
    // Check 2: used <= carved
    if (state->used > state->carved) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "METADATA_USED_GT_CARVED",
            .file = __FILE__,
            .line = __LINE__,
            .message = "used > carved (double-free or corruption)",
            .error_code = INTEGRITY_ERROR_METADATA_USED_GT_CARVED
        };
    }
    // Check 3: used <= capacity
    // (already implied by checks 1 and 2; kept as defense in depth)
    if (state->used > state->capacity) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "METADATA_USED_OVERFLOW",
            .file = __FILE__,
            .line = __LINE__,
            .message = "used > capacity (counter corruption)",
            .error_code = INTEGRITY_ERROR_METADATA_USED_OVERFLOW
        };
    }
    // Check 4: free_count consistency
    // (used <= carved holds here, so the subtraction cannot wrap)
    uint16_t expected_free = state->carved - state->used;
    if (state->free_count != expected_free) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "METADATA_FREE_COUNT_MISMATCH",
            .file = __FILE__,
            .line = __LINE__,
            .message = "free_count != (carved - used)",
            .error_code = INTEGRITY_ERROR_METADATA_FREE_COUNT_MISMATCH
        };
    }
    // Check 5: Capacity is reasonable (not corrupted)
    // Phase E1-CORRECT FIX: Tiny classes have varying capacities:
    // - Class 0 (8B): 65536/8 = 8192 blocks per slab
    // - Class 1 (16B): 65536/16 = 4096
    // - Class 2 (32B): 65536/32 = 2048
    // - Class 3 (64B): 65536/64 = 1024
    // - Class 4 (128B): 65536/128 = 512
    // Use 10000 as safe upper bound (Class 0 max is 8192)
    if (state->capacity > 10000) {
        atomic_fetch_add(&g_integrity_checks_failed, 1);
        return (IntegrityResult){
            .passed = false,
            .check_name = "METADATA_CAPACITY_UNREASONABLE",
            .file = __FILE__,
            .line = __LINE__,
            .message = "capacity > 10000 (likely corrupted)",
            .error_code = INTEGRITY_ERROR_METADATA_CAPACITY_UNREASONABLE
        };
    }
    // Check 6: Freelist pointer validity
    // The freelist pointer should either be:
    // - NULL (linear carving mode or empty freelist)
    // - A valid pointer within the slab's address range (not verified here;
    //   integrity_validate_freelist_ptr performs the bounds check)
    // - NOT uninitialized garbage like 0xa2a2a2a2a2a2a2a2
    if (state->freelist != NULL && state->slab_base != NULL) {
        uintptr_t freelist_addr = (uintptr_t)state->freelist;
        // Detect obvious corruption patterns (0xa2, 0xcc, 0xdd, 0xfe are common debug fill patterns)
        uint8_t* freelist_bytes = (uint8_t*)&freelist_addr;
        bool is_pattern_fill = (freelist_bytes[0] == freelist_bytes[1] &&
                                freelist_bytes[1] == freelist_bytes[2] &&
                                freelist_bytes[2] == freelist_bytes[3] &&
                                freelist_bytes[3] == freelist_bytes[4] &&
                                freelist_bytes[4] == freelist_bytes[5] &&
                                freelist_bytes[5] == freelist_bytes[6] &&
                                freelist_bytes[6] == freelist_bytes[7]);
        if (is_pattern_fill && (freelist_bytes[0] == 0xa2 ||
                                freelist_bytes[0] == 0xcc ||
                                freelist_bytes[0] == 0xdd ||
                                freelist_bytes[0] == 0xfe)) {
            atomic_fetch_add(&g_integrity_checks_failed, 1);
            fprintf(stderr, "[BOX I] CRITICAL: Uninitialized freelist detected!\n");
            fprintf(stderr, "[BOX I] freelist=%p (pattern: 0x%02x repeated)\n",
                    state->freelist, freelist_bytes[0]);
            fprintf(stderr, "[BOX I] carved=%u used=%u capacity=%u class=%u\n",
                    state->carved, state->used, state->capacity, state->class_idx);
            fprintf(stderr, "[BOX I] This indicates the slab was used before proper initialization!\n");
            return (IntegrityResult){
                .passed = false,
                .check_name = "METADATA_FREELIST_UNINITIALIZED",
                .file = __FILE__,
                .line = __LINE__,
                .message = "freelist contains uninitialized pattern (0xa2/0xcc/0xdd/0xfe)",
                .error_code = 0xA090
            };
        }
        // Basic range check (freelist should be within reasonable address space)
        // Kernel space on x86-64 starts at 0xffff800000000000
        // (ULL suffix keeps the constant 64-bit even where long is 32-bit)
        if (freelist_addr >= 0xffff800000000000ULL) {
            atomic_fetch_add(&g_integrity_checks_failed, 1);
            return (IntegrityResult){
                .passed = false,
                .check_name = "METADATA_FREELIST_KERNEL_ADDR",
                .file = __FILE__,
                .line = __LINE__,
                .message = "freelist points to kernel space (corrupted)",
                .error_code = 0xA091
            };
        }
    }
    atomic_fetch_add(&g_integrity_checks_passed, 1);
    return (IntegrityResult){
        .passed = true,
        .check_name = "METADATA_OK",
        .file = __FILE__,
        .line = __LINE__,
        .message = "All metadata checks passed",
        .error_code = INTEGRITY_ERROR_OK
    };
}
// ============================================================================
// Periodic Full System Check
// ============================================================================
void integrity_periodic_full_check(const char* context) {
    atomic_fetch_add(&g_integrity_full_system_checks, 1);
    // Check all TLS canaries
    IntegrityResult canary_result = integrity_validate_tls_canaries(context);
    if (!canary_result.passed) {
        fprintf(stderr, "[INTEGRITY FAILURE] Periodic check failed: %s\n",
                canary_result.message);
        abort();
    }
    // Check TLS bounds for all classes
    for (uint8_t cls = 0; cls < TINY_NUM_CLASSES; cls++) {
        IntegrityResult bounds_result = integrity_validate_tls_bounds(cls, context);
        if (!bounds_result.passed) {
            fprintf(stderr, "[INTEGRITY FAILURE] Periodic check failed for class %u: %s\n",
                    cls, bounds_result.message);
            abort();
        }
    }
}
// ============================================================================
// Statistics API
// ============================================================================
IntegrityStatistics integrity_get_statistics(void) {
    IntegrityStatistics stats;
    stats.checks_performed = atomic_load(&g_integrity_checks_performed);
    stats.checks_passed = atomic_load(&g_integrity_checks_passed);
    stats.checks_failed = atomic_load(&g_integrity_checks_failed);
    stats.tls_bounds_checks = atomic_load(&g_integrity_tls_bounds_checks);
    stats.freelist_checks = atomic_load(&g_integrity_freelist_checks);
    stats.metadata_checks = atomic_load(&g_integrity_metadata_checks);
    stats.canary_checks = atomic_load(&g_integrity_canary_checks);
    stats.full_system_checks = atomic_load(&g_integrity_full_system_checks);
    return stats;
}
void integrity_print_statistics(void) {
    IntegrityStatistics stats = integrity_get_statistics();
    // Counters are cast to unsigned long long and printed with %llu so the
    // format matches on platforms where uint64_t is not unsigned long.
    fprintf(stderr, "\n=== Box I: Integrity Statistics ===\n");
    fprintf(stderr, "Total checks performed: %llu\n",
            (unsigned long long)stats.checks_performed);
    fprintf(stderr, " Passed: %llu (%.2f%%)\n",
            (unsigned long long)stats.checks_passed,
            stats.checks_performed > 0 ? 100.0 * stats.checks_passed / stats.checks_performed : 0.0);
    fprintf(stderr, " Failed: %llu (%.2f%%)\n",
            (unsigned long long)stats.checks_failed,
            stats.checks_performed > 0 ? 100.0 * stats.checks_failed / stats.checks_performed : 0.0);
    fprintf(stderr, "\nBy check type:\n");
    fprintf(stderr, " TLS bounds checks: %llu\n",
            (unsigned long long)stats.tls_bounds_checks);
    fprintf(stderr, " Freelist checks: %llu\n",
            (unsigned long long)stats.freelist_checks);
    fprintf(stderr, " Metadata checks: %llu (Priority ALPHA)\n",
            (unsigned long long)stats.metadata_checks);
    fprintf(stderr, " Canary checks: %llu\n",
            (unsigned long long)stats.canary_checks);
    fprintf(stderr, " Full system checks: %llu\n",
            (unsigned long long)stats.full_system_checks);
    fprintf(stderr, "===================================\n\n");
}
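
The counter checks in `integrity_validate_slab_metadata` reduce to one ordering invariant, `used <= carved <= capacity`. The following is a minimal standalone sketch of that invariant, not part of this file. `SketchSlabCounters` and `sketch_counters_consistent` are hypothetical names, and the 16-bit field widths are an assumption inferred from the `0xFFFF` sentinels used above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the slab counters (16-bit widths assumed). */
typedef struct {
    uint16_t carved;    /* blocks handed out by linear carving so far */
    uint16_t used;      /* blocks currently allocated */
    uint16_t capacity;  /* total blocks the slab can hold */
} SketchSlabCounters;

/* True when the counters satisfy used <= carved <= capacity. */
static bool sketch_counters_consistent(const SketchSlabCounters* c) {
    assert(c != NULL);
    return c->used <= c->carved && c->carved <= c->capacity;
}
```

Under this invariant the derived free count `carved - used` can never wrap, which is why the real code can compute it in an unsigned 16-bit variable after the first two checks pass.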