Add Box 3 (Pointer Conversion Layer) and fix POOL_TLS_PHASE1 default

## Major Changes

### 1. Box 3: Pointer Conversion Module (NEW)
- File: core/box/ptr_conversion_box.h
- Purpose: Unified BASE ↔ USER pointer conversion (single source of truth)
- API: PTR_BASE_TO_USER(), PTR_USER_TO_BASE()
- Features: Zero-overhead inline, debug mode, NULL-safe, class 7 headerless support
- Design: Header-only, fully modular, no external dependencies

### 2. POOL_TLS_PHASE1 Default OFF (CRITICAL FIX)
- File: build.sh
- Change: POOL_TLS_PHASE1 now defaults to 0 (was hardcoded to 1)
- Impact: Eliminates pthread_mutex overhead on every free() (was causing 3.3x slowdown)
- Usage: Set POOL_TLS_PHASE1=1 env var to enable if needed

### 3. Pointer Conversion Fixes (PARTIAL)
- Files: core/box/front_gate_box.c, core/tiny_alloc_fast.inc.h, etc.
- Status: Partial implementation using Box 3 API
- Note: Work in progress, some conversions still need review

### 4. Performance Investigation Report (NEW)
- File: HOTPATH_PERFORMANCE_INVESTIGATION.md
- Findings:
  - Hotpath works (+24% vs baseline) after POOL_TLS fix
  - Still 9.2x slower than system malloc due to:
    * Heavy initialization (23.85% of cycles)
    * Syscall overhead (2,382 syscalls per 100K ops)
    * Workload mismatch (C7 1KB is 49.8%, but only C5 256B has hotpath)
    * 9.4x more instructions than system malloc

### 5. Known Issues
- SEGV at 20K-30K iterations (pre-existing bug, not related to pointer conversions)
- Root cause: Likely active counter corruption or TLS-SLL chain issues
- Status: Under investigation

## Performance Results (100K iterations, 256B)
- Baseline (Hotpath OFF): 7.22M ops/s
- Hotpath ON: 8.98M ops/s (+24% improvement ✓)
- System malloc: 82.2M ops/s (still 9.2x faster)

## Next Steps
- P0: Fix 20K-30K SEGV bug (GDB investigation needed)
- P1: Lazy initialization (+20-25% expected)
- P1: C7 (1KB) hotpath (+30-40% expected, biggest win)
- P2: Reduce syscalls (+15-20% expected)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-11-12 01:01:23 +09:00
parent 862e8ea7db
commit 6859d589ea
13 changed files with 759 additions and 52 deletions

View File

@ -57,7 +57,8 @@ static inline int tls_refill_from_tls_slab(int class_idx, TinyTLSList* tls, uint
if (want == 0u || want > room) want = room;
if (want == 0u) return 0;
size_t block_size = g_tiny_class_sizes[class_idx];
// Use stride (class_size + header for C0-6, headerless for C7)
size_t block_stride = tiny_stride_for_class(class_idx);
// Header-aware TLS list next offset for chains we build here
#if HAKMEM_TINY_HEADER_CLASSIDX
const size_t next_off_tls = (class_idx == 7) ? 0 : 1;
@ -105,7 +106,8 @@ static inline int tls_refill_from_tls_slab(int class_idx, TinyTLSList* tls, uint
if (superslab_refill(class_idx) == NULL) break;
meta = tls_slab->meta;
if (!meta) break;
block_size = g_tiny_class_sizes[class_idx];
// Refresh stride/base after refill
block_stride = tiny_stride_for_class(class_idx);
slab_base = tls_slab->slab_base ? tls_slab->slab_base
: (tls_slab->ss ? tiny_slab_base_for(tls_slab->ss, tls_slab->slab_idx) : NULL);
continue;
@ -119,12 +121,12 @@ static inline int tls_refill_from_tls_slab(int class_idx, TinyTLSList* tls, uint
if (!slab_base) {
slab_base = tiny_slab_base_for(tls_slab->ss, tls_slab->slab_idx);
}
uint8_t* base_cursor = slab_base + ((size_t)meta->used * block_size);
uint8_t* base_cursor = slab_base + ((size_t)meta->used * block_stride);
void* local_head = (void*)base_cursor;
uint8_t* cursor = base_cursor;
for (uint32_t i = 1; i < need; ++i) {
uint8_t* next = cursor + block_size;
uint8_t* next = cursor + block_stride;
*(void**)(cursor + next_off_tls) = (void*)next;
cursor = next;
}