Add Box 3 (Pointer Conversion Layer) and fix POOL_TLS_PHASE1 default
## Major Changes
### 1. Box 3: Pointer Conversion Module (NEW)
- File: core/box/ptr_conversion_box.h
- Purpose: Unified BASE ↔ USER pointer conversion (single source of truth)
- API: PTR_BASE_TO_USER(), PTR_USER_TO_BASE()
- Features: Zero-overhead inline, debug mode, NULL-safe, class 7 headerless support
- Design: Header-only, fully modular, no external dependencies
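
A minimal sketch of what such a conversion layer could look like is shown below. It is not the contents of core/box/ptr_conversion_box.h: the `sketch_` names, the `(ptr, class_idx)` signature, and the class range 0-7 are assumptions made for illustration. The 1-byte header offset and the headerless class 7 are taken from the inline conversion visible in this commit's diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: classes 0-6 keep a 1-byte header in front of the user pointer,
 * and class 7 is headerless (base == user), as in this commit's diff. */
#define SKETCH_HEADERLESS_CLASS 7
#define SKETCH_HEADER_SIZE      1

/* Number of header bytes that sit between base and user for a class. */
static inline size_t sketch_header_size(int class_idx) {
    return (class_idx == SKETCH_HEADERLESS_CLASS) ? 0 : SKETCH_HEADER_SIZE;
}

/* BASE -> USER: skip the header (if any). NULL passes through unchanged. */
static inline void* sketch_base_to_user(void* base, int class_idx) {
    if (base == NULL) return NULL;
    assert(class_idx >= 0 && class_idx <= 7);  /* debug-mode sanity check */
    return (uint8_t*)base + sketch_header_size(class_idx);
}

/* USER -> BASE: step back over the header (if any). NULL passes through. */
static inline void* sketch_user_to_base(void* user, int class_idx) {
    if (user == NULL) return NULL;
    assert(class_idx >= 0 && class_idx <= 7);  /* debug-mode sanity check */
    return (uint8_t*)user - sketch_header_size(class_idx);
}
```

Keeping both directions in one header means the offset rule is written once, so a call site cannot disagree with the rest of the allocator about where the header lives. Compiling with `-DNDEBUG` removes the assertions.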
### 2. POOL_TLS_PHASE1 Default OFF (CRITICAL FIX)
- File: build.sh
- Change: POOL_TLS_PHASE1 now defaults to 0 (was hardcoded to 1)
- Impact: Eliminates pthread_mutex overhead on every free() (was causing 3.3x slowdown)
- Usage: Set POOL_TLS_PHASE1=1 env var to enable if needed
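
The effect of the default can be sketched on the C side as follows. This assumes build.sh forwards the env var to the compiler as `-DPOOL_TLS_PHASE1=<value>`, which this commit message does not confirm. `sketch_free`, `sketch_pool_lock`, and `sketch_lock_count` are hypothetical names, not functions from this repository.

```c
#include <pthread.h>

/* Assumed mechanism: the build passes -DPOOL_TLS_PHASE1=1 to opt in.
 * Without it, the flag defaults to 0 and the locked path is compiled out. */
#ifndef POOL_TLS_PHASE1
#define POOL_TLS_PHASE1 0
#endif

/* Counts mutex acquisitions so the difference between modes is observable. */
static int sketch_lock_count = 0;

#if POOL_TLS_PHASE1
static pthread_mutex_t sketch_pool_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Hypothetical free path: takes the mutex only when the flag is enabled. */
static void sketch_free(void* p) {
    (void)p;  /* the actual release logic is omitted in this sketch */
#if POOL_TLS_PHASE1
    pthread_mutex_lock(&sketch_pool_lock);
    sketch_lock_count++;
    pthread_mutex_unlock(&sketch_pool_lock);
#endif
}
```

With the default of 0, `sketch_free` contains no lock or unlock call at all, which is the kind of per-free() overhead the change above removes.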
### 3. Pointer Conversion Fixes (PARTIAL)
- Files: core/box/front_gate_box.c, core/tiny_alloc_fast.inc.h, etc.
- Status: Partial implementation using Box 3 API
- Note: Work in progress, some conversions still need review
### 4. Performance Investigation Report (NEW)
- File: HOTPATH_PERFORMANCE_INVESTIGATION.md
- Findings:
  - Hotpath works (+24% vs baseline) after POOL_TLS fix
  - Still 9.2x slower than system malloc due to:
    * Heavy initialization (23.85% of cycles)
    * Syscall overhead (2,382 syscalls per 100K ops)
    * Workload mismatch (C7 1KB is 49.8%, but only C5 256B has hotpath)
    * 9.4x more instructions than system malloc
### 5. Known Issues
- SEGV at 20K-30K iterations (pre-existing bug, not related to pointer conversions)
- Root cause: Likely active counter corruption or TLS-SLL chain issues
- Status: Under investigation
## Performance Results (100K iterations, 256B)
- Baseline (Hotpath OFF): 7.22M ops/s
- Hotpath ON: 8.98M ops/s (+24% improvement ✓)
- System malloc: 82.2M ops/s (still 9.2x faster)
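
For reference, the percentage and ratio above follow directly from the three throughput figures. The helper names below are illustrative only and do not come from the benchmark harness.

```c
/* Relative gain of `after` over `before`, in percent. */
static double sketch_gain_percent(double before, double after) {
    return (after / before - 1.0) * 100.0;
}

/* How many times faster `fast` is than `slow`. */
static double sketch_speed_ratio(double fast, double slow) {
    return fast / slow;
}
```

Calling `sketch_gain_percent(7.22, 8.98)` gives roughly 24.4, and `sketch_speed_ratio(82.2, 8.98)` gives roughly 9.15, which match the reported +24% and 9.2x after rounding.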
## Next Steps
- P0: Fix 20K-30K SEGV bug (GDB investigation needed)
- P1: Lazy initialization (+20-25% expected)
- P1: C7 (1KB) hotpath (+30-40% expected, biggest win)
- P2: Reduce syscalls (+15-20% expected)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
@@ -153,8 +153,12 @@ static inline void* tiny_fast_refill_and_take(int class_idx, TinyTLSList* tls) {
             g_front_fc_miss[class_idx]++;
         }
     }
-    void* direct = tiny_fast_pop(class_idx);
-    if (direct) return direct;
+    // For class5 hotpath, skip direct Front (SFC/SLL) and rely on TLS List path
+    extern int g_tiny_hotpath_class5;
+    if (!(g_tiny_hotpath_class5 && class_idx == 5)) {
+        void* direct = tiny_fast_pop(class_idx);
+        if (direct) return direct;
+    }
     uint16_t cap = g_fast_cap[class_idx];
     if (cap == 0) return NULL;
     uint16_t count = g_fast_count[class_idx];
@@ -190,16 +194,27 @@ static inline void* tiny_fast_refill_and_take(int class_idx, TinyTLSList* tls) {
             // Headerless array stack for hottest tiny classes
             pushed = fastcache_push(class_idx, node);
         } else {
-            pushed = tiny_fast_push(class_idx, node);
+            // For class5 hotpath, keep leftovers in TLS List (not SLL)
+            extern int g_tiny_hotpath_class5;
+            if (__builtin_expect(g_tiny_hotpath_class5 && class_idx == 5, 0)) {
+                tls_list_push_fast(tls, node, 5);
+                pushed = 1;
+            } else {
+                pushed = tiny_fast_push(class_idx, node);
+            }
         }
         if (pushed) { node = next; remaining--; }
         else {
             // Push failed, return remaining to TLS (preserve order)
             tls_list_bulk_put(tls, node, batch_tail, remaining, class_idx);
-            return ret;
+            // CRITICAL FIX: Convert base -> user pointer before returning
+            void* user_ptr = (class_idx == 7) ? ret : (void*)((uint8_t*)ret + 1);
+            return user_ptr;
         }
     }
-    return ret;
+    // CRITICAL FIX: Convert base -> user pointer before returning
+    void* user_ptr = (class_idx == 7) ? ret : (void*)((uint8_t*)ret + 1);
+    return user_ptr;
 }
 
 // Quick slot refill from SLL