Add Box 3 (Pointer Conversion Layer) and fix POOL_TLS_PHASE1 default

## Major Changes

### 1. Box 3: Pointer Conversion Module (NEW)
- File: core/box/ptr_conversion_box.h
- Purpose: Unified BASE ↔ USER pointer conversion (single source of truth)
- API: PTR_BASE_TO_USER(), PTR_USER_TO_BASE()
- Features: Zero-overhead inline helpers, debug mode, NULL-safe, supports headerless class 7
- Design: Header-only, fully modular, no external dependencies
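
A minimal sketch of what a BASE ↔ USER conversion layer of this kind could look like. This is not the contents of `ptr_conversion_box.h`: the `sketch_*` names, the `int class_idx` parameter, and the 1-byte header size are assumptions made for illustration. Only the NULL-safe behavior and the headerless class 7 come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: header-bearing classes keep a 1-byte header in front of the user area. */
#define SKETCH_HEADER_SIZE 1u
/* From the commit message: class 7 is headerless. */
#define SKETCH_HEADERLESS_CLASS 7

/* Hypothetical helper: byte offset between BASE and USER for a size class. */
static inline size_t sketch_header_offset(int class_idx) {
    return (class_idx == SKETCH_HEADERLESS_CLASS) ? 0u : SKETCH_HEADER_SIZE;
}

/* BASE -> USER: skip the header (if any); NULL passes through unchanged. */
static inline void* sketch_base_to_user(void* base, int class_idx) {
    if (base == NULL) return NULL;
    return (uint8_t*)base + sketch_header_offset(class_idx);
}

/* USER -> BASE: step back over the header (if any); NULL passes through unchanged. */
static inline void* sketch_user_to_base(void* user, int class_idx) {
    if (user == NULL) return NULL;
    return (uint8_t*)user - sketch_header_offset(class_idx);
}
```

Keeping the offset rule in one helper means the two directions cannot drift apart, which is the point of a single source of truth.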

### 2. POOL_TLS_PHASE1 Default OFF (CRITICAL FIX)
- File: build.sh
- Change: POOL_TLS_PHASE1 now defaults to 0 (was hardcoded to 1)
- Impact: Removes the pthread_mutex overhead incurred on every free(), which was causing a 3.3x slowdown
- Usage: Set the POOL_TLS_PHASE1=1 env var to re-enable it if needed

### 3. Pointer Conversion Fixes (PARTIAL)
- Files: core/box/front_gate_box.c, core/tiny_alloc_fast.inc.h, etc.
- Status: Partial implementation using Box 3 API
- Note: Work in progress, some conversions still need review

### 4. Performance Investigation Report (NEW)
- File: HOTPATH_PERFORMANCE_INVESTIGATION.md
- Findings:
  - Hotpath works (+24% vs baseline) after the POOL_TLS_PHASE1 fix
  - Still 9.2x slower than system malloc due to:
    * Heavy initialization (23.85% of cycles)
    * Syscall overhead (2,382 syscalls per 100K ops)
    * Workload mismatch (C7 1KB is 49.8% of the workload, but only C5 256B has a hotpath)
    * 9.4x more instructions than system malloc

### 5. Known Issues
- SEGV at 20K-30K iterations (pre-existing bug, unrelated to pointer conversions)
- Suspected root cause: active counter corruption or TLS-SLL chain issues
- Status: Under investigation

## Performance Results (100K iterations, 256B)
- Baseline (Hotpath OFF): 7.22M ops/s
- Hotpath ON: 8.98M ops/s (+24% improvement ✓)
- System malloc: 82.2M ops/s (still 9.2x faster)
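
The percentages above follow directly from the three throughput figures. A small sketch of the arithmetic (the helper names are illustrative, not part of the codebase):

```c
/* Relative gain of `after` over `before`, in percent. */
static double percent_gain(double before, double after) {
    return (after / before - 1.0) * 100.0;
}

/* How many times faster `reference` is than `ours`. */
static double slowdown_factor(double ours, double reference) {
    return reference / ours;
}
```

With the numbers reported here, `percent_gain(7.22, 8.98)` is about 24.4 and `slowdown_factor(8.98, 82.2)` is about 9.15, matching the rounded +24% and 9.2x.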

## Next Steps
- P0: Fix 20K-30K SEGV bug (GDB investigation needed)
- P1: Lazy initialization (+20-25% expected)
- P1: C7 (1KB) hotpath (+30-40% expected, biggest win)
- P2: Reduce syscalls (+15-20% expected)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-11-12 01:01:23 +09:00
parent 862e8ea7db
commit 6859d589ea
13 changed files with 759 additions and 52 deletions


@@ -1,4 +1,6 @@
// hakmem_tiny_init.inc
// Note: uses TLS ops inline helpers for prewarm when class5 hotpath is enabled
#include "hakmem_tiny_tls_ops.h"
// Phase 2D-2: Initialization function extraction
//
// This file contains the hak_tiny_init() function extracted from hakmem_tiny.c
@@ -12,6 +14,15 @@ void hak_tiny_init(void) {
    // Step 1: Simple initialization (static global is already zero-initialized)
    g_tiny_initialized = 1;
    // Hot-class toggle: class5 (256B) dedicated TLS fast path
    // Default ON; allow runtime override via HAKMEM_TINY_HOTPATH_CLASS5
    {
        const char* hp5 = getenv("HAKMEM_TINY_HOTPATH_CLASS5");
        if (hp5 && *hp5) {
            g_tiny_hotpath_class5 = (atoi(hp5) != 0) ? 1 : 0;
        }
    }
    // Reset fast-cache defaults and apply preset (if provided)
    tiny_config_reset_defaults();
    char* preset_env = getenv("HAKMEM_TINY_PRESET");
@@ -89,6 +100,37 @@ void hak_tiny_init(void) {
        tls->spill_high = tiny_tls_default_spill(base_cap);
        tiny_tls_publish_targets(i, base_cap);
    }
    // Optional: override TLS parameters for hot class 5 (256B)
    if (g_tiny_hotpath_class5) {
        TinyTLSList* tls5 = &g_tls_lists[5];
        int cap_def = 512; // thick cache for hot class
        int refill_def = 128; // refill low-water mark
        int spill_def = 0; // 0 → use cap as hard spill threshold
        const char* ecap = getenv("HAKMEM_TINY_CLASS5_TLS_CAP");
        const char* eref = getenv("HAKMEM_TINY_CLASS5_TLS_REFILL");
        const char* espl = getenv("HAKMEM_TINY_CLASS5_TLS_SPILL");
        if (ecap && *ecap) cap_def = atoi(ecap);
        if (eref && *eref) refill_def = atoi(eref);
        if (espl && *espl) spill_def = atoi(espl);
        if (cap_def < 64) cap_def = 64; if (cap_def > 4096) cap_def = 4096;
        if (refill_def < 16) refill_def = 16; if (refill_def > cap_def) refill_def = cap_def;
        if (spill_def < 0) spill_def = 0; if (spill_def > cap_def) spill_def = cap_def;
        tls5->cap = (uint32_t)cap_def;
        tls5->refill_low = (uint32_t)refill_def;
        tls5->spill_high = (uint32_t)spill_def; // 0 → use cap logic in helper
        tiny_tls_publish_targets(5, (uint32_t)cap_def);
        // Optional: one-shot TLS prewarm for class5
        // Env: HAKMEM_TINY_CLASS5_PREWARM=<n> (default 128, 0 disables)
        int prewarm = 128;
        const char* pw = getenv("HAKMEM_TINY_CLASS5_PREWARM");
        if (pw && *pw) prewarm = atoi(pw);
        if (prewarm < 0) prewarm = 0;
        if (prewarm > (int)tls5->cap) prewarm = (int)tls5->cap;
        if (prewarm > 0) {
            (void)tls_refill_from_tls_slab(5, tls5, (uint32_t)prewarm);
        }
    }
    if (mem_diet_enabled) {
        tiny_apply_mem_diet();
    }
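
The hunk above repeats one pattern several times: read an environment variable, parse it with `atoi`, then clamp it to a range. A possible way to factor that out is sketched below. Both helpers are hypothetical and not part of this commit. One behavioral difference to be aware of: the sketch uses `strtol` and falls back to the default on malformed input, whereas `atoi` in the diff yields 0 for non-numeric text, which is then clamped up to the lower bound.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse `s` as a decimal int, fall back to `def` when
 * `s` is NULL, empty, malformed, or out of int range, then clamp to [lo, hi]. */
static int sketch_parse_int_clamped(const char* s, int def, int lo, int hi) {
    int value = def;
    if (s && *s) {
        char* end = NULL;
        errno = 0;
        long parsed = strtol(s, &end, 10);
        if (end != s && *end == '\0' && errno == 0 &&
            parsed >= INT_MIN && parsed <= INT_MAX) {
            value = (int)parsed;
        }
    }
    if (value < lo) value = lo;
    if (value > hi) value = hi;
    return value;
}

/* Hypothetical wrapper: same as above, reading the string from the environment. */
static int sketch_env_int_clamped(const char* name, int def, int lo, int hi) {
    return sketch_parse_int_clamped(getenv(name), def, lo, hi);
}
```

Under these assumptions, the cap override would reduce to `sketch_env_int_clamped("HAKMEM_TINY_CLASS5_TLS_CAP", 512, 64, 4096)`, with the refill and spill values clamped against the resulting cap in the same way.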