feat(Phase 2-1): Lane Classification + Fallback Reduction

## Phase 2-1: Lane Classification Box (Single Source of Truth)

### New Module: hak_lane_classify.inc.h
- Centralized size-to-lane mapping with unified boundary definitions
- Lane architecture (see the sketch after this list):
  - LANE_TINY:  [0B, 1024B]    SuperSlab (unchanged)
  - LANE_POOL:  [1025B, 52KB]  Pool per-thread (extended!)
  - LANE_ACE:   (52KB, 2MB)    ACE learning
  - LANE_HUGE:  [2MB, ...)     mmap direct
- Key invariant: POOL_MIN = TINY_MAX + 1 (no gaps between lanes)
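
A minimal sketch of the classification box, assuming the exact macro spellings below (only LANE_TINY_MAX, LANE_POOL_MAX, and the HAK_LANE_IS_* predicates are visible in this commit; LANE_POOL_MIN and LANE_HUGE_MIN are illustrative names):

```c
// hak_lane_classify.inc.h (illustrative sketch, not the verbatim module)
#define LANE_TINY_MAX   1024UL                   // [0B, 1024B]  -> SuperSlab
#define LANE_POOL_MIN   (LANE_TINY_MAX + 1UL)    // invariant: POOL_MIN = TINY_MAX + 1
#define LANE_POOL_MAX   (52UL * 1024UL)          // 53248B, upper Pool bound
#define LANE_HUGE_MIN   (2UL * 1024UL * 1024UL)  // 2MB huge threshold (assumed name)

#define HAK_LANE_IS_TINY(sz) ((sz) <= LANE_TINY_MAX)
#define HAK_LANE_IS_POOL(sz) ((sz) >= LANE_POOL_MIN && (sz) <= LANE_POOL_MAX)
#define HAK_LANE_IS_ACE(sz)  ((sz) >  LANE_POOL_MAX && (sz) <  LANE_HUGE_MIN)
#define HAK_LANE_IS_HUGE(sz) ((sz) >= LANE_HUGE_MIN)
```

Because every caller routes through these predicates, a boundary change is a one-header edit (the "single source of truth" property).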

### Fixed: Tiny/Pool Boundary Mismatch
- Before: TINY_MAX_SIZE=1024 vs tiny_get_max_size()=2047 (inconsistent!)
- After:  Both reference LANE_TINY_MAX=1024 (authoritative; see the sketch below)
- Impact: Eliminates the 1025-2047B "unmanaged zone" that caused libc fragmentation
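
A sketch of the intended wiring, assuming hakmem_tiny.h simply re-exports the lane constant (the include path and function body here are assumptions):

```c
// hakmem_tiny.h (sketch): both names now resolve to the same authority
#include <stddef.h>
#include "box/hak_lane_classify.inc.h"  // assumed include path

#define TINY_MAX_SIZE LANE_TINY_MAX     // compile-time bound: 1024

static inline size_t tiny_get_max_size(void) {
    return LANE_TINY_MAX;               // runtime bound: also 1024 (was 2047)
}
```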

### Updated Files
- core/hakmem_tiny.h: Use LANE_TINY_MAX, fix sizes[7]=1024 (was 2047)
- core/hakmem_pool.h: Use POOL_MIN_REQUEST_SIZE=1025 (was 2048)
- core/box/hak_alloc_api.inc.h: Lane-based routing (HAK_LANE_IS_*)

## jemalloc Block Bug Fix

### Root Cause
- g_jemalloc_loaded initialized to -1 (unknown)
- Condition `if (block && g_jemalloc_loaded)` treated -1 as true
- Result: ALL allocations fell back to libc (even when jemalloc was not loaded!)

### Fix
- Change the condition to `g_jemalloc_loaded > 0`
- Fall back only when jemalloc is ACTUALLY loaded
- Applied to: malloc/free/calloc/realloc (sketch below)
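
The bug and fix in miniature; a hedged sketch where wrapped_malloc, libc_malloc, and hak_malloc are illustrative stand-ins for the real wrapper entry points:

```c
#include <stddef.h>

static int g_jemalloc_loaded = -1;      // -1 = unknown, 0 = not loaded, 1 = loaded

extern void* libc_malloc(size_t size);  // hypothetical libc passthrough
extern void* hak_malloc(size_t size);   // hypothetical hakmem entry point

void* wrapped_malloc(size_t size, int block) {
    // BUG: `if (block && g_jemalloc_loaded)` treats the -1 "unknown" state
    // as true, so every allocation fell back to libc.
    // FIX: fall back only when jemalloc is positively known to be loaded.
    if (block && g_jemalloc_loaded > 0)
        return libc_malloc(size);
    return hak_malloc(size);
}
```

The same `> 0` guard goes into the free/calloc/realloc wrappers.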

### Impact
- Before: 100% libc fallback (jemalloc block false positive)
- After:  Only genuine cases fall back (init_wait, lockdepth, etc.)

## Fallback Diagnostics (ChatGPT contribution)

### New Feature: HAKMEM_WRAP_DIAG
- ENV flag to enable fallback logging
- Reason-specific counters (init_wait, jemalloc_block, lockdepth, etc.)
- First 4 occurrences logged per reason
- Helps identify unwanted fallback paths

### Implementation
- core/box/wrapper_env_box.{c,h}: ENV cache + DIAG flag
- core/box/hak_wrappers.inc.h: wrapper_record_fallback() calls (sketched below)
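
A plausible shape for the recorder (the reason enum, linkage, and getenv() caching are assumptions; only HAKMEM_WRAP_DIAG, wrapper_record_fallback(), per-reason counters, and the first-4 logging rule come from this commit):

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>

typedef enum { FB_INIT_WAIT, FB_JEMALLOC_BLOCK, FB_LOCKDEPTH, FB_REASON_MAX } fb_reason_t;

static const char* fb_name[FB_REASON_MAX] = { "init_wait", "jemalloc_block", "lockdepth" };
static _Atomic unsigned long fb_count[FB_REASON_MAX];

static int wrap_diag_enabled(void) {
    static int cached = -1;             // parse the ENV once, then reuse the cached flag
    if (cached < 0) {
        const char* v = getenv("HAKMEM_WRAP_DIAG");
        cached = (v != NULL && *v != '\0' && *v != '0');
    }
    return cached;
}

void wrapper_record_fallback(fb_reason_t r) {
    unsigned long n = atomic_fetch_add(&fb_count[r], 1);
    if (wrap_diag_enabled() && n < 4)   // log only the first 4 occurrences per reason
        fprintf(stderr, "[wrap] libc fallback: %s (#%lu)\n", fb_name[r], n + 1);
}
```

Running a workload with HAKMEM_WRAP_DIAG=1 then shows which fallback reasons actually fire.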

## Verification

### Fallback Reduction
- Before fix: `[wrap] libc malloc: jemalloc block` (100% fallback)
- After fix:  Only init_wait + lockdepth (expected, minimal)

### Known Issue
- Tiny allocator OOM (size=8) still crashes
- This is a pre-existing bug, unrelated to Phase 2-1
- Was hidden by jemalloc block false positive
- Will be investigated separately

## Performance Impact

### sh8bench 8 threads
- Phase 1-1: 15s
- Phase 2-1: 14s (~7% improvement)

### Note
- True hakmem performance now measurable (no more 100% fallback)
- Tiny OOM prevents full benchmark completion
- Next: Fix Tiny allocator for complete evaluation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
## Diff: core/box/hak_alloc_api.inc.h (excerpt)

```diff
@@ -1,8 +1,10 @@
 // hak_alloc_api.inc.h — Box: hak_alloc_at() implementation
+// Phase 2 Update: Lane-based allocation routing (Single Source of Truth)
 #ifndef HAK_ALLOC_API_INC_H
 #define HAK_ALLOC_API_INC_H
-#include "../hakmem_tiny.h"     // For tiny_get_max_size() (Phase 16)
+#include "../hakmem_tiny.h"     // For tiny_get_max_size() + hak_lane_classify.inc.h
+#include "../hakmem_pool.h"     // Phase 2: For hak_pool_try_alloc() (Pool lane 1025B-52KB)
 #include "../hakmem_smallmid.h" // For Small-Mid Front Box (Phase 17-1)
 #ifdef HAKMEM_POOL_TLS_PHASE1
@@ -106,15 +108,29 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
   hkm_size_hist_record(size);
+  // =========================================================================
+  // Phase 2: Pool Lane (LANE_POOL: 1025B-52KB)
+  // =========================================================================
+  // Key fix: Route 1025-52KB to Pool BEFORE ACE
+  // This eliminates the "unmanaged zone" (1025-2047B) that caused libc fragmentation
+  //
+  // Pool has 2KB as smallest class, so 1025-2047B requests use 2KB class
+  // (internal fragmentation ~48%, but better than libc fragmentation!)
+  if (HAK_LANE_IS_POOL(size)) {
 #ifdef HAKMEM_POOL_TLS_PHASE1
-  // Phase 1: Ultra-fast Pool TLS for 8KB-52KB range
-  if (size >= 8192 && size <= 53248) {
-    void* pool_ptr = pool_alloc(size);
-    // PERF_OPT: likely hint - pool allocations usually succeed
-    if (__builtin_expect(pool_ptr != NULL, 1)) return pool_ptr;
-    // Fall through to existing Mid allocator as fallback
-  }
+    // Pool TLS fast path (8KB-52KB only, pool_tls.c classes)
+    if (size >= 8192 && size <= 53248) {
+      void* pool_ptr = pool_alloc(size);
+      if (__builtin_expect(pool_ptr != NULL, 1)) return pool_ptr;
+    }
 #endif
+    // Pool API path (1025B-52KB, hakmem_pool.c classes including 2KB)
+    // This catches 1025-8191B range that Pool TLS doesn't handle
+    void* pool_try = hak_pool_try_alloc(size, site_id);
+    if (__builtin_expect(pool_try != NULL, 1)) return pool_try;
+    // Fall through to ACE if Pool fails
+  }
 #if HAKMEM_FEATURE_EVOLUTION
   if (g_evo_sample_mask > 0) {
@@ -155,7 +171,13 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
 #endif
   }
-  if (size > TINY_MAX_SIZE && size < threshold) {
+  // =========================================================================
+  // Phase 2: ACE Lane (LANE_ACE: 52KB-2MB) + HUGE Lane (2MB+)
+  // =========================================================================
+  // ACE handles sizes between Pool max (52KB) and huge threshold (2MB)
+  // Sizes > 2MB go directly to mmap (LANE_HUGE)
+  if (HAK_LANE_IS_ACE(size) || size > LANE_POOL_MAX) {
     const FrozenPolicy* pol = hkm_policy_get();
 #if HAKMEM_DEBUG_TIMING
     HKM_TIME_START(t_ace);
@@ -167,46 +189,41 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
     if (l1) return l1;
   }
-  // PHASE 7 CRITICAL FIX: Handle allocation gap (1KB-8KB) when ACE is disabled
-  // Size range:
-  //   0-1024: Tiny allocator
-  //   1025-8191: Gap! (Mid starts at 8KB, ACE often disabled)
-  //   8KB-32KB: Mid allocator
-  //   32KB-2MB: ACE (if enabled, otherwise mmap)
-  //   2MB+: mmap
-  //
-  // Solution: Use mmap for gap when ACE failed (ACE disabled or OOM)
+  // =========================================================================
+  // Phase 2: Final Fallback (mmap) - should be rare after Pool fix
+  // =========================================================================
+  // With Phase 2 Pool extension, 1025-52KB should be handled by Pool
+  // This fallback is for:
+  // - LANE_HUGE (2MB+): Normal mmap path
+  // - Pool/ACE failures: Emergency fallback
+  // - LANE_TINY failures: Should not happen (design bug)
   // Track final fallback mmaps globally
   extern _Atomic uint64_t g_final_fallback_mmap_count;
   void* ptr;
-  if (size >= threshold) {
-    // Large allocation (>= 2MB default): descend via single boundary
+  if (HAK_LANE_IS_HUGE(size)) {
+    // LANE_HUGE: Normal path for 2MB+ allocations
     atomic_fetch_add(&g_final_fallback_mmap_count, 1);
     ptr = hak_os_map_boundary(size, site_id);
-  } else if (size >= TINY_MAX_SIZE) {
-    // Mid-range allocation (1KB-2MB): try mmap as final fallback
-    // This handles the gap when ACE is disabled or failed
+  } else if (size > LANE_TINY_MAX) {
+    // Pool or ACE failed for 1025B-2MB range - emergency mmap fallback
     atomic_fetch_add(&g_final_fallback_mmap_count, 1);
     static _Atomic int gap_alloc_count = 0;
     int count = atomic_fetch_add(&gap_alloc_count, 1);
-#if HAKMEM_DEBUG_VERBOSE
-    if (count < 3) fprintf(stderr, "[HAKMEM] INFO: mid-gap fallback size=%zu\n", size);
+#if !HAKMEM_BUILD_RELEASE
+    if (count < 5) {
+      fprintf(stderr, "[HAKMEM] Phase 2 WARN: Pool/ACE fallback size=%zu (should be rare)\n", size);
+    }
 #endif
     ptr = hak_os_map_boundary(size, site_id);
   } else {
-    // Should never reach here (size <= TINY_MAX_SIZE should be handled by Tiny)
+    // LANE_TINY failed - this is a design bug!
+    HAK_LANE_ASSERT_NO_FALLBACK(LANE_FALLBACK, size);
    static _Atomic int oom_count = 0;
    int count = atomic_fetch_add(&oom_count, 1);
    if (count < 10) {
-      fprintf(stderr, "[HAKMEM] OOM: Unexpected allocation path for size=%zu, returning NULL\n", size);
-      fprintf(stderr, "[HAKMEM] (OOM count: %d) This should not happen!\n", count + 1);
+      fprintf(stderr, "[HAKMEM] BUG: Tiny lane failed for size=%zu (should not happen)\n", size);
    }
 #if HAKMEM_DEBUG_TIMING
    HKM_TIME_START(t_malloc);
    HKM_TIME_END(HKM_CAT_FALLBACK_MALLOC, t_malloc); // Keep timing for compatibility
 #endif
    errno = ENOMEM;
    return NULL;
  }
```