Files
hakmem/core/hakmem_pool.h
Moe Charm (CI) 644e3c30d1 feat(Phase 2-1): Lane Classification + Fallback Reduction
## Phase 2-1: Lane Classification Box (Single Source of Truth)

### New Module: hak_lane_classify.inc.h
- Centralized size-to-lane mapping with unified boundary definitions
- Lane architecture:
  - LANE_TINY:  [0, 1024B]      SuperSlab (unchanged)
  - LANE_POOL:  [1025, 52KB]    Pool per-thread (extended!)
  - LANE_ACE:   [52KB, 2MB]     ACE learning
  - LANE_HUGE:  [2MB+]          mmap direct
- Key invariant: POOL_MIN = TINY_MAX + 1 (no gaps)

### Fixed: Tiny/Pool Boundary Mismatch
- Before: TINY_MAX_SIZE=1024 vs tiny_get_max_size()=2047 (inconsistent!)
- After:  Both reference LANE_TINY_MAX=1024 (authoritative)
- Impact: Eliminates 1025-2047B "unmanaged zone" causing libc fragmentation

### Updated Files
- core/hakmem_tiny.h: Use LANE_TINY_MAX, fix sizes[7]=1024 (was 2047)
- core/hakmem_pool.h: Use POOL_MIN_REQUEST_SIZE=1025 (was 2048)
- core/box/hak_alloc_api.inc.h: Lane-based routing (HAK_LANE_IS_*)

## jemalloc Block Bug Fix

### Root Cause
- g_jemalloc_loaded initialized to -1 (unknown)
- Condition `if (block && g_jemalloc_loaded)` treated -1 as true
- Result: ALL allocations fallback to libc (even when jemalloc not loaded!)

### Fix
- Change condition to `g_jemalloc_loaded > 0`
- Only fallback when jemalloc is ACTUALLY loaded
- Applied to: malloc/free/calloc/realloc

### Impact
- Before: 100% libc fallback (jemalloc block false positive)
- After:  Only genuine cases fallback (init_wait, lockdepth, etc.)

## Fallback Diagnostics (ChatGPT contribution)

### New Feature: HAKMEM_WRAP_DIAG
- ENV flag to enable fallback logging
- Reason-specific counters (init_wait, jemalloc_block, lockdepth, etc.)
- First 4 occurrences logged per reason
- Helps identify unwanted fallback paths

### Implementation
- core/box/wrapper_env_box.{c,h}: ENV cache + DIAG flag
- core/box/hak_wrappers.inc.h: wrapper_record_fallback() calls

## Verification

### Fallback Reduction
- Before fix: [wrap] libc malloc: jemalloc block (100% fallback)
- After fix:  Only init_wait + lockdepth (expected, minimal)

### Known Issue
- Tiny allocator OOM (size=8) still crashes
- This is a pre-existing bug, unrelated to Phase 2-1
- Was hidden by jemalloc block false positive
- Will be investigated separately

## Performance Impact

### sh8bench 8 threads
- Phase 1-1: 15秒
- Phase 2-1: 14秒 (~7% improvement)

### Note
- True hakmem performance now measurable (no more 100% fallback)
- Tiny OOM prevents full benchmark completion
- Next: Fix Tiny allocator for complete evaluation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-12-02 19:13:28 +09:00

128 lines
5.4 KiB
C
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

// hakmem_pool.h - L2 Hybrid Pool (1KB-52KB Mid-Size Allocations)
// Purpose: Per-thread pool with site-based sharding for mid-size fast-path
//
// Design Philosophy:
// - **7 size classes**: 2KiB, 4KiB, 8KiB, 16KiB, 32KiB, 40KiB, 52KiB
// - **64KiB pool pages**: 32 blocks (2KiB), 16 blocks (4KiB), 8 blocks (8KiB), etc.
// - **per-thread freelist**: Lock-free allocation (mimalloc strategy)
// - **O(1) site→shard mapping**: `shard = (pc >> 4) & (SHARDS-1)`
// - **MPSC queue**: Remote-free handling (cross-thread deallocation)
//
// Phase 2 Update:
// - Pool now accepts requests from 1025B (LANE_POOL_MIN) to 52KB
// - Requests 1025-2047B are rounded up to 2KB class (internal fragmentation OK)
// - This eliminates the "unmanaged zone" between Tiny (1024B) and Pool (was 2KB)
//
// Target Workloads:
// - mir (medium): 2-32KiB allocations → +52% → target +10-20%
// - mixed: combination → +66% → target +10-25%
//
// Integration: Called by hakmem.c for sizes > LANE_TINY_MAX (1024B)
//
// License: MIT
// Date: 2025-10-21 (Phase 2 Update: 2025-12-02)
#ifndef HAKMEM_POOL_H
#define HAKMEM_POOL_H
#include <stddef.h>
#include <stdint.h>
// Phase 2: Lane Classification Box (Single Source of Truth for boundaries)
#include "box/hak_lane_classify.inc.h"
// ===========================================================================
// Configuration Constants
// ===========================================================================
#define POOL_NUM_CLASSES 7 // 2KiB, 4KiB, 8KiB, 16KiB, 32KiB, 40KiB, 52KiB
#define POOL_PAGE_SIZE (64 * 1024) // 64KiB per pool page
#define POOL_NUM_SHARDS 64 // Site-based sharding (power of 2)
// Size class boundaries (in bytes) - actual block sizes
#define POOL_CLASS_2KB (2 * 1024)
#define POOL_CLASS_4KB (4 * 1024)
#define POOL_CLASS_8KB (8 * 1024)
#define POOL_CLASS_16KB (16 * 1024)
#define POOL_CLASS_32KB (32 * 1024)
#define POOL_CLASS_40KB (40 * 1024) // Phase 6.21: Bridge class 0
#define POOL_CLASS_52KB (52 * 1024) // Phase 6.21: Bridge class 1
// ===========================================================================
// Phase 2: Request Size vs Block Size (separate concepts!)
// ===========================================================================
//
// POOL_MIN_SIZE: Smallest USER REQUEST Pool accepts (= LANE_POOL_MIN = 1025)
// POOL_MIN_CLASS: Smallest BLOCK SIZE Pool allocates (= 2KB)
//
// Example: request=1056B -> class=2KB (internal fragmentation ~48%, acceptable)
// This is better than libc fragmentation from mmap fallback!
// Request boundary (from lane classification - Single Source of Truth)
#define POOL_MIN_SIZE POOL_MIN_REQUEST_SIZE // = 1025 (LANE_TINY_MAX + 1)
#define POOL_MAX_SIZE LANE_POOL_MAX // = 52KB
// Block class boundary (internal, for size-to-class mapping)
#define POOL_MIN_CLASS POOL_CLASS_2KB // Smallest actual block = 2KB
// Remote-free drain threshold
#define POOL_REMOTE_DRAIN_THRESHOLD 16 // Drain every N allocs
// ===========================================================================
// Public API
// ===========================================================================
// Initialize pool system (called by hak_init)
void hak_pool_init(void);
// Shutdown pool system and release all pages
void hak_pool_shutdown(void);
// Try to allocate from pool (returns NULL if size not in range)
// Args: size - requested allocation size (2-32KiB)
// site_id - call-site address (for shard selection)
// Returns: Pointer to allocated block, or NULL if pool unavailable
void* hak_pool_try_alloc(size_t size, uintptr_t site_id);
// Free block back to pool
// Args: ptr - pointer to block (from hak_pool_try_alloc)
// size - original allocation size (for class determination)
// site_id - call-site address (for shard routing)
void hak_pool_free(void* ptr, size_t size, uintptr_t site_id);
// Mid fast-path helpers (headerless route)
// Returns 1 if ptr belongs to Mid pool (132KiB). When out_size is non-NULL,
// fills the class size in bytes.
int hak_pool_mid_lookup(void* ptr, size_t* out_size);
// Free using Mid page descriptors (no header read). Safe when HDR_LIGHT=2.
void hak_pool_free_fast(void* ptr, uintptr_t site_id);
// Print pool statistics (called by hak_shutdown)
void hak_pool_print_stats(void);
// Stats snapshot (per-class counters). Arrays must have length POOL_NUM_CLASSES.
void hak_pool_stats_snapshot(uint64_t hits[], uint64_t misses[], uint64_t refills[], uint64_t frees[]);
// Extra metrics snapshot for learner logging (monotonic counters)
// Outputs: trylock_attempts, trylock_success, ring_underflow (may be NULL if not needed)
void hak_pool_extra_metrics_snapshot(uint64_t* trylock_attempts, uint64_t* trylock_success, uint64_t* ring_underflow);
// ===========================================================================
// Internal Helpers (for testing/debugging)
// ===========================================================================
// Phase 6.10.1: hak_pool_get_class_index() is now static inline (hakmem_pool.c:70)
// Removed from public API (no longer needed in header)
// Get shard index from site_id (0-63)
int hak_pool_get_shard_index(uintptr_t site_id);
// Check if size is poolable (1025B-52KB range, Phase 2 expanded)
// Phase 2: Now accepts 1025B+ (was 2KB+) to eliminate unmanaged zone
static inline int hak_pool_is_poolable(size_t size) {
return size >= POOL_MIN_SIZE && size <= POOL_MAX_SIZE;
}
#endif // HAKMEM_POOL_H