## Phase 2-1: Lane Classification Box (Single Source of Truth)
### New Module: hak_lane_classify.inc.h
- Centralized size-to-lane mapping with unified boundary definitions
- Lane architecture:
  - LANE_TINY: [0, 1024B] SuperSlab (unchanged)
  - LANE_POOL: [1025, 52KB] Pool per-thread (extended!)
  - LANE_ACE: [52KB, 2MB] ACE learning
  - LANE_HUGE: [2MB+] mmap direct
- Key invariant: POOL_MIN = TINY_MAX + 1 (no gaps)
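A minimal sketch of what a single-source size-to-lane classifier could look like. All `sketch_` / `SKETCH_` names are hypothetical and are not the identifiers defined in `hak_lane_classify.inc.h`. The boundaries are taken from the list above. Treating exactly 52KB as Pool follows `hak_pool_is_poolable()` (`size <= POOL_MAX_SIZE`); treating exactly 2MB as ACE is an assumption, since the listed ranges share that endpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; the real ones live in
 * hak_lane_classify.inc.h and may differ. */
enum sketch_lane {
    SKETCH_LANE_TINY,
    SKETCH_LANE_POOL,
    SKETCH_LANE_ACE,
    SKETCH_LANE_HUGE
};

#define SKETCH_TINY_MAX 1024u                 /* last byte count served by Tiny */
#define SKETCH_POOL_MIN 1025u                 /* first byte count served by Pool */
#define SKETCH_POOL_MAX (52u * 1024u)         /* 52KB, inclusive */
#define SKETCH_ACE_MAX  (2u * 1024u * 1024u)  /* 2MB, inclusive (assumption) */

/* The "no gaps" invariant, checked at compile time. */
static_assert(SKETCH_POOL_MIN == SKETCH_TINY_MAX + 1,
              "Pool must start right after Tiny");

/* Every size maps to exactly one lane because each test is an upper bound
 * checked in increasing order. */
static enum sketch_lane sketch_classify(size_t size) {
    if (size <= SKETCH_TINY_MAX) return SKETCH_LANE_TINY;
    if (size <= SKETCH_POOL_MAX) return SKETCH_LANE_POOL;
    if (size <= SKETCH_ACE_MAX)  return SKETCH_LANE_ACE;
    return SKETCH_LANE_HUGE;
}
```

Because both the Tiny and Pool code would consult the same constants, a mismatch such as 1024 vs 2047 cannot reappear without changing this one place.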
### Fixed: Tiny/Pool Boundary Mismatch
- Before: TINY_MAX_SIZE=1024 vs tiny_get_max_size()=2047 (inconsistent!)
- After: Both reference LANE_TINY_MAX=1024 (authoritative)
- Impact: Eliminates the 1025-2047B "unmanaged zone" that caused libc fragmentation
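To illustrate how requests in the former unmanaged zone can be served, here is a sketch of rounding a request up to the smallest Pool block class. The seven class sizes are the ones listed in `hakmem_pool.h`; the function name and the linear scan are assumptions for illustration, not the actual `hak_pool_get_class_index()` logic.

```c
#include <stddef.h>

/* Block classes from hakmem_pool.h: 2, 4, 8, 16, 32, 40, 52 KiB. */
static const size_t sketch_pool_classes[7] = {
    2048, 4096, 8192, 16384, 32768, 40960, 53248
};

/* Hypothetical helper: returns the block size that would back `request`,
 * or 0 when the request is outside the Pool lane [1025, 52KB]. */
static size_t sketch_pool_block_size(size_t request) {
    if (request < 1025 || request > 53248) {
        return 0;
    }
    for (size_t i = 0; i < 7; i++) {
        if (request <= sketch_pool_classes[i]) {
            return sketch_pool_classes[i];
        }
    }
    return 0; /* unreachable: the largest class equals the lane maximum */
}
```

For a 1056B request this picks the 2KB class, wasting 2048 - 1056 = 992 bytes (about 48%), which matches the example in the header comment.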
### Updated Files
- core/hakmem_tiny.h: Use LANE_TINY_MAX, fix sizes[7]=1024 (was 2047)
- core/hakmem_pool.h: Use POOL_MIN_REQUEST_SIZE=1025 (was 2048)
- core/box/hak_alloc_api.inc.h: Lane-based routing (HAK_LANE_IS_*)
## jemalloc Block Bug Fix
### Root Cause
- g_jemalloc_loaded initialized to -1 (unknown)
- Condition `if (block && g_jemalloc_loaded)` treated -1 as true
- Result: ALL allocations fall back to libc (even when jemalloc is not loaded!)
### Fix
- Change condition to `g_jemalloc_loaded > 0`
- Only fall back when jemalloc is ACTUALLY loaded
- Applied to: malloc/free/calloc/realloc
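The root cause is a general C pitfall: any non-zero integer, including -1, is true in a boolean context. The sketch below isolates the before and after conditions; the function names are hypothetical, and only the tri-state meaning of the flag (-1 unknown, 0 not loaded, 1 loaded) is taken from the description above.

```c
#include <stdbool.h>

/* jemalloc_loaded is tri-state: -1 = unknown, 0 = not loaded, 1 = loaded. */

/* Before: -1 is non-zero, so "unknown" is treated like "loaded". */
static bool sketch_should_fallback_buggy(bool block, int jemalloc_loaded) {
    return block && jemalloc_loaded;
}

/* After: only a positive value counts as "loaded". */
static bool sketch_should_fallback_fixed(bool block, int jemalloc_loaded) {
    return block && jemalloc_loaded > 0;
}
```

With the flag still at its initial value of -1, the first version returns true for every call while the second returns false, which is the difference between 100% fallback and none from this cause.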
### Impact
- Before: 100% libc fallback (jemalloc block false positive)
- After: Only genuine cases fall back (init_wait, lockdepth, etc.)
## Fallback Diagnostics (ChatGPT contribution)
### New Feature: HAKMEM_WRAP_DIAG
- ENV flag to enable fallback logging
- Reason-specific counters (init_wait, jemalloc_block, lockdepth, etc.)
- First 4 occurrences logged per reason
- Helps identify unwanted fallback paths
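A rough sketch of how such a diagnostic could be structured, not the actual `wrapper_record_fallback()` implementation. The `sketch_` names, the log format, and the rule that counters only advance when `HAKMEM_WRAP_DIAG` is set are all assumptions; only the env variable name, the reason names, and the "first 4 occurrences" limit come from the description above.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

enum sketch_reason {
    SKETCH_FB_INIT_WAIT,
    SKETCH_FB_JEMALLOC_BLOCK,
    SKETCH_FB_LOCKDEPTH,
    SKETCH_FB_COUNT
};

static const char *const sketch_reason_name[SKETCH_FB_COUNT] = {
    "init_wait", "jemalloc_block", "lockdepth"
};

static atomic_ulong sketch_fb_count[SKETCH_FB_COUNT]; /* zero-initialized */
static int sketch_diag_enabled = -1;                  /* -1 = env not read yet */

/* Read HAKMEM_WRAP_DIAG once and cache the answer. */
static int sketch_diag_on(void) {
    if (sketch_diag_enabled < 0) {
        const char *v = getenv("HAKMEM_WRAP_DIAG");
        sketch_diag_enabled = (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
    }
    return sketch_diag_enabled > 0; /* explicit > 0: -1 must not count as on */
}

/* Count one fallback; log only the first 4 per reason.
 * Returns 1 if this occurrence was logged, 0 otherwise. */
static int sketch_record_fallback(enum sketch_reason reason) {
    if (!sketch_diag_on()) {
        return 0;
    }
    unsigned long n = atomic_fetch_add(&sketch_fb_count[reason], 1) + 1;
    if (n > 4) {
        return 0;
    }
    /* A real malloc wrapper would need an allocation-free logging path here. */
    fprintf(stderr, "[wrap] libc fallback: %s (#%lu)\n",
            sketch_reason_name[reason], n);
    return 1;
}
```

Capping the log at a few lines per reason keeps the output readable even when a reason fires on every allocation, while the counter still records the full total.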
### Implementation
- core/box/wrapper_env_box.{c,h}: ENV cache + DIAG flag
- core/box/hak_wrappers.inc.h: wrapper_record_fallback() calls
## Verification
### Fallback Reduction
- Before fix: [wrap] libc malloc: jemalloc block (100% fallback)
- After fix: Only init_wait + lockdepth (expected, minimal)
### Known Issue
- Tiny allocator OOM (size=8) still crashes
- This is a pre-existing bug, unrelated to Phase 2-1
- Was hidden by jemalloc block false positive
- Will be investigated separately
## Performance Impact
### sh8bench 8 threads
- Phase 1-1: 15 s
- Phase 2-1: 14 s (~7% improvement)
### Note
- True hakmem performance now measurable (no more 100% fallback)
- Tiny OOM prevents full benchmark completion
- Next: Fix Tiny allocator for complete evaluation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
// hakmem_pool.h - L2 Hybrid Pool (1KB-52KB Mid-Size Allocations)
// Purpose: Per-thread pool with site-based sharding for mid-size fast-path
//
// Design Philosophy:
// - **7 size classes**: 2KiB, 4KiB, 8KiB, 16KiB, 32KiB, 40KiB, 52KiB
// - **64KiB pool pages**: 32 blocks (2KiB), 16 blocks (4KiB), 8 blocks (8KiB), etc.
// - **per-thread freelist**: Lock-free allocation (mimalloc strategy)
// - **O(1) site→shard mapping**: `shard = (pc >> 4) & (SHARDS-1)`
// - **MPSC queue**: Remote-free handling (cross-thread deallocation)
//
// Phase 2 Update:
// - Pool now accepts requests from 1025B (LANE_POOL_MIN) to 52KB
// - Requests 1025-2047B are rounded up to 2KB class (internal fragmentation OK)
// - This eliminates the "unmanaged zone" between Tiny (1024B) and Pool (was 2KB)
//
// Target Workloads:
// - mir (medium): 2-32KiB allocations → +52% → target +10-20%
// - mixed: combination → +66% → target +10-25%
//
// Integration: Called by hakmem.c for sizes > LANE_TINY_MAX (1024B)
//
// License: MIT
// Date: 2025-10-21 (Phase 2 Update: 2025-12-02)

#ifndef HAKMEM_POOL_H
#define HAKMEM_POOL_H

#include <stddef.h>
#include <stdint.h>

// Phase 2: Lane Classification Box (Single Source of Truth for boundaries)
#include "box/hak_lane_classify.inc.h"
// ===========================================================================
// Configuration Constants
// ===========================================================================

#define POOL_NUM_CLASSES 7           // 2KiB, 4KiB, 8KiB, 16KiB, 32KiB, 40KiB, 52KiB
#define POOL_PAGE_SIZE   (64 * 1024) // 64KiB per pool page
#define POOL_NUM_SHARDS  64          // Site-based sharding (power of 2)

// Size class boundaries (in bytes) - actual block sizes
#define POOL_CLASS_2KB  (2 * 1024)
#define POOL_CLASS_4KB  (4 * 1024)
#define POOL_CLASS_8KB  (8 * 1024)
#define POOL_CLASS_16KB (16 * 1024)
#define POOL_CLASS_32KB (32 * 1024)
#define POOL_CLASS_40KB (40 * 1024)  // Phase 6.21: Bridge class 0
#define POOL_CLASS_52KB (52 * 1024)  // Phase 6.21: Bridge class 1

// ===========================================================================
// Phase 2: Request Size vs Block Size (separate concepts!)
// ===========================================================================
//
// POOL_MIN_SIZE: Smallest USER REQUEST Pool accepts (= LANE_POOL_MIN = 1025)
// POOL_MIN_CLASS: Smallest BLOCK SIZE Pool allocates (= 2KB)
//
// Example: request=1056B -> class=2KB (internal fragmentation ~48%, acceptable)
// This is better than libc fragmentation from mmap fallback!

// Request boundary (from lane classification - Single Source of Truth)
#define POOL_MIN_SIZE POOL_MIN_REQUEST_SIZE // = 1025 (LANE_TINY_MAX + 1)
#define POOL_MAX_SIZE LANE_POOL_MAX         // = 52KB

// Block class boundary (internal, for size-to-class mapping)
#define POOL_MIN_CLASS POOL_CLASS_2KB // Smallest actual block = 2KB

// Remote-free drain threshold
#define POOL_REMOTE_DRAIN_THRESHOLD 16 // Drain every N allocs
// ===========================================================================
// Public API
// ===========================================================================

// Initialize pool system (called by hak_init)
void hak_pool_init(void);

// Shutdown pool system and release all pages
void hak_pool_shutdown(void);

// Try to allocate from pool (returns NULL if size not in range)
// Args: size    - requested allocation size (1025B-52KB since Phase 2)
//       site_id - call-site address (for shard selection)
// Returns: Pointer to allocated block, or NULL if pool unavailable
void* hak_pool_try_alloc(size_t size, uintptr_t site_id);

// Free block back to pool
// Args: ptr     - pointer to block (from hak_pool_try_alloc)
//       size    - original allocation size (for class determination)
//       site_id - call-site address (for shard routing)
void hak_pool_free(void* ptr, size_t size, uintptr_t site_id);

// Mid fast-path helpers (headerless route)
// Returns 1 if ptr belongs to Mid pool (1–32KiB). When out_size is non-NULL,
// fills the class size in bytes.
int hak_pool_mid_lookup(void* ptr, size_t* out_size);

// Free using Mid page descriptors (no header read). Safe when HDR_LIGHT=2.
void hak_pool_free_fast(void* ptr, uintptr_t site_id);

// Print pool statistics (called by hak_shutdown)
void hak_pool_print_stats(void);

// Stats snapshot (per-class counters). Arrays must have length POOL_NUM_CLASSES.
void hak_pool_stats_snapshot(uint64_t hits[], uint64_t misses[], uint64_t refills[], uint64_t frees[]);

// Extra metrics snapshot for learner logging (monotonic counters)
// Outputs: trylock_attempts, trylock_success, ring_underflow (may be NULL if not needed)
void hak_pool_extra_metrics_snapshot(uint64_t* trylock_attempts, uint64_t* trylock_success, uint64_t* ring_underflow);
// ===========================================================================
// Internal Helpers (for testing/debugging)
// ===========================================================================

// Phase 6.10.1: hak_pool_get_class_index() is now static inline (hakmem_pool.c:70)
// Removed from public API (no longer needed in header)

// Get shard index from site_id (0-63)
int hak_pool_get_shard_index(uintptr_t site_id);

// Check if size is poolable (1025B-52KB range, Phase 2 expanded)
// Phase 2: Now accepts 1025B+ (was 2KB+) to eliminate unmanaged zone
static inline int hak_pool_is_poolable(size_t size) {
    return size >= POOL_MIN_SIZE && size <= POOL_MAX_SIZE;
}

#endif // HAKMEM_POOL_H
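The header's design notes give the site-to-shard mapping as `shard = (pc >> 4) & (SHARDS-1)`. A small sketch of that formula follows, assuming 64 shards as in `POOL_NUM_SHARDS`. The function name is hypothetical and this is not necessarily how `hak_pool_get_shard_index()` is implemented.

```c
#include <stdint.h>

#define SKETCH_NUM_SHARDS 64 /* must be a power of two for the mask to work */

/* Drop the low 4 bits of the call-site address (nearby call sites differ
 * mostly above them), then mask to the shard count. */
static int sketch_shard_index(uintptr_t site_id) {
    return (int)((site_id >> 4) & (uintptr_t)(SKETCH_NUM_SHARDS - 1));
}
```

Because the shard count is a power of two, the mask replaces a modulo and the mapping stays O(1) with no table lookup.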