Phase 16: Dynamic Tiny/Mid Boundary with A/B Testing (ENV-controlled)

IMPLEMENTATION:
===============
Make the boundary between the Tiny and Mid allocators adjustable at startup
via the HAKMEM_TINY_MAX_CLASS environment variable, for performance tuning.

Changes:
--------
1. hakmem_tiny.h/c: Add tiny_get_max_size(), which reads HAKMEM_TINY_MAX_CLASS
   and maps the class to its max usable size (default: class 7 = 1023B;
   can be reduced to class 5 = 255B)

2. hakmem_mid_mt.h/c: Add mid_get_min_size() - returns tiny_get_max_size() + 1
   to ensure no size gap between allocators

3. hak_alloc_api.inc.h: Replace static TINY_MAX_SIZE with dynamic
   tiny_get_max_size() call in allocation routing logic

4. Size gap fix: Mid's range now dynamically adjusts based on Tiny's max
   (prevents 256-1023B from falling through when HAKMEM_TINY_MAX_CLASS=5)
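
The boundary logic in changes 1 and 2 can be sketched roughly as follows.
This is a minimal illustration, not the code from this commit: every
sketch_* name is hypothetical, the formula (8 << class) - 1 is an assumption
that merely matches the two documented points (class 7 = 1023B,
class 5 = 255B), and the accepted ENV range and the caching are guesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch; names and details are assumptions, not hakmem code. */
enum { SKETCH_DEFAULT_MAX_CLASS = 7 };

/* Assumed mapping: class c spans 8 << c bytes, minus a 1-byte header.
 * Agrees with the documented values: class 7 -> 1023, class 5 -> 255. */
static size_t sketch_class_to_max_size(int cls) {
    return ((size_t)8 << cls) - 1;
}

/* Parse the ENV string; fall back to the default on missing or bad input. */
static int sketch_parse_max_class(const char *env) {
    if (env == NULL || *env == '\0') return SKETCH_DEFAULT_MAX_CLASS;
    char *end = NULL;
    long v = strtol(env, &end, 10);
    if (*end != '\0' || v < 0 || v > SKETCH_DEFAULT_MAX_CLASS)
        return SKETCH_DEFAULT_MAX_CLASS;
    return (int)v;
}

/* Read HAKMEM_TINY_MAX_CLASS once and cache the resulting byte limit.
 * The cache is not thread-safe here; a real allocator would need more care. */
static size_t sketch_tiny_get_max_size(void) {
    static size_t cached = 0; /* 0 means "not read yet" (smallest real value is 7) */
    if (cached == 0) {
        int cls = sketch_parse_max_class(getenv("HAKMEM_TINY_MAX_CLASS"));
        cached = sketch_class_to_max_size(cls);
    }
    return cached;
}

/* Mid starts exactly one byte above Tiny's limit, so no size is unowned. */
static size_t sketch_mid_get_min_size(void) {
    return sketch_tiny_get_max_size() + 1;
}
```

Because the Mid minimum is derived from the Tiny maximum instead of being a
separate constant, lowering the class to 5 moves both limits together
(Tiny <= 255B, Mid >= 256B), which is the property change 4 relies on.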

A/B BENCHMARK RESULTS:
======================
Config A (Default, C0-C7, Tiny up to 1023B):
  128B:  6.34M ops/s  |  256B:  6.34M ops/s
  512B:  5.55M ops/s  |  1024B: 5.91M ops/s

Config B (Reduced, C0-C5, Tiny up to 255B):
  128B:  1.38M ops/s (-78%)  |  256B:  1.36M ops/s (-79%)
  512B:  1.33M ops/s (-76%)  |  1024B: 1.37M ops/s (-77%)
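
For reference, the percentages above are relative changes against Config A.
A small helper (the name pct_change is illustrative only) shows the
arithmetic:

```c
#include <assert.h>

/* Relative change from baseline to variant, in percent.
 * Negative values mean the variant is slower than the baseline. */
static double pct_change(double baseline, double variant) {
    return (variant - baseline) / baseline * 100.0;
}
```

For example, pct_change(6.34, 1.38) is about -78.2 (reported as -78%) and
pct_change(5.55, 1.33) is about -76.0 (reported as -76%).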

FINDINGS:
=========
- Size gap fixed: no OOM crashes with HAKMEM_TINY_MAX_CLASS=5
- Severe performance degradation (-76% to -79%) when reducing Tiny coverage
- Even 128B degraded, although it should still be served by Tiny; possible class filtering issue
- Mid's coarse size classes (8KB/16KB/32KB) cause fragmentation for small sizes

HYPOTHESIS:
-----------
Mid allocator uses 8KB blocks for all 256-1024B allocations, causing:
- Severe internal fragmentation (1024B request → 8KB block = 87.5% waste)
- Poor cache utilization
- Consistent ~1.3M ops/s across all sizes (same 8KB class)
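
The waste figure in this hypothesis is plain internal-fragmentation
arithmetic. A minimal sketch, assuming an 8KB block means 8192 bytes and
ignoring any per-block header (internal_waste is an illustrative name):

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of a block left unused when `request` bytes are served
 * from a block of `block` bytes (0.0 = perfect fit, 1.0 = all wasted). */
static double internal_waste(size_t request, size_t block) {
    return 1.0 - (double)request / (double)block;
}
```

internal_waste(1024, 8192) is 0.875, and internal_waste(256, 8192) is
0.96875, so under this assumption the smallest delegated sizes waste the
most.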

RECOMMENDATION:
===============
**Keep default HAKMEM_TINY_MAX_CLASS=7 (C0-C7, up to 1023B)**

Reducing Tiny coverage is COUNTERPRODUCTIVE with current Mid allocator design.
To make this viable, Mid would need finer size classes for the 256B-8KB range.
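
As one illustration of what "finer size classes" could mean, the sketch
below rounds a request up to the next power of two between 256B and 8KB.
This scheme and the name sketch_fine_class are hypothetical; nothing in
this commit implements or proposes this exact layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical finer class selection: next power of two in [256, 8192].
 * Returns 0 for sizes outside the range discussed here. */
static size_t sketch_fine_class(size_t n) {
    if (n < 256 || n > 8192) return 0;
    size_t c = 256;
    while (c < n) c <<= 1; /* 256, 512, 1024, 2048, 4096, 8192 */
    return c;
}
```

Under such a scheme a 1024B request would land in a 1024B class instead of
an 8KB block, and the worst case (one byte over a class boundary) would
waste just under half of the block.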

ENV USAGE (for future experimentation):
----------------------------------------
export HAKMEM_TINY_MAX_CLASS=7  # Default (C0-C7, up to 1023B)
export HAKMEM_TINY_MAX_CLASS=5  # Reduced (C0-C5, up to 255B) - NOT recommended

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-11-16 01:26:48 +09:00
parent a4ef2fa1f1
commit 6818e350c4
5 changed files with 67 additions and 14 deletions


@@ -2,6 +2,8 @@
 #ifndef HAK_ALLOC_API_INC_H
 #define HAK_ALLOC_API_INC_H
+#include "../hakmem_tiny.h" // For tiny_get_max_size() (Phase 16)
 #ifdef HAKMEM_POOL_TLS_PHASE1
 #include "../pool_tls.h"
 #endif
@@ -29,7 +31,9 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
 uintptr_t site_id = (uintptr_t)site;
-if (__builtin_expect(size <= TINY_MAX_SIZE, 1)) {
+// Phase 16: Dynamic Tiny max size (ENV: HAKMEM_TINY_MAX_CLASS)
+// Default: 1023B (C0-C7), can be reduced to 255B (C0-C5) to delegate 512/1024B to Mid
+if (__builtin_expect(size <= tiny_get_max_size(), 1)) {
 #if HAKMEM_DEBUG_TIMING
 HKM_TIME_START(t_tiny);
 #endif
@@ -49,10 +53,10 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
 if (tiny_ptr) { hkm_ace_track_alloc(); return tiny_ptr; }
 // PHASE 7 CRITICAL FIX: No malloc fallback for Tiny failures
-// If Tiny fails for size <= TINY_MAX_SIZE, let it flow to Mid/ACE layers
+// If Tiny fails for size <= tiny_get_max_size(), let it flow to Mid/ACE layers
 // This prevents mixed HAKMEM/libc allocation bugs
 #if HAKMEM_TINY_HEADER_CLASSIDX
-if (!tiny_ptr && size <= TINY_MAX_SIZE) {
+if (!tiny_ptr && size <= tiny_get_max_size()) {
 #if !HAKMEM_BUILD_RELEASE
 // Tiny failed - log and continue to Mid/ACE (no early return!)
 static int log_count = 0;