hakmem/core/box/tiny_alloc_gate_box.h

Add Tiny Alloc Gatekeeper Box for unified malloc entry point Core Changes: - New file: core/box/tiny_alloc_gate_box.h * Thin wrapper around malloc_tiny_fast() with diagnostic hooks * TinyAllocGateContext structure for size/class_idx/user/base/bridge information * tiny_alloc_gate_diag_enabled() - ENV-controlled diagnostic mode * tiny_alloc_gate_validate() - Validates class_idx/header/meta consistency * tiny_alloc_gate_fast() - Main gatekeeper function * Zero performance impact when diagnostics disabled - Modified: core/box/hak_wrappers.inc.h * Added #include "tiny_alloc_gate_box.h" (line 35) * Integrated gatekeeper into malloc wrapper (lines 198-200) * Diagnostic mode via HAKMEM_TINY_ALLOC_GATE_DIAG env var Design Rationale: - Complements Free Gatekeeper Box: Together they provide entry/exit hooks - Validates allocation consistency at malloc time - Enables Bridge + BASE/USER conversion validation in debug mode - Maintains backward compatibility: existing behavior unchanged Validation Features: - tiny_ptr_bridge_classify_raw() - Verifies Superslab/Slab/meta lookup - Header vs meta class consistency check (rate-limited, 8 msgs max) - class_idx validation via hak_tiny_size_to_class() - All validation logged but non-blocking (observation points for Guard) Testing: - All smoke tests pass (10M malloc/free cycles, pool TLS, real programs) - Diagnostic mode validated with HAKMEM_TINY_ALLOC_GATE_DIAG=1 - No regressions in existing functionality - Verified via Task agent (PASS verdict) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 12:06:14 +09:00
// tiny_alloc_gate_box.h - Box: Tiny Alloc Gatekeeper
//
// Role:
//   - Acts as the "entry box" for Tiny allocations in the malloc-side Tiny front end.
//   - For now it thinly wraps the existing malloc_tiny_fast(size) while providing
//     hooks so that future BASE/USER conversion, Bridge, and Guard logic can be
//     consolidated into a single box.
//
// Box theory:
//   - Single Responsibility:
//       "At the Tiny alloc entry, inspect/normalize the returned USER pointer exactly once."
//   - Clear Boundary:
//       Route every call from the malloc wrapper (hak_wrappers) into the Tiny Fast Path
//       through tiny_alloc_gate_fast().
//   - Reversible:
//       Diagnostics can be switched ON/OFF via ENV (HAKMEM_TINY_ALLOC_GATE_DIAG).
//       When OFF, the behavior and cost of malloc_tiny_fast are preserved.
#ifndef HAKMEM_TINY_ALLOC_GATE_BOX_H
#define HAKMEM_TINY_ALLOC_GATE_BOX_H

#include "../hakmem_build_flags.h"
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdatomic.h>                  // _Atomic, atomic_fetch_add_explicit (used by the rate-limited logs)
#include "../hakmem_tiny.h"             // hak_tiny_size_to_class
#include "ptr_type_box.h"
#include "ptr_conversion_box.h"         // USER <-> BASE conversion
#include "tiny_ptr_bridge_box.h"        // Tiny Superslab Bridge
#include "../tiny_region_id.h"          // Header read
#include "../front/malloc_tiny_fast.h"  // Existing Tiny Fast Path
P-Tier + Tiny Route Policy: Aggressive Superslab Management + Safe Routing ## Phase 1: Utilization-Aware Superslab Tiering (案B実装済) - Add ss_tier_box.h: Classify SuperSlabs into HOT/DRAINING/FREE based on utilization - HOT (>25%): Accept new allocations - DRAINING (≤25%): Drain only, no new allocs - FREE (0%): Ready for eager munmap - Enhanced shared_pool_release_slab(): - Check tier transition after each slab release - If tier→FREE: Force remaining slots to EMPTY and call superslab_free() immediately - Bypasses LRU cache to prevent registry bloat from accumulating DRAINING SuperSlabs - Test results (bench_random_mixed_hakmem): - 1M iterations: ✅ ~1.03M ops/s (previously passed) - 10M iterations: ✅ ~1.15M ops/s (previously: registry full error) - 50M iterations: ✅ ~1.08M ops/s (stress test) ## Phase 2: Tiny Front Routing Policy (新規Box) - Add tiny_route_box.h/c: Single 8-byte table for class→routing decisions - ROUTE_TINY_ONLY: Tiny front exclusive (no fallback) - ROUTE_TINY_FIRST: Try Tiny, fallback to Pool if fails - ROUTE_POOL_ONLY: Skip Tiny entirely - Profiles via HAKMEM_TINY_PROFILE ENV: - "hot": C0-C3=TINY_ONLY, C4-C6=TINY_FIRST, C7=POOL_ONLY - "conservative" (default): All TINY_FIRST - "off": All POOL_ONLY (disable Tiny) - "full": All TINY_ONLY (microbench mode) - A/B test results (ws=256, 100k ops random_mixed): - Default (conservative): ~2.90M ops/s - hot: ~2.65M ops/s (more conservative) - off: ~2.86M ops/s - full: ~2.98M ops/s (slightly best) ## Design Rationale ### Registry Pressure Fix (案B) - Problem: DRAINING tier SS occupied registry indefinitely - Solution: When total_active_blocks→0, immediately free to clear registry slot - Result: No more "registry full" errors under stress ### Routing Policy Box (新) - Problem: Tiny front optimization scattered across ENV/branches - Solution: Centralize routing in single table, select profiles via ENV - Benefit: Safe A/B testing without touching hot path code - Future: Integrate with RSS budget/learning layers for 
dynamic profile switching ## Next Steps (性能最適化) - Profile Tiny front internals (TLS SLL, FastCache, Superslab backend latency) - Identify bottleneck between current ~2.9M ops/s and mimalloc ~100M ops/s - Consider: - Reduce shared pool lock contention - Optimize unified cache hit rate - Streamline Superslab carving logic 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 18:01:25 +09:00
#include "tiny_route_box.h" // Tiny Front Routing Policy

// Context for future extensions:
//   - size      : requested size
//   - class_idx : size -> class mapping (expected value)
//   - user      : returned USER pointer
//   - base      : result of the USER -> BASE conversion
//   - bridge    : Superslab / slab / meta / class information
typedef struct TinyAllocGateContext {
    size_t            size;
    int               class_idx;
    hak_user_ptr_t    user;
    hak_base_ptr_t    base;
    TinyPtrBridgeInfo bridge;
} TinyAllocGateContext;

// ON/OFF switch for the diagnostic Gatekeeper extension (ENV: HAKMEM_TINY_ALLOC_GATE_DIAG).
// The result is cached per thread; release builds always report "disabled".
static inline int tiny_alloc_gate_diag_enabled(void)
{
    static __thread int s_diag = -1;
    if (__builtin_expect(s_diag == -1, 0)) {
#if !HAKMEM_BUILD_RELEASE
        const char* e = getenv("HAKMEM_TINY_ALLOC_GATE_DIAG");
        s_diag = (e && *e && *e != '0') ? 1 : 0;
#else
        (void)getenv;
        s_diag = 0;
#endif
    }
    return s_diag;
}

// Diagnostics: check whether the USER pointer lives on a Tiny Superslab and belongs
// to the expected class.
// Returns:
//   1: validation passed (the pointer is Tiny-managed and the Bridge lookup succeeded)
//   0: some inconsistency (Bridge lookup failed, meta class out of range, etc.)
// Note: a class mismatch against class_idx is only logged; it does not by itself
// make this function return 0.
static inline int tiny_alloc_gate_validate(TinyAllocGateContext* ctx)
{
    if (!ctx) return 0;

    void* user_raw = HAK_USER_TO_RAW(ctx->user);
    if (!user_raw) return 0;

    // Anything obviously out of range is treated as not Tiny-managed.
    uintptr_t addr = (uintptr_t)user_raw;
    if (addr < 4096 || addr > 0x00007fffffffffffULL) {
        return 0;
    }

    // Bridge: fetch Superslab / Slab / Meta / Class in one lookup.
    TinyPtrBridgeInfo info = tiny_ptr_bridge_classify_raw(user_raw);
    ctx->bridge = info;
    if (!info.ss || !info.meta || info.slab_idx < 0) {
        return 0;
    }

    // Consistency check between the expected class (derived from size) and the meta class.
    uint8_t meta_cls = info.meta_cls;
    if (meta_cls >= TINY_NUM_CLASSES) {
        return 0;
    }
    if (ctx->class_idx >= 0 && (uint8_t)ctx->class_idx != meta_cls) {
        static _Atomic uint32_t g_alloc_gate_cls_mis = 0;
        uint32_t n = atomic_fetch_add_explicit(&g_alloc_gate_cls_mis, 1, memory_order_relaxed);
        if (n < 8) {
            fprintf(stderr,
                    "[TINY_ALLOC_GATE_CLASS_MISMATCH] size=%zu cls_expect=%d meta_cls=%u user=%p ss=%p slab=%d\n",
                    ctx->size,
                    ctx->class_idx,
                    (unsigned)meta_cls,
                    user_raw,
                    (void*)info.ss,
                    info.slab_idx);
            fflush(stderr);
        }
        // A class mismatch does not Fail-Fast; it is only logged
        // (insertion point for a future Guard).
    }

#if !HAKMEM_BUILD_RELEASE
    // Also check that the class read from the header matches the meta class.
    int hdr_cls = tiny_region_id_read_header(user_raw);
    if (hdr_cls >= 0 && hdr_cls != (int)meta_cls) {
        static _Atomic uint32_t g_alloc_gate_hdr_meta_mis = 0;
        uint32_t n = atomic_fetch_add_explicit(&g_alloc_gate_hdr_meta_mis, 1, memory_order_relaxed);
        if (n < 8) {
            fprintf(stderr,
                    "[TINY_ALLOC_GATE_HDR_META_MISMATCH] size=%zu hdr_cls=%d meta_cls=%u user=%p ss=%p slab=%d\n",
                    ctx->size,
                    hdr_cls,
                    (unsigned)meta_cls,
                    user_raw,
                    (void*)info.ss,
                    info.slab_idx);
            fflush(stderr);
        }
    }
#endif

    // Perform the USER -> BASE conversion (via the Box) exactly once, so that a
    // future BASE-based Guard can reuse it.
    ctx->base = ptr_user_to_base(ctx->user, meta_cls);
    return 1;
}

// Tiny Alloc Gatekeeper main entry:
//   - Entry point for Tiny fast alloc, called from the malloc wrapper (hak_wrappers).
//   - Dispatches between the Tiny front and the Pool fallback according to the routing policy,
//     and, only when diagnostics are ON, adds Bridge + Layout checks on the returned USER pointer.
static inline void* tiny_alloc_gate_fast(size_t size)
{
    int class_idx = hak_tiny_size_to_class(size);
    if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
        // Size is outside Tiny's range: leave it to Pool/backend (return NULL to exit the Gate).
        return NULL;
    }

    TinyRoutePolicy route = tiny_route_get(class_idx);

    // Pool-only: skip the Tiny front entirely (from the Gate's view, "Tiny could not serve it").
    if (__builtin_expect(route == ROUTE_POOL_ONLY, 0)) {
        return NULL;
    }

    // Allocate through the Tiny Fast Path first (obtains a USER pointer).
    void* user_ptr = malloc_tiny_fast(size);

    // Tiny-only: return the result as-is (the caller handles NULL).
    if (__builtin_expect(route == ROUTE_TINY_ONLY, 1)) {
#if !HAKMEM_BUILD_RELEASE
        // Layer 3a (alloc side): catch obviously invalid pointers early in debug builds.
        if (user_ptr) {
            uintptr_t addr = (uintptr_t)user_ptr;
            if (__builtin_expect(addr < 4096, 0)) {
                fprintf(stderr,
                        "[TINY_ALLOC_GATE_RANGE_INVALID] size=%zu user=%p\n",
                        size, user_ptr);
                fflush(stderr);
                abort();
            }
        }
        if (__builtin_expect(tiny_alloc_gate_diag_enabled(), 0) && user_ptr) {
            TinyAllocGateContext ctx;
            ctx.size            = size;
            ctx.user            = HAK_USER_FROM_RAW(user_ptr);
            ctx.class_idx       = class_idx;
            ctx.base            = HAK_BASE_FROM_RAW(NULL);
            ctx.bridge.ss       = NULL;
            ctx.bridge.meta     = NULL;
            ctx.bridge.slab_idx = -1;
            ctx.bridge.meta_cls = 0xffu;
            (void)tiny_alloc_gate_validate(&ctx);
        }
#endif
        return user_ptr;
    }

    // ROUTE_TINY_FIRST: if Tiny cannot serve the request, allow the Pool/backend
    // fallback (a NULL return exits the Gate).
#if !HAKMEM_BUILD_RELEASE
Add lightweight Fail-Fast layer to Gatekeeper Boxes Core Changes: - Modified: core/box/tiny_free_gate_box.h * Added address range check in tiny_free_gate_try_fast() (line 142) * Catches obviously invalid pointers (addr < 4096) * Rejects fast path for garbage pointers, delegates to slow path * Logs [TINY_FREE_GATE_RANGE_INVALID] (debug-only, max 8 messages) * Cost: ~1 cycle (comparison + unlikely branch) * Behavior: Fails safe by delegating to hak_tiny_free() slow path - Modified: core/box/tiny_alloc_gate_box.h * Added range check for malloc_tiny_fast() return value (line 143) * Debug-only: Checks if returned user_ptr has addr < 4096 * On failure: Logs [TINY_ALLOC_GATE_RANGE_INVALID] and calls abort() * Release build: Entire check compiled out (zero overhead) * Rationale: Invalid allocator return is catastrophic - fail immediately Design Rationale: - Early detection of memory corruption/undefined behavior - Conservative threshold (4096) captures NULL and kernel space - Free path: Graceful degradation (delegate to slow path) - Alloc path: Hard fail (allocator corruption is non-recoverable) - Zero performance impact in production (Release) builds - Debug-only diagnostic output prevents log spam Fail-Fast Strategy: - Layer 3a: Address range sanity check (always enabled) * Rejects addr < 4096 (NULL, low memory garbage) * Free: delegates to slow path (safe fallback) * Alloc: aborts (corruption indicator) - Layer 3b: Detailed Bridge/Header validation (ENV-controlled) * Traditional HAKMEM_TINY_FREE_GATE_DIAG / HAKMEM_TINY_ALLOC_GATE_DIAG * For advanced debugging and observability Testing: - Compilation: RELEASE=0 and RELEASE=1 both successful - Smoke tests: 3/3 passed (simple_alloc, loop 10M, pool_tls) - Performance: No regressions detected - Address threshold (4096): Conservative, minimizes false positives - Verified via Task agent (PASS verdict) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 12:36:32 +09:00
    if (user_ptr) {
        uintptr_t addr = (uintptr_t)user_ptr;
        if (__builtin_expect(addr < 4096, 0)) {
            fprintf(stderr,
                    "[TINY_ALLOC_GATE_RANGE_INVALID] size=%zu user=%p\n",
                    size, user_ptr);
            fflush(stderr);
            abort();
        }
        if (__builtin_expect(tiny_alloc_gate_diag_enabled(), 0)) {
            TinyAllocGateContext ctx;
            ctx.size            = size;
            ctx.user            = HAK_USER_FROM_RAW(user_ptr);
            ctx.class_idx       = class_idx;
            ctx.base            = HAK_BASE_FROM_RAW(NULL);
            ctx.bridge.ss       = NULL;
            ctx.bridge.meta     = NULL;
            ctx.bridge.slab_idx = -1;
            ctx.bridge.meta_cls = 0xffu;
            (void)tiny_alloc_gate_validate(&ctx);
        }
    }
#endif
    return user_ptr;
}
#endif // HAKMEM_TINY_ALLOC_GATE_BOX_H
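
The routing contract of tiny_alloc_gate_fast() (a NULL return means "not served by Tiny, let the caller fall back") can be illustrated with a small standalone sketch. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in invented for this illustration, not part of HAKMEM: the size-to-class mapping is an assumed power-of-two scheme, both allocators are stubbed with malloc(), and the route table simply mirrors the "hot" profile described in the routing commit message (C0-C3 Tiny-only, C4-C6 Tiny-first, C7 Pool-only).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real types live in tiny_route_box.h. */
typedef enum {
    DEMO_ROUTE_TINY_ONLY,
    DEMO_ROUTE_TINY_FIRST,
    DEMO_ROUTE_POOL_ONLY
} DemoRoute;

enum { DEMO_NUM_CLASSES = 8 };

/* Route table mirroring the "hot" profile: C0-C3 Tiny-only,
 * C4-C6 Tiny-first, C7 Pool-only. */
static const DemoRoute demo_routes[DEMO_NUM_CLASSES] = {
    DEMO_ROUTE_TINY_ONLY,  DEMO_ROUTE_TINY_ONLY,
    DEMO_ROUTE_TINY_ONLY,  DEMO_ROUTE_TINY_ONLY,
    DEMO_ROUTE_TINY_FIRST, DEMO_ROUTE_TINY_FIRST,
    DEMO_ROUTE_TINY_FIRST, DEMO_ROUTE_POOL_ONLY
};

/* Assumed mapping: class 0 holds up to 8 bytes, each further class
 * doubles the capacity, so class 7 holds up to 1024 bytes.
 * Returns -1 for sizes outside the Tiny range. */
static int demo_size_to_class(size_t size)
{
    if (size == 0 || size > 1024) return -1;
    int cls = 0;
    size_t cap = 8;
    while (cap < size) { cap <<= 1; cls++; }
    return cls;
}

static DemoRoute demo_route_get(int cls)
{
    assert(cls >= 0 && cls < DEMO_NUM_CLASSES);
    return demo_routes[cls];
}

/* Both back ends are stubbed with malloc() for the sketch. */
static void* demo_tiny_alloc(size_t size) { return malloc(size); }
static void* demo_pool_alloc(size_t size) { return malloc(size); }

/* Gate: NULL means "Tiny did not serve this request". */
static void* demo_gate_fast(size_t size)
{
    int cls = demo_size_to_class(size);
    if (cls < 0) return NULL;                                     /* outside Tiny range */
    if (demo_route_get(cls) == DEMO_ROUTE_POOL_ONLY) return NULL; /* skip Tiny front    */
    return demo_tiny_alloc(size);
}

/* Wrapper: try the gate first, then fall back to the pool stub. */
static void* demo_malloc(size_t size)
{
    void* p = demo_gate_fast(size);
    return p ? p : demo_pool_alloc(size);
}
```

A caller would use demo_malloc(16) and receive a pointer from the Tiny stub, whereas demo_malloc(1024) passes through the gate (class 7 is Pool-only in this table) and is served by the pool stub. Keeping the fallback in the wrapper, rather than in the gate, matches the header's stated boundary: the gate only decides whether Tiny serves the request.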