Files
hakmem/core/hakmem_tiny_refill.inc.h

512 lines
19 KiB
C
Raw Normal View History

// hakmem_tiny_refill.inc.h
// Phase 12: Minimal refill helpers needed by Box fast path.
//
// 本ヘッダは、以下を提供する:
// - superslab_tls_bump_fast: TinyTLSSlab + SuperSlab メタからのTLSバンプ窓
// - tiny_fast_refill_and_take: FastCache/TLS SLL からの最小 refill + 1個取得
// - bulk_mag_to_sll_if_room: Magazine→SLL へのバルク移送(容量チェック付き)
// - sll_refill_small_from_ss: Phase12 shared SuperSlab pool 向けの最小実装
//
// 旧来の g_sll_cap_override / getenv ベースの多経路ロジックは一切含めない。
#ifndef HAKMEM_TINY_REFILL_INC_H
#define HAKMEM_TINY_REFILL_INC_H
#include "hakmem_tiny.h"
#include "hakmem_tiny_superslab.h"
#include "hakmem_tiny_tls_list.h"
#include "tiny_box_geometry.h"
#include "superslab/superslab_inline.h" // Provides hak_super_lookup() and SUPERSLAB_MAGIC
#include "box/tls_sll_box.h"
#include "box/c7_meta_used_counter_box.h"
Cleanup: Fix 2 additional Class 0/7 header bugs (correctness fix) Task Agent Investigation: - Found 2 more instances of hardcoded `class_idx != 7` checks - These are real bugs (C0 also uses offset=0, not just C7) - However, NOT the root cause of 12% crash rate Bug Fixes (2 locations): 1. tls_sll_drain_box.h:190 - Path: TLS SLL drain → tiny_free_local_box() - Fix: Use tiny_header_write_for_alloc() (ALL classes) - Reason: tiny_free_local_box() reads header for class_idx 2. hakmem_tiny_refill.inc.h:384 - Path: SuperSlab refill → TLS SLL push - Fix: Use tiny_header_write_if_preserved() (C1-C6 only) - Reason: TLS SLL push needs header for validation Test Results: - Before: 12% crash rate (88/100 runs successful) - After: 12% crash rate (44/50 runs successful) - Conclusion: Correctness fix, but not primary crash cause Analysis: - Bugs are real (incorrect Class 0 handling) - Fixes don't reduce crash rate → different root cause exists - Heisenbug characteristics (disappears under gdb) - Likely: Race condition, uninitialized memory, or use-after-free Remaining Work: - 12% crash rate persists (requires different investigation) - Next: Focus on TLS initialization, race conditions, allocation paths Design Note: - tls_sll_drain_box.h uses tiny_header_write_for_alloc() because tiny_free_local_box() needs header to read class_idx - hakmem_tiny_refill.inc.h uses tiny_header_write_if_preserved() because TLS SLL push validates header (C1-C6 only) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 08:12:08 +09:00
#include "box/tiny_header_box.h" // Header Box: Single Source of Truth for header operations
#include "box/tiny_front_config_box.h" // Phase 7-Step6-Fix: Config macros for dead code elimination
#include "box/tiny_heap_env_box.h" // TinyHeap front gate (C7 TinyHeapBox)
#include "hakmem_tiny_integrity.h"
#include "box/tiny_next_ptr_box.h"
Front-Direct implementation: SS→FC direct refill + SLL complete bypass ## Summary Implemented Front-Direct architecture with complete SLL bypass: - Direct SuperSlab → FastCache refill (1-hop, bypasses SLL) - SLL-free allocation/free paths when Front-Direct enabled - Legacy path sealing (SLL inline opt-in, SFC cascade ENV-only) ## New Modules - core/refill/ss_refill_fc.h (236 lines): Standard SS→FC refill entry point - Remote drain → Freelist → Carve priority - Header restoration for C1-C6 (NOT C0/C7) - ENV: HAKMEM_TINY_P0_DRAIN_THRESH, HAKMEM_TINY_P0_NO_DRAIN - core/front/fast_cache.h: FastCache (L1) type definition - core/front/quick_slot.h: QuickSlot (L0) type definition ## Allocation Path (core/tiny_alloc_fast.inc.h) - Added s_front_direct_alloc TLS flag (lazy ENV check) - SLL pop guarded by: g_tls_sll_enable && !s_front_direct_alloc - Refill dispatch: - Front-Direct: ss_refill_fc_fill() → fastcache_pop() (1-hop) - Legacy: sll_refill_batch_from_ss() → SLL → FC (2-hop, A/B only) - SLL inline pop sealed (requires HAKMEM_TINY_INLINE_SLL=1 opt-in) ## Free Path (core/hakmem_tiny_free.inc, core/hakmem_tiny_fastcache.inc.h) - FC priority: Try fastcache_push() first (same-thread free) - tiny_fast_push() bypass: Returns 0 when s_front_direct_free || !g_tls_sll_enable - Fallback: Magazine/slow path (safe, bypasses SLL) ## Legacy Sealing - SFC cascade: Default OFF (ENV-only via HAKMEM_TINY_SFC_CASCADE=1) - Deleted: core/hakmem_tiny_free.inc.bak, core/pool_refill_legacy.c.bak - Documentation: ss_refill_fc_fill() promoted as CANONICAL refill entry ## ENV Controls - HAKMEM_TINY_FRONT_DIRECT=1: Enable Front-Direct (SS→FC direct) - HAKMEM_TINY_P0_DIRECT_FC_ALL=1: Same as above (alt name) - HAKMEM_TINY_REFILL_BATCH=1: Enable batch refill (also enables Front-Direct) - HAKMEM_TINY_SFC_CASCADE=1: Enable SFC cascade (default OFF) - HAKMEM_TINY_INLINE_SLL=1: Enable inline SLL pop (default OFF, requires AGGRESSIVE_INLINE) ## Benchmarks (Front-Direct Enabled) ```bash ENV: HAKMEM_BENCH_FAST_FRONT=1 HAKMEM_TINY_FRONT_DIRECT=1 HAKMEM_TINY_REFILL_BATCH=1 HAKMEM_TINY_P0_DIRECT_FC_ALL=1 HAKMEM_TINY_REFILL_COUNT_HOT=256 HAKMEM_TINY_REFILL_COUNT_MID=96 HAKMEM_TINY_BUMP_CHUNK=256 bench_random_mixed (16-1040B random, 200K iter): 256 slots: 1.44M ops/s (STABLE, 0 SEGV) 128 slots: 1.44M ops/s (STABLE, 0 SEGV) bench_fixed_size (fixed size, 200K iter): 256B: 4.06M ops/s (has debug logs, expected >10M without logs) 128B: Similar (debug logs affect) ``` ## Verification - TRACE_RING test (10K iter): **0 SLL events** detected ✅ - Complete SLL bypass confirmed when Front-Direct=1 - Stable execution: 200K iterations × multiple sizes, 0 SEGV ## Next Steps - Disable debug logs in hak_alloc_api.inc.h (call_num 14250-14280 range) - Re-benchmark with clean Release build (target: 10-15M ops/s) - 128/256B shortcut path optimization (FC hit rate improvement) Co-Authored-By: ChatGPT <chatgpt@openai.com> Suggested-By: ultrathink
2025-11-14 05:41:49 +09:00
#include "tiny_region_id.h" // For HEADER_MAGIC/HEADER_CLASS_MASK (prepare header before SLL push)
#include <stdint.h>
#include <stdatomic.h>
#include <stdio.h> // For fprintf diagnostics
// ========= Externs from hakmem_tiny.c and friends =========
extern int g_use_superslab;
extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES];
extern int g_fastcache_enable;
extern uint16_t g_fast_cap[TINY_NUM_CLASSES];
extern __thread TinyFastCache g_fast_cache[TINY_NUM_CLASSES];
Phase 7-Step7: Replace g_tls_sll_enable with TINY_FRONT_TLS_SLL_ENABLED macro **Goal**: Enable dead code elimination for TLS SLL checks in PGO mode **Changes**: 1. core/box/tiny_front_config_box.h: - Add TINY_FRONT_TLS_SLL_ENABLED macro (PGO: 1, Normal: tiny_tls_sll_enabled()) - Add tiny_tls_sll_enabled() wrapper function (static inline) 2. core/tiny_alloc_fast.inc.h (5 hot path locations): - Line 220: tiny_heap_v2_refill_mag() - early return check - Line 388: SLIM mode - SLL freelist check - Line 459: tiny_alloc_fast_pop() - Layer 1 SLL check - Line 774: Main alloc path - cached sll_enabled check (most critical!) - Line 815: Generic front - SLL toggle respect 3. core/hakmem_tiny_refill.inc.h (2 locations): - Line 186: bulk_mag_refill_fc() - refill from SLL - Line 213: bulk_mag_to_sll_if_room() - push to SLL **Performance**: 79.9M ops/s (maintained, +0.1M vs Step 6) - Normal mode: Same performance (runtime checks preserved) - PGO mode: Dead code elimination ready (if (!1) → removed by compiler) **Expected PGO benefit**: - Eliminate 7 TLS SLL checks across hot paths - Reduce instruction count in main alloc loop - Better branch prediction (no runtime checks) **Design**: Config Box as single entry point - All TLS SLL checks now use TINY_FRONT_TLS_SLL_ENABLED - Consistent pattern with FASTCACHE/SFC/HEAP_V2 macros - Include order independent (wrapper in config box header) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:35:51 +09:00
// Phase 7-Step7: g_tls_sll_enable now accessed via TINY_FRONT_TLS_SLL_ENABLED macro
extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];
extern _Atomic uint32_t g_frontend_fill_target[TINY_NUM_CLASSES];
extern int g_ultra_bump_shadow;
extern int g_bump_chunk;
extern __thread uint8_t* g_tls_bcur[TINY_NUM_CLASSES];
extern __thread uint8_t* g_tls_bend[TINY_NUM_CLASSES];
#if HAKMEM_DEBUG_COUNTERS
extern uint64_t g_bump_hits[TINY_NUM_CLASSES];
extern uint64_t g_bump_arms[TINY_NUM_CLASSES];
extern uint64_t g_path_refill_calls[TINY_NUM_CLASSES];
extern uint64_t g_ultra_refill_calls[TINY_NUM_CLASSES];
extern int g_path_debug_enabled;
#endif
// ========= From other units =========
Box API Phase 1-3: Capacity Manager, Carve-Push, Prewarm 実装 Priority 1-3のBox Modulesを実装し、安全なpre-warming APIを提供。 既存の複雑なprewarmコードを1行のBox API呼び出しに置き換え。 ## 新規Box Modules 1. **Box Capacity Manager** (capacity_box.h/c) - TLS SLL容量の一元管理 - adaptive_sizing初期化保証 - Double-free バグ防止 2. **Box Carve-And-Push** (carve_push_box.h/c) - アトミックなblock carve + TLS SLL push - All-or-nothing semantics - Rollback保証(partial failure防止) 3. **Box Prewarm** (prewarm_box.h/c) - 安全なTLS cache pre-warming - 初期化依存性を隠蔽 - シンプルなAPI (1関数呼び出し) ## コード簡略化 hakmem_tiny_init.inc: 20行 → 1行 ```c // BEFORE: 複雑なP0分岐とエラー処理 adaptive_sizing_init(); if (prewarm > 0) { #if HAKMEM_TINY_P0_BATCH_REFILL int taken = sll_refill_batch_from_ss(5, prewarm); #else int taken = sll_refill_small_from_ss(5, prewarm); #endif } // AFTER: Box API 1行 int taken = box_prewarm_tls(5, prewarm); ``` ## シンボルExport修正 hakmem_tiny.c: 5つのシンボルをstatic → non-static - g_tls_slabs[] (TLS slab配列) - g_sll_multiplier (SLL容量乗数) - g_sll_cap_override[] (容量オーバーライド) - superslab_refill() (SuperSlab再充填) - ss_active_add() (アクティブカウンタ) ## ビルドシステム Makefile: TINY_BENCH_OBJS_BASEに3つのBox modules追加 - core/box/capacity_box.o - core/box/carve_push_box.o - core/box/prewarm_box.o ## 動作確認 ✅ Debug build成功 ✅ Box Prewarm API動作確認 [PREWARM] class=5 requested=128 taken=32 ## 次のステップ - Box Refill Manager (Priority 4) - Box SuperSlab Allocator (Priority 5) - Release build修正(tiny_debug_ring_record) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 01:45:30 +09:00
SuperSlab* superslab_refill(int class_idx);
void ss_active_inc(SuperSlab* ss);
Box API Phase 1-3: Capacity Manager, Carve-Push, Prewarm 実装 Priority 1-3のBox Modulesを実装し、安全なpre-warming APIを提供。 既存の複雑なprewarmコードを1行のBox API呼び出しに置き換え。 ## 新規Box Modules 1. **Box Capacity Manager** (capacity_box.h/c) - TLS SLL容量の一元管理 - adaptive_sizing初期化保証 - Double-free バグ防止 2. **Box Carve-And-Push** (carve_push_box.h/c) - アトミックなblock carve + TLS SLL push - All-or-nothing semantics - Rollback保証(partial failure防止) 3. **Box Prewarm** (prewarm_box.h/c) - 安全なTLS cache pre-warming - 初期化依存性を隠蔽 - シンプルなAPI (1関数呼び出し) ## コード簡略化 hakmem_tiny_init.inc: 20行 → 1行 ```c // BEFORE: 複雑なP0分岐とエラー処理 adaptive_sizing_init(); if (prewarm > 0) { #if HAKMEM_TINY_P0_BATCH_REFILL int taken = sll_refill_batch_from_ss(5, prewarm); #else int taken = sll_refill_small_from_ss(5, prewarm); #endif } // AFTER: Box API 1行 int taken = box_prewarm_tls(5, prewarm); ``` ## シンボルExport修正 hakmem_tiny.c: 5つのシンボルをstatic → non-static - g_tls_slabs[] (TLS slab配列) - g_sll_multiplier (SLL容量乗数) - g_sll_cap_override[] (容量オーバーライド) - superslab_refill() (SuperSlab再充填) - ss_active_add() (アクティブカウンタ) ## ビルドシステム Makefile: TINY_BENCH_OBJS_BASEに3つのBox modules追加 - core/box/capacity_box.o - core/box/carve_push_box.o - core/box/prewarm_box.o ## 動作確認 ✅ Debug build成功 ✅ Box Prewarm API動作確認 [PREWARM] class=5 requested=128 taken=32 ## 次のステップ - Box Refill Manager (Priority 4) - Box SuperSlab Allocator (Priority 5) - Release build修正(tiny_debug_ring_record) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 01:45:30 +09:00
void ss_active_add(SuperSlab* ss, uint32_t n);
size_t tiny_stride_for_class(int class_idx);
uint8_t* tiny_slab_base_for_geometry(SuperSlab* ss, int slab_idx);
extern uint32_t sll_cap_for_class(int class_idx, uint32_t mag_cap);
/* ultra_* 系は hakmem_tiny.c 側に定義があるため、ここでは宣言しない */
/* tls_sll_push は box/tls_sll_box.h で static inline bool tls_sll_push(...) 提供済み */
/* tiny_small_mags_init_once / tiny_mag_init_if_needed も hakmem_tiny_magazine.h で宣言済みなので、ここでは再宣言しない */
/* tiny_fast_pop / tiny_fast_push / fastcache_* は hakmem_tiny_fastcache.inc.h 側の static inline なので、ここでは未宣言でOK */
#if !HAKMEM_BUILD_RELEASE
static inline void tiny_debug_validate_node_base(int class_idx, void* node, const char* where)
{
(void)class_idx;
(void)where;
// 最低限の防御: 異常に小さいアドレスを弾く
if ((uintptr_t)node < 4096) {
fprintf(stderr,
"[TINY_REFILL_GUARD] %s: suspicious node=%p cls=%d\n",
where, node, class_idx);
abort();
}
}
#else
static inline void tiny_debug_validate_node_base(int class_idx, void* node, const char* where)
{
(void)class_idx;
(void)node;
(void)where;
}
#endif
static inline void c7_log_used_assign_cap(TinySlabMeta* meta,
int class_idx,
const char* tag) {
if (__builtin_expect(class_idx != 7, 1)) {
return;
}
#if HAKMEM_BUILD_RELEASE
static _Atomic uint32_t rel_logs = 0;
uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
if (n < 4) {
fprintf(stderr,
"[REL_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
tag,
(unsigned)meta->used,
(unsigned)meta->capacity,
(unsigned)meta->carved,
meta->freelist);
}
#else
static _Atomic uint32_t dbg_logs = 0;
uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
if (n < 4) {
fprintf(stderr,
"[DBG_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
tag,
(unsigned)meta->used,
(unsigned)meta->capacity,
(unsigned)meta->carved,
meta->freelist);
}
#endif
}
// ========= superslab_tls_bump_fast =========
//
// Ultra bump shadow: current slabが freelist 空で carved<capacity のとき、
// 連続領域を TLS window としてまとめ予約する。
// tiny_hot_pop_class{0..3} から呼ばれる。
static inline void* superslab_tls_bump_fast(int class_idx) {
if (!g_ultra_bump_shadow || !g_use_superslab) return NULL;
uint8_t* cur = g_tls_bcur[class_idx];
if (cur) {
uint8_t* end = g_tls_bend[class_idx];
size_t stride = tiny_stride_for_class(class_idx);
if (cur + stride <= end) {
g_tls_bcur[class_idx] = cur + stride;
#if HAKMEM_DEBUG_COUNTERS
g_bump_hits[class_idx]++;
#endif
#if HAKMEM_TINY_HEADER_CLASSIDX
// Headerは呼び出し元で書く or strideに含め済み想定。ここでは生ポインタ返す。
#endif
return cur;
}
g_tls_bcur[class_idx] = NULL;
g_tls_bend[class_idx] = NULL;
}
TinyTLSSlab* tls = &g_tls_slabs[class_idx];
TinySlabMeta* meta = tls->meta;
if (!tls->ss || !meta || meta->freelist) return NULL;
uint16_t carved = meta->carved;
uint16_t cap = meta->capacity;
if (carved >= cap) return NULL;
uint32_t avail = (uint32_t)cap - (uint32_t)carved;
uint32_t chunk = (g_bump_chunk > 0) ? (uint32_t)g_bump_chunk : 1u;
if (chunk > avail) chunk = avail;
size_t stride = tiny_stride_for_class(class_idx);
uint8_t* base = tls->slab_base
? tls->slab_base
: tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
uint8_t* start = base + (size_t)carved * stride;
meta->carved = (uint16_t)(carved + (uint16_t)chunk);
meta->used = (uint16_t)(meta->used + (uint16_t)chunk);
if (class_idx == 7) {
for (uint32_t i = 0; i < chunk; ++i) {
c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
}
}
ss_active_add(tls->ss, chunk);
#if HAKMEM_DEBUG_COUNTERS
g_bump_arms[class_idx]++;
#endif
// 1個目を即返し、残りをTLS windowとして保持
g_tls_bcur[class_idx] = start + stride;
g_tls_bend[class_idx] = start + (size_t)chunk * stride;
return start;
}
// ========= tiny_fast_refill_and_take =========
//
// FCが空の時に、TLS list/superslab からバッチ取得して一つ返す。
// 旧来の複雑な経路を削り、FC/SLLのみの最小ロジックにする。
static inline void* tiny_fast_refill_and_take(int class_idx, TinyTLSList* tls) {
(void)tls;
// 1) Front FastCache から直接
// Phase 7-Step6-Fix: Use config macro for dead code elimination in PGO mode
if (__builtin_expect(TINY_FRONT_FASTCACHE_ENABLED && class_idx <= 3, 1)) {
Implement Phantom typing for Tiny FastCache layer Refactor FastCache and TLS cache APIs to use Phantom types (hak_base_ptr_t) for compile-time type safety, preventing BASE/USER pointer confusion. Changes: 1. core/hakmem_tiny_fastcache.inc.h: - fastcache_pop() returns hak_base_ptr_t instead of void* - fastcache_push() accepts hak_base_ptr_t instead of void* 2. core/hakmem_tiny.c: - Updated forward declarations to match new signatures 3. core/tiny_alloc_fast.inc.h, core/hakmem_tiny_alloc.inc: - Alloc paths now use hak_base_ptr_t for cache operations - BASE->USER conversion via HAK_RET_ALLOC macro 4. core/hakmem_tiny_refill.inc.h, core/refill/ss_refill_fc.h: - Refill paths properly handle BASE pointer types - Fixed: Removed unnecessary HAK_BASE_FROM_RAW() in ss_refill_fc.h line 176 5. core/hakmem_tiny_free.inc, core/tiny_free_magazine.inc.h: - Free paths convert USER->BASE before cache push - USER->BASE conversion via HAK_USER_TO_BASE or ptr_user_to_base() 6. core/hakmem_tiny_legacy_slow_box.inc: - Legacy path properly wraps pointers for cache API Benefits: - Type safety at compile time (in debug builds) - Zero runtime overhead (debug builds only, release builds use typedef=void*) - All BASE->USER conversions verified via Task analysis - Prevents pointer type confusion bugs Testing: - Build: SUCCESS (all 9 files) - Smoke test: PASS (sh8bench runs to completion) - Conversion path verification: 3/3 paths correct 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 11:05:06 +09:00
hak_base_ptr_t fc = fastcache_pop(class_idx);
if (!hak_base_is_null(fc)) {
extern unsigned long long g_front_fc_hit[TINY_NUM_CLASSES];
g_front_fc_hit[class_idx]++;
Implement Phantom typing for Tiny FastCache layer Refactor FastCache and TLS cache APIs to use Phantom types (hak_base_ptr_t) for compile-time type safety, preventing BASE/USER pointer confusion. Changes: 1. core/hakmem_tiny_fastcache.inc.h: - fastcache_pop() returns hak_base_ptr_t instead of void* - fastcache_push() accepts hak_base_ptr_t instead of void* 2. core/hakmem_tiny.c: - Updated forward declarations to match new signatures 3. core/tiny_alloc_fast.inc.h, core/hakmem_tiny_alloc.inc: - Alloc paths now use hak_base_ptr_t for cache operations - BASE->USER conversion via HAK_RET_ALLOC macro 4. core/hakmem_tiny_refill.inc.h, core/refill/ss_refill_fc.h: - Refill paths properly handle BASE pointer types - Fixed: Removed unnecessary HAK_BASE_FROM_RAW() in ss_refill_fc.h line 176 5. core/hakmem_tiny_free.inc, core/tiny_free_magazine.inc.h: - Free paths convert USER->BASE before cache push - USER->BASE conversion via HAK_USER_TO_BASE or ptr_user_to_base() 6. core/hakmem_tiny_legacy_slow_box.inc: - Legacy path properly wraps pointers for cache API Benefits: - Type safety at compile time (in debug builds) - Zero runtime overhead (debug builds only, release builds use typedef=void*) - All BASE->USER conversions verified via Task analysis - Prevents pointer type confusion bugs Testing: - Build: SUCCESS (all 9 files) - Smoke test: PASS (sh8bench runs to completion) - Conversion path verification: 3/3 paths correct 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 11:05:06 +09:00
return HAK_BASE_TO_RAW(fc);
}
}
// 2) ローカルfast list
{
Implement Phantom typing for Tiny FastCache layer Refactor FastCache and TLS cache APIs to use Phantom types (hak_base_ptr_t) for compile-time type safety, preventing BASE/USER pointer confusion. Changes: 1. core/hakmem_tiny_fastcache.inc.h: - fastcache_pop() returns hak_base_ptr_t instead of void* - fastcache_push() accepts hak_base_ptr_t instead of void* 2. core/hakmem_tiny.c: - Updated forward declarations to match new signatures 3. core/tiny_alloc_fast.inc.h, core/hakmem_tiny_alloc.inc: - Alloc paths now use hak_base_ptr_t for cache operations - BASE->USER conversion via HAK_RET_ALLOC macro 4. core/hakmem_tiny_refill.inc.h, core/refill/ss_refill_fc.h: - Refill paths properly handle BASE pointer types - Fixed: Removed unnecessary HAK_BASE_FROM_RAW() in ss_refill_fc.h line 176 5. core/hakmem_tiny_free.inc, core/tiny_free_magazine.inc.h: - Free paths convert USER->BASE before cache push - USER->BASE conversion via HAK_USER_TO_BASE or ptr_user_to_base() 6. core/hakmem_tiny_legacy_slow_box.inc: - Legacy path properly wraps pointers for cache API Benefits: - Type safety at compile time (in debug builds) - Zero runtime overhead (debug builds only, release builds use typedef=void*) - All BASE->USER conversions verified via Task analysis - Prevents pointer type confusion bugs Testing: - Build: SUCCESS (all 9 files) - Smoke test: PASS (sh8bench runs to completion) - Conversion path verification: 3/3 paths correct 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 11:05:06 +09:00
hak_base_ptr_t p = tiny_fast_pop(class_idx);
if (!hak_base_is_null(p)) return HAK_BASE_TO_RAW(p);
Add Box 3 (Pointer Conversion Layer) and fix POOL_TLS_PHASE1 default ## Major Changes ### 1. Box 3: Pointer Conversion Module (NEW) - File: core/box/ptr_conversion_box.h - Purpose: Unified BASE ↔ USER pointer conversion (single source of truth) - API: PTR_BASE_TO_USER(), PTR_USER_TO_BASE() - Features: Zero-overhead inline, debug mode, NULL-safe, class 7 headerless support - Design: Header-only, fully modular, no external dependencies ### 2. POOL_TLS_PHASE1 Default OFF (CRITICAL FIX) - File: build.sh - Change: POOL_TLS_PHASE1 now defaults to 0 (was hardcoded to 1) - Impact: Eliminates pthread_mutex overhead on every free() (was causing 3.3x slowdown) - Usage: Set POOL_TLS_PHASE1=1 env var to enable if needed ### 3. Pointer Conversion Fixes (PARTIAL) - Files: core/box/front_gate_box.c, core/tiny_alloc_fast.inc.h, etc. - Status: Partial implementation using Box 3 API - Note: Work in progress, some conversions still need review ### 4. Performance Investigation Report (NEW) - File: HOTPATH_PERFORMANCE_INVESTIGATION.md - Findings: - Hotpath works (+24% vs baseline) after POOL_TLS fix - Still 9.2x slower than system malloc due to: * Heavy initialization (23.85% of cycles) * Syscall overhead (2,382 syscalls per 100K ops) * Workload mismatch (C7 1KB is 49.8%, but only C5 256B has hotpath) * 9.4x more instructions than system malloc ### 5. Known Issues - SEGV at 20K-30K iterations (pre-existing bug, not related to pointer conversions) - Root cause: Likely active counter corruption or TLS-SLL chain issues - Status: Under investigation ## Performance Results (100K iterations, 256B) - Baseline (Hotpath OFF): 7.22M ops/s - Hotpath ON: 8.98M ops/s (+24% improvement ✓) - System malloc: 82.2M ops/s (still 9.2x faster) ## Next Steps - P0: Fix 20K-30K SEGV bug (GDB investigation needed) - P1: Lazy initialization (+20-25% expected) - P1: C7 (1KB) hotpath (+30-40% expected, biggest win) - P2: Reduce syscalls (+15-20% expected) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 01:01:23 +09:00
}
uint16_t cap = g_fast_cap[class_idx];
if (cap == 0) return NULL;
TinyFastCache* fc = &g_fast_cache[class_idx];
int room = (int)cap - fc->top;
if (room <= 0) return NULL;
// 3) TLS SLL から詰め替え
int filled = 0;
Phase 7-Step7: Replace g_tls_sll_enable with TINY_FRONT_TLS_SLL_ENABLED macro **Goal**: Enable dead code elimination for TLS SLL checks in PGO mode **Changes**: 1. core/box/tiny_front_config_box.h: - Add TINY_FRONT_TLS_SLL_ENABLED macro (PGO: 1, Normal: tiny_tls_sll_enabled()) - Add tiny_tls_sll_enabled() wrapper function (static inline) 2. core/tiny_alloc_fast.inc.h (5 hot path locations): - Line 220: tiny_heap_v2_refill_mag() - early return check - Line 388: SLIM mode - SLL freelist check - Line 459: tiny_alloc_fast_pop() - Layer 1 SLL check - Line 774: Main alloc path - cached sll_enabled check (most critical!) - Line 815: Generic front - SLL toggle respect 3. core/hakmem_tiny_refill.inc.h (2 locations): - Line 186: bulk_mag_refill_fc() - refill from SLL - Line 213: bulk_mag_to_sll_if_room() - push to SLL **Performance**: 79.9M ops/s (maintained, +0.1M vs Step 6) - Normal mode: Same performance (runtime checks preserved) - PGO mode: Dead code elimination ready (if (!1) → removed by compiler) **Expected PGO benefit**: - Eliminate 7 TLS SLL checks across hot paths - Reduce instruction count in main alloc loop - Better branch prediction (no runtime checks) **Design**: Config Box as single entry point - All TLS SLL checks now use TINY_FRONT_TLS_SLL_ENABLED - Consistent pattern with FASTCACHE/SFC/HEAP_V2 macros - Include order independent (wrapper in config box header) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:35:51 +09:00
// Phase 7-Step7: Use config macro for dead code elimination in PGO mode
while (room > 0 && TINY_FRONT_TLS_SLL_ENABLED) {
void* h = NULL;
if (!tls_sll_pop(class_idx, &h)) break;
tiny_debug_validate_node_base(class_idx, h, "tiny_fast_refill_and_take");
fc->items[fc->top++] = h;
room--;
filled++;
}
if (filled == 0) {
// 4) Superslab bump (optional)
void* bump = superslab_tls_bump_fast(class_idx);
if (bump) return bump;
return NULL;
}
// 5) 1個返す
return fc->items[--fc->top];
}
// ========= bulk_mag_to_sll_if_room =========
//
// Magazine → SLL への安全な流し込み。
// tiny_free_magazine.inc.h から参照される。
static inline int bulk_mag_to_sll_if_room(int class_idx, TinyTLSMag* mag, int n) {
Phase 7-Step7: Replace g_tls_sll_enable with TINY_FRONT_TLS_SLL_ENABLED macro **Goal**: Enable dead code elimination for TLS SLL checks in PGO mode **Changes**: 1. core/box/tiny_front_config_box.h: - Add TINY_FRONT_TLS_SLL_ENABLED macro (PGO: 1, Normal: tiny_tls_sll_enabled()) - Add tiny_tls_sll_enabled() wrapper function (static inline) 2. core/tiny_alloc_fast.inc.h (5 hot path locations): - Line 220: tiny_heap_v2_refill_mag() - early return check - Line 388: SLIM mode - SLL freelist check - Line 459: tiny_alloc_fast_pop() - Layer 1 SLL check - Line 774: Main alloc path - cached sll_enabled check (most critical!) - Line 815: Generic front - SLL toggle respect 3. core/hakmem_tiny_refill.inc.h (2 locations): - Line 186: bulk_mag_refill_fc() - refill from SLL - Line 213: bulk_mag_to_sll_if_room() - push to SLL **Performance**: 79.9M ops/s (maintained, +0.1M vs Step 6) - Normal mode: Same performance (runtime checks preserved) - PGO mode: Dead code elimination ready (if (!1) → removed by compiler) **Expected PGO benefit**: - Eliminate 7 TLS SLL checks across hot paths - Reduce instruction count in main alloc loop - Better branch prediction (no runtime checks) **Design**: Config Box as single entry point - All TLS SLL checks now use TINY_FRONT_TLS_SLL_ENABLED - Consistent pattern with FASTCACHE/SFC/HEAP_V2 macros - Include order independent (wrapper in config box header) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 17:35:51 +09:00
// Phase 7-Step7: Use config macro for dead code elimination in PGO mode
if (!TINY_FRONT_TLS_SLL_ENABLED || n <= 0) return 0;
uint32_t cap = sll_cap_for_class(class_idx, (uint32_t)mag->cap);
uint32_t have = g_tls_sll[class_idx].count;
if (have >= cap) return 0;
int room = (int)(cap - have);
int take = n < room ? n : room;
if (take <= 0) return 0;
if (take > mag->top) take = mag->top;
if (take <= 0) return 0;
int pushed = 0;
for (int i = 0; i < take; i++) {
void* p = mag->items[--mag->top].ptr;
hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);
if (!tls_sll_push(class_idx, base_p, cap)) {
mag->top++; // rollback last
break;
}
pushed++;
}
#if HAKMEM_DEBUG_COUNTERS
if (pushed > 0) g_path_refill_calls[class_idx]++;
#endif
return pushed;
}
/*
* ========= Minimal Phase 12 sll_refill_small_from_ss =========
*
* Box化方針:
* - tiny_fast_refill :
* - TLS SLL: tls_sll_box.h API 使
* - Superslab: SLL Box
* - :
* - (Stage A/B) TLS Superslab/TinySlabMeta
* - (Stage C) shared_pool_acquire_slab()
* Superslab
*
* :
* - Tiny classes (0 <= class_idx < TINY_NUM_CLASSES)
* - max_take SLL
* - SLL 0
* - head/count/meta Box API (tls_sll_box)
*/
__attribute__((noinline))
int sll_refill_small_from_ss(int class_idx, int max_take)
{
// Hard defensive gate: Tiny classes only, never trust caller.
if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
return 0;
}
// TinyHeap front で扱うクラスは TLS SLL を使わないTinyHeapBox 内で完結)。
if (tiny_heap_class_route_enabled(class_idx)) {
return 0;
}
Add Box I (Integrity), Box E (Expansion), and comprehensive P0 debugging infrastructure ## Major Additions ### 1. Box I: Integrity Verification System (NEW - 703 lines) - Files: core/box/integrity_box.h (267 lines), core/box/integrity_box.c (436 lines) - Purpose: Unified integrity checking across all HAKMEM subsystems - Features: * 4-level integrity checking (0-4, compile-time controlled) * Priority 1: TLS array bounds validation * Priority 2: Freelist pointer validation * Priority 3: TLS canary monitoring * Priority ALPHA: Slab metadata invariant checking (5 invariants) * Atomic statistics tracking (thread-safe) * Beautiful BOX_BOUNDARY design pattern ### 2. Box E: SuperSlab Expansion System (COMPLETE) - Files: core/box/superslab_expansion_box.h, core/box/superslab_expansion_box.c - Purpose: Safe SuperSlab expansion with TLS state guarantee - Features: * Immediate slab 0 binding after expansion * TLS state snapshot and restoration * Design by Contract (pre/post-conditions, invariants) * Thread-safe with mutex protection ### 3. Comprehensive Integrity Checking System - File: core/hakmem_tiny_integrity.h (NEW) - Unified validation functions for all allocator subsystems - Uninitialized memory pattern detection (0xa2, 0xcc, 0xdd, 0xfe) - Pointer range validation (null-page, kernel-space) ### 4. P0 Bug Investigation - Root Cause Identified **Bug**: SEGV at iteration 28440 (deterministic with seed 42) **Pattern**: 0xa2a2a2a2a2a2a2a2 (uninitialized/ASan poisoning) **Location**: TLS SLL (Single-Linked List) cache layer **Root Cause**: Race condition or use-after-free in TLS list management (class 0) **Detection**: Box I successfully caught invalid pointer at exact crash point ### 5. Defensive Improvements - Defensive memset in SuperSlab allocation (all metadata arrays) - Enhanced pointer validation with pattern detection - BOX_BOUNDARY markers throughout codebase (beautiful modular design) - 5 metadata invariant checks in allocation/free/refill paths ## Integration Points - Modified 13 files with Box I/E integration - Added 10+ BOX_BOUNDARY markers - 5 critical integrity check points in P0 refill path ## Test Results (100K iterations) - Baseline: 7.22M ops/s - Hotpath ON: 8.98M ops/s (+24% improvement ✓) - P0 Bug: Still crashes at 28440 iterations (TLS SLL race condition) - Root cause: Identified but not yet fixed (requires deeper investigation) ## Performance - Box I overhead: Zero in release builds (HAKMEM_INTEGRITY_LEVEL=0) - Debug builds: Full validation enabled (HAKMEM_INTEGRITY_LEVEL=4) - Beautiful modular design maintains clean separation of concerns ## Known Issues - P0 Bug at 28440 iterations: Race condition in TLS SLL cache (class 0) - Cause: Use-after-free or race in remote free draining - Next step: Valgrind investigation to pinpoint exact corruption location ## Code Quality - Total new code: ~1400 lines (Box I + Box E + integrity system) - Design: Beautiful Box Theory with clear boundaries - Modularity: Complete separation of concerns - Documentation: Comprehensive inline comments and BOX_BOUNDARY markers 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-12 02:45:00 +09:00
HAK_CHECK_CLASS_IDX(class_idx, "sll_refill_small_from_ss");
atomic_fetch_add(&g_integrity_check_class_bounds, 1);
// Phase12: 起動直後など、shared pool / superslab 未有効時は絶対に動かさない。
if (!g_use_superslab || max_take <= 0) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
return 0;
}
// TLS slab 未構成状態 (ss/meta/slab_base すべて NULL) のときは、ここでは触らない。
// superslab_refill は「本当に必要になったタイミング」でのみ呼ぶ。
TinyTLSSlab* tls = &g_tls_slabs[class_idx];
if (!tls) {
return 0;
}
// FIX: TLS未初期化時も superslab_refill() で初期化する(早期リターン削除)
// 以前は tls_uninitialized の場合に return 0 していたが、これだと
// TLS SLL が永遠に空のままになり、Larson ベンチで 70x slowdown が発生していた。
// Ensure we have a valid TLS slab for this class via shared pool.
// superslab_refill() 契約:
// - 成功: g_tls_slabs[class_idx] に ss/meta/slab_base/slab_idx を一貫して設定
// - 失敗: TLS は不変 or 巻き戻し、NULL を返す
if (!tls->ss || !tls->meta ||
tls->meta->class_idx != (uint8_t)class_idx ||
!tls->slab_base) {
if (!superslab_refill(class_idx)) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
return 0;
}
tls = &g_tls_slabs[class_idx];
if (!tls->ss || !tls->meta ||
tls->meta->class_idx != (uint8_t)class_idx ||
!tls->slab_base) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
return 0;
}
}
Fix #16: Resolve double BASE→USER conversion causing header corruption 🎯 ROOT CAUSE: Internal allocation helpers were prematurely converting BASE → USER pointers before returning to caller. The caller then applied HAK_RET_ALLOC/tiny_region_id_write_header which performed ANOTHER BASE→USER conversion, resulting in double offset (BASE+2) and header written at wrong location. 📦 BOX THEORY SOLUTION: Establish clean pointer conversion boundary at tiny_region_id_write_header, making it the single source of truth for BASE → USER conversion. 🔧 CHANGES: - Fix #16: Remove premature BASE→USER conversions (6 locations) * core/tiny_alloc_fast.inc.h (3 fixes) * core/hakmem_tiny_refill.inc.h (2 fixes) * core/hakmem_tiny_fastcache.inc.h (1 fix) - Fix #12: Add header validation in tls_sll_pop (detect corruption) - Fix #14: Defense-in-depth header restoration in tls_sll_splice - Fix #15: USER pointer detection (for debugging) - Fix #13: Bump window header restoration - Fix #2, #6, #7, #8: Various header restoration & NULL termination 🧪 TEST RESULTS: 100% SUCCESS - 10K-500K iterations: All passed - 8 seeds × 100K: All passed (42,123,456,789,999,314,271,161) - Performance: ~630K ops/s average (stable) - Header corruption: ZERO 📋 FIXES SUMMARY: Fix #1-8: Initial header restoration & chain fixes (chatgpt-san) Fix #9-10: USER pointer auto-fix (later disabled) Fix #12: Validation system (caught corruption at call 14209) Fix #13: Bump window header writes Fix #14: Splice defense-in-depth Fix #15: USER pointer detection (debugging tool) Fix #16: Double conversion fix (FINAL SOLUTION) ✅ 🎓 LESSONS LEARNED: 1. Validation catches bugs early (Fix #12 was critical) 2. Class-specific inline logging reveals patterns (Option C) 3. Box Theory provides clean architectural boundaries 4. Multiple investigation approaches (Task/chatgpt-san collaboration) 📄 DOCUMENTATION: - P0_BUG_STATUS.md: Complete bug tracking timeline - C2_CORRUPTION_ROOT_CAUSE_FINAL.md: Detailed root cause analysis - FINAL_ANALYSIS_C2_CORRUPTION.md: Investigation methodology 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Task Agent <task@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-12 10:33:57 +09:00
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
TinySlabMeta* meta = tls->meta;
// Meta invariants: class & capacity は妥当であること
if (!meta ||
meta->class_idx != (uint8_t)class_idx ||
meta->capacity == 0) {
return 0;
}
const uint32_t cap = sll_cap_for_class(class_idx, (uint32_t)TINY_TLS_MAG_CAP);
const uint32_t cur = g_tls_sll[class_idx].count;
if (cur >= cap) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
return 0;
}
int room = (int)(cap - cur);
int target = (max_take < room) ? max_take : room;
if (target <= 0) {
return 0;
}
int taken = 0;
const size_t stride = tiny_stride_for_class(class_idx);
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
while (taken < target) {
void* p = NULL;
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
// freelist 優先
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
if (meta->freelist) {
p = meta->freelist;
// Point 4: Freelist chain integrity check (CRITICAL - detect corruption early)
void* next_raw = tiny_next_read(class_idx, p);
uintptr_t next_addr = (uintptr_t)next_raw;
// Check 4a: NULL is valid (end of freelist)
if (next_raw != NULL) {
// Check 4b: Valid address range (not obviously corrupted)
if (next_addr < 4096 || next_addr > 0x00007fffffffffffULL) {
fprintf(stderr,
"[FREELIST_NEXT_INVALID] cls=%d p=%p next=%p addr=%#lx (out of valid range)\n",
class_idx, p, next_raw, next_addr);
fprintf(stderr, "[FREELIST_NEXT_INVALID] ss=%p meta=%p freelist_head=%p\n",
(void*)tls->ss, (void*)meta, p);
abort();
}
// Check 4c: SuperSlab ownership validation
SuperSlab* ss_check = hak_super_lookup(next_raw);
if (!ss_check || ss_check->magic != SUPERSLAB_MAGIC) {
fprintf(stderr,
"[FREELIST_NEXT_INVALID] cls=%d p=%p next=%p ss_check=%p (not in valid SuperSlab)\n",
class_idx, p, next_raw, (void*)ss_check);
if (ss_check) {
fprintf(stderr, "[FREELIST_NEXT_INVALID] ss_check->magic=%#llx (expected %#llx)\n",
(unsigned long long)ss_check->magic, (unsigned long long)SUPERSLAB_MAGIC);
}
abort();
}
}
meta->freelist = next_raw;
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
meta->used++;
c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
if (__builtin_expect(meta->used > meta->capacity, 0)) {
// 異常検出時はロールバックして終了fail-fast 回避のため静かに中断)
c7_log_used_assign_cap(meta, class_idx, "FREELIST_OVERRUN");
meta->used = meta->capacity;
break;
}
ss_active_inc(tls->ss);
}
// freelist が尽きていて carved < capacity なら線形 carve
else if (meta->carved < meta->capacity) {
uint8_t* base = tls->slab_base
? tls->slab_base
: tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
if (!base) {
break;
}
uint16_t idx = meta->carved;
if (idx >= meta->capacity) {
break;
}
// Point 5: Stride calculation bounds check (CRITICAL - prevent out-of-bounds carving)
// Check 5a: Stride must be valid (not 0, not suspiciously large)
if (stride == 0 || stride > 100000) {
fprintf(stderr,
"[STRIDE_INVALID] cls=%d stride=%zu idx=%u cap=%u\n",
class_idx, stride, idx, meta->capacity);
fprintf(stderr, "[STRIDE_INVALID] ss=%p meta=%p base=%p\n",
(void*)tls->ss, (void*)meta, (void*)base);
abort();
}
uint8_t* addr = base + ((size_t)idx * stride);
// Check 5b: Calculated address must be within slab bounds
uintptr_t base_addr = (uintptr_t)base;
uintptr_t addr_addr = (uintptr_t)addr;
size_t max_offset = (size_t)meta->capacity * stride;
if (addr_addr < base_addr || (addr_addr - base_addr) > max_offset) {
fprintf(stderr,
"[ADDR_OUT_OF_BOUNDS] cls=%d base=%p addr=%p offset=%zu max=%zu\n",
class_idx, (void*)base, (void*)addr, (addr_addr - base_addr), max_offset);
fprintf(stderr, "[ADDR_OUT_OF_BOUNDS] idx=%u cap=%u stride=%zu\n",
idx, meta->capacity, stride);
abort();
}
meta->carved++;
meta->used++;
c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
if (__builtin_expect(meta->used > meta->capacity, 0)) {
c7_log_used_assign_cap(meta, class_idx, "CARVE_OVERRUN");
meta->used = meta->capacity;
break;
}
ss_active_inc(tls->ss);
p = addr;
}
// freelist も carve も尽きたら、新しい slab を shared pool から取得
else {
if (!superslab_refill(class_idx)) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
break;
}
tls = &g_tls_slabs[class_idx];
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
meta = tls->meta;
if (!tls->ss || !meta ||
meta->class_idx != (uint8_t)class_idx ||
!tls->slab_base ||
meta->capacity == 0) {
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
break;
}
continue;
}
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
if (!p) {
break;
}
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
tiny_debug_validate_node_base(class_idx, p, "sll_refill_small_from_ss");
Phase 12: Shared SuperSlab Pool implementation (WIP - runtime crash) ## Summary Implemented Phase 12 Shared SuperSlab Pool (mimalloc-style) to address SuperSlab allocation churn (877 SuperSlabs → 100-200 target). ## Implementation (ChatGPT + Claude) 1. **Metadata changes** (superslab_types.h): - Added class_idx to TinySlabMeta (per-slab dynamic class) - Removed size_class from SuperSlab (no longer per-SuperSlab) - Changed owner_tid (16-bit) → owner_tid_low (8-bit) 2. **Shared Pool** (hakmem_shared_pool.{h,c}): - Global pool shared by all size classes - shared_pool_acquire_slab() - Get free slab for class_idx - shared_pool_release_slab() - Return slab when empty - Per-class hints for fast path optimization 3. **Integration** (23 files modified): - Updated all ss->size_class → meta->class_idx - Updated all meta->owner_tid → meta->owner_tid_low - superslab_refill() now uses shared pool - Free path releases empty slabs back to pool 4. **Build system** (Makefile): - Added hakmem_shared_pool.o to OBJS_BASE and TINY_BENCH_OBJS_BASE ## Status: ⚠️ Build OK, Runtime CRASH **Build**: ✅ SUCCESS - All 23 files compile without errors - Only warnings: superslab_allocate type mismatch (legacy code) **Runtime**: ❌ SEGFAULT - Crash location: sll_refill_small_from_ss() - Exit code: 139 (SIGSEGV) - Test case: ./bench_random_mixed_hakmem 1000 256 42 ## Known Issues 1. **SEGFAULT in refill path** - Likely shared_pool_acquire_slab() issue 2. **Legacy superslab_allocate()** still exists (type mismatch warning) 3. **Remaining TODOs** from design doc: - SuperSlab physical layout integration - slab_handle.h cleanup - Remove old per-class head implementation ## Next Steps 1. Debug SEGFAULT (gdb backtrace shows sll_refill_small_from_ss) 2. Fix shared_pool_acquire_slab() or superslab_init_slab() 3. Basic functionality test (1K → 100K iterations) 4. Measure SuperSlab count reduction (877 → 100-200) 5. Performance benchmark (+650-860% expected) ## Files Changed (25 files) core/box/free_local_box.c core/box/free_remote_box.c core/box/front_gate_classifier.c core/hakmem_super_registry.c core/hakmem_tiny.c core/hakmem_tiny_bg_spill.c core/hakmem_tiny_free.inc core/hakmem_tiny_lifecycle.inc core/hakmem_tiny_magazine.c core/hakmem_tiny_query.c core/hakmem_tiny_refill.inc.h core/hakmem_tiny_superslab.c core/hakmem_tiny_superslab.h core/hakmem_tiny_tls_ops.h core/slab_handle.h core/superslab/superslab_inline.h core/superslab/superslab_types.h core/tiny_debug.h core/tiny_free_fast.inc.h core/tiny_free_magazine.inc.h core/tiny_remote.c core/tiny_superslab_alloc.inc.h core/tiny_superslab_free.inc.h Makefile ## New Files (3 files) PHASE12_SHARED_SUPERSLAB_POOL_DESIGN.md core/hakmem_shared_pool.c core/hakmem_shared_pool.h 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-11-13 16:33:03 +09:00
Front-Direct implementation: SS→FC direct refill + SLL complete bypass ## Summary Implemented Front-Direct architecture with complete SLL bypass: - Direct SuperSlab → FastCache refill (1-hop, bypasses SLL) - SLL-free allocation/free paths when Front-Direct enabled - Legacy path sealing (SLL inline opt-in, SFC cascade ENV-only) ## New Modules - core/refill/ss_refill_fc.h (236 lines): Standard SS→FC refill entry point - Remote drain → Freelist → Carve priority - Header restoration for C1-C6 (NOT C0/C7) - ENV: HAKMEM_TINY_P0_DRAIN_THRESH, HAKMEM_TINY_P0_NO_DRAIN - core/front/fast_cache.h: FastCache (L1) type definition - core/front/quick_slot.h: QuickSlot (L0) type definition ## Allocation Path (core/tiny_alloc_fast.inc.h) - Added s_front_direct_alloc TLS flag (lazy ENV check) - SLL pop guarded by: g_tls_sll_enable && !s_front_direct_alloc - Refill dispatch: - Front-Direct: ss_refill_fc_fill() → fastcache_pop() (1-hop) - Legacy: sll_refill_batch_from_ss() → SLL → FC (2-hop, A/B only) - SLL inline pop sealed (requires HAKMEM_TINY_INLINE_SLL=1 opt-in) ## Free Path (core/hakmem_tiny_free.inc, core/hakmem_tiny_fastcache.inc.h) - FC priority: Try fastcache_push() first (same-thread free) - tiny_fast_push() bypass: Returns 0 when s_front_direct_free || !g_tls_sll_enable - Fallback: Magazine/slow path (safe, bypasses SLL) ## Legacy Sealing - SFC cascade: Default OFF (ENV-only via HAKMEM_TINY_SFC_CASCADE=1) - Deleted: core/hakmem_tiny_free.inc.bak, core/pool_refill_legacy.c.bak - Documentation: ss_refill_fc_fill() promoted as CANONICAL refill entry ## ENV Controls - HAKMEM_TINY_FRONT_DIRECT=1: Enable Front-Direct (SS→FC direct) - HAKMEM_TINY_P0_DIRECT_FC_ALL=1: Same as above (alt name) - HAKMEM_TINY_REFILL_BATCH=1: Enable batch refill (also enables Front-Direct) - HAKMEM_TINY_SFC_CASCADE=1: Enable SFC cascade (default OFF) - HAKMEM_TINY_INLINE_SLL=1: Enable inline SLL pop (default OFF, requires AGGRESSIVE_INLINE) ## Benchmarks (Front-Direct Enabled) ```bash ENV: HAKMEM_BENCH_FAST_FRONT=1 HAKMEM_TINY_FRONT_DIRECT=1 HAKMEM_TINY_REFILL_BATCH=1 HAKMEM_TINY_P0_DIRECT_FC_ALL=1 HAKMEM_TINY_REFILL_COUNT_HOT=256 HAKMEM_TINY_REFILL_COUNT_MID=96 HAKMEM_TINY_BUMP_CHUNK=256 bench_random_mixed (16-1040B random, 200K iter): 256 slots: 1.44M ops/s (STABLE, 0 SEGV) 128 slots: 1.44M ops/s (STABLE, 0 SEGV) bench_fixed_size (fixed size, 200K iter): 256B: 4.06M ops/s (has debug logs, expected >10M without logs) 128B: Similar (debug logs affect) ``` ## Verification - TRACE_RING test (10K iter): **0 SLL events** detected ✅ - Complete SLL bypass confirmed when Front-Direct=1 - Stable execution: 200K iterations × multiple sizes, 0 SEGV ## Next Steps - Disable debug logs in hak_alloc_api.inc.h (call_num 14250-14280 range) - Re-benchmark with clean Release build (target: 10-15M ops/s) - 128/256B shortcut path optimization (FC hit rate improvement) Co-Authored-By: ChatGPT <chatgpt@openai.com> Suggested-By: ultrathink
2025-11-14 05:41:49 +09:00
// Prepare header for header-classes so that safeheader mode accepts the push
Cleanup: Fix 2 additional Class 0/7 header bugs (correctness fix) Task Agent Investigation: - Found 2 more instances of hardcoded `class_idx != 7` checks - These are real bugs (C0 also uses offset=0, not just C7) - However, NOT the root cause of 12% crash rate Bug Fixes (2 locations): 1. tls_sll_drain_box.h:190 - Path: TLS SLL drain → tiny_free_local_box() - Fix: Use tiny_header_write_for_alloc() (ALL classes) - Reason: tiny_free_local_box() reads header for class_idx 2. hakmem_tiny_refill.inc.h:384 - Path: SuperSlab refill → TLS SLL push - Fix: Use tiny_header_write_if_preserved() (C1-C6 only) - Reason: TLS SLL push needs header for validation Test Results: - Before: 12% crash rate (88/100 runs successful) - After: 12% crash rate (44/50 runs successful) - Conclusion: Correctness fix, but not primary crash cause Analysis: - Bugs are real (incorrect Class 0 handling) - Fixes don't reduce crash rate → different root cause exists - Heisenbug characteristics (disappears under gdb) - Likely: Race condition, uninitialized memory, or use-after-free Remaining Work: - 12% crash rate persists (requires different investigation) - Next: Focus on TLS initialization, race conditions, allocation paths Design Note: - tls_sll_drain_box.h uses tiny_header_write_for_alloc() because tiny_free_local_box() needs header to read class_idx - hakmem_tiny_refill.inc.h uses tiny_header_write_if_preserved() because TLS SLL push validates header (C1-C6 only) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 08:12:08 +09:00
// Uses Header Box API (C1-C6 only; C0/C7 skip - offset=0 overwrites header)
tiny_header_write_if_preserved(p, class_idx);
// SLL push 失敗時はそれ以上積まないp はTLS slab管理下なので破棄でOK
if (!tls_sll_push(class_idx, p, cap)) {
break;
}
taken++;
}
return taken;
}
#endif // HAKMEM_TINY_REFILL_INC_H