// hakmem_tiny_refill.inc.h
// Phase 12: Minimal refill helpers needed by Box fast path.
//
// This header provides:
//   - superslab_tls_bump_fast:   TLS bump window built from TinyTLSSlab + SuperSlab metadata
//   - tiny_fast_refill_and_take: minimal refill from FastCache / TLS SLL, then take one block
//   - bulk_mag_to_sll_if_room:   bulk transfer Magazine -> SLL (with capacity check)
//   - sll_refill_small_from_ss:  minimal implementation for the Phase 12 shared SuperSlab pool
//
// The legacy multi-path logic based on g_sll_cap_override / getenv is deliberately not included.

#ifndef HAKMEM_TINY_REFILL_INC_H
#define HAKMEM_TINY_REFILL_INC_H

#include "hakmem_tiny.h"
#include "hakmem_tiny_superslab.h"
#include "hakmem_tiny_tls_list.h"
#include "tiny_box_geometry.h"
#include "superslab/superslab_inline.h" // Provides hak_super_lookup() and SUPERSLAB_MAGIC
#include "box/tls_sll_box.h"
#include "box/c7_meta_used_counter_box.h"
#include "box/tiny_header_box.h" // Header Box: Single Source of Truth for header operations
#include "box/tiny_front_config_box.h" // Phase 7-Step6-Fix: Config macros for dead code elimination
#include "box/tiny_heap_env_box.h" // TinyHeap front gate (C7 TinyHeapBox)
#include "hakmem_tiny_integrity.h"
#include "box/tiny_next_ptr_box.h"
#include "tiny_region_id.h" // For HEADER_MAGIC/HEADER_CLASS_MASK (prepare header before SLL push)
#include <stdint.h>
#include <stdatomic.h>
#include <stdio.h> // For fprintf diagnostics
#include <stdlib.h> // For abort()

// ========= Externs from hakmem_tiny.c and friends =========

extern int g_use_superslab;
extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES];

extern int g_fastcache_enable;
extern uint16_t g_fast_cap[TINY_NUM_CLASSES];
extern __thread TinyFastCache g_fast_cache[TINY_NUM_CLASSES];

// Phase 7-Step7: g_tls_sll_enable now accessed via TINY_FRONT_TLS_SLL_ENABLED macro
extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];

extern _Atomic uint32_t g_frontend_fill_target[TINY_NUM_CLASSES];

extern int g_ultra_bump_shadow;
extern int g_bump_chunk;
extern __thread uint8_t* g_tls_bcur[TINY_NUM_CLASSES];
extern __thread uint8_t* g_tls_bend[TINY_NUM_CLASSES];

#if HAKMEM_DEBUG_COUNTERS
extern uint64_t g_bump_hits[TINY_NUM_CLASSES];
extern uint64_t g_bump_arms[TINY_NUM_CLASSES];
extern uint64_t g_path_refill_calls[TINY_NUM_CLASSES];
extern uint64_t g_ultra_refill_calls[TINY_NUM_CLASSES];
extern int g_path_debug_enabled;
#endif

// ========= From other units =========

SuperSlab* superslab_refill(int class_idx);

void ss_active_inc(SuperSlab* ss);
void ss_active_add(SuperSlab* ss, uint32_t n);

size_t tiny_stride_for_class(int class_idx);
uint8_t* tiny_slab_base_for_geometry(SuperSlab* ss, int slab_idx);

extern uint32_t sll_cap_for_class(int class_idx, uint32_t mag_cap);

/* The ultra_* family is defined in hakmem_tiny.c, so it is not declared here. */
/* tls_sll_push is already provided by box/tls_sll_box.h as static inline bool tls_sll_push(...). */
/* tiny_small_mags_init_once / tiny_mag_init_if_needed are already declared in hakmem_tiny_magazine.h, so they are not redeclared here. */
/* tiny_fast_pop / tiny_fast_push / fastcache_* are static inline in hakmem_tiny_fastcache.inc.h, so no declaration is needed here. */

#if !HAKMEM_BUILD_RELEASE
static inline void tiny_debug_validate_node_base(int class_idx, void* node, const char* where)
{
    (void)class_idx;
    (void)where;

    // Minimal defense: reject abnormally small addresses (below 4096).
    if ((uintptr_t)node < 4096) {
        fprintf(stderr,
                "[TINY_REFILL_GUARD] %s: suspicious node=%p cls=%d\n",
                where, node, class_idx);
        abort();
    }
}
#else
static inline void tiny_debug_validate_node_base(int class_idx, void* node, const char* where)
{
    (void)class_idx;
    (void)node;
    (void)where;
}
#endif

static inline void c7_log_used_assign_cap(TinySlabMeta* meta,
                                          int class_idx,
                                          const char* tag) {
    if (__builtin_expect(class_idx != 7, 1)) {
        return;
    }
#if HAKMEM_BUILD_RELEASE
    static _Atomic uint32_t rel_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr,
                "[REL_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved,
                meta->freelist);
    }
#else
    static _Atomic uint32_t dbg_logs = 0;
    uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
    if (n < 4) {
        fprintf(stderr,
                "[DBG_C7_USED_ASSIGN] tag=%s used=%u cap=%u carved=%u freelist=%p\n",
                tag,
                (unsigned)meta->used,
                (unsigned)meta->capacity,
                (unsigned)meta->carved,
                meta->freelist);
    }
#endif
}

// ========= superslab_tls_bump_fast =========
//
// Ultra bump shadow: when the current slab has an empty freelist and carved < capacity,
// reserve a contiguous region in one step and keep it as a TLS window.
// Called from tiny_hot_pop_class{0..3}.

static inline void* superslab_tls_bump_fast(int class_idx) {
    if (!g_ultra_bump_shadow || !g_use_superslab) return NULL;

    uint8_t* cur = g_tls_bcur[class_idx];
    if (cur) {
        uint8_t* end = g_tls_bend[class_idx];
        size_t stride = tiny_stride_for_class(class_idx);
        if (cur + stride <= end) {
            g_tls_bcur[class_idx] = cur + stride;
#if HAKMEM_DEBUG_COUNTERS
            g_bump_hits[class_idx]++;
#endif
#if HAKMEM_TINY_HEADER_CLASSIDX
            // The header is assumed to be written by the caller or already included
            // in the stride; the raw pointer is returned here.
#endif
            return cur;
        }
        g_tls_bcur[class_idx] = NULL;
        g_tls_bend[class_idx] = NULL;
    }

    TinyTLSSlab* tls = &g_tls_slabs[class_idx];
    TinySlabMeta* meta = tls->meta;
    if (!tls->ss || !meta || meta->freelist) return NULL;

    uint16_t carved = meta->carved;
    uint16_t cap = meta->capacity;
    if (carved >= cap) return NULL;

    uint32_t avail = (uint32_t)cap - (uint32_t)carved;
    uint32_t chunk = (g_bump_chunk > 0) ? (uint32_t)g_bump_chunk : 1u;
    if (chunk > avail) chunk = avail;

    size_t stride = tiny_stride_for_class(class_idx);
    uint8_t* base = tls->slab_base
                        ? tls->slab_base
                        : tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
    uint8_t* start = base + (size_t)carved * stride;

    meta->carved = (uint16_t)(carved + (uint16_t)chunk);
    meta->used = (uint16_t)(meta->used + (uint16_t)chunk);
    if (class_idx == 7) {
        for (uint32_t i = 0; i < chunk; ++i) {
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
        }
    }
    ss_active_add(tls->ss, chunk);
#if HAKMEM_DEBUG_COUNTERS
    g_bump_arms[class_idx]++;
#endif

    // Return the first block immediately and keep the rest as the TLS window.
    g_tls_bcur[class_idx] = start + stride;
    g_tls_bend[class_idx] = start + (size_t)chunk * stride;
    return start;
}
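
// Illustrative sketch only, not part of this header: the bump-window idea used by
// superslab_tls_bump_fast, reduced to a self-contained form. BumpWindow, DemoSlab and
// demo_bump_take are hypothetical names invented for this example; the real code keeps
// the window in g_tls_bcur / g_tls_bend and also updates used/active counters, which
// the sketch omits. It assumes a single class and a caller-supplied stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for one class's TLS bump window.
typedef struct {
    uint8_t* cur;  // next block to hand out, or NULL when no window is armed
    uint8_t* end;  // one past the last reserved byte
} BumpWindow;

// Hypothetical stand-in for the carved/capacity pair of a slab.
typedef struct {
    uint8_t* base;
    uint16_t carved;    // blocks already carved out of the slab
    uint16_t capacity;  // total blocks in the slab
} DemoSlab;

// Take one block: serve from the armed window if a full stride is left,
// otherwise reserve up to `chunk` blocks, return the first, and arm the rest.
static void* demo_bump_take(BumpWindow* w, DemoSlab* s, size_t stride, uint32_t chunk) {
    assert(stride > 0);
    if (w->cur) {
        if ((size_t)(w->end - w->cur) >= stride) {
            void* p = w->cur;
            w->cur += stride;
            return p;
        }
        w->cur = NULL;  // window exhausted
        w->end = NULL;
    }
    if (s->carved >= s->capacity) return NULL;
    uint32_t avail = (uint32_t)s->capacity - (uint32_t)s->carved;
    if (chunk == 0) chunk = 1;
    if (chunk > avail) chunk = avail;
    uint8_t* start = s->base + (size_t)s->carved * stride;
    s->carved = (uint16_t)(s->carved + chunk);
    w->cur = start + stride;
    w->end = start + (size_t)chunk * stride;
    return start;
}
```

// The sketch compares the remaining window size against the stride rather than computing
// cur + stride, which avoids forming a pointer past the end of the reserved region.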

// ========= tiny_fast_refill_and_take =========
//
// When the FastCache is empty, fetch a batch from the TLS list / SuperSlab and return one block.
// The legacy complex paths are stripped out, leaving minimal FC/SLL-only logic.

static inline void* tiny_fast_refill_and_take(int class_idx, TinyTLSList* tls) {
    (void)tls;
    // 1) Directly from the front FastCache
    // Phase 7-Step6-Fix: Use config macro for dead code elimination in PGO mode
    if (__builtin_expect(TINY_FRONT_FASTCACHE_ENABLED && class_idx <= 3, 1)) {
        hak_base_ptr_t fc = fastcache_pop(class_idx);
        if (!hak_base_is_null(fc)) {
            extern unsigned long long g_front_fc_hit[TINY_NUM_CLASSES];
            g_front_fc_hit[class_idx]++;
            return HAK_BASE_TO_RAW(fc);
        }
    }

    // 2) Local fast list
    {
        hak_base_ptr_t p = tiny_fast_pop(class_idx);
        if (!hak_base_is_null(p)) return HAK_BASE_TO_RAW(p);
    }

    uint16_t cap = g_fast_cap[class_idx];
    if (cap == 0) return NULL;
    TinyFastCache* fc = &g_fast_cache[class_idx];
    int room = (int)cap - fc->top;
    if (room <= 0) return NULL;

    // 3) Refill from the TLS SLL
    int filled = 0;
    // Phase 7-Step7: Use config macro for dead code elimination in PGO mode
    while (room > 0 && TINY_FRONT_TLS_SLL_ENABLED) {
        void* h = NULL;
        if (!tls_sll_pop(class_idx, &h)) break;
        tiny_debug_validate_node_base(class_idx, h, "tiny_fast_refill_and_take");
        fc->items[fc->top++] = h;
        room--;
        filled++;
    }

    if (filled == 0) {
        // 4) Superslab bump (optional)
        void* bump = superslab_tls_bump_fast(class_idx);
        if (bump) return bump;
        return NULL;
    }

    // 5) Return one block
    return fc->items[--fc->top];
}
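
// Illustrative sketch only, not part of this header: steps 3) and 5) of
// tiny_fast_refill_and_take in isolation, i.e. drain a singly linked list into a bounded
// array stack and hand back the most recently moved block. DemoNode, DemoStack,
// demo_list_pop and demo_refill_and_take are hypothetical names; the real code goes
// through tls_sll_pop() and TinyFastCache, whose layouts are not shown in this file.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_STACK_MAX 8

// Hypothetical intrusive list node (stand-in for a TLS SLL entry).
typedef struct DemoNode { struct DemoNode* next; } DemoNode;

// Hypothetical array stack (stand-in for a per-class fast cache).
typedef struct {
    void* items[DEMO_STACK_MAX];
    int   top;
} DemoStack;

// Pop the list head into *out; returns 1 on success, 0 when the list is empty.
static int demo_list_pop(DemoNode** head, void** out) {
    if (*head == NULL) return 0;
    *out = *head;
    *head = (*head)->next;
    return 1;
}

// Move up to (cap - top) nodes from the list into the stack, then take one.
// Returns NULL when there is no room or nothing could be moved.
static void* demo_refill_and_take(DemoStack* st, int cap, DemoNode** head) {
    assert(st->top >= 0 && st->top <= DEMO_STACK_MAX);
    if (cap > DEMO_STACK_MAX) cap = DEMO_STACK_MAX;
    int room = cap - st->top;
    if (room <= 0) return NULL;

    int filled = 0;
    void* h = NULL;
    while (room > 0 && demo_list_pop(head, &h)) {
        st->items[st->top++] = h;
        room--;
        filled++;
    }
    if (filled == 0) return NULL;
    return st->items[--st->top];
}
```

// Refilling in a batch and then popping once means later allocations can be served from
// the array without touching the list again.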

// ========= bulk_mag_to_sll_if_room =========
//
// Safe transfer from Magazine to SLL.
// Referenced from tiny_free_magazine.inc.h.

static inline int bulk_mag_to_sll_if_room(int class_idx, TinyTLSMag* mag, int n) {
    // Phase 7-Step7: Use config macro for dead code elimination in PGO mode
    if (!TINY_FRONT_TLS_SLL_ENABLED || n <= 0) return 0;

    uint32_t cap = sll_cap_for_class(class_idx, (uint32_t)mag->cap);
    uint32_t have = g_tls_sll[class_idx].count;
    if (have >= cap) return 0;

    int room = (int)(cap - have);
    int take = n < room ? n : room;
    if (take <= 0) return 0;
    if (take > mag->top) take = mag->top;
    if (take <= 0) return 0;

    int pushed = 0;
    for (int i = 0; i < take; i++) {
        void* p = mag->items[--mag->top].ptr;
        hak_base_ptr_t base_p = HAK_BASE_FROM_RAW(p);
        if (!tls_sll_push(class_idx, base_p, cap)) {
            mag->top++; // rollback last
            break;
        }
        pushed++;
    }
#if HAKMEM_DEBUG_COUNTERS
    if (pushed > 0) g_path_refill_calls[class_idx]++;
#endif
    return pushed;
}
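
// Illustrative sketch only, not part of this header: the "clamp, push, roll back on
// failure" pattern of bulk_mag_to_sll_if_room with toy containers. DemoMag, DemoSll,
// demo_sll_push and demo_bulk_move are hypothetical names; in particular the destination
// is modelled as a small array here, whereas the real TLS SLL is a linked list reached
// through tls_sll_push().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAG_MAX 8
#define DEMO_SLL_MAX 4

// Hypothetical source container (stand-in for a magazine).
typedef struct {
    void* items[DEMO_MAG_MAX];
    int   top;
} DemoMag;

// Hypothetical destination container (stand-in for the TLS SLL).
typedef struct {
    void*    items[DEMO_SLL_MAX];
    uint32_t count;
} DemoSll;

// Push one block; returns 1 on success, 0 when the destination is at capacity.
static int demo_sll_push(DemoSll* s, void* p, uint32_t cap) {
    if (s->count >= cap || s->count >= DEMO_SLL_MAX) return 0;
    s->items[s->count++] = p;
    return 1;
}

// Move at most n blocks, limited by destination room and source contents.
// Returns the number of blocks actually moved.
static int demo_bulk_move(DemoMag* mag, DemoSll* sll, uint32_t cap, int n) {
    assert(mag->top >= 0 && mag->top <= DEMO_MAG_MAX);
    if (n <= 0 || sll->count >= cap) return 0;

    int room = (int)(cap - sll->count);
    int take = n < room ? n : room;
    if (take > mag->top) take = mag->top;

    int pushed = 0;
    for (int i = 0; i < take; i++) {
        void* p = mag->items[--mag->top];
        if (!demo_sll_push(sll, p, cap)) {
            mag->top++;  // undo the pop so the block stays in the magazine
            break;
        }
        pushed++;
    }
    return pushed;
}
```

// Restoring the source index on a failed push keeps the block owned by exactly one
// container, which is what the rollback in the function above is for.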

/*
 * ========= Minimal Phase 12 sll_refill_small_from_ss =========
 *
 * Box design policy:
 *   - Front end (tiny_fast_refill etc.):
 *       - TLS SLL: uses only the tls_sll_box.h API
 *       - Superslab: uses this function as the sole "small-size SLL refill Box"
 *   - Back end:
 *       - At the current stage (Stage A/B), uses the existing TLS Superslab / TinySlabMeta directly
 *       - Superslab internals are confined here so that a later stage (Stage C) can
 *         swap in shared_pool_acquire_slab()
 *
 * Contract:
 *   - Tiny classes only (0 <= class_idx < TINY_NUM_CLASSES)
 *   - max_take is the maximum number of blocks this call should push onto the SLL
 *   - The return value is the number of blocks actually pushed onto the SLL (>= 0)
 *   - Callers do not touch head/count/meta etc.; they use only the Box API (tls_sll_box)
 */

__attribute__((noinline))
int sll_refill_small_from_ss(int class_idx, int max_take)
{
    // Hard defensive gate: Tiny classes only, never trust caller.
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
        return 0;
    }

    // Classes handled by the TinyHeap front do not use the TLS SLL
    // (they are handled entirely inside TinyHeapBox).
    if (tiny_heap_class_route_enabled(class_idx)) {
        return 0;
    }

    HAK_CHECK_CLASS_IDX(class_idx, "sll_refill_small_from_ss");
    atomic_fetch_add(&g_integrity_check_class_bounds, 1);

    // Phase12: never run while the shared pool / superslab is not yet enabled
    // (for example right after startup).
    if (!g_use_superslab || max_take <= 0) {
        return 0;
    }

    // When the TLS slab is unconfigured (ss/meta/slab_base all NULL), it is not touched here.
    // superslab_refill is called only when it is really needed.
    TinyTLSSlab* tls = &g_tls_slabs[class_idx];
    // Defensive only: the address of an array element is never NULL.
    if (!tls) {
        return 0;
    }

    // FIX: when TLS is uninitialized, initialize it via superslab_refill() as well
    // (the early return was removed).
    // Previously this returned 0 in the tls_uninitialized case, which left the
    // TLS SLL empty forever and caused a 70x slowdown in the Larson benchmark.

    // Ensure we have a valid TLS slab for this class via shared pool.
    // superslab_refill() contract:
    //   - Success: sets ss/meta/slab_base/slab_idx in g_tls_slabs[class_idx] consistently
    //   - Failure: leaves TLS unchanged or rolled back, and returns NULL
    if (!tls->ss || !tls->meta ||
        tls->meta->class_idx != (uint8_t)class_idx ||
        !tls->slab_base) {
        if (!superslab_refill(class_idx)) {
            return 0;
        }
        tls = &g_tls_slabs[class_idx];
        if (!tls->ss || !tls->meta ||
            tls->meta->class_idx != (uint8_t)class_idx ||
            !tls->slab_base) {
            return 0;
        }
    }

    TinySlabMeta* meta = tls->meta;
    // Meta invariants: class and capacity must be valid.
    if (!meta ||
        meta->class_idx != (uint8_t)class_idx ||
        meta->capacity == 0) {
        return 0;
    }

    const uint32_t cap = sll_cap_for_class(class_idx, (uint32_t)TINY_TLS_MAG_CAP);
    const uint32_t cur = g_tls_sll[class_idx].count;
    if (cur >= cap) {
        return 0;
    }

    int room = (int)(cap - cur);
    int target = (max_take < room) ? max_take : room;
    if (target <= 0) {
        return 0;
    }

    int taken = 0;
    const size_t stride = tiny_stride_for_class(class_idx);

    while (taken < target) {
        void* p = NULL;

        // Prefer the freelist
        if (meta->freelist) {
            p = meta->freelist;

            // Point 4: Freelist chain integrity check (CRITICAL - detect corruption early)
            void* next_raw = tiny_next_read(class_idx, p);
            uintptr_t next_addr = (uintptr_t)next_raw;

            // Check 4a: NULL is valid (end of freelist)
            if (next_raw != NULL) {
                // Check 4b: Valid address range (not obviously corrupted)
                if (next_addr < 4096 || next_addr > 0x00007fffffffffffULL) {
                    fprintf(stderr,
                            "[FREELIST_NEXT_INVALID] cls=%d p=%p next=%p addr=%#llx (out of valid range)\n",
                            class_idx, p, next_raw, (unsigned long long)next_addr);
                    fprintf(stderr, "[FREELIST_NEXT_INVALID] ss=%p meta=%p freelist_head=%p\n",
                            (void*)tls->ss, (void*)meta, p);
                    abort();
                }

                // Check 4c: SuperSlab ownership validation
                SuperSlab* ss_check = hak_super_lookup(next_raw);
                if (!ss_check || ss_check->magic != SUPERSLAB_MAGIC) {
                    fprintf(stderr,
                            "[FREELIST_NEXT_INVALID] cls=%d p=%p next=%p ss_check=%p (not in valid SuperSlab)\n",
                            class_idx, p, next_raw, (void*)ss_check);
                    if (ss_check) {
                        fprintf(stderr, "[FREELIST_NEXT_INVALID] ss_check->magic=%#llx (expected %#llx)\n",
                                (unsigned long long)ss_check->magic, (unsigned long long)SUPERSLAB_MAGIC);
                    }
                    abort();
                }
            }

            meta->freelist = next_raw;
            meta->used++;
            c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
            if (__builtin_expect(meta->used > meta->capacity, 0)) {
                // On detecting an anomaly, clamp used back to capacity and stop
                // (a quiet exit instead of fail-fast).
                c7_log_used_assign_cap(meta, class_idx, "FREELIST_OVERRUN");
                meta->used = meta->capacity;
                break;
            }
            ss_active_inc(tls->ss);
        }
|
|
|
|
|
|
// freelist が尽きていて carved < capacity なら線形 carve
|
|
|
|
|
|
else if (meta->carved < meta->capacity) {
|
|
|
|
|
|
uint8_t* base = tls->slab_base
|
|
|
|
|
|
? tls->slab_base
|
|
|
|
|
|
: tiny_slab_base_for_geometry(tls->ss, tls->slab_idx);
|
|
|
|
|
|
if (!base) {
|
|
|
|
|
|
break;
|
|
|
|
|
|
}
|
|
|
|
|
|
uint16_t idx = meta->carved;
|
|
|
|
|
|
if (idx >= meta->capacity) {
|
|
|
|
|
|
break;
|
|
|
|
|
|
}
|
2025-12-04 10:38:19 +09:00
|
|
|
|
|
|
|
|
|
|
// Point 5: Stride calculation bounds check (CRITICAL - prevent out-of-bounds carving)
|
|
|
|
|
|
// Check 5a: Stride must be valid (not 0, not suspiciously large)
|
|
|
|
|
|
if (stride == 0 || stride > 100000) {
|
|
|
|
|
|
fprintf(stderr,
|
|
|
|
|
|
"[STRIDE_INVALID] cls=%d stride=%zu idx=%u cap=%u\n",
|
|
|
|
|
|
class_idx, stride, idx, meta->capacity);
|
|
|
|
|
|
fprintf(stderr, "[STRIDE_INVALID] ss=%p meta=%p base=%p\n",
|
|
|
|
|
|
(void*)tls->ss, (void*)meta, (void*)base);
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2025-11-14 01:02:00 +09:00
|
|
|
|
uint8_t* addr = base + ((size_t)idx * stride);
|
2025-12-04 10:38:19 +09:00
|
|
|
|
|
|
|
|
|
|
// Check 5b: Calculated address must be within slab bounds
|
|
|
|
|
|
uintptr_t base_addr = (uintptr_t)base;
|
|
|
|
|
|
uintptr_t addr_addr = (uintptr_t)addr;
|
|
|
|
|
|
size_t max_offset = (size_t)meta->capacity * stride;
|
|
|
|
|
|
|
|
|
|
|
|
if (addr_addr < base_addr || (addr_addr - base_addr) >= max_offset) {
|
|
|
|
|
|
fprintf(stderr,
|
|
|
|
|
|
"[ADDR_OUT_OF_BOUNDS] cls=%d base=%p addr=%p offset=%zu max=%zu\n",
|
|
|
|
|
|
class_idx, (void*)base, (void*)addr, (addr_addr - base_addr), max_offset);
|
|
|
|
|
|
fprintf(stderr, "[ADDR_OUT_OF_BOUNDS] idx=%u cap=%u stride=%zu\n",
|
|
|
|
|
|
idx, meta->capacity, stride);
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2025-11-09 22:12:34 +09:00
|
|
|
|
meta->carved++;
|
2025-11-05 12:31:14 +09:00
|
|
|
|
meta->used++;
|
2025-12-05 23:41:01 +09:00
|
|
|
|
c7_meta_used_note(class_idx, C7_META_USED_SRC_FRONT);
|
2025-11-14 01:02:00 +09:00
|
|
|
|
if (__builtin_expect(meta->used > meta->capacity, 0)) {
|
2025-12-05 23:41:01 +09:00
|
|
|
|
c7_log_used_assign_cap(meta, class_idx, "CARVE_OVERRUN");
|
2025-11-14 01:02:00 +09:00
|
|
|
|
meta->used = meta->capacity;
|
|
|
|
|
|
break;
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
ss_active_inc(tls->ss);
|
2025-11-14 01:02:00 +09:00
|
|
|
|
p = addr;
|
|
|
|
|
|
}
|
|
|
|
|
|
// If both the freelist and the carve space are exhausted, acquire a new slab from the shared pool
|
|
|
|
|
|
else {
|
|
|
|
|
|
if (!superslab_refill(class_idx)) {
|
2025-11-13 16:33:03 +09:00
|
|
|
|
break;
|
2025-11-14 01:02:00 +09:00
|
|
|
|
}
|
2025-11-10 01:59:11 +09:00
|
|
|
|
tls = &g_tls_slabs[class_idx];
|
2025-11-13 16:33:03 +09:00
|
|
|
|
meta = tls->meta;
|
2025-11-14 01:02:00 +09:00
|
|
|
|
if (!tls->ss || !meta ||
|
|
|
|
|
|
meta->class_idx != (uint8_t)class_idx ||
|
|
|
|
|
|
!tls->slab_base ||
|
|
|
|
|
|
meta->capacity == 0) {
|
2025-11-13 16:33:03 +09:00
|
|
|
|
break;
|
2025-11-14 01:02:00 +09:00
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
continue;
|
|
|
|
|
|
}
|
2025-11-13 16:33:03 +09:00
|
|
|
|
|
2025-11-14 01:02:00 +09:00
|
|
|
|
if (!p) {
|
2025-11-10 16:48:20 +09:00
|
|
|
|
break;
|
|
|
|
|
|
}
|
2025-11-13 16:33:03 +09:00
|
|
|
|
|
2025-11-14 01:02:00 +09:00
|
|
|
|
tiny_debug_validate_node_base(class_idx, p, "sll_refill_small_from_ss");
|
2025-11-13 16:33:03 +09:00
|
|
|
|
|
2025-11-14 05:41:49 +09:00
|
|
|
|
// Prepare header for header-classes so that safeheader mode accepts the push
|
2025-11-29 08:12:08 +09:00
|
|
|
|
// Uses Header Box API (C1-C6 only; C0/C7 skip - offset=0 overwrites header)
|
|
|
|
|
|
tiny_header_write_if_preserved(p, class_idx);
|
|
|
|
|
|
|
2025-11-14 01:02:00 +09:00
|
|
|
|
// If the SLL push fails, stop pushing further (p is still owned by the TLS slab, so dropping it is OK)
|
2025-11-10 16:48:20 +09:00
|
|
|
|
if (!tls_sll_push(class_idx, p, cap)) {
|
|
|
|
|
|
break;
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
|
2025-11-14 01:02:00 +09:00
|
|
|
|
taken++;
|
2025-11-05 12:31:14 +09:00
|
|
|
|
}
|
2025-11-14 01:02:00 +09:00
|
|
|
|
|
|
|
|
|
|
return taken;
|
2025-11-05 12:31:14 +09:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
#endif // HAKMEM_TINY_REFILL_INC_H
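The refill loop above takes blocks in priority order: pop from the slab freelist, otherwise carve the next block linearly, otherwise stop (the real code tries `superslab_refill()` at that point). The following is a minimal standalone sketch of that ordering, not the actual implementation. `demo_meta_t`, `demo_take_one`, `demo_give_back`, and `demo_refill` are hypothetical names, the struct is a simplified stand-in for the real slab metadata, and the sketch checks `used < capacity` before mutating anything rather than clamping afterwards as the header does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the slab metadata (not the real struct). */
typedef struct {
    void*    freelist;  /* freed blocks; the next pointer is stored at offset 0 */
    uint16_t carved;    /* blocks handed out by linear carving so far */
    uint16_t used;      /* blocks currently handed out */
    uint16_t capacity;  /* total blocks in the slab */
} demo_meta_t;

/* Take one block: freelist first, then linear carve.
 * Returns NULL when the slab is exhausted or the inputs look invalid. */
static void* demo_take_one(demo_meta_t* m, uint8_t* base, size_t stride) {
    assert(m != NULL);
    if (base == NULL || stride < sizeof(void*)) return NULL;
    if (m->used >= m->capacity) return NULL;   /* checked up front in this sketch */

    void* p = NULL;
    if (m->freelist != NULL) {
        void* next = NULL;
        p = m->freelist;
        memcpy(&next, p, sizeof next);         /* read the link without aliasing issues */
        m->freelist = next;
    } else if (m->carved < m->capacity) {
        size_t offset = (size_t)m->carved * stride;
        /* Defensive bounds check: the whole block must fit inside the slab. */
        if (offset + stride > (size_t)m->capacity * stride) return NULL;
        p = base + offset;
        m->carved++;
    } else {
        return NULL;                           /* assumed: real code refills from the shared pool here */
    }
    m->used++;
    return p;
}

/* Return a block to the slab freelist. */
static void demo_give_back(demo_meta_t* m, void* p) {
    memcpy(p, &m->freelist, sizeof m->freelist);
    m->freelist = p;
    if (m->used > 0) m->used--;
}

/* Pull up to `want` blocks into `out`; returns how many were taken. */
static size_t demo_refill(demo_meta_t* m, uint8_t* base, size_t stride,
                          void** out, size_t want) {
    size_t taken = 0;
    while (taken < want) {
        void* p = demo_take_one(m, base, stride);
        if (p == NULL) break;
        out[taken++] = p;
    }
    return taken;
}
```

In this sketch a caller would pass a pointer-aligned buffer as `base` and a `stride` that is a multiple of `sizeof(void*)`; asking for more blocks than the slab holds simply returns a smaller count, which matches how the loop above reports `taken`.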
|