Tiny: Enable P0 batch refill by default + docs and task update

Summary
- Default P0 ON: Build-time HAKMEM_TINY_P0_BATCH_REFILL=1 remains; runtime gate now defaults to ON
  (HAKMEM_TINY_P0_ENABLE unset or not '0'). Kill switch preserved via HAKMEM_TINY_P0_DISABLE=1.
- Fix critical bug: After freelist→SLL batch splice, increment TinySlabMeta::used by 'from_freelist'
  to mirror non-P0 behavior (prevents under-accounting and follow-on carve invariants from breaking).
- Add low-overhead A/B toggles for triage: HAKMEM_TINY_P0_NO_DRAIN (skip remote drain),
  HAKMEM_TINY_P0_LOG (emit [P0_COUNTER_OK/MISMATCH] based on total_active_blocks delta).
- Keep linear carve fail-fast guards across simple/general/TLS-bump paths.

Perf (1T, 100k×256B)
- P0 OFF: ~2.73M ops/s (stable)
- P0 ON (no drain): ~2.45M ops/s
- P0 ON (normal drain): ~2.76M ops/s (fastest)

Known
- Rare [P0_COUNTER_MISMATCH] warnings persist (non-fatal). Continue auditing active/used
  balance around batch freelist splice and remote drain splice.

Docs
- Add docs/TINY_P0_BATCH_REFILL.md (runtime switches, behavior, perf notes).
- Update CURRENT_TASK.md with Tiny P0 status (default ON) and next steps.
This commit is contained in:
Moe Charm (CI)
2025-11-09 22:12:34 +09:00
parent 1010a961fb
commit d9b334b968
24 changed files with 1240 additions and 69 deletions

View File

@ -154,7 +154,11 @@ void* hak_tiny_alloc(size_t size) {
HAK_RET_ALLOC(class_idx, head);
}
// Refill a small batch directly from TLS-cached SuperSlab
#if HAKMEM_TINY_P0_BATCH_REFILL
(void)sll_refill_batch_from_ss(class_idx, 32);
#else
(void)sll_refill_small_from_ss(class_idx, 32);
#endif
head = g_tls_sll_head[class_idx];
if (__builtin_expect(head != NULL, 1)) {
g_tls_sll_head[class_idx] = *(void**)head;
@ -189,7 +193,11 @@ void* hak_tiny_alloc(size_t size) {
(class_idx == 1) ? HAKMEM_TINY_BENCH_WARMUP16 :
(class_idx == 2) ? HAKMEM_TINY_BENCH_WARMUP32 :
HAKMEM_TINY_BENCH_WARMUP64;
#if HAKMEM_TINY_P0_BATCH_REFILL
if (warm > 0) (void)sll_refill_batch_from_ss(class_idx, warm);
#else
if (warm > 0) (void)sll_refill_small_from_ss(class_idx, warm);
#endif
*done = 1;
}
}
@ -218,7 +226,11 @@ void* hak_tiny_alloc(size_t size) {
(class_idx == 1) ? HAKMEM_TINY_BENCH_REFILL16 :
(class_idx == 2) ? HAKMEM_TINY_BENCH_REFILL32 :
HAKMEM_TINY_BENCH_REFILL64;
#if HAKMEM_TINY_P0_BATCH_REFILL
if (__builtin_expect(sll_refill_batch_from_ss(class_idx, bench_refill) > 0, 0)) {
#else
if (__builtin_expect(sll_refill_small_from_ss(class_idx, bench_refill) > 0, 0)) {
#endif
head = g_tls_sll_head[class_idx];
if (head) {
tiny_debug_ring_record(TINY_RING_EVENT_ALLOC_SUCCESS, (uint16_t)class_idx, head, 2);