hakmem/core/hakmem_tiny_superslab.h
Moe Charm (CI) c8842360ca Fix: Double header calculation bug in tiny_block_stride_for_class() - META_MISMATCH resolved
Problem:
workset=8192 crashed with META_MISMATCH errors (off-by-one):
- [TLS_SLL_PUSH_META_MISMATCH] cls=3 meta_cls=2
- [HDR_META_MISMATCH] cls=6 meta_cls=5
- [FREE_FAST_HDR_META_MISMATCH] cls=7 meta_cls=6

Root Cause (discovered by Task agent):
Contradictory stride calculations in codebase:

1. g_tiny_class_sizes[TINY_NUM_CLASSES]
   - Already includes 1-byte header (TOTAL size)
   - {8, 16, 32, 64, 128, 256, 512, 2048}

2. tiny_block_stride_for_class() (BEFORE FIX)
   - Added extra +1 for header (DOUBLE COUNTING!)
   - Class 5: 256 + 1 = 257 (should be 256)
   - Class 6: 512 + 1 = 513 (should be 512)

This caused stride → class_idx reverse lookup to fail:
- superslab_init_slab() searched g_tiny_class_sizes[?] == 257
- No match found → meta->class_idx corrupted
- Free: header has cls=6, meta has cls=5 → MISMATCH!

Fix Applied (core/hakmem_tiny_superslab.h:49-69):

- Removed duplicate +1 calculation under HAKMEM_TINY_HEADER_CLASSIDX
- Added OOB guard (return 0 for invalid class_idx)
- Added comment: "g_tiny_class_sizes already includes the 1-byte header"

Test Results:

Before fix:
- 100K iterations: META_MISMATCH errors → SEGV
- 200K iterations: Immediate SEGV

After fix:
- 100K iterations:  9.9M ops/s (no errors)
- 200K iterations:  15.2M ops/s (no errors)
- 220K iterations:  15.3M ops/s (no errors)
- 225K iterations:  SEGV (different bug, not META_MISMATCH)

Impact:
 META_MISMATCH errors completely eliminated
 Stability improved: 100K → 220K iterations (+120%)
 Throughput stable: 15M ops/s
⚠️  Different SEGV at 225K (requires separate investigation)

Investigation Credit:
- Task agent: Identified contradictory stride tables
- ChatGPT: Applied fix and verified LUT correctness

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-22 09:34:35 +09:00


// hakmem_tiny_superslab.h - SuperSlab allocator for Tiny Pool (Phase 6.22)
// Purpose: mimalloc-inspired 2MB aligned slab allocation for fast pointer→slab lookup
// License: MIT
// Date: 2025-10-24
// Phase 6-2.8: Refactored into modular headers (types, inline)
#ifndef HAKMEM_TINY_SUPERSLAB_H
#define HAKMEM_TINY_SUPERSLAB_H
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h> // Phase 8.3: For clock_gettime() in hak_now_ns()
#include <signal.h>
#include <stdio.h> // For fprintf() debugging
#include <pthread.h>
// Phase 6-2.8: Modular headers (types, inline functions)
#include "superslab/superslab_types.h"
#include "superslab/superslab_inline.h"
// Legacy includes (for backward compatibility)
#include "tiny_debug_ring.h"
#include "tiny_remote.h"
#include "hakmem_tiny_superslab_constants.h" // Phase 6-2.5: Centralized layout constants
#include "hakmem_build_flags.h"
// Debug instrumentation flags (defined in hakmem_tiny.c)
extern int g_debug_remote_guard;
extern int g_tiny_safe_free_strict;
extern _Atomic uint64_t g_ss_active_dec_calls;
uint32_t tiny_remote_drain_threshold(void);
// Monotonic clock in nanoseconds (header inline to avoid TU dependencies)
static inline uint64_t hak_now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
// ============================================================================
// Tiny block stride helper (Box 3 source-of-truth mirror)
// ============================================================================
// Returns the total per-block stride (header included) used for slab carving.
// NOTE: g_tiny_class_sizes already includes the 1-byte header for all classes.
static inline size_t tiny_block_stride_for_class(int class_idx) {
    // Local size table (avoid extern dependency for inline function)
    // CRITICAL: C7 upgraded from 1024B to 2048B stride (Phase C7-Upgrade)
    static const size_t class_sizes[8] = {8, 16, 32, 64, 128, 256, 512, 2048};
    if (__builtin_expect(class_idx < 0 || class_idx >= 8, 0)) {
        return 0;
    }
    size_t bs = class_sizes[class_idx];
#if !HAKMEM_BUILD_RELEASE
    // One-shot debug: confirm stride behavior at runtime for class 0
    static _Atomic int g_stride_dbg = 0;
    if (class_idx == 0) {
        int exp = 0;
        if (atomic_compare_exchange_strong(&g_stride_dbg, &exp, 1)) {
            fprintf(stderr, "[STRIDE_DBG] HEADER_CLASSIDX=%d class=%d stride=%zu\n",
                    (int)HAKMEM_TINY_HEADER_CLASSIDX, class_idx, bs);
        }
    }
#endif
    return bs;
}
/*
 * Phase 12 (Shared SuperSlab Pool: Stage A - Minimal Box API wrapper)
 *
 * Goals at this stage:
 *  - Introduce a single, well-defined Box/Phase12 API that the tiny front-end
 *    (slow path / refill) uses to obtain blocks from the SuperSlab layer.
 *  - Keep existing per-class SuperslabHead/g_superslab_heads and
 *    superslab_allocate() implementation intact as the internal backend.
 *  - Do NOT change behavior or allocation strategy yet; we only:
 *      - centralize the "allocate from superslab for tiny class" logic, and
 *      - isolate callers from internal Superslab details.
 *
 * This allows:
 *  - hak_tiny_alloc_slow() / refill code to stop depending on legacy internals,
 *    so later commits can switch the backend to the shared SuperSlab pool
 *    (hakmem_shared_pool.{h,c}) without touching front-end call sites.
 *
 * Stage A API (introduced here):
 *  - void* hak_tiny_alloc_superslab_box(int class_idx);
 *      - Returns a single tiny block for given class_idx, or NULL on failure.
 *  - BOX CONTRACT:
 *      - Callers pass validated class_idx (0 <= idx < TINY_NUM_CLASSES).
 *      - Returns a BASE pointer already suitable for Box/TLS-SLL/header rules.
 *      - No direct access to SuperSlab/TinySlabMeta from callers.
 *
 * NOTE:
 *  - At this stage, hak_tiny_alloc_superslab_box() is a thin inline wrapper
 *    that forwards to the existing per-class SuperslabHead backend.
 *  - Later Stage B/C patches may switch its implementation to shared_pool_*()
 *    without changing any callers.
 */
void* hak_tiny_alloc_superslab_box(int class_idx);
// Initialize a slab within SuperSlab
void superslab_init_slab(SuperSlab* ss, int slab_idx, size_t block_size, uint32_t owner_tid);
// Mark a slab as active
void superslab_activate_slab(SuperSlab* ss, int slab_idx);
// Mark a slab as inactive
void superslab_deactivate_slab(SuperSlab* ss, int slab_idx);
// Find first free slab index (-1 if none)
int superslab_find_free_slab(SuperSlab* ss);
// Free a SuperSlab (unregister and return to pool or munmap)
void superslab_free(SuperSlab* ss);
// Refill TLS slab for given tiny class from shared SuperSlab pool.
// Returns: SuperSlab* on success (also updates g_tls_slabs[class_idx]),
// NULL on failure (no change to TLS state).
SuperSlab* superslab_refill(int class_idx);
// Statistics
void superslab_print_stats(SuperSlab* ss);
// Phase 8.3: ACE statistics
void superslab_ace_print_stats(void);
// ============================================================================
// Phase 8.3: ACE (Adaptive Cache Engine) - SuperSlab adaptive sizing
// ============================================================================
// ACE tick function (called periodically, ~150ms interval)
// Observes metrics and decides promotion (1MB→2MB) or demotion (2MB→1MB)
void hak_tiny_superslab_ace_tick(int class_idx, uint64_t now_ns);
// Phase 8.4: ACE Observer (called from Learner thread - zero hot-path overhead)
void hak_tiny_superslab_ace_observe_all(void);
// ============================================================================
// Partial SuperSlab adopt/publish (per-class single-slot)
// ============================================================================
// Publish a SuperSlab with available freelist for other threads to adopt.
void ss_partial_publish(int class_idx, SuperSlab* ss);
// Adopt published SuperSlab for the class (returns NULL if none).
SuperSlab* ss_partial_adopt(int class_idx);
// ============================================================================
// SuperSlab adopt gate (publish/adopt wiring helper)
// ============================================================================
// Environment-aware switch that keeps free/alloc sides in sync. Default:
// - Disabled until cross-thread free is observed.
// - `HAKMEM_TINY_SS_ADOPT=1` forces ON, `=0` forces OFF.
int tiny_adopt_gate_should_publish(void);
int tiny_adopt_gate_should_adopt(void);
void tiny_adopt_gate_on_remote_seen(int class_idx);
// ============================================================================
// External variable declarations
// ============================================================================
extern _Atomic int g_ss_remote_seen; // set to 1 on first remote free observed
extern int g_remote_force_notify;
#endif // HAKMEM_TINY_SUPERSLAB_H