Phase B: TinyFrontC23Box - Ultra-simple front path for C2/C3
Implemented dedicated fast path for C2/C3 (128B/256B) to bypass
SFC/SLL/Magazine complexity and directly access FastCache + SuperSlab.

Changes:
- core/front/tiny_front_c23.h: New ultra-simple front path (NEW)
  - Direct FC → SS refill (2 layers vs 5+ in generic path)
  - ENV-gated: HAKMEM_TINY_FRONT_C23_SIMPLE=1
  - Refill target: 64 blocks (optimized via A/B testing)
- core/tiny_alloc_fast.inc.h: Hook at entry point (+11 lines)
  - Early return for C2/C3 when C23 path enabled
  - Safe fallback to generic path on failure

Results (100K iterations, A/B tested refill=16/32/64/128):
- 128B: 8.27M → 9.55M ops/s (+15.5% with refill=64) ✅
- 256B: 7.90M → 8.61M ops/s (+9.0% with refill=32) ✅
- 256B: 7.90M → 8.47M ops/s (+7.2% with refill=64) ✅

Optimal Refill: 64 blocks
- Balanced performance across C2/C3
- 128B best case: +15.5%
- 256B good performance: +7.2%
- Simple single-value default

Architecture:
- Flow: FC pop → (miss) → ss_refill_fc_fill(64) → FC pop retry
- Bypassed layers: SLL, Magazine, SFC, MidTC
- Preserved: Box boundaries, safety checks, fallback paths
- Free path: Unchanged (TLS SLL + drain)

Box Theory Compliance:
- Clear Front ← Backend boundary (ss_refill_fc_fill)
- ENV-gated A/B testing (default OFF, opt-in)
- Safe fallback: NULL → generic path handles slow case
- Zero impact when disabled

Performance Gap Analysis:
- Current: 8-9M ops/s
- After Phase B: 9-10M ops/s (+10-15%)
- Target: 15-20M ops/s
- Remaining gap: ~2x (suggests deeper bottlenecks remain)

Next Steps:
- Perf profiling to identify next bottleneck
- Current hypotheses: classify_ptr, drain overhead, refill path
- Phase C candidates: FC-direct free, inline optimizations

ENV Usage:
  # Enable C23 fast path (default: OFF)
  export HAKMEM_TINY_FRONT_C23_SIMPLE=1

  # Optional: Override refill target (default: 64)
  export HAKMEM_TINY_FRONT_C23_REFILL=32

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
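The flow named under "Architecture" (FC pop, refill on miss, FC pop retry, NULL on failure) can be illustrated with a minimal sketch. This is not the contents of tiny_front_c23.h, which this page does not show. The type `fast_cache_t`, the functions `fc_pop`, `backend_refill` and `c23_alloc_sketch`, the capacity `FC_CAP` and the static arena are all hypothetical stand-ins. Only the refill target of 64 and the "NULL means fall back to the generic path" contract come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define FC_CAP 128          /* hypothetical FastCache capacity */
#define REFILL_TARGET 64    /* default refill target named in the commit message */

/* Hypothetical per-class FastCache: a bounded LIFO stack of free blocks. */
typedef struct {
    void*  slots[FC_CAP];
    size_t count;
} fast_cache_t;

/* Stand-in backend: carves fixed-size blocks out of a static arena.
 * The real ss_refill_fc_fill() and SuperSlab are not shown on this page. */
static unsigned char g_arena[64 * 1024];
static size_t g_arena_used = 0;

/* Pushes up to `want` blocks into the cache; returns how many were pushed. */
static size_t backend_refill(fast_cache_t* fc, size_t block_size, size_t want) {
    size_t pushed = 0;
    while (pushed < want && fc->count < FC_CAP &&
           g_arena_used + block_size <= sizeof(g_arena)) {
        fc->slots[fc->count++] = &g_arena[g_arena_used];
        g_arena_used += block_size;
        pushed++;
    }
    return pushed;
}

/* Pops one block, or returns NULL when the cache is empty. */
static void* fc_pop(fast_cache_t* fc) {
    return fc->count ? fc->slots[--fc->count] : NULL;
}

/* FC pop -> (miss) -> refill -> FC pop retry.
 * A NULL result tells the caller to use the generic path instead. */
static void* c23_alloc_sketch(fast_cache_t* fc, size_t block_size) {
    void* p = fc_pop(fc);
    if (p) return p;                      /* hit: one layer touched */
    if (backend_refill(fc, block_size, REFILL_TARGET) == 0)
        return NULL;                      /* backend exhausted: caller falls back */
    p = fc_pop(fc);
    assert(p != NULL);                    /* refill reported at least one block */
    return p;
}
```

With a zero-initialized `fast_cache_t`, the first call to `c23_alloc_sketch(&fc, 128)` misses, refills 64 blocks, and returns one of them. The next 63 calls are served from the cache without touching the backend.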
@@ -26,6 +26,9 @@
 #include "box/front_gate_box.h"
 #endif
 #include "hakmem_tiny_integrity.h"  // PRIORITY 1-4: Corruption detection
+#ifdef HAKMEM_TINY_HEADER_CLASSIDX
+#include "front/tiny_front_c23.h"  // Phase B: Ultra-simple C2/C3 front
+#endif
 #include <stdio.h>
 
 // Phase 7 Task 2: Aggressive inline TLS cache access
@@ -583,6 +586,19 @@ static inline void* tiny_alloc_fast(size_t size) {
     void* ptr = NULL;
     const int hot_c5 = (g_tiny_hotpath_class5 && class_idx == 5);
 
+    // Phase B: Ultra-simple front for C2/C3 (128B/256B)
+    // ENV-gated: HAKMEM_TINY_FRONT_C23_SIMPLE=1
+    // Target: 15-20M ops/s (vs current 8-9M ops/s)
+#ifdef HAKMEM_TINY_HEADER_CLASSIDX
+    if (tiny_front_c23_enabled() && (class_idx == 2 || class_idx == 3)) {
+        void* c23_ptr = tiny_front_c23_alloc(size, class_idx);
+        if (c23_ptr) {
+            HAK_RET_ALLOC(class_idx, c23_ptr);
+        }
+        // Fall through to existing path if C23 path failed (NULL)
+    }
+#endif
+
     // NEW: Front-Direct/SLL-OFF bypass control (TLS cached, lazy init)
     static __thread int s_front_direct_alloc = -1;
     if (__builtin_expect(s_front_direct_alloc == -1, 0)) {
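The hook above calls `tiny_front_c23_enabled()`, whose body is not part of this diff. The sketch below shows one way such an ENV gate could be written, reusing the lazy-init pattern of the neighbouring `s_front_direct_alloc` (a cached flag starting at -1). The function names `c23_enabled_sketch` and `c23_refill_target_sketch`, the exact-match rule for "1", and the accepted range of 1 to 128 for the refill override are assumptions for illustration only. The two variable names and the defaults (OFF, 64) come from the commit message.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 only when HAKMEM_TINY_FRONT_C23_SIMPLE is exactly "1" (assumed rule).
 * The result is cached after the first call, so later changes to the
 * environment are not observed. Plain `static` keeps the sketch portable;
 * the surrounding code in the diff uses `static __thread` for its flags. */
static int c23_enabled_sketch(void) {
    static int cached = -1;               /* -1 = not yet read */
    if (cached == -1) {
        const char* e = getenv("HAKMEM_TINY_FRONT_C23_SIMPLE");
        cached = (e != NULL && strcmp(e, "1") == 0) ? 1 : 0;
    }
    return cached;
}

/* Returns the refill target: HAKMEM_TINY_FRONT_C23_REFILL if it parses to a
 * value in 1..128 (assumed bounds), otherwise the documented default of 64. */
static int c23_refill_target_sketch(void) {
    static int cached = -1;               /* -1 = not yet read */
    if (cached == -1) {
        const char* e = getenv("HAKMEM_TINY_FRONT_C23_REFILL");
        long v = e ? strtol(e, NULL, 10) : 0;
        cached = (v >= 1 && v <= 128) ? (int)v : 64;
    }
    return cached;
}
```

Caching the lookup keeps `getenv` off the allocation hot path, which fits the "Zero impact when disabled" claim: once the flag is cached as 0, the hook costs one predictable branch.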