Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)

Modular implementation of hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1; see the sketch after this list)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)
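A minimal sketch of how the ENV gate could be resolved once per thread. The name `tiny_c6_inline_slots_enabled()` is taken from the diff below; the getenv-caching scheme shown here is only an illustrative assumption, not the actual env_box implementation:

```c
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

/* Sketch only: resolve HAKMEM_TINY_C6_INLINE_SLOTS once per thread.
 * -1 = not yet resolved, 0 = off (default), 1 = on. */
static __thread int g_c6_inline_enabled = -1;

static inline bool tiny_c6_inline_slots_enabled(void) {
    if (__builtin_expect(g_c6_inline_enabled < 0, 0)) {
        const char* v = getenv("HAKMEM_TINY_C6_INLINE_SLOTS");
        /* Default OFF: only "1" enables the C6 inline slots fast path. */
        g_c6_inline_enabled = (v != NULL && strcmp(v, "1") == 0) ? 1 : 0;
    }
    return g_c6_inline_enabled == 1;
}
```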

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF):  44.24 M ops/s
- Treatment (C6 inline ON):  45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: GO - Strong improvement via the C6 ring-buffer fast path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations (free-side sketch below)
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)
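The alloc-side boundary point appears in the diff below; the free-side counterpart is not shown in this excerpt. A hedged sketch of what that second boundary point might look like, where `c6_inline_push()`, `tiny_hot_free_fast_c6()`, and the `C6InlineSlots` type name are hypothetical (only `c6_inline_tls()` and `tiny_c6_inline_slots_enabled()` appear in the diff):

```c
#include <stdbool.h>

/* Assumed declarations; c6_inline_push() is a hypothetical mirror of
 * c6_inline_pop() that returns true when the pointer was absorbed. */
typedef struct C6InlineSlots C6InlineSlots;
bool tiny_c6_inline_slots_enabled(void);
C6InlineSlots* c6_inline_tls(void);
bool c6_inline_push(C6InlineSlots* s, void* base);

/* Sketch of the free-path boundary: returns true when the block was
 * absorbed into the thread-local C6 ring, letting the caller skip
 * unified_cache_push entirely. */
static inline bool tiny_hot_free_fast_c6(void* base, int class_idx) {
    if (class_idx == 6 && tiny_c6_inline_slots_enabled()) {
        if (c6_inline_push(c6_inline_tls(), base)) {
            return true;  /* C6 inline hit: unified cache path skipped */
        }
        /* Ring full -> fall through to the regular unified cache push. */
    }
    return false;
}
```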

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Author:  Moe Charm (CI)
Date:    2025-12-18 08:22:09 +09:00
Parent:  65f982aeec
Commit:  0009ce13b3
11 changed files with 743 additions and 10 deletions


@@ -31,6 +31,8 @@
#include "../front/tiny_unified_cache.h" // For TinyUnifiedCache
#include "tiny_header_box.h" // Phase 5 E5-2: For tiny_header_finalize_alloc
#include "tiny_unified_lifo_box.h" // Phase 15 v1: UnifiedCache FIFO→LIFO
#include "tiny_c6_inline_slots_env_box.h" // Phase 75-1: C6 inline slots ENV gate
#include "../front/tiny_c6_inline_slots.h" // Phase 75-1: C6 inline slots API
// ============================================================================
// Branch Prediction Macros (Pointer Safety - Prediction Hints)
@@ -110,6 +112,21 @@ __attribute__((always_inline))
static inline void* tiny_hot_alloc_fast(int class_idx) {
    extern __thread TinyUnifiedCache g_unified_cache[];
    // Phase 75-1: C6 Inline Slots early-exit (ENV gated)
    // Try C6 inline slots FIRST (before unified cache) for class 6
    if (class_idx == 6 && tiny_c6_inline_slots_enabled()) {
        void* base = c6_inline_pop(c6_inline_tls());
        if (TINY_HOT_LIKELY(base != NULL)) {
            TINY_HOT_METRICS_HIT(class_idx);
#if HAKMEM_TINY_HEADER_CLASSIDX
            return tiny_header_finalize_alloc(base, class_idx);
#else
            return base;
#endif
        }
        // C6 inline miss → fall through to unified cache
    }
    // TLS cache access (1 cache miss)
    // NOTE: Range check removed - caller (hak_tiny_size_to_class) guarantees valid class_idx
    TinyUnifiedCache* cache = &g_unified_cache[class_idx];
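For context, a speculative sketch of the thread-local state behind `c6_inline_tls()`/`c6_inline_pop()`. Only those two names come from the diff above; the fixed-capacity LIFO layout, the `C6InlineSlots` type name, and the capacity of 16 are assumptions:

```c
#include <stddef.h>

#define C6_INLINE_CAP 16   /* assumed capacity, not from the diff */

typedef struct {
    void*    slots[C6_INLINE_CAP];  /* cached C6 base pointers */
    unsigned count;                 /* number of valid entries */
} C6InlineSlots;

static __thread C6InlineSlots g_c6_inline_slots;

static inline C6InlineSlots* c6_inline_tls(void) {
    return &g_c6_inline_slots;
}

/* Pop the most recently pushed C6 block, or NULL when empty. No class_idx
 * branch and no unified-cache indexing on this path, which is the
 * branch-elimination effect the commit message describes. */
static inline void* c6_inline_pop(C6InlineSlots* s) {
    if (s->count == 0) return NULL;
    return s->slots[--s->count];
}
```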