hakmem/core/box/free_path_stats_box.c
Moe Charm (CI) fb88725a43 Phase FREE-LEGACY-OPT-6: C4 ULTRA Implementation
Implement C4 ULTRA free TLS cache with parasitic free+alloc pattern,
achieving 99.7-99.9% elimination of C4 legacy fallback calls.

Key Features:
- TLS cache cap=64 (tuned for L1 cache fit, smaller than C5/C6's 128)
- Segment learning via ss_fast_lookup() on first free
- Free-side cache push + alloc-side TLS pop pattern
- ENV gate: HAKMEM_TINY_C4_ULTRA_FREE_ENABLED (default OFF)
- Full FREE_PATH_STATS instrumentation
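
The free-side push and alloc-side pop pattern listed above can be sketched roughly as follows. This is a minimal illustration, not the code in tiny_c4_ultra_free_box.c: the struct layout, the function names, and `demo_segment_lookup()` (a stand-in for `ss_fast_lookup()`, whose real signature is not shown here) are all assumptions. Only the cap of 64 and the learn-on-first-free idea come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define C4_ULTRA_CAP 64  /* cap taken from the commit message */

/* Hypothetical address range of the segment that owns C4 blocks. */
typedef struct {
    uintptr_t base;
    uintptr_t end;   /* one past the last byte; 0 means "not learned yet" */
} seg_range_t;

/* Hypothetical per-thread cache layout; the real struct may differ. */
typedef struct {
    void       *slots[C4_ULTRA_CAP];
    uint32_t    count;
    seg_range_t seg;
} c4_ultra_cache_t;

static _Thread_local c4_ultra_cache_t t_c4_cache;

/* Demo backing store standing in for a real segment (128 blocks of 128 B). */
unsigned char demo_segment[128 * 128];

/* Stand-in for ss_fast_lookup(): always reports the demo segment. */
seg_range_t demo_segment_lookup(void *p) {
    (void)p;
    seg_range_t r;
    r.base = (uintptr_t)demo_segment;
    r.end  = r.base + sizeof(demo_segment);
    return r;
}

/* Free side: returns 1 if the block was cached, 0 if the caller
 * should fall back to the legacy free path. */
int c4_ultra_free_push(void *p) {
    assert(p != NULL);
    c4_ultra_cache_t *c = &t_c4_cache;
    uintptr_t addr = (uintptr_t)p;

    if (c->seg.end == 0) {
        /* First free on this thread: learn the owning segment once. */
        c->seg = demo_segment_lookup(p);
    }
    if (addr < c->seg.base || addr >= c->seg.end) {
        return 0;  /* not in the learned segment */
    }
    if (c->count >= C4_ULTRA_CAP) {
        return 0;  /* cache full */
    }
    c->slots[c->count++] = p;
    return 1;
}

/* Alloc side: pops the most recently freed block, or NULL on a miss. */
void *c4_ultra_alloc_pop(void) {
    c4_ultra_cache_t *c = &t_c4_cache;
    if (c->count == 0) {
        return NULL;
    }
    return c->slots[--c->count];
}
```

In this sketch a push that returns 0 (foreign segment or full cache) is where the remaining legacy fallback calls would come from. A NULL pop means the allocator has to take its normal path.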

Benchmark Results:
C4-heavy (65-128B range):
  - C4 legacy: 591,583 → 1,711 (-99.7%)
  - c4_ultra cache hits: ~599k (free) + ~599k (alloc)
Mixed load:
  - C4 legacy: 340,732 → 284 (-99.9%)

Legacy fallback reduction:
  - C4-heavy: 589,872 fewer legacy calls (-10.9% total)
  - Mixed: 340,448 fewer C4 legacy calls (-12.8% in mixed)

Performance note: ~2% throughput cost in isolated C4-heavy case,
acceptable tradeoff for 99%+ legacy elimination per class.

Files:
  NEW: core/box/tiny_c4_ultra_free_box.h/c
  NEW: core/box/tiny_c4_ultra_free_env_box.h
  MOD: core/box/tiny_ultra_classes_box.h (added C4 macros)
  MOD: core/box/free_path_stats_box.h/c (C4 ULTRA counters)
  MOD: core/front/malloc_tiny_fast.h (C4 alloc+free integration)
  MOD: Makefile (added C4 ULTRA object)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 19:38:27 +09:00


#include "free_path_stats_box.h"
#include <stdio.h>
FreePathStats g_free_path_stats = {0};
// Helper function for pool_api.inc.h (to avoid inline include issues)
void free_path_stat_inc_pool_v1_fast(void) {
    if (__builtin_expect(free_path_stats_enabled(), 0)) {
        g_free_path_stats.pool_v1_fast++;
    }
}

// Runs automatically at process exit; dumps all counters to stderr
// when stats collection is enabled.
__attribute__((destructor))
static void free_path_stats_dump(void) {
    if (!free_path_stats_enabled()) {
        return;
    }

    fprintf(stderr,
            "[FREE_PATH_STATS] total=%lu c7_ultra=%lu "
            "c6_ultra_free=%lu c6_ultra_alloc=%lu "
            "c5_ultra_free=%lu c5_ultra_alloc=%lu "
            "c4_ultra_free=%lu c4_ultra_alloc=%lu "
            "small_v3=%lu v6=%lu tiny_v1=%lu pool_v1=%lu "
            "remote=%lu super_lookup=%lu legacy_fb=%lu\n",
            g_free_path_stats.total_calls,
            g_free_path_stats.c7_ultra_fast,
            g_free_path_stats.c6_ultra_free_fast,  // Phase 4-2
            g_free_path_stats.c6_ultra_alloc_hit,  // Phase 4-4
            g_free_path_stats.c5_ultra_free_fast,  // Phase 5-1
            g_free_path_stats.c5_ultra_alloc_hit,  // Phase 5-2
            g_free_path_stats.c4_ultra_free_fast,  // Phase 6
            g_free_path_stats.c4_ultra_alloc_hit,  // Phase 6
            g_free_path_stats.smallheap_v3_fast,
            g_free_path_stats.smallheap_v6_fast,
            g_free_path_stats.tiny_heap_v1_fast,
            g_free_path_stats.pool_v1_fast,
            g_free_path_stats.remote_free,
            g_free_path_stats.super_lookup_called,
            g_free_path_stats.legacy_fallback);

    // Phase 4-1: Legacy per-class breakdown
    fprintf(stderr,
            "[FREE_PATH_STATS_LEGACY_BY_CLASS] "
            "c0=%lu c1=%lu c2=%lu c3=%lu c4=%lu c5=%lu c6=%lu c7=%lu\n",
            g_free_path_stats.legacy_by_class[0],
            g_free_path_stats.legacy_by_class[1],
            g_free_path_stats.legacy_by_class[2],
            g_free_path_stats.legacy_by_class[3],
            g_free_path_stats.legacy_by_class[4],
            g_free_path_stats.legacy_by_class[5],
            g_free_path_stats.legacy_by_class[6],
            g_free_path_stats.legacy_by_class[7]);
    fflush(stderr);
}
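
The gated-counter pattern used in this file (check an "enabled" predicate, hinted as unlikely, before touching the counter) might be implemented along these lines. `free_path_stats_enabled()` lives in free_path_stats_box.h and is not shown here, so everything below is an assumption for illustration: the environment variable name `SKETCH_FREE_PATH_STATS`, the struct, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical env var name; the real gate may use a different one. */
#define SKETCH_STATS_ENV "SKETCH_FREE_PATH_STATS"

/* Hypothetical, reduced counter set for the sketch. */
typedef struct {
    uint64_t total_calls;
    uint64_t fast_hits;
} sketch_stats_t;

sketch_stats_t g_sketch_stats = {0};

/* Returns 1 when stats are enabled. The environment is read once and
 * the answer cached, so the hot path pays only a load and a branch.
 * The lazy init is not synchronized; racing threads would compute the
 * same value, but a stricter build could use an atomic here. */
int sketch_stats_enabled(void) {
    static int cached = -1;  /* -1 = not yet read */
    if (cached < 0) {
        const char *v = getenv(SKETCH_STATS_ENV);
        cached = (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
    }
    return cached;
}

/* Counter bump guarded by the gate; the branch is hinted as unlikely
 * because stats are expected to be off in normal runs.
 * __builtin_expect is a GCC/Clang builtin, as in the file above. */
void sketch_stat_inc_fast(void) {
    if (__builtin_expect(sketch_stats_enabled(), 0)) {
        g_sketch_stats.fast_hits++;
    }
}
```

With the variable unset or set to "0", `sketch_stat_inc_fast()` leaves the counter untouched. Because the value is cached on first use, changing the environment afterwards has no effect in this sketch.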