- Add AllocGateStats struct (size2class / route / env / class distribution)
- Embed counters in malloc_tiny_fast
- ENV: HAKMEM_ALLOC_GATE_STATS (default 0)
- No behavior change (measurement only)

Measurement results:
- Mixed: total=542k, size2class=0, route_calls=0, env_checks=275k, C4-C7=95.2%
  - size_to_class / route_for_class calls are fully eliminated (LUT effect)
  - C4-C7 account for 95% of calls → the ULTRA fast path is effective
  - env_checks ≈ c7_calls → the C7 ULTRA ENV gate is evaluated on every call
- C6-heavy: total=11 → malloc_tiny_fast is almost never reached (mid/pool dominates)

Conclusion:
- The alloc gate is already well optimized (LUT + ULTRA removed the overhead)
- Little room remains for further gains (env_checks is already lightweight; expected effect is a few percent at most)
- The next phase targets other bottlenecks, such as the free dispatcher (29%) and C7 ULTRA refill (7%)

Details: docs/analysis/ALLOC_GATE_ANALYSIS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
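The C4-C7 share quoted above can be derived from the per-class counters. The following is a minimal sketch of that arithmetic; `class_share_percent` is a hypothetical helper that does not appear in the commit, and it assumes the counters are held in an array of `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the commit): percentage of all calls
 * that landed in size classes lo..hi (inclusive). Returns 0.0 when no
 * calls were recorded, to avoid dividing by zero. */
static inline double class_share_percent(const unsigned long *class_calls,
                                         size_t n, size_t lo, size_t hi) {
    assert(lo <= hi && hi < n);

    unsigned long total = 0;
    unsigned long in_range = 0;
    for (size_t i = 0; i < n; i++) {
        total += class_calls[i];
        if (i >= lo && i <= hi) {
            in_range += class_calls[i];
        }
    }
    return total == 0 ? 0.0 : 100.0 * (double)in_range / (double)total;
}
```

Called as `class_share_percent(stats.class_calls, 8, 4, 7)`, it would yield the share of C4-C7 among all counted calls. The same counters allow a direct comparison of `env_checks` against `class_calls[7]`.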
28 lines
977 B
C
#include "alloc_gate_stats_box.h"
#include <stdio.h>

// Global counters, zero-initialized; updated from the alloc gate.
AllocGateStats g_alloc_gate_stats = {0};

// Runs automatically at process exit (GCC/Clang destructor attribute).
__attribute__((destructor))
static void alloc_gate_stats_dump(void) {
    // Print nothing unless stats collection is enabled.
    if (!alloc_gate_stats_enabled()) {
        return;
    }

    // %lu assumes the counters are unsigned long; adjust the format
    // if the header declares them with a different type.
    fprintf(stderr, "[ALLOC_GATE_STATS] total=%lu size2class=%lu route_calls=%lu env_checks=%lu c0=%lu c1=%lu c2=%lu c3=%lu c4=%lu c5=%lu c6=%lu c7=%lu\n",
            g_alloc_gate_stats.total_calls,
            g_alloc_gate_stats.size_to_class_calls,
            g_alloc_gate_stats.route_for_class_calls,
            g_alloc_gate_stats.env_checks,
            g_alloc_gate_stats.class_calls[0],
            g_alloc_gate_stats.class_calls[1],
            g_alloc_gate_stats.class_calls[2],
            g_alloc_gate_stats.class_calls[3],
            g_alloc_gate_stats.class_calls[4],
            g_alloc_gate_stats.class_calls[5],
            g_alloc_gate_stats.class_calls[6],
            g_alloc_gate_stats.class_calls[7]);

    fflush(stderr);
}
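The header `alloc_gate_stats_box.h` is not shown here, so the following is only a sketch of what it might contain. The field names are taken from the dump function above; the `unsigned long` type is inferred from the `%lu` format, and the class count of 8 from the `c0`..`c7` output. The helpers `alloc_gate_flag_from_string` and `alloc_gate_stats_count` are hypothetical, and the real `alloc_gate_stats_enabled()` may parse or cache the variable differently.

```c
#include <assert.h>
#include <stdlib.h>

#define ALLOC_GATE_NUM_CLASSES 8 /* assumption: classes C0..C7, as in the dump */

/* Assumed layout: names from the dump function, types inferred from %lu. */
typedef struct {
    unsigned long total_calls;
    unsigned long size_to_class_calls;
    unsigned long route_for_class_calls;
    unsigned long env_checks;
    unsigned long class_calls[ALLOC_GATE_NUM_CLASSES];
} AllocGateStats;

/* Hypothetical helper: treat an unset, empty, or "0..." value as off. */
static inline int alloc_gate_flag_from_string(const char *value) {
    return (value != NULL && value[0] != '\0' && value[0] != '0') ? 1 : 0;
}

/* Sketch of the gate: read HAKMEM_ALLOC_GATE_STATS once, then reuse the
 * cached answer so the hot path pays only for an integer compare. */
static inline int alloc_gate_stats_enabled(void) {
    static int cached = -1; /* -1 means "not read yet" */
    if (cached < 0) {
        cached = alloc_gate_flag_from_string(getenv("HAKMEM_ALLOC_GATE_STATS"));
    }
    return cached;
}

/* Hypothetical helper: record one allocation for size class cls. */
static inline void alloc_gate_stats_count(AllocGateStats *s, unsigned cls) {
    assert(cls < ALLOC_GATE_NUM_CLASSES);
    s->total_calls++;
    s->class_calls[cls]++;
}
```

Caching the lookup keeps the default-off case cheap, which is in line with the commit's note that the change is measurement only. The counters in this sketch are plain integers, so it assumes either single-threaded use or tolerance for approximate counts.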