diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 02eb1025..a117d520 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -829,3 +829,37 @@ C7 ULTRA alloc: after optimizing inside tiny_c7_ultra.c, self%/throughput both nearly
 - route_for_class is called mainly on the alloc side; the free side is O(1) via snapshot
 - The next phase (OPT-2) will consider a different approach (e.g. earlier domain detection)
 
+**Finding**: Per FREE_DISPATCH_STATS, ENV/route are invoked only at initialization. Nearly all of route_calls=267,967 comes from the alloc side.
+
+---
+
+## Phase ALLOC-GATE-OPT-1: tiny_alloc_gate_fast statistics measurement (2025-12-11)
+
+**Goal**: Break the alloc gate (18%) down into finer components
+- Number of size→class conversions
+- Number of route_for_class calls
+- Number of alloc-side ENV checks
+- Per-class distribution (C0-C7)
+
+**Approach**: Add statistics counters without changing behavior. Decide whether to implement an optimization in the next phase (OPT-1B).
+
+**Implementation**:
+- Added the AllocGateStats struct (size2class/route/env/class distribution)
+- Embedded counters in malloc_tiny_fast
+- ENV: HAKMEM_ALLOC_GATE_STATS (default 0)
+- No behavior change (measurement only)
+
+**Measurement results**:
+- Mixed: total=542,033, size2class=0, route_calls=0, env_checks=275,089, C4-C7=95.2%
+  - ✅ size_to_class / route_for_class are **fully eliminated** (LUT effect)
+  - ✅ C4-C7 account for 95% → the ULTRA fast path is effective
+  - env_checks ≈ c7_calls → the C7 ULTRA ENV gate is called every time (structural cost)
+- C6-heavy: total=11 → malloc_tiny_fast is almost never reached (mid/pool dominates)
+
+**Conclusion**:
+- ✅ The alloc gate is **already sufficiently optimized** (reduced via LUT + ULTRA)
+- ❌ Little room remains for further optimization (env_checks is already lightweight; the gain would be a few percent at most)
+- The next phase targets other bottlenecks such as the **free dispatcher (29%)** and **C7 ULTRA refill (7%)**
+
+**Details**: see `docs/analysis/ALLOC_GATE_ANALYSIS.md`
+
diff --git a/Makefile b/Makefile
index bfd11a7d..052c635f 100644
--- a/Makefile
+++ b/Makefile
@@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
 
 # Targets
 TARGET = test_hakmem
-OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o
hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o 
core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o
+OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o
 OBJS = $(OBJS_BASE)
 
 # Shared library
 SHARED_LIB = libhakmem.so
-SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o
core/box/free_dispatch_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o
+SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o
superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o 
hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o
 
 # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1)
 ifeq ($(POOL_TLS_PHASE1),1)
@@ -427,7 +427,7 @@ test-box-refactor: box-refactor
 	./larson_hakmem 10 8 128 1024 1 12345 4
 
 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
-TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o
core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o
+TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o
core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o 
core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o
 TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
 ifeq ($(POOL_TLS_PHASE1),1)
 TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
diff --git a/core/box/alloc_gate_stats_box.c b/core/box/alloc_gate_stats_box.c
new file mode 100644
index 00000000..fb95e1d9
--- /dev/null
+++ b/core/box/alloc_gate_stats_box.c
@@ -0,0 +1,27 @@
+#include "alloc_gate_stats_box.h"
+#include <stdio.h>
+
+AllocGateStats g_alloc_gate_stats = {0};
+
+__attribute__((destructor))
+static void alloc_gate_stats_dump(void) {
+    if (!alloc_gate_stats_enabled()) {
+        return;
+    }
+
+    fprintf(stderr, "[ALLOC_GATE_STATS] total=%lu size2class=%lu route_calls=%lu env_checks=%lu c0=%lu c1=%lu c2=%lu c3=%lu c4=%lu c5=%lu c6=%lu c7=%lu\n",
+            g_alloc_gate_stats.total_calls,
+            g_alloc_gate_stats.size_to_class_calls,
+            g_alloc_gate_stats.route_for_class_calls,
+            g_alloc_gate_stats.env_checks,
+            g_alloc_gate_stats.class_calls[0],
+            g_alloc_gate_stats.class_calls[1],
+            g_alloc_gate_stats.class_calls[2],
+            g_alloc_gate_stats.class_calls[3],
+            g_alloc_gate_stats.class_calls[4],
+            g_alloc_gate_stats.class_calls[5],
+            g_alloc_gate_stats.class_calls[6],
+            g_alloc_gate_stats.class_calls[7]);
+
+    fflush(stderr);
+}
diff --git a/core/box/alloc_gate_stats_box.h b/core/box/alloc_gate_stats_box.h
new file mode 100644
index 00000000..4d2593d3
--- /dev/null
+++ b/core/box/alloc_gate_stats_box.h
@@ -0,0 +1,43 @@
+#ifndef HAKMEM_ALLOC_GATE_STATS_BOX_H
+#define HAKMEM_ALLOC_GATE_STATS_BOX_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+
+typedef struct AllocGateStats {
+    uint64_t total_calls;            // malloc_tiny_fast entry
+
+    uint64_t size_to_class_calls;    // size→class conversions
+    uint64_t route_for_class_calls;  // class→route decisions
+    uint64_t env_checks;             // number of alloc-side ENV checks
+
+    // Per-class distribution
+    uint64_t class_calls[8];         // call counts for C0-C7
+} AllocGateStats;
+
+// ENV gate
+static inline bool alloc_gate_stats_enabled(void) {
+    static int g_enabled = -1;
+    if (__builtin_expect(g_enabled == -1, 0)) {
+        const char* e = getenv("HAKMEM_ALLOC_GATE_STATS");
+        g_enabled = (e && *e && *e != '0') ? 1 : 0;
+    }
+    return g_enabled;
+}
+
+// Global stats instance
+extern AllocGateStats g_alloc_gate_stats;
+
+// Increment macros (with unlikely guard)
+#define ALLOC_GATE_STAT_INC(field) \
+    do { if (__builtin_expect(alloc_gate_stats_enabled(), 0)) { \
+        g_alloc_gate_stats.field++; \
+    } } while(0)
+
+#define ALLOC_GATE_STAT_INC_CLASS(class_idx) \
+    do { if (__builtin_expect(alloc_gate_stats_enabled(), 0)) { \
+        if ((class_idx) >= 0 && (class_idx) < 8) g_alloc_gate_stats.class_calls[class_idx]++; \
+    } } while(0)
+
+#endif // HAKMEM_ALLOC_GATE_STATS_BOX_H
diff --git a/core/front/malloc_tiny_fast.h b/core/front/malloc_tiny_fast.h
index e601f0e4..6e137fcd 100644
--- a/core/front/malloc_tiny_fast.h
+++ b/core/front/malloc_tiny_fast.h
@@ -57,6 +57,7 @@
 #include "../box/tiny_route_env_box.h" // Route snapshot (Heap vs Legacy)
 #include "../box/tiny_front_stats_box.h" // Front class distribution counters
 #include "../box/free_path_stats_box.h" // Phase FREE-LEGACY-BREAKDOWN-1: Free path stats
+#include "../box/alloc_gate_stats_box.h" // Phase ALLOC-GATE-OPT-1: Alloc gate stats
 
 // Helper: current thread id (low 32 bits) for owner check
 #ifndef TINY_SELF_U32_LOCAL_DEFINED
@@ -123,6 +124,9 @@ static inline int front_gate_unified_enabled(void) {
 // __attribute__((always_inline))
 static inline void* malloc_tiny_fast(size_t size) {
+    // Phase ALLOC-GATE-OPT-1: counter placement (1. function entry)
+    ALLOC_GATE_STAT_INC(total_calls);
+
     const int front_v3_on = tiny_front_v3_enabled();
     const TinyFrontV3Snapshot* front_snap =
         __builtin_expect(front_v3_on, 0) ? tiny_front_v3_snapshot_get() : NULL;
@@ -143,10 +147,14 @@ static inline void* malloc_tiny_fast(size_t size) {
     }
 
     if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
+        // Phase ALLOC-GATE-OPT-1: counter placement (2. size→class conversion)
+        ALLOC_GATE_STAT_INC(size_to_class_calls);
         class_idx = hak_tiny_size_to_class(size);
         if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
             return NULL;
         }
+        // Phase ALLOC-GATE-OPT-1: counter placement (3. route_for_class call)
+        ALLOC_GATE_STAT_INC(route_for_class_calls);
         route = tiny_route_for_class((uint8_t)class_idx);
         route_trusted = false;
     } else if (!route_trusted &&
@@ -154,13 +162,20 @@ static inline void* malloc_tiny_fast(size_t size) {
                route != TINY_ROUTE_HOTHEAP_V2 && route != TINY_ROUTE_SMALL_HEAP_V3 &&
                route != TINY_ROUTE_SMALL_HEAP_V4 && route != TINY_ROUTE_SMALL_HEAP_V5 &&
                route != TINY_ROUTE_SMALL_HEAP_V6) {
+        // Phase ALLOC-GATE-OPT-1: counter placement (3. route_for_class call)
+        ALLOC_GATE_STAT_INC(route_for_class_calls);
         route = tiny_route_for_class((uint8_t)class_idx);
     }
 
     tiny_front_alloc_stat_inc(class_idx);
 
+    // Phase ALLOC-GATE-OPT-1: counter placement (4. per-class distribution)
+    ALLOC_GATE_STAT_INC_CLASS(class_idx);
+
     // C7 ULTRA allocation path (ENV: HAKMEM_TINY_C7_ULTRA_ENABLED, default ON)
     if (tiny_class_is_c7(class_idx) && tiny_c7_ultra_enabled_env()) {
+        // Phase ALLOC-GATE-OPT-1: counter placement (5. C7 ULTRA ENV check)
+        ALLOC_GATE_STAT_INC(env_checks);
         void* ultra_p = tiny_c7_ultra_alloc(size);
         if (TINY_HOT_LIKELY(ultra_p != NULL)) {
             return ultra_p;
@@ -170,6 +185,8 @@ static inline void* malloc_tiny_fast(size_t size) {
 
     // Phase 4-4: C6 ULTRA free+alloc integration (parasitic TLS cache pop)
     if (tiny_class_is_c6(class_idx) && tiny_c6_ultra_free_enabled()) {
+        // Phase ALLOC-GATE-OPT-1: counter placement (5. C6 ULTRA ENV check)
+        ALLOC_GATE_STAT_INC(env_checks);
         TinyC6UltraFreeTLS* ctx = tiny_c6_ultra_free_tls();
         if (TINY_HOT_LIKELY(ctx->count > 0)) {
             void* base = ctx->freelist[--ctx->count];
@@ -183,6 +200,8 @@ static inline void* malloc_tiny_fast(size_t size) {
 
     // Phase 5-2: C5 ULTRA free+alloc integration (same pattern as C6)
     if (tiny_class_is_c5(class_idx) && tiny_c5_ultra_free_enabled()) {
+        // Phase ALLOC-GATE-OPT-1: counter placement (5. C5 ULTRA ENV check)
+        ALLOC_GATE_STAT_INC(env_checks);
         TinyC5UltraFreeTLS* ctx = tiny_c5_ultra_free_tls();
         if (TINY_HOT_LIKELY(ctx->count > 0)) {
             void* base = ctx->freelist[--ctx->count];
@@ -193,6 +212,8 @@ static inline void* malloc_tiny_fast(size_t size) {
 
     // Phase 6: C4 ULTRA free+alloc integration (same pattern as C5/C6, cap=64)
     if (tiny_class_is_c4(class_idx) && tiny_c4_ultra_free_enabled()) {
+        // Phase ALLOC-GATE-OPT-1: counter placement (5. C4 ULTRA ENV check)
+        ALLOC_GATE_STAT_INC(env_checks);
         TinyC4UltraFreeTLS* ctx = tiny_c4_ultra_free_tls();
         if (TINY_HOT_LIKELY(ctx->count > 0)) {
             void* base = ctx->freelist[--ctx->count];
diff --git a/docs/analysis/ALLOC_GATE_ANALYSIS.md b/docs/analysis/ALLOC_GATE_ANALYSIS.md
new file mode 100644
index 00000000..edfadb19
--- /dev/null
+++ b/docs/analysis/ALLOC_GATE_ANALYSIS.md
@@ -0,0 +1,118 @@
+# ALLOC_GATE_ANALYSIS
+
+## Phase ALLOC-GATE-OPT-1 Measurement Results
+
+### Mixed 16-1024B (1M iter, ws=400)
+
+```
+[ALLOC_GATE_STATS] total=542033 size2class=0 route_calls=0 env_checks=275089 c0=0 c1=0 c2=8746 c3=17279 c4=34727 c5=68871 c6=137321 c7=275089
+```
+
+**Throughput**: 42.5M ops/s
+
+**Analysis**:
+- total_calls: 542,033 (number of malloc_tiny_fast calls)
+- size_to_class: **0 calls** (not called per allocation!)
+- route_for_class: **0 calls** (not called per allocation!)
+  - avg per alloc: 0 calls/alloc (fully eliminated by the LUT)
+- env_checks: 275,089 (dominated by the C7 ULTRA ENV gate)
+  - env_checks ≈ c7 calls → C7 ULTRA performs an ENV check on every call
+- Class distribution:
+  - C7: 50.8% (275,089 / 542,033)
+  - C6: 25.3% (137,321 / 542,033)
+  - C5: 12.7% (68,871 / 542,033)
+  - C4: 6.4% (34,727 / 542,033)
+  - C3: 3.2% (17,279 / 542,033)
+  - C2: 1.6% (8,746 / 542,033)
+  - C1/C0: 0.0%
+  - **C4-C7: 95.2%** (hot classes dominate)
+
+**Comments**:
+- **size_to_class and route_for_class are fully eliminated** → the LUT (tiny_front_v3) is working
+- **env_checks occurs on every C7 alloc** → tiny_c7_ultra_enabled_env() is called every time
+  - ENV checks for C4/C5/C6 ULTRA are not counted (env_checks equals the C7 count): the counter sits inside each gate's body, so it increments only when that class's ULTRA gate is enabled
+- **C4-C7 account for 95%** → a class-specific fast path is worth considering (though ULTRA already covers it)
+
+### C6-heavy (257-768B, 1M iter, ws=400)
+
+```
+[ALLOC_GATE_STATS] total=11 size2class=0 route_calls=0 env_checks=0 c0=0 c1=1 c2=1 c3=0 c4=0 c5=0 c6=9 c7=0
+```
+
+**Throughput**: 27.4M ops/s
+
+**Analysis**:
+- total_calls: **11** (almost everything falls through to the mid route)
+- size_to_class: 0 calls
+- route_for_class: 0 calls
+- env_checks: 0 (because C6 ULTRA is OFF)
+- Class distribution:
+  - C6: 81.8% (9 / 11)
+  - C1/C2: 1 call each (initialization noise)
+
+**Comments**:
+- **In C6-heavy, malloc_tiny_fast is almost never called** → the mid/pool path dominates
+- total_calls=11 is noise from initialization, such as SuperSlab allocation
+- Optimizing the alloc gate has no effect for C6-heavy (the workload does not pass through it)
+
+---
+
+## Phase ALLOC-GATE-OPT-1B Candidate Measures
+
+### Candidate A: Snapshot the C7 ULTRA ENV gate
+
+**Condition**: env_checks ≈ c7_calls (called every time)
+**Measure**: evaluate tiny_c7_ultra_enabled_env() only once at initialization and keep the result in a TLS snapshot
+**Expected**: reduced ENV check overhead (although tiny_c7_ultra_enabled_env() itself is already static cached)
+
+**Assessment**: limited effect (the ENV gate is already optimized with the sentinel pattern)
+
+### Candidate B: Snapshot size_to_class / route_for_class
+
+**Condition**: size_to_class=0, route_calls=0 (already eliminated)
+**Measure**: not needed (fully eliminated by the LUT)
+
+**Assessment**: **already optimized; no further improvement**
+
+### Candidate C: class-specific fast path (C4-C7)
+
+**Condition**: C4-C7 account for 95% or more
+**Measure**: branch into a straight-line path for C4-C7 and the old route for other sizes
+**Expected**: make the hot classes fully straight-line
+
+**Assessment**: **already implemented by C4-C7 ULTRA** (fast path established via the parasitic TLS cache)
+
+---
+
+## Decision Criteria
+
+### ✅ Good findings
+- **size_to_class / route_for_class are already fully eliminated** (LUT effect)
+- **C4-C7 ULTRA already establishes the fast path for hot classes** (95% coverage)
+
+### ❌ Little room for further optimization
+- env_checks is a structural cost of C7 ULTRA (designed to check ENV on every call)
+  - tiny_c7_ultra_enabled_env() itself is already lightweight (sentinel cached)
+  - Even with a snapshot, the gain is within the margin of error (a few percent at most)
+- The alloc gate internals are already thin enough (optimized via LUT + ULTRA)
+
+### Implications for the next phase
+- **Target somewhere other than the alloc gate**
+  - In PERF-ULTRA-REBASE-3, tiny_alloc_gate_fast = 18%, but that is spread across "LUT overhead", "ULTRA ENV check", "class dispatch", and so on
+  - Rather than shaving the 18%, targeting other bottlenecks (free dispatcher 29%, C7 ULTRA refill 7%, etc.) is more effective
+- **The alloc gate itself is judged "already optimized"**, as in Phase FREE-DISPATCHER-OPT-1
+
+---
+
+## Conclusion
+
+**Outcomes of Phase ALLOC-GATE-OPT-1**:
+- ✅ Made the alloc gate breakdown visible
+- ✅ Confirmed that size_to_class / route_for_class are fully eliminated (LUT effect)
+- ✅ Confirmed that C4-C7 account for 95% and the ULTRA fast path is effective
+- ✅ env_checks is a structural cost of C7 ULTRA (already lightweight)
+
+**Plan for Phase ALLOC-GATE-OPT-1B**:
+- **Additional optimization is deferred**
+- The alloc gate is already thin enough, and little room remains for improvement (a few percent at most)
+- The next phase targets other bottlenecks such as the **free dispatcher (29%)** and **C7 ULTRA refill (7%)**
diff --git a/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md b/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md
index f7ddc498..2b7cafdd 100644
--- a/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md
+++ b/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md
@@ -309,3 +309,25 @@ Throughput: **12.39M ops/s** (DEBUG/-O0 equivalent)
 - **pthread_once at 8.21%**: initialization synchronization overhead stands out (evidence that the workload is light)
 
 **Remarks**: Even in C6-heavy, pool v1 serves as the main path, but the sample count is insufficient to measure the effect of ULTRA
+
+---
+
+## Assumptions before the Phase ALLOC-GATE-OPT-1 measurement (2025-12-11)
+
+**From the latest perf (REBASE-3)**:
+- tiny_alloc_gate_fast: self% ≈ 18%
+- tiny_route_for_class_calls: 267,967 calls (mostly from the alloc side)
+- In FREE_DISPATCHER, ENV/route are already reduced via snapshot
+
+→ The alloc side is likely not yet optimized
+
+**Measurement goals**:
+- Number of size→class conversions (on every call?)
+- Number of route_for_class calls (on every call, or only at initialization?)
+- Number of alloc-side ENV checks (e.g. the ENV gates of C4-C7 ULTRA)
+- Per-class distribution (which of C0-C7 dominates)
+
+**Expected findings**:
+- If route_for_class is called on every alloc → it can be reduced with a snapshot
+- If size_to_class is heavy → inline it or convert it to a LUT
+- If C4-C7 account for 80% or more → consider a class-specific fast path
diff --git a/hakmem.d b/hakmem.d
index 6d65859e..a3fad914 100644
--- a/hakmem.d
+++ b/hakmem.d
@@ -125,6 +125,7 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
  core/box/../front/../box/smallobject_v5_env_box.h \
  core/box/../front/../box/tiny_front_stats_box.h \
  core/box/../front/../box/free_path_stats_box.h \
+ core/box/../front/../box/alloc_gate_stats_box.h \
  core/box/tiny_alloc_gate_box.h core/box/tiny_route_box.h \
  core/box/tiny_front_config_box.h core/box/wrapper_env_box.h \
  core/box/../hakmem_internal.h
@@ -328,6 +329,7 @@ core/box/../front/../box/smallobject_hotbox_v4_env_box.h:
 core/box/../front/../box/smallobject_v5_env_box.h:
 core/box/../front/../box/tiny_front_stats_box.h:
 core/box/../front/../box/free_path_stats_box.h:
+core/box/../front/../box/alloc_gate_stats_box.h:
 core/box/tiny_alloc_gate_box.h:
 core/box/tiny_route_box.h:
 core/box/tiny_front_config_box.h: