diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md index 1d180b2b..9ef2258d 100644 --- a/CURRENT_TASK.md +++ b/CURRENT_TASK.md @@ -27,6 +27,10 @@ - `core/box/tiny_page_box.h` / `core/box/tiny_page_box.c` を追加し、`HAKMEM_TINY_PAGE_BOX_CLASSES` で有効クラスを制御できる Page Box を実装。 - `tiny_tls_bind_slab()` から `tiny_page_box_on_new_slab()` を呼び出し、TLS が bind した C7 slab を per-thread の page pool に登録。 - `unified_cache_refill()` の先頭に Page Box 経路を追加し、C7 では「TLS が掴んでいるページ内 freelist/carve」からバッチ供給を試みてから Warm Pool / Shared Pool に落ちるようにした(Box 境界は `Tiny Page Box → Warm Pool → Shared Pool` の順序を維持)。 +- TinyClassPolicy/Stats/Learner Box を追加し、Hot path は `tiny_policy_get(class_idx)` で Page/Warm ポリシーを読むだけに統一。 + - FROZEN デフォルト(legacy プロファイル):Page Box は C5〜C7 のみ ON、Warm は C0〜C7 すべて ON(C0〜C4 cap=4、C5〜C7 cap=8)。 + - ENV `HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all` で切替可能(未指定は legacy)。 + - Stats は OBSERVE 用に積むだけ、Learner は空実装のまま。 - TLS Bind Box の導入: - `core/box/ss_tls_bind_box.h` に `ss_tls_bind_one()` を追加し、「Superslab + slab_idx → TLS」のバインド処理(`superslab_init_slab` / `meta->class_idx` 設定 / `tiny_tls_bind_slab`)を 1 箇所に集約。 - `superslab_refill()`(Shared Pool 経路)および Warm Pool 実験経路から、この Box を経由して TLS に接続するよう統一。 @@ -43,8 +47,18 @@ - UC ミスを Warm/TLS/Shared 別に分類 を Debug ビルドで観測可能にした。 - `bench_random_mixed.c` に `HAKMEM_BENCH_C7_ONLY=1` を追加し、C7 サイズ専用の micro-bench を追加。 +- TinyClassPolicy / Stats / Learner Box の導入(初期フェーズ): + - `core/box/tiny_class_policy_box.{h,c}` にクラス別ポリシー構造体 `TinyClassPolicy` と `tiny_policy_get(class_idx)` を追加。 + - FROZEN デフォルト: Page Box = C5–C7, Warm = 全クラス(C0–C4 cap=4 / C5–C7 cap=8)。 + - `HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all` でプロファイル切替可能(未知値は legacy にフォールバック)。 + - `core/box/tiny_class_stats_box.{h,c}` に OBSERVE 用の軽量カウンタ(UC miss / Warm hit / Shared Pool lock など)を追加。 + - `core/box/tiny_policy_learner_box.{h,c}` に Learner の骨組みを追加(現状は FROZEN/OBSERVE モード向けの雛形)。 + - `core/front/tiny_unified_cache.c` / Page Box / Warm Pool 経路を `tiny_policy_get(class_idx)` ベースでゲートし、Hot path からは 
Policy Box を読む形に統一。 ### 性能の現状(Random Mixed, HEAD) +- 注記 (2025-12-05, policy legacy プロファイル試験値): + - Release: `HAKMEM_TINY_PROFILE=full HAKMEM_TINY_POLICY_PROFILE=legacy ./bench_random_mixed_hakmem 1000000 256 42` → 約 4.9M ops/s(導入前 27M との乖離あり、要フォロー)。 + - Release C7-only: `HAKMEM_BENCH_C7_ONLY=1 ... HAKMEM_TINY_POLICY_PROFILE=legacy` → 約 2.7M ops/s(空スラブガード導入前の遅さに戻っており要再調査)。 - 条件: `bench_random_mixed_hakmem 1000000 256 42`(1T, ws=256, RELEASE, 16–1024B) - HAKMEM: 約 27.6M ops/s(C7 Warm/TLS 修復後) - system malloc: 約 90–100M ops/s @@ -76,6 +90,26 @@ - C5/C6 でも同様の Warm/TLS 最適化・空スラブガードを適用するか、 - Random Mixed 全体のボトルネック(Shared Pool ロック/Wrapper/mid-size path など)を洗うかを選択。 +### 次フェーズ(Tiny 全クラス向け Page Box / Warm / Policy 汎用化の検討) +- 方向性: + - 現在は C7 向け Tiny-Plus(Page Box + Warm Pool + TLS Bind)が安定したため、C1〜C7 まで「候補」として広げつつ、 + 実際にどのクラスで有効化するかは Policy Box(学習/ENV)側で制御する設計に進める。 +- 設計方針(案): + - `TinyClassPolicyBox` を新設し、クラス別ポリシー構造体(`TinyClassPolicy{ page_box_enabled, warm_enabled, warm_cap, ... }`)を配列で保持。 + - Hot path(Tiny Front / Unified Cache / Page Box / Warm Pool)は `tiny_policy_get(class_idx)` でポリシーを読むだけにし、 + 学習/更新は `TinyPolicyLearnerBox` 側で行う。 + - `TinyClassStatsBox` を導入し、クラス別に UC miss / warm hit / shared_pool_lock などの軽量カウンタを記録(OBSERVE/LEARN モード用)。 + - モードは FROZEN / OBSERVE / LEARN を ENV で切替可能にし、デフォルトは FROZEN(C5–C7 のみ Page Box/Warm ON, 他クラス OFF)。 +- 実装ステップ(案): + 1. C7 Page Box / Warm / TLS Bind の API を「class_idx を引数に取る汎用形」に整理し、内部で `if (!policy->page_box_enabled) fallback` する形にリファクタ。 + 2. `TinyClassPolicy` struct と `tiny_policy_get(class_idx)` を導入し、Hot path から直接 `HAKMEM_*` ENV を参照しないようにする(Policy Box 経由に統一)。 + 3. `TinyClassStatsBox` を追加し、FROZEN/OBSERVE モードで C1〜C7 の stats を集計(policy はまだ固定)。 + 4. 
`TinyPolicyLearnerBox` を追加し、LEARN モードで stats をもとに `page_box_enabled[]` / `warm_cap[]` を更新(ただし「同時に ON にできるクラス数」に上限を設ける)。 +- 進捗メモ(実装済み): + - `TinyClassPolicyBox`/`TinyClassStatsBox`/`TinyPolicyLearnerBox` を追加し、デフォルトで C5〜C7 に Page Box + Warm を許可(Warm cap=8)。 + - unified_cache_refill の Page/Warm 経路は `tiny_policy_get()` の返り値でゲートし、Warm push は per-class cap を尊重。 + - Page Box 初期化もデフォルトで C5〜C7 を有効化。OBSERVE 用の軽量 stats increment を UC miss / Warm hit に接続済み。 + ### メモ - ページフォルト問題は Prefault Box + ウォームアップで一定水準まで解消済みで、現在の主ボトルネックはユーザー空間の箱(Unified Cache / free / Pool)側に移っている。 -- 以降の最適化は「箱を削る」ではなく、「HOT 層で踏む箱を減らし、Tiny 的なシンプル経路をどこまで広げるか」にフォーカスする。 +- 以降の最適化は「箱を削る」ではなく、「HOT 層で踏む箱を減らし、Tiny 的なシンプル経路と Tiny-Plus 経路(Page Box + Warm)をクラス別ポリシーでどう使い分けるか」にフォーカスする。 diff --git a/Makefile b/Makefile index 362be12e..6ba2cc32 100644 --- a/Makefile +++ b/Makefile @@ -219,12 +219,12 @@ LDFLAGS += $(EXTRA_LDFLAGS) # Targets TARGET = test_hakmem -OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o 
core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o 
core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o OBJS = $(OBJS_BASE) # Shared library SHARED_LIB = libhakmem.so -SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o 
hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o +SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o 
core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1) ifeq ($(POOL_TLS_PHASE1),1) @@ -251,7 +251,7 @@ endif # Benchmark targets BENCH_HAKMEM = bench_allocators_hakmem BENCH_SYSTEM = bench_allocators_system -BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o 
tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o +BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o 
hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o @@ -428,7 +428,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o 
core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o +TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o 
core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o diff --git a/README_PERF_ANALYSIS.md b/README_PERF_ANALYSIS.md index 64b01fc6..08f0f16b 100644 --- a/README_PERF_ANALYSIS.md +++ b/README_PERF_ANALYSIS.md @@ -1,7 +1,8 @@ # HAKMEM Allocator Performance Analysis Results **最新メモ (2025-12-05)**: C7 Warm/TLS Bind は本番経路を Bind-only (mode=1) に統一。Debug では `HAKMEM_WARM_TLS_BIND_C7=0/1/2` で切替可能だが、Release は常に mode=1 固定。C7-only ワークロードでは mode=1 が legacy (mode=0) 比で ~4–10x 速く、mode=2 は TLS carve 実験として残置。 -**追記 (2025-12-05, Release 修復)**: Release だけ C7 Warm が死んでいた原因は「満杯 C7 slab が Shared Pool に居残り、空スラブが Warm に渡っていなかった」こと。Acquire で C7 は空スラブ限定、Release でメタをリセットするガードを導入し、C7-only Release で ~18.8M ops/s、Random Mixed Release で ~27–28M ops/s まで回復。 +**追記 (2025-12-05, Release 修復)**: Release だけ C7 Warm が死んでいた原因は「満杯 C7 slab が Shared Pool に居残り、空スラブが Warm に渡っていなかった」こと。Acquire で 
C7 は空スラブ限定、Release でメタをリセットするガードを導入し、C7-only Release で ~18.8M ops/s、Random Mixed Release で ~27–28M ops/s まで回復。 +**追記 (2025-12-05, Policy Box)**: `TinyClassPolicyBox` を導入し、`HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all` で Page/Warm ポリシーを切替可能にした。現状 legacy(PageBox=C5–C7, Warm=全クラス cap 4/8)でランダム混在 Release は ~4.9M ops/s と低下しており、Warm 経路の有効化状態を追加調査中。 **分析実施日**: 2025-11-28 **分析対象**: HAKMEM allocator (commit 0ce20bb83) diff --git a/core/box/link_missing_stubs.c b/core/box/link_missing_stubs.c new file mode 100644 index 00000000..eb87b761 --- /dev/null +++ b/core/box/link_missing_stubs.c @@ -0,0 +1,50 @@ +// link_missing_stubs.c +// Weak fallback definitions for optional diagnostics that may be compiled out +// in certain build configurations. These ensure linking succeeds even when +// the corresponding feature boxes are not included. + +#include <stdint.h> +#include <stdatomic.h> + +// Minimal forward declarations to avoid pulling full tracing headers +typedef int ptr_trace_event_t; +typedef struct SlabRecyclingStats { + uint64_t recycle_attempts; + uint64_t recycle_success; + uint64_t recycle_skip_not_empty; + uint64_t recycle_skip_no_cap; + uint64_t recycle_skip_null; +} SlabRecyclingStats; + +// lock_stats_box.h が存在しないビルド構成向けに前方宣言だけ置く +void lock_stats_init(void); + +// Ptr trace counters (used by tls_sll) +_Atomic uint64_t g_ptr_trace_op_counter __attribute__((weak)) = 0; + +void ptr_trace_record_impl(ptr_trace_event_t event, void* ptr, int class_idx, uint64_t op_num, + void* aux_ptr, uint32_t aux_u32, int aux_int, + const char* file, int line) + __attribute__((weak)); + +void ptr_trace_record_impl(ptr_trace_event_t event, void* ptr, int class_idx, uint64_t op_num, + void* aux_ptr, uint32_t aux_u32, int aux_int, + const char* file, int line) +{ + (void)event; + (void)ptr; + (void)class_idx; + (void)op_num; + (void)aux_ptr; + (void)aux_u32; + (void)aux_int; + (void)file; + (void)line; +} + +// Slab recycling stats (used in TLS drain instrumentation) +__thread SlabRecyclingStats
g_slab_recycle_stats __attribute__((weak)) = {0}; + +// Lock stats init (contention metrics) +void lock_stats_init(void) __attribute__((weak)); +void lock_stats_init(void) {} diff --git a/core/box/tiny_class_policy_box.c b/core/box/tiny_class_policy_box.c new file mode 100644 index 00000000..01939997 --- /dev/null +++ b/core/box/tiny_class_policy_box.c @@ -0,0 +1,94 @@ +// tiny_class_policy_box.c - Initialization of per-class Tiny policy table + +#include "tiny_class_policy_box.h" +#include <stdatomic.h> +#include <stdio.h> +#include <stdlib.h> +#include <strings.h> + +TinyClassPolicy g_tiny_class_policy[TINY_NUM_CLASSES]; +static _Atomic int g_tiny_class_policy_init_done = 0; +static _Atomic int g_tiny_class_policy_logged = 0; + +static inline TinyClassPolicy tiny_class_policy_default_entry(void) { + TinyClassPolicy p = {0}; + p.page_box_enabled = 0; + p.warm_enabled = 0; + p.warm_cap = 0; + return p; +} + +static void tiny_class_policy_set_legacy(void) { + TinyClassPolicy def = tiny_class_policy_default_entry(); + for (int i = 0; i < TINY_NUM_CLASSES; i++) { + g_tiny_class_policy[i] = def; + } + + // legacy: Page Box は C5–C7、Warm は全クラス ON(C0–C4 は控えめ cap) + for (int i = 0; i < TINY_NUM_CLASSES; i++) { + g_tiny_class_policy[i].warm_enabled = 1; + g_tiny_class_policy[i].warm_cap = (i < 5) ?
4 : 8; + } + for (int i = 5; i < TINY_NUM_CLASSES; i++) { + g_tiny_class_policy[i].page_box_enabled = 1; + } +} + +static void tiny_class_policy_set_c5_7_only(void) { + TinyClassPolicy def = tiny_class_policy_default_entry(); + for (int i = 0; i < TINY_NUM_CLASSES; i++) { + g_tiny_class_policy[i] = def; + } + for (int i = 5; i < TINY_NUM_CLASSES; i++) { + g_tiny_class_policy[i].page_box_enabled = 1; + g_tiny_class_policy[i].warm_enabled = 1; + g_tiny_class_policy[i].warm_cap = 8; + } +} + +static void tiny_class_policy_set_tinyplus_all(void) { + // いまは legacy と同じ挙動でエントリを用意しておく。 + tiny_class_policy_set_legacy(); +} + +static const char* tiny_class_policy_set_profile(const char* profile) { + if (profile == NULL || *profile == '\0' || strcasecmp(profile, "legacy") == 0) { + tiny_class_policy_set_legacy(); + return "legacy"; + } else if (strcasecmp(profile, "c5_7_only") == 0) { + tiny_class_policy_set_c5_7_only(); + return "c5_7_only"; + } else if (strcasecmp(profile, "tinyplus_all") == 0) { + tiny_class_policy_set_tinyplus_all(); + return "tinyplus_all"; + } else { + // 不明な値は安全側で legacy にフォールバック。 + tiny_class_policy_set_legacy(); + return "legacy"; + } +} + +void tiny_class_policy_init_once(void) { + if (atomic_load_explicit(&g_tiny_class_policy_init_done, memory_order_acquire)) { + return; + } + + const char* profile = getenv("HAKMEM_TINY_POLICY_PROFILE"); + const char* active_profile = tiny_class_policy_set_profile(profile); + + // 1-shot ダンプでポリシーの内容を可視化(デバッグ用) + if (atomic_exchange_explicit(&g_tiny_class_policy_logged, 1, memory_order_acq_rel) == 0) { + fprintf(stderr, "[POLICY_INIT] profile=%s\n", active_profile); + for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { + TinyClassPolicy* p = &g_tiny_class_policy[cls]; + fprintf(stderr, + " C%d: page=%u warm=%u cap=%u\n", + cls, + p->page_box_enabled, + p->warm_enabled, + p->warm_cap); + } + } + + atomic_store_explicit(&g_tiny_class_policy_init_done, 1, memory_order_release); +} diff --git 
a/core/box/tiny_class_policy_box.h b/core/box/tiny_class_policy_box.h new file mode 100644 index 00000000..292cd81a --- /dev/null +++ b/core/box/tiny_class_policy_box.h @@ -0,0 +1,41 @@ +// tiny_class_policy_box.h - Class-scoped policy box for Tiny front-end +// +// Purpose: +// - Centralize per-class feature toggles (Page Box / Warm Pool / caps). +// - Keep hot paths free from direct ENV parsing or scattered conditionals. +// - Defaults: +// legacy (デフォルト): Page Box は C5–C7、Warm は C0–C7 で cap は小さめ +// c5_7_only: Page/Warm とも C5–C7 のみ +// tinyplus_all: 予備プロファイル(当面 legacy と同等) +// - ENV: HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all +// - Learner が入るまでは固定ポリシーで運用し、Hot path は tiny_policy_get() を見るだけに保つ。 + +#ifndef TINY_CLASS_POLICY_BOX_H +#define TINY_CLASS_POLICY_BOX_H + +#include <stdint.h> +#include <stddef.h> +#include "../hakmem_tiny_config.h" + +typedef struct TinyClassPolicy { + uint8_t page_box_enabled; // Enable Tiny Page Box for this class + uint8_t warm_enabled; // Enable Warm Pool for this class + uint8_t warm_cap; // Max warm SuperSlabs to keep (per-thread) + uint8_t reserved; +} TinyClassPolicy; + +extern TinyClassPolicy g_tiny_class_policy[TINY_NUM_CLASSES]; + +// Initialize policy table once (idempotent). +void tiny_class_policy_init_once(void); + +// Lightweight accessor for hot paths.
+static inline const TinyClassPolicy* tiny_policy_get(int class_idx) { + if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) { + return NULL; + } + tiny_class_policy_init_once(); + return &g_tiny_class_policy[class_idx]; +} + +#endif // TINY_CLASS_POLICY_BOX_H diff --git a/core/box/tiny_class_stats_box.c b/core/box/tiny_class_stats_box.c new file mode 100644 index 00000000..3ebc5de8 --- /dev/null +++ b/core/box/tiny_class_stats_box.c @@ -0,0 +1,10 @@ +// tiny_class_stats_box.c - Thread-local stats storage for Tiny classes + +#include "tiny_class_stats_box.h" +#include <string.h> + +__thread TinyClassStatsThread g_tiny_class_stats = {0}; + +void tiny_class_stats_reset_thread(void) { + memset(&g_tiny_class_stats, 0, sizeof(g_tiny_class_stats)); +} diff --git a/core/box/tiny_class_stats_box.h b/core/box/tiny_class_stats_box.h new file mode 100644 index 00000000..0f74c3ef --- /dev/null +++ b/core/box/tiny_class_stats_box.h @@ -0,0 +1,42 @@ +// tiny_class_stats_box.h - Lightweight per-thread class stats (OBSERVE layer) +// +// Purpose: +// - Provide per-class counters without atomics for cheap observation. +// - Hot paths call small inline helpers; aggregation/printing can be added later.
+ +#ifndef TINY_CLASS_STATS_BOX_H +#define TINY_CLASS_STATS_BOX_H + +#include <stdint.h> +#include "../hakmem_tiny_config.h" + +typedef struct TinyClassStatsThread { + uint64_t uc_miss[TINY_NUM_CLASSES]; // unified_cache_refill() entries (UC misses) + uint64_t warm_hit[TINY_NUM_CLASSES]; // warm pool successes + uint64_t shared_lock[TINY_NUM_CLASSES]; // shared pool lock acquisitions (hook as needed) +} TinyClassStatsThread; + +extern __thread TinyClassStatsThread g_tiny_class_stats; + +static inline void tiny_class_stats_on_uc_miss(int ci) { + if (ci >= 0 && ci < TINY_NUM_CLASSES) { + g_tiny_class_stats.uc_miss[ci]++; + } +} + +static inline void tiny_class_stats_on_warm_hit(int ci) { + if (ci >= 0 && ci < TINY_NUM_CLASSES) { + g_tiny_class_stats.warm_hit[ci]++; + } +} + +static inline void tiny_class_stats_on_shared_lock(int ci) { + if (ci >= 0 && ci < TINY_NUM_CLASSES) { + g_tiny_class_stats.shared_lock[ci]++; + } +} + +// Optional: reset per-thread counters (cold path only). +void tiny_class_stats_reset_thread(void); + +#endif // TINY_CLASS_STATS_BOX_H diff --git a/core/box/tiny_page_box.h b/core/box/tiny_page_box.h index 960acc36..6aaf4849 100644 --- a/core/box/tiny_page_box.h +++ b/core/box/tiny_page_box.h @@ -86,9 +86,9 @@ static inline void tiny_page_box_init_once(void) { const char* env = getenv("HAKMEM_TINY_PAGE_BOX_CLASSES"); if (!env || !*env) { - // Default: enable only C7 - if (7 < TINY_NUM_CLASSES) { - g_tiny_page_box_state[7].enabled = 1; + // Default: enable mid-size classes (C5–C7) + for (int c = 5; c <= 7 && c < TINY_NUM_CLASSES; c++) { + g_tiny_page_box_state[c].enabled = 1; } } else { // Parse simple comma-separated list of integers: "5,6,7" diff --git a/core/box/tiny_policy_learner_box.c b/core/box/tiny_policy_learner_box.c new file mode 100644 index 00000000..7c877081 --- /dev/null +++ b/core/box/tiny_policy_learner_box.c @@ -0,0 +1,7 @@ +// tiny_policy_learner_box.c - Placeholder learner hook + +#include "tiny_policy_learner_box.h" + +void
tiny_policy_learner_tick(void) { + // FROZEN/OBSERVE: intentionally empty. +} diff --git a/core/box/tiny_policy_learner_box.h b/core/box/tiny_policy_learner_box.h new file mode 100644 index 00000000..0d7e44b1 --- /dev/null +++ b/core/box/tiny_policy_learner_box.h @@ -0,0 +1,11 @@ +// tiny_policy_learner_box.h - Placeholder for Tiny class policy learner +// +// Current mode: FROZEN/OBSERVE (no learning). Hook remains for future LEARN mode. + +#ifndef TINY_POLICY_LEARNER_BOX_H +#define TINY_POLICY_LEARNER_BOX_H + +// Stub: will be extended when LEARN mode is enabled. +void tiny_policy_learner_tick(void); + +#endif // TINY_POLICY_LEARNER_BOX_H diff --git a/core/box/warm_pool_prefill_box.h b/core/box/warm_pool_prefill_box.h index 607d317c..1f6367d6 100644 --- a/core/box/warm_pool_prefill_box.h +++ b/core/box/warm_pool_prefill_box.h @@ -16,6 +16,8 @@ #include "../box/warm_pool_stats_box.h" #include "../box/warm_pool_rel_counters_box.h" +extern _Atomic uintptr_t g_c7_stage3_magic_ss; + static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) { if (!tls || !tls->ss) return; #if HAKMEM_BUILD_RELEASE @@ -23,8 +25,9 @@ static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) { uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed); if (n < 4) { TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx]; + uintptr_t magic = atomic_load_explicit(&g_c7_stage3_magic_ss, memory_order_relaxed); fprintf(stderr, - "[REL_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n", + "[REL_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p magic=%#lx\n", tag, (void*)tls->ss, (unsigned)tls->slab_idx, @@ -32,15 +35,17 @@ static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) { (unsigned)meta->used, (unsigned)meta->capacity, (unsigned)meta->carved, - meta->freelist); + meta->freelist, + (unsigned long)magic); } #else static _Atomic uint32_t dbg_logs = 0; uint32_t n = 
atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed); if (n < 4) { TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx]; + uintptr_t magic = atomic_load_explicit(&g_c7_stage3_magic_ss, memory_order_relaxed); fprintf(stderr, - "[DBG_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n", + "[DBG_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p magic=%#lx\n", tag, (void*)tls->ss, (unsigned)tls->slab_idx, @@ -48,7 +53,8 @@ static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) { (unsigned)meta->used, (unsigned)meta->capacity, (unsigned)meta->carved, - meta->freelist); + meta->freelist, + (unsigned long)magic); } #endif } @@ -84,7 +90,7 @@ extern SuperSlab* superslab_refill(int class_idx); // // Performance: Only triggered when pool is empty, cold path cost // -static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) { +static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls, int warm_cap_hint) { #if HAKMEM_BUILD_RELEASE if (class_idx == 7) { warm_pool_rel_c7_prefill_call(); @@ -149,7 +155,7 @@ static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) { if (budget > 1) { // Prefill mode: push to pool and load another - tiny_warm_pool_push(class_idx, tls->ss); + tiny_warm_pool_push_with_cap(class_idx, tls->ss, warm_cap_hint); warm_pool_record_prefilled(class_idx); #if HAKMEM_BUILD_RELEASE if (class_idx == 7) { diff --git a/core/front/tiny_unified_cache.c b/core/front/tiny_unified_cache.c index ee8261f2..f29e0c79 100644 --- a/core/front/tiny_unified_cache.c +++ b/core/front/tiny_unified_cache.c @@ -21,6 +21,8 @@ #include "../box/tiny_page_box.h" // Tiny-Plus Page Box (C5–C7 initial hook) #include "../box/ss_tls_bind_box.h" // Box: TLS Bind (SuperSlab -> TLS binding) #include "../box/tiny_tls_carve_one_block_box.h" // Box: TLS carve helper (shared) +#include "../box/tiny_class_policy_box.h" // Box: per-class policy (Page/Warm caps) +#include 
"../box/tiny_class_stats_box.h" // Box: lightweight per-class stats #include "../box/warm_tls_bind_logger_box.h" // Box: Warm TLS Bind logging (throttled) #define WARM_POOL_DBG_DEFINE #include "../box/warm_pool_dbg_box.h" // Box: Warm Pool C7 debug counters @@ -516,6 +518,10 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { tiny_warm_pool_init_once(); TinyUnifiedCache* cache = &g_unified_cache[class_idx]; + const TinyClassPolicy* policy = tiny_policy_get(class_idx); + int warm_enabled = policy ? policy->warm_enabled : 0; + int warm_cap = policy ? policy->warm_cap : 0; + int page_enabled = policy ? policy->page_box_enabled : 0; // ✅ Phase 11+: Ensure cache is initialized (lazy init for cold path) if (!cache->slots) { @@ -560,7 +566,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { // ========== PAGE BOX HOT PATH(Tiny-Plus 層): Try page box FIRST ========== // 将来的に C7 専用の page-level freelist 管理をここに統合する。 // いまは stub 実装で常に 0 を返すが、Box 境界としての接続だけ先に行う。 - if (tiny_page_box_is_enabled(class_idx)) { + if (page_enabled && tiny_page_box_is_enabled(class_idx)) { int page_produced = tiny_page_box_refill(class_idx, out, room); if (page_produced > 0) { // Store blocks into cache and return first @@ -573,6 +579,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { #if !HAKMEM_BUILD_RELEASE g_unified_cache_miss[class_idx]++; #endif + tiny_class_stats_on_uc_miss(class_idx); if (measure) { uint64_t end_cycles = read_tsc(); @@ -593,169 +600,198 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { // ========== WARM POOL HOT PATH: Check warm pool FIRST ========== // This is the critical optimization - avoid superslab_refill() registry scan - #if !HAKMEM_BUILD_RELEASE - atomic_fetch_add_explicit(&g_dbg_warm_pop_attempts, 1, memory_order_relaxed); - if (class_idx == 7) { - warm_pool_dbg_c7_attempt(); - } - #endif - #if HAKMEM_BUILD_RELEASE - if (class_idx == 7) { - atomic_fetch_add_explicit(&g_rel_c7_warm_pop, 1, memory_order_relaxed); - } - #endif - SuperSlab* warm_ss 
= tiny_warm_pool_pop(class_idx); - if (warm_ss) { - #if !HAKMEM_BUILD_RELEASE + if (warm_enabled) { if (class_idx == 7) { - warm_pool_dbg_c7_hit(); + const TinyClassPolicy* pol = tiny_policy_get(7); + static _Atomic int g_c7_policy_logged = 0; + if (atomic_exchange_explicit(&g_c7_policy_logged, 1, memory_order_acq_rel) == 0) { + fprintf(stderr, + "[C7_POLICY_AT_WARM] page=%u warm=%u cap=%u\n", + pol ? pol->page_box_enabled : 0, + pol ? pol->warm_enabled : 0, + pol ? pol->warm_cap : 0); + } } - // Debug-only: Warm TLS Bind experiment (C7 only) + #if !HAKMEM_BUILD_RELEASE + atomic_fetch_add_explicit(&g_dbg_warm_pop_attempts, 1, memory_order_relaxed); if (class_idx == 7) { - int warm_mode = warm_tls_bind_mode_c7(); - if (warm_mode >= 1) { - int cap = ss_slabs_capacity(warm_ss); - int slab_idx = -1; + warm_pool_dbg_c7_attempt(); + } + #endif + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + atomic_fetch_add_explicit(&g_rel_c7_warm_pop, 1, memory_order_relaxed); + } + #endif + SuperSlab* warm_ss = tiny_warm_pool_pop(class_idx); + if (warm_ss) { + if (class_idx == 7) { + #if !HAKMEM_BUILD_RELEASE + warm_pool_dbg_c7_hit(); + #endif + int warm_mode = warm_tls_bind_mode_c7(); + if (warm_mode >= 1) { + int cap = ss_slabs_capacity(warm_ss); + int slab_idx = -1; - // Simple heuristic: first slab matching class - for (int i = 0; i < cap; i++) { - if (tiny_get_class_from_ss(warm_ss, i) == class_idx) { - slab_idx = i; - break; + // Simple heuristic: first slab matching class + for (int i = 0; i < cap; i++) { + if (tiny_get_class_from_ss(warm_ss, i) == class_idx) { + slab_idx = i; + break; + } } - } - if (slab_idx >= 0) { - TinyTLSSlab* tls = &g_tls_slabs[class_idx]; - uint32_t tid = (uint32_t)(uintptr_t)pthread_self(); - if (ss_tls_bind_one(class_idx, tls, warm_ss, slab_idx, tid)) { - warm_tls_bind_log_success(warm_ss, slab_idx); + if (slab_idx >= 0) { + TinyTLSSlab* tls = &g_tls_slabs[class_idx]; + uint32_t tid = (uint32_t)(uintptr_t)pthread_self(); + if 
(ss_tls_bind_one(class_idx, tls, warm_ss, slab_idx, tid)) { + warm_tls_bind_log_success(warm_ss, slab_idx); - // Mode 2: carve a single block via TLS fast path - if (warm_mode == 2) { - warm_pool_dbg_c7_tls_attempt(); - TinyTLSCarveOneResult tls_carve = - tiny_tls_carve_one_block(tls, class_idx); - if (tls_carve.block) { - warm_tls_bind_log_tls_carve(warm_ss, slab_idx, tls_carve.block); - warm_pool_dbg_c7_tls_success(); - out[0] = tls_carve.block; - produced = 1; - tls_carved = 1; - } else { - warm_tls_bind_log_tls_fail(warm_ss, slab_idx); - warm_pool_dbg_c7_tls_fail(); + // Mode 2: carve a single block via TLS fast path + if (warm_mode == 2) { + #if !HAKMEM_BUILD_RELEASE + warm_pool_dbg_c7_tls_attempt(); + #endif + TinyTLSCarveOneResult tls_carve = + tiny_tls_carve_one_block(tls, class_idx); + if (tls_carve.block) { + warm_tls_bind_log_tls_carve(warm_ss, slab_idx, tls_carve.block); + #if !HAKMEM_BUILD_RELEASE + warm_pool_dbg_c7_tls_success(); + #endif + out[0] = tls_carve.block; + produced = 1; + tls_carved = 1; + } else { + warm_tls_bind_log_tls_fail(warm_ss, slab_idx); + #if !HAKMEM_BUILD_RELEASE + warm_pool_dbg_c7_tls_fail(); + #endif + } } } } } } - } - atomic_fetch_add_explicit(&g_dbg_warm_pop_hits, 1, memory_order_relaxed); - #endif - // HOT PATH: Warm pool hit, try to carve directly - if (produced == 0) { - #if HAKMEM_BUILD_RELEASE - if (class_idx == 7) { - warm_pool_rel_c7_carve_attempt(); - } + #if !HAKMEM_BUILD_RELEASE + atomic_fetch_add_explicit(&g_dbg_warm_pop_hits, 1, memory_order_relaxed); #endif - produced = slab_carve_from_ss(class_idx, warm_ss, out, room); - #if HAKMEM_BUILD_RELEASE - if (class_idx == 7) { + // HOT PATH: Warm pool hit, try to carve directly + if (produced == 0) { + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_rel_c7_carve_attempt(); + } + #endif + produced = slab_carve_from_ss(class_idx, warm_ss, out, room); + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + if (produced > 0) { + warm_pool_rel_c7_carve_success(); 
+ } else { + warm_pool_rel_c7_carve_zero(); + } + } + #endif if (produced > 0) { - warm_pool_rel_c7_carve_success(); - } else { - warm_pool_rel_c7_carve_zero(); + // Update active counter for carved blocks + ss_active_add(warm_ss, (uint32_t)produced); } } - #endif + if (produced > 0) { - // Update active counter for carved blocks - ss_active_add(warm_ss, (uint32_t)produced); - } - } - - if (produced > 0) { - #if !HAKMEM_BUILD_RELEASE - if (class_idx == 7) { - warm_pool_dbg_c7_carve(); - if (tls_carved) { - warm_pool_dbg_c7_uc_miss_tls(); - } else { - warm_pool_dbg_c7_uc_miss_warm(); + #if !HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + warm_pool_dbg_c7_carve(); + if (tls_carved) { + warm_pool_dbg_c7_uc_miss_tls(); + } else { + warm_pool_dbg_c7_uc_miss_warm(); + } } - } - #endif - // Success! Return SuperSlab to warm pool for next use - #if HAKMEM_BUILD_RELEASE - if (class_idx == 7) { - atomic_fetch_add_explicit(&g_rel_c7_warm_push, 1, memory_order_relaxed); - } - #endif - tiny_warm_pool_push(class_idx, warm_ss); + #endif + // Success! 
Return SuperSlab to warm pool for next use + #if HAKMEM_BUILD_RELEASE + if (class_idx == 7) { + atomic_fetch_add_explicit(&g_rel_c7_warm_push, 1, memory_order_relaxed); + } + #endif + tiny_warm_pool_push_with_cap(class_idx, warm_ss, warm_cap); - // Track warm pool hit (always compiled, ENV-gated printing) - warm_pool_record_hit(class_idx); + // Track warm pool hit (always compiled, ENV-gated printing) + warm_pool_record_hit(class_idx); + tiny_class_stats_on_warm_hit(class_idx); - // Store blocks into cache and return first - void* first = out[0]; - for (int i = 1; i < produced; i++) { - cache->slots[cache->tail] = out[i]; - cache->tail = (cache->tail + 1) & cache->mask; + // Store blocks into cache and return first + void* first = out[0]; + for (int i = 1; i < produced; i++) { + cache->slots[cache->tail] = out[i]; + cache->tail = (cache->tail + 1) & cache->mask; + } + + #if !HAKMEM_BUILD_RELEASE + g_unified_cache_miss[class_idx]++; + #endif + tiny_class_stats_on_uc_miss(class_idx); + + if (measure) { + uint64_t end_cycles = read_tsc(); + uint64_t delta = end_cycles - start_cycles; + atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global, + delta, memory_order_relaxed); + atomic_fetch_add_explicit(&g_unified_cache_misses_global, + 1, memory_order_relaxed); + // Per-class 集計(C5–C7 の refill コストを可視化) + atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_by_class[class_idx], + delta, memory_order_relaxed); + atomic_fetch_add_explicit(&g_unified_cache_misses_by_class[class_idx], + 1, memory_order_relaxed); + } + + return HAK_BASE_FROM_RAW(first); } + // SuperSlab carve failed (produced == 0) #if !HAKMEM_BUILD_RELEASE - g_unified_cache_miss[class_idx]++; + atomic_fetch_add_explicit(&g_dbg_warm_pop_carve_zero, 1, memory_order_relaxed); #endif - - if (measure) { - uint64_t end_cycles = read_tsc(); - uint64_t delta = end_cycles - start_cycles; - atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global, - delta, memory_order_relaxed); - 
atomic_fetch_add_explicit(&g_unified_cache_misses_global, - 1, memory_order_relaxed); - // Per-class 集計(C5–C7 の refill コストを可視化) - atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_by_class[class_idx], - delta, memory_order_relaxed); - atomic_fetch_add_explicit(&g_unified_cache_misses_by_class[class_idx], - 1, memory_order_relaxed); + // This slab is either exhausted or has no more available capacity + // The statistics counter 'prefilled' tracks how often we try to prefill + if (produced == 0 && tiny_warm_pool_count(class_idx) == 0) { + // Pool is empty and carve failed - prefill would help here + warm_pool_record_prefilled(class_idx); } - - return HAK_BASE_FROM_RAW(first); + } else { + #if !HAKMEM_BUILD_RELEASE + atomic_fetch_add_explicit(&g_dbg_warm_pop_empty, 1, memory_order_relaxed); + #endif } - // SuperSlab carve failed (produced == 0) - #if !HAKMEM_BUILD_RELEASE - atomic_fetch_add_explicit(&g_dbg_warm_pop_carve_zero, 1, memory_order_relaxed); - #endif - // This slab is either exhausted or has no more available capacity - // The statistics counter 'prefilled' tracks how often we try to prefill - if (produced == 0 && tiny_warm_pool_count(class_idx) == 0) { - // Pool is empty and carve failed - prefill would help here - warm_pool_record_prefilled(class_idx); - } - } else { - #if !HAKMEM_BUILD_RELEASE - atomic_fetch_add_explicit(&g_dbg_warm_pop_empty, 1, memory_order_relaxed); - #endif + // ========== COLD PATH: Warm pool miss, use superslab_refill ========== + // Track warm pool miss (always compiled, ENV-gated printing) + warm_pool_record_miss(class_idx); } - // ========== COLD PATH: Warm pool miss, use superslab_refill ========== - // Track warm pool miss (always compiled, ENV-gated printing) - warm_pool_record_miss(class_idx); - TinyTLSSlab* tls = &g_tls_slabs[class_idx]; // Step 1: Ensure SuperSlab available via normal refill // Enhanced: Use Warm Pool Prefill Box for secondary prefill when pool is empty - if (warm_pool_do_prefill(class_idx, tls) < 
0) { - return HAK_BASE_FROM_RAW(NULL); + if (warm_enabled) { + if (warm_pool_do_prefill(class_idx, tls, warm_cap) < 0) { + return HAK_BASE_FROM_RAW(NULL); + } + // After prefill: tls->ss has the final slab for carving + tls = &g_tls_slabs[class_idx]; // Reload (already done in prefill box) + } else { + if (!tls->ss) { + if (!superslab_refill(class_idx)) { + return HAK_BASE_FROM_RAW(NULL); + } + tls = &g_tls_slabs[class_idx]; + } } - // After prefill: tls->ss has the final slab for carving - // tls = &g_tls_slabs[class_idx]; // Reload (already done in prefill box) // Step 2: Direct carve from SuperSlab into local array (bypass TLS SLL!) TinySlabMeta* m = tls->meta; @@ -844,6 +880,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) { } g_unified_cache_miss[class_idx]++; #endif + tiny_class_stats_on_uc_miss(class_idx); // Measure refill cycles if (measure) { diff --git a/core/front/tiny_warm_pool.h b/core/front/tiny_warm_pool.h index 03b3d75b..f620e501 100644 --- a/core/front/tiny_warm_pool.h +++ b/core/front/tiny_warm_pool.h @@ -87,16 +87,6 @@ static inline SuperSlab* tiny_warm_pool_pop(int class_idx) { return NULL; } -// O(1) push to warm pool -// Returns: 1 if pushed successfully, 0 if pool full (caller should free to LRU) -static inline int tiny_warm_pool_push(int class_idx, SuperSlab* ss) { - if (g_tiny_warm_pool[class_idx].count < TINY_WARM_POOL_MAX_PER_CLASS) { - g_tiny_warm_pool[class_idx].slabs[g_tiny_warm_pool[class_idx].count++] = ss; - return 1; - } - return 0; -} - // Get current count (for metrics/debugging) static inline int tiny_warm_pool_count(int class_idx) { return g_tiny_warm_pool[class_idx].count; @@ -125,14 +115,35 @@ static inline int warm_pool_max_per_class(void) { return g_max; } -// Push with environment-configured capacity -static inline int tiny_warm_pool_push_tunable(int class_idx, SuperSlab* ss) { - int capacity = warm_pool_max_per_class(); - if (g_tiny_warm_pool[class_idx].count < capacity) { +// O(1) push to warm pool (cap-aware) 
+// cap_hint <=0 → use warm_pool_max_per_class() clamped to TINY_WARM_POOL_MAX_PER_CLASS +static inline int tiny_warm_pool_push_with_cap(int class_idx, SuperSlab* ss, int cap_hint) { + int limit = cap_hint; + if (limit <= 0 || limit > TINY_WARM_POOL_MAX_PER_CLASS) { + limit = warm_pool_max_per_class(); + if (limit <= 0) { + limit = TINY_WARM_POOL_MAX_PER_CLASS; + } + if (limit > TINY_WARM_POOL_MAX_PER_CLASS) { + limit = TINY_WARM_POOL_MAX_PER_CLASS; + } + } + + if (g_tiny_warm_pool[class_idx].count < limit) { g_tiny_warm_pool[class_idx].slabs[g_tiny_warm_pool[class_idx].count++] = ss; return 1; } return 0; } +// Default push (uses ENV/default cap) +static inline int tiny_warm_pool_push(int class_idx, SuperSlab* ss) { + return tiny_warm_pool_push_with_cap(class_idx, ss, -1); +} + +// Push with environment-configured capacity (legacy name) +static inline int tiny_warm_pool_push_tunable(int class_idx, SuperSlab* ss) { + return tiny_warm_pool_push_with_cap(class_idx, ss, warm_pool_max_per_class()); +} + #endif // HAK_TINY_WARM_POOL_H diff --git a/core/hakmem_shared_pool_acquire.c b/core/hakmem_shared_pool_acquire.c index 6e2bb543..8ea1892f 100644 --- a/core/hakmem_shared_pool_acquire.c +++ b/core/hakmem_shared_pool_acquire.c @@ -12,10 +12,14 @@ #include "front/tiny_warm_pool.h" // Warm Pool: Prefill during registry scans #include "box/ss_slab_reset_box.h" // Box: Reset slab metadata on reuse (C7 guard) +#include #include #include #include +// Stage3(LRU) 由来の Superslab をトレースするための簡易マジック +_Atomic uintptr_t g_c7_stage3_magic_ss = 0; + static inline void c7_log_meta_state(const char* tag, SuperSlab* ss, int slab_idx) { if (!ss) return; #if HAKMEM_BUILD_RELEASE @@ -357,7 +361,8 @@ stage1_retry_after_tension_drain: if (class_idx == 7) { TinySlabMeta* meta = &ss_guard->slabs[reuse_slot_idx]; - if (!c7_meta_is_pristine(meta)) { + int meta_ok = (meta->used == 0) && (meta->carved == 0) && (meta->freelist == NULL); + if (!meta_ok) { c7_log_skip_nonempty_acquire(ss_guard, 
reuse_slot_idx, meta, "SKIP_NONEMPTY_ACQUIRE"); sp_freelist_push_lockfree(class_idx, reuse_meta, reuse_slot_idx); goto stage2_fallback; @@ -418,6 +423,17 @@ stage1_retry_after_tension_drain: *ss_out = ss; *slab_idx_out = reuse_slot_idx; + if (class_idx == 7) { + TinySlabMeta* meta_check = &ss->slabs[reuse_slot_idx]; + if (!((meta_check->used == 0) && (meta_check->carved == 0) && (meta_check->freelist == NULL))) { + sp_freelist_push_lockfree(class_idx, reuse_meta, reuse_slot_idx); + if (g_lock_stats_enabled == 1) { + atomic_fetch_add(&g_lock_release_count, 1); + } + pthread_mutex_unlock(&g_shared_pool.alloc_lock); + goto stage2_fallback; + } + } if (c7_reset_and_log_if_needed(ss, reuse_slot_idx, class_idx) != 0) { *ss_out = NULL; *slab_idx_out = -1; @@ -497,7 +513,9 @@ stage2_fallback: if (class_idx == 7) { TinySlabMeta* meta = &ss->slabs[claimed_idx]; - if (!c7_meta_is_pristine(meta)) { + int meta_ok = (meta->used == 0) && (meta->carved == 0) && + (meta->freelist == NULL); + if (!meta_ok) { c7_log_skip_nonempty_acquire(ss, claimed_idx, meta, "SKIP_NONEMPTY_ACQUIRE"); sp_slot_mark_empty(hint_meta, claimed_idx); if (g_lock_stats_enabled == 1) { @@ -523,6 +541,20 @@ stage2_fallback: // Hint is still good, no need to update *ss_out = ss; *slab_idx_out = claimed_idx; + if (class_idx == 7) { + TinySlabMeta* meta_check = &ss->slabs[claimed_idx]; + if (!((meta_check->used == 0) && (meta_check->carved == 0) && + (meta_check->freelist == NULL))) { + sp_slot_mark_empty(hint_meta, claimed_idx); + *ss_out = NULL; + *slab_idx_out = -1; + if (g_lock_stats_enabled == 1) { + atomic_fetch_add(&g_lock_release_count, 1); + } + pthread_mutex_unlock(&g_shared_pool.alloc_lock); + goto stage2_scan; + } + } if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) { *ss_out = NULL; *slab_idx_out = -1; @@ -613,7 +645,9 @@ stage2_scan: if (class_idx == 7) { TinySlabMeta* meta_slab = &ss->slabs[claimed_idx]; - if (!c7_meta_is_pristine(meta_slab)) { + int meta_ok = (meta_slab->used == 
0) && (meta_slab->carved == 0) && + (meta_slab->freelist == NULL); + if (!meta_ok) { c7_log_skip_nonempty_acquire(ss, claimed_idx, meta_slab, "SKIP_NONEMPTY_ACQUIRE"); sp_slot_mark_empty(meta, claimed_idx); if (g_lock_stats_enabled == 1) { @@ -641,6 +675,20 @@ stage2_scan: *ss_out = ss; *slab_idx_out = claimed_idx; + if (class_idx == 7) { + TinySlabMeta* meta_check = &ss->slabs[claimed_idx]; + if (!((meta_check->used == 0) && (meta_check->carved == 0) && + (meta_check->freelist == NULL))) { + sp_slot_mark_empty(meta, claimed_idx); + *ss_out = NULL; + *slab_idx_out = -1; + if (g_lock_stats_enabled == 1) { + atomic_fetch_add(&g_lock_release_count, 1); + } + pthread_mutex_unlock(&g_shared_pool.alloc_lock); + continue; + } + } if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) { *ss_out = NULL; *slab_idx_out = -1; @@ -721,9 +769,14 @@ stage2_scan: // Stage 3a: Try LRU cache extern SuperSlab* hak_ss_lru_pop(uint8_t size_class); - new_ss = hak_ss_lru_pop((uint8_t)class_idx); - - int from_lru = (new_ss != NULL); + int from_lru = 0; + if (class_idx != 7) { + new_ss = hak_ss_lru_pop((uint8_t)class_idx); + from_lru = (new_ss != NULL); + } else { + // C7: Stage3 LRU 再利用は一旦封じる(再利用が汚染源かを切り分ける) + atomic_store_explicit(&g_c7_stage3_magic_ss, 0, memory_order_relaxed); + } // Stage 3b: If LRU miss, allocate new SuperSlab if (!new_ss) { @@ -752,6 +805,10 @@ stage2_scan: } new_ss = allocated_ss; + if (class_idx == 7) { + // Stage3 経由の C7 Superslab は新規確保のみ(magic もリセット扱い) + atomic_store_explicit(&g_c7_stage3_magic_ss, 0, memory_order_relaxed); + } // Add newly allocated SuperSlab to the shared pool's internal array if (g_shared_pool.total_count >= g_shared_pool.capacity) { @@ -771,6 +828,29 @@ stage2_scan: g_shared_pool.total_count++; } + // C7: LRU 再利用・新規確保いずれでも、空スラブに完全リセットしてから返す + if (class_idx == 7 && new_ss) { + int cap = ss_slabs_capacity(new_ss); + new_ss->slab_bitmap = 0; + new_ss->nonempty_mask = 0; + new_ss->freelist_mask = 0; + new_ss->empty_mask = 0; + 
new_ss->empty_count = 0; + new_ss->active_slabs = 0; + new_ss->hot_count = 0; + new_ss->cold_count = 0; + for (int s = 0; s < cap; s++) { + ss_slab_reset_meta_for_tiny(new_ss, s, class_idx); + } + static _Atomic uint32_t rel_stage3_reset_logs = 0; + uint32_t n = atomic_fetch_add_explicit(&rel_stage3_reset_logs, 1, memory_order_relaxed); + if (n < 4) { + fprintf(stderr, + "[REL_C7_STAGE3_RESET] ss=%p from_lru=%d cap=%d\n", + (void*)new_ss, from_lru, cap); + } + } + #if !HAKMEM_BUILD_RELEASE if (dbg_acquire == 1 && new_ss) { fprintf(stderr, "[SP_ACQUIRE_STAGE3] class=%d new SuperSlab (ss=%p from_lru=%d)\n",