Restore C7 Warm/TLS carve for release and add policy scaffolding
@ -27,6 +27,10 @@
- Added `core/box/tiny_page_box.h` / `core/box/tiny_page_box.c`, implementing a Page Box whose enabled classes are controlled by `HAKMEM_TINY_PAGE_BOX_CLASSES`.
- `tiny_tls_bind_slab()` now calls `tiny_page_box_on_new_slab()`, registering each C7 slab bound by TLS in the per-thread page pool.
- Added a Page Box path at the top of `unified_cache_refill()`: for C7 it first tries a batch supply from the freelist/carve of the pages TLS is holding, and only then falls through to Warm Pool / Shared Pool (the Box boundary order `Tiny Page Box → Warm Pool → Shared Pool` is preserved).
- Added the TinyClassPolicy/Stats/Learner Boxes; the hot path now only reads the Page/Warm policy through `tiny_policy_get(class_idx)`.
  - FROZEN default (legacy profile): Page Box is ON for C5–C7 only; Warm is ON for all of C0–C7 (C0–C4 cap=4, C5–C7 cap=8).
  - Switchable via ENV `HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all` (legacy when unset).
  - Stats only accumulate counters for OBSERVE; the Learner is still an empty implementation.
- Introduced the TLS Bind Box:
  - Added `ss_tls_bind_one()` to `core/box/ss_tls_bind_box.h`, consolidating the "Superslab + slab_idx → TLS" bind steps (`superslab_init_slab` / setting `meta->class_idx` / `tiny_tls_bind_slab`) in one place.
  - `superslab_refill()` (Shared Pool path) and the experimental Warm Pool path now both attach to TLS through this Box.
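
The refill order described above (`Tiny Page Box → Warm Pool → Shared Pool`) can be pictured as a simple fallback chain. The following is a minimal sketch under stated assumptions, not the code inside `unified_cache_refill()`: `DemoPolicy`, `DemoSources`, `demo_refill()` and the stub sources are hypothetical names, and `DEMO_NUM_CLASSES = 8` assumes the classes are C0–C7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8  /* assumption: classes C0..C7 */

/* Hypothetical per-class switches, modelled on the policy fields named in the notes. */
typedef struct {
    uint8_t page_box_enabled;
    uint8_t warm_enabled;
} DemoPolicy;

/* Hypothetical block source: writes up to `want` pointers into `out`, returns the count. */
typedef int (*DemoSource)(int class_idx, void** out, int want);

typedef struct {
    DemoSource page_box;
    DemoSource warm_pool;
    DemoSource shared_pool;
} DemoSources;

/* Try Page Box, then Warm Pool, then Shared Pool; stop at the first source that yields blocks. */
static int demo_refill(const DemoPolicy* policy, const DemoSources* src,
                       int class_idx, void** out, int want) {
    assert(policy != NULL && src != NULL && out != NULL);
    if (class_idx < 0 || class_idx >= DEMO_NUM_CLASSES || want <= 0) return 0;

    if (policy->page_box_enabled && src->page_box) {
        int got = src->page_box(class_idx, out, want);
        if (got > 0) return got;
    }
    if (policy->warm_enabled && src->warm_pool) {
        int got = src->warm_pool(class_idx, out, want);
        if (got > 0) return got;
    }
    return src->shared_pool ? src->shared_pool(class_idx, out, want) : 0;
}

/* Stub source that never has blocks. */
static int demo_src_empty(int class_idx, void** out, int want) {
    (void)class_idx; (void)out; (void)want;
    return 0;
}

/* Stub source that hands out at most two dummy blocks. */
static char demo_storage[2];
static int demo_src_two(int class_idx, void** out, int want) {
    (void)class_idx;
    int n = want < 2 ? want : 2;
    for (int i = 0; i < n; i++) out[i] = &demo_storage[i];
    return n;
}
```

With an empty Page Box stub and a Warm Pool stub that holds two blocks, `demo_refill()` returns the two Warm Pool blocks and never reaches the Shared Pool stub, which is the property the Box ordering is meant to give.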
@ -43,8 +47,18 @@
  - classification of UC misses by Warm/TLS/Shared
  are now observable in Debug builds.
- Added `HAKMEM_BENCH_C7_ONLY=1` to `bench_random_mixed.c`, providing a micro-bench dedicated to C7 sizes.
- Introduced the TinyClassPolicy / Stats / Learner Boxes (initial phase):
  - Added the per-class policy struct `TinyClassPolicy` and `tiny_policy_get(class_idx)` in `core/box/tiny_class_policy_box.{h,c}`.
    - FROZEN default: Page Box = C5–C7, Warm = all classes (C0–C4 cap=4 / C5–C7 cap=8).
    - Profiles can be switched with `HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all` (unknown values fall back to legacy).
  - Added lightweight counters for OBSERVE (UC miss / Warm hit / Shared Pool lock, etc.) in `core/box/tiny_class_stats_box.{h,c}`.
  - Added the Learner skeleton in `core/box/tiny_policy_learner_box.{h,c}` (currently a template for FROZEN/OBSERVE modes).
  - The `core/front/tiny_unified_cache.c` / Page Box / Warm Pool paths are now gated on `tiny_policy_get(class_idx)`, so the hot path consistently reads from the Policy Box.

### Current performance (Random Mixed, HEAD)
- Note (2025-12-05, trial values with the legacy policy profile):
  - Release: `HAKMEM_TINY_PROFILE=full HAKMEM_TINY_POLICY_PROFILE=legacy ./bench_random_mixed_hakmem 1000000 256 42` → about 4.9M ops/s (far from the 27M measured before the policy box was introduced; needs follow-up).
  - Release C7-only: `HAKMEM_BENCH_C7_ONLY=1 ... HAKMEM_TINY_POLICY_PROFILE=legacy` → about 2.7M ops/s (back to the slow level seen before the empty-slab guard was introduced; needs re-investigation).
- Conditions: `bench_random_mixed_hakmem 1000000 256 42` (1T, ws=256, RELEASE, 16–1024B)
- HAKMEM: about 27.6M ops/s (after the C7 Warm/TLS repair)
- system malloc: about 90–100M ops/s
@ -76,6 +90,26 @@
  - whether to apply the same Warm/TLS optimizations and empty-slab guard to C5/C6, or
  - whether to dig into the bottlenecks of Random Mixed as a whole (Shared Pool lock / Wrapper / mid-size path, etc.).

### Next phase (considering generalizing Page Box / Warm / Policy to all Tiny classes)
- Direction:
  - Now that Tiny-Plus for C7 (Page Box + Warm Pool + TLS Bind) is stable, widen the set of "candidate" classes to C1–C7,
    while moving to a design in which the Policy Box (learning/ENV) controls which classes are actually enabled.
- Design policy (draft):
  - Introduce `TinyClassPolicyBox`, holding the per-class policy structs (`TinyClassPolicy{ page_box_enabled, warm_enabled, warm_cap, ... }`) in an array.
  - Hot paths (Tiny Front / Unified Cache / Page Box / Warm Pool) only read the policy via `tiny_policy_get(class_idx)`;
    learning and updates happen on the `TinyPolicyLearnerBox` side.
  - Introduce `TinyClassStatsBox` to record lightweight per-class counters such as UC miss / warm hit / shared_pool_lock (for OBSERVE/LEARN modes).
  - The FROZEN / OBSERVE / LEARN modes are switchable via ENV; the default is FROZEN (Page Box/Warm ON for C5–C7 only, OFF for other classes).
- Implementation steps (draft):
  1. Reshape the C7 Page Box / Warm / TLS Bind APIs into a generic form that takes `class_idx` as an argument, refactoring them to do `if (!policy->page_box_enabled) fallback` internally.
  2. Introduce the `TinyClassPolicy` struct and `tiny_policy_get(class_idx)` so that hot paths no longer read `HAKMEM_*` ENV variables directly (everything goes through the Policy Box).
  3. Add `TinyClassStatsBox` and collect stats for C1–C7 in FROZEN/OBSERVE modes (the policy is still fixed).
  4. Add `TinyPolicyLearnerBox` and, in LEARN mode, update `page_box_enabled[]` / `warm_cap[]` based on the stats (with an upper limit on the number of classes that can be ON at the same time).
- Progress notes (implemented):
  - Added `TinyClassPolicyBox`/`TinyClassStatsBox`/`TinyPolicyLearnerBox`; by default Page Box + Warm are allowed for C5–C7 (Warm cap=8).
  - The Page/Warm paths of unified_cache_refill are gated on the return value of `tiny_policy_get()`, and Warm push respects the per-class cap.
  - Page Box initialization also enables C5–C7 by default. The lightweight stats increments for OBSERVE are already wired to UC miss / Warm hit.
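
Step 4 of the draft plan (a LEARN-mode update with an upper limit on how many classes may be ON at once) could be sketched as follows. This is only an illustration of the idea, not the planned implementation: `DemoPolicy`, `demo_learner_update()` and the "pick the classes with the most UC misses" heuristic are assumptions made for this example, as is `DEMO_NUM_CLASSES = 8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8  /* assumption: classes C0..C7 */

typedef struct {
    uint8_t page_box_enabled;
    uint8_t warm_enabled;
    uint8_t warm_cap;
} DemoPolicy;

/*
 * Enable Page Box for at most `max_on` classes, choosing the classes with the
 * highest UC-miss counts. Classes with zero misses are never enabled.
 * Returns the number of classes that ended up enabled.
 */
static int demo_learner_update(DemoPolicy* policy, const uint64_t* uc_miss, int max_on) {
    assert(policy != NULL && uc_miss != NULL);

    uint8_t picked[DEMO_NUM_CLASSES] = {0};
    int enabled = 0;

    for (int round = 0; round < max_on && round < DEMO_NUM_CLASSES; round++) {
        int best = -1;
        for (int c = 0; c < DEMO_NUM_CLASSES; c++) {
            if (picked[c] || uc_miss[c] == 0) continue;
            if (best < 0 || uc_miss[c] > uc_miss[best]) best = c;
        }
        if (best < 0) break;  /* no remaining class has any misses */
        picked[best] = 1;
        enabled++;
    }

    /* Publish the decision: everything not picked is switched off. */
    for (int c = 0; c < DEMO_NUM_CLASSES; c++) {
        policy[c].page_box_enabled = picked[c];
    }
    return enabled;
}
```

Keeping the budget (`max_on`) as an explicit parameter makes the "at most N classes ON" constraint easy to test separately from whatever statistic the Learner eventually uses.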
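
### Memo
- The page-fault problem has been resolved to a reasonable level by Prefault Box + warm-up; the main bottleneck has now moved to the user-space boxes (Unified Cache / free / Pool).
- Further optimization will focus not on "removing boxes" but on "reducing the boxes touched in the HOT layer and seeing how far the simple Tiny-style path can be extended".
- Further optimization will focus not on "removing boxes" but on "reducing the boxes touched in the HOT layer and deciding, through per-class policy, how to split work between the simple Tiny-style path and the Tiny-Plus path (Page Box + Warm)".
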
8
Makefile
@ -219,12 +219,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)

# Targets
TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o
OBJS = $(OBJS_BASE)

# Shared library
SHARED_LIB = libhakmem.so
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o 
hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o 
hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o

# Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1)
ifeq ($(POOL_TLS_PHASE1),1)
@ -251,7 +251,7 @@ endif
# Benchmark targets
BENCH_HAKMEM = bench_allocators_hakmem
BENCH_SYSTEM = bench_allocators_system
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@ -428,7 +428,7 @@ test-box-refactor: box-refactor
	./larson_hakmem 10 8 128 1024 1 12345 4

# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o

@ -2,6 +2,7 @@

**Latest memo (2025-12-05)**: C7 Warm/TLS Bind now uses Bind-only (mode=1) for the production path. Debug builds can switch modes with `HAKMEM_WARM_TLS_BIND_C7=0/1/2`, but Release is always fixed at mode=1. On C7-only workloads, mode=1 is ~4–10x faster than legacy (mode=0); mode=2 is kept as a TLS carve experiment.
**Addendum (2025-12-05, Release repair)**: C7 Warm was dead only in Release builds because full C7 slabs stayed in the Shared Pool and empty slabs were never handed to Warm. A guard was added so that acquire accepts only empty slabs for C7 and release resets the slab metadata; this recovered C7-only Release to ~18.8M ops/s and Random Mixed Release to ~27–28M ops/s.
**Addendum (2025-12-05, Policy Box)**: Introduced `TinyClassPolicyBox`, making the Page/Warm policy switchable via `HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all`. At present, with legacy (PageBox = C5–C7, Warm = all classes, cap 4/8), Random Mixed Release has dropped to ~4.9M ops/s; whether the Warm path is actually enabled is under further investigation.

**Analysis date**: 2025-11-28
**Analysis target**: HAKMEM allocator (commit 0ce20bb83)

50
core/box/link_missing_stubs.c
Normal file
@ -0,0 +1,50 @@
// link_missing_stubs.c
// Weak fallback definitions for optional diagnostics that may be compiled out
// in certain build configurations. These ensure linking succeeds even when
// the corresponding feature boxes are not included.

#include <stdatomic.h>
#include <stdint.h>

// Minimal forward declarations to avoid pulling full tracing headers
typedef int ptr_trace_event_t;
typedef struct SlabRecyclingStats {
    uint64_t recycle_attempts;
    uint64_t recycle_success;
    uint64_t recycle_skip_not_empty;
    uint64_t recycle_skip_no_cap;
    uint64_t recycle_skip_null;
} SlabRecyclingStats;

// Forward declaration only, for build configurations where lock_stats_box.h does not exist
void lock_stats_init(void);

// Ptr trace counters (used by tls_sll)
_Atomic uint64_t g_ptr_trace_op_counter __attribute__((weak)) = 0;

void ptr_trace_record_impl(ptr_trace_event_t event, void* ptr, int class_idx, uint64_t op_num,
                           void* aux_ptr, uint32_t aux_u32, int aux_int,
                           const char* file, int line)
    __attribute__((weak));

void ptr_trace_record_impl(ptr_trace_event_t event, void* ptr, int class_idx, uint64_t op_num,
                           void* aux_ptr, uint32_t aux_u32, int aux_int,
                           const char* file, int line)
{
    (void)event;
    (void)ptr;
    (void)class_idx;
    (void)op_num;
    (void)aux_ptr;
    (void)aux_u32;
    (void)aux_int;
    (void)file;
    (void)line;
}

// Slab recycling stats (used in TLS drain instrumentation)
__thread SlabRecyclingStats g_slab_recycle_stats __attribute__((weak)) = {0};

// Lock stats init (contention metrics)
void lock_stats_init(void) __attribute__((weak));
void lock_stats_init(void) {}
94
core/box/tiny_class_policy_box.c
Normal file
@ -0,0 +1,94 @@
// tiny_class_policy_box.c - Initialization of per-class Tiny policy table

#include "tiny_class_policy_box.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

TinyClassPolicy g_tiny_class_policy[TINY_NUM_CLASSES];
static _Atomic int g_tiny_class_policy_init_done = 0;
static _Atomic int g_tiny_class_policy_logged = 0;

static inline TinyClassPolicy tiny_class_policy_default_entry(void) {
    TinyClassPolicy p = {0};
    p.page_box_enabled = 0;
    p.warm_enabled = 0;
    p.warm_cap = 0;
    return p;
}

static void tiny_class_policy_set_legacy(void) {
    TinyClassPolicy def = tiny_class_policy_default_entry();
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        g_tiny_class_policy[i] = def;
    }

    // legacy: Page Box on C5–C7; Warm ON for all classes (modest cap for C0–C4)
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        g_tiny_class_policy[i].warm_enabled = 1;
        g_tiny_class_policy[i].warm_cap = (i < 5) ? 4 : 8;
    }
    for (int i = 5; i < TINY_NUM_CLASSES; i++) {
        g_tiny_class_policy[i].page_box_enabled = 1;
    }
}

static void tiny_class_policy_set_c5_7_only(void) {
    TinyClassPolicy def = tiny_class_policy_default_entry();
    for (int i = 0; i < TINY_NUM_CLASSES; i++) {
        g_tiny_class_policy[i] = def;
    }
    for (int i = 5; i < TINY_NUM_CLASSES; i++) {
        g_tiny_class_policy[i].page_box_enabled = 1;
        g_tiny_class_policy[i].warm_enabled = 1;
        g_tiny_class_policy[i].warm_cap = 8;
    }
}

static void tiny_class_policy_set_tinyplus_all(void) {
    // For now, provide this entry with the same behavior as legacy.
    tiny_class_policy_set_legacy();
}

static const char* tiny_class_policy_set_profile(const char* profile) {
    if (profile == NULL || *profile == '\0' || strcasecmp(profile, "legacy") == 0) {
        tiny_class_policy_set_legacy();
        return "legacy";
    } else if (strcasecmp(profile, "c5_7_only") == 0) {
        tiny_class_policy_set_c5_7_only();
        return "c5_7_only";
    } else if (strcasecmp(profile, "tinyplus_all") == 0) {
        tiny_class_policy_set_tinyplus_all();
        return "tinyplus_all";
    } else {
        // Unknown values fall back to legacy, to stay on the safe side.
        tiny_class_policy_set_legacy();
        return "legacy";
    }
}

void tiny_class_policy_init_once(void) {
    if (atomic_load_explicit(&g_tiny_class_policy_init_done, memory_order_acquire)) {
        return;
    }

    const char* profile = getenv("HAKMEM_TINY_POLICY_PROFILE");
    const char* active_profile = tiny_class_policy_set_profile(profile);

    // One-shot dump that makes the policy contents visible (for debugging)
    if (atomic_exchange_explicit(&g_tiny_class_policy_logged, 1, memory_order_acq_rel) == 0) {
        fprintf(stderr, "[POLICY_INIT] profile=%s\n", active_profile);
        for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) {
            TinyClassPolicy* p = &g_tiny_class_policy[cls];
            fprintf(stderr,
                    "  C%d: page=%u warm=%u cap=%u\n",
                    cls,
                    p->page_box_enabled,
                    p->warm_enabled,
                    p->warm_cap);
        }
    }

    atomic_store_explicit(&g_tiny_class_policy_init_done, 1, memory_order_release);
}
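
Review note: `tiny_class_policy_init_once()` checks the done flag and sets it only at the end, so threads that arrive before the flag is published can each run the body and rewrite the table. The values written are identical, so this is probably benign in practice, but a stricter single-writer variant could look like the sketch below. It is an assumption-laden illustration, not part of this commit: all `demo_*` names are hypothetical, and the spin wait assumes the init body never re-enters the allocator on the same thread (if logging or `getenv` could re-enter, this pattern would spin forever and the original tolerant approach may be the safer choice).

```c
#include <assert.h>
#include <stdatomic.h>

enum { DEMO_UNINIT = 0, DEMO_BUSY = 1, DEMO_READY = 2 };

static _Atomic int demo_state = DEMO_UNINIT;
static int demo_init_calls = 0;  /* touched only by the thread that wins the CAS */
static int demo_table[8];        /* stand-in for the per-class policy table */

/* Stand-in for the profile setup: caps of 4 for C0..C4 and 8 for C5..C7. */
static void demo_fill_table(void) {
    demo_init_calls++;
    for (int i = 0; i < 8; i++) demo_table[i] = (i < 5) ? 4 : 8;
}

static void demo_init_once(void) {
    /* Fast path: already published. */
    if (atomic_load_explicit(&demo_state, memory_order_acquire) == DEMO_READY) return;

    int expected = DEMO_UNINIT;
    if (atomic_compare_exchange_strong_explicit(&demo_state, &expected, DEMO_BUSY,
                                                memory_order_acq_rel,
                                                memory_order_acquire)) {
        /* This thread won the race and is the only writer of the table. */
        demo_fill_table();
        atomic_store_explicit(&demo_state, DEMO_READY, memory_order_release);
        return;
    }

    /* Another thread is initializing: wait until it publishes READY. */
    while (atomic_load_explicit(&demo_state, memory_order_acquire) != DEMO_READY) {
        /* spin; a real allocator might pause or yield here */
    }
}

/* Reader in the style of tiny_policy_get(): initialize lazily, then read. */
static int demo_cap(int cls) {
    assert(cls >= 0 && cls < 8);
    demo_init_once();
    return demo_table[cls];
}
```

The three-state flag separates "someone is writing" from "the table is ready", which the two-state flag in the committed code cannot express.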
41
core/box/tiny_class_policy_box.h
Normal file
@ -0,0 +1,41 @@
// tiny_class_policy_box.h - Class-scoped policy box for Tiny front-end
//
// Purpose:
// - Centralize per-class feature toggles (Page Box / Warm Pool / caps).
// - Keep hot paths free from direct ENV parsing or scattered conditionals.
// - Defaults:
//     legacy (default): Page Box on C5–C7; Warm on C0–C7 with smaller caps
//     c5_7_only:        both Page and Warm on C5–C7 only
//     tinyplus_all:     reserved profile (currently equivalent to legacy)
// - ENV: HAKMEM_TINY_POLICY_PROFILE=legacy|c5_7_only|tinyplus_all
// - Until the Learner lands, run with a fixed policy and keep the hot path limited to reading tiny_policy_get().

#ifndef TINY_CLASS_POLICY_BOX_H
#define TINY_CLASS_POLICY_BOX_H

#include <stdatomic.h>
#include <stdint.h>
#include "../hakmem_tiny_config.h"

typedef struct TinyClassPolicy {
    uint8_t page_box_enabled;  // Enable Tiny Page Box for this class
    uint8_t warm_enabled;      // Enable Warm Pool for this class
    uint8_t warm_cap;          // Max warm SuperSlabs to keep (per-thread)
    uint8_t reserved;
} TinyClassPolicy;

extern TinyClassPolicy g_tiny_class_policy[TINY_NUM_CLASSES];

// Initialize policy table once (idempotent).
void tiny_class_policy_init_once(void);

// Lightweight accessor for hot paths.
static inline const TinyClassPolicy* tiny_policy_get(int class_idx) {
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
        return NULL;
    }
    tiny_class_policy_init_once();
    return &g_tiny_class_policy[class_idx];
}

#endif // TINY_CLASS_POLICY_BOX_H
10
core/box/tiny_class_stats_box.c
Normal file
@ -0,0 +1,10 @@
// tiny_class_stats_box.c - Thread-local stats storage for Tiny classes

#include "tiny_class_stats_box.h"
#include <string.h>

__thread TinyClassStatsThread g_tiny_class_stats = {0};

void tiny_class_stats_reset_thread(void) {
    memset(&g_tiny_class_stats, 0, sizeof(g_tiny_class_stats));
}
42
core/box/tiny_class_stats_box.h
Normal file
@ -0,0 +1,42 @@
// tiny_class_stats_box.h - Lightweight per-thread class stats (OBSERVE layer)
//
// Purpose:
// - Provide per-class counters without atomics for cheap observation.
// - Hot paths call small inline helpers; aggregation/printing can be added later.

#ifndef TINY_CLASS_STATS_BOX_H
#define TINY_CLASS_STATS_BOX_H

#include <stdint.h>
#include "../hakmem_tiny_config.h"

typedef struct TinyClassStatsThread {
    uint64_t uc_miss[TINY_NUM_CLASSES];      // unified cache misses (calls into unified_cache_refill())
    uint64_t warm_hit[TINY_NUM_CLASSES];     // warm pool successes
    uint64_t shared_lock[TINY_NUM_CLASSES];  // shared pool lock acquisitions (hook as needed)
} TinyClassStatsThread;

extern __thread TinyClassStatsThread g_tiny_class_stats;

static inline void tiny_class_stats_on_uc_miss(int ci) {
    if (ci >= 0 && ci < TINY_NUM_CLASSES) {
        g_tiny_class_stats.uc_miss[ci]++;
    }
}

static inline void tiny_class_stats_on_warm_hit(int ci) {
    if (ci >= 0 && ci < TINY_NUM_CLASSES) {
        g_tiny_class_stats.warm_hit[ci]++;
    }
}

static inline void tiny_class_stats_on_shared_lock(int ci) {
    if (ci >= 0 && ci < TINY_NUM_CLASSES) {
        g_tiny_class_stats.shared_lock[ci]++;
    }
}

// Optional: reset per-thread counters (cold path only).
void tiny_class_stats_reset_thread(void);

#endif // TINY_CLASS_STATS_BOX_H
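
The header comment leaves aggregation and printing for later. One way that step might look is sketched below, under stated assumptions: per-thread counters are folded into process-wide atomic totals on a cold path, and a derived "warm hits per UC miss" percentage is computed from the totals. The names `DemoThreadStats`, `demo_stats_flush()` and `demo_warm_hit_percent()` are hypothetical, as is the choice of that ratio as a useful metric.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8  /* assumption: classes C0..C7 */

/* Stand-in for the per-thread counter block (plain integers, no atomics). */
typedef struct {
    uint64_t uc_miss[DEMO_NUM_CLASSES];
    uint64_t warm_hit[DEMO_NUM_CLASSES];
} DemoThreadStats;

/* Process-wide totals; only touched from cold paths. */
static _Atomic uint64_t demo_total_uc_miss[DEMO_NUM_CLASSES];
static _Atomic uint64_t demo_total_warm_hit[DEMO_NUM_CLASSES];

/* Cold path: fold one thread's counters into the totals, then clear them. */
static void demo_stats_flush(DemoThreadStats* ts) {
    assert(ts != NULL);
    for (int c = 0; c < DEMO_NUM_CLASSES; c++) {
        atomic_fetch_add_explicit(&demo_total_uc_miss[c], ts->uc_miss[c],
                                  memory_order_relaxed);
        atomic_fetch_add_explicit(&demo_total_warm_hit[c], ts->warm_hit[c],
                                  memory_order_relaxed);
        ts->uc_miss[c] = 0;
        ts->warm_hit[c] = 0;
    }
}

/* Warm hits as a percentage of UC misses for one class (0 if there were no misses). */
static unsigned demo_warm_hit_percent(int c) {
    if (c < 0 || c >= DEMO_NUM_CLASSES) return 0;
    uint64_t miss = atomic_load_explicit(&demo_total_uc_miss[c], memory_order_relaxed);
    uint64_t hit = atomic_load_explicit(&demo_total_warm_hit[c], memory_order_relaxed);
    if (miss == 0) return 0;
    return (unsigned)((hit * 100u) / miss);
}
```

Keeping the hot-path counters non-atomic and paying for atomics only at flush time matches the stated goal of cheap observation.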
@ -86,9 +86,9 @@ static inline void tiny_page_box_init_once(void) {
|
||||
|
||||
const char* env = getenv("HAKMEM_TINY_PAGE_BOX_CLASSES");
|
||||
if (!env || !*env) {
|
||||
// Default: enable only C7
|
||||
if (7 < TINY_NUM_CLASSES) {
|
||||
g_tiny_page_box_state[7].enabled = 1;
|
||||
// Default: enable mid-size classes (C5–C7)
|
||||
for (int c = 5; c <= 7 && c < TINY_NUM_CLASSES; c++) {
|
||||
g_tiny_page_box_state[c].enabled = 1;
|
||||
}
|
||||
} else {
|
||||
// Parse simple comma-separated list of integers: "5,6,7"
|
||||
|
||||
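The `else` branch above parses a comma-separated list such as `"5,6,7"`, but the parsing code itself falls outside this hunk. As a rough sketch of one way to do it, assuming `TINY_NUM_CLASSES` is 8 and using a hypothetical helper `parse_class_list()` that returns a bitmask rather than writing to `g_tiny_page_box_state` directly:

```c
#include <stdint.h>
#include <stdlib.h>

#define TINY_NUM_CLASSES 8  /* assumption: classes C0..C7 */

/* Hypothetical helper: turn "5,6,7" into a bitmask with bits 5, 6 and 7 set.
 * Tokens that are not valid class indices are skipped silently. */
uint32_t parse_class_list(const char* s) {
    uint32_t mask = 0;
    if (!s) return 0;
    while (*s) {
        char* end = NULL;
        long v = strtol(s, &end, 10);
        if (end == s) {
            /* No number starts here: skip one character (separator or junk). */
            s++;
            continue;
        }
        if (v >= 0 && v < TINY_NUM_CLASSES) {
            mask |= (uint32_t)1u << v;
        }
        s = end;  /* continue right after the parsed number */
    }
    return mask;
}
```

A caller could then set `enabled = 1` for every class whose bit is present in the mask. Out-of-range values (negative numbers, indices of 8 or more, overflowed input) never set a bit.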
7
core/box/tiny_policy_learner_box.c
Normal file
@ -0,0 +1,7 @@
|
||||
// tiny_policy_learner_box.c - Placeholder learner hook
|
||||
|
||||
#include "tiny_policy_learner_box.h"
|
||||
|
||||
void tiny_policy_learner_tick(void) {
|
||||
// FROZEN/OBSERVE: intentionally empty.
|
||||
}
|
||||
11
core/box/tiny_policy_learner_box.h
Normal file
@ -0,0 +1,11 @@
|
||||
// tiny_policy_learner_box.h - Placeholder for Tiny class policy learner
|
||||
//
|
||||
// Current mode: FROZEN/OBSERVE (no learning). Hook remains for future LEARN mode.
|
||||
|
||||
#ifndef TINY_POLICY_LEARNER_BOX_H
|
||||
#define TINY_POLICY_LEARNER_BOX_H
|
||||
|
||||
// Stub: will be extended when LEARN mode is enabled.
|
||||
void tiny_policy_learner_tick(void);
|
||||
|
||||
#endif // TINY_POLICY_LEARNER_BOX_H
|
||||
@ -16,6 +16,8 @@
|
||||
#include "../box/warm_pool_stats_box.h"
|
||||
#include "../box/warm_pool_rel_counters_box.h"
|
||||
|
||||
extern _Atomic uintptr_t g_c7_stage3_magic_ss;
|
||||
|
||||
static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) {
|
||||
if (!tls || !tls->ss) return;
|
||||
#if HAKMEM_BUILD_RELEASE
|
||||
@ -23,8 +25,9 @@ static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) {
|
||||
uint32_t n = atomic_fetch_add_explicit(&rel_logs, 1, memory_order_relaxed);
|
||||
if (n < 4) {
|
||||
TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx];
|
||||
uintptr_t magic = atomic_load_explicit(&g_c7_stage3_magic_ss, memory_order_relaxed);
|
||||
fprintf(stderr,
|
||||
"[REL_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n",
|
||||
"[REL_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p magic=%#lx\n",
|
||||
tag,
|
||||
(void*)tls->ss,
|
||||
(unsigned)tls->slab_idx,
|
||||
@ -32,15 +35,17 @@ static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) {
|
||||
(unsigned)meta->used,
|
||||
(unsigned)meta->capacity,
|
||||
(unsigned)meta->carved,
|
||||
meta->freelist);
|
||||
meta->freelist,
|
||||
(unsigned long)magic);
|
||||
}
|
||||
#else
|
||||
static _Atomic uint32_t dbg_logs = 0;
|
||||
uint32_t n = atomic_fetch_add_explicit(&dbg_logs, 1, memory_order_relaxed);
|
||||
if (n < 4) {
|
||||
TinySlabMeta* meta = &tls->ss->slabs[tls->slab_idx];
|
||||
uintptr_t magic = atomic_load_explicit(&g_c7_stage3_magic_ss, memory_order_relaxed);
|
||||
fprintf(stderr,
|
||||
"[DBG_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p\n",
|
||||
"[DBG_C7_%s] ss=%p slab=%u cls=%u used=%u cap=%u carved=%u freelist=%p magic=%#lx\n",
|
||||
tag,
|
||||
(void*)tls->ss,
|
||||
(unsigned)tls->slab_idx,
|
||||
@ -48,7 +53,8 @@ static inline void warm_prefill_log_c7_meta(const char* tag, TinyTLSSlab* tls) {
|
||||
(unsigned)meta->used,
|
||||
(unsigned)meta->capacity,
|
||||
(unsigned)meta->carved,
|
||||
meta->freelist);
|
||||
meta->freelist,
|
||||
(unsigned long)magic);
|
||||
}
|
||||
#endif
|
||||
}
|
||||
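Both branches of `warm_prefill_log_c7_meta()` throttle output with a static atomic counter and an `n < 4` check. The pattern can be isolated as below; `log_budget_take()` is a hypothetical name used only for this sketch and does not appear in the commit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true for the first `limit` calls made against
 * `counter`, across all threads. The relaxed fetch-add hands every caller a
 * unique ticket, so at most `limit` callers see a ticket below the limit.
 * Note: the counter wraps after 2^32 calls, at which point logging would
 * start again. */
static inline bool log_budget_take(_Atomic uint32_t* counter, uint32_t limit) {
    uint32_t n = atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
    return n < limit;
}
```

Relaxed ordering is enough here because the counter only rations log lines; it does not publish any other data between threads.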
@ -84,7 +90,7 @@ extern SuperSlab* superslab_refill(int class_idx);
|
||||
//
|
||||
// Performance: only triggered when the pool is empty, so the cost is confined to the cold path
|
||||
//
|
||||
static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) {
|
||||
static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls, int warm_cap_hint) {
|
||||
#if HAKMEM_BUILD_RELEASE
|
||||
if (class_idx == 7) {
|
||||
warm_pool_rel_c7_prefill_call();
|
||||
@ -149,7 +155,7 @@ static inline int warm_pool_do_prefill(int class_idx, TinyTLSSlab* tls) {
|
||||
|
||||
if (budget > 1) {
|
||||
// Prefill mode: push to pool and load another
|
||||
tiny_warm_pool_push(class_idx, tls->ss);
|
||||
tiny_warm_pool_push_with_cap(class_idx, tls->ss, warm_cap_hint);
|
||||
warm_pool_record_prefilled(class_idx);
|
||||
#if HAKMEM_BUILD_RELEASE
|
||||
if (class_idx == 7) {
|
||||
|
||||
@ -21,6 +21,8 @@
|
||||
#include "../box/tiny_page_box.h" // Tiny-Plus Page Box (C5–C7 initial hook)
|
||||
#include "../box/ss_tls_bind_box.h" // Box: TLS Bind (SuperSlab -> TLS binding)
|
||||
#include "../box/tiny_tls_carve_one_block_box.h" // Box: TLS carve helper (shared)
|
||||
#include "../box/tiny_class_policy_box.h" // Box: per-class policy (Page/Warm caps)
|
||||
#include "../box/tiny_class_stats_box.h" // Box: lightweight per-class stats
|
||||
#include "../box/warm_tls_bind_logger_box.h" // Box: Warm TLS Bind logging (throttled)
|
||||
#define WARM_POOL_DBG_DEFINE
|
||||
#include "../box/warm_pool_dbg_box.h" // Box: Warm Pool C7 debug counters
|
||||
@ -516,6 +518,10 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
tiny_warm_pool_init_once();
|
||||
|
||||
TinyUnifiedCache* cache = &g_unified_cache[class_idx];
|
||||
const TinyClassPolicy* policy = tiny_policy_get(class_idx);
|
||||
int warm_enabled = policy ? policy->warm_enabled : 0;
|
||||
int warm_cap = policy ? policy->warm_cap : 0;
|
||||
int page_enabled = policy ? policy->page_box_enabled : 0;
|
||||
|
||||
// ✅ Phase 11+: Ensure cache is initialized (lazy init for cold path)
|
||||
if (!cache->slots) {
|
||||
@ -560,7 +566,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
// ========== PAGE BOX HOT PATH (Tiny-Plus layer): Try page box FIRST ==========
|
||||
// C7-specific page-level freelist management will be integrated here in the future.
|
||||
// For now this is a stub that always returns 0; only the Box-boundary wiring is done ahead of time.
|
||||
if (tiny_page_box_is_enabled(class_idx)) {
|
||||
if (page_enabled && tiny_page_box_is_enabled(class_idx)) {
|
||||
int page_produced = tiny_page_box_refill(class_idx, out, room);
|
||||
if (page_produced > 0) {
|
||||
// Store blocks into cache and return first
|
||||
@ -573,6 +579,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
g_unified_cache_miss[class_idx]++;
|
||||
#endif
|
||||
tiny_class_stats_on_uc_miss(class_idx);
|
||||
|
||||
if (measure) {
|
||||
uint64_t end_cycles = read_tsc();
|
||||
@ -593,6 +600,18 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
|
||||
// ========== WARM POOL HOT PATH: Check warm pool FIRST ==========
|
||||
// This is the critical optimization - avoid superslab_refill() registry scan
|
||||
if (warm_enabled) {
|
||||
if (class_idx == 7) {
|
||||
const TinyClassPolicy* pol = tiny_policy_get(7);
|
||||
static _Atomic int g_c7_policy_logged = 0;
|
||||
if (atomic_exchange_explicit(&g_c7_policy_logged, 1, memory_order_acq_rel) == 0) {
|
||||
fprintf(stderr,
|
||||
"[C7_POLICY_AT_WARM] page=%u warm=%u cap=%u\n",
|
||||
(unsigned)(pol ? pol->page_box_enabled : 0),
|
||||
(unsigned)(pol ? pol->warm_enabled : 0),
|
||||
(unsigned)(pol ? pol->warm_cap : 0));
|
||||
}
|
||||
}
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
atomic_fetch_add_explicit(&g_dbg_warm_pop_attempts, 1, memory_order_relaxed);
|
||||
if (class_idx == 7) {
|
||||
@ -606,12 +625,10 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
#endif
|
||||
SuperSlab* warm_ss = tiny_warm_pool_pop(class_idx);
|
||||
if (warm_ss) {
|
||||
if (class_idx == 7) {
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
if (class_idx == 7) {
|
||||
warm_pool_dbg_c7_hit();
|
||||
}
|
||||
// Debug-only: Warm TLS Bind experiment (C7 only)
|
||||
if (class_idx == 7) {
|
||||
#endif
|
||||
int warm_mode = warm_tls_bind_mode_c7();
|
||||
if (warm_mode >= 1) {
|
||||
int cap = ss_slabs_capacity(warm_ss);
|
||||
@ -633,18 +650,24 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
|
||||
// Mode 2: carve a single block via TLS fast path
|
||||
if (warm_mode == 2) {
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
warm_pool_dbg_c7_tls_attempt();
|
||||
#endif
|
||||
TinyTLSCarveOneResult tls_carve =
|
||||
tiny_tls_carve_one_block(tls, class_idx);
|
||||
if (tls_carve.block) {
|
||||
warm_tls_bind_log_tls_carve(warm_ss, slab_idx, tls_carve.block);
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
warm_pool_dbg_c7_tls_success();
|
||||
#endif
|
||||
out[0] = tls_carve.block;
|
||||
produced = 1;
|
||||
tls_carved = 1;
|
||||
} else {
|
||||
warm_tls_bind_log_tls_fail(warm_ss, slab_idx);
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
warm_pool_dbg_c7_tls_fail();
|
||||
#endif
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -652,6 +675,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
}
|
||||
}
|
||||
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
atomic_fetch_add_explicit(&g_dbg_warm_pop_hits, 1, memory_order_relaxed);
|
||||
#endif
|
||||
// HOT PATH: Warm pool hit, try to carve directly
|
||||
@ -694,10 +718,11 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
atomic_fetch_add_explicit(&g_rel_c7_warm_push, 1, memory_order_relaxed);
|
||||
}
|
||||
#endif
|
||||
tiny_warm_pool_push(class_idx, warm_ss);
|
||||
tiny_warm_pool_push_with_cap(class_idx, warm_ss, warm_cap);
|
||||
|
||||
// Track warm pool hit (always compiled, ENV-gated printing)
|
||||
warm_pool_record_hit(class_idx);
|
||||
tiny_class_stats_on_warm_hit(class_idx);
|
||||
|
||||
// Store blocks into cache and return first
|
||||
void* first = out[0];
|
||||
@ -709,6 +734,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
g_unified_cache_miss[class_idx]++;
|
||||
#endif
|
||||
tiny_class_stats_on_uc_miss(class_idx);
|
||||
|
||||
if (measure) {
|
||||
uint64_t end_cycles = read_tsc();
|
||||
@ -746,16 +772,26 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
// ========== COLD PATH: Warm pool miss, use superslab_refill ==========
|
||||
// Track warm pool miss (always compiled, ENV-gated printing)
|
||||
warm_pool_record_miss(class_idx);
|
||||
}
|
||||
|
||||
TinyTLSSlab* tls = &g_tls_slabs[class_idx];
|
||||
|
||||
// Step 1: Ensure SuperSlab available via normal refill
|
||||
// Enhanced: Use Warm Pool Prefill Box for secondary prefill when pool is empty
|
||||
if (warm_pool_do_prefill(class_idx, tls) < 0) {
|
||||
if (warm_enabled) {
|
||||
if (warm_pool_do_prefill(class_idx, tls, warm_cap) < 0) {
|
||||
return HAK_BASE_FROM_RAW(NULL);
|
||||
}
|
||||
// After prefill: tls->ss has the final slab for carving
|
||||
// tls = &g_tls_slabs[class_idx]; // Reload (already done in prefill box)
|
||||
tls = &g_tls_slabs[class_idx]; // Reload (already done in prefill box)
|
||||
} else {
|
||||
if (!tls->ss) {
|
||||
if (!superslab_refill(class_idx)) {
|
||||
return HAK_BASE_FROM_RAW(NULL);
|
||||
}
|
||||
tls = &g_tls_slabs[class_idx];
|
||||
}
|
||||
}
|
||||
|
||||
// Step 2: Direct carve from SuperSlab into local array (bypass TLS SLL!)
|
||||
TinySlabMeta* m = tls->meta;
|
||||
@ -844,6 +880,7 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
|
||||
}
|
||||
g_unified_cache_miss[class_idx]++;
|
||||
#endif
|
||||
tiny_class_stats_on_uc_miss(class_idx);
|
||||
|
||||
// Measure refill cycles
|
||||
if (measure) {
|
||||
|
||||
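`unified_cache_refill()` now reads `page_box_enabled`, `warm_enabled` and `warm_cap` from `tiny_policy_get(class_idx)` and treats a NULL policy as "everything off". The policy box source is not shown in this part of the diff, so the following is only a sketch of the legacy (FROZEN default) profile described in the commit notes: Page Box on for C5–C7, Warm on for all classes, cap 4 for C0–C4 and cap 8 for C5–C7. The field types, the lazy initialisation, `TINY_NUM_CLASSES == 8` and the helper `tiny_policy_init_legacy()` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define TINY_NUM_CLASSES 8  /* assumption: classes C0..C7 */

/* Field names follow the call site; the widths are assumed. */
typedef struct TinyClassPolicy {
    uint8_t page_box_enabled;
    uint8_t warm_enabled;
    uint8_t warm_cap;
} TinyClassPolicy;

static TinyClassPolicy g_policy[TINY_NUM_CLASSES];
static int g_policy_ready = 0;  /* sketch only: this lazy init is not thread-safe */

/* Hypothetical initialiser for the legacy profile. */
static void tiny_policy_init_legacy(void) {
    for (int c = 0; c < TINY_NUM_CLASSES; c++) {
        g_policy[c].page_box_enabled = (c >= 5 && c <= 7) ? 1 : 0;
        g_policy[c].warm_enabled     = 1;
        g_policy[c].warm_cap         = (c < 5) ? 4 : 8;
    }
    g_policy_ready = 1;
}

/* Returns NULL for an out-of-range class, which the caller maps to "disabled". */
const TinyClassPolicy* tiny_policy_get(int class_idx) {
    if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) return NULL;
    if (!g_policy_ready) tiny_policy_init_legacy();
    return &g_policy[class_idx];
}
```

Keeping the hot path to a single table read is what lets the other profiles (`c5_7_only`, `tinyplus_all`) be swapped in by filling the table differently at init time; their exact contents are not specified here and are left out of the sketch.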
@ -87,16 +87,6 @@ static inline SuperSlab* tiny_warm_pool_pop(int class_idx) {
|
||||
return NULL;
|
||||
}
|
||||
|
||||
// O(1) push to warm pool
|
||||
// Returns: 1 if pushed successfully, 0 if pool full (caller should free to LRU)
|
||||
static inline int tiny_warm_pool_push(int class_idx, SuperSlab* ss) {
|
||||
if (g_tiny_warm_pool[class_idx].count < TINY_WARM_POOL_MAX_PER_CLASS) {
|
||||
g_tiny_warm_pool[class_idx].slabs[g_tiny_warm_pool[class_idx].count++] = ss;
|
||||
return 1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
// Get current count (for metrics/debugging)
|
||||
static inline int tiny_warm_pool_count(int class_idx) {
|
||||
return g_tiny_warm_pool[class_idx].count;
|
||||
@ -125,14 +115,35 @@ static inline int warm_pool_max_per_class(void) {
|
||||
return g_max;
|
||||
}
|
||||
|
||||
// Push with environment-configured capacity
|
||||
static inline int tiny_warm_pool_push_tunable(int class_idx, SuperSlab* ss) {
|
||||
int capacity = warm_pool_max_per_class();
|
||||
if (g_tiny_warm_pool[class_idx].count < capacity) {
|
||||
// O(1) push to warm pool (cap-aware)
|
||||
// cap_hint <= 0 or > TINY_WARM_POOL_MAX_PER_CLASS → fall back to warm_pool_max_per_class(), clamped to TINY_WARM_POOL_MAX_PER_CLASS
|
||||
static inline int tiny_warm_pool_push_with_cap(int class_idx, SuperSlab* ss, int cap_hint) {
|
||||
int limit = cap_hint;
|
||||
if (limit <= 0 || limit > TINY_WARM_POOL_MAX_PER_CLASS) {
|
||||
limit = warm_pool_max_per_class();
|
||||
if (limit <= 0) {
|
||||
limit = TINY_WARM_POOL_MAX_PER_CLASS;
|
||||
}
|
||||
if (limit > TINY_WARM_POOL_MAX_PER_CLASS) {
|
||||
limit = TINY_WARM_POOL_MAX_PER_CLASS;
|
||||
}
|
||||
}
|
||||
|
||||
if (g_tiny_warm_pool[class_idx].count < limit) {
|
||||
g_tiny_warm_pool[class_idx].slabs[g_tiny_warm_pool[class_idx].count++] = ss;
|
||||
return 1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
// Default push (uses ENV/default cap)
|
||||
static inline int tiny_warm_pool_push(int class_idx, SuperSlab* ss) {
|
||||
return tiny_warm_pool_push_with_cap(class_idx, ss, -1);
|
||||
}
|
||||
|
||||
// Push with environment-configured capacity (legacy name)
|
||||
static inline int tiny_warm_pool_push_tunable(int class_idx, SuperSlab* ss) {
|
||||
return tiny_warm_pool_push_with_cap(class_idx, ss, warm_pool_max_per_class());
|
||||
}
|
||||
|
||||
#endif // HAK_TINY_WARM_POOL_H
|
||||
|
||||
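The limit selection inside `tiny_warm_pool_push_with_cap()` is easier to check when written as a pure function. The sketch below mirrors that logic; `warm_cap_resolve()` and its parameters are hypothetical names, with `env_max` standing in for the result of `warm_pool_max_per_class()` and `hard_max` for `TINY_WARM_POOL_MAX_PER_CLASS` (whose actual value is not shown in this diff).

```c
/* Hypothetical pure version of the limit selection:
 *  - a hint in 1..hard_max is used as-is;
 *  - any other hint falls back to env_max;
 *  - a non-positive or too-large env_max is replaced by hard_max. */
int warm_cap_resolve(int cap_hint, int env_max, int hard_max) {
    int limit = cap_hint;
    if (limit <= 0 || limit > hard_max) {
        limit = env_max;
        if (limit <= 0) {
            limit = hard_max;
        }
        if (limit > hard_max) {
            limit = hard_max;
        }
    }
    return limit;
}
```

Note that an oversized hint is not clamped to `hard_max` directly; it falls back to the environment-configured value first, which is the same behaviour as the function in the diff.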
@ -12,10 +12,14 @@
|
||||
#include "front/tiny_warm_pool.h" // Warm Pool: Prefill during registry scans
|
||||
#include "box/ss_slab_reset_box.h" // Box: Reset slab metadata on reuse (C7 guard)
|
||||
|
||||
#include <stdint.h>
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <stdatomic.h>
|
||||
|
||||
// Simple magic value for tracing Superslabs that originate from Stage 3 (LRU)
|
||||
_Atomic uintptr_t g_c7_stage3_magic_ss = 0;
|
||||
|
||||
static inline void c7_log_meta_state(const char* tag, SuperSlab* ss, int slab_idx) {
|
||||
if (!ss) return;
|
||||
#if HAKMEM_BUILD_RELEASE
|
||||
@ -357,7 +361,8 @@ stage1_retry_after_tension_drain:
|
||||
|
||||
if (class_idx == 7) {
|
||||
TinySlabMeta* meta = &ss_guard->slabs[reuse_slot_idx];
|
||||
if (!c7_meta_is_pristine(meta)) {
|
||||
int meta_ok = (meta->used == 0) && (meta->carved == 0) && (meta->freelist == NULL);
|
||||
if (!meta_ok) {
|
||||
c7_log_skip_nonempty_acquire(ss_guard, reuse_slot_idx, meta, "SKIP_NONEMPTY_ACQUIRE");
|
||||
sp_freelist_push_lockfree(class_idx, reuse_meta, reuse_slot_idx);
|
||||
goto stage2_fallback;
|
||||
@ -418,6 +423,17 @@ stage1_retry_after_tension_drain:
|
||||
|
||||
*ss_out = ss;
|
||||
*slab_idx_out = reuse_slot_idx;
|
||||
if (class_idx == 7) {
|
||||
TinySlabMeta* meta_check = &ss->slabs[reuse_slot_idx];
|
||||
if (!((meta_check->used == 0) && (meta_check->carved == 0) && (meta_check->freelist == NULL))) {
|
||||
sp_freelist_push_lockfree(class_idx, reuse_meta, reuse_slot_idx);
|
||||
if (g_lock_stats_enabled == 1) {
|
||||
atomic_fetch_add(&g_lock_release_count, 1);
|
||||
}
|
||||
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
|
||||
goto stage2_fallback;
|
||||
}
|
||||
}
|
||||
if (c7_reset_and_log_if_needed(ss, reuse_slot_idx, class_idx) != 0) {
|
||||
*ss_out = NULL;
|
||||
*slab_idx_out = -1;
|
||||
@ -497,7 +513,9 @@ stage2_fallback:
|
||||
|
||||
if (class_idx == 7) {
|
||||
TinySlabMeta* meta = &ss->slabs[claimed_idx];
|
||||
if (!c7_meta_is_pristine(meta)) {
|
||||
int meta_ok = (meta->used == 0) && (meta->carved == 0) &&
|
||||
(meta->freelist == NULL);
|
||||
if (!meta_ok) {
|
||||
c7_log_skip_nonempty_acquire(ss, claimed_idx, meta, "SKIP_NONEMPTY_ACQUIRE");
|
||||
sp_slot_mark_empty(hint_meta, claimed_idx);
|
||||
if (g_lock_stats_enabled == 1) {
|
||||
@ -523,6 +541,20 @@ stage2_fallback:
|
||||
// Hint is still good, no need to update
|
||||
*ss_out = ss;
|
||||
*slab_idx_out = claimed_idx;
|
||||
if (class_idx == 7) {
|
||||
TinySlabMeta* meta_check = &ss->slabs[claimed_idx];
|
||||
if (!((meta_check->used == 0) && (meta_check->carved == 0) &&
|
||||
(meta_check->freelist == NULL))) {
|
||||
sp_slot_mark_empty(hint_meta, claimed_idx);
|
||||
*ss_out = NULL;
|
||||
*slab_idx_out = -1;
|
||||
if (g_lock_stats_enabled == 1) {
|
||||
atomic_fetch_add(&g_lock_release_count, 1);
|
||||
}
|
||||
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
|
||||
goto stage2_scan;
|
||||
}
|
||||
}
|
||||
if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) {
|
||||
*ss_out = NULL;
|
||||
*slab_idx_out = -1;
|
||||
@ -613,7 +645,9 @@ stage2_scan:
|
||||
|
||||
if (class_idx == 7) {
|
||||
TinySlabMeta* meta_slab = &ss->slabs[claimed_idx];
|
||||
if (!c7_meta_is_pristine(meta_slab)) {
|
||||
int meta_ok = (meta_slab->used == 0) && (meta_slab->carved == 0) &&
|
||||
(meta_slab->freelist == NULL);
|
||||
if (!meta_ok) {
|
||||
c7_log_skip_nonempty_acquire(ss, claimed_idx, meta_slab, "SKIP_NONEMPTY_ACQUIRE");
|
||||
sp_slot_mark_empty(meta, claimed_idx);
|
||||
if (g_lock_stats_enabled == 1) {
|
||||
@ -641,6 +675,20 @@ stage2_scan:
|
||||
|
||||
*ss_out = ss;
|
||||
*slab_idx_out = claimed_idx;
|
||||
if (class_idx == 7) {
|
||||
TinySlabMeta* meta_check = &ss->slabs[claimed_idx];
|
||||
if (!((meta_check->used == 0) && (meta_check->carved == 0) &&
|
||||
(meta_check->freelist == NULL))) {
|
||||
sp_slot_mark_empty(meta, claimed_idx);
|
||||
*ss_out = NULL;
|
||||
*slab_idx_out = -1;
|
||||
if (g_lock_stats_enabled == 1) {
|
||||
atomic_fetch_add(&g_lock_release_count, 1);
|
||||
}
|
||||
pthread_mutex_unlock(&g_shared_pool.alloc_lock);
|
||||
continue;
|
||||
}
|
||||
}
|
||||
if (c7_reset_and_log_if_needed(ss, claimed_idx, class_idx) != 0) {
|
||||
*ss_out = NULL;
|
||||
*slab_idx_out = -1;
|
||||
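The C7 guards in the hunks above replace `c7_meta_is_pristine()` with the same inline test in several places: a slab counts as reusable only when `used == 0`, `carved == 0` and `freelist == NULL`. A small sketch of that predicate follows, using a reduced stand-in struct because the full `TinySlabMeta` definition is not part of this diff; `SlabMetaSketch`, its field widths and `slab_meta_is_pristine()` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for TinySlabMeta: only the three fields the guard reads.
 * The field widths are assumed. */
typedef struct SlabMetaSketch {
    void*    freelist;
    uint16_t used;
    uint16_t carved;
} SlabMetaSketch;

/* True only when nothing is allocated, nothing has been carved and no
 * freed block is queued on the slab. A NULL pointer is never pristine. */
static inline bool slab_meta_is_pristine(const SlabMetaSketch* m) {
    return m != NULL && m->used == 0 && m->carved == 0 && m->freelist == NULL;
}
```

In the diff, a failing check returns the slot to its pool and retries through `stage2_fallback`, `stage2_scan`, or the next loop iteration instead of handing a non-empty slab to the C7 path.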
@ -721,9 +769,14 @@ stage2_scan:
|
||||
|
||||
// Stage 3a: Try LRU cache
|
||||
extern SuperSlab* hak_ss_lru_pop(uint8_t size_class);
|
||||
int from_lru = 0;
|
||||
if (class_idx != 7) {
|
||||
new_ss = hak_ss_lru_pop((uint8_t)class_idx);
|
||||
|
||||
int from_lru = (new_ss != NULL);
|
||||
from_lru = (new_ss != NULL);
|
||||
} else {
|
||||
// C7: temporarily disable Stage 3 LRU reuse (to isolate whether reuse is the source of contamination)
|
||||
atomic_store_explicit(&g_c7_stage3_magic_ss, 0, memory_order_relaxed);
|
||||
}
|
||||
|
||||
// Stage 3b: If LRU miss, allocate new SuperSlab
|
||||
if (!new_ss) {
|
||||
@ -752,6 +805,10 @@ stage2_scan:
|
||||
}
|
||||
|
||||
new_ss = allocated_ss;
|
||||
if (class_idx == 7) {
|
||||
// C7 Superslabs obtained via Stage 3 are fresh allocations only (the magic is treated as reset too)
|
||||
atomic_store_explicit(&g_c7_stage3_magic_ss, 0, memory_order_relaxed);
|
||||
}
|
||||
|
||||
// Add newly allocated SuperSlab to the shared pool's internal array
|
||||
if (g_shared_pool.total_count >= g_shared_pool.capacity) {
|
||||
@ -771,6 +828,29 @@ stage2_scan:
|
||||
g_shared_pool.total_count++;
|
||||
}
|
||||
|
||||
// C7: whether reused from the LRU or freshly allocated, fully reset to empty slabs before returning
|
||||
if (class_idx == 7 && new_ss) {
|
||||
int cap = ss_slabs_capacity(new_ss);
|
||||
new_ss->slab_bitmap = 0;
|
||||
new_ss->nonempty_mask = 0;
|
||||
new_ss->freelist_mask = 0;
|
||||
new_ss->empty_mask = 0;
|
||||
new_ss->empty_count = 0;
|
||||
new_ss->active_slabs = 0;
|
||||
new_ss->hot_count = 0;
|
||||
new_ss->cold_count = 0;
|
||||
for (int s = 0; s < cap; s++) {
|
||||
ss_slab_reset_meta_for_tiny(new_ss, s, class_idx);
|
||||
}
|
||||
static _Atomic uint32_t rel_stage3_reset_logs = 0;
|
||||
uint32_t n = atomic_fetch_add_explicit(&rel_stage3_reset_logs, 1, memory_order_relaxed);
|
||||
if (n < 4) {
|
||||
fprintf(stderr,
|
||||
"[REL_C7_STAGE3_RESET] ss=%p from_lru=%d cap=%d\n",
|
||||
(void*)new_ss, from_lru, cap);
|
||||
}
|
||||
}
|
||||
|
||||
#if !HAKMEM_BUILD_RELEASE
|
||||
if (dbg_acquire == 1 && new_ss) {
|
||||
fprintf(stderr, "[SP_ACQUIRE_STAGE3] class=%d new SuperSlab (ss=%p from_lru=%d)\n",
|
||||
|
||||