Add Page Box layer for C7 class optimization

- Implement tiny_page_box.c/h: per-thread page cache between UC and Shared Pool
- Integrate Page Box into Unified Cache refill path
- Remove legacy SuperSlab implementation (merged into smallmid)
- Add HAKMEM_TINY_PAGE_BOX_CLASSES env var for selective class enabling
- Update bench_random_mixed.c with Page Box statistics

Current status: implementation is safe, no regressions.
Page Box ON/OFF shows minimal difference; the pool strategy still needs tuning.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-12-05 15:31:44 +09:00
parent 2b2b607957
commit 093f362231
16 changed files with 651 additions and 1347 deletions


@@ -1,21 +1,64 @@
## HAKMEM Status Memo (updated 2025-12-XX)
## HAKMEM Status Memo (updated 2025-12-05)
### Current Status
- The Mid MT layer has been removed entirely (code, build dependencies, and the early free branch are gone); Mid/Large is unified on ACE+Pool.
- Mid W_MAX relaxed to 2.0 so the 32-52KB Bridge class path is hit reliably. The segfault in the 33KB band is fixed.
- The free wrapper keeps the Superslab/Tiny guards while making the routes to Mid/L2/L25 reliable (Tiny pointers not registered in a Superslab are ignored; Mid/L2/L25 are caught by classification + registry).
- Mid/L2/L25 wrap detection is ON by default (`HAKMEM_WRAP_L2=0` / `HAKMEM_WRAP_L25=0` turns it OFF). Only nested recursion is blocked.
### 現在の状態Tiny / Superslab / Warm Pool
- Tiny Front / Superslab / Shared Pool は Box Theory 準拠で 3 層構造に整理済みHOT/WARM/COLD
- Tiny Gatekeeper Boxalloc/freeと Tiny Route Box により、USER→BASE 変換と Tiny vs Pool のルーティングを入口 1 箇所に集約
- Superslab Tier BoxHOT/DRAINING/FREE Release Guard Box により、SuperSlab ライフサイクルと eager FREE の安全な境界を定義
- Warm Pool 層:
- `tiny_warm_pool.h`: per-thread の HOT SuperSlab プール。
- `warm_pool_stats_box.h`: hits/misses/prefilled の統計箱。
- `warm_pool_prefill_box.h`: registry スキャン時に Warm Pool を事前充填する cold-path helper。
- Prefault Box`ss_prefault_box.h`は追加済みだが、4MB MAP_POPULATE 問題を避けるためデフォルト OFF`HAKMEM_SS_PREFAULT=0`)に設定。
### Recent Results
- Bench reproduction: `./bench_mid_large_mt_hakmem 4 20000 1024 4` runs to completion, with no ACE-FAIL spam.
- All Mid MT build/init/dependency hooks removed; the Makefile is cleaned up as well.
- Gatekeeper inlining (Phase A-1) complete: fewer function calls in the `malloc`/`free` wrappers while keeping Box boundaries intact.
- Unified Cache refill debug checks consolidated into one place, lightening the HOT path in release builds:
  - `bench_random_mixed_hakmem 1000000 256 42` improved from ~4.3M to ~5.0M ops/s (~+17%).
- Tiny-only / Tiny Mixed / Non-Tiny conditions documented and separated:
  - `bench_random_mixed_hakmem` gained `HAKMEM_BENCH_MIN_SIZE` / `HAKMEM_BENCH_MAX_SIZE`, so that
    - 8-128B (Tiny-only)
    - 129-1024B (Tiny C5-C7 only)
    can now be measured individually.
  - `docs/PERF_ANALYSIS_TINY_MIXED.md` and related docs now spell out the difference between 8-128B/200K/ws=400 (the old Tiny-only setup) and the current 16-1024B/1M/ws=256 (Tiny + Non-Tiny mixed).
- Unified Cache refill hardening (Step 1 complete):
  - `unified_cache_refill()` in `core/front/tiny_unified_cache.c` now guarantees `max_batch <= 256`, always consistent with `out[256]`.
  - Experiments that raise the C5-C7 Unified Cache capacity/batch size can no longer corrupt the stack.
- Tiny Page Box (C7 Tiny-Plus layer) introduced (Step 2, stage 1 complete):
  - Added `core/box/tiny_page_box.h` / `core/box/tiny_page_box.c`, a Page Box whose enabled classes are controlled via `HAKMEM_TINY_PAGE_BOX_CLASSES`.
  - `tiny_tls_bind_slab()` calls `tiny_page_box_on_new_slab()`, registering the C7 slab bound by the TLS into the per-thread page pool.
  - `unified_cache_refill()` gained a Page Box path at its head: for C7 it first tries to supply a batch from the freelist/carve of pages the TLS already holds before falling back to the Warm Pool / Shared Pool, keeping the Box boundary order `Tiny Page Box → Warm Pool → Shared Pool` (sketched below).
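
A minimal sketch of that ordering contract, using the names from the diffs below. `refill_sketch` itself is illustrative and not part of this commit, and the exact signature of `shared_pool_acquire_slab` is assumed here:

```c
/* Refill ordering for C7 established by this commit:
 * Tiny Page Box -> Warm Pool -> Shared Pool. */
#include <stddef.h>

typedef struct SuperSlab SuperSlab;
extern int tiny_page_box_refill(int class_idx, void** out, int max_out);
extern SuperSlab* tiny_warm_pool_pop(int class_idx);
extern SuperSlab* shared_pool_acquire_slab(int class_idx); /* signature assumed */

static int refill_sketch(int class_idx, void** out, int room) {
    /* 1) Pages this thread's TLS already holds: no locks, no registry scan. */
    int produced = tiny_page_box_refill(class_idx, out, room);
    if (produced > 0)
        return produced;
    /* 2) Per-thread Warm Pool of HOT SuperSlabs. */
    SuperSlab* ss = tiny_warm_pool_pop(class_idx);
    /* 3) Lock-taking Shared Pool cold path (what C7 should now mostly avoid). */
    if (!ss)
        ss = shared_pool_acquire_slab(class_idx);
    /* ...carve a batch from ss into out[] as the real refill code does... */
    (void)ss;
    return 0;
}
```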
### Usage Notes
- Verify 33KB-band behavior with ACE/Pool only; tune fragmentation via `HAKMEM_WMAX_MID` (default 2.0).
- Preventing Tiny header misclassification: the mandatory Superslab-registration check is kept in free/fast-free.
- If the old Mid MT is needed, consult another branch or a past commit (it no longer exists on this branch).
### 性能の現状Random Mixed, HEAD
- 条件: `bench_random_mixed_hakmem 1000000 256 42`1T, ws=256, RELEASE, 161024B
- HAKMEM: 約 5.0M ops/s
- system malloc: 約 90100M ops/s
- mimalloc: 約 120130M ops/s
- 条件: `bench_random_mixed_hakmem 1000000 256 42` +
`HAKMEM_BENCH_MIN_SIZE=8 HAKMEM_BENCH_MAX_SIZE=128`Tiny-only, 8128B
- HAKMEM Tiny Front: 約 8090M ops/smimalloc と同オーダー)
- 条件: `bench_random_mixed_hakmem 1000000 256 42` +
`HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024`Tiny C5C7 のみ)
- HAKMEM: 約 4.74.8M ops/s
- 結論:
- Tiny front 自体8128Bは十分速く、mimalloc と同オーダーまで出ている。
- 1291024B の Tiny C5C7 経路で Unified Cache hit=0 / Shared Pool ロック多発というボトルネックがあり、
Random Mixed 全体の性能を支配している。
### Remaining Tasks / Proposals
1. Tidy up/archive the Mid MT-related documents and scripts under docs/benchmarks/scripts.
2. Re-measure footprint vs. hit rate with a lightweight W_MAX/Cap A/B (env vars are enough).
3. The dirty status of `core/box/front_gate_classifier.d`, `hakmem.d`, and `mimalloc-bench` can be ignored/cleaned as needed.
### Next Steps (priority: verify and tune the C7 Page Box)
1. **Measure the effectiveness of the C7 Page Box path** (see the sketch after this list)
   - With `HAKMEM_BENCH_MIN_SIZE=129 HAKMEM_BENCH_MAX_SIZE=1024` and `HAKMEM_MEASURE_UNIFIED_CACHE=1`,
     run `bench_random_mixed_hakmem 1000000 256 42` and compare, for C7:
     - the number of Unified Cache refills and their average cycles
     - the number of `shared_pool_acquire_slab(C7)` lock acquisitions
     between Page Box ON and OFF (`HAKMEM_TINY_PAGE_BOX_CLASSES=` unset vs `7`).
2. **Tune the C7 Unified Cache capacity and batch size**
   - Vary `HAKMEM_TINY_UNIFIED_C7` and the `max_batch` setting in `unified_cache_refill()` while observing
     the C7 hit rate, Shared Pool lock count, and throughput with the Page Box ON, to find the capacity/batch size that suits C7 best.
3. **Decide whether to extend the Page Box to C5/C6**
   - If C7 shows a solid win (greatly reduced Shared Pool locking + higher throughput), try
     `HAKMEM_TINY_PAGE_BOX_CLASSES=5,6,7` and check stability and performance with C5/C6 also promoted to Tiny-Plus.
   - If that holds up, consider moving the default profile toward "Page Box enabled for C5-C7".
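
A minimal harness for step 1, assuming this commit's printers (`unified_cache_print_measurements` / `shared_pool_print_measurements`, both shown in the diffs below). The constructor/atexit wiring is illustrative, not part of the commit:

```c
/* Dump the per-class Unified Cache and Shared Pool tables at process exit.
 * Run the bench with HAKMEM_MEASURE_UNIFIED_CACHE=1; with measurement off,
 * the printers report nothing useful. */
#include <stdlib.h>

void unified_cache_print_measurements(void); /* per-class hits/miss/avg refill cycles */
void shared_pool_print_measurements(void);   /* per-class Stage2/Stage3 lock counts */

static void hakmem_dump_measurements(void) {
    unified_cache_print_measurements();
    shared_pool_print_measurements();
}

__attribute__((constructor))
static void hakmem_install_dump(void) {
    atexit(hakmem_dump_measurements);
}
```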
### Notes
- The page-fault problem is largely resolved by the Prefault Box + warmup; the main bottleneck has moved to the user-space boxes (Unified Cache / free / Pool).
- Further optimization should focus not on removing boxes, but on reducing how many boxes the HOT tier touches, i.e. how far the simple Tiny-style path can be extended.


@@ -219,12 +219,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
# Targets
TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o hakmem_smallmid_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/wrapper_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o
OBJS = $(OBJS_BASE)
# Shared library
SHARED_LIB = libhakmem.so
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o hakmem_smallmid_superslab_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o
# Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1)
ifeq ($(POOL_TLS_PHASE1),1)
@@ -251,7 +251,7 @@ endif
# Benchmark targets
BENCH_HAKMEM = bench_allocators_hakmem
BENCH_SYSTEM = bench_allocators_system
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o hakmem_smallmid_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/wrapper_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -428,7 +428,7 @@ test-box-refactor: box-refactor
./larson_hakmem 10 8 128 1024 1 12345 4
# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o hakmem_smallmid_superslab.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/box/tiny_env_box.o core/box/wrapper_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o core/box/tiny_route_box.o
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/tiny_sizeclass_hist_box.o core/box/pagefault_telemetry_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/wrapper_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o


@@ -46,6 +46,37 @@ int main(int argc, char** argv){
int ws = (argc>2)? atoi(argv[2]) : 8192; // working-set slots
uint32_t seed = (argc>3)? (uint32_t)strtoul(argv[3],NULL,10) : 1234567u;
// Size range (for Tiny-only / Non-Tiny-only comparisons)
// Default: 16..1040 bytes (same as the original behavior)
size_t min_size = 16u;
size_t max_size = 16u + 0x3FFu; // 16..1040, i.e. roughly 16..1024
// Precedence: argv[4]/argv[5] -> ENV -> default
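// Example (illustrative invocation): ./bench_random_mixed_hakmem 1000000 256 42 129 1024
// would pin the size range to the Tiny C5-C7 band (argv overrides the ENV settings).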
if (argc > 4) {
long v = atol(argv[4]);
if (v > 0) min_size = (size_t)v;
} else {
const char* e = getenv("HAKMEM_BENCH_MIN_SIZE");
if (e && *e) {
long v = atol(e);
if (v > 0) min_size = (size_t)v;
}
}
if (argc > 5) {
long v = atol(argv[5]);
if (v > 0) max_size = (size_t)v;
} else {
const char* e = getenv("HAKMEM_BENCH_MAX_SIZE");
if (e && *e) {
long v = atol(e);
if (v > 0) max_size = (size_t)v;
}
}
if (min_size < 1) min_size = 1;
if (max_size < min_size) max_size = min_size;
if (cycles <= 0) cycles = 1;
if (ws <= 0) ws = 1024;
@@ -79,6 +110,8 @@ int main(int argc, char** argv){
slots[idx] = NULL;
} else {
size_t sz = 16u + (r & 0x3FFu);
if (sz < min_size) sz = min_size;
if (sz > max_size) sz = max_size;
void* p = malloc(sz);
if (p) {
((unsigned char*)p)[0] = (unsigned char)r;
@@ -115,7 +148,9 @@ int main(int argc, char** argv){
slots[idx] = NULL;
warmup_frees++;
} else {
size_t sz = 16u + (r & 0x3FFu); // 16..1040 bytes
size_t sz = 16u + (r & 0x3FFu); // 16..1040 bytes (clamped below)
if (sz < min_size) sz = min_size;
if (sz > max_size) sz = max_size;
void* p = malloc(sz);
if (p) {
((unsigned char*)p)[0] = (unsigned char)r;
@@ -152,8 +187,10 @@ int main(int argc, char** argv){
slots[idx] = NULL;
frees++;
} else {
// 16..1024 bytes (power-of-two-ish skew)
// 16..1024 bytes (power-of-two-ish skew, then clamped)
size_t sz = 16u + (r & 0x3FFu); // 16..1040 (approx 16..1024)
if (sz < min_size) sz = min_size;
if (sz > max_size) sz = max_size;
if (0 && i > 28300) { // DISABLED (Phase 2 perf)
fprintf(stderr, "[MALLOC] i=%d sz=%zu idx=%d\n", i, sz, idx);
fflush(stderr);

core/box/tiny_page_box.c Normal file (6 lines)

@@ -0,0 +1,6 @@
#include "tiny_page_box.h"
// TLS state definitions for Tiny Page Box
__thread TinyPageBoxState g_tiny_page_box_state[TINY_NUM_CLASSES];
__thread int g_tiny_page_box_init_done = 0;

core/box/tiny_page_box.h Normal file (321 lines)

@@ -0,0 +1,321 @@
// tiny_page_box.h - Tiny Page Box (Tiny-Plus layer for mid-size classes)
//
// Purpose:
// - Provide a per-class "page-level" freelist box that sits between
// Unified Cache and the Superslab/Warm Pool backend.
// - First target: class C7 (≈1KB) to reduce Shared Pool pressure.
//
// Box Contract:
// - API is generic over class_idx (0-7), but enabled-classes are controlled
// by ENV so that we can start with C7 only and later extend to C5/C6.
// - When enabled for a class:
// tiny_page_box_refill(class_idx, out, max) will try to supply up to
// `max` BASE pointers using per-page freelist before falling back.
// - When disabled for a class: the box returns 0 and caller uses legacy path.
//
// ENV:
// HAKMEM_TINY_PAGE_BOX_CLASSES (optional)
// - Comma-separated class indices, e.g. "7" or "5,6,7"
// - Default: only class 7 is enabled ("7")
#ifndef TINY_PAGE_BOX_H
#define TINY_PAGE_BOX_H
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "../hakmem_tiny_config.h"
#include "../tiny_box_geometry.h"
#include "../tiny_tls.h"
#include "../superslab/superslab_types.h" // For TinySlabMeta, SuperSlab
#include "../box/tiny_next_ptr_box.h" // For tiny_next_read()
#include "../hakmem_tiny_superslab.h" // For tiny_stride_for_class(), base helpers, superslab_ref_inc/dec
// Superslab active counter (counter update kept consistent with the Release Guard Box)
extern void ss_active_add(SuperSlab* ss, uint32_t n);
// Maximum number of pages retained per class.
// The C7-only experiment needs just 1-2 pages, but allow up to 4 with a future C5/C6 extension in mind.
#ifndef TINY_PAGE_BOX_MAX_PAGES
#define TINY_PAGE_BOX_MAX_PAGES 4
#endif
// Metadata for one page
typedef struct TinyPageDesc {
SuperSlab* ss;
TinySlabMeta* meta;
uint8_t* base;
uint8_t slab_idx;
uint8_t _pad[3];
} TinyPageDesc;
// Internal per-class page box state.
// In Phase 2:
// - enabled: whether the Page Box is used for this class
// - num_pages: number of pages currently held (0..TINY_PAGE_BOX_MAX_PAGES)
// - pages[]: small ring buffer of C5/C6/C7 pages the TLS has grabbed
typedef struct TinyPageBoxState {
uint8_t enabled; // 1=Page Box enabled for this class, 0=disabled
uint8_t num_pages; // number of valid pages[] entries
uint8_t _pad[2];
TinyPageDesc pages[TINY_PAGE_BOX_MAX_PAGES];
} TinyPageBoxState;
// TLS state: one TinyPageBoxState per class (per-thread Box)
extern __thread TinyPageBoxState g_tiny_page_box_state[TINY_NUM_CLASSES];
// One-shot init guard (per-thread)
extern __thread int g_tiny_page_box_init_done;
// Helper: parse class list from ENV and set enabled flags.
// Default behaviour (ENV unset/empty) is to enable class 7 only.
static inline void tiny_page_box_init_once(void) {
if (__builtin_expect(g_tiny_page_box_init_done, 1)) {
return;
}
// Clear all state
memset(g_tiny_page_box_state, 0, sizeof(g_tiny_page_box_state));
const char* env = getenv("HAKMEM_TINY_PAGE_BOX_CLASSES");
if (!env || !*env) {
// Default: enable only C7
if (7 < TINY_NUM_CLASSES) {
g_tiny_page_box_state[7].enabled = 1;
}
} else {
// Parse simple comma-separated list of integers: "5,6,7"
// We deliberately keep parsing logic simple and robust.
const char* p = env;
while (*p) {
// Skip non-digit characters (commas/spaces)
while (*p && (*p < '0' || *p > '9')) {
p++;
}
if (!*p) break;
int val = 0;
while (*p >= '0' && *p <= '9') {
val = val * 10 + (*p - '0');
p++;
}
if (val >= 0 && val < TINY_NUM_CLASSES) {
g_tiny_page_box_state[val].enabled = 1;
}
}
}
g_tiny_page_box_init_done = 1;
}
// Query: is Page Box enabled for this class?
static inline int tiny_page_box_is_enabled(int class_idx) {
if (__builtin_expect(!g_tiny_page_box_init_done, 0)) {
tiny_page_box_init_once();
}
if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
return 0;
}
return g_tiny_page_box_state[class_idx].enabled != 0;
}
// Forward declaration of the TLS slab state (see tiny_tls.h)
extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES];
// Hook invoked when a new TLS slab is bound.
// Registering the page here lets a subsequent unified_cache_refill() prefer
// "pages the TLS already holds" before falling down to the Superslab/Warm Pool
// paths.
static inline void tiny_page_box_on_new_slab(TinyTLSSlab* tls)
{
if (!tls) {
return;
}
if (__builtin_expect(!g_tiny_page_box_init_done, 0)) {
tiny_page_box_init_once();
}
SuperSlab* ss = tls->ss;
TinySlabMeta* meta = tls->meta;
uint8_t* base = tls->slab_base;
int slab_idx = (int)tls->slab_idx;
if (!ss || !meta || !base) {
return;
}
int class_idx = (int)meta->class_idx;
if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
return;
}
TinyPageBoxState* st = &g_tiny_page_box_state[class_idx];
if (!st->enabled) {
return;
}
// If this page is already registered, just refresh it (do not bump the refcount)
for (uint8_t i = 0; i < st->num_pages; i++) {
TinyPageDesc* d = &st->pages[i];
if (d->ss == ss && d->slab_idx == tls->slab_idx) {
d->meta = meta;
d->base = base;
return;
}
}
// New registration: append at the tail if a slot is free, otherwise evict the head
uint8_t slot;
if (st->num_pages < TINY_PAGE_BOX_MAX_PAGES) {
slot = st->num_pages++;
} else {
// Evict the head and shift (at most 4 entries, so the cost is negligible)
TinyPageDesc* evict = &st->pages[0];
if (evict->ss) {
superslab_ref_dec(evict->ss);
}
if (TINY_PAGE_BOX_MAX_PAGES > 1) {
memmove(&st->pages[0],
&st->pages[1],
(TINY_PAGE_BOX_MAX_PAGES - 1) * sizeof(TinyPageDesc));
}
slot = (uint8_t)(st->num_pages - 1); // tail slot
}
TinyPageDesc* d = &st->pages[slot];
d->ss = ss;
d->meta = meta;
d->base = base;
d->slab_idx = tls->slab_idx;
// Keep the Superslab pinned while the Page Box tracks it
superslab_ref_inc(ss);
}
// Phase 1 implementation strategy:
// - For C7 (the class enabled by default), the freelist/carve logic of the
//   existing TLS slab (TinyTLSSlab) is concentrated here: blocks are cut only
//   from the slab the TLS currently points at, never by taking a new slab
//   from the Superslab.
// - If the TLS has no valid slab (tls->ss==NULL), return 0 and let the caller
//   acquire a slab via the existing Warm Pool / Superslab path.
//
// As a result:
// - the "Page Box" stands as a proper Box boundary,
// - the hot side keeps the order Page Box -> Warm Pool -> Shared Pool, and
// - Superslab/Shared Pool call frequency can be observed and tuned gradually.
static inline int tiny_page_box_refill(int class_idx,
void** out,
int max_out)
{
if (!tiny_page_box_is_enabled(class_idx)) {
return 0;
}
if (class_idx < 0 || class_idx >= TINY_NUM_CLASSES) {
return 0;
}
if (max_out <= 0) {
return 0;
}
TinyPageBoxState* st = &g_tiny_page_box_state[class_idx];
if (st->num_pages == 0) {
return 0;
}
size_t stride = tiny_stride_for_class(class_idx);
if (stride == 0) {
return 0;
}
int produced = 0;
// Walk the held pages in order, supplying blocks freelist-first, then by carving.
// The Page Box never touches the Superslab or Tier/Guard machinery; it only
// works on "pages the TLS already holds".
for (uint8_t idx = 0; idx < st->num_pages && produced < max_out; /* idx advanced in the loop body */) {
TinyPageDesc* d = &st->pages[idx];
SuperSlab* ss = d->ss;
TinySlabMeta* m = d->meta;
uint8_t* base = d->base;
// Drop invalidated entries on the spot
if (!ss || !m || !base || (int)m->class_idx != class_idx) {
if (ss) {
superslab_ref_dec(ss);
}
uint8_t last = (uint8_t)(st->num_pages - 1);
if (idx < last) {
st->pages[idx] = st->pages[last];
}
st->pages[last].ss = NULL;
st->pages[last].meta = NULL;
st->pages[last].base = NULL;
st->pages[last].slab_idx = 0;
st->num_pages--;
continue; // re-evaluate the entry swapped into this idx
}
int local_produced = 0;
// First, pop from the freelist
while (produced < max_out && m->freelist) {
void* p = m->freelist;
void* next_node = tiny_next_read(class_idx, p);
// Header write (same convention as the TLS SLL)
#if HAKMEM_TINY_HEADER_CLASSIDX
*(uint8_t*)p = (uint8_t)(0xa0 | (class_idx & 0x0f));
// Fence so the freelist update and the publication into out[] are not reordered
__atomic_thread_fence(__ATOMIC_RELEASE);
#endif
m->freelist = next_node;
m->used++;
out[produced++] = p;
local_produced++;
}
// Once the freelist is exhausted, carve linearly from the same slab
while (produced < max_out && m->carved < m->capacity) {
void* p = (void*)(base + ((size_t)m->carved * stride));
#if HAKMEM_TINY_HEADER_CLASSIDX
*(uint8_t*)p = (uint8_t)(0xa0 | (class_idx & 0x0f));
#endif
m->carved++;
m->used++;
out[produced++] = p;
local_produced++;
}
if (local_produced > 0) {
// Advance the Superslab active counter (consistency with the Release Guard)
ss_active_add(ss, (uint32_t)local_produced);
}
// If this page is fully exhausted, drop it from the ring
if (!m->freelist && m->carved >= m->capacity) {
superslab_ref_dec(ss);
uint8_t last = (uint8_t)(st->num_pages - 1);
if (idx < last) {
st->pages[idx] = st->pages[last];
}
st->pages[last].ss = NULL;
st->pages[last].meta = NULL;
st->pages[last].base = NULL;
st->pages[last].slab_idx = 0;
st->num_pages--;
continue; // re-evaluate the entry swapped into this idx
}
idx++;
}
return produced;
}
#endif // TINY_PAGE_BOX_H


@@ -14,6 +14,7 @@
#include "../box/slab_carve_box.h" // Box: Slab Carving (inline O(slabs) scan)
#include "../box/warm_pool_prefill_box.h" // Box: Warm Pool Prefill (secondary optimization)
#include "../hakmem_env_cache.h" // Priority-2: ENV cache (eliminate syscalls)
#include "../box/tiny_page_box.h" // Tiny-Plus Page Box (C5C7 initial hook)
#include <stdlib.h>
#include <string.h>
#include <stdatomic.h>
@@ -28,6 +29,11 @@ _Atomic uint64_t g_unified_cache_hits_global = 0;
_Atomic uint64_t g_unified_cache_misses_global = 0;
_Atomic uint64_t g_unified_cache_refill_cycles_global = 0;
// Per-class countersTiny クラス別の Unified Cache 観測用)
_Atomic uint64_t g_unified_cache_hits_by_class[TINY_NUM_CLASSES] = {0};
_Atomic uint64_t g_unified_cache_misses_by_class[TINY_NUM_CLASSES] = {0};
_Atomic uint64_t g_unified_cache_refill_cycles_by_class[TINY_NUM_CLASSES] = {0};
// Helper: Get cycle count (x86_64 rdtsc)
static inline uint64_t read_tsc(void) {
#if defined(__x86_64__) || defined(_M_X64)
@@ -418,11 +424,53 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
}
if (room <= 0) return HAK_BASE_FROM_RAW(NULL);
if (room > 128) room = 128; // Batch size limit
// Batch size limit (per-class tuning)
// - default: 128
// - C5-C7 (the mixed 129B-1024B range): widened to 256 to lower refill frequency
// - for safety, always kept consistent with the out[] array size (256) below
int max_batch = (class_idx >= 5 && class_idx <= 7) ? 256 : 128;
if (room > max_batch) room = max_batch;
void* out[128];
// NOTE:
// - C5-C7 widen max_batch to 256, so the stack array also reserves 256 entries.
// - Hence room <= max_batch <= 256 always holds, preventing out[] overruns.
void* out[256];
int produced = 0;
// ========== PAGE BOX HOT PATH (Tiny-Plus layer): try the Page Box FIRST ==========
// Consolidates the C7 page-level freelist handling here: tiny_page_box_refill()
// supplies blocks from pages the TLS already holds, and returns 0 when the box
// is disabled or empty, in which case the legacy path below takes over.
if (tiny_page_box_is_enabled(class_idx)) {
int page_produced = tiny_page_box_refill(class_idx, out, room);
if (page_produced > 0) {
// Store blocks into cache and return first
void* first = out[0];
for (int i = 1; i < page_produced; i++) {
cache->slots[cache->tail] = out[i];
cache->tail = (cache->tail + 1) & cache->mask;
}
#if !HAKMEM_BUILD_RELEASE
g_unified_cache_miss[class_idx]++;
#endif
if (measure) {
uint64_t end_cycles = read_tsc();
uint64_t delta = end_cycles - start_cycles;
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global,
delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_by_class[class_idx],
delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_by_class[class_idx],
1, memory_order_relaxed);
}
return HAK_BASE_FROM_RAW(first);
}
}
// ========== WARM POOL HOT PATH: Check warm pool FIRST ==========
// This is the critical optimization - avoid superslab_refill() registry scan
SuperSlab* warm_ss = tiny_warm_pool_pop(class_idx);
@@ -455,8 +503,15 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
if (measure) {
uint64_t end_cycles = read_tsc();
uint64_t delta = end_cycles - start_cycles;
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global, delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global,
delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_global,
1, memory_order_relaxed);
// Per-class 集計C5C7 の refill コストを可視化)
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_by_class[class_idx],
delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_by_class[class_idx],
1, memory_order_relaxed);
}
return HAK_BASE_FROM_RAW(first);
@@ -574,8 +629,15 @@ hak_base_ptr_t unified_cache_refill(int class_idx) {
if (measure) {
uint64_t end_cycles = read_tsc();
uint64_t delta = end_cycles - start_cycles;
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global, delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_global,
delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_global,
1, memory_order_relaxed);
// Per-class aggregation
atomic_fetch_add_explicit(&g_unified_cache_refill_cycles_by_class[class_idx],
delta, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_misses_by_class[class_idx],
1, memory_order_relaxed);
}
return HAK_BASE_FROM_RAW(first); // Return first block (BASE pointer)
@@ -615,6 +677,34 @@ void unified_cache_print_measurements(void) {
fprintf(stderr, "Hits: %llu\n", (unsigned long long)hits);
fprintf(stderr, "Misses: %llu\n", (unsigned long long)misses);
fprintf(stderr, "Hit Rate: %.1f%%\n", hit_rate);
fprintf(stderr, "Avg Refill Cycles: %.0f (est. %.2fus @ 1GHz)\n", avg_refill_cycles, avg_refill_us);
fprintf(stderr, "Avg Refill Cycles: %.0f (est. %.2fus @ 1GHz)\n",
avg_refill_cycles, avg_refill_us);
// Per-class breakdownTiny クラス 0-7、特に C5C7 を観測)
fprintf(stderr, "\nPer-class Unified Cache (Tiny classes):\n");
for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) {
uint64_t ch = atomic_load_explicit(&g_unified_cache_hits_by_class[cls],
memory_order_relaxed);
uint64_t cm = atomic_load_explicit(&g_unified_cache_misses_by_class[cls],
memory_order_relaxed);
uint64_t cc = atomic_load_explicit(&g_unified_cache_refill_cycles_by_class[cls],
memory_order_relaxed);
uint64_t ct = ch + cm;
if (ct == 0 && cc == 0) {
continue; // skip unused classes
}
double cls_hit_rate = ct > 0 ? (100.0 * (double)ch / (double)ct) : 0.0;
double cls_avg_refill = cm > 0 ? (double)cc / (double)cm : 0.0;
double cls_avg_us = cls_avg_refill / 1000.0;
fprintf(stderr,
" C%d: hits=%llu miss=%llu hit=%.1f%% avg_refill=%.0f cyc (%.2fus @1GHz)\n",
cls,
(unsigned long long)ch,
(unsigned long long)cm,
cls_hit_rate,
cls_avg_refill,
cls_avg_us);
}
fprintf(stderr, "========================================\n\n");
}


@@ -43,7 +43,7 @@ core/front/tiny_unified_cache.o: core/front/tiny_unified_cache.c \
core/front/../box/warm_pool_prefill_box.h \
core/front/../box/../tiny_tls.h \
core/front/../box/../box/warm_pool_stats_box.h \
core/front/../hakmem_env_cache.h
core/front/../hakmem_env_cache.h core/front/../box/tiny_page_box.h
core/front/tiny_unified_cache.h:
core/front/../hakmem_build_flags.h:
core/front/../hakmem_tiny_config.h:
@@ -108,3 +108,4 @@ core/front/../box/warm_pool_prefill_box.h:
core/front/../box/../tiny_tls.h:
core/front/../box/../box/warm_pool_stats_box.h:
core/front/../hakmem_env_cache.h:
core/front/../box/tiny_page_box.h:


@@ -40,6 +40,11 @@ extern _Atomic uint64_t g_unified_cache_hits_global;
extern _Atomic uint64_t g_unified_cache_misses_global;
extern _Atomic uint64_t g_unified_cache_refill_cycles_global;
// Per-class counters観測用 Box、ENV でのみ有効)
extern _Atomic uint64_t g_unified_cache_hits_by_class[TINY_NUM_CLASSES];
extern _Atomic uint64_t g_unified_cache_misses_by_class[TINY_NUM_CLASSES];
extern _Atomic uint64_t g_unified_cache_refill_cycles_by_class[TINY_NUM_CLASSES];
// Print statistics function
void unified_cache_print_measurements(void);
@@ -102,10 +107,15 @@ static inline size_t unified_capacity(int class_idx) {
snprintf(env_name, sizeof(env_name), "HAKMEM_TINY_UNIFIED_C%d", class_idx);
const char* e = getenv(env_name);
// Default: Hot_2048 strategy (C2/C3=2048, others=64)
// Default: Hot_2048 strategy
// - C2/C3 (128B/256B): 2048 slots (ultra-hot Tiny)
// - C5/C6/C7 (>=129B): 128 slots (widened for mixed workloads)
// - others: 64 slots (cold)
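// Example: HAKMEM_TINY_UNIFIED_C7=512 would override the C7 capacity
// (512 is an illustrative value, not a tuned recommendation).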
size_t default_cap = 64; // Cold classes
if (class_idx == 2 || class_idx == 3) {
default_cap = 2048; // Hot classes (128B, 256B)
default_cap = 2048; // Hot Tiny classes (128B, 256B)
} else if (class_idx >= 5 && class_idx <= 7) {
default_cap = 128; // Mixed workload classes (C5-C7: 129B-1024B)
}
g_cap[class_idx] = (e && *e) ? (size_t)atoi(e) : default_cap;
@@ -265,9 +275,13 @@ static inline hak_base_ptr_t unified_cache_pop_or_refill(int class_idx) {
#if !HAKMEM_BUILD_RELEASE
g_unified_cache_hit[class_idx]++;
#endif
// Performance measurement: count cache hits
// Performance measurement: count cache hitsENV 有効時のみ)
if (__builtin_expect(unified_cache_measure_check(), 0)) {
atomic_fetch_add_explicit(&g_unified_cache_hits_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_hits_global,
1, memory_order_relaxed);
// Per-class ヒットC5C7 の利用率も可視化)
atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx],
1, memory_order_relaxed);
}
return HAK_BASE_FROM_RAW(base); // Hit! (2-3 cache misses total)
}


@@ -24,12 +24,22 @@ _Atomic uint64_t g_sp_stage2_lock_acquired_global = 0;
_Atomic uint64_t g_sp_stage3_lock_acquired_global = 0;
_Atomic uint64_t g_sp_alloc_lock_contention_global = 0;
// Per-class lock acquisition statistics (per-Tiny-class lock load observation)
_Atomic uint64_t g_sp_stage2_lock_acquired_by_class[TINY_NUM_CLASSES_SS] = {0};
_Atomic uint64_t g_sp_stage3_lock_acquired_by_class[TINY_NUM_CLASSES_SS] = {0};
// Check if measurement is enabled (cached)
static inline int sp_measure_enabled(void) {
static int g_measure = -1;
if (__builtin_expect(g_measure == -1, 0)) {
const char* e = getenv("HAKMEM_MEASURE_UNIFIED_CACHE");
g_measure = (e && *e && *e != '0') ? 1 : 0;
if (g_measure == 1) {
// When measurement is ON, also enable the per-class stage stats
// (Stage1/2/3 hit counts accumulate into g_sp_stage*_hits)
extern int g_sp_stage_stats_enabled;
g_sp_stage_stats_enabled = 1;
}
}
return g_measure;
}
@@ -319,8 +329,13 @@ stage2_fallback:
// Performance measurement: count Stage 2 lock acquisitions
if (__builtin_expect(sp_measure_enabled(), 0)) {
atomic_fetch_add_explicit(&g_sp_stage2_lock_acquired_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_alloc_lock_contention_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_stage2_lock_acquired_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_alloc_lock_contention_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(
&g_sp_stage2_lock_acquired_by_class[class_idx],
1, memory_order_relaxed);
}
// Update SuperSlab metadata under mutex
@@ -408,8 +423,13 @@ stage2_scan:
// Performance measurement: count Stage 2 scan lock acquisitions
if (__builtin_expect(sp_measure_enabled(), 0)) {
atomic_fetch_add_explicit(&g_sp_stage2_lock_acquired_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_alloc_lock_contention_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_stage2_lock_acquired_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_alloc_lock_contention_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(
&g_sp_stage2_lock_acquired_by_class[class_idx],
1, memory_order_relaxed);
}
// Update SuperSlab metadata under mutex
@@ -486,8 +506,12 @@ stage2_scan:
// Performance measurement: count Stage 3 lock acquisitions
if (__builtin_expect(sp_measure_enabled(), 0)) {
atomic_fetch_add_explicit(&g_sp_stage3_lock_acquired_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_alloc_lock_contention_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_stage3_lock_acquired_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_alloc_lock_contention_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_sp_stage3_lock_acquired_by_class[class_idx],
1, memory_order_relaxed);
}
// ========== Stage 3: Get new SuperSlab ==========
@@ -619,9 +643,12 @@ void shared_pool_print_measurements(void) {
return; // Measurement disabled
}
uint64_t stage2 = atomic_load_explicit(&g_sp_stage2_lock_acquired_global, memory_order_relaxed);
uint64_t stage3 = atomic_load_explicit(&g_sp_stage3_lock_acquired_global, memory_order_relaxed);
uint64_t total_locks = atomic_load_explicit(&g_sp_alloc_lock_contention_global, memory_order_relaxed);
uint64_t stage2 = atomic_load_explicit(&g_sp_stage2_lock_acquired_global,
memory_order_relaxed);
uint64_t stage3 = atomic_load_explicit(&g_sp_stage3_lock_acquired_global,
memory_order_relaxed);
uint64_t total_locks = atomic_load_explicit(&g_sp_alloc_lock_contention_global,
memory_order_relaxed);
if (total_locks == 0) {
fprintf(stderr, "\n========================================\n");
@@ -644,5 +671,27 @@
(unsigned long long)stage3, stage3_pct);
fprintf(stderr, "Total Contention: %llu lock acquisitions\n",
(unsigned long long)total_locks);
// Per-class breakdownTiny 用クラス 0-7、特に C5C7 を観測)
fprintf(stderr, "\nPer-class Shared Pool Locks (Stage2/Stage3):\n");
for (int cls = 0; cls < TINY_NUM_CLASSES_SS; cls++) {
uint64_t s2c = atomic_load_explicit(
&g_sp_stage2_lock_acquired_by_class[cls],
memory_order_relaxed);
uint64_t s3c = atomic_load_explicit(
&g_sp_stage3_lock_acquired_by_class[cls],
memory_order_relaxed);
uint64_t tc = s2c + s3c;
if (tc == 0) {
continue; // skip classes with no lock acquisitions
}
fprintf(stderr,
" C%d: Stage2=%llu Stage3=%llu Total=%llu\n",
cls,
(unsigned long long)s2c,
(unsigned long long)s3c,
(unsigned long long)tc);
}
fprintf(stderr, "========================================\n\n");
}


@@ -34,6 +34,10 @@ extern _Atomic uint64_t g_sp_stage3_hits[TINY_NUM_CLASSES_SS];
extern int g_sp_stage_stats_enabled;
void sp_stage_stats_init(void);
// Per-class lock acquisition statisticsStage2/Stage3 の alloc_lock 観測用)
extern _Atomic uint64_t g_sp_stage2_lock_acquired_by_class[TINY_NUM_CLASSES_SS];
extern _Atomic uint64_t g_sp_stage3_lock_acquired_by_class[TINY_NUM_CLASSES_SS];
// Internal Helpers (Shared between acquire/release/pool)
void shared_pool_ensure_capacity_unlocked(uint32_t min_capacity);
SuperSlab* sp_internal_allocate_superslab(int class_idx);


@@ -1,352 +1,34 @@
/**
* hakmem_smallmid.c - Small-Mid Allocator Front Box Implementation
*
* Phase 17-1: Front Box Only (No Dedicated SuperSlab Backend)
*
* Strategy (ChatGPT reviewed):
* - Thin front layer with TLS freelist (256B/512B/1KB)
* - Backend: Use existing Tiny SuperSlab/SharedPool APIs
* - Goal: Measure performance impact before building dedicated backend
* - A/B test: Does Small-Mid front improve 256-1KB performance?
*
* Architecture:
* - 3 size classes: 256B/512B/1KB (reduced from 5)
* - TLS freelist for fast alloc/free (static inline)
* - Backend: Call Tiny allocator APIs (reuse existing infrastructure)
* - ENV controlled (HAKMEM_SMALLMID_ENABLE=1)
*
* Created: 2025-11-16
* Updated: 2025-11-16 (Phase 17-1 revision - Front Box only)
*/
// hakmem_smallmid_superslab.c - Stub implementation
// The original SmallMid SuperSlab backend has been moved to archive/smallmid/.
// Current builds do not use SmallMid, so this file is an empty stub.
#include "hakmem_smallmid.h"
#include "hakmem_build_flags.h"
#include "hakmem_smallmid_superslab.h" // Phase 17-2: Dedicated backend
#include "tiny_region_id.h" // For header writing
#include "hakmem_env_cache.h" // Priority-2: ENV cache
#include <string.h>
#include <pthread.h>
// ============================================================================
// TLS State
// ============================================================================
// Stub 実装SmallMid は無効なので、全て「何もしない / 常に失敗」)
__thread void* g_smallmid_tls_head[SMALLMID_NUM_CLASSES] = {NULL};
__thread uint32_t g_smallmid_tls_count[SMALLMID_NUM_CLASSES] = {0};
// ============================================================================
// Size Class Table (Phase 17-1: 3 classes)
// ============================================================================
const size_t g_smallmid_class_sizes[SMALLMID_NUM_CLASSES] = {
256, // SM0: 256B
512, // SM1: 512B
1024 // SM2: 1KB
};
// ============================================================================
// Global State
// ============================================================================
static pthread_mutex_t g_smallmid_init_lock = PTHREAD_MUTEX_INITIALIZER;
static int g_smallmid_initialized = 0;
static int g_smallmid_enabled = -1; // -1 = not checked, 0 = disabled, 1 = enabled
// ============================================================================
// Statistics (Debug)
// ============================================================================
#ifdef HAKMEM_SMALLMID_STATS
SmallMidStats g_smallmid_stats = {0};
void smallmid_print_stats(void) {
fprintf(stderr, "\n=== Small-Mid Allocator Statistics ===\n");
fprintf(stderr, "Total allocs: %lu\n", g_smallmid_stats.total_allocs);
fprintf(stderr, "Total frees: %lu\n", g_smallmid_stats.total_frees);
fprintf(stderr, "TLS hits: %lu\n", g_smallmid_stats.tls_hits);
fprintf(stderr, "TLS misses: %lu\n", g_smallmid_stats.tls_misses);
fprintf(stderr, "SuperSlab refills: %lu\n", g_smallmid_stats.superslab_refills);
if (g_smallmid_stats.total_allocs > 0) {
double hit_rate = (double)g_smallmid_stats.tls_hits / g_smallmid_stats.total_allocs * 100.0;
fprintf(stderr, "TLS hit rate: %.2f%%\n", hit_rate);
}
fprintf(stderr, "=======================================\n\n");
}
#endif
// ============================================================================
// ENV Control
// ============================================================================
bool smallmid_is_enabled(void) {
if (__builtin_expect(g_smallmid_enabled == -1, 0)) {
// Priority-2: Use cached ENV
g_smallmid_enabled = HAK_ENV_SMALLMID_ENABLE();
if (g_smallmid_enabled) {
SMALLMID_LOG("Small-Mid allocator ENABLED (ENV: HAKMEM_SMALLMID_ENABLE=1)");
} else {
SMALLMID_LOG("Small-Mid allocator DISABLED (default, set HAKMEM_SMALLMID_ENABLE=1 to enable)");
}
}
return (g_smallmid_enabled == 1);
}
// ============================================================================
// Initialization
// ============================================================================
void smallmid_init(void) {
if (g_smallmid_initialized) return;
pthread_mutex_lock(&g_smallmid_init_lock);
if (!g_smallmid_initialized) {
SMALLMID_LOG("Initializing Small-Mid Front Box...");
// Check ENV
if (!smallmid_is_enabled()) {
SMALLMID_LOG("Small-Mid allocator is disabled, skipping initialization");
g_smallmid_initialized = 1;
pthread_mutex_unlock(&g_smallmid_init_lock);
return;
}
// Phase 17-1: No dedicated backend - use existing Tiny infrastructure
// No additional initialization needed (TLS state is static)
g_smallmid_initialized = 1;
SMALLMID_LOG("Small-Mid Front Box initialized (3 classes: 256B/512B/1KB, backend=Tiny)");
}
pthread_mutex_unlock(&g_smallmid_init_lock);
}
// ============================================================================
// TLS Freelist Operations
// ============================================================================
/**
* smallmid_tls_pop - Pop a block from TLS freelist
*
* @param class_idx Size class index
* @return Block pointer (with header), or NULL if empty
*/
static inline void* smallmid_tls_pop(int class_idx) {
void* head = g_smallmid_tls_head[class_idx];
if (!head) return NULL;
// Read next pointer (stored at offset 0 in user data, after 1-byte header)
void* next = *(void**)((uint8_t*)head + 1);
g_smallmid_tls_head[class_idx] = next;
g_smallmid_tls_count[class_idx]--;
#ifdef HAKMEM_SMALLMID_STATS
__atomic_fetch_add(&g_smallmid_stats.tls_hits, 1, __ATOMIC_RELAXED);
#endif
return head;
}
/**
* smallmid_tls_push - Push a block to TLS freelist
*
* @param class_idx Size class index
* @param ptr Block pointer (with header)
* @return true on success, false if TLS full
*/
static inline bool smallmid_tls_push(int class_idx, void* ptr) {
uint32_t capacity = smallmid_tls_capacity(class_idx);
if (g_smallmid_tls_count[class_idx] >= capacity) {
return false; // TLS full
}
// Write next pointer (at offset 0 in user data, after 1-byte header)
void* head = g_smallmid_tls_head[class_idx];
*(void**)((uint8_t*)ptr + 1) = head;
g_smallmid_tls_head[class_idx] = ptr;
g_smallmid_tls_count[class_idx]++;
return true;
}
// ============================================================================
// TLS Refill (Phase 17-2: Batch refill from dedicated SuperSlab)
// ============================================================================
/**
* smallmid_tls_refill - Refill TLS freelist from SuperSlab
*
* @param class_idx Size class index
* @return true on success, false on failure
*
* Strategy (Phase 17-2):
* - Batch refill 8-16 blocks from dedicated SmallMid SuperSlab
* - No Tiny delegation (completely separate backend)
* - Amortizes SuperSlab lookup cost across multiple blocks
* - Expected cost: ~1-2 instructions per block (amortized)
*/
static bool smallmid_tls_refill(int class_idx) {
// Determine batch size based on size class
const int batch_sizes[SMALLMID_NUM_CLASSES] = {
SMALLMID_REFILL_BATCH_256B, // 16 blocks
SMALLMID_REFILL_BATCH_512B, // 12 blocks
SMALLMID_REFILL_BATCH_1KB // 8 blocks
};
int batch_max = batch_sizes[class_idx];
void* batch[16]; // Max batch size
// Call SuperSlab batch refill
int refilled = smallmid_refill_batch(class_idx, batch, batch_max);
if (refilled == 0) {
SMALLMID_LOG("smallmid_tls_refill: SuperSlab refill failed (class=%d)", class_idx);
bool smallmid_is_in_range(size_t size) {
(void)size;
return false;
}
#ifdef HAKMEM_SMALLMID_STATS
__atomic_fetch_add(&g_smallmid_stats.tls_misses, 1, __ATOMIC_RELAXED);
__atomic_fetch_add(&g_smallmid_stats.superslab_refills, 1, __ATOMIC_RELAXED);
#endif
// Push blocks to TLS freelist (in reverse order for LIFO)
for (int i = refilled - 1; i >= 0; i--) {
void* user_ptr = batch[i];
void* base = (uint8_t*)user_ptr - 1;
if (!smallmid_tls_push(class_idx, base)) {
// TLS full - should not happen with proper batch sizing
SMALLMID_LOG("smallmid_tls_refill: TLS push failed (class=%d, i=%d)", class_idx, i);
break;
void smallmid_init(void) {
// No-op
}
}
SMALLMID_LOG("smallmid_tls_refill: Refilled %d blocks (class=%d)", refilled, class_idx);
return true;
}
// ============================================================================
// Allocation
// ============================================================================
void* smallmid_alloc(size_t size) {
// Check if enabled
if (!smallmid_is_enabled()) {
return NULL; // Disabled, fall through to Mid or other allocators
(void)size;
return NULL; // callers fall back to the other allocation paths
}
// Initialize if needed
if (__builtin_expect(!g_smallmid_initialized, 0)) {
smallmid_init();
smallmid_superslab_init(); // Phase 17-2: Initialize SuperSlab backend
}
// Validate size range
if (__builtin_expect(!smallmid_is_in_range(size), 0)) {
SMALLMID_LOG("smallmid_alloc: size %zu out of range [%d-%d]",
size, SMALLMID_MIN_SIZE, SMALLMID_MAX_SIZE);
return NULL;
}
// Get size class
int class_idx = smallmid_size_to_class(size);
if (__builtin_expect(class_idx < 0, 0)) {
SMALLMID_LOG("smallmid_alloc: invalid class for size %zu", size);
return NULL;
}
#ifdef HAKMEM_SMALLMID_STATS
__atomic_fetch_add(&g_smallmid_stats.total_allocs, 1, __ATOMIC_RELAXED);
#endif
// Fast path: Pop from TLS freelist
void* ptr = smallmid_tls_pop(class_idx);
if (ptr) {
SMALLMID_LOG("smallmid_alloc(%zu) = %p (TLS hit, class=%d)", size, ptr, class_idx);
return (uint8_t*)ptr + 1; // Return user pointer (skip header)
}
// TLS miss: Refill from SuperSlab (Phase 17-2: Batch refill)
if (!smallmid_tls_refill(class_idx)) {
SMALLMID_LOG("smallmid_alloc(%zu) = NULL (refill failed)", size);
return NULL;
}
// Retry TLS pop after refill
ptr = smallmid_tls_pop(class_idx);
if (!ptr) {
SMALLMID_LOG("smallmid_alloc(%zu) = NULL (TLS pop failed after refill)", size);
return NULL;
}
SMALLMID_LOG("smallmid_alloc(%zu) = %p (TLS refill, class=%d)", size, ptr, class_idx);
return (uint8_t*)ptr + 1; // Return user pointer (skip header)
}
// ============================================================================
// Free
// ============================================================================
void smallmid_free(void* ptr) {
if (!ptr) return;
// Check if enabled
if (!smallmid_is_enabled()) {
return; // Disabled, should not be called
(void)ptr;
// No-op
}
#ifdef HAKMEM_SMALLMID_STATS
__atomic_fetch_add(&g_smallmid_stats.total_frees, 1, __ATOMIC_RELAXED);
#endif
// Phase 17-2: Read header to identify size class
uint8_t* base = (uint8_t*)ptr - 1;
uint8_t header = *base;
// Small-Mid allocations have magic 0xb0
uint8_t magic = header & 0xf0;
int class_idx = header & 0x0f;
if (magic != 0xb0 || class_idx < 0 || class_idx >= SMALLMID_NUM_CLASSES) {
// Invalid header - should not happen
SMALLMID_LOG("smallmid_free(%p): Invalid header 0x%02x", ptr, header);
return;
}
// Fast path: Push to TLS freelist
if (smallmid_tls_push(class_idx, base)) {
SMALLMID_LOG("smallmid_free(%p): pushed to TLS (class=%d)", ptr, class_idx);
return;
}
// TLS full: Push to SuperSlab freelist (slow path)
// TODO Phase 17-2.1: Implement SuperSlab freelist push
// For now, just log and leak (will be fixed in next commit)
SMALLMID_LOG("smallmid_free(%p): TLS full, SuperSlab freelist not yet implemented", ptr);
// Placeholder: Write next pointer to freelist (unsafe without SuperSlab lookup)
// This will be properly implemented with smallmid_superslab_lookup() in Phase 17-2.1
}
// ============================================================================
// Thread Cleanup
// ============================================================================
void smallmid_thread_exit(void) {
if (!smallmid_is_enabled()) return;
// No-op
}
SMALLMID_LOG("smallmid_thread_exit: cleaning up TLS state");
// Phase 17-1: Return TLS blocks to Tiny backend
for (int i = 0; i < SMALLMID_NUM_CLASSES; i++) {
void* head = g_smallmid_tls_head[i];
while (head) {
void* next = *(void**)((uint8_t*)head + 1);
void* user_ptr = (uint8_t*)head + 1;
smallmid_backend_free(user_ptr, 0);
head = next;
}
g_smallmid_tls_head[i] = NULL;
g_smallmid_tls_count[i] = 0;
}
bool smallmid_is_enabled(void) {
return false;
}


@@ -1,242 +1,44 @@
/**
 * hakmem_smallmid.h - Small-Mid Allocator Stub (archived implementation)
 *
 * The current codebase no longer uses the SmallMid front end or back end;
 * Tiny/Tiny-Plus handles everything up to 1KB.
 *
 * The existing boundaries (calls from hak_alloc_api / hakmem_tiny) are kept,
 * however, so this header survives as an "always-disabled Stub Box" that
 * preserves the API only. The original implementation has been moved under
 * archive/smallmid/.
 *
 * Archived design (Phase 17, created 2025-11-16):
 * - Dedicated allocator layer for the 256B-4KB range, bridging the gap
 *   between Tiny (0-255B) and Mid (8KB+)
 * - Dedicated SuperSlab pool (completely separated from Tiny)
 * - 5 size classes: 256B / 512B / 1KB / 2KB / 4KB
 * - TLS freelist (same structure as the Tiny TLS SLL)
 * - Header-based fast free (Phase 7 technology)
 * - ENV control: HAKMEM_SMALLMID_ENABLE=1 for A/B testing
 * - Performance target: 10M-20M ops/s, a 2-4x improvement over Tiny C6/C7
 *   (512B/1KB) at 5.5M-5.9M ops/s (~6% of system malloc)
 */
#ifndef HAKMEM_SMALLMID_H
#define HAKMEM_SMALLMID_H
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef __cplusplus
extern "C" {
#endif
// ============================================================================
// Size Classes (Phase 17-1: Front Box Only, 3 classes)
// ============================================================================
// Keep only constants compatible with the old implementation (the values are
// unused, but they keep dependent code building)
#define SMALLMID_NUM_CLASSES 3
// Size class indices
#define SMALLMID_CLASS_256B 0 // 256B blocks
#define SMALLMID_CLASS_512B 1 // 512B blocks
#define SMALLMID_CLASS_1KB 2 // 1KB blocks
// Size boundaries, kept for convenience (must be > Tiny max when enabled;
// never exercised in practice because smallmid_is_enabled() == false)
#define SMALLMID_MIN_SIZE (256) // 256B
#define SMALLMID_MAX_SIZE (1024) // 1KB (reduced for Phase 17-1)
// ============================================================================
// TLS Freelist State
// ============================================================================
/**
* TLS freelist state (per-thread, per-class)
* - Same structure as Tiny TLS SLL
* - Completely separated from Tiny to avoid competition
*/
extern __thread void* g_smallmid_tls_head[SMALLMID_NUM_CLASSES];
extern __thread uint32_t g_smallmid_tls_count[SMALLMID_NUM_CLASSES];
// Capacity limits (per-class TLS cache)
// Phase 17-1: Conservative limits for Front Box
#define SMALLMID_TLS_CAPACITY_256B 32
#define SMALLMID_TLS_CAPACITY_512B 24
#define SMALLMID_TLS_CAPACITY_1KB 16
// ============================================================================
// Size Class Mapping
// ============================================================================
/**
* g_smallmid_class_sizes - Size class stride table
* Phase 17-1: [SM0]=256, [SM1]=512, [SM2]=1024
*/
extern const size_t g_smallmid_class_sizes[SMALLMID_NUM_CLASSES];
/**
* smallmid_size_to_class - Convert size to size class index
*
* @param size Allocation size (256-1024)
* @return Size class index (0-2), or -1 if out of range
*/
static inline int smallmid_size_to_class(size_t size) {
if (size <= 256) return SMALLMID_CLASS_256B;
if (size <= 512) return SMALLMID_CLASS_512B;
if (size <= 1024) return SMALLMID_CLASS_1KB;
return -1; // Out of range
}
/**
* smallmid_class_to_size - Convert size class to block size
*
* @param class_idx Size class index (0-2)
* @return Block size in bytes (256/512/1024)
*/
static inline size_t smallmid_class_to_size(int class_idx) {
static const size_t sizes[SMALLMID_NUM_CLASSES] = {
256, 512, 1024
};
return (class_idx >= 0 && class_idx < SMALLMID_NUM_CLASSES) ? sizes[class_idx] : 0;
}
/**
* smallmid_is_in_range - Check if size is in Small-Mid range
*
* @param size Allocation size
* @return true if 256B ≤ size ≤ 1KB
*
* PERF_OPT: Force inline to eliminate function call overhead in hot path
*/
__attribute__((always_inline))
static inline bool smallmid_is_in_range(size_t size) {
return (size >= SMALLMID_MIN_SIZE && size <= SMALLMID_MAX_SIZE);
}
/**
* smallmid_tls_capacity - Get TLS cache capacity for given class
*
* @param class_idx Size class index (0-2)
* @return TLS cache capacity
*/
static inline uint32_t smallmid_tls_capacity(int class_idx) {
static const uint32_t capacities[SMALLMID_NUM_CLASSES] = {
SMALLMID_TLS_CAPACITY_256B,
SMALLMID_TLS_CAPACITY_512B,
SMALLMID_TLS_CAPACITY_1KB
};
return (class_idx >= 0 && class_idx < SMALLMID_NUM_CLASSES) ? capacities[class_idx] : 0;
}
// ============================================================================
// API Functions
// ============================================================================
/**
* smallmid_init - Initialize Small-Mid allocator
*
* Call once at startup (thread-safe, idempotent)
* Sets up dedicated SuperSlab pool and TLS state
*/
void smallmid_init(void);
/**
 * smallmid_alloc - Allocate memory from Small-Mid pool (256B-1KB, Phase 17-1 range)
 *
 * @param size Allocation size (must be 256 ≤ size ≤ 1024)
* @return Allocated pointer with header, or NULL on failure
*
* Thread-safety: Lock-free (uses TLS)
* Performance: O(1) fast path (TLS freelist pop/push)
*
* Fast path:
* 1. Check TLS freelist (most common, ~3-5 instructions)
* 2. Refill from dedicated SuperSlab if TLS empty
* 3. Allocate new SuperSlab if pool exhausted (rare)
*
 * Header layout (Phase 7 compatible):
 *   [1 byte header: 0xb0 | class_idx][user data]
*/
void* smallmid_alloc(size_t size);
/**
* smallmid_free - Free memory allocated by smallmid_alloc
*
* @param ptr Pointer to free (must be from smallmid_alloc)
*
* Thread-safety: Lock-free if freeing to own thread's TLS
* Performance: O(1) fast path (header-based class identification)
*
* Header-based fast free (Phase 7 technology):
* - Read 1-byte header to get class_idx
* - Push to TLS freelist (or remote drain if TLS full)
*/
void smallmid_free(void* ptr);
/**
* smallmid_thread_exit - Cleanup thread-local state
*
* Called on thread exit to release TLS resources
* Should be registered via pthread_key_create or __attribute__((destructor))
*/
void smallmid_thread_exit(void);
// ============================================================================
// ENV Control
// ============================================================================
/**
 * smallmid_is_enabled - Check if the Small-Mid allocator is enabled
 *
 * ENV: HAKMEM_SMALLMID_ENABLE=1 to enable (default: 0 / disabled)
 * The current stub implementation always returns false.
 * @return true if enabled, false otherwise
 */
bool smallmid_is_enabled(void);
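/*
 * Expected call-site shape (illustrative sketch, not the actual hak_alloc_api
 * code). The boundary stays intact, but the branch is never taken because the
 * stub always returns false:
 *
 *   if (smallmid_is_enabled() && smallmid_is_in_range(size)) {
 *       void* p = smallmid_alloc(size);
 *       if (p) return p;
 *   }
 *   // fall through to Tiny / Tiny-Plus (covers sizes up to 1KB)
 */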
// ============================================================================
// Configuration
// ============================================================================
// Enable/disable Small-Mid allocator (ENV controlled, default OFF)
#ifndef HAKMEM_SMALLMID_ENABLE
#define HAKMEM_SMALLMID_ENABLE 0
#endif
// Debug logging
#ifndef SMALLMID_DEBUG
#define SMALLMID_DEBUG 0 // DISABLE for performance testing
#endif
#if SMALLMID_DEBUG
#include <stdio.h>
#define SMALLMID_LOG(fmt, ...) fprintf(stderr, "[SMALLMID] " fmt "\n", ##__VA_ARGS__)
#else
#define SMALLMID_LOG(fmt, ...) ((void)0)
#endif
// ============================================================================
// Statistics (Debug/Profiling)
// ============================================================================
#ifdef HAKMEM_SMALLMID_STATS
typedef struct SmallMidStats {
uint64_t total_allocs; // Total allocations
uint64_t total_frees; // Total frees
uint64_t tls_hits; // TLS freelist hits
uint64_t tls_misses; // TLS freelist misses (refill)
uint64_t superslab_refills; // SuperSlab refill count
} SmallMidStats;
extern SmallMidStats g_smallmid_stats;
void smallmid_print_stats(void);
#endif
#ifdef __cplusplus
}
#endif
#endif // HAKMEM_SMALLMID_H

View File

@@ -1,429 +0,0 @@
/**
* hakmem_smallmid_superslab.c - Small-Mid SuperSlab Backend Implementation
*
* Phase 17-2: Dedicated SuperSlab pool for Small-Mid allocator
* Goal: 2-3x performance improvement via batch refills and dedicated backend
*
* Created: 2025-11-16
*/
#include "hakmem_smallmid_superslab.h"
#include "hakmem_smallmid.h"
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>
#include <time.h>
#include <errno.h>
// ============================================================================
// Global State
// ============================================================================
SmallMidSSHead g_smallmid_ss_pools[SMALLMID_NUM_CLASSES];
static pthread_once_t g_smallmid_ss_init_once = PTHREAD_ONCE_INIT;
static int g_smallmid_ss_initialized = 0;
#ifdef HAKMEM_SMALLMID_SS_STATS
SmallMidSSStats g_smallmid_ss_stats = {0};
#endif
// ============================================================================
// Initialization
// ============================================================================
static void smallmid_superslab_init_once(void) {
for (int i = 0; i < SMALLMID_NUM_CLASSES; i++) {
SmallMidSSHead* pool = &g_smallmid_ss_pools[i];
pool->class_idx = i;
pool->total_ss = 0;
pool->first_ss = NULL;
pool->current_ss = NULL;
pool->lru_head = NULL;
pool->lru_tail = NULL;
pthread_mutex_init(&pool->lock, NULL);
pool->alloc_count = 0;
pool->refill_count = 0;
pool->ss_alloc_count = 0;
pool->ss_free_count = 0;
}
g_smallmid_ss_initialized = 1;
#ifndef SMALLMID_DEBUG
#define SMALLMID_DEBUG 0
#endif
#if SMALLMID_DEBUG
fprintf(stderr, "[SmallMid SuperSlab] Initialized (%d classes)\n", SMALLMID_NUM_CLASSES);
#endif
}
void smallmid_superslab_init(void) {
pthread_once(&g_smallmid_ss_init_once, smallmid_superslab_init_once);
}
// ============================================================================
// SuperSlab Allocation/Deallocation
// ============================================================================
/**
* smallmid_superslab_alloc - Allocate a new 1MB SuperSlab
*
* Strategy:
* - mmap 1MB aligned region (PROT_READ|WRITE, MAP_PRIVATE|ANONYMOUS)
* - Initialize header, metadata, counters
* - Add to per-class pool chain
* - Return SuperSlab pointer
*/
SmallMidSuperSlab* smallmid_superslab_alloc(int class_idx) {
if (class_idx < 0 || class_idx >= SMALLMID_NUM_CLASSES) {
return NULL;
}
// Allocate 1MB aligned region
void* mem = mmap(NULL, SMALLMID_SUPERSLAB_SIZE,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
if (mem == MAP_FAILED) {
fprintf(stderr, "[SmallMid SS] mmap failed: %s\n", strerror(errno));
return NULL;
}
// Ensure alignment (mmap should return aligned address)
uintptr_t addr = (uintptr_t)mem;
if ((addr & (SMALLMID_SS_ALIGNMENT - 1)) != 0) {
fprintf(stderr, "[SmallMid SS] WARNING: mmap returned unaligned address %p\n", mem);
munmap(mem, SMALLMID_SUPERSLAB_SIZE);
return NULL;
}
SmallMidSuperSlab* ss = (SmallMidSuperSlab*)mem;
// Initialize header
ss->magic = SMALLMID_SS_MAGIC;
ss->num_slabs = SMALLMID_SLABS_PER_SS;
ss->active_slabs = 0;
ss->refcount = 1;
ss->total_active = 0;
ss->slab_bitmap = 0;
ss->nonempty_mask = 0;
ss->last_used_ns = 0;
ss->generation = 0;
ss->next = NULL;
ss->lru_next = NULL;
ss->lru_prev = NULL;
// Initialize slab metadata (all inactive initially)
for (int i = 0; i < SMALLMID_SLABS_PER_SS; i++) {
SmallMidSlabMeta* meta = &ss->slabs[i];
meta->freelist = NULL;
meta->used = 0;
meta->capacity = 0;
meta->carved = 0;
meta->class_idx = class_idx;
meta->flags = SMALLMID_SLAB_INACTIVE;
}
// Update pool stats
SmallMidSSHead* pool = &g_smallmid_ss_pools[class_idx];
atomic_fetch_add(&pool->total_ss, 1);
atomic_fetch_add(&pool->ss_alloc_count, 1);
#ifdef HAKMEM_SMALLMID_SS_STATS
atomic_fetch_add(&g_smallmid_ss_stats.total_ss_alloc, 1);
#endif
#if SMALLMID_DEBUG
fprintf(stderr, "[SmallMid SS] Allocated SuperSlab %p (class=%d, size=1MB)\n",
ss, class_idx);
#endif
return ss;
}
/**
* smallmid_superslab_free - Free a SuperSlab
*
* Strategy:
* - Validate refcount == 0 (all blocks freed)
* - munmap the 1MB region
* - Update pool stats
*/
void smallmid_superslab_free(SmallMidSuperSlab* ss) {
if (!ss || ss->magic != SMALLMID_SS_MAGIC) {
fprintf(stderr, "[SmallMid SS] ERROR: Invalid SuperSlab %p\n", ss);
return;
}
uint32_t refcount = atomic_load(&ss->refcount);
if (refcount > 0) {
fprintf(stderr, "[SmallMid SS] WARNING: Freeing SuperSlab with refcount=%u\n", refcount);
}
uint32_t active = atomic_load(&ss->total_active);
if (active > 0) {
fprintf(stderr, "[SmallMid SS] WARNING: Freeing SuperSlab with active blocks=%u\n", active);
}
// Invalidate magic
ss->magic = 0xDEADBEEF;
// munmap
if (munmap(ss, SMALLMID_SUPERSLAB_SIZE) != 0) {
fprintf(stderr, "[SmallMid SS] munmap failed: %s\n", strerror(errno));
}
#ifdef HAKMEM_SMALLMID_SS_STATS
atomic_fetch_add(&g_smallmid_ss_stats.total_ss_free, 1);
#endif
#if SMALLMID_DEBUG
fprintf(stderr, "[SmallMid SS] Freed SuperSlab %p\n", ss);
#endif
}
// ============================================================================
// Slab Initialization
// ============================================================================
/**
* smallmid_slab_init - Initialize a slab within SuperSlab
*
* Strategy:
* - Calculate slab base address (ss_base + slab_idx * 64KB)
* - Set capacity based on size class (256/128/64 blocks)
* - Mark slab as active
* - Update SuperSlab bitmaps
*/
void smallmid_slab_init(SmallMidSuperSlab* ss, int slab_idx, int class_idx) {
if (!ss || slab_idx < 0 || slab_idx >= SMALLMID_SLABS_PER_SS) {
return;
}
SmallMidSlabMeta* meta = &ss->slabs[slab_idx];
// Set capacity based on class
const uint16_t capacities[SMALLMID_NUM_CLASSES] = {
SMALLMID_BLOCKS_256B,
SMALLMID_BLOCKS_512B,
SMALLMID_BLOCKS_1KB
};
meta->freelist = NULL;
meta->used = 0;
meta->capacity = capacities[class_idx];
meta->carved = 0;
meta->class_idx = class_idx;
meta->flags = SMALLMID_SLAB_ACTIVE;
// Update SuperSlab bitmaps
ss->slab_bitmap |= (1u << slab_idx);
ss->nonempty_mask |= (1u << slab_idx);
ss->active_slabs++;
#if SMALLMID_DEBUG
fprintf(stderr, "[SmallMid SS] Initialized slab %d in SS %p (class=%d, capacity=%u)\n",
slab_idx, ss, class_idx, meta->capacity);
#endif
}
// ============================================================================
// Batch Refill (Performance-Critical Path)
// ============================================================================
/**
* smallmid_refill_batch - Batch refill TLS freelist from SuperSlab
*
* Performance target: 5-8 instructions per call (amortized)
*
* Strategy:
* 1. Try current slab's freelist (fast path: pop batch_max blocks)
* 2. Fall back to bump allocation if freelist empty
* 3. Allocate new slab if current is full
* 4. Allocate new SuperSlab if no slabs available
*
* Returns: Number of blocks refilled (0 on failure)
*/
int smallmid_refill_batch(int class_idx, void** batch_out, int batch_max) {
if (class_idx < 0 || class_idx >= SMALLMID_NUM_CLASSES || !batch_out || batch_max <= 0) {
return 0;
}
SmallMidSSHead* pool = &g_smallmid_ss_pools[class_idx];
// Ensure SuperSlab pool is initialized
if (!g_smallmid_ss_initialized) {
smallmid_superslab_init();
}
// Allocate first SuperSlab if needed
pthread_mutex_lock(&pool->lock);
if (!pool->current_ss) {
pool->current_ss = smallmid_superslab_alloc(class_idx);
if (!pool->current_ss) {
pthread_mutex_unlock(&pool->lock);
return 0;
}
// Add to chain
if (!pool->first_ss) {
pool->first_ss = pool->current_ss;
}
// Initialize first slab
smallmid_slab_init(pool->current_ss, 0, class_idx);
}
SmallMidSuperSlab* ss = pool->current_ss;
pthread_mutex_unlock(&pool->lock);
// Find active slab with available blocks
int slab_idx = -1;
SmallMidSlabMeta* meta = NULL;
for (int i = 0; i < SMALLMID_SLABS_PER_SS; i++) {
if (!(ss->slab_bitmap & (1u << i))) {
continue; // Slab not active
}
meta = &ss->slabs[i];
if (meta->used < meta->capacity) {
slab_idx = i;
break; // Found slab with space
}
}
// No slab with space - try to allocate new slab
if (slab_idx == -1) {
pthread_mutex_lock(&pool->lock);
// Find first inactive slab
for (int i = 0; i < SMALLMID_SLABS_PER_SS; i++) {
if (!(ss->slab_bitmap & (1u << i))) {
smallmid_slab_init(ss, i, class_idx);
slab_idx = i;
meta = &ss->slabs[i];
break;
}
}
pthread_mutex_unlock(&pool->lock);
// All slabs exhausted - need new SuperSlab
if (slab_idx == -1) {
pthread_mutex_lock(&pool->lock);
SmallMidSuperSlab* new_ss = smallmid_superslab_alloc(class_idx);
if (!new_ss) {
pthread_mutex_unlock(&pool->lock);
return 0;
}
// Link to chain
new_ss->next = pool->first_ss;
pool->first_ss = new_ss;
pool->current_ss = new_ss;
// Initialize first slab
smallmid_slab_init(new_ss, 0, class_idx);
pthread_mutex_unlock(&pool->lock);
ss = new_ss;
slab_idx = 0;
meta = &ss->slabs[0];
}
}
// Now we have a slab with available capacity
// Strategy: Try freelist first, then bump allocation
const size_t block_sizes[SMALLMID_NUM_CLASSES] = {256, 512, 1024};
size_t block_size = block_sizes[class_idx];
int refilled = 0;
// Calculate slab data base address
uintptr_t ss_base = (uintptr_t)ss;
uintptr_t slab_base = ss_base + (slab_idx * SMALLMID_SLAB_SIZE);
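// NOTE: for slab_idx == 0, slab_base coincides with the SuperSlab header
// (header and data share the same 1MB region), so carving from slab 0
// without reserving the header area would overwrite the metadata.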
// Fast path: Pop from freelist (if available)
void* freelist_head = meta->freelist;
while (freelist_head && refilled < batch_max) {
// Add 1-byte header space (Phase 7 technology)
void* user_ptr = (uint8_t*)freelist_head + 1;
batch_out[refilled++] = user_ptr;
// Next block (freelist stored at offset 0 in user data)
freelist_head = *(void**)user_ptr;
}
meta->freelist = freelist_head;
// Slow path: Bump allocation
while (refilled < batch_max && meta->carved < meta->capacity) {
// Calculate block base address (with 1-byte header)
uintptr_t block_base = slab_base + (meta->carved * (block_size + 1));
void* base_ptr = (void*)block_base;
void* user_ptr = (uint8_t*)base_ptr + 1;
// Write header (0xb0 | class_idx)
*(uint8_t*)base_ptr = 0xb0 | class_idx;
batch_out[refilled++] = user_ptr;
meta->carved++;
meta->used++;
// Update SuperSlab active counter
atomic_fetch_add(&ss->total_active, 1);
}
// Update stats
atomic_fetch_add(&pool->alloc_count, refilled);
atomic_fetch_add(&pool->refill_count, 1);
#ifdef HAKMEM_SMALLMID_SS_STATS
atomic_fetch_add(&g_smallmid_ss_stats.total_refills, 1);
atomic_fetch_add(&g_smallmid_ss_stats.total_blocks_carved, refilled);
#endif
#if SMALLMID_DEBUG
if (refilled > 0) {
fprintf(stderr, "[SmallMid SS] Refilled %d blocks (class=%d, slab=%d, carved=%u/%u)\n",
refilled, class_idx, slab_idx, meta->carved, meta->capacity);
}
#endif
return refilled;
}
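/*
 * Usage sketch (hypothetical caller; the real TLS refill lives in
 * hakmem_smallmid.c). The TLS layer is expected to drain a batch into a
 * small stack buffer sized by the per-class constants, e.g.:
 *
 *   void* batch[SMALLMID_REFILL_BATCH_256B];   // 16, the largest batch
 *   int n = smallmid_refill_batch(cls, batch, SMALLMID_REFILL_BATCH_256B);
 *   // n blocks (user pointers, base + 1) are then pushed to the TLS SLL
 */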
// ============================================================================
// Statistics
// ============================================================================
#ifdef HAKMEM_SMALLMID_SS_STATS
void smallmid_ss_print_stats(void) {
fprintf(stderr, "\n=== Small-Mid SuperSlab Statistics ===\n");
fprintf(stderr, "Total SuperSlab allocs: %lu\n", g_smallmid_ss_stats.total_ss_alloc);
fprintf(stderr, "Total SuperSlab frees: %lu\n", g_smallmid_ss_stats.total_ss_free);
fprintf(stderr, "Total refills: %lu\n", g_smallmid_ss_stats.total_refills);
fprintf(stderr, "Total blocks carved: %lu\n", g_smallmid_ss_stats.total_blocks_carved);
fprintf(stderr, "Total blocks freed: %lu\n", g_smallmid_ss_stats.total_blocks_freed);
fprintf(stderr, "\nPer-class statistics:\n");
for (int i = 0; i < SMALLMID_NUM_CLASSES; i++) {
SmallMidSSHead* pool = &g_smallmid_ss_pools[i];
fprintf(stderr, " Class %d (%zuB):\n", i, g_smallmid_class_sizes[i]);
fprintf(stderr, " Total SS: %zu\n", pool->total_ss);
fprintf(stderr, " Allocs: %lu\n", pool->alloc_count);
fprintf(stderr, " Refills: %lu\n", pool->refill_count);
}
fprintf(stderr, "=======================================\n\n");
}
#endif

View File

@@ -1,288 +0,0 @@
/**
* hakmem_smallmid_superslab.h - Small-Mid SuperSlab Backend (Phase 17-2)
*
* Purpose: Dedicated SuperSlab pool for Small-Mid allocator (256B-1KB)
* Separate from Tiny SuperSlab to avoid competition and optimize for mid-range sizes
*
* Design:
* - SuperSlab size: 1MB (aligned for fast pointer→slab lookup)
* - Slab size: 64KB (same as Tiny for consistency)
* - Size classes: 3 (256B/512B/1KB)
* - Blocks per slab: 256/128/64
* - Refill strategy: Batch 8-16 blocks per TLS refill
*
* Created: 2025-11-16 (Phase 17-2)
*/
#ifndef HAKMEM_SMALLMID_SUPERSLAB_H
#define HAKMEM_SMALLMID_SUPERSLAB_H
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdatomic.h>
#include <pthread.h>
#ifdef __cplusplus
extern "C" {
#endif
// ============================================================================
// Configuration
// ============================================================================
#define SMALLMID_SUPERSLAB_SIZE (1024 * 1024) // 1MB
#define SMALLMID_SLAB_SIZE (64 * 1024) // 64KB
#define SMALLMID_SLABS_PER_SS (SMALLMID_SUPERSLAB_SIZE / SMALLMID_SLAB_SIZE) // 16
#define SMALLMID_SS_ALIGNMENT SMALLMID_SUPERSLAB_SIZE // 1MB alignment
#define SMALLMID_SS_MAGIC 0x534D5353u // 'SMSS'
// Blocks per slab (per size class)
#define SMALLMID_BLOCKS_256B 256 // 64KB / 256B
#define SMALLMID_BLOCKS_512B 128 // 64KB / 512B
#define SMALLMID_BLOCKS_1KB 64 // 64KB / 1KB
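// NOTE: smallmid_refill_batch() carves with a stride of (block_size + 1) to
// make room for the 1-byte header, so these capacities overflow a 64KB slab
// by a few blocks (e.g. 256 * 257B = 65792B > 65536B); 255/127/63 would fit.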
// Batch refill sizes (per size class)
#define SMALLMID_REFILL_BATCH_256B 16
#define SMALLMID_REFILL_BATCH_512B 12
#define SMALLMID_REFILL_BATCH_1KB 8
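// Hypothetical convenience helper (not part of the original API); shown only
// to make the intended use of the batch constants concrete.
static inline int smallmid_refill_batch_max(int class_idx) {
static const int batch[3] = {
SMALLMID_REFILL_BATCH_256B, // 16 blocks per TLS refill
SMALLMID_REFILL_BATCH_512B, // 12
SMALLMID_REFILL_BATCH_1KB // 8
};
return (class_idx >= 0 && class_idx < 3) ? batch[class_idx] : 0;
}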
// ============================================================================
// Data Structures
// ============================================================================
/**
* SmallMidSlabMeta - Metadata for a single 64KB slab
*
* Each slab is dedicated to one size class and contains:
* - Freelist: linked list of freed blocks
* - Used counter: number of allocated blocks
* - Capacity: total blocks available
* - Class index: which size class (0=256B, 1=512B, 2=1KB)
*/
typedef struct SmallMidSlabMeta {
void* freelist; // Freelist head (NULL if empty)
uint16_t used; // Blocks currently allocated
uint16_t capacity; // Total blocks in slab
uint16_t carved; // Blocks carved (bump allocation)
uint8_t class_idx; // Size class (0/1/2)
uint8_t flags; // Status flags (active/inactive)
} SmallMidSlabMeta;
// Slab status flags
#define SMALLMID_SLAB_INACTIVE 0x00
#define SMALLMID_SLAB_ACTIVE 0x01
#define SMALLMID_SLAB_FULL 0x02
/**
* SmallMidSuperSlab - 1MB region containing 16 slabs of 64KB each
*
* Structure:
* - Header: metadata, counters, LRU tracking
* - Slabs array: 16 × SmallMidSlabMeta
* - Data region: 16 × 64KB = 1MB of block storage
*
* Alignment: 1MB boundary for fast pointer→SuperSlab lookup
* Lookup formula: ss = (void*)((uintptr_t)ptr & ~(SMALLMID_SUPERSLAB_SIZE - 1))
*/
typedef struct SmallMidSuperSlab {
uint32_t magic; // Validation magic (SMALLMID_SS_MAGIC)
uint8_t num_slabs; // Number of slabs (16)
uint8_t active_slabs; // Count of active slabs
uint16_t _pad0;
// Reference counting
_Atomic uint32_t refcount; // SuperSlab refcount (for safe deallocation)
_Atomic uint32_t total_active; // Total active blocks across all slabs
// Slab tracking bitmaps
uint16_t slab_bitmap; // Active slabs (bit i = slab i active)
uint16_t nonempty_mask; // Slabs with available blocks
// LRU tracking (for lazy deallocation)
uint64_t last_used_ns; // Last allocation/free timestamp
uint32_t generation; // LRU generation counter
// Linked lists
struct SmallMidSuperSlab* next; // Per-class chain
struct SmallMidSuperSlab* lru_next;
struct SmallMidSuperSlab* lru_prev;
// Per-slab metadata (16 slabs × ~20 bytes = 320 bytes)
SmallMidSlabMeta slabs[SMALLMID_SLABS_PER_SS];
// Data region follows header (aligned to slab boundary)
// Total: header (~400 bytes) + data (1MB) = 1MB aligned region
} SmallMidSuperSlab;
/**
* SmallMidSSHead - Per-class SuperSlab pool head
*
* Each size class (256B/512B/1KB) has its own pool of SuperSlabs.
* This allows:
* - Fast allocation from class-specific pool
* - LRU-based lazy deallocation
* - Lock-free TLS refill (per-thread current_ss)
*/
typedef struct SmallMidSSHead {
uint8_t class_idx; // Size class index (0/1/2)
uint8_t _pad0[3];
// SuperSlab pool
_Atomic size_t total_ss; // Total SuperSlabs allocated
SmallMidSuperSlab* first_ss; // First SuperSlab in chain
SmallMidSuperSlab* current_ss; // Current allocation target
// LRU list (for lazy deallocation)
SmallMidSuperSlab* lru_head;
SmallMidSuperSlab* lru_tail;
// Lock for expansion/deallocation
pthread_mutex_t lock;
// Statistics
_Atomic uint64_t alloc_count;
_Atomic uint64_t refill_count;
_Atomic uint64_t ss_alloc_count; // SuperSlab allocations
_Atomic uint64_t ss_free_count; // SuperSlab deallocations
} SmallMidSSHead;
// ============================================================================
// Global State
// ============================================================================
/**
* g_smallmid_ss_pools - Per-class SuperSlab pools
*
* Array of 3 pools (one per size class: 256B/512B/1KB)
* Each pool manages its own SuperSlabs independently.
*/
extern SmallMidSSHead g_smallmid_ss_pools[3];
// ============================================================================
// API Functions
// ============================================================================
/**
* smallmid_superslab_init - Initialize Small-Mid SuperSlab system
*
* Call once at startup (thread-safe, idempotent)
* Initializes per-class pools and locks.
*/
void smallmid_superslab_init(void);
/**
* smallmid_superslab_alloc - Allocate a new 1MB SuperSlab
*
* @param class_idx Size class index (0/1/2)
* @return Pointer to new SuperSlab, or NULL on OOM
*
* Allocates 1MB aligned region via mmap, initializes header and metadata.
* Thread-safety: Callable from any thread (uses per-class lock)
*/
SmallMidSuperSlab* smallmid_superslab_alloc(int class_idx);
/**
* smallmid_superslab_free - Free a SuperSlab
*
* @param ss SuperSlab to free
*
* Returns SuperSlab to OS via munmap.
* Thread-safety: Caller must ensure no concurrent access to ss
*/
void smallmid_superslab_free(SmallMidSuperSlab* ss);
/**
* smallmid_slab_init - Initialize a slab within SuperSlab
*
* @param ss SuperSlab containing the slab
* @param slab_idx Slab index (0-15)
* @param class_idx Size class (0=256B, 1=512B, 2=1KB)
*
* Sets up slab metadata and marks it as active.
*/
void smallmid_slab_init(SmallMidSuperSlab* ss, int slab_idx, int class_idx);
/**
* smallmid_refill_batch - Batch refill TLS freelist from SuperSlab
*
* @param class_idx Size class index (0/1/2)
* @param batch_out Output array for blocks (caller-allocated)
* @param batch_max Max blocks to refill (8-16 typically)
* @return Number of blocks refilled (0 on failure)
*
* Performance-critical path:
* - Tries to pop batch_max blocks from current slab's freelist
* - Falls back to bump allocation if freelist empty
* - Allocates new SuperSlab if current is full
* - Expected cost: 5-8 instructions per call (amortized)
*
* Thread-safety: Lock-free for single-threaded TLS refill
*/
int smallmid_refill_batch(int class_idx, void** batch_out, int batch_max);
/**
* smallmid_superslab_lookup - Fast pointer→SuperSlab lookup
*
* @param ptr Block pointer (user or base)
* @return SuperSlab containing ptr, or NULL if invalid
*
* Uses 1MB alignment for O(1) mask-based lookup:
* ss = (SmallMidSuperSlab*)((uintptr_t)ptr & ~(SMALLMID_SUPERSLAB_SIZE - 1))
*/
static inline SmallMidSuperSlab* smallmid_superslab_lookup(void* ptr) {
uintptr_t addr = (uintptr_t)ptr;
uintptr_t ss_addr = addr & ~(SMALLMID_SUPERSLAB_SIZE - 1);
SmallMidSuperSlab* ss = (SmallMidSuperSlab*)ss_addr;
// Validate magic
if (ss->magic != SMALLMID_SS_MAGIC) {
return NULL;
}
return ss;
}
/**
* smallmid_slab_index - Get slab index from pointer
*
* @param ss SuperSlab
* @param ptr Block pointer
* @return Slab index (0-15), or -1 if out of bounds
*/
static inline int smallmid_slab_index(SmallMidSuperSlab* ss, void* ptr) {
uintptr_t ss_base = (uintptr_t)ss;
uintptr_t ptr_addr = (uintptr_t)ptr;
uintptr_t offset = ptr_addr - ss_base;
if (offset >= SMALLMID_SUPERSLAB_SIZE) {
return -1;
}
int slab_idx = (int)(offset / SMALLMID_SLAB_SIZE);
return (slab_idx < SMALLMID_SLABS_PER_SS) ? slab_idx : -1;
}
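/*
 * Worked example of the mask-based lookup (addresses illustrative only):
 *   ptr    = 0x7f3a4215c8e0
 *   ss     = ptr & ~(0x100000 - 1) = 0x7f3a42100000   // 1MB-aligned base
 *   offset = ptr - ss = 0x5c8e0
 *   slab   = offset / 0x10000 = 5                     // 64KB slabs
 * The magic check in smallmid_superslab_lookup() then confirms that the
 * masked address really is a SuperSlab.
 */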
// ============================================================================
// Statistics (Debug)
// ============================================================================
#ifdef HAKMEM_SMALLMID_SS_STATS
typedef struct SmallMidSSStats {
uint64_t total_ss_alloc; // Total SuperSlab allocations
uint64_t total_ss_free; // Total SuperSlab frees
uint64_t total_refills; // Total batch refills
uint64_t total_blocks_carved; // Total blocks carved (bump alloc)
uint64_t total_blocks_freed; // Total blocks freed to freelist
} SmallMidSSStats;
extern SmallMidSSStats g_smallmid_ss_stats;
void smallmid_ss_print_stats(void);
#endif
#ifdef __cplusplus
}
#endif
#endif // HAKMEM_SMALLMID_SUPERSLAB_H

View File

@@ -5,6 +5,11 @@
// TLS hint: last adopted SuperSlab/slab to avoid rescans
#include "tiny_sticky.h"
// Tiny Page Box: Tiny-Plus page pool for C5-C7 (the box in front of Superslab/Warm Pool)
// When tiny_tls_bind_slab() binds a new TLS slab, it calls
// tiny_page_box_on_new_slab() so the Page Box can update its page pool.
#include "box/tiny_page_box.h"
// Mailbox box
#include "box/mailbox_box.h"
@@ -363,6 +368,9 @@ static inline void tiny_tls_bind_slab(TinyTLSSlab* tls, SuperSlab* ss, int slab_
tls->slab_idx = (uint8_t)slab_idx;
tls->meta = &ss->slabs[slab_idx];
tls->slab_base = tiny_slab_base_for(ss, slab_idx);
// Also notify the Tiny Page Box of the new slab (enabled classes only, e.g. C7)
tiny_page_box_on_new_slab(tls);
}
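// For orientation: per the note above, this hook records the freshly bound
// slab in the Page Box's per-thread page pool (enabled classes only), keeping
// the Page Box stage ahead of the Superslab/Warm Pool path. (Summary only;
// the implementation lives in core/box/tiny_page_box.c.)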
static inline uint32_t tiny_tls_default_refill(uint32_t cap) {

View File

@@ -1,38 +1,2 @@
hakmem_smallmid.o: core/hakmem_smallmid.c core/hakmem_smallmid.h \
core/hakmem_build_flags.h core/hakmem_smallmid_superslab.h \
core/tiny_region_id.h core/tiny_box_geometry.h \
core/hakmem_tiny_superslab_constants.h core/hakmem_tiny_config.h \
core/ptr_track.h core/hakmem_super_registry.h \
core/hakmem_tiny_superslab.h core/superslab/superslab_types.h \
core/hakmem_tiny_superslab_constants.h core/superslab/superslab_inline.h \
core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \
core/tiny_debug_ring.h core/tiny_remote.h core/box/ss_addr_map_box.h \
core/box/../hakmem_build_flags.h core/hakmem_tiny.h core/hakmem_trace.h \
core/hakmem_tiny_mini_mag.h core/box/hak_lane_classify.inc.h \
core/box/ptr_type_box.h core/tiny_debug_api.h core/hakmem_env_cache.h
hakmem_smallmid.o: core/hakmem_smallmid.c core/hakmem_smallmid.h
core/hakmem_smallmid.h:
core/hakmem_build_flags.h:
core/hakmem_smallmid_superslab.h:
core/tiny_region_id.h:
core/tiny_box_geometry.h:
core/hakmem_tiny_superslab_constants.h:
core/hakmem_tiny_config.h:
core/ptr_track.h:
core/hakmem_super_registry.h:
core/hakmem_tiny_superslab.h:
core/superslab/superslab_types.h:
core/hakmem_tiny_superslab_constants.h:
core/superslab/superslab_inline.h:
core/superslab/superslab_types.h:
core/superslab/../tiny_box_geometry.h:
core/tiny_debug_ring.h:
core/tiny_remote.h:
core/box/ss_addr_map_box.h:
core/box/../hakmem_build_flags.h:
core/hakmem_tiny.h:
core/hakmem_trace.h:
core/hakmem_tiny_mini_mag.h:
core/box/hak_lane_classify.inc.h:
core/box/ptr_type_box.h:
core/tiny_debug_api.h:
core/hakmem_env_cache.h: